Ops Orchestrator Multi-Agent System

A multi-agent system built on AWS Bedrock AgentCore that provides automated incident triaging, ChatOps collaboration, and report generation for operational workflows.

Quick Start Guide

Prerequisites

Before you deploy, complete the setup described in the Prerequisites section later in this document (AWS access, Python dependencies, and environment variables).

Local Deployment Steps

Configure and Launch the Agent Runtime
```
python ops_orchestrator_runtime.py --configure --launch
```
This will:
- Configure the AgentCore runtime environment
- Set up authentication and gateway connections
- Launch the ops orchestrator agent with runtime capabilities

Navigate to Parent Directory and Invoke Agent
```
cd ..
python invoke_agent.py
```
Agent Invocation Options

You can invoke the agent using either of the following:

Option A: HTTP/REST API
- Use standard HTTP requests to interact with the agent
- The agent will be accessible via the configured gateway endpoint
Option B: AWS SDK (boto3)
- Use the AWS Bedrock AgentCore SDK for direct agent invocation
- Requires your agent ARN for programmatic access
- Example:
```
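import json
import boto3

client = boto3.client('bedrock-agentcore')
# The payload shape depends on your agent's entrypoint; "prompt" is an assumed key.
response = client.invoke_agent_runtime(
    agentRuntimeArn='your-agent-arn',
    payload=json.dumps({'prompt': 'Your incident description here'})
)
print(response['response'].read().decode('utf-8'))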
```

Summary of Deployment Process

✅ Complete prerequisites setup
✅ Run python ops_orchestrator_runtime.py --configure --launch
✅ Navigate up one directory: cd ..
✅ Invoke agent: python invoke_agent.py
✅ Use HTTP or boto3 for agent communication

Architecture Overview

The Ops Orchestrator Agent is a multi-agent system made up of three specialized agents that work together:

- Lead Agent (Issue Triaging) - Automated incident analysis and classification
- ChatOps Agent - Real-time collaboration through Teams, Slack, and Gmail
- Ticket Creator Agent - Automated ticket creation in JIRA and other systems

Each agent leverages AWS Bedrock AgentCore memory primitives and connects to external services through an MCP (Model Context Protocol) gateway with OAuth2 authentication.
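
To make the gateway connection more concrete, the sketch below builds (but does not send) an MCP JSON-RPC request that carries an OAuth2 bearer token. The URL and token are placeholders, and `build_mcp_request` is a hypothetical helper, not part of this project. `tools/list` is the standard MCP method for listing the tools a server exposes.

```python
import json
import urllib.request


def build_mcp_request(mcp_url, access_token, method="tools/list", params=None, request_id=1):
    """Build a JSON-RPC 2.0 POST request for an MCP gateway (hypothetical helper)."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return urllib.request.Request(
        mcp_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # OAuth2 access token
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Placeholder values; substitute the gateway URL and token from your own setup.
request = build_mcp_request("https://gateway.example.invalid/mcp", "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without a live gateway.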

Prerequisites

AWS Requirements

AWS CLI configured with appropriate permissions
Access to AWS Bedrock AgentCore services
IAM permissions for:
- bedrock:*
- bedrock-agentcore:*
- s3:*
- lambda:*
- iam:*
- cognito-idp:*
- secretsmanager:*
- logs:*
- cloudwatch:*
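
As an illustration, the action patterns above can be assembled into an IAM policy document. This is a minimal sketch that mirrors the list exactly: `build_policy` is a hypothetical helper, and the wildcard actions and `"*"` resource are deliberately broad, so narrow them for production use in line with the Security Best Practices section.

```python
import json

# Action patterns copied from the list above; they are intentionally broad.
REQUIRED_ACTIONS = [
    "bedrock:*", "bedrock-agentcore:*", "s3:*", "lambda:*", "iam:*",
    "cognito-idp:*", "secretsmanager:*", "logs:*", "cloudwatch:*",
]


def build_policy(actions, resources=("*",)):
    """Return an IAM identity policy document allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": list(resources)}
        ],
    }


policy = build_policy(REQUIRED_ACTIONS)
print(json.dumps(policy, indent=2))
```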

Python Dependencies

pip install boto3 pyyaml python-keycloak requests openai anthropic

External Service Authentication

You’ll need API credentials for the services you want to integrate:

- JIRA: Username and API token
- GitHub: Personal Access Token or OAuth app credentials
- Slack: Bot token (optional)

Environment Setup

Create a .env file or export the following environment variables:

# AWS Configuration
export AWS_REGION="us-east-1"
export AWS_ACCOUNT_ID="your-account-id"

# JIRA Integration (Required)
export JIRA_USERNAME="your-jira-username"
export JIRA_API_TOKEN="[REDACTED]"
export JIRA_DOMAIN="yourcompany.atlassian.net"

# GitHub Integration (Required)
export GITHUB_TOKEN="[REDACTED]"

# Optional: GitHub OAuth (for advanced features)
export GITHUB_CLIENT_ID="your-oauth-client-id"
export GITHUB_CLIENT_SECRET="[REDACTED]"

# Optional: JIRA OAuth (for advanced features)
export JIRA_CLIENT_ID="your-jira-oauth-client-id"
export JIRA_CLIENT_SECRET="[REDACTED]"

# Optional: Keycloak Authentication (Alternative to Cognito)
export KEYCLOAK_URL="http://localhost:8080/"
export KEYCLOAK_ADMIN_USER="admin"
export KEYCLOAK_ADMIN_PASS="[REDACTED]"
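
Before launching, it can help to confirm that the required variables are actually set. The sketch below is one way to do that; `missing_env_vars` is a hypothetical helper, and the list of required names is an assumption based on the AWS settings and the entries marked "Required" above.

```python
import os

# AWS settings plus the variables marked "Required" above (an assumption).
REQUIRED_VARS = [
    "AWS_REGION", "AWS_ACCOUNT_ID",
    "JIRA_USERNAME", "JIRA_API_TOKEN", "JIRA_DOMAIN",
    "GITHUB_TOKEN",
]


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```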

Configuration Setup

The system uses a config.yaml file for configuration. Here’s the essential structure:

general:
  name: "ops-orchestrator-agent"
  description: "Multi-agent system for operations orchestration"

agent_information:
  ops_orchestrator_agent_model_info: 
    model_id: gpt-4o-2024-08-06
    inference_parameters:
      temperature: 0.1
      max_tokens: 2048
    
    # Memory configuration for each agent
    memories:
      lead_agent:
        use_existing: false  # Set to true if you have existing memory
        memory_id: null      # Fill if reusing existing memory
      chat_ops_agent:
        use_existing: false
        memory_id: null
      ticket_agent:
        use_existing: false
        memory_id: null
    
    # Gateway configuration
    gateway_config:
      name: "ops-gw"
      
      # Authentication method (choose one)
      inbound_auth:
        type: "cognito"  # or "keycloak"
        cognito:
          create_user_pool: true
          user_pool_name: "agentcore-gateway-ops"
          resource_server_id: "ops_orchestrator_agent"
          resource_server_name: "agentcore-gateway-ops"
          client_name: "agentcore-client-ops"
          scopes:
            - ScopeName: "gateway:read"
              ScopeDescription: "Read access"
            - ScopeName: "gateway:write"
              ScopeDescription: "Write access"
      
      credentials:
        use_cognito: true
        use_existing: false
        create_new_access_token: false
        gateway_id: null
        mcp_url: null
        access_token: null
      
      # S3 bucket for storing API specifications
      bucket_name: "ops-orchestrator-gateway-bucket"
      
      # Target integrations
      targets:
        - name: "jira-integration"
          spec_file: /absolute/path/to/tools/jira_api_spec.yaml
          type: "openapi"
          api_type: "jira"
          endpoint: "https://your-jira-instance.atlassian.net"
          authentication:
            type: "basic"
            credentials:
              username: "${JIRA_USERNAME}"
              password: "${JIRA_API_TOKEN}"
        
        - name: "github-integration" 
          spec_file: /absolute/path/to/tools/github_api_spec.yaml
          type: "openapi"
          api_type: "github"
          endpoint: "https://api.github.com"
          authentication:
            type: "bearer"
            credentials:
              token: "${GITHUB_TOKEN}"
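
The `${JIRA_USERNAME}`-style placeholders in the targets refer to the environment variables from Environment Setup. How the project resolves them is not shown here, so the following is only a sketch of one possible approach: `expand_placeholders` is a hypothetical helper that walks an already-parsed config and substitutes each `${NAME}` from a mapping, raising `KeyError` when a name is missing.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand_placeholders(value, env):
    """Recursively replace ${NAME} in strings with env[NAME]; KeyError if unset."""
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: env[m.group(1)], value)
    if isinstance(value, dict):
        return {k: expand_placeholders(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, env) for v in value]
    return value  # numbers, booleans, and None pass through unchanged


# A fragment shaped like the jira-integration target above, with demo values.
target = {
    "name": "jira-integration",
    "authentication": {
        "type": "basic",
        "credentials": {"username": "${JIRA_USERNAME}", "password": "${JIRA_API_TOKEN}"},
    },
}
demo_env = {"JIRA_USERNAME": "ops@example.com", "JIRA_API_TOKEN": "demo-token"}
resolved = expand_placeholders(target, demo_env)
print(resolved["authentication"]["credentials"]["username"])
```

In real use, `os.environ` would take the place of `demo_env`. Failing loudly on a missing variable surfaces configuration mistakes before any gateway target is created.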

Authentication Options

Option 1: AWS Cognito (Default)

The system automatically creates:

- Cognito User Pool for authentication
- Resource server with custom scopes
- Machine-to-machine client for API access
- Access tokens for gateway authentication
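
For reference, a machine-to-machine client obtains an access token with the OAuth2 client-credentials grant. The sketch below builds such a request without sending it. The token URL, client ID, and client secret are placeholders, and `build_token_request` is a hypothetical helper. Cognito custom scopes are written as `<resource_server_id>/<scope_name>`, which is why the example scopes carry the `ops_orchestrator_agent/` prefix.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scopes):
    """Build (without sending) an OAuth2 client-credentials token request."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),  # space-separated scope list
    })
    return urllib.request.Request(
        token_url,
        data=form.encode("ascii"),
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder domain and credentials; use the values from your own user pool.
request = build_token_request(
    "https://your-domain.auth.us-east-1.amazoncognito.com/oauth2/token",
    "CLIENT_ID",
    "CLIENT_SECRET",
    ["ops_orchestrator_agent/gateway:read", "ops_orchestrator_agent/gateway:write"],
)
print(request.full_url)
```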

Option 2: Keycloak Authentication

For organizations using Keycloak for identity management:

Start Keycloak (if running locally):

docker run -p 8080:8080 \
  -e KEYCLOAK_ADMIN=admin \
  -e KEYCLOAK_ADMIN_PASSWORD=[REDACTED] \
  quay.io/keycloak/keycloak:latest start-dev

Update config.yaml:
```yaml
inbound_auth:
  type: "keycloak"
  keycloak:
    url: "${KEYCLOAK_URL}"
    admin_user: "${KEYCLOAK_ADMIN_USER}"
    admin_pass: "${KEYCLOAK_ADMIN_PASS}"
    realm_name: "ops-orchestrator-realm"
    client_id: "ops-orchestrator-gateway-client"
    create_realm: true
    scopes:
      - "gateway:read"
      - "gateway:write"
      - "ops:manage"
      - "incidents:create"

credentials:
  use_keycloak: true
```
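
Once the realm exists, clients request tokens from the realm's OpenID Connect token endpoint. The sketch below derives that URL from the Keycloak base URL and realm name. `keycloak_token_url` is a hypothetical helper, and it assumes a current Keycloak distribution; older releases served the same path under an `/auth` prefix.

```python
def keycloak_token_url(base_url, realm):
    """Return the OpenID Connect token endpoint for a Keycloak realm."""
    # Strip a trailing slash so values like "http://localhost:8080/" work.
    return f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"


print(keycloak_token_url("http://localhost:8080/", "ops-orchestrator-realm"))
```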

Service Integrations

JIRA Integration

The system integrates with JIRA for automated ticket creation and management.

Required Setup:
1. Create a JIRA API token in your Atlassian account
2. Set environment variables:
```bash
export JIRA_USERNAME="your-email@company.com"
export JIRA_API_TOKEN="[REDACTED]"
export JIRA_DOMAIN="yourcompany.atlassian.net"
```

GitHub Integration

Integrates with GitHub for repository management and issue tracking.

Required Setup:

1. Create a GitHub Personal Access Token with appropriate scopes
2. Set environment variable:
```
export GITHUB_TOKEN="[REDACTED]"
```
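
The two integrations authenticate differently: JIRA uses HTTP basic authentication with the username and API token, while GitHub uses the token as a bearer credential. The sketch below shows how each `Authorization` header value is formed. `auth_header` is a hypothetical helper, and the credential dictionaries follow the `basic` and `bearer` shapes used in config.yaml.

```python
import base64


def auth_header(auth_type, credentials):
    """Return the Authorization header value for a 'basic' or 'bearer' target."""
    if auth_type == "basic":
        pair = f"{credentials['username']}:{credentials['password']}"
        return "Basic " + base64.b64encode(pair.encode("utf-8")).decode("ascii")
    if auth_type == "bearer":
        return f"Bearer {credentials['token']}"
    raise ValueError(f"Unsupported authentication type: {auth_type!r}")


# Demo values only; real credentials come from the environment variables above.
print(auth_header("basic", {"username": "ops@example.com", "password": "demo-token"}))
print(auth_header("bearer", {"token": "demo-token"}))
```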

Agent Memory System

Each agent uses AWS Bedrock AgentCore memory with different strategies:

Lead Agent Memory

- User Preferences: Stores user-specific incident handling preferences
- Semantic Memory: Contextual understanding of technical issues
- Summary Memory: Session-based conversation summaries
- Custom Issue Triaging: Specialized memory for incident classification

ChatOps Agent Memory

- User Preferences: Communication preferences and channels
- Semantic Memory: Chat context and collaboration patterns
- Summary Memory: Chat session summaries
- ChatOps Memory: Communication templates and escalation procedures

Ticket Creator Agent Memory

- User Preferences: Ticket creation preferences and templates
- Semantic Memory: Ticket patterns and classifications
- Summary Memory: Ticket creation session history
- Ticket Creator Memory: Template management and field mapping

Memory Configuration

memories:
  lead_agent:
    use_existing: false      # Set to true to reuse existing memory
    memory_id: null          # Memory ID if reusing
  chat_ops_agent:
    use_existing: false
    memory_id: null
  ticket_agent:
    use_existing: false
    memory_id: null
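
The `use_existing` and `memory_id` settings together decide whether an agent reuses a memory or gets a new one. The sketch below spells out that decision for a parsed `memories` block. `plan_memory_actions` is a hypothetical helper, the memory ID is a made-up example, and rejecting `use_existing: true` without a `memory_id` is an assumption about sensible behavior, not a description of the project's code.

```python
def plan_memory_actions(memories):
    """Map each agent to ('reuse', memory_id) or ('create', None)."""
    plan = {}
    for agent, settings in memories.items():
        if settings.get("use_existing"):
            memory_id = settings.get("memory_id")
            if not memory_id:
                raise ValueError(f"{agent}: use_existing is true but memory_id is not set")
            plan[agent] = ("reuse", memory_id)
        else:
            plan[agent] = ("create", None)
    return plan


# Same shape as the memories block above; the memory ID is a made-up example.
memories = {
    "lead_agent": {"use_existing": True, "memory_id": "example-memory-id"},
    "chat_ops_agent": {"use_existing": False, "memory_id": None},
    "ticket_agent": {"use_existing": False, "memory_id": None},
}
print(plan_memory_actions(memories))
```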

Troubleshooting

Common Issues

Memory Creation Fails

❌ Error creating memory for agent: AccessDenied

Solution: Confirm that the IAM identity you are running as is allowed the bedrock-agentcore:* actions listed under AWS Requirements.

Gateway Creation Fails

❌ Error creating gateway: ValidationException

Solutions:

- Verify AWS region supports Bedrock AgentCore
- Check IAM role permissions
- Validate authentication configuration

JIRA/GitHub Integration Fails

❌ Target creation failed: Authentication failed

Solutions:

- Verify API credentials are correct
- Check environment variables are exported
- Validate API endpoint URLs
- Ensure API tokens have required permissions

Debug Mode

Enable detailed logging:

export PYTHONPATH=.
python -c "
import logging, runpy
logging.basicConfig(level=logging.DEBUG)
# Run the script as __main__ with DEBUG logging already configured
runpy.run_path('ops_orchestrator_multi_agent.py', run_name='__main__')
"

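DEBUG output from every library can be overwhelming. As an alternative sketch, the helper below enables DEBUG globally while holding chatty HTTP and AWS client loggers at INFO. `configure_debug_logging` and the `ops_orchestrator` logger name are hypothetical; `botocore` and `urllib3` are the logger names those libraries use.

```python
import logging


def configure_debug_logging(quiet_loggers=("botocore", "urllib3")):
    """Enable DEBUG output globally while keeping chatty libraries at INFO."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    for name in quiet_loggers:
        logging.getLogger(name).setLevel(logging.INFO)


configure_debug_logging()
logging.getLogger("ops_orchestrator").debug("debug logging enabled")
```
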
Security Best Practices

Credential Management

- Use environment variables, not hardcoded credentials
- Rotate API tokens regularly
- Use IAM roles with minimal required permissions

Network Security

- Use HTTPS for all external API calls
- Implement VPC endpoints for AWS services
- Consider private subnets for production deployments

Access Control

- Implement least-privilege IAM policies
- Use OAuth2 scopes to limit API access
- Regularly audit service integrations

Success Indicators

When properly configured, you should see:

✅ Observability initialized
✅ Created memory for lead_agent: OpsAgent_mem_xxx
✅ Created memory for chat_ops_agent: OpsAgent_chat_xxx  
✅ Created memory for ticket_agent: TicketCreation_chat_xxx
✅ Gateway setup completed with URL: https://xxxxx
✅ Created 2 targets successfully
🚀 Ops orchestrator multi-agent system ready!

Your multi-agent system is now ready to handle operational incidents, create tickets, and collaborate across your organization! 🚀