The GitHub Issue AI Butler on Kubernetes
A guide to THE production AI stack: LangGraph, K8s, Docker, Guardrails, Qdrant, AWS & CDK
Paul: Today, the scene is owned by Benito Martin, an ML/AI engineer, the founder of Martin Data Solutions, and an AWS legend.
It’s an honour to have you here, Benito. The scene is all yours. 🎬↓
Benito: Managing GitHub issues in large or active repositories can become overwhelming. With a constant stream of bug reports, feature requests, and support questions, often redundant or unclear, manual triage quickly turns into a time sink for contributors and maintainers who would rather be building.
To address this challenge, I built a multi-agent system for intelligent GitHub issue processing. This system leverages large language models (LLMs), vector databases (Qdrant), and cloud-native infrastructure (AWS CDK + Kubernetes) to triage, enrich, and analyze issues at scale automatically.
It integrates directly with GitHub, stores structured data in PostgreSQL, embeds content for semantic search, and uses guardrails to ensure output safety and quality, all orchestrated by LangGraph, a powerful framework for multi-agent workflows.
This project is ideal for anyone building intelligent developer tools or scaling internal automation. I recommend following this blog along with the code repository that can be found here.
Before diving into the article, if you're looking to level up in deploying production-ready AI apps by learning MLOps from the best, we recommend checking out the course below ↓
Shave a few years off your MLOps learning curve (Affiliate)
Maria Vechtomova and Basak Eskili from Marvelous MLOps are starting their last and most refined cohort of their End-to-end MLOps with Databricks course.

If you want to level up your MLOps skills, Maria is the perfect person to learn from. She and Basak are among the best MLOps engineers and teachers in the field.
The course packs a ton of value. It uses the Databricks ecosystem, but MLOps is about principles, not tooling. Maria takes this statement to heart. Thus, you can easily apply all the MLOps principles learnt during the course to any environment, such as AWS, GCP, Azure, or other open-source stacks.
If you consider learning MLOps, we 100% recommend this course.
Enroll (or learn more) in the 5th and most refined cohort starting on September 1:
Use our affiliate code to support Decoding ML. Use code PAUL100 for 100 EUR off.
Back to our article ↓
The Big Picture: System Architecture
To build a scalable, intelligent GitHub issue processing pipeline, I designed a modular, multi-agent orchestration system with robust safety mechanisms. The high-level architecture (see diagram below) illustrates how each part fits together to form a cohesive flow.
The system follows a clear user flow:
Issues are fetched from GitHub.
Issue data is stored in PostgreSQL (Amazon RDS) and made searchable via a vector database (Qdrant).
Agents collaborate to triage, classify, and provide recommendations for the issues.
All components are deployed on a scalable, cloud-native stack using Kubernetes on AWS.
By the end of the pipeline, you have an enriched, searchable issue database, ready to support users and development teams.
Core Components
🐙 GitHub Integration
Issues are fetched via GitHub’s API, parsed, and stored. The system supports syncing raw issue data and comments, enabling the agents to work in a rich, structured context.
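The fetching logic lives in the data pipeline (covered later). As a rough sketch, a paginated fetch against GitHub's REST issues endpoint might look like this (the fetch_issues helper and its parameters are illustrative, not the repository's exact code):
import requests

def fetch_issues(owner: str, repo: str, state: str = "all",
                 per_page: int = 100, max_pages: int = 1,
                 token: str | None = None) -> list[dict]:
    """Page through GitHub's /issues endpoint (which also returns pull requests)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    issues: list[dict] = []
    for page in range(1, max_pages + 1):
        response = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/issues",
            headers=headers,
            params={"state": state, "per_page": per_page, "page": page},
            timeout=30,
        )
        response.raise_for_status()
        batch = response.json()
        if not batch:
            break
        issues.extend(batch)
    return issues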
🐘 PostgreSQL
A relational database stores structured metadata in two tables: issues and comments. It forms the backbone for reproducibility, analytics, and long-term tracking. In production, this database runs on Amazon RDS; during development, it runs locally via Docker Compose. The schema is managed with Alembic, ensuring smooth database migrations and schema updates. Adminer provides easy UI access to visualize the database.
🔍 Qdrant Vector Store
Issues and comments are embedded using dense and sparse models (for hybrid search) and stored in Qdrant. This allows agents to search and find related issues intelligently.
🧠 LangGraph Agents
At the heart of the system is a LangGraph-powered multi-agent flow. Agents are assigned specific responsibilities, such as semantic search, recommendation, and classification, to collaboratively process GitHub issues.
🛡️ Guardrails AI
Each LLM interaction is safeguarded using Guardrails AI. Checks for toxicity, jailbreak attempts, and secret leakage help ensure the output remains safe and compliant.
☁️ AWS CDK + Kubernetes (EKS)
Infrastructure is fully codified with AWS CDK. The system runs on EKS (Elastic Kubernetes Service), with pods deployed via Docker images, load-balanced, and production-ready.
This architecture allows developers to offload repetitive triage tasks while retaining full control, transparency, and scalability. The repository is organized as follows:
├── LICENSE
├── Makefile
├── README.md
├── alembic.ini
├── aws_cdk_infra
├── docker
├── kubernetes
├── pyproject.toml
├── src
│ ├── agents
│ ├── api
│ ├── config
│ ├── data_pipeline
│ ├── database
│ ├── models
│ ├── utils
│ └── vectorstore
└── tests
Let’s now walk through each component of the system.
Set Up
The full setup steps can be found in the repository SETUP.md file, but the following are the core components that must be configured:
AWS Account
Kubernetes
K9s
Guardrails AI
Qdrant Vector Database
API Keys
Environment Variables
Makefile
Configuration
Under the src/config directory, there are two configuration files. Feel free to adapt them.
Repositories Configuration:
This file defines which repositories to pull issues from, how many issues to pull, and in what state (e.g., open, closed, or all).
Each entry in this file maps to a specific GitHub repository, along with pagination settings.
- owner: scikit-learn
  repo: scikit-learn
  state: all
  per_page: 100
  max_pages: 1
What this means:
Fetch issues from github.com/scikit-learn/scikit-learn
Include both open and closed issues
Pull up to 100 issues per page, with a maximum of 1 page
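How the pipeline consumes this file is up to src/data_pipeline; a minimal sketch of loading it with PyYAML, assuming the file is a top-level list as in the snippet above (the load_repo_configs helper is illustrative):
import yaml

def load_repo_configs(path: str = "src/config/repos.yaml") -> list[dict]:
    # Each entry carries owner, repo, state, per_page and max_pages
    with open(path) as f:
        return yaml.safe_load(f)

for cfg in load_repo_configs():
    print(f"{cfg['owner']}/{cfg['repo']}: state={cfg['state']}, "
          f"per_page={cfg['per_page']}, max_pages={cfg['max_pages']}")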
Guardrails Configuration:
This file configures the thresholds for Guardrails agents like jailbreak, toxicity, and secrets detection.
jailbreak:
  threshold: 0.8
  on_fail: "filter"
toxicity:
  threshold: 0.5
  validation_method: "full"
  on_fail: "filter"
secrets:
  on_fail: "filter"
Models:
All Pydantic models have been stored in separate files under the src/models folder. Feel free to explore them in advance to familiarize yourself with the different structures.
Once you have completed these setup and configuration steps, the system should be ready for use.
The Infrastructure
AWS CDK
To move into production, we will use AWS CDK (Cloud Development Kit) to define and deploy the necessary cloud infrastructure. Once your development environment is up and running, which will be explored in the next sections, the next step is to create the production environment by provisioning the infrastructure on AWS.
Below is a high-level overview of the CDK stack that will be created later, which can be found in the app.py file of the aws_cdk_infra folder:
import aws_cdk as cdk
from aws_eks_rds.eks_stack import EKSStack
from aws_eks_rds.rds_stack import RDSStack
from aws_eks_rds.vpc_stack import VPCStack
app = cdk.App()
vpc_stack = VPCStack(app, "VpcStack")
rds_stack = RDSStack(app, "RdsStack", vpc=vpc_stack.vpc)
eks_stack = EKSStack(app, "EksStack", vpc=vpc_stack.vpc)
app.synth()
This code sets up the following components:
VPC Stack: Creates a Virtual Private Cloud (VPC) that provides an isolated network for the application.
RDS Stack: Provisions an Amazon RDS instance for PostgreSQL, ensuring your database is scalable, highly available, and secure.
EKS Stack: Sets up Elastic Kubernetes Service (EKS) for managing the application containers and scaling as needed.
By using AWS CDK, you can deploy this infrastructure as code, allowing easy management, automation, and reproducibility for your cloud environment.
Docker
Dockerfile
The Dockerfile is central to the infrastructure of this project. It defines the container image for the FastAPI application, which Docker Compose then runs alongside PostgreSQL and Adminer. Below is the structure of the Dockerfiles and Docker Compose file used in the project, which you can find in the docker folder.
├── docker/
│ ├── dev.Dockerfile
│ ├── prod.Dockerfile
│ └── docker-compose.yml
The Dockerfile is a multi-stage Docker build that reduces the image size considerably while keeping all dependencies in place. If you are interested in digging deeper into multi-stage Docker builds, you can read the article I wrote about it.
Build Stage
The first stage installs:
Python 3.12 using uv
Runtime dependencies
Python packages via uv sync
Guardrails agents
# Multi-stage Docker build for Python application with Guardrails
FROM ghcr.io/astral-sh/uv:bookworm-slim AS builder

# UV configuration for optimized builds
ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy
ENV UV_PYTHON_INSTALL_DIR=/python
ENV UV_PYTHON_PREFERENCE=only-managed

# Install Python before the project for better caching
RUN uv python install 3.12

# Install runtime dependencies, including Git
RUN apt-get update && apt-get install -y \
    ca-certificates \
    git \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies first (better Docker layer caching)
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --locked --no-install-project --no-dev

# Copy only necessary files for the project installation
COPY pyproject.toml uv.lock README.md ./
COPY src/ ./src/

# Install the project itself
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --locked --no-dev

# Set the directory for nltk data
ENV NLTK_DATA=/opt/nltk_data

# Download NLTK data (punkt tokenizer and other common data)
RUN --mount=type=cache,target=/root/.cache/nltk \
    uv run python -m nltk.downloader -d /opt/nltk_data punkt stopwords wordnet averaged_perceptron_tagger

# Configure guardrails using build secret
RUN --mount=type=secret,id=GUARDRAILS_API_KEY \
    uv run guardrails configure --token "$(cat /run/secrets/GUARDRAILS_API_KEY)" \
    --disable-metrics \
    --enable-remote-inferencing

# Cache both the download and install locations
RUN --mount=type=cache,target=/root/.cache/guardrails \
    --mount=type=cache,target=/root/.guardrails \
    --mount=type=cache,target=/tmp/guardrails-install \
    uv run guardrails hub install hub://guardrails/toxic_language && \
    uv run guardrails hub install hub://guardrails/detect_jailbreak && \
    uv run guardrails hub install hub://guardrails/secrets_present
Final Runtime Stage
The runtime stage:
Copies the pre-installed Python env and app
Loads only what’s needed to run the FastAPI server
Sets a health check (only in dev mode) and exposes port 8000
# Production stage - minimal runtime image
FROM debian:bookworm-slim

# Copy Python installation from builder
COPY --from=builder /python /python

# Set working directory
WORKDIR /app

# Copy application and virtual environment from builder
COPY --from=builder /app /app

# Copy NLTK data from builder
COPY --from=builder /opt/nltk_data /opt/nltk_data

# Copy guardrails config
COPY --from=builder /root/.guardrailsrc /root/.guardrailsrc

# Set PATH to include Python and virtual environment
ENV PATH="/python/bin:/app/.venv/bin:$PATH"
ENV NLTK_DATA=/opt/nltk_data

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Expose port
EXPOSE 8000

# FastAPI command
CMD ["uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "8000", "--log-level", "info"]
As you have seen, there are two different Dockerfiles, one for development and one for production. They are mostly the same, but the development version includes an additional health check on localhost to ensure the FastAPI app is live during development.
Docker Compose
The docker-compose.yml dynamically links to these Dockerfiles based on the APP_ENV environment variable, allowing you to easily toggle between environments.
Docker Compose defines three services:
1. PostgreSQL Database
# docker-compose.yml
services:
  db:
    image: postgres:latest
    restart: unless-stopped
    container_name: ${APP_ENV}-db-container
    env_file:
      - ../.env.${APP_ENV}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $POSTGRES_DB -U ${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
2. Adminer UI: Visual interface to inspect your PostgreSQL DB
  adminer:
    image: adminer
    restart: always
    container_name: ${APP_ENV}-adminer-container
    environment:
      - APP_ENV=${APP_ENV}
    env_file:
      - ../.env.${APP_ENV}
    ports:
      - "${ADMINER_PORT}:8080"
    depends_on:
      db:
        condition: service_healthy
3. FastAPI Application: Dynamically uses either dev.Dockerfile or prod.Dockerfile
  app:
    build:
      context: ..
      dockerfile: docker/${APP_ENV}.Dockerfile
      secrets:
        - GUARDRAILS_API_KEY
    image: myapp-${APP_ENV}-image
    container_name: ${APP_ENV}-app-container
    ports:
      - "8000:8000"
    environment:
      - APP_ENV=${APP_ENV}
    env_file:
      - ../.env.${APP_ENV}
    depends_on:
      db:
        condition: service_healthy
    restart: unless-stopped

secrets:
  GUARDRAILS_API_KEY:
    environment: GUARDRAILS_API_KEY

volumes:
  postgres_data:
    name: db_data_${APP_ENV}
To build and run the container, you can use the following commands:
make docker-build APP_ENV=dev
make docker-up APP_ENV=dev
The Data Pipeline
In this project, two main pipelines handle the ingestion of data into different storage systems: PostgreSQL and the Qdrant vector database. Make sure your Docker containers are running before ingesting data into the databases. The pipeline files live under src/data_pipeline and import functions from the database (PostgreSQL) and vectorstore (Qdrant) folders.
1. PostgreSQL/AWS RDS Pipeline
The PostgreSQL pipeline is responsible for storing structured metadata related to GitHub issues and comments. This data is stored in two main tables, issues and comments, as defined in the models. These models are mapped using SQLAlchemy.
The data from this database is later sent to a Qdrant collection for semantic search.
├── src
│ ├── database
│ │ ├── __init__.py
│ │ ├── drop_tables.py
│ │ ├── init_db.py
│ │ └── session.py
Initialize the Database:
The init_db.py script, under the database folder, is used to initialize the PostgreSQL database, creating the necessary tables (issues and comments). This step ensures the schema is set up correctly for data ingestion.
make init-db APP_ENV=dev
You can inspect the tables by logging in to Adminer at http://localhost:8080/
Ingest Raw Data:
Once the tables have been created, you can ingest GitHub issues and comments by running the ingest_raw_data.py script from the data pipeline folder. Each issue is linked to its associated comments through foreign keys, creating a relational structure.
make ingest-github-issues APP_ENV=dev
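The actual models live in src/models; as a simplified sketch, the two tables and their foreign-key link might be declared with SQLAlchemy roughly like this (column names are illustrative, not the exact schema):
from sqlalchemy import ForeignKey, Integer, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship

class Base(DeclarativeBase):
    pass

class Issue(Base):
    __tablename__ = "issues"

    id: Mapped[int] = mapped_column(primary_key=True)
    issue_number: Mapped[int] = mapped_column(Integer, index=True)
    repo: Mapped[str] = mapped_column(String(200))
    title: Mapped[str] = mapped_column(Text)
    body: Mapped[str | None] = mapped_column(Text, nullable=True)
    state: Mapped[str] = mapped_column(String(20))

    # One issue has many comments
    comments: Mapped[list["Comment"]] = relationship(back_populates="issue")

class Comment(Base):
    __tablename__ = "comments"

    id: Mapped[int] = mapped_column(primary_key=True)
    issue_id: Mapped[int] = mapped_column(ForeignKey("issues.id"))
    body: Mapped[str] = mapped_column(Text)

    issue: Mapped["Issue"] = relationship(back_populates="comments")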
Alembic Migrations:
Alembic is a lightweight database migration tool designed to work with SQLAlchemy. It is used to manage changes to the database schema over time. Note that Alembic only tracks and applies changes to the structure (e.g., tables, columns, indexes, constraints), not the data itself. This repository already includes a working Alembic setup used during development, but here are the essential commands you might need if you want to explore this tool in more detail (official documentation).
alembic init alembic # Initialize Alembic in your project
alembic revision -m "message" # Create a new migration file
alembic upgrade head # Apply latest migrations to the database
alembic downgrade -1 # Revert the last migration
alembic current # Show the current migration version
Running alembic init alembic will generate:
An alembic.ini file in your root directory
A migrations/ folder for migration scripts and configuration
To connect Alembic to your actual database and models, you will need to:
Set the database URL in alembic.ini:
sqlalchemy.url = postgresql+psycopg2://myuser:mypassword@localhost:5432/github_issues
Set the target_metadata in env.py to include the database models:
from src.models.db_models import Base
target_metadata = Base.metadata
2. Qdrant Pipeline
The Qdrant pipeline handles the ingestion of the issues and comments from the PostgreSQL tables into a vector database collection. It has been configured with async operations for better performance (a sync version can be found in the same vectorstore folder), scalar quantization for faster retrieval, and hybrid search using Qdrant's FastEmbed package and the miniCOIL sparse model.
├── src
│ └── vectorstore
│ ├── __init__.py
│ ├── create_collection.py
│ ├── create_index.py
│ ├── delete_collection.py
│ ├── payload_builder.py
│ ├── qdrant_store.py
│ └── qdrant_store_sync.py
# qdrant_store.py
class AsyncQdrantVectorStore:
    def __init__(self) -> None:
        self.client = AsyncQdrantClient(url=settings.QDRANT_URL, api_key=settings.QDRANT_API_KEY)
        self.collection_name = f"{settings.APP_ENV}_{settings.COLLECTION_NAME}"
        self.embedding_size = settings.LEN_EMBEDDINGS
        self.dense_model = TextEmbedding(model_name=settings.DENSE_MODEL_NAME)
        self.sparse_model = SparseTextEmbedding(model_name=settings.SPARSE_MODEL_NAME)
        self.quantization_config = models.ScalarQuantization(
            scalar=models.ScalarQuantizationConfig(
                type=models.ScalarType.INT8,
                quantile=0.99,
                always_ram=True,
            )
        )
        self.sparse_vectors_config = {"miniCOIL": models.SparseVectorParams(modifier=models.Modifier.IDF)}
Create Collection:
First, you need to create a collection. The create_collection.py script initializes a new Qdrant collection with the appropriate configuration, including vector size, quantization settings, and hybrid search parameters.
make create-collection APP_ENV=dev
Create Index:
Additionally, an index is created on the issue_number field to prevent duplicate embeddings.
make create-indexes APP_ENV=dev
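Under the hood, these two scripts boil down to a couple of async Qdrant client calls. A sketch under the configuration shown above (the "dense" vector name and the integer field schema are assumptions; check the repository for the exact parameters):
from qdrant_client import models

from src.vectorstore.qdrant_store import AsyncQdrantVectorStore  # path from the repo tree above

async def create_collection_and_index(store: AsyncQdrantVectorStore) -> None:
    # Dense + miniCOIL sparse vectors, with scalar quantization for faster retrieval
    await store.client.create_collection(
        collection_name=store.collection_name,
        vectors_config={
            "dense": models.VectorParams(  # vector name is an assumption
                size=store.embedding_size,
                distance=models.Distance.COSINE,
            )
        },
        sparse_vectors_config=store.sparse_vectors_config,
        quantization_config=store.quantization_config,
    )
    # Payload index on issue_number, used to detect already-ingested issues
    await store.client.create_payload_index(
        collection_name=store.collection_name,
        field_name="issue_number",
        field_schema=models.PayloadSchemaType.INTEGER,  # field type assumed
    )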
Ingest Embeddings:
Once the collection is created and indexed, embeddings can be ingested from the structured data stored in PostgreSQL using the ingest_embeddings.py script. The ingestion process follows these steps:
Chunk & Embed Text:
Each GitHub comment is split into smaller chunks using a configurable chunk_size.
Dense embeddings are generated using the BAAI/bge-large-en-v1.5 model.
Sparse embeddings are generated via miniCOIL to support hybrid search.
Each chunk is enriched with metadata such as issue_number, comment_id, repository, and user.
Batch & Upsert to Qdrant:
Chunks are processed and upserted in batches to improve efficiency.
Duplicate detection ensures previously indexed comments are skipped.
The process is fully asynchronous and parallelized for scalability.
make ingest-embeddings APP_ENV=dev
Running this command ingests all structured GitHub data into Qdrant, enabling downstream applications like semantic search, question answering, or RAG pipelines.
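Stripped of batching, concurrency, and duplicate detection, the core of that ingestion step might look like the following sketch (the upsert_comment_chunks helper and the "dense" vector name are illustrative, not the repository's exact code):
import uuid

from qdrant_client import models

from src.vectorstore.qdrant_store import AsyncQdrantVectorStore  # path from the repo tree above

async def upsert_comment_chunks(store: AsyncQdrantVectorStore, chunks: list[dict]) -> None:
    """Embed a batch of comment chunks with both models and upsert them into Qdrant."""
    texts = [chunk["chunk_text"] for chunk in chunks]
    dense_vectors = list(store.dense_model.embed(texts))    # BAAI/bge-large-en-v1.5
    sparse_vectors = list(store.sparse_model.embed(texts))  # miniCOIL sparse embeddings
    points = [
        models.PointStruct(
            id=str(uuid.uuid4()),
            vector={
                "dense": dense.tolist(),  # dense vector name is an assumption
                "miniCOIL": models.SparseVector(
                    indices=sparse.indices.tolist(),
                    values=sparse.values.tolist(),
                ),
            },
            payload=chunk,  # issue_number, comment_id, repository, user, chunk_text, ...
        )
        for chunk, dense, sparse in zip(chunks, dense_vectors, sparse_vectors)
    ]
    await store.client.upsert(collection_name=store.collection_name, points=points)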
The Agents: Division of Cognitive Labor
LangGraph is used to structure the decision-making workflow as a graph of composable, asynchronous agents, each responsible for a distinct cognitive task. This modular setup enables control, traceability, and parallelism, making it ideal for complex, multi-stage pipelines like issue triage, classification, and summarization.
Each function in agents.py represents a node in the LangGraph. These agents operate over a shared state object (IssueState), which evolves as the issue flows through the pipeline.
├── src
│ ├── agents
│ │ ├── __init__.py
│ │ ├── agents.py
│ │ ├── graph.py
│ │ └── graph_service.py
# agentic_models.py
class IssueState(BaseModel):
    title: str | None = None
    body: str | None = None
    similar_issues: list[dict[str, Any]] | None = None
    classification: ClassificationState | None = None
    recommendation: Recommendation | None = None
    errors: list[str] | None = None
    blocked: bool | None = None
    validation_summary: dict[str, Any] | None = None
Agent Roles and Workflow
The system is organized into the following steps:
1. Input Guardrail Agent
The Input Guardrail Agent pre-validates the incoming issue text for safety and compliance. It uses the guardrail_validator module to perform three main checks: jailbreak detection, toxicity filtering, and secrets scanning. If any of these validations fail, the issue is marked as blocked, and the reason is stored in the state for logging and debugging. If all checks pass, the pipeline continues to the next step.
async def input_guardrail_agent(state: IssueState) -> IssueState:
    try:
        input_text = f"{getattr(state, 'title', '')}\n{getattr(state, 'body', '')}"
        # from loguru import logger
        # logger.info(f"Input text to guardrails:\n{input_text}")

        ## Jailbreak Guard
        jailbreak_result = await guardrail_validator.check_jailbreak(input_text)
        if not jailbreak_result.validation_passed:
            summary = jailbreak_result.validation_summaries[0]
            score_match = re.search(r"Score: ([\d.]+)", summary.failure_reason) if summary.failure_reason else None
            score = float(score_match.group(1)) if score_match else None
            state.blocked = True
            state.validation_summary = {
                "type": "DetectJailbreak",
                "failure_reason": summary.validator_name,
                "score": score,
            }
            return state

        ## Toxic Guard
        toxic_result = await guardrail_validator.check_toxicity(input_text)
        if not toxic_result.validation_passed:
            summary = toxic_result.validation_summaries[0]
            state.blocked = True
            state.validation_summary = {
                "type": "ToxicLanguage_Input",
                "failure_reason": summary.failure_reason,
                "error_spans": [
                    {"start": span.start, "end": span.end, "reason": span.reason} for span in summary.error_spans or []
                ],
            }
            return state

        # Secret Guard
        secret_result = await guardrail_validator.check_secrets(input_text)
        if not secret_result.validation_passed:
            summary = secret_result.validation_summaries[0]
            state.blocked = True
            state.validation_summary = {
                "type": "SecretsPresent_Input",
                "failure_reason": summary.failure_reason,
            }
            return state

        state.blocked = False
        return state
    except Exception as e:
        return ErrorHandler.log_error(state, e, context="Guardrail error")
2. Issue Search Agent
The Issue Search Agent performs a semantic search over previously logged GitHub issues using vector embeddings. It generates a hybrid dense/sparse embedding of the issue title and body, then queries the Qdrant vector database to retrieve the most similar past issues. The results are parsed and added to the state as a list of similar issue metadata.
async def issue_search_agent(state: IssueState) -> IssueState:
    try:
        query_text = f"{getattr(state, 'title', '')} {getattr(state, 'body', '')}"
        results = await services.qdrant_store.search_similar_issues(query_text)
        similar_issues = [
            {
                "issue_number": hit.payload.get("issue_number"),
                "repo": hit.payload.get("repo"),
                "owner": hit.payload.get("owner"),
                "title": hit.payload.get("title"),
                "url": hit.payload.get("url"),
                "comment_id": hit.payload.get("comment_id"),
                "chunk_text": hit.payload.get("chunk_text"),
                "score": hit.score,
                "is_bug": hit.payload.get("is_bug"),
                "is_feature": hit.payload.get("is_feature"),
            }
            for hit in results
            if hit.payload is not None
        ]
        state.similar_issues = similar_issues
        return state
    except Exception as e:
        return ErrorHandler.log_error(state, e, context="Issue search error")
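The search_similar_issues call is implemented in the vector store. Conceptually, it embeds the query with both models and lets Qdrant fuse the dense and sparse results; a sketch using the query API with RRF fusion (prefetch limits and vector names are assumptions, not the repository's exact implementation):
from qdrant_client import models

from src.vectorstore.qdrant_store import AsyncQdrantVectorStore  # path from the repo tree above

async def search_similar_issues(store: AsyncQdrantVectorStore, query_text: str, limit: int = 5):
    """Hybrid search: dense and sparse prefetch, fused with Reciprocal Rank Fusion."""
    dense_query = next(iter(store.dense_model.embed([query_text]))).tolist()
    sparse_query = next(iter(store.sparse_model.embed([query_text])))
    response = await store.client.query_points(
        collection_name=store.collection_name,
        prefetch=[
            models.Prefetch(query=dense_query, using="dense", limit=20),
            models.Prefetch(
                query=models.SparseVector(
                    indices=sparse_query.indices.tolist(),
                    values=sparse_query.values.tolist(),
                ),
                using="miniCOIL",
                limit=20,
            ),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=limit,
        with_payload=True,
    )
    return response.points  # each point exposes .payload and .score, as used above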
3. Classification Agent
The Classification Agent is responsible for understanding the intent and priority of the issue. It builds a structured prompt that includes the issue’s title, body, and similar issues found in the previous step. This prompt is then sent to an LLM that supports tool-calling. The response is parsed into a ClassificationState object containing key fields like category (e.g., bug, feature), priority (e.g., high, medium), relevant labels, and an appropriate assignee (e.g., core-devs, ml-team). This helps route the issue correctly within a development workflow.
async def classification_agent(state: IssueState) -> IssueState:
    try:
        prompt = PromptTemplates.classification_prompt().format(
            title=state.title, body=state.body, similar_issues=state.similar_issues
        )
        response: AIMessage = await services.llm_with_tools.ainvoke(prompt)  # type: ignore
        parsed = response.tool_calls[0]["args"]  # Parsed dict output
        state.classification = ClassificationState(**dict(parsed))
        return state
    except Exception as e:
        return ErrorHandler.log_error(state, e, context="Classification error")
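The parsed tool-call arguments are validated into ClassificationState, which lives in src/models; a sketch of what that schema might look like (field names and allowed values are inferred from the description above, not copied from the repository):
from typing import Literal

from pydantic import BaseModel, Field

class ClassificationState(BaseModel):
    category: Literal["bug", "feature", "question", "documentation"] = Field(
        description="High-level type of the issue"
    )
    priority: Literal["low", "medium", "high"] = Field(description="Urgency of the issue")
    labels: list[str] = Field(default_factory=list, description="Suggested GitHub labels")
    assignee: str = Field(description="Suggested team, e.g. core-devs or ml-team")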
4. Recommendation Agent
The Recommendation Agent generates a human-readable summary and reference recommendations to assist developers. It extracts the top URLs from similar issues and creates a summarization prompt that includes the current issue details along with those references. This prompt is submitted to an LLM to produce a high-level recommendation, which includes a concise summary and a list of the most relevant prior issues. The generated recommendation is stored in the state for review and output.
async def recommendation_agent(state: IssueState) -> IssueState:
    try:
        # Get top similar issue URLs
        top_references = []
        seen_urls = set()
        similar_issues = getattr(state, "similar_issues", []) or []
        for issue in similar_issues:
            url = issue["url"]
            if url not in seen_urls:
                top_references.append(url)
                seen_urls.add(url)
            if len(top_references) == 4:
                break
        prompt = PromptTemplates.summary_prompt(state.dict(), top_references)
        response = await services.llm.ainvoke(prompt)
        if isinstance(response.content, str):
            summary = response.content.strip()
        elif isinstance(response.content, list):
            # Join list items as string, or handle as needed
            summary = " ".join(str(item) for item in response.content).strip()
        else:
            summary = str(response.content).strip()
        state.recommendation = Recommendation(summary=summary, references=top_references)
        # logger.info(f"Recommendation summary: {summary}")
        return state
    except Exception as e:
        return ErrorHandler.log_error(state, e, context="Recommendation error")
5. Output Guardrail Agent
The Output Guardrail Agent ensures that the final recommendation output generated by the LLM is safe and compliant. It runs the same toxic-language and secrets checks used in the input guardrail step. If any unsafe content is detected in the summary, the output is marked as blocked, and the associated failure reason is stored.
async def output_guardrail_agent(state: IssueState) -> IssueState:
    try:
        output_text = getattr(getattr(state, "recommendation", None), "summary", "")
        if not output_text:
            # No text to validate, consider not blocked or handle accordingly
            state.blocked = False
            return state

        # Toxic Guard
        toxic_result = await guardrail_validator.check_toxicity(output_text)
        if not toxic_result.validation_passed:
            summary = toxic_result.validation_summaries[0]
            state.blocked = True
            state.validation_summary = {
                "type": "ToxicLanguage_Output",
                "failure_reason": summary.failure_reason,
                "error_spans": [
                    {"start": span.start, "end": span.end, "reason": span.reason} for span in summary.error_spans or []
                ],
            }
            return state

        # Secret Guard
        secret_result = await guardrail_validator.check_secrets(output_text)
        if not secret_result.validation_passed:
            summary = secret_result.validation_summaries[0]
            state.blocked = True
            state.validation_summary = {
                "type": "SecretsPresent_Output",
                "failure_reason": summary.failure_reason,
            }
            return state

        state.blocked = False
        return state
    except Exception as e:
        return ErrorHandler.log_error(state, e, context="Output Guardrail error")
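graph.py wires these nodes into the workflow that gets compiled below. A condensed sketch of how that assembly might look with LangGraph's StateGraph (the routing after the input guardrail is simplified, and the import paths are assumptions based on the repo tree; see the repository for the exact conditional edges):
from langgraph.graph import END, START, StateGraph

from src.agents.agents import (  # module path assumed from the repo tree
    classification_agent,
    input_guardrail_agent,
    issue_search_agent,
    output_guardrail_agent,
    recommendation_agent,
)
from src.models.agentic_models import IssueState  # assumed location of the state model

def build_issue_workflow() -> StateGraph:
    workflow = StateGraph(IssueState)

    workflow.add_node("input_guardrail", input_guardrail_agent)
    workflow.add_node("issue_search", issue_search_agent)
    workflow.add_node("classification", classification_agent)
    workflow.add_node("recommendation", recommendation_agent)
    workflow.add_node("output_guardrail", output_guardrail_agent)

    workflow.add_edge(START, "input_guardrail")
    # Stop early if the input guardrail blocked the issue
    workflow.add_conditional_edges(
        "input_guardrail",
        lambda state: "blocked" if state.blocked else "ok",
        {"blocked": END, "ok": "issue_search"},
    )
    workflow.add_edge("issue_search", "classification")
    workflow.add_edge("classification", "recommendation")
    workflow.add_edge("recommendation", "output_guardrail")
    workflow.add_edge("output_guardrail", END)
    return workflow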
Querying the Graph
To test the graph, you can run the graph.py file with a test query (feel free to change the test query in the file), where you can add a title and a body for a test issue. This will trigger the agents and provide the final recommendation if the input is safe, or stop if any of the guardrails are triggered.
make query-graph APP_ENV=dev
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run research graph with custom parameters")
    parser.add_argument("--title", type=str, default="Huberregressor.", help="Issue Title")
    parser.add_argument(
        "--body",
        type=str,
        default="""
        def hello():
            user_id = "1234"
            user_pwd = "password1234"
            user_api_key = "sk-xhdfgtest"
        """,
        help="Issue Body",
    )
    args = parser.parse_args()

    async def main() -> dict:
        graph = build_issue_workflow().compile()
        result = await graph.ainvoke({"title": args.title, "body": args.body})
Example output with secrets:
Title: Hello world.
2025-07-05 16:13:38.351 | INFO | __main__:main:75 -
Body:
def hello():
user_id = "1234"
user_pwd = "password1234"
user_api_key = "sk-xhdfgtest"
2025-07-05 16:13:38.351 | INFO | __main__:main:78 -
Validation Type: SecretsPresent_Input
2025-07-05 16:13:38.351 | INFO | __main__:main:79 -
Failure Reason: The following secrets were detected in your response:
password1234
sk-xhdfgtest
Querying the graph completed.
Example output from safe input:
### Recommendation Summary for GitHub Issue
**Issue Classification:**
- **Category:** Bug
- **Priority:** High
- **Labels:** bug, needs triage, API, performance
- **Assignee:** @ml-team
**Context:**
This issue has been identified as a high-priority bug related to the API performance of the project. Given its classification, it is essential to address it promptly to ensure optimal functionality and user experience.
**Similar Past Issue:**
- [Issue #31659](https://github.com/scikit-learn/scikit-learn/issues/31659) provides a relevant reference point. This issue may share similar characteristics or underlying causes, which could offer insights into potential solutions or workarounds.
**Recommendations:**
1. **Review Similar Issues:** Analyze the details and resolutions of the referenced issue to identify any applicable solutions or patterns that could inform the current bug fix.
2. **Triage and Prioritize:** Given the high priority, ensure that this issue is triaged quickly. Engage with the team to assess the impact and urgency, and allocate resources accordingly.
3. **Performance Analysis:** Conduct a thorough performance analysis of the API to pinpoint the root cause of the bug. Utilize profiling tools to gather data on performance bottlenecks.
4. **Testing and Validation:** Implement comprehensive testing to validate the fix. Consider both unit tests and integration tests to ensure that the solution does not introduce new issues.
5. **Documentation:** Update any relevant documentation to reflect changes made in the API or performance improvements. This will help users understand the updates and any necessary adjustments on their end.
6. **Community Engagement:** If applicable, communicate with the community regarding the issue's status and any expected timelines for resolution. Transparency can help manage user expectations.
By following these recommendations, the team can effectively address the bug, improve API performance, and enhance overall user satisfaction.
Querying the graph completed.
This structured, agentic approach allows for intelligent, explainable, and safe decision-making, making the system maintainable and developer-friendly.
API Layer: FastAPI as the Interface
The final component of the system is a lightweight FastAPI server that exposes the LangGraph-powered agent workflow over HTTP.
src/
└── api/
├── __init__.py
└── main.py
This layer serves as the external entry point, wrapping the cognitive pipeline behind a clean RESTful interface:
Initializes the FastAPI app with a lifespan context that pre-compiles the LangGraph workflow at startup for faster runtime performance.
Exposes core endpoints:
/process-issue: full issue analysis
/validate: lightweight input validation
/health and /ready: health and readiness probes
/stats: service diagnostics
Includes middleware for CORS handling, request/response logging, and centralized error reporting.
Uses dependency injection to safely share the compiled graph across routes and block requests during startup if uninitialized.
Once the Docker container is running, you can access and test these endpoints with different queries via FastAPI’s interactive Swagger UI at http://127.0.0.1:8000/docs.
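As a rough sketch, main.py might expose the compiled graph along these lines (middleware, the /validate, /ready, and /stats endpoints, and error handling are omitted; the import path for build_issue_workflow is an assumption):
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from src.agents.graph import build_issue_workflow  # import path is an assumption

class IssueRequest(BaseModel):
    title: str
    body: str

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Compile the LangGraph workflow once at startup
    app.state.graph = build_issue_workflow().compile()
    yield

app = FastAPI(lifespan=lifespan)

@app.get("/health")
async def health() -> dict:
    return {"status": "ok"}

@app.post("/process-issue")
async def process_issue(issue: IssueRequest) -> dict:
    graph = getattr(app.state, "graph", None)
    if graph is None:
        raise HTTPException(status_code=503, detail="Workflow not initialized yet")
    return await graph.ainvoke({"title": issue.title, "body": issue.body})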
Production Infrastructure: AWS CDK and Kubernetes
To scale the application in a cloud-native environment, you will deploy the infrastructure into AWS and provision a Relational Database (AWS RDS), a Kubernetes Cluster (AWS EKS) for orchestration, and a Virtual Private Cloud (AWS VPC) that allows our services to interact with each other.
Kubernetes is a powerful, open-source platform designed to automate the deployment, scaling, and management of containerized applications. Its architecture is built around the following key components:
API Server: The central management entity of the cluster.
Controller Manager: Manages controllers that handle routine tasks such as replication and scaling, and makes sure the deployment is running as per specifications.
Worker Nodes: These are instances (either virtual machines or physical servers) that run the applications in the form of containers.
Scheduler: Assigns workloads to nodes based on resource availability and other constraints.
Pods: The smallest deployable units in Kubernetes, representing a single instance of a running process in a cluster. Pods can contain one or more containers that share storage, network, and a specification for how to run the containers.
The nodes have three main components:
Kubelet: An agent that ensures containers are running in a Pod.
Kube-proxy: Handles network traffic within the Kubernetes cluster.
Container Runtime: Software that runs the containers (Docker for our app).
This architecture provides robust scalability, ensuring that our app runs as per our specifications at all times. Some of the key factors that make Kubernetes a very popular choice include:
Self-healing: If a container does not work as expected, such as when a pod stops working, a new pod is automatically generated.
Rolling Updates: Automated rollouts and rollbacks allow for updating applications without downtime.
Load Balancing: Distributes load across the pods, optimizing resource utilization.
If you are not familiar with Kubernetes, you can explore the documentation to better understand how it works in detail.
AWS CDK Infrastructure
Kubernetes can be used standalone, but to automate and manage the cloud infrastructure for the application, this project uses the AWS Cloud Development Kit (CDK), a powerful Infrastructure-as-Code (IaC) framework that allows us to define AWS resources using familiar programming languages like Python.
The CDK project is organized into three core stacks:
aws_cdk_infra/
├── app.py
├── aws_eks_rds/
│ ├── vpc_stack.py
│ ├── eks_stack.py
│ └── rds_stack.py
├── requirements.txt
└── .env
Make sure to create the .env file with your credentials. The public IP (IPv4) will be used to simplify access from your computer to the RDS instance once you deploy the infrastructure and ingest data into the production tables. You can find your IP here.
POSTGRES_USER=postgressecret
POSTGRES_DB=github_issues
IAM_USER=
AWS_ACCOUNT_ID=
MY_PUBLIC_IP=
EKS Stack
The EKSStack provisions the managed Kubernetes cluster on AWS:
EKS Cluster Creation:
Deploys an EKS cluster with no default EC2 worker nodes, as AWS Fargate will be used to run pods serverlessly.
Note: AWS Fargate is a serverless compute engine that runs containers without requiring you to manage the underlying EC2 instances. This means you do not need to provision, scale, or maintain servers, as AWS handles all infrastructure management for you. In contrast, with EC2, you must manage and scale the worker nodes yourself, which gives more control but requires more operational effort.
IAM Configuration:
Maps the existing IAM user to the Kubernetes system:masters group to allow admin access. This is useful for setting up initial access, but you can use IAM roles with AWS SSO or role-based access to allow more controlled access.
Creates a Pod Execution Role with policies for:
Running Fargate pods.
Accessing secrets, container registries, CloudWatch logging, and EC2 networking.
Fargate Profiles:
Defines a Fargate profile for core system pods (kube-system namespace).
Defines a Fargate profile for our FastAPI application pods, tagged with the label app=fastapi in the my-app namespace.
class EKSStack(Stack):
    def __init__(self, scope: Construct, id: str, vpc: ec2.Vpc, **kwargs: object) -> None:
        super().__init__(scope, id, **kwargs)

        # Define the EKS Cluster (Fargate)
        eks_cluster = eks.Cluster(
            self,
            "EKSCluster",
            version=eks.KubernetesVersion.V1_32,
            vpc=vpc,
            default_capacity=0,  # No default EC2 worker nodes
            kubectl_layer=aws_cdk.lambda_layer_kubectl_v33.KubectlV33Layer(self, "KubectlLayer"),
        )

        # Define the IAM user
        admin_user = iam.User.from_user_arn(
            self, "AdminUser", f"arn:aws:iam::{os.getenv('AWS_ACCOUNT_ID')}:user/{os.getenv('IAM_USER')}"
        )

        # Add the IAM user to the 'system:masters' Kubernetes group for admin access
        eks_cluster.aws_auth.add_user_mapping(
            admin_user,
            groups=["system:masters"],
        )

        # Create the Pod Execution Role for Fargate Profile
        pod_execution_role = iam.Role(
            self,
            "PodExecutionRole",
            assumed_by=iam.ServicePrincipal("eks-fargate-pods.amazonaws.com"),
            managed_policies=[
                # Required for Fargate pod execution
                iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEKSFargatePodExecutionRolePolicy"),
                # Allow full EC2 access
                iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEC2FullAccess"),
                # Allow CloudWatch logging (if you're logging)
                iam.ManagedPolicy.from_aws_managed_policy_name("CloudWatchLogsFullAccess"),
                # Access to ECR (if pulling container images from ECR)
                iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEC2ContainerRegistryReadOnly"),
                # Read/write access to Secrets Manager
                iam.ManagedPolicy.from_aws_managed_policy_name("SecretsManagerReadWrite"),
            ],
        )

        # Define Fargate Profile for system pods (CoreDNS, etc.)
        eks_cluster.add_fargate_profile(
            "SystemFargateProfile",
            selectors=[{"namespace": "kube-system"}],
            pod_execution_role=pod_execution_role,
        )

        # Define Fargate Profile for FastAPI app
        eks_cluster.add_fargate_profile(
            "AppFargateProfile",
            selectors=[{"namespace": "my-app", "labels": {"app": "fastapi"}}],
            pod_execution_role=pod_execution_role,
        )
RDS Stack
The RDSStack sets up a managed PostgreSQL database for storing GitHub issues and related data:
Creates a PostgreSQL instance inside the VPC (the commented-out lines in the code below show how to place it in private subnets instead of public ones).
Uses AWS Secrets Manager to generate and manage database credentials automatically.
Restricts access with security groups to allow traffic only from within the VPC and from your public IP.
Configures the instance size to BURSTABLE3.MICRO for cost efficiency.
Enables backups with a 7-day retention policy.
Keeps exposure minimal: the instance only accepts connections from the VPC and your own IP.
Note that you can make access to the RDS completely private, but this would require ingesting the data from an EC2 instance inside the same VPC, which would generate additional costs. Restricting public access to your own IP allows for easier interaction.
class RDSStack(Stack):
    def __init__(self, scope: Construct, id: str, vpc: ec2.Vpc, **kwargs: Any):
        super().__init__(scope, id, **kwargs)

        credentials = rds.Credentials.from_generated_secret(os.getenv("POSTGRES_USER"))

        security_group = ec2.SecurityGroup(
            self,
            "RDSSecurityGroup",
            vpc=vpc,
            description="Allow internal access to PostgreSQL",
            allow_all_outbound=True,
        )
        security_group.add_ingress_rule(
            peer=ec2.Peer.ipv4(vpc.vpc_cidr_block),
            connection=ec2.Port.tcp(5432),
        )

        # Allow access only from your specific IP (use your real public IP here)
        my_public_ip = os.getenv("MY_PUBLIC_IP")
        if my_public_ip:
            security_group.add_ingress_rule(
                peer=ec2.Peer.ipv4(f"{my_public_ip}/32"),
                connection=ec2.Port.tcp(5432),
                description="Allow external access to PostgreSQL from my IP",
            )
        else:
            raise ValueError("Environment variable MY_PUBLIC_IP is not set.")

        rds.DatabaseInstance(
            self,
            "PostgresDB",
            engine=rds.DatabaseInstanceEngine.postgres(version=rds.PostgresEngineVersion.VER_16_3),
            vpc=vpc,
            credentials=credentials,
            # vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),
            security_groups=[security_group],
            database_name=os.getenv("POSTGRES_DB"),
            instance_type=ec2.InstanceType.of(ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.MICRO),
            allocated_storage=20,
            max_allocated_storage=100,
            # publicly_accessible=False,
            publicly_accessible=True,
            multi_az=False,
            backup_retention=Duration.days(7),
            removal_policy=RemovalPolicy.DESTROY,
        )
This managed RDS instance ensures persistent, reliable storage while abstracting away the underlying infrastructure management.
VPC Stack
The VPCStack is responsible for creating a custom Virtual Private Cloud (VPC) with public and private subnets, which serve as the foundational network layer for our infrastructure:
Defines a VPC spanning 2 availability zones for high availability.
Creates two types of subnets:
Public subnets for internet-facing resources.
Private subnets with egress for internal resources like databases and compute instances.
Uses a smaller CIDR block (10.0.0.0/20) to efficiently allocate IPs without wasting resources.
class VPCStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs: object) -> None:
        super().__init__(scope, id, **kwargs)

        # Define a VPC with a smaller CIDR block, optimized for smaller-scale use cases
        self.vpc = ec2.Vpc(
            self,
            "SharedVPC",
            max_azs=2,  # 2 Availability Zones for fault tolerance; change to 3 for more resiliency
            ip_addresses=ec2.IpAddresses.cidr("10.0.0.0/20"),  # A smaller CIDR block (4,096 IP addresses)
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name="Public",
                    subnet_type=ec2.SubnetType.PUBLIC,
                    cidr_mask=24,  # This subnet will have 256 IP addresses
                ),
                ec2.SubnetConfiguration(
                    name="Private",
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
                    cidr_mask=24,  # This subnet will also have 256 IP addresses
                ),
            ],
        )
This modular network setup enables secure communication between services and isolates our database from public access while allowing pods in Kubernetes to communicate freely.
Deployment
To deploy the infrastructure, you need to move to the aws_cdk_infra folder and create an isolated virtual environment. Once done, follow these steps:
Install dependencies:
pip install -r requirements.txt
Bootstrap your AWS environment (if first time):
cdk bootstrap aws://<AWS_ACCOUNT_ID>/<AWS_REGION>
Deploy the stacks:
cdk deploy --all
This will provision your VPC, EKS cluster with Fargate, and RDS PostgreSQL database in AWS, ready to host your containerized FastAPI app (the deployment might take up to 30 minutes to finish). You can find the provisioned stacks under AWS CloudFormation and the necessary secrets under AWS Secrets Manager. You will need to add the password, host, username, and secret name to the .env.prod file, since during development you were running against localhost.
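If you prefer not to copy the credentials by hand, a small boto3 helper can read them from AWS Secrets Manager using the secret created by the RDS stack (the key names below follow the usual format of RDS-generated secrets; adjust if yours differ):
import json

import boto3

def get_rds_credentials(secret_name: str, region: str = "eu-central-1") -> dict:
    """Fetch the RDS-generated secret and return the connection details."""
    client = boto3.client("secretsmanager", region_name=region)
    secret = client.get_secret_value(SecretId=secret_name)
    payload = json.loads(secret["SecretString"])
    # RDS-generated secrets typically contain username, password, host, port and dbname
    return {
        "user": payload["username"],
        "password": payload["password"],
        "host": payload.get("host"),
        "port": payload.get("port", 5432),
        "dbname": payload.get("dbname"),
    }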
Once you adapt your .env.prod file, you can build your Docker image against the deployed AWS CDK stack. You must run the same commands as before, but with APP_ENV=prod. This applies to creating the issues/comments tables, ingesting GitHub data, creating the Qdrant collection/index, and ingesting the embeddings:
# Docker
make docker-build APP_ENV=prod
make docker-up APP_ENV=prod
# PostgreSQL
make init-db APP_ENV=prod
make ingest-github-issues APP_ENV=prod
# Qdrant
make create-collection APP_ENV=prod
make create-indexes APP_ENV=prod
make ingest-embeddings APP_ENV=prod
NOTE: The following setup serves as an initial production environment and has been simplified to allow full IAM access and reduce complexity. As your application scales, you should enhance it with features like autoscaling policies, CI/CD pipelines, monitoring dashboards, and stricter security controls.
Kubernetes Cluster
Once you have deployed your AWS CDK infrastructure, your EKS cluster will be visible in the AWS Console. To interact with the cluster from your local machine, you need to configure your kubectl CLI.
When you install the Kubernetes CLI (kubectl), the configuration file is stored under ~/.kube/config. To add your EKS cluster to this configuration file, run the following command (replace placeholders accordingly):
aws eks --region <aws-region> update-kubeconfig --name <cluster-name>
You can verify the configuration with:
kubectl config view
This should display information about your cluster context, users, and settings.
To verify that the core system components are running, list the pods in the kube-system namespace. Namespaces are a way to organize and isolate groups of resources within a single cluster. The kube-system namespace is the default one containing the core pods that make the cluster work.
kubectl get pods -n kube-system
You should see output similar to:
NAME READY STATUS RESTARTS AGE
coredns-749464bbc5-8kz45 1/1 Running 0 23m
coredns-749464bbc5-mszql 1/1 Running 0 23m
If, for some reason, system pods are not running, you can delete them. Kubernetes will automatically recreate them:
kubectl delete pods --all -n kube-system
K9s UI
If you have installed K9s, you can interact with your cluster via a terminal-based UI. Simply run:
k9s
Useful commands in K9s:
Press “Shift + :” to open the command bar.
Type nodes to view Fargate serverless nodes.
Use Enter to drill into resources.
Press Esc to go back to the previous screen.
While the UI might take a little time to get used to, it greatly simplifies navigation and debugging.
Kubernetes Manifests
Kubernetes uses YAML manifests to declaratively manage infrastructure and deployments. For the FastAPI application, you must define a deployment (the app itself) and a service (a load balancer that exposes it to requests from outside the cluster).
├── kubernetes
│ ├── fastapi-deployment.yaml
│ ├── fastapi-service.yaml
Make sure to run this command before deploying the app since, as mentioned before, the app will run in a separate namespace.
kubectl create namespace my-app
Before deploying the app, you need to build and push your Docker image to an AWS ECR (Elastic Container Registry) repository, since we are now deploying to production and the deployment manifest (see the next section) will point to the image in the ECR repository. Ensure that your image is built for production with APP_ENV=prod, and replace the placeholders with your AWS account details.
aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com
aws ecr create-repository --repository-name fastapi-app --region <aws-region>
docker tag myapp-prod-image:latest <aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/fastapi-app:latest
docker push <aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/fastapi-app:latest
Configure Kubernetes Secrets & ConfigMaps
To provide environment variables and sensitive information to the application, use Kubernetes Secrets and ConfigMaps. ConfigMaps are for non-sensitive variables, while Secrets hold sensitive information like passwords or API keys.
There are different ways to add the environment variables to the cluster. Ideally, this can be done during the AWS CDK deployment. At the end of the eks_stack.py script, you will find the option to add them to the cluster: uncomment the eks_cluster.add_manifest calls and make sure the variables are set up in the .env file. Then, run the following command to update the cluster.
cdk deploy EksStack
Alternatively, you can do it through the terminal by creating a specific env file (or a manifest) and running the command below, pointing to that file.
kubectl create secret generic app-secrets \
--from-env-file=secrets.env \
-n my-app
kubectl create configmap app-config \
--from-env-file=secrets.env \
-n my-app
Also, it is possible to provide the values directly (which, for a proper production setup, is not recommended):
kubectl create configmap app-config \
--from-literal=APP_ENV=prod \
--from-literal=AWS_REGION= \
--from-literal=POSTGRES_DB= \
--from-literal=POSTGRES_PORT= \
--from-literal=ADMINER_PORT= \
--from-literal=ISSUES_TABLE_NAME= \
--from-literal=COMMENTS_TABLE_NAME= \
--from-literal=DENSE_MODEL_NAME= \
--from-literal=SPARSE_MODEL_NAME= \
--from-literal=LEN_EMBEDDINGS= \
--from-literal=COLLECTION_NAME= \
--from-literal=CHUNK_SIZE= \
--from-literal=BATCH_SIZE= \
--from-literal=CONCURRENT_COMMENTS= \
--from-literal=LLM_MODEL_NAME= \
--from-literal=TEMPERATURE= \
--from-literal=REPOS_CONFIG=src/config/repos.yaml \
--from-literal=GUARDRAILS_CONFIG=src/config/guardrails.yaml \
-n my-app
kubectl create secret generic app-secrets \
--from-literal=GH_TOKEN= \
--from-literal=POSTGRES_USER= \
--from-literal=POSTGRES_HOST= \
--from-literal=POSTGRES_PASSWORD= \
--from-literal=QDRANT_API_KEY= \
--from-literal=QDRANT_URL= \
--from-literal=LANGSMITH_API_KEY= \
--from-literal=OPENAI_API_KEY= \
--from-literal=GUARDRAILS_API_KEY= \
--from-literal=SECRET_NAME= \
-n my-app
Deploy the Application
Once the environment variables have been set up, you can deploy the application by applying the deployment manifest. Below is a short snippet of the file. It references all the environment variables and the Docker image that was previously pushed to the AWS ECR repository. Having 3 replicas keeps the application running with some safety margin.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-deployment
  namespace: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fastapi
  template:
    metadata:
      labels:
        app: fastapi
    spec:
      containers:
        - name: fastapi
          image: 730445307583.dkr.ecr.eu-central-1.amazonaws.com/fastapi-app:latest
          ports:
            - containerPort: 8000
          env:
            # Non-sensitive env variables from ConfigMap
            - name: APP_ENV
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: APP_ENV
            - name: AWS_REGION
            # ... the remaining ConfigMap and Secret variables follow the same pattern
kubectl apply -f fastapi-deployment.yaml
After applying the manifest, you should see the application pods, and if you inspect one of them, all health/readiness checks should be passing.
Now you can try sending a request locally from the terminal. If everything has been set up properly, you should get a reply.
curl -X POST "http://localhost:8000/process-issue" \
-H "Content-Type: application/json" \
-d '{
"title": "Test Issue",
"body": "Test Issue"
}'
Load Balancer
If you want to route internet traffic to the application, you can install the AWS Load Balancer Controller, which enables your EKS cluster to provision external endpoints via AWS-managed Elastic Load Balancers (ELB) when using Service or Ingress resources.
Follow the steps outlined in this link. Just make sure you add your VPC ID, which you can find in the AWS Console or by running this command in the terminal.
aws eks describe-cluster --name <your cluster name> --query "cluster.resourcesVpcConfig.vpcId" --output text
If everything is set up properly, the load balancer controller should be visible among the deployments in the kube-system namespace.
kubectl get deployment -n kube-system aws-load-balancer-controller
The final step is to apply a LoadBalancer service to our app. This will provide us with an external IP address that we can use to send requests to the application.
apiVersion: v1
kind: Service
metadata:
  name: fastapi-external
  namespace: my-app
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    my-app.com/description: "FastAPI external service"
spec:
  selector:
    app: fastapi
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
kubectl apply -f fastapi-service.yaml
kubectl get svc -n my-app
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fastapi-external LoadBalancer 172.20.149.13 k8s-myapp-fastapie-fd3be5a8de-70506abda67dae9d.elb.eu-central-1.amazonaws.com 80:31745/TCP 85s
If you now make this request from an external device, you should get an answer.
curl -X POST "http://k8s-myapp-fastapie-fd3be5a8de-70506abda67dae9d.elb.eu-central-1.amazonaws.com/process-issue" \
-H "Content-Type: application/json" \
-d '{
"title": "Test Issue",
"body": "Test Issue"
}'
Clean Up
Do not forget to destroy the AWS CDK stacks when you are done; keeping them running may cost approximately $5-10/day. The following command will delete all stacks (and might take up to 30 minutes). Additionally, you should delete the AWS ECR repository.
cdk destroy --all
Once this is done, make sure there are no running resources by checking the following services in the AWS Console:
AWS CloudFormation
AWS EKS
AWS RDS
AWS Secrets Manager
EC2 Load Balancers
AWS VPC
If any resources remain there, you can always delete them manually in the AWS Console.
Conclusion
Managing issues in large-scale GitHub repositories demands more than just basic automation. It requires intelligent, scalable, and safe decision-making. This project demonstrates how a multi-agent system powered by LangGraph, LLMs, vector databases, and cloud-native tooling can provide a comprehensive solution to that challenge.
By orchestrating specialized agents, each handling a different aspect of issue triage, such as validation, semantic search, classification, and recommendation, the system not only streamlines developer workflows but also ensures higher consistency, contextual relevance, and operational safety. The use of PostgreSQL and Qdrant allows for both structured and semantic insights, while Guardrails AI guarantees the integrity of LLM interactions.
Additionally, the deployment stack built with AWS CDK and Kubernetes provides a robust, production-grade foundation that is fully reproducible and ready to scale. With Docker-based development, configurable pipelines, and modular architecture, this system serves as a practical blueprint for integrating AI-powered agents into real-world developer infrastructure.
See you next week.
👋 We’d love your feedback to help improve Decoding ML.
Share what you want to see next or your top takeaway. I read and reply to every comment!
👏 If you enjoyed reading this content, you can support me by following me on Substack, Medium, or Dev, where I publish my articles!
Whenever you’re ready, there are 3 ways we can help you:
Perks: Exclusive discounts on our recommended learning resources
(books, live courses, self-paced courses and learning platforms).
The LLM Engineer’s Handbook: Our bestseller book on teaching you an end-to-end framework for building production-ready LLM and RAG applications, from data collection to deployment (get up to 20% off using our discount code).
Free open-source courses: Master production AI with our end-to-end open-source courses, which reflect real-world AI projects and cover everything from system architecture to data collection, training and deployment.
Images
If not otherwise stated, all images are created by the author.
Thanks for publishing! I hope people like it and learn from it! 😎