Precision Autoimmune Intelligence Agent -- Deployment Guide¶

Author: Adam Jones Date: March 2026

Table of Contents¶

Deployment Overview
Prerequisites
Local Development Setup
Docker Deployment
DGX Spark Deployment
Environment Variables
Milvus Configuration
Health Checks and Monitoring
Security Configuration
Production Hardening
Scaling and High Availability
Troubleshooting
Quick Reference

1. Deployment Overview¶

The Precision Autoimmune Intelligence Agent is a multi-collection RAG (Retrieval-Augmented Generation) system purpose-built for autoimmune disease analysis. It combines 14 domain-specific Milvus vector collections with Claude LLM synthesis to provide evidence-based clinical decision support covering differential diagnosis, flare prediction, pharmacogenomics, and treatment optimization.

Architecture Summary¶

                         +---------------------+
                         |   Streamlit UI      |
                         |   (port 8531)       |
                         +--------+------------+
                                  |
                         +--------v------------+
                         |   FastAPI Server    |
                         |   (port 8532)       |
                         +--------+------------+
                                  |
                    +-------------+-------------+
                    |             |              |
            +-------v--+  +------v-----+  +----v--------+
            | RAG      |  | Diagnostic |  | Document    |
            | Engine   |  | Engine     |  | Processor   |
            +-------+--+  +------+-----+  +----+--------+
                    |             |              |
                    +------+------+--------------+
                           |
                  +--------v---------+
                  |  Milvus Vector   |
                  |  DB (19530)      |
                  |  14 collections  |
                  |  384-dim BGE     |
                  +------------------+

Service Topology¶

The agent deploys as 3 Docker services plus a shared Milvus dependency:

Service	Container Name	Port	Description
Streamlit UI	`autoimmune-streamlit`	8531	Interactive clinical interface
FastAPI API	`autoimmune-api`	8532	REST API with RAG query, search, ingest, export
Setup	`autoimmune-setup`	--	One-shot: creates collections + seeds knowledge
Milvus	`milvus-standalone`	19530	Shared vector database (external dependency)

Port Map¶

Port	Service	Protocol
8531	Streamlit UI	HTTP
8532	FastAPI API	HTTP
19530	Milvus gRPC	gRPC
9091	Milvus Proxy	HTTP

2. Prerequisites¶

Hardware Requirements¶

Minimum (Development): - 8 CPU cores - 32 GB RAM (Milvus alone needs ~8 GB for 14 collections) - 50 GB disk (model cache + collections + data) - No GPU required (embedding model runs on CPU)

Recommended (Production / DGX Spark): - NVIDIA DGX Spark: GB10 GPU, 128 GB unified LPDDR5x, 20 ARM cores (Grace CPU), NVLink-C2C - 100 GB SSD for Milvus persistence and model cache - GPU accelerates embedding generation for large ingestion workloads

Software Requirements¶

Component	Version
Python	3.10+
Docker	24.0+
Docker Compose	2.20+
Milvus	2.4+ (standalone)

Python Dependencies (from requirements.txt):

pydantic>=2.0
pydantic-settings>=2.7
pymilvus>=2.4.0
sentence-transformers>=2.2.0
anthropic>=0.18.0
streamlit>=1.30.0
fastapi>=0.109.0
uvicorn[standard]>=0.27.0
python-dotenv>=1.0.0
loguru>=0.7.0
numpy>=1.24.0
pandas>=2.0.0
plotly>=5.18.0
reportlab>=4.0.0
PyPDF2>=3.0.0
python-multipart>=0.0.6
prometheus-client>=0.20.0
requests>=2.31.0
httpx>=0.25.0

Network Requirements¶

Outbound HTTPS to api.anthropic.com (Claude API)
Outbound HTTPS to huggingface.co (first-run model download for BAAI/bge-small-en-v1.5)
Internal TCP to Milvus on port 19530
Docker network hcls-network (external, shared across AI Factory services)

API Keys¶

Key	Required	Source
`ANTHROPIC_API_KEY`	Yes (for LLM features)	console.anthropic.com
`AUTO_API_KEY`	No (optional auth)	Self-generated

Without ANTHROPIC_API_KEY, the agent starts in degraded mode: vector search works but LLM synthesis is unavailable.

3. Local Development Setup¶

3.1 Clone and Configure¶

cd /home/adam/projects/hcls-ai-factory/ai_agent_adds/precision_autoimmune_agent

# Create .env from example
cp .env.example .env

# Edit .env and set your Anthropic API key
nano .env

Minimum .env contents:

ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
AUTO_MILVUS_HOST=localhost
AUTO_MILVUS_PORT=19530

3.2 Install Dependencies¶

The run.sh script auto-creates a virtual environment, but you can do it manually:

python3 -m venv venv
./venv/bin/pip install --upgrade pip
./venv/bin/pip install -r requirements.txt

The first run will download the BAAI/bge-small-en-v1.5 embedding model (~134 MB) from Hugging Face. This is cached at ~/.cache/huggingface/ for subsequent runs.

3.3 Start Milvus¶

If Milvus is not already running from the main AI Factory stack:

# Standalone Milvus via Docker
docker run -d \
  --name milvus-standalone \
  -p 19530:19530 \
  -p 9091:9091 \
  -v milvus_data:/var/lib/milvus \
  milvusdb/milvus:v2.4.0 \
  milvus run standalone

# Verify Milvus is ready
curl -s http://localhost:9091/healthz
# Expected: {"status":"ok"}

3.4 Initialize Collections¶

Create all 14 vector collections and seed the knowledge base:

./run.sh --setup

This runs scripts/setup_collections.py --seed, which: 1. Connects to Milvus at AUTO_MILVUS_HOST:AUTO_MILVUS_PORT 2. Creates 13 autoimmune-specific collections (skips genomic_evidence if it does not exist, as it is a shared read-only collection) 3. Seeds HLA associations, autoantibody panels, biologic therapies, disease activity scores, flare patterns, classification criteria, and cross-disease patterns 4. Embeds all seed data using BAAI/bge-small-en-v1.5 (384-dimensional vectors)

You can also run it directly with explicit host/port:

./venv/bin/python scripts/setup_collections.py \
    --host localhost \
    --port 19530 \
    --seed

To recreate collections from scratch (drops existing data):

./venv/bin/python scripts/setup_collections.py \
    --drop-existing \
    --seed

3.5 Load Demo Data¶

Demo patient data is stored in demo_data/ and can be loaded via the API once the server is running:

# Start the API first (see 3.6), then:
curl -X POST http://localhost:8532/ingest/demo-data

Or load specific patient scenarios using the scripts:

./venv/bin/python scripts/patient_sarah.py
./venv/bin/python scripts/patient_maya.py
./venv/bin/python scripts/patient_emma.py
./venv/bin/python scripts/generate_demo_patients.py

3.6 Start Services¶

Streamlit UI only (default):

./run.sh
# Starts on port 8531
# Access: http://localhost:8531

FastAPI API only:

./run.sh --api
# Starts on port 8532 with 2 uvicorn workers
# Access: http://localhost:8532/docs (Swagger UI)

Both UI and API (recommended for development):

./run.sh --both
# UI on port 8531, API on port 8532
# Ctrl+C triggers graceful shutdown of both processes

The --both mode uses signal trapping (SIGTERM, SIGINT) and a 5-second graceful shutdown window before force-killing child processes.

4. Docker Deployment¶

4.1 Dockerfile Architecture¶

The Dockerfile uses a multi-stage build for minimal image size:

Stage 1 -- Builder: - Base: python:3.10-slim - Creates a virtualenv at /opt/venv - Installs all dependencies from requirements.txt with --no-cache-dir

Stage 2 -- Runtime: - Base: python:3.10-slim - Installs tini as PID 1 init process (proper signal handling) - Copies virtualenv from builder stage - Creates non-root user autouser - Copies application code: config/, src/, api/, app/, scripts/, data/, .streamlit/ - Sets ownership to autouser - Exposes ports 8531 and 8532 - Default CMD: Streamlit UI

# Key Dockerfile lines
FROM python:3.10-slim AS builder
# ... builds /opt/venv ...

FROM python:3.10-slim AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends tini
COPY --from=builder /opt/venv /opt/venv
RUN useradd -m -s /bin/bash autouser
USER autouser
ENTRYPOINT ["tini", "--"]
CMD ["streamlit", "run", "app/autoimmune_ui.py", \
     "--server.port=8531", "--server.address=0.0.0.0", "--server.headless=true"]

4.2 Building the Image¶

cd /home/adam/projects/hcls-ai-factory/ai_agent_adds/precision_autoimmune_agent

# Build with default tag
docker build -t autoimmune-agent:latest .

# Build with version tag
docker build -t autoimmune-agent:1.0.0 .

# Verify image size
docker images autoimmune-agent

4.3 Docker Compose Configuration¶

The docker-compose.yml defines three services:

version: "3.8"

services:
  autoimmune-streamlit:
    build: .
    container_name: autoimmune-streamlit
    ports:
      - "8531:8531"
    environment:
      - AUTO_MILVUS_HOST=milvus-standalone
      - AUTO_MILVUS_PORT=19530
      - AUTO_STREAMLIT_PORT=8531
      - AUTO_API_PORT=8532
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./demo_data:/app/demo_data:ro
      - ./data:/app/data
    networks:
      - hcls-network
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c",
             "import urllib.request; urllib.request.urlopen('http://localhost:8531/_stcore/health')"]
      interval: 30s
      timeout: 5s
      start_period: 60s
      retries: 3

  autoimmune-api:
    build: .
    container_name: autoimmune-api
    command: ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8532", "--workers", "2"]
    ports:
      - "8532:8532"
    environment:
      - AUTO_MILVUS_HOST=milvus-standalone
      - AUTO_MILVUS_PORT=19530
      - AUTO_API_PORT=8532
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./demo_data:/app/demo_data:ro
      - ./data:/app/data
    networks:
      - hcls-network
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c",
             "import urllib.request; urllib.request.urlopen('http://localhost:8532/healthz')"]
      interval: 30s
      timeout: 5s
      start_period: 30s
      retries: 3

  autoimmune-setup:
    build: .
    container_name: autoimmune-setup
    command: ["python", "scripts/setup_collections.py", "--seed"]
    environment:
      - AUTO_MILVUS_HOST=milvus-standalone
      - AUTO_MILVUS_PORT=19530
    networks:
      - hcls-network
    restart: "no"

networks:
  hcls-network:
    external: true

Key design decisions: - demo_data mounted read-only (:ro) -- prevents accidental modification of demo scenarios - data mounted read-write -- for cache, reference data, and ingested documents - hcls-network is external -- must be created before starting, shared with Milvus and other AI Factory agents - autoimmune-setup runs once (restart: "no") -- creates collections and exits - Milvus host is milvus-standalone in Docker (not localhost)

4.4 Starting Services¶

# Ensure the shared network exists
docker network create hcls-network 2>/dev/null || true

# Ensure Milvus is running on hcls-network
docker ps | grep milvus-standalone

# Start all services (setup runs first, then exits)
docker compose up -d

# Watch setup complete
docker logs -f autoimmune-setup

# Once setup exits, verify the long-running services
docker compose ps

4.5 Verifying Deployment¶

# Check service identity
curl -s http://localhost:8532/ | python3 -m json.tool
# Expected: {"service": "Precision Autoimmune Intelligence Agent", "version": "1.0.0", ...}

# Detailed health check
curl -s http://localhost:8532/health | python3 -m json.tool
# Expected: milvus_connected=true, collections=14, embedder_loaded=true, llm_available=true

# Lightweight probe
curl -s http://localhost:8532/healthz
# Expected: {"status": "ok"}

# Streamlit health
curl -s http://localhost:8531/_stcore/health
# Expected: "ok"

# List collections and vector counts
curl -s http://localhost:8532/collections | python3 -m json.tool

# Test a query
curl -s -X POST http://localhost:8532/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the early biomarkers for lupus flare?"}' | python3 -m json.tool

# Check Prometheus metrics
curl -s http://localhost:8532/metrics

5. DGX Spark Deployment¶

5.1 Hardware Specifications¶

The NVIDIA DGX Spark provides:

Resource	Specification
GPU	GB10 (NVIDIA Blackwell architecture)
Memory	128 GB unified LPDDR5x (shared CPU/GPU)
CPU	20 ARM cores (NVIDIA Grace)
Interconnect	NVLink-C2C
Storage	NVMe SSD
Price	$3,999

The unified memory architecture eliminates PCIe bottlenecks for embedding model inference and allows Milvus to leverage the full 128 GB for large collection indexes.

5.2 DGX Spark-Specific Configuration¶

The autoimmune agent integrates with the main AI Factory stack via docker-compose.dgx-spark.yml. In this deployment model, the agent runs as a single container (port 8000 internally, mapped to 8105/8106 externally) rather than the standalone 3-service topology.

From the main compose file:

precision-autoimmune-agent:
  build:
    context: ./ai_agent_adds/precision_autoimmune_agent
    dockerfile: Dockerfile
  restart: unless-stopped
  ports:
    - "8105:8531"
    - "8106:8000"
  environment:
    <<: *common-env
    AGENT_NAME: precision-autoimmune-agent
  depends_on:
    milvus:
      condition: service_healthy

The *common-env anchor provides:

MILVUS_HOST: milvus
MILVUS_PORT: "19530"
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
EMBEDDING_MODEL: BAAI/bge-small-en-v1.5
LOG_LEVEL: INFO

5.3 GPU Memory Considerations¶

On the DGX Spark, all services share 128 GB of unified memory:

Component	Estimated Memory
Milvus (14 collections, IVF_FLAT indexes)	4--8 GB
BGE-small-en-v1.5 embedding model	~500 MB
Streamlit UI process	~200 MB
FastAPI + 2 uvicorn workers	~600 MB
Sentence-transformers runtime	~1 GB
Total autoimmune agent	~6--10 GB

This leaves ample headroom for the other 4 intelligence agents, the drug discovery pipeline, and Milvus itself.

5.4 Integration with Main docker-compose.dgx-spark.yml¶

To deploy as part of the full AI Factory stack:

cd /home/adam/projects/hcls-ai-factory

# Set API key
export ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

# Start the full stack (includes Milvus, all 11 agents, monitoring, landing page)
docker compose -f docker-compose.dgx-spark.yml up -d

# Verify the autoimmune agent is healthy
curl -s http://localhost:8106/health | python3 -m json.tool

The landing page at http://localhost:8080 provides a unified health dashboard showing all agents, including the autoimmune agent's status.

6. Environment Variables¶

All agent-specific variables use the AUTO_ prefix and are managed by Pydantic Settings (config/settings.py). They can be set via environment variables, .env file, or Docker Compose environment blocks.

Connection¶

Variable	Default	Description
`ANTHROPIC_API_KEY`	(empty)	Anthropic API key for Claude. Required for LLM features. No `AUTO_` prefix.
`AUTO_MILVUS_HOST`	`localhost`	Milvus server hostname. Use `milvus-standalone` in Docker.
`AUTO_MILVUS_PORT`	`19530`	Milvus gRPC port.
`EMBEDDING_MODEL`	`BAAI/bge-small-en-v1.5`	Sentence-transformers model name. No `AUTO_` prefix.

Ports¶

Variable	Default	Description
`AUTO_STREAMLIT_PORT`	`8531`	Streamlit UI listen port.
`AUTO_API_PORT`	`8532`	FastAPI listen port.

LLM¶

Variable	Default	Description
`AUTO_LLM_MODEL`	`claude-sonnet-4-6`	Claude model ID for synthesis.
`AUTO_LLM_MAX_TOKENS`	`4096`	Maximum tokens in LLM response.
`AUTO_LLM_TEMPERATURE`	`0.2`	LLM temperature (lower = more deterministic).
`AUTO_LLM_MAX_RETRIES`	`3`	Retry count for failed LLM calls.

RAG Parameters¶

Variable	Default	Description
`AUTO_TOP_K_PER_COLLECTION`	`5`	Number of top results retrieved per collection.
`AUTO_SCORE_THRESHOLD`	`0.40`	Minimum cosine similarity score to include a result.
`AUTO_MAX_EVIDENCE_ITEMS`	`30`	Maximum total evidence items across all collections.
`AUTO_CONVERSATION_MEMORY_SIZE`	`3`	Number of prior conversation turns retained for context.

Security¶

Variable	Default	Description
`AUTO_API_KEY`	(empty)	API key for endpoint authentication. Empty = no auth required.
`AUTO_CORS_ORIGINS`	`http://localhost:8080,http://localhost:8531`	Comma-separated allowed CORS origins.
`AUTO_MAX_REQUEST_SIZE_MB`	`50`	Maximum upload size for PDF ingestion.

Thresholds¶

Variable	Default	Description
`AUTO_CITATION_HIGH`	`0.80`	Cosine score threshold for high-confidence citations.
`AUTO_CITATION_MEDIUM`	`0.60`	Cosine score threshold for medium-confidence citations.
`AUTO_FLARE_RISK_IMMINENT`	`0.8`	Flare risk score threshold: imminent.
`AUTO_FLARE_RISK_HIGH`	`0.6`	Flare risk score threshold: high.
`AUTO_FLARE_RISK_MODERATE`	`0.4`	Flare risk score threshold: moderate.

Timeouts¶

Variable	Default	Description
`AUTO_REQUEST_TIMEOUT_SECONDS`	`60`	Overall request timeout.
`AUTO_MILVUS_TIMEOUT_SECONDS`	`10`	Milvus connection/query timeout.

Streaming and Metrics¶

Variable	Default	Description
`AUTO_STREAMING_ENABLED`	`True`	Enable SSE streaming for `/query/stream` endpoint.
`AUTO_METRICS_ENABLED`	`True`	Enable Prometheus metrics at `/metrics`.

Logging¶

Variable	Default	Description
`AUTO_LOG_LEVEL`	`INFO`	Log level: DEBUG, INFO, WARNING, ERROR.
`AUTO_LOG_DIR`	`<project_root>/logs`	Directory for log files.

Document Processing¶

Variable	Default	Description
`AUTO_MAX_CHUNK_SIZE`	`2500`	Maximum characters per text chunk when processing PDFs.
`AUTO_CHUNK_OVERLAP`	`200`	Character overlap between consecutive chunks.
`AUTO_PDF_DPI`	`200`	DPI for PDF rendering during OCR.

7. Milvus Configuration¶

7.1 Collection Initialization¶

The agent manages 14 vector collections, each with a domain-specific schema:

#	Collection Name	Description
1	`autoimmune_clinical_documents`	Ingested patient records (PDFs)
2	`autoimmune_patient_labs`	Lab results with flag analysis
3	`autoimmune_autoantibody_panels`	Autoantibody test result panels
4	`autoimmune_hla_associations`	HLA allele to disease risk mapping
5	`autoimmune_disease_criteria`	ACR/EULAR classification criteria
6	`autoimmune_disease_activity`	Activity scoring (DAS28, SLEDAI, BASDAI, etc.)
7	`autoimmune_flare_patterns`	Flare prediction biomarker patterns
8	`autoimmune_biologic_therapies`	Biologic drug database with PGx
9	`autoimmune_pgx_rules`	Pharmacogenomic dosing rules
10	`autoimmune_clinical_trials`	Autoimmune clinical trials
11	`autoimmune_literature`	Published literature and research
12	`autoimmune_patient_timelines`	Patient diagnostic timeline events
13	`autoimmune_cross_disease`	Cross-disease overlap syndromes
14	`genomic_evidence`	Shared read-only (from genomics pipeline)

The genomic_evidence collection is shared across all 5 intelligence agents and is never created or dropped by the autoimmune agent. It is accessed read-only if it already exists.

7.2 Index Parameters¶

All collections use the same index and search configuration:

INDEX_PARAMS = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}
SEARCH_PARAMS = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16},
}

Parameter	Value	Rationale
`metric_type`	`COSINE`	Standard for normalized text embeddings from BGE models
`index_type`	`IVF_FLAT`	Good recall/speed balance for collections under 1M vectors
`nlist`	`1024`	Number of Voronoi cells for IVF partitioning
`nprobe`	`16`	Cells searched at query time (higher = better recall, slower)
`embedding_dim`	`384`	Output dimension of `BAAI/bge-small-en-v1.5`

7.3 Connection Settings¶

The collection manager uses a named Milvus connection alias (autoimmune_agent) with automatic reconnection:

connections.connect(alias="autoimmune_agent", host=settings.MILVUS_HOST, port=settings.MILVUS_PORT)

The API startup includes a 2-attempt retry loop with a 2-second delay between attempts. If Milvus is unavailable, the agent starts in degraded mode with vector search disabled.

7.4 Shared Collections¶

The genomic_evidence collection is populated by the genomics pipeline (stage 1 of the AI Factory). The autoimmune agent reads from it with a low weight (0.02) in the multi-collection RAG search. It is weighted low because genomic variants are supplementary to the autoimmune-specific clinical data.

Collection search weights (sum to 1.0):

Collection	Weight
`clinical_documents`	0.18
`patient_labs`	0.14
`autoantibody_panels`	0.12
`hla_associations`	0.08
`disease_criteria`	0.08
`disease_activity`	0.07
`flare_patterns`	0.06
`biologic_therapies`	0.06
`clinical_trials`	0.05
`literature`	0.05
`pgx_rules`	0.04
`patient_timelines`	0.03
`cross_disease`	0.02
`genomic_evidence`	0.02

8. Health Checks and Monitoring¶

8.1 Health Endpoints¶

GET / -- Service identity:

{
  "service": "Precision Autoimmune Intelligence Agent",
  "version": "1.0.0",
  "status": "running",
  "ports": {"api": 8532, "ui": 8531}
}

GET /health -- Detailed health (checks Milvus, collections, embedder, LLM):

{
  "status": "healthy",
  "service": "autoimmune-agent",
  "milvus_connected": true,
  "collections": 14,
  "total_vectors": 1247,
  "embedder_loaded": true,
  "llm_available": true,
  "uptime_seconds": 3600
}

GET /healthz -- Lightweight probe (for orchestrators and landing page):

{"status": "ok"}

Streamlit health (GET /_stcore/health): Returns "ok" as plain text.

GET /metrics -- Prometheus-compatible metrics (text format):

# HELP autoimmune_agent_up Whether the agent is running
# TYPE autoimmune_agent_up gauge
autoimmune_agent_up 1
# HELP autoimmune_collection_vectors Number of vectors per collection
# TYPE autoimmune_collection_vectors gauge
autoimmune_collection_vectors{collection="autoimmune_clinical_documents"} 342
autoimmune_collection_vectors{collection="autoimmune_hla_associations"} 87
...
# HELP autoimmune_agent_uptime_seconds Agent uptime
# TYPE autoimmune_agent_uptime_seconds gauge
autoimmune_agent_uptime_seconds 3600

8.2 Docker Health Checks¶

Both long-running containers include health checks:

Streamlit container:

healthcheck:
  test: ["CMD", "python", "-c",
         "import urllib.request; urllib.request.urlopen('http://localhost:8531/_stcore/health')"]
  interval: 30s
  timeout: 5s
  start_period: 60s    # Streamlit needs more startup time
  retries: 3

API container:

healthcheck:
  test: ["CMD", "python", "-c",
         "import urllib.request; urllib.request.urlopen('http://localhost:8532/healthz')"]
  interval: 30s
  timeout: 5s
  start_period: 30s
  retries: 3

8.3 Prometheus Metrics¶

The /metrics endpoint exposes:

Metric	Type	Description
`autoimmune_agent_up`	Gauge	1 if agent is running
`autoimmune_collection_vectors`	Gauge	Vector count per collection (labeled)
`autoimmune_agent_uptime_seconds`	Gauge	Agent uptime in seconds

Prometheus scrape config (add to prometheus.yml):

scrape_configs:
  - job_name: 'autoimmune-agent'
    scrape_interval: 30s
    static_configs:
      - targets: ['autoimmune-api:8532']
    metrics_path: /metrics

8.4 Grafana Dashboard Integration¶

The AI Factory monitoring stack includes Grafana at port 3000. To add the autoimmune agent:

Ensure Prometheus is scraping the /metrics endpoint (see above).
Import or create a dashboard with panels for:
Agent uptime
Collection vector counts (bar chart by collection)
Health status (using the /health endpoint via Grafana HTTP datasource)
Set alert rules for:
autoimmune_agent_up == 0 (agent down)
autoimmune_collection_vectors{collection="autoimmune_clinical_documents"} == 0 (empty clinical collection)

9. Security Configuration¶

9.1 API Key Authentication¶

When AUTO_API_KEY is set, all endpoints except /, /health, /healthz, and /metrics require authentication via:

Header: X-API-Key: <your-key>
Query parameter: ?api_key=<your-key>

# Set the API key
export AUTO_API_KEY="your-secure-api-key-here"

# Authenticated request
curl -X POST http://localhost:8532/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secure-api-key-here" \
  -d '{"question": "What biologics are indicated for lupus?"}'

# Without key (returns 401)
curl -X POST http://localhost:8532/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What biologics are indicated for lupus?"}'
# {"detail": "Invalid or missing API key"}

9.2 CORS Configuration¶

CORS origins are controlled by AUTO_CORS_ORIGINS (comma-separated):

# Default: allows landing page and Streamlit
AUTO_CORS_ORIGINS=http://localhost:8080,http://localhost:8531

# Production: restrict to specific domains
AUTO_CORS_ORIGINS=https://aifactory.example.com,https://portal.example.com

The CORS middleware allows all methods and headers (allow_methods=["*"], allow_headers=["*"]) with credentials enabled.

9.3 Non-Root Docker User¶

The Dockerfile creates and switches to a non-root user:

RUN useradd -m -s /bin/bash autouser
# ... copy application files ...
RUN chown -R autouser:autouser /app
USER autouser

This ensures the container process cannot modify system files or escalate privileges. The application runs entirely within /app owned by autouser.

9.4 Secret Management¶

Current approach: API keys are passed via environment variables or .env file.

Production recommendations:

Never commit .env files to version control. The .env.example template uses placeholder values.
Use Docker secrets or a secrets manager (HashiCorp Vault, AWS Secrets Manager) for ANTHROPIC_API_KEY.
Rotate AUTO_API_KEY periodically.
The ANTHROPIC_API_KEY is validated at startup; if missing, the agent logs a warning and enters degraded mode rather than failing.

Request size limiting: The AUTO_MAX_REQUEST_SIZE_MB setting (default: 50 MB) rejects uploads exceeding the limit with HTTP 413.

10. Production Hardening¶

10.1 Resource Limits¶

Add resource constraints to the Docker Compose services:

autoimmune-api:
  # ... existing config ...
  deploy:
    resources:
      limits:
        memory: 4G
        cpus: "4.0"
      reservations:
        memory: 2G
        cpus: "1.0"

autoimmune-streamlit:
  deploy:
    resources:
      limits:
        memory: 2G
        cpus: "2.0"
      reservations:
        memory: 1G
        cpus: "0.5"

10.2 Restart Policies¶

The Docker Compose file already uses unless-stopped for long-running services and "no" for the one-shot setup container. For production, consider:

autoimmune-api:
  restart: unless-stopped
  # Alternatively, for Kubernetes-like behavior:
  # restart: on-failure
  # deploy:
  #   restart_policy:
  #     condition: on-failure
  #     delay: 5s
  #     max_attempts: 10
  #     window: 120s

10.3 Log Management¶

Loguru configuration (from config/logging.py):

Setting	Value
Console output	`stderr` with color
File output	`logs/autoimmune-agent.log` (API uses `autoimmune-api.log`)
Rotation	10 MB per file
Retention	5 rotated files
Thread safety	`enqueue=True`
Encoding	UTF-8

Docker log driver -- add to compose for centralized logging:

autoimmune-api:
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "5"

Log level tuning -- reduce verbosity in production:

AUTO_LOG_LEVEL=WARNING

10.4 Backup Strategy¶

Milvus data:

# Milvus data is stored in the milvus_data Docker volume
docker run --rm -v milvus_data:/data -v $(pwd)/backups:/backup \
  alpine tar czf /backup/milvus-$(date +%Y%m%d).tar.gz -C /data .

Application data:

# Back up ingested documents and cache
tar czf backups/autoimmune-data-$(date +%Y%m%d).tar.gz \
  data/ demo_data/ logs/

Collection recreation: Collections can be fully rebuilt from seed data and demo patients:

./run.sh --setup
curl -X POST http://localhost:8532/ingest/demo-data

10.5 Performance Tuning¶

Embedding batch size -- increase for bulk ingestion:

AUTO_EMBEDDING_BATCH_SIZE=64  # default: 32

Milvus search parallelism -- the search_all method uses ThreadPoolExecutor with max_workers=6 by default. For higher throughput:

# In rag_engine.py or via settings
results = cm.search_all(query_embedding, max_workers=10)

Uvicorn workers -- the API defaults to 2 workers. Increase for higher concurrency:

autoimmune-api:
  command: ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8532", "--workers", "4"]

Top-K tuning -- reduce AUTO_TOP_K_PER_COLLECTION to 3 for faster responses at the cost of recall, or increase to 10 for maximum evidence coverage.

IVF_FLAT nprobe -- the current nprobe=16 balances speed and recall. For latency-critical deployments, reduce to 8. For maximum recall, increase to 32 or 64.

11. Scaling and High Availability¶

11.1 Uvicorn Workers¶

The API runs 2 uvicorn workers by default. Each worker loads its own copy of the embedding model and Milvus connection. On the DGX Spark (20 cores), you can safely run 4--6 workers:

# Docker override
docker compose exec autoimmune-api \
  uvicorn api.main:app --host 0.0.0.0 --port 8532 --workers 4

# Or modify docker-compose.yml command
command: ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8532", "--workers", "4"]

Memory impact: each additional worker adds approximately 500 MB (embedding model + Milvus client + application state).

11.2 Milvus Scaling¶

For larger deployments (>1M vectors per collection), consider:

Milvus Cluster mode -- replaces standalone with distributed components (query node, data node, index node).
HNSW index -- replace IVF_FLAT with HNSW for better recall at high vector counts:

INDEX_PARAMS = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
    "params": {"M": 16, "efConstruction": 256},
}
SEARCH_PARAMS = {
    "metric_type": "COSINE",
    "params": {"ef": 64},
}

Collection partitioning -- partition autoimmune_clinical_documents by patient_id for faster patient-scoped queries.

11.3 Load Balancing Considerations¶

For multi-instance deployments:

Stateless API -- the FastAPI server is stateless (no session affinity required). Place behind a reverse proxy (NGINX, Traefik) or cloud load balancer.
Milvus connection pooling -- each API instance maintains its own Milvus connection via the autoimmune_agent alias. No connection pool sharing is needed.
Streamlit -- Streamlit maintains WebSocket connections for UI interactivity. If scaling horizontally, use sticky sessions or deploy one Streamlit instance per user group.

# NGINX example
upstream autoimmune_api {
    server autoimmune-api-1:8532;
    server autoimmune-api-2:8532;
}

server {
    listen 8532;
    location / {
        proxy_pass http://autoimmune_api;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

12. Troubleshooting¶

12.1 Milvus Connection Refused¶

Symptom: MilvusException: connection refused at startup.

Solution:

# Verify Milvus is running
docker ps | grep milvus
curl -s http://localhost:9091/healthz

# Check AUTO_MILVUS_HOST is correct
# Local: localhost | Docker: milvus-standalone
echo $AUTO_MILVUS_HOST

# Verify network connectivity (from inside container)
docker exec autoimmune-api python -c "
from pymilvus import connections
connections.connect(host='milvus-standalone', port=19530)
print('Connected')
"

12.2 Embedding Model Download Fails¶

Symptom: OSError: Can't load tokenizer for 'BAAI/bge-small-en-v1.5'.

Solution:

# Pre-download the model
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-small-en-v1.5')"

# Or set a custom cache directory
export SENTENCE_TRANSFORMERS_HOME=/app/data/models
export HF_HOME=/app/data/huggingface

12.3 ANTHROPIC_API_KEY Not Detected¶

Symptom: LLM features will be unavailable (demo mode) in logs.

Solution:

# Verify the key is set
echo $ANTHROPIC_API_KEY | head -c 20

# In Docker, check it's passed through
docker exec autoimmune-api env | grep ANTHROPIC

# Common issue: .env file not loaded in Docker
# Fix: pass explicitly in docker-compose.yml environment block

12.4 Collections Not Created¶

Symptom: /health shows "collections": 0.

Solution:

# Run setup manually
docker exec autoimmune-api python scripts/setup_collections.py --seed

# Or check if setup container ran
docker logs autoimmune-setup

12.5 Streamlit Not Loading¶

Symptom: Browser shows connection refused on port 8531.

Solution:

# Check if Streamlit is actually running
docker logs autoimmune-streamlit

# Common issue: port conflict
lsof -i :8531

# Verify health
curl -s http://localhost:8531/_stcore/health

12.6 PDF Upload Returns 422¶

Symptom: "No text extracted from PDF" error.

Solution: - Ensure the PDF contains selectable text (not scanned images without OCR). - Check AUTO_MAX_REQUEST_SIZE_MB is large enough for the file. - Verify the file has a .pdf extension (other formats return 400).

12.7 Slow Query Response¶

Symptom: Queries take >10 seconds.

Solution:

# Reduce collections searched
curl -X POST http://localhost:8532/query \
  -H "Content-Type: application/json" \
  -d '{"question": "...", "collections_filter": ["autoimmune_clinical_documents", "autoimmune_patient_labs"]}'

# Reduce top_k
AUTO_TOP_K_PER_COLLECTION=3

# Check Milvus index status
docker exec autoimmune-api python -c "
from pymilvus import Collection, connections
connections.connect(host='milvus-standalone', port=19530)
c = Collection('autoimmune_clinical_documents')
print(c.indexes)
"

12.8 Out of Memory¶

Symptom: Container killed by OOM.

Solution:

# Increase memory limit
deploy:
  resources:
    limits:
      memory: 6G

# Or reduce workers
command: ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8532", "--workers", "1"]

12.9 Docker Network Not Found¶

Symptom: network hcls-network declared as external, but could not be found.

Solution:

docker network create hcls-network

12.10 Permission Denied in Container¶

Symptom: PermissionError: [Errno 13] writing to /app/data or /app/logs.

Solution:

# Fix host-side permissions for mounted volumes
chmod -R 777 data/ logs/

# Or match the autouser UID
docker exec autoimmune-api id autouser
# Then: chown -R <uid>:<gid> data/ logs/

12.11 Collection Weight Mismatch Warning¶

Symptom: Collection weights sum to X.XXX, expected ~1.0 in logs.

Solution: This is a warning, not an error. If you override individual AUTO_WEIGHT_* variables, ensure all 14 weights sum to approximately 1.0. The tolerance is 0.05.

12.12 CORS Errors in Browser¶

Symptom: Access-Control-Allow-Origin errors in browser console.

Solution:

# Add your frontend origin
AUTO_CORS_ORIGINS=http://localhost:8080,http://localhost:8531,http://your-frontend:3000

13. Quick Reference¶

Command Cheat Sheet¶

# ── Local Development ──
./run.sh              # Start Streamlit UI (port 8531)
./run.sh --api        # Start FastAPI (port 8532)
./run.sh --both       # Start both UI and API
./run.sh --setup      # Create collections + seed knowledge

# ── Docker ──
docker build -t autoimmune-agent:latest .
docker network create hcls-network
docker compose up -d
docker compose down
docker compose logs -f autoimmune-api
docker compose ps

# ── Collection Management ──
python scripts/setup_collections.py --seed
python scripts/setup_collections.py --drop-existing --seed
python scripts/setup_collections.py --host milvus-standalone --port 19530

# ── Health Checks ──
curl -s http://localhost:8532/
curl -s http://localhost:8532/health
curl -s http://localhost:8532/healthz
curl -s http://localhost:8532/metrics
curl -s http://localhost:8531/_stcore/health

# ── API Queries ──
curl -s -X POST http://localhost:8532/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are early warning signs of a lupus flare?"}'

curl -s -X POST http://localhost:8532/search \
  -H "Content-Type: application/json" \
  -d '{"question": "rituximab pharmacogenomics"}'

curl -s http://localhost:8532/collections

# ── Full Stack (DGX Spark) ──
docker compose -f docker-compose.dgx-spark.yml up -d
docker compose -f docker-compose.dgx-spark.yml down

Port Map¶

Port	Service	Health Check
8531	Streamlit UI	`/_stcore/health`
8532	FastAPI API	`/healthz`
19530	Milvus gRPC	--
9091	Milvus Proxy	`/healthz`

Health Check URLs¶

URL	Purpose	Expected Response
`http://localhost:8532/`	Service identity	`{"service": "Precision Autoimmune Intelligence Agent", ...}`
`http://localhost:8532/health`	Full health	`{"status": "healthy", "milvus_connected": true, ...}`
`http://localhost:8532/healthz`	Lightweight probe	`{"status": "ok"}`
`http://localhost:8532/metrics`	Prometheus metrics	Text-format metrics
`http://localhost:8531/_stcore/health`	Streamlit probe	`ok`

API Endpoints¶

Method	Path	Description
`GET`	`/`	Service identity
`GET`	`/health`	Detailed health check
`GET`	`/healthz`	Lightweight probe
`GET`	`/metrics`	Prometheus metrics
`POST`	`/query`	RAG query (retrieve + synthesize)
`POST`	`/query/stream`	Streaming RAG query (SSE)
`POST`	`/search`	Evidence-only search (no LLM)
`POST`	`/analyze`	Full patient analysis pipeline
`POST`	`/differential`	Differential diagnosis
`POST`	`/ingest/upload`	Upload and ingest a PDF
`POST`	`/ingest/demo-data`	Ingest all demo patients
`GET`	`/collections`	List collections with stats
`POST`	`/collections/create`	Create/recreate collections
`POST`	`/export`	Export report (markdown/FHIR/PDF)

Key File Paths¶

precision_autoimmune_agent/
  config/
    settings.py          # All AUTO_* configuration (Pydantic Settings)
    logging.py           # Loguru dual-sink setup
  src/
    collections.py       # 14 collection schemas + Milvus manager
    rag_engine.py        # Multi-collection RAG with weighted search
    agent.py             # Core autoimmune analysis agent
    diagnostic_engine.py # Differential diagnosis + criteria scoring
    knowledge.py         # Static knowledge base (HLA, antibodies, drugs)
    models.py            # Pydantic data models
    document_processor.py # PDF ingestion pipeline
    timeline_builder.py  # Patient diagnostic odyssey timeline
    export.py            # Report export (Markdown, FHIR R4, PDF)
  api/
    main.py              # FastAPI application with all endpoints
  app/
    autoimmune_ui.py     # Streamlit interactive UI
  scripts/
    setup_collections.py # Collection creation + knowledge seeding
  run.sh                 # Multi-mode startup script
  Dockerfile             # Multi-stage build
  docker-compose.yml     # 3-service compose
  requirements.txt       # Python dependencies
  .env.example           # Environment variable template
  logs/                  # Loguru output (autoimmune-agent.log)
  data/                  # Cache, reference data, ingested docs
  demo_data/             # Demo patient scenarios (read-only in Docker)