Cardiology Intelligence Agent -- Deployment Guide

Version: 1.0.0 Author: Adam Jones Date: March 2026 License: Apache 2.0

Table of Contents

Prerequisites
Docker Compose Deployment
Manual Deployment
Milvus Tuning
Port Map
Environment Configuration
Security Checklist
Monitoring with Prometheus
Backup and Recovery
Scaling Considerations
Updating and Maintenance

1. Prerequisites

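Hardware Requirements

Component	Minimum	Recommended (DGX Spark)
CPU	4 cores	20 Arm cores (10 Cortex-X925 + 10 Cortex-A725, GB10 Grace Blackwell)
RAM	16 GB	128 GB LPDDR5X unified memory (shared by CPU and GPU)
Storage	50 GB SSD	1 TB NVMe SSD
GPU	None (CPU inference)	NVIDIA Blackwell GPU (uses the same 128 GB unified memory)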

Software Requirements

Software	Version	Purpose
Docker	24.0+	Container runtime
Docker Compose	2.20+	Multi-service orchestration
Python	3.12+	Manual setup only
pip	24.0+	Manual setup only

Network Requirements

Port	Direction	Purpose
8126	Inbound	FastAPI REST API
8536	Inbound	Streamlit UI
19530	Internal	Milvus gRPC
9091	Internal	Milvus health
9000/9001	Internal	MinIO API/Console
2379	Internal	etcd
443	Outbound	Anthropic API (Claude)
443	Outbound	PubMed, ClinicalTrials.gov (ingest)

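API Keys

Key	Required	Purpose
`ANTHROPIC_API_KEY` (set as `CARDIO_ANTHROPIC_API_KEY`)	Yes (for LLM synthesis; without it the agent runs in search-only mode)	Claude Sonnet 4.6 access
`NCBI_API_KEY` (set as `CARDIO_NCBI_API_KEY`)	Optional	PubMed ingest (higher rate limits)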

2. Docker Compose Deployment

2.1 Quick Start

# From the repository root, change into the agent directory
cd ai_agent_adds/cardiology_intelligence_agent

# Configure environment
cp .env.example .env
# Edit .env:
#   CARDIO_ANTHROPIC_API_KEY=sk-ant-...
#   CARDIO_NCBI_API_KEY=... (optional)

# Start all services
docker compose up -d

# Watch setup logs
docker compose logs -f cardio-setup

# Verify all services are running
docker compose ps

2.2 Service Architecture

The docker-compose.yml defines 6 services:

services:
  etcd:        # Milvus metadata store
  minio:       # Milvus object storage
  milvus:      # Vector database
  cardio-setup: # One-shot: create collections + seed data
  cardio-api:  # FastAPI server
  cardio-ui:   # Streamlit UI

2.3 Service Startup Order

etcd and minio start first, in parallel (no dependencies)
milvus starts once etcd and minio are healthy
cardio-setup starts once milvus is healthy; it runs once and exits
cardio-api starts after cardio-setup completes successfully (exit code 0)
cardio-ui starts once cardio-api is healthy
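The startup order above is a dependency graph that Compose resolves from each service's `depends_on` entries. As a minimal sketch (the graph below is transcribed from the list above, not read from the project's docker-compose.yml), Python's standard `graphlib` module can group the services into stages whose members may start in parallel:

```python
from graphlib import TopologicalSorter

# Dependency graph transcribed from the startup order above (an assumption,
# not parsed from docker-compose.yml). Each service maps to what it waits for.
DEPENDS_ON = {
    "etcd": set(),
    "minio": set(),
    "milvus": {"etcd", "minio"},       # waits for both to be healthy
    "cardio-setup": {"milvus"},        # waits for the Milvus health check
    "cardio-api": {"cardio-setup"},    # waits for setup to exit with code 0
    "cardio-ui": {"cardio-api"},       # waits for the API health check
}


def startup_stages(graph):
    """Group services into stages; all services in one stage can start together."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(startup_stages(DEPENDS_ON), start=1):
        print(f"stage {i}: {', '.join(stage)}")
```

Running it prints five stages, with etcd and minio together in the first. Adding a dependency that closes a loop makes `prepare()` raise `graphlib.CycleError`.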

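2.4 Verifying Deployment

# Check all services (-a also lists exited containers such as cardio-setup)
docker compose ps -a

# Expected output (abbreviated; container names may carry the Compose project prefix):
# NAME               STATUS       PORTS
# etcd               running      2379
# minio              running      9000, 9001
# milvus             running      19530, 9091
# cardio-setup       exited (0)   -
# cardio-api         running      8126
# cardio-ui          running      8536

# Health checks
curl http://localhost:8126/health
curl http://localhost:9091/healthz    # 29091 with the standalone offset-port compose file

# Open UI (macOS: open; Linux/DGX OS: xdg-open)
open http://localhost:8536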
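Because cardio-setup can take a while to create and seed collections, a scripted check that waits for the health endpoints is often more convenient than re-running curl by hand. A minimal standard-library sketch follows; the helper name, timeout, and polling interval are arbitrary choices for illustration, and anything other than HTTP 200 is treated as "not yet healthy":

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url, timeout_s=120.0, interval_s=2.0):
    """Poll `url` until it answers HTTP 200 or `timeout_s` seconds pass.

    Returns True on the first 200 response, False if the deadline expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: connection refused, timeout, or HTTP error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_health("http://localhost:8126/health")` returns True as soon as the API answers 200, or False after two minutes; checking `http://localhost:9091/healthz` first confirms Milvus before the API.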

2.5 Stopping and Restarting

# Stop all services (preserves containers and data)
docker compose stop

# Start again
docker compose start

# Remove containers and networks (named volumes, and therefore data, are preserved)
docker compose down

# Remove containers, networks, AND data volumes (deletes all stored vectors)
docker compose down -v

3. Manual Deployment

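3.1 Milvus Setup

The Cardiology Intelligence Agent requires a running Milvus 2.4 instance. If you already have Milvus running (e.g., from the HCLS AI Factory), skip this step.

# Option 1: Use existing HCLS AI Factory Milvus
# Set CARDIO_MILVUS_HOST and CARDIO_MILVUS_PORT in .env

# Option 2: Start standalone Milvus with embedded etcd and local storage
# (standalone mode otherwise expects external etcd and MinIO; pin an exact v2.4.x tag in production)
docker run -d --name milvus \
  -e ETCD_USE_EMBED=true \
  -e ETCD_DATA_DIR=/var/lib/milvus/etcd \
  -e COMMON_STORAGETYPE=local \
  -p 19530:19530 \
  -p 9091:9091 \
  -v milvus_data:/var/lib/milvus \
  milvusdb/milvus:v2.4-latest \
  milvus run standalone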
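Before running the setup scripts against an existing or freshly started Milvus, it can help to confirm that the gRPC port is reachable with the same host and port the agent will use. A minimal sketch that reads CARDIO_MILVUS_HOST and CARDIO_MILVUS_PORT, falling back to the documented defaults; the function name is hypothetical, and it only proves the TCP port accepts connections, not that Milvus has finished starting (the :9091/healthz endpoint is the better readiness signal):

```python
import os
import socket


def milvus_reachable(host=None, port=None, timeout_s=3.0):
    """Return True if a TCP connection to the Milvus gRPC port succeeds.

    Defaults come from CARDIO_MILVUS_HOST / CARDIO_MILVUS_PORT, falling back
    to localhost:19530 as documented in the environment configuration.
    """
    host = host or os.environ.get("CARDIO_MILVUS_HOST", "localhost")
    port = int(port or os.environ.get("CARDIO_MILVUS_PORT", "19530"))
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


if __name__ == "__main__":
    print("Milvus reachable:", milvus_reachable())
```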

3.2 Python Environment

cd ai_agent_adds/cardiology_intelligence_agent

# Create virtual environment
python3.12 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your settings

3.3 Collection Setup

# Create all 12 cardiology collections
# WARNING: --drop-existing drops (deletes) existing cardiology collections and their data
python scripts/setup_collections.py --drop-existing --seed

# Seed knowledge graph data
python scripts/seed_knowledge.py

# Optional: Run live data ingest
python scripts/run_ingest.py

3.4 Start Services

# Terminal 1: FastAPI server
uvicorn api.main:app --host 0.0.0.0 --port 8126 --workers 2

# Terminal 2: Streamlit UI
streamlit run app/cardio_ui.py --server.port 8536

4. Milvus Tuning

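4.1 Index Configuration

All collections use an IVF_FLAT index with COSINE similarity. IVF_FLAT groups vectors into nlist clusters but stores them uncompressed, so recall is governed by how many clusters a query probes (nprobe) rather than by quantization loss. With the defaults below, this gives the high recall clinical queries need at acceptable latency on DGX Spark.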

# Default index parameters (in src/collections.py)
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 128}
}

4.2 Search Parameters

# Default search parameters
search_params = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16}
}

4.3 Performance Tuning

Parameter	Default	Tuning Guidance
`nlist`	128	Increase for larger collections (256 for >100K vectors)
`nprobe`	16	Increase for better recall (32 for clinical queries) at cost of latency
`top_k`	5	Per-collection; increase if coverage seems incomplete
`score_threshold`	0.4	Lower for broader results; raise for precision
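To see why `nprobe` trades latency for recall, the toy index below mimics the IVF idea in plain Python: vectors are assigned to `nlist` lists by nearest centroid, and a search scans only the `nprobe` lists whose centroids best match the query. It is an illustration, not Milvus's implementation: centroids are sampled from the data rather than trained with k-means, and `score_threshold` is applied as a simple post-filter, which is an assumption about how the agent uses that setting.

```python
import math
import random


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    # Vectors are unit-normalized, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class ToyIVF:
    """Toy IVF_FLAT-style index (illustration only, not Milvus's code)."""

    def __init__(self, vectors, nlist, seed=0):
        rng = random.Random(seed)
        # Real IVF trains centroids with k-means; sampling data points keeps this short.
        self.centroids = rng.sample(vectors, nlist)
        self.lists = [[] for _ in range(nlist)]
        for idx, v in enumerate(vectors):
            best = max(range(nlist), key=lambda c: cosine(v, self.centroids[c]))
            self.lists[best].append((idx, v))

    def search(self, query, top_k, nprobe, score_threshold=None):
        # Rank clusters by centroid similarity and scan only the best `nprobe`.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: cosine(query, self.centroids[c]), reverse=True)
        hits = []
        for c in order[:nprobe]:
            for idx, v in self.lists[c]:
                score = cosine(query, v)
                if score_threshold is None or score >= score_threshold:
                    hits.append((score, idx))
        hits.sort(reverse=True)
        return [idx for _, idx in hits[:top_k]]


def brute_force(vectors, query, top_k):
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for _, i in scored[:top_k]]


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 16
    data = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(2000)]
    query = normalize([rng.gauss(0, 1) for _ in range(dim)])
    index = ToyIVF(data, nlist=32)
    exact = set(brute_force(data, query, top_k=5))
    for nprobe in (1, 4, 16, 32):
        found = set(index.search(query, top_k=5, nprobe=nprobe))
        print(f"nprobe={nprobe:>2}  recall@5={len(found & exact) / 5:.1f}")
```

With `nprobe` equal to `nlist` every list is scanned and the result matches brute force exactly; smaller values scan fewer vectors and may miss true neighbors, which is the recall loss the tuning table refers to.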

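4.4 Memory Configuration

Milvus must load a collection into memory before it can search it. On DGX Spark (128 GB of unified memory shared by CPU and GPU), size the Milvus memory limits so that every loaded cardio_ collection fits comfortably:

# In milvus.yaml or docker-compose environment
# (verify these key names against the configuration reference for your Milvus version)
MILVUS_QUERYNODE_CACHE_SIZE: 32GB
MILVUS_INDEXNODE_MEMORY_LIMIT: 16GB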
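To judge whether a cache limit is large enough, a back-of-the-envelope estimate of the vector payload helps. IVF_FLAT stores every vector uncompressed as float32 (4 bytes per dimension), plus a set of nlist centroids. The sketch below computes that lower bound for the 384-dimensional bge-small-en-v1.5 embeddings; it deliberately ignores scalar fields, segment overhead, and Milvus's working memory, so real usage will be higher:

```python
def ivf_flat_memory_bytes(num_vectors, dim, nlist=128, bytes_per_value=4):
    """Lower-bound estimate of IVF_FLAT vector memory.

    Counts full-precision float32 vectors plus one set of nlist centroids
    (negligible next to the vectors); ignores scalar fields, segment
    metadata and Milvus's own working memory.
    """
    vector_bytes = num_vectors * dim * bytes_per_value
    centroid_bytes = nlist * dim * bytes_per_value
    return vector_bytes + centroid_bytes


def to_gib(n_bytes):
    return n_bytes / 2**30


if __name__ == "__main__":
    dim = 384  # CARDIO_EMBEDDING_DIMENSION for BAAI/bge-small-en-v1.5
    for n in (100_000, 1_000_000, 10_000_000):
        print(f"{n:>12,} vectors -> {to_gib(ivf_flat_memory_bytes(n, dim)):6.2f} GiB")
```

One million 384-dimensional vectors come to 1,536,196,608 bytes (about 1.43 GiB), so by this measure roughly ten million vectors (about 14.3 GiB) fit comfortably within a 32 GB query-node cache.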

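4.5 Collection Compaction

Run periodic compaction to merge small segments and purge deleted entities:

from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")  # match CARDIO_MILVUS_HOST/PORT
col = Collection("cardio_literature")
col.compact()                        # trigger compaction
col.wait_for_compaction_completed()  # block until it finishes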

5. Port Map

Service	Port	Protocol	Exposure
FastAPI API	8126	HTTP	External (client-facing)
Streamlit UI	8536	HTTP	External (client-facing)
Milvus gRPC	19530	gRPC	Internal only
Milvus Health	9091	HTTP	Internal only
MinIO API	9000	HTTP	Internal only
MinIO Console	9001	HTTP	Internal (admin)
etcd	2379	gRPC	Internal only
Prometheus metrics	8126/metrics	HTTP	Internal (monitoring)

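Port Conflict Resolution

If ports conflict with existing services:

# Manual deployment: override via environment variables
export CARDIO_API_PORT=8127
export CARDIO_STREAMLIT_PORT=8528

# Docker Compose: remap host ports in docker-compose.override.yml.
# Compose appends override ports to the base list instead of replacing them,
# so the original host ports stay bound unless you use !override
# (Compose 2.24.4+) or edit docker-compose.yml directly.
services:
  cardio-api:
    ports: !override
      - "8127:8126"
  cardio-ui:
    ports: !override
      - "8528:8536"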

Standalone Docker Compose (Offset Ports)

When running the Cardiology Intelligence Agent standalone (outside the main HCLS AI Factory stack), the standalone docker-compose.yml uses offset ports to avoid conflicts:

Service	Standalone Port	Purpose
Milvus gRPC	29530	Offset from main Milvus (19530)
Milvus Health	29091	Offset from main Milvus health (9091)

The agent is also integrated into the top-level docker-compose.dgx-spark.yml and the landing page health monitor for production deployments.

6. Environment Configuration

All settings use the CARDIO_ prefix and are managed via Pydantic BaseSettings:

Required Settings

# Anthropic API key (required for LLM synthesis)
CARDIO_ANTHROPIC_API_KEY=sk-ant-api03-...

Optional Settings

# Milvus connection (defaults shown)
CARDIO_MILVUS_HOST=localhost
CARDIO_MILVUS_PORT=19530

# API server
CARDIO_API_HOST=0.0.0.0
CARDIO_API_PORT=8126

# Streamlit
CARDIO_STREAMLIT_PORT=8536

# Embedding model
CARDIO_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
CARDIO_EMBEDDING_DIMENSION=384

# LLM
CARDIO_LLM_MODEL=claude-sonnet-4-6

# RAG search
CARDIO_TOP_K_PER_COLLECTION=5
CARDIO_SCORE_THRESHOLD=0.4

# PubMed ingest
CARDIO_NCBI_API_KEY=your-ncbi-key
CARDIO_PUBMED_MAX_RESULTS=5000

# Monitoring
CARDIO_METRICS_ENABLED=true

# Scheduler
CARDIO_INGEST_ENABLED=false
CARDIO_INGEST_SCHEDULE_HOURS=168

# CORS (comma-separated origins)
CARDIO_CORS_ORIGINS=http://localhost:8080,http://localhost:8126,http://localhost:8536

# Citation scoring
CARDIO_CITATION_HIGH_THRESHOLD=0.75
CARDIO_CITATION_MEDIUM_THRESHOLD=0.60

# Conversation
CARDIO_MAX_CONVERSATION_CONTEXT=3
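The mapping from CARDIO_* variables to settings fields is mechanical: strip the prefix, lower-case the name, and coerce the string to the field's type. The sketch below reproduces that mapping with a standard-library dataclass so it runs without Pydantic. `CardioSettingsSketch` is a hypothetical stand-in covering a subset of the variables above, not the agent's actual settings class, and the comma splitting for `cors_origins` follows the format noted for CARDIO_CORS_ORIGINS.

```python
import os
from dataclasses import dataclass, field, fields


@dataclass
class CardioSettingsSketch:
    """Hypothetical stand-in for the agent's Pydantic settings (subset of fields).

    Field names are the CARDIO_* variable names with the prefix stripped and
    lower-cased; defaults are the documented ones.
    """
    milvus_host: str = "localhost"
    milvus_port: int = 19530
    api_port: int = 8126
    streamlit_port: int = 8536
    anthropic_api_key: str = ""
    llm_model: str = "claude-sonnet-4-6"
    top_k_per_collection: int = 5
    score_threshold: float = 0.4
    metrics_enabled: bool = True
    ingest_enabled: bool = False
    ingest_schedule_hours: int = 168
    cors_origins: list = field(default_factory=list)

    @classmethod
    def from_env(cls, env=None, prefix="CARDIO_"):
        env = os.environ if env is None else env
        values = {}
        for f in fields(cls):
            raw = env.get(prefix + f.name.upper())
            if raw is None:
                continue  # variable unset: keep the default
            if f.type is bool:
                values[f.name] = raw.strip().lower() in {"1", "true", "yes", "on"}
            elif f.type is list:
                # Comma-separated, as documented for CARDIO_CORS_ORIGINS
                values[f.name] = [item.strip() for item in raw.split(",") if item.strip()]
            else:
                values[f.name] = f.type(raw)  # str / int / float coercion
        return cls(**values)


if __name__ == "__main__":
    print(CardioSettingsSketch.from_env())
```

For example, `CardioSettingsSketch.from_env({"CARDIO_MILVUS_PORT": "29530"}).milvus_port` is the integer 29530, while unset variables keep their documented defaults. Pydantic's BaseSettings layers validation and error reporting on top of the same idea.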

Startup Validation

The CardioSettings.validate() method checks the configuration at startup and logs warnings for:
Missing Milvus host/port
Missing Anthropic API key (search-only mode)
Missing embedding model
Port conflicts between API and Streamlit
Collection weights not summing to ~1.0
Non-existent RAG pipeline root directory
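The startup checks listed above amount to a function that inspects the loaded configuration and returns warnings instead of failing hard. A hypothetical re-creation over a plain dict follows; the key names (`collection_weights`, `rag_pipeline_root`, and so on) and the 0.05 tolerance used for "~1.0" are assumptions, not the agent's code:

```python
import math
import os


def validate_settings(cfg, weight_tolerance=0.05):
    """Return human-readable warnings for a settings dict (empty list = OK).

    Hypothetical re-creation of the startup checks listed above.
    """
    warnings = []
    if not cfg.get("milvus_host") or not cfg.get("milvus_port"):
        warnings.append("Milvus host/port is not set")
    if not cfg.get("anthropic_api_key"):
        warnings.append("No Anthropic API key: running in search-only mode")
    if not cfg.get("embedding_model"):
        warnings.append("No embedding model configured")
    api_port, ui_port = cfg.get("api_port"), cfg.get("streamlit_port")
    if api_port is not None and api_port == ui_port:
        warnings.append(f"API and Streamlit both use port {api_port}")
    weights = cfg.get("collection_weights") or {}
    total = sum(weights.values())
    if weights and not math.isclose(total, 1.0, abs_tol=weight_tolerance):
        warnings.append(f"Collection weights sum to {total:.2f}, expected ~1.0")
    root = cfg.get("rag_pipeline_root")
    if root and not os.path.isdir(root):
        warnings.append(f"RAG pipeline root does not exist: {root}")
    return warnings


if __name__ == "__main__":
    for w in validate_settings({"milvus_host": "localhost", "milvus_port": 19530,
                                "api_port": 8126, "streamlit_port": 8126}):
        print("WARNING:", w)
```

Returning warnings rather than raising keeps the service bootable in degraded configurations, such as search-only mode without an Anthropic key.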

7. Security Checklist

7.1 API Key Management

Store CARDIO_ANTHROPIC_API_KEY in the .env file (not in code or docker-compose.yml)
Add .env to .gitignore
Use environment variables or a secrets manager in production
Rotate API keys periodically
Set CARDIO_NCBI_API_KEY separately (different rotation schedule)

7.2 Network Security

Restrict external access to ports 8126 (API) and 8536 (UI) only
API authentication via X-API-Key header (enabled by default)
Keep Milvus (19530, 9091), etcd (2379), and MinIO (9000, 9001) on the internal network only
Use a reverse proxy (nginx/traefik) with TLS for production
Enable CORS restrictions for production domains
Rate limit API endpoints (100 req/min per IP; enabled by default)
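Clients must send the API key in the X-API-Key header. A minimal standard-library sketch for building such a request follows; the helper name is hypothetical, and since this guide lists no endpoint paths other than /health, substitute the ones your deployment exposes:

```python
import json
import urllib.request


def build_request(base_url, path, api_key, payload=None):
    """Build a urllib Request that authenticates with the X-API-Key header.

    If `payload` is given it is JSON-encoded and the request becomes a POST.
    """
    headers = {"X-API-Key": api_key, "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url.rstrip("/") + path, data=data, headers=headers)


if __name__ == "__main__":
    req = build_request("http://localhost:8126", "/health", api_key="replace-me")
    # urllib stores header names capitalized, e.g. "X-api-key"
    print(req.get_method(), req.full_url, req.header_items())
```

Send it with `urllib.request.urlopen(req, timeout=10)`. With the default rate limit, a client issuing more than 100 requests per minute from one IP should expect to be throttled.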

7.3 Data Security

Enable Milvus authentication if available
Encrypt MinIO storage at rest
Audit log API access
No PHI/PII should be stored in vector collections
All patient data in queries is transient (not persisted)

7.4 Application Security

Set MAX_REQUEST_SIZE_MB to prevent oversized payloads (default: 10)
Validate all input via Pydantic models (automatic)
Review CARDIO_CORS_ORIGINS for production (remove localhost origins)
Disable debug mode in production

7.5 Container Security

Use specific image tags (not latest) in docker-compose.yml
Run containers as non-root user
Limit container resource usage (CPU, memory limits)
Scan images for vulnerabilities
Keep base images updated

8. Monitoring with Prometheus

8.1 Metrics Endpoint

The FastAPI server exposes Prometheus metrics at:

GET http://localhost:8126/metrics

8.2 Available Metrics

Metric	Type	Description
`cardio_queries_total`	Counter	Total queries processed
`cardio_query_duration_seconds`	Histogram	Query processing latency
`cardio_risk_calculations_total`	Counter	Risk calculator invocations (by type)
`cardio_gdmt_optimizations_total`	Counter	GDMT optimization requests
`cardio_collection_search_duration`	Histogram	Per-collection search latency
`cardio_cross_modal_triggers_total`	Counter	Cross-modal triggers fired
`cardio_workflow_executions_total`	Counter	Workflow executions (by type)
`cardio_export_requests_total`	Counter	Export requests (by format)
`cardio_errors_total`	Counter	Error count (by type)

8.3 Prometheus Configuration

Add to your Prometheus prometheus.yml:

scrape_configs:
  - job_name: 'cardio-agent'
    scrape_interval: 15s
    static_configs:
      # Target must be reachable from Prometheus; inside the Compose network use cardio-api:8126
      - targets: ['localhost:8126']
    metrics_path: '/metrics'

8.4 Grafana Dashboard

Create a Grafana dashboard with panels for:
Query rate (queries per minute)
Query latency (p50, p95, p99)
Risk calculator usage distribution
GDMT optimization rate
Error rate
Collection search latency by collection
Cross-modal trigger rate

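8.5 Alerting Rules

Suggested Prometheus alerting rules:

groups:
  - name: cardio-agent
    rules:
      - alert: HighErrorRate
        # Errors as a fraction of queries (assumes failed queries increment cardio_errors_total)
        expr: sum(rate(cardio_errors_total[5m])) / sum(rate(cardio_queries_total[5m])) > 0.1
        for: 5m
        annotations:
          summary: "Cardio agent error rate above 10%"

      - alert: HighQueryLatency
        # Only fires if the histogram has a finite bucket bound above 10 s
        expr: histogram_quantile(0.95, sum by (le) (rate(cardio_query_duration_seconds_bucket[5m]))) > 10
        for: 5m
        annotations:
          summary: "Cardio agent p95 latency above 10 seconds"

      - alert: MilvusDown
        # Requires a scrape job named "milvus" (Milvus serves metrics on :9091/metrics)
        expr: up{job="milvus"} == 0
        for: 1m
        annotations:
          summary: "Milvus vector database is down"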
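Two details of the latency alert are easy to miss: histogram_quantile() interpolates linearly inside a bucket, and when the requested quantile falls in the +Inf bucket it can only return the largest finite bucket bound. If cardio_query_duration_seconds uses prometheus_client's default buckets (largest finite bound 10 s; this guide does not say whether the agent customizes them), a `> 10` threshold can never fire. The simplified single-series re-creation below is written for illustration, not copied from Prometheus, and shows both effects:

```python
import math


def histogram_quantile(q, buckets):
    """Simplified single-series version of PromQL histogram_quantile().

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with (math.inf, total). Assumes 0 < q < 1.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the +Inf bucket: only the highest finite bound can be reported.
                return buckets[i - 1][0]
            # Assume observations are spread uniformly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-2][0]


if __name__ == "__main__":
    # 100 queries: 50 under 1 s, 40 in 1-5 s, 5 in 5-10 s, 5 slower than 10 s
    b = [(1.0, 50), (5.0, 90), (10.0, 95), (math.inf, 100)]
    print(histogram_quantile(0.90, b))  # 5.0
    print(histogram_quantile(0.99, b))  # 10.0, although 5 queries took longer
```

The usual remedy is to define buckets above the alert threshold (for example 15 s and 30 s) or to alert on a threshold below the largest finite bucket.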

9. Backup and Recovery

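9.1 Milvus Data Backup

# List the cardio_* collections to back up
python -c "
from pymilvus import connections, utility
connections.connect(host='localhost', port='19530')
for col in utility.list_collections():
    if col.startswith('cardio_'):
        print(f'Backing up {col}...')
        # Export with the milvus-backup tool or a pymilvus query-based export
"

# Volume-level backup (Docker). Stop Milvus first for a consistent copy.
# This copies Milvus's own data directory only; in the Compose stack, metadata
# lives in etcd and segment files in MinIO, so back up those volumes too.
# Compose may prefix volume names with the project name (check docker volume ls).
docker run --rm -v milvus_data:/data -v $(pwd)/backup:/backup \
  alpine tar czf /backup/milvus-data-$(date +%Y%m%d).tar.gz /data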

9.2 Configuration Backup

# Back up configuration and environment
cp .env .env.backup.$(date +%Y%m%d)
cp docker-compose.yml docker-compose.yml.backup.$(date +%Y%m%d)

9.3 Recovery Procedure

# 1. Stop services
docker compose down

# 2. Restore Milvus data (if needed; use the same volume name(s) that were backed up)
docker run --rm -v milvus_data:/data -v $(pwd)/backup:/backup \
  alpine tar xzf /backup/milvus-data-YYYYMMDD.tar.gz -C /

# 3. Restore configuration
cp .env.backup.YYYYMMDD .env

# 4. Start services
docker compose up -d

# 5. Verify
curl http://localhost:8126/health

10. Scaling Considerations

10.1 Single-Node (DGX Spark)

The default deployment is optimized for single-node operation:
Milvus standalone mode
2 Uvicorn workers for the API
Single Streamlit instance
All services on one machine

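10.2 Multi-Worker API

To increase API throughput, raise the number of Uvicorn worker processes. Each worker is a separate process, so in-process state (for example, Prometheus counters without multiprocess mode, or an in-process ingest scheduler) is not shared between workers.

# Increase Uvicorn workers
uvicorn api.main:app --host 0.0.0.0 --port 8126 --workers 4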

10.3 Milvus Cluster Mode

For larger deployments with millions of vectors per collection:

Deploy Milvus in cluster mode (separate query nodes, data nodes, index nodes)
Increase nlist and nprobe parameters
Consider an HNSW index for lower search latency at high recall, at the cost of extra memory for the graph
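If you move from IVF_FLAT to HNSW, the index and search parameters keep the same dictionary shape as the defaults in section 4.1 but use different keys: `M` and `efConstruction` at build time and `ef` at query time, where Milvus requires `ef` to be at least `top_k`. The helper names are hypothetical, and the values below (M=16, efConstruction=200, ef=64) are common starting points, not tuned recommendations for this agent:

```python
def hnsw_index_params(m=16, ef_construction=200, metric="COSINE"):
    """HNSW index parameters in the same shape as the IVF_FLAT defaults."""
    if not 2 <= m <= 2048:
        raise ValueError("M must be in [2, 2048]")
    return {
        "index_type": "HNSW",
        "metric_type": metric,
        "params": {"M": m, "efConstruction": ef_construction},
    }


def hnsw_search_params(top_k, ef=64, metric="COSINE"):
    """HNSW search parameters; Milvus rejects ef values smaller than top_k."""
    if ef < top_k:
        raise ValueError(f"ef ({ef}) must be >= top_k ({top_k})")
    return {"metric_type": metric, "params": {"ef": ef}}


if __name__ == "__main__":
    print(hnsw_index_params())
    print(hnsw_search_params(top_k=5))
```

Larger `M` increases recall and graph memory; larger `ef` increases recall and query latency. Because `ef` must cover `top_k`, raising CARDIO_TOP_K_PER_COLLECTION may also require raising `ef`.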

10.4 Read Replicas

For high query volume:
Deploy multiple API server instances behind a load balancer
All instances share the same Milvus cluster
Streamlit can be deployed separately for different user groups

11. Updating and Maintenance

11.1 Code Updates

# Pull latest code
git pull

# Rebuild and restart
docker compose build
docker compose up -d

11.2 Knowledge Base Updates

# Re-run knowledge seed (idempotent)
python scripts/seed_knowledge.py

# Run live ingest for latest data
python scripts/run_ingest.py

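11.3 Dependency Updates

# Update Python dependencies (manual deployment)
pip install -r requirements.txt --upgrade

# Update Docker images: pull registry images, then rebuild locally built services
docker compose pull
docker compose build --pull
docker compose up -d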

11.4 Milvus Upgrades

Follow the Milvus upgrade guide. Key steps:
1. Back up all collection data
2. Stop Milvus
3. Update the Milvus image version in docker-compose.yml
4. Start Milvus (metadata migration runs automatically for supported upgrade paths; check the release notes)
5. Verify collections and vector counts

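11.5 Scheduled Ingest

To enable automatic weekly data updates:

# Enable in .env
CARDIO_INGEST_ENABLED=true
CARDIO_INGEST_SCHEDULE_HOURS=168  # Weekly

# Recreate the API container so it picks up the new .env values
# (docker compose restart does not apply environment changes)
docker compose up -d cardio-api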

The scheduler runs all 7 ingest parsers (PubMed, ClinicalTrials.gov, imaging, ECG, guidelines, devices, hemodynamics) on the configured interval.
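Conceptually, the scheduler is a loop that runs every parser and then sleeps for CARDIO_INGEST_SCHEDULE_HOURS. A minimal sketch follows; the parser identifiers and the `run_parser` hook are hypothetical, and running once immediately at startup is an assumption rather than documented behavior. Each parser runs in its own try block so that one failing source (for example, a PubMed outage) does not skip the rest:

```python
import threading
import time

# Hypothetical identifiers for the seven parsers named above.
INGEST_PARSERS = ("pubmed", "clinicaltrials", "imaging", "ecg",
                  "guidelines", "devices", "hemodynamics")


def run_all_parsers(run_parser, parsers=INGEST_PARSERS):
    """Run each parser in turn; return (name, error) pairs for any that failed."""
    failures = []
    for name in parsers:
        try:
            run_parser(name)
        except Exception as exc:  # isolate failures so the remaining parsers still run
            failures.append((name, repr(exc)))
    return failures


def run_every(interval_hours, job, stop_event, max_runs=None):
    """Call job() now and then every interval_hours until stop_event is set.

    Returns the number of completed runs; max_runs exists to make testing easy.
    """
    runs = 0
    while not stop_event.is_set():
        started = time.monotonic()
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Sleep for what is left of the interval; wait() returns early on stop.
        remaining = interval_hours * 3600 - (time.monotonic() - started)
        stop_event.wait(max(0.0, remaining))
    return runs


if __name__ == "__main__":
    stop = threading.Event()
    # Demo: three back-to-back runs with a stub parser that just logs its name.
    run_every(0.0, lambda: run_all_parsers(lambda name: print("ingest", name)),
              stop, max_runs=3)
```

Using `threading.Event.wait()` as the sleep lets a shutdown signal end the loop immediately instead of after a full 168-hour interval.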