Precision Biomarker Intelligence Agent -- Project Bible

Complete implementation reference for the Precision Biomarker Intelligence Agent, part of the HCLS AI Factory pipeline: Patient DNA -> Drug Candidates.

Version: 1.0.0
Author: Adam Jones
Date: March 2026
License: Apache 2.0
Repository: hcls-ai-factory/ai_agent_adds/precision_biomarker_agent

Table of Contents

Project Overview
Pipeline Pattern
DGX Spark Hardware
Repository Layout
Docker Compose Services
Milvus Collection Schemas
Pydantic Data Models
Configuration Reference
Embedding Strategy
Pharmacogenomic Engine
Biological Age Engine
Disease Trajectory Engine
Genotype Adjustment Engine
Critical Value Engine
Discordance Detector
Knowledge Graph
RAG Engine
Agent Orchestrator
Export Pipeline
Report Generator
FastAPI REST Server
Streamlit UI
Demo Patients
Cross-Agent Integration
Monitoring and Metrics
Audit Logging
Testing
Dependencies
Quick Start

1. Project Overview

The Precision Biomarker Intelligence Agent is a genotype-aware biomarker interpretation platform that transforms standard blood panel results and genomic data into actionable precision health intelligence. It is one of five intelligence agents in the HCLS AI Factory, which delivers a complete Patient DNA to Drug Candidates pipeline in under 5 hours on a single NVIDIA DGX Spark desktop workstation ($3,999).

Core capabilities:

14 pharmacogenes with CPIC-guided phenotyping (including CYP2D6, CYP2C19, CYP2C9, CYP3A5, SLCO1B1, VKORC1, MTHFR, TPMT, and DPYD)
Biological age estimation using PhenoAge (Levine 2018) and GrimAge surrogate
Disease trajectory prediction across 9 categories (diabetes, cardiovascular, liver, thyroid, iron, nutritional, kidney, bone health, cognitive)
Ashkenazi Jewish carrier screening panel (10 genes)
Genotype-based reference range adjustments for 7 modifier genes
Critical value alerting with escalation targets
Cross-biomarker discordance detection
Longitudinal biomarker trend tracking
12-section clinical report generation with PDF and FHIR R4 export
Multi-language report translation (English, Spanish, Chinese, Hindi, French, Arabic, Portuguese)

Codebase statistics:

Metric	Value
Total source lines	~20,200
Source modules (src/)	16 Python files
API modules (api/)	4 Python files
UI modules (app/)	3 Python files
Scripts	9 Python files
Test files	18 test files
Tests	709, all passing
Milvus collections	14 total
Reference data files	18 JSON files

2. Pipeline Pattern

The agent follows the HCLS AI Factory three-stage pipeline:

Stage 1: Genomics Pipeline (Parabricks/DeepVariant)
  FASTQ -> VCF -> genomic_evidence collection (shared, read-only)
      |
Stage 2: RAG + Intelligence Agents
      |
      +-> Biomarker Agent (this project)
      |     Multi-collection RAG + 6 analysis engines
      |     14 Milvus collections (13 owned + 1 shared)
      |
      +-> CAR-T Agent, Oncology Agent, Imaging Agent, Autoimmune Agent
      |
Stage 3: Drug Discovery Pipeline (BioNeMo/DiffDock/RDKit)
  PGx findings -> drug candidate optimization

Agent internal pipeline:

Patient Profile
    |
    v
[1. Plan]  -> Identify topics, disease areas, relevant modules
    |
    v
[2. Analyze] -> Run analysis modules in parallel:
    |   - BiologicalAgeCalculator (PhenoAge/GrimAge)
    |   - DiseaseTrajectoryAnalyzer (9 categories)
    |   - PharmacogenomicMapper (14 pharmacogenes)
    |   - GenotypeAdjuster (7 modifier genes)
    |   - CriticalValueEngine (21 rules)
    |   - DiscordanceDetector (cross-biomarker patterns)
    |   - LabRangeInterpreter (standard vs optimal)
    |
    v
[3. Search] -> Multi-collection RAG across 14 collections
    |
    v
[4. Synthesize] -> LLM-grounded synthesis with citations
    |
    v
[5. Report] -> 12-section clinical report (MD/PDF/FHIR)

3. DGX Spark Hardware

Spec	Value
GPU	NVIDIA Grace Blackwell, 128 GB
CPU	ARM-based Grace CPU, 20 cores
System RAM	128 GB unified memory
Storage	4 TB NVMe SSD
CUDA	12.x
Price	$3,999
Power	~250 W (desktop form factor)

The biomarker agent runs alongside Milvus, the embedding model, and the other four intelligence agents on this single machine. The embedding model (BGE-small-en-v1.5, 33M parameters) runs on CPU; the LLM (Claude) is called via the Anthropic API.

4. Repository Layout

precision_biomarker_agent/
|-- api/                         # FastAPI REST server
|   |-- __init__.py
|   |-- main.py                  (465 lines) Entry point, lifespan, core endpoints
|   +-- routes/
|       |-- __init__.py
|       |-- analysis.py          (495 lines) /v1/analyze, /v1/biological-age, etc.
|       |-- events.py            (326 lines) /v1/events/cross-modal, /biomarker-alert
|       +-- reports.py           (296 lines) /v1/report/generate, /{id}/pdf, /fhir
|
|-- app/                         # Streamlit UI
|   |-- __init__.py
|   |-- biomarker_ui.py          (1,863 lines) 8-tab Streamlit application
|   |-- patient_360.py           (670 lines) Cross-agent Patient 360 dashboard
|   +-- protein_viewer.py        (168 lines) 3D protein structure viewer
|
|-- config/
|   +-- settings.py              (139 lines) PrecisionBiomarkerSettings (Pydantic)
|
|-- data/
|   |-- cache/                   # Runtime cache directory
|   +-- reference/               # 18 JSON seed files
|       |-- biomarker_reference.json
|       |-- biomarker_genetic_variants.json
|       |-- biomarker_pgx_rules.json
|       |-- biomarker_disease_trajectories.json
|       |-- biomarker_clinical_evidence.json
|       |-- biomarker_nutrition.json
|       |-- biomarker_drug_interactions.json
|       |-- biomarker_aging_markers.json
|       |-- biomarker_genotype_adjustments.json
|       |-- biomarker_monitoring.json
|       |-- biomarker_critical_values.json
|       |-- biomarker_discordance_rules.json
|       |-- biomarker_aj_carrier_screening.json
|       |-- biomarker_genomic_evidence.json
|       |-- biomarker_lab_ranges.json
|       |-- biomarker_longitudinal_tracking.json
|       |-- biomarker_sample_patients.json
|       +-- nutrition_guidelines_seed.json
|
|-- src/                         # Core engine modules
|   |-- __init__.py
|   |-- models.py                (786 lines) 14 collection + 8 analysis models + 3 enums
|   |-- collections.py           (1,391 lines) Milvus collection management
|   |-- rag_engine.py            (573 lines) Multi-collection RAG with parallel search
|   |-- agent.py                 (610 lines) Autonomous plan-analyze-search-synthesize
|   |-- knowledge.py             (1,326 lines) 6 disease domain knowledge graphs
|   |-- pharmacogenomics.py      (1,503 lines) CPIC-guided phenotyping, 14 pharmacogenes
|   |-- disease_trajectory.py    (1,421 lines) Pre-symptomatic detection, 9 categories
|   |-- biological_age.py        (408 lines) PhenoAge + GrimAge surrogate
|   |-- genotype_adjustment.py   (1,225 lines) Genotype-based reference range adjustments
|   |-- export.py                (1,392 lines) PDF + FHIR R4 + CSV + JSON + Markdown
|   |-- report_generator.py      (993 lines) 12-section clinical reports
|   |-- critical_values.py       (179 lines) Critical threshold detection (21 rules)
|   |-- discordance_detector.py  (299 lines) Cross-biomarker anomaly detection
|   |-- lab_range_interpreter.py (221 lines) Standard vs optimal range interpretation
|   |-- translation.py           (217 lines) Multi-language medical terminology
|   +-- audit.py                 (83 lines) HIPAA-compliant audit logging
|
|-- scripts/                     # Setup, seeding, and validation
|   |-- setup_collections.py     (57 lines) Create Milvus collections
|   |-- seed_all.py              (207 lines) Seed all 14 collections from JSON
|   |-- gen_patient_data.py      (345 lines) Generate sample patient data
|   |-- gen_critical_values.py   (510 lines) Generate critical value rules
|   |-- gen_lab_ranges_and_aj.py (527 lines) Generate lab ranges + AJ screening data
|   |-- expand_biomarker_reference.py  (580 lines) Expand reference data
|   |-- expand_variants_and_interactions.py (436 lines) Expand variant data
|   |-- demo_validation.py       (333 lines) End-to-end demo validation
|   +-- validate_e2e.py          (178 lines) E2E validation script
|
|-- tests/                       # 18 test files, 709 tests
|   |-- conftest.py              Shared fixtures (mock embedder, LLM, manager)
|   |-- test_models.py
|   |-- test_collections.py
|   |-- test_rag_engine.py
|   |-- test_agent.py
|   |-- test_pharmacogenomics.py
|   |-- test_biological_age.py
|   |-- test_disease_trajectory.py
|   |-- test_genotype_adjustment.py
|   |-- test_critical_values.py
|   |-- test_discordance_detector.py
|   |-- test_lab_range_interpreter.py
|   |-- test_report_generator.py
|   |-- test_export.py
|   |-- test_api.py
|   |-- test_integration.py
|   |-- test_edge_cases.py
|   |-- test_longitudinal.py
|   +-- test_ui.py
|
|-- docker-compose.yml           6-service stack
|-- Dockerfile                   Multi-stage Python 3.10-slim
|-- requirements.txt             Python dependencies
|-- pyproject.toml               Project metadata
+-- README.md                    Quick start guide

5. Docker Compose Services

File: docker-compose.yml -- 6 services on biomarker-network bridge.

Service	Image	Port(s)	Purpose
`milvus-etcd`	`quay.io/coreos/etcd:v3.5.5`	(internal)	Milvus metadata store (4 GB quota)
`milvus-minio`	`minio/minio:RELEASE.2023-03-20T20-16-18Z`	(internal)	Milvus object storage (log/index)
`milvus-standalone`	`milvusdb/milvus:v2.4-latest`	19530, 9091	Vector database (gRPC + health)
`biomarker-streamlit`	Built from `./Dockerfile`	8528:8528	Streamlit UI (8 tabs)
`biomarker-api`	Built from `./Dockerfile`	8529:8529	FastAPI REST server (uvicorn, 2 workers)
`biomarker-setup`	Built from `./Dockerfile`	--	One-shot: create collections + seed

DGX Spark external port mapping (from docker-compose.dgx-spark.yml):

Internal Port	External Port	Service
8528	8502	Streamlit UI
8529	8102	FastAPI API

Dockerfile: Multi-stage build on python:3.10-slim. Builder stage compiles native extensions for sentence-transformers/numpy. Runtime stage copies the virtualenv and runs as non-root biomarkeruser. Default CMD launches Streamlit; the API container overrides with uvicorn api.main:app.

Startup sequence:

cp .env.example .env                     # Set ANTHROPIC_API_KEY
docker compose up -d                     # Start all services
docker compose logs -f biomarker-setup   # Watch seed progress

Setup runs: setup_collections.py --drop-existing then seed_all.py, which reads 14 JSON files from data/reference/, embeds text chunks with BGE-small-en-v1.5, and inserts into Milvus.

6. Milvus Collection Schemas

14 collections total: 13 biomarker-specific (owned) + 1 shared read-only.

Embedding configuration:

Setting	Value
Model	BAAI/bge-small-en-v1.5
Dimensions	384
Metric	COSINE
Index	IVF_FLAT

6.1 biomarker_reference

Reference biomarker definitions with standard and optimal ranges.

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim BGE-small
name	VARCHAR(100)	Display name (e.g., "HbA1c")
unit	VARCHAR(20)	e.g., "%", "mg/dL", "ng/mL"
category	VARCHAR(30)	CBC, CMP, Lipids, Thyroid, etc.
ref_range_min	FLOAT	Standard lower bound
ref_range_max	FLOAT	Standard upper bound
text_chunk	VARCHAR(3000)	Text for embedding
clinical_significance	VARCHAR(2000)	Clinical interpretation
epigenetic_clock	VARCHAR(50)	PhenoAge/GrimAge coefficient
genetic_modifiers	VARCHAR(500)	Comma-separated modifier genes

6.2 biomarker_genetic_variants

Genetic variants affecting biomarker levels (MTHFR, APOE, PNPLA3, HFE, etc.).

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
gene	VARCHAR(50)	Gene symbol
rs_id	VARCHAR(20)	dbSNP rsID
risk_allele	VARCHAR(5)	Risk allele
protective_allele	VARCHAR(5)	Protective allele
effect_size	VARCHAR(100)	Effect size description
mechanism	VARCHAR(2000)	Molecular mechanism
disease_associations	VARCHAR(1000)	Disease associations
text_chunk	VARCHAR(3000)	Text for embedding

6.3 biomarker_pgx_rules

CPIC pharmacogenomic dosing rules.

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
gene	VARCHAR(50)	Pharmacogene (CYP2D6, etc.)
star_alleles	VARCHAR(100)	Star allele combination
drug	VARCHAR(100)	Drug name
phenotype	VARCHAR(30)	MetabolizerPhenotype enum
cpic_level	VARCHAR(5)	CPIC evidence level (1A-3)
recommendation	VARCHAR(2000)	Dosing recommendation
evidence_url	VARCHAR(500)	CPIC/PharmGKB URL
text_chunk	VARCHAR(3000)	Text for embedding

6.4 biomarker_disease_trajectories

Disease progression trajectories with intervention windows.

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
disease	VARCHAR(30)	DiseaseCategory enum
stage	VARCHAR(30)	Progression stage
biomarker_pattern	VARCHAR(2000)	JSON thresholds for this stage
years_to_diagnosis	FLOAT	Estimated years to diagnosis
intervention_window	VARCHAR(500)	Intervention opportunity
risk_reduction_pct	FLOAT	Potential risk reduction %
text_chunk	VARCHAR(3000)	Text for embedding

6.5 biomarker_clinical_evidence

Published clinical evidence with PubMed references.

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
pmid	VARCHAR(20)	PubMed ID
title	VARCHAR(500)	Publication title
finding	VARCHAR(3000)	Key finding
year	INT16	Publication year
disease_area	VARCHAR(100)	Disease area
text_chunk	VARCHAR(3000)	Text for embedding

6.6 biomarker_nutrition

Genotype-aware nutrition guidelines.

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
nutrient	VARCHAR(100)	Nutrient name
genetic_context	VARCHAR(200)	e.g., "MTHFR C677T heterozygous"
recommended_form	VARCHAR(200)	e.g., "methylfolate"
dose_range	VARCHAR(100)	e.g., "400-800 mcg/day"
evidence_summary	VARCHAR(2000)	Evidence summary
text_chunk	VARCHAR(3000)	Text for embedding

6.7 biomarker_drug_interactions

Gene-drug interactions beyond PGx (substrate/inhibitor/inducer relationships).

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
drug	VARCHAR(100)	Drug name
gene	VARCHAR(50)	Gene involved
interaction_type	VARCHAR(50)	substrate, inhibitor, inducer
text_chunk	VARCHAR(3000)	Text for embedding

6.8 biomarker_aging_markers

Epigenetic aging clock marker data (PhenoAge, GrimAge coefficients).

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
text_chunk	VARCHAR(3000)	Text for embedding

6.9 biomarker_genotype_adjustments

Genotype-based reference range adjustments for 7 modifier genes.

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
text_chunk	VARCHAR(3000)	Text for embedding

6.10 biomarker_monitoring

Condition-specific monitoring protocols (frequency, biomarkers to track).

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
text_chunk	VARCHAR(3000)	Text for embedding

6.11 biomarker_critical_values

Critical threshold definitions for life-threatening lab values.

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
text_chunk	VARCHAR(3000)	Text for embedding

6.12 biomarker_discordance_rules

Cross-biomarker discordance patterns (contradictory or unexpected relationships).

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
text_chunk	VARCHAR(3000)	Text for embedding

6.13 biomarker_aj_carrier_screening

Ashkenazi Jewish carrier screening panel (10 genes: BRCA1/2, GBA, HEXA, FANCC, ASPA, BLM, SMPD1, IKBKAP/ELP1, MCOLN1).

Field	Type	Notes
id	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
text_chunk	VARCHAR(3000)	Text for embedding

6.14 genomic_evidence (shared, read-only)

Shared VCF-derived genomic variants from the Genomics Pipeline. Populated by Stage 1 (Parabricks/DeepVariant). The biomarker agent reads but never writes to this collection.

Collection search weights (must sum to ~1.0):

Collection	Weight
biomarker_reference	0.12
biomarker_genetic_variants	0.11
biomarker_pgx_rules	0.10
biomarker_disease_trajectories	0.10
biomarker_clinical_evidence	0.09
genomic_evidence	0.08
biomarker_drug_interactions	0.07
biomarker_aging_markers	0.07
biomarker_nutrition	0.05
biomarker_genotype_adjustments	0.05
biomarker_monitoring	0.05
biomarker_critical_values	0.04
biomarker_discordance_rules	0.04
biomarker_aj_carrier_screening	0.03
Total	1.00

7. Pydantic Data Models

File: src/models.py (786 lines)

7.1 Enums (7 total)

Enum	Values
`RiskLevel`	critical, high, moderate, low, normal
`ClockType`	PhenoAge, GrimAge
`DiseaseCategory`	diabetes, cardiovascular, liver, thyroid, iron, nutritional, kidney, bone_health, cognitive
`MetabolizerPhenotype`	ultra_rapid, normal, intermediate, poor
`CPICLevel`	1A, 1B, 2A, 2B, 3
`Zygosity`	homozygous_ref, heterozygous, homozygous_alt

7.2 Collection Models (14)

Each collection model inherits BaseModel and provides:
- Schema fields matching the Milvus collection
- A to_embedding_text() method for generating BGE-small embedding input
- A @model_validator for cross-field validation

Models: BiomarkerReference, GeneticVariant, PGxRule, DiseaseTrajectory, ClinicalEvidence, NutritionGuideline, DrugInteraction, AgingMarker, GenotypeAdjustment, MonitoringProtocol, CriticalValue, DiscordanceRule, AJCarrierScreeningEntry, (shared: GenomicEvidence).

7.3 Analysis Models (8+)

Model	Purpose
`PatientProfile`	Patient demographics, biomarkers, genotypes
`SearchHit`	Single RAG search result with score
`CrossCollectionResult`	Merged results across all collections
`AgentQuery`	Structured query with filters
`AgentResponse`	Agent response with citations
`AnalysisResult`	Full analysis output (all modules)
`BiologicalAgeResult`	PhenoAge/GrimAge calculation result
`DiseaseTrajectoryResult`	Disease risk trajectory output
`PGxResult`	Pharmacogenomic mapping output
`GenotypeAdjustmentResult`	Adjusted reference ranges

8. Configuration Reference

File: config/settings.py -- PrecisionBiomarkerSettings extends Pydantic BaseSettings with env_prefix="BIOMARKER_".

8.1 Environment Variables

All settings can be overridden via environment variables prefixed with BIOMARKER_. The .env file is also loaded automatically.

Variable	Default	Description
`BIOMARKER_MILVUS_HOST`	`localhost`	Milvus gRPC host
`BIOMARKER_MILVUS_PORT`	`19530`	Milvus gRPC port
`BIOMARKER_API_HOST`	`0.0.0.0`	API bind address
`BIOMARKER_API_PORT`	`8529`	FastAPI port
`BIOMARKER_STREAMLIT_PORT`	`8528`	Streamlit port
`ANTHROPIC_API_KEY`	(none)	Claude API key (required)
`BIOMARKER_LLM_PROVIDER`	`anthropic`	LLM provider
`BIOMARKER_LLM_MODEL`	`claude-sonnet-4-6`	Claude model ID
`BIOMARKER_EMBEDDING_MODEL`	`BAAI/bge-small-en-v1.5`	Embedding model
`BIOMARKER_EMBEDDING_DIMENSION`	`384`	Embedding dimensions
`BIOMARKER_EMBEDDING_BATCH_SIZE`	`32`	Embedding batch size
`BIOMARKER_TOP_K_PER_COLLECTION`	`5`	Results per collection
`BIOMARKER_SCORE_THRESHOLD`	`0.4`	Minimum cosine similarity
`BIOMARKER_REQUEST_TIMEOUT_SECONDS`	`60`	HTTP request timeout
`BIOMARKER_MILVUS_TIMEOUT_SECONDS`	`10`	Milvus operation timeout
`BIOMARKER_LLM_MAX_RETRIES`	`3`	LLM retry count
`BIOMARKER_MAX_CONVERSATION_CONTEXT`	`3`	Conversation memory depth
`BIOMARKER_CITATION_HIGH_THRESHOLD`	`0.75`	High-confidence citation cutoff
`BIOMARKER_CITATION_MEDIUM_THRESHOLD`	`0.60`	Medium-confidence cutoff
`BIOMARKER_CORS_ORIGINS`	`http://localhost:8080,...`	CORS allowed origins
`BIOMARKER_MAX_REQUEST_SIZE_MB`	`10`	Maximum request body size
`BIOMARKER_API_KEY`	(empty)	API auth key (empty = no auth)
`BIOMARKER_METRICS_ENABLED`	`true`	Enable Prometheus metrics

8.2 Weight Overrides

Collection weights are individually overridable:

BIOMARKER_WEIGHT_BIOMARKER_REF=0.15
BIOMARKER_WEIGHT_GENETIC_VARIANTS=0.12
# ... etc.

A @model_validator warns if weights do not sum to approximately 1.0.
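The override-and-check behaviour described above can be sketched without Pydantic. This is a minimal, dependency-free illustration, not the code in config/settings.py: the function names, the lower-cased weight keys, the `remaining_collections` placeholder, and the 0.05 tolerance are all assumptions made for the example.

```python
import warnings
from typing import Dict, Mapping

WEIGHT_PREFIX = "BIOMARKER_WEIGHT_"  # prefix taken from the examples above


def load_weight_overrides(env: Mapping[str, str],
                          defaults: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of defaults updated with BIOMARKER_WEIGHT_* values from env."""
    weights = dict(defaults)
    for key, raw in env.items():
        if key.startswith(WEIGHT_PREFIX):
            # Hypothetical naming rule: strip the prefix and lower-case the rest.
            weights[key[len(WEIGHT_PREFIX):].lower()] = float(raw)
    return weights


def check_weight_sum(weights: Mapping[str, float], tolerance: float = 0.05) -> float:
    """Warn (rather than fail) when the weights drift away from a total of 1.0."""
    total = sum(weights.values())
    if abs(total - 1.0) > tolerance:
        warnings.warn(f"Collection weights sum to {total:.2f}, expected ~1.0")
    return total


# Illustrative defaults: two real weights plus a placeholder for the other twelve.
DEFAULTS = {"biomarker_ref": 0.12, "genetic_variants": 0.11, "remaining_collections": 0.77}
```

Passing a plain mapping instead of reading os.environ directly keeps the helper easy to test; in practice `os.environ` can be passed as `env`.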

9. Embedding Strategy

Parameter	Value
Model	BAAI/bge-small-en-v1.5
Parameters	33M
Dimensions	384
Metric	COSINE
Index type	IVF_FLAT
Batch size	32
Runtime	CPU (no GPU required)

Each collection model provides a to_embedding_text() method that constructs a domain-optimized text representation. Examples:

BiomarkerReference: "{name} ({unit}). {text_chunk}. Significance: {clinical_significance}. Category: {category}. Genetic modifiers: {genetic_modifiers}"
PGxRule: "{gene} {star_alleles} -- {drug}. {text_chunk}. Recommendation: {recommendation}. Phenotype: {phenotype}. CPIC Level: {cpic_level}"
GeneticVariant: "{gene} {rs_id}. {text_chunk}. Mechanism: {mechanism}. Diseases: {disease_associations}"

The seed_all.py script reads JSON seed files, calls to_embedding_text() on each record, embeds with SentenceTransformer, and inserts into Milvus.
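As an illustration of the to_embedding_text() pattern, the sketch below reproduces the PGxRule template shown above. It uses a plain dataclass as a stand-in for the Pydantic model so that it runs without third-party packages; the class name and the sample field values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PGxRuleSketch:
    """Dependency-free stand-in for the PGxRule collection model."""
    gene: str
    star_alleles: str
    drug: str
    text_chunk: str
    recommendation: str
    phenotype: str
    cpic_level: str

    def to_embedding_text(self) -> str:
        # Mirrors the PGxRule template listed in the examples above.
        return (
            f"{self.gene} {self.star_alleles} -- {self.drug}. {self.text_chunk}. "
            f"Recommendation: {self.recommendation}. Phenotype: {self.phenotype}. "
            f"CPIC Level: {self.cpic_level}"
        )


# Illustrative record; the wording of text_chunk and recommendation is made up.
rule = PGxRuleSketch(
    gene="CYP2D6",
    star_alleles="*4/*4",
    drug="codeine",
    text_chunk="Two no-function alleles",
    recommendation="Avoid codeine",
    phenotype="poor",
    cpic_level="1A",
)
embedding_input = rule.to_embedding_text()
```

The resulting string is what would be passed to the embedding model; putting the gene, allele, and drug first keeps the most discriminating terms at the start of the text.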

10. Pharmacogenomic Engine

File: src/pharmacogenomics.py (1,503 lines)

Maps star alleles and genotypes to drug recommendations following CPIC Level 1A guidelines. Pure computation -- no LLM or database calls required.

10.1 Covered Pharmacogenes (14)

Gene	Description	CPIC Level	Key Drugs
CYP2D6	Metabolizes ~25% of drugs	1A	Codeine, tramadol, tamoxifen
CYP2C19	Clopidogrel, PPIs, antidepressants	1A	Clopidogrel, omeprazole
CYP2C9	Warfarin, NSAIDs, phenytoin	1A	Warfarin, celecoxib
CYP3A5	Tacrolimus metabolism	1A	Tacrolimus
SLCO1B1	Statin hepatic uptake transporter	1A	Simvastatin, atorvastatin
VKORC1	Warfarin target sensitivity	1A	Warfarin
MTHFR	Folate metabolism, homocysteine	Informational	Methotrexate
TPMT	Thiopurine metabolism	1A	Azathioprine, mercaptopurine
DPYD	Fluoropyrimidine metabolism	1A	5-FU, capecitabine

10.2 Phenotype Classification

Uses CPIC standard metabolizer terminology:
- CYP enzymes: Normal / Intermediate / Poor / Ultra-rapid / Rapid Metabolizer
- SLCO1B1: Normal / Intermediate / Poor Function (transporter activity)
- MTHFR: Normal / Intermediate / Reduced Activity (enzyme activity)
- VKORC1: Normal / Intermediate / High Sensitivity (drug sensitivity)
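To make the metabolizer terminology concrete, here is a minimal sketch of diplotype-to-phenotype classification for CYP2C19, restricted to four common star alleles. It is not the logic in src/pharmacogenomics.py: the function name, the lookup table, and the returned strings are assumptions chosen to resemble the MetabolizerPhenotype values.

```python
from typing import Dict

# Function class for a few common CYP2C19 star alleles (a deliberately small subset).
ALLELE_FUNCTION: Dict[str, str] = {
    "*1": "normal",
    "*2": "none",
    "*3": "none",
    "*17": "increased",
}


def cyp2c19_phenotype(diplotype: str) -> str:
    """Classify a diplotype such as '*1/*2' into a metabolizer phenotype."""
    alleles = diplotype.split("/")
    if len(alleles) != 2 or any(a not in ALLELE_FUNCTION for a in alleles):
        raise ValueError(f"Unsupported CYP2C19 diplotype: {diplotype!r}")
    functions = [ALLELE_FUNCTION[a] for a in alleles]
    no_function = functions.count("none")
    increased = functions.count("increased")
    if no_function == 2:
        return "poor"
    if no_function == 1:
        return "intermediate"  # applies even when the other allele is *17
    if increased == 2:
        return "ultra_rapid"
    if increased == 1:
        return "rapid"
    return "normal"
```

Checking no-function alleles before increased-function alleles is what makes a diplotype like `*2/*17` come out as intermediate rather than rapid.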

10.3 Drug Recommendation Actions

Each drug recommendation includes an action category:

Action	Meaning
`STANDARD_DOSING`	No change needed
`DOSE_REDUCTION`	Reduce dose per recommendation
`DOSE_ADJUSTMENT`	Adjust dose based on context
`CONSIDER_ALTERNATIVE`	Current drug may work, alternative preferred
`AVOID`	Do not use this drug
`CONTRAINDICATED`	Absolute contraindication (FDA/EMA)

Alert levels: INFO (routine), WARNING (clinical review), CRITICAL (immediate action).

10.4 Usage

from src.pharmacogenomics import PharmacogenomicMapper

mapper = PharmacogenomicMapper()
result = mapper.map_star_alleles({"CYP2D6": "*4/*4", "CYP2C19": "*1/*2"})
# Returns: PGxResult with phenotypes, drug recommendations, alerts

11. Biological Age Engine

File: src/biological_age.py (408 lines)

Implements PhenoAge (Levine et al. 2018, PMID:29676998) and GrimAge surrogate estimation from standard blood biomarkers. Pure computation.

11.1 PhenoAge Algorithm

Uses 9 blood biomarkers with published coefficients from the dayoonkwon/BioAge R package:

Biomarker	Coefficient	Unit (input)	Unit (internal SI)
Albumin	-0.0336	g/dL	g/L (x10)
Creatinine	+0.0095	mg/dL	umol/L (x88.4)
Glucose	+0.1953	mg/dL	mmol/L (/18.016)
ln(CRP)	+0.0954	mg/L	ln(mg/L)
Lymphocyte %	-0.0120	%	%
MCV	+0.0268	fL	fL
RDW	+0.3306	%	%
Alkaline Phosphatase	+0.0019	U/L	U/L
WBC	+0.0554	10^3/uL	10^3/uL

Intercept: -19.9067. The module accepts standard US clinical units and converts internally before applying coefficients.

11.2 Gompertz Mortality Model

Converts the linear predictor to biological age via Gompertz mortality parameters:
- Mortality numerator: -1.51714
- Mortality denominator (gamma): 0.007692696
- BA intercept: 141.50225

Standard error: 4.9 years (from NHANES III validation).
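The conversion step can be sketched as follows. The three constants listed above come from this document; the two remaining constants (-0.0055305 and 0.09165) are not stated here and are taken from the published PhenoAge model, so treat them as assumptions to verify against src/biological_age.py. The function names are hypothetical, and the sketch takes the linear predictor `xb` as an input rather than rebuilding it from biomarkers.

```python
import math

# Constants listed in this section.
MORTALITY_NUMERATOR = -1.51714
GAMMA = 0.007692696
BA_INTERCEPT = 141.50225
# Assumed from the published PhenoAge model; not stated in this document.
BA_LOG_SCALE = -0.0055305
BA_DENOMINATOR = 0.09165


def mortality_score(xb: float) -> float:
    """Gompertz mortality score for a given linear predictor xb."""
    return 1.0 - math.exp(MORTALITY_NUMERATOR * math.exp(xb) / GAMMA)


def phenoage_from_linear_predictor(xb: float) -> float:
    """Convert the linear predictor into a biological age in years."""
    m = mortality_score(xb)
    return BA_INTERCEPT + math.log(BA_LOG_SCALE * math.log(1.0 - m)) / BA_DENOMINATOR
```

Because both logarithms undo the exponentials, the result is linear in `xb`: each unit increase in the linear predictor adds 1 / 0.09165 (about 10.9) years under these constants.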

11.3 GrimAge Surrogate

Uses available blood biomarkers to approximate GrimAge components when DNA methylation data is unavailable. Returns estimated biological age with confidence interval.

11.4 Usage

from src.biological_age import BiologicalAgeCalculator

calc = BiologicalAgeCalculator()
result = calc.calculate(
    chronological_age=45,
    biomarkers={
        "albumin": 4.2,        # g/dL
        "creatinine": 0.9,     # mg/dL
        "glucose": 95,         # mg/dL
        "hs_crp": 1.2,        # mg/L
        "lymphocyte_pct": 30,  # %
        "mcv": 88,             # fL
        "rdw": 13.2,           # %
        "alkaline_phosphatase": 65,  # U/L
        "wbc": 6.5,            # 10^3/uL
    },
)
# Returns: BiologicalAgeResult with phenoage, grimage, delta, interpretation

12. Disease Trajectory Engine

File: src/disease_trajectory.py (1,421 lines)

Detects pre-symptomatic disease trajectories across 9 categories using genotype-stratified biomarker thresholds. Pure computation.

12.1 Disease Categories (9)

Category	Display Name	Key Biomarkers	Genetic Modifiers
type2_diabetes	Type 2 Diabetes	HbA1c, fasting glucose, insulin, HOMA-IR	TCF7L2, PPARG, SLC30A8, KCNJ11, GCKR
cardiovascular	Cardiovascular Disease	Lp(a), LDL, ApoB, hs-CRP, HDL, TG	APOE, PCSK9, LPA, IL6
liver	Liver Disease (NAFLD)	ALT, AST, GGT, ferritin, platelets	PNPLA3, TM6SF2, HSD17B13
thyroid	Thyroid Dysfunction	TSH, free T4, free T3	DIO2, DIO1
iron	Iron Metabolism Disorder	Ferritin, transferrin sat, serum iron	HFE C282Y, HFE H63D
nutritional	Nutritional Deficiency	Omega-3, Vit D, B12, folate, Mg, Zn	FADS1, FADS2, VDR, BCMO1, FUT2
kidney	Kidney Disease	eGFR, cystatin C, BUN, urine ACR	APOL1
bone_health	Bone Health	Calcium, PTH, Vit D, CTX, P1NP	VDR, COL1A1
cognitive	Cognitive Decline	Homocysteine, B12, folate, hs-CRP	APOE, MTHFR

12.2 Genotype-Stratified Thresholds

Thresholds are adjusted based on genotype via GENOTYPE_THRESHOLDS in the knowledge module:

# TCF7L2 risk alleles lower the HbA1c threshold for concern:
#   0 risk alleles: 6.0%
#   1 risk allele:  5.8%
#   2 risk alleles: 5.5%

# PNPLA3 I148M genotype adjusts ALT upper limit:
#   CC (wild-type): 56 U/L
#   CG (heterozygous): 45 U/L
#   GG (homozygous): 35 U/L
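
A genotype-stratified lookup using the values listed above might look like the sketch below. The dictionary layout and function names are hypothetical (the actual structure of GENOTYPE_THRESHOLDS is not shown here), and the choice of `>=` versus `>` at the boundary is an assumption.

```python
from typing import Dict

# HbA1c threshold for concern (%), keyed by number of TCF7L2 risk alleles.
HBA1C_CONCERN_BY_TCF7L2_RISK_ALLELES: Dict[int, float] = {0: 6.0, 1: 5.8, 2: 5.5}

# ALT upper limit (U/L), keyed by PNPLA3 I148M genotype.
ALT_UPPER_LIMIT_BY_PNPLA3: Dict[str, float] = {"CC": 56.0, "CG": 45.0, "GG": 35.0}


def hba1c_flagged(hba1c_pct: float, tcf7l2_risk_alleles: int) -> bool:
    """True when HbA1c reaches the threshold for this TCF7L2 genotype."""
    return hba1c_pct >= HBA1C_CONCERN_BY_TCF7L2_RISK_ALLELES[tcf7l2_risk_alleles]


def alt_flagged(alt_u_per_l: float, pnpla3_genotype: str) -> bool:
    """True when ALT exceeds the upper limit for this PNPLA3 genotype."""
    return alt_u_per_l > ALT_UPPER_LIMIT_BY_PNPLA3[pnpla3_genotype]
```

With these tables, an HbA1c of 5.7% is unremarkable for a patient with no TCF7L2 risk alleles but is flagged for a patient with two, which is the behaviour the thresholds are meant to produce.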

12.3 Stage Progression

Each disease defines ordered stages. The engine identifies the current stage, estimates years to clinical diagnosis, and calculates the intervention window with potential risk reduction percentage.

13. Genotype Adjustment Engine

File: src/genotype_adjustment.py (1,225 lines)

Adjusts standard biomarker reference ranges based on individual genotype, age, sex, and ancestry. Pure computation.

13.1 Modifier Genes (7)

Gene	Affected Biomarkers	Adjustment Mechanism
MTHFR	Homocysteine, folate	Reduced enzyme activity
APOE	LDL, total cholesterol	Lipid metabolism variation
PNPLA3	ALT, AST	Hepatic lipid accumulation
HFE	Ferritin, transferrin saturation	Iron absorption dysregulation
DIO2	TSH, free T4	Impaired T4-to-T3 conversion
VDR	Vitamin D, calcium	Vitamin D receptor sensitivity
FADS1	Omega-3 index	Fatty acid desaturation

13.2 Age-Stratified Brackets

Five age brackets are used for reference range stratification: 0-17, 18-39, 40-59, 60-79, and 80+.

Sources: NHANES III, Framingham Heart Study, KDIGO 2012, ATA 2017, ADA 2024, ACC/AHA 2019, Endocrine Society guidelines.
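
Mapping an age to one of the five brackets is a simple boundary search. The sketch below is one way to do it with the standard library; the function and constant names are hypothetical and not taken from src/genotype_adjustment.py.

```python
from bisect import bisect_right

BRACKET_STARTS = [0, 18, 40, 60, 80]  # lower bound of each bracket, in years
BRACKET_LABELS = ["0-17", "18-39", "40-59", "60-79", "80+"]


def age_bracket(age: int) -> str:
    """Return the label of the age bracket that contains the given age."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many lower bounds are <= age.
    return BRACKET_LABELS[bisect_right(BRACKET_STARTS, age) - 1]
```

For example, `age_bracket(45)` returns `"40-59"`, and the boundary ages 17 and 18 fall into `"0-17"` and `"18-39"` respectively.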

13.3 Ancestry-Aware Adjustments

Population-specific biomarker adjustments from the knowledge module (ANCESTRY_ADJUSTMENTS):
- African: Higher Lp(a) prevalence, lower TG, higher creatinine
- South Asian: Lower LDL/HbA1c thresholds due to higher CVD risk
- East Asian: Lower ALT limits, statin sensitivity
- Hispanic: Higher NAFLD prevalence, earlier diabetes screening

14. Critical Value Engine

File: src/critical_values.py (179 lines)

Evaluates patient biomarker values against 21 critical threshold rules loaded from biomarker_critical_values.json. Real-time alerting for life-threatening lab values.

14.1 Alert Severity Levels

Level	Meaning	Response Time
CRITICAL	Immediately life-threatening	Immediate
URGENT	Requires prompt clinical action	Within hours
WARNING	Clinically significant deviation	Next visit

14.2 Covered Biomarkers (15)

Platelet Count, Glucose, Potassium, INR, Sodium, Hemoglobin, Calcium, Troponin I, WBC Count, Creatinine, Total Bilirubin, eGFR, Free T4, TSH, Lactate.

Each rule includes: critical high/low thresholds, severity level, escalation target (e.g., "Emergency Physician"), clinical action, cross-check biomarkers, and LOINC code.

14.3 Biomarker Alias Resolution

The engine resolves multiple input names to canonical forms. For example, "platelets", "platelet_count", and "plt" all resolve to "Platelet Count".
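
Alias resolution of this kind is typically a normalized dictionary lookup. The sketch below covers only the platelet example given above; the normalization rule (trim, lower-case, spaces to underscores) and the pass-through of unknown names are assumptions, not documented behaviour of src/critical_values.py.

```python
from typing import Dict

# Only the aliases named in the text; a real table would cover every biomarker.
ALIASES: Dict[str, str] = {
    "platelets": "Platelet Count",
    "platelet_count": "Platelet Count",
    "plt": "Platelet Count",
}


def canonical_name(name: str) -> str:
    """Resolve an input biomarker name to its canonical form, if known."""
    key = name.strip().lower().replace(" ", "_")
    # Unknown names are returned unchanged so the caller can decide what to do.
    return ALIASES.get(key, name)
```

Normalizing before the lookup means inputs such as `"PLT"` or `"Platelet Count"` also resolve without needing their own entries.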

15. Discordance Detector

File: src/discordance_detector.py (299 lines)

Detects cross-biomarker discordance patterns from biomarker_discordance_rules.json. Identifies contradictory or unexpected relationships between biomarker pairs.

15.1 Detection Output

Each DiscordanceFinding includes:
- rule_name: Descriptive rule name
- biomarker_a, biomarker_b: The discordant biomarker pair
- condition: The specific discordance condition detected
- differential_diagnosis: Possible explanations (list)
- agent_handoff: Recommended agents for follow-up (list)
- priority: high, moderate, or low
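
The fields above map naturally onto a small record type. The following is a sketch only: the class is named DiscordanceFindingSketch to make clear it is not the class in src/discordance_detector.py, and the field types, the priority validation, and the sorting helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PRIORITY_ORDER: Dict[str, int] = {"high": 0, "moderate": 1, "low": 2}


@dataclass
class DiscordanceFindingSketch:
    rule_name: str
    biomarker_a: str
    biomarker_b: str
    condition: str
    differential_diagnosis: List[str] = field(default_factory=list)
    agent_handoff: List[str] = field(default_factory=list)
    priority: str = "moderate"

    def __post_init__(self) -> None:
        if self.priority not in PRIORITY_ORDER:
            raise ValueError(f"Unknown priority: {self.priority!r}")


def sort_by_priority(findings: List[DiscordanceFindingSketch]) -> List[DiscordanceFindingSketch]:
    """Return findings ordered high, moderate, low (stable within a level)."""
    return sorted(findings, key=lambda f: PRIORITY_ORDER[f.priority])
```

Sorting by an explicit order table rather than by the string itself avoids the alphabetical trap in which "low" would sort ahead of "moderate".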

16. Knowledge Graph

File: src/knowledge.py (1,326 lines)

Contains 4 major knowledge structures, versioned clinical thresholds, age/sex reference ranges, ancestry adjustments, and plausible validation ranges.

16.1 BIOMARKER_DOMAINS (6 disease domains)

Each domain contains:
- Key biomarkers with units, normal/pre-disease/disease ranges, and clinical notes
- Genetic modifiers with risk alleles and molecular mechanisms
- Intervention targets with evidence-based recommendations

Domains: Diabetes/Metabolic, Cardiovascular, Liver (NAFLD/Fibrosis), Thyroid, Iron Metabolism, Nutritional Deficiency.

16.2 PGX_KNOWLEDGE (14 pharmacogenes)

Maps each pharmacogene to key drugs and CPIC guidance. Used by the RAG system prompt and agent reasoning.

16.3 PHENOAGE_KNOWLEDGE

PhenoAge clock biomarker descriptions, coefficients, and clinical interpretation. Used for biological age context in LLM responses.

16.4 CROSS_MODAL_LINKS (8 links)

Maps biomarker findings to triggers for other HCLS AI Factory agents, for example:
- Elevated Lp(a) -> Imaging Agent (coronary calcium scoring)
- Iron overload -> Imaging Agent (liver MRI)
- PGx drug safety -> CAR-T/Oncology Agent
- VCF re-analysis -> Genomics Pipeline

16.5 Shared Clinical Thresholds

GENOTYPE_THRESHOLDS dictionary provides genotype-stratified thresholds used by both the disease trajectory and genotype adjustment engines for consistency.

16.6 Age-Sex Reference Ranges

AGE_SEX_REFERENCE_RANGES provides clinically validated ranges stratified by sex and age bracket (18-49, 50-69, 70+) for: creatinine, ALT, alkaline phosphatase, ferritin, TSH, hemoglobin, BUN, cystatin C, homocysteine, vitamin D.

16.7 Biomarker Plausible Ranges

BIOMARKER_PLAUSIBLE_RANGES provides validation bounds for 35+ biomarkers to detect likely data entry errors before analysis.

16.8 Ancestry Adjustments

ANCESTRY_ADJUSTMENTS provides population-specific threshold multipliers for African, South Asian, East Asian, and Hispanic populations with PMID citations.

16.9 Knowledge Versioning

KNOWLEDGE_VERSION = {
    "version": "1.0.0",
    "cpic_version": "March 2025",
    "ada_standards": "2024",
    "esc_guidelines": "2021",
    "aasld_guidelines": "2023",
    "levine_phenoage": "2018",
    "lu_grimage": "2019",
}

17. RAG Engine

File: src/rag_engine.py (573 lines)

Multi-collection RAG engine that searches all 14 collections in parallel with a ThreadPoolExecutor, applies weighted scoring, and generates grounded LLM responses.

17.1 Search Pipeline

Query text
    |
    v
[Embed] -> BGE-small-en-v1.5 (384-dim vector)
    |
    v
[Parallel Search] -> ThreadPoolExecutor across 14 collections
    |                  Each returns top_k=5 results
    v
[Score & Weight] -> cosine_similarity * collection_weight
    |                 Filter by SCORE_THRESHOLD (0.4)
    v
[Merge & Rank] -> CrossCollectionResult (sorted by weighted score)
    |
    v
[Knowledge Augment] -> Inject domain knowledge, PGx context, PhenoAge info
    |
    v
[LLM Synthesis] -> Claude generates response with citations
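
The fan-out, filter, and merge stages of the pipeline above can be sketched with the standard library alone. This is an illustration, not the code in src/rag_engine.py: `search_one` is a hypothetical callable standing in for a Milvus query, and applying the 0.4 threshold to the raw cosine similarity (before weighting) is an assumption based on its description as a minimum cosine similarity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

RawHit = Tuple[str, float]                # (record_id, cosine similarity)
RankedHit = Tuple[str, str, float, float]  # (collection, record_id, similarity, weighted score)


def search_all(weights: Dict[str, float],
               search_one: Callable[[str, int], List[RawHit]],
               top_k: int = 5,
               score_threshold: float = 0.4) -> List[RankedHit]:
    """Query every collection in parallel, then filter, weight, and rank the hits."""
    names = list(weights)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        per_collection = list(pool.map(lambda n: search_one(n, top_k), names))

    merged: List[RankedHit] = []
    for name, hits in zip(names, per_collection):
        for record_id, similarity in hits:
            if similarity >= score_threshold:  # threshold on the raw similarity
                merged.append((name, record_id, similarity, similarity * weights[name]))
    merged.sort(key=lambda hit: hit[3], reverse=True)
    return merged


def fake_search(collection: str, top_k: int) -> List[RawHit]:
    """Canned results standing in for a real vector search."""
    canned = {
        "biomarker_reference": [("ref-1", 0.82), ("ref-2", 0.35)],
        "biomarker_pgx_rules": [("pgx-1", 0.90)],
    }
    return canned.get(collection, [])[:top_k]


results = search_all({"biomarker_reference": 0.12, "biomarker_pgx_rules": 0.10}, fake_search)
```

In this toy run `ref-2` is dropped by the threshold, and `ref-1` outranks `pgx-1` despite its lower similarity because its collection carries the larger weight.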

17.2 Input Sanitization

Filter expressions are validated against _SAFE_FILTER_RE = r"^[A-Za-z0-9 _\-]+$" to prevent Milvus filter injection.
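
A minimal sketch of this allow-list check is shown below. The pattern is the one quoted above; the helper name is hypothetical, and using `fullmatch` is a choice made here rather than something the source specifies.

```python
import re

_SAFE_FILTER_RE = re.compile(r"^[A-Za-z0-9 _\-]+$")


def is_safe_filter_value(value: str) -> bool:
    """True only if value consists solely of letters, digits, space, '_' or '-'."""
    # fullmatch also rejects a trailing newline, which '$' alone would tolerate.
    return _SAFE_FILTER_RE.fullmatch(value) is not None
```

Values containing quotes, operators, or parentheses fail the check, so they can never be interpolated into a Milvus filter expression.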

17.3 System Prompt

The BIOMARKER_SYSTEM_PROMPT (75 lines) instructs Claude to:
- Cite evidence using collection labels (e.g., [BiomarkerRef:marker-id])
- Always specify units when discussing values
- Provide genotype-specific interpretation
- Highlight critical findings (PGx drug safety, severe iron overload)
- Recommend actionable interventions grounded in CPIC evidence
- Explain pre-symptomatic disease trajectories with timelines
- Flag cross-modal triggers for other agents

18. Agent Orchestrator¶

File: src/agent.py (610 lines)

PrecisionBiomarkerAgent implements the plan-analyze-search-synthesize-report pattern from the VAST AI OS AgentEngine model.

18.1 Initialization¶

agent = PrecisionBiomarkerAgent(
    rag_engine=rag_engine,
    bio_age_calc=BiologicalAgeCalculator(),
    trajectory_analyzer=DiseaseTrajectoryAnalyzer(),
    pgx_mapper=PharmacogenomicMapper(),
    genotype_adjuster=GenotypeAdjuster(),
)

18.2 Agent Loop¶

1. Plan (search_plan()): Parse question, identify topics, disease areas, relevant analysis modules, and formulate sub-questions
2. Analyze (analyze_patient()): Run all applicable analysis modules:
   - BiologicalAgeCalculator
   - DiseaseTrajectoryAnalyzer
   - PharmacogenomicMapper
   - GenotypeAdjuster
   - CriticalValueEngine
   - DiscordanceDetector
   - LabRangeInterpreter
3. Search (rag_engine.retrieve()): Multi-collection RAG search
4. Synthesize (evaluate_evidence()): Merge analysis results with RAG evidence, generate LLM response with citations
5. Report: Pass to ReportGenerator for structured output

18.3 SearchPlan Dataclass¶

from dataclasses import dataclass
from typing import List

@dataclass
class SearchPlan:
    question: str
    identified_topics: List[str]
    disease_areas: List[str]
    relevant_modules: List[str]      # e.g., ["biological_age", "pgx", "trajectory"]
    search_strategy: str             # "broad", "targeted", "domain-specific"
    sub_questions: List[str]
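
To make the planning step concrete, here is a minimal sketch of how a SearchPlan might be filled in by simple keyword matching. The keyword tables, the `make_plan` function, and the rule for choosing a strategy are all hypothetical; the text does not describe how search_plan() actually parses a question.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SearchPlan:
    question: str
    identified_topics: List[str]
    disease_areas: List[str]
    relevant_modules: List[str]
    search_strategy: str
    sub_questions: List[str]


# Hypothetical keyword tables; the real planner's vocabulary is not shown.
MODULE_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "biological_age": ("phenoage", "grimage", "biological age"),
    "pgx": ("cyp2d6", "cyp2c19", "metabolizer"),
    "trajectory": ("risk", "trajectory", "progression"),
}
DISEASE_AREAS: Tuple[str, ...] = ("diabetes", "cardiovascular", "liver", "thyroid")


def make_plan(question: str) -> SearchPlan:
    """Build a plan by case-insensitive substring matching (illustrative only)."""
    text = question.lower()
    topics = [kw for kws in MODULE_KEYWORDS.values() for kw in kws if kw in text]
    modules = [m for m, kws in MODULE_KEYWORDS.items() if any(kw in text for kw in kws)]
    areas = [area for area in DISEASE_AREAS if area in text]
    # Assumed rule: a single matching module means a targeted search.
    strategy = "targeted" if len(modules) == 1 else "broad"
    subs = [f"What does the {m} analysis show for: {question}" for m in modules]
    return SearchPlan(question, topics, areas, modules, strategy, subs)


plan = make_plan("Does CYP2D6 poor metabolizer status change my diabetes risk?")
```

The sketch never emits the "domain-specific" strategy; how the real planner chooses among the three strategies is not stated in the text.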

The agent generates cross-modal triggers when findings warrant follow-up by other HCLS AI Factory agents (imaging, oncology, genomics pipeline).

19. Export Pipeline¶

File: src/export.py (1,392 lines)

Exports analysis results in 5 formats:

Format	Function	Description
Markdown	`export_markdown()`	Human-readable with evidence tables
JSON	`export_json()`	Machine-readable structured data
PDF	`export_pdf()`	Styled report via reportlab Platypus
CSV	`export_csv()`	Tabular export for spreadsheets
FHIR R4	`export_fhir_diagnostic_report()`	FHIR R4 DiagnosticReport JSON bundle

19.1 PDF Generation¶

Uses the reportlab Platypus engine with styled tables, headers, risk-level color coding, and embedded charts. Filenames are timestamped, for example: biomarker_report_20260301T143025Z_a1b2.pdf
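
A filename of this shape can be produced with the standard library. The sketch below assumes the timestamp is UTC in compact ISO 8601 form and that the trailing four characters are a random hex suffix; neither detail is confirmed by the text, and `report_filename` is a hypothetical name.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def report_filename(now: Optional[datetime] = None, suffix: Optional[str] = None) -> str:
    """Build a timestamped PDF filename (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    # Assumption: the trailing "Z" in the example means the timestamp is UTC.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Assumption: the suffix is 4 random hex characters to avoid collisions.
    suffix = suffix or secrets.token_hex(2)
    return f"biomarker_report_{stamp}_{suffix}.pdf"


example = report_filename(datetime(2026, 3, 1, 14, 30, 25, tzinfo=timezone.utc), "a1b2")
```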

19.2 FHIR R4 Export¶

Generates a compliant FHIR R4 DiagnosticReport resource bundle including:

- Patient reference
- Observation resources for each biomarker
- Condition resources for disease trajectories
- MedicationStatement resources for PGx recommendations
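
For orientation, the sketch below assembles the skeleton of such a bundle as plain dictionaries: one DiagnosticReport that references one Observation per biomarker. It is not the output of export_fhir_diagnostic_report(); the resource ids, the free-text codes, the units, and the "collection" bundle type are assumptions, and Condition and MedicationStatement resources are omitted for brevity.

```python
import json
from typing import Dict, List, Tuple


def observation(obs_id: str, patient_ref: str, name: str, value: float, unit: str) -> Dict:
    """One FHIR Observation carrying a numeric result as valueQuantity."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",
        "code": {"text": name},  # free text only; no coding system is assumed
        "subject": {"reference": patient_ref},
        "valueQuantity": {"value": value, "unit": unit},
    }


def diagnostic_report_bundle(patient_id: str, biomarkers: Dict[str, Tuple[float, str]]) -> Dict:
    """Wrap a DiagnosticReport and its Observations in a Bundle."""
    patient_ref = f"Patient/{patient_id}"
    observations: List[Dict] = [
        observation(f"obs-{i}", patient_ref, name, value, unit)
        for i, (name, (value, unit)) in enumerate(biomarkers.items(), start=1)
    ]
    report = {
        "resourceType": "DiagnosticReport",
        "id": "report-1",  # hypothetical id
        "status": "final",
        "code": {"text": "Precision biomarker report"},
        "subject": {"reference": patient_ref},
        "result": [{"reference": f"Observation/{o['id']}"} for o in observations],
    }
    return {
        "resourceType": "Bundle",
        "type": "collection",  # assumed bundle type
        "entry": [{"resource": r} for r in [report, *observations]],
    }


# Values from the example API call in this document; units are assumed.
bundle = diagnostic_report_bundle(
    "HCLS-BIO-2026-00001", {"ldl": (142, "mg/dL"), "hba1c": (5.8, "%")}
)
bundle_json = json.dumps(bundle, indent=2)
```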

20. Report Generator¶

File: src/report_generator.py (993 lines)

ReportGenerator class produces structured 12-section clinical reports in markdown format from an AnalysisResult.

20.1 Report Sections (12)¶

#	Section	Content
1	Biological Age Assessment	PhenoAge/GrimAge, age acceleration
2	Executive Findings	Top 5 critical/high priority findings
3	Biomarker-Gene Correlation Map	Genotype-biomarker interaction matrix
4	Disease Trajectory Analysis	9 disease categories with stages
5	Pharmacogenomic Profile	PGx phenotypes and drug recommendations
6	Nutritional Analysis	Genotype-guided supplement protocols
7	Interconnected Pathways	Cross-domain pathway connections
8	Prioritized Action Plan	Ranked interventions by urgency
9	Monitoring Schedule	Follow-up timeline and biomarkers
10	Supplement Protocol Summary	Dosing schedules based on genotype
11	Clinical Summary for MD	Physician-oriented executive summary
12	References	PMID citations and guideline versions

Additional sections: Evidence Provenance, Clinical Validation.

20.2 Usage¶

from src.report_generator import ReportGenerator

generator = ReportGenerator()
markdown = generator.generate(analysis_result, patient_profile)

21. FastAPI REST Server¶

File: api/main.py (465 lines) + route modules in api/routes/.

21.1 Application Setup¶

Title: "Biomarker Intelligence Agent API"
Version: 1.0.0
Docs: /docs (Swagger UI), /openapi.json
CORS: Configurable origins via BIOMARKER_CORS_ORIGINS
Auth: Optional API key via X-API-Key header (skips /health and /metrics)
Request size limit: 10 MB (configurable)

21.2 Lifespan Initialization¶

On startup, the lifespan handler initializes:

1. BiomarkerCollectionManager (Milvus connection)
2. SentenceTransformer embedder (BGE-small-en-v1.5)
3. Anthropic LLM client
4. All analysis modules (bio age, trajectory, PGx, genotype adjuster)
5. BiomarkerRAGEngine (wires everything together)
6. PrecisionBiomarkerAgent (autonomous reasoning)

On shutdown, Milvus connection is closed.

21.3 API Endpoints (19+)¶

Core endpoints (defined in api/main.py):

Method	Path	Description
GET	`/`	Service info (name, docs, health)
GET	`/health`	Collection count, vector count, agent status
GET	`/collections`	Collection names and record counts
GET	`/knowledge/stats`	Knowledge graph statistics
GET	`/metrics`	Prometheus-compatible metrics

Analysis router (/v1/, defined in api/routes/analysis.py):

Method	Path	Description
POST	`/v1/analyze`	Full patient analysis (all modules)
POST	`/v1/biological-age`	Biological age calculation
POST	`/v1/disease-risk`	Disease trajectory analysis
POST	`/v1/pgx`	Pharmacogenomic mapping
POST	`/v1/query`	RAG Q&A query
POST	`/v1/query/stream`	Streaming RAG Q&A (SSE)
GET	`/v1/health`	V1 router health check

Events router (/v1/events/, defined in api/routes/events.py):

Method	Path	Description
POST	`/v1/events/cross-modal`	Receive cross-modal event
POST	`/v1/events/biomarker-alert`	Send biomarker alert
GET	`/v1/events/cross-modal`	List inbound events
GET	`/v1/events/biomarker-alert`	List outbound alerts

Reports router (/v1/report/, defined in api/routes/reports.py):

Method	Path	Description
POST	`/v1/report/generate`	Generate 12-section report
GET	`/v1/report/{report_id}/pdf`	Download report as PDF
POST	`/v1/report/fhir`	Export as FHIR R4 bundle

21.4 Request Validation¶

All POST endpoints use Pydantic request models with:

- Field constraints (ge=0, le=150, pattern="^(M|F)$")
- @model_validator requiring at least one data source (biomarkers, genotypes, or star alleles)

21.5 HIPAA Audit Logging¶

Analysis and report endpoints call audit_log() with action type and hashed patient ID for every request.

22. Streamlit UI¶

File: app/biomarker_ui.py (1,863 lines)

22.1 Tabs (8)¶

Tab	Function
Biomarker Analysis	Full patient analysis with sample patient quick-load
Biological Age	PhenoAge/GrimAge calculator with visualization
Disease Risk	Focused disease trajectory analysis
PGx Profile	Pharmacogenomic drug interaction mapping
Evidence Explorer	RAG Q&A with collection filtering
Reports	PDF and FHIR R4 report generation and download
Patient 360	Unified cross-agent intelligence dashboard
Longitudinal	Biomarker trend tracking across multiple visits

22.2 Page Configuration¶

import streamlit as st

st.set_page_config(
    page_title="Biomarker Intelligence Agent -- HCLS AI Factory",
    page_icon="🩸",
    layout="wide",
    initial_sidebar_state="expanded",
)

22.3 Engine Initialization¶

Uses @st.cache_resource(ttl=300) to cache the analysis engine across Streamlit reruns (5-minute TTL). Initializes: BiomarkerCollectionManager, BiologicalAgeCalculator, DiseaseTrajectoryAnalyzer, CriticalValueEngine, DiscordanceDetector, LabRangeInterpreter, PharmacogenomicMapper.

22.4 Patient 360 Dashboard¶

File: app/patient_360.py (670 lines)

Cross-agent intelligence dashboard that aggregates findings from multiple HCLS AI Factory agents into a unified patient view. Includes:

- Biomarker summary with risk indicators
- PGx drug interaction alerts
- Disease trajectory timelines
- Cross-agent trigger status
- Longitudinal trend charts

23. Demo Patients¶

Two pre-configured demo patients for validation and demonstration:

23.1 Patient HCLS-BIO-2026-00001¶

Field	Value
Patient ID	HCLS-BIO-2026-00001
Age	45
Sex	Male
Ethnicity	Ashkenazi Jewish
Genome	HG002 / NA24385 (Genome in a Bottle)
Key genotypes	MTHFR C677T, APOE E3/E4, PNPLA3 I148M
PGx alleles	CYP2D6 *1/*4, CYP2C19 *1/*2

23.2 Patient HCLS-BIO-2026-00002¶

Field	Value
Patient ID	HCLS-BIO-2026-00002
Age	38
Sex	Female
Ethnicity	Ashkenazi Jewish

Demo patient data is stored in data/reference/biomarker_sample_patients.json and can be loaded via the Streamlit UI "Biomarker Analysis" tab quick-load buttons.

24. Cross-Agent Integration¶

The Biomarker Intelligence Agent communicates with other HCLS AI Factory agents via the cross-modal event system.

24.1 Outbound Triggers (Biomarker -> Other Agents)¶

Finding	Target Agent	Trigger
Elevated Lp(a) > 125 nmol/L	Imaging Agent	Coronary calcium scoring
Iron overload (ferritin > 1000)	Imaging Agent	Liver MRI (T2*)
DPYD Poor Metabolizer	Oncology Agent	5-FU contraindication alert
BRCA1/2 carrier (AJ panel)	Oncology Agent	Cancer risk assessment
Novel variant detected	Genomics Pipeline	VCF re-analysis trigger
GBA + APOE E4 compound	Autoimmune Agent	Parkinson's risk assessment
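
Three of the rows above reduce to simple threshold or phenotype checks, sketched below. The thresholds, target agents, and trigger names come from the table; the dictionary keys (`lpa_nmol_l`, `ferritin`), the phenotype string, and the `Trigger` type are hypothetical, and the table does not state the ferritin unit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Trigger:
    target_agent: str
    action: str
    reason: str


def outbound_triggers(
    biomarkers: Dict[str, float], pgx_phenotypes: Dict[str, str]
) -> List[Trigger]:
    """Evaluate a subset of the outbound trigger rules from the table."""
    triggers: List[Trigger] = []
    lpa = biomarkers.get("lpa_nmol_l")  # hypothetical key name
    if lpa is not None and lpa > 125:
        triggers.append(
            Trigger("Imaging Agent", "Coronary calcium scoring", f"Lp(a) {lpa} nmol/L > 125")
        )
    ferritin = biomarkers.get("ferritin")  # unit not stated in the table
    if ferritin is not None and ferritin > 1000:
        triggers.append(
            Trigger("Imaging Agent", "Liver MRI (T2*)", f"ferritin {ferritin} > 1000")
        )
    if pgx_phenotypes.get("DPYD") == "Poor Metabolizer":
        triggers.append(
            Trigger("Oncology Agent", "5-FU contraindication alert", "DPYD Poor Metabolizer")
        )
    return triggers


fired = outbound_triggers(
    {"lpa_nmol_l": 180.0, "ferritin": 280.0}, {"DPYD": "Poor Metabolizer"}
)
```

Here the Lp(a) and DPYD rules fire, while ferritin at 280 stays below the iron-overload threshold.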

24.2 Inbound Events (Other Agents -> Biomarker)¶

Source Agent	Event Type	Action
Imaging Agent	Imaging finding	Correlate with biomarker patterns
Genomics Pipeline	Genomic variant	Update PGx profile, re-analyze
Oncology Agent	Drug alert	Check PGx interactions

24.3 Event API¶

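Inbound events are received at POST /v1/events/cross-modal and outbound alerts are sent via POST /v1/events/biomarker-alert; a GET on either path lists the stored events. Events are held in an in-memory store with a configurable maximum (1,000 events). Production deployments would use a message bus (e.g., Redis, Kafka).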
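
A bounded in-memory store of this kind can be sketched with collections.deque. The `EventStore` class is hypothetical, and the eviction policy (drop the oldest event once the cap is reached) is an assumption; the text only says the maximum is configurable.

```python
from collections import deque
from threading import Lock
from typing import Any, Deque, Dict, List, Optional


class EventStore:
    """Thread-safe, bounded in-memory event log (illustrative sketch)."""

    def __init__(self, max_events: int = 1000) -> None:
        # deque(maxlen=...) silently discards the oldest item when full.
        self._events: Deque[Dict[str, Any]] = deque(maxlen=max_events)
        self._lock = Lock()

    def add(self, event: Dict[str, Any]) -> None:
        with self._lock:
            self._events.append(event)

    def recent(self, limit: Optional[int] = None) -> List[Dict[str, Any]]:
        """Return stored events oldest-first, optionally only the last `limit`."""
        with self._lock:
            events = list(self._events)
        return events if limit is None else events[-limit:]


store = EventStore(max_events=3)
for i in range(5):
    store.add({"seq": i, "type": "imaging_finding"})
```

With a cap of 3 and five events added, only the last three remain, which is also why such a store is unsuitable where events must not be lost.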

25. Monitoring and Metrics¶

25.1 Prometheus Metrics¶

Endpoint: GET /metrics (Prometheus text format)

Metric	Type	Description
`biomarker_api_requests_total`	counter	Total API requests
`biomarker_api_query_requests_total`	counter	Total /query requests
`biomarker_api_search_requests_total`	counter	Total /search requests
`biomarker_api_analyze_requests_total`	counter	Total /analyze requests
`biomarker_api_bio_age_requests_total`	counter	Total /biological-age requests
`biomarker_api_errors_total`	counter	Total error responses
`biomarker_collection_vectors`	gauge	Vectors per collection (labeled)
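
For reference, the Prometheus text format these metrics are served in looks like the output of the sketch below. The project lists prometheus-client as a dependency, so this hand-rolled renderer is only an illustration of the wire format; the `collection` label name and the `render_metrics` function are assumptions.

```python
from typing import Dict, List


def render_metrics(counters: Dict[str, int], vectors: Dict[str, int]) -> str:
    """Render counters and a labeled gauge in Prometheus text format."""
    lines: List[str] = []
    for name, value in counters.items():
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    lines.append("# TYPE biomarker_collection_vectors gauge")
    for collection, count in vectors.items():
        # Assumed label name: "collection".
        lines.append(f'biomarker_collection_vectors{{collection="{collection}"}} {count}')
    return "\n".join(lines) + "\n"


# "pgx_rules" is a hypothetical collection name.
text = render_metrics({"biomarker_api_requests_total": 3}, {"pgx_rules": 10})
```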

25.2 Health Check¶

Endpoint: GET /health

Returns: { status, collections, total_vectors, agent_ready }

Status values:

- healthy: All systems operational
- degraded: Milvus unavailable but API responsive

Docker health check: Every 30 seconds via Python stdlib urllib (no curl dependency in container).
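
A curl-free probe along these lines needs only urllib. The sketch below is an assumption about how such a check might look, not the command in the project's Docker configuration: it treats any HTTP 200 as healthy and returns the exit code Docker expects (0 for healthy, 1 for unhealthy).

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the endpoint answers HTTP 200, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if response.status == 200 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return 1
```

Inside a container health check, the result would be passed to sys.exit, for example `sys.exit(probe("http://localhost:8529/health"))`, using the API port from the Quick Start section.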

26. Audit Logging¶

File: src/audit.py (83 lines)

HIPAA-compliant audit logging for all patient data access. Uses loguru with a dedicated audit=True binding for separate audit event routing.

26.1 Auditable Actions¶

Action	When Logged
`PATIENT_ANALYSIS`	Full patient analysis request
`BIOLOGICAL_AGE`	Biological age calculation
`DISEASE_RISK`	Disease trajectory analysis
`PGX_MAPPING`	Pharmacogenomic mapping
`RAG_QUERY`	RAG evidence query
`REPORT_GENERATED`	Report generation
`REPORT_EXPORTED`	Report downloaded (PDF/CSV)
`FHIR_EXPORTED`	FHIR R4 export
`PATIENT_DATA_ACCESSED`	Any patient data access

26.2 Privacy¶

Patient IDs are SHA-256 hashed before logging. Full IDs are only stored in encrypted production storage, never in log files.
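
The hashing step can be illustrated with hashlib. The function names and the record layout below are hypothetical; the actual audit_log() signature in src/audit.py is not shown in this document.

```python
import hashlib
from typing import Dict


def hash_patient_id(patient_id: str) -> str:
    """Return the hex SHA-256 digest of a patient ID."""
    return hashlib.sha256(patient_id.encode("utf-8")).hexdigest()


def audit_record(action: str, patient_id: str) -> Dict[str, str]:
    """Build a log-safe record that never contains the raw patient ID."""
    return {"action": action, "patient_hash": hash_patient_id(patient_id)}


record = audit_record("PATIENT_ANALYSIS", "HCLS-BIO-2026-00001")
```

Because the digest is deterministic, all log entries for one patient share the same hash and can still be correlated without exposing the ID itself.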

27. Testing¶

709 tests across 18 test files, all passing.

27.1 Test Files¶

File	Module Under Test
`test_models.py`	Pydantic model validation
`test_collections.py`	Milvus collection manager
`test_rag_engine.py`	RAG engine search/synthesis
`test_agent.py`	Agent orchestrator
`test_pharmacogenomics.py`	PGx phenotyping
`test_biological_age.py`	PhenoAge/GrimAge
`test_disease_trajectory.py`	Disease trajectory engine
`test_genotype_adjustment.py`	Genotype adjustments
`test_critical_values.py`	Critical value alerts
`test_discordance_detector.py`	Discordance detection
`test_lab_range_interpreter.py`	Lab range interpretation
`test_report_generator.py`	Report generation
`test_export.py`	Export pipeline (MD/JSON/PDF/FHIR)
`test_api.py`	FastAPI endpoint testing
`test_integration.py`	End-to-end integration
`test_edge_cases.py`	Edge cases and error handling
`test_longitudinal.py`	Longitudinal tracking
`test_ui.py`	Streamlit UI components

27.2 Test Fixtures¶

Shared fixtures in tests/conftest.py:

- mock_embedder: Returns 384-dim zero vectors
- mock_llm_client: Returns "Mock response"
- mock_collection_manager: Simulates Milvus operations
- Sample patient profiles and biomarker data

All tests run without Milvus, Anthropic API, or external services.

27.3 Running Tests¶

# All tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov=api --cov-report=term-missing

# Single module
pytest tests/test_pharmacogenomics.py -v

28. Dependencies¶

File: requirements.txt

28.1 Core Dependencies¶

Package	Version Range	Purpose
pydantic	>=2.0,<3.0	Data models and validation
pydantic-settings	>=2.7,<3.0	Settings with env var support
loguru	>=0.7.0,<1.0	Structured logging
pymilvus	>=2.4.0,<2.6	Milvus vector database client
sentence-transformers	>=2.2.0,<3.0	BGE-small-en-v1.5 embedding
anthropic	>=0.18.0,<1.0	Claude API client
streamlit	>=1.30.0,<2.0	Web UI
fastapi	>=0.109.0,<1.0	REST API framework
uvicorn[standard]	>=0.27.0,<1.0	ASGI server
python-multipart	>=0.0.6,<1.0	Form data parsing
reportlab	>=4.0.0,<5.0	PDF generation
numpy	>=1.24.0,<3.0	Numerical computation
pandas	>=2.0.0,<3.0	Data manipulation
plotly	>=5.18.0,<6.0	Interactive charts
tqdm	>=4.65.0,<5.0	Progress bars
python-dotenv	>=1.0.0,<2.0	.env file loading
prometheus-client	>=0.20.0,<1.0	Prometheus metrics

28.2 Testing Dependencies¶

Package	Version Range	Purpose
pytest	>=7.0,<8.0	Test framework
pytest-asyncio	>=0.21,<1.0	Async test support
pytest-cov	>=4.0,<5.0	Coverage reporting

29. Quick Start¶

29.1 Docker (Recommended)¶

cd ai_agent_adds/precision_biomarker_agent

# Configure API key
cp .env.example .env
# Edit .env: set ANTHROPIC_API_KEY=sk-ant-...

# Start all services
docker compose up -d

# Watch setup progress
docker compose logs -f biomarker-setup

# Access UI and API (open is the macOS launcher; use xdg-open on Linux)
open http://localhost:8528    # Streamlit UI
open http://localhost:8529/docs  # Swagger API docs

29.2 Local Development¶

cd ai_agent_adds/precision_biomarker_agent

# Create virtual environment
python -m venv .venv && source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set API key
export ANTHROPIC_API_KEY=sk-ant-...

# Ensure Milvus is running (standalone or via Docker)
# docker run -d -p 19530:19530 milvusdb/milvus:v2.4-latest

# Create collections and seed data
python scripts/setup_collections.py --drop-existing
python scripts/seed_all.py

# Start Streamlit UI
streamlit run app/biomarker_ui.py --server.port 8528

# Start FastAPI server (separate terminal)
uvicorn api.main:app --host 0.0.0.0 --port 8529 --reload

# Run tests
pytest tests/ -v

29.3 Example API Call¶

curl -X POST http://localhost:8529/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "patient_id": "HCLS-BIO-2026-00001",
    "age": 45,
    "sex": "M",
    "biomarkers": {
      "albumin": 4.2,
      "creatinine": 0.9,
      "glucose": 95,
      "hs_crp": 1.2,
      "hba1c": 5.8,
      "ldl": 142,
      "ferritin": 280,
      "alt": 38,
      "tsh": 2.8
    },
    "genotypes": {
      "rs1801133": "CT",
      "rs429358": "CT"
    },
    "star_alleles": {
      "CYP2D6": "*1/*4",
      "CYP2C19": "*1/*2"
    }
  }'
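
The same request can be built from Python using only the standard library. This is a sketch equivalent to the curl call above with a shortened payload; `build_analyze_request` is a hypothetical helper, and the response schema is not described here, so the reply is left as raw JSON.

```python
import json
import urllib.request
from typing import Any, Dict


def build_analyze_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST to /v1/analyze with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/v1/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "patient_id": "HCLS-BIO-2026-00001",
    "age": 45,
    "sex": "M",
    "biomarkers": {"ldl": 142, "hba1c": 5.8},
    "star_alleles": {"CYP2D6": "*1/*4", "CYP2C19": "*1/*2"},
}
request = build_analyze_request("http://localhost:8529", payload)

# With the API running, send the request and decode the JSON reply:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.loads(response.read())
```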

Generated for the HCLS AI Factory. Import this document as context for Claude Code sessions to get accurate, grounded implementation guidance.