Cardiology Intelligence Agent -- Project Bible
Version: 1.0.0
Author: Adam Jones
Date: March 2026
License: Apache 2.0
Repository: hcls-ai-factory/ai_agent_adds/cardiology_intelligence_agent
Table of Contents
- Executive Summary
- Vision & Mission
- System Overview
- File Inventory
- Collections Catalog
- Knowledge Graph
- Clinical Workflows
- Risk Calculators
- GDMT Optimizer
- Cross-Modal Triggers
- RAG Engine
- Query Expansion
- Data Models
- Export System
- API Reference
- UI Guide
- Ingest Parsers
- Metrics & Monitoring
- Scheduler
- Configuration
- Docker Deployment
- Port Map
- Tech Stack
- Future Roadmap
1. Executive Summary
The Cardiology Intelligence Agent is a domain-specialized retrieval-augmented generation (RAG) system that synthesizes cardiac imaging, electrophysiology, hemodynamics, heart failure management, valvular disease, preventive cardiology, interventional data, and cardio-oncology surveillance into guideline-aligned clinical recommendations using ACC/AHA/ESC evidence.
The system searches 12 cardiology-specific Milvus vector collections plus 1 shared genomic_evidence collection (3.5M variant vectors), implements 6 validated cardiovascular risk calculators (ASCVD, HEART, CHA2DS2-VASc, HAS-BLED, MAGGIC, EuroSCORE II), optimizes guideline-directed medical therapy (GDMT) for heart failure (7 therapy classes including finerenone, omecamtiv, and sotagliflozin), and provides 11 clinical workflows covering the highest-impact cardiovascular use cases.
The Cardiology Intelligence Agent is built as part of the HCLS AI Factory, a three-stage precision medicine platform (Genomics, RAG/Chat, Drug Discovery) designed to run end-to-end on a single NVIDIA DGX Spark.
Codebase at a Glance
| Metric | Value |
|---|---|
| Total Python LOC | 28,189 |
| Total files | 43 (37 Python, 1 TOML, 1 YAML, 2 Markdown, 1 requirements.txt, 1 .env template) |
| Milvus collections | 13 (12 cardiology-specific + 1 shared genomic_evidence) |
| Knowledge graph conditions | 45 cardiovascular conditions |
| Knowledge graph biomarkers | 29 cardiac biomarkers |
| Knowledge graph drug classes | 32 drug classes |
| Knowledge graph genes | 56 cardiovascular genes |
| Knowledge graph imaging modalities | 27 imaging protocols (12 echo, 7 CMR, 5 nuclear, 3 CT) |
| Guideline documents | 20 guideline documents |
| Guideline recommendations | 63 structured ACC/AHA/ESC recommendations |
| Entity aliases | 167 abbreviation mappings |
| Risk calculators | 6 validated scoring systems |
| Clinical workflows | 11 |
| Cross-modal imaging triggers | 18 genomic trigger patterns |
| Ingest parsers | 7 (PubMed, trials, imaging, ECG, guideline, device, hemodynamics) |
| Enums | 16 |
| Pydantic models | 13 |
| Dataclasses | 1 (SearchPlan) |
| LLM | Claude Sonnet 4.6 (claude-sonnet-4-6) |
| Embedding model | BGE-small-en-v1.5 (384-dim) |
2. Vision & Mission
The Problem
Cardiovascular disease (CVD) remains the leading cause of death globally, claiming approximately 17.9 million lives annually -- 32% of all deaths worldwide. Despite exponential growth in cardiovascular AI research (over 4,500 publications in 2025 alone), critical barriers persist in clinical translation: fragmented evidence across modalities, siloed data systems, lack of integrated genomic-imaging correlation, and prohibitive infrastructure costs that limit advanced cardiac AI to elite academic centers.
Current tools fall short:
| Approach | Limitation |
|---|---|
| PubMed search | Keyword-based; no cross-modal integration |
| UpToDate / DynaMed | Expert-curated but static; no patient-specific reasoning |
| Commercial CVIS | Vendor-locked; $500K-$2M+ implementation |
| EHR-integrated CDS | Rule-based; cannot synthesize unstructured evidence |
| General AI assistants | No citation provenance; hallucination risk |
| Imaging-only AI | Single-modality; no genomic integration; cloud-dependent |
Mission Statement
Democratize cardiovascular clinical decision support by providing unified, evidence-grounded intelligence across the entire cardiology landscape -- from imaging interpretation through risk stratification through GDMT optimization -- on hardware that any clinic or research lab can afford.
Design Principles
- Guideline-Aligned: Every recommendation traces to ACC/AHA/ESC guidelines with recommendation class and evidence level.
- Multi-Modal: Integrates imaging, electrophysiology, hemodynamics, genomics, biomarkers, and clinical data.
- Quantitative: Risk calculators produce specific scores, not vague guidance. GDMT optimizer recommends specific doses.
- Cross-Modal: Imaging findings automatically trigger appropriate genomic workup panels.
- Transparent: All responses cite source collections, similarity scores, and guideline references.
3. System Overview
High-Level Architecture
+------------------------------------------------------------------+
|                       STREAMLIT UI (8536)                        |
|  [Dashboard] [CAD] [Heart Failure] [Valvular] [Arrhythmia]       |
|  [Cardiac MRI] [Stress Test] [Prevention] [Cardio-Onc] [Reports] |
|  Sidebar: Workflow filter, Collection stats, Risk calc, Export   |
+------------------------------------------------------------------+
         |                                    |
         v                                    v
+------------------+               +---------------------+
|  FastAPI Server  |               |  Clinical Engines   |
|  :8126           |               |  - Risk Calculators |
|  POST /v1/cardio |               |  - GDMT Optimizer   |
|  GET /health     |               |  - Cross-Modal      |
+------------------+               |  - Workflows        |
         |                         +---------------------+
         v                                    |
+------------------+                          |
|  RAG Engine      |<-------------------------+
|  - Query Expand  |
|  - Multi-Collect |
|  - Citation Score|
|  - LLM Synthesis |
+------------------+
         |
         v
+--------------------------------------------------+
|            Milvus Vector DB (:19530)             |
|  12 cardio collections + genomic_evidence        |
|  384-dim BGE-small-en-v1.5 / IVF_FLAT / COSINE   |
+--------------------------------------------------+
|  etcd (:2379)          |  MinIO (:9000)          |
+--------------------------------------------------+
Component Interaction Flow
- Query arrives via Streamlit UI or FastAPI endpoint
- Query expansion decomposes into sub-questions, identifies conditions/drugs/modalities
- Workflow routing selects applicable clinical workflows (CAD, HF, valvular, etc.)
- Multi-collection search queries relevant Milvus collections with weighted scoring
- Risk calculators compute applicable scores (ASCVD, CHA2DS2-VASc, etc.)
- GDMT optimizer evaluates heart failure medication status and recommends titration
- Cross-modal engine identifies genomic workup triggers from imaging findings
- LLM synthesis (Claude Sonnet 4.6) generates evidence-grounded clinical response
- Citation scoring ranks and formats source citations
- Response assembly packages answer, citations, risk scores, workflow results, triggers
4. File Inventory
Source Files (src/)
| File | Lines | Description |
|---|---|---|
| agent.py | 1,658 | Main agent orchestrator; coordinates all engines |
| clinical_workflows.py | 2,445 | 11 clinical workflow implementations |
| collections.py | 1,226 | Milvus collection CRUD and schema definitions |
| cross_modal.py | 1,734 | Imaging-genomics cross-modal trigger engine |
| export.py | 1,379 | PDF/CSV/FHIR export system |
| gdmt_optimizer.py | 2,457 | 4-pillar GDMT optimization engine |
| knowledge.py | 1,431 | Knowledge graph (conditions, biomarkers, drugs, genes, imaging, guidelines) |
| metrics.py | 537 | Prometheus metrics instrumentation |
| models.py | 717 | 16 enums, 13 Pydantic models, 1 dataclass |
| query_expansion.py | 2,025 | Query decomposition, synonym expansion, entity extraction |
| rag_engine.py | 1,589 | Multi-collection RAG retrieval and synthesis |
| risk_calculators.py | 2,397 | 6 validated risk calculator implementations |
| scheduler.py | 612 | APScheduler-based periodic ingest scheduler |
Ingest Parsers (src/ingest/)
| File | Lines | Description |
|---|---|---|
| base.py | 224 | Abstract base parser class |
| pubmed_parser.py | 477 | PubMed cardiovascular literature via NCBI E-utilities |
| clinical_trials_parser.py | 713 | ClinicalTrials.gov cardiovascular trials |
| imaging_parser.py | 519 | Cardiac imaging reports and protocols |
| ecg_parser.py | 424 | ECG interpretation data |
| guideline_parser.py | 555 | ACC/AHA/ESC guideline documents |
| device_parser.py | 428 | FDA-cleared cardiovascular devices |
| hemodynamics_parser.py | 402 | Catheterization and hemodynamic data |
API Layer (api/)
| File | Lines | Description |
|---|---|---|
| main.py | 433 | FastAPI application with CORS, error handling, lifespan |
| routes/cardio_clinical.py | 1,020 | Clinical query, risk calc, GDMT, workflow endpoints |
| routes/reports.py | 306 | Report generation and export endpoints |
| routes/events.py | 147 | SSE event streaming |
Application (app/)
| File | Lines | Description |
|---|---|---|
| cardio_ui.py | 1,182 | Streamlit 10-tab cardiovascular UI |
Configuration (config/)
| File | Lines | Description |
|---|---|---|
| settings.py | 181 | Pydantic BaseSettings with validation |
Scripts (scripts/)
| File | Lines | Description |
|---|---|---|
| setup_collections.py | 231 | Create/drop Milvus collections and seed |
| seed_knowledge.py | 412 | Seed knowledge graph into collections |
| run_ingest.py | 275 | Execute all ingest parsers |
Tests (tests/)
| File | Lines | Description |
|---|---|---|
| conftest.py | 15 | Pytest configuration and fixtures |
Infrastructure
| File | Description |
|---|---|
| docker-compose.yml | Full stack: etcd, MinIO, Milvus, Streamlit, API, setup |
| requirements.txt | Python dependencies |
| .streamlit/config.toml | Streamlit theme and server configuration |
| README.md | Project overview and quickstart |
5. Collections Catalog
12 Cardiology-Specific Collections + 1 Shared
All collections use BGE-small-en-v1.5 embeddings (384-dim), IVF_FLAT index, COSINE similarity.
| # | Collection Name | Description | Search Weight |
|---|---|---|---|
| 1 | cardio_literature | Published cardiovascular research, reviews, meta-analyses | 0.10 |
| 2 | cardio_trials | Cardiovascular clinical trials and landmark trial results | 0.08 |
| 3 | cardio_imaging | Cardiac imaging protocols, findings, measurements (echo, CT, MRI, nuclear) | 0.10 |
| 4 | cardio_electrophysiology | ECG interpretation, arrhythmia classification, EP data | 0.08 |
| 5 | cardio_heart_failure | HF classification, GDMT protocols, management algorithms | 0.10 |
| 6 | cardio_valvular | Valvular heart disease assessment and intervention criteria | 0.08 |
| 7 | cardio_prevention | Preventive cardiology: risk stratification, lipid management | 0.10 |
| 8 | cardio_interventional | Interventional procedures, techniques, outcomes | 0.07 |
| 9 | cardio_oncology | Cardio-oncology surveillance, cardiotoxicity detection | 0.06 |
| 10 | cardio_devices | FDA-cleared cardiovascular AI devices, implantable devices | 0.04 |
| 11 | cardio_guidelines | ACC/AHA/ESC/HRS clinical practice guidelines | 0.10 |
| 12 | cardio_hemodynamics | Catheterization data, pressure tracings, derived calculations | 0.06 |
| 13 | genomic_evidence | Shared genomic evidence (read-only, 3.5M variants) | 0.03 |
Total weights: 1.00
Collection Schema
Each collection follows a consistent schema:
| Field | Type | Description |
|---|---|---|
| id | INT64 (PK, auto) | Unique identifier |
| text | VARCHAR(65535) | Source text content |
| embedding | FLOAT_VECTOR(384) | BGE-small-en-v1.5 embedding |
| source | VARCHAR(512) | Source identifier (DOI, URL, guideline ID) |
| title | VARCHAR(1024) | Document title |
| metadata | JSON | Additional structured metadata |
| created_at | VARCHAR(32) | Ingestion timestamp |
6. Knowledge Graph
The knowledge graph (src/knowledge.py, 1,431 lines) provides structured domain data across 7 categories:
6.1 Cardiovascular Conditions (45 entries)
Each condition includes: aliases, ICD-10 code, prevalence, diagnostic criteria, risk factors, imaging modalities, treatment options, associated genes, guideline references, and cross-modal trigger flag.
Conditions by category:
| Category | Conditions |
|---|---|
| Cardiomyopathies (7) | HCM, DCM, ARVC, Takotsubo, Restrictive CM, Cardiac Amyloidosis, Cardiac Sarcoidosis |
| Coronary Disease (3) | CAD, ACS, Stable Angina |
| Heart Failure (2) | HFrEF, HFpEF |
| Arrhythmias (7) | AF, Atrial Flutter, VT, LQTS, Brugada, CPVT, WPW |
| Valvular Disease (5) | Aortic Stenosis, Aortic Regurgitation, Mitral Regurgitation, Mitral Stenosis, Tricuspid Regurgitation |
| Vascular (3) | Pulmonary Hypertension, Aortic Dissection, Thoracic Aortic Aneurysm |
| Inflammatory (3) | Myocarditis, Pericarditis, Infective Endocarditis |
| Metabolic/Other (2) | Familial Hypercholesterolemia, Cardiac Tamponade |
6.2 Cardiac Biomarkers (29 entries)
Each biomarker includes: full name, LOINC code, reference ranges (age/sex stratified where applicable), clinical use, kinetics, confounders, and guideline thresholds.
Biomarkers: hs-cTnI, hs-cTnT, NT-proBNP, BNP, CK-MB, D-dimer, hsCRP, LDL-C, HDL-C, Total Cholesterol, Triglycerides, Lp(a), ApoB, HbA1c, Creatinine/eGFR, Potassium, Magnesium, Ferritin/Iron/TSAT, Lactate, Procalcitonin, IL-6
6.3 Cardiac Drug Classes (32 entries)
Each drug class includes: specific drugs, mechanism, indications, contraindications, key trials, target doses, and monitoring parameters.
Drug classes: Beta-blockers, ACE inhibitors, ARBs, ARNI, SGLT2 inhibitors, MRA, Loop diuretics, Thiazide diuretics, DHP CCBs, Non-DHP CCBs, Statins, PCSK9 inhibitors, Ezetimibe, Antiplatelets, DOACs, Warfarin, Heparin/LMWH, Class I antiarrhythmics, Class III antiarrhythmics, Digoxin, Nitrates, Hydralazine/ISDN, Ivabradine, Mavacamten, GLP-1 receptor agonists, Inotropes
6.4 Cardiovascular Genes (56 entries)
Each gene includes: full name, chromosome locus, function, associated conditions, inheritance pattern, key variants, clinical significance, and genetic testing indication.
Gene categories:
- Cardiomyopathy (11): MYH7, MYBPC3, TNNT2, TNNI3, TPM1, ACTC1, MYL2, MYL3, TTN, LMNA, RBM20
- Desmosomal/ARVC (6): PKP2, DSP, DSG2, DSC2, JUP, TMEM43
- Channelopathy (8): SCN5A, KCNQ1, KCNH2, KCNJ2, RYR2, CASQ2, CALM1, ANK2
- Aortopathy (7): FBN1, TGFBR1, TGFBR2, SMAD3, ACTA2, MYH11, COL3A1
- Lipid/Metabolic (5): LDLR, PCSK9, APOB, APOE, LPA
- Congenital/Other (4): NKX2-5, GATA4, TBX5, GLA
6.5 Imaging Modalities (27 protocols: 12 echo, 7 CMR, 5 nuclear, 3 CT)
Each modality includes: full name, protocols, key measurements with reference values, indications, contraindications, and guideline society.
Modalities: TTE, TEE, Stress Echo, Cardiac CT Calcium Score, Coronary CTA, Cardiac MRI, Cardiac PET, SPECT MPI, MUGA, Right Heart Catheterization, Coronary Angiography, 12-Lead ECG, Holter Monitor, Event Monitor/ILR, Device Interrogation
6.6 Guideline Recommendations (63 entries across 20 guideline documents)
Structured guideline entries with: society, year, recommendation text, class (I/IIa/IIb/III), evidence level (A/B/C), condition, and source citation.
Guideline coverage:
- ACC/AHA 2022 Heart Failure (8 recommendations)
- ACC/AHA 2020 Valvular Heart Disease (6 recommendations)
- ACC/AHA 2018 Cholesterol (6 recommendations)
- ACC/AHA 2019 Primary Prevention (3 recommendations)
- ACC/AHA/HRS 2023 Atrial Fibrillation (5 recommendations)
- ACC/AHA 2024 HCM (6 recommendations)
- ESC 2023 ACS (4 recommendations)
- ESC 2022 Cardio-Oncology (4 recommendations)
- Additional key recommendations (10)
6.7 Entity Aliases (167 entries)
Bidirectional abbreviation-to-full-name mapping for cardiovascular terminology, enabling query expansion and entity recognition via 18 synonym maps. Covers abbreviations for conditions, procedures, medications, imaging modalities, biomarkers, professional societies, and clinical measurements.
6.8 Condition-Drug Map (11 entries)
Direct mapping from conditions to applicable drug classes for treatment recommendation generation.
6.9 Imaging Trigger Map (18 entries)
Maps imaging findings to recommended genetic testing panels. Expanded from 6 to 18 genomic trigger patterns, including:
- Unexplained LVH >=15mm --> HCM/Fabry/Danon gene panel
- Non-ischemic LGE with DCM --> DCM gene panel
- RV dilation with arrhythmia --> ARVC desmosomal panel
- Diffuse LGE + low voltage --> TTR (amyloidosis)
- Aortic root dilation --> Aortopathy panel
- Premature coronary calcification --> FH panel
- Plus 12 additional trigger patterns for channelopathies, congenital disease, and infiltrative cardiomyopathies
7. Clinical Workflows
Eleven clinical workflows (src/clinical_workflows.py) cover the highest-impact cardiovascular use cases:
| # | Workflow | Key Actions |
|---|---|---|
| 1 | CAD Assessment | Calcium scoring interpretation, CAD-RADS classification, plaque characterization, FFR/iFR correlation, revascularization criteria |
| 2 | Heart Failure Classification & GDMT | NYHA/ACC staging, EF categorization (HFrEF/HFmrEF/HFpEF/HFimpEF), 4-pillar GDMT optimization, device eligibility (ICD/CRT) |
| 3 | Valvular Heart Disease Quantification | Severity grading per ASE criteria, intervention threshold assessment, TAVR vs SAVR decision support |
| 4 | Arrhythmia Detection & Management | ECG interpretation (15 criteria, 10 arrhythmia protocols), CHA2DS2-VASc scoring, anticoagulation recommendation, rhythm vs rate control strategy |
| 5 | Cardiac MRI Tissue Characterization | LGE pattern analysis (ischemic vs non-ischemic), T1/T2 mapping interpretation, ECV quantification, parametric analysis |
| 6 | Stress Test Interpretation | Duke Treadmill Score, SPECT/PET MPI defect analysis, stress echo wall motion, ischemic burden quantification |
| 7 | Preventive Risk Stratification | ASCVD risk calculation, statin eligibility algorithm, CAC reclassification, risk enhancer assessment |
| 8 | Cardio-Oncology Surveillance | Cardiotoxicity risk stratification, CTRCD detection (LVEF decline, GLS decline), biomarker monitoring, cardioprotection |
| 9 | Acute Decompensated Heart Failure | Hemodynamic profiling (14 parameters, 6 cath lab protocols), diuretic strategy, inotrope selection, mechanical support evaluation |
| 10 | Post-Myocardial Infarction | Post-MI risk stratification, secondary prevention, cardiac rehabilitation, follow-up imaging timing |
| 11 | Myocarditis/Pericarditis | Lake Louise criteria, inflammatory biomarker tracking, activity restriction guidance, colchicine/NSAID protocols |
Each workflow:
- Receives a CardioQuery with patient context
- Queries relevant Milvus collections with workflow-specific weights
- Executes applicable risk calculators
- Generates WorkflowResult with findings, recommendations, guideline references, severity, and cross-modal triggers
8. Risk Calculators
Six validated cardiovascular risk calculators (src/risk_calculators.py, 2,397 lines):
8.1 ASCVD Pooled Cohort Equations (Goff 2013)
- Purpose: 10-year atherosclerotic cardiovascular disease risk
- Inputs: Age, sex, race, total cholesterol, HDL, systolic BP, BP treatment status, diabetes, smoking
- Coefficients: 4 sex/race cohorts (white female, African American female, white male, African American male)
- Output: 10-year risk percentage + risk category (low <5%, borderline 5-7.5%, intermediate 7.5-20%, high >=20%)
- Source: Goff DC Jr, et al. J Am Coll Cardiol. 2014;63(25 Pt B):2935-2959
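The category cut-offs listed above can be expressed as a small lookup. This is a minimal sketch, not the agent's code: the function name is hypothetical, and it assumes each boundary value belongs to the higher category (5% is borderline, 7.5% is intermediate, 20% is high). It does not implement the Pooled Cohort Equations themselves.

```python
def ascvd_risk_category(ten_year_risk_pct: float) -> str:
    """Map a 10-year ASCVD risk percentage to the categories listed above.

    Assumption: boundary values fall into the higher category.
    """
    if not 0.0 <= ten_year_risk_pct <= 100.0:
        raise ValueError("risk must be a percentage between 0 and 100")
    if ten_year_risk_pct < 5.0:
        return "low"
    if ten_year_risk_pct < 7.5:
        return "borderline"
    if ten_year_risk_pct < 20.0:
        return "intermediate"
    return "high"
```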
8.2 HEART Score
- Purpose: Chest pain risk stratification in the emergency department
- Inputs: History, ECG, Age, Risk factors, Troponin (0-2 points each)
- Output: Score 0-10; Low (0-3, 1.7% MACE), Moderate (4-6, 16.6% MACE), High (7-10, 50.1% MACE)
- Source: Six AJ, et al. Neth Heart J. 2008;16(6):191-196
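Because the HEART score is a plain sum of five components, the scoring and categorization can be sketched in a few lines. The function and parameter names here are illustrative assumptions; each argument is the already-assigned component points (0-2), not the raw clinical value.

```python
def heart_score(history: int, ecg: int, age: int,
                risk_factors: int, troponin: int) -> tuple[int, str]:
    """Sum the five HEART components (0-2 points each) and categorize."""
    components = {
        "history": history,
        "ecg": ecg,
        "age": age,
        "risk_factors": risk_factors,
        "troponin": troponin,
    }
    for name, points in components.items():
        if points not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2 points")
    total = sum(components.values())
    # Categories as listed above: low 0-3, moderate 4-6, high 7-10.
    if total <= 3:
        return total, "low"
    if total <= 6:
        return total, "moderate"
    return total, "high"
```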
8.3 CHA2DS2-VASc
- Purpose: Atrial fibrillation stroke risk assessment
- Inputs: CHF, Hypertension, Age (>=75 = 2 pts, 65-74 = 1 pt), Diabetes, Stroke/TIA history (2 pts), Vascular disease, Sex (female = 1 pt)
- Output: Score 0-9 with annual stroke rate lookup
- Source: Lip GYH, et al. Chest 2010;137(2):263-272
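The point assignments above translate directly into code. The sketch below is illustrative only: the function name and keyword arguments are assumptions, and the annual stroke rate lookup is omitted.

```python
def cha2ds2_vasc(*, age: int, female: bool, chf: bool, hypertension: bool,
                 diabetes: bool, stroke_tia: bool, vascular_disease: bool) -> int:
    """Compute the CHA2DS2-VASc score (0-9) from the inputs listed above."""
    # Age: >=75 scores 2 points, 65-74 scores 1 point.
    score = 2 if age >= 75 else 1 if age >= 65 else 0
    # Prior stroke/TIA scores 2 points.
    score += 2 if stroke_tia else 0
    # Remaining items score 1 point each.
    score += sum(bool(x) for x in (chf, hypertension, diabetes,
                                   vascular_disease, female))
    return score
```

For example, a 78-year-old woman with hypertension and no other risk factors scores 2 (age) + 1 (sex) + 1 (hypertension) = 4.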
8.4 HAS-BLED
- Purpose: Bleeding risk assessment on anticoagulation
- Inputs: Hypertension, Abnormal renal/liver function, Stroke, Bleeding history, Labile INR, Elderly (>65), Drugs/alcohol
- Output: Score 0-9; Low (0-1), Moderate (2), High (>=3)
- Source: Pisters R, et al. Chest 2010;138(5):1093-1100
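HAS-BLED reaches its maximum of 9 because the renal/liver and drugs/alcohol items each contribute up to two points. A minimal sketch, with hypothetical function and argument names:

```python
def has_bled(*, hypertension: bool, abnormal_renal: bool, abnormal_liver: bool,
             stroke: bool, bleeding_history: bool, labile_inr: bool,
             age: int, drugs: bool, alcohol: bool) -> tuple[int, str]:
    """Compute HAS-BLED (0-9) and the risk category listed above."""
    items = (
        hypertension,
        abnormal_renal,    # A: 1 point for renal dysfunction
        abnormal_liver,    # A: 1 point for liver dysfunction
        stroke,
        bleeding_history,
        labile_inr,
        age > 65,          # E: elderly
        drugs,             # D: 1 point for predisposing drugs
        alcohol,           # D: 1 point for alcohol use
    )
    score = sum(bool(x) for x in items)
    category = "low" if score <= 1 else "moderate" if score == 2 else "high"
    return score, category
```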
8.5 MAGGIC Heart Failure Risk Score
- Purpose: Heart failure 1-year and 3-year mortality prediction
- Inputs: Age, sex, LVEF, NYHA class, systolic BP, BMI, creatinine, diabetes, beta-blocker use, ACEi/ARB use, COPD, HF duration, smoking
- Output: Integer score with mortality percentage lookup (51 tiers for 1-year and 3-year)
- Source: Pocock SJ, et al. Eur Heart J. 2013;34(19):1404-1413
8.6 EuroSCORE II
- Purpose: Cardiac surgical mortality risk prediction
- Inputs: Patient factors (age, sex, renal function, mobility, prior surgery, lung disease, endocarditis, critical state, diabetes), cardiac factors (NYHA, angina, LVEF, recent MI, pulmonary hypertension), operation factors (urgency, procedure weight, thoracic aorta)
- Coefficients: Logistic regression with 28 beta coefficients
- Output: Predicted operative mortality percentage
- Source: Nashef SA, et al. Eur J Cardiothorac Surg. 2012;41(4):734-745
Risk Calculator Engine
The RiskCalculatorEngine class provides:
- calculate(score_type, input, extra) -- compute a single score
- calculate_all_applicable(input, extra) -- compute all scores for which sufficient data exists
- Input validation with descriptive error messages via RiskCalculatorError
- Each calculator returns RiskScoreResult with score value, risk category, interpretation, recommendations, and guideline reference
9. GDMT Optimizer
The GDMT optimizer (src/gdmt_optimizer.py, 2,457 lines) implements evidence-based guideline-directed medical therapy optimization for heart failure with reduced ejection fraction (HFrEF).
Four Pillars of GDMT
| Pillar | Drug Class | Target Dose | Key Trials |
|---|---|---|---|
| 1. Beta-Blocker | Carvedilol, Metoprolol Succinate, Bisoprolol | Carvedilol 25mg BID, Metoprolol 200mg QD, Bisoprolol 10mg QD | MERIT-HF, COPERNICUS, CIBIS-II |
| 2. ARNI/ACEi/ARB | Sacubitril/Valsartan (preferred), Enalapril, Valsartan | Sacubitril/Valsartan 97/103mg BID | PARADIGM-HF, PARAGON-HF |
| 3. MRA | Spironolactone, Eplerenone | Spironolactone 25-50mg QD, Eplerenone 50mg QD | RALES, EMPHASIS-HF |
| 4. SGLT2i | Dapagliflozin, Empagliflozin | 10mg QD (both) | DAPA-HF, EMPEROR-Reduced |
Optimizer Features
- EF-stratified recommendations: Different GDMT algorithms for HFrEF, HFmrEF, HFpEF, and HFimpEF
- Titration sequencing: Recommends initiation order and uptitration steps
- Contraindication checking: Validates each pillar against patient-specific contraindications (K+, eGFR, BP, HR)
- Lab monitoring guidance: Generates lab check schedules for potassium, creatinine, BP after each titration
- Device therapy eligibility: Evaluates ICD and CRT criteria based on LVEF, QRS duration, LBBB, NYHA class
- Additional therapies (7 total): H-ISDN for African American patients, ivabradine for persistent tachycardia, IV iron for iron deficiency, finerenone (non-steroidal MRA), omecamtiv mecarbil (cardiac myosin activator), sotagliflozin (dual SGLT1/2 inhibitor), vericiguat (sGC stimulator)
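One quantity a titration step needs is how far the current dose is from target. The sketch below computes the percentage of target total daily dose using the single-valued targets from the Four Pillars table. The dictionary keys, function name, and the idea of comparing total daily doses are assumptions made for illustration; the optimizer's actual titration logic is not shown in this document.

```python
# Target total daily doses in mg, derived from the Four Pillars table
# (e.g. carvedilol 25 mg BID = 50 mg/day). Key names are illustrative.
TARGET_DAILY_DOSE_MG = {
    "carvedilol": 50.0,
    "metoprolol_succinate": 200.0,
    "bisoprolol": 10.0,
    "eplerenone": 50.0,
    "dapagliflozin": 10.0,
    "empagliflozin": 10.0,
}


def percent_of_target(drug: str, dose_mg: float, doses_per_day: int) -> float:
    """Return the current total daily dose as a percentage of target."""
    if dose_mg < 0 or doses_per_day < 1:
        raise ValueError("dose must be >= 0 and doses_per_day >= 1")
    target = TARGET_DAILY_DOSE_MG[drug]  # KeyError for drugs not in the table
    return round(100.0 * dose_mg * doses_per_day / target, 1)
```

For example, carvedilol 12.5 mg BID is 25 mg/day against a 50 mg/day target, which is 50% of target.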
10. Cross-Modal Triggers
The cross-modal engine (src/cross_modal.py, 1,734 lines) links cardiac imaging findings to genomic workup recommendations via the shared genomic_evidence collection.
Trigger Patterns
| Imaging Finding | Triggered Gene Panel | Suspected Conditions |
|---|---|---|
| Unexplained LVH >=15mm | MYH7, MYBPC3, TNNT2, TNNI3, TPM1, GLA, LAMP2, PRKAG2 | HCM, Fabry Disease, Danon Disease |
| Non-ischemic LGE with DCM | TTN, LMNA, RBM20, MYH7, DSP, FLNC, BAG3 | DCM, Arrhythmogenic CM |
| RV dilation with arrhythmia | PKP2, DSP, DSG2, DSC2, JUP, TMEM43 | ARVC |
| Diffuse LGE + low voltage ECG | TTR | Cardiac Amyloidosis |
| Aortic root dilation (>=4.0cm) | FBN1, TGFBR1, TGFBR2, SMAD3, ACTA2, MYH11, COL3A1 | Marfan, Loeys-Dietz, Familial TAAD |
| Premature coronary calcification | LDLR, PCSK9, APOB | Familial Hypercholesterolemia |
Cross-Modal Workflow
- Imaging findings are extracted from the workflow result
- Pattern matching identifies applicable trigger entries from IMAGING_TRIGGER_MAP
- For each matched trigger, the genomic_evidence collection is queried for relevant variant data
- Results are packaged as CrossModalTrigger objects with source, finding, gene panel, conditions, and clinical rationale
- Triggers are included in the final CardioResponse for clinician review
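The pattern-matching step can be sketched with two of the triggers from the table above that have explicit numeric thresholds. Everything structural here is an assumption: the `Trigger` class, the `EXAMPLE_TRIGGERS` list, and the finding keys (`max_wall_thickness_mm`, `lvh_explained`, `aortic_root_cm`) are hypothetical and do not describe the real IMAGING_TRIGGER_MAP layout. The genomic_evidence query is omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trigger:
    finding: str
    gene_panel: tuple[str, ...]
    conditions: tuple[str, ...]
    matches: Callable[[dict], bool]  # predicate over extracted findings


# Two rows from the Trigger Patterns table; finding keys are illustrative.
EXAMPLE_TRIGGERS = [
    Trigger(
        finding="Unexplained LVH >=15mm",
        gene_panel=("MYH7", "MYBPC3", "TNNT2", "TNNI3", "TPM1",
                    "GLA", "LAMP2", "PRKAG2"),
        conditions=("HCM", "Fabry Disease", "Danon Disease"),
        matches=lambda f: f.get("max_wall_thickness_mm", 0) >= 15
        and not f.get("lvh_explained", False),
    ),
    Trigger(
        finding="Aortic root dilation (>=4.0cm)",
        gene_panel=("FBN1", "TGFBR1", "TGFBR2", "SMAD3",
                    "ACTA2", "MYH11", "COL3A1"),
        conditions=("Marfan", "Loeys-Dietz", "Familial TAAD"),
        matches=lambda f: f.get("aortic_root_cm", 0) >= 4.0,
    ),
]


def match_triggers(findings: dict) -> list[Trigger]:
    """Return every example trigger whose predicate matches the findings."""
    return [t for t in EXAMPLE_TRIGGERS if t.matches(findings)]
```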
11. RAG Engine
The RAG engine (src/rag_engine.py, 1,589 lines) implements multi-collection retrieval-augmented generation:
Search Pipeline
- Embedding: Input query is embedded using BGE-small-en-v1.5 (384-dim)
- Multi-collection search: Queries all 13 collections in parallel with top_k=5 per collection
- Score thresholding: Filters results below SCORE_THRESHOLD=0.4
- Weighted ranking: Applies collection-specific weights (see Collections Catalog)
- Deduplication: Removes near-duplicate content across collections
- Citation scoring: Classifies citations as high (>=0.75), medium (>=0.60), or standard confidence
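The post-retrieval steps (thresholding, weighting, deduplication, citation classification) can be sketched as follows. The threshold, the weights, and the confidence cut-offs come from this document; the `Hit` class, the function names, multiplying similarity by the collection weight, and exact-match deduplication on normalized text are simplifying assumptions, since the document does not specify how weights are combined or how near-duplicates are detected.

```python
from dataclasses import dataclass

SCORE_THRESHOLD = 0.4
# Subset of the weights from the Collections Catalog.
COLLECTION_WEIGHTS = {
    "cardio_guidelines": 0.10,
    "cardio_literature": 0.10,
    "cardio_trials": 0.08,
    "genomic_evidence": 0.03,
}


@dataclass
class Hit:
    collection: str
    similarity: float  # cosine similarity returned by the vector search
    text: str


def citation_confidence(similarity: float) -> str:
    """Classify a hit as high (>=0.75), medium (>=0.60), or standard."""
    if similarity >= 0.75:
        return "high"
    if similarity >= 0.60:
        return "medium"
    return "standard"


def rank_hits(hits: list[Hit]) -> list[tuple[Hit, float, str]]:
    """Threshold, de-duplicate, weight, and sort hits from all collections."""
    seen: set[str] = set()
    ranked = []
    for hit in hits:
        if hit.similarity < SCORE_THRESHOLD:
            continue  # below the score threshold
        key = " ".join(hit.text.lower().split())  # normalized text as dedup key
        if key in seen:
            continue
        seen.add(key)
        weight = COLLECTION_WEIGHTS.get(hit.collection, 0.0)
        ranked.append((hit, hit.similarity * weight,
                       citation_confidence(hit.similarity)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```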
LLM Synthesis
- Provider: Anthropic (Claude Sonnet 4.6)
- System prompt: Cardiology domain expert with guideline citation requirements
- Context window: Retrieved passages assembled with source attribution
- Output format: Structured clinical answer with inline citations
Conversation Memory
- Maintains up to 3 conversation turns for contextual follow-up questions
- Configurable via MAX_CONVERSATION_CONTEXT setting
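A bounded turn history like the one described above is commonly kept in a fixed-length deque. This is a sketch under that assumption; the class and method names are hypothetical, and only the setting name and its default of 3 come from this document.

```python
from collections import deque

MAX_CONVERSATION_CONTEXT = 3  # default number of turns, per this document


class ConversationMemory:
    """Keep the most recent question/answer turns, dropping the oldest."""

    def __init__(self, max_turns: int = MAX_CONVERSATION_CONTEXT) -> None:
        self._turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add_turn(self, question: str, answer: str) -> None:
        self._turns.append((question, answer))

    def as_context(self) -> str:
        """Render the retained turns as text to prepend to the next prompt."""
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self._turns)

    def __len__(self) -> int:
        return len(self._turns)
```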
12. Query Expansion
The query expansion module (src/query_expansion.py, 2,025 lines) enhances retrieval through:
Expansion Strategies
- Synonym expansion: Maps abbreviations to full names using 167 entity aliases across 18 synonym maps (e.g., "HCM" -> "hypertrophic cardiomyopathy")
- Condition identification: Extracts cardiovascular conditions from the query using knowledge graph matching
- Drug identification: Identifies mentioned drug classes and specific medications
- Imaging modality detection: Recognizes imaging modality references for collection routing
- Sub-question decomposition: Breaks complex queries into searchable sub-questions
- Search strategy selection: Chooses between broad, targeted, comparative, and clinical strategies
- Topic identification: Tags query with relevant topics for collection weight adjustment
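Synonym expansion, the first strategy above, can be illustrated with a three-entry alias table. The dictionary contents are well-known abbreviations; the `ENTITY_ALIASES` name, the function name, and the choice to append the full name in parentheses are assumptions, not the module's actual behavior.

```python
import re

# Tiny illustrative subset of the 167 entity aliases.
ENTITY_ALIASES = {
    "HCM": "hypertrophic cardiomyopathy",
    "AF": "atrial fibrillation",
    "HFrEF": "heart failure with reduced ejection fraction",
}


def expand_query(query: str) -> str:
    """Append the full name for each abbreviation found as a whole word."""
    expanded = query
    for abbr, full in ENTITY_ALIASES.items():
        pattern = rf"\b{re.escape(abbr)}\b"  # case-sensitive whole-word match
        if re.search(pattern, query) and full.lower() not in query.lower():
            expanded += f" ({full})"
    return expanded
```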
Output: SearchPlan
The query expansion module produces a SearchPlan dataclass containing:
- Original question
- Identified conditions, drugs, and imaging modalities
- Relevant workflow types
- Search strategy (broad/targeted/comparative/clinical)
- Decomposed sub-questions
- Identified topics
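The items listed above correspond to eight fields, which matches the field count given for SearchPlan in the Data Models section. A sketch of what such a dataclass might look like follows; the field names and types are guesses, and only the four strategy names are taken from this document.

```python
from dataclasses import dataclass, field

STRATEGIES = ("broad", "targeted", "comparative", "clinical")


@dataclass
class SearchPlan:
    """Illustrative shape only; field names are assumptions."""
    question: str
    conditions: list[str] = field(default_factory=list)
    drugs: list[str] = field(default_factory=list)
    imaging_modalities: list[str] = field(default_factory=list)
    workflows: list[str] = field(default_factory=list)
    strategy: str = "broad"
    sub_questions: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.strategy not in STRATEGIES:
            raise ValueError(f"strategy must be one of {STRATEGIES}")
```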
13. Data Models
Enums (16)
| Enum | Values | Description |
|---|---|---|
| CardioWorkflowType | 12 | Clinical workflow types (9 original + 3 new: acute_decompensated_hf, post_mi, myocarditis_pericarditis) |
| RiskScoreType | 6 | Risk calculator types |
| SeverityLevel | 5 | Clinical finding severity (critical/high/moderate/low/informational) |
| ImagingModality | 5 | Echo, CT, MRI, Nuclear, Angiography |
| HeartFailureClass | 4 | NYHA I-IV |
| HeartFailureStage | 4 | ACC/AHA Stage A-D |
| EjectionFractionCategory | 4 | HFrEF, HFmrEF, HFpEF, HFimpEF |
| ValveSeverity | 4 | Mild, Moderate, Severe, Critical |
| CADRADSScore | 7 | CAD-RADS 0-5 stenosis grading |
| AnticoagulationRecommendation | 4 | No anticoag to Contraindicated |
| GDMTPillar | 4 | Beta-blocker, ARNI/ACEi/ARB, MRA, SGLT2i |
| GDMTStatus | 6 | Not started through Intolerant |
| CardiotoxicityRisk | 4 | Low, Moderate, High, Very High |
| LGEPattern | 7 | Subendocardial through None |
| GuidelineClass | 5 | ACC/AHA recommendation classes I/IIa/IIb/III |
| EvidenceLevel | 5 | A, B-R, B-NR, C-LD, C-EO |
Pydantic Models (13)
| Model | Fields | Purpose |
|---|---|---|
| CardioQuery | 3 | Input query with optional workflow and patient context |
| CardioSearchResult | 4 | Single search result from any collection |
| RiskScoreInput | 30+ | Universal risk calculator input (demographics, vitals, labs, comorbidities, HF-specific, surgical) |
| RiskScoreResult | 6 | Calculator output (score, category, interpretation, recommendations) |
| GDMTMedication | 6 | Single GDMT pillar medication status |
| GDMTRecommendation | 5 | GDMT optimization output |
| ValveAssessment | 6 | Valve lesion assessment |
| ECGInterpretation | 6 | Structured ECG interpretation |
| ImagingResult | 5 | Cardiac imaging result with cross-modal triggers |
| CardiotoxicityAssessment | 8 | Cardio-oncology surveillance assessment |
| CrossModalTrigger | 5 | Imaging-to-genomics trigger |
| WorkflowResult | 7 | Single workflow execution output |
| CardioResponse | 6 | Top-level agent response |
Dataclass (1)
| Class | Fields | Purpose |
|---|---|---|
| SearchPlan | 8 | Pre-retrieval search planning output |
14. Export System
The export system (src/export.py, 1,379 lines) supports multiple output formats:
| Format | Description |
|---|---|
| PDF | Structured clinical report with risk scores, GDMT status, recommendations, citations |
| CSV | Tabular export of risk scores, medications, findings |
| JSON | Full CardioResponse serialization |
| FHIR | HL7 FHIR R4-compatible DiagnosticReport and RiskAssessment resources |
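As an illustration of the FHIR row, a risk score could be wrapped in an R4 RiskAssessment resource along these lines. The element names (`resourceType`, `status`, `subject`, `method`, `prediction`, `qualitativeRisk`, `rationale`) are standard R4 RiskAssessment elements; which of them the export system actually populates, and the function name and arguments, are assumptions.

```python
import json


def risk_assessment_resource(patient_id: str, method: str, score: float,
                             category: str, interpretation: str) -> dict:
    """Build a minimal FHIR R4 RiskAssessment as a plain dict (sketch)."""
    return {
        "resourceType": "RiskAssessment",
        "status": "final",
        "subject": {"reference": f"Patient/{patient_id}"},
        "method": {"text": method},
        "prediction": [
            {
                "outcome": {"text": f"{method} score {score}"},
                "qualitativeRisk": {"text": category},
                "rationale": interpretation,
            }
        ],
    }


if __name__ == "__main__":
    resource = risk_assessment_resource(
        "example-123", "CHA2DS2-VASc", 4, "high",
        "Hypothetical interpretation text.",
    )
    print(json.dumps(resource, indent=2))
```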
15. API Reference
Base URL: http://localhost:8126
Core Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check with collection count and vector totals |
| POST | /v1/cardio/query | General cardiology query |
| POST | /v1/cardio/risk/ascvd | ASCVD risk calculation |
| POST | /v1/cardio/risk/heart | HEART score calculation |
| POST | /v1/cardio/risk/cha2ds2-vasc | CHA2DS2-VASc calculation |
| POST | /v1/cardio/risk/has-bled | HAS-BLED calculation |
| POST | /v1/cardio/risk/maggic | MAGGIC HF risk calculation |
| POST | /v1/cardio/risk/euroscore-ii | EuroSCORE II calculation |
| POST | /v1/cardio/gdmt/optimize | GDMT optimization |
| POST | /v1/cardio/workflow/cad | CAD assessment workflow |
| POST | /v1/cardio/workflow/heart-failure | Heart failure workflow |
| POST | /v1/cardio/workflow/valvular | Valvular disease workflow |
| POST | /v1/cardio/workflow/arrhythmia | Arrhythmia workflow |
| POST | /v1/cardio/workflow/cardiac-mri | Cardiac MRI workflow |
| POST | /v1/cardio/workflow/stress-test | Stress test workflow |
| POST | /v1/cardio/workflow/prevention | Prevention workflow |
| POST | /v1/cardio/workflow/cardio-onc | Cardio-oncology workflow |
| POST | /v1/cardio/workflow/acute-dhf | Acute decompensated HF workflow |
| POST | /v1/cardio/workflow/post-mi | Post-MI workflow |
| POST | /v1/cardio/workflow/myocarditis-pericarditis | Myocarditis/Pericarditis workflow |
| GET | /v1/cardio/knowledge-version | Knowledge base version info |
| POST | /v1/cardio/report/pdf | Generate PDF report |
| POST | /v1/cardio/report/csv | Generate CSV export |
| POST | /v1/cardio/report/fhir | Generate FHIR resources |
| GET | /v1/cardio/collections | Collection statistics |
| GET | /metrics | Prometheus metrics endpoint |
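A client call to one of the risk endpoints might be assembled as below, using only the standard library. The base URL and path come from the table above; the JSON field names in the payload are hypothetical, because the request schema is not documented in this section.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8126"


def build_risk_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the CHA2DS2-VASc endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/cardio/risk/cha2ds2-vasc",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names below are assumptions about the request body.
example_payload = {"age": 78, "sex": "female", "hypertension": True}
request = build_risk_request(example_payload)
```

With the API server running, the request can be sent with `urllib.request.urlopen(request)` and the JSON body read from the response.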
16. UI Guide
The Streamlit UI (app/cardio_ui.py, 1,182 lines) provides 10 tabs at port 8536:
| Tab | Description |
|---|---|
| 1. Dashboard | System overview: collection stats, recent queries, health status |
| 2. CAD Assessment | Coronary artery disease evaluation with calcium scoring and CAD-RADS |
| 3. Heart Failure | HF classification, GDMT status, optimization recommendations |
| 4. Valvular Disease | Valve severity grading, intervention criteria assessment |
| 5. Arrhythmia | ECG interpretation, CHA2DS2-VASc, anticoagulation decision |
| 6. Cardiac MRI | Tissue characterization, LGE pattern analysis, parametric mapping |
| 7. Stress Test | Stress test interpretation, ischemic burden analysis |
| 8. Prevention | ASCVD risk, statin eligibility, CAC reclassification |
| 9. Cardio-Oncology | Cardiotoxicity surveillance, CTRCD monitoring |
| 10. Reports | Report generation and export (PDF, CSV, FHIR) |
Sidebar: Workflow filter, collection statistics, risk calculator quick access, export options.
17. Ingest Parsers
Seven domain-specific parsers (src/ingest/, 3,742 lines total) process cardiovascular data from external sources:
| Parser | Lines | Source | Target Collection |
|---|---|---|---|
| pubmed_parser.py | 477 | PubMed via NCBI E-utilities | cardio_literature |
| clinical_trials_parser.py | 713 | ClinicalTrials.gov API | cardio_trials |
| imaging_parser.py | 519 | Cardiac imaging report data | cardio_imaging |
| ecg_parser.py | 424 | ECG interpretation data | cardio_electrophysiology |
| guideline_parser.py | 555 | ACC/AHA/ESC guideline PDFs | cardio_guidelines |
| device_parser.py | 428 | FDA 510(k)/De Novo database | cardio_devices |
| hemodynamics_parser.py | 402 | Catheterization data | cardio_hemodynamics |
All parsers extend BaseParser (base.py, 224 lines) which provides:
- Chunking with configurable size and overlap
- Embedding generation via BGE-small-en-v1.5
- Batch Milvus insertion
- Deduplication by content hash
- Error handling and retry logic
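The chunking and deduplication steps above can be sketched as follows. This is a minimal illustration, not the BaseParser code: the function names, the character-based windows, the default `size` and `overlap`, and the whitespace/case normalization before hashing are all assumptions.

```python
import hashlib


def chunk_text(text, size=512, overlap=64):
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def content_hash(chunk):
    """SHA-256 digest of the chunk after collapsing whitespace and lowercasing."""
    normalized = " ".join(chunk.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(chunks, seen=None):
    """Keep the first occurrence of each distinct chunk, preserving order."""
    seen = set() if seen is None else seen
    unique = []
    for chunk in chunks:
        digest = content_hash(chunk)
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

Passing a shared `seen` set across calls lets repeated ingest runs skip chunks that were already inserted, which is one plausible reason to hash content rather than compare source identifiers.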
18. Metrics & Monitoring
Prometheus metrics (src/metrics.py, 537 lines):
| Metric | Type | Description |
|---|---|---|
| `cardio_queries_total` | Counter | Total queries processed |
| `cardio_query_duration_seconds` | Histogram | Query processing latency |
| `cardio_risk_calculations_total` | Counter | Risk calculator invocations by type |
| `cardio_gdmt_optimizations_total` | Counter | GDMT optimization requests |
| `cardio_collection_search_duration` | Histogram | Per-collection search latency |
| `cardio_cross_modal_triggers_total` | Counter | Cross-modal triggers fired |
| `cardio_workflow_executions_total` | Counter | Workflow executions by type |
| `cardio_export_requests_total` | Counter | Export requests by format |
| `cardio_errors_total` | Counter | Error count by type |

Endpoint: GET /metrics (Prometheus scrape target)
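To show roughly what a scrape of `/metrics` could return for one of the labeled counters, the sketch below renders samples in the Prometheus text exposition format using only the standard library. It is a stand-in for illustration, not the contents of src/metrics.py; the `LabeledCounter` class and the `calculator` label name are assumptions.

```python
from collections import Counter


class LabeledCounter:
    """Minimal single-label counter that renders Prometheus text format."""

    def __init__(self, name, help_text, label):
        self.name = name
        self.help_text = help_text
        self.label = label
        self._values = Counter()

    def inc(self, label_value, amount=1):
        """Increase the sample for one label value."""
        self._values[label_value] += amount

    def render(self):
        """Return HELP and TYPE lines followed by one sample line per label value."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for label_value, count in sorted(self._values.items()):
            lines.append(f'{self.name}{{{self.label}="{label_value}"}} {count}')
        return "\n".join(lines) + "\n"


risk_calcs = LabeledCounter(
    "cardio_risk_calculations_total",
    "Risk calculator invocations by type",
    "calculator",  # assumption: the real label name may differ
)
risk_calcs.inc("ascvd")
risk_calcs.inc("ascvd")
risk_calcs.inc("has_bled")
print(risk_calcs.render())
```

In practice a Prometheus client library would handle registration, histograms, and escaping; the hand-rolled renderer only makes the wire format visible.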
19. Scheduler
The APScheduler-based scheduler (src/scheduler.py, 612 lines) manages periodic data ingestion:
| Setting | Default | Description |
|---|---|---|
| `INGEST_SCHEDULE_HOURS` | 168 (weekly) | Hours between ingest runs |
| `INGEST_ENABLED` | False | Enable/disable automatic ingestion |

When enabled, the scheduler runs all 7 ingest parsers sequentially on the configured interval, updating collection data from PubMed, ClinicalTrials.gov, and other external sources.
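The behavior described above can be approximated with the standard library alone. The sketch below is not the APScheduler wiring in src/scheduler.py: `next_run_time` and `run_ingest_cycle` are hypothetical names, and continuing past a failed parser is an assumption about the desired behavior.

```python
from datetime import datetime, timedelta


def next_run_time(last_run, schedule_hours=168):
    """Return when the next ingest cycle is due (168 hours is one week)."""
    return last_run + timedelta(hours=schedule_hours)


def run_ingest_cycle(parsers, enabled=False):
    """Run (name, callable) pairs in order and report a status per parser."""
    if not enabled:
        return {}  # mirrors INGEST_ENABLED=False: nothing runs
    results = {}
    for name, parser in parsers:
        try:
            parser()
            results[name] = "ok"
        except Exception as exc:  # assumption: one failure does not stop the cycle
            results[name] = f"error: {exc}"
    return results


print(next_run_time(datetime(2026, 3, 1)))
```

Running the parsers sequentially keeps load on external APIs and on Milvus predictable, at the cost of a longer total cycle.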
20. Configuration
All configuration lives in config/settings.py (181 lines), which uses Pydantic BaseSettings and reads environment variables with the CARDIO_ prefix:
Key Configuration Groups
| Group | Settings |
|---|---|
| Paths | PROJECT_ROOT, DATA_DIR, CACHE_DIR, REFERENCE_DIR, RAG_PIPELINE_ROOT |
| Milvus | MILVUS_HOST (localhost), MILVUS_PORT (19530), 13 collection names |
| Embeddings | EMBEDDING_MODEL (BGE-small-en-v1.5), EMBEDDING_DIMENSION (384), EMBEDDING_BATCH_SIZE (32) |
| LLM | LLM_PROVIDER (anthropic), LLM_MODEL (claude-sonnet-4-6), ANTHROPIC_API_KEY |
| RAG Search | TOP_K_PER_COLLECTION (5), SCORE_THRESHOLD (0.4), 13 collection weights |
| PubMed | NCBI_API_KEY, PUBMED_MAX_RESULTS (5000) |
| API Server | API_HOST (0.0.0.0), API_PORT (8126) |
| Streamlit | STREAMLIT_PORT (8536) |
| Metrics | METRICS_ENABLED (True) |
| Scheduler | INGEST_SCHEDULE_HOURS (168), INGEST_ENABLED (False) |
| Conversation | MAX_CONVERSATION_CONTEXT (3) |
| Citation | CITATION_HIGH_THRESHOLD (0.75), CITATION_MEDIUM_THRESHOLD (0.60) |
| CORS | CORS_ORIGINS |
| Limits | MAX_REQUEST_SIZE_MB (10) |
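To make the CARDIO_ prefix convention concrete, the sketch below loads a handful of the settings above from environment variables. It uses a plain dataclass as a stand-in for Pydantic BaseSettings so that it runs without third-party packages; `CardioSettings`, `load_settings`, and the accepted boolean spellings are assumptions, while the defaults are taken from the table.

```python
import os
from dataclasses import dataclass

PREFIX = "CARDIO_"


def _read(environ, name, default, cast=str):
    """Return the prefixed variable cast to the target type, or the default."""
    raw = environ.get(PREFIX + name)
    return default if raw is None else cast(raw)


def _to_bool(raw):
    """Interpret common truthy spellings (assumption: mirrors typical env parsing)."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CardioSettings:
    milvus_host: str
    milvus_port: int
    api_port: int
    streamlit_port: int
    top_k_per_collection: int
    score_threshold: float
    ingest_enabled: bool


def load_settings(environ=None):
    """Build settings from a mapping, defaulting to the process environment."""
    env = os.environ if environ is None else environ
    return CardioSettings(
        milvus_host=_read(env, "MILVUS_HOST", "localhost"),
        milvus_port=_read(env, "MILVUS_PORT", 19530, int),
        api_port=_read(env, "API_PORT", 8126, int),
        streamlit_port=_read(env, "STREAMLIT_PORT", 8536, int),
        top_k_per_collection=_read(env, "TOP_K_PER_COLLECTION", 5, int),
        score_threshold=_read(env, "SCORE_THRESHOLD", 0.4, float),
        ingest_enabled=_read(env, "INGEST_ENABLED", False, _to_bool),
    )


# Override one value the way a .env entry such as CARDIO_API_PORT=9000 would.
print(load_settings({"CARDIO_API_PORT": "9000"}))
```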
Startup Validation
The validate() method checks:
- Milvus host/port validity
- API key presence (warns if missing; search-only mode available)
- Embedding model presence
- Port conflict detection (API vs Streamlit)
- Collection weights sum to ~1.0 (tolerance 0.05)
- RAG pipeline root directory existence
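The checks listed above might look like the following sketch. It is not the actual validate() method: the function signature, the dict-based settings, and the split between errors and warnings are assumptions, except that a missing API key is treated as a warning as stated above.

```python
import math
import os


def validate(settings, collection_weights, tolerance=0.05):
    """Return (errors, warnings) for a settings dict and collection weights."""
    errors, warnings = [], []

    if not settings.get("MILVUS_HOST"):
        errors.append("MILVUS_HOST must not be empty")
    for key in ("MILVUS_PORT", "API_PORT", "STREAMLIT_PORT"):
        port = settings.get(key)
        if not isinstance(port, int) or not 1 <= port <= 65535:
            errors.append(f"{key} must be an integer in 1..65535")

    if not settings.get("ANTHROPIC_API_KEY"):
        warnings.append("ANTHROPIC_API_KEY not set; running in search-only mode")

    if not settings.get("EMBEDDING_MODEL"):
        errors.append("EMBEDDING_MODEL must not be empty")

    if settings.get("API_PORT") == settings.get("STREAMLIT_PORT"):
        errors.append("API_PORT and STREAMLIT_PORT must differ")

    # fsum avoids accumulating float error across the 13 weights.
    total = math.fsum(collection_weights.values())
    if abs(total - 1.0) > tolerance:
        errors.append(f"collection weights sum to {total:.3f}, expected ~1.0")

    root = settings.get("RAG_PIPELINE_ROOT")
    if not root or not os.path.isdir(root):
        warnings.append("RAG_PIPELINE_ROOT does not exist")

    return errors, warnings
```

Collecting every problem before reporting, rather than failing on the first one, lets an operator fix a misconfigured `.env` in a single pass.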
21. Docker Deployment
docker-compose.yml defines the full stack:
| Service | Image | Port | Description |
|---|---|---|---|
| `etcd` | bitnami/etcd | 2379 | Milvus metadata store |
| `minio` | minio/minio | 9000, 9001 | Milvus object storage |
| `milvus` | milvusdb/milvus | 19530, 9091 | Vector database |
| `cardio-setup` | python:3.12-slim | - | Creates collections, seeds data (exits on completion) |
| `cardio-api` | python:3.12-slim | 8126 | FastAPI server |
| `cardio-ui` | python:3.12-slim | 8536 | Streamlit UI |

Quickstart
```bash
# 1. Configure environment
cp .env.example .env
# Edit .env and set ANTHROPIC_API_KEY

# 2. Start all services
docker compose up -d

# 3. Watch setup progress
docker compose logs -f cardio-setup

# 4. Open the UI (macOS; on Linux use xdg-open or a browser)
open http://localhost:8536
```
22. Port Map
| Service | Port | Protocol |
|---|---|---|
| Streamlit UI | 8536 | HTTP |
| FastAPI API | 8126 | HTTP |
| Milvus gRPC | 19530 | gRPC |
| Milvus Health | 9091 | HTTP |
| MinIO API | 9000 | HTTP |
| MinIO Console | 9001 | HTTP |
| etcd | 2379 | HTTP |
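After `docker compose up -d`, a quick way to confirm that each port in the table accepts connections is a plain TCP probe. The sketch below is a hypothetical helper, not part of the repository; it checks reachability only and says nothing about whether the service behind the port is healthy.

```python
import socket

# Ports copied from the table above.
PORT_MAP = {
    "Streamlit UI": 8536,
    "FastAPI API": 8126,
    "Milvus gRPC": 19530,
    "Milvus Health": 9091,
    "MinIO API": 9000,
    "MinIO Console": 9001,
    "etcd": 2379,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host="localhost", port_map=PORT_MAP):
    """Map each service name to whether its port is reachable."""
    return {service: is_port_open(host, port) for service, port in port_map.items()}
```

Calling `check_stack()` returns a dict keyed by service name, so any `False` entry points at the container to inspect with `docker compose logs`.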
23. Tech Stack
| Layer | Technology |
|---|---|
| Vector DB | Milvus 2.4 (IVF_FLAT, COSINE) |
| Embeddings | BGE-small-en-v1.5 (384-dim, BAAI) |
| LLM | Claude Sonnet 4.6 (Anthropic) |
| UI | Streamlit |
| API | FastAPI + Uvicorn |
| Configuration | Pydantic BaseSettings |
| Monitoring | Prometheus metrics |
| Scheduler | APScheduler |
| Containerization | Docker Compose |
| Compute | NVIDIA DGX Spark |
| Data Sources | PubMed, ClinicalTrials.gov, ACC/AHA, ESC, FDA |
24. Future Roadmap
- ~~Test suite expansion~~ -- 1,966 tests now passing across all modules
- NIM integration -- On-device inference with NVIDIA NIM microservices
- FHIR interoperability -- Full FHIR R4 integration for EHR connectivity
- Real-time ECG analysis -- Streaming ECG interpretation pipeline
- Imaging AI integration -- Direct DICOM analysis with on-device models
- Population analytics -- Cohort-level cardiovascular risk dashboards
- Multi-language support -- Clinical guidelines in multiple languages
- Wearable integration -- Apple Watch, Fitbit cardiovascular data ingestion