Clinical Imaging Engine -- Architecture Guide

Author: Adam Jones
Date: March 2026
Version: 2.1.0

1. System Architecture Overview

The Clinical Imaging Engine (Engine 4) is organized into six interconnected layers, each with clear responsibilities and interfaces. The system integrates 20 NVIDIA technologies (Community Edition, all free), 9 NIM clients, 9 clinical workflows, 13 Milvus collections (38,028 vectors including 1,938 real PubMed papers), and 1,324 tests. It is designed for deployment on a single NVIDIA DGX Spark ($4,699) with 128 GB unified memory in a 3-tier model (Community/Enterprise/Research), and it also runs in CPU-only mode, where mock NIM fallbacks stand in for live inference.

Design Principles

Graceful degradation -- Every NIM-dependent feature falls back to mock mode automatically
Shared infrastructure -- Reuses Milvus and embedding model from the HCLS AI Factory platform
Cross-agent interoperability -- Read-only access to genomic_evidence collection from Stage 2
Consistent patterns -- Follows the same Pydantic BaseSettings, collection manager, and RAG engine patterns as the CAR-T Intelligence Engine

2. Component Diagram

+=========================================================================+
|                        PRESENTATION LAYER                                |
|                                                                          |
|  +---------------------------+    +-------------------------------+      |
|  | Streamlit Chat UI (8525)  |    | FastAPI REST Server (8524)    |      |
|  | app/imaging_ui.py         |    | api/main.py                  |      |
|  |                           |    |   +-- routes/meta_agent.py    |      |
|  | - Chat interface          |    |   +-- routes/nim.py          |      |
|  | - Evidence panel          |    |   +-- routes/workflows.py    |      |
|  | - Workflow runner         |    |   +-- routes/reports.py      |      |
|  | - NIM status              |    |                               |      |
|  | - Report export           |    | Prometheus metrics            |      |
|  +---------------------------+    +-------------------------------+      |
+=========================================================================+
                    |                            |
                    v                            v
+=========================================================================+
|                        INTELLIGENCE LAYER                                |
|                                                                          |
|  +----------------------------+   +-----------------------------+        |
|  | Imaging Agent              |   | RAG Engine                  |        |
|  | src/agent.py               |   | src/rag_engine.py           |        |
|  |                            |   |                             |        |
|  | - Query classification     |   | - Multi-collection search   |        |
|  | - Workflow dispatch        |   | - Weighted scoring          |        |
|  | - NIM orchestration        |   | - Query expansion           |        |
|  | - Response assembly        |   | - Comparative analysis      |        |
|  +----------------------------+   | - LLM synthesis             |        |
|                                   +-----------------------------+        |
|  +----------------------------+   +-----------------------------+        |
|  | Knowledge Graph            |   | Query Expansion             |        |
|  | src/knowledge.py           |   | src/query_expansion.py      |        |
|  |                            |   |                             |        |
|  | - 25 pathologies           |   | - Domain-specific maps      |        |
|  | - 9 modalities             |   | - Keyword -> term expansion |        |
|  | - 21 anatomy entries       |   | - Entity resolution         |        |
|  +----------------------------+   +-----------------------------+        |
+=========================================================================+
                    |                            |
                    v                            v
+=========================================================================+
|                        INFERENCE LAYER                                   |
|                                                                          |
|  +------------------------------------------------------------------+   |
|  | NIM Service Manager (src/nim/service_manager.py)                  |   |
|  |                                                                    |   |
|  |  +-----------+  +-----------+  +-----------+  +----------------+  |   |
|  |  | VISTA-3D  |  | MAISI     |  | VILA-M3   |  | LLM            |  |   |
|  |  | Client    |  | Client    |  | Client    |  | Client         |  |   |
|  |  | 8530      |  | 8531      |  | 8532      |  | 8520           |  |   |
|  |  +-----------+  +-----------+  +-----------+  +----------------+  |   |
|  |  +-----------+  +-----------+  +-----------+  +----------------+  |   |
|  |  | NV-Seg-CT |  | Nemotron  |  | NV-Gen-CT |  | NV-Gen-MR      |  |   |
|  |  | Client    |  | Nano Clt  |  | Client    |  | Client         |  |   |
|  |  +-----------+  +-----------+  +-----------+  +----------------+  |   |
|  |  +-----------+                                                     |   |
|  |  | NV-Reason |  (stub)                                            |   |
|  |  | CXR Client|                                                    |   |
|  |  +-----------+                                                     |   |
|  |                                                                    |   |
|  |  All 9 inherit BaseNIMClient: health check + retry + mock fallback|   |
|  +------------------------------------------------------------------+   |
|                                                                          |
|  +------------------------------------------------------------------+   |
|  | Workflow Engine (src/workflows/)                                   |   |
|  |                                                                    |   |
|  |  +-- CTHeadHemorrhageWorkflow    (< 90 sec, 3D U-Net)            |   |
|  |  +-- CTChestLungNoduleWorkflow   (< 5 min, RetinaNet+SegResNet)  |   |
|  |  +-- CTCoronaryAngiographyWorkflow (< 5 min, CAD-RADS)          |   |
|  |  +-- CXRRapidFindingsWorkflow    (< 30 sec, DenseNet-121)        |   |
|  |  +-- MRIBrainMSLesionWorkflow    (< 5 min, 3D U-Net+SyN)        |   |
|  |  +-- MRIProstateWorkflow         (< 5 min, PI-RADS v2.1)        |   |
|  |  +-- BreastBIRADSWorkflow        (< 5 min, BI-RADS)             |   |
|  |  +-- ThyroidTIRADSWorkflow       (< 3 min, TI-RADS)             |   |
|  |  +-- LiverLIRADSWorkflow         (< 5 min, LI-RADS)             |   |
|  |                                                                    |   |
|  |  All 9 inherit BaseImagingWorkflow: preprocess->infer->postprocess|   |
|  +------------------------------------------------------------------+   |
+=========================================================================+
                    |                            |
                    v                            v
+=========================================================================+
|                        DATA LAYER                                        |
|                                                                          |
|  +----------------------------+   +-----------------------------+        |
|  | Milvus 2.4 (19530)        |   | Collection Manager          |        |
|  |                            |   | src/collections.py          |        |
|  | 13 imaging collections     |   |                             |        |
|  | + 1 read-only genomic      |   | - Schema definitions        |        |
|  | 38,028 vectors total       |   | - CRUD operations           |        |
|  | IVF_FLAT / COSINE / 384d   |   |                             |        |
|  +----------------------------+   | - Parallel search           |        |
|                                   +-----------------------------+        |
|  +----------------------------+   +-----------------------------+        |
|  | Pydantic Models            |   | Ingest Pipelines            |        |
|  | src/models.py              |   | src/ingest/                 |        |
|  |                            |   |                             |        |
|  | - 13 collection models     |   | - PubMed parser             |        |
|  | - 9 NIM result models      |   | - ClinicalTrials parser     |        |
|  | - Search result models     |   | - 6 seed data parsers       |        |
|  | - Agent I/O models         |   | - APScheduler integration   |        |
|  +----------------------------+   +-----------------------------+        |
+=========================================================================+

3. Data Flow

3.1 RAG Query Flow

User Query: "What is ACR Lung-RADS classification?"
       |
       v
[1. Query Classification]
       |-- Detect comparative ("X vs Y")? --> No
       |-- Detect modality filter? --> CT
       |-- Detect body region? --> Chest
       |
       v
[2. Query Expansion]
       |-- "Lung-RADS" --> ["lung_rads", "lung_cancer_screening",
       |                     "nodule_management", "ACR", ...]
       |
       v
[3. Embedding]
       |-- BGE-small-en-v1.5: "Represent this sentence for searching relevant passages: ..."
       |-- Output: 384-dim float32 vector
       |
       v
[4. Parallel Multi-Collection Search]
       |-- imaging_literature    (weight 0.18, top-5) --> 5 hits
       |-- imaging_guidelines    (weight 0.10, top-5) --> 5 hits
       |-- imaging_findings      (weight 0.15, top-5) --> 3 hits
       |-- imaging_trials        (weight 0.12, top-5) --> 4 hits
       |-- imaging_radiomics     (weight 0.08, top-5) --> 3 hits
       |-- imaging_reports       (weight 0.06, top-5) --> 4 hits
       |-- ... (all 14 collections including genomic_evidence)
       |
       v
[5. Weighted Score Merge]
       |-- Combine hits across collections
       |-- Apply collection weights
       |-- Filter by SCORE_THRESHOLD (0.4)
       |-- Sort by weighted score descending
       |
       v
[6. Knowledge Graph Augmentation]
       |-- Match "lung_nodule" pathology entry
       |-- Inject: Lung-RADS categories, severity criteria,
       |           CT characteristics, AI models
       |
       v
[7. LLM Synthesis]
       |-- Build prompt: question + evidence + knowledge context
       |-- Inject conversation history (up to 3 prior turns)
       |-- Call Claude API (or Llama-3 NIM fallback)
       |
       v
[8. Response Assembly]
       |-- Grounded answer with evidence citations
       |-- Source references with scores
       |-- Follow-up question suggestions
       |-- NIM service availability status
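
Steps 2 and 3 of the flow can be sketched roughly as follows. This is a minimal illustration, not the code in src/query_expansion.py: the `EXPANSION_MAP` contents beyond the terms shown in the flow, and the helper names `expand_query` and `build_embedding_input`, are assumptions made for the example. The query prefix is the one quoted under the search strategy in section 4.3.

```python
# Hypothetical expansion map; the real maps live in src/query_expansion.py.
EXPANSION_MAP = {
    "lung-rads": ["lung_rads", "lung_cancer_screening", "nodule_management", "ACR"],
}

BGE_QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def expand_query(query: str) -> list[str]:
    """Return expansion terms for every map keyword found in the query."""
    lowered = query.lower()
    terms: list[str] = []
    for keyword, expansions in EXPANSION_MAP.items():
        if keyword in lowered:
            terms.extend(t for t in expansions if t not in terms)
    return terms


def build_embedding_input(query: str) -> str:
    """Prefix the query the way BGE expects for asymmetric retrieval."""
    return BGE_QUERY_PREFIX + query


query = "What is ACR Lung-RADS classification?"
expanded_terms = expand_query(query)
embedding_input = build_embedding_input(query)
```

Only queries receive the instruction prefix; in this sketch, stored passages are assumed to be embedded without it.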

3.2 Workflow Execution Flow

API Request: POST /workflow/ct_head_hemorrhage/run
       |
       v
[1. Workflow Registry Lookup]
       |-- WORKFLOW_REGISTRY["ct_head_hemorrhage"]
       |-- Instantiate CTHeadHemorrhageWorkflow(mock_mode=True)
       |
       v
[2. Preprocess]
       |-- Mock: skip (return synthetic volume metadata)
       |-- Live: LoadImaged -> EnsureChannelFirst -> Orientationd(RAS)
       |         -> Spacingd(1mm) -> ScaleIntensityRanged(0-80 HU)
       |
       v
[3. Infer]
       |-- Mock: return synthetic segmentation result
       |-- Live: 3D U-Net binary segmentation via MONAI
       |
       v
[4. Postprocess]
       |-- Volume estimation: voxel count x voxel volume
       |-- Midline shift: center of mass vs falx cerebri
       |-- Max thickness measurement
       |-- BTF urgency classification (P1/P2/P4)
       |
       v
[5. WorkflowResult]
       |-- findings: [{category, description, severity, recommendation}]
       |-- measurements: {volume_ml, shift_mm, thickness_mm}
       |-- classification: "P1" / "P2" / "P4"
       |-- severity: critical / urgent / routine
       |-- inference_time_ms, is_mock
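
The volume estimation in step 4 (voxel count x voxel volume) is simple arithmetic, sketched below. The function name `hemorrhage_volume_ml` and its signature are hypothetical; the sketch only assumes that voxel spacing is available in millimetres, as it would be after the Spacingd(1mm) preprocessing step.

```python
def hemorrhage_volume_ml(voxel_count: int, spacing_mm: tuple[float, float, float]) -> float:
    """Volume = voxel count x voxel volume, converted from mm^3 to mL."""
    sx, sy, sz = spacing_mm
    voxel_volume_mm3 = sx * sy * sz
    return voxel_count * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# After resampling to 1 mm isotropic spacing each voxel is 1 mm^3,
# so 12,500 segmented voxels correspond to 12.5 mL.
volume_ml = hemorrhage_volume_ml(12_500, (1.0, 1.0, 1.0))
```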

4. Milvus Collection Design

4.1 Index Configuration

All collections use the same index configuration:

Parameter	Value
Index type	IVF_FLAT
Metric type	COSINE
nlist	1024
nprobe	16
Vector dimension	384
Embedding model	BAAI/bge-small-en-v1.5
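
Expressed as the dictionaries that pymilvus accepts for `Collection.create_index()` and `Collection.search()`, the table above might look like this. The constant names are assumptions for illustration; the values come straight from the table.

```python
# Index parameters in the shape pymilvus expects for create_index().
INDEX_PARAMS = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 1024},
}

# Search-time parameters in the shape pymilvus expects for search().
SEARCH_PARAMS = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16},
}

EMBEDDING_DIM = 384  # BAAI/bge-small-en-v1.5 output size

# Fraction of IVF clusters scanned per query: nprobe / nlist.
probe_fraction = SEARCH_PARAMS["params"]["nprobe"] / INDEX_PARAMS["params"]["nlist"]
```

With nlist 1024 and nprobe 16, each query scans about 1.6% of the clusters, which trades a small amount of recall for speed.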

4.2 Schema Pattern

Every collection follows the same field pattern:

from pymilvus import DataType, FieldSchema

fields = [
    FieldSchema("id",        DataType.VARCHAR, max_length=100, is_primary=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384),
    FieldSchema("text",      DataType.VARCHAR, max_length=3000),
    # ... domain-specific metadata fields (VARCHAR, INT64, FLOAT, etc.)
]

4.3 Search Strategy

Parallel search: All collections are searched simultaneously using ThreadPoolExecutor
Per-collection top-K: Default 5 results per collection (configurable 1-50)
Weighted scoring: Each collection has a configurable weight (0.04 to 0.18)
Score threshold: Results below 0.4 cosine similarity are filtered out
Asymmetric embedding: Queries use BGE instruction prefix "Represent this sentence for searching relevant passages: "
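
The search strategy above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/rag_engine.py: `search_all`, `fake_search`, and the hit dictionary shape are hypothetical, only three of the collections are shown, and the threshold is assumed to apply to the raw cosine score before weighting.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SCORE_THRESHOLD = 0.4

# Weights taken from the RAG query flow in section 3.1; only a subset is shown.
COLLECTION_WEIGHTS = {
    "imaging_literature": 0.18,
    "imaging_findings": 0.15,
    "imaging_guidelines": 0.10,
}

Hit = dict  # {"id": str, "score": float}; hypothetical hit shape


def search_all(search_fn: Callable[[str, int], list[Hit]], top_k: int = 5) -> list[Hit]:
    """Search every collection in parallel, then filter, weight, and rank."""
    names = list(COLLECTION_WEIGHTS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_collection = list(pool.map(lambda n: search_fn(n, top_k), names))

    merged = []
    for name, hits in zip(names, per_collection):
        for hit in hits:
            if hit["score"] < SCORE_THRESHOLD:  # drop weak cosine matches
                continue
            merged.append({**hit, "collection": name,
                           "weighted_score": hit["score"] * COLLECTION_WEIGHTS[name]})
    return sorted(merged, key=lambda h: h["weighted_score"], reverse=True)


def fake_search(collection: str, top_k: int) -> list[Hit]:
    """Stand-in for a Milvus search; returns fixed scores per collection."""
    canned = {
        "imaging_literature": [{"id": "lit-1", "score": 0.80}, {"id": "lit-2", "score": 0.30}],
        "imaging_findings": [{"id": "find-1", "score": 0.90}],
        "imaging_guidelines": [{"id": "guide-1", "score": 0.95}],
    }
    return canned[collection][:top_k]


ranked = search_all(fake_search)
```

Note how the weighting changes the order: the guideline hit has the highest raw similarity but ranks last because its collection weight is the lowest of the three.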

5. NIM Client Layer Design

5.1 BaseNIMClient (ABC)

BaseNIMClient
    |
    +-- health_check()       Ping /v1/health/ready
    +-- is_available()       Cached check (30s interval)
    +-- _request()           HTTP POST with tenacity retry (3 attempts)
    +-- _mock_response()     Abstract: return synthetic result
    +-- _invoke_or_mock()    Try real NIM, fall back to mock
    +-- get_status()         Return "available" / "mock" / "unavailable"

5.2 Client Hierarchy

BaseNIMClient (ABC)
    |
    +-- VISTA3DClient
    |       segment(input_path, target_classes) -> SegmentationResult
    |
    +-- MAISIClient
    |       generate(body_region, resolution) -> SyntheticCTResult
    |
    +-- VILAM3Client
    |       analyze(question, input_path) -> VLMResponse
    |
    +-- LLMClient
    |       complete(messages) -> str
    |       (OpenAI-compatible /v1/chat/completions)
    |
    +-- NVSegmentCTClient
    |       segment_ct(input_path) -> SegmentationResult
    |
    +-- NemotronNanoClient
    |       reason(messages) -> str
    |
    +-- NVGenerateCTClient
    |       generate_ct(params) -> SyntheticCTResult
    |
    +-- NVGenerateMRClient
    |       generate_mr(params) -> SyntheticMRResult
    |
    +-- NVReasonCXRClient (stub)
            reason_cxr(image_path, question) -> ReasoningResult

5.3 NIMServiceManager

Coordinates all 9 NIM clients:

NIMServiceManager(settings)
    .vista3d         -> VISTA3DClient
    .maisi           -> MAISIClient
    .vilam3          -> VILAM3Client
    .llm             -> LLMClient
    .nv_segment_ct   -> NVSegmentCTClient
    .nemotron_nano   -> NemotronNanoClient
    .nv_generate_ct  -> NVGenerateCTClient
    .nv_generate_mr  -> NVGenerateMRClient
    .nv_reason_cxr   -> NVReasonCXRClient (stub)
    .check_all_services() -> Dict[str, str]  # name -> status

5.4 Mock Fallback Logic

_invoke_or_mock(endpoint, payload):
    if is_available():
        try:
            return _request(endpoint, payload)    # Real NIM
        except Exception:
            if mock_enabled:
                return _mock_response()           # Fallback mock
            raise                                 # Mock disabled: propagate the error
    elif mock_enabled:
        return _mock_response()                   # Direct mock
    else:
        raise ConnectionError("NIM unavailable and mock mode disabled")
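
A runnable version of this logic might look like the sketch below. It is a stand-in rather than the actual base class: HTTP, retry, and health-check caching are omitted, `is_available()` is hard-wired to False, and `EchoClient` and the `/v1/infer` endpoint are hypothetical names used only to exercise the fallback.

```python
from abc import ABC, abstractmethod


class BaseNIMClient(ABC):
    """Minimal stand-in for the real base class; HTTP and retry are omitted."""

    def __init__(self, mock_enabled: bool = True) -> None:
        self.mock_enabled = mock_enabled

    def is_available(self) -> bool:
        return False  # the real client caches a /v1/health/ready check

    def _request(self, endpoint: str, payload: dict) -> dict:
        raise ConnectionError("NIM not reachable in this sketch")

    @abstractmethod
    def _mock_response(self) -> dict: ...

    def _invoke_or_mock(self, endpoint: str, payload: dict) -> dict:
        if self.is_available():
            try:
                return self._request(endpoint, payload)
            except Exception:
                if self.mock_enabled:
                    return self._mock_response()
                raise
        if self.mock_enabled:
            return self._mock_response()
        raise ConnectionError(f"NIM unavailable and mock disabled: {endpoint}")


class EchoClient(BaseNIMClient):  # hypothetical client for illustration
    def _mock_response(self) -> dict:
        return {"is_mock": True}


result = EchoClient(mock_enabled=True)._invoke_or_mock("/v1/infer", {})
```

With `mock_enabled=False` the same call raises `ConnectionError`, which is the behavior a production deployment would likely want so that synthetic results never pass silently as real ones.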

6. Workflow Pipeline Design

6.1 BaseImagingWorkflow (ABC)

All nine reference workflows inherit from the same abstract base class:

class BaseImagingWorkflow(ABC):
    WORKFLOW_NAME: str
    TARGET_LATENCY_SEC: float
    MODALITY: str
    BODY_REGION: str
    MODELS_USED: List[str]

    preprocess(input_path)    -> Any          # Abstract
    infer(preprocessed)       -> Dict         # Abstract
    postprocess(result)       -> WorkflowResult   # Abstract
    _mock_inference()         -> Dict         # Abstract
    run(input_path)           -> WorkflowResult   # Orchestrator
    get_workflow_info()       -> Dict         # Metadata

6.2 Workflow Registry

WORKFLOW_REGISTRY = {
    "ct_head_hemorrhage":     CTHeadHemorrhageWorkflow,
    "ct_chest_lung_nodule":   CTChestLungNoduleWorkflow,
    "ct_coronary_angiography": CTCoronaryAngiographyWorkflow,
    "cxr_rapid_findings":     CXRRapidFindingsWorkflow,
    "mri_brain_ms_lesion":    MRIBrainMSLesionWorkflow,
    "mri_prostate_pirads":    MRIProstateWorkflow,
    "breast_birads":          BreastBIRADSWorkflow,
    "thyroid_tirads":         ThyroidTIRADSWorkflow,
    "liver_lirads":           LiverLIRADSWorkflow,
}

Dynamic dispatch via the /workflow/{name}/run API endpoint.
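
Dispatch through the registry can be sketched as below. The placeholder workflow class, the `run_workflow` helper, and the choice of `LookupError` for unknown names are assumptions for illustration; an API route would presumably translate that error into a 404 response.

```python
class CTHeadHemorrhageWorkflow:  # placeholder for the real class
    def __init__(self, mock_mode: bool = True) -> None:
        self.mock_mode = mock_mode

    def run(self, input_path: str) -> dict:
        return {"workflow": "ct_head_hemorrhage", "is_mock": self.mock_mode}


WORKFLOW_REGISTRY = {"ct_head_hemorrhage": CTHeadHemorrhageWorkflow}


def run_workflow(name: str, input_path: str, mock_mode: bool = True) -> dict:
    """Look up the workflow class by name, instantiate it, and run it."""
    try:
        workflow_cls = WORKFLOW_REGISTRY[name]
    except KeyError:
        raise LookupError(f"Unknown workflow: {name!r}") from None
    return workflow_cls(mock_mode=mock_mode).run(input_path)


outcome = run_workflow("ct_head_hemorrhage", "study.nii.gz")
```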

6.3 Error Handling

run(input_path):
    start = time.time()
    try:
        if mock_mode:
            raw = _mock_inference()
        else:
            preprocessed = preprocess(input_path)
            raw = infer(preprocessed)
        result = postprocess(raw)
        result.inference_time_ms = (time.time() - start) * 1000
        result.is_mock = mock_mode
        return result
    except Exception:
        # Any failure is reported as a FAILED result instead of propagating
        elapsed_ms = (time.time() - start) * 1000
        return WorkflowResult(status=FAILED, inference_time_ms=elapsed_ms)
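
The orchestrator pattern can be made concrete with a small runnable sketch. The `WorkflowResult` dataclass here is a trimmed stand-in for the real model, the status strings "completed" and "failed" are assumptions, and `ToyWorkflow` is a hypothetical subclass whose live inference always fails so that both paths can be observed.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class WorkflowResult:  # trimmed stand-in for the real model
    status: str = "completed"
    findings: list = field(default_factory=list)
    inference_time_ms: float = 0.0
    is_mock: bool = False


class BaseImagingWorkflow(ABC):
    def __init__(self, mock_mode: bool = True) -> None:
        self.mock_mode = mock_mode

    @abstractmethod
    def preprocess(self, input_path: str): ...
    @abstractmethod
    def infer(self, preprocessed) -> dict: ...
    @abstractmethod
    def postprocess(self, raw: dict) -> WorkflowResult: ...
    @abstractmethod
    def _mock_inference(self) -> dict: ...

    def run(self, input_path: str) -> WorkflowResult:
        start = time.perf_counter()
        try:
            if self.mock_mode:
                raw = self._mock_inference()
            else:
                raw = self.infer(self.preprocess(input_path))
            result = self.postprocess(raw)
            result.is_mock = self.mock_mode
        except Exception:
            # Failures become a result object rather than an exception
            result = WorkflowResult(status="failed", is_mock=self.mock_mode)
        result.inference_time_ms = (time.perf_counter() - start) * 1000
        return result


class ToyWorkflow(BaseImagingWorkflow):  # hypothetical subclass
    def preprocess(self, input_path): return input_path
    def infer(self, preprocessed): raise RuntimeError("no model loaded")
    def postprocess(self, raw): return WorkflowResult(findings=raw["findings"])
    def _mock_inference(self): return {"findings": ["synthetic"]}


mock_result = ToyWorkflow(mock_mode=True).run("study.nii.gz")
live_result = ToyWorkflow(mock_mode=False).run("study.nii.gz")
```

Returning a failed result instead of raising keeps the API response shape uniform, at the cost of hiding the exception; logging it inside the `except` block would be a reasonable addition.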

7. Ingest Pipeline Design

7.1 Pipeline Pattern

[Source] --> fetch() --> parse() --> embed() --> store()
              |            |           |           |
         HTTP/API    Extract fields  BGE-small   Milvus
         PubMed      Normalize       384-dim     upsert
         CT.gov      Validate
         Seed JSON   Pydantic model
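
The four stages can be sketched end to end as follows. Everything here is illustrative: the `Record` shape, the sample rows, and the in-memory dictionary standing in for Milvus are assumptions, and `embed()` produces a deterministic placeholder vector rather than a real BGE-small embedding.

```python
import hashlib
from dataclasses import dataclass

EMBEDDING_DIM = 384


@dataclass
class Record:  # hypothetical record shape
    id: str
    text: str
    embedding: list[float]


def fetch() -> list[dict]:
    """Stand-in for an HTTP/API or seed-file fetch."""
    return [
        {"pmid": "0001", "title": " Example title ", "abstract": "Example abstract."},
        {"pmid": "0002", "title": "", "abstract": ""},
    ]


def parse(raw: list[dict]) -> list[dict]:
    """Extract and normalize fields; drop rows that fail validation."""
    rows = [
        {"id": r["pmid"], "text": f"{r['title'].strip()} {r['abstract'].strip()}".strip()}
        for r in raw
    ]
    return [row for row in rows if row["text"]]


def embed(text: str) -> list[float]:
    """Deterministic placeholder vector; NOT a real BGE embedding."""
    digest = hashlib.sha256(text.encode()).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(EMBEDDING_DIM)]


def store(records: list[Record], db: dict[str, Record]) -> None:
    """Upsert by primary key, mirroring the Milvus upsert step."""
    for record in records:
        db[record.id] = record


db: dict[str, Record] = {}
rows = parse(fetch())
store([Record(row["id"], row["text"], embed(row["text"])) for row in rows], db)
```

Because the store step is an upsert keyed on the record id, re-running the pipeline on the same source replaces existing entries instead of duplicating them.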

7.2 Ingest Parsers

Parser	Source	Collection
`literature_parser.py`	PubMed (NCBI E-utilities)	`imaging_literature`
`clinical_trials_parser.py`	ClinicalTrials.gov API v2	`imaging_trials`
`finding_parser.py`	Curated seed data	`imaging_findings`
`protocol_parser.py`	Curated seed data	`imaging_protocols`
`device_parser.py`	Curated seed data	`imaging_devices`
`anatomy_parser.py`	Curated seed data	`imaging_anatomy`
`benchmark_parser.py`	Curated seed data	`imaging_benchmarks`
`guideline_parser.py`	Curated seed data	`imaging_guidelines`
`report_template_parser.py`	Curated seed data	`imaging_report_templates`

7.3 PubMed Client (`src/utils/pubmed_client.py`)

NCBI E-utilities: esearch + efetch
Optional API key for increased rate limits
Configurable max results (default 5000)

7.4 Scheduling

APScheduler (src/scheduler.py) supports periodic re-ingestion:
- Default interval: 168 hours (weekly)
- Configurable via IMAGING_INGEST_SCHEDULE_HOURS
- Disabled by default (IMAGING_INGEST_ENABLED=false)
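
Reading these two settings could look like the sketch below. It uses only the standard library as a stand-in for the Pydantic BaseSettings pattern the engine follows; the `IngestSchedule` class and `load_schedule` helper are hypothetical, while the variable names and defaults come from the text above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class IngestSchedule:
    enabled: bool = False      # IMAGING_INGEST_ENABLED
    interval_hours: int = 168  # IMAGING_INGEST_SCHEDULE_HOURS


def load_schedule(env: Mapping[str, str]) -> IngestSchedule:
    """Read the two documented variables, falling back to the defaults."""
    enabled = env.get("IMAGING_INGEST_ENABLED", "false").strip().lower() == "true"
    hours = int(env.get("IMAGING_INGEST_SCHEDULE_HOURS", "168"))
    if hours <= 0:
        raise ValueError("IMAGING_INGEST_SCHEDULE_HOURS must be positive")
    return IngestSchedule(enabled=enabled, interval_hours=hours)


default_schedule = load_schedule({})
daily_schedule = load_schedule({
    "IMAGING_INGEST_ENABLED": "true",
    "IMAGING_INGEST_SCHEDULE_HOURS": "24",
})
```

In real use the mapping passed in would be `os.environ`; taking it as a parameter keeps the function easy to test.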

8. API Layer

8.1 FastAPI Application

Lifespan management: Initializes Milvus connection, embedding model, NIM service manager, and RAG engine on startup
CORS: Enabled for all origins (development mode)
Prometheus metrics: Query count, latency histogram, search hit histogram
Health check: Reports collection stats, NIM service status, and overall system health

8.2 Route Organization

Router	Prefix	Tags	Endpoints
`meta_agent`	`/api`	Meta-Agent	`/api/ask`
`nim`	`/nim`	NIM Services	`/nim/status`, `/nim/vista3d/segment`, `/nim/maisi/generate`, `/nim/vilam3/analyze`
`workflows`	(root)	Workflows	`/workflows`, `/workflow/{name}/info`, `/workflow/{name}/run`
`reports`	(root)	Reports	`/reports/generate`
`events`	`/events`	DICOM Events	`/events/dicom-webhook`, `/events/history`, `/events/status`

Core endpoints registered directly on the app: /health, /collections, /query, /search, /find-related, /knowledge/stats, /metrics

9. UI Layer

9.1 Streamlit Application (`app/imaging_ui.py`)

The Streamlit UI provides:

Chat interface with multi-turn conversation memory
Evidence panel with expandable results grouped by collection
Comparative analysis auto-detection and dual-panel display
Workflow runner sidebar for executing reference workflows
NIM service status indicators showing available/mock/unavailable
Report export button for PDF generation
Collection statistics in the sidebar
NVIDIA-themed dark/green styling

10.1 Genomic Pipeline Trigger (Implemented)

The CrossModalTrigger class (src/cross_modal.py) automatically enriches high-risk imaging findings with genomic context from the shared genomic_evidence collection (3.5M vectors).

8 trigger conditions including:
- Lung-RADS 4A+ findings --> queries EGFR, ALK, ROS1, KRAS variants
- CXR urgent consolidation --> queries infection-related genomic variants
- CAD-RADS >= 3 --> queries LDLR, PCSK9, APOB cardiovascular variants
- PI-RADS >= 4 --> queries BRCA2, HOXB13 cancer susceptibility
- BI-RADS 4+ --> queries BRCA1, BRCA2, ATM breast cancer variants
- TI-RADS TR4+ --> queries RET, BRAF thyroid cancer variants
- LI-RADS LR-4+ --> queries HFE, SERPINA1 liver disease variants
- Brain lesion high activity --> queries HLA-DRB1, MS susceptibility genes

Data flow:

WorkflowResult (severity=urgent, classification=Lung-RADS 4A)
    |
    v
CrossModalTrigger.evaluate(workflow_result)
    |
    v
Query genomic_evidence collection (3 queries: EGFR, ALK, KRAS)
    |
    v
CrossModalResult (12 genomic hits, top score: 0.78)
    |
    v
AgentResponse.cross_modal (enriched response)

Configuration:

CROSS_MODAL_ENABLED: bool = True  # Active
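
The trigger evaluation can be sketched as a lookup from classification to gene list. This is a deliberately simplified illustration, not the CrossModalTrigger implementation: the `TRIGGER_GENES` key format and the `genes_to_query` helper are assumptions, only three of the eight conditions are shown, and exact-match lookup does not cover the "4A and above" style thresholds that the real conditions describe.

```python
# Subset of the trigger table; the key format is an assumption.
TRIGGER_GENES = {
    "Lung-RADS 4A": ["EGFR", "ALK", "ROS1", "KRAS"],
    "BI-RADS 4": ["BRCA1", "BRCA2", "ATM"],
    "PI-RADS 4": ["BRCA2", "HOXB13"],
}

CROSS_MODAL_ENABLED = True


def genes_to_query(classification: str) -> list[str]:
    """Return the genes to look up for a classification, or [] if no trigger fires."""
    if not CROSS_MODAL_ENABLED:
        return []
    return TRIGGER_GENES.get(classification, [])


lung_genes = genes_to_query("Lung-RADS 4A")
benign_genes = genes_to_query("Lung-RADS 2")
```

Each returned gene would then become one query against the genomic_evidence collection; a low-risk classification returns an empty list and skips the enrichment step entirely.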

10.2 Export Architecture (5 Formats)

The export system supports 5 formats: Markdown, JSON, PDF (ReportLab), FHIR R4, and DICOM SR (Structured Report via highdicom TID 1500). The export_fhir() function generates FHIR R4 DiagnosticReport Bundles with 54 SNOMED CT codes:

FHIR Bundle (type: collection)
├── Patient resource (stub with identifier)
├── ImagingStudy resource (modality auto-detected from query)
├── Observation resources (one per workflow finding)
│   ├── SNOMED CT coding (finding category)
│   ├── Interpretation (severity → HH/H/A/N)
│   └── Components (measurements with UCUM units)
└── DiagnosticReport resource
    ├── LOINC category (LP29684-5 Radiology)
    ├── LOINC code (18748-4 Diagnostic imaging study)
    ├── conclusionCode (SNOMED for all findings)
    └── extension (cross-modal enrichment summary)

10.3 DICOM Ingestion Architecture

Orthanc DICOM Server (port 8042 HTTP, 4242 C-STORE)
    |
    v
POST /events/dicom-webhook (study.complete event)
    |
    v
determine_workflow(modality, body_region) → workflow name
    |
    v
WorkflowRegistry.run(workflow_name, study_data)
    |
    v
DicomIngestionResult (findings, classification, severity)
    |
    v
Event history (in-memory, max 200 entries)
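
The routing step and the bounded event history can be sketched as below. The `ROUTES` table, its modality and body-region keys, and the `handle_study_complete` helper are hypothetical; the workflow names come from WORKFLOW_REGISTRY, and the 200-entry cap matches the diagram above.

```python
from collections import deque
from typing import Optional

# Hypothetical routing table; workflow names come from WORKFLOW_REGISTRY.
ROUTES = {
    ("CT", "head"): "ct_head_hemorrhage",
    ("CT", "chest"): "ct_chest_lung_nodule",
    ("CR", "chest"): "cxr_rapid_findings",
    ("MR", "brain"): "mri_brain_ms_lesion",
}

MAX_HISTORY = 200
event_history: deque = deque(maxlen=MAX_HISTORY)  # oldest entries drop off first


def determine_workflow(modality: str, body_region: str) -> Optional[str]:
    """Map DICOM modality and body region to a workflow name, if any."""
    return ROUTES.get((modality.upper(), body_region.lower()))


def handle_study_complete(study_id: str, modality: str, body_region: str) -> dict:
    """Record the routing decision for one study.complete event."""
    event = {
        "study_id": study_id,
        "workflow": determine_workflow(modality, body_region),
    }
    event_history.append(event)
    return event


for i in range(250):
    handle_study_complete(f"study-{i}", "ct", "Head")
```

A `deque` with `maxlen` discards the oldest entry on each append once full, so after 250 events only the most recent 200 remain. Because the history is in memory, it is lost on restart.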

10.4 Drug Discovery Pipeline Feed (Phase 2)

Quantitative imaging endpoint
    |-- Tumor volume change
    |-- RECIST measurements
    |-- Treatment response
    |
    v
Drug Discovery Pipeline
    |-- Treatment-response tracking
    |-- Molecular target validation

11. New Architectural Components (v2.1)

11.1 Agentic Reasoning (AIQ Toolkit)

The Clinical Imaging Engine integrates AIQ Plan/Execute/Reflect/Refine agentic reasoning with 6 tools for multi-step clinical analysis. The agent plans a series of tool invocations, executes them, reflects on intermediate results, and refines its approach before synthesizing a final answer.

11.2 NeMo Guardrails

NeMo Guardrails enforce PII protection (detecting and redacting patient identifiers), evidence grounding (ensuring claims are traceable to retrieved evidence), and disclaimer enforcement (appending clinical disclaimer to all outputs).

11.3 Radiomics (PyRadiomics-CUDA)

~1,500 radiomics features are extracted via PyRadiomics-CUDA, stored in the imaging_radiomics collection, and searchable via the RAG engine. Features include shape, first-order, GLCM, GLRLM, GLSZM, NGTDM, and GLDM descriptors.

11.4 Radiology Report NLP

A full radiology report parsing pipeline extracts findings, impressions, measurements, and coded diagnoses from free-text reports, storing structured results in the imaging_reports collection.

11.5 Protocol Optimization

Protocol optimization covers 12 ACR indications with patient-specific safety parameters. Recommendations consider patient age, weight, renal function, contrast allergy history, and pregnancy status.

11.6 Dose Tracking

Dose tracking combines Diagnostic Reference Level (DRL) comparison with cumulative dose alerts. It tracks each patient's radiation exposure history and alerts when the cumulative dose approaches institutional thresholds.

11.7 Population Analytics (RAPIDS)

GPU-accelerated population analytics built on RAPIDS cover cohort-level imaging trends, disease prevalence monitoring, and outcomes tracking across institutional imaging archives.

11.8 Streaming (Holoscan)

A Holoscan real-time streaming pipeline handles ultrasound and endoscopy, enabling sub-second AI inference on live video feeds.

11.9 MONAI Deploy MAPs

9 MONAI Application Packages (MAPs) are packaged for clinical deployment, following MONAI Deploy standards for containerized inference pipelines.

11.10 MONAI Label

MONAI Label provides interactive annotation with a FLARE bridge, so radiologists can interactively segment structures and feed corrections back to the model training loop.

11.11 3D Visualization

A Three.js rotating point cloud visualization displays 3D volumetric data in the React portal.

11.12 React Portal

The React portal has 10 pages and provides a modern web interface alongside the Streamlit workbench.

Live Analysis Layer: A DICOMAnalyzer class (src/dicom_analyzer.py) provides real GPU inference on uploaded DICOM files. It auto-detects modality from DICOM headers, routes to the appropriate workflow, and runs actual model inference (DenseNet-121 for CXR, threshold segmentation for CT/MRI). Six MONAI model bundles (1.87 GB) are downloaded for production inference. API endpoints at /analyze/* handle file upload, sample analysis, and status reporting. The React portal exposes this at /live-analysis with drag-and-drop upload.

For NIM-specific setup instructions, see NIM_INTEGRATION_GUIDE.md. For the complete implementation specification, see PROJECT_BIBLE.md.