Precision Oncology Intelligence Agent -- Architecture Guide¶

Author: Adam Jones Date: March 2026 License: Apache 2.0

Table of Contents¶

System Diagram
Component Interactions
Data Flow
Collection Design Rationale
Therapy Ranker Engine
Trial Matcher Engine
Resistance Analysis Engine
Cross-Modal Engine
Query Expansion
RAG Pipeline
Agent Orchestrator
Data Model Architecture

1. System Diagram¶

1.1 Full System Architecture¶

                          EXTERNAL USERS
                               |
                    +----------+----------+
                    |                     |
              +-----+------+      +------+-----+
              | Streamlit  |      | REST API   |
              | UI :8526   |      | :8527      |
              +-----+------+      +------+-----+
                    |                     |
                    +----------+----------+
                               |
                    +----------+----------+
                    |  Agent Orchestrator  |
                    |  (src/agent.py)      |
                    +----------+----------+
                               |
          +--------------------+--------------------+
          |                    |                    |
   +------+------+    +-------+-------+    +-------+-------+
   | Query       |    | Workflow      |    | Clinical      |
   | Expansion   |    | Engine        |    | Engines       |
   | (812 LOC)   |    |               |    |               |
   +------+------+    +-------+-------+    | Therapy       |
          |                    |           | Ranker        |
          |                    |           | (748 LOC)     |
          |                    |           |               |
          |                    |           | Trial         |
          |                    |           | Matcher       |
          |                    |           | (513 LOC)     |
          |                    |           |               |
          |                    |           | Case          |
          |                    |           | Manager       |
          |                    |           | (516 LOC)     |
          |                    |           +-------+-------+
          |                    |                    |
          +--------------------+--------------------+
                               |
                    +----------+----------+
                    |    RAG Engine       |
                    |    (908 LOC)        |
                    +----------+----------+
                               |
          +--------------------+--------------------+
          |                    |                    |
   +------+------+    +-------+-------+    +-------+-------+
   | Knowledge   |    | Milvus        |    | LLM           |
   | Graph       |    | Vector DB     |    | (Claude 4.6)  |
   | (1,662 LOC) |    | 11 Collections|    |               |
   +-------------+    +-------+-------+    +---------------+
                               |
                    +----------+----------+
                    |  etcd    |  MinIO   |
                    +----------+----------+

1.2 Ingest Pipeline Architecture¶

  +------------------------------------------+
  |           External Data Sources           |
  +------------------------------------------+
  | PubMed | ClinicalTrials | NCCN/ESMO | FDA|
  +---+--------+--------+--------+--------+--+
      |        |        |        |        |
  +---v---+ +--v---+ +-v----+ +-v-----+ +v--------+
  |PubMed | |Trial | |Guide | |Variant| |Therapy  |
  |Parser | |Parser| |Parser| |Parser | |Parser   |
  +---+---+ +--+---+ +-+----+ +-+-----+ ++--------+
      |        |        |        |        |
      +--------+--------+--------+--------+
                         |
                  +------v------+
                  | Base Parser |
                  | - Chunking  |
                  | - Embedding |
                  | - Insertion |
                  +------+------+
                         |
                  +------v------+
                  |   Milvus    |
                  | Collections |
                  +-------------+

2. Component Interactions¶

2.1 Component Dependency Graph¶

OncoUI (Streamlit) ──> FastAPI Server ──> OncoIntelligenceAgent
                                               |
                                 +-------------+-------------+
                                 |             |             |
                          QueryExpansion  CaseManager     TherapyRanker
                                 |             |             |
                                 +------+------+      TrialMatcher
                                        |                    |
                                   RAGEngine          CrossModalEngine
                                        |                    |
                                 +------+------+             |
                                 |             |             |
                            Milvus DB    Knowledge      genomic_evidence
                            (11 cols)     Graph          (shared col)

2.2 Module Responsibilities¶

Module	File	LOC	Responsibilities
Agent Orchestrator	`src/agent.py`	553	Plan-search-evaluate-synthesize loop; top-level coordination
Query Expansion	`src/query_expansion.py`	812	12-category domain-aware expansion, entity extraction, synonym resolution
RAG Engine	`src/rag_engine.py`	908	Multi-collection search, citation scoring, LLM prompt assembly and synthesis
Therapy Ranker	`src/therapy_ranker.py`	748	7-step evidence-based therapy ranking with 80+ therapies mapped
Trial Matcher	`src/trial_matcher.py`	513	Hybrid deterministic + semantic trial matching
Case Manager	`src/case_manager.py`	516	VCF parsing, case CRUD, MTB packet generation
Knowledge Graph	`src/knowledge.py`	1,662	40+ actionable targets, 80+ therapies, 12+ resistance mechanisms
Collections	`src/collections.py`	--	Milvus collection CRUD, schema definitions, index management
Export	`src/export.py`	1,055	PDF, CSV, JSON, FHIR R4 report generation
Cross-Modal	`src/cross_modal.py`	--	Genomic-imaging-drug discovery event integration
Metrics	`src/metrics.py`	--	Prometheus instrumentation
Models	`src/models.py`	538	13 enums, 10 Pydantic domain models

2.3 Interface Contracts¶

Agent Orchestrator inputs/outputs:

Input:  OncoQuery(question, patient_context?, case_id?)
Output: AgentResponse(answer, evidence, knowledge_used, report, confidence)

Therapy Ranker inputs/outputs:

Input:  PatientProfile(cancer_type, variants, biomarkers, prior_therapies)
Output: TherapyRanking(ranked_therapies, flagged_therapies, combination_options)

Trial Matcher inputs/outputs:

Input:  PatientProfile(cancer_type, biomarkers, stage, age)
Output: TrialMatchResult(matches, composite_scores, match_explanations)

3. Data Flow¶

3.1 Query Processing Pipeline¶

Step 1: RECEIVE QUERY
  OncoQuery arrives via API or UI
  |
Step 2: PLAN (agent.py)
  Identify topics from keyword matching (20+ triggers)
  Extract target genes (40+ ACTIONABLE_TARGETS)
  Detect cancer types (26 canonical + 70+ aliases)
  Select strategy: broad | targeted | comparative
  Decompose complex queries into sub-questions
  Produce SearchPlan
  |
Step 3: SEARCH (rag_engine.py)
  Embed expanded query via BGE-small-en-v1.5
  Search 11 collections in parallel (top_k=5 each)
  Optional query expansion (12 categories)
  Weighted scoring with per-collection weights
  Filter by score_threshold (0.30)
  Deduplicate results
  Merge and rank: top 30
  |
Step 4: EVALUATE (agent.py)
  sufficient: >= 3 hits from >= 2 collections
  partial: some hits but below threshold
  insufficient: zero quality hits -> broaden and retry
  MAX_RETRIES = 2
  |
Step 5: KNOWLEDGE INJECTION (knowledge.py)
  Inject gene context (40+ ACTIONABLE_TARGETS)
  Inject therapy context (80+ therapies, NCCN/ESMO)
  Inject resistance mechanisms (12+ documented mechanisms)
  Inject pathway context (10+ oncogenic pathways)
  Inject biomarker panels (20+ clinical panels)
  |
Step 6: LLM SYNTHESIS (rag_engine.py)
  Assemble context: domain knowledge + evidence + question
  Send to Claude Sonnet 4.6 with oncology system prompt
  Generate evidence-grounded answer with inline citations
  |
Step 7: RESPONSE ASSEMBLY (agent.py)
  Package AgentResponse with:
  - Synthesized answer
  - Ranked evidence with citations
  - Knowledge domains used
  - Markdown report
  - Confidence score
  |
Step 8: DELIVERY
  Return via API (JSON) or display in UI (formatted)
  Optional: Export as PDF/JSON/FHIR R4/Markdown

3.2 Case Creation Flow¶

Patient Data (ID, cancer type, stage, VCF, biomarkers, prior therapies)
    |
    v
[VCF Parsing] -- If raw VCF text, parse via cyvcf2
    |   Extract: gene, variant, chrom, pos, ref, alt, consequence
    |
    v
[Variant Annotation] -- Cross-reference ACTIONABLE_TARGETS (40+ genes)
    |   Classify actionability (A/B/C/D/E/VUS)
    |
    v
[CaseSnapshot Creation] -- Generate UUID, build text_summary
    |   Embed via BGE-small-en-v1.5
    |
    v
[Persist to onco_cases] -- Insert into Milvus collection
    |
    v
[MTB Packet Generation] -- Combine variant table, evidence,
    |   therapy ranking, trial matches, open questions, citations
    |
    v
MTBPacket + CaseSnapshot returned

4. Collection Design Rationale¶

4.1 Why 11 Collections (Not One Giant Collection)¶

The multi-collection architecture was chosen over a single monolithic collection for five reasons:

1. Semantic Precision

The same terms carry different meanings across oncology subspecialties: - "Resistance": acquired mutations (T790M), pathway bypass (MET amplification), lineage plasticity - "Response": RECIST criteria, molecular response (ctDNA), pathological complete response - "Marker": predictive biomarker (PD-L1), prognostic biomarker (TMB), diagnostic biomarker (FISH)

Collection-specific embedding spaces preserve these semantic boundaries.

2. Relevance Weighting

Different queries require different emphasis: - "EGFR T790M resistance" -> weight onco_variants (0.18), onco_resistance (0.07) - "First-line NSCLC therapy" -> weight onco_therapies (0.14), onco_guidelines (0.12) - "MSI-H trial eligibility" -> weight onco_trials (0.10), onco_biomarkers (0.08)

A single collection would return mixed results without the ability to boost domain-specific content.

3. Independent Lifecycle Management

Collection	Update Frequency	Source
onco_literature	Weekly (PubMed)	Automated
onco_guidelines	Quarterly (NCCN)	Manual review
onco_trials	Monthly	Automated
onco_variants	Monthly (CIViC/OncoKB)	Automated
genomic_evidence	Shared, maintained by genomics pipeline	External

4. Source Attribution

Clinicians need to know the provenance of every citation: - "This comes from the NCCN NSCLC v4.2026 Guideline" (onco_guidelines) - "This is based on the KEYNOTE-024 trial" (onco_trials) - "This variant is documented in CIViC as Level A evidence" (onco_variants)

Collection names provide automatic source categorization.

5. Scalability

Individual collections can be independently reindexed, compacted, partitioned, or replicated.

4.2 Collection Inventory¶

#	Collection Name	Description	Weight	Seed Records
1	onco_variants	Actionable somatic/germline variants (CIViC/OncoKB)	0.18	130
2	onco_literature	PubMed/PMC/preprint literature chunks	0.16	60
3	onco_therapies	Approved and investigational therapies	0.14	94
4	onco_guidelines	NCCN/ASCO/ESMO guideline recommendations	0.12	45
5	onco_trials	ClinicalTrials.gov summaries with biomarker criteria	0.10	55
6	onco_biomarkers	Predictive and prognostic biomarkers	0.08	50
7	onco_resistance	Resistance mechanisms and bypass strategies	0.07	50
8	onco_pathways	Signaling pathways, cross-talk, druggable nodes	0.06	35
9	onco_outcomes	Real-world treatment outcome records	0.04	40
10	onco_cases	De-identified patient case snapshots	0.02	37
11	genomic_evidence	Shared VCF-derived genomic variants (read-only)	0.03	--
	Total		1.00	596

4.3 Collection Weight Distribution¶

onco_variants      ██████████████████████  0.18
onco_literature    ████████████████████    0.16
onco_therapies     █████████████████       0.14
onco_guidelines    ██████████████          0.12
onco_trials        ████████████            0.10
onco_biomarkers    █████████               0.08
onco_resistance    ████████                0.07
onco_pathways      ███████                 0.06
onco_outcomes      █████                   0.04
genomic_evidence   ███                     0.03
onco_cases         ██                      0.02
                                      Sum: 1.00

4.4 Key Collection Schemas¶

onco_variants -- Actionable somatic/germline variants:

Field	Type	Notes
id (PK)	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
gene	VARCHAR(50)	Gene symbol
variant_name	VARCHAR(100)	Variant designation
variant_type	VARCHAR(30)	SNV, INDEL, CNV_AMP, FUSION, etc.
cancer_type	VARCHAR(50)	Associated cancer type
evidence_level	VARCHAR(20)	A (FDA) through E (Computational)
drugs	VARCHAR(500)	Indicated therapies
civic_id	VARCHAR(20)	CIViC database ID
vrs_id	VARCHAR(100)	GA4GH VRS identifier
text_summary	VARCHAR(3000)	Clinical narrative for embedding
clinical_significance	VARCHAR(200)	Pathogenic, likely pathogenic, VUS
allele_frequency	FLOAT	Population allele frequency

onco_therapies -- Approved and investigational therapies:

Field	Type	Notes
id (PK)	VARCHAR(100)	Primary key
embedding	FLOAT_VECTOR	384-dim
drug_name	VARCHAR(200)	Generic drug name
category	VARCHAR(30)	TARGETED, IMMUNOTHERAPY, CHEMO, etc.
targets	VARCHAR(200)	Molecular targets
approved_indications	VARCHAR(500)	FDA-approved indications
resistance_mechanisms	VARCHAR(500)	Known resistance mechanisms
evidence_level	VARCHAR(20)	Evidence tier
text_summary	VARCHAR(3000)	Clinical summary for embedding
mechanism_of_action	VARCHAR(500)	MOA description

5. Therapy Ranker Engine¶

5.1 Architecture¶

Patient Profile (cancer_type, variants, biomarkers, prior_therapies)
    |
    v
[Step 1: Variant-Driven Therapies]
    |   ACTIONABLE_TARGETS lookup for each gene/variant (40+ genes)
    |   Evidence level from knowledge graph
    |
    v
[Step 2: Biomarker-Driven Therapies]
    |   MSI-H -> pembrolizumab, nivolumab, dostarlimab (Level A)
    |   TMB-H (>=10 mut/Mb) -> pembrolizumab (Level A)
    |   HRD/BRCA -> olaparib, rucaparib, niraparib, talazoparib (Level A/B)
    |   PD-L1 TPS >=50% -> pembrolizumab first-line (Level A)
    |   NTRK fusion -> larotrectinib, entrectinib (Level A)
    |   + BIOMARKER_PANELS registry check (20+ panels)
    |
    v
[Step 3: Evidence Level Sort]
    |   A (FDA-approved) > B (Clinical) > C (Case reports) > D > E
    |
    v
[Step 4: Resistance Check]
    |   RESISTANCE_MAP: 12+ mutation-level resistance mechanisms
    |   _DRUG_CLASS_GROUPS: same-mechanism class resistance
    |
    v
[Step 5: Contraindication Check]
    |   Same drug previously used -> flag
    |   Same drug_class as prior failed therapy -> flag
    |
    v
[Step 6: Supporting Evidence + Combination Therapy]
    |   Search onco_therapies + onco_literature for each drug
    |   Known FDA-approved combos (dabrafenib+trametinib, etc.)
    |
    v
[Step 7: Final Ranking]
    Clean therapies first (sorted by evidence level)
    Flagged therapies after (resistance/contraindication)
    Assign rank 1..N
    Total therapy mappings: 80+

5.2 Evidence Level Classification¶

Level	Description	Source
A	FDA-approved companion diagnostic or indication	FDA label, NCCN Category 1
B	Well-powered clinical evidence	Phase 2/3 trials, ESMO MCBS
C	Case reports, small series	Published case reports
D	Preclinical or early clinical	Phase 1, in vitro data
E	Computational prediction	In silico, pathway inference

6. Trial Matcher Engine¶

6.1 Architecture¶

Patient Profile (cancer_type, biomarkers, stage, age)
    |
    v
[Step 1: Deterministic Filter]
    |   Cancer type (fuzzy via 18+ alias groups)
    |   Open statuses: Recruiting, Active, Enrolling by invitation
    |   Milvus filter expressions
    |
    v
[Step 2: Semantic Search]
    |   Embed eligibility query -> vector similarity search
    |
    v
[Step 3: Merge and Deduplicate]
    |   Union by trial_id, keep best score
    |
    v
[Step 4: Composite Scoring]
    |   biomarker_match (0.40) + semantic_score (0.25)
    |   + phase_weight (0.20) + status_weight (0.15)
    |   * age_penalty (1.0 or 0.5)
    |
    v
[Step 5: Explanation Generation]
    Ranked trial list with match rationale

6.2 Cancer Type Alias Resolution¶

The trial matcher resolves 70+ cancer type aliases to 26 canonical types to improve deterministic matching. Examples:

"NSCLC" -> "non-small cell lung cancer"
"CRC" -> "colorectal cancer"
"TNBC" -> "triple-negative breast cancer"
"HCC" -> "hepatocellular carcinoma"
"ccRCC" -> "clear cell renal cell carcinoma"

7. Resistance Analysis Engine¶

7.1 Resistance Map (12+ Mechanisms)¶

The knowledge graph documents resistance mechanisms for major targeted therapies:

Drug Class	Primary Resistance	Mechanism	Next-Line
EGFR TKI (1st/2nd gen)	EGFR T790M	Gatekeeper mutation	Osimertinib
EGFR TKI (3rd gen)	EGFR C797S	Binding site mutation	Combination strategies
BRAF inhibitor	MAPK reactivation	MEK bypass, NRAS mutation	BRAF+MEK combo
ALK TKI (crizotinib)	ALK G1202R	Solvent front mutation	Lorlatinib
HER2 therapy	HER3 upregulation	Bypass signaling	HER3 antibodies
PARP inhibitor	BRCA reversion	Homologous recombination restored	Platinum rechallenge
Anti-PD-1/PD-L1	B2M loss	Antigen presentation defect	Combination IO
KRAS G12C inhibitor	KRAS amplification	On-target amplification	Combination strategies
RET inhibitor	RET V804M	Gatekeeper mutation	Next-gen RET inhibitors
MET inhibitor	MET D1228N	Kinase domain mutation	Combination or next-gen
PIK3CA inhibitor	PTEN loss	PI3K pathway reactivation	Combination strategies
CDK4/6 inhibitor	RB1 loss	Cell cycle bypass	Chemotherapy

7.2 Drug Class Grouping¶

Same-class resistance flagging groups drugs by mechanism of action, so that a patient who progressed on erlotinib is flagged for all first/second-generation EGFR TKIs, not just erlotinib.

8.1 Genomic-Imaging-Drug Discovery Integration¶

Genomics Pipeline (Stage 1)
    |-- VCF -> genomic_evidence collection
    |
    v
Oncology Agent (Stage 2.5)
    |-- Variant interpretation
    |-- Therapy ranking
    |-- Trial matching
    |
    v
Drug Discovery Pipeline (Stage 3)
    |-- Target validation
    |-- Lead compound optimization
    |-- DiffDock binding prediction

8.2 Cross-Agent Event Publishing¶

The agent publishes events for consumption by other HCLS AI Factory agents:

onco.variant.actionable -- New actionable variant identified
onco.therapy.recommended -- Therapy recommendation generated
onco.trial.matched -- Patient-trial match found
onco.resistance.detected -- Resistance mechanism identified
onco.case.created -- New case snapshot persisted

9. Query Expansion¶

9.1 Expansion Pipeline¶

Raw Query: "What targeted therapy options exist for EGFR exon 19 deletion in NSCLC?"
    |
    v
Step 1: Entity Extraction
  - Genes: [EGFR]
  - Variants: [exon 19 deletion]
  - Cancer types: [NSCLC]
  - Topics: [targeted therapy]
    |
    v
Step 2: Synonym Expansion (12 categories)
  - "NSCLC" -> "non-small cell lung cancer, lung adenocarcinoma"
  - "EGFR exon 19 deletion" -> "del19, E746_A750del"
  - "targeted therapy" -> "TKI, tyrosine kinase inhibitor"
    |
    v
Step 3: Sub-Question Decomposition
  - Q1: "What are FDA-approved EGFR TKIs for exon 19 deletion?"
  - Q2: "What resistance mechanisms occur with EGFR-targeted therapy?"
  - Q3: "What clinical trials are open for EGFR-mutant NSCLC?"
    |
    v
Step 4: Strategy Selection
  - Strategy: "targeted" (specific gene/variant/cancer combination)
    |
    v
Output: SearchPlan

9.2 Expansion Categories (12)¶

Category	Example Input	Example Expansions
Cancer types	NSCLC	lung adenocarcinoma, EGFR-mutant lung
Genes	EGFR	L858R, exon 19 deletion, T790M, C797S
Therapies	osimertinib	Tagrisso, 3rd-gen EGFR TKI
Biomarkers	TMB	tumor mutational burden, mut/Mb
Pathways	MAPK	RAS-RAF-MEK-ERK, RTK signaling
Resistance	T790M	gatekeeper mutation, osimertinib
Clinical terms	PFS	progression-free survival, HR
Trial terms	Phase 3	randomized, pivotal, registration trial
Immunotherapy	checkpoint	PD-1, PD-L1, CTLA-4, pembrolizumab
Surgery/radiation	lobectomy	surgical resection, VATS
Toxicity	pneumonitis	ILD, interstitial lung disease
Genomics	ctDNA	circulating tumor DNA, liquid biopsy

10. RAG Pipeline¶

10.1 RAG Architecture¶

SearchPlan (from Query Expansion)
        |
  Embedding (BGE-small-en-v1.5, 384-dim)
        |
  Multi-Collection Search
  +------+------+------+------+------+------+
  |var   |lit   |ther  |gdl   |trial |biom  |
  |top5  |top5  |top5  |top5  |top5  |top5  |
  +------+------+------+------+------+------+
  |resist|path  |outc  |cases |gen   |
  |top5  |top5  |top5  |top5  |top5  |
  +------+------+------+------+------+
        |
  Score Filtering (threshold 0.30)
        |
  Weight Application (per-collection weights)
        |
  Deduplication (content hash)
        |
  Citation Scoring
  - High confidence: score >= 0.85
  - Medium confidence: score >= 0.65
  - Standard: score < 0.65
        |
  Knowledge Injection
  - Gene context (ACTIONABLE_TARGETS)
  - Therapy context (THERAPY_MAP)
  - Resistance context (RESISTANCE_MAP)
  - Pathway context (PATHWAY_MAP)
  - Biomarker context (BIOMARKER_PANELS)
        |
  LLM Synthesis (Claude Sonnet 4.6)
  - System: Oncology domain expert (MTB-ready)
  - User: Clinical question + retrieved context + knowledge
  - Instructions: Cite sources, follow NCCN/ESMO, flag resistance
        |
  Response Parsing
  - Extract inline citations
  - Map PubMed IDs and NCT IDs to links
  - Calculate confidence score

10.2 Embedding Configuration¶

Parameter	Value
Model	BAAI/bge-small-en-v1.5
Parameters	33M
Dimensions	384
Metric	COSINE
Index type	IVF_FLAT (nlist=1024, nprobe=16)
Batch size	32
Runtime	CPU (no GPU required)
Instruction prefix	"Represent this sentence for searching relevant passages: "
Search mode	Asymmetric (queries use instruction prefix, documents do not)

10.3 Comparative Retrieval¶

The engine detects comparative questions via regex (compare|vs|versus|difference between|head.to.head) and routes them to a dual-entity retrieval pipeline:

Parse entity A and entity B from the question
Retrieve evidence independently for each entity
Identify shared/head-to-head evidence (intersection by ID)
Build a structured comparison prompt with 8 comparison axes
Generate comparative synthesis via LLM

11. Agent Orchestrator¶

11.1 Orchestration Flow¶

The OncoIntelligenceAgent (src/agent.py, 553 lines) executes a 4-step plan-search-evaluate-synthesize loop:

class OncoIntelligenceAgent:
    def __init__(self):
        self.rag_engine = OncoRAGEngine()
        self.therapy_ranker = TherapyRanker()
        self.trial_matcher = TrialMatcher()
        self.case_manager = OncologyCaseManager()
        self.query_expander = QueryExpander()
        self.cross_modal = CrossModalIntegrator()
        self.export_system = OncologyExporter()

    async def process_query(self, query: OncoQuery) -> AgentResponse:
        # 1. Plan
        search_plan = self.query_expander.expand(query)

        # 2. Search
        evidence = await self.rag_engine.cross_collection_search(search_plan)

        # 3. Evaluate
        verdict = self._evaluate_evidence(evidence)
        if verdict == "insufficient" and retries < MAX_RETRIES:
            # Broaden and retry
            ...

        # 4. Synthesize
        answer = await self.rag_engine.synthesize(query, evidence, knowledge)

        return AgentResponse(
            answer=answer.text,
            evidence=evidence,
            knowledge_used=knowledge_domains,
            report=markdown_report,
            confidence=answer.confidence
        )

11.2 Error Handling Strategy¶

The orchestrator implements graceful degradation:

Milvus unavailable: Returns error with search-only mode recommendation
LLM unavailable: Returns search results without synthesis
Therapy ranker error: Logs warning, returns response without rankings
Trial matcher error: Logs warning, returns response without matches
Cross-modal error: Logs warning, returns response without triggers
Timeout: Returns partial results with timeout indicator

11.3 Key Statistics¶

Metric	Value
Python files	66
Total lines of code	~20,490
Milvus collections	11 (10 owned + 1 read-only shared)
Actionable gene targets	40+
Therapy mappings	80+
Resistance mechanisms	12+
Oncogenic pathways	10+
Biomarker panels	20+
Test files / cases	10 files, 556 test cases, all passing
Docker services	6
Export formats	4 (Markdown, JSON, PDF, FHIR R4)
Cancer types supported	26

12. Data Model Architecture¶

12.1 Model Hierarchy¶

                    AgentResponse (top-level output)
                    /        |         \           \
              answer    evidence    knowledge_used   report
                |           |            |              |
              str    List[Dict]    List[str]         str

                    TherapyRanking (from TherapyRanker)
                    /        |         \           \
         ranked_therapies  flagged  combinations  evidence_level
                |           |            |              |
         List[Therapy]  List[Flag]  List[Combo]     str

                    MTBPacket (from CaseManager)
                    /        |         \           \
           case_snapshot  variant_table  rankings  trial_matches
                |              |            |          |
         CaseSnapshot    List[Variant]  TherapyRanking TrialMatchResult

12.2 Enum Design¶

Enums enforce type safety and valid value sets:

CancerType (26 values): Maps to canonical cancer type names with 70+ aliases
EvidenceLevel (5 values): A through E mapping to AMP/ASCO/CAP tiers
VariantType (7 values): SNV, INDEL, CNV_AMP, CNV_DEL, FUSION, SPLICE, FRAMESHIFT
TherapyCategory (6 values): TARGETED, IMMUNOTHERAPY, CHEMO, HORMONAL, COMBINATION, OTHER
TrialPhase (5 values): Phase 1 through Phase 4 plus Early Phase 1
TrialStatus (5 values): Recruiting, Active, Enrolling, Completed, Terminated
ActionabilityTier (6 values): A, B, C, D, E, VUS
SearchStrategy (3 values): broad, targeted, comparative
ExportFormat (4 values): markdown, json, pdf, fhir

12.3 API Endpoints Summary¶

Endpoint	Method	Purpose
`/healthz`	GET	Health check
`/readyz`	GET	Readiness check (Milvus connection)
`/metrics`	GET	Prometheus metrics
`/v1/query`	POST	RAG query with optional streaming
`/v1/cases`	POST	Create case from VCF or structured input
`/v1/cases/{id}`	GET	Retrieve case by ID
`/v1/cases/{id}/mtb`	GET	Generate MTB packet
`/v1/therapy-rank`	POST	Rank therapies for a patient profile
`/v1/trial-match`	POST	Match patient to clinical trials
`/v1/compare`	POST	Comparative evidence retrieval
`/v1/export/markdown`	POST	Generate Markdown report
`/v1/export/json`	POST	Generate JSON export
`/v1/export/pdf`	POST	Generate PDF report
`/v1/export/fhir`	POST	Generate FHIR R4 bundle

12.4 Port Configuration¶

Service	Port	Protocol
Streamlit UI	8526	HTTP
FastAPI REST API	8527	HTTP
Milvus (shared)	19530	gRPC

Clinical Decision Support Disclaimer

The Precision Oncology Agent is a clinical decision support research tool for oncology. It is not FDA-cleared and is not intended as a standalone diagnostic device. All recommendations should be reviewed by qualified healthcare professionals. Apache 2.0 License.