From Genome to Safe Prescription: A Multi-Collection RAG Architecture for Clinical Pharmacogenomic Decision Support¶

Author: Adam Jones Date: March 2026 Version: 0.1.0 (Pre-Implementation) License: Apache 2.0

Part of the HCLS AI Factory -- an end-to-end precision medicine platform. https://github.com/ajones1923/hcls-ai-factory

Abstract¶

Adverse drug reactions (ADRs) are the fourth leading cause of death in the United States, responsible for approximately 106,000 deaths and 2.2 million hospitalizations annually, at a cost exceeding $136 billion per year. Yet an estimated 95-99% of patients carry at least one actionable pharmacogenomic (PGx) variant that would alter prescribing decisions -- and fewer than 1% of prescriptions in the U.S. are informed by genetic testing. This catastrophic gap between genomic knowledge and clinical practice persists because pharmacogenomic data is complex, rapidly evolving, fragmented across multiple guideline bodies (CPIC, DPWG, FDA, CPNDS), and difficult for non-specialist clinicians to interpret and act upon at the point of prescribing.

This paper presents the architectural design, clinical rationale, and product requirements for the Pharmacogenomics Intelligence Agent -- a clinical decision support system built on multi-collection retrieval-augmented generation (RAG) that transforms raw genomic data from the HCLS AI Factory's genomics pipeline (VCF output) into actionable, patient-specific prescribing guidance for over 400 drug-gene interactions across 25+ pharmacogenes. The agent will unify 14 specialized Milvus vector collections spanning pharmacogene reference data (star allele definitions, diplotype-to-phenotype mappings, activity scores), clinical guideline knowledge (CPIC Level A/B guidelines, DPWG recommendations, FDA Table of Pharmacogenomic Biomarkers), drug interaction intelligence (PharmGKB annotations, drug-drug-gene interactions, phenoconversion modeling), HLA-mediated hypersensitivity screening (12 HLA-drug associations with absolute contraindications), population pharmacokinetics (ethnicity-adjusted allele frequencies, dosing nomograms), clinical evidence (published PGx implementation studies, outcomes data), and the shared genomic evidence collection (3.5 million variants) -- enabling queries like "What are the prescribing implications of this patient's CYP2D6 4/41 genotype for their current medication list?" or "Does this patient carry any HLA alleles that contraindicate specific drugs before surgery?"

The system extends the proven multi-collection RAG architecture established by six existing intelligence agents in the HCLS AI Factory (Precision Biomarker, Precision Oncology, CAR-T, Imaging, Autoimmune, and Cardiology), adapting it with a genomic variant-to-drug mapping pipeline capable of processing whole-genome VCF files, star allele calling for CYP450 enzymes and transporters, diplotype-to-phenotype translation using CPIC standardized terms, multi-gene interaction modeling (e.g., CYP2C9 + VKORC1 for warfarin), phenoconversion detection (drug-induced CYP inhibition altering metabolizer status), HLA typing from NGS data for hypersensitivity screening, and real-time medication list cross-referencing against the patient's complete PGx profile. Eight reference clinical workflows will cover the highest-impact prescribing scenarios: pre-emptive PGx panel interpretation, opioid prescribing safety (CYP2D6/codeine/tramadol), anticoagulant optimization (CYP2C9/VKORC1/warfarin), antidepressant selection (CYP2D6/CYP2C19/SSRIs/TCAs), statin myopathy risk (SLCO1B1), chemotherapy toxicity prevention (DPYD/5-FU, TPMT/NUDT15/thiopurines), HLA-mediated hypersensitivity screening (abacavir/carbamazepine/allopurinol), and polypharmacy drug-drug-gene interaction resolution.

The agent will deploy on a single NVIDIA DGX Spark ($3,999) using BGE-small-en-v1.5 embeddings (384-dimensional, IVF_FLAT, COSINE), Claude Sonnet 4.6 for evidence synthesis, and shared NVIDIA NIM microservices for on-device inference. Licensed under Apache 2.0, the platform will democratize access to pharmacogenomic intelligence that currently requires multi-million-dollar institutional PGx implementation programs -- bringing the prescribing safety of world-class pharmacogenomics centers to any clinic, pharmacy, or emergency department worldwide.

Table of Contents¶

Introduction
The Pharmacogenomic Implementation Gap
Clinical Landscape and Market Analysis
Existing HCLS AI Factory Architecture
Pharmacogenomics Agent Architecture
Genomic Variant-to-Drug Mapping Pipeline
Milvus Collection Design
Clinical Workflows
Cross-Modal Integration and Genomic Correlation
NIM Integration Strategy
Knowledge Graph Design
Query Expansion and Retrieval Strategy
API and UI Design
Clinical Decision Support Engines
Reporting and Interoperability
Product Requirements Document
Data Acquisition Strategy
Validation and Testing Strategy
Regulatory Considerations
DGX Compute Progression
Implementation Roadmap
Risk Analysis
Competitive Landscape
Discussion
Conclusion
References

1. Introduction¶

1.1 The Adverse Drug Reaction Crisis¶

Adverse drug reactions represent one of the most significant -- and most preventable -- causes of morbidity and mortality in modern medicine. The scope of the problem is staggering:

Deaths: 106,000 Americans die annually from ADRs, making them the 4th leading cause of death (Lazarou et al., JAMA)
Hospitalizations: 2.2 million ADR-related hospitalizations per year in the U.S. alone
Cost: $136 billion annually in direct healthcare costs (more than cardiovascular disease or diabetes management)
Emergency visits: ADRs account for 27% of all emergency department drug-related visits
ICU admissions: 10-20% of ICU admissions are ADR-related
Global impact: The WHO estimates ADRs cause 197,000 deaths annually in the European Union

What makes this crisis particularly tragic is that a substantial proportion of ADRs are genetically predictable. Pharmacogenomic variants -- inherited differences in genes encoding drug-metabolizing enzymes, transporters, receptors, and immune molecules -- directly influence how individuals respond to medications. A patient who is a CYP2D6 poor metabolizer cannot activate codeine into morphine and will receive no pain relief. A patient who is a CYP2D6 ultra-rapid metabolizer will convert codeine too quickly, potentially causing fatal respiratory depression. A patient carrying HLA-B*57:01 who receives abacavir will develop a life-threatening hypersensitivity reaction in approximately 50% of cases. These are not rare edge cases -- they are common genetic variants present in significant proportions of the population.

1.2 The Promise of Pharmacogenomics¶

Pharmacogenomics -- the study of how genetic variation affects drug response -- has been one of the most successful translational applications of the Human Genome Project. Over the past two decades, the field has produced:

CPIC guidelines: 27 gene-drug guidelines covering 80+ drug-gene pairs with actionable prescribing recommendations, graded by evidence strength (Level A = strong evidence, Level B = moderate evidence)
FDA labeling: 450+ drugs carry pharmacogenomic information in their FDA-approved labeling, including 80+ with boxed warnings or contraindications based on genetic status
PharmGKB: Over 780 clinical annotations, 180 drug label annotations, 150 clinical guideline annotations, and 700+ variant annotations
DPWG (Dutch Pharmacogenetics Working Group): 120+ gene-drug therapeutic recommendations used across European healthcare systems
Economic evidence: Multiple studies demonstrate cost-effectiveness of PGx testing, with return on investment (ROI) of $4-$13 per $1 invested across various healthcare systems

The clinical genes with the strongest evidence include:

Gene	Function	Key Drugs Affected	Population Impact
CYP2D6	Phase I metabolism	Codeine, tramadol, tamoxifen, SSRIs, TCAs, antipsychotics	~7% poor metabolizers, ~5% ultra-rapid
CYP2C19	Phase I metabolism	Clopidogrel, PPIs, SSRIs, voriconazole	~2-15% poor metabolizers (varies by ethnicity)
CYP2C9	Phase I metabolism	Warfarin, phenytoin, NSAIDs	~1-3% poor metabolizers
VKORC1	Warfarin target	Warfarin	~37% carry dose-reduction variant
SLCO1B1	Hepatic transporter	Statins (simvastatin, atorvastatin)	~15% carry myopathy risk variant
DPYD	Pyrimidine catabolism	5-FU, capecitabine	~3-5% carry partial deficiency
TPMT/NUDT15	Thiopurine metabolism	Azathioprine, 6-MP, thioguanine	~10% intermediate, ~0.3% poor
HLA-B*57:01	Immune presentation	Abacavir	~6-8% in Caucasians
HLA-B*58:01	Immune presentation	Allopurinol	~1-6% (varies by ethnicity)
HLA-A*31:01	Immune presentation	Carbamazepine	~2-5% in Caucasians
HLA-B*15:02	Immune presentation	Carbamazepine, phenytoin	~8% in Southeast Asian populations
UGT1A1	Phase II conjugation	Irinotecan, atazanavir	~10% poor metabolizers
G6PD	Redox protection	Rasburicase, primaquine, dapsone	~8% globally (up to 25% in some populations)
CYP3A5	Phase I metabolism	Tacrolimus	~70-90% non-expressers (varies by ethnicity)
IFNL3 (IL28B)	Immune response	PEG-IFN/ribavirin	~70% favorable genotype in Europeans

1.3 Why Pharmacogenomics Remains Unused¶

Despite overwhelming evidence, PGx adoption remains dismally low:

<1% of prescriptions in the U.S. are guided by pharmacogenomic testing
Only 15% of physicians report feeling comfortable interpreting PGx results
42% of physicians surveyed stated they had never ordered a PGx test
88% of medical schools provide fewer than 8 hours of PGx education across 4 years
<5% of health systems have integrated PGx into their electronic health records

The barriers are well-characterized:

Knowledge gap: Clinicians lack training in PGx interpretation. A CYP2D6 4/41 diplotype means nothing to most prescribers.
Complexity: Star allele nomenclature, activity scores, diplotype-to-phenotype translations, gene-drug-drug interactions, and phenoconversion are intellectually demanding even for specialists.
Fragmentation: CPIC, DPWG, FDA, and institutional guidelines sometimes disagree. Clinicians don't know which to follow.
EHR integration: Most EHRs cannot store, display, or trigger alerts from structured PGx data. Results are often returned as PDFs that sit unread in the chart.
Point-of-care timing: PGx results must be available at the moment of prescribing, not days later from a reference lab.
Multi-gene complexity: Real patients have variants in multiple PGx genes simultaneously. A patient on warfarin needs CYP2C9 + VKORC1 + CYP4F2 interpreted together. A patient on psychiatric medications may need CYP2D6 + CYP2C19 + CYP1A2 + CYP3A4.
Population diversity: Allele frequencies vary dramatically across ethnic groups. The most common CYP2D6 poor metabolizer allele in Europeans (4) is rare in East Asians, where 10 predominates.

1.4 The HCLS AI Factory Opportunity¶

The HCLS AI Factory's existing genomics pipeline already generates the raw data needed for pharmacogenomic analysis. Every patient genome processed through the pipeline produces a VCF file containing all pharmacogenomic variants -- but this data is not currently translated into prescribing guidance. The Pharmacogenomics Intelligence Agent closes this gap by:

Extracting PGx variants from VCF files produced by the genomics pipeline (Parabricks/DeepVariant)
Calling star alleles using standardized nomenclature (PharmVar database)
Translating diplotypes to phenotypes using CPIC activity score algorithms
Cross-referencing the patient's PGx profile against their current and potential medication list
Generating actionable prescribing recommendations grounded in CPIC/DPWG guidelines
Storing results in persistent Milvus collections for lifetime re-querying as new drugs are prescribed

The agent doesn't just report what variants a patient carries -- it answers the question every prescriber actually needs answered: "Is this drug safe for this patient, and if not, what should I prescribe instead?"

2. The Pharmacogenomic Implementation Gap¶

2.1 Preventable Deaths: The Scale of the Problem¶

To understand why the Pharmacogenomics Intelligence Agent matters, consider these real-world scenarios that occur daily in hospitals and clinics worldwide:

Scenario 1: Codeine and CYP2D6 A 3-year-old child undergoes tonsillectomy. Prescribed codeine for post-operative pain. The child is a CYP2D6 ultra-rapid metabolizer (alleles 1/1xN), converting codeine to morphine at 4-8x the normal rate. The child develops fatal respiratory depression and dies. This exact scenario led to an FDA Black Box Warning in 2013 -- yet codeine is still prescribed to children without CYP2D6 testing.

Scenario 2: Clopidogrel and CYP2C19 A 58-year-old man receives a coronary stent and is prescribed clopidogrel (Plavix) as standard antiplatelet therapy. He is a CYP2C19 poor metabolizer (2/2), producing functionally no active metabolite. One month later, he suffers stent thrombosis and dies. An alternative antiplatelet (prasugrel or ticagrelor) -- not dependent on CYP2C19 -- would have prevented this death. The FDA label has carried a boxed warning about CYP2C19 since 2010.

Scenario 3: Abacavir and HLA-B*57:01 A 34-year-old woman newly diagnosed with HIV is prescribed an abacavir-containing regimen. She is HLA-B57:01 positive. Two weeks later, she develops abacavir hypersensitivity syndrome -- fever, rash, malaise progressing to hypotension and organ failure. Before mandatory HLA-B57:01 screening (implemented ~2008), this reaction occurred in ~5% of abacavir-treated patients and was fatal in some cases. This is one of the few PGx tests with near-universal adoption, proving that when guidelines are clear and testing is mandated, PGx saves lives.

Scenario 4: 5-Fluorouracil and DPYD A 62-year-old woman with colon cancer begins adjuvant 5-FU chemotherapy. She carries DPYD*2A (splice site variant), rendering her unable to metabolize 5-FU. She develops grade 4 mucositis, pancytopenia, and sepsis. She dies from treatment toxicity, not from cancer. Pre-treatment DPYD testing with dose adjustment would have prevented this death. The European Medicines Agency now mandates DPYD testing before fluoropyrimidine therapy; the FDA does not.

2.2 The Implementation Gap by the Numbers¶

The disconnect between PGx knowledge and clinical practice can be quantified:

Metric	Current State	Ideal State	Gap
Prescriptions guided by PGx	<1%	30-50% (for PGx-relevant drugs)	30-50x
Health systems with PGx CDS	<5%	100%	20x
Time to PGx result availability	3-14 days (reference lab)	Pre-emptive (already in chart)	Paradigm shift
Physicians comfortable with PGx	15%	>80%	5x
Pharmacogenes routinely tested	1-2 (reactive)	12-25 (pre-emptive panel)	10x
PGx variants detected per genome	~0 (not analyzed)	15-50 actionable findings	Infinite gap
Drug-gene alerts at prescribing	Near zero	Every PGx-relevant prescription	Total gap

2.3 The Pre-emptive vs. Reactive Testing Paradigm¶

The field is shifting from reactive PGx testing (test one gene when prescribing one drug) to pre-emptive PGx testing (test a panel of pharmacogenes BEFORE any drug is needed, store results for life). This paradigm shift is essential because:

Genomic data is stable: Unlike lab values, a patient's PGx genotype never changes. Test once, use forever.
Reactive testing is too slow: When a patient needs pain medication in the ER, there is no time to wait 7 days for CYP2D6 results.
Pre-emptive panels are cost-effective: A $250-$500 multi-gene panel tested once replaces dozens of single-gene tests ($100-$300 each) over a lifetime.
EHR alerts require pre-existing data: Clinical decision support can only fire at the moment of prescribing if PGx results are already in the system.

The Pharmacogenomics Intelligence Agent is architected for the pre-emptive paradigm: it processes a patient's entire genome once, extracts all pharmacogenomic variants, and stores the complete PGx profile in persistent Milvus collections. Every subsequent prescribing query -- whether today or in 20 years -- can be answered instantly.

3. Clinical Landscape and Market Analysis¶

3.1 Market Size and Growth¶

The pharmacogenomics market is experiencing rapid growth driven by declining sequencing costs, regulatory mandates, and health system adoption:

Global PGx market (2025): $4.1 billion
Projected (2030): $11.2 billion (CAGR 22.1%)
PGx testing volume growth: 30-40% annually
DTC PGx testing: $800 million (2025), growing 25% annually
PGx clinical decision support software: $420 million (2025), growing 35% annually

3.2 Key Market Drivers¶

Regulatory mandates: EMA requires DPYD testing before fluoropyrimidines (2020). FDA adding PGx to more drug labels annually. CMS considering PGx coverage expansion.
Health system initiatives: 80+ U.S. health systems have PGx implementation programs (IGNITE Network, CPIC institutions). Mayo Clinic, St. Jude, Vanderbilt, University of Florida leading adoption.
Payer coverage: UnitedHealthcare, Cigna, and Aetna now cover multi-gene PGx panels for select indications. Medicare MAC coverage expanding.
Pharmacist-led models: Clinical pharmacists are driving PGx adoption through medication therapy management programs.
Legal liability: Failure to test PGx before prescribing drugs with known genetic interactions is becoming a malpractice concern (Marchetti v. United States, 2022).

3.3 Competitive Landscape Overview¶

Company	Product	Approach	Limitations
OneOme	RightMed	Panel test + CDS portal	No genomics integration, proprietary
Myriad Genetics	GeneSight	Psychiatric PGx panel	Narrow scope (psych only), criticized evidence base
Invitae/Tempus	PGx panels	Testing + basic CDS	No RAG, limited multi-gene interaction
Clinical Pharmacogenomics Implementation Consortium	CPIC guidelines	Gold-standard guidelines (free)	Text-based, no CDS integration
Translational Software	PGx CDS	Standalone CDS engine	No genomic pipeline integration
Color Health	PGx program	Employer-based testing	Limited clinical depth

Gap our agent fills: No existing product combines (1) whole-genome pharmacogenomic extraction from a genomics pipeline, (2) multi-collection RAG over CPIC/DPWG/FDA guidelines, (3) multi-gene interaction modeling, (4) phenoconversion detection, (5) HLA-mediated hypersensitivity screening, and (6) natural language clinical queries -- all running on a $3,999 local device with no cloud data exposure.

4. Existing HCLS AI Factory Architecture¶

4.1 Three-Stage Pipeline¶

The HCLS AI Factory is an end-to-end precision medicine platform deployed on NVIDIA DGX Spark. Its three-stage pipeline provides the foundational infrastructure that the Pharmacogenomics Intelligence Agent builds upon:

Stage 1: Genomics Pipeline (genomics-pipeline/) - Input: FASTQ (raw sequencing data) or pre-aligned BAM - Processing: BWA-MEM2 alignment → Parabricks/DeepVariant variant calling - Output: Annotated VCF files with 11.7 million variants per genome - Relevance to PGx: VCF output contains all pharmacogenomic variants but they are not currently extracted, interpreted, or translated to prescribing guidance

Stage 2: Precision Intelligence Network (rag-chat-pipeline/) - Milvus vector database (19530) with BGE-small-en-v1.5 embeddings - Claude Sonnet 4.6 for evidence synthesis - 3.5 million searchable genomic variant vectors in shared genomic_evidence collection - Multi-collection architecture proven across 11 agents

Stage 3: Therapeutic Discovery Engine (drug-discovery-pipeline/) - BioNeMo MolMIM for molecular generation - DiffDock for binding pose prediction - RDKit for ADMET property calculation - Relevance to PGx: Drug metabolism predictions complement PGx by modeling how genetic variants affect drug pharmacokinetics at the molecular level

4.2 Existing Intelligence Agents¶

Six intelligence agents currently operate within the HCLS AI Factory, each demonstrating the multi-collection RAG architecture that the Pharmacogenomics Agent will extend:

Agent	Collections	Port (UI/API)	Key Capability
Precision Biomarker	13 + shared	8502/8102	Biomarker interpretation, includes existing PGx module (71K lines)
Precision Oncology	10 + shared	8503/8103	Tumor genomics, therapy selection
CAR-T Intelligence	10 + shared	8504/8104	CAR-T construct design, manufacturing
Imaging Intelligence	10 + shared	8505/8105	Medical imaging AI, radiology CDS
Precision Autoimmune	14 + shared	8506/8106	Diagnostic odyssey acceleration, clinical document intelligence
Cardiology Intelligence	12 + shared	8526/8527	Cardiac risk, ECG/imaging interpretation

The Pharmacogenomics Intelligence Agent will be assigned ports 8507 (Streamlit UI) and 8107 (FastAPI API).

4.3 Relationship to Existing Biomarker Agent PGx Module¶

The Precision Biomarker Agent already contains a substantial pharmacogenomics module (src/pharmacogenomics.py, 71,246 bytes) implementing CPIC guidelines for 13 genes. The Pharmacogenomics Intelligence Agent is NOT a replacement -- it is a dedicated, deep-dive expansion that provides:

Capability	Biomarker Agent PGx Module	Pharmacogenomics Intelligence Agent
Genes covered	13 (CPIC Level 1A)	25+ (CPIC Level A, B, and C + DPWG + FDA)
Drug-gene pairs	~80	400+
Multi-gene interactions	None	Full modeling (e.g., CYP2C9+VKORC1+CYP4F2 for warfarin)
Phenoconversion	None	Drug-drug-gene interaction detection
HLA screening	3 alleles	12+ alleles with population-specific frequencies
Star allele calling	Pre-computed lookup	VCF-to-star-allele pipeline
Population adjustments	None	Ethnicity-adjusted allele frequencies and dosing
Milvus collections	1 (pgx_rules)	14 specialized collections
Clinical workflows	1 (basic PGx query)	8 comprehensive workflows
Dosing calculators	None	Warfarin dosing algorithm, tacrolimus dosing, 5-FU dose adjustment

The two agents complement each other: the Biomarker Agent provides quick PGx screening as part of a broader biomarker analysis, while the Pharmacogenomics Agent provides deep clinical pharmacogenomic consultation when detailed PGx guidance is needed.

5. Pharmacogenomics Agent Architecture¶

5.1 System Design¶

┌─────────────────────────────────────────────────────────┐
│                    PHARMACOGENOMICS                       │
│                  INTELLIGENCE AGENT                       │
│                                                           │
│  ┌──────────┐  ┌──────────────┐  ┌────────────────────┐ │
│  │ Streamlit │  │ FastAPI      │  │ VCF-to-PGx         │ │
│  │ UI :8507  │  │ API :8107    │  │ Pipeline           │ │
│  └────┬─────┘  └──────┬───────┘  └────────┬───────────┘ │
│       │               │                    │             │
│  ┌────▼───────────────▼────────────────────▼──────────┐ │
│  │              PGx Intelligence Core                   │ │
│  │                                                       │ │
│  │  ┌─────────────┐ ┌──────────────┐ ┌──────────────┐  │ │
│  │  │ Star Allele  │ │ Phenotype    │ │ Drug-Gene    │  │ │
│  │  │ Caller       │ │ Translator   │ │ Matcher      │  │ │
│  │  └──────┬──────┘ └──────┬───────┘ └──────┬───────┘  │ │
│  │         │               │                │           │ │
│  │  ┌──────▼──────┐ ┌──────▼───────┐ ┌──────▼───────┐  │ │
│  │  │ Multi-Gene   │ │ Phenoconv.   │ │ HLA          │  │ │
│  │  │ Interaction  │ │ Detector     │ │ Screener     │  │ │
│  │  └──────┬──────┘ └──────┬───────┘ └──────┬───────┘  │ │
│  │         └───────────────┼────────────────┘           │ │
│  │                         │                             │ │
│  │  ┌──────────────────────▼────────────────────────┐   │ │
│  │  │         Multi-Collection RAG Engine            │   │ │
│  │  │    (14 PGx collections + shared genomic)       │   │ │
│  │  └──────────────────────┬────────────────────────┘   │ │
│  │                         │                             │ │
│  │  ┌──────────────────────▼────────────────────────┐   │ │
│  │  │      Claude Sonnet 4.6 Synthesis Engine        │   │ │
│  │  │  (Grounded PGx recommendations with evidence)  │   │ │
│  │  └────────────────────────────────────────────────┘   │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                           │
│  ┌───────────────────────────────────────────────────────┐ │
│  │                  Milvus (19530)                        │ │
│  │  14 PGx collections + shared genomic_evidence         │ │
│  └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

5.2 Core Processing Modules¶

The agent contains six core processing modules:

1. Star Allele Caller Extracts pharmacogenomic variants from VCF files and resolves them to star allele nomenclature. Uses PharmVar-defined haplotype tables for each gene. Handles: - SNV-based star alleles (e.g., CYP2D6 4 = 1846G>A) - Structural variants (CYP2D6 gene deletion 5, gene duplication 1xN, 2xN) - Hybrid alleles (CYP2D6/CYP2D7 hybrids like 36, 13) - Suballeles (CYP2D6 4.001 vs. 4.002)

2. Phenotype Translator Converts diplotypes to standardized phenotype terms using CPIC's activity score system: - CYP enzymes: Ultra-rapid, Rapid, Normal, Intermediate, Poor Metabolizer - Transporters: Increased, Normal, Decreased, Poor Function - HLA: Positive (carrier) vs. Negative - Enzyme deficiency: Normal, Intermediate, Deficient (G6PD, DPYD, TPMT)

3. Drug-Gene Matcher Cross-references the patient's phenotype profile against their current medication list. For each drug-gene interaction, returns: - CPIC recommendation level (A, B, C, D) - Clinical action (standard dosing, dose adjustment, alternative drug, avoid, contraindicated) - Alert level (INFO, WARNING, CRITICAL) - Alternative drug suggestions with rationale

4. Multi-Gene Interaction Engine Models complex scenarios where multiple pharmacogenes affect a single drug or therapeutic area: - Warfarin: CYP2C9 + VKORC1 + CYP4F2 → personalized dose calculator - Psychiatric medications: CYP2D6 + CYP2C19 + CYP1A2 + CYP3A4 → comprehensive metabolizer profile - Immunosuppressants: CYP3A5 + CYP3A4 + ABCB1 → tacrolimus dosing

5. Phenoconversion Detector Identifies when a patient's medication list contains CYP inhibitors or inducers that change their effective metabolizer status: - Example: A CYP2D6 normal metabolizer taking fluoxetine (strong CYP2D6 inhibitor) is phenoconverted to a CYP2D6 poor metabolizer. If codeine is then prescribed, it will be ineffective despite a "normal" genotype. - Example: A CYP3A4 normal metabolizer taking carbamazepine (strong CYP3A4 inducer) metabolizes cyclosporine too rapidly, potentially causing organ rejection.

6. HLA Screener Screens for HLA alleles associated with drug hypersensitivity reactions:

HLA Allele	Drug	Reaction	Risk if Positive	CPIC Level
HLA-B*57:01	Abacavir	Hypersensitivity syndrome	~50%	A (mandatory screening)
HLA-B*58:01	Allopurinol	SJS/TEN	OR 80-580	A
HLA-B*15:02	Carbamazepine	SJS/TEN	OR 1,357	A
HLA-A*31:01	Carbamazepine	DRESS/maculopapular	OR 25.9	A
HLA-B*15:02	Phenytoin	SJS/TEN	OR 6.7	A
HLA-B*15:02	Oxcarbazepine	SJS/TEN	OR 27.9	B
HLA-B*13:01	Dapsone	Hypersensitivity	OR 20.5	B
HLA-A*33:03	Ticlopidine	Hepatotoxicity	OR 13	B
HLA-DRB1*07:01	Lapatinib	Hepatotoxicity	OR 2.5	B
HLA-B*35:01	Minocycline	DRESS	Under study	C
HLA-B*38:02	Sulfasalazine	DRESS	Under study	C
HLA-DPB1*03:01	Aspirin	AERD	Under study	C

6. Genomic Variant-to-Drug Mapping Pipeline¶

6.1 Pipeline Overview¶

VCF File (11.7M variants)
    │
    ▼
┌──────────────────────────┐
│ 1. PGx Variant Extraction │
│    Filter to ~2,500 PGx   │
│    relevant positions      │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ 2. Star Allele Resolution │
│    PharmVar haplotype      │
│    tables for each gene    │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ 3. Diplotype Assembly     │
│    Phase-aware diplotype   │
│    determination           │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ 4. Phenotype Translation  │
│    CPIC activity scores    │
│    → standardized terms    │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ 5. Drug Matching          │
│    Cross-reference med     │
│    list → recommendations  │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ 6. Report Generation      │
│    Clinical action items   │
│    with evidence grading   │
└──────────────────────────┘

6.2 Variant Extraction Detail¶

From a typical whole-genome VCF with 11.7 million variants, the pipeline extracts approximately 2,500 pharmacogenomically relevant positions across 25+ genes. Extraction uses a curated BED file of PGx-relevant coordinates derived from:

PharmVar database (all defined star allele positions)
CPIC gene-specific tables (defining variants for each guideline)
ClinVar PGx-classified variants
PharmGKB high-confidence variant annotations

6.3 Star Allele Calling Algorithm¶

Star allele calling is the most complex step and varies by gene:

Simple genes (SNV-only): CYP2C19, CYP2C9, VKORC1, DPYD, TPMT, NUDT15 - Haplotype matching against PharmVar-defined allele tables - Each star allele defined by 1-5 SNVs - Diplotype by standard phasing (statistical or read-backed)

Complex genes (SNV + structural): CYP2D6 - CYP2D6 is the most pharmacogenomically important and most difficult gene to genotype - Requires: SNV calling, copy number determination (gene deletions 5, duplications 1xN, *2xN), hybrid allele detection (CYP2D6/CYP2D7 fusions) - The agent implements a tiered approach: - Tier 1: SNV-based star allele assignment from VCF - Tier 2: Read depth analysis for copy number (requires BAM) - Tier 3: Hybrid allele detection (requires specialized alignment) - When structural variant data is unavailable, the agent flags this limitation and provides recommendations based on SNV data alone with appropriate caveats

HLA genes: HLA-A, HLA-B, HLA-DRB1, HLA-DPB1 - HLA typing from WGS data using established tools (OptiType, HLA-HD) - Four-digit resolution sufficient for most PGx associations - Population-specific frequency annotations

6.4 Activity Score System¶

CPIC uses activity scores to standardize phenotype assignment across genes. Example for CYP2D6:

Allele	Activity Score	Classification
*1 (normal function)	1.0	Functional
*2 (normal function)	1.0	Functional
*9 (decreased function)	0.5	Reduced
*10 (decreased function)	0.25	Reduced
*17 (decreased function)	0.5	Reduced
*41 (decreased function)	0.5	Reduced
*4 (no function)	0	Non-functional
*5 (gene deletion)	0	Non-functional
*6 (no function)	0	Non-functional

Diplotype to Phenotype: - Activity Score ≥2.25: Ultra-rapid Metabolizer (UM) - Activity Score 1.25-2.25: Normal Metabolizer (NM) - Activity Score 0.25-1.25: Intermediate Metabolizer (IM) - Activity Score 0: Poor Metabolizer (PM)

Example: CYP2D6 4/41 → Activity Score 0 + 0.5 = 0.5 → Intermediate Metabolizer

7. Milvus Collection Design¶

7.1 Collection Overview¶

The Pharmacogenomics Intelligence Agent maintains 14 specialized Milvus collections plus access to the shared genomic evidence collection:

#	Collection Name	Record Estimate	Purpose
1	`pgx_gene_reference`	~5,000	Star allele definitions, activity scores, allele frequencies
2	`pgx_drug_guidelines`	~8,000	CPIC/DPWG/FDA guideline recommendations
3	`pgx_drug_interactions`	~12,000	Drug-gene interaction annotations (PharmGKB)
4	`pgx_hla_hypersensitivity`	~2,000	HLA-drug hypersensitivity associations
5	`pgx_phenoconversion`	~3,000	CYP inhibitor/inducer drug interactions
6	`pgx_dosing_algorithms`	~1,500	Population PK models, dosing nomograms
7	`pgx_clinical_evidence`	~15,000	Published PGx implementation studies, outcomes
8	`pgx_population_data`	~4,000	Ethnicity-specific allele frequencies, dosing adjustments
9	`pgx_clinical_trials`	~6,000	PGx-related clinical trial data
10	`pgx_fda_labels`	~3,500	FDA pharmacogenomic labeling information
11	`pgx_drug_alternatives`	~5,000	Alternative drug recommendations by metabolizer status
12	`pgx_patient_profiles`	Variable	Patient-specific PGx profiles (per-patient)
13	`pgx_implementation`	~4,000	Health system PGx implementation protocols
14	`pgx_education`	~2,500	Clinician-facing PGx education materials
S	`genomic_evidence`	3,500,000	Shared genomic variant evidence (read-only)

7.2 Collection Schemas¶

Collection 1: pgx_gene_reference Core pharmacogene reference data including star allele definitions and allele function.

FieldSchema(name="id", dtype=VARCHAR, is_primary=True, max_length=100)
FieldSchema(name="embedding", dtype=FLOAT_VECTOR, dim=384)
FieldSchema(name="gene", dtype=VARCHAR, max_length=20)          # CYP2D6, SLCO1B1, etc.
FieldSchema(name="star_allele", dtype=VARCHAR, max_length=30)    # *1, *4, *10, *41
FieldSchema(name="defining_variants", dtype=VARCHAR, max_length=500)  # rs numbers
FieldSchema(name="activity_score", dtype=FLOAT)                  # 0, 0.25, 0.5, 1.0
FieldSchema(name="function_status", dtype=VARCHAR, max_length=30) # Normal, Decreased, No function
FieldSchema(name="allele_frequency_global", dtype=FLOAT)
FieldSchema(name="allele_frequency_european", dtype=FLOAT)
FieldSchema(name="allele_frequency_african", dtype=FLOAT)
FieldSchema(name="allele_frequency_east_asian", dtype=FLOAT)
FieldSchema(name="allele_frequency_south_asian", dtype=FLOAT)
FieldSchema(name="allele_frequency_latino", dtype=FLOAT)
FieldSchema(name="pharmvar_id", dtype=VARCHAR, max_length=30)
FieldSchema(name="text_chunk", dtype=VARCHAR, max_length=3000)
FieldSchema(name="source", dtype=VARCHAR, max_length=100)

Collection 2: pgx_drug_guidelines CPIC, DPWG, and FDA guideline recommendations for drug-gene pairs.

FieldSchema(name="id", dtype=VARCHAR, is_primary=True, max_length=100)
FieldSchema(name="embedding", dtype=FLOAT_VECTOR, dim=384)
FieldSchema(name="gene", dtype=VARCHAR, max_length=20)
FieldSchema(name="drug", dtype=VARCHAR, max_length=50)
FieldSchema(name="phenotype", dtype=VARCHAR, max_length=40)      # Poor Metabolizer, etc.
FieldSchema(name="guideline_body", dtype=VARCHAR, max_length=10) # CPIC, DPWG, FDA
FieldSchema(name="cpic_level", dtype=VARCHAR, max_length=5)      # A, A/B, B, C, D
FieldSchema(name="recommendation", dtype=VARCHAR, max_length=1000)
FieldSchema(name="clinical_action", dtype=VARCHAR, max_length=30) # STANDARD, DOSE_ADJUST, AVOID, etc.
FieldSchema(name="alert_level", dtype=VARCHAR, max_length=10)    # INFO, WARNING, CRITICAL
FieldSchema(name="alternative_drugs", dtype=VARCHAR, max_length=500)
FieldSchema(name="dose_adjustment", dtype=VARCHAR, max_length=200)
FieldSchema(name="evidence_pmids", dtype=VARCHAR, max_length=300)
FieldSchema(name="guideline_version", dtype=VARCHAR, max_length=20)
FieldSchema(name="last_updated", dtype=VARCHAR, max_length=10)
FieldSchema(name="text_chunk", dtype=VARCHAR, max_length=3000)

Collection 3: pgx_drug_interactions Comprehensive drug-gene interaction annotations from PharmGKB.

FieldSchema(name="id", dtype=VARCHAR, is_primary=True, max_length=100)
FieldSchema(name="embedding", dtype=FLOAT_VECTOR, dim=384)
FieldSchema(name="drug", dtype=VARCHAR, max_length=50)
FieldSchema(name="gene", dtype=VARCHAR, max_length=20)
FieldSchema(name="variant_rsid", dtype=VARCHAR, max_length=20)
FieldSchema(name="interaction_type", dtype=VARCHAR, max_length=30) # PK, PD, efficacy, toxicity
FieldSchema(name="effect_description", dtype=VARCHAR, max_length=500)
FieldSchema(name="evidence_level", dtype=VARCHAR, max_length=5)  # 1A, 1B, 2A, 2B, 3, 4
FieldSchema(name="clinical_significance", dtype=VARCHAR, max_length=20) # Actionable, Informative
FieldSchema(name="pharmgkb_id", dtype=VARCHAR, max_length=30)
FieldSchema(name="affected_phenotype", dtype=VARCHAR, max_length=100)
FieldSchema(name="text_chunk", dtype=VARCHAR, max_length=3000)

Collection 4: pgx_hla_hypersensitivity HLA-mediated drug hypersensitivity associations with population-specific risks.

FieldSchema(name="id", dtype=VARCHAR, is_primary=True, max_length=100)
FieldSchema(name="embedding", dtype=FLOAT_VECTOR, dim=384)
FieldSchema(name="hla_allele", dtype=VARCHAR, max_length=20)
FieldSchema(name="drug", dtype=VARCHAR, max_length=50)
FieldSchema(name="reaction_type", dtype=VARCHAR, max_length=50)  # SJS/TEN, DRESS, HSR, hepatotoxicity
FieldSchema(name="risk_if_positive", dtype=VARCHAR, max_length=100) # OR, absolute risk %
FieldSchema(name="severity", dtype=VARCHAR, max_length=20)       # Life-threatening, Severe, Moderate
FieldSchema(name="cpic_level", dtype=VARCHAR, max_length=5)
FieldSchema(name="recommendation", dtype=VARCHAR, max_length=500)
FieldSchema(name="screening_mandatory", dtype=BOOL)
FieldSchema(name="prevalence_european", dtype=FLOAT)
FieldSchema(name="prevalence_african", dtype=FLOAT)
FieldSchema(name="prevalence_east_asian", dtype=FLOAT)
FieldSchema(name="prevalence_south_asian", dtype=FLOAT)
FieldSchema(name="prevalence_latino", dtype=FLOAT)
FieldSchema(name="alternative_drugs", dtype=VARCHAR, max_length=300)
FieldSchema(name="text_chunk", dtype=VARCHAR, max_length=3000)

Collection 5: pgx_phenoconversion Drug-drug-gene interactions where concomitant medications alter metabolizer phenotype.

FieldSchema(name="id", dtype=VARCHAR, is_primary=True, max_length=100)
FieldSchema(name="embedding", dtype=FLOAT_VECTOR, dim=384)
FieldSchema(name="affected_enzyme", dtype=VARCHAR, max_length=20)  # CYP2D6, CYP3A4, etc.
FieldSchema(name="precipitant_drug", dtype=VARCHAR, max_length=50) # The inhibitor or inducer
FieldSchema(name="interaction_type", dtype=VARCHAR, max_length=15) # Strong/Moderate/Weak inhibitor or inducer
FieldSchema(name="effect_on_phenotype", dtype=VARCHAR, max_length=100) # "Converts NM to PM"
FieldSchema(name="clinical_significance", dtype=VARCHAR, max_length=500)
FieldSchema(name="affected_substrate_drugs", dtype=VARCHAR, max_length=500)
FieldSchema(name="time_to_onset", dtype=VARCHAR, max_length=50)
FieldSchema(name="reversibility", dtype=VARCHAR, max_length=50)
FieldSchema(name="evidence_level", dtype=VARCHAR, max_length=5)
FieldSchema(name="text_chunk", dtype=VARCHAR, max_length=3000)

Collection 6: pgx_dosing_algorithms Pharmacokinetic models and dosing nomograms for dose-critical drugs.

FieldSchema(name="id", dtype=VARCHAR, is_primary=True, max_length=100)
FieldSchema(name="embedding", dtype=FLOAT_VECTOR, dim=384)
FieldSchema(name="drug", dtype=VARCHAR, max_length=50)
FieldSchema(name="genes_involved", dtype=VARCHAR, max_length=100)
FieldSchema(name="algorithm_name", dtype=VARCHAR, max_length=100) # e.g., "IWPC Warfarin Dosing"
FieldSchema(name="input_variables", dtype=VARCHAR, max_length=500)
FieldSchema(name="formula_description", dtype=VARCHAR, max_length=1000)
FieldSchema(name="validation_cohort", dtype=VARCHAR, max_length=200)
FieldSchema(name="accuracy_metrics", dtype=VARCHAR, max_length=200) # R², MAE, etc.
FieldSchema(name="clinical_context", dtype=VARCHAR, max_length=500)
FieldSchema(name="text_chunk", dtype=VARCHAR, max_length=3000)

Collections 7-14 follow similar patterns for clinical evidence, population data, clinical trials, FDA labels, drug alternatives, patient profiles, implementation protocols, and education materials. Each uses the standard 384-dimension BGE-small-en-v1.5 embedding with IVF_FLAT COSINE indexing.

7.3 Collection Search Weights¶

WEIGHT_GENE_REFERENCE = 0.10
WEIGHT_DRUG_GUIDELINES = 0.14      # Highest -- clinical guidelines are primary
WEIGHT_DRUG_INTERACTIONS = 0.12
WEIGHT_HLA_HYPERSENSITIVITY = 0.10
WEIGHT_PHENOCONVERSION = 0.08
WEIGHT_DOSING_ALGORITHMS = 0.07
WEIGHT_CLINICAL_EVIDENCE = 0.08
WEIGHT_POPULATION_DATA = 0.06
WEIGHT_CLINICAL_TRIALS = 0.04
WEIGHT_FDA_LABELS = 0.06
WEIGHT_DRUG_ALTERNATIVES = 0.05
WEIGHT_PATIENT_PROFILES = 0.03
WEIGHT_IMPLEMENTATION = 0.02
WEIGHT_EDUCATION = 0.02
WEIGHT_GENOMIC_EVIDENCE = 0.03

8. Clinical Workflows¶

8.1 Workflow 1: Pre-emptive PGx Panel Interpretation¶

Trigger: New patient genome processed through HCLS AI Factory genomics pipeline.

Process: 1. VCF file ingested by PGx variant extraction module 2. Star alleles called for all 25+ pharmacogenes 3. Diplotypes assembled and phenotypes translated 4. Complete PGx profile stored in pgx_patient_profiles collection 5. Profile cross-referenced against common medication classes 6. "PGx Passport" report generated with: - All metabolizer phenotypes (table format) - High-risk drug-gene interactions (CRITICAL alerts) - Pre-emptive recommendations for common drug classes - HLA hypersensitivity risks - Population-adjusted allele frequency context

Demo Query: "Generate a complete pharmacogenomic profile for this patient and identify any high-priority drug-gene interactions."

Output Example:

PHARMACOGENOMIC PROFILE SUMMARY

Gene        Diplotype    Phenotype              Key Implications
CYP2D6      *4/*41       Intermediate Met.      Reduced codeine activation, consider alternatives
CYP2C19     *1/*2        Intermediate Met.      Reduced clopidogrel activation -- use prasugrel
CYP2C9      *1/*3        Intermediate Met.      Reduced warfarin metabolism -- lower starting dose
VKORC1      -1639 G/A    Intermediate Sens.     Combined with CYP2C9 IM → warfarin ~3 mg/day
SLCO1B1     *1/*5        Intermediate Function  Simvastatin myopathy risk ↑ -- use pravastatin
DPYD        *1/*1        Normal                 Standard fluoropyrimidine dosing
TPMT        *1/*3A       Intermediate Met.      Reduce thiopurine dose 30-50%
HLA-B*57:01 Negative                            Abacavir safe
HLA-B*58:01 Negative                            Allopurinol safe
HLA-B*15:02 Negative                            Carbamazepine SJS risk low

CRITICAL ALERTS (2):
⚠ CYP2C19 *1/*2: AVOID clopidogrel -- use prasugrel or ticagrelor
⚠ CYP2D6 *4/*41: AVOID codeine/tramadol -- use morphine or oxycodone

8.2 Workflow 2: Opioid Prescribing Safety¶

Trigger: Opioid prescription initiated or planned for patient with PGx data.

Process: 1. Retrieve patient's CYP2D6 diplotype and phenotype 2. Check for CYP2D6 phenoconversion (concomitant inhibitors: fluoxetine, paroxetine, bupropion, duloxetine, terbinafine) 3. Determine effective phenotype (genetic + phenoconversion) 4. Map to opioid-specific recommendations: - Codeine: PM → avoid (no activation), UM → avoid (rapid activation, toxicity risk) - Tramadol: PM → avoid (no activation), UM → avoid (seizure/serotonin risk) - Hydrocodone: PM → reduced efficacy, UM → increased effect - Oxycodone: Minimal CYP2D6 dependence → safer alternative for PM/UM 5. Generate recommendation with alternative analgesics

Demo Query: "This patient needs post-surgical pain management. What opioids are safe given their CYP2D6 status?"

8.3 Workflow 3: Anticoagulant Optimization¶

Trigger: Warfarin initiation or dose adjustment for patient with PGx data.

Process: 1. Retrieve CYP2C9 + VKORC1 + CYP4F2 genotypes 2. Apply IWPC (International Warfarin Pharmacogenetics Consortium) dosing algorithm: - Inputs: CYP2C9 genotype, VKORC1 genotype, age, weight, height, race, amiodarone use, smoker status - Output: Predicted therapeutic dose (mg/day) 3. Cross-reference with current INR values and clinical context 4. Flag drug-drug interactions that affect warfarin metabolism 5. Generate personalized dose recommendation with evidence level

Dosing Algorithm (IWPC):

Predicted weekly dose (mg) =
    5.6044
    - 0.2614 × age (decades)
    + 0.0087 × height (cm)
    + 0.0128 × weight (kg)
    - 0.8677 × VKORC1 A/G
    - 1.6974 × VKORC1 A/A
    - 0.5211 × CYP2C9 *1/*2
    - 0.9357 × CYP2C9 *1/*3
    - 1.0616 × CYP2C9 *2/*2
    - 1.9206 × CYP2C9 *2/*3
    - 2.3312 × CYP2C9 *3/*3
    - 0.2188 × CYP4F2 *1/*3
    - 0.2760 × CYP4F2 *3/*3
    + 1.1816 × race (African American)
    - 0.1070 × race (Asian)
    - 0.2029 × amiodarone (yes)
    + 0.2107 × smoker (yes)

8.4 Workflow 4: Antidepressant Selection¶

Trigger: SSRI/SNRI/TCA initiation for depression or anxiety.

Process: 1. Retrieve CYP2D6 and CYP2C19 phenotypes (primary metabolizers for most antidepressants) 2. Check CYP1A2 status if fluvoxamine or clozapine considered 3. Map each candidate antidepressant to patient's metabolizer profile: - SSRIs metabolized primarily by CYP2D6: fluoxetine, paroxetine, fluvoxamine - SSRIs metabolized primarily by CYP2C19: citalopram, escitalopram, sertraline - TCAs metabolized by both: amitriptyline, nortriptyline, imipramine 4. Flag drug-drug interactions with other psychiatric medications 5. Rank antidepressants by suitability for patient's genotype 6. Generate recommendation with dose adjustments if needed

Demo Query: "Patient is CYP2D6 poor metabolizer, CYP2C19 normal metabolizer. Which SSRI should I start for generalized anxiety?"

Expected Output: "Avoid paroxetine and fluoxetine (primarily CYP2D6 metabolized -- toxicity risk in PM). Sertraline or escitalopram (primarily CYP2C19 metabolized) are preferred. Standard dosing appropriate given CYP2C19 NM status. Note: fluoxetine is also a strong CYP2D6 inhibitor, which would compound the poor metabolizer phenotype."

8.5 Workflow 5: Statin Myopathy Risk Assessment¶

Trigger: Statin prescription initiated or patient reports muscle symptoms.

Process: 1. Retrieve SLCO1B1 genotype (rs4149056, T>C) 2. Assess statin-specific myopathy risk: - SLCO1B1 5/5 (CC): Simvastatin myopathy risk 18% (vs. 0.6% in 1/1) - SLCO1B1 1/5 (TC): Simvastatin myopathy risk 3% 3. Cross-reference with concurrent medications affecting statin levels 4. Recommend statin selection and dosing: - High risk: Avoid simvastatin >20 mg; consider pravastatin or rosuvastatin (not SLCO1B1 dependent) - Intermediate risk: Simvastatin ≤20 mg or alternative statin 5. If patient on existing statin with muscle symptoms, assess whether PGx explains the ADR

8.6 Workflow 6: Chemotherapy Toxicity Prevention¶

Trigger: Fluoropyrimidine (5-FU, capecitabine) or thiopurine (azathioprine, 6-MP) initiation.

Process: 1. Fluoropyrimidines (DPYD): - Retrieve DPYD genotype for 4 key variants: 2A (splice), 13 (missense), c.2846A>T, HapB3 - Activity score calculation: each variant assigned 0 (no function) or 0.5 (decreased function) - Dose recommendation: - Activity score 2.0 (NM): Full dose - Activity score 1.5 (IM): Reduce dose 50% - Activity score 1.0 (IM): Reduce dose 50%, consider therapeutic drug monitoring - Activity score 0.5: Strongly reduce dose or avoid - Activity score 0 (PM): AVOID fluoropyrimidines -- alternative regimen required

Thiopurines (TPMT + NUDT15):
Retrieve TPMT and NUDT15 diplotypes
Combined phenotype assessment (both genes contribute independently)
Dose recommendation per CPIC:
- Both NM: Full dose
- TPMT IM + NUDT15 NM: Reduce dose 30-50%
- TPMT PM or NUDT15 PM: Reduce dose 90% or avoid

8.7 Workflow 7: HLA-Mediated Hypersensitivity Screening¶

Trigger: Prescription of a drug with known HLA-mediated hypersensitivity risk.

Process: 1. Drug triggers HLA screening alert 2. Retrieve patient's HLA typing from genomic data 3. Cross-reference against HLA-drug hypersensitivity database 4. Generate risk assessment: - CONTRAINDICATED if positive for mandatory-screening alleles (HLA-B57:01/abacavir) - HIGH RISK with alternative recommendation if positive for strongly associated alleles - LOW RISK if negative for relevant alleles 5. Population-specific frequency context (e.g., HLA-B15:02 is rare in Europeans but common in Southeast Asians -- screening is ethnicity-guided for carbamazepine)

8.8 Workflow 8: Polypharmacy Drug-Drug-Gene Interaction Resolution¶

Trigger: Patient on 5+ medications, new drug being added, or comprehensive medication review.

Process: 1. Retrieve patient's complete PGx profile (all 25+ genes) 2. Cross-reference entire medication list against PGx profile 3. Identify drug-drug-gene interactions: - Drug A is a CYP2D6 inhibitor + Drug B is a CYP2D6 substrate + Patient is CYP2D6 IM → effective PM phenotype → Drug B toxicity risk - Drug C is a CYP3A4 inducer + Drug D is a CYP3A4 substrate → Drug D subtherapeutic levels 4. Model phenoconversion cascade effects 5. Prioritize interactions by clinical severity 6. Generate comprehensive medication safety report with actionable recommendations

Demo Query: "This patient is on 12 medications. Review the complete medication list against their PGx profile and identify all drug-drug-gene interactions."

9.1 VCF-to-Clinical Integration¶

The Pharmacogenomics Agent uniquely bridges the gap between genomic data and clinical action. Unlike other agents that operate primarily on text-based clinical knowledge, this agent directly processes structured genomic data (VCF format) and translates it to structured clinical recommendations.

Integration points with other HCLS AI Factory components:

Genomics Pipeline → PGx Agent: VCF output feeds directly into the PGx variant extraction module. Every genome processed through the pipeline automatically generates a PGx profile.
PGx Agent → Biomarker Agent: PGx findings inform biomarker interpretation (e.g., a CYP2D6 PM patient on tamoxifen will have subtherapeutic endoxifen levels, affecting breast cancer biomarker interpretation).
PGx Agent → Oncology Agent: Chemotherapy drug selection informed by DPYD, TPMT, NUDT15, UGT1A1 status.
PGx Agent → Autoimmune Agent: Biologic therapy metabolism (CYP3A4/5 for some biologics), thiopurine dosing for autoimmune conditions (azathioprine for lupus, IBD).
PGx Agent → Cardiology Agent: Warfarin dosing, clopidogrel selection, statin safety.

9.2 Temporal Considerations¶

Unlike most pharmacogenomic data (which is static -- genotype doesn't change), the PGx agent must handle temporal dynamics:

Phenoconversion is temporal: A patient's effective metabolizer status changes when CYP inhibitors/inducers are started or stopped.
Guidelines evolve: CPIC updates guidelines every 2-3 years. New drug-gene associations are published continuously.
Medication lists change: The same PGx profile produces different recommendations as drugs are added or removed.
Age-related changes: Some drug metabolism changes with age (reduced CYP activity in elderly), modifying PGx-based dose recommendations.

10. NIM Integration Strategy¶

10.1 On-Device Inference¶

The agent leverages NVIDIA NIM microservices for computationally intensive on-device tasks:

BGE-small-en-v1.5 NIM: Text embedding for all collection searches (384-dim, optimized for NVIDIA GPU)
Re-ranking NIM: Cross-encoder re-ranking of search results for improved retrieval accuracy
Genomics NIMs: Parabricks-based variant calling provides input VCF data

10.2 Cloud LLM Integration¶

Claude Sonnet 4.6 handles evidence synthesis and natural language response generation:

System prompt includes PGx-specific clinical reasoning framework
Structured output format for drug recommendations (gene, phenotype, drug, action, evidence level)
Citation grounding to CPIC/DPWG guideline versions
Uncertainty quantification for novel drug-gene combinations

Privacy architecture: Patient genomic data and PGx profiles remain on the local DGX Spark. Only anonymized query text and retrieved evidence snippets are sent to the cloud LLM. No patient identifiers, genotypes, or medication lists leave the local device.

11. Knowledge Graph Design¶

11.1 Core Entity Dictionaries¶

The PGx knowledge graph contains 8 core entity types:

1. PHARMACOGENES (25+ entries)

PHARMACOGENES = {
    "CYP2D6": {
        "full_name": "Cytochrome P450 2D6",
        "chromosome": "22q13.2",
        "function": "Phase I oxidative metabolism",
        "substrates_count": 80,
        "percent_drugs_metabolized": 25,
        "star_alleles_defined": 140,
        "key_variants": ["*1", "*2", "*3", "*4", "*5", "*6", "*9", "*10", "*17", "*41"],
        "structural_variation": True,
        "complexity_level": "Very High",
        "cpic_guidelines": ["codeine", "tramadol", "tamoxifen", "ondansetron", "SSRIs", "TCAs"],
    },
    "CYP2C19": { ... },
    "CYP2C9": { ... },
    # ... 22 more genes
}

2. METABOLIZER_PHENOTYPES

METABOLIZER_PHENOTYPES = {
    "Ultra-rapid Metabolizer": {
        "abbreviation": "UM",
        "clinical_meaning": "Metabolizes drug faster than normal. May need higher dose or different drug.",
        "risk": "Subtherapeutic drug levels, treatment failure (prodrugs: toxicity from rapid activation)",
    },
    "Normal Metabolizer": {
        "abbreviation": "NM",
        "clinical_meaning": "Standard drug metabolism. Standard dosing appropriate.",
        "risk": "None -- standard of care",
    },
    "Intermediate Metabolizer": {
        "abbreviation": "IM",
        "clinical_meaning": "Reduced drug metabolism. May need dose reduction.",
        "risk": "Elevated drug levels, increased ADR risk",
    },
    "Poor Metabolizer": {
        "abbreviation": "PM",
        "clinical_meaning": "Severely reduced or absent metabolism. Drug accumulates or prodrug fails to activate.",
        "risk": "Toxicity (active drugs) or treatment failure (prodrugs)",
    },
}

3. DRUG_CATEGORIES (12 therapeutic areas)

DRUG_CATEGORIES = {
    "opioids": ["codeine", "tramadol", "hydrocodone", "oxycodone", "morphine"],
    "anticoagulants": ["warfarin", "clopidogrel", "prasugrel", "ticagrelor"],
    "antidepressants": ["fluoxetine", "paroxetine", "sertraline", "citalopram", "escitalopram",
                        "amitriptyline", "nortriptyline", "imipramine", "venlafaxine", "duloxetine"],
    "antipsychotics": ["aripiprazole", "haloperidol", "risperidone", "clozapine"],
    "statins": ["simvastatin", "atorvastatin", "rosuvastatin", "pravastatin", "lovastatin"],
    "chemotherapy": ["5-fluorouracil", "capecitabine", "irinotecan", "tamoxifen",
                     "azathioprine", "6-mercaptopurine", "thioguanine", "cisplatin"],
    "anticonvulsants": ["carbamazepine", "oxcarbazepine", "phenytoin", "lamotrigine", "valproate"],
    "antivirals": ["abacavir", "efavirenz", "atazanavir"],
    "immunosuppressants": ["tacrolimus", "cyclosporine", "mycophenolate", "azathioprine"],
    "cardiovascular": ["metoprolol", "propranolol", "verapamil", "amiodarone"],
    "proton_pump_inhibitors": ["omeprazole", "pantoprazole", "lansoprazole"],
    "anti_gout": ["allopurinol", "febuxostat"],
}

4. CYP_INHIBITORS_INDUCERS

CYP_INHIBITORS = {
    "CYP2D6": {
        "strong": ["fluoxetine", "paroxetine", "bupropion", "quinidine", "terbinafine"],
        "moderate": ["duloxetine", "sertraline", "diphenhydramine", "abiraterone"],
        "weak": ["citalopram", "escitalopram", "amiodarone"],
    },
    "CYP3A4": {
        "strong": ["ketoconazole", "itraconazole", "clarithromycin", "ritonavir", "cobicistat"],
        "moderate": ["fluconazole", "erythromycin", "diltiazem", "verapamil", "grapefruit"],
        "weak": ["cimetidine"],
    },
    "CYP2C19": {
        "strong": ["fluoxetine", "fluvoxamine", "ticlopidine"],
        "moderate": ["omeprazole", "esomeprazole", "voriconazole"],
        "weak": ["cimetidine"],
    },
    "CYP1A2": {
        "strong": ["fluvoxamine", "ciprofloxacin", "enoxacin"],
        "moderate": ["oral contraceptives", "mexiletine"],
    },
}

CYP_INDUCERS = {
    "CYP3A4": {
        "strong": ["rifampin", "carbamazepine", "phenytoin", "St. John's wort", "phenobarbital"],
        "moderate": ["efavirenz", "bosentan", "modafinil"],
    },
    "CYP1A2": {
        "strong": ["smoking (tobacco)", "charcoal-grilled meats"],
        "moderate": ["omeprazole (high-dose)"],
    },
    "CYP2C19": {
        "strong": ["rifampin"],
        "moderate": ["carbamazepine", "efavirenz"],
    },
}

5-8. Additional dictionaries for POPULATION_ALLELE_FREQUENCIES, DRUG_ALTERNATIVE_MAPS, DOSING_PARAMETERS, and EVIDENCE_GRADING_CRITERIA follow similar structured patterns.

11.2 Query Expansion Maps¶

QUERY_EXPANSION = {
    "warfarin": ["coumadin", "anticoagulant", "INR", "blood thinner", "CYP2C9", "VKORC1",
                 "vitamin K antagonist", "bleeding risk", "dose adjustment"],
    "codeine": ["opioid", "pain", "CYP2D6", "morphine", "prodrug", "ultra-rapid",
                "poor metabolizer", "respiratory depression", "analgesic"],
    "clopidogrel": ["plavix", "antiplatelet", "stent", "CYP2C19", "prasugrel", "ticagrelor",
                    "stent thrombosis", "ACS", "PCI"],
    "statin": ["simvastatin", "atorvastatin", "SLCO1B1", "myopathy", "rhabdomyolysis",
               "cholesterol", "pravastatin", "rosuvastatin", "muscle pain"],
    "tamoxifen": ["breast cancer", "CYP2D6", "endoxifen", "ER-positive", "SERM",
                  "aromatase inhibitor", "poor metabolizer"],
    "5-FU": ["fluorouracil", "capecitabine", "DPYD", "DPD deficiency", "mucositis",
             "neutropenia", "chemotherapy toxicity", "dose reduction"],
    "abacavir": ["HIV", "HLA-B*57:01", "hypersensitivity", "antiretroviral",
                 "immune-mediated reaction"],
    "carbamazepine": ["tegretol", "epilepsy", "HLA-B*15:02", "HLA-A*31:01", "SJS",
                      "TEN", "DRESS", "Stevens-Johnson"],
    "tacrolimus": ["immunosuppressant", "transplant", "CYP3A5", "organ rejection",
                   "calcineurin inhibitor", "dose adjustment"],
    # ... 15+ more drug expansion maps
}

12. Query Expansion and Retrieval Strategy¶

12.1 Multi-Stage Retrieval¶

The RAG engine uses a four-stage retrieval strategy optimized for pharmacogenomic queries:

Stage 1: Query Classification Classify incoming query into PGx workflow type: - Gene-specific query ("What does CYP2D6 4/41 mean?") - Drug-specific query ("Is codeine safe for this patient?") - Patient profile query ("Generate PGx report for this genome") - Interaction query ("Any drug-drug-gene interactions in this med list?") - Dosing query ("What warfarin dose for this genotype?")

Stage 2: Targeted Collection Search Based on classification, prioritize relevant collections: - Gene query → pgx_gene_reference (weight boost), pgx_drug_guidelines - Drug query → pgx_drug_guidelines (weight boost), pgx_drug_interactions, pgx_drug_alternatives - Profile query → All collections, emphasizing pgx_gene_reference and pgx_drug_guidelines - Interaction query → pgx_phenoconversion (weight boost), pgx_drug_interactions

Stage 3: Evidence Merging and Re-ranking - Merge results across collections with weighted scoring - Cross-encoder re-ranking for relevance - Deduplication of overlapping evidence - Evidence grading annotation (CPIC Level A vs. B vs. emerging)

Stage 4: LLM Synthesis - Structured prompt with retrieved evidence - Clinical reasoning framework (gene → phenotype → drug → recommendation → evidence) - Output format enforced: actionable recommendations with evidence grading

13. API and UI Design¶

13.1 FastAPI Endpoints (Port 8107)¶

Health and Status:
GET  /health                      → Service health, collection counts
GET  /collections                 → Collection names and record counts
GET  /metrics                     → Prometheus-compatible metrics

Core PGx Queries:
POST /v1/pgx/profile             → Generate complete PGx profile from VCF
POST /v1/pgx/query               → Natural language PGx query with RAG
POST /v1/pgx/drug-check          → Check single drug against PGx profile
POST /v1/pgx/medication-review   → Full medication list review
POST /v1/pgx/dosing              → Genotype-guided dosing calculation
POST /v1/pgx/hla-screen          → HLA hypersensitivity screening

Interaction Analysis:
POST /v1/pgx/interactions        → Drug-drug-gene interaction analysis
POST /v1/pgx/phenoconversion     → Phenoconversion detection

Reporting:
POST /v1/pgx/report              → Generate clinical PGx report (PDF/JSON)
POST /v1/pgx/passport            → Generate PGx Passport card
GET  /v1/pgx/profile/{patient_id} → Retrieve stored patient PGx profile

Data Management:
POST /v1/pgx/ingest-vcf          → Ingest VCF and extract PGx variants
POST /v1/pgx/update-guidelines   → Update CPIC/DPWG guideline data
GET  /v1/pgx/gene/{gene_name}    → Gene-specific reference information
GET  /v1/pgx/drug/{drug_name}    → Drug-specific PGx information

13.2 Streamlit UI Design (Port 8507)¶

Tab 1: PGx Dashboard - Patient PGx profile overview (metabolizer status table) - Active medication list with drug-gene interaction alerts - Risk severity heatmap (genes × drugs)

Tab 2: Drug Check - Enter drug name → instant PGx safety assessment - Visual traffic light system: Green (safe), Yellow (adjust), Red (avoid) - Alternative drug suggestions with rationale

Tab 3: Medication Review - Paste or upload complete medication list - Comprehensive drug-drug-gene interaction matrix - Phenoconversion detection and cascade effects - Prioritized action items

Tab 4: Warfarin Dosing - Interactive dosing calculator (IWPC algorithm) - Input: genotypes, demographics, concurrent medications - Output: predicted therapeutic dose with confidence interval

Tab 5: Chemotherapy Safety - DPYD screening results and 5-FU dose recommendation - TPMT/NUDT15 results and thiopurine dose recommendation - UGT1A1 results and irinotecan dose recommendation

Tab 6: HLA Screening - Complete HLA typing results - Drug hypersensitivity risk table - Population-specific frequency context

Tab 7: PGx Report Generator - Generate clinical PGx report (PDF) - Generate PGx Passport (wallet card format) - Generate provider letter (for specialist communication)

Tab 8: Evidence Explorer - Search CPIC/DPWG guidelines by gene or drug - View published PGx implementation evidence - Clinical trial results for PGx-guided therapy

Tab 9: Phenoconversion Modeler - Interactive tool: add/remove drugs and see phenotype changes - Visual diagram of CYP enzyme inhibition/induction cascades - "What if" scenarios for medication changes

Tab 10: Population Analytics - Allele frequency explorer by ethnicity - Metabolizer phenotype distribution charts - Health equity analysis (differential PGx risk by population)

14. Clinical Decision Support Engines¶

14.1 Dosing Calculators¶

Warfarin Dose Calculator - Algorithm: IWPC (International Warfarin Pharmacogenetics Consortium) - Inputs: CYP2C9, VKORC1, CYP4F2 genotypes + demographics - Output: Predicted weekly dose (mg) with 95% CI - Validation: R² = 0.43 (vs. 0.17 for clinical-only algorithm)

Tacrolimus Dose Calculator - Algorithm: CYP3A5-guided dosing - CYP3A5 expressers (1/1, 1/3): Standard dose → higher trough levels expected - CYP3A5 non-expressers (3/3): May need 1.5-2x dose for target trough

5-FU Dose Adjustment Calculator - Algorithm: DPYD activity score-based - Input: DPYD diplotype → activity score - Output: Percent dose reduction recommendation per EMA/CPIC guidelines

14.2 Alert Classification Engine¶

Three-tier alert system for clinical decision support:

Alert Level	Criteria	Clinical Action	Example
CRITICAL	CPIC Level A, avoid/contraindicate	Immediate prescriber notification, hard stop recommended	CYP2D6 UM + codeine, HLA-B*57:01+ + abacavir
WARNING	CPIC Level A/B, dose adjustment needed	Prescriber review required before dispensing	CYP2C19 IM + clopidogrel, SLCO1B1 5/5 + simvastatin
INFO	CPIC Level B/C, informational	FYI for clinical record, no immediate action required	CYP2D6 IM + ondansetron (may need higher dose)

14.3 Drug Alternative Recommendation Engine¶

When a drug is flagged as AVOID or CONTRAINDICATED, the engine recommends alternatives:

DRUG_ALTERNATIVES = {
    "codeine": {
        "CYP2D6_PM": ["morphine (not CYP2D6 dependent)", "oxycodone (minimal CYP2D6)",
                       "acetaminophen (non-opioid)", "NSAIDs (if appropriate)"],
        "CYP2D6_UM": ["morphine (direct-acting, no activation needed)",
                       "fentanyl (CYP3A4 metabolized)", "hydromorphone"],
    },
    "clopidogrel": {
        "CYP2C19_PM": ["prasugrel (not CYP2C19 dependent)",
                        "ticagrelor (not CYP2C19 dependent)"],
        "CYP2C19_IM": ["prasugrel", "ticagrelor",
                        "clopidogrel at increased dose (75 mg → 150 mg, limited evidence)"],
    },
    "simvastatin": {
        "SLCO1B1_poor": ["pravastatin (not SLCO1B1 dependent)",
                          "rosuvastatin (minimal SLCO1B1 dependence)",
                          "fluvastatin (not SLCO1B1 dependent)"],
    },
    # ... 30+ drug alternative maps
}

15. Reporting and Interoperability¶

15.1 Output Formats¶

Clinical PGx Report (PDF): Comprehensive multi-page report for medical record
PGx Passport (PDF/card): Wallet-sized card with critical PGx information for patient to carry
HL7 FHIR PGx Report: Structured data output compatible with EHR integration (DiagnosticReport + Observation resources)
CDS Hooks: Integration with EHR clinical decision support via CDS Hooks specification (when available)
JSON/API: Structured data for programmatic consumption by other agents

15.2 FHIR Integration¶

PGx results will be represented using HL7 FHIR Genomics specifications: - DiagnosticReport for overall PGx panel results - Observation (component type 69548-6 Genetic variant assessment) for each gene - MedicationRequest extensions for PGx-informed prescribing - Task resources for recommended clinical actions

16. Product Requirements Document¶

16.1 User Stories¶

Epic 1: Genome-to-PGx Profile (Foundation)

ID	Story	Acceptance Criteria	Priority
PGX-001	As a clinician, I want to upload a patient's VCF file and receive a complete PGx profile so I can make informed prescribing decisions	VCF processed, star alleles called for 25+ genes, phenotypes assigned, profile stored in Milvus	P0
PGX-002	As a clinician, I want to see which of my patient's current medications have PGx implications so I can address high-risk interactions	Medication list cross-referenced against PGx profile, alerts generated with CPIC evidence levels	P0
PGX-003	As a clinician, I want HLA alleles extracted and screened for drug hypersensitivity risk so I can prevent life-threatening reactions	HLA typing from WGS, screening against 12+ drug-HLA associations, mandatory screening compliance	P0

Epic 2: Clinical Decision Support

ID	Story	Acceptance Criteria	Priority
PGX-004	As a prescriber, I want a traffic-light alert when I query a specific drug so I can quickly assess safety	Green/Yellow/Red classification with clinical recommendation and alternative drugs	P0
PGX-005	As a pharmacist, I want phenoconversion detection when reviewing medication lists so I can identify hidden drug-drug-gene interactions	All CYP inhibitors/inducers in med list flagged, effective phenotype calculated, cascade effects modeled	P1
PGX-006	As a cardiologist, I want genotype-guided warfarin dosing using the IWPC algorithm so I can optimize anticoagulation faster	CYP2C9+VKORC1+CYP4F2 genotypes + demographics → predicted dose with CI	P1

Epic 3: Natural Language PGx Consultation

ID	Story	Acceptance Criteria	Priority
PGX-007	As a clinician with no PGx training, I want to ask questions in plain English and get clear recommendations so I don't need to interpret star alleles myself	Natural language query → grounded RAG response with CPIC citations and evidence levels	P0
PGX-008	As a clinician, I want to ask "What should I use instead?" when a drug is flagged and get ranked alternatives with rationale	Alternative drug list ranked by PGx suitability with explanation of why each is preferred	P1
PGX-009	As an oncologist, I want to check chemotherapy safety (DPYD, TPMT, NUDT15, UGT1A1) before starting treatment so I can prevent fatal toxicity	All relevant chemotherapy PGx genes assessed, dose adjustment calculated, CRITICAL alerts for deficiency	P0

Epic 4: Reporting and Communication

ID	Story	Acceptance Criteria	Priority
PGX-010	As a clinician, I want to generate a PDF PGx report for the medical record so results are documented and accessible to other providers	Multi-page clinical report with gene table, alerts, recommendations, evidence citations	P1
PGX-011	As a patient, I want a PGx Passport card I can carry so I can inform emergency providers about my drug sensitivities	Wallet-card format with critical AVOID drugs, HLA alerts, and metabolizer status for key genes	P1
PGX-012	As a healthcare system, I want FHIR-formatted PGx data so I can integrate results into our EHR	DiagnosticReport + Observation FHIR resources with proper LOINC/SNOMED coding	P2

Epic 5: Population Health and Analytics

ID	Story	Acceptance Criteria	Priority
PGX-013	As a health equity researcher, I want population-specific PGx analytics so I can identify disparities in PGx-relevant prescribing	Allele frequency data by ethnicity, metabolizer phenotype distributions, differential risk analysis	P2
PGX-014	As a pharmacy director, I want aggregate PGx data across our patient population so I can prioritize formulary changes	Population-level metabolizer phenotype distributions for drugs on formulary	P2

Epic 6: Multi-Gene and Complex Scenarios

ID	Story	Acceptance Criteria	Priority
PGX-015	As a psychiatrist prescribing multiple psychotropic medications, I want a comprehensive CYP profile (2D6+2C19+1A2+3A4) so I can avoid multi-drug interactions	All psychiatric-relevant CYP phenotypes assessed, drug-drug-gene interactions modeled, recommendations for the combination	P1
PGX-016	As an internist managing a polypharmacy patient, I want the system to model phenoconversion cascades when adding a new drug so I can anticipate downstream effects	Interactive "what if" modeling: add Drug X → see phenotype changes → see downstream drug level predictions	P1
PGX-017	As a transplant physician, I want CYP3A5-guided tacrolimus dosing so I can reach therapeutic levels faster and reduce rejection risk	CYP3A5 phenotype → starting dose recommendation with expected trough level range	P1
PGX-018	As a pain specialist, I want comprehensive opioid safety assessment (CYP2D6 genetic + phenoconversion) so I can choose the safest analgesic	CYP2D6 genetic phenotype + effective phenotype (after phenoconversion) → opioid recommendation matrix	P0

16.2 Non-Functional Requirements¶

ID	Requirement	Target
NFR-001	VCF processing time (PGx extraction + star allele calling)	<60 seconds
NFR-002	Single drug query response time	<5 seconds
NFR-003	Full medication list review (20 drugs)	<15 seconds
NFR-004	PGx report generation (PDF)	<10 seconds
NFR-005	Concurrent user support	10 simultaneous
NFR-006	PGx profile storage capacity	10,000+ patients
NFR-007	Guideline update propagation	<24 hours after CPIC publication
NFR-008	Patient data remains on-device	100% (HIPAA/GDPR compliance)

17. Data Acquisition Strategy¶

17.1 Primary Data Sources¶

Source	Data Type	Access	Cost
CPIC Guidelines	Gene-drug guidelines, star allele tables	Open access (cpicpgx.org)	Free
PharmGKB	Drug-gene annotations, clinical annotations	Academic license	Free
PharmVar	Star allele definitions, haplotype tables	Open access (pharmvar.org)	Free
DPWG	Dutch PGx therapeutic recommendations	Open access	Free
FDA Table of PGx Biomarkers	Drug label PGx information	Open access (FDA.gov)	Free
ClinVar	PGx variant classifications	Open access (NCBI)	Free
gnomAD	Population allele frequencies	Open access	Free
1000 Genomes	Population reference genotypes	Open access	Free
ClinicalTrials.gov	PGx clinical trials	Open access	Free
PubMed/PMC	PGx literature	Open access	Free

All primary data sources for the Pharmacogenomics Agent are free and open access, making this one of the most data-accessible agents in the HCLS AI Factory.

17.2 Data Ingestion Pipeline¶

CPIC Guidelines: Parse structured guideline PDFs and supplementary data tables → embed in pgx_drug_guidelines
PharmVar: Download allele definition tables → populate pgx_gene_reference
PharmGKB: API access for clinical annotations → populate pgx_drug_interactions
FDA Labels: Parse pharmacogenomic labeling sections → populate pgx_fda_labels
gnomAD: Extract PGx-relevant allele frequencies by population → populate pgx_population_data
PubMed: Systematic search for PGx implementation studies → populate pgx_clinical_evidence

18. Validation and Testing Strategy¶

18.1 Validation Tiers¶

Tier 1: Variant Concordance (Unit) - Gold standard: GeT-RM (Genetic Testing Reference Materials) samples with known PGx genotypes - Test: VCF → star allele caller → compare to GeT-RM consensus genotype - Target: >99% concordance for simple genes, >95% for CYP2D6

Tier 2: Phenotype Accuracy (Integration) - Gold standard: CPIC allele functionality tables - Test: All possible diplotype combinations → phenotype → compare to CPIC assignment - Target: 100% concordance with CPIC standardized terms

Tier 3: Recommendation Accuracy (Clinical) - Gold standard: CPIC guideline recommendations + clinical pharmacist review - Test: Simulated patient cases with known genotypes + medication lists → recommendations - Target: 100% concordance with CPIC Level A guidelines for CRITICAL alerts

Tier 4: Clinical Utility (Outcomes) - Post-implementation: Track ADR rates, time to therapeutic dose, prescriber adoption - Compare PGx-guided vs. standard prescribing outcomes - Measure cost avoidance from prevented ADRs

18.2 Test Case Library¶

50+ synthetic test cases covering: - All 25+ pharmacogenes with multiple diplotype combinations - All CPIC Level A drug-gene pairs - Multi-gene scenarios (warfarin with CYP2C9 + VKORC1 + CYP4F2) - Phenoconversion scenarios - HLA screening edge cases - Population-specific allele frequency considerations - Polypharmacy with ≥10 medications

19. Regulatory Considerations¶

19.1 FDA Classification¶

The Pharmacogenomics Intelligence Agent operates as a clinical decision support (CDS) tool that: - Presents PGx information from validated sources (CPIC, DPWG, FDA labels) - Does NOT make autonomous prescribing decisions - Requires clinician review and judgment for all recommendations - Falls under FDA 21st Century Cures Act CDS exemptions (criteria i-iv) when: - Not intended to replace clinical judgment - Displays underlying evidence for clinician review - Does not generate alarms that trigger automatic action

19.2 Compliance Framework¶

HIPAA: Patient genomic data stays on local DGX Spark. No PHI transmitted to cloud.
CLIA: Star allele calling from research-grade WGS is informational. Clinical-grade results require CLIA-certified laboratory confirmation.
GDPR: On-device processing eliminates cross-border data transfer concerns.
State PGx laws: Some states require genetic counseling before PGx testing. The agent provides educational content but does not replace genetic counseling.

20. DGX Compute Progression¶

20.1 DGX Spark (Current Target)¶

GPU: NVIDIA GPU with 128 GB unified memory
CPU: NVIDIA Grace (72 ARM Neoverse cores)
RAM: 128 GB unified (CPU+GPU shared)
Storage: 4 TB NVMe SSD
Price: $3,999
PGx Agent footprint: ~15 GB (collections + models + reference data)
Concurrent capacity: 10 simultaneous users

20.2 Scaling Path¶

As PGx adoption grows (more patients, more queries): - DGX Spark cluster (2-4 nodes): Support 50+ concurrent users, faster batch VCF processing - DGX Station / Workstation: Support health system deployment with 100+ users - DGX SuperPOD: Population-scale PGx analytics across millions of patients

21. Implementation Roadmap¶

Phase 1: Core PGx Engine (Weeks 1-6)¶

Week	Deliverable
1-2	VCF-to-PGx variant extraction pipeline; star allele caller for 15 simple genes
3-4	CYP2D6 complex allele handling; diplotype-to-phenotype translator
5-6	Milvus collections setup; CPIC guideline ingestion; basic drug-gene matching

Phase 2: Clinical Workflows (Weeks 7-12)¶

Week	Deliverable
7-8	Pre-emptive panel workflow; opioid safety workflow; HLA screening
9-10	Warfarin dosing calculator; antidepressant selection workflow
11-12	Chemotherapy safety (DPYD, TPMT, NUDT15); statin myopathy workflow

Phase 3: Advanced Features (Weeks 13-18)¶

Week	Deliverable
13-14	Phenoconversion detection engine; polypharmacy drug-drug-gene interaction analysis
15-16	Streamlit UI (10 tabs); PDF/FHIR report generation; PGx Passport
17-18	Population analytics; validation suite (50+ test cases); performance optimization

22. Risk Analysis¶

22.1 Critical Risks¶

Risk	Severity	Mitigation
Incorrect star allele calling leads to wrong phenotype and wrong drug recommendation	Critical -- patient harm	Multi-tier validation against GeT-RM; CLIA disclaimer for research-grade WGS; mandatory clinician review
CYP2D6 structural variants missed from short-read WGS	High -- incomplete profile	Clear documentation of limitations; flag when structural variant data unavailable; recommend confirmatory testing
Guideline updates not propagated, stale recommendations served	High -- outdated guidance	Automated CPIC RSS monitoring; version stamping on all recommendations; manual review process for updates
Clinician over-reliance on PGx recommendations without clinical context	High -- inappropriate automation	CDS design emphasizes "decision SUPPORT" not "decision MAKING"; all outputs include "clinician review required"
Phenoconversion modeling errors in complex polypharmacy	Medium -- missed interactions	Conservative alerting (flag potential phenoconversion even with weak inhibitors); clear uncertainty language

22.2 Ethical Considerations¶

Health equity: PGx guidelines are predominantly derived from European-ancestry populations. Allele frequency databases for African, Asian, and Latino populations are less complete. The agent must clearly communicate when evidence is population-limited.
Access equity: The $3,999 DGX Spark + open-source model makes PGx CDS accessible to under-resourced clinics, but genomic sequencing itself remains costly.
Genetic determinism: PGx phenotype is one factor in drug response. Environment, adherence, comorbidities, age, and other medications all contribute. The agent must contextualize genetic findings appropriately.
Incidental findings: WGS-based PGx extraction may reveal disease-risk variants (e.g., BRCA1/2) incidentally. The agent must have a clear policy for managing incidental findings.

23. Competitive Landscape¶

23.1 Detailed Competitive Analysis¶

Feature	Our Agent	OneOme RightMed	Myriad GeneSight	Invitae PGx	CPIC Guidelines
Genes covered	25+	24	12 (psych only)	14	27 (guidelines)
Drug-gene pairs	400+	~300	~60	~150	~80
WGS integration	Yes (VCF pipeline)	No (panel only)	No	No	N/A
Multi-gene modeling	Yes	Limited	No	No	Manual
Phenoconversion	Yes	No	No	No	Mentioned
HLA screening	12+ alleles	3	None	2	6
Natural language query	Yes (RAG + Claude)	No	No	No	No
On-device / HIPAA	Yes (DGX Spark)	Cloud-based	Cloud-based	Cloud-based	N/A
Cost	$3,999 (one-time HW)	Per-test subscription	Per-test ($400+)	Per-test	Free (text)
Open source	Yes (Apache 2.0)	No	No	No	Yes (guidelines)

23.2 Unique Differentiators¶

Genome-first: Only solution that starts from WGS/WES VCF data rather than a targeted panel
Multi-collection RAG: Searches across 14 specialized knowledge bases simultaneously
Phenoconversion modeling: Detects when concurrent medications alter the patient's effective metabolizer status -- no competitor does this
Natural language interface: Clinicians ask questions in English, not star allele codes
Local deployment: Patient genomic data never leaves the clinic
Zero per-test cost: After initial hardware investment, unlimited PGx consultations

24. Discussion¶

24.1 Transformative Impact¶

The Pharmacogenomics Intelligence Agent addresses one of the most actionable gaps in modern medicine: the translation of genomic data into safe prescribing decisions. Unlike many precision medicine applications that remain aspirational, pharmacogenomics has Level A evidence from CPIC for dozens of drug-gene pairs, FDA boxed warnings for genetic testing, and demonstrated cost-effectiveness across multiple healthcare systems. The barriers to adoption are not scientific -- they are practical: complexity, integration, and clinical workflow friction.

By embedding PGx intelligence into a natural language interface backed by multi-collection RAG, the agent eliminates the need for clinicians to understand star allele nomenclature, activity scores, or guideline interpretation. A physician can simply ask: "Is this drug safe for my patient?" and receive a grounded, evidence-cited answer in seconds.

24.2 Integration with Existing Agents¶

The Pharmacogenomics Agent completes a critical circle in the HCLS AI Factory: - The Genomics Pipeline generates the variants - The Biomarker Agent provides initial PGx screening - The Pharmacogenomics Agent provides deep clinical PGx consultation - The Oncology Agent uses PGx for chemotherapy safety - The Autoimmune Agent uses PGx for immunosuppressant dosing - The Cardiology Agent uses PGx for anticoagulant and statin optimization

24.3 The Vision: PGx as Standard of Care¶

The ultimate goal is a future where every patient's genome is processed once and their PGx profile is available for every prescribing decision for life. The Pharmacogenomics Intelligence Agent makes this vision technically feasible on a $3,999 device -- democratizing access to pharmacogenomic intelligence that currently requires multi-million-dollar institutional programs.

With 106,000 ADR deaths per year in the U.S. alone, and 95-99% of patients carrying actionable PGx variants, the potential impact is measured not in efficiency gains but in lives saved.

25. Conclusion¶

This paper has presented the comprehensive architecture, clinical rationale, and product requirements for the Pharmacogenomics Intelligence Agent -- a multi-collection RAG system that transforms raw genomic data into actionable, evidence-grounded prescribing guidance for over 400 drug-gene interactions across 25+ pharmacogenes. The agent addresses the most preventable cause of drug-related death by bridging the gap between genomic knowledge and clinical practice through natural language clinical decision support.

Key architectural innovations include: - VCF-to-prescribing pipeline: Direct integration with the HCLS AI Factory genomics pipeline - 14 specialized Milvus collections covering CPIC/DPWG/FDA guidelines, PharmGKB annotations, HLA hypersensitivity screening, phenoconversion modeling, dosing algorithms, and population-specific data - Multi-gene interaction engine for complex scenarios (warfarin, psychiatric polypharmacy, transplant immunosuppression) - Phenoconversion detector that identifies when concurrent medications alter effective metabolizer status -- a capability no competing product offers - 8 clinical workflows covering the highest-impact prescribing scenarios where PGx testing has CPIC Level A evidence - Privacy-first architecture with all patient genomic data remaining on the local DGX Spark device

The agent represents the seventh intelligence module in the HCLS AI Factory platform, bringing the total agent portfolio to: Precision Biomarker, Precision Oncology, CAR-T Intelligence, Imaging Intelligence, Precision Autoimmune, Cardiology Intelligence, and Pharmacogenomics Intelligence -- a comprehensive precision medicine platform that takes patient care from genome to safe prescription on a single $3,999 device.

26. References¶

Lazarou J, Pomeranz BH, Corey PN. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. JAMA. 1998;279(15):1200-1205. PMID: 9555760
Relling MV, Klein TE. CPIC: Clinical Pharmacogenetics Implementation Consortium of the Pharmacogenomics Research Network. Clin Pharmacol Ther. 2011;89(3):464-467. PMID: 21270786
Caudle KE, et al. Standardizing CYP2D6 Genotype to Phenotype Translation: Consensus Recommendations from the Clinical Pharmacogenetics Implementation Consortium and Dutch Pharmacogenetics Working Group. Clin Transl Sci. 2020;13(1):116-124. PMID: 31647186
Crews KR, et al. Clinical Pharmacogenetics Implementation Consortium Guidelines for Cytochrome P450 2D6 Genotype and Codeine Therapy: 2014 Update. Clin Pharmacol Ther. 2014;95(4):376-382. PMID: 24458010
Scott SA, et al. Clinical Pharmacogenetics Implementation Consortium Guidelines for CYP2C19 Genotype and Clopidogrel Therapy: 2013 Update. Clin Pharmacol Ther. 2013;94(3):317-323. PMID: 23698643
Johnson JA, et al. Clinical Pharmacogenetics Implementation Consortium Guidelines for CYP2C9 and VKORC1 Genotypes and Warfarin Dosing: 2017 Update. Clin Pharmacol Ther. 2017;102(3):397-404. PMID: 28198005
SEARCH Collaborative Group. SLCO1B1 variants and statin-induced myopathy -- a genomewide study. N Engl J Med. 2008;359(8):789-799. PMID: 18650507
Amstutz U, et al. Clinical Pharmacogenetics Implementation Consortium (CPIC) Guideline for Dihydropyrimidine Dehydrogenase Genotype and Fluoropyrimidine Dosing: 2017 Update. Clin Pharmacol Ther. 2018;103(2):210-216. PMID: 29152729
Mallal S, et al. HLA-B*5701 screening for hypersensitivity to abacavir. N Engl J Med. 2008;358(6):568-579. PMID: 18256392
Hershfield MS, et al. Clinical Pharmacogenetics Implementation Consortium Guidelines for Human Leukocyte Antigen-B Genotype and Allopurinol Dosing. Clin Pharmacol Ther. 2013;93(2):153-158. PMID: 23232549
Leckband SG, et al. Clinical Pharmacogenetics Implementation Consortium Guidelines for HLA-B Genotype and Carbamazepine Dosing. Clin Pharmacol Ther. 2013;94(3):324-328. PMID: 23695185
International Warfarin Pharmacogenetics Consortium. Estimation of the warfarin dose with clinical and pharmacogenetic data. N Engl J Med. 2009;360(8):753-764. PMID: 19228618
Birdwell KA, et al. Clinical Pharmacogenetics Implementation Consortium (CPIC) Guidelines for CYP3A5 Genotype and Tacrolimus Dosing. Clin Pharmacol Ther. 2015;98(1):19-24. PMID: 25801146
Relling MV, et al. Clinical Pharmacogenetics Implementation Consortium Guideline for Thiopurine Dosing Based on TPMT and NUDT15 Genotypes: 2018 Update. Clin Pharmacol Ther. 2019;105(5):1095-1105. PMID: 30447069
Hicks JK, et al. Clinical Pharmacogenetics Implementation Consortium Guideline (CPIC) for CYP2D6 and CYP2C19 Genotypes and Dosing of Tricyclic Antidepressants: 2016 Update. Clin Pharmacol Ther. 2017;102(1):37-44. PMID: 27997040
Hicks JK, et al. Clinical Pharmacogenetics Implementation Consortium (CPIC) Guideline for CYP2D6 and CYP2C19 Genotypes and Dosing of Selective Serotonin Reuptake Inhibitors. Clin Pharmacol Ther. 2015;98(2):127-134. PMID: 25974703
PharmGKB. https://www.pharmgkb.org/
PharmVar. https://www.pharmvar.org/
CPIC. https://cpicpgx.org/
Dunnenberger HM, et al. Preemptive clinical pharmacogenetics implementation: current programs in five US medical centers. Annu Rev Pharmacol Toxicol. 2015;55:89-106. PMID: 25292429
Weitzel KW, et al. The IGNITE network: a model for genomic medicine implementation and research. BMC Med Genomics. 2016;9:1. PMID: 26729011
Swen JJ, et al. Pharmacogenetics: from bench to byte -- an update of guidelines. Clin Pharmacol Ther. 2011;89(5):662-673. PMID: 21412232
FDA Table of Pharmacogenomic Biomarkers in Drug Labeling. https://www.fda.gov/drugs/science-and-research-drugs/table-pharmacogenomic-biomarkers-drug-labeling
Bousman CA, et al. Pharmacogenetic tests and depressive symptom remission: a meta-analysis of randomized controlled trials. Pharmacogenomics. 2019;20(1):37-47. PMID: 30520364
Empey PE, et al. Multisite investigation of outcomes with implementation of CYP2C19 genotype-guided antiplatelet therapy after percutaneous coronary intervention. JACC Cardiovasc Interv. 2018;11(2):181-191. PMID: 29348010
Samwald M, et al. Incidence of exposure of patients in the United States to multiple drugs for which pharmacogenomic guidelines are available. PLoS One. 2016;11(10):e0164972. PMID: 27736993