HCLS AI Factory: Executive Bullets
One-page reference for executives, stakeholders, and demo audiences.
License: Apache 2.0 | Author: Adam Jones | Date: February 2026
What It Is
The HCLS AI Factory transforms patient DNA into ranked novel drug candidates in under 5 hours on a single NVIDIA DGX Spark ($3,999). Three GPU-accelerated stages — genomics, RAG-grounded target identification, and AI-driven drug discovery — run end-to-end with no manual intervention.
The Problem
- CPU-based genomics pipelines take 12-36 hours for a single 30× WGS sample
- Variant annotation is fragmented across disconnected databases and manual curation
- Getting from an identified variant to a lead compound takes months of manual work
- No integrated platform connects genomics, clinical reasoning, and drug discovery
The Solution: Three Stages
Stage 1: GPU-Accelerated Genomics (120-240 min)
- NVIDIA Parabricks 4.6 — 10-20× faster than CPU
- BWA-MEM2 alignment: 20-45 min (vs. 12-24 hours on CPU)
- Google DeepVariant: 10-35 min, >99% accuracy
- Input: ~200 GB FASTQ (30× WGS, HG002)
- Output: VCF with ~11.7 million variants
Stage 2: RAG-Grounded Target Identification (Interactive)
- 3 annotation sources: ClinVar (4.1M variant records), AlphaMissense (71M missense predictions), Ensembl VEP
- 3.5 million high-quality variant embeddings in Milvus vector database
- Anthropic Claude (RAG-grounded reasoning) identifies druggable gene targets
- 201 genes across 13 therapeutic areas, 171 druggable targets (85%)
- Output: Target gene with full evidence chain
Stage 3: AI-Driven Drug Discovery (8-16 min)
- BioNeMo MolMIM — generative chemistry (novel molecule design)
- BioNeMo DiffDock — molecular docking (binding affinity prediction)
- RDKit — drug-likeness scoring (Lipinski, QED, TPSA)
- Composite scoring: 30% generation + 40% docking + 30% QED
- Output: 100 ranked novel drug candidates + PDF report
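
The composite weighting above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the `Candidate` fields, the assumption that the generation score is already in [0, 1], and the docking bounds used for normalization are all hypothetical. Only the 30/40/30 weights come from this document.

```python
from dataclasses import dataclass

# Weights from the bullet above: 30% generation, 40% docking, 30% QED.
WEIGHTS = {"generation": 0.30, "docking": 0.40, "qed": 0.30}

# Assumption: docking energies are clipped and scaled to [0, 1] between
# these hypothetical bounds (kcal/mol). The real normalization is not stated.
DOCK_WORST = 0.0
DOCK_BEST = -12.0


@dataclass
class Candidate:
    name: str
    generation: float    # assumed to be in [0, 1] already
    docking_kcal: float  # more negative means stronger predicted binding
    qed: float           # QED is defined on [0, 1]


def normalize_docking(kcal: float) -> float:
    """Map a docking energy onto [0, 1], where 1 is the best bound."""
    frac = (kcal - DOCK_WORST) / (DOCK_BEST - DOCK_WORST)
    return min(1.0, max(0.0, frac))


def composite(c: Candidate) -> float:
    """Weighted sum of the three component scores."""
    return (
        WEIGHTS["generation"] * c.generation
        + WEIGHTS["docking"] * normalize_docking(c.docking_kcal)
        + WEIGHTS["qed"] * c.qed
    )


def rank(candidates):
    """Return candidates sorted from highest to lowest composite score."""
    return sorted(candidates, key=composite, reverse=True)


if __name__ == "__main__":
    # Illustrative inputs only; these are not pipeline results.
    pool = [
        Candidate("mol_a", generation=0.5, docking_kcal=-6.0, qed=0.5),
        Candidate("mol_b", generation=0.9, docking_kcal=-10.8, qed=0.8),
    ]
    for c in rank(pool):
        print(f"{c.name}: {composite(c):.3f}")
```

Because docking carries the largest weight, the choice of docking normalization bounds has the most influence on the final ordering.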
Key Numbers
| Metric | Value |
|---|---|
| Total Pipeline Time | < 5 hours |
| Input Data | ~200 GB FASTQ (30× WGS) |
| Variants Called | ~11.7 million |
| High-Quality Variants | ~3.5 million |
| Genes in Knowledge Base | 201 (13 therapeutic areas) |
| Druggable Targets | 171 (85%) |
| Drug Candidates Generated | 100 (ranked by composite score) |
| Hardware Cost | $3,999 (DGX Spark) |
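
As a quick check on the "< 5 hours" figure, the stage durations quoted in the stage headings can be summed. The sketch below assumes the interactive Stage 2 is excluded from the timed budget, which this document does not state explicitly; the variable names are illustrative.

```python
# Stage durations in minutes (min, max), from the stage headings in this document.
# Assumption: Stage 2 is interactive and not counted toward the timed budget.
STAGE_MINUTES = {
    "genomics": (120, 240),
    "drug_discovery": (8, 16),
}
BUDGET_MINUTES = 5 * 60


def runtime_range(stages):
    """Sum the best-case and worst-case durations across stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = runtime_range(STAGE_MINUTES)
headroom = BUDGET_MINUTES - high
print(f"Timed stages: {low}-{high} min; worst-case headroom: {headroom} min")
```

Under these assumptions the timed stages take 128-256 minutes, leaving 44 minutes of worst-case headroom for interactive target identification and reporting.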
VCP/FTD Demo Highlights
- Target: VCP gene — Frontotemporal Dementia, ALS, IBMPFD
- Variant: rs188935092 — ClinVar Pathogenic, AlphaMissense 0.87
- Seed: CB-5083 (Phase I clinical VCP inhibitor)
- Result: Top candidate shows +39% composite improvement over seed
- Docking: -11.4 kcal/mol (vs. -8.1 for CB-5083)
- QED: 0.81 (vs. 0.62 for CB-5083)
- All top 10 pass Lipinski's Rule of Five
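
The per-metric gains behind the demo figures can be recomputed from the docking and QED values listed above. This is a small arithmetic sketch; `relative_gain` is a hypothetical helper that compares magnitudes, which suits docking energies because both values are negative and a more negative energy is better.

```python
def relative_gain(candidate: float, seed: float) -> float:
    """Fractional improvement of the candidate over the seed, by magnitude."""
    return (abs(candidate) - abs(seed)) / abs(seed)


# Values from the demo highlights above.
DOCKING_KCAL = {"candidate": -11.4, "seed": -8.1}
QED = {"candidate": 0.81, "seed": 0.62}

docking_gain = relative_gain(DOCKING_KCAL["candidate"], DOCKING_KCAL["seed"])
qed_gain = relative_gain(QED["candidate"], QED["seed"])

print(f"Docking: {docking_gain:.1%} stronger predicted binding than CB-5083")
print(f"QED: {qed_gain:.1%} higher than CB-5083")
```

This gives roughly 40.7% for docking and 30.6% for QED. The +39% composite figure cannot be reproduced from these two numbers alone, since the composite also includes a generation score that is not listed here.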
Technology Stack
| Layer | Technology |
|---|---|
| Hardware | NVIDIA DGX Spark (GB10 GPU, 128 GB unified memory, $3,999) |
| Genomics | NVIDIA Parabricks 4.6, DeepVariant (>99% accuracy) |
| Annotation | ClinVar, AlphaMissense, Ensembl VEP |
| Vector DB | Milvus 2.4, BGE-small-en-v1.5, IVF_FLAT |
| LLM | Anthropic Claude (RAG-grounded reasoning) |
| Drug Discovery | BioNeMo MolMIM, BioNeMo DiffDock, RDKit |
| Orchestration | Nextflow DSL2 (5 modes: full, target, drug, demo, genomics_only) |
| Monitoring | Grafana, Prometheus, DCGM Exporter |
| License | Apache 2.0 (fully open) |
Deployment Roadmap
| Phase | Hardware | Scale | Cost |
|---|---|---|---|
| 1 — Proof Build | DGX Spark | 1 patient, Docker Compose | $3,999 |
| 2: Departmental | DGX B200 | Multiple concurrent patients, Kubernetes | $500K-$1M |
| 3: Enterprise | DGX SuperPOD | Thousands of concurrent patients, FLARE federated | $7M-$60M+ |
Cross-Modal Integration
- Imaging → Genomics: Lung-RADS 4B+ triggers tumor gene profiling
- Genomics → Drug Discovery: Pathogenic variants trigger molecule generation
- Drug Discovery → Imaging: Combined genomic + imaging clinical reports
- NVIDIA FLARE: Federated learning across institutions (data stays local)
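
The first two trigger rules above can be pictured as simple predicates. The sketch below is illustrative only: the function names are hypothetical, "4B+" is assumed to mean Lung-RADS categories 4B and 4X, and only an exact ClinVar "Pathogenic" label is treated as a trigger. The platform's real routing logic is not described in this document.

```python
# Lung-RADS assessment categories in increasing order of suspicion.
# Assumption: "4B+" means 4B or 4X; category 0 (incomplete) is not handled.
LUNG_RADS_ORDER = ["1", "2", "3", "4A", "4B", "4X"]


def triggers_gene_profiling(category: str) -> bool:
    """Imaging to genomics: True for Lung-RADS 4B or higher."""
    rank = LUNG_RADS_ORDER.index(category.strip().upper())
    return rank >= LUNG_RADS_ORDER.index("4B")


def triggers_molecule_generation(clinvar_significance: str) -> bool:
    """Genomics to drug discovery: True for a ClinVar 'Pathogenic' label.

    Assumption: labels such as 'Likely pathogenic' do not trigger.
    """
    return clinvar_significance.strip().lower() == "pathogenic"


if __name__ == "__main__":
    print(triggers_gene_profiling("4B"))
    print(triggers_molecule_generation("Pathogenic"))
```

Keeping each handoff as a small predicate makes the triggering criteria easy to audit and to tighten or relax per institution.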
Competitive Differentiation
- Only platform running genomics-to-drug-candidates on a single desktop GPU
- End-to-end: No manual handoffs between stages
- < 5 hours total pipeline time (vs. weeks to months for traditional workflows)
- $3,999 proof build cost (vs. $100K+ for equivalent CPU infrastructure)
- Open-source: Apache 2.0, reproducible, auditable
- Scalable: Same Nextflow pipelines scale from DGX Spark to SuperPOD