Open Project · Apache 2.0 · Powered by NVIDIA DGX Spark

From Genomic Foundation to Precision Intelligence to Therapeutic Discovery

11 Intelligence Agents. 3 Engines. One Platform.
Patient DNA to Drug Candidates in Under 5 Hours.

$3,999 on NVIDIA DGX Spark · Open Source — Apache 2.0

See It in Action

From Patient DNA to New Medicines — a 9-minute walkthrough of the complete pipeline

What It Is

The Healthcare & Life Sciences (HCLS) AI Factory unifies three production-grade AI workflows into a single, continuous system — designed to take raw patient DNA and produce viable drug candidates without the fragmentation and delays that define traditional approaches.

The output is a set of computational predictions: promising starting points for laboratory testing, not finished medicines. Real drug development requires years of laboratory and clinical validation. What the HCLS AI Factory does is collapse the first and most expensive step, identifying targets and generating promising candidates, from months of work into hours.

Raw FASTQ files flow through NVIDIA Parabricks for GPU-accelerated alignment and variant calling, with DeepVariant providing clinical-grade accuracy, completing in hours instead of days.
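
As a rough illustration of the genomics stage described above, the sketch below assembles two Parabricks command lines (alignment with `pbrun fq2bam`, then variant calling with `pbrun deepvariant`) without executing them. The file names and the helper `build_genomics_commands` are hypothetical, and only the basic input/output flags are shown; the project's actual invocation may differ.

```python
from pathlib import Path


def build_genomics_commands(ref, fastq_r1, fastq_r2, out_dir):
    """Return the alignment and variant-calling commands as argument lists.

    Nothing is executed here. On a machine with Parabricks installed,
    each list could be passed to subprocess.run().
    """
    out = Path(out_dir)
    bam = out / "sample.bam"  # hypothetical output name
    vcf = out / "sample.vcf"  # hypothetical output name

    # Step 1: align paired-end reads and write a sorted BAM.
    align = [
        "pbrun", "fq2bam",
        "--ref", str(ref),
        "--in-fq", str(fastq_r1), str(fastq_r2),
        "--out-bam", str(bam),
    ]
    # Step 2: call variants from that BAM with DeepVariant.
    call = [
        "pbrun", "deepvariant",
        "--ref", str(ref),
        "--in-bam", str(bam),
        "--out-variants", str(vcf),
    ]
    return align, call


# Hypothetical inputs, for illustration only.
align_cmd, call_cmd = build_genomics_commands(
    "GRCh38.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "results"
)
print(" ".join(align_cmd))
print(" ".join(call_cmd))
```

The BAM written by the first command is the input to the second, which is the handoff the text describes as happening without manual steps.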

Outputs feed directly into an evidence layer where millions of variants can be queried in natural language, grounded in ClinVar, AlphaMissense, structural data, and curated biomedical knowledge.
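
To make the idea of querying variants in natural language more concrete, here is a minimal in-memory sketch of similarity search over annotated variant records. It is a stand-in, not the platform's implementation: the bag-of-words `embed` function replaces a real embedding model, the plain list replaces a vector database, and the records (`GENE_A` and so on) are placeholders rather than real annotations.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, records, top_k=2):
    """Return the top_k records most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(r["text"])), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored[:top_k] if score > 0]


# Placeholder records; a real evidence layer would hold annotated variants.
records = [
    {"id": "var-1", "text": "GENE_A missense variant classified pathogenic"},
    {"id": "var-2", "text": "GENE_B synonymous variant classified benign"},
    {"id": "var-3", "text": "GENE_C intronic variant of uncertain significance"},
]

hits = search("pathogenic missense variant", records)
print([h["id"] for h in hits])
```

In a retrieval-augmented setup, the retrieved records would then be passed to a language model as grounding context for the answer.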

Validated targets then move into generative drug discovery via NVIDIA BioNeMo, where novel molecules are created, docked, scored, and ranked.
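
The final "scored and ranked" step can be pictured with a small sketch like the one below. Everything in it is an assumption made for illustration: the `Candidate` fields, the composite formula, the weight, and the numeric values are invented and are not outputs of MolMIM or DiffDock.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    smiles: str
    docking_score: float  # hypothetical; more negative = stronger predicted binding
    drug_likeness: float  # hypothetical 0..1 score; higher is better


def rank(candidates, likeness_weight=2.0):
    """Sort candidates best-first by a simple composite score (an assumed formula)."""
    def composite(c):
        # Reward strong predicted binding and high drug-likeness.
        return -c.docking_score + likeness_weight * c.drug_likeness
    return sorted(candidates, key=composite, reverse=True)


# Made-up scores attached to simple molecules, for illustration only.
pool = [
    Candidate("CCO", -5.0, 0.40),
    Candidate("c1ccccc1O", -7.5, 0.55),
    Candidate("CC(=O)Nc1ccc(O)cc1", -7.0, 0.90),
]

for c in rank(pool):
    print(c.smiles, c.docking_score, c.drug_likeness)
```

A weighted sum is only one possible choice; a real pipeline might filter on hard thresholds first and rank only the survivors.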

No batch jobs. No manual handoffs. Full lineage from patient DNA to candidate therapeutic.

One workstation. One workflow. Hours, not months.

What This Is Not

The intent is clarity, not control.

Not a clinical product

It is a reference architecture and research platform, not a regulated medical device or diagnostic system.

Not a black-box AI solution

All workflows, data flows, and reasoning layers are inspectable and reproducible.

Not a replacement for expertise

It is designed to augment researchers and clinicians, not automate judgment.

Not a single prescribed workflow

The system is modular by design and intended to be adapted, extended, or specialized.

Not a vendor lock-in strategy

The architecture is deliberately vendor-neutral and infrastructure-agnostic.

Three Engines

End-to-end from raw sequencing data to drug candidates

01

Genomic Foundation Engine

Patient DNA → 3.5M annotated variant vectors in 2 to 4 hours. GPU-accelerated with NVIDIA Parabricks.

120 – 240 min
02

Precision Intelligence Network

11 specialized intelligence agents across every major medical domain. RAG-powered with Milvus + Claude.

Interactive
03

Therapeutic Discovery Engine

Validated targets → novel drug candidates. Generative design with BioNeMo MolMIM + DiffDock.

8 – 16 min

Intelligence Agents

Domain-specific AI agents extending the core platform with cross-modal evidence linking

A1

CAR-T Intelligence

Cross-functional intelligence across the CAR-T cell therapy lifecycle. 11 collections, 6,266+ vectors, comparative analysis, and deep research mode.

Port 8521
A2

Imaging Intelligence

AI-powered medical imaging with NVIDIA NIM microservices: VISTA-3D segmentation, MAISI generation, VILA-M3 analysis, and FHIR R4 export.

Port 8525
A3

Precision Oncology

Clinical decision support for molecular tumor boards. VCF-to-MTB packets, trial matching, therapy ranking, and FHIR R4 diagnostic bundles.

Port 8526

By the Numbers

Real results from a single NVIDIA DGX Spark workstation

11.7M Variants Called
201 Genes · 13 Therapeutic Areas
85% Druggable Targets (171 of 201)
< 5 hrs End-to-End Runtime
$3,999 DGX Spark Hardware
100 Drug Candidates per Target

Choose Your Path

The HCLS AI Factory serves every stakeholder in precision medicine

Traditional vs. HCLS AI Factory

What used to take months now takes hours

Metric | Traditional Approach | HCLS AI Factory
Sequence Alignment | 12 – 24 hours | 2 – 3 hours
Variant Calling | 8 – 12 hours | 1 – 2 hours
Annotation & Interpretation | Days of manual work | Minutes (automated)
Target Identification | Weeks of literature review | Minutes (AI-powered)
Drug Candidate Design | Months of medicinal chemistry | 8 – 16 minutes
Total Time | Weeks to months | < 5 hours
Infrastructure Cost | $100K+ (cluster) | $3,999 (DGX Spark)
Reproducibility | Variable | Deterministic
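
For the two rows that give hour ranges on both sides, the implied speedup can be worked out directly. The short sketch below does that arithmetic using only the figures from the comparison; the function name `speedup_range` is just an illustrative helper.

```python
def speedup_range(traditional_hours, factory_hours):
    """Return (smallest, largest) speedup implied by two (low, high) hour ranges."""
    trad_lo, trad_hi = traditional_hours
    new_lo, new_hi = factory_hours
    # Smallest speedup: fastest traditional run vs. slowest factory run.
    # Largest speedup: slowest traditional run vs. fastest factory run.
    return trad_lo / new_hi, trad_hi / new_lo


stages = {
    "Sequence Alignment": ((12, 24), (2, 3)),
    "Variant Calling": ((8, 12), (1, 2)),
}

for name, (trad, new) in stages.items():
    lo, hi = speedup_range(trad, new)
    print(f"{name}: {lo:.0f}x to {hi:.0f}x faster")
```

Both stages work out to a speedup of roughly 4x to 12x, depending on which ends of the ranges are compared.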

Origin

In 2012, I set out to use my high-performance computing background for something that mattered. I started with one conviction: no parent should ever have to lose a child to disease.

That conviction led me to Pediatric Neuroblastoma. I taught myself biology, genomics, molecular pathways, drug discovery — whatever the work required. I made one commitment early: I would not profit from this. Whatever I built, I would give away freely, so others could build on it and move faster than any one person ever could alone.

Thousands of hours later, this is the result.

— Adam Jones

Why This Is Open

This project is open by design, not as a shortcut or a visibility exercise, but as a deliberate decision about how foundational healthcare infrastructure should be built.

The challenges in precision medicine are no longer primarily scientific — they are architectural. Fragmented pipelines, opaque tooling, and closed systems slow the transition from genomic data to actionable insight.

By openly publishing the HCLS AI Factory, this project provides a reproducible, inspectable reference implementation for end-to-end genomics, AI reasoning, and therapeutic exploration.

Open access to infrastructure knowledge accelerates progress, enables collaboration, and shifts innovation away from re-solving plumbing toward advancing care.

Who This Is For

For people and institutions working at the intersection of healthcare, life sciences, and AI who need systems, not abstractions.

Researchers & Bioinformaticians

Teams building, extending, or validating secondary genomics pipelines and variant interpretation workflows who need reproducible, inspectable infrastructure.

Clinicians & Translational Teams

Exploring how genomic data, AI reasoning, and therapeutic insights can be integrated into real-world decision-making.

Academic Medical Centers

Teaching, researching, or operationalizing genomics and AI at scale, where transparency, reproducibility, and extensibility are critical.

AI & Systems Engineers

Interested in how real biomedical workloads behave when treated as first-class AI systems, including data flow, orchestration, and reasoning layers.

Platform Builders

Teams designing future healthcare platforms who want a concrete reference architecture for AI-native pipelines rather than high-level diagrams.

Anyone, Anywhere

This project is not limited to a single specialty, disease, or institution. It is designed to be a shared foundation that can support many domains.

Getting Started

The HCLS AI Factory documentation site provides everything needed to deploy and run a complete precision medicine pipeline — from patient DNA to drug candidates — on a single NVIDIA DGX Spark. Resources include a quick-start checklist, full deployment guide, live demo walkthrough, and detailed technical documentation for each of the three pipeline stages. The site also offers architecture diagrams, a comprehensive project bible, learning guides for both introductory and professional audiences, and all source code under an Apache 2.0 open-source license.

Architecture

Complete pipeline running on a single NVIDIA DGX Spark

[Figure: HCLS AI Factory Architecture on NVIDIA DGX Spark]

Technology Stack

NVIDIA DGX Spark
Parabricks 4.6
DeepVariant
BWA-MEM2
BioNeMo MolMIM
DiffDock
Milvus 2.4
Anthropic Claude
RDKit
Docker Compose
Nextflow
Grafana
Prometheus
Streamlit
Flask

Take it. Use it. Make it better.

All HCLS AI Factory code is Apache 2.0. Deploy the full pipeline on your DGX Spark in minutes.