Open Project · Apache 2.0 · Powered by NVIDIA DGX Spark

From Genomic Foundation to Precision Intelligence to Therapeutic Discovery

11 Intelligence Agents. 3 Engines. One Platform.
Patient DNA to Drug Candidates in Under 5 Hours.

$3,999 on NVIDIA DGX Spark · Open Source — Apache 2.0

See It in Action

From Patient DNA to New Medicines — a 9-minute walkthrough of the complete pipeline

What It Is

The Healthcare & Life Sciences (HCLS) AI Factory unifies three production-grade AI workflows into a single, continuous system — designed to take raw patient DNA and produce viable drug candidates without the fragmentation and delays that define traditional approaches.

The outputs are computational predictions: promising starting points for laboratory testing, not finished medicines. Real drug development still requires years of laboratory and clinical validation. What the HCLS AI Factory does is collapse the first and most expensive step, identifying targets and generating promising candidates, from months of work into hours.

Raw FASTQ files flow through NVIDIA Parabricks for GPU-accelerated genomics: alignment, then variant calling with DeepVariant at clinical-grade accuracy. The run completes in hours instead of days.
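As a rough illustration of the alignment and variant-calling step, the sketch below builds the two Parabricks command lines without running them. The `pbrun fq2bam` and `pbrun deepvariant` subcommands and the flags shown come from the public Parabricks documentation; the file names and the `parabricks_commands` helper are hypothetical, and the project's actual invocation may differ.

```python
from pathlib import Path


def parabricks_commands(ref: Path, fq1: Path, fq2: Path, out_dir: Path) -> list[list[str]]:
    """Build (but do not run) the two pbrun invocations for one paired-end sample."""
    bam = out_dir / "sample.bam"  # hypothetical output name
    vcf = out_dir / "sample.vcf"  # hypothetical output name
    align = [
        "pbrun", "fq2bam",
        "--ref", str(ref),
        "--in-fq", str(fq1), str(fq2),
        "--out-bam", str(bam),
    ]
    call = [
        "pbrun", "deepvariant",
        "--ref", str(ref),
        "--in-bam", str(bam),  # the aligned BAM from the first step
        "--out-variants", str(vcf),
    ]
    return [align, call]


if __name__ == "__main__":
    # Placeholder paths; substitute your own reference and FASTQ files.
    for cmd in parabricks_commands(
        Path("GRCh38.fa"), Path("R1.fastq.gz"), Path("R2.fastq.gz"), Path("out")
    ):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` on a machine where Parabricks is installed; check the flags against your installed version first.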

Outputs feed directly into an evidence layer where millions of variants can be queried in natural language, grounded in ClinVar, AlphaMissense, structural data, and curated biomedical knowledge.
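The retrieval idea behind that evidence layer can be sketched in a few lines: embed the question, then return the stored variant records whose vectors are closest. This is a toy stand-in in plain Python, not the Milvus API and not the project's code; the record IDs, notes, and 3-dimensional vectors are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], records: list[dict], k: int = 2) -> list[dict]:
    """Return the k records most similar to the query vector."""
    scored = [(cosine(query_vec, r["vector"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:k]]


# Hypothetical records: invented IDs and toy vectors, not real annotations.
records = [
    {"id": "variant-A", "note": "missense, reported pathogenic", "vector": [0.9, 0.1, 0.0]},
    {"id": "variant-B", "note": "synonymous, reported benign", "vector": [0.1, 0.9, 0.0]},
    {"id": "variant-C", "note": "missense, uncertain significance", "vector": [0.7, 0.2, 0.1]},
]

query = [1.0, 0.0, 0.0]  # stands in for an embedded natural-language question
hits = top_k(query, records, k=2)
print([h["id"] for h in hits])  # ['variant-A', 'variant-C']
```

In a retrieval-augmented setup, the notes attached to the returned records would be passed to the language model as grounding context for its answer.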

Validated targets then move into generative drug discovery via NVIDIA BioNeMo, where novel molecules are created, docked, scored, and ranked.
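The final "scored and ranked" step might look something like the sketch below: filter generated molecules on a drug-likeness threshold, then order the rest by docking score. The `Candidate` fields, the threshold, and the assumption that lower docking scores are better are all illustrative choices, not the project's actual scoring scheme, and the numbers are placeholders rather than real results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str
    docking_score: float  # assumed convention: lower (more negative) is better
    drug_likeness: float  # assumed to lie in [0, 1], higher is better


def rank(candidates, top_n=3, min_drug_likeness=0.5):
    """Drop candidates below the drug-likeness threshold, then sort by docking score."""
    kept = [c for c in candidates if c.drug_likeness >= min_drug_likeness]
    return sorted(kept, key=lambda c: (c.docking_score, -c.drug_likeness))[:top_n]


# Placeholder molecules with invented scores, for illustration only.
pool = [
    Candidate("CCO", -6.2, 0.41),
    Candidate("c1ccccc1", -7.8, 0.62),
    Candidate("CC(=O)O", -7.1, 0.55),
    Candidate("CCN", -8.4, 0.30),
]

ranked = rank(pool)
print([c.smiles for c in ranked])  # ['c1ccccc1', 'CC(=O)O']
```

Note that the best-docking molecule in this toy pool is excluded by the drug-likeness filter, which is the point of combining the two criteria.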

No batch jobs. No manual handoffs. Full lineage from patient DNA to candidate therapeutic.
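One simple way to picture "full lineage" is to have every stage output carry a content hash plus a pointer to the record it was derived from, so any candidate can be traced back to its source sample. The sketch below is an assumption about how such tracking could work, not the platform's implementation; the stage names and payloads are hypothetical.

```python
import hashlib
import json
from typing import Optional


def record_stage(stage: str, payload: dict, parent: Optional[str] = None) -> dict:
    """Create a record whose id is a hash of its stage, payload, and parent id."""
    body = {"stage": stage, "parent": parent, "payload": payload}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "id": digest}


def trace(record_id: Optional[str], store: dict) -> list[str]:
    """Walk parent pointers back to the root and return stage names in pipeline order."""
    chain = []
    while record_id is not None:
        rec = store[record_id]
        chain.append(rec["stage"])
        record_id = rec["parent"]
    return list(reversed(chain))


# Hypothetical three-stage run.
genomics = record_stage("genomics", {"sample": "demo"})
evidence = record_stage("evidence", {"target": "GENE_X"}, parent=genomics["id"])
discovery = record_stage("discovery", {"candidates": 100}, parent=evidence["id"])

store = {r["id"]: r for r in (genomics, evidence, discovery)}
print(trace(discovery["id"], store))  # ['genomics', 'evidence', 'discovery']
```

Because each id covers the parent id, changing anything upstream changes every downstream id, which makes silent edits to intermediate results detectable.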

One workstation. One workflow. Hours, not months.

What This Is Not

The intent is clarity, not control.

✘ Not a clinical product

It is a reference architecture and research platform, not a regulated medical device or diagnostic system.

✘ Not a black-box AI solution

All workflows, data flows, and reasoning layers are inspectable and reproducible.

✘ Not a replacement for expertise

It is designed to augment researchers and clinicians, not automate judgment.

✘ Not a single prescribed workflow

The system is modular by design and intended to be adapted, extended, or specialized.

✘ Not a vendor lock-in strategy

The architecture is deliberately vendor-neutral and infrastructure-agnostic.

Three Engines

End-to-end from raw sequencing data to drug candidates

01

Genomic Foundation Engine

Patient DNA → 3.5M annotated variant vectors in 2 to 4 hours. GPU-accelerated with NVIDIA Parabricks.

120 – 240 min
Explore pipeline →

02

Precision Intelligence Network

11 specialized intelligence agents across every major medical domain. RAG-powered with Milvus + Claude.

Interactive
Explore pipeline →

03

Therapeutic Discovery Engine

Validated targets → novel drug candidates. Generative design with BioNeMo MolMIM + DiffDock.

8 – 16 min
Explore pipeline →

Intelligence Agents

Domain-specific AI agents extending the core platform with cross-modal evidence linking

A1

CAR-T Intelligence

Cross-functional intelligence across the CAR-T cell therapy lifecycle. 11 collections, 6,266+ vectors, comparative analysis, and deep research mode.

Port 8521
Explore agent →

A2

Imaging Intelligence

AI-powered medical imaging with NVIDIA NIM microservices: VISTA-3D segmentation, MAISI generation, VILA-M3 analysis, and FHIR R4 export.

Port 8525
Explore agent →

A3

Precision Oncology

Clinical decision support for molecular tumor boards. VCF-to-MTB packets, trial matching, therapy ranking, and FHIR R4 diagnostic bundles.

Port 8526
Explore agent →
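For local scripting against the three agents listed above, a small port registry keeps the endpoints in one place. The port numbers are the ones quoted on this page; the `localhost` host, the `http` scheme, and the `agent_url` helper are assumptions for illustration.

```python
# Ports as listed above; host and scheme are assumptions.
AGENT_PORTS = {
    "CAR-T Intelligence": 8521,
    "Imaging Intelligence": 8525,
    "Precision Oncology": 8526,
}


def agent_url(name: str, host: str = "localhost") -> str:
    """Return the assumed base URL for an agent's service."""
    return f"http://{host}:{AGENT_PORTS[name]}"


# Guard against two agents being assigned the same port.
assert len(set(AGENT_PORTS.values())) == len(AGENT_PORTS)

print(agent_url("Precision Oncology"))  # http://localhost:8526
```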

By the Numbers

Real results from a single NVIDIA DGX Spark workstation

11.7M

Variants Called

201

Genes · 13 Therapeutic Areas

85%

Druggable Targets (171 of 201)

< 5 hrs

End-to-End Runtime

$3,999

DGX Spark Hardware

100

Drug Candidates per Target
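Two of these figures can be cross-checked against the per-engine timings quoted under Three Engines. The short calculation below adds the Genomic Foundation Engine range (120 to 240 min) to the Therapeutic Discovery Engine range (8 to 16 min) and recomputes the druggable share; the interactive Precision Intelligence step has no quoted duration, so it is left out, which is an assumption of this sketch.

```python
GENOMICS_MIN = (120, 240)  # Genomic Foundation Engine, minutes
DISCOVERY_MIN = (8, 16)    # Therapeutic Discovery Engine, minutes

low = GENOMICS_MIN[0] + DISCOVERY_MIN[0]
high = GENOMICS_MIN[1] + DISCOVERY_MIN[1]
print(f"batch stages: {low / 60:.1f} to {high / 60:.1f} hours")  # batch stages: 2.1 to 4.3 hours

druggable = 171 / 201
print(f"druggable share: {druggable:.0%}")  # druggable share: 85%
```

On those numbers the two timed stages fit inside the sub-5-hour end-to-end figure with room left for interactive analysis.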

Choose Your Path

The HCLS AI Factory serves every stakeholder in precision medicine

🩺

Clinician

Find your specialty agent — cardiology, neurology, oncology, and more

🧬

Researcher

Explore GPU-accelerated genomics and single-cell analysis

💊

Pharma

Protocol optimization, patient matching, and trial intelligence

💻

Developer

Docker Compose deployment, API docs, and architecture guides

❤️

Patient Advocate

Learn how open-source precision medicine changes access for everyone

Traditional vs. HCLS AI Factory

What used to take months now takes hours

Metric	Traditional Approach	HCLS AI Factory
Sequence Alignment	12 – 24 hours	2 – 3 hours
Variant Calling	8 – 12 hours	1 – 2 hours
Annotation & Interpretation	Days of manual work	Minutes (automated)
Target Identification	Weeks of literature review	Minutes (AI-powered)
Drug Candidate Design	Months of medicinal chemistry	8 – 16 minutes
Total Time	Weeks to months	< 5 hours
Infrastructure Cost	$100K+ (cluster)	$3,999 (DGX Spark)
Reproducibility	Variable	Deterministic

Origin

In 2012, I set out to use my high-performance computing background for something that mattered. I started with one conviction: no parent should ever have to lose a child to disease.

That conviction led me to Pediatric Neuroblastoma. I taught myself biology, genomics, molecular pathways, drug discovery — whatever the work required. I made one commitment early: I would not profit from this. Whatever I built, I would give away freely, so others could build on it and move faster than any one person ever could alone.

Thousands of hours later, this is the result.

— Adam Jones

Why This Is Open

This project is open by design, not as a shortcut or a visibility exercise, but as a deliberate decision about how foundational healthcare infrastructure should be built.

The challenges in precision medicine are no longer primarily scientific — they are architectural. Fragmented pipelines, opaque tooling, and closed systems slow the transition from genomic data to actionable insight.

By openly publishing the HCLS AI Factory, this project provides a reproducible, inspectable reference implementation for end-to-end genomics, AI reasoning, and therapeutic exploration.

Open access to infrastructure knowledge accelerates progress, enables collaboration, and shifts innovation away from re-solving plumbing toward advancing care.

Who This Is For

For people and institutions working at the intersection of healthcare, life sciences, and AI who need systems, not abstractions.

Researchers & Bioinformaticians

Building, extending, or validating secondary genomics pipelines and variant interpretation workflows, and in need of reproducible, inspectable infrastructure.

Clinicians & Translational Teams

Exploring how genomic data, AI reasoning, and therapeutic insights can be integrated into real-world decision-making.

Academic Medical Centers

Teaching, researching, or operationalizing genomics and AI at scale, where transparency, reproducibility, and extensibility are critical.

AI & Systems Engineers

Interested in how real biomedical workloads behave when treated as first-class AI systems, including data flow, orchestration, and reasoning layers.

Platform Builders

Designing future healthcare platforms and looking for a concrete reference architecture for AI-native pipelines rather than high-level diagrams.

Anyone, Anywhere

This project is not limited to a single specialty, disease, or institution. It is designed to be a shared foundation that can support many domains.

Getting Started

The HCLS AI Factory documentation site provides everything needed to deploy and run a complete precision medicine pipeline — from patient DNA to drug candidates — on a single NVIDIA DGX Spark. Resources include a quick-start checklist, full deployment guide, live demo walkthrough, and detailed technical documentation for each of the three pipeline stages. The site also offers architecture diagrams, a comprehensive project bible, learning guides for both introductory and professional audiences, and all source code under an Apache 2.0 open-source license.

Architecture

Complete pipeline running on a single NVIDIA DGX Spark

From Patient DNA to New Medicine Infographic

Technology Stack

NVIDIA DGX Spark

Parabricks 4.6

DeepVariant

BWA-MEM2

BioNeMo MolMIM

DiffDock

Milvus 2.4

Anthropic Claude

RDKit

Docker Compose

Nextflow

Grafana

Prometheus

Streamlit

Flask

Take it. Use it. Make it better.

All HCLS AI Factory code is Apache 2.0. Deploy the full pipeline on your DGX Spark in minutes.

Deployment Guide · Read the White Paper