Launch Announcement Drafts
Ready-to-post drafts for LinkedIn, Twitter/X, Hacker News, and Reddit. Edit to your voice before posting.
LinkedIn Post
I've been working on this for 14 years. Today I'm giving it away.
In 2012, I set out to use my high-performance computing background for something that mattered. I started with one conviction: no parent should ever have to lose a child to disease.
That conviction led me to Pediatric Neuroblastoma. I taught myself biology, genomics, molecular pathways, drug discovery -- whatever the work required. I made one commitment early: I would not profit from this.
Today I'm releasing the HCLS AI Factory -- an open-source platform that takes raw patient DNA and produces ranked drug candidates in under 5 hours. On a single desktop workstation.
What it does:
- Takes raw DNA sequencing data (~200 GB) and runs GPU-accelerated variant calling (NVIDIA Parabricks) -- 11.7 million variants in 2-4 hours instead of days
- Annotates variants against ClinVar, AlphaMissense, and a curated knowledge base of 201 genes across 13 therapeutic areas
- Uses RAG-grounded AI reasoning (Milvus + Claude) to identify druggable gene targets
- Generates 100 novel drug candidates via BioNeMo (MolMIM + DiffDock) in under 16 minutes
- Includes three domain-specific intelligence agents (CAR-T, Medical Imaging, Precision Oncology) backed by 1,296 tests
The hardware: an NVIDIA DGX Spark. $3,999. Sits on a desk.
The traditional path from patient DNA to drug candidates takes 6-18 months and costs $50K-500K+ in infrastructure. This does it in an afternoon.
Everything is Apache 2.0. No vendor lock-in. No strings attached.
My hope is simple: a family sitting in a hospital room gets access to the same computational tools that used to require a research institution. A graduate student at any university can run a complete precision medicine pipeline on hardware they can actually afford. A startup can prototype therapeutic hypotheses without a six-figure compute budget.
If you work in genomics, drug discovery, or clinical research -- take it. Use it. Make it better.
GitHub: https://github.com/ajones1923/hcls-ai-factory
Docs: https://hcls-ai-factory.org
#PrecisionMedicine #OpenSource #Genomics #DrugDiscovery #NVIDIA #DGXSpark #AI #Bioinformatics #Apache2
Twitter/X Thread
Tweet 1 (hook):
I spent 14 years building this. Today I'm releasing it for free.
An open-source platform that goes from raw patient DNA to ranked drug candidates in under 5 hours.
On a $3,999 desktop workstation.
Apache 2.0. No strings attached.
Thread:
Tweet 2 (what it does):
The HCLS AI Factory is a 3-stage pipeline:
Stage 1: GPU-accelerated genomics (Parabricks) -- 11.7M variants in 2-4 hrs
Stage 2: RAG-grounded target ID (Milvus + Claude) -- <5 sec per query
Stage 3: AI drug discovery (MolMIM + DiffDock) -- 100 candidates in 16 min
All on one NVIDIA DGX Spark.
Tweet 3 (the numbers):
Key numbers:
- 3.56M searchable vectors in Milvus
- 201 genes across 13 therapeutic areas
- 171 druggable targets (85%)
- 1,296 agent tests in 3.78 sec
- Top drug candidate: +39% improvement over seed compound
Traditional approach: 6-18 months, $50K-500K+
This: <5 hours, $3,999
Tweet 4 (intelligence agents):
It includes 3 domain-specific intelligence agents:
- CAR-T: cell therapy evidence + comparative analysis
- Imaging: NVIDIA NIM workflows (VISTA-3D, MAISI) + FHIR R4
- Precision Oncology: MTB packets, trial matching, therapy ranking
Each connects to a shared genomic evidence base (3.5M vectors).
Tweet 5 (why):
I started this because of Pediatric Neuroblastoma.
My hope: a family in a hospital room gets access to the same tools that used to require a research institution. A grad student can run precision medicine on hardware they can afford.
Take it. Use it. Make it better.
Tweet 6 (links):
GitHub: https://github.com/ajones1923/hcls-ai-factory
Docs: https://hcls-ai-factory.org
Demo video: [link]
Apache 2.0 -- free for commercial and non-commercial use.
Built with: NVIDIA Parabricks, BioNeMo, Milvus, Anthropic Claude, RDKit
Hacker News (Show HN)
Title: Show HN: Open-source platform -- Patient DNA to drug candidates in 5 hours on a $4K workstation
Body:
I've been working on this for 14 years, starting from an HPC background and teaching myself genomics along the way. The HCLS AI Factory is an end-to-end precision medicine platform that processes raw DNA sequencing data into ranked novel drug candidates -- entirely on a single NVIDIA DGX Spark ($3,999).
Three stages:
- GPU genomics (NVIDIA Parabricks): 200GB FASTQ -> 11.7M variant calls in 2-4 hours (vs 1-2 days on CPU)
- RAG-grounded target identification (Milvus + Claude): 3.56M vectors, 201 genes across 13 therapeutic areas, <5 sec per query
- AI drug discovery (BioNeMo MolMIM + DiffDock + RDKit): 100 ranked drug candidates in 8-16 minutes
The platform also includes three domain-specific intelligence agents (CAR-T cell therapy, medical imaging, precision oncology) with a combined 1,296 tests.
Traditional path: 6-18 months and $50K-500K+ in infrastructure. This does it in an afternoon on a desktop.
Everything is Apache 2.0. The demo targets VCP/Frontotemporal Dementia and produces candidates with +39% composite improvement over the clinical seed compound.
GitHub: https://github.com/ajones1923/hcls-ai-factory
Tech stack: NVIDIA Parabricks 4.6, Milvus 2.4, BGE-small-en-v1.5, Anthropic Claude, BioNeMo MolMIM/DiffDock, RDKit, Nextflow, Docker, Streamlit
Happy to answer questions about the architecture, benchmarks, or the biology.
Reddit (r/bioinformatics, r/genomics, r/MachineLearning)
Title: I open-sourced a platform that goes from raw patient DNA to drug candidates in <5 hours on a $4K workstation (Apache 2.0)
Body:
After 14 years of work, I'm releasing the HCLS AI Factory -- an end-to-end precision medicine platform that runs entirely on a single NVIDIA DGX Spark ($3,999).
What it does:
- Stage 1: GPU-accelerated variant calling via Parabricks (11.7M variants from 30x WGS in 2-4 hours)
- Stage 2: RAG pipeline with Milvus (3.56M vectors), ClinVar, AlphaMissense, and Claude for druggable target identification
- Stage 3: Generative drug discovery via BioNeMo MolMIM + DiffDock + RDKit (100 ranked candidates in 8-16 min)
Also includes:
- 3 domain-specific agents (CAR-T, Imaging, Oncology) with 1,296 tests
- 201 genes across 13 therapeutic areas, 85% druggable
- Synthetic demo VCF for quick evaluation
- One-command quickstart: ./quickstart.sh
Why it matters:
The traditional path from patient DNA to a drug lead takes 6-18 months and requires significant infrastructure investment. This brings the entire workflow to a single $3,999 desktop workstation.
Apache 2.0 -- use it for research, commercial work, whatever you need.
GitHub: https://github.com/ajones1923/hcls-ai-factory
Docs: https://hcls-ai-factory.org
Happy to discuss architecture decisions, benchmark methodology, or answer questions.