Architecture Guide

The HCLS AI Factory runs three GPU-accelerated AI pipelines on a single NVIDIA DGX Spark workstation — from raw FASTQ sequencing data to ranked drug candidates in under 5 hours.

Processing Stages

Stage 0 — Data Acquisition (one-time): Before the pipeline can run, all required data must be downloaded (~500 GB total): HG002 FASTQ sequencing files (~200 GB), the GRCh38 reference genome with BWA-MEM2 index (~11 GB), ClinVar clinical annotations (4.1M variants), and AlphaMissense pathogenicity predictions (71M variants). The setup-data.sh script automates the process with checksum verification, parallel downloads, and idempotent resumption, so an interrupted run can be restarted without re-fetching files that have already been verified.
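The checksum-and-resume behavior described for Stage 0 can be illustrated with a minimal Python sketch. It is not the contents of setup-data.sh. The function names, the manifest layout, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large FASTQ files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path: Path, expected_sha256: str) -> bool:
    """Return True when the file is missing or its checksum does not match."""
    if not path.exists():
        return True
    return sha256_of(path) != expected_sha256


def pending_files(manifest: dict[str, str], data_dir: Path) -> list[str]:
    """List manifest entries that still need fetching; verified files are skipped.

    `manifest` maps a file name to its expected SHA-256 hex digest
    (a hypothetical layout, not the project's actual manifest format).
    """
    return [
        name
        for name, checksum in manifest.items()
        if needs_download(data_dir / name, checksum)
    ]
```

Because `pending_files` only returns entries that are missing or corrupt, calling it again after a partial run gives the remaining work, which is what makes the step safe to repeat.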

Stage 1 — GPU Genomics (120–240 min): Raw FASTQ sequencing files are aligned to the human reference genome using BWA-MEM2, then variant-called with Google DeepVariant — both accelerated through NVIDIA Parabricks. The output is a clinical-grade VCF containing 11.7 million variants at >99% accuracy.
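As a rough sketch of how the two Stage 1 steps chain together, the helpers below build command lines for the Parabricks `pbrun fq2bam` (alignment) and `pbrun deepvariant` (variant calling) tools. The flags follow the publicly documented Parabricks CLI as best understood here and should be checked against the installed version. The helper names, the file paths, and the assumption of paired-end input are illustrative and not taken from the project.

```python
from pathlib import Path


def fq2bam_command(ref: Path, fastq_r1: Path, fastq_r2: Path, out_bam: Path) -> list[str]:
    """Build the alignment command: paired FASTQ files in, sorted BAM out."""
    return [
        "pbrun", "fq2bam",
        "--ref", str(ref),
        "--in-fq", str(fastq_r1), str(fastq_r2),
        "--out-bam", str(out_bam),
    ]


def deepvariant_command(ref: Path, in_bam: Path, out_vcf: Path) -> list[str]:
    """Build the variant-calling command: BAM in, VCF out."""
    return [
        "pbrun", "deepvariant",
        "--ref", str(ref),
        "--in-bam", str(in_bam),
        "--out-variants", str(out_vcf),
    ]


def stage1_commands(ref: Path, fastq_r1: Path, fastq_r2: Path, work_dir: Path) -> list[list[str]]:
    """Return both commands in run order; the BAM from step 1 feeds step 2."""
    bam = work_dir / "sample.bam"   # hypothetical intermediate file name
    vcf = work_dir / "sample.vcf"   # hypothetical output file name
    return [
        fq2bam_command(ref, fastq_r1, fastq_r2, bam),
        deepvariant_command(ref, bam, vcf),
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` so that a failed alignment stops the run before variant calling starts.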

Stage 2 — Evidence RAG (interactive): Variants are annotated against ClinVar (4.1M clinical records) and AlphaMissense (71M pathogenicity predictions), embedded with BGE-small-en-v1.5, and indexed into a Milvus vector database (3.56M vectors). A conversational RAG interface powered by Anthropic Claude lets researchers query variants in natural language, with every answer grounded in retrieved clinical evidence.
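The retrieve-then-ground pattern behind Stage 2 can be sketched in a few lines of Python. This is a brute-force stand-in, not the project's code: in the real pipeline the vectors would come from BGE-small-en-v1.5 and the search would be served by Milvus. The function names, the prompt wording, and the record layout are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], records: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k records most similar to the query vector."""
    ranked = sorted(records, key=lambda rec: cosine(query_vec, rec[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, evidence: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from numbered evidence."""
    numbered = [f"[{i}] {text}" for i, text in enumerate(evidence, start=1)]
    return (
        "Answer using only the evidence below and cite entries by number.\n"
        + "\n".join(numbered)
        + f"\nQuestion: {question}"
    )
```

Numbering the retrieved records in the prompt is one simple way to let an answer point back to the evidence it relied on.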

Stage 3 — Drug Discovery (8–16 min): For validated targets, NVIDIA BioNeMo MolMIM generates novel molecular candidates from a seed compound, DiffDock predicts protein-ligand binding poses, and RDKit scores drug-likeness (QED, Lipinski, synthetic accessibility). The result: 100 ranked candidates per target with full structural and pharmacological profiles.
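To make the drug-likeness step concrete, the sketch below applies Lipinski's rule of five to precomputed descriptors and ranks the survivors by QED. It is an illustration only: the project's actual scoring formulas and thresholds are documented in the Project Bible and may differ. The `Candidate` class, the one-violation tolerance, and the QED-only sort key are assumptions. In practice the descriptor values would be computed with RDKit rather than supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical identifier for a generated molecule
    mol_weight: float  # molecular weight in daltons
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors
    qed: float         # quantitative estimate of drug-likeness, 0 to 1


def lipinski_violations(c: Candidate) -> int:
    """Count how many of the four rule-of-five limits the candidate exceeds."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def rank_candidates(candidates: list[Candidate], top_n: int = 100) -> list[Candidate]:
    """Keep candidates with at most one violation, then sort by QED, best first."""
    passing = [c for c in candidates if lipinski_violations(c) <= 1]
    return sorted(passing, key=lambda c: c.qed, reverse=True)[:top_n]
```

The default `top_n=100` mirrors the 100 ranked candidates per target mentioned above. A fuller version would also fold in the docking result and synthetic accessibility.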

All three stages run on the DGX Spark's GB10 Grace Blackwell Superchip with 128 GB unified memory — connected via NVLink-C2C so the GPU and CPU share the same memory pool without transfer bottlenecks.


Architectural Infographic

[Figure: HCLS AI Factory on NVIDIA DGX Spark, architectural infographic]


Architectural Infographic (Alt View)

[Figure: HCLS AI Factory on NVIDIA DGX Spark, architectural infographic (alternate view)]


From Patient DNA to New Medicine Infographic

[Figure: From Patient DNA to New Medicine infographic]


Pipeline Logical Diagram

[Figure: HCLS AI Factory on NVIDIA DGX Spark, pipeline logical diagram]


draw.io Diagrams

High Level

[Figure: HCLS AI Factory, draw.io high-level diagram]

Medium Level

[Figure: HCLS AI Factory, draw.io medium-level diagram]


Pipeline Mindmap

View Pipeline Mindmap (PDF)


Hardware Platform

Component   Specification
---------   ----------------------------------
System      NVIDIA DGX Spark
GPU         GB10 Grace Blackwell Superchip
Memory      128 GB unified LPDDR5x
CPU         ARM64 cores (Grace)
Storage     NVMe SSD
Price       $3,999

Deep Dives

  • Project Bible — Complete technical reference with scoring formulas, thresholds, and configurations
  • White Paper — Architecture overview and design rationale