HCLS AI Factory -- 5-Minute Quickstart Video Script
Shot-by-shot storyboard for a screen-recording demo of the HCLS AI Factory: an end-to-end precision medicine platform that turns patient DNA into drug candidates in under 5 hours on a single NVIDIA DGX Spark ($3,999). Open source under the Apache 2.0 license. Created by Adam Jones (14+ years in genomic research).
Overview
| Field | Detail |
|---|---|
| Duration | 5:00 (five minutes) |
| Format | Screen recording with voice-over narration; picture-in-picture camera optional |
| Audience | Bioinformaticians, computational biologists, translational researchers, AI-in-healthcare engineers |
| Tools needed | OBS Studio or ScreenFlow, terminal (font size 18+), browser (Chrome/Firefox, 1920x1080), microphone |
| Recording resolution | 1920x1080 @ 30 fps, exported as MP4 (H.264) |
| Tone | Conversational but technical -- assume the viewer understands VCFs, variant calling, and molecular docking |
Pre-Recording Checklist
- All services running (`./demo.sh --status` shows green across the board)
- Browser tabs pre-loaded: Landing (:8080), Genomics (:5000), Chat (:8501), Drug Discovery (:8505), CAR-T (:8521), Imaging (:8525), Oncology (:8526)
- Terminal font size 18+ with dark background for readability
- `.env.example` open in an editor tab (VS Code or nano) for the Setup section
- Chat UI history cleared so the VCP query is typed fresh on camera
- Drug Discovery UI reset to show a clean run for VCP/CB-5083
- Close all notifications, Slack, email -- clean desktop
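A quick way to confirm that the pre-loaded tabs will actually resolve is to probe each port before recording. The sketch below is a minimal illustration, not part of the repository: it only tests TCP reachability on localhost for the ports listed in the checklist, and it assumes every service binds to 127.0.0.1. It does not replace `./demo.sh --status`, which is the project's own health check.

```python
import socket

# Ports copied from the browser-tab list in the checklist above.
SERVICES = {
    "Landing": 8080,
    "Genomics": 5000,
    "Chat": 8501,
    "Drug Discovery": 8505,
    "CAR-T": 8521,
    "Imaging": 8525,
    "Oncology": 8526,
}


def port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES, probe=port_open):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: probe(port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        status = "OK  " if ok else "DOWN"
        print(f"{status} {name} (:{SERVICES[name]})")
```

An open port only shows that something is listening; a service can accept connections and still be unhealthy, so treat this as a first-pass check before opening the tabs.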
Section 1: Opening (0:00 -- 0:30)
| Timecode | Shot | Screen Content | Narration | Notes |
|---|---|---|---|---|
| 0:00 | 1 | Terminal, clean prompt. | "This is the HCLS AI Factory -- an open-source platform that takes patient DNA and produces ranked drug candidates in under five hours, on a single NVIDIA DGX Spark that costs three thousand nine hundred ninety-nine dollars." | Speak steadily. Let the price point land. |
| 0:07 | 2 | Type: `git clone https://github.com/ajones1923/hcls-ai-factory.git` -- show clone output scrolling. | "Everything is Apache 2.0. One git clone and you have the entire platform -- genomics, evidence reasoning, drug discovery, and three intelligence agents." | Let the clone finish or cut to completed output. |
| 0:18 | 3 | Type: `cd hcls-ai-factory && ls -la` -- show top-level directory listing with pipeline folders visible. | "Three pipeline stages, a Nextflow orchestrator, a landing page, monitoring, and full test suites. Let me show you how to get it running." | Hold on directory listing for 3 seconds so viewers can read folder names. |
B-roll suggestion: Quick cut to a photo of the DGX Spark hardware sitting on a desk, overlaid with the text "GB10 GPU / 128 GB unified LPDDR5x / 20 ARM cores / $3,999". Hold for 3 seconds.
Transition: Direct cut to editor/terminal.
Section 2: Setup (0:30 -- 1:00)
| Timecode | Shot | Screen Content | Narration | Notes |
|---|---|---|---|---|
| 0:30 | 4 | Show `.env.example` in editor. Highlight the `NGC_API_KEY`, `ANTHROPIC_API_KEY`, and `NVIDIA_API_KEY` lines. | "Configuration is a single .env file. You need three API keys: NGC for Parabricks, Anthropic for Claude, and NVIDIA for the BioNeMo NIM endpoints -- MolMIM and DiffDock." | Zoom in on the key lines. Do not show real key values. |
| 0:40 | 5 | Terminal: type `cp .env.example .env` then `nano .env` (briefly show editing, then exit). | "Copy the example, drop in your keys, and you are configured. Everything else has sensible defaults -- Milvus on 19530, cloud NIM mode for ARM64 compatibility, Claude as the LLM provider." | Keep the nano view brief -- 3 seconds max. |
| 0:50 | 6 | Terminal: type `./setup-data.sh --status`. Show the status dashboard output with download progress or completion checkmarks. | "The setup script handles all data acquisition -- reference genomes, ClinVar, AlphaMissense, PDB structures. Run it with --status to see where you stand." | If data is already downloaded, the dashboard will show green checkmarks. That is ideal. |
B-roll suggestion: None needed. Fast-paced terminal work keeps attention.
Transition: Direct cut to ./demo.sh execution.
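Before recording Shot 5, it can help to confirm that the three keys named in Shot 4 are actually set in `.env`. The following is a hedged sketch rather than a project utility: it assumes the file uses plain `KEY=VALUE` lines (no `export` prefix), and the helper names `parse_env` and `missing_keys` are hypothetical. It reports key names only and never prints values, in keeping with the note about not showing real keys on camera.

```python
from pathlib import Path

# Key names as highlighted in Shot 4.
REQUIRED_KEYS = ("NGC_API_KEY", "ANTHROPIC_API_KEY", "NVIDIA_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in order."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found; run: cp .env.example .env")
    else:
        missing = missing_keys(env_path.read_text())
        print("Missing keys:", ", ".join(missing) if missing else "none")
```

The check cannot tell a real key from a placeholder left over from `.env.example`, so a "none" result only means each key has some non-empty value.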
Section 3: Launch (1:00 -- 1:30)
| Timecode | Shot | Screen Content | Narration | Notes |
|---|---|---|---|---|
| 1:00 | 7 | Terminal: type `./demo.sh`. Show the NVIDIA-green ASCII banner printing, prerequisites check (Docker, Python, Ollama), and service startup sequence. | "One command launches everything. The demo script checks prerequisites, starts Milvus, spins up each pipeline UI, launches the intelligence agents, and opens the landing page." | Let the banner print fully -- it is visually distinctive. |
| 1:12 | 8 | Terminal continues: show services coming online -- "Waiting for Milvus... READY", "Waiting for RAG Streamlit... READY", etc. Green checkmarks appearing one by one. | "Each service gets a health check. When you see green across the board, you are ready to go. Full cold start takes about two to three minutes." | Hold on the checkmarks. Viewers will want to see every service come up. |
| 1:20 | 9 | Browser: Landing page at `localhost:8080`. Show the health grid with all services green, the three-stage pipeline diagram, and service links. | "The landing page gives you the full picture -- every service, its port, its health status. This is your control panel. All green. Let us walk through each stage." | Mouse over a few service tiles to show they are clickable. |
B-roll suggestion: Optional lower-third overlay listing all ports: "Landing=8080 / Genomics=5000 / RAG API=5001 / Chat=8501 / Drug Discovery=8505 / Portal=8510 / CAR-T=8521 / Imaging=8525 / Oncology=8526".
Transition: Click the Genomics tile on the landing page, or direct-cut to :5000.
Section 4: Stage 1 -- GPU Genomics (1:30 -- 2:00)
| Timecode | Shot | Screen Content | Narration | Notes |
|---|---|---|---|---|
| 1:30 | 10 | Browser: Genomics Portal at `localhost:5000`. Show the HG002 sample info panel -- sample ID, input FASTQ size (~200 GB), reference genome. | "Stage one is GPU-accelerated genomics. We are using NVIDIA Parabricks 4.6 with BWA-MEM2 for alignment and DeepVariant for variant calling. The demo sample is HG002 from the Genome in a Bottle consortium -- about 200 gigabytes of paired-end whole-genome sequencing data." | Emphasize "GPU-accelerated" -- this is the core differentiator over CPU pipelines. |
| 1:40 | 11 | Show the pipeline output section: variant count (11.7M), runtime (120-240 min on DGX Spark), accuracy (>99% concordance). | "On the DGX Spark, alignment through variant calling completes in two to four hours. The output is 11.7 million variant calls at greater than 99 percent concordance. On a CPU cluster, this same work takes 24 to 48 hours." | Pause briefly after "24 to 48 hours" to let the comparison land. |
| 1:50 | 12 | Scroll to show VCF output summary, or show a terminal snippet of a VCF header with the DeepVariant version tag. | "The VCF flows directly into Stage 2 -- no manual handoff, no file conversion, no waiting. This is the point where most traditional workflows stop and hand off to a separate bioinformatics team." | This sets up the seamless pipeline narrative. |
B-roll suggestion: Side-by-side comparison graphic: "CPU Cluster: 24-48 hrs, $50K-500K+" vs. "DGX Spark: 2-4 hrs, $3,999". Hold for 4 seconds.
Transition: Click through to Chat UI or direct-cut to :8501.
Section 5: Stage 2 -- RAG/Chat Evidence Engine (2:00 -- 3:00)
| Timecode | Shot | Screen Content | Narration | Notes |
|---|---|---|---|---|
| 2:00 | 13 | Browser: Chat UI at `localhost:8501`. Clean interface, empty chat history, input box visible. | "Stage two is where we turn variants into actionable targets. This is a retrieval-augmented generation system backed by Milvus with 3.56 million searchable vectors -- ClinVar variants, AlphaMissense pathogenicity predictions, and a curated knowledge base covering 201 genes across 13 therapeutic areas." | Speak at a measured pace. These numbers matter to the audience. |
| 2:15 | 14 | Type into the chat box: "What is known about VCP mutations in frontotemporal dementia? What variants are pathogenic and what makes VCP a druggable target?" Press enter. | "Let me query for VCP -- Valosin-Containing Protein -- in the context of frontotemporal dementia. This is a real research question with a known druggable target." | Type at a readable speed. Viewers will want to see the query. |
| 2:25 | 15 | Show Claude's response streaming in. The response should include: ClinVar variant matches with clinical significance, AlphaMissense pathogenicity scores, structural and functional context for VCP, and a druggability assessment. | "Claude synthesizes evidence from all three collections in under five seconds. You can see ClinVar hits with clinical significance ratings, AlphaMissense pathogenicity scores for specific missense variants, and a druggability assessment grounded in the structural data." | Let the response stream for several seconds. Do not rush past it. The quality of the synthesis is a key differentiator. |
| 2:42 | 16 | Scroll through the response to show specific variant details -- rsIDs, amino acid changes, significance classifications. Highlight any Milvus hit counts if displayed in the UI. | "Every claim is backed by vector-retrieved evidence. This is not a hallucinating chatbot -- it is a grounded reasoning engine. 85 percent of the 201 genes in the knowledge base have confirmed druggable targets, and VCP is one of them." | Point the cursor at specific evidence citations as you speak. |
| 2:55 | 17 | Optionally show a second query or show the sidebar with collection statistics (vector counts, embedding model info). | "You can drill deeper -- ask about specific variants, compare across diseases, explore structural implications. The system handles it conversationally, but every answer traces back to indexed evidence." | Keep this brief. The point is to show depth without burning time. |
B-roll suggestion: Lower-third overlay: "3.56M vectors / ClinVar ~2.7M / AlphaMissense 71M / 201 genes / 13 therapeutic areas / <5 sec query latency".
Transition: Narrate the handoff: "VCP is our target. Now let us generate drug candidates." Direct-cut to :8505.
Section 6: Stage 3 -- Drug Discovery (3:00 -- 4:00)
| Timecode | Shot | Screen Content | Narration | Notes |
|---|---|---|---|---|
| 3:00 | 18 | Browser: Drug Discovery UI at `localhost:8505`. Show the target input panel with VCP selected, seed compound CB-5083 entered, PDB structures listed (5FTK, 8OOI, 9DIL, 7K56). | "Stage three takes the validated target and generates novel drug candidates. We are targeting VCP with CB-5083 as the seed compound -- a known VCP inhibitor. The system pulls crystal structures from RCSB PDB automatically." | Point to each PDB ID as you mention it. |
| 3:12 | 19 | Show the molecule generation step: MolMIM producing SMILES strings, generation progress indicator. | "MolMIM, one of NVIDIA's BioNeMo NIMs, generates structurally novel molecules seeded from CB-5083. It runs in cloud mode against the NVIDIA health API -- no local GPU container needed, which is critical for ARM64 compatibility on the DGX Spark." | Mention cloud NIM explicitly -- this is an architecture decision viewers will care about. |
| 3:25 | 20 | Show the docking step: DiffDock progress, confidence scores appearing for each molecule-protein pair. | "Each generated molecule gets docked against the VCP crystal structures using DiffDock. We run 10 poses per molecule and score them for binding affinity and geometric confidence." | Let a few docking scores appear on screen. |
| 3:40 | 21 | Show the ranked candidates table: columns for molecule ID, SMILES, docking score, QED, Lipinski pass/fail, composite score. Scroll through the top 10-20 candidates. | "The output is a ranked table of drug candidates scored on docking affinity, drug-likeness via QED, and Lipinski Rule of Five compliance. Our top VCP candidate shows a 39 percent composite improvement over the CB-5083 seed compound." | Slow down on "39 percent composite improvement" -- this is the headline result. |
| 3:52 | 22 | Optionally click into a top candidate to show its 2D structure visualization or 3D docking pose if the UI supports it. | "The entire drug discovery stage -- from target input to ranked candidates -- runs in 8 to 16 minutes. That is structure retrieval, molecule generation, 3D conformer creation, molecular docking, scoring, and report generation." | Summarize the sub-steps to reinforce the automation. |
B-roll suggestion: Split-screen showing the MolMIM SMILES output on the left and the ranked table on the right. Overlay text: "100 candidates / 10 poses each / 8-16 min total".
Transition: "Beyond the core pipeline, we have built three intelligence agents. Let me show you." Direct-cut to agent UIs.
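For presenters who want to explain the "Lipinski pass/fail" column in Shot 21, the Rule of Five can be sketched in a few lines. This is an illustration of the general rule, not the platform's scoring code: it assumes the four descriptors are already computed elsewhere, the `Descriptors` class is a hypothetical container, and the two example entries use made-up values rather than real compounds. The convention of tolerating one violation is also an assumption; stricter variants require zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Pass if the number of violations does not exceed max_violations."""
    return lipinski_violations(d) <= max_violations


# Made-up descriptor values for illustration only; not real compounds.
small = Descriptors(mol_weight=413.5, log_p=2.9, h_bond_donors=2, h_bond_acceptors=6)
bulky = Descriptors(mol_weight=712.0, log_p=6.3, h_bond_donors=4, h_bond_acceptors=12)

if __name__ == "__main__":
    for name, d in (("small", small), ("bulky", bulky)):
        verdict = "pass" if passes_lipinski(d) else "fail"
        print(f"{name}: {lipinski_violations(d)} violation(s), {verdict}")
```

How the pipeline weights Lipinski compliance against docking score and QED in its composite score is not specified in this script, so the sketch stops at the pass/fail decision.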
Section 7: Intelligence Agents (4:00 -- 4:30)
| Timecode | Shot | Screen Content | Narration | Notes |
|---|---|---|---|---|
| 4:00 | 23 | Browser: CAR-T Intelligence Agent at `localhost:8521`. Show the main interface with collection selector, evidence search, or a sample query result. | "The CAR-T Intelligence Agent is a specialized evidence engine for chimeric antigen receptor T-cell therapy. Ten dedicated collections, 6,266 vectors, and it handles everything from target antigen analysis to manufacturing protocol evidence. 241 tests, all passing." | Quick tour -- do not linger. 10 seconds max per agent. |
| 4:10 | 24 | Browser: Imaging Intelligence Agent at `localhost:8525`. Show the interface with NIM service indicators (VISTA-3D, MAISI, VILA-M3, Llama-3). | "The Imaging Intelligence Agent integrates four NVIDIA NIM microservices -- VISTA-3D for segmentation, MAISI for synthetic imaging, VILA-M3 for visual question answering, and Llama-3 for report generation. Ten collections, 539 tests. It produces FHIR R4 DiagnosticReports." | Mention FHIR R4 -- it signals clinical interoperability to the audience. |
| 4:20 | 25 | Browser: Precision Oncology Agent at `localhost:8526`. Show the MTB packet generation interface or a case summary view. | "The Precision Oncology Agent handles molecular tumor board workflows -- case creation, therapy ranking, clinical trial matching, and MTB packet generation. Eleven collections, 516 tests, FHIR R4 bundle export." | This is the newest agent. Keep it crisp. |
B-roll suggestion: Lower-third overlay table: "CAR-T: 10 collections, 241 tests / Imaging: 10 collections, 4 NIMs, 539 tests / Oncology: 11 collections, 516 tests / Total: 1,296 tests in 3.78 sec".
Transition: "Let me bring it all together." Direct-cut back to landing page.
Section 8: Closing (4:30 -- 5:00)
| Timecode | Shot | Screen Content | Narration | Notes |
|---|---|---|---|---|
| 4:30 | 26 | Browser: Return to Landing page at `localhost:8080`. All services green. | "Here is the full picture. Every service healthy. Three pipeline stages, three intelligence agents, vector database, monitoring -- all running on a single workstation." | Let the green health grid fill the screen. |
| 4:38 | 27 | Hold on landing page. Overlay or narrate the key numbers. | "Let me put the numbers in context. End-to-end, this platform goes from raw FASTQ to ranked drug candidates in under five hours. The traditional approach takes 6 to 18 months -- that is a 99 percent reduction in time. The hardware is a $3,999 DGX Spark versus the $50,000 to $500,000 you would typically spend on cluster infrastructure and software licenses." | Speak deliberately. These are the numbers the viewer will remember. |
| 4:50 | 28 | Show the GitHub URL in the browser address bar or navigate to the GitHub repo page. | "171 druggable targets across 13 therapeutic areas. 3.56 million searchable vectors. 1,296 agent tests running in under 4 seconds. And every line of it is Apache 2.0 on GitHub." | Rattle off the stats with confidence. |
| 4:55 | 29 | Browser on GitHub repo page, or terminal with the repo URL displayed. Clean ending frame. | "Clone it, extend it, build on it. The link is in the description. I am Adam Jones -- thanks for watching." | End with a clean pause. Hold the frame for 3 seconds of silence before the video ends. |
B-roll suggestion: End card with GitHub URL (github.com/ajones1923/hcls-ai-factory), Apache 2.0 badge, and DGX Spark photo. Hold for 5 seconds.
Transition: Fade to black.
Summary of Key Numbers (Reference for Narration)
Keep these numbers handy during recording. They appear throughout the script but are consolidated here for quick reference during retakes.
| Metric | Value |
|---|---|
| End-to-end time | < 5 hours |
| Traditional approach | 6-18 months |
| Time reduction | ~99% |
| Hardware cost | $3,999 (DGX Spark) |
| Traditional infrastructure | $50K-500K+ |
| GPU | NVIDIA GB10 Grace Blackwell |
| Memory | 128 GB unified LPDDR5x |
| CPU | 20 ARM cores (Grace) |
| Variant calls (Stage 1) | ~11.7 million |
| Stage 1 runtime | 120-240 min |
| Stage 1 accuracy | >99% concordance |
| CPU baseline for Stage 1 | 24-48 hours |
| Searchable vectors (Milvus) | 3.56 million |
| ClinVar records | ~2.7 million |
| AlphaMissense records | 71 million |
| Knowledge base genes | 201 across 13 therapeutic areas |
| Druggable targets | 171 (85% of 201) |
| Query latency (Stage 2) | < 5 seconds |
| Stage 3 runtime | 8-16 min |
| Top VCP candidate improvement | +39% composite over CB-5083 |
| CAR-T Agent tests | 241 |
| Imaging Agent tests | 539 |
| Precision Oncology Agent tests | 516 |
| Total agent tests | 1,296 in 3.78 sec |
| License | Apache 2.0 |
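Several figures in the table are derived from others, so a short arithmetic check can catch a slip before a retake. The sketch below recomputes the test total, the druggable share, and the time reduction from the table values. It assumes an average month of 730 hours and compares compute time against calendar time for the traditional approach, which is a simplification made for illustration.

```python
# Figures copied from the summary table above.
AGENT_TESTS = {"CAR-T": 241, "Imaging": 539, "Precision Oncology": 516}
DRUGGABLE_TARGETS = 171
KNOWLEDGE_BASE_GENES = 201
PIPELINE_HOURS = 5             # upper bound from "< 5 hours"
TRADITIONAL_MONTHS = (6, 18)   # range quoted for the traditional approach

# Assumption: an average month of 730 hours (365 * 24 / 12).
HOURS_PER_MONTH = 730


def time_reduction(months, pipeline_hours=PIPELINE_HOURS):
    """Fractional reduction versus a baseline of `months` calendar months."""
    return 1 - pipeline_hours / (months * HOURS_PER_MONTH)


total_tests = sum(AGENT_TESTS.values())
druggable_share = DRUGGABLE_TARGETS / KNOWLEDGE_BASE_GENES

if __name__ == "__main__":
    print(f"Total agent tests: {total_tests}")
    print(f"Druggable share: {druggable_share:.1%}")
    for months in TRADITIONAL_MONTHS:
        print(f"Reduction vs {months} months: {time_reduction(months):.2%}")
```

Under these assumptions the reduction comes out above 99 percent at both ends of the 6 to 18 month range, which is consistent with the rounded "~99%" used in the narration.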
Post-Production Notes
- Intro/Outro cards: Use the project banner from `docs/diagrams/hcls-ai-factory-diagram.png` as the opening card. GitHub repo URL and Apache 2.0 badge on the end card.
- Lower thirds: Use a consistent lower-third style for all metric overlays. White text on a semi-transparent dark bar. Keep font size large enough to read at 720p.
- Music: Optional low-energy ambient track under the narration. Keep it subtle -- the audience is technical and will tune out anything distracting. No music during the terminal and chat UI sections where the viewer is reading code.
- Captions: Generate closed captions. Many viewers in academic and clinical settings watch without sound.
- Thumbnail: DGX Spark hardware photo with overlay text: "Patient DNA to Drug Candidates / 5 Hours / $3,999".
- Video description: Include the GitHub URL, the three-stage pipeline summary, hardware specs, and a link to the full documentation.
- Upload targets: YouTube (primary), LinkedIn (native upload for better reach), project docs site (embedded).