
Quick Start Guide

One-Line Commands

# Check prerequisites
./run.sh check

# Log in to NGC
./run.sh login

# Download data (~200 GB, 2-6 hours)
./run.sh download

# Set up reference genome
./run.sh reference

# Run fast test (chr20 only, ~5-20 min)
./run.sh test

# Run full genome (~120-240 min)
./run.sh full

Complete Workflow (Copy-Paste)

# Navigate to project
cd genomics-pipeline

# Run all setup steps
./run.sh check && \
./run.sh login && \
./run.sh download && \
./run.sh reference && \
./run.sh test

# If test succeeds, run full genome
./run.sh full
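
The `&&` chain above stops at the first failing step but does not report how long each step took. As a rough sketch only, the wrapper below runs the same steps in order and prints elapsed seconds per step, which can be compared against the Time Estimates section. `run_steps` is a hypothetical helper, not part of `run.sh`.

```shell
# run_steps CMD STEP...: hypothetical helper, not provided by the pipeline.
# Runs "CMD STEP" for each step in order, prints elapsed seconds per step,
# and stops at the first failure (the same behavior as the && chain).
run_steps() {
    local cmd="$1" step start
    shift
    for step in "$@"; do
        start=$(date +%s)
        if ! "$cmd" "$step"; then
            echo "FAILED  $step" >&2
            return 1
        fi
        echo "done    $step ($(( $(date +%s) - start ))s)"
    done
}
```

Example: `run_steps ./run.sh check login download reference test`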

Minimal Test (Skip Large Download)

To test the pipeline without the ~200 GB download, supply your own paired FASTQ files:

# 1. Check prerequisites
./run.sh check

# 2. Log in to NGC
./run.sh login

# 3. Set up reference only
./run.sh reference

# 4. Use your own FASTQ files
# Copy them to: data/input/HG002_R1.fastq.gz
#               data/input/HG002_R2.fastq.gz

# 5. Run pipeline
./run.sh test   # or ./run.sh full
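
Step 4 can be scripted. The following is a minimal sketch, assuming your reads are gzip-compressed paired-end FASTQ files. The function name `stage_fastq` is hypothetical and not part of `run.sh`; only the destination file names come from this guide.

```shell
# stage_fastq R1 R2 [DEST_DIR]: hypothetical helper, not provided by the pipeline.
# Checks that both inputs are valid gzip files, then copies them to the
# file names this guide expects under data/input/.
stage_fastq() {
    local r1="$1" r2="$2" dest="${3:-data/input}" f
    for f in "$r1" "$r2"; do
        # gzip -t tests archive integrity without writing any output
        if ! gzip -t "$f" 2>/dev/null; then
            echo "error: $f is missing or not a valid gzip file" >&2
            return 1
        fi
    done
    mkdir -p "$dest" || return 1
    cp "$r1" "$dest/HG002_R1.fastq.gz" &&
    cp "$r2" "$dest/HG002_R2.fastq.gz"
}
```

Example: `stage_fastq /path/to/sample_R1.fastq.gz /path/to/sample_R2.fastq.gz`. The integrity check reads each file in full, so it can take a while on large inputs.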

Monitor Progress

# Watch GPU utilization
watch -n 1 nvidia-smi

# Follow logs in real-time
tail -f data/output/logs/genome_fq2bam.log

# Check output files
ls -lh data/output/

Expected Output

After successful completion:

data/output/
├── HG002.chr20.bam            # Chr20 test BAM
├── HG002.chr20.vcf.gz         # Chr20 test VCF
├── HG002.genome.bam           # Full genome BAM
├── HG002.genome.vcf.gz        # Full genome VCF ← Main output!
├── HG002.genome.vcf.gz.tbi    # VCF index
└── logs/                      # All execution logs
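
One way to confirm that a run produced the files listed above is to check that each one exists and is non-empty. This is a sketch only: `check_outputs` is a hypothetical helper, not a `run.sh` subcommand, and its file list is copied from the tree above.

```shell
# check_outputs [DIR] [full|test]: hypothetical helper, not provided by the pipeline.
# Reports each expected file as ok or MISSING; returns non-zero if any is
# absent or empty.
check_outputs() {
    local dir="${1:-data/output}" mode="${2:-full}" missing=0 f files
    if [ "$mode" = "test" ]; then
        files="HG002.chr20.bam HG002.chr20.vcf.gz"
    else
        files="HG002.genome.bam HG002.genome.vcf.gz HG002.genome.vcf.gz.tbi"
    fi
    for f in $files; do
        if [ -s "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Example: `check_outputs data/output test` after `./run.sh test`, or `check_outputs` with no arguments after `./run.sh full`. A non-empty file is not proof of a valid BAM or VCF; this only catches runs that stopped before writing output.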

Troubleshooting

GPU out of memory?

# Edit config/pipeline.env
# Set: LOW_MEMORY=1
./run.sh full
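
The config edit can also be scripted. The sketch below assumes `config/pipeline.env` is a plain file of `KEY=VALUE` lines, which this guide does not confirm, so check the file before relying on it. `set_env_var` is a hypothetical helper.

```shell
# set_env_var FILE KEY VALUE: hypothetical helper; assumes FILE holds
# KEY=VALUE lines. Updates an existing KEY=... line, or appends one if the
# key is not set. VALUE must not contain '|' or '&' (sed metacharacters here).
set_env_var() {
    local file="$1" key="$2" value="$3" tmp
    tmp="$(mktemp)" || return 1
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing assignment; all other lines pass through unchanged
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
    else
        cat "$file" > "$tmp"
        echo "${key}=${value}" >> "$tmp"
    fi
    # Write back in place so the original file's permissions are kept
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Example: `set_env_var config/pipeline.env LOW_MEMORY 1`, followed by `./run.sh full`.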

Download interrupted?

# Re-run the download; aria2 resumes interrupted transfers
./run.sh download

Start over?

# Clean outputs only (keeps downloaded data)
./run.sh clean

# Clean everything
./run.sh clean-all

Time Estimates

Operation              Duration
Prerequisites check    < 1 min
NGC login              < 1 min
Data download          2-6 hours
Reference setup        5-15 min
Chr20 test             5-20 min
Full genome            120-240 min

Next Steps

Use the output VCF for:

- Evidence RAG analysis (Stage 2)
- Variant annotation
- Clinical interpretation
- Research workflows


Need help? See README.md for detailed documentation.