Quick Start Guide
One-Line Commands
# Check prerequisites
./run.sh check
# Login to NGC
./run.sh login
# Download data (~200 GB, 2-6 hours)
./run.sh download
# Setup reference genome
./run.sh reference
# Run fast test (chr20 only, ~5-20 min)
./run.sh test
# Run full genome (~120-240 min)
./run.sh full
Complete Workflow (Copy-Paste)
# Navigate to project
cd genomics-pipeline
# Run all setup steps
./run.sh check && \
./run.sh login && \
./run.sh download && \
./run.sh reference && \
./run.sh test
# If test succeeds, run full genome
./run.sh full
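
The chained commands above stop at the first failure but do not say how long each step took. The following is a minimal sketch of a wrapper that adds per-step timing, which makes it easier to compare a run against the time estimates in this guide. `run_steps` and the `RUN_SH` override are hypothetical names introduced for illustration; only the `./run.sh` subcommands come from this guide.

```bash
#!/usr/bin/env bash
# Sketch only: run_steps and RUN_SH are illustrative names, not part of run.sh.

# Run each named run.sh step in order, print its wall-clock duration,
# and stop at the first step that exits with a non-zero status.
run_steps() {
    local runner="${RUN_SH:-./run.sh}"   # assumed override hook for testing
    local step start
    for step in "$@"; do
        start=$(date +%s)
        echo "==> $step"
        if ! "$runner" "$step"; then
            echo "step '$step' failed; stopping" >&2
            return 1
        fi
        echo "    $step finished in $(( $(date +%s) - start ))s"
    done
}
```

With the function sourced into the shell, `run_steps check login download reference test` would mirror the setup sequence above. Note that `login` may prompt for credentials, so the timing for that step includes the time spent typing.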
Minimal Test (Skip Large Download)
To test the pipeline without the ~200 GB download, supply your own FASTQ files:
# 1. Check prerequisites
./run.sh check
# 2. Login to NGC
./run.sh login
# 3. Setup reference only
./run.sh reference
# 4. Use your own FASTQ files
# Copy them to: data/input/HG002_R1.fastq.gz
#               data/input/HG002_R2.fastq.gz
# 5. Run pipeline
./run.sh test # or ./run.sh full
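
When supplying your own FASTQ files, a quick preflight check can catch a missing or truncated file before the pipeline starts. The sketch below assumes the two file names given in step 4. `check_fastq_inputs` is a hypothetical helper and not part of `run.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: check_fastq_inputs is an illustrative helper, not part of run.sh.

# Verify that both paired-end FASTQ files exist, are non-empty,
# and pass a gzip integrity test. Returns 0 only if both files pass.
check_fastq_inputs() {
    local dir="${1:-data/input}"
    local status=0
    local f
    for f in "$dir/HG002_R1.fastq.gz" "$dir/HG002_R2.fastq.gz"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        elif ! gzip -t "$f" 2>/dev/null; then
            echo "not a valid gzip file: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

A typical use would be `check_fastq_inputs data/input && ./run.sh test`. Because `gzip -t` reads each file in full, the check can take several minutes on whole-genome inputs.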
Monitor Progress
# Watch GPU utilization
watch -n 1 nvidia-smi
# Follow logs in real-time
tail -f data/output/logs/genome_fq2bam.log
# Check output files
ls -lh data/output/
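
For long runs it can help to record GPU utilization to a file rather than watch it live. The sketch below uses the `--query-gpu`, `--format=csv`, and `-l` options of `nvidia-smi` to append one CSV row per interval. `log_gpu_usage`, the output file name, and the 5-second default are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: log_gpu_usage is an illustrative helper, not part of run.sh.

# Write timestamped GPU utilization and memory usage to a CSV file,
# sampling every <interval> seconds until the process is interrupted.
log_gpu_usage() {
    local out="${1:-gpu_usage.csv}"   # assumed default file name
    local interval="${2:-5}"          # assumed default sampling interval (seconds)
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found on PATH" >&2
        return 1
    fi
    nvidia-smi \
        --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
        --format=csv \
        -l "$interval" > "$out"
}
```

Running `log_gpu_usage data/output/logs/gpu_usage.csv 5 &` before starting the pipeline would keep sampling in the background. Stop it with `kill %1` (or the matching job number) once the run finishes.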
Expected Output
After the chr20 test and the full-genome run both complete successfully:
data/output/
├── HG002.chr20.bam # Chr20 test BAM
├── HG002.chr20.vcf.gz # Chr20 test VCF
├── HG002.genome.bam # Full genome BAM
├── HG002.genome.vcf.gz # Full genome VCF ← Main output!
├── HG002.genome.vcf.gz.tbi # VCF index
└── logs/ # All execution logs
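
The listing above can be turned into a quick post-run check. The sketch below reports which of the named files are present and non-empty. `check_outputs` is a hypothetical helper; the file names passed to it are the ones shown in the tree.

```bash
#!/usr/bin/env bash
# Sketch only: check_outputs is an illustrative helper, not part of run.sh.

# Usage: check_outputs <dir> <file>...
# Prints one status line per file and returns 0 only if every file
# exists in <dir> and is non-empty.
check_outputs() {
    local dir="$1"
    shift
    local missing=0
    local name
    for name in "$@"; do
        if [ -s "$dir/$name" ]; then
            echo "ok       $name"
        else
            echo "MISSING  $name"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

After a full run, `check_outputs data/output HG002.genome.bam HG002.genome.vcf.gz HG002.genome.vcf.gz.tbi` would confirm the main outputs. After a chr20 test, pass the two `HG002.chr20.*` names instead. The check only tests for presence and size; it does not validate file contents.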
Troubleshooting
GPU out of memory?
Download interrupted?
Start over?
Time Estimates
| Operation | Duration |
|---|---|
| Prerequisites check | < 1 min |
| NGC login | < 1 min |
| Data download | 2-6 hours |
| Reference setup | 5-15 min |
| Chr20 test | 5-20 min |
| Full genome | 120-240 min |
Next Steps
Use the output VCF for:

- Evidence RAG analysis (Stage 2)
- Variant annotation
- Clinical interpretation
- Research workflows

Need help? See README.md for detailed documentation.