Licensing and Cost Guide¶

What's free, what requires API keys, and what changes at enterprise scale.

HCLS AI Factory Code¶

All code in this repository — pipeline scripts, orchestrator, portal, documentation — is released under the Apache 2.0 license. You can use, modify, and distribute it freely.

Third-Party Component Licenses¶

The HCLS AI Factory integrates open-source tools, NVIDIA software, and a commercial API. Each has its own license terms.

Open-Source Components (No Cost)¶

Component	Technology	License
Genomics alignment	BWA-MEM2	MIT
Variant calling	Google DeepVariant	BSD-3-Clause
Vector database	Milvus 2.4	Apache 2.0
Embeddings	BGE-small-en-v1.5 (BAAI)	MIT
Chemistry toolkit	RDKit	BSD-3-Clause
Pipeline orchestration	Nextflow	Apache 2.0
Web interface	Streamlit	Apache 2.0
Monitoring	Grafana / Prometheus	AGPLv3 / Apache 2.0
Containers	Docker	Apache 2.0
PDF generation	ReportLab	BSD
VCF parsing	cyvcf2	MIT
3D visualization	py3Dmol	MIT

These are fully open-source. No API keys, no license fees, no restrictions on commercial use.

NVIDIA Software¶

Component	License	Dev/Test Cost	Production Cost
Parabricks 4.x	NVIDIA EULA	Free — no license required	Free — enterprise support optional via NVAIE
BioNeMo MolMIM (NIM)	NVIDIA AI Product Agreement	Free on DGX Spark	NVAIE subscription required
BioNeMo DiffDock (NIM)	NVIDIA AI Product Agreement	Free on DGX Spark	NVAIE subscription required

Key details:

Parabricks is free for all users. No license is required for Parabricks 4.x and later. Enterprise support is available through NVIDIA AI Enterprise (NVAIE) but is not required to run the software.
BioNeMo NIMs (MolMIM and DiffDock) are available at no cost for research, development, and testing through the NVIDIA Developer Program. On DGX Spark and other Grace Blackwell client systems, NIMs designated as Free SDKs may be used without a subscription when used on a personal workstation and not serving multiple users. Moving to production (hospital deployments, multi-user systems, or enterprise scale) requires an NVIDIA AI Enterprise subscription — currently $4,500/GPU/year.
NGC API Key is required to pull NIM containers. Free to obtain through the NVIDIA Developer Program.

Commercial API¶

Component	Provider	License	Cost
Claude (RAG reasoning)	Anthropic	Commercial API	Pay-per-token (pricing)

An Anthropic API key is required for Stage 2 RAG chat. Claude is a commercial service — not open-source, not free. The pipeline defaults to Claude Sonnet 4 ($3 per million input tokens, $15 per million output tokens). For a typical development session (10–50 RAG queries), expect roughly $0.50–$3 per session. Monthly costs during active development are typically $5–$50/month depending on query volume. See Anthropic API Pricing for current rates.

Cost Summary by Deployment Phase¶

Phase	Hardware	Software Licenses	API Keys
Phase 1 — Proof Build	DGX Spark ($3,999)	None required	Anthropic + NGC (free)
Phase 2 — Departmental	DGX B200 ($500K-$1M)	NVAIE ($4,500/GPU/year)	Anthropic + NGC
Phase 3 — Enterprise	DGX SuperPOD ($7M-$60M+)	NVAIE ($4,500/GPU/year)	Anthropic + NGC

For a Phase 1 proof build on DGX Spark, the only recurring costs are Anthropic API usage. All NVIDIA software components — Parabricks, MolMIM, and DiffDock — are available at no additional cost for research and development.

What "Open Project" Means¶

The HCLS AI Factory is an open project: all code authored for this platform is Apache 2.0, published on GitHub, and free to use.

The platform depends on components with different license models. The open-source tools (Milvus, RDKit, Nextflow, DeepVariant, BWA-MEM2) are fully free. The NVIDIA NIM containers are free for development on DGX Spark but require enterprise licensing at scale. The Anthropic Claude API is a paid commercial service.

This is a common pattern in modern AI platforms — open orchestration and code, with commercial or proprietary components at the inference layer.