Ending the Diagnostic Odyssey: A Multi-Collection RAG Architecture for Clinical Document Intelligence and Genomic-Autoimmune Correlation

Author: Adam Jones
Date: March 2026
Version: 0.1.0 (Pre-Implementation)
License: Apache 2.0

Part of the HCLS AI Factory -- an end-to-end precision medicine platform. https://github.com/ajones1923/hcls-ai-factory

Abstract

Autoimmune diseases affect approximately 50 million Americans and 800 million people worldwide, yet the average time from symptom onset to definitive diagnosis remains staggeringly long: 4.6 years for systemic lupus erythematosus (SLE), 5-7 years for postural orthostatic tachycardia syndrome (POTS), 6-10 years for celiac disease, 4-5 years for multiple sclerosis, and over 10 years for conditions like Ehlers-Danlos syndrome and systemic mastocytosis. This "diagnostic odyssey" results from a fundamental structural problem: autoimmune diseases are multi-system disorders that generate fragmented clinical data across dozens of specialist encounters, hundreds of laboratory tests, imaging studies, procedure reports, and genetic evaluations -- yet no existing tool synthesizes this longitudinal patient journey into a coherent diagnostic picture.

This paper presents the architectural design, clinical rationale, and product requirements for the Precision Autoimmune Agent -- a clinical document intelligence system built on multi-collection retrieval-augmented generation (RAG) that ingests thousands of patient clinical documents from the complete medical journey, identifies patterns across laboratory results, specialist assessments, imaging findings, and genomic data, and cross-references these patterns against known autoimmune disease signatures to dramatically accelerate diagnosis. The agent will unify 14 specialized Milvus vector collections spanning clinical documents (progress notes, lab reports, imaging reports, pathology, procedure notes), autoimmune reference knowledge (literature, clinical trials, autoantibody databases, HLA-disease associations, disease activity indices), genomic data (3.5 million variants from the HCLS AI Factory genomics pipeline), and longitudinal biomarker tracking -- enabling queries like "What patterns in this patient's 3-year clinical history suggest an underlying autoimmune etiology?" or "Are there genomic variants in this patient that increase susceptibility to lupus nephritis?"

The system extends the proven multi-collection RAG architecture established by five existing intelligence agents in the HCLS AI Factory (Precision Biomarker, Precision Oncology, CAR-T, Imaging, and Autoimmune prototype), adapting it with clinical document ingestion pipelines capable of processing thousands of patient records, NLP-based entity extraction for laboratory values and clinical findings, longitudinal biomarker trend analysis, HLA-disease association scoring with 50+ known associations, autoantibody panel interpretation across 14 antibody types, disease activity scoring (DAS28, SLEDAI-2K, CDAI, BASDAI), flare prediction algorithms, and pharmacogenomic-guided biologic therapy recommendations for 8 drug classes. Eight reference clinical workflows will cover the highest-impact diagnostic and monitoring scenarios: diagnostic odyssey acceleration, lupus nephritis surveillance, POTS/dysautonomia evaluation, inflammatory arthritis differentiation, overlap syndrome detection, biologic therapy optimization, flare prediction and prevention, and genomic-autoimmune risk profiling.

The agent will deploy on a single NVIDIA DGX Spark ($3,999) using BGE-small-en-v1.5 embeddings (384-dimensional, IVF_FLAT, COSINE), Claude Sonnet 4.6 for evidence synthesis, and shared NVIDIA NIM microservices for on-device inference. Licensed under Apache 2.0, the platform will democratize access to integrated autoimmune intelligence that currently requires multi-million-dollar institutional investments in informatics infrastructure -- bringing the diagnostic power of world-class rheumatology and immunology centers to any clinic worldwide.

Table of Contents

Introduction
The Autoimmune Diagnostic Crisis
Clinical Landscape and Market Analysis
Existing HCLS AI Factory Architecture
Precision Autoimmune Agent Architecture
Clinical Document Ingestion Pipeline
Milvus Collection Design
Clinical Workflows
Cross-Modal Integration and Genomic Correlation
NIM Integration Strategy
Knowledge Graph Design
Query Expansion and Retrieval Strategy
API and UI Design
Clinical Decision Support Engines
Reporting and Interoperability
Product Requirements Document
Data Acquisition Strategy
Validation and Testing Strategy
Regulatory Considerations
DGX Compute Progression
Implementation Roadmap
Risk Analysis
Competitive Landscape
Discussion
Conclusion
References

1. Introduction

1.1 The Autoimmune Disease Burden

Autoimmune diseases represent one of the most significant and underappreciated categories of chronic illness worldwide. The National Institutes of Health estimates that over 80 distinct autoimmune conditions collectively affect approximately 50 million Americans -- more than heart disease (30 million) and cancer (18 million) combined. Globally, the World Health Organization estimates 800 million people live with autoimmune conditions, with prevalence rising at 3-9% annually in developed nations.

The clinical and economic burden is staggering:

Annual U.S. healthcare costs: $100+ billion in direct medical expenses
Premature mortality: Autoimmune diseases are among the top 10 leading causes of death in women under 65
Workforce impact: 25% of autoimmune patients report being unable to work during flares
Mental health: 30-40% prevalence of comorbid depression and anxiety
Mortality: SLE carries a standardized mortality ratio of 2.6; systemic sclerosis 3.5; systemic vasculitis 2.7

Yet despite this enormous burden, autoimmune diseases remain among the most difficult conditions to diagnose. The fundamental challenge is that autoimmune diseases are systemic -- they affect multiple organ systems simultaneously, producing constellations of symptoms that span the expertise of dozens of medical specialties. A lupus patient may see a rheumatologist for joint pain, a dermatologist for rashes, a nephrologist for kidney involvement, a hematologist for cytopenias, a neurologist for cognitive dysfunction, and a cardiologist for pericarditis -- each specialist generating clinical documentation in isolated electronic health record (EHR) silos that no existing system synthesizes into a unified diagnostic picture.

1.2 The Diagnostic Odyssey

The term "diagnostic odyssey" describes the years-long journey that autoimmune patients endure before receiving a correct diagnosis. Published data quantify this crisis:

Condition	Average Time to Diagnosis	Specialists Seen	Source
Systemic lupus erythematosus (SLE)	4.6 years	3-5	Lupus Foundation of America
Postural orthostatic tachycardia syndrome (POTS)	5.9 years	7+	Dysautonomia International
Hypermobile Ehlers-Danlos syndrome (hEDS)	10-12 years	8+	Ehlers-Danlos Society
Celiac disease	6-10 years	3-4	Celiac Disease Foundation
Multiple sclerosis	4-5 years	2-4	National MS Society
Ankylosing spondylitis	6-8 years	3-5	Spondylitis Association
Systemic mastocytosis	8-10 years	5+	The Mastocytosis Society
Myasthenia gravis	2-3 years	3-4	Myasthenia Gravis Foundation
Sjogren's syndrome	3-4 years	2-4	Sjogren's Foundation
Autoimmune encephalitis	1-3 years	4-6	Autoimmune Encephalitis Alliance

The consequences of delayed diagnosis are not merely inconvenient -- they are medically devastating. Each year of undiagnosed lupus nephritis increases the risk of irreversible kidney damage. Each year of untreated inflammatory arthritis produces measurable joint erosion. Each year of undiagnosed celiac disease compounds the risk of lymphoma, osteoporosis, and infertility. The diagnostic odyssey is not a quality-of-life issue; it is a patient safety crisis.

1.3 Why Clinical Document Intelligence Is the Solution

The data needed to diagnose most autoimmune patients already exists -- it is scattered across their medical records. A patient who has seen 7 specialists over 5 years may have generated:

50-200 progress notes documenting symptom patterns, physical exam findings, and clinical impressions
100-500 laboratory results tracking inflammatory markers, autoantibodies, complement levels, and organ function
10-30 imaging studies (X-rays, MRIs, CT scans, ultrasounds) with radiologist interpretations
5-15 procedure reports (biopsies, endoscopies, nerve conduction studies, tilt table tests)
Genetic testing results (HLA typing, whole exome sequencing, pharmacogenomic panels)
Medication histories documenting treatment responses and failures

No clinician can realistically read thousands of pages of clinical documentation, identify subtle patterns across years of data, and correlate those patterns with the 80+ known autoimmune diseases and their genomic risk factors. This is the task that a multi-collection RAG system with clinical document ingestion, NLP entity extraction, and genomic cross-referencing is designed to address.

1.4 Our Contribution

The Precision Autoimmune Agent addresses the diagnostic odyssey through a clinical document intelligence architecture that:

Ingests thousands of patient clinical documents (progress notes, lab reports, imaging reports, pathology, procedures) via OCR and NLP pipelines into patient-specific vector collections
Extracts structured entities from unstructured clinical text -- laboratory values, autoantibody results, imaging findings, medication changes, symptom patterns -- using medical NLP models
Cross-references extracted patterns against 14 specialized Milvus collections containing autoimmune reference knowledge, HLA-disease associations (50+ associations), autoantibody databases (14 antibody-disease maps), and 3.5 million genomic variants
Performs longitudinal biomarker trend analysis to detect rising inflammatory markers, falling complement levels, and other flare-predictive patterns that human review would miss
Calculates validated disease activity scores (DAS28-CRP, DAS28-ESR, SLEDAI-2K, CDAI, BASDAI) from extracted clinical data
Provides pharmacogenomic-guided biologic therapy recommendations for 8 drug classes based on HLA typing and genotype data
Generates diagnostic hypothesis reports that synthesize thousands of clinical data points into prioritized differential diagnoses with supporting evidence citations
Runs on a single NVIDIA DGX Spark ($3,999), democratizing access to diagnostic intelligence that currently requires multi-million-dollar institutional informatics platforms
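
The longitudinal trend analysis in the list above can be illustrated with a minimal sketch. It fits an ordinary least-squares slope to dated lab values and labels the direction of change. The function names, the per-year unit, and the 10% relative threshold are illustrative assumptions, not the agent's actual algorithm.

```python
from datetime import date

def slope_per_year(points: list[tuple[date, float]]) -> float:
    """Ordinary least-squares slope of value against time, in units per year."""
    if len(points) < 2:
        raise ValueError("need at least two dated values")
    t0 = min(d for d, _ in points)
    xs = [(d - t0).days / 365.25 for d, _ in points]
    ys = [v for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all values share the same date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def classify_trend(points: list[tuple[date, float]], rel_threshold: float = 0.10) -> str:
    """Label a series 'rising', 'falling', or 'stable'.

    rel_threshold is a hypothetical cutoff: yearly change relative to the mean value.
    """
    slope = slope_per_year(points)
    mean_y = sum(v for _, v in points) / len(points)
    if mean_y == 0:
        return "stable"
    rel = slope / abs(mean_y)
    if rel > rel_threshold:
        return "rising"
    if rel < -rel_threshold:
        return "falling"
    return "stable"

# Synthetic complement C3 values (mg/dL), not patient data
c3 = [(date(2023, 1, 10), 110.0), (date(2023, 7, 12), 95.0), (date(2024, 1, 15), 78.0)]
print(classify_trend(c3))  # falling
```

A linear fit is only one option; a production system would likely also need to handle assay changes, unit conversions, and irregular sampling.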

2. The Autoimmune Diagnostic Crisis

2.1 Data Fragmentation as the Root Cause

The diagnostic odyssey is not primarily a knowledge gap -- rheumatologists and immunologists understand autoimmune diseases well. The crisis is fundamentally a data synthesis problem. Patient data is fragmented across:

Multiple EHR systems: Patients who see specialists at different health systems generate records in incompatible EHRs (Epic, Cerner, Allscripts, Meditech). Even within a single system, clinical notes, laboratory results, and imaging reports are stored in separate modules.
Specialty-specific documentation: Each specialist documents findings relevant to their domain. A dermatologist notes a malar rash but does not order complement levels. A nephrologist documents proteinuria but does not review the dermatology notes describing the rash. The pattern (rash + proteinuria + low complement = lupus) exists in the aggregate record but is never seen by any single provider.
Temporal dispersion: Autoimmune symptoms evolve over months to years. A laboratory finding from 18 months ago (mildly positive ANA at 1:160) takes on completely different significance when combined with a physical exam finding from last week (Raynaud's phenomenon) and a lab result from last month (elevated anti-centromere antibody). No existing tool performs this temporal pattern recognition across the full clinical timeline.
Laboratory data in unstructured formats: Many laboratory results arrive as free-text reports (especially autoantibody panels, HLA typing results, pathology reports, and genetic testing). Extracting structured values from these reports requires medical NLP.
Genomic data in separate systems: When patients undergo HLA typing, whole exome sequencing, or pharmacogenomic testing, these results typically reside in genetic testing portals (Invitae, GeneDx, 23andMe, Color Health) disconnected from the clinical EHR.

2.2 The Overlapping Disease Problem

Autoimmune diseases frequently coexist, creating diagnostic complexity that exceeds the capacity of single-disease frameworks:

Overlap syndromes: Mixed connective tissue disease (MCTD), overlap of SLE with systemic sclerosis or myositis, "rhupus" (RA + SLE overlap)
POTS as a comorbid condition: POTS frequently co-occurs with hypermobile Ehlers-Danlos syndrome (hEDS), mast cell activation syndrome (MCAS), small fiber neuropathy, and Sjogren's syndrome; POTS, hEDS, and MCAS together are often described as a triad
Autoimmune polyendocrine syndromes: Type 1 (Addison's + hypoparathyroidism + chronic mucocutaneous candidiasis), Type 2 (Addison's + autoimmune thyroid disease + Type 1 diabetes)
Familial clustering: First-degree relatives of autoimmune patients have 2-10x higher risk of developing any autoimmune disease, not just the same disease

These overlapping patterns require a system that can simultaneously evaluate evidence for multiple autoimmune conditions and identify syndromic clusters that human pattern recognition would miss.

2.3 Why Existing Tools Fall Short

Current approaches to autoimmune diagnostics fail because they address only one dimension of the problem:

Classification criteria tools (ACR/EULAR calculators) require structured input that clinicians must manually extract from records -- they cannot ingest raw clinical documents.
Clinical reference systems (UpToDate, DynaMed) provide disease-level reference information but cannot analyze individual patient data.
EHR analytics (Epic Cogito, Cerner HealtheIntent) offer population-level dashboards but lack the medical NLP and genomic integration needed for individual diagnostic reasoning.
General AI assistants lack the structured autoimmune knowledge graphs, validated scoring systems, and clinical document ingestion pipelines required for rigorous diagnostic support.
Commercial autoimmune panels (Exagen AVISE and panels from Labcorp and ARUP Laboratories) test for specific autoantibodies but do not integrate results with clinical history, genomic data, or longitudinal biomarker trends.

2.4 The Case for Clinical Document Intelligence

The Precision Autoimmune Agent represents a fundamentally different approach: rather than asking clinicians to manually input structured data into diagnostic calculators, the system ingests the patient's raw clinical documentation and extracts the patterns itself. This approach has three critical advantages:

Completeness: The system analyzes all available data, not just what a single provider remembers or chooses to enter. A positive ANA from 3 years ago, a borderline complement level from 18 months ago, and a new arthritis presentation from last week are all considered simultaneously.
Pattern recognition at scale: A human clinician reviewing 200 clinical documents might miss the subtle temporal pattern of rising anti-dsDNA titers preceding each lupus flare. The system identifies these patterns algorithmically.
Genomic correlation: By cross-referencing extracted clinical findings with the 3.5 million genomic variants in the shared genomic_evidence collection, the system can identify genetic risk factors (HLA alleles, STAT4 variants, IRF5 polymorphisms) that strengthen or weaken specific diagnostic hypotheses.

3. Clinical Landscape and Market Analysis

3.1 Autoimmune AI Market

The autoimmune diagnostics and AI market is experiencing rapid growth driven by rising prevalence, diagnostic delay awareness, and precision medicine adoption:

Metric	Value	Source
Global autoimmune diagnostics market (2025)	$5.2 billion	Grand View Research
Projected market (2030)	$8.7 billion	Grand View Research
CAGR	10.8%	Grand View Research
AI in clinical diagnostics (2025)	$2.1 billion	MarketsandMarkets
AI in clinical diagnostics (2030)	$6.8 billion	MarketsandMarkets
U.S. autoimmune disease prevalence	50 million	NIH/AARDA
Global autoimmune prevalence	800 million	WHO
Average diagnostic delay (all autoimmune)	4.5 years	AARDA

3.2 Competitive Analysis

Solution	Approach	Limitations
Exagen AVISE	Lupus-specific autoantibody panel with cell-bound complement	Single-disease focus; no clinical document ingestion; no genomics
DxTerity AutoImmune Profile	At-home RNA expression testing	Screening only; no longitudinal integration; no imaging/clinical notes
IBM Watson for Genomics	Genomic variant interpretation	Discontinued; no autoimmune-specific workflows; no clinical document NLP
Google Health / DeepMind	Medical image analysis	Imaging only; no laboratory, clinical note, or genomic integration
Epic Cognitive Computing	EHR-integrated NLP	Single-EHR vendor; limited autoimmune-specific knowledge; no genomic correlation
Precision Autoimmune Agent	Multi-collection RAG with clinical document ingestion, genomic correlation, 14 collections	Open-source; $3,999 hardware; cross-system document ingestion

3.3 Target Users

User Segment	Estimated Size	Primary Value
Academic rheumatology centers	150+ U.S. programs	Diagnostic acceleration, research data extraction
Community rheumatologists	5,500 U.S. practitioners	Complex case support, genomic interpretation
Primary care physicians	250,000+ U.S. PCPs	Early autoimmune detection, appropriate referral
Immunology researchers	15,000+ globally	Literature synthesis, clinical-genomic correlation
Patient advocacy organizations	100+ autoimmune foundations	Patient empowerment, second opinion support
Rare disease diagnostic centers	50+ NIH UDP, UDNI	Undiagnosed autoimmune case evaluation
Pharma/biotech (autoimmune pipeline)	200+ companies	Clinical trial design, biomarker identification

4. Existing HCLS AI Factory Architecture

4.1 Platform Overview

The HCLS AI Factory is a three-stage precision medicine pipeline that processes a patient sample from raw DNA sequencing data to drug candidate molecules in under 5 hours on a single NVIDIA DGX Spark:

Stage	Duration	Process	Output
1. Genomics	120-240 min	FASTQ to VCF via Parabricks 4.6, BWA-MEM2, DeepVariant	11.7 million variants
2. RAG/Chat	Interactive	ClinVar (~2.7M) + AlphaMissense (71M) + Milvus (3.5M vectors) + Claude	Variant interpretation, gene-drug associations
3. Drug Discovery	8-16 min	MolMIM generation + DiffDock docking + RDKit scoring	Candidate molecules with docking scores

Five intelligence agents currently extend Stage 2:

Agent	Collections	Vectors	Port (API/UI)	Focus
Precision Biomarker	10	~890	8510/8511	Biomarker interpretation
Precision Oncology	10	~950	8514/8515	Cancer variant analysis
CAR-T Intelligence	11	3,567,436	8521/8522	Cell therapy development
Imaging Intelligence	10	876	8524/8525	Medical imaging AI
Autoimmune (prototype)	0	0	—	Agent class only (331 lines)

The Precision Autoimmune Agent will be the sixth full intelligence agent, with the unique addition of a clinical document ingestion pipeline that enables patient-specific analysis at scale.

4.2 Shared Infrastructure

All intelligence agents share core infrastructure:

Vector database: Milvus 2.4 on localhost:19530 with etcd and MinIO backing
Embedding model: BGE-small-en-v1.5 (384-dimensional, IVF_FLAT index, COSINE similarity)
LLM: Claude Sonnet 4.6 (primary) with Llama-3 8B NIM fallback
Shared collection: genomic_evidence (3,561,170 variants) -- read-only access from all agents
Monitoring: Prometheus + Grafana dashboards
Configuration: Pydantic BaseSettings pattern

4.3 Proven Patterns Adapted for Autoimmune

The Precision Autoimmune Agent extends established patterns:

Multi-collection parallel search: ThreadPoolExecutor dispatching to 14 collections simultaneously (adapted from CAR-T's 11-collection architecture)
Knowledge graph augmentation: 7-dictionary knowledge graph (HLA associations, autoantibodies, disease activity, biologics, flare biomarkers, overlap syndromes, dysautonomia) -- adapted from CAR-T's 6-dictionary pattern
Query expansion: 18 domain-specific expansion maps (250+ keywords to 2,000+ terms) -- adapted from CAR-T's 12-map pattern
Comparative analysis: Auto-detection and resolution of "X vs Y" queries (e.g., "Compare RA vs lupus arthritis")
Multi-format export: Markdown, JSON, PDF with NVIDIA branding, FHIR R4 DiagnosticReport

New capability unique to this agent: Clinical document ingestion pipeline with medical NLP entity extraction, enabling patient-specific analysis from raw clinical records rather than manually structured input.
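
Two of the patterns listed above, query expansion and "X vs Y" detection, can be sketched as simple query preprocessing. The expansion map below is a tiny hypothetical stand-in for the 18 domain maps, and the regular expression is an assumed heuristic rather than the agent's actual detection logic.

```python
import re

# Hypothetical expansion map; the agent is described as having 18 such maps
EXPANSION_MAP = {
    "lupus": ["systemic lupus erythematosus", "SLE", "anti-dsDNA", "lupus nephritis"],
    "ra": ["rheumatoid arthritis", "anti-CCP", "rheumatoid factor"],
    "pots": ["postural orthostatic tachycardia syndrome", "dysautonomia", "tilt table"],
}

# Matches "X vs Y", "X vs. Y", "X versus Y", with an optional leading "Compare"
COMPARE_RE = re.compile(
    r"^\s*(?:compare\s+)?(.+?)\s+(?:vs\.?|versus)\s+(.+?)\s*\??\s*$",
    re.IGNORECASE,
)

def expand_query(query: str) -> list[str]:
    """Return expansion terms for every keyword found in the query, without duplicates."""
    tokens = re.findall(r"[a-z0-9\-]+", query.lower())
    terms: list[str] = []
    for token in tokens:
        for term in EXPANSION_MAP.get(token, []):
            if term not in terms:
                terms.append(term)
    return terms

def detect_comparison(query: str):
    """Return the two compared entities, or None if the query is not comparative."""
    match = COMPARE_RE.match(query)
    return (match.group(1), match.group(2)) if match else None

print(detect_comparison("Compare RA vs lupus arthritis"))  # ('RA', 'lupus arthritis')
print(expand_query("lupus flare"))
```

In a design like this, each side of a detected comparison could be expanded and searched separately before the results are presented side by side.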

5. Precision Autoimmune Agent Architecture

5.1 System Diagram

Patient Clinical Documents (PDFs, HL7 FHIR, CCDA, free text)
    |
    v
[Clinical Document Ingestion Pipeline]
  - OCR (Tesseract / Azure Document Intelligence)
  - Section segmentation (clinical notes, labs, imaging, pathology)
  - Medical NLP entity extraction (laboratory values, diagnoses, medications)
  - Temporal normalization (date extraction and timeline construction)
    |
    v
[Patient Document Collection] (patient-specific vectors in Milvus)
    |
    v
[BGE-small-en-v1.5 Embedding]
(384-dim, asymmetric query prefix)
    |
    v
[Parallel Search: 14 Milvus Collections]
(ThreadPoolExecutor, IVF_FLAT / COSINE)
  - 3 patient-specific collections (clinical_documents, patient_labs, patient_timeline)
  - 8 reference collections (literature, trials, autoantibodies, hla_associations,
    disease_activity, biologics, overlap_syndromes, guidelines)
  - 2 shared collections (genomic_evidence, biomarker_reference)
  - 1 cross-agent collection (imaging_findings)
    |
    v
[Query Expansion] (18 maps, 250+ keywords -> 2,000+ terms)
    |
    v
[Knowledge Graph Augmentation]
(50+ HLA associations, 14 autoantibody maps, 5 disease activity indices,
 8 biologic therapies, 3 flare biomarker patterns, 12 overlap syndromes,
 10 dysautonomia conditions)
    |
    v
[Longitudinal Pattern Analysis]
  - Biomarker trend detection (rising CRP, falling complement)
  - Autoantibody seroconversion tracking
  - Symptom pattern recognition across clinical notes
  - Medication response correlation
    |
    v
[Score-Weighted Merge & Rank]
(citation relevance: high >= 0.75, medium >= 0.60)
    |
    v
[Claude Sonnet 4.6] --> Grounded diagnostic hypotheses with evidence citations
    |
    v
[Export] --> Markdown | JSON | PDF | FHIR R4 DiagnosticReport
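
The parallel search and score-weighted merge stages in the diagram can be sketched as follows. `search_collection` is a stub returning canned hits so the example runs without Milvus, the per-collection weights are hypothetical, and applying the citation tiers to the raw similarity score (rather than the weighted score) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-collection weights; unlisted collections default to 1.0
COLLECTION_WEIGHTS = {"autoimmune_clinical_documents": 1.2, "autoimmune_literature": 1.0}

def search_collection(name: str, query: str) -> list[dict]:
    """Stand-in for a Milvus search; returns canned hits so the sketch runs offline."""
    canned = {
        "autoimmune_clinical_documents": [{"id": "doc-1", "score": 0.70}],
        "autoimmune_literature": [
            {"id": "pmid-1", "score": 0.78},
            {"id": "pmid-2", "score": 0.55},
        ],
    }
    return [dict(hit, collection=name) for hit in canned.get(name, [])]

def relevance_tier(score: float) -> str:
    """Citation tiers from the diagram: high >= 0.75, medium >= 0.60."""
    if score >= 0.75:
        return "high"
    if score >= 0.60:
        return "medium"
    return "low"

def parallel_search(query: str, collections: list[str], top_k: int = 10) -> list[dict]:
    """Search all collections concurrently, then merge and rank by weighted score."""
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        per_collection = list(pool.map(lambda name: search_collection(name, query), collections))
    merged = []
    for hits in per_collection:
        for hit in hits:
            weight = COLLECTION_WEIGHTS.get(hit["collection"], 1.0)
            merged.append({**hit, "weighted": hit["score"] * weight,
                           "tier": relevance_tier(hit["score"])})
    merged.sort(key=lambda h: h["weighted"], reverse=True)
    return merged[:top_k]

for hit in parallel_search("low complement and proteinuria",
                           ["autoimmune_clinical_documents", "autoimmune_literature"]):
    print(hit["id"], hit["tier"], round(hit["weighted"], 2))
```

Threads suit this stage because each search is an I/O-bound network call, so the collections can be queried concurrently without contending for the interpreter.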

5.2 Design Principles

Patient-centric: Collections are organized around the patient's complete clinical journey, not around individual tests or encounters
Longitudinal: Temporal relationships between clinical events are preserved and queryable
Genomic-first: Every diagnostic hypothesis is automatically cross-referenced against the patient's genomic data (if available)
Evidence-grounded: Every claim cites specific clinical documents, laboratory values, or reference literature
Multi-disease aware: The system evaluates evidence for all 13+ supported autoimmune conditions simultaneously, detecting overlaps and polyautoimmunity
Privacy-preserving: Patient documents remain local on the DGX Spark; raw clinical documents are never sent to cloud APIs, and only the query and de-identified evidence snippets reach the LLM

5.3 Port Allocation

Service	Port	Protocol
FastAPI (REST endpoints)	8530	HTTP
Streamlit UI	8531	HTTP
Document ingestion webhook	8532	HTTP
Shared Milvus	19530	gRPC
Shared etcd	2379	gRPC
Shared MinIO	9000	HTTP

6. Clinical Document Ingestion Pipeline

6.1 The Core Innovation

The clinical document ingestion pipeline is the defining capability of the Precision Autoimmune Agent. Unlike other agents that work with curated reference data, this agent ingests a patient's actual clinical records -- potentially thousands of documents spanning years of care -- and transforms them into searchable, structured vectors that can be queried alongside reference knowledge.

6.2 Ingestion Architecture

Input Sources:
  - PDF clinical documents (scanned or digital)
  - HL7 FHIR R4 Bundles (from patient portals, EHR exports)
  - C-CDA documents (Consolidated Clinical Document Architecture)
  - Free-text clinical notes (typed or dictated)
  - Laboratory result files (HL7 ORU messages, CSV exports)
  - Genetic testing reports (PDF from Invitae, GeneDx, Color Health)

Processing Pipeline:
  Step 1: Document Classification
    - Classify each document by type: progress_note, lab_report, imaging_report,
      pathology_report, procedure_note, genetic_report, medication_list, referral
    - Use document header patterns, section markers, and content heuristics

  Step 2: OCR and Text Extraction
    - Digital PDFs: direct text extraction via PyMuPDF
    - Scanned PDFs: OCR via Tesseract with medical vocabulary enhancement
    - HL7/FHIR: structured field extraction via fhir.resources library
    - C-CDA: XML parsing with section-level extraction

  Step 3: Section Segmentation
    - Identify clinical note sections: Chief Complaint, HPI, Review of Systems,
      Physical Exam, Assessment/Plan, Laboratory Results, Imaging
    - Segment laboratory reports into individual test results
    - Extract structured findings from imaging report impressions

  Step 4: Medical NLP Entity Extraction
    - Laboratory values: extract test name, numeric value, unit, reference range, flag
    - Medications: extract drug name, dose, frequency, route, start/stop dates
    - Diagnoses: extract ICD-10 codes, problem descriptions, laterality
    - Symptoms: extract symptom descriptions, severity, duration, anatomic location
    - Vital signs: extract HR, BP, temperature, weight, orthostatic measurements
    - Autoantibodies: extract antibody name, titer, pattern, interpretation

  Step 5: Temporal Normalization
    - Extract document dates and encounter dates
    - Construct patient timeline with all clinical events
    - Normalize date formats across different source systems
    - Calculate intervals between events (onset-to-diagnosis, treatment-to-response)

  Step 6: Embedding and Indexing
    - Generate embedding text from structured entities + context
    - Embed with BGE-small-en-v1.5 (384-dim)
    - Insert into patient-specific Milvus collections with full metadata
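
The laboratory value extraction in Step 4 can be illustrated with a small rule-based parser. This is a minimal sketch that assumes one result per line in the form "NAME VALUE UNIT (LOW-HIGH) FLAG"; the pattern and the `parse_lab_line` helper are hypothetical, and real lab reports vary far more than this.

```python
import re
from typing import Optional

# Assumed line shape: "CRP 12.4 mg/L (0.0-5.0) H"; unit, range, and flag are optional.
# Limitation: a lone trailing H/L/A/N is always read as a flag, never as a unit.
LAB_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z][\w \-]*?)\s*:?\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>(?![HLAN]\s*$)[A-Za-z%/]+)?\s*"
    r"(?:\(\s*(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)\s*\))?\s*"
    r"(?P<flag>[HLAN])?\s*$"
)

def parse_lab_line(line: str) -> Optional[dict]:
    """Parse one lab result line into a dict, or return None if it does not match."""
    m = LAB_LINE.match(line)
    if m is None:
        return None
    return {
        "test_name": m["name"].strip(),
        "value": float(m["value"]),
        "unit": m["unit"] or "",
        "reference_low": float(m["low"]) if m["low"] else None,
        "reference_high": float(m["high"]) if m["high"] else None,
        "flag": m["flag"],
    }

for line in ["CRP 12.4 mg/L (0.0-5.0) H", "C3: 78 mg/dL (90-180) L", "ANA negative"]:
    print(parse_lab_line(line))
```

Qualitative results such as "ANA negative" fall through to `None` here; they would need a separate rule set, for example for titers and staining patterns.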

6.3 Entity Extraction Models

The NLP pipeline uses a combination of approaches optimized for medical text:

Component	Model/Approach	Purpose
Named entity recognition	SciSpaCy (en_core_sci_lg)	Clinical entity detection
Negation detection	NegEx algorithm	Filter negated findings ("no rash")
Laboratory value extraction	Regex + rule-based parser	Structured lab result extraction
Medication extraction	MedEx-UIMA adapted patterns	Drug/dose/frequency parsing
Temporal expression	SUTime-based parser	Date and duration normalization
Section segmentation	SecTag algorithm patterns	Clinical note section identification
Abbreviation expansion	UMLS abbreviation dictionary	Medical abbreviation resolution

6.4 Privacy and Security

Clinical document ingestion introduces significant privacy considerations:

All processing is local: Document ingestion, NLP extraction, embedding, and Milvus indexing occur entirely on the DGX Spark. No raw patient documents leave the device; the only outbound content is the de-identified LLM prompt described in the next item.
LLM query anonymization: When the system sends a query to Claude for synthesis, the prompt contains only the question and retrieved evidence snippets -- not raw clinical documents. Patient identifiers are stripped from evidence text before LLM submission.
Patient isolation: Each patient's clinical documents are stored in a patient-specific Milvus partition, ensuring query isolation between patients.
Encryption at rest: Milvus data on the DGX Spark NVMe SSD uses LUKS full-disk encryption.
Access control: API endpoints for document ingestion and patient queries require JWT authentication with role-based access control (clinician, researcher, administrator).
Audit logging: All document ingestion events and patient queries are logged with timestamps, user identity, and document identifiers to support HIPAA audit requirements.
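
The identifier stripping described under LLM query anonymization could look roughly like the sketch below. The patterns and the `scrub` helper are illustrative assumptions only; regular expressions alone are not sufficient for HIPAA-grade de-identification, which would need a validated tool and human review.

```python
import re
from collections.abc import Sequence

# Illustrative patterns only; order matters (SSN before the looser phone pattern)
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\bMRN[:#\s]*\d+\b", re.IGNORECASE), "[MRN]"),
    (re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def scrub(text: str, known_names: Sequence[str] = ()) -> str:
    """Replace pattern-matched identifiers and any known patient names with placeholders."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text

snippet = "Jane Doe (MRN: 12345678), phone 555-123-4567, anti-dsDNA 45 IU/mL."
print(scrub(snippet, known_names=["Jane Doe"]))
```

The clinical content (here the anti-dsDNA value) is preserved so the evidence snippet remains useful for synthesis after identifiers are removed.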

7. Milvus Collection Design

7.1 Index Configuration

All collections share the same index configuration:

Parameter	Value
Embedding model	BGE-small-en-v1.5
Dimensions	384
Index type	IVF_FLAT
nlist	1024
Metric	COSINE
nprobe (search)	16
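
Expressed as code, the table above maps onto two parameter dictionaries in the shape that pymilvus accepts for index creation and search. The sketch keeps them as plain Python so it runs without a Milvus server; the commented calls show where they would be passed, and the `embedding` field name follows the schemas in Section 7.2.

```python
EMBEDDING_DIM = 384  # BGE-small-en-v1.5 output size

INDEX_PARAMS = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 1024},   # number of IVF clusters built at index time
}

SEARCH_PARAMS = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16},    # number of clusters scanned per query
}

def probed_fraction(index_params: dict, search_params: dict) -> float:
    """Fraction of IVF clusters scanned per query (nprobe / nlist)."""
    return search_params["params"]["nprobe"] / index_params["params"]["nlist"]

print(probed_fraction(INDEX_PARAMS, SEARCH_PARAMS))  # 0.015625

# With a pymilvus Collection object, these would typically be used as:
#   collection.create_index(field_name="embedding", index_params=INDEX_PARAMS)
#   collection.search(data=[query_vector], anns_field="embedding",
#                     param=SEARCH_PARAMS, limit=10)
```

With nlist = 1024 and nprobe = 16, each query scans about 1.6% of the clusters; raising nprobe trades latency for recall.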

7.2 Collection Schemas

Collection 1: autoimmune_clinical_documents -- Patient clinical document chunks

Field	Type	Description
id	VARCHAR(64)	Unique chunk identifier
patient_id	VARCHAR(32)	Patient identifier
document_type	VARCHAR(32)	progress_note, lab_report, imaging_report, pathology, procedure, genetic_report
document_date	VARCHAR(10)	ISO-8601 date
provider_specialty	VARCHAR(32)	Specialty of documenting provider
section	VARCHAR(32)	Note section (hpi, ros, exam, assessment, plan, labs, imaging)
text_chunk	VARCHAR(4096)	Document text chunk
extracted_entities	VARCHAR(2048)	JSON-encoded extracted entities
source_system	VARCHAR(32)	EHR/source system identifier
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector
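
Because `text_chunk` is capped at 4096 and `id` at 64, documents have to be split and identified before insertion. The following is a minimal sketch of one way to do that, assuming the VARCHAR limit is counted in UTF-8 bytes and that paragraph boundaries are acceptable split points; `chunk_text` and `chunk_id` are hypothetical helper names.

```python
import hashlib

MAX_CHUNK_BYTES = 4096  # matches text_chunk VARCHAR(4096), assumed to be a byte limit

def chunk_text(text: str, max_bytes: int = MAX_CHUNK_BYTES) -> list[str]:
    """Greedily pack paragraphs into chunks no larger than max_bytes (UTF-8)."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Hard-split a single paragraph that is itself too large
        encoded = para.encode("utf-8")
        while len(encoded) > max_bytes:
            piece = encoded[:max_bytes].decode("utf-8", errors="ignore")
            chunks.append(piece)
            encoded = encoded[len(piece.encode("utf-8")):]
        current = encoded.decode("utf-8")
    if current:
        chunks.append(current)
    return chunks

def chunk_id(patient_id: str, document_id: str, index: int) -> str:
    """Deterministic 64-character hex id, sized for the id VARCHAR(64) field."""
    return hashlib.sha256(f"{patient_id}|{document_id}|{index}".encode()).hexdigest()

note = "HPI: joint pain for 3 months.\n\nExam: malar rash.\n\nPlan: order ANA, C3, C4."
for i, chunk in enumerate(chunk_text(note, max_bytes=40)):
    print(chunk_id("P001", "note-17", i)[:8], repr(chunk))
```

Deterministic ids make re-ingestion idempotent: the same document and chunk index always yield the same key.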

Collection 2: autoimmune_patient_labs -- Structured laboratory results extracted from clinical documents

Field	Type	Description
id	VARCHAR(64)	Unique result identifier
patient_id	VARCHAR(32)	Patient identifier
test_name	VARCHAR(64)	Laboratory test name (standardized)
loinc_code	VARCHAR(16)	LOINC code for standardized identification
value	FLOAT	Numeric result value
unit	VARCHAR(16)	Unit of measurement
reference_low	FLOAT	Lower reference range
reference_high	FLOAT	Upper reference range
flag	VARCHAR(8)	H (high), L (low), A (abnormal), N (normal)
collection_date	VARCHAR(10)	ISO-8601 date
text_context	VARCHAR(512)	Surrounding clinical context
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector
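
A row for this collection, together with the text that would be embedded for it, might be modeled as below. This is a sketch under assumptions: the `PatientLabRow` class, the rule for deriving the flag from the reference range, and the embedding text template are all illustrative, and the `A` flag is left to the source lab because it cannot be derived numerically.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PatientLabRow:
    patient_id: str
    test_name: str
    value: float
    unit: str
    reference_low: Optional[float]
    reference_high: Optional[float]
    collection_date: str  # ISO-8601 date, YYYY-MM-DD

    @property
    def flag(self) -> Optional[str]:
        """Derive H/L/N from the reference range; None when no range is available."""
        if self.reference_low is None and self.reference_high is None:
            return None
        if self.reference_low is not None and self.value < self.reference_low:
            return "L"
        if self.reference_high is not None and self.value > self.reference_high:
            return "H"
        return "N"

    def embedding_text(self) -> str:
        """Build the sentence that would be passed to the embedding model."""
        ref = ""
        if self.reference_low is not None and self.reference_high is not None:
            ref = f" (reference {self.reference_low:g}-{self.reference_high:g})"
        flag = f" flag {self.flag}" if self.flag else ""
        return f"{self.test_name} {self.value:g} {self.unit}{ref}{flag} on {self.collection_date}"

row = PatientLabRow("P001", "Complement C3", 78.0, "mg/dL", 90.0, 180.0, "2024-01-15")
print(row.flag, "|", row.embedding_text())
```

Embedding a short natural-language rendering of each result, rather than the bare number, lets semantic search match queries such as "low complement".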

Collection 3: autoimmune_patient_timeline -- Longitudinal patient events for temporal analysis

Field	Type	Description
id	VARCHAR(64)	Unique event identifier
patient_id	VARCHAR(32)	Patient identifier
event_type	VARCHAR(32)	symptom_onset, diagnosis, lab_result, medication_change, procedure, flare, hospitalization
event_date	VARCHAR(10)	ISO-8601 date
event_description	VARCHAR(512)	Event description
severity	VARCHAR(16)	mild, moderate, severe, critical
associated_diagnoses	VARCHAR(256)	Related diagnoses
associated_labs	VARCHAR(512)	Related laboratory values
text_summary	VARCHAR(1024)	Narrative summary
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector
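
Populating `event_date` and computing intervals such as onset-to-diagnosis requires dates in a single format. The sketch below normalizes a few assumed source formats to ISO-8601 and measures the interval between the first events of two types; the format list and helper names are hypothetical, and ambiguous day/month orderings would have to be settled per source system.

```python
from datetime import date, datetime
from typing import Optional

# Assumed source formats, tried in order
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y", "%Y%m%d")

def normalize_date(raw: str) -> str:
    """Convert a date string in any assumed source format to ISO-8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

def first_event_date(events: list[dict], event_type: str) -> Optional[date]:
    """Earliest date among events of the given type, or None if there are none."""
    dates = [date.fromisoformat(e["event_date"]) for e in events
             if e["event_type"] == event_type]
    return min(dates) if dates else None

def interval_days(events: list[dict], start_type: str, end_type: str) -> Optional[int]:
    """Days between the first start_type event and the first end_type event."""
    start = first_event_date(events, start_type)
    end = first_event_date(events, end_type)
    if start is None or end is None:
        return None
    return (end - start).days

# Synthetic timeline, not patient data
events = [
    {"event_type": "symptom_onset", "event_date": normalize_date("06/01/2019")},
    {"event_type": "lab_result", "event_date": normalize_date("20220310")},
    {"event_type": "diagnosis", "event_date": normalize_date("15-Jan-2024")},
]
print(interval_days(events, "symptom_onset", "diagnosis"), "days from onset to diagnosis")
```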

Collection 4: autoimmune_literature -- Published autoimmune research

Field	Type	Description
id	VARCHAR(64)	PMID or unique identifier
title	VARCHAR(256)	Article title
text_chunk	VARCHAR(4096)	Abstract or text chunk
disease_category	VARCHAR(64)	Primary autoimmune disease
study_type	VARCHAR(32)	meta_analysis, rct, cohort, case_report, review
year	INT16	Publication year
journal	VARCHAR(64)	Journal name
keywords	VARCHAR(512)	Author keywords
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 5: autoimmune_trials -- Clinical trial registrations

Field	Type	Description
id	VARCHAR(64)	NCT number
title	VARCHAR(256)	Trial title
text_summary	VARCHAR(2048)	Brief summary
phase	VARCHAR(8)	Phase 1, 2, 3, 4
status	VARCHAR(32)	Recruiting, completed, active
disease	VARCHAR(64)	Target disease
intervention	VARCHAR(128)	Drug or intervention
sponsor	VARCHAR(64)	Lead sponsor
enrollment	INT32	Target enrollment
start_year	INT16	Year trial started
primary_endpoint	VARCHAR(256)	Primary outcome measure
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 6: autoimmune_autoantibodies -- Autoantibody reference database

Field	Type	Description
id	VARCHAR(64)	Unique identifier
antibody_name	VARCHAR(32)	Autoantibody name (ANA, anti-dsDNA, RF, anti-CCP, etc.)
text_summary	VARCHAR(2048)	Clinical description
associated_diseases	VARCHAR(256)	Associated autoimmune conditions
sensitivity	FLOAT	Diagnostic sensitivity
specificity	FLOAT	Diagnostic specificity
pattern	VARCHAR(32)	Staining pattern (for ANA)
clinical_significance	VARCHAR(512)	Clinical interpretation guidance
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 7: autoimmune_hla_associations -- HLA-disease association database

Field	Type	Description
id	VARCHAR(64)	Unique identifier
hla_allele	VARCHAR(16)	HLA allele (e.g., B*27:05, DRB1*04:01)
text_summary	VARCHAR(2048)	Association description
disease	VARCHAR(64)	Associated disease
odds_ratio	FLOAT	Odds ratio for disease risk
population	VARCHAR(32)	Study population
pmid	VARCHAR(16)	PubMed reference ID
mechanism	VARCHAR(512)	Proposed molecular mechanism
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 8: autoimmune_disease_activity -- Disease activity scoring reference

Field	Type	Description
id	VARCHAR(64)	Unique identifier
score_name	VARCHAR(32)	DAS28-CRP, SLEDAI-2K, CDAI, BASDAI, etc.
text_summary	VARCHAR(2048)	Score description and interpretation
disease	VARCHAR(64)	Applicable disease
components	VARCHAR(512)	Score components
thresholds	VARCHAR(256)	Activity level thresholds (remission, low, moderate, high)
reference	VARCHAR(16)	PMID reference
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 9: autoimmune_biologics -- Biologic therapy reference

Field	Type	Description
id	VARCHAR(64)	Unique identifier
drug_name	VARCHAR(32)	Drug name
drug_class	VARCHAR(32)	TNF inhibitor, IL-6 inhibitor, JAK inhibitor, etc.
text_summary	VARCHAR(2048)	Drug description and mechanism
mechanism	VARCHAR(256)	Mechanism of action
indicated_diseases	VARCHAR(256)	FDA-approved indications
pgx_considerations	VARCHAR(512)	Pharmacogenomic considerations
contraindications	VARCHAR(256)	Contraindications
monitoring	VARCHAR(256)	Required monitoring
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 10: autoimmune_overlap_syndromes -- Overlap and polyautoimmunity reference

Field	Type	Description
id	VARCHAR(64)	Unique identifier
syndrome_name	VARCHAR(64)	Overlap syndrome name
text_summary	VARCHAR(2048)	Clinical description
component_diseases	VARCHAR(256)	Component diseases
diagnostic_criteria	VARCHAR(512)	Diagnostic criteria
key_autoantibodies	VARCHAR(128)	Characteristic antibodies
prevalence	VARCHAR(32)	Estimated prevalence
management	VARCHAR(512)	Treatment approach
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 11: autoimmune_guidelines -- Clinical practice guidelines

Field	Type	Description
id	VARCHAR(64)	Unique identifier
guideline_title	VARCHAR(256)	Guideline title
text_chunk	VARCHAR(4096)	Guideline text chunk
issuing_body	VARCHAR(64)	ACR, EULAR, AGA, AAN, etc.
disease	VARCHAR(64)	Target disease
year	INT16	Publication year
recommendation_strength	VARCHAR(16)	Strong, conditional, expert consensus
evidence_level	VARCHAR(8)	High, moderate, low, very low
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 12: autoimmune_dysautonomia -- POTS/dysautonomia reference knowledge

Field	Type	Description
id	VARCHAR(64)	Unique identifier
condition	VARCHAR(64)	POTS, neurocardiogenic syncope, MSA, AAG, etc.
text_summary	VARCHAR(2048)	Clinical description
diagnostic_criteria	VARCHAR(512)	Diagnostic criteria
autonomic_tests	VARCHAR(256)	Recommended autonomic testing
comorbidities	VARCHAR(256)	Common comorbidities (hEDS, MCAS, SFN)
treatment_options	VARCHAR(512)	Treatment approaches
autoimmune_associations	VARCHAR(256)	Known autoimmune associations
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 13: genomic_evidence -- Shared genomic variant data (read-only)

Field	Type	Description
id	VARCHAR(64)	Variant identifier
text_summary	VARCHAR(2048)	Variant description
chrom	VARCHAR(2)	Chromosome
pos	INT64	Position
ref	VARCHAR(512)	Reference allele
alt	VARCHAR(512)	Alternate allele
gene	VARCHAR(16)	Gene symbol
consequence	VARCHAR(32)	Variant consequence
clinical_significance	VARCHAR(32)	ClinVar classification
disease_associations	VARCHAR(512)	Associated diseases
am_pathogenicity	FLOAT	AlphaMissense score
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector

Collection 14: autoimmune_biomarker_trends -- Longitudinal biomarker trend data

Field	Type	Description
id	VARCHAR(64)	Unique identifier
patient_id	VARCHAR(32)	Patient identifier
biomarker	VARCHAR(32)	Biomarker name
trend_type	VARCHAR(16)	rising, falling, stable, fluctuating
values_json	VARCHAR(1024)	JSON array of {date, value} pairs
trend_slope	FLOAT	Calculated trend slope
clinical_significance	VARCHAR(256)	Clinical interpretation
associated_disease	VARCHAR(64)	Disease context
alert_level	VARCHAR(16)	normal, watch, warning, critical
text_summary	VARCHAR(512)	Trend narrative
embedding	FLOAT_VECTOR(384)	BGE-small-en-v1.5 vector
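
As a minimal sketch of how `trend_slope` and `trend_type` might be derived from `values_json`, the snippet below fits an ordinary least-squares line through the {date, value} pairs. The function names, the 0.01 tolerance, and the sample C3 values are assumptions for illustration; the "fluctuating" category is not handled here.

```python
import json
from datetime import date


def trend_slope(values_json: str) -> float:
    """Ordinary least-squares slope, in value units per day."""
    points = json.loads(values_json)
    if len(points) < 2:
        return 0.0
    xs = [date.fromisoformat(p["date"]).toordinal() for p in points]
    ys = [float(p["value"]) for p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all measurements taken on the same day
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_trend(slope: float, tolerance: float = 0.01) -> str:
    """Map a slope to a trend_type; the tolerance is a hypothetical cutoff."""
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "falling"
    return "stable"


# Made-up complement C3 values, for illustration only.
c3_values = json.dumps([
    {"date": "2025-01-01", "value": 100},
    {"date": "2025-01-11", "value": 90},
    {"date": "2025-01-21", "value": 80},
])
slope = trend_slope(c3_values)
print(slope, classify_trend(slope))
```

A per-biomarker tolerance would likely be needed in practice, since units and clinically meaningful rates of change differ between markers.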

7.3 Collection Search Weights

Collection	Weight	Rationale
Clinical Documents	0.18	Patient-specific primary evidence
Patient Labs	0.16	Quantitative diagnostic data
Patient Timeline	0.10	Longitudinal pattern context
Literature	0.10	Published evidence base
Autoantibodies	0.08	Diagnostic specificity data
HLA Associations	0.07	Genetic susceptibility evidence
Disease Activity	0.06	Scoring system reference
Guidelines	0.06	Clinical practice standards
Overlap Syndromes	0.05	Multi-disease pattern recognition
Biologics	0.04	Treatment reference
Dysautonomia	0.04	POTS/autonomic dysfunction reference
Genomic Evidence	0.03	Variant-level genetic data
Biomarker Trends	0.02	Longitudinal biomarker patterns
Trials	0.01	Clinical trial context
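
The table does not state how the weights enter the ranking. One plausible reading, sketched below, multiplies each hit's similarity by its collection weight before merging results. The multiplicative rule, the `rank_hits` helper, and the sample hits are assumptions; collections are keyed by the labels used in the table.

```python
COLLECTION_WEIGHTS = {
    "Clinical Documents": 0.18,
    "Patient Labs": 0.16,
    "Patient Timeline": 0.10,
    "Literature": 0.10,
    "Autoantibodies": 0.08,
    "HLA Associations": 0.07,
    "Disease Activity": 0.06,
    "Guidelines": 0.06,
    "Overlap Syndromes": 0.05,
    "Biologics": 0.04,
    "Dysautonomia": 0.04,
    "Genomic Evidence": 0.03,
    "Biomarker Trends": 0.02,
    "Trials": 0.01,
}


def rank_hits(hits):
    """Rank (collection, doc_id, similarity) hits by similarity * weight.

    Similarity is assumed to lie in [0, 1], higher meaning more similar.
    """
    scored = [
        (similarity * COLLECTION_WEIGHTS[collection], collection, doc_id)
        for collection, doc_id, similarity in hits
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical hits from three collections.
example_hits = [
    ("Trials", "trial-1", 0.95),
    ("Clinical Documents", "note-1", 0.80),
    ("Literature", "article-1", 0.90),
]
for score, collection, doc_id in rank_hits(example_hits):
    print(f"{score:.4f}  {collection}  {doc_id}")
```

Under this rule a moderately similar clinical note outranks a highly similar trial record, which matches the stated intent of treating patient-specific documents as primary evidence.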

7.4 Estimated Vector Counts

Collection	Estimated Vectors	Source
Clinical Documents	5,000-50,000 per patient	Patient clinical records
Patient Labs	1,000-10,000 per patient	Extracted laboratory results
Patient Timeline	100-1,000 per patient	Longitudinal events
Literature	8,000	PubMed autoimmune corpus
Trials	2,500	ClinicalTrials.gov
Autoantibodies	200	Curated reference data
HLA Associations	300	Curated from GWAS/literature
Disease Activity	150	Scoring system reference
Biologics	100	Drug reference data
Overlap Syndromes	80	Curated reference data
Guidelines	500	ACR/EULAR/AGA guidelines
Dysautonomia	150	Curated reference data
Genomic Evidence	3,561,170	HCLS AI Factory Stage 1+2 (shared, read-only)
Biomarker Trends	100-5,000 per patient	Computed from Patient Labs
Reference total	~3,573,150
Per-patient addition	~6,200-66,000
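
The totals can be checked by summing the rows, and the same figures give a rough lower bound on raw embedding storage. This sketch assumes 384-dimensional vectors stored as 4-byte floats and ignores index structures and scalar fields, so real memory use would be higher.

```python
REFERENCE_VECTORS = {
    "Literature": 8_000,
    "Trials": 2_500,
    "Autoantibodies": 200,
    "HLA Associations": 300,
    "Disease Activity": 150,
    "Biologics": 100,
    "Overlap Syndromes": 80,
    "Guidelines": 500,
    "Dysautonomia": 150,
    "Genomic Evidence": 3_561_170,
}
PER_PATIENT_RANGES = {
    "Clinical Documents": (5_000, 50_000),
    "Patient Labs": (1_000, 10_000),
    "Patient Timeline": (100, 1_000),
    "Biomarker Trends": (100, 5_000),
}
EMBEDDING_DIM = 384
BYTES_PER_FLOAT = 4  # assumption: float32 vector storage

reference_total = sum(REFERENCE_VECTORS.values())
per_patient_low = sum(low for low, _ in PER_PATIENT_RANGES.values())
per_patient_high = sum(high for _, high in PER_PATIENT_RANGES.values())
raw_bytes = reference_total * EMBEDDING_DIM * BYTES_PER_FLOAT

print(f"reference vectors: {reference_total:,}")
print(f"per-patient vectors: {per_patient_low:,} to {per_patient_high:,}")
print(f"raw reference embeddings: {raw_bytes / 2**30:.2f} GiB")
```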

8. Clinical Workflows

8.1 Workflow Architecture

Each workflow follows the pattern: Trigger -> Ingest -> Extract -> Correlate -> Score -> Report. Workflows can be triggered manually by clinician query, automatically upon document ingestion, or on a scheduled basis for longitudinal monitoring.

8.2 Eight Reference Workflows

Workflow 1: Diagnostic Odyssey Accelerator

Purpose: Ingest a patient's complete clinical record set and generate a prioritized differential diagnosis for suspected autoimmune disease.

Trigger: Clinician uploads patient's clinical document package or connects to EHR export.

Process:

1. Ingest all clinical documents through the document ingestion pipeline (Section 6)
2. Extract all laboratory values, autoantibody results, imaging findings, symptoms, and diagnoses
3. Construct patient timeline with all clinical events
4. Query all 14 collections simultaneously to identify patterns consistent with autoimmune diseases
5. Score each candidate diagnosis using classification criteria:
   - SLE: 2019 ACR/EULAR criteria (score >= 10 with ANA entry criterion)
   - RA: 2010 ACR/EULAR criteria (score >= 6)
   - Sjogren's: 2016 ACR/EULAR criteria (score >= 4)
   - Systemic sclerosis: 2013 ACR/EULAR criteria (score >= 9)
   - AS: Modified New York criteria or ASAS criteria
6. Cross-reference with genomic evidence (HLA alleles, autoimmune risk variants)
7. Generate diagnostic hypothesis report with confidence scores, supporting evidence, and recommended next steps

Output: Ranked differential diagnosis with evidence citations, classification criteria scores, recommended confirmatory tests, and urgency assessment.

Example query: "This 34-year-old woman has been evaluated by 5 specialists over 3 years. What autoimmune diagnoses are supported by her clinical record?"

Workflow 2: Lupus Nephritis Surveillance

Purpose: Monitor SLE patients for early signs of renal involvement through longitudinal biomarker tracking.

Trigger: Automated quarterly review or upon ingestion of new laboratory results.

Process:

1. Track longitudinal biomarkers: anti-dsDNA titers, complement C3/C4, urinalysis (protein, casts), serum creatinine, urine protein-to-creatinine ratio
2. Detect flare-predictive patterns: rising anti-dsDNA + falling complement precedes clinical flare by 4-12 weeks
3. Calculate SLEDAI-2K renal domain score from available data
4. Cross-reference with ISN/RPS 2003 lupus nephritis classification if biopsy data available
5. Assess current immunosuppressive regimen against ACR 2024 lupus nephritis guidelines
6. Generate surveillance report with trend visualizations and action items

Output: Lupus nephritis risk assessment, biomarker trend analysis, SLEDAI-2K score, treatment adequacy evaluation, and recommended monitoring schedule.

Example query: "Has this lupus patient's complement been trending down? What is her current nephritis risk?"

Workflow 3: POTS/Dysautonomia Evaluation

Purpose: Identify POTS and associated conditions (hEDS, MCAS, small fiber neuropathy) from clinical records, reducing the typical 5-7 year diagnostic delay.

Trigger: Clinician query or detection of orthostatic vital sign patterns in ingested records.

Process:

1. Extract orthostatic vital signs from clinical documents (supine vs. standing HR and BP)
2. Identify POTS criteria: sustained HR increase >= 30 bpm (>= 40 bpm if 12-19 years old) within 10 minutes of standing, without orthostatic hypotension
3. Screen for comorbid conditions:
   - Ehlers-Danlos syndrome: joint hypermobility scores, skin findings, family history
   - Mast cell activation syndrome: tryptase levels, prostaglandin D2, histamine metabolites
   - Small fiber neuropathy: IENFD biopsy results, QSART data, sudomotor testing
4. Cross-reference with autoimmune etiologies: Sjogren's-associated autonomic neuropathy, autoimmune autonomic ganglionopathy (ganglionic AChR antibodies)
5. Check genomic evidence for relevant variants (TPSAB1 for hereditary alpha-tryptasemia, COL5A1/COL3A1 for EDS)
6. Generate comprehensive dysautonomia evaluation report

Output: POTS diagnostic assessment, comorbidity screening results, autoimmune etiology evaluation, genetic risk factors, and management recommendations.

Example query: "This patient has had tachycardia, fatigue, and syncope for 2 years. Do her records support a POTS diagnosis? Are there autoimmune associations?"
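
The heart-rate criterion in step 2 of this workflow is concrete enough to sketch. The example below checks only the orthostatic vital-sign portion (age-dependent HR increase, absence of orthostatic hypotension); symptom duration and exclusion of other causes are not modeled. The `OrthostaticReading` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrthostaticReading:
    age_years: int
    supine_hr: int
    standing_hr: int   # sustained HR within 10 minutes of standing
    supine_sbp: int
    supine_dbp: int
    standing_sbp: int
    standing_dbp: int


def hr_threshold(age_years: int) -> int:
    """Required HR increase in bpm: 40 for ages 12-19, otherwise 30."""
    return 40 if 12 <= age_years <= 19 else 30


def has_orthostatic_hypotension(r: OrthostaticReading) -> bool:
    """BP drop greater than 20 mmHg systolic or 10 mmHg diastolic."""
    return (r.supine_sbp - r.standing_sbp) > 20 or (r.supine_dbp - r.standing_dbp) > 10


def meets_pots_vital_sign_criteria(r: OrthostaticReading) -> bool:
    hr_increase = r.standing_hr - r.supine_hr
    return hr_increase >= hr_threshold(r.age_years) and not has_orthostatic_hypotension(r)


# Hypothetical adult with a 42 bpm rise and stable blood pressure.
adult = OrthostaticReading(28, 70, 112, 118, 76, 114, 74)
print(meets_pots_vital_sign_criteria(adult))
```

A positive result here would only flag the record for clinician review; it is a screening signal, and the remaining criteria still require clinical judgment.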

Workflow 4: Inflammatory Arthritis Differentiation

Purpose: Differentiate between rheumatoid arthritis, psoriatic arthritis, reactive arthritis, crystal arthropathies, and lupus arthritis using clinical document analysis.

Trigger: Clinician query for a patient presenting with inflammatory joint symptoms.

Process:

1. Extract joint examination findings from clinical documents (tender joints, swollen joints, distribution pattern)
2. Retrieve autoantibody results: RF, anti-CCP, ANA, HLA-B27
3. Extract imaging findings: erosions, joint space narrowing, enthesitis, dactylitis, sacroiliitis
4. Calculate DAS28-CRP and CDAI if components available
5. Apply classification criteria for RA (2010 ACR/EULAR), PsA (CASPAR), SpA (ASAS), and gout (2015 ACR/EULAR)
6. Cross-reference with HLA associations: B*27 for SpA, DRB1 shared epitope for RA, C*06:02 for psoriasis
7. Generate differential diagnosis with evidence

Output: Inflammatory arthritis differential diagnosis, classification criteria scores, HLA risk profile, disease activity assessment, and treatment pathway recommendations.

Workflow 5: Overlap Syndrome Detection

Purpose: Identify multi-disease autoimmune overlap syndromes that single-disease evaluation would miss.

Trigger: Detection of autoantibody patterns or symptom constellations spanning multiple autoimmune categories.

Process:

1. Analyze complete autoantibody profile for overlap patterns:
   - Anti-U1-RNP at high titer: mixed connective tissue disease (MCTD)
   - ANA + anti-dsDNA + RF + anti-CCP: rhupus (RA + SLE overlap)
   - Anti-SSA + anti-centromere: Sjogren's + limited systemic sclerosis overlap
   - Anti-Jo-1 + anti-SSA: antisynthetase syndrome with Sjogren's features
2. Screen for polyautoimmune syndromes (APS-1, APS-2, IPEX)
3. Cross-reference with genomic evidence for shared susceptibility loci (STAT4, IRF5, PTPN22, TNFAIP3)
4. Evaluate for immune dysregulation syndromes if multiple autoimmune conditions present
5. Generate overlap syndrome report with component disease assessments

Output: Overlap syndrome assessment, polyautoimmunity risk evaluation, shared genetic susceptibility analysis, and integrated management recommendations.
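
The autoantibody overlap patterns in step 1 can be expressed as subset rules over the set of positive antibodies. This is a sketch only: the rule table mirrors the four patterns listed in the process, the high-titer requirement for anti-U1-RNP is represented by a separate label that the caller would set after applying a titer cutoff, and all names are hypothetical.

```python
from typing import Iterable, List

OVERLAP_RULES = [
    (frozenset({"anti-U1-RNP (high titer)"}),
     "Mixed connective tissue disease (MCTD)"),
    (frozenset({"ANA", "anti-dsDNA", "RF", "anti-CCP"}),
     "Rhupus (RA + SLE overlap)"),
    (frozenset({"anti-SSA", "anti-centromere"}),
     "Sjogren's + limited systemic sclerosis overlap"),
    (frozenset({"anti-Jo-1", "anti-SSA"}),
     "Antisynthetase syndrome with Sjogren's features"),
]


def detect_overlap_patterns(positive_antibodies: Iterable[str]) -> List[str]:
    """Return every overlap pattern whose required antibodies are all positive."""
    positives = set(positive_antibodies)
    return [label for required, label in OVERLAP_RULES if required <= positives]


# Hypothetical antibody profile.
print(detect_overlap_patterns({"ANA", "anti-dsDNA", "RF", "anti-CCP"}))
```

Because rules are evaluated independently, a profile can match several patterns at once, which suits a workflow aimed at polyautoimmunity.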

Workflow 6: Biologic Therapy Optimization

Purpose: Recommend biologic therapies based on disease profile, prior treatment responses, and pharmacogenomic considerations.

Trigger: Clinician query about treatment options or detection of inadequate disease control.

Process:

1. Assess current disease activity from most recent clinical data
2. Review treatment history: prior biologics, duration, response, reason for discontinuation
3. Apply the biologic therapy database (8 drugs, expandable to 20+) with indication matching across these classes:
   - TNF inhibitors: adalimumab, etanercept, infliximab, certolizumab, golimumab
   - IL-6 inhibitors: tocilizumab, sarilumab
   - IL-17 inhibitors: secukinumab, ixekizumab
   - IL-12/23 inhibitors: ustekinumab
   - IL-23 inhibitors: guselkumab, risankizumab
   - B-cell depleters: rituximab, obinutuzumab
   - BLyS inhibitors: belimumab
   - JAK inhibitors: tofacitinib, baricitinib, upadacitinib
   - T-cell co-stimulation modulators: abatacept
4. Apply pharmacogenomic filters:
   - HLA-DRB1*03:01: increased risk of anti-drug antibodies with adalimumab
   - FCGR3A V158F: affects rituximab ADCC efficacy
   - CYP3A4/CYP2C19: affects tofacitinib metabolism
   - IL6R Asp358Ala (rs2228145): affects tocilizumab response
   - HLA-C*06:02: predicts better PASI response to secukinumab
5. Check contraindications against patient profile (TB status, hepatitis B, heart failure, IBD)
6. Generate personalized treatment recommendation with evidence

Output: Ranked biologic therapy recommendations, pharmacogenomic considerations, contraindication alerts, monitoring requirements, and switching rationale from current therapy.

Workflow 7: Flare Prediction and Prevention

Purpose: Predict autoimmune disease flares by detecting early biomarker patterns in longitudinal laboratory data.

Trigger: Automated analysis upon ingestion of new laboratory results, or scheduled weekly/monthly review.

Process:

1. Extract longitudinal biomarker data from patient labs collection
2. Calculate trend analysis for disease-specific biomarkers:
   - RA: CRP trend, ESR trend, IL-6, MMP-3, 14-3-3eta
   - SLE: anti-dsDNA titer trend, complement C3/C4 trends, lymphocyte count, proteinuria
   - IBD: fecal calprotectin trend, CRP, lactoferrin, albumin
3. Apply flare prediction algorithm:
   - Base risk: 0.3 (30% baseline)
   - Each elevated inflammatory marker: +0.15
   - Each falling protective marker (complement, albumin): +0.15
   - Stable markers: protective factor
4. Classify risk: Low (<0.4), Moderate (0.4-0.6), High (0.6-0.8), Imminent (>0.8)
5. Generate monitoring recommendations and preemptive intervention suggestions
6. Alert clinician if risk crosses threshold

Output: Flare risk assessment with contributing factors, protective factors, recommended biomarker monitoring schedule, and intervention recommendations.
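
The additive flare-risk rule in step 3 can be sketched directly. Three choices here are assumptions rather than part of the specification: the score is capped at 1.0, a value on a shared boundary (0.4 or 0.6) is assigned to the higher band, and stable markers are reported as protective factors without changing the score, since no numeric adjustment is given. Integer percentage points are used internally to avoid floating-point drift at the band edges.

```python
BASE_POINTS = 30   # 0.30 baseline risk
STEP_POINTS = 15   # +0.15 per contributing marker


def flare_risk(elevated_inflammatory: int, falling_protective: int) -> float:
    """Additive flare risk in [0, 1], computed in whole percentage points."""
    points = BASE_POINTS + STEP_POINTS * (elevated_inflammatory + falling_protective)
    return min(points, 100) / 100


def classify_risk(risk: float) -> str:
    if risk < 0.4:
        return "Low"
    if risk < 0.6:
        return "Moderate"
    if risk <= 0.8:
        return "High"
    return "Imminent"


# Hypothetical SLE patient: rising anti-dsDNA (elevated), falling C3 and C4.
risk = flare_risk(elevated_inflammatory=1, falling_protective=2)
print(risk, classify_risk(risk))
```

With a 0.15 step the reachable scores are 0.30, 0.45, 0.60, 0.75, 0.90, and 1.00, so three contributing markers already place a patient in the High band.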

Workflow 8: Genomic-Autoimmune Risk Profiling

Purpose: Analyze a patient's genomic data for autoimmune disease risk variants, combining HLA typing with non-HLA susceptibility loci.

Trigger: Availability of genomic data (from HCLS AI Factory Stage 1 or external genetic testing reports).

Process:

1. Query genomic_evidence collection for autoimmune-associated variants:
   - HLA alleles: B*27 (AS), DRB1*04 (RA), DRB1*03 (SLE, Sjogren's, T1D), DRB1*15 (MS), DQB1*02 (celiac), C*06 (psoriasis), B*51 (Behcet's), B*08 (MG)
   - Non-HLA risk genes: PTPN22 R620W (multiple autoimmune), STAT4 rs7574865 (SLE, RA), IRF5 rs2004640 (SLE), TNFAIP3 rs2230926 (SLE, RA), IL23R rs11209026 (IBD, psoriasis, AS), CTLA4 +49 A/G (autoimmune thyroid, T1D), TPSAB1 copy number (hereditary alpha-tryptasemia / MCAS)
2. Calculate polygenic risk scores for each autoimmune condition
3. Cross-reference genetic risk with clinical findings from patient documents
4. Identify pharmacogenomically actionable variants
5. Generate comprehensive genomic-autoimmune risk report

Output: Autoimmune genetic risk profile, HLA-disease associations with odds ratios, non-HLA risk variant assessment, pharmacogenomic actionable findings, and recommended genetic counseling topics.

9.1 The Genomic-Autoimmune Bridge

The shared genomic_evidence collection (3,561,170 variants) enables automatic correlation between a patient's genomic profile and their autoimmune clinical presentation. This bridge operates through several mechanisms:

HLA Typing from Genomic Data: When a patient's genome is processed through Stage 1 (Parabricks), HLA alleles can be extracted using HLA typing tools (OptiType for Class I, HLA-HD for Class I+II). These alleles are then automatically cross-referenced against the autoimmune_hla_associations collection to generate disease susceptibility profiles.

Non-HLA Autoimmune Risk Variants: The system queries the genomic_evidence collection for known autoimmune susceptibility loci:

Gene	Variant	Associated Diseases	OR Range
PTPN22	R620W (rs2476601)	RA, SLE, T1D, Hashimoto's, Graves'	1.5-2.0
STAT4	rs7574865	SLE, RA, Sjogren's	1.3-1.6
IRF5	rs2004640	SLE	1.4-1.8
TNFAIP3	rs2230926	SLE, RA	1.7-2.3
IL23R	rs11209026	Crohn's, UC, AS, psoriasis	0.4 (protective)
CTLA4	rs231775	T1D, autoimmune thyroid, RA	1.2-1.5
IL2RA	rs2104286	MS, T1D	1.1-1.3
BANK1	rs10516487	SLE	1.3-1.4
BLK	rs13277113	SLE	1.3-1.4
TPSAB1	CNV (duplication)	Hereditary alpha-tryptasemia / MCAS	4-6% prevalence
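
One common way to combine such loci into a single number is a log-additive score: the sum over variants of risk-allele dosage times ln(OR). The sketch below applies that to the SLE-associated rows of the table. It is illustrative only: the per-allele odds ratios are taken as the midpoints of the listed ranges, the loci are treated as independent, and the patient dosages are made up. It is not a validated polygenic risk score.

```python
import math
from typing import Mapping

# Midpoints of the OR ranges listed for SLE-associated loci (a simplification).
SLE_ODDS_RATIOS = {
    "rs2476601": 1.75,   # PTPN22 R620W, range 1.5-2.0
    "rs7574865": 1.45,   # STAT4, range 1.3-1.6
    "rs2004640": 1.60,   # IRF5, range 1.4-1.8
    "rs2230926": 2.00,   # TNFAIP3, range 1.7-2.3
    "rs10516487": 1.35,  # BANK1, range 1.3-1.4
    "rs13277113": 1.35,  # BLK, range 1.3-1.4
}


def log_additive_score(dosages: Mapping[str, int],
                       odds_ratios: Mapping[str, float]) -> float:
    """Sum of dosage * ln(OR); dosage is the risk-allele count (0, 1, or 2)."""
    score = 0.0
    for rsid, odds_ratio in odds_ratios.items():
        dosage = dosages.get(rsid, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage for {rsid}: {dosage}")
        score += dosage * math.log(odds_ratio)
    return score


# Hypothetical genotype: heterozygous at STAT4, homozygous at TNFAIP3.
patient = {"rs7574865": 1, "rs2230926": 2}
score = log_additive_score(patient, SLE_ODDS_RATIOS)
print(f"log-additive score: {score:.3f}, combined OR: {math.exp(score):.2f}")
```

Exponentiating the score gives the combined odds ratio relative to a genotype carrying no risk alleles at these loci.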

9.2 Autoimmune Trigger Conditions

The system defines 12 genomic-autoimmune trigger conditions that automatically activate cross-collection queries:

HLA-B*27 detected -> Query AS, reactive arthritis, anterior uveitis, IBD-associated spondyloarthritis collections
HLA-DRB1*04 shared epitope detected -> Query RA literature, anti-CCP interpretation, ACPA-positive RA management
HLA-DRB1*03 + DQB1*02 detected -> Query celiac, T1D, SLE, Sjogren's, autoimmune thyroid collections
HLA-DRB1*15 detected -> Query MS literature, NMO differential, imaging (brain MRI white matter lesions)
PTPN22 R620W detected -> Query polyautoimmunity risk, T-cell signaling pathway, multiple autoimmune screening
HLA-C*06:02 detected -> Query psoriasis, PsA, IL-17 inhibitor response prediction
IL23R protective variant detected -> Note protective factor for IBD, psoriasis, AS in diagnostic scoring
STAT4 risk variant detected -> Increase SLE, RA susceptibility weighting in diagnostic algorithm
Anti-dsDNA + low complement + TNFAIP3 variant -> High-priority lupus nephritis surveillance trigger
HLA-B*51 detected -> Query Behcet's disease, evaluate oral/genital ulcers, pathergy testing
TPSAB1 duplication detected -> Query MCAS, hereditary alpha-tryptasemia, POTS/hEDS/MCAS triad
Multiple autoimmune risk variants detected -> Activate polyautoimmunity screening workflow
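
These triggers map naturally onto a rule table in which each rule names the markers that must all be present and the query topics it activates. The sketch below encodes a subset of the conditions above; the marker labels, topic strings, and `triggered_queries` helper are assumptions, and rules that depend on lab values or variant counts are omitted.

```python
from typing import Iterable, List

TRIGGER_RULES = [
    (frozenset({"HLA-B*27"}),
     ["ankylosing spondylitis", "reactive arthritis", "anterior uveitis",
      "IBD-associated spondyloarthritis"]),
    (frozenset({"HLA-DRB1*03", "HLA-DQB1*02"}),
     ["celiac disease", "T1D", "SLE", "Sjogren's", "autoimmune thyroid disease"]),
    (frozenset({"HLA-DRB1*15"}),
     ["MS literature", "NMO differential", "brain MRI white matter lesions"]),
    (frozenset({"HLA-C*06:02"}),
     ["psoriasis", "PsA", "IL-17 inhibitor response prediction"]),
    (frozenset({"TPSAB1 duplication"}),
     ["MCAS", "hereditary alpha-tryptasemia", "POTS/hEDS/MCAS triad"]),
]


def triggered_queries(detected_markers: Iterable[str]) -> List[str]:
    """Collect query topics for every rule whose markers are all detected."""
    detected = set(detected_markers)
    topics: List[str] = []
    for required, rule_topics in TRIGGER_RULES:
        if required <= detected:
            topics.extend(t for t in rule_topics if t not in topics)
    return topics


print(triggered_queries({"HLA-DRB1*03", "HLA-DQB1*02", "HLA-C*06:02"}))
```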

9.3 Integration with Other Agents

The Precision Autoimmune Agent integrates with sibling agents in the HCLS AI Factory:

Agent	Integration	Example
Precision Biomarker	Inflammation biomarker interpretation	CRP, ESR, ferritin, calprotectin trends shared with autoimmune flare prediction
Precision Oncology	Autoimmune paraneoplastic screening	Anti-NMDA receptor encephalitis -> ovarian teratoma screening
Imaging Intelligence	Joint/organ imaging correlation	Joint erosions on hand MRI -> RA disease activity assessment
CAR-T Intelligence	Autoimmune complications of CAR-T	CRS, autoimmune cytopenias post-CAR-T therapy
Cardiology (future)	Cardiac autoimmune manifestations	Lupus pericarditis, myocarditis, autoimmune POTS
Neurology (future)	Neurological autoimmune manifestations	MS, NMO, autoimmune encephalitis, autoimmune neuropathy

10. NIM Integration Strategy

10.1 Shared NIM Services

The Precision Autoimmune Agent leverages NVIDIA NIM microservices already deployed for the HCLS AI Factory:

NIM Service	Port	Autoimmune Application
Llama-3 8B	8520	Local LLM fallback for evidence synthesis when Claude API unavailable
VISTA-3D	8530 (shared with Imaging)	Joint imaging segmentation for arthritis assessment
VILA-M3	8532 (shared with Imaging)	Clinical document image understanding (scanned documents)

10.2 Future NIM Extensions

NIM Service	Application	Status
BioNeMo ESMFold	Autoantibody structure prediction for epitope analysis	Planned
NeMo Guardrails	Safety guardrails for clinical recommendation generation	Planned
NVIDIA FLARE	Federated learning for multi-institutional autoimmune pattern discovery	Research

11. Knowledge Graph Design

11.1 Graph Structure

The Precision Autoimmune Agent employs a 7-dictionary knowledge graph containing structured clinical data that complements vector retrieval:

Dictionary	Entries	Content
`HLA_DISEASE_ASSOCIATIONS`	50+	HLA alleles mapped to autoimmune diseases with odds ratios, PMIDs, and mechanism notes
`AUTOANTIBODY_DISEASE_MAP`	14 antibodies x 1-4 diseases	Autoantibody-disease associations with sensitivity, specificity, and staining patterns
`DISEASE_ACTIVITY_THRESHOLDS`	5 scoring systems	DAS28-CRP, DAS28-ESR, SLEDAI-2K, CDAI, BASDAI with component definitions and level thresholds
`BIOLOGIC_THERAPIES`	8 drugs (expandable to 20+)	Drug class, mechanism, indications, PGx considerations, contraindications, monitoring
`FLARE_BIOMARKER_PATTERNS`	3 diseases (expandable to 13)	Early warning biomarkers, threshold patterns, protective signals for RA, SLE, IBD
`OVERLAP_SYNDROMES`	12 syndromes	Component diseases, diagnostic criteria, key autoantibodies, prevalence, management
`DYSAUTONOMIA_CONDITIONS`	10 conditions	POTS, NCS, MSA, AAG, hEDS, MCAS, SFN, PAF, baroreflex failure, familial dysautonomia

11.2 Example Knowledge Graph Entries

HLA-Disease Association (HLA-B*27:05):

{
  "allele": "HLA-B*27:05",
  "associations": [
    {
      "disease": "ankylosing_spondylitis",
      "odds_ratio": 87.4,
      "pmid": "25603694",
      "note": "Strongest known HLA-disease association. Arthritogenic peptide hypothesis: B27 presents self-peptides to autoreactive CD8+ T-cells. Also: B27 misfolding triggers UPR and IL-23 production.",
      "population": "European",
      "mechanism": "Arthritogenic peptide presentation + misfolding/UPR + IL-23 axis"
    },
    {
      "disease": "reactive_arthritis",
      "odds_ratio": 20.0,
      "pmid": "25603694"
    },
    {
      "disease": "anterior_uveitis",
      "odds_ratio": 10.5,
      "pmid": "25603694"
    },
    {
      "disease": "ibd_spondyloarthritis",
      "odds_ratio": 8.0,
      "pmid": "25603694"
    }
  ]
}

Autoantibody Map (anti-dsDNA):

{
  "antibody": "anti-dsDNA",
  "associations": [
    {
      "disease": "systemic_lupus_erythematosus",
      "sensitivity": 0.70,
      "specificity": 0.95,
      "note": "Titers correlate with disease activity, especially lupus nephritis. Rising titers precede clinical flare by 4-12 weeks. Part of 2019 ACR/EULAR SLE criteria (6 points).",
      "assay_methods": ["Farr assay (gold standard)", "ELISA", "CLIFT (Crithidia luciliae)"],
      "monitoring_frequency": "Every 3-6 months in active SLE; with complement levels"
    }
  ]
}

POTS/Dysautonomia Entry:

{
  "condition": "postural_orthostatic_tachycardia_syndrome",
  "abbreviation": "POTS",
  "diagnostic_criteria": {
    "heart_rate_increase": ">= 30 bpm within 10 min of standing (adults); >= 40 bpm (12-19 years)",
    "absence_of": "Orthostatic hypotension (BP drop > 20/10 mmHg)",
    "duration": "Symptoms present >= 6 months",
    "exclusion": "No other cause of tachycardia (anemia, hyperthyroidism, deconditioning)"
  },
  "subtypes": ["Neuropathic POTS", "Hyperadrenergic POTS", "Hypovolemic POTS", "Autoimmune POTS"],
  "comorbidities": {
    "ehlers_danlos_heds": {"prevalence": "up to 80%", "mechanism": "Connective tissue laxity -> venous pooling"},
    "mcas": {"prevalence": "up to 65%", "mechanism": "Mast cell mediator release -> vasodilation"},
    "small_fiber_neuropathy": {"prevalence": "up to 50%", "mechanism": "Sudomotor/vasomotor nerve loss"},
    "sjogrens_syndrome": {"prevalence": "15-25%", "mechanism": "Autoimmune autonomic neuropathy"},
    "autoimmune_ganglionopathy": {"antibody": "ganglionic AChR", "prevalence": "10-15%"}
  },
  "autoimmune_associations": [
    "Ganglionic AChR antibodies (autoimmune autonomic ganglionopathy)",
    "Anti-adrenergic receptor antibodies",
    "Anti-muscarinic receptor antibodies",
    "Sjogren's-associated autonomic neuropathy",
    "Post-viral autoimmune autonomic dysfunction"
  ]
}

Overlap Syndrome Entry (Mixed Connective Tissue Disease):

{
  "syndrome": "mixed_connective_tissue_disease",
  "abbreviation": "MCTD",
  "component_diseases": ["SLE", "Systemic sclerosis", "Polymyositis"],
  "key_autoantibody": "anti-U1-RNP (high titer, required for diagnosis)",
  "diagnostic_criteria": "Alarcon-Segovia criteria: anti-U1-RNP >= 1:1600 + 3 of 5 clinical criteria (edema of hands, synovitis, myositis, Raynaud's, acrosclerosis)",
  "prevalence": "1.9-3.8 per 100,000",
  "management": "Treat dominant clinical feature; corticosteroids for myositis/serositis; immunosuppressants for organ involvement",
  "prognosis": "May evolve into definite SLE, SSc, or PM over time (differentiated MCTD)"
}

12. Query Expansion and Retrieval Strategy

12.1 Autoimmune-Specific Query Expansion Maps

The system implements 18 domain-specific query expansion maps:

#	Category	Keywords	Expanded Terms	Example
1	Autoimmune diseases	30	280	"lupus" -> SLE, lupus nephritis, anti-dsDNA, complement C3/C4, hydroxychloroquine, belimumab, ACR/EULAR criteria
2	Autoantibodies	20	180	"ANA" -> antinuclear antibody, ANA pattern, homogeneous, speckled, nucleolar, centromere, IIF, HEp-2
3	HLA alleles	15	120	"B27" -> HLA-B*27, ankylosing spondylitis, spondyloarthritis, sacroiliitis, uveitis, reactive arthritis
4	Inflammatory markers	12	90	"CRP" -> C-reactive protein, inflammation, acute phase, IL-6 driven, liver synthesis
5	Biologics	20	160	"rituximab" -> anti-CD20, B-cell depletion, ADCC, IV infusion, PML risk, hepatitis B screening
6	Disease activity	8	60	"DAS28" -> disease activity score, 28 joints, CRP, ESR, tender, swollen, patient global, remission
7	Dysautonomia	15	140	"POTS" -> postural orthostatic tachycardia, tilt table test, standing HR, orthostatic intolerance, dysautonomia
8	Overlap syndromes	10	80	"MCTD" -> mixed connective tissue disease, anti-U1-RNP, overlap syndrome, Raynaud's, edema of hands
9	Immunology mechanisms	18	150	"Th17" -> T-helper 17, IL-17, RORgammaT, autoimmune, mucosal immunity, psoriasis, SpA, IBD
10	Genomics	15	120	"PTPN22" -> protein tyrosine phosphatase, R620W, rs2476601, T-cell signaling, autoimmune risk variant
11	Laboratory panels	12	90	"ENA panel" -> extractable nuclear antigens, SSA, SSB, Sm, RNP, Scl-70, Jo-1
12	Imaging	10	70	"joint erosion" -> marginal erosion, bone erosion, MRI, X-ray, Sharp score, modified Sharp, progression
13	Flare patterns	8	60	"flare" -> disease flare, exacerbation, relapse, breakthrough, loss of response, secondary failure
14	Treatments	18	140	"methotrexate" -> MTX, csDMARD, folic acid, hepatotoxicity, pneumonitis, weekly dosing, anchor drug
15	Pregnancy	8	60	"pregnancy lupus" -> neonatal lupus, anti-SSA, congenital heart block, hydroxychloroquine continuation
16	Pediatric	10	80	"JIA" -> juvenile idiopathic arthritis, oligoarticular, polyarticular, systemic, enthesitis-related, RF positive
17	Infections	10	70	"TB screening" -> tuberculosis, QuantiFERON, T-SPOT, latent TB, biologic contraindication, isoniazid
18	Comorbidities	12	90	"cardiovascular risk" -> accelerated atherosclerosis, lupus vasculitis, anti-phospholipid, thrombosis
	Total	251	2,040
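
A minimal sketch of how such a map might be applied at query time follows. It uses three of the example rows from the table verbatim; the `expand_query` helper and its whole-word, case-insensitive matching are assumptions.

```python
import re
from typing import List

# Three example entries copied from the table; a full map would hold 251 keywords.
EXPANSION_MAPS = {
    "lupus": ["SLE", "lupus nephritis", "anti-dsDNA", "complement C3/C4",
              "hydroxychloroquine", "belimumab", "ACR/EULAR criteria"],
    "POTS": ["postural orthostatic tachycardia", "tilt table test", "standing HR",
             "orthostatic intolerance", "dysautonomia"],
    "MCTD": ["mixed connective tissue disease", "anti-U1-RNP", "overlap syndrome",
             "Raynaud's", "edema of hands"],
}


def expand_query(query: str) -> List[str]:
    """Return the original query followed by expansion terms for matched keywords."""
    terms = [query]
    for keyword, expansions in EXPANSION_MAPS.items():
        # Whole-word match so that e.g. "spots" does not trigger the POTS map.
        if re.search(rf"\b{re.escape(keyword)}\b", query, flags=re.IGNORECASE):
            terms.extend(t for t in expansions if t not in terms)
    return terms


print(expand_query("Does this patient have lupus?"))
```

Case-insensitive matching is convenient but can misfire on acronyms that are also ordinary words, so a production map might match acronyms case-sensitively.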

12.2 Comparative Analysis Detection

The system auto-detects comparative queries and routes them to dual-retrieval:

Supported comparison types:

- Disease vs. disease: "Compare RA vs lupus arthritis," "POTS vs neurocardiogenic syncope"
- Drug vs. drug: "Adalimumab vs rituximab for RA," "Tofacitinib vs baricitinib"
- Antibody vs. antibody: "RF vs anti-CCP for RA diagnosis"
- Scoring systems: "DAS28 vs CDAI for RA monitoring"
- HLA alleles: "B*27:05 vs B*27:02 for AS risk"
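
Detection of comparative queries could be as simple as a pattern that splits on "vs" or "versus" and captures an optional "for ..." context. The regular expression and the `Comparison` type below are a sketch under that assumption, not the system's actual detector; each side would then be sent through its own retrieval pass.

```python
import re
from typing import NamedTuple, Optional


class Comparison(NamedTuple):
    left: str
    right: str
    context: Optional[str]


_COMPARISON = re.compile(
    r"^(?:compare\s+)?(?P<left>.+?)\s+(?:vs\.?|versus)\s+(?P<right>.+?)"
    r"(?:\s+for\s+(?P<context>.+))?$",
    re.IGNORECASE,
)


def parse_comparison(query: str) -> Optional[Comparison]:
    """Return the two compared entities, or None if the query is not comparative."""
    match = _COMPARISON.match(query.strip().rstrip("?.!"))
    if match is None:
        return None
    return Comparison(match["left"], match["right"], match["context"])


for q in ["Compare RA vs lupus arthritis", "RF vs anti-CCP for RA diagnosis",
          "What is lupus?"]:
    print(q, "->", parse_comparison(q))
```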

13. API and UI Design

13.1 FastAPI Endpoints (Port 8530)

Method	Endpoint	Description
POST	`/query`	Multi-collection RAG query with evidence synthesis
POST	`/compare`	Comparative analysis (X vs Y)
POST	`/documents/ingest`	Ingest patient clinical documents
POST	`/documents/batch`	Batch ingest multiple documents
GET	`/documents/{patient_id}/status`	Document ingestion status
GET	`/patient/{patient_id}/timeline`	Patient clinical timeline
GET	`/patient/{patient_id}/labs`	Patient laboratory trends
POST	`/patient/{patient_id}/diagnostic`	Run diagnostic odyssey workflow
POST	`/patient/{patient_id}/flare-risk`	Calculate flare risk assessment
POST	`/patient/{patient_id}/genomic-risk`	Genomic-autoimmune risk profile
POST	`/patient/{patient_id}/biologic-rec`	Biologic therapy recommendation
POST	`/reports/generate`	Generate clinical report (Markdown, JSON, PDF)
POST	`/reports/fhir`	Generate FHIR R4 DiagnosticReport
GET	`/collections/stats`	Collection vector counts
GET	`/health`	Health check
GET	`/metrics`	Prometheus metrics
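
For orientation, a client call to `POST /query` might be assembled as below using only the standard library. The request schema is not specified in this document, so the `question` and `patient_id` field names, the `localhost` host, and the `build_query_request` helper are all assumptions; only the path and port come from the table.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:8530"  # host is an assumption; port from the heading


def build_query_request(question: str, patient_id: Optional[str] = None) -> Request:
    """Build (but do not send) a JSON POST request for the /query endpoint."""
    payload = {"question": question}      # hypothetical field name
    if patient_id is not None:
        payload["patient_id"] = patient_id  # hypothetical field name
    return Request(
        url=f"{BASE_URL}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "Has this lupus patient's complement been trending down?",
    patient_id="patient-001",
)
print(request.get_method(), request.full_url)
# To send it against a running service: urllib.request.urlopen(request)
```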

13.2 Streamlit UI (Port 8531) -- 10 Tabs

Tab	Name	Purpose
1	Evidence Explorer	Multi-collection RAG search across autoimmune knowledge
2	Document Ingest	Upload and process patient clinical documents
3	Diagnostic Workup	Run diagnostic odyssey workflow with classification criteria scoring
4	Patient Timeline	Interactive timeline visualization of clinical events
5	Lab Trends	Longitudinal biomarker trend analysis with flare prediction
6	Genomic Risk	Autoimmune genetic risk profiling (HLA + non-HLA variants)
7	Biologic Advisor	Pharmacogenomic-guided therapy recommendations
8	Overlap Detector	Multi-disease overlap syndrome assessment
9	Reports & Export	Generate PDF/FHIR reports with NVIDIA branding
10	Benchmarks	Diagnostic accuracy validation against classification criteria

13.3 Demo Cases

#	Case	Key Findings	Expected Diagnosis
1	34F, 3-year diagnostic odyssey	Malar rash, arthritis, proteinuria, ANA 1:640 homogeneous, anti-dsDNA+, low C3/C4	SLE with lupus nephritis
2	28F, chronic fatigue and syncope	HR increase 42 bpm on standing, joint hypermobility score 7/9, elevated tryptase	POTS / hEDS / MCAS triad
3	45M, inflammatory back pain	HLA-B*27+, bilateral sacroiliitis on MRI, CRP 24 mg/L, morning stiffness >30 min	Ankylosing spondylitis
4	52F, dry eyes/mouth + fatigue	ANA 1:320 speckled, anti-SSA+, anti-SSB+, Schirmer test <5mm, lip biopsy focus score 3	Sjogren's syndrome
5	38F, multiple autoimmune features	Anti-U1-RNP 1:5120, Raynaud's, puffy fingers, myositis, arthritis	Mixed connective tissue disease

14. Clinical Decision Support Engines

14.1 Validated Disease Activity and Diagnostic Scores

Score	Disease	Components	Thresholds
DAS28-CRP	Rheumatoid arthritis	Tender joints (28), swollen joints (28), CRP, patient global VAS	Remission <2.6, Low 2.6-3.2, Moderate >3.2-5.1, High >5.1
DAS28-ESR	Rheumatoid arthritis	Tender joints (28), swollen joints (28), ESR, patient global VAS	Remission <2.6, Low 2.6-3.2, Moderate >3.2-5.1, High >5.1
SLEDAI-2K	SLE	24 weighted items (seizure, psychosis, vasculitis, arthritis, etc.)	Inactive 0, Mild 1-4, Moderate 5-11, High 12-19, Very high 20+
CDAI	Rheumatoid arthritis	Tender joints (28), swollen joints (28), patient global, evaluator global	Remission <=2.8, Low <=10, Moderate <=22, High >22
BASDAI	Ankylosing spondylitis	6 questions (fatigue, spinal pain, joint pain, enthesitis, stiffness)	Inactive <2, Low 2-3, Moderate 3-4, Active >=4
ACR/EULAR SLE 2019	SLE (diagnosis)	Entry: ANA >=1:80; 10 domains (7 clinical, 3 immunologic), 22 criteria with weights	Score >=10: classify as SLE
ACR/EULAR RA 2010	RA (diagnosis)	Joint involvement, serology (RF/anti-CCP), acute phase, duration	Score >=6: classify as RA
Beighton score	Joint hypermobility	9-point scale assessing bilateral flexibility	>=5/9: generalized joint hypermobility
COMPASS-31	Dysautonomia	31 items across 6 autonomic domains	Higher scores = more severe autonomic dysfunction
Sheldon POTS criteria	POTS	Standing HR increase, absence of orthostatic hypotension, duration	Meet all criteria = POTS diagnosis
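
As a minimal sketch of how two of the indices above could be computed, the following uses the commonly published DAS28-CRP formula (CRP in mg/L, patient global on a 0-100 mm VAS) and the CDAI sum (both global assessments on 0-10 cm scales), with the CDAI cutoffs taken from the table. The function names are illustrative assumptions and are not part of any existing codebase.

```python
import math


def das28_crp(tender28: int, swollen28: int, crp_mg_l: float, global_vas_mm: float) -> float:
    """DAS28-CRP from 28-joint counts, CRP (mg/L) and patient global VAS (0-100 mm)."""
    return (
        0.56 * math.sqrt(tender28)
        + 0.28 * math.sqrt(swollen28)
        + 0.36 * math.log(crp_mg_l + 1)  # natural log of (CRP + 1)
        + 0.014 * global_vas_mm
        + 0.96
    )


def cdai(tender28: int, swollen28: int, patient_global_cm: float, evaluator_global_cm: float) -> float:
    """CDAI is a plain sum; it needs no laboratory value."""
    return tender28 + swollen28 + patient_global_cm + evaluator_global_cm


def cdai_category(score: float) -> str:
    """Map a CDAI score to the activity bands listed in the table above."""
    if score <= 2.8:
        return "remission"
    if score <= 10:
        return "low"
    if score <= 22:
        return "moderate"
    return "high"


if __name__ == "__main__":
    score = cdai(tender28=4, swollen28=2, patient_global_cm=3.0, evaluator_global_cm=2.5)
    print(score, cdai_category(score))  # 11.5 moderate
```

Keeping the raw score and the category as separate functions makes it straightforward to display the underlying components alongside the band, as the risk mitigations in Section 22 call for.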

14.2 Classification Criteria Engine¶

The system implements automated scoring for major classification criteria:

2019 ACR/EULAR SLE criteria: Entry criterion (ANA >= 1:80) + additive weighted criteria across 7 clinical domains and 3 immunology domains, counting only the highest-weighted criterion within each domain. Score >= 10 with at least one clinical criterion classifies as SLE. Each criterion is scored only if not better explained by another diagnosis.
2010 ACR/EULAR RA criteria: Four domains (joint involvement 0-5, serology 0-3, acute phase reactants 0-1, symptom duration 0-1). Score >= 6 classifies as RA.
2016 ACR/EULAR Sjogren's criteria: Weighted items including labial gland biopsy focus score >= 1 (3 points), anti-SSA+ (3 points), ocular staining score >= 5 (1 point), Schirmer test <= 5mm (1 point), unstimulated salivary flow <= 0.1 mL/min (1 point). Score >= 4 classifies as Sjogren's.
CASPAR PsA criteria: Inflammatory musculoskeletal disease + score >= 3 from: current psoriasis (2), personal/family history psoriasis (1), nail dystrophy (1), negative RF (1), dactylitis (1), juxta-articular bone formation (1).
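
The additive weighting described above can be sketched for the 2016 Sjogren's criteria as follows. This is an illustration only: the `SjogrenFindings` class and `score_sjogren_2016` function are hypothetical names, missing data is scored as 0, and the inclusion and exclusion criteria that precede scoring in the published criteria are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SjogrenFindings:
    """Structured inputs; None means the test was not found in the record."""
    focus_score: Optional[float] = None            # labial gland biopsy focus score
    anti_ssa_positive: bool = False
    ocular_staining_score: Optional[int] = None
    schirmer_mm: Optional[float] = None            # mm of wetting
    unstimulated_flow_ml_min: Optional[float] = None


def score_sjogren_2016(f: SjogrenFindings) -> dict:
    """Sum the item weights listed in the text; >= 4 meets the threshold."""
    items = {
        "focus score >= 1": 3 if f.focus_score is not None and f.focus_score >= 1 else 0,
        "anti-SSA positive": 3 if f.anti_ssa_positive else 0,
        "ocular staining score >= 5": 1
        if f.ocular_staining_score is not None and f.ocular_staining_score >= 5 else 0,
        "Schirmer <= 5 mm": 1 if f.schirmer_mm is not None and f.schirmer_mm <= 5 else 0,
        "salivary flow <= 0.1 mL/min": 1
        if f.unstimulated_flow_ml_min is not None and f.unstimulated_flow_ml_min <= 0.1 else 0,
    }
    total = sum(items.values())
    return {"items": items, "total": total, "meets_threshold": total >= 4}


if __name__ == "__main__":
    # Demo case 4: anti-SSA+, Schirmer < 5 mm, lip biopsy focus score 3
    result = score_sjogren_2016(
        SjogrenFindings(focus_score=3, anti_ssa_positive=True, schirmer_mm=4)
    )
    print(result["total"], result["meets_threshold"])  # 7 True
```

Returning the per-item breakdown rather than only the total lets the UI show which criteria contributed and which were simply absent from the record.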

15. Reporting and Interoperability¶

15.1 Export Formats¶

Format	Use Case	Implementation
Markdown	In-app display, clinician review	Template-based rendering with evidence tables
JSON	API integration, downstream analysis	Pydantic `.model_dump()` serialization
PDF	Formal clinical reports, sharing	ReportLab with NVIDIA branding (#76B900 green)
FHIR R4	EHR integration, interoperability	DiagnosticReport resource with coded observations

15.2 FHIR R4 Autoimmune Coding¶

FHIR Resource	Coding System	Example
DiagnosticReport	LOINC	51967-8 (Genetic analysis summary)
Condition	SNOMED CT	55464009 (Systemic lupus erythematosus)
Condition	SNOMED CT	69896004 (Rheumatoid arthritis)
Condition	ICD-10-CM	M32.14 (Lupus nephritis)
Observation	LOINC	33935-8 (Anti-dsDNA Ab, quantitative)
Observation	LOINC	14585-3 (ANA by IIF)
MedicationRequest	RxNorm	327361 (Adalimumab 40mg/0.8mL)
AllergyIntolerance	SNOMED CT	294468003 (Rituximab adverse reaction)
Observation	LOINC	30522-7 (C-reactive protein, high sensitivity)
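
To illustrate how coded resources from this table might be assembled, the sketch below builds a FHIR R4 Observation and a provisional Condition as plain dictionaries and wraps them in a collection Bundle. It uses only the standard library; the helper names and the patient id are hypothetical, the DiagnosticReport wrapper is omitted for brevity, and a real export would need validation against the FHIR specification.

```python
import json

LOINC = "http://loinc.org"
SNOMED = "http://snomed.info/sct"
UCUM = "http://unitsofmeasure.org"


def crp_observation(patient_id: str, value_mg_l: float, effective: str) -> dict:
    """High-sensitivity CRP result coded with LOINC 30522-7."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": LOINC, "code": "30522-7",
                             "display": "C-reactive protein, high sensitivity"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": effective,
        "valueQuantity": {"value": value_mg_l, "unit": "mg/L",
                          "system": UCUM, "code": "mg/L"},
    }


def sle_hypothesis(patient_id: str) -> dict:
    """SLE coded with SNOMED CT 55464009, marked provisional (a hypothesis)."""
    return {
        "resourceType": "Condition",
        "verificationStatus": {"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/condition-ver-status",
            "code": "provisional"}]},
        "code": {"coding": [{"system": SNOMED, "code": "55464009",
                             "display": "Systemic lupus erythematosus"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
    }


def build_bundle(patient_id: str, crp_mg_l: float, effective: str) -> dict:
    resources = [crp_observation(patient_id, crp_mg_l, effective),
                 sle_hypothesis(patient_id)]
    return {"resourceType": "Bundle", "type": "collection",
            "entry": [{"resource": r} for r in resources]}


if __name__ == "__main__":
    # "example-1" is a placeholder id, not a real patient reference
    print(json.dumps(build_bundle("example-1", 24.0, "2026-03-01"), indent=2))
```

Marking the Condition as provisional reflects that the system emits diagnostic hypotheses for clinician review rather than confirmed diagnoses.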

16. Product Requirements Document¶

16.1 Product Vision¶

For rheumatologists, immunologists, and primary care physicians who need to diagnose complex autoimmune diseases faster and more accurately, the Precision Autoimmune Agent is a clinical document intelligence system that ingests a patient's complete clinical record, identifies patterns across years of clinical data, cross-references with genomic and autoimmune reference knowledge, and generates prioritized diagnostic hypotheses with evidence citations. Unlike existing autoimmune diagnostic tools that require manual data entry and evaluate one disease at a time, our product ingests raw clinical documents, analyzes all supported autoimmune conditions simultaneously, and incorporates genomic correlation -- with the goal of reducing the average diagnostic odyssey from 4+ years to weeks.

16.2 User Stories¶

Epic 1: Clinical Document Ingestion

ID	Story	Priority
US-1.1	As a rheumatologist, I want to upload a patient's PDF medical records so that the system can extract and index all relevant clinical data	P0
US-1.2	As a clinician, I want to import FHIR R4 bundles from the patient's EHR portal so that structured data is automatically ingested	P0
US-1.3	As a clinician, I want to see the status of document processing (in progress, completed, errors) so that I know when analysis is ready	P0
US-1.4	As a clinician, I want scanned documents to be OCR-processed so that handwritten or faxed records are included in the analysis	P1

Epic 2: Diagnostic Intelligence

ID	Story	Priority
US-2.1	As a rheumatologist, I want the system to score my patient against ACR/EULAR classification criteria for SLE, RA, and Sjogren's so that I can see which diagnoses are supported by the data	P0
US-2.2	As a clinician, I want a ranked differential diagnosis with confidence scores so that I can prioritize my diagnostic workup	P0
US-2.3	As a PCP, I want the system to flag autoimmune red flags in my patient's records so that I can make appropriate specialist referrals	P0
US-2.4	As a rheumatologist, I want the system to detect overlap syndromes (MCTD, rhupus) so that I don't miss multi-disease patterns	P1

Epic 3: Genomic-Autoimmune Correlation

ID	Story	Priority
US-3.1	As a clinician, I want to see HLA-disease associations with odds ratios when HLA typing is available so that I can assess genetic susceptibility	P0
US-3.2	As a geneticist, I want to query the patient's genomic data for known autoimmune risk variants (PTPN22, STAT4, IRF5) so that I can generate a genetic risk profile	P1
US-3.3	As a rheumatologist, I want pharmacogenomic data integrated with biologic therapy recommendations so that I can select the most effective treatment	P1

Epic 4: Longitudinal Monitoring

ID	Story	Priority
US-4.1	As a rheumatologist, I want to track disease activity scores (DAS28, SLEDAI) over time so that I can assess treatment response	P0
US-4.2	As a clinician, I want automated flare risk predictions based on biomarker trends so that I can intervene before clinical flares occur	P1
US-4.3	As a clinician, I want a visual patient timeline showing all clinical events, labs, and diagnoses so that I can see the complete clinical picture at a glance	P1

Epic 5: POTS/Dysautonomia

ID	Story	Priority
US-5.1	As a clinician, I want the system to identify POTS criteria from orthostatic vital signs in the clinical record so that POTS is not missed	P1
US-5.2	As a clinician, I want the system to screen for POTS comorbidities (hEDS, MCAS, SFN) so that the complete syndrome is identified	P1
US-5.3	As a clinician, I want autoimmune etiologies of POTS (Sjogren's, AAG) evaluated automatically so that treatable causes are identified	P2

Epic 6: Reporting and Export

ID	Story	Priority
US-6.1	As a clinician, I want to generate a PDF diagnostic report with evidence citations so that I can share findings with colleagues	P0
US-6.2	As a health system, I want FHIR R4 DiagnosticReport export so that findings can be integrated back into the EHR	P1
US-6.3	As a researcher, I want JSON export of all analysis results so that I can perform downstream statistical analysis	P2

16.3 Non-Functional Requirements¶

Requirement	Target	Rationale
Document ingestion throughput	100 documents/minute	Support batch upload of complete patient records
Entity extraction accuracy	>90% F1 for laboratory values	Clinical reliability requirement
Query response time (reference)	<30 seconds	Acceptable for clinical decision support
Query response time (patient)	<60 seconds (with 10K patient vectors)	Larger patient-specific search space
Concurrent patients	50 per DGX Spark	Multi-clinician usage scenario
Data retention	Configurable per institution	HIPAA compliance
Uptime	99.5%	Clinical workflow reliability
HIPAA compliance	Full	Required for patient data handling

16.4 Prioritization Matrix¶

Phase	Scope	Duration
Phase 1 (MVP)	Reference collections (literature, trials, autoantibodies, HLA, guidelines) + basic RAG query + autoantibody interpretation + HLA analysis + disease activity scoring	6 weeks
Phase 2 (Document Intelligence)	Clinical document ingestion pipeline + patient-specific collections + timeline construction + laboratory extraction + NLP entity extraction	6 weeks
Phase 3 (Advanced Analytics)	Genomic correlation + flare prediction + overlap syndrome detection + biologic therapy optimization + POTS/dysautonomia evaluation + FHIR export + comparative analysis	6 weeks

17. Data Acquisition Strategy¶

17.1 Automated Ingest Pipelines¶

Source	Method	Target Collection	Refresh Cadence
PubMed	E-utilities API (esearch + efetch)	autoimmune_literature	Weekly
ClinicalTrials.gov	V2 API	autoimmune_trials	Weekly
ACR/EULAR guidelines	Manual curation + PDF ingestion	autoimmune_guidelines	Quarterly
HLA-disease GWAS	Literature curation + GWAS Catalog	autoimmune_hla_associations	Monthly
Autoantibody reference	Expert curation from ACR/EULAR criteria	autoimmune_autoantibodies	Quarterly
Biologic drug database	DailyMed + FDA labels	autoimmune_biologics	Monthly
Dysautonomia reference	Expert curation from consensus statements	autoimmune_dysautonomia	Quarterly

17.2 Patient Document Sources¶

Source	Format	Ingestion Method
Patient portal exports	PDF, C-CDA	Upload via Streamlit UI or API
EHR FHIR endpoints	FHIR R4 Bundle	SMART on FHIR integration
Genetic testing portals	PDF reports	OCR + genetic report parser
Laboratory result files	HL7 ORU, CSV	Structured parser
Outside records (faxed/scanned)	Image PDF	Tesseract OCR + NLP
Patient-provided records	Various	Document classifier + appropriate parser

17.3 PubMed Search Strategy¶

("autoimmune disease" OR "autoimmunity" OR "systemic lupus erythematosus" OR
 "rheumatoid arthritis" OR "multiple sclerosis" OR "inflammatory bowel disease" OR
 "psoriasis" OR "ankylosing spondylitis" OR "sjogren" OR "scleroderma" OR
 "myasthenia gravis" OR "celiac disease" OR "type 1 diabetes" OR "POTS" OR
 "dysautonomia" OR "postural orthostatic tachycardia" OR "Ehlers-Danlos" OR
 "mast cell activation" OR "autoantibody" OR "HLA association" OR
 "biologic therapy" OR "immunosuppressive") AND
("diagnosis" OR "biomarker" OR "classification criteria" OR "disease activity" OR
 "genomic" OR "genetic risk" OR "pharmacogenomics" OR "flare prediction" OR
 "overlap syndrome" OR "treatment response")

Estimated yield: 8,000-12,000 abstracts per refresh cycle.
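
A search string of this shape can be generated from term lists and turned into an E-utilities `esearch` request. The sketch below only builds the URL and performs no network call; the term lists are an abbreviated subset of the strategy above, and the 7-day `reldate` window is an assumption chosen to match the weekly refresh cadence in Section 17.1.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Abbreviated subsets of the full term lists shown above
DISEASE_TERMS = ["autoimmune disease", "systemic lupus erythematosus",
                 "rheumatoid arthritis", "postural orthostatic tachycardia"]
TOPIC_TERMS = ["diagnosis", "biomarker", "classification criteria"]


def or_group(terms: list) -> str:
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(disease_terms: list, topic_terms: list) -> str:
    """Disease group AND topic group, mirroring the strategy above."""
    return f"{or_group(disease_terms)} AND {or_group(topic_terms)}"


def build_esearch_url(term: str, days_back: int = 7, retmax: int = 500) -> str:
    """URL for PubMed records with an Entrez date in the last `days_back` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "edat",     # filter on Entrez date
        "reldate": days_back,   # assumed weekly window
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url(build_query(DISEASE_TERMS, TOPIC_TERMS)))
```

The returned PMIDs would then be passed to `efetch` to retrieve abstracts for embedding.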

18. Validation and Testing Strategy¶

18.1 Test Architecture¶

Test Type	Scope	Target Count
Unit tests	Knowledge graph, NLP extractors, scoring engines, models	150+
Integration tests	Collection operations, ingestion pipelines, API endpoints	80+
NLP accuracy tests	Entity extraction precision/recall against annotated clinical notes	50+
Clinical validation	Classification criteria scoring against expert-adjudicated cases	30+
End-to-end tests	Full workflow execution with synthetic patient records	15+
Performance tests	Query latency, ingestion throughput, concurrent access	10+

18.2 NLP Validation Approach¶

The clinical document NLP pipeline will be validated against:

i2b2/VA NLP challenge datasets: De-identified clinical notes with annotated entities (medications, diagnoses, laboratory values) -- standard benchmark for clinical NLP
MIMIC-III discharge summaries: Large corpus of ICU discharge summaries with structured data for validation
Synthetic autoimmune cases: Custom-generated clinical documents with known entities for regression testing
Expert review: Board-certified rheumatologist review of entity extraction accuracy on 100 sampled documents

18.3 Diagnostic Accuracy Validation¶

Classification criteria scoring engines will be validated by:

Creating 50+ synthetic patient profiles with known ACR/EULAR classification criteria scores
Comparing system-calculated scores against expert-calculated scores
Measuring sensitivity, specificity, and concordance for each classification system
Target: >95% concordance with expert scoring for structured data, >85% for NLP-extracted data
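
The comparison described above might be computed along these lines. This is a sketch under stated assumptions: expert scores are treated as ground truth, "concordance" is taken to mean exact agreement on the numeric score, and the function name is hypothetical.

```python
def agreement_metrics(system_scores, expert_scores, threshold):
    """Compare system vs expert criteria scores at a classification threshold."""
    if len(system_scores) != len(expert_scores):
        raise ValueError("score lists must have the same length")

    tp = fp = tn = fn = exact = 0
    for sys_score, exp_score in zip(system_scores, expert_scores):
        exact += sys_score == exp_score
        sys_pos = sys_score >= threshold
        exp_pos = exp_score >= threshold
        if exp_pos and sys_pos:
            tp += 1
        elif exp_pos:
            fn += 1
        elif sys_pos:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else None  # undefined when a class is empty

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "concordance": ratio(exact, len(expert_scores)),
    }


if __name__ == "__main__":
    # Four synthetic profiles scored against a threshold of 10 (SLE 2019)
    print(agreement_metrics([12, 8, 10, 4], [12, 10, 10, 4], threshold=10))
```

In this example the system under-scores one profile (8 vs 10), which costs both exact concordance and sensitivity; reporting the two separately shows whether disagreements actually change the classification.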

19. Regulatory Considerations¶

19.1 FDA Classification¶

The Precision Autoimmune Agent is designed as a Clinical Decision Support (CDS) tool intended to meet the criteria for exemption from FDA device regulation under the 21st Century Cures Act, Section 3060(a):

Not intended to replace clinical judgment: The system provides diagnostic hypotheses and evidence for clinician review, not autonomous diagnosis
Displays underlying evidence: All recommendations include source citations, classification criteria scores, and confidence levels
Clinician can independently verify: Every evidence citation links to its primary source (PubMed, ClinicalTrials.gov, clinical documents)
Intended for qualified professionals: The system is designed for use by licensed clinicians, not patients

19.2 HIPAA Compliance¶

Requirement	Implementation
PHI at rest	LUKS full-disk encryption on DGX Spark NVMe
PHI in transit	TLS 1.3 for all API communications
Access control	JWT authentication with RBAC
Audit logging	All patient data access logged with timestamp, user, action
Minimum necessary	LLM queries contain anonymized evidence snippets, not raw PHI
BAA	Required for cloud LLM provider (Anthropic) if PHI included in queries
De-identification	Patient identifiers stripped before LLM prompt construction

19.3 Data Privacy Architecture¶

The privacy architecture is designed so that raw, identifiable patient data never leaves the DGX Spark:

Patient Documents -> [Local OCR/NLP] -> [Local Milvus] -> [Local Query]
                                                              |
                                                              v
                                              [Anonymized evidence snippets]
                                                              |
                                                              v
                                              [Cloud LLM (Claude Sonnet 4.6)]
                                                              |
                                                              v
                                              [Synthesized response returned]

Only the anonymized evidence snippets (with patient identifiers removed) are sent to the cloud LLM. All raw clinical documents, extracted entities, and patient-specific collections remain on the local device.
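
The identifier-stripping step before prompt construction could look roughly like the sketch below. It is deliberately simplistic: the regular expressions and the `scrub` function are illustrative assumptions, they cover only a few identifier formats, and pattern matching alone would not be sufficient for HIPAA de-identification in practice.

```python
import re

# Illustrative patterns only; a real pipeline would need far broader coverage
_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\bMRN[:#\s]*\d+\b", re.IGNORECASE), "[MRN]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
]


def scrub(snippet: str, known_identifiers=()) -> str:
    """Replace known names and pattern-matched identifiers with placeholders."""
    # Names and other identifiers already known from record metadata
    for ident in known_identifiers:
        snippet = re.sub(re.escape(ident), "[PATIENT]", snippet, flags=re.IGNORECASE)
    for pattern, placeholder in _PATTERNS:
        snippet = pattern.sub(placeholder, snippet)
    return snippet


if __name__ == "__main__":
    text = "Jane Doe (MRN: 1234567) seen 03/14/2024: ANA 1:640, anti-dsDNA positive."
    print(scrub(text, known_identifiers=["Jane Doe"]))
    # [PATIENT] ([MRN]) seen [DATE]: ANA 1:640, anti-dsDNA positive.
```

Clinical values such as the ANA titer pass through unchanged, which is the point: the cloud LLM receives the evidence but not who it belongs to.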

20. DGX Compute Progression¶

20.1 DGX Spark (Current Target -- $3,999)¶

Component	Specification	Autoimmune Agent Usage
GPU	GB10 (Blackwell)	Embedding generation, NLP inference
Memory	128 GB unified LPDDR5x	Milvus index (~6 GB), embedding model (~130 MB), NLP models (~500 MB)
CPU	20 ARM cores (Grace)	Document ingestion, OCR, entity extraction
Storage	NVMe SSD	Milvus data, patient documents, audit logs

Memory budget for 10 concurrent patients (worst case):

- Milvus reference collections: ~6 GB
- 10 patients x 50,000 vectors x 384 dims x 4 bytes: ~768 MB (raw float32 vectors, excluding index overhead)
- BGE-small-en-v1.5: ~130 MB
- SciSpaCy NLP models: ~500 MB
- LLM inference (Llama-3 8B NIM): ~16 GB
- Operating system and overhead: ~8 GB
- Total: ~31.4 GB of 128 GB (~75% headroom)
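
The arithmetic behind this budget can be checked with a few lines. The component sizes are the estimates stated above; the raw float32 payload for patient vectors works out to about 768 MB in decimal units (roughly 732 MiB), before any index overhead.

```python
BYTES_PER_GB = 1e9        # decimal gigabytes, matching the figures above
TOTAL_MEMORY_GB = 128


def patient_vector_gb(patients: int, vectors_per_patient: int, dims: int,
                      bytes_per_value: int = 4) -> float:
    """Raw float32 payload only; index structures add overhead on top."""
    return patients * vectors_per_patient * dims * bytes_per_value / BYTES_PER_GB


budget_gb = {
    "milvus_reference_collections": 6.0,
    "patient_vectors": patient_vector_gb(10, 50_000, 384),
    "bge_small_en_v1_5": 0.13,
    "scispacy_models": 0.5,
    "llm_inference_llama3_8b_nim": 16.0,
    "os_and_overhead": 8.0,
}

total_gb = sum(budget_gb.values())
headroom = 1 - total_gb / TOTAL_MEMORY_GB

if __name__ == "__main__":
    print(f"{total_gb:.1f} GB used, {headroom:.0%} headroom")  # 31.4 GB used, 75% headroom
```

Because the patient-vector term scales linearly with the number of concurrent patients, the same function can be reused to estimate larger deployments.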

20.2 Future: DGX Spark Cluster / DGX Station¶

For institutions processing hundreds of patients concurrently, horizontal scaling via multiple DGX Sparks or vertical scaling to DGX Station provides additional capacity. The Milvus distributed architecture (with external etcd and MinIO) supports seamless cluster scaling.

21. Implementation Roadmap¶

21.1 Three-Phase, 18-Week Plan¶

Phase 1: Reference Knowledge Foundation (Weeks 1-6)

Week	Deliverable
1-2	Milvus collection schemas (autoimmune_literature, autoimmune_trials, autoimmune_autoantibodies, autoimmune_hla_associations, autoimmune_guidelines) + PubMed ingest pipeline
3-4	Knowledge graph (7 dictionaries), query expansion maps (18 maps), entity alias resolution, disease activity scoring engines
5-6	FastAPI endpoints (query, compare, collections), Streamlit Evidence Explorer tab, basic RAG engine with parallel search, 150+ unit tests

Phase 2: Clinical Document Intelligence (Weeks 7-12)

Week	Deliverable
7-8	Document ingestion pipeline: PDF extraction (PyMuPDF), OCR (Tesseract), document classification, section segmentation
9-10	Medical NLP entity extraction: laboratory values, medications, diagnoses, symptoms, vital signs. Patient-specific Milvus collections (clinical_documents, patient_labs, patient_timeline)
11-12	Patient timeline construction, lab trend analysis, Streamlit Document Ingest + Patient Timeline + Lab Trends tabs, integration tests

Phase 3: Advanced Analytics and Genomic Integration (Weeks 13-18)

Week	Deliverable
13-14	Genomic-autoimmune correlation engine, HLA typing integration, non-HLA risk variant queries, autoimmune trigger conditions
15-16	Classification criteria scoring (SLE, RA, Sjogren's, PsA), overlap syndrome detection, POTS/dysautonomia evaluation, flare prediction engine
17-18	Biologic therapy optimizer with PGx, PDF/FHIR report generation, demo cases, end-to-end testing, documentation, docker-compose deployment

22. Risk Analysis¶

22.1 Technical Risks¶

Risk	Severity	Mitigation
NLP entity extraction accuracy insufficient for clinical use	High	Validate against i2b2 benchmarks; implement confidence scores; flag low-confidence extractions for human review
OCR quality for scanned/faxed documents	Medium	Pre-process with image enhancement; support manual correction; track OCR confidence scores
Patient document volume exceeds Milvus capacity on single node	Medium	Implement patient partition pruning; archive inactive patient collections; scale to Milvus cluster if needed
Classification criteria scoring discordance with expert assessment	High	Validate against expert-adjudicated cases; implement "uncertain" category for borderline scores; always display raw criteria components
LLM hallucination of diagnostic conclusions	High	Ground all conclusions in cited evidence; implement confidence thresholds; append disclaimers; require clinician verification

22.2 Clinical Risks¶

Risk	Severity	Mitigation
False positive autoimmune diagnosis leads to unnecessary immunosuppression	Critical	System provides diagnostic hypotheses, not diagnoses; always recommend confirmatory testing; clinician makes final determination
False negative: system misses autoimmune diagnosis	High	Multi-disease simultaneous evaluation reduces single-disease bias; overlap syndrome detection captures atypical presentations
PHI exposure via LLM API	Critical	De-identify all patient data before LLM submission; implement PHI detection guardrails; audit all outbound API calls
Patient self-diagnosis from system output	Medium	System designed for clinician use only; add role-based access control; include prominent disclaimers

22.3 Operational Risks¶

Risk	Severity	Mitigation
HIPAA violation from inadequate audit logging	Critical	Comprehensive audit trail for all patient data access; regular compliance review; encryption at rest and in transit
Document ingestion pipeline overwhelmed during batch upload	Medium	Queue-based architecture with progress tracking; configurable batch sizes; async processing
Knowledge graph becomes stale	Low	Automated weekly literature refresh; quarterly guideline review; version tracking for all knowledge base entries

23. Competitive Landscape¶

23.1 Why the Precision Autoimmune Agent Is Unique¶

No existing product combines all four of these capabilities:

Clinical document ingestion at patient scale: The system ingests thousands of patient documents and extracts structured data -- not just a single lab panel or questionnaire
Multi-disease simultaneous evaluation: The system evaluates evidence for 13+ autoimmune conditions in parallel, detecting overlaps that single-disease tools miss
Genomic correlation: HLA typing, non-HLA risk variants, and pharmacogenomic data are integrated directly into diagnostic and therapeutic reasoning
$3,999 hardware: The system runs on a desktop workstation, not a cloud platform with per-query pricing or enterprise licensing

23.2 Competitive Matrix¶

Capability	Precision Autoimmune Agent	Exagen AVISE	Epic Cogito	Google Health	Commercial AI Platforms
Clinical document ingestion	Yes (thousands)	No	Limited (single EHR)	No	No
Multi-disease evaluation	13+ diseases	Lupus only	Custom build required	No	1-3 diseases
Genomic correlation	HLA + non-HLA + PGx	No	No	Limited	No
Autoantibody interpretation	14 antibody types	10 lupus-specific	No	No	Variable
Disease activity scoring	5 validated indices	SLE only	Custom	No	Variable
Longitudinal trend analysis	Full timeline	No	Basic charts	No	Limited
POTS/dysautonomia	Full evaluation	No	No	No	No
Overlap syndrome detection	12 syndromes	No	No	No	No
Open-source	Apache 2.0	Proprietary	Proprietary	Proprietary	Proprietary
Hardware cost	$3,999 (one-time)	Per-test fee	EHR license	Cloud pricing	$50K-500K/year

24. Discussion¶

24.1 Technical Feasibility¶

The clinical document ingestion approach described in this paper is technically feasible with current technology. Medical NLP has matured significantly: SciSpaCy provides fast, robust models for biomedical entity extraction (benchmark F1 varies by entity type and corpus, so the >90% laboratory-value target in Section 16.3 still has to be confirmed by the validation in Section 18), Tesseract OCR with medical vocabulary enhancement handles most clinical document formats, and FHIR R4 provides a standardized interoperability layer for structured data exchange. The multi-collection RAG architecture has been proven across five existing HCLS AI Factory agents, with the CAR-T Intelligence Agent demonstrating that 3.5+ million vectors can be searched in parallel across 11 collections in under 30 seconds on a single DGX Spark.

The primary technical challenge is NLP accuracy on the wide variety of clinical document formats encountered in real-world autoimmune patient records. Handwritten notes, faxed outside records, and non-standard laboratory report formats will require robust fallback handling and confidence scoring. The validation strategy (Section 18) addresses this through multi-benchmark evaluation and expert review.

24.2 Clinical Impact¶

The potential clinical impact of reducing the autoimmune diagnostic odyssey is substantial. For lupus alone, compressing the average 4.6-year diagnostic delay could prevent irreversible organ damage (nephritis, cardiovascular disease, neuropsychiatric complications) in tens of thousands of patients annually. For POTS patients, reducing the 5-7 year diagnostic delay addresses a condition that affects an estimated 1-3 million Americans, many of whom are unable to work or attend school during their diagnostic journey.

The system's ability to detect overlap syndromes -- conditions that by definition span multiple specialty domains -- addresses a diagnostic blind spot that no single-specialty tool can solve. A patient with MCTD features seen by a rheumatologist, a dermatologist, and a pulmonologist separately may never receive a unifying diagnosis; the Precision Autoimmune Agent evaluates all evidence simultaneously.

24.3 The Genomic-Autoimmune Frontier¶

The integration of genomic data with clinical document analysis represents a frontier capability. Most autoimmune patients never undergo HLA typing or whole exome sequencing unless a specific indication arises. As the cost of genomic testing continues to fall (whole exome sequencing now under $250), routine genomic profiling for autoimmune risk assessment becomes increasingly practical. The Precision Autoimmune Agent is designed to leverage this data when available, providing a bridge between clinical phenotyping and genetic risk assessment that few existing tools offer.

The POTS/hEDS/MCAS triad is a particularly compelling use case for genomic integration. TPSAB1 duplication (hereditary alpha-tryptasemia) is present in 4-6% of the general population and substantially increases MCAS risk. COL5A1 and COL3A1 variants are associated with Ehlers-Danlos subtypes. Identifying these genetic factors alongside clinical autonomic testing data creates a diagnostic framework that is far more powerful than either approach alone.

24.4 Limitations¶

NLP accuracy ceiling: Clinical NLP on real-world documents will not achieve 100% accuracy. Handwritten notes, non-standard abbreviations, and context-dependent interpretations will produce extraction errors that must be flagged and reviewed.
Diagnosis vs. diagnostic support: The system generates diagnostic hypotheses, not diagnoses. Classification criteria scores are calculated from available data, which may be incomplete. Clinician judgment remains essential.
Limited to supported diseases: The initial 13 autoimmune diseases and 10 dysautonomia conditions cover the most common entities but do not encompass all autoimmune conditions. Rare diseases (autoimmune retinopathy, Susac syndrome, IgG4-related disease) will require future collection expansion.
Cloud LLM dependency: The synthesis step requires Claude API access. On-device LLM deployment via NIM would eliminate this dependency but with reduced generation quality.
Single-patient optimization: The current architecture is optimized for individual patient analysis, not population-level epidemiology. Cohort analysis features are a future extension.

25. Conclusion¶

25.1 Key Contributions¶

This paper has presented the architectural design and product requirements for the Precision Autoimmune Agent, a clinical document intelligence system that addresses the autoimmune diagnostic odyssey through six key innovations:

Clinical document ingestion at scale: A pipeline that ingests thousands of patient clinical documents (PDFs, FHIR bundles, C-CDA, free text), extracts structured clinical entities using medical NLP, and indexes them in patient-specific Milvus vector collections for semantic search.
Multi-disease simultaneous evaluation: 14 specialized Milvus collections with parallel search across autoimmune reference knowledge, patient-specific clinical data, and genomic evidence -- enabling evaluation of 13+ autoimmune conditions in a single query.
Genomic-autoimmune correlation: Automatic cross-referencing of clinical findings with HLA-disease associations (50+ conditions), non-HLA autoimmune risk variants (PTPN22, STAT4, IRF5, TNFAIP3, IL23R, CTLA4, TPSAB1), and pharmacogenomic data for therapy optimization.
Longitudinal pattern recognition: Biomarker trend analysis across years of laboratory data to detect flare-predictive patterns (rising anti-dsDNA, falling complement, rising calprotectin) that human review of individual lab reports would miss.
POTS/dysautonomia integration: Purpose-built evaluation for POTS, hEDS, MCAS, and associated autonomic dysfunction -- conditions that are systematically underdiagnosed due to their multi-system nature and overlap with autoimmune disease.
Hardware democratization: The complete system runs on a single NVIDIA DGX Spark ($3,999), with all patient data processed locally for HIPAA compliance -- bringing world-class autoimmune diagnostic intelligence to any clinic worldwide.

25.2 Future Directions¶

Multi-institutional federated learning via NVIDIA FLARE for autoimmune pattern discovery across health systems without sharing patient data
Patient-reported outcomes integration via smartphone apps for symptom tracking between clinic visits
Wearable device integration for continuous heart rate monitoring (POTS) and activity tracking (RA, AS)
Natural language patient history intake where patients describe symptoms conversationally and the system structures the data
Autoimmune disease prediction using pre-symptomatic biomarker patterns and genetic risk scores to identify individuals at risk before clinical onset
Drug repurposing by analyzing treatment response patterns across the autoimmune disease spectrum
International guideline support for NICE (UK), DGRh (Germany), JCR (Japan), and APLAR (Asia-Pacific) guidelines

25.3 Closing Remarks¶

The autoimmune diagnostic odyssey is not inevitable. The data needed to diagnose most autoimmune patients already exists in their medical records -- it is simply fragmented across too many systems, too many specialists, and too many years for any human to synthesize. The Precision Autoimmune Agent is designed to demonstrate that a carefully designed clinical document intelligence system, built on multi-collection RAG with genomic correlation and running on accessible hardware, can transform this fragmented data landscape into actionable diagnostic intelligence.

For the 50 million Americans and 800 million people worldwide living with autoimmune diseases -- many still undiagnosed -- this technology has the potential to compress years of suffering into weeks of systematic evaluation. By open-sourcing this system under the Apache 2.0 license, we aim to make this capability available to every clinic, every rheumatologist, and every patient who needs it.

26. References¶

Autoimmune Disease Epidemiology and Burden¶

Jacobson DL, Gange SJ, Rose NR, Graham NM. Epidemiology and estimated population burden of selected autoimmune diseases in the United States. Clin Immunol Immunopathol. 1997;84(3):223-243. doi:10.1006/clin.1997.4412
American Autoimmune Related Diseases Association. Autoimmune Disease Statistics. https://autoimmune.org/resource-center/autoimmune-statistics/
GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396(10258):1204-1222.

Diagnostic Delay in Autoimmune Disease¶

Lupus Foundation of America. Lupus Facts and Statistics. Median diagnostic delay 4.6 years. https://www.lupus.org/resources/lupus-facts-and-statistics
Dysautonomia International. Diagnostic Delay Survey Results (2019). Median 5.9 years to POTS diagnosis.
Rubio-Tapia A, Kyle RA, Kaplan EL, et al. Increased prevalence and mortality in undiagnosed celiac disease. Gastroenterology. 2009;137(1):88-93. doi:10.1053/j.gastro.2009.03.059
Feldtkeller E, Khan MA, van der Heijde D, et al. Age at disease onset and diagnosis delay in HLA-B27 negative vs. positive patients with ankylosing spondylitis. Rheumatol Int. 2003;23(2):61-66.

Classification Criteria¶

Aringer M, Costenbader K, Daikh D, et al. 2019 European League Against Rheumatism/American College of Rheumatology Classification Criteria for Systemic Lupus Erythematosus. Arthritis Rheumatol. 2019;71(9):1400-1412. doi:10.1002/art.40930
Aletaha D, Neogi T, Silman AJ, et al. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum. 2010;62(9):2569-2581. doi:10.1002/art.27584
Shiboski CH, Shiboski SC, Seror R, et al. 2016 American College of Rheumatology/European League Against Rheumatism Classification Criteria for Primary Sjogren's Syndrome. Arthritis Rheumatol. 2017;69(1):35-45. doi:10.1002/art.39859
van den Hoogen F, Khanna D, Fransen J, et al. 2013 classification criteria for systemic sclerosis: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum. 2013;65(11):2737-2747.

HLA and Autoimmune Genetics¶

Trowsdale J, Knight JC. Major histocompatibility complex genomics and human disease. Annu Rev Genomics Hum Genet. 2013;14:301-323. doi:10.1146/annurev-genom-091212-153455
Brown MA, Kenna T, Wordsworth BP. Genetics of ankylosing spondylitis -- insights into pathogenesis. Nat Rev Rheumatol. 2016;12(2):81-91. doi:10.1038/nrrheum.2015.133
Raychaudhuri S, Sandor C, Stahl EA, et al. Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat Genet. 2012;44(3):291-296. doi:10.1038/ng.1076

POTS and Dysautonomia¶

Sheldon RS, Grubb BP 2nd, Olshansky B, et al. 2015 Heart Rhythm Society Expert Consensus Statement on the Diagnosis and Treatment of Postural Tachycardia Syndrome. Heart Rhythm. 2015;12(6):e41-e63. doi:10.1016/j.hrthm.2015.03.029
Vernino S, Bourne KM, Stiles LE, et al. Postural Orthostatic Tachycardia Syndrome (POTS): State of the Science and Clinical Care from a 2019 National Institutes of Health Expert Consensus Meeting. Auton Neurosci. 2021;235:102828.
Dahan S, Tomljenovic L, Shoenfeld Y. Postural Orthostatic Tachycardia Syndrome (POTS) -- A Novel Member of the Autoimmune Family. Lupus. 2016;25(4):339-342.

Autoimmune Biomarkers and Disease Activity¶

van Gestel AM, Prevoo ML, van 't Hof MA, et al. Development and validation of the European League Against Rheumatism response criteria for rheumatoid arthritis. Arthritis Rheum. 1996;39(1):34-40. (DAS28)
Gladman DD, Ibanez D, Urowitz MB. Systemic lupus erythematosus disease activity index 2000. J Rheumatol. 2002;29(2):288-291. (SLEDAI-2K)
Garrett S, Jenkinson T, Kennedy LG, et al. A new approach to defining disease status in ankylosing spondylitis: the Bath Ankylosing Spondylitis Disease Activity Index. J Rheumatol. 1994;21(12):2286-2291. (BASDAI)

Biologic Therapy and Pharmacogenomics¶

Smolen JS, Landewe RBM, Bijlsma JWJ, et al. EULAR recommendations for the management of rheumatoid arthritis with synthetic and biological disease-modifying antirheumatic drugs: 2019 update. Ann Rheum Dis. 2020;79(6):685-699.
Fanouriakis A, Kostopoulou M, Alunno A, et al. 2019 update of the EULAR recommendations for the management of systemic lupus erythematosus. Ann Rheum Dis. 2019;78(6):736-745.

Clinical NLP and RAG Architecture¶

Neumann M, King D, Beltagy I, Ammar W. ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. Proceedings of BioNLP. 2019. (SciSpaCy)
Lewis P, Perez E, Piktus A, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems. 2020;33:9459-9474.
Wang J, Yi X, Guo R, et al. Milvus: A Purpose-Built Vector Data Management System. Proceedings of the 2021 International Conference on Management of Data. 2021:2614-2627. doi:10.1145/3448016.3457550
Xiao S, Liu Z, Zhang P, Muennighoff N. C-Pack: Packaged Resources To Advance General Chinese Embedding. 2023. arXiv:2309.07597 (BGE embedding model)

Precision Autoimmune Agent -- HCLS AI Factory v0.1.0 (Pre-Implementation) Apache 2.0 License | https://github.com/ajones1923/hcls-ai-factory