Breaking Down Data Silos in Autoimmune Disease Management: A Multi-Collection RAG Architecture for Clinical Intelligence

Author: Adam Jones | Date: March 2026 | Version: 1.0.0 | License: Apache 2.0

Part of the HCLS AI Factory -- an end-to-end precision medicine platform. https://github.com/ajones1923/hcls-ai-factory

Abstract

Autoimmune diseases affect an estimated 50 million Americans and represent one of the most diagnostically challenging areas of medicine. The average autoimmune patient sees four or more specialists over four or more years before receiving a correct diagnosis -- a phenomenon known as the "diagnostic odyssey." Clinical data for these patients is fragmented across rheumatology notes, immunology panels, HLA genotyping reports, imaging studies, pathology results, and pharmacy records, often spanning multiple institutions and electronic health record systems. A single integrative question such as "Given this patient's autoantibody profile, HLA genotype, and lab trends, what is the most likely diagnosis and which biologic therapy is pharmacogenomically appropriate?" requires manual synthesis across at least six distinct data domains, a process that can consume hours of specialist time and is prone to oversight.

This paper presents the Precision Autoimmune Intelligence Agent, an AI-powered multi-collection retrieval-augmented generation (RAG) system that unifies 14 specialized Milvus vector collections spanning the full autoimmune clinical workflow. The system encodes knowledge of 13 autoimmune diseases, 24 autoantibodies with sensitivity/specificity data, 22 HLA alleles with disease-specific odds ratios, 20 disease activity scoring systems, 13 flare biomarker patterns, 22 biologic therapies with pharmacogenomic considerations, 10 ACR/EULAR classification criteria sets, and 9 overlap syndromes. It employs 384-dimensional BAAI/bge-small-en-v1.5 embeddings with IVF_FLAT indexing and COSINE similarity, keyword-based disease area detection for dynamic collection weight adjustment, and Claude for evidence synthesis. The agent includes a five-stage clinical analysis pipeline, a diagnostic engine with classification criteria evaluation and differential diagnosis generation, a PDF clinical document ingestion pipeline supporting 9 demo patients with over 150 clinical PDFs, and multi-format export (Markdown, FHIR R4, PDF). With 431 automated tests across 8 test files, 14 REST API endpoints, and a 10-tab Streamlit UI, the system runs on a single NVIDIA DGX Spark and is released under the Apache 2.0 license. We demonstrate that a domain-specific multi-collection RAG architecture can transform fragmented autoimmune clinical data into integrated diagnostic intelligence, compressing the diagnostic odyssey from years to minutes.

1. Introduction

1.1 The Autoimmune Data Challenge

Autoimmune diseases represent a uniquely difficult class of conditions for clinical informatics. Unlike cancers -- where a biopsy typically yields a definitive tissue diagnosis -- autoimmune diseases are diagnosed through the convergence of clinical findings, serological markers, genetic predispositions, imaging patterns, and the exclusion of mimics. The data required for accurate autoimmune diagnosis and management is distributed across fundamentally different systems:

Clinical documents reside in institutional EHRs, often spanning multiple specialties (rheumatology, neurology, gastroenterology, dermatology, endocrinology) with inconsistent terminology and incomplete cross-referencing.
Autoantibody panels are reported by reference laboratories with varying sensitivity and specificity depending on assay methodology, and interpretation requires understanding of disease-specific pre-test probabilities.
HLA genotyping data is generated by molecular laboratories but rarely integrated with clinical decision-making beyond transplant settings, despite strong disease associations (e.g., HLA-B*27:05 with ankylosing spondylitis, OR=87.4).
Disease activity scores (DAS28-CRP, SLEDAI-2K, BASDAI, PASI, and 16 others) require structured data extraction from clinical notes and lab values that is almost never automated.
Longitudinal lab trends -- the rising anti-dsDNA titer, the falling complement C3, the creeping calprotectin -- are critical for flare prediction but are buried in sequential lab reports that clinicians must mentally integrate.
Pharmacogenomic data relevant to biologic therapy selection (CYP3A4 metabolism for JAK inhibitors, FCGR3A polymorphisms affecting rituximab efficacy, HLA-based immunogenicity risk) exists in silos separate from prescribing workflows.
Overlap syndromes -- where features of multiple autoimmune diseases coexist -- require cross-referencing criteria sets that were designed for single-disease classification.

The consequence is that the most important clinical questions in autoimmune medicine are the hardest to answer computationally.

1.2 Limitations of Traditional Approaches

Current approaches to autoimmune clinical intelligence suffer from structural limitations:

Keyword-based literature search misses semantically related content. A PubMed search for "lupus nephritis biomarkers" will not find papers discussing "anti-dsDNA titer correlation with renal flare" unless those exact terms appear.
Single-domain tools (UpToDate, DynaMed, clinical calculators) each address one aspect of autoimmune care, requiring clinicians to manually synthesize findings across platforms.
Classification criteria calculators evaluate one disease at a time and cannot perform differential diagnosis across multiple autoimmune conditions simultaneously.
General-purpose AI assistants lack the structured knowledge bases, autoantibody databases, HLA association data, and citation provenance required for clinical decision support.
EHR-integrated tools are institution-specific and cannot incorporate external knowledge from published literature, clinical trials, or pharmacogenomic databases.

1.3 Our Contribution

The Precision Autoimmune Intelligence Agent addresses these limitations through a multi-collection RAG architecture that:

Unifies 14 specialized vector collections spanning the complete autoimmune clinical workflow into a single query interface
Searches all collections in parallel, with configurable per-collection weights that sum to 1.0
Augments retrieval with a domain knowledge base (v2.0.0) containing curated data on 22 HLA alleles, 24 autoantibodies, 22 biologic therapies, 20 disease activity scoring systems, 13 flare biomarker patterns, and 10 classification criteria sets
Detects disease areas from query keywords and dynamically adjusts collection weights to prioritize disease-relevant evidence
Synthesizes evidence through Claude with a domain-expert system prompt covering autoantibody interpretation, HLA associations, disease activity scoring, flare prediction, biologic therapy selection, diagnostic odyssey analysis, and overlap syndrome detection
Provides structured citation formats ([AutoAb:name], [HLA:allele], [Activity:score], [Therapy:drug], [Literature:PMID], [Trial:NCT_ID]) maintaining the traceability that clinical work demands
Implements a diagnostic engine with ACR/EULAR classification criteria evaluation, differential diagnosis generation, and overlap syndrome detection
Ingests clinical PDF documents and builds patient timelines that visualize the diagnostic odyssey
Runs on a single NVIDIA DGX Spark, democratizing access to autoimmune clinical intelligence

2. Background

2.1 Autoimmune Disease: Current Landscape

Autoimmune diseases encompass more than 80 distinct conditions in which the immune system attacks the body's own tissues. The 13 diseases covered by this system represent the most clinically significant autoimmune conditions by prevalence, severity, and therapeutic complexity:

Disease	Abbreviation	Primary Specialty	Estimated US Prevalence
Rheumatoid Arthritis	RA	Rheumatology	1.3 million
Systemic Lupus Erythematosus	SLE	Rheumatology	200,000-300,000
Multiple Sclerosis	MS	Neurology	1 million
Type 1 Diabetes	T1D	Endocrinology	1.6 million
Inflammatory Bowel Disease	IBD	Gastroenterology	3 million
Psoriasis	PsO	Dermatology	7.5 million
Ankylosing Spondylitis	AS	Rheumatology	560,000
Sjogren's Syndrome	SjS	Rheumatology	1-4 million
Systemic Sclerosis	SSc	Rheumatology	100,000
Myasthenia Gravis	MG	Neurology	60,000-80,000
Celiac Disease	CD	Gastroenterology	2 million
Graves' Disease	GD	Endocrinology	1.2 million
Hashimoto's Thyroiditis	HT	Endocrinology	14 million

These 13 conditions collectively affect tens of millions of patients, span at least five medical specialties, and require fundamentally different diagnostic and therapeutic approaches. Yet they share common immunological mechanisms, overlapping serological markers, and a tendency to co-occur -- creating a clinical puzzle that demands integrated analysis.

Autoimmune disease management generates data across at least eight distinct categories, each with its own structure, vocabulary, and source systems:

Clinical documents -- Progress notes, consultation letters, discharge summaries spanning multiple specialties and institutions
Laboratory results -- Autoantibody panels, inflammatory markers, organ-specific labs, each with reference ranges that vary by assay
Genetic data -- HLA genotyping, pharmacogenomic variants (CYP2C19, CYP3A4, FCGR3A), disease susceptibility loci
Disease activity assessments -- Scored instruments (DAS28, SLEDAI-2K, BASDAI, PASI) that require structured data extraction from unstructured notes
Imaging -- Joint erosions (RA), sacroiliac MRI (AS), brain MRI (MS), chest CT (SSc-ILD), each interpreted by different subspecialists
Pathology -- Renal biopsy (lupus nephritis), intestinal biopsy (celiac), skin biopsy (psoriasis), each with disease-specific classification systems
Pharmacotherapy records -- Biologic therapy history, treatment responses, adverse events, and pharmacogenomic considerations
Temporal patterns -- The longitudinal trajectory of biomarkers that signals disease flare, remission, or progression

Each data category has a different optimal schema, different source systems, different update cadences, and different relevance to different clinical questions. A traditional single-collection vector database cannot adequately represent this heterogeneity. A multi-collection architecture, where each collection has a purpose-built schema with domain-specific metadata fields, is essential.

3. System Architecture

3.1 Overview

The Precision Autoimmune Intelligence Agent implements a multi-collection RAG architecture with six core components: (1) domain-specific Milvus collections with typed schemas, (2) parallel vector search with configurable collection weights, (3) a structured domain knowledge base, (4) keyword-based disease area detection with dynamic weight adjustment, (5) a diagnostic engine with classification criteria evaluation, and (6) LLM synthesis with structured citation provenance. The system processes a user query through the following pipeline:

User Query
    |
    v
[Disease Area Detection] -- keyword matching across 13 disease vocabularies
    |
    v
[BAAI/bge-small-en-v1.5 Embedding]
(384-dim, asymmetric query prefix)
    |
    v
[Parallel Search: 14 Milvus Collections]
(ThreadPoolExecutor, IVF_FLAT / COSINE, nprobe=16)
    |
    v
[Collection Weight Adjustment]
(disease-specific weights applied dynamically)
    |
    v
[Knowledge Base Augmentation v2.0.0]
(22 HLA alleles, 24 autoantibodies, 22 therapies,
 20 activity scores, 13 flare patterns, 10 criteria sets)
    |
    v
[Score-Weighted Merge & Rank]
(citation relevance: high >= 0.80, medium >= 0.60)
    |
    v
[Five-Stage Clinical Pipeline]
Patient Profile -> Autoantibody Interpretation -> HLA Analysis ->
Disease Activity Scoring -> Flare Prediction + Biologic Recommendations
    |
    v
[Diagnostic Engine]
(ACR/EULAR criteria evaluation, differential diagnosis,
 overlap syndrome detection)
    |
    v
[Claude] --> Grounded response with structured citations
    |
    v
[Export] --> Markdown | FHIR R4 | PDF
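
The parallel search step in the pipeline above can be sketched as follows. This is a minimal illustration, not the agent's actual code: `search_collection` is a hypothetical stub standing in for a per-collection Milvus query, and the `max_workers` value is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a per-collection Milvus search.
# A real implementation would query Milvus; this stub fabricates hits.
def search_collection(name, query_vector, top_k=5):
    hits = [{"collection": name, "score": 0.9 - 0.1 * i} for i in range(top_k)]
    return name, hits

def parallel_search(collections, query_vector, top_k=5, max_workers=8):
    """Search every collection concurrently and group hits by collection name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(search_collection, name, query_vector, top_k)
            for name in collections
        ]
        # Each future resolves to a (name, hits) pair.
        return dict(f.result() for f in futures)

results = parallel_search(
    ["autoimmune_literature", "autoimmune_hla_associations"],
    query_vector=[0.0] * 384,  # placeholder 384-dim query embedding
)
```

Because each search is I/O-bound (a network call to the vector database), a thread pool is a reasonable fit; the total latency then approaches that of the slowest collection rather than the sum of all of them.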

3.2 The 14 Collections

The system maintains 14 Milvus vector collections, each with a purpose-built schema containing domain-specific metadata fields. Thirteen collections are owned and populated by the autoimmune agent; the fourteenth (genomic_evidence) is a read-only shared collection created by the upstream genomics pipeline.

#	Collection	Weight	Schema Fields	Description
1	`autoimmune_clinical_documents`	0.18	id, embedding, text_chunk, patient_id, doc_type, specialty, provider, visit_date, source_file, page_number, chunk_index	Ingested clinical PDFs (progress notes, labs, imaging, pathology)
2	`autoimmune_patient_labs`	0.14	id, embedding, text_chunk, patient_id, test_name, value, unit, reference_range, flag, collection_date, panel_name	Laboratory results with reference range analysis and flags
3	`autoimmune_autoantibody_panels`	0.12	id, embedding, text_chunk, antibody_name, associated_diseases, sensitivity, specificity, pattern, clinical_significance, interpretation_guide	24 autoantibodies with sensitivity/specificity data
4	`autoimmune_hla_associations`	0.08	id, embedding, text_chunk, allele, disease, odds_ratio, population, pmid, mechanism, clinical_implication	22 HLA alleles with disease-specific odds ratios
5	`autoimmune_disease_criteria`	0.08	id, embedding, text_chunk, disease, criteria_set, criteria_type, year, required_score, criteria_items, sensitivity_specificity	ACR/EULAR classification and diagnostic criteria
6	`autoimmune_disease_activity`	0.07	id, embedding, text_chunk, score_name, disease, components, thresholds, interpretation, monitoring_frequency	20 disease activity scoring systems with thresholds
7	`autoimmune_flare_patterns`	0.06	id, embedding, text_chunk, disease, biomarker_pattern, early_warning_signs, typical_timeline, protective_factors, intervention_triggers	Flare prediction biomarker patterns for 13 diseases
8	`autoimmune_biologic_therapies`	0.06	id, embedding, text_chunk, drug_name, drug_class, mechanism, indicated_diseases, pgx_considerations, contraindications, monitoring, dosing, evidence_level	22 biologic drugs with PGx data
9	`autoimmune_pgx_rules`	0.04	id, embedding, text_chunk, gene, variant, drug, phenotype, recommendation, evidence_level, pmid	Pharmacogenomic dosing rules (CPIC-aligned)
10	`autoimmune_clinical_trials`	0.05	id, embedding, text_chunk, title, nct_id, phase, status, disease, intervention, biomarker_criteria, enrollment, start_year, sponsor	Active autoimmune clinical trials
11	`autoimmune_literature`	0.05	id, embedding, text_chunk, title, authors, journal, year, pmid, disease_focus, keywords, abstract_summary	Published autoimmune research
12	`autoimmune_patient_timelines`	0.03	id, embedding, text_chunk, patient_id, event_type, event_date, description, provider, specialty, days_from_first_symptom	Diagnostic odyssey timeline events
13	`autoimmune_cross_disease`	0.02	id, embedding, text_chunk, primary_disease, associated_conditions, shared_pathways, shared_biomarkers, overlap_criteria, co_occurrence_rate	Overlap syndromes and shared pathogenic mechanisms
14	`genomic_evidence`	0.02	(shared read-only)	Genomic variant evidence from the upstream genomics pipeline

The collection weights sum to 1.00 and reflect the relative clinical importance of each data domain. Clinical documents and patient labs receive the highest weights (0.18 and 0.14 respectively) because they contain the most patient-specific, actionable information. Genomic evidence and cross-disease patterns receive the lowest weights (0.02 each) as they provide supplementary context rather than primary clinical data.
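
The weights in the table can be checked and applied with a few lines of code. The sketch below copies the weights from the table; the multiplicative `weighted_score` rule and the helper names are assumptions for illustration, since the exact merge formula is not specified here.

```python
BASE_WEIGHTS = {
    "autoimmune_clinical_documents": 0.18,
    "autoimmune_patient_labs": 0.14,
    "autoimmune_autoantibody_panels": 0.12,
    "autoimmune_hla_associations": 0.08,
    "autoimmune_disease_criteria": 0.08,
    "autoimmune_disease_activity": 0.07,
    "autoimmune_flare_patterns": 0.06,
    "autoimmune_biologic_therapies": 0.06,
    "autoimmune_pgx_rules": 0.04,
    "autoimmune_clinical_trials": 0.05,
    "autoimmune_literature": 0.05,
    "autoimmune_patient_timelines": 0.03,
    "autoimmune_cross_disease": 0.02,
    "genomic_evidence": 0.02,
}

def weighted_score(similarity, collection, weights=BASE_WEIGHTS):
    """Scale a cosine similarity by its collection weight (assumed merge rule)."""
    return similarity * weights[collection]

def merge_and_rank(hits):
    """hits: list of (collection, similarity) pairs, returned best-first."""
    return sorted(hits, key=lambda h: weighted_score(h[1], h[0]), reverse=True)

total = sum(BASE_WEIGHTS.values())  # should be 1.0 up to floating-point error
ranked = merge_and_rank([
    ("genomic_evidence", 0.95),
    ("autoimmune_patient_labs", 0.70),
])
```

Under this assumed rule, a moderately similar lab result outranks a highly similar genomic hit, which matches the stated intent of prioritizing patient-specific data.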

3.3 The Five-Stage Clinical Pipeline

The agent processes patient data through a five-stage clinical analysis pipeline that mirrors the diagnostic reasoning of an experienced rheumatologist or clinical immunologist:

Stage 1: Patient Profile Construction. Clinical documents are ingested, chunked, embedded, and stored in the autoimmune_clinical_documents collection. The system extracts demographics, chief complaints, medication history, and specialist visit history to construct a longitudinal patient profile.

Stage 2: Autoantibody Interpretation. Laboratory results are cross-referenced against the 24-autoantibody knowledge base. Each autoantibody is evaluated with its disease-specific sensitivity and specificity. ANA patterns (homogeneous, speckled, nucleolar, centromere) are interpreted in context. The system flags clinically significant findings such as high-specificity markers (anti-Smith for SLE, specificity 0.99; anti-CCP for RA, specificity 0.95; anti-Scl-70 for SSc, specificity 0.99).

Stage 3: HLA Analysis. If HLA genotyping data is available, the system evaluates disease risk using the 22-allele knowledge base with population-specific odds ratios. Strong associations (OR > 5) are flagged as high-priority findings. For example, HLA-B*27:05 positivity in a patient with inflammatory back pain triggers an ankylosing spondylitis risk alert (OR=87.4).

Stage 4: Disease Activity Scoring. The system evaluates available clinical and laboratory data against all applicable disease activity scoring instruments: DAS28-CRP, DAS28-ESR, CDAI, and SDAI for RA; SLEDAI-2K for SLE; and BASDAI and ASDAS for AS. Each score is classified into remission, low, moderate, or high activity categories based on validated thresholds.

Stage 5: Flare Prediction and Biologic Recommendations. Longitudinal biomarker trends are analyzed against 13 disease-specific flare patterns to assess imminent flare risk. Based on the integrated assessment, the system generates biologic therapy recommendations with pharmacogenomic considerations, contraindication screening, and monitoring requirements.
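
Stage 4 can be illustrated with DAS28-CRP. The sketch below uses the commonly published DAS28-CRP formula and the conventional DAS28 cut-offs (remission below 2.6, low up to 3.2, moderate up to 5.1). Whether the agent computes the score exactly this way is an assumption, and the function names are hypothetical.

```python
import math

def das28_crp(tender28, swollen28, crp_mg_l, patient_global_mm):
    """DAS28-CRP from 28-joint counts, CRP (mg/L), and patient global (0-100 mm VAS)."""
    return (
        0.56 * math.sqrt(tender28)
        + 0.28 * math.sqrt(swollen28)
        + 0.36 * math.log(crp_mg_l + 1)
        + 0.014 * patient_global_mm
        + 0.96
    )

def classify_das28(score):
    """Map a DAS28 value to the four activity categories used in Stage 4."""
    if score < 2.6:
        return "remission"
    if score <= 3.2:
        return "low"
    if score <= 5.1:
        return "moderate"
    return "high"

# Illustrative inputs only (not taken from any demo patient).
score = das28_crp(tender28=6, swollen28=4, crp_mg_l=12, patient_global_mm=50)
category = classify_das28(score)
```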

3.4 Embedding Strategy

All text is embedded using the BAAI/bge-small-en-v1.5 model, producing 384-dimensional dense vectors. This model was selected for its strong performance on biomedical text retrieval benchmarks relative to its compact size, which is critical for deployment on a single DGX Spark. Key parameters:

Model: BAAI/bge-small-en-v1.5
Dimensions: 384
Index type: IVF_FLAT (nlist=1024)
Search metric: COSINE
Search parameter: nprobe=16
Query prefix: "Represent this sentence for searching relevant passages: " (asymmetric instruction for query embedding)
Batch size: 32 documents per embedding batch

The IVF_FLAT index with nlist=1024 provides an effective balance between recall and search speed across the 14 collections. With nprobe=16, approximately 1.6% of clusters are searched per query, delivering sub-100ms search latency per collection while maintaining high recall for clinical queries.
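
The parameters above can be collected into configuration dictionaries shaped like the index and search parameter dictionaries that pymilvus accepts. This is a sketch of the configuration only, with no Milvus connection; the constant and helper names are hypothetical.

```python
EMBEDDING_DIM = 384

# Shaped like pymilvus index/search parameter dictionaries.
INDEX_PARAMS = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 1024},
}
SEARCH_PARAMS = {"metric_type": "COSINE", "params": {"nprobe": 16}}

# Asymmetric instruction: applied to queries only, not to stored passages.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def prepare_query(text):
    """Prepend the retrieval instruction before embedding a query."""
    return QUERY_PREFIX + text.strip()

def probed_fraction(index_params, search_params):
    """Fraction of IVF clusters scanned per query (nprobe / nlist)."""
    return search_params["params"]["nprobe"] / index_params["params"]["nlist"]

fraction = probed_fraction(INDEX_PARAMS, SEARCH_PARAMS)  # 16 / 1024
```

The ratio works out to 0.015625, which is the "approximately 1.6%" figure quoted above. Raising `nprobe` trades latency for recall.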

4. Knowledge Augmentation

4.1 The Domain Knowledge Base (v2.0.0)

The system's domain knowledge base (version 2.0.0, last updated 2026-03-10) provides structured clinical reference data that augments vector retrieval. Unlike vector search, which depends on semantic similarity between query and stored text, the knowledge base provides deterministic, curated facts that are injected into the LLM context when relevant entities are detected. The knowledge base draws from authoritative sources:

ACR/EULAR Classification Criteria (2010-2019)
HLA Disease Association Database (PMID:28622507)
CPIC Guidelines for Biologics (2024)
UpToDate Autoimmune Disease Management (2025)
2017 McDonald Criteria for MS (PMID:29275977)
2016 ACR/EULAR Sjogren's Classification (PMID:27785888)
2013 ACR/EULAR SSc Classification (PMID:24098041)
ESPGHAN Celiac Diagnostic Guidelines (PMID:22197856)
FDA Biologic Therapy Approvals (2024-2025)

The knowledge base contains the following curated data:

Domain	Count	Key Data Points
HLA alleles	22	Allele, disease, odds ratio, PMID, mechanism, clinical implication
Autoantibodies	24	Antibody name, associated diseases, sensitivity, specificity, pattern, interpretation
Biologic therapies	22	Drug name, class, mechanism, indications, PGx considerations, contraindications, monitoring
Disease activity scores	20	Score name, disease, components, thresholds (remission/low/moderate/high), reference
Flare biomarker patterns	13	Disease, early warning biomarkers, thresholds, protective signals
Classification criteria	10	Disease, criteria set, threshold score, criteria items
Overlap syndromes	9	Primary disease, associated conditions, shared pathways, shared biomarkers
Lab test patterns	45	Test name, reference ranges, flag conditions, clinical significance

4.2 Entity Resolution and Query Routing

When a query mentions a specific entity -- an autoantibody, HLA allele, drug, or disease -- the system resolves it against the knowledge base and injects the structured data into the LLM context. Entity resolution handles common aliases and abbreviations:

"anti-CCP" resolves to the autoantibody entry with RA sensitivity 0.67 and specificity 0.95
"B27" or "HLA-B27" resolves to HLA-B*27:05 with AS odds ratio 87.4
"rituximab" resolves to the biologic therapy entry with anti-CD20 mechanism, RA/SLE/MG indications, and FCGR3A PGx considerations
"DAS28" resolves to both DAS28-CRP and DAS28-ESR scoring systems with remission/low/moderate/high thresholds

This entity resolution ensures that even when vector search does not surface a relevant knowledge base entry (e.g., the query is semantically distant from the stored text), the deterministic lookup provides the correct clinical reference data.
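
A deterministic alias lookup of this kind can be sketched with two dictionaries. The values below are the ones quoted in this section; the dictionary layout, key naming, and the simple substring matching are assumptions made for illustration.

```python
# Alias -> canonical knowledge base keys (layout is hypothetical).
ALIASES = {
    "anti-ccp": ["AutoAb:anti-CCP"],
    "hla-b27": ["HLA:B*27:05"],
    "b27": ["HLA:B*27:05"],
    "das28": ["Activity:DAS28-CRP", "Activity:DAS28-ESR"],
}

KNOWLEDGE_BASE = {
    "AutoAb:anti-CCP": {"disease": "RA", "sensitivity": 0.67, "specificity": 0.95},
    "HLA:B*27:05": {"disease": "AS", "odds_ratio": 87.4},
    "Activity:DAS28-CRP": {"disease": "RA"},
    "Activity:DAS28-ESR": {"disease": "RA"},
}

def resolve_entities(query):
    """Return knowledge base entries for every alias found in the query."""
    lowered = query.lower()
    resolved = {}
    for alias, keys in ALIASES.items():
        if alias in lowered:
            for key in keys:
                # Keyed by canonical name, so overlapping aliases deduplicate.
                resolved[key] = KNOWLEDGE_BASE[key]
    return resolved

found = resolve_entities("Is B27 relevant given a positive anti-CCP?")
```

Plain substring matching is the simplest option but can over-match short aliases; a production resolver would likely use token boundaries or a curated synonym table.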

4.3 Context Functions

Knowledge base entries are formatted as structured context blocks injected into the LLM prompt. For example, when a query mentions HLA-B*27:05, the following context is injected:

[HLA Association Context]
Allele: HLA-B*27:05
Disease: Ankylosing Spondylitis
Odds Ratio: 87.4 (PMID:25603694)
Note: Strongest known HLA-disease association
Clinical Implication: In a patient with inflammatory back pain and
sacroiliitis, HLA-B*27:05 positivity strongly supports AS diagnosis.

Similarly, autoantibody context includes sensitivity and specificity data, biologic therapy context includes pharmacogenomic considerations and monitoring requirements, and disease activity context includes validated score thresholds.
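
A context function for the HLA block shown above could look like the following. The entry fields mirror the example block; the function name and dictionary keys are hypothetical.

```python
def format_hla_context(entry):
    """Render one HLA knowledge base entry as a prompt-ready context block."""
    lines = [
        "[HLA Association Context]",
        f"Allele: {entry['allele']}",
        f"Disease: {entry['disease']}",
        f"Odds Ratio: {entry['odds_ratio']} (PMID:{entry['pmid']})",
        f"Note: {entry['note']}",
        f"Clinical Implication: {entry['clinical_implication']}",
    ]
    return "\n".join(lines)

block = format_hla_context({
    "allele": "HLA-B*27:05",
    "disease": "Ankylosing Spondylitis",
    "odds_ratio": 87.4,
    "pmid": "25603694",
    "note": "Strongest known HLA-disease association",
    "clinical_implication": (
        "In a patient with inflammatory back pain and sacroiliitis, "
        "HLA-B*27:05 positivity strongly supports AS diagnosis."
    ),
})
```

Keeping the block as labeled key-value lines makes it easy for the LLM to quote values verbatim and for reviewers to trace each value back to its knowledge base field.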

5. Query Expansion and Disease Area Detection

5.1 The Recall Problem

A fundamental challenge in biomedical RAG is recall -- ensuring that semantically relevant documents are retrieved even when the query uses different terminology than the stored text. A clinician asking about "morning stiffness in the hands" expects results about rheumatoid arthritis, even if the stored documents use terms like "synovitis," "DAS28," or "anti-CCP." Pure vector similarity can miss these connections when the embedding model has not seen sufficient co-occurrence of these terms during training.

5.2 Keyword-Based Disease Detection

The system implements keyword-based disease area detection that maps query terms to the 13 supported autoimmune diseases. Each disease has a curated vocabulary of roughly 5-10 keywords and phrases:

Rheumatoid Arthritis: "rheumatoid," "anti-ccp," "das28," "joint swelling," "synovitis," "morning stiffness," "erosive arthritis," "citrullinated"
Systemic Lupus: "lupus," "anti-dsdna," "sledai," "butterfly rash," "nephritis," "complement," "photosensitivity," "malar rash," "discoid"
Ankylosing Spondylitis: "ankylosing," "spondylitis," "basdai," "hla-b27," "sacroiliitis," "axial spondyloarthritis," "bamboo spine," "uveitis"
Multiple Sclerosis: "multiple sclerosis," "demyelinating," "oligoclonal," "relapsing remitting," "interferon beta"
Sjogren's Syndrome: "sjogren," "dry eyes," "dry mouth," "schirmer," "anti-ssa," "anti-ssb," "sicca"
Systemic Sclerosis: "scleroderma," "raynaud," "anti-scl-70," "anti-centromere," "skin thickening," "pulmonary fibrosis," "crest"
IBD: "crohn," "colitis," "ibd," "calprotectin," "ulcerative," "inflammatory bowel," "fistula"
Psoriasis: "psoriasis," "pasi," "plaque," "il-17," "biologics psoriasis"

The remaining five diseases (type 1 diabetes, myasthenia gravis, celiac disease, Graves' disease, and Hashimoto's thyroiditis) have analogous vocabularies. When disease-specific keywords are detected, the system adjusts collection search weights to prioritize the most relevant data domains.
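
Keyword-based detection of this kind reduces to a case-insensitive lookup. The sketch below uses three of the vocabularies listed above; the dictionary keys and the substring matching strategy are assumptions.

```python
DISEASE_KEYWORDS = {
    "rheumatoid_arthritis": [
        "rheumatoid", "anti-ccp", "das28", "joint swelling", "synovitis",
        "morning stiffness", "erosive arthritis", "citrullinated",
    ],
    "systemic_lupus": [
        "lupus", "anti-dsdna", "sledai", "butterfly rash", "nephritis",
        "complement", "photosensitivity", "malar rash", "discoid",
    ],
    "ankylosing_spondylitis": [
        "ankylosing", "spondylitis", "basdai", "hla-b27", "sacroiliitis",
        "axial spondyloarthritis", "bamboo spine", "uveitis",
    ],
}

def detect_disease_areas(query):
    """Return {disease: [matched keywords]} for diseases with at least one match."""
    lowered = query.lower()
    detected = {}
    for disease, keywords in DISEASE_KEYWORDS.items():
        matches = [kw for kw in keywords if kw in lowered]
        if matches:
            detected[disease] = matches
    return detected

areas = detect_disease_areas("Morning stiffness with positive anti-CCP and rising DAS28")
```

A query with no matches returns an empty dictionary, which corresponds to the general-query case where the default weights are kept.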

5.3 Multi-Collection Weight Adjustment

When a query is classified as disease-specific, the base collection weights (Section 3.2) are dynamically adjusted. For example, a query about SLE flare management would increase weights for:

autoimmune_flare_patterns (SLE-specific biomarker patterns: anti-dsDNA titer, complement C3/C4, lymphocyte count)
autoimmune_disease_activity (SLEDAI-2K scoring)
autoimmune_autoantibody_panels (anti-dsDNA, anti-Smith, ANA)
autoimmune_biologic_therapies (belimumab, rituximab, hydroxychloroquine)

This dynamic weighting ensures that disease-specific queries retrieve the most clinically relevant evidence, while general queries (e.g., "compare TNF inhibitors across autoimmune diseases") maintain the default weight distribution.
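
One simple way to implement such an adjustment is to multiply the weights of the prioritized collections by a boost factor and renormalize so the total remains 1.0. The boost factor of 1.5 and the function name are assumptions, and only a subset of the collections is shown for brevity.

```python
def adjust_weights(base_weights, boosted, factor=1.5):
    """Boost selected collections, then renormalize so weights sum to 1.0."""
    raw = {
        name: weight * (factor if name in boosted else 1.0)
        for name, weight in base_weights.items()
    }
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}

# Illustrative subset of the base weights from Section 3.2.
subset = {
    "autoimmune_clinical_documents": 0.18,
    "autoimmune_flare_patterns": 0.06,
    "autoimmune_disease_activity": 0.07,
    "autoimmune_literature": 0.05,
}
baseline = adjust_weights(subset, boosted=set())  # renormalized, no boost
sle_flare = adjust_weights(
    subset,
    boosted={"autoimmune_flare_patterns", "autoimmune_disease_activity"},
)
```

Renormalizing means that boosting some collections implicitly lowers the share of the others, so relative priorities shift without changing the overall score scale.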

6. Data Integration

6.1 Clinical Document Ingestion (PDF Pipeline)

The system includes a PDF clinical document ingestion pipeline that processes patient records into searchable vector embeddings. The pipeline:

Extracts text from uploaded PDF files at 200 DPI resolution
Chunks text into segments of up to 2,500 characters with 200-character overlap to preserve context across chunk boundaries
Extracts metadata including patient ID, document type (progress note, lab report, imaging, pathology), specialty, provider, and visit date
Embeds chunks using bge-small-en-v1.5 in batches of 32
Inserts vectors into the autoimmune_clinical_documents collection with full metadata
Builds timeline events in the autoimmune_patient_timelines collection, tracking the diagnostic odyssey

The system supports PDF uploads up to 50 MB via the FastAPI endpoint, with input validation and sanitization of filter values to prevent injection attacks.
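
The chunking step can be sketched as a sliding window over the extracted text. The 2,500-character size and 200-character overlap come from the pipeline description above; the function itself is a minimal character-based illustration and ignores sentence or section boundaries, which a real pipeline may respect.

```python
def chunk_text(text, max_chars=2500, overlap=200):
    """Split text into chunks of up to max_chars, each sharing `overlap`
    characters with the previous chunk."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    step = max_chars - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks

# Synthetic 6,000-character document for demonstration.
sample = "".join(str(i % 10) for i in range(6000))
chunks = chunk_text(sample)
```

With these settings each new chunk starts 2,300 characters after the previous one, so a finding that straddles a boundary still appears intact in at least one chunk as long as it is shorter than the overlap.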

6.2 Demo Patient Data (9 Patients)

The system ships with 9 demo patients representing diverse autoimmune presentations, each with a full set of synthetic clinical PDF documents:

Patient	Demographics	Primary Diagnosis	PDFs	Date Range	Key Features
Sarah Mitchell	34F	SLE / Lupus Nephritis	35	2022-2025	Class IV nephritis, anti-dsDNA/complement monitoring, belimumab response
Maya Rodriguez	28F	POTS/hEDS/MCAS Triad	28	2021-2025	Diagnostic odyssey across 5+ specialties, dismissed symptoms, eventual diagnosis
David Park	45M	Ankylosing Spondylitis	26	2019-2025	HLA-B*27:05 positive, BASDAI/ASDAS tracking, TNF inhibitor therapy
Linda Chen	--	Sjogren's Syndrome	20	--	Anti-SSA/SSB positive, ESSDAI monitoring, glandular involvement
Rachel Thompson	--	MCTD (Mixed Connective Tissue Disease)	22	--	High-titer anti-U1 RNP, overlap features of SLE/SSc/myositis
Emma Williams	--	(additional case)	--	--	Additional autoimmune presentation
James Cooper	--	(additional case)	--	--	Additional autoimmune presentation
Karen Foster	--	(additional case)	--	--	Additional autoimmune presentation
Michael Torres	--	(additional case)	--	--	Additional autoimmune presentation

These demo patients provide a realistic clinical dataset for demonstrating the system's diagnostic capabilities, timeline analysis, and treatment recommendation features.

6.3 Curated Seed Data

Beyond clinical documents, the system populates its reference collections with curated seed data derived from authoritative sources:

Autoantibody panels: 24 autoantibodies with disease associations, sensitivity, specificity, and interpretation guides sourced from clinical immunology references
HLA associations: 22 alleles with disease-specific odds ratios sourced from the HLA Disease Association Database and validated against published meta-analyses
Biologic therapies: 22 drugs with mechanisms, indications, pharmacogenomic considerations, contraindications, and monitoring requirements sourced from FDA labeling and CPIC guidelines
Disease activity scores: 20 scoring systems with validated thresholds sourced from original validation studies
Flare patterns: 13 disease-specific biomarker pattern sets with thresholds and protective signals compiled from longitudinal cohort studies
Classification criteria: 10 ACR/EULAR criteria sets with scoring algorithms sourced from official ACR/EULAR publications

7. Diagnostic Engine

7.1 Classification Criteria Evaluation

The diagnostic engine implements programmatic evaluation of ACR/EULAR classification criteria for supported diseases. For each disease, the system encodes the official criteria point system and evaluates available patient data against it. Examples:

2010 ACR/EULAR RA Classification: A patient with 4-10 small joint involvement (3 points), high-positive anti-CCP (3 points), abnormal CRP (1 point), and symptom duration >= 6 weeks (1 point) scores 8/10, exceeding the >= 6 threshold for definite RA classification.

2019 ACR/EULAR SLE Classification: Entry criterion is ANA >= 1:80. The point system evaluates domains including constitutional (fever: 2 points), hematologic (thrombocytopenia: 4 points), neuropsychiatric (seizure: 5 points), mucocutaneous (acute cutaneous: 6 points), serosal (acute pericarditis: 6 points), musculoskeletal (joint involvement: 6 points), renal (class III/IV nephritis: 10 points), and immunology (anti-dsDNA or anti-Smith: 6 points). A score >= 10 with positive ANA supports SLE classification.
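
The RA example above can be reproduced with a small scoring function. The point values follow the published 2010 ACR/EULAR RA criteria; the function signature and argument encoding are hypothetical, and the sketch omits the criteria's entry requirement (at least one joint with clinical synovitis not better explained by another disease).

```python
SEROLOGY_POINTS = {"negative": 0, "low": 2, "high": 3}

def score_ra_2010(small_joints, large_joints, serology,
                  abnormal_acute_phase, duration_weeks):
    """Return the 2010 ACR/EULAR RA classification score (0-10)."""
    total_joints = small_joints + large_joints
    if total_joints > 10 and small_joints >= 1:
        joints = 5   # more than 10 joints, at least one small joint
    elif small_joints >= 4:
        joints = 3   # 4-10 small joints
    elif small_joints >= 1:
        joints = 2   # 1-3 small joints
    elif large_joints >= 2:
        joints = 1   # 2-10 large joints
    else:
        joints = 0   # at most 1 large joint
    acute_phase = 1 if abnormal_acute_phase else 0   # abnormal CRP or ESR
    duration = 1 if duration_weeks >= 6 else 0
    return joints + SEROLOGY_POINTS[serology] + acute_phase + duration

# The worked example from the text: 4-10 small joints, high-positive anti-CCP,
# abnormal CRP, symptoms for at least 6 weeks.
score = score_ra_2010(small_joints=6, large_joints=0, serology="high",
                      abnormal_acute_phase=True, duration_weeks=8)
meets_threshold = score >= 6
```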

7.2 Differential Diagnosis Generation

The diagnostic engine generates differential diagnoses by evaluating a patient's clinical data against all 13 supported autoimmune diseases simultaneously. The system:

Collects all available autoantibody results and calculates disease-specific likelihood scores based on sensitivity and specificity
Evaluates HLA genotype data against disease association odds ratios
Matches clinical features against classification criteria for each disease
Ranks candidate diagnoses by integrated score
Identifies the most discriminating tests that would differentiate between the top candidates

This approach is particularly valuable for patients early in their diagnostic odyssey, where clinical features are evolving and may not yet meet full classification criteria for any single disease.
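
One standard way to turn sensitivity and specificity into a disease-specific score is the positive likelihood ratio combined with a pre-test probability. The sketch below illustrates that calculation for anti-CCP using the values quoted in Section 4.2; it is an assumption that the engine uses likelihood ratios, and the 20% pre-test probability is purely illustrative.

```python
def positive_lr(sensitivity, specificity):
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    if specificity >= 1.0:
        raise ValueError("specificity of 1.0 gives an unbounded likelihood ratio")
    return sensitivity / (1.0 - specificity)

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

lr_anti_ccp = positive_lr(sensitivity=0.67, specificity=0.95)
posterior = post_test_probability(0.20, lr_anti_ccp)  # illustrative pre-test value
```

Here the likelihood ratio is about 13.4, which moves an assumed 20% pre-test probability of RA to roughly 77%. Ranking candidate diseases by such posteriors is one plausible reading of the "integrated score" described above.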

7.3 Overlap Syndrome Detection

The system monitors for overlap syndromes -- clinical scenarios where features of multiple autoimmune diseases coexist. The knowledge base includes 9 recognized overlap syndromes:

Mixed Connective Tissue Disease (MCTD): Overlap of SLE, SSc, and polymyositis features with high-titer anti-U1 RNP
Rhupus: RA-SLE overlap with erosive arthritis and lupus serological features
POTS/hEDS/MCAS Triad: Postural orthostatic tachycardia, hypermobile Ehlers-Danlos, and mast cell activation syndrome
SSc-Myositis Overlap: Systemic sclerosis with inflammatory myopathy features (anti-PM-Scl antibody)
Antisynthetase Syndrome: Myositis, ILD, mechanic's hands, Raynaud's (anti-Jo-1 antibody)

The knowledge base also includes four additional recognized overlap patterns. When a patient's data triggers features across multiple disease criteria sets, the system generates an overlap syndrome alert with shared pathogenic mechanisms and recommended additional testing.

8. Evidence Quality and Citation Provenance

8.1 Citation Relevance Scoring

Every evidence item retrieved from Milvus collections receives a relevance score based on cosine similarity. The system classifies citations into three tiers:

Tier	Score Threshold	Display
High	>= 0.80	Highlighted, presented first
Medium	>= 0.60	Standard presentation
Low	< 0.60	Available but de-emphasized

Only items scoring above the global threshold of 0.40 (the SCORE_THRESHOLD parameter) are included in the evidence set, so the Low tier in practice covers scores above 0.40 and below 0.60. TOP_K_PER_COLLECTION limits retrieval to 5 items per collection (at most 70 candidates across the 14 collections), and MAX_EVIDENCE_ITEMS caps the final evidence set at 30 items.
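
The tiering and filtering rules can be expressed compactly. The thresholds and parameter names below come from this section; the `select_evidence` helper and the hit dictionary layout are hypothetical.

```python
SCORE_THRESHOLD = 0.40
MAX_EVIDENCE_ITEMS = 30

def relevance_tier(score):
    """Classify a cosine similarity into the three display tiers."""
    if score >= 0.80:
        return "high"
    if score >= 0.60:
        return "medium"
    return "low"

def select_evidence(hits):
    """Drop hits at or below the global threshold, rank, cap, and label tiers."""
    kept = [h for h in hits if h["score"] > SCORE_THRESHOLD]
    kept.sort(key=lambda h: h["score"], reverse=True)
    return [dict(h, tier=relevance_tier(h["score"])) for h in kept[:MAX_EVIDENCE_ITEMS]]

evidence = select_evidence([
    {"id": "a", "score": 0.30},
    {"id": "b", "score": 0.65},
    {"id": "c", "score": 0.85},
    {"id": "d", "score": 0.45},
])
```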

8.2 Citation Format Types

The system uses structured citation formats that provide both human-readable identification and machine-parseable provenance:

Citation Format	Example	Domain
`[AutoAb:name]`	[AutoAb:anti-CCP]	Autoantibody reference data
`[HLA:allele]`	[HLA:B*27:05]	HLA-disease association
`[Activity:score_name]`	[Activity:DAS28-CRP]	Disease activity scoring
`[Therapy:drug]`	[Therapy:Adalimumab]	Biologic therapy recommendation
`[Literature:PMID]`	[Literature:25603694]	Published research (clickable)
`[Trial:NCT_ID]`	[Trial:NCT04567890]	Clinical trial (clickable)

Literature and clinical trial citations include hyperlinks to PubMed and ClinicalTrials.gov respectively, enabling one-click verification of source material.
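Because the citation tags follow a regular `[Type:identifier]` shape, they lend themselves to parsing with a regular expression. The sketch below assumes that shape and uses the public PubMed and ClinicalTrials.gov URL patterns; the function name and the returned dictionary layout are hypothetical, and the links the system actually emits may differ.

```python
import re

# The six citation types listed in the table above.
CITATION_RE = re.compile(
    r"\[(AutoAb|HLA|Activity|Therapy|Literature|Trial):([^\]]+)\]"
)

# Only literature and trial citations are described as clickable.
LINK_TEMPLATES = {
    "Literature": "https://pubmed.ncbi.nlm.nih.gov/{id}/",
    "Trial": "https://clinicaltrials.gov/study/{id}",
}


def parse_citations(text):
    """Extract citation tags from text and attach a URL where one applies."""
    citations = []
    for kind, ident in CITATION_RE.findall(text):
        template = LINK_TEMPLATES.get(kind)
        url = template.format(id=ident) if template else None
        citations.append({"type": kind, "id": ident, "url": url})
    return citations


sample = "See [HLA:B*27:05], [Literature:25603694], and [Trial:NCT04567890]."
parsed = parse_citations(sample)
```

The identifier pattern stops at the closing bracket rather than at a colon, so HLA allele names such as `B*27:05` are captured whole.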

8.3 Multi-Format Export (Markdown, FHIR R4, PDF)

The system supports three export formats for clinical reports:

Markdown: Structured text with headers, tables, and citation links suitable for clinical documentation and collaboration tools
FHIR R4: HL7 FHIR R4-compliant JSON resources (DiagnosticReport, Observation, MedicationRequest) for integration with standards-based health information systems
PDF: Formatted clinical reports suitable for printing and formal medical record inclusion

All three formats preserve full citation provenance, ensuring that the evidence chain from source data to clinical recommendation is traceable regardless of output format.
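As an illustration of the FHIR R4 export, the sketch below builds a minimal DiagnosticReport as a Python dictionary. The element names (`resourceType`, `status`, `code`, `subject`, `conclusion`) are standard DiagnosticReport elements; the function name, the choice to carry citation tags inside `conclusion`, the `preliminary` status, and the patient identifier are assumptions rather than the system's documented behavior.

```python
import json


def build_diagnostic_report(patient_id, conclusion, citations):
    """Return a minimal FHIR R4 DiagnosticReport as a JSON-serializable dict.

    Citation tags are appended to the conclusion text so provenance survives
    the export (an assumed convention for this sketch).
    """
    text = conclusion
    if citations:
        text = conclusion + " " + " ".join(citations)
    return {
        "resourceType": "DiagnosticReport",
        "status": "preliminary",
        "code": {"text": "Autoimmune clinical intelligence report"},
        "subject": {"reference": f"Patient/{patient_id}"},
        "conclusion": text,
    }


report = build_diagnostic_report(
    "demo-001",  # hypothetical patient identifier
    "Findings are consistent with SLE.",
    ["[AutoAb:anti-dsDNA]"],
)
payload = json.dumps(report, indent=2)
```

A production export would also need coded values (for example LOINC in `code`) and linked Observation resources, which this sketch omits.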

9. Hardware Democratization

9.1 The Cost Barrier

Clinical decision-support systems for autoimmune disease have historically required significant computational infrastructure. Cloud-based NLP pipelines for clinical document processing can cost $10,000-50,000 per month at scale. Commercial clinical intelligence platforms charge $50,000-100,000 per year for institutional licenses. This cost barrier excludes community rheumatology practices, academic medical centers in resource-limited settings, and researchers in developing countries where autoimmune disease burden is increasing but diagnostic infrastructure remains limited.

9.2 The DGX Spark Platform

The Precision Autoimmune Intelligence Agent runs on a single NVIDIA DGX Spark, a desktop-class AI computing system. The DGX Spark provides sufficient GPU memory and compute to run the full stack: Milvus vector database, bge-small-en-v1.5 embedding model, PDF document processing, 14-collection parallel search, and the Streamlit UI -- all locally, without cloud connectivity requirements.

Key infrastructure components:

Milvus (port 19530): Vector database with etcd for metadata and MinIO for object storage
FastAPI (port 8532): REST API server with 14 endpoints
Streamlit (port 8531): 10-tab clinical interface
Embedding model: BAAI/bge-small-en-v1.5 (384 dimensions, ~130 MB model size)
LLM: Claude (Anthropic API) for evidence synthesis

9.3 Local vs. Cloud Architecture

The system is designed for local-first operation. Patient clinical documents, laboratory results, and genetic data remain on the local DGX Spark and are never transmitted to external servers. The only external API call is to Anthropic's Claude for evidence synthesis, and this call contains only the aggregated evidence context and query -- not raw patient records. This architecture addresses the fundamental tension between AI-powered clinical intelligence and patient data privacy, enabling institutions to deploy sophisticated autoimmune decision support without routing protected health information through cloud infrastructure.

10. Results and Capabilities

10.1 Performance Metrics

Metric	Value
Vector collections	14
Autoimmune diseases covered	13
Autoantibodies with sensitivity/specificity	24
HLA alleles with odds ratios	22
Biologic therapies with PGx data	22
Disease activity scoring systems	20
Flare biomarker patterns	13
Classification criteria sets (ACR/EULAR)	10
Overlap syndromes	9
Lab test patterns	45
Demo patients with clinical PDFs	9
Automated tests	455 (across 8 test files)
REST API endpoints	14
Streamlit UI tabs	10
Embedding dimensions	384
Score threshold	0.40
Top-K per collection	5
Max evidence items	30
Knowledge base version	2.0.0 (updated 2026-03-10)

10.2 Query Capabilities (Example Queries)

The system supports a wide range of clinical queries spanning diagnostic, therapeutic, and analytical domains:

Diagnostic queries:

"Patient has positive ANA (1:640 homogeneous), anti-dsDNA 1:320, low C3/C4, and proteinuria 1.2g/24h. Evaluate for SLE and assess lupus nephritis risk."
"28-year-old female with 3 years of fatigue, joint pain, dry eyes, and recurrent oral ulcers. ANA speckled 1:320, anti-SSA positive. Differential diagnosis?"
"HLA-B*27:05 positive male with chronic lower back pain and morning stiffness >30 minutes. BASDAI 6.2. Assess for ankylosing spondylitis."

Therapeutic queries:

"Patient with RA (DAS28-CRP 5.4) failed methotrexate and adalimumab. CYP3A4 normal metabolizer. What are the next biologic options?"
"SLE patient with SLEDAI-2K of 14 and rising anti-dsDNA. Currently on hydroxychloroquine and mycophenolate. Should we add belimumab?"
"Compare TNF inhibitors for ankylosing spondylitis: adalimumab vs. etanercept vs. infliximab -- efficacy, safety, and PGx considerations."

Analytical queries:

"Analyze Sarah Mitchell's diagnostic timeline. When did the first lupus signs appear and how long until diagnosis?"
"What overlap syndrome features does Rachel Thompson's MCTD presentation include?"
"Which HLA alleles confer protective effects against type 1 diabetes?"

10.3 Clinical Analysis Capabilities

The system performs integrated clinical analyses that would traditionally require manual synthesis by an experienced specialist:

Longitudinal biomarker trend analysis: Tracking anti-dsDNA titers, complement levels, inflammatory markers, and organ-specific labs over time to identify disease flare trajectories
Multi-criteria differential diagnosis: Simultaneously evaluating patient data against classification criteria for all 13 diseases to rank diagnostic probabilities
Pharmacogenomically-informed therapy selection: Matching drug mechanisms to patient genotype (HLA, CYP450, FCGR3A) to predict efficacy and adverse event risk
Diagnostic odyssey visualization: Building patient timelines that reveal the often-years-long path from first symptom to correct diagnosis, identifying missed opportunities and diagnostic delays
Cross-disease pattern recognition: Detecting overlap syndrome features that may be missed when each specialty evaluates only its own disease domain
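To make the longitudinal trend analysis concrete, the sketch below fits an ordinary least-squares slope to dated biomarker values. It is one plausible way to quantify a trajectory, not necessarily the method the agent uses; the function name and the anti-dsDNA values are hypothetical.

```python
from datetime import date


def trend_slope(observations):
    """Least-squares slope, in units per day, for a list of (date, value) pairs."""
    if len(observations) < 2:
        raise ValueError("need at least two observations")
    t0 = min(d for d, _ in observations)
    xs = [(d - t0).days for d, _ in observations]
    ys = [v for _, v in observations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("observations must span more than one date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical anti-dsDNA titers drawn 30 days apart.
anti_dsdna = [
    (date(2025, 1, 1), 40.0),
    (date(2025, 1, 31), 70.0),
    (date(2025, 3, 2), 100.0),
]
slope = trend_slope(anti_dsdna)  # rise of 1.0 unit per day for this series
```

A positive slope on anti-dsDNA paired with a negative slope on C3/C4 is the kind of combined pattern a flare detector could look for; any cutoff for "rising" would need clinical calibration.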

11. Integration with the HCLS AI Factory

11.1 The Three-Stage Pipeline

The Precision Autoimmune Intelligence Agent operates as a specialized component within the HCLS AI Factory, an end-to-end precision medicine platform with three core pipeline stages:

Genomics Pipeline (genomics-pipeline/): Processes raw FASTQ sequencing data through Parabricks/DeepVariant/BWA-MEM2 to produce annotated VCF files. GPU-accelerated alignment and variant calling reduce processing time from 24-48 hours (CPU) to 120-240 minutes (GPU).
RAG/Chat Pipeline (rag-chat-pipeline/): Milvus-based vector search with Claude AI for variant interpretation, clinical query answering, and evidence synthesis. This is the shared infrastructure layer that all intelligence agents build upon.
Drug Discovery Pipeline (drug-discovery-pipeline/): BioNeMo MolMIM for molecular generation, DiffDock for protein-ligand docking, and RDKit for chemical property analysis. Takes identified therapeutic targets and generates candidate drug molecules.

The autoimmune agent extends the RAG/Chat Pipeline with 14 disease-specific collections and a clinical intelligence layer.

11.2 The Genomic Evidence Bridge

The genomic_evidence collection (collection #14, weight 0.02) serves as a read-only bridge between the genomics pipeline and the autoimmune agent. When a patient's genome is processed through the genomics pipeline, disease-relevant variants -- particularly HLA alleles and autoimmune susceptibility loci -- are stored in this shared collection. The autoimmune agent can then incorporate genomic evidence into its clinical analyses without duplicating variant data.

For example, when the genomics pipeline identifies that a patient carries HLA-B*27:05, this information flows into the autoimmune agent's HLA analysis stage, where it is cross-referenced against the 22-allele knowledge base to assess ankylosing spondylitis risk (OR=87.4).
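The cross-referencing step in this example can be sketched as a lookup against an allele table. Only the HLA-B*27:05 entry and its odds ratio come from the text; the table layout, the function name, and the second allele in the usage example are hypothetical, and the real knowledge base holds 22 alleles.

```python
# One-row stand-in for the 22-allele knowledge base (layout is assumed).
HLA_ASSOCIATIONS = {
    "B*27:05": [{"disease": "ankylosing spondylitis", "odds_ratio": 87.4}],
}


def assess_hla_risk(alleles, knowledge_base=HLA_ASSOCIATIONS):
    """Return known disease associations for a patient's HLA alleles,
    strongest odds ratio first. Alleles without an entry are skipped."""
    findings = []
    for allele in alleles:
        key = allele.removeprefix("HLA-")  # requires Python 3.9+
        for assoc in knowledge_base.get(key, []):
            findings.append({"allele": key, **assoc})
    return sorted(findings, key=lambda f: f["odds_ratio"], reverse=True)


# HLA-A*01:01 has no entry in this one-row table, so it yields no finding.
findings = assess_hla_risk(["HLA-B*27:05", "HLA-A*01:01"])
```

Keeping the lookup read-only mirrors the bridge design described above: the agent consumes genotype calls from the shared collection without modifying them.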

11.3 Cross-Agent Communication

The autoimmune agent communicates with other HCLS AI Factory intelligence agents via an event bus:

Biomarker Agent: Shares inflammatory marker trends (CRP, ESR, IL-6) and autoantibody titer trajectories that inform autoimmune disease monitoring
Imaging Agent: Receives imaging findings (joint erosions, sacroiliac inflammation, brain demyelination) that contribute to diagnostic criteria evaluation
Oncology Agent: Coordinates on immune-related adverse events (irAEs) from checkpoint inhibitor therapy, which can trigger autoimmune manifestations
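The source does not specify how the event bus is implemented. As a minimal in-process sketch of the publish/subscribe pattern it implies, with a hypothetical `EventBus` class and hypothetical topic names:

```python
from collections import defaultdict


class EventBus:
    """Tiny synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to receive payloads published on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        """Deliver payload to every handler on the topic; return the count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
received = []

# The autoimmune agent listens for findings from the Imaging Agent.
bus.subscribe("imaging.finding", received.append)
delivered = bus.publish(
    "imaging.finding",
    {"finding": "sacroiliac inflammation", "source": "imaging-agent"},
)
```

A deployed platform would more likely use a message broker with persistence and delivery guarantees; the in-process version only shows the decoupling between producing and consuming agents.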

11.4 Platform Consistency

The autoimmune agent follows HCLS AI Factory platform conventions:

Shared library: Uses lib/hcls_common/ modules for configuration, Milvus connectivity, LLM integration, and security
Embedding model: Same BAAI/bge-small-en-v1.5 (384 dimensions) used across all agents for vector space consistency
Vector database: Milvus with COSINE metric and IVF_FLAT indexing, shared infrastructure at port 19530
Health monitoring: Integrated with the platform's Prometheus + Grafana monitoring stack
Landing page: Accessible via the Flask hub at port 8080
Docker: Deployable via docker-compose.dgx-spark.yml alongside all other platform services
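The shared vector settings listed above can be written down as plain configuration. The index type, metric, and embedding dimension come from the text; the `nlist` and `nprobe` values are assumptions chosen for illustration, since the source does not state them.

```python
EMBEDDING_DIM = 384  # bge-small-en-v1.5 output size, as stated in the text

# Index settings named in the text; nlist is an assumed value.
INDEX_PARAMS = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 128},
}

# Search-time settings; nprobe is also assumed and should not exceed nlist.
SEARCH_PARAMS = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16},
}
```

Dictionaries of this shape are what pymilvus's `Collection.create_index` and `Collection.search` take for their index and search parameters. Keeping one definition in a shared module is one way to enforce the cross-agent consistency described above.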

12. Discussion

12.1 Implications for Autoimmune Disease Management

The Precision Autoimmune Intelligence Agent demonstrates that multi-collection RAG architectures can effectively address the data fragmentation problem in autoimmune disease management. By organizing clinical knowledge into 14 domain-specific collections with purpose-built schemas, the system preserves the structure and metadata that are lost in monolithic vector databases. A lab result is not just text -- it has a value, a unit, a reference range, a flag, and a collection date. An HLA association is not just a paragraph -- it has an allele, a disease, an odds ratio, and a PMID. By encoding this structure in Milvus collection schemas, the system can perform filtering, sorting, and structured reasoning that pure text-based retrieval cannot.
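The difference between structured records and plain text can be illustrated with a small sketch. The field list follows the lab-result description above; the class, the derived `flag` property, the helper function, and the sample values and reference ranges are hypothetical and do not reproduce the agent's Milvus schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LabResult:
    """A lab result with the structured fields named in the text."""
    test_name: str
    value: float
    unit: str
    ref_low: float
    ref_high: float
    collected: date

    @property
    def flag(self):
        """'L' below range, 'H' above range, 'N' within range."""
        if self.value < self.ref_low:
            return "L"
        if self.value > self.ref_high:
            return "H"
        return "N"


def abnormal_since(results, since):
    """Out-of-range results collected on or after `since`, oldest first."""
    hits = [r for r in results if r.flag != "N" and r.collected >= since]
    return sorted(hits, key=lambda r: r.collected)


# Illustrative values and reference ranges only.
labs = [
    LabResult("C3", 62.0, "mg/dL", 90.0, 180.0, date(2025, 2, 10)),
    LabResult("CRP", 2.0, "mg/L", 0.0, 10.0, date(2025, 2, 10)),
    LabResult("C3", 58.0, "mg/dL", 90.0, 180.0, date(2024, 6, 1)),
]
recent_abnormal = abnormal_since(labs, date(2025, 1, 1))
```

A query such as "low complement in the last year" becomes a filter on typed fields here, whereas a text-only index would have to infer the value, range, and date from prose.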

12.2 The Diagnostic Odyssey Problem

The autoimmune diagnostic odyssey -- the average 4+ years and 4+ specialists before correct diagnosis -- represents a systemic failure of clinical information integration, not a failure of individual clinician knowledge. Each specialist may correctly identify findings within their domain but lack visibility into the broader pattern. The rheumatologist sees the joint symptoms but may not know about the dermatologist's psoriasis diagnosis. The neurologist evaluates the demyelinating lesions but may not connect them to the ophthalmologist's uveitis finding that suggests a systemic inflammatory process.

The Precision Autoimmune Intelligence Agent addresses this by providing a unified view across all clinical data domains. When a patient's records are ingested, the system can identify cross-specialty patterns that suggest overlap syndromes or alternative diagnoses. The timeline builder visualizes the diagnostic odyssey, revealing the temporal relationships between symptoms, specialist visits, tests, and diagnoses that are invisible in a traditional chart review.

The demo patient Maya Rodriguez exemplifies this challenge: a 28-year-old woman with the POTS/hEDS/MCAS triad who saw 5+ specialists over multiple years before her symptoms were connected into a coherent diagnostic framework. The system's ability to surface cross-domain patterns in her 28 clinical PDFs demonstrates the practical value of integrated analysis.

12.3 Democratization of Clinical Intelligence

By running on a single NVIDIA DGX Spark and releasing under the Apache 2.0 license, the system democratizes access to autoimmune clinical intelligence. A community rheumatology practice, a medical school, or a hospital in a resource-limited setting can deploy the full system without cloud infrastructure costs or commercial license fees. The local-first architecture ensures that patient data remains on-premises, addressing privacy concerns that often prevent adoption of cloud-based clinical AI tools.

12.4 Limitations

Several limitations should be acknowledged:

Knowledge base scope: The knowledge base covers 13 autoimmune diseases. Rarer conditions (e.g., Behcet's, ANCA-associated vasculitis, eosinophilic GPA) are partially represented through autoantibody and HLA data but lack full criteria evaluation and flare pattern support.
External LLM dependency: Evidence synthesis requires an API call to Anthropic's Claude. While only aggregated evidence context is transmitted (not raw patient data), this represents an external dependency that may be unacceptable in certain security environments.
Classification criteria as approximation: The programmatic criteria evaluation is a simplification of clinical criteria that often require expert judgment on individual items (e.g., distinguishing "acute cutaneous lupus" from other rashes).
Temporal reasoning: While the system tracks longitudinal biomarker trends, its ability to reason about complex temporal patterns (e.g., the significance of a 6-month gap between symptom onset and first specialist visit) is limited by the LLM's general temporal reasoning capabilities.
Validation scope: The 455 automated tests validate system functionality but do not constitute clinical validation against gold-standard diagnostic outcomes. Prospective clinical studies would be required to assess diagnostic accuracy in real-world settings.
Single-language: The system currently operates in English only, limiting its utility in multilingual clinical environments.

13. Conclusion

13.1 Key Contributions

This paper presents the Precision Autoimmune Intelligence Agent, a multi-collection RAG system that makes the following contributions:

Multi-collection architecture for autoimmune clinical data: A 14-collection Milvus vector database with domain-specific schemas that preserves the structure, metadata, and clinical semantics of autoimmune data across eight distinct data categories.
Integrated diagnostic engine: Programmatic evaluation of ACR/EULAR classification criteria, multi-disease differential diagnosis generation, and overlap syndrome detection that mirrors the integrative reasoning of an experienced clinical immunologist.
Comprehensive domain knowledge base: Version 2.0.0 of a curated knowledge base containing 22 HLA alleles, 24 autoantibodies, 22 biologic therapies, 20 disease activity scores, 13 flare patterns, 10 classification criteria sets, 9 overlap syndromes, and 45 lab test patterns -- all sourced from peer-reviewed references with PMID provenance.
Clinical document ingestion pipeline: A PDF processing pipeline that extracts, chunks, embeds, and indexes clinical documents while building diagnostic odyssey timelines that visualize the often-years-long path to autoimmune diagnosis.
Pharmacogenomically-informed therapy selection: Integration of HLA genotype, CYP450 metabolism, and FCGR3A polymorphism data into biologic therapy recommendations for 22 drugs across 7 drug classes.
Hardware democratization: The complete system runs on a single NVIDIA DGX Spark, released under the Apache 2.0 license, making autoimmune clinical intelligence accessible to institutions that cannot afford commercial clinical decision-support platforms.

13.2 Future Directions

Several directions for future development are planned:

Expanded disease coverage: Adding support for ANCA-associated vasculitis, Behcet's disease, autoimmune hepatitis, primary biliary cholangitis, and other conditions to cover the full spectrum of autoimmune disease.
Clinical validation studies: Prospective evaluation of diagnostic accuracy against expert rheumatologist consensus for classification criteria evaluation, differential diagnosis, and overlap syndrome detection.
Imaging integration: Deeper integration with the Imaging Intelligence Agent for automated interpretation of joint radiographs (erosion scoring), sacroiliac MRI (SPARCC scoring), and brain MRI (lesion counting for MS).
Longitudinal learning: Incorporation of treatment outcome data to enable the system to learn from institutional experience which therapeutic strategies produce the best outcomes for specific patient profiles.
Multilingual support: Extending the system to support clinical documents and queries in Spanish, Mandarin, and other languages to serve diverse patient populations.
FHIR integration depth: Bidirectional FHIR R4 integration enabling the agent to read from and write to institutional EHR systems via SMART on FHIR, moving beyond export-only functionality.

13.3 Closing Remarks

Autoimmune diseases represent a class of conditions where the gap between available knowledge and clinical application is particularly wide. The data exists -- in autoantibody databases, HLA association studies, classification criteria publications, pharmacogenomic guidelines, and years of clinical documentation -- but it is fragmented across systems, specialties, and institutions in ways that make integrated analysis prohibitively time-consuming.

The Precision Autoimmune Intelligence Agent demonstrates that a carefully designed multi-collection RAG architecture, augmented with a curated domain knowledge base and a clinical diagnostic engine, can bridge this gap. By unifying 14 data domains into a single query interface, the system enables the kind of cross-domain reasoning that autoimmune diagnosis demands -- reasoning that currently depends on the rare clinician who has both the breadth of knowledge and the time to manually synthesize data across specialties.

The diagnostic odyssey is not an inevitable feature of autoimmune disease. It is a consequence of data fragmentation. This system represents a step toward ending it.

14. References

Aringer M, Costenbader K, Daikh D, et al. 2019 European League Against Rheumatism/American College of Rheumatology classification criteria for systemic lupus erythematosus. Ann Rheum Dis. 2019;78(9):1151-1159.
Aletaha D, Neogi T, Silman AJ, et al. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum. 2010;62(9):2569-2581.
van der Linden S, Valkenburg HA, Cats A. Assessment of the Spondylarthropathies (ASAS). Ann Intern Med. 1984;100(3):417-421.
Shiboski CH, Shiboski SC, Seror R, et al. 2016 American College of Rheumatology/European League Against Rheumatism classification criteria for primary Sjogren's syndrome. Ann Rheum Dis. 2017;76(1):9-16. PMID:27785888.
van den Hoogen F, Khanna D, Fransen J, et al. 2013 classification criteria for systemic sclerosis: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum. 2013;65(11):2737-2747. PMID:24098041.
Thompson AJ, Banwell BL, Barkhof F, et al. Diagnosis of multiple sclerosis: 2017 revisions of the McDonald criteria. Lancet Neurol. 2018;17(2):162-173. PMID:29275977.
Husby S, Koletzko S, Korponay-Szabo IR, et al. European Society for Pediatric Gastroenterology, Hepatology, and Nutrition guidelines for the diagnosis of coeliac disease. J Pediatr Gastroenterol Nutr. 2012;54(1):136-160. PMID:22197856.
Brown MA, Kenna T, Wordsworth BP. Genetics of ankylosing spondylitis -- insights into pathogenesis. Nat Rev Rheumatol. 2016;12(2):81-91. PMID:25603694.
International MHC and Autoimmunity Genetics Network. Mapping of multiple susceptibility variants within the MHC region for 7 immune-mediated diseases. Proc Natl Acad Sci USA. 2009;106(44):18680-18685. PMID:19864127.
Noble JA, Valdes AM. Genetics of the HLA region in the prediction of type 1 diabetes. Curr Diab Rep. 2011;11(6):533-542. PMID:17554300.
International Multiple Sclerosis Genetics Consortium. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature. 2011;476(7359):214-219. PMID:21833088.
Kirino Y, Bertsias G, Ishigatsubo Y, et al. Genome-wide association analysis identifies new susceptibility loci for Behcet's disease and epistasis between HLA-B51 and ERAP1. Nat Genet. 2013;45(2):202-207. PMID:22704706.
Tsoi LC, Spain SL, Knight J, et al. Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity. Nat Genet. 2012;44(12):1341-1348. PMID:23143594.
Meriggioli MN, Sanders DB. Autoimmune myasthenia gravis: emerging clinical and biological heterogeneity. Lancet Neurol. 2009;8(5):475-490. PMID:16710306.
Prevoo ML, van 't Hof MA, Kuper HH, et al. Modified disease activity scores that include twenty-eight-joint counts. Development and validation in a prospective longitudinal study of patients with rheumatoid arthritis. Arthritis Rheum. 1995;38(1):44-48. PMID:15593215.
Gladman DD, Ibanez D, Urowitz MB. Systemic Lupus Erythematosus Disease Activity Index 2000. J Rheumatol. 2002;29(2):288-291. PMID:12115176.
Garrett S, Jenkinson T, Kennedy LG, et al. A new approach to defining disease status in ankylosing spondylitis: the Bath Ankylosing Spondylitis Disease Activity Index. J Rheumatol. 1994;21(12):2286-2291. PMID:8003055.
Seror R, Ravaud P, Bowman SJ, et al. EULAR Sjogren's Syndrome Disease Activity Index: development of a consensus systemic disease activity index for primary Sjogren's syndrome. Ann Rheum Dis. 2010;69(6):1103-1109. PMID:20032223.
CPIC Guidelines for Pharmacogenomics-Guided Biologic Therapy. Clinical Pharmacogenetics Implementation Consortium. 2024.
American Diabetes Association. Standards of Medical Care in Diabetes -- 2025. Diabetes Care. 2025;48(Suppl 1).