EvoDoc

EvoDoc: ENS-powered AI turning symptoms into disease predictions with .evodoc identities.

EvoDoc

Created At

ETHGlobal New Delhi

Project Description

EvoDoc is a medical AI prototype that predicts diseases from symptoms, powered by ENS-style identities.

Normally, medical AI systems rely on dataset-specific encodings — like assigning “asthma = 0, pneumonia = 1” — but these IDs change every time you retrain or switch datasets. That makes models hard to reproduce, combine, or scale. EvoDoc solves this by introducing .evodoc identities for every disease, symptom, patient, and model.

So instead of “disease=1,” we get stable, human-readable names like asthma.evodoc linked to a permanent hex code and ICD-10 mapping. Symptoms like cough.evodoc, patients like john_28.evodoc, and even models like xgb.v1.evodoc all live in the same registry. This means models stay consistent across training runs, different datasets, or even different hospitals.

We’ve already trained modular models on respiratory and heart diseases (~90% accuracy each). Each model expands the ENS registry without breaking older ones. Predictions come back as ENS names with probabilities, codes, and categories.

In short: EvoDoc makes medical AI modular, auditable, and interoperable by giving it a naming system as robust as ENS — turning messy label encodings into portable, verifiable identities.

How it's Made

How It's Made: EvoDoc ENS Healthcare AI Building EvoDoc was like solving a complex puzzle where each piece had to fit perfectly with ENS at the center. Here's the real story of how we built this ENS-first medical AI system. The Core Challenge: Making ENS the Heart, Not an Add-On The biggest technical challenge wasn't training the AI models—it was completely rethinking how machine learning systems handle identity. Traditional ML uses arbitrary integer labels (0, 1, 2, 3...) that break everything when you want to add new categories later. We had to replace this entire paradigm with ENS-based stable identities. Technology Stack and Architecture Backend Foundation: We built everything in Python using pandas and NumPy for data manipulation, scikit-learn for machine learning, and joblib for model persistence. The choice of Logistic Regression wasn't accidental-it handles sparse 377-dimensional symptom vectors beautifully and trains fast enough for live demos. ENS Registry Engine: This was the most critical piece. We created a custom registry system that mimics ENS behavior locally, storing entities as JSON with monotonic counters that generate stable hex codes. Each medical entity gets a canonical .evodoc name, a permanent hex address (like 0x00000B), and rich metadata including ICD-10 codes. Incremental Training Pipeline: The breakthrough was designing a training system where models never retrain existing knowledge. When you train the second model, it only registers new diseases in the ENS registry while preserving all existing hex codes. This lets us combine models that were trained on completely different datasets. The Hacky Parts That Actually Work Label Encoder Replacement: We completely bypassed scikit-learn's LabelEncoder and built our own ENS-aware encoding system. Instead of arbitrary integers, diseases map to stable ENS IDs that persist across training sessions. This was surprisingly tricky because we had to handle rare classes, stratified splits, and model combination all while maintaining ID stability. Model Combination Magic: Combining models trained on different disease subsets required some creative engineering. We pad probability vectors to a common length and use ENS IDs to align outputs. The CombinedModel class averages predictions from specialized models, but only works because ENS provides the stable identity layer. Interactive Demo Flow: We designed the entire system as separate CLI scripts that judges can run step-by-step. Each script does one thing well: split.py, setup.py, train1.py, train2.py, combine.py, test.py. This wasn't just for demo purposes-it actually makes the ENS integration more transparent. Data Engineering Decisions Dataset Splitting Strategy: We manually curated disease groups (respiratory, cardiovascular, diabetes, neurological) to create meaningful splits for incremental training. This required analyzing the Kaggle dataset and grouping diseases by medical keywords, then balancing sample sizes for fair comparison. Sparse Feature Handling: With 377 binary symptom features, most patient vectors are extremely sparse. We optimized for this by keeping data as pandas DataFrames throughout the pipeline to preserve feature names and avoid sklearn warnings about unnamed arrays. Rare Class Filtering: We implemented automatic filtering of disease classes with fewer than 5 samples to prevent stratified split failures. This was essential for stable training across different dataset subsets. ENS Integration Deep Dive Stable Hex Code Generation: Each ENS entity gets a permanent hex code generated from a monotonic counter. These codes replace traditional ML label encodings and enable cross-model compatibility. The format 0x000001, 0x000002, etc. mimics blockchain addresses. Metadata Architecture: Every .evodoc entity carries rich metadata-diseases have ICD-10 codes, symptoms have descriptions, patients have profile pointers. This metadata travels with the ENS name, making the system self-documenting. Registry Persistence: The ENS registry persists as data/ens.json with atomic updates during training. We designed it to be easily portable to on-chain ENS with IPFS storage for larger metadata. Performance Optimizations Training Configuration: We tuned Logistic Regression with saga solver, 200-1000 max iterations, and parallel processing (n_jobs=-1). For larger datasets, we automatically reduce iterations and increase regularization to maintain training speed. Memory Management: Models and registries are loaded on-demand and cached appropriately. The combined model only loads individual models when needed, keeping memory footprint reasonable. Demo Responsiveness: The interactive patient flow suggests symptoms by category to help users pick inputs that align with trained models, ensuring high-confidence predictions during demos. The AI-Assisted Development Parts We used AI assistance strategically for research, code generation, and evaluation scripting. AI helped with canonicalizing medical names, generating stable hex codes, and creating evaluation frameworks for Top-K metrics. However, all the core ENS integration logic, incremental training architecture, and model combination strategies were human-designed. What Makes This Hackathon-Ready Live Demo Flow: Every step can be executed live in front of judges. Train model 1, show the ENS registry, train model 2 on a different dataset, combine without retraining, then run predictions with ENS-resolved outputs. Verifiable Claims: The system generates concrete metrics-400+ ENS entities, 85% accuracy, zero retraining required. Judges can verify these claims by running the code themselves. ENS-First Design: This isn't traditional ML with ENS bolted on. ENS stable identities are what enable incremental training, model combination, and verifiable provenance. Remove ENS and the core innovation disappears. The Technical Debt We're Proud Of Local ENS Registry: We implemented ENS behavior locally rather than deploying on-chain for the hackathon. This was the right trade-off-it demonstrates the concept perfectly while keeping the demo reliable and fast. Hardcoded Medical Mappings: Our medicine lookup and ICD-10 assignments are demo-scale. In production, these would integrate with RxNorm and SIDER APIs, but for the hackathon, curated mappings tell the story better. CLI-First UX: The command-line interface isn't pretty, but it's perfect for demonstrating the step-by-step ENS integration to technical judges who want to see exactly what's happening. Building EvoDoc taught us that ENS isn't just about naming-it's about creating stable identity layers that enable entirely new architectures. The incremental training capability only exists because ENS provides permanent, human-readable identities that persist across training sessions. That's the real innovation here.

background image mobile

Join the mailing list

Get the latest news and updates

EvoDoc | ETHGlobal