Verifiable AI dataset sampling using Pyth Entropy for fair, reproducible training splits
FairAI enables verifiable and reproducible dataset sampling for AI models using Pyth Entropy-based randomness. When datasets are split into train and test sets, an opaque or manipulable random seed can bias model evaluation or be gamed — FairAI prevents this by generating provably fair random splits backed by a verifiable entropy source.
Users can upload a dataset through a browser-based frontend, request verifiable randomness (simulated Pyth Entropy), and perform splits whose results are logged with a cryptographic proof. Anyone can later reproduce and verify the exact split using the proof log — ensuring full transparency in AI data pipelines.
This approach helps researchers, auditors, and developers trust their AI data preparation by making randomness itself verifiable and immutable.
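The split-and-verify flow described above can be sketched in a few lines. This is a hypothetical illustration, not FairAI's actual code: the function names (`split_with_proof`, `verify_split`) and the example entropy value are assumptions. The idea is that the entropy value deterministically seeds the split, and a SHA256 hash of the resulting index assignment serves as the proof that anyone can recompute.

```python
import hashlib
import json

import numpy as np
from sklearn.model_selection import train_test_split

def split_with_proof(n_rows: int, entropy_hex: str, test_size: float = 0.2):
    """Split row indices deterministically from an entropy value and
    return a SHA256 proof committing to the exact assignment."""
    # Reduce the entropy value to a 32-bit seed accepted by scikit-learn.
    seed = int(entropy_hex, 16) % (2**32)
    indices = np.arange(n_rows)
    train_idx, test_idx = train_test_split(
        indices, test_size=test_size, random_state=seed
    )
    # The proof is a hash of the sampled indices, so any change to the
    # split changes the proof.
    proof = hashlib.sha256(
        json.dumps({"train": sorted(train_idx.tolist()),
                    "test": sorted(test_idx.tolist())}).encode()
    ).hexdigest()
    return train_idx, test_idx, proof

def verify_split(n_rows: int, entropy_hex: str, claimed_proof: str,
                 test_size: float = 0.2) -> bool:
    """Re-run the deterministic split and check it matches the logged proof."""
    _, _, proof = split_with_proof(n_rows, entropy_hex, test_size)
    return proof == claimed_proof
```

Because the split depends only on the entropy value and the dataset size, an auditor who holds the proof log can reproduce the split byte-for-byte without trusting the party who performed it.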
FairAI is built with a FastAPI backend and a React + TailwindCSS frontend connected via REST APIs.

Backend (Python + FastAPI):
- Handles dataset upload, entropy simulation, data splitting, and proof generation.
- Uses pandas and scikit-learn for dataset handling and splitting.
- Generates cryptographic hashes (SHA256) of sampled indices for proof logs.
- Simulates randomness fetching via a mock Pyth Entropy API.

Frontend (React + Vite + Tailwind):
- Lets users upload datasets, request entropy, view visualized splits, and verify proof logs.
- Uses Axios for API calls and Recharts for simple data visualization.
- Built for quick, clean demos with minimal setup.

All components run locally, requiring no blockchain deployment — but are architected to easily plug into real Pyth Entropy feeds in the future. This hack uses verifiable randomness as a trust layer for AI reproducibility, bridging the gap between on-chain verifiability and off-chain data science.
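The mock Pyth Entropy step might look like the sketch below. This is an assumption about the simulation, not FairAI's actual implementation: real Pyth Entropy combines a user's commitment with a provider's revealed random number so that neither party alone controls the result; here both halves are simulated locally with the OS CSPRNG.

```python
import hashlib
import secrets

def request_entropy() -> dict:
    """Simulate a Pyth Entropy commit-reveal round locally (hypothetical sketch)."""
    user_random = secrets.token_bytes(32)       # user's random contribution
    provider_random = secrets.token_bytes(32)   # simulated provider reveal
    # The final value hashes both contributions together, so neither side
    # could have chosen the outcome unilaterally.
    entropy = hashlib.sha256(user_random + provider_random).hexdigest()
    return {
        # 64 hex chars, suitable as the seed input for the split step
        "entropy": entropy,
        # commitment the user could publish before the provider reveals
        "user_commitment": hashlib.sha256(user_random).hexdigest(),
    }
```

Swapping this function for a real on-chain Pyth Entropy request is the only change the architecture would need to move from simulated to verifiable randomness.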

