Shards LLMs across TEEs for private, secure, cost-effective, decentralized AI inference.
TeeTee is a decentralized AI platform that splits large language models (LLMs) into smaller shards and hosts each shard in a Trusted Execution Environment (TEE). By distributing model layers across multiple TEEs, TeeTee overcomes memory constraints, bolsters data privacy, and significantly reduces hosting costs. This allows organizations with limited budgets to pool resources while accessing powerful, high-performance models in a verifiably secure and confidential environment.
Key features include:
Layer Splitting & Decentralized Hosting: Spreads large-scale LLMs across multiple TEE nodes, each securely running a subset of model layers.
Privacy & Security: TEEs use hardware-level protection, ensuring that user data and model weights remain encrypted and tamper-proof.
Scalability & Cost Efficiency: Organizations share hosting costs and resources while collectively benefiting from large-scale model performance.
On-Chain Verifiability: TeeTee integrates with smart contracts on Base Sepolia for token-based usage, profit sharing, and attestation, ensuring transparent, auditable proof of secure inference.
Flexible Integration: Users can pay per inference with tokens or self-host a shard to gain direct access without incurring additional costs.
In the long run, TeeTee aims to evolve into a fully decentralized “world computer” for AI by distributing even the largest models across numerous TEE nodes for robust security and high performance. Our current proof of concept (PoC), built in three days, demonstrates model sharding on a smaller scale by splitting a moderate-sized LLM between two TEEs, showing that secure multi-node inference, decentralized cost-sharing, and on-chain attestations are indeed feasible.
TeeTee’s foundation relies on Trusted Execution Environments (TEEs), specifically leveraging Phala Network and their Confidential Virtual Machines (CVMs). Each CVM operates as a secure node (“TEE”) that runs a portion of our split model. Below is a breakdown of the major components:
Phala Network TEE Hosting: We deploy Dockerized model shards to Phala Network’s CVMs, each functioning as an isolated TEE. Every input and output is wrapped with an on-chain attestation, which can be verified through Phala Network’s TEE Explorer, giving users cryptographic proof of secure, untampered execution. We initially experimented with other hosting options, including a partner track called Marline, but ultimately settled on Phala for its robust support for TEE-based deployments and its ease of use.
Layer-Splitting for Memory Constraints: Because TEEs can have strict memory caps, we split a moderately sized LLM into two shards, each handled by a separate TEE node. The first TEE processes initial layers, then securely passes partially processed data to the second TEE, which completes the remaining layers.
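The two-shard pipeline above can be sketched in plain Python. The `Shard` class, the toy linear "layers," and the `forward` interface are illustrative assumptions, not TeeTee's production code; the point is that splitting a layer stack between two nodes and forwarding the intermediate activations yields the same result as running all layers in one place.

```python
# Minimal sketch of TeeTee-style layer splitting. In production each Shard
# would live inside a separate TEE and exchange activations over the network;
# here both run locally to show the data flow.

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

class Shard:
    """One TEE node holding a contiguous slice of the model's layers."""
    def __init__(self, layers):
        self.layers = layers  # each "layer" is just a weight matrix here

    def forward(self, hidden):
        # Inside a real TEE, this runs on protected memory; here it is plain.
        for layer in self.layers:
            hidden = matvec(layer, hidden)
        return hidden

# A toy 4-layer "model" of 2x2 weight matrices.
all_layers = [[[1.0, 0.0], [0.0, 1.0]],
              [[2.0, 0.0], [0.0, 2.0]],
              [[1.0, 1.0], [0.0, 1.0]],
              [[0.5, 0.0], [0.0, 0.5]]]

# Split the layer stack between two TEEs, as in the PoC.
tee_1 = Shard(all_layers[:2])   # first TEE: initial layers
tee_2 = Shard(all_layers[2:])   # second TEE: remaining layers

x = [1.0, 2.0]
intermediate = tee_1.forward(x)        # partially processed activations
output = tee_2.forward(intermediate)   # completed by the second TEE

# Sharded inference matches running the full stack on a single node.
full_output = Shard(all_layers).forward(x)
assert output == full_output
```

Only the intermediate activations cross the node boundary, so neither TEE ever needs the other's weights in memory, which is what lets each shard fit under a per-node memory cap.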
Docker & Containerization: Our images package all the necessary dependencies, including Python, the model’s layers, and inference logic. This standardized approach simplifies deploying each shard across distinct CVMs.
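A shard image along these lines might look as follows. The file names, the `shard_server.py` entrypoint, and the `--layers` flag are hypothetical placeholders, not TeeTee's actual layout; the sketch only illustrates the idea that each image bundles Python, one shard's weights, and the inference logic.

```dockerfile
# Hypothetical Dockerfile for one model shard (names are illustrative).
FROM python:3.11-slim

WORKDIR /app

# Inference dependencies for this shard.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Only this shard's slice of the model weights is baked into the image.
COPY shard_weights/ ./shard_weights/
COPY shard_server.py .

# The shard listens for activations from the previous TEE (or raw input).
EXPOSE 8000
CMD ["python", "shard_server.py", "--layers", "0-15"]
```

Because every shard uses the same image structure, deploying to a new CVM is just a matter of swapping in a different weight slice and layer range.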
On-Chain Verifiability: We integrate smart contracts on Base Sepolia for token-based usage, profit sharing, and verifying each inference. By attesting each inference on-chain, anyone can confirm that the computation happened inside a genuine TEE without exposing the raw data or model parameters.
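The commitment pattern behind this can be sketched as follows. The record layout and field names are assumptions for illustration; only the general idea (hash the inference transcript so a contract can store a verifiable commitment without exposing the raw data or weights) reflects the design described above.

```python
# Illustrative sketch of the digest a TEE could post on-chain after each
# inference. A verifier with the same transcript can recompute the hash
# and compare it to the stored value; the chain never sees the raw data.
import hashlib
import json

def attestation_digest(prompt: str, completion: str, shard_id: str) -> str:
    """Hash the inference transcript into a fixed-size commitment."""
    record = json.dumps(
        {"prompt": prompt, "completion": completion, "shard": shard_id},
        sort_keys=True,  # canonical ordering so the digest is reproducible
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

digest = attestation_digest("What is a TEE?", "A trusted execution...", "shard-1")
# Only this 32-byte digest (hex-encoded), not the raw transcript, would be
# recorded by the contract on Base Sepolia.
assert len(digest) == 64
```

Recomputing the digest with the same inputs always yields the same value, which is what makes the on-chain record auditable after the fact.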
Front-End & User Experience: We use Next.js for a streamlined, server-rendered UI and Tailwind CSS for fast, consistent styling. Through our web interface, users can pay tokens per inference or self-host a shard for direct access; our contracts record token balances and usage metrics.
Why Phala Network? Phala’s TEE-based architecture is user-friendly, making it simpler to spin up secure environments quickly even within a brief development window. Their built-in on-chain attestations and TEE Explorer gave us an out-of-the-box solution for end-to-end verifiability, which was critical for our proof of concept.
By distributing workload across multiple TEEs, TeeTee showcases that secure model sharding, decentralized cost-sharing, and on-chain proof of execution are fully achievable. While this proof of concept only took three days to develop, it demonstrates a working blueprint for future, large-scale TEE-based AI.