Tomu lets users pay as they go for LLM apps via x402 micropayments, with no fixed subscription fees.
Tomu is a Web3 pay-as-you-go access layer for AI services (for example, AI image generation today and potentially live video or other real-time streams in the future). Instead of locking users into monthly subscriptions, Tomu charges per use / per time slice using x402-ws-stream—the WebSocket Streaming Payments extension of x402—running on Polygon.
What Tomu does
Metered access to AI: Users can request an AI action (e.g., generate an image). Access is granted only while prepaid funds exist for a small “slice” of time or for a single request.
Micropayments over WebSockets: All messaging—content, payment control, and (recommended) blockchain RPC—travels over WebSockets (wss) to keep latency low and UX predictable.
Agent-run point of sale: Tomu’s backend agent automatically challenges, verifies, optionally settles, pauses/resumes, and logs each slice, demonstrating agentic payments in production.
How the flow works (end-to-end)
User asks Tomu to generate an image (or other AI output).
Tomu issues a pay requirement on the same WS connection using stream.require (defines pricePerUnit, unitSeconds, TTL, asset, payTo, etc.).
The user’s wallet responds with an x402 “exact” PaymentPayload (EIP-3009/USDC-style) via stream.pay, containing from, to, value, validAfter, validBefore, and a fresh nonce (a sketch of this exchange follows the list).
Tomu calls the Facilitator over WS: x402.verify (and optionally x402.settle for on-chain-per-slice).
If valid, Tomu streams the AI output for the prepaid window and sends stream.accept { prepaidUntilMs }.
Before the TTL expires, Tomu requires the next slice; if payment doesn’t arrive in time, Tomu pauses the stream (stream.pause) and resumes it once paid.
All payments and results are tracked for analytics, reconciliation, and refunds/disputes when needed.
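On the wire, one slice of this exchange might look like the sketch below (TypeScript literals; the envelope shape and all values are illustrative, and only the method and field names cited above come from the protocol summary):

```ts
// Sketch of one slice exchange. Placeholder addresses, amounts, and the
// envelope shape are assumptions, not the normative schema.

// 1) Seller -> buyer: terms for the next slice (stream.require).
const requireFrame = {
  method: "stream.require",
  params: {
    asset: "USDC",          // settlement asset on Polygon
    payTo: "0xSELLER",      // placeholder seller address
    pricePerUnit: "10000",  // 0.01 USDC in 6-decimal base units
    unitSeconds: 30,        // one prepaid time slice
    ttlMs: 10_000,          // pay within this window or the stream pauses
  },
};

// 2) Buyer -> seller: x402 "exact" PaymentPayload (EIP-3009 style).
const payFrame = {
  method: "stream.pay",
  params: {
    from: "0xBUYER",             // placeholder buyer address
    to: "0xSELLER",
    value: "10000",
    validAfter: 1_700_000_000,   // unix seconds
    validBefore: 1_700_000_040,  // slice end plus a small buffer
    nonce: "0x9f2c...",          // fresh, unique per slice
    signature: "0x...",          // EIP-712 signature over the authorization
  },
};

// 3) Seller -> buyer after x402.verify succeeds (stream.accept).
const acceptFrame = {
  method: "stream.accept",
  params: { prepaidUntilMs: Date.now() + 30_000 },
};
```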
Why this matters
Removes subscription friction: New users can try AI services without a monthly commitment.
Predictable, low-latency UX: Sliced prepay over WS aligns costs with real-time generation.
Privacy & trust-minimization: Most data stays off-chain; only settlement proofs/receipts are posted on-chain.
Agentic Payments fit: Shows an autonomous backend agent running a micropayment point of sale with verification, pausing, settlement, reconciliation, and refund paths.
Stack & components
Payments / Protocol
x402-ws-stream (EVM-only) on Polygon – USDC micropayments using EIP-3009 “exact” scheme.
Single WebSocket multiplexes content + payment control:
Core methods: stream.init, stream.require, stream.pay, stream.accept/stream.reject, stream.keepalive, stream.pause/resume/end.
Facilitator methods: x402.supported, x402.verify, x402.settle.
Slice accounting: unitSeconds (10–120 s recommended), TTL trigger ~30–70 % into the unit, clock-skew buffer ≥ 5 s.
Security constraints: Unique nonce per slice, tight validAfter/validBefore (slice end + 5–10 s), automatic pause at TTL if unpaid.
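These constraints reduce to a little timestamp arithmetic; a minimal sketch with illustrative values:

```ts
import { randomBytes } from "node:crypto";

// Illustrative slice-timing math under the constraints above.
const unitSeconds = 30;          // within the recommended 10-120 s band
const skewBufferS = 5;           // clock-skew buffer (>= 5 s)
const ttlTriggerFraction = 0.5;  // re-require ~30-70% into the unit

function sliceWindow(nowS: number) {
  return {
    validAfter: nowS - skewBufferS,                 // tolerate slow clocks
    validBefore: nowS + unitSeconds + skewBufferS,  // slice end + 5-10 s
    nextRequireAtS: nowS + unitSeconds * ttlTriggerFraction, // TTL trigger
  };
}

// A fresh 32-byte nonce per slice prevents replaying an old authorization.
const nonce = ("0x" + randomBytes(32).toString("hex")) as `0x${string}`;
```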
Backend (Agent)
Express.js for routing and WebSocket endpoints, combined with Convex (serverless DB + Chef template) for state management.
Facilitator WS mirrors the HTTP API: verifies EIP-3009 payloads and optionally settles on-chain per slice.
Buyer/Seller WS loops (in progress): auto-require next slice, verify payments, pause/resume on TTL, emit prepaidUntilMs heartbeats, and record usage & receipts for reconciliation.
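A sketch of that loop (this part is marked in progress, so every helper below is a hypothetical stand-in):

```ts
import type { WebSocket } from "ws";

// Hypothetical helpers standing in for the real agent plumbing.
declare function send(ws: WebSocket, frame: object): void;
declare function waitFor(
  ws: WebSocket, method: string, timeoutMs: number,
): Promise<{ params: unknown } | null>;
declare function facilitatorVerify(payload: unknown): Promise<boolean>;
declare function recordReceipt(payload: unknown): void;
declare function nextSliceTerms(): object;
declare function sleep(ms: number): Promise<void>;

const UNIT_MS = 30_000;        // slice length (illustrative)
const REQUIRE_TTL_MS = 10_000; // how long a stream.require stays payable

async function sliceLoop(ws: WebSocket): Promise<void> {
  let paused = false;
  while (ws.readyState === 1 /* OPEN */) {
    send(ws, { method: "stream.require", params: nextSliceTerms() });
    const pay = await waitFor(ws, "stream.pay", REQUIRE_TTL_MS);

    if (pay && (await facilitatorVerify(pay.params))) {
      const prepaidUntilMs = Date.now() + UNIT_MS;
      send(ws, { method: "stream.accept", params: { prepaidUntilMs } });
      if (paused) { send(ws, { method: "stream.resume", params: {} }); paused = false; }
      recordReceipt(pay.params);  // usage + receipt for reconciliation
      await sleep(UNIT_MS * 0.5); // wake at the TTL trigger point
    } else {
      send(ws, { method: "stream.pause", params: {} }); // unpaid at TTL
      paused = true;
    }
  }
}
```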
Frontend
Vite + React UI.
RainbowKit for wallet connection; ENS names/avatars for a recognizable, user-friendly experience.
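Since RainbowKit sits on wagmi/viem, the connected wallet can also sign the EIP-3009 authorization that stream.pay carries. A sketch using viem; the EIP-712 domain values and the USDC address are assumptions to verify against the contract actually used:

```ts
import { createWalletClient, custom, type Address, type Hex } from "viem";
import { polygon } from "viem/chains";

// EIP-3009 typed-data layout (this part is standard, from the EIP).
const types = {
  TransferWithAuthorization: [
    { name: "from", type: "address" },
    { name: "to", type: "address" },
    { name: "value", type: "uint256" },
    { name: "validAfter", type: "uint256" },
    { name: "validBefore", type: "uint256" },
    { name: "nonce", type: "bytes32" },
  ],
} as const;

// Placeholder: fill in the USDC contract address on Polygon.
const USDC: Address = "0x0000000000000000000000000000000000000000";

async function signSlicePayment(
  account: Address, payTo: Address, value: bigint,
  validAfter: bigint, validBefore: bigint, nonce: Hex,
): Promise<Hex> {
  const client = createWalletClient({
    chain: polygon,
    transport: custom((window as any).ethereum),
  });
  return client.signTypedData({
    account,
    // Domain values are assumptions; USDC deployments differ per chain.
    domain: { name: "USD Coin", version: "2", chainId: 137, verifyingContract: USDC },
    types,
    primaryType: "TransferWithAuthorization",
    message: { from: account, to: payTo, value, validAfter, validBefore, nonce },
  });
}
```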
AI models
OpenRouter aggregator (e.g., Stable Diffusion XL, DALL·E 3, Claude 3 Haiku).
The backend streams generation updates only while the payment slice is valid.
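A sketch of that gate (openRouterStream, send, and waitForResume are hypothetical helpers):

```ts
// Forward model output only while the current slice is prepaid.
declare function openRouterStream(prompt: string): AsyncIterable<unknown>;
declare function send(frame: object): void;
declare function waitForResume(): Promise<void>; // resolves when the next slice clears

async function streamWhilePrepaid(
  prompt: string,
  state: { prepaidUntilMs: number }, // updated by the payment loop on each stream.accept
): Promise<void> {
  for await (const chunk of openRouterStream(prompt)) {
    if (Date.now() >= state.prepaidUntilMs) {
      await waitForResume(); // stream.pause was sent; hold output here
    }
    send({ method: "content.chunk", params: chunk });
  }
}
```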
Infrastructure with Fluence
We provisioned a dedicated VM on Fluence to host the entire backend stack.
On this Fluence VM we deployed:
a modified LiteLLM service to handle AI model requests,
the Convex backend,
the x402 Facilitator service,
and Nginx as a reverse proxy and load balancer.
Using Fluence allowed us to spin up a secure, fully integrated Web3 + AI environment quickly, with all components co-located for low-latency streaming.
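For reference, the WebSocket-relevant part of such an Nginx config might look like the sketch below (server name, path, ports, and certificate paths are placeholders); the Upgrade/Connection headers must be forwarded or the wss:// handshake never completes:

```nginx
# Minimal sketch (inside the http {} context): terminate TLS and proxy
# the multiplexed WebSocket to the backend agent. Names/ports are placeholders.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 443 ssl;
    server_name tomu.example;                        # placeholder domain
    ssl_certificate     /etc/ssl/fullchain.pem;      # placeholder paths
    ssl_certificate_key /etc/ssl/privkey.pem;

    location /ws {
        proxy_pass http://127.0.0.1:8080;            # backend agent (illustrative)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;      # WebSocket handshake
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 300s;                     # keep long-lived streams alive
    }
}
```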
Settlement modes (trade-offs)
On-chain per slice (trustless): call x402.settle after each successful verify. Strong guarantees; more on-chain transactions (tune unitSeconds to control frequency).
Deferred batch settlement (cheaper, partially trusted): verify each slice, record usage off-chain, and settle periodically.
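A sketch of the deferred mode follows (the helper and the interval are assumptions). One caveat worth noting: per-slice EIP-3009 authorizations expire at validBefore, so deferred settlement implies either wider validity windows or settling recorded usage through an aggregate payment rather than replaying each slice's authorization late.

```ts
// Sketch of deferred batch settlement: every slice is verified up front,
// usage is recorded off-chain, and settlement runs on an interval.
// facilitatorSettle and the 15-minute interval are assumptions.
interface VerifiedSlice { payload: unknown; verifiedAtMs: number }

const pending: VerifiedSlice[] = [];

function onVerifiedSlice(payload: unknown): void {
  pending.push({ payload, verifiedAtMs: Date.now() }); // e.g. a Convex table
}

declare function facilitatorSettle(payload: unknown): Promise<void>; // x402.settle

setInterval(async () => {
  const batch = pending.splice(0); // drain the queue
  for (const s of batch) {
    await facilitatorSettle(s.payload);
  }
}, 15 * 60 * 1000);
```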
What was “hacky” or notable
WS-only everything: Content, payments, and EVM RPC all run over wss:// to minimize latency and avoid header gymnastics—exactly the UX x402-ws-stream targets.
Tight windowing & pause semantics: We implemented strict validBefore/TTL behavior to ensure billing lines up precisely with compute time; the stream cannot overrun its prepaid horizon.
Multiplexed control/data: A single connection carries both user content and payment control, simplifying infrastructure and improving resilience.
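As a sketch, the typing that makes this multiplexing work can be as small as a discriminated union plus one dispatcher (the envelope shape is an assumption; only the method names come from the protocol section above):

```ts
// Sketch: one envelope carries content, payment control, and facilitator
// calls over a single socket. Shapes beyond the method names are assumptions.
type StreamMethod =
  | "stream.init" | "stream.require" | "stream.pay" | "stream.accept"
  | "stream.reject" | "stream.keepalive" | "stream.pause"
  | "stream.resume" | "stream.end";

type FacilitatorMethod = "x402.supported" | "x402.verify" | "x402.settle";

interface Frame {
  id: string;                                       // request/response correlation
  method: StreamMethod | FacilitatorMethod | "content.chunk";
  params: unknown;                                  // method-specific body
}

// Hypothetical handlers; one dispatcher keeps control and data on one socket.
declare function onFacilitator(f: Frame): void;
declare function onPaymentControl(f: Frame): void;
declare function onContent(f: Frame): void;

function dispatch(frame: Frame): void {
  if (frame.method.startsWith("x402.")) onFacilitator(frame);
  else if (frame.method.startsWith("stream.")) onPaymentControl(frame);
  else onContent(frame);
}
```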
Partner technology benefits
x402 on Polygon: fast, inexpensive settlement with standardized EIP-3009 payloads.
OpenRouter: instant access to multiple AI models through one API.
RainbowKit + ENS: smooth wallet onboarding and human-readable identities.
Convex: quick serverless state, logs, and reconciliation tables without managing infrastructure.
Fluence: one-click VM provisioning that let us deploy LiteLLM, Convex, the x402 Facilitator, and Nginx in a single secure environment—critical for delivering a working demo within hackathon time constraints.
This architecture demonstrates an agent-run point of sale where automated request → payment verification → settlement → content delivery happens seamlessly and in real time.