Vox

Voice agent that turns speech into multi-step crypto transactions with LI.FI Composer

Project Description

Vox is a voice-activated smart wallet for macOS that turns a single spoken sentence into a single self-custodial onchain transaction, powered by LI.FI Composer.

The Problem

Doing anything non-trivial onchain today is a slog. "Move half my USDC into a yield vault and send the rest to a friend" isn't one action — it's a string of disconnected, error-prone steps: approve, swap, bridge, wait, approve again, deposit, then a separate transfer. Each step is its own signature, its own gas decision, its own chance to fat-finger an address or get stuck mid-flow with funds bridged but not deposited. The mental overhead of remembering which chain, which token, and which protocol means most people never use the full power of their wallet. It's a command line with no commands.

The Solution

Vox lets you just say what you want. You speak a command, such as "move half my Arbitrum USDC into a Morpho vault and convert the rest to ETH", and Vox transcribes it, plans it, and executes it: one spoken sentence, one signature.

The key design decision is that the language model never touches your funds. Claude only ever produces an intent: a zod-validated Flow object describing the source, the destination legs, and the proportions. The same Flow schema drives both Claude's structured output and the UI's typed preview. LI.FI Composer is the execution layer that compiles the intent into a real, quoted, atomic transaction: every step lands or the whole thing reverts. Nothing moves until you see the resolved steps and confirm. You talk to your wallet, an LLM plans the workflow, and Composer executes it.

Key Features

Voice-to-transaction pipeline: Speak naturally; xAI Grok transcribes, Claude plans, and LI.FI executes. No forms, no chain pickers, no manual bridging.

Atomic multi-step flows: Split a token across legs (zap one slice into a Morpho/Aave vault, swap another to ETH, send a third to a named contact), compiled into a single transaction that either fully lands or fully reverts.

Named contacts & N-way dispersal: "Disperse 300 USDC across my wallets" or "send 50 to Trading" resolve named recipients into one atomic same-chain dispersal.

Cross-chain in one signature: Bridge + swap + deposit compose into a single LI.FI route, with FASTEST-bridge preference and automatic retry on transient route failures.

Propose-then-confirm safety model: proposeFlow returns a fully quoted preview and never moves funds; execution runs only after explicit confirmation, streaming per-step progress live over SSE.

Live Flow preview card: The UI renders the resolved multi-step flow before you commit, then lights up each step in real time as it executes.

Portfolio & yield discovery: Vox surfaces live per-wallet balances and discovers existing Morpho yield positions along with their APY.

Architecture

Vox runs as three pieces. A Tauri 2 shell (Rust) hosts a React webview and a Node sidecar, a companion backend process spawned alongside the app. The backend SDKs (LI.FI, Claude, viem, Grok) are JavaScript, so they run in the sidecar bound to 127.0.0.1; the webview talks to it through a typed vox client over HTTP + SSE. The dev signing key and all API keys live in the sidecar and never reach the webview, so the renderer holds nothing sensitive. User intent flows in one direction: mic → STT → Claude Flow → Composer propose (quote + preview) → confirm → execute → per-step progress.
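Per-step progress reaches the webview as Server-Sent Events. In a browser, the EventSource API does this framing for you; the sketch below spells out the SSE wire format (`event:` and `data:` fields, a blank line ends a message, multiple `data:` lines join with a newline, lines starting with `:` are comments) to show what the vox client receives. The `step` event name and the `StepEvent` payload shape are hypothetical; the write-up does not document the actual event schema.

```typescript
// Hypothetical per-step progress payload; the real vox event schema is not documented here.
interface StepEvent {
  step: string;
  status: "pending" | "running" | "done" | "failed";
}

interface SseMessage {
  event: string; // "message" when no event: field is given, per the SSE spec
  data: string;
}

// Split SSE text into complete messages. A message is dispatched on a blank
// line; an unterminated trailing message is dropped, as the spec requires at EOF.
function parseSse(text: string): SseMessage[] {
  const messages: SseMessage[] = [];
  let event = "message";
  let data: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    if (line === "") {
      if (data.length > 0) messages.push({ event, data: data.join("\n") });
      event = "message";
      data = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment, often used as a keep-alive ping
    const idx = line.indexOf(":");
    const field = idx === -1 ? line : line.slice(0, idx);
    let value = idx === -1 ? "" : line.slice(idx + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is stripped
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return messages;
}

// Keep only the (hypothetical) "step" events and decode their JSON payloads.
function stepEvents(text: string): StepEvent[] {
  return parseSse(text)
    .filter((m) => m.event === "step")
    .map((m) => JSON.parse(m.data) as StepEvent);
}
```

The phase store can then feed each decoded `StepEvent` into the preview card without caring which engine produced it.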

In Essence

Vox turns your voice into your wallet's command line. Instead of stitching together approvals, swaps, bridges, and deposits by hand, you say one sentence and the protocol handles the rest — atomically, self-custodially, and with a live preview before anything signs.

How it's Made

Vox is a full-stack TypeScript application built as a Tauri 2 desktop app: a thin Rust shell hosting a React webview and a Node sidecar backend, connected over HTTP + Server-Sent Events. The split exists for one reason — the SDKs we depend on (LI.FI, Anthropic, viem) are JavaScript with no Rust equivalents, so the backend runs as a sidecar process while Tauri stays a lifecycle-and-window shell. The sidecar binds only to 127.0.0.1:4317, and Tauri's CSP allow-lists connect-src to that port, so the dev signing key and every API key stay in the Node process and the webview never holds a secret.

The core idea is that the language model produces an intent, never a transaction. When you speak, the renderer records mic audio via MediaRecorder (with Web Audio level metering and silence auto-stop) and POSTs the raw bytes to the sidecar. xAI Grok STT transcribes them, then Anthropic Claude turns the transcript — plus live balances, known vaults, and named contacts — into a zod-validated Flow. The schema is a discriminated union (compose | unknown) that drives both Claude's structured output and the UI's typed view, so a malformed plan can't reach the execution layer. A compose Flow carries a source (chain, asset, amount as absolute/fraction/max), one or more destination legs (each with an action — deposit / swap / hold / transfer — optional protocol, recipient, and share), and a human-readable outline for an instant preview before LI.FI resolves anything.
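The Flow contract described above can be approximated without zod to make its shape concrete. This is a dependency-free sketch, not the project's schema: field names such as `kind`, `legs`, `share`, and `question` are reconstructed from the prose, and a hand-written guard stands in for zod's `z.discriminatedUnion("kind", [...])` plus `.safeParse`.

```typescript
// Dependency-free approximation of the Flow contract; field names are hypothetical.
type Amount =
  | { type: "absolute"; value: string } // decimal string in token units, e.g. "300"
  | { type: "fraction"; value: number } // 0 < value <= 1, e.g. 0.5 for "half"
  | { type: "max" }; // "all my ..."

type Action = "deposit" | "swap" | "hold" | "transfer";
const ACTIONS: readonly string[] = ["deposit", "swap", "hold", "transfer"];

interface Leg {
  action: Action;
  share: number; // this leg's fraction of the source amount
  protocol?: string; // e.g. "morpho" for a deposit leg
  recipient?: string; // contact name or address for a transfer leg
  toAsset?: string; // target token for a swap leg
}

interface ComposeFlow {
  kind: "compose";
  source: { chain: string; asset: string; amount: Amount };
  legs: Leg[];
  outline: string; // human-readable preview shown before LI.FI resolves anything
}

interface UnknownFlow {
  kind: "unknown";
  question: string; // clarifying question to put back to the user
}

type Flow = ComposeFlow | UnknownFlow;

const isObj = (x: unknown): x is Record<string, unknown> =>
  typeof x === "object" && x !== null;
const optStr = (x: unknown): boolean => x === undefined || typeof x === "string";

function isAmount(x: unknown): x is Amount {
  if (!isObj(x)) return false;
  const v = x.value;
  switch (x.type) {
    case "max":
      return true;
    case "absolute":
      return typeof v === "string" && /^\d+(\.\d+)?$/.test(v);
    case "fraction":
      return typeof v === "number" && v > 0 && v <= 1;
    default:
      return false;
  }
}

function isLeg(x: unknown): x is Leg {
  if (!isObj(x)) return false;
  const { action, share, protocol, recipient, toAsset } = x;
  return (
    typeof action === "string" && ACTIONS.includes(action) &&
    typeof share === "number" && share > 0 && share <= 1 &&
    optStr(protocol) && optStr(recipient) && optStr(toAsset)
  );
}

// Narrow untrusted model output to a Flow. Any other shape yields null,
// so a malformed plan never reaches the execution layer.
function parseFlow(x: unknown): Flow | null {
  if (!isObj(x)) return null;
  if (x.kind === "unknown") {
    const q = x.question;
    return typeof q === "string" ? { kind: "unknown", question: q } : null;
  }
  if (x.kind !== "compose") return null;
  const { source, legs, outline } = x;
  if (!isObj(source) || typeof source.chain !== "string" || typeof source.asset !== "string") return null;
  if (!isAmount(source.amount)) return null;
  if (!Array.isArray(legs) || legs.length === 0 || !legs.every(isLeg)) return null;
  if (typeof outline !== "string") return null;
  return x as unknown as ComposeFlow;
}
```

With zod, `z.infer` would derive these TypeScript types from the schema itself, which is how a single schema can both constrain the model output and type the UI.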

A tri-engine dispatcher in the sidecar then routes the Flow to the right execution path based on its shape:

Atomic same-chain ERC-20 flows go through @lifi/composer-sdk (gated by canComposeFlow()). We build a Composer Flow from Claude's intent and compile it into a single atomic transaction (split a token, zap one leg into a vault, transfer another) where every step lands or the whole thing reverts.

Cross-chain flows go through @lifi/sdk v4: setting toToken to a vault token composes swap + bridge + deposit into one route.

Same-chain native-token transfers take a direct sendTransaction path (gated by isNativeTransferFlow()).

Partner Technologies:

LI.FI (Composer + SDK v4): LI.FI is the cross-chain execution engine and the core of the project. We build an SDKClient with createClient(...) and an EthereumProvider from @lifi/sdk-provider-ethereum, wiring in our viem wallet via getWalletClient and switchChain so LI.FI signs directly through the sidecar's key. For cross-chain flows the pipeline is getQuote() → convertQuoteToRoute() → executeRoute() with an update hook that fires on every state change; we map LI.FI's internal process types into the user-facing step names that light up in the live Flow preview card. Setting toToken to a vault token is what turns a transfer into a Composer flow — that single substitution composes a swap, a bridge, and a deposit into one route. We prefer the FASTEST bridge and retry transient route failures (424s / timeouts) so a flaky upstream doesn't kill a demo. For atomic same-chain flows we use @lifi/composer-sdk directly, building a flow with its guards/materialisers/resources and compiling it to a single signed transaction. LI.FI handles all the route-finding, approval management, and bridge execution — we feed it an intent and stream the output.
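The retry on transient route failures can be sketched as a generic wrapper. This is not LI.FI SDK code: `withRetry` and `isTransient` are hypothetical, and the sketch assumes an HTTP failure exposes its status as a `status` property and a timeout surfaces as an error named `TimeoutError` (what `fetch` throws when aborted by `AbortSignal.timeout`). Exponential backoff is also an assumption; the write-up only says transient failures are retried.

```typescript
// Hypothetical classification of "transient" failures, per the write-up:
// HTTP 424 from route lookup, or a timeout. Anything else fails fast.
function isTransient(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; name?: unknown };
  return e.status === 424 || e.name === "TimeoutError";
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `fn` on transient errors with exponential backoff (500 ms, 1 s, 2 s, ...).
async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 3, baseDelayMs = 500 } = {},
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Bad input or insufficient balance will not fix itself; rethrow at once.
      if (!isTransient(err) || i === attempts - 1) throw err;
      await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastErr; // only reached when attempts < 1
}
```

Wrapping only the quote step, for example `withRetry(() => getQuote(params))` with a hypothetical `params`, keeps execution out of the retry loop, since re-running a half-finished execution could send funds twice.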

Anthropic Claude (claude-opus-4-8): Claude is the planner. We give it the transcript, live per-wallet balances, the vault registry, and the contact list, and ask for a zod-validated Flow as structured output. Because the same zod schema both constrains Claude and types the UI, the model can't hand the renderer a shape it doesn't understand; anything it can't map falls back to an unknown Flow carrying a clarifying question instead of failing silently.

xAI Grok (STT): Grok handles speech-to-text. The renderer ships audio bytes to the sidecar, which auto-detects the format from the file header (the browser's MediaRecorder default is WebM/Opus, so no transcoding is needed) and posts them to Grok's /v1/stt endpoint, keeping the API key server-side.
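Detecting the format from the file header amounts to sniffing magic bytes. The signatures below are standard: the EBML header `1A 45 DF A3` for WebM, `OggS` for Ogg, `RIFF` with `WAVE` at offset 8 for WAV, `ftyp` at offset 4 for MP4/M4A, and an `ID3` tag or an MPEG frame sync for MP3. Which of these Grok's endpoint accepts, and the MIME types sent with them, are assumptions; `sniffAudio` is a hypothetical helper.

```typescript
interface AudioFormat {
  ext: string;
  mime: string;
}

// Read `len` bytes starting at `start` as a Latin-1 string (short reads are fine).
function ascii(buf: Uint8Array, start: number, len: number): string {
  return Array.from(buf.subarray(start, start + len), (b) => String.fromCharCode(b)).join("");
}

// Identify an audio container from its leading bytes; null if unrecognized.
function sniffAudio(buf: Uint8Array): AudioFormat | null {
  // EBML magic shared by WebM and Matroska.
  if (buf.length >= 4 && buf[0] === 0x1a && buf[1] === 0x45 && buf[2] === 0xdf && buf[3] === 0xa3)
    return { ext: "webm", mime: "audio/webm" };
  if (ascii(buf, 0, 4) === "OggS") return { ext: "ogg", mime: "audio/ogg" };
  if (ascii(buf, 0, 4) === "RIFF" && ascii(buf, 8, 4) === "WAVE") return { ext: "wav", mime: "audio/wav" };
  // ISO base media file format (MP4/M4A): "ftyp" box right after the 4-byte size.
  if (ascii(buf, 4, 4) === "ftyp") return { ext: "m4a", mime: "audio/mp4" };
  // ID3v2 tag, or an 11-bit MPEG frame sync (note: raw AAC/ADTS also matches this).
  if (ascii(buf, 0, 3) === "ID3" || (buf.length >= 2 && buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0))
    return { ext: "mp3", mime: "audio/mpeg" };
  return null;
}
```

The returned `mime` and `ext` would be used to label the multipart upload, so the bytes can be forwarded exactly as recorded.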

The split-and-act logic walks each destination leg and resolves its slice of the source amount from the allocation share, then routes per-leg: a same-chain ERC-20 transfer goes direct via viem writeContract, a same-chain swap or vault deposit goes through the atomic Composer, and anything cross-chain goes through a LI.FI route.
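The split-and-act step above can be sketched as two pure functions: one slices the source amount by share, the other picks an execution path per leg using the three rules just described. Working in bigint base units (e.g. 6-decimal USDC) avoids floating-point dust. Converting shares to parts-per-million weights, handing the rounding remainder to the last leg, and the `allocate`/`routeLeg` names and engine labels are all assumptions of this sketch.

```typescript
type LegAction = "deposit" | "swap" | "hold" | "transfer";

interface LegPlan {
  action: LegAction;
  share: number; // fraction of the source amount
  toChain: string;
}

type Engine = "erc20-transfer" | "composer" | "lifi-route";

// Split `total` (token base units) across legs by share. Shares become integer
// weights so the math stays in bigint; rounding dust goes to the last leg so
// the slices always sum to exactly `total`.
function allocate(total: bigint, shares: number[]): bigint[] {
  const sum = shares.reduce((a, b) => a + b, 0);
  if (shares.length === 0 || Math.abs(sum - 1) > 1e-6) {
    throw new Error(`shares must sum to 1 (got ${sum})`);
  }
  const weights = shares.map((s) => BigInt(Math.round(s * 1_000_000)));
  const denom = weights.reduce((a, b) => a + b, 0n);
  const out = weights.map((w) => (total * w) / denom); // bigint division truncates
  const assigned = out.reduce((a, b) => a + b, 0n);
  out[out.length - 1] += total - assigned;
  return out;
}

// Per-leg routing: cross-chain -> LI.FI route; same-chain transfer -> direct
// ERC-20 writeContract; same-chain swap or deposit -> atomic Composer.
function routeLeg(sourceChain: string, leg: LegPlan): Engine | null {
  if (leg.toChain !== sourceChain) return "lifi-route";
  if (leg.action === "transfer") return "erc20-transfer";
  if (leg.action === "swap" || leg.action === "deposit") return "composer";
  return null; // same-chain "hold": nothing to execute
}
```

A three-way "disperse 100 units" yields 33, 33, and 34, so no leg silently loses value to rounding.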

The notably hacky-but-necessary bit is the third dispatch path. Native tokens (ETH) have no ERC-20 allowance slot for the atomic Composer to discover, and a plain "send 0.1 ETH to Trading" can't be quoted as a LI.FI route because there's nothing to swap or bridge. Early on, native sends failed with 422 errors on both engines. So we built a dedicated native-transfer path that detects same-chain, native-source, all-transfer flows and executes them as raw sendTransaction calls, mirroring the propose/execute/SSE shape of the other two engines so the UI can't tell the difference. We also reserve a gas buffer on native max/all sends ("send all my ETH"), so the transaction doesn't fail by trying to send the entire balance and leaving nothing to pay for its own gas.
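Both halves of the native path can be sketched in a few lines: a predicate in the spirit of `isNativeTransferFlow()` (the real function's signature isn't shown, so `isNativeTransfer` and its input shape are hypothetical) and the "send all" reservation. The 21,000-gas intrinsic cost of a plain value transfer to an EOA is fixed by the EVM, but on rollups such as Arbitrum the gas charged can be higher, so estimating per chain with viem's `estimateGas` is safer than hard-coding it. The 20% buffer is an assumed value; `maxFeePerGas` would come from something like viem's `estimateFeesPerGas()`.

```typescript
interface NativeCheckFlow {
  sourceChain: string;
  sourceIsNative: boolean;
  legs: { action: string; toChain: string }[];
}

// Hypothetical predicate: native source, and every leg is a same-chain transfer.
function isNativeTransfer(f: NativeCheckFlow): boolean {
  return (
    f.sourceIsNative &&
    f.legs.length > 0 &&
    f.legs.every((l) => l.action === "transfer" && l.toChain === f.sourceChain)
  );
}

// Intrinsic gas for a plain value transfer to an EOA on L1 Ethereum.
const TRANSFER_GAS = 21_000n;

// Amount to send for "send all my ETH": the balance minus the worst-case fee
// for `txCount` transfers (one per leg), padded by `bufferPct` percent so a
// fee rise between quoting and sending doesn't make the transaction fail.
function maxSendable(balance: bigint, maxFeePerGas: bigint, txCount = 1, bufferPct = 20n): bigint {
  const fee = TRANSFER_GAS * maxFeePerGas * BigInt(txCount);
  const reserve = fee + (fee * bufferPct) / 100n;
  return balance > reserve ? balance - reserve : 0n;
}
```

At 1 gwei, sending "all" of 1 ETH reserves 0.0000252 ETH and sends the remaining 0.9999748 ETH.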

The wallet layer is a viem account loaded from an env key (a throwaway pre-funded EOA for the hackathon), with multiple managed accounts where any can sign: a Flow can name its source wallet ("from Trading"), and the dispatcher sets the active signer before quoting. The frontend is Tailwind v4 with shadcn-style primitives, Zustand for the idle → … → ready → executing → done session phase machine fed by the SSE event stream, TanStack Query for balances and positions, and Framer Motion for the per-step animation on the Flow preview card. The renderer ships with a mock engine by default, so the UI runs and demos with no keys; VITE_VOX_MOCK=false points it at the real sidecar, and proposeFlowFromText("…") lets you drive the entire pipeline by typing in environments without a mic.
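Selecting the source wallet by name can be sketched as a small registry. This is hypothetical code showing only the lookup: case-insensitive labels, with a fallback to a raw address. In the real app each entry would presumably wrap a viem account (for example one built with `privateKeyToAccount` from `viem/accounts`); that part is omitted so the sketch runs without dependencies. `WalletRegistry`, `resolve`, and `useSource` are illustrative names.

```typescript
interface ManagedWallet {
  label: string; // e.g. "Trading"
  address: `0x${string}`;
}

class WalletRegistry {
  private byLabel = new Map<string, ManagedWallet>();
  private active: ManagedWallet;

  constructor(wallets: ManagedWallet[]) {
    if (wallets.length === 0) throw new Error("need at least one wallet");
    for (const w of wallets) this.byLabel.set(w.label.trim().toLowerCase(), w);
    this.active = wallets[0]; // default signer when a Flow names no source
  }

  // Resolve a spoken label ("trading") or a raw address to a managed wallet.
  resolve(nameOrAddress: string): ManagedWallet | undefined {
    const key = nameOrAddress.trim().toLowerCase();
    return (
      this.byLabel.get(key) ??
      Array.from(this.byLabel.values()).find((w) => w.address.toLowerCase() === key)
    );
  }

  // Called before quoting: switch the active signer if the Flow names a source.
  useSource(name?: string): ManagedWallet {
    if (name === undefined) return this.active;
    const w = this.resolve(name);
    if (!w) throw new Error(`unknown wallet: ${name}`);
    this.active = w;
    return w;
  }
}
```

Throwing on an unknown name, rather than falling back to the default signer, keeps a misheard "from Savings" from quietly spending from the wrong wallet.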

The whole system is designed so the spoken intent is just a request, the zod Flow is the contract, LI.FI Composer is the only thing that actually moves funds, and nothing executes until you've seen the resolved steps and confirmed — fully self-custodial, with the keys never leaving the sidecar.
