Every meeting AI keeps a fingerprint of your voice. VFTE lets you own, control, and delete it.
VFTE (Voice Fingerprint TEE) is a confidential voice-identity layer that gives people ownership of their own voiceprint. Every meeting assistant (Otter, Fireflies, Granola) has to store a fingerprint of your voice to recognize you across calls; today they do it silently, on an opt-out basis, and are being sued for it. VFTE flips that. Each person's voiceprint is stored inside a Trusted Execution Environment (TEE) that the operator can't read, and you log in with the email you already use for meetings to see exactly which workspaces and companies use your voiceprint and to control it. Stay anonymous, pause enrollment, or delete it for good. Opt-in by design, enforced inside the enclave. It's a drop-in identity layer that any meeting tool, transcription service, or voice app can plug into, turning a biometric liability into user-owned trust. Your voice, your keys.
VFTE is a confidential voice-identity layer running on TEE infrastructure, with three things wired together: a capture bot, the VFTE identity service, and an intelligence layer.
Real-time speaker diarization and cross-session voiceprint identification, entirely on CPU — no GPU, because TEEs can't afford them. Torch-free Python/FastAPI: a pure-NumPy fbank front-end feeds a CAM++ 512-d speaker embedder in ONNX Runtime; diarization runs through diart (pyannote segmentation-3.0), hot-swappable with a DiariZen / WavLM engine behind one interface. Identity is open-set — calibrated-cosine matching with a rejection tier and sigmoid-calibrated confidence.
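The open-set matching step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `identify`, the tier labels, the `accept`/`reject` thresholds, and the sigmoid `scale` are all hypothetical values chosen for the example, and a real system would fit them on held-out data.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def identify(embedding, gallery, names, accept=0.60, reject=0.40, scale=12.0):
    """Open-set match of one speaker embedding against enrolled voiceprints.

    All thresholds here are assumed values for illustration only.
    Returns (name or None, tier, confidence in [0, 1]).
    """
    query = l2_normalize(np.asarray(embedding, dtype=np.float64))
    refs = l2_normalize(np.asarray(gallery, dtype=np.float64))

    scores = refs @ query              # cosine similarity to every enrolled voiceprint
    best = int(np.argmax(scores))
    score = float(scores[best])

    # Sigmoid calibration: map the raw cosine score to a 0..1 confidence,
    # centred halfway between the reject and accept thresholds.
    midpoint = (accept + reject) / 2.0
    confidence = float(1.0 / (1.0 + np.exp(-scale * (score - midpoint))))

    if score >= accept:
        tier = "match"
    elif score >= reject:
        tier = "uncertain"
    else:
        tier = "rejected"              # open-set: nobody enrolled is close enough

    name = names[best] if tier != "rejected" else None
    return name, tier, confidence
```

The point of the rejection tier is that an unknown speaker returns no name at all instead of being forced onto the nearest enrolled voiceprint.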
Three things we're proud of:
feed() returns segments immediately.
Voiceprints are stored AES-256-encrypted at rest in a workspace-scoped SQLite store, each carrying an owner_email and enroll_allowed/identify_allowed flags enforced on every match (a hot in-memory flag cache means enforcement never has to decrypt). Around it: an append-only usage ledger, standalone Google OAuth hand-rolled with the Python stdlib (urllib) to keep the core image lean, a signed session cookie, and a Next.js 16 / React 19 / Tailwind 4 dashboard where you can stay anonymous, pause, or forget your voiceprint.
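One way to picture the in-memory flag cache is sketched below. The flag names `enroll_allowed` and `identify_allowed` come from the description above; the `ConsentFlags` and `ConsentCache` classes, their methods, and the lower-casing of emails are hypothetical choices made for this example. The idea it illustrates is that consent is checked against plaintext flags held in memory, so a denied voiceprint is skipped before anything is decrypted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentFlags:
    enroll_allowed: bool
    identify_allowed: bool


class ConsentCache:
    """Hypothetical in-memory mirror of the consent flags, keyed by workspace and owner."""

    def __init__(self):
        self._flags = {}

    @staticmethod
    def _key(workspace: str, owner_email: str):
        # Emails are compared case-insensitively in this sketch (an assumption).
        return (workspace, owner_email.lower())

    def set(self, workspace: str, owner_email: str, flags: ConsentFlags) -> None:
        self._flags[self._key(workspace, owner_email)] = flags

    def forget(self, workspace: str, owner_email: str) -> None:
        self._flags.pop(self._key(workspace, owner_email), None)

    def may_enroll(self, workspace: str, owner_email: str) -> bool:
        flags = self._flags.get(self._key(workspace, owner_email))
        return flags is not None and flags.enroll_allowed

    def may_identify(self, workspace: str, owner_email: str) -> bool:
        # Opt-in by default: no cached entry means no identification.
        flags = self._flags.get(self._key(workspace, owner_email))
        return flags is not None and flags.identify_allowed


def allowed_owners(cache: ConsentCache, workspace: str, owner_emails):
    """Keep only owners who allowed identification, before any voiceprint is decrypted."""
    return [e for e in owner_emails if cache.may_identify(workspace, e)]
```

Treating a missing entry as a denial keeps the sketch consistent with the opt-in design: a person who has paused or deleted their voiceprint simply never becomes a match candidate.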
cloud-api.near.ai), so transcripts are generated confidentially and never harvested.
The in-person flow records in the browser (getUserMedia/MediaRecorder), then the backend fans the clip out to VFTE and NEAR Whisper in parallel (asyncio.gather) and merges the results by timestamp: the diarizer and ASR run side by side (diarizer ∥ ASR), not as a pipeline. Browser MediaRecorder emits webm/opus, which NEAR's Whisper rejects, so we transcode to 16 kHz mono WAV with ffmpeg on the way in. We also shipped the consent schema onto a live AES-encrypted SQLite DB via an in-place ALTER migration (with a .bak safety net).
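The parallel fan-out and timestamp merge might look something like the sketch below. The `diarize` and `transcribe` coroutines are stand-ins that return canned data in place of the real VFTE and NEAR Whisper calls, and the segment shapes are invented for the example. Assigning each transcript segment to the speaker turn it overlaps most is one plausible reading of "merges by timestamp", not necessarily the rule the project uses.

```python
import asyncio


async def diarize(clip: bytes) -> list:
    """Stand-in for the VFTE call: who spoke when (canned data)."""
    await asyncio.sleep(0.01)
    return [
        {"start": 0.0, "end": 2.0, "speaker": "alice"},
        {"start": 2.0, "end": 4.0, "speaker": "bob"},
    ]


async def transcribe(clip: bytes) -> list:
    """Stand-in for the Whisper call: what was said when (canned data)."""
    await asyncio.sleep(0.01)
    return [
        {"start": 0.2, "end": 1.8, "text": "Hello there."},
        {"start": 2.1, "end": 3.9, "text": "Hi, good to see you."},
    ]


def overlap(a: dict, b: dict) -> float:
    """Length in seconds of the time interval shared by two segments."""
    return max(0.0, min(a["end"], b["end"]) - max(a["start"], b["start"]))


def merge_by_timestamp(turns: list, segments: list) -> list:
    """Label each transcript segment with the speaker turn it overlaps most."""
    merged = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        best = max(turns, key=lambda t: overlap(t, seg), default=None)
        has_overlap = best is not None and overlap(best, seg) > 0
        merged.append({**seg, "speaker": best["speaker"] if has_overlap else "unknown"})
    return merged


async def process(clip: bytes) -> list:
    # Both services receive the same clip at the same time; neither waits on the other.
    turns, segments = await asyncio.gather(diarize(clip), transcribe(clip))
    return merge_by_timestamp(turns, segments)
```

Running `asyncio.run(process(clip))` on a transcoded clip would return the transcript segments in time order, each tagged with a speaker. Because the two calls run concurrently, total latency is roughly that of the slower service and not the sum of both.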

