SonicIP

Marketplace for verified voice data—creators earn, AI trains on authentic voices.

Created at ETHGlobal New Delhi

Project Description

SonicIPChain — Detailed Project Description

Elevator summary

SonicIPChain is a demographic-verified voice-data marketplace that lets individuals securely monetize their voice recordings and lets companies buy high-quality, verified audio datasets for AI training. Contributors upload voice samples, verify identity and demographics, and mint a tokenized record of their contribution. Buyers filter by demographic attributes (age, gender, region, accent, language, etc.), purchase curated WAV datasets, and receive verified, labeled audio ready for model training or TTS fine-tuning (e.g., 11Labs).


1) What the product does (detailed)

  • Collects voice recordings from real people (contributors).
  • Verifies contributor identity and demographic attributes (age range, gender, region, language, accent).
  • Fingerprints each recording to produce a unique voice hash that detects duplicates and links samples to a single contributor identity.
  • Encrypts audio client-side with AES-256, pins the encrypted content to IPFS via Lighthouse, and records immutable metadata & provenance on the Flow blockchain.
  • Tokenizes contributions as NFTs (or access tokens) to record ownership, consent, usage permissions, and royalty rules.
  • Markets & sells curated datasets via a marketplace UI and API where buyers apply demographic filters and purchase WAV datasets for AI training.
  • Delivers datasets with metadata, integrity proofs, and signed licensing to enable legal, auditable use in model training and TTS pipelines.

2) Core user journeys

Contributor (voice owner)

  1. Sign up → consent to terms and licensing.
  2. Complete identity verification (govt ID + liveness + self-reported demographics).
  3. Record audio via the app or upload pre-recorded WAV files.
  4. Client encrypts audio (AES-256) and computes voice fingerprint + biometric hash.
  5. Encrypted audio stored on IPFS; metadata + verification status + fingerprint minted as an AudioNFT on Flow.
  6. Contributor sets license/price; earns revenue and royalties when datasets using their voice are sold.

Buyer (AI developer / company)

  1. Sign up & accept data-use terms (enterprise agreements where applicable).
  2. Use demographic filters (e.g., male, 21–25, Arabic accent, Indian English) to build a query.
  3. View sample clips + contributor verification badges and dataset quality metrics.
  4. Purchase dataset or subscribe to API access.
  5. Receive WAV files, metadata (demographics + fingerprint + consent record), and cryptographic proof of integrity.
  6. Use the dataset to fine-tune speech or TTS models (e.g., 11Labs) under the supplied license.
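
The integrity proof in step 5 could work roughly as follows. This is a minimal sketch, not the project's actual delivery format: the function names `build_manifest` and `verify_manifest` and the manifest layout are hypothetical, and HMAC-SHA256 stands in for the signature only so the example runs on the Python standard library. A real deployment would more likely use an asymmetric signature so buyers can verify without holding a secret.

```python
import hashlib
import hmac
import json


def build_manifest(files: dict[str, bytes], signing_key: bytes) -> dict:
    """Hash every file, then sign the sorted digest map with HMAC-SHA256."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    payload = json.dumps(digests, sort_keys=True).encode("utf-8")
    signature = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"files": digests, "signature": signature}


def verify_manifest(files: dict[str, bytes], manifest: dict, signing_key: bytes) -> bool:
    """Return True only if the signature and every file digest match."""
    payload = json.dumps(manifest["files"], sort_keys=True).encode("utf-8")
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    # The delivered file set must match the manifest exactly.
    if set(files) != set(manifest["files"]):
        return False
    return all(
        hashlib.sha256(data).hexdigest() == manifest["files"][name]
        for name, data in files.items()
    )
```

A buyer would call `verify_manifest` on the downloaded WAV bytes before training; any changed, missing, or extra file makes it return False.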

3) Technical architecture (high level)

  • Client (Web/Mobile): recording UI, local AES-256 encryption, identity capture (photo/ID/liveness), local fingerprint generation, wallet integration for Flow transactions.
  • Backend API: handles onboarding, verification workflows, dataset composition, payment, marketplace logic, and metadata indexing.
  • Lighthouse IPFS: stores encrypted audio blobs and serves pinned content.
  • Flow Blockchain (Cadence smart contracts): stores immutable metadata, minting of AudioNFTs, royalty rules, verification flags, and access-control tokens.
  • Voice Fingerprinting Service: extracts acoustic features and speaker embeddings (e.g., MFCCs + neural embeddings) and generates a compact voice fingerprint + cryptographic hash for deduplication & provenance.
  • Identity Verification Provider: integrates KYC/liveness provider(s) for document checks and biometric matching.
  • Payments & Licensing: fiat/crypto payment gateway, automated licensing agreements, and audit trail.
  • Enterprise API / SDK: for bulk dataset requests and in-pipeline integration with training environments.

4) Identity verification & anti-fraud

  • Multi-factor verification: government ID check + selfie liveness + optional third-party KYC for higher tiers.
  • Demographic validation: automated checks (document vs. self-declared) + manual review for flagged cases.
  • Voice liveness & anti-spoofing: analyze acoustic features & challenge-response recordings to prevent replay attacks.
  • Anti-sybil / duplicate detection: voice fingerprinting flags attempts by the same person to register multiple fake identities; reputation scoring surfaces trustworthy contributors.
  • Audit trail: each verification event recorded on blockchain (or hashed on-chain) for transparency.
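
The "document vs. self-declared" check for age could look like the sketch below. It assumes the KYC provider returns a date of birth from the ID document; the function names and the two status strings are hypothetical, chosen only for illustration.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Completed years between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def check_age_claim(birth_date: date, declared_range: tuple[int, int], today: date) -> str:
    """Compare the document-derived age with the self-declared age range."""
    low, high = declared_range
    if low <= age_on(birth_date, today) <= high:
        return "verified"
    # A mismatch is not rejected outright; it is routed to a human reviewer.
    return "manual_review"
```

Passing `today` explicitly keeps the check deterministic and easy to test.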

5) Voice fingerprinting & provenance

  • Fingerprinting approach: compute robust acoustic embeddings (neural embedding + spectrogram features) then derive a deterministic cryptographic hash of the embedding to create a unique voice fingerprint ID.
  • Purpose: prevent duplicate contributions, link recordings to the verified identity, and provide tamper detection.
  • Provenance: fingerprint + verification status + consent metadata are stored on-chain (or hashed on-chain) as immutable proof of origin when dataset purchases occur.
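
A minimal sketch of the two roles described above, using plain Python lists in place of real model output. One caveat shapes the design: a cryptographic hash matches only when its input is identical, so two different recordings of the same speaker will produce different fingerprint IDs. The sketch therefore uses the hash as a stable ID for one stored embedding and compares embeddings by cosine similarity to catch duplicates. The rounding precision and the 0.8 threshold are illustrative assumptions, not values from the project.

```python
import hashlib
import math


def fingerprint_id(embedding: list[float], decimals: int = 3) -> str:
    """Quantize the embedding to fixed precision and return its SHA-256 hex digest."""
    # Adding 0.0 turns a rounded -0.0 into 0.0 so both serialize identically.
    quantized = ",".join(f"{round(x, decimals) + 0.0:.{decimals}f}" for x in embedding)
    return hashlib.sha256(quantized.encode("utf-8")).hexdigest()


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_probable_duplicate(a: list[float], b: list[float], threshold: float = 0.8) -> bool:
    """Flag two embeddings as likely the same speaker (threshold is an assumption)."""
    return cosine_similarity(a, b) >= threshold
```

In this arrangement the hash would go on-chain as the provenance record, while the embeddings themselves stay off-chain for similarity search.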

6) Storage, encryption & access control

  • Client-side AES-256 encryption: audio is encrypted before leaving the user's device; only authorized buyers receive decryption keys if licensing permits.
  • IPFS (Lighthouse) for persistence: encrypted blobs are pinned to decentralized storage to prevent single-point loss.
  • Key management: contributors control keys for initial upload; platform manages ephemeral access keys for buyers under license. Consider hardware security modules (HSMs) for enterprise-grade key custody.
  • Revocation & persistent records: while encrypted files remain on IPFS (immutable), access can be revoked by not re-issuing decryption keys and by updating smart contract license flags. On-chain provenance remains as immutable evidence of prior consent and sales.
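
The revocation model can be sketched as a small key broker. Everything here is hypothetical: `KeyBroker` is an in-memory stand-in for the platform service, its dictionary stands in for the license flag held in the smart contract, and the random token stands in for an ephemeral decryption-key grant.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class KeyBroker:
    # Maps (buyer_id, asset_id) to a license flag; stands in for the on-chain flag.
    license_active: dict[tuple[str, str], bool] = field(default_factory=dict)

    def grant(self, buyer_id: str, asset_id: str) -> None:
        self.license_active[(buyer_id, asset_id)] = True

    def revoke(self, buyer_id: str, asset_id: str) -> None:
        # The encrypted blob stays on IPFS; only future key issuance is blocked.
        self.license_active[(buyer_id, asset_id)] = False

    def issue_access_token(self, buyer_id: str, asset_id: str) -> str:
        """Return a fresh short-lived token, or refuse if the license is inactive."""
        if not self.license_active.get((buyer_id, asset_id), False):
            raise PermissionError("no active license for this asset")
        return secrets.token_urlsafe(32)
```

Revocation in this sense only stops new keys from being issued. It cannot recall audio a buyer has already decrypted, which is why the license terms and audit trail still matter.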

7) Marketplace & dataset delivery

  • Filter builder UI: choose demographics, language, accent, recording length, audio quality, and labeling needs.
  • Quality controls: automatic quality checks (SNR, clipping, metadata completeness), optional human curation.
  • Dataset packaging: buyer receives WAV files organized and labeled with contributor metadata, voice fingerprint, consent timestamp, license terms, and a cryptographic integrity signature.
  • Licensing: clear terms (commercial, non-commercial, research) tied to NFTs or signed certificates; royalties and usage reporting enforced via smart contracts where feasible.
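
The automatic quality checks (SNR and clipping) could be approximated as below. This is a sketch under stated assumptions: samples are floats normalized to [-1.0, 1.0], a noise-only segment (for example, leading silence) is available separately, and the thresholds of 20 dB and 1% clipped samples are illustrative defaults rather than the project's settings.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a non-empty sample list."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech: list[float], noise: list[float]) -> float:
    """Approximate SNR in dB; the speech segment still contains some noise."""
    # Assumes the noise segment is not pure digital silence (rms > 0).
    return 20.0 * math.log10(rms(speech) / rms(noise))


def clipping_ratio(samples: list[float], limit: float = 0.999) -> float:
    """Fraction of samples at or beyond full scale."""
    return sum(1 for s in samples if abs(s) >= limit) / len(samples)


def passes_quality(
    speech: list[float],
    noise: list[float],
    min_snr_db: float = 20.0,
    max_clipping: float = 0.01,
) -> bool:
    """Accept a recording only if it is both clean enough and not clipped."""
    return snr_db(speech, noise) >= min_snr_db and clipping_ratio(speech) <= max_clipping
```

Recordings that fail could be sent to the optional human curation step instead of being dropped.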

8) Integrations & downstream usage

  • TTS & voice cloning pipelines: datasets formatted for direct use in TTS fine-tuning (11Labs, open-source toolkits).
  • APIs / SDKs: for programmatic dataset queries, streaming sample pulls, and enterprise ingestion.
  • Analytics: dataset quality scores, demographic coverage heatmaps, contributor payout dashboards.

9) Business model & monetization

  • Platform transaction fee: e.g., 20% cut on dataset sales.
  • Verification fees: nominal charge per advanced verification (premium KYC, notarized IDs).
  • Enterprise subscriptions: API access, custom dataset curation, and SLAs.
  • Storage & pinning: premium paid pinning and archival services.
  • Analytics & tooling: paid dashboard for dataset insights and usage tracking.
  • Royalties: contributors earn ongoing revenue through programmable royalties on secondary uses enforced by smart contracts.
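
As a worked example of the fee arithmetic, the sketch below takes the 20% example rate above and splits the remainder among contributors. Splitting pro rata by seconds of audio contributed is an assumption made for illustration; the source does not specify the payout rule. Amounts are integer cents to avoid floating-point rounding.

```python
def split_sale(
    price_cents: int,
    seconds_by_contributor: dict[str, int],
    fee_bps: int = 2000,  # 2000 basis points = the 20% example fee
) -> tuple[int, dict[str, int]]:
    """Return (platform_fee, payouts) for one dataset sale, in cents."""
    platform_fee = price_cents * fee_bps // 10_000
    pool = price_cents - platform_fee
    total_seconds = sum(seconds_by_contributor.values())
    payouts = {
        name: pool * seconds // total_seconds
        for name, seconds in seconds_by_contributor.items()
    }
    # Integer division can leave a few cents over; hand them out one each,
    # starting with the contributors who supplied the most audio.
    leftover = pool - sum(payouts.values())
    ranked = sorted(seconds_by_contributor, key=seconds_by_contributor.get, reverse=True)
    for name in ranked[:leftover]:
        payouts[name] += 1
    return platform_fee, payouts
```

For a $100.00 sale built from 600, 300, and 100 seconds of audio, the platform keeps 2000 cents and the contributors receive 4800, 2400, and 800 cents.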

10) Use cases & customers

  • Voice assistants & IVR that need local accents and age-appropriate tones.
  • TTS & voice cloning companies seeking authentic, demographic-specific voices.
  • Gaming & entertainment for character voices and regional dubbing.
  • Healthcare & accessibility for accurate local-language prompts for seniors.
  • Advertising & localization for culturally authentic voiceovers.
  • Academic / research datasets for sociolinguistics and speech science.

11) Compliance, ethics & privacy

  • Explicit informed consent: contributors must agree to clear license terms and revenue split before onboarding.
  • Age restrictions: strict handling for minors; require guardian consent or disallow minor participation per jurisdiction.
  • Data protection: follow GDPR/PDPA principles—data minimization, purpose limitation, deletion requests for personal metadata where possible. (Note: immutable blockchain records remain; ensure consent records and hashes comply with legal guidance.)
  • Ethical use controls: buyer vetting and disallowed use categories (deeply harmful content, non-consensual impersonation) enforced in TOS and via technical controls.
  • Transparency: contributors can view who bought/used their voice and payout history.

12) Roadmap & milestones (example)

  • MVP: identity verification, fingerprinting, IPFS storage, Flow NFT minting (completed).
  • Near term: marketplace UI, buyer dataset composer, enterprise API (weeks → months).
  • Next: automated quality labeling, compliance tooling, regional expansion, localized contributor acquisition campaigns.
  • Long term: protocol-level licensing, exchangeable voice credits, partnerships with TTS platforms.

13) Key metrics to track

  • Contributors onboarded / verified
  • Active contributors per demographic bucket (age/gender/language/region)
  • Datasets sold / average price per minute of audio
  • Revenue (platform fees) and contributor payouts
  • Buyer retention & enterprise contracts
  • Dataset quality scores & dispute rate

14) Risks & mitigations

  • Privacy/regulatory risk: mitigate with robust consent, legal counsel, and country-specific compliance flows.
  • Fraud / spoofing: multi-factor verification, liveness checks, fingerprinting, manual review for edge cases.
  • Data misuse: strict TOS + buyer vetting + technical access controls + auditing.
  • Market adoption: differentiate on verified demographics and compensation fairness; partner with early adopter AI firms.

15) Example short pitch (1–2 paragraphs)

SonicIPChain is a demographic-verified voice data marketplace that lets people monetize their voice recordings while giving AI companies instant access to high-quality, verified audio datasets engineered for real-world performance. Contributors record or upload audio, pass identity/demographic verification, and mint tokenized evidence of consent and provenance. Buyers select precise demographic filters, purchase WAV datasets packaged with signatures and licensing, and use them to train or fine-tune voice models—delivering authentic, locally resonant voices instead of generic, one-size-fits-all outputs.

How it's Made

We built SonicIPChain as a privacy-first voice-data marketplace using modern web, blockchain, and AI tools.

  • Frontend: React/Next.js web app + React Native recorder; used Web Audio API and Web Crypto API for in-browser recording & AES-256 encryption.
  • Backend: Node.js/Express APIs with Python micro-service for voice fingerprinting (MFCC + ECAPA embeddings) and audio quality scoring.
  • Blockchain: Flow + Cadence smart contracts to mint AudioNFTs storing fingerprint hash, demographic verification, and royalty logic.
  • Storage: Encrypted WAV files pinned on Lighthouse IPFS; keys shared only with licensed buyers.
  • Verification: Integrated KYC & liveness APIs to confirm age, gender, region, and prevent sybil attacks.
  • Marketplace: Buyer dashboard to filter contributors by demographics and purchase curated WAV datasets; smart-contract event triggers access and payment.

Hacky wins:

  • Ran the fingerprinting model on a Colab GPU during the hackathon for speed.
  • Linked IPFS hashes to NFTs for transparent provenance.
  • Built a simple SQL-based demographic filter to assemble datasets instantly.
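
The SQL-based demographic filter might look something like this sketch, which uses an in-memory SQLite database. The table name, column names, sample rows, and the `cid-N` placeholders are all hypothetical; the source does not describe the actual schema or database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE recordings (
           id INTEGER PRIMARY KEY,
           contributor TEXT, gender TEXT, age INTEGER,
           accent TEXT, verified INTEGER, ipfs_cid TEXT)"""
)
# Illustrative rows only; the CIDs are placeholders, not real IPFS hashes.
rows = [
    ("c1", "male", 23, "Indian English", 1, "cid-1"),
    ("c2", "female", 24, "Indian English", 1, "cid-2"),
    ("c3", "male", 25, "Indian English", 1, "cid-3"),
    ("c4", "male", 22, "Indian English", 0, "cid-4"),  # not yet verified
    ("c5", "male", 31, "Indian English", 1, "cid-5"),  # outside the age range
]
conn.executemany(
    "INSERT INTO recordings (contributor, gender, age, accent, verified, ipfs_cid) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)


def select_recordings(conn, gender: str, min_age: int, max_age: int, accent: str) -> list[str]:
    """Return the CIDs of verified recordings matching the buyer's filters."""
    cursor = conn.execute(
        "SELECT ipfs_cid FROM recordings "
        "WHERE verified = 1 AND gender = ? AND age BETWEEN ? AND ? AND accent = ? "
        "ORDER BY id",
        (gender, min_age, max_age, accent),
    )
    return [row[0] for row in cursor]
```

Calling `select_recordings(conn, "male", 21, 25, "Indian English")` on these rows returns `["cid-1", "cid-3"]`. The query uses bound parameters rather than string formatting, so buyer-supplied filter values cannot inject SQL.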

➡️ This stack let us go from voice upload → verified NFT → buyer-ready WAV dataset within the hackathon timeframe.
