Have you seen the "Hawk Tuah" meme? From just a few seconds of viral content, the woman in the clip turned her unexpected fame into a growing business, with podcasts, merch, and more. This is a perfect example of the rise of user-generated content (UGC) and how savvy creators can monetize sudden fame.
Our project helps creators get paid for their voice. Specifically, we enable them to lend their voice to text-to-speech (TTS) models, and earn 90% of profits.
TTS technology is booming. Startups like ElevenLabs and tech giants like OpenAI are building powerful tools, but most models still rely on a limited number of voices. There’s a growing demand for variety and authenticity in TTS audio.
We make it easy for anyone to contribute their voice and earn money from it. Users record a short voice clip, which captures their unique vocal signature. Once added to the system, whenever their voice is used in a TTS model, they receive 90% of the revenue generated. Speech-generation users, in turn, pay for the service the way they pay for other AI models: with credits, billed by usage.
The platform is Web2 in design but built on Web3 infrastructure to ensure fair payouts and transparent ownership. Users sign up with just an email address, and a crypto wallet is created for them automatically in the background. Voice users (i.e., TTS model consumers) buy credits and are charged based on the length of the text they want narrated.
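As a sketch of the billing and split logic described above: the 90/10 split is the project's stated model, but the per-character rate below is a made-up placeholder, not a real price.

```python
# Sketch of credit billing and the 90/10 revenue split. The per-character
# rate is a hypothetical placeholder; only the 90/10 split comes from the
# project's actual model.

CREDITS_PER_CHAR = 0.001      # hypothetical rate: 1 credit per 1,000 characters
VOICE_OWNER_SHARE = 0.90      # 90% of revenue goes to the voice owner

def credits_for_text(text: str) -> float:
    """Charge speech-generation users by the length of the input text."""
    return len(text) * CREDITS_PER_CHAR

def split_revenue(amount: float) -> tuple[float, float]:
    """Return (voice_owner_payout, protocol_fee) for a given payment."""
    owner = amount * VOICE_OWNER_SHARE
    return owner, amount - owner
```

So a 2,000-character request would cost 2 credits under this placeholder rate, and of every 10 units of revenue, 9 go to the voice owner and 1 stays with the protocol.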
In short: we’re building the economic engine behind the next generation of voice talent, powered by real people who will be fairly compensated.
This project is meant to look completely Web2-native, exposing users to blockchain rails as little as possible. However, blockchain is used in the backend to guarantee the economic incentive model, ensuring that voice talent will always be paid 90% of the profits from their voice usage.
Website:
The website was built primarily with v0.dev, and images were crafted with OpenAI Sora.
- The structure is React + Typescript based
- The Next.js App Router handles navigation through the site
- Tailwind CSS is used for style
- A Supabase SQL database is used as interim data storage (model data is meant to move to Walrus later, and account details are handled by Privy)
Text-to-speech (TTS) model:
- I based it on the open-source project OpenVoice (https://github.com/myshell-ai/OpenVoice)
- This model lets users provide an MP3 clip of a voice and use it to generate a TTS model.
- I got it working locally and isolated the portions of the project that extract voice signatures and use those signatures to generate speech.
- The system pairs a base TTS model with voice signatures generated from MP3 clips; the signatures are stored as tensors
- Unfortunately, to deploy this on chain (the intention was to use Oasis Protocol), the model needed to be dockerized. Debugging the dockerization was looking too time-consuming, so I used OpenAI's TTS model to demonstrate the concept instead. I still included the OpenVoice model because it is very cool and I spent a lot of time getting it to work locally
- The OpenAI TTS model used is "tts-1". The voice input (currently just a voice name string) stands in for the model parameter files (.pth) that would replace it in the future.
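A minimal sketch of how the tts-1 call fits in. The request shape follows OpenAI's `/v1/audio/speech` endpoint as documented at the time of writing; the voice name ("nova" etc.) is the stand-in that a .pth parameter file would later replace. `synthesize` is not run at import time, since it needs an API key.

```python
import json
import os
import urllib.request

OPENAI_TTS_URL = "https://api.openai.com/v1/audio/speech"

def build_tts_payload(text: str, voice: str = "alloy") -> dict:
    # The voice name is the placeholder that per-creator model
    # parameter files (.pth) are meant to replace in the future.
    return {"model": "tts-1", "voice": voice, "input": text}

def synthesize(text: str, voice: str = "alloy") -> bytes:
    """POST to OpenAI's speech endpoint and return raw audio bytes.
    Requires OPENAI_API_KEY in the environment."""
    req = urllib.request.Request(
        OPENAI_TTS_URL,
        data=json.dumps(build_tts_payload(text, voice)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```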
<Privy> Account Abstraction:
- Users sign up to the app through Privy, and an EVM wallet is automatically generated for them
- This will give them an address to use for payments and collecting fees
- It's useful because they don't have to know anything about crypto to sign up, and the experience feels like a familiar Web2 flow
- I wanted to use a smart wallet, but there don't appear to be smart contract wallet providers (or bundlers and paymasters) on Oasis Protocol, the coordination layer for the protocol. Instead, I simulated it by sending faucet tokens to the test accounts.
<Oasis Protocol> AI Model Inference Infrastructure:
- This is the blockchain that hosts the TTS inference service and the smart contract that guarantees fair payment
- I'm using ROFL to run the inference service in an encrypted TEE. It's important to me not to leak voice signatures or generated content: while blockchain infrastructure guarantees the economic incentive model, leaked data would make it hard to build trust with users. Hence, having a TEE was essential
- Dockerizing the OpenVoice TTS model didn't work, but dockerizing the OpenAI-backed model did. I then used FastAPI to run it as a persistent backend service.
- We had to deploy this ROFL app on a special node in order to expose ports for the backend service; however, unknown issues left this unresolved.
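The actual service wraps the model in FastAPI, but its endpoint logic reduces to something like the dependency-free sketch below. The field names (`text`, `voice_id`) are illustrative assumptions, not the real API.

```python
# Dependency-free sketch of the inference endpoint's core logic. The real
# service wraps this in a FastAPI route running inside the ROFL TEE;
# request field names here are assumptions.

def handle_speak(body: dict) -> dict:
    """Validate a speech request and describe the work to be performed."""
    text = body.get("text")
    voice_id = body.get("voice_id")
    if not text or not voice_id:
        return {"status": 400, "error": "text and voice_id are required"}
    # Inside the TEE, the service would then: fetch the voice blob from
    # Walrus, run TTS, and trigger the on-chain payout before returning
    # the generated audio.
    return {"status": 200, "text_length": len(text), "voice_id": voice_id}
```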
<Oasis Protocol> Smart Contract:
- The economic incentive model is guaranteed by blockchain, so it was important to me to deploy a contract that demonstrates that
- I have a TokenDistributor contract deployed (0x9C0b235e9FE17d9D8269b89f90DdE6453C13A81C - testnet). It has the function distributeTokens.
- Whenever a speech-generation user invokes the TTS model, the backend:
  - generates the speech from the text input
  - calls the distributeTokens function to pay the voice owner, with 10% of proceeds going back to the protocol owner
- I have it hard-coded at 1 TEST token for now, but the price is meant to be variable in the future, similar to how OpenAI charges per token based on input length
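The payout math distributeTokens implements can be sketched in integer units, the way an EVM contract would compute it. The basis-point constants match the 90/10 split; sending the rounding dust to the protocol is an illustrative choice here, not necessarily what the deployed TokenDistributor does.

```python
VOICE_OWNER_BPS = 9000    # 90% share, in basis points
BPS_DENOMINATOR = 10_000

def distribute(amount_wei: int) -> tuple[int, int]:
    """Split a payment into (voice_owner_share, protocol_share) using
    integer math, as an EVM contract would. Rounding dust goes to the
    protocol in this sketch -- an assumption, not the deployed behavior."""
    owner = amount_wei * VOICE_OWNER_BPS // BPS_DENOMINATOR
    return owner, amount_wei - owner
```

For the hard-coded payment of 1 TEST token (10**18 base units), the voice owner receives 9 * 10**17 and the protocol keeps 10**17; the two shares always sum to the full payment.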
<Walrus> Model Storage:
- In the future, model parameters are meant to be stored on Walrus, to minimize centralization by the project and to ensure end users maintain ownership of their voices
- The intention is for .pth files to be generated when voice talent sign up and submit their voice clip; the files are then encrypted and stored on Walrus
- When a TTS model is used, the ROFL app fetches the model parameters from Walrus via its HTTP API, using the blob ID stored in a Supabase database
- To mimic this behavior (since the OpenVoice dockerization didn't work and I settled for OpenAI's TTS model), voice names are stored on Walrus and queried by the dockerized instances running in ROFL
- When voice talent register through Privy and submit their voice, an OpenAI TTS voice name is stored on Walrus, mimicking what storing model parameters will look like in the future. The blob ID is then stored in a Supabase SQL database and queried later when that voice is used
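The store-then-read round trip described above can be sketched with the Walrus HTTP API. The publisher/aggregator hosts below are placeholders, and the `/v1/blobs` paths follow the Walrus HTTP API as documented at the time of writing; treat both as assumptions. Neither network call runs at import time.

```python
import urllib.request

# Placeholder hosts -- substitute a real Walrus publisher and aggregator.
PUBLISHER = "https://publisher.walrus-testnet.example"
AGGREGATOR = "https://aggregator.walrus-testnet.example"

def blob_url(blob_id: str) -> str:
    """Aggregator read URL for a stored blob (path is an assumption
    based on the Walrus HTTP API at the time of writing)."""
    return f"{AGGREGATOR}/v1/blobs/{blob_id}"

def store_blob(data: bytes) -> bytes:
    """PUT a blob (here: a voice name; later: encrypted .pth parameters)
    to a publisher. The JSON response contains the blob ID, which the
    app records in Supabase for later lookups."""
    req = urllib.request.Request(f"{PUBLISHER}/v1/blobs", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def read_blob(blob_id: str) -> bytes:
    """GET the blob back from an aggregator (what the ROFL app does
    before running inference)."""
    with urllib.request.urlopen(blob_url(blob_id)) as resp:
        return resp.read()
```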
Database:
- I used Supabase for a SQL database
- The intention was for it to store:
  - the Privy ID of the user
  - the EVM address of the user
  - the blob ID of the user's stored voice model
- It is needed to relate the voice used in a TTS request to the address that should receive the fees, and to the blob ID locating that user's voice model parameters
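The lookup this table enables reduces to: given a voice, find the payout address and the Walrus blob ID. Column and key names below are illustrative assumptions, with an in-memory dict standing in for the Supabase table.

```python
# In-memory stand-in for the Supabase table; column names are
# illustrative assumptions, not the real schema.
VOICES = {
    "voice-123": {
        "privy_id": "did:privy:abc",
        "evm_address": "0x0000000000000000000000000000000000000001",
        "blob_id": "walrus-blob-xyz",
    },
}

def resolve_voice(voice_id: str) -> tuple[str, str]:
    """Return (payout_address, blob_id) for a voice: everything the
    backend needs to fetch model parameters from Walrus and pay the
    voice owner on chain."""
    row = VOICES[voice_id]
    return row["evm_address"], row["blob_id"]
```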
Lastly, I heavily used ChatGPT and Cursor for development.