Generating Succinct ZK Proofs for Content-Generating Machine Learning Models; Proving that AI-Generated content is authentic with ZKML
Section 9: Generating Succinct ZK Proofs for Content-Generating Machine Learning Models.
We trained a machine learning model for on-chain AI art generation, then created a novel tool that converts it (and many other models) into zero-knowledge proofs verifiable on the blockchain. Finally, we created a variant of the ERC-721 protocol that harnesses this capability to empower Web 3 art creators.
What did we do:
We trained a machine learning model to produce AI-Generated Content.
We encoded the model’s art generation process into an arithmetic circuit to be consumed by a zk-SNARK (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge) prover, with the resulting proofs verifiable on-chain (Solidity).
We created a novel ERC-721 based NFT protocol incorporating the zk-SNARK verifier that allows AI artists to uniquely mint and certify their creations.
We created & open-sourced a generalized tool & framework to convert almost any machine learning algorithm into zero-knowledge proofs that are easily integrated with the blockchain.
Why is it significant?
Why is it hard?
Machine learning models are huge: e.g. FastBERT needs ~1800 MFLOPs (million floating point operations). Normally you can’t run that on a blockchain.
Zero-knowledge proofs are hard to generate and operate: ceremony, contribution, beacon, etc. Not to mention the speed.
Frankly, the three key stakeholders don’t talk to each other at the moment.
3.1 Machine learning engineers don’t understand the tooling and requirements to convert an algorithm to a blockchain-friendly format
3.2 ZKP developers lack visibility into the capabilities of AIGC models and the performance / ecosystem / adoption gap
3.3 Creators hoping to leverage both Web 3 & AI lack guidance and support
How It’s Made
Renowned AI artists teaming up with ZK experts and veterans in machine learning model deployment
Contribution 1: A blockchain-friendly AIGC model
We trained a tiny machine learning model on artworks throughout history. This represents a typical workflow for a creator leveraging an AI model. The model takes inspiration from Wasserstein GAN with gradient penalty, as well as spectral normalization.
We created a novel method that rewrites all floating point operations in this model into integer-only operations, which makes it blockchain- and ZK-viable. We combined several techniques in integer-only quantization (previously only available in TF-Lite), ported them to PyTorch, then resolved issues of decreased generator performance in the presence of training-time quantization. We also introduced a new scheme of integer-only re-quantization by restricting the scales to powers of two, and fused batch normalization and ReLU units into 2D convolutions for simplified quantized inference.
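To make the power-of-two re-quantization idea concrete, here is a minimal pure-Python sketch. It is not the project's code: the function names, the int8 output range, and the round-half-up rule are assumptions chosen for illustration. The point it shows is that when the combined scale is exactly 2^-shift, rescaling a wide integer accumulator needs only an add and a bit shift, with no floating point and no general division.

```python
def round_shift(acc: int, shift: int) -> int:
    """Divide by 2**shift using integers only, rounding halves toward +infinity."""
    if shift == 0:
        return acc
    # Adding half of the divisor before an arithmetic right shift rounds to nearest.
    return (acc + (1 << (shift - 1))) >> shift


def requantize(acc: int, shift: int, lo: int = -128, hi: int = 127) -> int:
    """Rescale a wide accumulator to int8, assuming the combined scale is 2**-shift."""
    return max(lo, min(hi, round_shift(acc, shift)))


def quantized_dot_relu(x, w, bias: int, shift: int) -> int:
    """Integer dot product plus bias, a fused ReLU, then power-of-two requantization."""
    acc = sum(xi * wi for xi, wi in zip(x, w)) + bias
    acc = max(acc, 0)  # ReLU applied on the accumulator, before rescaling
    return requantize(acc, shift)
```

Every step here is an integer addition, multiplication, comparison, or shift, which is the kind of arithmetic that maps cleanly onto circuit constraints.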
Contribution 2: A ML -> ZK model translator
We then generalized our tailored approach into a translator that translates machine learning models in PyTorch (a very popular framework) into arithmetic circuits in Circom (a popular circuit language for zk-SNARKs). We first used TorchScript's JIT compiler to generate a computational trace of the generator in fully quantized form, then imported it into an intermediate representation in TVM that includes passes that automatically translate certain divisions into fixed point multiplications. We then implemented a graph runtime in Python to build up a neural network in Circom that supports a wide range of operators ranging from convolution to matrix multiplication. To mimic typed and untyped integer arithmetic expressed in TVM, we crafted specialized Circom building blocks with clear bound checks and used elements greater than p/2 for negative numbers.
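Two of the ideas above can be illustrated outside Circom with a short Python sketch. It is an illustration under stated assumptions, not the translator's actual code: the helper names are hypothetical, and P is taken to be the BN254 scalar field prime that Circom uses by default. The first part maps signed integers to field elements so that values above p/2 stand for negative numbers; the second replaces division by a constant with a multiplication and a shift, which is exact here only under the input bounds stated in the comments.

```python
# Assumed field: the BN254 scalar field prime (Circom's default).
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617
HALF = P // 2


def to_field(x: int) -> int:
    """Encode a signed integer as a field element; negatives land above p/2."""
    if not -HALF <= x <= HALF:
        raise ValueError("value does not fit the signed encoding")
    return x % P


def from_field(a: int) -> int:
    """Decode a field element back to a signed integer."""
    return a - P if a > HALF else a


def in_signed_range(a: int, bits: int) -> bool:
    """Bound check: does the decoded value fit a signed integer of `bits` bits?"""
    x = from_field(a)
    return -(1 << (bits - 1)) <= x < (1 << (bits - 1))


def div_by_const(x: int, d: int, shift: int = 32) -> int:
    """Compute x // d as a fixed-point multiplication followed by a shift.

    Exact for 0 <= x < 2**16 and 1 <= d <= 2**16 when shift is 32.
    """
    m = -(-(1 << shift) // d)  # ceil(2**shift / d), precomputed per constant
    return (x * m) >> shift
```

Field addition and multiplication on the encoded values agree with signed integer arithmetic as long as the true result stays within the bound, which is why explicit bound checks matter in the circuit.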
Our method can handle models with hundreds of components and millions of parameters, limited mainly by the inherent performance of ZK systems.
The code is open-sourced at: https://github.com/zk-ml/uchikoma
Almost infinite room for imagination:
(shown) Vision models -> AIGC
Language models -> chatbot, writing assistant
Linear models and decision trees -> fraud detection, sybil attack prevention
Multi-modal models -> recommender systems
Contribution 3: A creator-empowering NFT protocol
Finally, we extended the classic ERC-721 protocol to incorporate ML & ZK capabilities to give artists and creators more control over their generated content.
We used EBMP (created by our team member) to directly render on-chain images using data stored in smart contracts. This is done with inspiration from dom's technique by manually concatenating bytes as specified by the BMP file format and placing the result into an SVG as a base64 string.
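The byte-concatenation approach can be sketched off-chain in Python. This is not the EBMP library's code, and the function names are hypothetical; it only assumes the standard uncompressed 24-bit BMP layout (a 14-byte file header, a 40-byte BITMAPINFOHEADER, then bottom-up BGR rows padded to 4 bytes) and wraps the result in an SVG data URI.

```python
import base64
import struct


def encode_bmp(pixels) -> bytes:
    """Build a 24-bit BMP from rows (top to bottom) of (r, g, b) tuples."""
    height, width = len(pixels), len(pixels[0])
    row_pad = (-width * 3) % 4  # each row is padded to a multiple of 4 bytes
    body = bytearray()
    for row in reversed(pixels):  # BMP stores the bottom row first
        for r, g, b in row:
            body += bytes((b, g, r))  # channel order is blue, green, red
        body += b"\x00" * row_pad
    offset = 14 + 40  # file header + info header
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(body), 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII", 40, width, height, 1, 24, 0, len(body), 2835, 2835, 0, 0
    )
    return file_header + info_header + bytes(body)


def bmp_to_svg_data_uri(bmp: bytes, width: int, height: int) -> str:
    """Embed the BMP in an SVG <image> tag and return the SVG as a base64 data URI."""
    b64 = base64.b64encode(bmp).decode("ascii")
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<image width="{width}" height="{height}" '
        f'href="data:image/bmp;base64,{b64}"/></svg>'
    )
    return "data:image/svg+xml;base64," + base64.b64encode(svg.encode()).decode("ascii")
```

For example, `bmp_to_svg_data_uri(encode_bmp(rows), 2, 2)` yields a string that could serve as the `image` field of token metadata; an on-chain version would perform the same concatenation in Solidity.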
The ZK prover certifies that the image is indeed generated by a machine learning model. Only the holder of the machine learning algorithm (presumed to be the AI artist) is able to mint the model. Given a public AIGC model (e.g. StableDiffusion), each creator can prove their contribution either to the original dataset that trained the model or through carefully designed prompts. Our ZK prover can prove both assertions without revealing the artists’ identities or prompts. We can directly export a Solidity-based Groth16 verifier from snarkjs for our R1CS circuits.
What this means for our sponsors:
Polygon - converting machine learning models to zero-knowledge proofs has profound implications for expanding the capabilities of blockchains, enabling people in underdeveloped countries to generate income with on-chain AIGC or ZK-ML-enhanced trustless freelancing.
Optimism - governance tech in consensus is tricky. Our tools can enhance existing voting and Sybil resistance techniques by allowing an ML-based approach: imagine a self-evolving DAO smart contract powered by a neural network.
Worldcoin - generation of World ID is an intricate interplay between the Orb hardware and many algorithmic innovations, many of which are machine learning algorithms. Our generalized ML -> ZK model translator could enhance this process by verifying the validity of a World ID without revealing the key IP powering the algorithm.
IPFS & Filecoin - zero-knowledge proofs create orders-of-magnitude overhead in proving key storage compared to a normal machine learning model deployment. Storing these artifacts on IPFS has many benefits, such as cost and decentralization: a proving key that allows users to easily transfer stablecoins across borders for remittance purposes may otherwise be subject to constant malicious alteration and censorship.
0x - Our generalized ML -> ZK pipeline could help the 0x on-chain infrastructure to attain machine-learning-related capabilities, such as an ML-based auto-trader, a wash trading detector, smart and resilient governance contracts, etc.
ENS - ENS has a long way to go to be truly a soul-bound identity. Our generalized ML -> ZK pipeline could help the ENS on-chain infrastructure to attain machine-learning-related capabilities, improving Sybil attack resistance or enabling trustless biometrics processing.