We built a way to trustlessly audit machine learning models with zero-knowledge proofs. Many machine learning models are private, but as users we may still want certain guarantees: for example, that a model was trained on demographically diverse data (healthcare settings), or on data that was not copyrighted (generative models).
We used zero-knowledge proof cryptography for this, and we go into greater depth in our video!
We utilized the open-source ZKML library, which produces zero-knowledge proofs for tensor computational graphs.
Our specific contribution during the hackathon was building a transpiler from scratch in Python and TensorFlow. It takes a TFLite machine learning model and produces a tensor computational graph that exactly describes the stochastic gradient descent computation of the neural network -- this effectively proves the training of the network. We built the transpiler to work on arbitrary machine learning models.
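To give a flavor of what the transpiler emits, here is a minimal sketch (our illustration, not code from the repo) of one SGD step of a single-layer softmax classifier written purely as tensor operations, the kind of computation that ends up in the graph so that the proof covers training rather than just inference:

```python
import numpy as np

def sgd_step(W, x, y_onehot, lr=0.1):
    """One SGD step expressed only as tensor ops (matmul,
    broadcast, elementwise) -- illustrative sketch."""
    logits = x @ W                        # forward pass
    e = np.exp(logits - logits.max())     # numerically stable softmax
    probs = e / e.sum()
    grad = np.outer(x, probs - y_onehot)  # dL/dW for cross-entropy loss
    return W - lr * grad                  # weight update as a tensor op

W = np.zeros((3, 2))
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 0.0])
W = sgd_step(W, x, y)
```

The real transpiler operates on arbitrary TFLite graphs; this example only shows the shape of the idea, with the function name and model chosen for illustration.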
We had to make extensive changes to the ZKML repo, such as adding custom layers for updating machine learning weights, a large-integer division for softmax, and many tensor manipulation operations such as Broadcast, Rotate, Permute, and Reflect.
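The semantics of those tensor manipulation operations can be illustrated with their NumPy equivalents (an assumption on our part; the in-circuit versions may differ in detail):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

broadcast = np.broadcast_to(a, (4, 2, 3))  # Broadcast: replicate along a new axis
rotate    = np.roll(a, shift=1, axis=1)    # Rotate: cyclic shift within an axis
permute   = np.transpose(a, (1, 0))        # Permute: reorder axes
reflect   = np.flip(a, axis=1)             # Reflect: reverse an axis
```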
We found that the naive softmax implementation greatly decreases model accuracy, so we investigated more numerically stable forms of softmax and extended the ZKML repo with one (https://arxiv.org/pdf/2202.03493.pdf).
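The standard max-subtraction trick illustrates why the naive form fails. Softmax is shift-invariant, so subtracting the maximum input changes nothing mathematically but keeps the exponentials in a safe range:

```python
import numpy as np

def naive_softmax(x):
    # Overflows for large inputs: exp(1000) is already inf in float64,
    # so the result degenerates to inf / inf = nan.
    e = np.exp(x)
    return e / e.sum()

def stable_softmax(x):
    # Shifting by the max leaves the output unchanged
    # (softmax is shift-invariant) but avoids overflow.
    e = np.exp(x - x.max())
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
# naive_softmax(x) produces nan; stable_softmax(x) stays finite.
print(stable_softmax(x))
```

(Our extension to ZKML uses a form suited to finite-field arithmetic; see the linked paper. This sketch shows only the floating-point intuition.)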
Our specific contributions during the hackathon are here: https://github.com/ddkang/zkml/pull/11/files#diff-cbb020b3e74381bd104ea2be1b4dc42927bac1075933d99238493c315351c21a
This is the actual repository link: https://github.com/ddkang/zkml. Please note that we linked a dummy repo for now because we didn't have time to make it public.