Compress 10000 XGBoost Decision Tree to 1 with recursive proof
Data and Model The project processed over a million training samples and more than 600,000 testing samples. Initially, the model comprised over 50,000 decision trees, posing significant challenges in terms of computational resources and efficiency.
Recursive Zero-Knowledge Proof Technique This technique involves compressing the proof of XGBoost's decision trees into an iterative recursion over a tree proof. The method significantly simplifies the complexity of the circuit, making it more manageable and efficient.
Challenges and Solutions Model Optimization and Compression: The project successfully compresses the number of decision trees to 900, with a negligible accuracy loss (less than 0.00002), ensuring optimized model complexity without compromising performance. Parameter Complementation and Alignment: A novel scheme was introduced to align decision tree parameters, allowing diverse trees to be handled by a single circuit file, thus reducing the overhead of generating multiple circuit layouts.
Recursive Forest uses a recursive zero-knowledge proof technique to compress the proof of XGBoost's forest of decision trees into an iterative recursion over a tree proof, greatly simplifying the complexity of the circuit. In addition to the recursive proof capability provided by Mina, there are the following challenges in realizing recursive zero-knowledge proofs for XGBoost: First, the sample model of this project was trained using real polarized radar data.The project uses data from the "How Much Did It Rain?" competition on Kaggle, including NEXRAD and MADIS data in CSV format , with over a million samples for training and over 600,000 for testing. The final model size obtained was over 50,000 decision trees. At this size, even using recursive proof techniques requires a significant resource overhead. Therefore, we first optimize the model complexity with guaranteed model accuracy, and compress the number of decision trees with loss of accuracy less than 0.00002 to 900. In addition, decision trees do not use the same features and parameters, and a direct recursive proof would require the generation of a large number of circuit layouts with more parameters. I propose a decision tree parameter complementation and alignment scheme that allows the same circuit file to handle different decision trees.