LLMBench

Verifiable on-chain benchmarking of Large Language Model drift

Created At

ETHGlobal Paris

Project Description

This project was inspired by the following paper: https://arxiv.org/abs/2307.09009

The paper details how ChatGPT's behavior and performance have changed over time. Because these changes are unpredictable, companies find it difficult to integrate the models into their pipelines. To measure the changes, the researchers developed a set of benchmarks and ran them on two snapshots of OpenAI's models. This project makes that benchmarking process recurrent and stores the results on-chain for immutability and transparency.

How it's Made

The project is split into three modules:

  1. Front-end: a single-page application built with Vue.js 3, covering the GPT-3.5 Turbo and GPT-4 benchmarks
  2. Smart Contract: a contract deployed on Gnosis that stores the benchmark results
  3. LLMDrift Scripts: scripts meant to run on Bacalhau. They run the LLMDrift benchmarks against the current gpt-3.5-turbo and gpt-4 models and write the results to the Gnosis chain. The scripts are based on the "lchen001/LLMDrift" repo, developed by the researchers behind the aforementioned paper.
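
As a rough illustration of what one recurring benchmark run might produce, here is a minimal Python sketch. It is not the project's actual code: the `BenchmarkResult` record, the `run_benchmark` and `drift_bps` helpers, the stub models, and the choice to store accuracy as integer basis points are all assumptions made for this example. The model call is passed in as a plain function so the sketch runs offline, and the step that writes the record to the Gnosis contract is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical record of one benchmark run (not the project's schema)."""
    model: str       # e.g. "gpt-4"
    task: str        # label of the benchmark task
    timestamp: int   # Unix time of the run
    score_bps: int   # accuracy in basis points (0..10000); integers suit contract storage


def run_benchmark(model: str, task: str, ask: Callable[[str], str],
                  cases: Sequence[tuple[str, str]], timestamp: int) -> BenchmarkResult:
    """Score `ask` on (prompt, expected answer) pairs using exact match."""
    if not cases:
        raise ValueError("at least one test case is required")
    correct = sum(
        1 for prompt, expected in cases
        if ask(prompt).strip().lower() == expected.strip().lower()
    )
    # Integer division keeps the score an integer, rounding down.
    return BenchmarkResult(model, task, timestamp, correct * 10_000 // len(cases))


def drift_bps(older: BenchmarkResult, newer: BenchmarkResult) -> int:
    """Signed score change between two runs of the same model and task."""
    if (older.model, older.task) != (newer.model, newer.task):
        raise ValueError("results must share model and task")
    return newer.score_bps - older.score_bps


# Stand-ins for real model calls, so the example needs no API key.
ANSWERS = {"Is 7 prime?": "yes", "Is 8 prime?": "no",
           "Is 9 prime?": "no", "Is 11 prime?": "yes"}
CASES = list(ANSWERS.items())


def stub_accurate(prompt: str) -> str:
    return ANSWERS[prompt]


def stub_always_yes(prompt: str) -> str:
    return "Yes"


first = run_benchmark("stub-model", "prime-check", stub_accurate, CASES, 1_690_000_000)
second = run_benchmark("stub-model", "prime-check", stub_always_yes, CASES, 1_690_086_400)
print(first.score_bps, second.score_bps, drift_bps(first, second))
```

In a real run, `ask` would wrap a call to the model's API, and each `BenchmarkResult` would be submitted to the storage contract so that later runs can be compared against it.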