Introduction to Zero-Knowledge Machine Learning for Blockchains


Building a Zero-Knowledge Machine Learning (ZKML) solution is not the same as building a typical machine learning solution or a general software solution. Overseeing a ZKML project involves some unique aspects and potential pitfalls.

In this article, we'll go over what is needed to deliver a successful ZKML solution.

Who is this article for?

This article is for curious people looking to build a ZKML solution in a blockchain context. I'm going to assume you want to build a large-scale machine learning project that takes real risk (if it fails) and generates real value (when it succeeds).

What is machine learning and why is it important?

Although machine learning offers substantial benefits in numerous fields, its capabilities are sometimes overhyped. Science fiction films have influenced many people's perceptions, leading to the misconception that machine learning equips machines with human-like intelligence. In truth, machine learning is more accurately viewed as a specialized instrument for data analysis rather than a universal solution for every challenge.

What is machine learning?

Machine learning is ultimately about finding patterns in structured data and making predictions. These can be (and often are) predictions about what will happen in the future. But this is not the only way you’ll find the term “predictions” used in machine learning solutions. Often it also means predicting answers to questions like: “What kind of dog is in this image?” The latter kind of prediction isn’t a time-based prediction (looking into the future), but rather a prediction in terms of: “What answer would an all-knowing oracle give if asked this specific question?”

What sets machine learning apart from conventional software?

Machine learning can alternatively be described as "acquiring knowledge from data". In contrast to traditional software solutions, which are based on deduction (where a knowledgeable individual formulates a set of rules, encoded as a series of if statements, to apply to data), machine learning solutions rely on induction (where a machine learning algorithm autonomously identifies patterns by analyzing numerous examples, allowing these discovered rules to be applied to additional data).
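To make the contrast concrete, here is a minimal sketch in Python. The scenario (granting a fee discount based on a user's transaction count), the function names, and the example data are all made up for illustration. The first function encodes a rule chosen by a person; the second derives its cut-off from labeled examples.

```python
# Deduction: a knowledgeable person writes the rule by hand.
def rule_based_discount(tx_count: int) -> bool:
    return tx_count >= 50  # threshold picked by a human (hypothetical value)


# Induction: the rule is discovered from labeled examples.
def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Return the cut-off that misclassifies the fewest examples."""
    candidates = sorted({tx_count for tx_count, _ in examples})

    def errors(threshold: int) -> int:
        return sum((tx_count >= threshold) != label for tx_count, label in examples)

    return min(candidates, key=errors)


# Made-up history: (transaction count, did the user deserve a discount?)
examples = [(3, False), (8, False), (15, False), (22, True), (40, True), (75, True)]
threshold = learn_threshold(examples)


def learned_discount(tx_count: int) -> bool:
    return tx_count >= threshold
```

A real model learns far richer rules than a single threshold, but the division of labor is the same: the person supplies examples, and the algorithm supplies the rule.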

[Figure: Conventional Software vs Machine Learning]

Why is Machine Learning important for the blockchain?

Machine learning works especially well with structured data, and it has produced impressive breakthroughs in many fields. Fields such as finance, medicine, and marketing each branch into many subfields and, more importantly, similar algorithms can be applied across very different fields.

In each field, machine learning is enabled by huge datasets, which makes the data itself highly valuable. The more data we have, the more we can expect machine learning models to improve.

Blockchains are, in a sense, a huge set of structured data with transactions, blocks, events, smart contracts and user interactions. Can smart contracts leverage machine learning to create a new set of primitives and breakthroughs the same way we have seen in other fields? My answer is yes.

Zero-Knowledge Machine Learning and what it means for blockchains

A zero-knowledge (ZK) proof is a cryptographic method where one entity, known as the prover, can demonstrate to another entity, referred to as the verifier, the validity of a certain statement, without disclosing any extra information apart from the truthfulness of that statement.

In the context of blockchains, we can use a ZK proof to validate the correctness of a computation that happened outside of the blockchain. This primitive, when combined with machine learning, lets smart contracts use model outputs (also known as inferences) to create business impact. This is what we call Zero-Knowledge Machine Learning (ZKML).
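To give a feel for the prover and verifier roles, here is a toy sketch of a classic zero-knowledge proof: a Schnorr proof of knowledge of a discrete logarithm, made non-interactive with a hash-based challenge. It is not how ZKML frameworks prove model inference, which involves far richer statements, and the group parameters are deliberately tiny, so the secret could be brute-forced. All names (`P`, `Q`, `G`, `prove`, `verify`) are illustrative assumptions.

```python
import hashlib
import secrets

# Toy group: G has prime order Q modulo P. Far too small for real use.
P, Q, G = 23, 11, 2


def challenge(y: int, t: int) -> int:
    """Derive the challenge from the public values (Fiat-Shamir style)."""
    digest = hashlib.sha256(f"{G}|{y}|{t}".encode()).digest()
    return int.from_bytes(digest, "big") % Q


def prove(x: int) -> tuple[int, int, int]:
    """Prover: show knowledge of x with y = G**x mod P, without sending x."""
    y = pow(G, x, P)            # public statement
    r = secrets.randbelow(Q)    # one-time random nonce
    t = pow(G, r, P)            # commitment
    c = challenge(y, t)
    s = (r + c * x) % Q         # response
    return y, t, s


def verify(y: int, t: int, s: int) -> bool:
    """Verifier: check the proof using public values only."""
    c = challenge(y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P


y, t, s = prove(7)       # 7 is the prover's secret
is_valid = verify(y, t, s)
```

The verifier only ever sees `y`, `t`, and `s`. In a ZKML setting the statement being proven is instead along the lines of "this output was produced by running this model on this input", and the verification step is what a smart contract would run.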

[Figure: Zero-Knowledge Machine Learning]

Is machine learning the right fit for your protocol?

There is so much hype and potential around machine learning that everyone now wants to use it, but that doesn't mean it's always a good idea. You face two main risks:

  • Forcing machine learning into a solution where it can't add value in an effort to be relevant.
  • Failing to leverage machine learning in a solution where it would add value.

While every protocol is unique, there are good rules of thumb for identifying whether machine learning is a good fit.

There are so many problems in the world worth working on and so many discoveries to make, you have to make a choice. My preference is to focus my efforts on solving problems that will help people.
— Andrew Ng

Machine learning is a good fit for your protocol if...

You've got tons of clean and well-structured data that you rely on to make decisions. This reliance on data usage shows that a machine learning solution could be useful. For example, you check gas usage to decide distribution in an aggregator.

You use hand-crafted rules or heuristics to solve your problem. If you're able to tackle the problem to some extent with these heuristics, then machine learning can assist in uncovering more intricate and precise rules based on current examples. For example, providing discounts to users on loans or token swap fees based on their transaction history.

Machine learning is probably a bad fit for your protocol if...

You depend on human effort for tasks that appear repetitive. However, these tasks often aren't as repetitive as they might seem and can involve nuanced actions. For instance, if your team handles writing sales emails or managing support cases, a machine learning solution could boost their efficiency. But it's important to note that it wouldn't fully replace them, as many aspects of their work require empathy and creativity, elements that machine learning can't replicate.

You anticipate accumulating a substantial amount of data. Often, teams introduce machine learning early on, foreseeing that their application will handle large data volumes upon becoming successful. However, in such scenarios, it's usually more advantageous to concentrate initially on collecting and structuring the data, extracting insights manually, and considering the integration of machine learning at a later phase.

Remember, these are simply suggestions. Almost every business can discover applications for machine learning. The real considerations are the timing of its implementation and the expected return on investment (ROI). For smaller companies or those with limited data, manual data analysis might yield a higher ROI. However, for others, machine learning could be a significant value-add.

When it comes to protocols and blockchains, the challenge often isn't the availability of data, but rather finding it and compiling a dataset where machine learning can add significant value. Moreover, this data is often dispersed across various APIs or isn't readily accessible.

Building ZKML solutions is about production not experimentation

You might be inclined to spend a couple of weeks whipping up a PoC (proof of concept): a model that processes your protocol's data to generate some potentially valuable predictions. With countless tutorials from conventional use cases showing how to do this with a few lines of code and standard libraries, it's easy to believe that rolling out your model into a live environment will be straightforward, or that a very similar approach could be applied to protocols. In reality, it's quite challenging. While it's a common misconception that creating the initial model is 80% of the work in a machine learning solution, those with experience in developing and deploying such solutions know that the PoC is really only about 10% of the entire process. If we extrapolate to blockchains and protocols, the number could be even lower, as there are many more challenges than in conventional companies.

Machine learning solutions can generally be categorized into three core elements:

  1. Data: This is essential for identifying existing patterns and constructing the model.
  2. Code: This aspect involves defining and operating the model, as well as its integration with other services.
  3. Model: This component is crucial for producing predictions.

When you compare a proof of concept (PoC) with a production-level solution, you'll find that each of these elements exhibits distinct differences in their implementation and complexity.

In production, systems become more complex and have their own requirements. For instance, integrating ZKML solutions in a blockchain demands ongoing monitoring to ensure the predictions meet quality standards and to detect potential model decay. This decay can occur as blocks are processed and inferences are consumed over time. It's also crucial to consider issues like model bias and data drift, especially in rapidly changing environments.

If you want to dive deeper into this topic, I recommend reading this paper about hidden technical debt in machine learning systems.

Hidden Technical Debt of Machine Learning Systems
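As one illustration of the monitoring mentioned above, here is a minimal sketch of a data drift check. It measures how far the mean of a recent window of a feature has moved from a reference window, in units of the reference standard deviation. The feature (gas usage), the sample values, and the threshold of 3.0 are assumptions chosen for the example. Production systems typically use more robust statistics.

```python
from statistics import mean, stdev


def drift_score(reference: list[float], recent: list[float]) -> float:
    """Shift of the recent mean, measured in reference standard deviations."""
    return abs(mean(recent) - mean(reference)) / stdev(reference)


def has_drifted(reference: list[float], recent: list[float],
                threshold: float = 3.0) -> bool:
    """Flag drift when the shift exceeds the (assumed) threshold."""
    return drift_score(reference, recent) > threshold


# Made-up gas usage values observed when the model was trained.
reference_gas = [21.0, 23.0, 22.0, 24.0, 20.0, 22.0]

# Two made-up windows observed later in production.
stable_window = [22.0, 21.0, 23.0]
shifted_window = [40.0, 42.0, 41.0]

stable_alert = has_drifted(reference_gas, stable_window)
shifted_alert = has_drifted(reference_gas, shifted_window)
```

A check like this would run on a schedule, for example every fixed number of blocks, and trigger a retraining or review step when it fires.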

ZKML Challenges in production

Turning to ZKML-specific problems, the technical debt can be even higher. Most available ZKML frameworks work by transpiling models into the ZK backend of choice. As a result, traditional model registries for keeping track of models are no longer enough: you also need to manage versions across the original models and their transpiled counterparts.
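One way to picture this extra bookkeeping is a registry entry that ties each model version to both artifacts by content hash. The sketch below is a hypothetical in-memory design, not the API of any existing registry or ZKML framework. Every class, field, and backend label is an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(artifact: bytes) -> str:
    """Content hash that identifies an artifact regardless of its file name."""
    return hashlib.sha256(artifact).hexdigest()


@dataclass(frozen=True)
class RegistryEntry:
    name: str
    version: str
    original_hash: str    # hash of the trained model file
    transpiled_hash: str  # hash of the artifact produced for the ZK backend
    backend: str          # free-form label for the ZK backend


class ModelRegistry:
    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], RegistryEntry] = {}

    def register(self, name: str, version: str, original: bytes,
                 transpiled: bytes, backend: str) -> RegistryEntry:
        key = (name, version)
        if key in self._entries:
            raise ValueError(f"{name} {version} is already registered")
        entry = RegistryEntry(name, version, fingerprint(original),
                              fingerprint(transpiled), backend)
        self._entries[key] = entry
        return entry

    def is_consistent(self, name: str, version: str,
                      original: bytes, transpiled: bytes) -> bool:
        """True only if both artifacts match what was registered together."""
        entry = self._entries.get((name, version))
        return (entry is not None
                and entry.original_hash == fingerprint(original)
                and entry.transpiled_hash == fingerprint(transpiled))


registry = ModelRegistry()
entry = registry.register("fee-model", "1.0.0", b"original-model-bytes",
                          b"transpiled-model-bytes", "example-backend")
```

Before proving, a pipeline could call `is_consistent` to confirm that the transpiled artifact it is about to use really corresponds to the approved original model.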

You also need to deal with quality issues caused by the quantization and transpilation process that is usually required, as well as with an extended inference pipeline that now includes proving and verification. Depending on the latency your solution can tolerate, measured in blocks of delay between data observation and inference verification, ZKML might not be the solution you need or want. This topic will be explained further in a new blog entry, so be sure to subscribe to stay up to date.
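The quality impact of quantization can be seen in a small example. ZK backends generally compute over integers rather than floating-point numbers, so model parameters are commonly converted to a fixed-point representation. The sketch below compares a tiny linear model in floating point against the same model with 8 fractional bits. The scale, the weights, and the inputs are made-up values, and real frameworks use their own quantization schemes.

```python
SCALE = 2 ** 8  # fixed-point scale: 8 fractional bits (an illustrative choice)


def quantize(value: float) -> int:
    """Convert a float to its nearest fixed-point integer."""
    return round(value * SCALE)


def float_inference(weights: list[float], bias: float,
                    features: list[float]) -> float:
    return sum(w * x for w, x in zip(weights, features)) + bias


def fixed_point_inference(weights: list[float], bias: float,
                          features: list[float]) -> float:
    q_weights = [quantize(w) for w in weights]
    q_features = [quantize(x) for x in features]
    # Each product carries a factor of SCALE**2, so the bias is scaled to match.
    accumulator = sum(w * x for w, x in zip(q_weights, q_features))
    accumulator += quantize(bias) * SCALE
    return accumulator / SCALE ** 2


# Made-up model parameters and input.
weights = [0.4137, -1.2951, 0.0873]
bias = 0.25
features = [1.37, 0.52, 3.14]

exact = float_inference(weights, bias, features)
approx = fixed_point_inference(weights, bias, features)
error = abs(exact - approx)
```

The error here is small, but it compounds across the layers of a deep model, which is why the transpiled model's accuracy should be evaluated separately from the original's.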

Common challenges in ZKML and why you might still fail

The following challenges could potentially put the success of your solution at risk if you don't act accordingly:

  1. Lack of Trust in Your Model: Accuracy isn't the only measure of success. If your team and customers don't understand how your model arrives at its conclusions, they may be hesitant to trust it. It's important to grasp the need for explainability, especially if we plan to automate processes on the blockchain that could affect users' funds.
  2. Data Quality Issues: Unlike the AI in science fiction, real-life AI requires data that's clean and well-formatted, much like what you'd find in an Excel spreadsheet. If your data is messy or disorganized, your AI won't perform as expected. Having your protocol's transaction data or basic information is not enough to generate quality datasets.
  3. Unsustainable Solutions: A model that works today might not work tomorrow if it's not properly maintained. Future-proofing your solution with a solid update and maintenance pipeline is crucial.
  4. Over-Complexity: There's always the risk of building a complex model when a simpler solution exists. It's wise to establish a basic, sensible benchmark before diving into more elaborate machine learning models.
  5. Changes in External Conditions: Machine learning models are only as good as the data they're trained on. Sudden changes in external conditions, such as a global crisis, protocol hacks, or exploits, can render a model obsolete if it can't adapt to new data patterns.
  6. Inherent Biases: If the data used to train your machine learning model contains biases, these will be reflected in its performance. It's imperative to identify and address these biases early on to avoid larger issues later.
  7. Underestimating Complexity: Building ZKML solutions is not just about having fast provers, building APIs, or running decentralized compute networks. On their own, these will just make you fail faster in the endeavour of building machine learning solutions. Having a strong foundation and the right tools to guide you is what enables reliable ZKML solutions.

Embarking on a machine learning project can be overwhelming, especially in the context of blockchains and protocols now that adoption is starting to flourish, but understanding these pitfalls can prepare you for the journey. If you need more guidance, don't hesitate to reach out to me at fran@gizatech.xyz and explore how we can help you at Giza.
