Author: Nickqiao & Misty, Geek Web3
In April of this year, Vitalik visited the Hong Kong Blockchain Summit and delivered a speech titled "Reaching the Limits of Protocol Design," in which he once again mentioned the potential of ZK-SNARKs in the Ethereum Danksharding roadmap and looked forward to the huge help of ASIC chips for ZK acceleration.
Previously, Zhang Ye, co-founder of Scroll, also pointed out that the application space of ZK in traditional fields may be larger than that in Web3. There is a huge demand for ZK in trusted computing, databases, verifiable hardware, content anti-counterfeiting, and zkML. If real-time generation of ZK proofs can be implemented, both Web3 and traditional industries are expected to usher in a paradigm-level change. However, from the perspective of efficiency and economic cost, it is still a long way off to achieve large-scale adoption of ZK.
In fact, as early as 2022, top venture capital firms a16z and Paradigm publicly released reports, explicitly expressing their emphasis on ZK hardware acceleration. Paradigm even asserted that the future income of ZK miners may be comparable to that of Bitcoin or Ethereum miners, and hardware acceleration solutions based on GPU, FPGA, and ASIC will have huge market space. Subsequently, with the popularity of mainstream ZK Rollups such as Scroll and Starknet, hardware acceleration has become a hot concept in the market, and this enthusiasm has become even stronger with the imminent launch of projects like Cysic.
We have reason to believe that, given the huge demand for ZK, the SaaS model of ZK mining pools and real-time ZKP generation can open up a brand new industrial chain. In this high-potential new territory, hardware manufacturers with strong backing and a first-mover advantage in ZK are fully capable of becoming the next Bitmain and dominating the hardware acceleration market.
In the field of hardware acceleration, Cysic may be one of the most anticipated contenders. The team has won important awards at ZPrize, the well-known ZKP technology competition, and has served as a ZPrize mentor since 2023. Its roadmap includes business-facing (ToB) ZK mining pools and consumer-facing (ToC) ZK DePIN hardware, and it has attracted top VCs such as Polychain, ABCDE, OKX Ventures, and Hashkey, raising nearly $20 million in total.
With the Cysic testnet set to launch at the end of July and its ZK mining pool about to open, discussions about Cysic in various communities are becoming increasingly heated. This article explains Cysic's product principles and business model, and gives an accessible introduction to how ZK hardware acceleration works, in order to lower the barrier to understanding for more readers.
Understanding ZK Proof Systems from Workflow
ZK proof systems are actually very complex, but if you want to have a simple understanding of their overall structure, you can decompose them from the perspective of functions and workflow. For a system that ZK-izes ordinary computations, the core process can be summarized as follows:
First, we need to interact with the ZK system through the front end to submit the content to be proved. The front end will format this content to facilitate processing by the ZK proof system. Then, the system will generate a ZK proof through a specific proof system or framework (such as Halo2 or Plonk). This process can be further divided into the following steps:
- Problem setting: First, we need to determine what content needs to be proved. For example, the Prover declares that they know/have certain data, "I know a solution N to the equation F(x)=w," but they do not want others to see the value of N.
- Arithmetization and CSP: After the Prover submits the content to be proved, the system will establish a specialized mathematical model/program to express the content to be proved equivalently, and then format it for processing by the proof system. Specifically, the aforementioned declaration "I know a solution N to the equation F(x)=w" will be transformed from the original mathematical equation into the form of logic gate circuits and polynomials.
- Then, the system will select a suitable proof system, such as Halo2 or Plonk, to compile the content generated in the previous steps into a usable ZKP program. The Prover uses this ZKP program to generate a proof, which is then verified by the Verifier.
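To make the arithmetization step more concrete, here is a minimal sketch in Python. It assumes a toy statement, "I know x such that x^3 + x + 5 = 35", flattens it into multiplication and addition gates, and checks each gate constraint against a witness. The field modulus, the gate layout, and the function names are illustrative assumptions, not the format of Halo2, Plonk, or any specific proof system, and nothing here is zero-knowledge: it only shows what "turning an equation into gate constraints" means.

```python
# Toy arithmetization of the claim "I know x such that x**3 + x + 5 == 35".
# The modulus and gate layout are assumptions made for this sketch only.
P = 2**31 - 1  # small prime field; real systems use fields of roughly 255 bits


def build_witness(x):
    """Flatten F(x) = x^3 + x + 5 into one value per wire."""
    v1 = (x * x) % P        # gate 1: v1 = x * x
    v2 = (v1 * x) % P       # gate 2: v2 = v1 * x
    out = (v2 + x + 5) % P  # gate 3: out = v2 + x + 5
    return {"x": x, "v1": v1, "v2": v2, "out": out}


def constraints_hold(w, public_output):
    """Return True only if every gate constraint evaluates to zero mod P."""
    checks = [
        (w["x"] * w["x"] - w["v1"]) % P == 0,
        (w["v1"] * w["x"] - w["v2"]) % P == 0,
        (w["v2"] + w["x"] + 5 - w["out"]) % P == 0,
        (w["out"] - public_output) % P == 0,
    ]
    return all(checks)


witness = build_witness(3)
print(constraints_hold(witness, 35))
```

A real proof system would encode these constraints as polynomials and let the Prover convince the Verifier that a satisfying witness exists without revealing x.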
Systems like zkEVM, which are frequently adopted in Ethereum Layer 2, essentially compile smart contracts into the underlying operation codes of EVM, then convert each operation code into the form of logic gate circuits/polynomial constraints, and further process them by the backend ZK proof system.
It is worth mentioning that the widely used ZKP technology solutions in blockchain currently mainly involve zk-SNARK (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge), and most ZK Rollups utilize the succinctness of SNARK rather than zero-knowledge. Succinctness means that ZKP occupies very little space and can compress a large amount of content into a few hundred bytes, resulting in very low verification costs.
As a result, the workload between the Prover and Verifier is asymmetric. The cost of generating ZKP is very high for the Prover, while the verification cost is very low for the Verifier. By leveraging this asymmetry in the "single Prover, multiple Verifier" scenario, ZK can concentrate the overall cost on the Prover side, greatly reducing the cost for the Verifier. This model is extremely advantageous for decentralized verification, and this is the approach taken by Ethereum Layer 2.
However, this model of shifting verification costs to the ZK generation side is not a silver bullet. For ZK Rollup projects, the high cost of generating ZKP will ultimately be transferred back to user experience and transaction fees, which is not conducive to the long-term development of ZK Rollup.
Despite the significant potential of ZK in scenarios of trustless and decentralized verification, limited by the bottleneck in generation time, whether it is zkEVM, zkVM, ZK Rollup, or ZK bridge, they currently do not have the economic foundation for large-scale adoption.
In response to this, ZK acceleration projects represented by Cysic, Ingonyama, and Irreducible have emerged, each attempting to reduce the generation cost of ZKP from different directions. In the following text, we will briefly introduce the main expenses and acceleration methods of ZKP generation from a technical perspective, and explain why Cysic has enormous potential in the ZK acceleration race.
Computational Expenses: MSM and NTT
Many people know that the time expense for the Prover to generate a ZK proof is very high. In ZK-SNARK protocols, the Verifier often needs only about one second to verify a proof, while generating that proof may take the Prover half a day or even a whole day. To make a computation provable efficiently, it must first be converted from a classical program into a ZK-friendly format.
Currently, there are two methods to achieve this: one is to write circuits using some proof system frameworks, such as Halo2; the other is to use domain-specific languages (DSLs) such as Cairo or Circom to convert computations into intermediate representations, which are then submitted to the proof system. The proof system will generate ZK proofs based on the circuits or intermediate representations compiled from DSLs.
The more complex the operation, the longer it takes to generate the proof. Additionally, certain operations are inherently unfriendly to ZK, and implementing them requires additional work. For example, the SHA or Keccak hash functions are unfriendly to ZKP, and using these functions will result in longer proof generation times. Even operations with low execution costs on classical computers may be unfriendly to ZKP.
Setting aside ZK-unfriendly computational tasks, although the proof generation process may vary depending on the chosen proof system, the bottleneck is essentially similar. In the generation of ZK proofs, there are two computational tasks that consume the most computing resources: Multi-Scalar Multiplication (MSM) and Number Theoretic Transform (NTT). These two tasks can account for 80-95% of the proof generation time, depending on the commitment scheme and specific implementation of ZKP.
MSM mainly deals with multiple scalar multiplication on elliptic curves, while NTT is a Fast Fourier Transform (FFT) on finite fields, used to accelerate polynomial multiplication. Different combinations of schemes will result in different FFT/MSM load ratios.
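As an illustration of how NTT accelerates polynomial multiplication, the following is a small Python sketch. It assumes the NTT-friendly prime 998244353 with primitive root 3, a common choice in teaching examples and not a field used by any particular ZK proof system. Production provers use far larger fields and iterative, hardware-tuned implementations.

```python
# Polynomial multiplication via a recursive radix-2 NTT.
# P and G are assumptions for this sketch: P - 1 = 2**23 * 7 * 17, G is a primitive root.
P = 998244353
G = 3


def ntt(a, invert=False):
    """Evaluate the polynomial a at the n-th roots of unity (n a power of two)."""
    n = len(a)
    if n == 1:
        return a[:]
    root = pow(G, (P - 1) // n, P)
    if invert:
        root = pow(root, P - 2, P)  # use the inverse root for the inverse transform
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        out[k] = (even[k] + t) % P           # butterfly: upper half
        out[k + n // 2] = (even[k] - t) % P  # butterfly: lower half
        w = w * root % P
    return out


def poly_mul(f, g):
    """Multiply two coefficient lists in O(n log n) field operations."""
    size = 1
    while size < len(f) + len(g) - 1:
        size *= 2
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    prod = [x * y % P for x, y in zip(fa, ga)]  # pointwise product
    inv_size = pow(size, P - 2, P)
    coeffs = [c * inv_size % P for c in ntt(prod, invert=True)]
    return coeffs[: len(f) + len(g) - 1]


# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_mul([1, 2, 3], [4, 5]))  # prints [4, 13, 22, 15]
```

The schoolbook method needs a number of multiplications proportional to the product of the two lengths, which is why the transform matters at the polynomial sizes that appear in proof generation.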
Taking STARK as an example, its Polynomial Commitment Scheme (PCS) uses FRI, a hash-based commitment, instead of the elliptic-curve-based commitments used by KZG or IPA, so it eliminates MSM calculations entirely. In the comparison table, schemes listed higher require more FFT calculations, while those listed lower require more MSM calculations.
Optimization Solutions
MSM calculations involve predictable memory access, which allows heavy parallelization, but they require a significant amount of memory. MSM also faces scalability challenges and may remain slow even when parallelized. So while MSM can be accelerated in hardware, doing so demands substantial memory and parallel computing resources.
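The trade-off between parallelism and memory can be seen in the bucket (Pippenger-style) method that is commonly used for MSM. The sketch below is only an assumption-laden illustration: it uses the tiny textbook curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1), which has no cryptographic value, and the window size is arbitrary. It is not Cysic's implementation.

```python
# Toy MSM: sum of k_i * P_i on a tiny textbook curve (assumed for illustration).
P, A = 17, 2          # curve y^2 = x^3 + 2x + 2 over F_17
G = (5, 1)            # a point on that curve; None represents the identity


def ec_add(p1, p2):
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 is the inverse of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P, P - 2, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, P - 2, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def ec_mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc


def msm_naive(scalars, points):
    acc = None
    for k, pt in zip(scalars, points):
        acc = ec_add(acc, ec_mul(k, pt))
    return acc


def msm_buckets(scalars, points, window_bits=4, scalar_bits=16):
    """Bucket method: each window is independent work, but needs 2**window_bits buckets."""
    result = None
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    for w in reversed(range(num_windows)):
        for _ in range(window_bits):
            result = ec_add(result, result)  # shift the accumulator by one window
        buckets = [None] * (1 << window_bits)
        for k, pt in zip(scalars, points):
            digit = (k >> (w * window_bits)) & ((1 << window_bits) - 1)
            if digit:
                buckets[digit] = ec_add(buckets[digit], pt)
        running, window_sum = None, None
        for b in reversed(buckets[1:]):      # computes sum of digit * bucket[digit]
            running = ec_add(running, b)
            window_sum = ec_add(window_sum, running)
        result = ec_add(result, window_sum)
    return result


points = [ec_mul(i, G) for i in (1, 2, 3, 4)]
scalars = [5, 12, 300, 65535]
print(msm_buckets(scalars, points) == msm_naive(scalars, points))
```

Each window's bucket accumulation is independent of the others, so the windows can be processed in parallel. The cost is the bucket storage, which grows with the window size and with the number of windows processed at once.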
NTT often involves random memory access, making it unfriendly to hardware, and difficult to handle in distributed infrastructure. This is because the random access nature of NTT would inevitably require accessing data from other nodes in a distributed environment, leading to a significant decrease in performance due to network interactions.
Therefore, accessing stored data and data movement become a major bottleneck, limiting the parallelization capabilities of NTT calculations. Most of the work to accelerate NTT is focused on managing how computation interacts with memory.
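To see why data movement dominates, it can help to list which array positions each butterfly touches. The Python sketch below assumes one common in-place radix-2 layout, in which the distance between paired elements doubles at every stage, and counts how many butterflies would need data from two different nodes if the array were split into equal chunks. The function names and the chunking model are hypothetical simplifications.

```python
# Memory access pattern of an in-place radix-2 NTT (one common layout, assumed here).


def butterfly_pairs(n):
    """Return, per stage, the stride and the index pairs combined by butterflies."""
    stages = []
    half = 1
    while half < n:
        pairs = [
            (start + j, start + j + half)
            for start in range(0, n, 2 * half)
            for j in range(half)
        ]
        stages.append((half, pairs))
        half *= 2
    return stages


def cross_node_pairs(n, chunk):
    """Count butterflies per stage whose two inputs live in different chunks (nodes)."""
    return [
        sum(1 for i, j in pairs if i // chunk != j // chunk)
        for _, pairs in butterfly_pairs(n)
    ]


for stride, pairs in butterfly_pairs(8):
    print(stride, pairs)

# 8 elements split across 4 nodes holding 2 elements each
print(cross_node_pairs(8, 2))  # prints [0, 4, 4]
```

In this toy case only the first stage stays local. Every later stage pairs elements held by different nodes, which is the network interaction described above.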
In fact, the simplest way to solve the efficiency bottlenecks of MSM and NTT is to eliminate these operations altogether. Some newly proposed algorithms, such as Hyperplonk, modify Plonk to eliminate NTT operations. This makes Hyperplonk easier to accelerate, but it introduces new bottlenecks, such as the computationally expensive sumcheck protocol. There is also the STARK algorithm, which does not require MSM, but its FRI protocol introduces a large number of hash calculations.
ZK Hardware Acceleration and Cysic's Ultimate Goal
While optimizations at the software and algorithm levels are very important and valuable, they have clear limitations. In order to fully optimize the efficiency of ZKP generation, hardware acceleration must be used, just like how ASICs and GPUs eventually dominated the BTC and ETH mining markets.
So the question is: what is the best hardware for accelerating ZKP generation? There are currently several types of hardware that can achieve ZK acceleration, such as GPU, FPGA, or ASIC, each with its own advantages and disadvantages.
Let's compare these hardware options:
First, let's illustrate their differences in development with a simple example. For example, if we want to implement a simple parallel multiplication:
- On a GPU, using the APIs provided by the CUDA SDK, we can develop as if writing native code, thus gaining parallel computing capabilities;
- On an FPGA, we need to learn hardware description languages to control hardware-level connections and implement parallel algorithms;
- On an ASIC, the chip design phase directly fixes the arrangement of transistor connections at the hardware level, and no further modifications can be made.
Each of these solutions has its own advantages and disadvantages, suitable for different stages of development in the ZK race. Cysic is committed to becoming the ultimate solution for ZK hardware acceleration, with a phased strategy:
- Develop an SDK based on GPU to provide solutions for ZK applications and integrate all GPU resources across the network;
- Utilize the flexibility and balanced characteristics of FPGA to quickly implement customized ZK hardware acceleration;
- Independently develop ZK DePIN hardware based on ASIC;
- Cysic Network will integrate all computing power of ZK DePIN and GPU as a SaaS platform/mining pool, providing computing power and verification solutions for the entire ZK industry.
Let's now delve into the detailed differences in ZK acceleration solutions and Cysic's development strategy through an analysis of multiple segmented tracks.
ZK Mining Pool and SaaS Platform: Cysic Network
In fact, well-known ZK Rollups such as Scroll and Polygon zkEVM have explicitly proposed the concept of a "decentralized Prover" in their roadmaps, which essentially means building ZK mining pools. This market-oriented approach lightens the burden on ZK Rollup projects and incentivizes miners and mining pool operators to keep optimizing ZK acceleration solutions.
In Cysic's roadmap, there is a clear plan for a ZK mining pool and SaaS platform called Cysic Network. It will not only integrate Cysic's own computing power but also absorb third-party computing resources through mining incentives, including idle GPUs and zk DePIN devices held by ordinary users.
The entire verification workflow is illustrated as follows:
- ZK project teams submit proof generation tasks to agents, whose job is to forward the proof tasks to the verification network. Initially, these agents will be operated by Cysic; later, asset pledging will be introduced to allow anyone to become an agent;
- Provers accept proof tasks and use hardware to generate ZK proofs. Provers need to pledge tokens to take on proof tasks and will receive rewards upon completing them;
- The Verification Committee is responsible for checking the validity of proofs generated by Provers and voting on them. Once a certain number of votes is reached, the proof is considered valid. Verifiers join the committee by pledging tokens, participate in voting, and receive rewards. This process can be combined with EigenLayer's AVS concept, reusing existing Restaking facilities.
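The roles above can be sketched as a small Python simulation. Everything specific in it is a hypothetical placeholder: the class and function names, the minimum stake, the quorum, the reward amounts, and the use of a SHA-256 hash as a stand-in for proof generation and verification. It is not Cysic Network's actual protocol or API, only a way to see how the pieces fit together.

```python
# Hypothetical simulation of the task flow: agent -> prover -> verification committee.
import hashlib
from dataclasses import dataclass


@dataclass
class Prover:
    name: str
    stake: int
    rewards: int = 0

    def prove(self, task: bytes) -> str:
        # Placeholder for real ZK proof generation on GPU/FPGA/ASIC hardware.
        return hashlib.sha256(b"proof:" + task).hexdigest()


@dataclass
class Verifier:
    name: str
    stake: int
    rewards: int = 0

    def vote(self, task: bytes, proof: str) -> bool:
        # Placeholder for real proof verification.
        return proof == hashlib.sha256(b"proof:" + task).hexdigest()


def run_task(task, prover, committee, min_stake=100, quorum=2, reward=10):
    """Agent role: forward the task, collect votes, settle rewards (all values assumed)."""
    if prover.stake < min_stake:
        raise ValueError("prover has not pledged enough tokens")
    proof = prover.prove(task)
    eligible = [v for v in committee if v.stake >= min_stake]
    yes_votes = sum(v.vote(task, proof) for v in eligible)
    accepted = yes_votes >= quorum
    if accepted:
        prover.rewards += reward
        for v in eligible:
            v.rewards += 1
    return accepted


prover = Prover("gpu-farm-1", stake=500)
committee = [Verifier("v1", 200), Verifier("v2", 150), Verifier("v3", 50)]
print(run_task(b"rollup-batch-42", prover, committee))
```

In this run the third verifier is below the assumed minimum stake and does not vote, so the proof is accepted on the two remaining votes.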
The detailed interaction process is as follows:
Note that in the above process, asset pledging, incentive distribution, and the submission of computing tasks all rely on a dedicated platform, which calls for a blockchain as the underlying infrastructure.
Cysic Network has built a dedicated public chain using a unique consensus algorithm called Proof of Compute (PoC). It selects a block producer based on a VRF (verifiable random function) and each Prover's historical performance, such as device availability, number of proof submissions, and proof accuracy. The block producer is responsible for recording information about the various devices and distributing token incentives.
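The source does not spell out how PoC combines randomness with historical performance, so the following Python sketch is a guess at the general shape only. It assumes a weighted lottery in which a hash of a public seed stands in for the VRF output and a made-up scoring formula turns availability, accuracy, and proof count into a weight. A real VRF also yields a proof that the random value was derived correctly, which a plain hash does not.

```python
# Hypothetical sketch of performance-weighted block producer selection.
import hashlib


def performance_score(stats):
    # Assumed weighting of the factors named in the text; not Cysic's formula.
    return (
        0.4 * stats["availability"]
        + 0.4 * stats["accuracy"]
        + 0.2 * min(stats["proofs_submitted"], 100) / 100
    )


def select_block_producer(provers, seed: bytes):
    """Pick one prover with probability proportional to its score."""
    names = sorted(provers)  # fixed order so every node computes the same result
    weights = [performance_score(provers[n]) for n in names]
    total = sum(weights)
    digest = hashlib.sha256(seed).digest()  # stand-in for a VRF output
    r = int.from_bytes(digest, "big") / 2**256 * total
    cumulative = 0.0
    for name, w in zip(names, weights):
        cumulative += w
        if r < cumulative:
            return name
    return names[-1]


provers = {
    "alice": {"availability": 0.99, "accuracy": 1.0, "proofs_submitted": 80},
    "bob": {"availability": 0.60, "accuracy": 0.9, "proofs_submitted": 10},
}
print(select_block_producer(provers, b"block-1000"))
```

Because the selection is a deterministic function of the seed and the recorded statistics, any node can recompute it and check that the announced block producer is the right one.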
In addition to ZK mining pools and SaaS platforms, Cysic has made significant progress in ZK acceleration solutions based on different hardware. Let's understand their achievements in the GPU, FPGA, and ASIC routes.
GPU, FPGA, and ASIC
The core of ZK hardware acceleration is to parallelize key operations as much as possible. From the perspective of hardware functionality, CPUs allocate a large portion of their chip area to provide control functions and various levels of cache in order to achieve maximum flexibility and generality, resulting in weak parallel computing capabilities.
In GPUs, the proportion of chip area used for computation is significantly increased, allowing them to support large-scale parallel processing. GPUs are now very popular, and libraries such as Nvidia CUDA can help developers utilize the parallelism of GPUs without needing to understand the underlying hardware. The CUDA SDK can encapsulate CUDA ZK libraries to accelerate MSM and NTT calculations.
FPGAs consist of arrays of small processing units. Programming them requires specialized hardware description languages, whose designs are then compiled into circuit configurations. FPGAs therefore implement specific algorithms directly in circuits, without going through an instruction set. This degree of customization and flexibility far surpasses that of GPUs.
Currently, FPGAs are priced at approximately one-third of GPUs and can achieve more than ten times the energy efficiency of GPUs. This significant energy efficiency advantage is partly due to the fact that GPUs need to be connected to host devices, which typically consume a lot of power. It can be said that FPGAs can add more computing modules to meet the demands of MSM and NTT without increasing energy consumption. This makes FPGAs particularly suitable for ZK proof scenarios that are computationally intensive and require high data throughput and low response times.
However, the biggest problem with FPGAs is the lack of programming experience among developers. For ZK project parties, organizing a team with expertise in both cryptography and FPGA engineering is extremely difficult.
ASICs completely implement a program using hardware, and once designed, the hardware cannot be changed. Correspondingly, the programs that ASICs can execute cannot be changed and can only be used for specific tasks. The hardware acceleration advantages of FPGA in MSM and NTT also apply to ASICs. Due to their specialized circuit design, ASICs have the highest efficiency and lowest energy consumption among all solutions.
For the current mainstream ZK circuits, Cysic aims to achieve proof times of 1-5 seconds, and only ASICs can achieve this goal.
While these advantages may sound very attractive, ZK technology is rapidly evolving, and the design and production cycle of ASICs typically takes 1-2 years, with costs reaching 10-20 million USD. Therefore, large-scale production should only be invested in once ZK technology is stable enough to avoid producing chips that quickly become obsolete.
Cysic has established a presence across all three fields: GPU, FPGA, and ASIC.
In terms of GPU acceleration solutions, with the emergence of various new ZK proof systems, Cysic has adapted them based on its self-developed CUDA acceleration SDK and has linked tens of thousands of top-level GPU cards in Cysic's GPU computing network through community resource aggregation. The Cysic CUDA SDK accelerates MSM and NTT calculations by 50%-80% or more compared to the latest open-source frameworks.
In the FPGA domain, Cysic has achieved the fastest implementation of MSM, NTT, Poseidon Merkle tree, and other modules globally through its self-developed solution, covering the most critical parts of ZK computation. This solution has been prototyped and verified by several top ZK projects.
Cysic's self-developed SolarMSM can complete 2^30-scale MSM calculations in 0.195 seconds, while SolarNTT can complete 2^30-scale NTT calculations in 0.218 seconds, making it the highest-performing FPGA hardware acceleration result among all publicly available results.
In the ASIC field, although large-scale application of ZK ASICs is still a certain distance away, Cysic has already laid out this track in advance and launched its self-developed ZK DePIN chips and devices.
To attract C-end users and meet the performance and cost requirements of different ZK project parties, Cysic will launch two ZK hardware products: ZK Air and ZK Pro.
ZK Air is similar in size to a power bank or a laptop power supply. Ordinary users can connect it directly to a laptop, iPad, or even a smartphone via a Type-C interface to provide computing power for specific ZK projects and receive rewards. ZK Air's computing power currently surpasses that of consumer-grade graphics cards, and it can accelerate small-scale ZK proof generation tasks.
ZK Pro is similar to a traditional mining machine. Its performance is comparable to a GPU server interconnecting multiple top consumer-grade graphics cards, and it significantly accelerates ZK proof generation, making it suitable for large-scale ZK projects such as ZK Rollups and zkML (zero-knowledge machine learning).
Through these two devices, Cysic will ultimately build a stable and reliable ZK-DePIN network. These two devices are still under development and are expected to be launched in 2025.
In addition, through Cysic Network, C-end users can join the ZK hardware acceleration market with very low barriers to entry. Coupled with the significant demand for computing power from ZK project parties, this may lead to another wave of enthusiasm similar to Bitcoin mining, and the market size of the ZK computing field may experience explosive growth once again.
References:
https://medium.com/amber-group/need-for-speed-zero-knowledge-1e29d4a82fcd
https://figmentcapital.medium.com/accelerating-zero-knowledge-proofs-cfc806de611b