Conducting research on parallel execution blockchain systems.

The performance comparison between Sei, Aptos, Sui, and Crystality/PREDA highlights the continuous evolution of parallelization in the blockchain field.

Author: PREDA
Source: ChainFeeds

Designing a parallel execution model is complex in both traditional databases and blockchains, because the design must weigh multiple dimensions, and the choice made in each one has a profound impact on the overall performance and scalability of the system. This article examines the most representative parallel architectures used in blockchain execution layers and presents in detail the experimental results we obtained on their performance and scalability.

The blockchain field has always pursued high performance and high scalability. Even after the emergence of multi-chain and Layer 2 systems, the execution capability of each smart contract remained limited by the capacity of a single virtual machine (VM). Parallel VMs remove this limitation: they allow the transactions of a single smart contract to be executed simultaneously on multiple EVMs/VMs, using more CPU cores to improve performance. We believe that among the many high-performance blockchain systems that support parallel VMs, Sei (V2), Aptos, Sui, Crystality, and PREDA are the most representative, and each has its own design advantages. We begin with the first set of experimental results. The following figure shows the absolute transactions per second (TPS) of Sei, Aptos, Sui, Crystality, and PREDA when executing the same ERC20 smart contract on a 128-core machine. In this set of results, the PREDA model shows a significant advantage in TPS and scalability over the other four parallel execution systems. Detailed analysis and further experimental data follow in the sections below.

Below, we explain the methods used in our experiments. We first compared the throughput (TPS) of the five systems, using the same transaction volume on every chain. Because the systems use different programming languages and underlying virtual machines, a throughput comparison alone cannot fully explain their relative strengths and weaknesses. We therefore also compared the relative acceleration, that is, the speedup ratio: the throughput achieved when executing a given number of transactions on multiple VMs relative to executing the same transactions on a single VM. In Sui, Aptos, Crystality, and PREDA, each thread was allocated a dedicated CPU core. For all detailed experimental data, including absolute TPS values and speedup ratios, please refer to the complete experimental report. The table below shows the data sources, implementation processes, and evaluation methods used in the experiments.
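
The speedup ratio defined above is a simple quotient, and a small sketch may make the definition concrete. The function name `speedup_ratio` is hypothetical, and the TPS figures are placeholders chosen for illustration, not values from the experimental report.

```python
def speedup_ratio(tps_single_vm: float, tps_multi_vm: float) -> float:
    """Throughput on multiple VMs divided by throughput on one VM, same workload."""
    if tps_single_vm <= 0:
        raise ValueError("single-VM TPS must be positive")
    return tps_multi_vm / tps_single_vm


# Placeholder numbers for illustration only (NOT experimental data).
example = speedup_ratio(tps_single_vm=1_000.0, tps_multi_vm=48_000.0)
print(f"speedup: {example:.1f}x")
```

Because the ratio is taken within one system, it factors out differences in language and VM speed, which is why it complements the absolute TPS comparison.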

Overview of parallel execution models

Aptos and Sui both originated from Diem, the discontinued blockchain project announced by Meta (formerly Facebook). Both projects were started by former Meta engineers: Aptos was co-founded by Avery Ching, and Sui was co-founded by Sam Blackshear. However, the two projects follow different technical paths. Aptos strictly adheres to the original Move programming language developed for Diem, while Sui has made extensive modifications to Move. Next, we explore the differences between the parallelization models of Aptos and Sui, analyze how their approaches affect performance, and highlight their respective advantages.

Aptos: High-performance Layer 1 with optimistic parallelization

Aptos is a Layer 1 that executes smart contracts in parallel through an optimistic parallelization mechanism, thereby improving performance. In optimistic parallelization, transactions are initially assumed to have no state conflicts and are executed in parallel. After execution, the system checks for conflicts and resolves them, either by rolling back and executing serially or by re-executing the conflicting transactions under a different schedule. This speculative approach assumes that most transactions will not conflict, which maximizes the benefit of parallel execution while still providing a fallback for handling conflicts.

Advantages of optimistic parallelization:
(1) No need to modify programs: it is easy to adopt because existing code does not have to change.
(2) Efficiency when the conflict rate is low to moderate: many transactions proceed concurrently and conflicts are resolved only when they occur, which maximizes throughput. In many real-world scenarios, conflicts are relatively rare.

Aptos uses the Move programming language for smart contract development and employs the Aptos Move virtual machine in its system implementation.
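
The execute-then-validate idea can be sketched in a few lines. This is a deliberately simplified illustration of optimistic concurrency control, not Aptos's actual scheduler: all names (`RecordingView`, `transfer`, `execute_optimistically`) are hypothetical, state is a plain dictionary of balances, and the speculative phase runs in a loop where a real engine would use multiple threads.

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]


class RecordingView:
    """View over a base state that records the keys a transaction reads and writes."""

    def __init__(self, base: State) -> None:
        self.base = base
        self.reads: Set[str] = set()
        self.writes: State = {}

    def get(self, key: str) -> int:
        self.reads.add(key)
        return self.writes.get(key, self.base.get(key, 0))

    def put(self, key: str, value: int) -> None:
        self.writes[key] = value


Txn = Callable[[RecordingView], None]


def transfer(sender: str, recipient: str, amount: int) -> Txn:
    def run(view: RecordingView) -> None:
        balance = view.get(sender)
        if balance >= amount:
            view.put(sender, balance - amount)
            view.put(recipient, view.get(recipient) + amount)
    return run


def execute_optimistically(state: State, block: List[Txn]) -> Tuple[State, int]:
    """Return the final state and the number of re-executions that were needed."""
    snapshot = dict(state)
    # Phase 1: speculative execution. Every transaction sees only the pre-block
    # snapshot, so these runs are independent and could proceed in parallel.
    views = []
    for txn in block:
        view = RecordingView(snapshot)
        txn(view)
        views.append(view)
    # Phase 2: validate and commit in block order.
    committed = dict(state)
    dirty: Set[str] = set()  # keys written by transactions committed so far
    reexecutions = 0
    for txn, view in zip(block, views):
        if view.reads & dirty:
            # The speculative run read a stale value: redo it on committed state.
            view = RecordingView(committed)
            txn(view)
            reexecutions += 1
        committed.update(view.writes)
        dirty.update(view.writes)
    return committed, reexecutions
```

With two transfers from the same sender in one block, the second one fails validation and is re-executed, which is the cost that grows when conflicts are frequent.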

Sui: High-performance Layer 1 with pessimistic parallelization

Sui adopts a pessimistic parallelization strategy. In pessimistic parallelization, the system checks before execution whether transactions may contend for resources. Programmers must specify the resources (i.e., the state) that each transaction needs to access, and the system pre-checks each received transaction to detect potential conflicts. Only transactions that do not contend for resources with transactions currently being executed are sent to the execution engine for parallel execution.

Advantages of pessimistic parallelization:
(1) Avoiding rollbacks: identifying and avoiding conflicts before execution minimizes the need for rollbacks and re-execution, which makes performance more predictable.
(2) Efficiency in high-conflict scenarios: the approach is highly effective under heavy contention, because only non-conflicting transactions are executed in parallel, reducing the overhead of conflict resolution.

Sui also uses the Move programming language, but with its own Sui Move extension, and employs the Sui Move virtual machine in its system implementation.
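
The pre-check described above can be illustrated with a toy scheduler. It is only a sketch under stated assumptions, not Sui's transaction manager: the names `DeclaredTxn` and `schedule_rounds` are hypothetical, every transaction declares the full set of state it touches, and execution is modeled as discrete rounds in which all admitted transactions run in parallel.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class DeclaredTxn:
    name: str
    touches: FrozenSet[str]  # state the transaction declares before execution


def schedule_rounds(pending: List[DeclaredTxn]) -> List[List[str]]:
    """Group transactions into rounds whose members share no declared state."""
    rounds: List[List[str]] = []
    queue = list(pending)
    while queue:
        locked: Set[str] = set()   # state held by transactions admitted this round
        blocked: Set[str] = set()  # state claimed by earlier, deferred transactions
        running: List[str] = []
        deferred: List[DeclaredTxn] = []
        for txn in queue:
            if txn.touches & locked or txn.touches & blocked:
                # Conflict detected up front: wait instead of executing and rolling back.
                deferred.append(txn)
                blocked |= txn.touches
            else:
                running.append(txn.name)
                locked |= txn.touches
        rounds.append(running)
        queue = deferred
    return rounds
```

No transaction is ever rolled back here, but each one pays for the conflict check and possible waiting, which is the overhead the later sections measure as task management time.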

Sei: Optimistic parallelization compatible with Solidity and EVM

When Sei first launched as a public chain, it was positioned as a transactional application chain built on the Cosmos SDK; it has since been upgraded to the first parallelized EVM chain. For parallel execution, Sei adopts an approach similar to the Aptos model, which we refer to as optimistic parallelization. What distinguishes Sei (V2) is that it uses the Solidity programming language and the standard Ethereum Virtual Machine (EVM), ensuring compatibility with both.

Crystality and PREDA: Parallel relay execution architecture

Both Crystality and PREDA support a parallel relay execution distributed architecture. PREDA is designed specifically for parallelizing general-purpose smart contracts in multi-EVM blockchain architectures, and Crystality is a programming language for parallel EVM/GPU that is based on the PREDA model. From a system perspective, PREDA makes it possible for the first time in the blockchain field to fully parallelize contract functions, maximizing the concurrency of a set of transactions. This keeps all EVM instances efficiently utilized and thereby achieves optimal performance and scalability for a given hardware configuration.

Unlike the sequential execution of Solidity and Move and their Shared Everything architecture, the PREDA model adopts a Shared Nothing architecture that breaks state dependencies in parallel execution: different EVM instances never access the same contract state, which almost completely avoids write conflicts. In PREDA, contract functions are decomposed into multiple ordered steps, each of which depends on a parallelizable, conflict-free part of the state. A transaction initiated by a user is first sent to the EVM that holds the state of the user's address. During execution, the execution flow can move between EVMs by issuing relay transactions: the data remains immutable, and it is the execution flow that moves, following data dependencies.
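
A Shared Nothing layout means each piece of state has exactly one owning EVM. The sketch below shows one way such an assignment could look. The hash-modulo mapping and the names `owner_evm` and `partition_state` are assumptions made for illustration; the article does not specify how PREDA maps addresses to EVMs.

```python
import hashlib
from typing import Dict, List


def owner_evm(address: str, num_evms: int) -> int:
    """Deterministically pick the EVM that owns an address (assumed hash-mod rule)."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_evms


def partition_state(balances: Dict[str, int], num_evms: int) -> List[Dict[str, int]]:
    """Split contract state into disjoint shards, one per EVM."""
    shards: List[Dict[str, int]] = [{} for _ in range(num_evms)]
    for address, balance in balances.items():
        shards[owner_evm(address, num_evms)][address] = balance
    return shards
```

Since every address lands in exactly one shard, two EVMs can never write the same balance, so no locking or rollback is needed between them.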

Experimental data for the five representative contracts

In our evaluation, we tested five widely used smart contracts (ETH TokenTransfer, Voting, AirDrop, CryptoKitties, and MillionPixel), together with the MyToken (ERC20) contract, on Sei, Aptos, Sui, Crystality, and PREDA.

We conducted detailed experiments to compare the performance of the parallel execution systems, focusing on transactions per second (TPS) and on the speedup ratio, which measures for each system the relative performance improvement of running on multiple virtual machines compared with a single virtual machine. For all detailed experimental data, including absolute TPS values and speedup ratios, please refer to the complete experimental report.

ETH TokenTransfer Contract: This experiment replayed actual historical ETH transactions on a contract similar to the standard ERC20 smart contract.

Voting Contract: The Voting contract is a good example of how the PREDA model simplifies parallel voting algorithms. It uses the data splitting, relay, and execution mechanisms of Crystality and PREDA, and it outperforms both the optimistic (Aptos) and pessimistic (Sui) parallelization methods in absolute TPS and in speedup ratio. The algorithm, originally sequential in Solidity, is restructured so that votes are cast in parallel across virtual machines and the results are aggregated from temporary arrays.
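
The "vote in parallel, then aggregate temporary results" pattern can be sketched as follows. This is not the Crystality contract: the names are hypothetical, the voter-to-VM mapping is an arbitrary stand-in, and a thread pool stands in for the parallel virtual machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Vote = Tuple[str, str]  # (voter address, proposal name)


def vm_for_voter(voter: str, num_vms: int) -> int:
    """Hypothetical mapping from a voter address to the VM that handles it."""
    return sum(voter.encode("utf-8")) % num_vms


def tally_shard(shard: List[Vote]) -> Counter:
    """Temporary per-VM result: counts only the votes routed to this VM."""
    return Counter(proposal for _, proposal in shard)


def parallel_vote_count(votes: List[Vote], num_vms: int) -> Counter:
    shards: List[List[Vote]] = [[] for _ in range(num_vms)]
    for vote in votes:
        shards[vm_for_voter(vote[0], num_vms)].append(vote)
    # Each shard is tallied independently; no shard touches another's counter.
    with ThreadPoolExecutor(max_workers=num_vms) as pool:
        partial_tallies = list(pool.map(tally_shard, shards))
    # Aggregation step: merge the temporary per-VM results into the final tally.
    final = Counter()
    for partial in partial_tallies:
        final.update(partial)
    return final
```

A single shared `proposal` counter would force every vote to serialize on one piece of state; the per-VM temporaries remove that contention at the price of one final merge.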

AirDrop: This contract triggers multiple token or NFT transfers from one address to many addresses, following a one-to-many state change pattern. In this case, two such transactions cannot be executed in parallel in Sei, Aptos, or Sui; only the finer-grained parallelism of the PREDA model allows them to be processed in a pipelined fashion.

CryptoKitties: This contract is a popular game contract on Ethereum in which offspring cats are bred based on the genes of their parents. Unlike the previous contracts, it requires access to multiple address states, including the "father cat," the "mother cat," and the "newborn cat," and it involves more complex computation when deriving the genes of the newborn cat from those of its parents.

MillionPixel: In this game contract on Ethereum, users compete to mark coordinates on a map. This smart contract demonstrates the flexibility of the PREDA model. In addition to partitioning contract states by address, programmers can customize partition keys, such as switching from address type to uint32 type in this case.

To facilitate the understanding of the extensive data mentioned above, we will focus on analyzing two particularly representative contracts.

ETH Token Transfer Contract: When replaying historical ETH transaction data, the absolute throughput and the speedup ratios of the five systems all decreased compared with the ERC20 experiment. This is because addresses repeat in historical transactions, which leads to state contention (read-write or write-write conflicts) and hinders the concurrent execution of these transactions on parallel EVMs.

Voting Contract: The Sei contract can only be executed sequentially, with no speedup when running multiple EVMs. Similar results would be observed for other systems if the algorithm is not transformed into a parallel algorithm. For the parallel implementation of Aptos and Sui, multiple resources must be initialized at different addresses for the temporary results of the "proposal" variable. Additionally, the parallel implementation must provide manual scheduling based on the addresses of voters to guide voters' transactions to different virtual machines and access temporary results for parallel execution.

Insights from the experimental results: Comparing the optimistic and pessimistic approaches, Aptos and Sui each perform best in different scenarios. In the ERC20 transfer case, Aptos outperforms Sui because conflicts are very rare when every transaction uses randomly generated addresses. Conversely, in the ETH test case, Sui outperforms Aptos because replaying historical ETH transactions produces a large number of conflicts.

Performance analysis of Aptos: The table shows the profiling data for Aptos when running these two tests, which use the same smart contract but differ in their transaction data (randomly generated versus historical). Because profiling is time-consuming, the maximum number of parallel virtual machines used for testing was limited to 64. Transaction execution in Aptos consists of two steps: execution and verification. The test data shows that a large number of transaction executions are marked as "SUSPEND", and these transactions take a long time to execute. "SUSPEND" means that the execution of the transaction is paused until its state dependencies are resolved. For random transactions on 64 virtual machines, the total numbers of executions and verifications are 102,219 and 139,426, respectively. For historical transactions, these numbers increase to 186,948 and 667,148, and the number of suspended transactions increases from 66 to 46,913. Therefore, when a large number of state conflicts occur during transaction execution, rollback becomes a heavy burden for optimistic parallelization.
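
To put the counts above in proportion, the short calculation below divides each historical-workload figure by its random-workload counterpart. The input numbers are the ones quoted in the paragraph; the dictionary layout and variable names are only illustrative.

```python
# Counts quoted above for 64 virtual machines.
random_workload = {"executions": 102_219, "verifications": 139_426, "suspended": 66}
historical_workload = {"executions": 186_948, "verifications": 667_148, "suspended": 46_913}

# Growth factor of each counter when moving from random to historical transactions.
growth = {
    name: historical_workload[name] / random_workload[name]
    for name in random_workload
}

for name, factor in growth.items():
    print(f"{name}: {factor:.2f}x")
```

Executions grow by a little under 2x and verifications by a little under 5x, while suspensions grow by roughly 700x, which is where the extra cost of the historical workload concentrates.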

Time analysis of Sui execution: The following charts show the time breakdown for Sui in the ETH Token Transfer contract test and the Voting contract test. Sui's parallel execution engine has three main steps: (1) Queue time: the waiting time before a transaction is selected by the transaction manager; (2) Task management time: the time from when a transaction is placed in Sui's Executing Txns hash map or Pending Txns hash map until it is received by Sui's Execution Driver; (3) Function execution time: the time taken by the worker thread in the Execution Driver to execute the contract function. Task management time includes locking and waiting. Comparing the two charts, the task management time in the Voting test is clearly much higher than in the ETH Token Transfer test. This is because access to shared objects in the Voting test requires locking and waiting to avoid conflicts, which makes task management time 2 to 4 orders of magnitude higher than function execution time and queue time. In the ETH Token Transfer test, by contrast, task management time is much lower because only Owned Objects are used, which bypasses concurrency control.

Limitations of Aptos and Sui: In summary, Aptos adopts optimistic parallelization, which lets transactions execute in parallel even when conflicts are present. This approach, based on optimistic concurrency control (OCC), is very effective for read-intensive workloads, which are common in databases and big data systems with few write requests. In blockchain systems, however, on-chain execution incurs gas fees, so serving reads on-chain can be costly. In practice, users typically send read-only requests (such as historical transaction or block queries) to off-chain databases like Etherscan and reserve on-chain execution for write requests. In this scenario, OCC systems like Aptos frequently encounter suspended ("SUSPEND") transactions, which reduces the overall performance of the parallel virtual machines.

In contrast, Sui adopts pessimistic parallelization, rigorously verifying the state dependencies between transactions and preventing conflicts during execution through a locking mechanism. This approach, based on pessimistic concurrency control (PCC), is better suited to compute-intensive workloads, where the associated overhead is negligible. For operations with simple logic, however, the overhead of PCC can easily become a performance bottleneck. Many transactions executed on blockchain systems, such as ERC20 token transfers, Move token transfers, or NFT transfers, involve relatively simple operations, and in these cases the overhead of PCC limits the performance of the parallel system.

To address these challenges, PREDA proposes a system that almost completely avoids both the overhead of PCC and the re-execution required by OCC. It achieves almost conflict-free parallel execution by splitting on-chain state efficiently.

Performance of Crystality and PREDA: In all contract tests, Crystality and PREDA perform significantly better than Sei, Aptos, and Sui, with PREDA performing particularly well because it executes in native binary mode rather than WASM. This high performance is attributed to almost conflict-free parallel execution.

PREDA has considered two key aspects from the beginning of its design: (1) defining different contract state scopes for splitting and maintaining state, and (2) enabling the execution flow of a transaction to switch from one virtual machine to another. The core of PREDA is Programmable Contract Scopes, which split the contract state into non-overlapping, parallelizable, fine-grained segments, together with Asynchronous Functional Relay, which describes how the execution flow switches between EVMs. To explain these concepts further: in PREDA, a contract function is decomposed into multiple ordered steps, each of which depends on a single parallelizable, conflict-free state segment. A token transfer, for example, typically has two steps: a withdrawal, which accesses the sender's state and deducts the specified amount of tokens, and a deposit, which accesses the recipient's state and credits the corresponding amount. The latest parallel mechanisms implemented in Sei, Aptos, and Sui attempt to execute all steps of a transaction synchronously. If two transactions read or update the same state, for instance because they share a sender or a recipient, they cannot be executed in parallel. PREDA instead adopts a divisible, asynchronous mechanism in which each transaction is decomposed into steps according to its data access dependencies, so that each step can be executed asynchronously and independently of the others. Access to the same state is strictly serialized in the order fixed by the original transaction block and guaranteed by the consensus algorithm, i.e., the order chosen by the block creator.
For example, Token transfer transactions Txn 0 (transferring tokens from address state A to state B) and Txn 1 (transferring from state A to state C) can sequentially access A twice (for Txn 0 and Txn 1, respectively), and then access B and C in parallel.
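
The Txn 0 / Txn 1 example can be traced with a toy simulation of withdraw and relayed deposit steps. It is a sketch under simplifying assumptions, not PREDA's runtime: the names `Evm` and `run_block` are hypothetical, each EVM processes one step per round, relays are delivered at the end of the round, and failure handling, gas, and cross-EVM ordering of relays are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

Step = Tuple[str, str, int, str]  # (kind, address, amount, recipient)
Transfer = Tuple[str, str, int]   # (sender, recipient, amount)


@dataclass
class Evm:
    balances: Dict[str, int] = field(default_factory=dict)
    inbox: Deque[Step] = field(default_factory=deque)


def run_block(evms: List[Evm], owner: Dict[str, int], transfers: List[Transfer]) -> int:
    """Run transfers as withdraw steps plus relayed deposit steps; return rounds used."""
    # Block order fixes the order in which steps on the same address are queued.
    for sender, recipient, amount in transfers:
        evms[owner[sender]].inbox.append(("withdraw", sender, amount, recipient))
    rounds = 0
    while any(evm.inbox for evm in evms):
        rounds += 1
        relays: List[Tuple[int, Step]] = []
        # Within a round every EVM handles one step and touches only its own state,
        # so the iterations of this loop are independent of one another.
        for evm in evms:
            if not evm.inbox:
                continue
            kind, address, amount, recipient = evm.inbox.popleft()
            if kind == "withdraw":
                if evm.balances.get(address, 0) >= amount:
                    evm.balances[address] = evm.balances.get(address, 0) - amount
                    # Hand the rest of the transaction to the recipient's EVM.
                    relays.append((owner[recipient], ("deposit", recipient, amount, "")))
            else:
                evm.balances[address] = evm.balances.get(address, 0) + amount
        for target, step in relays:
            evms[target].inbox.append(step)
    return rounds
```

With A, B, and C on three different EVMs, the two withdrawals from A are serialized on A's EVM, while the deposit into B overlaps with the second withdrawal from A, giving the pipelined behavior described above.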

Architecture comparison of parallel execution in Aptos, Sei, and PREDA

Despite the significant performance advantages that PREDA and Crystality can bring to blockchain systems, they also have limitations.

Load imbalance between parallel EVMs: Crystality's data splitting and execution flow redirection mechanism may lead to load imbalance among parallel EVMs at runtime. We observed this when replaying historical ETH token transfer transactions with the MyToken contract. To evaluate the load distribution, we counted the number of transactions executed on each EVM, including original and relay transactions, and then calculated the range and standard deviation of these counts. The range of transaction counts on 64 EVMs was comparable to the range on 2 EVMs, indicating hotspots at certain addresses (i.e., historical transactions concentrated on a subset of addresses). Further investigation of the ETH dataset showed that each hotspot address was involved in over 4,000 transactions. To our knowledge, Aptos and Sui cannot parallelize execution in this scenario either. Our test data shows that the standard deviation decreases as the number of EVMs increases, suggesting that adding more EVMs helps alleviate the imbalance. A feasible way to address hotspots is to send or receive tokens through multiple addresses instead of a single one. If the imbalance is instead caused by several non-hotspot addresses mapping to the same virtual machine, existing methods from sharded blockchains, such as data migration, may help.
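
The load metrics mentioned above (per-EVM transaction counts, their range, and their standard deviation) can be computed as in the sketch below. The function names and the sample data are hypothetical, each transfer is assumed to produce exactly one original transaction on the sender's EVM and one relay transaction on the recipient's EVM, and the population standard deviation is used because the article does not say which variant was applied.

```python
import statistics
from typing import Dict, List, Tuple


def per_evm_counts(transfers: List[Tuple[str, str]], owner: Dict[str, int], num_evms: int) -> List[int]:
    """Count original plus relay transactions executed on each EVM."""
    counts = [0] * num_evms
    for sender, recipient in transfers:
        counts[owner[sender]] += 1     # original transaction (withdraw step)
        counts[owner[recipient]] += 1  # relay transaction (deposit step)
    return counts


def load_spread(counts: List[int]) -> Tuple[int, float]:
    """Return (range, population standard deviation) of the per-EVM counts."""
    return max(counts) - min(counts), statistics.pstdev(counts)


# Made-up example with one hotspot address; NOT data from the experiments.
owner = {"hot": 0, "b": 1, "c": 2, "d": 3}
transfers = [("hot", "b"), ("hot", "c"), ("hot", "d"), ("b", "hot")]
counts = per_evm_counts(transfers, owner, num_evms=4)
spread_range, spread_std = load_spread(counts)
```

In this toy input the EVM that owns the hotspot address executes four of the eight steps, and adding EVMs would not reduce that number, which mirrors the observation that the range stays similar from 2 to 64 EVMs.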

Program rewriting: Another significant limitation of PREDA and Crystality is that developers need to rewrite smart contracts using directives. A tool that automatically translates existing smart contracts written in Solidity, Move, or Rust into equivalent Crystality smart contracts would greatly improve the developer experience. Prior work suggests this is achievable, since research has already explored translation between languages, such as from Solidity to Move and from Python to Solidity. Advances in natural language processing further increase the potential for automatic code generation. Combined with rule-based and pattern-based compiler translation techniques (such as SQL-to-MapReduce translation for big data and computation-graph-to-matrix-computation translation for deep learning), these advances can support the development of automated smart contract translation tools.

Conclusion: The performance comparison between Sei, Aptos, Sui, and Crystality/PREDA highlights the continuous evolution of parallelization in the blockchain field. Aptos (and Sei) and Sui demonstrate the potential of optimistic and pessimistic parallelization mechanisms, respectively, each showing advantages in different scenarios. However, the significant performance improvement of Crystality and PREDA indicates that more advanced parallelization models may be the key to unlocking higher levels of scalability and efficiency. To summarize our exploration and observations of the three main parallelization methods in the blockchain field, we have compiled a table. If you take away one thing from this article, let it be the content of this table.
