Firedancer Validator launched, paving the way for widespread adoption of Solana.

This article is from: 《What is Firedancer? A Deep Dive into Solana 2.0》

Original author: 0xIchigo

Translator for Odaily Planet Daily: 如何

Solana is one of today's leading high-performance public blockchains. Its fast on-chain processing has attracted many projects and won the favor of traditional giants such as Visa. However, Solana has long carried the risk of network downtime. Firedancer, a validator client being developed by Jump, may provide an answer to that problem.

This article will start with the role of validators and validator clients in the blockchain, and explore how the Firedancer validator client can empower the Solana network.

Translated by Odaily Planet Daily.

What are validators and validator client diversity?

A validator is a computer that participates in a proof-of-stake blockchain. Validators are the backbone of the Solana network, responsible for processing transactions and participating in the consensus process. Validators secure the network by locking a certain amount of Solana's native token as collateral. The staked tokens act as a security deposit that ties the validator economically to the network. This incentivizes validators to perform their tasks accurately and efficiently, because they are rewarded according to their contributions. Validators are also penalized for malicious or faulty behavior: a misbehaving validator's stake is reduced, a process known as slashing. Validators therefore have strong motivation to perform their duties correctly and grow their stake.

A validator client is the application that a validator runs to perform its tasks. The client is the foundation of the validator and participates in the consensus process through its unique cryptographic identity.

Having multiple different clients improves fault tolerance. For example, if no single client controls more than 33% of the stake, a crash or liveness fault in one client will not halt the network. Similarly, if a client has a bug that produces an invalid state transition, the network can avoid a safety fault as long as less than 33% of the stake uses that client. This is because the majority of the network remains in a valid state, which prevents the blockchain from splitting or forking. Validator client diversity therefore makes the network more resilient, because a bug or vulnerability in a single client cannot seriously affect the entire network.

Client diversity can be measured by the percentage of stake running on each client and by the total number of available clients. At the time of writing, there are 1,979 validators on the Solana network. On mainnet, these validators use two clients, provided by Solana Labs and Jito Labs. When Solana launched in March 2020, it used a validator client developed by Solana Labs. In August 2022, Jito Labs released a second validator client. This client is a fork of the Solana Labs code, maintained and deployed by Jito, and it is optimized for extracting maximum extractable value (MEV) from blocks. Because Solana streams blocks without a mempool (a queue of unconfirmed, pending transactions), Jito's client creates a pseudo-mempool. The pseudo-mempool allows validators to search these transactions, bundle them optimally, and submit them to Jito's block engine.

As of October 2023, Solana Labs' client holds 68.55% of active stake, while Jito's holds 31.45%. The number of validators using the Jito client has grown by 16% since the Solana Foundation's previous health report. This growth shows a trend toward client diversification.

Although this growth is encouraging, it comes with a caveat. Jito's client is a fork of the Solana Labs client, so it shares many components with the original validator codebase and may be susceptible to the same bugs or vulnerabilities. In an ideal future, Solana would have at least four independent validator clients, built by different teams in different programming languages. No single implementation would hold more than 33% of the stake, since each client would hold approximately 25%. This idealized setup would eliminate single points of failure throughout the validator stack.
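
The one-third rule described above can be checked mechanically. The following is a minimal sketch, not part of any Solana tooling: the function name and the "ideal" client names are hypothetical, stake is expressed in hundredths of a percent to avoid floating-point rounding, and the threshold is taken as exactly one third.

```python
from fractions import Fraction

# Threshold discussed in the article: one third of total stake.
THRESHOLD = Fraction(1, 3)

def clients_over_threshold(stake_by_client):
    """Return the clients whose share of total stake is at least one third."""
    total = sum(stake_by_client.values())
    return sorted(
        name
        for name, stake in stake_by_client.items()
        if Fraction(stake, total) >= THRESHOLD
    )

# Stake shares from the article (October 2023), in hundredths of a percent.
october_2023 = {"Solana Labs": 6855, "Jito": 3145}

# Hypothetical ideal: four independent clients at 25% each.
ideal = {"client_a": 25, "client_b": 25, "client_c": 25, "client_d": 25}

print(clients_over_threshold(october_2023))
print(clients_over_threshold(ideal))
```

With the October 2023 figures, only the Solana Labs client exceeds the threshold; in the four-client setup, no client does.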

Developing a second independent validator client is crucial for achieving this future, and Jump is committed to this goal.

Why is Jump building a new validator client?

Solana's mainnet has experienced four outages, each of which required manual fixes by hundreds of validators. These outages have raised concerns about the reliability of the Solana network. Jump believes the protocol itself is reliable and attributes the downtime to problems in software modules that affect consensus. Jump is therefore developing a new validator client to address these problems. The overall goal of this client is to improve the stability and efficiency of the Solana network.

Developing an independent validator client is a daunting task. However, this is not the first time Jump has built a reliable global network. In the past, securities trading (i.e., buying and selling stocks) was executed manually by market specialists. With the emergence of electronic trading platforms, securities trading became more open. This openness increased competition and automation, and it reduced the time and cost of trading for investors. A technology race among market participants began.

Traders make a living from trading. A better trading experience requires a focus on software, hardware, and network solutions. These systems must have high machine intelligence, low real-time latency, high throughput, high adaptability, high scalability, high reliability, and high accountability.

Off-the-shelf solutions (i.e., software that any company can simply purchase) are not a competitive advantage. Being second to send the right order to the exchange is an expensive way to lose money. The intense competition in high-frequency trading has driven an endless development cycle and produced world-class global trading infrastructure.

This scenario may sound familiar. The requirements for a successful trading system are similar to those for a successful blockchain. A blockchain needs to be a high-performance, fault-tolerant, low-latency network. A slow blockchain is a failed technology that cannot meet the needs of modern enterprise applications, hindering innovation, scalability, and real-world utility. With over twenty years of experience in global network expansion and high-performance system development, Jump is the perfect team to create an independent validator client. Kevin Bowers, Chief Scientist at Jump Trading, is responsible for overseeing the entire construction process.

Why is the speed of light too slow?

Kevin Bowers detailed the problem of the speed of light being too slow. The speed of light is a finite constant that provides a natural limit on the amount of computation a single transistor can handle. Currently, bits are modeled by the transmission of electrons in transistors. Shannon's capacity theorem (i.e., the maximum amount of error-free data that can be sent on a channel) limits the number of bits that can be transmitted through a transistor. Due to fundamental physical and information theory constraints, the speed of computation is limited by the speed at which electrons move in matter and the amount of data that can be sent. These constraints become apparent when pushing supercomputers to their limits. Therefore, there is a "significant mismatch between the ability of computers to process data and the ability to transmit data."

Take the Intel Core i9 13900K CPU as an example. It has 24 x86 cores, a base clock frequency of 2.2 GHz, and a maximum turbo boost clock frequency of 5.8 GHz. In the worst case, light needs to travel a total distance of about 52.0 millimeters within the CPU. The Manhattan distance across the CPU (the distance between two points measured along orthogonal axes) is approximately 73.6 millimeters. At the maximum turbo frequency of 5.8 GHz, light travels approximately 51.7 millimeters in air during one clock cycle. This means that in a single clock cycle, a signal can barely travel between the two farthest points of the CPU.

However, the actual situation is much worse. These figures use the speed of light in air, whereas signals actually propagate through silicon dioxide (SiO2). Within one clock cycle at 5.8 GHz, light travels approximately 26.2 millimeters in silicon dioxide. In silicon (Si), light travels only approximately 15.0 millimeters per clock cycle at 5.8 GHz, slightly more than half the length of the CPU's long side.
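
The per-cycle distances above follow from dividing the speed of light by the clock frequency and by the refractive index of the medium. The sketch below reproduces that arithmetic. The effective refractive indices (about 1.97 for silicon dioxide and 3.45 for silicon) are assumptions chosen here because they are consistent with the article's figures; they are not taken from the source. Air is treated as vacuum.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by definition)

def distance_per_cycle_mm(clock_hz, refractive_index=1.0):
    """Distance light covers during one clock period, in millimeters."""
    return C_VACUUM_M_PER_S / (refractive_index * clock_hz) * 1000.0

TURBO_HZ = 5.8e9  # maximum turbo frequency quoted in the article

# Assumed effective refractive indices (illustrative, see the note above).
N_SIO2 = 1.97
N_SI = 3.45

air_mm = distance_per_cycle_mm(TURBO_HZ)
sio2_mm = distance_per_cycle_mm(TURBO_HZ, N_SIO2)
si_mm = distance_per_cycle_mm(TURBO_HZ, N_SI)

print(f"air:  {air_mm:.1f} mm per cycle")
print(f"SiO2: {sio2_mm:.1f} mm per cycle")
print(f"Si:   {si_mm:.1f} mm per cycle")
```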

The Firedancer team believes that recent progress in computing has come more from packing additional cores into CPUs than from making them faster. When people need higher performance, they are encouraged to buy more hardware. This works when throughput is the bottleneck. However, the actual bottleneck is the speed of light. This natural limit leads to decision paralysis: optimizing any one component shows no immediate return, because many other components in the system are not well optimized. Unoptimized parts also deteriorate over time as fewer computing resources are available to them. So, what can be done now?

In high-performance computing, everything ultimately needs to be optimized. The result is production systems for vectorized trading and quantitative research that operate at the limits of physics and information theory on a global scale. This includes creating custom network switching technology and lock-free algorithms to work within these physical limits. Jump is both a technology company and a trading company. Working at the boundary between science fiction and reality, Jump has faced problems remarkably similar to those Solana faces today, which is why Jump is developing Firedancer.

What is Firedancer?

Firedancer is a brand new independent validator client developed by the Firedancer team using the C language. Firedancer's design emphasizes reliability, with a modular architecture, minimal dependencies, and extensive testing processes. It proposes significant rewrites of three functional components (network, runtime, and consensus) of the Solana Labs client. Each level has been optimized for maximum performance, so the client's operational capacity is only limited by the validator's hardware, not by the current software efficiency limitations. Through Firedancer, Solana will be able to scale according to bandwidth and hardware.

The goals of Firedancer are:

  • Document and standardize the Solana protocol (ultimately, people should be able to create a Solana validator by reading the documentation rather than the Rust validator code);

  • Increase the diversity of validator clients;

  • Improve the performance of the ecosystem.

How Firedancer Works

Modular Architecture

Firedancer differs from the current Solana validator client through its unique modular architecture. Unlike Solana Labs' Rust validator client, which runs as a single process, Firedancer consists of many independent Linux C processes called "tiles." A tile is a process and some memory. This tile architecture is the foundation of Firedancer's operational philosophy and its approach to improving robustness and efficiency.

A process is an instance of a running program. It is a fundamental component of modern operating systems, representing the execution of a set of instructions. Each process has its own memory space and resources, which the operating system allocates and manages independently, unaffected by other processes. A process is like an independent worker in a large factory, using its own tools and workspace to perform specific tasks.

In Firedancer, each tile is an independent process with a specific role. For example, the QUIC tile is responsible for handling incoming QUIC traffic and forwarding encapsulated transactions to the verify tile. The verify tile is responsible for signature verification, and other tiles have similar tasks. These tiles run independently and concurrently, collectively constituting the functionality of the entire system. Independent Linux processes can form small, independent fault domains. This means that a problem with a tile will have minimal impact on the entire system, or a small "impact range," and will not immediately jeopardize the entire validator.

A key advantage of the Firedancer architecture is that each tile can be replaced and upgraded within seconds, without any downtime. By contrast, Solana Labs' Rust validator client must be shut down completely before an upgrade. The difference stems from Rust's lack of a stable application binary interface (ABI), which prevents on-the-fly upgrades in a pure Rust environment. With C processes, the binary stability of the C runtime model can significantly reduce upgrade-related downtime. This works because each tile keeps its validator state in separate workspaces, which are shared memory objects that persist as long as the validator is running. During a restart or upgrade, each tile can seamlessly resume processing where it left off.

Overall, Firedancer is built with a NUMA-aware, tile-based architecture. In this architecture, each tile uses a CPU core. It has high-performance message passing between tiles, optimized memory locality, resource layout, and component latency.

Network Processing

Firedancer's network processing is designed to handle the heavy demands of the Solana network as it scales to gigabits per second. This processing is divided into inbound and outbound activities.

Inbound activities mainly involve receiving transactions from users. Firedancer's performance here is crucial, because if validators fall behind in processing packets, consensus messages may be lost. Solana nodes currently operate at approximately 0.2 Gbps, while the largest bandwidth peak recorded by Jump's nodes is approximately 40 Gbps. This peak highlights the need for a powerful and scalable inbound processing solution.

Outbound activities include block packing, block creation, and the transmission of block fragments (called shreds in Solana). These steps are crucial for the secure and efficient operation of the Solana network. The performance of these tasks affects not only throughput but also the overall reliability of the network.

Firedancer aims to address problems that Solana's peer-to-peer interface has had in the past when handling transactions. A major drawback of that interface was its lack of congestion control for inbound transactions. This drawback led to the outages of September 14, 2021 (17 hours) and April 30, 2022 (7 hours).

In response, Solana has undergone several network upgrades to handle high transaction loads properly. Firedancer follows suit, adopting QUIC as its traffic control scheme. QUIC is a multiplexed transport network protocol and forms the basis of HTTP/3. It plays a crucial role in resisting DDoS attacks and managing network traffic. However, it is important to note that in some cases, the cost outweighs the benefit. QUIC, combined with data center-specific hardware, is used to mitigate DDoS attacks, eliminating the motivation for transaction flood attacks.

The 151-page specification of QUIC has brought considerable complexity to development. Due to the inability to find an existing C library that meets their licensing, performance, and reliability requirements, the Firedancer team built their own implementation. Firedancer's QUIC implementation, nicknamed fd_quic, introduces optimized data structures and algorithms to ensure minimal memory allocation and prevent memory exhaustion.

Firedancer's custom network stack is at the core of its processing capabilities. The stack is designed from scratch to take advantage of Receive-Side Scaling (RSS). RSS is a hardware-accelerated form of network load balancing that distributes network traffic to different CPU cores to increase the parallelism of network processing. Each CPU core handles a portion of incoming traffic with minimal additional overhead. This approach, by eliminating complex schedulers, locks, and atomic operations, outperforms traditional software-based load balancing.
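
The idea behind Receive-Side Scaling can be sketched in a few lines: hash the fields that identify a flow and use the result to pick a core, so that packets from the same flow always land on the same core without any shared scheduler or lock. The sketch below is only an illustration. Real network cards typically compute a Toeplitz hash in hardware and use an indirection table; CRC32 is swapped in here as a stand-in, and the function name and addresses are hypothetical.

```python
import zlib

def rss_core(src_ip, src_port, dst_ip, dst_port, num_cores):
    """Map a flow's 4-tuple to a core index.

    CRC32 stands in for the hardware hash; the same flow always maps
    to the same core, so no cross-core coordination is needed.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_cores

NUM_CORES = 4
flows = [
    ("203.0.113.5", 40001, "198.51.100.1", 8001),
    ("203.0.113.9", 52344, "198.51.100.1", 8001),
    ("203.0.113.77", 33010, "198.51.100.1", 8001),
]

for flow in flows:
    print(flow, "-> core", rss_core(*flow, NUM_CORES))
```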

Firedancer introduces a new message passing framework for composing high-performance tile applications. These tiles can bypass the kernel's socket-based networking and use AF_XDP, an address family optimized for high-performance packet processing. Using AF_XDP allows Firedancer to read data directly from the network interface buffer.

The tile system in Firedancer's stack implements various high-performance computing concepts, including:

  • NUMA Awareness - NUMA (Non-Uniform Memory Access) is a computer memory design where processor access to its own memory is faster than accessing memory associated with other processors. For Firedancer, having NUMA awareness means the client can efficiently handle memory in multi-processor configurations. This is crucial for high transaction volume processing as it optimizes the utilization of available hardware resources.
  • Cache Locality - Cache locality refers to the utilization of data stored in caches close to the processor. This is often a variant of temporal locality (i.e., recently accessed data). In Firedancer, attention to cache locality means it is designed to handle network data while minimizing latency and maximizing speed.
  • Lock-Free Concurrency - Lock-free concurrency refers to designing algorithms without the need for locking mechanisms (such as mutexes) to manage concurrent operations. For Firedancer, lock-free concurrency allows multiple network operations to execute in parallel without the delays caused by locks. Lock-free concurrency enhances Firedancer's ability to handle a large number of transactions simultaneously.
  • Large Page Sizes - Using large page sizes in memory management helps when handling large data sets, because it reduces page table lookups and potential memory fragmentation. For Firedancer, this means more efficient memory handling, which is beneficial for processing large amounts of network data.

Build System

Firedancer's build system follows a set of principles to ensure reliability and consistency. It emphasizes minimizing external dependencies and treats all tools involved in the build process as dependencies. This includes pinning each dependency (including the compiler) to exact versions. A key aspect of this system is environment isolation during the build steps. Environment isolation enhances portability as the build process is not affected by the system environment.

Why Firedancer is Faster

Advanced Data Parallelism

Firedancer utilizes advanced data parallelism available in modern processors for cryptographic tasks such as ED25519 signature verification. Modern CPUs have Single Instruction, Multiple Data (SIMD) instructions for processing multiple data elements simultaneously and the ability to run multiple instructions per CPU cycle. Applying a single instruction in parallel to an array or vector of data elements is often more efficient in terms of area, time, and power. In this regard, improvements in parallel data processing can dominate throughput improvements compared to purely increasing processing speed.

One area where Firedancer uses data parallelism is in optimizing signature verification computations. This approach processes the data elements of an array or vector simultaneously to maximize throughput and minimize latency. The core of the ED25519 implementation involves operations in Galois fields, which are well suited to cryptographic algorithms and binary computation. A Galois field is a finite set of elements on which addition, subtraction, multiplication, and division are all defined and always yield another element of the set. The original article illustrates this with an example of a Galois field defined by 2^3:

[Figure: example of a Galois field defined by 2^3]

The only issue is that ED25519 uses a Galois field defined by 2^255 - 19. Field elements can be viewed as the numbers from 0 to 2^255 - 20. The basic operations are as follows:

[Figure: basic operations in the Galois field defined by 2^255 - 19]

Addition, subtraction, and multiplication are almost uint256_t mathematics (i.e., calculations on unsigned integers with a maximum value of 2^256 - 1). Division is challenging. Regular CPUs and GPUs do not support uint256_t mathematics, let alone "almost uint256_t mathematics," and certainly not the exceptionally difficult division. Implementing this mathematics with high performance is the key problem, and the answer depends on how well these operations can be emulated.
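
The field operations themselves are easy to state with arbitrary-precision integers, which helps show why division is the hard one. The following minimal sketch uses Python's built-in big integers and is not how Firedancer implements them; the function names are hypothetical. Division is done by multiplying with the modular inverse, computed here through Fermat's little theorem, which costs a full modular exponentiation.

```python
P = 2**255 - 19  # the prime modulus of the ED25519 field

def fe_add(a, b):
    return (a + b) % P

def fe_sub(a, b):
    return (a - b) % P

def fe_mul(a, b):
    return (a * b) % P

def fe_inv(a):
    """Multiplicative inverse via Fermat's little theorem: a^(P-2) mod P."""
    if a % P == 0:
        raise ZeroDivisionError("zero has no inverse in the field")
    return pow(a, P - 2, P)

def fe_div(a, b):
    """Division is multiplication by the inverse, far costlier than add or mul."""
    return fe_mul(a, fe_inv(b))

a = 0x1234567890ABCDEF1234567890ABCDEF
b = 987654321
q = fe_div(a, b)
print(fe_mul(q, b) == a % P)  # dividing then multiplying returns the original
```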

Firedancer's implementation breaks down arithmetic operations by handling numbers more flexibly. By applying the principles of regular long division and multiplication, carrying numbers from one column to the next, we can process these columns in parallel. The fastest way to simulate these mathematical operations is to represent uint256_t as six 43-bit numbers with a 9-bit "carry." This allows existing 64-bit operations to be performed on the CPU while providing enough space for the carry bit. This arrangement reduces the need for frequent carry propagation and enables Firedancer to handle large numbers more efficiently.
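
The limb representation described above can be illustrated as follows. This is a sketch under stated assumptions rather than Firedancer's code: the helper names are hypothetical, and the 9 spare bits are assumed to sit on top of each 43-bit limb, giving 52-bit lanes (which matches the 52-bit arithmetic of AVX512-IFMA). The point is that column-wise additions need no carry propagation until the headroom is used up.

```python
LIMB_BITS = 43
NUM_LIMBS = 6              # 6 * 43 = 258 bits, enough for a 256-bit value
HEADROOM_BITS = 9          # assumed spare bits per lane (43 + 9 = 52)
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_limbs(x):
    """Split an integer into six 43-bit limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(NUM_LIMBS)]

def add_limbs(a, b):
    """Column-wise add with no carry propagation; columns are independent."""
    return [x + y for x, y in zip(a, b)]

def normalize(limbs):
    """Propagate carries once so every limb fits in 43 bits again."""
    out, carry = [], 0
    for limb in limbs:
        limb += carry
        out.append(limb & LIMB_MASK)
        carry = limb >> LIMB_BITS
    return out, carry  # carry is whatever overflowed past 258 bits

def from_limbs(limbs):
    """Recombine limbs (normalized or not) into one integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

a = 2**255 - 20
b = 2**200 + 12345
lazy_sum = add_limbs(to_limbs(a), to_limbs(b))   # no carries yet
limbs, overflow = normalize(lazy_sum)            # carries resolved once
print(from_limbs(limbs) + (overflow << (LIMB_BITS * NUM_LIMBS)) == a + b)
```

With 9 bits of headroom, up to 2^9 = 512 fully loaded limbs can be accumulated in a lane before a carry pass is required.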

The implementation leverages data parallelism by reorganizing arithmetic computations into parallel column summations. Processing columns in parallel can speed up overall calculations by transforming tasks that could have been sequential bottlenecks into parallelizable tasks. Firedancer also utilizes vectorized instruction sets such as AVX512 and its IFMA extension (AVX512-IFMA). These instruction sets allow processing of the aforementioned Galois field arithmetic, improving speed and efficiency.

Firedancer's AVX512-accelerated implementation is extremely fast. On a single 2.3 GHz Icelake server core, its performance per core and per clock is more than twice that of the Breakpoint 2022 demonstration. The implementation achieves 100% vector lane utilization and massive data parallelism. It is another strong demonstration by the Firedancer team of a broader point: because of speed-of-light latency, doing many independent things in parallel is much easier than doing one thing at a time faster, even with custom hardware.

Utilizing FPGA for High-Speed Network Communication

Each CPU core can process approximately 30,000 signature verifications per second. CPUs are an energy-efficient choice, but their sequential processing approach limits them in large-scale operation. GPUs raise this processing capability to approximately 1 million verifications per core per second. However, they consume about 300W per unit and have inherent latency due to batch processing.

FPGAs have become a better choice. They match the throughput of GPUs but with significantly lower power consumption, approximately 50W per FPGA. Their latency is also lower than the ten-millisecond latency of GPUs. FPGAs provide a more responsive real-time processing solution with a latency of approximately 200 microseconds. Unlike batch processing in GPUs, FPGAs in Firedancer individually process each transaction in a streaming manner. The result of using FPGAs in Firedancer is that with power consumption below 400W, 8 FPGAs can process 8 million signatures per second.

The team demonstrated the ED25519 signature verification process in Firedancer at Breakpoint 2022. The process involves multiple stages, including performing SHA-512 calculations in a pure RTL pipeline and various checks and computations in a custom ECC-CPU processor pipeline. Essentially, the Firedancer team wrote a compiler and assembler for their custom processor, obtained Python code from the RFC (Request for Comments), and used operator-overloaded objects to generate machine code, which was then placed on the ECC-CPU.

It is noteworthy that Firedancer adopts the form factor style of AWS accelerators to balance robustness and network connectivity. This choice addresses challenges associated with direct network connections, a feature that is often restricted in cloud providers. Through this choice, Firedancer ensures that its advanced capabilities seamlessly integrate within the constraints of cloud infrastructure.

We must recognize that different operations need actual physical space, not just conceptual data space. By strategically arranging the physical components, Firedancer makes them compact and reusable. This configuration allows Firedancer to maximize the efficiency of its FPGAs, using 7-year-old FPGAs on machines from 8 years ago to process 8 million transactions per second.

Basic Challenges in Network Communication

The fundamental challenge in network communication is broadcasting new transactions globally. The point-to-point nature of the internet, limited bandwidth, and latency limit the feasibility of traditional methods such as broadcasting directly over the network. Distributing data in a ring or tree structure partially addresses these issues, but data packets may still be lost in transit.

Reed-Solomon coding is the preferred solution to these issues. It adds redundancy to the transmitted data (i.e., parity information) so that lost data packets can be recovered. The basic concept is that two points define a line, and any two points on that line are enough to reconstruct it and, with it, the original data points. By constructing a polynomial through the data points and placing different points of that polynomial in separate data packets, the receiver can reconstruct the original data as long as it receives enough packets (at least two in the line example).

We construct a polynomial because using the formula for points on a traditional line (y = mx + b) is computationally slow. Firedancer uses Lagrange interpolation (a method for constructing polynomials) to speed up the process. It simplifies the polynomial construction that Reed-Solomon coding requires and turns it into a matrix-vector multiplication that works efficiently for higher-order polynomials. The matrix is highly structured: its pattern repeats recursively, so the first row fully determines the rest. This structure means there is a faster way to perform the multiplication. Firedancer uses an O(n log n) method, introduced in a 2016 paper on multiplying by this matrix, which is the fastest known theoretical method for Reed-Solomon coding. The result is far more efficient calculation of parity information than with traditional methods:

  • RS encoding speed exceeds 120 Gbps per core;
  • RS decoding speed reaches up to 50 Gbps per core;
  • These metrics are compared to the current approximately 8 Gbps per core RS encoding (rust-rse).

With this optimized Reed-Solomon coding method, Firedancer can compute parity check information 14 times faster than traditional methods. This makes the data encoding and decoding process fast and reliable, crucial for maintaining high throughput and low latency globally.
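
The polynomial view of Reed-Solomon coding described above can be made concrete with a small example. The sketch below is a naive O(k^2) illustration using Lagrange interpolation directly, not the O(n log n) method Firedancer uses. It works over a small prime field (modulus 257) for readability, whereas production codes typically work over GF(2^8). All function names are hypothetical.

```python
P = 257  # small prime field for illustration only

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Divide by den using its modular inverse (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n_parity):
    """Data symbols sit at x = 0..k-1; parity symbols at x = k..k+n_parity-1."""
    k = len(data)
    points = list(enumerate(data))
    parity = [(x, lagrange_eval(points, x)) for x in range(k, k + n_parity)]
    return points + parity

def decode(received, k):
    """Rebuild the k data symbols from any k surviving (x, y) packets."""
    survivors = received[:k]
    return [lagrange_eval(survivors, x) for x in range(k)]

data = [72, 105, 33, 7]                 # four data symbols, each below P
packets = encode(data, n_parity=3)      # seven packets in total
# Packets 0, 2 and 3 are lost; any four of the seven are enough.
survivors = [packets[1], packets[4], packets[5], packets[6]]
print(decode(survivors, k=4) == data)
```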

How Does Firedancer Ensure Security?

Opportunities

All validators currently run software based on the original validator client. Firedancer can improve the diversity of Solana's clients and supply chain to the extent that it differs from Solana Labs' client. That means avoiding the same dependencies and not developing the client in Rust.

The Solana Labs and Jito validator clients each run as a single process. Adding security to a monolithic application once it is running in production is challenging. Validators running these clients have to shut down to apply urgent security upgrades, because a pure Rust environment does not allow on-the-fly upgrades. The Firedancer team, by contrast, can build a secure architecture into its new client from the start.

Firedancer also has the advantage of learning from past experiences. Solana Labs developed the validator client in a startup environment. This fast-paced environment means Labs needs to act quickly to enter the market. This raises concerns about their future development. The Firedancer team can see what Labs has done and what other teams on-chain have done, and ask what they would do differently if they were to develop the validator client from scratch.

Challenges

Although it differs from Solana Labs' client, Firedancer must closely replicate that client's behavior. Failure to do so could introduce consistency errors, posing a security risk. This problem can be mitigated by incentivizing a portion of stake to run both clients and by keeping Firedancer's total share of stake below 33% for an extended period. In any case, the Firedancer team needs to implement the protocol's full feature set, however difficult or insecure a feature is to implement. Everything in Firedancer must be consistent with the Labs client. The team therefore cannot develop code in isolation and must review it against the behavior of the Labs client. The lack of specifications and documentation makes this harder, and it means Firedancer must reproduce inefficient constructs that exist in the protocol.

The Firedancer team must also keep in mind that they are developing their new client in C. Unlike languages such as Rust, C does not provide native memory safety guarantees. A primary goal for the Firedancer codebase is therefore to reduce the occurrence and impact of memory safety vulnerabilities. This goal needs special attention because Firedancer is a fast-moving project: it must find a way to maintain development speed without introducing such bugs. Operating system sandboxing is the practice of isolating tiles from the operating system. Tiles are only allowed to access the resources and execute the system calls that their work requires. Because tiles have well-defined purposes and the Firedancer team has developed most of the client code, tile permissions are stripped down according to the principle of least privilege.

Implementing Defense-in-Depth Design

All software will have security vulnerabilities at some point. Starting from the premise that software will have bugs, Firedancer chooses to limit the potential impact of any vulnerability. This approach is known as defense in depth, a strategy that layers multiple security measures to protect assets. If an attacker breaches one part of the system, additional measures are in place to prevent the threat from affecting the entire system.

Firedancer's design aims to add mitigations between the vulnerability stage and the exploitation stage. For example, it is difficult for attackers to exploit memory safety vulnerabilities, because preventing such attacks is a well-researched problem. For C, in-depth research on memory safety has produced a series of hardening techniques and compiler features, which the Firedancer team uses in Firedancer. Even if attackers manage to bypass these industry best practices, it is difficult for them to disrupt the system by exploiting a vulnerability, because of tile isolation and operating system sandboxing.

Tile isolation is a result of Firedancer's parallel architecture. Each Tile runs in its own Linux process and has a clear, single purpose. For example, a QUIC Tile is responsible for handling incoming QUIC traffic and forwarding the encapsulated transactions to the validation Tile, which is then responsible for signature verification. The QUIC Tile and the validation Tile communicate through a shared memory interface (i.e., a mechanism that lets Linux processes pass data to each other). This shared memory interface acts as an isolation boundary between the two Tiles. If a bug in the QUIC Tile allows an attacker to execute arbitrary code when it processes malicious QUIC packets, other Tiles are not affected. In a monolithic process, the same bug would lead to immediate compromise, and an attacker who used it against many validators could harm the entire network. With Firedancer, the attacker may degrade the performance of the QUIC Tile, but the design confines the damage to that Tile.
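The idea of a shared memory interface acting as an isolation boundary can be sketched in a few lines of Python. This is only an illustration, not Firedancer's actual C implementation: the slot layout (a 4-byte length header followed by the payload), the SLOT_SIZE value, and the function names are all assumptions made for the example. The point it shows is that the consumer treats whatever it finds in shared memory as untrusted and validates it before use.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical slot layout: 4-byte little-endian length header, then the payload.
HEADER_SIZE = 4
SLOT_SIZE = 1236
MAX_PAYLOAD = SLOT_SIZE - HEADER_SIZE


def publish(shm, payload):
    """Producer side: write one length-prefixed frame into the shared slot."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for slot")
    struct.pack_into("<I", shm.buf, 0, len(payload))
    shm.buf[HEADER_SIZE:HEADER_SIZE + len(payload)] = payload


def consume(name):
    """Consumer side: attach by name and validate the header before reading."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        (length,) = struct.unpack_from("<I", shm.buf, 0)
        # The producer may be compromised, so never trust its length field.
        if length > MAX_PAYLOAD:
            raise ValueError("corrupt length header")
        return bytes(shm.buf[HEADER_SIZE:HEADER_SIZE + length])
    finally:
        shm.close()


def demo():
    """Create a slot, publish one frame, and read it back through the boundary."""
    shm = shared_memory.SharedMemory(create=True, size=SLOT_SIZE)
    try:
        publish(shm, b"signed-transaction-bytes")
        return consume(shm.name)
    finally:
        shm.close()
        shm.unlink()
```

For brevity, demo() runs both sides in one process. In a real multi-process design the producer and consumer would be separate processes that share nothing except this memory region.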

Operating system sandboxing is the practice of isolating Tiles from the operating system. Tiles may access only the resources, and execute only the system calls, that their work requires. Because Tiles have well-defined purposes and almost all of the code is developed by the Firedancer team, Tile permissions are stripped down to the minimum according to the principle of least privilege. Each Tile is placed in its own Linux namespaces, which give it a limited view of the system. This narrow view prevents a Tile from accessing most of the file system, the network, and any other processes running on the same machine. The namespaces provide a first security boundary. However, an attacker who holds a kernel privilege escalation exploit can still bypass this isolation. The system call interface is the last kernel attack surface reachable from a Tile. To protect it, Firedancer uses seccomp-BPF to filter system calls before the kernel processes them. The client can restrict each Tile to a selected set of system calls, and in some cases system call arguments can be filtered as well. This matters because it lets Firedancer ensure that read and write system calls operate only on specific file descriptors.
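The allow-list logic described above can be modeled in plain Python. This is a conceptual model only: a real seccomp-BPF filter is a BPF program installed in the kernel, not Python code. The policy contents, the file descriptor numbers, and the function name are hypothetical and chosen purely for illustration.

```python
# Hypothetical per-Tile policy. A value of None means the system call is allowed
# with any arguments; a set restricts the file descriptor argument to those values.
POLICY = {
    "read": {3},
    "write": {1, 4},
    "exit_group": None,
}


def is_allowed(policy, syscall, fd=None):
    """Return True if the policy permits this system call with this descriptor."""
    if syscall not in policy:
        return False  # default deny: anything not listed is rejected
    allowed_fds = policy[syscall]
    return allowed_fds is None or fd in allowed_fds
```

Under this model, is_allowed(POLICY, "read", 3) is permitted, while a read on descriptor 5 or any call to an unlisted system call such as "open" is rejected. Default deny is the important design choice: a Tile gains nothing from kernel functionality it was never expected to use.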

Adopting an Embedded Security Plan

During the development of Firedancer, emphasis is placed on embedding comprehensive security procedures at every stage. The client's security procedures are the result of ongoing collaboration between the development team and the security team, setting new standards for secure blockchain technology.

The process begins with a self-serve fuzz testing infrastructure. Fuzz testing is an automated technique that feeds unexpected inputs to a program to detect error conditions that indicate vulnerabilities or crashes. Every component that accepts untrusted user input is fuzzed, including the P2P interface (parsers) and the sBPF virtual machine. OSS-Fuzz maintains continuous fuzz coverage as the code changes. The security team has also set up a dedicated ClusterFuzz instance for ongoing coverage-guided fuzz testing. Developers and security engineers provide fuzz harnesses (i.e., special versions of unit tests) for security-critical components. Developers can also contribute new fuzz tests, which are automatically picked up and run. The goal is to fuzz test every part thoroughly before moving on to the next stage.
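A fuzz harness can be very small. The sketch below fuzzes a toy packet parser with random bytes and checks that the parser only ever fails in the one way it is allowed to. The packet format, parse_packet, and ParseError are invented for this example and have nothing to do with Firedancer's real parsers. It is also plain random fuzzing; the coverage-guided fuzzers mentioned above are considerably more sophisticated.

```python
import random
import struct


class ParseError(Exception):
    """The only failure the toy parser is allowed to produce."""


def parse_packet(data):
    """Toy format (hypothetical): 2-byte big-endian length, then the payload."""
    if len(data) < 2:
        raise ParseError("truncated header")
    (length,) = struct.unpack_from(">H", data, 0)
    payload = data[2:]
    if length != len(payload):
        raise ParseError("length mismatch")
    return payload


def fuzz(iterations, seed=0):
    """Feed random inputs to the parser and count how many it accepts."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    accepted = 0
    for _ in range(iterations):
        blob = bytes(rng.getrandbits(8) for _ in range(rng.randint(0, 8)))
        try:
            parse_packet(blob)
            accepted += 1
        except ParseError:
            pass  # clean rejection; any other exception propagates and fails the run
    return accepted
```

Calling fuzz(2000) either returns normally or surfaces an unexpected exception, which is exactly the kind of error condition a fuzzer is meant to find.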

Internal code reviews help to discover vulnerabilities that tools may overlook. This stage focuses on high-risk, high-impact components and also acts as a feedback mechanism for the rest of the security program. The team applies the lessons learned in these reviews to improve fuzz testing coverage, introduce new static analysis checks for specific vulnerability classes, and even carry out large-scale code refactoring to eliminate complex attack vectors. Internal reviews are supplemented by external security reviews from industry-leading experts and by active bug bounty programs, both before and after release.

Firedancer has also undergone extensive stress testing on various test networks. These test networks face various attacks and failures, such as node replication, network link failures, packet flooding, and consensus violations. The loads these networks endure far exceed any real-world scenario on the mainnet.

So, this leads to a question: What is the current state of Firedancer?

What is the Current State of Firedancer, and What is Frankendancer?

The Firedancer team is gradually developing Firedancer as a modular validator client. This aligns with their goals for documentation and standardization. This approach ensures that Firedancer stays in sync with the latest developments in Solana. This has led to the creation of Frankendancer. Frankendancer is a hybrid client model where the Firedancer team integrates the components they have developed into the existing validator client infrastructure. This development process allows for gradual improvements and testing of new features.

Frankendancer is like putting a sports car in the middle of traffic. As more components are developed and bottlenecks are eliminated, performance will continue to improve. This modular development process promotes a customizable and flexible validator environment. Here, developers can modify or replace specific components in the validator client according to their needs.

What is Actually Running

Frankendancer implements all network functions of the Solana validator:

  • Inbound: QUIC, TPU, Sigverify, Dedup
  • Outbound: Block packing, creating/signing/sending Shreds (Turbine)

Frankendancer uses Firedancer's high-performance C network code on top of Solana Labs' Rust runtime and consensus code.

The architecture design of Frankendancer emphasizes high-end hardware optimization. While it supports running on low-end cloud hosts with standard Linux operating systems, the Firedancer team is optimizing Frankendancer for high-core-count servers. The long-term goal is to leverage existing hardware resources in the cloud to improve efficiency and performance. The client also supports multiple connections, hardware acceleration, randomized traffic steering for load balancing (ensuring even distribution of network traffic), and multiple process boundaries for additional security between components.

Technical efficiency is the cornerstone of Frankendancer. The system avoids memory allocation and atomic operations on critical paths, and all allocations are made during initialization in a NUMA-optimized way. This design ensures maximum efficiency and performance. In addition, system components can be inspected asynchronously and remotely, and Tiles can be managed flexibly (started, stopped, and restarted asynchronously), which adds a layer of robustness and adaptability to the system.

How is Frankendancer Performing?

Each Tile in Frankendancer can process 1,000,000 transactions per second (TPS) on the network inbound side. As each Tile uses a single CPU core, this performance scales linearly with the number of cores used. Frankendancer achieves this feat by using only four cores and fully utilizing a 25 Gbps network interface card (NIC) on each core.
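The linear scaling claim amounts to simple multiplication, sketched below. The function name is made up for the example, and the result is only what the stated per-Tile figure implies under perfectly linear scaling. It is not an additional measured result.

```python
PER_TILE_TPS = 1_000_000  # inbound figure per Tile quoted in the text


def aggregate_tps(tiles, per_tile_tps=PER_TILE_TPS):
    """Aggregate inbound throughput, assuming perfectly linear scaling."""
    return tiles * per_tile_tps
```

For example, aggregate_tps(4) gives 4,000,000 TPS for four Tiles on four cores.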

On the network outbound side, Frankendancer has achieved significant improvements through its Turbine optimizations. On current standard node hardware, it reaches 6 Gbps per Tile. This includes a substantial speedup in shredding (i.e., how block data is split up and sent out to the validator network). Compared with the current standard Solana node, Frankendancer shows approximately a 22% increase in shredding speed without Merkle trees, and almost doubles the speed when Merkle trees are used. This is a significant improvement over the block propagation and transaction reception performance of current validators.
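To make "splitting block data" and "Merkle trees" concrete, here is a minimal sketch that cuts a block into fixed-size pieces and computes a Merkle root over them. It is a generic illustration and does not reproduce Solana's actual shred format, erasure coding, or Merkle construction. The function names, the use of SHA-256, and the rule of duplicating the last node on odd levels are all assumptions made for the example.

```python
import hashlib


def split_into_shreds(block, shred_size):
    """Cut the block into consecutive pieces of at most shred_size bytes."""
    return [block[i:i + shred_size] for i in range(0, len(block), shred_size)]


def merkle_root(leaves):
    """Hash the leaves pairwise, level by level, until one root hash remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # illustrative rule: duplicate the odd node
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

A receiver that knows the root can detect any tampered piece, because changing a single byte in one piece changes the root. The extra hashing is also why the Merkle variant costs more work per piece than plain splitting.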

Firedancer's network performance indicates that it has reached the limit of today's standard validator hardware. This marks an important technical milestone and demonstrates that the client can handle extreme workloads effectively and efficiently.

Frankendancer is Live on the Testnet

Frankendancer is currently staking, voting, and producing blocks on the testnet. It runs alongside the Solana Labs client and approximately 2,900 other validators. This real-world deployment showcases Firedancer's performance on ordinary hardware. It is currently deployed on an Equinix Metal m3.large.x86 server with an AMD EPYC 7513 CPU, the same type of server that many other validators use. It is an affordable option, with on-demand pricing between $3.10 and $4.65 per hour depending on the location.
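The hourly rates translate into a monthly figure as follows. This is a rough sketch that assumes a 30-day month of continuous on-demand usage and uses only the two rates quoted above. The function name is made up for the example, and actual billing may differ.

```python
def cost_range(hours, low_rate=3.10, high_rate=4.65):
    """Return the (low, high) cost in dollars for the given number of hours."""
    return (round(low_rate * hours, 2), round(high_rate * hours, 2))
```

For 720 hours (30 days), cost_range(720) gives roughly $2,232 to $3,348.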

The progress made by Firedancer towards the mainnet launch brings several possibilities for node hardware:

  • Current validator hardware can achieve higher per-node performance capacity.
  • Firedancer's efficiency allows validators to use more affordable, lower-spec hardware while maintaining similar performance levels.
  • Firedancer's design enables it to take advantage of hardware and bandwidth advancements.

These developments, along with other initiatives such as Wiredancer (the Firedancer team's experiment with hardware acceleration) and the Rust-based modular runtime/SVM, make Firedancer a forward-thinking solution.

Firedancer's progress has also sparked discussion about running the Solana Labs client and Firedancer side by side. This approach could maximize network liveness by drawing on the strengths of both clients and reducing the impact that a failure in any single client can have on the entire network. It has also led to speculation about whether projects like Jito would consider forking Firedancer, which could further optimize MEV extraction and transaction processing efficiency. Only time will tell.

Conclusion

Developers often view operations as occupying data space rather than physical space. In a scenario where the speed of light is a natural limit, this assumption leads to systems slowing down and failing to optimize their hardware correctly. In a highly adversarial and competitive environment, we cannot simply throw more hardware at Solana and expect it to perform better. We need optimization. Firedancer has revolutionized the structure and operation of the validator client. By building a reliable, highly modular, and high-performance validator client, the Firedancer team is preparing for the large-scale adoption of Solana.

Whether you are a junior developer or an ordinary Solana user, understanding Firedancer and its significance is crucial. This technological feat makes the fastest and highest-performing blockchain in the market even more outstanding. Solana aims to be a high-throughput, low-latency global state machine. Firedancer is a huge step towards perfecting these goals.
