Editor's note: This article is reprinted from the original content published by Turan Vural and Yuki Yuminaga of Fenbushi Capital on April 5, 2024. Fenbushi Capital was founded in 2015 and is a leading blockchain asset management company in Asia with assets under management of $1.6 billion. The company aims to play a significant role in shaping the future of blockchain technology across various industries through research and investment. This article is an example of these efforts and represents the independent views of the authors, who have agreed to its publication here.
Data availability (DA) is a core scaling technology for Ethereum: it allows nodes to efficiently verify that data is available to the network without having to host the data themselves. This is crucial for efficiently building rollups and other forms of vertical scaling, allowing execution nodes to ensure that transaction data is available during settlement. It is also crucial for sharding and other forms of horizontal scaling (planned future updates to the Ethereum network), as nodes need to prove that transaction data (or blobs) stored in network shards is indeed available to the network.
Several DA solutions have been discussed and released recently (such as Celestia, EigenDA, and Avail), all aiming to provide high-performance and secure infrastructure on which applications can publish data.
Compared to an L1 such as Ethereum, the advantage of external DA solutions is that they provide a low-cost and high-performance carrier for on-chain data. DA solutions typically consist of their own public chains, which are designed for low-cost and permissionless storage. Even with such modifications, hosting data natively on a blockchain is still extremely inefficient.
In light of this, we find it natural to explore storage-optimized solutions (such as Filecoin) as the basis for a DA layer. Filecoin uses its blockchain to coordinate storage deals between users and storage providers, but allows the data itself to be stored off-chain.
In this article, we study the feasibility of building a DA solution on top of a decentralized storage network (DSN). We specifically consider Filecoin, as it is the most widely adopted DSN to date. We outline the opportunities such a solution would bring and the challenges that need to be overcome to build it.
The DA layer provides the following functions for the services that depend on it:
1. User security: No node can be convinced that unavailable data is available.
2. Global security: Except for a few nodes, all nodes agree on whether the data is available or unavailable.
3. Efficient data retrieval capability.
All of these need to be accomplished efficiently to achieve scalability. A DA layer provides higher performance at lower cost across these three aspects. For example, any node can request a complete copy of the data to prove custody, but this is inefficient. A system that delivers all three points gives us a DA layer that provides the security L2s need to coordinate with the L1, with a stronger lower bound in the presence of a malicious majority.
Data Hosting
Data published to a DA solution has a useful lifetime: long enough to resolve disputes or verify state transitions. As of the writing of this article, Ethereum calldata is the most commonly used solution among projects that require data availability (rollups).
Efficient Data Verification
Data Availability Sampling (DAS) is the standard approach to solving the DA problem. It has additional security advantages, enhancing a network actor's ability to verify state information from its peers. However, it depends on nodes to perform sampling: nodes must respond to DAS requests to ensure that mined transactions are not rejected, but nodes that request samples have neither positive nor negative incentives to do so. From the perspective of the requesting node, not performing DAS incurs no penalty. Celestia provides the first and only light-client implementation of DAS, giving users stronger security assumptions and reducing data verification costs.
Efficient Access
DA needs to provide efficient data access for projects using it. A slow DA could become a bottleneck for services that depend on it, leading to inefficiency or system errors.
Decentralized Storage Network
A decentralized storage network (DSN, as described in the Filecoin whitepaper) is a permissionless network composed of storage providers that offer storage services to network users. Informally, it allows independent storage providers to coordinate storage deals with users who need storage, giving those users low-cost and elastic data storage. This is coordinated through a blockchain that records storage deals and supports smart contract execution.
The DSN scheme is a tuple of three protocols: Put, Get, and Manage. This tuple has properties such as fault tolerance guarantees and participation incentives.
Put (Data) → Key
A client executes Put to store data under a unique key. The client specifies the duration for which the data is stored on the network, the number of copies stored for redundancy, and the price negotiated with the storage provider.
Get (Key) → Data
A client executes Get to retrieve the data stored under the key.
Manage
Network participants call the Manage protocol to coordinate the storage space and services offered by providers and to repair faults. For Filecoin, this is managed through the blockchain. The blockchain records storage deals between users and storage providers, along with proofs that data is stored correctly, ensuring that deals are upheld. Storage providers publish Proof-of-Replication (PoRep) or Proof-of-Spacetime (PoSt), generated in response to network challenges, to prove that data is stored correctly. When a storage provider fails to generate these proofs on time as required by the Manage protocol, a storage fault occurs and the provider's stake is slashed. If multiple providers host copies of the data on the network, the deal can self-heal by finding a new storage provider to fulfill it.
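The Put, Get, and Manage tuple above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class ToyDSN, its method names, and the use of a SHA-256 digest as the key are hypothetical and are not Filecoin's API; real deals involve on-chain negotiation, sealing, and proofs that are omitted here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyDSN:
    """In-memory stand-in for a DSN (hypothetical, not Filecoin's API)."""
    providers: dict = field(default_factory=dict)  # provider id -> {key: data}
    deals: dict = field(default_factory=dict)      # key -> deal terms

    def put(self, data: bytes, duration_epochs: int, replicas: int, price: float) -> str:
        """Put(data) -> key: store `replicas` copies under a content-derived key."""
        if replicas > len(self.providers):
            raise ValueError("not enough providers for the requested replication")
        key = hashlib.sha256(data).hexdigest()
        chosen = sorted(self.providers)[:replicas]
        for pid in chosen:
            self.providers[pid][key] = data
        self.deals[key] = {"providers": chosen, "duration": duration_epochs, "price": price}
        return key

    def get(self, key: str) -> bytes:
        """Get(key) -> data: return the first copy whose hash matches the key."""
        for pid in self.deals[key]["providers"]:
            data = self.providers[pid].get(key)
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                return data
        raise KeyError(f"no provider could serve {key}")

    def manage(self) -> list:
        """Audit every deal, report faulty providers, and re-replicate if possible."""
        faults = []
        for key, deal in self.deals.items():
            target = len(deal["providers"])
            healthy = [p for p in deal["providers"] if key in self.providers[p]]
            faults += [(p, key) for p in deal["providers"] if p not in healthy]
            spare = [p for p in sorted(self.providers) if p not in deal["providers"]]
            # Self-repair: copy from a healthy provider to a spare one.
            while healthy and len(healthy) < target and spare:
                new_provider = spare.pop(0)
                self.providers[new_provider][key] = self.providers[healthy[0]][key]
                healthy.append(new_provider)
            deal["providers"] = healthy
        return faults
```

In this sketch a fault is simply a missing copy; in Filecoin the analogous signal is a missed PoRep or PoSt, and the faulty provider is slashed rather than merely reported.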
DSN Opportunities
So far, the work done by DA projects has been to transform the blockchain into a hot storage platform. Since a DSN is already optimized for storage, instead of transforming a blockchain into a storage platform, we can simply transform the storage platform into one that provides data availability. The collateral that storage providers post in the form of native FIL tokens can provide cryptoeconomic security to ensure that data is stored. Finally, the programmability of storage deals can provide flexibility in the terms of data availability.
The most compelling motivation for turning DSN functionality into a solution for the DA problem is to reduce the cost of data storage under DA solutions. As described below, storing data on Filecoin is much cheaper than storing data on Ethereum. At the current Ether/USD price, writing 1 GB of calldata to Ethereum would cost over 3 million USD, and the data would be pruned after 21 days. This calldata cost can account for more than half of the transaction cost of an Ethereum-based rollup. By contrast, storing 1 GB on Filecoin costs less than 0.0002 USD per month. At this price or any similar price, securing DA this way would reduce transaction costs for users and help improve the performance and scalability of Web3.
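The order of magnitude of this comparison can be reproduced with a short calculation. The sketch below is a rough upper-bound estimate, not the authors' method: the gas price (50 gwei) and the ETH price (3,500 USD) are hypothetical inputs chosen for illustration, every byte is treated as a non-zero calldata byte at 16 gas (the EIP-2028 price), and per-transaction base gas and block gas limits are ignored. The Filecoin price is the per-GB monthly figure quoted in the text.

```python
CALLDATA_GAS_PER_NONZERO_BYTE = 16  # EIP-2028 price for a non-zero calldata byte
BYTES_PER_GB = 2**30                # assumption: "1 GB" is read as 1 GiB


def calldata_cost_usd(num_bytes: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Upper-bound cost of posting num_bytes as calldata (all bytes non-zero)."""
    gas = num_bytes * CALLDATA_GAS_PER_NONZERO_BYTE
    eth_spent = gas * gas_price_gwei * 1e-9  # 1 gwei = 1e-9 ETH
    return eth_spent * eth_usd


def filecoin_cost_usd(num_bytes: int, usd_per_gb_month: float, months: float) -> float:
    """Storage cost on Filecoin at a flat per-GB monthly price."""
    return num_bytes / BYTES_PER_GB * usd_per_gb_month * months


if __name__ == "__main__":
    # Hypothetical market inputs; substitute current values.
    eth_cost = calldata_cost_usd(BYTES_PER_GB, gas_price_gwei=50, eth_usd=3500)
    fil_cost = filecoin_cost_usd(BYTES_PER_GB, usd_per_gb_month=0.0002, months=1)
    print(f"calldata: ${eth_cost:,.0f}  filecoin: ${fil_cost:.4f}")
```

With these assumed inputs the calldata figure comes out slightly above 3 million USD per GB, in line with the figure quoted above; the result scales linearly with both the gas price and the ETH price.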
Economic Security
In Filecoin, providing storage space requires collateral. If a provider fails to fulfill a deal or does not comply with network guarantees, the collateral is slashed. Storage providers who fail to provide services risk losing their collateral and any profits earned.
Incentive Mechanism Adjustment
This comparison indicates that Filecoin is several orders of magnitude cheaper than current DA solutions, requiring only a fraction of a cent to store the same amount of data for the same duration. Unlike Ethereum nodes and the nodes of other DA solutions, Filecoin nodes are optimized to provide storage services, and the proof system allows nodes to prove storage without replicating the data across every node in the network. The base overhead of storage on Filecoin is negligible if one disregards the economics of storage providers, such as the energy cost of sealing data. This suggests that a system capable of providing secure and high-performance DA services on Filecoin has a market opportunity worth millions of dollars per GB compared to Ethereum.
Throughput
Next, we will consider the capacity of DA solutions and the demand generated by major Layer 2 rollups.
As the Filecoin blockchain is organized in tipsets, each block height can contain multiple blocks, so transaction volume is not limited by consensus or block size. The hard data constraint of Filecoin is its network-wide storage capacity, not the capacity allowed by consensus.
For daily DA demand, we take data from Terry Chung and Wei Dai's Rollups DA and Execution, including daily averages over 30 days and data for a single sampling day. This allows us to consider average demand without overlooking deviations from the average (e.g., Optimism's demand on August 15, 2023, was approximately 261,000,000 bytes, more than four times its 30-day average of 64,000,000 bytes).
From this sample, it is evident that while DA costs have the potential to decrease, a significant increase in DA demand is needed to make effective use of Filecoin's 32 GB sector size. Although sealing a 32 GB sector with less than 32 GB of data would be wasteful, it can be done while still retaining a cost advantage.
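To make the gap between demand and sector size concrete, the following sketch computes how much of a sector one day of demand would fill and how many days a single rollup would need to fill a sector on its own. It uses the Optimism figures quoted above and assumes that "32 GB" means 32 GiB; the function names are illustrative.

```python
SECTOR_BYTES = 32 * 2**30  # assumption: a "32 GB" sector is 32 GiB


def sector_utilization(daily_bytes: float, days: float = 1.0) -> float:
    """Fraction of one sector filled by `days` of demand (capped at 1.0)."""
    return min(1.0, daily_bytes * days / SECTOR_BYTES)


def days_to_fill_sector(daily_bytes: float) -> float:
    """Days of steady demand needed to fill one sector."""
    return SECTOR_BYTES / daily_bytes


if __name__ == "__main__":
    for label, demand in [("30-day average", 64_000_000), ("peak day", 261_000_000)]:
        print(
            f"{label}: {sector_utilization(demand):.2%} of a sector per day, "
            f"{days_to_fill_sector(demand):.0f} days to fill"
        )
```

Under these figures a single rollup fills well under 1% of a sector per day, which is why an aggregator that pools blobs from many users, or the sealing of partially filled sectors, is needed.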
Architecture
In this section, we will consider the technical architecture that could be implemented today. We will consider this architecture in the context of any L2 application and the L1 chain served by L2. As this solution is an external DA solution, like Celestia and EigenDA, we do not consider Filecoin as an example L1.
Components
Even at a high level, DA on Filecoin would leverage many different functionalities within the Filecoin ecosystem.
Transactions: Downstream users transact on platforms that require DA, such as an L2.
DA-Using Platforms: These platforms use DA as a service, for example an L2 that publishes transaction data to Filecoin DA and posts commitments to an L1 (e.g., Ethereum).
Layer 1: This is any L1 containing pointers to data commitments held by the DA solution. This could be Ethereum, supporting L2s that use the Filecoin DA solution.
Aggregator: At the front end of the Filecoin-based DA solution is an aggregator, a centralized component that receives transaction data from L2s and other DA end-users and aggregates it into payloads suitable for sealing into 32 GB sectors. While a simple proof of concept would involve a centralized aggregator, platforms using the DA solution could also run their own aggregators, for example as a sidecar to an L2 sequencer. The centralization of the aggregator is similar to that of L2 sequencers or EigenDA's disperser. Once the aggregator compiles a payload close to 32 GB, it enters into a storage deal with storage providers to store the data. It assures users that their data will be included in the sector by means of a PoDSI (Proof of Data Segment Inclusion) and provides a pCID (Piece CID) that identifies their data once it enters the network. This pCID will be included in the state commitment on L1 to reference the data supporting the transactions.
Verifier: Verifiers request data from storage providers to ensure the integrity of the state commitment and establish fraud proofs, which will be submitted to L1 in the presence of provable fraud.
Storage Deal: Once the aggregator compiles a payload close to 32 GB, it enters into a storage deal with storage providers to store the data.
Publish blob (Put): To initiate a Put, the DA end-user submits a blob containing transaction data to the aggregator. This can be done off-chain, or on-chain through an aggregator oracle. Upon receiving the blob, the aggregator returns a PoDSI to the end-user, proving that the blob is included in the aggregated sector submitted to the subnet, along with a pCID (Piece Content Identifier). Once the blob is available on Filecoin, the end-user and other interested parties use the pCID to reference it.
Storage deals will appear on-chain within minutes of being made. The maximum delay is the sealing time, which could take up to three hours. This means that even though the deal is complete and users can be assured that the data will appear on the network, the data cannot be queried until the sealing process has finished. The Lotus client has a fast retrieval feature, in which an unsealed copy of the data is stored alongside the sealed copy. This allows the data to be served as soon as the unsealed data has been transferred to the storage provider, as long as the retrieval deal does not depend on proof that the data has appeared on the network. However, this feature is at the discretion of the storage provider and is not provided as part of the protocol with cryptographic guarantees. To provide fast retrieval guarantees, changes to the consensus and punishment/reward mechanisms would be needed to enforce it.
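The aggregation and inclusion-proof flow can be sketched with a generic Merkle tree. This is a simplified stand-in and not the actual PoDSI construction: real PoDSI is built on Filecoin piece commitments, whereas the sketch below uses plain SHA-256 over whole blobs, and the Aggregator class, its capacity parameter, and the helper names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proofs(leaves: list) -> tuple:
    """Return (root, proofs); proofs[i] is a list of (sibling_hash, sibling_is_left)."""
    if not leaves:
        raise ValueError("cannot build a tree without leaves")
    level = [_h(leaf) for leaf in leaves]
    proofs = [[] for _ in leaves]
    positions = list(range(len(leaves)))  # index of each leaf's ancestor in `level`
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by duplicating the last node
        for i, pos in enumerate(positions):
            sibling = pos ^ 1
            proofs[i].append((level[sibling], sibling < pos))
            positions[i] = pos // 2
        level = [_h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
    return level[0], proofs


def verify_inclusion(blob: bytes, proof: list, root: bytes) -> bool:
    """Recompute the root from a blob and its sibling path."""
    node = _h(blob)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


class Aggregator:
    """Hypothetical aggregator: batches blobs until a payload is large enough to seal."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.pending = []

    def submit(self, blob: bytes) -> bool:
        """Queue a blob; return True once the payload has reached capacity."""
        self.pending.append(blob)
        return sum(len(b) for b in self.pending) >= self.capacity_bytes

    def seal(self) -> tuple:
        """Commit to the batch and return (root, [(blob, inclusion_proof), ...])."""
        root, proofs = merkle_root_and_proofs(self.pending)
        receipts = list(zip(self.pending, proofs))
        self.pending = []
        return root, receipts
```

In this toy model the root plays the role of the commitment referenced from L1, and each receipt plays the role of the inclusion proof returned to the end-user; anyone holding the blob, the proof, and the root can check inclusion without seeing the rest of the payload.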
Retrieve blob (Get): Retrieval is similar to the put operation. A retrieval transaction is needed, and it will appear on-chain within minutes. The retrieval delay will depend on the terms of the transaction and whether unsealed data copies are stored for fast retrieval. In the case of fast retrieval, the delay will depend on network conditions. Without fast retrieval, the data needs to be unsealed before being provided to the end-user, which takes the same amount of time as sealing, approximately three hours. Therefore, without optimization, our maximum round-trip time is six hours, requiring significant improvements to the data service before it becomes a feasible DA or fraud proof system.
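The worst-case figure above follows from simple addition, which the sketch below makes explicit. It is a simplification under stated assumptions: the three-hour sealing time is the figure from the text, unsealing is taken to cost the same as sealing, and the minutes needed for deals to appear on-chain and for network transfer are ignored. The function name is illustrative.

```python
def worst_case_round_trip_hours(seal_hours: float = 3.0, fast_retrieval: bool = False) -> float:
    """Seal time plus unseal time; unsealing is skipped when an unsealed copy is kept."""
    unseal_hours = 0.0 if fast_retrieval else seal_hours
    return seal_hours + unseal_hours


if __name__ == "__main__":
    print("without fast retrieval:", worst_case_round_trip_hours(), "hours")
    print("with fast retrieval:   ", worst_case_round_trip_hours(fast_retrieval=True), "hours")
```

Keeping an unsealed copy removes the unsealing half of the delay, but, as noted in the Put discussion, that copy is kept at the provider's discretion and carries no protocol guarantee.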
DA Proof: DA proofs can be divided into two steps: a PoDSI provided at deal time, when data is submitted to the aggregator, followed by the continuous PoRep and PoSt commitments provided through the Filecoin consensus mechanism. As mentioned above, PoRep and PoSt provide scheduled and provable guarantees of data custody and persistence.
This solution will rely heavily on bridges, as any DA-dependent end-user (whether building proofs or not) needs to be able to interact with Filecoin. For the pCID included in the state transition published on L1, verifiers can perform preliminary checks to ensure that no false pCID has been submitted. There are several ways to do this, such as using an oracle on L1 to publish Filecoin data, or having verifiers check that a storage deal or sector corresponding to the pCID exists. Similarly, verifying a validity or fraud proof published on L1 may also require bridges to establish whether the proof holds. The bridges currently available are Axelar and Celer.
Security Analysis
The integrity of Filecoin is achieved through slashing collateral. Collateral can be slashed in two cases: storage faults or consensus faults. Storage faults refer to storage providers failing to provide proofs of storing data (PoRep or PoST), which is related to the lack of data availability in our model. Consensus faults refer to malicious behavior in the consensus, which manages the transaction ledger, while the FEVM is abstracted from the transaction ledger.
A sector fault fee is the penalty incurred for failing to publish continuous storage proofs. Storage providers have a grace period of one day during which they are not penalized for storage faults. After 42 days of a sector fault, the sector is terminated, and the fees incurred are burned. The penalties are expressed in terms of BR(t), the block reward a sector is projected to earn:
BR(t) = ProjectedRewardFraction(t) * SectorQualityAdjustedPower
Sector termination occurs if a sector fault persists for 42 days or if the storage provider intentionally terminates the deal. The termination fee is equivalent to the maximum amount the sector earned before termination, capped at 90 days of income. Unpaid deal fees are refunded to the user, and the fees incurred are burned. The termination fee is:
max(SP(t), BR(StartEpoch, 20d) + BR(StartEpoch, 1d) * terminationRewardFactor * min(SectorAgeInDays, 140))
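The termination fee expression can be evaluated directly. In the sketch below the three reward terms are passed in as plain numbers, because the source does not define how SP(t) or the projected rewards are computed; the default terminationRewardFactor of 0.5 is an assumption, chosen because it reproduces the 90-day cap stated above (20 days plus 0.5 times 140 days).

```python
def termination_fee(
    sp_t: float,
    br_20d: float,
    br_1d: float,
    sector_age_days: float,
    termination_reward_factor: float = 0.5,  # assumed value, see the note above
) -> float:
    """max(SP(t), BR(StartEpoch, 20d) + BR(StartEpoch, 1d) * factor * min(age, 140))."""
    age_term = min(sector_age_days, 140)
    return max(sp_t, br_20d + br_1d * termination_reward_factor * age_term)


if __name__ == "__main__":
    daily_reward = 1.0  # hypothetical constant projected reward per day
    for age in (10, 140, 365):
        fee = termination_fee(0.0, 20 * daily_reward, daily_reward, age)
        print(f"sector age {age:>3} days -> fee = {fee} days of reward")
```

With a constant daily reward and the assumed factor, the fee grows with sector age and then stays flat once the sector is older than 140 days, which is how the cap arises.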
Storage market actor slashing occurs when a deal is terminated: the collateral that the storage provider posted for the deal is slashed.
The security provided by Filecoin is fundamentally different from that of other blockchains. Blockchain data is typically secured through consensus, but Filecoin's consensus only secures the transaction ledger, not the data referenced by its transactions. Data stored on Filecoin is secured only to the extent needed to motivate storage providers to provide storage. This means that the security of data stored on Filecoin rests on fault penalties and commercial incentives (such as reputation with users). On a typical blockchain, by contrast, a data fault is equivalent to a consensus violation, undermining the security of the chain or the validity of its transactions. Filecoin tolerates faults in data storage, so it uses consensus only to secure its transaction ledger and deal-related activity. The cost to a storage provider of failing to fulfill a deal is a penalty of up to 90 days' worth of storage rewards and the loss of the collateral posted to secure the deal.
Therefore, the cost of a data withholding attack initiated by a Filecoin provider is only the opportunity cost of retrieval transactions. Data retrieval on Filecoin relies on fees paid by users to incentivize storage providers. However, not responding to data retrieval requests does not have a negative impact on storage providers. To mitigate the risk of individual storage providers ignoring or refusing data retrieval transactions, data stored on Filecoin can be stored by multiple storage providers.
As the economic security behind Filecoin data is much lower than that of blockchain-based solutions, measures to prevent data manipulation must also be considered. Protection against data manipulation comes from the Filecoin proof system. Data is referenced by CID, so corruption can be detected immediately: it is easy to verify whether received data matches the requested CID, and providers therefore cannot serve corrupted data. Nor can providers store corrupted data in place of uncorrupted data. Upon receiving user data, a provider must submit proof of a correctly sealed sector (PoRep) to initiate the deal, so storage deals cannot be initiated with corrupted data. During the lifetime of a storage deal, PoSt is submitted to prove the state of the storage (note that this proves both the state of the sealed sector at the time of the proof and its state since the last PoSt). As PoSt relies on the sealed sector at the time the proof is generated, a corrupted sector would yield an invalid PoSt, leading to a sector fault. Therefore, storage providers cannot store or serve corrupted data, cannot be rewarded for services they did not provide on uncorrupted data, and cannot avoid punishment for tampering with user data.
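The CID check described above amounts to recomputing a content hash on the client side. The sketch below uses a bare SHA-256 hex digest as a simplified stand-in for a CID (real CIDs wrap a multihash with codec and version information), and the function names are hypothetical.

```python
import hashlib


def toy_cid(data: bytes) -> str:
    """Simplified content identifier: a prefixed SHA-256 digest, not a real CID."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


def verify_retrieved(data: bytes, requested_cid: str) -> bool:
    """Accept retrieved data only if it hashes to the identifier that was requested."""
    return toy_cid(data) == requested_cid


if __name__ == "__main__":
    original = b"rollup batch #1"
    cid = toy_cid(original)
    print(verify_retrieved(original, cid))            # untampered data
    print(verify_retrieved(b"rollup batch #2", cid))  # tampered data
```

Because the identifier is derived from the content, a provider that returns altered bytes is caught by the requester without any on-chain interaction; what this check cannot detect is a provider that returns nothing at all.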
Enhancing security can be achieved by increasing the collateral commitment made by storage providers to the storage market actor, currently determined by the storage provider and the user. Assuming this collateral amount is sufficiently high (e.g., the same as Ethereum validators' collateral), enough to incentivize providers not to default, it is reasonable to consider what else needs to be ensured for security (although this is highly capital-inefficient, as this collateral would be needed to ensure the security of each transaction blob or aggregated blob sector). Currently, data providers can choose to make data unavailable for up to 41 days before the storage market actor terminates the storage deal. Assuming short data deal times, we can assume the data is unavailable until the last day of the deal. Without malicious coordination, this situation can be mitigated by replication across multiple storage providers to continue providing data services.
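The benefit of replication in the absence of coordination can be quantified with a simple model. The sketch below assumes each provider withholds data independently with the same probability p, which is an idealization introduced here for illustration; it does not apply if providers collude.

```python
def availability(p_withhold: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent providers serves the data."""
    if not 0.0 <= p_withhold <= 1.0 or replicas < 1:
        raise ValueError("need 0 <= p_withhold <= 1 and replicas >= 1")
    return 1.0 - p_withhold**replicas


def replicas_needed(p_withhold: float, target: float) -> int:
    """Smallest replica count whose availability reaches `target`."""
    if not 0.0 <= p_withhold < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 <= p_withhold < 1 and 0 < target < 1")
    count = 1
    while availability(p_withhold, count) < target:
        count += 1
    return count


if __name__ == "__main__":
    for p in (0.5, 0.1):
        print(f"p={p}: {replicas_needed(p, 0.99)} replicas for 99% availability")
```

Under the independence assumption, availability improves exponentially with the number of replicas, so even unreliable providers can be combined into a dependable service; the cost is that collateral and storage fees scale linearly with the replica count.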
We can consider the cost for an attacker to overturn consensus, either by accepting false proofs or by rewriting the history of the transaction ledger, removing deals from the order book without penalizing the responsible storage providers. However, it is worth noting that in the case of such a security breach, the attacker can manipulate the Filecoin ledger at will. To carry out such an attack, an attacker would need at least a majority stake in the Filecoin chain. Stake is tied to the storage provided to the network, and the current data volume on the Filecoin chain is 25 EiB (roughly 2.9 × 10¹⁹ bytes), meaning the malicious actor would need to provide at least 12.5 EiB for their own chain to win the fork selection rule. This is further mitigated by slashing for consensus faults, with penalties being the loss of all staked collateral and block rewards, and suspension from participating in consensus.
Sidebar: Preventing Attacks on Other DA Solutions
While the above scenarios indicate shortcomings in protecting data from withholding attacks on Filecoin, Filecoin is not the only example.
Ethereum: Generally, the only way to guarantee a response from the Ethereum network is to run a full node. Full nodes are not required to respond to data retrieval requests outside of consensus. Structures like PeerDAS introduce a peer scoring system for nodes' responses to data retrieval, where nodes with sufficiently low scores (essentially DA reputation) may be isolated from the network.
Celestia: Compared to the Filecoin structure, Celestia has stronger security per byte and can withstand withholding attacks, but the only way to make use of this security is by hosting a full node. Requests made to Celestia infrastructure that the requester does not own and operate can be censored without penalty.
EigenDA: Similar to Celestia, any service can run an EigenDA Operator node to ensure the retrieval of its own data. Any data retrieval request made outside of the protocol can therefore be censored. Additionally, EigenDA has a centralized and trusted disperser responsible for data encoding, KZG commitments, and data distribution, similar to our aggregator.
Retrieval Security
Retrievability is essential for DA. Ideally, market forces would incentivize economically rational storage providers to accept retrieval deals and compete with other providers to lower prices for users. Even assuming this is enough to motivate storage providers to offer retrieval services, it is reasonable to demand stronger guarantees given the importance of DA.
Currently, retrieval cannot be guaranteed through the economic security described above. This is because it is cryptographically difficult to prove, in a trust-minimized way, that a user did not receive data (in cases where the user needs to refute the storage provider's claim of having sent the data). To secure retrieval through Filecoin's economic security, a protocol-native retrieval guarantee is needed. With minimal changes to the protocol, this means retrieval needs to be tied to sector faults or deal terminations. Retriev is a proof of concept that provides retrieval guarantees by mediating data retrieval disputes through a trusted "judge".
Addendum: Retrieval for Other DA Solutions
As mentioned, Filecoin lacks a protocol-native retrieval guarantee to prevent storage (or retrieval) providers from acting selfishly. For Ethereum and Celestia, the only way to be sure of reading protocol data is by hosting a full node or trusting an infrastructure provider's SLA. Guaranteeing retrieval by becoming a Filecoin storage provider is not as straightforward. The analogous setup in Filecoin is to become a storage provider (which requires significant infrastructure costs) and to succeed in accepting one's own storage deals as a user, at which point one would be paying to provide storage for oneself.
The delay in Filecoin is determined by various factors such as network topology, storage provider client configuration, and hardware capabilities. The theoretical analysis we provide discusses these factors and the performance that can be expected.
Due to the design of the Filecoin proof system and the lack of retrieval incentives, Filecoin has not been optimized for high-performance round-trip latency from initial data submission to initial data retrieval. High-performance retrieval on Filecoin is an active area of research, and it is constantly evolving with the improvement of storage provider capabilities and the introduction of new features in Filecoin. We define the "round trip" as the time from submitting a data transaction to the earliest availability of the data on the chain.
Block Time
In Filecoin's expected consensus, data transactions can be completed within a 30-second block time. One hour is the typical confirmation time for sensitive on-chain data (such as coin transfers).
Data Processing
Data processing time varies depending on the storage provider and its configuration. On standard storage provider hardware, the sealing process takes 3 hours. Storage providers typically reduce this 3-hour time through special client configurations, parallelization, and investment in more powerful hardware. These changes also affect the time needed to unseal a sector, which the fast retrieval option in Filecoin client implementations (such as Lotus) can avoid entirely. The fast retrieval setting stores an unsealed copy of the data alongside the sealed data, significantly speeding up retrieval. Based on this, we can assume that the worst-case delay from accepting a storage deal to the data being available on the chain is 3 hours.
Conclusion and Future Directions
This article explores how to build a data availability (DA) layer using an existing decentralized storage network (DSN), specifically Filecoin. We consider the requirements of DA as a critical element of Ethereum's scaling infrastructure. We examine the feasibility of building DA on Filecoin and use it to explore the opportunities that a solution on Filecoin would bring to the Ethereum ecosystem, or to any ecosystem that would benefit from a cost-effective DA layer.
A DSN such as Filecoin can significantly improve the efficiency of data storage in blockchain-based decentralized systems, saving $100 million for writing 32 GB of data at current market prices. Although the demand for DA is not yet sufficient to fill a 32 GB sector, the cost advantage of DA on Filecoin still holds if partially empty sectors are sealed. While the current storage and retrieval latency on Filecoin is not suitable for hot storage needs, specific operations by storage providers can deliver reasonable performance and make data available within 3 hours.
Increased trust in Filecoin storage providers can be adjusted through variable collateral, as seen in EigenDA. Filecoin extends this adjustable security, allowing for the storage of a large number of replicas on the network, increasing adjustable Byzantine fault tolerance. To effectively prevent data withholding attacks, the issue of guaranteed and high-performance data retrieval needs to be addressed. However, as with any other solution, the only true way to guarantee retrievability is by hosting nodes or trusting infrastructure providers.
We see opportunities for DA in the further development of Proof of Data Segment Inclusion (PoDSI), which can (together with Filecoin's current proofs) take the place of Data Availability Sampling (DAS) in ensuring that data is included in larger sealed sectors. Depending on the actual situation, this may make slow data turnaround tolerable, as fraud proofs can be published within a window of 1 day to 1 week, while PoDSI can be provided on demand. PoDSI is still a new technology and is under heavy development, so we do not yet know how efficient PoDSI will be or what mechanisms will be needed to build systems around it. As solutions for computing on Filecoin data already exist, a solution for computing PoDSI on sealed or unsealed data may not be far off.
As the fields of DA and Filecoin continue to evolve, new combinations of solutions and supporting technologies may bring about new proofs of concept. As demonstrated by the integration of Solana with the Filecoin network, a DSN has the potential to serve as a scaling technology. The cost of data storage on Filecoin provides an open opportunity with significant optimization potential. While the challenges discussed in this article are presented in the context of supporting DA, their eventual solutions will produce a large number of new tools and systems beyond DA.
The related chart data is from the Filecoin spec, EIP-4844, EigenDA, the Celestia implementation, Celenium, Starboard, file.app, Rollups DA and Execution, and current approximate market prices.