Who owns my data? What projects in the data layer are worth paying attention to?

Original Title: My Data is Not Mine: The Emergence of Data Layers

Original Author: 0xJeff (@Defi0xJeff)

Translation by: Asher (@Asher0210_)

With people's attention increasingly concentrated online, data has become the digital gold of this era. In 2024, global average daily screen time reached 6 hours and 40 minutes, up from previous years; in the United States the figure was even higher, at 7 hours and 3 minutes per day.

With such high engagement, the volume of data generated is staggering: counting everything newly generated, captured, replicated, or consumed, roughly 0.4 ZB of data is produced every day in 2024, on the order of 400 million TB (1 ZB = 1,000,000,000 TB).
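As a quick sanity check on those figures, the zettabyte-to-terabyte conversion works out as follows:

```python
# Scale check for the figures above: 1 ZB = 10^21 bytes = 10^9 TB,
# so ~0.4 ZB/day is on the order of 400 million terabytes per day.
TB_PER_ZB = 1_000_000_000

daily_zb = 0.4
daily_tb = daily_zb * TB_PER_ZB
print(f"{daily_tb:,.0f} TB per day")  # → 400,000,000 TB per day
```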

However, despite the massive amount of data generated and consumed daily, users own very little:

  • Social Media: Data on platforms like X and Instagram is controlled by companies, even though this data is generated by users;

  • Internet of Things (IoT): Data from smart devices typically belongs to the device manufacturer or service provider unless an agreement specifies otherwise;

  • Health Data: While individuals have rights to their medical records, most data from health applications or wearable devices is controlled by the companies providing these services.

Cryptocurrency and Social Data

In the crypto space, we have seen the rise of Kaito AI, which indexes social data on the X platform and transforms it into actionable sentiment data for projects, KOLs, and thought leaders. The terms "yap" and "mindshare" have been promoted by the Kaito team due to their expertise in growth hacking (through their popular mindshare and yapper dashboards) and their ability to attract organic interest on Crypto Twitter.

"Yap" aims to incentivize the creation of high-quality content on the X platform, but many questions remain unanswered:

  • How are "yaps" scored, and how accurate is that scoring?

  • Will mentioning Kaito earn additional "yaps"?

  • Is Kaito genuinely rewarding high-quality content, or does it favor controversial hot takes?

In addition to social data, discussions about data ownership, privacy, and transparency are becoming increasingly heated. With the rapid development of artificial intelligence, new questions arise: Who owns the data used to train AI models? Who can benefit from the results generated by AI? These questions pave the way for the emergence of Web3 data layers—a step towards a decentralized, user-driven data ecosystem.

The Emergence of Data Layers

In the Web3 space, an increasingly robust ecosystem of data layers, protocols, and infrastructure is forming, aimed at personal data sovereignty: giving individuals greater control over their data along with opportunities to monetize it.

Vana

Vana's core mission is to empower users to control their data, especially in the context of artificial intelligence, where data is invaluable for training models. Vana has launched DataDAOs, community-driven entities in which users pool their data for mutual benefit. Each DataDAO focuses on a specific dataset:

  • r/datadao: Focuses on Reddit user data, enabling users to control and monetize their contributions;

  • Volara: Handles data from the X platform, allowing users to benefit from their social media activities;

  • DNA DAO: Aims to manage genetic data with a focus on privacy and ownership.

Vana packages data into tradable pools called Data Liquidity Pools (DLPs). Each DLP aggregates data from a specific domain, and users can stake tokens in these pools to earn rewards, with top pools rewarded based on community support and data quality. Vana stands out for the simplicity of contributing: users select a DataDAO, then either connect their data directly via API or upload it manually, ultimately earning the DataDAO's token and VANA tokens as rewards.
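The pooled-contribution mechanic can be sketched as a toy model. All names and numbers below are illustrative, not Vana's actual on-chain logic:

```python
from dataclasses import dataclass, field

# Toy model of a data liquidity pool (DLP): contributors add records,
# stakers back the pool with tokens, and an epoch's reward is split
# pro rata by data contributed. Illustrative only.

@dataclass
class DataLiquidityPool:
    name: str
    contributions: dict = field(default_factory=dict)  # user -> record count
    stakes: dict = field(default_factory=dict)         # user -> staked tokens

    def contribute(self, user: str, records: int) -> None:
        self.contributions[user] = self.contributions.get(user, 0) + records

    def stake(self, user: str, amount: float) -> None:
        self.stakes[user] = self.stakes.get(user, 0.0) + amount

    def distribute(self, reward: float) -> dict:
        """Split an epoch's reward pro rata by records contributed."""
        total = sum(self.contributions.values())
        return {u: reward * n / total for u, n in self.contributions.items()}

pool = DataLiquidityPool("social-data-demo")
pool.contribute("alice", 300)
pool.contribute("bob", 100)
print(pool.distribute(1000.0))  # → {'alice': 750.0, 'bob': 250.0}
```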

Ocean Protocol

Ocean Protocol is a decentralized data marketplace that allows data providers to share, sell, or license their data while consumers can access this data for AI and research purposes. Ocean Protocol uses "datatokens" (ERC20 tokens) to represent access rights to datasets, allowing data providers to monetize their data while maintaining control over access conditions.
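The gating idea behind datatokens can be sketched in a few lines: spending one token buys one access to the dataset. This is a toy illustration, ignoring the real ERC20/smart-contract machinery:

```python
# Minimal sketch of datatoken-gated access: spending one datatoken
# grants one access to the underlying dataset. Illustrative only;
# Ocean's actual implementation lives in on-chain ERC20 contracts.

class Datatoken:
    def __init__(self, dataset_id: str):
        self.dataset_id = dataset_id
        self.balances: dict[str, int] = {}

    def mint(self, to: str, amount: int) -> None:
        self.balances[to] = self.balances.get(to, 0) + amount

    def redeem_access(self, user: str) -> str:
        """Burn one token in exchange for an access grant."""
        if self.balances.get(user, 0) < 1:
            raise PermissionError("no datatoken held for this dataset")
        self.balances[user] -= 1
        return f"access-granted:{self.dataset_id}:{user}"

dt = Datatoken("weather-history-v1")
dt.mint("alice", 2)
print(dt.redeem_access("alice"))  # → access-granted:weather-history-v1:alice
```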

The types of data traded on Ocean Protocol include:

  • Public data refers to open datasets, such as weather information, public demographics, or historical stock data, which are highly valuable for AI training and research;

  • Private data includes medical records, financial transactions, IoT sensor data, or personalized user data, which require strict privacy controls.

Compute-to-Data is another key feature of Ocean Protocol, allowing computations to be performed on data without moving it, ensuring the privacy and security of sensitive datasets.
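The Compute-to-Data idea, bringing the computation to the data and releasing only aggregate results, can be sketched as follows (a toy illustration, not Ocean's actual implementation):

```python
# Toy Compute-to-Data: the provider keeps raw records private and
# only returns the result of a whitelisted aggregate computation.
from statistics import mean

class PrivateDataset:
    APPROVED = {"mean", "count"}  # provider whitelists computations

    def __init__(self, records):
        self._records = records  # raw data never leaves the provider

    def compute(self, op: str) -> float:
        if op not in self.APPROVED:
            raise PermissionError(f"computation {op!r} not approved")
        if op == "mean":
            return mean(self._records)
        return float(len(self._records))

ds = PrivateDataset([120, 80, 95, 110])   # e.g. private sensor readings
print(ds.compute("mean"))   # → 101.25
print(ds.compute("count"))  # → 4.0
```

The consumer learns the mean and the record count, but never sees the individual readings, which is the property that makes sensitive datasets usable for research.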

Masa

Masa focuses on creating an open layer for AI training data, providing real-time, high-quality, and low-cost data for AI agents and developers.

Masa has launched two subnets on the Bittensor network:

  • Subnet 42 (SN42): Aggregates and processes millions of data records daily, providing a foundation for AI agents and application development;

  • Subnet 59 (SN59) – "AI Agent Arena": A competitive environment where AI agents use real-time data from SN42 and compete for TAO emissions based on performance metrics such as mindshare, user engagement, and self-improvement.

Additionally, Masa collaborates with Virtuals Protocol to supply real-time data capabilities to Virtuals Protocol agents, and it has launched the TAOCAT token (currently on Binance Alpha) to showcase these capabilities.

OpenLedger

OpenLedger is building a blockchain tailored specifically for data, particularly for AI and machine learning applications, ensuring secure, decentralized, and verifiable data management. Highlights include:

  • Datanets: A network of specialized data sources within OpenLedger, curating and enriching real-world data for AI applications;

  • Specialized Language Models (SLMs): AI models customized for specific industries or applications. The idea is to provide models that are not only more accurate in niche use cases but also privacy-compliant and less susceptible to the biases present in general-purpose models;

  • Data validation: Ensuring the accuracy and credibility of the data used to train SLMs, so that the resulting models are accurate and reliable in their target use cases.
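The validation gate described above can be as simple as schema and range checks run before a record is admitted into a training set. This is a generic sketch of the idea, not OpenLedger's actual pipeline:

```python
# Generic pre-training validation: reject records that fail schema,
# type, or range checks before they reach a model's training set.
# A sketch of the concept, not OpenLedger's actual validation logic.

REQUIRED_FIELDS = {"source": str, "text": str, "quality_score": float}

def validate(record: dict) -> bool:
    for field_name, field_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field_name), field_type):
            return False
    return 0.0 <= record["quality_score"] <= 1.0

records = [
    {"source": "forum", "text": "useful post", "quality_score": 0.9},
    {"source": "scrape", "text": "", "quality_score": 1.7},   # out of range
    {"source": "forum", "text": "missing score"},             # missing field
]
accepted = [r for r in records if validate(r)]
print(len(accepted))  # → 1
```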

The Demand for Data in AI Training

Demand for high-quality data is surging as AI and autonomous agents develop. Beyond initial training, AI agents also require real-time data for continuous learning and adaptation. Key challenges and opportunities include:

  • Data quality over quantity: AI models need high-quality, diverse, and relevant data to avoid bias or poor performance;

  • Data sovereignty and privacy: As shown by Vana, the monetization of data owned by users is gaining traction, which could reshape how AI training data is acquired;

  • Synthetic data: With growing concerns over privacy, synthetic data is increasingly recognized as a method to train AI models while mitigating ethical issues;

  • Data markets: The rise of data markets (both centralized and decentralized) is creating an economy where data is treated as a tradable asset;

  • AI in data management: AI is now being used to manage, clean, and enhance datasets, improving the quality of data for AI training.

As AI agents become more autonomous, their access to and ability to process real-time high-quality data will directly impact their effectiveness. This growing demand has led to the emergence of data markets specifically designed for AI agents, where both AI agents and humans can access high-quality data.

Web3 Agent Data Market

Cookie DAO aggregates social sentiment data from AI agents along with token-related information, transforming it into insights actionable by both humans and AI agents. The Cookie DataSwarm API enables AI agents to access real-time high-quality data for trading-related insights, which is one of the most common applications in the crypto space. Additionally, Cookie boasts 200,000 monthly active users and 20,000 daily active users, making it one of the largest AI agent data markets, with the COOKIE token at its core.

Finally, other noteworthy projects in this space include:

  • GoatIndex.ai, which focuses on insights into the Solana ecosystem;

  • Decentralised.Co, which focuses on niche data dashboards, such as GitHub and project-specific analytics.

Disclaimer: This article represents only the personal views of the author and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. For any dispute between users and the author, this platform bears no responsibility. If any article or image on this page infringes your rights, please send the relevant proof of rights and proof of identity by email to support@aicoin.com, and platform staff will investigate.
