TEE Brief Manual: A Guide from Basic Concepts to Best Practices for Safe Use

PANews
7 months ago

Securing TEE Apps: A Developer's Guide

Authors: prateek, roshan, siddhartha & linguine (Marlin), krane (Asula)

Compiled by: Shew, GodRealmX

Since Apple announced Private Cloud Compute and NVIDIA made confidential computing available on its GPUs, Trusted Execution Environments (TEEs) have become increasingly popular. Their confidentiality guarantees help protect user data (which may include private keys), while isolation ensures that programs deployed within them cannot be tampered with, whether by humans, other programs, or the operating system. It is therefore not surprising that the Crypto x AI field uses TEEs extensively to build products.

Like any new technology, TEEs are currently undergoing a period of optimistic experimentation. This article aims to provide developers and general readers with a foundational conceptual guide to understand what TEEs are, their security models, common vulnerabilities, and best practices for securely using TEEs. (Note: To make the text easier to understand, we have consciously replaced TEE terminology with simpler equivalents).

What Is a TEE

A TEE is an isolated environment within a processor or data center where programs can run without interference from the rest of the system. Achieving this isolation requires several design measures, chiefly strict access control that governs how the rest of the system can reach programs and data inside the TEE. TEEs are now ubiquitous in mobile phones, servers, PCs, and cloud environments, which makes them accessible and cost-effective.

This may sound abstract. In practice, different servers and cloud providers implement TEEs differently, but the fundamental purpose is the same: to prevent interference from other programs.

Most readers log into devices using biometric information, such as unlocking a phone with a fingerprint. But how do we ensure that malicious applications, websites, or jailbroken operating systems cannot access and steal this biometric information? Beyond encrypting the data, the circuitry in TEE devices does not allow any outside program to access the memory and processor regions that hold sensitive data.

Hardware wallets are another example of TEE application scenarios. Hardware wallets connect to a computer and communicate in a sandboxed manner, but the computer cannot directly access the mnemonic phrases stored in the hardware wallet. In both cases, users trust that the device manufacturers have correctly designed the chips and provided appropriate firmware updates to prevent the export or viewing of confidential data within the TEE.

Security Model

Unfortunately, there are many TEE implementations (Intel SGX, Intel TDX, AMD SEV, AWS Nitro Enclaves, ARM TrustZone), and each requires its own security model analysis. In the remainder of this article, we primarily discuss Intel SGX, Intel TDX, and AWS Nitro, as these systems have larger user bases and more complete development tooling. They are also the TEE systems most commonly used within Web3.

Generally, the workflow of applications deployed in a TEE is as follows:

  1. A "developer" writes some code, which may or may not be open source.
  2. The developer then packages the code into an Enclave Image File (EIF) that can run in the TEE.
  3. The EIF is hosted on a server with a TEE system. In some cases, developers can directly use a personal computer with a TEE to host the EIF for external services.
  4. Users can interact with the application through a predefined interface.

Clearly, there are three potential risks involved:

  • Developer: What exactly does the code used to prepare the EIF do? The EIF code may not match the business logic publicly promoted by the project team and may steal users' private data.
  • Server: Is the TEE server running the expected EIF? Is the EIF really being executed within the TEE? The server may also be running other programs within the TEE.
  • Vendor: Is the design of the TEE secure? Are there backdoors that leak all data within the TEE to the vendor?

Fortunately, there are now ways to address the first two risks, namely Reproducible Builds and Remote Attestation. The vendor risk remains, as discussed later in this section.

So, what is a Reproducible Build? Modern software development often requires importing a large number of dependencies, such as external tools, libraries, or frameworks, each of which may introduce risk. Package managers such as npm therefore record a hash of each dependency file as its unique identifier. When npm detects that a dependency file does not match the recorded hash, the file can be assumed to have been modified.
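
The hash check described above can be sketched in a few lines. This is a minimal illustration, not npm's actual implementation: the function names and the sample package contents are hypothetical, and the `sha512-` plus base64 string only mirrors the style of integrity value found in npm lockfiles.

```python
import base64
import hashlib


def integrity_of(data: bytes) -> str:
    """Return an integrity string: 'sha512-' + base64 of the SHA-512 digest."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_recorded(data: bytes, recorded: str) -> bool:
    """True only if the downloaded bytes hash to the value recorded earlier."""
    return integrity_of(data) == recorded


# Hypothetical dependency contents; a real package would be read from disk.
original = b"module.exports = () => 42;\n"
recorded = integrity_of(original)  # stored when the dependency was pinned

# The same file after someone appended a line to it.
tampered = original + b"// injected\n"

print(matches_recorded(original, recorded))
print(matches_recorded(tampered, recorded))
```

Any change to the file, however small, produces a different digest, which is what lets the recorded value act as an identifier.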

Reproducible Builds can be considered a set of standards that ensure that when the same code is built on any device by following a predetermined process, it ultimately yields the same hash value. In practice, identifiers other than plain hashes can also be used; we refer to these collectively as code measurements.

Nix is a commonly used tool for Reproducible Builds. When the source code of a program is public, anyone can inspect it to check that the developer has not inserted malicious content. Anyone can also use Nix to build the code and check whether the resulting artifacts have the same code measurement/hash as those the project team deployed in production. But how do we know the code measurement of the program inside the TEE? This is where "Remote Attestation" comes in.
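
To make the idea of a code measurement concrete, the sketch below hashes a set of build outputs in a fixed order so that two independent builders obtain the same value. The `measure` function and the sample file contents are assumptions made for illustration; real TEE platforms define their own measurement formats, and a real build would read artifacts from disk.

```python
import hashlib
from typing import Dict


def measure(files: Dict[str, bytes]) -> str:
    """Hash file paths and contents in sorted order, so the result does not
    depend on the order in which the builder happened to produce the files."""
    h = hashlib.sha256()
    for path in sorted(files):
        # Length-prefix each field so field boundaries are unambiguous.
        for field in (path.encode("utf-8"), files[path]):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()


# Two independent builders producing identical artifacts in a different order.
build_a = {"bin/app": b"app-binary-v1", "etc/config": b"mode=prod\n"}
build_b = {"etc/config": b"mode=prod\n", "bin/app": b"app-binary-v1"}

# A build whose binary differs from the published source.
build_c = {"bin/app": b"app-binary-v1-patched", "etc/config": b"mode=prod\n"}

print(measure(build_a) == measure(build_b))
print(measure(build_a) == measure(build_c))
```

A verifier who rebuilds the public source and obtains the same measurement as the deployed artifact has evidence that the deployment corresponds to that source.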

Remote Attestation is a signed message from the TEE platform (the trusted party) that contains the program's code measurement, the TEE platform version, and similar details. It lets external observers know that a particular program is executing in a secure location that no one can access (a genuine TEE running a specific platform version).
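
The two checks a verifier performs on an attestation, that it was really signed by the platform and that it reports the expected measurement, can be sketched as follows. Note the assumptions: real attestations carry asymmetric signatures chained to the vendor's root certificate, whereas this sketch substitutes an HMAC with a demo key purely to stay within the standard library. The document fields and function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict

# ASSUMPTION: an HMAC with a demo key stands in for the vendor's asymmetric
# signature. A real verifier would validate a certificate chain instead.
VENDOR_DEMO_KEY = b"demo-only-vendor-key"


def sign_document(doc: Dict[str, str], key: bytes = VENDOR_DEMO_KEY) -> str:
    """Produce the stand-in signature over a canonical JSON encoding."""
    payload = json.dumps(doc, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_attestation(doc: Dict[str, str], signature: str,
                       expected_measurement: str,
                       key: bytes = VENDOR_DEMO_KEY) -> bool:
    """Accept only if the signature is valid AND the measurement matches."""
    if not hmac.compare_digest(sign_document(doc, key), signature):
        return False  # not issued by the (stand-in) vendor, or altered
    return doc.get("measurement") == expected_measurement


# Hypothetical attestation document; field names are illustrative only.
expected = hashlib.sha256(b"reproducibly built image").hexdigest()
attestation = {"measurement": expected, "platform_version": "1.4.0"}
signature = sign_document(attestation)

# A host that swaps in a different image cannot reuse the old signature.
forged = dict(attestation, measurement=hashlib.sha256(b"other image").hexdigest())

print(verify_attestation(attestation, signature, expected))
print(verify_attestation(forged, signature, expected))
```

The expected measurement comes from the verifier's own reproducible build, which is why the two techniques are used together.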

Reproducible Builds and Remote Attestation enable any user to know the actual code running within the TEE and the TEE platform version information, thus preventing developers or servers from acting maliciously.

However, in the case of TEEs, there is always a need to trust the vendor. If the TEE vendor acts maliciously, they can directly forge Remote Attestation. Therefore, if the vendor is viewed as a potential attack vector, it is advisable not to rely solely on TEEs, but rather to combine them with ZK or consensus protocols.

The Appeal of TEE

In our view, the features that make TEEs particularly appealing, especially for deploying AI Agents, are the following:

  • Performance: TEEs can run LLMs with performance and cost overhead similar to ordinary servers. In contrast, zkML requires significant computational power to generate zk proofs for LLMs.
  • GPU Support: NVIDIA provides TEE computing support in its latest GPU series (Hopper, Blackwell, etc.).
  • Correctness: LLMs are non-deterministic; submitting the same prompt multiple times can yield different results. As a result, multiple nodes (including observers attempting to create fraud proofs) may never reach consensus on the result of an LLM execution. With a TEE, we can instead trust that the LLM cannot be manipulated by malicious actors and that the program always runs as written, which makes TEEs better suited than opML or consensus for ensuring the reliability of LLM inference results.
  • Confidentiality: Data within the TEE is invisible to external programs, so private keys generated or received within the TEE stay confidential. This can be used to assure users that any message signed by such a key comes from the program inside the TEE. Users can entrust their private keys to the TEE, set signing conditions, and confirm that signatures from the TEE meet those conditions.
  • Internet Connectivity: Through certain tools, programs running in the TEE can securely access the internet (without revealing queries or responses to the server running the TEE, while still providing correct data retrieval guarantees to third parties). This is very useful for retrieving information from third-party APIs and can be used to outsource computations to trusted but proprietary model providers.
  • Write Permissions: Unlike zk solutions, code running in the TEE can construct messages (whether tweets or transactions) and send them out via APIs and RPC networks.
  • Developer-Friendly: TEE-related frameworks and SDKs allow people to write code in any language and easily deploy programs to the TEE as if they were on cloud servers.
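
The confidentiality point above, a key held inside the TEE that signs only when pre-set conditions are met, can be illustrated with a small sketch. Everything here is hypothetical: the `PolicySigner` class, the spending limit, and the recipient allowlist are invented for illustration, and an HMAC stands in for a real transaction signature scheme.

```python
import hashlib
import hmac
import json
import secrets
from typing import Iterable, Optional


class PolicySigner:
    """Holds a key that is never returned to callers and signs only
    requests that satisfy the conditions fixed at construction time."""

    def __init__(self, max_amount: int, allowed_recipients: Iterable[str]):
        self._key = secrets.token_bytes(32)  # generated inside, never exported
        self._max_amount = max_amount
        self._allowed = frozenset(allowed_recipients)

    def sign(self, recipient: str, amount: int) -> Optional[str]:
        """Return a signature, or None if the request violates the policy."""
        if recipient not in self._allowed:
            return None
        if not 0 < amount <= self._max_amount:
            return None
        payload = json.dumps({"to": recipient, "amount": amount}, sort_keys=True)
        # ASSUMPTION: HMAC stands in for a real signature scheme.
        return hmac.new(self._key, payload.encode("utf-8"),
                        hashlib.sha256).hexdigest()


signer = PolicySigner(max_amount=100, allowed_recipients={"alice"})
print(signer.sign("alice", 50) is not None)   # within policy
print(signer.sign("alice", 500) is None)      # exceeds the limit
print(signer.sign("mallory", 1) is None)      # recipient not allowed
```

Because the policy is part of the measured code, a user who has verified the attestation knows these are the only conditions under which the key will sign.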

Regardless of the pros and cons, it is currently quite difficult to find alternative solutions for many use cases that utilize TEEs. We believe that the introduction of TEEs further expands the development space for on-chain applications, which may drive the emergence of new application scenarios.

TEE is Not a Silver Bullet

Programs running in a TEE are still susceptible to a range of attacks and errors. Like smart contracts, they can encounter a series of issues. For simplicity, we categorize potential vulnerabilities as follows:

  • Developer Negligence
  • Runtime Vulnerabilities
  • Architectural Design Flaws
  • Operational Issues

Developer Negligence

Whether deliberately or by accident, developers can write code that undermines the security guarantees of programs in the TEE. This includes:

  • Opaque Code: The security model of TEE relies on external verifiability. Code transparency is crucial for verification by external third parties.
  • Issues with Code Measurement: Even if the code is public, without a third party rebuilding the code and checking the code measurement in the remote attestation, it cannot be verified. This is similar to receiving a zk proof without validating it.
  • Insecure Code: Even if keys are generated and managed carefully within the TEE, the program's logic may still leak them during external calls. The code may also contain backdoors or vulnerabilities. Compared to traditional backend development, TEE development demands a higher standard of software development and auditing, similar to smart contract development.
  • Supply Chain Attacks: Modern software development uses a large amount of third-party code. Supply chain attacks pose a significant threat to the integrity of TEEs.

Runtime Vulnerabilities

No matter how cautious developers are, they can still fall victim to runtime vulnerabilities. Developers must carefully consider whether any of the following could impact the security guarantees of their projects:

  • Dynamic Code: It may not always be possible to keep all code transparent. Sometimes, the use case itself requires dynamically executing opaque code loaded into the TEE at runtime. Such code can easily leak secrets or violate invariants, and great care must be taken to prevent this.
  • Dynamic Data: Most applications use external APIs and other data sources during execution. The security model must extend to include these data sources, which play the same role as oracles in DeFi; incorrect or outdated data can lead to disaster. For AI Agents, one example is over-reliance on hosted LLM services such as Claude.
  • Insecure and Unstable Communication: TEEs need to run on servers that contain TEE components. From a security perspective, the server running the TEE is effectively a perfect man-in-the-middle (MitM) between the TEE and the outside world. The server can not only eavesdrop on the TEE's external connections and see what is being sent, but can also filter specific IP addresses, restrict connections, and inject packets into a connection to deceive one party into believing they come from the other.

For example, a matching engine running in a TEE that can handle encrypted transactions cannot provide fair ordering guarantees (anti-MEV) because routers/gateways/hosts can still drop, delay, or prioritize packets based on the source IP address.

Architectural Design Flaws

The technology stack used by TEE applications should be approached with caution. The following issues may arise when building TEE applications:

  • Applications with Large Attack Surfaces: The attack surface of an application is the amount of code that must be completely secure. Code with a large attack surface is very difficult to audit and may hide bugs or exploitable vulnerabilities. Reducing it often conflicts with developer experience. For example, TEE programs that rely on Docker have a much larger attack surface than those that do not, and enclaves that depend on a full-featured operating system have a larger attack surface than those using a minimal one.
  • Portability and Liveness: In Web3, applications must be censorship-resistant. Anyone should be able to start a TEE and take over from inactive system participants, which requires applications within the TEE to be portable. The biggest challenge here is the portability of keys. Some TEE systems have built-in key derivation mechanisms, but keys derived this way cannot be regenerated by the same program on other servers, which ties the TEE program to a single machine and is insufficient for portability.
  • Insecure Roots of Trust: For example, when running an AI Agent in a TEE, how do you verify that a given address belongs to that Agent? Without careful design, the real root of trust may end up being an external third party or a key custody platform rather than the TEE itself.

Operational Issues

Last but not least, there are some practical considerations regarding how to actually run a server executing TEE programs:

  • Insecure Platform Versions: TEE platforms occasionally receive security updates, which are reflected in the platform version in the remote attestation. If your TEE is not running on a secure platform version, hackers can exploit known attack vectors to steal keys from the TEE. Worse, your TEE may be running on a secure platform version today but could be insecure tomorrow.
  • Lack of Physical Security: Despite your best efforts, the TEE may be susceptible to side-channel attacks, which typically require physical access and control over the server where the TEE resides. Therefore, physical security is an important layer of defense. A related concept is cloud attestation, where you can prove that the TEE is running in a cloud data center that has physical security guarantees.
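
The platform version concern above suggests that verifiers should compare the version reported in an attestation against a policy, and repeat the check as advisories are published. The sketch below assumes dotted numeric version strings and a hypothetical revocation list; real platforms encode versions in their own formats.

```python
from typing import Set, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def platform_is_acceptable(reported: str, minimum: str,
                           revoked: Set[str]) -> bool:
    """Reject versions that are explicitly revoked or older than the minimum."""
    if reported in revoked:
        return False
    return parse_version(reported) >= parse_version(minimum)


# Hypothetical policy values, for illustration only.
minimum_version = "1.9.3"
revoked_versions = {"1.9.5"}

print(platform_is_acceptable("1.10.0", minimum_version, revoked_versions))
print(platform_is_acceptable("1.9.5", minimum_version, revoked_versions))

# The same version can become unacceptable once a new advisory is published.
revoked_versions.add("1.10.0")
print(platform_is_acceptable("1.10.0", minimum_version, revoked_versions))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.10.0" would sort before "1.9.3".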

Building Secure TEE Programs

We divide our recommendations into the following points:

  • The safest approach
  • Necessary precautions to take
  • Use case-dependent recommendations

1. The Safest Approach: No External Dependencies

Creating highly secure applications may involve eliminating external dependencies, such as external inputs, APIs, or services, thereby reducing the attack surface. This approach ensures that the application runs independently without external interactions that could compromise its integrity or security. While this strategy may limit the functional diversity of the program, it can provide extremely high security.

If the model runs locally, this level of security can be achieved for most Crypto x AI use cases.

2. Necessary Precautions to Take

Regardless of whether the application has external dependencies, the following are essential!

Treat TEE applications like smart contracts, not backend applications; maintain a low update frequency and conduct rigorous testing.

Building TEE programs should be as rigorous as writing, testing, and updating smart contracts. Like smart contracts, TEEs operate in a highly sensitive and tamper-proof environment, where errors or unintended behavior can lead to severe consequences, including total loss of funds. Thorough audits, extensive testing, and minimal, carefully audited updates are crucial for ensuring the integrity and reliability of TEE-based applications.

Audit Code and Check Build Pipeline

The security of an application depends not only on the code itself but also on the tools used during the build process. A secure build pipeline is critical for preventing vulnerabilities. TEEs only guarantee that the provided code will run as expected, but they cannot fix defects introduced during the build process.

To mitigate risks, the code must undergo rigorous testing and auditing to eliminate errors and prevent unnecessary information leaks. Additionally, reproducible builds play a crucial role, especially when the code is developed by one party and used by another. Reproducible builds allow anyone to verify that the program executed within the TEE matches the original source code, ensuring transparency and trust. Without reproducible builds, determining the exact content of the program executing within the TEE is nearly impossible, jeopardizing the application's security.

For example, the source code of DeepWorm (a project running a worm brain simulation model in the TEE) is fully open. The executing program within the TEE is built reproducibly using the Nix pipeline.

Use Audited or Verified Libraries

When handling sensitive data in TEE programs, only use audited libraries for key management and private data processing. Unverified libraries may expose keys and compromise the security of the application. Prioritize well-reviewed, security-focused dependencies to maintain the confidentiality and integrity of the data.

Always Validate Proofs from the TEE

Users interacting with the TEE must validate the remote attestations or other verification mechanisms it produces to ensure secure and trustworthy interactions. Without these checks, the server may manipulate responses, making it impossible to distinguish genuine TEE outputs from tampered data. Remote attestation provides critical proof of the codebase and configuration running within the TEE, allowing us to determine whether the program executing within the TEE matches expectations.

Attestations can be verified on-chain (Intel SGX, AWS Nitro), verified off-chain using ZK proofs (Intel SGX, AWS Nitro), or checked by users themselves or through hosted services (such as t16z or MarlinHub).

3. Use Case-Dependent Recommendations

Depending on the target use case and structure of the application, the following tips may help make your application more secure.

Ensure that user interactions with the TEE are always executed over secure channels

The server hosting the TEE is inherently untrusted and can intercept and modify communications. In some cases it may be acceptable for the server to read data as long as it cannot alter it, while in others even reading is undesirable. To mitigate these risks, it is crucial to establish a secure end-to-end encrypted channel between the user and the TEE. At a minimum, ensure that messages carry signatures so their authenticity and origin can be verified. Users should also always check the remote attestation provided by the TEE to verify that they are communicating with the correct TEE. Together, these measures protect the integrity and confidentiality of the communication.

For example, Oyster can support secure TLS issuance using CAA records and RFC8657. Additionally, it offers a TEE-native TLS protocol called Scallop, which does not rely on WebPKI.
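
One common way to tie a secure channel to a specific TEE is to have the attestation carry a fingerprint of the public key the TEE uses for that channel; the user then checks that the key presented during the handshake matches. The sketch below assumes the attestation format has a field for such application-defined data and uses placeholder byte strings instead of real key material. The function names are hypothetical.

```python
import hashlib
import hmac


def key_fingerprint(public_key: bytes) -> str:
    """SHA-256 fingerprint of the raw public key bytes, hex encoded."""
    return hashlib.sha256(public_key).hexdigest()


def channel_is_bound(attested_fingerprint: str, presented_key: bytes) -> bool:
    """True if the key seen on the channel is the one named in the attestation."""
    return hmac.compare_digest(attested_fingerprint,
                               key_fingerprint(presented_key))


# Placeholder bytes standing in for real public keys.
enclave_key = b"\x01" * 32   # key generated inside the TEE
host_key = b"\x02" * 32      # key substituted by a man-in-the-middle host

# ASSUMPTION: this value was read from an already verified attestation.
attested = key_fingerprint(enclave_key)

print(channel_is_bound(attested, enclave_key))
print(channel_is_bound(attested, host_key))
```

With this check in place, a host that terminates the connection itself is detected, because it cannot present a key whose fingerprint appears in a valid attestation.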

Be Aware that TEE Memory is Transient

TEE memory is transient, meaning that when the TEE shuts down, its contents (including encryption keys) are lost. Without a secure mechanism to preserve this information, critical data may become permanently inaccessible, potentially jeopardizing funds or operations.

Multi-party computation (MPC) networks with decentralized storage systems like IPFS can serve as a solution to this issue. MPC networks split keys across multiple nodes, ensuring that no single node holds the complete key while allowing the network to reconstruct the key when needed. Data encrypted with this key can be securely stored on IPFS.

If necessary, the MPC network can provide keys to new TEE servers running the same image, provided specific conditions are met. This approach ensures resilience and robust security, maintaining data accessibility and confidentiality even in untrusted environments.
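
The idea of splitting a key so that no single node holds it can be illustrated with Shamir secret sharing, used here as a stand-in technique: it is not necessarily what any particular MPC network implements, and production systems typically avoid reconstructing the key in one place. The function names and the 3-of-5 parameters are assumptions for the sketch.

```python
import secrets
from typing import List, Tuple

# A Mersenne prime comfortably larger than a 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, shares: int) -> List[Tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and shares")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789  # stand-in for a TEE's encryption key
node_shares = split_secret(key, threshold=3, shares=5)
print(recover_secret(node_shares[:3]) == key)
print(recover_secret([node_shares[0], node_shares[2], node_shares[4]]) == key)
```

In the setting described above, the nodes would release their shares only to a TEE whose attestation shows it is running the expected image.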

Another solution is for the TEE to delegate the relevant transactions to different MPC servers, which sign them and aggregate the signatures before the transactions are finalized on-chain. This method is much less flexible and cannot be used to store API keys, passwords, or arbitrary data (without a trusted third-party storage service).

Reduce Attack Surface

For security-critical use cases, it is worth minimizing peripheral dependencies as far as possible, even at the cost of developer experience. For example, Dstack ships with a minimal Yocto-based kernel that includes only the modules Dstack needs to function. It may even be worthwhile to use older technologies such as SGX (over TDX), since they do not require the bootloader or operating system to be part of the TEE.

Physical Isolation

The security of the TEE can be further enhanced by physically isolating it from potential human intervention. While we can trust data centers and cloud providers to provide physical security by hosting TEE servers in their facilities, projects like Spacecoin are exploring a rather interesting alternative: space. The SpaceTEE paper relies on security measures such as measuring the inertia tensor after launch to verify whether the satellite deviates from its expected trajectory during its entry into orbit.

Multiprover

Just as Ethereum relies on multiple client implementations to reduce the risk of a bug affecting the entire network, a multiprover uses different TEE implementations to improve security and resilience. By running the same computation across multiple TEE platforms, a multiprover ensures that a vulnerability in one TEE implementation does not compromise the entire application. This approach requires the computation to be deterministic, or a defined consensus among the TEE implementations in non-deterministic cases, but it offers significant advantages such as fault isolation, redundancy, and cross-validation, making it a good choice for applications that require reliability guarantees.
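
The cross-validation step of a multiprover can be sketched as a simple agreement rule over the outputs reported by each platform. The platform labels, the `accept_result` function, and the agreement thresholds below are illustrative assumptions; the sketch also assumes each output has already been tied to a verified attestation.

```python
from collections import Counter
from typing import Dict, Optional


def accept_result(results: Dict[str, str], required: int) -> Optional[str]:
    """Return the output reported by at least `required` platforms, else None.

    `results` maps a platform label to the output it reported for the same
    deterministic computation.
    """
    if not results:
        return None
    output, votes = Counter(results.values()).most_common(1)[0]
    return output if votes >= required else None


# Hypothetical outputs from three independent TEE implementations.
agreeing = {"sgx": "0xabc", "tdx": "0xabc", "nitro": "0xabc"}
one_faulty = {"sgx": "0xabc", "tdx": "0xabc", "nitro": "0xdef"}

print(accept_result(agreeing, required=3))    # all three agree
print(accept_result(one_faulty, required=3))  # unanimity not reached
print(accept_result(one_faulty, required=2))  # tolerated under a 2-of-3 rule
```

Requiring unanimity halts the application when any platform disagrees, while a 2-of-3 rule keeps it running at the cost of trusting the majority; which trade-off fits depends on the application.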

Looking Ahead

TEEs have clearly become a very exciting field. As mentioned earlier, the ubiquity of AI and its ongoing access to sensitive user data mean that large tech companies like Apple and NVIDIA are using TEEs in their own products and making them part of their offerings.

On the other hand, the crypto community has always placed a strong emphasis on security. As developers attempt to expand on-chain applications and use cases, we have seen TEEs become popular as a solution that provides the right balance between functionality and trust assumptions. While TEEs are not as trust-minimized as complete ZK solutions, we expect TEEs to become a pathway for slowly merging the products of Web3 companies and large tech companies.
