On December 19th, OpenAI announced the beta version of the "Preparedness Framework" on its official website, aimed at monitoring and managing the potential dangers of increasingly powerful artificial intelligence models.
Recently, OpenAI has been embroiled in controversy due to internal disputes, raising questions about its governance and accountability. At the same time, OpenAI's measures to ensure the safety of artificial intelligence models have been receiving increasing attention.
At the end of October, OpenAI announced the establishment of a "Preparedness" team to monitor and evaluate the capabilities and risks of cutting-edge models, and to develop and maintain a Risk-Informed Development Policy (RDP). The team will also work closely with the Safety Systems team, the Superalignment team, and OpenAI's other safety and policy teams.
Building on this, OpenAI has now released a document called the "Preparedness Framework," outlining how it will "track, assess, predict, and mitigate catastrophic risks," with the aim of ensuring the safety of cutting-edge AI models and addressing at least some of the concerns raised about its governance.
Data-driven approach to AI safety
One of the core mechanisms of OpenAI's "Preparedness Framework" is a risk "scorecard" for every cutting-edge AI model, which assesses and tracks indicators of potential risk such as the model's capabilities, vulnerabilities, and impacts.
According to the introduction, the scorecards for all models will be re-evaluated repeatedly and updated regularly, triggering reviews and interventions when specific risk thresholds are reached.
OpenAI grades risk on four levels: "low," "medium," "high," and "critical." It also lists four categories of risk that could lead to catastrophic consequences: cybersecurity, CBRN (chemical, biological, radiological, and nuclear) threats, persuasion, and model autonomy.
OpenAI emphasizes that only models whose post-mitigation score is "medium" or below are eligible for deployment; models whose score remains "high" after mitigation cannot be deployed but may be developed further. OpenAI also states that additional security measures will be put in place for models with high or critical pre-mitigation risk.
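To make these thresholds concrete, here is a minimal, purely illustrative Python sketch of how such a gating rule could be expressed. The `RiskLevel` ordering, the `Scorecard` structure, and the function name are assumptions made for illustration; they are not OpenAI's actual implementation or API.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskLevel(IntEnum):
    """Ordered risk levels used by the framework's scorecards (illustrative)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


@dataclass
class Scorecard:
    """Hypothetical per-category scorecard with pre- and post-mitigation scores."""
    category: str  # e.g. "cybersecurity", "CBRN", "persuasion", "model autonomy"
    pre_mitigation: RiskLevel
    post_mitigation: RiskLevel


def gate(cards: list[Scorecard]) -> dict[str, bool]:
    """Apply the thresholds described in the article to a model's scorecards."""
    worst_post = max(card.post_mitigation for card in cards)
    worst_pre = max(card.pre_mitigation for card in cards)
    return {
        # Deployable only if every post-mitigation score is "medium" or below.
        "can_deploy": worst_post <= RiskLevel.MEDIUM,
        # Further development allowed while post-mitigation scores stay at "high" or below.
        "can_develop": worst_post <= RiskLevel.HIGH,
        # Extra security measures kick in at high or critical pre-mitigation risk.
        "needs_extra_security": worst_pre >= RiskLevel.HIGH,
    }


if __name__ == "__main__":
    cards = [
        Scorecard("cybersecurity", RiskLevel.HIGH, RiskLevel.MEDIUM),
        Scorecard("model autonomy", RiskLevel.MEDIUM, RiskLevel.LOW),
    ]
    print(gate(cards))
    # {'can_deploy': True, 'can_develop': True, 'needs_extra_security': True}
```

Using an ordered enum keeps the "medium or below" and "high or below" comparisons explicit, and separates the deployment and development gates from the pre-mitigation security trigger.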
Furthermore, OpenAI will set up a cross-functional "Safety Advisory Group" to oversee the technical work and to build an operational structure for safety decision-making.
Initially, the Preparedness team will drive technical work, inspect and evaluate cutting-edge models, and regularly submit reports to the internal Safety Advisory Group. Subsequently, the Safety Advisory Group will review all reports and submit them to both the leadership and the board of directors.
It is worth noting that, as OpenAI points out, while leadership makes the decisions, the board of directors retains the right to overturn them.
In addition to the above measures, the Preparedness Framework includes another key element: allowing "qualified independent third parties" outside OpenAI to test its technology and provide feedback. OpenAI will also work closely with external parties, as well as internal teams such as Safety Systems, to track instances of real-world misuse. These measures help ensure broader scrutiny and validation of AI model safety.
Currently, this safety framework is still in beta. OpenAI also stresses that the Preparedness Framework is not a static document but a dynamic, evolving one: the company will continuously improve and update it based on new data, feedback, and research, and will share its findings and best practices with the AI community.
So, how do industry professionals view this framework?
A sharp contrast with Anthropic's policy
Before OpenAI announced this news, its main competitor, Anthropic, had already released several important statements on AI safety.
Founded by former OpenAI researchers, Anthropic is also a leading AI lab. In September of this year, it released the "Responsible Scaling Policy," aiming to adopt a series of technical and organizational protocols to help manage the risks of increasingly powerful AI systems.
In the document, Anthropic defines a framework of AI Safety Levels (ASL) to address catastrophic risks, loosely modeled on the Biosafety Level (BSL) standards the US government uses for handling dangerous biological materials. The basic idea is to require security, assurance, and operational standards commensurate with a model's potential for catastrophic risk, with higher ASL levels demanding stricter demonstrations of safety.
The ASL framework defines the following levels (a short illustrative sketch follows this list):
- ASL-1 refers to systems that pose no meaningful catastrophic risk, such as a 2018-era LLM or an AI system that only plays chess.
- ASL-2 refers to systems that show early signs of dangerous capabilities, such as the ability to give instructions for building biological weapons, where the information is not yet useful because the model is unreliable or adds nothing beyond what a search engine can provide. Current LLMs, including Claude, appear to be ASL-2.
- ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared with non-AI baselines (such as search engines or textbooks), or that show low-level autonomous capabilities.
- ASL-4 and higher levels (ASL-5+) are not yet defined, as they are too far removed from current systems, but will likely involve qualitative escalations in the potential for catastrophic misuse and in autonomy.
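As a rough, non-authoritative illustration of this tiered structure, the sketch below encodes the four levels as a Python enum together with a toy mapping to the kind of standard each level calls for. The wording paraphrases the descriptions above; the names and the mapping are assumptions for illustration, not Anthropic's actual definitions.

```python
from enum import Enum


class ASL(Enum):
    """Anthropic's AI Safety Levels, paraphrased from the Responsible Scaling Policy."""
    ASL_1 = "no meaningful catastrophic risk (e.g. a 2018-era LLM or a chess engine)"
    ASL_2 = "early signs of dangerous capabilities, but unreliable or no better than a search engine"
    ASL_3 = "substantially raises catastrophic misuse risk vs. non-AI baselines, or low-level autonomy"
    ASL_4 = "not yet defined; qualitative escalation in misuse potential and autonomy"


# Illustrative only: the policy's core idea is that higher levels demand stricter
# security, assurance, and operational standards before scaling continues.
REQUIRED_STANDARD = {
    ASL.ASL_1: "baseline practices",
    ASL.ASL_2: "current security and deployment standards",
    ASL.ASL_3: "hardened security; pause scaling until safety is demonstrated",
    ASL.ASL_4: "to be defined before such systems are built",
}

print(REQUIRED_STANDARD[ASL.ASL_2])  # -> "current security and deployment standards"
```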
The two frameworks clearly differ in structure and approach. Anthropic's policy is more formal and prescriptive, tying security measures directly to model capabilities and suspending further development whenever safety cannot be demonstrated.
In contrast, OpenAI's framework is more flexible and adaptable, setting general risk thresholds that trigger reviews but not predefined levels.
Experts believe that both frameworks have their pros and cons, but Anthropic's approach may be more advantageous in incentivizing and enforcing security standards.
In their analysis, Anthropic's policy builds safety into the development process proactively rather than reactively, and this stricter approach helps reduce risk when AI models are deployed. OpenAI's Preparedness Framework is more lenient, leaving more room for human judgment and error, and may prove controversial because it lacks predefined safety levels.
That said, each approach has trade-offs. While Anthropic's policy defines safety standards strictly, it may also lack flexibility and end up constraining certain kinds of innovation.
Even so, some observers believe OpenAI is catching up on safety protocols. Despite their differences, both frameworks represent an important step forward for AI safety, a field often overshadowed by the pursuit of AI capabilities.
As AI models become more powerful and widespread, collaboration and coordination on safety techniques among leading labs and the broader set of stakeholders are crucial to ensuring that AI is used beneficially and ethically for humanity.