OpenAI released Sora 2 on Tuesday, pairing its latest video generation model with a new social app that lets users create, share, and star in AI-generated clips. The company called the release a major step forward in simulating physical reality, with the model now producing synchronized audio alongside video for the first time.
The updated model can generate video clips showing complex physical interactions that earlier systems struggled with. In sample clips, Sora generated Olympic gymnastics routines, backflips on paddleboards, and characters performing triple axels without any apparent distortion or morphing. Unlike previous video generators that bend physics to fulfill text prompts, Sora 2 attempts to model realistic outcomes, including failure.
"Prior video models are overoptimistic—they will morph objects and deform reality to successfully execute upon a text prompt," OpenAI said in its announcement. Sora 2 "is better about obeying the laws of physics compared to prior systems."
The model generates background soundscapes, speech, and sound effects directly from text prompts, a capability previously exclusive to Google's Veo 3. The system also handles multi-shot sequences while maintaining continuity across scene changes, a difficult task that requires keeping track of both the characters and the environment.
OpenAI is selling Sora 2 as the "GPT-3.5 moment for video," comparing it to the language model that preceded ChatGPT. The original Sora, released in February 2024, represented what the company called the "GPT-1 moment"—the first indication that video generation was starting to work at scale.
Stronger competitors quickly left the original Sora behind, so much so that by the time OpenAI shipped the model, Chinese alternatives could already produce better, more coherent video from the same prompts.
For now, the only way to test the model is by invite via the new iOS app, simply named Sora. Unlike the previous model, which could only be accessed through a website and focused on isolated video generations, the app appears to be more polished and versatile, introducing a feature called "cameos" that lets users insert themselves into generated scenes.
After recording a short video to verify identity and capture appearance and voice, users can appear in any Sora-created environment. The feature works for humans, animals, or objects, and users control who can use their likeness.
During the demo, the team at OpenAI generated videos of themselves appearing in ads, doing kickflips, and starring in various scenarios styled like TikTok videos or Instagram Reels.
The app includes a customizable feed using what OpenAI described as a new class of recommender algorithms that accept natural language instructions. The system defaults to showing content from people users follow or interact with, and the company said it does not optimize for time spent scrolling. Built-in mechanisms periodically poll users about their well-being and offer options to adjust feed settings.
For teenagers, the app includes default limits on daily generations visible in the feed and stricter permissions on cameos. Parents can access controls through ChatGPT to manage scroll limits, algorithm personalization, and direct message settings.
Users will maintain full control over their cameos, and can revoke access or remove videos containing their likeness at any time. The app shows users all videos featuring their cameo, including drafts created by others that haven't been published.
Sora 2 is launching in the United States and Canada through the invite-based system, with plans for quick expansion to other countries. The service will be free with what OpenAI called "generous limits," though these remain subject to compute constraints. ChatGPT Pro subscribers get access to an experimental higher-quality version called Sora 2 Pro. The company plans to release Sora 2 through its API, and will keep the earlier Sora 1 Turbo model available.
OpenAI said Sora 2 will eventually offer users the option to pay for additional generations if demand exceeds available computing resources.
For now, if you don't have an invite code, an iPhone, or ChatGPT Pro, the alternatives are limited Veo 3 runs or local video generators like Wan. There are also cheaper options such as Kling, Seedance, Hailuo, or Runway, but none of them pairs a highly realistic video model with social media features the way Sora does.