Peeling Back the "Beauty Founder" and "Rich Second Generation" Internet-Celebrity Labels: Why Capital Is Chasing Pika

Source: Original Universe New Voice

Image source: Generated by Wujie AI

If you still haven't heard of Pika Labs, you may be out of the loop: this AI startup, founded only six months ago, has become the "new darling" of Silicon Valley capital.

The lineup of investors behind Pika Labs is dazzling: OpenAI board member Adam D'Angelo, OpenAI founding member Andrej Karpathy, former GitHub CEO Nat Friedman, Hugging Face co-founder Clément Delangue, Giphy co-founder Alex Chung, former YC partner Daniel Gross, and others. It reads like a roll call of half of Silicon Valley's AI industry.

In fact, Pika Labs' popularity is not surprising. The company comes wrapped in attention-grabbing labels: "rich second generation," "beautiful female founder," "academic overachiever turned entrepreneur." But are these internet-famous labels really the secret to Pika Labs' success?

Six months old, four employees, and already the talk of Silicon Valley

When talking about Pika Labs, it is impossible to overlook Guo Wenjing (Demi Guo), a founder born after 1995.

According to public information, Guo Wenjing's mother is a graduate of the Massachusetts Institute of Technology, and her father, Guo Huaqiang, is the controlling shareholder of Xinyada Technology, the first Zhejiang software company listed on the domestic main board. Guo Wenjing was clearly dealt a "rich second generation" script.

Yet that background does not overshadow her own credentials. She was the first student from Zhejiang admitted to Harvard University through early admission. While studying at Harvard, she interned at companies including Meta, Microsoft, Google Brain, and Epic Games. After earning a master's degree in computer science and a bachelor's degree in mathematics, she went to Stanford University for a Ph.D.

It was at Stanford that Guo Wenjing met Chenlin Meng, who would co-found Pika Labs with her. In April of this year, the two dropped out of Stanford to start the company. Just six months later, Pika Labs had stunned the industry.

Pika Labs founder Guo Wenjing (left) and co-founder and CTO Chenlin Meng.

On November 29, Pika Labs announced its latest video generation model, Pika 1.0, which can generate and edit 3D animation, cartoon, and film-style video with virtually no barrier to entry: users simply type a sentence to produce a video in whatever style they want.

The promotional video shows off Pika 1.0's strong semantic understanding: given the prompt "Musk wearing a spacesuit, 3D animation," it produces a cartoon Musk in a spacesuit with a SpaceX rocket in the background. The clarity and coherence of the generated footage far surpass other AI video generation products on the market. It was this promotional video that "ignited" Silicon Valley for Pika.

(Promotional video animation)

In fact, Pika Labs had already stepped into the public eye on November 3. At the launch event for "The Wandering Earth 3," the industrial lab G!Lab was officially established to explore industrialized "3.0" filmmaking with AI technology, with strategic partners including Huawei, Xiaomi, and SenseTime. On the list of strategic partners, in the second row right next to SenseTime, is Pika Labs. Director Guo Fan reportedly praised Pika's advanced AI video research after returning from a visit to the United States in October.

To date, Pika Labs has completed three rounds of financing totaling 55 million US dollars, at an estimated valuation of over 1 billion RMB.

The popularity of Pika Labs is a fresh ripple in the fast-moving AIGC market. At the same time, many are asking why a company only six months old, with just four employees, is being embraced by capital.

Peeling away the "celebrity" facade, what is the value of AI-generated videos?

In the large-model craze sparked by ChatGPT this year, chatbots built on large language models have become the hottest entrepreneurial direction. Among AI content-generation applications, image generation is the leading scenario, followed by writing tools and video generation tools.

Compared with language models, AI video generation relies on an entirely different class of model. It has more in common with image generation models, but the difficulty is higher.

In a media interview, Guo Wenjing explained that video raises many problems images do not: the video must stay smooth and the motion coherent; videos are far larger than images and demand more GPU memory; and the generation logic must be thought through, including whether to generate frame by frame or all at once. Many current models generate everything at once, which is why the resulting videos are so short.

Chenlin Meng added that every frame of a video is itself an image, which makes video much harder than image generation: each frame must be high quality, and adjacent frames must be correlated. The longer the video, the more complex it becomes to keep every frame consistent.

During training, video data means handling many images at once, and the model must be built for that; simply transferring 100 frames to the GPU is a challenge. During inference, because a large number of frames must be generated, generation is slower than for a single image and the computational cost rises accordingly.
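To make that concrete, here is a back-of-envelope sketch in PyTorch of the raw tensor sizes involved; the 512x512 resolution and batch shapes are illustrative assumptions, not Pika's actual configuration:

```python
import torch

image = torch.randn(1, 3, 512, 512)        # one 512x512 RGB image
video = torch.randn(1, 100, 3, 512, 512)   # the 100-frame clip mentioned above

# Size of each tensor in MiB (element count times bytes per element).
mib = lambda t: t.numel() * t.element_size() / 2**20
print(f"image batch: {mib(image):.1f} MiB")   # ~3 MiB
print(f"video batch: {mib(video):.1f} MiB")   # ~300 MiB
```

Even before any network activations are counted, the 100-frame clip occupies roughly 100 times the memory of a single image, which is why video workloads hit GPU memory limits so quickly.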

In addition, controlling video generation is harder: the model must decide what happens in every frame, and users cannot be expected to describe each frame in detail.

Earlier, the website of "Vice" magazine rated an AI-generated clip of "Will Smith eating spaghetti" as the weirdest AI-generated video. In it, a distorted, fish-like Smith tries to shovel piles of spaghetti into his mouth with a fork and with his hands, chewing huge clumps of it. This nightmarish footage was generated from the harmless prompt "Will Smith eating spaghetti."

This shows that the underlying models and technology behind video generation tools still need continuous optimization. Mainstream video generation currently relies mainly on Transformer models and diffusion models. Diffusion-based tools focus on improving video quality, overcoming rough output and missing detail, but at the cost of limiting video length.

Moreover, training diffusion models demands very large memory and compute, so only large companies and well-funded startups can afford the training costs.
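As a rough illustration of why, here is a minimal sketch of a single diffusion training step on a video tensor; the TinyDenoiser stand-in, the noise schedule, and all shapes are simplifying assumptions for illustration, not any real production architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in denoiser: a single Conv3d in place of a real 3D U-Net, just to run the step.
class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv3d(3, 3, kernel_size=3, padding=1)

    def forward(self, x, t):  # a real model would also condition on timestep t
        return self.net(x)

model = TinyDenoiser()
B, C, T, H, W = 1, 3, 16, 64, 64            # batch, channels, frames, height, width
video = torch.randn(B, C, T, H, W)          # a 16-frame clip: 16x the pixels of one image
t = torch.randint(0, 1000, (B,))            # random diffusion timestep per sample
noise = torch.randn_like(video)
alpha_bar = torch.linspace(0.9999, 1e-4, 1000)[t].view(B, 1, 1, 1, 1)
noisy = alpha_bar.sqrt() * video + (1 - alpha_bar).sqrt() * noise   # forward (noising) process
loss = F.mse_loss(model(noisy, t), noise)   # learn to predict the added noise
loss.backward()                             # activations for every frame are held at once
```

A real 3D U-Net with attention layers multiplies these activation costs many times over, which is what puts full-scale training out of reach for most teams.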

However, Original Universe New Voice believes the technical difficulties in AI-generated video are only temporary and will not stop it from becoming another track favored by capital. A breakthrough here would give AI video tools powerful product capabilities and open up far broader application scenarios. With a text description or a few simple operations, an AI video tool can produce complete, high-quality video content, lowering the threshold for video creation and letting non-professionals present content accurately in video form, potentially empowering content production across industries while raising cost-effectiveness and creative output.

Giants racing in the AI-generated video track

With the release of Pika 1.0, we can see that the competition in the AI video field is becoming increasingly intense.

On November 23, Adobe completed its acquisition of the AI startup Rephrase.ai, whose core product uses AI to turn text into avatar-based videos. The deal is Adobe's first acquisition in the AI field.

Ashley Still, Senior Vice President and General Manager of Adobe, stated, "The Rephrase.ai team's expertise in generative AI audio-video technology and text-to-video generation tools will expand Adobe's generative video capabilities."

Original Universe New Voice believes Adobe's acquisition of Rephrase.ai reflects the gradual shift of AI-generated content from text and images toward more complex forms such as video.

As the underlying technology is applied and refined, the AI-generated video track keeps heating up. Models such as Meta's Emu and Gen-2 from Google-backed Runway both support generating video content from text.

Stability AI, a startup focused on developing AI products, has also released its latest model, Stable Video Diffusion, which generates videos from existing images. The model extends the previously released Stable Diffusion text-to-image model and is one of the few openly available AI models that can generate video.
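For readers who want to try it, here is a minimal usage sketch via Hugging Face's diffusers library; it assumes a recent diffusers release that ships StableVideoDiffusionPipeline, a CUDA GPU with enough memory, and a hypothetical local file input.jpg:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the image-to-video pipeline in half precision to reduce GPU memory use.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

image = load_image("input.jpg")                       # hypothetical source image
frames = pipe(image, decode_chunk_size=8).frames[0]   # decode a few frames at a time
export_to_video(frames, "generated.mp4", fps=7)       # write the clip to disk
```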

In China, companies such as Baidu, Alibaba, Tencent, 360, and Kuaishou are also stepping up investment in video-oriented large models and launching their own. Alibaba, for example, has released a text-to-video large model on its AI model community ModelScope. The model has roughly 1.7 billion parameters and currently supports only English input; it is a diffusion model built on a UNet3D structure that produces video by iteratively denoising pure Gaussian noise.
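For reference, a minimal sketch of invoking this model through the modelscope package follows; the pipeline task name and the damo/text-to-video-synthesis checkpoint are assumptions drawn from the public model card, and the weights download on first use:

```python
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

# Build the text-to-video pipeline from the hosted checkpoint.
p = pipeline("text-to-video-synthesis", "damo/text-to-video-synthesis")

# English-only input, as noted above.
result = p({"text": "A panda eating bamboo on a rock."})
print(result[OutputKeys.OUTPUT_VIDEO])  # filesystem path of the generated video
```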

In June of this year, version 4.0 of 360's "360 Brain" large model was released, with cross-modal processing and generation across text, images, speech, and video. Its text-to-video multimodal function, billed as the first of its kind in China, can generate video from any text script without requiring professional skills or source material.

As today's mainstream content medium, video's collision with AI has produced an entirely new way of creating. Industry insiders predict that by 2030, 90% of digital content will be AI-generated, and the global market for AI video generation software is estimated to reach 21.72 billion US dollars by 2032.

It is evident that AI video generation technology is still rapidly iterating and evolving, and the new opportunities it will bring are still unknown. The only certainty is that the competition among the players who have already entered the field is intensifying.

In conclusion

Original Universe New Voice believes that in the new wave of AI, text-to-text and text-to-image have been developing in parallel. ChatGPT represents the breakthrough in text generation, Midjourney has made text-to-image available to everyone, and with the emergence of Pika, the market has opened up infinite possibilities for text-to-video.

Currently, generative AI technology and applications are developing rapidly worldwide, and emerging text and image generation models are reshaping the traditional AI application landscape. As the "touchstone" for the large-scale deployment of AI, AIGC not only helps creators produce richer content faster but also lowers the threshold for creation. It is foreseeable that innovation in large AI models will let more people turn their creativity into reality and bring a future that blends the virtual and the real closer to us.
