First-hand | Kuaishou's self-developed text-to-image large model "Kolors" ("Ketu") has recently launched for internal testing

巴比特
1 year ago

Source: GenAI New World

Author: Li Hezi

Image Source: Generated by Wujie AI

Since the second half of this year, Kuaishou has been making frequent moves in its large model business.

GenAI New World has learned first-hand of Kuaishou's latest progress in the AIGC field: its self-developed text-to-image large model "Kolors" is now in full internal testing within the company.

Kuaishou launched the large language model "KwaiYii" last month, and "Kolors" followed less than a month later. With it, Kuaishou's large model business now covers text-to-image ("Wenshengtu") in addition to text-to-text ("Wenshengwen").

According to developers on the "Kolors" project team, the model has three prominent features: powerful text understanding, rich detail depiction, and diverse style transformation.

Judging from the homepage layout of the internal test version of "Kolors", it already has a mature product prototype.

Kolors Homepage

Although the "Kolors" large model has not yet started external testing, we can still get a glimpse of its specific performance from the "AI Play Review" function launched on Kuaishou's app two days ago.

"AI Play Review" was opened for internal testing on the main Kuaishou site on September 15, with technical support provided by the "Kolors" text-to-image large model.

Just when we thought this was another text-to-image tool aimed purely at short video creators, one that would show up in the video-editing backend, Kuaishou unexpectedly placed the "AI Play Review" function in the comment section.

In other words, in the future, when you comment on a certain short video, you may no longer need to painstakingly find suitable images/emojis, as Kuaishou can directly generate one for you.

On the day "AI Play Review" launched, users who had obtained testing access were already posting "wishes" in Kuaishou's official comment section.

GenAI New World also obtained testing access right away, so let's take a look at our first-hand experience.

First, open the comment section of any video on the Kuaishou app and find the "AI" button on the right side of the comment input box to easily enter the "AI Play Review" function interface.

To generate images, you first need to enter a description of at least six words. Once you have, the AI icon on the right lights up automatically, indicating that you can start generating.

Let's start with a fairly conventional description to see whether it understands, such as "Sunshine shining on the beach, a child playing by the sea."

After a few seconds, a set of AI-generated images appears below, each labeled with a different style, including Makoto Shinkai, pixel art, realistic anime, Chinese style, cyberpunk, and Pixar. There are reportedly more than a dozen styles in total. If you are not satisfied with the generated set, you can click the "Change to see" button in the upper right corner to regenerate it.

From the generated images, the understanding seems to be quite accurate.

Generated Image

When we entered a few lines of classical Chinese poetry, it surprisingly understood them:

Chinese Poetry

Looking at the details, both the fine veins of the leaves and petals and the distinct stamens are handled well, which is rare (even the withered edges of the leaves are rendered).

Image Details

The Kuaishou AI research team is said to have modified the underlying formulas of both the denoising algorithm and the noise-addition process, and to have selected a batch of highly detailed, highly aesthetic data for the model to learn from in the later stage of training. The richer details and textures in the generated images are presumably related to these changes.
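For readers unfamiliar with the terms, here is a minimal sketch of what "noise addition" and "denoising" mean in a standard textbook diffusion model. It is not Kuaishou's modified formula, which the article does not disclose. The function names, the linear schedule, and its default values are illustrative assumptions, and the example uses a three-value list in place of a real image.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (a common textbook default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] is the product of (1 - beta) over steps 0..t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Noise addition: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with a = alpha_bar[t]."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, eps)]
    return xt, eps

def predict_x0(xt, eps_hat, t, alpha_bars):
    """Denoising estimate: invert the noise-addition formula given a noise prediction."""
    a = alpha_bars[t]
    return [(v - math.sqrt(1.0 - a) * e) / math.sqrt(a) for v, e in zip(xt, eps_hat)]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

pixels = [0.2, -0.5, 0.9]  # stand-in for image values (hypothetical data)
noisy, noise = add_noise(pixels, 500, alpha_bars, rng)
# With a perfect noise prediction, the original values are recovered.
recovered = predict_x0(noisy, noise, 500, alpha_bars)
```

In a real model, a neural network supplies the noise prediction that `predict_x0` receives here directly. Changing the schedule or either formula alters how much fine detail survives at each step, which is the kind of change the team is described as making.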

Let's input some Chinese context-specific phrases:

Chinese Phrases

Not bad, at least it didn't draw fish-flavored shredded pork as a fish and a piece of meat, and you can even smell the spiciness of Mapo Tofu through the screen (there are even sprinkled green onions on top).

It even understands what "Gan Fan Ren" is (Gan Fan is originally a Southwest Mandarin dialect):

Gan Fan Ren

Understanding Chinese expressions appears to have been a design goal for "Kolors" from the very beginning.

According to Kuaishou's developers, Kuaishou AI has tens of billions of image-text training pairs, drawn from open source communities and synthesized with its self-developed AI technology, covering 30 million common Chinese entity concepts. A powerful Chinese CLIP model was developed on this basis.

The text understanding module of "Kolors" combines Kuaishou's self-developed Chinese LLM with CLIP's fused image-text features. This allows the model to better understand concepts specific to Chinese and reduces problems common in text-to-image generation, such as mishandled complex concepts and attribute confusion.

However, since the "AI Play Review" function is placed in the comment section, let's also see how it performs in the commenting scenario.

Riding on a hot topic, we entered a description of a mood, "Unable to get a train ticket, very frustrated," and "AI Play Review" still generated some interesting images that reflect the emotion:

Frustrated Image

And when I entered the classic meme phrase "I don't understand, but I am deeply shocked," I found that the simple sketch style and the Makoto Shinkai style fit best (candidates for the most popular generation styles?):

Shocked Image

Overall, "AI Play Review" has its merits, provided you use appropriate prompts and relatively common descriptions. Given how freely and colloquially netizens express their emotions in comments, the function still has plenty of room for exploration.

In fact, "AI Play Review" reveals some of Kuaishou's thinking on the application of large models: it places great emphasis on "implementation."

Unlike the many companies committed to building general-purpose large model products, Kuaishou, as a short video content community, pays more attention to integrating large model features with its community and cares about whether users can really use them.

The "AI Dialogue" function, launched in August on top of the "KwaiYii" large model, is one example. Part of it is built around the search scenario, letting users find platform content more conveniently and accurately (answers come with related videos and encyclopedia links). The same user-oriented service concept extends to "AI Play Review."

As for why the "Kolors" large model was first deployed in the comment section of Kuaishou's main site, Kuaishou's official response is:

"The cumulative number of mutual interactions of Kuaishou app users exceeds 31.1 billion pairs, a year-on-year increase of nearly 50%, and the daily total interaction (including likes, comments, and reposts, etc.) reaches 8 billion times. The comment section of short video with strong user stickiness has become one of the best landing application scenarios for AIGC capabilities… (Spending time and effort to find matching images) to a large extent suppresses the willingness of users to post comments. AI Play Review can greatly enhance the enthusiasm and satisfaction of users participating in comments."

Of course, since the "Kolors" large model has been developed into a product, there may be even greater ambitions behind it for Kuaishou.

During the first quarter earnings call in May this year, Kuaishou CEO Su Hua revealed the progress of Kuaishou's large model business to the outside world for the first time: a large model development team had been formed, and, relying on the company's past technical accumulation in AIGC algorithms and large-scale language models, the development and training of large models was advancing as planned.

Two months later, on July 8, Kuaishou announced the start of internal testing for the "Smart Search Q&A" product, marking the beginning of the application of large models.

This was followed by a relatively intensive run of product and feature releases: on August 8, Kuaishou began internal testing of the "AI Dialogue" function; on August 21, the large language model "KwaiYii" started internal testing; on September 15, the main site began internal testing of the "AI Play Review" function; and today, the "Kolors" text-to-image large model has surfaced.

It is worth noting that on August 10, Kuaishou, for the first time, officially and in detail introduced the progress of its large model business at the Photosynthesis Creator Conference held for platform creators.

Having previously revealed very little about its large model business, Kuaishou announced multiple developments at this conference. These included generation capabilities for text, images, videos, and even 3D materials and music audio, covering stages of video creation such as creative inspiration, material mining, and editing and production. Kuaishou also highlighted "Kuaishou Smart Broadcast," a solution for quickly creating a user's own digital twin for live broadcast scenarios.

Even the annual conference itself, from the posters to the way the guests appeared, was unusually full of AIGC elements.

Starting from scratch with self-developed large models, Kuaishou has quietly accumulated a great deal and gradually built up what it calls the "full-modal large model AIGC solution."

So, with the launch of the "Kolors" large model this time, Kuaishou can be said to have come prepared.

Although major companies in China have launched their own large model products one after another this year, we have not yet seen a truly impressive product from a content company. It is exciting to think about what new paths content companies could explore once they have trained a reliable large model.
