The Global AI Music Festival Hears China's Voice for the First Time

6 months ago

Summer is coming, and every corner of the AI field is heating up. If we had to pick the most popular AI application of recent months, AI music creation would be near the top of the list.

Just a month ago, the overseas startup Suno released its new AI music generation model, Suno V3, which quickly generates music in a variety of styles from users' natural-language prompts and sparked heated discussion worldwide. Shortly afterward, Udio, a music generation model built by former DeepMind team members, was also released. It can produce very realistic music and generate long, multi-section pieces.

Competition among AI music generation models has suddenly become intense. Add projects such as OpenAI's MuseNet, Google's MusicLM, and Meta's MusicGen, and it feels as though we are watching a lively, colorful global AI music festival this year.

One thing about this festival deserves attention: this time, there is no time lag between Chinese technology and the global AI frontier. We are no longer followers playing catch-up. We joined this feast of music and AI from the very start, playing our own melody.

On April 2, the AI music generation model "天工SkyMusic," created by Kunlun Tech, opened for free invite-only beta testing. It was officially released on April 17.

天工SkyMusic is the only publicly available AI music generation model in China, and also China's first state-of-the-art (SOTA) music model.

It has kept pace with the globally influential Suno V3 and Udio while showing stronger technical capabilities in several areas. In a side-by-side evaluation against Suno V3, 天工SkyMusic significantly outperformed its competitor in vocal and BGM sound quality, vocal naturalness, and intelligibility of pronunciation. With a comprehensive score of 6.65, it surpassed Suno V3 to become the latest global SOTA model for AI music.

"Chinese AI is not absent" is a melody we have long been waiting to hear. What enabled Kunlun Tech to deliver this performance? What industrial and social value do 天工SkyMusic and the underlying 天工3.0 offer?

Let's step into this global AI music festival and listen to a symphony about "Chinese AI is not absent" in this hot summer.

天工SkyMusic: The Oriental AI Music Festival

For a Chinese AI music generation model to truly be "not absent," it must not only reach the industry frontier from the very start but also give convincing answers in terms of capability.

Since its release, 天工SkyMusic has received very positive feedback. The responses from the media, musicians, industry experts, and a large number of users show that it has the strength to compete among the world's AI music generation models. The AI music festival is not only taking place in Europe and America. It is resonating in the East as well.

First, let's take a closer look at the specific technical capabilities of 天工SkyMusic.

By applying to music audio a DiT (Diffusion Transformer) architecture similar to the one behind Sora, 天工SkyMusic performs well across several core capabilities of AI music generation models.

These include high-quality music generation capabilities, highly realistic vocal modeling capabilities, rich lyric paragraph control capabilities, extensive mastery of music styles, and flexible music expression.

For example, AI music models from Europe and America tend to offer rich instrumental detail but deliver only average vocal rendering. In contrast, Kunlun Tech's 天工SkyMusic has undergone specialized training for vocal naturalness and intelligibility, which makes its pronunciation clear and free of distortion and makes the generated music "deceptively real."

In addition, the music created by 天工SkyMusic shows a grasp of many musical styles and generation needs, for example reworking a pop song into an epic-style piece, or setting classical poetry to music with traditional Chinese character and rhythm.

A wide range of generation cases shows that 天工SkyMusic fits many usage scenarios and a diverse user base, such as:

  1. Music professionals can use 天工SkyMusic to find inspiration and assist in music creation. For example, creators can enter "themes" such as family or love and use the generated lyrics and melodies to spark ideas and explore creative boundaries.

  2. Short video creators and other content creators can broaden their creative range through music generation, which lowers the threshold for making music. For example, they can use 天工SkyMusic to adapt "internet hits" into new video BGM. With 天工SkyMusic, the following hit song can bring you a completely different experience.

  3. Music enthusiasts and fans can enjoy a more diverse music experience and engage more deeply with their favorite genres and styles. For example, they can swap in a different voice to reinterpret a favorite song.

  4. In educational settings, 天工SkyMusic can assist music education by letting learners experience the inner logic and rich techniques of music creation. It can also support education in traditional culture and musical instruments. For example, 天工SkyMusic can generate music from classical poetry, helping students grasp the essence of classical culture more vividly and accurately.

Beyond this application value, 天工SkyMusic also fills a gap in the industry as the earliest and currently the only AI music generation model in China. Its arrival gives Chinese users a model that better suits the habits of Chinese music creation and offers better support for the Chinese language. It is also completely free, with no usage limits or additional barriers to use, something foreign AI music models do not offer.

From a technical perspective, 天工SkyMusic also has distinct advantages over projects such as MuseNet, MusicLM, and MusicGen.

In addition to the vocal synthesis and singing capabilities mentioned earlier, 天工SkyMusic handles musical styles with more refinement and variety. It can control emotional shifts through lyrics and perform singing techniques such as vibrato, opera, and chanting, making the generated music more emotionally rich and better suited to its context. On this basis, 天工SkyMusic supports styles such as rap, folk, funk, traditional Chinese, and electronic music, so users can tailor the style to their preferences.

Overall, 天工SkyMusic has opened up the field of Chinese AI music generation models, become China's first SOTA model for music AI, and significantly improved the vocal performance of AI music generation.

天工3.0: The World's Largest Open-Source MoE Model

天工SkyMusic's ability to join the global AI music festival rests on two factors: a keen sense of strategic direction and a solid technological foundation.

The technological foundation behind 天工SkyMusic is the recently released "天工3.0."

天工3.0 uses a 400-billion-parameter Mixture-of-Experts (MoE) model, currently one of the largest and most powerful MoE models in the world. Compared with the previous-generation 天工2.0 MoE model, it shows significant improvements in semantic understanding, logical reasoning, generalization, handling of uncertain knowledge, and learning ability. Its technical knowledge capabilities improved by more than 20%, and its mathematics, reasoning, coding, and creative writing capabilities improved by more than 30%.
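
For readers unfamiliar with the term, the general idea behind a Mixture-of-Experts layer can be sketched in a few lines of Python. This is a toy illustration of top-k expert routing in general, not 天工3.0's actual implementation. The function names (`route_top_k`, `moe_forward`), the value of `k`, and the scalar "experts" are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    gate_scores: Sequence[float],
    k: int = 2,
) -> float:
    """Run only the selected experts and mix their outputs by gate weight."""
    routed = route_top_k(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in routed)


if __name__ == "__main__":
    # Three hypothetical experts; the gate strongly disfavours the third.
    toy_experts = [lambda v: v, lambda v: 2 * v, lambda v: 10 * v]
    print(moe_forward(2.0, toy_experts, gate_scores=[0.0, 0.0, -100.0], k=2))
```

The point of the design is that only `k` experts run for each input, so the total parameter count can grow very large while the compute per input stays comparatively small.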

Specifically, 天工3.0 has brought comprehensive upgrades in four directions:

First, logical reasoning is stronger. This lets 天工3.0 handle information more accurately and efficiently in practical applications. For example, in the research mode of 天工3.0 AI search, it can expand a simple user instruction into related questions and decide in real time whether a given passage requires an online search. When analyzing a specific industry, it can summarize relevant events, break down the industry chain, and present the results in a structured format or as a mind map, tying AIGC capabilities more closely to industry applications.

Second, semantic understanding is stronger. 天工3.0 can better understand and process complex semantics in users' natural-language queries, including metaphors and polysemous words. For example, it can break down and refine a user's query, and even ask follow-up questions, to better handle uncertain knowledge and meet diverse user needs.

Third, 天工3.0 adds specialized Agent training. It has been trained specifically to plan, call, and combine external tools and information on its own, which enables it to generate and invoke code independently and complete complex user requests such as industry research, product evaluations, information analysis, image generation, and chart drawing.

For enterprise (B2B) users, 天工3.0 has also upgraded its knowledge base capability, its ability to call arbitrary tools, and its ability to follow complex role instructions. Enterprise users can build their own knowledge bases and Agents by uploading knowledge documents, and can construct Agents that call tools automatically and follow complex instructions.

Finally, content generation has been comprehensively upgraded. Compared with 天工2.0, 天工3.0 has significantly stronger content creation capabilities, covering AI music generation, AI voice, AI conversation, AI anime character generation, and more. Through specialized Agent training, it can also generate images, analyze content, and build charts in real time from text requests made during a conversation.

The most critical feature of 天工3.0 is independent thinking.

天工3.0 can break down and optimize complex tasks, reason independently at each step, and decide at each step whether to call a tool and which one. Building on this ability, 天工3.0 adds internet access, text-to-image, and coding capabilities, and improves the performance of AI search.
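
The step-by-step behavior described above can be pictured with a small sketch. It is a simplified illustration only and does not reflect how 天工3.0 is built. The helper names (`pick_tool`, `run_task`), the keyword-matching rule, and the two placeholder tools are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Tool = Callable[[str], str]


def pick_tool(step: str, tools: Dict[str, Tool]) -> Optional[str]:
    """Toy decision rule (an assumption): use a tool if its name appears in the step."""
    lowered = step.lower()
    for name in tools:
        if name in lowered:
            return name
    return None


def run_task(steps: List[str], tools: Dict[str, Tool]) -> List[str]:
    """Walk through the steps, deciding at each one whether to call a tool."""
    results = []
    for step in steps:
        name = pick_tool(step, tools)
        if name is None:
            # No tool needed: the model would answer from its own knowledge.
            results.append(f"answered directly: {step}")
        else:
            results.append(f"{name} -> {tools[name](step)}")
    return results


if __name__ == "__main__":
    # Placeholder tools standing in for web search and chart drawing.
    demo_tools: Dict[str, Tool] = {
        "search": lambda query: "3 hits",
        "chart": lambda request: "chart.png",
    }
    plan = [
        "Search recent AI music releases",
        "Summarise the findings",
        "Draw a chart of release dates",
    ]
    for line in run_task(plan, demo_tools):
        print(line)
```

In a real system the decision at each step would come from the model itself rather than a keyword match. The sketch only shows the control flow: plan first, then choose per step between answering directly and calling a tool.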

It is worth noting that 天工3.0, for all its performance and technical innovation, has chosen an open-source strategy. At a time when open-source large models are developing rapidly worldwide and AI applications built on the open-source ecosystem are flourishing, open-sourcing 天工3.0 injects vitality into China's open-source AI community and raises the overall technical level and industrial foundation of Chinese open-source large models.

Alongside open-sourcing, 天工3.0 has also launched an Intelligent Agent Plaza to help developers customize and build intelligent agents. Its stronger support for the Chinese language makes 天工3.0 a better fit for the needs of Chinese developers.

With enhanced capabilities, richer functions, and closer ties to developers, 天工3.0 has become a cornerstone: the cornerstone of "Chinese AI is not absent."

This time, Chinese AI is not absent

In the past, the AI industry generally believed that there was a time lag between us and the top AI companies in Europe and America. Only after new models and technologies took off abroad did China's AI industry begin to follow and learn.

In the wave of AI music generation models, however, the situation is clearly different. Why were we able to arrive from the very start, without following and without anxiety, and begin our own performance?

In fact, Kunlun Tech has long paid close attention to AI music generation. As early as December 2022, it released the "Kunlun Tiangong" series of open-source AIGC algorithms and models, which included a large multimodal model for music content generation. Since then, Kunlun Tech has developed several collaborative projects based on AI music generation models and gained rich practical experience. This sustained focus on AI music generation provided the strategic direction for 天工SkyMusic.

The release of 天工3.0 and 天工SkyMusic illustrates a line of reasoning: in an era of rapidly developing large AI models, improving innovation efficiency and seizing strategic opportunities first requires building a foundation that can integrate a range of advanced AI capabilities.

Fang Han, Chairman and CEO of Kunlun Tech, defines SOTA as ranking first globally on a field's technical indicators. In his view, OpenAI holds SOTA in large text models and video generation models, while Kunlun Tech has now achieved SOTA in music AIGC.

This was possible because 天工3.0 provides foundation-level AI capabilities, which allow development efficiency and development quality to go hand in hand.

Kunlun Tech founder Zhou Yahui believes: "In the next thirty years, a major change in human society is that human perception will transform into expression, and self-expression across society will increase 1,000-fold. Creation and self-expression will be the fastest-growing curve in the entire social and cultural field over the next 30 years. More and more people will express themselves, their understanding of the world, and their attitudes toward social matters, and this expression will become more artistic and interesting. In the past, this kind of expression was very difficult because the tools had high thresholds. The next 30 years will be the 30 years of self-expression, and we need to use AI to lower the threshold of human creation far enough for people to fully realize self-expression."

Under Kunlun Tech's "All in AGI and AIGC" strategy, 天工3.0 has become a large AI model that integrates cutting-edge technologies such as natural language processing, computer vision, multimodality, AI search, and AI agents. With the foundational capabilities that 天工3.0 provides, Kunlun Tech can move quickly on development opportunities and seize industry trends such as AI music creation. Developers, meanwhile, gain access to rich and diverse AI capabilities and can bring AI to every corner of every industry.

Nurturing many AI technologies through one super model and opening them to all industries: this is the distinctive, pleasing Oriental melody in the global AI music festival.

Even in its initial stage, 天工SkyMusic has already brought users a rich experience of music creation. With continued optimization and upgrades, it can grow into a professional, user-friendly music creation platform. An ecosystem of AI music creators and new music stars may emerge around 天工SkyMusic and influence the development of the music industry.

Behind the global wave of AI music, we can see a profound shift from "Chinese AI is not absent" to "how Chinese AI leads."

