Digital Humans: 180 Days of Frenzy

巴比特
2 years ago

"Compared with the past two months, some of the noisier players who were only out to 'cut leeks' (fleece customers) are exiting faster, and market enthusiasm is gradually returning to where it was at the beginning of the year."

Original source: 数智前线

Image source: Generated by 无界 AI

Under the trend of large models, the digital human race track has become lively. In various industries such as culture and tourism, e-commerce, and finance, a variety of virtual digital humans are replacing real people, playing the roles of spokespersons, hosts, customer service, and intelligent assistants.

The number of market participants is visibly increasing. Internet giants, startups, traditional AI companies, and some digital service providers who previously focused on intelligent customer service marketing have all entered this track. Lu Yanxia, research director of IDC China, told 数智前线 that digital humans' role as an entry point is why so many enterprises are competing to stake out positions in this track. With the rise of generative AI, digital humans are seen as one of the entry-point products for natural-language interaction between people and machines, which is driving up market enthusiasm.

In June 2022, IDC predicted in a report that China's AI digital human market will reach 10.24 billion yuan by 2026. As enthusiasm grows, the market may reach this size even sooner.

It is worth noting that digital human products driven by large models are still at an early stage of real-world deployment. On the one hand, the industry believes that any change in the size of the digital human market will only show up in market data after deployment scales up next year; for now, technological maturity, cost, and efficiency remain constraints. On the other hand, vendors are moving toward differentiated competition, with enterprises building their own barriers on the advantages they have already accumulated.

A few days ago, the GPT-4V version was updated, and text-to-speech (TTS) technology made progress: text-driven speech has improved greatly in its pauses, accents, and naturalness of interaction. Some senior professionals believe that the deployment of large model-driven digital humans is expected to accelerate and take off.

The booming digital human race track

The digital human race track has visibly heated up this year. Since February this year, the WeChat index for the term "digital human" has reached several times to tens of times the level of last October.

Le Cheng, CEO of AI video live-streaming SaaS startup Tecking Technology, told 数智前线 that compared to last year, enthusiasm across the entire track has risen significantly this year, especially in the first two months, when the market was flooded with entrants and competition even turned cutthroat.

"Last year, there were only a few companies experimenting, floating in the air, mainly in the direction of the metaverse and 3D digital humans, with overall high costs and difficulties in commercialization. This year, it suddenly came down to the ground."

In the hot market, some irregularities have also emerged, with micro-business agents entering the gold rush. According to industry insiders, there are probably more than a thousand agents selling various digital humans on the market.

Major companies, startup teams, AI companies, and some digital service providers who previously focused on intelligent customer service marketing are all making frequent moves in this track.

The layout of major companies has long been in place. Platforms such as Tencent, Baidu, Alibaba, JD.com, and Volcano Engine have previously launched digital human product platforms or services based on the concept of the metaverse or multiple scenarios such as live streaming for product sales. For example, Tencent Cloud Xiaowei released a matrix of digital human products in November 2021, providing five styles of digital human products including 3D ultra-realistic, 2D real person, and 2D cartoon. In 2021, Baidu also released the Baidu Intelligent Cloud Xiling platform at the AI Developer Conference, which provides digital human production, content creation, and business configuration services. Baidu also created digital human IPs such as "Du Xiaoxiao".

With the arrival of large models, vendors have launched new digital human platforms, and production efficiency and cost control have improved greatly compared to the previous stage. Chen Lei, general manager of Tencent Cloud's intelligent digital human products, said that in April, Tencent Cloud released a small-sample digital human production platform, which can produce a demo in 12 hours, with the cost greatly reduced to the level of thousands of yuan. Kuaishou's AIGC digital human product Kuaishou Zhibo, released in August this year, also focuses on lowering the barrier to production: it requires only 3 to 5 minutes of real-person video and audio material, at a greatly reduced cost.

Well-known AI companies have been actively showcasing their strength. In April this year, SenseTime demonstrated the 2D digital human video generation platform "SenseAvatar" at its technology exchange day, which, according to the official introduction, can generate a natural-sounding and fluent digital human avatar with accurate mouth movements and proficiency in multiple languages using just a 5-minute real-person video material. In July, at the World Artificial Intelligence Conference, SenseAvatar was upgraded to version 2.0, focusing on improving the fluency of digital human speech and mouth movements in multiple languages.

Some companies that have been investing in the digital human track for a long time are also actively launching new products. In mid-August, Moka Technology, a technology service provider that has been involved in the 3D virtual human track for 5 years, launched three consumer-level products: a video AIGC generation platform, an AIGC live-streaming platform, and a virtual human service AIGC platform. Together they aim to lower the barrier to deploying 3D virtual humans through high quality, low cost, and large-scale replication.

The trend has also attracted cross-industry players, with typical examples being the live MCN agency Qianxun Holdings under Viya's umbrella. On August 8th, Qianxun Holdings' subsidiaries Qianyu Intelligent and Lingke respectively released an AI digital human live broadcast solution and an all-in-one AI intelligent live broadcast comprehensive service platform.

Senior professionals believe that digital humans' role as an entry point is why so many enterprises are competing to stake out positions in this track. "One of the future entry points for generative AI is digital humans. Today, we are using a simple web version, and in the future, the digital human experience may be richer. For this reason, enterprises have begun to enter this market," Lu Yanxia told 数智前线.

At the consumer-level product launch event in mid-August, Moka Technology founder Chai Jinxiang regarded 3D virtual humans as a future infrastructure. "Like web pages and apps, as an upgrade of a content carrier, it will reshape all industries," Chai Jinxiang said. With this understanding, in addition to consumer-level products, Moka Technology has also developed a 3D virtual human OS for managing future infrastructure.

Intelligent outbound call company Yunfu Intelligent, which is trying to combine the image of digital humans with the conversational ability of intelligent customer service, values the interactivity of digital humans and their future potential. "Recently, there is an expression that I particularly agree with: digital humans are actually the UI of large models," Yunfu Intelligent CEO Wei Jiaxing told 数智前线. "If we look 5 to 10 years ahead, digital humans may amount to creating silicon-based life. Today, it is just an interactive digital human without a soul, but that doesn't mean it cannot be given a soul in the future."

Overall, the trend of large models is igniting the digital human race track. A report from Zhonghang Securities pointed out that, with AI large models emerging as the new hot trend, virtual digital humans will accelerate the release of diverse commercial value. The business-facing (B-end) market for digital human production and operation services is expanding, and enterprises deeply involved in digital human-related businesses are expected to enter a golden period of development.

Different scenarios, each showing its own strengths

In the hot market, enterprises are eyeing different pieces of the pie.

Across market segments such as culture and tourism, e-commerce, finance, and internal enterprise applications, the commercial prospects of digital humans differ. Le Cheng believes that in the future, digital humans should be seen as a capability, and that each segment places its own professional demands on that capability.

The application of digital humans in the cultural and tourism industry is not new. Digital humans have already played the roles of spokespersons or intelligent tour guides in many scenic spots and cultural institutions. Typical cases include virtual digital humans "Tianyu" based on the "Dunhuang Feitian", "Wenyao Yao" of the China Cultural Relics Exchange Center, "Gayao" of the Dunhuang Research Institute, and "Aiwenwen" of the National Museum. A cultural and tourism company mentioned that with the presence of digital humans, some historical figures can interact with visitors in a personalized manner, enriching the exhibition hall's effects.

Currently, many companies such as Baidu and Tencent are making efforts in this market. A few months ago, a Baidu digital service provider told 数智前线 that they did a project in Hebei, targeting the cultural and tourism market, with a budget of tens of millions of yuan. However, industry insiders also mentioned that the digital human in scenic areas is not priced at tens of millions on its own; it is usually one capability in a complete digitalization solution for the scenic area, and the overall project can reach the tens of millions level. Overall, compared to applications in enterprise service scenarios priced at the thousand yuan level, the cultural and tourism scene can be considered a top-tier market.

IDC introduced that the financial industry is currently a relatively mature field for digital human applications. Taking the banking industry as an example, the earliest domestic "hiring" of digital employees was done by Shanghai Pudong Development Bank, which, in 2019, collaborated with Baidu Intelligent Cloud to create the 3D digital human "Xiao Pu". It is reported that "Xiao Pu" is currently serving in over 20 positions, including wealth planners, document reviewers, lobby managers, and telephone customer service. In early September, Wu Lianfeng, Vice President and Chief Analyst of IDC China, mentioned at the Bund Summit Banking Digitalization Forum that by 2025, over 80% of banks will deploy digital humans, handling 90% of customer service and financial consulting services.

An IT manager responsible for the wealth management sector of a city commercial bank told 数智前线 that they also plan to purchase and deploy a set of digital humans, and are currently closely inspecting other banks' digital human solutions and products from different vendors. "Grassroots employees have a lot of targets and cannot free up their hands to do more important work," the person said. Digital humans can free them from heavy customer service work and allow them to do more important customer maintenance and operational work. Currently, many vendors, including Volcano Engine, SenseTime, Tencent Cloud, Baidu Intelligent Cloud, and JD Cloud, have deployed digital humans in the financial industry.

In the e-commerce live streaming scene, many top brands have started to try out digital human live streaming solutions. Le Cheng mentioned that big brands are actively trying out digital human tools as part of their top management's AI strategy. His company has so far served many key account (KA) brands such as P&G and L'Oreal, and its data shows that digital human hosts have achieved 70% of the sales volume of real human hosts.

It is understood that there are two service models for digital humans in the e-commerce live streaming scene. One is to provide KA brands with a packaged service covering digital human live streaming software and operation, usually priced between two and three thousand yuan per month. The other is for the brand to purchase a set of software and do the streaming itself, with the market price currently between two and four thousand yuan.

Many companies are interested in the live streaming market, and the quality of product solutions is uneven, including some "cutting leeks" behavior. A senior professional in the e-commerce industry mentioned that the brands currently seeing good data with digital humans generally share one characteristic: their products are already strong, and traditional unmanned live streaming methods also perform well for them. With digital humans, the effect is improved several times over.

"The digital human vendors who boast about how amazing their digital humans are at selling products are just cutting leeks. The more they boast, the sharper their sickles," the person said. Currently, digital humans can only automate the sales of products that real humans can sell well at low cost and at scale.

IDC pointed out that the products and solutions of various players currently have differences in application direction, and enterprises are building digital human scenarios based on their own advantages. While major companies have certain advantages, smaller companies can choose their own paths, leading to differentiated competition.

Wei Jiaxing told 数智前线 that when they entered the digital human track, they chose some difficult and tiring scenarios. For example, using digital humans for customer service on the official website is both vertical and tiring, and the unit price is not high. Generally, the annual cost of regular official website customer service is around 2000 yuan, and adding a digital human capability may not exceed 5000 yuan. This is a market that the giants do not pay attention to, and ordinary startups are starting to work on it, but their capabilities cannot keep up. This kind of differentiated competition is an opportunity for companies like theirs.

On the eve of large-scale deployment

Although the buzz is considerable and moves are frequent, the industry generally recognizes the challenges that still exist at present.

Lu Yanxia observed that large-model applications have not yet been deployed at scale, and that changes in market data will only appear next year. At the current stage, the development cycle, development cost, image customization, and true AIGC (AI-generated content) are all challenges.

Take technological maturity as an example: many digital human products are currently rather stiff in their speech, expressions, and interactive behavior. Some senior professionals even believe that immature solutions may drive potential customers out of the market.

However, under the wave of AIGC, the speed of technological iteration is also very fast. Le Cheng told 数智前线 that they have seen signs of breakthroughs in the combination of large models and digital humans in the text-to-speech technology (TTS). "Previously, there were problems with the naturalness of text turning into digital human speech, and the connection was not easy. The large model is one line, and the digital human is another line. They need a breakthrough in TTS technology to achieve a good integration."

At the end of September, OpenAI released a new version update GPT-4V, with TTS technology supported by a brand-new TTS model. It can generate human-like audio from text and a few seconds of sample speech, combined with the Whisper model's speech-to-text, ensuring the quality and fluency of user interaction with ChatGPT through speech.

Industry observers have noted that in some new versions that have already been tested, the performance of text-to-speech is quite amazing, and AI is very close to humans in terms of pauses, intonation, and cadence. "I judge that when TTS technology is mature end-to-end, it will bring about a significant change in the industry landscape," Le Cheng said. Mature TTS would act like a glue joining the two lines, large models and digital humans, after which enterprises can go on to optimize digital human performance.

The presentation of the value of digital human products and their ability for large-scale replication is also a focus of the industry.

Chai Jinxiang, founder of Moka Technology, mentioned that a major pain point in the early development of the virtual human industry was large-scale replication. From the virtual humans of the long-content era of animation, movies, and games to the virtual idols of the short-content era, such as Hatsune Miku and Liu Yexi, as well as Ling, the virtual idol Moka created in its early days, all were handcrafted, with long production cycles and high costs.

An observer mentioned that the top virtual human idol "Liu Yexi" previously required a creative team of over a hundred people, and the investment cost for producing a work could exceed a million.

Chai Jinxiang mentioned in an interview with 数智前线 that Moka's AIGC technology has overcome the content industry's inability to replicate virtual humans on a large scale. In addition, if consumer-level products are to be used continuously by enterprises, they must solve the enterprise's pain points, and the return on investment (ROI) must be worth it. "We need to work backward from the end goal: whether our product can bring value to the enterprise and whether it can achieve a positive ROI." In recent years, their product strategy has also focused on large-scale replication, professional capabilities in segmented industries, and high-quality images and interactive capabilities.

The industry places great emphasis on reducing the usage threshold of products, and many vendors have mentioned achieving one-click generation of digital humans using extremely small sample materials. In the e-commerce scene, many companies provide a digital human operation service model to lower the threshold for brand customers to use digital humans. In this model, technology and service are integrated, and enterprises can entrust the entire digital human-related work to the agency, without having to edit videos themselves or operate the digital human backend, and only need to pay the software and service fees on a monthly basis.

This business model has already led to an overlap between the role of digital human service providers and the roles of traditional MCN agencies and operation service providers in the e-commerce scene. Just as the company under Viya's umbrella now provides a digital human live streaming platform and tools, the service scope of digital human vendors is also expanding. Observers believe that as technologies like digital humans are deployed at scale across multiple industries, the blurring of boundaries between different types of service providers, and their convergence, will be a major trend.

Some practitioners believe that in the future, digital humans will replace the original white-collar roles in many enterprise service scenarios, with unlimited market space. However, some people believe that in the case of live streaming for product sales, social platforms will not let all hosts be replaced by digital humans in terms of traffic mechanisms, so there will be an upper limit on the market size.

After the noise of the past half year, practitioners have also observed that the market is gradually returning to a rational state. "Compared with the past two months, some of the noisier players who were only out to 'cut leeks' (fleece customers) are exiting faster, and market enthusiasm is gradually returning to where it was at the beginning of the year," Le Cheng told 数智前线. In the long run, he added, the companies that remain will be those that focus more on technical accumulation.

The industry consensus is that the track has a long cycle, and the current industry development is still in the early stage. Lu Yanxia previously pointed out, "On the one hand, industry users can introduce AI digital humans from relatively mature application scenarios; on the other hand, they also need to maintain patience for application scenarios and not set overly high expectations."
