Large models are set to revolutionize AI.

Source: Economic Observer

Author: Shen Yiran

Image source: Generated by Wujie AI

In April of this year, several researchers at a leading artificial intelligence company independently zeroed in on a new technology: SAM (the Segment Anything Model). They quickly reported it to their department head; the company had built its business on machine vision, the very field SAM touches. "With the emergence of SAM, more and more AI practitioners have realized that large models are a challenge to them," one of the researchers said.

A month later, the company began to allocate resources to develop visual large models.

Over the following three months, leading machine vision AI companies began to take note of the technology's potential. By now, artificial intelligence companies such as SenseTime Technology and CloudWalk Technology, along with traditional security companies, have all thrown themselves into this new round of technological competition.

SAM is a general-purpose image segmentation model launched by Meta in April of this year. Much as one chats with ChatGPT, a user can issue simple prompts and have SAM identify and segment the contents of an image on its own. SAM has been called the ChatGPT of the vision field.

Enthusiasts around the world have used it to cut out and remix pictures, but Chinese researchers recognized something larger in SAM: applied to autonomous driving or security monitoring to detect people, vehicles, and roads, it is a large model that fundamentally changes the traditional approach to machine vision.

Segmenting and recognizing images are core tasks of machine vision. In the past, every image segmentation task required training an algorithm, annotating a batch of data, and stacking small models so that the machine could "see" the various objects in an image. SAM shows a new pattern: there is no need to build a small model for each specific task, the machine can segment any object in any image on its own, even in unknown or blurry scenes, and operating it is extremely simple.
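
As a concrete illustration of this prompt-driven workflow, the sketch below uses Meta's open-source segment-anything package; the checkpoint filename matches a published SAM weight file, while the image path and point coordinates are purely illustrative.

```python
# pip install segment-anything  (requires torch and opencv-python)
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a published SAM checkpoint (the ViT-B variant shown here).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Hand the model an arbitrary street scene once; it embeds the whole image.
image = cv2.cvtColor(cv2.imread("street.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click is enough to ask "what object is here?"
masks, scores, _ = predictor.predict(
    point_coords=np.array([[480, 320]]),  # (x, y) pixel of interest
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=True,                # return several candidate masks
)
print(masks.shape, scores)                # e.g. (3, H, W) boolean masks plus confidence scores
```

No task-specific training or annotation is involved; the same prompt-and-segment loop works whether the target is a pedestrian, a vehicle, or a road marking.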

This means SAM is far more general-purpose, and that generality could sharply reduce the cost of machine vision recognition, thereby changing the business models and competitive landscape built on the existing technology.

Since 2016, China, with its huge market, has seen hundreds of artificial intelligence companies emerge. Amid market competition and with the backing of capital, several AI unicorns gradually stood out, such as SenseTime Technology, CloudWalk Technology, Megvii Technology, and Yitu Technology. These companies brought AI into security, government affairs, and industry, and built moats on algorithmic lead and advantages of scale.

But now, as the technology turns over, that competition may start all over again.

Feng Junlan, Chief Scientist of China Mobile Group and Vice Chairman of the China Artificial Intelligence Industry Development Alliance, told reporters that large models will bring a new paradigm to artificial intelligence, and that under their impact the so-called moats in the AI field essentially no longer exist. The emergence of SAM proves the feasibility of visual large models and overturns machine vision's research framework, mode of interaction, and way of producing and delivering services.

Luo Xun, an IEEE Senior Member, professor at Tianjin University, and AR/VR technology expert, told reporters that the earlier AI capabilities of leading companies may be weakened to some extent by the rise of general-purpose large models; whether the companies themselves are weakened, however, depends on how they transform.

Technical Route

As an important branch of AI, the goal of machine vision is to enable computers to mimic the human visual system and understand and process images and videos.

After 2000, the pioneers of artificial intelligence Geoffrey Hinton, Yann LeCun, and Yoshua Bengio achieved breakthroughs in deep learning, allowing machines to loosely mimic the human brain and automatically learn and extract features from large numbers of images.

2012 was an important turning point. The ImageNet project created by Stanford University professor Fei-Fei Li pushed deep learning into the mainstream: by manually annotating vast numbers of images, researchers could teach computers to recognize all kinds of objects, greatly improving the accuracy of machine vision, lowering its cost, and making commercialization possible.

In April 2023, a new change arrived. Meta launched an image segmentation model called SAM. As a large model, SAM not only equips machines with eyes to perceive the outside world but also gives them something closer to a brain: it learns to observe, perceive, think, reason, and draw conclusions from images, and operating it is extremely simple, akin to giving the machine commands in human language, just as with ChatGPT.

In short, the goal of machine vision has become easier to reach: no massive image annotation, no stacked algorithms, and far less computing power consumed along the way. Jim Fan, an artificial intelligence scientist at NVIDIA, said that SAM is machine vision's GPT-3 moment: it has already grasped the general concept of an object and can segment images even in ambiguous situations, such as unknown objects, unfamiliar scenes (underwater images, for instance), or blurry pictures.
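
For "segment everything" with no prompt at all, the zero-shot behavior Jim Fan is pointing to, the same open-source package exposes an automatic mask generator. A minimal sketch, again with an illustrative image path and a published checkpoint name:

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# The ViT-H variant is the largest published SAM checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# An unfamiliar scene, e.g. an underwater photo, with no task-specific training.
image = cv2.cvtColor(cv2.imread("underwater.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)

# Each entry is a dict holding a boolean "segmentation" mask plus metadata
# such as "area", "bbox", and "predicted_iou".
print(len(masks), sorted(masks[0].keys()))
```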

After Meta released SAM, it also open-sourced the model and its training dataset, and introduced the application scenarios of SAM in AR, VR, content creation, and other fields.

Enterprises and researchers in China quickly sized up SAM's potential commercial value. Applied to autonomous driving or security monitoring to detect people, vehicles, and roads, it could fundamentally change the traditional approach of machine vision.

Feng Junlan said that large models will change how AI is supplied, greatly reducing complexity on the supply side and pushing marginal cost toward zero. Business users can express their needs in plain natural language instead of having engineers communicate with machines through professional instructions such as code, and can flexibly switch among different models as their needs dictate, improving efficiency.

Zhu Bing, Chief Product Officer of Uniview Technology, told reporters, "In the past, doing AI was like moving boxes, which is really fairly low-tech physical labor. When AI empowers single-point scenarios, it is highly fragmented and customized, with low pre-sales, after-sales, and sales efficiency. Both upstream and downstream of the industry suffer." Zhu Bing gave examples: developing algorithms, collecting and labeling material, and customizing for different scenarios and regions all carry very high investment and cost, and the R&D process often runs into scarce material, long cycles, and metrics that are hard to improve. For customers, custom development is also a considerable expense.

Now, large models replace the old small-model approach: no stacked algorithms, no mountains of annotated data, far less computing power consumed in the process, and machines can be instructed in plain human language rather than professional programming languages. Zhu Bing said that large models have greatly reduced the cost of developing and deploying AI, opened up a series of new approaches, and are rebuilding the order of the industry, especially computer vision. The technological barriers the big companies once erected have been leveled, and everyone is back on the same starting line.

Influx

Around the previous generation of machine vision technology, a crop of artificial intelligence companies emerged in China, and their technologies came into wide use for recognition by surveillance cameras in public security, subways, and commercial buildings.

The "Four Little Dragons of AI" refers to four Chinese artificial intelligence companies established between 2011 and 2014, namely SenseTime Technology, CloudWalk Technology, Megvii Technology, and Yitu Technology. Their common feature is the core technology of machine vision. The breakthrough in deep learning provided the technical foundation for the rise of these artificial intelligence companies, and China's industrial advantages provided a market for the development of these companies.

After the appearance of SAM, they began to target this technology one after another.

According to several industry insiders, apart from Yitu Technology, the "Four Little Dragons of AI," including SenseTime Technology, CloudWalk Technology, and Megvii Technology, are all developing visual large models. Among the traditional security trio known as "Hai-Da-Yu" (Hikvision, Dahua, and Uniview), Hikvision and Uniview Technology are also laying out related research and development.

In April, just days after Meta launched SAM, SenseTime released its "RiRiXin" large model. Tian Feng, Dean of the SenseTime Intelligent Industry Research Institute, told reporters that the "RiRiXin" series includes multiple large models for natural language generation, image generation, and visual perception; "Ruying," "Qiongyu," and "Gewu" are all vision-related large models.

In May, CloudWalk Technology released its "Congrong" large model, which includes multimodal large models covering vision. At a recent investor conference, CloudWalk said that visual large models are very important and that it will also launch vision-dominant models in the future, because the company has deep reserves in computer vision and needs multimodal technology to meet customers' specific business needs.

Megvii Technology and Yitu Technology have not yet launched large models. Megvii told reporters, "We are developing large models, but have not yet launched them or delivered them to customers." Megvii has chosen four research directions: general image large models, video understanding large models, computational photography large models, and autonomous driving perception large models, and has made certain breakthroughs.

Su Lianjie, Chief Analyst of Artificial Intelligence at research firm Omdia, told reporters that under the impact of visual large models, the "Four Little Dragons of AI" are rapidly transforming into large models and deploying multimodal large models dominated by vision, which is a relatively reasonable path.

Hikvision said in June this year, "We paid attention to the SAM model when it was first released and conducted a systematic evaluation." Zhu Bing told reporters that Uniview's self-developed AIoT industry large model, "Wutong," is built on an architecture of a general large model plus industry scenarios plus training optimization; it was first released on May 9 and was tested by the first batch of partners in June.

Hikvision and Uniview Technology are traditional security companies that started out making equipment. After the "Four Little Dragons of AI" entered the security industry and competition intensified, the two actively embraced machine vision technology, but also ceded some market share because of weaker software capabilities.

Currently, AI companies have reached a consensus on the "epoch-making" significance of large models.

Tian Feng, Dean of the SenseTime Intelligent Industry Research Institute, and Yao Zhiqiang, Co-founder of CloudWalk Technology, both told reporters that AI 1.0 was the era of small models, in which companies mainly supplied proprietary small models and used point solutions to meet the needs of specific scenarios. AI 2.0 is the era of large models, in which companies need to build a unified large-scale technical platform, that is, a foundation model with general perception and cognition of the world, and generate a series of industry-specific small models on top of it to serve specialized scenarios and a far larger number of scenes.

Yao Zhiqiang believes that an AI company that stays in the previous stage may still solve many scenario problems, but its costs are hard to bring down, so economies of scale never materialize. Tian Feng believes the two eras will coexist for a long time and are not locked in a contest of which eliminates which; models from both can complete tasks in coordination. For example, with a mixture-of-experts (MoE) structure, the AI 2.0 era combines multiple models into a single service, and 1.0-era models can be embedded within it.
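
The MoE structure Tian Feng mentions is, at its core, a gating network that routes each input across several expert sub-models and blends their outputs. The toy PyTorch layer below is purely illustrative (it is not SenseTime's implementation) and uses dense soft routing for simplicity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """A toy mixture-of-experts layer: a gate weighs several expert MLPs per input."""
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # scores each expert for a given input
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.gate(x), dim=-1)                   # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, num_experts, dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)         # weighted blend, (batch, dim)

x = torch.randn(8, 32)
print(ToyMoE(32)(x).shape)  # torch.Size([8, 32])
```

In production MoE systems the routing is usually sparse (only the top-scoring experts run), which is what keeps the cost of serving many combined models close to the cost of one.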

In the new competition, the original technological accumulation and hardware investment will still play a role.

Tian Feng told reporters, "The 'AI Large Device' intelligent computing center has powerful AI computing power, enough to train 20 large models with hundreds of billions of parameters. It is the key piece of equipment for developing and training large models. SenseTime Technology not only uses it itself, but also opens it to large-model startups and research partners."

A person in charge at CloudWalk told reporters that the company's CWOS operating system has inherent advantages in integrating super language models like ChatGPT. At the same time, the system can feed back data and information to the large model based on actual production conditions, optimize the training and adjustment of the model, and improve the accuracy and efficiency of the model.

Large Models Breaking Through the Market

"Even without the impact of large models, the 'Four Little Dragons of AI' are in a period of transformation and need to think about their own value and future," Su Lianjie said.

A group of artificial intelligence companies have been favored by capital and the market, including SenseTime Technology and CloudWalk Technology, both of which have gone public. From 2018 to 2022, SenseTime Technology's cumulative R&D investment exceeded 12 billion yuan, and it raised over 5 billion yuan in its 2021 IPO. CloudWalk's cumulative R&D investment over the same period exceeded 2.2 billion yuan, and it raised 1.7 billion yuan in its 2022 IPO.

The virtuous interaction between technology and capital also gave China a leading edge in visual recognition for a time. Around 2018, China was second only to, and in some respects ahead of, the United States in the number of AI papers published and the amount of AI financing raised. In visual recognition in particular, Chinese AI companies repeatedly broke records and took top results in international competitions.

Before long, however, as the market pushed forward, the potential of the existing technology gradually neared its limit. In 2019, Zhang Bo, an academician of the Chinese Academy of Sciences, warned in an interview with the Economic Observer that the potential of industrial applications on the existing technological route might already be hitting a ceiling.

More importantly, from a business perspective, the original AI technological route has never broken through its cost bottleneck, so most traditional-industry customers cannot afford it. Zhu Bing said, "For many years, we have not seen a thriving new order. Many companies are fighting fiercely over person and license-plate recognition. The fundamental reason is that piling on more algorithms cannot produce economies of scale."

A researcher at a leading AI company told reporters that under the traditional approach, when an AI company serves a car factory and sells an algorithm to recognize obstacles, the average cost of using one algorithm to recognize a single obstacle runs to tens of thousands of yuan and takes about two months, and the customer also has to supply tens of thousands of images for annotation. One algorithm is never enough, because real road scenes are complex: an algorithm suited to passenger cars may not work for heavy trucks, a change of viewing angle can defeat it, and a partially occluded target is also hard to recognize.

To increase the intelligence of the equipment, AI companies need to stack multiple algorithms, which simply means stacking many small models. According to financial reports, SenseTime Technology has accumulated 67,000 commercial small models. Reporters learned from CloudWalk Technology that the company also has thousands of commercial small models.

But training time and cost multiply accordingly.

Feng Junlan told reporters that one important reason many AI companies struggle to make money is the high cost of AI services, which leaves them "losing five yuan for every yuan earned." This "the more orders, the bigger the loss" mode is hard for the supply side to sustain, and the demand side ends up limited to a few key industries or those with strong ability to pay.

According to financial reports, CloudWalk Technology accumulated losses of 3.1 billion yuan from 2018 to 2022, and SenseTime Technology accumulated losses of over 40 billion yuan.

To further reduce AI costs and improve the market, there has been differentiation in the strategies of the "Four Little Dragons of AI." SenseTime chose AI large devices, CloudWalk chose the operating system, Megvii chose chips, and Yitu chose the Internet of Things.

From this perspective, large models may bring not only challenges to existing companies, but also a completely new business model and application scenarios.

The researcher mentioned above said the company has tried to find AI business in more markets. For example, it once discussed AI monitoring with a supermarket to detect whether salespeople were at their posts; the company sent five algorithm engineers, whose salaries alone came to 300,000 yuan, while the total monthly pay of the dozen or so salespeople in the customer's store was under 50,000 yuan. It also discussed AI quality inspection with a factory to detect damaged packaging boxes on the assembly line, and the factory concluded that hiring workers would be more economical. And so on.

These demands are collectively known as AI's long tail: large numbers of small and medium-sized customers with weak ability to pay, no pressing need for AI, only nice-to-have requirements in certain scenarios, and no willingness to spend millions. In this researcher's view, a single type of large model, or a set of multimodal large models, will in the future be able to cover these visual detection scenarios. Leaning on the transfer and generalization abilities of large models, they will need only a small amount of data annotation and algorithm work, with shorter development cycles and lower compute requirements. That would cut costs dramatically, and customers would be far more willing to pay.
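
As an illustration of the "small amount of annotation" point (a generic transfer-learning sketch, not this researcher's actual pipeline), one common pattern is to freeze a large pretrained backbone and train only a tiny task head on a handful of labeled images; the "staffed vs. unstaffed counter" task and all the tensors below are hypothetical:

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical long-tail task: classify counter images as "staffed" vs "unstaffed".
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                          # reuse the pretrained features as-is
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # only this small head gets trained

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(16, 3, 224, 224)                # stand-in for a small annotated batch
labels = torch.randint(0, 2, (16,))
logits = backbone(images)                            # frozen features feed the trainable head
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
print(float(loss))
```

The same pattern applies whether the backbone is an ImageNet classifier, a SAM-style encoder, or another foundation model: the heavy model is reused, and only a lightweight adapter is fitted to the customer's scenario.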

Zhu Bing has done the math: with small-model AI algorithms, less than 10% of fragmented demand could be satisfied; with large models, that figure is likely to rise above 50%. The overall efficiency of long-tail algorithms can improve by roughly an order of magnitude, on the order of 10 times, and customization time can fall to within one person-week.

Yao Zhiqiang told reporters that once the technology is platformized and standardized, AI companies can quickly adapt to a huge range of scenarios and achieve large-scale application through a unified core technology platform.

Feng Junlan said that when the cost of a technology is far below the value it creates for the business, the technology can achieve economies of scale and migrate into more and longer-tail markets. That is also the fundamental logic by which AI companies become profitable, and it means they have the chance to open up more blue-ocean markets.
