AI Large Models Are Finally Making Phones Truly "Intelligent"

巴比特
1 year ago
Source: Geek Park

Image Source: Generated by Wujie AI

After more than a decade of development, most smartphone systems have matured, and their features have grown increasingly similar.

But that is only the surface. Beneath it, manufacturers have been quietly building up strength and preparing innovations. This year, with the rapid adoption of large models, the trumpet of evolution has finally sounded.

This revolution is certainly tied to large models, but it is not only about them. Today, mainstream large models are deployed on servers. What does it actually mean to combine smartphones with large models? What are the limitations, and what are the advantages?

To understand this issue, we need to go back to an earlier time.

Last year, OPPO unveiled its self-developed intelligent cross-device system "Pantanal" at ODC2022. It seemed abstract at first, but over the past year Pantanal has been put into initial use. Through its two core experiences, intelligent cross-device and ubiquitous services, it enables cross-device collaboration at the lower layer and lets services flow intelligently between applications at the upper layer.

At the just-concluded ODC2023, OPPO officially launched AndesGPT, integrating its AIGC capabilities with Pantanal and bringing both into ColorOS 14. On the surface, the large model and the intelligent cross-device system that OPPO has long been building are two different technological foundations; today, they have produced an important chemical reaction.

The blueprint for a revolution in mobile applications and interaction is gradually taking shape.

01 System "Decoupling," Integrating Services and Data

Last summer, OPPO first introduced its self-developed intelligent cross-device system "Pantanal."

At that time, most people's attention was focused on the "cross-device" part, assuming OPPO mainly planned to move data and hand off functions between different devices: "copy on the phone, paste on the computer," "receive a call on the phone, answer on the tablet," and so on…

However, in reality, the most important thing that OPPO has achieved through Pantanal over the past year is to act as a bridge, deeply connecting the system, applications, and services through ubiquitous services and intelligent cross-device.

From the birth of the smartphone, the App has been the unit used to organize functions. Early smartphone systems, with iOS as the representative, had only two core layers: the desktop, with App icons arranged on it, and the App itself, which contained all functionality once opened.

At that time, Apple coined the famous slogan "There's an app for that."

As the mobile internet matured, the App ecosystem eventually became saturated, and many Apps grew bloated, with heavily overlapping functions.

For users, bloated Apps and redundant functions have become an ever-greater burden. Repeatedly switching between Apps for one small task is a hassle: replying to messages while tracking a takeout delivery, or checking email and browsing the web while navigating with a map…

The App is the most important form for organizing a phone's functions, but it should not be the only form, nor should it be a completely sealed box. To optimize the App experience, the system itself needs to be "decoupled."

This problem has existed all along, which is why Android introduced "widgets" early on, attempting to solve it at the interface level. But early widgets were entirely optional for App developers and were never promoted enough.

With the release of Pantanal, OPPO took a more open approach: decoupling the system into atomized services, organizing those services around the person rather than the App, and surfacing them intelligently on the phone desktop, on smartwatches, and even through earbuds.

Over the past year, Pantanal has promoted the implementation of "ubiquitous services" for application scenarios such as travel, takeout, navigation, and express delivery. The first batch of supported Apps includes Alipay and Meituan, and ColorOS 14 now supports Xiaohongshu, Ctrip, and Qunar.

Through ubiquitous services, users can access services more flexibly and conveniently, reducing unnecessary operational steps.
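To make "atomized services" concrete, one can picture each decoupled service as a small, self-describing card that the system can surface anywhere. The sketch below is purely illustrative; the field names and values are assumptions, not Pantanal's actual schema.

```python
from dataclasses import dataclass

@dataclass
class AtomizedService:
    """A hypothetical 'atomized service' card, decoupled from its App."""
    app: str         # owning App, e.g. "Meituan"
    scenario: str    # service scenario, e.g. "food_delivery"
    status: str      # live state shown on the desktop, watch, etc.
    surfaces: tuple  # devices/surfaces the card can flow to

# A delivery-tracking card the system can surface without opening the App.
order_card = AtomizedService(
    app="Meituan",
    scenario="food_delivery",
    status="Rider is 500 m away",
    surfaces=("phone_desktop", "smartwatch"),
)
print(order_card.status)
```

The point of the structure is that the status line and its target surfaces live outside the App, so the system, not the App, decides where and when to show it.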

This is not something only OPPO is doing. Apple's iOS does similar things with Live Activities and widgets. Decoupling "services" from Apps at the system level, making them more flexible to arrange, and using the content and priority of on-phone information more efficiently to reduce users' cognitive and operational load has become an industry trend.

At the same time, letting services live outside the App is not enough. The files and data inside Apps are also fragmented across system ecosystems: an iPhone can open a Keynote file received on WeChat, but an Android phone cannot. This kind of fragmentation between the Android and Apple ecosystems happens every day.

This is gradually being addressed through "intelligent cross-device." Using the OPPO account, Pantanal breaks down data barriers, aggregates data, and enables cross-device calls, so that services and data are no longer confined to the phone but flow across multiple devices and systems: smartwatches, tablets, computers, earbuds, televisions, cars… always within reach.

For example, at last month's OPPO Find N3 launch event, files in Apple's office formats could be opened quickly on an Android phone, without installing any third-party application. And with PhoneLink on ColorOS 14, users can operate mobile Apps directly from Windows, access the phone's photo gallery, and transfer files.

Decoupling the basic organizational unit of phone functions from "App" within the system to "services," allowing it to flexibly flow between multiple devices, will bring about profound changes.

Because with the enrichment of ubiquitous service scenarios and the increase in devices that intelligent cross-device can flow to, another problem will quickly arise: how to achieve accurate and intelligent recommendations, allowing users to conveniently access them?

This requires the innovation and upgrade of the phone's interface, interaction, and machine learning models for intelligent recommendations.

The entry point for large models has emerged.

02 Large Models, Not Just "Large"

In the past year, the core keyword for the development of large models has been "large."

Parameter scale is the core reason large models display such astonishing intelligence; as the saying goes, "with enough brute force, miracles happen." The larger the parameter scale and the deeper the layers, the more finely a large model can understand data, and the more realistic the results it can fit.

If intelligence is likened to sound, human intelligence is a continuous analog signal, while AI is like a digital recording: the sampling rate determines the fidelity. The higher the sampling rate, the closer the playback is to natural sound, until at some point the human ear can no longer tell the difference. Likewise, with a large enough parameter scale, AI can "fool" humans.

This is why manufacturers that have announced on-device large models have, to varying degrees, been met with public skepticism. Many believe that models small enough to run on a phone lack sufficient parameters, and that deploying models across multiple devices would only create fragmentation.

If the goal of applying large models is simply an artificial intelligence that can pass the Turing test and knows everything under the sun, then yes, parameters decide everything.

In practice, however, a model's usefulness is not judged by parameter count. Ultimately, every model exists to simulate some slice of reality, to infer and fit reasonable, correct results, and to meet actual needs; "bigger is better" is not a law.

On this issue, smartphone manufacturers have a deeper understanding, with a typical example being the voice assistant.

The core model behind a voice assistant maps the waveform of the user's speech to natural language. For early smartphones, the complexity and compute demands of this model were too high for most mobile chips. So early voice assistants uploaded the waveform of a voice command to the cloud, where a server-side model recognized the command and sent the result back to the phone for execution.

But as smartphone NPUs (neural processing units) improved and speech recognition models were slimmed down, manufacturers found they could deploy the model locally and run it directly on the NPU. That brought concrete benefits: faster responses, offline availability, and better privacy and security.

Google was the first to shrink the recognition model of Google Assistant to a size of 500MB and deploy it locally on smartphones.

At present, large model applications are focused mainly on generative AI and have not yet entered more down-to-earth, complex scenarios. When they truly touch the nitty-gritty of user needs, more issues will surface, the two most critical being "data security" and "response speed."

From this perspective, it is not difficult to understand why OPPO, when launching AndesGPT, so confidently adopted the "edge-cloud collaboration" technology architecture.

From a billion parameters to tens of billions of parameters, OPPO plans to deploy a series of large models with different parameter scales, balancing response speed and security, while also seeking to improve the upper limit of large model capabilities.

Through edge-cloud collaboration, AndesGPT intelligently dispatches to different models based on a graded understanding of the user's command and the task's requirements. If a command is as simple as finding a contact on the phone, the edge-side model responds quickly; if the user asks for more complex knowledge, the cloud-side large model generates a more elaborate and accurate answer.
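The dispatch described above can be sketched as a simple router. This is a hypothetical illustration of the edge-cloud idea, using a crude keyword heuristic; OPPO has not published AndesGPT's actual dispatch logic.

```python
def route_command(command, edge_model, cloud_model):
    """Send simple on-device intents to the small edge model
    (fast, offline-capable, private); everything else goes to
    the larger cloud model."""
    simple_prefixes = ("find contact", "set alarm", "open app")
    if command.lower().startswith(simple_prefixes):
        return edge_model(command)
    return cloud_model(command)

# Stubs standing in for real model endpoints.
edge = lambda c: f"[edge] {c}"
cloud = lambda c: f"[cloud] {c}"

print(route_command("Find contact Alice", edge, cloud))            # → [edge] Find contact Alice
print(route_command("Plan a three-day trip to Chengdu", edge, cloud))  # → [cloud] Plan a three-day trip to Chengdu
```

In a real system the "graded understanding" would itself come from a model rather than prefix matching, but the shape of the trade, latency and privacy on the edge versus capability in the cloud, is the same.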

AndesGPT's capabilities span a wide range, from smart summarization and smart object removal (image eraser) to semantics-based multimodal search, along with a deepening understanding and memory of each user's personal habits…

According to internal sources at OPPO, these capabilities will also be supported on the edge side. The company has already run a 13-billion-parameter model on-device, pushing past the previous ceiling for edge-side large models. Combined with the billion-scale user base of smartphones, these capabilities open up a broad future for large model applications.

03 The Second Revolution of Smartphones

From Pantanal to AndesGPT, OPPO has demonstrated its determination to drive change.

At ODC2023, OPPO also announced plans to introduce "conversational interaction" into various system applications to simplify the user's smartphone experience.

"Conversational interaction" easily brings to mind the voice-first trend around 2017. At that time, many people believed that smart speakers and voice assistants would become the key to the next generation of human-computer interaction. However, the trend quickly swept through and then quickly declined.

A major reason is that those earlier voice assistants, whether on smart speakers or on phones, had limited semantic understanding and limited access to data and services. In the end, what they could actually do was a drop in the bucket compared with the phone itself.

But this time, the change revolves around the system, applications, and services, relying on deep integration with large models.

From the ColorOS roadmap, two threads are visible. On one hand, Pantanal's decoupling of the system brings ubiquitous services and intelligent cross-device, letting users easily reach functions outside the App. On the other, AndesGPT drives "conversational interaction" built on natural-language understanding of user needs.

OPPO's Vice President of Software Engineering, Li Jie, stated in an interview that OPPO hopes to provide users with a product similar to a "super assistant" through AndesGPT.

This is where the advantage of using large models on smartphones lies.

On one hand, a phone can use local data to understand its user, much of it the data most closely tied to the user's private life. With authorization, this data can directly become the "context" of a prompt. On the other hand, a phone can use its local interfaces and modules to reach more App functions.

A simple example: when a user asks a chatbot "What should I eat tonight?", a cloud-deployed chatbot, unless the prompt spells it out, knows nothing of the user's region, background, taste, or nutritional preferences, and has a harder time reaching relevant services. It will most likely return generic food recommendations, and no increase in parameter count will fix that.

But a large model deployed on the phone may not need huge parameters to make recommendations based on the user's location, the time, order history, even exercise and health data. The recommendation could be a recipe, a link in a review App, or even a directly generated food-delivery order that the user confirms with a single tap.
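At its simplest, the difference is just prompt construction: with the user's consent, the phone prepends local context to the question before any model sees it. The field names and values below are invented for illustration, not any real AndesGPT interface.

```python
def build_prompt(question, device_context):
    """Prepend authorized on-device data to the user's question so
    even a modestly sized model can answer concretely."""
    ctx = "\n".join(f"- {key}: {value}" for key, value in device_context.items())
    return (
        "User context (on-device, shared with consent):\n"
        f"{ctx}\n\nQuestion: {question}"
    )

# Hypothetical local signals the phone already holds.
local_context = {
    "location": "Shenzhen, near Nanshan",
    "time": "19:30, weekday",
    "recent orders": "Sichuan hotpot, rice noodles",
    "activity": "10,000 steps today",
}
prompt = build_prompt("What should I eat tonight?", local_context)
print(prompt)
```

The cloud chatbot in the example above sees only the bare question; the on-device assistant sees the enriched prompt, which is why it can answer with a specific restaurant or order rather than a generic list.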

This is just one simple scenario, and there are many like it. The closer to users' daily lives, and the simpler and more specific the need, the better and more seamless the help a phone combined with a large model can offer.

With large models, using a smartphone will no longer be something that has to be "learned"; users simply state their needs in natural language. Building on the integration of Pantanal and large models, OPPO has taken the first step with the "Smart Assistant" in ColorOS 14. It understands needs through "conversation" and helps users complete the complex settings they touch most often in daily life, upgrading traditional interaction into a more intelligent, more convenient conversational one.

Its logic runs in two stages: first, Pantanal bundles the phone's many scattered settings into "atomized capabilities" oriented toward user needs; then AndesGPT interprets the user's request, matches it to the corresponding capability, and completes the setting. The settings function alone covers nearly 400 items.
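That two-stage pipeline, intent understanding followed by capability matching, can be caricatured with a keyword-overlap matcher. In ColorOS the matching is presumably done by the AndesGPT model itself; the capability names and trigger words below are made up for the sketch.

```python
# Hypothetical "atomized capabilities" with trigger keywords.
CAPABILITIES = {
    "enable_dark_mode": {"dark", "night", "theme", "mode"},
    "enable_do_not_disturb": {"disturb", "quiet", "silence", "mute"},
    "increase_font_size": {"font", "text", "bigger", "larger", "size"},
}

def match_capability(request: str) -> str:
    """Pick the capability whose keywords best overlap the request."""
    words = set(request.lower().replace(",", " ").split())
    return max(CAPABILITIES, key=lambda cap: len(words & CAPABILITIES[cap]))

print(match_capability("please turn on dark mode at night"))  # → enable_dark_mode
```

With nearly 400 settings items, the value of the real system is precisely that the user never has to know a capability's name; a model maps free-form language onto it.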

So, this revolution is not only a transition from "interface interaction" to "conversation interaction," but also a transition from "users learning to use computers" to "computers actively understanding user needs."

OPPO has already taken the first step in this matter.

