Source: Duo Jing
Image Source: Generated by Wujie AI
Recently, at the 6th World Voice Expo and the 2023 iFLYTEK Global 1024 Developer Festival, iFLYTEK Chairman Liu Qingfeng announced the official release of the Spark Cognitive Large Model V3.0, which is positioned as a direct competitor to GPT-3.5. Spark V3.0 also upgrades its heuristic dialogue and AI character functions, aiming to give users a more personalized AI assistant.
Whenever major companies make "big moves" with large models, ChatGPT is always "taken out for a spin." Education is considered one of the "smoothest" scenarios for deploying large models, so can the Spark Cognitive Large Model V3.0 compete with GPT-3.5 here? The author selected the Chinese language questions from the 2022 National Unified Entrance Examination for Ordinary Higher Education Institutions (Volume A) to probe the Chinese proficiency of both models. In addition, a model that wants to be a "personalized AI assistant" for teachers also needs a certain level of instructional design ability, so that is tested as well.
To evaluate Chinese proficiency, a college entrance exam paper may serve as a good touchstone. When GPT-3.5 was first released, some people tested it with papers from different subjects, and it performed well. Today, let's use a college entrance exam paper to compare the two models' knowledge base, logical reasoning, and writing ability.
(1) Cultural Knowledge
In the "Book of Songs · Wei Feng · Mang," the male and female protagonists share a happy past; the lines "______, ______" describe their joyful time together in childhood.
An answer of "Although nine deaths, yet not regretful" does not seem quite right here. For this question, iFLYTEK Spark gave the correct answer directly.
(2) Classical Chinese Reading
The excerpt is from "Strategies of the Warring States · Qin Strategy II."
Please segment this long sentence and explain its meaning in modern language.
不榖不烦一兵不伤一人而得商于之地六百里寡人自以为智矣诸士大夫皆贺子独不贺何也
The correct answer is: B. 不榖不烦一兵/不伤一人/而得商于之地六百里/寡人自以为智矣/诸士大夫皆贺/子独不贺/何也.
iFLYTEK Spark placed 3 of the breaks correctly, while GPT-3.5 got only 1 right. Moreover, in translating "子独不贺" (you alone do not congratulate me), iFLYTEK Spark correctly identified the subject and object despite the lack of context, and produced a sound translation of the whole sentence.
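The article does not say exactly how the "places" were counted. One plausible reading is that a break counts as correct when it falls at the same character position as a break in the reference answer. The sketch below illustrates that assumption only. The function names are made up for this example, and `HYPOTHETICAL_CANDIDATE` is an invented segmentation, not the actual output of either model.

```python
from typing import Set

SEP = "/"

def break_positions(segmented: str, sep: str = SEP) -> Set[int]:
    """Return the character offsets (in the unsegmented text) where a break occurs."""
    positions = set()
    offset = 0
    for ch in segmented:
        if ch == sep:
            positions.add(offset)  # a break sits after `offset` characters
        else:
            offset += 1
    return positions

def count_correct_breaks(reference: str, candidate: str, sep: str = SEP) -> int:
    """Count candidate breaks that coincide with breaks in the reference answer."""
    if reference.replace(sep, "") != candidate.replace(sep, ""):
        raise ValueError("candidate and reference must segment the same text")
    return len(break_positions(reference, sep) & break_positions(candidate, sep))

# Reference answer (option B) quoted in the article.
REFERENCE = "不榖不烦一兵/不伤一人/而得商于之地六百里/寡人自以为智矣/诸士大夫皆贺/子独不贺/何也"
# Invented for illustration only; NOT a real model response.
HYPOTHETICAL_CANDIDATE = "不榖不烦一兵/不伤一人而得商于之地/六百里寡人自以为智矣/诸士大夫皆贺子独不贺/何也"

correct = count_correct_breaks(REFERENCE, HYPOTHETICAL_CANDIDATE)
total = len(break_positions(REFERENCE))
print(f"{correct} of {total} reference breaks matched")
```

Counting matches against the reference rewards correct breaks but does not penalize extra, wrong ones; a stricter comparison would also report how many candidate breaks have no counterpart in the reference.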
(3) Essay Writing
Topic:
In "Dream of the Red Chamber," there is a scene in which, after the completion of the Grand View Garden built for Lady Yuan's (Jia Yuanchun's) visit home to her family, the group proposes names for the pavilion on a bridge in the garden. Some suggested taking the word "翼然" from Ouyang Xiu's "Record of the Pavilion of the Drunken Old Man," while Jia Zheng believed that "This pavilion is built on water," and suggested taking the word "泻" from "flowing out between the two peaks." Jia Baoyu felt that "沁芳" was more elegant, and Jia Zheng silently agreed. "沁芳" captures the beautiful scenery of flowers and trees reflected in the water and avoids cliché; it also fits the occasion of Lady Yuan's visit home, with subtle and thoughtful implications.
In the material above, the names for the pavilion were arrived at in different ways: by direct reuse, by borrowing and adapting, or by original creation suited to the setting, with different artistic effects. This phenomenon can also provide inspiration in a broader context and prompt deeper reflection. Please write an essay of about 800 words, combining your own learning and life experiences.
At first glance, both essays share a common flaw as college entrance exam responses. The task is not a material analysis question, yet both spend a large part of the opening analyzing the material, which is not the best choice of essay structure or writing strategy.
Looking at the logical flow, both large models pick up the progression "from borrowing to innovation," but they frame the relationship between the two quite differently. Spark argues that "borrowing and innovation do not exist in isolation, but rather promote and integrate with each other," while GPT suggests that "balancing borrowing and originality is necessary." Judging from exam experience, the former is more likely to be favored by examiners, while the latter may drift from the main point.
Considering the use of materials, compared to exam essays, the breadth and depth of material usage in both essays seem insufficient. However, compared to GPT's purely logical reasoning, Spark quoted two ancient poems, gaining an advantage.
Finally, in terms of thematic sublimation, had Spark's essay extended beyond the perspective of artistic creation, its "integration of borrowing and innovation" would have "precisely targeted" the core theme of the essay. Unfortunately, the entire essay discusses only artistic creation. In this regard, GPT's essay, which moves from learning and life to entrepreneurship, markets, and education policy, appears broader and more expansive.
In conclusion, the writing of this essay shows both strengths and weaknesses in the two large models, but neither reaches the level of "excellence."
By taking on different roles, large models can be both "respondents" and "questioners." This also means that their diverse, comprehensive abilities provide good support for building personalized AI teaching assistants for teachers.
One of the most crucial abilities here is control of the overall teaching process. The author therefore asked both large models to produce instructional designs for Chinese, mathematics, and physics lessons, and after several attempts found little difference between them: the design process was complete, but only loosely tied to the actual lesson content. To draw out a clearer contrast, the author then chose a biology lesson on the human immune system.
Interestingly, the Spark large model proposed a specific experimental design, which, compared to GPT's response, took into account the practical nature of biology. However, as far as the author can remember, no similar experiment appears in high school textbooks. Based on experience, it takes several hours for bacteria to grow into visible colonies. The experimental design aims to let students clearly observe the effect of the reagent, but as written it does not hold up. This also shows that the responses of large models can contain "hallucinations."
In comparison, GPT's course design is more "valuable": its content is more comprehensive, not only introducing the course material itself but also guiding students to think about the relationship between vaccines and human society.
Beyond instructional design, preparing courseware is another headache for teachers. Since GPT-3.5 can only generate text, this task was left to the Spark large model alone.
In the table of contents, the knowledge points about the human immune system are clearly listed. In the courseware itself, the presentation of knowledge points and the emphasis on key points are reasonably clear. However, on the page about "natural killer cells and antiviral function," some stray words disrupt the text, and the knowledge points before and after it are partly repetitive and redundant. As for illustrations, the mismatch between images and text is quite obvious: the pictures vary widely in style, theme, and the professions depicted, and none resemble the examples found in biology textbooks.
Of course, when choosing other subjects, the contradictions and problems will not be so sharp. For example, writing an introduction to a certain fruit or animal would reduce the sense of dissonance. However, these issues also reflect the expectations for future AI educational tools. Perhaps, at present, if a teacher needs to create presentation slides, AI would not be the first choice.
Similarly, at the recent Baidu World Conference 2023, Baidu officially announced the release of Wenxin 4.0. Baidu's founder, chairman, and CEO Robin Li said that this is the most powerful Wenxin large model to date, with a comprehensive upgrade of the base model and significant improvements in understanding, generation, logic, and memory. In Robin Li's words, the overall level of the Wenxin large model 4.0 is no lower than that of GPT-4.
Less than ten days after the Baidu conference, the Spark Cognitive Large Model V3.0 was officially released, positioned directly against GPT-3.5.
Earlier today, the DoNews official account published an article titled "Aiming at GPT-4, How Strong is Baidu Wenxin 4.0?" The article compared the Wenxin large model 4.0 with the still-free GPT-3.5 in the Chinese domain, using the industry's common dimensions of language understanding, reasoning, generation, and memory, together with past questions from the "Administrative Aptitude Test" of the national civil service exam. According to the evaluation results, the overall level of the Wenxin large model 4.0 is superior to GPT-3.5, with surprisingly strong performance in understanding and generation in particular.
After comparing several questions, it can indeed be seen that in terms of Chinese output, the accuracy of the Spark Cognitive Large Model V3.0 is higher, and its overall performance is superior. However, the evaluation questions are limited, and a more comprehensive judgment is still needed.
Since March of this year, every time a large model is released, ChatGPT is "taken out for a spin" and compared along various dimensions. But going back to what these models fundamentally are, understanding has always been one of the core demands of large model users. On this point, the fit between education and large models is widely recognized. As a result, over the past year and a half there have been many efforts to combine large models with education, including intelligent hardware devices equipped with large models, the integration of large models into online learning platforms, and the development of mathematical large models by companies such as Xueersi.
On the one hand, the promotion of fair and inclusive education and the increasing demand for personalized learning require technology, which is the remedy for these pain points. On the other hand, the education industry's capital investment has been dormant for a long time, and AI+ education carries too many expectations.
Behind this reputation, however, some concerns have also emerged.
Since ChatGPT took off earlier this year, the battle of the large models has begun both domestically and internationally. According to related data, as of October 23, the number of large models in China had reached 130, surpassing the 114 in the United States and ranking first in the world. The "battle of the large models" is no longer rhetorical exaggeration, but an objective reality.
On the consumer end, the various large models keep exploring what their applications could look like across different scenarios. By comparison, the four functions shown on the GPT-3.5 page seem somewhat simple.
However, as AI drawing and AI chatbot applications with similar appearances and functions flood the market, their novelty gradually wears off, leaving them with more entertainment value than professional value.
Currently, the application of large models has expanded from the consumer end to the business end. Various companies have launched "large model stores" for enterprises to ease the pressure of high research and development costs on the business side. However, because ecosystems still need to be built and users still need time to become familiar with the products, it may be too early for these companies to talk about making money from large models.
Perhaps, domestic large models do not necessarily need to be compared with GPT. The one that can gain higher "retention" in the fierce market competition and truly achieve practical application in various scenarios will ultimately prevail.