With only 2.7 billion parameters, it rivals models up to 25 times larger! Microsoft releases Phi-2

巴比特
2 years ago

Source: AIGC Open Community


Image Source: Generated by Wujie AI

On December 13, Microsoft announced on its official website the release of Phi-2, a language model with 2.7 billion parameters.

Phi-2 builds on Microsoft's Phi-1.5 and can generate text and code, summarize text, and perform mathematical reasoning, among other tasks.

Although Phi-2 has few parameters, it outperforms Llama-2 with 13 billion parameters, Mistral with 7 billion parameters, and Google's newly released Gemini Nano 2.

Notably, Phi-2 is only a base model: it has undergone neither reinforcement learning from human feedback (RLHF) nor instruction fine-tuning. Even so, across multiple task evaluations it matches or exceeds models with up to 25 times as many parameters.

So far, Microsoft has open-sourced Phi-1.5 and Phi-1 to help developers study and apply small-parameter models in depth.
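To put the "25 times" figure in context, the short sketch below divides the nominal parameter counts quoted in this article by Phi-2's 2.7 billion. Reading the 25x claim as a comparison with the 70-billion-parameter Llama-2 is an assumption made here for illustration, and the counts are the rounded figures from the article, not exact model sizes.

```python
# Nominal parameter counts in billions, as quoted in the article (rounded).
PHI2_BILLIONS = 2.7
OTHER_MODELS = {
    "Mistral (7B)": 7.0,
    "Llama-2 (13B)": 13.0,
    "Llama-2 (70B)": 70.0,
}


def size_ratio(other_billions, base_billions=PHI2_BILLIONS):
    """Return how many times larger another model is than the base model."""
    return other_billions / base_billions


# Ratio of each model's size to Phi-2, rounded to one decimal place.
ratios = {name: round(size_ratio(size), 1) for name, size in OTHER_MODELS.items()}

for name, ratio in ratios.items():
    print(f"{name} is about {ratio}x the size of Phi-2")
```

Under these rounded figures, only the 70-billion-parameter model comes close to the 25x ratio; the 7B and 13B models are roughly 2.6x and 4.8x larger.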

Phi-1.5 open source address: https://huggingface.co/microsoft/phi-1_5

Phi-1 open source address: https://huggingface.co/microsoft/phi-1

Phi-1.5 paper address: https://arxiv.org/abs/2309.05463


A curious trend in the large-model field is that parameter counts keep growing. Models with hundreds of billions of parameters are treated as entry-level, many models exceed a trillion parameters, and some are reported to have reached tens of trillions.

Large-parameter models are not necessarily bad; it depends on the application scenario. For foundation-model providers such as Microsoft, OpenAI, Baidu, and iFlytek, more parameters mean broader coverage. ChatGPT, for example, has become multimodal: it can generate images as well as text, and it can understand audio.


Phi-2 Evaluation Data

However, large-parameter models also have many drawbacks. Overfitting: if the training data is poor, the model's performance may not improve and could even decline. High computational cost: each user query consumes significant resources. Long pre-training time: each model iteration requires substantial training time.

Difficult tuning: a large model has a huge number of neurons that are hard to control, which makes it challenging to tune or constrain individual capabilities. The recent reports of GPT-4 becoming "lazy" are a prime example.

For these reasons, Microsoft develops the Phi series primarily for research: to explore how small models can match or even surpass large ones while retaining their capabilities, which would benefit both enterprises and users.

Brief Introduction to Phi-2

Phi-2 follows the same Transformer design as Phi-1.5, which uses 24 layers with a dimension of 64 per attention head, scaled up to 2.7 billion parameters, and it uses techniques such as rotary position embeddings to improve model performance.

Phi-2 is only a base model, without reinforcement learning from human feedback or instruction fine-tuning. Nevertheless, in text generation, mathematical reasoning, and coding, it holds its own against much larger models and in some cases outperforms them.
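For readers unfamiliar with rotary embeddings, here is a minimal pure-Python sketch of the idea: each consecutive pair of values in a query or key vector is rotated by an angle proportional to the token position, so the attention score between two tokens depends only on their relative distance. The function names, the base of 10000, and the interleaved pairing are illustrative assumptions; this is not Microsoft's implementation, which may pair dimensions differently or rotate only part of each head.

```python
import math

HEAD_DIM = 64  # per-head dimension quoted in the article


def rotary_embed(vec, position, base=10000.0):
    """Rotate each consecutive pair (vec[i], vec[i+1]) by a position-dependent angle.

    Hypothetical helper for illustration; `base` is an assumed default.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    rotated = []
    for i in range(0, dim, 2):
        # Lower pair indices rotate quickly, higher ones slowly.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        rotated.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return rotated


def dot(a, b):
    """Plain dot product, standing in for an unscaled attention score."""
    return sum(x * y for x, y in zip(a, b))


# Deterministic toy query and key vectors.
query = [math.sin(i) for i in range(HEAD_DIM)]
key = [math.cos(0.5 * i) for i in range(HEAD_DIM)]

# Both position pairs are 3 tokens apart, so the two scores should agree.
score_near_start = dot(rotary_embed(query, 5), rotary_embed(key, 2))
score_further_on = dot(rotary_embed(query, 13), rotary_embed(key, 10))
print(abs(score_near_start - score_further_on) < 1e-9)
```

Because every pair is rotated rather than scaled, the vector's length is unchanged; only the relative angle between query and key carries the position information.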


As for training data and process, Phi-2 was pre-trained on 1.4T tokens of high-quality, "textbook-quality" data rather than on unfiltered or opaque web-crawl data. Microsoft says this is one of the key reasons a small-parameter model can outperform larger ones.

Phi-2 was trained for a total of 14 days on 96 A100 GPUs.
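As a rough back-of-the-envelope check on these figures, the sketch below converts the training run into GPU-hours and relates the data volume to the model size. It assumes that "1.4T" refers to training tokens and that all 96 GPUs ran for the full 14 days; both are simplifications made for illustration.

```python
# Figures quoted in the article.
NUM_GPUS = 96
TRAINING_DAYS = 14
TRAINING_TOKENS = 1.4e12   # assumption: "1.4T" means tokens
MODEL_PARAMS = 2.7e9

# Total compute budget, assuming every GPU ran for the whole period.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24

# Training tokens seen per model parameter.
tokens_per_param = TRAINING_TOKENS / MODEL_PARAMS

print(f"GPU-hours: {gpu_hours}")
print(f"Tokens per parameter: {tokens_per_param:.1f}")
```

This works out to 32,256 A100 GPU-hours and roughly 519 training tokens per parameter, which illustrates how heavily the model leans on data volume and quality instead of parameter count.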

Phi-2 Experimental Data

Microsoft evaluated Phi-2 on mainstream benchmarks such as MMLU, BBH, PIQA, WinoGrande, ARC (Easy and Challenge), SIQA, and GSM8K.


The data show that Phi-2 outperforms Mistral-7B and Llama-2-13B on various aggregate benchmarks.

Notably, on multi-step reasoning tasks such as coding and mathematics, Phi-2 outperforms the 70-billion-parameter Llama-2.

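Disclaimer: This article represents only the author's personal views and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If an article or image published on this page involves infringement, please send proof of rights and proof of identity to support@aicoin.com, and the platform's staff will review it.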
