Code Llama officially open-sourced: free for commercial use, with a mysterious version approaching GPT-4

巴比特 (Babbitt)

Source: Synced

Editors: Du Wei, Chen Ping

Today, Meta's open-source Llama model family welcomed a new member: Code Llama, a foundation model specialized in code generation.

As the code-specialized version of Llama 2, Code Llama was created by further training Llama 2 on code-specific datasets.

Meta stated that Code Llama is covered by the same open-source license as Llama 2: free for both research and commercial use.

The accompanying paper, "Code Llama: Open Foundation Models for Code," has also been released; it runs 47 pages and lists 25 authors.

Paper: Code Llama: Open Foundation Models for Code (arXiv:2308.12950)

GitHub: https://github.com/facebookresearch/codellama

The Code Llama series comes in three sizes, with 7B, 13B, and 34B parameters. It supports many popular programming languages, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash.

Code Llama stably supports generation with contexts of up to 100,000 tokens. The figure below shows Code Llama's fine-tuning pipeline.

In terms of results, every version of Code Llama surpasses GPT-3.5 in single-attempt pass rate (pass@1) on the HumanEval and MBPP benchmarks.

In addition, the "Unnatural" 34B version of Code Llama achieves a pass@1 on HumanEval close to GPT-4's (62.2% vs. 67.0%). Meta has not released this version, but it obtained a significant performance boost from additional training on a small amount of high-quality coding data.

This special version has drawn attention from many, including Andrej Karpathy, Tesla's former AI director who has since returned to OpenAI.

Although the paper describes it as "the 34B version of Code Llama-Python fine-tuned on 15,000 unnatural instructions," Karpathy remains very curious about this model with its "mysterious name, vague description" that quietly crushes all the others.

How Code Llama Works

Code Llama has strong coding capabilities: it can generate code from both code and natural-language prompts (for example, the user prompt "help me write a function that outputs the Fibonacci sequence"), and it can also assist with code completion and debugging.
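To make this concrete, here is a minimal sketch of prompting a Code Llama base model through the Hugging Face transformers library. The checkpoint name codellama/CodeLlama-7b-hf and the comment-plus-signature prompt style are assumptions based on the public Hugging Face release, not details from Meta's announcement.

```python
# Minimal sketch: code generation with a Code Llama base model via
# Hugging Face transformers. Checkpoint name is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed Hugging Face checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Natural-language intent expressed as a comment plus a function signature,
# a prompt style the base (non-Instruct) model continues naturally.
prompt = '''# Return the first n numbers of the Fibonacci sequence.
def fibonacci(n):
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```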

All three sizes of Code Llama were trained on 500B tokens of code and code-related data. The 7B and 13B base and instruct models were additionally trained with fill-in-the-middle (FIM), allowing them to insert new code into existing code, which means they support out-of-the-box code-completion tasks.
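As an illustration of FIM-style completion, the sketch below uses the <FILL_ME> sentinel that the transformers CodeLlamaTokenizer expands into the prefix/suffix/middle infilling tokens described in the paper; the sentinel and checkpoint name are assumptions about the Hugging Face integration rather than details from this announcement.

```python
# Sketch of fill-in-the-middle (FIM) completion. The "<FILL_ME>" sentinel
# is an assumption based on the transformers CodeLlamaTokenizer, which
# rewrites it into the infilling tokens the 7B/13B models were trained with.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The model sees the code before and after the gap and generates the middle:
# here, the missing docstring of an otherwise complete function.
prompt = '''def remove_non_ascii(s: str) -> str:
    """<FILL_ME>"""
    return "".join(c for c in s if ord(c) < 128)
'''

input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)
generated = model.generate(input_ids, max_new_tokens=64)
# Keep only the newly generated tokens, which form the missing middle.
filling = tokenizer.decode(generated[0, input_ids.shape[1]:], skip_special_tokens=True)
print(prompt.replace("<FILL_ME>", filling))
```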

The table below shows the training dataset for Code Llama.

These three sizes address different serving and latency requirements. The 7B model, for example, can run on a single GPU. The 34B model returns the best results and provides better coding assistance, but the smaller 7B and 13B models are faster and better suited to low-latency tasks such as real-time code completion.

Beyond providing stable generation over contexts of up to 100,000 tokens, all Code Llama models were trained on sequences of 16,000 tokens.

Besides being a prerequisite for generating longer programs, a longer input sequence also unlocks new use cases for Code Llama. For example, users can give the model more context from their codebase so that the generated code is more relevant.
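As a sketch of that workflow, the snippet below packs repository files into the prompt under a fixed token budget before asking for a completion. The my_project path, the *.py filter, and the 16,000-token budget are illustrative assumptions, not part of Meta's release.

```python
# Sketch: concatenate relevant repository files ahead of the completion
# target so the model can condition on project-wide definitions.
from pathlib import Path
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
budget = 16_000  # tokens; matches the training sequence length

context_parts, used = [], 0
for path in sorted(Path("my_project").rglob("*.py")):  # hypothetical repo
    text = f"# file: {path}\n{path.read_text()}\n"
    n_tokens = len(tokenizer(text)["input_ids"])
    if used + n_tokens > budget:
        break  # stop once the context budget is exhausted
    context_parts.append(text)
    used += n_tokens

# Repository context first, then the code we actually want completed.
prompt = "".join(context_parts) + "# Now implement the CLI entry point:\ndef main():\n"
```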

It is worth mentioning that Meta has further fine-tuned two additional variants of Code Llama: Code Llama-Python and Code Llama-Instruct.

Code Llama-Python is a variant of Code Llama, further fine-tuned on 100B tokens of Python code. The table below shows the training dataset for Code Llama-Python.

Code Llama-Instruct is an instruction fine-tuned and aligned variant of Code Llama that better understands input prompts. Meta recommends using the Code Llama-Instruct variant for code generation, since it has been tuned to produce helpful and safe answers in natural language.
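A hedged sketch of calling the Instruct variant: the [INST] ... [/INST] wrapper follows the Llama 2 chat convention that Code Llama-Instruct inherits, and the checkpoint name codellama/CodeLlama-7b-Instruct-hf is an assumption based on the Hugging Face release.

```python
# Sketch: instruction-following code generation with Code Llama-Instruct.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-Instruct-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama 2-style chat wrapper (assumed to carry over to the Instruct variant).
user_request = "Write a Python function that checks whether a string is a palindrome."
prompt = f"[INST] {user_request} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```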

Meta stated that they do not recommend using Code Llama or Code Llama-Python for general natural language tasks, as these two models are not designed to follow natural language instructions. Code Llama is specifically designed for code-specific tasks and is not suitable as a base model for other tasks.

When using Code Llama models, users must comply with the licensing and usage policies.

Performance of Code Llama

Meta tested with two popular coding benchmarks: HumanEval and MBPP (Mostly Basic Python Programming). HumanEval tests the model's ability to complete code from docstrings, while MBPP tests its ability to write code from a description.
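For reference, pass@1 comes from the standard pass@k estimator introduced with HumanEval (Chen et al., 2021): generate n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k drawn samples passes. A small sketch:

```python
# Unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021).
# With one sample per problem, pass@1 reduces to the fraction of problems solved.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples per problem, c = samples that passed, k <= n."""
    if n - c < k:  # every size-k subset must contain a passing sample
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples for one problem, 13 of them passed.
print(pass_at_k(n=200, c=13, k=1))    # 0.065, i.e. c/n
print(pass_at_k(n=200, c=13, k=100))  # near 1.0 with many attempts
```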

The results show that Code Llama outperforms open-source, code-specific LLMs, and even surpasses its own base model, Llama 2. For example, Code Llama 34B scored 53.7% on HumanEval and 56.2% on MBPP, the highest among state-of-the-art open solutions and on par with ChatGPT.

However, Code Llama also carries risks. Meta stated that building responsible AI models is crucial and that it took many safety measures before releasing Code Llama. As part of its red-teaming work, Meta quantitatively assessed the risk of Code Llama generating malicious code: it crafted prompts designed to elicit malicious code and compared Code Llama's responses with those of ChatGPT (GPT-3.5 Turbo). The results showed that Code Llama's responses were safer.

Code Llama, it seems, fills the gap left by Llama 2's relatively weak coding ability. Meta hopes that Code Llama's release will inspire other researchers to build innovative new tools for research and commercial products on top of Llama 2.

Reference link: Code Llama: Large Language Model Coding (https://ai.meta.com/blog/code-llama-large-language-model-coding/)

