
Llama (language model): Difference between revisions

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
Revision as of 21:54, 9 November 2023 by Aknatn (talk | contribs; 53 edits). No edit summary. Tags: Visual edit, Mobile edit, Mobile web edit.
Latest revision as of 19:16, 10 January 2025 by David Gerard (talk | contribs; Edit filter managers, Administrators; 213,093 edits). Edit summary: version 2 was announced spelt LLaMa.
(223 intermediate revisions by 90 users not shown)
{{short description|Large language model by Meta AI}}
{{For|the animal|Llama}}
{{Distinguish|LaMDA}}
{{other uses|Llama (disambiguation)}}
{{Infobox software
| title = Llama
| screenshot = Llama chatbot example screenshot.webp
| screenshot_alt = An example of a Llama answer describing Wikipedia in a thoughtful way
| caption = Screenshot of an example of a Llama answer describing Wikipedia
| developer = Meta AI
| released = {{start date and age|2023|2|24}}
| latest release version = Llama 3.3
| latest release date = {{start date and age|2024|12|7}}
| repo = {{URL|https://github.com/meta-llama/llama-models}}
| genre = {{ indented plainlist |
*]
*]
*]
}}
| programming language = ]
| license = ] (Meta Llama 3.2 Community License)<ref>{{cite web|title=llama-models/models/llama3_2/LICENSE at main · meta-llama/llama-models · GitHub|url=https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE|website=GitHub|language=en|access-date=2024-10-20|archive-date=2024-09-29|archive-url=https://web.archive.org/web/20240929030827/https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE|url-status=live}}</ref>
| website = {{url|https://www.llama.com/|llama.com}}
}}


'''Llama''' ('''Large Language Model Meta AI''', formerly stylized as '''LLaMA''') is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023.<ref name="l1arxiv" /><ref name="blog" /> The latest version is Llama 3.3, released in December 2024.<ref>{{Cite web |last=Wiggers |first=Kyle |date=2024-12-06 |title=Meta unveils a new, more efficient Llama model |url=https://techcrunch.com/2024/12/06/meta-unveils-a-new-more-efficient-llama-model/ |access-date=2024-12-25 |website=TechCrunch |language=en-US}}</ref>
For the first version of LLaMA, four model sizes were trained: 7, 13, 33 and 65 billion parameters. LLaMA's developers reported that the 13B parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters) and that the largest model was competitive with state-of-the-art models such as PaLM and Chinchilla.<ref name="l1arxiv" /> Whereas the most powerful LLMs have generally been accessible only through limited APIs (if at all), Meta released LLaMA's model weights to the research community under a noncommercial license.<ref name="blog" /> Within a week of LLaMA's release, its weights were leaked to the public on 4chan via BitTorrent.<ref name="verge-leak" />


Llama models are trained at different parameter sizes, ranging from 1B to 405B parameters.<ref name="llama31blog">{{Cite web |date=July 23, 2024 |title=Introducing Llama 3.1: Our most capable models to date |url=https://ai.meta.com/blog/meta-llama-3-1/ |access-date=2024-07-23 |website=ai.meta.com |language=en |archive-date=2024-07-23 |archive-url=https://web.archive.org/web/20240723153909/https://ai.meta.com/blog/meta-llama-3-1/ |url-status=live }}</ref> Originally, Llama was only available as a foundation model.<ref name="verge-initial-article" /> Starting with Llama 2, Meta AI began releasing instruction fine-tuned versions alongside foundation models.<ref name="llama2blog" />
In July 2023, Meta released several models as Llama 2, using 7, 13 and 70 billion parameters.


Model weights for the first version of Llama were made available to the research community under a non-commercial license, and access was granted on a case-by-case basis.<ref>{{cite web |last1=Malik |first1=Yuvraj |last2=Paul |first2=Katie |title=Meta heats up Big Tech's AI arms race with new language model |url=https://www.reuters.com/technology/meta-launch-ai-language-model-llama-2023-02-24/ |date=25 February 2023 |publisher=Reuters}}</ref><ref name="blog" /> Unauthorized copies of the first model were shared via BitTorrent.<ref name="githubdcma" /> Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use.<ref>{{cite web |last1=David |first1=Emilia |title=Meta's AI research head wants open source licensing to change |url=https://www.theverge.com/2023/10/30/23935587/meta-generative-ai-models-open-source |website=The Verge |language=en |date=30 October 2023 |access-date=20 October 2024 |archive-date=14 September 2024 |archive-url=https://web.archive.org/web/20240914145514/https://www.theverge.com/2023/10/30/23935587/meta-generative-ai-models-open-source |url-status=live }}</ref><ref name="llama2blog" />


Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, as well as a standalone website. Both services use a Llama 3 model.<ref>{{cite web |title=Meet Your New Assistant: Meta AI, Built With Llama 3 |url=https://about.fb.com/news/2024/04/meta-ai-assistant-built-with-llama-3/ |website=Meta |date=18 April 2024 |access-date=20 October 2024 |archive-date=7 October 2024 |archive-url=https://web.archive.org/web/20241007093730/https://about.fb.com/news/2024/04/meta-ai-assistant-built-with-llama-3/ |url-status=live }}</ref>
LLaMA-2 includes both foundational models and models fine-tuned for dialog, called LLaMA-2 Chat. In a further departure from LLaMA-1, all models are released with weights and are free for many commercial use cases. However, due to some remaining restrictions, the description of LLaMA as open source has been disputed by the Open Source Initiative (known for maintaining the Open Source Definition).<ref>{{Cite web |last=Edwards |first=Benj |date=2023-07-18 |title=Meta launches LLaMA-2, a source-available AI model that allows commercial applications |url=https://arstechnica.com/information-technology/2023/07/meta-launches-llama-2-an-open-source-ai-model-that-allows-commercial-applications/ |access-date=2023-08-08 |website=Ars Technica |language=en-us}}</ref>


== Background ==
After the release of large language models such as GPT-3, a focus of research was up-scaling models, which in some instances showed major increases in emergent capabilities.<ref>{{cite web |title=Examining Emergent Abilities in Large Language Models |url=https://hai.stanford.edu/news/examining-emergent-abilities-large-language-models |website=hai.stanford.edu |language=en |date=13 September 2022}}</ref> The release of ChatGPT and its surprise success caused an increase in attention to large language models.<ref>{{cite web |title=The inside story of how ChatGPT was built from the people who made it |url=https://www.technologyreview.com/2023/03/03/1069311/inside-story-oral-history-how-chatgpt-built-openai/ |website=MIT Technology Review |language=en |access-date=2024-10-20 |archive-date=2023-03-03 |archive-url=https://web.archive.org/web/20230303093219/https://www.technologyreview.com/2023/03/03/1069311/inside-story-oral-history-how-chatgpt-built-openai/ |url-status=live }}</ref>

Compared with other responses to ChatGPT, Meta's Chief AI scientist Yann LeCun stated that large language models are best for aiding with writing.<ref>{{cite web |title=ChatGPT is 'not particularly innovative,' and 'nothing revolutionary', says Meta's chief AI scientist |url=https://www.zdnet.com/article/chatgpt-is-not-particularly-innovative-and-nothing-revolutionary-says-metas-chief-ai-scientist/ |website=ZDNET |language=en |access-date= |archive-date=2023-02-17 |first = Tiernan|last = Ray|date = 23 January 2023|archive-url=https://web.archive.org/web/20230217163917/https://www.zdnet.com/article/chatgpt-is-not-particularly-innovative-and-nothing-revolutionary-says-metas-chief-ai-scientist/ |url-status=live }}</ref><ref>{{cite web |last1=Badminton |first1=Nik |title=Meta's Yann LeCun on auto-regressive Large Language Models (LLMs) |url=https://futurist.com/2023/02/13/metas-yann-lecun-thoughts-large-language-models-llms/ |website=Futurist.com |date=13 February 2023 |access-date=20 October 2024 |archive-date=22 July 2024 |archive-url=https://web.archive.org/web/20240722082109/https://futurist.com/2023/02/13/metas-yann-lecun-thoughts-large-language-models-llms/ |url-status=live }}</ref><ref>{{cite web |title=Yann LeCun on LinkedIn: My unwavering opinion on current (auto-regressive) LLMs |url=https://www.linkedin.com/feed/update/urn:li:activity:7030921081876029443/ |website=www.linkedin.com |language=en |access-date=2024-10-20 |archive-date=2024-09-17 |archive-url=https://web.archive.org/web/20240917092533/https://www.linkedin.com/feed/update/urn:li:activity:7030921081876029443/ |url-status=live }}</ref><ref>{{cite web |title=Meta’s Yann LeCun Asks How AIs will Match — and Exceed — Human-level Intelligence |url=https://www.engineering.columbia.edu/about/news/metas-yann-lecun-asks-how-ais-will-match-and-exceed-human-level-intelligence}}</ref>

The Llama series has also served as an empirical investigation of scaling laws. The Llama 3 models showed that when a model is trained on more data than the "Chinchilla-optimal" amount, performance continues to scale log-linearly. For example, the Chinchilla-optimal dataset for Llama 3 8B is 200 billion tokens, but performance continued to scale log-linearly up to the 75-times larger dataset of 15 trillion tokens.<ref name="llama3blog" />
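The figures in this paragraph can be checked with a short calculation. The sketch below uses only the two token counts quoted above and the nominal 8B parameter count; the variable names are illustrative.

```python
# Token counts quoted in the text for Llama 3 8B.
chinchilla_optimal_tokens = 200 * 10**9   # 200 billion tokens
actual_training_tokens = 15 * 10**12      # 15 trillion tokens
nominal_parameters = 8 * 10**9            # "8B" taken at face value

# How far past the Chinchilla-optimal point training was pushed.
overtraining_factor = actual_training_tokens / chinchilla_optimal_tokens

# Tokens per parameter implied by the quoted Chinchilla-optimal figure.
optimal_tokens_per_parameter = chinchilla_optimal_tokens / nominal_parameters

print(f"over-training factor: {overtraining_factor:.0f}x")
print(f"implied optimal tokens per parameter: {optimal_tokens_per_parameter:.0f}")
```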

== Initial release ==
LLaMA was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance.<ref name=l1arxiv/><ref name=blog/> The inference code used to run the model was publicly released under the open-source GPLv3 license.<ref name=repo/> Access to the model's weights was managed by an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world".<ref name=blog/>

Llama was trained only on publicly available information, and at various model sizes, with the intention of making it accessible on different hardware. The model was exclusively a foundation model,<ref name="verge-initial-article" /> although the paper contained examples of instruction fine-tuned versions of the model.<ref name="l1arxiv" />

Meta AI reported that the 13B parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters), and that the largest 65B model was competitive with state-of-the-art models such as PaLM and Chinchilla.<ref name="l1arxiv" />

=== Leak ===
On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spread through online AI communities.<ref name=verge-leak/> That same day, a pull request on the main LLaMA repository was opened, requesting to add the magnet link to the official documentation.<ref name=India-leak>{{cite news |last1=VK |first1=Anirudh |title=Meta's LLaMA Leaked to the Public, Thanks To 4chan |url=https://analyticsindiamag.com/metas-llama-leaked-to-the-public-thanks-to-4chan/ |access-date=17 March 2023 |work=Analytics India Magazine |date=6 March 2023 |archive-date=26 March 2023 |archive-url=https://web.archive.org/web/20230326020443/https://analyticsindiamag.com/metas-llama-leaked-to-the-public-thanks-to-4chan/ |url-status=live }}</ref><ref name="CKing">{{cite web |title=Save bandwidth by using a torrent to distribute more efficiently by ChristopherKing42 · Pull Request #73 · facebookresearch/llama |url=https://github.com/facebookresearch/llama/pull/73 |website=GitHub |access-date=25 March 2023 |language=en |archive-date=10 April 2023 |archive-url=https://web.archive.org/web/20230410000618/https://github.com/facebookresearch/llama/pull/73 |url-status=live }}</ref> On March 4, a pull request was opened to add links to HuggingFace repositories containing the model.<ref>{{cite web |title=Download weights from hugging face to help us save bandwidth by Jainam213 · Pull Request #109 · facebookresearch/llama |url=https://github.com/facebookresearch/llama/pull/109 |website=GitHub |access-date=17 March 2023 |language=en |archive-date=21 March 2023 |archive-url=https://web.archive.org/web/20230321172220/https://github.com/facebookresearch/llama/pull/109 |url-status=live }}</ref><ref name=India-leak/> On March 6, Meta filed takedown requests to remove the HuggingFace repositories linked in the pull request, characterizing it as "unauthorized distribution" of the model. HuggingFace complied with the requests.<ref>{{cite news |last1=Cox |first1=Joseph |title=Facebook's Powerful Large Language Model Leaks Online |url=https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-4chan-llama |access-date=17 March 2023 |work=Vice |date=7 March 2023 |language=en |archive-date=6 April 2023 |archive-url=https://web.archive.org/web/20230406135000/https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-4chan-llama |url-status=live }}</ref> On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.<ref name="githubdcma">{{cite web |author1=OpSec Online LLC |title=github/dmca - Notice of Claimed Infringement via Email |url=https://github.com/github/dmca/blob/master/2023/03/2023-03-21-meta.md |publisher=GitHub |access-date=25 March 2023 |date=21 March 2023 |archive-date=10 April 2023 |archive-url=https://web.archive.org/web/20230410032303/https://github.com/github/dmca/blob/master/2023/03/2023-03-21-meta.md |url-status=live }}</ref>

Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Others celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this would promote further research.<ref name=verge-leak/> Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model which, unlike the comparably sophisticated models that preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.<ref name=verge-leak/><ref name=willison/>

== LLaMa 2 ==
On July 18, 2023, in partnership with Microsoft, Meta announced LLaMa 2, the next generation of Llama. Meta trained and released Llama 2 in three model sizes: 7, 13, and 70 billion parameters.<ref name="llama2blog">{{cite web |title=Meta and Microsoft Introduce the Next Generation of LLaMA |url=https://about.fb.com/news/2023/07/llama-2/ |website=Meta |access-date=21 July 2023 |date=18 July 2023 |archive-date=14 September 2023 |archive-url=https://web.archive.org/web/20230914132306/https://about.fb.com/news/2023/07/llama-2/ |url-status=live }}</ref> The model architecture remains largely unchanged from that of LLaMA-1 models, but 40% more data was used to train the foundational models.<ref name="l2arxiv">{{cite arXiv|last1=Touvron |first1=Hugo|last2=Martin |first2=Louis |title=LLaMA-2: Open Foundation and Fine-Tuned Chat Models|date=18 Jul 2023|eprint=2307.09288|class=cs.CL|display-authors=etal}}</ref> The accompanying preprint<ref name="l2arxiv"/> also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets.

LLaMa 2 includes foundation models and models fine-tuned for chat. In a further departure from the original version of LLaMa, all models are released with weights and may be used for many commercial use cases. However, because LLaMa's license enforces an acceptable use policy that prohibits Llama from being used for some purposes, Meta's use of the term ''open-source'' to describe Llama has been disputed by the Open Source Initiative (which maintains the ''Open Source Definition'') and others.<ref>{{Cite web |last=Edwards |first=Benj |date=2023-07-18 |title=Meta launches LLaMA-2, a source-available AI model that allows commercial applications |url=https://arstechnica.com/information-technology/2023/07/meta-launches-llama-2-an-open-source-ai-model-that-allows-commercial-applications/ |access-date=2023-08-08 |website=Ars Technica |language=en-us |archive-date=2023-11-07 |archive-url=https://web.archive.org/web/20231107082612/https://arstechnica.com/information-technology/2023/07/meta-launches-llama-2-an-open-source-ai-model-that-allows-commercial-applications/ |url-status=live }}</ref><ref name="Thomas 2024">{{cite web |last1=Thomas |first1=Prasanth Aby |title=Meta offers Llama AI to US government for national security |url=https://www.cio.com/article/3599448/meta-offers-llama-ai-to-us-government-for-national-security.html |website=CIO |access-date=9 December 2024 |language=en |date=5 November 2024}}</ref>

Code Llama is a fine-tune of LLaMa 2 on code-specific datasets. The 7B, 13B, and 34B versions were released on August 24, 2023, and the 70B version on January 29, 2024.<ref>{{cite web |title=Introducing Code Llama, a state-of-the-art large language model for coding |url=https://ai.meta.com/blog/code-llama-large-language-model-coding/ |website=ai.meta.com |language=en |access-date=2024-10-20 |archive-date=2024-09-27 |archive-url=https://web.archive.org/web/20240927091138/https://ai.meta.com/blog/code-llama-large-language-model-coding/ |url-status=live }}</ref> Starting from the LLaMa 2 foundation models, Meta AI trained on an additional 500B tokens of code datasets, followed by an additional 20B tokens of long-context data, to create the Code Llama foundation models. This foundation model was further trained on 5B instruction-following tokens to create the instruct fine-tune. Another foundation model was created for Python code, which was trained on 100B tokens of Python-only code before the long-context data.<ref>{{cite arXiv |last1=Rozière |first1=Baptiste |title=Code Llama: Open Foundation Models for Code |date=2024-01-31 |eprint=2308.12950 |last2=Gehring |first2=Jonas |last3=Gloeckle |first3=Fabian |last4=Sootla |first4=Sten |last5=Gat |first5=Itai |last6=Tan |first6=Xiaoqing Ellen |last7=Adi |first7=Yossi |last8=Liu |first8=Jingyu |last9=Sauvestre |first9=Romain|class=cs.CL }}</ref>

== Llama 3 ==
On April 18, 2024, Meta released Llama-3 in two sizes: 8B and 70B parameters.<ref name="llama3blog">{{Cite web |date=April 18, 2024 |title=Introducing Meta Llama 3: The most capable openly available LLM to date |url=https://ai.meta.com/blog/meta-llama-3/ |access-date=2024-04-21 |website=ai.meta.com |language=en |archive-date=2024-05-15 |archive-url=https://web.archive.org/web/20240515023523/https://ai.meta.com/blog/meta-llama-3/ |url-status=live }}</ref> The models were pre-trained on approximately 15 trillion tokens of text gathered from “publicly available sources”, with the instruct models fine-tuned on “publicly available instruction datasets, as well as over 10M human-annotated examples”. Meta AI's testing in April 2024 showed that Llama 3 70B outperformed Gemini Pro 1.5 and Claude 3 Sonnet on most benchmarks. Meta also announced plans to make Llama 3 multilingual and multimodal, better at coding and reasoning, and to increase its context window.<ref>{{cite web |last1=Wiggers |first1=Kyle |date=18 April 2024 |title=Meta releases Llama 3, claims it's among the best open models available |url=https://techcrunch.com/2024/04/18/meta-releases-llama-3-claims-its-among-the-best-open-models-available/ |website=TechCrunch |access-date=20 October 2024 |archive-date=18 September 2024 |archive-url=https://web.archive.org/web/20240918202013/https://techcrunch.com/2024/04/18/meta-releases-llama-3-claims-its-among-the-best-open-models-available/ |url-status=live }}</ref><ref>{{cite web |last1=Mann |first1=Tobias |date=April 19, 2024 |title=Meta debuts third-generation Llama large language model |url=https://www.theregister.com/2024/04/19/meta_debuts_llama3_llm/ |website=The Register |language=en |access-date=October 20, 2024 |archive-date=August 25, 2024 |archive-url=https://web.archive.org/web/20240825145130/https://www.theregister.com/2024/04/19/meta_debuts_llama3_llm/ |url-status=live }}</ref>

During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2. Compared to previous models, Zuckerberg stated the team was surprised that the 70B model was still learning even at the end of the 15T tokens training. The decision was made to end training to focus GPU power elsewhere.<ref>{{Cite web |last=Patel |first=Dwarkesh |date=2024-07-24 |title=Mark Zuckerberg - Llama 3, Open Sourcing $10b Models, & Caesar Augustus |url=https://www.dwarkeshpatel.com/p/mark-zuckerberg |access-date=2024-08-01 |website=www.dwarkeshpatel.com |language=en |quote=the 8 billion is nearly as powerful as the biggest version of Llama 2 that we released even by the end, it was... still learning right it's like we probably could have fed it more tokens and it would have gotten somewhat better but i mean at some point you know you're running a company you need to do these meta reasoning questions of how do I want to spend our GPUs |archive-date=2024-07-16 |archive-url=https://web.archive.org/web/20240716152236/https://www.dwarkeshpatel.com/p/mark-zuckerberg |url-status=live }}</ref>

Llama-3.1 was released on July 23, 2024, with three sizes: 8B, 70B, and 405B parameters.<ref name="llama31blog" /><ref name=":0">{{Citation |last1=Dubey |first1=Abhimanyu |title=The Llama 3 Herd of Models |date=2024-07-31 |arxiv=2407.21783 |last2=Jauhri |first2=Abhinav |last3=Pandey |first3=Abhinav |last4=Kadian |first4=Abhishek |last5=Al-Dahle |first5=Ahmad |last6=Letman |first6=Aiesha |last7=Mathur |first7=Akhil |last8=Schelten |first8=Alan |last9=Yang |first9=Amy}}</ref>

== Comparison of models ==
For the training cost column, only the largest model's cost is written. So for example, "21,000" is the training cost of Llama 2 69B in units of petaFLOP-day. Also, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. "T" means "trillion" and "B" means "billion".
{| class="wikitable sortable"
|-
! Name !! Release date !! Parameters
!Training cost (petaFLOP-day)!! Context length (tokens) !! Corpus size (tokens) !! Commercial viability?
|-
|LLaMA
|February 24, 2023
|
*6.7B
*13B
*32.5B
*65.2B
|6,300<ref name=":5">{{Cite web |title=The Falcon has landed in the Hugging Face ecosystem |url=https://huggingface.co/blog/falcon |access-date=2023-06-20 |website=huggingface.co |archive-date=2023-06-20 |archive-url=https://web.archive.org/web/20230620002832/https://huggingface.co/blog/falcon |url-status=live }}</ref>
|2048
|1–1.4T
|{{no}}
|-
|Llama 2
|July 18, 2023
|
*6.7B
*13B
*69B
|21,000<ref>{{Cite web |title=llama/MODEL_CARD.md at main · meta-llama/llama |url=https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md |access-date=2024-05-28 |website=GitHub |language=en |archive-date=2024-05-28 |archive-url=https://web.archive.org/web/20240528090541/https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md |url-status=live }}</ref>
| rowspan="2" |4096
| rowspan="2" |2T
| rowspan="6" {{yes}}, subject to an acceptable use policy
|-
|Code Llama
|August 24, 2023
|
*6.7B
*13B
*33.7B
*69B
|
|-
|Llama 3
|April 18, 2024
|
*8B
*70.6B
|100,000<ref>{{Cite web |url=https://x.com/karpathy/status/1781047292486914189 |title=Andrej Karpathy (Apr 18, 2024), ''The model card has some more interesting info too'' |access-date=October 20, 2024 |archive-date=August 17, 2024 |archive-url=https://web.archive.org/web/20240817055806/https://x.com/karpathy/status/1781047292486914189 |url-status=live }}</ref><ref>{{Cite web |title=llama3/MODEL_CARD.md at main · meta-llama/llama3 |url=https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md |access-date=2024-05-28 |website=GitHub |language=en |archive-date=2024-05-21 |archive-url=https://web.archive.org/web/20240521181439/https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md |url-status=live }}</ref>
|8192
| rowspan="2" |15T
|-
|Llama 3.1
|July 23, 2024
|
*8B
*70.6B
*405B
|440,000<ref name=":0" /><ref>{{Cite web |title=llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models |url=https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md |access-date=2024-07-23 |website=GitHub |language=en |archive-date=2024-07-23 |archive-url=https://web.archive.org/web/20240723151851/https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md |url-status=live }}</ref>
|128,000
|-
|Llama 3.2
|September 25, 2024
|
* 1B
* 3B
* 11B
* 90B<ref>{{Cite web |last=Robison |first=Kylie |date=2024-09-25 |title=Meta releases its first open AI model that can process images |url=https://www.theverge.com/2024/9/25/24253774/meta-ai-vision-model-llama-3-2-announced |access-date=2024-09-25 |website=The Verge |language=en}}</ref><ref>{{Cite web |last=Wiggers |first=Kyle |date=2024-09-25 |title=Meta's Llama AI models get multimodal |url=https://techcrunch.com/2024/09/25/metas-llama-ai-models-get-multimodal/ |access-date=2024-09-25 |website=TechCrunch |language=en-US |archive-date=2024-09-25 |archive-url=https://web.archive.org/web/20240925192155/https://techcrunch.com/2024/09/25/metas-llama-ai-models-get-multimodal/ |url-status=live }}</ref>
|
|128,000<ref>{{Cite web |url=https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/ |access-date=2024-09-26 |website=ai.meta.com |title=Archived copy |archive-date=2024-09-25 |archive-url=https://web.archive.org/web/20240925235424/https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/ |url-status=live }}</ref>
|
|-
|Llama 3.3
|December 7, 2024
|
* 70B
|
|128,000
|
|}
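The unit conversion stated before the table can be applied to the training cost column as follows. This is a minimal sketch: the costs are copied from the table, and the helper name <code>to_flop</code> is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60                      # 86,400 seconds
FLOP_PER_PETAFLOP_DAY = 10**15 * SECONDS_PER_DAY    # 8.64e19 FLOP

# Training cost of the largest model in each family, in petaFLOP-days (from the table).
training_cost_pf_days = {
    "LLaMA": 6_300,
    "Llama 2": 21_000,
    "Llama 3": 100_000,
    "Llama 3.1": 440_000,
}


def to_flop(pf_days: int) -> int:
    """Convert a cost in petaFLOP-days to total floating-point operations."""
    return pf_days * FLOP_PER_PETAFLOP_DAY


for name, cost in training_cost_pf_days.items():
    print(f"{name}: {cost:,} petaFLOP-days = {to_flop(cost):.3e} FLOP")
```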

== Architecture and training ==
{{Quote box
| title = "A conversation between the sun and pluto"
| quote = Sun: Hey Pluto, what’s up?<br/>
Pluto: Not much. Just hanging out here at the edge of the solar system.<br/>
Sun: Yeah, I’ve been meaning to talk to you about that. You’re not a planet anymore.<br/>
Pluto: What? Come on, man. That’s not fair.<br/>
Sun: I’m sorry, but it’s true. You just don’t meet the criteria anymore.<br/>
Pluto: This is bulls**t! I’ve been a planet for over 70 years!<br/>
Sun: Things change, Pluto. You’ve just got to accept it.<br/>
Pluto: F**k you, Sun! F**k you and your stupid solar system!<br/>
| source = – Output of 65 billion parameter LLaMA model after instruction tuning given the prompt "Write a conversation between the sun and pluto"<ref name=l1arxiv/>
| align = right
| width = 350px
}}
{{Quote box
| title = "recommendation letter for the Magic Unicorn Corporation"
| quote = '''Here is the recommendation letter that I wrote for an application to a dragon feeder position at the Magic Unicorn Corporation:'''<br>
Dear recruiter,<br>
I have known ___ for two years, and I believe that she would be an excellent dragon feeder for the Magic Unicorn Corporation. ___ has an ability to remember and process large amounts of information, which is an important skill for a dragon feeder.<br>
___, as an accomplished knight, has a deep understanding of how to kill dragons and how to use each dragon’s weaknesses against it. This means that she knows what kinds of foods each dragon likes and what kinds of foods are dangerous to each dragon. This knowledge and experience will be invaluable as she feeds the dragons.<br>
I am confident that ___’s competence, skill, and experience will make her an excellent employee. Please contact me at (___) ___-___ if you have any questions. I look forward to hearing from you.<br>
Best regards,<br>
Honorable Knight<br>
Sir George
| source = – Output of 65 billion parameter LLaMA model before instruction tuning, given the prompt (in bold)<ref name=l1arxiv/>
| align = right
| width = 350px
}}


=== Architecture ===
LLaMA uses the transformer architecture, the standard architecture for language modeling since 2018. Like GPT-3, the Llama series of models are decoder-only Transformers, but there are some minor differences:

* SwiGLU<ref>{{Cite arXiv |eprint=2002.05202 |class=cs.CL |first=Noam |last=Shazeer |title=GLU Variants Improve Transformer |date=2020-02-01}}</ref> activation function instead of GeLU;
* rotary positional embeddings (RoPE)<ref>{{Cite arXiv |last1=Su |first1=Jianlin |last2=Lu |first2=Yu |last3=Pan |first3=Shengfeng |last4=Murtadha |first4=Ahmed |last5=Wen |first5=Bo |last6=Liu |first6=Yunfeng |date=2021-04-01 |title=RoFormer: Enhanced Transformer with Rotary Position Embedding |class=cs.CL |eprint=2104.09864}}</ref> instead of absolute positional embedding;
* root-mean-square layer normalization (RMSNorm)<ref>{{Cite arXiv|last1=Zhang |first1=Biao |last2=Sennrich |first2=Rico |date=2019-10-01 |title=Root Mean Square Layer Normalization |class=cs.LG |eprint=1910.07467}}</ref> instead of standard layer normalization;<ref>{{Cite arXiv|last1=Lei Ba |first1=Jimmy |last2=Kiros |first2=Jamie Ryan |last3=Hinton |first3=Geoffrey E. |date=2016-07-01 |title=Layer Normalization |class=stat.ML |eprint=1607.06450}}</ref>
* context length increased from 2K tokens (Llama 1) to 4K tokens (Llama 2).
{| class="wikitable"
|+Key hyperparameters of Llama 3.1
!
|8B
|70B
|405B
|-
|Layers
|32
|80
|126
|-
|Model Dimension
|4,096
|8,192
|16,384
|-
|FFN Dimension
|14,336
|28,672
|53,248
|-
|Attention Heads
|32
|64
|128
|-
|Key/Value Heads
|8
|8
|8
|-
|Peak Learning Rate
|3 × 10<sup>−4</sup>
|1.5 × 10<sup>−4</sup>
|0.8 × 10<sup>−4</sup>
|-
|Activation Function
| colspan="3" |SwiGLU
|-
|Vocabulary Size
| colspan="3" |128,000
|-
|Positional Embeddings
| colspan="3" |<math>\operatorname{RoPE} (\theta = 500,000)</math>
|}
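As a rough consistency check, the hyperparameters in the table can be turned into an approximate parameter count. The sketch below is an estimate under stated assumptions, not Meta's own accounting: it assumes no bias terms, a head dimension equal to the model dimension divided by the number of attention heads, key and value projections sized by the number of key/value heads, three weight matrices in each SwiGLU feed-forward block, two normalization gain vectors per layer, and separate (untied) input and output embedding matrices. The names <code>Config</code> and <code>estimate_parameters</code> are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    layers: int
    dim: int        # model dimension
    ffn_dim: int    # feed-forward hidden dimension
    heads: int      # attention (query) heads
    kv_heads: int   # key/value heads
    vocab: int = 128_000  # vocabulary size as listed in the table


def estimate_parameters(c: Config) -> int:
    """Approximate weight count under the assumptions listed in the lead-in."""
    head_dim = c.dim // c.heads
    kv_dim = c.kv_heads * head_dim
    attention = 2 * c.dim * c.dim + 2 * c.dim * kv_dim  # query + output, key + value
    feed_forward = 3 * c.dim * c.ffn_dim                # gate, up, and down projections
    norms = 2 * c.dim                                   # assumed: two gain vectors per layer
    per_layer = attention + feed_forward + norms
    embeddings = 2 * c.vocab * c.dim                    # assumed: untied input/output matrices
    final_norm = c.dim
    return c.layers * per_layer + embeddings + final_norm


# Values copied from the table above.
CONFIGS = {
    "8B": Config(layers=32, dim=4_096, ffn_dim=14_336, heads=32, kv_heads=8),
    "70B": Config(layers=80, dim=8_192, ffn_dim=28_672, heads=64, kv_heads=8),
    "405B": Config(layers=126, dim=16_384, ffn_dim=53_248, heads=128, kv_heads=8),
}

for name, cfg in CONFIGS.items():
    print(f"{name}: about {estimate_parameters(cfg) / 1e9:.1f}B parameters")
```

Under these assumptions the estimates land close to the nominal model sizes, which suggests the table is internally consistent.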

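Two of the architectural components named in this section, RMSNorm and the SwiGLU feed-forward block, are small enough to sketch in plain Python. This is an illustrative toy version following the cited papers, not Meta's implementation: the function names, the <code>eps</code> value, and the tiny weight matrices are all assumptions chosen for the example.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Divide x by its root mean square, then scale element-wise by a learned gain."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)  # eps is an assumed stabilizer value
    return [g * v * inv_rms for g, v in zip(gain, x)]


def silu(z):
    """Swish/SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy example: model dimension 4, feed-forward dimension 2 (made-up weights).
x = [1.0, -2.0, 3.0, 0.5]
normed = rms_norm(x, gain=[1.0, 1.0, 1.0, 1.0])
w_gate = [[0.1, 0.2, -0.1, 0.0], [0.0, -0.3, 0.2, 0.1]]
w_up = [[0.3, 0.0, 0.1, -0.2], [-0.1, 0.2, 0.0, 0.4]]
w_down = [[0.5, -0.5], [0.1, 0.2], [0.0, 0.3], [-0.2, 0.1]]
output = swiglu_ffn(normed, w_gate, w_up, w_down)
print(output)
```

Unlike standard layer normalization, RMSNorm does not subtract the mean, so the normalized vector has a mean square of roughly one but not necessarily a mean of zero.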

=== Training datasets === === Training datasets ===
LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data, rather than the number of parameters, reasoning that the dominating cost for LLMs is from doing inference on the trained model rather than the computational cost of the training process. LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data, rather than the number of parameters, reasoning that the dominating cost for LLMs is from doing inference on the trained model rather than the computational cost of the training process.


'''LLaMA 1''' foundational models were trained on a data set with 1.4 trillion tokens, drawn from publicly available data sources, including:<ref name="l1arxiv" />
* Webpages scraped by ]
* Open source repositories of source code from ]
* ] in 20 languages
* ] books from ]
* ] books dataset
* The ] source code for scientific papers uploaded to ]
* Questions and answers from ] websites

On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an ] version of the LLaMA dataset.<ref name=red-pajama/> The dataset has approximately 1.2 trillion tokens and is publicly available for download.<ref name=red-pajama-download/>


'''Llama 2''' foundational models were trained on a data set with 2 trillion tokens. This data set was curated to remove websites that often disclose personal data of people. It also upsamples sources considered trustworthy.<ref name="l2arxiv"/> Llama 2 – Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning with human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for the Anthropic Helpful and Anthropic Harmless sets, and 1.0 for five other sets, including OpenAI Summarize and StackExchange.


'''Llama 3''' consists of mainly English data, with over 5% in over 30 other languages. Its dataset was filtered by a text-quality classifier, and the classifier was trained on text synthesized by Llama 2.<ref name="llama3blog" />


=== Fine-tuning ===
Llama 1 models are only available as foundational models with self-supervised learning and without fine-tuning. Llama 2 – Chat models were derived from foundational Llama 2 models. Unlike ], which increased context length during fine-tuning, Llama 2 and Llama 2 – Chat have the same context length of 4K tokens. Supervised fine-tuning used an autoregressive loss function with token loss on user prompts zeroed out. The batch size was 64.
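
The idea of zeroing out the loss on user-prompt tokens can be illustrated with a minimal sketch. It is not Meta's training code: the function name, the toy log-probabilities, and the choice to average over the remaining response tokens are all assumptions made for illustration.

```python
import math


def masked_nll(token_log_probs, is_response_token):
    """Mean negative log-likelihood over response tokens only.

    token_log_probs[i] is the log-probability the model assigned to the
    i-th target token; is_response_token[i] is 1 for tokens of the
    assistant's answer and 0 for tokens of the user prompt.
    Averaging over the kept tokens is an assumed normalization.
    """
    total = 0.0
    kept = 0
    for log_prob, keep in zip(token_log_probs, is_response_token):
        if keep:  # prompt tokens contribute nothing to the loss
            total -= log_prob
            kept += 1
    return total / kept if kept else 0.0


# Toy sequence: two prompt tokens followed by two response tokens.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
mask = [0, 0, 1, 1]
loss = masked_nll(log_probs, mask)
print(loss)
```

Only the last two tokens affect the result, so changing the model's predictions on the prompt tokens would leave this loss unchanged.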


For ], human annotators wrote prompts and then compared two model outputs (a binary protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models were trained from these preferences for safety and helpfulness using ] (RLHF). A major technical contribution is the departure from the exclusive use of ] (PPO) for RLHF – a new technique based on ] was used, followed by PPO.


Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected during the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).
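
The data construction described above can be sketched as follows. This is a simplified illustration of the description in this section only, not the procedure from the Llama 2 paper or Meta's code; the function name, the tuple layout, and the way the instruction is joined to the message are hypothetical.

```python
def build_ghost_attention_example(instruction, turns):
    """Build one training dialog as (role, text, loss_weight) segments.

    turns is a list of (user_message, assistant_message) pairs.
    The instruction is prepended to every user message, and only the
    final assistant reply keeps a non-zero loss weight; everything
    earlier in the dialog is treated as prompt.
    """
    segments = []
    last_index = len(turns) - 1
    for index, (user_message, assistant_message) in enumerate(turns):
        segments.append(("user", f"{instruction} {user_message}", 0))
        weight = 1 if index == last_index else 0
        segments.append(("assistant", assistant_message, weight))
    return segments


example = build_ghost_attention_example(
    "Always answer in French.",
    [("Hello!", "Bonjour !"), ("How are you?", "Très bien, merci.")],
)
for segment in example:
    print(segment)
```

In this sketch the system instruction is visible in every user turn, while the gradient signal comes only from the most recent reply.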


==Release and leak==
LLaMA was announced on February 23, 2023, via a blog post and a paper describing the ], architecture, and performance.<ref name=l1arxiv/><ref name=blog/> The inference code used to run the model was publicly released under the open-source ] license.<ref name=repo/> Access to the model's weights was managed by an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world".<ref name=blog/>

On March 2, 2023,<ref>{{cite web |title=/g/ - /aicg/ - AI Chatbot General - Technology - 4chan |url=https://archive.today/20230305095718/https://boards.4channel.org/g/thread/91848262 |date=5 Mar 2023}}</ref> a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the ] imageboard and subsequently spreading through online AI communities.<ref name=verge-leak/> That same day, a pull request on the main LLaMA repository was opened, requesting to add the ] to the official documentation.<ref name=India-leak>{{cite news |last1=VK |first1=Anirudh |title=Meta's LLaMA Leaked to the Public, Thanks To 4chan |url=https://analyticsindiamag.com/metas-llama-leaked-to-the-public-thanks-to-4chan/ |access-date=17 March 2023 |work=Analytics India Magazine |date=6 March 2023}}</ref><ref name="CKing"/> On March 4, a pull request was opened to add links to ] repositories containing the model.<ref>{{cite web |title=Download weights from huggingface to help us save bandwith by Jainam213 · Pull Request #109 · facebookresearch/llama |url=https://github.com/facebookresearch/llama/pull/109 |website=GitHub |access-date=17 March 2023 |language=en}}</ref><ref name=India-leak/> On March 6, Meta filed ]s to remove the HuggingFace repositories linked in the pull request, characterizing it as "unauthorized distribution" of the model. HuggingFace complied with the requests.<ref>{{cite news |last1=Cox |first1=Joseph |title=Facebook's Powerful Large Language Model Leaks Online |url=https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-4chan-llama |access-date=17 March 2023 |work=Vice |date=7 March 2023 |language=en}}</ref> On March 20, Meta filed a ] takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.<ref>{{cite web |author1=OpSec Online LLC |title=github/dmca - Notice of Claimed Infringement via Email |url=https://github.com/github/dmca/blob/master/2023/03/2023-03-21-meta.md |publisher=GitHub |access-date=25 March 2023 |date=21 March 2023}}</ref> As of March 25, Facebook had not responded to the pull request containing the magnet link.<ref name="CKing">{{cite web |title=Save bandwidth by using a torrent to distribute more efficiently by ChristopherKing42 · Pull Request #73 · facebookresearch/llama |url=https://github.com/facebookresearch/llama/pull/73 |website=GitHub |access-date=25 March 2023 |language=en}}</ref>

Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated ]. Some have celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this will promote the flourishing of additional research developments.<ref name=verge-leak/> Multiple commentators, such as ], compared LLaMA to ], a ] which, unlike comparably sophisticated models which preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.<ref name=verge-leak/><ref name=willison/>

==Dataset reproduction==
On April 17, 2023, ] launched a project named RedPajama to reproduce and distribute an ] version of the LLaMA dataset.<ref name=red-pajama/> The dataset has approximately 1.2 trillion tokens and is publicly available for download.<ref name=red-pajama-download/>

==Applications==
The ] Institute for ] (HAI) Center for Research on Foundation Models (CRFM) released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of ] to acquire capabilities comparable to the OpenAI GPT-3 series text-davinci-003 model at a modest cost.<ref>{{cite web |url=https://crfm.stanford.edu/2023/03/13/alpaca.html |title=Alpaca: A Strong, Replicable Instruction-Following Model |date=13 March 2023 |first1=Rohan |last1=Taori |first2=Ishaan |last2=Gulrajani |first3=Tianyi |last3=Zhang |first4=Yann |last4=Dubois |first5=Xuechen |last5=Li |first6=Carlos |last6=Guestrin |first7=Percy |last7=Liang |first8=Tatsunori B. |last8=Hashimoto |website= |publisher=Stanford Center for Research on Foundation Models |access-date= |archive-date=6 April 2023 |archive-url=https://web.archive.org/web/20230406082332/https://crfm.stanford.edu/2023/03/13/alpaca.html |url-status=live }}</ref><ref>{{cite arXiv | eprint=2212.10560 | last1=Wang | first1=Yizhong | last2=Kordi | first2=Yeganeh | last3=Mishra | first3=Swaroop | last4=Liu | first4=Alisa | last5=Smith | first5=Noah A. | last6=Khashabi | first6=Daniel | last7=Hajishirzi | first7=Hannaneh | title=Self-Instruct: Aligning Language Models with Self-Generated Instructions | year=2022 | class=cs.CL }}</ref><ref>{{cite web |title=Stanford CRFM |url=https://crfm.stanford.edu/2023/03/13/alpaca.html |website=crfm.stanford.edu |access-date=2023-03-20 |archive-date=2023-04-06 |archive-url=https://web.archive.org/web/20230406082332/https://crfm.stanford.edu/2023/03/13/alpaca.html |url-status=live }}</ref> The model files were officially removed on March 21, 2023, over hosting costs and safety concerns, though the code and paper remain online for reference.<ref>{{cite web |last1=Quach |first1=Katyanna |title=Stanford takes costly, risky Alpaca AI model offline |url=https://www.theregister.com/2023/03/21/stanford_ai_alpaca_taken_offline/ |website=www.theregister.com |language=en}}</ref><ref>{{cite web |title=Stanford Researchers Take Down Alpaca AI Over Cost and Hallucinations |url=https://gizmodo.com/stanford-ai-alpaca-llama-facebook-taken-down-chatgpt-1850247570 |website=Gizmodo |language=en |date=21 March 2023 |access-date=20 October 2024 |archive-date=12 May 2024 |archive-url=https://web.archive.org/web/20240512075506/https://gizmodo.com/stanford-ai-alpaca-llama-facebook-taken-down-chatgpt-1850247570 |url-status=live }}</ref> Multiple open-source projects have continued this work of fine-tuning LLaMA with the Alpaca dataset.<ref name="repo-alpaca" />

Meditron is a family of Llama-based models fine-tuned on a corpus of clinical guidelines, ] papers, and articles. It was created by researchers at ] School of Computer and Communication Sciences, and the ]. It shows increased performance on medical-related benchmarks such as MedQA and MedMCQA.<ref>{{cite web |title=Meditron: An LLM suite for low-resource medical settings leveraging Meta Llama |url=https://ai.meta.com/blog/llama-2-3-meditron-yale-medicine-epfl-open-source-llm/ |website=ai.meta.com |language=en}}</ref><ref>{{cite web |last1=Petersen |first1=Tanya |title=EPFL's new Large Language Model for Medical Knowledge |url=https://actu.epfl.ch/news/epfl-s-new-large-language-model-for-medical-knowle/ |language=en |date=28 November 2023 |access-date=20 October 2024 |archive-date=17 September 2024 |archive-url=https://web.archive.org/web/20240917180520/https://actu.epfl.ch/news/epfl-s-new-large-language-model-for-medical-knowle/ |url-status=live }}</ref><ref>{{cite web |title=epfLLM/meditron |url=https://github.com/epfLLM/meditron |publisher=epfLLM |date=11 May 2024 |access-date=20 October 2024 |archive-date=27 September 2024 |archive-url=https://web.archive.org/web/20240927092256/https://github.com/epfLLM/meditron |url-status=live }}</ref>

] used Meta Llama 2 to create an AI Companion that can summarize meetings, provide helpful presentation tips, and assist with message responses. This AI Companion is powered by multiple models, including Meta Llama 2.<ref>{{cite web |title=How Companies Are Using Meta Llama |url=https://about.fb.com/news/2024/05/how-companies-are-using-meta-llama/ |website=Meta |date=7 May 2024 |access-date=20 October 2024 |archive-date=27 September 2024 |archive-url=https://web.archive.org/web/20240927181724/https://about.fb.com/news/2024/05/how-companies-are-using-meta-llama/ |url-status=live }}</ref>

Reuters reported in 2024 that many Chinese foundation models relied on Llama models for their training.<ref>{{Cite news |date=May 9, 2024 |title=How dependent is China on US artificial intelligence technology? |url=https://www.reuters.com/technology/how-dependent-is-china-us-artificial-intelligence-technology-2024-05-09/ |work=Reuters}}</ref>

===llama.cpp===
{{Main|llama.cpp}}
Software developer Georgi Gerganov released ] as open-source on March 10, 2023. It is a re-implementation of LLaMA in ], allowing systems without a powerful GPU to run the model locally.<ref>{{Cite web |last=Edwards |first=Benj |date=2023-03-13 |title=You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi |url=https://arstechnica.com/information-technology/2023/03/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi/ |access-date=2024-01-04 |website=Ars Technica |language=en-us |archive-date=2024-01-09 |archive-url=https://web.archive.org/web/20240109194611/https://arstechnica.com/information-technology/2023/03/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi/ |url-status=live }}</ref> The llama.cpp project introduced the GGUF file format, a binary format that stores both tensors and metadata.<ref>{{cite web |title=GGUF |url=https://huggingface.co/docs/hub/gguf |website=huggingface.co |access-date=9 May 2024}}</ref> The format focuses on supporting different quantization types, which can reduce memory usage and increase speed at the expense of lower model precision.<ref>{{cite web |last1=Labonne |first1=Maxime |title=Quantize Llama models with GGUF and llama.cpp |url=https://towardsdatascience.com/quantize-llama-models-with-ggml-and-llama-cpp-3612dfbcc172 |website=Medium |publisher=Towards Data Science |access-date=9 May 2024 |language=en |date=29 November 2023 |archive-date=9 May 2024 |archive-url=https://web.archive.org/web/20240509081605/https://towardsdatascience.com/quantize-llama-models-with-ggml-and-llama-cpp-3612dfbcc172 |url-status=live }}</ref>

llamafile, created by ], is an open-source tool that bundles llama.cpp with the model into a single executable file. Tunney et al. introduced new optimized matrix multiplication kernels for x86 and ARM CPUs, improving prompt evaluation performance for ] and 8-bit quantized data types.<ref name="llamafileregister">{{cite web |last1=Connatser |first1=Matthew |title=Llamafile LLM driver project boosts performance on CPU cores |url=https://www.theregister.com/2024/04/03/llamafile_performance_gains/ |website=www.theregister.com |access-date=10 May 2024 |language=en |archive-date=10 May 2024 |archive-url=https://web.archive.org/web/20240510232003/https://www.theregister.com/2024/04/03/llamafile_performance_gains/ |url-status=live }}</ref>
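
The trade-off that quantization makes between memory and precision can be shown with a generic sketch. The code below is a plain symmetric 8-bit scheme written for illustration; it is not the GGUF on-disk layout or any of llama.cpp's actual quantization types, and the function names are hypothetical.

```python
def quantize_block(values):
    """Symmetric 8-bit quantization of one block of floats.

    Returns a single float scale and one integer code in [-127, 127]
    per value, so each weight needs 8 bits plus a shared scale.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [round(v / scale) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the scale and integer codes."""
    return [code * scale for code in codes]


weights = [0.5, -1.27, 0.0, 1.0, 0.3333]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each restored value differs from the original by at most about half the scale, which is the precision given up in exchange for storing small integers instead of full-width floats.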

=== Military ===
In 2024, researchers from the ] (top military academy of ]) were reported to have developed a military tool using Llama, which ] stated was unauthorized due to Llama's license prohibiting the use of the model for military purposes.<ref>{{Cite web |last=Cheung |first=Sunny |date=October 31, 2024 |title=PRC Adapts Meta’s Llama for Military and Security AI Applications |url=https://jamestown.org/program/prcs-adaptation-of-open-source-llm-for-military-and-security-purposes/ |access-date=2024-11-03 |website=] |language=en-US}}</ref><ref>{{Cite news |last1=Pomfret |first1=James |last2=Pang |first2=Jessie |date=November 1, 2024 |title=Chinese researchers develop AI model for military use on back of Meta's Llama |url=https://www.reuters.com/technology/artificial-intelligence/chinese-researchers-develop-ai-model-military-use-back-metas-llama-2024-11-01/ |access-date=November 1, 2024 |work=]}}</ref> Meta granted the US government and US military contractors permission to use Llama in November 2024, but continued to prohibit military use by non-US entities.<ref name="Thomas 2024" /><ref>{{cite web |last1=Smith |first1=Matthew S. |title=Meta Opens Its AI Model for the U.S. Military - IEEE Spectrum |url=https://spectrum.ieee.org/ai-used-by-military |website=] |access-date=9 December 2024 |language=en |date=17 November 2024}}</ref>

== Reception ==
] describes the 8B parameter version of Llama 3 as being "surprisingly capable" given its size.<ref>{{cite magazine |last1=Knight |first1=Will |title=Meta's Open Source Llama 3 Is Already Nipping at OpenAI's Heels |url=https://www.wired.com/story/metas-open-source-llama-3-nipping-at-openais-heels/ |magazine=Wired |access-date=2024-10-20 |archive-date=2024-09-27 |archive-url=https://web.archive.org/web/20240927073830/https://www.wired.com/story/metas-open-source-llama-3-nipping-at-openais-heels/ |url-status=live }}</ref>

The response to Meta's integration of Llama into Facebook was mixed, with some users confused after Meta AI told a parental group that it had a child.<ref>{{cite web |title=Meta's amped-up AI agents confusing Facebook users |url=https://www.abc.net.au/news/2024-04-19/meta-releases-llama-3-ai-model/103744538 |website=ABC News |language=en-AU |date=19 April 2024 |access-date=2024-10-20 |archive-date=2024-09-17 |archive-url=https://web.archive.org/web/20240917102930/https://www.abc.net.au/news/2024-04-19/meta-releases-llama-3-ai-model/103744538 |url-status=live }}</ref>

According to the Q4 2023 earnings call transcript, Meta adopted the strategy of open weights to improve model safety and iteration speed, increase adoption among developers and researchers, and become the industry standard. Llama 5, 6, and 7 are planned for the future.<ref>{{Cite web |url=https://s21.q4cdn.com/399680738/files/doc_financials/2023/q4/META-Q4-2023-Earnings-Call-Transcript.pdf |title=Archived copy |access-date=2024-10-20 |archive-date=2024-09-17 |archive-url=https://web.archive.org/web/20240917115531/https://s21.q4cdn.com/399680738/files/doc_financials/2023/q4/META-Q4-2023-Earnings-Call-Transcript.pdf |url-status=live }}</ref>

The release of Llama models has sparked significant debates on the benefits and misuse risks of open weight models. Such models can be fine-tuned to remove safeguards, notably by cyber criminals, until they comply with harmful requests. Some experts contend that future models may facilitate causing damage more than defending against it, for example by making it relatively easy to engineer advanced bioweapons without specialized knowledge. Conversely, open-weight models can be useful for a wide variety of purposes, including for safety research.<ref>{{Cite magazine |last=Knight |first=Will |title=Meta's New Llama 3.1 AI Model Is Free, Powerful, and Risky |url=https://www.wired.com/story/meta-ai-llama-3/ |access-date=2024-08-04 |magazine=Wired |language=en-US |issn=1059-1028 |archive-date=2024-08-03 |archive-url=https://web.archive.org/web/20240803201314/https://www.wired.com/story/meta-ai-llama-3/ |url-status=live }}</ref>

] head Stefano Maffulli criticized Meta for describing Llama as ], saying that it was causing confusion among users and "polluting" the term.<ref>{{Cite news|url=https://www.ft.com/content/397c50d8-8796-4042-a814-0ac2c068361f|title=Meta under fire for ‘polluting’ open-source|last=Waters|first=Richard|date=October 17, 2024|work=]}}</ref>

== See also ==

* ]
* ], an open-source LLM made by IBM
* ], a French open-source AI company


==References==
Line 81: Line 276:
|date=24 February 2023
|url=https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
|access-date=16 March 2023
|archive-date=3 March 2023
|archive-url=https://web.archive.org/web/20230303112302/https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
|url-status=live
}}</ref>
<ref name=l1arxiv>{{cite arXiv
Line 119: Line 318:
|work=The Verge
|title=Meta's powerful AI language model has leaked online — what happens now?
|last=Vincent
|first=James
|date=8 March 2023
|url=https://www.theverge.com/2023/3/8/23629362/meta-ai-language-model-llama-leak-online-misuse
|access-date=16 March 2023
|archive-date=3 November 2023
|archive-url=https://web.archive.org/web/20231103161046/https://www.theverge.com/2023/3/8/23629362/meta-ai-language-model-llama-leak-online-misuse
|url-status=live
}}</ref>
<ref name=repo>{{cite web
Line 128: Line 332:
|access-date=16 March 2023
|url=https://github.com/facebookresearch/llama
|archive-date=15 March 2023
|archive-url=https://web.archive.org/web/20230315183955/https://github.com/facebookresearch/llama/
|url-status=live
}}</ref>
<ref name=willison>{{cite web
|work=Simon Willison's Weblog
|last=Willison
|first=Simon
|title=Large language models are having their Stable Diffusion moment
|date=11 March 2023
|url=https://simonwillison.net/2023/Mar/11/llama/
|access-date=16 March 2023
|archive-date=16 March 2023
|archive-url=https://web.archive.org/web/20230316201253/https://simonwillison.net/2023/Mar/11/llama/
|url-status=live
}}</ref> <!-- ] by established subject matter expert -->
<ref name=repo-alpaca>{{cite web
Line 141: Line 353:
|access-date=5 April 2023
|url=https://github.com/tloen/alpaca-lora
|archive-date=4 April 2023
|archive-url=https://web.archive.org/web/20230404210345/https://github.com/tloen/alpaca-lora
|url-status=live
}}</ref>
<ref name=red-pajama>{{cite web
|title=RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset
|url=https://github.com/togethercomputer/RedPajama-Data
|website=GitHub
|publisher=Together
|access-date=4 May 2023
|archive-date=7 November 2023
|archive-url=https://web.archive.org/web/20231107223503/https://github.com/togethercomputer/RedPajama-Data
|url-status=live
}}</ref>
<ref name=red-pajama-download>{{cite web
|title=RedPajama-Data-1T
|url=https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T
|website=Hugging Face
|publisher=Together
|access-date=4 May 2023
|archive-date=3 November 2023
|archive-url=https://web.archive.org/web/20231103013716/https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T
|url-status=live
}}</ref>
<ref name="verge-initial-article">{{cite web
|last1=Peters
|first1=Jay
|last2=Vincent
|first2=James
|title=Meta has a new machine learning language model to remind you it does AI too
|url=https://www.theverge.com/2023/2/24/23613512/meta-llama-ai-research-large-language-model
|website=The Verge
|language=en
|date=24 February 2023}}</ref>
}}


== Further reading ==
{{refbegin}}
*{{Cite web |last1=Huang |first1=Kalley |last2=O'Regan |first2=Sylvia Varnham |date=September 5, 2023 |title=Inside Meta's AI Drama: Internal Feuds Over Compute Power |url=https://www.theinformation.com/articles/inside-metas-ai-drama-internal-feuds-over-compute-power |url-access=limited |url-status=live |archive-url=https://web.archive.org/web/20230905174145/https://www.theinformation.com/articles/inside-metas-ai-drama-internal-feuds-over-compute-power |archive-date=September 5, 2023 |access-date=September 6, 2023 |website=]}}
{{refend}}

== External links ==
*{{Official website|https://www.llama.com/}}
*{{Official website|https://huggingface.co/meta-llama|name=Official Hugging Face organization for Llama, Llama Guard, and Prompt Guard models}}

{{Generative AI}}
{{Artificial intelligence navbox}}



Latest revision as of 19:16, 10 January 2025

Large language model by Meta AI. Not to be confused with LaMDA.

Llama
Screenshot of an example of a Llama answer describing Misplaced Pages
Developer(s): Meta AI
Initial release: February 24, 2023
Stable release: Llama 3.3 (December 7, 2024)
Repository: github.com/meta-llama/llama-models
Written in: Python
Type
License: Source-available (Meta Llama 3.2 Community License)
Website: llama.com

Llama (Large Language Model Meta AI, formerly stylized as LLaMA) is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama 3.3, released in December 2024.

Llama models are trained at different parameter sizes, ranging from 1 billion to 405 billion parameters. Originally, Llama was only available as a foundation model. Starting with Llama 2, Meta AI began releasing instruction fine-tuned versions alongside foundation models.

Model weights for the first version of Llama were made available to the research community under a non-commercial license, and access was granted on a case-by-case basis. Unauthorized copies of the first model were shared via BitTorrent. Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use.

Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, and a standalone website. Both services use a Llama 3 model.

Background

After the release of large language models such as GPT-3, a focus of research was scaling up models, which in some instances showed major increases in emergent capabilities. The release of ChatGPT and its surprise success caused an increase in attention to large language models.

In contrast to other responses to ChatGPT, Meta's chief AI scientist Yann LeCun stated that large language models are best suited to aiding with writing.

The Llama series was also used to investigate scaling laws empirically. The Llama 3 models showed that when a model is trained on more data than the "Chinchilla-optimal" amount, performance continues to scale log-linearly. For example, the Chinchilla-optimal dataset for Llama 3 8B is 200 billion tokens, but performance continued to scale log-linearly up to a dataset of 15 trillion tokens, 75 times larger.
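
The ratios in this example can be checked with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, and the parameter count of 8 billion is the nominal size implied by the model's name rather than an exact figure.

```python
# Figures quoted in the paragraph above.
chinchilla_optimal_tokens = 200 * 10**9   # 200 billion tokens
actual_training_tokens = 15 * 10**12      # 15 trillion tokens

# Assumed nominal parameter count for Llama 3 8B.
nominal_parameters = 8 * 10**9

# How far past the Chinchilla-optimal amount training continued.
overtraining_factor = actual_training_tokens // chinchilla_optimal_tokens

# Tokens seen per parameter in each case.
optimal_tokens_per_param = chinchilla_optimal_tokens // nominal_parameters
actual_tokens_per_param = actual_training_tokens // nominal_parameters

print(overtraining_factor, optimal_tokens_per_param, actual_tokens_per_param)
```

Under these assumptions the model sees 25 tokens per parameter at the Chinchilla-optimal point and 1,875 tokens per parameter at the full 15 trillion tokens.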

Initial release

LLaMA was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance. The inference code used to run the model was publicly released under the open-source GPLv3 license. Access to the model's weights was managed by an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world".

Llama was trained on only publicly available information, and was trained at various model sizes, with the intention to make it more accessible to different hardware. The model was exclusively a foundation model, although the paper contained examples of instruction fine-tuned versions of the model.

Meta AI reported that the 13B-parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters), and that the largest 65B model was competitive with state-of-the-art models such as PaLM and Chinchilla.

Leak

On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spreading through online AI communities. That same day, a pull request on the main LLaMA repository was opened, requesting to add the magnet link to the official documentation. On March 4, a pull request was opened to add links to HuggingFace repositories containing the model. On March 6, Meta filed takedown requests to remove the HuggingFace repositories linked in the pull request, characterizing them as "unauthorized distribution" of the model. HuggingFace complied with the requests. On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.

Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Others celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this would promote further research. Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model which, unlike the comparably sophisticated models that preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.

LLaMa 2

On July 18, 2023, in partnership with Microsoft, Meta announced LLaMa 2, the next generation of Llama. Meta trained and released Llama 2 in three model sizes: 7, 13, and 70 billion parameters. The model architecture remains largely unchanged from that of LLaMA-1 models, but 40% more data was used to train the foundational models. The accompanying preprint also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets.

LLaMa 2 includes foundation models and models fine-tuned for chat. In a further departure from the original version of LLaMa, all models are released with weights and may be used for many commercial use cases. However, because the license enforces an acceptable use policy that prohibits Llama from being used for some purposes, Meta's use of the term open source to describe Llama has been disputed by the Open Source Initiative (which maintains The Open Source Definition) and others.

Code Llama is a fine-tune of LLaMa 2 on code-specific datasets. The 7B, 13B, and 34B versions were released on August 24, 2023, and the 70B version on January 29, 2024. Starting with the foundation models from LLaMa 2, Meta AI trained on an additional 500B tokens of code datasets, followed by an additional 20B tokens of long-context data, creating the Code Llama foundation models. These foundation models were further trained on 5B instruction-following tokens to create the instruct fine-tune. Another foundation model was created for Python code, trained on 100B tokens of Python-only code before the long-context data.

Llama 3

Example of an image generated by Meta AI Imagine, powered by Llama 3, showing a llama and a robot looking towards each other. Prompt: A representation of Meta AI and Llama

On April 18, 2024, Meta released Llama-3 in two sizes: 8B and 70B parameters. The models were pre-trained on approximately 15 trillion tokens of text gathered from "publicly available sources", with the instruct models fine-tuned on "publicly available instruction datasets, as well as over 10M human-annotated examples". Meta AI's testing in April 2024 showed that Llama 3 70B outperformed Gemini Pro 1.5 and Claude 3 Sonnet on most benchmarks. Meta also announced plans to make Llama 3 multilingual and multimodal, better at coding and reasoning, and to increase its context window.

During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2. Zuckerberg also stated that the team was surprised that the 70B model was still learning even at the end of training on 15T tokens. The decision was made to end training in order to focus GPU power elsewhere.

Llama-3.1 was released on July 23, 2024, with three sizes: 8B, 70B, and 405B parameters.

Comparison of models

For the training cost column, only the largest model's cost is listed. For example, "21,000" is the training cost of Llama 2 69B in units of petaFLOP-day, where 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. "T" means "trillion" and "B" means "billion".
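The unit conversion described above can be written out as a short sketch. The function name is illustrative, and the only inputs are the definition of a petaFLOP-day and the "21,000" figure quoted for Llama 2.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
PETA = 10**15                    # one petaFLOP/sec is 1e15 FLOP per second

def petaflop_days_to_flop(pf_days):
    """Convert a compute budget in petaFLOP-days to a total FLOP count."""
    return pf_days * PETA * SECONDS_PER_DAY

one_pf_day = petaflop_days_to_flop(1)            # the 8.64E19 FLOP quoted in the text
llama2_largest = petaflop_days_to_flop(21_000)   # table entry for the largest Llama 2

print(f"1 petaFLOP-day      = {one_pf_day:.3e} FLOP")
print(f"21,000 petaFLOP-day = {llama2_largest:.3e} FLOP")
```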

| Name | Release date | Parameters | Training cost (petaFLOP-day) | Context length (tokens) | Corpus size (tokens) | Commercial viability? |
|---|---|---|---|---|---|---|
| LLaMA | February 24, 2023 | 6.7B, 13B, 32.5B, 65.2B | 6,300 | 2048 | 1–1.4T | No |
| Llama 2 | July 18, 2023 | 6.7B, 13B, 69B | 21,000 | 4096 | 2T | Yes, subject to acceptable use policy |
| Code Llama | August 24, 2023 | 6.7B, 13B, 33.7B, 69B | | | | |
| Llama 3 | April 18, 2024 | 8B, 70.6B | 100,000 | 8192 | 15T | |
| Llama 3.1 | July 23, 2024 | 8B, 70.6B, 405B | 440,000 | 128,000 | | |
| Llama 3.2 | September 25, 2024 | 1B, 3B, 11B, 90B | | 128,000 | | |
| Llama 3.3 | December 7, 2024 | 70B | | 128,000 | | |

Architecture and training

"recommendation letter for the Magic Unicorn Corporation"

Here is the recommendation letter that I wrote for an application to a dragon feeder position at the Magic Unicorn Corporation:
Dear recruiter,
I have known ___ for two years, and I believe that she would be an excellent dragon feeder for the Magic Unicorn Corporation. ___ has an ability to remember and process large amounts of information, which is an important skill for a dragon feeder.
___, as an accomplished knight, has a deep understanding of how to kill dragons and how to use each dragon’s weaknesses against it. This means that she knows what kinds of foods each dragon likes and what kinds of foods are dangerous to each dragon. This knowledge and experience will be invaluable as she feeds the dragons.
I am confident that ___’s competence, skill, and experience will make her an excellent employee. Please contact me at (___) ___-___ if you have any questions. I look forward to hearing from you.
Best regards,
Honorable Knight
Sir George

– Output of 65 billion parameter LLaMA model before instruction tuning, given the prompt (in bold)

Architecture

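Like GPT-3, the Llama series of models are decoder-only Transformers, but there are some minor differences:

  • SwiGLU activation function instead of GeLU;
  • rotary positional embeddings (RoPE) instead of absolute positional embeddings;
  • RMSNorm instead of layer normalization.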
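Llama models use RMSNorm for normalization and a SwiGLU feed-forward block. The following is a minimal pure-Python sketch of both operations on a single vector, intended only to show the arithmetic; the helper names, the toy weights, and the epsilon value are assumptions for illustration and are not taken from Meta's code.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward block: W_down @ (silu(W_gate @ x) * (W_up @ x))."""
    gate = [silu(z) for z in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

# Toy example: model dimension 2, hidden dimension 3, hand-picked weights.
x = [3.0, 4.0]
normed = rms_norm(x, gain=[1.0, 1.0])
W_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
W_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
out = swiglu_ffn(normed, W_gate, W_up, W_down)
print(normed, out)
```

Unlike layer normalization, RMSNorm does not subtract the mean, and the SwiGLU block uses three weight matrices (gate, up, down) rather than the two of a conventional feed-forward layer.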

Key hyperparameters of Llama 3.1

| | 8B | 70B | 405B |
|---|---|---|---|
| Layers | 32 | 80 | 126 |
| Model Dimension | 4,096 | 8,192 | 16,384 |
| FFN Dimension | 14,336 | 28,672 | 53,248 |
| Attention Heads | 32 | 64 | 128 |
| Key/Value Heads | 8 | 8 | 8 |
| Peak Learning Rate | 3 × 10⁻⁴ | 1.5 × 10⁻⁴ | 0.8 × 10⁻⁴ |
| Activation Function | SwiGLU | SwiGLU | SwiGLU |
| Vocabulary Size | 128,000 | 128,000 | 128,000 |
| Positional Embeddings | RoPE (θ = 500,000) | RoPE (θ = 500,000) | RoPE (θ = 500,000) |
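As a rough consistency check, the hyperparameters listed above can be turned into an approximate parameter count. The sketch below is an estimate under stated assumptions, not Meta's accounting: it assumes no bias terms, separate input-embedding and output-projection matrices, a per-head dimension equal to the model dimension divided by the number of attention heads, and a three-matrix SwiGLU feed-forward block. The function name is hypothetical.

```python
def llama_param_count(layers, d_model, d_ffn, n_heads, n_kv_heads, vocab):
    """Approximate parameter count for a Llama-style decoder-only Transformer.

    Assumptions (not stated in the table): no biases, untied input/output
    embeddings, and head_dim = d_model / n_heads.
    """
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim

    # Query and output projections are d_model x d_model; key and value
    # projections are smaller because there are fewer key/value heads.
    attention = 2 * d_model * d_model + 2 * d_model * kv_dim
    # SwiGLU feed-forward block: gate, up, and down matrices.
    ffn = 3 * d_model * d_ffn
    # Two normalization gain vectors per layer.
    norms = 2 * d_model

    per_layer = attention + ffn + norms
    embeddings = 2 * vocab * d_model   # input table plus output head
    final_norm = d_model
    return layers * per_layer + embeddings + final_norm

configs = {
    "8B": dict(layers=32, d_model=4096, d_ffn=14336,
               n_heads=32, n_kv_heads=8, vocab=128_000),
    "70B": dict(layers=80, d_model=8192, d_ffn=28672,
                n_heads=64, n_kv_heads=8, vocab=128_000),
    "405B": dict(layers=126, d_model=16384, d_ffn=53248,
                 n_heads=128, n_kv_heads=8, vocab=128_000),
}

for name, cfg in configs.items():
    print(f"{name}: ~{llama_param_count(**cfg) / 1e9:.1f}B parameters")
```

Under these assumptions the estimates land close to the nominal model sizes, which suggests the table rows are mutually consistent. Using fewer key/value heads than attention heads is commonly called grouped-query attention.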

Training datasets

LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data rather than the number of parameters, reasoning that the dominant cost for LLMs comes from running inference on the trained model rather than from the computation needed for training.

LLaMA 1 foundational models were trained on a data set of 1.4 trillion tokens drawn from publicly available data sources.

On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an open source version of the LLaMA dataset. The dataset has approximately 1.2 trillion tokens and is publicly available for download.

Llama 2 foundational models were trained on a data set with 2 trillion tokens. This data set was curated to remove websites that often disclose personal data of people. It also upsamples sources considered trustworthy. Llama 2 – Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning from human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for the Anthropic Helpful and Anthropic Harmless sets, and 1.0 for five other sets, including OpenAI Summarize and StackExchange.

Llama 3's training data consists mainly of English text, with over 5% in more than 30 other languages. The dataset was filtered by a text-quality classifier, which was itself trained on text synthesized by Llama 2.

Fine-tuning

Llama 1 models are only available as foundational models with self-supervised learning and without fine-tuning. Llama 2 – Chat models were derived from foundational Llama 2 models. Unlike GPT-4, which increased context length during fine-tuning, Llama 2 and Code Llama – Chat have the same context length of 4K tokens. Supervised fine-tuning used an autoregressive loss function with the token loss on user prompts zeroed out. The batch size was 64.
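Zeroing out the loss on prompt tokens can be illustrated with a small sketch. The function below takes per-token log-probabilities and a mask marking which tokens belong to the user prompt; the function name, the toy numbers, and the choice to average over response tokens only are assumptions made for illustration, not details taken from the Llama 2 paper.

```python
import math

def masked_next_token_loss(token_log_probs, is_prompt):
    """Average negative log-likelihood over response tokens only.

    token_log_probs[i] is the model's log-probability of the i-th target token;
    is_prompt[i] is True when that token belongs to the user prompt.
    """
    losses = [
        0.0 if prompt else -lp   # prompt tokens contribute zero loss
        for lp, prompt in zip(token_log_probs, is_prompt)
    ]
    n_response = sum(1 for prompt in is_prompt if not prompt)
    # Assumption: normalize by the number of response tokens.
    return sum(losses) / max(n_response, 1)

# Hypothetical log-probabilities for a 5-token sequence:
# two prompt tokens followed by three response tokens.
log_probs = [math.log(0.1), math.log(0.2),
             math.log(0.5), math.log(0.25), math.log(0.5)]
mask = [True, True, False, False, False]

loss = masked_next_token_loss(log_probs, mask)
print(f"loss over response tokens: {loss:.4f}")
```

Because the first two tokens are masked, changing their log-probabilities leaves the loss unchanged, so the model is trained only on how it answers and not on reproducing the prompt.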

For AI alignment, human annotators wrote prompts and then compared two model outputs (a binary protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models, one for safety and one for helpfulness, were trained from these preferences for use in reinforcement learning from human feedback (RLHF). A major technical contribution is the departure from the exclusive use of proximal policy optimization (PPO) for RLHF: a new technique based on rejection sampling was used, followed by PPO.
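The general idea behind rejection sampling with a reward model (often called best-of-K sampling) can be sketched as follows. This is a generic illustration, not Meta's procedure: `toy_generate` and `toy_reward` are hypothetical stand-ins for a language model and a trained reward model, and the value of `k` is arbitrary.

```python
import random

def rejection_sample(prompt, generate, reward, k=4, seed=0):
    """Draw k candidate responses and keep the one the reward function scores highest."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(k)]
    scored = [(reward(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score, candidates

# Hypothetical stand-in for sampling a response from a language model.
def toy_generate(prompt, rng):
    fillers = ["ok", "sure, here is a detailed answer", "no", "here is an answer"]
    return rng.choice(fillers)

# Hypothetical stand-in for a reward model: pretend longer answers are better.
def toy_reward(prompt, response):
    return len(response)

best, score, candidates = rejection_sample(
    "Explain rotary embeddings.", toy_generate, toy_reward, k=8
)
print(f"kept {best!r} with reward {score} out of {len(candidates)} candidates")
```

In a fine-tuning setting, the selected responses would then serve as training targets, so the model is pushed toward outputs that the reward model prefers.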

Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected during the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).

Applications

The Stanford University Institute for Human-Centered Artificial Intelligence (HAI) Center for Research on Foundation Models (CRFM) released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of instruction tuning to acquire capabilities comparable to the OpenAI GPT-3 series text-davinci-003 model at a modest cost. The model files were officially removed on March 21, 2023, over hosting costs and safety concerns, though the code and paper remain online for reference.

Meditron is a family of Llama-based models fine-tuned on a corpus of clinical guidelines, PubMed papers, and articles. It was created by researchers at the École Polytechnique Fédérale de Lausanne School of Computer and Communication Sciences and the Yale School of Medicine. It shows increased performance on medical benchmarks such as MedQA and MedMCQA.

Zoom used Meta Llama 2 as one of several models powering its AI Companion, which can summarize meetings, provide presentation tips, and assist with message responses.

Reuters reported in 2024 that many Chinese foundation models relied on Llama models for their training.

llama.cpp

Main article: llama.cpp

Software developer Georgi Gerganov released llama.cpp as open-source on March 10, 2023. It is a re-implementation of LLaMA in C++, allowing systems without a powerful GPU to run the model locally. The llama.cpp project introduced the GGUF file format, a binary format that stores both tensors and metadata. The format focuses on supporting different quantization types, which can reduce memory usage and increase speed at the expense of lower model precision.
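The trade-off between memory and precision can be illustrated with a simplified block-wise 8-bit quantization. This sketch is not the GGUF on-disk layout or any specific llama.cpp quantization type; the function names, block contents, and storage accounting are assumptions chosen only to show the principle.

```python
def quantize_block(values):
    """Symmetric 8-bit quantization of one block: one float scale plus integer codes."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each value to the nearest multiple of the scale and clamp to the int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate floating-point values from the scale and codes."""
    return [scale * c for c in codes]

# Toy block of weights (illustrative values).
weights = [0.12, -0.5, 0.33, 0.0, 0.91, -0.77, 0.05, -0.02]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage: 32-bit floats versus 8-bit codes plus one 32-bit scale per block.
bits_fp32 = 32 * len(weights)
bits_q8 = 8 * len(weights) + 32

print(f"codes: {codes}")
print(f"max reconstruction error: {max_error:.5f}")
print(f"storage: {bits_fp32} bits -> {bits_q8} bits")
```

The reconstruction error is bounded by half the scale, so larger blocks or fewer bits save more memory but lose more precision.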

llamafile, created by Justine Tunney, is an open-source tool that bundles llama.cpp with the model into a single executable file. Tunney et al. introduced new optimized matrix multiplication kernels for x86 and ARM CPUs, improving prompt evaluation performance for FP16 and 8-bit quantized data types.

Military

In 2024, researchers from the People's Liberation Army Academy of Military Sciences (top military academy of China) were reported to have developed a military tool using Llama, which Meta Platforms stated was unauthorized due to Llama's license prohibiting the use of the model for military purposes. Meta granted the US government and US military contractors permission to use Llama in November 2024, but continued to prohibit military use by non-US entities.

Reception

Wired describes the 8B parameter version of Llama 3 as being "surprisingly capable" given its size.

The response to Meta's integration of Llama into Facebook was mixed, with some users confused after Meta AI told a parental group that it had a child.

According to the Q4 2023 earnings transcript, Meta adopted the strategy of open weights to improve model safety and iteration speed, to increase adoption among developers and researchers, and to become the industry standard. Llama 5, 6, and 7 are planned for the future.

The release of Llama models has sparked significant debates on the benefits and misuse risks of open weight models. Such models can be fine-tuned to remove safeguards, notably by cyber criminals, until they comply with harmful requests. Some experts contend that future models may facilitate causing damage more than defending against it, for example by making it relatively easy to engineer advanced bioweapons without specialized knowledge. Conversely, open-weight models can be useful for a wide variety of purposes, including for safety research.

Open Source Initiative head Stefano Maffulli criticized Meta for describing Llama as open source, saying that it was causing confusion among users and "polluting" the term.
