The LLM Index

A list of large language models (LLMs), including open-source and commercial offerings, comparisons of each, and libraries for working with LLMs. Find the best large language models for your use case.

Last updated: 2025-01-27

Large language models (LLMs) are powerful machine learning systems that, for many use cases, can now understand and compose text at a near-human level. They are currently the leading subcategory of foundation models: large models pre-trained using unsupervised methods on enormous datasets that can then be tuned to perform a range of tasks. Due to their capabilities, individuals and businesses now regularly use LLMs through popular platforms such as ChatGPT, Gemini, and Claude. This index lists LLMs along with their properties and functionality. For a snapshot "evolutionary tree", we recommend Figure 1 in this paper.

Note that LLMs are being developed and released at a frantic clip. While we'll try to keep this LLM list up to date, we may have missed some recent releases. Please contact zxie[at]sapling.ai with any significant updates.

Leaderboards

Many readers will be most interested in which LLM performs best for their use case. While the answer depends on the evaluation method and the field is changing rapidly, we recommend the following resources to help make that assessment:

Chatbot Arena (LMSYS Org) and "Leaderboard" tab on LMSYS
Open LLM Leaderboard (Hugging Face)

Commercial LLMs

Most software businesses are familiar with cloud service providers (CSPs) that provide scalable computing resources. With the growth of ChatGPT, new LLM cloud services have been launched by familiar incumbents as well as well-capitalized startups.

LM	Initial Release	Developer	Reference
Gemini (FKA Bard)	2023-03-21	Google	Link
ChatGPT	2022-11-30	OpenAI	Link
Claude	2023-03-14	Anthropic	Link
Command R	2021-11-15	Cohere	Link

Open Source LLMs

Assuming you have the ability to run models with billions of parameters, using an open source model is one way to ensure control of your systems and data. The open source LLM ecosystem is moving quickly, most notably after the release of Meta's Llama models (including Llama 2/3) followed by the release of Mistral's models. In parallel with the release of powerful models trained on large corpora and instruction-finetuned by research groups, a community of developers has made it possible to run larger and larger models in real time on commodity hardware, including, for example, a consumer laptop.

LM	Initial Release	Developer	License	Reference
DeepSeek	2023-11-29	DeepSeek	MIT	Link
FLAN-T5	2022-12-06	Google	Apache 2.0	Link
Gemma	2024-02-21	Google	Custom	Link
Gemma 2	2024-06-27	Google	Custom	Link
Grok	2023-11-05	xAI	Apache 2.0	Link
Llama 2	2023-07-18	Meta	Custom (Commercial OK)	Link
Llama 3	2024-04-18	Meta	Custom (Commercial OK)	Link
Llama 3.3	2024-12-06	Meta	Custom (Commercial OK)	Link
Mistral	2023-09-27	Mistral AI	Apache 2.0	Link
Phi	2023-06-20	Microsoft	MIT	Link
Qwen	2023-09-13	Alibaba Cloud	Custom	Link
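
As a rough guide to whether a model with billions of parameters will fit on your hardware, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation, not a sizing tool: the function name is hypothetical, the 7B parameter count and the 16/8/4-bit precisions are illustrative assumptions, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes).

    Hypothetical helper for illustration; excludes activations,
    KV cache, and framework overhead.
    """
    return n_params * bits_per_param / 8 / 1e9


# Illustrative assumption: a 7B parameter model stored at three precisions.
for bits in (16, 8, 4):
    gb = weight_memory_gb(7e9, bits)
    print(f"7B parameters at {bits}-bit: ~{gb:.1f} GB for weights")
```

Lower-precision (quantized) weights are one of the main reasons larger models have become practical on consumer machines, although actual memory use will be higher than the weights-only figure.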

Comparisons

Commercial LLM Comparison

Side-by-side comparisons of different commercial LLM offerings.

	Gemini (FKA Bard)	ChatGPT	Claude	Command R
Gemini (FKA Bard)		Link	Link	Link
ChatGPT	Link		Link	Link
Claude	Link	Link		Link
Command R	Link	Link	Link

Open Source LLM Comparison

Side-by-side comparisons of open source LLM options.

Scroll right to see the full table.

	DeepSeek	FLAN-T5	Gemma	Gemma 2	Grok	Llama 2	Llama 3	Llama 3.3	Mistral	Phi	Qwen
DeepSeek		Link	Link	Link	Link	Link	Link	Link	Link	Link	Link
FLAN-T5	Link		Link	Link	Link	Link	Link	Link	Link	Link	Link
Gemma	Link	Link		Link	Link	Link	Link	Link	Link	Link	Link
Gemma 2	Link	Link	Link		Link	Link	Link	Link	Link	Link	Link
Grok	Link	Link	Link	Link		Link	Link	Link	Link	Link	Link
Llama 2	Link	Link	Link	Link	Link		Link	Link	Link	Link	Link
Llama 3	Link	Link	Link	Link	Link	Link		Link	Link	Link	Link
Llama 3.3	Link	Link	Link	Link	Link	Link	Link		Link	Link	Link
Mistral	Link	Link	Link	Link	Link	Link	Link	Link		Link	Link
Phi	Link	Link	Link	Link	Link	Link	Link	Link	Link		Link
Qwen	Link	Link	Link	Link	Link	Link	Link	Link	Link	Link

By Industry

The most widely known LLMs are general-purpose, i.e. they can perform a variety of tasks across different topics and commercial industries. However, sometimes users and businesses may want an LLM trained on data from a specific industry, reducing the amount of prompting required for it to behave in an industry-relevant way and constraining its behavior. Also known as domain-specific LLMs, these language models may be easier to deploy to production for many businesses or serve as a better foundation for fine-tuning.

Coming Soon

LLMs for biomedical, healthcare, finance, academia, and eCommerce.

By Language

LLMs are often trained on massive web crawls of text in various languages. Hence, they are often multilingual by default. However, there are also LLMs trained specifically for languages other than English.

Coming Soon

Multimodal LLMs

Multimodal LLMs are LLMs that can process and generate not just text, but also other types of media, such as images, audio, and video. Most LLM platforms have multimodal support, most commonly to process documents such as PDFs as well as to generate images from text.

Coming Soon

Libraries

In addition to APIs, a number of developer libraries and SDKs have been released for working with LLMs. You can find Sapling's curated list of LLM libraries here:

Libraries

Frequently Asked Questions

How do I evaluate different LLMs and determine which one is best for my use case?

As these systems are evolving rapidly, we do not feel comfortable passing judgement on which LLM is best. However, weighing cloud hosting vs. the ability to self-host, pricing, and a qualitative evaluation should be enough to prune the index down to a small number of options.

If you'd like to look over tables of numbers, in addition to the LMSYS Chatbot Arena, Stanford maintains the HELM benchmark (as of April 2024, however, this benchmark is out of date).

An ad hoc (but usually effective) approach is to check the sentiment on X (Twitter) and the LocalLLaMA Reddit group on different LLMs.

Contact us with a brief description of your use case if you'd like us to make a snap assessment. Depending on your requirements, a smaller, custom language model may even be the best option.

Which LLM should I use?

Please see the question above on how to evaluate different LLMs. Some factors you'll likely wish to consider include (1) compute costs, (2) data security requirements, (3) whether a custom language model would work best, (4) latency requirements, and (5) internal expertise available to set up the deployment.

Is there an LLM available for my specific use case?

LLMs are now available for different languages (Chinese, English, etc.) as well as different industries (healthcare/biomedical, legal, software coding, financial services, and cybersecurity). We plan to release comparisons for different languages and industries soon; in the meantime, feel free to contact us regarding your specific need.

How can I train my own LLM?

Training an LLM is expensive (even a 7B parameter model can take hundreds of GPUs to train in a reasonable amount of time, on the order of weeks to months). Although libraries and scaffolding for training LLMs are being rapidly released, the process can still be finicky, especially if you do not have experience training NLP models. If you need guidance on getting started, it's more than likely you should instead finetune one of the existing commercial LLMs using its finetuning guide and/or find an LLM that roughly matches your use case.
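
To get a feel for why training a 7B parameter model can occupy hundreds of GPUs for weeks, here is a back-of-the-envelope sketch using the common approximation that training takes about 6 floating-point operations per parameter per token. Every input below is an illustrative assumption rather than a figure from this index: 2 trillion training tokens, 256 GPUs, a peak of 312 TFLOP/s per GPU, and 40% sustained utilization. The function name is hypothetical.

```python
def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    """Rough wall-clock training time in days.

    Uses the ~6 * parameters * tokens approximation for total training
    FLOPs. Hypothetical helper; ignores restarts, evaluation runs, and
    data-pipeline stalls.
    """
    total_flops = 6 * n_params * n_tokens
    effective_flops_per_second = n_gpus * peak_flops_per_gpu * utilization
    seconds = total_flops / effective_flops_per_second
    return seconds / 86_400  # seconds per day


# All values below are assumptions chosen for illustration.
days = training_days(
    n_params=7e9,               # 7B parameter model
    n_tokens=2e12,              # assumed 2 trillion training tokens
    n_gpus=256,                 # assumed cluster size
    peak_flops_per_gpu=312e12,  # assumed peak throughput per GPU
    utilization=0.4,            # assumed fraction of peak actually sustained
)
print(f"Estimated training time: about {days:.0f} days")
```

Under these assumptions the estimate comes out at roughly a month; halving the GPU count or the utilization doubles it, which is why the range quoted above spans weeks to months.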