The LLM Index
Large language models (LLMs) are powerful machine learning systems that for many use cases can now understand and compose text at a human level. Due to their capabilities, individuals as well as businesses are now regularly using LLMs in their daily tasks. This index is a list of of popular LLMs and their properties and functionality.
Note that LLMs are being developed and released at a frantic clip. While we'll try and keep this list up-to-date, we may miss some recent releases. Please contact zxie[at]sapling.ai
with any significant updates.
Commercial
Most software businesses are familar with cloud service providers (CSPs) that provide scalable computing resources. With the growth of ChatGPT, new LLM cloud services have been launched from familiar incumbents as well as well-capitalized startups.
LM | Initial Release | Developer | Instruct / RLHF | Reference |
---|---|---|---|---|
Bard | 2023-03-21 | Link | ||
ChatGPT | 2022-11-30 | Link | ||
Claude | 2023-03-14 | Link | ||
Cohere | 2021-11-15 | Link | ||
Jurassic | 2021-08-11 | Link | ||
Pi | 2023-05-02 | Link |
Open Source
Assuming you have the ability to run models with billions of parameters, using an open source model is one way to ensure control of your systems and data. The open source LLM ecosystem is moving quickly, most notably after the release of Meta's LLaMA models. In parallel to the release of powerful models trained on large corpuses of data and instruct-finetuned by research groups, a community of developers has also made it possible to run larger and larger models in real-time on commodity hardware—even on, for example, on a consumer laptop.
LM | Initial Release | Developer | License | Instruct / RLHF | Reference |
---|---|---|---|---|---|
Alpaca | 2023-03-13 | Noncommercial | Link | ||
BLOOM | 2022-07-06 | Open RAIL-M v1 | Link | ||
Cerebras-GPT | 2023-03-28 | Apache 2.0 | Link | ||
Dolly | 2023-03-24 | MIT | Link | ||
FLAN-T5 | 2022-12-06 | Apache 2.0 | Link | ||
FLAN-UL2 | 2023-03-03 | Apache 2.0 | Link | ||
GPT-J | 2021-06-09 | Apache 2.0 | Link | ||
GPT4All | 2023-03-26 | Varies | Link | ||
GPTNeo | 2021-03-21 | Apache 2.0 | Link | ||
Koala | 2023-04-03 | Noncommercial | Link | ||
LLaMA | 2023-02-24 | Noncommercial | Link | ||
MPT-7B | 2023-05-05 | Apache 2.0 | Link | ||
OpenAssistant | 2023-04-15 | Varies | Link | ||
OpenLLaMA | 2023-04-28 | Apache 2.0 | Link | ||
OPT | 2022-05-03 | NA | Link | ||
Pythia | 2023-02-13 | Apache 2.0 | Link | ||
RedPajama-INCITE | 2023-05-05 | Apache 2.0 | Link | ||
StableLM | 2023-04-19 | CC BY-SA 4.0 | Link | ||
StableVicuna | 2023-04-28 | Noncommercial | Link | ||
Vicuna | 2023-03-30 | Noncommercial | Link |
Comparisons
Commercial
Side-by-side comparisons of different commercial LLM offerings.
Bard | ChatGPT | Claude | Cohere | Jurassic | Pi | |
---|---|---|---|---|---|---|
Bard | Link | Link | Link | Link | Link | |
ChatGPT | Link | Link | Link | Link | Link | |
Claude | Link | Link | Link | Link | Link | |
Cohere | Link | Link | Link | Link | Link | |
Jurassic | Link | Link | Link | Link | Link | |
Pi | Link | Link | Link | Link | Link |
Open Source
Side-by-side comparisons of open source LLM options.
By Industry
The most widely known LLMs are general-purpose, i.e. they can perform a variety of tasks across different topics and commercial industries. However, sometimes users and businesses may want an LLM trained on data from a specific industry, reducing the amount of prompting required for it to behave in an industry-relevant way and constraining its behavior.
Coming Soon
By Language
LLMs are often trained on massive web crawls of text from various languages. Hence, often they are multilingual by default. However, there have also been LLMs trained specifically for languages besides English.
Coming Soon
Frequently Asked Questions
As these systems are evolving rapidly, we do not feel comfortable passing judgement on which LLM is best. However, a combination of cloud vs. ability to self-host, pricing, and qualitative evaluation should be enough to prune the index down to a small number of possible options.
If you'd like to look over tables of numbers, Stanford mantains the HELM benchmark.
Contact us with a brief description of your use case if you'd like for us to make a snap assessment. Depending on your requirements, a smaller, custom language model may even be the best option.
Please see the question above on how to evaluate different LLMs. Some factors you'll likely wish to consider include (1) compute costs, (2) data security requirements, (3) whether a custom language model would work best, (4) latency requirements, and (5) internal expertise available to set up the deployment.
LLMs are now available for different languages (Chinese, English, etc.) as well as different industries (healthcare/biomedical, legal, software coding, financial services, and cybersecurity). We plan to release comparisons for different languages and industries soon; in the meantime, feel free to contact us regarding your specific need.
Training an LLM is expensive. Although libraries and scaffolding for training LLMs are being rapidly released, the process can still be finicky, especially if you do not have experience training NLP models. If you need guidance on getting started, it's more than likely you should instead be finetuning one of the existing commercial LLMs using their finetuning guides and/or finding a LLM that roughly matches your use case.