Several major companies are significant players in the large language model (LLM) space.

#highlevel

Below is a list of these companies, the models they have released, and the important contributions they have made to the field:

1. [[OpenAI]]

Models Released:

  • GPT (Generative Pre-trained Transformer): GPT-1, GPT-2, GPT-3
  • Codex: A variant of GPT-3 fine-tuned for coding and software development tasks.

Contributions:

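  • Transformer Architecture: OpenAI's GPT models are built on the transformer architecture (introduced by Google researchers in 2017), which uses attention mechanisms to process and generate text. The GPT series showed that generative pre-training of a decoder-only transformer on large text corpora transfers well to many NLP tasks.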
  • Few-Shot and Zero-Shot Learning: GPT-3 demonstrated remarkable few-shot and zero-shot learning capabilities, allowing the model to perform various tasks with minimal task-specific training.
  • Codex: Specifically designed to understand and generate code, Codex powers GitHub Copilot, an AI-powered code completion tool.
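
The difference between zero-shot and few-shot use described above comes down to what goes into the prompt. The following is a minimal sketch of that idea; the `build_prompt` helper and the `Input:`/`Output:` layout are illustrative assumptions, not a format required by any particular model or API.

```python
def build_prompt(task, examples, query):
    """Assemble a plain-text prompt.

    An empty `examples` list yields a zero-shot prompt; one or more
    (input, output) pairs yields a few-shot prompt.
    """
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final query is left open so the model completes the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


zero_shot = build_prompt("Translate English to French.", [], "cheese")
few_shot = build_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(few_shot)
```

In both cases the model weights stay unchanged; only the number of worked examples in the prompt differs.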

2. [[Google]]

Models Released:

  • BERT (Bidirectional Encoder Representations from Transformers): BERT-Base, BERT-Large
  • T5 (Text-to-Text Transfer Transformer): A model that treats all NLP tasks as text-to-text tasks, unifying various NLP problems into a single framework.
  • mT5: A multilingual version of T5.
  • Switch Transformer: A model that uses sparsely activated expert layers, which greatly increases the parameter count while keeping the compute per token roughly constant.
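
The text-to-text idea behind T5 can be sketched as follows: every task becomes a pair of strings, with a short prefix telling the model which task to perform. The helper names here are hypothetical, and the prefixes follow the style of examples in the T5 paper rather than a fixed API.

```python
def translation_example(source, target):
    # Translation: the prefix names the language pair.
    return {"input": f"translate English to German: {source}", "target": target}


def summarization_example(document, summary):
    # Summarization: same model, different prefix.
    return {"input": f"summarize: {document}", "target": summary}


def classification_example(sentence, label):
    # Even a class label is emitted as a literal string, not a class index.
    return {"input": f"cola sentence: {sentence}", "target": label}


examples = [
    translation_example("That is good.", "Das ist gut."),
    summarization_example("A long article about sparse models.", "Sparse models, briefly."),
    classification_example("The course is jumping well.", "unacceptable"),
]
for ex in examples:
    print(ex["input"], "->", ex["target"])
```

Because inputs and targets are always text, one model, one loss, and one decoding procedure can serve all of these tasks.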

Contributions:

  • Bidirectional Training: BERT introduced deeply bidirectional pre-training through masked language modeling, allowing the model to use both left and right context for every token and greatly improving contextual understanding.
  • Unified Framework for NLP: T5's text-to-text framework simplified the approach to NLP tasks, enabling a unified model to handle various tasks by simply converting them into a text format.
  • Multilingual Models: Google's mT5 and other models expanded LLM capabilities to multiple languages, facilitating global applications.
  • Efficient Large-Scale Models: The Switch Transformer introduced a way to scale models to extremely large parameter counts using sparse activation: each token is routed to a single expert, so the compute per token grows far more slowly than the model size.
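
Sparse activation in the Switch Transformer can be illustrated with a toy top-1 router. This is a minimal sketch in plain Python under simplifying assumptions: the experts are stand-in functions rather than feed-forward networks, the router weights are hand-picked, and the load-balancing loss and expert capacity limits used in practice are omitted.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_layer(token, router_weights, experts):
    """Route one token vector to a single expert (top-1 routing)."""
    # Router: one logit per expert, a dot product of its weight row with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    gates = softmax(logits)
    chosen = max(range(len(experts)), key=lambda i: gates[i])
    # Only the chosen expert runs; its output is scaled by the gate probability.
    output = [gates[chosen] * v for v in experts[chosen](token)]
    return chosen, output


# Two toy experts (assumed for illustration only).
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
]
router_weights = [[1.0, 0.0], [0.0, 1.0]]

chosen_a, out_a = switch_layer([3.0, 1.0], router_weights, experts)
chosen_b, out_b = switch_layer([0.0, 5.0], router_weights, experts)
print(chosen_a, out_a)
print(chosen_b, out_b)
```

Adding more experts increases the total parameter count, but each token still passes through only one of them, which is why parameter count and per-token compute can be scaled separately.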

3. [[Microsoft]]

Models Released:

  • Turing-NLG (Natural Language Generation): A 17-billion-parameter model designed for natural language generation, known for its large parameter count and its application in various NLP tasks.

Contributions:

  • Large-Scale NLG: Turing-NLG represented one of the largest models at the time of its release, focusing on high-quality language generation.
  • Integration with Azure: Microsoft has integrated these models into its Azure AI services, providing NLP capabilities for enterprises and developers.

4. [[Facebook AI (Meta AI)]]

Models Released:

  • RoBERTa (Robustly Optimized BERT Pretraining Approach): An optimized version of BERT that improved training techniques and achieved state-of-the-art results on several benchmarks.
  • XLM (Cross-lingual Language Model): Designed for multilingual and cross-lingual tasks.
  • XLM-R: An extension of XLM with larger datasets and improved multilingual capabilities.

Contributions:

  • Optimized Pretraining: RoBERTa showed how tweaking pretraining approaches, such as removing the next sentence prediction task and increasing training data, can lead to better model performance.
  • Multilingual Capabilities: XLM and XLM-R expanded the scope of LLMs to multiple languages, promoting cross-lingual understanding and applications.

5. Amazon Web Services (AWS)

Models Released:

  • AlexaTM 20B: A model with 20 billion parameters designed for enhancing conversational AI capabilities in Amazon's Alexa.

Contributions:

  • Conversational AI: Amazon has focused on improving conversational AI technologies, particularly for virtual assistants like Alexa, using large-scale language models to improve understanding and responsiveness.

6. Baidu

Models Released:

  • ERNIE (Enhanced Representation through kNowledge Integration): A series of models that incorporate structured knowledge into language understanding.

Contributions:

  • Knowledge Integration: ERNIE integrates external structured knowledge, such as knowledge graphs, into the language model, enhancing its understanding of entities and relationships.

Beyond releasing individual models, these companies have advanced NLP and LLM research through innovations in model architecture, training techniques, and applications across various languages and domains. Their work continues to shape how AI is integrated into everyday technologies.

https://explodingtopics.com/blog/list-of-llms

| LLM Name | Developer | Release Date | [[Access]] | [[Parameters]] |
| --- | --- | --- | --- | --- |
| GPT-4o | [[OpenAI]] | May 13, 2024 | API | Unknown |
| Claude 3 | [[Anthropic]] | March 4, 2024 | API | Unknown |
| Grok-1 | [[xAI]] | November 4, 2023 | Open-Source | 314 billion |
| Mistral 7B | [[Mistral AI]] | September 27, 2023 | Open-Source | 7.3 billion |
| PaLM 2 | [[Google]] | May 10, 2023 | API | 340 billion (reported) |
| Falcon 180B | [[Technology Innovation Institute]] | September 6, 2023 | Open-Source | 180 billion |
| Gemini 1.5 | [[Google DeepMind]] | February 15, 2024 | API | Unknown |
| Llama 3 | [[Meta AI]] | April 18, 2024 | Open-Source | 8 billion, 70 billion |
| Mixtral 8x22B | [[Mistral AI]] | April 10, 2024 | Open-Source | 141 billion |
| Gemma | [[Google DeepMind]] | February 21, 2024 | Open-Source | 2 billion, 7 billion |
| Phi-3 | [[Microsoft]] | April 23, 2024 | Both | 3.8 billion |
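
Parameter counts like those listed above translate fairly directly into the memory needed just to hold a model's weights. The following is a rough back-of-the-envelope sketch; the `weight_memory_gb` helper and the choice of 16-bit weights are assumptions for illustration, and the estimate ignores activations, the KV cache, and optimizer state.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts taken from the list above.
models = {
    "Mistral 7B": 7.3e9,
    "Llama 3 70B": 70e9,
    "Falcon 180B": 180e9,
    "Grok-1": 314e9,
}
for name, n in models.items():
    print(f"{name}: ~{weight_memory_gb(n):.1f} GB of weights in fp16")
```

For sparse mixture-of-experts models, all experts still have to be stored, so the full parameter count applies to memory even though only a fraction of the weights is used per token.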

[[Anthropic]]

xAI

Mistral AI

Google

Google DeepMind

Technology Innovation Institute

Meta AI

Microsoft