Come on OpenAI, release the GPT-4 weights 😇
As Large Language Models (LLMs) have exploded in popularity since the unveiling of ChatGPT by OpenAI in November 2022, research and commercial interest in LLMs has led to a tsunami of research contributions. This is an attempt to curate a list of research articles, with a description of each paper's main findings to the best of my knowledge. The list is non-exhaustive.
- 🔥 Zhao, Wayne Xin et al. “A Survey of Large Language Models.” ArXiv abs/2303.18223 (April 2023)
- A great place to start. Sets up a rigorous taxonomy for what constitutes an LLM: an NLP text-generation transformer can reasonably be classified as an LLM when its parameter count hovers around 10B or more (for reference, BERT has only about 300 million parameters, LLaMA's biggest model has 65 billion, and GPT-3 has 175 billion).
- Bowman, Sam. “Eight Things to Know about Large Language Models.” ArXiv abs/2304.00612 (April 2023)
- In one of his eight points, Bowman scrutinizes the "emergent abilities" claim by Wei et al. [2022]: that on previously "unsolvable" tasks, LLM performance unexpectedly jumps massively once a certain parameter-count threshold is reached. The author finds that of the 202 tasks evaluated by Wei et al., only 33% are truly "emergent", i.e. show a discontinuous jump in performance. Most other tasks improve continuously or not at all, and a few see performance degrade.
- Yang, Jingfeng et al. “Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond.” ArXiv abs/2304.13712 (April 2023)
- The main takeaway is that while LLMs outperform classical state-of-the-art transformers (BERT, RoBERTa, BART) on tasks that require out-of-sample generalization, classical supervised methods still outperform LLMs on domain-specific tasks (e.g. English-Kazakh translation).
- 🔥 Vaswani, Ashish et al. “Attention is All you Need.” NIPS (Dec. 2017)
- The original paper by researchers at Google that introduced the Transformer architecture for Natural Language Processing, built around scaled dot-product attention (a minimal code sketch appears after this group of papers). And so the quest to AGI began...
- Bulatov, Aydar et al. “Scaling Transformer to 1M tokens and beyond with RMT.” ArXiv abs/2304.11062 (April 2023)
- Peng, Bo et al. “RWKV: Reinventing RNNs for the Transformer Era.” (May 2023)
- Yu, L. et al. “MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers.” ArXiv abs/2305.07185 (May 2023)
- Mohtashami, Amirkeivan and Martin Jaggi. “Landmark Attention: Random-Access Infinite Context Length for Transformers.” (May 2023)
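The papers above revolve around the attention mechanism and its context-length limits. As a refresher, here is a minimal NumPy sketch of the scaled dot-product attention from Vaswani et al. (2017), Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. It is a single unmasked head with no learned projections or batching, intended only to make the formula concrete.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)     # (seq_q, seq_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax over keys
    return weights @ V                                 # (seq_q, d_v)

# Toy usage: self-attention over 4 tokens with model dimension 8 (single head, no mask).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```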
- Devlin, Jacob et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” ArXiv abs/1810.04805 (Oct. 2018)
- BERT
- Radford, Alec et al. “Language Models are Unsupervised Multitask Learners.” (Aug. 2019)
- GPT-2
- Raffel, Colin et al. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” ArXiv abs/1910.10683 (Oct. 2019)
- T5
- 🔥 Brown, Tom B. et al., “Language Models are Few-Shot Learners.” ArXiv abs/2005.14165 (May 2020)
- GPT-3 (175B parameters) by OpenAI which powers chatbot ChatGPT (free-tier version). Proprietary LLM.
- Thoppilan, Romal et al. “LaMDA: Language Models for Dialog Applications.” ArXiv abs/2201.08239 (Jan. 2022)
- LaMDA (137B parameters) by Google which powers chatbot Bard (to be replaced soon by PaLM 2). Proprietary LLM.
- Chowdhery, Aakanksha et al. “PaLM: Scaling Language Modeling with Pathways.” ArXiv abs/2204.02311 (April 2022)
- PaLM 1 (540B parameters) by Google. Proprietary LLM.
- 🔥 Touvron, Hugo et al. “LLaMA: Open and Efficient Foundation Language Models.” ArXiv abs/2302.13971 (Feb. 2023)
- LLaMA (7B to 65B parameter variants) by Meta, released to the public under a non-commercial licence. Now ubiquitous among the research and AI-enthusiast community, it has spawned countless finetuned variants (e.g. Alpaca, Vicuna, Open Assistant, WizardLM, Manticore, etc.).
- 🔥 OpenAI. “GPT-4 Technical Report.” ArXiv abs/2303.08774 (March 2023)
- GPT-4 (unknown parameter count) by OpenAI which powers chatbot ChatGPT and can be used with external plugins (premium version). Proprietary LLM.
- Biderman, Stella Rose et al. “Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling.” ArXiv abs/2304.01373 (April 2023)
- Pythia (a suite of eight model sizes from 70M to 12B parameters) by the research collective EleutherAI. Open source with a free commercial licence.
- Anil, Rohan et al. “PaLM 2 Technical Report.” (May 2023)
- PaLM 2 (340B parameters) by Google which is expected to replace LaMDA as the LLM powering chatbot Bard. Proprietary LLM.
- Note: the 340B figure was first reported by CNBC on May 16th 2023 (Source)
- Li, Raymond et al. “StarCoder: may the source be with you!” ArXiv abs/2305.06161 (May 2023)
- StarCoder (15B parameters), trained specifically on a subset of 80+ programming languages drawn from the broader code dataset "The Stack" (which covers 300+ programming languages). Available under an OpenRAIL licence.
- Penedo, Guilherme et al. “The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only.” ArXiv abs/2306.01116 (June 2023)
- Falcon (7B, 40B parameters) by the Technology Innovation Institute (TII) of the United Arab Emirates. Made a splash on the Hugging Face Open LLM Leaderboard and, unlike Meta's LLaMA, is released under an open-source licence.
*Reinforcement Learning from Human Feedback
**Supervised Fine-Tuning
***Proximal Policy Optimization
- 🔥 Hu, J. Edward et al. “LoRA: Low-Rank Adaptation of Large Language Models.” ArXiv abs/2106.09685 (June 2021)
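    - Freezes the pretrained weights and instead trains a pair of low-rank matrices injected into selected layers (typically the attention projections), cutting the number of trainable parameters by orders of magnitude. A minimal code sketch of the low-rank update follows this group of papers.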
- 🔥 Ouyang, Long et al. “Training language models to follow instructions with human feedback.” ArXiv abs/2203.02155 (March 2022)
- Wang, Yizhong et al. “Self-Instruct: Aligning Language Models with Self-Generated Instructions.” ArXiv abs/2212.10560 (Dec. 2022)
- Peng, Baolin et al. “Instruction Tuning with GPT-4.” ArXiv abs/2304.03277 (April 2023)
- 🔥 Köpf, Andreas et al. “OpenAssistant Conversations - Democratizing Large Language Model Alignment.” ArXiv abs/2304.07327 (April 2023)
- Xu, Can et al. “WizardLM: Empowering Large Language Models to Follow Complex Instructions.” ArXiv abs/2304.12244 (April 2023)
- Zhou, Chunting et al. “LIMA: Less Is More for Alignment.” (May 2023)
- Gudibande, Arnav et al. “The False Promise of Imitating Proprietary LLMs.” ArXiv abs/2305.15717 (May 2023)
- Mukherjee, Subhabrata et al. “Orca: Progressive Learning from Complex Explanation Traces of GPT-4.” (June 2023)
- Luo, Ziyang et al. “WizardCoder: Empowering Code Large Language Models with Evol-Instruct.” ArXiv abs/2306.08568 (June 2023)
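To make the LoRA idea from Hu et al. (2021) above concrete, here is a minimal PyTorch-style sketch wrapping a single `nn.Linear` layer: the frozen base weight is augmented with a trainable low-rank update scaled by alpha/r. The rank, scaling, and initialization values below are illustrative defaults, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style adapter: y = W0 x + (alpha / r) * B A x,
    with W0 frozen and only A, B trainable (after Hu et al., 2021)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                    # freeze the pretrained layer
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # B = 0 -> no change at init
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Toy usage: only the adapter parameters are trainable.
layer = LoRALinear(nn.Linear(512, 512), r=8, alpha=16)
y = layer(torch.randn(2, 512))                          # (2, 512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 512 = 8192, versus 512 * 512 + 512 in the frozen base layer
```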
- 🔥 Frantar, Elias et al. “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.” ArXiv abs/2210.17323 (Oct. 2022). Published at ICLR 2023.
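    - One-shot post-training quantization of LLM weights down to 3-4 bits with little accuracy loss, using approximate second-order information rather than simple rounding. For intuition, a naive round-to-nearest baseline is sketched after this group of papers.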
- Yao, Z. et al. “A Comprehensive Study on Post-Training Quantization for Large Language Models”. ArXiv, abs/2303.08302. (March 2023)
- 🔥 Dettmers, Tim et al. “QLoRA: Efficient Finetuning of Quantized LLMs.” (May 2023)
- Dettmers, Tim et al. “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.” (June 2023)
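For intuition about what the papers above improve on, here is a minimal per-channel round-to-nearest (RTN) weight-quantization sketch in PyTorch. This is the naive baseline, not GPTQ, QLoRA, or SpQR themselves, and the bit-width and grouping choices are illustrative.

```python
import torch

def quantize_rtn(W: torch.Tensor, bits: int = 4):
    """Per-output-channel asymmetric round-to-nearest quantization (naive baseline)."""
    qmax = 2 ** bits - 1
    w_min = W.min(dim=1, keepdim=True).values
    w_max = W.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / qmax      # one scale per row
    zero = torch.round(-w_min / scale)                  # integer zero-point per row
    Wq = torch.clamp(torch.round(W / scale) + zero, 0, qmax)
    return Wq.to(torch.uint8), scale, zero

def dequantize(Wq, scale, zero):
    return (Wq.float() - zero) * scale

# Toy usage: quantize a random weight matrix to 4 bits and measure the reconstruction error.
W = torch.randn(256, 256)
Wq, scale, zero = quantize_rtn(W, bits=4)
err = (W - dequantize(Wq, scale, zero)).abs().mean()
print(f"mean abs error: {err.item():.4f}")
```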
- Bai, Yuntao et al. “Constitutional AI: Harmlessness from AI Feedback.” ArXiv abs/2212.08073 (Dec. 2022)
- Sun, Zhiqing et al. “Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision.” ArXiv abs/2305.03047 (May 2023)
- Bender, Emily M. et al. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (March 2021)
- Wan, Alexander et al. “Poisoning Language Models During Instruction Tuning.” ArXiv abs/2305.00944 (May 2023)
- Santurkar, Shibani et al. “Whose Opinions Do Language Models Reflect?” ArXiv abs/2303.17548 (March 2023)
- Deshpande, A. et al. “Toxicity in ChatGPT: Analyzing Persona-assigned Language Models.” ArXiv abs/2304.05335 (April 2023)
- Shapira, Natalie et al. “Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models.” (May 2023)
- 🔥 Wei, Jason et al. “Emergent Abilities of Large Language Models.” Trans. Mach. Learn. Res. 2022 (June 2022)
- Schaeffer, Rylan et al. “Are Emergent Abilities of Large Language Models a Mirage?” ArXiv abs/2304.15004 (April 2023)
- Singhal, K. et al. “Large Language Models Encode Clinical Knowledge.” ArXiv abs/2212.13138 (Dec. 2022)
- Wu, Chaoyi et al. “PMC-LLaMA: Further Finetuning LLaMA on Medical Papers.” ArXiv abs/2304.14454 (April 2023)
- Singhal, K. et al. “Towards Expert-Level Medical Question Answering with Large Language Models.” ArXiv abs/2305.09617 (May 2023)
- Wu, Shijie et al. “BloombergGPT: A Large Language Model for Finance.” ArXiv abs/2303.17564 (April 2023)
- Xie, Qianqian et al. “The Wall Street Neophyte: A Zero-Shot Analysis of ChatGPT Over MultiModal Stock Movement Prediction Challenges.” ArXiv abs/2304.05351 (May 2023)
- Yang, Hongyang et al. “FinGPT: Open-Source Financial Large Language Models.” ArXiv abs/2306.06031 (June 2023)
- 🔥 Bubeck, Sébastien et al. “Sparks of Artificial General Intelligence: Early experiments with GPT-4.” ArXiv abs/2303.12712 (March 2023)
- Park, Joon Sung et al. “Generative Agents: Interactive Simulacra of Human Behavior.” ArXiv abs/2304.03442 (April 2023)
- Wang, Guanzhi et al. “Voyager: An Open-Ended Embodied Agent with Large Language Models.” (May 2023)