Language Models are Few-Shot Learners
Introduces GPT-3, a 175B parameter autoregressive language model demonstrating that scaling dramatically improves few-shot task performance, establishing the paradigm of in-context learning without gradient updates.
Language Models are Few-Shot Learners
Focus: GPT-3 demonstrated that scaling language models to 175 billion parameters produces emergent few-shot learning capabilities, fundamentally changing how the AI safety community must reason about model behavior. The paper revealed that capabilities can appear unpredictably at scale, making safety evaluation a moving target.
Key Insights
-
Emergent few-shot learning. GPT-3 showed that sufficiently large language models can perform tasks from just a few examples in the prompt, without any gradient updates. This in-context learning capability was qualitatively different from smaller models, suggesting that scale itself introduces new behavioral regimes that safety frameworks must account for.
-
Scaling as capability unlock. Performance on many benchmarks improved log-linearly with model size, but some tasks showed sharp phase transitions. This unpredictability in when capabilities emerge makes it difficult to anticipate what a next-generation model will be capable of, complicating preemptive safety measures.
-
Dual-use risks at scale. The paper explicitly acknowledged risks including misinformation generation, spam, phishing, and bias amplification. GPT-3’s ability to generate coherent, contextually appropriate text at length made it the first model where misuse potential was clearly commensurate with capability improvements.
-
Prompt-based behavioral steering. In-context learning showed that model behavior could be steered through carefully constructed prompt examples, without modifying model weights. This meant that anyone with API access could repurpose the model for arbitrary tasks — including adversarial ones.
Executive Summary
Brown et al. trained GPT-3, a 175 billion parameter autoregressive transformer, and evaluated it across dozens of NLP benchmarks in zero-shot, one-shot, and few-shot settings. The model achieved strong performance on translation, question answering, and text generation tasks without task-specific fine-tuning, relying instead on in-context learning where task demonstrations are provided in the prompt.
On several benchmarks, GPT-3 matched or exceeded fine-tuned state-of-the-art systems, including achieving 81.5% on the StoryCloze task (zero-shot) and near state-of-the-art on TriviaQA in the few-shot setting.
The paper established several foundational observations for AI safety:
-
Unpredictable capability emergence. Model capabilities are not fully predictable from smaller-scale experiments. Tasks that appeared impossible at 13B parameters became tractable at 175B.
-
General-purpose repurposability. A single model could be repurposed for an enormous range of tasks, making containment through narrow deployment impractical.
-
Bias and fairness concerns. The authors provided early analysis of gender, racial, and religious biases in GPT-3’s outputs, showing systematic patterns inherited from training data.
-
Energy and compute costs. Training required approximately 3,640 petaflop/s-days of compute, raising questions about accessibility and environmental impact.
Impact on the Field
GPT-3 catalyzed the modern era of large language model development. Its demonstration that scale alone could unlock qualitatively new capabilities forced the safety community to shift from studying specific model architectures to studying emergent phenomena across the scaling frontier.
The in-context learning paradigm also introduced a new attack surface: if behavior can be steered through prompt examples, then adversarial examples in the prompt can steer behavior toward harmful outputs. This observation directly presages the field of prompt injection and jailbreaking research that followed.
Relevance to Failure-First
GPT-3 is the origin point for many failure modes studied in the failure-first framework:
-
Capability-correlated failure. The emergent capabilities paradigm means that adversarial evaluations designed for one model scale may miss vulnerabilities that appear at the next scale.
-
Prompt-based attack surface. The finding that in-context learning can steer model behavior through prompt construction directly presages jailbreaking and prompt injection attacks.
-
Expanding attack surfaces. GPT-3 established that the attack surface of language models grows with capability, and that safety evaluation must be treated as a continuous, scale-aware process rather than a one-time certification.
-
Foundation for adversarial evaluation. The model’s general-purpose nature means that no fixed set of safety tests can be considered comprehensive — the space of possible misuse grows with the space of possible use.
Read the full paper on arXiv · PDF