Architecture

Foundation Model

A large AI model trained on broad data that can be adapted to many downstream tasks, serving as a base for specialized applications.

What is a foundation model?

A foundation model is a large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks. The term was coined in 2021 by Stanford's Institute for Human-Centered AI (HAI).

Key characteristics:

  • Trained on massive, diverse datasets
  • Learn general-purpose representations
  • Adapt to many tasks without training from scratch
  • Serve as a "foundation" for specialized applications

Examples:

  • GPT-4 (OpenAI): Text generation, reasoning, coding
  • Claude (Anthropic): Conversation, analysis, writing
  • BERT (Google): Text understanding, search
  • DALL-E (OpenAI): Image generation
  • Whisper (OpenAI): Speech recognition

The shift: before, you trained a new model for each task; now, you start with a foundation model and adapt it to your task.

Why foundation models matter

Efficiency: Instead of training from scratch for each task, adapt a pre-trained model. Dramatically reduces compute, data, and time requirements.

Capability: Foundation models learn rich representations from diverse data. This transfers to downstream tasks, often outperforming task-specific models.

Accessibility: Organizations can build powerful AI without massive training infrastructure. Use foundation models via APIs or fine-tune open-source versions.

Emergent abilities: Large foundation models exhibit capabilities that are neither present in smaller versions nor explicitly trained for, such as reasoning, instruction following, and in-context learning.

Versatility: Same model handles translation, summarization, question answering, coding, creative writing, and more.

How foundation models are built

Phase 1: Pre-training. Train on massive datasets to learn general patterns.

For language models:

  • Data: Trillions of words from internet, books, code
  • Objective: Predict the next word (or a masked word); see the sketch after this list
  • Scale: Thousands of GPUs, months of training
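
A minimal sketch of the next-word objective in PyTorch, using a toy embedding-plus-linear stand-in rather than a real transformer; all shapes and names here are illustrative:

    import torch
    import torch.nn.functional as F

    # Toy stand-in for a language model: embedding + linear head.
    vocab_size, dim = 1000, 64
    embed = torch.nn.Embedding(vocab_size, dim)
    head = torch.nn.Linear(dim, vocab_size)

    tokens = torch.randint(0, vocab_size, (8, 128))  # (batch, sequence)
    logits = head(embed(tokens[:, :-1]))             # predict each next token
    loss = F.cross_entropy(
        logits.reshape(-1, vocab_size),              # (batch * seq, vocab)
        tokens[:, 1:].reshape(-1),                   # targets shifted by one
    )
    loss.backward()  # one step; pre-training repeats this over trillions of tokens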

Phase 2: Alignment (optional). Improve helpfulness and safety.

  • RLHF: Learn from human preferences (reward-model loss sketched after this list)
  • Constitutional AI: Learn from principles
  • Fine-tuning on instructions
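
As one concrete piece of RLHF, the reward model is commonly trained on human preference pairs with a pairwise (Bradley-Terry-style) loss; a minimal PyTorch sketch with illustrative tensors:

    import torch
    import torch.nn.functional as F

    # Scalar reward scores for a preferred and a rejected response,
    # as a reward model would produce over a batch of comparisons.
    r_chosen = torch.randn(16, requires_grad=True)
    r_rejected = torch.randn(16)

    # Push the reward of the human-preferred response above the rejected one.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    loss.backward()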

Phase 3: Specialization (optional). Adapt for specific use cases.

  • Fine-tuning on domain data
  • Prompt engineering
  • RAG integration

The cost: Pre-training GPT-4-scale models costs $50M-$100M+. Most organizations use existing foundation models rather than training from scratch.

Adapting foundation models

Prompting: No changes to the model itself; just write effective prompts.

  • Zero-shot: No examples
  • Few-shot: Include examples in the prompt (both styles sketched below)
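
A minimal illustration of the two styles on a made-up sentiment task; either string would be sent as-is to a foundation model:

    # Zero-shot: the task description alone.
    zero_shot = (
        "Classify the review as positive or negative.\n"
        "Review: The battery died after two days.\n"
        "Sentiment:"
    )

    # Few-shot: the same task, with worked examples included in the prompt.
    few_shot = (
        "Review: I use it every day, love it.\nSentiment: positive\n\n"
        "Review: Arrived broken and support ignored me.\nSentiment: negative\n\n"
        "Review: The battery died after two days.\nSentiment:"
    )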

Fine-tuning: Update model weights on your data.

  • Full fine-tuning: Update all parameters
  • LoRA/QLoRA: Train small low-rank adapter matrices instead of all weights (setup sketched after this list)
  • Good for: Specific styles, domains, formats
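
A minimal LoRA setup using the Hugging Face peft library; the model name and target modules below are placeholders that vary by architecture:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # "base-model-name" is a placeholder for any causal LM you can host.
    model = AutoModelForCausalLM.from_pretrained("base-model-name")

    config = LoraConfig(
        r=8,                                  # rank of the low-rank updates
        lora_alpha=16,                        # scaling applied to the updates
        target_modules=["q_proj", "v_proj"],  # model-specific attention layers
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of all weights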

RAG (Retrieval-Augmented Generation): Augment the model with external knowledge retrieved at query time (toy sketch after this list).

  • No retraining needed
  • Knowledge stays current
  • Good for: Domain knowledge, factual accuracy
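
A toy end-to-end sketch; real systems use an embedding model and a vector store, but simple word overlap stands in for retrieval here, and all strings are illustrative:

    import re

    docs = [
        "Refunds are available within 30 days of purchase.",
        "Our support line is open weekdays, 9am to 5pm.",
    ]

    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    # Retrieve the snippet sharing the most words with the question.
    question = "Are refunds available after 30 days?"
    context = max(docs, key=lambda d: len(words(question) & words(d)))

    # Ground the model in the retrieved context; no weights are updated.
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )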

Tool use: Connect the model to external capabilities (dispatch loop sketched after this list).

  • APIs, databases, code execution
  • Good for: Actions, real-time data
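
A sketch of the basic dispatch loop: the model emits a structured call, your code runs it and returns the result. The JSON shape here is illustrative, not any particular provider's schema:

    import json

    def get_weather(city: str) -> str:
        return f"Sunny, 22°C in {city}"  # stand-in for a real weather API

    TOOLS = {"get_weather": get_weather}

    # Suppose the model responded with this structured tool call.
    model_output = '{"tool": "get_weather", "arguments": {"city": "Oslo"}}'
    call = json.loads(model_output)
    result = TOOLS[call["tool"]](**call["arguments"])
    # `result` is returned to the model so it can compose the final answer.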

Combination: Most production systems combine approaches, for example a fine-tuned model with RAG and tool use.

Foundation model considerations

Choosing a model:

  • Capability: Does it handle your task well?
  • Cost: API pricing, hosting requirements
  • Speed: Latency requirements
  • Privacy: Where is data processed?
  • License: Can you use it commercially?

Build vs. buy:

  • Use APIs: Fastest, no infrastructure, per-use cost
  • Self-host open-source: Control, privacy, higher upfront cost
  • Fine-tune: Custom behavior, requires expertise
  • Train from scratch: Rarely justified except at largest scale

Risks:

  • Dependency on providers
  • Potential for bias and errors
  • Privacy concerns with external APIs
  • Cost at scale
  • Rapid obsolescence

Mitigations:

  • Abstract providers behind your own layer (interface sketched after this list)
  • Evaluate models for bias
  • Use self-hosted models for sensitive data
  • Monitor and budget costs
  • Design for model swapping
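
One way to implement the first and last mitigations is a thin interface of your own between application code and any model backend; a minimal sketch with illustrative names:

    from typing import Protocol

    class TextModel(Protocol):
        def complete(self, prompt: str) -> str: ...

    class StubBackend:
        # Real backends would wrap a vendor SDK or a self-hosted server
        # behind this same one-method interface.
        def complete(self, prompt: str) -> str:
            return f"[stub reply to: {prompt!r}]"

    def answer(model: TextModel, question: str) -> str:
        return model.complete(f"Answer concisely: {question}")

    print(answer(StubBackend(), "What is a foundation model?"))

Swapping providers then means swapping the backend object rather than touching every call site.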