Fundamentals

Large Language Model (LLM)

A neural network trained on massive text datasets that can understand and generate human-like language.

What is a large language model?

A large language model (LLM) is an AI system trained on enormous amounts of text data to understand and generate human language. "Large" refers to both the training data (often trillions of words) and the model size (billions of parameters).

LLMs power most modern AI assistants, chatbots, and writing tools. They can:

  • Answer questions and explain concepts
  • Write and edit text in various styles
  • Translate between languages
  • Summarize long documents
  • Generate code
  • Reason through problems

Popular LLMs include OpenAI's GPT-4, Anthropic's Claude, Google's Gemini, and Meta's Llama.

How do LLMs work?

LLMs are built on the transformer architecture, which processes text by paying attention to relationships between words regardless of their distance in a sentence.
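
The core operation is scaled dot-product attention. A minimal sketch using NumPy (real models add learned projections, multiple attention heads, and masking):

    import numpy as np

    def attention(Q, K, V):
        """Each query row attends to every key row, regardless of
        how far apart the corresponding tokens are."""
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)        # similarity of each query to each key
        scores -= scores.max(axis=-1, keepdims=True)    # subtract max for stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ V                     # weighted mix of value vectors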

Training process:

  1. Pre-training: The model learns from massive text datasets (books, websites, code) by predicting the next word in sequences (a toy version of this objective is sketched after this list). This teaches grammar, facts, and reasoning patterns.

  2. Fine-tuning: The model is refined on specific tasks or higher-quality data to improve performance.

  3. RLHF (Reinforcement Learning from Human Feedback): Human raters evaluate responses, and the model learns to generate outputs humans prefer.
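
Pre-training reduces to a simple objective. A minimal sketch, where `model` is a hypothetical stand-in for any network that maps a prefix of token IDs to a probability distribution over the vocabulary:

    import math

    def next_token_loss(model, token_ids):
        """Average cross-entropy of predicting token t+1 from tokens 0..t."""
        total = 0.0
        for t in range(len(token_ids) - 1):
            probs = model(token_ids[: t + 1])   # P(next token | tokens so far)
            total += -math.log(probs[token_ids[t + 1]])
        return total / (len(token_ids) - 1)

Minimizing this loss over trillions of tokens is what forces the model to absorb grammar, facts, and reasoning patterns.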

At inference time:

When you send a prompt, the LLM:

  1. Converts your text into tokens (numerical representations)
  2. Processes tokens through many layers of attention and computation
  3. Generates output tokens one at a time, each conditioned on all previous tokens (this loop is sketched in code after the list)
  4. Converts tokens back to readable text
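
A toy version of the whole loop, using greedy decoding (always pick the highest-scoring token). `model` and `tokenizer` are hypothetical stand-ins: `model(ids)` returns one score per vocabulary entry for the next token, and `tokenizer` converts between text and token IDs:

    def generate(model, tokenizer, prompt, max_new_tokens=50, eos_id=0):
        ids = tokenizer.encode(prompt)            # step 1: text -> token IDs
        for _ in range(max_new_tokens):
            logits = model(ids)                   # step 2: forward pass
            next_id = max(range(len(logits)), key=lambda i: logits[i])
            ids.append(next_id)                   # step 3: extend the sequence
            if next_id == eos_id:                 # stop at end-of-sequence token
                break
        return tokenizer.decode(ids)              # step 4: token IDs -> text

Real systems usually sample from the distribution rather than always taking the top token; the temperature setting described below controls that sampling.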

Key LLM concepts

Parameters: The learned values in the neural network. More parameters generally mean more capability. GPT-4 is reported to have over 1 trillion parameters (OpenAI has not published the figure); smaller models like Llama 3 8B have 8 billion.
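
Parameter count translates directly into memory. A rough back-of-envelope calculation, assuming 2 bytes per parameter (16-bit weights):

    # Memory just to hold the weights; the KV cache, activations, and
    # any optimizer state for training come on top of this.
    params = 8e9                  # e.g. an 8-billion-parameter model
    bytes_per_param = 2           # fp16 / bf16 precision
    print(f"{params * bytes_per_param / 1e9:.0f} GB")  # -> 16 GB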

Context window: The maximum amount of text the model can process at once, measured in tokens. Ranges from about 4K tokens in older models to 200K+ (Claude) and over 1 million (Gemini 1.5).

Tokens: The units LLMs process; in English, a token is roughly 4 characters or 0.75 words. "Hello world" is 2 tokens.
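
You can check token counts yourself. A quick sketch using OpenAI's tiktoken library (pip install tiktoken); cl100k_base is the encoding used by GPT-4-era models, and other models use different vocabularies:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("Hello world")
    print(tokens)       # two token IDs, e.g. [9906, 1917]
    print(len(tokens))  # 2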

Temperature: A setting that controls randomness. Low temperature (0.0) gives deterministic outputs; high temperature (1.0+) gives more creative, varied responses.
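
Mechanically, temperature rescales the model's raw next-token scores before sampling. A minimal sketch:

    import math, random

    def sample_with_temperature(logits, temperature):
        """Pick a token index from raw next-token scores (logits)."""
        if temperature == 0:
            # Greedy: always take the highest-scoring token.
            return max(range(len(logits)), key=lambda i: logits[i])
        scaled = [x / temperature for x in logits]  # low T sharpens, high T flattens
        m = max(scaled)                             # subtract max for stability
        weights = [math.exp(x - m) for x in scaled]
        return random.choices(range(len(logits)), weights=weights, k=1)[0]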

Prompt: The input text you send to the model. Prompt engineering is the practice of optimizing prompts for better outputs.

How businesses use LLMs

Customer support: Automating responses to common questions, summarizing tickets, drafting replies for agents.

Content creation: Generating marketing copy, blog posts, product descriptions, and social media content.

Code assistance: Writing, reviewing, and debugging code. Explaining codebases to new team members.

Data analysis: Querying databases in natural language, generating reports, explaining insights.

Document processing: Extracting information from contracts, summarizing documents, translating content.

Knowledge management: Building searchable knowledge bases, answering employee questions, onboarding assistance.

Most production use cases combine LLMs with:

  • RAG (retrieval-augmented generation) for accurate, up-to-date information (a minimal sketch follows this list)
  • Fine-tuning for domain-specific behavior
  • Function calling for taking actions
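
As an illustration of the first pattern, a minimal RAG sketch. `search` and `llm` are hypothetical stand-ins: `search(query)` returns relevant passages from your own data, and `llm(prompt)` returns the model's completion:

    def answer_with_rag(question, search, llm, k=3):
        passages = search(question)[:k]        # retrieve supporting context
        context = "\n\n".join(passages)
        prompt = (
            "Answer using only the context below. If the answer is not "
            "in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return llm(prompt)                     # generate a grounded answer

Grounding the model in retrieved text directly mitigates the hallucination and knowledge-cutoff limitations described below.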

LLM limitations to understand

Hallucinations: LLMs can generate plausible-sounding but incorrect information. They don't "know" what is true; they predict what sounds right.

Knowledge cutoff: LLMs only know information from their training data. They can't access real-time information without tools.

Context limits: Although context windows keep growing, they still limit how much information the model can consider at once.

Consistency: LLMs may give different answers to the same question, and they don't have persistent memory between conversations.

Reasoning limits: While capable of impressive reasoning, LLMs can fail on problems requiring strict logical deduction or mathematical precision.

Bias: Training data biases are reflected in model outputs. Careful prompting and testing are needed to mitigate this.

Understanding these limitations is essential for building reliable AI applications.