
Monday, December 29, 2025

AI & LLM Concepts

1. Large Language Model (LLM)

  • An AI system that predicts the next word in a sentence.

  • Learns language patterns from massive amounts of text.

  • A neural network trained to predict the next word or "token" in a sequence.

Example:

  • You type: “All that glitters is”

  • The model predicts: “not gold”
    Like auto-complete on steroids.
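
To make the auto-complete idea concrete, here is a minimal sketch of a next-word predictor that only counts which word follows which. It is not how an LLM works internally (an LLM uses a neural network, not a lookup table), and the tiny corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigram(text: str) -> dict:
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts: dict, word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; a real model trains on vastly more text.
corpus = "all that glitters is not gold . all is not lost . gold is not cheap"
bigram_model = train_bigram(corpus)
print(predict_next(bigram_model, "is"))
```

The prediction is simply the most common continuation seen in training. An LLM generalizes the same objective to contexts it has never seen word for word.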


2. Tokenization

  • Breaking text into small pieces (tokens) that the model can understand.

  • Tokens can be words, parts of words, or symbols.

  • The process of breaking down input text into smaller, discrete units (tokens) so the model can process natural language effectively

Example:

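  • Sentence: “Running fast”

  • Tokens: run + ing + fast (illustrative; real tokenizers learn their splits from data, so the actual pieces may differ)
    Reusing pieces like “ing” lets the model handle many word forms with a limited vocabulary.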
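
The splitting step can be sketched with a greedy longest-match tokenizer. This is a simplified stand-in: production tokenizers learn their vocabulary from data, and the small vocabulary below is invented for the example.

```python
def tokenize_word(word, vocab):
    """Split one word into the longest vocabulary pieces, left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No vocabulary piece matched: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


def tokenize(sentence, vocab):
    """Lowercase, split on spaces, then split each word into sub-word tokens."""
    tokens = []
    for word in sentence.lower().split():
        tokens.extend(tokenize_word(word, vocab))
    return tokens


# Hypothetical vocabulary, chosen by hand for this example.
TOY_VOCAB = {"ai", "is", "learn", "ing", "fast"}
print(tokenize("AI is learning fast", TOY_VOCAB))
```

Because “learn” and “ing” are separate vocabulary entries, “learning” comes out as two tokens, and the same “ing” piece can be reused for other verbs.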


3. Vectorization

  • Converting tokens into numbers that represent meaning.

  • Similar words have similar numbers.

  • Mapping words into a multi-dimensional space (vectors) where words with similar meanings are clustered together

Example:

  • “King” and “Queen” are close together

  • “King” and “Banana” are far apart
    Like placing words on a meaning map.
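
“Close together” is usually measured with cosine similarity. The sketch below uses three hand-written 3-dimensional vectors as an assumption; real embeddings are learned and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, written by hand for illustration only.
EMBEDDINGS = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "banana": [0.10, 0.05, 0.95],
}

king_queen = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
king_banana = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["banana"])
print(f"king vs queen:  {king_queen:.3f}")
print(f"king vs banana: {king_banana:.3f}")
```

With these toy numbers, “king” and “queen” score much higher than “king” and “banana”, which is the meaning-map idea expressed as arithmetic.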


4. Attention

  • Helps AI understand context by focusing on the most relevant words, wherever they appear in the input.

  • This is the key breakthrough that made LLMs powerful.

  • A mechanism that lets the model weigh every other word in the context, not just adjacent ones, helping it distinguish between different meanings of the same word

Example:

  • “Apple is tasty” → fruit

  • “Apple revenue increased” → company
    Humans do this naturally; attention lets AI do the same.
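
The weighting step is commonly implemented as scaled dot-product attention. Below is a minimal single-query sketch; the query, key, and value vectors are hand-picked assumptions, whereas a real model learns them during training.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query over a list of keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


tokens = ["apple", "is", "tasty"]
# Hypothetical vectors: the query for "apple" is made to match the key of "tasty".
query_for_apple = [1.0, 1.0]
keys = [[0.5, 0.5], [0.0, 0.0], [2.0, 2.0]]
values = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

weights, output = attention(query_for_apple, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

Here “tasty” receives the largest weight, so the representation of “apple” is pulled toward the food sense of the word.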


5. Self-Supervised Learning

  • AI learns without humans labeling data.

  • It hides parts of text and tries to guess the missing parts.

  • A scalable training method where the model learns from the inherent structure of data (like filling in blanks) without needing human labels

Example:

  • Sentence: “The sky is ___”

  • AI learns the answer is likely “blue”
    Like solving fill-in-the-blanks automatically.
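
The reason no human labels are needed is that the text supplies its own answers. A minimal sketch, assuming simple whitespace splitting instead of a real tokenizer:

```python
def next_token_pairs(text):
    """Turn one sentence into (context, target) training pairs."""
    tokens = text.split()
    # Every prefix of the sentence becomes an input; the next word is the label.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


pairs = next_token_pairs("The sky is blue")
for context, target in pairs:
    print(context, "->", target)
```

One four-word sentence already yields three training examples, which is why raw text scales so well as training data.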


6. Transformer

  • The engine behind modern AI models.

  • Uses multiple layers of attention to understand deep meaning.

  • The neural network architecture that stacks attention layers and feed-forward networks to predict the next token

Example:

  • Understanding sarcasm

  • Knowing emotions like fear, hunger, or intent
    Like reading between the lines in a conversation.


7. Fine-Tuning

  • Teaching a general AI to behave in a specific way.

  • Makes models domain-specific.

  • Taking a base model and training it further on specific data (like medical or financial records) to make it an expert in a particular field

Example:

  • Medical AI learns medical language

  • Finance AI learns financial terms
    Same brain, different training focus.


8. Few-Shot Prompting

  • Giving examples inside the prompt to guide the AI.

Example:

Q: Where is my order?
A: Please share your order ID.

Q: Where is my parcel?

AI learns the expected response style instantly.
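
In code, few-shot prompting is just string assembly. The helper below is a hypothetical sketch; the function name and the Q/A layout are assumptions, and the resulting prompt would then be sent to whichever model you use.

```python
def build_few_shot_prompt(examples, question):
    """Place worked Q/A examples before the new question."""
    lines = []
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
        lines.append("")
    lines.append(f"Q: {question}")
    lines.append("A:")  # left open so the model completes the answer
    return "\n".join(lines)


examples = [("Where is my order?", "Please share your order ID.")]
prompt = build_few_shot_prompt(examples, "Where is my parcel?")
print(prompt)
```

The model's weights do not change here. The examples only shape the response through the prompt.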


9. Retrieval-Augmented Generation (RAG)

  • AI fetches relevant documents before answering.

  • Reduces hallucinations.

  • Enhancing an LLM by giving it access to real-time, relevant company documents from a database to provide more accurate answers

Example:

  • Customer asks about refund

  • AI pulls refund policy from company docs
    Like checking a rulebook before answering.
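
The fetch-then-answer flow can be sketched in a few lines. Two assumptions to note: the documents are invented, and simple word overlap stands in for the embedding-based search a real RAG system would normally use.

```python
import re

# Hypothetical company documents, made up for this example.
DOCS = {
    "refund_policy": "Refunds are issued within 7 days of a return request.",
    "shipping_policy": "Orders ship within 2 business days.",
}


def words(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, docs):
    """Return the document sharing the most words with the query."""
    query_words = words(query)
    best_name = max(docs, key=lambda name: len(query_words & words(docs[name])))
    return docs[best_name]


def build_rag_prompt(query, docs):
    """Put the retrieved document in front of the question."""
    context = retrieve(query, docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"


rag_prompt = build_rag_prompt("How do refunds work for a return?", DOCS)
print(rag_prompt)
```

The LLM then answers from the supplied policy text rather than from memory, which is where the reduction in hallucinations comes from.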


10. Vector Database

  • Stores documents as vectors (meaning-based).

  • Finds relevant info using semantic similarity, not keywords.

Example:

  • Query: “I’m upset with payment”

  • AI retrieves docs about refunds or complaints
    Even if “upset” isn’t written explicitly.
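
A vector database boils down to “store vectors, return the nearest ones”. The in-memory class below is a toy sketch with hand-written vectors; in a real system an embedding model produces the vectors and the database uses an index instead of scanning every item.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TinyVectorStore:
    """Hypothetical in-memory store: linear scan, no indexing."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, top_k=1):
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:top_k]]


store = TinyVectorStore()
# Assumed dimensions: (complaint-related, payment-related, shipping-related).
store.add("Refund and complaint process", [0.9, 0.8, 0.1])
store.add("Delivery time estimates", [0.1, 0.1, 0.9])
store.add("Gift card FAQ", [0.0, 0.6, 0.1])

# Assumed embedding of the query "I'm upset with payment".
results = store.search([0.8, 0.9, 0.0], top_k=1)
print(results)
```

The query never contains the word “refund”, yet the refund document ranks first because its vector points in a similar direction.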


11. Model Context Protocol (MCP)

  • Lets AI connect to external systems and tools.

  • AI can act, not just talk.

  • An open protocol, introduced by Anthropic, that standardizes how AI applications connect to external servers or tools (like a flight-booking service) to execute tasks

Example:

  • AI checks airline databases

  • Books a flight automatically
    Like a smart assistant with hands.
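
MCP messages follow JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below only builds such a request as a Python dictionary. The tool name `search_flights` and its arguments are hypothetical; a real server advertises its own tools and argument schemas.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
request = make_tool_call(
    1,
    "search_flights",
    {"origin": "DEL", "destination": "BOM", "date": "2026-01-15"},
)
print(json.dumps(request, indent=2))
```

The application hosting the model sends this message to the server, then passes the server's result back to the model as extra context.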


12. Context Engineering

  • Managing everything AI knows before responding:

    • Past chats

    • User preferences

    • Documents

    • External data

  • The practice of managing user preferences and summarizing long chat histories to keep the model's "memory" efficient

Example:

  • Summarizing old conversations

  • Remembering your preferences
    Like a human assistant with memory.
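
One simple way to keep the "memory" efficient is to keep recent messages verbatim and compress everything older. In this sketch the summarizer is a placeholder that only counts messages; in practice that step would usually be another LLM call, which is an assumption rather than a fixed rule.

```python
def build_context(messages, window=3, summarize=None):
    """Keep the last `window` messages as-is and summarize the rest."""
    if summarize is None:
        # Placeholder summarizer; a real one would condense the actual content.
        summarize = lambda old: f"[summary of {len(old)} earlier messages]"
    recent = messages[-window:]
    older = messages[:-window]
    context = []
    if older:
        context.append(summarize(older))
    context.extend(recent)
    return context


chat = ["m1", "m2", "m3", "m4", "m5"]
context = build_context(chat, window=2)
print(context)
```

The model sees one short summary line plus the latest messages instead of the whole history, which keeps the prompt small.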


13. Agents

  • Long-running AI systems that plan and act autonomously.

  • Can use tools, APIs, and other agents.

  • Long-running processes that can query LLMs and external systems independently to complete a user's goal

Example:

  • Travel agent AI books flights, hotels, and sends emails
    Like a personal secretary that never sleeps.
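
At its simplest, an agent is a loop that observes, decides, and acts. The sketch below assumes a hypothetical list of prices in place of a live airline API and returns a message instead of really booking anything.

```python
def run_price_agent(price_feed, max_price):
    """Check prices one by one and 'book' as soon as one fits the budget."""
    for check_number, price in enumerate(price_feed, start=1):
        if price <= max_price:
            # A real agent would call a booking tool here.
            return f"booked at {price} after {check_number} checks"
    return "no booking made"


# Hypothetical prices observed over time.
observed_prices = [520, 505, 480, 450]
outcome = run_price_agent(observed_prices, max_price=485)
print(outcome)
```

A production agent would also ask an LLM to plan each step and would run for days, but the observe-decide-act shape stays the same.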


14. Reinforcement Learning from Human Feedback (RLHF)

  • AI improves by learning which answers humans prefer.

  • Good answers are rewarded; bad ones are penalized.

  • Training models by having humans rank responses, rewarding "good" paths and penalizing "bad" ones to improve the user experience

Example:

  • ChatGPT asks: Which answer is better?

  • Your choice trains the model
    Like training a dog using rewards.
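
Human rankings are often turned into a training signal with a pairwise preference loss: a reward model scores both answers, and the loss is small when the preferred answer scores higher. The sketch below shows only that loss; the reward values are made-up numbers, not the output of a real reward model.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Negative log-sigmoid of the reward gap between chosen and rejected answers."""
    gap = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-gap)))


# Hypothetical reward scores for the answer the human picked vs. the other one.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(f"reward model agrees with the human:    {agrees_with_human:.3f}")
print(f"reward model disagrees with the human: {disagrees_with_human:.3f}")
```

Minimizing this loss teaches the reward model to score answers the way people rank them, and that reward model is then used to reward or penalize the LLM's outputs.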


15. Chain of Thought

  • AI reasons step by step instead of guessing.

  • Improves accuracy for complex problems.

  • Prompting or training a model to work through complex problems step by step, which often improves the quality of its answers

Example:

  • Solving math step-by-step

  • Explaining reasoning clearly
    Like showing your work in exams.


16. Reasoning Models

  • Advanced models that focus on logical problem-solving.

  • Can plan, infer, and analyze deeply.

Example:

  • Debugging code

  • Solving puzzles

  • Making strategic decisions


17. Multimodal Models

  • AI that works with text, images, audio, and video.

  • AI that can process and generate more than just text, including images and video

Example:

  • Counting objects in an image

  • Generating images or videos from text
    Like human senses combined.


18. Small Language Models (SLMs)

  • Smaller, cheaper, faster models for specific tasks.

  • Used by companies for privacy and control.

  • Models with far fewer parameters than LLMs (typically millions to a few billion, versus tens or hundreds of billions), making them cheaper and faster for specific company tasks

Example:

  • Customer support chatbot

  • Sales assistant
    Not smart at everything, but great at one thing.


19. Distillation

  • Teaching a small model using a large model as a teacher.

  • Keeps performance but reduces cost.

  • The process of creating a smaller "student" model that mimics a larger "teacher" model to reduce costs while maintaining performance

Example:

  • Senior employee trains a junior
    Junior does the job faster and cheaper.
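
A common way to measure how well the student mimics the teacher is the KL divergence between their softened output distributions. The sketch below assumes made-up logits over a three-token vocabulary and a temperature of 2.0.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """How far distribution q is from distribution p (0 means identical)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return kl_divergence(teacher, student)


# Hypothetical logits for the same input.
teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 4.0, 1.0]

close_loss = distillation_loss(teacher_logits, close_student)
far_loss = distillation_loss(teacher_logits, far_student)
print(f"close student: {close_loss:.4f}")
print(f"far student:   {far_loss:.4f}")
```

Training updates the student's weights to push this loss toward zero, so the student learns the teacher's full probability distribution rather than only its top answer.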


20. Quantization

  • Reducing number precision to save memory and cost.

  • Usually applied after training, when preparing the model for deployment.

  • Reducing the memory size of a model's internal weights (e.g., from 32-bit to 8-bit) to make it cheaper to run in production

Example:

  • Compressing a video without noticeable quality loss
    AI runs faster and cheaper.
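
Here is a minimal sketch of symmetric 8-bit quantization, one of several possible schemes. The weights are made-up numbers, and real libraries typically quantize per channel or per block rather than over one flat list.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical 32-bit weights.
weights = [0.5, -1.27, 0.03, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Going from 32 bits to 8 bits per weight.
memory_saving = 1 - 8 / 32

print(quantized)
print(f"max rounding error: {max_error:.5f}")
print(f"memory saved: {memory_saving:.0%}")
```

Each weight now takes a quarter of the space, at the cost of a rounding error of at most half the scale per weight.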



Core Concepts

Large Language Model (LLM): If you give the model the phrase "all that glitters," it predicts the next sequence is "is not gold".

Tokenization: Instead of just breaking words by spaces, a model recognizes suffixes like "ing" in "eating" or "dancing" to understand that an action is being performed.

Vectorization: Words with similar meanings are placed close together in a coordinate space. For instance, "upset" would be mathematically closer to "low rating" than to "happy".

Attention: This helps the model know that "Apple" in the sentence "tasty apple" refers to a fruit, while "Apple" in "Apple's revenue" refers to the company.

Transformer: The "engine" of the AI "car". It uses layers of attention to find complex relationships like sarcasm or the implication that a "crab is fearful" when being hunted by a "crane".



Training & Learning

Self-Supervised Learning: Works much like a human guessing the hidden number in a sequence (5, 4, 3, 2, __) or predicting where someone is looking in a video when part of the frame is blanked out.

Fine-tuning: Taking a general base model and training it on medical jargon so it can assist doctors with patient diagnoses.

Reinforcement Learning from Human Feedback (RLHF): Similar to Pavlov’s dog, where behaviors are reinforced with rewards. In AI, if a user chooses "Response 1" over "Response 2," the path taken to create "Response 1" gets a "plus one" score.

Chain of Thought: Instead of giving a direct answer, the model is trained to reason step-by-step. For example, if a math problem is harder, a model like DeepSeek will take more steps to "think" through it.


Engineering & Implementation

Few-shot Prompting: When a user asks "Where is my parcel?", the system provides the AI with several examples of previous parcel queries and correct responses before the AI answers.

Retrieval Augmented Generation (RAG): A server fetches a company’s specific policy documents or "terms and conditions" in real-time and hands them to the LLM to ensure the answer is accurate to that company.

Model Context Protocol (MCP): An LLM application acts as a client that connects to external servers, such as an Air India or Indigo database, to check real-time flight details and actually book a ticket for you.

Context Engineering: Using a "sliding window" to remember the last 100 chats perfectly while summarizing the previous 1,000 chats into just five sentences to save space.

Agents: A long-running travel agent process that monitors flight prices and automatically books a trip when it sees a "window of opportunity" based on your preferences.


Optimization & Specialization

Multi-modal Models: A model that doesn't just read "cat" but can also see an image of a cat and count how many apples are in a photo.

Small Language Models (SLM): For example, a specialized bot at an organization like NASA that is an expert at weather analysis but cannot handle general sales queries.

Distillation: A "teacher" (LLM) produces an output, and a "student" (SLM) tries to mimic it. If the student fails, its weights are updated until it can match the teacher's quality with fewer resources.

Quantization: Condensing a 32-bit number into an 8-bit number to save 75% of memory, making the model much faster and cheaper to run in production.



Below is a visual, diagram-based explanation of how an LLM is trained.


Overall Training Pipeline (Big Picture)

Raw Text
   ↓
Tokenization
   ↓
Vectors (Numbers)
   ↓
Transformer (Attention + Neural Networks)
   ↓
Next-Token Prediction
   ↓
Loss Calculation
   ↓
Backpropagation (Learning)
   ↓
Trained LLM

Think of this as a factory line that turns text into intelligence.


Tokenization Diagram (Breaking Text)

Input sentence:

"AI is learning fast"

Tokenization:

AI | is | learn | ing | fast

Diagram:

Sentence
   ↓
[ AI ] [ is ] [ learn ] [ ing ] [ fast ]

Tokens are the smallest units of text the model works with.


Vectorization Diagram (Meaning → Numbers)

Each token becomes a vector (numbers):

AI      → [0.21, -1.34, 0.88, ...]
learn   → [0.25, -1.30, 0.91, ...]
fast    → [0.90,  0.10, 0.77, ...]

Conceptual Space:

Meaning Space (Vectors)

   cat ●────● dog
        \
         \
          ● AI
             \
              ● learning

Similar meanings are close together.


Transformer + Attention Diagram (Core Brain)

Without Attention (old models)

Word → Word → Word → Word

With Attention (Transformers)

           ┌──────────┐
Token 1 ───► Attention│
Token 2 ───► Attention│───► Contextual Meaning
Token 3 ───► Attention│
           └──────────┘

Attention Example:

"I ate an apple"

apple ──► fruit

"Apple released a phone"

apple ──► company

Attention decides what matters.


Next-Token Prediction Diagram

Training objective:

Input:  "The sky is"
Target: "blue"

Prediction flow:

[The] [sky] [is]
       ↓
Transformer
       ↓
Predicted next token → "blue"

LLMs learn by guessing the next word.
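
Under the hood, the model does not output a single word. It outputs a score (logit) for every token in its vocabulary, and those scores are turned into probabilities. A minimal sketch with a made-up three-word vocabulary and made-up scores:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores for the context "The sky is".
vocab = ["blue", "green", "falling"]
logits = [2.0, 1.0, 0.1]

probabilities = softmax(logits)
predicted = vocab[probabilities.index(max(probabilities))]

for token, probability in zip(vocab, probabilities):
    print(f"{token}: {probability:.2f}")
print("predicted:", predicted)
```

Picking the highest-probability token is the simplest choice. Many systems instead sample from the distribution to get more varied text.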


Loss Calculation Diagram (Error Measurement)

Predicted: "green"
Actual:    "blue"
Difference = Loss

Diagram:

Prediction ────┐
               ├──► Loss Function ───► Error Value
Actual     ────┘

Bigger mistake = bigger loss.
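
For next-token prediction the loss is typically cross-entropy: the negative log of the probability the model gave to the correct token. The probabilities below are made-up values for the two-word vocabulary ["blue", "green"].

```python
import math


def cross_entropy(probabilities, target_index):
    """Loss is low when the correct token received a high probability."""
    return -math.log(probabilities[target_index])


target = 0  # index of the correct token, "blue"

confident_and_right = cross_entropy([0.9, 0.1], target)
confident_and_wrong = cross_entropy([0.1, 0.9], target)

print(f"model favoured 'blue':  loss = {confident_and_right:.3f}")
print(f"model favoured 'green': loss = {confident_and_wrong:.3f}")
```

Favouring “green” when the answer is “blue” produces a much larger loss, which is the signal the next training step uses.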


Backpropagation Diagram (Learning Happens Here)

High Loss
   ↓
Adjust Weights
   ↓
Lower Loss Next Time

Flow:

Output Error
   ↓
Gradients Flow Backward Through Transformer Layers
   ↓
Neural Network Weights Updated

Like correcting mistakes after an exam.
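
A single learning step can be sketched on the last stage of the network. For softmax followed by cross-entropy, the gradient with respect to each logit is the predicted probability minus 1 for the correct token (minus 0 for the others). The sketch treats the logits themselves as the adjustable parameters, which is a simplification: real backpropagation pushes this gradient further back through every layer's weights.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_gradient(logits, target_index):
    """Cross-entropy loss and its gradient with respect to the logits."""
    probabilities = softmax(logits)
    loss = -math.log(probabilities[target_index])
    gradient = [
        p - (1.0 if i == target_index else 0.0)
        for i, p in enumerate(probabilities)
    ]
    return loss, gradient


# Hypothetical starting scores for ["blue", "green"]; the correct token is "blue".
logits = [1.0, 2.0]
target = 0
learning_rate = 1.0

loss_before, gradient = loss_and_gradient(logits, target)
# Gradient descent: move each parameter against its gradient.
updated_logits = [x - learning_rate * g for x, g in zip(logits, gradient)]
loss_after, _ = loss_and_gradient(updated_logits, target)

print(f"loss before: {loss_before:.3f}")
print(f"loss after:  {loss_after:.3f}")
```

After one update the score for “blue” rises, the score for “green” falls, and the loss drops.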


Self-Supervised Learning Diagram

No humans label data.

Original text:
"All that glitters is not gold"

Training samples:
Input:  "All that glitters is"
Target: "not"

Diagram:

Existing Text
     ↓
Mask / Shift
     ↓
Predict Missing Token

Text teaches itself.


Full Training Loop Diagram

Text Batch
   ↓
Tokenize
   ↓
Vectors
   ↓
Transformer
   ↓
Prediction
   ↓
Loss
   ↓
Backpropagation
   ↓
Repeat (Over Trillions of Tokens)

This loop runs on GPUs/TPUs for weeks or months.


Fine-Tuning & RLHF Diagram (Behavior Training)

Fine-Tuning

Question → Model → Answer
                   ↓
             Is this correct?
                   ↓
            Update Weights

RLHF

Answer A     Answer B
   ↓            ↓
Human picks better one
         ↓
     Reward / Penalty

This makes the model more helpful and safer.


Final Mental Model (One Diagram)

Text
 ↓
Tokens
 ↓
Vectors
 ↓
Attention (Context)
 ↓
Transformer Layers
 ↓
Next Word Prediction
 ↓
Human Feedback
 ↓
Aligned LLM

