8. How AI Works
Note: The video covers material not in the guide below — please watch in full.
Action Step
Complete this before moving on.
Ask Claude to explain what LLMs are, then ask it to simplify into one paragraph. Save that learning to a file in your repo using the copy path workflow. Then give Claude the exact same prompt twice and compare the two responses — notice what changed and what stayed the same. If you love an output, save it right away — you won't get the same thing again.
Training Guide
You've been using this thing for a few trainings now. You've configured it, you've watched it create documents, you've seen it rewrite a brief in your brand voice. It works.
But what IS it? What's actually happening when you type a message and get a response back?
You don't need to know this to use it. But knowing it makes you better at it — and it makes the next two trainings on tokens and sub-agents make a lot more sense.
(Let's start with what you're actually talking to)
What an LLM Is
When people say "AI" right now, they mostly mean one specific thing: a Large Language Model, or LLM. That's what Claude is. That's what ChatGPT is. That's what Gemini is. They're all LLMs.
Here's what an LLM actually does: it predicts the next word.
That's it. You give it text, and it predicts what should come next — word by word, one at a time, until it's built a full response. It's doing this thousands of times per second.
You already use something like this every day. When you text someone on your phone and it suggests the next word — that's the same concept. Your phone is predicting what you're about to type based on patterns in how people write.
An LLM is that, but trained on basically the entire internet. Books, articles, code, conversations, research papers — hundreds of billions of words. So instead of predicting "sounds good" after "that," it can predict entire paragraphs of coherent, contextual, useful text.
It's not thinking the way you think. It's pattern-matching at a scale that's hard to comprehend. And that pattern-matching is so good that it feels like intelligence — but it's a fundamentally different process than what's happening in your head.
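If you're curious what "predicting the next word from patterns" looks like mechanically, here's a toy sketch. It learns which word tends to follow which from a tiny made-up corpus, then generates text one word at a time. This is a deliberately simplified illustration — real LLMs use neural networks trained on billions of documents, not word-pair counts — but the loop is the same idea: look at what came before, predict what comes next, repeat.

```python
import random
from collections import defaultdict

# Tiny made-up corpus (hypothetical -- for illustration only)
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Learn the pattern: which words follow which?
followers = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current].append(nxt)

def predict_next(word):
    """Pick a plausible next word based on observed patterns."""
    options = followers.get(word)
    return random.choice(options) if options else "."

# Generate a "response" one word at a time, just like an LLM does
word = "the"
output = [word]
for _ in range(6):
    word = predict_next(word)
    output.append(word)

print(" ".join(output))
```

Run it a few times and you'll get different sentences from the same starting word — which previews the next section.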
(So if it's just predicting patterns, where do the models come from?)
The Model Landscape
Every major tech company is building their own LLMs. You already know the big three:
- Claude (made by Anthropic) — what we use. Best for knowledge work, writing, and conversation.
- GPT (made by OpenAI) — powers ChatGPT. Strong at coding and general tasks.
- Gemini (made by Google) — their latest models. Strong on reasoning benchmarks.
You picked Opus in the settings training. That's Anthropic's most powerful Claude model. Each company has tiers — powerful models for heavy work, lighter models for quick tasks. Same idea as car engines: there's a V8 and a 4-cylinder, and you pick based on the job.
But here's something most people don't know: not all models are behind a paywall.
There are open source models — LLMs that are free, publicly available, and anyone can run them. Meta has Llama. Mistral (one of our clients, actually) has their own models. There are dozens of others.
Open source models are generally less powerful than the big proprietary ones right now, but the gap is closing fast. They matter because they mean AI isn't locked inside three companies. The technology is spreading, and the ecosystem is much bigger than Claude vs ChatGPT vs Gemini.
You don't need to use open source models at LeanScale right now. But when someone mentions Llama or Mistral in a conversation, now you know what they're talking about.
(OK — so you know what an LLM is and where the models come from. Here's something important about how they behave)
Why the Same Prompt Gives Different Answers
Try this sometime: give Claude the exact same prompt twice. You'll get two different responses.
This isn't a bug. It's by design.
There's randomness built into how LLMs generate text. When the model is predicting the next word, it doesn't always pick the single highest-probability word — it samples from the top options. This is what makes the output feel natural and varied instead of robotic and repetitive.
What this means practically:
- Don't like the output? Ask again. You might get something better.
- Got something you love? Save it. You might not get the exact same thing twice.
- Run the same task three times and you'll get three slightly different versions. Pick the best one, or combine parts from each.
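The "samples from the top options" behavior can be sketched in a few lines. Assume the model has assigned a probability to each candidate next word (the numbers below are invented for illustration). Always taking the single highest-probability word is robotic and identical every run; sampling in proportion to probability is what makes output varied — and what makes the same prompt give different answers.

```python
import random

# Invented probabilities for candidate next words (illustration only)
next_word_probs = {
    "great": 0.40,
    "good": 0.30,
    "fine": 0.20,
    "okay": 0.10,
}

def greedy_pick(probs):
    """Always take the most likely word -- same answer every time."""
    return max(probs, key=probs.get)

def sample_pick(probs):
    """Sample in proportion to probability -- varies run to run."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

print([greedy_pick(next_word_probs) for _ in range(3)])   # identical picks
print([sample_pick(next_word_probs) for _ in range(3)])   # varied picks
```

The second print changes between runs — that built-in randomness is exactly why you should save an output you love the moment you get it.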
There's a flip side to this: hallucinations. Because the model is predicting based on patterns — not looking things up — it can confidently state something that's completely wrong. It doesn't "know" things the way you know things. It generates text that looks right based on patterns it's seen.
For factual claims — dates, numbers, names, stats — always verify. Especially when it matters. The AI is an incredible first draft machine, but it's not a source of truth.
(Now here's the part that connects everything you've been doing to what comes next)
Why Context Is Everything
The AI doesn't know you. It doesn't know LeanScale. It doesn't know your client. It doesn't know what you worked on yesterday. Every conversation starts from zero.
The only thing it knows is what you give it in the conversation.
This is why file system access — what you learned in the last training — matters so much. When you pointed the AI at the NovaPay transcript, the research template, and the brand tone guide, each one of those files was context. Each one made the output better. Without the transcript, it couldn't extract the client details. Without the template, it wouldn't know the format. Without the brand guide, it wouldn't write like LeanScale.
Context is the single biggest lever you have. The same AI with bad context gives you generic garbage. The same AI with great context gives you something that feels like it was written by someone who's been at LeanScale for a year.
But here's the thing: context has a limit. There's a budget. Every file you attach, every message you send, every response the AI generates — it all takes up space in something called the context window. That budget is measured in tokens — and when you run out, the AI starts forgetting.
Managing that budget — knowing how much room you have, when to save your work, when to start fresh — is one of the most important skills you'll learn. (That's what the next training is all about)
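The budget idea can be made concrete with a rough sketch. A common rule of thumb is roughly 4 characters of English text per token — not an exact count, and real tokenizers vary — and the window size below is illustrative, not Claude's actual limit. The point is just that every message and file spends from one shared budget:

```python
# Illustrative window size -- not the actual limit of any real model
CONTEXT_WINDOW = 200_000

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token of English text."""
    return max(1, len(text) // 4)

class Conversation:
    """Tracks how much of the context budget a conversation has spent."""
    def __init__(self):
        self.used = 0

    def add(self, text: str):
        self.used += estimate_tokens(text)

    def remaining(self) -> int:
        return CONTEXT_WINDOW - self.used

convo = Conversation()
convo.add("Please rewrite this brief in our brand voice." * 10)   # your message
convo.add("Full client transcript goes here..." * 2000)           # attached file
print(f"~{convo.used:,} tokens used, ~{convo.remaining():,} remaining")
```

Every attached file and every back-and-forth message adds to `used` — and once `remaining` hits zero, the oldest context starts falling out of the window.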
And there's actually a way around the limit. The AI can spin up copies of itself — each copy gets its own fresh budget, works independently, and reports back. (Those are called sub-agents, and they get their own training too)
Comment in Slack
Post your answer in your onboarding channel.
What was your biggest takeaway(s) from this training?