9. AI Tokens - Explained
Note: The video covers material not in the guide below — please watch in full.
Action Step
Complete this before moving on.
Pause the video and try this yourself. Point Claude at a large folder (like LeanScale Context) and tell it to read every file in there. Watch the token counter at the bottom climb past 50%. When the context warning shows up, click it to manually compact. Read the summary it creates — notice what survived and what got lost. That's the telephone game in action.
Training Guide
You just learned that context is everything — the more relevant information you give the AI, the better it performs. You also learned that context has a limit.
This training is about that limit — what it is, how it works, and what happens when you hit it.
(Let's start with how the AI actually sees your messages)
What Tokens Are
When you type a message, the AI doesn't read words the way you do. It breaks everything down into chunks called tokens. One token is roughly 3-4 characters. So the word "LeanScale" is about 3 tokens, a full sentence is maybe 15-20, and a page of text is around 500.
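That 3-4 characters rule can be sketched as a quick estimator. This is a rough heuristic, not a real tokenizer — `estimate_tokens` is a made-up helper, and actual tokenizers split on subwords, so real counts will differ:

```python
# Rough token estimator using the ~4-characters-per-token rule of thumb.
# Real tokenizers split text into subwords, so actual counts will differ.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate token count from character length."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("LeanScale"))   # 9 characters, about 2-3 tokens
print(estimate_tokens("A full sentence costs a few dozen characters."))
print(estimate_tokens("x" * 2000))    # a ~2,000-character page, ~500 tokens
```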
Every single thing in your conversation costs tokens:
- Your prompts
- The AI's responses
- Every file you attach
- The full conversation history — every message you've sent and every response you've received, all the way back to the start
That last one is the one people miss. It's not just the current message that costs tokens — it's EVERYTHING in the conversation. The AI re-reads the entire history every time it responds. So as your conversation gets longer, each response costs more.
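To see why each response costs more as the conversation grows, here's an illustrative sketch — the per-turn token numbers are made up:

```python
# Each turn, the model re-reads the ENTIRE conversation history, so the
# cost of responding grows as the conversation does. Numbers are illustrative.

turn_tokens = [300, 450, 280, 500, 350]  # new tokens added by each exchange

history = 0
for turn, new in enumerate(turn_tokens, start=1):
    history += new
    print(f"Turn {turn}: +{new} new tokens, model re-reads {history} total")
```

Five short exchanges and the model is already re-reading nearly 2,000 tokens on every response.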
(OK — so how many tokens do you actually get?)
Model Makers and Context Windows
Every AI model comes with a context window — the maximum number of tokens it can hold in one conversation. Think of it as working memory. It's how much the AI can keep in its head at once.
Different models have different-sized windows:
- Claude (what we use) — 200,000 tokens. That's roughly 500 pages of text.
- GPT — varies by model, but the latest ones are around 128,000 tokens
- Gemini — Google claims up to 1 million tokens on their largest model
These numbers change as companies release new versions — the trend is always bigger. But bigger isn't always better. Some models with massive context windows get worse at recalling specific details buried in the middle. More room doesn't automatically mean more accurate — it just means more room. How well the AI actually uses that room matters just as much as the size of it.
200,000 tokens sounds massive — and it is. You can do serious work before hitting the wall. But if you're attaching long documents, having detailed back-and-forth, and running through multiple tasks, it adds up faster than you'd think. A 50-page transcript alone can eat 15,000 tokens. Do that a few times, plus the AI's responses stacking up — and suddenly you're halfway through your budget.
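Here's a back-of-the-envelope version of that math, using the rough estimates above — all of these numbers are illustrative:

```python
# How fast a 200,000-token window fills up. Illustrative numbers only.

CONTEXT_WINDOW = 200_000

used = 0
used += 3 * 15_000   # three ~50-page transcripts at ~15,000 tokens each
used += 40 * 1_000   # forty back-and-forth exchanges (prompt + response)
used += 10_000       # assorted attached files

print(f"{used:,} of {CONTEXT_WINDOW:,} tokens used ({used / CONTEXT_WINDOW:.0%})")
```

A few transcripts and a long working session, and you're already near the halfway mark.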
(So what happens when you actually hit the wall?)
The Telephone Game
Remember the telephone game as a kid? You whisper a message down a line of people, and by the end it's completely different from how it started.
That's what happens to your AI conversation when it runs out of tokens.
When you hit the limit, the AI does something called compacting. It takes the entire conversation — every message, every instruction, every nuance — and summarizes it. That summary becomes the new "memory." Everything else is gone.
And here's the problem: if you keep working and hit the limit again, it summarizes the summary. Now you're two rounds of telephone deep. The big picture survives, but the details? The specific instructions? The exact tone you spent 20 minutes getting right? Those start to disappear.
This is why people get frustrated with AI on long projects. It's not that the AI got dumber. It's that the context got compressed, and the nuance got lost.
(Let's actually see this happen)
Context Window Demo
Let's make this real. Look at the bottom of your Claude Code panel — you'll see a token counter. It shows how much of your 200,000-token budget you've used in this conversation.
Now think about what we've done in these trainings. Every message you've sent, every response you've read, every file you've attached — all of it is sitting in that counter right now.
Watch what happens when you compact. Type /compact in Claude Code. The AI will replace the entire conversation with a short summary, and you'll see the token counter drop. That's the tradeoff: you got space back, but everything that was in the conversation is now condensed into a paragraph or two.
Read the summary it created. Notice what survived and what didn't. The big themes are there. The specific wording of a prompt you gave three messages ago? Probably not. That's the telephone game in action.
(Now here's something that trips people up — there are actually two different token limits)
Session Context Window vs Account Usage
There are two different limits, and they measure completely different things.
Your context window is 200,000 tokens per conversation. That's the AI's working memory — how much it can hold in its head at once. When this fills up, it compacts. This is what we just covered.
Your account usage limit is how much you can use per month on your subscription plan. That's across ALL your conversations — every session, every day. When this fills up, you get rate-limited until it resets or you upgrade.
Different limits, different consequences. Context window filling up = the AI forgets details in that conversation. Account limit filling up = you can't send messages at all.
Don't mix them up. When someone says "I ran out of tokens," ask which kind — because the fix is completely different. Context window? Save your work and start a new conversation. Account limit? Wait for the reset, or upgrade your plan.
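A toy way to picture the two separate limits. Both thresholds below are placeholders — the monthly number in particular is made up, since real account limits vary by plan:

```python
# Two independent limits with different consequences. Illustrative values only.

CONTEXT_WINDOW = 200_000    # per conversation: filling it triggers compaction
MONTHLY_BUDGET = 5_000_000  # per account: filling it triggers rate limits (made up)

def which_limit(conversation_tokens: int, monthly_tokens: int) -> str:
    """Report which limit, if any, you've hit."""
    if monthly_tokens >= MONTHLY_BUDGET:
        return "account limit: wait for the reset or upgrade"
    if conversation_tokens >= CONTEXT_WINDOW:
        return "context window: compact or start a new conversation"
    return "ok"

print(which_limit(150_000, 1_000_000))  # under both limits
print(which_limit(200_000, 1_000_000))  # context full, account fine
```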
You can check your account usage in your Claude settings — it's worth glancing at occasionally so you're not surprised mid-project.
(One more thing worth understanding — how all of this connects to money)
Token Pricing and Billing
Tokens aren't just a technical concept — they're also how AI companies charge for usage.
There are two ways to pay for AI:
Subscription (what you have). You pay a flat monthly fee — Claude Pro, ChatGPT Plus, etc. That gives you a certain amount of usage per month. If you use a lot, you might hit a rate limit and have to wait before sending more messages. You can upgrade to a higher tier for more usage.
API pricing. Instead of a subscription, you pay per token — a tiny fraction of a cent per token in, a slightly higher fraction per token out. No monthly fee, just metered usage. This is how companies build AI into their products. You won't use the API directly, but it's worth knowing it exists — when someone says "API costs," this is what they mean.
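A sketch of how that metered pricing works. The rates below are placeholders, not real prices — check the provider's pricing page for current numbers:

```python
# Metered API pricing: pay per token, with output tokens costing more
# than input tokens. PLACEHOLDER rates, not real prices.

INPUT_PER_MILLION = 3.00    # dollars per 1M input tokens (placeholder)
OUTPUT_PER_MILLION = 15.00  # dollars per 1M output tokens (placeholder)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the placeholder rates."""
    return (input_tokens * INPUT_PER_MILLION
            + output_tokens * OUTPUT_PER_MILLION) / 1_000_000

# A 10,000-token prompt with a 1,000-token response:
print(f"${request_cost(10_000, 1_000):.4f}")  # $0.0450
```

Fractions of a cent per request sounds cheap — but a product sending thousands of requests a day is why API teams watch token counts closely.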
For us, the subscription is the better deal. Flat cost, predictable, and you don't have to think about individual token counts. The API model is for teams building software that sends thousands of requests a day.
(So now you understand what tokens are, how the context window works, what happens when it fills up, and how billing works. The next training is about what to DO about all of this — how to manage your context so you never lose your work)
Comment in Slack
Post your answer in your onboarding channel.
What was your biggest takeaway(s) from this training?