Tuesday, March 17, 2026

Tokens: concrete examples

What Are Tokens?

What do tokens look like in practice?

Tokens are the chunks of text a model processes. The exact splitting depends on the tokenizer, but here are typical examples using OpenAI’s cl100k_base tokenizer (used by GPT‑4). Each highlighted segment represents one token.

Common words and short phrases

The cat sat on the mat.

This sentence uses seven tokens: “The”, “ cat”, “ sat”, “ on”, “ the”, “ mat”, and “.”. Notice that the space before “cat” is attached to the word token; most tokenizers keep a space with the word that follows it. The period at the end is its own token.
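The way spaces stick to the following word can be sketched with a regular expression. This is only a rough illustration: the pattern and the name rough_pieces are assumptions made for this example, and the real cl100k_base tokenizer uses a more elaborate pattern followed by learned merges.

```python
import re

# Simplified stand-in for a pre-tokenization rule (an assumption, not the real
# cl100k_base pattern): a word keeps one leading space, and any other
# non-space character becomes its own piece.
PIECE_PATTERN = re.compile(r" ?[A-Za-z]+|[^\sA-Za-z]")

def rough_pieces(text):
    """Split text into pieces, keeping a leading space attached to each word."""
    return PIECE_PATTERN.findall(text)

pieces = rough_pieces("The cat sat on the mat.")
print(pieces)
print(len(pieces))
```

For real token counts, the tiktoken library can be used instead: tiktoken.get_encoding("cl100k_base").encode(text) returns the actual token IDs.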

Complex or compound words

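unbelievable becomes three tokens (for example un, believ, and able; the exact pieces depend on the vocabulary) because “unbelievable” is rare enough that the tokenizer breaks it into more common subwords. Similarly, tokenization splits into two: token and ization.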
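To make subword splitting concrete, here is a minimal sketch that uses greedy longest-match against a tiny vocabulary. Both TOY_VOCAB and split_into_subwords are hypothetical, and greedy matching is a simplification: real byte-pair-encoding tokenizers apply learned merge rules, so their splits can differ.

```python
# Toy vocabulary, invented for this example (not taken from any real tokenizer).
TOY_VOCAB = {"un", "believ", "able", "token", "ization"}

def split_into_subwords(word, vocab):
    """Greedy longest-match segmentation of a word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink until one is in the vocabulary.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No piece matched: emit one character so the loop always advances.
            pieces.append(word[start])
            start += 1
    return pieces

print(split_into_subwords("unbelievable", TOY_VOCAB))
print(split_into_subwords("tokenization", TOY_VOCAB))
```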

Numbers and punctuation

In cl100k_base, digits are grouped in runs of at most three, so 2025 becomes two tokens (202 and 5) and 3.14159 becomes four (3, ., 141, and 59). Other tokenizers may keep a frequent number such as 2025 as a single token, depending on how they were trained. Punctuation marks such as ?, !, and , usually get their own tokens.
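The digit grouping can be imitated with a short regular expression. This is a sketch that assumes a rule of "at most three digits per piece", which mirrors the digit part of the cl100k_base splitting pattern; the name split_number is made up for this example, and other tokenizers group digits differently.

```python
import re

# Assumed rule: runs of digits are cut into chunks of at most three,
# and every non-digit character is a piece of its own.
NUMBER_PATTERN = re.compile(r"\d{1,3}|\D")

def split_number(text):
    """Split a numeric string into chunks of up to three digits plus punctuation."""
    return NUMBER_PATTERN.findall(text)

print(split_number("2025"))
print(split_number("3.14159"))
```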

Whitespace and special characters

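Multiple spaces or newlines are not discarded: runs of whitespace are usually encoded as dedicated whitespace tokens. A line break might be its own \n token or be merged into a neighbouring token. Emojis are encoded from their UTF-8 bytes, so an emoji such as 🚀 or 🔥 may be a single token or several, depending on the vocabulary.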
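Byte-level tokenizers such as cl100k_base start from the UTF-8 bytes of the text, so it helps to see how many bytes each character occupies before any merging happens. The sketch below only counts bytes; how many tokens those bytes end up as depends on the learned vocabulary. The helper name utf8_byte_count is made up for this example.

```python
def utf8_byte_count(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))

# A letter, a newline, four spaces, and two emojis.
for sample in ["a", "\n", "    ", "🚀", "🔥"]:
    print(repr(sample), utf8_byte_count(sample))
```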

A longer sentence

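Tokenization is how neural networks see text.

This example contains nine tokens. With cl100k_base, the first word carries no leading space because nothing precedes it at the start of the text; the same word in the middle of a sentence would carry a leading space and could be split differently. Some other tokenizers (SentencePiece-based ones, for example) do add a space at the start of the text.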

Why this matters

Counting tokens explains why a long word like “electroencephalographic” might use five or six tokens while a short word like “a” uses one. It also affects billing: an API call that sends 1,000 tokens and receives 500 tokens back is billed for 1,500 tokens in total, with input and output tokens often priced at different rates. Models also have token limits: if a conversation exceeds the context window (128,000 tokens for some models), the request fails unless the application drops or summarizes the oldest messages.
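The billing arithmetic and the token limit can be sketched in a few lines. The function names and the per-1,000-token prices below are hypothetical placeholders, not real rates, and dropping the oldest messages is just one strategy an application might choose.

```python
def billed_tokens(input_tokens, output_tokens):
    """Total tokens counted for one API call."""
    return input_tokens + output_tokens

def estimate_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Cost of a call when input and output tokens are priced separately."""
    return (input_tokens / 1000) * input_price_per_1k + (output_tokens / 1000) * output_price_per_1k

def drop_oldest(message_token_counts, limit):
    """Remove messages from the front (oldest first) until the total fits the limit."""
    kept = list(message_token_counts)
    while kept and sum(kept) > limit:
        kept.pop(0)
    return kept

print(billed_tokens(1000, 500))
# Placeholder prices, chosen only to show the calculation.
print(estimate_cost(1000, 500, input_price_per_1k=0.01, output_price_per_1k=0.03))
# Three messages of 60,000, 50,000, and 40,000 tokens against a 128,000-token limit.
print(drop_oldest([60000, 50000, 40000], limit=128000))
```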


Tokens are the bridge between human language and mathematical vectors. Every word you read from an AI was once a sequence of token IDs, generated one token at a time.
