LLM Settings Explained: Temperature, Top-P, and Max Tokens
Control AI output using temperature, top-p, and max tokens. Practical LLM settings guide for beginners who want predictable, consistent results.
Imagine a friendly DJ who is building a playlist for your party. You tell the DJ a theme, and the DJ then chooses the next song again and again. If the DJ plays only safe hits, the music feels steady but maybe a little dull. If the DJ experiments too much, the party might drift. Great DJs use a few simple choices to keep the room happy.
Large language models work in a similar way. They pick the next token again and again, and a few simple settings guide those picks.
What are LLM settings? Temperature, top-p, and max tokens are parameters that control how your AI generates responses. Temperature controls randomness (low = predictable, high = creative). Top-p controls word-selection diversity. Max tokens sets the response length limit. Together, these settings shape the feel and style of every answer you get.
In this lesson, you will learn the main settings you can adjust for large language models from providers like OpenAI, Anthropic, or Google. You will also learn how they change the feel of the answer and which recipes to use for common tasks.
š Julley, Iām Dheeraj and Iām an AI systems builder.
I build production-grade AI systems at work by day and ship my own products by night (9+). This newsletter is the bridge between those two worlds. Every system, every build, documented step by step.
Join 1,100+ builders getting the exact AI setups, prompts, and production configs that actually work in your business.
Prompt Engineering Course - Complete Series
LLM Settings Explained: Temperature, Top-P, and Max Tokens ā You are here
Get Reliable JSON from LLMs: Structured Output Prompting Guide
Prompt vs. Settings: What Controls What?
Your prompt shapes content, while the LLM model settings shape style and variation. Think of it like this:
Prompt says what to cook
Settings decide heat and spice
We focus on the five most useful controls:
Model choice and context window
Output length and stop markers
Temperature
Top-p and Top-k
Frequency and presence penalties
By the end of this lesson, you will know when to touch each one and when to leave it alone.
These settings are available when you use AI models through APIs or special playgrounds, not in the standard ChatGPT or Claude chat interfaces. To follow along and play with these settings, you can access the OpenAI Playground or Google AI Studio. Both offer free trials, though you may need to create an account.
Model Choice and Context Window in LLM
Different generative AI models have different strengths. Some are fast, some handle bigger prompts, and some follow rules better. For beginner work, you can treat model choice like choosing a car class: a small city car for quick trips, a family car for comfort and storage, or a van for big cargo.
The context window is the space the model uses to read your prompt and its own answer. As we covered in the first lesson on how LLMs work, if your prompt fills most of that space, there is little room left for the reply. This one habit - keeping prompts short and focused - makes outputs cleaner right away.
Practical tips:
Use one or two short examples instead of many
Remove repeated info from long prompts
If you must paste long text, add a short instruction that tells the model which parts matter
Commercial break: Claude Code Builder cohort
The founding batch of my Claude Code cohort starts June 20 on Maven. Six live Saturdays. You bring your business problem, we build the system.
Only 12 Seats. When theyāre gone, the founding price ($797) closes and Cohort 2 opens at $1,597.
Use code GENAI20 for 20% off. Expires June 19. Check the Syllabus ā
Output Length and Stop Markers
You can set a maximum length for the answer. Use it to keep outputs tight. If an answer cuts off, you can ask it to continue, but the goal is to guide the model so it finishes in one go.
Stop markers are special strings that tell the model when to stop. They are helpful when you want only the content between two markers. For example, if you ask for JSON, you can add a stop marker after the closing brace. This reduces extra text. You can use </finish>, STOP, or END as stop sequences too.
Practical tips:
Always set a length hint in your prompt
For strict formats, give an example format and a stop marker
What Is Temperature Setting in LLM (And What Does It Control)?
Temperature controls how adventurous the model is when picking the next token. The scale runs from 0.0 to 2.0, though values above 1.0 often produce unpredictable or nonsensical results and are rarely useful. Low temperature means careful and steady, while high temperature means creative and varied.
Temperature = 0 to 0.3: The model gives stable and focused answers. Great for math steps, extraction, and coding. Your DJ is playing only the proven classics.
Temperature = 0.4 to 0.7: Balanced and useful for many tasks. Good for summaries, planning, and helpful writing. Your DJ mixes familiar favorites with a few pleasant surprises.
Temperature = 0.8 to 1.0: Creative and surprising. Good for brainstorming, creative thinking, and story ideas. Your DJ is experimenting with unexpected combinations.
Temperature above 1.0: Highly experimental and often chaotic. The output can become unpredictable or lose coherence. Use with extreme caution or avoid entirely.
Temperature 0 vs 1: Real Examples
Here is the same prompt run at different temperatures so you can see the difference in practice:
Prompt: "Write a one line description of a small coffee shop in a quiet lane."
At low temperature (0.1): "A small coffee shop tucked in a quiet lane that serves warm drinks and smiles."
At medium temperature (0.5): "A cozy coffee shop down a quiet lane where the first sip feels like a calm morning."
At high temperature (0.9): "A tiny coffee nook hidden in a hush of brick where steam curls and time slows."
Each answer fits the prompt, but the tone and creativity changes with temperature. This is why I use temperature 0.7 for newsletter writing and drop to 0.2 for factual summaries - the difference in output quality for each use case is noticeable.
Common mistake: People turn up temperature to fix weak prompts. Start with a strong prompt first, then adjust temperature if the tone still needs a lift.
The Seed Parameter: Getting Repeatable Results
Some large language models let you set a seed parameter so the model gives the same result on repeated runs with the same inputs and settings. This is helpful for tests or demos.
A seed is like a starting number that tells the model which random path to take. If you use the same seed with the same prompt and settings, you will get the exact same answer every time. Think of it like a recipe code that always produces the same cake.
When to use temperature or seed in GenAI models:
If you care about exact repeatability, set a seed (if supported) to any number you choose
If you care about steady quality but not exact match, run with temperature near zero and skip the seed
What Is Top-P and When Does It Matter?
These two controls limit the pool of tokens the model can pick from at each step. Think of them as controlling the size of the crate your DJ pulls songs from. A smaller crate gives a safer vibe, while a larger crate invites variety.
Note: Most APIs use either Top-p or Top-k, not both at once. OpenAI and Anthropic use Top-p by default, while some other providers may use Top-k.
Top-p (nucleus sampling): The model looks at the most likely tokens and takes the smallest group whose total probability adds up to p. If p is 0.9, the model picks from a group that covers ninety percent of likely options.
Top-k: The model picks from only the top k tokens by rank. If k is 40, it considers only the forty most likely tokens.
When to adjust:
For stable work, leave Top-p near 0.9 and leave Top-k alone
For very careful work, lower Top-p a little (try 0.8)
For playful brainstorming, raise Top-p a little (try 0.95)
You do not need to change both at once. In most cases, you will set temperature and Top-p. Many people never touch Top-k.
Most people spend 30 to 60 minutes per new task type guessing the right temperature and top-p until something works. Across coding, extraction, writing, and brainstorming, that adds up fast.
Inside PluggedIn, there are pre-tuned settings recipes mapped to 7 task categories so you skip the guesswork entirely.
Frequency and Presence Penalties
These two settings help you reduce repetition in the output.
Frequency penalty: Punishes words that already appeared many times. If the word āamazingā shows up 3 times, the model gets more discouraged each time it tries to use it again. A positive value discourages repetition.
Presence penalty: Punishes words that appeared even once. After āamazingā appears once, the model is equally discouraged from using it again. A positive value encourages the model to discuss new topics and increases diversity.
You do not need to touch these often. Keep them at zero unless you see loops or echoes, then raise a small amount (try 0.3 to 0.5).
Example uses:
You ask for product name ideas and the model repeats the same word in many names. Add a small frequency penalty.
You want city names across different regions. Add a small presence penalty to encourage variety.
When to Use Each Setting
This is what competitors rarely tell you - not just what the settings do, but when to actually change them:
Low temperature (0.0-0.3): Data extraction, code generation, factual Q&A, structured outputs. Use when there is one right answer.
Medium temperature (0.4-0.7): Blog posts, emails, general writing, explanations. Use for most day-to-day tasks.
High temperature (0.8-1.0): Brainstorming, creative writing, generating variations. Use when you want options and diversity.
Rule of thumb: If there is one right answer, go low. If you want creativity and variety, go high.
For top-p: Leave it at 0.9 for most tasks. Only adjust if temperature alone is not getting you where you want.
For penalties: Start at zero. Add frequency penalty if you notice word repetition. Add presence penalty if the model keeps circling the same topics.
Starter Recipes for Common Tasks
Here are friendly defaults you can copy or reference. They are starting points, not magic numbers - adjust as you go.
Data extraction or code:
Temperature near zero (0.0 to 0.1)
Top-p around 0.9
No penalties
Clear schema and field rules in the promptSummaries and reports:
Temperature around 0.3
Top-p around 0.9
No penalties
Word limit and section names in the promptPlanning and analysis:
Temperature around 0.5
Top-p around 0.9
No penalties
Ask for numbered steps or a tableBrainstorming and ideas:
Temperature around 0.9
Top-p around 0.95
Small presence penalty (0.3 to 0.5) if repeats show up
Ask for many short options and variety rulesBlog posts and newsletters:
Temperature around 0.7
Top-p around 0.9
No penalties
Tone and structure rules in the promptTroubleshooting: Five Common Problems
Problem 1: Answer drifts off topic. Fix: Lower temperature a little, add a clear rule that restates the goal, and add a refusal rule for low confidence.
Problem 2: Answer is too short or cuts off. Fix: Add a word or token target in the prompt, increase max output length, and remove extra context to leave room.
Problem 3: Answer is boring or stiff. Fix: Raise temperature a little, raise Top-p a little, and add a style note or an example.
Problem 4: Answer repeats words or lines. Fix: Add a small frequency penalty and ask the model to avoid repeating phrases.
Problem 5: JSON breaks your parser. Fix: Lower temperature, add a strict schema and an example, tell the model to produce only JSON with no narrative text, and add a stop marker if supported. See the complete guide to getting reliable JSON from LLMs for a full breakdown.
Key Takeaways
Prompts shape content of LLM response, while settings shape style and variation
Keep prompts short so the answer has room inside the context window
Temperature changes how adventurous the model is (stay below 1.0 for most tasks)
Top-p and Top-k control the set of tokens the model can choose (most APIs use one or the other)
Frequency penalty punishes repeated words more each time they appear
Presence penalty punishes any word that appeared even once
Use simple starter recipes for extraction, summary, planning, ideas, writing, and code
Adjust one setting at a time and test with the same prompt
These settings work in API environments and playgrounds, not standard chat interfaces
Frequently Asked Questions
What does temperature mean in AI?
Temperature controls how random or predictable the AIās output is. At temperature 0, the AI always picks the most likely next word (deterministic). At temperature 1, it considers less likely words too (creative). Most tasks work best between 0.3 and 0.7. For factual extraction and structured data, use 0-0.3.
What is top-p in language models?
Top-p (nucleus sampling) limits the AI to only consider the most probable words that together add up to a certain probability threshold. Top-p of 0.9 means the AI picks from words covering 90% of the probability space, ignoring the least likely 10%. This prevents wildly unlikely word choices while still allowing variety.
What is the best temperature setting for ChatGPT?
It depends on the task. For factual responses and data extraction, use 0-0.3. For general writing and emails, use 0.5-0.7. For brainstorming and creative tasks, use 0.8-1.0. There is no single best setting - the right temperature depends on what you need the output to do.
What is the difference between temperature and top-p?
Temperature adjusts how bold the model is overall when picking the next word. Top-p limits the pool of words it can pick from. Temperature affects the full distribution, while top-p trims the edges. In practice, adjust temperature first. Only tune top-p if temperature alone is not giving you the right balance of consistency versus variety.
How do I make AI give more consistent answers?
Three things help: (1) Lower the temperature toward 0.1-0.3, (2) Write clearer, more specific prompts with explicit rules, (3) Use the same system prompt every time. Consistency comes from reducing ambiguity at both the prompt level and the settings level.
What does max tokens mean in ChatGPT?
Max tokens sets the maximum length of the AIās response. One token is roughly three-quarters of a word. If you set max tokens to 500, the response will cut off at around 375 words even if the answer is not complete. Always set max tokens high enough for your expected output, or set a target length in the prompt itself.
Get PluggedIn
You know what temperature and top-p do. What you need now are the pre-built recipes that tell you exactly which values to use for which task.
Without those recipes, you'll keep losing 30 to 60 minutes every time you start a new task type from scratch.
Get PluggedIn to go from manually guessing settings for each new use case to having pre-tuned temperature and top-p recipes for 7 task categories ready to copy
Whatās inside the Prompt Engineering Mastery Bundle:
Complete 9-lesson ebook (PDF)
7 niche-specific prompt packs (55+ prompts):
Customer support automation
Content creation on a budget
Client proposals & SOWs
Research & analysis
Email & communication
Sales & lead nurture
Operations & SOPs
Whatās Next
You now know how to guide the feel of the answer. In the next lesson, we will build core prompting patterns that always help. You will learn zero-shot, few-shot, and one-shot prompts and clean ways to set roles and styles so your results are reliable every time.










