
# Blockbrain LLM Selection Guide

### How Blockbrain Measures Usage

Every action on Blockbrain, whether a message, a file upload, or an agent run, consumes Compute Blocks (CBs): a transparent, usage-based metric reflecting the actual computational cost of each operation. CBs are primarily driven by LLM token usage, and their consumption directly mirrors each model's input/output token pricing.

> **A more expensive model = proportionally higher CB usage.**
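
Every worked example on this page is consistent with one conversion: a call's USD token cost (tokens × price per 1M tokens) multiplied by 100,000 gives its Compute Blocks, and CBs are then priced at €30 per 1M. The 100,000-CBs-per-USD factor is inferred from those examples rather than taken from an official Blockbrain formula, and the helper names below are hypothetical. A minimal sketch:

```python
# Assumption: 100,000 CBs per USD of LLM token cost, inferred from the
# worked examples on this page (not an official Blockbrain constant).
CBS_PER_USD = 100_000
EUR_PER_MILLION_CBS = 30.0  # rate quoted in the examples


def compute_blocks(input_tokens: int, output_tokens: int,
                   input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate CBs for one LLM call from token counts and USD prices per 1M tokens."""
    usd = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000
    return usd * CBS_PER_USD


def cbs_to_eur(cbs: float) -> float:
    """Convert Compute Blocks to euros at the quoted rate."""
    return cbs * EUR_PER_MILLION_CBS / 1_000_000


# Example 1 (Claude Opus 4.8): 1,640 input / 1,000 output tokens at $5 / $25 per 1M
cbs = compute_blocks(1_640, 1_000, 5.00, 25.00)
print(round(cbs), round(cbs_to_eur(cbs), 2))  # 3320 0.1
```

Swapping in Sonnet 4.6's $3 / $15 prices reproduces Example 1.1's figure of about 1,992 CBs the same way.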

### Default Recommendation

| Model                           | Quality | Speed | Cost Eff. | Pricing (Input/Output per 1M tokens) |
| ------------------------------- | ------- | ----- | --------- | ------------------------------------ |
| Gemini 2.5 Flash                | 4.3     | 4.8   | 4.6       | $0.50 / $3.00                        |
| GPT 5.4 Mini (budget alt.)      | 4.3     | 4.5   | 4.2       | $0.40 / $1.60                        |
| Claude Haiku 4.5 (quality alt.) | 4.2     | 3.6   | 4.0       | $1.00 / $5.00                        |

> **Gemini 2.5 Flash** is the best all-around choice: excellent quality, fast, and cost-efficient, with a 1M token context window.

### Quick Decision Matrix

| Priority           | Primary Pick                    | Budget Option                   | Premium Option          |
| ------------------ | ------------------------------- | ------------------------------- | ----------------------- |
| Everyday use       | Gemini 2.5 Flash                | GPT 5.4 Mini                    | Gemini 2.5 Pro ($2/$12) |
| Max cost savings   | GPT 4o Mini ($0.15/$0.60)       | GPT 5.4 Mini                    | Gemini 2.5 Flash        |
| Highest quality    | Claude Opus 4.8 Max ($5/$25)    | Gemini 2.5 Pro                  | GPT 5.5 Pro ($5/$30)    |
| Fastest response   | Claude Sonnet 4.6 Fast ($3/$15) | Gemini 2.5 Flash                | GPT 5.4 Low Thinking    |
| Creative & writing | Claude Sonnet 4.6 ($3/$15)      | Claude Haiku 4.5                | Claude Opus 4.8         |
| Code development   | GPT 5.3 Codex ($1.75/$14)       | Mistral Codestral ($0.30/$0.90) | Claude Opus 4.8         |
| Complex reasoning  | o3 ($2/$8)                      | o4 Mini ($1.10/$4.40)           | GPT 5.5 Pro             |

### Key Considerations

* **Output tokens cost more than input tokens**: prioritize models with low output-token pricing for reports, docs, and code generation.
* **1M context windows** (Gemini 2.5 Flash/Pro, Claude Sonnet 4.6) hold \~3,000 pages of text, which is critical for document analysis and long conversations.
* **Premium models add up fast**: 1,000 queries cost \~$0.68 with GPT 5.4 Mini vs. \~$10.00 with Claude Opus 4.8.
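
The 1,000-query figures above can be reproduced exactly if each query is assumed to use about 500 input and 300 output tokens. That per-query profile is an assumption chosen because it matches both numbers, not a figure stated on this page; `batch_cost_usd` is a hypothetical helper.

```python
# Hypothetical per-query token profile (assumption): ~500 input, ~300 output tokens.
INPUT_TOKENS_PER_QUERY = 500
OUTPUT_TOKENS_PER_QUERY = 300

# USD prices per 1M tokens (input, output), taken from the tables above.
PRICES = {
    "GPT 5.4 Mini": (0.40, 1.60),
    "Claude Opus 4.8": (5.00, 25.00),
}


def batch_cost_usd(model: str, queries: int) -> float:
    """USD cost of `queries` calls under the assumed per-query token profile."""
    in_price, out_price = PRICES[model]
    per_query = (INPUT_TOKENS_PER_QUERY * in_price
                 + OUTPUT_TOKENS_PER_QUERY * out_price) / 1_000_000
    return per_query * queries


for model in PRICES:
    print(f"{model}: ${batch_cost_usd(model, 1_000):.2f} per 1,000 queries")
# GPT 5.4 Mini: $0.68 per 1,000 queries
# Claude Opus 4.8: $10.00 per 1,000 queries
```

Under this assumed profile, output tokens account for about 70% of the GPT 5.4 Mini cost and 75% of the Opus cost, which is why output pricing dominates the comparison.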

### Real-World Cost Examples for LLM Queries and Recommendations <a href="#heading-title-text" id="heading-title-text"></a>

#### Category: Direct LLM Calls <a href="#category-direct-llm-calls" id="category-direct-llm-calls"></a>

#### 1 - Claude Opus 4.8 <a href="#id-1-claude-opus-4.8" id="id-1-claude-opus-4.8"></a>

* **Query example**: "Turn these rough workshop notes into a polished strategic recommendation memo for the steering committee, in situation-complication-resolution structure" + attached raw workshop notes (\~1.5hr strategy session, messy live notes)
* **Input tokens**: \~1,640
* **Output tokens**: \~1,000
* **Input token cost** (per 1M): $5.00
* **Output token cost** (per 1M): $25.00
* **Compute Blocks**: \~3,320
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.10
* **Recommendation**: **Gemini 3.1 Flash-Lite** - it came in at \~101 CBs (\~€0.003), roughly 33x cheaper than Opus. But it's not a clean win: Gemini's memo dropped structure the Opus version included. Claude Sonnet 4.6 (Example 1.1) is a safer middle-ground swap at \~1.67x fewer CBs with comparable depth; Gemini is worth it only if the shorter, less detailed format is actually sufficient for this use case.

#### 1.1 - Claude Sonnet 4.6, same query as 1 <a href="#id-1.1-claude-sonnet-4.6-same-query-as-1" id="id-1.1-claude-sonnet-4.6-same-query-as-1"></a>

* **Query example**: Same as Example 1
* **Input tokens**: \~1,640
* **Output tokens**: \~1,000
* **Input token cost** (per 1M): $3.00
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~1,992
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.06
* **Recommendation**: **GPT-5 Mini** came in \~9x cheaper than Opus and \~5.7x cheaper than Sonnet, while producing a *more* thorough memo than either Claude model. That is the opposite trade-off from Gemini 3.1 Flash-Lite, which was cheaper still but less detailed. Of the three alternatives tested against Opus on this exact query, GPT-5 Mini currently looks like the best value: most of the cost savings without giving up detail.

#### 2 - Claude Opus 4.8, multi-turn / multi-document analysis <a href="#id-2-claude-opus-4.8-multi-turn-multi-document-analysis" id="id-2-claude-opus-4.8-multi-turn-multi-document-analysis"></a>

* **Query example**: Two-turn conversation. Turn 1: analyze a freight contract's liability clause against a court ruling and a firm precedent memo, draft a structured legal memo (issue / applicable law / analysis / risk / recommendation). Turn 2, same thread: "argue the counterposition - why might this be enforceable despite the risk?"
* **Input tokens**: \~5,199 combined (turn 1: 3 source documents + instruction, \~1,785; turn 2: the entire turn 1 input+output re-sent as context, \~3,374, + new instruction, \~40)
* **Output tokens**: \~2,807 combined (turn 1 memo \~1,589 + turn 2 counterposition \~1,218)
* **Input token cost** (per 1M): $5.00
* **Output token cost** (per 1M): $25.00
* **Compute Blocks**: \~9,617
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.29
* **Recommendation**: For this level of legal reasoning quality, **GPT-5** is a plausible alternative, given its current strength on structured reasoning tasks - but a lighter model like Gemini Flash is probably **not** appropriate here given the accuracy bar for legal analysis. The bigger lever regardless of model is conversation length: turn 2's short prompt still re-sends all of turn 1 as context, so cost compounds with every follow-up turn.
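
Example 2's input total follows a simple accounting rule: every new turn re-sends all earlier inputs and outputs as context, plus the new instruction. A minimal sketch of that rule, using Example 2's token counts; the function name is hypothetical, and it does not model any system-prompt or tool overhead Blockbrain may add.

```python
def multi_turn_input_tokens(new_input_tokens: list[int], output_tokens: list[int]) -> list[int]:
    """Input tokens billed on each turn when the full history is re-sent every turn.

    new_input_tokens[i]: tokens the user adds on turn i (instruction plus any documents)
    output_tokens[i]: tokens the model generates on turn i
    """
    billed = []
    history = 0  # tokens of all earlier inputs and outputs
    for new_in, out in zip(new_input_tokens, output_tokens):
        turn_input = history + new_in
        billed.append(turn_input)
        history = turn_input + out  # this turn's input and output become context for the next
    return billed


# Example 2 (Claude Opus 4.8): turn 1 = 1,785 (documents + instruction), turn 2 = 40-token follow-up
per_turn = multi_turn_input_tokens([1_785, 40], [1_589, 1_218])
print(per_turn, sum(per_turn))  # [1785, 3414] 5199
```

With these numbers, a hypothetical third 40-token follow-up would be billed for 4,672 input tokens (4,632 tokens of history plus 40 new ones) before the model writes anything.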

#### 2.1 - Claude Sonnet 4.6, same two-turn conversation as 2 <a href="#id-2.1-claude-sonnet-4.6-same-two-turn-conversation-as-2" id="id-2.1-claude-sonnet-4.6-same-two-turn-conversation-as-2"></a>

* **Query example**: Same as 2
* **Input tokens**: \~5,503 combined (turn 1: same 1,785 as Opus's version; turn 2: turn 1 input+output re-sent, \~3,678, + new instruction, \~40)
* **Output tokens**: \~3,402 combined (turn 1 memo 1,893 + turn 2 counterposition 1,509 - both real, and both longer than Opus's equivalents of \~1,589 and \~1,218)
* **Input token cost** (per 1M): $3.00
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~6,754
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.203
* **Recommendation**: **Switch to GPT-5.** Although Sonnet 4.6's per-token prices are well below Claude Opus 4.8's (Example 2), it produces *longer* outputs on this task (3,402 vs. \~2,807 output tokens combined), and because turn 1's longer memo is re-sent as context in turn 2, its input grows too (\~5,503 vs. \~5,199 tokens). That drives up cost without a clear quality benefit. At \~€0.203 per two-turn conversation, it remains more expensive than GPT-5 for comparable or superior output. Compare with the Opus figures in Example 2.
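
To separate Sonnet 4.6's lower per-token prices from its longer outputs, a rough what-if re-prices Example 2's Opus token counts at Sonnet rates. It assumes the 100,000-CBs-per-USD conversion implied by the worked examples, which is an inference rather than a documented constant.

```python
CBS_PER_USD = 100_000  # inferred from the worked examples, not an official constant


def compute_blocks(input_tokens: int, output_tokens: int,
                   in_price: float, out_price: float) -> float:
    """CBs for a call, given USD prices per 1M input and output tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000 * CBS_PER_USD


SONNET = (3.00, 15.00)  # USD per 1M input / output tokens

actual = compute_blocks(5_503, 3_402, *SONNET)        # Example 2.1 as measured
opus_lengths = compute_blocks(5_199, 2_807, *SONNET)  # Example 2's token counts at Sonnet prices
print(round(actual), round(opus_lengths))  # 6754 5770
```

Under these assumptions, Sonnet's longer outputs (and the longer turn 1 it re-sends as context) account for roughly 980 CBs, about 15% of its measured total.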

#### 3 - Claude Opus 4.8, quantitative data reconciliation <a href="#id-3-claude-opus-4.8-quantitative-data-reconciliation" id="id-3-claude-opus-4.8-quantitative-data-reconciliation"></a>

* **Query example**: Given three KPI/data documents (a quarterly metrics table, a segment breakdown, and a leadership commentary memo), identify the 2-3 biggest strategic risks, back each with specific figures and quarter-over-quarter trends, reconcile the ARR figures across documents and flag any discrepancy, and compare leadership's narrative against what the data actually shows.
* **Input tokens**: \~1,284 (3 source documents + instruction)
* **Output tokens**: \~1,233
* **Input token cost** (per 1M): $5.00
* **Output token cost** (per 1M): $25.00
* **Compute Blocks**: \~3,725
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.11
* **Recommendation**: **GPT-5 Nano** essentially matched the substance of this analysis - same 3 risks identified, same $0.6M ARR discrepancy calculated, same severity ranking - at \~61x fewer CBs than Opus (\~61 CBs vs. \~3,725).

#### 3.1 - Claude Sonnet 4.6, same query as 3 <a href="#id-3.1-claude-sonnet-4.6-same-query-as-3" id="id-3.1-claude-sonnet-4.6-same-query-as-3"></a>

* **Query example**: Same as 3
* **Input tokens**: \~1,284
* **Output tokens**: 1,208
* **Input token cost** (per 1M): $3.00
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~2,197
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.066
* **Recommendation**: **Gemini Flash-Lite or GPT-5 mini-tier.** At \~€0.066 per query, Sonnet 4.6 is already cost-efficient - but arithmetic reconciliation is a structured, rule-bound task that doesn't require frontier reasoning. Lighter models may match accuracy at a fraction of the cost.

***

#### Category: Web Search <a href="#category-web-search" id="category-web-search"></a>

Web search is powered by a search provider (Tavily / Linkup / Perplexity) plus a synthesis LLM. The numbers below were measured on these specific example runs; a different query could cost more or less depending on how many sources it ends up reading and which provider and model handle synthesis.

#### 1 - EU AI Act research query <a href="#id-1-eu-ai-act-research-query" id="id-1-eu-ai-act-research-query"></a>

* **Query example**: "What are the latest changes to the EU AI Act that could affect enterprise AI vendors like us?" - ran 3 search queries, read 6 sources (via Linkup, EU-hosted)
* **Input tokens**: \~6,000–8,500 (assumes \~900–1,300 tokens/source × 6 sources)
* **Output tokens**: \~911
* **Input token cost** (per 1M): $3.00 (model: Sonnet 4.6)
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~3,200–3,900
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.10–€0.12
* **Recommendation**: **Prompting angle:** this query used broad, open-ended phrasing ("what are the latest changes") - a more tightly scoped version, e.g. "What EU AI Act changes were published since June 2026 specifically affecting enterprise vendors?", would likely reduce how many search queries the agent runs and how many sources it decides it needs to read, directly lowering CB cost. Open-ended research prompts tend to trigger more tool calls than a narrowly-scoped question with the same intent.
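
Because the input size of a web search run is only known as a range, its CB figure is a range too. A minimal sketch that reproduces Web Search Example 1's bounds from its 6,000–8,500 input tokens, about 911 output tokens, and Sonnet 4.6 pricing. It assumes the 100,000-CBs-per-USD conversion implied by the worked examples and models only the synthesis LLM's tokens; `cb_range` is a hypothetical helper.

```python
CBS_PER_USD = 100_000  # inferred from the worked examples, not an official constant
EUR_PER_MILLION_CBS = 30.0


def cb_range(input_lo: int, input_hi: int, output_tokens: int,
             in_price: float, out_price: float) -> tuple[float, float]:
    """Low/high CB estimates when the input token count is only known as a range."""
    def cbs(input_tokens: int) -> float:
        return (input_tokens * in_price + output_tokens * out_price) / 1_000_000 * CBS_PER_USD
    return cbs(input_lo), cbs(input_hi)


def cbs_to_eur(cbs: float) -> float:
    """Convert CBs to euros at the quoted rate."""
    return cbs * EUR_PER_MILLION_CBS / 1_000_000


# Web Search Example 1: 6,000-8,500 input tokens, 911 output tokens, Sonnet 4.6 pricing
lo, hi = cb_range(6_000, 8_500, 911, 3.00, 15.00)
print(f"CBs: {lo:,.0f} to {hi:,.0f}; cost: EUR {cbs_to_eur(lo):.3f} to {cbs_to_eur(hi):.3f}")
```

The bounds come out at roughly 3,170 and 3,920 CBs, in line with the range quoted above.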

#### 2 - EUR/USD exchange rate <a href="#id-2-eur-usd-exchange-rate" id="id-2-eur-usd-exchange-rate"></a>

* **Query example**: "What's today's EUR/USD exchange rate?" - ran 1 search query, read 1 source ([Investing.com - Stock Market Quotes & Financial News](http://investing.com/))
* **Input tokens**: \~300–650
* **Output tokens**: \~64
* **Input token cost** (per 1M): $3.00
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~190–290
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.006–€0.009
* **Recommendation**: **Prompting angle:** this prompt is already about as tightly scoped as possible (a single fact, with no ambiguity about scope or timeframe). It is a good reference example of an efficiently scoped web search prompt, with little room to reduce tool calls further.

#### 3 - AI vendor liability / MCP tool-calling research <a href="#id-3-ai-vendor-liability-mcp-tool-calling-research" id="id-3-ai-vendor-liability-mcp-tool-calling-research"></a>

* **Query example**: "Has there been any regulatory guidance published on AI vendor liability for MCP-style tool-calling architectures?" - ran 3 search queries, read 8 sources (the most of the three examples)
* **Input tokens**: \~7,200–10,400
* **Output tokens**: \~819
* **Input token cost** (per 1M): $3.00
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~3,390–4,350
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.10–€0.13
* **Recommendation**: Landed close to Example 1's cost despite reading more sources (8 vs. 6), because its output was shorter. **Prompting angle:** the open-ended phrasing ("has there been any guidance") likely drove the agent to cast a wide net across more sources than a narrower question would need. Asking a more specific sub-question, e.g. "Does the EU AI Act specifically address MCP-style tool-calling architectures?", would likely cut the source count and therefore the CB cost, at the risk of missing broader context a wider search might surface.
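
Using this page's own assumption of 900–1,300 tokens per source (Web Search Example 1) and Sonnet 4.6's $3 per 1M input tokens, each extra source the agent reads adds a predictable chunk of input cost. A rough sketch, again assuming the 100,000-CBs-per-USD conversion implied by the worked examples; `source_cost_cbs` is a hypothetical helper and ignores any extra output the additional sources might trigger.

```python
CBS_PER_USD = 100_000             # inferred from the worked examples, not an official constant
INPUT_PRICE_PER_M = 3.00          # Sonnet 4.6, USD per 1M input tokens
TOKENS_PER_SOURCE = (900, 1_300)  # low/high assumption from Web Search Example 1


def source_cost_cbs(extra_sources: int) -> tuple[float, float]:
    """Low/high CBs added by reading `extra_sources` more sources (input side only)."""
    low, high = TOKENS_PER_SOURCE
    per_token = INPUT_PRICE_PER_M / 1_000_000 * CBS_PER_USD  # CBs per input token
    return extra_sources * low * per_token, extra_sources * high * per_token


# Example 3 read two more sources than Example 1 (8 vs. 6)
low, high = source_cost_cbs(2)
print(f"{low:.0f}-{high:.0f} CBs")  # 540-780 CBs
```

By this estimate, those two extra sources add roughly 540–780 CBs (about €0.016–€0.023) of input on their own.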

***

#### Category: Outlook Agent <a href="#category-outlook-agent" id="category-outlook-agent"></a>

Cost depends on which tool(s) the agent ends up calling and how much it needs to reason, which isn't fixed in advance.

#### 1 - Calendar conflict check <a href="#id-1-calendar-conflict-check" id="id-1-calendar-conflict-check"></a>

* **Query example**: "Check my calendar for any conflicts next Tuesday afternoon" - the agent reasoned about the date, called the Outlook calendar tool, and returned a conflict analysis
* **Input tokens**: \~1,700–3,500
* **Output tokens**: \~400
* **Input token cost** (per 1M): $3.00
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~1,100–1,650
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.03–€0.05
* **Recommendation**: **Prompting angle:** this prompt is already narrowly scoped (one date, one time window), a good example of an efficiently scoped agent prompt with little room to reduce tool calls further.

#### 2 - Mailbox search <a href="#id-2-mailbox-search" id="id-2-mailbox-search"></a>

* **Query example**: "Show me all emails mentioning SharePoint connector from the last two weeks" - the agent calculated the date range, called the mailbox search tool, and returned 3 matching emails with a summary
* **Input tokens**: \~1,700–3,500
* **Output tokens**: \~411
* **Input token cost** (per 1M): $3.00
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~1,150–1,700
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.03–€0.05
* **Recommendation**: Nearly identical cost to Example 1 in this category, which suggests that simple, single-tool-call Outlook actions cost about the same regardless of which specific tool is invoked. **Prompting angle:** the specific keyword and explicit time window kept this cheap; a vaguer version like "show me anything about SharePoint" with no date range could force the agent into a much broader (and more expensive) mailbox scan.

#### 3 - Complex multi-step workflow request <a href="#id-3-complex-multi-step-workflow-request" id="id-3-complex-multi-step-workflow-request"></a>

* **Query example**: "When an email arrives from the Product group, classify it by workstream, extract action items into a structured list, log them to my tracker, draft a reply if it's routine, and notify me in Teams."
* **Input tokens**: \~1,550–3,050
* **Output tokens**: \~950
* **Input token cost** (per 1M): $3.00
* **Output token cost** (per 1M): $15.00
* **Compute Blocks**: \~1,890–2,340
* **Cost** (at 30 € / 1M Compute Blocks): \~€0.06–€0.07
* **Recommendation**: **Prompting angle:** this prompt bundled six distinct asks (a real-time trigger, classification, action-item extraction, tracker logging, a conditional reply draft, and a Teams notification) into one request. Splitting it into requests the agent can actually fulfill, e.g. "classify and log action items from emails I forward to you" (dropping the real-time trigger and Teams pieces), would avoid the capability-limited dead end entirely and produce a much shorter, cheaper, and more useful response.

