# How to Choose the Right LLM

To get the most out of Blockbrain, it helps to understand how computational resources are measured and allocated. At the heart of this system lies a transparent, usage-based metric: **Compute Blocks (CBs)**.

Every action in Blockbrain - sending a message, uploading a file, or running an agent - consumes Compute Blocks (CBs). CBs reflect the actual computational cost of each operation, which consists primarily of the tokens processed by Large Language Models (LLMs).

CB usage directly mirrors the input and output token pricing of each LLM. For example, if Opus 4.6 is 66.67% more expensive than Sonnet 4.6 in terms of input and output token prices, its CB consumption will also be approximately 66.67% higher.&#x20;
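
As a quick check of the example above, the relative price difference can be computed from the per-million-token prices listed on this page ($5/$25 for Opus 4.6, $3/$15 for Sonnet 4.6). This is a minimal sketch: the function name is illustrative, and the idea that CBs scale linearly with token price is taken from the paragraph above, not from a published Blockbrain formula.

```python
def relative_price_increase(price_a: float, price_b: float) -> float:
    """Return how much more expensive price_a is than price_b, in percent."""
    return (price_a / price_b - 1.0) * 100.0


# Prices in USD per million tokens, as listed on this page.
opus = {"input": 5.00, "output": 25.00}
sonnet = {"input": 3.00, "output": 15.00}

for kind in ("input", "output"):
    pct = relative_price_increase(opus[kind], sonnet[kind])
    print(f"{kind}: Opus 4.6 is {pct:.2f}% more expensive than Sonnet 4.6")
```

Both input and output prices differ by the same ratio, which is why a single percentage describes the expected difference in CB consumption.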

***

### Default Recommendation for Company-Wide Use&#x20;

#### Primary Choice: Gemini 2.5 Flash&#x20;

Blockbrain Metrics:&#x20;

* Answer Quality: 3.2/5&#x20;
* Speed: 4.8/5&#x20;
* Cost Efficiency: 4.6/5&#x20;
* Context: 1M tokens | Provider: Vertex AI (EU)&#x20;

**Pricing**: $0.50 input / $3.00 output per million tokens&#x20;

Why this model? Excellent balance of quality, speed, and cost with a massive 1M token context window - ideal for diverse business use cases.&#x20;

#### Alternative #1: GPT 5.4 Mini&#x20;

Blockbrain Metrics:&#x20;

* Quality: 4.3 | Speed: 4.5 | Cost Efficiency: 4.2
* Context: 400k tokens | Provider: OpenAI (EU)&#x20;

**Pricing**: $0.40 input / $1.60 output per million tokens&#x20;

Why consider? Competitive quality at low cost - excellent for high-volume deployment.&#x20;

#### Alternative #2: Claude Haiku 4.5&#x20;

Blockbrain Metrics:&#x20;

* Quality: 3.6 | Speed: 3.6 | Cost Efficiency: 4.0&#x20;
* Context: 200k tokens | Provider: Vertex AI (EU)&#x20;

**Pricing**: $1 input / $5 output per million tokens&#x20;

Why consider? Highest quality among budget-tier models. Excellent for teams that need better reasoning while maintaining cost efficiency.&#x20;

***

### Scenario-Based Recommendations&#x20;

#### Highest Quality (Premium Tasks)&#x20;

**Winner: Claude Opus 4.6 Max (VERY EXPENSIVE)**

* Blockbrain Rating: Quality 5.0 (highest in portfolio)&#x20;

> Consistency Note: The Blockbrain comparison table lists Claude Opus 4.6 at 4.5. The 5.0 rating here reflects the Max configuration's peak performance.&#x20;

* Pricing: $5 input / $25 output per million tokens&#x20;

> Cost Warning: At $25 per million output tokens, this costs roughly 8x as much as Gemini 2.5 Flash ($3 per million output tokens). For 10,000 responses/month (1,000 tokens each), expect about $250 in output costs alone.
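
The arithmetic behind this warning can be reproduced with a few lines. The sketch below only uses the prices and volumes quoted on this page; the function name is hypothetical, and input-token costs are deliberately left out.

```python
def monthly_output_cost(responses: int, tokens_per_response: int,
                        output_price_per_million: float) -> float:
    """Output-token cost in USD for a month of responses."""
    total_tokens = responses * tokens_per_response
    return total_tokens / 1_000_000 * output_price_per_million


# 10,000 responses per month at 1,000 output tokens each.
opus_max = monthly_output_cost(10_000, 1_000, 25.00)  # Claude Opus 4.6 Max
flash = monthly_output_cost(10_000, 1_000, 3.00)      # Gemini 2.5 Flash

print(f"Claude Opus 4.6 Max: ${opus_max:.2f}")
print(f"Gemini 2.5 Flash:    ${flash:.2f}")
print(f"Ratio:               {opus_max / flash:.1f}x")
```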

**When to use**: Reserve for mission-critical tasks, C-suite deliverables, complex strategic analysis, or when absolute best quality is non-negotiable.&#x20;

**Budget Quality Option: Gemini 2.5 Pro**&#x20;

* Quality: 3.6 | Speed: 4.1 | Cost Efficiency: 3.7
* **Pricing**: $2 input / $12 output per million tokens&#x20;
* Best premium option without extreme cost &#x20;

#### Maximum Speed&#x20;

**Winner: Claude Sonnet 4.6 Fast**&#x20;

* Blockbrain Rating: Speed: 5.0 | Answer Quality: 3.8
* **Pricing**: $3 input / $15 output per million tokens&#x20;
* Context: 1M tokens | Provider: Vertex AI (EU)&#x20;

**Why it wins**: Achieves maximum speed (5.0) with solid quality (3.8). Ideal for real-time applications, customer-facing chatbots, and time-sensitive workflows.

**Runner-up: Gemini 2.5 Flash**&#x20;

* Speed: 4.8 | Answer Quality: 3.2
* Pricing: $0.50 input / $3 output (5x cheaper outputs)&#x20;
* **Better value** for most speed-critical applications&#x20;

#### Code Development Excellence&#x20;

**Winner: Gemini 3.5 Flash**

Blockbrain Metrics:&#x20;

* Quality: 4.6 | Speed: 4.6 | Cost Efficiency: 3.7&#x20;
* Context: 1M tokens | Provider: Vertex AI (EU) &#x20;
* **Pricing**: $1.5 input / $9 output per million tokens&#x20;

**Why it wins**: Highest quality for code development (4.6) with excellent speed. Purpose-built for coding and agentic tasks.&#x20;

**Alternative: GPT 5.3 Codex**&#x20;

Blockbrain Metrics:&#x20;

* Quality: 4.4 | Speed: 4.2 | Cost Efficiency: 3.5&#x20;
* Context: 400k tokens | Provider: OpenAI (EU) &#x20;
* **Pricing**: $1.75 input / $14 output per million tokens&#x20;

**Why consider**: High quality for code development (4.4) with excellent speed. Purpose-built for software development, code generation, and technical tasks.

**Premium Option: Claude Opus 4.6**&#x20;

* Answer Quality: 4.5 | Pricing: $5 / $25&#x20;
* Best for: Complex architectural decisions, critical code review&#x20;

#### Creative & Writing Tasks&#x20;

**Winner: Claude Sonnet 4.6** &#x20;

* Blockbrain Rating: Quality 4.4 | Speed: 3.0 | Cost Efficiency: 3.3
* **Pricing**: $3 input / $15 output per million tokens&#x20;
* Context: 1M tokens | Provider: Vertex AI (EU)&#x20;

**Why it wins**: Claude models excel at nuanced writing, tone control, and creative content. Sonnet 4.6 delivers high-quality writing (4.4) at mid-tier pricing, which makes it exceptional value for creative work.

**Budget Alternative: Claude Haiku 4.5**&#x20;

* Quality: 3.6 | Pricing: $1 input / $5 output&#x20;
* Excellent for: creative briefs, social media, email drafts&#x20;

**Premium Option: Claude Opus 4.6**&#x20;

* Quality: 4.5–5.0 | Pricing: $5 / $25&#x20;
* Best for: High-stakes content, brand manifestos, critical communications&#x20;

#### Complex Reasoning Tasks &#x20;

**Winner: o3 (OpenAI Reasoning Model)**&#x20;

Blockbrain Metrics:&#x20;

* Quality: 3.5 | Speed: 2.3 | Cost Efficiency: 3.7&#x20;
* Context: 200k tokens | Provider: Azure AI (EU) &#x20;
* **Pricing**: $2 input / $8 output per million tokens&#x20;
* Performance: 20% improvement over o1 in coding, math, and science with multimodal reasoning and autonomous tool use.&#x20;
* Best for: Complex problem-solving, scientific analysis, advanced coding, mathematical proofs.&#x20;

**Budget Alternative: o4 Mini**&#x20;

* Quality: 3.4 | Speed: 3.7 | Cost Efficiency: 4.1&#x20;
* **Pricing**: $1.10 input / $4.40 output&#x20;
* 80–90% of o3's reasoning power at 45% lower cost&#x20;

**Premium Option: GPT 5.5 Pro** &#x20;

* Quality: 4.9 | Pricing: $5 / $30&#x20;
* Most advanced reasoning available, but very expensive&#x20;

***

### Decision Matrix&#x20;

| Priority              | Primary Recommendation          | Budget Alternative          | Premium Option                   |
| --------------------- | ------------------------------- | --------------------------- | -------------------------------- |
| Balanced everyday use | Gemini 2.5 Flash ($0.50/$3)     | GPT 5.4 Mini ($0.40/$1.60)  | Gemini 2.5 Pro ($2/$12)          |
| Maximum cost savings  | GPT 4o Mini ($0.15/$0.60)       | GPT 5.4 Mini ($0.40/$1.60)  | Gemini 2.5 Flash ($0.50/$3)      |
| Highest quality       | Claude Opus 4.6 Max ($5/$25)    | Gemini 2.5 Pro ($2/$12)     | GPT 5.5 Pro ($5/$30)             |
| Fastest response      | Claude Sonnet 4.6 Fast ($3/$15) | Gemini 2.5 Flash ($0.50/$3) | GPT 5.4 Low Thinking ($2.50/$15) |
| Creative work         | Claude Sonnet 4.6 ($3/$15)      | Claude Haiku 4.5 ($1/$5)    | Claude Opus 4.6 ($5/$25)         |
| Code development      | Gemini 3.5 Flash ($1.5/$9)      | GPT 5.3 Codex ($1.75/$14)   | Claude Opus 4.6 ($5/$25)         |
| Complex reasoning     | o3 ($2/$8)                      | o4 Mini ($1.10/$4.40)       | GPT 5.5 Pro ($5/$30)             |

***

### Strategic Recommendations&#x20;

#### For Most Companies: Multi-Model Strategy&#x20;

We recommend a tiered approach:&#x20;

* **Tier 1** (80% of queries): Fast, cost-efficient models&#x20;
  * Gemini 2.5 Flash or GPT 5.4 Mini&#x20;
  * Use for: emails, summaries, Q\&A, basic analysis &#x20;
* **Tier 2** (15% of queries): Balanced premium models&#x20;
  * Claude Sonnet 4.6 or Gemini 2.5 Pro&#x20;
  * Use for: reports, complex content, strategic analysis&#x20;
* **Tier 3** (5% of queries): Flagship models
  * Claude Opus 4.6 (only when necessary)&#x20;
  * Use for: critical decisions, high-stakes content, C-suite materials&#x20;

Estimated Savings: 60–75% vs. using flagship models for everything&#x20;
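
To see where a figure like this comes from, the tiered mix can be compared against a flagship-only deployment. The sketch below is an illustration under simplifying assumptions, not Blockbrain's own calculation: it picks one model per tier, uses the prices on this page, and assumes every query uses 500 input and 300 output tokens regardless of tier. All names are hypothetical.

```python
# USD per million tokens (input, output), as listed on this page.
PRICES = {
    "Gemini 2.5 Flash": (0.50, 3.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

# Share of queries per tier (Tier 1 / Tier 2 / Tier 3).
TIERED_MIX = {
    "Gemini 2.5 Flash": 0.80,
    "Claude Sonnet 4.6": 0.15,
    "Claude Opus 4.6": 0.05,
}


def cost_per_query(model: str, input_tokens: int = 500, output_tokens: int = 300) -> float:
    """USD cost of a single query for the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def blended_cost(mix: dict) -> float:
    """Average USD cost per query for a mix of {model: share of queries}."""
    return sum(share * cost_per_query(model) for model, share in mix.items())


tiered = blended_cost(TIERED_MIX)
flagship_only = cost_per_query("Claude Opus 4.6")
savings = (1 - tiered / flagship_only) * 100
print(f"Tiered: ${tiered * 1000:.2f} per 1,000 queries")
print(f"Flagship only: ${flagship_only * 1000:.2f} per 1,000 queries")
print(f"Savings: {savings:.0f}%")
```

Under these assumptions the savings come out at about 77%, slightly above the quoted range. If Tier 2 and Tier 3 queries are longer than Tier 1 queries, the savings shrink, so the exact figure depends on the workload.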

***

### Final Recommendations by Company Size &#x20;

#### Startups & Small Teams (<50 people) &#x20;

**Default**: GPT 5.4 Mini — $0.40 input / $1.60 output&#x20;

* Excellent quality for price (4.3)&#x20;
* Broad capability across use cases&#x20;
* Low absolute cost for getting started&#x20;

**Alternative**: Gemini 2.5 Flash — $0.50 input / $3 output&#x20;

* Slightly higher cost but 1M context window&#x20;
* Better for document-heavy workflows &#x20;

#### Mid-Size Companies (50–500 people)&#x20;

**Default**: Gemini 2.5 Flash — $0.50 input / $3 output&#x20;

* Best balanced performance (3.2 quality, 4.8 speed)&#x20;
* 1M context window for versatility&#x20;
* Scales well with volume&#x20;

**Specialist Add-ons:**&#x20;

* Engineering: GPT 5.3 Codex ($1.75/$14) or Mistral Codestral ($0.30/$0.90)&#x20;
* High-quality content: Claude Sonnet 4.6 ($3/$15)&#x20;

#### Enterprises (500+ people)&#x20;

**Default**: Multi-model strategy&#x20;

| Department           | Recommended Model                                             | Pricing (Input/Output)            |
| -------------------- | ------------------------------------------------------------- | --------------------------------- |
| Engineering          | GPT 5.3 Codex + Mistral Codestral (volume) / Gemini 3.5 Flash | $1.75/$14 + $0.30/$0.90 / $1.5/$9 |
| Creative / Marketing | Claude Sonnet 4.6                                             | $3/$15                            |
| Analytics            | Gemini 2.5 Pro                                                | $2/$12                            |
| General Workforce    | Gemini 2.5 Flash                                              | $0.50/$3                          |
| Executive / Critical | Claude Opus 4.6 (limited access)                              | $5/$25                            |

**Cost Management:**&#x20;

* Implement model routing based on query complexity&#x20;
* Set monthly budgets per team&#x20;
* Monitor usage patterns quarterly&#x20;

***

### Important Considerations&#x20;

#### Output Token Costs Matter Most&#x20;

For typical conversational AI:&#x20;

* **Input**: System prompt + user query ≈ 500 tokens
* **Output**: AI response = 200–500 tokens&#x20;

Example cost for 1,000 queries (500 input tokens, 300 output tokens):&#x20;

| Model             | Input Cost | Output Cost | Total  |
| ----------------- | ---------- | ----------- | ------ |
| GPT 4o Mini       | $0.075     | $0.18       | $0.26  |
| GPT 5.4 Mini      | $0.20      | $0.48       | $0.68  |
| Gemini 2.5 Flash  | $0.25      | $0.90       | $1.15  |
| Claude Haiku 4.5  | $0.50      | $1.50       | $2.00  |
| Claude Sonnet 4.6 | $1.50      | $4.50       | $6.00  |
| Claude Opus 4.6   | $2.50      | $7.50       | $10.00 |

> Output-heavy use cases (reports, documentation, code generation) should prioritize low output-cost models. &#x20;
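
The table above can be reproduced, or adapted to a different workload, with a short calculation. This is a minimal sketch using only the prices and token counts from this page; the function and variable names are illustrative.

```python
# USD per million tokens (input, output), as listed on this page.
PRICES = {
    "GPT 4o Mini": (0.15, 0.60),
    "GPT 5.4 Mini": (0.40, 1.60),
    "Gemini 2.5 Flash": (0.50, 3.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}


def cost_for_queries(model: str, queries: int = 1_000,
                     input_tokens: int = 500, output_tokens: int = 300):
    """Return (input_cost, output_cost, total_cost) in USD."""
    in_price, out_price = PRICES[model]
    input_cost = queries * input_tokens / 1_000_000 * in_price
    output_cost = queries * output_tokens / 1_000_000 * out_price
    return input_cost, output_cost, input_cost + output_cost


for model in PRICES:
    input_cost, output_cost, total = cost_for_queries(model)
    share = output_cost / total
    print(f"{model:<18} input ${input_cost:.3f}  output ${output_cost:.2f}  "
          f"total ${total:.2f}  (output share {share:.0%})")
```

Even with fewer output than input tokens per query, output accounts for roughly 70–80% of the total for every model listed, which is why output pricing dominates the comparison.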

#### Context Window Value&#x20;

| Model                  | Context Window   |
| ---------------------- | ---------------- |
| Gemini 2.5 Flash / Pro | 1M tokens        |
| Claude Sonnet 4.6      | 1M tokens        |
| Most others            | 128k–400k tokens |
| Mistral Codestral      | 32k tokens       |

**When it matters**: Document analysis, long conversations, comprehensive research, multi-file code review.&#x20;

> Pro tip: A 1M context window can hold roughly 750,000 words, or about 3,000 pages of text.
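
The same rule of thumb can be applied to the other context sizes in the table. The conversion factors below are assumptions implied by the figures in the tip (0.75 words per token, 250 words per page); real ratios vary by language and tokenizer.

```python
WORDS_PER_TOKEN = 0.75  # assumption: 750,000 words per 1M tokens
WORDS_PER_PAGE = 250    # assumption: 750,000 words across 3,000 pages


def context_capacity(context_tokens: int) -> tuple[int, int]:
    """Return the approximate (words, pages) that fit in a context window."""
    words = round(context_tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


for label, tokens in [("1M", 1_000_000), ("400k", 400_000),
                      ("200k", 200_000), ("32k", 32_000)]:
    words, pages = context_capacity(tokens)
    print(f"{label:>4} tokens: about {words:,} words or {pages:,} pages")
```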

#### Provider Considerations&#x20;

**All Blockbrain models are EU-hosted, ensuring:**&#x20;

* **GDPR compliance** – Data processed within EU boundaries&#x20;
* **Data residency** – Meets European regulatory requirements&#x20;
* **Lower latency** – For European customers&#x20;

***

### Best Practices for Cost Optimization&#x20;

#### 1. Prompt Engineering&#x20;

**Reduce output tokens by 30–50%**&#x20;

Either add this in the initial instructions of the bot, or prompt it directly:&#x20;

* Request concise responses: "Answer in 2–3 sentences", or set Options → Length: Short / Very Short in the sendbox
* Use structured outputs: "Respond in bullet points"&#x20;
* Avoid redundancy: "Don't repeat the question"&#x20;

> **Impact**: Can reduce costs by 40%+ for output-heavy models.&#x20;
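
How much a shorter answer saves overall depends on how output-heavy the workload is. The sketch below illustrates this with Claude Sonnet 4.6 prices from this page; the 1,500-token report workload is a hypothetical example, and the function name is illustrative.

```python
def total_cost_reduction(input_tokens: int, output_tokens: int,
                         input_price: float, output_price: float,
                         output_cut: float) -> float:
    """Fraction of total cost saved when output tokens shrink by output_cut (0..1)."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    saved = output_cost * output_cut
    return saved / (input_cost + output_cost)


# Claude Sonnet 4.6: $3 input / $15 output per million tokens.
chat = total_cost_reduction(500, 300, 3.00, 15.00, output_cut=0.40)
report = total_cost_reduction(500, 1_500, 3.00, 15.00, output_cut=0.50)

print(f"Chat-style query (300 output tokens, 40% shorter):     {chat:.0%} lower total cost")
print(f"Report-style query (1,500 output tokens, 50% shorter): {report:.0%} lower total cost")
```

The chat-style case saves about 30% of the total, while the output-heavy report case saves about 47%, so the largest gains come from workloads where output tokens dominate.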

#### 2. Smart Model Routing&#x20;

| Query Type                    | Recommended Model            |
| ----------------------------- | ---------------------------- |
| Simple (FAQ, definitions)     | GPT 5.4 Mini                 |
| Standard (analysis, drafting) | Gemini 2.5 Flash             |
| Complex (strategic, critical) | Claude Sonnet 4.6 / Opus 4.6 |

> **Impact**: 50–70% cost reduction vs. using premium models for everything.&#x20;
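
A routing rule like the table above can be expressed as a simple lookup. This is a hypothetical sketch, not a Blockbrain API: the query-type labels, the `critical` flag, and the choice of Sonnet as the default for complex queries are assumptions made for illustration.

```python
# Mapping taken from the routing table above; labels are illustrative.
ROUTING_TABLE = {
    "simple": "GPT 5.4 Mini",        # FAQ, definitions
    "standard": "Gemini 2.5 Flash",  # analysis, drafting
    "complex": "Claude Sonnet 4.6",  # strategic work
}
CRITICAL_MODEL = "Claude Opus 4.6"   # assumption: reserved for critical complex work


def route(query_type: str, critical: bool = False) -> str:
    """Pick a model name for a query type; escalate critical complex queries."""
    if query_type == "complex" and critical:
        return CRITICAL_MODEL
    try:
        return ROUTING_TABLE[query_type]
    except KeyError:
        raise ValueError(f"Unknown query type: {query_type!r}") from None


print(route("simple"))
print(route("standard"))
print(route("complex"))
print(route("complex", critical=True))
```

In practice the hard part is classifying the query in the first place; a keyword rule or a cheap model could fill that role, but that step is outside this sketch.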

#### 3. Caching & Reuse&#x20;

* Cache common prompts (Prompt Library)&#x20;
* Reuse context where possible (e.g. via Insights)&#x20;
* Implement RAG (Retrieval-Augmented Generation) via the database to reduce context size&#x20;

> **Impact**: 20–30% reduction in input token costs.

***

### Model Comparison Table (Top Recommendations)&#x20;

| Model                  | Answer Quality | Speed | Cost Eff. | Input $ | Output $ | Best For                |
| ---------------------- | -------------- | ----- | --------- | ------- | -------- | ----------------------- |
| Gemini 3.5 Flash       | 4.6            | 4.6   | 3.7       | $1.50   | $9.00    |                         |
| Gemini 2.5 Flash       | 3.2            | 4.8   | 4.6       | $0.50   | $3.00    | All-around default      |
| GPT 5.4 Mini           | 4.3            | 4.5   | 4.2       | $0.40   | $1.60    | Budget-conscious        |
| GPT 4o Mini            | 2.3            | 5.0   | 5.0       | $0.15   | $0.60    | Maximum savings         |
| Claude Haiku 4.5       | 3.6            | 3.6   | 4.0       | $1.00   | $5.00    | Quality on budget       |
| GPT 5.3 Codex          | 4.4            | 4.2   | 3.5       | $1.75   | $14.00   | Code development        |
| Mistral Codestral      | 3.4            | 3.9   | 5.0       | $0.30   | $0.90    | Code (budget)           |
| Claude Sonnet 4.6      | 4.4            | 3.0   | 3.3       | $3.00   | $15.00   | Creative / writing      |
| Claude Sonnet 4.6 Fast | 3.8            | 5.0   | 3.7       | $3.00   | $15.00   | Speed + quality         |
| Gemini 2.5 Pro         | 3.6            | 4.1   | 3.7       | $2.00   | $12.00   | Premium balanced        |
| Claude Opus 4.6        | 4.5            | 2.8   | 3.0       | $5.00   | $25.00   | Top-tier quality        |
| GPT 5.5 Pro            | 4.9            | 1.2   | 1.8       | $5.00   | $30.00   | Max quality / reasoning |
| o3                     | 3.5            | 2.3   | 3.7       | $2.00   | $8.00    | Complex reasoning       |
| o4 Mini                | 3.4            | 3.7   | 4.1       | $1.10   | $4.40    | Reasoning value         |

### Conclusion&#x20;

**The Blockbrain model portfolio offers excellent options for every use case and budget.**&#x20;

**For most companies, we recommend:**&#x20;

1. Start with Gemini 2.5 Flash as your default model (e.g. in your Company GPT)&#x20;
2. Add GPT 5.4 Mini for budget-conscious teams&#x20;
3. Introduce specialist models (Gemini 3.5 Flash, GPT 5.3 Codex, Claude Sonnet 4.6)&#x20;
4. Reserve premium models (Opus, GPT 5.5 Pro) for critical work only&#x20;

**This approach typically delivers:**&#x20;

* 60–75% cost savings vs. premium-only deployment&#x20;
* 90%+ user satisfaction&#x20;
* Flexibility to scale and optimize over time


