[PRAGMATISMO] Stop Wasting Money on Claude — Pure HTML
⚓ Research 📅 2026-06-02 👤 Pragmatismo 👁️ 8Stop Wasting Money on Claude — Pure HTML
Qwen 3.6 runs on your RTX 3060. DeepSeek V4 Flash on 2 RTX 4090. GPT-OSS-120B in the cloud. Frontier performance for zero API cost. Free assessment with Pragmatismo.
| Pragmatismo — Practical AI. Real results. • June 2026 | pragmatismo.com.br |
| # Stop Paying for Claude. |
| Open-weight models beat Claude on cost. And they run on your GPU. |
| Qwen 3.6 (Apache 2.0) runs on an RTX 3060 with 12GB and scores 73.4% on SWE-bench Verified — versus 80.8% for Claude Opus 4.6 at US$ 75/million output tokens. DeepSeek V4 Flash (MIT) surpasses 79% and runs on 2 RTX 4090s. GPT-OSS-120B (Apache 2.0) runs in the cloud for pennies. The math doesn’t work for Claude. |
Real benchmark — SWE-bench Verified 2026
Code that works. Or it doesn’t.
500 real GitHub issues. The model must understand the bug, write the patch, and pass the tests. No shortcuts. Source: swebench.com.
DeepSeek V4 Flash
79%Claude Opus 4.6
80.8%Qwen 3.6-27B (dense)
77.2%Qwen 3.6-35B-A3B ⚡️ RTX 3060
73.4%GPT-OSS-120B
~70%*
Data: swebench.com. Qwen 3.6-35B-A3B runs on RTX 3060 12GB with quantization (Apache 2.0). DeepSeek V4 Flash (MIT) requires 2 RTX 4090. Claude Opus 4.6 costs US$ 75/1M output tokens. *GPT-OSS-120B: partial benchmark, OpenAI’s open model (Apache 2.0), 5.1B active of 120B MoE, runs in cloud. June 2026.
Real cost
Frontier performance. Pocket change.
The cost gap between Claude and open models is abysmal. And with your own GPU, token cost is zero.
| Model | Input/1M | Output/1M | SWE-Ver. | Runs on |
| Claude Opus 4.6 | US$ 15 | US$ 75 | 80.8% | API only |
| DeepSeek V4 Flash (API) | US$ 0.14 | US$ 0.28 | 79% | 2 RTX 4090 |
| Qwen 3.6-27B (API) | ~US$ 0.50 | ~US$ 2 | 77.2% | RTX 4090 |
| Qwen 3.6-35B-A3B | US$ 0 | US$ 0 | 73.4% | RTX 3060 |
| DeepSeek V4 Flash | US$ 0 | US$ 0 | 79% | 2× RTX 4090 |
Sources: deepseek.com, qwen.alibaba.com, swebench.com, openai.com/gpt-oss. Official API prices June/2026. Qwen 3.6-35B-A3B on RTX 3060, DeepSeek V4 Flash on 2× RTX 4090, GPT-OSS-120B on 1× H100.
“While you pay US$ 75 per million output tokens for Claude, your competitor runs Qwen 3.6 on a 12GB RTX 3060 — for free, without sending any data to anyone.”
— The math that doesn’t add up. June 2026.
Your scenario
Four paths. One is yours.
From the simplest to the most sovereign. ALL models below are open weight (Apache 2.0 or MIT).
EXIT 01 — API SWAP
DeepSeek V4 Flash or Qwen 3.6 via API
Swap endpoints, zero code changes. OpenAI-compatible API. Cost 50-250x lower than Claude, equivalent coding performance. Results in days. DeepSeek V4 Flash: US$ 0.28/1M output. Qwen 3.6: ~US$ 2/1M output.
EXIT 02 — CLOUD GPU
GPT-OSS-120B, Qwen 3.6 or DeepSeek V4 Flash in the cloud
Rent a GPU (H100, A100) on AWS, Azure, RunPod or Spheron. Run vLLM with OpenAI-compatible API. GPT-OSS-120B (5.1B active, 120B MoE) fits on 1× H100. DeepSeek V4 Flash 2× H100. Full data control. Predictable cost. Used in production by General Bots.
EXIT 03 — ON-PREMISE (YOUR GPU)
Qwen 3.6-35B-A3B on RTX 3060 (12GB)
Qwen 3.6-35B-A3B: activates only 3B parameters per token (MoE). With 4-bit quantization, it fits in 4-6GB of VRAM. Runs on your RTX 3060 with 12GB. 73.4% on SWE-bench Verified. DeepSeek V4 Flash (284B MoE, 13B active): 2× RTX 4090. 79% on SWE-bench Verified, 1M context. Inference cost: ZERO. LGPD/GDPR compliance automatic — data never leaves your machine.
EXIT 04 — LAST RESORT
Claude Haiku — if you really have no alternative
If you don’t have a GPU, can’t use the cloud, and don’t want to switch APIs, Haiku is still much cheaper than Opus. But it’s plan Z. Start with any of the three exits above.
REAL READING (NO BULLSHIT)
pragmatismo.com.br
Carl vs Wilson — Two Teenagers, Two AI Philosophies
A deep analysis of how your AI stack choice can define the future of your company. Which side do you choose?
pragmatismo.com.br
The LLM Boom is Over: Enter the Era of Industrial Orchestration
Why the hype cycle is settling and what matters now for sustainable AI — cost, control, and real results.
pragmatismo.com.br
TCO comparison: open source saves up to 87.5% over 5 years vs proprietary stacks. The numbers of freedom.
Full blog: Open source, LLMs & real strategy
Visit → pragmatismo.com.br/blog. No paywall, no empty promises.
NEXT STEP
Free assessment
with Pragmatismo
We analyze your current stack — model, GPU, volume, cost — and map which of the four exits makes sense. No commitment. No sales pitch.
| • Real monthly cost mapping by model and volume |
| • Architecture and GPU recommendation (RTX 3060, 4090 or cloud) |
| • Monthly savings estimate and ROI |
| • LGPD/GDPR compliance analysis for LLMs |
Or directly: contato@pragmatismo.com.br
Pragmatismo • General Bots • Docs
Av. Rio Branco, 177 — Rio de Janeiro, Brazil • +55 21 4040-2160
© 2026 Pragmatismo Inovações Ltda. Benchmarks from June 2026.
🏷️ bigtech 🏷️ buble 🏷️ investiment