Everyone assumes you need the most expensive model to write good code. They are wrong.
I have spent the last year building a CPU inference engine from scratch—no PyTorch, no TensorFlow, just C and x86 intrinsics. Along the way I learned something: the model that understands you is not the same model that should write your code.
The core insight: Frontier models parse chaos. MiniMax executes specs. Use both.
Why Frontier for Planning?
I type fast. I think faster. The result is word salad—typos, broken English, grammar errors, half-finished thoughts. Frontier models (Claude, GPT) are remarkably good at parsing this chaos and producing coherent specs.
My C-Kernel-Engine has a lot of moving parts. When I add compatibility for a new model, even frontier models get confused. But I can steer them: if they do not get what I want, I escalate to pseudo-code. They parse both.
Level 1: Word Salad
This is how I actually type:
"i want my kenrls to have full llama.cpp aproity. we can creaet pathc code to dump tensors adn tehn compare our nuemriocal otuptu with laama.cpp and eprformance."
Frontier parses this. It understands I want numerical parity verification against llama.cpp with tensor dumps.
Level 2: Pseudo-Code (When AI Doesn't Get It)
Sometimes the word salad is not enough. Then I write pseudo-code:
"first for oru bump covnerter save the side card + quant summary ir1 msut read the quant summar and sidecar and ir template for that mdoel adn parse throught the ops. then see if we ahve a kernel fro evry ops. if not hard fail. for sidecar template ops: // order of operation for sidecar quant summary: // what are the ops if kernel exist: // we have a kernel we can use continue else: // hard fault. downstream will fail."
Notice the typos are still there. Frontier does not care. It extracts the logic: bump converter outputs sidecar + quant summary → IR1 validates kernel coverage → hard fail on missing ops.
This is their strength. Not writing 500 lines of C. Parsing my brain dump at whatever fidelity I give them.
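For reference, here is roughly what that pseudo-code turns into once it goes through the pipeline: a minimal C sketch of the kernel-coverage check. Every name in it (ir_op, kernel_registry_lookup, the hard-coded op list) is an illustrative stand-in, not the real C-Kernel-Engine API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical IR op record; the real sidecar / quant-summary
 * structures carry far more than a bare name. */
typedef struct { const char *name; } ir_op;

/* Stand-in for the kernel registry: nonzero if a kernel exists for the op. */
static int kernel_registry_lookup(const char *op_name) {
    static const char *known[] = { "matmul_int8", "rmsnorm", "rope", "softmax" };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(op_name, known[i]) == 0)
            return 1;
    return 0;
}

/* Walk the ops from the IR template; hard fail on the first missing kernel,
 * because downstream stages would fail anyway. */
static void validate_kernel_coverage(const ir_op *ops, size_t n_ops) {
    for (size_t i = 0; i < n_ops; i++) {
        if (kernel_registry_lookup(ops[i].name))
            continue;                       /* we have a kernel we can use */
        fprintf(stderr, "hard fail: no kernel for op '%s'\n", ops[i].name);
        exit(EXIT_FAILURE);
    }
}

int main(void) {
    ir_op ops[] = { { "matmul_int8" }, { "softmax" } };
    validate_kernel_coverage(ops, sizeof ops / sizeof ops[0]);
    puts("kernel coverage OK");
    return 0;
}
```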
The Problem with Frontier for Implementation
| What I Ask | What Frontier Gives Me |
|---|---|
| "write an int8 matmul with avx intrinsics" | Lecture about memory safety + refactored architecture |
| "add error handling to this function" | Rewrites entire module with "best practices" |
| "just give me the struct definition" | "Here's a comprehensive solution that addresses..." |
MiniMax? It just executes. No moralizing. No refactoring my life choices. I give it a spec, it gives me code.
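For a sense of scale, here is a minimal sketch of what that first spec in the table actually asks for: the inner loop of an int8 matmul written with AVX2 intrinsics. It assumes unsigned 8-bit activations, signed 8-bit weights, and K a multiple of 32; a real kernel also needs saturation handling, tiling, and remainder loops, and this is not the kernel from my engine.

```c
#include <immintrin.h>
#include <stdint.h>

/* Sketch: u8 x i8 dot product, the inner loop of an int8 matmul.
 * Illustrative only: no saturation handling, no cache blocking. */
static int32_t dot_u8i8_avx2(const uint8_t *a, const int8_t *b, int K) {
    __m256i acc = _mm256_setzero_si256();
    const __m256i ones = _mm256_set1_epi16(1);
    for (int k = 0; k < K; k += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + k));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + k));
        /* u8 * i8 products, pairwise-summed into 16 x i16 */
        __m256i p16 = _mm256_maddubs_epi16(va, vb);
        /* widen i16 pairs to i32 and accumulate */
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(p16, ones));
    }
    /* horizontal reduction of the 8 i32 lanes */
    __m128i lo = _mm256_castsi256_si128(acc);
    __m128i hi = _mm256_extracti128_si256(acc, 1);
    __m128i s  = _mm_add_epi32(lo, hi);
    s = _mm_hadd_epi32(s, s);
    s = _mm_hadd_epi32(s, s);
    return _mm_cvtsi128_si32(s);
}
```

Give MiniMax a spec at this level of detail and it returns the function, not a lecture.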
The Workflow
I treat AI like a compiler pipeline. High-level language goes in, machine code comes out. Frontier is the roughly 10 percent that is planning and debugging; MiniMax is the 90 percent that is implementation.
Why 90/10?
When I was at Ericsson building carrier-grade packet processing, we separated control plane from data plane. You do not run configuration logic through the same path as 10Gbps traffic.
Same principle here:
| | Frontier (Control Plane) | MiniMax (Data Plane) |
|---|---|---|
| Role | Architect / Janitor | Builder / Worker |
| Input | Word salad, typos, chaos | Strict specs, pseudo-code |
| Obedience | Low (argues, lectures) | High (blind execution) |
| Cost | $$$ | Free / negligible |
| Context Handling | Skims, summarizes | Actually reads the code |
Real Example: Adding New Model Compatibility
When I add support for a new model to my C-Kernel-Engine, there are a lot of moving parts. I have done this for GPT-2, Qwen2, Qwen3—now working on Gemma:
| Phase | Model | Task |
|---|---|---|
| 1. Understand | Claude/GPT | Dump my word salad about the new model. Claude parses chaos → coherent implementation plan. |
| 2. Kernel Parity | MiniMax | Implement the actual kernels. Create patch code to dump tensors at each layer. |
| 3. Verification Harness | MiniMax | Write tests that compare our numerical output against llama.cpp and PyTorch. Tensor-by-tensor parity (see the sketch below). |
| 4. IR Stitching | MiniMax | Wire up the layers—sidecar parsing, quant summary validation, kernel coverage checks. |
| 5. Debug | Claude/GPT | When parity breaks (it always does), Claude helps identify where. MiniMax fixes. |
Notice the pattern: Claude touches it twice (beginning and end). MiniMax does everything in the middle. That is the 90/10.
Why MiniMax specifically? Large context window, follows instructions without arguing, fast inference. When I dump my entire header file history into context, it actually processes it. Frontier models skim and summarize.
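Here is what phases 2 and 3 boil down to in practice: dump every layer's output to disk, then diff it against the llama.cpp or PyTorch dump of the same layer. This is a minimal sketch; the paths, naming, and tolerance are illustrative, not the harness's real interface.

```c
#include <math.h>
#include <stdio.h>

/* Dump a tensor so it can be diffed against a llama.cpp / PyTorch dump
 * of the same layer. Path layout is illustrative. */
static void dump_tensor(const char *tag, const float *t, size_t n) {
    char path[256];
    snprintf(path, sizeof path, "dumps/%s.bin", tag);
    FILE *f = fopen(path, "wb");
    if (!f) { perror(path); return; }
    fwrite(t, sizeof *t, n, f);
    fclose(f);
}

/* Parity check: worst absolute difference against the reference tensor. */
static float max_abs_diff(const float *ours, const float *ref, size_t n) {
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = fabsf(ours[i] - ref[i]);
        if (d > worst)
            worst = d;
    }
    return worst;   /* compare against a per-layer tolerance, e.g. 1e-4f */
}
```

When the difference blows past the tolerance at some layer, that is when Claude comes back in to figure out which kernel drifted (phase 5), and MiniMax fixes it.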
cc-switch: Terminal Workflow
I do not use browser interfaces. AwesomeWM for a decade. Terminal is home.
cc-switch lets me toggle backends from the command line:
| Agility | Route "design this" to Claude, "implement this" to MiniMax, same terminal |
| Learning | Compare outputs across model families—helps me design better |
| Privacy | Sensitive kernel code stays on trusted infrastructure |
Sustainable Independence
This is not about saving money. It is philosophy.
I ditched Vue and Node for pure JS and PHP. I build inference in C, not PyTorch. Dependencies are debt. Every abstraction is a liability. Same with AI providers—relying on one is vendor lock-in. Relying on a family of models is strategy.
Where this is going: Eventually frontier is just for sanity checks. MiniMax (or whatever open model is best) does 99%. That is a great place to be.
Final
I am a solo dev who achieved PyTorch numerical parity for GPT-2, Qwen2, and Qwen3 in pure C. Now working on Gemma. Started 60x slower than llama.cpp—now 1.5x slower. Closing the gap in both numerical parity and raw performance.
I write code at 4am because that is when my mind is clearest. My typing is chaos but my systems work.
The industry wants you to believe you need the biggest model for everything. You do not.
Plan with frontier. Build with MiniMax. Own your workflow.