Everyone assumes you need the most expensive model to write good code. They are wrong.
I have spent the last year building a CPU inference engine from scratch—no PyTorch, no TensorFlow, just C and x86 intrinsics. Along the way I learned something: the model that understands you is not the same model that should write your code.
The core insight: Frontier models parse chaos. MiniMax executes specs. Use both.
Why Frontier for Planning?
I type fast. I think faster. The result is word salad—typos, broken English, grammar errors, half-finished thoughts. Frontier models (Claude, GPT) are remarkably good at parsing this chaos and producing coherent specs.
My C-Kernel-Engine has a lot of moving parts. When I add compatibility for a new model, even frontier models get confused. But I can steer them: if they do not get what I want, I escalate to pseudo-code. They parse both.
Level 1: Word Salad
This is how I actually type:
"i want my kenrls to have full llama.cpp aproity. we can creaet pathc code to dump tensors adn tehn compare our nuemriocal otuptu with laama.cpp and eprformance."
Frontier parses this. It understands I want numerical parity verification against llama.cpp with tensor dumps.
Level 2: Pseudo-Code (When AI Doesn't Get It)
Sometimes the word salad is not enough. Then I write pseudo-code:
"first for oru bump covnerter save the side card + quant summary ir1 msut read the quant summar and sidecar and ir template for that mdoel adn parse throught the ops. then see if we ahve a kernel fro evry ops. if not hard fail. for sidecar template ops: // order of operation for sidecar quant summary: // what are the ops if kernel exist: // we have a kernel we can use continue else: // hard fault. downstream will fail."
Notice the typos are still there. Frontier does not care. It extracts the logic: bump converter outputs sidecar + quant summary → IR1 validates kernel coverage → hard fail on missing ops.
This is their strength. Not writing 500 lines of C. Parsing my brain dump at whatever fidelity I give them.
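For reference, here is roughly what that pseudo-code turns into once it goes through the pipeline: a minimal C sketch of the kernel-coverage check. Every name in it (ir_op, kernel_registry_lookup, the hard-coded op list) is an illustrative stand-in, not the real C-Kernel-Engine API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical IR op record; the real sidecar / quant-summary
 * structures carry far more than a bare name. */
typedef struct { const char *name; } ir_op;

/* Stand-in for the kernel registry: nonzero if a kernel exists for the op. */
static int kernel_registry_lookup(const char *op_name) {
    static const char *known[] = { "matmul_int8", "rmsnorm", "rope", "softmax" };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(op_name, known[i]) == 0)
            return 1;
    return 0;
}

/* Walk the ops from the IR template; hard fail on the first missing kernel,
 * because downstream stages would fail anyway. */
static void validate_kernel_coverage(const ir_op *ops, size_t n_ops) {
    for (size_t i = 0; i < n_ops; i++) {
        if (kernel_registry_lookup(ops[i].name))
            continue;                       /* we have a kernel we can use */
        fprintf(stderr, "hard fail: no kernel for op '%s'\n", ops[i].name);
        exit(EXIT_FAILURE);
    }
}

int main(void) {
    ir_op ops[] = { { "matmul_int8" }, { "softmax" } };
    validate_kernel_coverage(ops, sizeof ops / sizeof ops[0]);
    puts("kernel coverage OK");
    return 0;
}
```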
The Problem with Frontier for Implementation
| What I Ask | What Frontier Gives Me |
|---|---|
| "write an int8 matmul with avx intrinsics" | Lecture about memory safety + refactored architecture |
| "add error handling to this function" | Rewrites entire module with "best practices" |
| "just give me the struct definition" | "Here's a comprehensive solution that addresses..." |
MiniMax? It just executes. No moralizing. No refactoring my life choices. I give it a spec, it gives me code.
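For a sense of scale, here is a minimal sketch of what that first spec in the table actually asks for: the inner loop of an int8 matmul written with AVX2 intrinsics. It assumes unsigned 8-bit activations, signed 8-bit weights, and K a multiple of 32; a real kernel also needs saturation handling, tiling, and remainder loops, and this is not the kernel from my engine.

```c
#include <immintrin.h>
#include <stdint.h>

/* Sketch: u8 x i8 dot product, the inner loop of an int8 matmul.
 * Illustrative only: no saturation handling, no cache blocking. */
static int32_t dot_u8i8_avx2(const uint8_t *a, const int8_t *b, int K) {
    __m256i acc = _mm256_setzero_si256();
    const __m256i ones = _mm256_set1_epi16(1);
    for (int k = 0; k < K; k += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + k));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + k));
        /* u8 * i8 products, pairwise-summed into 16 x i16 */
        __m256i p16 = _mm256_maddubs_epi16(va, vb);
        /* widen i16 pairs to i32 and accumulate */
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(p16, ones));
    }
    /* horizontal reduction of the 8 i32 lanes */
    __m128i lo = _mm256_castsi256_si128(acc);
    __m128i hi = _mm256_extracti128_si256(acc, 1);
    __m128i s  = _mm_add_epi32(lo, hi);
    s = _mm_hadd_epi32(s, s);
    s = _mm_hadd_epi32(s, s);
    return _mm_cvtsi128_si32(s);
}
```

Give MiniMax a spec at this level of detail and it returns the function, not a lecture.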
The Workflow
I treat AI like a compiler pipeline. High-level language goes in, machine code comes out. Frontier is the roughly 10 percent that is planning and debugging; MiniMax is the 90 percent that is implementation.
Why 90/10?
When I was at Ericsson building carrier-grade packet processing, we separated control plane from data plane. You do not run configuration logic through the same path as 10Gbps traffic.
Same principle here:
| | Frontier (Control Plane) | MiniMax (Data Plane) |
|---|---|---|
| Role | Architect / Janitor | Builder / Worker |
| Input | Word salad, typos, chaos | Strict specs, pseudo-code |
| Obedience | Low (argues, lectures) | High (blind execution) |
| Cost | $$$ | Free / negligible |
| Context Handling | Skims, summarizes | Actually reads the code |
Real Example: Adding New Model Compatibility
When I add support for a new model to my C-Kernel-Engine, there are a lot of moving parts. I have done this for GPT-2, Qwen2, Qwen3—now working on Gemma:
| Phase | Model | Task |
|---|---|---|
| 1. Understand | Claude/GPT | Dump my word salad about the new model. Claude parses chaos → coherent implementation plan. |
| 2. Kernel Parity | MiniMax | Implement the actual kernels. Create patch code to dump tensors at each layer. |
| 3. Verification Harness | MiniMax | Write tests that compare our numerical output against llama.cpp and PyTorch. Tensor-by-tensor parity (see the sketch below). |
| 4. IR Stitching | MiniMax | Wire up the layers—sidecar parsing, quant summary validation, kernel coverage checks. |
| 5. Debug | Claude/GPT | When parity breaks (it always does), Claude helps identify where. MiniMax fixes. |
Notice the pattern: Claude touches it twice (beginning and end). MiniMax does everything in the middle. That is the 90/10.
Why MiniMax specifically? Large context window, follows instructions without arguing, fast inference. When I dump my entire header file history into context, it actually processes it. Frontier models skim and summarize.
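Here is what phases 2 and 3 boil down to in practice: dump every layer's output to disk, then diff it against the llama.cpp or PyTorch dump of the same layer. This is a minimal sketch; the paths, naming, and tolerance are illustrative, not the harness's real interface.

```c
#include <math.h>
#include <stdio.h>

/* Dump a tensor so it can be diffed against a llama.cpp / PyTorch dump
 * of the same layer. Path layout is illustrative. */
static void dump_tensor(const char *tag, const float *t, size_t n) {
    char path[256];
    snprintf(path, sizeof path, "dumps/%s.bin", tag);
    FILE *f = fopen(path, "wb");
    if (!f) { perror(path); return; }
    fwrite(t, sizeof *t, n, f);
    fclose(f);
}

/* Parity check: worst absolute difference against the reference tensor. */
static float max_abs_diff(const float *ours, const float *ref, size_t n) {
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = fabsf(ours[i] - ref[i]);
        if (d > worst)
            worst = d;
    }
    return worst;   /* compare against a per-layer tolerance, e.g. 1e-4f */
}
```

When the difference blows past the tolerance at some layer, that is when Claude comes back in to figure out which kernel drifted (phase 5), and MiniMax fixes it.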
cc-switch: Terminal Workflow
I do not use browser interfaces. AwesomeWM for a decade. Terminal is home.
cc-switch lets me toggle backends from the command line:
| Agility | Route "design this" to Claude, "implement this" to MiniMax, same terminal |
| Learning | Compare outputs across model families—helps me design better |
| Privacy | Sensitive kernel code stays on trusted infrastructure |
Sustainable Independence
This is not about saving money. It is philosophy.
I ditched Vue and Node for pure JS and PHP. I build inference in C, not PyTorch. Dependencies are debt. Every abstraction is a liability. Same with AI providers—relying on one is vendor lock-in. Relying on a family of models is strategy.
Where this is going: Eventually frontier is just for sanity checks. MiniMax (or whatever open model is best) does 99%. That is a great place to be.
Final
I am a solo dev who achieved PyTorch numerical parity for GPT-2, Qwen2, and Qwen3 in pure C. Now working on Gemma. Started 60x slower than llama.cpp—now 1.5x slower. Closing the gap in both numerical parity and raw performance.
I write code at 4am because that is when my mind is clearest. My typing is chaos but my systems work.
The industry wants you to believe you need the biggest model for everything. You do not.
Plan with frontier. Build with MiniMax. Own your workflow.