qwen3-vl

4 posts connected to this tag.

Aug 1, 2026

What CKE Built In July 2026: Vision, Audio, X-Ray And Faster CPU Inference

A source-linked July recap of 194 PRs and 32 articles: multimodal models, long-context numerical parity, compiler and test hardening, CPU evidence, contributors, and the path toward distribu...

Jul 23, 2026

What Numerical Parity Actually Requires: BF16, Quantization, ISA Drift And CKE X-Ray

Numerical parity is not one BF16 tolerance. Trace how CKE uses llama.cpp, PyTorch, X-Ray, replay provenance and phase-owned kernel contracts to locate the first divergence.
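To make the "first divergence" idea concrete, here is a minimal sketch, assuming two runs that each recorded a few named intermediate tensors in execution order. The stage names, the tolerances, and the `first_divergence` helper are hypothetical illustrations, not CKE's X-Ray or its kernel contracts.

```python
def first_divergence(reference, candidate, tolerances, default_tol=1e-6):
    """Return (stage, max_abs_error) for the first stage outside its tolerance.

    `reference` and `candidate` map stage names to flat lists of floats,
    inserted in execution order. Returns None when every stage is within
    its tolerance.
    """
    for name, ref_vals in reference.items():
        tol = tolerances.get(name, default_tol)
        worst = max(abs(r - c) for r, c in zip(ref_vals, candidate[name]))
        if worst > tol:
            return name, worst
    return None


# Hypothetical recorded stages from a reference run and a candidate run.
reference = {"embed": [0.1, 0.2], "attn": [1.0, 2.0], "mlp": [3.0, 4.0]}
candidate = {"embed": [0.1, 0.2], "attn": [1.0, 2.05], "mlp": [3.5, 4.0]}

# Per-stage tolerances (assumed values): one global number would hide
# the fact that different stages accumulate error differently.
tolerances = {"attn": 0.01}

result = first_divergence(reference, candidate, tolerances)
print(result)
```

The later `mlp` stage has a larger error, but the search stops at `attn` because that is where the two runs first disagree beyond tolerance, which is usually the more useful place to start debugging.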

Jul 19, 2026

Prefill vs Decode in CKE: Why One Model Needs GEMM, GEMV, and Two Execution Plans

Follow an LLM request from tokenization through prefill, KV-cache construction and decode, then see why CPU projections become GEMM or GEMV and how CKE compares them with llama.cpp.
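The GEMM versus GEMV split comes down to the shape of the activations. A minimal pure-Python sketch, assuming a single projection weight matrix and toy sizes (the `gemm` and `gemv` helpers and all values here are illustrative, not CKE or llama.cpp code):

```python
def gemm(x_rows, w):
    """Matrix-matrix product: (tokens x d_in) activations times (d_in x d_out) weights."""
    d_in, d_out = len(w), len(w[0])
    return [[sum(x[k] * w[k][j] for k in range(d_in)) for j in range(d_out)]
            for x in x_rows]


def gemv(x, w):
    """Vector-matrix product: one d_in activation vector times the same weights."""
    d_in, d_out = len(w), len(w[0])
    return [sum(x[k] * w[k][j] for k in range(d_in)) for j in range(d_out)]


# Toy projection weights: d_in = 3, d_out = 2.
w = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]

# Prefill: the whole prompt (3 tokens) is projected at once, so the
# activations form a matrix and the projection is a GEMM.
prompt = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
prefill_out = gemm(prompt, w)

# Decode: only the newest token is projected per step, so the
# activations are a single vector and the projection is a GEMV.
decode_out = gemv(prompt[-1], w)

print(prefill_out)
print(decode_out)
```

The arithmetic per token is identical, so the decode result matches the last row of the prefill result. What changes is the amount of work per weight load, which is why the two phases tend to want different kernels and execution plans.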

Jun 17, 2026

How Qwen3-VL Vision Works: Header Body Footer

This ShivasNotes lab note studies how a real vision-language model turns images into tokens that a language decoder can use. It connects earlier notes on activations, normalization,...

Get my rants delivered to your inbox

I will send new posts as and when I write. No fixed cadence, just engineering notes, rants, and things I am thinking through.

Need an intelligent system to work on real hardware?

Embedded systems · Robotics · Constrained AI · CPU and HPC · Accelerators · Distributed systems

Work With Us / Antshiv Robotics

See engineering services

ShivasNotes

Independent notes on software, systems, AI workflows, robotics, and building Antsand in public.
