// COOKEDBAKEDWORKED.COM

Models · Sun, Jun 28, 2026 · 5 min read

DeepSeek's DSpark speeds up LLM inference; llama.cpp and Cline both ship updates

DeepSeek dropped DSpark, a speculative decoding paper with a matching model on Hugging Face. Meanwhile llama.cpp hit b9828, Cline shipped v4.0.1, and a tiny Mac utility keeps your laptop awake only while agents run.

DeepSeek published DSpark, a speculative decoding technique that claims to speed up LLM inference without changing output quality. The paper landed on Hacker News with 749 points, and a matching model — DeepSeek-V4-Pro-DSpark — appeared on Hugging Face the same day. If the numbers hold up, this matters for anyone running local models: faster tokens, same hardware.

Research worth reading

DSpark is DeepSeek's take on speculative decoding, a technique where a small draft model proposes several tokens ahead and a larger model verifies them in one parallel pass, keeping the tokens it agrees with and cutting wall-clock latency. The PDF is public on GitHub. The Hugging Face model card for DeepSeek-V4-Pro-DSpark is live and ranked #19 on the site, a sign of early traction. CBW has not run benchmarks on it yet.
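For readers new to the idea, here is a minimal sketch of generic greedy speculative decoding. It is not DSpark's algorithm, which we have not reproduced. The two "models" are toy functions invented for illustration (`target_next` and `draft_next` are hypothetical names), and the verification step loops where a real runtime would use one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the large model (hypothetical, for illustration only)."""
    prev = ctx[-2] if len(ctx) > 1 else 0
    return (ctx[-1] * 3 + prev + 1) % 11


def draft_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the small model: wrong whenever the last token is a multiple of 4."""
    guess = target_next(ctx)
    return (guess + 1) % 11 if ctx[-1] % 4 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Return the generated sequence and the number of target verification rounds."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        # 1. The draft model proposes k tokens, one after another.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position (one batched pass in practice).
        rounds += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            accepted.append(expected)
            if proposal[i] != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # All k proposals matched, so the same pass yields one extra token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:goal], rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode(target_next, draft_next, [1, 2], 20)
    print(tokens == greedy_decode(target_next, [1, 2], 20), rounds)
```

The point of the sketch is the acceptance rule: every token that survives is one the target model would have chosen anyway, so the output matches plain greedy decoding while the target runs fewer rounds than it generates tokens. Sampling-based variants use a probabilistic acceptance test instead of exact matching.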

Open-source releases

llama.cpp tagged build b9828 this week. It's a rolling release, not a named milestone, but llama.cpp is the runtime most local-model builders depend on — any update here is worth pulling. Cross-confirmed by Reddit r/LocalLLaMA.

Cline shipped v4.0.1. Cline is the VS Code agent extension that gives an AI assistant access to your file system and terminal. The 4.x line has been a significant rewrite; 4.0.1 is a patch on top of that. If you use Cline for coding tasks, this one is worth updating to.

CrewAI hit 1.15.1. If you're running multi-agent pipelines with CrewAI, this is a routine maintenance release — worth updating but no headline features announced.

Tools

Adrafinil is a small Mac utility that keeps your laptop awake only while an AI agent is actively running, then lets it sleep again. It got 105 points on Hacker News. If you run overnight agent jobs and hate waking up to a sleeping machine that stopped halfway through, this is the fix. It's a lid-closed solution — no need to change energy settings manually.
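We have not looked at how Adrafinil works internally, so the sketch below is a stand-in, not its method. It shows the same "stay awake only while the job runs" idea using macOS's built-in `caffeinate` command, whose `-i` flag blocks idle sleep for as long as the wrapped command is running. The `agent.py` script name is hypothetical, and this sketch makes no attempt at lid-closed behavior.

```python
import subprocess
import sys
from typing import List, Sequence


def caffeinate_argv(job: Sequence[str]) -> List[str]:
    """Wrap a command so macOS holds a no-idle-sleep assertion until it exits."""
    if not job:
        raise ValueError("job command must not be empty")
    # `caffeinate -i <command ...>` runs the command and releases the
    # assertion as soon as the command finishes.
    return ["caffeinate", "-i", *job]


def run_awake(job: Sequence[str]) -> int:
    """Run the job, wrapped in caffeinate on macOS and unwrapped elsewhere."""
    argv = caffeinate_argv(job) if sys.platform == "darwin" else list(job)
    return subprocess.call(argv)


if __name__ == "__main__":
    # Hypothetical overnight agent script; replace with your own command.
    print(caffeinate_argv(["python3", "agent.py"]))
```

Calling `run_awake(["python3", "agent.py"])` on a Mac would keep the machine from idle-sleeping until the script exits, then hand control back to the normal energy settings.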

On Hugging Face Spaces, the LTX 2 LoRA trainer (ltx-community/ltx2-lora-trainer) is climbing the charts at #28, cross-confirmed with the fal/LTX-2.3-3DREAL-LoRA model at #34. LTX is a video generation model; the trainer lets you fine-tune it on your own footage without writing code.

Industry signal

Two Reddit threads worth a skim: r/LocalLLaMA is discussing 96 GB VRAM RTX 5090s appearing in Shenzhen's Huaqiangbei market — modified cards that could change what's possible for local inference if they're real and stable. Separately, a thread noting that Google still ships small models for coding tasks is a useful counterpoint to the 'bigger is always better' narrative.

What builders can do this week

1. Pull the DeepSeek-V4-Pro-DSpark model from Hugging Face and run a simple benchmark against your current local model on a fixed prompt set — time to first token and tokens per second. Post your numbers; there's almost no public data yet.

2. Update Cline to v4.0.1 and set up a simple file-editing agent task — for example, have it refactor a folder of Markdown docs into a consistent format. The 4.x rewrite improved context handling, so tasks that failed before may work now.

3. Install Adrafinil if you run Mac-based overnight agent jobs. Configure it to keep your machine awake only during your CrewAI or llama.cpp runs, then let it sleep. Takes under five minutes to set up.
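For the benchmark in step 1, a small timing harness is enough. The sketch below assumes your runtime can stream tokens as a Python iterator; `fake_generate` is a hypothetical placeholder that you would swap for your own streaming call, and the function and key names are ours, not from any library.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, Iterator, List


def fake_generate(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model call; replace with your runtime."""
    time.sleep(0.02)  # pretend prompt processing
    for _ in range(8):
        yield "tok"
        time.sleep(0.002)  # pretend per-token decode time


def measure(generate: Callable[[str], Iterable[str]], prompt: str) -> Dict[str, float]:
    """Time one streamed generation: time to first token and decode throughput."""
    start = time.perf_counter()  # start before the call so prompt processing counts
    first = last = None
    count = 0
    for _ in generate(prompt):
        last = time.perf_counter()
        if first is None:
            first = last
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_s = last - first
    # Throughput covers tokens after the first, so it is not skewed by prompt processing.
    rate = (count - 1) / decode_s if count > 1 and decode_s > 0 else 0.0
    return {"tokens": count, "ttft_s": first - start, "tokens_per_s": rate}


def benchmark(generate: Callable[[str], Iterable[str]], prompts: List[str]) -> Dict[str, float]:
    """Average both metrics over a fixed prompt set."""
    runs = [measure(generate, p) for p in prompts]
    return {
        "mean_ttft_s": mean(r["ttft_s"] for r in runs),
        "mean_tokens_per_s": mean(r["tokens_per_s"] for r in runs),
    }


if __name__ == "__main__":
    print(benchmark(fake_generate, ["Summarize this paragraph.", "Write a haiku."]))
```

Run the same prompt list through both models on the same machine, and discard the first run of each if your runtime loads weights lazily, so the comparison reflects steady-state speed.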

// what we actually tested

What we can and can't confirm

Confirmed: DeepSeek published the DSpark paper publicly on GitHub and the DeepSeek-V4-Pro-DSpark model is live on Hugging Face.

Not independently verified by CBW: We have not run DSpark benchmarks. The claimed inference speedups come from DeepSeek's own paper — third-party replication is pending.

Not independently verified by CBW: The 96 GB RTX 5090 cards from Huaqiangbei are reported on Reddit only. Modified VRAM cards from grey markets have a history of instability or outright fraud — treat with skepticism until independent teardowns appear.

Worth noting: Cline v4.0.1 and CrewAI 1.15.1 are patch releases. No changelog details were available in the sources we reviewed, so check the GitHub release pages for specifics before upgrading production workflows.

Worth noting: The LTX-2.3-3DREAL-LoRA model and ltx2-lora-trainer Space are trending on Hugging Face but CBW has not tested video output quality.

Source: DeepSeek DSpark paper (GitHub PDF) — https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf

Source: DeepSeek-V4-Pro-DSpark on Hugging Face — https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark

Source: llama.cpp GitHub — https://github.com/ggml-org/llama.cpp

Source: Cline GitHub — https://github.com/cline/cline

Source: Adrafinil on GitHub (Show HN) — https://github.com/kageroumado/adrafinil

Source: Reddit r/LocalLLaMA — 96 GB 5090s from Huaqiangbei — https://www.reddit.com/r/LocalLLaMA/comments/1ugyqsi/96_gig_5090s_from_shenzhens_huaqiangbei/