DeepSeek-Math-V2 (GitHub Repo)
DeepSeek's new math reasoning model achieves gold-level performance on IMO 2025, matching recent results from Google and OpenAI. The approach trains an LLM-based proof verifier that serves as a reward model, incentivizing the generator to verify its step-by-step reasoning rather than just final answers. This addresses the fundamental limitation that correct answers don't guarantee correct reasoning.

OpenAI cuts off Mixpanel after analytics leak exposes API users (3 minute read)
OpenAI API users may be affected by a recent breach at data analytics provider Mixpanel. Only API users are affected; regular ChatGPT users don't need to take any action. The exposed data includes names, approximate locations, operating system and browser details, and user IDs. OpenAI dropped Mixpanel as a result of the attack and is also carrying out a wider security review across its vendor ecosystem.

Google changes Gemini 3 Pro free access limits due to 'high demand' (2 minute read)
Google has updated free-tier access to Gemini 3 Pro. Free users are now only guaranteed basic access, and daily limits may change frequently when using Thinking with 3 Pro. The limits have likely decreased, given current demand across the industry. NotebookLM rolled back free users' access to the new Nano Banana Pro-powered Infographics and Slide Decks and implemented limits for Pro users.

The Moat of the Search Index (2 minute read)
ChatGPT sidesteps Google's search advantage with an "agent" approach: it extracts relevant information from multiple sources and synthesizes an answer, which diminishes Google's traditional search index advantage. This method reduces the impact of any single search result's failure, although it struggles with long-tail, fresh, or SEO-heavy queries. The traditional search engine moat has largely vanished as AI-driven models increasingly blend in search capabilities.
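The "agent" search pattern described in the Moat piece above can be sketched in a few lines: query several sources, extract the relevant snippets from each, and synthesize one answer so that no single failed result dominates. Everything here is a stub for illustration; a real agent would call search APIs and an LLM for extraction and synthesis.

```python
# Toy sketch of agentic search: extract per-source, then synthesize.
# All functions are illustrative stand-ins, not a real product's API.

def extract_relevant(doc: str, query: str) -> str:
    """Toy relevance filter: keep sentences that mention a query term."""
    terms = set(query.lower().split())
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return ". ".join(s for s in sentences if terms & set(s.lower().split()))

def synthesize(snippets: list[str]) -> str:
    """Toy synthesis: merge non-empty snippets; a real agent uses an LLM."""
    usable = [s for s in snippets if s]  # empty/failed sources are ignored
    return " ".join(usable)

def agentic_search(query: str, sources: list[str]) -> str:
    return synthesize([extract_relevant(doc, query) for doc in sources])

if __name__ == "__main__":
    docs = [
        "Transformers use attention. Bananas are yellow.",
        "",  # a failed fetch: the agent degrades gracefully
        "Attention layers weigh token interactions. Weather is mild.",
    ]
    print(agentic_search("attention", docs))
```

The point of the structure is visible in the stubbed failed fetch: because the answer is synthesized across sources, one bad or empty result only shrinks the evidence pool rather than breaking the response.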
How to Create an Effective Prompt for Nano Banana Pro (15 minute read)
Designing a comic is a complex challenge that brings together storytelling, visual structure, stylistic consistency, and the ability to translate abstract concepts into illustrated sequences. This post details how to create a comic using Nano Banana Pro and provides a meta-prompt that readers can use to generate prompts for Nano Banana.

Review of DeepSeek OCR (3 minute read)
DeepSeek-OCR consists of two components: DeepEncoder and a DeepSeek3B-MoE-A570M decoder. In production, it can generate training data for LLMs/VLMs at a scale of 200k+ pages per day. This post presents some initial thoughts from reading the released paper.

Better Agents (GitHub Repo)
Better Agents is a CLI tool and a set of standards for agent building that makes coding assistants experts in any agent framework. The tool generates an AGENTS.md that encodes industry best practices, and the CLI guides users through selecting a programming language, agent framework, coding assistant, LLM provider, and API keys.

Compounding Engineering Plugin (GitHub Repo)
The Compounding Engineering Plugin is a Claude Code plugin that transforms how developers plan, build, and review code using AI-powered tools that systematically improve their development workflow. Compound engineering is the idea that each unit of engineering work should make subsequent units of work easier, not harder; the plugin provides the tools to make that practical.

INTELLECT-3: A 100B+ MoE trained with large-scale RL (10 minute read)
INTELLECT-3 is a 100B+ parameter Mixture-of-Experts model that achieves state-of-the-art performance for its size across math, code, science, and reasoning benchmarks. It was trained with both SFT and RL on top of the GLM 4.5 Air base model, using a diverse and challenging mix of RL environments designed to enhance the model's reasoning and agentic capabilities.
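As a rough illustration of the RL signal behind reasoning models like the one above: a common scheme samples several responses per prompt, scores each with a verifier or reward function, and normalizes rewards within the group so better-than-average responses get positive advantages. This is a generic group-normalized sketch, not INTELLECT-3's exact recipe.

```python
# Group-normalized advantages: a generic, illustrative RL building block.
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage of each sampled response = (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

if __name__ == "__main__":
    # Four sampled responses to one prompt, scored 0/1 by a verifier.
    print(group_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that beat the group average are reinforced and the rest are penalized, which is what lets binary verifier scores drive a policy-gradient update.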
Full details about the training are available.

Open Deep Research (GitHub Repo)
Open Deep Research is an experimental, fully open-source research assistant built on LangGraph that automates deep topic research by planning, gathering, and writing structured markdown reports, via either a human-in-the-loop workflow or a multi-agent architecture, with configurable models, search tools, prompting, and evaluation integration.

Off-the-Rails Cost (1 minute read)
A 'wasted thread' occurs when a model starts emitting large amounts of leaked thinking or repeated tokens, usually forcing users to abandon and revert the thread. 17.8% of all costs incurred by Gemini users in Amp went to wasted tokens, a rate more than 2x worse than Sonnet's and almost 8x worse than Opus's.

Want to advertise in TLDR? 📰 If your company is interested in reaching an audience of AI professionals and decision makers, you may want to advertise with us.

Want to work at TLDR? 💼 Apply here or send a friend's resume to jobs@tldr.tech and get $1k if we hire them!

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Andrew Tan, Ali Aminian, & Jacob Turner