Anthropic is about to drop Sonnet 5 during Super Bowl week (2 minute read)
Anthropic plans to release Claude Sonnet 5, with early testing showing strong math and coding capabilities that may outperform Claude Opus 4.5. The model, observed with a 128K context window, aims to be a cost-effective option for developers. Its release may coincide with Super Bowl LX week, consistent with how AI labs time launches against ChatGPT and Google's Gemini.

OpenAI launched Codex for macOS (8 minute read)
OpenAI released a new macOS app for Codex designed to coordinate multiple agents, run tasks in parallel, and manage long-running software projects. It's now available to ChatGPT Free and Go users for a limited time, with doubled rate limits for paid tiers.

xAI Joins SpaceX (2 minute read)
SpaceX announced that xAI is joining its organization, integrating Elon Musk's AI lab with the space company. The collaboration aims to combine advanced AI research with aerospace engineering, potentially accelerating autonomous systems and robotics in space missions. The merger signals a strategic alignment of AI development with real-world hardware and exploration initiatives.

Teaching LLMs to Be Funny (20 minute read)
Tinker recently made it possible to post-train Kimi K2, Moonshot's 1-trillion-parameter model. This post shows how to train the model on a qualitative reward by decomposing jokes into verifiable properties (sketched below). The resulting model can write jokes and explain why they are funny. The models, code, and data needed to replicate the results are available.

Clawdbot's Missing Layers (7 minute read)
E-commerce took a while to take off because the infrastructure that made it safe took time to develop. The same is true of agents: the technology has great potential, but it is currently full of risks. Agents need a security stack, just as e-commerce did, in which each layer handles what the others can't (see the layered-check sketch below). This post walks through the security layers AI agents need; each layer is an opportunity for companies to build the infrastructure that will make the whole ecosystem possible.

Fine-tuning open LLM judges to outperform GPT-5.2 (12 minute read)
Open-source models like GPT-OSS 120B and Qwen3 235B are fine-tuned with Direct Preference Optimization (DPO) to outperform GPT-5.2 on human preference tasks (a minimal DPO sketch appears below). RewardBench 2 is used for evaluation, highlighting areas like Math and Safety where these models excel. Beyond being cost-efficient, the open models offer transparency, enabling better alignment with specific use cases while significantly reducing reliance on expensive closed-source alternatives.

Context Management and MCP (10 minute read)
Context rot is unavoidable with today's models and cannot be engineered away; the best way to deal with it is to use subagents (sketched below). The subagent approach provides considerable flexibility in how problems can be approached. It is not a perfect solution, but it beats other current fixes in many use cases, and it works with the models' limitations rather than against them.

NVIDIA proposes Golden Goose: Unlimited RLVR Tasks (18 minute read)
Golden Goose enables the synthesis of large-scale RL with Verifiable Rewards (RLVR) tasks from unverifiable web text (a toy verifier sketch appears below). The resulting GooseReason dataset helps revive model performance in math, science, and cybersecurity, surpassing prior state of the art in multiple domains.
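To make the "verifiable properties" idea from the joke-training item concrete, here is a minimal, hypothetical Python sketch. The property names and checks are illustrative assumptions, not the post's actual rubric; a real qualitative reward would likely use a model-based grader for most properties.

```python
import re

def joke_reward(joke: str, topic: str) -> float:
    """Return a 0-1 reward: the fraction of verifiable properties satisfied.

    The properties below are assumed for illustration, not taken from the post.
    """
    sentences = [s.strip() for s in re.split(r"[.?!]", joke) if s.strip()]
    punchline = sentences[-1] if sentences else ""
    checks = {
        # The joke should mention the topic it was asked about.
        "on_topic": topic.lower() in joke.lower(),
        # Setup and punchline should be distinct sentences.
        "has_setup_and_punchline": len(sentences) >= 2,
        # The punchline should be short.
        "short_punchline": 0 < len(punchline.split()) <= 15,
        # Avoid trivially short outputs.
        "nontrivial_length": len(joke.split()) >= 8,
    }
    return sum(checks.values()) / len(checks)

print(joke_reward("Why do programmers prefer dark mode? "
                  "Because light attracts bugs.", "programmers"))  # 1.0
```

Decomposing a fuzzy quality like "funny" into checks of this shape is what makes the reward usable for reinforcement learning, since each property can be verified independently.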
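The layered-security idea from "Clawdbot's Missing Layers" can be sketched as a pipeline in which an agent action proceeds only if every layer approves it. The layer names and rules below are illustrative assumptions, not the post's actual taxonomy.

```python
from dataclasses import dataclass

@dataclass
class Action:
    tool: str
    amount_usd: float = 0.0

def identity_layer(action: Action, user_verified: bool) -> bool:
    # Is the agent acting on behalf of a verified principal?
    return user_verified

def authorization_layer(action: Action, allowed_tools: set[str]) -> bool:
    # Is this tool within the agent's granted scope?
    return action.tool in allowed_tools

def spending_layer(action: Action, limit_usd: float) -> bool:
    # Hard cap on financial blast radius.
    return action.amount_usd <= limit_usd

def approve(action: Action) -> bool:
    # Every layer must pass; each one catches what the others can't.
    return all([
        identity_layer(action, user_verified=True),
        authorization_layer(action, allowed_tools={"search", "purchase"}),
        spending_layer(action, limit_usd=50.0),
    ])

print(approve(Action(tool="purchase", amount_usd=20.0)))   # True
print(approve(Action(tool="purchase", amount_usd=500.0)))  # False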
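For the open-judge item, here is a minimal sketch of DPO fine-tuning using Hugging Face TRL's `DPOTrainer`. It assumes a recent TRL API (`DPOConfig`, `processing_class`); the post's actual data, hyperparameters, and models (GPT-OSS 120B, Qwen3 235B) are far larger, and the small model here is only a runnable stand-in.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # stand-in, not a model from the post
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs for a judge: "chosen" is the verdict humans preferred.
train_dataset = Dataset.from_list([
    {
        "prompt": "Question: What is 2+2?\nA: 4\nB: 5\nWhich answer is correct?",
        "chosen": " A",
        "rejected": " B",
    },
])

args = DPOConfig(output_dir="judge-dpo", beta=0.1,
                 per_device_train_batch_size=1)
trainer = DPOTrainer(model=model, args=args,
                     train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```

DPO skips training an explicit reward model: it directly raises the likelihood of the chosen verdict relative to the rejected one, which is why it is a cheap way to specialize an open model as a judge.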
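The subagent pattern from the context-management item can be sketched as an orchestrator that gives each subtask a fresh context and accepts only a compact summary back, so context rot never accumulates in one long conversation. `call_llm` is an assumed stand-in for whatever model client you use; nothing below is from the post itself.

```python
def call_llm(system: str, user: str) -> str:
    raise NotImplementedError("plug in your model client here")

def run_subagent(task: str, relevant_docs: list[str]) -> str:
    # The subagent sees only the slice of context its task needs.
    context = "\n\n".join(relevant_docs)
    answer = call_llm(
        system="You are a focused subagent. Solve only the task you are given.",
        user=f"Task: {task}\n\nContext:\n{context}",
    )
    # Return a compressed result so the orchestrator's context stays small.
    return call_llm(
        system="Summarize the result below in under 100 words.",
        user=answer,
    )

def orchestrate(goal: str, subtasks: list[tuple[str, list[str]]]) -> str:
    summaries = [run_subagent(task, docs) for task, docs in subtasks]
    return call_llm(
        system="Combine the subtask results into a final answer.",
        user=f"Goal: {goal}\n\nSubtask results:\n" + "\n".join(summaries),
    )
```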
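Finally, the core RLVR mechanic behind Golden Goose, rewarding a model only when its answer matches an extracted, checkable ground truth, can be sketched as below. The task-synthesis and extraction steps are stubbed to a single example; the paper's actual pipeline for mining tasks from web text is far more involved.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    # Assume the model is prompted to end with "Answer: <value>".
    m = re.search(r"Answer:\s*(.+)", completion)
    return m.group(1).strip() if m else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    # Binary reward: 1.0 only if the final answer matches exactly.
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == ground_truth else 0.0

# A synthesized task: (question, verifiable answer) mined from web text.
task = {
    "question": "What year was the first transatlantic telegraph cable completed?",
    "ground_truth": "1858",
}
print(verifiable_reward("The cable was finished mid-century. Answer: 1858",
                        task["ground_truth"]))  # 1.0
```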
OpenAI quietly lays groundwork for ads in ChatGPT (2 minute read)
ChatGPT responses now contain references to ads in the source code. While not yet visible in the UI, the added code signals that ChatGPT ads are moving from concept to near-launch, and targeting and eligibility are likely already being tested. OpenAI will sell ads on an impression basis, and early indications suggest they won't be cheap.

Judgment isn't uniquely human (16 minute read)
Experts frequently underestimate AI's capabilities, as shown by Yann LeCun's dismissal of AI learning real-world physics and a recent NYT op-ed's claim that judgment is exclusively human. AI models like GPT-3.5 and tools from Anthropic already demonstrate sophisticated decision-making, challenging these assertions. Studies in Science and Nature also show that AI often outperforms humans in complex judgments, suggesting that AI's role and potential in decision-making contexts deserve reassessment.