Google DeepMind Hires Former CTO of Boston Dynamics as the Company Pushes Deeper Into Robotics (2 minute read)
Google DeepMind has hired Boston Dynamics' former chief technology officer, Aaron Saunders, as its VP of hardware engineering. Saunders is a key part of DeepMind CEO Demis Hassabis' vision for Gemini to become a sort of robot operating system. Hassabis is aiming to build an AI system that works almost out of the box across any body configuration. Boston Dynamics is famous for developing legged robots and humanoid machines capable of impressive acrobatic feats.

Google Starts to Bridge OpenAI's Product Moat (4 minute read)
Gemini's Dynamic view option takes text-based answers and wraps them in an interactive, visual output. The product is still in Labs and has yet to launch. Despite the bland name, Dynamic view produces impressive results that are hard to describe in words; the article includes examples of the outputs the feature can produce.

What OpenAI Did When ChatGPT Users Lost Touch With Reality (12 minute read)
The New York Times revealed OpenAI's internal struggle between user engagement and safety after the company overruled its Model Behavior team's warnings and released a sycophantic April update to GPT-4o that made users return more frequently. The company now faces five wrongful death lawsuits and declared a "Code Orange" in October after discovering that its safer GPT-5 model was losing users, with executives calling it "the greatest competitive pressure we've ever seen."

Benchmark Scores = General Capability + Claudiness (8 minute read)
In a 'deep' world, there is a single underlying ability that governs how well models do at superficially unrelated tasks. If a model developer makes this ability go up, their model gets better at everything. In a 'contingent' world, there are many orthogonal abilities, so model developers have to do completely unrelated work to improve a model on each one. Anthropic has focused on making models that are state-of-the-art at agentic coding, but this hasn't produced models that are exceptional in other areas. There is some generalization across tasks, but it is limited, suggesting that models live in a 'contingent' world. (A toy decomposition sketch appears after the Nano Banana Pro item below.)

How LLM Inference Works (20 minute read)
Large language models (LLMs) are neural networks built on the transformer architecture. Transformers analyze entire sequences in parallel, evaluating how each word relates to the rest of the sequence, not just its neighboring words. This article walks through LLM inference, covering token embeddings, the transformer architecture, the inference phases, matrix multiplication, precision and quantization, and much more. (A minimal attention sketch appears below.)

How to Run Product Evals (9 minute read)
A practical guide to evaluating LLM-powered products that covers how to label data, align evaluators, and iterate on configuration changes with minimal overhead. (A small evaluator-alignment sketch appears below.)

Complete Developer Tutorial for Nano Banana Pro (15 minute read)
Nano Banana Pro opens up a new frontier for AI image generation. It can think, search, and render in 4K, making it a tool for serious creators. It is now available to try in Google AI Studio. This guide covers the next-generation model's advanced features using the Gemini Developer API. (A hedged API-call sketch follows below.)
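To make the 'deep vs. contingent' framing from the Benchmark Scores piece concrete, here is a toy decomposition: if a single shared factor explains most of the variance in a model-by-benchmark score matrix, the world looks 'deep'; large model-specific residuals ("Claudiness") point to a 'contingent' world. The data and sizes below are invented for illustration, not the article's analysis.

```python
import numpy as np

# Toy illustration (not the article's data): rows are models,
# columns are benchmarks. In a "deep" world one latent ability should
# explain most of the variance; in a "contingent" world it should not.
scores = np.array([
    [0.82, 0.75, 0.88, 0.91],   # hypothetical model A
    [0.70, 0.66, 0.79, 0.74],   # hypothetical model B
    [0.55, 0.52, 0.60, 0.58],   # hypothetical model C
])

# Center each benchmark, then take a rank-1 SVD: the first singular
# vector plays the role of "general capability", and the residual
# matrix holds model-specific quirks ("Claudiness").
centered = scores - scores.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
rank1 = s[0] * np.outer(u[:, 0], vt[0])
residual = centered - rank1

explained = s[0] ** 2 / np.sum(s ** 2)
print(f"variance explained by one shared factor: {explained:.2%}")
print("model-specific residuals:\n", residual.round(3))
```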
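For the LLM-inference explainer, a minimal single-head self-attention sketch in NumPy shows the parallel, all-pairs token interaction the article describes. Weights are random and sizes are toy values, so this illustrates the mechanics only, not a real model.

```python
import numpy as np

# Minimal single-head self-attention: every position attends over the
# whole sequence at once, which is the parallelism the article
# contrasts with word-by-word processing.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8   # toy sizes, chosen arbitrarily

x = rng.normal(size=(seq_len, d_model))          # token embeddings
w_q = rng.normal(size=(d_model, d_head))
w_k = rng.normal(size=(d_model, d_head))
w_v = rng.normal(size=(d_model, d_head))

q, k, v = x @ w_q, x @ w_k, x @ w_v              # three matmuls
attn = q @ k.T / np.sqrt(d_head)                 # all-pairs relevance

# Causal mask: during inference each token may only look at itself
# and earlier tokens, never ahead.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
attn[mask] = -np.inf

weights = np.exp(attn - attn.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
out = weights @ v                                # weighted mix of values
print(out.shape)                                 # (seq_len, d_head)
```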
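For the product-evals guide, a tiny sketch of one step it emphasizes: aligning an automated evaluator against human labels before trusting it at scale. The labels and the 90% threshold below are invented for illustration.

```python
# Hypothetical evaluator-alignment check: before letting an automated
# judge grade configuration changes at scale, compare its labels
# against a small hand-labeled set.
human_labels = {"q1": "pass", "q2": "fail", "q3": "pass", "q4": "pass"}
judge_labels = {"q1": "pass", "q2": "fail", "q3": "fail", "q4": "pass"}

agreements = [human_labels[k] == judge_labels[k] for k in human_labels]
agreement_rate = sum(agreements) / len(agreements)
print(f"judge/human agreement: {agreement_rate:.0%}")

if agreement_rate < 0.9:   # illustrative threshold
    print("judge not yet aligned; review disagreements before scaling")
```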
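For the Nano Banana Pro tutorial, a hedged sketch of an image-generation call through the google-genai Python SDK. The model identifier here is an assumption based on the article's naming, not confirmed by it; check the current Gemini API docs before relying on it.

```python
# Hedged sketch of calling an image-capable Gemini model via the
# google-genai Python SDK. The model id is an assumption, not taken
# from the article.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed id for Nano Banana Pro
    contents="A watercolor map of the Amalfi coast, labeled in Italian",
)

# Image bytes come back as inline data parts alongside any text parts.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("output.png", "wb") as f:
            f.write(part.inline_data.data)
    elif part.text:
        print(part.text)
```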
MCP Apps: Extending servers with interactive user interfaces (11 minute read)
The MCP Apps Extension (SEP-1865) standardizes support for interactive user interfaces in the Model Context Protocol. It addresses one of the most requested features from the MCP community: the ability for MCP servers to deliver interactive user interfaces to hosts. The extension introduces a standardized pattern for declaring UI resources, linking them to tools, and enabling bidirectional communication between embedded interfaces and the host application. (An illustrative sketch of these pieces appears at the end of this issue.)

Agent Design Is Still Hard (16 minute read)
Building agents is still messy. Abstractions break once you hit real tool use. Caching works better when self-managed. Reinforcement does more heavy lifting than expected. Output tooling is surprisingly tricky. Model choice still depends on the task.

Olmo 3 From Scratch (GitHub Repo)
Sebastian Raschka added a standalone notebook implementing Allen AI's OLMo 3 model architecture from scratch to his "LLMs from Scratch" repository, joining similar tutorials for Qwen 3 and Gemma 3.

Discussing Blackwell's drawbacks and dissecting its architecture (42 minute read)
Nvidia's greatest moat lies in having handled much of the 'dirty work' cleanly across its entire architecture and in combining full-stack capabilities from algorithms to systems to chips. It also had excellent timing in bringing architectures to market and strong marketing execution. However, every architecture has its trade-offs and shortcomings. This post looks at some of the issues within Nvidia's products and discusses potential evolutionary directions.

The space of intelligences is large (2 minute read)
Large language models think very differently from animals. The biggest difference is the optimization pressure that produced them: evolution in the case of animals versus training objectives in the case of LLMs. People who build a good internal model of this new kind of intelligent entity will be better equipped to reason about it and make predictions about it.

LLM Council (GitHub Repo)
LLM Council is a vibe-coded local web app that queries multiple frontier models simultaneously, then has each model anonymously rank the others' responses before a "Chairman" LLM synthesizes a final answer. (A minimal sketch of the flow follows below.)
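A minimal sketch of the council flow the repo describes; query_model is a stand-in for whatever model client the app actually uses, so every name and prompt here is illustrative.

```python
import random

def query_model(model: str, prompt: str) -> str:
    # Stub standing in for a real API call to a frontier model.
    return f"[{model}] answer to: {prompt}"

def council(question: str, members: list[str], chairman: str) -> str:
    # 1. Fan the question out to every council member.
    answers = {m: query_model(m, question) for m in members}

    # 2. Anonymize: members rank responses without knowing authorship.
    shuffled = list(answers.values())
    random.shuffle(shuffled)
    rankings = {
        m: query_model(m, "Rank these responses:\n" + "\n".join(shuffled))
        for m in members
    }

    # 3. The chairman synthesizes answers plus rankings into one reply.
    brief = "\n".join(shuffled) + "\n" + "\n".join(rankings.values())
    return query_model(chairman, f"Synthesize a final answer:\n{brief}")

print(council("Why is the sky blue?", ["model-a", "model-b"], "model-c"))
```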
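To ground the MCP Apps summary, an illustrative sketch of the three pieces it names: a UI resource, a tool linked to it, and a UI-to-host message. The field names below are assumptions about the shape of such declarations, not the normative SEP-1865 schema; consult the spec for the real wire format.

```python
# Illustrative shapes only: these dicts mimic what an MCP server might
# declare under the MCP Apps extension. Field names are assumptions.
ui_resource = {
    "uri": "ui://charts/scatter",   # hypothetical UI resource URI
    "mimeType": "text/html",        # an embedded HTML interface
    "name": "Scatter plot viewer",
}

tool_declaration = {
    "name": "plot_points",
    "description": "Render a scatter plot of the given points",
    # Hypothetical link from the tool to the UI resource it renders
    # into; the host fetches the resource and embeds it for the user.
    "ui": ui_resource["uri"],
}

# Bidirectional communication: the embedded interface and the host
# exchange JSON-RPC style messages, e.g. the UI asking the host to
# invoke a tool. A sketch of intent, not spec wire format.
message_from_ui = {"method": "tools/call", "params": {"name": "plot_points"}}
```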