Building more with GPT-5.1-Codex-Max (7 minute read)
GPT-5.1-Codex-Max is trained to operate across multiple context windows through "compaction", allowing it to work over millions of tokens and complete tasks spanning 24+ hours. The model achieves 77.9% on SWE-bench Verified while using 30% fewer thinking tokens than its predecessor. (A toy sketch of a compaction loop appears below.)

Google Has Your Data. Gemini Barely Uses It (13 minute read)
Google is dramatically under-capitalizing on the strongest context position in the industry. Gemini hides its Workspace connector in its settings, treating it as an optional enhancement rather than the center of the product, almost certainly because Google is trying to play it safe. The company has an opportunity to turn its dormant context advantage into experiences that would be impossible anywhere else.

My GPT-5.1 Pro Review (10 minute read)
GPT-5.1 Pro is a slow, heavyweight reasoning model that feels intelligent and can handle tough problems. It follows instructions well without going off the rails, making it feel like a contract engineer working from a spec rather than an assistant. Its biggest weakness is its interface: it lives in ChatGPT, not in the IDE. Gemini 3 is still better for most day-to-day work, but GPT-5.1 Pro wins in deep thought, planning, and research.

How evals drive the next chapter in AI for businesses (9 minute read)
OpenAI published a framework arguing that evaluation systems ("evals") are the bridge between AI's probabilistic nature and business outcomes. The framework breaks down into three phases: 1) experts defining what success looks like, 2) stress-testing systems against real-world edge cases, and 3) continuous monitoring that builds proprietary datasets which compound into competitive moats. (A toy eval harness illustrating the phases appears below.)

Gemini 3 Prompting: Best Practices for General Usage (6 minute read)
Gemini 3 Pro responds best to direct, structured prompts with behavioral constraints placed at the top rather than scattered throughout. Unlike previous versions, it defaults to concise responses unless explicitly asked to be conversational. The model handles long contexts better when instructions appear after the data rather than before, and it treats multimodal inputs as first-class data requiring explicit cross-modal instructions. It benefits from explicit planning steps with self-critique loops using XML or Markdown formatting. (A sketch of this prompt layout appears below.)

Scientific ML in PyTorch (5 minute read)
PINA, a new open-source library for Scientific Machine Learning, is now part of the PyTorch ecosystem. It offers a modular, scalable workflow for modeling scientific systems, including PDE solvers and physical simulations.

The secret behind Gemini 3 (1 minute read)
The secret behind Gemini 3 is improved pre-training and post-training. The pre-training team delivered a massive jump through scaling, while post-training is still a near-total greenfield with a lot of room for algorithmic progress.

"We're in an LLM bubble," Hugging Face CEO says—but not an AI one (3 minute read)
Clem Delangue, CEO of Hugging Face, says the LLM bubble may burst next year. That doesn't mean AI will collapse: LLMs are just a subset of AI technology, and we are still at the very beginning of the field. It is more likely that we end up with a multitude of models solving many different problems than with one type of model that solves all problems for all companies.
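The Codex-Max announcement describes "compaction" only at a high level. As a rough illustration of the general idea, here is a minimal Python sketch: when the message history nears the context window, it is condensed into a fresh, smaller context and the agent keeps going. Every name here (`run_agent`, `step_fn`, `summarize`, the characters-per-token heuristic) is a hypothetical assumption, not OpenAI's implementation.

```python
# All names and heuristics below are illustrative assumptions,
# not OpenAI's actual compaction mechanism.

def count_tokens(messages):
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages, summarize):
    # Collapse the whole history into one compact summary message.
    summary = summarize(messages)  # e.g. an LLM call that condenses the history
    return [{"role": "system", "content": "Summary of work so far: " + summary}]

def run_agent(task, step_fn, summarize, window_limit=100_000):
    messages = [{"role": "user", "content": task}]
    while True:
        reply, done = step_fn(messages)  # one model step over the current context
        messages.append({"role": "assistant", "content": reply})
        if done:
            return messages
        # Near the window limit, compact and continue in a fresh context,
        # so total work can span many windows and millions of tokens.
        if count_tokens(messages) > window_limit:
            messages = compact(messages, summarize)
```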
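To make the three eval phases concrete, here is a toy Python harness. `EvalCase`, `run_evals`, and the pass/fail lambdas are assumed names for illustration, not part of OpenAI's framework: experts encode success as checks, edge cases are tagged for stress testing, and every run appends to a monitoring log that accumulates into a proprietary dataset.

```python
# Toy sketch of the three-phase eval loop; all names are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    passes: Callable[[str], bool]  # phase 1: expert-defined success criterion
    is_edge_case: bool = False     # phase 2: tag real-world stress tests

def run_evals(model: Callable[[str], str], cases: list[EvalCase], log: list) -> float:
    passed = 0
    for case in cases:
        output = model(case.prompt)
        ok = case.passes(output)
        passed += ok
        # Phase 3: continuous monitoring. Every run appends to a log that
        # accumulates into a proprietary (prompt, output, verdict) dataset.
        log.append({"prompt": case.prompt, "output": output, "ok": ok})
    return passed / len(cases)

cases = [
    EvalCase("What is the refund window?", lambda out: "30 days" in out),
    EvalCase("Refund for an item bought in 1997?",
             lambda out: "no" in out.lower(), is_edge_case=True),
]
monitoring_log: list = []
score = run_evals(lambda prompt: "Refunds are accepted within 30 days.",
                  cases, monitoring_log)
print(f"pass rate: {score:.0%}, logged runs: {len(monitoring_log)}")
```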
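The Gemini 3 prompting advice translates directly into prompt layout: constraints first, data next, instructions and a self-critique step last. Below is one possible way to apply it; the XML tag names and the `build_prompt` helper are illustrative assumptions, not an official Gemini 3 schema.

```python
# Hypothetical helper; tag names and layout are assumptions, not an
# official Gemini 3 schema.

def build_prompt(constraints: list[str], documents: list[str], task: str) -> str:
    parts = ["<constraints>"]
    # 1) Behavioral constraints go at the top, not scattered through the prompt.
    parts += [f"- {c}" for c in constraints]
    parts.append("</constraints>")
    # 2) Long-context data comes next.
    parts.append("<documents>")
    parts += [f"<doc id='{i}'>{doc}</doc>" for i, doc in enumerate(documents)]
    parts.append("</documents>")
    # 3) The actual instructions come *after* the data, with an explicit
    #    plan-then-critique step.
    parts += [
        "<task>",
        task,
        "First draft a step-by-step plan, critique it, then answer concisely.",
        "</task>",
    ]
    return "\n".join(parts)

print(build_prompt(
    constraints=["Answer concisely.", "Cite the doc id for every claim."],
    documents=["Q3 revenue grew 12%.", "Churn fell to 2.1%."],
    task="Summarize the quarter's performance.",
))
```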
alphaXiv raises $7M in funding to become the GitHub of AI research (3 minute read)
alphaXiv has raised $7 million in seed funding in a round co-led by Menlo Ventures and Haystack. Its platform lets researchers publish their latest papers and connects them with engineers who turn that knowledge into new AI features, streamlining the path from research to production. alphaXiv aims to become the de facto global workspace for AI researchers.