Nvidia unveils new GPU designed for long-context inference (1 minute read)

Nvidia has announced a new GPU, the Rubin CPX, designed for context windows larger than 1 million tokens. Meant to be used as part of a broader "disaggregated inference" infrastructure approach, the GPU is optimized for processing long sequences of context, and it performs better on long-context tasks such as video generation and software development. The Rubin CPX will be available at the end of 2026.

The whole point of OpenAI's Responses API is to help them hide reasoning traces (5 minute read)

OpenAI's Responses API replaces the previous /chat/completions API for inference. The new API has many more features, but the main difference is that it is stateful: users no longer have to pass the entire conversation history with each request; they pass an ID representing the state of the conversation, and the provider maintains that state server-side. This allows OpenAI to keep its reasoning traces secret. (A minimal sketch of the stateful pattern follows at the end of this digest.)

The Training Imperative (6 minute read)

Every serious AI company will eventually train its own models. The barrier to doing so is collapsing: distillation, fine-tuning, and post-training get easier every month. Soon, the only way to stay relevant will be to own your own models. (A generic distillation sketch appears below.)

Thoughts on Evals (14 minute read)

Production monitoring reveals real issues that pre-deployment evals will inevitably miss, especially as AI products become more unpredictable and personalized. Evals are collections of already-known failure cases, but agents can and often do fail in ways that produce no error codes. (That framing is illustrated in the toy harness below.)

Introducing the MCP Registry (4 minute read)

The Model Context Protocol Registry provides a standardized way to distribute and discover MCP servers. A community-driven project, it allows organizations to create private enterprise registries while maintaining compatibility through shared API schemas and moderation guidelines. (A discovery example appears below.)

The Gross Margin Debate in AI (8 minute read)

AI companies face widely varying gross margins across the stack. Chipmakers maintain margins of around 70%, while cloud providers see margins pressured by AI investment, estimated at 50-55%. Application-level margins range the most: AI "Supernovas" start around 25% and can even be negative, while others hit 60%, highlighting how pricing strategies and diversified revenue models can improve margins over time. (A worked margin calculation closes the digest.)
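The Responses API item describes the stateful pattern in prose; here is a minimal sketch of it using the official openai Python SDK. The model name is a placeholder, and the sketch assumes OPENAI_API_KEY is set in the environment.

```python
# Minimal sketch: chaining turns with the stateful Responses API
# instead of resending the whole conversation each request.
# Assumes the `openai` Python SDK; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

# First turn: no prior state, just the user input.
first = client.responses.create(
    model="gpt-4.1",  # placeholder model name
    input="Summarize the trade-offs of disaggregated inference.",
)
print(first.output_text)

# Second turn: pass only the previous response's ID. The provider
# restores the conversation state (including any hidden reasoning
# traces) server-side, so the client never sees or resends it.
second = client.responses.create(
    model="gpt-4.1",
    previous_response_id=first.id,
    input="Now give the one-sentence version.",
)
print(second.output_text)
```

The client only ever holds opaque response IDs; the reasoning trace never crosses the wire, which is the article's point.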
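For the Training Imperative item, a reminder of what "distillation" means mechanically. This is the generic textbook technique (Hinton et al.'s softened-distribution loss), not anything specific from the linked article; the tiny linear models and the temperature value are illustrative only.

```python
# Generic knowledge-distillation loss: the student is trained to
# match the teacher's softened output distribution.
import torch
import torch.nn.functional as F

T = 2.0  # softening temperature (illustrative)

teacher = torch.nn.Linear(16, 4)  # stand-in for a large frozen model
student = torch.nn.Linear(16, 4)  # smaller model being trained

x = torch.randn(8, 16)  # a batch of fake inputs

with torch.no_grad():
    teacher_logits = teacher(x)
student_logits = student(x)

# KL divergence between softened distributions, scaled by T^2 as in
# the standard formulation.
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T
loss.backward()
print(f"distillation loss: {loss.item():.4f}")
```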
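The evals item argues that an eval suite is a replay of failures you have already seen. A toy harness makes the framing concrete; every name here (run_eval, the cases, the stand-in agent) is hypothetical.

```python
# Toy illustration of the "evals are known failure cases" framing:
# a fixed list of past failures replayed against the current model.
from typing import Callable

KNOWN_FAILURES = [
    # (prompt, substring the reply must contain to count as fixed)
    ("What is 17 * 24?", "408"),
    ("Reply with valid JSON: {\"ok\": true}", "{\"ok\": true}"),
]

def run_eval(agent: Callable[[str], str]) -> float:
    """Return the pass rate over previously observed failure cases.

    A perfect score only means old bugs stayed fixed; the agent can
    still fail in new ways no case here anticipates, raising no
    error code at all, which is why production monitoring matters.
    """
    passed = sum(expected in agent(prompt) for prompt, expected in KNOWN_FAILURES)
    return passed / len(KNOWN_FAILURES)

if __name__ == "__main__":
    echo_agent = lambda prompt: "408"  # stand-in model for the demo
    print(f"pass rate: {run_eval(echo_agent):.0%}")
```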
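For the MCP Registry item, a sketch of discovering servers through the registry's REST API. The base URL and the response shape are assumptions based on the project's published v0 API; check the registry documentation before relying on either.

```python
# Sketch: list servers from the MCP Registry's discovery endpoint.
# URL and payload shape are assumptions, not confirmed by the item.
import json
from urllib.request import urlopen

REGISTRY_URL = "https://registry.modelcontextprotocol.io/v0/servers"

with urlopen(REGISTRY_URL) as resp:
    payload = json.load(resp)

# Assumed shape: {"servers": [{"name": ..., "description": ...}, ...]}
for server in payload.get("servers", [])[:10]:
    print(server.get("name"), "-", server.get("description", ""))
```

The same schema is what would let a private enterprise registry stay interoperable: a different base URL, the same API surface.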
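The margin figures in the last item all reduce to one formula, gross margin = (revenue - cost of goods sold) / revenue. A worked example with invented dollar figures chosen purely to land in the ranges the item quotes:

```python
# Gross margin = (revenue - COGS) / revenue.
# All figures below are made up to illustrate the quoted ranges
# (chips ~70%, cloud 50-55%, early AI apps ~25%).
def gross_margin(revenue: float, cogs: float) -> float:
    return (revenue - cogs) / revenue

examples = {
    "chipmaker": (100.0, 30.0),       # ~70% margin
    "cloud provider": (100.0, 47.0),  # ~53%, squeezed by AI capex
    "AI app (early)": (100.0, 75.0),  # ~25%, inference costs dominate
}

for name, (revenue, cogs) in examples.items():
    print(f"{name}: {gross_margin(revenue, cogs):.0%}")
```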