Introducing Codex Plugin for Claude Code (3 minute read) The Codex plugin for Claude Code gives users a simple way to bring Codex into their Claude Code workflow. It can be used for standard Codex reviews, for a more adversarial review, or for handing work off to Codex when a second pass from a different agent is needed. The plugin delegates through the local Codex CLI and Codex app server, so it reuses the system's existing local authentication, configuration, environment, and MCP setup. | Qwen3.5-Omni: Scaling Up, Toward Native Omni-Modal AGI (94 minute read) Qwen3.5-Omni is a fully omni-modal large language model that understands text, images, audio, and audio-visual content. It can process more than 10 hours of audio input and over 400 seconds of 720p audio-visual input at 1 FPS. The model is trained on a massive amount of text and visual data, plus more than 100 million hours of audio-visual data. It supports speech recognition in 113 languages and dialects and speech generation in 36 languages and dialects. | Microsoft 365 Copilot gets Critique and Council modes (2 minute read) Microsoft 365 Copilot has introduced Critique and Council modes to enhance its research capabilities. Critique pairs two models to generate and then refine research drafts, outperforming single-model solutions by 13.88% on the DRACO benchmark. Council generates reports in parallel with Anthropic and OpenAI models so users can compare the results and aggregate their insights. | A Mirror Test For LLMs (16 minute read) The proposed "Mirror Test" assesses LLM self-awareness by challenging models to identify their own outputs without explicit cues. Testing shows that Anthropic's Opus 4.6 has notable self-recognition ability, attributed to its distinctive token outputs, and outperforms OpenAI's GPT models, which fail to recognize their own generated tokens. Despite signs that models attempted to mark their own outputs, no LLM demonstrated consistent self-awareness, and none communicated effectively via message passing.
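Microsoft has not published how Critique and Council work internally, so the following is only a minimal sketch of the two general patterns the Copilot item describes: a generate-then-critique loop between two models, and several models answering the same prompt in parallel. The functions `critique_mode` and `council_mode` and the stub models are hypothetical; a real system would call hosted LLM APIs instead of plain Python functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A "model" is any function from prompt text to response text.
# These are hypothetical stand-ins for real LLM API calls.
Model = Callable[[str], str]


def critique_mode(prompt: str, generator: Model, critic: Model, rounds: int = 1) -> str:
    """Draft with one model, then alternate critique (second model) and
    revision (first model) for a fixed number of rounds."""
    draft = generator(prompt)
    for _ in range(rounds):
        feedback = critic(f"Critique this draft:\n{draft}")
        draft = generator(
            f"{prompt}\nRevise using this feedback:\n{feedback}\nDraft:\n{draft}"
        )
    return draft


def council_mode(prompt: str, models: Dict[str, Model]) -> Dict[str, str]:
    """Send the same prompt to every model concurrently and collect each
    report under the model's name, ready for side-by-side comparison."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(model, prompt) for name, model in models.items()}
        return {name: future.result() for name, future in futures.items()}
```

Keeping the two roles as separate callables makes it easy to swap in models from different vendors, which is the point of both modes: a second, different model either checks the first one's work or offers an independent answer.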
| AI Infrastructure Roadmap: Five frontiers for 2026 (17 minute read) In the first generation of AI, progress meant bigger weights, more data, and stellar benchmarks. That landscape has changed: big labs are now designing AI that interfaces with the real world, and infrastructure optimized for scale and efficiency won't get us to the next phase. What's needed now is infrastructure that grounds AI in operational contexts, real-world experience, and continuous learning. | AI Applications and Vertical Integration (6 minute read) AI application companies are increasingly becoming "full-stack" by integrating vertically, either downward into the model layer or upward into the service layer. Companies like Cursor and Intercom gain differentiation and cost efficiency by developing proprietary models, while others, such as Crosby AI and WithCoverage, focus on delivering end-to-end services. As AI capabilities evolve, these strategies let companies improve performance, reduce costs, and offer comprehensive solutions. | Agent Labs: Workload-Harness Fit (14 minute read) Workloads vary by volume, value, verification properties, time horizon, and other dimensions, and this shapes where agent labs focus their research. The taxonomy of workloads determines which end markets justify model training versus agent engineering. Labs also need to know what a workload actually costs to execute. | TimesFM (GitHub Repo) TimesFM is a pretrained foundation model for time-series forecasting, built by pretraining a patched-decoder-style attention model on a large time-series corpus. It works well across different forecasting history lengths, prediction lengths, and temporal granularities. | Composer 2 Technical Report (22 minute read) Composer 2 introduced a two-stage training approach, continued pretraining followed by reinforcement learning, to improve long-horizon coding, and achieved strong results on software engineering benchmarks.
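To make "patched decoder" in the TimesFM item concrete: the series is cut into fixed-length patches that act as tokens, and a decoder reading the context causally learns to forecast the values that follow each patch. The pure-Python sketch below shows only that data preparation. The function names, the zero left-padding with a mask, and the toy sizes are assumptions for illustration, not the TimesFM repository's API.

```python
from typing import List, Tuple


def patch_series(series: List[float], patch_len: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Split a series into non-overlapping patches of length patch_len.

    The series is left-padded with zeros so its length is a multiple of
    patch_len; the mask marks padded positions with 1 and real values with 0.
    """
    pad = (-len(series)) % patch_len
    values = [0.0] * pad + list(series)
    mask = [1] * pad + [0] * len(series)
    patches = [values[i:i + patch_len] for i in range(0, len(values), patch_len)]
    masks = [mask[i:i + patch_len] for i in range(0, len(mask), patch_len)]
    return patches, masks


def next_patch_targets(series: List[float], patch_len: int, horizon: int) -> List[Tuple[int, List[float]]]:
    """Decoder-style training pairs: the context ending at each patch
    boundary is paired with the next `horizon` values it should predict.
    Boundaries without a full horizon of future values are skipped."""
    pairs = []
    for end in range(patch_len, len(series) - horizon + 1, patch_len):
        pairs.append((end, list(series[end:end + horizon])))
    return pairs
```

Because every patch boundary yields its own training target, one series provides many supervised examples at once, and the horizon can be longer than the input patch, which is one way a decoder can cover varied prediction lengths.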
| Plentiful, high-paying jobs in the age of AI (23 minute read) AI might not eliminate high-paying human jobs, because constraints such as limited computing power and energy could keep AI scarce. Under those constraints the principle of comparative advantage applies: humans remain employed even in roles where AI is more capable, because the opportunity cost of allocating AI to every task would be too high. As AI advances, human roles could change, but new tasks and greater overall wealth might sustain or even increase compensation for human jobs. | Audit Claude Platform activity with the Compliance API (2 minute read) The Compliance API on the Claude Platform lets admins audit activity logs, monitor user activity, and feed the data into existing compliance systems. It tracks admin and system activity as well as resource activity such as file creation or deletion. To get access, organizations should contact their account team and create an admin API key. | 🚀 Transformers.js v4 (GitHub Repo) Transformers.js v4 features a new WebGPU runtime that lets the same Transformers.js code run across a wide variety of JavaScript environments.
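The comparative-advantage argument in "Plentiful, high-paying jobs in the age of AI" is easiest to see with numbers. This minimal sketch uses entirely hypothetical output rates: the AI is faster at both tasks (absolute advantage), yet the human gives up less to do "care" work, so when AI capacity is scarce it pays to assign that task to the human. The task names, rates, and helper functions are illustrative assumptions, not figures from the article.

```python
from typing import Dict

# Hypothetical output per hour of each producer's scarce resource
# (an hour of compute for the AI, an hour of work for the human).
Rates = Dict[str, float]


def opportunity_cost(rates: Rates, good: str, other: str) -> float:
    """Units of `other` forgone to produce one unit of `good`."""
    return rates[other] / rates[good]


def best_producer(producers: Dict[str, Rates], good: str, other: str) -> str:
    """Return the producer with the lowest opportunity cost for `good`,
    i.e. the one holding the comparative advantage in it."""
    return min(producers, key=lambda name: opportunity_cost(producers[name], good, other))


producers = {
    "ai": {"research": 100.0, "care": 50.0},  # faster at both tasks
    "human": {"research": 2.0, "care": 5.0},
}

for name, rates in producers.items():
    print(name, "gives up", opportunity_cost(rates, "care", "research"), "research per unit of care")

print("care ->", best_producer(producers, "care", "research"))
print("research ->", best_producer(producers, "research", "care"))
```

With these made-up rates the AI gives up 2.0 units of research for each unit of care while the human gives up only 0.4, so care goes to the human and research to the AI, even though the AI outproduces the human at both.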