Claude Mythos leaks 🤖, last xAI cofounder exits 👋, lessons from OpenAI 💡

TLDR

Together With blackduck

TLDR AI 2026-03-30

Black Duck Signal: Agentic AppSec built for AI-native development (Sponsor)

Black Duck Signal combines LLM-powered code analysis with 20+ years of human‑vetted security intelligence to autonomously identify, prioritize, and fix vulnerabilities in AI‑generated code:

>> Analyze new code instantly and fix issues before they're committed.

>> Run security scans with natural language prompts in coding assistants and IDEs.

>> Get fast, accurate results for any programming language - new or old.

See it in action and request a demo

🚀

Headlines & Launches

Claude Mythos (3 minute read)

'Mythos' is the name for a new tier of Anthropic models that are larger and more intelligent than Opus. The models get dramatically higher scores on tests of software coding, academic reasoning, and cybersecurity compared to Claude Opus 4.6. Mythos is a large, compute-intensive model that is very expensive to use and serve. Anthropic is working on making the model much more efficient before any general release.
Meta tests Avocado 9B, Avocado Mango Agent, and more (2 minute read)

Meta's Avocado model has been pushed back to at least May because it still falls short of leading systems from competitors. The company appears to be running parallel experiments with multiple Avocado variants. The model appears able to solve complex math problems that earlier Llama models could not, but other labs solved these problems months ago. Meta's AI leadership has reportedly discussed temporarily licensing Google's Gemini technology. Some requests within Meta AI are already being routed through Gemini models.
Anthropic's Claude popularity with paying consumers is skyrocketing (4 minute read)

Anthropic is more popular with customers than ever. Claude is gaining paid subscribers in record numbers. Paid subscriptions have more than doubled this year. The majority of new subscribers were in the lowest tier. OpenAI is still gaining new paid subscribers at a rapid rate and remains the biggest consumer AI platform.
🧠

Deep Dives & Analysis

Function Calling Harness: From 6.75% to 100% (32 minute read)

AutoBe is an open-source AI agent that takes a single natural language conversation and generates a complete backend. qwen3-coder-next has a 6.75% function calling success rate when asked to generate API data types for a shopping mall backend. AutoBe raises that success rate to over 99.8%. It uses a harness where type schemas constrain outputs, compilers verify results, and structured feedback pinpoints exactly where and why something went wrong so the agent can correct itself. This post dissects the engineering behind AutoBe.
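The constrain, verify, and feed-back loop described above can be illustrated with a minimal Python sketch. This is not AutoBe's code: the schema format, the `validate` and `call_with_harness` helpers, and the `fake_model` stand-in are all hypothetical names chosen for illustration, and a plain type check stands in for a real compiler.

```python
from typing import Any, Callable

# Hypothetical schema format (an assumption): field name -> expected Python type.
Schema = dict[str, type]
Feedback = list[dict[str, str]]


def validate(value: dict[str, Any], schema: Schema) -> Feedback:
    """Return one structured error per problem; an empty list means valid."""
    errors: Feedback = []
    for field, expected in schema.items():
        if field not in value:
            errors.append(
                {"path": field, "expected": expected.__name__, "actual": "missing"}
            )
        elif not isinstance(value[field], expected):
            errors.append(
                {
                    "path": field,
                    "expected": expected.__name__,
                    "actual": type(value[field]).__name__,
                }
            )
    return errors


def call_with_harness(
    generate: Callable[[Feedback], dict[str, Any]],
    schema: Schema,
    max_attempts: int = 3,
) -> tuple[dict[str, Any], int]:
    """Ask the generator for output, feeding validation errors back until it passes."""
    feedback: Feedback = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        feedback = validate(candidate, schema)
        if not feedback:
            return candidate, attempt
    raise ValueError(f"still invalid after {max_attempts} attempts: {feedback}")


def fake_model(feedback: Feedback) -> dict[str, Any]:
    """Stand-in for an LLM call: wrong type first, corrected once it sees feedback."""
    if not feedback:
        return {"name": "Widget", "price": "9.99"}  # price is a string: invalid
    return {"name": "Widget", "price": 9.99}


PRODUCT_SCHEMA: Schema = {"name": str, "price": float}
result, attempts = call_with_harness(fake_model, PRODUCT_SCHEMA)
```

The point of the sketch is that each error carries a path plus the expected and actual types, so the retry prompt can name the exact field to repair rather than repeating the whole request.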
AI's capability improvements haven't come from it getting less affordable (12 minute read)

AI's capability improvements at the frontier have not led to increased inference costs relative to human labor. Although per-task inference costs are rising, current models complete tasks at roughly 3% of human costs, with no upward trend in median cost ratios. Models can continue advancing even under strict cost constraints, enabling profitable automation with AI cost ratios remaining well below human levels.
The Capability Overhang in AI (4 minute read)

Coding agents outperform agents in other domains because codebases provide a self-contained environment of critical context, unlike fragmented knowledge work spread across video calls and legacy systems. Enterprise adoption remains stalled by three hard problems: context fragmentation, complex access control, and a rapidly shifting architecture landscape.
🧑‍💻

Engineering & Research

Schedule tasks on the web (5 minute read)

Users of Claude Code on the web can now schedule tasks. The tasks run on Anthropic-managed infrastructure, so they keep working even if users turn off their devices. Scheduled tasks are available to all Claude Code on the web users. Example tasks include reviewing open pull requests each morning, analyzing CI failures overnight and surfacing summaries, syncing documentation after PRs merge, and running dependency audits every week.
lat.md (GitHub Repo)

lat.md is a spec that agents keep in sync with the codebase to help them understand big ideas and key business logic. It ensures that corner cases are covered by meaningful high-level tests, and it can speed up coding by saving agents from endless grepping. The spec uses plain Markdown, with wiki links connecting concepts into a navigable graph.
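To illustrate how wiki links can turn plain Markdown into a navigable graph, here is a minimal Python sketch. It assumes the common `[[Target]]` and `[[Target|label]]` link syntax and a hypothetical in-memory `docs` mapping; it is not lat.md's actual parser or file layout.

```python
import re
from collections import deque

# Assumed wiki-link syntax: [[Target]], [[Target|label]], or [[Target#section]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def build_graph(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each page name to the set of page names it links to."""
    return {
        name: {target.strip() for target in WIKI_LINK.findall(text)}
        for name, text in docs.items()
    }


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first walk: every concept an agent can reach from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical spec pages, invented for this example.
docs = {
    "Checkout": "Uses [[Orders]] and [[Billing|billing rules]].",
    "Orders": "See [[Inventory]] for stock checks.",
    "Billing": "No outgoing links.",
    "Inventory": "",
}
graph = build_graph(docs)
related = reachable(graph, "Checkout")
```

With a graph like this, an agent could follow links from one concept to its neighbors instead of searching the whole repository for related text.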
What Pretext Reinforced About AI Loops (5 minute read)

Pretext is a fast, accurate, comprehensive text measurement algorithm that can lay out web pages without leaning on DOM measurement and reflow. It was created using AI agent workflows. The particular loop that was used in developing the tool (constrain -> measure -> isolate -> classify -> test -> reject -> keep only what survives broad pressure) made the engineering rigorous. This article analyzes the loop to see what makes it so successful.
🎁

Miscellaneous

xAI's Last Cofounder Leaves (3 minute read)

All remaining co-founders of xAI have reportedly departed, completing the exit of the original founding team.
Things I learned at OpenAI (7 minute read)

OpenAI alumni emphasize the significance of creating effective evaluations and benchmarks, noting that the best benchmarks drive collective optimization efforts. Post-training data design and model alignment are critical for unlocking new AI capabilities, particularly in subjective attributes like empathy or creativity. Fast iteration, choosing the right problems, and leveraging internal tooling are key competitive advantages in AI research.

Quick Links

Clerk Core 3: auth your agents can set up themselves (Sponsor)

Keyless mode now works with TanStack Start, Astro, and React Router. No account or configuration required to get started. Redesigned hooks, faster token fetching, and a codemod CLI to handle the upgrade.
Live Translate Comes to Headphones on iOS (4 minute read)

Google rolled out real-time translation through headphones on iOS, expanding support to more countries and 70+ languages while preserving speaker tone and cadence.

Love TLDR? Tell your friends and get rewards!

Share your referral link below with friends to get free TLDR swag!
Track your referrals here.

Want to advertise in TLDR? 📰

If your company is interested in reaching an audience of AI professionals and decision makers, you may want to advertise with us.

Want to work at TLDR? 💼

Apply here, create your own role or send a friend's resume to jobs@tldr.tech and get $1k if we hire them! TLDR is one of Inc.'s Best Bootstrapped businesses of 2025.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Andrew Tan, Ali Aminian, & Jacob Turner

