
ChatGPT tone control 🤖, gaming the METR plot 📈, Anthropic Bloom 🌸


TLDR

Together With Metronome

TLDR AI 2025-12-22

Retrofitting legacy systems for AI (Sponsor)

Many financial systems of record (like ERP) were built at a time when AI tools were the stuff of science fiction. Now, they need to work together.

On the latest episode of Unpack Pricing, Metronome CEO Scott Woody talks with Rillet CEO Nicolas Kopp about why old financial systems often have a "garbage in, garbage out" problem for modern companies running AI.

They get into why gen AI is famously bad with numbers (and where it actually helps), why 20 years of finance experience might now be a hiring red flag, and how one customer went from six-month pricing changes to two-hour launches. 

Listen now →

🚀

Headlines & Launches

ChatGPT Adds Tone Personalization (1 minute read)

OpenAI has introduced new personalization options in ChatGPT, letting users adjust enthusiasm, warmth, and emoji use directly. These controls, available in the Personalization menu, offer "More," "Less," or "Default" settings, expanding tone customization beyond the existing base style and tone feature.
Cursor Acquires Graphite (2 minute read)

Cursor has acquired Graphite, the company behind the stacked pull request and AI-assisted code review platform of the same name. This marks Cursor's third acquisition as it builds out a comprehensive AI-powered dev platform.
Introducing Bloom: an open source tool for automated behavioral evaluations (7 minute read)

Anthropic's Bloom is an open-source tool for generating automated behavioral evaluations of AI models. Bloom assesses specific behaviors like self-preferential bias and sabotage by creating scenarios and quantifying behavior occurrence across models. It efficiently differentiates between aligned and misaligned models and correlates strongly with human judgment, enabling scalable and reliable behavior evaluations.
🧠

Deep Dives & Analysis

The changing drivers of LLM adoption (15 minute read)

LLM use is rising: people are using a wider range of models, across more products, and in more contexts. ChatGPT remains dominant and keeps acquiring new users, but Gemini has grown faster over the last few months. OpenAI's revenue appears to be on track, though consumer revenue is likely shrinking as a share of it. A substantial share of workplace AI use comes from workers adopting tools on their own rather than waiting for employer-provided access.
Evaluating Context Compression for AI Agents (10 minute read)

What happens when an agent's context window fills up determines whether it can continue productively or has to start from scratch. This post presents an evaluation framework that measures how much useful context different compression strategies preserve. Structured summarization retains more useful information than the alternative methods without sacrificing compression efficiency.
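The evaluation idea can be illustrated with a toy sketch. Everything below is hypothetical (the tag names, strategies, and scoring are illustrative, not the post's actual framework): each strategy compresses an agent transcript to a line budget, and "retention" scores what fraction of key facts survive.

```python
def truncate(transcript, budget):
    """Naive strategy: keep only the most recent lines."""
    return transcript[-budget:]

def structured_summary(transcript, budget):
    """Keep lines tagged as decisions/errors/todos, then fill with recent lines."""
    tagged = [l for l in transcript if l.startswith(("DECISION:", "ERROR:", "TODO:"))]
    rest = [l for l in transcript if l not in tagged]
    return (tagged + rest[-max(budget - len(tagged), 0):])[:budget]

def retention(compressed, key_facts):
    """Fraction of key facts still present after compression."""
    return sum(f in compressed for f in key_facts) / len(key_facts)

transcript = (
    ["DECISION: use SQLite for the cache"]
    + [f"ran step {i}" for i in range(20)]
    + ["ERROR: flaky network test", "TODO: add retry logic"]
    + [f"ran step {i}" for i in range(20, 40)]
)
facts = ["DECISION: use SQLite for the cache", "TODO: add retry logic"]

print(retention(truncate(transcript, 10), facts))            # early decisions get dropped
print(retention(structured_summary(transcript, 10), facts))  # tagged lines survive
```

At the same budget of 10 lines, truncation loses both key facts while the structured strategy keeps them, which is the kind of gap the framework is designed to measure.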
Understanding AI Benchmarks (25 minute read)

Benchmarks are the most widely misunderstood part of the AI ecosystem. The narrative keeps implying a universal increase in intelligence, but the numbers can be misleading. To navigate this noise, look at the aggregate, look at the relative, and verify with your own tasks. The only benchmark that matters at the end of the day is your own workload.
Experiment Diary (3 minute read)

This diary documents an experiment in teaching an LLM, via GRPO, to generate a regex from a natural-language description. It records the performance, learnings, modifications, and key takeaways from each run. The initial training run, on December 17, showed the model quickly learning to emit well-formed regex output tags while still producing essentially random regex strings.
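That "valid tags, random regex" phase is typical when the reward is shaped in stages. A hypothetical reward function for such a setup (not the diary's actual code; the tag format and scores are assumptions) might look like this:

```python
import re

def regex_reward(completion, positives, negatives):
    """Hypothetical shaped reward: format validity first, then match accuracy.

    The completion is expected to wrap its answer in <regex>...</regex> tags;
    a model can max out the format terms long before the regex is correct.
    """
    m = re.search(r"<regex>(.*?)</regex>", completion, re.DOTALL)
    if m is None:
        return -1.0      # no tags at all
    try:
        pattern = re.compile(m.group(1))
    except re.error:
        return -0.5      # tags present, regex doesn't compile
    hits = sum(bool(pattern.fullmatch(s)) for s in positives)
    misses = sum(not pattern.fullmatch(s) for s in negatives)
    return (hits + misses) / (len(positives) + len(negatives))

# Description "one or more digits": a correct regex earns the full reward.
print(regex_reward("<regex>\\d+</regex>", ["123", "7"], ["abc"]))
```

Because the format penalties dominate early, GRPO's group-relative advantages first push the model toward well-formed tags and compilable patterns, matching the behavior the diary reports.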
Andrej Karpathy's 2025 LLM Year in Review (6 minute read)

Andrej Karpathy has outlined paradigm shifts of LLMs in 2025, including fast inference engines, model distillation trends, real-time agents, neural GPUs, and the rise of high-quality open models like DeepSeek-V2 and RWKV.
🧑‍💻

Engineering & Research

Could public AI tools be leaking your sensitive data? (Sponsor)

One in three employees uses AI tools without approval, risking data leaks and compliance violations. With this Enterprise AI Governance Kit from You.com, you will get ready-to-use templates to help protect your organization, including security checklists, usage policies, and governance frameworks.

Download the kit.

Qwen-Image-Layered (GitHub Repo)

Qwen-Image-Layered is a model that decomposes an image into multiple RGBA layers. Each layer can be manipulated independently without affecting the others: layers can be resized, repositioned, and recolored. This decomposition enables high-fidelity, consistent editing.
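Whatever the model's internals, independent RGBA layers recombine into one image via standard alpha compositing. A minimal per-pixel sketch of the Porter-Duff "over" operator (illustrative only, not code from the repo), with float channels in [0, 1] and straight alpha:

```python
def over(fg, bg):
    """Composite one RGBA pixel on top of another (Porter-Duff "over")."""
    fr, fg_, fb, fa = fg
    br, bg_, bb, ba = bg
    a = fa + ba * (1 - fa)          # resulting alpha
    if a == 0:
        return (0.0, 0.0, 0.0, 0.0)
    blend = lambda f, b: (f * fa + b * ba * (1 - fa)) / a
    return (blend(fr, br), blend(fg_, bg_), blend(fb, bb), a)

red = (1.0, 0.0, 0.0, 1.0)
blue = (0.0, 0.0, 1.0, 1.0)
clear = (0.0, 0.0, 0.0, 0.0)
print(over(red, blue))    # opaque foreground wins
print(over(clear, blue))  # background shows through
```

Because each layer only contributes where its alpha is nonzero, editing one layer (moving or recoloring it) leaves every other layer's pixels untouched, which is what makes the layered representation convenient for editing.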
Introducing MiMo-V2-Flash (10 minute read)

MiMo-V2-Flash is a powerful, efficient, and ultra-fast foundational language model that excels in reasoning, coding, and agentic scenarios. It serves as an excellent general-purpose assistant for everyday tasks. The model is available globally on Hugging Face, AI Studio, and Xiaomi's API platform. Benchmark results are available in the article.
jax-js (GitHub Repo)

jax-js is a machine learning framework for the browser. It brings JAX-style, high-performance CPU and GPU kernels to JavaScript, so users can run numerical applications on the web. The library is written from scratch and has no external dependencies. It can run anywhere a browser can run.
Multiplexing MCP Servers For Agentic Specialization (8 minute read)

MCP servers give agents the tools they need to accomplish tasks. This post discusses how to multiplex MCP servers to simplify the connection to various tools within them. Multiplexing allows multiple MCP servers to be used over a gateway in a single interaction. It allows agents to access multiple MCP servers with different stacks, clouds, applications, and frameworks for specialized tasks.
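The multiplexing pattern reduces to routing namespaced tool calls through one front door. The sketch below is a toy illustration of that idea only; it does not use the real MCP SDK, and all class and tool names are made up:

```python
class FakeServer:
    """Stand-in for an MCP server: a bag of named tools."""
    def __init__(self, tools):
        self.tools = tools  # name -> callable

    def call(self, tool, **kwargs):
        return self.tools[tool](**kwargs)

class Gateway:
    """Multiplexes several servers behind one 'server/tool' namespace."""
    def __init__(self, servers):
        self.servers = servers

    def list_tools(self):
        return [f"{s}/{t}" for s, srv in self.servers.items() for t in srv.tools]

    def call(self, qualified, **kwargs):
        server, tool = qualified.split("/", 1)
        return self.servers[server].call(tool, **kwargs)

gw = Gateway({
    "github": FakeServer({"open_pr": lambda title: f"PR: {title}"}),
    "aws": FakeServer({"list_buckets": lambda: ["logs", "assets"]}),
})
print(gw.list_tools())
print(gw.call("github/open_pr", title="fix build"))
```

The agent sees a single flat tool list and a single connection, while the gateway handles per-server transport, auth, and stack differences behind the scenes.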
tcgen05 for dummies (70 minute read)

tcgen05 is the set of PTX instructions that program Tensor Cores on NVIDIA's latest Blackwell GPUs. This post is a Blackwell tutorial in plain CUDA C++ with inline PTX, documenting the author's process of learning tcgen05 and reaching 98% of cuBLAS speed. Readers can follow along using Modal or any other B200 cloud provider.
🎁

Miscellaneous

How to game the METR plot (9 minute read)

Because METR's tasks are public, a frontier lab could game its horizon-length measurements. Under METR's assumptions, horizon length may add little information beyond plain benchmark accuracy. A meme has been circulating based on a team reporting a one-to-four-hour range on the METR plot; this post explains why that plot has been misinterpreted.
The Shape of AI: Jaggedness, Bottlenecks, and Salients (11 minute read)

AI is remarkably good at some tasks and surprisingly bad at others, a pattern known as the "Jagged Frontier" of AI ability. That jaggedness is likely to remain a defining feature of AI systems going forward, but the author argues that the frontier's expansion will outpace it.

Quick Links

SoftBank races to fulfill $22.5 billion funding commitment to OpenAI by year-end (5 minute read)

Masayoshi Son has already sold SoftBank's entire $5.8 billion stake in Nvidia, offloaded $4.8 billion of its T-Mobile US stake, and slashed staff to come up with the money.
How can Flash beat Pro (1 minute read)

A lot of research progress on agentic reinforcement learning made its way into Gemini 3 Flash, but it was too late for Pro.
The dawn of a world simulator (8 minute read)

World simulators are models trained to predict how the world evolves over time, frame-by-frame, using large amounts of video and interaction data.
Moore Threads unveils next-gen gaming GPU with 15x performance and 50x ray tracing improvement (4 minute read)

Chinese GPU maker Moore Threads' Huagang architecture promises significant performance gains in gaming and AI.
AI-Driven Factories Set to Reshape US Industry in 2026 (1 minute read)

In 2026, US companies will apply a factory mindset to energy, mining, construction, and manufacturing by pairing AI and autonomy with skilled labor to standardize complex work.

Love TLDR? Tell your friends and get rewards!

Share your referral link below with friends to get free TLDR swag!
Track your referrals here.

Want to advertise in TLDR? 📰

If your company is interested in reaching an audience of AI professionals and decision makers, you may want to advertise with us.

Want to work at TLDR? 💼

Apply here, create your own role, or send a friend's resume to jobs@tldr.tech and get $1k if we hire them! TLDR is one of Inc.'s Best Bootstrapped Businesses of 2025.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Andrew Tan, Ali Aminian, & Jacob Turner


Manage your subscriptions to our other newsletters on tech, startups, and programming. Or if TLDR AI isn't for you, please unsubscribe.
