TLDR AI 2025-04-15

Delve goes viral on X for pulling all-nighters to ship autopilot for SOC 2 (Sponsor)

Delve officially launched Computer Use Agents that let founders and GRC teams auto-capture all screenshots for SOC 2.

And customers are using it to achieve incredible results:

Lovable got fully SOC 2 compliant in less than 20 hours
11x ditched their old compliance platform, saved 143 hours on SOC 2, and unlocked $1.2M ARR
Bland AI got SOC 2 and unlocked $500k ARR within 7 days

If you want to ditch your old platform, they'll even migrate you off for FREE.

Book a demo here for $2000 off compliance in April!

(PS: TLDR readers get free custom Arc'teryx jackets)

🚀

Headlines & Launches

OpenAI GPT-4.1 (12 minute read)

OpenAI has launched three new models in its API: GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano. These models outperform GPT‑4o and GPT‑4o mini across the board, with major gains in coding and instruction following. They also have larger context windows—supporting up to 1 million tokens of context—and are able to better use that context with improved long-context comprehension. They feature a refreshed knowledge cutoff of June 2024.

Hugging Face Acquires Pollen Robotics (4 minute read)

Hugging Face, the center of the open source AI community, has long said its goal is to be a decentralized DeepMind. It isn't there yet, but adding an open source robotics platform via Pollen moves it closer to that goal.

DolphinGemma (6 minute read)

DeepMind has announced DolphinGemma, a large language model developed by Google that helps scientists study how dolphins communicate — and hopefully find out what they're saying, too.

🧠

Research & Innovation

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model (21 minute read)

The ByteDance team has released a paper showing how to train a competitive 7B parameter video generation model on a "modest" compute budget of 665k H100 hours. It has strong performance on a number of temporally difficult tasks.

PixelFlow: Pixel-Space Generative Models with Flow (30 minute read)

Most generative models on continuous signals operate in latent space due to computational constraints. This work introduces a series of cascades that allow the generation to happen directly in pixel space. This eliminates the need for a pretrained VAE.

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models (24 minute read)

InteractVLM is a new VLM that can reason about 3D contacts between humans and objects. It does so by leveraging a strong base model and lifting its reasoning into 3D with clever multi-view rendering.

🧑‍💻

Engineering & Resources

3B parameter tokenizer (GitHub Repo)

Scaling up image tokenizers is challenging because they tend to collapse. This work introduces GigaTok, a tokenizer scaled to 3B parameters with superior reconstruction performance. Decoder scaling and regularization helped with stability and overall quality.

Improved MoE with C3PO (GitHub Repo)

C3PO introduces a new test-time optimization technique that improves accuracy in Mixture-of-Experts LLMs by re-mixing expert weights based on similar reference samples.

Visual Reasoning with Less Data (16 minute read)

Using MCTS to quantify sample difficulty, ThinkLite-VL improves reasoning in VLMs with just 11k training samples and no distillation.

🎁

Miscellaneous

BrowseComp Benchmark for Hard-to-Find Knowledge (9 minute read)

OpenAI's BrowseComp is a new benchmark of 1,266 problems designed to evaluate AI agents' browsing skills in gathering complex, hard-to-locate information online.

Business Leaders' Thoughts on AI Possibilities (6 minute read)

Executives from nine companies share how they're leveraging Google Cloud's AI tools to drive innovation across sectors, with over 600 real-world use cases highlighted.

6 highlights from Google Cloud Next 25 (2 minute read)

Vertex AI introduces updates to video, image, speech, and music generation models, enhancing creative workflows for businesses. Google AI is enabling specialized AI agents for companies, improving productivity and security. A new Agent2Agent Protocol allows different AI agents to securely communicate across platforms.

⚡

Quick Links

DeepSeek to Open Source its Inference Engine (2 minute read)

DeepSeek's inference engine is built on vLLM, although it is now heavily modified.

NVIDIA to Manufacture AI Supercomputers in the U.S. (12 minute read)

NVIDIA is localizing AI hardware production by building factories in Texas and Arizona, aiming to produce Blackwell chips and AI supercomputers entirely within the U.S.

Gemini Adds Question Generation to Google Classroom (1 minute read)

Educators can now use Gemini to generate questions or quizzes from selected text in Google Classroom, enhancing lesson interactivity and streamlining content creation.

Thanks for reading,
Andrew Tan, Ali Aminian & Andrew Carr
