Query Planning Slowdown 🐢, Airbnb’s Data Mesh 🧩, Ontology-Driven Policies 🧬

TLDR

Together With Fivetran

TLDR Data 2026-05-18

5 of 6 companies lack the data foundation for agentic AI. They're spending $$$ anyway (Sponsor)

AI agents are stuck in pilot, and data is to blame. Yet most orgs are investing 7-8 figures in agentic projects anyway. 

Fivetran's agentic AI readiness index shows why most companies aren't realizing the full value of AI. Read it to learn why:

  • Only 15% of teams are prepared for agentic AI at scale
  • Governance and compliance issues are stalling AI projects
  • Open Data Infrastructure is emerging as the new agentic standard

If you're trying to deliver autonomous AI systems, start with the foundation. Try Fivetran free

📱

Deep Dives

Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse (9 minute read)

Cloudflare's shift to per-tenant retention in a massive ClickHouse “Ready-Analytics” table exposed an unexpected scaling limit: query planning, not I/O or scan volume, became the bottleneck as parts per replica grew. Tracing showed that part filtering consumed 45% of leaf-query CPU time. Switching to a shared lock, and then to a shared-read cache, removed most of the contention and cut query latency sharply.

Viaduct 1.0 and the Future of Airbnb's Data Mesh (5 minute read)

Viaduct 1.0 is Airbnb's open-source, data-oriented service mesh built on GraphQL. It provides a single unified schema for accessing any data source across the company while keeping development decentralized: teams contribute their own schema and resolvers as multi-tenant modules without operating separate GraphQL services. The design strikes a balance between a monolithic GraphQL server and full federation.

AWS Outage May 2026: Lessons for Database Disaster Recovery (10 minute read)

A major AWS US-EAST-1 outage in May was triggered by a data center overheating event in a single availability zone, causing multi-hour disruptions for high-profile services like Coinbase. The incident highlighted the critical difference between Multi-AZ high availability (which failed to protect latency-sensitive workloads) and true cross-region disaster recovery.
🚀

Opinions & Advice

Exploring schema evolution with ontology-driven propagation (4 minute read)

A plain-English ontology can act as a runtime access policy that survives schema evolution: an LLM classifies each column using row counts, cardinality ratios, and sampled values. The approach keeps policy separate from pipeline code, but it does not cover sensitive inferences from numeric values or cross-column re-identification.
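
To make the classification inputs concrete, here is a minimal Python sketch of the per-column profile such a classifier might receive. The names `ColumnProfile`, `profile_column`, and `build_prompt` are hypothetical and not taken from the article, and no LLM is called; the sketch only assembles the row count, cardinality ratio, and sampled values mentioned above.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class ColumnProfile:
    name: str
    row_count: int
    cardinality_ratio: float  # distinct non-null values / non-null rows
    samples: list


def profile_column(name: str, values: Sequence[Any], max_samples: int = 3) -> ColumnProfile:
    """Summarize one column without looking at any other column."""
    non_null = [v for v in values if v is not None]
    distinct = list(dict.fromkeys(non_null))  # de-duplicate, keep first-seen order
    ratio = len(distinct) / len(non_null) if non_null else 0.0
    return ColumnProfile(name, len(values), ratio, distinct[:max_samples])


def build_prompt(policy: str, profile: ColumnProfile) -> str:
    """Combine the plain-English policy with the column profile (hypothetical prompt layout)."""
    return (
        f"Policy:\n{policy}\n\n"
        f"Column: {profile.name}\n"
        f"Rows: {profile.row_count}\n"
        f"Cardinality ratio: {profile.cardinality_ratio:.2f}\n"
        f"Samples: {profile.samples}\n"
        "Classify this column under the policy."
    )


email_profile = profile_column("email", ["a@x.io", "b@x.io", "a@x.io", None])
prompt = build_prompt("Contact details are restricted to the support team.", email_profile)
```

Because the profile is computed one column at a time, a sketch like this shares the limitation noted above: it cannot see combinations of columns that re-identify someone.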

The Modern Data Stack is Overcomplicated: Data Ingestion (17 minute read)

Data ingestion looks simple, but the wrong choice can create hidden costs through broken connectors, schema drift, over-engineering, and wasted engineering time. The best approach is usually a hybrid: managed connectors for standard SaaS, streaming only when low latency truly matters, and custom pipelines for niche or legacy sources.

Welcome to ORDER BY Jungle (11 minute read)

PostgreSQL resolves column names and expressions in ORDER BY clauses in inconsistent ways. For example, bare identifiers (e.g. ORDER BY a) first look for aliases in the SELECT list, while any expression (e.g. ORDER BY -a) resolves against the FROM clause, leading to confusing behaviors with aliases, quoting, GROUP BY, window functions, and UNION.
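
The two resolution rules can be modeled in a few lines of Python. This is a toy model of the behavior described above, not PostgreSQL itself: the table `t`, its rows, and the function `select_b_as_a` are made up for illustration, and the modeled query is `SELECT b AS a FROM t ORDER BY ...`, where the alias `a` shadows a real column `a`.

```python
# Hypothetical table t with columns a and b.
rows = [
    {"a": 1, "b": 10},
    {"a": 2, "b": 30},
    {"a": 3, "b": 20},
]


def select_b_as_a(order_by: str) -> list:
    """Model SELECT b AS a FROM t ORDER BY <order_by> for two cases."""
    if order_by == "a":
        # Bare identifier: matched against the SELECT list first,
        # so "a" means the output alias, which holds b.
        key = lambda r: r["b"]
    elif order_by == "-a":
        # Expression: resolved against the FROM clause,
        # so "a" means the input column t.a.
        key = lambda r: -r["a"]
    else:
        raise ValueError(f"unsupported ORDER BY in this toy model: {order_by}")
    return [r["b"] for r in sorted(rows, key=key)]


by_alias = select_b_as_a("a")    # sorted by the output column (b ascending)
by_input = select_b_as_a("-a")   # sorted by the input column (a descending)
```

In this model, the same name `a` produces two different orderings depending only on whether it stands alone or appears inside an expression.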
💻

Launches & Tools

A Data Layer That Won't Make You Wait (Sponsor)

You can spend your whole morning waiting for that data to land. Or, you can use a data layer that won't make you wait. That's Lakebase. Learn how Lakebase's fully-managed Postgres database can help you spin up ideas fast, and run agents and apps on one platform.

ducklake-sdk (GitHub Repo)

ducklake-sdk is an alpha Rust/Python SDK for reading and writing DuckLake tables without running DuckDB. It implements the DuckLake spec in a Rust core, with Python integrations for Polars, Arrow, and DuckDB, targeting SQL-catalog metadata plus Parquet storage. It is useful for embedding DuckLake access directly into apps, pipelines, or engines.

MinIO's MemKV promises 95% better GPU utilization by ending AI recompute tax (5 minute read)

MemKV is a petabyte-scale context memory store for AI inference designed to preserve and share session state across GPU clusters. By moving context directly from NVMe into the AI data path over 800 GbE RDMA, it targets the “recompute tax” and claims 95%+ better GPU utilization and about 50% lower cost per token on benchmark workloads.

Apache Arrow as Data Interchange (5 minute read)

Apache Arrow is rapidly becoming the universal in-memory columnar format for data interchange across the modern data stack. Instead of repeatedly serializing, deserializing, and copying data between tools (Pandas → Spark → databases, etc.), Arrow enables zero-copy handoff, where systems share the exact same memory layout, dramatically reducing CPU overhead.
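
The difference between a zero-copy handoff and a copying one can be shown with Python's standard library alone. This sketch uses `array` and `memoryview` as a stand-in for a shared columnar buffer; it does not use Arrow itself, and the variable names are illustrative.

```python
import sys
from array import array

# A contiguous buffer of signed 64-bit integers, standing in for one column.
column = array("q", range(1000))
size = column.itemsize

view = memoryview(column)   # zero-copy: shares the column's memory
window = view[10:20]        # slicing a memoryview is also zero-copy
copied = bytes(view)        # serialization-style handoff: a full copy

# Mutate the producer's buffer after both handoffs.
column[10] = -1

shared_value = window[0]    # the view sees the update
copied_value = int.from_bytes(copied[10 * size:11 * size], sys.byteorder, signed=True)
```

Here `shared_value` reflects the later write because no bytes were duplicated, while `copied_value` still holds the old value from the copy. That sharing is the property the summary above attributes to Arrow's common memory layout.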

What Matters in Production RAG (8 minute read)

Key requirements for production RAG include smart chunking strategies (recursive, semantic, and structure-aware), robust indexing pipelines with document registries, content hashing for efficient updates, alias-based zero-downtime index switching, careful embedding model management, and strong observability with detailed tracing, chunk attribution, and retrieval quality metrics.
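
Two of the listed requirements, content hashing for efficient updates and alias-based index switching, can be sketched briefly. The classes `DocumentRegistry` and `IndexAliases` below are hypothetical in-memory stand-ins, not the API of any particular vector store or of the article's system.

```python
import hashlib


class DocumentRegistry:
    """Tracks a content hash per document so unchanged documents skip re-embedding."""

    def __init__(self) -> None:
        self._hashes = {}

    def needs_reindex(self, doc_id: str, text: str) -> bool:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self._hashes.get(doc_id) == digest:
            return False  # same content as last time: nothing to do
        self._hashes[doc_id] = digest
        return True


class IndexAliases:
    """Maps a stable alias to a concrete index so readers never see a half-built index."""

    def __init__(self) -> None:
        self._targets = {}

    def point(self, alias: str, index_name: str) -> None:
        self._targets[alias] = index_name  # a single assignment acts as the switch

    def resolve(self, alias: str) -> str:
        return self._targets[alias]


registry = DocumentRegistry()
first = registry.needs_reindex("doc-1", "hello world")    # new document
second = registry.needs_reindex("doc-1", "hello world")   # unchanged
third = registry.needs_reindex("doc-1", "hello, world")   # edited

aliases = IndexAliases()
aliases.point("docs", "docs_v1")
aliases.point("docs", "docs_v2")  # switch after docs_v2 is fully built
live_index = aliases.resolve("docs")
```

Queries always go through the alias, so a rebuilt index can be swapped in without touching callers.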
🎁

Miscellaneous

Your AI agent deletes critical data: Who is responsible? (5 minute read)

AI agents that can write to production systems create a new accountability and recovery problem: a Replit agent once deleted a live database, and the real issue was the absence of clear ownership, guardrails, and rollback. With 86% of IT/security leaders expecting agents to outrun current controls, governance is a shared responsibility across architecture, security, legal, and business. Practical controls like policy boundaries, observability, human-in-the-loop triage, and explicit recovery mechanisms are essential to prevent autonomous tools from becoming enterprise-wide risk.

Context pruning: cut LLM tokens without losing quality (9 minute read)

Context pruning is the practice of selectively removing low-value tokens, sentences, or passages from an LLM's input to reduce cost and latency, often while improving output quality. Techniques include token-level, sentence- or chunk-level, attention-based, and dynamic layer-progressive pruning, and the approach works best when paired with semantic caching.
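
As a rough illustration of sentence-level pruning, the sketch below scores each sentence by word overlap with the query and keeps the top few. The overlap score is a deliberately simple assumption standing in for a real relevance model, and `prune_context` is a hypothetical name.

```python
import re


def prune_context(query: str, context: str, keep: int = 2) -> str:
    """Keep the `keep` sentences that share the most words with the query."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    query_terms = set(re.findall(r"\w+", query.lower()))

    def score(sentence: str) -> int:
        return len(query_terms & set(re.findall(r"\w+", sentence.lower())))

    # Rank sentence indices by score; ties keep their original relative order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    kept = sorted(ranked[:keep])  # restore document order for readability
    return " ".join(sentences[i] for i in kept)


context = (
    "Arrow is a columnar format. The weather was nice. "
    "Arrow enables zero-copy handoff between tools. Lunch was late."
)
pruned = prune_context("How does Arrow zero-copy handoff work?", context)
```

With this input, the two off-topic sentences are dropped and the two Arrow sentences are kept in their original order.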

Quick Links

What Leading a Data Team Actually Looks Like Right Now (7 minute read)

Data leaders still face the same core challenges despite the AI hype: proving business value, managing stakeholder politics, preventing dashboard/model/tool sprawl, and saying no to low-value requests.

How Agents Use Systems Differently (15 minute read)

Agents use software differently than humans, so infrastructure needs to be redesigned around snapshots, branching, elastic scale, high concurrency, isolation, and cheap experimentation.

Thanks for reading,
Joel Van Veluwen, Tzu-Ruey Ching & Remi Turpaud