Pinterest Unveils Moka 🚀, Self-Serve BI, Finally 🎉, Dashboards as Code 💻

TLDR

TLDR Data 2025-07-14

📱

Deep Dives

Next Gen Data Processing at Massive Scale at Pinterest With Moka (16 minute read)

Pinterest introduced Moka, its new Spark-on-EKS platform, to address the growing need for scalable and efficient processing of the company's massive data. The company designed Moka to handle batch Spark workloads on non-sensitive data, choosing Kubernetes (EKS) for flexibility and resource efficiency.
No More Disks: The Architecture Behind Stateless Compute in ClickHouse Cloud (23 minute read)

ClickHouse Cloud now uses a fully stateless compute architecture, enabled by a new in-memory database engine that stores all metadata centrally in a Shared Catalog rather than on local disks. This design allows compute nodes to quickly access the latest state at startup and supports stateless compute not only for ClickHouse's native format, but also for open table formats like Iceberg and Delta Lake.
Netflix Tudum Architecture: from CQRS with Kafka to CQRS with RAW Hollow (7 minute read)

Netflix switched Tudum's data system from Kafka event streams to RAW Hollow objects, which send up-to-date data snapshots directly to client devices. This change made data updates faster and simpler, removing the need for devices to store or replay old event data.
🚀

Opinions & Advice

Has Self-Serve BI Finally Arrived Thanks to AI? (18 minute read)

An MCP integration lets users query BI data through a conversational interface and get real-time answers without manually navigating dashboards. By querying source data directly and supplementing responses with expert validation and visualizations, generative BI (GenBI) reduces the risk of errors or hallucinated results.
MCP and the reshaping of data visualisation & business intelligence (6 minute read)

Model Context Protocol (MCP) is an open standard that allows AI systems like Claude to directly connect to diverse data sources, from PDFs to databases to BI tools, without custom integrations. This could reshape the role of BI professionals, as executives might bypass traditional workflows to self-serve insights via AI. While the human edge in storytelling, governance, and complex analysis still holds, data teams should proactively engage with MCP to stay relevant as automation advances.
There is No Golden Path Anymore: Engineering Practices are Being Rewritten (36 minute podcast)

Ben Matthews from Stack Overflow and Loïc Houssier from Superhuman discuss strategies for engineering teams to navigate rapid technological change, emphasizing strong leadership and aligned autonomy to empower teams and increase organizational velocity. AI is transforming workflows at Superhuman, from improving onboarding and streamlining work to reviving stalled projects.
💻

Launches & Tools

Rill (GitHub Repo)

Rill lets you quickly build ultra-responsive dashboards from your data lake by co-locating an embedded, in-memory DuckDB engine with a SvelteKit front end, enabling sub-second SQL queries, live profiling, and "dashboards as code" with Git versioning. It auto-profiles datasets on each keystroke, offers opinionated default visuals, and imports Parquet/CSV from S3, GCS, HTTP, or local files. This SQL-first, self-hosted BI tool slashes latency for exploratory analysis and embeds governance via project files.
Kompute (GitHub Repo)

This repository offers a general-purpose GPU compute framework built on Vulkan, designed for high-performance data processing across a range of graphics cards, including AMD, Qualcomm, and NVIDIA. Key features include asynchronous processing, mobile support, and optimization for advanced GPU data tasks, making it highly relevant for data engineers working on GPU-accelerated applications.
Sail 0.3: Long Live Spark (5 minute read)

Sail 0.3 improves Spark compatibility on top of its Rust-native execution engine. It supports both Spark 4.0 and 3.5, improves performance, and reduces latency on cloud-native storage. The release also introduces a lightweight PySpark client, offers flexible installation options, and automatically adjusts runtime behavior to the installed Spark version.
Announcing Lakebase Public Preview (7 minute read)

Lakebase is a fully managed Postgres database built for AI and analytics on Databricks. It combines transactional and analytical workloads in one platform, supports serverless scaling and instant branching, and integrates with Unity Catalog for governance. This enables faster development of intelligent data apps without managing infrastructure.
🎁

Miscellaneous

SaaS 2.0 (12 minute read)

Traditional SaaS like Salesforce packages generic "opinionated lists" and rigid workflows that often misalign with a team's unique processes. By contrast, a "specialist-and-a-spreadsheet" model uses AI agents prompted with explicit sales playbooks or matchmaking expertise to dynamically manage lists, ask clarifying questions, and apply nuanced rules on the fly. This flips the paradigm from buying monolithic software to consuming bespoke expertise at scale, promising highly tailored workflows without custom development or cumbersome UIs.
GraphRAG-powered AI Agent interfaces: Real-world applications in incident and change management (18 minute read)

GraphRAG, which combines structured knowledge graphs with retrieval-augmented generation, delivers significant gains in incident and change management compared to traditional RAG by surfacing actionable, context-rich insights. Prototype evaluations on real-world ICM datasets demonstrate that GraphRAG consistently outperforms unstructured and flat vector-based retrieval, especially under noisy or incomplete data. The dual-mode framework presented enables both automated dashboards and real-time AI agent interfaces, supporting human-in-the-loop and autonomous workflows.

Quick Links

No Code Is Dead (15 minute read)

Generative AI is overtaking traditional no-code platforms by enabling rapid app creation via natural language. Without proper controls, however, this accelerates technical debt and produces unmaintainable code. Industry leaders therefore advocate hybrid models that combine AI-driven automation with visual or low-code environments to ensure readability, governance, and scalability.
The Crucial Role of NUMA Awareness in High-Performance Deep Learning (6 minute read)

Binding processes to NUMA nodes can double PyTorch throughput on multi-socket servers.
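
As a rough illustration of what binding a process to a NUMA node can look like on Linux, here is a minimal Python sketch. It is not taken from the linked article: the helper names `parse_cpulist` and `bind_to_numa_node` are hypothetical, and it only sets CPU affinity. Memory placement is left to the kernel's default local-allocation policy, so it is an approximation of full NUMA binding.

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def bind_to_numa_node(node: int) -> set[int]:
    """Pin the current process to the CPUs of one NUMA node (Linux only).

    Assumption: the node's CPUs are listed in sysfs under
    /sys/devices/system/node/node<N>/cpulist.
    """
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    cpus = parse_cpulist(cpulist)
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"
    return cpus
```

Calling `bind_to_numa_node(0)` early, before worker threads or data loaders start, lets them inherit the affinity. To bind memory explicitly as well, the process can instead be launched under `numactl --cpunodebind=0 --membind=0`.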

Thanks for reading,
Joel Van Veluwen, Tzu-Ruey Ching & Remi Turpaud


