We Let AI Agents Orchestrate Our ML Experiments (5 minute read)
Teads built a multi-agent system to autonomously orchestrate its entire ML experimentation lifecycle. Specialized agents handle idea generation, code writing, experiment execution, result analysis, and decision-making, reducing experiment cycles from days to hours, increasing the number of meaningful experiments by 4.5x, and improving production model performance by 8–12%.
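The lifecycle described above can be sketched as a chain of specialized stages. Everything here (the Experiment fields, the stand-in metric, the 0.40 baseline) is hypothetical for illustration, not Teads' implementation:

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    idea: str
    code: str = ""
    metric: float = 0.0
    decision: str = ""

def ideate():
    # Stand-in for an idea-generation agent.
    return Experiment(idea="try a larger embedding dimension")

def write_code(exp):
    # Stand-in for a code-writing agent.
    exp.code = f"# training script for: {exp.idea}"
    return exp

def run(exp):
    # Stand-in for an execution agent; a real one launches a training job.
    exp.metric = 0.42
    return exp

def decide(exp, baseline=0.40):
    # Analysis + decision agents: promote only if the metric beats baseline.
    exp.decision = "promote" if exp.metric > baseline else "discard"
    return exp

def orchestrate(n_experiments=3):
    """Push each experiment through the specialized stages in order:
    idea generation -> code writing -> execution -> analysis/decision."""
    return [decide(run(write_code(ideate()))) for _ in range(n_experiments)]

experiments = orchestrate(2)
```

The point of the pattern is that each stage is a separate, swappable agent, so the loop itself stays trivial.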
Scaling Recommendation Systems with Request-Level Deduplication (9 minute read)
Pinterest Engineering introduced request-level deduplication to scale its recommendation systems more efficiently. By sorting data by user + request ID in Apache Iceberg, it achieves massive compression and processes and stores request-level data only once per unique request; ranking uses a separated context transformer with KV caching, and training applies targeted fixes like SyncBatchNorm and user-level masking.
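The dedup idea can be illustrated in a few lines. The record fields used here (user_id, request_id, context, item) are assumed for illustration, not Pinterest's actual schema:

```python
def deduplicate_by_request(records):
    """Collapse per-item logging rows so the shared request context
    (user features, request metadata) is stored once per unique
    (user_id, request_id) key."""
    # Sorting by the dedup key clusters identical contexts together,
    # which is also what makes columnar compression effective in a
    # table format like Iceberg.
    ordered = sorted(records, key=lambda r: (r["user_id"], r["request_id"]))
    requests = {}
    for r in ordered:
        key = (r["user_id"], r["request_id"])
        if key not in requests:
            requests[key] = {"context": r["context"], "items": []}
        requests[key]["items"].append(r["item"])
    return requests

rows = [
    {"user_id": "u1", "request_id": 7, "context": {"country": "US"}, "item": "pin_a"},
    {"user_id": "u1", "request_id": 7, "context": {"country": "US"}, "item": "pin_b"},
    {"user_id": "u2", "request_id": 9, "context": {"country": "FR"}, "item": "pin_c"},
]
deduped = deduplicate_by_request(rows)
# The shared context for (u1, 7) is now stored once instead of twice.
```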
Zero Downtime Upgrade: Yelp's Cassandra 4.x Upgrade Story (8 minute read)
Yelp upgraded over 1,000 Cassandra nodes from version 3.11 to 4.1 across multiple clusters with zero downtime, using a careful rolling upgrade strategy built on Kubernetes init containers, automated pre-flight/flight/post-flight stages, version-specific images, and strict monitoring during the mixed-version period. The upgrade delivered 21–60% latency improvements, faster streaming, better observability, new guardrails, and preparation for Cassandra 5.
How To Set-up Your Data Stack For 2026 – Data Infrastructure For AI (8 minute read)
Building a successful AI-ready data infrastructure starts with simplicity and strong fundamentals rather than chasing the latest AI hype: focus on solid ingestion tools, SQL-based transformations (such as dbt), the right storage/compute layer (warehouse or lakehouse), and strong data quality, governance, and ownership.
Stop Treating AI Memory Like a Search Problem (22 minute read)
Reliable AI memory needs more than store-and-retrieve approaches: it must manage decay, contradiction, confidence, compression, and expiry. The proposed SQLite-based design keeps plain-text memories locally, then scores them based on importance, confidence, and decay, so outdated or weakly supported facts stop dominating retrieval. New memories can supersede older ones, expired items fade into an archive, and duplicate beliefs are merged into higher-signal summaries.
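A minimal sketch of the scoring idea: importance and confidence boost a memory while an exponential time decay lets stale facts sink in retrieval. The 30-day half-life is an assumed tuning knob, not a figure from the article:

```python
import time

def memory_score(importance, confidence, last_accessed_ts,
                 now=None, half_life_days=30.0):
    """Rank a stored memory. importance and confidence are in [0, 1];
    the score halves every half_life_days since last access, so
    outdated or weakly supported facts stop dominating retrieval."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_accessed_ts) / 86400.0)
    decay = 0.5 ** (age_days / half_life_days)
    return importance * confidence * decay

now = time.time()
fresh = memory_score(0.9, 0.8, now, now=now)               # no decay yet
stale = memory_score(0.9, 0.8, now - 60 * 86400, now=now)  # 60 days old
# The stale memory has passed through two half-lives, so it scores
# a quarter of the fresh one.
```

In the SQLite design described above, a score like this would be computed at query time so that archiving and superseding decisions can reuse the same ranking.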
Power BI and Support for Third-Party Semantic Models (6 minute read)
Power BI doesn't properly support third-party semantic models mainly due to technical limitations around query behavior, aggregation, and architecture, not competitive intent. As a result, Microsoft recommends keeping all metrics and business logic within Power BI's own semantic model for reliability and performance.
Introducing the Common AI Provider: LLM and AI Agent Support for Apache Airflow (5 minute read)
Apache Airflow's new apache-airflow-providers-common-ai package adds native LLM and agent support with 6 operators and 20+ model providers, requiring Airflow 3.0+. It includes structured tasks like @task.llm, @task.agent, and @task.llm_sql, plus file analysis, branching, schema comparison, and direct access to 350+ existing Airflow hooks as typed AI tools. It also features built-in human approval flows, durable execution with step-level replay from object storage, and end-to-end token/tool observability.
KumoRFM-2: The Most Powerful Predictive Model, for Humans and Agents (6 minute read)
KumoRFM-2 is Kumo's relational foundation model for predictions that can reason directly on database tables, keys, and time history, without the usual feature-engineering pipeline. Kumo claims it beats supervised ML on common relational benchmarks in few-shot settings, pointing to a simpler way to turn warehouse data into predictive and agent-ready applications.
Managing context in long-run agentic applications (14 minute read)
Long-running agents quickly hit context window limits and suffer from "context rot" (losing important earlier information). Slack uses intelligent context pruning and summarization strategies, with periodic "reflection" steps where the agent reviews and condenses its own history, improving agent reliability and coherence over long time horizons.
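A minimal sketch of the prune-and-summarize step, not Slack's implementation; summarize() is a placeholder for what would be an LLM call in a real agent, and the thresholds are assumed:

```python
def summarize(messages):
    # Placeholder: a real implementation would ask the model to condense
    # the old turns into a short natural-language summary.
    return f"{len(messages)} earlier messages condensed"

def reflect(history, keep_recent=4, max_len=10):
    """When the message history exceeds max_len, condense everything
    except the most recent turns into a single summary message,
    imitating a periodic 'reflection' step."""
    if len(history) <= max_len:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {"role": "system",
               "content": f"Summary of earlier work: {summarize(old)}"}
    return [summary] + recent

history = [{"role": "user", "content": f"step {i}"} for i in range(12)]
compact = reflect(history)
# 12 messages -> 1 summary + 4 recent = 5 messages, back under the cap,
# while the condensed summary guards against context rot.
```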
Scaling Prometheus in 2026: The Complete Comparison Guide (4 minute read)
Prometheus-compatible long-term storage has matured into clear options: VictoriaMetrics for most teams needing 4–5x less RAM and a low operational burden, Thanos for the lowest-friction migration from existing Prometheus, OpenObserve for full-stack observability at lower cost, GreptimeDB for unified SQL-first metrics/logs/traces, and Mimir for large enterprises with 500+ developers and dedicated SREs. The key decision factor is not just infrastructure cost but the ongoing "Ops Tax" of running each system.
OpenDuck (GitHub Repo)
OpenDuck is an open-source system that brings MotherDuck-style cloud capabilities to DuckDB, enabling hybrid queries that run across local and remote data with transparent access.