Your Cart Has a Story. Here's How We Learned to Read It (7 minute read)
Zepto built a Cart Contextual Model that treats shopping carts as “sentences” and uses a Transformer-based masked language model (MLM) to infer user intent in real time as items are added. Trained on historical cart patterns with temporal, geographical, and product signals, and using inverse-frequency masking to handle long-tail items, the model predicts what else the user is likely to buy.
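The summary does not spell out how the inverse-frequency masking works, so the following is only a minimal sketch of one plausible reading: when building MLM training examples, rarer items are chosen as the masked target more often than popular ones. The function names, the `alpha` exponent, and the toy carts are all assumptions for illustration, not Zepto's implementation.

```python
import random
from collections import Counter

MASK = "[MASK]"  # hypothetical mask token


def item_frequencies(carts):
    """Count how many carts each item appears in."""
    counts = Counter()
    for cart in carts:
        counts.update(set(cart))
    return counts


def masking_weights(cart, counts, alpha=0.5):
    """Weight each item by 1 / count**alpha so rare items are masked more often.

    Assumes every item in `cart` was seen when `counts` was built.
    """
    return [1.0 / (counts[item] ** alpha) for item in cart]


def mask_one(cart, counts, rng, alpha=0.5):
    """Replace one item with MASK, sampled by inverse frequency.

    Returns (masked_cart, target_item).
    """
    weights = masking_weights(cart, counts, alpha)
    idx = rng.choices(range(len(cart)), weights=weights, k=1)[0]
    masked = list(cart)
    target = masked[idx]
    masked[idx] = MASK
    return masked, target


# Toy data: "milk" is common, "saffron" is a long-tail item.
carts = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "saffron"]]
counts = item_frequencies(carts)
rng = random.Random(0)
masked, target = mask_one(carts[2], counts, rng)
```

With `alpha=0` every item is equally likely to be masked; raising `alpha` shifts more of the training signal toward long-tail items.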
|
Vector Search in Manticore Search: A Deep Dive (28 minute read)
Manticore Search argues vector search should be tuned like a production retrieval system, not treated as a default embedding feature. It recommends aligning similarity metrics with models, tuning HNSW for recall, latency, and memory, and using batching, chunk optimization, and physical backups to keep indexes consistent.
|
|
The Rise of Multi-Query Engines (7 minute read)
AI agents are creating more small, bursty data queries, making single-warehouse costs harder to manage. Multi-engine routing cuts cost by sending each query to the best engine while keeping familiar workflows.
|
Debunking 8 data layout myths: why Liquid Clustering outperforms partitioning (11 minute read)
Databricks debunks 8 common myths about data layout, arguing that Liquid Clustering is superior to traditional Hive-style partitioning for modern lakehouses. Unlike rigid partitioning, Liquid Clustering organizes data dynamically using clustering keys that can evolve over time, supports row-level concurrency and metadata-only operations, and works across open table formats.
|
|
dbt Core v2 is here: still open source, now rebuilt for what's next (9 minute read)
dbt Core v2.0 alpha makes the Fusion engine's Rust-based runtime open source under Apache 2.0, unifying Core and Fusion around a shared foundation with faster parsing, Parquet artifacts, better local docs, simpler installs, and a tighter language spec. Fusion remains the recommended free CLI for most users, while Core v2 serves teams that need fully open source code or custom OSS builds.
|
ingestr (GitHub Repo)
ingestr is a CLI ELT tool for moving data from many databases and SaaS apps into warehouses or storage with simple flags, no backend or custom code required. It supports incremental loads, easy install, and broad connector coverage.
|
Diving deep into Redis's new array data type (25 minute read)
Redis Array is a new native data type (introduced in Redis 8.8) designed for constant-time positional access by index, filling a long-standing gap for data where the position itself carries meaning. It supports both dense and extremely sparse arrays using a hierarchical group-based structure, allowing fast random access, range queries, ring-buffer semantics, pattern matching across sparse data, and fixed memory usage.
|
Routing Multiple Query Engines with Iceberg (18 minute read)
QueryFlux is an open-source Rust-based SQL routing proxy that directs queries across multiple query engines (Trino, Spark, DuckDB, Snowflake, Athena, Flink, etc.) that share the same Iceberg tables. It handles protocol translation, dialect conversion via SQLGlot, cost-aware routing, concurrency control, and health-based failover.
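To make cost-aware routing with health-based failover concrete, here is a small sketch of the general idea. It is not QueryFlux's code or API (QueryFlux is written in Rust); the `Engine` fields, the cost formula, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    name: str
    fixed_cost: float    # hypothetical per-query overhead (startup, scheduling)
    cost_per_gb: float   # hypothetical cost of scanning 1 GB
    max_scan_gb: float   # largest scan this engine should be handed
    healthy: bool = True


def estimated_cost(engine, scan_gb):
    """Simple linear cost model: fixed overhead plus per-GB scan cost."""
    return engine.fixed_cost + engine.cost_per_gb * scan_gb


def route(scan_gb, engines):
    """Pick the cheapest healthy engine that can handle the estimated scan size."""
    candidates = [e for e in engines if e.healthy and scan_gb <= e.max_scan_gb]
    if not candidates:
        raise RuntimeError("no healthy engine can serve this query")
    return min(candidates, key=lambda e: estimated_cost(e, scan_gb))


# Illustrative numbers only: a cheap local engine and a larger cluster.
engines = [
    Engine("duckdb", fixed_cost=0.0, cost_per_gb=0.05, max_scan_gb=10),
    Engine("trino", fixed_cost=0.5, cost_per_gb=0.01, max_scan_gb=10_000),
]

small = route(1, engines)    # small scan goes to the low-overhead engine
large = route(100, engines)  # too large for the local engine, goes to the cluster
```

Marking an engine as unhealthy removes it from the candidate list, so the same call falls back to the next cheapest option.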
|
|
MongoDB and Stored Procedures (10 minute read)
MongoDB can run low-latency transactional logic without stored procedures by combining ACID transactions, bulkWrite, validation, indexes, and pipeline updates. This is demonstrated through an example that processes payments with card checks, vendor checks, limits, duplicate prevention, and ledger writes.
|
|
Pluto 1.0 Release (12 minute read)
Pluto 1.0 marks the Julia notebook environment as stable, with major improvements to reproducibility, reactivity, sharing, accessibility, education, docs, and editor tools.
|