How AWS S3 is built (78 minute podcast)
Amazon S3 processes hundreds of millions of transactions per second, manages over 500 trillion objects, and operates across hundreds of exabytes with 11 nines of durability, achieved through auditor microservices and automated repair systems. Recent architectural advances include a near-total Rust rewrite of core pathways, rigorous formal methods for correctness, and the rollout of new primitives like S3 Vectors, which supports 20 trillion vectors per bucket with sub-100ms queries. S3's design emphasizes simplicity at scale, crash consistency, proactive defense against correlated failures, and engineering practices under which increased scale enhances reliability and performance.

High-Risk, High-Scale: Guaranteeing Ad Budget Precision at 1 Million Events/Second (5 minute read)
Flipkart Ads processes over 1 million events per second with a horizontally scalable, stateful stream-processing architecture that prioritizes rapid spend enforcement while maintaining strict budget control. The system employs Flink-managed state for distributed deduplication and key-based idempotency, watermarking to handle temporal skew from delayed mobile events, and a Lambda architecture that separates sub-second enforcement from batch reconciliation for financial accuracy. (A minimal sketch of the keyed dedup/watermark idea appears below.)

Path forward for Data Governance: Existence Over Essence (8 minute read)
Traditional compliance and documentation models fail as automation and autonomous agents scale, making embedded, computable governance essential for observable, auditable, and context-sensitive oversight. Data governance must evolve from rigid, essence-based frameworks into a dynamic, socio-technical discipline that enables real-time negotiation of meaning, strategic direction, and continuous accountability within distributed, AI-driven environments.

The Desperate Need For An "Agent Contract" (6 minute read)
The AI Agent Contract is a framework that establishes explicit, enforceable agreements between data producers, engineers, and AI agents to ensure reliable inputs, consistent definitions, and controlled changes, preventing common failures caused by schema drift, tool breakage, or mismatched expectations. (A hypothetical sketch of what such a contract could encode appears below.)

pynb (Tool)
pynb is a macOS tool for running Python notebooks without kernels or environment setup, using plain .py files that are git-friendly and scale to large datasets. It supports SQL alongside Python, works with your existing ChatGPT subscription and agents, and keeps all data local.

Benchmarking 1B Vectors with Low Latency and High Throughput (5 minute read)
ScyllaDB Vector Search achieves p99 latencies as low as 1.7 ms and throughput up to 252,000 QPS on billion-scale datasets, demonstrated on the yandex-deep_1b benchmark with 96-dimensional vectors. The architecture co-locates structured and vector data, supporting hybrid queries. The feature is now generally available; upcoming enhancements include native filtering, quantization, and optimized hybrid retrieval.

Feast Joins the PyTorch Ecosystem: Bridging Feature Stores and Deep Learning (5 minute read)
Feast, an open-source feature store, has joined the PyTorch Ecosystem, taming training-serving skew by ensuring models receive identical feature transformations in both development and production. With unified APIs, point-in-time joins, support for Spark/Snowflake/Flink, OpenTelemetry observability, and RBAC-based governance, Feast helps teams maintain data consistency and lineage for large-scale AI deployments. (A short retrieval example appears below.)
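To make the Flipkart item concrete, here is a minimal sketch of keyed, idempotent deduplication bounded by a watermark. This is plain Python illustrating the idea, not actual Flink code; the event fields, campaign keying, and the 60-second lateness bound are assumptions for illustration.

```python
import time
from collections import defaultdict

ALLOWED_LATENESS_S = 60  # assumed bound: accept events up to 60s behind the watermark

seen_ids = defaultdict(set)   # per campaign: event IDs already charged (Flink: keyed state)
spend = defaultdict(float)    # per campaign: running spend
watermark = 0.0               # low-water mark on event time

def on_event(campaign_id: str, event_id: str, event_ts: float, cost: float) -> None:
    """Apply an ad event at most once per (campaign, event_id) key."""
    global watermark
    # The watermark advances monotonically but lags event time by the lateness
    # bound, tolerating temporal skew from delayed mobile clients.
    watermark = max(watermark, event_ts - ALLOWED_LATENESS_S)
    if event_ts < watermark:
        return  # too late for live enforcement; route to batch reconciliation
    if event_id in seen_ids[campaign_id]:
        return  # duplicate delivery: idempotent, charge at most once
    seen_ids[campaign_id].add(event_id)
    spend[campaign_id] += cost  # sub-second budget enforcement reads this state

on_event("c1", "e1", time.time(), 0.25)
on_event("c1", "e1", time.time(), 0.25)  # replayed event is ignored
print(spend["c1"])  # 0.25
```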
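For the Agent Contract item, the sketch below shows one way such a contract could be made computable. The article names the concept, not this structure; every field, name, and check here is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentContract:
    """Hypothetical computable contract between a data producer and an agent."""
    tool_name: str
    input_schema: dict[str, str]   # field -> expected type name, pinned definitions
    output_schema: dict[str, str]
    schema_version: str            # bumped only via a controlled change process

    def check_input(self, payload: dict) -> None:
        """Reject inputs that drift from the agreed schema before the agent runs."""
        for field, typ in self.input_schema.items():
            if field not in payload:
                raise ValueError(f"missing contracted field: {field}")
            if type(payload[field]).__name__ != typ:
                raise TypeError(f"{field}: expected {typ}, got {type(payload[field]).__name__}")

contract = AgentContract(
    tool_name="revenue_lookup",  # invented example tool
    input_schema={"region": "str", "quarter": "str"},
    output_schema={"revenue_usd": "float"},
    schema_version="1.2.0",
)
contract.check_input({"region": "EMEA", "quarter": "2025-Q1"})  # passes
```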
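For the Feast item, this sketch uses Feast's public Python API to show the training/serving symmetry: the same feature definitions back both a point-in-time join for training and a low-latency online read for serving. The driver_stats feature view, the feature names, and the repo path are assumptions; it presumes an already-configured feature repo.

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumed: a feature repo in the working directory

# Training: point-in-time join. Each label row receives the feature values
# that were valid at its event_timestamp, preventing leakage from the future.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2025-01-01", "2025-01-02"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_stats:avg_trips", "driver_stats:conv_rate"],  # hypothetical names
).to_df()

# Serving: the same feature definitions, read from the online store, so the
# model sees identical transformations in production.
online = store.get_online_features(
    features=["driver_stats:avg_trips", "driver_stats:conv_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```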
Sirius (GitHub Repo)
Sirius is a GPU-native SQL engine that plugs into existing query engines (currently DuckDB, with Doris "coming soon") via Substrait, so teams can offload execution to GPUs without rewriting SQL or rebuilding the whole platform. Built on NVIDIA CUDA-X, it reports roughly 10x speedup on TPC-H at SF=100 for similar on-demand cost. Today it accelerates a limited set of operators and types (joins, group-bys, and other common operators over common scalar types) and falls back to the CPU for unsupported features. (The Substrait handoff is sketched below.)

Optimizing Data Transfer in Distributed AI/ML Training Workloads (15 minute read)
Profiling GPU-to-GPU data transfer in distributed training reveals stark performance differences between PCIe and NVLink interconnects: communication overhead costs only about 8% of throughput over NVLink versus a more than 6x slowdown over PCIe. Techniques like gradient compression, optimized memory usage, and parallelized reduction drive substantial performance and cost gains, illustrating the value of regular profiling with NVIDIA Nsight Systems for AI/ML teams. (A gradient-compression sketch follows.)
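For the Sirius item, the sketch below shows the Substrait handoff such an engine builds on: DuckDB can serialize a query into an engine-neutral Substrait plan, which a GPU backend can then execute without re-parsing SQL. This illustrates only the plan exchange, not Sirius's own loading or runtime; the substrait extension's install source may vary by DuckDB version.

```python
import duckdb

con = duckdb.connect()
# The substrait extension may need to come from DuckDB's community repository.
con.execute("INSTALL substrait FROM community")
con.execute("LOAD substrait")
con.execute("CREATE TABLE t AS SELECT range AS id, range % 10 AS grp FROM range(1000)")

# get_substrait returns the plan as protobuf bytes, the engine-neutral
# representation a GPU backend such as Sirius could consume.
plan = con.execute(
    "CALL get_substrait('SELECT grp, COUNT(*) FROM t GROUP BY grp')"
).fetchone()[0]
print(len(plan), "bytes of Substrait plan")
```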
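For the data-transfer item, one concrete mitigation is gradient compression. The sketch below uses PyTorch DDP's built-in fp16 compression communication hook, which halves the bytes moved during the gradient all-reduce and matters most on PCIe-bound links. The toy model, sizes, and torchrun launch are assumptions.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# Assumes launch via torchrun, which sets the process-group env vars.
dist.init_process_group("nccl")
model = DDP(torch.nn.Linear(4096, 4096).cuda())

# Built-in hook: casts gradients to fp16 for the all-reduce, then back,
# halving the payload crossing the interconnect.
model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)

# Profile the transfers with, e.g.:  nsys profile -t cuda,nvtx python train.py
out = model(torch.randn(32, 4096, device="cuda"))
out.sum().backward()  # gradients are fp16-compressed during communication
```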