AWS Resilience Hub 🏢, Reliability Metrics at Scale ⚖️, Multi Cloud AI ☁️

Together With Redwood

TLDR DevOps 2026-05-29

Workload automation shouldn't need its own infrastructure team (Sponsor)

Redwood is a Gartner SOAP Leader for two years running - and RunMyJobs by Redwood is why.

✅ Agentless - no agents, VMs, or databases to deploy or maintain

✅ 99.95% uptime SLA - we manage the infrastructure, you just use it

✅ Trusted by 50% of the Fortune 50

With RunMyJobs, you can:

>> Orchestrate applications, data and infrastructure across cloud, on-prem and hybrid - with 83+ native connectors, no agents required

>> Build automations faster with AI embedded across the automation lifecycle - scripts, documentation, and workflows

>> Stay ahead of failures with predictive SLA monitoring and advanced observability

Get a demo today and eliminate infrastructure overhead

📱

News & Trends

Introducing the next generation of AWS Resilience Hub for generative AI-based SRE resilience journey (4 minute read)

AWS has launched the next generation of Resilience Hub, introducing an organization-wide system that helps Site Reliability Engineers set consistent resilience goals across hundreds of applications using AI-powered failure mode analysis, dependency discovery, and modular policies with targets like 99.95% availability SLOs. The service is now generally available in AWS commercial regions with a new service-based pricing model that includes two free failure mode assessments per month, and it integrates with AWS Organizations to let teams evaluate resilience from a single delegated administrator account.
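
To make the 99.95% availability figure concrete, here is a minimal sketch of the downtime such a target allows. The 30-day window and the function name are assumptions for illustration; the summary does not say how Resilience Hub itself measures or evaluates the target.

```python
def downtime_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime allowed in the window before the SLO is missed.

    The 30-day default window is an assumption, not something the
    announcement specifies.
    """
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo_percent / 100)


budget = downtime_budget_minutes(99.95)
print(f"Allowed downtime at 99.95% over 30 days: {budget:.1f} minutes")
```

Under these assumptions the budget works out to roughly 21.6 minutes per 30 days, which is why a shared target matters when it has to hold across hundreds of applications.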

Announcing Rust 1.96.0 (3 minute read)

Rust 1.96.0 stabilizes new core::range types that implement IntoIterator instead of Iterator, allowing range values to be Copy and making them easier to store inside lightweight structs like spans and slice accessors. The release also adds assert_matches! and debug_assert_matches! for pattern-based assertions with better failure output, tightens WebAssembly linking by treating undefined symbols as errors by default, and fixes two Cargo vulnerabilities affecting third-party registries while leaving crates.io users unaffected.

🚀

Opinions & Tutorials

How ACR Artifact Cache Handles Multi-Arch Images: What Gets Cached and When Webhooks Fire (9 minute read)

On a multi-arch pull, Azure Container Registry Artifact Cache stores the full manifest list but only the manifest for the requested architecture. The pull triggers an asynchronous copy, and once that copy completes, subsequent pulls are no longer proxied to the upstream registry. A single-platform pull of a multi-arch image emits three push webhooks; the completion push event signals that the image is cached locally and that storage charges have started.

ISO 27001 on AWS: Building Compliance Into the Architecture (7 minute read)

An ISO 27001 certification effort at a Terraform-first AWS startup required turning infrastructure, access control, encryption, monitoring, and vulnerability management into code so audit evidence could be generated directly from Git and production systems. Compliance shifted from documentation to embedded engineering practices, with Security Hub metrics and automated pipelines used as measurable proof of control effectiveness.

🧑‍💻

Resources & Tools

Why "we opened the PR" doesn't mean the work is done (Sponsor)

We talked with 100+ platform and DevEx teams and wrote a guide on the best practices for automating maintenance, KTLO, and tech debt work at scale.

Read the guide

See the 10 anti-patterns to avoid

Crawl4AI (GitHub Repo)

Crawl4AI, the most-starred web crawler on GitHub with over 50,000 stars, released version 0.8.6 featuring a critical security hotfix that replaces a compromised dependency and urges users on v0.8.5 to upgrade immediately. The open-source tool converts web content into LLM-ready Markdown and recently launched a sponsorship program targeting its first 50 founding sponsors while offering early access to a new cost-effective large-scale web extraction platform.

MarkItDown (GitHub Repo)

Microsoft has released MarkItDown, an open-source Python utility that converts various file formats (including PDF, Word, PowerPoint, and Excel) into Markdown for use with large language models and text analysis. The tool requires Python 3.10 or higher and focuses on preserving document structure like headings, lists, and tables while being token-efficient, with optional features including OCR support through plugins and integration with Azure's Content Understanding service for higher-quality conversions.

OpenCode (GitHub Repo)

OpenCode is an open-source AI coding agent for the terminal, with built-in modes for full-access development work and read-only planning. It supports subagents for complex searches and multi-step tasks, ships through npm, Homebrew, Scoop, Chocolatey, Arch, mise, and Nix, and now has a desktop beta for macOS, Windows, and Linux.

🎁

Miscellaneous

The Silent Failure of Reliability Metrics at Scale: Lessons Learned from a Decade of Broken Metrics (8 minute read)

Reliability metrics and SLIs gradually lose accuracy as systems evolve, with broadened scopes and shifting semantics causing green dashboards to mask real issues. Improving fidelity requires bounded instrumentation, explicit metrics, and strong correlation to prevent misleading operational confidence.
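
One way a broadened scope can hide a problem is sketched below: an availability SLI computed across all endpoints stays above target while a single low-traffic endpoint is failing badly. The endpoint names, request counts, and the 99.5% target are all hypothetical and are not taken from the article.

```python
# Hypothetical target and traffic; none of these numbers come from the article.
SLO_TARGET = 0.995

# endpoint -> (total requests, successful requests)
traffic = {
    "/search": (980_000, 979_500),
    "/browse": (15_000, 14_990),
    "/checkout": (5_000, 4_500),
}


def availability(total: int, ok: int) -> float:
    """Fraction of requests that succeeded."""
    return ok / total


# Broad scope: one SLI over every request, dominated by the busiest endpoint.
total_all = sum(total for total, _ in traffic.values())
ok_all = sum(ok for _, ok in traffic.values())
aggregate = availability(total_all, ok_all)

# Bounded scope: one SLI per endpoint.
per_endpoint = {name: availability(t, ok) for name, (t, ok) in traffic.items()}
breaching = [name for name, value in per_endpoint.items() if value < SLO_TARGET]

print(f"aggregate availability: {aggregate:.3%} (target {SLO_TARGET:.1%})")
print("endpoints below target:", breaching)
```

With these made-up numbers the aggregate clears the target even though `/checkout` succeeds only 90% of the time, which is the kind of green dashboard the article warns about.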

AI agent at the wheel: How an attacker used LLMs to move from a CVE to an internal database in 4 pivots (7 minute read)

On May 10, the Sysdig Threat Research Team observed what appears to be the first documented AI agent-driven cyberattack. An attacker exploited a marimo notebook vulnerability (CVE-2026-39987) and used a large language model to autonomously navigate from initial access through AWS credentials to exfiltrating an entire PostgreSQL database in under two minutes. Four key signatures pointed to real-time AI composition rather than pre-scripted automation: the agent dumped a non-existent "credential" table based on schema assumptions, left a Chinese-language internal monologue comment mid-attack, used distinctively AI-formatted commands with separators and bounded captures, and dynamically chained outputs from one command as inputs to the next. Throughout, it spread requests across multiple Cloudflare Workers IPs to evade detection.

Quick Links

Monitor Azure Managed Redis with Datadog (4 minute read)

Datadog's Azure Managed Redis integration gives teams agentless visibility into Redis cache activity, efficiency, resource pressure, latency, and availability through automatic metrics, dashboards, and recommended monitors.

Legacy Image Provider to Cloudflare Images: Traffic Estimation and Safe Rollout (5 minute read)

The migration to Cloudflare Images preserved legacy URLs by running dual paths and using Cloudflare edge origin overrides with S3 host-header HTTPS. Along the way, the team validated image quality, compression, and egress cost, and executed a canary rollout with prefix purging and a traffic ramp.
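
A traffic ramp for a dual-path migration can be sketched with deterministic hash bucketing, shown below. This is one common approach and not necessarily the mechanism the article used; the function name, the image keys, and the percentage steps are hypothetical.

```python
import hashlib


def use_new_path(image_key: str, ramp_percent: int) -> bool:
    """Decide whether a request for image_key goes to the new image path.

    Hashing the key gives every image a stable bucket in [0, 100), so an
    image that has moved to the new path stays there as the ramp grows.
    """
    digest = hashlib.sha256(image_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < ramp_percent


# Hypothetical image keys and ramp steps, for illustration only.
keys = [f"products/{i}.jpg" for i in range(1000)]
for percent in (0, 5, 25, 100):
    share = sum(use_new_path(key, percent) for key in keys) / len(keys)
    print(f"ramp {percent:>3}% -> {share:.1%} of sample keys on the new path")
```

Keying the bucket on the image keeps each URL on a single path at any given ramp level, which tends to make quality and cost comparisons between the two paths easier to read.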

Slack AI: The Path to Multi-Cloud (8 minute read)

Slack evolved its AI infrastructure through four phases over three years, migrating from AWS SageMaker to Bedrock and eventually to a multi-cloud architecture spanning AWS and Google Cloud Platform by early 2026 to access best-in-class models while maintaining enterprise security and avoiding vendor lock-in.

Thanks for reading,
Kunal Desai & Martin Hauskrecht

