GitHub Availability 📉, Cloud Cost Optimization ☁️, Autonomy Problem ✨

TLDR

Together With Atlassian

TLDR DevOps 2026-04-29

Bolting AI onto old service management playbooks? Time to shatter the service quo (Sponsor)

Service management workflows were designed for the 2010s. With AI, work happens in real-time across code deploys, SaaS apps, devices, and collaboration tools. So why is service still trapped in a queue?

In this whitepaper, Atlassian shares its AI-native vision for service management.

Topics include:

  • Why old service management playbooks are failing in the AI era and forcing teams to reimagine experiences from the ground up.
  • How Rovo and Teamwork Graph unlock smarter, context-aware AI and better, more proactive service experiences.
  • How Service Collection can free your team from the relics of legacy service desks and help you shatter the service quo.

Read the whitepaper

📱

News & Trends

How we built the most performant DeepSeek V3.2, MiniMax-M2.5 and Qwen 3.5 397B on DigitalOcean NVIDIA HGX™ B300 GPU Droplets (5 minute read)

DigitalOcean announced general availability of DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B on its Serverless Inference platform, which achieved the fastest output speeds among all tested providers. DeepSeek V3.2 delivered 230 tokens per second with a sub-1-second time to first token for 10,000 input tokens. The performance comes from NVIDIA HGX B300 GPUs with 288GB of memory, NVFP4 quantization for a 1.8x smaller memory footprint, and custom optimizations to the vLLM serving framework made in collaboration with Inferact.

Kubernetes v1.36: Mutable Pod Resources for Suspended Jobs (beta) (3 minute read)

Kubernetes v1.36 promoted to beta the ability to modify CPU, memory, GPU, and other resource requests in suspended Jobs' pod templates, eliminating the need to delete and recreate Jobs when resource requirements change. The feature, enabled by default, lets queue controllers and administrators adjust resources before Jobs start running. It is particularly useful for batch and machine learning workloads where optimal allocation depends on current cluster conditions.
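As a rough illustration of the kind of change this feature permits, the sketch below builds a patch body that adjusts one container's resource requests in a Job's pod template. It is only a sketch under assumptions: the helper name `build_resource_patch`, the container name, and the values are hypothetical, and `nvidia.com/gpu` is assumed to be the GPU resource name exposed in the cluster. The code only constructs the patch document; it does not talk to a cluster.

```python
import json


def build_resource_patch(container_name, cpu, memory, gpus=None):
    """Build a patch body that updates one container's resources
    inside a Job's pod template (spec.template.spec.containers).

    Hypothetical helper for illustration; not part of any Kubernetes client.
    """
    requests = {"cpu": cpu, "memory": memory}
    resources = {"requests": requests}
    if gpus is not None:
        # Extended resources such as GPUs are set in limits, and a request,
        # if given, must equal the limit. The resource name is an assumption.
        requests["nvidia.com/gpu"] = str(gpus)
        resources["limits"] = {"nvidia.com/gpu": str(gpus)}
    return {
        "spec": {
            "template": {
                "spec": {
                    # Containers are matched by name when the patch is merged.
                    "containers": [
                        {"name": container_name, "resources": resources}
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    patch = build_resource_patch("trainer", cpu="4", memory="16Gi", gpus=2)
    print(json.dumps(patch, indent=2))
```

A queue controller or administrator could apply a body like this to a Job while it is still suspended, for example with `kubectl patch` or a client library, and then resume the Job.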

An update on GitHub availability (6 minute read)

GitHub says recent outages were caused by rapid growth in AI-driven development, which has pushed the platform beyond its current scaling limits. The company is prioritizing reliability by expanding capacity, isolating critical systems, and reducing single points of failure to handle the surge.

🚀

Opinions & Tutorials

The Autonomy Problem: Why AI Agents Demand a New Security Playbook (4 minute read)

AI agents automate development and business tasks but introduce new risks such as prompt injection, privilege escalation, and cascading failures. These risks expand the attack surface and have drawn concern from NIST. Effective mitigation requires layered controls across model design, system permissions, and human oversight to ensure secure deployment.

How it feels to run an incident with AI SRE (8 minute read)

This post describes incident.io's evolving AI SRE experience, which automates incident investigation, debugging, and resolution within a unified workflow. By integrating Slack, coding tools, and updates, it reduces context switching and enables rapid diagnosis, fixes, and reporting with minimal manual effort.

🧑‍💻

Resources & Tools

You're already paying for quality… mostly through incidents, rework & downtime. (Sponsor)

Learn how to reduce failure costs by prioritizing testing based on real risk and impact, so your team can ship faster with fewer incidents. Read the whitepaper from the Qase.io team and learn the four-step process for making testing decisions based on testing economics.

GitNexus (GitHub Repo)

GitNexus is an open-source tool that indexes codebases into knowledge graphs to give AI coding assistants like Cursor, Claude Code, and Codex full architectural context, preventing them from missing dependencies and breaking call chains. The tool is available as a CLI with MCP server integration, a web UI that runs entirely in-browser via WebAssembly, and as an enterprise SaaS or self-hosted offering through akonlabs.com.

VictoriaMetrics (GitHub Repo)

VictoriaMetrics is a fast, cost-effective, and scalable open-source solution for monitoring and managing time series data. It is built for high performance and reliability, which makes it a fit for businesses of all sizes.

🎁

Miscellaneous

How GitHub uses eBPF to improve deployment safety (7 minute read)

GitHub mitigates circular deployment dependencies, where outages could block their own recovery, by using eBPF to monitor and restrict deployment scripts' network access and detect hidden, direct, and transient dependencies. This enables per-process control, DNS interception, and real-time auditing of risky calls like GitHub API usage during incident recovery.
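The decision logic behind this kind of per-process restriction can be modeled in a few lines. The sketch below is not eBPF and is not GitHub's implementation; it is a plain userland model, with the hypothetical names `EgressPolicy` and `check`, of how a deployment script's outbound lookups might be checked against a list of blocked domains while every decision is recorded for auditing.

```python
from dataclasses import dataclass, field


@dataclass
class EgressPolicy:
    """Toy model of a per-process egress policy (hypothetical, illustration only)."""

    blocked_domains: set
    audit_log: list = field(default_factory=list)

    def check(self, process, domain):
        """Return True if `process` may reach `domain`; record the decision."""
        # Normalize the name the way a DNS lookup would present it.
        name = domain.lower().rstrip(".")
        # Block the domain itself and any of its subdomains.
        blocked = any(
            name == b or name.endswith("." + b) for b in self.blocked_domains
        )
        self.audit_log.append((process, name, "deny" if blocked else "allow"))
        return not blocked


if __name__ == "__main__":
    policy = EgressPolicy(blocked_domains={"github.com"})
    for target in ("api.github.com", "artifacts.internal.example"):
        allowed = policy.check("deploy.sh", target)
        print(target, "allowed" if allowed else "denied")
```

In a real system the enforcement point would sit in the kernel and the audit records would feed monitoring; the model only shows why a call to the GitHub API from a deploy script would be flagged as a circular dependency.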

Kubernetes for platform teams: Leveraging k0s and k0rdent (6 minute read)

This post demonstrates how to build a scalable, multi-cluster Kubernetes platform on OpenStack using k0s, k0rdent, and Hosted Control Planes (HCP), which eliminates the need for dedicated 3-node control planes per cluster by centralizing them in a single management cluster. The architecture shifts from managing individual clusters to operating a declarative system that handles provisioning, scaling, and upgrades across entire fleets while significantly reducing infrastructure costs and operational complexity.
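To get a feel for the cost argument, the arithmetic can be sketched as follows. The function name and the default of three management-cluster nodes are assumptions made for illustration, and the model is deliberately simplified: it treats the management cluster as fixed in size, whereas in practice it has to grow as it hosts more control planes.

```python
def control_plane_nodes(clusters, nodes_per_control_plane=3, management_nodes=3):
    """Compare control-plane node counts for two layouts.

    Returns (dedicated, hosted, saved), where:
      dedicated: every cluster runs its own control-plane nodes
      hosted:    control planes run inside one management cluster
                 (assumed fixed size, a simplification)
      saved:     difference between the two
    """
    dedicated = clusters * nodes_per_control_plane
    hosted = management_nodes
    return dedicated, hosted, dedicated - hosted


if __name__ == "__main__":
    dedicated, hosted, saved = control_plane_nodes(clusters=20)
    print(f"dedicated={dedicated} hosted={hosted} saved={saved}")
    # prints "dedicated=60 hosted=3 saved=57"
```

Even with a larger management cluster, the dedicated layout grows linearly with the number of clusters, which is where the savings described in the post would come from.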

Quick Links

How do you move code safely from one environment to the next? (Sponsor)

Deployment ≠ promotion. This blog, from the creators of ArgoCD, explains why promotion is the missing layer in GitOps stacks. Learn how healthy GitOps uses continuous promotion to govern movements between environments. Read the blog

From air-gapped to private cloud: Security that adapts to your environment (3 minute read)

Cloud-native security must adapt to diverse deployment constraints rather than enforce SaaS models, and Sysdig Secure delivers consistent runtime detection across private cloud, on-premises, and air-gapped environments with flexible, locally controlled implementations.

Cloud Cost Optimization: Principles that still matter (5 minute read)

Cloud cost optimization is a continuous, strategic practice of aligning usage with business value, made more critical by unpredictable, resource-intensive AI workloads that require strong visibility, governance, and iterative management.

Ghostty Is Leaving GitHub (3 minute read)

Mitchell Hashimoto, cofounder of HashiCorp, has announced that he is moving the Ghostty project off GitHub after 18 years of deep personal and professional attachment, citing growing frustration and disappointment with the platform.


Thanks for reading,
Kunal Desai & Martin Hauskrecht