In January 2026, Microsoft and NVIDIA released the second iteration of the NVIDIA Dynamo Planner—a groundbreaking tool for optimizing large language model (LLM) inference on Azure Kubernetes Service (AKS). This collaboration addresses one of the most challenging aspects of production AI: efficiently scaling GPU resources to balance cost, latency, and throughput. This comprehensive guide explores […]
Read more →
Kubernetes 1.35: In-Place Pod Resource Updates and AI Model Image Volumes
Kubernetes 1.35, released in January 2026 and now supported on Amazon EKS and EKS Distro, marks a significant milestone in container orchestration—particularly for AI/ML workloads. This release introduces In-Place Pod Resource Updates, allowing you to resize CPU and memory without restarting pods, and Image Volumes, a game-changer for delivering large AI models using OCI container […]
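The two features named in this excerpt are concrete enough to sketch together. Below is a minimal, illustrative pod definition (not taken from the linked post) that mounts model weights from a hypothetical OCI artifact via an image volume and declares a resize policy so CPU and memory can be adjusted in place. It assumes the official kubernetes Python client, a cluster where the ImageVolume feature is enabled, and placeholder image references.

```python
# A minimal sketch (not from the post) combining the two Kubernetes features
# described above: an `image` volume that mounts model weights from a
# hypothetical OCI artifact, plus a container resizePolicy that allows CPU and
# memory to change without restarting the container (in-place pod resize).
# Assumes the official `kubernetes` Python client and a kubeconfig pointing at
# a cluster with the ImageVolume feature enabled; image names are placeholders.
from kubernetes import client, config

MODEL_IMAGE = "registry.example.com/models/llm-8b:latest"  # hypothetical OCI artifact

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-server"},
    "spec": {
        "containers": [{
            "name": "inference",
            "image": "ghcr.io/example/inference-server:latest",  # placeholder serving image
            # Let the kubelet apply CPU/memory changes in place instead of
            # restarting the container.
            "resizePolicy": [
                {"resourceName": "cpu", "restartPolicy": "NotRequired"},
                {"resourceName": "memory", "restartPolicy": "NotRequired"},
            ],
            "resources": {
                "requests": {"cpu": "4", "memory": "16Gi"},
                "limits": {"cpu": "8", "memory": "32Gi"},
            },
            # Model weights appear as read-only files under /models.
            "volumeMounts": [{"name": "model-weights", "mountPath": "/models", "readOnly": True}],
        }],
        # Image volume: the OCI artifact is pulled and mounted as a volume, so
        # large model files are not baked into the serving image.
        "volumes": [{
            "name": "model-weights",
            "image": {"reference": MODEL_IMAGE, "pullPolicy": "IfNotPresent"},
        }],
    },
}

if __name__ == "__main__":
    config.load_kube_config()
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
    # A later resize is applied through the pod's `resize` subresource
    # (for example with `kubectl patch ... --subresource resize`) rather than
    # by deleting and recreating the pod.
```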
Read more →
2025 in Review: The Infrastructure Readiness Lesson
2025 taught enterprise technology leaders a critical lesson: infrastructure readiness matters more than model capability. This year-end review explores platform engineering, data governance, healthcare AI breakthroughs, and five predictions for 2026.
Read more →
GPU Resource Management in Cloud: Optimizing AI Workloads
GPU resource management is critical for cost-effective AI workloads. After managing GPU resources for 40+ AI projects, I’ve learned what works. Here’s the complete guide to optimizing GPU resources in the cloud.
Figure 1: GPU Resource Management Architecture
Why GPU Resource Management Matters
GPU resources are expensive and limited: Cost: GPUs are the most expensive […]
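The excerpt stops at the cost argument, so the following is only a generic sketch rather than anything from the guide itself: a quick inventory of GPU capacity per node, which is usually the starting point for the cost and utilization questions the post raises. It assumes the NVIDIA device plugin registers the nvidia.com/gpu resource and that the official kubernetes Python client can reach the cluster.

```python
# A generic sketch (not from the guide): before optimizing GPU cost, list what
# each node actually exposes. This reads every node's advertised capacity and
# allocatable amount for the `nvidia.com/gpu` extended resource, which the
# NVIDIA device plugin registers. Assumes the official kubernetes Python
# client and a kubeconfig for the target cluster.
from kubernetes import client, config

def gpu_inventory() -> None:
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        capacity = node.status.capacity or {}
        allocatable = node.status.allocatable or {}
        # Capacity is the total on the node; allocatable is what remains for
        # pods after system reservations.
        total = capacity.get("nvidia.com/gpu", "0")
        usable = allocatable.get("nvidia.com/gpu", "0")
        print(f"{node.metadata.name}: capacity={total} GPUs, allocatable={usable}")

if __name__ == "__main__":
    gpu_inventory()
```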
Read more →