AI/ML – Page 9 – C4: Container, Code, Cloud & Context

Deploying LLM Applications on Cloud Run: A Complete Guide

Posted on November 5, 2024 by Nithin Mohan TK | 6 min read

Last year, I deployed our first LLM application to Cloud Run. What should have taken hours took three days. Cold starts killed our latency. Memory limits caused crashes. Timeouts broke long-running requests. After deploying 20+ LLM applications to Cloud Run, I’ve learned what works and what doesn’t. Here’s the complete guide. Figure 1: Cloud Run […]

Cost Optimization for AI Workloads: Tracking and Reducing LLM Costs

Posted on July 8, 2024 by Nithin Mohan TK | 5 min read

Last quarter, our LLM costs hit $12,000. In a single month. We had no idea where the money was going. No tracking, no budgets, no alerts. That’s when I realized: cost optimization isn’t optional for AI workloads—it’s survival. Here’s how we cut costs by 65% without sacrificing quality. Figure 1: Cost Optimization Architecture The $12,000 […]

Prompt Performance Monitoring: Tracking LLM Response Quality

Posted on June 12, 2024 by Nithin Mohan TK | 6 min read

Three weeks after launching our AI customer support system, we noticed something strange. Response quality was degrading—slowly, almost imperceptibly. Users weren’t complaining yet, but satisfaction scores were dropping. The problem? We had no way to measure prompt performance. We were optimizing blind. That’s when I built a comprehensive prompt performance monitoring system. Figure 1: Prompt […]
