Observability – Page 2 – C4: Container, Code, Cloud & Context

Retrieval Evaluation Metrics: Measuring What Matters in Search and RAG Systems

Posted on August 15, 2025 by Nithin Mohan TK 18 min read

Introduction: Retrieval evaluation is the foundation of building effective RAG systems and search applications. Without proper metrics, you’re flying blind—unable to tell if your retrieval improvements actually help or hurt end-user experience. This guide covers the essential metrics for evaluating retrieval systems: precision and recall at various cutoffs, Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative […]

Read more →

LLM Monitoring and Observability: Metrics, Traces, and Alerts

Posted on June 26, 2025 by Nithin Mohan TK 13 min read

Introduction: LLM applications are notoriously difficult to debug. Unlike traditional software where errors are obvious, LLM issues manifest as subtle quality degradation, unexpected costs, or slow responses. Proper observability is essential for production LLM systems. This guide covers monitoring strategies: tracking latency, tokens, and costs; implementing distributed tracing for complex chains; structured logging for debugging; […]

Read more →

LLM Monitoring and Alerting: Building Observability for Production AI Systems

Posted on February 3, 2025 by Nithin Mohan TK 20 min read

Introduction: LLM monitoring is essential for maintaining reliable, cost-effective AI applications in production. Unlike traditional software where errors are obvious, LLM failures can be subtle—degraded output quality, increased hallucinations, or slowly rising costs that go unnoticed until the monthly bill arrives. Effective monitoring tracks latency, token usage, error rates, output quality, and cost metrics in […]

Read more →

Mastering AWS, EKS, Python, Kubernetes, and Terraform for Monitoring and Observability for SRE: Unveiling the Secrets of Cloud Infrastructure Optimization

Posted on December 8, 2024 by Nithin Mohan TK 6 min read

As the world of software development continues to evolve, the need for robust infrastructures and efficient monitoring systems cannot be overemphasized. Whether you are an engineer, a site reliability engineer (SRE), or an IT manager, the need to harness the power of tools like Amazon Web Services (AWS), Elastic Kubernetes Service (EKS), Kubernetes, Terraform, and […]

Read more →

LLM Evaluation: Metrics, Benchmarks, and A/B Testing

Posted on October 15, 2024 by Nithin Mohan TK 12 min read

Introduction: Evaluating LLM outputs is challenging because there’s often no single “correct” answer. Traditional metrics like BLEU and ROUGE fall short for open-ended generation. This guide covers modern evaluation approaches: automated metrics for specific tasks, LLM-as-judge for quality assessment, human evaluation frameworks, A/B testing in production, and building comprehensive evaluation pipelines. These techniques help you […]

Read more →

LLM Observability: Cost Tracking and Quality Monitoring (Part 2 of 2)

Posted on October 13, 2024 by Nithin Mohan TK 14 min read

Introduction: You can’t improve what you can’t measure. LLM applications are notoriously difficult to debug—prompts are opaque, responses are non-deterministic, and failures often manifest as subtle quality degradation rather than crashes. Observability gives you visibility into every LLM call: what prompts were sent, what responses came back, how long it took, how much it cost, […]

Read more →

Searching in

Tag: Observability

Retrieval Evaluation Metrics: Measuring What Matters in Search and RAG Systems

LLM Monitoring and Observability: Metrics, Traces, and Alerts

LLM Monitoring and Alerting: Building Observability for Production AI Systems

Mastering AWS, EKS, Python, Kubernetes, and Terraform for Monitoring and Observability for SRE: Unveiling the Secrets of Cloud Infrastructure Optimization

LLM Evaluation: Metrics, Benchmarks, and A/B Testing

LLM Observability: Cost Tracking and Quality Monitoring (Part 2 of 2)