LLM – Page 4 – C4: Container, Code, Cloud & Context

Streaming Responses for LLMs: Implementing Server-Sent Events

Posted on June 15, 2025 by Nithin Mohan TK 10 min read

Streaming LLM responses dramatically improves user experience. After implementing streaming for 20+ LLM applications, I’ve learned what works. Here’s the complete guide to implementing Server-Sent Events for LLM streaming.

Figure 1: Streaming Architecture

Why Streaming Matters

Streaming LLM responses provides significant benefits:

Perceived performance: Users see results immediately, not after 10+ seconds
Better UX: Progressive […]
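As a rough illustration of the Server-Sent Events wire format the excerpt refers to, the sketch below frames text chunks as SSE messages: each payload line is prefixed with `data: ` and a blank line ends the event. The helper names `format_sse` and `stream_tokens`, and the `[DONE]` end marker, are assumptions made for this example and are not taken from the post.

```python
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame one payload as a Server-Sent Events message.

    Multi-line payloads get one `data:` field per line, and the
    message is terminated by a blank line.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE message per token, then a final end marker.

    The "[DONE]" marker is a hypothetical convention for this sketch.
    """
    for token in tokens:
        yield format_sse(token)
    yield format_sse("[DONE]", event="done")


if __name__ == "__main__":
    # Stand-in for chunks arriving from a model; prints the raw frames.
    for frame in stream_tokens(["Hel", "lo"]):
        print(repr(frame))
```

In a web framework, a generator like `stream_tokens` would typically be returned as the response body with the `text/event-stream` content type, so the browser can render each chunk as it arrives.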


Fine-Tuning LLMs: From Data Preparation to Production Deployment

Posted on May 17, 2025 by Nithin Mohan TK 6 min read

Introduction: Fine-tuning transforms a general-purpose LLM into a specialized model tailored to your domain, style, or task. While prompt engineering can get you far, fine-tuning offers consistent behavior, reduced token usage, and capabilities that prompting alone cannot achieve. This guide covers the complete fine-tuning workflow—from data preparation to deployment—using both cloud APIs (OpenAI, Together AI) […]


Model Routing Strategies: Intelligent Request Distribution Across LLMs

Posted on May 8, 2025 by Nithin Mohan TK 18 min read

Introduction: Not every request needs GPT-4. Simple questions can be handled by smaller, faster, cheaper models, while complex reasoning tasks benefit from more capable ones. Model routing intelligently directs requests to the most appropriate model based on task complexity, cost constraints, latency requirements, and quality needs. This approach can reduce costs by 50-80% while maintaining […]
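The routing idea described above can be sketched with a deliberately simple heuristic: send short prompts without reasoning cues to a cheaper model and everything else to a more capable one. The tier names, prices, keyword list, and word-count threshold below are all hypothetical placeholders chosen for illustration; a real router would more likely use a trained classifier or measured quality data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, illustration only


# Hypothetical tiers; substitute the models actually available to you.
SMALL = ModelTier("small-model", 0.0005)
LARGE = ModelTier("large-model", 0.03)

# Hypothetical cues suggesting a prompt needs multi-step reasoning.
REASONING_HINTS = ("why", "prove", "step by step", "compare", "analyze")


def route(prompt: str, max_simple_words: int = 40) -> ModelTier:
    """Pick a tier from prompt length and keyword cues (a crude heuristic)."""
    text = prompt.lower()
    is_long = len(text.split()) > max_simple_words
    has_hint = any(hint in text for hint in REASONING_HINTS)
    return LARGE if (is_long or has_hint) else SMALL


if __name__ == "__main__":
    for prompt in (
        "What is the capital of France?",
        "Analyze the trade-offs and explain step by step.",
    ):
        print(f"{prompt!r} -> {route(prompt).name}")
```

The cost saving comes from how often the cheap tier is chosen, so the threshold and cues are the parts worth tuning against real traffic.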


Function Calling Deep Dive: Building LLM-Powered Tools and Agents

Posted on April 15, 2025 by Nithin Mohan TK 9 min read

Introduction: Function calling transforms LLMs from text generators into action-taking agents. Instead of just describing what to do, the model can actually do it—query databases, call APIs, execute code, and interact with external systems. OpenAI’s function calling (now called “tools”) and similar features from Anthropic and others let you define available functions, and the model […]


Quantization Methods for LLMs: GPTQ, AWQ, and BitsAndBytes

Posted on April 8, 2025 by Nithin Mohan TK 5 min read

Last year, I needed to run a 13B parameter model on a 16GB GPU. Full precision required 52GB. After testing GPTQ, AWQ, and BitsAndBytes, I reduced memory to 7GB with minimal accuracy loss. After quantizing 30+ models, I’ve learned which method works best for each scenario. Here’s the complete guide to LLM quantization. Figure 1: […]
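The memory figures in this excerpt follow from simple arithmetic on weight storage: parameters times bits per weight, divided by eight. The sketch below reproduces the 52GB full-precision figure for a 13B model and shows what lower bit widths would need. It counts weights only, and the 4-bit row is an assumption about how a figure near 7GB could arise; the excerpt does not state which bit width was used, and real quantized models add overhead for scales, activations, and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 13e9  # the 13B-parameter model from the excerpt
    for label, bits in (("FP32", 32), ("FP16", 16), ("INT8", 8), ("4-bit", 4)):
        print(f"{label:>5}: {weight_memory_gb(params, bits):.1f} GB")
```

At 32 bits per weight this gives 52 GB, matching the excerpt, and at 4 bits it gives 6.5 GB before any overhead.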


Enterprise Machine Learning in Production: Healthcare and Financial Services Case Studies

Posted on March 31, 2025 by Nithin Mohan TK 4 min read

Real-world enterprise ML implementations in healthcare diagnostics and financial fraud detection. Explore RAG and LLM integration patterns, ML maturity frameworks, and strategic recommendations for building ML-enabled organizations.

