Artificial Intelligence(AI) – Page 9 – C4: Container, Code, Cloud & Context

Function Calling Deep Dive: Building LLM-Powered Tools and Agents

Posted on April 15, 2025 by Nithin Mohan TK (9 min read)

Introduction: Function calling transforms LLMs from text generators into action-taking agents. Instead of just describing what to do, the model can actually do it—query databases, call APIs, execute code, and interact with external systems. OpenAI’s function calling (now called “tools”) and similar features from Anthropic and others let you define available functions, and the model […]

Quantization Methods for LLMs: GPTQ, AWQ, and BitsAndBytes

Posted on April 8, 2025 by Nithin Mohan TK (5 min read)

Last year, I needed to run a 13B-parameter model on a 16GB GPU. At full precision (FP32, 4 bytes per parameter), the weights alone required 52GB. After testing GPTQ, AWQ, and BitsAndBytes, I reduced memory to 7GB with minimal accuracy loss. After quantizing 30+ models, I’ve learned which method works best for each scenario. Here’s the complete guide to LLM quantization. Figure 1: […]

Advanced RAG Patterns: From Naive Retrieval to Production-Grade Systems (Part 1 of 2)

Posted on April 7, 2025 by Nithin Mohan TK (12 min read)

Introduction: Retrieval-Augmented Generation (RAG) has become the go-to architecture for building LLM applications that need access to private or current information. By retrieving relevant documents and including them in the prompt, RAG grounds LLM responses in factual content, reducing hallucinations and enabling knowledge that wasn’t in the training data. But naive RAG implementations often disappoint—the […]

LLM Security: Defense Patterns for Production Applications (Part 2 of 2)

Posted on March 30, 2025 by Nithin Mohan TK (12 min read)

Introduction: LLM applications face unique security challenges—prompt injection, data leakage, jailbreaking, and harmful content generation. Traditional security measures don’t address these AI-specific threats. This guide covers defensive techniques for production LLM systems: input sanitization, prompt injection detection, output filtering, rate limiting, content moderation, and audit logging. These patterns help you build LLM applications that are […]

Production RAG Architecture: Building Scalable Vector Search Systems

Posted on March 14, 2025 by Nithin Mohan TK (4 min read)

Three months into production, our RAG system started failing at 2 AM. Not gracefully: complete outages. The problem wasn’t the models or the embeddings. It was the architecture. After rebuilding it twice, here’s what I learned about building RAG systems that actually work in production. [Figure 1: Production RAG Architecture Overview] The Night Everything Broke: It was […]

LLM Fine-Tuning Techniques: From LoRA to Full Parameter Training

Posted on February 28, 2025 by Nithin Mohan TK (19 min read)

Introduction: Fine-tuning transforms general-purpose LLMs into specialized models that excel at your specific tasks. While prompting can get you far, fine-tuning unlocks capabilities that prompting alone cannot achieve: consistent output formats, domain-specific knowledge, reduced latency from shorter prompts, and behavior that would require extensive few-shot examples. This guide covers the practical aspects of LLM fine-tuning: […]
