Building Multi-Agent Workflows: Advanced LangGraph Patterns

Building multi-agent workflows requires careful orchestration. After building 18+ multi-agent systems with LangGraph, I’ve learned what works. Here’s the complete guide to advanced LangGraph patterns for multi-agent workflows.

Figure 1: Multi-Agent Architecture with LangGraph

Why Multi-Agent Workflows

Multi-agent systems offer significant advantages:

- Specialization: Each agent handles specific tasks
- Parallelism: Agents can work simultaneously
- Scalability: Add […]
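To make the pattern concrete, here is a minimal sketch of a two-agent pipeline built on LangGraph's StateGraph API. The node functions are placeholder stand-ins for LLM-backed agents, and the state schema is an assumption for illustration, not the architecture from the post.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    topic: str
    draft: str
    review: str


def researcher(state: State) -> dict:
    # Stand-in for an LLM-backed research agent.
    return {"draft": f"notes on {state['topic']}"}


def reviewer(state: State) -> dict:
    # Stand-in for an LLM-backed review agent.
    return {"review": f"approved: {state['draft']}"}


graph = StateGraph(State)
graph.add_node("researcher", researcher)
graph.add_node("reviewer", reviewer)
graph.add_edge(START, "researcher")
graph.add_edge("researcher", "reviewer")  # sequential hand-off between agents
graph.add_edge("reviewer", END)

app = graph.compile()
print(app.invoke({"topic": "LangGraph patterns", "draft": "", "review": ""}))
```

Each node returns a partial state update, which LangGraph merges back into the shared state before the next agent runs.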

Streaming Responses for LLMs: Implementing Server-Sent Events

Streaming LLM responses dramatically improves user experience. After implementing streaming for 20+ LLM applications, I’ve learned what works. Here’s the complete guide to implementing Server-Sent Events for LLM streaming.

Figure 1: Streaming Architecture

Why Streaming Matters

Streaming LLM responses provides significant benefits:

- Perceived performance: Users see results immediately, not after 10+ seconds
- Better UX: Progressive […]
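As a concrete illustration of the SSE wire format, here is a minimal streaming endpoint sketch using FastAPI's StreamingResponse. The token generator is a stand-in for a real LLM client's streaming iterator.

```python
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def token_stream():
    # Stand-in for an LLM client's streaming iterator.
    for token in ["Streaming", " feels", " instant"]:
        await asyncio.sleep(0.1)
        # Each SSE frame is a "data:" line terminated by a blank line.
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"


@app.get("/stream")
async def stream():
    return StreamingResponse(token_stream(), media_type="text/event-stream")
```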

RESTful AI API Design: Best Practices for LLM APIs

Designing RESTful APIs for LLMs requires careful consideration. After building 30+ LLM APIs, I’ve learned what works. Here’s the complete guide to RESTful AI API design.

Figure 1: RESTful AI API Architecture

Why LLM APIs Are Different

LLM APIs have unique requirements:

- Async operations: LLM inference can take seconds or minutes
- Streaming responses: Need to […]
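One common way to satisfy the async requirement is a job-based pattern: the POST returns 202 Accepted with a job ID, and the client polls a GET endpoint for the result. A minimal FastAPI sketch follows; the endpoint paths and in-memory job store are illustrative assumptions, and the background inference worker is omitted.

```python
import uuid

from fastapi import FastAPI, HTTPException

app = FastAPI()
jobs: dict[str, dict] = {}  # in-memory store; production needs something durable


@app.post("/v1/completions", status_code=202)
async def create_completion(payload: dict):
    # Accept immediately; a background worker would run the actual inference.
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}
    return {"job_id": job_id, "status": "pending"}


@app.get("/v1/completions/{job_id}")
async def get_completion(job_id: str):
    job = jobs.get(job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="unknown job")
    return job
```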

Quantization Methods for LLMs: GPTQ, AWQ, and BitsAndBytes

Last year, I needed to run a 13B parameter model on a 16GB GPU. Full precision required 52GB. After testing GPTQ, AWQ, and BitsAndBytes, I reduced memory to 7GB with minimal accuracy loss. After quantizing 30+ models, I’ve learned which method works best for each scenario. Here’s the complete guide to LLM quantization.

Figure 1: […]
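As a taste of how small the code change can be, here is a sketch of 4-bit NF4 loading with BitsAndBytes via the transformers integration; the model ID is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-13b-hf"  # illustrative; any 13B causal LM

# NF4 4-bit weights with bfloat16 compute: roughly half a byte per parameter,
# which is how a 13B model drops from 52GB in full precision to the 7GB range.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```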

Advanced LoRA Techniques: Multi-LoRA, LoRA+, and Beyond

Last year, I fine-tuned a 7B parameter model with standard LoRA. It worked, but accuracy was 5% lower than full fine-tuning. After experimenting with Multi-LoRA, LoRA+, and advanced techniques, I’ve achieved 98% of full fine-tuning performance with 1% of the parameters. Here’s everything you need to know about advanced LoRA techniques.

Figure 1: LoRA Techniques […]
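For reference, the standard LoRA baseline the advanced techniques build on looks like this with the peft library; the rank and target modules are illustrative defaults, not the tuned values from the post.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative

config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # which projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically ~1% of the base parameters
```

LoRA+ keeps this setup but trains the B matrices with a higher learning rate than the A matrices, which is where much of the accuracy gap to full fine-tuning closes.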

Production RAG Architecture: Building Scalable Vector Search Systems

Three months into production, our RAG system started failing at 2AM. Not gracefully: complete outages. The problem wasn’t the models or the embeddings. It was the architecture. After rebuilding it twice, here’s what I learned about building RAG systems that actually work in production.

Figure 1: Production RAG Architecture Overview

The Night Everything Broke

It was […]
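The point about failing gracefully is worth a sketch: retrieval should degrade, not take the whole system down. Here is a minimal timeout-plus-fallback wrapper, with a hypothetical vector_search function standing in for the real vector store client.

```python
import asyncio


async def vector_search(query: str) -> list[str]:
    # Hypothetical stand-in for the real vector store client.
    await asyncio.sleep(0.05)
    return [f"doc matching {query!r}"]


async def retrieve_with_fallback(query: str, timeout_s: float = 0.5) -> list[str]:
    """Degrade gracefully when retrieval is slow or down, not fail outright."""
    try:
        return await asyncio.wait_for(vector_search(query), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        # Answer without retrieved context (or hit a cache / keyword index)
        # rather than turning a slow dependency into a full outage.
        return []


print(asyncio.run(retrieve_with_fallback("production RAG")))
```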
