Introduction: LLM APIs have strict rate limits—requests per minute, tokens per minute, and concurrent request limits. Exceeding these limits results in 429 errors that can cascade through your application. Effective rate limiting on your side prevents hitting API limits, provides fair access across users, and enables graceful degradation under load. This guide covers practical rate […]
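For a flavor of the client-side throttling this post is about, here is a minimal sketch that caps concurrent requests with an asyncio semaphore and enforces a simple requests-per-minute window; the limits and the `call_llm` coroutine are illustrative assumptions, not code from the post.

```python
import asyncio
import time

class RateLimiter:
    """Client-side throttle: caps concurrent requests and requests per minute."""

    def __init__(self, max_concurrent: int = 5, requests_per_minute: int = 60):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._rpm = requests_per_minute
        self._timestamps: list[float] = []  # start times of recent requests

    async def acquire(self) -> None:
        await self._semaphore.acquire()
        # Drop timestamps older than 60s, then wait if the window is full.
        now = time.monotonic()
        self._timestamps = [t for t in self._timestamps if now - t < 60]
        if len(self._timestamps) >= self._rpm:
            await asyncio.sleep(60 - (now - self._timestamps[0]))
        self._timestamps.append(time.monotonic())

    def release(self) -> None:
        self._semaphore.release()

async def call_with_limit(limiter: RateLimiter, prompt: str) -> str:
    await limiter.acquire()
    try:
        return await call_llm(prompt)  # hypothetical LLM API call; may raise on 429
    finally:
        limiter.release()
```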
Read more →

Tag: LLM
LLM Security: Understanding Prompt Injection, Jailbreaking, and Attack Vectors (Part 1 of 2)
A comprehensive guide to securing LLM applications against prompt injection, jailbreaking, and data exfiltration attacks. Includes production-ready defense implementations.
Read more →

LLM Batch Processing: Scaling AI Workloads from Hundreds to Millions
Introduction: Processing thousands or millions of items through LLMs requires different patterns than single-request applications. Naive sequential processing is too slow, while uncontrolled parallelism hits rate limits and wastes money on retries. This guide covers production batch processing patterns: chunking strategies, parallel execution with rate limiting, progress tracking, checkpoint/resume for long jobs, cost estimation, and […]
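As a rough illustration of the chunking, bounded-parallelism, and checkpoint/resume patterns the excerpt mentions, here is a sketch; `process_item` and the checkpoint file name are assumptions for the example, not details from the article.

```python
import asyncio
import json
from pathlib import Path

CHECKPOINT = Path("batch_checkpoint.json")  # hypothetical resume file

def chunked(items, size):
    """Yield fixed-size chunks so progress can be checkpointed between them."""
    for i in range(0, len(items), size):
        yield i, items[i:i + size]

async def run_batch(items, concurrency=8, chunk_size=100):
    done = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}
    sem = asyncio.Semaphore(concurrency)  # cap in-flight LLM calls

    async def worker(idx, item):
        async with sem:
            return idx, await process_item(item)  # hypothetical per-item LLM call

    for start, chunk in chunked(items, chunk_size):
        pending = [worker(start + j, it) for j, it in enumerate(chunk)
                   if str(start + j) not in done]
        for idx, result in await asyncio.gather(*pending):
            done[str(idx)] = result
        CHECKPOINT.write_text(json.dumps(done))  # checkpoint after each chunk
    return done
```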
Read more →

LLM Output Formatting: JSON Mode, Pydantic Parsing, and Template-Based Outputs
Introduction: LLM outputs are inherently unstructured text, but applications need structured data—JSON objects, typed responses, specific formats. Getting reliable structured output requires careful prompt engineering, output parsing, validation, and error recovery. This guide covers practical output formatting techniques: JSON mode and structured outputs, Pydantic-based parsing, format enforcement with retries, template-based formatting, and strategies for handling […]
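A sketch of the Pydantic parse-and-retry loop in the spirit of this excerpt (assuming Pydantic v2); the `Invoice` schema and the `complete` function are hypothetical stand-ins rather than the article's own code.

```python
import json
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):  # hypothetical target schema
    vendor: str
    total: float

def parse_with_retries(prompt: str, max_attempts: int = 3) -> Invoice:
    """Ask for JSON, validate with Pydantic, and feed errors back on failure."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = complete(current_prompt)  # hypothetical LLM call returning text
        try:
            return Invoice.model_validate(json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as err:
            # Retry with the validation error appended so the model can correct itself.
            current_prompt = (
                f"{prompt}\n\nYour last output was invalid: {err}\nReturn valid JSON only."
            )
    raise ValueError("No valid structured output after retries")
```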
Read more →

LLM Chain Composition: Building Complex AI Workflows with Sequential, Parallel, and Conditional Patterns
Introduction: Complex LLM applications rarely consist of a single prompt—they chain multiple steps together, each building on the previous output. Chain composition enables sophisticated workflows: retrieval-augmented generation, multi-step reasoning, iterative refinement, and conditional branching. Understanding how to compose chains effectively is essential for building production LLM systems. This guide covers practical chain patterns: sequential chains, […]
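The simplest of the patterns listed, a sequential chain, can be sketched as plain function composition; the step names in the usage comment are hypothetical.

```python
from typing import Callable

Step = Callable[[str], str]  # each step maps the previous output to a new output

def sequential_chain(steps: list[Step]) -> Step:
    """Compose steps so each one receives the previous step's output."""
    def run(initial_input: str) -> str:
        output = initial_input
        for step in steps:
            output = step(output)
        return output
    return run

# Hypothetical usage: retrieve -> summarize -> answer, each step wrapping an LLM call.
# pipeline = sequential_chain([retrieve_context, summarize, answer_question])
# result = pipeline("What changed in the Q3 report?")
```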
Read more →

Building LLM Agents with Tools: From Simple Loops to Production Systems
Introduction: LLM agents extend language models beyond text generation into autonomous action. By connecting LLMs to tools—web search, code execution, APIs, databases—agents can gather information, perform calculations, and interact with external systems. This guide covers building tool-using agents from scratch: defining tools with schemas, implementing the reasoning loop, handling tool execution, managing conversation state, and […]
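To make the reasoning loop concrete, here is a minimal sketch of an agent that either calls a registered tool or returns a final answer; the `TOOLS` registry and `llm_step` helper are assumptions made for the example.

```python
import json

# Hypothetical tool registry: name -> (parameter description, callable)
TOOLS = {
    "search": ({"query": "string"}, lambda query: f"results for {query}"),
}

def run_agent(task: str, max_steps: int = 5) -> str:
    """Minimal reason-act loop: each step either invokes a tool or answers."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm_step(messages, TOOLS)  # hypothetical call returning a dict
        if reply.get("tool"):
            _, fn = TOOLS[reply["tool"]]
            result = fn(**reply["arguments"])
            # Feed the tool result back so the next step can build on it.
            messages.append({"role": "tool", "content": json.dumps({"result": result})})
        else:
            return reply["content"]
    return "Stopped after max_steps without a final answer"
```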
Read more →