AIStackWatch Wiki
AI builder stack wiki
Short, practical explainers for the concepts that show up when teams ship AI into products.
Agentic Coding Explained
Agentic coding hands multi-step programming tasks to an LLM that can read files, run tools, and iterate without constant prompting.
Evals for LLM Applications
Evals are the test suite for your prompts — datasets plus scorers that tell you whether a change made things better or worse.
Function-calling and Tool-use
Tool-use lets an LLM request structured function calls — the app executes them and feeds results back until the task is done.
Function-calling API
Function-calling turns the LLM's output into a typed JSON object that your app can validate and then dispatch to real code.
Inference Providers Compared
Inference providers run open-weight models behind an API so you don't have to own the GPUs.
LLM Context Windows
The context window is the maximum number of tokens an LLM can attend to in a single call — inputs plus outputs.
LLM Observability Stack
LLM observability captures every prompt, response, tool call, and cost so you can debug and improve a production pipeline.
LLM Safety Filters
Safety filters block or rewrite outputs that violate provider policy — necessary protection with real tradeoffs on legitimate requests.
Model Context Protocol
MCP is an open protocol for plugging tools, data sources, and prompts into LLM agents — a USB-C for model context.
pgvector vs Dedicated Vector Databases
pgvector adds vector search to Postgres — good enough for most apps, until your index outgrows a single node.
Prompt Caching for LLM APIs
Prompt caching lets LLM providers reuse KV-cache state for repeated prefixes, cutting cost and latency on long, stable system prompts.
Retrieval-Augmented Generation
RAG injects retrieved documents into an LLM prompt so the model can answer from your data instead of its training corpus.
Streaming LLM Responses
Streaming emits tokens as they're generated so users see output immediately instead of waiting for the whole response.
System Prompt Engineering
The system prompt sets the model's role, rules, and output shape, which often makes it the highest-leverage part of an LLM app.
Text Embeddings
An embedding is a fixed-length float vector that represents the semantic meaning of a piece of text.
Tokens, Context, and Cost
LLM bills are driven by input and output tokens — knowing how they multiply is the difference between a profitable feature and a money pit.
TTFT and p95 Latency for LLMs
LLM latency breaks into time to first token plus a per-token streaming rate — both matter, but TTFT is what users feel.
What is a Vector Database?
A vector database stores high-dimensional embeddings and answers nearest-neighbor queries, usually approximate ones, in sub-second time.
What is an Agent Framework?
An agent framework orchestrates multi-step LLM workflows with tool-use, memory, and branching logic.
When to Fine-tune
Fine-tuning adjusts model weights on your data — powerful for narrow tasks, wasteful as a first resort.