AIStackWatch

AIStackWatch Wiki

AI builder stack wiki

Short, practical explainers for the concepts that show up when teams ship AI into products.

Agentic Coding Explained

Agentic coding hands multi-step programming tasks to an LLM that can read files, run tools, and iterate without constant prompting.

Evals for LLM Applications

Evals are the test suite for your prompts — datasets plus scorers that tell you whether a change made things better or worse.

Function-calling and Tool-use

Tool-use lets an LLM request structured function calls — the app executes them and feeds results back until the task is done.

Function-calling API

Function-calling turns the LLM's output into a structured JSON object that your app can validate against a schema and dispatch to real code.

Inference Providers Compared

Inference providers run open-weight models behind an API so you don't have to own the GPUs.

LLM Context Windows

The context window is the maximum number of tokens an LLM can attend to in a single call — inputs plus outputs.

LLM Observability Stack

LLM observability captures every prompt, response, tool call, and cost so you can debug and improve a production pipeline.

LLM Safety Filters

Safety filters block or rewrite outputs that violate provider policy — necessary protection with real tradeoffs on legitimate requests.

Model Context Protocol

MCP is an open protocol for plugging tools, data sources, and prompts into LLM agents — a USB-C for model context.

pgvector vs Dedicated Vector Databases

pgvector adds vector search to Postgres — good enough for most apps, until your index outgrows a single node.

Prompt Caching for LLM APIs

Prompt caching lets LLM providers reuse KV-cache state for repeated prefixes, cutting cost and latency on long, stable system prompts.

Retrieval-Augmented Generation

RAG injects retrieved documents into an LLM prompt so the model can answer from your data instead of its training corpus.

Streaming LLM Responses

Streaming emits tokens as they're generated so users see output immediately instead of waiting for the whole response.

System Prompt Engineering

The system prompt sets the model's role, rules, and output shape, which often makes it the highest-leverage part of an LLM app.

Text Embeddings

An embedding is a fixed-length float vector that represents the semantic meaning of a piece of text.

Tokens, Context, and Cost

LLM bills are driven by input and output tokens — knowing how they multiply is the difference between a profitable feature and a money pit.

TTFT and p95 Latency for LLMs

LLM latency breaks into time to first token (TTFT) plus the per-token generation time that follows; both matter, but TTFT is what users feel first.

What is a Vector Database?

A vector database stores high-dimensional embeddings and answers nearest-neighbor queries over them, usually approximately and in sub-second time.

What is an Agent Framework?

An agent framework orchestrates multi-step LLM workflows with tool-use, memory, and branching logic.

When to Fine-tune

Fine-tuning adjusts model weights on your data — powerful for narrow tasks, wasteful as a first resort.
