You’ll build a production-ready n8n chatbot that answers from your docs, keeps context across messages, escalates when unsure, and logs everything for QA
n8n Chatbot Overview
What you’ll learn: How the chatbot connects to channels, uses RAG for grounded answers, and escalates when confidence is low
You’ll wire an n8n chatbot backend that serves a website widget, Telegram, or WhatsApp. It keeps conversation context, grounds answers in your docs using RAG (Retrieval‑Augmented Generation), and hands off to humans when needed
- Channels: website widget, Telegram Bot API, WhatsApp gateways
- Core flow: Webhook → memory (Redis or Postgres) → RAG → AI → escalate or log → reply
- Outcomes: fewer tickets, faster first response, auditable support
The Toyota andon cord ends guesswork: pull the cord, get help. Great support bots do the same: they know when to escalate
Mermaid: Channel entry points
flowchart TD
W[Web Widget] --> WH[Webhook]
T[Telegram Bot] --> WH
H[WhatsApp] --> WH
WH --> P[Process Flow]
classDef trigger fill:#e1f5fe,stroke:#01579b
classDef process fill:#fff3e0,stroke:#ef6c00
class W,T,H trigger
class WH,P process
Next, let’s map the architecture and required setup
Architecture Setup
What you’ll learn: The core nodes, webhook design, payloads, and response modes that make the chatbot reliable in production
At a glance, your n8n chat flow is a simple request → process → response pipeline with clear handoff points to human support
Mermaid: End-to-end flow
flowchart TD
A[Webhook] --> B[Load Memory]
B --> C[RAG Search]
C --> D[AI Reply]
D --> E{Escalate?]
E -->|Yes| F[Create Ticket]
E -->|No| G[Send Reply]
classDef trigger fill:#e1f5fe,stroke:#01579b
classDef process fill:#fff3e0,stroke:#ef6c00
classDef action fill:#e8f5e8,stroke:#2e7d32
classDef alert fill:#f3e5f5,stroke:#7b1fa2
class A trigger
class B,C,D process
class F action
class G action
class E alert
Components
- Webhook trigger: receives messages from each channel
- Data store: Redis for speed or Postgres for durability
- RAG: vector search over your docs using a vector database such as Pinecone or pgvector (Postgres extension)
- AI: LLM (Large Language Model) via n8n AI or Agent node
- Escalation and logging: Zendesk or Jira for tickets, Slack for alerts, structured DB logs for QA
Webhook basics (n8n)
- Add a Webhook node with Method POST and Path /api/support/chat
- Auth: require an X-API-Key header and validate it before running any logic
- Response control: use Respond to Webhook node for precise timing and headers
Example payload (frontend - n8n)
{
"userId": "u_1299",
"sessionId": "sess_78f3",
"channel": "web",
"message": "Does the Pro plan include SSO?"
}
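The payload above is worth validating and normalizing before any other node runs. A minimal sketch, assuming the field names from the example; the default channel and the 2000-character cap are illustrative choices, not n8n requirements:

```javascript
// Hypothetical normalizer for the incoming chat payload.
// Field names match the example payload; defaults are assumptions.
function normalizePayload(body) {
  const required = ["userId", "sessionId", "message"];
  for (const field of required) {
    if (!body[field] || typeof body[field] !== "string") {
      throw new Error(`invalid payload: missing ${field}`);
    }
  }
  return {
    userId: body.userId,
    sessionId: body.sessionId,
    channel: body.channel || "web",              // default channel if omitted
    message: body.message.trim().slice(0, 2000), // cap length to deter abuse
  };
}
```

Rejecting malformed requests this early keeps every downstream node free of defensive checks.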
Response modes
| Mode | Timing | Use |
|---|---|---|
| Immediately | Before workflow runs | Health checks or quick ACK |
| Last Node | After full run | Simple bots |
| Respond to Webhook | Wherever you place it | Production bots requiring full control |
Pro tip: gate the flow early with a Code node
// n8n Code node (mode: Run Once for Each Item)
const key = $json.headers?.["x-api-key"];
if (!key || key !== $env.SUPPORT_BOT_KEY) throw new Error("unauthorized");
return $input.item;
Use Respond to Webhook for production so you control timing, headers, and body across success, escalation, and error paths
With the webhook in place, let’s persist context and ground answers in your docs
Memory and RAG
What you’ll learn: How to persist chat history, retrieve focused context, and keep answers grounded with RAG
Stateless webhooks forget everything. Persist history by user or session and fetch it per request to build a focused context window
Conversation memory
- Redis keys: chat:<userId>:messages as a list, with an optional TTL (time to live) for cleanup
- Postgres table: durable and queryable for analytics
SQL schema
CREATE TABLE chat_messages (
id BIGSERIAL PRIMARY KEY,
user_id TEXT NOT NULL,
session_id TEXT,
role TEXT CHECK (role IN ('user','assistant','system')),
content TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON chat_messages (user_id, created_at);
Mermaid: Minimal ERD
erDiagram
ChatMessage {
int id
string user_id
string session_id
string role
string content
datetime created_at
}
- Fetch the last N turns (for example, 10) to cap tokens
- Summarize older context: compress long threads into a brief summary stored alongside history
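The two bullets above combine into one context-window step: keep the last N turns verbatim and collapse anything older into a single summary message. A sketch with a naive placeholder summarizer; in the workflow you would call a small LLM for that step instead:

```javascript
// Keep the last MAX_TURNS messages verbatim; compress older ones.
const MAX_TURNS = 10;

function summarize(messages) {
  // Placeholder: replace with an LLM summarization call in production.
  return `Earlier conversation (${messages.length} messages): ` +
    messages.map((m) => m.content.slice(0, 40)).join(" | ");
}

function buildContextWindow(history) {
  if (history.length <= MAX_TURNS) return history;
  const older = history.slice(0, history.length - MAX_TURNS);
  const recent = history.slice(-MAX_TURNS);
  return [{ role: "system", content: summarize(older) }, ...recent];
}
```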
Map identity per channel (n8n expression)
{{ $json.channel === 'telegram' ? $json.message.from.id : $json.userId }}
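The same mapping as a plain function, easier to extend as channels grow. The channel prefixes and the WhatsApp `from` field are assumptions for illustration; prefixing avoids id collisions across channels:

```javascript
// Resolve a stable per-user key from a channel-specific payload.
// Prefixes and the WhatsApp field name are illustrative assumptions.
function resolveUserId(payload) {
  switch (payload.channel) {
    case "telegram":
      return `tg_${payload.message.from.id}`; // Telegram numeric sender id
    case "whatsapp":
      return `wa_${payload.from}`;            // assumed gateway field
    default:
      return payload.userId;                  // web widget supplies its own id
  }
}
```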
RAG wiring
- Index once: ingest docs → chunk into 800–1200 characters → create embeddings (numeric vectors) → store in a vector database such as Pinecone or pgvector
- Query time: embed the user message → retrieve the top‑k chunks (k = 3–5) → pass the chunks as context to the model
- Guardrails: instruct the model to answer only from retrieved context or say “I do not know” and escalate
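The chunking step from the indexing pipeline can be sketched as a simple paragraph-aware splitter. This version keeps paragraphs whole (one longer than the cap is not split), which is a deliberate simplification:

```javascript
// Split a document into chunks near the 1200-char cap, on paragraph
// boundaries where possible. Paragraphs longer than the cap stay whole.
const MAX_CHUNK = 1200;

function chunkDoc(text) {
  const paras = text.split(/\n\n+/);
  const chunks = [];
  let current = "";
  for (const p of paras) {
    if (current && (current.length + 2 + p.length) > MAX_CHUNK) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? current + "\n\n" + p : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Smarter strategies (sentence-aware splits, overlapping windows) usually retrieve better, but this is enough to get the index built.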
Context passed to the AI (pseudo)
{
"system": "You are a precise support agent. If info is not in context, say so and escalate.",
"context": ["<doc-chunk-1>", "<doc-chunk-2>", "<doc-chunk-3>"],
"history": [{"role":"user","content":"..."},{"role":"assistant","content":"..."}],
"question": "Does the Pro plan include SSO?"
}
Mermaid: Retrieval path
flowchart TD
Q[User Query] --> E[Embed Query]
E --> V[Vector Search]
V --> K[Top-k Chunks]
K --> C[AI Context]
classDef process fill:#fff3e0,stroke:#ef6c00
class Q,E,V,K,C process
With grounded context available, connect the AI, add confidence scoring, and wire escalation
AI, Escalation, Logging
What you’ll learn: How to prompt the model, score confidence, escalate low-confidence cases, and log every turn for QA
You’ll feed history and RAG context into the AI node, score confidence, then reply or escalate
AI node (n8n)
- Model: pick a responsive, cost‑balanced LLM
- System prompt
You are ACME Support. Be concise, friendly, and accurate.
Only answer from <Context>. If missing, say you are unsure and suggest escalation.
When relevant, ask one clarifying question before answering.
- Inputs: history array, user question, joined RAG chunks
- Outputs: answer text, optional citations, and a short answer_type
Confidence and escalation
- Score: ask a small LLM, “Is the answer fully grounded in context? Return 0–1”
- Rules
- If score ≥ 0.75 → reply
- If the user types “agent” or “human”, or score < 0.75 → escalate
- Escalation
- Create a ticket in Zendesk or Jira with full history and retrieved chunks
- Notify Slack channel for triage
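The two rules above collapse into one predicate. The 0.75 threshold and the trigger phrases come from this guide; the word-boundary regex is a small addition so words like “management” don’t trip the “agent” trigger:

```javascript
// Decide the escalation branch from confidence score and trigger phrases.
const CONFIDENCE_THRESHOLD = 0.75;
const TRIGGERS = /\b(agent|human)\b/i; // word boundaries avoid false positives

function shouldEscalate(score, userMessage) {
  if (TRIGGERS.test(userMessage)) return true;
  return score < CONFIDENCE_THRESHOLD;
}
```

Tune both the threshold and the phrase list against your own transcripts rather than treating these values as fixed.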
Escalation payload (example)
{
"subject": "Escalated chat from u_1299",
"tags": ["n8n-chatbot","escalation"],
"custom_fields": {
"confidence": 0.42,
"channel": "web"
},
"description": "History: ...\nAI Attempt: ...\nRAG Sources: ..."
}
Mermaid: Decision branch
flowchart TD
A[AI Draft] --> B[Score Answer]
B --> C{Score >= 0.75}
C -->|Yes| D[Reply]
C -->|No| E[Escalate]
E --> F[Slack Notify]
E --> G[Create Ticket]
classDef process fill:#fff3e0,stroke:#ef6c00
classDef action fill:#e8f5e8,stroke:#2e7d32
classDef alert fill:#f3e5f5,stroke:#7b1fa2
class A,B,C process
class D action
class E,F,G alert
Structured logging
- IDs: conversationId, userId, sessionId
- Content: userMessage, aiMessage
- Meta: model, tokens, latencyMs, confidence, escalated
- RAG: docIds returned for audits
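The fields above fit naturally into one flat record per turn, which a DB insert node can write directly. A sketch, assuming a `ctx` object accumulated across the workflow (its field names are illustrative):

```javascript
// Build one structured log record per turn from workflow context.
// The ctx shape is an assumption; adapt to your workflow's variables.
function buildLogEntry(ctx) {
  return {
    conversationId: ctx.conversationId,
    userId: ctx.userId,
    sessionId: ctx.sessionId,
    userMessage: ctx.userMessage,
    aiMessage: ctx.aiMessage,
    model: ctx.model,
    tokens: ctx.tokens,
    latencyMs: Date.now() - ctx.startedAt, // measured from request receipt
    confidence: ctx.confidence,
    escalated: ctx.escalated,
    docIds: ctx.docIds,                    // RAG chunks used, for audits
    loggedAt: new Date().toISOString(),
  };
}
```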
This data lets you tune prompts, docs, and thresholds with evidence, not guesswork
Add a few golden-path transcripts and replay them after each change to catch regressions quickly
With quality controls in place, harden the service for production traffic
Production Readiness
What you’ll learn: How to handle errors, secure the webhook, and scale n8n under load
Reliability
- Timeouts and retries: use backoff on LLM and vector DB calls
- Fallbacks: if RAG fails, apologize and escalate
- Dead letters: route failed executions to an ops channel
Security
- Webhook auth: X-API-Key, optional IP allowlist
- PII: encrypt at rest, scrub secrets from logs, rotate keys
- Abuse controls: per-user rate limit and message length caps
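The per-user rate limit can be sketched as a sliding window. This in-memory version only works on a single n8n instance; in queue mode with multiple workers you would back it with Redis instead. The window and cap values are illustrative:

```javascript
// In-memory sliding-window rate limiter (single-instance only).
const WINDOW_MS = 60_000;   // 1-minute window (assumed value)
const MAX_REQUESTS = 20;    // per-user cap within the window (assumed value)
const hits = new Map();     // userId -> array of request timestamps

function allowRequest(userId, now = Date.now()) {
  const recent = (hits.get(userId) || []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) return false;
  recent.push(now);
  hits.set(userId, recent);
  return true;
}
```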
Performance
- Parallelize: run independent steps, such as vector search and sentiment analysis, concurrently; only the AI draft has to wait for RAG results
- Cache: store top Q and A and embeddings in Redis
- Scale: run n8n queue mode with Redis and multiple workers, keep DB indexed
End-to-end example
- Webhook receives POST /api/support/chat
- Function validates the header and normalizes the payload
- DB or Redis fetches last 10 messages
- If long thread, summarize older context
- Embed query and run vector search for top‑k docs
- AI or Agent creates a draft answer using history and context
- Small LLM scores confidence from 0 to 1
- If score < 0.75 or a trigger phrase appears, take the escalate branch
- Create ticket and notify Slack with history and context
- Insert a structured log for both turns and metadata
- Respond to Webhook with 200 on answer or 202 on escalation
Mermaid: Node map
flowchart TD
A[Webhook POST] --> B[Validate]
B --> C[Fetch Memory]
C --> D[Summarize]
D --> E[Embed Query]
E --> F[Vector Search]
F --> G[AI Draft]
G --> H[Score]
H --> I{Escalate?}
I -->|Yes| J[Ticket]
I -->|Yes| K[Slack Alert]
I -->|No| L[Send Reply]
J --> M[Log Turn]
K --> M
L --> M
classDef trigger fill:#e1f5fe,stroke:#01579b
classDef process fill:#fff3e0,stroke:#ef6c00
classDef action fill:#e8f5e8,stroke:#2e7d32
classDef alert fill:#f3e5f5,stroke:#7b1fa2
class A trigger
class B,C,D,E,F,G,H,I process
class J,L action
class K alert
class M process
Next steps: add streaming replies, more channels, smarter doc chunking, and A/B tests for prompts and models
Start simple: memory, one doc set, one escalation rule. Ship in days, improve weekly