The Agent Sidecar Pattern pairs an n8n MCP server with lightweight Rails or Flask sidecars. n8n orchestrates; sidecars execute heavy tasks via URL payloads, webhook-first callbacks, and end‑to‑end tracing.
What it is
What you’ll learn: How the n8n MCP server coordinates Rails and Flask sidecars, what “MCP” and “sidecar” mean, and how data flows via URLs
- MCP means Model Context Protocol, a standard for exposing tools to AI agents
- A sidecar is a companion service that handles work outside the main orchestrator
- Agent means any MCP client that calls tools, such as an IDE agent or chat agent
- n8n acts as orchestrator and protocol bridge for MCP workflows
- Sidecars specialize in compute‑intensive or stateful jobs, including GPU tasks
- Data moves as strings and URLs instead of raw binary
Think of the n8n agent as the driver and sidecars as the pit crew: decisions up front, horsepower on the side
High‑level flow
```mermaid
flowchart TD
    A[MCP Client] --> B[n8n Orchestrator]
    B --> C{Route}
    C --> D[Flask Sidecar]
    C --> E[Rails Sidecar]
    D --> F[Object Store]
    E --> F
    F --> G[Webhook to n8n]
    G --> H[Agent Result]

    classDef trigger fill:#e1f5fe,stroke:#01579b
    classDef process fill:#fff3e0,stroke:#ef6c00
    classDef action fill:#e8f5e8,stroke:#2e7d32
    classDef alert fill:#f3e5f5,stroke:#7b1fa2
    class A trigger
    class B,C,F process
    class D,E,H action
    class G alert
```
Next, see why this pattern works well in production
Why it matters
What you’ll learn: When to use sidecars, how the pattern scales, and how it simplifies reliability, debugging, and cost
You want production‑ready n8n agent workflows without cramming heavy code into n8n. This pattern scales, isolates risk, and keeps debugging simple
- Independent scaling: grow sidecars without touching the orchestrator
- Reliability: webhook‑first flow with polling fallback prevents stuck jobs
- Observability: distributed traces show where latency accumulates
- Lower coupling: strict contracts using strings and URLs make tests simple
- Cost control: right‑size CPU or GPU per task
You get a clean separation of concerns that survives real traffic and real failures
Core components
What you’ll learn: The five building blocks of an n8n MCP server with Rails and Flask sidecars and how each piece fits
This section breaks down the orchestrator, sidecars, data contracts, communication model, and observability
n8n as MCP server
n8n exposes tools to agents via MCP and fans requests into workflows. It validates inputs, routes to sidecars, and shapes responses
- Expose workflows as MCP tools with clear JSON schemas
- Validate early to reject oversized payloads or missing URLs
- Keep flows small and single‑purpose for debuggability
Tip: run n8n in queue mode so webhook intake and execution scale independently
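Validating early keeps bad calls out of the workflow entirely. A minimal sketch of that gate, assuming an illustrative schema (the `text_url` parameter name and 4 KB limit are assumptions, not a fixed contract):

```python
# Sketch of the early validation n8n should perform before routing.
# Parameter names and the size limit are illustrative, not a fixed schema.
from urllib.parse import urlparse

MAX_PARAM_BYTES = 4096  # reject oversized inline payloads; tune per deployment

def validate_tool_params(params: dict) -> list:
    """Return a list of validation errors; empty means the call may proceed."""
    errors = []
    url = params.get("text_url")
    if not url:
        errors.append("missing text_url")
    else:
        scheme = urlparse(url).scheme
        if scheme not in ("s3", "gs", "https"):
            errors.append(f"unsupported URL scheme: {scheme!r}")
    for key, value in params.items():
        if isinstance(value, str) and len(value.encode()) > MAX_PARAM_BYTES:
            errors.append(f"param {key} exceeds {MAX_PARAM_BYTES} bytes")
    return errors
```

Rejecting with a list of all errors, rather than failing on the first one, gives the agent a single round trip to fix its call.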
Rails and Flask sidecars
Pick the framework that matches the work profile
| Choice | Use when | Tasks |
|---|---|---|
| Flask | Stateless compute or GPU work | Embeddings, image transforms |
| Rails | Stateful jobs or DB writes | OCR, PDF parsing, audits |
Sidecars expose simple HTTP endpoints or MCP tools and return URLs to results
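The core of such an endpoint can be sketched framework-agnostically, with the I/O functions injected so the same logic could back a Flask route or any HTTP handler. All names here (`handle_embed_job`, the payload fields) are illustrative:

```python
# Core of an embeddings sidecar handler. fetch/embed/store/notify are
# injected, so the function is testable without network access.
# All names and the payload layout are illustrative.

def handle_embed_job(job: dict, fetch, embed, store, notify) -> dict:
    text = fetch(job["text_url"])              # dereference the input URL
    vector = embed(text)                       # heavy work happens here
    result_url = store(job["job_id"], vector)  # persist to object storage
    payload = {
        "status": "completed",
        "embedding_url": result_url,
        "trace_id": job.get("trace_id"),
    }
    notify(job["callback_url"], payload)       # webhook back to n8n
    return payload

# In Flask this would be mounted roughly as:
#   @app.post("/jobs/embed")
#   def embed_route():
#       return handle_embed_job(request.get_json(), fetch, embed, store, notify)
```

Note the handler never returns raw bytes: the vector lives behind `embedding_url`, and only small strings travel back through n8n.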
Data contracts
Pushing binary blobs through the orchestrator bloats logs, breaks retries, and defeats caches. Send references instead
- Inputs: `text_url`, `image_url`, `document_url`
- Outputs: `result_url`, `embedding_url`, `preview_url`
- Storage: keep artifacts in S3 or GCS with content‑hashed keys
Small strings flow through n8n. Large bytes live in object storage
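Content-hashed keys mean identical bytes always land at the same key, which deduplicates storage and makes retries idempotent. A minimal sketch; the `results/<prefix>/<hash>` layout is an assumption, not a fixed convention:

```python
# Content-hashed object keys: the key is derived from the bytes themselves,
# so re-uploading the same artifact is a no-op rather than a duplicate.
import hashlib

def artifact_key(prefix: str, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    return f"results/{prefix}/{digest}.bin"
```

The same scheme works for S3 or GCS; only the bucket URL in front of the key changes.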
Communication model
Prefer push for speed and keep pull for resilience
| Aspect | Webhooks default | Polling fallback |
|---|---|---|
| Latency | Low, push on completion | Interval bound, higher |
| Cost | Efficient at scale | Noisy when idle |
| Ops | Needs public endpoint and retries | Simple wiring, wasteful |
Use webhooks for real‑time steps. Schedule a periodic poll to catch misses
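The periodic poll can be a simple reconciler that sweeps the job store for work whose completion webhook never arrived. A sketch, assuming an illustrative job record shape (`status`, `updated_at`) and a five-minute threshold:

```python
# Polling fallback: a scheduled sweep that catches jobs stuck in flight.
# Runs every few minutes alongside the webhook path; the job-record shape
# and the 300-second threshold are assumptions to illustrate the idea.
import time

STUCK_AFTER_SECONDS = 300  # re-check jobs idle longer than 5 minutes

def find_stuck_jobs(jobs, now=None):
    """Return in-flight jobs whose last update is older than the threshold."""
    now = time.time() if now is None else now
    return [
        job for job in jobs
        if job["status"] == "processing"
        and now - job["updated_at"] > STUCK_AFTER_SECONDS
    ]
```

Each stuck job is then re-queried against the sidecar's status endpoint, so a single dropped webhook costs minutes of latency rather than a lost job.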
Observability
Treat every request as a trace that spans agent to n8n to sidecars to storage. A trace is a record of the path a request takes across services
- Trace IDs: propagate a `trace_id` and parent span in headers
- Budgets: define a total latency budget and allocate per hop
- Signals: emit spans with errors, retries, and payload sizes
OpenTelemetry is a standard toolkit for traces, metrics, and logs
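Propagation itself is small: each hop forwards the same trace ID, sets its own span as the parent for the next hop, and decrements the remaining budget. A hand-rolled sketch (OpenTelemetry's propagators do this for you; the `x-parent-span-id` header and span-ID scheme here are assumptions):

```python
# Minimal trace-context propagation between hops. Header names mirror the
# tool-call example later in this piece (x-trace-id, x-latency-budget-ms);
# the parent-span header and 16-hex span ids are illustrative.
import uuid

def child_headers(incoming: dict, elapsed_ms: int) -> dict:
    budget = int(incoming.get("x-latency-budget-ms", "2000")) - elapsed_ms
    return {
        "x-trace-id": incoming["x-trace-id"],       # unchanged across all hops
        "x-parent-span-id": uuid.uuid4().hex[:16],  # this hop becomes the parent
        "x-latency-budget-ms": str(max(budget, 0)), # never go negative
    }
```

When the remaining budget reaches zero, a sidecar can fail fast instead of doing work the agent has already given up on.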
Job and artifacts ERD
```mermaid
erDiagram
    AgentCall ||--o{ Job : creates
    Job ||--o{ Artifact : outputs
    Job ||--o{ TraceSpan : traces
    AgentCall {
        int id
        string tool
        datetime created_at
    }
    Job {
        int id
        string status
        string sidecar
        string trace_id
        datetime created_at
    }
    Artifact {
        int id
        string type
        string url
        int size_bytes
        datetime created_at
    }
    TraceSpan {
        int id
        string trace_id
        string name
        datetime start_time
    }
```
With the core parts in place, let's walk the happy path end to end
End‑to‑end flow
What you’ll learn: The request path from tool call to final response, including validation, processing, storage, and callbacks
From tool call to final response, the happy path stays short and predictable
1. The AI agent invokes an n8n MCP tool with parameters
2. n8n validates, writes a job record, and forwards to the chosen sidecar
3. The sidecar fetches input by URL, processes work, persists output, and posts a webhook
4. n8n resumes the workflow, enriches metadata, and returns a compact result to the agent
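The steps above can be sketched as one in-memory pipeline, with every external effect (routing, sidecar processing) passed in as a plain function. All names and the response shape are illustrative:

```python
# The happy path, top to bottom: validate, route, process, shape the result.
# route/process stand in for n8n's switch node and the sidecar round trip.

def run_tool_call(call: dict, route, process) -> dict:
    # 1. validate: references only, no inline blobs
    if "text_url" not in call["params"]:
        return {"status": "rejected", "error": "missing text_url"}
    # 2. route to a sidecar based on the tool name
    sidecar = route(call["tool"])
    # 3. sidecar fetches by URL, processes, persists, reports back via webhook
    webhook = process(sidecar, call["params"])
    # 4. return a compact result: small JSON plus URLs, never raw bytes
    return {
        "status": webhook["status"],
        "result_url": webhook["embedding_url"],
        "trace_id": webhook.get("trace_id"),
    }
```

The agent-facing response stays small by construction: whatever the sidecar produced is referenced through `result_url`, never inlined.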
Step flow
```mermaid
flowchart TD
    S[Tool Call] --> V[Validate]
    V --> R[Route]
    R --> P[Process in Sidecar]
    P --> U[Save Artifact]
    U --> W[Send Webhook]
    W --> N[Resume Flow]
    N --> T[Return Result]

    classDef trigger fill:#e1f5fe,stroke:#01579b
    classDef process fill:#fff3e0,stroke:#ef6c00
    classDef action fill:#e8f5e8,stroke:#2e7d32
    classDef alert fill:#f3e5f5,stroke:#7b1fa2
    class S trigger
    class V,R,U,N process
    class P,T action
    class W alert
```
Example tool call
```json
{
  "tool": "generate_embeddings",
  "params": {
    "text_url": "s3://docs/invoice-042.txt",
    "model": "mini-lm"
  },
  "headers": {
    "x-trace-id": "9f3c2f...",
    "x-latency-budget-ms": "2000"
  }
}
```
Webhook from sidecar
```json
{
  "status": "completed",
  "embedding_url": "s3://results/emb/042.json",
  "metrics": { "latency_ms": 480, "retries": 0 },
  "trace_id": "9f3c2f..."
}
```
The agent gets a small JSON response plus URLs it can dereference on demand
Applying the pattern
What you’ll learn: When to adopt sidecars, how to structure tools and contracts, and what pitfalls to avoid
Use this section to decide fit and sketch your first deployment
When to use
- Use sidecars for GPU tasks, background jobs, or regulated audit trails
- Use sidecars when payloads exceed comfortable in‑process limits
- Keep it simple if an HTTP node can finish in under 300 ms with tiny payloads
Example architecture
- Tools: `embed_text`, `process_image`, and `extract_pdf` as separate MCP workflows
- Contracts: URLs for inputs, and URLs plus small summaries for outputs
- Sidecars: Flask for embeddings and images, Rails for PDF or OCR with Sidekiq
- Communication: webhook‑first with polling every 2 to 5 minutes as a safety net
- Observability: OpenTelemetry spans, per‑hop budgets, and error tags
A small, composable graph beats a single mega‑workflow
Common pitfalls
- Binary in n8n: passing blobs through n8n instead of URLs
- Polling first: skipping webhook retries and backoff
- Mixed tools: combining many unrelated tools in one MCP workflow
- Hidden artifacts: writing to temp disks instead of object storage
- Lost traces: not propagating trace headers across services
Key takeaways: keep n8n as the MCP brain, push heavy lifting to sidecars, move data by URL, prefer webhooks, and budget latency with traces. Start small, then scale the sharp edges.