TL;DR: Orchestrate in n8n, compute in Postgres. Batch hard (100–500 rows). Use INSERT … ON CONFLICT for upserts, dedupe with keys, prefer ordered IDs (ULID or UUID v7), and stream via webhooks with idempotency
Architecture Split
What you’ll learn: How to split work between n8n and Postgres for speed, stability, and low latency
Naive imports push every transform and write through n8n one row at a time. That burns memory and network bandwidth, and it thrashes indexes
- Treat n8n as the conductor, not the orchestra
- Let Postgres do heavy lifting: upserts, filtering, ordering, joins
- Keep payloads small and push arrays in batches, not singletons
A clean split works better
What belongs where
n8n
- Orchestration, retries, rate limiting, batching, fan-out and fan-in
- Light transforms and validation
- Webhooks and workflow state
Postgres and Supabase
- Upserts with ON CONFLICT, constraints, dedupe enforcement
- Window functions for cross-row calculations, projections, server-side filtering
- Materialized views for read models (stored query results)
Keep compute in the database and keep the workflow lean
Common patterns
| Pattern | When | Notes |
|---|---|---|
| Webhooks to DB | Source can push events | Lowest latency, requires idempotency (safe to retry without new effects) |
| Polling to Batches | No webhooks available | Simple to start, reconcile on a schedule |
| Supabase REST | Need managed auth and Row Level Security (per-user row policies) | Great for CRUD, bulk via upsert RPC or SQL |
| Direct SQL | Need maximum throughput | Fewer hops, mind connection pooling |
When uncertain, start with REST or Webhooks, then evolve to direct SQL for hot paths
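The Supabase REST path from the table above maps to one HTTP call per batch. Here is a minimal sketch of that request, written as a standalone Node script so the shape is explicit; the project URL, key, and items table are placeholders, and in n8n you would set the same URL, headers, and JSON body on an HTTP Request node
// Bulk upsert a batch through Supabase's PostgREST endpoint (placeholder project values)
const SUPABASE_URL = 'https://your-project.supabase.co' // placeholder
const SUPABASE_KEY = process.env.SUPABASE_SERVICE_KEY   // placeholder credential

async function upsertBatch(rows) {
  // on_conflict names the unique key; Prefer: resolution=merge-duplicates turns insert into upsert
  const res = await fetch(`${SUPABASE_URL}/rest/v1/items?on_conflict=external_id`, {
    method: 'POST',
    headers: {
      apikey: SUPABASE_KEY,
      Authorization: `Bearer ${SUPABASE_KEY}`,
      'Content-Type': 'application/json',
      Prefer: 'resolution=merge-duplicates',
    },
    body: JSON.stringify(rows),
  })
  if (!res.ok) throw new Error(`Upsert failed: ${res.status} ${await res.text()}`)
}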
flowchart TD
A[Source Data] --> B[n8n Orchestrate]
B --> C[Build SQL]
C --> D[Postgres Upsert]
D --> E[Read Model]
classDef trigger fill:#e1f5fe,stroke:#01579b
classDef process fill:#fff3e0,stroke:#ef6c00
classDef action fill:#e8f5e8,stroke:#2e7d32
class B trigger
class C process
class D action
With roles set, the next step is bulk upserts and dedupe
Bulk Upserts
What you’ll learn: How to upsert in batches, guard freshness, and dedupe before writes
Row-by-row INSERT or UPDATE in a loop is the fastest way to go slow. You will hit API limits and saturate n8n memory
- Use a single INSERT … ON CONFLICT per batch
- Deduplicate inside n8n before hitting the database
- Gate updates by freshness so newer rows win
ON CONFLICT basics
-- Conflict on a unique key (external_id or a composite)
INSERT INTO public.items (
  external_id, name, email, updated_at, payload
) VALUES
  -- values expanded by n8n per batch
  ($1,$2,$3,$4,$5), ($6,$7,$8,$9,$10)
ON CONFLICT (external_id) DO UPDATE SET
  name = EXCLUDED.name,
  email = EXCLUDED.email,
  updated_at = EXCLUDED.updated_at,
  payload = EXCLUDED.payload
WHERE EXCLUDED.updated_at > items.updated_at; -- only update when the incoming row is newer
Gotchas
- The conflict target must be covered by a UNIQUE index or constraint
- Updating unchanged columns still rewrites the whole row, so keep the SET list selective
- Add a WHERE guard to prevent stale overwrites
Default to a single conflict key when possible
Upsert keys
| Scenario | Conflict Target | Why |
|---|---|---|
| Third-party master ID | UNIQUE(external_id) | Simple and stable from source |
| No stable ID | UNIQUE(source, natural_key1, natural_key2) | Composite natural identity |
| Time-series facts | UNIQUE(external_id, period_start) | One record per entity and period |
Keep keys narrow to reduce index size
Dedupe in n8n
An upsert inserts a row, or updates the existing one when the unique key already matches
Remove Duplicates settings
- Mode: within current input to collapse repeats
- Scope: workflow if you want cross-run idempotency
- Key: external_id or a normalized composite
Additional tips
- Normalize values before comparing, for example lower-case emails and trim whitespace (see the sketch after this list)
- Keep a small rolling cache when payloads are large using a Code node with an LRU map
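A minimal Code-node sketch of the normalization and in-batch dedupe step, assuming each item carries external_id, email, name, and source fields (adjust the key to your own composite)
// Code node: normalize values, then drop repeats within this run
const seen = new Set()
const out = []
for (const item of items) {
  const row = { ...item.json }
  row.email = (row.email || '').trim().toLowerCase() // normalize before comparing
  row.name = (row.name || '').trim()
  const key = row.external_id ?? `${row.source}:${row.email}` // fall back to a composite key
  if (seen.has(key)) continue // duplicate inside the batch, skip it
  seen.add(key)
  out.push({ json: row })
}
return out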
Batch the SQL in n8n
- HTTP Request to fetch page N
- Code to map to database columns
- SplitInBatches with size 100–500
- Code to build the VALUES list and parameter array (see the sketch after this list)
- Postgres or Supabase node for a single INSERT … ON CONFLICT
- If a batch errors, Wait with exponential backoff and retry up to 3 times
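A sketch of the build-SQL step, assuming the columns from the earlier upsert example; how the statement and parameter array feed the Postgres node's query-parameters option varies by node version
// Code node: turn one batch into a single parameterized INSERT … ON CONFLICT
const cols = ['external_id', 'name', 'email', 'updated_at', 'payload']
const rows = items.map(item => item.json)

const params = []
const valuesList = rows.map((row, r) => {
  const placeholders = cols.map((col, c) => {
    params.push(col === 'payload' ? JSON.stringify(row[col]) : row[col])
    return `$${r * cols.length + c + 1}` // $1..$n across the whole batch
  })
  return `(${placeholders.join(',')})`
})

const query = `
  INSERT INTO public.items (${cols.join(', ')})
  VALUES ${valuesList.join(', ')}
  ON CONFLICT (external_id) DO UPDATE SET
    name = EXCLUDED.name,
    email = EXCLUDED.email,
    updated_at = EXCLUDED.updated_at,
    payload = EXCLUDED.payload
  WHERE EXCLUDED.updated_at > items.updated_at`

// Hand both to the next Postgres node (Execute Query with query parameters)
return [{ json: { query, params } }]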
flowchart TD
A[Fetch Page] --> B[Normalize]
B --> C[Dedupe]
C --> D[Batch 100-500]
D --> E[Build SQL]
E --> F[Upsert]
F --> G{Error?}
G -->|Yes| H[Wait and Retry]
G -->|No| I[Notify]
classDef trigger fill:#e1f5fe,stroke:#01579b
classDef process fill:#fff3e0,stroke:#ef6c00
classDef action fill:#e8f5e8,stroke:#2e7d32
classDef alert fill:#f3e5f5,stroke:#7b1fa2
class A trigger
class B process
class C process
class D process
class E process
class F action
class H alert
class I action
With writes consistent, the next step is controlling rate and concurrency
Rate Limits
What you’ll learn: How to throttle APIs, pick batch sizes, and avoid spikes
APIs, n8n, and database pools all have ceilings. Naive loops spike then stall
- Batch requests and inserts
- Add spacing between bursts
- Use per-API throttles, not one global brake
Built-in helpers
- SplitInBatches to control memory and concurrency per chunk
- Wait to add 200–1000 ms between batches for strict APIs
- Retry on Fail with backoff 1s then 2s then 4s capped near 30s
- HTTP Request batching via items per batch and batch interval
Use node-level features first, then code if needed
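When the node-level retry options are not enough, a small Code node can compute the next delay for a downstream Wait node; the attempt counter here is an assumed field carried on each item
// Code node: exponential backoff with jitter, 1s → 2s → 4s …, capped near 30s
return items.map(item => {
  const attempt = item.json.attempt ?? 0
  const waitMs = Math.min(1000 * 2 ** attempt, 30_000) + Math.floor(Math.random() * 250)
  return { json: { ...item.json, attempt: attempt + 1, waitMs } } // feed waitMs to a Wait node
})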
Batch size design
- 100–500 items per database upsert in most cases
- 20–100 items per outbound API call when payloads are small
- Aim for per-row p50 latency times batch size of roughly 0.5–2 seconds per burst (for example, 200 rows at 5 ms each is about 1 second)
- Keep a soft ceiling of 5–10 concurrent executions for bursty flows
Rolling-window throttle
// Code node: per-API rolling-window throttle (sliding log of call timestamps)
// Rolling window: allow at most `max` calls in any `windowMs` span, waiting out the excess
const limit = { max: 60, windowMs: 60_000 } // 60 req per 60s for this API
const key = 'crm_rate_window' // one key per API keeps throttles independent
const now = Date.now()
const ts = (getWorkflowStaticData('global')[key] ||= []) // timestamps of recent calls
while (ts.length && now - ts[0] > limit.windowMs) ts.shift() // drop calls outside the window
if (ts.length >= limit.max) {
  // Window is full: wait until the oldest call ages out, then record this one
  const waitMs = limit.windowMs - (now - ts[0]) + 50
  return new Promise(res => setTimeout(() => { ts.push(Date.now()); res(items) }, waitMs))
}
ts.push(now)
return items
flowchart TD
A[Call In] --> B{Token Available?}
B -->|Yes| C[Proceed]
B -->|No| D[Wait]
D --> B
classDef process fill:#fff3e0,stroke:#ef6c00
classDef action fill:#e8f5e8,stroke:#2e7d32
class A process
class C action
class D process
Performance tip: throttle per API, not globally, to keep fast services fast while respecting strict ones
With throughput stable, the next step is choosing ID strategies and a schema that scale
IDs and Schema
What you’ll learn: How to pick ordered IDs, index wisely, and model JSONB versus normalized tables
Random UUID v4 scatters inserts across the index. Under load the index fragments and writes slow down
- Prefer ordered IDs such as ULID or UUID v7 for append-like index behavior
- Separate ingest tables from read-optimized tables when needed
- Index only what you read or collide on
ID choices
ULID is a time-ordered unique ID; UUID v7 is a time-ordered UUID with native generator support arriving in recent Postgres releases
| ID Type | Ordering | Notes |
|---|---|---|
| UUID v4 | None | Native and easy, worst for hot writes |
| ULID | Time ordered | Store as text or bytea, sortable, needs generator |
| UUID v7 | Time ordered | Stored in the plain uuid type, native generator in the newest Postgres releases (else generate in app), best balance for write and sort |
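If the database cannot generate ordered IDs yet, a Code node can. A minimal UUID v7 sketch following the RFC 9562 layout; Math.random keeps it dependency-free, so swap in crypto randomness for production use
// Code node: dependency-free UUID v7 (48-bit ms timestamp + version + random bits)
function uuidv7() {
  const bytes = new Uint8Array(16)
  let ts = Date.now()
  for (let i = 5; i >= 0; i--) { // bytes 0-5: big-endian millisecond timestamp
    bytes[i] = ts % 256
    ts = Math.floor(ts / 256)
  }
  for (let i = 6; i < 16; i++) bytes[i] = Math.floor(Math.random() * 256) // random tail
  bytes[6] = (bytes[6] & 0x0f) | 0x70 // version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80 // RFC variant
  const hex = [...bytes].map(b => b.toString(16).padStart(2, '0')).join('')
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`
}

return items.map(item => ({ json: { ...item.json, id: uuidv7() } }))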
Practical DDL
-- Conflict or lookup index kept narrow
CREATE UNIQUE INDEX uq_items_external ON public.items (external_id);
-- Ordered ID as primary key (generated in app or via extension)
-- CREATE TABLE public.items (
-- id uuid PRIMARY KEY DEFAULT uuid_v7(),
-- ...
-- );
-- Ingest versus read model
CREATE MATERIALIZED VIEW public.items_read AS
SELECT id, name, email
FROM public.items
WHERE deleted_at IS NULL;
Rules of thumb
- Keep conflict target columns short and selective
- Avoid wide composite primary keys on hot tables
- Use JSONB for raw payloads but index only the paths you filter on
JSONB or normalized
| Choice | When | Tradeoffs |
|---|---|---|
| JSONB ingest plus targeted indexes | Upstream shape shifts often | Flexible and fewer migrations, but JSON path indexes add write cost |
| Normalized columns | Stable schema and heavy querying | Fast reads and type safety, more migration churn when shapes change |
erDiagram
Item {
string id
string external_id
string name
string email
datetime updated_at
string payload
}
EventLog {
string event_id
datetime received_at
}
With a solid schema in place, the next step is wiring near real-time webhooks with idempotency
Webhooks Flow
What you’ll learn: How to build an idempotent webhook pipeline with a reconciliation poll
Polling works, yet webhooks cut latency and API calls. Replays happen, so design for idempotency
- Prefer webhooks for near real-time
- Store event IDs to drop repeats safely
- Run a daily or weekly reconciliation poll to close gaps
Idempotent handler
-- Event sink to dedupe deliveries
CREATE TABLE IF NOT EXISTS public.event_log (
  event_id text PRIMARY KEY,
  received_at timestamptz NOT NULL DEFAULT now()
);
// n8n Webhook then Code guard then Postgres
// Idempotency: safe to retry without extra effects
const id = $json.event_id // stable from source
// Try insert; if conflict, short-circuit the workflow
// Use a Postgres node:
// INSERT INTO event_log(event_id) VALUES($1) ON CONFLICT DO NOTHING
// If affectedRows === 0 then already processed, return early
- Use event_id plus source in a composite when IDs are not globally unique
- Keep a TTL policy if the event space is large, for example 30–90 days
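A guard sketch for the step after that insert, assuming the query ends with RETURNING event_id so a conflicting delivery yields zero rows; how your Postgres node version surfaces that result may differ
// Code node: stop the run when the event was already logged
// Upstream query: INSERT INTO event_log(event_id) VALUES($1)
//                 ON CONFLICT DO NOTHING RETURNING event_id
const fresh = items.filter(item => item.json.event_id)
if (fresh.length === 0) {
  return [] // duplicate delivery: no items means downstream nodes do nothing
}
return fresh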
Hybrid model
- Webhook for create, update, and delete events
- Nightly poll to reconcile missed events using updated_at or checkpoints
- Emit metrics for counts, lag, and error rate
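A checkpoint sketch for the reconciliation poll, using workflow static data as the cursor; static data only persists on production executions, and a real flow should advance the cursor only after the poll succeeds
// Code node: read the last reconciliation cursor and hand it to the poll query
const state = getWorkflowStaticData('global')
const since = state.lastReconciledAt || '1970-01-01T00:00:00Z'
state.lastReconciledAt = new Date().toISOString() // simplification: advanced up front
// Downstream nodes filter on updated_at > since
return [{ json: { since } }]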
Reference flow
flowchart TD
A[Webhook or Poll] --> B[Normalize]
B --> C[Dedupe]
C --> D[Batch 100-500]
D --> E[Build SQL]
E --> F[Upsert]
F --> G{Error?}
G -->|Yes| H[Backoff]
H --> F
G -->|No| I[Notify]
H -->|Max retries| J[Dead Letter]
classDef trigger fill:#e1f5fe,stroke:#01579b
classDef process fill:#fff3e0,stroke:#ef6c00
classDef action fill:#e8f5e8,stroke:#2e7d32
classDef alert fill:#f3e5f5,stroke:#7b1fa2
class A trigger
class B process
class C process
class D process
class E process
class F action
class H alert
class I action
class J alert
Scale from 10k to 1M rows
- Scale batches from 100 to 250 to 500 while watching database write time and CPU
- Raise pool size only when the database is underutilized to avoid connection storms
- Reindex or run CLUSTER off-hours if UUID v4 is unavoidable
- Add partial indexes that match read filters and avoid over-indexing ingest
- Track p95 end-to-end latency and error budgets, then tune the bottleneck
Quick recap
- Batch everything at 100–500
- Use INSERT … ON CONFLICT with a freshness guard
- Pick strong dedupe keys and dedupe in-workflow
- Prefer ordered IDs such as ULID or UUID v7
- Use webhooks for real-time plus a reconciliation poll
- Rate-limit with rolling windows per API
Next steps: ship a small pilot flow with about 1k rows, measure, then scale. Keep IDs ordered, conflicts tight, and batches boring. Boring at scale is beautiful