ID choice is architecture. Pick well and your writes scale. Pick poorly and you fight fragmentation, cache misses, and I/O storms
UUIDs, in plain English
What you’ll learn: What UUIDs are, why teams use them, common versions, and the tradeoffs that impact databases
What is a UUID?
UUIDs (universally unique identifiers) are 128‑bit IDs designed to be unique without a central counter. You often see them as 36‑character strings with hyphens, like 550e8400-e29b-41d4-a716-446655440000
Why teams use UUIDs
- No central coordinator for ID generation across services or regions
- Easy merges across shards, services, or data centers
- Safer in APIs than sequential IDs because they are hard to guess
Common versions
- UUID v1 time + node info (node often derived from a MAC address). Leaks time and machine hints
- UUID v4 mostly random. Excellent uniqueness, poor locality for storage engines
- UUID v6/v7 time‑ordered layouts that keep inserts mostly sequential
Real‑world example
Twitter introduced 64‑bit Snowflake IDs (time + node + sequence) to avoid central contention while keeping order. The same lesson applies widely today
In short, UUIDs remove coordination, but they can hurt write performance if you store them the wrong way
Why MySQL struggles
What you’ll learn: How random keys affect MySQL InnoDB’s clustered B+ tree and increase page splits, I/O, and CPU
InnoDB basics
MySQL InnoDB stores rows in a clustered B+ tree (a balanced search tree). The primary key defines the physical row order
- Internal nodes route by key ranges
- Leaf nodes hold sorted rows on pages
- Pages are typically 16 KB in InnoDB
- Sequential keys append to the rightmost leaf, which stays hot in cache
Why random keys hurt
- Scattered inserts target many pages instead of the hot rightmost page
- More page splits and lower page fill, often near 50% under heavy randomness
- Poor cache locality increases I/O and CPU because data is not clustered together in memory
B+ Tree Impact Visualization
Here’s how different key types affect MySQL InnoDB’s clustered B+ tree:
flowchart TD
subgraph seq[Auto-Increment Keys Sequential]
A1[Root Node Ranges 1-1000 1001-2000 2001+]
A2[Leaf 1-1000 95% Full]
A3[Leaf 1001-2000 95% Full]
A4[Leaf 2001+ Hot Insert Point Cache Hit]
A1 --> A2
A1 --> A3
A1 --> A4
A5[New Row ID 2156] --> A4
end
subgraph rand[UUID v4 Keys Random]
B1[Root Node Ranges Scattered]
B2[Leaf Block A 50% Full]
B3[Leaf Block B 50% Full]
B4[Leaf Block C 50% Full]
B1 --> B2
B1 --> B3
B1 --> B4
B5[New Random UUID] --> B6[Any Leaf Page Split Cache Miss]
B6 --> B2
B6 --> B3
B6 --> B4
end
classDef goodPerf fill:#e8f5e8,stroke:#2e7d32
classDef badPerf fill:#ffebee,stroke:#c62828
classDef newItem fill:#e1f5fe,stroke:#01579b
class A4,A2,A3 goodPerf
class B6,B2,B3,B4 badPerf
class A5,B5 newItem
Key Insight: Sequential keys keep inserts hot in the rightmost leaf (95% page utilization), while random UUIDs scatter across all leaves causing page splits and fragmentation (50% utilization).
Randomness is great for uniqueness and terrible for locality. UUID v4 as a primary key pushes InnoDB toward constant rebalancing and wasted space
Now that you see the cost of random keys, let’s make UUIDs behave better in storage and order
Store and order UUIDs
What you’ll learn: How to store UUIDs compactly and keep inserts mostly sequential using ordered variants
Store compactly
- Avoid
CHAR(36)for primary keys - Use
BINARY(16)and convert at the application edges
Example schema and conversions
-- 1) Binary storage
CREATE TABLE users (
id BINARY(16) PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- 2) Insert with UUIDv7 from app
INSERT INTO users (id, email) VALUES (UUID_TO_BIN('018f6e6a-4b6a-7c3f-b2d0-9f7c6df4a8a1'), '[email protected]');
-- 3) Read as string when you need it
SELECT BIN_TO_UUID(id) AS id, email FROM users;
If you are on UUID v1
Use UUID_TO_BIN(uuid, 1) to swap the time bits forward for better locality, and read with BIN_TO_UUID(col, 1) to reverse it
-- v1 optimization
INSERT INTO orders (id, total)
VALUES (UUID_TO_BIN('d211ca18-d389-11ee-a506-0242ac120002', 1), 42.00);
SELECT BIN_TO_UUID(id, 1) AS id, total FROM orders;
Prefer ordered generators
- UUID v7 millisecond timestamp prefix plus randomness. Great InnoDB locality with standard UUID format
- UUID v6 similar time‑ordered layout using rearranged v1 semantics
Size reality check
| Representation | Bytes | Orderable |
|---|---|---|
| INT auto‑inc | 4 | Yes |
| BIGINT auto‑inc | 8 | Yes |
| UUID in BINARY(16) | 16 | With v6/v7 |
One change saves space, the other saves your write path. Use compact storage and ordered IDs together
Guideline: Use BINARY(16) plus UUID v7 for MySQL InnoDB. This combination reduces index size and minimizes page splits under write load
ID alternatives
What you’ll learn: The tradeoffs among auto‑increment, UUID v4, UUID v7, ULID, Snowflake, and NanoID
ID Scheme Comparison Matrix
flowchart LR
subgraph legend[ID Scheme Performance vs Size]
L1[Excellent Performance Small Size]
L2[Good Performance Medium Size]
L3[Poor Performance Large Size]
end
subgraph schemes[ID Schemes Positioned by Trade-offs]
A[Auto-increment 4-8 bytes Excellent]
S[Snowflake 8 bytes Excellent]
U7[UUID v7 16 bytes Good]
UL[ULID 16 bytes Good]
U4[UUID v4 16 bytes Poor]
N[NanoID 16 bytes Poor]
end
classDef excellent fill:#e8f5e8,stroke:#2e7d32
classDef good fill:#fff3e0,stroke:#ef6c00
classDef poor fill:#ffebee,stroke:#c62828
class A,S excellent
class U7,UL good
class U4,N poor
| Scheme | Size | Orderable | MySQL Performance |
|---|---|---|---|
| Auto‑inc INT/BIGINT | 4/8 B | ✅ Yes | ⭐⭐⭐ Excellent |
| UUID v7 | 16 B | ✅ Yes | ⭐⭐ Good |
| ULID | 16 B | ✅ Yes | ⭐⭐ Good |
| Snowflake | 8 B | ✅ Yes | ⭐⭐⭐ Excellent |
| UUID v4 | 16 B | ❌ No | ❌ Poor |
| NanoID | ~16 B | ❌ No | ❌ Poor |
Notes per scheme
- Auto‑inc small and fast, but merging shards is painful and IDs are easy to enumerate
- UUID v4 no coordination and good privacy, but causes fragmentation and index bloat in InnoDB
- UUID v7 time‑ordered with great locality, stays at 16 bytes everywhere
- ULID lexicographically sortable base32 string (26 characters). Millisecond buckets can collide under extreme bursty traffic
- Snowflake tiny, sortable, and fast. Requires node IDs and clock discipline
- NanoID short, URL‑safe strings, but no natural order and v4‑like locality issues
Rule of thumb: default to UUID v7 for distributed SQL. Choose Snowflake when you need compact 64‑bit IDs and can manage node clocks
Handy snippets
-- Virtual column for readable IDs without storing 36 chars
ALTER TABLE users
ADD COLUMN id_str VARCHAR(36) GENERATED ALWAYS AS (BIN_TO_UUID(id)) VIRTUAL;
# Pseudo Ruby for UUID v7 (use a well‑tested library in production)
def uuid_v7
ts = (Time.now.to_f * 1000).to_i # 48‑bit ms
rand_a = SecureRandom.random_number(1 << 12)
rand_b = SecureRandom.random_number(1 << 62)
# pack bits per spec, then hex with hyphens
end
With options in mind, let’s choose a default and outline a safe migration path
Choose and migrate
What you’ll learn: Practical defaults for new apps and a step‑by‑step path to migrate existing tables
Recommended plans
- New single‑DB app use BIGINT auto‑inc. Add a unique external ID later if you expose records
- New distributed app use UUID v7 in
BINARY(16). Keep string conversions at the application edge - Existing UUID v4 tables convert to
BINARY(16)first. Put new tables on UUID v7 and keep old tables stable - If stuck on UUID v1 insert with
UUID_TO_BIN(uuid, 1)and exposeBIN_TO_UUID(col, 1)for reads - Secondary indexes audit each one and drop dead weight. Remember every secondary index stores the primary key again
Minimal migration example
-- Step 1: add binary column alongside old char PK
ALTER TABLE events ADD COLUMN id_bin BINARY(16) NULL;
-- Step 2: backfill in chunks
UPDATE events SET id_bin = UUID_TO_BIN(id_char) WHERE id_bin IS NULL LIMIT 10000;
-- Step 3: switch PK (requires downtime or orchestrated swap)
ALTER TABLE events DROP PRIMARY KEY, ADD PRIMARY KEY (id_bin);
-- Step 4: optional remove old char column after apps switch
ALTER TABLE events DROP COLUMN id_char;
Quick checklist
- Use
BINARY(16) - Prefer UUID v7 or v6
- Prune secondary indexes
- Keep IDs opaque at the API boundary
TL;DR: For MySQL InnoDB, use UUID v7 in BINARY(16), or BIGINT auto‑inc if you do not need distributed generation. Avoid CHAR(36) and random UUID v4 as primary keys