7 min read

UUIDs as Primary Keys in MySQL: A Practical Guide

Hero image for UUIDs as Primary Keys in MySQL: A Practical Guide
Table of Contents

💡

ID choice is architecture. Pick well and your writes scale. Pick poorly and you fight fragmentation, cache misses, and I/O storms

UUIDs, in plain English

What you’ll learn: What UUIDs are, why teams use them, common versions, and the tradeoffs that impact databases

What is a UUID?

UUIDs (universally unique identifiers) are 128‑bit IDs designed to be unique without a central counter. You often see them as 36‑character strings with hyphens, like 550e8400-e29b-41d4-a716-446655440000

Why teams use UUIDs

  • No central coordinator for ID generation across services or regions
  • Easy merges across shards, services, or data centers
  • Safer in APIs than sequential IDs because they are hard to guess

Common versions

  • UUID v1 time + node info (node often derived from a MAC address). Leaks time and machine hints
  • UUID v4 mostly random. Excellent uniqueness, poor locality for storage engines
  • UUID v6/v7 time‑ordered layouts that keep inserts mostly sequential

Real‑world example

Twitter introduced 64‑bit Snowflake IDs (time + node + sequence) to avoid central contention while keeping order. The same lesson applies widely today

In short, UUIDs remove coordination, but they can hurt write performance if you store them the wrong way


Why MySQL struggles

What you’ll learn: How random keys affect MySQL InnoDB’s clustered B+ tree and increase page splits, I/O, and CPU

InnoDB basics

MySQL InnoDB stores rows in a clustered B+ tree (a balanced search tree). The primary key defines the physical row order

  • Internal nodes route by key ranges
  • Leaf nodes hold sorted rows on pages
  • Pages are typically 16 KB in InnoDB
  • Sequential keys append to the rightmost leaf, which stays hot in cache

Why random keys hurt

  • Scattered inserts target many pages instead of the hot rightmost page
  • More page splits and lower page fill, often near 50% under heavy randomness
  • Poor cache locality increases I/O and CPU because data is not clustered together in memory

B+ Tree Impact Visualization

Here’s how different key types affect MySQL InnoDB’s clustered B+ tree:

flowchart TD
    subgraph seq[Auto-Increment Keys Sequential]
        A1[Root Node Ranges 1-1000 1001-2000 2001+]
        A2[Leaf 1-1000 95% Full]
        A3[Leaf 1001-2000 95% Full]
        A4[Leaf 2001+ Hot Insert Point Cache Hit]
        
        A1 --> A2
        A1 --> A3
        A1 --> A4
        
        A5[New Row ID 2156] --> A4
    end
    
    subgraph rand[UUID v4 Keys Random]
        B1[Root Node Ranges Scattered]
        B2[Leaf Block A 50% Full]
        B3[Leaf Block B 50% Full]
        B4[Leaf Block C 50% Full]
        
        B1 --> B2
        B1 --> B3
        B1 --> B4
        
        B5[New Random UUID] --> B6[Any Leaf Page Split Cache Miss]
        B6 --> B2
        B6 --> B3
        B6 --> B4
    end
    
    classDef goodPerf fill:#e8f5e8,stroke:#2e7d32
    classDef badPerf fill:#ffebee,stroke:#c62828
    classDef newItem fill:#e1f5fe,stroke:#01579b
    
    class A4,A2,A3 goodPerf
    class B6,B2,B3,B4 badPerf
    class A5,B5 newItem

Key Insight: Sequential keys keep inserts hot in the rightmost leaf (95% page utilization), while random UUIDs scatter across all leaves causing page splits and fragmentation (50% utilization).

💡

Randomness is great for uniqueness and terrible for locality. UUID v4 as a primary key pushes InnoDB toward constant rebalancing and wasted space

Now that you see the cost of random keys, let’s make UUIDs behave better in storage and order


Store and order UUIDs

What you’ll learn: How to store UUIDs compactly and keep inserts mostly sequential using ordered variants

Store compactly

  • Avoid CHAR(36) for primary keys
  • Use BINARY(16) and convert at the application edges

Example schema and conversions

-- 1) Binary storage
CREATE TABLE users (
  id BINARY(16) PRIMARY KEY,
  email VARCHAR(255) NOT NULL UNIQUE,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- 2) Insert with UUIDv7 from app
INSERT INTO users (id, email) VALUES (UUID_TO_BIN('018f6e6a-4b6a-7c3f-b2d0-9f7c6df4a8a1'), '[email protected]');

-- 3) Read as string when you need it
SELECT BIN_TO_UUID(id) AS id, email FROM users;

If you are on UUID v1

Use UUID_TO_BIN(uuid, 1) to swap the time bits forward for better locality, and read with BIN_TO_UUID(col, 1) to reverse it

-- v1 optimization
INSERT INTO orders (id, total)
VALUES (UUID_TO_BIN('d211ca18-d389-11ee-a506-0242ac120002', 1), 42.00);

SELECT BIN_TO_UUID(id, 1) AS id, total FROM orders;

Prefer ordered generators

  • UUID v7 millisecond timestamp prefix plus randomness. Great InnoDB locality with standard UUID format
  • UUID v6 similar time‑ordered layout using rearranged v1 semantics

Size reality check

RepresentationBytesOrderable
INT auto‑inc4Yes
BIGINT auto‑inc8Yes
UUID in BINARY(16)16With v6/v7

One change saves space, the other saves your write path. Use compact storage and ordered IDs together

💡

Guideline: Use BINARY(16) plus UUID v7 for MySQL InnoDB. This combination reduces index size and minimizes page splits under write load


ID alternatives

What you’ll learn: The tradeoffs among auto‑increment, UUID v4, UUID v7, ULID, Snowflake, and NanoID

ID Scheme Comparison Matrix

flowchart LR
    subgraph legend[ID Scheme Performance vs Size]
        L1[Excellent Performance Small Size]
        L2[Good Performance Medium Size] 
        L3[Poor Performance Large Size]
    end
    
    subgraph schemes[ID Schemes Positioned by Trade-offs]
        A[Auto-increment 4-8 bytes Excellent]
        S[Snowflake 8 bytes Excellent]
        U7[UUID v7 16 bytes Good]
        UL[ULID 16 bytes Good]
        U4[UUID v4 16 bytes Poor]
        N[NanoID 16 bytes Poor]
    end
    
    classDef excellent fill:#e8f5e8,stroke:#2e7d32
    classDef good fill:#fff3e0,stroke:#ef6c00
    classDef poor fill:#ffebee,stroke:#c62828
    
    class A,S excellent
    class U7,UL good
    class U4,N poor
SchemeSizeOrderableMySQL Performance
Auto‑inc INT/BIGINT4/8 B✅ Yes⭐⭐⭐ Excellent
UUID v716 B✅ Yes⭐⭐ Good
ULID16 B✅ Yes⭐⭐ Good
Snowflake8 B✅ Yes⭐⭐⭐ Excellent
UUID v416 B❌ No❌ Poor
NanoID~16 B❌ No❌ Poor

Notes per scheme

  • Auto‑inc small and fast, but merging shards is painful and IDs are easy to enumerate
  • UUID v4 no coordination and good privacy, but causes fragmentation and index bloat in InnoDB
  • UUID v7 time‑ordered with great locality, stays at 16 bytes everywhere
  • ULID lexicographically sortable base32 string (26 characters). Millisecond buckets can collide under extreme bursty traffic
  • Snowflake tiny, sortable, and fast. Requires node IDs and clock discipline
  • NanoID short, URL‑safe strings, but no natural order and v4‑like locality issues

Rule of thumb: default to UUID v7 for distributed SQL. Choose Snowflake when you need compact 64‑bit IDs and can manage node clocks

Handy snippets

-- Virtual column for readable IDs without storing 36 chars
ALTER TABLE users
  ADD COLUMN id_str VARCHAR(36) GENERATED ALWAYS AS (BIN_TO_UUID(id)) VIRTUAL;
# Pseudo Ruby for UUID v7 (use a well‑tested library in production)
def uuid_v7
  ts = (Time.now.to_f * 1000).to_i # 48‑bit ms
  rand_a = SecureRandom.random_number(1 << 12)
  rand_b = SecureRandom.random_number(1 << 62)
  # pack bits per spec, then hex with hyphens
end

With options in mind, let’s choose a default and outline a safe migration path


Choose and migrate

What you’ll learn: Practical defaults for new apps and a step‑by‑step path to migrate existing tables

  1. New single‑DB app use BIGINT auto‑inc. Add a unique external ID later if you expose records
  2. New distributed app use UUID v7 in BINARY(16). Keep string conversions at the application edge
  3. Existing UUID v4 tables convert to BINARY(16) first. Put new tables on UUID v7 and keep old tables stable
  4. If stuck on UUID v1 insert with UUID_TO_BIN(uuid, 1) and expose BIN_TO_UUID(col, 1) for reads
  5. Secondary indexes audit each one and drop dead weight. Remember every secondary index stores the primary key again

Minimal migration example

-- Step 1: add binary column alongside old char PK
ALTER TABLE events ADD COLUMN id_bin BINARY(16) NULL;

-- Step 2: backfill in chunks
UPDATE events SET id_bin = UUID_TO_BIN(id_char) WHERE id_bin IS NULL LIMIT 10000;

-- Step 3: switch PK (requires downtime or orchestrated swap)
ALTER TABLE events DROP PRIMARY KEY, ADD PRIMARY KEY (id_bin);

-- Step 4: optional remove old char column after apps switch
ALTER TABLE events DROP COLUMN id_char;

Quick checklist

  • Use BINARY(16)
  • Prefer UUID v7 or v6
  • Prune secondary indexes
  • Keep IDs opaque at the API boundary
💡

TL;DR: For MySQL InnoDB, use UUID v7 in BINARY(16), or BIGINT auto‑inc if you do not need distributed generation. Avoid CHAR(36) and random UUID v4 as primary keys

📧