
System Design Interview Guide (2026)

System design interview guide for 2026: a repeatable framework, worked questions like URL shortener and news feed, plus caching and scaling answers.

OnJob Editorial · June 5, 2026 · 12 min read

System design is where senior offers are won or lost. There’s no single correct answer — you’re scored on how you reason about scale, trade-offs and failure. The candidates who struggle aren’t the ones who don’t know Redis; they’re the ones who dive into databases before clarifying requirements. This guide gives you a repeatable framework, the building blocks interviewers expect you to name, and worked answers to the questions that come up most in 2026.

A framework you can reuse in any round

Use the same sequence every time so you never freeze:

  1. Clarify requirements. Functional (“users shorten URLs and get redirects”) and non-functional (“read-heavy, low latency, highly available”). Ask before you design.
  2. Estimate scale. Back-of-envelope: daily active users, reads/sec, writes/sec, storage/year. This drives every later decision.
  3. Define the API. A few endpoints (POST /shorten, GET /{code}) anchor the design.
  4. Sketch the high-level diagram. Client → load balancer → app servers → cache → database, plus any queues or CDNs.
  5. Drill into data model + storage. SQL vs NoSQL, schema, indexes.
  6. Address bottlenecks. Caching, replication, sharding, async processing.
  7. Discuss trade-offs and failure modes. What breaks at 10x? What happens when a node dies?

State which step you’re on as you go — it shows structure and keeps the interviewer with you.

The building blocks you must be able to explain

Load balancer — distributes traffic across servers (round-robin, least-connections); enables horizontal scaling and removes single points of failure.
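As a rough illustration of the two strategies named above, here is a minimal in-process sketch. The class names and the server labels are hypothetical, and a real load balancer also handles health checks and timeouts, which are left out here.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Sends each new request to the server with the fewest open connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server among ties, in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes so the count reflects open connections.
        self.active[server] -= 1
```

Round-robin is the simpler default; least-connections tends to help when request durations vary widely, because slow requests stop piling up on one server.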

Caching — store hot data in memory (Redis, Memcached) to cut database load and latency. Know the patterns: cache-aside (lazy load on miss), write-through, write-back. Always name an eviction policy (LRU) and TTLs, and acknowledge invalidation as the hard part.
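The cache-aside pattern with LRU eviction and TTLs can be sketched in a few lines. This is an in-process stand-in for Redis or Memcached, not their API: `LRUCache`, `cache_aside_get` and `write_and_invalidate` are hypothetical names, and the database is assumed to be a plain dict.

```python
import time
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory cache with LRU eviction and a per-entry TTL."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._items = OrderedDict()  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

    def delete(self, key):
        self._items.pop(key, None)


def cache_aside_get(cache, db, key):
    """Read path: try the cache, fall back to the database, then populate."""
    value = cache.get(key)
    if value is None:
        value = db.get(key)
        if value is not None:
            cache.put(key, value)
    return value


def write_and_invalidate(cache, db, key, value):
    """Write path: update the database, then drop the cached copy."""
    db[key] = value
    cache.delete(key)
```

Deleting on write rather than updating the cached value is one common choice; the TTL then bounds how long a stale entry can survive if an invalidation is missed.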

Database replication — a primary handles writes, read replicas serve reads. Scales reads and adds redundancy; the cost is replication lag (replicas can be slightly stale).

Sharding (partitioning) — split data across nodes by a shard key (e.g. user_id). Scales writes and storage beyond one machine. The danger is hot shards from a bad key and expensive cross-shard joins. Mention consistent hashing to minimize reshuffling when nodes are added.
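To make the consistent hashing point concrete, here is a minimal hash ring sketch. The use of MD5, the 100 virtual nodes per server, and the node names are illustrative assumptions; production systems add replication and node removal on top of this.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only for an even spread of positions, not for security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes      # virtual nodes per server smooth the distribution
        self._positions = []      # sorted ring positions
        self._owners = {}         # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            position = _hash(f"{node}#{i}")
            bisect.insort(self._positions, position)
            self._owners[position] = node

    def get_node(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # Walk clockwise to the first position after the key, wrapping at the end.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]
```

When a node is added, the only keys that change owner are the ones the new node takes over; with plain `hash(key) % n`, most keys would move instead.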

CDN — caches static assets near users geographically; offloads origin servers and cuts latency.

Message queue — Kafka, RabbitMQ, SQS; decouples producers from consumers, absorbs traffic spikes, and enables async work (emails, image processing) so the request path stays fast.
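The decoupling idea can be shown with Python's standard-library queue standing in for a real broker. This is only a sketch: `start_worker` and `handle_signup` are hypothetical names, and an in-process queue has none of the durability that Kafka, RabbitMQ or SQS provide.

```python
import queue
import threading


def start_worker(jobs, handler):
    """Consume jobs in a background thread until a None sentinel arrives."""

    def run():
        while True:
            job = jobs.get()
            if job is None:
                jobs.task_done()
                break
            try:
                handler(job)
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def handle_signup(jobs, email):
    """Request path: enqueue the slow work and return immediately."""
    jobs.put({"type": "welcome_email", "to": email})
    return {"status": "created"}
```

The request handler never waits for the email to be sent; if the worker falls behind during a spike, the queue grows and the request path stays fast.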

CAP theorem — under a network partition you choose Consistency or Availability. Most large web systems pick AP with eventual consistency; payments and inventory lean CP. Naming where you land, and why, is a strong signal.

Worked example: design a URL shortener

Requirements. Shorten a long URL to a short code; redirect on access. Read-heavy (~100:1 reads:writes). Low latency, high availability.

Scale estimate. Say 100M new URLs/month ≈ 40 writes/sec, ~4,000 reads/sec at the 100:1 ratio. Tiny per-record storage, but it accumulates over years — plan for billions of rows.
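The arithmetic behind those numbers can be written out as a short sketch. The 500 bytes per record and the 5-year retention are assumptions added for illustration; the 100M URLs/month and the 100:1 ratio come from the estimate above.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000


def estimate_load(new_urls_per_month, reads_per_write, bytes_per_record, retention_years):
    """Back-of-envelope throughput and storage for the URL shortener."""
    writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH
    total_rows = new_urls_per_month * 12 * retention_years
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": writes_per_sec * reads_per_write,
        "total_rows": total_rows,
        "storage_tb": total_rows * bytes_per_record / 1e12,
    }


# bytes_per_record and retention_years are assumed values, not from the guide.
est = estimate_load(100_000_000, 100, bytes_per_record=500, retention_years=5)
```

Under these assumptions the result is about 39 writes/sec and roughly 3,900 reads/sec, in line with the rounded 40 and 4,000 above, plus 6 billion rows and about 3 TB of raw storage over five years.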

API. POST /shorten {long_url} → {short_code}; GET /{short_code} → 301 redirect.

Encoding. Use an auto-increment ID encoded in base62 ([a-zA-Z0-9]) — 7 characters give ~3.5 trillion combinations. Avoids the collision-and-retry loop of random hashing.
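A minimal sketch of the base62 step, assuming the ID comes from an auto-increment counter. The alphabet order (digits, then lowercase, then uppercase) is an arbitrary choice; any fixed ordering of the 62 characters works as long as encode and decode agree.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 chars


def encode_base62(n: int) -> str:
    """Turn a non-negative integer ID into a short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

For example, `encode_base62(62)` gives `"10"`, and the largest 7-character code corresponds to ID `62**7 - 1`, which is where the ~3.5 trillion figure comes from.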

Storage. A key-value store (short_code → long_url) fits perfectly; a NoSQL store like DynamoDB or Cassandra scales horizontally for this simple access pattern. Make short_code the primary (partition) key so every redirect is a single-key lookup.

Scaling reads. Put Redis in front, cache-aside, with the hot short codes in memory — most traffic never touches the database. A CDN/edge layer can cache redirects further.

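Trade-offs. Custom aliases need a uniqueness check at write time, since a user-chosen code can collide with an existing one. For analytics (click counts), write events to a queue and aggregate asynchronously so they don’t slow redirects. Browsers cache a 301 response, so repeat clicks may never reach your servers; return a 302 instead if click counts must be accurate.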

Worked example: design a news feed (like Twitter/Instagram)

The core tension is fan-out: when a user posts, how do followers see it?

  • Fan-out on write (push) — at post time, copy the post into every follower’s precomputed feed. Reads are instant. Breaks for celebrities with 50M followers (one post = 50M writes).
  • Fan-out on read (pull) — build the feed on request by querying who the user follows. Cheap writes, but reads are heavy and slow.
  • Hybrid (the real answer) — push for normal users, pull for celebrities, then merge at read time. This is what large platforms actually do.

Add a cache for hot feeds, a CDN for media, a queue for the fan-out work, and rank with a scoring service rather than pure chronology. Naming the hybrid and why pure push fails is the senior-level insight here.
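The hybrid approach can be sketched with in-memory dictionaries standing in for the feed store and the follow graph. `FeedService` and the follower threshold are hypothetical, and ranking is reduced to newest-first to keep the example short.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; real systems tune this


class FeedService:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)         # author -> users following them
        self.following = defaultdict(set)         # user -> authors they follow
        self.posts_by_author = defaultdict(list)  # author -> [(timestamp, post_id)]
        self.precomputed = defaultdict(list)      # user -> [(timestamp, post_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def publish(self, author, post_id, timestamp):
        self.posts_by_author[author].append((timestamp, post_id))
        if not self.is_celebrity(author):
            # Fan-out on write: push into each follower's precomputed feed.
            for follower in self.followers[author]:
                self.precomputed[follower].append((timestamp, post_id))

    def read_feed(self, user, limit=20):
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts had already been pushed.
        items = set(self.precomputed[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fan-out on read: pull celebrity posts at request time.
                items.update(self.posts_by_author[author])
        newest_first = sorted(items, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

A celebrity post costs one write regardless of follower count, while the read path pays only for the handful of celebrities a given user follows.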

More questions you should rehearse

Design a rate limiter. Token bucket (refill tokens at a fixed rate, allow a burst up to capacity) is the standard answer; store counters in Redis keyed by user/IP so the limit holds across distributed servers.
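A single-process token bucket is short enough to sketch in full. The class name and the injectable clock are illustrative assumptions; in the distributed version described above, the token count and timestamp would live in Redis and the refill-and-spend step would need to be atomic.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so tests need not sleep
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Refill based on elapsed time, then spend tokens if enough remain."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

To limit per user or per IP, keep one bucket per key, for example in a dict keyed by user ID.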

Design a chat system (WhatsApp). WebSockets for real-time push, a message queue for delivery, a database for history, and presence tracking. Discuss delivery guarantees (at-least-once + dedup) and offline message storage.
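The "at-least-once + dedup" guarantee can be illustrated with a sender that retries until acknowledged and a receiver that ignores message IDs it has already seen. `Inbox`, `deliver_at_least_once` and the `send` callable are hypothetical; a real system would persist the seen IDs and back off between retries.

```python
class Inbox:
    """Receiver side: stores each message once, keyed by message ID."""

    def __init__(self):
        self.seen_ids = set()
        self.messages = []

    def receive(self, message_id, text):
        if message_id not in self.seen_ids:
            self.seen_ids.add(message_id)
            self.messages.append(text)
        return True  # always acknowledge, even for a duplicate


def deliver_at_least_once(send, message_id, text, max_attempts=5):
    """Sender side: retry until an ack arrives; returns the attempt count."""
    for attempt in range(1, max_attempts + 1):
        try:
            if send(message_id, text):
                return attempt
        except ConnectionError:
            continue  # ack may have been lost; resend with the same ID
    raise RuntimeError(f"delivery of {message_id} failed after {max_attempts} attempts")
```

Reusing the same message ID on every retry is what makes the duplicate harmless: the receiver may get the message twice, but the user sees it once.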

Design a typeahead / autocomplete. A trie for prefix matching, cached top suggestions per prefix, updated asynchronously from query logs. Optimize for read latency since every keystroke hits it.
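A small sketch of the trie with cached top suggestions per prefix. `Typeahead` is a hypothetical name, the `{query: count}` snapshot is assumed to come from an offline job over query logs, and the recursive trim is fine for short queries but would need an iterative version for very long strings.

```python
import heapq


class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}
        self.top = []  # cached [(count, query)] for this prefix, best first


class Typeahead:
    def __init__(self, k=3):
        self.k = k
        self.root = TrieNode()

    def build(self, query_counts):
        """Rebuild the trie from a {query: count} snapshot (the async update)."""
        self.root = TrieNode()
        for query, count in query_counts.items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
                node.top.append((count, query))
        self._trim(self.root)

    def _trim(self, node):
        # Keep only the k most frequent queries at every prefix node.
        node.top = heapq.nlargest(self.k, node.top)
        for child in node.children.values():
            self._trim(child)

    def suggest(self, prefix):
        """Read path: walk the prefix and return the precomputed list."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top]
```

Because the ranking work happens at build time, each keystroke costs only a walk of the prefix length plus a list read.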

How would you scale a service from 1k to 1M users? Walk the ladder: vertical scaling → add a load balancer + horizontal scaling → add caching → add read replicas → shard the database → introduce async queues and a CDN. Name the bottleneck that forces each step.

How to stand out in a system design interview

Three habits win offers: clarify before designing (jumping to the database loses points), justify every choice with a trade-off (“Redis here because reads dominate; the cost is invalidation complexity”), and discuss failure modes (“if the primary dies, a replica is promoted”). Interviewers are scoring judgment under ambiguity, not memorized diagrams.

Rehearse explaining a design out loud against a timer — it’s a different skill from reading about one. Practice full system design rounds with OnJob’s AI mock interviews and get a confidence score on how clearly you reason through trade-offs, then create a free account to find matched senior engineering roles. Pair this with our Google interview questions and data structures interview questions for a complete senior-SDE loop.

FAQ

What level of role gets system design interviews? Usually mid-level and above (SDE II and up), and almost always for senior, staff and engineering manager roles. Some new-grad loops include a lightweight version focused on basic components and trade-offs rather than full distributed-systems depth.

How do I do capacity estimation under pressure? Use round numbers and state your assumptions out loud. Pick daily active users, multiply by actions per user to get requests/day, divide by ~100,000 (a day has 86,400 seconds, so this rounds up for easy arithmetic) to get requests/second, then estimate storage from record size × volume × retention. Precision doesn’t matter; the structured reasoning does.

What’s the single most common mistake in system design interviews? Designing before clarifying. Candidates jump to “I’ll use a database” without asking about read/write ratios, scale, or consistency needs. Always spend the first few minutes on functional and non-functional requirements, then estimate scale — those answers determine every decision that follows.
