System design is the practice of architecting large-scale distributed systems. It's essential for senior engineering roles and a key part of technical interviews at top companies.
Why Learn System Design?
- Senior interviews: Required for staff/senior engineer roles
- Real-world impact: Design systems that handle millions of users
- Architectural thinking: See the big picture beyond code
- Trade-off analysis: Every decision has pros and cons
Key Concepts
Latency vs Throughput
- Latency: Time to complete a single request (milliseconds)
- Throughput: Requests handled per second (RPS)
- Optimizing one often affects the other
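The two are linked by Little's law (concurrency = throughput × latency), which is handy for back-of-envelope capacity math. A minimal sketch, with illustrative numbers:

```javascript
// Little's law: concurrency = throughput × latency,
// so max throughput ≈ concurrent requests / average latency.
function maxThroughput(concurrentRequests, avgLatencySeconds) {
  return concurrentRequests / avgLatencySeconds; // requests per second
}

// A server holding 100 concurrent requests at 50 ms each:
console.log(maxThroughput(100, 0.05)); // ≈ 2000 RPS
```

Cutting latency in half doubles the throughput a fixed pool of connections can sustain, which is why the two metrics trade off against each other.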
CAP Theorem
In a distributed system, you can only guarantee two of three:
- Consistency: All nodes see the same data at the same time
- Availability: Every request gets a response
- Partition Tolerance: System works despite network failures
Since network partitions are unavoidable in any real network, the practical choice is between CP (stay consistent, but refuse some requests during a partition) and AP (stay available, but possibly serve stale data).
Availability Levels
| Availability | Downtime/Year | Downtime/Month |
|---|---|---|
| 99% (two nines) | 3.65 days | 7.3 hours |
| 99.9% (three nines) | 8.76 hours | 43.8 minutes |
| 99.99% (four nines) | 52.6 minutes | 4.38 minutes |
| 99.999% (five nines) | 5.26 minutes | 26.3 seconds |
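The table follows directly from the availability percentage; a small helper (illustrative, not from the text) reproduces the numbers:

```javascript
// Downtime allowed per year for a given availability percentage.
function downtimePerYear(availabilityPercent) {
  const secondsPerYear = 365 * 24 * 3600; // 31,536,000 s
  return (1 - availabilityPercent / 100) * secondsPerYear; // seconds
}

console.log(downtimePerYear(99.9) / 3600); // ≈ 8.76 hours ("three nines")
console.log(downtimePerYear(99.999) / 60); // ≈ 5.26 minutes ("five nines")
```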
Scalability
Vertical vs Horizontal Scaling
| Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|
| Add more CPU, RAM, storage | Add more servers |
| Simple, no code changes | Requires distributed design |
| Limited by hardware | Near-infinite scalability |
| Single point of failure | Built-in redundancy |
Stateless Services
Stateless services are easier to scale horizontally:
```javascript
// Stateful (bad for scaling)
class StatefulServer {
  sessions = {}; // Session state lives in this process's memory
  handleRequest(sessionId) {
    return this.sessions[sessionId]; // Only works on this server
  }
}

// Stateless (good for scaling)
class StatelessServer {
  async handleRequest(sessionId) {
    return redis.get(sessionId); // State stored externally (e.g. Redis)
  }
}
```

Load Balancing
Distribute traffic across multiple servers to improve reliability and performance.
Load Balancing Algorithms
- Round Robin: Distribute requests sequentially
- Least Connections: Send to server with fewest active connections
- IP Hash: Same client always goes to same server
- Weighted: More powerful servers get more traffic
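Two of these reduce to a few lines each; a sketch with hypothetical server names and a simplified string hash:

```javascript
const servers = ['server1', 'server2', 'server3'];

// Round Robin: cycle through servers in order
let next = 0;
function roundRobin() {
  const server = servers[next];
  next = (next + 1) % servers.length;
  return server;
}

// IP Hash: the same client IP always maps to the same server
function ipHash(ip) {
  let hash = 0;
  for (const ch of ip) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return servers[hash % servers.length];
}
```

`roundRobin()` yields server1, server2, server3, server1, … while `ipHash` is deterministic per client, which is useful for sticky sessions.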
Architecture
```
                ┌─────────────┐
                │   Client    │
                └──────┬──────┘
                       │
                ┌──────▼──────┐
                │Load Balancer│
                └──────┬──────┘
       ┌───────────────┼───────────────┐
       │               │               │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│  Server 1   │ │  Server 2   │ │  Server 3   │
└─────────────┘ └─────────────┘ └─────────────┘
```

Caching
Store frequently accessed data in fast storage to reduce latency and database load.
Caching Layers
- Browser cache: Static assets (CSS, JS, images)
- CDN: Geographically distributed edge servers
- Application cache: In-memory (Redis, Memcached)
- Database cache: Query result caching
Cache Strategies
```javascript
// Cache-Aside (Lazy Loading)
async function getUser(id) {
  // 1. Check cache
  let user = await cache.get(`user:${id}`);
  if (!user) {
    // 2. Cache miss - fetch from DB
    user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
    // 3. Store in cache for subsequent reads
    await cache.set(`user:${id}`, user, { ttl: 3600 });
  }
  return user;
}

// Write-Through
async function updateUser(id, data) {
  // 1. Update database
  await db.query('UPDATE users SET ? WHERE id = ?', [data, id]);
  // 2. Update cache immediately so reads never see stale data
  await cache.set(`user:${id}`, { id, ...data });
}
```

Cache Invalidation
- TTL (Time To Live): Auto-expire after time period
- Event-based: Invalidate on data changes
- Version keys: Include version in cache key
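The version-key approach can be sketched with a plain `Map` standing in for Redis (the key names here are hypothetical):

```javascript
const cache = new Map(); // stand-in for Redis/Memcached
let userListVersion = 1; // in practice, stored in the cache itself

function cacheKey(page) {
  return `users:v${userListVersion}:page:${page}`;
}

function invalidateUserList() {
  userListVersion++; // old users:v1:* keys become unreachable and age out via TTL
}

cache.set(cacheKey(1), ['alice', 'bob']); // writes users:v1:page:1
invalidateUserList();
cache.get(cacheKey(1)); // undefined - the key now includes v2
```

The old entries are never deleted explicitly; bumping the version makes them invisible, and TTLs reclaim the space.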
Databases at Scale
Replication
Copy data across multiple servers for redundancy and read scaling.
```
              ┌─────────────┐
     ┌────────│   Primary   │────────┐
     │        │   (Write)   │        │
     │        └─────────────┘        │
     │  Replicate           Replicate│
     ▼                               ▼
┌─────────────┐              ┌─────────────┐
│  Replica 1  │              │  Replica 2  │
│   (Read)    │              │   (Read)    │
└─────────────┘              └─────────────┘
```

Sharding (Partitioning)
Split data across multiple databases based on a key.
```javascript
// Shard by user ID (modulo-based)
const NUM_SHARDS = 4;
function getShardId(userId) {
  return userId % NUM_SHARDS;
}
// User 1 → Shard 1, User 2 → Shard 2, User 4 → Shard 0, etc.
```

Sharding Strategies
- Range-based: Users A-M → Shard 1, N-Z → Shard 2
- Hash-based: hash(key) % num_shards
- Directory-based: Lookup table maps keys to shards
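Each strategy reduces to a small routing function; rough sketches (shard counts and tenant names are made up):

```javascript
// Range-based: first letter decides the shard
function rangeShard(username) {
  return username[0].toLowerCase() <= 'm' ? 1 : 2;
}

// Hash-based: stable hash modulo shard count
function hashShard(key, numShards) {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % numShards;
}

// Directory-based: explicit lookup table (kept in a config service)
const directory = { 'tenant-a': 0, 'tenant-b': 3 };
function directoryShard(tenant) {
  return directory[tenant];
}
```

Note that plain hash-modulo re-maps most keys when the shard count changes; consistent hashing is the usual mitigation.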
Message Queues
Decouple services and handle asynchronous processing.
Use Cases
- Email sending: Queue emails for async processing
- Image processing: Resize/optimize uploaded images
- Order processing: Handle orders asynchronously
- Event streaming: Real-time data pipelines
Architecture
```
┌─────────────┐
│  Producer   │
│ (Web Server)│
└──────┬──────┘
       │ Publish
┌──────▼──────┐
│    Queue    │
│  (RabbitMQ) │
└──────┬──────┘
       │ Consume
┌──────▼──────┐
│  Consumer   │
│  (Worker)   │
└─────────────┘
```

Popular Tools
- RabbitMQ: Traditional message broker
- Apache Kafka: Distributed streaming platform
- Amazon SQS: Managed cloud queue
- Redis: Can also function as a simple queue
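The publish/consume decoupling above can be sketched in-process (real brokers add persistence, acknowledgements, and retries):

```javascript
class SimpleQueue {
  constructor() {
    this.messages = [];
  }
  publish(msg) { // producer side (e.g. web server)
    this.messages.push(msg);
  }
  consume() { // consumer side (e.g. background worker)
    return this.messages.shift();
  }
}

const queue = new SimpleQueue();
queue.publish({ type: 'send-email', to: 'user@example.com' });
const job = queue.consume(); // worker processes the job later
```

The producer returns to the user immediately after `publish`; the slow work happens whenever a worker calls `consume`.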
System Design Process
Step 1: Clarify Requirements
- What features are required?
- How many users? What scale?
- What are the latency requirements?
- What consistency guarantees are needed?
Step 2: Estimate Scale
```
// Example: Design Twitter
Daily Active Users: 200 million
Tweets per day:     500 million
Read/Write ratio:   100:1

// Calculations
Writes/second: 500M / 86,400 ≈ 6,000 TPS
Reads/second:  100 × 6,000 ≈ 600,000 RPS

// Storage (assuming ~280 chars per tweet)
Per tweet: ~1 KB (with metadata)
Daily:     500M × 1 KB = 500 GB
Yearly:    ~180 TB
```

Step 3: High-Level Design
Draw the major components and their interactions.
Step 4: Deep Dive
Focus on critical components and potential bottlenecks.
Step 5: Address Bottlenecks
- Single points of failure
- Data consistency issues
- Scaling limitations