System design is the practice of architecting large-scale distributed systems. It's essential for senior engineering roles and a key part of technical interviews at top companies.
Why Learn System Design?
- Senior interviews: Required for staff/senior engineer roles
- Real-world impact: Design systems that handle millions of users
- Architectural thinking: See the big picture beyond code
- Trade-off analysis: Every decision has pros and cons
Key Concepts
Latency vs Throughput
- Latency: Time to complete a single request (milliseconds)
- Throughput: Requests handled per second (RPS)
- Optimizing one often affects the other
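The two are linked by Little's law (concurrency = throughput × latency), which is handy for back-of-envelope capacity math. A minimal sketch, with illustrative numbers:

```javascript
// Little's law: concurrency = throughput × latency,
// so max throughput ≈ concurrent requests / average latency.
function maxThroughput(concurrentRequests, avgLatencySeconds) {
  return concurrentRequests / avgLatencySeconds; // requests per second
}

// A server holding 100 concurrent requests at 50 ms each:
console.log(maxThroughput(100, 0.05)); // ≈ 2000 RPS
```

Cutting latency in half doubles the throughput a fixed pool of connections can sustain, which is why the two metrics trade off against each other.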
CAP Theorem
In a distributed system, you can only guarantee two of three:
- Consistency: All nodes see the same data at the same time
- Availability: Every request gets a response
- Partition Tolerance: System works despite network failures
Since network partitions are unavoidable in any real network, the practical choice is between CP (stay consistent, but refuse some requests during a partition) and AP (stay available, but possibly serve stale data).
Availability Levels
| Availability | Downtime/Year | Downtime/Month |
|---|---|---|
| 99% (two nines) | 3.65 days | 7.3 hours |
| 99.9% (three nines) | 8.76 hours | 43.8 minutes |
| 99.99% (four nines) | 52.6 minutes | 4.38 minutes |
| 99.999% (five nines) | 5.26 minutes | 26.3 seconds |
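The table follows directly from the availability percentage; a small helper (illustrative, not from the text) reproduces the numbers:

```javascript
// Downtime allowed per year for a given availability percentage.
function downtimePerYear(availabilityPercent) {
  const secondsPerYear = 365 * 24 * 3600; // 31,536,000 s
  return (1 - availabilityPercent / 100) * secondsPerYear; // seconds
}

console.log(downtimePerYear(99.9) / 3600); // ≈ 8.76 hours ("three nines")
console.log(downtimePerYear(99.999) / 60); // ≈ 5.26 minutes ("five nines")
```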
Scalability
Vertical vs Horizontal Scaling
| Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|
| Add more CPU, RAM, storage | Add more servers |
| Simple, no code changes | Requires distributed design |
| Limited by hardware | Near-infinite scalability |
| Single point of failure | Built-in redundancy |
Stateless Services
Stateless services are easier to scale horizontally:
```javascript
// Stateful (bad for scaling)
class StatefulServer {
  sessions = {}; // Session state lives in this process's memory
  handleRequest(sessionId) {
    return this.sessions[sessionId]; // Only works on this server
  }
}

// Stateless (good for scaling)
class StatelessServer {
  async handleRequest(sessionId) {
    return redis.get(sessionId); // State stored externally (e.g. Redis)
  }
}
```

Load Balancing
Distribute traffic across multiple servers to improve reliability and performance.
Load Balancing Algorithms
- Round Robin: Distribute requests sequentially
- Least Connections: Send to server with fewest active connections
- IP Hash: Same client always goes to same server
- Weighted: More powerful servers get more traffic
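Two of these reduce to a few lines each; a sketch with hypothetical server names and a simplified string hash:

```javascript
const servers = ['server1', 'server2', 'server3'];

// Round Robin: cycle through servers in order
let next = 0;
function roundRobin() {
  const server = servers[next];
  next = (next + 1) % servers.length;
  return server;
}

// IP Hash: the same client IP always maps to the same server
function ipHash(ip) {
  let hash = 0;
  for (const ch of ip) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return servers[hash % servers.length];
}
```

`roundRobin()` yields server1, server2, server3, server1, … while `ipHash` is deterministic per client, which is useful for sticky sessions.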
Architecture
```
                ┌─────────────┐
                │   Client    │
                └──────┬──────┘
                       │
                ┌──────▼──────┐
                │Load Balancer│
                └──────┬──────┘
       ┌───────────────┼───────────────┐
       │               │               │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│  Server 1   │ │  Server 2   │ │  Server 3   │
└─────────────┘ └─────────────┘ └─────────────┘
```

Caching
Store frequently accessed data in fast storage to reduce latency and database load.
Caching Layers
- Browser cache: Static assets (CSS, JS, images)
- CDN: Geographically distributed edge servers
- Application cache: In-memory (Redis, Memcached)
- Database cache: Query result caching
Cache Strategies
```javascript
// Cache-Aside (Lazy Loading)
async function getUser(id) {
  // 1. Check cache
  let user = await cache.get(`user:${id}`);
  if (!user) {
    // 2. Cache miss - fetch from DB
    user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
    // 3. Store in cache for subsequent reads
    await cache.set(`user:${id}`, user, { ttl: 3600 });
  }
  return user;
}

// Write-Through
async function updateUser(id, data) {
  // 1. Update database
  await db.query('UPDATE users SET ? WHERE id = ?', [data, id]);
  // 2. Update cache immediately so reads never see stale data
  await cache.set(`user:${id}`, { id, ...data });
}
```

Cache Invalidation
- TTL (Time To Live): Auto-expire after time period
- Event-based: Invalidate on data changes
- Version keys: Include version in cache key
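The version-key approach can be sketched with a plain `Map` standing in for Redis (the key names here are hypothetical):

```javascript
const cache = new Map(); // stand-in for Redis/Memcached
let userListVersion = 1; // in practice, stored in the cache itself

function cacheKey(page) {
  return `users:v${userListVersion}:page:${page}`;
}

function invalidateUserList() {
  userListVersion++; // old users:v1:* keys become unreachable and age out via TTL
}

cache.set(cacheKey(1), ['alice', 'bob']); // writes users:v1:page:1
invalidateUserList();
cache.get(cacheKey(1)); // undefined - the key now includes v2
```

The old entries are never deleted explicitly; bumping the version makes them invisible, and TTLs reclaim the space.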
Databases at Scale
Replication
Copy data across multiple servers for redundancy and read scaling.
```
              ┌─────────────┐
     ┌────────│   Primary   │────────┐
     │        │   (Write)   │        │
     │        └─────────────┘        │
     │  Replicate           Replicate│
     ▼                               ▼
┌─────────────┐              ┌─────────────┐
│  Replica 1  │              │  Replica 2  │
│   (Read)    │              │   (Read)    │
└─────────────┘              └─────────────┘
```

Sharding (Partitioning)
Split data across multiple databases based on a key.
```javascript
// Shard by user ID (modulo-based)
const NUM_SHARDS = 4;
function getShardId(userId) {
  return userId % NUM_SHARDS;
}
// User 1 → Shard 1, User 2 → Shard 2, User 4 → Shard 0, etc.
```

Sharding Strategies
- Range-based: Users A-M → Shard 1, N-Z → Shard 2
- Hash-based: hash(key) % num_shards
- Directory-based: Lookup table maps keys to shards
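Each strategy reduces to a small routing function; rough sketches (shard counts and tenant names are made up):

```javascript
// Range-based: first letter decides the shard
function rangeShard(username) {
  return username[0].toLowerCase() <= 'm' ? 1 : 2;
}

// Hash-based: stable hash modulo shard count
function hashShard(key, numShards) {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % numShards;
}

// Directory-based: explicit lookup table (kept in a config service)
const directory = { 'tenant-a': 0, 'tenant-b': 3 };
function directoryShard(tenant) {
  return directory[tenant];
}
```

Note that plain hash-modulo re-maps most keys when the shard count changes; consistent hashing is the usual mitigation.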
Message Queues
Decouple services and handle asynchronous processing.
Use Cases
- Email sending: Queue emails for async processing
- Image processing: Resize/optimize uploaded images
- Order processing: Handle orders asynchronously
- Event streaming: Real-time data pipelines
Architecture
```
┌─────────────┐
│  Producer   │
│ (Web Server)│
└──────┬──────┘
       │ Publish
┌──────▼──────┐
│    Queue    │
│  (RabbitMQ) │
└──────┬──────┘
       │ Consume
┌──────▼──────┐
│  Consumer   │
│  (Worker)   │
└─────────────┘
```

Popular Tools
- RabbitMQ: Traditional message broker
- Apache Kafka: Distributed streaming platform
- Amazon SQS: Managed cloud queue
- Redis: Can also function as a simple queue
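The publish/consume decoupling above can be sketched in-process (real brokers add persistence, acknowledgements, and retries):

```javascript
class SimpleQueue {
  constructor() {
    this.messages = [];
  }
  publish(msg) { // producer side (e.g. web server)
    this.messages.push(msg);
  }
  consume() { // consumer side (e.g. background worker)
    return this.messages.shift();
  }
}

const queue = new SimpleQueue();
queue.publish({ type: 'send-email', to: 'user@example.com' });
const job = queue.consume(); // worker processes the job later
```

The producer returns to the user immediately after `publish`; the slow work happens whenever a worker calls `consume`.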
System Design Process
Step 1: Clarify Requirements
- What features are required?
- How many users? What scale?
- What are the latency requirements?
- What consistency guarantees are needed?
Step 2: Estimate Scale
```
// Example: Design Twitter
Daily Active Users: 200 million
Tweets per day:     500 million
Read/Write ratio:   100:1

// Calculations
Writes/second: 500M / 86,400 ≈ 6,000 TPS
Reads/second:  100 × 6,000 ≈ 600,000 RPS

// Storage (assuming ~280 chars per tweet)
Per tweet: ~1 KB (with metadata)
Daily:     500M × 1 KB = 500 GB
Yearly:    ~180 TB
```

Step 3: High-Level Design
Draw the major components and their interactions.
Step 4: Deep Dive
Focus on critical components and potential bottlenecks.
Step 5: Address Bottlenecks
- Single points of failure
- Data consistency issues
- Scaling limitations