System Design: Building Scalable Systems

System design is the practice of architecting large-scale distributed systems. It's essential for senior engineering roles and a key part of technical interviews at top companies.

Why Learn System Design?

  • Senior interviews: Required for staff/senior engineer roles
  • Real-world impact: Design systems that handle millions of users
  • Architectural thinking: See the big picture beyond code
  • Trade-off analysis: Every decision has pros and cons

Key Concepts

Latency vs Throughput

  • Latency: Time to complete a single request (milliseconds)
  • Throughput: Requests handled per second (RPS)
  • Optimizing one often affects the other
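The two metrics are linked by Little's law: the average number of requests in flight equals throughput times latency. A quick sketch (the function name is illustrative):

```javascript
// Little's law: concurrent requests ≈ throughput (req/sec) × latency (sec).
function concurrentRequests(throughputRps, latencyMs) {
  return throughputRps * (latencyMs / 1000);
}

// At 1,000 RPS with 50 ms latency, ~50 requests are in flight at once.
console.log(Math.round(concurrentRequests(1000, 50))); // 50
```

This is why cutting latency in half (at the same throughput) also halves the concurrency a server must sustain.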

CAP Theorem

In a distributed system, you can only guarantee two of three:

  • Consistency: All nodes see the same data at the same time
  • Availability: Every request gets a response
  • Partition Tolerance: System works despite network failures

Since network partitions are unavoidable in practice, the real choice is between CP (sacrifice availability during a partition) and AP (sacrifice consistency during a partition).

Availability Levels

Availability           Downtime/Year    Downtime/Month
99% (two nines)        3.65 days        7.3 hours
99.9% (three nines)    8.76 hours       43.8 minutes
99.99% (four nines)    52.6 minutes     4.38 minutes
99.999% (five nines)   5.26 minutes     26.3 seconds
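The figures above follow directly from the availability percentage; a minimal sketch (the helper name is illustrative):

```javascript
// Allowed downtime per year, in minutes, for a given availability percentage.
function downtimeMinutesPerYear(availabilityPct) {
  const minutesPerYear = 365 * 24 * 60; // 525,600 minutes
  return minutesPerYear * (1 - availabilityPct / 100);
}

console.log(downtimeMinutesPerYear(99.9)); // ≈ 525.6 minutes ≈ 8.76 hours
```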

Scalability

Vertical vs Horizontal Scaling

Vertical (Scale Up)           Horizontal (Scale Out)
Add more CPU, RAM, storage    Add more servers
Simple, no code changes       Requires distributed design
Limited by hardware           Near-infinite scalability
Single point of failure       Built-in redundancy

Stateless Services

Stateless services are easier to scale horizontally:

// Stateful (bad for scaling)
class Server {
  sessions = {};  // State stored in memory

  handleRequest(sessionId) {
    return this.sessions[sessionId];  // Only works on this server
  }
}

// Stateless (good for scaling)
class Server {
  async handleRequest(sessionId) {
    return redis.get(sessionId);  // State stored externally (e.g. Redis)
  }
}

Load Balancing

Distribute traffic across multiple servers to improve reliability and performance.

Load Balancing Algorithms

  • Round Robin: Distribute requests sequentially
  • Least Connections: Send to server with fewest active connections
  • IP Hash: Same client always goes to same server
  • Weighted: More powerful servers get more traffic
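The simplest of these, round robin, can be sketched in a few lines (server names are illustrative):

```javascript
// Round-robin load balancing: cycle through servers in order.
function makeRoundRobin(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length; // wrap around after the last server
    return server;
  };
}

const pick = makeRoundRobin(["server-1", "server-2", "server-3"]);
console.log(pick(), pick(), pick(), pick()); // server-1 server-2 server-3 server-1
```

Round robin assumes servers are roughly equal in capacity; the weighted variant relaxes that assumption.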

Architecture

                    ┌─────────────┐
                    │   Client    │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │Load Balancer│
                    └──────┬──────┘
           ┌───────────────┼───────────────┐
           │               │               │
    ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
    │  Server 1   │ │  Server 2   │ │  Server 3   │
    └─────────────┘ └─────────────┘ └─────────────┘

Caching

Store frequently accessed data in fast storage to reduce latency and database load.

Caching Layers

  • Browser cache: Static assets (CSS, JS, images)
  • CDN: Geographically distributed edge servers
  • Application cache: In-memory (Redis, Memcached)
  • Database cache: Query result caching

Cache Strategies

// Cache-Aside (Lazy Loading)
async function getUser(id) {
  // 1. Check cache
  let user = await cache.get(`user:${id}`);

  if (!user) {
    // 2. Cache miss - fetch from DB
    user = await db.query('SELECT * FROM users WHERE id = ?', [id]);

    // 3. Store in cache
    await cache.set(`user:${id}`, user, { ttl: 3600 });
  }

  return user;
}

// Write-Through
async function updateUser(id, data) {
  // 1. Update database
  await db.query('UPDATE users SET ? WHERE id = ?', [data, id]);

  // 2. Update cache immediately
  await cache.set(`user:${id}`, { id, ...data });
}

Cache Invalidation

  • TTL (Time To Live): Auto-expire after time period
  • Event-based: Invalidate on data changes
  • Version keys: Include version in cache key
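The version-key strategy can be sketched like this (the constant and helper names are illustrative):

```javascript
// Versioned cache keys: bumping USER_CACHE_VERSION makes every old entry
// unreachable at once; stale entries then age out via TTL instead of
// requiring explicit deletes.
const USER_CACHE_VERSION = 2;

function userCacheKey(id) {
  return `v${USER_CACHE_VERSION}:user:${id}`;
}

console.log(userCacheKey(42)); // "v2:user:42"
```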

Databases at Scale

Replication

Copy data across multiple servers for redundancy and read scaling.

                    ┌─────────────┐
           ┌────────│   Primary   │────────┐
           │        │   (Write)   │        │
           │        └─────────────┘        │
           │ Replicate         Replicate   │
           ▼                               ▼
    ┌─────────────┐               ┌─────────────┐
    │  Replica 1  │               │  Replica 2  │
    │   (Read)    │               │   (Read)    │
    └─────────────┘               └─────────────┘
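Application code typically routes queries to match this topology. A sketch of read/write splitting, where `primary` and `replicas` stand in for hypothetical connection handles:

```javascript
// Send writes to the primary; spread reads across replicas.
function routeQuery(sql, { primary, replicas }) {
  const isWrite = /^\s*(INSERT|UPDATE|DELETE)\b/i.test(sql);
  if (isWrite) return primary;
  // Any replica can serve a read; pick one at random.
  return replicas[Math.floor(Math.random() * replicas.length)];
}
```

Note that replicas lag the primary slightly, so flows that must read their own writes may still need to query the primary.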

Sharding (Partitioning)

Split data across multiple databases based on a key.

// Shard by user ID (hash-based: modulo)
function getShardId(userId) {
  return userId % NUM_SHARDS;
}

// With NUM_SHARDS = 4:
// User 1 → Shard 1
// User 2 → Shard 2
// User 4 → Shard 0

Sharding Strategies

  • Range-based: Users A-M → Shard 1, N-Z → Shard 2
  • Hash-based: hash(key) % num_shards
  • Directory-based: Lookup table maps keys to shards
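For string keys, the hash-based strategy needs an explicit hash function first. A simplified sketch using an FNV-1a-style hash (production systems often use consistent hashing instead, to avoid rehashing everything when the shard count changes):

```javascript
const NUM_SHARDS = 4;

// FNV-1a-style 32-bit hash; good enough for even key distribution in a sketch.
function hashKey(key) {
  let h = 2166136261;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0; // force unsigned 32-bit
}

function getShard(key) {
  return hashKey(key) % NUM_SHARDS;
}
```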

Message Queues

Decouple services and handle asynchronous processing.

Use Cases

  • Email sending: Queue emails for async processing
  • Image processing: Resize/optimize uploaded images
  • Order processing: Handle orders asynchronously
  • Event streaming: Real-time data pipelines

Architecture

                    ┌─────────────┐
                    │  Producer   │
                    │ (Web Server)│
                    └──────┬──────┘
                           │ Publish
                    ┌──────▼──────┐
                    │   Queue     │
                    │ (RabbitMQ)  │
                    └──────┬──────┘
                           │ Consume
                    ┌──────▼──────┐
                    │  Consumer   │
                    │  (Worker)   │
                    └─────────────┘
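The same flow can be sketched with a plain in-memory array standing in for the broker; a real deployment would use RabbitMQ, Kafka, or SQS:

```javascript
// Producer/consumer decoupling with an in-memory FIFO queue (sketch only).
const queue = [];

function publish(message) {
  queue.push(message); // producer enqueues and returns immediately
}

function consume() {
  return queue.shift(); // consumer processes messages first in, first out
}

publish({ type: "send-email", to: "user@example.com" });
publish({ type: "resize-image", file: "photo.jpg" });

console.log(consume().type); // "send-email"
```

The key property is that the producer never waits for the work to finish; slow tasks only delay the consumer.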

Popular Tools

  • RabbitMQ: Traditional message broker
  • Apache Kafka: Distributed streaming platform
  • Amazon SQS: Managed cloud queue
  • Redis: Can also function as a simple queue

System Design Process

Step 1: Clarify Requirements

  • What features are required?
  • How many users? What scale?
  • What are the latency requirements?
  • What consistency guarantees are needed?

Step 2: Estimate Scale

// Example: Design Twitter
Daily Active Users: 200 million
Tweets per day: 500 million
Read/Write ratio: 100:1

// Calculations
Tweets/second: 500M / 86,400 ≈ 6,000 writes/sec
Reads/second: 6,000 × 100 ≈ 600,000 reads/sec

// Storage (assuming 280 chars avg)
Per tweet: ~1 KB (with metadata)
Daily: 500 GB
Yearly: ~180 TB
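As a sanity check, the arithmetic above can be run directly (the constants mirror the example's numbers):

```javascript
// Back-of-the-envelope check of the Twitter estimates above.
const tweetsPerDay = 500e6;
const secondsPerDay = 86400;
const readWriteRatio = 100;
const bytesPerTweet = 1024; // ~1 KB per tweet including metadata

const writeTps = tweetsPerDay / secondsPerDay;             // ≈ 5,787 → round to ~6,000
const readTps = writeTps * readWriteRatio;                 // ≈ 578,700 → ~600,000
const dailyStorageGB = tweetsPerDay * bytesPerTweet / 1e9; // 512 GB
const yearlyStorageTB = dailyStorageGB * 365 / 1000;       // ≈ 187 TB

console.log(Math.round(writeTps), Math.round(dailyStorageGB), Math.round(yearlyStorageTB));
// 5787 512 187
```

The goal of these estimates is order-of-magnitude accuracy, not precision; round aggressively and move on.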

Step 3: High-Level Design

Draw the major components and their interactions.

Step 4: Deep Dive

Focus on critical components and potential bottlenecks.

Step 5: Address Bottlenecks

  • Single points of failure
  • Data consistency issues
  • Scaling limitations

Related Guides