Databases: Complete Guide to Data Persistence

Databases are the backbone of modern applications. They store, organize, and retrieve data efficiently, making it possible to build everything from simple blogs to complex enterprise systems.

Why Learn Databases?

Nearly every application needs to store data persistently. Understanding databases helps you:

  • Design efficient data models: Structure data for your application's needs
  • Write performant queries: Retrieve data quickly, even at scale
  • Choose the right database: Pick the best tool for your use case
  • Avoid common pitfalls: Prevent data loss, inconsistency, and security issues

SQL vs NoSQL

Feature SQL (Relational) NoSQL
Data Structure Tables with rows and columns Documents, key-value, graphs, etc.
Schema Fixed, predefined schema Flexible, dynamic schema
Relationships JOINs between tables Embedded or referenced
Scaling Vertical (bigger server) Horizontal (more servers)
ACID Compliance Strong guarantees Varies by database
Best For Complex queries, transactions Flexible data, high scale

SQL Fundamentals

SQL (Structured Query Language) is the standard language for relational databases.

Creating Tables

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    title VARCHAR(200) NOT NULL,
    content TEXT,
    published BOOLEAN DEFAULT false
);

CRUD Operations

-- CREATE: Insert data
INSERT INTO users (name, email)
VALUES ('Alice', 'alice@example.com');

-- READ: Select data
SELECT * FROM users;
SELECT name, email FROM users WHERE id = 1;
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;

-- UPDATE: Modify data
UPDATE users
SET name = 'Alicia'
WHERE id = 1;

-- DELETE: Remove data
DELETE FROM users WHERE id = 1;

JOINs - Combining Tables

-- INNER JOIN: Only matching rows
SELECT users.name, posts.title
FROM users
INNER JOIN posts ON users.id = posts.user_id;

-- LEFT JOIN: All users, matching posts
SELECT users.name, COUNT(posts.id) as post_count
FROM users
LEFT JOIN posts ON users.id = posts.user_id
GROUP BY users.id;

-- Multiple conditions
SELECT u.name, p.title, p.published
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE p.published = true
ORDER BY p.created_at DESC;

Aggregations

-- Count, Sum, Average
SELECT COUNT(*) FROM users;
SELECT AVG(price) FROM products;
SELECT SUM(quantity) FROM orders;

-- Group By with Having
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id
HAVING COUNT(*) > 5;

Database Design

Normalization

Normalization reduces data redundancy and improves integrity:

  • 1NF: Each column contains atomic values, no repeating groups
  • 2NF: All non-key columns depend on the entire primary key
  • 3NF: No transitive dependencies (non-key depends only on key)

Example: E-commerce Schema

-- Users table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL
);

-- Products table
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    stock INTEGER DEFAULT 0
);

-- Orders table
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total DECIMAL(10, 2) NOT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Order items (many-to-many relationship)
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

NoSQL Databases

Document Databases (MongoDB)

// MongoDB document structure
{
  "_id": ObjectId("..."),
  "name": "Alice",
  "email": "alice@example.com",
  "address": {
    "street": "123 Main St",
    "city": "New York"
  },
  "orders": [
    { "product": "Widget", "quantity": 2 },
    { "product": "Gadget", "quantity": 1 }
  ]
}

// MongoDB queries
db.users.find({ "address.city": "New York" })
db.users.find({ orders: { $elemMatch: { quantity: { $gt: 1 } } } })
db.users.updateOne({ _id: id }, { $push: { orders: newOrder } })

Key-Value Stores (Redis)

# Simple key-value
SET user:1:name "Alice"
GET user:1:name

# With expiration (for caching)
SET session:abc123 "user_data" EX 3600

# Lists
LPUSH notifications:user:1 "New message"
LRANGE notifications:user:1 0 10

# Hashes
HSET user:1 name "Alice" email "alice@example.com"
HGETALL user:1

When to Use NoSQL

  • Document DB: Flexible schemas, nested data, rapid iteration
  • Key-Value: Caching, sessions, real-time leaderboards
  • Graph DB: Social networks, recommendation engines
  • Time-Series: IoT data, metrics, logs

ORMs & Query Builders

ORMs (Object-Relational Mappers) let you work with databases using your programming language's objects.

Prisma (Node.js)

// schema.prisma
model User {
  id    Int     @id @default(autoincrement())
  email String  @unique
  name  String
  posts Post[]
}

model Post {
  id       Int    @id @default(autoincrement())
  title    String
  author   User   @relation(fields: [authorId], references: [id])
  authorId Int
}

// Usage
const user = await prisma.user.create({
  data: { name: 'Alice', email: 'alice@example.com' }
});

const usersWithPosts = await prisma.user.findMany({
  include: { posts: true }
});

SQLAlchemy (Python)

from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    posts = relationship('Post', back_populates='author')

class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    author_id = Column(Integer, ForeignKey('users.id'))
    author = relationship('User', back_populates='posts')

# Query
users = session.query(User).filter(User.name.like('%Ali%')).all()

Performance & Indexing

Indexes

Indexes speed up queries but slow down writes. Create them for frequently queried columns.

-- Create index for faster lookups
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_posts_user_id ON posts(user_id);

-- Composite index for queries on multiple columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- View query execution plan
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

Performance Tips

  • Index foreign keys: Speed up JOINs significantly
  • Avoid SELECT *: Only fetch columns you need
  • Use LIMIT: Don't fetch more rows than necessary
  • Batch operations: Insert/update many rows at once
  • Connection pooling: Reuse database connections

Choosing a Database

Use Case Recommended Database
General web app PostgreSQL
Rapid prototyping SQLite, MongoDB
High-scale web app PostgreSQL, MongoDB
Caching layer Redis
Social network Neo4j (graph)
Time-series/metrics InfluxDB, TimescaleDB
Full-text search Elasticsearch

Related Guides