Databases: Complete Guide to Data Persistence

Databases are the backbone of modern applications. They store, organize, and retrieve data efficiently, making it possible to build everything from simple blogs to complex enterprise systems.

Why Learn Databases?

Nearly every application needs to store data persistently. Understanding databases helps you:

Design efficient data models: Structure data for your application's needs
Write performant queries: Retrieve data quickly, even at scale
Choose the right database: Pick the best tool for your use case
Avoid common pitfalls: Prevent data loss, inconsistency, and security issues

SQL vs NoSQL

Feature	SQL (Relational)	NoSQL
Data Structure	Tables with rows and columns	Documents, key-value, graphs, etc.
Schema	Fixed, predefined schema	Flexible, dynamic schema
Relationships	JOINs between tables	Embedded or referenced
Scaling	Vertical (bigger server)	Horizontal (more servers)
ACID Compliance	Strong guarantees	Varies by database
Best For	Complex queries, transactions	Flexible data, high scale

SQL Fundamentals

SQL (Structured Query Language) is the standard language for relational databases.

Creating Tables

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    title VARCHAR(200) NOT NULL,
    content TEXT,
    published BOOLEAN DEFAULT false
);

CRUD Operations

-- CREATE: Insert data
INSERT INTO users (name, email)
VALUES ('Alice', 'alice@example.com');

-- READ: Select data
SELECT * FROM users;
SELECT name, email FROM users WHERE id = 1;
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;

-- UPDATE: Modify data
UPDATE users
SET name = 'Alicia'
WHERE id = 1;

-- DELETE: Remove data
DELETE FROM users WHERE id = 1;

JOINs - Combining Tables

-- INNER JOIN: Only matching rows
SELECT users.name, posts.title
FROM users
INNER JOIN posts ON users.id = posts.user_id;

-- LEFT JOIN: All users, matching posts
SELECT users.name, COUNT(posts.id) as post_count
FROM users
LEFT JOIN posts ON users.id = posts.user_id
GROUP BY users.id;

-- Multiple conditions
SELECT u.name, p.title, p.published
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE p.published = true
ORDER BY p.created_at DESC;

Aggregations

-- Count, Sum, Average
SELECT COUNT(*) FROM users;
SELECT AVG(price) FROM products;
SELECT SUM(quantity) FROM orders;

-- Group By with Having
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id
HAVING COUNT(*) > 5;

Database Design

Normalization

Normalization reduces data redundancy and improves integrity:

1NF: Each column contains atomic values, no repeating groups
2NF: All non-key columns depend on the entire primary key
3NF: No transitive dependencies (non-key depends only on key)

Example: E-commerce Schema

-- Users table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL
);

-- Products table
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    stock INTEGER DEFAULT 0
);

-- Orders table
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total DECIMAL(10, 2) NOT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Order items (many-to-many relationship)
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

NoSQL Databases

Document Databases (MongoDB)

// MongoDB document structure
{
  "_id": ObjectId("..."),
  "name": "Alice",
  "email": "alice@example.com",
  "address": {
    "street": "123 Main St",
    "city": "New York"
  },
  "orders": [
    { "product": "Widget", "quantity": 2 },
    { "product": "Gadget", "quantity": 1 }
  ]
}

// MongoDB queries
db.users.find({ "address.city": "New York" })
db.users.find({ orders: { $elemMatch: { quantity: { $gt: 1 } } } })
db.users.updateOne({ _id: id }, { $push: { orders: newOrder } })

Key-Value Stores (Redis)

# Simple key-value
SET user:1:name "Alice"
GET user:1:name

# With expiration (for caching)
SET session:abc123 "user_data" EX 3600

# Lists
LPUSH notifications:user:1 "New message"
LRANGE notifications:user:1 0 10

# Hashes
HSET user:1 name "Alice" email "alice@example.com"
HGETALL user:1

When to Use NoSQL

Document DB: Flexible schemas, nested data, rapid iteration
Key-Value: Caching, sessions, real-time leaderboards
Graph DB: Social networks, recommendation engines
Time-Series: IoT data, metrics, logs

ORMs & Query Builders

ORMs (Object-Relational Mappers) let you work with databases using your programming language's objects.

Prisma (Node.js)

// schema.prisma
model User {
  id    Int     @id @default(autoincrement())
  email String  @unique
  name  String
  posts Post[]
}

model Post {
  id       Int    @id @default(autoincrement())
  title    String
  author   User   @relation(fields: [authorId], references: [id])
  authorId Int
}

// Usage
const user = await prisma.user.create({
  data: { name: 'Alice', email: 'alice@example.com' }
});

const usersWithPosts = await prisma.user.findMany({
  include: { posts: true }
});

SQLAlchemy (Python)

from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    posts = relationship('Post', back_populates='author')

class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    author_id = Column(Integer, ForeignKey('users.id'))
    author = relationship('User', back_populates='posts')

# Query
users = session.query(User).filter(User.name.like('%Ali%')).all()

Performance & Indexing

Indexes

Indexes speed up queries but slow down writes. Create them for frequently queried columns.

-- Create index for faster lookups
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_posts_user_id ON posts(user_id);

-- Composite index for queries on multiple columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- View query execution plan
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

Performance Tips

Index foreign keys: Speed up JOINs significantly
Avoid SELECT *: Only fetch columns you need
Use LIMIT: Don't fetch more rows than necessary
Batch operations: Insert/update many rows at once
Connection pooling: Reuse database connections

Choosing a Database

Use Case	Recommended Database
General web app	PostgreSQL
Rapid prototyping	SQLite, MongoDB
High-scale web app	PostgreSQL, MongoDB
Caching layer	Redis
Social network	Neo4j (graph)
Time-series/metrics	InfluxDB, TimescaleDB
Full-text search	Elasticsearch