Mastering Databases and SQL

A comprehensive course designed to take you from a complete beginner to an advanced user in database management and SQL. Learn the core principles, write powerful queries, and manage databases like a professional.

Module 1: Database Fundamentals

Lesson 1: Introduction to Databases and RDBMS

A database is an organized collection of structured information, or data, typically stored electronically in a computer system. It is a system for storing and retrieving data in an efficient manner. At its core, a database is a way to manage information so that it is easily accessible, managed, and updated.

What is an RDBMS?

RDBMS stands for Relational Database Management System. It's a type of database that stores data in tables, which are made up of rows and columns. The "relational" part comes from the fact that data within these tables can be linked, or "related," to one another using common fields. This relational model is the most common model for databases today.

Why use an RDBMS?

  • Data Integrity: It ensures data is consistent and accurate.
  • Scalability: It can handle massive amounts of data and many concurrent users.
  • Structure: It provides a clear, logical way to organize information.
  • ACID Properties: Transactions are Atomic, Consistent, Isolated, and Durable, ensuring reliability.

Example: A Simple User Database

Imagine a database for a social media site. We would have tables for users and posts. The users table might have columns like user_id, username, and email. The posts table would have a post_id, post_content, and a user_id. The user_id in the posts table is a foreign key that relates a post back to a specific user in the users table.

Module 1: Database Fundamentals

Lesson 2: MySQL Installation and Setup

MySQL is one of the most popular open-source RDBMS in the world. It is a robust and widely-used system, making it an excellent choice for a beginner. In this lesson, we'll walk through the process of installing MySQL on your local machine.

Installation Steps (General)

While the specific steps vary slightly depending on your operating system (Windows, macOS, Linux), the general process is as follows:

  1. Download the Installer: Go to the official MySQL website and download the appropriate Community Server installer for your OS.
  2. Run the Installer: Launch the installer and follow the on-screen prompts. Choose the "Developer Default" option to get all the essential tools, including the server, client, and MySQL Workbench (a graphical interface).
  3. Configuration: During the installation, you will be asked to configure the server. You'll need to set a root password. Remember this password, as it's the key to administering your server.
  4. Install Workbench: MySQL Workbench is a powerful tool that allows you to manage your databases with a visual interface. It is highly recommended for beginners.

Try It Yourself: Install MySQL

Follow the steps above to download and install MySQL Community Server and MySQL Workbench. After installation, launch MySQL Workbench and try to connect to your local MySQL instance using the root password you just created. If you can connect successfully, you're all set!

Module 1: Database Fundamentals

Lesson 3: Database Design Principles

A well-designed database is the foundation of a successful application. Proper database design helps in reducing data redundancy, improving data integrity, and ensuring that your database can be easily scaled and maintained. This is arguably the most important step in building any database-driven system.

Key Principles:

  • Identify Entities: Determine the main objects you need to store data about (e.g., Customers, Orders, Products).
  • Define Attributes: For each entity, decide on the data points you need to store (e.g., for Customers, you need first_name, last_name, email).
  • Establish Relationships: Figure out how the entities are connected. This could be a one-to-one, one-to-many, or many-to-many relationship.
  • Assign Primary Keys: Every table should have a column (or set of columns) that uniquely identifies each row. This is the Primary Key.
  • Assign Foreign Keys: A Foreign Key is a column in one table that references the primary key of another table, creating the link between them.

Example: E-commerce Database Design

Let's design a simple e-commerce database. We have two main entities: Customers and Orders.

  • Customers Table: customer_id (Primary Key), first_name, last_name, email.
  • Orders Table: order_id (Primary Key), order_date, order_total, and customer_id (Foreign Key).

The customer_id in the Orders table links each order to the customer who placed it. This is a classic one-to-many relationship (one customer can have many orders).

Module 1: Database Fundamentals

Lesson 4: Entity Relationship Diagrams

An Entity Relationship Diagram (ERD) is a visual representation of your database design. It uses a specific set of symbols to show the entities (tables) in a database and the relationships between them. ERDs are invaluable for planning a database before you write any code and for communicating the structure to other developers.

Key Components of an ERD:

  • Entities: Represented by rectangles. They correspond to tables.
  • Attributes: Represented by ovals. These are the columns of the table. The primary key is often underlined.
  • Relationships: Represented by diamonds or lines with specific symbols that indicate the type of relationship (e.g., one-to-many, many-to-many).

Try It Yourself: Create an ERD

Using the E-commerce example from the previous lesson (Customers and Orders), try to sketch a simple ERD. Use a rectangle for each table and connect them with a line. On the line, you would use "crow's foot" notation to indicate the one-to-many relationship, with the crow's foot symbol on the "many" side (the Orders table).

Module 1: Database Fundamentals

Lesson 5: Normalization and Data Integrity

Normalization is a process used in database design to organize tables in a way that reduces data redundancy and improves data integrity. It involves a series of rules called "normal forms" (1NF, 2NF, 3NF, etc.). The goal is to ensure that data is stored in a logical and non-redundant manner.

The Normal Forms (Simplified):

  • 1NF (First Normal Form): Each cell contains a single, atomic value, and each column has a unique name.
  • 2NF (Second Normal Form): The table must be in 1NF, and all non-key attributes are fully dependent on the primary key.
  • 3NF (Third Normal Form): The table must be in 2NF, and all columns that are not part of the primary key are independent of each other (i.e., no transitive dependencies).

Data Integrity:

Data Integrity refers to the accuracy and consistency of data throughout its lifecycle. Normalization is a key tool for achieving it. Other tools include constraints, which we'll cover in a later lesson.

Example of Normalization:

Imagine a table with order_id, customer_id, customer_name, and customer_email. This is not in 3NF because customer_name and customer_email are dependent on customer_id, not the primary key order_id. To normalize, we would split this into two tables: one for Orders and one for Customers, linking them with a foreign key.

Module 2: SQL Basics

Lesson 1: Creating Databases and Tables

SQL (Structured Query Language) is the standard language for managing and manipulating relational databases. We'll start with the fundamental commands for creating the very structure of our database.

Creating a Database:

The CREATE DATABASE statement is used to create a new database. It's a simple, straightforward command.


CREATE DATABASE school_db;
                        

Creating a Table:

To create a new table, you use the CREATE TABLE statement. You need to specify the table name and then define each column with its data type and any constraints.


USE school_db;

CREATE TABLE students (
    student_id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100)
);
                        

Module 2: SQL Basics

Lesson 2: Data Types and Constraints

Choosing the right data type for each column is crucial for efficient data storage and performance. Constraints are rules that enforce data integrity and prevent invalid data from being inserted into the database.

Common Data Types:

  • INT: Stores whole numbers.
  • VARCHAR(size): Stores variable-length strings of characters, up to a specified size.
  • DATE: Stores dates in YYYY-MM-DD format.
  • DECIMAL(P,S): Stores fixed-point numbers, useful for money. P is the total number of digits, and S is the number of digits after the decimal point.

Common Constraints:

  • PRIMARY KEY: Uniquely identifies each row in a table.
  • NOT NULL: Ensures a column cannot have a NULL value.
  • UNIQUE: Ensures all values in a column are different.
  • FOREIGN KEY: Links data between tables.

Try It Yourself: Add Constraints

Modify the students table from the previous lesson to include a UNIQUE constraint on the email column, and a NOT NULL constraint on the first_name and last_name columns. Think about why these constraints are important for data integrity.

Module 2: SQL Basics

Lesson 3: INSERT, UPDATE, DELETE Operations

These three commands are the core of Data Manipulation Language (DML). They allow you to add, change, and remove data from your tables.

INSERT (Adding Data):

The INSERT INTO statement is used to add new rows to a table. You can specify the columns to insert into or provide values for all columns in the correct order.


INSERT INTO students (first_name, last_name, email)
VALUES ('John', 'Doe', 'john.doe@example.com');
                        

UPDATE (Changing Data):

The UPDATE statement is used to modify existing records in a table. It's crucial to use a WHERE clause to specify which rows to update; otherwise, you will update all rows in the table.


UPDATE students
SET email = 'new.email@example.com'
WHERE student_id = 1;
                        

DELETE (Removing Data):

The DELETE FROM statement is used to remove one or more rows from a table. Just like with UPDATE, always use a WHERE clause to avoid deleting all data.


DELETE FROM students
WHERE student_id = 1;
                        

Module 2: SQL Basics

Lesson 4: SELECT Queries and Filtering

The SELECT statement is the most used SQL command. It allows you to retrieve data from one or more tables. The WHERE clause is a powerful tool for filtering the data you retrieve.

Basic SELECT:

The asterisk (*) is a wildcard that selects all columns from a table.


SELECT * FROM students;
                        

Selecting Specific Columns:

You can specify which columns you want to retrieve by listing their names.


SELECT first_name, last_name FROM students;
                        

Filtering with WHERE:

The WHERE clause is used to extract only those records that fulfill a specified condition.


SELECT * FROM students
WHERE last_name = 'Doe';
                        

Module 2: SQL Basics

Lesson 5: Sorting and Limiting Results

Once you've selected your data, you often need to organize it in a specific order and retrieve only a certain number of records. The ORDER BY and LIMIT clauses help you do just that.

Sorting with ORDER BY:

The ORDER BY clause is used to sort the result set in ascending (ASC) or descending (DESC) order.


SELECT * FROM students
ORDER BY last_name ASC;
                        

Limiting Results with LIMIT:

The LIMIT clause is used to specify the number of records to return. This is especially useful for pagination or retrieving the top N results.


SELECT * FROM students
ORDER BY last_name ASC
LIMIT 10;
                        

Module 3: Advanced Queries

Lesson 1: Joins (INNER, LEFT, RIGHT, FULL)

Joins are used to combine rows from two or more tables based on a related column between them. This is the cornerstone of working with relational databases.

INNER JOIN:

Returns only the rows that have matching values in both tables.


SELECT
    students.first_name,
    courses.course_name
FROM
    students
INNER JOIN
    courses ON students.student_id = courses.student_id;
                        

LEFT JOIN:

Returns all rows from the left table and the matching rows from the right table. If there is no match, the right side will have NULL values.


SELECT
    students.first_name,
    courses.course_name
FROM
    students
LEFT JOIN
    courses ON students.student_id = courses.student_id;
                        

Module 3: Advanced Queries

Lesson 2: Subqueries and Nested Queries

A subquery (or nested query) is a query within another SQL query. It is a powerful way to perform complex data retrieval, where the inner query's result is used by the outer query. Subqueries can be used in WHERE, FROM, and SELECT clauses.

Using a Subquery in the WHERE clause:

This is the most common use case. The inner query returns a single value or a list of values that the outer query then uses to filter its results.


SELECT
    first_name,
    last_name
FROM
    employees
WHERE
    department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');
                        

Module 3: Advanced Queries

Lesson 3: Aggregate Functions and GROUP BY

Aggregate functions perform a calculation on a set of values and return a single value. The GROUP BY clause is often used with these functions to group rows that have the same values into summary rows.

Common Aggregate Functions:

  • COUNT(): Returns the number of rows.
  • SUM(): Calculates the sum of a numeric column.
  • AVG(): Calculates the average value of a numeric column.
  • MAX(): Returns the highest value in a column.
  • MIN(): Returns the lowest value in a column.

Using GROUP BY:


SELECT
    department_id,
    AVG(salary) AS average_salary
FROM
    employees
GROUP BY
    department_id;
                        

This query would return the average salary for each department, thanks to the GROUP BY clause.

Module 3: Advanced Queries

Lesson 4: HAVING Clause and Window Functions

The HAVING clause is used to filter the results of a GROUP BY query. It is similar to the WHERE clause but operates on the aggregated data. Window functions perform calculations across a set of table rows that are somehow related to the current row.

Using HAVING:

This example finds departments where the average salary is greater than $60,000.


SELECT
    department_id,
    AVG(salary) AS average_salary
FROM
    employees
GROUP BY
    department_id
HAVING
    AVG(salary) > 60000;
                        

Window Functions (e.g., RANK()):

Window functions don't group rows; instead, they "look at" a set of rows and return a value for each row. A common example is ranking.


SELECT
    first_name,
    salary,
    RANK() OVER (ORDER BY salary DESC) as rank
FROM
    employees;
                        

Module 3: Advanced Queries

Lesson 5: Common Table Expressions (CTEs)

A Common Table Expression (CTE) is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs improve the readability and reusability of complex queries.

How to Use a CTE:

CTEs start with the WITH keyword, followed by the CTE name and the query that defines it. You can then use the CTE name in your main query as if it were a regular table.


WITH
    HighSalaries AS (
        SELECT
            employee_id,
            first_name,
            salary
        FROM
            employees
        WHERE
            salary > 70000
    )
SELECT
    *
FROM
    HighSalaries;
                        

Module 4: Database Administration

Lesson 1: User Management and Permissions

In a production environment, you should never use the root user for daily operations. User management involves creating different user accounts with specific permissions, ensuring that each user has only the privileges they need to perform their job.

Creating a User:

The CREATE USER statement is used to create a new user and assign a password.


CREATE USER 'dev_user'@'localhost' IDENTIFIED BY 'a_strong_password';
                        

Granting Permissions:

The GRANT statement is used to give specific privileges to a user on a database or a table.


GRANT SELECT, INSERT, UPDATE, DELETE ON school_db.* TO 'dev_user'@'localhost';
FLUSH PRIVILEGES;
                        

Module 4: Database Administration

Lesson 2: Indexes and Query Optimization

An index is a special lookup table used by the database search engine to speed up data retrieval. Think of it like the index in a book; instead of reading the entire book, you can quickly find the page number for a specific topic. An index is created on columns that are frequently used in WHERE clauses, JOIN conditions, and ORDER BY clauses.

Creating an Index:


CREATE INDEX idx_last_name ON students (last_name);
                        

The EXPLAIN command:

The EXPLAIN statement is a crucial tool for query optimization. It shows how MySQL executes a query, allowing you to see if it's using an index and where potential bottlenecks are.


EXPLAIN SELECT * FROM students WHERE last_name = 'Doe';
                        

Module 4: Database Administration

Lesson 3: Backup and Recovery Strategies

Regular backups are essential for any production database. They provide a safety net in case of hardware failure, data corruption, or accidental deletion. You should have a clear strategy for both backing up your data and restoring it in an emergency.

Backup Methods:

  • Logical Backups: Use tools like mysqldump to create SQL files that contain the CREATE TABLE and INSERT statements. These are portable and easy to restore.
  • Physical Backups: Involve making a copy of the actual database files on the file system. These are faster but less flexible.

Restoring from Backup:

To restore a logical backup, you simply execute the SQL file on your MySQL server.


mysql -u root -p school_db < school_db_backup.sql
                        

Module 4: Database Administration

Lesson 4: Stored Procedures and Functions

Stored Procedures and Functions are pre-compiled SQL code that is stored on the database server. They can be executed at any time, which can improve performance and reduce network traffic. They are also useful for encapsulating complex business logic.

Creating a Stored Procedure:

A stored procedure is a set of SQL statements with a name that is stored in the database. It can take parameters and can perform various operations.


DELIMITER //
CREATE PROCEDURE GetStudent(IN studentId INT)
BEGIN
    SELECT * FROM students WHERE student_id = studentId;
END //
DELIMITER ;

CALL GetStudent(1);
                        

Module 4: Database Administration

Lesson 5: Triggers and Events

A Trigger is a special type of stored procedure that is automatically executed in response to certain events on a particular table, such as an INSERT, UPDATE, or DELETE. They are often used to enforce business rules or maintain audit trails.

Creating a Trigger:

This example creates a trigger that fires before a new student is inserted into the students table. It would typically be used to perform some validation.


DELIMITER //
CREATE TRIGGER before_student_insert
BEFORE INSERT ON students
FOR EACH ROW
BEGIN
    -- You could add logic here to validate new.email
END //
DELIMITER ;
                        

Module 5: Performance & Advanced Topics

Lesson 1: Query Performance Tuning

Slow queries can significantly impact the performance and user experience of your application. Query tuning is the process of optimizing your SQL queries to execute faster and more efficiently. This often involves using the right indexes, rewriting queries, and avoiding common pitfalls.

Tips for Faster Queries:

  • Use Indexes: As discussed, indexes are your best friend. Make sure you have indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
  • Avoid `SELECT *`: Only select the columns you actually need. This reduces the amount of data transferred and processed.
  • Use Joins Efficiently: Prefer INNER JOIN when possible, as it's often faster. Make sure your join columns are indexed.
  • Use `EXPLAIN`: Always use EXPLAIN to analyze your queries and understand their execution plan.

Module 5: Performance & Advanced Topics

Lesson 2: Database Monitoring and Maintenance

Database monitoring is the process of tracking the performance and health of your database system. Regular maintenance tasks, such as optimizing tables and checking for errors, are crucial for a healthy and performant database.

Monitoring Key Metrics:

  • CPU Usage: High CPU usage can indicate slow queries or heavy load.
  • Memory Usage: Ensure your database has enough memory to cache data and execute queries efficiently.
  • Disk I/O: High I/O can be a sign of inefficient queries or a need for faster storage.
  • Slow Query Log: MySQL has a log file that records all queries that take longer than a specified time to execute. This is an excellent way to identify performance issues.

Try It Yourself: Check for Slow Queries

In MySQL, you can enable the slow query log by editing your configuration file (`my.cnf` or `my.ini`). Add the following lines and restart the MySQL service:


slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 2
                            

Module 5: Performance & Advanced Topics

Lesson 3: Replication and Clustering

For applications that require high availability, scalability, or disaster recovery, Replication and Clustering are essential. Replication involves copying data from one database server (the master) to others (the slaves), while clustering links multiple servers to work together as a single system.

Replication:

A typical replication setup involves one master database that handles all write operations, and one or more slave databases that handle read operations. This is a common strategy to offload read traffic from the master server.

Clustering:

Database clustering is a more advanced technique where multiple database servers are configured to work together as a single entity. This provides high availability and fault tolerance, as if one server fails, another can immediately take its place.

Module 5: Performance & Advanced Topics

Lesson 4: Security Best Practices

Database security is paramount to protect sensitive data from unauthorized access, modification, or destruction. A single security vulnerability can lead to a catastrophic data breach.

Key Security Measures:

  • Strong Passwords: Use complex, unique passwords for all database users.
  • Principle of Least Privilege: Grant users only the minimum permissions they need to perform their job.
  • Regular Backups: Ensure you have a solid backup and recovery plan.
  • SQL Injection Prevention: Use prepared statements or parameterized queries in your application code to prevent SQL injection attacks.
  • Encryption: Encrypt sensitive data both in transit and at rest.

Module 5: Performance & Advanced Topics

Lesson 5: Integration with Programming Languages

The final step in your journey is to connect your database to an application. Most modern programming languages have libraries or connectors that allow you to interact with a database.

Common Integration Methods:

  • PHP: Uses extensions like MySQLi or PDO to connect and execute queries.
  • Python: Libraries like mysql-connector-python allow you to connect and work with data.
  • Node.js: The mysql2 package is a popular choice for connecting from a Node.js application.

Example: PHP and MySQL

Here is a basic example of how you would connect to a MySQL database using PHP with the MySQLi extension.


<?php
$servername = "localhost";
$username = "dev_user";
$password = "a_strong_password";
$dbname = "school_db";

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);

// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
echo "Connected successfully";

// Close connection
$conn->close();
?>