A basic overview of database indexing - what is a database index and what does it do?
August 11, 2023
Want to grasp a complex concept in the time it takes to brew your morning coffee? In our XataByte: 3-minute byte-sized read series we offer concise explanations to help you unravel intricate database concepts without the time commitment. Today we are looking at database indexing.
At its core, a database index is pretty much like an index in a book - it allows you to quickly find information without having to read through all of the content. In a database, indexing involves creating a structured roadmap that speeds up the retrieval of specific information from a huge collection of data.
Imagine you’ve created a movie database app. Users want to explore films based on filters: genre, release year, lead actors, and more. But as your user base grows, running a search for these filters becomes as slow as watching a buffering video.
Here's where indexes step in, like movie genres neatly organized on your shelf. Let's say you decide to index the "release year" column. Behind the scenes, your database engine creates a digital list with two columns: a release year value and a link to the corresponding movie record.
When a user hunts for movies from a specific year, your query instantly knows where to find the desired films. Thanks to this index magic, the query's efficiency is supercharged, taking mere seconds instead of super long wait times.
Without indexing, databases might need to scan through every row of data to find the desired information. By creating a roadmap that guides database systems to relevant data points, indexing can reduce the time required for queries and can help ensure that query tasks are performed quickly, without unnecessary strain on system resources.
When dealing with database indexing, there are a few points to consider:
Database indexing is applied using SQL statements that define which columns to index and how. Different types of indexes, such as B-tree, hash, and full-text indexes, cater to different data structures and use cases.
Types of database indexes include:
The name "B-Tree" comes from the structure it forms - a balanced tree. This index type is good at maintaining balance between efficient range queries (like selecting values between A and Z) and quick equality searches (finding a specific value).
The term "Hash" refers to the method this index uses. It uses a hash function to create a unique identifier (hash) for each data value. The identifier acts as a pointer to the data, making searches for exact matches super fast.
"Bitmap" uses a series of 0s and 1s, similar to a bitmap image. Each bit represents the presence or absence of a data value in a particular category. This is pretty efficient for low-cardinality columns.
"Full-Text" indicates the scope of this index type. It's tailored for analyzing large text fields. Instead of just keywords, it considers the entire text, which is great for thorough keyword searches.
"Spatial" refers to the index's focus on spatial, or geographic, data. This index type arranges geographical coordinates (latitude, longitude) to allow for quick spatial queries.
Database indexes are typically created on columns in database tables. These columns are usually used in search conditions or as part of the sorting and filtering operations in queries. When a query is executed, the database engine first checks the index associated with the relevant column(s). This index provides information about the data distribution and the location of data values in the table. By using this information, the database engine can navigate directly to the relevant rows, avoiding the need to scan the entire table.
Let's consider a scenario where we have a
products table with millions of records. We want to find products with a specific category:
// Without an index (slower) SELECT * FROM products WHERE category = 'Tees';
// With an index (faster) CREATE INDEX idx_category ON products(category); SELECT * FROM products WHERE category = 'Tees';
In the first query, without an index, the database would need to scan the entire
products table to find matching rows. In the second query, after creating an index on the
category column, the database engine can jump to the rows related to 'Tees'.
Indexes are not limited to single-column scenarios. They can work in join operations too. Consider a scenario where you're joining the
// Without indexes (slower) SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id;
// With indexes (faster) CREATE INDEX idx_customer_id ON orders(customer_id); CREATE INDEX idx_id ON customers(id); SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id;
Creating indexes on the columns involved in the join improves the efficiency of the join operation; the query runs smoothly even as the dataset grows.
Aggregation functions like SUM, AVG, and COUNT can benefit from indexes too. Let's say you're calculating the total sales amount from the
// Without index (slower) SELECT SUM(amount) FROM sales WHERE date BETWEEN '2023-01-01' AND '2023-08-30';
// With index (faster) CREATE INDEX idx_date ON sales(date); SELECT SUM(amount) FROM sales WHERE date BETWEEN '2023-01-01' AND '2023-08-30';
By indexing the
date column, the query's execution time is reduced.
So there you have it – database indexing demystified (a little). While we've only scratched the surface of this topic, you now hold the key to understanding how indexes work their magic in optimizing data retrieval.
Join our community of subscribers to stay up to date with the latest news, tips and thought leadership, delivered directly to your inbox.