XataByte: Basics of database indexing

A basic overview of database indexing - what is a database index and what does it do?

Written by

J Edwards

Published on

August 11, 2023

Want to grasp a complex concept in the time it takes to brew your morning coffee? In our XataByte: 3-minute byte-sized read series we offer concise explanations to help you unravel intricate database concepts without the time commitment. Today we are looking at database indexing.

What is database indexing?

At its core, a database index is pretty much like an index in a book - it allows you to quickly find information without having to read through all of the content. In a database, indexing involves creating a structured roadmap that speeds up the retrieval of specific information from a huge collection of data.

Imagine you’ve created a movie database app. Users want to explore films based on filters: genre, release year, lead actors, and more. But as your user base grows, running a search for these filters becomes as slow as watching a buffering video.

Here's where indexes step in, like movie genres neatly organized on your shelf. Let's say you decide to index the "release year" column. Behind the scenes, your database engine creates a digital list with two columns: a release year value and a link to the corresponding movie record.

When a user hunts for movies from a specific year, your query instantly knows where to find the desired films. Thanks to this index magic, the query's efficiency is supercharged, taking mere seconds instead of super long wait times.

Without indexing, databases might need to scan through every row of data to find the desired information. By creating a roadmap that guides database systems to relevant data points, indexing can reduce the time required for queries and can help ensure that query tasks are performed quickly, without unnecessary strain on system resources.

Key considerations for database indexing

When dealing with database indexing, there are a few points to consider:

Selectivity: Not all columns are created equal for indexing. Columns with high selectivity, meaning they have a wide range of distinct values, are better candidates for indexing because they can quickly narrow down search results.
Size vs performance: While indexing speeds up queries, it does come with some overhead in terms of storage and update times. Striking the right balance between indexing and overall system performance is important.
Update impact: As data changes, indexes must be updated as well. This update process can impact database performance, so choosing when and how to update indexes is a strategic decision.

Applying database indexing

Database indexing is applied using SQL statements that define which columns to index and how. Different types of indexes, such as B-tree, hash, and full-text indexes, cater to different data structures and use cases.

Types of database indexes include:

B-Tree index

The name "B-Tree" comes from the structure it forms - a balanced tree. This index type is good at maintaining balance between efficient range queries (like selecting values between A and Z) and quick equality searches (finding a specific value).

Hash index

The term "Hash" refers to the method this index uses. It uses a hash function to create a unique identifier (hash) for each data value. The identifier acts as a pointer to the data, making searches for exact matches super fast.

Bitmap index

"Bitmap" uses a series of 0s and 1s, similar to a bitmap image. Each bit represents the presence or absence of a data value in a particular category. This is pretty efficient for low-cardinality columns.

Full-text index

"Full-Text" indicates the scope of this index type. It's tailored for analyzing large text fields. Instead of just keywords, it considers the entire text, which is great for thorough keyword searches.

Spatial index

"Spatial" refers to the index's focus on spatial, or geographic, data. This index type arranges geographical coordinates (latitude, longitude) to allow for quick spatial queries.

How to apply database indexes

Database indexes are typically created on columns in database tables. These columns are usually used in search conditions or as part of the sorting and filtering operations in queries. When a query is executed, the database engine first checks the index associated with the relevant column(s). This index provides information about the data distribution and the location of data values in the table. By using this information, the database engine can navigate directly to the relevant rows, avoiding the need to scan the entire table.

Let's consider a scenario where we have a products table with millions of records. We want to find products with a specific category:

// Without an index (slower)
SELECT * FROM products WHERE category = 'Tees';

// With an index (faster)
CREATE INDEX idx_category ON products(category);
SELECT * FROM products WHERE category = 'Tees';

In the first query, without an index, the database would need to scan the entire products table to find matching rows. In the second query, after creating an index on the category column, the database engine can jump to the rows related to 'Tees'.

Improving join operations

Indexes are not limited to single-column scenarios. They can work in join operations too. Consider a scenario where you're joining the orders and customers tables:

// Without indexes (slower)
SELECT * FROM orders
JOIN customers ON orders.customer_id = customers.id;

// With indexes (faster)
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_id ON customers(id);
SELECT * FROM orders
JOIN customers ON orders.customer_id = customers.id;

Creating indexes on the columns involved in the join improves the efficiency of the join operation; the query runs smoothly even as the dataset grows.

Fine-tuning aggregations

Aggregation functions like SUM, AVG, and COUNT can benefit from indexes too. Let's say you're calculating the total sales amount from the sales table:

// Without index (slower)
SELECT SUM(amount) FROM sales WHERE date BETWEEN '2023-01-01' AND '2023-08-30';

// With index (faster)
CREATE INDEX idx_date ON sales(date);
SELECT SUM(amount) FROM sales WHERE date BETWEEN '2023-01-01' AND '2023-08-30';

By indexing the date column, the query's execution time is reduced.

So there you have it – database indexing demystified (a little). While we've only scratched the surface of this topic, you now hold the key to understanding how indexes work their magic in optimizing data retrieval.

Want to add to the discussion? Reach out to us on Discord or follow us on Twitter. We'd love to hear your thoughts, answer your questions, and keep you updated on the latest at Xata.