Index

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Some databases extend the power of indexing by letting developers create indexes on functions or expressions.

Rails automatically adds an index to the id field of a database table, but in this case we’re asking it to find a row based on another piece of data. If you do that often enough you should almost certainly add an index for that data.

# a migration to index to the users email column

class AddEmailIndexToUsers < ActiveRecord::Migration

  def change

    add_index :users, :email, :unique => true

  end

end

Joining tables

Another classic example of needing to add an index to increase performance is when you are joining database tables on two unindexed strings. For instance: all users joined by email address to a table of invitations.

class User < ActiveRecord::Base

  has_many :order_invites, :foreign_key => ’email’, :primary_key => ’email’

end

If either the Users or Invitations table has a large number of rows this query will not perform well because the database is having to scan every record in the invitations table to check for a match against the email from the users table.

 class AddEmailIndexToInvitations < ActiveRecord::Migration

  def change

    add_index :invitations, :email

  end

end

Index Types

Although Postgres by defaults creates B-Tree index when using CREATE INDEX command, there are a couple of more indexes that will be certainly useful in many use cases.

B-Tree Index

B-Tree is a self-balancing tree data structure which keeps data ordered and easy to search. This index is appropriate for equality and range queries (using operators like >=, < etc.) and will work great with text, timestamp and number fields.

B-Tree indexes are a reasonable default for most of the queries, but not for all of them. The limitation comes from the underlying structure. Discussing the details of the B-Tree data structure itself is beyond the scope of this article; nevertheless, it’s worth keeping in mind that it’s a similar data structure to a binary search tree.

Hash Index

Before Postgres 10, the usage of hash indexes was discouraged since they used to be not WAL-logged. Fortunately, it’s changed in Postgres 10, and we can use them safely without worrying about rebuilding the index if something goes wrong with our database that would cause a crash. The use cases where hash indexes are useful are very limited, as they work only for equality, but they are a bit more efficient for this kind of queries comparing to b-tree indexes. If you store tokens for example and perform lookups by the token value, hash indexes would be a good way to optimize such queries.

BRIN Index (Block Range Index)

BRIN indexes were introduced in Postgres 9.5 which make them a pretty new addition. They tend to work very well for the large sets of ordered data, e.g., statistical data collected with timestamps which are later filtered by the time range. They will perform better than b-tree indexes in such case, although the difference won’t be drastic. However, the different of the size of the index will be huge – BRIN index can be smaller by literally few orders of magnitude comparing to b-tree index.

GIN Index (Generalized Inverted Index)

GIN Indexes are the perfect choice for “composite values” where you perform a query which looks for an element within such “composite”. That is the index you will most likely want to use for jsonb, array or hstore data structures. They are also an excellent choice for full-text search.

GIST Index (Generalized Inverted Search Tree Index)

GiST Indexes will be a good choice when the records overlap values under the same column. They are commonly used for geometry types and full-text search as well. The difference between GIN and GiST Index when it comes to full-text search is that GiST Index will be less taxing on writes comparing to GIN (as it is faster to build). But since it’s a lossy index, there might be some extra overhead involved for reads, which makes GIN index a better choice when you mostly care about reads optimization.