Joins vs Preload vs Includes vs Eager load in Rails

An in-depth discussion with a practical example to load associated data in Rails

7 min readAug 4, 2020

Rails provide a few ways to load associated data and before moving forward let’s consider one scenario as below,
there is a User table that has a one-to-many association with the Post table. With the scenario set, let’s begin with our discussion.

Joins

The Joins works perfectly fine for comparing and filtering data using associated data.

users = User.joins(:posts).where("posts.published = ?", true)

The query will load all users with posts having published attribute value as true and we can do so only because of the magic of joins.

User Load (1.0ms)  SELECT "users".* FROM "users" INNER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE (posts.published = TRUE) LIMIT $1  [["LIMIT", 11]]
 => #<ActiveRecord::Relation [#<User id: 1, name: "Tushar", email: "tushar@mail.com", created_at: "2021-01-16 14:55:30", updated_at: "2021-01-16 14:55:30">]>

Pros

It uses INNER JOIN query to load associated data, as a result, associated data does not get impacted.
ex. If the first user has a total of 12 posts,

=> users = User.joins(:posts).where("posts.published = ?", true)
=> users.first.posts.length #it will ouptput 12

Cons

It does not solve the (n+1) issue all alone, since it does not store associated data for future use. (Note: It can be solved by using joins+preload.
ex. users = User.joins(:posts).where(“posts.published = ?”, true).preload(:posts))
It may give a duplicate record in a few cases as it uses INNER JOIN. The solution over it is using select with it.
ex. Let’s consider that the total number of users is 4,

=> users1 = User.joins(:posts) 
=> users1.count 
# it will ouptput more than 4
# with some duplicate elements 
=> users2 = User.joins(:posts).select('distinct users.*')
=> users2.count 
# it will output only those user which have at list 1 post 
# without duplication

Extra : Joins with select example as below,
user = User.joins(:posts).select(“users.id, posts.name as post_name, posts.published as published”).where(id: 1)
It will give output of ActiveRecord array with post name, user id and post published for user with id as 1, so we can have all associated post name as user.map(&:post_name)

Preload

The Preload works perfectly fine for loading associated data.

users = User.preload(:posts)

The query will load and store all users with associated posts data.

User Load (0.7ms)  SELECT "users".* FROM "users" LIMIT $1  [["LIMIT", 11]]
Post Load (0.5ms)  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN ($1, $2)  [["user_id", 1], ["user_id", 2]]
 => #<ActiveRecord::Relation [#<User id: 1, name: "Tushar", email: "tushar@mail.com", created_at: "2021-01-16 14:55:30", updated_at: "2021-01-16 14:55:30">, #<User id: 2, name: "Yash", email: "yash@mail.com", created_at: "2021-01-16 14:55:30", updated_at: "2021-01-16 14:55:30">]>

Pros

It stores the associated data, as a result, it solves the (n+1) issue

Cons

It by default loads data with two separate queries for loading data for the user and its associated posts data

Separate query for loading User and its associated Post table

It can not be used precisely for comparing and filtering for associated data, such as where can not be used for associated data filtering in preload

Includes

The Includes works perfectly fine for loading, comparing and filtering data with associated data.

users = User.includes(:posts).where(posts: { published: true })

The query will load all users with posts having published attribute values as true.

SQL (0.7ms)  SELECT DISTINCT "users"."id" FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE "posts"."published" = $1 LIMIT $2  [["published", true], ["LIMIT", 11]]
SQL (0.4ms)  SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."name" AS t1_r1, "posts"."description" AS t1_r2, "posts"."published" AS t1_r3, "posts"."user_id" AS t1_r4, "posts"."created_at" AS t1_r5, "posts"."updated_at" AS t1_r6 FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE "posts"."published" = $1 AND "users"."id" = $2  [["published", true], ["id", 1]]
 => #<ActiveRecord::Relation [#<User id: 1, name: "Tushar", email: "tushar@mail.com", created_at: "2021-01-16 14:55:30", updated_at: "2021-01-16 14:55:30">]>

The includes uses most of the time preload by default (hence gives an error when used withwhere including SQL query inside it without references) as given below,

> User.includes(:posts).where("posts.published = ?", true)

User Load (1.8ms)  SELECT "users".* FROM "users" WHERE (posts.published = TRUE) LIMIT $1  [["LIMIT", 11]]
Traceback (most recent call last):
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR:  missing FROM-clause entry for table "posts")
LINE 1: SELECT "users".* FROM "users" WHERE (posts.published = TRUE)...
                                             ^

So by default, it will fire two queries the same as preload (unless used as where with Hash query or references) as given below,

> User.includes(:posts)

User Load (0.4ms)  SELECT "users".* FROM "users" LIMIT $1  [["LIMIT", 11]]
Post Load (0.5ms)  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN ($1, $2)  [["user_id", 1], ["user_id", 2]]
 => #<ActiveRecord::Relation [#<User id: 1, name: "Tushar", email: "tushar@mail.com", created_at: "2021-01-16 14:55:30", updated_at: "2021-01-16 14:55:30">, #<User id: 2, name: "Yash", email: "yash@mail.com", created_at: "2021-01-16 14:55:30", updated_at: "2021-01-16 14:55:30">]>

Note: To use where with includes will work only with the Hash (given above). To use where with SQL-fragment references can be used (reference) as below,

User.includes(:posts).where("posts.published = ?", true).references(:posts)

The includes will by default use preload unless the references are being made on the association so as to give SQL output (which will be the same as the Hash one) as below,

SQL (0.7ms)  SELECT DISTINCT "users"."id" FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE (posts.published = TRUE) LIMIT $1  [["LIMIT", 11]]
SQL (0.7ms)  SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."name" AS t1_r1, "posts"."description" AS t1_r2, "posts"."published" AS t1_r3, "posts"."user_id" AS t1_r4, "posts"."created_at" AS t1_r5, "posts"."updated_at" AS t1_r6 FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE (posts.published = TRUE) AND "users"."id" = $1  [["id", 1]]
 => #<ActiveRecord::Relation [#<User id: 1, name: "Tushar", email: "tushar@mail.com", created_at: "2021-01-16 14:55:30", updated_at: "2021-01-16 14:55:30">]>

Pros

It can be used for either OUTER LEFT JOIN query or with two separate queries (same as preload) to load left and right table-associated data
It solves the (n+1) issue all alone since it stores associated data for future use
It is smart enough to load associated data with either a single query for simple cases (in terms of joining table) or with two separate queries for other cases (like when working as preload). If a query for includes needs to force into a single query, then it can be performed with includes+references as discussed earlier (which will work the same as eager_load)
ex. User.includes(:posts).references(:posts)

Cons

It can fire two separate queries to load left and right table data (without using any kind of join same as preload)
It sometimes uses a LEFT OUTER JOIN query to load associated data, as a result, associated data gets impacted.
ex. Let’s consider if the first user has a total of 12 posts,

=> users = User.includes(:posts).where("posts.published = ?", true)
=> users.first.posts.length 
# it may ouptput less than 12 as ex.4

Eager load

The Eager load works perfectly fine for loading, comparing and filtering data using associated data.

users = User.eager_load(:posts).where("posts.published = ?", true)

The query will load all users with posts having published attribute values as true.

SQL (0.7ms)  SELECT DISTINCT "users"."id" FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE (posts.published = TRUE) LIMIT $1  [["LIMIT", 11]]
SQL (0.6ms)  SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."name" AS t1_r1, "posts"."description" AS t1_r2, "posts"."published" AS t1_r3, "posts"."user_id" AS t1_r4, "posts"."created_at" AS t1_r5, "posts"."updated_at" AS t1_r6 FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE (posts.published = TRUE) AND "users"."id" = $1  [["id", 1]]
 => #<ActiveRecord::Relation [#<User id: 1, name: "Tushar", email: "tushar@mail.com", created_at: "2021-01-16 14:55:30", updated_at: "2021-01-16 14:55:30">]>

Pros

It uses a LEFT OUTER JOIN query to load associated data
It solves the (n+1) issue all alone since it stores associated data for future use.
Usually, it loads data with a single query for users and its associated post data in terms of joining the table.

Cons

It uses a LEFT OUTER JOIN query to load associated data, as a result, associated data gets impacted.
ex. If the first user has a total of 12 posts,

=> users = User.eager_load(:posts).where("posts.published = ?", true)
=> users.first.posts.length 
# it may ouptput less than 12 as ex.4

Extra: Since eager_load force data to load with a single query (while joining hte table) to load user and its associated post data, it takes comparatively more time to load a query than to load two separate query in most of the cases. So it depends more on the use-case to use joins/preload/includes/eager_load while loading/comparing/filtering associated data.

Joins vs Preload vs Includes vs Eager load in Rails

An in-depth discussion with a practical example to load associated data in Rails

Joins

Pros

Cons

Preload

Pros

Cons

Includes

Pros

Cons

Eager load

Pros

Cons

Written by Tushar Adhao

No responses yet