Use SQL, not Ruby loops
ActiveRecord makes database work easy. It makes slow code easy too.
Start with count, length, and size, which are not interchangeable:
count: runs a SQL COUNT. Efficient.length: loads all records into memory to count them. Heavy on large datasets.size: pickscountorlengthdepending on whether the collection is already loaded. Use with care.
When you can push work into SQL, do it. It almost always beats manipulating lists in Ruby.
Sorting and filtering
Say you want the first five users of an account, alphabetical by name.
Inefficient, all users loaded into memory:
@users = @account.users.sort_by { |user| user.name.downcase }.first(5)
Efficient, the database does the work:
@users = @account.users.order('LOWER(name)').limit(5)
order and limit in SQL skip the Ruby loop and the object loads.
Aggregation: do it in the database
Computing collaborators in Ruby:
# app/models/user.rb
class User < ActiveRecord::Base
has_many :comments
has_many :articles, through: :comments
def collaborators
articles.collect { |a| a.users }.flatten.uniq.reject { |u| u == self }
end
end
Loading and processing big datasets in Ruby is slow: object instantiation plus array work.
The SQL version:
# app/models/user.rb
class User < ActiveRecord::Base
has_many :comments
has_many :articles, through: :comments
def collaborators
User.joins(comments: { article: :comments })
.where('articles.id IN (?) AND users.id != ?', article_ids, id)
.distinct
end
end
Which runs:
SELECT DISTINCT users.* FROM users
INNER JOIN comments ON comments.user_id = users.id
INNER JOIN users users_comments ON users_comments.id = comments.user_id
INNER JOIN articles ON articles.id = comments.article_id
INNER JOIN comments comments_articles ON comments_articles.article_id = articles.id
WHERE (articles.id in (1) AND users.id != 1)
joins and where push the work into the database, and it gets much faster.
N+1 queries
A common setup:
# app/models/post.rb
class Post < ApplicationRecord
has_many :comments
end
# app/models/comment.rb
class Comment < ApplicationRecord
belongs_to :post
end
# app/controllers/posts_controller.rb
class PostsController < ApplicationController
def index
@posts = Post.all
end
end
# app/views/index.html.erb
<%= render @posts %>
# app/views/posts/_post.html.erb
<%= render @posts.title %>
<%= render @posts.comments %>
# app/views/comments/_comment.html.erb
<%= comment.content %>
Rendering <%= render @posts.comments %> iterates through every post and fires a query for each <%= comment.content %> partial. That is the N+1.
The fix is Post.includes(:comments):
# app/controllers/posts_controller.rb
class PostsController < ApplicationController
def index
# Perf gains
@posts = Post.includes(:comments)
end
end
This eager-loads everything in one DB request instead of N+1.
Takeaway
Reach for SQL first on data work, and keep Ruby for the application logic where SQL falls short. Good code uses both for what each does best.