Redesign your models for performance in rails
Inefficient SQL with Extensive Joins
When dealing with a substantial number of rows and multiple table joins, SQL queries may significantly slow down your Rails application.
This inefficiency is often pronounced when attributes of several related models are required in a single query.
Here’s a classic example using where
clauses leading to overly complex SQL queries:
# app/models/article.rb
class Article < ActiveRecord::Base
belongs_to :state # foreign_key: state_id
belongs_to :category # foreign_key: category_id
belongs_to :user # columns: id, username (index), user_id (fk, index)
end
# app/controllers/articles_controller.rb
def show
Article.includes([:state, :category]).
where("states.name" => "published",
"categories.name" => "hiking",
"articles.user_id" => 123)
end
This approach requires multiple joins and can greatly diminish performance in a high-volume app.
Scope Refactoring
Instead of cramming all the logic into a verbose query, refactor using scopes for better maintainability and performance.
# app/models/article.rb
class Article < ActiveRecord::Base
belongs_to :state
belongs_to :category
belongs_to :user
scope :for_state, -> (name) { joins(:state).where(states: { name: name }) }
scope :for_category, -> (name) { joins(:category).where(categories: { name: name }) }
end
# Usage:
Article.for_state("published").for_category("hiking")
Scopes make the code more modular and maintainable. However, this still does not improve our performance.
In fact, it’s almost the same query. You can check in your console with .to_sql
if you really want to.
The solution: Denormalize
While normalization is good for data integrity and avoiding redundancy, it sometimes hampers performance.
Denormalization simplifies the schema at the cost of redundancy but enhances read performance.
Here’s how you might reconsider your model if high query performance is critical:
# app/models/article.rb
class Article < ActiveRecord::Base
# Enums simulate what a related table would have done with less overhead
STATES = %w(draft review published archived)
CATEGORIES = %w(tips faqs misc hiking)
validates :state, inclusion: { in: STATES }
validates :category, inclusion: { in: CATEGORIES }
end
# Easier query without joins:
Article.where(state: "published", category: "hiking")
However, if your application isn’t a multi-tenant system (like a SaaS) and involves simpler, less dynamic categorization, consider using enums directly in your table instead of separate tables.
For instance, if your application only handles one blog for a single user, you can avoid the complexities of additional category tables by directly using enums.
Another Denormalization Example
Think critically about when to denormalize - this decision greatly depends on the specific needs and scale of your application.
By aligning your database design with your application’s requirements, you can optimize performance and maintain clean, effective code.
As a rule of thumb, don’t try to normalize something that doesn’t have its own controller in the app.
Take a tagging system, for example. It could be done with a ‘has and belongs to many’ setup or just with a jsonb
array as a column.
In the latter case, a valid reason to opt for a HABTM setup is if your user needs to create new tags with names and descriptions and search within them.
Otherwise, just denormalize that complexity ;)
Conclusion
Yes, performance issues are mostly solved by better defining the problem to eliminate the existing complexity in the code.
The added complexity that could have been removed from the code is often there because the team did not invest enough time in defining the problem and questioning the original requirements.
Debates are not a liability because we are not coding.
Coding things that we do not understand is the liability because ultimately we will be in a situation where we will create other meetings and other PRs to re-read code that is causing issues.
Related posts
Building a Rails Engine