Ik_guy

Geekskillz_logo1


"Everything that boots is beautiful."


Jun 26
2007

Using :select in Rails for Better Performance

Tags: Programming   Ruby on Rails


Here’s an article for the Ruby on Rails programmers out there. Geek Skillz is home-made using Rails, as an exercise for me to learn this popular framework. Rails makes it very easy to use databases with your web sites, which is a great thing, but with a little problem. As I search for Rails info on the web, I find that a lot of writers don’t seem to have the database background to understand performance issues. They let Rails deal with the database and think nothing more about it. I think it’s very important that if you use databases in your code, in whichever language and framework you choose, you should take time to learn about them in depth!

DISTINCT in Rails

I was looking into a way to list all the values that appear in one column of a table. I didn’t want all the rows, just the unique values that appear in a column. In SQL, this is done with the “DISTINCT” keyword. For example, “SELECT DISTINCT year FROM blog_posts” would list all the years that are in the year column of the blog_posts table.

So, how do you do a SELECT DISTINCT query with Rails? Here’s one solution that I found (I won’t post a link to the site since the author ought to be embarrassed!):

@distinctlist = Item.find(:all).map{ |i| i.fieldname }.uniq

That would certainly work. But what if your “item” table has thousands of rows? Millions of rows? This single line of code could take a very, very long time to complete! I mean, minutes or hours! The programmer might look at that and think it is elegant and compact, but won’t think so when it brings the server to its knees.

How does that statement work? First, it executes this:

Item.find(:all)

That will fetch all columns in all rows from the items table, without exception, and load it into memory. No query can perform more slowly than that, or use more memory. Once that horrendous step is done, all the data is passed to the map function. Once that step is done, all that data is then passed to the uniq function. If you do this on a table that will always have a few rows, then you’re fine. Otherwise, your website visitors will leave your site before the query even completes.

The better solution is to use the :select parameter of the ActiveRecord find method. Here’s an example:

Item.find( :all, :select => 'DISTINCT fieldname' )

The :select parameter is overriding part of Rails’ query to the database. Rather than letting Rails build the entire query, we are telling it which columns to select and how to query them. Rails will still generate most of the query, but use our select list instead.

The advantage of this method is that we pass all the work to the database. Databases are designed to be the very best at analyzing and retrieving data very quickly. That’s what they do! The database has indexes (which you should have created in the database migration) to perform the DISTINCT processing for us in sub-second time. This is far better than telling the database to give us all the data from the table so that we can deal with it using Ruby ourselves. Make the database work for you. Don’t reproduce database functionality in your Ruby code.

Note that my example will only give you one column from the table: fieldname. In the Item objects that get instantiated, you won’t be able to access the other attributes of the model. But that makes sense, given our query. We want to know the unique values of “fieldname”, so we don’t need the other columns of the table. That wouldn't make sense. Which brings us to another benefit: minimal data is sent from the database to your Rails app, lowering memory usage.

Improving Performance with :select

What else can you do with the :select option? You can tailor your queries to give you only the information you need, which can improve performance greatly in some situations. For example, say I have a table of text articles (like I do with this blog). It could have these columns:

created_at (datetime)
title (string)
bodytext (string, containing the entire text of an article)
comments_count (integer)

The bodytext column will have a huge amount of data compared to the other three columns. After years of posting articles on my blog, there can be hundreds of rows in this table. If I want to fetch many of the rows in this table, but don’t need to use the bodytext column, then I can improve performance and memory usage by excluding bodytext from the query, like this:

@articles = Article.find( :all, :select => ‘created_at, title, comments_count’ )

This will use a tiny amount of memory and be very fast compared to the output of this query:

@articles = Article.find( :all )

Using :select means that you need to be mindful of how the query result is used. If a part of my code tries to do article.bodytext on an element, it will find that bodytext does not exist. But I’m not suggesting that you use :select for general use queries. You’ll use it for specific situations where you know what you need and when it will be used.

I hope this is useful to someone. In general, I think carefully whenever I use find(:all) in Rails. Do my :conditions limit the query enough? Is there an index for the :conditions? Can I use :limit? Is there something in the :select option that I should be using to improve performance and memory usage? Those are the kind of things that should pop into your mind too whenever you use ActiveRecord’s find method.



StumbleUpon This! Bookmark This Article Digg This Story

  Dan, on Thursday, June 28, 2007 at 00:09 EDT:
Very useful, thanks.
  Neil, on Thursday, June 28, 2007 at 10:58 EDT:
I'm glad! Thanks for reading.
  Wes, on Wednesday, July 25, 2007 at 14:37 EDT:
Neil:
Very cool. I kept thinking there should be something in find() for doing a DISTINCT, but I couldn't find it, so I stumbled across your way and it's just what I need!
  Yasser, on Sunday, August 05, 2007 at 15:34 EDT:
Thanks for the tip. Is the performance any different from lets say this:

Item.find_by_sql("SELECT distinct column_name FROM table_name")
  Neil, on Wednesday, August 08, 2007 at 17:09 EDT:
Hi Yasser,

I think that your query will be identical to TableName.find(:all, :select => 'distinct column_name'). There may be some exceptions, but I don't know of any. Thanks for the comment!
  Bryan, on Tuesday, August 14, 2007 at 09:41 EDT:
Good tip. So much in rails makes sense after you see it. Very similar to the 'conditions' symbol for the where clause!
  Joel Parker Henderson, on Thursday, October 25, 2007 at 19:20 EDT:
Are you sure you're right about Item.find(:all) being slow?

Does Rails cache results for searches? If so, it would seem that Item.find(:all) can will be faster for short lists because of the Rails cache.

And what if i.fieldname is in the item model but not in the database? For example, if i.fieldname is a caculated field then there's no way to select it from the database without doing something more complex like a stored procedure.

  Neil, on Friday, October 26, 2007 at 01:00 EDT:
"Are you sure you're right about Item.find(:all) being slow?"

Joel, that's a huge oversimplification of the article. Read the article more carefully. Only in some cases might Item.find(:all) be slow. I think you skimmed the article without reading all the caveats. =)
  Jeff, on Sunday, December 02, 2007 at 21:32 EST:
I don't know how to thank you enough
  Narender, on Wednesday, May 07, 2008 at 11:48 EDT:
Good help !, I had the same feeling when
I first came across find(:all) , I thought why these people are inventing again new syntaxes instead of straight away using the sql string query which would be fast and comprehensive.


Post a comment:

Name:
E-mail: (will not be sold or published)
Website: (Optional)

Your comment:

Are you human?
Please enter the
text in the picture:


About Me

ThinkGeek

Feed-icon Technorati


Loot For Geeks:
4inkjets Great Prices and Best Quality!
Man's Wig! All size heads! Handsome! Sideburns! Modacrylic Fiber!
Protection
!
Make money online selling grit! Famous men sell grit!