Tags:
Programming
Ruby on Rails
Here’s an article for the Ruby on Rails programmers out there. Geek Skillz is home-made using Rails, as an exercise for me to learn this popular framework. Rails makes it very easy to use databases with your web sites, which is a great thing, but with a little problem. As I search for Rails info on the web, I find that a lot of writers don’t seem to have the database background to understand performance issues. They let Rails deal with the database and think nothing more about it. I think it’s very important that if you use databases in your code, in whichever language and framework you choose, you should take time to learn about them in depth!
DISTINCT in Rails I was looking into a way to list all the values that appear in one column of a table. I didn’t want all the rows, just the unique values that appear in a column. In SQL, this is done with the “DISTINCT” keyword. For example, “SELECT DISTINCT year FROM blog_posts” would list all the years that are in the year column of the blog_posts table. @distinctlist = Item.find(:all).map{ |i| i.fieldname }.uniq
That would certainly work. But what if your “item” table has thousands of rows? Millions of rows? This single line of code could take a very, very long time to complete! I mean, minutes or hours! The programmer might look at that and think it is elegant and compact, but won’t think so when it brings the server to its knees. How does that statement work? First, it executes this: Item.find(:all) That will fetch all columns in all rows from the items table, without exception, and load it into memory. No query can perform more slowly than that, or use more memory. Once that horrendous step is done, all the data is passed to the map function. Once that step is done, all that data is then passed to the uniq function. If you do this on a table that will always have a few rows, then you’re fine. Otherwise, your website visitors will leave your site before the query even completes. The better solution is to use the :select parameter of the ActiveRecord find method. Here’s an example: Item.find( :all, :select => 'DISTINCT fieldname' ) The :select parameter is overriding part of Rails’ query to the database. Rather than letting Rails build the entire query, we are telling it which columns to select and how to query them. Rails will still generate most of the query, but use our select list instead. The advantage of this method is that we pass all the work to the database. Databases are designed to be the very best at analyzing and retrieving data very quickly. That’s what they do! The database has indexes (which you should have created in the database migration) to perform the DISTINCT processing for us in sub-second time. This is far better than telling the database to give us all the data from the table so that we can deal with it using Ruby ourselves. Make the database work for you. Don’t reproduce database functionality in your Ruby code. Note that my example will only give you one column from the table: fieldname. In the Item objects that get instantiated, you won’t be able to access the other attributes of the model. But that makes sense, given our query. We want to know the unique values of “fieldname”, so we don’t need the other columns of the table. That wouldn't make sense. Which brings us to another benefit: minimal data is sent from the database to your Rails app, lowering memory usage. Improving Performance with :select What else can you do with the :select option? You can tailor your queries to give you only the information you need, which can improve performance greatly in some situations. For example, say I have a table of text articles (like I do with this blog). It could have these columns: created_at (datetime) title (string) bodytext (string, containing the entire text of an article) comments_count (integer) The bodytext column will have a huge amount of data compared to the other three columns. After years of posting articles on my blog, there can be hundreds of rows in this table. If I want to fetch many of the rows in this table, but don’t need to use the bodytext column, then I can improve performance and memory usage by excluding bodytext from the query, like this: @articles = Article.find( :all, :select => ‘created_at, title, comments_count’ ) This will use a tiny amount of memory and be very fast compared to the output of this query: @articles = Article.find( :all ) Using :select means that you need to be mindful of how the query result is used. If a part of my code tries to do article.bodytext on an element, it will find that bodytext does not exist. But I’m not suggesting that you use :select for general use queries. You’ll use it for specific situations where you know what you need and when it will be used. I hope this is useful to someone. In general, I think carefully whenever I use find(:all) in Rails. Do my :conditions limit the query enough? Is there an index for the :conditions? Can I use :limit? Is there something in the :select option that I should be using to improve performance and memory usage? Those are the kind of things that should pop into your mind too whenever you use ActiveRecord’s find method. |
About Me
![]()
![]() ![]() |
Very cool. I kept thinking there should be something in find() for doing a DISTINCT, but I couldn't find it, so I stumbled across your way and it's just what I need!
Item.find_by_sql("SELECT distinct column_name FROM table_name")
I think that your query will be identical to TableName.find(:all, :select => 'distinct column_name'). There may be some exceptions, but I don't know of any. Thanks for the comment!
Does Rails cache results for searches? If so, it would seem that Item.find(:all) can will be faster for short lists because of the Rails cache.
And what if i.fieldname is in the item model but not in the database? For example, if i.fieldname is a caculated field then there's no way to select it from the database without doing something more complex like a stored procedure.
Joel, that's a huge oversimplification of the article. Read the article more carefully. Only in some cases might Item.find(:all) be slow. I think you skimmed the article without reading all the caveats. =)
I first came across find(:all) , I thought why these people are inventing again new syntaxes instead of straight away using the sql string query which would be fast and comprehensive.
Post a comment: