A One-liner with Rails and ActiveRecord
I'm playing with setting up Typo as a replacement for Wordpress. Both of these packages are webloging software - you use them to make a blog. Typo's quite new, but its built with Ruby on Rails, which I'm interested in exploring, and it has a number of cool Web 2.0 features (AJAX search, etc.). Wordpress is a built with PHP, which isn't very sexy, but the package has been in development for a lot longer and is a little more mature as a result.
The Problem:
After using the conversion script supplied with Typo to copy data from my Wordpress database, convert it to Typo format, and finally save it into a new Typo database, everything was mostly good, except that all the image references were relative, instead of fully qualified. This was a small problem, because the referenced images weren't available via the server that I was testing Typo on -- not a critical deficiency in test mode, but I really wanted to see how Typo would display my real weblog entries - images and all. To make things show up nicely, I needed to replace every image reference with its fully qualified version.
That is, I needed to search each of my blog entries for every string that looked something like the following:
<img src="/images/foo/bar/biz.jpg">
Each time one of these strings was found, I needed to replace it with something like this (adding the bolded bit):
<img src="http://bohnsack.com/images/foo/bar/biz.jpg">
The Conventional Solution:
Most times, this kind of problem is solved by writing a script that loops through records obtained from a SQL SELECT, massages the data, and then does a SQL UPDATE to save every modified record. If you wanted to get this problem solved with Perl in this manner, it would look something like this:
#!/path/to/perl -w
use strict;
use DBI;
my $dbh = DBI->connect(''DBI:mysql:database=typo;host=localhost', 'username', 'password');
my $sth = $dbh->prepare('SELECT id, body FROM articles');
$sth->execute();
while( my $row = $sth->fetchrow_hashref() )
{
$row->{'body'} =~ s/src=\"(\/.*.(jpg|gif|png))/src=\"http:\/\/bohnsack.com$1/g;
$sql = 'UPDATE articles SET body = ? WHERE id = ?';
my $sth2 = $dbh->prepare($sql);
$sth2->execute($row->{'body'}, $row->{'id'});
}
The preceding code isn't hard to write in any language. A Ruby version looks fairly similar:
#!/path/to/ruby
require 'rubygems'
require 'mysql'
dbh = Mysql.real_connect('localhost', 'root', 'username', 'password')
dbh.query('SELECT id, body FROM articles').each_hash do |row|
row['body'].gsub!(/src="(\/.*.(jpg|gif|png))/, 'src=\"http://bohnsack.com\1')
dbh.query 'UPDATE articles SET body = \'' + Mysql.escape_string(row['body']) +
'\' WHERE id = ' + Mysql.escape_string(row['id'])
end
There isn't a huge difference between the Perl and Ruby code shown above, but there is a fairly large difference, when you use the stuff that comes with the Rails framework. With Rails' ActiveRecord, we're able to get this task completed with just a single line of code...
The Rails Solution:
$ ./script/console
Loading development environment.
>> Article.find_all.each { |a| a.body.gsub!(/src="\/(.*.(jpg|gif|png))/, 'src=\"http://bohnsack.com\1'); a.save }
Wow! Pretty cool. I am the one-line-of-code commander.
Let's take this step-by-step:
- console is a script that comes with rails. It invokes Ruby's irb (Interactive Ruby Shell) and takes care of setting of setting up environment, loading libraries, and connecting to your database(s). It gives you a prompt and allows you to program in Ruby interactively:
$ ./script/console Loading development environment. >> - After invoking this script, we use the
irb session to access the Article class and its find_all class method. This class springs forth almost automatically from the articles table, as it's defined in the database. There's no application-specific code, except for three lines that tell Article to use ActiveRecord as its base class. find_all returns an array of Article objects. In this case, the array contains an object for every article record in the database:
>> Article.find_all - Now that we have an array of Article instances, we can use one of Ruby's iterators (Array#each) to loop through each element:
>> Article.find_all.each - each expects a block (a major Rubyism). A block is a chunk of code delimited by curly braces or a do/end construct. The block gets run by each, with an Article object getting passed to the block via |a| during every iteration. In other words, the block is run for every object in the array, and the block has access to the object via a variable local to the block's scope, defined as the variable between the pipes (|a|):
>> Article.find_all.each { |a| ... } - Because a.body, a member variable corresponding to the article table's body column, is a string, it automatically has the gsub! method. We use this method to search and replace with a regular expression:
>> Article.find_all.each { |a| a.body.gsub!(/src="\/(.*.(jpg|gif|png))/, 'src=\"http://bohnsack.com\1'); ... } - Finally, we persist the changes made to each record back to the database by invoking ActiveRecord's save method. Before doing save, the record is only changed in memory - not in the database. When save is executed, ActiveRecord does a database UPDATE under the covers, without us having to write a single line of SQL:
>> Article.find_all.each { |a| a.body.gsub!(/src="\/(.*.(jpg|gif|png))/, 'src=\"http://bohnsack.com\1'); a.save }
That's it. Simple, but very powerful, and all with only one line of code.
Updates Aug 17th:
- Lots of good suggestions in the comments and here - I've implemented some of them, but I was just geeked that I got it to work, not really knowing what I was doing. The statement I created was mostly a result of bouncing on the tab key in irb, picking methods that looked good, looking at their inspect output, and then repeating a few times until everything worked.
- Should have used find(:all) instead of find_all() (which is deprecated).
- update_attribute() would have been cleaner that a.body.gsub!(); a.save()
- Parentheses look better around methods that take more than one argument. I've made this change where I call gsub!().
August 15th, 02005 at 9:13 pm
One line of code and a little cinnamon and you'll be unstoppable.
August 17th, 02005 at 4:56 am
Nice write up! You could use a.update_attribute() to avoid the explicit save (though I suppose you end up typing more characters), and also find(:all) instead of find_all (which is deprecated).
August 17th, 02005 at 6:10 am
Um, I don't think it'll work correctly if there are multiple images in an article body. Your regular expression will match from the start of the first one to the end of the last one, won't it?
Also, you changed the regular expression from the database update version so that you lose the opening slash.
I think it should be gsub!(/src="(\/[^"]*\.(jpg|gif|png))"/, 'src="http://bohnsack.com\1"')
But then I'm just a Ruby Nuby...
August 17th, 02005 at 6:14 am
Ive been trying to wrap my mind around blocks for a week now, and this post did it. Thanks for breaking it down for the layman.
August 17th, 02005 at 7:06 am
Rails ActiveRecord and blocks
(Found via redhanded) A nice post showing a one-liner using ActiveRecord to search and replace through all records of a table, including the db save. Has a nice explanation of blocks: A One-liner with Rails and ActiveRecord.
...
August 17th, 02005 at 8:06 am
Um, I don’t think it’ll work correctly if there are multiple images in an article body. Your regular expression will match from the start of the first one to the end of the last one, won’t it?
Maybe, but it worked for me. Perhaps this was because there was always a newline between each <img src=...>
August 17th, 02005 at 11:19 am
Mysql understands regex as well. You could have solved this with a single sql query as well - but maybe with only half the fun.
Though I wonder if the conclusion is that sql is even more elegant and easier to use than ruby - probably not. *g*
August 17th, 02005 at 11:33 am
I thought about doing it all in SQL, but I found this that says "MySQL's REGEX is only for matching, not for replacement.". Maybe they're wrong, but reading it caused me to give up on what would have probably taken a good while to figure out (the SQL) and use what I could figure out quickly without reading any docs (ActiveRecord and Ruby).