I'm playing with setting up Typo as a replacement for Wordpress. Both of these packages are webloging software - you use them to make a blog. Typo's quite new, but its built with Ruby on Rails, which I'm interested in exploring, and it has a number of cool Web 2.0 features (AJAX search, etc.). Wordpress is a built with PHP, which isn't very sexy, but the package has been in development for a lot longer and is a little more mature as a result.
The Problem:
After using the conversion script supplied with Typo to copy data from my Wordpress database, convert it to Typo format, and finally save it into a new Typo database, everything was mostly good, except that all the image references were relative, instead of fully qualified. This was a small problem, because the referenced images weren't available via the server that I was testing Typo on -- not a critical deficiency in test mode, but I really wanted to see how Typo would display my real weblog entries - images and all. To make things show up nicely, I needed to replace every image reference with its fully qualified version.
That is, I needed to search each of my blog entries for every string that looked something like the following:
<img src="/images/foo/bar/biz.jpg">
Each time one of these strings was found, I needed to replace it with something like this (adding the bolded bit):
<img src="http://bohnsack.com/images/foo/bar/biz.jpg">
The Conventional Solution:
Most times, this kind of problem is solved by writing a script that loops through records obtained from a SQL SELECT, massages the data, and then does a SQL UPDATE to save every modified record. If you wanted to get this problem solved with Perl in this manner, it would look something like this:
#!/path/to/perl -w
use strict;
use DBI;
my $dbh = DBI->connect(''DBI:mysql:database=typo;host=localhost', 'username', 'password');
my $sth = $dbh->prepare('SELECT id, body FROM articles');
$sth->execute();
while( my $row = $sth->fetchrow_hashref() )
{
$row->{'body'} =~ s/src=\"(\/.*.(jpg|gif|png))/src=\"http:\/\/bohnsack.com$1/g;
$sql = 'UPDATE articles SET body = ? WHERE id = ?';
my $sth2 = $dbh->prepare($sql);
$sth2->execute($row->{'body'}, $row->{'id'});
}
The preceding code isn't hard to write in any language. A Ruby version looks fairly similar:
#!/path/to/ruby
require 'rubygems'
require 'mysql'
dbh = Mysql.real_connect('localhost', 'root', 'username', 'password')
dbh.query('SELECT id, body FROM articles').each_hash do |row|
row['body'].gsub!(/src="(\/.*.(jpg|gif|png))/, 'src=\"http://bohnsack.com\1')
dbh.query 'UPDATE articles SET body = \'' + Mysql.escape_string(row['body']) +
'\' WHERE id = ' + Mysql.escape_string(row['id'])
end
There isn't a huge difference between the Perl and Ruby code shown above, but there is a fairly large difference, when you use the stuff that comes with the Rails framework. With Rails' ActiveRecord, we're able to get this task completed with just a single line of code...
The Rails Solution:
$ ./script/console
Loading development environment.
>> Article.find_all.each { |a| a.body.gsub!(/src="\/(.*.(jpg|gif|png))/, 'src=\"http://bohnsack.com\1'); a.save }
Wow! Pretty cool. I am the one-line-of-code commander.
Let's take this step-by-step:
- console is a script that comes with rails. It invokes Ruby's irb (Interactive Ruby Shell) and takes care of setting of setting up environment, loading libraries, and connecting to your database(s). It gives you a prompt and allows you to program in Ruby interactively:
$ ./script/console
Loading development environment.
>>
- After invoking this script, we use the irb session to access the Article class and its find_all class method. This class springs forth almost automatically from the articles table, as it's defined in the database. There's no application-specific code, except for three lines that tell Article to use ActiveRecord as its base class. find_all returns an array of Article objects. In this case, the array contains an object for every article record in the database:
>> Article.find_all
- Now that we have an array of Article instances, we can use one of Ruby's iterators (Array#each) to loop through each element:
>> Article.find_all.each
- each expects a block (a major Rubyism). A block is a chunk of code delimited by curly braces or a do/end construct. The block gets run by each, with an Article object getting passed to the block via |a| during every iteration. In other words, the block is run for every object in the array, and the block has access to the object via a variable local to the block's scope, defined as the variable between the pipes (|a|):
>> Article.find_all.each { |a| ... }
- Because a.body, a member variable corresponding to the article table's body column, is a string, it automatically has the gsub! method. We use this method to search and replace with a regular expression:
>> Article.find_all.each { |a| a.body.gsub!(/src="\/(.*.(jpg|gif|png))/, 'src=\"http://bohnsack.com\1'); ... }
- Finally, we persist the changes made to each record back to the database by invoking ActiveRecord's save method. Before doing save, the record is only changed in memory - not in the database. When save is executed, ActiveRecord does a database UPDATE under the covers, without us having to write a single line of SQL:
>> Article.find_all.each { |a| a.body.gsub!(/src="\/(.*.(jpg|gif|png))/, 'src=\"http://bohnsack.com\1'); a.save }
That's it. Simple, but very powerful, and all with only one line of code.
Updates Aug 17th:
- Lots of good suggestions in the comments and here - I've implemented some of them, but I was just geeked that I got it to work, not really knowing what I was doing. The statement I created was mostly a result of bouncing on the tab key in irb, picking methods that looked good, looking at their inspect output, and then repeating a few times until everything worked.
- Should have used find(:all) instead of find_all() (which is deprecated).
- update_attribute() would have been cleaner that a.body.gsub!(); a.save()
- Parentheses look better around methods that take more than one argument. I've made this change where I call gsub!().