Textpattern to Serendipity Migration Scripts
Tuesday, October 30. 2007
Not terribly riveting reading, but this is the story of my migration from textpattern (version 4.0.4) to serendipity (version 1.1.3). It includes some PHP scripts I used to fix some problems on import. They're not the best or the most robust in the world, but they worked for me and I wanted to pop them here in case they work for someone else. Feedback on the code isn't necessary because I won't be using it again ;)
This is going to be a really long article, but do read on if it's helpful - I'm posting the rest as extended content so the uninterested need not scroll past it!
A running start
Firstly let me say that serendipity has a fabulous import feature which brought in 90% of my content about 30 seconds after I saw the button labelled "Import Data". This is a real hurdle-remover for new users and definitely clinched the deal for me. Having done this I found that:
- I had weird entity encodings everywhere
- my images hadn't arrived
- everything seemed much more spaced out than it had been on the old blog
Images
The images hadn't been transferred, and I also realised that the custom tags that textpattern uses, such as:
<txp:image id="120" />
hadn't been handled by the import script, so I kind of had two problems to solve. I started by transferring all the images across and renaming them to something more helpful.
Here's the code I used to move my images (both databases were on the same server, as were both installations, which helped to simplify matters quite a bit). I have also used an exec() call in this script - maybe not the most secure thing in the world, but perfect for this fast-and-dirty one-off import script.
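<?php
// One-off script: copy each textpattern image into the serendipity uploads
// folder, using its category as the subfolder and its name as the filename.
$source_dir = '/path/to/textpattern/images/';
$target_dir = '/path/to/serendipity/uploads/';
echo "ready\n";
$db = mysql_connect('localhost','username','password');
mysql_select_db('textpattern',$db);
$sql = "select * from txp_image";
echo "connected\n";
$result = mysql_query($sql,$db) or die(mysql_error());
while($row=mysql_fetch_assoc($result)) {
    echo $row['name']."\n";
    // the source file is named <id><ext>
    $source = $source_dir.$row['id'].$row['ext'];
    $target = $target_dir.$row['category'].'/'.$row['name'];
    // quote both paths so names containing spaces survive the shell
    exec('cp '.escapeshellarg($source).' '.escapeshellarg($target));
}
?>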
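If shelling out feels too risky, the same copy can probably be done with PHP's built-in filesystem functions instead. This is only a rough sketch and not the script used for this migration: `copy_txp_image()` is a made-up helper name, and it assumes each row carries the `id`, `ext`, `category` and `name` columns used in the script above.

```php
<?php
// Hypothetical helper (not part of the original script): copy one txp_image
// row's file into <target_dir>/<category>/<name> without calling the shell.
function copy_txp_image(array $row, $source_dir, $target_dir)
{
    // The source file is named <id><ext>, as in the script above.
    $source = rtrim($source_dir, '/') . '/' . $row['id'] . $row['ext'];

    // Images without a category go straight into the target folder.
    $folder = rtrim($target_dir, '/');
    if (isset($row['category']) && $row['category'] !== '') {
        $folder .= '/' . $row['category'];
    }

    // Create the category folder the first time it is needed.
    if (!is_dir($folder) && !mkdir($folder, 0755, true)) {
        return false;
    }

    // basename() guards against a name that contains path separators.
    return copy($source, $folder . '/' . basename($row['name']));
}
```

Calling it inside the `while` loop in place of the `exec()` line would give the same folder layout, with the bonus that a missing category folder gets created rather than making the copy fail.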
This placed all my images in the relevant folder in the serendipity upload folder, which was great. But I still had those mangled image tags to deal with. I wrote a script which edited each entry (it's not pretty but it really did work) and fixed this - here it is:
<?php
// One-off script: swap each textpattern <txp:image /> tag in the imported
// entries for a plain <img> tag pointing at the copied file.
echo "ready\n";
$db = mysql_connect('localhost','username','password');
mysql_select_db('textpattern',$db);
$sql = "select * from txp_image";
echo "connected\n";
$result = mysql_query($sql,$db) or die(mysql_error());
// index the image details by textpattern image id
$image_properties = array();
while($row=mysql_fetch_assoc($result)) {
    $image_properties[$row['id']] = $row;
}
mysql_select_db('serendipity',$db);
echo "connected\n";
$sql = "select * from serendipity_entries";
$results = mysql_query($sql, $db) or die(mysql_error());
$pattern = '/<txp:image id="([0-9]*)" [0-9a-zA-Z=" ]*\/>/';
while($row = mysql_fetch_assoc($results)) {
    // skip entries with no image tags
    if(strpos($row['body'],'txp:image') === false) {
        continue;
    }
    $new_body = $row['body'];
    echo $row['title']."\n";
    preg_match_all($pattern,$row['body'],$matches,PREG_OFFSET_CAPTURE);
    foreach($matches[1] as $key=>$match) {
        // $match[0] is the captured image id ($match[1] is its offset)
        $replacement = '<img src="/uploads/'.$image_properties[$match[0]]['category'].'/'.$image_properties[$match[0]]['name'].'" />';
        // limit of 1: only the first remaining tag is replaced, so each tag gets its own image
        $new_body = preg_replace($pattern,$replacement,$new_body,1);
    }
    // now update the actual row
    $sql = 'update serendipity_entries set body = "'.mysql_real_escape_string($new_body).'" where id = '.$row['id'];
    mysql_query($sql,$db) or die(mysql_error());
}
?>
It took a few attempts (mostly involving getting the same image multiple times on posts where there was more than one image).
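For anyone hitting the same repeated-image problem, one alternative is to let `preg_replace_callback()` build the replacement from each tag's own id, so every tag is handled in a single pass. This is a sketch rather than the script that was actually run: `replace_txp_images()` is a hypothetical name, `$images` is assumed to be an array of txp_image rows keyed by id (like `$image_properties` above), and the `/uploads/` prefix is taken from that script.

```php
<?php
// Hypothetical helper: swap every <txp:image id="N" ... /> tag in $body for an
// <img> tag built from $images, an array of txp_image rows keyed by id.
function replace_txp_images($body, array $images)
{
    return preg_replace_callback(
        '/<txp:image id="([0-9]+)"[^>]*\/>/',
        function ($m) use ($images) {
            $id = $m[1];
            if (!isset($images[$id])) {
                return $m[0]; // unknown id: leave the tag untouched
            }
            $src = '/uploads/' . $images[$id]['category'] . '/' . $images[$id]['name'];
            return '<img src="' . htmlspecialchars($src) . '" />';
        },
        $body
    );
}
```

Because the callback sees the id of the tag it is replacing, there is no need to count matches or limit replacements, and a tag whose id isn't in the image table is simply left alone.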
Entities
The way that textpattern stores character entities gave me a problem when they were then imported into serendipity, and I had ’ and other such "features" all over the place. With a lot of swearing and some help from my friend (thanks Sara!) I eventually got it untangled using this:
<?php
// One-off script: replace stray typographic characters in the imported
// comments with plain ASCII equivalents.
echo "ready\n";
$db = mysql_connect('localhost','username','password');
mysql_select_db('serendipity',$db);
echo "connected\n";
$sql = "select * from serendipity_comments";
$results = mysql_query($sql, $db) or die(mysql_error());
// search => replacement
$table = array(
    "’" => "'",
    "–" => "-",
    "“" => "\"",
    "”" => "\"",
    "…" => "..."
);
while($row = mysql_fetch_assoc($results)) {
    $body = str_replace(array_keys($table), array_values($table), $row['body']);
    $sql = 'update serendipity_comments set body = "'.mysql_real_escape_string($body).'" where id='.$row['id'];
    echo $sql."\n";
    mysql_query($sql,$db) or die(mysql_error());
}
?>
There might be a few others that got through but putting these in certainly sorted out the majority of the problems with the encoding. I'm a little unclear why textpattern and serendipity couldn't communicate on this front but the main thing is that I did get most of my content through.
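To try the mapping out before letting it loose on the database, it can be pulled into a small function and run against a few sample strings. The sketch below does that with `strtr()`; `plain_punctuation()` is a hypothetical name, and the numeric-entity keys (`&#8217;` and friends) are an assumption about what the raw rows might contain rather than something taken from the script above.

```php
<?php
// Hypothetical helper: map typographic punctuation to plain ASCII.
// The "&#...;" keys are an assumption about how the characters may be stored.
function plain_punctuation($text)
{
    $table = array(
        "’" => "'",   "&#8217;" => "'",
        "–" => "-",   "&#8211;" => "-",
        "“" => "\"",  "&#8220;" => "\"",
        "”" => "\"",  "&#8221;" => "\"",
        "…" => "...", "&#8230;" => "...",
    );
    // strtr() makes all the replacements in one pass over the string.
    return strtr($text, $table);
}
```

Adding another stray character later is then just one more line in the table, and it can be checked on a sample string before any row is updated.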
Vertical Space
The spacing problems that I had early on I "fixed" by running my code through a script to remove a load of whitespace - actually I was seeing this output because I had the nl2br plugin turned on. Sadly, now I've formatted all my existing posts I can't turn it off without re-reformatting them, so I'm just living with what I have for the time being!
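The whitespace script isn't reproduced here, but the effect is easy to see with PHP's own `nl2br()` function, which puts a `<br />` before every newline, so a blank line between paragraphs turns into two breaks. A minimal sketch of the kind of clean-up involved, assuming the extra space came from blank lines in the entry bodies (`collapse_blank_lines()` is a made-up name):

```php
<?php
// Hypothetical helper: squeeze runs of blank (or whitespace-only) lines down
// to a single newline, so nl2br() adds only one <br /> per paragraph break.
function collapse_blank_lines($body)
{
    return preg_replace('/\r?\n(?:[ \t]*\r?\n)+/', "\n", $body);
}

// nl2br() on the raw text gives two breaks, on the collapsed text just one.
$raw = "first paragraph\n\nsecond paragraph";
echo nl2br($raw), "\n";
echo nl2br(collapse_blank_lines($raw)), "\n";
```

Single line breaks are left alone, so only the doubled-up spacing goes away.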
Import Scripts and One-Shots
I've had quite a bit of experience doing data import routines and I applied some of the same principles to solving the problems I experienced with this migration from textpattern to serendipity. I'm really happy with the new platform, I'd like to hang on to some of the stuff that I used for this, and so I've stored it here. If it helps you too, then great :)
