Textpattern to Serendipity Migration Scripts

Not terribly riveting reading, but this is the story of my migration from textpattern (version 4.0.4) to serendipity (version 1.1.3). It includes some PHP scripts I used to fix some problems after the import. They're not the best or the most robust in the world, but they worked for me and I wanted to pop them here in case they work for someone else. Feedback on the code isn't necessary because I won't be using it again ;)

A running start

Firstly let me say that serendipity has a fabulous import feature which brought in 90% of my content about 30 seconds after I saw the button labelled "Import Data". This is a real hurdle-remover for new users and definitely clinched the deal for me. Having done this I found that:

  • I had weird entity encodings everywhere
  • my images hadn't arrived
  • everything seemed much more spaced out than it had been on the old blog

Images

The images hadn't been transferred and I also realised that the custom tags that textpattern uses, such as:



<txp:image id="120" />

hadn't been handled by the import script, so I kind of had two problems to solve. I started by transferring all the images across and renaming them to something more helpful.

This is going to be a really long article, but do read on if it's helpful. I'm posting the rest as extended content so the uninterested need not scroll past it!

Here's the code I used to move my images. Both databases were on the same server, as were both installations, which helped to simplify matters quite a bit. I have also used an exec() call in this script: maybe not the most secure thing in the world, but perfect for this fast-and-dirty one-off import script.



<?php
 
$source_dir = '/path/to/textpattern/images/';
$target_dir = '/path/to/serendipity/uploads/';
 
echo "ready\n";
 
$db = mysql_connect('localhost','username','password');
mysql_select_db('textpattern',$db);
 
$sql = "select * from txp_image";
 
echo "connected\n";
$result = mysql_query($sql,$db) or die(mysql_error());
 
while($row=mysql_fetch_assoc($result)) {
        echo $row['name']."\n";
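        // escapeshellarg() stops file names containing spaces or shell characters from breaking the command
        exec('cp '.escapeshellarg($source_dir.$row['id'].$row['ext']).' '.escapeshellarg($target_dir.$row['category'].'/'.$row['name']));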
}
 
?>
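
For anyone wary of the exec() call, here is a minimal sketch of the same copy step using PHP's built-in copy() instead of shelling out to cp. The helper name copy_txp_image() is hypothetical, and it assumes (as the script above does) that the ext column already includes the leading dot. It isn't what I ran, just one way the step could be written.

```php
<?php
// Hypothetical helper: copy one textpattern image into the serendipity uploads tree.
// $row is a txp_image row with 'id', 'ext' (assumed to include the dot), 'category' and 'name'.
function copy_txp_image(array $row, string $source_dir, string $target_dir): bool
{
    $source = $source_dir . $row['id'] . $row['ext'];
    $folder = $target_dir . $row['category'];

    // Create the category folder the first time it is needed.
    if (!is_dir($folder) && !mkdir($folder, 0755, true)) {
        return false;
    }

    // basename() stops a name such as "../x.jpg" from escaping the category folder.
    return copy($source, $folder . '/' . basename($row['name']));
}
```

Inside the while loop this would be called as copy_txp_image($row, $source_dir, $target_dir), and a false return value flags any image that didn't make it across.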

Running the copy script placed all my images in the relevant folder under the serendipity uploads directory, which was great. But I still had those mangled image tags to deal with. I wrote a script which edited each entry (it's not pretty but it really did work) and fixed this. Here it is:



<?php
 
echo "ready\n";
 
$db = mysql_connect('localhost','username','password');
mysql_select_db('textpattern',$db);
 
$sql = "select * from txp_image";
 
echo "connected\n";
$result = mysql_query($sql,$db) or die(mysql_error());
 
while($row=mysql_fetch_assoc($result)) {
        $image_properties[$row['id']] = $row;
}
 
mysql_select_db('serendipity',$db);
 
echo "connected\n";
 
$sql = "select * from serendipity_entries";
 
$results = mysql_query($sql, $db) or die(mysql_error());
 
while($row = mysql_fetch_assoc($results)) {
        $sql = '';
 
        if(strpos($row['body'],'txp:image') === false) {
                continue;
        } else {
                $new_body = $row['body'];
                echo $row['title']."\n";
                // the character class uses A-Z (not A-z) so it only allows letters, digits, =, quotes and spaces
                preg_match_all('/<txp:image id="([0-9]*)" [0-9a-zA-Z=" ]*\/>/',$row['body'],$matches,PREG_OFFSET_CAPTURE);
                foreach($matches[1] as $key=>$match) {
                        $replacement = '<img src="/uploads/'.$image_properties[$match[0]]['category'].'/'.$image_properties[$match[0]]['name'].'" />';
                        $new_body = preg_replace('/<txp:image id="([0-9]*)" [0-9a-zA-Z=" ]*\/>/',$replacement,$new_body,1);
//                      echo $new_body."\n";
                }
 
                // now update the actual row
                $sql = 'update serendipity_entries set body = "'.mysql_real_escape_string($new_body).'" where id = '.$row['id'];
                mysql_query($sql,$db) or die(mysql_error());
        }
 
 
}
 
?>

It took a few attempts, mostly because I kept getting the same image multiple times on posts where there was more than one image.
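
The repeated-image problem can be sidestepped by letting PHP hand each tag to a callback, so every match is replaced with its own image in a single pass. This is a minimal sketch using preg_replace_callback(); the function name replace_txp_images(), the looser [^>]* pattern and the sample data are all assumptions for illustration, not the code I actually ran.

```php
<?php
// Hypothetical helper: swap each <txp:image id="N" ... /> tag for a plain <img> tag.
// $images maps a textpattern image id to its txp_image row ('category' and 'name').
function replace_txp_images(string $body, array $images): string
{
    // Assumed pattern: the id attribute comes first, anything else up to "/>" is ignored.
    $pattern = '/<txp:image id="([0-9]+)"[^>]*\/>/';

    return preg_replace_callback($pattern, function (array $m) use ($images) {
        $id = $m[1];
        if (!isset($images[$id])) {
            return $m[0]; // unknown id: leave the original tag untouched
        }
        $src = '/uploads/' . $images[$id]['category'] . '/' . $images[$id]['name'];
        return '<img src="' . htmlspecialchars($src) . '" />';
    }, $body);
}

// Made-up sample rows and body text.
$images = array(
    120 => array('category' => 'holiday', 'name' => 'beach.jpg'),
    121 => array('category' => 'cats', 'name' => 'tabby.jpg'),
);
$body = 'One <txp:image id="120" /> and two <txp:image id="121" />.';
echo replace_txp_images($body, $images), "\n";
```

Because the callback sees the id of the tag it is replacing, there is no need to limit each replacement to one or to rely on the order of the matches.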

Entities

The way that textpattern stores character entities gave me a problem when they were then imported into serendipity, and I had &#8217; and other such "features" all over the place. With a lot of swearing and some help from my friend (thanks Sara!) I eventually got it untangled using this:



<?php
 
echo "ready\n";
 
$db = mysql_connect('localhost','username','password');
 
mysql_select_db('serendipity',$db);
 
echo "connected\n";
 
$sql = "select * from serendipity_comments";
 
$results = mysql_query($sql, $db) or die(mysql_error());
 
$table = array( "&#8217;" => "'",
                "&#8211;" => "-",
                "&#8220;" => "\"",
                "&#8221;" => "\"",
                "&#8230;" => "..."
        );
 
while($row = mysql_fetch_assoc($results)) {
 
        $sql = 'update serendipity_comments set body = "'.mysql_real_escape_string(str_replace(array_keys($table), array_values($table),$row['body'])).'"
                where id='.$row['id'];
        echo $sql."\n";
        mysql_query($sql,$db) or die(mysql_error());
}
 
?>

There might be a few others that got through but putting these in certainly sorted out the majority of the problems with the encoding. I'm a little unclear why textpattern and serendipity couldn't communicate on this front but the main thing is that I did get most of my content through.
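
For the stragglers, PHP's html_entity_decode() can decode every numeric entity in one go rather than listing them by hand. This is a minimal sketch, not what I ran: the wrapper name decode_entities() is hypothetical, and unlike my lookup table it produces the real curly quotes and dashes as UTF-8 rather than plain ASCII stand-ins, so it assumes the database and templates are happy with UTF-8.

```php
<?php
// Hypothetical wrapper: decode all named and numeric entities to UTF-8 characters.
function decode_entities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Made-up sample using the same entities as the lookup table.
$sample = 'It&#8217;s &#8220;done&#8221; &#8211; mostly&#8230;';
echo decode_entities($sample), "\n";
```

The result would be fed through mysql_real_escape_string() and the same update statement in place of the str_replace() call.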

Vertical Space

The spacing problems that I had early on I "fixed" by running my posts through a script to remove a load of whitespace. Actually, I was seeing this output because I had the nl2br plugin turned on. Sadly, now that I've reformatted all my existing posts, I can't turn it off without re-reformatting them, so I'm just living with what I have for the time being!
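
To show the kind of thing that was happening, here is a tiny sketch using PHP's built-in nl2br() function. It assumes the serendipity plugin behaves like that function and that the imported bodies already contained HTML paragraphs separated by blank lines; the sample text is made up.

```php
<?php
// Assumed sample: an imported body that already has <p> tags, separated by a blank line.
$stored = "<p>First paragraph.</p>\n\n<p>Second paragraph.</p>";

// nl2br() inserts <br /> before every newline, so the blank line turns into two extra breaks.
$rendered = nl2br($stored);
echo $rendered, "\n";
```

The paragraphs were already spaced by their own tags, so the added breaks show up as extra vertical space between them.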

Import Scripts and One-Shots

I've had quite a bit of experience doing data import routines and I applied some of the same principles to solving the problems I experienced with this migration from textpattern to serendipity. I'm really happy with the new platform, and I'd like to hang on to some of the stuff that I used for this, so I've stored it here. If it helps you too, then great :)
