Combining PDF Files With Pdftk

I'm currently delivering all my talks with PDF format slides, using Jakob's PDF Presenter Console, which is awesome but lacks a "goto slide" button and is a little slow to click forward. It doesn't matter for a short talk but I had 200+ slides for my ZCE preparation tutorial at the Dutch PHP Conference and I was concerned about losing my place! Therefore I split my slides up into several decks, but still need to publish them as a whole.

For years I've used PDF Shuffler for this sort of thing, but this time I wondered if there was an easy way of doing it from the command line, since I literally wanted to glue together a bunch of files one after another. Predictably, there is, and it's called pdftk - the PDF Toolkit. It does exactly what I need, but I initially stumbled at supplying all the PDFs in order. My files are called 1-intro.pdf, 2-basics.pdf, and so on. Which is fine, but there are 14 of them and when sorted alphabetically, 9-oop.pdf is the last entry :) I untangled this with a numeric sort:

ls *-*.pdf | sort -n > files.txt

One vim macro later and I had them all on one line (yes, I realise there must be a better way to do this, leave me a comment and tell me what it is!), and so I could pass them into pdftk:

pdftk 1-intro.pdf 2-basics.pdf 3-strings.pdf 4-arrays.pdf 5-functions.pdf 6-files.pdf 7-config.pdf 8-qstyles.pdf 9-oop.pdf 10-http.pdf 11-api-data.pdf 12-security.pdf 13-databases.pdf 14-tips.pdf cat output all.pdf
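
The sorting and the pdftk call can also be combined, so the list never has to be pasted by hand. This is only a minimal sketch, assuming the filenames start with a number and contain no spaces; the sorted_decks function name and the guard around the pdftk call are my own additions for illustration:

```shell
#!/usr/bin/env bash

# Print the numbered decks in numeric order, one per line.
# Assumes names like 9-oop.pdf, with no spaces or newlines in them.
sorted_decks() {
    printf '%s\n' *-*.pdf | sort -n
}

# Only run when pdftk is installed and at least one deck matched the pattern.
if command -v pdftk >/dev/null 2>&1 && [ -e "$(sorted_decks | head -n 1)" ]; then
    # Left unquoted on purpose: the shell splits the list into one argument per file.
    pdftk $(sorted_decks) cat output all.pdf
fi
```

Because the output file is called all.pdf, it doesn't match the *-*.pdf pattern and won't be picked up as an input if the script is run a second time.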

So there you have it, a great little tool that I will immediately forget the name of so hopefully I'll remember to come and read my blog to remember what to do ...

3 thoughts on “Combining PDF Files With Pdftk”

  1. [geshi lang=bash]ls *-*.pdf | sort -n | xargs[/geshi]

    will combine the output into one line. You could also:

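    [geshi lang=bash]pdftk $(ls *-*.pdf | sort -n) cat output all.pdf[/geshi]

    to do everything all in one step. (On bsd xargs, -J % lets you pipe the list into a single pdftk call instead; -I % with gnu xargs won't do it, because it runs the command once per input line.)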

  2. pdftk is also rather handy for SEO when you're building sites which contain PDFs supplied by other people. You can use it to change the metadata for an existing PDF without having to rebuild the layout (which I've found is frequently a lossy process depending on how the PDF was authored.) Google uses the metadata title in the search results, so you can make your listings consistent ('dump_data' and 'update_info' are the switches you need.)
