Using xclip, pandoc, calibre and mutt to create a Kindle file from an academic journal article
I love reading things on my Kindle, but as you’ll know if you’ve ever tried, it’s not great for reading PDFs. As academics mostly seem to read online journal articles in PDF form, this usually means printing things out. Fail.
With a bit of trial and error, I’ve come up with a quick and simple way to turn online academic journal articles into surprisingly clean mobi (Kindle) files.
Let’s assume for the moment that you are running a Debian-based Linux distribution. I’m using Ubuntu. That’s a big assumption, I know. If you’re using a Mac or Windows you can still do this stuff; I’m just not your guide.
Let’s also assume you have installed xclip, pandoc, calibre and mutt, all of which I found in the Ubuntu repositories, and that you’ve set up mutt to send email. That last part isn’t essential: mutt is only needed for the final emailing step. Now we’re geeking!
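Before going any further, it can be worth checking that everything is actually on your PATH. Here’s a minimal sketch of a shell function that does that. The name check_tools is hypothetical, and it looks for ebook-convert rather than calibre, since ebook-convert is the calibre command used below.

```shell
# Hypothetical helper: report which commands are missing from PATH.
# With no arguments it checks the four commands this workflow uses;
# ebook-convert is the command-line converter that comes with calibre.
check_tools() {
    if [ "$#" -eq 0 ]; then
        set -- xclip pandoc ebook-convert mutt
    fi
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing:$missing" >&2
        return 1
    fi
    echo "All tools found"
}

check_tools || echo "Install the missing packages before continuing."
```

On Ubuntu, something like sudo apt-get install xclip pandoc calibre mutt should pull in anything it reports as missing.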
Now, we find ourselves looking at a potentially tasty article on Anthrosource. Instead of downloading the PDF, hit Full Text (HTML).
Anthrosource is helpfully giving us the option of a plainer alternative to the “enhanced” PDF. Great. So we hit the Go to old article view link in the top left-hand corner.
Once you’re there, position your cursor immediately to the left of the first word of the article title, hold down the left mouse or touchpad button, and hit the End key on your keyboard. Or just drag the mouse to the bottom of the page. This will conveniently highlight the full body text and references of the article, along with a couple of extraneous bits. Don’t worry about them.
Now that the text is highlighted, copy it with Ctrl+C (or right-click and choose Copy). Next, open a terminal and type:
xclip -selection clipboard -o -t text/html > Golden_Snail.html
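If the clipboard doesn’t hold HTML (say, because the copy didn’t take), the command above won’t give you a usable file. xclip can list the formats the clipboard is offering via the special TARGETS target, so a quick guard might look like the sketch below. It assumes your browser offers text/html when you copy from a web page, which Firefox and Chrome generally do; has_html_target is a made-up helper name.

```shell
# Hypothetical helper: succeeds if a list of clipboard targets
# (one per line on stdin) includes text/html.
has_html_target() {
    grep -qx 'text/html'
}

# Ask the X clipboard which formats it can hand over, and only write
# the HTML file if text/html is among them.
if xclip -selection clipboard -o -t TARGETS 2>/dev/null | has_html_target; then
    xclip -selection clipboard -o -t text/html > Golden_Snail.html
else
    echo "Clipboard has no HTML; try copying the article text again." >&2
fi
```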
Now we have an html file with the contents of the article in it. The next step is to convert it into an epub file with pandoc, which is just about my favourite virtual thing.
pandoc Golden_Snail.html -o Golden_Snail.epub
We then use calibre’s ebook-convert tool to convert the open epub format to the proprietary mobi format required by the Kindle.
ebook-convert Golden_Snail.epub Golden_Snail.mobi
Yay, we now have a mobi file. So that it won’t show up as “unknown” on your Kindle, use ebook-convert’s --title option to set the ebook title; here it is simply the filename:
ebook-convert Golden_Snail.epub Golden_Snail.mobi --title="Golden_Snail"
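Underscores make for a slightly clunky title. A tiny helper (hypothetically named title_from_name) can turn the filename into something more readable before it’s handed to --title. This is just one way of doing it, using tr to swap underscores for spaces:

```shell
# Hypothetical helper: turn a file-friendly name such as Golden_Snail
# into a readable title ("Golden Snail") by replacing underscores with spaces.
title_from_name() {
    printf '%s\n' "$1" | tr '_' ' '
}
```

You’d then call something like ebook-convert Golden_Snail.epub Golden_Snail.mobi --title="$(title_from_name Golden_Snail)", which sets the title to Golden Snail.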
If you want to get even funkier, use ebook-convert’s cover option to assign an image or URL as the cover image. ebook-convert offers many other options as well. How awesome are the calibre developers? Donate to them if you are feeling it. I just did.
At this point you can pat yourself on the back and transfer the file to your device, either by attaching it to an email or by connecting your Kindle to the mothership with a USB cable. For extra coolness, use mutt to email it to your @kindle.com address from the command line.
echo "Golden_Snail.mobi" | mutt -s "Golden_Snail.mobi" -a Golden_Snail.mobi -- firstname.lastname@example.org
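One wrinkle worth knowing: mutt expects the list of files after -a to be ended with -- before the recipient addresses. The sketch below wraps the send in a function; send_to_kindle, KINDLE_ADDRESS and DRY_RUN are hypothetical names, not part of mutt, and DRY_RUN=1 just prints the mutt command so you can check it before anything goes out. It also assumes your sending address is on your Amazon account’s approved personal document list, since Amazon generally only accepts documents from approved senders.

```shell
# Hypothetical wrapper: email a file to a Kindle address with mutt.
# KINDLE_ADDRESS must hold your @kindle.com address; DRY_RUN=1 prints
# the command instead of sending.
send_to_kindle() {
    file=$1
    to=${KINDLE_ADDRESS:-}
    if [ -z "$to" ]; then
        echo "Set KINDLE_ADDRESS to your @kindle.com address." >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Show the command rather than sending anything
        echo "mutt -s $file -a $file -- $to"
        return 0
    fi
    # Subject and body are both the file name; "--" ends the attachment list
    echo "$file" | mutt -s "$file" -a "$file" -- "$to"
}
```

For example, KINDLE_ADDRESS=you@kindle.com DRY_RUN=1 send_to_kindle Golden_Snail.mobi prints the command; drop DRY_RUN=1 to actually send.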
Once you’ve gotten your head around this process, automate it in a shell script thus:
#!/bin/sh
# Stop at the first step that fails
set -e
# Ask user to name the file
echo "Filename:"
read -r FILENAME
echo "Converting and emailing..."
# Let xclip do its thing, creating an html file
xclip -selection clipboard -o -t text/html > "$FILENAME.html"
# Have pandoc convert html to epub
pandoc "$FILENAME.html" -o "$FILENAME.epub"
# Have calibre's ebook-convert convert epub to mobi, setting the title so it doesn't show up as "unknown" on your kindle!
ebook-convert "$FILENAME.epub" "$FILENAME.mobi" --title="$FILENAME"
# Have mutt email your kindle account with the filename as subject and body
echo "$FILENAME" | mutt -s "$FILENAME" -a "$FILENAME.mobi" -- email@example.com
To spell that process out, put the above text in a file and call it something like xclipit.sh. Then put that file somewhere on your PATH; mine lives in /usr/local/bin. You’ll need to do some sudo-ing to achieve this. Then make it executable with
chmod 755 xclipit.sh
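Alternatively, the copy and the chmod can be rolled into one step with the install command from coreutils, which copies a file and sets its mode at the same time. A minimal sketch, assuming /usr/local/bin as the destination; install_script and BIN_DIR are hypothetical names, with BIN_DIR handy for trying it out on a scratch directory first:

```shell
# Hypothetical helper: copy a script into a directory on PATH and make it
# executable in one go (the equivalent of cp followed by chmod 755).
install_script() {
    src=$1
    dir=${BIN_DIR:-/usr/local/bin}
    install -m 755 "$src" "$dir/$(basename "$src")"
}
```

For the script itself, the equivalent one-liner would be sudo install -m 755 xclipit.sh /usr/local/bin/.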
Once you’ve done your select and copy, run the script by typing xclipit.sh in a terminal. It’ll ask you for a filename, then take care of the rest. That turns the whole affair into a two-step process. The result is a very readable mobi file with a few ignorable shortcomings. Win!
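Because the script builds several filenames from whatever you type at the prompt, a name with slashes or odd punctuation in it can trip things up, and spaces are awkward on the command line anyway. One possible guard, sketched below under the assumption that you’re happy with underscores in place of spaces, is a hypothetical safe_name helper that cleans up the input and refuses an empty name:

```shell
# Hypothetical helper: turn whatever was typed at the "Filename:" prompt
# into a safe base name. Spaces become underscores and everything except
# letters, digits, dots, dashes and underscores is dropped.
safe_name() {
    name=$(printf '%s' "$1" | tr ' ' '_' | tr -cd 'A-Za-z0-9._-')
    # Refuse to continue with an empty name
    [ -n "$name" ] || return 1
    printf '%s\n' "$name"
}
```

In xclipit.sh it would go straight after the read line, e.g. FILENAME=$(safe_name "$FILENAME") || { echo "Please enter a filename." >&2; exit 1; }, with the function definition pasted near the top of the script.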
For me this is much nicer to read than the output of PDF-reflowing tools such as k2pdfopt. And, to blow my own trumpet for a minute, the mobi version of a paper from Antiquity I just generated was a heck of a lot better than the “Send to Kindle” PDF that Cambridge Journals spewed out.