Adventures in Plain Text

Here I’ll share some useful learnings from the point of view of a qualitative social scientist about the virtues of using open source software and plain text formats such as markdown.

Using xclip, pandoc, calibre and mutt to create a Kindle file from an academic journal article

I love reading things on my Kindle, but as you’ll know if you’ve ever tried, it’s not great for reading PDFs. As academics mostly seem to read online journal articles in PDF form, this usually means printing things out. Fail.

With a bit of trial and error, I’ve come up with a quick and simple way to turn online academic journal articles into surprisingly clean mobi (Kindle) files.

Assumptions

Let’s assume for the moment you are running a Debian-based Linux distribution. I’m using Ubuntu. That’s a big assumption, I know. If you’re using Mac or Windows you can still do this stuff. It’s just that I’m not your guide.

Let’s also assume you have installed xclip, pandoc, calibre and mutt, all of which I found in the Ubuntu repository. Let’s also assume you’ve set up mutt, although this isn’t essential. Now we’re geeking!
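If you’re not sure whether everything is in place, a quick check like the one below can tell you. It’s only a minimal sketch: missing_tools is a name I’ve made up for illustration, and the apt line assumes the usual Ubuntu package names (ebook-convert is the command-line converter that ships with calibre).

```shell
#!/bin/sh
# Print the name of each command that cannot be found on the PATH.
# missing_tools is a hypothetical helper name, not part of any package.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# ebook-convert is the command-line converter that comes with calibre.
missing=$(missing_tools xclip pandoc ebook-convert mutt)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing:" $missing
    # Assumed package names for Ubuntu:
    echo "Try: sudo apt install xclip pandoc calibre mutt"
fi
```

If nothing is printed, all four tools were found.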

Now, we find ourselves looking at a potentially tasty article on Anthrosource. Instead of downloading the PDF, hit Full Text (HTML).

Anthrosource is helpfully giving us the option of a plainer alternative to the “enhanced” PDF. Great. So we hit the Go to old article view link in the top left-hand corner.

Once you’re there, position your cursor immediately to the left of the first word of the article title, hold down the left mouse or touchpad button, and hit the End key on your keyboard. Or just drag the mouse to the bottom. This will conveniently highlight the full body text and references of the article, alongside a couple of extraneous bits. Don’t worry about them.

Now we have our highlighted text we can copy it with Ctrl + C or right click, copy. Next, open a terminal and type:

xclip -selection clipboard -o -t text/html > Golden_Snail.html
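If the resulting file turns out empty, the clipboard may not be offering an HTML version of your selection (that depends on the application you copied from). xclip can list the formats on offer if you ask for the special TARGETS target. A small sketch, where offers_html is a hypothetical helper name of my own:

```shell
#!/bin/sh
# Succeeds if the list of clipboard targets on stdin (one per line)
# contains a line that is exactly text/html.
# offers_html is a hypothetical helper name for illustration.
offers_html() {
    grep -qx 'text/html'
}

# Usage, after copying the article in the browser:
#   xclip -selection clipboard -o -t TARGETS | offers_html \
#       && echo "HTML available" || echo "No HTML on the clipboard"
```

The helper only inspects the list of targets; it never touches the clipboard contents themselves.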

Now we have an html file with the contents of the article in it. The next step is to convert it into an epub file with pandoc, which is just about my favourite virtual thing.

pandoc Golden_Snail.html -o Golden_Snail.epub
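If you’d like the epub itself to carry a proper title, pandoc can set one at this stage. The following is just a sketch: html_to_epub is a hypothetical wrapper name and the title text is a placeholder, while -f, -t, --metadata and -o are ordinary pandoc options.

```shell
#!/bin/sh
# Convert an HTML file to epub, naming the input and output formats explicitly
# and embedding a title in the epub metadata.
# html_to_epub is a hypothetical wrapper name for illustration.
html_to_epub() {
    # $1 = base filename (no extension), $2 = title to embed in the epub
    pandoc -f html -t epub --metadata title="$2" "$1.html" -o "$1.epub"
}

# Example (the title text is a placeholder):
#   html_to_epub Golden_Snail "The Golden Snail"
```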

We then use calibre’s ebook-convert command to convert the open epub format to the less open mobi format required by Kindle.

ebook-convert Golden_Snail.epub Golden_Snail.mobi

Yay, we now have a mobi file. If you want to modify the file’s metadata so it won’t show up as “unknown” on your Kindle, use the --title option to set the ebook title, here to the same name as the file:

ebook-convert Golden_Snail.epub Golden_Snail.mobi --title="Golden_Snail"

If you want to get even funkier, use the --cover option to assign an image or URL as the cover image. ebook-convert offers many other options as well. How awesome are the Calibre developers? Donate to them if you are feeling it. I just did.
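To give a feel for combining these, here is a sketch that sets title, author and cover in one go. to_mobi is a hypothetical wrapper name, and the author name and cover file in the example are placeholders; --title, --authors and --cover are ebook-convert options.

```shell
#!/bin/sh
# Convert an epub to mobi, setting title, author and cover image at the same time.
# to_mobi is a hypothetical wrapper name for illustration.
to_mobi() {
    # $1 = base filename (no extension), $2 = author, $3 = path to a cover image
    ebook-convert "$1.epub" "$1.mobi" --title="$1" --authors="$2" --cover="$3"
}

# Example (author and cover file are placeholders):
#   to_mobi Golden_Snail "A. N. Author" cover.jpg
```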

At this point you can pat yourself on the back and transfer it to your device by attaching it to an email or by connecting your Kindle to the mothership with a USB cable. For extra coolness, use mutt to email it to your @kindle.com address via the command line.

echo "Golden_Snail.mobi" | mutt youraddress@kindle.com -a Golden_Snail.mobi

Once you’ve gotten your head around this process, automate it in a shell script thus:

#!/bin/sh
# Ask the user to name the file (no extension)
echo "Filename:"
read -r FILENAME
echo "Converting and emailing..."
# Let xclip do its thing, creating an HTML file from the clipboard
xclip -selection clipboard -o -t text/html > "$FILENAME.html"
# Have pandoc convert HTML to epub
pandoc "$FILENAME.html" -o "$FILENAME.epub"
# Have calibre's ebook-convert turn the epub into a mobi, setting the title so it doesn't show up as "unknown" on your Kindle!
ebook-convert "$FILENAME.epub" "$FILENAME.mobi" --title="$FILENAME"
# Have mutt email your Kindle address with the filename as subject and body
echo "$FILENAME" | mutt -s "$FILENAME" -a "$FILENAME.mobi" -- yourname@kindle.com

To spell that process out, put the above text in a file and call it something like xclipit.sh. Now go ahead and put that somewhere on your PATH. Mine resides in /usr/local/bin/sh. You’ll need to do some sudo-ing to achieve this. Then you need to make it executable with

chmod 755 xclipit.sh

Call this script by typing xclipit.sh in a terminal once you’ve done your select and copy, and it’ll ask you for a filename, then it’ll take care of the rest. That turns the whole affair into a two-step process. The result is a very readable mobi file with a few ignorable shortcomings. Win!
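One thing to watch: filenames containing spaces or punctuation can trip up shell scripts, so it may be worth tidying the name before using it. A minimal sketch, where safe_name is a hypothetical helper name of my own:

```shell
#!/bin/sh
# Turn an article title into a filename-friendly string: every run of characters
# other than letters, digits, dot, underscore or hyphen becomes a single underscore.
# safe_name is a hypothetical helper name for illustration.
safe_name() {
    printf '%s' "$1" | tr -cs 'A-Za-z0-9._-' '_'
}

# Example:
#   safe_name "The Golden Snail"    # prints The_Golden_Snail
```

In a script like the one above, you could run the typed-in name through this helper before building the .html, .epub and .mobi filenames.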
For me this is much nicer to read than the output of pdf-resizing tools such as k2pdfopt. And, to blow my own trumpet for a minute here, the mobi version of a paper from Antiquity I just generated was a heck of a lot better than the “send to my Kindle” pdf Cambridge Journals spewed out.
