Improve EPUB Readability: How to Fix an Unreadable EPUB with a Bash Script

I've got an old EPUB that was not displayed properly on my device. The lines of some paragraphs ran off the edge of the screen, and to read them I had to copy-paste the text into a separate editor. Not the best UX I've seen. So I decided to fix the EPUB.

Step zero. Research

I checked whether EPUBs are easily fixable in general. It turns out that an EPUB is just a fancy zipped collection of HTML files. So there are many obvious candidates for the failure: non-responsive markup, fixed widths, broken CSS, etc. I am sure that our generative AI friends would be able to find another fifty reasons.

Step one. Analyze

Unpack the EPUB.

unzip /path-to-the-book/book.epub -d /path-to-the-book/unzipped

There's now a nice structure with some main components:

  • mimetype: a file containing the string "application/epub+zip";

  • META-INF: metadata;

  • OEBPS (or OPS): the content directory that holds most of the files, including all the HTML content files.
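For a quick overview of an unpacked book, a small helper along these lines could be handy. This is only a sketch: epub_summary is a name I made up, and it assumes the usual layout with a mimetype file and a META-INF/container.xml whose full-path attribute points to the package (.opf) file.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# print a short summary of an unpacked EPUB folder.
epub_summary() {
    local dir="$1"
    # The mimetype file should contain exactly "application/epub+zip"
    echo "mimetype: $(cat "$dir/mimetype")"
    # container.xml names the package (.opf) file in a full-path attribute
    grep -o 'full-path="[^"]*"' "$dir/META-INF/container.xml"
    # Count HTML/XHTML content files anywhere below the folder
    echo "content files: $(find "$dir" -type f \( -name '*.html' -o -name '*.xhtml' \) | wc -l | tr -d ' ')"
}
```

Called as epub_summary /path-to-the-book/unzipped, it prints the mimetype string, the location of the package file, and the number of content files.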

I could not find any absolute positioning issues. But look at that!

Non-breaking spaces turn neat text into one lump when they are put everywhere: the reader is not allowed to break a line at any of them.
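To see how bad the situation is in a given file, the occurrences can be counted. Below is a minimal sketch, assuming UTF-8 encoded files; count_nbsp is a hypothetical helper name. It counts both the &nbsp; entity and the raw NBSP character (U+00A0).

```shell
#!/bin/bash
# Hypothetical helper: count NBSP occurrences (entity and raw character) in one file.
count_nbsp() {
    local file="$1"
    local nbsp
    # Raw NBSP as UTF-8 bytes 0xC2 0xA0 (octal 302 240); assumes UTF-8 files
    nbsp=$(printf '\302\240')
    # -a: treat the file as text, -F: fixed strings,
    # -o: print every match on its own line so that wc -l counts matches
    grep -a -o -F -e '&nbsp;' -e "$nbsp" "$file" | wc -l | tr -d ' '
}
```

A loop such as for f in OEBPS/*.html; do echo "$f: $(count_nbsp "$f")"; done then shows which files are affected the most.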

Step two. Solve

If it were only one file, I'd just replace the NBSPs in Sublime. But I had 25 HTML files, and that is too much manual work. Time to automate.

First: create the script file and make it executable.

touch replace_nbsp.sh     
chmod +x replace_nbsp.sh
open replace_nbsp.sh

Second: read the input folder

#!/bin/bash

# Check if folder path is provided as a command line argument.
# I first hardcoded the path, but it's embarrassing to put such code on the internet
if [ -z "$1" ]; then
    echo "Please provide the folder path as a command line argument"
    exit 1
fi

folder_path="$1"
#...

Third: iterate through all files. sed replaces everything that looks like an NBSP in the HTML files.

#...

# Iterate through each HTML file in the folder
# It's not recursive so you need to specify path to OEBPS folder
for file in "$folder_path"/*.html; do
    echo "$file"
    # Check if the file exists
    if [ -f "$file" ]; then
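        # Replace non-breaking spaces with normal spaces in the file.
        # The second pattern targets the raw NBSP character (bytes 0xC2 0xA0 in UTF-8).
        # Pasted literally it is invisible because it is... a space,
        # so it is written as a bash escape here.
        # Note: -i '' is the BSD/macOS form; GNU sed takes -i without an argument.
        sed -i '' -e 's/&nbsp;/ /g' -e $'s/\xc2\xa0/ /g' "$file"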
        echo "Non-breaking spaces replaced in: $file"
    fi
done
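If the content files are spread over subfolders or use the .xhtml extension, a recursive variant might be more convenient. The following is a sketch and not the script I actually ran: replace_nbsp_recursive is a made-up name, the files are assumed to be UTF-8, and file names are assumed not to contain newlines. It writes through a temporary file instead of using sed -i, whose syntax differs between BSD/macOS and GNU sed.

```shell
#!/bin/bash
# Hypothetical variant: replace NBSPs in every .html/.xhtml file below a folder.
replace_nbsp_recursive() {
    local folder="$1"
    local nbsp
    # Raw NBSP as UTF-8 bytes 0xC2 0xA0 (octal 302 240); assumes UTF-8 files
    nbsp=$(printf '\302\240')
    find "$folder" -type f \( -name '*.html' -o -name '*.xhtml' \) |
    while IFS= read -r file; do
        # Write the result to a temporary file, then move it over the original
        sed -e 's/&nbsp;/ /g' -e "s/$nbsp/ /g" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
        echo "Non-breaking spaces replaced in: $file"
    done
}
```

With this variant the top-level unzipped folder can be passed directly, for example replace_nbsp_recursive /path-to-the-book/unzipped.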

NBSP is sometimes useful. In strings like "25 €" or "Mr. Cat", the two parts should be treated as one word when wrapping lines. But for my use case, as the book is not going to be published anywhere, it's fine to neglect that.

Step three. Compile the book

Can I just zip the folder back? No. Then the mimetype file gets compressed and eBook readers cannot handle it. I used an online validator to check for errors. Once I had the error code, things got simpler: there is a nice solution to that. Thanks, Stack Overflow, you saved my life again.

# Run these from inside the unzipped folder
# First zip the mimetype without compression (-X leaves out extra file attributes)
zip -X -0 file.epub mimetype
# Then add the rest
zip -X -9 -r file.epub META-INF OEBPS
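To check the result without an online validator, one can look at the first bytes of the archive. This is a rough sketch under an assumption: the mimetype entry was stored without an extra field (which, as far as I know, is what zip's -X option is for), so the entry name and its content sit right after the 30-byte ZIP local file header. check_epub_mimetype is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical check: is "mimetype" the first entry and stored uncompressed?
check_epub_mimetype() {
    local epub="$1"
    local first_entry
    # Skip the 30-byte local file header, then read the 8-byte entry name
    # plus the 20-byte content ("application/epub+zip")
    first_entry=$(dd if="$epub" bs=1 skip=30 count=28 2>/dev/null)
    if [ "$first_entry" = "mimetypeapplication/epub+zip" ]; then
        echo "mimetype OK"
    else
        echo "mimetype entry is missing, compressed, not first, or has an extra field"
        return 1
    fi
}
```

Running check_epub_mimetype file.epub right after zipping gives a quick yes/no answer. It is not a replacement for a full validator, since it only looks at the very first entry.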

Conclusion. Making it useful

I don't think I'll get such an issue with EPUB ever again. What I find interesting is discovering the principle on which EPUB is built. Some possible uses:

  • Write a personalised message on the front page ("To your 31st birthday from your cat-friend. May your house be always full of mice");

  • Add an Easter egg where Harry Potter in the first book cites Seneca;

  • Go corporate and customise the book using company colours.

I don't know if I will ever give an eBook to anyone as a present. But now I know how to make it fun.

Conclusion 2. Reflection

I want to end this article with the words of a great stoic. "Life is long, if you know how to use it". To that, I can only add that fixing EPUBs is definitely a worthy time investment.

P.S.

The whole issue should have been handled by the iBooks app, of course. I have no idea why line wrapping works so badly there.

The end