A nice PDF of Jayawickrama’s A Critical Analysis of the Sutta Nipata

After previous discussion:

I went ahead and made a nice PDF from the existing online PDFs. It’s nice! You can read it and learn things! It has OCR so you can search the text.

With thanks to whoever made the scans! :pray:

May I ask what software you used, Bhante? :pray: I have a few more book scans (as two-up JPEGs) that I’ve been sitting on, not PDFifying them until I have a better process in place…

Thank you, Bhante, you made my day!

Thanks for the feedback, I didn’t even know if anyone else would really be interested.

Ha ha, if only it were a nice and clean process without false starts and massive messups …

  • extract images:
    • pdfimages foo.pdf 1
  • invert white on black images:
    • mogrify -negate *.pbm
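
If only some pages came out white-on-black, something like the sketch below could negate just those instead of the whole batch. It assumes poppler's pdfimages and ImageMagick's identify and mogrify are on the PATH. The 0.5 mean-brightness cutoff (a text page is mostly white, an inverted one mostly black) is my own guess and hasn't been tested on these scans, and the function names are made up.

```shell
#!/bin/bash
# Sketch: extract page images, then negate only pages that look inverted.
# Assumption: a mean brightness below 0.5 means "white text on black".

# Succeeds if a mean brightness in [0,1] looks like an inverted page.
is_inverted() {
    awk -v m="$1" 'BEGIN { exit !(m < 0.5) }'
}

# Hypothetical helper: extract images from PDF $1 as <root>-NNN.pbm
# (root defaults to "page"), then fix inverted pages in place.
extract_and_fix() {
    local pdf=$1 root=${2:-page} f mean
    pdfimages "$pdf" "$root" || return 1
    for f in "$root"-*.pbm; do
        [ -e "$f" ] || continue            # glob matched no files
        mean=$(identify -format '%[fx:mean]' "$f") || return 1
        if is_inverted "$mean"; then
            mogrify -negate "$f"           # overwrite the file in place
        fi
    done
}
```

Running extract_and_fix foo.pdf 1 would produce the same 1-NNN.pbm names as the pdfimages command above.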

What you probably want is the splitting script, which cuts each two-up scan into separate left and right pages. I tried a bunch of things, and this bash script worked.

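  • save it as split.sh and run bash split.sh (or make it executable with chmod +x split.sh and run ./split.sh); run it from a subfolder of the folder holding the .pbm files, since it reads from inputdir=../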
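#!/bin/bash
# Split every two-up .pbm scan into left and right pages, numbered 001.pbm, 002.pbm, ...

inputdir=../                 # folder holding the extracted scans
outputdir=$inputdir/out      # split pages go here
mkdir -p "$outputdir"

cnt=0
for i in "$inputdir/"*.pbm; do
    # Cut the scan into two tiles, each 50% wide and full height:
    # 0.tmp is the left page, 1.tmp the right page.
    # +repage drops the tile offsets ImageMagick would otherwise keep.
    if convert "$i" -crop 50%x100% +repage "$outputdir/%d.tmp"; then
        printf -v fname '%03d.pbm' $((++cnt))
        mv "$outputdir/0.tmp" "$outputdir/$fname"
        printf -v fname '%03d.pbm' $((++cnt))
        mv "$outputdir/1.tmp" "$outputdir/$fname"
    else
        echo "failed to convert $i" >&2
        exit 1
    fi
done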
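
Scans often start and end with single pages (covers, blank leaves) that shouldn't be cut in half. A variant that only splits images wider than they are tall, and copies the rest through, might look like this. "Wider than tall means two-up" is an assumption about the scans. It also swaps the 50%x100% crop for ImageMagick's -crop 2x1@, which divides the image into two equal tiles. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: split landscape (two-up) scans, copy portrait scans unchanged.
# Assumption: an image wider than it is tall holds two pages side by side.

# Succeeds if a W x H image looks like a two-page spread.
is_spread() {
    [ "$1" -gt "$2" ]
}

# Prints the zero-padded output name for page number $1.
page_name() {
    printf '%03d.pbm' "$1"
}

# Hypothetical helper: read *.pbm from $1, write 001.pbm, 002.pbm, ... to $2.
split_all() {
    local inputdir=$1 outputdir=$2 cnt=0 f size w h
    mkdir -p "$outputdir" || return 1
    for f in "$inputdir"/*.pbm; do
        [ -e "$f" ] || continue
        size=$(identify -format '%w %h' "$f") || return 1
        read -r w h <<< "$size"
        if is_spread "$w" "$h"; then
            # 2x1@ = two equal tiles; +repage drops leftover tile offsets.
            convert "$f" -crop 2x1@ +repage "$outputdir/half-%d.pbm" || return 1
            cnt=$((cnt + 1)); mv "$outputdir/half-0.pbm" "$outputdir/$(page_name "$cnt")"
            cnt=$((cnt + 1)); mv "$outputdir/half-1.pbm" "$outputdir/$(page_name "$cnt")"
        else
            cnt=$((cnt + 1)); cp "$f" "$outputdir/$(page_name "$cnt")"
        fi
    done
}
```

The counter is incremented outside the $(...) calls on purpose: an increment inside a command substitution would happen in a subshell and be lost. Usage would be split_all .. ../out.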

I then had to edit and reorder the files, since not everything in the source PDFs came out right. I also fixed the margins and skew by hand in almost all cases.
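
Once pages have been deleted or moved by hand, the numbering has gaps. A small helper like this hypothetical renumber function could close them up. It assumes every page file in the directory ends in .pbm and that the names already sort in the order you want (plain zero-padded numbers sort the same in any locale).

```shell
#!/bin/bash
# Sketch: renumber page files in a directory to 001.pbm, 002.pbm, ...
# keeping their current (sorted) order. Assumes all pages end in .pbm.
renumber() {
    local dir=$1 f i n=0
    local files=("$dir"/*.pbm)
    [ -e "${files[0]}" ] || return 0     # no .pbm files: nothing to do
    # Pass 1: move every file to a temporary name first, so no rename
    # can overwrite a page that has not been moved yet.
    for f in "${files[@]}"; do
        n=$((n + 1))
        mv "$f" "$dir/renum-$n.tmp" || return 1
    done
    # Pass 2: give each file its final zero-padded name.
    for ((i = 1; i <= n; i++)); do
        mv "$dir/renum-$i.tmp" "$(printf '%s/%03d.pbm' "$dir" "$i")" || return 1
    done
}
```

For example, renumber ../out after tidying up the split pages.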

Then, to make the PDF, I did something like:

convert *.pbm -resize 1800x2700 -compose Copy -gravity center -extent 1800x2700 -units PixelsPerInch -density 150 foo.pdf
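
On the numbers in that command: as I understand it, -density doesn't resample anything. It only records how many pixels make up an inch, and ImageMagick's PDF writer uses that to set the page size. So 1800x2700 pixels at 150 ppi should come out as a 12x18 inch page, while 300 would give 6x9. The two awk helpers below (names made up) just do that arithmetic.

```shell
#!/bin/bash
# Sketch: page size in inches = pixels / density (pixels per inch).

# Prints the page size for a W x H pixel page at DPI, e.g. "12x18 in".
page_size() {
    awk -v w="$1" -v h="$2" -v d="$3" 'BEGIN { printf "%gx%g in\n", w / d, h / d }'
}

# Prints the density needed for a page W pixels wide to be INCHES wide.
density_for() {
    awk -v w="$1" -v in_="$2" 'BEGIN { printf "%g\n", w / in_ }'
}
```

page_size 1800 2700 150 prints 12x18 in, and density_for 1800 6 prints 300.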

Add some metadata:

exiftool -Title="A Critical Analysis of the Sutta Nipata" -Author="N. A. Jayawickrama" -Subject="Buddhism" foo.pdf

OCR and optimize for size:

ocrmypdf -l eng --deskew foo.pdf foo-ocr.pdf

That’s it!
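
Put together, the last three steps could be wrapped in one function that stops at the first failure. This is a sketch: build_pdf and the -raw.pdf intermediate name are made up, and the options are copied from the commands above. -overwrite_original (a real exiftool option) is added so exiftool doesn't leave a foo.pdf_original backup behind.

```shell
#!/bin/bash
# Sketch: images -> PDF -> metadata -> OCR, stopping at the first error.
# Args: output PDF, title, author, subject, then page images in order.
build_pdf() {
    local out=$1 title=$2 author=$3 subject=$4
    local raw=${out%.pdf}-raw.pdf          # hypothetical intermediate file
    shift 4
    [ "$#" -gt 0 ] || { echo "no page images given" >&2; return 1; }
    convert "$@" -resize 1800x2700 -compose Copy -gravity center \
        -extent 1800x2700 -units PixelsPerInch -density 150 "$raw" || return 1
    exiftool -overwrite_original -Title="$title" -Author="$author" \
        -Subject="$subject" "$raw" || return 1
    ocrmypdf -l eng --deskew "$raw" "$out"
}
```

Usage: build_pdf foo-ocr.pdf "A Critical Analysis of the Sutta Nipata" "N. A. Jayawickrama" "Buddhism" out/*.pbm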

Tip: if you have JPEGs, you might try converting them to .pbm or similar before making the PDF. I find that the individual image files get bigger, but the resulting PDF is smaller.
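
For the JPEG route, a conversion loop could look like this. The -colorspace Gray -threshold 50% step is my assumption for getting clean black-and-white pages; lighter or darker scans may need a different percentage, and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: convert JPEG scans to 1-bit PBM files before building the PDF.
# Assumption: a 50% threshold separates text from paper on these scans.

# Prints the .pbm name for an image file, e.g. "scan 01.jpg" -> "scan 01.pbm".
pbm_name() {
    printf '%s.pbm' "${1%.*}"
}

# Convert each image given as an argument to black-and-white PBM.
jpgs_to_pbm() {
    local f
    for f in "$@"; do
        convert "$f" -colorspace Gray -threshold 50% "$(pbm_name "$f")" || return 1
    done
}
```

For two-up JPEGs, jpgs_to_pbm *.jpg could be run first, then the split script on the resulting .pbm files.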

Wow… just ImageMagick everything by hand :astonished: Gangster. :cowboy_hat_face: :pray: Well, thanks for the scripts! That’ll get me started. :smiling_face:

Pages 82-83 are missing, though…