After previous discussion:
I went ahead and made a nice PDF from the existing online PDFs. It’s nice! You can read it and learn things! It has OCR so you can search the text.
With thanks to whoever made the scans!
May I ask what software you used, Bhante? I have a few more book scans (as two-up JPEGs) that I've been sitting on and not PDFifying until I have a better process in place…
Thank you, Bhante, you made my day!
Thanks for the feedback, I didn’t even know if anyone else would really be interested.
Ha ha, if only it were a nice, clean process without false starts and massive mess-ups…
First I pulled the page images out of the PDF and inverted them:
pdfimages foo.pdf 1
mogrify -negate *.pbm
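For anyone repeating this, here is a minimal sketch that wraps those two steps in one function. It assumes poppler's `pdfimages` and ImageMagick's `mogrify` are installed; `extract_pages` and its arguments are made-up names. Running `pdfimages -list` first prints a `bpc` column, so you can confirm the scans really are 1-bit before extracting them as PBM:

```shell
#!/usr/bin/env bash
# Hypothetical helper (extract_pages is a made-up name): pull the page images
# out of a scanned PDF and invert them. Assumes poppler-utils and ImageMagick.
set -euo pipefail

extract_pages() {
  local pdf=$1 prefix=${2:-page}

  # List the embedded images; a bpc of 1 means a bilevel (1-bit) scan.
  pdfimages -list "$pdf"

  # Without -j/-png, 1-bit images are written as PBM: prefix-000.pbm, prefix-001.pbm, ...
  pdfimages "$pdf" "$prefix"

  # Invert every extracted page in place, naming each file only once.
  mogrify -negate "$prefix"-*.pbm
}
```

Called as `extract_pages foo.pdf 1`, it should leave the same `1-000.pbm`, `1-001.pbm`, … files as the two commands above.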
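What you probably want is the splitting script. I tried a bunch of things, and this bash script worked. It reads the .pbm files from the parent folder and writes the split pages to an out folder there. Save it as split.sh, make it executable and run bash split.sh: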
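#!/bin/bash
# Split each two-up scan into a left page and a right page.
inputdir=..               # folder holding the .pbm files from pdfimages
outputdir=$inputdir/out   # split pages land here as 001.pbm, 002.pbm, ...
mkdir -p "$outputdir"

cnt=0
for i in "$inputdir/"*.pbm; do
  # Cut the spread into two 50%-wide tiles (0.tmp = left, 1.tmp = right).
  # The PBM: prefix forces PBM output despite the .tmp extension, and
  # +repage clears the page offsets the crop leaves behind.
  if convert "$i" -crop 50%x100% +repage "PBM:$outputdir/%d.tmp"; then
    printf -v fname '%03d.pbm' $((++cnt))
    mv "$outputdir/0.tmp" "$outputdir/$fname"
    printf -v fname '%03d.pbm' $((++cnt))
    mv "$outputdir/1.tmp" "$outputdir/$fname"
  else
    echo "failed to convert $i" >&2
    exit 1
  fi
done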
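If a scan has an odd pixel width, cropping at 50% may leave a thin extra tile, and the script above only moves `0.tmp` and `1.tmp`. A sketch of an alternative, assuming ImageMagick 6's `convert`, uses the `2x1@` tile geometry, which asks for exactly two equal-width tiles. `page_names`, `split_spread` and `split_all` are hypothetical helpers, and the numbering assumes every scan is a left/right spread in reading order:

```shell
#!/usr/bin/env bash
# Alternative splitter sketch; page_names, split_spread and split_all are
# hypothetical names. Assumes ImageMagick 6 "convert" (use "magick" on IM 7).
set -euo pipefail

# Output names for spread number k (1-based): k=1 -> 001 002, k=2 -> 003 004.
page_names() {
  printf '%03d.pbm %03d.pbm\n' $((2 * $1 - 1)) $((2 * $1))
}

# Split one spread into its left and right page.
split_spread() {
  local in=$1 outdir=$2 k=$3 left right
  read -r left right < <(page_names "$k")
  # 2x1@ asks for 2 columns x 1 row of (nearly) equal tiles, so an odd width
  # still yields exactly two; +repage clears the offsets the crop leaves behind.
  convert "$in" -crop 2x1@ +repage "$outdir/tile-%d.pbm"
  mv "$outdir/tile-0.pbm" "$outdir/$left"
  mv "$outdir/tile-1.pbm" "$outdir/$right"
}

# Split every .pbm in inputdir, numbering pages in glob (alphabetical) order.
split_all() {
  local inputdir=$1 outputdir=$2 k=0 f
  mkdir -p "$outputdir"
  for f in "$inputdir"/*.pbm; do
    split_spread "$f" "$outputdir" $((++k))
  done
}
```

`split_all .. ../out` mirrors the folder layout of split.sh.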
I then had to edit and reorder the files, since not everything in the source PDFs came out properly. I also fixed the margins and skew by hand in almost all cases.
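Reordering is easier when the final names are regenerated afterwards. Here is a minimal sketch, with a hypothetical `renumber` function, that renames the files you list, in the order you list them, to a gap-free `001.pbm`, `002.pbm`, …; it stages the moves in a temporary folder so no new name clobbers an old file. Delete unwanted pages first, because files you leave out stay in place and may be overwritten:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename the listed files, in the order given, to a
# gap-free 001.pbm, 002.pbm, ... inside the same folder.
set -euo pipefail

renumber() {
  local dir=$1
  shift
  local staging name f n=0
  (( $# > 0 )) || return 0

  # Stage every move in a scratch folder first, so writing 001.pbm can never
  # clobber an old file that still has to be moved.
  staging=$(mktemp -d "$dir/.renumber.XXXXXX")
  for f in "$@"; do
    printf -v name '%03d.pbm' $((++n))
    mv "$dir/$f" "$staging/$name"
  done

  # Files not listed stay in $dir and are overwritten if they share a new name.
  mv "$staging"/*.pbm "$dir"/
  rmdir "$staging"
}
```

For example, `renumber out 002.pbm 001.pbm 003.pbm` swaps the first two pages and keeps the third.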
Then, to make the PDF, I did something like:
convert *.pbm -resize 1800x2700 -compose Copy -gravity center -extent 1800x2700 -units PixelsPerInch -density 150 foo.pdf
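The `-density` value decides the physical page size: 1800 x 2700 pixels at 150 pixels per inch is a 12 x 18 inch page (1800 / 150 = 12, 2700 / 150 = 18), while the same canvas at `-density 300` would be 6 x 9 inches. A sketch with hypothetical `to_px`, `canvas` and `make_pdf` helpers that works the other way round, from a chosen trim size and DPI to the canvas passed to `-resize` and `-extent` (assumes ImageMagick 6 `convert` and `awk`):

```shell
#!/usr/bin/env bash
# Sketch: derive the -resize/-extent canvas from a trim size (inches) and DPI.
# to_px, canvas and make_pdf are hypothetical names; assumes "convert" and awk.
set -euo pipefail

# Inches x DPI, rounded to the nearest whole pixel.
to_px() {
  awk -v inches="$1" -v dpi="$2" 'BEGIN { printf "%d\n", inches * dpi + 0.5 }'
}

# Print WxH for a page of width_in x height_in at the given DPI.
canvas() {
  printf '%sx%s\n' "$(to_px "$1" "$3")" "$(to_px "$2" "$3")"
}

# make_pdf OUT WIDTH_IN HEIGHT_IN DPI PAGE...
make_pdf() {
  local out=$1 width_in=$2 height_in=$3 dpi=$4 size
  shift 4
  size=$(canvas "$width_in" "$height_in" "$dpi")
  # Fit each page inside the canvas, then pad it to exactly that size, centred.
  convert "$@" -resize "$size" -compose Copy -gravity center -extent "$size" \
    -units PixelsPerInch -density "$dpi" "$out"
}
```

`make_pdf foo.pdf 6 9 300 *.pbm` would build the same 1800x2700 canvas as the command above but mark it as 300 DPI, i.e. a 6 x 9 inch page.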
Add some metadata:
exiftool -Title="A Critical Analysis of the Sutta Nipata" -Author="N. A. Jayawickrama" -Subject="Buddhism" foo.pdf
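A sketch that sets the tags and reads them back, assuming `exiftool` is installed (`tag_pdf` is a made-up name). By default exiftool keeps a `foo.pdf_original` backup, which `-overwrite_original` suppresses. As an alternative, `ocrmypdf` also accepts `--title`, `--author` and `--subject`, so the tags could be set during the OCR step instead:

```shell
#!/usr/bin/env bash
# Sketch (tag_pdf is a hypothetical name): write Title/Author/Subject into a
# PDF with exiftool, then print them back for a quick check.
set -euo pipefail

tag_pdf() {
  local pdf=$1 title=$2 author=$3 subject=$4
  # -overwrite_original: edit in place without leaving foo.pdf_original behind.
  exiftool -overwrite_original \
    -Title="$title" -Author="$author" -Subject="$subject" "$pdf"
  # Read the three tags back so you can see what was written.
  exiftool -Title -Author -Subject "$pdf"
}
```

Usage: `tag_pdf foo.pdf "A Critical Analysis of the Sutta Nipata" "N. A. Jayawickrama" Buddhism`.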
OCR and optimize for size:
ocrmypdf -l eng --deskew foo.pdf foo-ocr.pdf
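A small wrapper sketch around that command, assuming `ocrmypdf` plus poppler's `pdftotext` for a quick check that the text layer is searchable; `ocr_pdf` and `first_page_text` are hypothetical names. `--optimize 1` just spells out ocrmypdf's default, lossless level; levels 2 and 3 shrink the file further with lossy settings:

```shell
#!/usr/bin/env bash
# Sketch: OCR a scanned PDF and peek at the recognized text.
# ocr_pdf and first_page_text are hypothetical names.
set -euo pipefail

# ocr_pdf IN [OUT] [LANG]; OUT defaults to IN with -ocr before .pdf.
ocr_pdf() {
  local in=$1 out=${2:-${1%.pdf}-ocr.pdf} lang=${3:-eng}
  # --deskew straightens each page before OCR; --optimize 1 is the lossless default.
  ocrmypdf -l "$lang" --deskew --optimize 1 "$in" "$out"
  echo "$out"
}

# Print the text layer of page 1 to stdout (empty output means no OCR text).
first_page_text() {
  pdftotext -f 1 -l 1 "$1" -
}
```

`ocr_pdf foo.pdf` writes `foo-ocr.pdf`; `first_page_text foo-ocr.pdf` should then print recognized text rather than nothing.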
That’s it!
Tip: if you have JPEGs, you might try converting them to .pbm or similar before making the PDF. I find that the image files are bigger, but the resulting PDF is smaller.
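A minimal sketch of that conversion for a folder of JPEG scans, assuming ImageMagick 6's `convert`; `jpeg_to_pbm` and the `THRESHOLD` variable are hypothetical, and 50% is only a starting point to tune per book. The smaller PDF is likely because 1-bit page images compress far more tightly than greyscale JPEG data:

```shell
#!/usr/bin/env bash
# Sketch: turn JPEG scans into 1-bit PBM files before splitting or PDFing.
# jpeg_to_pbm and THRESHOLD are hypothetical names; the 50% default is a guess.
set -euo pipefail

jpeg_to_pbm() {
  local threshold=${THRESHOLD:-50%} f
  for f in "$@"; do
    # Greyscale first, then a hard black/white cut; scan.jpg -> scan.pbm.
    convert "$f" -colorspace Gray -threshold "$threshold" "${f%.*}.pbm"
  done
}
```

`THRESHOLD=55% jpeg_to_pbm *.jpg` writes a `.pbm` next to each JPEG; two-up scans can then go through the splitting script.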
Wow… just ImageMagick everything by hand. Gangster. Well, thanks for the scripts! That'll get me started.
Pages 82-83 are missing, though…