zngguvnf's Blog

Processing multiple PDFs using the command line

<2017-12-01>

Updates: [2018-03-30 Fri], [2018-04-17 Tue]

From time to time I need to process lots of .pdf files.

Here are a few command-line calls that help me a lot:

Split pdf into single pages

Split one multi-page .pdf into separate .pdf files with one page each. Either of the following commands works:

pdftk PdfWithMultiplePages.pdf burst
qpdf --split-pages input.pdf output.pdf
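By default, pdftk burst names its output pg_0001.pdf, pg_0002.pdf, ... and qpdf inserts the page number before the extension of the output name. If you would rather collect the pages in their own folder, something like the following sketch might help. split_pages, run and DRY_RUN are names I made up for illustration; only --split-pages and the %d placeholder come from qpdf itself.

```shell
# Sketch: split a PDF into one file per page, collected in a separate directory.
# split_pages, run and DRY_RUN are illustrative names, not part of qpdf.

# Print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# usage: split_pages INPUT.pdf [OUTDIR]
split_pages() (
  input=$1
  outdir=${2:-pages}
  base=$(basename "$input" .pdf)
  # qpdf replaces %d in the output name with the zero-padded page number.
  run mkdir -p "$outdir" &&
    run qpdf --split-pages "$input" "$outdir/$base-%d.pdf"
)
```

Calling DRY_RUN=1 split_pages scan.pdf only prints the two commands, which is a cheap way to check the file names before anything is written.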

Merge pages into a single pdf

Merge multiple .pdf files, each with one or more pages, into a single .pdf.

To merge all .pdf files in the current directory into one single file:

pdftk ./*.pdf cat output PdfWithMultiplePages.pdf

Alternatively, you can type pdftk, mark all files you want to combine in your file manager, drag and drop them into your terminal and finish the command with cat output PdfWithMultiplePages.pdf

The same merge with pdfjam:

pdfjam ./*.pdf -o PdfWithMultiplePages.pdf
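One thing to watch with ./*.pdf: if the merged file from an earlier run is still in the directory, the glob picks it up as an input too. A minimal sketch that skips it could look like this. merge_pdfs is a made-up helper name, and PDFTK is a hypothetical override variable that lets you substitute another command (for example echo) to preview the call.

```shell
# Sketch: merge every PDF in the current directory except the output file itself.
# merge_pdfs and PDFTK are illustrative names, not part of pdftk.

# usage: merge_pdfs [OUTPUT.pdf]
merge_pdfs() (
  output=${1:-PdfWithMultiplePages.pdf}
  set --                                  # collect the inputs as positional parameters
  for f in ./*.pdf; do
    [ -e "$f" ] || continue               # the glob matched nothing
    [ "$f" = "./$output" ] && continue    # skip the result of an earlier run
    set -- "$@" "$f"
  done
  [ "$#" -gt 0 ] || { echo "no input PDFs found" >&2; exit 1; }
  ${PDFTK:-pdftk} "$@" cat output "$output"
)
```

The files are merged in the order the shell sorts them, so 10.pdf comes before 2.pdf; zero-padded names (02.pdf, 10.pdf) avoid surprises.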

Convert from DIN A3 (landscape) to DIN A4 (portrait)

Sometimes a .pdf is in DIN A3 (landscape) and each page looks like two DIN A4 pages side by side.

Use the following command to split those documents:

mutool poster -y 2 input.pdf output.pdf

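(-y sets the number of pieces each page is divided into vertically, -x the number of pieces horizontally. If the pages come out cut along the wrong axis, swap -y for -x.) mutool comes as part of mupdf (sudo apt install mupdf-tools).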

Convert to DIN A4

pdfjam --outfile filename.pdf --paper a4paper filename.pdf

Batch processing

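To convert all .pdf files, including those in subfolders, to A4 (in bash, ** only descends into subfolders once globstar is enabled):

shopt -s globstar  # bash only; omit in zsh, where ** is recursive by default
for f in ./**/*.pdf ; do
  pdfjam --outfile "$f" --paper a4paper "$f"
done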
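If your shell has no recursive ** glob, find can walk the subfolders instead. The following is only a sketch of that idea: a4_all is a made-up function name and PDFJAM is a hypothetical override variable, handy for previewing the calls with echo. Like the loop above, it overwrites each file in place, so keep a backup.

```shell
# Sketch: convert every .pdf below a directory to A4 using find instead of **.
# a4_all and PDFJAM are illustrative names, not part of pdfjam.

# usage: a4_all [DIR]     (DIR defaults to the current directory)
a4_all() {
  # find replaces each {} with the path of the file it is currently visiting.
  find "${1:-.}" -type f -name '*.pdf' \
    -exec ${PDFJAM:-pdfjam} --outfile {} --paper a4paper {} \;
}
```

Running PDFJAM=echo a4_all . lists the pdfjam calls that would be made without touching any file.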

Reduce file size of scanned PDF file

There is a question about this on Stack Exchange with a fantastic answer, which I reproduce here for reference:

Use the following ghostscript command:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
  • -dPDFSETTINGS=/screen: lowest quality, smallest size (images downsampled to 72 dpi).
  • -dPDFSETTINGS=/ebook: better quality, but slightly larger pdfs (150 dpi).
  • -dPDFSETTINGS=/printer: output similar to the Acrobat Distiller "Print Optimized" setting (300 dpi).
  • -dPDFSETTINGS=/prepress: output similar to the Acrobat Distiller "Prepress Optimized" setting (300 dpi).
  • -dPDFSETTINGS=/default: output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file.
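To see what a preset actually buys you, it helps to compare the file size before and after. Here is a rough sketch that wraps the command above. shrink_pdf and saved_percent are names I invented, and the default preset (ebook) is just my assumption of a reasonable middle ground.

```shell
# Sketch: compress a PDF with a chosen preset and report the size change.
# shrink_pdf and saved_percent are illustrative names, not Ghostscript features.

# Integer percentage saved when going from BEFORE bytes to AFTER bytes.
saved_percent() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# usage: shrink_pdf INPUT.pdf OUTPUT.pdf [PRESET]     (PRESET defaults to ebook)
shrink_pdf() (
  input=$1 output=$2 preset=${3:-ebook}
  case $preset in
    screen|ebook|printer|prepress|default) ;;
    *) echo "unknown preset: $preset" >&2; exit 2 ;;
  esac
  gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 "-dPDFSETTINGS=/$preset" \
     -dNOPAUSE -dQUIET -dBATCH "-sOutputFile=$output" "$input" || exit 1
  before=$(wc -c < "$input")
  after=$(wc -c < "$output")
  echo "$input: $before -> $after bytes ($(saved_percent "$before" "$after")% saved)"
)
```

For example, shrink_pdf scan.pdf scan_small.pdf screen tries the most aggressive preset; a negative percentage means the output got larger than the input.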

Remove password from pdf

qpdf --password=YourTopSecretPassword --decrypt password-protected-file.pdf file-without-password.pdf
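A password typed on the command line ends up in the shell history and is visible in the process list. Newer qpdf releases (the option was added in the 10.x series) can read it from a file instead. A minimal sketch, assuming such a version is installed; decrypt_pdf is a made-up name and QPDF a hypothetical override variable for previewing the call.

```shell
# Sketch: decrypt a PDF without putting the password on the command line.
# decrypt_pdf and QPDF are illustrative names; --password-file needs a recent qpdf.

# usage: decrypt_pdf INPUT.pdf OUTPUT.pdf PASSWORD_FILE
decrypt_pdf() {
  # qpdf uses the first line of PASSWORD_FILE as the password.
  ${QPDF:-qpdf} "--password-file=$3" --decrypt "$1" "$2"
}
```

Usage would be decrypt_pdf password-protected-file.pdf file-without-password.pdf secret.txt, with secret.txt readable only by you (chmod 600 secret.txt).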

Remove string from pdf

(This works for text that you can select in the pdf.)

First uncompress the streams so that the text becomes searchable:

qpdf --stream-data=uncompress YourFile.pdf uncompressed.pdf

Then replace 'Some Text' with whitespace:

sed 's/Some Text/ /g' < uncompressed.pdf > uncompressed_without_string.pdf

If you want to replace characters that are special to sed (such as brackets), escape them as described in the sed manual. Sometimes it helps to remove the expression in several smaller steps, but check that each pattern only matches where you actually want text removed.

Finally, compress the streams again:

qpdf --stream-data=compress uncompressed_without_string.pdf YourFile_free.pdf
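Escaping brackets and dots by hand is easy to get wrong, so here is a small sketch that does it for you. escape_sed_pattern and remove_string are hypothetical helper names. The helper only handles characters that are special in sed's basic regular expressions; it knows nothing about how a particular pdf stores its text.

```shell
# Sketch: treat the search string literally, so brackets, dots etc. match as plain text.
# escape_sed_pattern and remove_string are illustrative names.

escape_sed_pattern() {
  # Put a backslash in front of every character that is special in a basic
  # regular expression, and in front of "/", the delimiter used below.
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# usage: remove_string 'Some [Text]' < uncompressed.pdf > uncompressed_without_string.pdf
remove_string() {
  # LC_ALL=C makes sed work byte by byte, which is safer on binary pdf data.
  LC_ALL=C sed "s/$(escape_sed_pattern "$1")/ /g"
}
```

remove_string replaces the sed call in the middle step; the two qpdf calls around it stay the same.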

Comments

If you have comments, questions or opinions please drop me a line at blog AT zngguvnf dot org. Please tell me whether it's ok to publish your comment here or not.

https://zngguvnf.org by zngguvnf is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.