PDF Bloat Investigation & Fix Cheatsheet

Setup (Mac)

brew install poppler  # pdfinfo, pdfimages, pdffonts
brew install qpdf
brew install ghostscript  # gs
brew install mupdf  # mutool
brew install pdfcpu
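
To confirm the tools above ended up on PATH, a small check along these lines may help. missing_tools is a hypothetical helper name used only for this sketch; it prints each command that cannot be found and nothing if all are present.

```shell
# Print every command from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing_tools pdfinfo pdfimages pdffonts qpdf gs mutool pdfcpu
```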

Inspect PDF internals

General info

pdfinfo document.pdf

Metadata, page count, file size, and whether the file is optimized (linearized)

List images

pdfimages -list document.pdf

Check number, resolution, compression
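
For long image lists, a filter like the one below can surface only the high-resolution images, which are the usual downsampling candidates. This is a sketch: hires_images is a hypothetical helper, and it assumes the usual poppler layout (two header lines, object number in column 11, x-ppi in column 13, size in column 15), which may differ between versions.

```shell
# Print images whose horizontal resolution exceeds a threshold (default 300 ppi).
# Reads `pdfimages -list` output on stdin.
# Assumed columns: 1 = page, 11 = object number, 13 = x-ppi, 15 = size.
hires_images() {
  awk -v max="${1:-300}" \
    'NR > 2 && $13 + 0 > max { print "page " $1 ", object " $11 ": " $13 " ppi, " $15 }'
}
```

Usage: pdfimages -list document.pdf | hires_images 300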

List fonts

pdffonts document.pdf

See embedded fonts
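
Fonts embedded in full (emb = yes, sub = no) tend to cost more space than subsets. The sketch below picks those out of the pdffonts listing. full_fonts is a hypothetical helper, and it assumes the emb and sub columns are the fifth and fourth fields from the end of each line and that font names contain no spaces.

```shell
# List fonts that are embedded in full (emb = yes, sub = no).
# Reads `pdffonts` output on stdin. Fields are counted from the end of the
# line because the type column can contain spaces (e.g. "Type 1C").
full_fonts() {
  awk 'NR > 2 && $(NF-4) == "yes" && $(NF-3) == "no" { print $1 }'
}
```

Usage: pdffonts document.pdf | full_fonts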

Inspect objects

qpdf --show-object=279 document.pdf | head -40

View dictionary of a specific object

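Extract stream data

qpdf --show-object=279 --filtered-stream-data document.pdf > blob.bin
file blob.bin

Confirm type of a large embedded stream (e.g. MOV, ZIP, ODP). --filtered-stream-data decodes the stream first; with --raw-stream-data you get the bytes exactly as stored, so file may only report compressed data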
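
When several objects look suspicious, the same extraction can be looped. A minimal sketch, assuming you already have candidate object numbers; dump_streams and the obj-N.bin naming are hypothetical. It uses --filtered-stream-data so the decoded bytes are written; a stream whose filters qpdf cannot decode will likely produce an empty file, in which case --raw-stream-data is the fallback.

```shell
# Dump each listed object to obj-<N>.bin and print its size in bytes.
# Usage: dump_streams document.pdf 279 280 ...
dump_streams() {
  pdf=$1
  shift
  for obj in "$@"; do
    # Write the decoded stream; errors from qpdf still go to stderr.
    qpdf --show-object="$obj" --filtered-stream-data "$pdf" > "obj-$obj.bin"
    printf '%s\t%s bytes\n' "obj-$obj.bin" "$(wc -c < "obj-$obj.bin" | tr -d ' ')"
  done
}
```

Afterwards, file obj-*.bin identifies each dump in one go.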


Shrinking PDFs

Ghostscript (flatten to what you see)

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/screen \
-dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf

Very small, low quality (72 dpi)

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/ebook \
-dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf

Medium quality (~150 dpi)

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/printer \
-dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf

High quality (~300 dpi), good balance

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/prepress \
-dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf

Highest quality (~300 dpi, color preserving), largest size
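
Since the four commands differ only in the preset, it can be convenient to run them all and compare sizes before picking one. A sketch under that assumption; try_presets and the output naming are hypothetical, and -dQUIET is added only to suppress Ghostscript's startup messages.

```shell
# Run every pdfwrite preset over one input and print the resulting sizes.
# Usage: try_presets document.pdf
try_presets() {
  src=$1
  for preset in screen ebook printer prepress; do
    out="${src%.pdf}-$preset.pdf"
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/"$preset" \
       -dNOPAUSE -dBATCH -dQUIET -sOutputFile="$out" "$src" || return 1
    printf '%s\t%s bytes\n' "$out" "$(wc -c < "$out" | tr -d ' ')"
  done
}
```

Each output still needs a visual check, since every preset is lossy for images.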


qpdf (lossless rebuild from pages)

qpdf --split-pages=1 document.pdf page-%d.pdf
qpdf --empty --pages page-*.pdf -- clean.pdf

Useful to drop catalog baggage, but does not shrink the file if a page directly references a huge embedded file
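
The same rebuild can probably be done without intermediate page files by pulling all pages (1-z) from the original into an empty document. A sketch, with rebuild_pdf as a hypothetical wrapper; --object-streams=generate is an extra lossless packing option, not something the two-step version above uses.

```shell
# Rebuild a PDF from its pages in one step and pack objects into object streams.
# Usage: rebuild_pdf document.pdf clean.pdf
rebuild_pdf() {
  qpdf --object-streams=generate --empty --pages "$1" 1-z -- "$2"
}
```

Usage: rebuild_pdf document.pdf clean.pdf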


pdfcpu (trim to page tree only)

pdfcpu trim -pages 1- -- document.pdf clean.pdf

Keeps only visible pages, lossless, but does not help when pages reference embedded multimedia


MuPDF

mutool clean -ggg document.pdf clean-mu.pdf 1-13

(Replace 13 with the number of pages in your PDF, or write the range as 1-N, where N stands for the last page)

Aggressive garbage collection & deduplication
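
To avoid hard-coding the page count, it can be read from pdfinfo. A minimal sketch; page_count is a hypothetical helper that assumes pdfinfo prints a line starting with "Pages:".

```shell
# Extract the page count from `pdfinfo` output on stdin.
page_count() {
  awk '/^Pages:/ { print $2 }'
}
```

Usage: mutool clean -ggg document.pdf clean-mu.pdf "1-$(pdfinfo document.pdf | page_count)"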


LibreOffice export best practices

  • Untick “Create hybrid file (embed ODF file)”
  • Tick “Archive PDF/A-1a”
  • Remove or replace video objects with screenshots before exporting

Prevents hidden ODP/MOV files from inflating PDF size


Rules of thumb

  • Only want visible content: Ghostscript (/ebook or /printer)
  • Lossless but smaller: qpdf split/merge, or pdfcpu trim
  • Hidden video/ODP bloat: Ghostscript or proper LibreOffice export
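
When more than one approach applies, comparing the candidates by size is a quick way to decide. A sketch, with smallest_pdf as a hypothetical helper; it only compares byte counts and says nothing about visual quality.

```shell
# Print the smallest existing file among the arguments, with its size in bytes.
# Usage: smallest_pdf small.pdf clean.pdf clean-mu.pdf
smallest_pdf() {
  best=
  best_size=
  for f in "$@"; do
    [ -f "$f" ] || continue            # skip candidates that were never produced
    size=$(wc -c < "$f" | tr -d ' ')
    if [ -z "$best" ] || [ "$size" -lt "$best_size" ]; then
      best=$f
      best_size=$size
    fi
  done
  [ -n "$best" ] && printf '%s\t%s bytes\n' "$best" "$best_size"
}
```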