PDF Bloat Investigation & Fix Cheatsheet
Setup (Mac)
brew install poppler      # pdfinfo, pdfimages, pdffonts
brew install qpdf
brew install ghostscript  # gs
brew install mupdf        # mutool
brew install pdfcpu
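A quick way to confirm that every tool landed on PATH is a small check like the one below. It is a sketch: `missing_pdf_tools` is a hypothetical helper name, and the tool list simply mirrors the commands used in this cheatsheet.

```shell
# Hypothetical helper: print the cheatsheet tools that are not on PATH.
# poppler provides pdfinfo/pdfimages/pdffonts, ghostscript provides gs,
# mupdf provides mutool.
missing_pdf_tools() {
  missing=""
  for tool in pdfinfo pdfimages pdffonts qpdf gs mutool pdfcpu; do
    # command -v succeeds only if the name resolves to something runnable
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the leading space; prints an empty line when nothing is missing
  printf '%s\n' "${missing# }"
}
```

Run `missing_pdf_tools` after the installs; an empty line means all tools were found.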
Inspect PDF internals
General info
pdfinfo document.pdf
Metadata, page count, file size, optimization status
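File size per page is a quick red flag: a mostly-text deck that weighs several MB per page is worth digging into. A minimal sketch that summarizes `pdfinfo` output, assuming its usual `Key: value` lines (`Pages:`, `File size: N bytes`, `Optimized:`); `pdf_summary` is a hypothetical name.

```shell
# Hypothetical helper: summarize `pdfinfo` output read from stdin.
# Assumes lines such as "Pages:  13" and "File size:  12345678 bytes".
pdf_summary() {
  awk -F': *' '
    /^Pages:/     { pages = $2 + 0 }
    /^File size:/ { bytes = $2 + 0 }   # "12345678 bytes" -> 12345678
    /^Optimized:/ { opt = $2 }
    END {
      per_page = (pages > 0) ? bytes / pages / 1024 : 0
      printf "pages=%d size_mb=%.1f kb_per_page=%.0f optimized=%s\n",
             pages, bytes / 1048576, per_page, opt
    }'
}
```

Usage: `pdfinfo document.pdf | pdf_summary`.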
List images
pdfimages -list document.pdf
Check the number of images, their resolution (x-ppi, y-ppi) and compression (enc, ratio)
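Images stored at far more resolution than the page needs are a common cause of bloat. A sketch that filters `pdfimages -list` output for images at or above a resolution threshold, largest first. It assumes the usual column order (`page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio`) and size values such as `21.4K` or `2.5M`; `big_images` is a hypothetical name.

```shell
# Hypothetical helper: list images at or above a resolution threshold,
# largest first. Reads `pdfimages -list` output on stdin; $1 = min x-ppi
# (default 300). Column positions assume the standard header.
big_images() {
  awk -v min="${1:-300}" '
    # Convert sizes such as "512B", "21.4K", "2.5M" to bytes
    function to_bytes(s,   n, unit) {
      n = s + 0
      unit = substr(s, length(s), 1)
      if (unit == "K") n *= 1024
      else if (unit == "M") n *= 1048576
      else if (unit == "G") n *= 1073741824
      return n
    }
    # Skip the two header lines; $11 = object, $13/$14 = ppi, $15 = size
    NR > 2 && $13 + 0 >= min {
      printf "%.0f page=%s obj=%s ppi=%sx%s size=%s\n", to_bytes($15), $1, $11, $13, $14, $15
    }' | sort -rn
}
```

Usage: `pdfimages -list document.pdf | big_images 300`. The `obj=` value can be passed straight to `qpdf --show-object`.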
List fonts
pdffonts document.pdf
See which fonts are embedded (emb) and subset (sub)
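A font embedded in full rather than as a subset (emb = yes, sub = no) carries every glyph, which can add megabytes for large CJK or Unicode fonts. A sketch that lists such fonts from `pdffonts` output. It reads the emb and sub columns counting from the right, assuming the standard trailing columns `emb sub uni object ID`, because the type column can contain spaces (e.g. `CID TrueType`); `full_fonts` is a hypothetical name.

```shell
# Hypothetical helper: print fonts that are embedded but NOT subset.
# stdin: `pdffonts` output. Columns are counted from the end of the line
# because the "type" column may contain spaces.
full_fonts() {
  awk 'NR > 2 && $(NF-4) == "yes" && $(NF-3) == "no" { print $1 }'
}
```

Usage: `pdffonts document.pdf | full_fonts`.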
Inspect objects
qpdf --show-object=279 document.pdf | head -40
View the dictionary of a specific object (279 is an example object number)
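To find which object number is worth inspecting, one rough heuristic is to sort the byte offsets reported by `qpdf --show-xref` and treat the gap between consecutive offsets as each object's approximate size. A sketch, assuming xref lines of the form `12/0: uncompressed; offset = 3456`; `largest_objects` is a hypothetical name. Objects packed inside object streams are not listed individually (only their container stream is), and the last object's span also includes the trailing xref data.

```shell
# Hypothetical helper: estimate object sizes from `qpdf --show-xref` output.
# stdin: xref listing; $1: total file size in bytes (to size the last object).
# Prints "approx_bytes object_number", largest first.
largest_objects() {
  awk '/: uncompressed; offset = / { split($1, id, "/"); print $NF, id[1] }' |
    sort -n |
    awk -v fsize="${1:-0}" '
      NR > 1 { print $1 - off, obj }    # gap from the previous object to this one
      { off = $1; obj = $2 }            # remember current offset and object
      END { if (NR > 0 && fsize + 0 > off + 0) print fsize - off, obj }' |
    sort -rn
}
```

Usage: `qpdf --show-xref document.pdf | largest_objects "$(wc -c < document.pdf)" | head -n 5`, then pass the top object numbers to `qpdf --show-object`.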
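Extract stream data
qpdf --show-object=279 --filtered-stream-data document.pdf > blob.bin
file blob.bin
Confirm the type of a large embedded stream (e.g. MOV, ZIP, ODP). --filtered-stream-data decodes the stream first; --raw-stream-data writes it still compressed, so file typically cannot identify the contents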
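To check several suspect objects at once, a loop can write each decoded stream to its own file and run `file` on it. A sketch; `dump_streams` is a hypothetical name, and it uses `--filtered-stream-data` so that `file` sees decoded bytes rather than compressed data.

```shell
# Hypothetical helper: dump each listed object's decoded stream and identify it.
# Usage: dump_streams document.pdf 279 312 ...
dump_streams() {
  pdf=$1
  shift
  for obj in "$@"; do
    out="obj-$obj.bin"
    qpdf --show-object="$obj" --filtered-stream-data "$pdf" > "$out"
    # Nothing extracted (not a stream, or not decodable): clean up and move on
    [ -s "$out" ] || { rm -f "$out"; continue; }
    printf '%s: %s\n' "$out" "$(file -b "$out")"
  done
}
```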
Shrinking PDFs
Ghostscript (flatten to what you see)
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/screen \
  -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
Very small, low quality (72 dpi)
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/ebook \
  -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
Medium quality (~150 dpi)
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/printer \
  -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
High quality (~300 dpi), good balance
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/prepress \
  -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
Very high quality (300 dpi like /printer, tuned for print production), larger size
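Because the presets trade size against quality differently for every document, it can be quicker to run all four and compare. A sketch reusing the options above plus -dQUIET to cut down console output; `gs_presets` is a hypothetical name, and the outputs are written next to the input as `<name>-screen.pdf`, `<name>-ebook.pdf`, and so on.

```shell
# Hypothetical helper: run all four presets and print the resulting sizes.
gs_presets() {
  in=$1
  for preset in screen ebook printer prepress; do
    out="${in%.pdf}-$preset.pdf"
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/"$preset" \
       -dNOPAUSE -dBATCH -dQUIET -sOutputFile="$out" "$in" || continue
    # wc -c pads with spaces on macOS, so strip them before printing
    printf '%-9s %10s bytes\n' "$preset" "$(wc -c < "$out" | tr -d ' ')"
  done
}
```

Usage: `gs_presets document.pdf`, then open the smallest candidates and check the images by eye.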
qpdf (lossless rebuild from pages)
qpdf --split-pages=1 document.pdf page-%d.pdf
qpdf --empty --pages page-*.pdf -- clean.pdf
Lossless; drops document-level baggage hanging off the catalog (e.g. attachments), but cannot remove a huge embedded file that a page references directly
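The two qpdf steps can be wrapped so the per-page files go to a temporary directory that is removed afterwards. A sketch; `qpdf_rebuild` is a hypothetical name. It relies on qpdf zero-padding the page number it substitutes for %d, so the shell glob lists pages in order. qpdf exits with status 3 when it only emitted warnings, which this sketch treats as a failure.

```shell
# Hypothetical helper: split into single pages in a temp dir, then merge them
# back into a fresh file. Usage: qpdf_rebuild document.pdf clean.pdf
qpdf_rebuild() {
  in=$1
  out=$2
  tmp=$(mktemp -d) || return 1
  # %d becomes a zero-padded page number, so the glob below sorts in page order
  qpdf --split-pages=1 "$in" "$tmp/page-%d.pdf" &&
    qpdf --empty --pages "$tmp"/page-*.pdf -- "$out"
  status=$?
  rm -rf "$tmp"            # always remove the per-page files
  return "$status"
}
```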
pdfcpu (trim to page tree only)
pdfcpu trim -pages 1- -- document.pdf clean.pdf
Lossless; keeps only the visible pages and what they reference, so it fails when the pages themselves reference embedded multimedia
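For a folder of exported decks, the trim command can be looped. A sketch written for sh/bash (zsh aborts on an unmatched glob by default); `pdfcpu_trim_all` is a hypothetical name and takes the output directory as its only argument (default `trimmed`).

```shell
# Hypothetical helper: trim every PDF in the current directory into an output dir.
pdfcpu_trim_all() {
  outdir=${1:-trimmed}
  mkdir -p "$outdir" || return 1
  for f in ./*.pdf; do
    [ -e "$f" ] || continue          # no match: the glob stays literal
    pdfcpu trim -pages 1- "$f" "$outdir/${f##*/}" || echo "failed: $f" >&2
  done
}
```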
MuPDF
mutool clean -ggg document.pdf clean-mu.pdf 1-N
(In MuPDF page ranges N means the last page, so 1-N selects every page)
-ggg: garbage-collect unused objects, compact the xref table and merge duplicate objects
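To see whether the clean was worth it, a small wrapper can compare sizes before and after. A sketch; `mutool_shrink` is a hypothetical name, and it passes the range 1-N so every page is kept.

```shell
# Hypothetical helper: run mutool clean -ggg over all pages and report savings.
# Usage: mutool_shrink document.pdf [output.pdf]  (default: document-mu.pdf)
mutool_shrink() {
  in=$1
  out=${2:-${1%.pdf}-mu.pdf}
  mutool clean -ggg "$in" "$out" 1-N || return 1
  before=$(wc -c < "$in")
  after=$(wc -c < "$out")
  # awk handles the percentage (and the padded wc output on macOS)
  awk -v b="$before" -v a="$after" 'BEGIN {
    pct = (b > 0) ? 100 * (b - a) / b : 0
    printf "%d -> %d bytes (%.1f%% smaller)\n", b, a, pct
  }'
}
```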
LibreOffice export best practices
- Untick “Create hybrid file (embed ODF file)”
- Tick “Archive PDF/A-1a”
- Remove or replace video objects with screenshots before exporting
Prevents hidden ODP/MOV files from inflating PDF size
Rules of thumb
- Only want visible content: Ghostscript (/ebook or /printer)
- Lossless but smaller: qpdf split/merge, or pdfcpu trim
- Hidden video/ODP bloat: Ghostscript or proper LibreOffice export
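Since the best tool depends on where the bloat lives, one pragmatic approach is to run several of the commands above and keep whichever output is smallest, after checking it visually (Ghostscript output in particular can lose image quality). A minimal sketch; `smallest_pdf` is a hypothetical helper that only compares file sizes.

```shell
# Hypothetical helper: print whichever of the given files is smallest.
# Missing files are skipped; nothing is printed if none exist.
smallest_pdf() {
  best=""
  best_size=""
  for f in "$@"; do
    [ -f "$f" ] || continue
    size=$(wc -c < "$f" | tr -d ' ')
    if [ -z "$best" ] || [ "$size" -lt "$best_size" ]; then
      best=$f
      best_size=$size
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

Usage: `smallest_pdf small.pdf clean.pdf clean-mu.pdf`.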
