Commit 092c6e
2026-04-13 07:00:32 Anonymous: Initial commit| /dev/null .. pdf bloat investigation & fix cheatsheet.md | |
| @@ 0,0 1,116 @@ | |
| + | # PDF Bloat Investigation & Fix Cheatsheet |
| + | |
| + | ## Setup (Mac) |
| + | |
| + | ```shell |
| + | brew install poppler # pdfinfo, pdfimages, pdffonts |
| + | brew install qpdf |
| + | brew install ghostscript # gs |
| + | brew install mupdf # mutool |
| + | brew install pdfcpu |
| + | ``` |
| + | |
| + | ## Inspect PDF internals |
| + | |
| + | ### General info |
| + | ```shell |
| + | pdfinfo document.pdf |
| + | ``` |
| + | Shows metadata, page count, file size, and whether the file is optimized (linearized) |
| + | |
| + | ### List images |
| + | ```shell |
| + | pdfimages -list document.pdf |
| + | ``` |
| + | Check the number of images, their resolution (x-ppi, y-ppi columns), and compression (enc column) |
| + | |
| + | ### List fonts |
| + | ```shell |
| + | pdffonts document.pdf |
| + | ``` |
| + | Lists fonts and whether each is embedded and subsetted (emb, sub columns) |
| + | |
| + | ### Inspect objects |
| + | ```shell |
| + | qpdf --show-object=279 document.pdf | head -40 |
| + | ``` |
| + | View the dictionary of a specific object (279 is an example object number) |
| + | |
| + | ### Extract raw stream data |
| + | ```shell |
| + | qpdf --show-object=279 --raw-stream-data document.pdf > blob.bin |
| + | file blob.bin |
| + | ``` |
| + | Confirm type of a large embedded stream (e.g. MOV, ZIP, ODP) |
| + | |
| + | ------------------------------------------------------------ |
| + | |
| + | ## Shrinking PDFs |
| + | |
| + | ### Ghostscript (flatten to what you see) |
| + | ```shell |
| + | gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/screen \ |
| + | -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf |
| + | ``` |
| + | Very small, low quality (72 dpi) |
| + | |
| + | ```shell |
| + | gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/ebook \ |
| + | -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf |
| + | ``` |
| + | Medium quality (~150 dpi) |
| + | |
| + | ```shell |
| + | gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/printer \ |
| + | -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf |
| + | ``` |
| + | High quality (~300 dpi), good balance |
| + | |
| + | ```shell |
| + | gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/prepress \ |
| + | -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf |
| + | ``` |
| + | Highest quality (~300 dpi, preserves color), largest size |
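To choose between the four presets above, it can help to run all of them on the same input and compare the results. The following is a minimal sketch, not part of the original cheatsheet: the function name `try_presets` and the `small-<preset>.pdf` output names are made up for illustration, and the Ghostscript flags are the ones already used above.

```shell
# Sketch: run every PDFSETTINGS preset on one input and print the output sizes.
# try_presets and the small-<preset>.pdf names are hypothetical, chosen for this example.
try_presets() {
  in=$1
  for preset in screen ebook printer prepress; do
    out="small-$preset.pdf"
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/"$preset" \
       -dNOPAUSE -dBATCH -sOutputFile="$out" "$in" >/dev/null || return 1
    # wc -c prints the byte count; tr strips the padding some wc versions add
    printf '%s\t%s bytes\n' "$preset" "$(wc -c < "$out" | tr -d ' ')"
  done
}
```

Calling `try_presets document.pdf` leaves four files next to the input, so the smallest one that still looks acceptable can be kept.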
| + | |
| + | ------------------------------------------------------------ |
| + | |
| + | ### qpdf (lossless rebuild from pages) |
| + | ```shell |
| + | qpdf --split-pages=1 document.pdf page-%d.pdf |
| + | qpdf --empty --pages page-*.pdf -- clean.pdf |
| + | ``` |
| + | Useful to drop catalog baggage, but fails if a page directly references a huge embedded file |
| + | |
| + | ------------------------------------------------------------ |
| + | |
| + | ### pdfcpu (trim to page tree only) |
| + | ```shell |
| + | pdfcpu trim -pages 1- -- document.pdf clean.pdf |
| + | ``` |
| + | Keeps only visible pages, lossless, but fails with embedded multimedia references |
| + | |
| + | ------------------------------------------------------------ |
| + | |
| + | ### MuPDF |
| + | ```shell |
| + | mutool clean -ggg document.pdf clean-mu.pdf 1-13 |
| + | ``` |
| + | (Replace 13 with the number of pages in your PDF, or write the range as `1-N`; mutool reads `N` as the last page) |
| + | |
| + | Aggressive garbage collection & deduplication |
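When the page range has to be filled in from a script, the page count can be read from `pdfinfo`, which is already installed in the setup step. This is a small sketch under that assumption; `page_count` is a hypothetical helper name, not something the tools provide.

```shell
# Sketch: read the page count from pdfinfo so it can be passed to mutool.
# page_count is a hypothetical helper name.
page_count() {
  # pdfinfo prints a line such as "Pages:          13"
  pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Usage (needs poppler and mupdf installed):
#   mutool clean -ggg document.pdf clean-mu.pdf "1-$(page_count document.pdf)"
```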
| + | |
| + | ------------------------------------------------------------ |
| + | |
| + | ## LibreOffice export best practices |
| + | - Untick “Create hybrid file (embed ODF file)” |
| + | - Tick “Archive PDF/A-1a” |
| + | - Remove or replace video objects with screenshots before exporting |
| + | |
| + | Prevents hidden ODP/MOV files from inflating PDF size |
| + | |
| + | ------------------------------------------------------------ |
| + | |
| + | ## Rules of thumb |
| + | - Only want *visible content*: Ghostscript (/ebook or /printer) |
| + | - Lossless but smaller: qpdf split/merge, or pdfcpu trim |
| + | - Hidden video/ODP bloat: Ghostscript or proper LibreOffice export |
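Whichever route from the list above is taken, it is worth checking how much was actually saved. A minimal sketch, assuming only standard shell tools; `pdf_saving` is a hypothetical helper name.

```shell
# Sketch: report how much smaller the cleaned file is than the original.
# pdf_saving is a hypothetical helper name.
pdf_saving() {
  before=$(wc -c < "$1" | tr -d ' ')
  after=$(wc -c < "$2" | tr -d ' ')
  [ "$before" -gt 0 ] || { echo "empty input" >&2; return 1; }
  # integer percentage of bytes removed
  echo "$before -> $after bytes ($(( (before - after) * 100 / before ))% smaller)"
}
```

For example, `pdf_saving document.pdf clean.pdf` prints both sizes and the reduction in percent. A result near 0% after a lossless rebuild suggests the bloat is referenced from the pages themselves.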
