Commit 092c6e

2026-04-13 07:00:32 Anonymous: Initial commit
/dev/null .. pdf bloat investigation & fix cheatsheet.md
@@ 0,0 1,116 @@
+ # PDF Bloat Investigation & Fix Cheatsheet
+
+ ## Setup (Mac)
+
+ ```shell
+ brew install poppler # pdfinfo, pdfimages, pdffonts
+ brew install qpdf
+ brew install ghostscript # gs
+ brew install mupdf # mutool
+ brew install pdfcpu
+ ```
+
+ ## Inspect PDF internals
+
+ ### General info
+ ```shell
+ pdfinfo document.pdf
+ ```
+ Shows metadata, page count, file size, and whether the file is linearized (reported as `Optimized`)
+
+ ### List images
+ ```shell
+ pdfimages -list document.pdf
+ ```
+ Check how many images there are, their resolution (ppi), and their compression
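+
+ To spot oversized images quickly, the listing can be filtered by resolution. A minimal sketch, assuming the usual `pdfimages -list` layout (two header lines, object number in column 11, x-ppi in column 13, stored size in column 15); `big_images` and the 300 ppi default are illustrative choices, not part of poppler:
+
+ ```shell
+ # Print page, object number, x-ppi and stored size for images above a ppi threshold.
+ big_images() {
+   pdfimages -list "$1" |
+     awk -v max="${2:-300}" 'NR > 2 && $13 + 0 > max { print $1, $11, $13, $15 }'
+ }
+ # Usage: big_images document.pdf 300
+ ```
+
+ The object numbers it prints can be passed to `qpdf --show-object` for a closer look.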
+
+ ### List fonts
+ ```shell
+ pdffonts document.pdf
+ ```
+ Shows each font, its type, and whether it is embedded and subsetted
+
+ ### Inspect objects
+ ```shell
+ qpdf --show-object=279 document.pdf | head -40
+ ```
+ View the dictionary of a specific object (279 is an example object number)
+
+ ### Extract raw stream data
+ ```shell
+ qpdf --show-object=279 --raw-stream-data document.pdf > blob.bin
+ file blob.bin
+ ```
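+ Confirm the type of a large embedded stream (e.g. MOV, ZIP, ODP). If `file` only reports zlib-compressed data, repeat with `--filtered-stream-data` instead of `--raw-stream-data` so the stream is decoded first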
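+
+ When several objects are suspect, the same two commands can be looped. A sketch only: `identify_streams` and the `blob-N.bin` file names are made up for this cheatsheet, and the object numbers have to come from earlier inspection:
+
+ ```shell
+ # Dump each listed object's raw stream and report its size and detected type.
+ identify_streams() {
+   pdf=$1; shift
+   for obj in "$@"; do
+     qpdf --show-object="$obj" --raw-stream-data "$pdf" > "blob-$obj.bin"
+     size=$(wc -c < "blob-$obj.bin" | tr -d ' ')
+     printf '%s\t%s bytes\t%s\n' "$obj" "$size" "$(file -b "blob-$obj.bin")"
+   done
+ }
+ # Usage: identify_streams document.pdf 279 280
+ ```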
+
+ ------------------------------------------------------------
+
+ ## Shrinking PDFs
+
+ ### Ghostscript (flatten to what you see)
+ ```shell
+ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/screen \
+ -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
+ ```
+ Very small, low quality (72 dpi)
+
+ ```shell
+ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/ebook \
+ -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
+ ```
+ Medium quality (~150 dpi)
+
+ ```shell
+ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/printer \
+ -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
+ ```
+ High quality (~300 dpi), good balance
+
+ ```shell
+ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/prepress \
+ -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
+ ```
+ Highest quality (images also ~300 dpi, but colors are left unchanged), larger size
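+
+ Since the four commands differ only in the preset, they can be run in one loop to compare results side by side. A rough sketch; `gs_all_presets` and the `small-<preset>.pdf` naming are assumptions for illustration, and `-dQUIET` is added only to suppress Ghostscript's banner:
+
+ ```shell
+ # Write one output per preset so the size/quality trade-off can be compared.
+ gs_all_presets() {
+   for preset in screen ebook printer prepress; do
+     gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 "-dPDFSETTINGS=/$preset" \
+        -dNOPAUSE -dBATCH -dQUIET "-sOutputFile=small-$preset.pdf" "$1"
+   done
+   ls -lh "$1" small-*.pdf
+ }
+ # Usage: gs_all_presets document.pdf
+ ```
+
+ Open each result before picking one; the smallest file is not always readable enough.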
+
+ ------------------------------------------------------------
+
+ ### qpdf (lossless rebuild from pages)
+ ```shell
+ qpdf --split-pages=1 document.pdf page-%d.pdf
+ qpdf --empty --pages page-*.pdf -- clean.pdf
+ ```
+ Useful to drop catalog baggage, but does not help if a page directly references a huge embedded file (the file is copied along with the page)
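+
+ The split and merge can likely be collapsed into one step, because qpdf can copy a page range straight into an empty document (`z` stands for the last page). A sketch under that assumption; `qpdf_rebuild` is a hypothetical wrapper name:
+
+ ```shell
+ # Copy pages 1 through z (the last page) of the input into a new, empty PDF.
+ qpdf_rebuild() {
+   qpdf --empty --pages "$1" 1-z -- "$2"
+ }
+ # Usage: qpdf_rebuild document.pdf clean.pdf
+ ```
+
+ This avoids the temporary `page-*.pdf` files; whether it drops exactly the same baggage as the two-step version is worth checking on your own file.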
+
+ ------------------------------------------------------------
+
+ ### pdfcpu (trim to page tree only)
+ ```shell
+ pdfcpu trim -pages 1- -- document.pdf clean.pdf
+ ```
+ Keeps only the selected pages, lossless, but does not help when those pages reference embedded multimedia
+
+ ------------------------------------------------------------
+
+ ### MuPDF
+ ```shell
+ mutool clean -ggg document.pdf clean-mu.pdf 1-13
+ ```
+ (Replace 13 with the number of pages in your PDF)
+
+ Aggressive garbage collection & deduplication
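+
+ To avoid hardcoding the page count, it can be read from `pdfinfo` first. A small sketch; `mutool_clean_all` is an illustrative name, and it assumes `pdfinfo` prints a `Pages:` line as usual:
+
+ ```shell
+ # Read the page count from pdfinfo and pass the full range to mutool.
+ mutool_clean_all() {
+   pages=$(pdfinfo "$1" | awk '/^Pages:/ { print $2 }')
+   mutool clean -ggg "$1" "$2" "1-$pages"
+ }
+ # Usage: mutool_clean_all document.pdf clean-mu.pdf
+ ```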
+
+ ------------------------------------------------------------
+
+ ## LibreOffice export best practices
+ - Untick “Create hybrid file (embed ODF file)”
+ - Tick “Archive PDF/A-1a”
+ - Remove or replace video objects with screenshots before exporting
+
+ Prevents hidden ODP/MOV files from inflating PDF size
+
+ ------------------------------------------------------------
+
+ ## Rules of thumb
+ - Only want *visible content*: Ghostscript (/ebook or /printer)
+ - Lossless but smaller: qpdf split/merge, or pdfcpu trim
+ - Hidden video/ODP bloat: Ghostscript or proper LibreOffice export
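+
+ Whichever route is taken, it is worth confirming that the file actually got smaller. A minimal sketch using only `wc`; `pdf_savings` is a hypothetical helper, and it assumes the original file is not empty:
+
+ ```shell
+ # Report the size of the original and the result, plus the percentage saved.
+ pdf_savings() {
+   before=$(wc -c < "$1" | tr -d ' ')
+   after=$(wc -c < "$2" | tr -d ' ')
+   echo "$before -> $after bytes ($(( (before - after) * 100 / before ))% saved)"
+ }
+ # Usage: pdf_savings document.pdf clean.pdf
+ ```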