# PDF Bloat Investigation & Fix Cheatsheet

## Setup (Mac)

```shell
brew install poppler      # pdfinfo, pdfimages, pdffonts
brew install qpdf
brew install ghostscript  # gs
brew install mupdf        # mutool
brew install pdfcpu
```

## Inspect PDF internals

### General info
```shell
pdfinfo document.pdf
```
Metadata, page count, file size, optimization status

### List images
```shell
pdfimages -list document.pdf
```
Check image count, resolution, and compression
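
When the image list is long, a small filter can surface only the oversized images. The sketch below is an assumption-laden helper, not part of poppler: the function name `list_highres_images` is hypothetical, and it assumes the usual `pdfimages -list` layout (two header lines, `x-ppi` in column 13, `size` in column 15), which is worth checking against your poppler version.

```shell
# Sketch: print images whose horizontal resolution exceeds a threshold.
# Assumes two header lines and the column order
# page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio.
list_highres_images() {
  local pdf="$1" min_ppi="${2:-300}"
  pdfimages -list "$pdf" |
    awk -v min="$min_ppi" \
      'NR > 2 && $13 + 0 > min + 0 { print "page " $1 ", object " $11 ": " $13 " ppi, " $15 }'
}
```

Calling `list_highres_images document.pdf 300` would list every image above 300 ppi together with its object number, which can then be passed to `qpdf --show-object`.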

### List fonts
```shell
pdffonts document.pdf
```
See which fonts are embedded

### Inspect objects
```shell
qpdf --show-object=279 document.pdf | head -40
```
View the dictionary of a specific object (here, object 279)

### Extract raw stream data
```shell
qpdf --show-object=279 --raw-stream-data document.pdf > blob.bin
file blob.bin
```
Confirm the type of a large embedded stream (e.g. MOV, ZIP, ODP)
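
If `file` only reports compressed data, the stream is probably still wrapped in its PDF filter. A minimal sketch, assuming the stream uses a filter qpdf can decode: the hypothetical helper `inspect_stream` uses qpdf's `--filtered-stream-data` instead of `--raw-stream-data` and prints the decoded size next to the detected type. Call it as `inspect_stream document.pdf 279`, where 279 is just the example object number used above.

```shell
# Sketch: dump one object's decoded stream, then report its size and type.
inspect_stream() {
  local pdf="$1" obj="$2" out="${3:-blob.bin}"
  # --filtered-stream-data asks qpdf to decode the stream filters first;
  # swap in --raw-stream-data to see the bytes exactly as stored.
  qpdf --show-object="$obj" --filtered-stream-data "$pdf" > "$out" || return 1
  printf 'object %s: %s bytes\n' "$obj" "$(wc -c < "$out" | tr -d ' ')"
  file "$out"
}
```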

------------------------------------------------------------

## Shrinking PDFs

### Ghostscript (flatten to what you see)
```shell
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/screen \
   -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
```
Very small, low quality (72 dpi)

```shell
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
```
Medium quality (~150 dpi)

```shell
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/printer \
   -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
```
High quality (~300 dpi), good balance

```shell
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/prepress \
   -dNOPAUSE -dBATCH -sOutputFile=small.pdf document.pdf
```
Highest quality (~300 dpi, color-preserving), larger size
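
The four presets differ only in `-dPDFSETTINGS`, so it can be quicker to render all of them once and compare sizes. This is a sketch rather than a recommended workflow: the function name `compare_gs_presets` and the `small-<preset>.pdf` output names are assumptions made here. Run it as `compare_gs_presets document.pdf`.

```shell
# Sketch: render the same input with each preset, then list the resulting sizes.
compare_gs_presets() {
  local input="$1" preset
  for preset in screen ebook printer prepress; do
    gs -q -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 \
       -dPDFSETTINGS="/$preset" -dNOPAUSE -dBATCH \
       -sOutputFile="small-$preset.pdf" "$input" || return 1
  done
  ls -l "$input" small-screen.pdf small-ebook.pdf small-printer.pdf small-prepress.pdf
}
```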

------------------------------------------------------------

### qpdf (lossless rebuild from pages)
```shell
qpdf --split-pages=1 document.pdf page-%d.pdf
qpdf --empty --pages page-*.pdf -- clean.pdf
```
Useful for dropping catalog baggage, but it fails to shrink the file if a page directly references a huge embedded file
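
The split and merge steps can be wrapped so the per-page files land in a temporary directory and the result is validated. A minimal sketch under stated assumptions: `rebuild_with_qpdf` is a hypothetical name, and the glob relies on qpdf zero-padding `%d` (as its manual describes) so that pages stay in order. Call it as `rebuild_with_qpdf document.pdf clean.pdf`.

```shell
# Sketch: split into single pages, merge them into a fresh file, then validate it.
rebuild_with_qpdf() {
  local input="$1" output="${2:-clean.pdf}" tmp rc
  tmp=$(mktemp -d) || return 1
  # Assumes qpdf zero-pads %d, so the glob below sorts pages correctly.
  qpdf --split-pages=1 "$input" "$tmp/page-%d.pdf" &&
    qpdf --empty --pages "$tmp"/page-*.pdf -- "$output" &&
    qpdf --check "$output"
  rc=$?
  rm -rf "$tmp"   # remove the per-page files whether or not the rebuild worked
  return "$rc"
}
```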

------------------------------------------------------------

### pdfcpu (trim to page tree only)
```shell
pdfcpu trim -pages 1- -- document.pdf clean.pdf
```
Keeps only the visible pages and is lossless, but fails when pages carry embedded multimedia references

------------------------------------------------------------

### MuPDF
```shell
mutool clean -ggg document.pdf clean-mu.pdf 1-13
```
(Replace 13 with the number of pages in your PDF)

Aggressive garbage collection & deduplication
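
To avoid editing the page range by hand, the page count can be read from `pdfinfo` first. This is a sketch, assuming `pdfinfo` prints a `Pages:` line as it normally does; the wrapper name `clean_with_mutool` is hypothetical. Call it as `clean_with_mutool document.pdf clean-mu.pdf`.

```shell
# Sketch: look up the page count, then run the same mutool command as above.
clean_with_mutool() {
  local input="$1" output="${2:-clean-mu.pdf}" pages
  # pdfinfo prints a line such as "Pages:          13".
  pages=$(pdfinfo "$input" | awk '/^Pages:/ { print $2 }')
  [ -n "$pages" ] || { echo "could not read page count from $input" >&2; return 1; }
  mutool clean -ggg "$input" "$output" "1-$pages"
}
```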

------------------------------------------------------------

## LibreOffice export best practices
- Untick “Create hybrid file (embed ODF file)”
- Tick “Archive PDF/A-1a”
- Remove video objects, or replace them with screenshots, before exporting

Prevents hidden ODP/MOV files from inflating PDF size

------------------------------------------------------------

# Rules of thumb
- Only want *visible content*: Ghostscript (/ebook or /printer)
- Lossless but smaller: qpdf split/merge, or pdfcpu trim
- Hidden video/ODP bloat: Ghostscript or proper LibreOffice export
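
Whichever route is taken, it helps to measure what it actually saved. The helper below is a sketch using only `wc` and shell arithmetic; the name `report_savings` is hypothetical and the percentage is rounded down by integer division.

```shell
# Sketch: compare two files and print the size reduction as a percentage.
report_savings() {
  local before after
  before=$(wc -c < "$1" | tr -d ' ')
  after=$(wc -c < "$2" | tr -d ' ')
  [ "$before" -gt 0 ] || { echo "empty input: $1" >&2; return 1; }
  printf '%s -> %s: %d -> %d bytes (%d%% smaller)\n' \
    "$1" "$2" "$before" "$after" $(( (before - after) * 100 / before ))
}
```

For example, `report_savings document.pdf small.pdf` after a Ghostscript run, or `report_savings document.pdf clean.pdf` after a lossless rebuild. A negative percentage means the output grew.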