Why Does ZIP Compression Ratio Vary Greatly Depending on the File?
Have you ever noticed that a file "barely got smaller even after ZIP compression," or that "text files shrank dramatically once compressed"? ZIP compression ratios vary greatly by file type. This article explains why and describes how each file format behaves.
Compression Algorithm Used in ZIP: Deflate
The ZIP format (.zip) primarily uses the <strong>Deflate</strong> algorithm. Deflate is a combination of the following two techniques.
- <strong>LZ77 (Lempel-Ziv 1977)</strong>: Replaces repeating patterns in the data with references to their earlier occurrences
- <strong>Huffman coding</strong>: Represents frequently occurring symbols with shorter bit sequences
In other words, <strong>data with many repetitive patterns compresses well</strong>, while random data or already-compressed data can barely be compressed at all.
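To make the two techniques concrete, the sketch below uses Python's standard `zlib` module to compress the same repetitive input twice: once with the `Z_HUFFMAN_ONLY` strategy (no LZ77 back-references) and once with the default strategy (LZ77 plus Huffman). The sample data and the helper name `deflate_size` are assumptions made for illustration, and exact byte counts depend on the zlib build.

```python
import os
import zlib

def deflate_size(data: bytes, strategy: int = zlib.Z_DEFAULT_STRATEGY) -> int:
    """Return the size of a raw Deflate stream for data at level 9."""
    # wbits=-15 requests raw Deflate (no zlib header), the form stored in ZIP entries.
    comp = zlib.compressobj(level=9, wbits=-15, strategy=strategy)
    return len(comp.compress(data) + comp.flush())

# Illustrative inputs (assumptions): one short pattern repeated, and random bytes.
repetitive = b"abcdefgh" * 5000
random_bytes = os.urandom(len(repetitive))

huffman_only = deflate_size(repetitive, zlib.Z_HUFFMAN_ONLY)  # Huffman coding alone
full_deflate = deflate_size(repetitive)                       # LZ77 + Huffman
random_deflate = deflate_size(random_bytes)                   # nothing to exploit

print(f"original size:          {len(repetitive)} bytes")
print(f"repetitive, Huffman:    {huffman_only} bytes")
print(f"repetitive, LZ77+Huff.: {full_deflate} bytes")
print(f"random, LZ77+Huffman:   {random_deflate} bytes")
```

Huffman coding alone already helps because only eight distinct byte values occur, but most of the gain on this input comes from LZ77 replacing the repeated pattern with back-references. The random input gains nothing from either step.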
Compression Rate by File Format
| File Format | Size reduction (approximate) | Reason |
|---|---|---|
| Text (.txt) | 60–80% reduction | Many repeated characters and words |
| CSV | 70–85% reduction | Delimiters and similar row patterns repeat |
| HTML / XML / JSON | 65–85% reduction | Frequent repetition of tags and key names |
| Log file | 70–90% reduction | Frequent repetition of timestamp format |
| BMP (uncompressed image) | 50–80% reduction | Many consecutive pixels of the same color |
|  | 5–20% reduction | In many cases, already compressed internally with zlib |
| PNG | 0–5% reduction | Already compressed with Deflate |
| JPEG | 0–5% reduction | Already compressed with DCT + Huffman |
| MP3 / AAC | 0–3% reduction | Already compressed with lossy compression |
| MP4 / H.264 | 0–3% reduction | Already highly compressed |
| ZIP / GZ / 7z | 0–2% reduction (may increase in some cases) | Re-compression of already compressed data is largely ineffective |
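The figures in the table are approximate, so it can be worth measuring your own data. The following is a minimal sketch using Python's standard `zipfile` module and an in-memory buffer. The helper name `zip_reduction_percent` and the synthetic samples are assumptions for illustration, not real-world files.

```python
import io
import os
import zipfile

def zip_reduction_percent(name: str, data: bytes) -> float:
    """Zip data as a single Deflate entry in memory and return the size reduction in percent."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        zf.writestr(name, data)
    return 100.0 * (1 - len(buf.getvalue()) / len(data))

# Synthetic stand-ins (assumptions): a CSV with a repeating row layout, and random
# bytes standing in for an already-compressed file.
csv_text = "".join(
    f"{i},user{i % 50},2024-01-{i % 28 + 1:02d},OK\n" for i in range(20_000)
)
samples = {
    "data.csv": csv_text.encode("utf-8"),
    "random.bin": os.urandom(500_000),
}

for name, data in samples.items():
    print(f"{name}: {zip_reduction_percent(name, data):.1f}% reduction")
```

A negative percentage means the archive came out larger than the input.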
When Compressed Files Become Even Larger
When compressing already-compressed files like JPEG or MP4 with ZIP, the file size may <strong>increase slightly</strong> due to ZIP headers (file metadata). This is because the ZIP format includes a local file header (30 bytes or more) for each file and a central directory for the entire archive.
JPEG file (1.00 MB)
└── After ZIP compression: 1.00 MB + headers and central directory (roughly 100 B or more) = slightly larger
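The overhead can be checked directly. In the ZIP format, the fixed parts are a 30-byte local file header, a 46-byte central directory entry, and a 22-byte end-of-central-directory record, and the file name is stored twice. The sketch below zips random bytes, used here as a stand-in for an already-compressed JPEG body (an assumption), and prints how much larger the archive is than the payload. The name `image.jpg` is purely illustrative.

```python
import io
import os
import zipfile

# Random bytes stand in for already-compressed content (assumption for illustration).
payload = os.urandom(1_000_000)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("image.jpg", payload)

archive_size = len(buf.getvalue())
overhead = archive_size - len(payload)

print(f"payload:  {len(payload)} bytes")
print(f"archive:  {archive_size} bytes")
print(f"overhead: {overhead} bytes")
```

The reported overhead includes both the ZIP container structures and the few bytes of framing that Deflate adds when it has to emit incompressible data as-is.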
How Store Mode Differs
ZIP has a <strong>Store</strong> mode that stores files without compression. When combining multiple already-compressed files (such as JPEG, MP4, etc.), using Store mode eliminates the CPU load of compression processing while storing them at equivalent sizes.
# Specify the compression level with the zip command
zip -0 archive.zip image.jpg video.mp4 # Store (no compression)
zip -9 archive.zip data.csv report.txt # Maximum compression
# Specify the compression level in Python (compresslevel requires Python 3.7+)
import zipfile
with zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
    zf.write('data.csv')
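Store and Deflate can also be mixed within a single archive. The following sketch picks the method per entry based on the file extension, using the `compress_type` argument of `ZipFile.writestr`. The extension set `ALREADY_COMPRESSED` and the helper `build_archive` are hypothetical names introduced for this example, and the type hints assume Python 3.9 or later.

```python
import io
import os
import zipfile
from pathlib import PurePosixPath

# Hypothetical list of extensions treated as already compressed; adjust to your data.
ALREADY_COMPRESSED = {".jpg", ".jpeg", ".png", ".mp3", ".mp4", ".zip", ".gz", ".7z"}

def build_archive(files: dict[str, bytes]) -> bytes:
    """Build a ZIP in memory, storing pre-compressed entries and deflating the rest."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            if PurePosixPath(name).suffix.lower() in ALREADY_COMPRESSED:
                # Store: skip compression work that would gain almost nothing.
                zf.writestr(name, data, compress_type=zipfile.ZIP_STORED)
            else:
                zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED, compresslevel=9)
    return buf.getvalue()

# Illustrative inputs: random bytes as a fake JPEG body, and repetitive text.
archive = build_archive({
    "image.jpg": os.urandom(10_000),
    "report.txt": b"status=OK\n" * 1_000,
})

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    for info in zf.infolist():
        method = "Store" if info.compress_type == zipfile.ZIP_STORED else "Deflate"
        print(f"{info.filename}: {method}, {info.file_size} -> {info.compress_size} bytes")
```

Choosing by extension is only a heuristic. A file's actual content decides how well it compresses.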
Characteristics of Test ZIP Files
DevLab's test ZIP files contain <strong>random data (pseudo-random byte sequences)</strong> to precisely control file size. Because random data has maximum entropy, Deflate can barely shrink it, so a "10 MB ZIP file" is still approximately 10 MB after extraction.
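The "maximum entropy" claim can be illustrated with a byte-level Shannon entropy estimate, which ranges from 0 to 8 bits per byte. This is a rough sketch: the helper `byte_entropy` and the sample log line are assumptions, and the measure only looks at single-byte frequencies, so it understates how compressible repetitive data really is (LZ77 also exploits repeated sequences).

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0.0 to 8.0)."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Illustrative inputs (assumptions): random bytes and a repeated log-style line.
random_bytes = os.urandom(1_000_000)
log_like = b"2024-01-01 12:00:00 INFO request handled\n" * 10_000

print(f"random data: {byte_entropy(random_bytes):.4f} bits/byte")
print(f"log-like:    {byte_entropy(log_like):.4f} bits/byte")
```

A value close to 8 bits per byte means every byte value is about equally likely, which leaves Huffman coding nothing to shorten.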
If you need a ZIP file that reaches a specific size after extraction, you can create a test file using a method like the one below.
# Create a ZIP that expands to exactly 100 MB (zero-filled, highly compressed)
dd if=/dev/zero bs=1M count=100 | zip -9 zero-100mb.zip -
# Create a ZIP that expands to exactly 100 MB (random data, essentially uncompressed)
dd if=/dev/urandom bs=1M count=100 | zip -0 random-100mb.zip -
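Where `dd` and `zip` are not available, a similar result can be sketched with Python's `zipfile` module. The helper `make_test_zip` and the entry name `payload.bin` are hypothetical, and the example uses 1 MB instead of 100 MB so that it runs quickly in memory.

```python
import io
import os
import zipfile

def make_test_zip(extracted_size: int, random_fill: bool) -> bytes:
    """Return a ZIP (as bytes) whose single entry expands to extracted_size bytes."""
    if random_fill:
        data = os.urandom(extracted_size)  # incompressible, so store it as-is
        method, level = zipfile.ZIP_STORED, None
    else:
        data = bytes(extracted_size)       # all zero bytes, compresses extremely well
        method, level = zipfile.ZIP_DEFLATED, 9
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", method, compresslevel=level) as zf:
        zf.writestr("payload.bin", data)   # hypothetical entry name
    return buf.getvalue()

size = 1_000_000  # 1 MB for the demo; scale up as needed
zero_zip = make_test_zip(size, random_fill=False)
random_zip = make_test_zip(size, random_fill=True)

print(f"zero-filled: archive is {len(zero_zip)} bytes, expands to {size} bytes")
print(f"random:      archive is {len(random_zip)} bytes, expands to {size} bytes")
```

Building the payload in memory is fine at this scale. For much larger sizes, writing the entry in chunks through `ZipFile.open(name, "w")` would avoid holding the whole payload in RAM.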
Summary
- ZIP compression ratio is determined by how prevalent <strong>repeating data patterns</strong> are
- Text, CSV, and XML can be reduced by <strong>60–85%</strong>
- JPEG, MP4, and other pre-compressed files <strong>cannot be compressed much</strong> (and may even grow slightly)
- When combining already-compressed files, save CPU with <strong>Store mode (-0)</strong>
- DevLab's test ZIP files use random data, so the size remains nearly identical before and after decompression
→ <a href="/ja/files/zip/">Download test ZIP files here</a>
Test files for this article
- → <a href="/ja/files/zip/" class="text-primary-600 dark:text-primary-400 hover:underline">ZIP test file list</a>
- → <a href="/ja/files/csv/" class="text-primary-600 dark:text-primary-400 hover:underline">CSV test file list</a>
Related articles
- → <a href="/ja/blog/file-format-quick-reference/" class="text-primary-600 dark:text-primary-400 hover:underline">File Format Quick Reference for Developers</a>
- → <a href="/ja/blog/file-validation-checklist/" class="text-primary-600 dark:text-primary-400 hover:underline">Web Form File Validation Implementation Checklist</a>