ctwz is a lossless compressor based on the Context Tree Weighting method [1]. It uses byte-level contexts and makes binary predictions using an ASCII decomposition tree [2]. It is best suited to large files.
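To make the idea of binary predictions over bytes concrete, here is a minimal sketch of a byte decomposition. It assumes a plain balanced tree walked from the most significant bit, which may differ from the tree ctwz actually uses; `Decision`, `decompose` and `recompose` are hypothetical names for illustration only. Each byte becomes 8 binary decisions, and the node number (1 to 255) identifies which binary predictor would handle that decision.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// One binary decision: the decomposition-tree node it is made at,
// and the bit chosen there. (Hypothetical type, not taken from ctwz.)
using Decision = std::pair<unsigned, int>;

// Split a byte into 8 binary decisions, most significant bit first.
// Assumption: a balanced tree with heap-style numbering (root = 1,
// children of node n are 2n and 2n+1), so internal nodes are 1..255.
std::vector<Decision> decompose(std::uint8_t byte) {
    std::vector<Decision> out;
    unsigned node = 1;
    for (int shift = 7; shift >= 0; --shift) {
        int bit = (byte >> shift) & 1;
        out.emplace_back(node, bit);  // predict `bit` with the model at `node`
        node = 2 * node + bit;        // descend to the chosen child
    }
    return out;
}

// Inverse walk: follow the decisions down from the root. After 8 steps
// the node number lies in 256..511 and encodes the byte directly.
std::uint8_t recompose(const std::vector<Decision>& decisions) {
    unsigned node = 1;
    for (const auto& d : decisions) node = 2 * node + d.second;
    return static_cast<std::uint8_t>(node - 256);
}
```

With this numbering, every decision after the first is made at a node that already depends on the earlier bits of the same byte, so each node can keep its own statistics.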
To build, use CMake:
$ cmake .
$ make
or simply compile the sources with any C++17 compiler.
$ ./ctwz -h
ctwz:
Context tree weighting compressor
author: Meijke Balay
usage:
encode: ctwz [-d depth] file
decode: ctwz -x file
-d
Specifies the depth of the context trees (default=8). Greater depths can improve compression for large files but require more memory and computation.
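The effect of depth can be illustrated with a small sketch of binary context tree weighting using the Krichevsky-Trofimov (KT) estimator. This is not the ctwz implementation: it assumes single-bit context symbols (ctwz uses byte-level contexts), an all-zero initial context, and hypothetical names (`Node`, `update`, `code_length_bits`). Each coded bit updates one node per tree level, and nodes are created for every new context, which is why greater depths cost more time and memory.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    unsigned zeros = 0, ones = 0;  // symbol counts seen in this context
    double log_pe = 0.0;           // log2 of the KT estimate
    double log_pw = 0.0;           // log2 of the weighted probability
    std::array<std::unique_ptr<Node>, 2> child;
};

// log2(2^x + 2^y), computed without leaving the log domain.
double log2_add(double x, double y) {
    double m = std::max(x, y);
    return m + std::log2(std::exp2(x - m) + std::exp2(y - m));
}

// Update the nodes on the context path for one coded bit.
// ctx[0] is the most recent context symbol.
void update(Node& n, int bit, const std::vector<int>& ctx,
            std::size_t level, std::size_t depth) {
    // KT estimator: P(bit) = (count[bit] + 1/2) / (zeros + ones + 1)
    double count = bit ? n.ones : n.zeros;
    n.log_pe += std::log2((count + 0.5) / (n.zeros + n.ones + 1.0));
    if (bit) ++n.ones; else ++n.zeros;

    if (level == depth) {  // leaf: weighted probability is the estimate
        n.log_pw = n.log_pe;
        return;
    }
    int c = ctx[level];
    if (!n.child[c]) n.child[c] = std::make_unique<Node>();
    update(*n.child[c], bit, ctx, level + 1, depth);

    // Pw = 1/2 * Pe + 1/2 * Pw(child 0) * Pw(child 1);
    // a child that was never visited contributes probability 1.
    double kids = 0.0;
    for (const auto& ch : n.child) if (ch) kids += ch->log_pw;
    n.log_pw = log2_add(n.log_pe, kids) - 1.0;
}

// Ideal code length in bits that the model assigns to `bits`.
double code_length_bits(const std::vector<int>& bits, std::size_t depth) {
    Node root;
    std::vector<int> ctx(depth, 0);  // assumed all-zero initial context
    for (int bit : bits) {
        update(root, bit, ctx, 0, depth);
        if (depth > 0) {             // slide the context window
            ctx.pop_back();
            ctx.insert(ctx.begin(), bit);
        }
    }
    return -root.log_pw;
}
```

Calling `code_length_bits` on an alternating sequence such as 0, 1, 0, 1, ... gives a much shorter code length at depth 1 than at depth 0, because one bit of context is enough to predict the next bit. A real compressor would feed these probabilities to an arithmetic coder rather than just summing code lengths.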
Some results on the Canterbury Corpus using ctwz at depth 12, with gzip (Lempel-Ziv) for comparison. The ctwz and gzip columns give compressed sizes in bytes:
file | size (bytes) | ctwz | gzip |
---|---|---|---|
E.coli | 4638690 | 1209950 | 1342009 |
bible.txt | 4047392 | 898767 | 1177372 |
world192.txt | 2473400 | 561751 | 724995 |
kennedy.xls | 1029744 | 167460 | 204016 |
ptt5 | 513216 | 55562 | 56482 |
plrabn12.txt | 481861 | 151034 | 194357 |
alice29.txt | 152089 | 47817 | 54428 |
asyoulik.txt | 125179 | 42720 | 48922 |
Planned improvements:
- Parallelize context tree computations
- Lower memory requirements with pruning
1: Willems, F., Shtarkov, Y., & Tjalkens, T. (1995). The context-tree weighting method: basic properties. IEEE Transactions on Information Theory, 41(3), 653-664.
2: Volf, P. (2002). Weighting Techniques in Data Compression: Theory and Algorithms. Ph.D. thesis, Technische Universiteit Eindhoven.