Arithmetic Coding

I have updated my github project with support for arithmetic coding.  It uses the algorithm provided by Malte Clasen and Eric Bodden.  It is an integer based encoder (32 bit unsigned).

I have made some changes to the original implementation to separate the statistical models more fully from the coder.  This allows substituting models on a per symbol basis.

An example of this behavior is provided in the ArithmeticStream class (paralleling the compression classes in System.IO.Compression).  This class uses two models: a zero order model and a new symbol model.  The former is only initialized with two symbols (stream terminator and new character).  The latter is initialized with all characters.

Continue reading

Dynamic Huffman

Summary

The adaptive Huffman algorithm as described by Vitter had three constraints that were compelling to relax.

The first is the size of the table.  The size of the table was constrained based on initial knowledge of the set being compressed—typically constrained to 256 for a single byte alphabet size.

The NotYetTransmitted sequence never adjusted its weight based on its own frequency so clusters of new characters would not benefit from reduced sizes of this sequence.

Once new characters become rare, it makes sense to reduce the NotYetTransmitted weight and therefore demote it farther down the Huffman tree.

Source

The source for these changes can be found on GitHub. The implementation is C#. C/C++ implementations will be added in the future.
Continue reading