Arithmetic Coding

I have updated my github project with support for arithmetic coding.  It uses the algorithm provided by Malte Clasen and Eric Bodden.  It is an integer based encoder (32 bit unsigned).

I have made some changes to the original implementation to separate the statistical models more fully from the coder.  This allows substituting models on a per symbol basis.

An example of this behavior is provided in the ArithmeticStream class (paralleling the compression classes in System.IO.Compression).  This class uses two models: a zero order model and a new symbol model.  The former is only initialized with two symbols (stream terminator and new character).  The latter is initialized with all characters.

Continue reading

Dynamic Huffman

Summary

The adaptive Huffman algorithm as described by Vitter had three constraints that were compelling to relax.

The first is the size of the table.  The size of the table was constrained based on initial knowledge of the set being compressed—typically constrained to 256 for a single byte alphabet size.

The NotYetTransmitted sequence never adjusted its weight based on its own frequency so clusters of new characters would not benefit from reduced sizes of this sequence.

Once new characters become rare, it makes sense to reduce the NotYetTransmitted weight and therefore demote it farther down the Huffman tree.

Source

The source for these changes can be found on GitHub. The implementation is C#. C/C++ implementations will be added in the future.
Continue reading

Two Responsibilities in the .NET TPL Scheduler Class

I like parallel programming.  Most of my experience comes from developing on Windows in C/C++ and C#.  I haven’t yet had the pleasure of doing so functionally (which I plan to correct).

I prefer task/job based parallelism using threadpools rather than explicit threads performing a suite of tasks (such as FIOS or Kinect).  I could delve into this further, but that is not the purpose of this post.

The Task Parallel Library hinted at an opportunity to build logic I have always wanted with a job based implementation.  I wanted to be able to define my own schedules for task execution.  Unfortunately there is a flaw.

The Scheduler class has two responsibilities.  The Scheduler class not only handles scheduling of tasks, but their execution as well.  Combining these two responsibilities inhibits scheduler composition beyond inheritance and encapsulation.  Furthermore any external libraries that create tasks based on their own schedulers cannot have the scheduling of their tasks altered by any other schedulers.

Continue reading