Wednesday, December 31, 2014

99 Clojure Problems-50: Huffman Code


(deftest huffmann
  (is (= '([a "0"] [c "100"] [b "101"] [f "1100"] [e "1101"] [d "111"])
         (-> '([a 45] [b 13] [c 12] [d 16] [e 9] [f 5])


Huffman coding is a variable length coding based on the frequency of each possible source value. You might have a look at the wikipedia article (or a good book on discrete mathematics and/or algorithms as suggested on the original "99 problems" page) to get a basic understanding of what we are trying to do here.

I used a variation of the two queues method: one queue for the initial set of leaf nodes, the other for the aggregate or branch nodes. The queues have to be ordered by ascending frequency. If you can use a priority queue that does the ordering for you that's fine. You then dequeue the two items with the lowest frequency from each queue (or from just one queue if the other is empty) and combine the two leaves into a branch node. You continue with this procedure until there is only one node left, which is your Huffman tree.

Have a look at the code, but, as usual, first try it yourself. I'm not entirely happy with my solution, which seems to be correct, but some of its semantics are rather subtle. Did you spot that the check for an empty queue is implicit in the default value of Long/MAX_VALUE when accessing the next value polled from the queue? As always, I am happy to get feedback on the solutions presented.

This is the last post for 2014. I set out at the beginning of the year to solve all 99 problems in Clojure that were mentioned in the original 99 Prolog problems compilation. I managed to do the first 50 and I really enjoyed them. Nevertheless, when learning a new language, solving these kinds of nicely prepared educational problems takes you only half the way. I think that the next step should be to build an actual system in Clojure. I still found doing these exercises rewarding and useful, as they allow me to do some Clojure in the limited spare time I have. I am going to continue with the series next year hoping to solve the remaining problems.

No comments: