## Monday, March 31, 2014

### 99 Clojure Problems – 27: Group the Elements of a Set into Disjoint Subsets

1. In how many ways can a group of 9 people work in 3 disjoint subgroups of 2, 3 and 4 persons? Write a function that generates all the possibilities.
2. Generalize the above predicate in a way that we can specify a list of group sizes and the predicate will return a list of groups.

Note that we do not want permutations of the group members; i.e. ((Aldo, Beat), ...) is the same solution as ((Beat, Aldo), ...). However, we make a difference between ((Aldo, Beat), (Carla, David), ...) and ((Carla, David), (Aldo, Beat), ...).

You may find more about this combinatorial problem in a good book on discrete mathematics under the term "multinomial coefficients".

### Example


(deftest p27-multinomial-coeffients
(is (=
'((("Peter" "Paul") ("Mary"))
(("Peter" "Mary") ("Paul"))
(("Paul" "Mary") ("Peter")))
(group [2 1] ["Peter" "Paul" "Mary"]))))



### Solution

The mulinomial coefficients are a generalisation of the binomial coefficients that we saw in the last problem. $${n \choose k_1, k_2, \ldots, k_m} = \frac{n!}{k_1!\, k_2! \cdots k_m!}$$ I used this to validate some aspects of my solution as you can see in the tests. Showing the correctness of your solution in a traditional unit test gets quite unwieldy when you have, as in this example, 1260 combinations to check.

My solution to part a) of the problem is based on the idea to reuse the enumeration of all k-combinations of n elements from problem 26. We then take all possible combinations of four people out of the group of nine and remove these people from the set of all people by using Clojures set functions. We continue to do that for the 3-combinations and the 2-combinations and add them into a result vector.

The repeated set creation/difference code is neither elegant nor performant but a pragmatic choice given that the combinations function was supposed to work with/return lists. Another area for improvement is the calculation of the last k-combinations. It is redundant as its result is just the remainder of the original input sequence.

The solution to part b) is a bit more involved but very closely related in terms of its structure. It's a recursive solution that starts with calculating the combinations for the first group and mapping the result to a recursive self-call. In each recursive call we remove the elements/persons already "used" from the input set. We also remove the k for which we just calculated the combinations from the list of group sizes. The whole operation is wrapped in a mapcat to flatten the result into just one level as you can see in the example.

## Sunday, March 30, 2014

### 99 Clojure Problems – 26: Generate the Combinations of K Distinct Objects Chosen From the N Elements of a List

"In how many ways can a committee of 3 be chosen from a group of 12 people? We all know that there are C(12,3) = 220 possibilities (C(N,K) denotes the well-known binomial coefficients). For pure mathematicians, this result may be great. But we want to really generate all the possibilities."

### Example:

(deftest p26-combinations
(is (= '((0 1) (0 2) (1 2)) (combinations 2 (range 3))))
(is (= 220 (count (combinations 3 (range 12))))))


### Solution:

Here is my thinking, which is inspired by SICPs counting change example: If you want to enumerate all combinations of 2 distinct objects chosen from the list (0 1 2) you could fix the first element to be the first element of the list. That leaves you with a list of two elements and one more object to choose. There are only two combinations for C(2, 1): the two remaining elements of the list. Join them to the first elements we have already chosen and we end up with (0 1) and (0 2). Do the same thing again: Enumerate all combinations of two distinct objects from the remaining list (1 2), there is just one: the list itself is the answer.

We can generalize these observations into a solution: combinations of k distinct objects chosen from a list of n elements are the first element of the list conjoined to the combinations of k-1 distinct objects from the rest of the list, plus all combinations of k distinct objects from the list without its first element (again: the rest or tail of the list).

## Wednesday, March 19, 2014

### Example:

I am a bit at a loss when it comes to providing a meaningful test case that illustrates the problem, as I did in the previous posts in this series.

Based on my current understanding we would have to show that the randomness of the results matches a uniform random distribution. But lets look at the solutions first. I will come back to that question at the end.

### Solution:

There are at least three solutions to this problem that I am aware of.

#### 1. Idiomatic/Java Interop

The first one is to reuse the idiomatic variant of the random-select, which is based on java.util.Collections#shuffle, which in turn implements AFAIK the well known Fisher-Yates shuffle algorithm. While a functional purist might sniff at the mutation involved I think, this is perfectly fine in this case as the mutability is encapsulated in a functional/immutable shell. Your sequence is translated into a mutable collection to use the efficient algorithm based on swapping. The result is then transformed back into a persistent data structure: a pattern often used to get the best of both worlds.

#### 2. Naive Functional

Solution number two is functional but not very efficient. If I am not mistaken the running time of this solution should be O(n^2). We are calling remove-at n times, which calls split-at. Split-at has linear complexity.

#### 3. Perfect Functional Shuffle

The third solution is nothing I came up with myself. I merely followed a hint given by Phil Gold. It pointed to Oleg Kiselyov's post about perfect functional shuffle in Haskell.

Implementing it in Clojure turned out to be quite a lot of work. I have never done any Haskell and I am not sure if I got all of if right. It is probably best you have a look at Oleg's text first.

I did not read up much on Haskell syntax, but nonetheless: it started to make sense after looking at the code for a while even without knowing the language. The biggest difference with effects on the structure of the program: Haskell comes with continuation passing style built right in— something I could not or did not want to recreate in Clojure.

The gist of the solution is to build a binary tree out of the input sequence. The tree allows for efficient extraction of branches and is by its very nature at most ceil(log2 n) deep. The overall complexity of the solution should be therefore O(n*log n). Again, read the original text for more detail.

I decided to use core.match and add with it the first dependency to the project. This allowed me to stick closer to the original solution which makes heavy use of Haskell's pattern matching.

Translating the essential two functions was pretty straightforward and core.match offered all I needed. The syntax is a bit clunkier (notably the guard clauses) than Haskell's succinctness but it's not in a different ballpark alltogether.

The first interesting function is called pair-leaves. It takes a sequence of nodes—initially all leaves—(represented by vectors of the form [:node|:leaf count value]) and joins them together to form a binary tree. We need four pattern matches to do that: matching two leaves, a node and a leaf, a leaf and a node and finally one matching two nodes.

This gives us the binary tree. We now take a sequence of random numbers (each being an independent sample of a uniform random distribution [0...n-1]) and extract the numbers at the positions indicated by those random indices from the tree.

We need six pattern matches to extract from the tree. (Oleg needs five as Haskell apparently has something like a multiple guard with sub-branches which spares him writing the last match.) The patterns are:

1. The first element is always a leaf on the left.
2. Any but the first element when our tree consists of two leaves: the right leaf.
3. Any but the first element where our tree is bigger than two leaves: extract recursively to the right.
4. Any but the first element where n + 1 is equal to (or greater) than the count of the root node and we have a leaf as the right subtree. This is actually the biggest deviation from Oleg's code and possibly a mistake on my part.
But as I understand it, it is due to the lack of CPS in my solution. Oleg's continuations take the flow of execution back up into the shuffle function where he deals with the case of a single leaf. My solution has to be able to deal with the leaf case inside the extract-tree — by weakening the guard here.
5. The last two are closely related: a tree with a non-leaf sub-tree to the left and none of the above: if the number we are looking for is in the left subtree (as indicated by its count on the node) we recursivly extract to the left, otherwise to the right.

### Verifying the Results

What is left to do is to verify the property of uniform random distribution. The assumtions about time complexity as outlined by Oleg seem obvious to me. Let's look at a histogram:

This histogram left me wondering whether I had made a mistake. It does seem to show a quite clearly discernible repetitive sawtooth pattern. I created the data with this snippet of Clojure:

(def input '[a b c d e])
(def perms (vec (combo/permutations input)))
(def samples (repeatedly 10000 (fn [] (.indexOf perms (random-permute-functional input)))))


Compare this with a histogram of 10000 results generated by the first solution based on Fisher-Yates:

The most likely explanation is a mistake in my implementation or my reasoning. I have not found it yet, but I am quite keen on any hints. Could it be that perfect functional shuffle is not quite perfect? Probably not.

### Update:

Of course not. Andy Fingerhut was so kind to point out my mistake to me: You have to choose the random numbers uniformly distributed within [0,n-1] for the first number, [0, n-2] for the second number and so on. Otherwise you will bias your shuffle towards whatever is the rightmost branch of your tree, as every number > n where n is the size of your current tree will extract the rightmost element. You can find the corrected version here. The histogram looks much better now too: