By "discrete distibution" we just mean the roll you get from something like an unfair die, e.g. you want a random number from 0 to 5 but you want 4 and 5 to be twice as likely as 0, 1, 2, or 3. If we think of each possible random value as having a weight, in this case 0, 1, 2, and 3 would have a weight of 1 and 4 and 5 would have a weight of 2.
A simple way to generate these kinds of random values is the following. Given some n such that we want random values ranging from 0 to n-1 where for each 0 <= i < n we have a weight w(i):
- Build a data structure mapping cumulative weight to each value i. By cumulative weight we mean for each i the sum of w(0), ... , w(i-1).
- Generate a random number r from 0 to W-1 inclusive where W is the total weight i.e. the sum of w(i) for all i.
- If r is a cumulative weight in our data structure return the value associated with it; otherwise, find the value v that is the first item in the data structure that has a cumulative weight greater than r and return v-1.
However, in C# you can't just use a SortedDictionary, which is a binary search tree under the hood. You can't use a SortedDictionary because it does not expose the equivalent of C++'s std::lower_bound and std::upper_bound or the equivalient of Java's TreeMap.floorEntry(...) and TreeMap.ceilingEntry(...). In order to perform the "otherwise" part of 3. above you need to efficiently be able to find the spot in the data structure where a key would go if it was in the data structure when it is in fact not in the data structure. There is no efficient way to do this with a SortedDictionary.
However, C#'s List does support a BinarySearch method that will return the bitwise complement of the index of the next element that is larger than the item you searched for so you can use that.
The downside of this whole approach is that there will be no way to efficiently add or remove items to the discrete distribution, but often you don't need this functionality anyway and the code to do the whole algorithm is very concise:
[code=csharp:0]class DiscreteDistributionRnd{ private List m_accumulatedWeights; private int m_totalWeight; private Random m_rnd; public DiscreteDistributionRnd(IEnumerable weights, Random rnd = null) { int accumulator = 0; m_accumulatedWeights = weights.Select( (int prob) => { int output = accumulator; accumulator += prob; return output; } ).ToList(); m_totalWeight = accumulator; m_rnd = (rnd != null) ? rnd : new Random(); } public DiscreteDistributionRnd(Random rnd, params int[] weights) : this(weights, rnd) { } public DiscreteDistributionRnd(params int[] weights) : this(weights, null) { } public int Next() { int index = m_accumulatedWeights.BinarySearch(m_rnd.Next(m_totalWeight)); return (index >= 0) ? index : ~index - 1; }}where usage is like
DiscreteDistributionRnd rnd = new DiscreteDistributionRnd(3,1,2,6);int[] ary = new int[4] {0,0,0,0};for (int i = 0; i < 100000; i++) ary[rnd.Next()]++;System.Diagnostics.Debug.WriteLine( "0 => {0}, 1 => {1}, 2 => {2}, 3 => {3}", (float)(ary[0] / 100000.0), (float)(ary[1] / 100000.0), (float)(ary[2] / 100000.0), (float)(ary[3] / 100000.0));
Source
Nice, this general algorithm is called inverse transform sampling by the way. There is also the alias method for sampling a discrete distribution in constant time given some preprocessing work (useful if you will be drawing from the same distribution many times) but it does have some edge cases.