by Guest » Tue Mar 20, 2007 2:36 am
I am a big fan of using Mersenne Twister with its 2^19997-1 return period. It's also very very fast; as fast as Ran1 from Numerical Recipes. Which is a terrible RNG. I use the U(0,1) outputs to generate vectors as input to a templated BM function. The MT box-mueller implementation "randNorm" is naive, using two U(0,1)'s to generate a single N(0,1), and is not great for a performance critical section of code. Just implementing your own BM type code will speed things up by at least a factor of two (possibly more, if you consider function call overhead).
There are two variants on the Box-Mueller trick. One throws away bad variates (the so called polar rejection method). The other uses trigonometric functions. I have found that it is hard to guage which is faster a priori. If you can convince your compiler to vectorize a calculation on a big block of data, then the trig version is probably faster (vectorizers hate "if" statements). At least that's been my experience doing this on PowerPC and i686 cpus. If you don't operate on big blocks of data, I don't know if it makes a difference.
For distributed RNG the problem is always trying to make sure the different threads don't clash. You obviously have to use a vector of input seeds, one per thread. With a 2^19997-1 cycle period, Mersenne Twister offers acceptably close to zero chance of seed collision, which is longer than I need the program to run.