Alternative Formulations of the Stochastic Sobol Sequence

by Mayur Patel

The Stochastic Sobol sequence accepts, as input, a stream of random points in an arbitrary dimension. For each point, it performs an affine transform on the point based on its location and the results of previous calculations. The output points of the algorithm are a digital net, well-stratified in the base-2 intervals of the unit hypercube. This makes the sequence useful as a generator for Quasi-Monte Carlo sampling.

The original authors also presented a stateless version of the algorithm, which used hashing to replace pseudo-random number generation, so that points could be calculated in any arbitrary order. (Helmer, A., Christensen, P., and Kensler, A. 2021. Stochastic generation of (t, s) sample sequences.)

It is impressive that the algorithm is able to reshape a randomly generated point set, given all the potential flaws that the point set could exhibit. This raises the question which is the subject of this short article: can we spend less effort to generate the input points to the algorithm and still achieve useful results?

We explore this question through testing. We use the same base algorithm presented by Helmer et. al. to reshape several different input point sets. All the point sets tested can be generated deterministically, so they can all be used in the stateless version of the Stochastic Sobol algorithm.

Before we proceed, we need a quick refresher on Kronecker sequences. A coordinate for a Kronecker sequence point can be calculated with:

$p_s[i] = \{ i \times \alpha_s \}$

where i is the index of the point, and s is the dimension of the coordinate being calculated. The operator {x} is the fractional part of x, or x modulus one. The value $\alpha_s$ is an irrational value. A Kronecker sequence for s dimensions is defined by s linearly independent irrational values. In one dimension, the optimal Kronecker sequence is known and it is calculated using:

$\alpha_1 = \phi = \frac{1 + \sqrt{5}}{2}$

The value $\phi$ is the golden ratio.

We focus on Kronecker sequences in our testing because they have been shown to be extremely fast to calculate, especially when integer arithmetic is used (e.g. Patel, M. 2022. Optimizing Kronecker Sequences for Multidimensional Sampling.). Furthermore, they are uniformly distributed in the limit, even if short sub-sequences do not appear to be well-spaced. These properties seem to make Kronecker sequences a worthwhile alternative to pseudo-random number generators.

With that, these are the generators we tested as inputs to the Stochastic Sobol algorithm:

The hashing function recommended by the original authors.
The Kronecker sequence defined with $a_s = e^\frac{-2}{2+s}$. This kind of formulation is consistent with the recommendations of Zinterhof. (Zinterhof, P. 1996. Parallel generation and evaluation of Weyl sequences.).
An alternative arbitrary-dimension Kronecker sequence: $a_s = \phi^\frac{-2}{2+s}$
A projection of the optimal 1D Kronecker sequence into all dimensions: $a_s = \phi$
A constant value, $\frac{1}{\phi}$, used for all coordinates of all points.

Below, we show the first 1024 points in the first two dimensions produced using each of the input signals:

Figure: 1024 points generated from the input signal ks(power e)

Figure: 1024 points generated from the input signal ks(power phi)

Figure: 1024 points generated from the input signal ks(1D)

Figure: 1024 points generated from the constant input signal

Figure: 1024 points generated from the hash input signal

For all of the signals except the constant value, the power spectral densities of the first two dimensions were favorable:

Figure: PSD resulting from the input signal ks(power e)

Figure: PSD resulting from the input signal ks(power phi)

Figure: PSD resulting from the input signal ks(1D)

Figure: PSD resulting from the constant signal

Figure: PSD resulting from the hash input signal

For all the input signals, the resulting unanchored $L_2$ discrepancy and the diaphony in the first two dimensions were extremely similar:

We tested the diaphony in the first eight dimensions to validate the result in higher dimensions:

All the input signals were successfully reshaped into digital nets of very similar quality. The result using the constant input, however, showed a poor PSD and so we cannot recommend it for production use.

The benefit of using a simpler input signal is performance. In our testing, on admittedly old hardware, we were able to generate 102 million 2D points per second using the hashing method endorsed by the original authors. Using the projected 1D Kronecker sequence as input, we were able to generate 187 million 2D points per second.

The product of our simple experiment is an 83% faster version of the Stochastic Sobol sequence, of comparable quality to the original, which supports a stateless implementation. Existing source code can be quickly adapted to the new method with very little effort. Our testing was terse, so we encourage you to evaluate it in the context of your own environment and application. The fact that the algorithm can reshape such a variety of input signals is a testament to the resilience of the approach.

Musings of a Curious Engineer

Search This Blog

Alternative Formulations of the Stochastic Sobol Sequence

Comments

Post a Comment