Functions concerned with random number generation.
There are two flavours of functions here: index-based (random*) and sequential (nextRandom*).
Briefly, the index-based ones are safer to use, but provide poorer
random statistics, while the sequential ones provide decent randomness
but are not suitable wherever an expression is expected to have a fixed value.
They are documented separately below.
Index-based functions
The functions named random* all take an index parameter which determines the value of the result;
the same index always leads to the same output,
but there is not supposed to be any obvious relationship between index
and output.
An explicit index is required to ensure that a given cell always has
the same value, since cell values are in general calculated on demand.
The quality of the randomness for these functions may not be that good.
In most cases, the table row index, available as the special token $0, is a suitable value for the index parameter.
If several different random values are required in the same table row,
one way is to supply a different row-based index value for each one,
e.g. random(2*$0) and random(2*$0+1).
However, this tends to introduce a correlation between the random
values in the same row, so a better (though in some cases slower) solution
is to use one of the array-generating functions, e.g.
randomArray($0,2)[0] and randomArray($0,2)[1].
The output is deterministic, in the sense that the same invocation will always generate the same "random" number, even across different machines. However, in view of the comments in the implementation note below, the output may be subject to change in the future if some improved algorithm can be found, so this guarantee does not necessarily hold across software versions.
Implementation Note:
The requirement for mapping a given input index deterministically
to a pseudo-random number constrains the way that the random number
generation is done; most well-studied RNGs generate sequences of
random numbers, but that approach cannot be used here, since such
sequences do not support random access.
What we do instead is to scramble the input index somewhat and use that
as the seed for an instance of Java's Random class,
which is then used to produce one or more random numbers per input index.
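The general shape of that approach can be sketched in Java as follows. This is only an illustration, not the actual implementation: the class name IndexedRandomSketch and the scramble step (a SplitMix64-style bit mixer) are assumptions chosen for the example, and the real scrambling may well differ.

```java
import java.util.Random;

/** Illustrative sketch only; not the code behind the random* functions. */
final class IndexedRandomSketch {

    /** Hypothetical scrambling step (SplitMix64-style bit mixing), assumed for illustration. */
    static long scramble(long index) {
        long z = index + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Uniform value in [0, 1) that depends only on the index. */
    static double random(long index) {
        return new Random(scramble(index)).nextDouble();
    }

    /** n uniform values in [0, 1), all drawn from one generator seeded by the index. */
    static double[] randomArray(long index, int n) {
        Random rnd = new Random(scramble(index));
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rnd.nextDouble();
        }
        return out;
    }

    public static void main(String[] args) {
        // The same index gives the same value on every call.
        System.out.println(random(42L) == random(42L));
        // Two values for one row, taken from a single seeded generator.
        double[] pair = randomArray(42L, 2);
        System.out.println(pair[0] + " " + pair[1]);
    }
}
```

In this sketch the array variant seeds one generator per index and draws from it repeatedly, which is why several values for the same row need not rely on arithmetically related indices.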
Some thought and experimentation have gone into the current implementation
(I bought a copy of Knuth Vol. 2 specially!)
and an eyeball check of the results doesn't look all that bad,
but it's still probably not very good, and is not likely to pass
random number quality tests (though I haven't tried).
A more respectable approach might be to use a cryptographic-grade
hash function on the supplied index, but that's likely to be much slower.
If there is demand, something like that could be added as an alternative
option. In the meantime, beware if you use these random numbers for
scientifically sensitive output.
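For comparison, a hash-based mapping from index to value might look something like the following Java sketch. Nothing like this is provided by the functions documented here; the class name HashedRandomSketch, the method hashedRandom and the choice of SHA-256 are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch only; a hypothetical alternative, not an existing function. */
final class HashedRandomSketch {

    /** Maps an index to a value in [0, 1) using a SHA-256 digest of the index. */
    static double hashedRandom(long index) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Hash the 8 bytes of the index.
            byte[] input = ByteBuffer.allocate(Long.BYTES).putLong(index).array();
            byte[] digest = md.digest(input);
            // Take the first 8 digest bytes and keep the top 53 bits,
            // which is as many as a double's mantissa can hold.
            long bits = ByteBuffer.wrap(digest).getLong();
            return (bits >>> 11) * 0x1.0p-53;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must supply.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hashedRandom(0L));
        System.out.println(hashedRandom(1L));
    }
}
```

Each call computes a full digest, so this is expected to cost noticeably more per value than seeding a java.util.Random instance.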
Sequential functions
The functions named nextRandom* have no arguments,
and supply the next value in a global sequence when they are evaluated.
These can be used if scanning through a table once (for instance when
writing a table using STILTS), but they are not suitable for contexts
that should supply a fixed value.
For instance if you use them to define the value of a table cell in TOPCAT,
that cell may have a different value every time you look at it,
which may have disconcerting results.
These use the java.util.Random class in a more standard way
than the index-based functions
and should provide random numbers of reasonable quality.
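The behaviour described here corresponds roughly to drawing from a single shared generator, as in the Java sketch below. It is a simplified illustration rather than the actual implementation; the class name SequentialRandomSketch and the use of an unseeded shared instance are assumptions.

```java
import java.util.Random;

/** Illustrative sketch only; not the code behind the nextRandom* functions. */
final class SequentialRandomSketch {

    // One generator shared by every call; each evaluation advances its state.
    private static final Random GLOBAL = new Random();

    /** Next uniform value in [0, 1) from the global sequence. */
    static double nextRandom() {
        return GLOBAL.nextDouble();
    }

    /** Next value from the global sequence, drawn from a unit Gaussian. */
    static double nextRandomGaussian() {
        return GLOBAL.nextGaussian();
    }

    public static void main(String[] args) {
        // Evaluating the same expression twice yields two different draws.
        System.out.println(nextRandom());
        System.out.println(nextRandom());
        System.out.println(nextRandomGaussian());
    }
}
```

Because the result depends on how many times the generator has already been called, an expression built on such a function has no fixed value for a given row.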
random( index )
Note: The randomness may not be very high quality.
index (long integer): input value, typically row index "$0"
randomGaussian( index )
Note: The randomness may not be very high quality.
index (long integer): input value, typically row index "$0"
randomArray( index, n )
Note: The randomness may not be very high quality.
index (long integer): input value, typically row index "$0"
n (integer): size of output array
Returns: n-element array of random numbers between 0 and 1
randomGaussianArray( index, n )
Note: The randomness may not be very high quality.
index (long integer): input value, typically row index "$0"
n (integer): size of output array
Returns: n-element array of random numbers
nextRandom( )
This function will give a different result every time, hence it is not suitable for use in an expression which should have a fixed value, for instance to define a TOPCAT column.
nextRandomGaussian( )
This function will give a different result every time, hence it is not suitable for use in an expression which should have a fixed value, for instance to define a TOPCAT column.