Currently gen/choose is the generator that everything else is based on, and it's also a user-facing generator. Its current implementation takes a [0,1) double and scales it to the desired range. This approach degrades in uniformity as the bounds grow, and presumably has pretty terrible properties once the range is larger than 2^53 (the precision of a double). It could also be optimized for the case where the range size is a power of 2 (no need to go through Double at all). Seeing as gen/choose is user-facing, we should attempt to redesign it so that it doesn't degrade in this fashion, and can handle bigints.
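To make the degradation concrete, here is a hypothetical Java sketch of the double-scaling approach described above (the method name and bounds handling are illustrative, not test.check's actual code). Because a double carries only 53 bits of mantissa, once the range size exceeds 2^53 many longs in the range can never be produced:

```java
import java.util.SplittableRandom;

public class DoubleScaling {
    // Hypothetical sketch of the current approach: scale a [0,1) double
    // to the inclusive range [lower, upper]. A double has only 53 bits
    // of mantissa, so when the range size exceeds 2^53 many values in
    // the range become unreachable and the distribution is not uniform.
    static long chooseViaDouble(SplittableRandom rng, long lower, long upper) {
        double factor = rng.nextDouble();                    // uniform in [0,1)
        double size = (double) upper - (double) lower + 1.0; // may round!
        return lower + (long) Math.floor(factor * size);
    }

    public static void main(String[] args) {
        SplittableRandom rng = new SplittableRandom(42);
        // For small ranges every value stays in bounds...
        for (int i = 0; i < 1000; i++) {
            long x = chooseViaDouble(rng, 0, 9);
            if (x < 0 || x > 9) throw new AssertionError("out of range: " + x);
        }
        // ...but for ranges wider than 2^53, `size` itself is already a
        // rounded approximation, so the scaling cannot be uniform.
    }
}
```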
Requirements
gen/choose should have a uniform distribution regardless of bounds
gen/choose should accept arbitrary integer bounds and handle them correctly
Things to investigate
Note that the SplittableRandom class generates integer ranges without going through floating point. We should check whether this is faster; it is certainly a more uniform distribution.
Is it worth optimizing for the power-of-two case, the same way SplittableRandom does in the link above?
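For reference, the two points above can be sketched together. This is a simplified rendition of the rejection-sampling scheme SplittableRandom's bounded nextLong uses internally (a sketch for discussion, not a drop-in replacement for gen/choose): when the bound is a power of two, masking the low bits suffices; otherwise a rejection loop discards the candidates that would bias the modulo:

```java
import java.util.SplittableRandom;

public class BoundedLong {
    // Bounded generation without floating point, in the style of
    // SplittableRandom's internal bounded-long algorithm.
    static long boundedLong(SplittableRandom rng, long bound) { // bound > 0
        long m = bound - 1;
        if ((bound & m) == 0L) {          // power of two: just mask low bits
            return rng.nextLong() & m;
        }
        long u, r;
        do {                              // rejection loop for other bounds
            u = rng.nextLong() >>> 1;     // force a non-negative candidate
            r = u % bound;
        } while (u - r + m < 0L);         // overflow marks a biased bucket
        return r;
    }

    public static void main(String[] args) {
        SplittableRandom rng = new SplittableRandom(7);
        for (int i = 0; i < 10000; i++) {
            long a = boundedLong(rng, 10);
            if (a < 0 || a >= 10) throw new AssertionError();
            long b = boundedLong(rng, 1L << 16); // power-of-two fast path
            if (b < 0 || b >= (1L << 16)) throw new AssertionError();
        }
    }
}
```

Every result in the bound is reachable and equally likely, which is exactly what the double-scaling version cannot guarantee for wide ranges.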
When it's possible for the generator to return a long instead of a bigint is somewhat subtle. E.g., if you call (gen/choose Long/MIN_VALUE Long/MAX_VALUE), then when analyzing the size of the range you'll have to use a bigint (because 2^64, the size of that range, is not representable as a long). gen/choose should generate longs when both args are in long range, probably?
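A small Java sketch of that subtlety (the `rangeSize` helper is hypothetical): even when both bounds are longs, the size of the inclusive range can be 2^64, so the analysis has to happen in bigint arithmetic:

```java
import java.math.BigInteger;

public class RangeSize {
    // Size of the inclusive range [lower, upper]. Even for long bounds
    // this needs a BigInteger: for the full long range the size is 2^64.
    static BigInteger rangeSize(long lower, long upper) {
        return BigInteger.valueOf(upper)
                .subtract(BigInteger.valueOf(lower))
                .add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        BigInteger full = rangeSize(Long.MIN_VALUE, Long.MAX_VALUE);
        // 2^64 overflows a long, but the bounds themselves are longs.
        if (!full.equals(BigInteger.ONE.shiftLeft(64)))
            throw new AssertionError();
        if (!rangeSize(0, 9).equals(BigInteger.TEN))
            throw new AssertionError();
    }
}
```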
Is it actually possible to satisfy the requirements above without a performance hit? You'd think it was just a generator-creation-time change, but bind can blur the line between generator creation and generation time.
Environment
None
Activity
gfredericks March 30, 2018 at 4:48 PM
I'm reconsidering whether this is a good idea, compared to just designing a new set of cleaner generators.
gfredericks November 12, 2015 at 1:32 PM
On second thought, I might be more in favor of leaving choose the way it is, discouraging its use by users, and adding a uniform-integer generator that is specifically designed to handle arbitrary ranges. A big question is what it should do in JavaScript, and I have a couple ideas about that.