Redesign gen/choose

Description

Currently gen/choose is the generator that everything else is based off of, and it's also a user-facing generator. Its current implementation takes a [0,1) double and scales it to the desired range. This approach degrades w.r.t. distribution as you get higher, and presumably has pretty terrible properties once your range is larger than 2^53. It could also be optimized better for when the range is a power of 2 (no need to go through Double). Seeing as how gen/choose is user facing, we should attempt to redesign it so that it doesn't degrade in this fashion, and can handle bigints.

Requirements

gen/choose should have a uniform distribution regardless of bounds

gen/choose should accept arbitrary integer bounds and handle them correctly

Things to investigate

  • Note that the SplittableRandom class generates integer ranges without going through floating point. We should check if this is faster, since it's definitely a more uniform distribution.

  • Is it worth optimizing for the power-of-two case, the same with SplittableRandom does in the link above?

  • When it's possible for the generator to return a long instead of a bigint is somewhat subtle. E.g., if you call (gen/choose Long/MIN_VALUE Long/MAX_VALUE), then when analyzing the size of the range you'll have to use a bigint (because 2^64 is not representable as a long). gen/choose should generate longs when both args are in long range, probably?

  • Is it actually possible to satisfy the requirements above without a performance hit? You'd think it was just a generator-creation-time change, but bind can blur the line between generator creation and generation time.

Environment

None

Activity

Show:

gfredericks March 30, 2018 at 4:48 PM

I'm reconsidering whether this is a good idea, compared to just designing a new set of cleaner generators.

gfredericks November 12, 2015 at 1:32 PM

On second thought, I might be more in favor of leaving choose the way it is, discouraging its use by users, and adding a uniform-integer generator that is specifically designed to handle arbitrary ranges. A big question is what it should do in JavaScript, and I have a couple ideas about that.

Details

Assignee

Reporter

Priority

Created June 6, 2015 at 8:34 PM
Updated March 30, 2018 at 4:49 PM