Clojure

s/inst-in and s/int-in generators should have uniform distribution not biased towards min value

Details

  • Type: Enhancement
  • Status: Open
  • Priority: Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.9
  • Fix Version/s: Release 1.10
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Vetted

Description

The s/inst-in and s/int-in generators are based on gen/large-integer*, which starts with small values and grows with the size parameter, so samples cluster near the minimum of the range.

(require '[clojure.spec.alpha :as s] '[clojure.spec.gen.alpha :as gen])
(gen/sample (s/gen (s/int-in 0 100)))
;;=> (1 0 1 1 1 0 1 1 72 1)

(gen/sample (s/gen (s/inst-in #inst "2001-01-01" #inst "2001-12-31")))
;;=> (#inst "2001-01-01T00:00:00.000-00:00" #inst "2001-01-01T00:00:00.000-00:00" #inst "2001-01-01T00:00:00.001-00:00" #inst "2001-01-01T00:00:00.001-00:00" ...)

Proposed: Instead, s/inst-in and s/int-in should use a uniform distribution generator.

After the patch, the same two calls produce:

(26 16 65 96 63 37 31 4 94 9)

(#inst "2001-03-03T04:51:43.702-00:00" 
 #inst "2001-07-25T07:13:03.224-00:00" 
 #inst "2001-03-31T18:28:41.625-00:00" 
 #inst "2001-04-17T19:33:14.176-00:00" 
 #inst "2001-01-14T07:03:08.521-00:00" 
 #inst "2001-06-06T09:52:03.421-00:00" ...)

Patch: clj-2179.patch
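
For readers who want to try the proposed behavior without the patch, here is a minimal sketch. It is not the contents of clj-2179.patch. The names uniform-int-in and uniform-inst-in are hypothetical helpers, and the sketch assumes that gen/choose (uniform over an inclusive range) is an acceptable stand-in for whatever the patch actually does. It overrides the generator with s/with-gen and leaves the validation predicate untouched.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Hypothetical helpers, not part of clojure.spec and not taken from the patch.

(defn uniform-int-in
  "Spec for integers in [start, end) whose generator picks uniformly."
  [start end]
  (s/with-gen (s/int-in start end)
    ;; gen/choose is inclusive on both ends; s/int-in excludes end.
    #(gen/choose start (dec end))))

(defn uniform-inst-in
  "Spec for instants in [start, end) whose generator picks a uniform
  millisecond offset and wraps it in a java.util.Date."
  [start end]
  (s/with-gen (s/inst-in start end)
    #(gen/fmap (fn [ms] (java.util.Date. (long ms)))
               (gen/choose (inst-ms start) (dec (inst-ms end))))))

;; Usage: same calls as in the description, with the hypothetical specs.
(gen/sample (s/gen (uniform-int-in 0 100)))
(gen/sample (s/gen (uniform-inst-in #inst "2001-01-01" #inst "2001-12-31")))
```

Because gen/choose ignores the size parameter, these generators no longer start with small values, which is the trade-off raised in the comments below.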

Activity

Gary Fredericks added a comment -

What problem is this trying to solve?

Alex Miller added a comment -

Typically I find having the values biased towards the min value of the range (particularly in the inst case where values have to grow a lot to seem different) to not be what I expect as a user.

But I think the question is what the intended behavior should be for range specs. Whether or not Rich and Stu agree, I don't know yet.

Gary Fredericks added a comment -

For inst, I'd recommend at least generating the components of the timestamp separately (year, month, day, hour, etc.) and combining them with gen/fmap. That makes it shrink more naturally, and makes it easier to specify whatever strategy you like for biasing toward the present.
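
The component-wise suggestion above could look roughly like the sketch below. It is only an illustration under several assumptions: the names small-int and inst-2001-gen are made up, the year is fixed to 2001 to match the example in the description, days are capped at 28 to sidestep month lengths, and java.time is used to assemble the timestamp in UTC. Each component uses gen/large-integer* so it still starts small and shrinks on its own.

```clojure
(require '[clojure.spec.gen.alpha :as gen])
(import '(java.time LocalDateTime ZoneOffset)
        '(java.util Date))

;; Hypothetical helper: a size-sensitive integer generator in [lo, hi].
(defn small-int [lo hi]
  (gen/large-integer* {:min lo :max hi}))

;; Hypothetical generator for instants within the year 2001 (UTC).
;; Days are limited to 1-28 so every month/day combination is valid.
(def inst-2001-gen
  (gen/fmap
    (fn [[month day hour minute sec]]
      (-> (LocalDateTime/of (int 2001) (int month) (int day)
                            (int hour) (int minute) (int sec))
          (.toInstant ZoneOffset/UTC)
          Date/from))
    (gen/tuple (small-int 1 12)   ; month
               (small-int 1 28)   ; day of month
               (small-int 0 23)   ; hour
               (small-int 0 59)   ; minute
               (small-int 0 59)))) ; second

(gen/sample inst-2001-gen 5)
```

Swapping any small-int for a differently biased generator changes the distribution of that one component without touching the others.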

W.r.t. int ranges, I will only point out that one of test.check's features is to start tests with "small" values, so to whatever extent you impose uniform distributions, you neutralize that feature.

Gary Fredericks added a comment - - edited

If you'd like a generator that starts out at both the min and the max and can shrink to either, something like this should work:

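;; Assumes gen is clojure.test.check.generators; clojure.spec.gen.alpha has no gen/let.
(require '[clojure.test.check.generators :as gen])

(defn bi-biased-int-range
  [min max]
  (let [g (gen/large-integer* {:min min, :max max})
        ;; g' mirrors g within [min, max]: min maps to max and max maps to min
        g' (gen/let [x g] (- max (- x min)))]
    (gen/one-of [g g'])))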
