Skip to end of metadata
Go to start of metadata

test.check

Design ideas and planning for test.check.

 

API Overhaul Ideas

Responding to TCHECK-99 got me thinking about trying to pull off a complete redesign of the (mostly generators) API, while maintaining as much backwards compatibility as possible, at least for a few versions.

I'm going to try to list here the sorts of problems that could be fixed with such an overhaul that would be hard to fix otherwise.

  • Generators
    • Convert all generators to be functions; if a default generator is available, it would be obtained by calling the function with 0 args
      • Removes confusion such as with the current pair of large-integer and large-integer*
    • Get rid of nat, pos-int, s-pos-int, and similar. Replace with a generator called small-integer that takes various options
    • rename large-integer to integer, consider whether to allow bigints
    • change gen/vector and gen/list APIs to match gen/set
    • Maybe allow passing options when calling a generator, so that generators like strings and doubles can be customizable in a different way
      • e.g., this way a generator like gen/any could support being customized to never generate NaNs anywhere without having to explicitly code for that
  • clojure-test
    • change defspec to use clojure.test/is
      • can maybe do this w/o breaking changes by putting an alternative to prop/for-all in the clojure-test namespace???
  • Properties

Numerics

The situation here used to be much worse, when all of the integer generators would only generate numbers up to ~200 by default, and no double generator existed at all. Now we have gen/large-integer which does well for large ranges of integers, and gen/double which is also pretty good. The remaining holes are relatively minor:

  • numbers from infinite sets
    • a generator for bigints; unclear if this would require bounds or not
    • a good generator for ratios; again unclear about bounds, both on abs value and on denominator size
    • bigdecimals
  • some lower-level utility generators
    • a generator for doubles between 0 and 1
    • a raw generator of evenly-distributed longs

What is a solid approach to generating infinite-domain types?

In particular bigints, ratios, bigdecimals, and maybe even collections.

One idea I had was a generator that always has a non-zero probability of generating any given value, and the size parameter determines the average size of the thing generated (e.g., in bits). I think this means that the size of the output, at least to start, would have a geometric distribution.

How should NaNs be handled?

Background

gen/double and gen/double* generate NaN, at least by default. This is intentional, since NaN is a useful edge case to test for any code that deals with doubles. However NaN != NaN, which means that any data structure containing a NaN is not equal to itself, which screws up all sorts of things.

This isn't such a problem for gen/double, since users can opt-out of NaNs with gen/double* when appropriate. However, gen/simple-type, gen/simple-type-printable, gen/any, and gen/any-printable are all affected by this in 0.9.0.

Possible Approaches

  1. Modify generators in-place
    1. Just the four composite generators (gen/simple-type{-printable} and gen/any{-printable})
    2. gen/double doesn't generate NaN, and gen/double* allows it as opt-in
  2. Add new generators
    1. Analogues of the four composite, or some subset (gen/simple-type-no-NaN, gen/any-no-NaN, etc.)
  3. Something else?
    1. arguably users can manage this on their own, although excluding NaN from gen/any is not so simple, since a naïve wrapping in gen/such-that won't work

Test Runner Topics

Test Failure Feedback

Currently a test error (i.e., exception caught) gives ample feedback because the user gets the exception object to inspect, but a mere failure (falsy return value) doesn't communicate anything other than whether the result was false or nil. clojure.test goes farther for this, with a probably-overly-macrocentric approach.

Probably the best approach is to evaluate the result of a property expression with a protocol, so the current behavior can be maintained but new implementations can provide more information.

I think reid has already started in this direction on the "feature/result-map" branch

Related: https://github.com/yeller/matcha and this alternate clojure.test integration.

Parallel Tests

Requires the immutable RNG for full determinism.

Do we actually need complex coordination of threads or can we do a naive thing?

Rerunning a single test easily

Running tests for a specific amount of time

Related: test.chuck/times, and a branch that Tom Crayford worked on

Some ideas from Hypothesis

From this (http://www.drmaciver.com/2015/01/using-multi-armed-bandits-to-satisfy-property-based-tests/) blog post primarily. The library itself is here.

It essentially uses a rather more sophisticated approach to getting more interesting distributions.

Idea for trying this out: have each generator keep track of how many size parameters it can accept. E.g., gen/int takes one size parameter. (gen/list gen/int) takes two (one for the size of the list, one for the size given to gen/int). (gen/tuple g1 g2 g3) takes the sum of the size-parameter-counts for its args. This breaks down when using bind and recursive generators, so we might have to fall back on just passing an infinite number of size parameters, or using the RNG to determine them, or something like that.

On the other hand, I spoke to David directly about this feature and he says he regrets including it in hypothesis due to the high complexity and low value. I didn't ask why he thought it was low value.

Orchestrating Concurrent (and distributed?) Programs

The point being to test them deterministically, and to test different possible linearizations to look for concurrency bugs.

Testing Generators, Regression Tests

Several related topics:

  • How do we validate that a generator is likely to generate certain kinds of things?
  • How could we collect stats about the sorts of things generated?
  • When we find a rare failing test, is there an easy way to ensure that such cases are generated more often?

Need to research other QC impls.

Generator Syntax Sugar

We ended up adding gen/let for this, but test.chuck/for is still fancier and especially useful for tuples.

Shrinking Topics

Check other QC impls for all of this.

Can we use more specialized generators to minimize the need for bind?

There was an example somewhere of trying to generate a 2d vector, where doing it without bind seemed impossible but bind caused it to shrink poorly.

Also see "Stateful Generators" elsewhere on this page.

Should shrink-resumption be part of the API?

 

Custom shrinking algorithms?

 

Advanced Generators

"Stateful" Generators

There are common use cases for being able to generate sequences of events, where each event can affect what future events are possible. Perhaps the most natural is modeling a collection of records with create/update/delete events, where you can only have an update or delete for a record that already exists (and no updates on a deleted record).

This can be done pretty easily with a recursive generator that uses gen/bind at each step, but this comes at the huge expense of almost entirely thwarting the shrinking process. It would be great to devise some general way of constructing these sorts of generators that also shrinks well. This seems easier for special cases like the record-set case described above, but difficult in the general case.

Single Set of Independent Records

A specialized generator that takes as input args similar to gen/hash-map (i.e., a description of what keys a record has and generators for each of the keys) and shrinks reasonably well should be possible to write, especially if there are no constraints beyond the ordering of the create/update/delete for individual records.

There would be custom low-level generator code that avoids using gen/bind, paired with custom shrinking code. The shrinking part might only have to ensure that if a shrink removes a "create" event that it deletes all subsequent events for that record. There might also have to be a reified "id" that doesn't shrink or else shrinks all references to an id in unison.

Full Relational Dataset

This would be like the previous example, but would involve multiple datasets ("tables") and relationships between them ("foreign keys"). The tricky part would be intelligent handling of foreign keys. If a record's relative disappears, the shrinking code would manually do cascading deletes when necessary (or perhaps divert foreign keys to other records when that makes sense).

General Problem

The only natural framing of the general problem that I can think of is a generator such as:

(defn gen-events
"Given an init-state and a reduce function for determining the
current state from a sequence of events, together with a function
that takes a state and returns a generator of a new event, returns
a generator of sequences of events."
[reduce-func init-state state->event-gen]
...)

It seems difficult to find a general way to layer smart shrinking on top of this sort of API, since any change to earlier events could potentially require arbitrary changes to future events, and I'm not sure how to build in a way of allowing the user to specify those changes.

One idea is to take an additional argument, a function that receives an event that was previously generated and a new state that is the precursor to that event. The function can choose to leave the event as-is (if it's still valid in the new state), apply return a function suitable for use with fmap (if the event can be easily adjusted to the new state), or signal that the event should be removed altogether. I think this should be sufficient for implementing the two dataset examples above, but I'm not sure whether it works for more complex uses.

Breaking Changes Fantasy

If we decided to make a large breaking-changes release, what would we include?

  • Rename the pos-int/s-pos-int style functions
  • Change the arg order for bind? It's awkward that it's opposite fmap
  • Change the multi-clause behavior of prop/for-all?
    • Currently can't integrate test.chuck/for w/ prop/for-all support without breaking this

 

Old/Obsolete Notes

Things that are done or decided against.

Immutable RNG

I think we need to do some more performance checks. Ideas for things to look into:

  • How does linear generation with java.util.SplittableRandom compare to JUR and IJUSR? Knowing this should give us an idea if the slowdown with IJUSR is more about the algorithm or immutability
    • Seems to be entirely about immutability. E.g., JUSR is actually slightly faster with JUR, even after removing concurrency from JUR.
  • Can we squeeze any extra performance by using a more specialized API anywhere?
    • Like split-n
      • This is looking like the most promising
  • Can we generate batches of numbers faster than the more general splitting method? If so, can we take advantage of that cleanly in the generators?
    • E.g., you could linearly generate 256 longs, put them in an immutable array, and have the RNG reference a range of that array. rand-long returns the first number in the range, splitting returns two objects with the range cut in half. Splitting a singleton range triggers another array creation. I think there are still some subtleties to work out though.
    • It's also worth noting that the splittable API involves twice as many allocation as a linear-immutable API (see split-n idea above)
  • Is the macro usage subverting inlining?
  • Is the long->double->int process taking a while?
  • Try using an unrolled pair (ztellman style) for the return value from split

Some code for messing with this stuff is on this branch.

Labels: