defrecord and deftype improvements for Clojure 1.3

Motivation

The Java unification of records prevents them from being first class, in either the data or fn sense:

  • record data is not first class
    • can't read/write them
      • crummy choice: maps are good as data, need records for protocol polymorphism
    • user code cannot fix this
      • anything that requires EvalRead is not a fix
  • record creation is not first class
    • no per-record factory fn (or access to any associated fn plumbing, e.g. apply)
    • Clojure level use/require doesn't get you access to records
    • user code can mostly fix this (defrecord+factory macro)
  • Symmetrically, plain old Java classes are also not first class
    • A unified reader form would be ideal
  • Reduced import complexities

Solutions

For the sake of discussion, focus will revolve around example defrecords and deftypes defined as

(ns myns)

(defrecord MyRecord [a b])
(deftype MyType [a b])

Semantics of records as first class data

The semantics of record reader forms and record factory functions are defined as follows:

(-> (MyRecord. <initialization value> <initialization value>)
    (into {:a 1, :b 2})
    <validation>)

Note: the semantics illustrated above should not be taken as an implementation detail. At the moment, <validation> is undefined and should be considered a no-op.

The <initialization value> is the default value for the field's type: the usual Java default for primitive fields (as declared by type hints on the record fields), or nil for object fields. For record reader forms, the keys and values must be constants, because the semantics require that the readable form coincide with the evaluable form.
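As a rough illustration of these semantics, the sketch below compares the generated map factory with the `into` expansion shown above. It assumes a Clojure 1.3 or later REPL and unhinted fields (so the initialization values are nil), and it leaves out <validation>, which is a no-op.

```clojure
(ns myns)

(defrecord MyRecord [a b])

;; The map factory should agree with "construct with initialization
;; values, then pour the map in" (unhinted fields initialize to nil).
(def via-factory (map->MyRecord {:a 1, :b 2}))
(def via-into    (into (MyRecord. nil nil) {:a 1, :b 2}))

;; Both are MyRecord instances with equal field values.
(= via-factory via-into)

;; A key missing from the map keeps its initialization value.
(:b (map->MyRecord {:a 1}))

;; A key that is not a declared field is kept as ordinary map data.
(:c (map->MyRecord {:a 1, :b 2, :c 3}))
```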

Record and Type reader forms

There would be two additional reader forms added to Clojure.

Labelled record reader form

#myns.MyRecord{:a 1, :b 2}

Positional record and type reader forms

#myns.MyRecord[1 2]

and

#myns.MyType[1 2]

This syntax satisfies the need for a general-purpose Java class construction reader form. However, not all Java classes are considered fully constructed after the use of their constructors. Therefore, serialization support is not provided for any Java classes by default. For instances such as these, Clojure will continue to provide facilities via print-dup in the known ways.
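A minimal sketch of reading these forms, assuming Clojure 1.3 or later, the default *read-eval* setting, and that the record and type classes are already loaded. read-string is used here only to make the read step explicit.

```clojure
(ns myns)

(defrecord MyRecord [a b])
(deftype MyType [a b])

;; Both record forms read to the same value once the class is loaded.
(def labelled   (read-string "#myns.MyRecord{:a 1, :b 2}"))
(def positional (read-string "#myns.MyRecord[1 2]"))

(= labelled positional (MyRecord. 1 2))

;; Types only have the positional form; their fields are reached via interop.
(def t (read-string "#myns.MyType[1 2]"))
(.a t)
```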

Generated factory functions

When defining a new defrecord, two functions will also be defined in the same namespace as the record itself. For new deftypes, only the positional constructor outlined below is generated.

Factory function taking a map (defrecord only)

A factory function named map->MyRecord taking a map is defined by defrecord.

(myns/map->MyRecord {:a 1, :b 2})

;=> #myns.MyRecord{:a 1, :b 2}

Factory function taking positional values (defrecord and deftype)

A factory function named ->MyRecord taking positional values (as defined by the record ctor) is also defined by defrecord; deftype defines the analogous ->MyType.

(myns/->MyRecord 1 2)

;=> #myns.MyRecord{:a 1, :b 2}

and

(myns/->MyType 1 2)

;=> #<MyType myns.MyType@2ed277f2>
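Because the factories are ordinary functions, they compose with the rest of the language in a way the class constructor does not. The following is a small sketch under that assumption; the sample rows and maps are made up for illustration.

```clojure
(ns myns)

(defrecord MyRecord [a b])

;; Factories can be passed to higher-order functions and used with apply.
(def from-rows (map #(apply ->MyRecord %) [[1 2] [3 4]]))
(def from-maps (map map->MyRecord [{:a 1, :b 2} {:a 3, :b 4}]))

;; Both routes build the same records.
(= from-rows from-maps)
```

Since the factories are plain vars in myns, another namespace can reach them with an ordinary require of myns, without importing the MyRecord class.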

Writing records

When writing record data for the purposes of serialization, the positional reader form is used by default:

(binding [*print-dup* true]
  (pr-str (MyRecord. 1 2)))

;=> "#myns.MyRecord[1, 2]"

However, if you wish to use the map reader form instead, then the following would work:

(binding [*print-dup* true
          *verbose-defrecords* true]
  (pr-str (MyRecord. 1 2)))

;=> "#myns.MyRecord{:a 1, :b 2}"

note: printing forms for types are not provided by default
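A round trip through both print forms might look like the sketch below. It assumes Clojure 1.3 or later and the default *read-eval* setting, and it checks only that each printed string reads back to an equal record.

```clojure
(ns myns)

(defrecord MyRecord [a b])

(def r (->MyRecord 1 2))

;; Positional form (the default under *print-dup*).
(def short-form
  (binding [*print-dup* true]
    (pr-str r)))

;; Map form, selected with *verbose-defrecords*.
(def long-form
  (binding [*print-dup* true
            *verbose-defrecords* true]
    (pr-str r)))

;; Either string reads back to a record equal to the original.
(= r (read-string short-form) (read-string long-form))
```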

Tool support

Defining Clojure defrecords will also expose static class methods usable at the Java API level. These methods are not intended for public consumption and are considered implementation details.

Static factory for defrecords

The static factory exposed will mirror the map->MyRecord function:

(MyRecord/create aMap)

Basis access

A static method allowing access to the basis keys will also be provided:

(MyRecord/getBasis)
;=> [a b]

and

(MyType/getBasis)
;=> [a b]

The getBasis method will return a PersistentVector of Symbols with (potentially) attached metadata for each field.
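As a sketch of how tooling might use these static methods (keeping in mind that they are implementation details and may change), the following derives the keyword keys a record expects from its basis. The field-keys name is hypothetical.

```clojure
(ns myns)

(defrecord MyRecord [a b])
(deftype MyType [a b])

;; Static map-based factory, mirroring map->MyRecord.
(= (MyRecord/create {:a 1, :b 2})
   (map->MyRecord {:a 1, :b 2}))

;; The basis is a vector of field symbols in declaration order.
(MyRecord/getBasis)
(MyType/getBasis)

;; A tool can turn the basis into the keyword keys used by the map factory.
(def field-keys (vec (map keyword (MyRecord/getBasis))))
```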

Old Ideas

Lesser Problems:

  • generic factory fn
    • like factory fn, but generic with name
    • introduces weak-referencing, modularity issues, etc.
    • don't have a good problem statement, so ignoring this for now
    • which comes first: generic or specific?
  • support for common creation patterns
    • named arguments
      • with more than a few slots, record construction is difficult to read
    • default values
      • maybe needs to be a property of factory fn, not record
      • different factory fns can have different defaults
    • validations
    • are the patterns truly common?
    • very solvable in user space, esp. if per-record factory fn available
  • application code needing to know record fields
    • synthesizing data
    • creating factory fns if we don't provide them

Challenges:

  • how evaluative should record read/write be?
    • option 1: records are data++: no EvalReader needed, no non-data semantics
    • option 2: records are more:
      • maybe EvalReader required?
      • maybe special eval loopholes for constructor fns?
    • option 1 wins
  • what happens when readers and writers disagree about a record's fields?
    • positional approach would either fail or silently do the wrong thing
    • k/v approach lets you get back to the data
      • still on you to fix it
  • does this have to be a breaking change?
    • data print/read: no
    • constructor fn: yes
      • any good generated name likely to collide with what people are using
  • what if defrecord is not present on the read side?
    • fail?
    • create a plain map instead
      • plus tag in data?
      • plus tag in metadata?
      • reify in a tagging interface
    • attempt to load
      • no – could lead to arbitrary code injection during read
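The disagreement between readers and writers can be made concrete with a small sketch. It uses the per-record factory functions from the Solutions section; the record P and the "written" data are hypothetical stand-ins for what an older or newer writer might have produced.

```clojure
(ns myns)

;; Reader side: the record currently has two fields.
(defrecord P [x y])

;; Data produced by a writer whose P had three fields (x, y, z).
(def written-kv         {:x 1, :y 2, :z 3})
(def written-positional [1 2 3])

;; k/v approach: construction succeeds and the unknown key survives as data.
(def recovered (map->P written-kv))
(:z recovered)

;; Positional approach: the arity no longer matches, so construction fails.
(def positional-result
  (try
    (apply ->P written-positional)
    (catch IllegalArgumentException e
      :arity-mismatch)))
```

If the field count had stayed the same while the names or order changed, the positional call would succeed and silently put values in the wrong slots, which is the harder failure to detect.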

Some Options:

  • create reader/writer positional syntax, no constructor fn
    • pros
      • easy to deliver efficiently
      • non-breaking
      • introduces no logic (user or clojure) into print/read
    • cons
      • what happens if defrecord field count changes?
      • what happens if field names change?
        • no way to know
    • feels like a non-starter
  • create reader/writer kv syntax, no constructor fn
    • pros
      • non-breaking
      • introduces no logic (user or Clojure) into print/read
      • can still recover data if defrecord structure has changed
    • cons
      • how to deliver read efficiently?
        • create empty object + merge
          • cache the empty object we merge against?
        • reflect against object and manufacture reader fn
          • who keeps track of this?
          • how would this interact with constructor, if we add that separately?
        • add a map-based constructor to defrecord classes
          • what would its signature be?
        • add a static map based factory fn to defrecord classes
  • reader/writer syntax that depends on a new factory fn
    • pros
      • can be efficient
      • can implement any policy in handling defrecord changes
    • cons
      • likely breaking (what will the fn names be?)
      • read/write now depends on fns
  • positional constructor fn
    • no
    • replicates the weakness of existing constructors
  • kv constructor fn
    • open questions
      • autogenerated for all defrecords?
      • optional?
      • conveniences (defaults, etc.)
        • no

Tentative Proposal 1:

Define a k/v syntax for read and write that does not require a factory fn.

  • adopt the existing print syntax as legal read syntax?
    • "#:user.P{:x 1, :y 2}"
  • get Rich's input on efficient reader approach (4 possibilities listed above)
  • if reader defrecord fields are different, merge and move on
  • Undecided: if record class not loaded:
    • TBD: error or make a plain ol map?
    • hm, could fix on writer side: option to dumb records down to maps?

Tentative Proposal 2:

Autogenerate a k/v factory fn for all defrecords.

  • (new-foo :x 1 :y 2)
  • class constructor is an interop detail
  • factory fn is the Clojure way
  • people can build their own defaults, validation, etc. easily with macros, given this
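To illustrate the last point, here is one way a user could layer defaults over a generated factory. This is only a sketch: new-foo and its default values are hypothetical, and it builds on the map->Foo factory from the Solutions section rather than on an autogenerated k/v factory as this proposal imagined.

```clojure
(ns myns)

(defrecord Foo [x y])

;; Hypothetical k/v factory with defaults, layered on the generated map factory.
(defn new-foo [& {:as kvs}]
  (map->Foo (merge {:x 0, :y 0} kvs)))

;; All fields supplied.
(new-foo :x 1 :y 2)

;; Omitted fields fall back to the defaults.
(new-foo :y 5)
```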

Some history:

The record multimethod was almost ready to go when Rich raised the GC issue. What happens when somebody creates a ton of record classes over time? GC can collect records that are no longer in use, but doesn't clean up the old multimethod functions.

Additional Reading

Some (non-contributed) code that demonstrates people's need for this:

  1. Jan 04, 2011

    I have no idea what is being proposed from this, nor what else was considered, nor what the tradeoffs are.

    1. Jan 05, 2011

      Worse than that, the proposal wouldn't work even if we had made it readable. Sleeping mind thinks this updated proposal would work. There are two questions awaiting your input:

      1. Is the double-weak-map approach the simplest thing that can work? It is the simplest thing I have thought of.
      2. Do we need to worry about the code-execution loophole of requiring #= for print/readable records? I would rather have something that kept serialization more separate from arbitrary execution of code.
  2. Jan 19, 2011

    It's hard for me to tell what the state of this doc is but we have used records extensively and added support for a number of features.  Consider this an experience report from the field and take from it what you find interesting / useful.  

    1. Record printing
      1. Records that print to a form that can eval back to the original record.  One small example where this comes up is a test that prints expected results which you expect to copy and paste back in as the actual results. 
      2. Support for records in pprint. 
      3. We have found it useful to omit nil fields when printing records. We tried this as an experiment and have been happy with it but I can understand why that would be undesirable in core.
    2. Constructor/factory functions
      1. We have found that it is difficult to write and maintain code that constructs records with the current positional constructor for records with more than 2-3 fields.  
      2. An automatically created and named constructor function built into defrecord would be very useful.  Our solution to this takes records like FooBar and creates new-foo-bar.
      3. Support for default values would be handy but we have not found it to be essential and have not implemented it.  It seems from examples on the web that this is a popular feature people have implemented.
      4. Because factory functions that take maps of fields introduce the possibility of using almost-right keys, our factory functions only allow you to use the record field names; extra fields are not allowed in that factory function.  You must use assoc to add non-specified fields.  This was an experiment.  After a year of use, we have found that it regularly catches typos and has been inconvenient only a handful of times.  YMMV.
      5. While assoc is the standard way to create a record based on another record instance, we have found it useful to create a constructor that takes a prototype record and then applies the field validation check described in item 4 above.
      6. Having a map factory function has been helpful in simplifying ns's, as you do not need to both require and import from an external ns.  
    3. Generic record support - we have largely run into these in generic macros and library code that deals frequently with records of many types
      1. Checking whether something is a record with a record? function (maybe using an IRecord marker)
      2. Universal factory function that takes a record class and an initialization map to create any record.
    4. Integration with libraries across the land
      1. clojure.walk - we added this during construction but are not using it much
      2. clojure.zip - we added this and are using it a lot. I'd have to go back and check but I'm pretty sure we used the universal factory function in this impl.
      3. many other external libs that are not of importance here :)
  3. Apr 06, 2011

    I wrote up my thoughts and questions on the proposal here: http://david-mcneil.com/post/4403345585/defrecord-improvements-feedback It echoes much of what Alex said above.

    1. Apr 06, 2011

      Thanks for writing this up. Responses to a few of your items:

      4. We are worrying about a print form that will be readable, so omitting the namespace is not an option. There can be other print formats, of course.

      5. The universal constructor may happen later, but not in the scope of the smallest shippable improvement.

      6. Agreed, we should have a record? predicate.

      8, 9. Automatic multimethod participation is tricky to do generally in a way that is performant but also class-loader and modularity friendly. Do you have working code that covers this?

      1. May 07, 2011

        > We are worrying about a print form that will be readable, so omitting the namespace is not an option. There can be other print formats, of course.

        Hmm... "not an option", but "there can be other print formats"... I wasn't asking for this to be the default, but rather I was asking for an option to exclude the namespace. Seems like maybe this could be another print format? From my experience using records intensively on real code this is quite valuable when debugging and writing test code that uses trees of records.

        > Automatic multimethod participation is tricky to do generally in a way that is performant but also class-loader and modularity friendly. Do you have working code that covers this?

        Yes, https://github.com/david-mcneil/defrecord2

        Thanks for the response (sorry for my delayed response).

        -David

        1. May 12, 2011

          > quite valuable when debugging and writing test code that uses trees of records.

          You might find the (->R ...) to be more succinct and likewise more flexible in those cases.

          > Yes, https://github.com/david-mcneil/defrecord2

          Sweet! I can't wait to look at your code more deeply.

          Thanks
          :F

          1. May 16, 2011

            > You might find the (->R ...) to be more succinct and likewise more flexible

            I don't know what "->R" is and google was not able to help. Or was that a typo?

            -David

  4. May 07, 2011

    A few questions relating to the current patch on CLJ-374:

    Why does CtorReader eval its arguments? The implementation of CtorReader.resolve parallels the implementation of EvalReader.invoke, except it (understandably) doesn't perform a *read-eval* check. Is there a reason non-literal arguments to a record constructor should not need to use the eval reader macro?

    Records can be safely instantiated in the reader since we control their constructor implementation, but that's not necessarily true of other classes. Currently the CtorReader will, using the #myns.MyRecord[arg] positional format, instantiate any class, e.g., #java.util.Date[0]. Is that openness of the reader intentional, given that it is not guarded by *read-eval*?

    Assuming the above is acceptable, once the data structure from the reader is passed to the compiler, any class it doesn't recognize is emitted as a ConstantExpr. Is that always appropriate?


    My sense is that CtorReader should be restricted to instantiating instances of the IRecord marker interface, and that it should, like all other non-EvalReader readers, treat its arguments as literal Clojure data structures, leaving the work of evaluating arguments to the eval reader macro as needed.

    1. May 12, 2011

      Hi Alex,

      Thanks for the questions. They were mostly targeted at a previous version of the patch, but I will try to address them the best that I can.

      > Why does CtorReader eval its arguments?

      Currently it does not:

      (defrecord R [a])
      #user.R[(+ 1 2 3)]
      ; IllegalArgumentException Constructor literal can only contain constants or statics.
      
      #user.R[{:foo (str :bar)}]
      ;=> #user.R{:a {:foo (str :bar)}}
      

      > Currently the CtorReader will, using the #myns.MyRecord[arg] positional format, instantiate any class...

      Yes it will, but that's not the end of the story. First, the #foo.bar.Klass... reader form will attempt to call a ctor for the class Klass but it will only get you so far.

      #java.lang.String["foo"]
      ;=> "foo"
      
      (keyword #java.lang.String["foo"])
      ;=> :foo
      
      (import 'java.util.Date)
      #java.util.Date[10101001]
      ;=> #<Date Wed Dec 31 21:48:21 EST 1969>
      
      (bean #java.util.Date[10101001])
      ; CompilerException java.lang.RuntimeException: Can't embed object in code, maybe print-dup not defined
      

      For objects that have print-dup definitions we can embed them in other Clojure forms – the compiler is happy. For those that do not we either need to define print-dup for them, or use some other method. For arbitrary Java classes we can not assume that a call to its constructor results in a fully constructed object. For records and types we, as you say, have control over their construction and can make different assumptions.

      > any class it doesn't recognize is emitted as a ConstantExpr. Is that always appropriate?

      I'm not sure what you mean. Do you mind rephrasing?

      Thanks again.
      :F

      1. May 14, 2011

        Note that some of this might be better directed at Stu, assuming he was the one driving these changes.

        > Currently it does not [eval its arguments]

        Most of the eval'ing was removed, but it does try to eval symbol args as classes, and only as classes:

        user=> (defrecord R [x])
        user.R
        user=> (print-dup-string (R. 'a))
        "#user.R[a]"
        user=> (read-string *1)
        IllegalArgumentException Constructor literal can only contain constants or statics. a does not name a known class.  clojure.lang.LispReader$CtorReader.resolve (LispReader.java:1206)
        

        This is interesting behaviour considering classes are print-dup'd with the eval reader macro:

        user=> (print-dup-string (R. String))
        "#user.R[#=java.lang.String]"
        

        Which raises the question of what would be sending something like #user.R[java.lang.String] to the reader. Is this (now committed to master) functionality intended to enable reading of print-dup'd records or allow humans an alternate way to type record instances?

        > For objects that have print-dup definitions we can embed them in other Clojure forms – the compiler is happy.

        If the literal notation is intended to be emitted by print-dup for non-records, then that's not true, as the (bean #java.util.Date[10101001]) example shows. The preceding assumes someone wrote a print-dup method for Date, emitting that notation. If that is not correct, then what is the purpose of the constructor literal for non-records?


        Finally, it's not clear to me whether the "EvalRead is not a fix" requirement is meant to apply just to the record or to its arguments as well. The latter seems unlikely as the set of readable-without-eval types is outside the scope of this change (records excepted). If the former, then this all seems to be sugar for calling the record constructor while avoiding #=. If that's the case, then why not allow the reader to read the literal notation and emit a form that the compiler can then process? E.g., reading a string "#myns.ARec[#myns.BRec[5]]" and returning a Clojure data structure of (new myns.ARec (new myns.BRec 5)), which will then be passed to the compiler.

        Though perhaps there is a desire for record values to be emitted as constants rather than as a call to a constructor. If so, then the process would need to parallel that for maps, namely that the reader creates some data structure (not an instance of the specific record class) which is passed to the compiler, which in turn checks if the arguments are all LiteralExpr before emitting as a ConstantExpr, otherwise a runtime call is made.

        1. May 16, 2011

          > IllegalArgumentException Constructor literal...

          Hi Alex,

          This was a result of my attempting to be too clever with the Reader and will be fixed in the next release.

          I'll read your other questions more closely and respond post-haste.
          :F

  5. May 16, 2011

    The Writing records section seems wrong. We should never be printing something that can't be read. Printing factory fn calls would require evaluation to restore. I.e. these should always print #something...

    Also, deftype behavior needs to be spelled out in all cases.

    1. May 16, 2011

      Indeed you're right and #something is what the implementation does currently.  I will bring the text up to date regarding the Writing and the deftype behavior.