
Java collection types currently print as

#<ClassName {toString rep}>

As a result, their contents are not subject to control via Clojure's flags:

*print-length*, *print-level*

This is a real pain when working with large Java data structures at the REPL. I propose that, when their print forms are not overridden in derived classes, objects that implement the core Java collection interfaces print:

  • with the same rep as Clojure's persistent collections
  • subject to the various print binding flags
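
For reference, a minimal sketch of how these flags already limit output for Clojure's persistent collections, which is the behavior the proposal would extend to Java collections. The sample values are illustrative only.

```clojure
;; *print-length* caps the number of items printed per collection.
(def truncated
  (binding [*print-length* 3]
    (pr-str (vec (range 10)))))
;; truncated is "[0 1 2 ...]"

;; *print-level* caps nesting depth; deeper collections print as #.
(def shallow
  (binding [*print-level* 1]
    (pr-str [1 [2 [3]]])))
;; shallow is "[1 #]"
```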

Not in scope

  • Per Rich's comment in the discussion, Java collections should not expose their type in their print rep, so an ArrayList should print as e.g. [1 2], like any other sequential collection.
  • print-dup 
  • Print/read roundtripping to a specific concrete type
  1. Aug 27, 2012

    I don't think the display matters nearly as much as its ability to follow the rules. Obeying the rules would be a great change. If there were changes made to how the structure was printed back, I would still want to know that it was a Java object, though. I wouldn't want to accidentally assume the guarantees of a vector when in fact I was holding an array. I think that it would be painfully obvious when the program executed, but keeping things visually distinct doesn't seem like a bad idea.

  2. Aug 27, 2012

    I don't think it's a good idea to hide the Java type. It obscures important information in the REPL and during debugging. It also creates asymmetric printing/reading, where the existing #<...> format would throw an error to protect you from such a mistake.

    What about utilizing tagged literals? "#java.util.ArrayList [1 2]"

  3. Aug 27, 2012

    Tagged literals can't have dots in their tag names, because then the reader assumes it's a record/type literal. You could make up some other tag like clojure.java/array-list.

    But I agree with Aaron that it's more important to obey the printing rules.

    1. Aug 28, 2012

      Is there an inherent ambiguity here? Or is this just an artifact of the current implementation?

      Here's the printed (and readable!) form for a Record:

      I'm not sure why we couldn't also support:

      The Tagged Literals page mentions ambiguity with record constructors, but doesn't demonstrate it. Could you elaborate?
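
      As a rough illustration of the record round trip referred to in this comment, here is a minimal sketch. Point is a hypothetical record introduced only for this example, and the sketch assumes the default *read-eval* setting.

      ```clojure
      ;; Point is a hypothetical record used only for illustration.
      (defrecord Point [x y])

      (def p (->Point 1 2))

      ;; The printed form is a record literal prefixed with the dotted class
      ;; name; the exact prefix depends on the namespace that defines Point.
      (def printed (pr-str p))

      ;; With the default *read-eval* setting, the reader rebuilds an equal record.
      (def round-tripped (read-string printed))
      ```

      The dotted prefix in the printed form is what the reader uses to tell record literals apart from tagged literals.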

      1. Aug 28, 2012

        Having the reader know how to make mutable, platform-specific data structures (hereafter "host literals") is separate from the objective of this proposal, but is certainly worth considering while we are here.

        That said, I don't want host literals (at least not as part of this more limited proposal). Host literals don't solve the problem at hand, as we have no guarantee that we can make them. Your example works because we know about ArrayList, but what about List implementations not in the JDK? The list interface does not promise a one-arg constructor form for all its concrete types.

        If we are going to do host literals, I would argue for doing them as a separate proposal, and doing the print side only when print-dup is on. We would need to consider the syntax you suggest, but also perhaps a categoric syntax, e.g.

        #native.MutableList[]

        which could be portable across Clojure dialects. 

        1. Aug 28, 2012

          > platform-specific data structures (hereafter "host literals") is separate from the objective of this proposal, but is certainly worth considering while we are here.

          After thinking about this a bit more, I fully agree, but allow me to elaborate on why, so that we can make a fully informed decision about print control and, in particular, the question of including type information in the printed output.

          > The list interface does not promise a one-arg constructor form for all its concrete types.

          It's true: I often forget that constructors are just less-flexible functions that don't play nice with interfaces/protocols. I try to avoid the Java-ism of factory interfaces whenever I can. However, even if an interface could promise a constructor, you wouldn't want to go willy-nilly calling constructors by interface. That would be a security hole: I could define a class which implements that factory interface but runs arbitrary code during construction. The reader should be considered safe to use in pretty much all situations, unless you explicitly turn this off. Hence why I think about tagged literals as an eval-reader whitelist.

          For "host literals", I'm proposing a small extension of the tagged literal reader. Right now, we have symbol -> factory function, where the symbol is restricted to foo.bar/baz and not foo.bar.Baz, but we could relax that constraint (assuming this can co-exist with record literals). Each host could install factory functions for known types (and extend-protocol for printing). As you suggested, we could also have some agreed-upon native.MutableList sort of thing for portability if appropriate. Record literals could just be tagged literals whose symbol and factory function pair gets installed automatically on defrecord.
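
          The existing symbol -> factory function mechanism can be sketched as follows. The tag clojure.java/array-list (the name floated earlier in this thread) and the function read-array-list are hypothetical; nothing in Clojure installs them.

          ```clojure
          (import 'java.util.ArrayList)

          ;; Hypothetical factory function: builds an ArrayList from the read vector.
          (defn read-array-list [coll]
            (ArrayList. ^java.util.Collection coll))

          ;; clojure.java/array-list is a made-up tag used only for this sketch.
          (def al
            (binding [*data-readers* {'clojure.java/array-list #'read-array-list}]
              (read-string "#clojure.java/array-list [1 2]")))
          ```

          Only tags present in the map are honored, which is what makes this behave like a whitelist.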

      2. Aug 29, 2012

        It's an implementation detail. The 1.4 reader looks for dots in a tag name as an indication that it's a record or class constructor. 

        1. Aug 29, 2012

          That's what I thought. Good news!

          Maybe for 1.5, we just change that implementation detail, such that tagged literals and record forms are unified. defrecord can install a tagged reader, and we'll have the option of installing tagged readers for host types as well. Then there is no more concept of a "host literal"; it's just a normal tagged literal. Installing read/print support for a host type would just be a matter of a macro which simultaneously extends printable and installs a tagged literal.

          That makes host literals a solved problem in my mind. The print implementations for host literals can obey the print control vars. That moves the discussion back to what to print for things that implement java.util.List, for example, but don't have a tagged literal: 1) discard reader 2) unreadable form 3) metadata 4) something else?

          My intuition is to default to #2, but always print the type and obey the print control parameters. There should be a var to switch to #3. Alternatively, #3 could be the default with the toggle back to #2.
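
          One way the "always print the type and obey the print control parameters" idea might look for a single host type is sketched below. This is an assumption-laden illustration, not a proposed patch: the tag clojure.java/array-list is hypothetical, and overriding print-method for ArrayList changes printing globally.

          ```clojure
          (import 'java.util.ArrayList)

          ;; Sketch only: writes a hypothetical tag, then delegates to vector
          ;; printing so *print-length* and *print-level* apply. Copying with vec
          ;; is wasteful for large lists; a real patch would print in place.
          (defmethod print-method ArrayList [^ArrayList al ^java.io.Writer w]
            (.write w "#clojure.java/array-list ")
            (print-method (vec al) w))

          (def full (pr-str (ArrayList. [1 2 3])))

          (def truncated-list
            (binding [*print-length* 2]
              (pr-str (ArrayList. [1 2 3]))))
          ;; truncated-list is "#clojure.java/array-list [1 2 ...]"
          ```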

  4. Aug 28, 2012

    There are two reasons to want the Java type:

    1. The (programmatic) reader could know about the type, in order to rehydrate that type. This case is already handled by binding print-dup.
    2. The (human) reader wants to know about the type. One way to do that would be to emit the type inside a discard reader macro, e.g.  
      #_java.util.ArrayList [1 2 3]
    I like the discard reader approach. It will produce data that can be read back, which I think is a good thing. Brandon, why do you want to break the reader in such cases?
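
    A minimal sketch of how the reader treats that form today: the class name is skipped and only the vector is returned.

    ```clojure
    ;; #_ discards the next form (here the class-name symbol), so the reader
    ;; returns only the vector that follows it.
    (def v (read-string "#_java.util.ArrayList [1 2 3]"))
    ;; v is [1 2 3], an ordinary persistent vector
    ```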
    1. Aug 28, 2012

      print-dup depends on the #= eval reader form, which is a giant gaping security hole for many use cases. I view tagged literals as a sort of whitelist of functions that can safely be used to rehydrate types. print-dup also requires you to activate it explicitly, which defeats the goal of improving REPL sessions.

      Assuming there isn't an insurmountable ambiguity with tagged literals, it seems preferable to have the printed data be useful to both humans and the programmatic reader. Whether or not you want printing to be precisely reversible depends on your use case. Sometimes you want to send data across the wire, so you don't care if the recipient uses the same type as you. In such a case, maybe you could output tag metadata: ^java.util.ArrayList [1 2 3]? That way both the human reader and the programmatic reader have access, but the programmatic reader ignores it by default. In other cases, you do want precise reversibility, which is what I guess print-dup is for. It seems like as-reversible-as-possible is the right default, throwing an error if that's not achievable (hence the unreadable dispatch macro's behavior). When you get such an error, you're prompted to make a decision about what behavior you want. Printing the type as discarded seems like you're setting yourself up for a subtle bug later, when you realize you depended on the exact type of your data. Maybe print-dup being a boolean is too simple and we need a little extra configuration. At the very least, there should be a secure way to rehydrate platform types.
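
      A minimal sketch of how the reader handles the metadata form suggested above: the value reads as a plain vector and the type name is kept as :tag metadata.

      ```clojure
      ;; ^Symbol is reader shorthand for {:tag Symbol} metadata on the next form.
      (def tagged (read-string "^java.util.ArrayList [1 2 3]"))

      ;; The value is an ordinary vector; the type name rides along as metadata.
      (def tag (:tag (meta tagged)))
      ;; tag is the symbol java.util.ArrayList
      ```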

      1. Aug 28, 2012

        Can you elaborate on how the discarded type info can cause a subtle bug? It seems the opposite to me – since it is discarded, there is no way you can depend on it. And since the following data structure is ordinary, there is no way you can accidentally depend on host-isms.

        1. Aug 28, 2012

          I'd need to think more deeply about the common use cases of print and read, but my expectation is that, being inverses of each other, print and read should hold the properties of functional inverses. Specifically, I expect (= x (read-string (pr-str x))); that is, (comp read-string pr-str) is equivalent to identity. Unfortunately, that's not the case in practice, because print and read are implemented separately and there exist printable forms that are not currently readable. In mathematical terms, read-string is a partial inverse of pr-str for the domain of readable forms.

          Also unfortunately, printing is not currently an injective function if you consider type; there are multiple values in the domain that produce the same value in the co-domain (e.g. hash-map and array-map both print the same). This is where print-dup comes in: it makes printing injective. print and read are bijective for the domain of Clojure objects if you ignore type, and bijective with respect to type if print-dup is enabled.

          It seems to me to be a bug for read to produce a value that violates the properties of a functional inverse. However, again, I don't have a strong dependency on this behavior in any of my code; it's just my expectation from the math universe. If real use cases would be better served by not being true inverses, then we should explore concrete examples. Otherwise, it seems like a bug-free implementation would assume that read is a partial inverse of print, and you would get an invalid-argument error of sorts when you try to read something outside of its domain. My proposal to extend the tagged literal machinery comes from the motivation of expanding the domain of the reader to be a true inverse over a greater set of values.

          Sorry about the run-on sentences and mathematical spew. That was just sort of a brain dump.

          Despite the rambling, I don't really care one way or another, as long as the data is available to read somehow. Hence I prefer printing with metadata over the discard reader. This way it's ignored by default, but available (ie not discarded) if necessary.
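
          The round-trip expectation and the injectivity point from this comment can be sketched as follows. The sample values are arbitrary, and the last check assumes only that *print-dup* makes the two map types print differently, not what the exact output is.

          ```clojure
          ;; For plain Clojure data, reading the printed form gives an equal value.
          (def x {:a [1 2 #{3}]})
          (def round-trips? (= x (read-string (pr-str x))))

          ;; Printing is not injective with respect to type: both maps print alike.
          (def same-print?
            (= (pr-str (array-map :a 1)) (pr-str (hash-map :a 1))))

          ;; With *print-dup* bound to true the concrete type is expected to show
          ;; up in the output, so the two printed forms should no longer match.
          (def dup-differs?
            (binding [*print-dup* true]
              (not= (pr-str (array-map :a 1)) (pr-str (hash-map :a 1)))))
          ```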

  5. Sep 06, 2012

    I don't think any of the arguments for including type information hold up. If you want to see types, use print-dup. If you don't like that print-dup uses #= for Java collections, propose a patch for that, e.g. via tags: #java.util.ArrayList [1 2 3]. If you want to see the class of something, call class. Using metadata overloads that space: whether it is real metadata or type metadata would have to be checked everywhere.

    Simple print and read are about facilitating communicating information with a minimum of implementation specificity - that's why sorted/hashed Clojure collections are undistinguished.

    If we are going to start widening the scope of the use of Clojure data for communicating information between langs (which is good for Clojure), we'd better minimize specificity. We are going to be advocating that other langs just send us their collections plainly - let's start with Java.