<< Back to previous view

[CLJ-1100] Reader literals cannot contain periods Created: 02/Nov/12  Updated: 13/May/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Defect Priority: Minor
Reporter: Kevin Lynagh Assignee: Steve Miner
Resolution: Unresolved Votes: 1
Labels: reader

Attachments: Text File CLJ-1100-reader-tags-with-periods.patch     Text File clj-1100-v2.patch    
Patch: Code and Test
Approval: Screened


The reader tries to read a record instead of a literal if the tag contains periods.

user> (binding [*data-readers* {'foo/bar #'identity}] (read-string "#foo/bar 1"))
user> (binding [*data-readers* {'foo/bar.x #'identity}] (read-string "#foo/bar.x 1"))
ClassNotFoundException foo/bar.x  java.lang.Class.forName0 (Class.java:-2)

Summary of reader forms:

Kind Example Constraint Status
Record #user.Foo[1] record class name OK
Class #java.lang.String["abc"] class name OK
Clojure reader tag #uuid "c48d7d6e-f3bb-425a-abc5-44bd014a511d" not a class name, no "/" OK
Library reader tag #my/card "5H" not a class name, has "/" OK
  #my.ns/card "5H" not a class name, has "/" OK
  #my/playing.card "5H" not a class name, has "/" BROKEN - read as record

Note: reader tags should not be allowed to override the record reader.

Cause: In LispReader, CtorReader.invoke() decides between record and tagged literal based on whether the tag has a ".".

Proposed: Change the discriminator in CtorReader by doing more string inspection:

  • If name has a "/" -> readTagged (not a legal class name)
  • If name has no "/" or "." -> readTagged (records must have qualified names)
  • Else -> readRecord (also covers Java classes)

Tradeoffs: Clojure-defined data reader tags must not contain periods. Not possible to read a Java class with no package. Avoids unnecessary class loading/construction for all tags.

Patch: CLJ-1100-v2.patch

Screened by: Alex Miller

Alternatives considered:

Using class checks:

  • Attempt readRecord (also covers Java classes)
  • If failed, attempt readTagged

Tradeoffs: Clojure tags could not override Java/record constructors - not sure that's something we'd ever want to do, but this would cut that off. This alternative may attempt classloading when it would not have before.

Comment by Steve Miner [ 06/Nov/12 9:41 AM ]

The suggested patch (clj-1100-reader-literal-periods.patch) will break reading records when *default-data-reader-fn* is set. Try adding a test like this:

(deftest tags-containing-periods-with-default
      ;; we need a predefined record for this test so we (mis)use clojure.reflect.Field for convenience
      (let [v "#clojure.reflect.Field{:name \"fake\" :type :fake :declaring-class \"Fake\" :flags nil}"]
        (binding [*default-data-reader-fn* nil]
          (is (= (read-string v) #clojure.reflect.Field{:name "fake" :type :fake :declaring-class "Fake" :flags nil})))
        (binding [*default-data-reader-fn* (fn [tag val] (assoc val :meaning 42))]
          (is (= (read-string v) #clojure.reflect.Field{:name "fake" :type :fake :declaring-class "Fake" :flags nil})))))
Comment by Rich Hickey [ 29/Nov/12 9:36 AM ]

The problem assessment is ok, but the resolution approach may not be. What happens should be based not upon what is in data-readers but whether or not the name names a class.

Is the intent here to allow readers to circumvent records? I'm not in favor of that.

Comment by Steve Miner [ 29/Nov/12 4:01 PM ]

New patch following Rich's comments. The decision to read a record is now based on the symbol containing periods and not having a namespace. Otherwise, it is considered a data reader tag. User
defined tags are required to be qualified but they may now have periods in the name. Tests added to show that
data readers cannot override record classes. Note: Clojure-defined data reader tags may be unqualified, but they should not contain periods in order to avoid confusion with record classes.

Comment by Steve Miner [ 29/Nov/12 4:17 PM ]

I deleted my old patch and some comments referring to it to avoid confusion.

In Clojure 1.5 beta 1, # followed by a qualified symbol with a period in the name is considered a record and causes an exception for the missing record class. With the patch, only non-qualified symbols containing periods are considered records. That allows user-defined qualified symbols with periods in their names to be used as data reader tags.

Comment by Andy Fingerhut [ 07/Feb/13 9:05 AM ]

clj-1100-periods-in-data-reader-tags-patch-v2.txt dated Feb 7 2013 is identical to CLJ-1100-periods-in-data-reader-tags.patch dated Nov 29 2012, except it applies cleanly to latest master. The only change appears to be in some white space in the context lines.

Comment by Andy Fingerhut [ 07/Feb/13 12:53 PM ]

I've removed clj-1100-periods-in-data-reader-tags-patch-v2.txt mentioned in the previous comment, because I learned that CLJ-1100-periods-in-data-reader-tags.patch dated Nov 29 2012 applies cleanly to latest master and passes all tests if you use this command to apply it.

% git am --keep-cr -s --ignore-whitespace < CLJ-1100-periods-in-data-reader-tags.patch

I've already updated the JIRA workflow and screening patches wiki pages to mention this --ignore-whitespace option.

Comment by Andy Fingerhut [ 13/Feb/13 11:31 AM ]

Both of the current patches, CLJ-1100-periods-in-data-reader-tags.patch dated Nov 29 2012, and clj-1100-reader-literal-periods.patch dated Nov 6 2012, fail to apply cleanly to latest master (1.5.0-RC15) as of today, although they did last week. Given all of the changes around read / read-string and edn recently, they should probably be evaluated by their authors to see how they should be updated.

Comment by Steve Miner [ 14/Feb/13 12:23 PM ]

I deleted my patch: CLJ-1100-periods-in-data-reader-tags.patch. clj-1100-reader-literal-periods.patch is clearly wrong, but the original author or an administrator has to delete that.

Comment by Kevin Lynagh [ 14/Feb/13 1:28 PM ]

I cannot figure out how to remove my attachment (clj-1100-reader-literal-periods.patch) in JIRA.

Comment by Steve Miner [ 14/Feb/13 1:43 PM ]

Downarrow (popup) menu to the right of the "Attachments" section. Choose "manager attachments".

Comment by Kevin Lynagh [ 14/Feb/13 2:02 PM ]

Great, thanks Steve. Are you going to take another pass at this issue, or should I give it a go?

Comment by Steve Miner [ 14/Feb/13 3:04 PM ]

Kevin, I'm not planning to work on this right now as 1.5 is pretty much done. It might be worthwhile discussing the issue a bit on the dev mailing list before working on a patch, but that's up to you. I think my approach was correct, although now changes would have to be applied to both LispReader and EdnReader.

Comment by Alex Miller [ 09/Apr/14 10:29 AM ]

Updated description based on my understanding.

Comment by Steve Miner [ 22/Apr/14 3:30 PM ]

I will resurrect my old patch and update it for the changes since 1.5.

Comment by Steve Miner [ 28/Apr/14 8:21 AM ]

Added patch to allow reader tags to have periods, but only with a namespace. Added tests to confirm that it works, but does not allow overriding a record name with a data-reader.

Comment by Steve Miner [ 28/Apr/14 8:51 AM ]

The patch implements Alex's alternative 1. It's purely lexical. A tag symbol without a namespace and containing periods is handled as a record (Java class). Otherwise, it's a data-reader tag. Of course, unqualified symbols without periods are still data-reader tags.

IMHO, a Java class without a package is a pathological case which Clojure doesn't need to worry about. This patch follows the convention that Java classes are named by unqualified symbols containing dots.

I did try alternative 2, testing for an actual class, but the implementation was more complicated. Also, it would open the possibility of breaking working code by adding a record or Java class that accidentally collided with an unqualified dotted tag that had previously worked fine. It's better to follow a simple rule that unqualified dotted symbols always refer to classes. Maybe the class doesn't actually exist, but that doesn't mean the symbol might be a data-literal tag.

Comment by Alex Miller [ 13/May/14 4:49 PM ]

Added clj-1100-v2.patch - identical, just removes whitespace to simplify change.

Generated at Mon Jul 28 03:40:05 CDT 2014 using JIRA 4.4#649-r158309.