<< Back to previous view

[CLJ-1146] Symbol name starting with digits to defn throws "Unmatched delimiter )" Created: 13/Jan/13  Updated: 13/Jan/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Trivial
Reporter: Linus Ericsson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: reader
Environment:

$java -jar clojure-1.5.0-RC2.jar

$java -version
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06-434-10M3909)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode)
Mac OS X:
System Version: Mac OS X 10.6.8 (10K549)
Kernel Version: Darwin 10.8.0



 Description   

When trying to use an invalid symbol name when defining a function, the error message thrown is a confusing and wrong one. The error message is "RuntimeException Unmatched delimiter: ) clojure.lang.Util.runtimeException (Util.java:219)", which unfortunately is the only message seen in nrepled emacs.

$ java -jar clojure-1.5.0-RC2.jar
Clojure 1.5.0-RC2
user=> (defn 45fn [] nil)
NumberFormatException Invalid number: 45fn clojure.lang.LispReader.readNumber (LispReader.java:255)
[]
nil
RuntimeException Unmatched delimiter: ) clojure.lang.Util.runtimeException (Util.java:219)

Expected:
When trying to (defn or (def a thing with a non valid symbol name, the last thrown error message should be one stating that the given symbol name is not a valid one.






[CLJ-1138] data-reader returning nil causes exception Created: 22/Dec/12  Updated: 14/Feb/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4, Release 1.5
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Steve Miner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: reader
Environment:

clojure 1.5 beta2, Mac OS X 10.8.2, java version "1.6.0_37"



 Description   

If a data-reader returns nil, the reader throws java.lang.RuntimeException: No dispatch macro... The error message implies that there is no dispatch macro for whatever the first character of the tag happens to be.

Here's a simple example:

(binding [*data-readers* {'f/ignore (constantly nil)}] (read-string "#f/ignore 42 10"))

RuntimeException No dispatch macro for: f clojure.lang.Util.runtimeException (Util.java:219)



 Comments   
Comment by Steve Miner [ 22/Dec/12 9:43 AM ]

clj-1138-allow-data-reader-to-return-nil-instead-of-throwing.patch allows a data-reader to return nil instead of throwing. Does sanity check that possible tag or record isJavaIdentifierStart(). Gives better error message for special characters that might actually be dispatch macros (rather than assuming it's a tagged literal).

Comment by Steve Miner [ 22/Dec/12 10:06 AM ]

clj-1138-data-reader-return-nil-for-no-op.patch allows a data-reader returning nil to be treated as a no-op by the reader (like #_). nil is not normally a useful value (actually it causes an exception in Clojure 1.4 through 1.5 beta2) for a data-reader to return. With this patch, one could get something like a conditional feature reader using data-readers.

Comment by Steve Miner [ 22/Dec/12 10:26 AM ]

clj-1138-allow-data-reader-to-return-nil-instead-of-throwing.patch is the first patch to consider. It merely allows nil as a value from a data-reader and returns nil as the final value. I think it does what was originally intended for dispatch macros, and gives a better error message in many cases (mostly typos).

The second patch, clj-1138-data-reader-return-nil-for-no-op.patch, depends on the other being applied first. It takes an extra step to treat a nil value returned from a data-reader as a no-op for the reader (like #_).

Comment by Steve Miner [ 23/Dec/12 11:52 AM ]

It turns out that you can work around the original problem by having your data-reader return '(quote nil) instead of plain nil. That expression conveniently evaluates to nil so you can get a nil if necessary. This also works after applying the patches so there's still a way to return nil if you really want it.

(binding [*data-readers* {'x/nil (constantly '(quote nil))}] (read-string "#x/nil 42"))
;=> (quote nil)

Comment by Andy Fingerhut [ 07/Feb/13 9:20 AM ]

Patch clj-1138-allow-data-reader-to-return-nil-instead-of-throwing.patch dated Dec 22 2012 still applies cleanly to latest master if you use the following command:

% git am --keep-cr -s --ignore-whitespace < clj-1138-allow-data-reader-to-return-nil-instead-of-throwing.patch

Without the --ignore-whitespace option, the patch fails only because some whitespace was changed in Clojure master recently.

Comment by Andy Fingerhut [ 13/Feb/13 11:24 AM ]

OK, now with latest master (1.5.0-RC15 at this time), patch clj-1138-allow-data-reader-to-return-nil-instead-of-throwing.patch no longer applies cleanly, not even using --ignore-whitespace in the 'git am' command given above. Steve, if you could see what needs to be updated, that would be great. Using the patch command as suggested in the "Updating stale patches" section of http://dev.clojure.org/display/design/JIRA+workflow wasn't enough, so it should probably be carefully examined by hand to see what needs updating.

Comment by Steve Miner [ 14/Feb/13 12:21 PM ]

I removed my patches. Things have changes recently with the LispReader and new EdnReader.





[CLJ-1100] Reader literals cannot contain periods Created: 02/Nov/12  Updated: 14/Feb/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Kevin Lynagh Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: reader

Approval: Vetted

 Description   

The LispReader tries to read a record instead of a literal if the tag contains periods:

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LispReader.java#L1171

Which effectively means that reader tags cannot contain periods.
The EDN spec is unclear on this:

edn supports extensibility through a simple mechanism. # followed immediately by a symbol starting with an alphabetic character indicates that that symbol is a tag.

(issue opened: https://github.com/edn-format/edn/issues/39)

If periods are allowed, then the LispReader should first check to see if the tag is in *data-readers* and only then if not try to initialize a Java class.

I'm happy to write the patch if this behavior is what is desired.



 Comments   
Comment by Steve Miner [ 06/Nov/12 9:41 AM ]

The suggested patch (clj-1100-reader-literal-periods.patch) will break reading records when *default-data-reader-fn* is set. Try adding a test like this:

(deftest tags-containing-periods-with-default
      ;; we need a predefined record for this test so we (mis)use clojure.reflect.Field for convenience
      (let [v "#clojure.reflect.Field{:name \"fake\" :type :fake :declaring-class \"Fake\" :flags nil}"]
        (binding [*default-data-reader-fn* nil]
          (is (= (read-string v) #clojure.reflect.Field{:name "fake" :type :fake :declaring-class "Fake" :flags nil})))
        (binding [*default-data-reader-fn* (fn [tag val] (assoc val :meaning 42))]
          (is (= (read-string v) #clojure.reflect.Field{:name "fake" :type :fake :declaring-class "Fake" :flags nil})))))
Comment by Rich Hickey [ 29/Nov/12 9:36 AM ]

The problem assessment is ok, but the resolution approach may not be. What happens should be based not upon what is in data-readers but whether or not the name names a class.

Is the intent here to allow readers to circumvent records? I'm not in favor of that.

Comment by Steve Miner [ 29/Nov/12 4:01 PM ]

New patch following Rich's comments. The decision to read a record is now based on the symbol containing periods and not having a namespace. Otherwise, it is considered a data reader tag. User
defined tags are required to be qualified but they may now have periods in the name. Tests added to show that
data readers cannot override record classes. Note: Clojure-defined data reader tags may be unqualified, but they should not contain periods in order to avoid confusion with record classes.

Comment by Steve Miner [ 29/Nov/12 4:17 PM ]

I deleted my old patch and some comments referring to it to avoid confusion.

In Clojure 1.5 beta 1, # followed by a qualified symbol with a period in the name is considered a record and causes an exception for the missing record class. With the patch, only non-qualified symbols containing periods are considered records. That allows user-defined qualified symbols with periods in their names to be used as data reader tags.

Comment by Andy Fingerhut [ 07/Feb/13 9:05 AM ]

clj-1100-periods-in-data-reader-tags-patch-v2.txt dated Feb 7 2013 is identical to CLJ-1100-periods-in-data-reader-tags.patch dated Nov 29 2012, except it applies cleanly to latest master. The only change appears to be in some white space in the context lines.

Comment by Andy Fingerhut [ 07/Feb/13 12:53 PM ]

I've removed clj-1100-periods-in-data-reader-tags-patch-v2.txt mentioned in the previous comment, because I learned that CLJ-1100-periods-in-data-reader-tags.patch dated Nov 29 2012 applies cleanly to latest master and passes all tests if you use this command to apply it.

% git am --keep-cr -s --ignore-whitespace < CLJ-1100-periods-in-data-reader-tags.patch

I've already updated the JIRA workflow and screening patches wiki pages to mention this --ignore-whitespace option.

Comment by Andy Fingerhut [ 13/Feb/13 11:31 AM ]

Both of the current patches, CLJ-1100-periods-in-data-reader-tags.patch dated Nov 29 2012, and clj-1100-reader-literal-periods.patch dated Nov 6 2012, fail to apply cleanly to latest master (1.5.0-RC15) as of today, although they did last week. Given all of the changes around read / read-string and edn recently, they should probably be evaluated by their authors to see how they should be updated.

Comment by Steve Miner [ 14/Feb/13 12:23 PM ]

I deleted my patch: CLJ-1100-periods-in-data-reader-tags.patch. clj-1100-reader-literal-periods.patch is clearly wrong, but the original author or an administrator has to delete that.

Comment by Kevin Lynagh [ 14/Feb/13 1:28 PM ]

I cannot figure out how to remove my attachment (clj-1100-reader-literal-periods.patch) in JIRA.

Comment by Steve Miner [ 14/Feb/13 1:43 PM ]

Downarrow (popup) menu to the right of the "Attachments" section. Choose "manager attachments".

Comment by Kevin Lynagh [ 14/Feb/13 2:02 PM ]

Great, thanks Steve. Are you going to take another pass at this issue, or should I give it a go?

Comment by Steve Miner [ 14/Feb/13 3:04 PM ]

Kevin, I'm not planning to work on this right now as 1.5 is pretty much done. It might be worthwhile discussing the issue a bit on the dev mailing list before working on a patch, but that's up to you. I think my approach was correct, although now changes would have to be applied to both LispReader and EdnReader.





[CLJ-1033] pr-str and read-string don't handle @ symbols inside keywords properly Created: 26/Jul/12  Updated: 13/Jan/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Steven Ruppert Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: reader
Environment:

Ubuntu 12.04 LTS; Java 1.7.0_05 Java HotSpot(TM) Client VM


Approval: Vetted

 Description   
user=> (read-string (pr-str {(keyword "key@other") :stuff}))
RuntimeException Map literal must contain an even number of forms  clojure.lang.Util.runtimeException (Util.java:170)

pr-str emits "{:key@other :stuff}", which read-string fails to interpret correctly. Either pr-str needs to escape the @ symbol, or read-string needs to handle the symbol inside a keyword.

Background: I'm passing a map with email addresses as keys through Storm bolts, which require a thrift-serializable form. Using the pr-str/read-string combo fails on these keys, so I've fallen back to JSON serialization.



 Comments   
Comment by Stuart Halloway [ 10/Aug/12 12:51 PM ]

The '@' character is not a legal character for keywords or symbols (see http://clojure.org/reader). Recategorized as enhancement request.

Comment by Steven Ruppert [ 10/Aug/12 1:04 PM ]

Then why doesn't (keyword "keywith@") throw an exception? It seems inconsistent with your statement.

Comment by Andy Fingerhut [ 13/Sep/12 2:23 PM ]

It is a long standing property of Clojure that it does not throw exceptions for everything that is illegal.

Comment by Steven Ruppert [ 17/Sep/12 2:16 PM ]

Yeah, but read-string clearly does. Is there a good reason that the "keyword" function can't throw an exception? With the other special rules on namespaces within symbol names, the "keyword" function really should be doing validation.

Another solution would be to allow a ruby-like :"symbol with disallowed characters" literal, but that would also be confusing with how the namespace is handled.

https://groups.google.com/forum/?fromgroups=#!topic/clojure/Ct5v9w0yNAE has some older discussion on this topic.

Comment by Andy Fingerhut [ 17/Sep/12 7:43 PM ]

Disclaimer: I'm not a Clojure/core member, just an interested contributor who doesn't know all the design decisions that were made here.

Steven, I think perhaps a couple of concerns are: (1) doing such checks would be slower than not doing them, and (2) implementing such checks means having to update them if/when the rules for legal symbols, keywords, namespace names, etc. change.

Would you be interested in writing strict versions of functions like symbol and keyword for addition to a contrib library? And test suites that try to hit a significant number of corner cases in the rules for what is legal and what is not? I mean those as serious questions, not rhetorical ones. This would permit people that want to use the strict versions of these functions to do so, and at the same time make it easy to measure performance differences between the strict and loose versions.

Comment by Steven Ruppert [ 13/Jan/13 10:58 PM ]

Looking back at this, the root cause of the problem is that the {pr} function, although it by default "print[s] in a way that objects can be read by the reader" [0], doesn't always do so. Thus, the easiest "fix" is to change its docstring to warn that not all keywords can be read back.

The deeper problem is that symbol don't have a reader form that can represent all actually possible keywords (in this case, those with "@" in them). Restricting the actually-possible keywords to match the reader form, i.e. writing a strict "keyword" function actually seems like a worse solution overall to me. The better solution would be to somehow extend the keyword reader form to allow it to express all possible keywords, possibly ruby's :"keyword" syntax. Plus, that solution would avoid having to keep hypothetical strict keyword/symbol functions in sync with reader operation, and write test cases for that, and so on.

Thus, the resolution of this bug comes down to how far we're willing to go. Changing the docstring would be the easiest, but extending the keyword form would be the "best" resolution, IMO.

[0]: http://clojuredocs.org/clojure_core/clojure.core/pr





[CLJ-976] Implement reader literal and print support for PersistentQueue data structure Created: 27/Apr/12  Updated: 06/Apr/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Fogus Assignee: Fogus
Resolution: Unresolved Votes: 0
Labels: data-structures, queue, reader, tagged-literals

Attachments: File CLJ-976-queue-literal-eval-and-synquote.diff     Text File clj-976-queue-literal-eval-and-synquote-patch-v3.txt     File CLJ-976-queue-literal-eval.diff    
Patch: Code and Test
Approval: Test

 Description   

Clojure's PersistentQueue structure has been in the language for quite some time now and has found its way into a fair share of codebases. However, the creation of queues is a two step operation often of the form:

(conj clojure.lang.PersistentQueue/EMPTY :a :b :c)

;=> #<PersistentQueue clojure.lang.PersistentQueue@78d5f6bc>

A better experience might be the following:

#queue [:a :b :c]

;=> #queue [:a :b :c]

(pop #queue [:a :b :c])

;=> #queue [:b :c]

This syntax is proposed and discussed in the Clojure-dev group at https://groups.google.com/forum/?fromgroups#!topic/clojure-dev/GQqus5Wycno

Open question: Should the queue literal's arguments eval? The implications of this are illustrated below:

;; non-eval case
#queue [1 2 (+ 1 2)]

;=> #queue [1 2 (+ 1 2)]


;; eval case
#queue [1 2 (+ 1 2)]

;=> #queue [1 2 3]

The answer to this open question will determine the implementation.



 Comments   
Comment by Steve Miner [ 27/Apr/12 10:18 AM ]

I think the non-eval behavior would be consistent with the other reader literals in Clojure 1.4. It's definitely better for interop where some other language implementation could be expected to handle a few literal representations, but not the evaluation of Clojure expressions. Use a regular function if the args need evaluation.

Comment by Chas Emerick [ 27/Apr/12 10:19 AM ]

The precedent of records seems relevant:

=> (defrecord A [b])
user.A
=> #user.A[(+ 4 5)]
#user.A{:b (+ 4 5)}
=> #user.A{:b (+ 4 5)}
#user.A{:b (+ 4 5)}

This continues to make sense, as otherwise queues would need to print with an extra (quote …) form around lists — which records neatly avoid:

=> (A. '(+ 4 5))
#user.A{:b (+ 4 5)}

Does this mean that a queue fn (analogous to vector, maybe) will also make an appearance? It'd be handy for HOF usage.

Comment by Fogus [ 27/Apr/12 11:00 AM ]

Added a patch for the tagged literal support ONLY. This is only one part of the total solution. This provides the read-string and printing capability. I'd like more discussion around the eval side before I get dive into the compiler.

Comment by Paul Michael Bauer [ 27/Apr/12 6:45 PM ]

In addition to Chas' observations on consistency with record literals, would eval in queue literals open up the same security hole as #=, needing to respect *read-eval*?
When needing eval inside a queue literal, embedding a #= seems more apropos.

Comment by Fogus [ 04/May/12 1:14 PM ]

Evalable queue literal support.

Comment by Andy Fingerhut [ 10/May/12 5:54 PM ]

Neither of the patches CLJ-976-queue-literal-tagged-parse-support-only.diff dated Apr 27, 2012 nor CLJ-976-queue-literal-eval.diff dated May 4, 2012 apply cleanly to latest master as of May 10, 2012.

Comment by Fogus [ 11/May/12 10:15 AM ]

Updated patch file to merge with latest master.

Comment by Fogus [ 20/Jul/12 1:14 PM ]

New patch with support fixed for syntax-quote.

Comment by Stuart Sierra [ 17/Aug/12 12:41 PM ]

Patch does not apply as of commit f5f4faf95051f794c9bfa0315e4457b600c84cef

Comment by Fogus [ 17/Aug/12 3:06 PM ]

Weird. I was able to download the CLJ-976-queue-literal-eval-and-synquote.diff patch and apply it to HEAD as of just now (f5f4faf95051f794c9bfa0315e4457b600c84cef). There were whitespace warnings, but the patch applied, compiles and passes all tests.

Comment by Andy Fingerhut [ 17/Aug/12 7:29 PM ]

With latest head I was able to successfully apply patch CLJ-976-queue-literal-eval-and-synquote.diff with this command:

git am --keep-cr -s < CLJ-976-queue-literal-eval-and-synquote.diff

with some warnings, but successfully applied. If I try it without the --keep-cr option, the patch fails to apply. I believe this is often a sign that either one of the files being patched, or the patch itself, contains CR/LF line endings, which some of the Clojure source files definitely do.

The command above (with --keep-cr) is currently the one recommended for applying patches on page http://dev.clojure.org/display/design/JIRA+workflow in section "Screening Tickets". I added the suggested --keep-cr option after running across another patch that applied with the option, but not without it.

Comment by Andy Fingerhut [ 28/Aug/12 5:45 PM ]

Presumptuously changing Approval from Incomplete back to Test, since the latest patch does apply cleanly if --keep-cr option is used.

Comment by Rich Hickey [ 08/Sep/12 6:48 AM ]

this needs more time

Comment by Fogus [ 18/Sep/12 8:15 AM ]

Rich,

Do you mind providing a little more detail? I would be happy to make any changes if needed. However, if it's just a matter of its relationship to EDN and/or waiting until the next release then I am happy to wait. In either case, I'd like to complete this or push it to the back of my mind. Thanks.

Comment by Andy Fingerhut [ 05/Oct/12 7:49 AM ]

clj-976-queue-literal-eval-and-synquote-patch-v2.txt dated Oct 5 2012 is identical to Fogus's patch CLJ-976-queue-literal-eval-and-synquote.diff dated Jul 20 2012. It simply removes one line addition to clojure.iml that Rich has since added in a different commit, so that this patch now applies cleanly to latest master.

Comment by Andy Fingerhut [ 16/Oct/12 12:20 PM ]

clj-976-queue-literal-eval-and-synquote-patch-v3.txt dated oct 16 2012 is identical to Fogus's patch CLJ-976-queue-literal-eval-and-synquote.diff dated Jul 20 2012. It simply removes one line addition to clojure.iml that Rich has since added in a different commit, so that this patch now applies cleanly to latest master.

Comment by Andy Fingerhut [ 20/Oct/12 12:26 PM ]

Fogus, with the recent commit of a patch for CLJ-1070, my touched-up patch clj-976-queue-literal-eval-and-synquote-patch-v3.txt dated Oct 16 2012 doesn't apply cleanly. In this case it isn't simply a few lines of context that have changed, it is the interfaces that PersistentQueue implements have been changed. It might be best if you take a look at the latest code and the patch and consider how it should be updated.

Comment by Steve Miner [ 06/Apr/13 8:07 AM ]

Related to CLJ-1078.





[CLJ-950] Function literals behavior differ from that of fns Created: 08/Mar/12  Updated: 01/Mar/13  Resolved: 08/Mar/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Víctor M. Valenzuela Assignee: Unassigned
Resolution: Declined Votes: 0
Labels: reader


 Description   

((fn [] true)) ; true
(#(true)) ; classcast exception

((fn [])) ; nil
(#()) ; ()

(some (fn [_]_) [nil false 0 1]) ; 0
(some #(%) [nil false 0 1]) ; NPE



 Comments   
Comment by Tassilo Horn [ 08/Mar/12 12:27 PM ]

This is no defect. Function literals must have a function (or macro or special form) as first symbol.
So your examples should be written like so:

user> (#(do true))
true
user> (#(do))
nil
user> (some #(do %) [nil false 0 1])
0
Comment by Víctor M. Valenzuela [ 08/Mar/12 2:10 PM ]

It makes sense. However (and correct me if I'm wrong) there should be little problem in making them fully equivalent to fns, resulting in a more concise and consistent API.

Please consider re-opening the issue as a feature request.

Regards - Víctor.

Comment by Tassilo Horn [ 09/Mar/12 1:26 AM ]

The reader docs at http://www.clojure.org/reader say that #() is not a replacement for (fn [] ...). You can't make it more equivalent to fn without making it much harder to understand. Let me explain that with an example.

(defn get-time []
  (System/currentTimeMillis))

(#(get-time)) ;; What's the result?

What's the result of the funcall above? Clearly, right now, it is the current system time.

So if we decided to allow to write #(true) as an alternative to (constantly true) [which is a varargs fn] or #(do true) [which is a fn of zero args], then valid values of #(get-time) where both the current system time but also the function object for get-time. Functions are values, too.

Ok, one could say that in the case of a function, #(function) is always a call, but it would make it harder to reason about what the code does for not much benefit.

Comment by Víctor M. Valenzuela [ 09/Mar/12 7:56 AM ]

You're right - satisfying my request would require to change the average use of this feature to #((some_fn %1 %2)) rather than just #(some_fn %1 %2), if we wanted #(true) to be valid. Which indeed would be barely handy.

Thank you.





[CLJ-928] instant literal for Date and Timestamp should print in UTC Created: 07/Feb/12  Updated: 01/Mar/13  Resolved: 24/Feb/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Steve Miner Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: reader
Environment:

1.4-beta1, Mac OS X 10.7.3, java version "1.6.0_30"


Attachments: Text File CLJ-928-instant-literals-for-Date-and-Timestamp-fogus.patch     Text File CLJ-928-instant-literals-for-Date-and-Timestamp.patch    
Patch: Code and Test
Approval: Ok

 Description   

The default #inst returns a java.util.Date. Date is always in UTC, and doesn't know about time zones, but the implementation of the print method always renders it in the default time zone. For example,

user=> #inst "2012Z"
#inst "2011-12-31T19:00:00.000-05:00"

RFC3339 says:

4.3. Unknown Local Offset Convention

If the time in UTC is known, but the offset to local time is unknown,
this can be represented with an offset of "-00:00". This differs
semantically from an offset of "Z" or "+00:00", which imply that UTC
is the preferred reference point for the specified time.

java.sql.Timestamp should also print in UTC since that class doesn't keep timezone information. The print-method for Timestamp seems broken.

user=> (binding [*data-readers* {'inst #'clojure.instant/read-instant-timestamp}] (read-string "#inst \"2012Z\""))
#inst "2011-12-31T19:000000000000000-05:00"

user=> (binding [data-readers {'inst #'clojure.instant/read-instant-timestamp}]
(read-string "#inst \"2012-01-01T01:23:45.678+00:00\""))
#inst "2011-12-31T20:267800000000000-05:00"

user=> (java.sql.Timestamp. 0)
#inst "1969-12-31T19:000000000000000-05:00"

The implementations of the print-methods for Data, GregorianCalendar and Timestamp do too much string manipulation. I suggest doing some refactoring. (Patch coming soon.)

Also, the documentation needs some updating. clojure.instant/read-instant-date, etc. mention instant-reader but I think that mechanism was generalized as data-readers.



 Comments   
Comment by Steve Miner [ 07/Feb/12 1:28 PM ]

#inst literals for Date and Timestamp should print in UTC. Fixed round-tripping problem with Timestamp nanos (and re-enabled test). Refactored print-methods to avoid some string manipulations, particularly regarding Calendar timezone offsets. Added tests including those from CLJ-926, which this patch also fixes. Fixed doc on read-instant-date, etc.

Comment by Fogus [ 07/Feb/12 3:10 PM ]

Thank you. I will run this patch through the paces ASAP.

Comment by Cosmin Stejerean [ 07/Feb/12 7:09 PM ]

"Date is always in UTC, and doesn't know about time zones"

Given that pretty much all of the methods on Date return results in the local timezone, I wouldn't quite say that Date is always in UTC. Granted, most of those methods are deprecated; toString() however isn't and it also displays the date in the local timezone.

We can certainly force printing of dates in UTC, by overriding the timezone setting on SimpleDateFormat, and there might be other valid reasons for always forcing UTC representation. Dates not being timezone aware however doesn't seem like one to me.

Comment by Steve Miner [ 08/Feb/12 8:54 AM ]

It's only a slight exaggeration to say that a java.util.Date is intrinsically a long (milliseconds since "the epoch"). For Clojure's purposes, an "instant" is a point in time in a universal sense (not situated in a timezone) so UTC makes sense as the canonical form. RFC 3339 section 4.3 says to use "-00:00" for the offset when timezone information is unknown. java.sql.Timestamp is like a Date with an extra field for nanos so it also should use UTC for the canonical form.

My argument for UTC as the canonical form is that it best matches the desired semantics of a Clojure instant. I think it's the most conservative way to go and offers the best approach to interoperability by following RFC 3339.

java.util.Calendar is by design cognizant of timezones so it arguably makes sense to preserve the timezone information. Note that comparisons of Calendars take into account the timezone so two Calendar objects might be the same instant in a sense but not be equal because of the timezones. That might be a bit tricky for Clojure users who want a clean abstraction of "instant" with multiple implementations. Presumably they know what they are doing if they decide to use read-instant-calendar. It might be a good idea to be more explicit about this in the documentation.

Comment by Fogus [ 20/Feb/12 3:22 PM ]

Modified priting to conform to the following:

  • Instant instances are stored in UTC format without offset information.
  • Instant literals with offset information will be parsed and stored with offset applied.
  • Instant literals without offset information will be parsed as UTC format without offset information.
  • Instants will print as time having local offset applied and without offset printed.
  • Instants without offset info will be print-dup'd as UTC format with RFC-3339 default offset information (-00:00)

This provides a canonical storage and read format of UTC and a print format relative to the local offset.

Comment by Stuart Sierra [ 24/Feb/12 2:11 PM ]

With Fogus' patch on Feb. 20, instant literals do not round-trip unless *print-dup* is bound true:

user=>  (let [i (read-string "#inst \"2012-02-24T14:41-02:00\"")
              j (read-string (pr-str i))]
          (prn i)
          (prn j)
          (prn (= i j)))
#inst "2012-02-24T11:41:00.000"
#inst "2012-02-24T06:41:00.000"
false
nil
user=>(binding [*print-dup* true]
        (let [i (read-string "#inst \"2012-02-24T14:41-02:00\"")
              j (read-string (pr-str i))]
          (prn i)
          (prn j)
          (prn (= i j))))
#inst "2012-02-24T16:41:00.000-00:00"
#inst "2012-02-24T16:41:00.000-00:00"
true
nil

The docstring of *print-dup* says nothing about print/read values being equal (by Clojure's definition of =) only that print/read values will be of the same type. "Type" is not a particularly meaningful concept when applied to data literals.

We (Stuart & Luke) believe that print/read values should always print in such a way that they can be read back as the same instant. Printing an instant should either:

  • print in local time, with local offset
  • always print in UTC, with zero offset

Instants should never print without any time zone offset, as this admits too much confusion.

Comment by Stuart Sierra [ 24/Feb/12 2:46 PM ]

Just spoke with Rich Hickey, who made the following assertions:

  • Instants should always represent an exact point in time, unambiguously. This representation may include time zone information.
  • Our specification for instant literals allows any part on the right to be omitted. Omitting the time zone offset means UTC is assumed. It is not important whether or not the default printer includes the UTC "00:00" offset or omits it.
  • If we are printing an instant literal based on a type that does not contain time zone information (e.g., java.util.Date) then we should not add time zone information but simply print in UTC.
  • If we are printing an instant literal based on a type that can contain time zone information (e.g., java.util.Calendar) then we may print with the time zone offset included. This is a nice-to-have feature, not a requirement. It is always permissible to print instants in UTC.
  • It is possible that round-trip print/read of an instant may lose time zone information, depending on the types used, but it this should not change the meaning of the instant in UTC.
Comment by Steve Miner [ 24/Feb/12 3:13 PM ]

I believe my original patch satisfies the requirements stated above. I'm happy to work on this if you want something changed.

Comment by Stuart Sierra [ 24/Feb/12 3:22 PM ]

Rejected Fogus' patch of Feb. 20.

Screened Steve Miner's patch of Feb. 7. Ready for Rich.

Comment by Stuart Sierra [ 24/Feb/12 3:29 PM ]

Patch applied.





[CLJ-927] default tagged literal reader Created: 06/Feb/12  Updated: 01/Mar/13  Resolved: 09/Nov/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.5

Type: Enhancement Priority: Major
Reporter: Fogus Assignee: Unassigned
Resolution: Completed Votes: 1
Labels: clojure, reader

Attachments: Text File CLJ-927-default-data-reader-fn-for-unknown-tags.patch    
Patch: Code and Test
Approval: Ok

 Description   

With data reader support, it's impossible to write a program to read an arbitrary stream of Clojure forms. For example, the following code will fail with the current 1.4.0-beta1 tagged literal support:

#point [0 2]

It might be enough to require that the read side define a reader for point, but what if we do not wish to require that both sides of the conversation know about the #point tag beforehand? Using the identity function as a default allows the reader to handle the literal form as is even when it doesn't recognize a tag (or even care to translate it).

The change from the current behavior is that missing tagged literal parsers are no longer error conditions.



 Comments   
Comment by Steve Miner [ 12/Feb/12 10:30 AM ]

I'd like to preserve the unknown literal tag as well as the value. That would allow a program to recreate the printed representation. A map would work as the value. You'd get something like: {:unknown-literal point :value [0 2]}. If you needed to pass that on to some other process, you could easily write it in the original literal form. Perhaps the key for literal tag should be namespace qualified to avoid possible confusion with user data. Another benefit of returning a map for an unknown literal tag is that equality tests still seem reasonable: (not= #foo "abc" #bar "abc").

Comment by Steve Miner [ 18/Sep/12 11:01 AM ]

It would be convenient if I could handle unknown tags using some sort of catch-all key in *data-readers* (say 'default). The associated function should take two arguments: the tag and the literal value. If there is no 'default key in *data-readers*, then an error would be thrown (same as Clojure 1.4).

I think it's a simple way to allow the programmer to take control without having to add new API or data types. It's just one distinguished key ('default, :default something like that) and one additional line of doc.

I can provide a patch.

Comment by Rich Hickey [ 21/Sep/12 9:17 AM ]

This needs to be addressed. We should follow the advice given in the edn docs:

If a reader encounters a tag for which no handler is registered, the implementation can either report an error, call a designated 'unknown element' handler, or create a well-known generic representation that contains both the tag and the tagged element, as it sees fit. Note that the non-error strategies allow for readers which are capable of reading any and all edn, in spite of being unaware of the details of any extensions present.

We can get both of the latter by having a preinstalled default unknown element handler that returns the generic representation. "identity" is out since it loses the tag. Using a map with namespaced keys is somewhat subtle. An TaggedElement record type is also possible.

Issues are - what happens when more than one library tries to supply a default? If the system supplies one, perhaps it's best to only allow dynamic rebinding and not static installation. Or error on conflicting default handlers, or first/last one wins (but first/'last' might be difficult to predict).

Comment by Steve Miner [ 17/Oct/12 12:49 PM ]

Everyone agrees that identity is not an appropriate default so I changed the Summary field.

Comment by Steve Miner [ 18/Oct/12 8:54 PM ]

Minimal patch that adds support for a default data reader in *data-readers*. If an unknown tag is read, we look up the 'default reader in *data-readers and call it with two arguments: the tag and the value. By default, there is no default reader and you get an exception as in Clojure 1.4.

Comment by Steve Miner [ 18/Oct/12 8:57 PM ]

An alternative patch that establishes a default data reader to handle the case of an unknown tag. The default reader returns a map with metadata to define the :unknown-tag. This preserves the information for the unknown data literal, but keeps a simple and convenient format.

Comment by Stuart Halloway [ 19/Oct/12 5:27 AM ]

Steve,

I am guessing that you consider these two patches alternatives, not cumulative. I am marking as screened the "default-in-data-readers" patch, which allows users to specify a 'default handler.

The other patch "unknown-data-reader", which specifies a built-in Clojure handler for unknown tags, is not screened. It specifies a default handler that returns a metadata-annotated map. If there is to be a default handler, I think it would need to return something disjoint from data, e.g. a tagged interface of some kind (or at least a namespaced name in the map.)

It would be a great help to have a little discussion along with the patches.

Comment by Steve Miner [ 19/Oct/12 8:01 AM ]

Yes, the two patches are separate alternatives. The "default-in-data-readers" patch just adds the special treatment for the 'default key in *data-readers* without providing a system default. This allows the application programmer to provide a catch-all handler for unknown data literal tags. Libraries should be discouraged from setting a 'default handler. Conflicts with the 'default key in data_readers.clj are handled just like other keys so it would be a bad thing for multiple libraries to try to take over the 'default data reader. Libraries can instead provide implementations and let the application programmer do the actual setting of the 'default key.

The second patch ("unknown-data-reader") implements 'default similarly, but also provides for a 'default reader in default-data-readers. My unknown-data-reader returns a map. I found it safer and simpler to deal with keywords instead of symbols for unknown tag – Clojure doesn't like symbols for unknown packages and you never know what you might get in data literal tags. After experimenting with with my original idea of having a map with two keys (essentially :tag and :value), I decided that I preferred the single entry map with the key derived from the tag and the value preserved as read. Adding the metadata to define the :unknown-tag provides enough information for the application programmer to deal with the maps unambiguously. I think the single entry maps are easier to read.

The alternative that I did not pursue would be to use a new record type as the default data type. My second patch could be used as a basis for that approach. We just need to replace my unknown-data-reader function with one that creates a record (TaggedElement).

Comment by Rich Hickey [ 19/Oct/12 5:43 PM ]

I don't like the 'default entry, especially if it is usually wrong for a library to set it.

Having no bindable var default makes editing data_readers.clj an outstanding chore for everyone.

It is unlikely a single special override in data_readers.clj is suitable for every reading situation, even in the same app.

If there is a known TaggedElement type, then people need only be able to opt out of that.
So if there were a default-tagged-reader function of (tag read-element) that built a TaggedElement (or, alternatively, threw, if people prefer), people could just rebind that in a particular reading context.

There's no perfect default, but in most situations getting unknown data is an error. I personally would default to that, and provide a read-tagged-element fn people could bind in when trying to implement read-anything.

Comment by Steve Miner [ 20/Oct/12 10:03 AM ]

old patches deleted. This revised patch introduces a var *default-data-reader-fn* which can be used to handle unknown tags. By default it is nil and an exception is thrown for the unknown tag as in Clojure 1.4. If it's non-nil, the function is called with the tag and value. I chose the name so that it contained 'data-reader', which makes it search friendly. I wanted to commit this separately from any attempt to provide a built-in catch-all reader and new record type as that might be more contentious.

Comment by Steve Miner [ 02/Nov/12 9:42 AM ]

The patch for *default-data-reader-fn* was committed for Clojure 1.5 beta1. The programmer can now specify a catch-all reader, which solves the main issue. However, there is no built-in default reader so unknown tags still cause exceptions to be thrown as in Clojure 1.4.

I think this bug can be closed. Default reader implementations could be provided by a contrib library. Or someone can open a new bug if you want the language to provide a built-in default reader.

Comment by Nicola Mometto [ 02/Nov/12 1:24 PM ]

what about adding default-tagged-reader-fn to clojure.main/with-bindings to make it set!able?

Comment by Steve Miner [ 03/Nov/12 9:29 AM ]

Good point about making it set!-able. I filed that issue as a separate bug: CLJ-1101.

Comment by Stuart Sierra [ 09/Nov/12 8:26 AM ]

Resolved.





[CLJ-926] Instant literals do not round trip correctly Created: 05/Feb/12  Updated: 17/Feb/12  Resolved: 17/Feb/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Cosmin Stejerean Assignee: Unassigned
Resolution: Declined Votes: 0
Labels: instant, reader, tagged-literals

Attachments: Text File CLJ-926-round-trip-date-instants-with-tests.patch    
Patch: Code and Test
Approval: Screened
Waiting On: Stuart Sierra

 Description   

When using java.util.Date for instant literals (which is the default) instants do not round-trip properly during daylight savings. Here is an example:

(read-string "#inst \"2010-11-12T13:14:15.666-06:00\"")
#inst "2010-11-13T06:14:15.666+10:00"

I'm currently in Melbourne, which is normally GMT+10. However, on November 12th daylight savings is in effect, so the proper GMT offset is +11. The date above is actually using the correct local time (6:14:15) but with the wrong offset. The problem is more obvious when you attempt to round-trip the instant that was read.

user> #inst "2010-11-13T06:14:15.666+10:00"
#inst "2010-11-13T07:14:15.666+10:00"

Notice that we read 6:14am but the output was 7:14 with the same offset. Upon digging deeper the real culprit seems to be the use of String.format (through clojure.core/format) when outputting java.util.Date. Notice the following inside caldate->rfc3339

(format "#inst \"%1$tFT%1$tT.%1$tL%1$tz\"" d))

Let's compare the timezone offset in the underlying date with what is printed by %1$tz

user> (def d #inst "2010-11-13T06:14:15.666+10:00")
#'clojure.instant/d                                                                                                                                                                                         
user> (.getHours d)
7                                                                                                                                                                                                           
user> (.getTimezoneOffset d)
-660

For reference, the definition of getTimezoneOffset is

-(Calendar.get(Calendar.ZONE_OFFSET) + Calendar.get(Calendar.DST_OFFSET)) / (60 * 1000)

So far it looks good. 6am in GMT+10 becomes 7am in GMT+11. Let's see how String.format handles it though.

                                                                                               
clojure.instant> (format "%1$tz" d)
"+1000"                                                                                                                                                                                                     
clojure.instant> (format "%1$tT" d)
"07:14:15"

String.format prints the correct hour, but with the wrong offset. The recommended way for formatting dates is to use a DateFormatter.

The String.format approach appears to work properly for Calendar, but not for Date. Therefore the attached patch keeps the current
implementation for java.util.Calendar but uses SimpleDateFormat to handle java.util.Date correctly. This fixes the roundtrip problem.



 Comments   
Comment by Cosmin Stejerean [ 05/Feb/12 9:29 PM ]

Not quite sure how to create a repeatable test for this since the issue depends on the local timezone.

Comment by Steve Miner [ 06/Feb/12 8:33 AM ]

java.util.TimeZone/setDefault could be used for testing in various timezones.

Comment by Steve Miner [ 06/Feb/12 8:37 AM ]

Regarding the patch: SimpleDateFormat is a relatively heavy-weight object, so it seems bad to allocate one every time you print a Date. Unfortunately, SimpleDateFormat is not thread-safe, so you can't just share one. The Java world seems to agree that you should use JodaTime instead, but if you're stuck with the JDK, you need to have a ThreadLocal SimpleDateFormat. I'm working on my own patch along those lines.

Comment by Fogus [ 06/Feb/12 7:38 PM ]

Excellent analysis. I'll keep track of this case and vet any patches that come along. Please do attach a regression test if possible.

Comment by Cosmin Stejerean [ 06/Feb/12 8:39 PM ]

I attached a new patch using a SimpleDateFormat per thread using a thread local. I'll try to add some tests next.

Comment by Cosmin Stejerean [ 06/Feb/12 10:42 PM ]

A combined patch that uses a thread local SimpleDateFormat, tests round-tripping at a known daylight savings point (by overriding the default timezone) and checks round tripping at multiple points throughout the year (every hour of the first 4 weeks of every month of the year).

Steve, thanks for the suggestions on using a thread local and TimeZone/setDefault.

Comment by Steve Miner [ 07/Feb/12 1:32 PM ]

I filed CLJ-928 with my patch for fixing the printing of Date and Timestamp using UTC. I copied the tests from the CLJ-926 patch to make sure we're compatible.

Comment by Fogus [ 07/Feb/12 3:11 PM ]

Thanks for the regression test also. I'll vett this patch ASAP.

Comment by Fogus [ 17/Feb/12 1:24 PM ]

Seems reasonable to me.

Comment by Stuart Sierra [ 17/Feb/12 1:46 PM ]

Vetted. Will apply later.

Comment by Stuart Sierra [ 17/Feb/12 1:56 PM ]

Not Vetted. Test. That thing.

Comment by Stuart Sierra [ 17/Feb/12 2:10 PM ]

No, it's "Screened," not "Test." Somebody save me.

Comment by Stuart Sierra [ 17/Feb/12 3:55 PM ]

Superseded by CLJ-928.





[CLJ-889] Specifically allow '.' inside keywords Created: 01/Dec/11  Updated: 15/Apr/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Reviewed Backlog

Type: Enhancement Priority: Minor
Reporter: Howard Lewis Ship Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: keywords, reader


 Description   

The documentation for keywords (on page http://clojure.org/reader) specifically states that '.' is not allowed as part of a keyword name; however '.' is specifically useful. For example, several web frameworks for Clojure use keywords to represent HTML elements, using CSS selector syntax (i.e., :div.important is equivalent to <div class='important'>).

In any case, the use of '.' is not checked by the reader and it is generally useful.

I would like to see '.' officially allowed (in the documentation). Further, I'd like to see additional details about which punctuation characters are allowed (my own web framework uses '&', '?' and '>' inside keywords for various purposes ... again, current reader implementation does not forbid this, but if a future reader will reject it, I'd like to know now).



 Comments   
Comment by Howard Lewis Ship [ 08/Dec/11 3:37 PM ]

To clarify, Hiccup and Cascade both use keywords containing '#' and '.' Cascade goes further, using '&' (to represent HTML entities), '>', and (possibly in the future) '?'.

Comment by Devin Walters [ 20/Oct/12 6:46 PM ]

I think the EDN spec mitigates some of the concern, but as of yet the official clojure.org reader documentation does not reflect the language used in the description of EDN. Where does EDN stand right now? Can the description being used on the github page be pulled over to clojure.org?

References:

Comment by Howard Lewis Ship [ 15/Apr/13 5:56 AM ]

Unfortunately, the EDN specification does not mention '>'.





Generated at Sun May 19 22:45:02 CDT 2013 using JIRA 4.4#649-r158309.