Clojure

instant literal for Date and Timestamp should print in UTC

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: Release 1.4
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Environment:
    1.4-beta1, Mac OS X 10.7.3, java version "1.6.0_30"
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

The default #inst returns a java.util.Date. Date is always in UTC, and doesn't know about time zones, but the implementation of the print method always renders it in the default time zone. For example,

user=> #inst "2012Z"
#inst "2011-12-31T19:00:00.000-05:00"

RFC3339 says:

4.3. Unknown Local Offset Convention

If the time in UTC is known, but the offset to local time is unknown,
this can be represented with an offset of "-00:00". This differs
semantically from an offset of "Z" or "+00:00", which imply that UTC
is the preferred reference point for the specified time.

java.sql.Timestamp should also print in UTC since that class doesn't keep timezone information. The print-method for Timestamp seems broken.

user=> (binding [*data-readers* {'inst #'clojure.instant/read-instant-timestamp}] (read-string "#inst \"2012Z\""))
#inst "2011-12-31T19:000000000000000-05:00"

user=> (binding [data-readers {'inst #'clojure.instant/read-instant-timestamp}]
(read-string "#inst \"2012-01-01T01:23:45.678+00:00\""))
#inst "2011-12-31T20:267800000000000-05:00"

user=> (java.sql.Timestamp. 0)
#inst "1969-12-31T19:000000000000000-05:00"

The implementations of the print-methods for Data, GregorianCalendar and Timestamp do too much string manipulation. I suggest doing some refactoring. (Patch coming soon.)

Also, the documentation needs some updating. clojure.instant/read-instant-date, etc. mention instant-reader but I think that mechanism was generalized as data-readers.

Activity

Hide
Steve Miner added a comment -

#inst literals for Date and Timestamp should print in UTC. Fixed round-tripping problem with Timestamp nanos (and re-enabled test). Refactored print-methods to avoid some string manipulations, particularly regarding Calendar timezone offsets. Added tests including those from CLJ-926, which this patch also fixes. Fixed doc on read-instant-date, etc.

Show
Steve Miner added a comment - #inst literals for Date and Timestamp should print in UTC. Fixed round-tripping problem with Timestamp nanos (and re-enabled test). Refactored print-methods to avoid some string manipulations, particularly regarding Calendar timezone offsets. Added tests including those from CLJ-926, which this patch also fixes. Fixed doc on read-instant-date, etc.
Steve Miner made changes -
Field Original Value New Value
Attachment CLJ-928-instant-literals-for-Date-and-Timestamp.patch [ 10896 ]
Hide
Fogus added a comment -

Thank you. I will run this patch through the paces ASAP.

Show
Fogus added a comment - Thank you. I will run this patch through the paces ASAP.
Hide
Cosmin Stejerean added a comment -

"Date is always in UTC, and doesn't know about time zones"

Given that pretty much all of the methods on Date return results in the local timezone, I wouldn't quite say that Date is always in UTC. Granted, most of those methods are deprecated; toString() however isn't and it also displays the date in the local timezone.

We can certainly force printing of dates in UTC, by overriding the timezone setting on SimpleDateFormat, and there might be other valid reasons for always forcing UTC representation. Dates not being timezone aware however doesn't seem like one to me.

Show
Cosmin Stejerean added a comment - "Date is always in UTC, and doesn't know about time zones" Given that pretty much all of the methods on Date return results in the local timezone, I wouldn't quite say that Date is always in UTC. Granted, most of those methods are deprecated; toString() however isn't and it also displays the date in the local timezone. We can certainly force printing of dates in UTC, by overriding the timezone setting on SimpleDateFormat, and there might be other valid reasons for always forcing UTC representation. Dates not being timezone aware however doesn't seem like one to me.
Hide
Steve Miner added a comment - - edited

It's only a slight exaggeration to say that a java.util.Date is intrinsically a long (milliseconds since "the epoch"). For Clojure's purposes, an "instant" is a point in time in a universal sense (not situated in a timezone) so UTC makes sense as the canonical form. RFC 3339 section 4.3 says to use "-00:00" for the offset when timezone information is unknown. java.sql.Timestamp is like a Date with an extra field for nanos so it also should use UTC for the canonical form.

My argument for UTC as the canonical form is that it best matches the desired semantics of a Clojure instant. I think it's the most conservative way to go and offers the best approach to interoperability by following RFC 3339.

java.util.Calendar is by design cognizant of timezones so it arguably makes sense to preserve the timezone information. Note that comparisons of Calendars take into account the timezone so two Calendar objects might be the same instant in a sense but not be equal because of the timezones. That might be a bit tricky for Clojure users who want a clean abstraction of "instant" with multiple implementations. Presumably they know what they are doing if they decide to use read-instant-calendar. It might be a good idea to be more explicit about this in the documentation.

Show
Steve Miner added a comment - - edited It's only a slight exaggeration to say that a java.util.Date is intrinsically a long (milliseconds since "the epoch"). For Clojure's purposes, an "instant" is a point in time in a universal sense (not situated in a timezone) so UTC makes sense as the canonical form. RFC 3339 section 4.3 says to use "-00:00" for the offset when timezone information is unknown. java.sql.Timestamp is like a Date with an extra field for nanos so it also should use UTC for the canonical form. My argument for UTC as the canonical form is that it best matches the desired semantics of a Clojure instant. I think it's the most conservative way to go and offers the best approach to interoperability by following RFC 3339. java.util.Calendar is by design cognizant of timezones so it arguably makes sense to preserve the timezone information. Note that comparisons of Calendars take into account the timezone so two Calendar objects might be the same instant in a sense but not be equal because of the timezones. That might be a bit tricky for Clojure users who want a clean abstraction of "instant" with multiple implementations. Presumably they know what they are doing if they decide to use read-instant-calendar. It might be a good idea to be more explicit about this in the documentation.
Steve Miner made changes -
Patch Code and Test [ 10002 ]
Hide
Fogus added a comment -

Modified priting to conform to the following:

  • Instant instances are stored in UTC format without offset information.
  • Instant literals with offset information will be parsed and stored with offset applied.
  • Instant literals without offset information will be parsed as UTC format without offset information.
  • Instants will print as time having local offset applied and without offset printed.
  • Instants without offset info will be print-dup'd as UTC format with RFC-3339 default offset information (-00:00)

This provides a canonical storage and read format of UTC and a print format relative to the local offset.

Show
Fogus added a comment - Modified priting to conform to the following:
  • Instant instances are stored in UTC format without offset information.
  • Instant literals with offset information will be parsed and stored with offset applied.
  • Instant literals without offset information will be parsed as UTC format without offset information.
  • Instants will print as time having local offset applied and without offset printed.
  • Instants without offset info will be print-dup'd as UTC format with RFC-3339 default offset information (-00:00)
This provides a canonical storage and read format of UTC and a print format relative to the local offset.
Fogus made changes -
Hide
Stuart Sierra added a comment - - edited

With Fogus' patch on Feb. 20, instant literals do not round-trip unless *print-dup* is bound true:

user=>  (let [i (read-string "#inst \"2012-02-24T14:41-02:00\"")
              j (read-string (pr-str i))]
          (prn i)
          (prn j)
          (prn (= i j)))
#inst "2012-02-24T11:41:00.000"
#inst "2012-02-24T06:41:00.000"
false
nil
user=>(binding [*print-dup* true]
        (let [i (read-string "#inst \"2012-02-24T14:41-02:00\"")
              j (read-string (pr-str i))]
          (prn i)
          (prn j)
          (prn (= i j))))
#inst "2012-02-24T16:41:00.000-00:00"
#inst "2012-02-24T16:41:00.000-00:00"
true
nil

The docstring of *print-dup* says nothing about print/read values being equal (by Clojure's definition of =) only that print/read values will be of the same type. "Type" is not a particularly meaningful concept when applied to data literals.

We (Stuart & Luke) believe that print/read values should always print in such a way that they can be read back as the same instant. Printing an instant should either:

  • print in local time, with local offset
  • always print in UTC, with zero offset

Instants should never print without any time zone offset, as this admits too much confusion.

Show
Stuart Sierra added a comment - - edited With Fogus' patch on Feb. 20, instant literals do not round-trip unless *print-dup* is bound true:
user=>  (let [i (read-string "#inst \"2012-02-24T14:41-02:00\"")
              j (read-string (pr-str i))]
          (prn i)
          (prn j)
          (prn (= i j)))
#inst "2012-02-24T11:41:00.000"
#inst "2012-02-24T06:41:00.000"
false
nil
user=>(binding [*print-dup* true]
        (let [i (read-string "#inst \"2012-02-24T14:41-02:00\"")
              j (read-string (pr-str i))]
          (prn i)
          (prn j)
          (prn (= i j))))
#inst "2012-02-24T16:41:00.000-00:00"
#inst "2012-02-24T16:41:00.000-00:00"
true
nil
The docstring of *print-dup* says nothing about print/read values being equal (by Clojure's definition of =) only that print/read values will be of the same type. "Type" is not a particularly meaningful concept when applied to data literals. We (Stuart & Luke) believe that print/read values should always print in such a way that they can be read back as the same instant. Printing an instant should either:
  • print in local time, with local offset
  • always print in UTC, with zero offset
Instants should never print without any time zone offset, as this admits too much confusion.
Hide
Stuart Sierra added a comment -

Just spoke with Rich Hickey, who made the following assertions:

  • Instants should always represent an exact point in time, unambiguously. This representation may include time zone information.
  • Our specification for instant literals allows any part on the right to be omitted. Omitting the time zone offset means UTC is assumed. It is not important whether or not the default printer includes the UTC "00:00" offset or omits it.
  • If we are printing an instant literal based on a type that does not contain time zone information (e.g., java.util.Date) then we should not add time zone information but simply print in UTC.
  • If we are printing an instant literal based on a type that can contain time zone information (e.g., java.util.Calendar) then we may print with the time zone offset included. This is a nice-to-have feature, not a requirement. It is always permissible to print instants in UTC.
  • It is possible that round-trip print/read of an instant may lose time zone information, depending on the types used, but it this should not change the meaning of the instant in UTC.
Show
Stuart Sierra added a comment - Just spoke with Rich Hickey, who made the following assertions:
  • Instants should always represent an exact point in time, unambiguously. This representation may include time zone information.
  • Our specification for instant literals allows any part on the right to be omitted. Omitting the time zone offset means UTC is assumed. It is not important whether or not the default printer includes the UTC "00:00" offset or omits it.
  • If we are printing an instant literal based on a type that does not contain time zone information (e.g., java.util.Date) then we should not add time zone information but simply print in UTC.
  • If we are printing an instant literal based on a type that can contain time zone information (e.g., java.util.Calendar) then we may print with the time zone offset included. This is a nice-to-have feature, not a requirement. It is always permissible to print instants in UTC.
  • It is possible that round-trip print/read of an instant may lose time zone information, depending on the types used, but it this should not change the meaning of the instant in UTC.
Hide
Steve Miner added a comment -

I believe my original patch satisfies the requirements stated above. I'm happy to work on this if you want something changed.

Show
Steve Miner added a comment - I believe my original patch satisfies the requirements stated above. I'm happy to work on this if you want something changed.
Hide
Stuart Sierra added a comment -

Rejected Fogus' patch of Feb. 20.

Screened Steve Miner's patch of Feb. 7. Ready for Rich.

Show
Stuart Sierra added a comment - Rejected Fogus' patch of Feb. 20. Screened Steve Miner's patch of Feb. 7. Ready for Rich.
Stuart Sierra made changes -
Waiting On richhickey
Approval Screened [ 10004 ]
Rich Hickey made changes -
Approval Screened [ 10004 ] Ok [ 10007 ]
Hide
Stuart Sierra added a comment -

Patch applied.

Show
Stuart Sierra added a comment - Patch applied.
Stuart Sierra made changes -
Resolution Completed [ 1 ]
Waiting On richhickey
Status Open [ 1 ] Resolved [ 5 ]
Stuart Halloway made changes -
Status Resolved [ 5 ] Closed [ 6 ]

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated:
    Resolved: