tools.reader

The exceptions throw when parsing fails could be much more specific and helpful.

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Environment:
    Both cljs and clojure.
     OS independent.
  • Patch:
    Code and Test

Description

The exceptions thrown when edn/clojure code parsing fails are less helpful than they might be:

  • The same message is used in slightly different circumstance. For example "EOF while reading" is used in three slightly different situations in clojure/tools/reader.clj, while "EOF while reading character" is used four times in the same file.
  • The exception messages frequently do not include much of the available context. Which token is bad? Which map contains an odd number of items?
  • The line number where the error occurred is not always returned.

Activity

Hide
Nicola Mometto added a comment -

Hi Russ,
thanks for the patch, this is a significant change so it might take me a while to review it and I can't say whether or not I'll accept it yet.

In the meantime, can you give me a quick overview of the impl approach and a description of which error messages you changed?

Show
Nicola Mometto added a comment - Hi Russ, thanks for the patch, this is a significant change so it might take me a while to review it and I can't say whether or not I'll accept it yet. In the meantime, can you give me a quick overview of the impl approach and a description of which error messages you changed?
Hide
Russ Olsen added a comment -

Nicola,

Sure, thanks for writing.

Let me start by saying that I'm not married to the approach that I tool with this patch. It seemed to me a good way to make things better, but certainly not the only way.
The tact that I took was:

  • Factor out generating the common exceptions and put that code in the errors namespace. I did this because – as I'm sure you are aware – there are four separate implementations of the basic reading code, one each for tools.reader and tools.reader.edn and times two for clojure and clojurescript. There is more or less one
    function in errors for each exception.
  • In a couple of places I pushed down the specific type of value we are reading into the generic function. So both read-delimited and read-token have an additional 'kind' parameter, so that when something goes wrong they can say "Opps, I was reading a vector (or symbol or keyword) and stuff happened."
  • Also added an inspect multimethod whose idea is to provide a safe way to turns values into strings to provide context. I say 'safe' in that inspect will happily turn a keyword or a symbol into a string, but will give you <seq> for lazy (and possibly never ending) sequences.
  • I also changed read-string to use an indexing reader.

As for the specifics of the messages, well there are a lot of them. Can I suggest you take a looks at errors.clj - I think the changes are apparent there. If you still need more context then I'll be happy to write up something more formal.

Show
Russ Olsen added a comment - Nicola, Sure, thanks for writing. Let me start by saying that I'm not married to the approach that I tool with this patch. It seemed to me a good way to make things better, but certainly not the only way. The tact that I took was:
  • Factor out generating the common exceptions and put that code in the errors namespace. I did this because – as I'm sure you are aware – there are four separate implementations of the basic reading code, one each for tools.reader and tools.reader.edn and times two for clojure and clojurescript. There is more or less one function in errors for each exception.
  • In a couple of places I pushed down the specific type of value we are reading into the generic function. So both read-delimited and read-token have an additional 'kind' parameter, so that when something goes wrong they can say "Opps, I was reading a vector (or symbol or keyword) and stuff happened."
  • Also added an inspect multimethod whose idea is to provide a safe way to turns values into strings to provide context. I say 'safe' in that inspect will happily turn a keyword or a symbol into a string, but will give you <seq> for lazy (and possibly never ending) sequences.
  • I also changed read-string to use an indexing reader.
As for the specifics of the messages, well there are a lot of them. Can I suggest you take a looks at errors.clj - I think the changes are apparent there. If you still need more context then I'll be happy to write up something more formal.
Hide
Russ Olsen added a comment -

Nicola, just wondering if you have any additional feedback on this?

Show
Russ Olsen added a comment - Nicola, just wondering if you have any additional feedback on this?
Hide
Nicola Mometto added a comment -

Haven't had time to take a deep look at this yet, will do soon now that 1.0.0 has been released

Show
Nicola Mometto added a comment - Haven't had time to take a deep look at this yet, will do soon now that 1.0.0 has been released
Hide
Nicola Mometto added a comment -

Hi Russ Olsen, first of all, many thanks for the work you've put into this.
I finally had a chance to look at this, generally, I like the improvements and I think I'm likely to include this in some future release, relatively soon.

A few issues I'd like to see addressed before that happens:

  • as a stylistic implementaion detail, I'd rather see keyword passed around rather than strings in e.g. `read-delimited` or `read-token`, and have a dispatch table in the errors namespace.
  • some error messages seem to have lost starting column details, like in the case of `throw-eof-delimited`. Please reinstate that, we want improvements in error messages to be strictly additive
  • move both .errors and .inspect under the .impl namespace segment, we do not want to expose those as public API
  • we need to incorporate the fix for https://github.com/clojure/tools.reader/commit/becf3abdc39bd6420582615ee3b9c428077f30d8 in this patch set
  • exclude from this patch whiteline/newline changes, commented code, variable renames (e.g. reader to rdr), formatting changes (e.g. in the ns declaration for edn.cljs) and non-related improvements (e.g. switching from string-push-back-reader to indexing-push-back-reader for read-string)to keep the changeset as contained as possible. If you fee like any of those changes are valid and wish to see them included, please open separate tickets for them.
  • I'd like to see the cljs and clj implementations match as much as possible. Is there a reason for having different implementations of e.g. reader-error between clj and cljs? If there is, fine, otherwise please use the same implementation
  • please remove the println at the top of edn.clj

Please do let me know if you don't have time to handle the points I've reaised and I'll do them myself. In the meantime, thanks again for your work on this.

Show
Nicola Mometto added a comment - Hi Russ Olsen, first of all, many thanks for the work you've put into this. I finally had a chance to look at this, generally, I like the improvements and I think I'm likely to include this in some future release, relatively soon. A few issues I'd like to see addressed before that happens:
  • as a stylistic implementaion detail, I'd rather see keyword passed around rather than strings in e.g. `read-delimited` or `read-token`, and have a dispatch table in the errors namespace.
  • some error messages seem to have lost starting column details, like in the case of `throw-eof-delimited`. Please reinstate that, we want improvements in error messages to be strictly additive
  • move both .errors and .inspect under the .impl namespace segment, we do not want to expose those as public API
  • we need to incorporate the fix for https://github.com/clojure/tools.reader/commit/becf3abdc39bd6420582615ee3b9c428077f30d8 in this patch set
  • exclude from this patch whiteline/newline changes, commented code, variable renames (e.g. reader to rdr), formatting changes (e.g. in the ns declaration for edn.cljs) and non-related improvements (e.g. switching from string-push-back-reader to indexing-push-back-reader for read-string)to keep the changeset as contained as possible. If you fee like any of those changes are valid and wish to see them included, please open separate tickets for them.
  • I'd like to see the cljs and clj implementations match as much as possible. Is there a reason for having different implementations of e.g. reader-error between clj and cljs? If there is, fine, otherwise please use the same implementation
  • please remove the println at the top of edn.clj
Please do let me know if you don't have time to handle the points I've reaised and I'll do them myself. In the meantime, thanks again for your work on this.
Hide
Russ Olsen added a comment -

Nicola,

Really glad you are finding the patch helpful. Philosophically I have no problems with the changes
you are suggesting - this whole thing has been one of those "pull on a thread and before you know
it the whole sweater is gone" kind of exercises.

Unfortunately I find myself really busy at the moment, so it's probably better if you run with it
yourself.

Show
Russ Olsen added a comment - Nicola, Really glad you are finding the patch helpful. Philosophically I have no problems with the changes you are suggesting - this whole thing has been one of those "pull on a thread and before you know it the whole sweater is gone" kind of exercises. Unfortunately I find myself really busy at the moment, so it's probably better if you run with it yourself.
Hide
Nicola Mometto added a comment -

Hi Russ, no worries, that's completely understandable.
Thanks again for all the effort you've put into this, I'll try to get this into tools.reader by the end of the week.

Show
Nicola Mometto added a comment - Hi Russ, no worries, that's completely understandable. Thanks again for all the effort you've put into this, I'll try to get this into tools.reader by the end of the week.
Hide
Nicola Mometto added a comment -

Merged into master

Show
Nicola Mometto added a comment - Merged into master

People

Vote (3)
Watch (2)

Dates

  • Created:
    Updated:
    Resolved: