tools.reader

Add more source metadata to read objects

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test

Description

I'm working on a static analyzer for Clojure, where I'd like to have a reader which provides precise form coordinates and source representations.

Specifically this enables analyses where it's straightforward to bind an object like:

(fn* [p1__3615#] (inc (conj p1__3615#)))

to the source form which it was read from:

"#(inc (conj %))"

Would this kind of functionality be considered appropriate to tools.reader? Specifically I'd like to make a change such that

  • Read data is unchanged
  • When an IMeta is read, additional information is embedded in the metadata, but no existing metadata is changed
  • The additional metadata would include the source character string, the ending column, and the ending line from the source stream

As an example, reading a stream that contained the following on the first line:

#(inc (conj %))

Would return a list as follows:

(fn* [p1__3615#] (inc (conj p1__3615#)))

And would change the metadata from:

{:line 1, :column 1}

to augment it with more metadata, as follows:

{:line 1, :column 1, :end-line 1, :end-column 15, :source "#(inc (conj %))"}

I'd happily contribute a patch to include this behavior if you'd be inclined to accept it. I'd like to first see if it would be considered appropriate, and also discuss any other concerns you might have (such as having this behavior be optionally enabled, etc)

Activity

Hide
Alexander Redington added a comment -

Squashed commit with working source logging

Show
Alexander Redington added a comment - Squashed commit with working source logging
Hide
Nicola Mometto added a comment -

I opened a pull request https://github.cam/aredington/tools.reader/pull/1 that should fix all the issues you were having, as well as fixing another minor bug.

If you find my edits reasonable please merge it and then squash the commits and I'll merge the patch upstream

Show
Nicola Mometto added a comment - I opened a pull request https://github.cam/aredington/tools.reader/pull/1 that should fix all the issues you were having, as well as fixing another minor bug. If you find my edits reasonable please merge it and then squash the commits and I'll merge the patch upstream
Hide
Nicola Mometto added a comment -

I'll take a look at your current implementation and see if I can help you spot what's going wrong, thanks again for the effort you're putting into this.

Show
Nicola Mometto added a comment - I'll take a look at your current implementation and see if I can help you spot what's going wrong, thanks again for the effort you're putting into this.
Hide
Alexander Redington added a comment -

On https://github.com/aredington/tools.reader I have it nearly working save that nested collections (e.g. [1 2 [3 4 5]]) fail to read correctly.

The top collection will read correctly with the following as the source string: "[1 2 [3 4 5]]", the nested vector reads as "3 4 5]".

I tried taking the macro dispatching char and prepending it to the source log at the appropriate source logging frame, and that resolves nested collections, but causes metadata to read erroneously, e.g. {:foo :baz} 'zot will read with source string "^{:foo :baz} 'zot".

Gonna keep digging to try and resolve this because source logging is very close, but just wanted to update on current status as I'd been quiet for a while.

Show
Alexander Redington added a comment - On https://github.com/aredington/tools.reader I have it nearly working save that nested collections (e.g. [1 2 [3 4 5]]) fail to read correctly. The top collection will read correctly with the following as the source string: "[1 2 [3 4 5]]", the nested vector reads as "3 4 5]". I tried taking the macro dispatching char and prepending it to the source log at the appropriate source logging frame, and that resolves nested collections, but causes metadata to read erroneously, e.g. {:foo :baz} 'zot will read with source string "^{:foo :baz} 'zot". Gonna keep digging to try and resolve this because source logging is very close, but just wanted to update on current status as I'd been quiet for a while.
Hide
Nicola Mometto added a comment -

Looks like CLJS will also beenfit from this, see http://dev.clojure.org/jira/browse/CLJS-658

+1 for SourceLoggingPushbackReader

Show
Nicola Mometto added a comment - Looks like CLJS will also beenfit from this, see http://dev.clojure.org/jira/browse/CLJS-658 +1 for SourceLoggingPushbackReader
Hide
Alexander Redington added a comment -

Created TRDR-10 for end-line and end-column info.

Show
Alexander Redington added a comment - Created TRDR-10 for end-line and end-column info.
Hide
Alexander Redington added a comment -

Absolutely. I'll make two patches and we will have two paths for metadata:

  • Default case; if line numbering information is available, end-line and end-column will be captured. The performance hit from this is marginal (2-3% slower), but it does provide sufficient information for, e.g. editors, IDEs to precisely indicate what portions of a source file originate what forms. This patch will be submitted in a new ticket.
  • Enhanced case; using a new InfoPushbackReader (SourceInfoPushbackReader? SourceLoggingPushbackReader?) we'll log precise source information on each form read. The performance hit from this will be substantial (100% slower), but it isolates the performance impacts to those users who deliberately want more information and are willing to pay the performance cost. This patch will be attached to this issue (TRDR-9)
Show
Alexander Redington added a comment - Absolutely. I'll make two patches and we will have two paths for metadata:
  • Default case; if line numbering information is available, end-line and end-column will be captured. The performance hit from this is marginal (2-3% slower), but it does provide sufficient information for, e.g. editors, IDEs to precisely indicate what portions of a source file originate what forms. This patch will be submitted in a new ticket.
  • Enhanced case; using a new InfoPushbackReader (SourceInfoPushbackReader? SourceLoggingPushbackReader?) we'll log precise source information on each form read. The performance hit from this will be substantial (100% slower), but it isolates the performance impacts to those users who deliberately want more information and are willing to pay the performance cost. This patch will be attached to this issue (TRDR-9)
Hide
Nicola Mometto added a comment -

After some more thoughts, I think it should be better to have a "InfoPushBackReader" (not sure if it's the best name) wrapping an IndexingPushBackReader that provides source data, this way we only get the performance hit when we effectively need the :source info instead of making every Reader collect source data even when we don't want/need it.

Would this approach work for your use case?

Show
Nicola Mometto added a comment - After some more thoughts, I think it should be better to have a "InfoPushBackReader" (not sure if it's the best name) wrapping an IndexingPushBackReader that provides source data, this way we only get the performance hit when we effectively need the :source info instead of making every Reader collect source data even when we don't want/need it. Would this approach work for your use case?
Hide
Nicola Mometto added a comment -

Awesome, I haven't had a chance to read through the implementation but I'll do so as soon as possible.
If I'm reading the last commit message correctly, it looks like since your comment here you were able to reduce performance loss to 2x, that's cool.

Can you please make the end-line/end-column modifications into another ticket? I can merge that as soon as you upload the patch.

Regarding the failing tests – looks like those are inevitable, it's definitely not a bug and those tests should be fixed to match the new behaviour (fixing those should be just a matter of manually attatching the :source metadata to the forms to be read by the clojure reader)

Show
Nicola Mometto added a comment - Awesome, I haven't had a chance to read through the implementation but I'll do so as soon as possible. If I'm reading the last commit message correctly, it looks like since your comment here you were able to reduce performance loss to 2x, that's cool. Can you please make the end-line/end-column modifications into another ticket? I can merge that as soon as you upload the patch. Regarding the failing tests – looks like those are inevitable, it's definitely not a bug and those tests should be fixed to match the new behaviour (fixing those should be just a matter of manually attatching the :source metadata to the forms to be read by the clojure reader)
Hide
Alexander Redington added a comment -

Just like to touch base as I've been quiet for a little while, but haven't forgotten about this. It's been my primary goal on Friday time.

I currently have a naive implementation that works, but blows our performance goal by a 5x factor. I think this is solvable with a little more finesse in source logging.

It appears that end-line and end-column information can be obtained for almost no additional computational cost. (less than 2% additional processor time).

I've committed my WIP to: https://github.com/aredington/tools.reader/tree/metadata-logging

Also worth discussing at some point in time: some tests around syntax quoting and metadata reading start failing. For syntax quote, the forms returned are significantly more verbose, including metadata assertions with the source attribution. For metadata, the metadata values read now have an additional :source key included that specifies the source string the forms were read from. In both cases, the expectations of the tests are violated, but I'm not inclined to say that the end result is incorrect without discussion.

Show
Alexander Redington added a comment - Just like to touch base as I've been quiet for a little while, but haven't forgotten about this. It's been my primary goal on Friday time. I currently have a naive implementation that works, but blows our performance goal by a 5x factor. I think this is solvable with a little more finesse in source logging. It appears that end-line and end-column information can be obtained for almost no additional computational cost. (less than 2% additional processor time). I've committed my WIP to: https://github.com/aredington/tools.reader/tree/metadata-logging Also worth discussing at some point in time: some tests around syntax quoting and metadata reading start failing. For syntax quote, the forms returned are significantly more verbose, including metadata assertions with the source attribution. For metadata, the metadata values read now have an additional :source key included that specifies the source string the forms were read from. In both cases, the expectations of the tests are violated, but I'm not inclined to say that the end result is incorrect without discussion.
Hide
Nicola Mometto added a comment -

I would definitely take a patch to add such a behaviour, with the condition that this should not degrade the performances too much (a 2x slowdown would be acceptable though).

I don't feel like this needs to be optional behaviour, we already provide column/line info by default, if it's not needed the user might just ignore it.

Show
Nicola Mometto added a comment - I would definitely take a patch to add such a behaviour, with the condition that this should not degrade the performances too much (a 2x slowdown would be acceptable though). I don't feel like this needs to be optional behaviour, we already provide column/line info by default, if it's not needed the user might just ignore it.

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: