<< Back to previous view

[TRDR-9] Add more source metadata to read objects Created: 27/Sep/13  Updated: 20/Nov/13  Resolved: 20/Nov/13

Status: Closed
Project: tools.reader
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Alexander Redington Assignee: Alexander Redington
Resolution: Completed Votes: 0
Labels: None

Attachments: File TRDR-9-source-logging.diff    
Patch: Code and Test

 Description   

I'm working on a static analyzer for Clojure, where I'd like to have a reader which provides precise form coordinates and source representations.

Specifically this enables analyses where it's straightforward to bind an object like:

(fn* [p1__3615#] (inc (conj p1__3615#)))

to the source form which it was read from:

"#(inc (conj %))"

Would this kind of functionality be considered appropriate to tools.reader? Specifically I'd like to make a change such that

  • Read data is unchanged
  • When an IMeta is read, additional information is embedded in the metadata, but no existing metadata is changed
  • The additional metadata would include the source character string, the ending column, and the ending line from the source stream

As an example, reading a stream that contained the following on the first line:

#(inc (conj %))

Would return a list as follows:

(fn* [p1__3615#] (inc (conj p1__3615#)))

And would change the metadata from:

{:line 1, :column 1}

to augment it with more metadata, as follows:

{:line 1, :column 1, :end-line 1, :end-column 15, :source "#(inc (conj %))"}

I'd happily contribute a patch to include this behavior if you'd be inclined to accept it. I'd like to first see if it would be considered appropriate, and also discuss any other concerns you might have (such as having this behavior be optionally enabled, etc)



 Comments   
Comment by Nicola Mometto [ 28/Sep/13 1:48 PM ]

I would definitely take a patch to add such a behaviour, with the condition that this should not degrade the performances too much (a 2x slowdown would be acceptable though).

I don't feel like this needs to be optional behaviour, we already provide column/line info by default, if it's not needed the user might just ignore it.

Comment by Alexander Redington [ 18/Oct/13 12:49 PM ]

Just like to touch base as I've been quiet for a little while, but haven't forgotten about this. It's been my primary goal on Friday time.

I currently have a naive implementation that works, but blows our performance goal by a 5x factor. I think this is solvable with a little more finesse in source logging.

It appears that end-line and end-column information can be obtained for almost no additional computational cost. (less than 2% additional processor time).

I've committed my WIP to: https://github.com/aredington/tools.reader/tree/metadata-logging

Also worth discussing at some point in time: some tests around syntax quoting and metadata reading start failing. For syntax quote, the forms returned are significantly more verbose, including metadata assertions with the source attribution. For metadata, the metadata values read now have an additional :source key included that specifies the source string the forms were read from. In both cases, the expectations of the tests are violated, but I'm not inclined to say that the end result is incorrect without discussion.

Comment by Nicola Mometto [ 22/Oct/13 11:57 AM ]

Awesome, I haven't had a chance to read through the implementation but I'll do so as soon as possible.
If I'm reading the last commit message correctly, it looks like since your comment here you were able to reduce performance loss to 2x, that's cool.

Can you please make the end-line/end-column modifications into another ticket? I can merge that as soon as you upload the patch.

Regarding the failing tests – looks like those are inevitable, it's definitely not a bug and those tests should be fixed to match the new behaviour (fixing those should be just a matter of manually attatching the :source metadata to the forms to be read by the clojure reader)

Comment by Nicola Mometto [ 24/Oct/13 11:39 AM ]

After some more thoughts, I think it should be better to have a "InfoPushBackReader" (not sure if it's the best name) wrapping an IndexingPushBackReader that provides source data, this way we only get the performance hit when we effectively need the :source info instead of making every Reader collect source data even when we don't want/need it.

Would this approach work for your use case?

Comment by Alexander Redington [ 25/Oct/13 8:04 AM ]

Absolutely. I'll make two patches and we will have two paths for metadata:

  • Default case; if line numbering information is available, end-line and end-column will be captured. The performance hit from this is marginal (2-3% slower), but it does provide sufficient information for, e.g. editors, IDEs to precisely indicate what portions of a source file originate what forms. This patch will be submitted in a new ticket.
  • Enhanced case; using a new InfoPushbackReader (SourceInfoPushbackReader? SourceLoggingPushbackReader?) we'll log precise source information on each form read. The performance hit from this will be substantial (100% slower), but it isolates the performance impacts to those users who deliberately want more information and are willing to pay the performance cost. This patch will be attached to this issue (TRDR-9)
Comment by Alexander Redington [ 25/Oct/13 8:27 AM ]

Created TRDR-10 for end-line and end-column info.

Comment by Nicola Mometto [ 06/Nov/13 4:04 PM ]

Looks like CLJS will also beenfit from this, see http://dev.clojure.org/jira/browse/CLJS-658

+1 for SourceLoggingPushbackReader

Comment by Alexander Redington [ 15/Nov/13 1:35 PM ]

On https://github.com/aredington/tools.reader I have it nearly working save that nested collections (e.g. [1 2 [3 4 5]]) fail to read correctly.

The top collection will read correctly with the following as the source string: "[1 2 [3 4 5]]", the nested vector reads as "3 4 5]".

I tried taking the macro dispatching char and prepending it to the source log at the appropriate source logging frame, and that resolves nested collections, but causes metadata to read erroneously, e.g. {:foo :baz} 'zot will read with source string "^{:foo :baz} 'zot".

Gonna keep digging to try and resolve this because source logging is very close, but just wanted to update on current status as I'd been quiet for a while.

Comment by Nicola Mometto [ 15/Nov/13 8:02 PM ]

I'll take a look at your current implementation and see if I can help you spot what's going wrong, thanks again for the effort you're putting into this.

Comment by Nicola Mometto [ 16/Nov/13 8:49 AM ]

I opened a pull request https://github.cam/aredington/tools.reader/pull/1 that should fix all the issues you were having, as well as fixing another minor bug.

If you find my edits reasonable please merge it and then squash the commits and I'll merge the patch upstream

Comment by Alexander Redington [ 20/Nov/13 1:23 PM ]

Squashed commit with working source logging

Comment by Nicola Mometto [ 20/Nov/13 3:17 PM ]

Fixed: https://github.com/clojure/tools.reader/commit/f9e55071d82a443db3fac1c2feda059d0215bb90

Thanks

Generated at Thu Aug 21 05:10:30 CDT 2014 using JIRA 4.4#649-r158309.