tools.nrepl

Ensure efficient bytestring (byte[]) support in bencode transport

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: 0.2.0-beta9
  • Component/s: None
  • Labels:
    None

Description

In particular, no boxing of individual bytes or multiple en/decoding steps should be involved.

This implies that:

1. all string values should remain byte[] coming out of the bencode protocol support (which makes it more compliant with bencode anyway)
2. the string decoding should occur in the bencode transport impl
3. all string values should be decoded, except for those named in a reserved slot (say, "-unencoded")

All map keys should always be decoded to Strings.

While nREPL should continue to be fully compatible with Clojure 1.2.x, let's not kill ourselves trying to optimize for 1.2.x as well as Clojure >= 1.3.0; if necessary, just make sure the latter is fast. If 1.2.x benefits as well, then good.

Activity

Hide
Meikel Brandmeyer added a comment -

How about “nrepl/unencoded” re point 3?

Show
Meikel Brandmeyer added a comment - How about “nrepl/unencoded” re point 3?
Hide
Chas Emerick added a comment -

Is that intended to become a namespaced keyword at some point (further up in the stack than the bencode parser or transport, of course)?

If so, does that imply that other keys (perhaps "nonstandard" ones defined by middlewares) should be namespaced as well?

Also, this is actually impacting the protocol; nrepl/* makes me think the slot is related to an "application-level" handler provided by nREPL.

FWIW, my suggestion of -unencoded was a nod to Python's _foo and Clojure's -foo to denote private / internal members. *shrug*

Show
Chas Emerick added a comment - Is that intended to become a namespaced keyword at some point (further up in the stack than the bencode parser or transport, of course)? If so, does that imply that other keys (perhaps "nonstandard" ones defined by middlewares) should be namespaced as well? Also, this is actually impacting the protocol; nrepl/* makes me think the slot is related to an "application-level" handler provided by nREPL. FWIW, my suggestion of -unencoded was a nod to Python's _foo and Clojure's -foo to denote private / internal members. *shrug*
Hide
Meikel Brandmeyer added a comment -

Namespaced keywords are indeed the inspiration. So that it should probably be more like clojure.tools.nrepl.transport/unencoded rather than just nrepl/unencoded. However it is not a requirement to be a real keyword. A qualified string works just as well. There could be a middleware which turns the map keys into keywords (considering also a possible namespace). ring has a similar middleware, IIRC. I think that qualified operation and result field values (in some form or the other) are a best practise in this respect.

However your argument of unencoded being part of the protocol is valid. But then why should it be private? Document it in the official protocol spec and let middlewares also add fields to it. The question is then: what happens with nested structures (eg. :status, which has to be traversed now painfully)? If middlewares weren't allowed to add to it, then only :value would be an interesting target for unencoded. So a simple flag would suffice.

In general, I still think a tagged protocol would have been the better solution since it also allows nesting and each slot knows whether it's encoded or not.

Maybe we should clarify what middlewares are allowed to do. Is there already some documentation on this? (I'm must confess I'm not up-to-date with all the design changes.)

Show
Meikel Brandmeyer added a comment - Namespaced keywords are indeed the inspiration. So that it should probably be more like clojure.tools.nrepl.transport/unencoded rather than just nrepl/unencoded. However it is not a requirement to be a real keyword. A qualified string works just as well. There could be a middleware which turns the map keys into keywords (considering also a possible namespace). ring has a similar middleware, IIRC. I think that qualified operation and result field values (in some form or the other) are a best practise in this respect. However your argument of unencoded being part of the protocol is valid. But then why should it be private? Document it in the official protocol spec and let middlewares also add fields to it. The question is then: what happens with nested structures (eg. :status, which has to be traversed now painfully)? If middlewares weren't allowed to add to it, then only :value would be an interesting target for unencoded. So a simple flag would suffice. In general, I still think a tagged protocol would have been the better solution since it also allows nesting and each slot knows whether it's encoded or not. Maybe we should clarify what middlewares are allowed to do. Is there already some documentation on this? (I'm must confess I'm not up-to-date with all the design changes.)
Hide
Chas Emerick added a comment -

Re: encoding, I've started to think perhaps nothing should be decoded by default, and middlewares can independently handle such things. The standard middlewares will stake out one approach, but that at least leaves open the possibility of a different convention emerging over time that can be adopted later.

Opinions?

Show
Chas Emerick added a comment - Re: encoding, I've started to think perhaps nothing should be decoded by default, and middlewares can independently handle such things. The standard middlewares will stake out one approach, but that at least leaves open the possibility of a different convention emerging over time that can be adopted later. Opinions?
Hide
Chas Emerick added a comment -

Sorry, where is the code for this? No .patch here, and nothing in bitbucket looks right.

Leaving aside the code for now, defaulting to unencoded strings continues to seem reasonable; a default Transport impl/substrate can perform the en/decoding, as well as establish a default convention for indicating that some fields values should be left unencoded and should not be decoded.

Show
Chas Emerick added a comment - Sorry, where is the code for this? No .patch here, and nothing in bitbucket looks right. Leaving aside the code for now, defaulting to unencoded strings continues to seem reasonable; a default Transport impl/substrate can perform the en/decoding, as well as establish a default convention for indicating that some fields values should be left unencoded and should not be decoded.
Hide
Chas Emerick added a comment -

Looks good, patch applied.

I'm less and less sure re: -unencoded. Thinking back to your "tagged protocol" comment, adopting a convention of keys like "value/binary" (or something similar) would be quite a bit more flexible and somewhat simpler to handle.

(Thinking of rich content, mime types could then be sent in e.g. "value/mime" — not that the transport would care about that, but it would at least be easy to pair up data with its "tag".)

Thoughts?

Show
Chas Emerick added a comment - Looks good, patch applied. I'm less and less sure re: -unencoded. Thinking back to your "tagged protocol" comment, adopting a convention of keys like "value/binary" (or something similar) would be quite a bit more flexible and somewhat simpler to handle. (Thinking of rich content, mime types could then be sent in e.g. "value/mime" — not that the transport would care about that, but it would at least be easy to pair up data with its "tag".) Thoughts?
Hide
Chas Emerick added a comment -

Releasing -beta9 now; further discussion of unencoded value semantics can continue elsewhere…

Show
Chas Emerick added a comment - Releasing -beta9 now; further discussion of unencoded value semantics can continue elsewhere…

People

Vote (0)
Watch (0)

Dates

  • Created:
    Updated:
    Resolved: