<< Back to previous view

[NREPL-11] Ensure efficient bytestring (byte[]) support in bencode transport Created: 21/Feb/12  Updated: 21/Aug/12  Resolved: 21/Aug/12

Status: Closed
Project: tools.nrepl
Component/s: None
Affects Version/s: None
Fix Version/s: 0.2.0-beta9

Type: Enhancement Priority: Major
Reporter: Chas Emerick Assignee: Chas Emerick
Resolution: Completed Votes: 0
Labels: None


In particular, no boxing of individual bytes or multiple en/decoding steps should be involved.

This implies that:

1. all string values should remain byte[] coming out of the bencode protocol support (which makes it more compliant with bencode anyway)
2. the string decoding should occur in the bencode transport impl
3. all string values should be decoded, except for those named in a reserved slot (say, "-unencoded")

All map keys should always be decoded to Strings.

While nREPL should continue to be fully compatible with Clojure 1.2.x, let's not kill ourselves trying to optimize for 1.2.x as well as Clojure >= 1.3.0; if necessary, just make sure the latter is fast. If 1.2.x benefits as well, then good.

Comment by Meikel Brandmeyer [ 22/Feb/12 2:05 PM ]

How about “nrepl/unencoded” re point 3?

Comment by Chas Emerick [ 22/Feb/12 5:57 PM ]

Is that intended to become a namespaced keyword at some point (further up in the stack than the bencode parser or transport, of course)?

If so, does that imply that other keys (perhaps "nonstandard" ones defined by middlewares) should be namespaced as well?

Also, this is actually impacting the protocol; nrepl/* makes me think the slot is related to an "application-level" handler provided by nREPL.

FWIW, my suggestion of -unencoded was a nod to Python's _foo and Clojure's -foo to denote private / internal members. *shrug*

Comment by Meikel Brandmeyer [ 23/Feb/12 5:03 AM ]

Namespaced keywords are indeed the inspiration. So that it should probably be more like clojure.tools.nrepl.transport/unencoded rather than just nrepl/unencoded. However it is not a requirement to be a real keyword. A qualified string works just as well. There could be a middleware which turns the map keys into keywords (considering also a possible namespace). ring has a similar middleware, IIRC. I think that qualified operation and result field values (in some form or the other) are a best practise in this respect.

However your argument of unencoded being part of the protocol is valid. But then why should it be private? Document it in the official protocol spec and let middlewares also add fields to it. The question is then: what happens with nested structures (eg. :status, which has to be traversed now painfully)? If middlewares weren't allowed to add to it, then only :value would be an interesting target for unencoded. So a simple flag would suffice.

In general, I still think a tagged protocol would have been the better solution since it also allows nesting and each slot knows whether it's encoded or not.

Maybe we should clarify what middlewares are allowed to do. Is there already some documentation on this? (I'm must confess I'm not up-to-date with all the design changes.)

Comment by Chas Emerick [ 01/Mar/12 3:32 PM ]

Re: encoding, I've started to think perhaps nothing should be decoded by default, and middlewares can independently handle such things. The standard middlewares will stake out one approach, but that at least leaves open the possibility of a different convention emerging over time that can be adopted later.


Comment by Chas Emerick [ 20/Aug/12 4:14 PM ]

Sorry, where is the code for this? No .patch here, and nothing in bitbucket looks right.

Leaving aside the code for now, defaulting to unencoded strings continues to seem reasonable; a default Transport impl/substrate can perform the en/decoding, as well as establish a default convention for indicating that some fields values should be left unencoded and should not be decoded.

Comment by Meikel Brandmeyer [ 20/Aug/12 4:21 PM ]

The code is here: https://github.com/kotarak/tools.nrepl/tree/bencode-improvements2

Comment by Chas Emerick [ 20/Aug/12 10:14 PM ]

Looks good, patch applied.

I'm less and less sure re: -unencoded. Thinking back to your "tagged protocol" comment, adopting a convention of keys like "value/binary" (or something similar) would be quite a bit more flexible and somewhat simpler to handle.

(Thinking of rich content, mime types could then be sent in e.g. "value/mime" — not that the transport would care about that, but it would at least be easy to pair up data with its "tag".)


Comment by Chas Emerick [ 21/Aug/12 8:26 PM ]

Releasing -beta9 now; further discussion of unencoded value semantics can continue elsewhere…

Generated at Sat Apr 19 14:49:41 CDT 2014 using JIRA 4.4#649-r158309.