Background: Symbol is composed of name and namespace strings. Symbol construction interns both of these strings - this reduces memory usage and allows for string == checks inside Symbol. Keywords wrap a Symbol and have an additional cache to reuse Keyword instances.
Problem: Certain applications make heavy use of keywords (in particular the case of parsing or transforming JSON, XML, or other data into Clojure maps with keyword keys). Constructing the same keyword from a string over and over again will cause the string to be interned, a symbol constructed, and the lookup to occur in the keyword cache. In the case where the keyword already exists, this is more work than is necessary, making this path slower than it can be.
Reproduce: The following test simulates rounds of creating many keywords - the unique? flag indicates whether to use new or the same set of keywords each rep. unique?=false should be more similar to parsing a similar JSON record format over and over.
On 1.6, we see about 5.5 ms for repeated and 134 ms for unique after warmup.
With the patch, we see about 2.2 ms for repeated and 120 ms for unique after warmup.
Cause: Keyword construction based on a string involves:
- Interning string(s) in new kw
- Constructing Symbol with interned strings
- Clearing Keywords from the Keyword cache if GC has reclaimed them
- Constructing a new Keyword
- Wrapping the Keyword in a WeakReference
- CHM putIfAbsent on the cache
- If new, return. If exists, get the old one and return.
- In the event the Keyword is reclaimed by GC between the last 2 steps, retry.
This process involves a fair amount of speculative interning and object creation if the keyword already exist.
Proposal: Streamline the keyword construction process by reworking the cache implementation and the Keyword.intern() process. The patch changes the cache to key by string name instead of symbol, deferring interning and symbol creation on lookup to when we know the keyword construction is needed. The various Keyword.intern() methods are also reworked to take advantage if called with an existing Symbol to avoid re-creating it.