Reduce keyword cache lookup cost

Description

Background: Symbol is composed of name and namespace strings. Symbol construction interns both of these strings - this reduces memory usage and allows for string == checks inside Symbol. Keywords wrap a Symbol and have an additional cache to reuse Keyword instances.

Problem: Certain applications make heavy use of keywords (in particular the case of parsing or transforming JSON, XML, or other data into Clojure maps with keyword keys). Constructing the same keyword from a string over and over again will cause the string to be interned, a symbol constructed, and the lookup to occur in the keyword cache. In the case where the keyword already exists, this is more work than is necessary, making this path slower than it can be.

Reproduce: The following test simulates rounds of creating many keywords - the unique? flag indicates whether to use new or the same set of keywords each rep. unique?=false should be more similar to parsing a similar JSON record format over and over.

On 1.6, we see about 5.5 ms for repeated and 134 ms for unique after warmup.
With the patch, we see about 2.2 ms for repeated and 120 ms for unique after warmup.

Cause: Keyword construction based on a string involves:

  • Interning string(s) in new kw

  • Constructing Symbol with interned strings

  • Clearing Keywords from the Keyword cache if GC has reclaimed them

  • Constructing a new Keyword

  • Wrapping the Keyword in a WeakReference

  • CHM putIfAbsent on the cache

  • If new, return. If exists, get the old one and return.

  • In the event the Keyword is reclaimed by GC between the last 2 steps, retry.

This process involves a fair amount of speculative interning and object creation if the keyword already exist.

Proposal: Streamline the keyword construction process by reworking the cache implementation and the Keyword.intern() process. The patch changes the cache to key by string name instead of symbol, deferring interning and symbol creation on lookup to when we know the keyword construction is needed. The various Keyword.intern() methods are also reworked to take advantage if called with an existing Symbol to avoid re-creating it.

Patch: 0001-Improve-Keyword.intern-performance.patch

Related:

Environment

None

Attachments

1

Activity

Show:

Alex Miller October 7, 2014 at 10:42 PM

related patch was applied

Alex Miller August 1, 2014 at 5:48 PM

Alternate changes were committed today to improve both symbol and keyword creation times.

https://github.com/clojure/clojure/commit/c9e70649d2652baf13b498c4c3ebb070118c4573

Completed

Details

Assignee

Reporter

Approval

Ok

Patch

Code

Priority

Affects versions

Fix versions

Created June 5, 2014 at 8:33 PM
Updated October 7, 2014 at 10:42 PM
Resolved October 7, 2014 at 10:42 PM