Clojure

adding functions `map-vals` and `map-keys`

Details

  • Type: Feature Feature
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Approval:
    Triaged

Description

Many people have been writing a function to map values in HashMap:

Proposal: Add `map-keys` and `map-values` which: maps keys in HashMap, and map values in HashMap. They return HashMap as a result.

Workaround: Using function `reduce-kv` or ordinary `map` and `into` is a common solution, but they are confusing and types change, which makes it tricky and tedious.

Discussions: https://groups.google.com/forum/#!topic/clojure-dev/kkPYIl5qj0o

  1. map-mapper.patch
    14/Jun/16 11:22 AM
    4 kB
    Hiroyuki Fudaba
  2. map-mapper-v2.patch
    15/Jun/16 11:01 AM
    4 kB
    Hiroyuki Fudaba
  3. map-mapper-v3.patch
    11/Jul/16 3:27 AM
    3 kB
    Hiroyuki Fudaba

Activity

Hide
Hiroyuki Fudaba added a comment -

code and test for map-keys and map-vals

Show
Hiroyuki Fudaba added a comment - code and test for map-keys and map-vals
Hide
Nicola Mometto added a comment -

I propose those functions being called `update-vals` and `update-keys` rather than `map-vals` and `map-keys`

Show
Nicola Mometto added a comment - I propose those functions being called `update-vals` and `update-keys` rather than `map-vals` and `map-keys`
Hide
Alex Miller added a comment -

It's not worth bike-shedding names on this - Rich will have his own opinion regardless.

On the patch:

  • remove the :static metadata, that's not used anymore
  • needs docstrings, which should be written in the style of other Clojure docstrings. map is probably a good place to draw from.
  • rather than declare into, defer the definition of these till whatever it needs has been defined. There is no reason to add more declares for this.

There are other potential implementations - these should be implemented and compared for performance across a range of input sizes. In addition to the current approach, I would investigate:

  • reduce-kv with construction into a transient map. This allows the map to reduce itself (no seq caching needed) and avoid creating entries only to tear them apart again.
  • transducers with (into {} (map ...) m)

Also should consider

  • whether to build a k/v vector and convert to a map, or build a map directly (the former may be faster, not sure)
  • if building the map, how to construct the map entries (vector vs creating a mapentry object directly)
  • in map-keys, is there any open question when map generates new overlapping keys?
  • are there places in existing core code where map-keys/map-vals could be used (I am pretty certain there are)
Show
Alex Miller added a comment - It's not worth bike-shedding names on this - Rich will have his own opinion regardless. On the patch:
  • remove the :static metadata, that's not used anymore
  • needs docstrings, which should be written in the style of other Clojure docstrings. map is probably a good place to draw from.
  • rather than declare into, defer the definition of these till whatever it needs has been defined. There is no reason to add more declares for this.
There are other potential implementations - these should be implemented and compared for performance across a range of input sizes. In addition to the current approach, I would investigate:
  • reduce-kv with construction into a transient map. This allows the map to reduce itself (no seq caching needed) and avoid creating entries only to tear them apart again.
  • transducers with (into {} (map ...) m)
Also should consider
  • whether to build a k/v vector and convert to a map, or build a map directly (the former may be faster, not sure)
  • if building the map, how to construct the map entries (vector vs creating a mapentry object directly)
  • in map-keys, is there any open question when map generates new overlapping keys?
  • are there places in existing core code where map-keys/map-vals could be used (I am pretty certain there are)
Hide
Hiroyuki Fudaba added a comment -

Thanks for comments

> I propose those functions being called `update-vals` and `update-keys` rather than `map-vals` and `map-keys`
Maybe. But I name it `map-*` just for now, we can choose it later

about potential implementations:
I have tried several implementations, and seems to be the current implementation is the fastest.
You can see it here: https://github.com/delihiros/performance

about considerings:
> whether to build a k/v vector and convert to a map, or build a map directly (the former may be faster, not sure)
> are there places in existing core code where map-keys/map-vals could be used (I am pretty certain there are)
> if building the map, how to construct the map entries (vector vs creating a mapentry object directly)
I'll check which them as soon as possible. I haven't done it yet.

> in map-keys, is there any open question when map generates new overlapping keys?
I believe it should be overwritten by latter applied key and value.

Show
Hiroyuki Fudaba added a comment - Thanks for comments > I propose those functions being called `update-vals` and `update-keys` rather than `map-vals` and `map-keys` Maybe. But I name it `map-*` just for now, we can choose it later about potential implementations: I have tried several implementations, and seems to be the current implementation is the fastest. You can see it here: https://github.com/delihiros/performance about considerings: > whether to build a k/v vector and convert to a map, or build a map directly (the former may be faster, not sure) > are there places in existing core code where map-keys/map-vals could be used (I am pretty certain there are) > if building the map, how to construct the map entries (vector vs creating a mapentry object directly) I'll check which them as soon as possible. I haven't done it yet. > in map-keys, is there any open question when map generates new overlapping keys? I believe it should be overwritten by latter applied key and value.
Hide
Nathan Marz added a comment -

I've done quite a bit of investigation into this through building Specter. Here are some benchmarks of numerous ways of doing map-vals, including using Specter.

Code: https://github.com/nathanmarz/specter/blob/4778500e0370fb211f47ebf4d69ca64366117b6c/scripts/benchmarks.clj#L87
Results: https://gist.github.com/nathanmarz/bf571c9ed86bfad09816e17b9b6e59e3

A few comments:

  • Implementations that build and tear apart MapEntry's perform much worse.
  • Transients should be used for large maps but not for small ones.
  • This benchmark shows that the property of maintaining the type of the map in the output can be achieved without sacrificing performance (the test cases using Specter or "empty" have this property).
Show
Nathan Marz added a comment - I've done quite a bit of investigation into this through building Specter. Here are some benchmarks of numerous ways of doing map-vals, including using Specter. Code: https://github.com/nathanmarz/specter/blob/4778500e0370fb211f47ebf4d69ca64366117b6c/scripts/benchmarks.clj#L87 Results: https://gist.github.com/nathanmarz/bf571c9ed86bfad09816e17b9b6e59e3 A few comments:
  • Implementations that build and tear apart MapEntry's perform much worse.
  • Transients should be used for large maps but not for small ones.
  • This benchmark shows that the property of maintaining the type of the map in the output can be achieved without sacrificing performance (the test cases using Specter or "empty" have this property).
Hide
Hiroyuki Fudaba added a comment -

I've modified the implementation. It should be faster than before.

Show
Hiroyuki Fudaba added a comment - I've modified the implementation. It should be faster than before.
Hide
Steve Miner added a comment -

Implementations that call reduce-kv are not lazy so the documentation should be clarified in the proposed patch (map-mapper-v3.patch). Also, it's probably better to say "map" (as the noun) rather than to specify a particular concrete type "hash map".

Show
Steve Miner added a comment - Implementations that call reduce-kv are not lazy so the documentation should be clarified in the proposed patch (map-mapper-v3.patch). Also, it's probably better to say "map" (as the noun) rather than to specify a particular concrete type "hash map".
Hide
Nicola Mometto added a comment - - edited

map->map operations can't be lazy either way. Even if one implementation used lazy operations to iterate over the original map, the `into {}` would realize it later.

Show
Nicola Mometto added a comment - - edited map->map operations can't be lazy either way. Even if one implementation used lazy operations to iterate over the original map, the `into {}` would realize it later.
Hide
Nathan Marz added a comment -

-1 to this. Clojure aims to be a small core, pushing additional functionality into libraries. The problem space of compound transformations, of which this functionality is a small piece, is already thoroughly solved by Specter. Specter's `MAP-VALS` and `MAP-KEYS` navigators additionally support removal of key/value pairs during transformation by transforming to special `NONE` value. This expands the utility greatly.

Also worth noting is a fast implementation requires a totally different approach depending on the type of the map. `reduce-kv` with transients is optimal for hash maps, but for array maps using lower level facilities provide ~60% speed boost. See Specter's implementation here https://github.com/nathanmarz/specter/blob/1.0.4/src/clj/com/rpl/specter/navs.cljc#L243

Show
Nathan Marz added a comment - -1 to this. Clojure aims to be a small core, pushing additional functionality into libraries. The problem space of compound transformations, of which this functionality is a small piece, is already thoroughly solved by Specter. Specter's `MAP-VALS` and `MAP-KEYS` navigators additionally support removal of key/value pairs during transformation by transforming to special `NONE` value. This expands the utility greatly. Also worth noting is a fast implementation requires a totally different approach depending on the type of the map. `reduce-kv` with transients is optimal for hash maps, but for array maps using lower level facilities provide ~60% speed boost. See Specter's implementation here https://github.com/nathanmarz/specter/blob/1.0.4/src/clj/com/rpl/specter/navs.cljc#L243
Hide
Ghadi Shayban added a comment -

Nathan you're making a strawman re: compound transformations. This isn't a request for a function with filtering knobs or conditional behavior (which Clojure has historically opposed). There are other valid approaches than Specter's.

Re fast implementation: Not every function has to strive for the most performant implementation, esp at the cost of branching and general complexity. A cost model for performance has to take into account complexity.

This ticket is a request for a convenience that is repeated in many codebases.

Do we want to preserve metadata? Many map operations do.
Do we want to assume IEditableCollection?

Show
Ghadi Shayban added a comment - Nathan you're making a strawman re: compound transformations. This isn't a request for a function with filtering knobs or conditional behavior (which Clojure has historically opposed). There are other valid approaches than Specter's. Re fast implementation: Not every function has to strive for the most performant implementation, esp at the cost of branching and general complexity. A cost model for performance has to take into account complexity. This ticket is a request for a convenience that is repeated in many codebases. Do we want to preserve metadata? Many map operations do. Do we want to assume IEditableCollection?
Hide
Nathan Marz added a comment -

Performance is incredibly important for general data structure manipulation functions like this. Manipulating data structures is one of the most basic things you do in a language, and it's done all the time in performance sensitive code. Having to abandon the "nice" functions for ugly specialized code in performance sensitive code is a failure of the language.

`map-vals`/`map-keys` are part of a rich problem space of which myself and the users of Specter have learned a lot the past few years. Clojure barely touches this problem space, especially when it comes to nested or recursive data structures. Adding functions to Clojure that are significantly inferior in both semantics and performance to what already exists in an external library seems pretty silly to me. Just my two cents.

Show
Nathan Marz added a comment - Performance is incredibly important for general data structure manipulation functions like this. Manipulating data structures is one of the most basic things you do in a language, and it's done all the time in performance sensitive code. Having to abandon the "nice" functions for ugly specialized code in performance sensitive code is a failure of the language. `map-vals`/`map-keys` are part of a rich problem space of which myself and the users of Specter have learned a lot the past few years. Clojure barely touches this problem space, especially when it comes to nested or recursive data structures. Adding functions to Clojure that are significantly inferior in both semantics and performance to what already exists in an external library seems pretty silly to me. Just my two cents.
Hide
Hiroyuki Fudaba added a comment -

I agree with Nathan Martz that the performance is very important, but I still have a strong opinion that this function should be somehow imported to the core part of the language.
People use this transformation pretty often and if there is a fast implementation in the core it will be a great benefit to all of us.

Show
Hiroyuki Fudaba added a comment - I agree with Nathan Martz that the performance is very important, but I still have a strong opinion that this function should be somehow imported to the core part of the language. People use this transformation pretty often and if there is a fast implementation in the core it will be a great benefit to all of us.

People

Vote (40)
Watch (12)

Dates

  • Created:
    Updated: