Clojure

ArraySeq dead code cleanup and ArraySeq_short gap filling

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Completed
  • Affects Version/s: Release 1.5
  • Fix Version/s: Release 1.6
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code
  • Approval:
    Ok

Description

The ArraySeq constructor and all methods retain support for primitive ArraySeq but that code has all been shunted into type-specific ArraySeq primitive array variants (ArraySeq_long, etc). The ArraySeq_short variant is currently missing.

Approach: Remove the vestigial primitive array code paths in ArraySeq and fields (ct and oa). This may provide a performance benefit but it has been hard to find a measurable impact.

The patch also fills in the missing ArraySeq_short variant. Before this patch, reduce on an array of primitive short would throw ClassCastException.

Patch: no-getComponentType--v002.patch
Screened by: Stuart Sierra

Activity

Hide
Zach Tellman added a comment -

As a data point, I was recently profiling a Hadoop job where the code made heavy use of 'partial', which apparently unlike 'comp' and 'juxt', always uses apply [1]. As a result, .getComponentType accounted for 41% of the overall compute time. This might be as much a comment on the implementation of partial as anything else, but it certainly shows that this can have a significant effect on performance.

[1] https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L2388

Show
Zach Tellman added a comment - As a data point, I was recently profiling a Hadoop job where the code made heavy use of 'partial', which apparently unlike 'comp' and 'juxt', always uses apply [1]. As a result, .getComponentType accounted for 41% of the overall compute time. This might be as much a comment on the implementation of partial as anything else, but it certainly shows that this can have a significant effect on performance. [1] https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L2388
Hide
Kevin Downey added a comment -

prepRet usage appears to be all about enforcing canonical Boolean values, so I think using Object.class is not the best. Maybe splitting ArraySeq in to ArraySeq and ArraySeqBoolean would be better. ArraySeq would no longer do a prepRet and ArraySeqBoolean would

Show
Kevin Downey added a comment - prepRet usage appears to be all about enforcing canonical Boolean values, so I think using Object.class is not the best. Maybe splitting ArraySeq in to ArraySeq and ArraySeqBoolean would be better. ArraySeq would no longer do a prepRet and ArraySeqBoolean would
Hide
Kevin Downey added a comment -

ArraySeq actually already has specialized implementations, and interestingly the specialized boolean implementation doesn't call prepRet

Show
Kevin Downey added a comment - ArraySeq actually already has specialized implementations, and interestingly the specialized boolean implementation doesn't call prepRet
Hide
Kevin Downey added a comment -

The ArraySeq split I proposed above will fail if you have an array of Objects that happen to be Booleans, it seems like the best bet would be something like preRet that did a instanceof Boolean check without the requirement of passing in a class

Show
Kevin Downey added a comment - The ArraySeq split I proposed above will fail if you have an array of Objects that happen to be Booleans, it seems like the best bet would be something like preRet that did a instanceof Boolean check without the requirement of passing in a class
Hide
Brandon Bloom added a comment -

I traced the surrounding code & iirc, the result was always Object[], so the return value was always Object.class. I'm pretty sure that code wasn't actually doing anything useful here.

Show
Brandon Bloom added a comment - I traced the surrounding code & iirc, the result was always Object[], so the return value was always Object.class. I'm pretty sure that code wasn't actually doing anything useful here.
Hide
Brandon Bloom added a comment -

What does "triaged" mean in this context? Doesn't triage require some sort of decision or classification?

Show
Brandon Bloom added a comment - What does "triaged" mean in this context? Doesn't triage require some sort of decision or classification?
Hide
Andy Fingerhut added a comment -

See http://dev.clojure.org/display/community/JIRA+workflow for way more than the answer to your question (especially the flowchart in the "Workflow" section), but basically it means that a screener believes the ticket description represents an actual problem to be addressed, and they ask that Rich examine the ticket to see if he agrees.

Show
Andy Fingerhut added a comment - See http://dev.clojure.org/display/community/JIRA+workflow for way more than the answer to your question (especially the flowchart in the "Workflow" section), but basically it means that a screener believes the ticket description represents an actual problem to be addressed, and they ask that Rich examine the ticket to see if he agrees.
Hide
Brandon Bloom added a comment -

Gotcha. Thanks.

Show
Brandon Bloom added a comment - Gotcha. Thanks.
Hide
Alex Miller added a comment - - edited

The existing ArraySeq class has support for different two modes - as an array of Objects and as an array of primitives. In the Object case, this.oa will be set to the array and this.ct will be set to the component type of the array. From running a bunch of code, this.ct is not always Object - it is easy to find cases of Class and many other cases as well.

However, any kind of non-primitive array will cause this.oa to be set and this prevents the calls to prepRet from being called with ct in every method where this is done. All primitive arrays are now being handled via the switch in ArraySeq.createFromObject.

The only thing prepRet is effectively doing is returning canonical Boolean objects. It is possible to create an ArraySeq on a Boolean[]. However, even in this case, Boolean[] is an instanceof Object[] and the object path is triggered.

Thus, I agree that prepRet is never being called from ArraySeq and this path can be removed, which also allows us to remove this.ct and importantly the array.getClass().getComponentType() check. This also allows us to go further though and remove the oa field entirely and all of the prepRet calls.

I have attached an updated patch that makes this change. I also found (based on some test failures) that the ArraySeq_short was inexplicably missing so I added that in as well.

ArraySeq could be made even cleaner with the addition of another variant that specifically handled the case of a null array (ArraySeq_null). I have no idea if that would be worth doing and I have not done that here.

Show
Alex Miller added a comment - - edited The existing ArraySeq class has support for different two modes - as an array of Objects and as an array of primitives. In the Object case, this.oa will be set to the array and this.ct will be set to the component type of the array. From running a bunch of code, this.ct is not always Object - it is easy to find cases of Class and many other cases as well. However, any kind of non-primitive array will cause this.oa to be set and this prevents the calls to prepRet from being called with ct in every method where this is done. All primitive arrays are now being handled via the switch in ArraySeq.createFromObject. The only thing prepRet is effectively doing is returning canonical Boolean objects. It is possible to create an ArraySeq on a Boolean[]. However, even in this case, Boolean[] is an instanceof Object[] and the object path is triggered. Thus, I agree that prepRet is never being called from ArraySeq and this path can be removed, which also allows us to remove this.ct and importantly the array.getClass().getComponentType() check. This also allows us to go further though and remove the oa field entirely and all of the prepRet calls. I have attached an updated patch that makes this change. I also found (based on some test failures) that the ArraySeq_short was inexplicably missing so I added that in as well. ArraySeq could be made even cleaner with the addition of another variant that specifically handled the case of a null array (ArraySeq_null). I have no idea if that would be worth doing and I have not done that here.
Hide
Brandon Bloom added a comment -

Hey Alex: Thanks for following through on the vestiges removal.

Show
Brandon Bloom added a comment - Hey Alex: Thanks for following through on the vestiges removal.
Hide
Rich Hickey added a comment -

please add perf tests that demonstrate benefit

Show
Rich Hickey added a comment - please add perf tests that demonstrate benefit
Hide
Alex Miller added a comment -

Removed performance angle and focus on clean-up and gap-filling (ArraySeq_short) aspects instead.

Show
Alex Miller added a comment - Removed performance angle and focus on clean-up and gap-filling (ArraySeq_short) aspects instead.
Hide
Alex Miller added a comment -

Point to considered patch, this got lost at some point in the description.

Show
Alex Miller added a comment - Point to considered patch, this got lost at some point in the description.

People

Vote (6)
Watch (5)

Dates

  • Created:
    Updated:
    Resolved: