Clojure

Improve slurp performance by using native Java StringWriter and jio/copy

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.3
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Triaged

Description

Instead of copying each character from InputReader to StringBuffer.

Performance improvement:

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...

Approach: Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

Patch: slurp-perf-patch.diff
Screened by: Alex Miller

Activity

Alex Miller made changes -
Field Original Value New Value
Labels io performance
Alex Miller made changes -
Approval Triaged [ 10120 ]
Priority Minor [ 4 ] Major [ 3 ]
Alex Miller made changes -
Labels io performance io newbie performance
Timothy Baldridge made changes -
Assignee Timothy Baldridge [ halgari ]
Timothy Baldridge made changes -
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}
From:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 4269.980863 msecs"
104857600
To:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 1140.854023 msecs"
104857600
{noformat}

Patch: http://github.com/juergenhoetzel/clojure/commit/af1a5713cd485ba5f1db25c37906ebaf7d474b47
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}
From:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 4269.980863 msecs"
104857600
To:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 1140.854023 msecs"
104857600
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Attachment slurp-perf-patch.diff [ 13328 ]
Alex Miller made changes -
Labels io newbie performance io performance
Alex Miller made changes -
Labels io performance ft io performance
Timothy Baldridge made changes -
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}
From:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 4269.980863 msecs"
104857600
To:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 1140.854023 msecs"
104857600
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
..."Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Timothy Baldridge made changes -
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
..."Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Andy Fingerhut made changes -
Patch Code [ 10001 ]
Alex Miller made changes -
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
*Screened by:* Alex Miller

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated: