Clojure

Improve slurp performance by using native Java StringWriter and jio/copy

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Critical Critical
  • Resolution: Completed
  • Affects Version/s: Release 1.3
  • Fix Version/s: Release 1.8
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Ok

Description

Instead of copying each character from InputReader to StringBuffer.

Performance improvement:

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...

Approach: Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

Patch: clj-668-2.patch
Screened by: Alex Miller

  1. clj-668-2.patch
    09/Oct/15 8:23 AM
    2 kB
    Alex Miller
  2. slurp-perf-patch.diff
    10/Sep/14 10:27 PM
    1 kB
    Timothy Baldridge

Activity

Alex Miller made changes -
Field Original Value New Value
Labels io performance
Hide
Alex Miller added a comment -

This is double-better with the changes in Clojure 1.6 to improve jio/copy performance by using the NIO impl. Rough timing difference on a 25M file: old= 2316.021 msecs, new= 93.319 msecs.

Filer did not supply a patch and is not a contributor. If someone wants to make a patch (and better timing info demonstrating performance improvements), that would be great.

Show
Alex Miller added a comment - This is double-better with the changes in Clojure 1.6 to improve jio/copy performance by using the NIO impl. Rough timing difference on a 25M file: old= 2316.021 msecs, new= 93.319 msecs. Filer did not supply a patch and is not a contributor. If someone wants to make a patch (and better timing info demonstrating performance improvements), that would be great.
Alex Miller made changes -
Priority Minor [ 4 ] Major [ 3 ]
Approval Triaged [ 10120 ]
Alex Miller made changes -
Labels io performance io newbie performance
Timothy Baldridge made changes -
Assignee Timothy Baldridge [ halgari ]
Timothy Baldridge made changes -
Attachment slurp-perf-patch.diff [ 13328 ]
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}
From:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 4269.980863 msecs"
104857600
To:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 1140.854023 msecs"
104857600
{noformat}

Patch: http://github.com/juergenhoetzel/clojure/commit/af1a5713cd485ba5f1db25c37906ebaf7d474b47
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}
From:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 4269.980863 msecs"
104857600
To:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 1140.854023 msecs"
104857600
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Hide
Timothy Baldridge added a comment -

Fixed the ticket formatting a bit, and added a patch I coded up tonight. Should be pretty close to the old patch, as we both use StringWriter, but I didn't really look at the old patch beyond noticing that it was using StringWriter.

Show
Timothy Baldridge added a comment - Fixed the ticket formatting a bit, and added a patch I coded up tonight. Should be pretty close to the old patch, as we both use StringWriter, but I didn't really look at the old patch beyond noticing that it was using StringWriter.
Hide
Alex Miller added a comment -

Can you update the perf comparison on latest code and do both a small and big file?

Show
Alex Miller added a comment - Can you update the perf comparison on latest code and do both a small and big file?
Alex Miller made changes -
Labels io newbie performance io performance
Alex Miller made changes -
Labels io performance ft io performance
Timothy Baldridge made changes -
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}
From:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 4269.980863 msecs"
104857600
To:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 1140.854023 msecs"
104857600
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
..."Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Timothy Baldridge made changes -
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
..."Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Andy Fingerhut made changes -
Patch Code [ 10001 ]
Alex Miller made changes -
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
*Screened by:* Alex Miller
Alex Miller made changes -
Priority Major [ 3 ] Critical [ 2 ]
Hide
Michael Blume added a comment -

Is this blocked on a new perf comparison? Should I post one?

Show
Michael Blume added a comment - Is this blocked on a new perf comparison? Should I post one?
Hide
Michael Blume added a comment -
$ lein bench 20000 /tmp/hello
writing test file
filename: /tmp/hello
file size: 20000 bytes
benching old slurp
WARNING: Final GC required 1.210381595203919 % of runtime
Evaluation count : 298920 in 60 samples of 4982 calls.
             Execution time mean : 202.329756 µs
    Execution time std-deviation : 6.314504 µs
   Execution time lower quantile : 191.929679 µs ( 2.5%)
   Execution time upper quantile : 218.094592 µs (97.5%)
                   Overhead used : 1.982912 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 1 (1.6667 %)
	low-mild	 2 (3.3333 %)
	high-mild	 1 (1.6667 %)
 Variance from outliers : 17.4347 % Variance is moderately inflated by outliers
benching new slurp
Evaluation count : 906180 in 60 samples of 15103 calls.
             Execution time mean : 67.856649 µs
    Execution time std-deviation : 1.729056 µs
   Execution time lower quantile : 64.529704 µs ( 2.5%)
   Execution time upper quantile : 70.907626 µs (97.5%)
                   Overhead used : 1.982912 ns

Found 6 outliers in 60 samples (10.0000 %)
	low-severe	 4 (6.6667 %)
	low-mild	 1 (1.6667 %)
	high-mild	 1 (1.6667 %)
 Variance from outliers : 12.6203 % Variance is moderately inflated by outliers
$ lein bench 20000000 /tmp/hello
writing test file
filename: /tmp/hello
file size: 20000000 bytes
benching old slurp
WARNING: Final GC required 3.245241304626936 % of runtime
Evaluation count : 360 in 60 samples of 6 calls.
             Execution time mean : 191.645108 ms
    Execution time std-deviation : 7.985328 ms
   Execution time lower quantile : 177.640022 ms ( 2.5%)
   Execution time upper quantile : 199.548233 ms (97.5%)
                   Overhead used : 1.733362 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 1 (1.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 28.6500 % Variance is moderately inflated by outliers
benching new slurp
Evaluation count : 1380 in 60 samples of 23 calls.
             Execution time mean : 45.151513 ms
    Execution time std-deviation : 1.298921 ms
   Execution time lower quantile : 41.675855 ms ( 2.5%)
   Execution time upper quantile : 47.746682 ms (97.5%)
                   Overhead used : 1.733362 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 15.7926 % Variance is moderately inflated by outliers
$ lein bench 20 /tmp/hello
writing test file
filename: /tmp/hello
file size: 20 bytes
benching old slurp
WARNING: Final GC required 1.413345976323001 % of runtime
Evaluation count : 2707140 in 60 samples of 45119 calls.
             Execution time mean : 22.617091 µs
    Execution time std-deviation : 409.896754 ns
   Execution time lower quantile : 21.777914 µs ( 2.5%)
   Execution time upper quantile : 23.316498 µs (97.5%)
                   Overhead used : 1.917000 ns
benching new slurp
Evaluation count : 2430420 in 60 samples of 40507 calls.
             Execution time mean : 25.023136 µs
    Execution time std-deviation : 476.636242 ns
   Execution time lower quantile : 23.910619 µs ( 2.5%)
   Execution time upper quantile : 25.666224 µs (97.5%)
                   Overhead used : 1.917000 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 1 (1.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 7.8349 % Variance is slightly inflated by outliers
Show
Michael Blume added a comment -
$ lein bench 20000 /tmp/hello
writing test file
filename: /tmp/hello
file size: 20000 bytes
benching old slurp
WARNING: Final GC required 1.210381595203919 % of runtime
Evaluation count : 298920 in 60 samples of 4982 calls.
             Execution time mean : 202.329756 µs
    Execution time std-deviation : 6.314504 µs
   Execution time lower quantile : 191.929679 µs ( 2.5%)
   Execution time upper quantile : 218.094592 µs (97.5%)
                   Overhead used : 1.982912 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 1 (1.6667 %)
	low-mild	 2 (3.3333 %)
	high-mild	 1 (1.6667 %)
 Variance from outliers : 17.4347 % Variance is moderately inflated by outliers
benching new slurp
Evaluation count : 906180 in 60 samples of 15103 calls.
             Execution time mean : 67.856649 µs
    Execution time std-deviation : 1.729056 µs
   Execution time lower quantile : 64.529704 µs ( 2.5%)
   Execution time upper quantile : 70.907626 µs (97.5%)
                   Overhead used : 1.982912 ns

Found 6 outliers in 60 samples (10.0000 %)
	low-severe	 4 (6.6667 %)
	low-mild	 1 (1.6667 %)
	high-mild	 1 (1.6667 %)
 Variance from outliers : 12.6203 % Variance is moderately inflated by outliers
$ lein bench 20000000 /tmp/hello
writing test file
filename: /tmp/hello
file size: 20000000 bytes
benching old slurp
WARNING: Final GC required 3.245241304626936 % of runtime
Evaluation count : 360 in 60 samples of 6 calls.
             Execution time mean : 191.645108 ms
    Execution time std-deviation : 7.985328 ms
   Execution time lower quantile : 177.640022 ms ( 2.5%)
   Execution time upper quantile : 199.548233 ms (97.5%)
                   Overhead used : 1.733362 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 1 (1.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 28.6500 % Variance is moderately inflated by outliers
benching new slurp
Evaluation count : 1380 in 60 samples of 23 calls.
             Execution time mean : 45.151513 ms
    Execution time std-deviation : 1.298921 ms
   Execution time lower quantile : 41.675855 ms ( 2.5%)
   Execution time upper quantile : 47.746682 ms (97.5%)
                   Overhead used : 1.733362 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 15.7926 % Variance is moderately inflated by outliers
$ lein bench 20 /tmp/hello
writing test file
filename: /tmp/hello
file size: 20 bytes
benching old slurp
WARNING: Final GC required 1.413345976323001 % of runtime
Evaluation count : 2707140 in 60 samples of 45119 calls.
             Execution time mean : 22.617091 µs
    Execution time std-deviation : 409.896754 ns
   Execution time lower quantile : 21.777914 µs ( 2.5%)
   Execution time upper quantile : 23.316498 µs (97.5%)
                   Overhead used : 1.917000 ns
benching new slurp
Evaluation count : 2430420 in 60 samples of 40507 calls.
             Execution time mean : 25.023136 µs
    Execution time std-deviation : 476.636242 ns
   Execution time lower quantile : 23.910619 µs ( 2.5%)
   Execution time upper quantile : 25.666224 µs (97.5%)
                   Overhead used : 1.917000 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 1 (1.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 7.8349 % Variance is slightly inflated by outliers
Hide
Michael Blume added a comment -

This is with an SSD, no idea what the numbers would look like with a spinny disk.

Show
Michael Blume added a comment - This is with an SSD, no idea what the numbers would look like with a spinny disk.
Rich Hickey made changes -
Fix Version/s Release 1.8 [ 10254 ]
Approval Triaged [ 10120 ] Vetted [ 10003 ]
Hide
Alex Miller added a comment -

Updated patch to contain ticket id in commit description and provide a few more lines of diff context. No actual changes in the new patch, attribution retained.

Show
Alex Miller added a comment - Updated patch to contain ticket id in commit description and provide a few more lines of diff context. No actual changes in the new patch, attribution retained.
Alex Miller made changes -
Attachment clj-668-2.patch [ 15199 ]
Description Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* slurp-perf-patch.diff
*Screened by:* Alex Miller
Instead of copying each character from InputReader to StringBuffer.

Performance improvement:
{noformat}

Generate a 10meg file:
user> (spit "foo.txt" (apply str (repeat (* 1024 1024 10) "X")))

Test code:
user> (dotimes [x 100] (time (do (slurp "foo.txt") 0)))

From:
...
Elapsed time: 136.387 msecs"
"Elapsed time: 143.782 msecs"
"Elapsed time: 153.174 msecs"
"Elapsed time: 211.51 msecs"
"Elapsed time: 155.429 msecs"
"Elapsed time: 145.619 msecs"
"Elapsed time: 142.641 msecs"
...


To:
...
"Elapsed time: 23.408 msecs"
"Elapsed time: 25.876 msecs"
"Elapsed time: 41.449 msecs"
"Elapsed time: 28.292 msecs"
"Elapsed time: 25.765 msecs"
"Elapsed time: 24.339 msecs"
"Elapsed time: 32.047 msecs"
"Elapsed time: 23.372 msecs"
"Elapsed time: 24.365 msecs"
"Elapsed time: 26.265 msecs"
...
{noformat}

*Approach:* Use StringWriter and jio/copy vs character by character copy. Results from the current patch see a 4-5x perf boost after the jit warms up, with purely in-memory streams (ByteArrayInputStream over a 6MB string).

*Patch:* clj-668-2.patch
*Screened by:* Alex Miller
Approval Vetted [ 10003 ] Screened [ 10004 ]
Rich Hickey made changes -
Approval Screened [ 10004 ] Ok [ 10007 ]
Stuart Halloway made changes -
Status Open [ 1 ] Closed [ 6 ]
Resolution Completed [ 1 ]

People

Vote (0)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: