Improve writeClassFile performance

Description

When writing class files to disk, Clojure currently calls flush on the FileOutputStream used, and sync on the underlying FileDescriptor for each class file it writes. This ticket proposes to remove the sync call as it provides little value, but has a detrimental effect on performance on some operating systems.

Background

The call to sync in the current implementation of Compiler.writeClassFile was added in SVN-1252 to address an issue where compile allegedly returned but changes to class files weren't visible[1]. The original report lacks detail and it's hard to discern what the actual issue was.

Calling FileDescriptor.sync tells the operating system to flush any buffered data to disk (to its best capability). This means that in the event of a system failure, the kernel buffers will not hold any data that hasn't been persisted to disk, and thus no data would be lost.

However, sync does not provide any special guarantees around visibility. Any data written to a file descriptor that is subsequently closed is already guaranteed to be visible to all other processes as soon as the close call returns. Without sync, the data a reader gets back may reside only in kernel buffers, but the reading process will see no difference.
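The distinction between flush, sync and close can be illustrated with a minimal Java sketch. This is not the code from Compiler.writeClassFile; the class and method names (WriteVisibilitySketch, write, visibleAfterClose) are hypothetical, and the read-back happens in the same process rather than in a separate one, so it only approximates the cross-process case discussed above.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical illustration, not the Clojure compiler's actual code.
class WriteVisibilitySketch {

    // Writes bytes to path. When sync is true, the data is also forced to the
    // storage device before the stream is closed (the call this ticket
    // proposes to remove).
    static void write(Path path, byte[] bytes, boolean sync) {
        try (FileOutputStream out = new FileOutputStream(path.toFile())) {
            out.write(bytes);
            out.flush();
            if (sync) {
                out.getFD().sync();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes WITHOUT sync, closes the stream, then reads the file back through
    // a fresh handle and reports whether the contents match.
    static boolean visibleAfterClose(byte[] bytes) {
        try {
            Path path = Files.createTempFile("sketch", ".class");
            try {
                write(path, bytes, false);
                return Arrays.equals(bytes, Files.readAllBytes(path));
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println("visible without sync: " + visibleAfterClose(bytes));
    }
}
```

The only difference between the two modes is the getFD().sync() call; the reader's view of the file is the same either way once the stream has been closed.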

Syncing is an expensive operation and doing so after each class file is written can make compilation significantly slower.

Laurent Petit (who made the original request for this) reports that it would not make any difference for him now in CCW (https://groups.google.com/d/msg/clojure-dev/Vz9h8hqAvFk/cQvip5j_zoQJ).

Proposed solution

Remove the call to FileDescriptor.sync when writing class files to disk.

Latest patch: http://dev.clojure.org/jira/secure/attachment/14214/clj-703-remove-sync-in-write-class-file.patch

Performance implications

The performance gains of removing the sync call will vary depending on OS and file system. Some operating systems provide strong guarantees that your data will actually end up on the disk, others less so. See, for example, the differences between fsync on Linux[3] and OSX[2].

Preliminary tests on Linux have shown the following changes in compile time:

  • aleph.http: 13s -> 4s

  • incanter.core: 14s -> 4s
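A rough way to observe the cost of sync on a particular OS and file system is to time a batch of small file writes with and without it. The sketch below is an assumption-laden micro-benchmark, not the test linked in this ticket: the class name SyncCostSketch, the file count and the payload size are all arbitrary choices, and results will vary widely between machines.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical micro-benchmark; numbers depend heavily on OS, file system and disk.
class SyncCostSketch {

    // Writes `count` files of `size` bytes into a fresh temp directory and
    // returns the elapsed time in nanoseconds. Cleanup is not included in the timing.
    static long timeWrites(int count, int size, boolean sync) {
        byte[] payload = new byte[size];
        try {
            Path dir = Files.createTempDirectory("sync-cost");
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                Path file = dir.resolve("C" + i + ".class");
                try (FileOutputStream out = new FileOutputStream(file.toFile())) {
                    out.write(payload);
                    out.flush();
                    if (sync) {
                        out.getFD().sync(); // force data to the storage device
                    }
                }
            }
            long elapsed = System.nanoTime() - start;
            // Remove the files and the directory again.
            for (int i = 0; i < count; i++) {
                Files.deleteIfExists(dir.resolve("C" + i + ".class"));
            }
            Files.deleteIfExists(dir);
            return elapsed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int count = 500;  // arbitrary number of "class files"
        int size = 4096;  // arbitrary payload size in bytes
        System.out.printf("with sync:    %d ms%n", timeWrites(count, size, true) / 1_000_000);
        System.out.printf("without sync: %d ms%n", timeWrites(count, size, false) / 1_000_000);
    }
}
```

A single run like this is noisy; repeating it several times and on different file systems gives a better picture of how much the sync call costs in a given environment.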

I've created a test for this and have asked the community for help testing and gathering data: http://goo.gl/forms/0b8SVt2pyN

Will update this ticket once more data is available.

Tradeoffs

In short, this change trades some degree of on-disk consistency for compilation speed. If a fault (process, OS, hardware) occurs whilst compiling, this patch may result in class files not being properly written to disk. This, however, is already the case today, depending on the type of fault, the OS you're using and the fsync guarantees it provides. If a compilation fails with a fatal error, it is unreasonable to expect the output to be in a consistent state.

[1]: https://groups.google.com/forum/#!topic/clojure/C-DEWzpPPUY
[2]: "Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence.": https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fsync.2.html
[3]: "[...] includes writing through or flushing a disk cache if present.": http://man7.org/linux/man-pages/man2/fsync.2.html

Environment

None

Attachments

6

Activity

Ragnar Dahlén May 31, 2015 at 9:40 AM

Alex - Done

Alex Miller May 31, 2015 at 2:49 AM

Ragnar - much better - thank you! Could you also add a pointer to the patch for consideration?

Ragnar Dahlén May 31, 2015 at 12:25 AM

Alex - Updated description, please let me know if I can improve it further.

Ragnar Dahlén May 31, 2015 at 12:24 AM

Updated for RC1

Alex Miller May 17, 2015 at 5:02 PM

Andy - I will probably do so soon but I thought previous commenters would see this now.

Ragnar - absolutely!

Completed

Created January 4, 2011 at 6:58 PM
Updated July 18, 2015 at 1:20 AM
Resolved July 18, 2015 at 1:20 AM