<< Back to previous view

[CLJ-1026] Mixed end-of-line endings in the source code Created: 17/Jul/12  Updated: 31/Aug/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: John Szakmeister Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: None

Attachments: Text File 0001-Introduce-end-of-line-normalization.patch     File clj-1026-bash-script.sh    
Patch: Code
Approval: Triaged

 Description   

Some source files in Clojure have the DOS/Windows carriage return/line feed line endings, whereas the majority use the Linux/Unix line feed only line endings.

The main ongoing issue with this is that if there is a correctly formatted git patch for such a file, then using the recommended 'git am' command to apply such a patch always gives warnings. Some screeners see such warnings, and immediately disregard the patch, asking for the warnings to be fixed before they will consider it. As far as I can tell, this is impossible. Thus patches modifying files with CR/LF line endings are nearly in a deadlock state, requiring unusual effort from non-screeners to remind screeners that such patches are as good as they can be, and would they please ignore those warnings because there is no way around them.

If someone can find a way to apply git patches to such files without such warnings, please update this description or add a comment.

Stu Halloway asks: Wouldn't eliminating all CR/LF endings in Clojure source files invalidate many unapplied patches? My answer: Please do it anyway. Others will step in to fix those patches, if they become broken.

The attached file clj-1026-bash-script.sh does several things, which should be easy to verify by reading through the script itself, which is quite simple:

1. find all files with CR/LF endings, and remove the CR characters.
2. Does 'git status .' to show the files that have been modified
3. Does 'git diff .' and 'git diff -w .', showing the number of lines that have been modified, and the number of lines that have been modified in a way that changes something besides whitespace, which should be exactly 0 lines, because only whitespace has been changed.

Original description when this ticket was created:

While examining some of the Clojure source code, I discovered that some files had mixed line endings, or CRLF line endings on a non-Windows box. Using .gitattributes, we can correct that so that files have the right endings for the platform that it's on.



 Comments   
Comment by John Szakmeister [ 17/Jul/12 2:26 PM ]

Patch to fix line endings and introduce .gitattributes.

Comment by Stuart Halloway [ 20/Jul/12 4:47 PM ]

This looks like a change to every line in the world, which makes it hard to vet. Also: will it render incompatible all other outstanding patches at the time it is applied?

Comment by John Szakmeister [ 21/Jul/12 6:52 AM ]

You can use git diff -w $(git merge-base HEAD master) to see the actual diff minus the line ending change. Here it is inline:

:: git diff -w $(git merge-base HEAD master)
diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..7b89cfa
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,6 @@
+*.txt           text
+*.clj           text
+*.html          text
+*.js            text
+*.css           text
+*.java          text diff=java

Also, no, it won't render all the outstanding patches incompatible. For one, it seems like the files that have the eols mixed or in CRLF aren't touched much. I also went through the majority of outstanding patches and couldn't fix one that conflicts. Secondly, format-patch emits the patch inline and is intended to be sent via email. SMTP says that all lines must end with CRLF, so line endings are effectively lost. So git will convert it to the right line ending on application.

It can conflict with any outstanding branches that you may have. Supplying the renormalization option on merge can help:

git merge -X renormalize <branch-name>

Or, you can enable this by default for your repository:

git config --local merge.renormalize true

If you think it's too much trouble, let's just drop it though.

Comment by Stuart Halloway [ 10/Aug/12 1:15 PM ]

Patch does not apply on my working copy of Clojure. I am using

git am -s ...
/Users/stu/repos/clojure/.git/rebase-apply/patch:344: trailing whitespace.
  p {  	
/Users/stu/repos/clojure/.git/rebase-apply/patch:350: space before tab in indent.
  	margin-left: 0.5in;
error: patch failed: epl-v10.html:1
error: epl-v10.html: patch does not apply
error: patch failed: src/jvm/clojure/asm/AnnotationVisitor.java:1
error: src/jvm/clojure/asm/AnnotationVisitor.java: patch does not apply
error: patch failed: src/jvm/clojure/asm/AnnotationWriter.java:1
error: src/jvm/clojure/asm/AnnotationWriter.java: patch does not apply

I am willing to do this, just inept.

Comment by Andy Fingerhut [ 10/Aug/12 1:21 PM ]

Stuart, I updated this page http://dev.clojure.org/display/design/JIRA+workflow a while back when I had trouble applying some patches involving files with carriage return line endings. I did some Googling on git docs and found the option '--keep-cr' that seems to help in such cases. Try that out.

Comment by Alex Miller [ 04/Sep/13 12:36 PM ]

I think in this case, we should not provide a patch but a process. And I would like to see this cleaned up.

Comment by Andy Fingerhut [ 04/Sep/13 1:14 PM ]

I am not sure what kind of process you mean, Alex, but here are some thoughts.

There are some patches attached to tickets right now that modify Java source files that have CR/LF line endings throughout. Typically those patches remove some lines, and add new ones, and those new lines typically have CR/LF line endings, too, because whatever editor the person used while making the patch saw the file have CR/LF line endings, and preserved that on new lines of text they added to the file.

So if you want to get rid of CRs throughout the Clojure source code, and keep them out, one way would be to:

(1) do a big commit to get rid of all of the current CR characters in source files.

(2) update the few pending patches that introduce lines with CRs (I checked. There are only 15 tickets with such patches right now.)

(3) I can automate a process of detecting patches that have CRs and flag them for CR removal.

In fact (2) can be done as a result of (3).

Comment by Alex Miller [ 04/Sep/13 2:29 PM ]

I just mean that any patch for this ticket is doomed to be continuously broken until it is applied, so it would be better to instead have a set of steps that someone can follow to fix all line endings, than a patch. That is #1 on your list should be a set of steps. #2 and 3 would be nice as well.

Comment by Andy Fingerhut [ 04/Sep/13 7:48 PM ]

clj-1026-bash-script.sh is not a patch, but a bash script that can be run in an unmodified Clojure source tree, preferably freshly cloned, that will delete carriage returns characters from all files except those in the .git directory, and directories.

It also prints out the results of a few commands to help you gain confidence that those are the only changes it made.

Tested on Mac OS X 10.8.4 and Ubuntu Linux 12.04.3.

Comment by Andy Fingerhut [ 31/Aug/14 3:32 PM ]

Shell script clj-1026-bash-script.sh has updated 'comments' printed with the latest stats of number of files and lines containing carriage return characters as of Aug 31 2014.

Generated at Fri Dec 19 06:26:01 CST 2014 using JIRA 4.4#649-r158309.