Clojure

try to catch using - instead of _ in filenames so the compiler can give a better error message for people who don't know that you need to use _ in file names

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Patch:
    Code and Test
  • Approval:
    Incomplete

Description

Screener's Note: This works as advertised, but I have reservations about the approach. We could accept the patch as-is, or a much simpler patch that handles the only important (IMO) case: a-b-c to a_b_c – without generating and testing for unlikely errors like a-b_c. Please advise.

Problem: Clojure requires the files that back a namespace that has dashes in it to have the dashes replaced with underscores on the filesystem (ie a.b_c.clj for namespace a.b-c). If you require a file that has been mistakenly saved as b-c.clj instead, you will get an error message:

Exception in thread "main" java.io.FileNotFoundException: Could not locate a/b_c__init.class or a/b_c.clj on classpath:
	at clojure.lang.RT.load(RT.java:443)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5018.invoke(core.clj:5530)
	at clojure.core$load.doInvoke(core.clj:5529)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5336)
	at clojure.core$load_lib$fn__4967.invoke(core.clj:5375)
	at clojure.core$load_lib.doInvoke(core.clj:5374)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:619)
	at clojure.core$load_libs.doInvoke(core.clj:5413)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:619)
	at clojure.core$require.doInvoke(core.clj:5496)

Proposed:

  • When loading the resource-root of lib throws a FileNotFoundException, the lib is analyzed...
  • ... if the lib was a name that would be munged, it examines the combinatorial explosion of munge candidates and .clj or .class files in the classpath ...
  • ... if any of these candidates exist, it informs the user of the file's existence, and that a change to that filename would lead to that resource being loaded.
  • ... if none of these candidates exist, it throws the original exception.

It also modifies clojure.lang.RT to expose the behavior around finding clj or class files from a resource root.

Patch: better-error-messages-for-require.diff

Activity

Hide
Joshua Ballanco added a comment - - edited

A perhaps even better solution would be to simply allow the use of dashes in *.clj[s] filenames. I can't imagine the extra disk access per-namespace would be a huge performance burden, and (since dashes aren't allowed currently) I don't think there would be any issues with backwards compatibility.

Show
Joshua Ballanco added a comment - - edited A perhaps even better solution would be to simply allow the use of dashes in *.clj[s] filenames. I can't imagine the extra disk access per-namespace would be a huge performance burden, and (since dashes aren't allowed currently) I don't think there would be any issues with backwards compatibility.
Hide
Gary Fredericks added a comment -

It's worth mentioning the combinatorial explosion for namespaces with multiple dashes – if I (require 'foo-bar.baz-bang), should clojure search for all four possible filenames? Does the jvm have a way to search for files by regex or similar to avoid nasty degenerate cases (like (require 'foo-------------))?

Show
Gary Fredericks added a comment - It's worth mentioning the combinatorial explosion for namespaces with multiple dashes – if I (require 'foo-bar.baz-bang), should clojure search for all four possible filenames? Does the jvm have a way to search for files by regex or similar to avoid nasty degenerate cases (like (require 'foo-------------))?
Hide
Joshua Ballanco added a comment - - edited

According to the docs, the FileSystem class's "getPathMatcher" method accepts path globs, so you'd merely have to replace each instance of "-" or "_" with "{-,_}". Actual runtime characteristics would likely depend on the underlying filesystem's implementation.

Show
Joshua Ballanco added a comment - - edited According to the docs, the FileSystem class's "getPathMatcher" method accepts path globs, so you'd merely have to replace each instance of "-" or "_" with "{-,_}". Actual runtime characteristics would likely depend on the underlying filesystem's implementation.
Hide
Alex Miller added a comment -

I don't think the FileSystem stuff applies when looking up classes on the classpath. Note that Java class names cannot contain "-".

Show
Alex Miller added a comment - I don't think the FileSystem stuff applies when looking up classes on the classpath. Note that Java class names cannot contain "-".
Hide
Phil Hagelberg added a comment -

According to the spec, Java class names can't contain dashes (though IIRC OpenJDK and Oracle's JDK accept them anyway) but the requirement that Clojure source files have names which align with their AOT'd class file eqivalents is something we've imposed upon ourselves. Introducing the disconnect between .clj files and .class files makes way more sense than disconnecting namespaces and .clj files, but arguably it's too late to fix that mistake.

In any case a check for dashed files (resulting only in a more informative compiler error, not a more permissive compiler) which only triggers when a .clj file cannot be found imposes zero overhead in the case where things are already working.

Show
Phil Hagelberg added a comment - According to the spec, Java class names can't contain dashes (though IIRC OpenJDK and Oracle's JDK accept them anyway) but the requirement that Clojure source files have names which align with their AOT'd class file eqivalents is something we've imposed upon ourselves. Introducing the disconnect between .clj files and .class files makes way more sense than disconnecting namespaces and .clj files, but arguably it's too late to fix that mistake. In any case a check for dashed files (resulting only in a more informative compiler error, not a more permissive compiler) which only triggers when a .clj file cannot be found imposes zero overhead in the case where things are already working.
Hide
scott tudd added a comment -

As Clojure seems to be idiomatic to have sometimes-dashed-namespace-and-function-names as opposed to the ubiquitous camelCaseFunctionNames in java ... I agree to have the compiler automagically handle 'knowing' to look in dir_struct AND dir-struct for requisite files.

or at the least print out a nice message explaining the quirk when files "can't" be found ... WHEN there are dashes and underscores involved... anything to aid in helping things "just work" as one would think they're supposed to.

Show
scott tudd added a comment - As Clojure seems to be idiomatic to have sometimes-dashed-namespace-and-function-names as opposed to the ubiquitous camelCaseFunctionNames in java ... I agree to have the compiler automagically handle 'knowing' to look in dir_struct AND dir-struct for requisite files. or at the least print out a nice message explaining the quirk when files "can't" be found ... WHEN there are dashes and underscores involved... anything to aid in helping things "just work" as one would think they're supposed to.
Hide
Obadz added a comment -

I would have saved a few hours as well.

Show
Obadz added a comment - I would have saved a few hours as well.
Hide
Alexander Redington added a comment -

This patch changes clojure.core/load such that:

  • When loading the resource-root of lib throws a FileNotFoundException, the lib is analyzed...
  • ... if the lib was a name that would be munged, it examines the combinatorial explosion of munge candidates and .clj or .class files in the classpath ...
  • ... if any of these candidates exist, it informs the user of the file's existance, and that a change to that filename would lead to that resource being loaded.
  • ... if none of these candidates exist, it throws the original exception.

It also modifies clojure.lang.RT to expose the behavior around finding clj or class files from a resource root.

Show
Alexander Redington added a comment - This patch changes clojure.core/load such that:
  • When loading the resource-root of lib throws a FileNotFoundException, the lib is analyzed...
  • ... if the lib was a name that would be munged, it examines the combinatorial explosion of munge candidates and .clj or .class files in the classpath ...
  • ... if any of these candidates exist, it informs the user of the file's existance, and that a change to that filename would lead to that resource being loaded.
  • ... if none of these candidates exist, it throws the original exception.
It also modifies clojure.lang.RT to expose the behavior around finding clj or class files from a resource root.
Hide
Andy Fingerhut added a comment -

I do not know whether it handles all of the cases proposed in this discussion, but I encourage folks to check out the filename/namespace consistency checking in the latest Eastwood release (version 0.1.1) to see if it catches the cases they would hope to catch. It does a static check based on the files in a Leiningen project, nothing at run time. https://github.com/jonase/eastwood

Of course changes to Clojure itself to give warnings about such things can still be very useful, since not everyone will be using a 3rd party tool to check for such things.

Show
Andy Fingerhut added a comment - I do not know whether it handles all of the cases proposed in this discussion, but I encourage folks to check out the filename/namespace consistency checking in the latest Eastwood release (version 0.1.1) to see if it catches the cases they would hope to catch. It does a static check based on the files in a Leiningen project, nothing at run time. https://github.com/jonase/eastwood Of course changes to Clojure itself to give warnings about such things can still be very useful, since not everyone will be using a 3rd party tool to check for such things.
Hide
Alex Miller added a comment -

Re the screener's note at the top, my preference would be for the simpler approach.

Show
Alex Miller added a comment - Re the screener's note at the top, my preference would be for the simpler approach.
Hide
Rich Hickey added a comment -

I see no reason to fish around in the file system at all. Why can't the message simply remind people that underscores are required and to check that they aren't using dashes?

Show
Rich Hickey added a comment - I see no reason to fish around in the file system at all. Why can't the message simply remind people that underscores are required and to check that they aren't using dashes?
Hide
Gary Fredericks added a comment -

The tradeoff is that fishing around in the file system means the error message is only shown if the user likely made the relevant mistake, whereas simply showing the error message would (if I understand correctly) mean this reminder gets shown for every require error, which I would estimate happens a whole lot more often than the mistake in question.

Show
Gary Fredericks added a comment - The tradeoff is that fishing around in the file system means the error message is only shown if the user likely made the relevant mistake, whereas simply showing the error message would (if I understand correctly) mean this reminder gets shown for every require error, which I would estimate happens a whole lot more often than the mistake in question.
Hide
Andy Fingerhut added a comment -

Gary, there is an exception thrown in any case if the load fails. One approach that I am hacking up now is to add to the existing exception's message that maybe they need to replace - chars in file name with _, but only if the name attempted to be loaded has at least one '-' character in it. So no new output, except in an exception already being thrown, and then only if there is at least a possibility that this is the problem.

Show
Andy Fingerhut added a comment - Gary, there is an exception thrown in any case if the load fails. One approach that I am hacking up now is to add to the existing exception's message that maybe they need to replace - chars in file name with _, but only if the name attempted to be loaded has at least one '-' character in it. So no new output, except in an exception already being thrown, and then only if there is at least a possibility that this is the problem.
Hide
Andy Fingerhut added a comment -

clj-1297-v2.patch is similar to the previously proposed patch, but does not touch the file system in any way that wasn't already being done before this patch.

It adds an extra hint to the message of the exception already being thrown if the resource is not found, but only if there is a '-' character in the name that failed to load.

Show
Andy Fingerhut added a comment - clj-1297-v2.patch is similar to the previously proposed patch, but does not touch the file system in any way that wasn't already being done before this patch. It adds an extra hint to the message of the exception already being thrown if the resource is not found, but only if there is a '-' character in the name that failed to load.

People

Vote (10)
Watch (6)

Dates

  • Created:
    Updated: