ClojureScript

Look for goog-style JavaScript dependencies on the classpath automatically

Details

  • Type: Enhancement Enhancement
  • Status: In Progress In Progress
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code

Description

Right now, only actual goog.* JavaScript dependencies (as shipped w/ GClosure or in its "third-party" library and specified in its deps.js file) are available via (:require ...), etc.; other dependencies that contain goog.provide "declarations" can only ever be loaded via :libs and the classpath scanning (or direct filesystem reference) it provides.

This should change so that any referenced Closure-safe JavaScript dependencies may be resolved from their expected location on the classpath. e.g. (:require foo.bar) would resolve to a foo/bar.js resource on the classpath, which presumably would be required to contain a goog.provide("foo.bar") declaration.

With this change, :libs would be downgraded to support only direct references to filesystem files. (Q: should the option be renamed to reflect this, and to provide an easy configuration validation check for people that are using :libs as it exists now?)

This would be a breaking change to downstream tooling.

(The above is a summary of #clojure discussion with David Nolen.)

Activity

Hide
Chas Emerick added a comment -

This would eliminate the need for CLJS-526

Show
Chas Emerick added a comment - This would eliminate the need for CLJS-526
Hide
Michael Alyn Miller added a comment -

I would really like to see this change. I am building a ClojureScript wrapper around a JavaScript library (which I GClosure-ified) and the only real way to do this at the moment is with your {{:libs [""]}} trick. That works, but it also pollutes the project.clj of every consumer of the library.

Show
Michael Alyn Miller added a comment - I would really like to see this change. I am building a ClojureScript wrapper around a JavaScript library (which I GClosure-ified) and the only real way to do this at the moment is with your {{:libs [""]}} trick. That works, but it also pollutes the project.clj of every consumer of the library.
Hide
David Nolen added a comment - - edited

Related information, it appears I was wrong about GClosure following Java classpath convention for layout. In order to resolve the location of Google Closure Library files, closure.clj parses deps.js in the Google Closure Library JAR.

Still I think we should move forward with the approach outlined here.

Show
David Nolen added a comment - - edited Related information, it appears I was wrong about GClosure following Java classpath convention for layout. In order to resolve the location of Google Closure Library files, closure.clj parses deps.js in the Google Closure Library JAR. Still I think we should move forward with the approach outlined here.
Hide
Chas Emerick added a comment -

My current understanding is that deps.js is an optional delivery mechanism, and perhaps is relatively unused outside of GClosure itself. I still need to dig into the GClosure packaging guidelines, probably won't happen pre-Conj; until then, I can't add much except that I also much prefer the current proposal than attempt to find, consume, and produce these deps.js files. It's not as if people are banging down the door for GClosure-packaged library compat, etc.

Show
Chas Emerick added a comment - My current understanding is that deps.js is an optional delivery mechanism, and perhaps is relatively unused outside of GClosure itself. I still need to dig into the GClosure packaging guidelines, probably won't happen pre-Conj; until then, I can't add much except that I also much prefer the current proposal than attempt to find, consume, and produce these deps.js files. It's not as if people are banging down the door for GClosure-packaged library compat, etc.
Hide
David Nolen added a comment -

Pretty much summarizes my thoughts as well.

Show
David Nolen added a comment - Pretty much summarizes my thoughts as well.
Hide
Brenton Ashworth added a comment -

In my opinion this is a good change. What follows is some context that I have on the purpose of the :libs option. I hope this information is both accurate and helpful.

At the birth of ClojureScript it was not clear how much we would depend on the Java classpath. I think it has become clear that the compiler is typically used as a library which has access to the project's classpath. We should embrace this and make it more friendly to use in this environment.

In the first version of ClojureScript, the compiler could compile a file without resolving dependencies. Problems with deps where either found at run time or when files are handed to the GClosure compiler for compilation. This may no longer be true. I am ashamed to say that I haven't looked at the source in a while. It is important to remember that GClosure deals only with JavaScript sources. Once all ClojureScript sources have been compiled to JavaScript, a set of files is handed to the GClosure compiler for optimization. The purpose of the `:libs` option is to allow any other JavaScript sources which are GClosure compatible, and upon which the sources depend, to be found and handed to the GClosure compiler.

In the beginning, we added GClosure support (dealing with JavaScript files) on top of the CLJS compiler. This change would allow the CLJS compiler to contribute useful information to the GClosure compilation process and remove the need to specify known information elsewhere.

Show
Brenton Ashworth added a comment - In my opinion this is a good change. What follows is some context that I have on the purpose of the :libs option. I hope this information is both accurate and helpful. At the birth of ClojureScript it was not clear how much we would depend on the Java classpath. I think it has become clear that the compiler is typically used as a library which has access to the project's classpath. We should embrace this and make it more friendly to use in this environment. In the first version of ClojureScript, the compiler could compile a file without resolving dependencies. Problems with deps where either found at run time or when files are handed to the GClosure compiler for compilation. This may no longer be true. I am ashamed to say that I haven't looked at the source in a while. It is important to remember that GClosure deals only with JavaScript sources. Once all ClojureScript sources have been compiled to JavaScript, a set of files is handed to the GClosure compiler for optimization. The purpose of the `:libs` option is to allow any other JavaScript sources which are GClosure compatible, and upon which the sources depend, to be found and handed to the GClosure compiler. In the beginning, we added GClosure support (dealing with JavaScript files) on top of the CLJS compiler. This change would allow the CLJS compiler to contribute useful information to the GClosure compilation process and remove the need to specify known information elsewhere.
Hide
Chas Emerick added a comment - - edited

Thanks for the background, Brenton. Assuming you'll be at the Conj, I may have a couple other more minor questions for you.

I think the results from the (still ongoing) community survey will further support the use of the classpath as a reasonable option easily available from the tools people are actually using.

I've dug around the GClosure library docs w.r.t. their build tools, and they seem uniformly directed at people building applications; outside of supporting application use cases, I don't see any discussion of distributing the artifacts of e.g. closurebuilder.py & co. as libraries. A (hardly comprehensive) search of libraries distributed with deps.js files yielded nothing; in fact, the only mentions of deps.js that I can find on github are in .gitignore s and HTML pages of app repos, and in files related to build processes that are reading deps.js shipped with GClosure and its third-party-library.

So, I think we can safely not think about deps.js again, except for the special case of the upstream GClosure libs, which is already taken care of.

One piece of "prior art" of which I was not fully cognizant until yesterday (!!) is this:

https://github.com/emezeske/lein-cljsbuild/issues/143

Essentially, cljsbuild implicitly sets up e.g. /closure-js/libs as a classpath :libs root, same for /closure-js/externs. I've not tested this myself yet, but others apparently have had success. Seems like a very reasonable approach, absent something built-in to the compiler. Externs are certainly out of scope here, but some discussion of what can be done to similarly ease their use is warranted; perhaps adopting a variant of cljsbuild's approach…

Show
Chas Emerick added a comment - - edited Thanks for the background, Brenton. Assuming you'll be at the Conj, I may have a couple other more minor questions for you. I think the results from the (still ongoing) community survey will further support the use of the classpath as a reasonable option easily available from the tools people are actually using. I've dug around the GClosure library docs w.r.t. their build tools, and they seem uniformly directed at people building applications; outside of supporting application use cases, I don't see any discussion of distributing the artifacts of e.g. closurebuilder.py & co. as libraries. A (hardly comprehensive) search of libraries distributed with deps.js files yielded nothing; in fact, the only mentions of deps.js that I can find on github are in .gitignore s and HTML pages of app repos, and in files related to build processes that are reading deps.js shipped with GClosure and its third-party-library. So, I think we can safely not think about deps.js again, except for the special case of the upstream GClosure libs, which is already taken care of. One piece of "prior art" of which I was not fully cognizant until yesterday (!!) is this: https://github.com/emezeske/lein-cljsbuild/issues/143 Essentially, cljsbuild implicitly sets up e.g. /closure-js/libs as a classpath :libs root, same for /closure-js/externs. I've not tested this myself yet, but others apparently have had success. Seems like a very reasonable approach, absent something built-in to the compiler. Externs are certainly out of scope here, but some discussion of what can be done to similarly ease their use is warranted; perhaps adopting a variant of cljsbuild's approach…
Chas Emerick made changes -
Field Original Value New Value
Assignee Chas Emerick [ cemerick ]
Hide
Chas Emerick added a comment -

I've started working on this in earnest. It's turning out to be more complicated and delicate than I expected. I see no reason why it won't be resolved successfully as discussed previously, but I thought I'd show my face here so people know that progress is being made.

Show
Chas Emerick added a comment - I've started working on this in earnest. It's turning out to be more complicated and delicate than I expected. I see no reason why it won't be resolved successfully as discussed previously, but I thought I'd show my face here so people know that progress is being made.
Chas Emerick made changes -
Status Open [ 1 ] In Progress [ 3 ]
Hide
Thomas Heller added a comment -

I implemented a different approach to this problem in https://github.com/thheller/shadow-build. It has been working well for me so let me explain what I do.

1) The user specifies all source paths
https://github.com/thheller/shadow-build/blob/master/dev/build.clj#L53-55

Each source path is scanned for .cljs and .js files. JS files are just scanned for goog.provide/require statements via regexp, CLJS files assume the NS form is first and is scanned as such to extract the require/provides. No further analysis is done.

2) Out of all this information a dependency graph is built (plus a lookup index, eg. where can I find goog.string).

3) The user specifies his "main" namespaces. A main is simply a namespace that either exports functions for outside use or runs code on its own behalf. Those main namespaces are sorted in topological order so we have the entire graph needed for the compilation. If a namespace is not found required but not provided I do not attempt to build anything but throw an exception.

4) All files are "compiled" in reverse order. Since only CLJS requires compilation, JS files are skipped. CLJS compilation is also a lot "safer" than cljs.closure since dependencies are "guaranteed" to exist (eg. can't compiled a file we don't have all requires for, which may or may not be ideal).

5) Optionally everything may be grouped into modules and either flushed unoptimized and fed to the closure optimizer and then output to disk.

Although I must admit I never used the ```:libs``` directive I'm pretty sure its not needed by using this approach, just put it into a "source path" and let the regexp scan it. As long as it has goog.provide/require its good to go. You can even make it a "main" by just specifying its goog.provide, not required to make a CLJS main.

I've been pretty busy lately and shadow-build is missing some cleanup I want to do but its working very well for my project (> 10k CLJS LOC).

Maybe some of the work I did could help here.

Show
Thomas Heller added a comment - I implemented a different approach to this problem in https://github.com/thheller/shadow-build. It has been working well for me so let me explain what I do. 1) The user specifies all source paths https://github.com/thheller/shadow-build/blob/master/dev/build.clj#L53-55 Each source path is scanned for .cljs and .js files. JS files are just scanned for goog.provide/require statements via regexp, CLJS files assume the NS form is first and is scanned as such to extract the require/provides. No further analysis is done. 2) Out of all this information a dependency graph is built (plus a lookup index, eg. where can I find goog.string). 3) The user specifies his "main" namespaces. A main is simply a namespace that either exports functions for outside use or runs code on its own behalf. Those main namespaces are sorted in topological order so we have the entire graph needed for the compilation. If a namespace is not found required but not provided I do not attempt to build anything but throw an exception. 4) All files are "compiled" in reverse order. Since only CLJS requires compilation, JS files are skipped. CLJS compilation is also a lot "safer" than cljs.closure since dependencies are "guaranteed" to exist (eg. can't compiled a file we don't have all requires for, which may or may not be ideal). 5) Optionally everything may be grouped into modules and either flushed unoptimized and fed to the closure optimizer and then output to disk. Although I must admit I never used the ```:libs``` directive I'm pretty sure its not needed by using this approach, just put it into a "source path" and let the regexp scan it. As long as it has goog.provide/require its good to go. You can even make it a "main" by just specifying its goog.provide, not required to make a CLJS main. I've been pretty busy lately and shadow-build is missing some cleanup I want to do but its working very well for my project (> 10k CLJS LOC). Maybe some of the work I did could help here.
Hide
Chas Emerick added a comment -

Thanks, Thomas. I haven't looked at shadow-build much before, but I've stuck a pin in it for further review. The biggest complication of this change is not the primary objective, but my wanting to accomplish it with minimal impact on cljs.closure. There are many things that I would like to change there, but absent an effective set of tests, any simplifying refactor is fairly risky. The first step to accomplishing that would probably be to invert the existing dataflow to produce a (testable) build plan instead of actually building anything. But again, work for another day.

Anyway, I think I have things pretty well settled out locally, though there's some policy decisions to be made. (Putting those in a separate comment.) My biggest hangup was stepping on some helper functions that were being used by the externs stuff in ways I wasn't aware.

Show
Chas Emerick added a comment - Thanks, Thomas. I haven't looked at shadow-build much before, but I've stuck a pin in it for further review. The biggest complication of this change is not the primary objective, but my wanting to accomplish it with minimal impact on cljs.closure. There are many things that I would like to change there, but absent an effective set of tests, any simplifying refactor is fairly risky. The first step to accomplishing that would probably be to invert the existing dataflow to produce a (testable) build plan instead of actually building anything. But again, work for another day. Anyway, I think I have things pretty well settled out locally, though there's some policy decisions to be made. (Putting those in a separate comment.) My biggest hangup was stepping on some helper functions that were being used by the externs stuff in ways I wasn't aware.
Hide
Chas Emerick added a comment -

Progress so far on this is available @ https://github.com/cemerick/clojurescript/tree/CLJS-656

Changes on that branch:

  • goog-style JavaScript libraries are no longer resolved via classpath scanning (though it is still used for loading externs and such)
  • I've replaced the system-property-based classpath discovery with one that looks at the actual classloaders in use; this should avoid all sorts of confusing/incorrect behaviour when the compiler is being used in environments where the effective classpath is dynamic.
  • JavaScript dependencies are resolved in one of two ways:
  • resolution through the goog deps.js, the consumption of which remains untouched (though it looks like the classpath-scanning mechanism yields effectively equivalent results as reading deps.js, so we could probably eliminate goog-dependencies entirely in the future…)
  • direct lookup on the classpath based on the "namespace" being :require}}d; e.g. {{(:require foo.bar) will look for a foo/bar.js resource on the classpath, which must contain a corresponding goog.provide("foo.bar") declaration

This all works nicely with the couple of libraries I have GClosure-ified (pprng is the only public instance of that at the moment).

As noted in a comment in the WIP change, a problem with the direct-lookup policy is that goog-style JS dependencies that aren't part of the "official" GClosure library can provide one and only one name, the one corresponding to their position on the classpath. Not a problem from my perspective, but it does cut off the practice of libraries providing type ctors or pools of constants directly, e.g. goog.string.Unicode. This is really just a policy question:

Q1: Should ClojureScript support requiring JavaScript namespaces that don't correspond with the naming/location of their source files?

IMO, the answer to that should be 'yes'; saying 'no' means that we either need to prevent such requires w.r.t. official GClosure library, or that that collection of libraries is effectively a special case from the user's perspective. 'Yes' dumps us back to doing classpath scanning to discover goog-style JS dependencies; not a problem IMO (it would make the solution to this issue effectively equivalent to CLJS-526), but something that David said in the IRC conversation preceding this issue's creation should be eliminated. I think that secondary objective preceded the understanding that GClosure lib doesn't use Java-style file layout itself.

Show
Chas Emerick added a comment - Progress so far on this is available @ https://github.com/cemerick/clojurescript/tree/CLJS-656 Changes on that branch:
  • goog-style JavaScript libraries are no longer resolved via classpath scanning (though it is still used for loading externs and such)
  • I've replaced the system-property-based classpath discovery with one that looks at the actual classloaders in use; this should avoid all sorts of confusing/incorrect behaviour when the compiler is being used in environments where the effective classpath is dynamic.
  • JavaScript dependencies are resolved in one of two ways:
  • resolution through the goog deps.js, the consumption of which remains untouched (though it looks like the classpath-scanning mechanism yields effectively equivalent results as reading deps.js, so we could probably eliminate goog-dependencies entirely in the future…)
  • direct lookup on the classpath based on the "namespace" being :require}}d; e.g. {{(:require foo.bar) will look for a foo/bar.js resource on the classpath, which must contain a corresponding goog.provide("foo.bar") declaration
This all works nicely with the couple of libraries I have GClosure-ified (pprng is the only public instance of that at the moment). As noted in a comment in the WIP change, a problem with the direct-lookup policy is that goog-style JS dependencies that aren't part of the "official" GClosure library can provide one and only one name, the one corresponding to their position on the classpath. Not a problem from my perspective, but it does cut off the practice of libraries providing type ctors or pools of constants directly, e.g. goog.string.Unicode. This is really just a policy question: Q1: Should ClojureScript support requiring JavaScript namespaces that don't correspond with the naming/location of their source files? IMO, the answer to that should be 'yes'; saying 'no' means that we either need to prevent such requires w.r.t. official GClosure library, or that that collection of libraries is effectively a special case from the user's perspective. 'Yes' dumps us back to doing classpath scanning to discover goog-style JS dependencies; not a problem IMO (it would make the solution to this issue effectively equivalent to CLJS-526), but something that David said in the IRC conversation preceding this issue's creation should be eliminated. I think that secondary objective preceded the understanding that GClosure lib doesn't use Java-style file layout itself.
Hide
David Nolen added a comment -

Just a heads up that this will likely conflict with CLJS-615, which I would like to land first.

Show
David Nolen added a comment - Just a heads up that this will likely conflict with CLJS-615, which I would like to land first.
Hide
Chas Emerick added a comment -

That's fine, this patch will be small either way and not difficult to rebase.

Show
Chas Emerick added a comment - That's fine, this patch will be small either way and not difficult to rebase.
Hide
Thomas Heller added a comment -

IMHO the direct lookup is not very useful. Discovering/scanning all JS files it not that expensive and its quite common in Closure to have many provides per file. Not a big fan of "special cases" either, so I think every JS file should be treated equal regardless of Closure-Library or not.

The real problem is that cljs.closure/build is not a very flexible interface and calling it repeatedly (lein cljsbuild auto) causes problems because all of the sudden cljs.closure has to take care of caching, reloading, etc. IMHO something that should probably be left to tooling and not pollute cljs.analyzer/compiler.

Show
Thomas Heller added a comment - IMHO the direct lookup is not very useful. Discovering/scanning all JS files it not that expensive and its quite common in Closure to have many provides per file. Not a big fan of "special cases" either, so I think every JS file should be treated equal regardless of Closure-Library or not. The real problem is that cljs.closure/build is not a very flexible interface and calling it repeatedly (lein cljsbuild auto) causes problems because all of the sudden cljs.closure has to take care of caching, reloading, etc. IMHO something that should probably be left to tooling and not pollute cljs.analyzer/compiler.
Hide
Chas Emerick added a comment - - edited

I agree; I don't see any way (or really, good reason) to avoid the classpath scanning without diverging in important ways from user-dev expectations re: how the goog provide/require system works. Now that CLJS-615 has landed, I'll tweak up my changes to reflect that, and apply cleanly to master in a proper patch.

Show
Chas Emerick added a comment - - edited I agree; I don't see any way (or really, good reason) to avoid the classpath scanning without diverging in important ways from user-dev expectations re: how the goog provide/require system works. Now that CLJS-615 has landed, I'll tweak up my changes to reflect that, and apply cleanly to master in a proper patch.
Hide
Chas Emerick added a comment -

Patch attached. It does all that I described above, but does not constrain the location of goog-style JavaScript dependencies within the classpath; the goog.provide declarations in such libraries is the sole constraint on their visibility from user-dev code.

FWIW, the classpath scanning takes ~600ms on my machine the first time, and ~7ms after that (resource lookup from the classpath is pretty heavily cached in the JVM), so I don't think we need to worry about it being a problem in compiler performance.

Show
Chas Emerick added a comment - Patch attached. It does all that I described above, but does not constrain the location of goog-style JavaScript dependencies within the classpath; the goog.provide declarations in such libraries is the sole constraint on their visibility from user-dev code. FWIW, the classpath scanning takes ~600ms on my machine the first time, and ~7ms after that (resource lookup from the classpath is pretty heavily cached in the JVM), so I don't think we need to worry about it being a problem in compiler performance.
Chas Emerick made changes -
Attachment CLJS-656.diff [ 12823 ]
Chas Emerick made changes -
Patch Code [ 10001 ]
Hide
Thomas Heller added a comment -

FWIW I use reducers to bring that 600ms to around 200ms on my machine, should be possible to make that even faster since right now I (and you I presume) open JAR files ONCE to extract all names and then ONCE per URL. Would probably be faster to open/scan once in total.

But like you said, since its done on init it probably is no problem.

Show
Thomas Heller added a comment - FWIW I use reducers to bring that 600ms to around 200ms on my machine, should be possible to make that even faster since right now I (and you I presume) open JAR files ONCE to extract all names and then ONCE per URL. Would probably be faster to open/scan once in total. But like you said, since its done on init it probably is no problem.
Hide
Chas Emerick added a comment -

No, we touch the jars each time. You're right that it could be sped up for the first compile very obviously (I just try to keep my patches as slim as possible, so I generally don't touch what I don't need to). I'll see about it in addressing CLJS-770.

Show
Chas Emerick added a comment - No, we touch the jars each time. You're right that it could be sped up for the first compile very obviously (I just try to keep my patches as slim as possible, so I generally don't touch what I don't need to). I'll see about it in addressing CLJS-770.
Chas Emerick made changes -
Attachment CLJS-656.diff [ 12823 ]
Hide
Chas Emerick added a comment -

New patch attached that applies cleanly, now that CLJS-770 / cljs.js-deps is merged.

Show
Chas Emerick added a comment - New patch attached that applies cleanly, now that CLJS-770 / cljs.js-deps is merged.
Chas Emerick made changes -
Attachment CLJS-656.diff [ 12845 ]
Chas Emerick made changes -
Attachment CLJS-656.diff [ 12845 ]
Chas Emerick made changes -
Attachment CLJS-656.diff [ 12846 ]
Hide
David Nolen added a comment -

After some thought I still think we should special case Google Closure Library. Everything else must use Java classpath conventions. There's just not enough incentive to support Google Closure idiosyncracies when we have simpler and more sensible conventions in ClojureScript itself. The universe of libraries that follow Google Closure idiosyncrasies is not exactly large, and the ClojureScript ecosystem is evolving at a quick clip. Integration of large non-Closure compatible libraries like React is already straightforward due to :preamble. Let's wrap this one up and move on.

Show
David Nolen added a comment - After some thought I still think we should special case Google Closure Library. Everything else must use Java classpath conventions. There's just not enough incentive to support Google Closure idiosyncracies when we have simpler and more sensible conventions in ClojureScript itself. The universe of libraries that follow Google Closure idiosyncrasies is not exactly large, and the ClojureScript ecosystem is evolving at a quick clip. Integration of large non-Closure compatible libraries like React is already straightforward due to :preamble. Let's wrap this one up and move on.
Hide
Chas Emerick added a comment -

Update forthcoming shortly.

Show
Chas Emerick added a comment - Update forthcoming shortly.
Chas Emerick made changes -
Attachment CLJS-656.diff [ 12846 ]
Hide
Chas Emerick added a comment -

New patch attached. This one requires that any non-extern libraries loaded via the classpath contain a goog.provide declaration that corresponds to its relative location on the classpath and to the name used for that library in ClojureScript (e.g. in ns forms).

:libs continues to exist, though is now limited to pulling in libraries from the filesystem, and enforces no constraints on goog.provide declarations or relative position on the filesystem.

Show
Chas Emerick added a comment - New patch attached. This one requires that any non-extern libraries loaded via the classpath contain a goog.provide declaration that corresponds to its relative location on the classpath and to the name used for that library in ClojureScript (e.g. in ns forms). :libs continues to exist, though is now limited to pulling in libraries from the filesystem, and enforces no constraints on goog.provide declarations or relative position on the filesystem.
Chas Emerick made changes -
Attachment CLJS-656.diff [ 12882 ]

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated: