<< Back to previous view

[CLJS-656] Look for goog-style JavaScript dependencies on the classpath automatically Created: 01/Nov/13  Updated: 17/Mar/14

Status: In Progress
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Chas Emerick Assignee: Chas Emerick
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File CLJS-656.diff    
Patch: Code


Right now, only actual goog.* JavaScript dependencies (as shipped w/ GClosure or in its "third-party" library and specified in its deps.js file) are available via (:require ...), etc.; other dependencies that contain goog.provide "declarations" can only ever be loaded via :libs and the classpath scanning (or direct filesystem reference) it provides.

This should change so that any referenced Closure-safe JavaScript dependencies may be resolved from their expected location on the classpath. e.g. (:require foo.bar) would resolve to a foo/bar.js resource on the classpath, which presumably would be required to contain a goog.provide("foo.bar") declaration.

With this change, :libs would be downgraded to support only direct references to filesystem files. (Q: should the option be renamed to reflect this, and to provide an easy configuration validation check for people that are using :libs as it exists now?)

This would be a breaking change to downstream tooling.

(The above is a summary of #clojure discussion with David Nolen.)

Comment by Chas Emerick [ 01/Nov/13 3:06 PM ]

This would eliminate the need for CLJS-526

Comment by Michael Alyn Miller [ 01/Nov/13 11:27 PM ]

I would really like to see this change. I am building a ClojureScript wrapper around a JavaScript library (which I GClosure-ified) and the only real way to do this at the moment is with your {{:libs [""]}} trick. That works, but it also pollutes the project.clj of every consumer of the library.

Comment by David Nolen [ 08/Nov/13 12:26 PM ]

Related information, it appears I was wrong about GClosure following Java classpath convention for layout. In order to resolve the location of Google Closure Library files, closure.clj parses deps.js in the Google Closure Library JAR.

Still I think we should move forward with the approach outlined here.

Comment by Chas Emerick [ 08/Nov/13 2:25 PM ]

My current understanding is that deps.js is an optional delivery mechanism, and perhaps is relatively unused outside of GClosure itself. I still need to dig into the GClosure packaging guidelines, probably won't happen pre-Conj; until then, I can't add much except that I also much prefer the current proposal than attempt to find, consume, and produce these deps.js files. It's not as if people are banging down the door for GClosure-packaged library compat, etc.

Comment by David Nolen [ 08/Nov/13 2:37 PM ]

Pretty much summarizes my thoughts as well.

Comment by Brenton Ashworth [ 11/Nov/13 9:35 AM ]

In my opinion this is a good change. What follows is some context that I have on the purpose of the :libs option. I hope this information is both accurate and helpful.

At the birth of ClojureScript it was not clear how much we would depend on the Java classpath. I think it has become clear that the compiler is typically used as a library which has access to the project's classpath. We should embrace this and make it more friendly to use in this environment.

In the first version of ClojureScript, the compiler could compile a file without resolving dependencies. Problems with deps where either found at run time or when files are handed to the GClosure compiler for compilation. This may no longer be true. I am ashamed to say that I haven't looked at the source in a while. It is important to remember that GClosure deals only with JavaScript sources. Once all ClojureScript sources have been compiled to JavaScript, a set of files is handed to the GClosure compiler for optimization. The purpose of the `:libs` option is to allow any other JavaScript sources which are GClosure compatible, and upon which the sources depend, to be found and handed to the GClosure compiler.

In the beginning, we added GClosure support (dealing with JavaScript files) on top of the CLJS compiler. This change would allow the CLJS compiler to contribute useful information to the GClosure compilation process and remove the need to specify known information elsewhere.

Comment by Chas Emerick [ 12/Nov/13 5:11 AM ]

Thanks for the background, Brenton. Assuming you'll be at the Conj, I may have a couple other more minor questions for you.

I think the results from the (still ongoing) community survey will further support the use of the classpath as a reasonable option easily available from the tools people are actually using.

I've dug around the GClosure library docs w.r.t. their build tools, and they seem uniformly directed at people building applications; outside of supporting application use cases, I don't see any discussion of distributing the artifacts of e.g. closurebuilder.py & co. as libraries. A (hardly comprehensive) search of libraries distributed with deps.js files yielded nothing; in fact, the only mentions of deps.js that I can find on github are in .gitignore s and HTML pages of app repos, and in files related to build processes that are reading deps.js shipped with GClosure and its third-party-library.

So, I think we can safely not think about deps.js again, except for the special case of the upstream GClosure libs, which is already taken care of.

One piece of "prior art" of which I was not fully cognizant until yesterday (!!) is this:


Essentially, cljsbuild implicitly sets up e.g. /closure-js/libs as a classpath :libs root, same for /closure-js/externs. I've not tested this myself yet, but others apparently have had success. Seems like a very reasonable approach, absent something built-in to the compiler. Externs are certainly out of scope here, but some discussion of what can be done to similarly ease their use is warranted; perhaps adopting a variant of cljsbuild's approach…

Comment by Chas Emerick [ 15/Feb/14 8:31 AM ]

I've started working on this in earnest. It's turning out to be more complicated and delicate than I expected. I see no reason why it won't be resolved successfully as discussed previously, but I thought I'd show my face here so people know that progress is being made.

Comment by Thomas Heller [ 16/Feb/14 3:14 AM ]

I implemented a different approach to this problem in https://github.com/thheller/shadow-build. It has been working well for me so let me explain what I do.

1) The user specifies all source paths

Each source path is scanned for .cljs and .js files. JS files are just scanned for goog.provide/require statements via regexp, CLJS files assume the NS form is first and is scanned as such to extract the require/provides. No further analysis is done.

2) Out of all this information a dependency graph is built (plus a lookup index, eg. where can I find goog.string).

3) The user specifies his "main" namespaces. A main is simply a namespace that either exports functions for outside use or runs code on its own behalf. Those main namespaces are sorted in topological order so we have the entire graph needed for the compilation. If a namespace is not found required but not provided I do not attempt to build anything but throw an exception.

4) All files are "compiled" in reverse order. Since only CLJS requires compilation, JS files are skipped. CLJS compilation is also a lot "safer" than cljs.closure since dependencies are "guaranteed" to exist (eg. can't compiled a file we don't have all requires for, which may or may not be ideal).

5) Optionally everything may be grouped into modules and either flushed unoptimized and fed to the closure optimizer and then output to disk.

Although I must admit I never used the ```:libs``` directive I'm pretty sure its not needed by using this approach, just put it into a "source path" and let the regexp scan it. As long as it has goog.provide/require its good to go. You can even make it a "main" by just specifying its goog.provide, not required to make a CLJS main.

I've been pretty busy lately and shadow-build is missing some cleanup I want to do but its working very well for my project (> 10k CLJS LOC).

Maybe some of the work I did could help here.

Comment by Chas Emerick [ 17/Feb/14 9:51 AM ]

Thanks, Thomas. I haven't looked at shadow-build much before, but I've stuck a pin in it for further review. The biggest complication of this change is not the primary objective, but my wanting to accomplish it with minimal impact on cljs.closure. There are many things that I would like to change there, but absent an effective set of tests, any simplifying refactor is fairly risky. The first step to accomplishing that would probably be to invert the existing dataflow to produce a (testable) build plan instead of actually building anything. But again, work for another day.

Anyway, I think I have things pretty well settled out locally, though there's some policy decisions to be made. (Putting those in a separate comment.) My biggest hangup was stepping on some helper functions that were being used by the externs stuff in ways I wasn't aware.

Comment by Chas Emerick [ 17/Feb/14 11:34 AM ]

Progress so far on this is available @ https://github.com/cemerick/clojurescript/tree/CLJS-656

Changes on that branch:

  • goog-style JavaScript libraries are no longer resolved via classpath scanning (though it is still used for loading externs and such)
  • I've replaced the system-property-based classpath discovery with one that looks at the actual classloaders in use; this should avoid all sorts of confusing/incorrect behaviour when the compiler is being used in environments where the effective classpath is dynamic.
  • JavaScript dependencies are resolved in one of two ways:
  • resolution through the goog deps.js, the consumption of which remains untouched (though it looks like the classpath-scanning mechanism yields effectively equivalent results as reading deps.js, so we could probably eliminate goog-dependencies entirely in the future…)
  • direct lookup on the classpath based on the "namespace" being :require}}d; e.g. {{(:require foo.bar) will look for a foo/bar.js resource on the classpath, which must contain a corresponding goog.provide("foo.bar") declaration

This all works nicely with the couple of libraries I have GClosure-ified (pprng is the only public instance of that at the moment).

As noted in a comment in the WIP change, a problem with the direct-lookup policy is that goog-style JS dependencies that aren't part of the "official" GClosure library can provide one and only one name, the one corresponding to their position on the classpath. Not a problem from my perspective, but it does cut off the practice of libraries providing type ctors or pools of constants directly, e.g. goog.string.Unicode. This is really just a policy question:

Q1: Should ClojureScript support requiring JavaScript namespaces that don't correspond with the naming/location of their source files?

IMO, the answer to that should be 'yes'; saying 'no' means that we either need to prevent such requires w.r.t. official GClosure library, or that that collection of libraries is effectively a special case from the user's perspective. 'Yes' dumps us back to doing classpath scanning to discover goog-style JS dependencies; not a problem IMO (it would make the solution to this issue effectively equivalent to CLJS-526), but something that David said in the IRC conversation preceding this issue's creation should be eliminated. I think that secondary objective preceded the understanding that GClosure lib doesn't use Java-style file layout itself.

Comment by David Nolen [ 17/Feb/14 11:48 AM ]

Just a heads up that this will likely conflict with CLJS-615, which I would like to land first.

Comment by Chas Emerick [ 17/Feb/14 11:52 AM ]

That's fine, this patch will be small either way and not difficult to rebase.

Comment by Thomas Heller [ 18/Feb/14 3:58 AM ]

IMHO the direct lookup is not very useful. Discovering/scanning all JS files it not that expensive and its quite common in Closure to have many provides per file. Not a big fan of "special cases" either, so I think every JS file should be treated equal regardless of Closure-Library or not.

The real problem is that cljs.closure/build is not a very flexible interface and calling it repeatedly (lein cljsbuild auto) causes problems because all of the sudden cljs.closure has to take care of caching, reloading, etc. IMHO something that should probably be left to tooling and not pollute cljs.analyzer/compiler.

Comment by Chas Emerick [ 18/Feb/14 6:42 AM ]

I agree; I don't see any way (or really, good reason) to avoid the classpath scanning without diverging in important ways from user-dev expectations re: how the goog provide/require system works. Now that CLJS-615 has landed, I'll tweak up my changes to reflect that, and apply cleanly to master in a proper patch.

Comment by Chas Emerick [ 18/Feb/14 9:49 AM ]

Patch attached. It does all that I described above, but does not constrain the location of goog-style JavaScript dependencies within the classpath; the goog.provide declarations in such libraries is the sole constraint on their visibility from user-dev code.

FWIW, the classpath scanning takes ~600ms on my machine the first time, and ~7ms after that (resource lookup from the classpath is pretty heavily cached in the JVM), so I don't think we need to worry about it being a problem in compiler performance.

Comment by Thomas Heller [ 18/Feb/14 10:30 AM ]

FWIW I use reducers to bring that 600ms to around 200ms on my machine, should be possible to make that even faster since right now I (and you I presume) open JAR files ONCE to extract all names and then ONCE per URL. Would probably be faster to open/scan once in total.

But like you said, since its done on init it probably is no problem.

Comment by Chas Emerick [ 19/Feb/14 3:48 AM ]

No, we touch the jars each time. You're right that it could be sped up for the first compile very obviously (I just try to keep my patches as slim as possible, so I generally don't touch what I don't need to). I'll see about it in addressing CLJS-770.

Comment by Chas Emerick [ 24/Feb/14 4:30 AM ]

New patch attached that applies cleanly, now that CLJS-770 / cljs.js-deps is merged.

Comment by David Nolen [ 07/Mar/14 2:39 PM ]

After some thought I still think we should special case Google Closure Library. Everything else must use Java classpath conventions. There's just not enough incentive to support Google Closure idiosyncracies when we have simpler and more sensible conventions in ClojureScript itself. The universe of libraries that follow Google Closure idiosyncrasies is not exactly large, and the ClojureScript ecosystem is evolving at a quick clip. Integration of large non-Closure compatible libraries like React is already straightforward due to :preamble. Let's wrap this one up and move on.

Comment by Chas Emerick [ 12/Mar/14 6:25 AM ]

Update forthcoming shortly.

Comment by Chas Emerick [ 17/Mar/14 11:32 AM ]

New patch attached. This one requires that any non-extern libraries loaded via the classpath contain a goog.provide declaration that corresponds to its relative location on the classpath and to the name used for that library in ClojureScript (e.g. in ns forms).

:libs continues to exist, though is now limited to pulling in libraries from the filesystem, and enforces no constraints on goog.provide declarations or relative position on the filesystem.

Generated at Fri Apr 18 21:18:22 CDT 2014 using JIRA 4.4#649-r158309.