Problem

In a browser environment it is sometimes desirable to split the output of a Google Closure optimized build into multiple files (modules) instead of one. This is purely an optimization and serves no other purpose. On traditional web pages it is very common that not all code is required on every page, so it makes sense to split the code into logical groups and load only what each page requires. Doing this manually after the fact would be hard (practically impossible) due to the aggressive renaming done by the Closure optimizer, so the split needs to be set up before the Closure Compiler is invoked.

The result of a compilation with only one defined module is identical to compiling without modules, so compiling with modules can be the default.

Example

TODO: a good example is still needed. They are almost always app-specific.

Definitions

  • Input - Any piece of JavaScript, usually in the form of a String, read either from a File or supplied directly as a String. Each Input has a list of "provides" and a list of "requires", each of which refers to a namespace within the project. These come either from goog.provide/goog.require calls in JavaScript or from those generated by ClojureScript via the (ns ...) form.
  • Module - Given the set of all available Inputs, a Module represents a subset of those. A Module can depend on other Modules, but each Input may appear only once in the entire compiled Graph.
  • Main (?) - A namespace that represents an entry point to the application. These either contain public (exported) API functions or code that is executed when loaded.

Strategy

Let the user define each Module and its Main namespaces. It would be desirable to derive the resulting dependency graph from this information; otherwise the user needs to supply the dependency info as well.
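One way to read this strategy, as a minimal sketch rather than the shadow-build code: assume the user supplies each Module's Main namespaces plus the Modules it depends on, and let each Module claim the transitive requires of its Mains that have not been claimed already. The function name and the shapes of `inputs` and `modules` are hypothetical, and the modules are assumed to be listed in dependency order.

```javascript
// Sketch only: data shapes and names are hypothetical, not shadow-build's API.
// inputs:  namespace -> namespaces it requires
// modules: listed so that every module comes after the modules it depends on
function assignInputsToModules(inputs, modules) {
  const byName = new Map(modules.map((m) => [m.name, m]));
  const owner = new Map(); // namespace -> name of the module that contains it
  const result = {};

  for (const mod of modules) {
    // This module plus everything it depends on, directly or transitively.
    const visible = new Set([mod.name]);
    const pending = [...mod.dependsOn];
    while (pending.length > 0) {
      const dep = pending.pop();
      if (visible.has(dep)) continue;
      if (!byName.has(dep)) throw new Error("unknown module: " + dep);
      visible.add(dep);
      pending.push(...byName.get(dep).dependsOn);
    }

    const claimed = [];
    const visit = (ns) => {
      if (owner.has(ns)) {
        if (!visible.has(owner.get(ns))) {
          throw new Error(ns + " lives in " + owner.get(ns) +
                          ", which " + mod.name + " does not depend on");
        }
        return; // each Input appears only once in the whole graph
      }
      const requires = inputs[ns];
      if (!Array.isArray(requires)) throw new Error("missing dependency: " + ns);
      owner.set(ns, mod.name);
      requires.forEach(visit);
      claimed.push(ns); // dependencies end up before their dependents
    };
    mod.mains.forEach(visit);
    result[mod.name] = claimed;
  }
  return result;
}

// Made-up project: two pages sharing a common base module.
const exampleInputs = {
  "cljs.core": [],
  "app.util": ["cljs.core"],
  "app.home": ["app.util"],
  "app.admin": ["app.util", "cljs.core"],
};
const exampleModules = [
  { name: "common", mains: ["cljs.core", "app.util"], dependsOn: [] },
  { name: "home", mains: ["app.home"], dependsOn: ["common"] },
  { name: "admin", mains: ["app.admin"], dependsOn: ["common"] },
];
console.log(assignInputsToModules(exampleInputs, exampleModules));
```

In this sketch, code shared by two sibling modules has to be named as a Main of a module both depend on. If app.util were left out of "common", the first page module would claim it and the second would fail, which is the kind of dependency info the user otherwise has to supply by hand.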

Implementation

A sample implementation can be found at: https://github.com/thheller/shadow-build

shadow-build Implementation:

Since the cljs.closure namespace assumes that only one output file is created, the shadow-build library is basically a complete reimplementation of it with some features added. When writing it I (Thomas Heller) opted to go with a "no shared state" approach, meaning that I carry the state of the compilation around with me as an immutable map. This map is created and modified in multiple steps during a compilation. Steps usually follow this flow:

Configuration -> Discovery -> Module Definition -> Compilation -> Output

During Discovery all defined source paths are scanned for viable compilation resources. These are either ClojureScript files or JavaScript files that contain goog.provide/goog.require statements. JavaScript files without these are discarded. For ClojureScript files, the first form is read to extract the provide/require information; it is expected to be the ns definition, and the build fails otherwise.
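The JavaScript half of Discovery can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the shadow-build scanner: `scanJsSource` is a hypothetical name, and a plain regular expression is used, so it would also pick up calls that appear inside comments or strings.

```javascript
// Sketch only: extract goog.provide / goog.require namespaces from JS source.
// Returns null when the file declares neither, i.e. it would be discarded.
function scanJsSource(source) {
  const collect = (fnName) => {
    // Matches e.g. goog.provide("my.lib") or goog.require('goog.string')
    const pattern = new RegExp(
      "goog\\." + fnName + "\\(\\s*['\"]([^'\"]+)['\"]\\s*\\)", "g");
    return Array.from(source.matchAll(pattern), (m) => m[1]);
  };
  const provides = collect("provide");
  const requires = collect("require");
  if (provides.length === 0 && requires.length === 0) return null;
  return { provides, requires };
}

const exampleSource = [
  'goog.provide("my.lib");',
  "goog.require('goog.string');",
  'goog.require("goog.array");',
  "my.lib.hello = function() {};",
].join("\n");
console.log(scanJsSource(exampleSource));
console.log(scanJsSource("var x = 1;"));
```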

Then the modules are created using the information provided by the user. Each Module is then compiled using a topologically sorted list of the required namespaces (derived from the Main definitions). Every required ClojureScript file is compiled only after all of its dependencies have been compiled. A missing dependency fails the build. A ClojureScript file that is never required anywhere is not compiled. JavaScript files do not need compilation and are skipped; they are likewise dropped if never required. Then the Closure JSModule instances are created and the Compiler.compileModules function is invoked. On success the generated JavaScript (and source maps) is saved in the compiler state.
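The ordering rules described here (dependencies first, missing dependencies fail, unrequired files are dropped) amount to a depth-first topological sort starting from the Main namespaces. A minimal sketch, with a hypothetical function name and a made-up input map, could look like this:

```javascript
// Sketch only: inputs maps a namespace to the namespaces it requires.
// Returns the namespaces reachable from the mains, dependencies first.
function topoSortNamespaces(inputs, mains) {
  const sorted = [];
  const state = new Map(); // namespace -> "visiting" | "done"

  const visit = (ns, requiredBy) => {
    if (state.get(ns) === "done") return;
    if (state.get(ns) === "visiting") {
      throw new Error("circular dependency involving " + ns);
    }
    const requires = inputs[ns];
    if (!Array.isArray(requires)) {
      // A missing dependency fails the build.
      throw new Error("missing dependency: " + ns + " (required by " + requiredBy + ")");
    }
    state.set(ns, "visiting");
    for (const req of requires) visit(req, ns);
    state.set(ns, "done");
    sorted.push(ns);
  };

  for (const main of mains) visit(main, "main");
  return sorted;
}

const sortInputs = {
  "cljs.core": [],
  "app.util": ["cljs.core"],
  "app.main": ["app.util", "cljs.core"],
  "app.unused": ["cljs.core"], // never required, so it never shows up
};
console.log(topoSortNamespaces(sortInputs, ["app.main"]));
```

Because the walk starts at the Mains, anything that is never required is simply never visited, which matches the "not compiled" and "dropped" behaviour without a separate pruning step.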

After the compilation the output generally needs to be written to files. I opted to go with a "strict" output layout: the user defines an output directory and a path under which the files can be reached once in a server/client situation. The output directory then contains all generated modules and a manifest.json file. Additionally, when compiled with source maps, the output directory will contain a src directory with all ClojureScript/JavaScript source files. Intermediate output (JS generated from CLJS files) is not written to disk unless requested via shadow.cljs.build/flush-to-disk.

The generated manifest.json can be used by additional tooling but is not needed at runtime. In production I timestamp each module with a custom compilation step (https://gist.github.com/thheller/6c3ad4c880b035921ce6), and the manifest.json contains the mapping of module name -> filename. A server-side component then watches the manifest and reloads it; the information in the manifest is then used to emit the correct <script src="..."> tags in the HTML. The code just refers to a module by name; the actual filename is resolved automatically.
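The lookup a server-side component performs can be sketched as below. The manifest shape is an assumption for illustration (a flat JSON object from module name to filename); the actual manifest.json written by shadow-build may be structured differently, and the filenames and the `/js` path are made up.

```javascript
// Sketch only: resolve a module name to a <script> tag via the manifest.
// manifest:   assumed to be a flat object, module name -> filename
// publicPath: the path under which the output directory is served
function scriptTagFor(manifest, publicPath, moduleName) {
  const fileName = manifest[moduleName];
  if (typeof fileName !== "string") {
    throw new Error("unknown module: " + moduleName);
  }
  return '<script src="' + publicPath + "/" + fileName + '"></script>';
}

// Hypothetical manifest content with timestamped filenames.
const exampleManifest = JSON.parse(
  '{"common": "common-1400000000.js", "page-a": "page-a-1400000000.js"}');

console.log(scriptTagFor(exampleManifest, "/js", "common"));
console.log(scriptTagFor(exampleManifest, "/js", "page-a"));
```

Templates only ever mention "common" or "page-a"; when a new build changes the timestamps, reloading the manifest is enough and no HTML has to be touched.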

During development it is desirable to have very fast builds via {:optimizations :none} but still have the exact same "API" for including files via HTML. When using an unoptimized build (flush-unoptimized), the library creates "fake" modules. The common base module contains goog/base.js and goog/deps.js; each module's JavaScript file then just contains a goog.require statement for every namespace provided by that module. This way the user just includes <script src="module-a.js"> in the HTML and the correct thing is done in both production and development. No manual goog.require statements are required in the HTML.
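The content of such a "fake" module file is easy to picture. The following is a rough sketch with a hypothetical helper name, not the flush-unoptimized implementation; it ignores the base module (goog/base.js, goog/deps.js) and any name munging that ClojureScript applies to namespaces.

```javascript
// Sketch only: build the source of a development-time "fake" module,
// one goog.require line per namespace the module provides.
function fakeModuleSource(providedNamespaces) {
  return providedNamespaces
    .map((ns) => 'goog.require("' + ns + '");')
    .join("\n") + "\n";
}

// Made-up namespaces for a module that would be served as module-a.js.
console.log(fakeModuleSource(["app.util", "app.main"]));
```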
