
Problem

Macros are compiler extensions (really, mini compilers themselves) that translate from a Clojure-like syntax (s-expressions) into Clojure syntax (s-expressions). Most macros contain aspects of these stages:

  1. Parsing and validating the input syntax structure
  2. Transforming the structure
  3. Outputting new structure (often by quoting or syntax quoting new forms)

The first step is effectively a hand-written parser. This leads to the following problems with macro definitions:

  • Validation is usually incomplete, checking only for the most common syntactic errors
  • Errors are reported from inside the macro implementation rather than at the offending form in the user's original syntax
  • Syntax documentation is hand-written and can thus be at odds with the parser

Due to the above problems, macros are often hard to write and hard to maintain, especially when more attention is paid to validation and error messages.
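
As an illustration of the three stages and of incomplete validation, here is a minimal sketch. The macro with-value is hypothetical and exists only for this example; it is not part of Clojure core.

```clojure
;; Hypothetical macro, for illustration only:
;;   (with-value [sym expr] body...)
(defmacro with-value
  [binding & body]
  ;; Stage 1: parse and validate the input (hand-written, and incomplete)
  (when-not (vector? binding)
    (throw (IllegalArgumentException.
            "with-value requires a vector for its binding")))
  (when-not (= 2 (count binding))
    (throw (IllegalArgumentException.
            "with-value requires exactly 2 forms in its binding vector")))
  ;; Stage 2: transform, pulling the pieces of the input apart
  (let [[sym expr] binding]
    ;; Stage 3: output new structure with syntax quote
    `(let [~sym ~expr] ~@body)))

(with-value [x 1] (+ x 2))
```

The validation never checks that sym is a symbol, so a call such as (with-value ["x" 1] x) expands without complaint and any error is left to the let form in the expansion, not reported against with-value.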

Proposal

Instead, define a standard grammar to describe the input syntax for a macro and a generic parser facility that can validate the input according to the grammar. In the case of an error, the parser should automatically produce an error message describing what was provided as input, what the grammar expected, and what was found instead.

The facility provided for macro writers should be similar in spirit to the existing destructure function - that is, given an input form and a grammar: produce a binding set. In the case of a failure, this function should provide an error message, data describing where the parse failed, or both.
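
A minimal sketch of that contract, assuming a deliberately tiny grammar format: a vector of [binding-key description predicate] triples matched positionally. The names parse-seq and def-grammar are hypothetical and only show the shape of the result (a binding set on success, error data on failure).

```clojure
(defn parse-seq
  "Hypothetical sketch. Matches `form` positionally against `grammar`, a
  vector of [binding-key description predicate] triples. Returns
  {:bindings {...}} on success or {:error {...}} on failure."
  [grammar form]
  (loop [rules grammar, input (seq form), pos 0, bindings {}]
    (cond
      ;; grammar and input are both exhausted: success
      (and (empty? rules) (nil? input))
      {:bindings bindings}

      ;; grammar is exhausted but input remains
      (empty? rules)
      {:error {:position pos :expected :end-of-input :found (first input)}}

      :else
      (let [[k desc pred] (first rules)]
        (cond
          ;; input ran out before the grammar did
          (nil? input)
          {:error {:position pos :expected desc :found :end-of-input}}

          (pred (first input))
          (recur (rest rules) (next input) (inc pos)
                 (assoc bindings k (first input)))

          :else
          {:error {:position pos :expected desc :found (first input)}})))))

;; An assumed grammar for a def-like form: a symbol followed by any form.
(def def-grammar
  [[:name "a symbol" symbol?]
   [:init "an init expression" (constantly true)]])

(parse-seq def-grammar '(x 42))
(parse-seq def-grammar '("x" 42))
```

The first call yields a binding set for :name and :init; the second yields error data naming the position, what was expected, and what was found.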

What do grammars need to describe?

Grammars are conceptually similar to regexes. Like a regex, a grammar should both describe the macro structure to parse ("is this valid?") and capture values as the parsing occurs for later use.

Additionally, we wish to produce (as automatically as possible) informative and useful error messages. Some extra information may need to be provided to assist.

  • Terminals - keywords, symbols, strings (literal or regex?), characters, numbers, booleans, vectors, sets, maps, lists, sequentials
  • Composition
    • concatenation
    • alternation
    • optional (0 or 1)
    • one or more
    • zero or more
    • repeat - fixed number in any order
    • negative lookahead (NOT this branch)
  • Capture
    • mark beginning
    • mark ending
    • create new binding
    • update existing binding
  • Error message production
    • user-friendly rule names (for use in errors)
    • custom validation and errors (for when that's needed)
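
To make some of these requirements concrete, here is a rough sketch of a few of the operators (terminals, concatenation, alternation, optional, zero or more, capture, and rule names for errors) as plain functions over Clojure forms. All names (terminal, concat-of, one-of, optional, zero-or-more, capture, parse) are hypothetical, and the design (ordered choice, no backtracking) is an assumption made for brevity, not a proposal for the final facility.

```clojure
;; A parser is a function [input bindings] -> {:rest ... :bindings ...}
;; on success, or {:error {...}} on failure.

(defn terminal
  "Matches one form satisfying pred; desc is the user-friendly rule name."
  [desc pred]
  (fn [input bindings]
    (if (and (seq input) (pred (first input)))
      {:rest (rest input) :bindings bindings}
      {:error {:expected desc
               :found (if (seq input) (first input) :end-of-input)}})))

(defn concat-of
  "Concatenation: runs each parser in order, threading input and bindings."
  [& parsers]
  (fn [input bindings]
    (reduce (fn [state p]
              (let [result (p (:rest state) (:bindings state))]
                (if (:error result) (reduced result) result)))
            {:rest input :bindings bindings}
            parsers)))

(defn one-of
  "Alternation (ordered choice): the first parser that succeeds wins."
  [& parsers]
  (fn [input bindings]
    (let [results (map #(% input bindings) parsers)]
      (or (first (remove :error results))
          {:error {:expected (mapv #(get-in % [:error :expected]) results)
                   :found (get-in (first results) [:error :found])}}))))

(defn optional
  "Optional (0 or 1): on failure, succeeds without consuming input."
  [p]
  (fn [input bindings]
    (let [result (p input bindings)]
      (if (:error result)
        {:rest input :bindings bindings}
        result))))

(defn zero-or-more
  "Zero or more: applies p until it fails or stops consuming input."
  [p]
  (fn [input bindings]
    (loop [state {:rest input :bindings bindings}]
      (let [result (p (:rest state) (:bindings state))]
        (if (or (:error result)
                (= (count (:rest result)) (count (:rest state))))
          state
          (recur result))))))

(defn capture
  "Binds the forms consumed by p under key k, creating the binding or
  appending to an existing one."
  [k p]
  (fn [input bindings]
    (let [result (p input bindings)]
      (if (:error result)
        result
        (let [n (- (count input) (count (:rest result)))]
          (update result :bindings update k (fnil into []) (take n input)))))))

(defn parse
  "Runs grammar over form; leftover input is an error."
  [grammar form]
  (let [result (grammar (seq form) {})]
    (cond
      (:error result) result
      (seq (:rest result)) {:error {:expected :end-of-input
                                    :found (first (:rest result))}}
      :else (:bindings result))))

;; A simplified, single-arity defn-like grammar (an assumption for the example).
(def defn-like
  (concat-of (capture :name (terminal "a symbol name" symbol?))
             (optional (capture :doc (terminal "a docstring" string?)))
             (capture :params (terminal "a parameter vector" vector?))
             (zero-or-more (capture :body (terminal "a body form" (constantly true))))))

(parse defn-like '(f "doc" [x] (inc x) x))
(parse defn-like '(f x))
```

The first call returns a binding map with :name, :doc, :params, and :body entries; the second returns error data saying a parameter vector was expected and x was found. Repeat and negative lookahead are left out of this sketch.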

Possible parsers

Questions to consider:

  • What are the dependency constraints?
  • What are the parsing capabilities?
  • What are the capture capabilities?
  • What are the error messages like when errors occur? Are there ways to customize?
  • Are grammars composable? Nameable?
  • Performance? (not a key driver, but worst case may be important)

Parsers:

| | error-test | seqex | Instaparse |
|---|---|---|---|
| Deps | no external; c.string/replace; c.set/union | no external; c.string/join; c.set/union, difference, intersection, subset? | no external; c.string/replace, join |
| Parsing | all above | all above | most of above, but in terms of string matching at the bottom, not Clojure forms |
| Capture | Create or update binding set during parse | cap/recap + arbitrary functions to build custom capture | Produces AST, with some control for post-processing |
| Error messages | Generates error based on expected vs found, allows for custom names. | Generates error based on expected vs found, allows for custom names. | ? |
| Composable | Yes, can build up rules. | Yes, can build up models. | ? |
| Perf | ? | ? | ? |
| Docs | ? | ? | ? |
| Size | small | medium | medium (not all needed though) |
| Where is it used? | Cursive | | |

Some others:

Error message considerations

  • How do we tie macro errors back to the point where user provided incorrect input?
  • Does the reader need to provide more information to produce good error location information?
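
One existing hook for the first question is the implicit &form argument available inside defmacro, whose metadata may carry :line and :column when the reader supplied them. The sketch below assumes that metadata is present; syntax-error and checked-def are hypothetical names used only for illustration.

```clojure
(defn syntax-error
  "Hypothetical helper: builds an exception that points at the user's form,
  using whatever :line and :column metadata the form carries."
  [form message]
  (let [{:keys [line column]} (meta form)]
    (ex-info (str message " (at line " line ", column " column ")")
             {:form form :line line :column column})))

(defmacro checked-def
  "Hypothetical macro: like def, but validates that the name is a symbol."
  [sym init]
  (when-not (symbol? sym)
    ;; &form is the whole macro call as the user wrote it
    (throw (syntax-error &form
                         (str "checked-def expected a symbol, found "
                              (pr-str sym)))))
  `(def ~sym ~init))
```

As far as I know, the reader attaches this location metadata to list forms but not to symbols, numbers, or strings, so an error can be tied to the enclosing call but not always to the exact offending sub-form. That limitation is one reason the second question above matters.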

Integration

New function: destructure-parse

  • How early during core.clj bootstrap can we integrate destructure-parse?


Produce grammars for existing macros/macro-parts in Clojure core:

  • destructuring
  • parameters (with and without rest)
  • declare
  • def* body
  • arities
  • defn
  • fn
  • letfn
  • local-binding
  • for
  • defmethod
  • gen-class
  • ns
  • in-ns
  • defprotocol
  • deftype
  • defrecord
  • reify
  • extend-type, extend-protocol
  • proxy
  • definterface
  • condp
  • case

Grammar examples

seqex let example:

error-test example:

 

Example bugs that could be resolved by macro grammars

These are some open bugs that require additional validation in a core macro that could be caught automatically if the macros were defined with a grammar:

Example defn problems:

Example ns problems:

Example let/destructuring problems:

 

 

References

Other possible uses

These are not primary goals of this effort, but perhaps lie nearby:

  • Tool for printing the syntax of a macro
  • Using the same syntax for documenting the grammar of *functions*
  • Tool for "grammar expanding" a macro, seeing what gets matched, backtracking, etc