This design document covers a superset of the clojure.xml schema.

All metadata is optional: parsers SHOULD provide it and serializers MAY leverage it. Metadata is not required to be kept in sync with the data it annotates, so it is up to each serializer's heuristics to use or ignore it.


A namespace-aware XML parser produces a data structure following the representations below, which are designed as a superset of the clojure.xml representation:

  ^{::xml/ns {""  "http://..."
              "x" "http://..."}
    ::xml/prefix "x"}
  {:tag :foo
   :uri "http://..."
   :attrs {...}
   :content [...]}

An element is a map with keys: :tag (the local name, keywordized), :attrs (the attribute map), :content (nil, a sequence, or a sequential collection of nodes), and :uri (the namespace URI as a string).

Metadata: under the ::xml/ns key, a map of prefixes (strings) to URIs (strings); under the ::xml/prefix key, the original prefix as a string.

(where xml is an alias for data.xml)
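As an illustration, the element representation above can be built by hand with with-meta. This is only a sketch: the example.org URIs are placeholders, and the plain keywords :xml/ns and :xml/prefix stand in for ::xml/ns and ::xml/prefix so that it runs without the library or the alias.

```clojure
;; Sketch only: :xml/ns and :xml/prefix stand in for ::xml/ns and ::xml/prefix,
;; and the example.org URIs are placeholders.
(def element
  (with-meta
    {:tag     :foo
     :uri     "http://example.org/ns"
     :attrs   {}
     :content ["hello"]}
    {:xml/ns     {""  "http://example.org/default"
                  "x" "http://example.org/ns"}
     :xml/prefix "x"}))

;; The data part is an ordinary map, readable like a clojure.xml element.
(:tag element)

;; The original prefix, if the parser recorded it.
(:xml/prefix (meta element))

;; Metadata never takes part in equality, so dropping it leaves an equal value.
(= element (with-meta element nil))
```

Because metadata does not take part in equality, a consumer that ignores namespaces sees the same value whether or not the parser supplied prefixes.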

  ^{::xml/ns {""  "http://..."
              "x" "http://..."}
    ::xml/prefix "x"}
  [:href "http://..."]

An attribute is a map entry whose key is either a keyword (no prefix) or a [:kw "uri"] pair; the value is a string.

In the case of [:kw "uri"] keys, ::xml/prefix SHOULD be present.

The URI MUST NOT be nil or the empty string.
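The two key shapes can be handled uniformly with a few small accessors. The helper names below (attr-local-name, attr-uri, valid-attr-key?) are hypothetical and not part of any existing API, :xml/prefix again stands in for ::xml/prefix, and the XLink namespace is used only as a familiar example.

```clojure
(defn attr-local-name
  "Local name of an attribute key: the keyword itself, or the first item of a pair."
  [k]
  (if (keyword? k) k (first k)))

(defn attr-uri
  "Namespace URI of an attribute key, or nil for a bare keyword (no namespace)."
  [k]
  (when (vector? k) (second k)))

(defn valid-attr-key?
  "True for a bare keyword, or for a [:kw \"uri\"] pair whose URI is a non-empty string."
  [k]
  (or (keyword? k)
      (and (vector? k)
           (= 2 (count k))
           (keyword? (first k))
           (string? (second k))
           (not= "" (second k)))))

;; An attribute map mixing both key shapes; the pair carries its original prefix
;; as metadata (:xml/prefix stands in for ::xml/prefix).
(def attrs
  {:id "a1"
   (with-meta [:href "http://www.w3.org/1999/xlink"] {:xml/prefix "xlink"}) "#target"})

;; Lookup ignores the metadata on the key.
(get attrs [:href "http://www.w3.org/1999/xlink"])
```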

Text nodes
 {:comment "text"}
... TBD

Default serialization strategy suggestion

The default serialization strategy is conservative.

When serializing an element: add missing namespace declarations.

For each name (element or attribute name):

  1. If the ::xml/prefix is currently mapped to the :uri, use it.
  2. If the URI is mapped to another alias, use that alias. (Should we rather map the URI to the new alias?)
  3. If the URI is not mapped, map it under the specified alias (if present) or under a gensymed alias.
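The three rules can be sketched as a pure function from the current prefix map to a chosen prefix plus a possibly extended map. The names choose-prefix, prefix-for-uri and fresh-prefix are hypothetical, the ns1, ns2, ... counter is a deterministic stand-in for a gensymed alias, and rule 2 is taken literally, leaving its open question aside. Default-namespace handling for attributes is not addressed.

```clojure
(defn- prefix-for-uri
  "First prefix bound to uri in ns-map, or nil."
  [ns-map uri]
  (some (fn [[prefix mapped]] (when (= mapped uri) prefix)) ns-map))

(defn- fresh-prefix
  "Stand-in for a gensymed alias: the first of ns1, ns2, ... not yet in ns-map."
  [ns-map]
  (first (remove #(contains? ns-map %)
                 (map #(str "ns" %) (iterate inc 1)))))

(defn choose-prefix
  "Returns [prefix ns-map'] for a name in namespace uri, where preferred is the
  ::xml/prefix recorded by the parser (possibly nil)."
  [ns-map uri preferred]
  (cond
    ;; 1. The recorded prefix is bound to this URI: use it.
    (and preferred (= uri (get ns-map preferred)))
    [preferred ns-map]

    ;; 2. The URI is bound under another alias: use that alias.
    (prefix-for-uri ns-map uri)
    [(prefix-for-uri ns-map uri) ns-map]

    ;; 3. The URI is unbound: bind it under the recorded prefix, or a fresh one.
    :else
    (let [prefix (or preferred (fresh-prefix ns-map))]
      [prefix (assoc ns-map prefix uri)])))
```

Returning the updated map alongside the prefix lets the caller see which bindings are new and therefore still need to be declared.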

For elements:

Emit xmlns and xmlns:* attributes for new mappings, that is, mappings generated by attribute serialization and new entries under the ::xml/ns key, where "new" is relative to the state maintained by the serializer.
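Finding the declarations to emit amounts to a diff between the serializer's current prefix map and the mappings in effect for the element. A minimal sketch, assuming both are plain prefix-to-URI maps and that mappings generated while serializing attributes have already been merged into the second one; new-declarations is a hypothetical name.

```clojure
(defn new-declarations
  "Namespace attributes to emit: one per entry of element-ns that is absent
  from state-ns or bound there to a different URI."
  [state-ns element-ns]
  (into (sorted-map)
        (for [[prefix uri] element-ns
              :when (not= uri (get state-ns prefix))]
          ;; The empty prefix is the default namespace, declared with plain xmlns.
          [(if (= "" prefix) "xmlns" (str "xmlns:" prefix)) uri])))
```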


Custom serialization strategies

To accommodate broken consumers, XML serialization sometimes has to be tweaked. It may be worthwhile to make emit, *xml-emitter*, or something similar a dynamic var.
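One possible shape for the dynamic var, as a sketch only: the emitters below are hypothetical toys that render just a start tag, and the point is the rebinding with binding, not the output format.

```clojure
(defn default-emitter
  "Hypothetical emitter: renders only the start tag, without any prefix."
  [element]
  (str "<" (name (:tag element)) ">"))

(defn legacy-emitter
  "Hypothetical quirk: a consumer that expects every tag under the fixed prefix x."
  [element]
  (str "<x:" (name (:tag element)) ">"))

;; The strategy in effect; callers may rebind it for one dynamic scope.
(def ^:dynamic *xml-emitter* default-emitter)

(defn emit
  "Serializes element with whatever emitter is currently bound."
  [element]
  (*xml-emitter* element))

;; Only code running inside the binding form sees the custom strategy.
(binding [*xml-emitter* legacy-emitter]
  (emit {:tag :foo}))
```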


Less than half-baked idea:

Defining a full serializer is tedious, so it may be interesting to provide a factory taking three arguments:

  init                                        ; initial value of the serializer's internal state
  (fn [state xmlns] state')                   ; updates the internal state given a new xmlns map
  (fn [state local-name uri prefix] prefix')  ; decides which prefix to use for a given name
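To make the idea concrete, here is a sketch of what such a factory could look like. Everything in it is an assumption: the name emitter-factory, the choice to return the qualified tag names in document order instead of real XML text, and the use of :xml/ns and :xml/prefix in place of ::xml/ns and ::xml/prefix. Attributes are ignored.

```clojure
(defn emitter-factory
  "Hypothetical factory. Returns a fn that walks an element tree and returns
  the qualified tag names in document order, threading the serializer state."
  [init update-state choose-prefix]
  (fn qualified-names
    ([element] (qualified-names init element))
    ([state element]
     (let [m          (meta element)
           state'     (update-state state (:xml/ns m))
           local-name (name (:tag element))
           prefix     (choose-prefix state' local-name (:uri element) (:xml/prefix m))
           qname      (if (seq prefix) (str prefix ":" local-name) local-name)]
       (cons qname
             ;; Only element children are visited; strings are skipped.
             (mapcat #(qualified-names state' %)
                     (filter map? (:content element))))))))

(def simple-emitter
  (emitter-factory
    {}                                      ; init: no prefixes known yet
    (fn [state xmlns] (merge state xmlns))  ; remember every declared prefix
    (fn [state _local-name uri prefix]      ; keep the prefix only if still bound to uri
      (if (= uri (get state prefix)) prefix ""))))

(def doc
  (with-meta
    {:tag :root :uri "http://example.org/x" :attrs {}
     :content [(with-meta {:tag :child :uri "http://example.org/x"
                           :attrs {} :content ["text"]}
                          {:xml/prefix "x"})
               {:tag :plain :attrs {} :content nil}]}
    {:xml/ns {"x" "http://example.org/x"} :xml/prefix "x"}))

(simple-emitter doc)
```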

Not all serialization quirks can be solved by emitters produced by such a factory, but if it covers a good share of them it may be worthwhile.

There must be a better abstraction.