...

There are a couple of well-known escaping schemes, but most of them use \ as the escape character, which rules them out for use in edn keywords.
Arguably the best-known escaping scheme, though, is RFC 3986 percent-encoding, also known as url-encoding. This happens to be an almost perfect fit for transcribing human-readable strings into clojure keyword namespaces:
1.) It retains a moderate amount of readability, which improves with reusable knowledge. You probably don't even have to check when I tell you that %2F means /.
2.) The reserved characters of a uri segment seem to be a happy superset of clojure's reserved characters. In particular, it reserves : and / but leaves . alone (which leads to a curious, but not entirely unappealing, mapping to java package trees).
     Hence, using url-encoding even saves us from shipping codecs (though we need to make sure that java's URLEncoder fully agrees with javascript's encodeURIComponent).
3.) % became allowed in clojure 1.5.0. The jury on http://dev.clojure.org/jira/browse/CLJ-1527 is still out, but Rich Hickey's talk about not breaking APIs could be interpreted to mean that it will stay allowed.
     Bumping the required clojure version from 1.4.0 to 1.5.0 is a slight drawback, but one that will amortize as people keep upgrading their software. It should also be possible to work with a useful subset of data.xml on 1.4.0, even with some namespacing support, since % won't be readable there but will still be constructable.
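
The mapping argued for above can be sketched with nothing but java.net.URLEncoder. The helper name uri->kw-ns is an assumption made for illustration and is not data.xml's API; the block only shows that the encoded namespace can be constructed with keyword, without going through the reader. The last line shows one known point where URLEncoder (which implements form encoding) and encodeURIComponent disagree, namely the space character.

```clojure
(import '(java.net URLEncoder))

;; Hypothetical helper (not part of data.xml): percent-encode an xmlns uri
;; and prepend the xmlns. prefix used for keyword namespaces in this design.
(defn uri->kw-ns [uri]
  (str "xmlns." (URLEncoder/encode uri "UTF-8")))

;; : / and # get escaped, while . and - are left alone
(def rdf-ns (uri->kw-ns "http://www.w3.org/1999/02/22-rdf-syntax-ns#"))

;; keyword builds the name programmatically, so the reader is never involved
(def rdf-nil (keyword rdf-ns "nil"))

(println rdf-nil)

;; URLEncoder turns a space into +, whereas encodeURIComponent produces %20
(println (URLEncoder/encode "a b" "UTF-8"))
```

Namespace uris rarely contain spaces, but a shared codec would still have to settle such differences before both platforms can be trusted to produce identical keywords.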

[r3] 20171226 :xmlns "..." attributes transform non-namespaced content

Manually setting an :xmlns attribute for the emitter (the parser never generates one) now behaves exactly as in xml: it moves non-namespaced tags within the current element into the default xmlns.

Effectively, this specifies a second representation for elements, one that is not canonical and is useful mainly for emitting. For QNames, there is already a precedent: the emitter accepts QName instances, keywords and strings.

https://dev.clojure.org/jira/browse/DXML-52

This motivates a normalization function that makes equal fragments clojure.core/=.
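
To make the idea concrete, here is a minimal sketch of such a normalization, written against plain maps. The names push-default-xmlns and uri->kw-ns are hypothetical and this is not the library's implementation; it only handles the default :xmlns attribute and ignores :xmlns/<prefix> attributes as well as resetting the default namespace.

```clojure
(import '(java.net URLEncoder))

;; Hypothetical sketch, not data.xml's own normalization: resolve :xmlns
;; attributes into namespaced tags so both representations compare equal.
(defn- uri->kw-ns [uri]
  (str "xmlns." (URLEncoder/encode uri "UTF-8")))

(defn push-default-xmlns
  ([el] (push-default-xmlns el nil))
  ([el inherited-uri]
   (if-not (map? el)
     el ; strings and other non-element content pass through unchanged
     (let [uri (get-in el [:attrs :xmlns] inherited-uri)
           tag (:tag el)]
       {:tag     (if (and uri (nil? (namespace tag)))
                   (keyword (uri->kw-ns uri) (name tag))
                   tag)
        :attrs   (dissoc (or (:attrs el) {}) :xmlns)
        :content (mapv #(push-default-xmlns % uri) (:content el))}))))

;; the non-canonical form that is convenient for emitting
(def emitter-form
  {:tag :foo :attrs {:xmlns "NO:NO/NO"} :content [{:tag :bar} "text"]})

;; the same fragment with fully namespaced tags
(def canonical-form
  {:tag     (keyword "xmlns.NO%3ANO%2FNO" "foo")
   :attrs   {}
   :content [{:tag (keyword "xmlns.NO%3ANO%2FNO" "bar") :attrs {} :content []}
             "text"]})

(println (= canonical-form (push-default-xmlns emitter-form)))
```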

Runtime data structures

Code Block
;; <rdf:nil xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
;; would be represented in clojure as 

;; [r1] no more 
;; (declare-ns :xml.rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#") ; globally associates the clojure namespace xml.rdf with the rdf xmlns
;; {:tag :xml.rdf/nil} ; now denotes an rdf element with the qualified name {http://www.w3.org/1999/02/22-rdf-syntax-ns#}nil
;; [r1] instead of declare-ns and alias-ns, we now have
(alias-uri :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#") 
{:tag ::rdf/nil}
;; which the reader expands to {:tag :xmlns.http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23/nil}

;; a reader macro can be used to refer to an xmlns through clojure's alias facilities (alias, require :as), e.g.
(require '[#xml/ns "http://www.w3.org/1999/02/22-rdf-syntax-ns#" :as rdf])
{:tag ::rdf/nil} ; the alias introduces the shorthand
;; in clojurescript, this is the only way, as there is no alias-uri there
;; what makes this awkward is that, in this case, http%3A%2F%2Fwww/w3/org%2F1999%2F02%2F22-rdf-syntax-ns%23.clj, plus one with every % replaced by _PERCENT_ for .cljs, needs to exist on the classpath.
;; there is hope for this, though: http://dev.clojure.org/jira/browse/CLJ-2030

...

Unfortunately, percent-encoded uri namespaces don't quite fit the bill on user-friendliness, but outside of clojure's kw-aliasing facilities, this can still be fixed by using reader tags.

Since the emitter accepts a larger set of representations, there is a normalization function for xml fragments, called canonicalize. Additionally, there is clojure.data.xml/= as a possibly more efficient version of #(clojure.core/= (canonicalize %1) (canonicalize %2))

xml elements

Elements are represented as maps with keys #{:tag :attrs :content}. The canonical representation is a clojure.data.xml.node/Element defrecord, exposed through the constructors element and element*.

clojure.data.xml.node/Element implements a custom equality, compatible with maps. It does not, however, use clojure.data.xml/=, in order to preserve commutativity.

element* takes tag name, attributes, a content list, and optional metadata. It can be used to construct non-canonical representations.

element takes content varargs and canonicalizes its tag, attributes and content maps. It won't canonicalize content elements.
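
A short usage sketch of the two constructors follows. It assumes org.clojure/data.xml is on the classpath and relies only on the argument shapes described above; the tag and attribute names are made up for illustration, and the optional metadata argument of element* is left out.

```clojure
(require '[clojure.data.xml :as xml]) ; assumes org.clojure/data.xml is available

;; element: content is passed as varargs
(def a (xml/element :foo {:id "1"} "some text" (xml/element :bar)))

;; element*: content is passed as a single list
(def b (xml/element* :foo {:id "1"} ["some text" (xml/element :bar)]))

;; both results can be read like maps with the keys :tag, :attrs and :content
(println (:tag a) (:attrs a))
(println (:tag b) (:attrs b))
```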

xml names


...

<n:foo xmlns:n="NO:NO/NO" /> => {:tag :xmlns.NO%3ANO%2FNO/foo}

<foo xmlns="NO:NO/NO" /> => {:tag :xmlns.NO%3ANO%2FNO/foo}

Similar to xml serialization, the kw-ns :xmlns/... and :xml/... are given special treatment: even though you can still emit them by giving their full namespace uri, their canonical representation is the short form.
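
Going the other way, a tag keyword can be split back into a namespace uri and a local name. The following is a sketch under the xmlns.<encoded-uri> convention shown in the examples above; kw->uri+local is a hypothetical name, not the library's API, and the special short forms :xml/... and :xmlns/... are deliberately not handled.

```clojure
(import '(java.net URLDecoder))

;; Hypothetical helper (not data.xml's API): recover the namespace uri and
;; local name that a keyword stands for under the xmlns.<encoded-uri> scheme.
(defn kw->uri+local [kw]
  (let [kw-ns (namespace kw)]
    (if (and kw-ns (.startsWith ^String kw-ns "xmlns."))
      {:uri   (URLDecoder/decode (subs kw-ns (count "xmlns.")) "UTF-8")
       :local (name kw)}
      ;; no namespace: a plain, non-namespaced xml name
      {:uri nil :local (name kw)})))

(println (kw->uri+local (keyword "xmlns.NO%3ANO%2FNO" "foo")))
(println (kw->uri+local :foo))
```

Since the prefixed and the default-namespace spelling of an element map to the same keyword, the prefix itself cannot be recovered from the tag.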

Additional, non-canonical qname types accepted in the emitter:

xml attributes

Attributes are stored in hash-maps. The parser removes xmlns attributes from the attr hash and stores them in metadata (accessible via clojure.data.xml/element-nss).

The namespace environment can be augmented by associating :xmlns and :xmlns/<prefix> attributes.

Associating attributes :xmlns or :xmlns/<prefix> denotes a non-canonical representation for namespaced xml, in which tag names can be scoped as in xml. This is akin to the 0.0.8 API, but only for the emitter.