[DXML-53] Java 9 changes indented xml output (adds newlines) Created: 09/Jan/18  Updated: 09/Jan/18

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Alex Miller Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Java 9 (vs prior)



 Description   

Using indent-xml, which goes through a print/read round trip with Transformer to do the indenting, now adds newlines to every node under Java 9, so you get different output than on earlier Java versions. This was first seen at https://dev.clojure.org/jira/browse/TDEPS-29.

javax.xml.transform.Transformer has had changes in Java 9, presumably due to the update to Xerces-J 2.11.0 (https://xerces.apache.org/xerces2-j/releases.html). Here's a blog outlining some of the effects: http://java9.wtf/xml-transformer/. Possibly also relevant: https://bugs.java.com/view_bug.do?bug_id=JDK-8087303

The last link, and some other places I've found, hint that LSSerializer (https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/ls/LSSerializer.html) is one possible answer.
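
A minimal sketch of what the LSSerializer route could look like, using only JDK classes. The function name indent-with-ls is hypothetical and not part of data.xml, and whether this output is stable across Java 8 and Java 9 has not been verified here; "format-pretty-print" is the standard DOM Level 3 Load and Save parameter name.

```clojure
(import '[java.io StringReader]
        '[javax.xml.parsers DocumentBuilderFactory]
        '[org.w3c.dom Document]
        '[org.w3c.dom.ls DOMImplementationLS LSSerializer]
        '[org.xml.sax InputSource])

(defn indent-with-ls
  "Hypothetical helper (not part of data.xml): parses xml-str into a DOM
  and serializes it again with pretty-printing switched on."
  [^String xml-str]
  (let [^Document doc (-> (DocumentBuilderFactory/newInstance)
                          (.newDocumentBuilder)
                          (.parse (InputSource. (StringReader. xml-str))))
        ;; Assumption: the JDK's default DOM implementation also
        ;; implements DOMImplementationLS.
        ^DOMImplementationLS ls (.getImplementation doc)
        ^LSSerializer ser (.createLSSerializer ls)]
    (.setParameter (.getDomConfig ser) "format-pretty-print" true)
    (.writeToString ser doc)))
```

Note that writeToString declares its own encoding in the XML prolog, so the result is not a drop-in replacement for the current indenting output.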






[DXML-43] Emitting XML element with :xmlns attribute doesn't work as expected Created: 25/Mar/17  Updated: 26/Dec/17

Status: Reopened
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Yegor Timoshenko Assignee: Herwig Hochleitner
Resolution: Unresolved Votes: 0
Labels: None
Environment:

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)



 Description   

Affects version 0.2.0-alpha2; 0.0.8 doesn't have this problem.

user=> (require '[clojure.data.xml :as xml])
nil
user=> (xml/emit-str {:tag :RDF :attrs {:xmlns "http://www.w3.org/1999/02/22-rdf-syntax-ns"}})
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><RDF></RDF>"



 Comments   
Comment by Herwig Hochleitner [ 27/Mar/17 2:40 AM ]

Works as designed: Since you're emitting a plain, non-namespaced element, called RDF, it's impossible to set a default xmlns.

Comment by Yegor Timoshenko [ 27/Mar/17 6:31 AM ]

It's a regression. It breaks your code if you've used data.xml to generate the following XML using version 0.0.8:

<?xml version="1.0" encoding="utf-8"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:em="http://www.mozilla.org/2004/em-rdf#">
    <Description about="urn:mozilla:install-manifest">
          <em:id>jid1-3MbuIoOYk03BMg@jetpack</em:id>
    </Description>
</RDF>

Comment by Yegor Timoshenko [ 27/Mar/17 6:32 AM ]

...and then updated to 0.2.0.

Comment by Herwig Hochleitner [ 27/Mar/17 9:56 AM ]

0.0.8 didn't have xmlns support at all, hence this is not considered a regression (we couldn't even roundtrip namespaced XML). To be able to emit namespaced XML like this was incidental.

0.2.0 established semantics for XML namespaces, part of which say that {:tag :RDF} always means <RDF> (in the empty namespace), in order to facilitate composition. Please read up on the current semantics in the README and in the Design Page http://dev.clojure.org/display/DXML/Namespaced+XML

We are still in -alpha, so if you have ideas for changing the semantics, feel free to state your case with reference to the current design.

Comment by Herwig Hochleitner [ 27/Mar/17 10:01 AM ]

btw, with current xmlns support, you would write your fragment like:

(alias-uri 'R "http://www.w3.org/1999/02/22-rdf-syntax-ns")
(emit-str {:tag ::R/RDF})

Comment by Bruno Renié [ 14/Dec/17 7:26 AM ]

With -alpha5, this generates prefixed tags/attrs:

<?xml version="1.0" encoding="UTF-8"?><a:RDF xmlns:a="http://www.w3.org/1999/02/22-rdf-syntax-ns"/>

How would one emit <RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns"/> with 0.2?

Comment by Herwig Hochleitner [ 17/Dec/17 12:29 PM ]

Since any non-broken xml consumer shouldn't care about that difference, data.xml emits prefixes at will.
It does, however, try to keep emitted prefixes to a minimum, and you can exploit that behavior to get the serialisation you want:

(alias-uri 'R "http://www.w3.org/1999/02/22-rdf-syntax-ns")
(emit-str {:tag ::R/RDF :attrs {:xmlns "http://www.w3.org/1999/02/22-rdf-syntax-ns"}})

emits

<?xml version="1.0" encoding="UTF-8"?><RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns"/>

Comment by Bruno Renié [ 22/Dec/17 1:56 AM ]

Thank you. We have to cope with a host of XML consumers in many languages, broken in various ways so it's very important to have complete control over the output.

For now we've reverted to 0.0.8. We'll see if we switch to the ::R/* notation but it's less pleasant to read.

Comment by Herwig Hochleitner [ 26/Dec/17 2:47 PM ]

I was thinking about a replacement mechanism for your use case. Turns out, I was wrong about declining this. See DXML-52 for a rationale.





[DXML-52] Transforming fragments of non-namespaced xml into default xmlns Created: 26/Dec/17  Updated: 26/Dec/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Herwig Hochleitner Assignee: Herwig Hochleitner
Resolution: Unresolved Votes: 0
Labels: None


 Description   

In data.xml, there is currently no feature equivalent to `xmlns="..."` on an xml tag. A tag in the empty namespace is kept as such.

(emit-str {:tag :xmlns.goo/foo
           :attrs {:xmlns "goo"}
           :content [
             {:tag :foo}]})

evaluates to

"<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo xmlns=\"goo\"><foo xmlns=\"\" xmlns:a=\"goo\"/></foo>"

for a reason: to have perfect round-trippability and value equality for parsed xml

However, mapping the empty namespace to a default namespace is valid for many use cases, including:

  • notational brevity
  • processing output from an HTML parser

Even though notational brevity looks like a use case that could benefit from the lexical scoping a macro can provide, there is no clear use case for a macro, and the reduced ability to use arbitrary code in an xml fragment would probably make it inconvenient to use.

Processing, on the other hand, needs a dynamically scoped transformation, and lots of use cases for notational brevity can also be implemented with this.

So the use cases for implementing a dynamically scoped transformation are good.
As for API, setting an :xmlns attribute seems desirable for a couple of reasons:

  • no interference with round-tripping:
    the above, stricken-through round-tripping reason doesn't apply, because the parser will never yield fragments with :xmlns attributes included
  • limited effect on compositionality:
    setting an :xmlns attribute would be indistinguishable from applying a transformation, dynamically scoped down to another :xmlns.
  • negligible API breakage (for -alpha):
    The current use case for setting an :xmlns attribute is a shortcut for setting it into metadata. It:
      • is a subset of the new behavior,
      • only breaks when trying to embed non-namespaced xml into a fragment with an overridden :xmlns (unlikely),
      • can be replicated with a function that sets metadata
  • knowledge transfer from xml syntax

Therefore, the dynamically scoped default-xmlns feature should be exposed via setting :xmlns attributes for the emitter.

https://dev.clojure.org/display/DXML/Namespaced+XML should be updated to reflect this rationale.

DXML-43 already suggested this, so that will be re-opened as well.
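
To illustrate the idea only, here is a sketch of such a scoped transformation over plain {:tag :attrs :content} maps. The function apply-default-xmlns is hypothetical, not data.xml API, and it assumes the :xmlns.<uri>/name keyword convention used in the example above with URIs that need no encoding. It also removes the :xmlns attribute after consuming it, which the proposal above does not specify.

```clojure
(defn apply-default-xmlns
  "Hypothetical sketch: moves tags from the empty namespace into the
  default namespace announced by the nearest enclosing :xmlns attribute."
  ([node] (apply-default-xmlns nil node))
  ([default-uri node]
   (if-not (map? node)
     node                                   ; strings and other leaves pass through
     (let [uri (get-in node [:attrs :xmlns] default-uri)
           tag (:tag node)
           tag' (if (and (seq uri) (nil? (namespace tag)))
                  (keyword (str "xmlns." uri) (name tag))
                  tag)]
       (-> node
           (assoc :tag tag')
           (update :attrs dissoc :xmlns)
           ;; the chosen uri stays in scope for all descendants
           (update :content #(mapv (partial apply-default-xmlns uri) %)))))))
```

With this sketch, the inner {:tag :foo} from the example above would end up as :xmlns.goo/foo instead of staying in the empty namespace, and an inner :xmlns "" would switch the default off again for its subtree.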






[DXML-51] aggregate-xmlns overwrites metadata Created: 26/Dec/17  Updated: 26/Dec/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Trivial
Reporter: Herwig Hochleitner Assignee: Herwig Hochleitner
Resolution: Unresolved Votes: 0
Labels: None


 Description   

aggregate-xmlns uses with-meta instead of vary-meta assoc to set the xmlns metadata, which replaces the whole metadata map and thus deletes other metadata such as line/column information.
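
The difference can be shown with plain Clojure. The metadata key :example/nss below is a made-up stand-in, not the key data.xml actually uses.

```clojure
;; An element-like value carrying location metadata.
(def el (with-meta {:tag :foo} {:line 3 :column 7}))

;; with-meta replaces the whole metadata map, so :line and :column are lost.
(def replaced (with-meta el {:example/nss {"a" "goo"}}))

;; vary-meta applies assoc to the existing metadata and keeps the other keys.
(def merged (vary-meta el assoc :example/nss {"a" "goo"}))

(meta replaced) ; => {:example/nss {"a" "goo"}}
(meta merged)   ; => {:line 3, :column 7, :example/nss {"a" "goo"}}
```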






[DXML-45] Support UTF-8 XML beginning with BOM Created: 27/Apr/17  Updated: 21/Nov/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Jeff Wong Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

It would be great to be able to parse UTF-8 encoded files beginning with a byte order mark (BOM), as it would give better native support for XML in the wild.

Currently, I'm having a few of these xml files throw a "content not allowed in prolog" exception:
http://stackoverflow.com/questions/4569123/content-is-not-allowed-in-prolog-saxparserexception



 Comments   
Comment by Herwig Hochleitner [ 02/May/17 9:46 AM ]

Following your stackoverflow link, this seems to be related to a couple of java bugs that are marked as `wontfix` due to expectations of existing tools. The recommendation in the tickets is for applications to deal with the BOM themselves.

Since data.xml promises to process xml from raw bytes (because it accepts InputStreams), there is a choice: Either discontinue the InputStream interface and require users to pass Readers that correctly handle their input (e.g. https://commons.apache.org/proper/commons-io/javadocs/api-2.2/org/apache/commons/io/input/XmlStreamReader.html) or use a Reader implementation that can do so, when creating an input source from a stream.

For ease of maintenance, it's tempting to go with removing the byte-based interface, but I'm open to arguments as to why data.xml should deal with this.

Comment by Jeff Wong [ 02/May/17 12:28 PM ]

This was more of a suggestion - After reading up about input and input streams, I can understand why this may be out of scope.

I was naive in thinking that handling input via a clojure.java.io/reader would be able to parse an xml file properly, as I was unaware of the BOM issues until I hit the exception. Even though the related JVM fix for BOMs would break backwards compatibility and was thus rejected, it would still be helpful if another underlying parsing library handled the input and BOMs.

At least consider adding a recommended list of readers for those unfamiliar with XML parsing in java. It is difficult to anticipate these kinds of gotchas for developers unfamiliar with BOMs, readers, and XML (such as myself), especially when the same files pass validation in other languages.

Comment by Herwig Hochleitner [ 21/Nov/17 8:11 AM ]

I'm just leaving this here, it might be a good reference to mention, when documenting / changing this: https://github.com/jimpil/clj-bom
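
For callers who want to deal with the BOM themselves, as the Java tickets recommend, a small wrapper along these lines may be enough. This is a sketch using only JDK classes; skip-utf8-bom is a hypothetical name, not part of data.xml or clj-bom, and it handles only the UTF-8 BOM (bytes EF BB BF).

```clojure
(import '[java.io ByteArrayInputStream InputStream PushbackInputStream])

(defn skip-utf8-bom
  "Hypothetical helper: returns a stream positioned after a leading
  UTF-8 BOM if one is present, otherwise at the very first byte."
  ^InputStream [^InputStream in]
  (let [pb  (PushbackInputStream. in 3)
        buf (byte-array 3)
        ;; read up to 3 bytes; a single read call may return fewer
        n   (loop [off 0]
              (if (= off 3)
                off
                (let [r (.read pb buf off (- 3 off))]
                  (if (neg? r) off (recur (+ off r))))))]
    (when (and (pos? n)
               (not (and (= n 3)
                         (= [0xEF 0xBB 0xBF] (map #(bit-and % 0xFF) buf)))))
      ;; not a BOM: push the bytes back so the parser sees them
      (.unread pb buf 0 n))
    pb))
```

The returned stream could then be handed to the parser in place of the raw InputStream.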





[DXML-50] Indenting writer Created: 08/Nov/17  Updated: 14/Nov/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File indent.clj    
Patch: Code

 Description   

I know the current approach to emitting indented output is kind of a hack. I embarked on a quest to make an indenting XMLStreamWriter, which is attached. I'm not sure of the best way to integrate it into data.xml, so I've not actually done that part, but I have included a demo, which is kind of a hacked version of clojure.data.xml/emit and clojure.data.xml.jvm.emit/write-document. I don't think there is a clean place to insert the wrapping IndentingWriter at the moment, so that's something that needs to be resolved. But feel free to use this contribution as you will in implementing this feature.



 Comments   
Comment by Herwig Hochleitner [ 14/Nov/17 1:28 PM ]

Hi Alex,

thanks for bringing this up! The current, horrible implementation of indent has plagued my mind as well.

Thanks, also, for your implementation of an indenting javax.xml.stream.XMLStreamWriter. Before going into detail on how your code could be integrated, or what options would suffice to introduce it in configuration, let me frame some solvable problems that this ticket touches on:

  • Being able to efficiently indent xml, ideally platform-independently
  • Being able to swap out the XMLStreamWriter, and possibly other nitty-gritties of parse/emit, by configuration

I'll focus on the "efficiently indenting xml" part, for the purposes of this comment (and ticket, hopefully).

For a jvm-only solution, we should have a report on the possibility of (e.g.) hooking a StAXSource [1] into the indenting-transformer [2], before rolling our own. On the other hand, if cljs could profit as well, rolling our own streaming transformer makes sense, even if this could be achieved by other means in the jvm backend.

If you squint at your c.d.xml.jvm.indent namespace a little, you might already see the transducer it contains popping out at you. The XMLStreamWriter interface seems like incidental complexity in a simple text-node transformer, for want of a streamable data model. Luckily data.xml, from the very beginning, was built on a streaming event model for xml [3]. I have been planning to support tree-transformations, in the form of transducers over the event stream, and indentation would be an awesome first use-case for this.

What do you think? If you're interested in taking this further, here is a commit that defines an :event-xform config-option for emit* [4].

(emit-str (parse-str "<foo>bar lala <br/> gag</foo>")
          :event-xform (fn [xf]
                         (fn
                           ([s] (xf s))
                           ([s {:as e :keys [str]}]
                            (-> s
                                (cond-> str (xf (clojure.data.xml.event/->CharsEvent "^.^")))
                                (xf e))))))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>^.^bar lala <br/>^.^ gag</foo>"

[1] https://docs.oracle.com/javase/7/docs/api/javax/xml/transform/stax/StAXSource.html
[2] https://github.com/clojure/data.xml/blob/master/src/main/clojure/clojure/data/xml/jvm/pprint.clj#L15
[3] https://github.com/clojure/data.xml/blob/master/src/main/clojure/clojure/data/xml/event.clj
[4] https://github.com/bendlas/data.xml/commit/0c2baa690154bfa731fa1f98a539542a0205e6b1





[DXML-47] Failed to emit CDATA in ClojureScript Created: 26/Jul/17  Updated: 26/Jul/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Heehong Moon Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Emitting CDATA in ClojureScript causes an error.

dev:cljs.user=> (xml/emit-str (xml/element :a {} "test"))
"<a>test</a>"
dev:cljs.user=> (xml/emit-str (xml/element :a {} (xml/cdata "<b></b>")))
#object[Error Error: No protocol method AsQName.qname-uri defined for type null: ]
   cljs.core/missing-protocol (jar:file:/Users/bbirec/.m2/repository/org/clojure/clojurescript/1.9.229/clojurescript-1.9.229.jar!/cljs/core.cljs:270:4)
   clojure.data.xml.protocols/qname-uri (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/protocols.cljc:13:1)
   clojure.data.xml.name/qname-uri (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/name.cljc:42:4)
   Function.clojure.data.xml.js.dom.element_STAR_.cljs$core$IFn$_invoke$arity$3 (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/js/dom.cljs:32:36)
   clojure.data.xml.js.dom/element* (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/js/dom.cljs:15:1)
   clojure$data$xml$js$dom$element_node (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/js/dom.cljs:97:30)
   cljs.core.map.cljs$core$IFn$_invoke$arity$2 (jar:file:/Users/bbirec/.m2/repository/org/clojure/clojurescript/1.9.229/clojurescript-1.9.229.jar!/cljs/core.cljs:4466:30)
   cljs.core.LazySeq.sval (jar:file:/Users/bbirec/.m2/repository/org/clojure/clojurescript/1.9.229/clojurescript-1.9.229.jar!/cljs/core.cljs:3223:18)
   cljs.core.LazySeq.cljs$core$ISeqable$_seq$arity$1 (jar:file:/Users/bbirec/.m2/repository/org/clojure/clojurescript/1.9.229/clojurescript-1.9.229.jar!/cljs/core.cljs:3277:12)
nil
dev:cljs.user=>





[DXML-41] README - applicability of xml-seq, xml-zip? Created: 12/Feb/17  Updated: 14/Feb/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Phill Wolf Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Clojure's core library includes clojure.xml and two other very useful functions evidently designed to work with data from clojure.xml: xml-seq and clojure.zip/xml-zip.

Is it intended that xml-seq and xml-zip work with data from clojure.data.xml and, in particular, its release-0.2 XML-namespace-related improvements?

Let's enhance the clojure.data.xml README to clarify whether, or to what degree, it should be OK to use clojure.data.xml with xml-seq and xml-zip.



 Comments   
Comment by Herwig Hochleitner [ 14/Feb/17 5:17 AM ]

I have used clojure.data.xml with xml-zip (as well as with clojure.data.zip.xml) and it worked as expected. I'd expect the same from xml-seq.

We should verify this behavior in the test suite and announce it in the readme.
Patches welcome.
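
A starting point for such a test could look like the following. It uses plain maps with :tag, :attrs and :content keys as a stand-in, on the assumption that data.xml elements expose the same keys; real parsed data should replace the literal map in an actual test.

```clojure
(require '[clojure.zip :as zip])

;; Stand-in for a parsed document (assumed shape: :tag, :attrs, :content).
(def doc
  {:tag :foo :attrs {}
   :content [{:tag :bar :attrs {} :content ["x"]}
             {:tag :baz :attrs {} :content []}]})

;; xml-seq walks the tree depth-first; keep only the element nodes.
(def tags (map :tag (filter map? (xml-seq doc))))

;; xml-zip navigation: first child, then its right sibling.
(def second-child-tag
  (-> (zip/xml-zip doc) zip/down zip/right zip/node :tag))

tags             ; => (:foo :bar :baz)
second-child-tag ; => :baz
```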





[DXML-40] README FileWriter example fails if platform default encoding is not UTF-8 Created: 12/Feb/17  Updated: 12/Feb/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Phill Wolf Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

README.md illustrates writing an XML file with java.io.FileWriter. Therefore the example works only if the Java platform's default encoding is UTF-8. Suggestion: The README would present a more widely usable technique by using clojure.java.io/writer, whose default encoding is UTF-8 everywhere.

Sample program using the README example:

(ns garble
  (:require [clojure.data.xml :refer [element emit]]))

(defn -main
  "Tries to write an XML file"
  []
  (let [tags (element :foo {:foo-attr "foo value"}
               (element :bar {:bar-attr "bar value"}
                 (element :baz {} "The baz value")))]
    ;; FileWriter picks up the platform default encoding, which is the
    ;; behavior this ticket is about.
    (with-open [out-file (java.io.FileWriter. "/tmp/foo.xml")]
      (emit tags out-file))))

Invocation 1 (overriding Java's default encoding because Java is inclined to use UTF-8 on my computer):

java -cp ... -Dfile.encoding=US-ASCII clojure.main -m garble

Result:

java.lang.Exception: Output encoding of stream (UTF-8) doesn't match declaration (ASCII)

Invocation 2:

java -cp ... -Dfile.encoding=UTF-8 clojure.main -m garble

Result: successfully writes /tmp/foo.xml
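
The suggested alternative can be sketched without data.xml on the classpath. Here write-utf8 is a hypothetical helper that writes a ready-made XML string; in the README example, the .write call would be replaced by (emit tags out). The encoding is passed explicitly, although UTF-8 is already the default for clojure.java.io/writer.

```clojure
(require '[clojure.java.io :as io])

(defn write-utf8
  "Hypothetical helper: writes xml-str to path as UTF-8, independent of
  the JVM's file.encoding setting."
  [path ^String xml-str]
  (with-open [^java.io.Writer out (io/writer path :encoding "UTF-8")]
    (.write out xml-str)))

(comment
  (write-utf8 "/tmp/foo.xml"
              "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>caf\u00e9</foo>"))
```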






[DXML-22] Adding hiccup generation function for elements Created: 24/Feb/14  Updated: 07/Dec/16

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Chris Zheng Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

N/a



 Description   

This is for completeness really. See pull request https://github.com/clojure/data.xml/pull/10

I would like to:

  • generate an element using hiccup (already exists)
  • generate hiccup using an element (proposed)


 Comments   
Comment by Chris Zheng [ 28/Mar/14 7:22 AM ]

I'm hoping someone can at least give some feedback to this ticket.

Comment by Ryan Senior [ 28/Mar/14 7:53 AM ]

Hi Chris,

Thanks for the reminder on this. I'll have more time to dig in this weekend, but off the top of my head I think more will need to be done on this, both on implementation and on testing. I think what you have now won't work with comments or cdata. One way to flesh some of that out is to create round trip types of tests in src/test/clojure/clojure/data/xml/test_sexp.clj.
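
For reference, the proposed direction (element to hiccup) can be sketched on plain maps as below. element->hiccup is a hypothetical name and is not taken from the pull request; like the version discussed above, it makes no attempt to handle comments or cdata, which simply pass through unchanged.

```clojure
(defn element->hiccup
  "Hypothetical sketch: turns a {:tag :attrs :content} tree into a
  hiccup-style vector. Non-map nodes (strings etc.) are returned as-is."
  [node]
  (if (map? node)
    (let [{:keys [tag attrs content]} node]
      (into (if (seq attrs) [tag attrs] [tag])
            (map element->hiccup)
            content))
    node))

(element->hiccup
 {:tag :foo :attrs {:a "1"}
  :content [{:tag :bar :attrs {} :content ["x"]}]})
;; => [:foo {:a "1"} [:bar "x"]]
```

A round-trip test would feed the result back through the existing hiccup-to-element function and compare it with the input.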





Generated at Wed Jan 17 13:58:13 CST 2018 using JIRA 4.4#649-r158309.