[DXML-47] Failed to emit CDATA in ClojureScript Created: 26/Jul/17  Updated: 26/Jul/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Heehong Moon Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Emitting CDATA in ClojureScript causes an error.

dev:cljs.user=> (xml/emit-str (xml/element :a {} "test"))
"<a>test</a>"
dev:cljs.user=> (xml/emit-str (xml/element :a {} (xml/cdata "<b></b>")))
#object[Error Error: No protocol method AsQName.qname-uri defined for type null: ]
   cljs.core/missing-protocol (jar:file:/Users/bbirec/.m2/repository/org/clojure/clojurescript/1.9.229/clojurescript-1.9.229.jar!/cljs/core.cljs:270:4)
   clojure.data.xml.protocols/qname-uri (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/protocols.cljc:13:1)
   clojure.data.xml.name/qname-uri (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/name.cljc:42:4)
   Function.clojure.data.xml.js.dom.element_STAR_.cljs$core$IFn$_invoke$arity$3 (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/js/dom.cljs:32:36)
   clojure.data.xml.js.dom/element* (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/js/dom.cljs:15:1)
   clojure$data$xml$js$dom$element_node (jar:file:/Users/bbirec/.m2/repository/org/clojure/data.xml/0.2.0-alpha2/data.xml-0.2.0-alpha2.jar!/clojure/data/xml/js/dom.cljs:97:30)
   cljs.core.map.cljs$core$IFn$_invoke$arity$2 (jar:file:/Users/bbirec/.m2/repository/org/clojure/clojurescript/1.9.229/clojurescript-1.9.229.jar!/cljs/core.cljs:4466:30)
   cljs.core.LazySeq.sval (jar:file:/Users/bbirec/.m2/repository/org/clojure/clojurescript/1.9.229/clojurescript-1.9.229.jar!/cljs/core.cljs:3223:18)
   cljs.core.LazySeq.cljs$core$ISeqable$_seq$arity$1 (jar:file:/Users/bbirec/.m2/repository/org/clojure/clojurescript/1.9.229/clojurescript-1.9.229.jar!/cljs/core.cljs:3277:12)
nil
dev:cljs.user=>
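
For comparison, the same call on the JVM is expected to produce a CDATA section (the data.xml README shows an equivalent example). A minimal sketch, assuming data.xml is on the classpath; the name jvm-output is only for illustration:

```clojure
(require '[clojure.data.xml :as xml])

;; Same element as in the failing ClojureScript call, emitted on the JVM.
(def jvm-output
  (xml/emit-str (xml/element :a {} (xml/cdata "<b></b>"))))

;; The markup inside the CDATA section should come through unescaped.
(println jvm-output)
```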





[DXML-50] Indenting writer Created: 08/Nov/17  Updated: 14/Nov/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File indent.clj    
Patch: Code

 Description   

I know the current approach to emitting indented output is kind of a hack. I embarked on a quest to make an indenting XMLStreamWriter which is attached. I'm not sure the best way it should be integrated into data.xml so I've not actually done that part, but I have included a demo, which is kind of a hacked version of clojure.data.xml/emit and clojure.data.xml.jvm.emit/write-document. I don't think there is a clean place to insert the wrapping IndentingWriter at the moment so that's something that needs to be resolved. But feel free to use this contribution as you will in implementing this feature.



 Comments   
Comment by Herwig Hochleitner [ 14/Nov/17 1:28 PM ]

Hi Alex,

thanks for bringing this up! The current, horrible implementation of indent has plagued my mind as well.

Thanks, also, for your implementation of an indenting javax.xml.stream.XMLStreamWriter. Before going into detail on how your code could be integrated, or what options would suffice to introduce it in configuration, let me frame some solvable problems that this ticket touches on:

  • Being able to efficiently indent xml, ideally platform-independently
  • Being able to swap out the XMLStreamWriter, and possibly other nitty-gritties of parse/emit, by configuration

I'll focus on the "efficiently indenting xml" part for the purposes of this comment (and ticket, hopefully).

For a jvm-only solution, we should have a report on the possibility of (e.g.) hooking a StAXSource [1] into the indenting-transformer [2], before rolling our own. On the other hand, if cljs could profit as well, rolling our own streaming transformer makes sense, even if this could be achieved by other means in the jvm backend.
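
As a starting point for such a report, here is a rough sketch of feeding a StAXSource into an indenting identity Transformer using only JDK classes. The function name indent-via-stax is hypothetical, the indent-amount key is the Xalan-style property that the JDK's built-in transformer is assumed to honor, and nothing here is wired into data.xml:

```clojure
(import '(java.io StringReader StringWriter)
        '(javax.xml.stream XMLInputFactory XMLStreamReader)
        '(javax.xml.transform OutputKeys Transformer TransformerFactory)
        '(javax.xml.transform.stax StAXSource)
        '(javax.xml.transform.stream StreamResult))

(defn indent-via-stax
  "Hypothetical helper: pipes a StAX stream reader through an indenting
  identity Transformer and returns the result as a string."
  [^String xml-str]
  (let [^XMLInputFactory factory (XMLInputFactory/newInstance)
        ^XMLStreamReader reader (.createXMLStreamReader factory (StringReader. xml-str))
        ^Transformer transformer (.newTransformer (TransformerFactory/newInstance))
        out (StringWriter.)]
    ;; Ask the identity transformer to indent its output.
    (.setOutputProperty transformer OutputKeys/INDENT "yes")
    ;; Assumption: Xalan-style key understood by the JDK's default transformer.
    (.setOutputProperty transformer "{http://xml.apache.org/xslt}indent-amount" "2")
    (.transform transformer (StAXSource. reader) (StreamResult. out))
    (str out)))

(println (indent-via-stax "<foo>bar lala <br/> gag</foo>"))
```

The exact whitespace this produces depends on the JDK's Transformer implementation, so it would not by itself give platform-independent output.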

If you squint at your c.d.xml.jvm.indent namespace a little, you might already see the transducer it contains popping out at you. The XMLStreamWriter interface seems like incidental complexity in a simple text-node transformer, for want of a streamable data model. Luckily data.xml, from the very beginning, was built on a streaming event model for xml [3]. I have been planning to support tree-transformations, in the form of transducers over the event stream, and indentation would be an awesome first use-case for this.

What do you think? If you're interested in taking this further, here is a commit, that defines an :event-xform config-option for emit* [4].

(emit-str (parse-str "<foo>bar lala <br/> gag</foo>")
          :event-xform (fn [xf]
                         (fn
                           ([s] (xf s))
                           ([s {:as e :keys [str]}]
                            (-> s
                                (cond-> str (xf (clojure.data.xml.event/->CharsEvent "^.^")))
                                (xf e))))))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>^.^bar lala <br/>^.^ gag</foo>"

[1] https://docs.oracle.com/javase/7/docs/api/javax/xml/transform/stax/StAXSource.html
[2] https://github.com/clojure/data.xml/blob/master/src/main/clojure/clojure/data/xml/jvm/pprint.clj#L15
[3] https://github.com/clojure/data.xml/blob/master/src/main/clojure/clojure/data/xml/event.clj
[4] https://github.com/bendlas/data.xml/commit/0c2baa690154bfa731fa1f98a539542a0205e6b1





[DXML-53] Java 9 changes indented xml output (adds newlines) Created: 09/Jan/18  Updated: 09/Jan/18

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Alex Miller Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Java 9 (vs prior)



 Description   

Using indent-xml, which goes through a print/read roundtrip with a Transformer to do the indenting, now adds newlines to every node under Java 9, so the output differs from earlier Java versions. This was first seen at https://dev.clojure.org/jira/browse/TDEPS-29.

javax.xml.transform.Transformer has had changes in Java 9, presumably due to the update to Xerces-J 2.11.0 (https://xerces.apache.org/xerces2-j/releases.html). Here's a blog outlining some of the effects: http://java9.wtf/xml-transformer/. Possibly also relevant: https://bugs.java.com/view_bug.do?bug_id=JDK-8087303

The last link and some other places I've found hint that LSSerializer (https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/ls/LSSerializer.html) is one possible answer.
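
To make the LSSerializer idea concrete, here is a minimal sketch using only JDK classes. The function name indent-with-ls-serializer is hypothetical, it is not integrated with data.xml, and whether its output is stable across Java versions would still need to be checked:

```clojure
(import '(java.io StringReader)
        '(javax.xml.parsers DocumentBuilder DocumentBuilderFactory)
        '(org.w3c.dom Document)
        '(org.w3c.dom.bootstrap DOMImplementationRegistry)
        '(org.w3c.dom.ls DOMImplementationLS LSSerializer)
        '(org.xml.sax InputSource))

(defn indent-with-ls-serializer
  "Hypothetical helper: parses xml-str into a DOM document and serializes
  it with the DOM Load/Save pretty-printing option switched on."
  [^String xml-str]
  (let [^DocumentBuilder builder (.newDocumentBuilder (DocumentBuilderFactory/newInstance))
        ^Document doc (.parse builder (InputSource. (StringReader. xml-str)))
        ^DOMImplementationLS ls-impl (.getDOMImplementation
                                      (DOMImplementationRegistry/newInstance) "LS")
        ^LSSerializer serializer (.createLSSerializer ls-impl)]
    ;; "format-pretty-print" is the standard DOM L3 Load/Save parameter name.
    (.setParameter (.getDomConfig serializer) "format-pretty-print" true)
    (.writeToString serializer doc)))

(println (indent-with-ls-serializer "<a><b>text</b><c/></a>"))
```

Note that this goes through a full DOM tree, so it keeps the print/read roundtrip of the current approach and only swaps the serializer.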






[DXML-40] README FileWriter example fails if platform default encoding is not UTF-8 Created: 12/Feb/17  Updated: 12/Feb/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Phill Wolf Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

README.md illustrates writing an XML file with java.io.FileWriter. Therefore the example works only if the Java platform's default encoding is UTF-8. Suggestion: The README would present a more widely usable technique by using clojure.java.io/writer, whose default encoding is UTF-8 everywhere.

Sample program using the README example:

(ns garble
  (:require [clojure.data.xml :refer [element emit]]))
(defn -main
  "Tries to write an XML file"
  []
  (let [tags (element :foo {:foo-attr "foo value"}
               (element :bar {:bar-attr "bar value"}
                 (element :baz {} "The baz value")))]
    (with-open [out-file (java.io.FileWriter. "/tmp/foo.xml")]
      (emit tags out-file))))

Invocation 1 (overriding Java's default encoding because Java is inclined to use UTF-8 on my computer):

java -cp ... -Dfile.encoding=US-ASCII clojure.main -m garble

Result:

java.lang.Exception: Output encoding of stream (UTF-8) doesn't match declaration (ASCII)

Invocation 2:

java -cp ... -Dfile.encoding=UTF-8 clojure.main -m garble

Result: successfully writes /tmp/foo.xml
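
A sketch of what the suggested README example could look like with clojure.java.io/writer. The temp-file location and the names tags and out-path are only for illustration; the explicit :encoding option is redundant with the writer's default and is spelled out for clarity:

```clojure
(require '[clojure.data.xml :refer [element emit]]
         '[clojure.java.io :as io])

(def tags
  (element :foo {:foo-attr "foo value"}
    (element :bar {:bar-attr "bar value"}
      (element :baz {} "The baz value"))))

;; Hypothetical output location; a temp file keeps the sketch portable.
(def out-path (java.io.File/createTempFile "foo" ".xml"))

;; io/writer encodes as UTF-8 regardless of the -Dfile.encoding setting.
(with-open [out-file (io/writer out-path :encoding "UTF-8")]
  (emit tags out-file))
```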






[DXML-41] README - applicability of xml-seq, xml-zip? Created: 12/Feb/17  Updated: 14/Feb/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Phill Wolf Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Clojure's core library includes clojure.xml and two other very useful functions evidently designed to work with data from clojure.xml: xml-seq and clojure.zip/xml-zip.

Is it intended that xml-seq and xml-zip work with data from clojure.data.xml and, in particular, its release-0.2 XML-namespace-related improvements?

Let's enhance the clojure.data.xml README to clarify whether, or to what degree, it should be OK to use clojure.data.xml with xml-seq and xml-zip.



 Comments   
Comment by Herwig Hochleitner [ 14/Feb/17 5:17 AM ]

I have used clojure.data.xml with xml-zip (as well as with clojure.data.zip.xml) and it worked as expected. I'd expect the same from xml-seq.

We should verify this behavior in the test suite and announce it in the readme.
Patches welcome.
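
A starting point for such a test, as a minimal sketch: it assumes data.xml is on the classpath and only covers a small document without namespaces, so it says nothing about the namespace-related behavior asked about above.

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.zip :as zip])

(def doc (xml/parse-str "<a><b>one</b><c>two</c></a>"))

;; xml-seq walks elements and text nodes depth-first; keep only the elements.
(def tags (->> (xml-seq doc) (filter map?) (map :tag)))
;; tags => (:a :b :c)

;; xml-zip navigation: first child of <a>, then its right sibling <c>.
(def c-text (-> doc zip/xml-zip zip/down zip/right zip/node :content first))
;; c-text => "two"
```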





[DXML-45] Support UTF-8 XML beginning with BOM Created: 27/Apr/17  Updated: 21/Nov/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Jeff Wong Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

It would be great to be able to parse UTF-8 encoded files that begin with a byte order mark (BOM), as it would give better native support for XML in the wild.

Currently, I'm having a few of these xml files throw a "content not allowed in prolog" exception:
http://stackoverflow.com/questions/4569123/content-is-not-allowed-in-prolog-saxparserexception



 Comments   
Comment by Herwig Hochleitner [ 02/May/17 9:46 AM ]

Following your stackoverflow link, this seems to be related to a couple of Java bugs that are marked as `wontfix` due to expectations of existing tools; the recommendation in the tickets is for applications to deal with the BOM themselves.

Since data.xml promises to process xml from raw bytes (because it accepts InputStreams), there is a choice: Either discontinue the InputStream interface and require users to pass Readers that correctly handle their input (e.g. https://commons.apache.org/proper/commons-io/javadocs/api-2.2/org/apache/commons/io/input/XmlStreamReader.html) or use a Reader implementation that can do so, when creating an input source from a stream.

For ease of maintenance, it's tempting to go with removing the byte-based interface, but I'm open to arguments as to why data.xml should deal with this.

Comment by Jeff Wong [ 02/May/17 12:28 PM ]

This was more of a suggestion - After reading up about input and input streams, I can understand why this may be out of scope.

I was naive in thinking that handling input via a clojure.java.io/reader would be able to parse an xml file properly, as I was unaware of the BOM issues until I hit the exception. Even though the related JVM fix for BOMs would break backwards compatibility and was thus rejected, it would still be helpful if another underlying parsing library handled the input and BOMs.

At least consider adding a recommended list of readers for those unfamiliar with XML parsing in java. It is difficult to anticipate these kinds of gotchas for developers unfamiliar with BOMs, readers, and XML (such as myself), especially when the same files pass validation in other languages.

Comment by Herwig Hochleitner [ 21/Nov/17 8:11 AM ]

I'm just leaving this here, it might be a good reference to mention, when documenting / changing this: https://github.com/jimpil/clj-bom
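
Until this is documented or changed, the "application deals with the BOM itself" approach discussed above can be sketched with plain JDK classes. The helper name skip-bom-reader is hypothetical and not part of data.xml or clj-bom, and it only handles UTF-8 input:

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream PushbackReader Reader))

(defn skip-bom-reader
  "Hypothetical helper: opens input as a UTF-8 reader and drops a leading
  byte order mark (U+FEFF) if one is present."
  ^java.io.Reader [input]
  (let [^Reader base (io/reader input :encoding "UTF-8")
        rdr (PushbackReader. base)
        c (.read rdr)]
    ;; Push the first char back unless it was the BOM or end of input.
    (when-not (or (= c 0xFEFF) (neg? c))
      (.unread rdr (int c)))
    rdr))

;; Usage: a small UTF-8 document prefixed with the BOM bytes EF BB BF.
(def bom-bytes
  (byte-array (concat (map unchecked-byte [0xEF 0xBB 0xBF])
                      (.getBytes "<root>hi</root>" "UTF-8"))))

(def parsed (xml/parse (skip-bom-reader (ByteArrayInputStream. bom-bytes))))
```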





Generated at Wed Jan 24 02:06:37 CST 2018 using JIRA 4.4#649-r158309.