
[DXML-45] Support UTF-8 XML beginning with BOM Created: 27/Apr/17  Updated: 02/May/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Jeff Wong Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

It would be great to be able to parse UTF-8 encoded files that begin with a byte order mark (BOM), as this would give better native support for XML in the wild.

Currently, a few such XML files cause a "Content is not allowed in prolog" exception:
http://stackoverflow.com/questions/4569123/content-is-not-allowed-in-prolog-saxparserexception



 Comments   
Comment by Herwig Hochleitner [ 02/May/17 9:46 AM ]

Following your Stack Overflow link, this seems to be related to a couple of Java bugs that are marked `wontfix` because of the expectations of existing tools. The recommendation in those tickets is for applications to deal with the BOM themselves.

Since data.xml promises to process XML from raw bytes (because it accepts InputStreams), there is a choice: either discontinue the InputStream interface and require users to pass Readers that correctly handle their input (e.g. https://commons.apache.org/proper/commons-io/javadocs/api-2.2/org/apache/commons/io/input/XmlStreamReader.html), or use a Reader implementation that can do so when creating an input source from a stream.

For ease of maintenance, it's tempting to remove the byte-based interface, but I'm open to arguments for why data.xml should deal with this.

Comment by Jeff Wong [ 02/May/17 12:28 PM ]

This was more of a suggestion. After reading up on input and input streams, I can understand why this may be out of scope.

I was naive in thinking that input read via clojure.java.io/reader would parse as XML properly, as I was unaware of the BOM issues until I hit the exception. Even though the related JVM fix for BOMs would break backwards compatibility and was thus rejected, it would still be helpful if another underlying parsing library handled the input and BOMs.

At least consider adding a recommended list of readers for those unfamiliar with XML parsing in Java. It is difficult to anticipate these kinds of gotchas for developers unfamiliar with BOMs, readers, and XML (such as myself), especially when the same files pass validation in other languages.
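
As an illustration of "dealing with the BOM in the application", here is a minimal sketch that strips a leading UTF-8 BOM from an InputStream using only JDK classes. The function name `skip-utf8-bom` is hypothetical and is not part of data.xml; the sketch handles only the UTF-8 BOM (bytes EF BB BF), not UTF-16 or UTF-32 marks.

```clojure
(ns bom-sketch
  (:import (java.io ByteArrayInputStream InputStream PushbackInputStream)))

(defn skip-utf8-bom
  "Hypothetical helper: returns a stream over `in` with a leading
  UTF-8 BOM (EF BB BF) consumed, if one is present."
  ^InputStream [^InputStream in]
  (let [pb  (PushbackInputStream. in 3)
        buf (byte-array 3)
        ;; Read up to 3 bytes; a single read may return fewer, so loop.
        n   (loop [off 0]
              (if (< off 3)
                (let [k (.read pb buf off (- 3 off))]
                  (if (neg? k) off (recur (+ off k))))
                off))
        bom? (and (= n 3)
                  (= [0xEF 0xBB 0xBF] (map #(bit-and % 0xFF) buf)))]
    ;; Not a BOM: push the bytes back so the caller sees them again.
    (when (and (pos? n) (not bom?))
      (.unread pb buf 0 n))
    pb))

(defn bom-demo
  "Reads a BOM-prefixed document through the wrapper."
  []
  (slurp (skip-utf8-bom
          (ByteArrayInputStream. (.getBytes "\uFEFF<a/>" "UTF-8")))))
```

Since the ticket notes that data.xml accepts InputStreams, the wrapped stream could presumably be handed to the parser in place of the raw one. The Commons IO `XmlStreamReader` linked above is a more complete alternative that also detects the encoding.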





[DXML-42] Some reflection warnings `[org.clojure/data.xml "0.2.0-alpha2"]` Created: 20/Mar/17  Updated: 20/Mar/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Harold Hausman Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

When I include `[org.clojure/data.xml "0.2.0-alpha2"]`, I encounter these reflection warnings on compilation:
```
Reflection warning, clojure/data/xml/jvm/parse.clj:54:24 - call to method getNamespaceURI on javax.xml.stream.XMLStreamReader can't be resolved (argument types: unknown).
Reflection warning, clojure/data/xml/jvm/parse.clj:128:5 - call to method createXMLStreamReader can't be resolved (target class is unknown).
Reflection warning, clojure/data/xml/jvm/emit.clj:97:33 - call to method isLoggable can't be resolved (target class is unknown).
Reflection warning, clojure/data/xml/jvm/emit.clj:98:28 - call to method log can't be resolved (target class is unknown).
Reflection warning, clojure/data/xml/jvm/emit.clj:123:27 - reference to field writeEndElement can't be resolved.
Reflection warning, clojure/data/xml/jvm/emit.clj:125:38 - call to method writeCharacters can't be resolved (target class is unknown).
Reflection warning, clojure/data/xml/jvm/emit.clj:129:38 - call to method writeComment can't be resolved (target class is unknown).
```

Hope that helps. Thanks for this library, it is pleasant to work with and has saved me many hours.
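
Warnings of this kind usually disappear once the target of the interop call carries a type hint. The following is a generic sketch of that technique, not the actual code in parse.clj or emit.clj; the function names are made up for illustration.

```clojure
(set! *warn-on-reflection* true)

(import '(java.io StringWriter)
        '(javax.xml.stream XMLOutputFactory XMLStreamWriter))

;; Unhinted: the compiler cannot know the class of `writer`, so it
;; emits a "call to method writeCharacters can't be resolved" warning
;; and falls back to reflection at runtime.
(defn write-text-reflective [writer text]
  (.writeCharacters writer text))

;; Hinted: the call is resolved at compile time, with no warning.
(defn write-text [^XMLStreamWriter writer ^String text]
  (.writeCharacters writer text))

(defn hint-demo
  "Writes a tiny document through the hinted function."
  []
  (let [out (StringWriter.)
        w   (.createXMLStreamWriter (XMLOutputFactory/newInstance) out)]
    (.writeStartElement w "a")
    (write-text w "hi")
    (.writeEndElement w)
    (.flush w)
    (str out)))
```

Both functions behave the same; the hinted one avoids the reflective lookup on every call.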






[DXML-41] README - applicability of xml-seq, xml-zip? Created: 12/Feb/17  Updated: 14/Feb/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Phill Wolf Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Clojure's core library includes clojure.xml and two other very useful functions evidently designed to work with data from clojure.xml: xml-seq and clojure.zip/xml-zip.

Is it intended that xml-seq and xml-zip work with data from clojure.data.xml and, in particular, its release-0.2 XML-namespace-related improvements?

Let's enhance the clojure.data.xml README to clarify whether, or to what degree, it should be OK to use clojure.data.xml with xml-seq and xml-zip.



 Comments   
Comment by Herwig Hochleitner [ 14/Feb/17 5:17 AM ]

I have used clojure.data.xml with xml-zip (as well as with clojure.data.zip.xml) and it worked as expected. I'd expect the same from xml-seq.

We should verify this behavior in the test suite and document it in the README.
Patches welcome.
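
A rough sketch of the kind of check such a test could make. It assumes that data.xml elements expose `:tag`, `:attrs`, and `:content` keys in the same way clojure.xml maps do; plain maps stand in for parsed elements here so the snippet needs nothing beyond Clojure itself.

```clojure
(ns traverse-sketch
  (:require [clojure.zip :as zip]))

;; Stand-in for a parsed document (assumption: same shape as an element).
(def doc
  {:tag :foo
   :attrs {:foo-attr "foo value"}
   :content [{:tag :bar :attrs {} :content ["one"]}
             {:tag :baz :attrs {} :content ["two"]}]})

;; xml-seq walks the tree depth-first; strings have no :tag and are dropped.
(def all-tags (keep :tag (xml-seq doc)))

;; xml-zip navigation: first child, then its right sibling.
(def second-child-tag
  (-> (zip/xml-zip doc) zip/down zip/right zip/node :tag))

;; xml-zip editing: change a child and rebuild the document.
(def edited
  (-> (zip/xml-zip doc)
      zip/down
      (zip/edit assoc-in [:attrs :seen] "yes")
      zip/root))
```

With real data.xml output, the same calls would be applied to the result of parsing instead of `doc`.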





[DXML-40] README FileWriter example fails if platform default encoding is not UTF-8 Created: 12/Feb/17  Updated: 12/Feb/17

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Phill Wolf Assignee: Ryan Senior
Resolution: Unresolved Votes: 0
Labels: None


 Description   

README.md illustrates writing an XML file with java.io.FileWriter. Therefore the example works only if the Java platform's default encoding is UTF-8. Suggestion: The README would present a more widely usable technique by using clojure.java.io/writer, whose default encoding is UTF-8 everywhere.

Sample program using the README example:

```
(ns garble
  (:require [clojure.data.xml :refer [element emit]]))

(defn -main
  "Tries to write an XML file"
  []
  (let [tags (element :foo {:foo-attr "foo value"}
               (element :bar {:bar-attr "bar value"}
                 (element :baz {} "The baz value")))]
    (with-open [out-file (java.io.FileWriter. "/tmp/foo.xml")]
      (emit tags out-file))))
```

Invocation 1 (overriding Java's default encoding because Java is inclined to use UTF-8 on my computer):

java -cp ... -Dfile.encoding=US-ASCII clojure.main -m garble

Result:

java.lang.Exception: Output encoding of stream (UTF-8) doesn't match declaration (ASCII)

Invocation 2:

java -cp ... -Dfile.encoding=UTF-8 clojure.main -m garble

Result: successfully writes /tmp/foo.xml
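
For comparison, a sketch of the same program with the suggested change applied. It assumes `org.clojure/data.xml` is on the classpath and reuses only the `element` and `emit` calls from the sample above. Per the suggestion, `clojure.java.io/writer` already defaults to UTF-8; the explicit `:encoding` option is redundant and is included only to make the intent visible.

```clojure
(ns garble
  (:require [clojure.data.xml :refer [element emit]]
            [clojure.java.io :as io]))

(defn -main
  "Writes an XML file as UTF-8, independent of -Dfile.encoding"
  []
  (let [tags (element :foo {:foo-attr "foo value"}
               (element :bar {:bar-attr "bar value"}
                 (element :baz {} "The baz value")))]
    ;; io/writer picks the encoding itself instead of using the
    ;; platform default, unlike java.io.FileWriter.
    (with-open [out-file (io/writer "/tmp/foo.xml" :encoding "UTF-8")]
      (emit tags out-file))))
```

This version should give the same result under both invocations shown above.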






[DXML-22] Adding hiccup generation function for elements Created: 24/Feb/14  Updated: 07/Dec/16

Status: Open
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Chris Zheng Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

N/a



 Description   

This is for completeness really. See pull request https://github.com/clojure/data.xml/pull/10

I would like to:

  • generate an element using hiccup (already exists)
  • generate hiccup using an element (proposed)


 Comments   
Comment by Chris Zheng [ 28/Mar/14 7:22 AM ]

I'm hoping someone can at least give some feedback on this ticket.

Comment by Ryan Senior [ 28/Mar/14 7:53 AM ]

Hi Chris,

Thanks for the reminder on this. I'll have more time to dig in this weekend, but off the top of my head I think more will need to be done on this, both on implementation and on testing. I think what you have now won't work with comments or CDATA. One way to flesh some of that out is to create round-trip tests in src/test/clojure/clojure/data/xml/test_sexp.clj.
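
To make the proposal concrete, here is a minimal sketch of an element-to-hiccup conversion. It is not the code from the pull request; the name `element->hiccup` is hypothetical, and plain maps with `:tag`, `:attrs`, and `:content` keys are assumed to stand in for data.xml elements. As noted in the comment above, comments and CDATA need separate handling; this sketch simply passes any non-element node through unchanged.

```clojure
(defn element->hiccup
  "Hypothetical sketch: turns an element-shaped map into a hiccup
  vector. Anything without a :tag is returned as-is."
  [node]
  (if (and (map? node) (:tag node))
    (let [{:keys [tag attrs content]} node
          ;; Omit the attribute map when there are no attributes.
          head (if (seq attrs) [tag attrs] [tag])]
      (into head (map element->hiccup) content))
    node))

(def sample
  {:tag :foo
   :attrs {:foo-attr "foo value"}
   :content [{:tag :bar :attrs {} :content ["The bar value"]}]})

(def sample-hiccup (element->hiccup sample))
```

A round-trip test of the kind suggested would convert hiccup to an element with the existing function, convert back with the new one, and compare the result with the input.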





Generated at Thu May 25 00:09:21 CDT 2017 using JIRA 4.4#649-r158309.