<< Back to previous view

[DXML-26] Disable external entities resolution in the default XML parser to prevent XXE attacks Created: 27/Aug/14  Updated: 28/Sep/14  Resolved: 28/Sep/14

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Critical
Reporter: Carlo Sciolla Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: security

Attachments: Text File 0001-Prevent-XXE-attacks-by-disabling-external-entities-r.patch    
Patch: Code and Test

 Description   

The default behavior of Java XML parsers is to happily resolve external XML entities, which exposes any application that processes unsecured XMLs to XXE vulnerabilities.

By default data.xml should initialize the XML parses with disabled XXE processing.



 Comments   
Comment by Ryan Senior [ 28/Sep/14 7:23 AM ]

Patch looks good, I've applied it. Thanks Carlo

Comment by Carlo Sciolla [ 28/Sep/14 11:51 AM ]

Great, thanks!





[DXML-1] Stack overflow when parsing huge XML file Created: 10/Feb/12  Updated: 20/Mar/12  Resolved: 20/Mar/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Justin Kramer Assignee: Ryan Senior
Resolution: Completed Votes: 1
Labels: patch,
Environment:

OS X


Attachments: Text File data-xml-kwopts.patch    
Patch: Code and Test

 Description   

This is using Ryan Senior's new 0.0.3-SNAPSHOT.

While trying to parse a huge XML file (7.5 GB compressed, a dump of Wikipedia), got a stack overflow error. Some digging turned up this bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6440214

Modifying clojure.data.xml/source-seq to disable the IS_COALESCING property got rid of the error.

The old lazy-xml contrib code worked (although used up tons more memory).

Attached is a patch that adds keyword options to source-seq, parse, and parse-str, allowing the consumer to disable coalescing and sidestep the upstream bug.



 Comments   
Comment by Ryan Senior [ 20/Mar/12 8:05 AM ]

Thanks Justin!





[DXML-8] Cannot pass strings when keywords are expected Created: 27/Sep/12  Updated: 26/Jul/13  Resolved: 14/Nov/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Brian Siebert Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None
Environment:

Windows 7


Patch: Code

 Description   

This error does not present till you attempt to emit xml that has a string where the element function is expecting a keyword. This is double hard to figure out at first because the error message is vague. I am requesting that the element function is allowed to use strings instead of keywords or the error message is cleaned up so that the "user" error is clear.



 Comments   
Comment by Ryan Senior [ 14/Nov/12 7:28 AM ]

I have added this. Supporting keywords and strings seems to be common in some of the other contrib libraries. Now you can use the keyword :foo or the string "foo" for tags and attributes.





[DXML-5] OutOfMemory errors when emitting large XML documents Created: 27/Apr/12  Updated: 26/Jun/12  Resolved: 26/Jun/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ryan Senior Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

Emitting large XML documents, fed from lazy-seqs in data.xml does not work. Currently, the lazy-seq is held in a defrecord, which holds onto the head of the lazy-seq and will force it to all be in memory (eventually consuming all available memory). Example code to reproduce the issue below:

Unable to find source-code formatter for language: clojure. Available languages are: javascript, sql, xhtml, actionscript, none, html, xml, java
(with-open [fw (java.io.FileWriter. "/tmp/lots-of-foo.xml")]
    (xml/emit
       (Element. :some-tags
           {}
           (map #(Element. :foo {} [(str "foo" %)])
                (range 0 10000000)))
       fw))


 Comments   
Comment by Ryan Senior [ 22/May/12 10:57 AM ]

Fixed

Comment by Ryan Senior [ 26/Jun/12 12:18 PM ]

Found this to be fixed only in the simplest case. If you have a large lazy-seq nested below 2+ tags it will hold onto the head of the lazy-seq and consume memory.

Comment by Ryan Senior [ 26/Jun/12 1:37 PM ]

Added an intermediate step to emitting elements to the stream writer. Now elements get flattened to a stream of events that get written to the stream writer.

Comment by Ryan Senior [ 26/Jun/12 1:37 PM ]

Not sure how to set a "Fix Version" in Jira, but this was fixed in 0.0.5





[DXML-11] Support cdata with sexp-as-element Created: 21/Nov/12  Updated: 08/Jan/13  Resolved: 08/Jan/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jeff Weiss Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

prxml allowed something like this:

(prxml [:foo [:cdata! "all my cdata"]])

It doesn't look like that is currently allowed in data.xml. It looked like maybe I could extend the AsElements protocol to get this behavior, but I couldn't quite figure it out, seems like I'd have to have access to the XmlStreamWriter to get the string representation of the cdata.



 Comments   
Comment by Ryan Senior [ 08/Jan/13 10:07 PM ]

Added, released in 0.0.7





[DXML-12] Do the right thing if cdata content contains the cdata end-tag "]]>" Created: 21/Nov/12  Updated: 08/Jan/13  Resolved: 08/Jan/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jeff Weiss Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

(xml/emit-str (xml/cdata "fooo]]>bar"))

"<?xml version=\"1.0\" encoding=\"UTF-8\"?><![CDATA[fooo]]>bar]]>"

This is invalid xml. The contract for cdata states that it cannot contain the end tag "]]>", so if the cdata function gets passed content that contains it, it should do the right thing, which is probably this:

http://stackoverflow.com/questions/223652/is-there-a-way-to-escape-a-cdata-end-token-in-xml

(split the content so it is emitted as multiple cdata blocks, none of which contain the entire end-tag "]]>").

This is not a purely academic bug report - I actually hit this problem in prxml and fixed it on my fork.



 Comments   
Comment by Ryan Senior [ 08/Jan/13 10:07 PM ]

Fixed, released in 0.0.7





[DXML-3] Build release on JDK 1.6 Created: 17/Feb/12  Updated: 24/Feb/12  Resolved: 24/Feb/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Stuart Sierra Assignee: Alan Malloy
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File jdk_16_jobs.patch    
Approval: Vetted

 Description   

See https://groups.google.com/d/topic/clojure-dev/Z-wrRTcUs6U/discussion



 Comments   
Comment by Ryan Senior [ 20/Feb/12 9:58 PM ]

Patch for adding JDK version to a Hudson job config

Comment by Stuart Sierra [ 24/Feb/12 3:13 PM ]

Patch applied to build.ci. Rebuilding Hudson configs now.





[DXML-6] data.xml tests fail on clojure 1.2.0 and 1.2.1 Created: 22/May/12  Updated: 22/May/12  Resolved: 22/May/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ryan Senior Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

See the test matrix here: http://build.clojure.org/job/data.xml-test-matrix/. Looks like the mixed-quotes test is to blame, just a reordering of attributes when they are emitted to a string.



 Comments   
Comment by Ryan Senior [ 22/May/12 12:54 PM ]

Tests now run successfully on 1.2.0 and 1.2.1





[DXML-2] lein deps fails Created: 17/Feb/12  Updated: 22/May/12  Resolved: 22/May/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ralph Möritz Assignee: Ryan Senior
Resolution: Declined Votes: 0
Labels: None
Environment:

Leiningen


Attachments: File project.clj    

 Description   

C:\Users\ralphm\workspace\dbxml-env>lein version
Leiningen 1.6.2 on Java 1.7.0_02 Java HotSpot(TM) Client VM
C:\Users\ralphm\workspace\dbxml-env>lein deps
Downloading: org/clojure/data.xml/0.0.2-SNAPSHOT/data.xml-0.0.2-SNAPSHOT.pom from repository clojars at http://clojars.org/repo/
Unable to locate resource in repository
[INFO] Unable to find resource 'org.clojure:data.xml:pom:0.0.2-SNAPSHOT' in repository clojars (http://clojars.org/repo/)
Downloading: org/clojure/data.xml/0.0.2-SNAPSHOT/data.xml-0.0.2-SNAPSHOT.jar from repository clojars at http://clojars.org/repo/
Unable to locate resource in repository
[INFO] Unable to find resource 'org.clojure:data.xml:jar:0.0.2-SNAPSHOT' in repository clojars (http://clojars.org/repo/)
An error has occurred while processing the Maven artifact tasks.
Diagnosis:

Unable to resolve artifact: Missing:
----------
1) org.clojure:data.xml:jar:0.0.2-SNAPSHOT

Try downloading the file manually from the project website.

Then, install it using the command:
mvn install:install-file -DgroupId=org.clojure -DartifactId=data.xml -Dversion=0.0.2-SNAPSHOT
-Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=org.clojure -DartifactId=data.xml -Dversion=0.0.2-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
1) org.apache.maven:super-pom:pom:2.0
2) org.clojure:data.xml:jar:0.0.2-SNAPSHOT

----------
1 required artifact is missing.

for artifact:
org.apache.maven:super-pom:pom:2.0

from the specified remote repositories:
central (http://repo1.maven.org/maven2),
clojars (http://clojars.org/repo/)



 Comments   
Comment by Ryan Senior [ 20/Feb/12 10:03 PM ]

As far as I know, there haven't been any releases of data.xml (SNAPSHOT or regular) to the maven repositories. I'm working on this and will hopefully have something out soon.

Comment by Ryan Senior [ 22/May/12 10:56 AM ]

data.xml doesn't get deployed to clojars. Look for it in maven central: http://search.maven.org/#search|ga|1|data.xml . 0.0.3 is the most recent version released, but 0.0.4 will be released soon.





[DXML-19] data.xml should ship a copy of the EPL license in LICENSE Created: 29/Jul/13  Updated: 14/Aug/13  Resolved: 14/Aug/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Wolodja Wentland Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

One requirement for licensing code under the EPL is that "a copy of this Agreement [the EPL]
must be included with each copy of the Program." [0]. Unfortunately data.xml does not comply
with this requirement even though its README.md claims that it is licensed under the EPL.

Please fix this issue and release a new version of data.xml as it is not legally distributable
in its current form.

[0] http://www.eclipse.org/legal/epl-v10.html → 3. REQUIREMENTS



 Comments   
Comment by Ryan Senior [ 14/Aug/13 11:51 PM ]

Good catch. I've added the EPL file, it will be in the next release.





[DXML-17] Embedded CDATA end tags are not properly handled Created: 19/Jun/13  Updated: 14/Aug/13  Resolved: 14/Aug/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Jeff Weiss Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File dxml17.patch    

 Description   

user> (xml/indent-str (xml/sexp-as-element [:pre [:-cdata "foo]]>bar"]]))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><pre><![CDATA[foo]]><![CDATA[bar]]></pre>\n"

What's being emitted here is cdata for foobar not foo]]>bar.

What needs to be done is break up the embedded ]]> so that the first two characters are in one cdata block, and the last character is in the next block.

The tests are wrong, as far as I can tell. I think I have fixed the code and the tests, I just need to figure out how to run the tests and submit a patch.



 Comments   
Comment by Jeff Weiss [ 20/Jun/13 8:16 AM ]

Patch that fixes issue and tests

Comment by Jeff Weiss [ 20/Jun/13 8:18 AM ]

And just to clear up what the issue is, currently if the cdata contains the cdata end tag "]]>" it is just dropped and when the xml is read in those characters are gone.

That is not correct behavior, the cdata should be able to contain any arbitrary characters without any loss of data, and the attached patch will allow this.

Comment by Ryan Senior [ 14/Aug/13 11:52 PM ]

Thanks for the patch! It's been pushed up and will be in the next release.





[DXML-14] IllegalArgumentException when trying to emit a boolean value Created: 07/Mar/13  Updated: 10/Nov/13  Resolved: 10/Nov/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ed O'Loughlin Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None
Environment:

JRE 1.7, OS X 10.7.5, Clojure 1.4 & 1.5, data.xml 0.0.7



 Description   

I can create an element with a boolean value but I can't emit it...

user=> (emit-str (element :something {} false))
IllegalArgumentException No implementation of method: :gen-event of protocol: #'clojure.data.xml/EventGeneration found for class: java.lang.Boolean clojure.core/-cache-protocol-fn (core_deftype.clj:541)



 Comments   
Comment by Ryan Senior [ 10/Nov/13 10:41 PM ]

Thanks for the bug report. The fix is in ffd6957baa0cf752fa0678be7f2a3393eab16739 and should be released with 0.0.8.





[DXML-24] parse can be extremely slow for certain input data Created: 07/Jun/14  Updated: 28/Sep/14  Resolved: 28/Sep/14

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Sean Corfield Assignee: Ryan Senior
Resolution: Declined Votes: 0
Labels: None


 Description   

I'm still doing some experiments but parse seems to take a very long time to deal with this URL http://www.cybletechnologies.com/?feed=rss2 and I wonder if it's due to huge CDATA piece containing JS code?

I'll do some more experimentation to narrow it down but wanted to get at least a placeholder bug in play in case this was a known issue.



 Comments   
Comment by Ryan Senior [ 28/Sep/14 10:09 AM ]

I profiled this. The problem looks to be with the DTD calls. I've not done a lot of stuff with DTD, but it looks like the StAX parser is making a bunch of HTTP calls for things referenced by the DTD. First it pulls in http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd, my guess is it's resolving a bunch of stuff referenced from that DTD. The parse did eventually finish but took around 10 minutes on my laptop.

If I pass in :support-dtd false to the parse call, it returns very quickly for me, around 4 milliseconds.

(parse (java.io.FileInputStream. "path/to/file.html") :support-dtd false)

I'm going to close this as the behavior seems to be correct from a StAX perspective, and it :support-dtd false seems to be a pretty reasonable work around.





[DXML-34] Issue parsing and then emitting xml:lang attribute. Created: 01/Sep/16  Updated: 03/Oct/16  Resolved: 03/Oct/16

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ferris Ellis Assignee: Ryan Senior
Resolution: Declined Votes: 0
Labels: bug
Environment:

Using org.clojure/data.xml "0.0.8", org.clojure/clojure "1.9.0-alpha10", and Leiningen 2.6.1 on Java 1.8.0_76 OpenJDK 64-Bit Server VM.



 Description   

There is an issue occurring somewhere in the chain of:

```
(require '[clojure.data.xml :as xml])
(-> "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo xml:lang=\"en\"></foo>" xml/parse-str pprint)
```

Which results in the following error:

```
XMLStreamException Prefix cannot be null com.sun.xml.internal.stream.writers.XMLStreamWriterImpl.writeAttribute (XMLStreamWriterImpl.java:575)
```

It appears, after some brief exploration, that this comes from emitting the string which is parsed into a keyword with a namespace. The following results show this:

```
user=> (-> "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo xml:lang=\"en\"></foo>" xml/parse-str pprint)
{:tag :foo, :attrs {:xml/lang "en"}, :content ()}
```

and

```
user=> (xml/emit-str (xml/element :foo {:xml:lang "en"}))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo xml:lang=\"en\"></foo>"
```



 Comments   
Comment by Ferris Ellis [ 01/Sep/16 1:13 PM ]

There is an error in my report, the first code example, related to what causes the error, should read:

(-> "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo xml:lang=\"en\"></foo>" xml/parse-str xml/emit-str)

Comment by Herwig Hochleitner [ 03/Oct/16 5:47 PM ]

data.xml-0.0.8 doesn't have namespace support. Your first example works, starting with -beta1
Your second example works, because you're directly emitting xml:lang and the xml: prefix happens to be implicitly defined in xml.

Comment by Herwig Hochleitner [ 03/Oct/16 5:49 PM ]

Won't fix for data.xml-0.0.x





[DXML-15] data.xml can't parse own output if there's a colon in an attribute name Created: 03/Apr/13  Updated: 03/Oct/16  Resolved: 03/Oct/16

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: ben wolfson Assignee: Ryan Senior
Resolution: Completed Votes: 1
Labels: None
Environment:

data.xml 0.0.7



 Description   

Observe:

> (x/emit-str (x/element :NC {"xmlns" "http://example.com" "xmlns:xsi" "http://www.w3.org/2001/XMLSchema-instance" "xsi:schemaLocation" "http://www.example.com/schema.xsd"} (x/element :Foo {} "bar")))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><NC xsi:schemaLocation=\"http://www.example.com/schema.xsd\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://example.com\"><Foo>bar</Foo></NC>"
> (x/parse-str *1)
#clojure.data.xml.Element{:tag :NC, :attrs {:xsi/schemaLocation "http://www.example.com/schema.xsd"}, :content (#clojure.data.xml.Element{:tag :Foo, :attrs {}, :content ("bar")})}
a> (x/emit-str *1)
XMLStreamException Prefix cannot be null com.sun.xml.internal.stream.writers.XMLStreamWriterImpl.writeAttribute (XMLStreamWriterImpl.java:574)
app.services.external.experian.internal.test-data>

(a) the xmlns and xmlns:xsi attributes have disappeared. Not the point of this issue but worth pointing out.
(b) "xsi:schemaLocation" has become :xsi/schemaLocation
(c) emitting a string blows up.



 Comments   
Comment by Herwig Hochleitner [ 03/Oct/16 6:11 PM ]

This should be resolved with the namespacing support in the 0.1.0 betas.





[DXML-7] cannot change encoding when using the indent function Created: 27/Sep/12  Updated: 09/Oct/12  Resolved: 09/Oct/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Brian Siebert Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None
Environment:

Window 7



 Description   

When using the Indent function, and trying to change the encoding, an exception is thrown.

java.lang.IllegalArgumentException: No value supplied for key: [:encoding "UTF-8"]

This seems to be that the options are not being passed from indent to emit correctly.



 Comments   
Comment by Ryan Senior [ 09/Oct/12 10:46 PM ]

Thanks for finding the bug. It's fixed in the repo and will be included in the next release.





[DXML-9] Remove some use of reflection in data.xml Created: 28/Oct/12  Updated: 26/Jul/13  Resolved: 14/Nov/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File dxml-9-remove-reflection-v1.txt    
Patch: Code and Test

 Description   

There are a couple of occurrences of reflection in the data.xml library



 Comments   
Comment by Andy Fingerhut [ 28/Oct/12 6:10 PM ]

dxml-9-remove-reflection-v1.txt dated Oct 28 2012 removes one use of reflection in data.xml. There is still one remaining, to which I have added a comment explaining why it cannot be removed with a single type hint.

Comment by Ryan Senior [ 14/Nov/12 7:26 AM ]

Thanks Andy. Will be in the next release.





[DXML-16] Eliminate reflection in emit-cdata Created: 25/Apr/13  Updated: 14/Aug/13  Resolved: 14/Aug/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File dxml-16-eliminate-relfection-in-emit-cdata-patch-v1.txt    

 Description   

Solvable with a type hint on emit-cdata arg 'writer'



 Comments   
Comment by Andy Fingerhut [ 25/Apr/13 1:30 PM ]

Patch dxml-16-eliminate-relfection-in-emit-cdata-patch-v1.txt dated Apr 25 2013 eliminates a couple of uses of reflection in function emit-cdata.

Comment by Ryan Senior [ 14/Aug/13 11:50 PM ]

Thanks Andy, just pushed up your patch.

Comment by Ryan Senior [ 14/Aug/13 11:53 PM ]

Accidentally marked as closed





[DXML-21] Some unit tests are never run because their names are the same Created: 23/Dec/13  Updated: 16/Apr/14  Resolved: 16/Apr/14

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Andy Fingerhut Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None

Attachments: File dxml-21-v1.diff    

 Description   

Whenever two deftest statements have the same name, the first is ignored and its tests are never run.

In namespace clojure.data.xml.test-emit, there are two deftests with name defaults, and two with the name test-indent.

Found using pre-release version of Eastwood Clojure lint tool.



 Comments   
Comment by Andy Fingerhut [ 23/Dec/13 5:50 PM ]

Patch dxml-21-v1.diff makes all deftest names unique. I have not attempted to correct any new failing tests, if any.

Comment by Ryan Senior [ 16/Apr/14 4:18 PM ]

Patch applied





[DXML-36] Make location info like line number, column number accessible from elements Created: 16/Nov/16  Updated: 27/Nov/16  Resolved: 27/Nov/16

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Tamas Jung Assignee: Herwig Hochleitner
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File 0001-add-location-info.patch     Text File 0002-get-location-at-the-beginning-of-an-element.patch    
Patch: Code and Test

 Description   

In the domain of manual validation, error reporting accessing the line number is crucial. The patch makes this possible.

(deftest test-location-meta
(let [input "<a/>"]
(is (= 1 (-> input (parse-str :with-location-meta true) meta :line-number)))))

https://github.com/clojure/data.xml/compare/master...lambdawerk:add-location-meta?expand=1



 Comments   
Comment by Herwig Hochleitner [ 19/Nov/16 3:09 PM ]

I'd welcome that change, but I'm having a hard time reviewing your patch. It looks like it includes a re-indent of pull-seq. Could you get rid of that, or split it into a second commit, to make the patch easier to consume?

Also, assuming you already signed a CA, could you attach your changes to this ticket via `git format-patch`?

thanks

Comment by Tamas Jung [ 20/Nov/16 8:55 AM ]

Just signed the the CA, reduced the identation change, add w=1 to the url:
https://github.com/clojure/data.xml/compare/master...lambdawerk:add-location-meta?expand=1&w=1

If you like to keep lines under 80 as I do then do not apply the 0002 patch.

Comment by Herwig Hochleitner [ 22/Nov/16 9:22 PM ]

I didn't know about github's `w=1`. Nifty.

How do you feel about changing the api to:

  • (parse-str _ :location-info false)
    ;; defaults to true
  • ;; if true
    -> ^{:clojure.data.xml/location-info {:character-offset _ :column-number _ :line-number _}} _

?

Concerning the patch:

  • Events don't need equality, so you can add an event slot for location info
  • `pull-seq` is internal, so you can just change its arguments. This should also get rid of the massive indentation change.
    I'd merge `include-node?` and the argument for location info into an `opts` argument

Additional bikeshedding, feel free to ignore:

  • For the opts argument of `event-tree`, consider
    (defn event-tree [_ {:keys [event-element event-exit? event-node]
                         :or {event-element tree/event-element .... }}] ...)

    This lets the caller know, what options there are, and what the default values are.

[EDIT ADD]

  • That `with-meta` + `merge` pattern in `event-element-with-meta` is covered by `vary-meta` + `into`, but adding location metadata should probably just be done by `event-element` (if present on event), in order to avoid the additional allocation, induced by `with-meta`.
Comment by Tamas Jung [ 23/Nov/16 9:35 AM ]

A squashed patch, ignore the previous ones.

Comment by Tamas Jung [ 23/Nov/16 9:38 AM ]

Thank you for the comments. Attached 0001-add-location-info.patch, a squashed patch.

Comment by Herwig Hochleitner [ 23/Nov/16 1:19 PM ]

Thanks for the new patch. It seems that we are mostly on the same page now, as far as functionality goes, though I'm still not convinced of the necessity of a separate `event-element-with-meta`, when `event-element` could already find location-info (or their lack) on events and could then construct nodes with the appropriate metadata from the start. Remember, `with-meta` or `vary-meta` will copy the object, which seems wasteful for a function that gets called for every START_ELEMENT event, especially now that adding location info is the default.

Your editor still seems to disagree about the amount of whitespace to use for indentation. That's not a problem per se, but including all those changed lines, that only differ in whitespace, into your patch, concerns me because it makes life harder not only for reviewers here, but also for anybody trying to look at the revision history, be it via `git log`, `git blame`, ...

Please try to submit minimal and focused patches. That is, patches where every single change is justified by the goal stated in the commit message.

I truly regret git's diffs being line-based and not based on syntax trees, but we have to work within the constraints of our preset process.
I hope that my nit-picking doesn't seem petty to you, my hope is that I can help you to avoid similar friction in your hopefully many future contributions.

If you want any further guidance before submitting another patch, feel free to contact me @bendlas on irc, github, ...

Comment by Tamas Jung [ 24/Nov/16 4:44 AM ]

Refreshed the patch. See this new branch: https://github.com/clojure/data.xml/compare/master...lambdawerk:add-location-info?expand=1 (without w=1 !!). Agree, the diff shows a much cleaner picture now, blame is not cheated, the whole business with higher order options is removed.

Comment by Herwig Hochleitner [ 24/Nov/16 10:39 PM ]

Thanks! Tests still run and formatting is ok now.
The issue that you mention in the test files, about character offset and column number being "odd":
After some investigation, it seems to me that .getLocation returns the current stream position of the reader. So if we wanted to locate the beginning of each START_ELEMENT, it might make sense to record the latest location for each event and base any encountered START_ELEMENT's location on that, what do you think?

Comment by Tamas Jung [ 25/Nov/16 3:22 AM ]

Yeah, absolutely makes sense, pls find the additional patch. Unfortunately ignoring whitespaces in diff comes in handy again
https://github.com/clojure/data.xml/compare/master...lambdawerk:add-location-info?expand=1&w=1

Comment by Herwig Hochleitner [ 27/Nov/16 10:43 AM ]

Squashed the patches and added a ticket number to the commit message: https://github.com/clojure/data.xml/commit/144f799507a5041a00ba683681ef41300eb7c9ca

Thanks!





[DXML-29] ClojureScript support Created: 13/Nov/15  Updated: 07/Dec/16  Resolved: 07/Dec/16

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Artem Yarulin Assignee: Herwig Hochleitner
Resolution: Completed Votes: 0
Labels: None


 Description   

Parsing XML(and HTML) is very often task and currently it requires a lot of effort and use of different libraries and wrappers around JS libs.

We can dramatically simplify it if we migrate data.xml to ClojureScript.

I'm maintaining koh library which has proof-of-concept XML support for Node and Browsers https://github.com/artemyarulin/koh/blob/master/src/koh/xml.cljs

The general idea is that we can utilise DOMParser object in order to parse string to XML tree. It has a decent browser support https://developer.mozilla.org/en-US/docs/Web/API/DOMParser and has a NodeJS implementation which follows the same API https://github.com/jindw/xmldom

What do you think about this feature?
I can pretty much follow the same guidelines from http://dev.clojure.org/jira/browse/DZIP-5 and implement it in the same way

Thanks



 Comments   
Comment by Herwig Hochleitner [ 04/Aug/16 2:40 PM ]

I'm working on clojurescript support in a separate branch: https://github.com/clojure/data.xml/tree/cljs
It can extend browser-native DOM objects to work with clojure data protocols.
Let me know what you think.

Comment by Artem Yarulin [ 05/Aug/16 4:18 AM ]

Oh, that's super cool! If you need any testing - feel free to ping me, I'll be happy to help

Comment by Herwig Hochleitner [ 08/Aug/16 1:37 PM ]

Thanks for the offer! In fact, the clojurescript support is in a testable state right now and I'm utilizing it for a cljs webdav library: https://github.com/webnf/davstore/blob/master/src/server/webnf/davstore/dav/xml.cljc

The main thing preventing it from being ready is, that I've utilized .cljc and reader tags. As of my last information, our CI doesn't support .cljc yet, for the test matrix and also it's unclear how to target clojure versions < 1.7

I'd be happy to the clojurescript support tested, though. We can still copy+paste into separate clj and cljs files, when we decide to release, if cljc support for contrib projects isn't ready by then.

Comment by Herwig Hochleitner [ 07/Dec/16 9:48 PM ]

will be released as part of 0.2.0-alpha1





Generated at Fri Dec 09 02:11:47 CST 2016 using JIRA 4.4#649-r158309.