
[DXML-1] Stack overflow when parsing huge XML file Created: 10/Feb/12  Updated: 20/Mar/12  Resolved: 20/Mar/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Justin Kramer Assignee: Ryan Senior
Resolution: Completed Votes: 1
Labels: patch
Environment:

OS X


Attachments: Text File data-xml-kwopts.patch    
Patch: Code and Test

 Description   

This is using Ryan Senior's new 0.0.3-SNAPSHOT.

While trying to parse a huge XML file (7.5 GB compressed, a dump of Wikipedia), I got a stack overflow error. Some digging turned up this JDK bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6440214

Modifying clojure.data.xml/source-seq to disable the IS_COALESCING property got rid of the error.

The old lazy-xml contrib code worked, although it used far more memory.

Attached is a patch that adds keyword options to source-seq, parse, and parse-str, allowing the consumer to disable coalescing and sidestep the upstream bug.
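The effect of the coalescing setting can be seen at the StAX level, without data.xml. This is a minimal sketch and not the attached patch: text-event-count is a hypothetical helper, and it assumes the StAX parser bundled with the JDK. It counts how many text events the parser reports for one element body with coalescing on or off.

```clojure
(import '(javax.xml.stream XMLInputFactory XMLStreamConstants XMLStreamReader)
        '(java.io StringReader))

(defn text-event-count
  "Counts CHARACTERS and CDATA events reported for xml-str.
  Hypothetical helper for illustration; not part of data.xml."
  [^String xml-str coalescing?]
  (let [^XMLInputFactory factory (XMLInputFactory/newInstance)
        ;; IS_COALESCING asks the parser to merge adjacent text and CDATA.
        _ (.setProperty factory XMLInputFactory/IS_COALESCING (boolean coalescing?))
        ^XMLStreamReader reader (.createXMLStreamReader factory (StringReader. xml-str))]
    (loop [n 0]
      (if (.hasNext reader)
        (let [event (.next reader)]
          (recur (if (or (= event XMLStreamConstants/CHARACTERS)
                         (= event XMLStreamConstants/CDATA))
                   (inc n)
                   n)))
        n))))

;; With coalescing on, the mixed text and CDATA arrive as a single event.
(text-event-count "<a>x<![CDATA[y]]>z</a>" true)
;; With coalescing off, the parser may report the pieces separately.
(text-event-count "<a>x<![CDATA[y]]>z</a>" false)
```

With coalescing disabled the consumer has to be prepared to see several adjacent text events for one element body.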



 Comments   
Comment by Ryan Senior [ 20/Mar/12 8:05 AM ]

Thanks Justin!





[DXML-8] Cannot pass strings when keywords are expected Created: 27/Sep/12  Updated: 26/Jul/13  Resolved: 14/Nov/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Brian Siebert Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None
Environment:

Windows 7


Patch: Code

 Description   

This error does not appear until you attempt to emit XML that has a string where the element function expects a keyword. It is doubly hard to figure out at first because the error message is vague. I am requesting that either the element function be allowed to accept strings in place of keywords, or the error message be cleaned up so that the "user" error is clear.



 Comments   
Comment by Ryan Senior [ 14/Nov/12 7:28 AM ]

I have added this. Supporting keywords and strings seems to be common in some of the other contrib libraries. Now you can use the keyword :foo or the string "foo" for tags and attributes.
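One way to accept both forms is to normalize the tag before use. The following is only a sketch of that idea, not the code that was committed; tag-name is a hypothetical helper built on clojure.core/name, which returns the name string of a keyword and returns a string unchanged.

```clojure
(defn tag-name
  "Returns the string name of a tag given as a keyword or a string.
  Hypothetical helper for illustration; not part of data.xml."
  [tag]
  (name tag))

(tag-name :foo)   ; keyword tag
(tag-name "foo")  ; string tag, same result
```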





[DXML-5] OutOfMemory errors when emitting large XML documents Created: 27/Apr/12  Updated: 26/Jun/12  Resolved: 26/Jun/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ryan Senior Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

Emitting large XML documents fed from lazy-seqs does not work in data.xml. Currently the lazy-seq is held in a defrecord, which holds onto the head of the lazy-seq and eventually forces all of it into memory (consuming all available memory). Example code to reproduce the issue:

(with-open [fw (java.io.FileWriter. "/tmp/lots-of-foo.xml")]
    (xml/emit
       (Element. :some-tags
           {}
           (map #(Element. :foo {} [(str "foo" %)])
                (range 0 10000000)))
       fw))


 Comments   
Comment by Ryan Senior [ 22/May/12 10:57 AM ]

Fixed

Comment by Ryan Senior [ 26/Jun/12 12:18 PM ]

Found this to be fixed only in the simplest case. If you have a large lazy-seq nested below 2+ tags it will hold onto the head of the lazy-seq and consume memory.

Comment by Ryan Senior [ 26/Jun/12 1:37 PM ]

Added an intermediate step to emitting elements to the stream writer. Now elements get flattened to a stream of events that get written to the stream writer.
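The flattening step can be sketched roughly as below. This is an illustration under assumptions, not the data.xml implementation: elements are modelled as plain maps with :tag and :content, the event shapes ([:start tag], [:chars text], [:end tag]) are made up for the example, and flatten-events is a hypothetical name.

```clojure
(defn flatten-events
  "Lazily turns a nested element tree into a flat seq of events.
  Elements are plain maps {:tag ... :content [...]}; anything else is text.
  Hypothetical sketch; not the data.xml implementation."
  [node]
  (if (map? node)
    (lazy-cat [[:start (:tag node)]]
              (mapcat flatten-events (:content node))
              [[:end (:tag node)]])
    [[:chars (str node)]]))

;; Children come from an unbounded lazy seq; only the events that are
;; actually consumed get realized.
(take 4 (flatten-events
          {:tag :some-tags
           :content (map (fn [i] {:tag :foo :content [(str "foo" i)]})
                         (range))}))
```

A writer that consumes such a seq one event at a time never needs a reference to the whole child collection.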

Comment by Ryan Senior [ 26/Jun/12 1:37 PM ]

Not sure how to set a "Fix Version" in Jira, but this was fixed in 0.0.5





[DXML-11] Support cdata with sexp-as-element Created: 21/Nov/12  Updated: 08/Jan/13  Resolved: 08/Jan/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jeff Weiss Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

prxml allowed something like this:

(prxml [:foo [:cdata! "all my cdata"]])

It doesn't look like that is currently allowed in data.xml. It looked like I might be able to extend the AsElements protocol to get this behavior, but I couldn't quite figure it out. It seems I would need access to the XmlStreamWriter to get the string representation of the cdata.



 Comments   
Comment by Ryan Senior [ 08/Jan/13 10:07 PM ]

Added, released in 0.0.7





[DXML-12] Do the right thing if cdata content contains the cdata end-tag "]]>" Created: 21/Nov/12  Updated: 08/Jan/13  Resolved: 08/Jan/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jeff Weiss Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

(xml/emit-str (xml/cdata "fooo]]>bar"))

"<?xml version=\"1.0\" encoding=\"UTF-8\"?><![CDATA[fooo]]>bar]]>"

This is invalid xml. The contract for cdata states that it cannot contain the end tag "]]>", so if the cdata function gets passed content that contains it, it should do the right thing, which is probably this:

http://stackoverflow.com/questions/223652/is-there-a-way-to-escape-a-cdata-end-token-in-xml

(split the content so it is emitted as multiple cdata blocks, none of which contain the entire end-tag "]]>").
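A minimal sketch of that splitting approach, using plain string handling. cdata-str is a hypothetical helper, not the data.xml function; it closes the current block after the "]]" and opens a new one before the ">".

```clojure
(require '[clojure.string :as str])

(defn cdata-str
  "Wraps content in CDATA markup, splitting every embedded end token
  across two blocks so no block contains the full token.
  Hypothetical helper for illustration; not part of data.xml."
  [content]
  (str "<![CDATA["
       ;; String match with string replacement is a literal replace.
       (str/replace content "]]>" "]]]]><![CDATA[>")
       "]]>"))

(cdata-str "fooo]]>bar")
;; => "<![CDATA[fooo]]]]><![CDATA[>bar]]>"
```

The first block carries "fooo]]" and the second carries ">bar", so a parser that concatenates them recovers the original content.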

This is not a purely academic bug report - I actually hit this problem in prxml and fixed it on my fork.



 Comments   
Comment by Ryan Senior [ 08/Jan/13 10:07 PM ]

Fixed, released in 0.0.7





[DXML-3] Build release on JDK 1.6 Created: 17/Feb/12  Updated: 24/Feb/12  Resolved: 24/Feb/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Stuart Sierra Assignee: Alan Malloy
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File jdk_16_jobs.patch    
Approval: Vetted

 Description   

See https://groups.google.com/d/topic/clojure-dev/Z-wrRTcUs6U/discussion



 Comments   
Comment by Ryan Senior [ 20/Feb/12 9:58 PM ]

Patch for adding JDK version to a Hudson job config

Comment by Stuart Sierra [ 24/Feb/12 3:13 PM ]

Patch applied to build.ci. Rebuilding Hudson configs now.





[DXML-6] data.xml tests fail on clojure 1.2.0 and 1.2.1 Created: 22/May/12  Updated: 22/May/12  Resolved: 22/May/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ryan Senior Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

See the test matrix here: http://build.clojure.org/job/data.xml-test-matrix/. Looks like the mixed-quotes test is to blame, just a reordering of attributes when they are emitted to a string.



 Comments   
Comment by Ryan Senior [ 22/May/12 12:54 PM ]

Tests now run successfully on 1.2.0 and 1.2.1





[DXML-2] lein deps fails Created: 17/Feb/12  Updated: 22/May/12  Resolved: 22/May/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ralph Möritz Assignee: Ryan Senior
Resolution: Declined Votes: 0
Labels: None
Environment:

Leiningen


Attachments: File project.clj    

 Description   

C:\Users\ralphm\workspace\dbxml-env>lein version
Leiningen 1.6.2 on Java 1.7.0_02 Java HotSpot(TM) Client VM
C:\Users\ralphm\workspace\dbxml-env>lein deps
Downloading: org/clojure/data.xml/0.0.2-SNAPSHOT/data.xml-0.0.2-SNAPSHOT.pom from repository clojars at http://clojars.org/repo/
Unable to locate resource in repository
[INFO] Unable to find resource 'org.clojure:data.xml:pom:0.0.2-SNAPSHOT' in repository clojars (http://clojars.org/repo/)
Downloading: org/clojure/data.xml/0.0.2-SNAPSHOT/data.xml-0.0.2-SNAPSHOT.jar from repository clojars at http://clojars.org/repo/
Unable to locate resource in repository
[INFO] Unable to find resource 'org.clojure:data.xml:jar:0.0.2-SNAPSHOT' in repository clojars (http://clojars.org/repo/)
An error has occurred while processing the Maven artifact tasks.
Diagnosis:

Unable to resolve artifact: Missing:
----------
1) org.clojure:data.xml:jar:0.0.2-SNAPSHOT

Try downloading the file manually from the project website.

Then, install it using the command:
mvn install:install-file -DgroupId=org.clojure -DartifactId=data.xml -Dversion=0.0.2-SNAPSHOT
-Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=org.clojure -DartifactId=data.xml -Dversion=0.0.2-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
1) org.apache.maven:super-pom:pom:2.0
2) org.clojure:data.xml:jar:0.0.2-SNAPSHOT

----------
1 required artifact is missing.

for artifact:
org.apache.maven:super-pom:pom:2.0

from the specified remote repositories:
central (http://repo1.maven.org/maven2),
clojars (http://clojars.org/repo/)



 Comments   
Comment by Ryan Senior [ 20/Feb/12 10:03 PM ]

As far as I know, there haven't been any releases of data.xml (SNAPSHOT or regular) to the maven repositories. I'm working on this and will hopefully have something out soon.

Comment by Ryan Senior [ 22/May/12 10:56 AM ]

data.xml doesn't get deployed to clojars. Look for it in maven central: http://search.maven.org/#search|ga|1|data.xml . 0.0.3 is the most recent version released, but 0.0.4 will be released soon.
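For reference, a project.clj depending on the released artifact might look like the sketch below. The project name, project version, and Clojure version are assumptions for illustration only; the data.xml coordinates and version come from the comment above.

```clojure
;; Hypothetical project.clj; only the data.xml coordinates are taken
;; from this ticket.
(defproject dbxml-env "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [org.clojure/data.xml "0.0.3"]])
```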





[DXML-19] data.xml should ship a copy of the EPL license in LICENSE Created: 29/Jul/13  Updated: 14/Aug/13  Resolved: 14/Aug/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Wolodja Wentland Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None


 Description   

One requirement for licensing code under the EPL is that "a copy of this Agreement [the EPL]
must be included with each copy of the Program." [0]. Unfortunately data.xml does not comply
with this requirement even though its README.md claims that it is licensed under the EPL.

Please fix this issue and release a new version of data.xml as it is not legally distributable
in its current form.

[0] http://www.eclipse.org/legal/epl-v10.html → 3. REQUIREMENTS



 Comments   
Comment by Ryan Senior [ 14/Aug/13 11:51 PM ]

Good catch. I've added the EPL file; it will be in the next release.





[DXML-17] Embedded CDATA end tags are not properly handled Created: 19/Jun/13  Updated: 14/Aug/13  Resolved: 14/Aug/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Jeff Weiss Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File dxml17.patch    

 Description   

user> (xml/indent-str (xml/sexp-as-element [:pre [:-cdata "foo]]>bar"]]))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><pre><![CDATA[foo]]><![CDATA[bar]]></pre>\n"

What's being emitted here is cdata for foobar, not foo]]>bar.

What needs to be done is to break up the embedded ]]> so that the first two characters are in one cdata block and the last character is in the next block.
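The difference shows up when the two outputs are read back. The sketch below uses the JDK's StAX reader directly rather than data.xml; element-text is a hypothetical helper that returns the text content of the root element.

```clojure
(import '(javax.xml.stream XMLInputFactory XMLStreamReader)
        '(java.io StringReader))

(defn element-text
  "Parses xml-str and returns the text content of its root element.
  Hypothetical helper for illustration; not part of data.xml."
  [^String xml-str]
  (let [^XMLInputFactory factory (XMLInputFactory/newInstance)
        _ (.setProperty factory XMLInputFactory/IS_COALESCING true)
        ^XMLStreamReader reader (.createXMLStreamReader factory (StringReader. xml-str))]
    (.nextTag reader)          ; advance to the root START_ELEMENT
    (.getElementText reader))) ; concatenated text and CDATA content

;; Current output: the end token is lost.
(element-text "<pre><![CDATA[foo]]><![CDATA[bar]]></pre>")
;; => "foobar"

;; Token split across two blocks: the content survives.
(element-text "<pre><![CDATA[foo]]]]><![CDATA[>bar]]></pre>")
;; => "foo]]>bar"
```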

The tests are wrong, as far as I can tell. I think I have fixed the code and the tests, I just need to figure out how to run the tests and submit a patch.



 Comments   
Comment by Jeff Weiss [ 20/Jun/13 8:16 AM ]

Patch that fixes issue and tests

Comment by Jeff Weiss [ 20/Jun/13 8:18 AM ]

And just to clear up what the issue is: currently, if the cdata contains the cdata end tag "]]>", it is just dropped, and when the xml is read back in those characters are gone.

That is not correct behavior. The cdata should be able to contain any arbitrary characters without any loss of data, and the attached patch allows this.

Comment by Ryan Senior [ 14/Aug/13 11:52 PM ]

Thanks for the patch! It's been pushed up and will be in the next release.





[DXML-14] IllegalArgumentException when trying to emit a boolean value Created: 07/Mar/13  Updated: 10/Nov/13  Resolved: 10/Nov/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ed O'Loughlin Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None
Environment:

JRE 1.7, OS X 10.7.5, Clojure 1.4 & 1.5, data.xml 0.0.7



 Description   

I can create an element with a boolean value but I can't emit it...

user=> (emit-str (element :something {} false))
IllegalArgumentException No implementation of method: :gen-event of protocol: #'clojure.data.xml/EventGeneration found for class: java.lang.Boolean clojure.core/-cache-protocol-fn (core_deftype.clj:541)
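The exception is the usual result of calling a protocol function on a type the protocol has not been extended to. The sketch below shows the mechanism with a stand-in protocol; TextContent and text-of are hypothetical names, not the EventGeneration protocol from data.xml, and this is not the fix that was committed.

```clojure
(defprotocol TextContent
  (text-of [x] "Returns the text that would be written for x."))

;; Without the Boolean entry, (text-of false) throws an
;; IllegalArgumentException of the same kind as the one reported above.
(extend-protocol TextContent
  String  (text-of [s] s)
  Boolean (text-of [b] (str b))
  Number  (text-of [n] (str n)))

(text-of false) ; => "false"
```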



 Comments   
Comment by Ryan Senior [ 10/Nov/13 10:41 PM ]

Thanks for the bug report. The fix is in ffd6957baa0cf752fa0678be7f2a3393eab16739 and should be released with 0.0.8.





[DXML-7] cannot change encoding when using the indent function Created: 27/Sep/12  Updated: 09/Oct/12  Resolved: 09/Oct/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Brian Siebert Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None
Environment:

Windows 7



 Description   

When using the indent function and trying to change the encoding, an exception is thrown:

java.lang.IllegalArgumentException: No value supplied for key: [:encoding "UTF-8"]

It seems that the options are not being passed from indent to emit correctly.
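A likely shape of this kind of bug, sketched with stand-in functions: emit-sketch and indent-sketch are hypothetical and only model the option passing, not the real indent and emit. When a function collects options with & opts and forwards them to another keyword-argument function, it has to spread them with apply.

```clojure
(defn emit-sketch
  "Stand-in for an emit function taking keyword options."
  [doc & {:keys [encoding] :or {encoding "UTF-8"}}]
  (str "<?xml version=\"1.0\" encoding=\"" encoding "\"?>" doc))

(defn indent-sketch
  "Stand-in for a wrapper that forwards its options."
  [doc & opts]
  ;; Calling (emit-sketch doc opts) would pass the whole option list as a
  ;; single argument. Depending on the Clojure version, that either throws
  ;; "No value supplied for key" or ignores the option. apply spreads the
  ;; list back into key/value arguments.
  (apply emit-sketch doc opts))

(indent-sketch "<a/>" :encoding "ISO-8859-1")
```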



 Comments   
Comment by Ryan Senior [ 09/Oct/12 10:46 PM ]

Thanks for finding the bug. It's fixed in the repo and will be included in the next release.





[DXML-9] Remove some use of reflection in data.xml Created: 28/Oct/12  Updated: 26/Jul/13  Resolved: 14/Nov/12

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File dxml-9-remove-reflection-v1.txt    
Patch: Code and Test

 Description   

There are a couple of occurrences of reflection in the data.xml library.



 Comments   
Comment by Andy Fingerhut [ 28/Oct/12 6:10 PM ]

dxml-9-remove-reflection-v1.txt dated Oct 28 2012 removes one use of reflection in data.xml. There is still one remaining, to which I have added a comment explaining why it cannot be removed with a single type hint.

Comment by Ryan Senior [ 14/Nov/12 7:26 AM ]

Thanks Andy. Will be in the next release.





[DXML-16] Eliminate reflection in emit-cdata Created: 25/Apr/13  Updated: 14/Aug/13  Resolved: 14/Aug/13

Status: Resolved
Project: data.xml
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Ryan Senior
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File dxml-16-eliminate-relfection-in-emit-cdata-patch-v1.txt    

 Description   

Solvable with a type hint on emit-cdata arg 'writer'
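A minimal sketch of what such a hint looks like. This is not the attached patch; write-cdata-sketch and cdata-in-element are hypothetical names, and the example talks to the JDK's XMLStreamWriter directly.

```clojure
(import '(javax.xml.stream XMLOutputFactory XMLStreamWriter)
        '(java.io StringWriter))

;; At a REPL, (set! *warn-on-reflection* true) makes the compiler report
;; interop calls it cannot resolve at compile time.

(defn write-cdata-sketch
  "Writes content as a CDATA section. The ^XMLStreamWriter hint lets the
  compiler resolve .writeCData without reflection."
  [^XMLStreamWriter writer ^String content]
  (.writeCData writer content))

(defn cdata-in-element
  "Writes <a> containing one CDATA section and returns the XML string."
  [^String content]
  (let [out (StringWriter.)
        ^XMLStreamWriter writer (.createXMLStreamWriter (XMLOutputFactory/newInstance) out)]
    (.writeStartElement writer "a")
    (write-cdata-sketch writer content)
    (.writeEndElement writer)
    (.flush writer)
    (str out)))

(cdata-in-element "hi")
```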



 Comments   
Comment by Andy Fingerhut [ 25/Apr/13 1:30 PM ]

Patch dxml-16-eliminate-relfection-in-emit-cdata-patch-v1.txt dated Apr 25 2013 eliminates a couple of uses of reflection in function emit-cdata.

Comment by Ryan Senior [ 14/Aug/13 11:50 PM ]

Thanks Andy, just pushed up your patch.

Comment by Ryan Senior [ 14/Aug/13 11:53 PM ]

Accidentally marked as closed





Generated at Wed Apr 16 02:36:22 CDT 2014 using JIRA 4.4#649-r158309.