[DXML-1] Stack overflow when parsing huge XML file Created: 10/Feb/12 Updated: 20/Mar/12 Resolved: 20/Mar/12 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Defect | Priority: | Major |
| Reporter: | Justin Kramer | Assignee: | Ryan Senior |
| Resolution: | Completed | Votes: | 1 |
| Labels: | patch, | ||
| Environment: |
OS X |
||
| Attachments: |
|
| Patch: | Code and Test |
| Description |
|
This is using Ryan Senior's new 0.0.3-SNAPSHOT. While trying to parse a huge XML file (7.5 GB compressed, a dump of Wikipedia), I got a stack overflow error. Some digging turned up this bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6440214 . Modifying clojure.data.xml/source-seq to disable the IS_COALESCING property got rid of the error. The old lazy-xml contrib code worked (although it used far more memory). Attached is a patch that adds keyword options to source-seq, parse, and parse-str, allowing the consumer to disable coalescing and sidestep the upstream bug. |
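For readers unfamiliar with the workaround, the following is a minimal sketch of switching coalescing off at the StAX level through Java interop. The helper names `new-input-factory` and `text-of` are hypothetical and are not the functions from the attached patch; only the `javax.xml.stream` calls are standard JDK API.

```clojure
(import '(javax.xml.stream XMLInputFactory XMLStreamConstants XMLStreamReader)
        '(java.io StringReader))

(defn new-input-factory
  "Hypothetical helper: builds a StAX input factory with IS_COALESCING
  set from the :coalescing keyword option."
  [& {:keys [coalescing]}]
  (doto (XMLInputFactory/newInstance)
    (.setProperty XMLInputFactory/IS_COALESCING (boolean coalescing))))

(defn text-of
  "Hypothetical helper: concatenates every CHARACTERS event in the document.
  With coalescing off, the text may arrive as several separate events."
  [^XMLInputFactory factory ^String xml]
  (let [^XMLStreamReader rdr (.createXMLStreamReader factory (StringReader. xml))
        sb (StringBuilder.)]
    (while (.hasNext rdr)
      (when (= (.next rdr) XMLStreamConstants/CHARACTERS)
        (.append sb (.getText rdr))))
    (str sb)))

;; Usage: parse with coalescing disabled.
(text-of (new-input-factory :coalescing false) "<a>hi</a>")
```

With coalescing disabled, a consumer has to be prepared for adjacent text chunks instead of one merged string, which is why the sketch appends chunks rather than assuming a single event.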
| Comments |
| Comment by Ryan Senior [ 20/Mar/12 8:05 AM ] |
|
Thanks Justin! |
[DXML-5] OutOfMemory errors when emitting large XML documents Created: 27/Apr/12 Updated: 26/Jun/12 Resolved: 26/Jun/12 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Defect | Priority: | Major |
| Reporter: | Ryan Senior | Assignee: | Ryan Senior |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Emitting large XML documents fed from lazy seqs does not work in data.xml. Currently, the lazy seq is held in a defrecord, which holds onto its head and forces the whole sequence to be realized in memory (eventually consuming all available memory). Example code to reproduce the issue:
    (with-open [fw (java.io.FileWriter. "/tmp/lots-of-foo.xml")]
      (xml/emit (Element. :some-tags {}
                          (map #(Element. :foo {} [(str "foo" %)])
                               (range 0 10000000)))
                fw)) |
| Comments |
| Comment by Ryan Senior [ 22/May/12 10:57 AM ] |
|
Fixed |
| Comment by Ryan Senior [ 26/Jun/12 12:18 PM ] |
|
Found this to be fixed only in the simplest case. If you have a large lazy-seq nested below 2+ tags it will hold onto the head of the lazy-seq and consume memory. |
| Comment by Ryan Senior [ 26/Jun/12 1:37 PM ] |
|
Added an intermediate step to emitting elements to the stream writer. Now elements get flattened to a stream of events that get written to the stream writer. |
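The flattening idea described in this comment can be illustrated with a small sketch. This is not the data.xml implementation: the function name `events`, the plain-map element shape (`:tag`, `:content`), and the vector event shape are all assumptions made for illustration.

```clojure
(defn events
  "Hypothetical sketch: lazily flattens a seq of pending items into events.
  Maps are elements, vectors are already-built end events, and anything
  else is treated as text."
  [pending]
  (lazy-seq
    (when-let [[x & more] (seq pending)]
      (cond
        ;; Element: emit a start event, then queue its children followed by
        ;; its end event ahead of the remaining siblings.
        (map? x)    (cons [:start (:tag x)]
                          (events (concat (:content x) [[:end (:tag x)]] more)))
        ;; Queued end event: pass it through unchanged.
        (vector? x) (cons x (events more))
        ;; Text content.
        :else       (cons [:chars (str x)] (events more))))))

;; Usage: only the consumed prefix of an infinite child seq is realized.
(take 3 (events [{:tag :root :content (map str (range))}]))
;; => ([:start :root] [:chars "0"] [:chars "1"])
```

A writer that consumes such a flat stream one event at a time does not need a reference to the enclosing element, which is the property the intermediate step is meant to provide.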
| Comment by Ryan Senior [ 26/Jun/12 1:37 PM ] |
|
Not sure how to set a "Fix Version" in Jira, but this was fixed in 0.0.5 |
[DXML-3] Build release on JDK 1.6 Created: 17/Feb/12 Updated: 24/Feb/12 Resolved: 24/Feb/12 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Stuart Sierra | Assignee: | Alan Malloy |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Approval: | Vetted |
| Description |
|
See https://groups.google.com/d/topic/clojure-dev/Z-wrRTcUs6U/discussion |
| Comments |
| Comment by Ryan Senior [ 20/Feb/12 9:58 PM ] |
|
Patch for adding JDK version to a Hudson job config |
| Comment by Stuart Sierra [ 24/Feb/12 3:13 PM ] |
|
Patch applied to build.ci. Rebuilding Hudson configs now. |
[DXML-6] data.xml tests fail on clojure 1.2.0 and 1.2.1 Created: 22/May/12 Updated: 22/May/12 Resolved: 22/May/12 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Defect | Priority: | Major |
| Reporter: | Ryan Senior | Assignee: | Ryan Senior |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
See the test matrix here: http://build.clojure.org/job/data.xml-test-matrix/. The mixed-quotes test appears to be to blame: the attributes are simply emitted to the string in a different order. |
| Comments |
| Comment by Ryan Senior [ 22/May/12 12:54 PM ] |
|
Tests now run successfully on 1.2.0 and 1.2.1 |
[DXML-2] lein deps fails Created: 17/Feb/12 Updated: 22/May/12 Resolved: 22/May/12 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Defect | Priority: | Major |
| Reporter: | Ralph Möritz | Assignee: | Ryan Senior |
| Resolution: | Declined | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Leiningen |
||
| Attachments: |
|
| Description |
|
C:\Users\ralphm\workspace\dbxml-env>lein version
Unable to resolve artifact: Missing:
Try downloading the file manually from the project website.
Then, install it using the command:
Alternatively, if you host your own repository you can deploy the file there:
Path to dependency:
----------
for artifact:
from the specified remote repositories: |
| Comments |
| Comment by Ryan Senior [ 20/Feb/12 10:03 PM ] |
|
As far as I know, there haven't been any releases of data.xml (SNAPSHOT or regular) to the maven repositories. I'm working on this and will hopefully have something out soon. |
| Comment by Ryan Senior [ 22/May/12 10:56 AM ] |
|
data.xml doesn't get deployed to Clojars. Look for it in Maven Central: http://search.maven.org/#search|ga|1|data.xml . 0.0.3 is the most recent version released, but 0.0.4 will be released soon. |
[DXML-8] Cannot pass strings when keywords are expected Created: 27/Sep/12 Updated: 14/Nov/12 Resolved: 14/Nov/12 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Defect | Priority: | Major |
| Reporter: | Brian Siebert | Assignee: | Ryan Senior |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Windows 7 |
||
| Patch: | None |
| Description |
|
This error does not appear until you attempt to emit XML that has a string where the element function is expecting a keyword. This is doubly hard to figure out at first because the error message is vague. I am requesting that either the element function be allowed to take strings instead of keywords, or the error message be cleaned up so that the "user" error is clear. |
| Comments |
| Comment by Ryan Senior [ 14/Nov/12 7:28 AM ] |
|
I have added this. Supporting keywords and strings seems to be common in some of the other contrib libraries. Now you can use the keyword :foo or the string "foo" for tags and attributes. |
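One way to accept both forms is to normalize names through `clojure.core/name`, which works on keywords and strings alike. The sketch below only illustrates that approach; `tag-name` is a hypothetical helper and the actual change in data.xml may differ.

```clojure
(defn tag-name
  "Hypothetical helper: returns the string name for a tag or attribute
  given as a keyword or a string, and fails with a clear message otherwise."
  [tag]
  (if (or (keyword? tag) (string? tag))
    (name tag)
    (throw (IllegalArgumentException.
             (str "Tag must be a keyword or string, got: " (pr-str tag))))))

;; Usage: both forms normalize to the same string.
(tag-name :foo)   ; => "foo"
(tag-name "foo")  ; => "foo"
```

The explicit exception covers the second half of the request: when neither form is supplied, the message names the offending value.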
[DXML-11] Support cdata with sexp-as-element Created: 21/Nov/12 Updated: 08/Jan/13 Resolved: 08/Jan/13 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Enhancement | Priority: | Major |
| Reporter: | Jeff Weiss | Assignee: | Ryan Senior |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
prxml allowed something like this: (prxml [:foo [:cdata! "all my cdata"]]). It doesn't look like that is currently allowed in data.xml. It looked like I could maybe extend the AsElements protocol to get this behavior, but I couldn't quite figure it out; it seems I'd need access to the XMLStreamWriter to get the string representation of the cdata. |
| Comments |
| Comment by Ryan Senior [ 08/Jan/13 10:07 PM ] |
|
Added, released in 0.0.7 |
[DXML-12] Do the right thing if cdata content contains the cdata end-tag "]]>" Created: 21/Nov/12 Updated: 08/Jan/13 Resolved: 08/Jan/13 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Enhancement | Priority: | Major |
| Reporter: | Jeff Weiss | Assignee: | Ryan Senior |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
(xml/emit-str (xml/cdata "fooo]]>bar"))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><![CDATA[fooo]]>bar]]>"
This is invalid XML. The contract for cdata states that it cannot contain the end tag "]]>", so if the cdata function is passed content that contains it, it should do the right thing, which is probably this: http://stackoverflow.com/questions/223652/is-there-a-way-to-escape-a-cdata-end-token-in-xml (split the content so it is emitted as multiple cdata blocks, none of which contains the entire end tag "]]>"). This is not a purely academic bug report: I actually hit this problem in prxml and fixed it on my fork. |
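The splitting approach from the linked answer can be sketched in a few lines of plain Clojure. `cdata-str` is a hypothetical name used only here; it shows the string-level technique and is not the fix that shipped in data.xml.

```clojure
(require '[clojure.string :as string])

(defn cdata-str
  "Hypothetical helper: wraps content in CDATA, splitting every \"]]>\"
  so that \"]]\" ends one section and \">\" starts the next."
  [content]
  (str "<![CDATA["
       ;; String-for-string replace is literal, so no regex escaping is needed.
       (string/replace content "]]>" "]]]]><![CDATA[>")
       "]]>"))

;; Usage with the content from this report:
(cdata-str "fooo]]>bar")
;; => "<![CDATA[fooo]]]]><![CDATA[>bar]]>"
```

A parser reads the result as two adjacent sections containing `fooo]]` and `>bar`, which concatenate back to the original content.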
| Comments |
| Comment by Ryan Senior [ 08/Jan/13 10:07 PM ] |
|
Fixed, released in 0.0.7 |
[DXML-7] cannot change encoding when using the indent function Created: 27/Sep/12 Updated: 09/Oct/12 Resolved: 09/Oct/12 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Defect | Priority: | Minor |
| Reporter: | Brian Siebert | Assignee: | Ryan Senior |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Windows 7 |
||
| Description |
|
When using the indent function and trying to change the encoding, an exception is thrown: java.lang.IllegalArgumentException: No value supplied for key: [:encoding "UTF-8"]. It seems that the options are not being passed from indent to emit correctly. |
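A common cause of this kind of failure is a wrapper that collects keyword options with `& opts` and then hands the collected sequence to the inner function as a single argument. The sketch below shows the usual remedy, spreading the options with `apply`. Both functions are hypothetical stand-ins and not the data.xml source.

```clojure
(defn emit-sketch
  "Hypothetical stand-in for an emit function taking keyword options."
  [xml-body & {:keys [encoding] :or {encoding "UTF-8"}}]
  (str "<?xml version=\"1.0\" encoding=\"" encoding "\"?>" xml-body))

(defn indent-sketch
  "Hypothetical stand-in for a wrapper. `apply` spreads the collected
  options back into key/value arguments for the inner function."
  [xml-body & opts]
  (apply emit-sketch xml-body opts))

;; Usage: the :encoding option reaches the inner function.
(indent-sketch "<a/>" :encoding "ISO-8859-1")
;; => "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
```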
| Comments |
| Comment by Ryan Senior [ 09/Oct/12 10:46 PM ] |
|
Thanks for finding the bug. It's fixed in the repo and will be included in the next release. |
[DXML-9] Remove some use of reflection in data.xml Created: 28/Oct/12 Updated: 14/Nov/12 Resolved: 14/Nov/12 |
|
| Status: | Resolved |
| Project: | data.xml |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Enhancement | Priority: | Minor |
| Reporter: | Andy Fingerhut | Assignee: | Ryan Senior |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Patch: | Accepted |
| Description |
|
There are a couple of occurrences of reflection in the data.xml library. |
| Comments |
| Comment by Andy Fingerhut [ 28/Oct/12 6:10 PM ] |
|
dxml-9-remove-reflection-v1.txt dated Oct 28 2012 removes one use of reflection in data.xml. There is still one remaining, to which I have added a comment explaining why it cannot be removed with a single type hint. |
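As a general illustration of how a type hint removes a reflective call (this is not the content of the attached patch), the sketch below hints both the target and the argument of a Java method call. `write-text` is a hypothetical function. Setting `*warn-on-reflection*` to true at the REPL or at the top of a source file makes the compiler report the unhinted variant.

```clojure
(defn write-text
  "Hypothetical example: the ^java.io.Writer and ^String hints let the
  compiler select Writer.write(String) at compile time, so no reflection
  happens at run time. Returns the writer."
  [^java.io.Writer w ^String s]
  (.write w s)
  w)

;; Usage:
(str (write-text (java.io.StringWriter.) "abc"))
;; => "abc"
```

A single hint is not always enough: when a call site must accept values of unrelated classes, the method cannot be pinned to one signature, which matches the remaining case noted in the comment above.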
| Comment by Ryan Senior [ 14/Nov/12 7:26 AM ] |
|
Thanks Andy. Will be in the next release. |