data.xml

Stack overflow when parsing huge XML file

Details

  • Type: Defect
  • Status: Resolved
  • Priority: Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Environment:
    OS X
  • Patch:
    Code and Test

Description

This is using Ryan Senior's new 0.0.3-SNAPSHOT.

While trying to parse a huge XML file (7.5 GB compressed, a dump of Wikipedia), I got a stack overflow error. Some digging turned up this bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6440214

Modifying clojure.data.xml/source-seq to disable the IS_COALESCING property got rid of the error.
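For reference, here is a minimal Java sketch of what the workaround amounts to at the StAX level, assuming source-seq builds its reader from a javax.xml.stream.XMLInputFactory. The class and method names (CoalescingSketch, newFactory, readText) are made up for illustration and are not part of data.xml. With coalescing off, adjacent text and CDATA may arrive as several events, so the consumer has to concatenate them itself.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of clojure.data.xml.
class CoalescingSketch {

    // Build a StAX factory with the IS_COALESCING property set explicitly.
    static XMLInputFactory newFactory(boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        return factory;
    }

    // Concatenate all character data in the document. When coalescing is
    // disabled the parser may split text into several events, so the pieces
    // are appended one by one.
    static String readText(String xml, boolean coalescing) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                newFactory(coalescing).createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrow unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<a>x<![CDATA[y]]>z</a>";
        System.out.println(readText(xml, true));
        System.out.println(readText(xml, false));
    }
}
```

Either setting should yield the same concatenated text; the difference is only in how many events the parser emits to get there.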

The old lazy-xml contrib code worked on the same file, although it used far more memory.

Attached is a patch that adds keyword options to source-seq, parse, and parse-str, allowing the consumer to disable coalescing and sidestep the upstream bug.

Activity

Alan Malloy made changes -
  Field       Original Value           New Value
  Assignee    Alan Malloy [ amalloy ]  Ryan Senior [ ryansenior ]
Ryan Senior made changes -
  Field       Original Value           New Value
  Status      Open [ 1 ]               In Progress [ 3 ]
Ryan Senior made changes -
  Field       Original Value           New Value
  Resolution                           Completed [ 1 ]
  Status      In Progress [ 3 ]        Resolved [ 5 ]
