
[DZIP-7] Make xml-> return empty strings instead of skipping non-matching nodes Created: 13/Apr/17  Updated: 13/Apr/17

Status: Open
Project: data.zip
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Bastien Guerry Assignee: Alex Miller
Resolution: Unresolved Votes: 0
Labels: data, xml, zip



xml-> returns a variable number of strings, depending on what the predicates match.

When using xml-> to filter XML content, it would be very handy to have xml-> return empty strings when there is no match for the predicates. This would make it possible to process malformed XML more accurately.
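To make the request concrete, here is a minimal sketch that models the two behaviours with plain clojure.zip rather than calling data.zip itself. The sample tree and the helpers children, child-tags and text are assumptions made for illustration; they are not part of data.zip.

```clojure
(require '[clojure.zip :as zip])

;; Hypothetical input: the second :item has no :name child.
(def doc
  (zip/xml-zip
   {:tag :items
    :content [{:tag :item :content [{:tag :name :content ["a"]}]}
              {:tag :item :content []}
              {:tag :item :content [{:tag :name :content ["c"]}]}]}))

(defn children
  "Child locations of loc, or an empty seq for leaves and empty elements."
  [loc]
  (when (zip/branch? loc)
    (take-while some? (iterate zip/right (zip/down loc)))))

(defn child-tags
  "Child locations of loc whose element tag is tagname."
  [tagname loc]
  (filter #(= tagname (:tag (zip/node %))) (children loc)))

(defn text
  "Concatenated string content of the element at loc."
  [loc]
  (apply str (filter string? (:content (zip/node loc)))))

;; Skipping behaviour: one string per match, so the item without a
;; :name simply disappears from the result.
(def skipping
  (->> (child-tags :item doc)
       (mapcat #(child-tags :name %))
       (map text)))
;; => ("a" "c")

;; Requested behaviour: one string per item, "" when nothing matches.
(def padded
  (map (fn [item]
         (if-let [n (first (child-tags :name item))]
           (text n)
           ""))
       (child-tags :item doc)))
;; => ("a" "" "c")
```

With the padded form, the result stays aligned with the items it came from, which is what makes incomplete records easier to detect.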

I've created an example here:

[DZIP-6] Sub Entries with the same Name can't be selected Created: 19/Sep/16  Updated: 27/Oct/17

Status: Open
Project: data.zip
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Benjamin Peter Assignee: Alex Miller
Resolution: Unresolved Votes: 5
Labels: None

Leiningen 2.7.0 on Java 1.8.0_91 Java HotSpot(TM) 64-Bit Server VM

Attachments: File zip-xml-bug-descent.tgz     File zip-xml-bug.tgz    


I want to select the content of an XML element named "Group" that is itself nested inside an element named "Group", using zip-xml/xml1->. Instead of the content of the inner "Group", the outer "Group" element matches. My approach to selecting it does not work, and I suspect this might be a defect.

Please see the minimal example:


<?xml version="1.0" encoding="utf-8" standalone="yes"?>

(zip-xml/xml1-> root :Root :Group :Group :Name zip-xml/text)

Leiningen project with unit-tests as attachment. Run: lein test

$ lein test
lein test :only zip-xml-bug.core-test/parsing-group-elements

FAIL in (parsing-group-elements) (core_test.clj:34)
selecting the name of inner
expected: (= "Inner" (zip-xml/xml1-> root :Root :Group :Group :Name zip-xml/text))
  actual: (not (= "Inner" "Outer"))

Ran 1 tests containing 2 assertions.
1 failures, 0 errors.
Tests failed.

Comment by Benjamin Peter [ 20/Sep/16 2:41 PM ]

I found a "workaround" as you can see below and in zip-xml-bug-descent.tgz

(defn descent=
  [tagname]
  (fn [loc]
    (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
            (zf/children-auto loc))))

  (testing "selecting the name of inner using descent"
    (is (= "Inner"
           (-> (zip-xml/xml1-> root :Root :Group (descent= :Group) :Name zip-xml/text)))))

It seems the problem in my case is the first clause of the or expression in tag=, which matches the element itself. I suspect it exists so that the root element can be selected. Is there any other need for it?

(defn tag=
  [tagname]
  (fn [loc]
    (or (= tagname (:tag (zip/node loc)))
        (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
                (zf/children-auto loc)))))

Maybe there should be a self predicate instead?
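
The effect described above can be modelled with plain clojure.zip, without data.zip on the classpath. This is only a sketch under assumptions: the sample tree stands in for the XML of the report, which is not reproduced here, and children, child-tag=, self-or-child-tag=, walk and first-name are hypothetical helpers that imitate, in simplified form, how a chain of tag predicates is applied. They are not the library's implementation.

```clojure
(require '[clojure.zip :as zip])

;; Assumed sample document: Root > Group("Outer") > Group("Inner").
(def root
  (zip/xml-zip
   {:tag :Root
    :content [{:tag :Group
               :content [{:tag :Name :content ["Outer"]}
                         {:tag :Group
                          :content [{:tag :Name :content ["Inner"]}]}]}]}))

(defn children
  "Child locations of loc, or an empty seq for leaves and empty elements."
  [loc]
  (when (zip/branch? loc)
    (take-while some? (iterate zip/right (zip/down loc)))))

(defn child-tag=
  "Step that only looks at children (like the descent= workaround)."
  [tagname]
  (fn [loc]
    (filter #(= tagname (:tag (zip/node %))) (children loc))))

(defn self-or-child-tag=
  "Step that stays on loc when loc itself matches (like the `or` clause)."
  [tagname]
  (fn [loc]
    (if (= tagname (:tag (zip/node loc)))
      [loc]
      ((child-tag= tagname) loc))))

(defn walk
  "Applies each step to every location produced by the previous step."
  [loc & steps]
  (reduce (fn [locs step] (mapcat step locs)) [loc] steps))

(defn first-name
  "First string inside the first location of locs."
  [locs]
  (-> locs first zip/node :content first))

(def with-self-match
  (first-name (walk root
                    (self-or-child-tag= :Root)
                    (self-or-child-tag= :Group)
                    (self-or-child-tag= :Group)
                    (self-or-child-tag= :Name))))
;; => "Outer"

(def with-children-only
  (first-name (walk root
                    (self-or-child-tag= :Root)
                    (self-or-child-tag= :Group)
                    (child-tag= :Group)
                    (self-or-child-tag= :Name))))
;; => "Inner"
```

In this model the second :Group step never leaves the outer element because the self-match wins, which mirrors the failing assertion in the report.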

Comment by Denis Shilov [ 09/Apr/17 1:33 AM ]

It is a regression from 0.1.1 to 0.1.2.

The commit that changed the behaviour is https://github.com/clojure/data.zip/commit/c5d6ca25c128f9fe937b11505c7c9736cfa2dd9a

Simple test to check

This works in 0.1.1

(def nestedxml
  (parse-str "<doc><area><area>1</area><unit>033</unit></area></doc>"))

(deftest same-nested-tags
  (is (= "1" (xml1-> nestedxml :area :area text)))
  (is (= "033" (xml1-> nestedxml :area :unit text))))

A related bug is DZIP-3.

Comment by Bastien Guerry [ 12/Apr/17 5:52 AM ]

For what it's worth, I've just been hit by this regression too.
I hope a proper fix can be released soon! Thanks in advance.

Comment by Benjamin Peter [ 12/Apr/17 6:02 AM ]

My example does not work with 0.1.1 either, so I doubt it is just a regression. @Denis Shilov, maybe you want to create another ticket for this.

Comment by Paul Dlug [ 06/Sep/17 3:32 PM ]

Any update on this? We're also hitting this bug.

Comment by Paul Dlug [ 06/Sep/17 3:46 PM ]

This worked for us with 0.2.0-alpha2 when copying the latest implementation of tag= without the or part, as Benjamin Peter suggested. I'm not sure what the best fix is here; it seems tricky to accommodate the previous patch that allowed tag= to match the root. Matching descendants is clearly the more frequent case, so perhaps a root= predicate is preferable to introducing something like descendant= through all the xml-> matchers. Of course, if there's some fix to tag= that can support both and that I'm not seeing, that would be best, but it seems tricky.
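
As a rough illustration of the root= idea, here is a sketch using only clojure.zip. The name root= comes from the comment above; its behaviour here, the docstring and the sample tree are assumptions and not an existing data.zip function. It returns a boolean, on the assumption that a true result keeps the current location and a false result drops it.

```clojure
(require '[clojure.zip :as zip])

;; Assumed sample document with a nested Group.
(def root
  (zip/xml-zip
   {:tag :Root
    :content [{:tag :Group
               :content [{:tag :Group :content []}]}]}))

(defn root=
  "Hypothetical predicate: true only when loc is the zipper root
  and its element tag is tagname."
  [tagname]
  (fn [loc]
    (and (nil? (zip/up loc))
         (= tagname (:tag (zip/node loc))))))

(def matches-root      ((root= :Root) root))              ;; => true
(def wrong-tag         ((root= :Group) root))             ;; => false
(def not-at-root       ((root= :Group) (zip/down root)))  ;; => false
```

A predicate along these lines would let tag= go back to matching children only, while still giving callers an explicit way to assert the root tag.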

Comment by Brian Stearns [ 02/Oct/17 11:03 PM ]

Ran into the same thing just today. Posted this (https://stackoverflow.com/questions/46535423/cant-access-deeply-nested-xml-with-clojure-data-zip-xml) a bit ago, but now that I've found this I know I'm not alone/crazy.

@bpeter thanks for the workaround. Does anyone know whether a fix for this is making it into 0.1.2, or whether there is anything I can do to help make that happen?

Comment by Alex Miller [ 27/Oct/17 12:22 PM ]

Someone needs to dig in to see if there is a solution that lets people do what they want in DZIP-3 and here in DZIP-6. Does the patch here re-break the case in DZIP-3?

If so, then more work needs to be done to either find a solution that works for both or to decide whether one of these cases is not valid and shouldn't be supported, or to add something that lets you do both.

Bumping up a notch, I'd love to have someone sign up to be an active maintainer for data.zip. I've helped out here on a drive-by basis, but I have no skin in this game. Given there are a bunch of (obviously) caring users here, it would be great to have help from one of you.

Generated at Sat Jan 20 15:18:20 CST 2018 using JIRA 4.4#649-r158309.