
[DZIP-7] Make xml-> return empty strings instead of skipping non-matching nodes Created: 13/Apr/17  Updated: 13/Apr/17

Status: Open
Project: data.zip
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement
Priority: Major
Reporter: Bastien Guerry
Assignee: Alex Miller
Resolution: Unresolved
Votes: 0
Labels: data, xml, zip



xml-> returns a variable number of strings, depending on what the predicates match.

When using xml-> to filter XML content, it would be very handy if xml-> returned an empty string for each node where the predicates do not match, instead of skipping that node. This would make it possible to process malformed XML more accurately.
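
A minimal sketch of the situation, using a hypothetical document in which the second person element has no email child (the element names and data are made up for illustration, and the per-node fallback is only one possible workaround in user code, not a proposed change to the library):

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip]
         '[clojure.data.zip.xml :as zip-xml])

;; Hypothetical input: the second <person> has no <email> child.
(def people
  (zip/xml-zip
    (xml/parse
      (java.io.ByteArrayInputStream.
        (.getBytes (str "<people>"
                        "<person><name>Ann</name><email>ann@example.org</email></person>"
                        "<person><name>Bob</name></person>"
                        "</people>")
                   "UTF-8")))))

;; Two names but only one email: the two results cannot be paired by position.
(def names  (zip-xml/xml-> people :person :name zip-xml/text))
;; => ("Ann" "Bob")
(def emails (zip-xml/xml-> people :person :email zip-xml/text))
;; => ("ann@example.org")

;; Possible workaround: query each <person> separately and substitute ""
;; when xml1-> finds nothing (it returns nil in that case).
(def emails-padded
  (map #(or (zip-xml/xml1-> % :email zip-xml/text) "")
       (zip-xml/xml-> people :person)))
;; => ("ann@example.org" "")
```

With the padded variant, names and emails-padded have the same length and can be combined with map or zipmap.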

I've created an example here:

[DZIP-6] Sub Entries with the same Name can't be selected Created: 19/Sep/16  Updated: 12/Apr/17

Status: Open
Project: data.zip
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect
Priority: Major
Reporter: Benjamin Peter
Assignee: Alex Miller
Resolution: Unresolved
Votes: 1
Labels: None

Leiningen 2.7.0 on Java 1.8.0_91 Java HotSpot(TM) 64-Bit Server VM

Attachments: zip-xml-bug-descent.tgz, zip-xml-bug.tgz


I want to select the content of an XML element named "Group" that is itself nested inside another element named "Group", using zip-xml/xml1->. Instead of the content of the inner "Group", the query returns the content of the outer "Group" element. The approach I would expect to work does not, and I suspect this is a defect.

Please see the minimal example:


<?xml version="1.0" encoding="utf-8" standalone="yes"?>

(zip-xml/xml1-> root :Root :Group :Group :Name zip-xml/text)

Leiningen project with unit-tests as attachment. Run: lein test

$ lein test
lein test :only zip-xml-bug.core-test/parsing-group-elements

FAIL in (parsing-group-elements) (core_test.clj:34)
selecting the name of inner
expected: (= "Inner" (zip-xml/xml1-> root :Root :Group :Group :Name zip-xml/text))
  actual: (not (= "Inner" "Outer"))

Ran 1 tests containing 2 assertions.
1 failures, 0 errors.
Tests failed.

Comment by Benjamin Peter [ 20/Sep/16 2:41 PM ]

I found a "workaround" as you can see below and in zip-xml-bug-descent.tgz

(defn descent=
  [tagname]
  (fn [loc]
    ;; Only match children of loc, never loc itself.
    (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
            (zf/children-auto loc))))

  (testing "selecting the name of inner using descent"
    (is (= "Inner"
           (-> (zip-xml/xml1-> root :Root :Group (descent= :Group) :Name zip-xml/text)))))

In my case the problem seems to be the first clause of the or expression in tag=, which matches the element itself. I suspect it is there so that the root element can be selected. Is there any other need for it?

(defn tag=
  [tagname]
  (fn [loc]
    (or (= tagname (:tag (zip/node loc)))
        (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
                (zf/children-auto loc)))))

Maybe there should be a self predicate instead?
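
To make the effect of the or expression concrete, here is a small sketch that uses only clojure.zip. The tree, self-match?, child-locs and child-matches are hypothetical names introduced for illustration; they mirror the two clauses of the tag= shown above and are not library functions. child-locs is a stand-in for zf/children-auto.

```clojure
(require '[clojure.zip :as zip])

;; Hypothetical tree, roughly:
;; <Root><Group><Name>Outer</Name><Group><Name>Inner</Name></Group></Group></Root>
(def tree
  {:tag :Root
   :content [{:tag :Group
              :content [{:tag :Name :content ["Outer"]}
                        {:tag :Group
                         :content [{:tag :Name :content ["Inner"]}]}]}]})

;; Loc of the outer <Group>, i.e. where the second :Group step is applied.
(def outer-loc (-> tree zip/xml-zip zip/down))

;; First clause of the or: does the current loc itself carry the tag?
(defn self-match? [tagname loc]
  (= tagname (:tag (zip/node loc))))

;; Hypothetical stand-in for zf/children-auto: the child locs of loc.
(defn child-locs [loc]
  (take-while some? (iterate zip/right (zip/down loc))))

;; Second clause of the or: child locs that carry the tag.
(defn child-matches [tagname loc]
  (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
          (child-locs loc)))

(self-match? :Group outer-loc)
;; => true, so the or stops here and the child filter is never consulted

(map (comp :tag zip/node) (child-matches :Group outer-loc))
;; => (:Group), the inner element the query was meant to reach
```

If xml-> keeps the current loc whenever a predicate returns true (an assumption about the library's internals, not something stated in this ticket), the second :Group step stays on the outer element, which would explain why "Outer" is returned.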

Comment by Denis Shilov [ 09/Apr/17 1:33 AM ]

This is a regression from 0.1.1 to 0.1.2.

The commit that changed the behaviour is https://github.com/clojure/data.zip/commit/c5d6ca25c128f9fe937b11505c7c9736cfa2dd9a

Here is a simple test to check it. It passes in 0.1.1:

(def nestedxml
  (parse-str "<doc><area><area>1</area><unit>033</unit></area></doc>"))

(deftest same-nested-tags
  (is (= "1" (xml1-> nestedxml :area :area text)))
  (is (= "033" (xml1-> nestedxml :area :unit text))))
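
The test above relies on a parse-str helper and on xml1-> and text being referred into the test namespace, none of which are shown in the comment. A possible setup, assuming parse-str simply parses a string and wraps the result in a zipper (this definition is a guess for illustration, not taken from the ticket):

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip]
         '[clojure.data.zip.xml :refer [xml1-> text]])

;; Hypothetical helper: parse an XML string and wrap the result in a zipper.
(defn parse-str [s]
  (zip/xml-zip
    (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8")))))

(def nestedxml
  (parse-str "<doc><area><area>1</area><unit>033</unit></area></doc>"))

;; The non-nested path is not affected by the behaviour discussed here.
(xml1-> nestedxml :area :unit text)
;; => "033"
```

The nested :area :area query is the one whose result is in dispute between versions, so no expected value is given for it here.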

A related bug is DZIP-3.

Comment by Bastien Guerry [ 12/Apr/17 5:52 AM ]

For what it's worth, I've just been hit by this regression too.
I hope a proper fix can be released soon! Thanks in advance.

Comment by Benjamin Peter [ 12/Apr/17 6:02 AM ]

My example does not work with 0.1.1 either, so I doubt it is just a regression. @Denis Shilov, maybe you want to create another ticket for this.

Generated at Tue Jul 25 13:45:08 CDT 2017 using JIRA 4.4#649-r158309.