
[DZIP-7] Make xml-> return empty strings instead of skipping non-matching nodes Created: 13/Apr/17  Updated: 13/Apr/17

Status: Open
Project: data.zip
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Bastien Guerry Assignee: Alex Miller
Resolution: Unresolved Votes: 0
Labels: data, xml, zip
Environment:

Any



 Description   

xml-> returns a variable number of strings, depending on what the predicates match.

When using xml-> to filter XML content, it would be very handy if xml-> returned an empty string whenever the predicates find no match. That would make it possible to process malformed XML more accurately.

I've created an example here:
https://gist.github.com/bzg/a35c4f986583e490480b5932d601ffed
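
To illustrate the request without following the link, here is a minimal sketch. It is not taken from the gist: the sample XML and the names `doc`, `prices`, and `item-prices` are made up for illustration, and it assumes clojure.data.xml is available for parsing.

```clojure
;; Sketch only: `doc`, `prices`, `item-prices`, and the sample XML are hypothetical.
(require '[clojure.data.xml :as xml]
         '[clojure.zip :as zip]
         '[clojure.data.zip.xml :as zx])

;; Two <item> elements; the second one has no <price>.
(def doc
  (zip/xml-zip
   (xml/parse-str
    "<items><item><name>a</name><price>1</price></item><item><name>b</name></item></items>")))

;; xml-> skips the item without a <price>, so only one string comes back
;; and the result can no longer be lined up with the two items.
(def prices (zx/xml-> doc :item :price zx/text))

;; Possible workaround: select the items first, then query each one with
;; xml1-> and substitute "" when nothing matches.
(defn item-prices
  [root]
  (map #(or (zx/xml1-> % :price zx/text) "")
       (zx/xml-> root :item)))
```

With this workaround the result has one entry per item, which is the behaviour the ticket asks xml-> to provide directly.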






[DZIP-6] Sub Entries with the same Name can't be selected Created: 19/Sep/16  Updated: 02/Oct/17

Status: Open
Project: data.zip
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Benjamin Peter Assignee: Alex Miller
Resolution: Unresolved Votes: 4
Labels: None
Environment:

Leiningen 2.7.0 on Java 1.8.0_91 Java HotSpot(TM) 64-Bit Server VM


Attachments: File zip-xml-bug-descent.tgz     File zip-xml-bug.tgz    

 Description   

I want to select the content of an XML element named "Group" that is itself nested inside another element named "Group", using zip-xml/xml1->. Instead of the content of the inner "Group", the outer "Group" element matches. The obvious selector path does not work, and I suspect this is a defect.

Please see the minimal example:

XML:

root:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Root>
  <Group>
    <Name>Outer</Name>
    <Group>
      <Name>Inner</Name>
    </Group>
  </Group>
</Root>

(zip-xml/xml1-> root :Root :Group :Group :Name zip-xml/text)
"Outer"

Leiningen project with unit-tests as attachment. Run: lein test

$ lein test
lein test :only zip-xml-bug.core-test/parsing-group-elements

FAIL in (parsing-group-elements) (core_test.clj:34)
selecting the name of inner
expected: (= "Inner" (zip-xml/xml1-> root :Root :Group :Group :Name zip-xml/text))
  actual: (not (= "Inner" "Outer"))

Ran 1 tests containing 2 assertions.
1 failures, 0 errors.
Tests failed.


 Comments   
Comment by Benjamin Peter [ 20/Sep/16 2:41 PM ]

I found a "workaround", shown below and included in zip-xml-bug-descent.tgz:

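(defn descent=
  [tagname]
  (fn [loc]
    ;; Keep only child elements whose tag is tagname; never match loc itself.
    (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
            ;; assumed: same collection argument as in tag= below
            (zf/children-auto loc))))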

  (testing "selecting the name of inner using descent"
    (is (= "Inner"
           (-> (zip-xml/xml1-> root :Root :Group (descent= :Group) :Name zip-xml/text)))))

In my case the problem seems to be the first branch of the or expression in tag=, which matches the element itself. I suspect it is there so that the root element can be selected. Is there any other need for it?

(defn tag=
  [tagname]
  (fn [loc]
    (or (= tagname (:tag (zip/node loc)))
        (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
            (zf/children-auto loc)))))

Maybe there should be a self predicate instead?
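
A rough sketch of what that split could look like. The names `self=` and `child=` are hypothetical and not part of data.zip; the sketch also assumes clojure.data.xml for parsing and uses a compact copy of the XML from the description.

```clojure
;; Sketch only: self= and child= are hypothetical, not part of data.zip.
(require '[clojure.data.xml :as xml]
         '[clojure.zip :as zip]
         '[clojure.data.zip :as zf]
         '[clojure.data.zip.xml :as zip-xml])

(defn self=
  "Matches only the current location, when its tag is tagname."
  [tagname]
  (fn [loc]
    (= tagname (:tag (zip/node loc)))))

(defn child=
  "Matches only child elements tagged tagname, never the location itself."
  [tagname]
  (fn [loc]
    (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
            (zf/children-auto loc))))

(def root
  (zip/xml-zip
   (xml/parse-str
    "<Root><Group><Name>Outer</Name><Group><Name>Inner</Name></Group></Group></Root>")))

;; Each step says explicitly whether it means the element itself or its children.
(def inner-name
  (zip-xml/xml1-> root
                  (self= :Root)
                  (child= :Group)
                  (child= :Group)
                  (child= :Name)
                  zip-xml/text))
```

Here `inner-name` should come out as "Inner", because no step can match the outer Group a second time.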

Comment by Denis Shilov [ 09/Apr/17 1:33 AM ]

This is a regression from 0.1.1 to 0.1.2.

The commit that changed the behaviour is https://github.com/clojure/data.zip/commit/c5d6ca25c128f9fe937b11505c7c9736cfa2dd9a

Here is a simple test to check it. It passes in 0.1.1:

(def nestedxml
  (parse-str "<doc><area><area>1</area><unit>033</unit></area></doc>"))

(deftest same-nested-tags
  (is (= "1" (xml1-> nestedxml :area :area text)))
  (is (= "033" (xml1-> nestedxml :area :unit text))))

A related bug is DZIP-3.

Comment by Bastien Guerry [ 12/Apr/17 5:52 AM ]

For what it's worth, I've just been hit by this regression too.
I hope a proper fix can be released soon! Thanks in advance.

Comment by Benjamin Peter [ 12/Apr/17 6:02 AM ]

My example does not work with 0.1.1 either, so I doubt it is just a regression. @Denis Shilov, maybe you want to create another ticket for this.

Comment by Paul Dlug [ 06/Sep/17 3:32 PM ]

Any update on this? We're also hitting this bug.

Comment by Paul Dlug [ 06/Sep/17 3:46 PM ]

This worked for us with 0.2.0-alpha2 when copying the latest implementation of tag= without the or part, as Benjamin Peter suggested. I'm not sure what the best fix is here. It seems tricky to accommodate the previous patch that allows tag= to match the root. Matching descendants is clearly the more frequent case, so perhaps a root= predicate is preferable to introducing something like descendant= throughout the xml-> matchers. Of course, if there is a fix to tag= that supports both and that I'm not seeing, that would be best, but it seems tricky.

Comment by Brian Stearns [ 02/Oct/17 11:03 PM ]

Ran into the same thing just today. Posted this (https://stackoverflow.com/questions/46535423/cant-access-deeply-nested-xml-with-clojure-data-zip-xml) a bit ago, but now that I've found this I know I'm not alone/crazy.

@bpeter thanks for the workaround. Does anyone know whether a fix for this is making it into 0.1.2? Is there anything I can do to help make that happen?





Generated at Sun Oct 22 01:42:50 CDT 2017 using JIRA 4.4#649-r158309.