Details
-
Type:
Enhancement
-
Status:
Closed
-
Resolution: Declined
-
Affects Version/s: None
-
Fix Version/s: Backlog
-
Component/s: None
-
Labels:None
-
Approval:Test
Description
Reported by dimi...@gashinsky.com, May 09, 2009 ;; A lot of times I needed a padding option on the partition. This is ;; my attempt to solve this problem. Any suggestions are welcome. I ;; hope this patch or something similar will make its way into the ;; core. Some discussion that happened here: http://groups.google.com/group/clojure/browse_frm/thread/6fcc1dd999a5ec02?tvc=1 was integrated into the patch.
Activity
chouser@n01se.net said: It looks to me like the 3- and 4-arg bodies could be combined resulting in less code and no significant loss of performance. A pad of nil could be treated the same as no pad supplied, which would be different from a numeric pad (including 0).
scgilardi said: By "combined" do you mean that the 3 argument version should call the 4 argument version with an explicit pad of nil?
Currently, the 3 argument version does no padding. Instead it stops as soon as there are less than n args left:
user=> (partition 3 [1 2 3 4]) ((1 2 3)) </code></pre> In contrast, the 4 argument version with a pad of nil produces (untested): <pre><code>user=> (partition 3 nil [1 2 3 4]) ((1 2 3) (4 nil nil))
Interpreting nil passed to the 4 argument version as a request to get the behavior of the 3 argument version looks wrong to me. There would be no way to express both a desire for padding and that the padding value should be nil.
user=> (partition 3 [1 2 3 4]) ((1 2 3)) </code></pre> In contrast, the 4 argument version with a pad of nil produces (untested): <pre><code>user=> (partition 3 nil [1 2 3 4]) ((1 2 3) (4 nil nil))
scgilardi said: Looking into the issue further, I see I made a mistake in my previous comment. The padding is given as a sequence, not a value. However, the key difference between the 3 and 4 arg versions remains. The 3 argument version never returns a "short" sequence at the end. The 4 arg version can.
Along the lines of your suggestion, we could combine the two by changing the 4 arg version to interpret a pad value of "[]" to mean "return a short sequence at the end if necessary" and a pad value of "nil" to mean "never return a short sequence". This would involve interpreting "nil" different from "the empty sequence" where in many other contexts they're equivalent. I'm not sure whether or not saving some code is a good trade for introducing that subtle difference.
chouser@n01se.net said: I had also mistaken pad to be a number. I think you're right, producing different behavior when pad is an empty seq vs. nil is just asking for trouble. Thanks for taking a second look.
importer said: (In [[r:e0e8326871983be5615f5c0bc9dbf66140c7017f]]) add optional pad argument to partition. Fixes #120
Signed-off-by: Chouser <chouser@n01se.net>
Branch: master
digash said: The original version was using nil but Rich suggested not do it that way.
For different revisions take a look at this gist http://gist.github.com/109002
chouser@n01se.net said: Thanks for that link. Your final solution (not using nil) is already committed, but we should get those tests into clojure-test once it's location has been settled.
richhickey said: The new partition has this behavior:
user=> (partition 3 1 nil (range 10)) ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9) (8 9)) user=> (partition 3 1 [42] (range 10)) ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9) (8 9 42))
It seems to me that with a step of one it should never use the pad, and, more generally, once it has produced one partition containing the last element of the coll it should be done.
user=> (partition 3 1 nil (range 10)) ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9) (8 9)) user=> (partition 3 1 [42] (range 10)) ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9) (8 9 42))
richhickey said: An alternative rule is that every step within the supplied coll is yielded, padding as supplied, which would mean ending with:
(7 8 9) (8 9) (9) (7 8 9) (8 9 42) (9 42)
(7 8 9) (8 9) (9) (7 8 9) (8 9 42) (9 42)
scgilardi said: [file:ckiilEyzqr3PTqeJe5afGb]: proposed alternative (see my comment)
scgilardi said: I've uploaded part.clj which is another possible alternative. It handles a step of 1 without using the pad. Some invariants in the partitions it produces for the 4 argument case:
- for a step size of n, if you provide n - 1 objects in the padding sequence, all output sequences will be of size n
- for a step size n, the output sequences will begin with elements at offsets 0, n, 2n, 3n, ... in the original sequence until no such element exists.
Some outputs:
user=> (part/partition 3 1 nil (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9))
user=> (part/partition 3 1 [42] (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9))
---
user=> (part/partition 3 2 nil (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8))
user=> (part/partition 3 2 [42] (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42))
user=> (part/partition 3 2 [42 43] (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42 43))
---
user=> (part/partition 3 4 nil (range 6))
((0 1 2) (4 5))
user=> (part/partition 3 4 nil (range 7))
((0 1 2) (4 5 6))
user=> (part/partition 3 4 nil (range 8))
((0 1 2) (4 5 6))
user=> (part/partition 3 4 nil (range 9))
((0 1 2) (4 5 6) (8))
---
user=> (part/partition 3 3 [42 43] (range 3))
((0 1 2))
user=> (part/partition 3 3 [42 43] (range 4))
((0 1 2) (3 42 43))
user=> (part/partition 3 3 [42 43] (range 5))
((0 1 2) (3 4 42))
user=> (part/partition 3 3 [42 43] (range 6))
((0 1 2) (3 4 5))
user=> (part/partition 3 3 [42 43] (range 7))
((0 1 2) (3 4 5) (6 42 43))
</code></pre>
Here's the relevant portion of the code:
<pre><code> ([n step pad coll]
(lazy-seq
(when-let [s (seq coll)]
(let [p (take n s)]
(cond (= n (count p))
(cons p (partition n step pad (drop step s)))
(>= step (count p))
(list (take n (concat p pad)))))))))
Please see part.clj for details and a test program.
- for a step size of n, if you provide n - 1 objects in the padding sequence, all output sequences will be of size n
- for a step size n, the output sequences will begin with elements at offsets 0, n, 2n, 3n, ... in the original sequence until no such element exists.
user=> (part/partition 3 1 nil (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9))
user=> (part/partition 3 1 [42] (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9))
---
user=> (part/partition 3 2 nil (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8))
user=> (part/partition 3 2 [42] (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42))
user=> (part/partition 3 2 [42 43] (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42 43))
---
user=> (part/partition 3 4 nil (range 6))
((0 1 2) (4 5))
user=> (part/partition 3 4 nil (range 7))
((0 1 2) (4 5 6))
user=> (part/partition 3 4 nil (range 8))
((0 1 2) (4 5 6))
user=> (part/partition 3 4 nil (range 9))
((0 1 2) (4 5 6) (8))
---
user=> (part/partition 3 3 [42 43] (range 3))
((0 1 2))
user=> (part/partition 3 3 [42 43] (range 4))
((0 1 2) (3 42 43))
user=> (part/partition 3 3 [42 43] (range 5))
((0 1 2) (3 4 42))
user=> (part/partition 3 3 [42 43] (range 6))
((0 1 2) (3 4 5))
user=> (part/partition 3 3 [42 43] (range 7))
((0 1 2) (3 4 5) (6 42 43))
</code></pre>
Here's the relevant portion of the code:
<pre><code> ([n step pad coll]
(lazy-seq
(when-let [s (seq coll)]
(let [p (take n s)]
(cond (= n (count p))
(cons p (partition n step pad (drop step s)))
(>= step (count p))
(list (take n (concat p pad)))))))))
scgilardi said: Ugh, the second "invariant" I listed doesn't hold for step = 1. It appears to hold for other step values. Perhaps part.clj will still be useful to someone in coming up with a better solution.
richhickey said: I don't see why step of 1 should get special treatment. If the rule is the second one (yield partitions as long as step offsets are present in original coll), then
(part/partition 3 1 nil (range 10)) should end with: (7 8 9) (8 9) (9)
(part/partition 3 1 nil (range 10)) should end with: (7 8 9) (8 9) (9)
scgilardi said: This one appears to work:
([n step pad coll]
(lazy-seq
(when-let [s (seq coll)]
(cons
(take n (concat s pad))
(partition n step pad (drop step s)))))))
</code></pre>
<pre><code>user=> (run part)
n = 3, pad = nil
:max 5 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4) (4))
:max 6 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5) (5))
:max 7 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6) (6))
:max 8 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7) (7))
:max 9 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8) (8))
:max 5 :step 2 ((0 1 2) (2 3 4) (4))
:max 6 :step 2 ((0 1 2) (2 3 4) (4 5))
:max 7 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6))
:max 8 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7))
:max 9 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8))
:max 5 :step 3 ((0 1 2) (3 4))
:max 6 :step 3 ((0 1 2) (3 4 5))
:max 7 :step 3 ((0 1 2) (3 4 5) (6))
:max 8 :step 3 ((0 1 2) (3 4 5) (6 7))
:max 9 :step 3 ((0 1 2) (3 4 5) (6 7 8))
:max 5 :step 4 ((0 1 2) (4))
:max 6 :step 4 ((0 1 2) (4 5))
:max 7 :step 4 ((0 1 2) (4 5 6))
:max 8 :step 4 ((0 1 2) (4 5 6))
:max 9 :step 4 ((0 1 2) (4 5 6) (8))
n = 3, pad = [42 43]
:max 5 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 42) (4 42 43))
:max 6 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 42) (5 42 43))
:max 7 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 42) (6 42 43))
:max 8 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 42) (7 42 43))
:max 9 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 42) (8 42 43))
:max 5 :step 2 ((0 1 2) (2 3 4) (4 42 43))
:max 6 :step 2 ((0 1 2) (2 3 4) (4 5 42))
:max 7 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 42 43))
:max 8 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 42))
:max 9 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42 43))
:max 5 :step 3 ((0 1 2) (3 4 42))
:max 6 :step 3 ((0 1 2) (3 4 5))
:max 7 :step 3 ((0 1 2) (3 4 5) (6 42 43))
:max 8 :step 3 ((0 1 2) (3 4 5) (6 7 42))
:max 9 :step 3 ((0 1 2) (3 4 5) (6 7 8))
:max 5 :step 4 ((0 1 2) (4 42 43))
:max 6 :step 4 ((0 1 2) (4 5 42))
:max 7 :step 4 ((0 1 2) (4 5 6))
:max 8 :step 4 ((0 1 2) (4 5 6))
:max 9 :step 4 ((0 1 2) (4 5 6) (8 42 43))
nil
user=>
([n step pad coll]
(lazy-seq
(when-let [s (seq coll)]
(cons
(take n (concat s pad))
(partition n step pad (drop step s)))))))
</code></pre>
<pre><code>user=> (run part)
n = 3, pad = nil
:max 5 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4) (4))
:max 6 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5) (5))
:max 7 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6) (6))
:max 8 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7) (7))
:max 9 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8) (8))
:max 5 :step 2 ((0 1 2) (2 3 4) (4))
:max 6 :step 2 ((0 1 2) (2 3 4) (4 5))
:max 7 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6))
:max 8 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7))
:max 9 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8))
:max 5 :step 3 ((0 1 2) (3 4))
:max 6 :step 3 ((0 1 2) (3 4 5))
:max 7 :step 3 ((0 1 2) (3 4 5) (6))
:max 8 :step 3 ((0 1 2) (3 4 5) (6 7))
:max 9 :step 3 ((0 1 2) (3 4 5) (6 7 8))
:max 5 :step 4 ((0 1 2) (4))
:max 6 :step 4 ((0 1 2) (4 5))
:max 7 :step 4 ((0 1 2) (4 5 6))
:max 8 :step 4 ((0 1 2) (4 5 6))
:max 9 :step 4 ((0 1 2) (4 5 6) (8))
n = 3, pad = [42 43]
:max 5 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 42) (4 42 43))
:max 6 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 42) (5 42 43))
:max 7 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 42) (6 42 43))
:max 8 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 42) (7 42 43))
:max 9 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 42) (8 42 43))
:max 5 :step 2 ((0 1 2) (2 3 4) (4 42 43))
:max 6 :step 2 ((0 1 2) (2 3 4) (4 5 42))
:max 7 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 42 43))
:max 8 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 42))
:max 9 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42 43))
:max 5 :step 3 ((0 1 2) (3 4 42))
:max 6 :step 3 ((0 1 2) (3 4 5))
:max 7 :step 3 ((0 1 2) (3 4 5) (6 42 43))
:max 8 :step 3 ((0 1 2) (3 4 5) (6 7 42))
:max 9 :step 3 ((0 1 2) (3 4 5) (6 7 8))
:max 5 :step 4 ((0 1 2) (4 42 43))
:max 6 :step 4 ((0 1 2) (4 5 42))
:max 7 :step 4 ((0 1 2) (4 5 6))
:max 8 :step 4 ((0 1 2) (4 5 6))
:max 9 :step 4 ((0 1 2) (4 5 6) (8 42 43))
nil
user=>
scgilardi said: [file:dmo9mEyzWr3QuceJe5aVNr]: improved version
richhickey said: This looks fine to me, do you want to make up a patch Steve?
scgilardi said: [file:bolonky_8r3RY8eJe5afGb]: modified per comments
scgilardi said: The patch includes and updated doc string, the new 4 argument case, and a change to the whitespace in the 3 argument case for consistent indentation with the 4 argument case (current emacs clojure-mode). I can provide a patch that doesn't touch the 3 argument case whitespace if desired.
richhickey said: I think the doc should be even more explicit, something like:
... do not overlap. If no pad argument is supplied, will produce only complete partitions of size n, possibly not including items at the end if the size of coll is not a multiple of n. If the pad argument is supplied, will produce a partition at every offset present in the supplied collection, using the pad elements as necessary to pad trailing partitions up to n items each. If pad is nil or a collection containing fewer than n-1 items, ...
scgilardi said: How about this:
"Returns a lazy seq of lazy subseqs of coll, each of (nominally) n items. The subseqs begin at offsets 0, step, 2*step, etc. in coll. If step is not supplied, it defaults to n yielding adjacent, non-overlapping subseqs. If pad is not supplied, produces only complete subseqs of n items, possibly not including some items at the end of coll. If pad is supplied, produces a subseq at every offset present in coll, using any available items from pad to pad shorter subseqs up to n items. If pad is a seq of at least n-1 items, produces only complete padded subseqs of n items. If pad is shorter (or nil) trailing padded subseqs may be shorter."
I adopted the mathematical terminology that the partition is the operation (the division into parts) and that the individual pieces are not "partitions", but something else: part, block, chunk, or as I propose here, subseq. I like subseq because it embodies succinctly the fact that the items from coll are always kept sequential.
"Returns a lazy seq of lazy subseqs of coll, each of (nominally) n items. The subseqs begin at offsets 0, step, 2*step, etc. in coll. If step is not supplied, it defaults to n yielding adjacent, non-overlapping subseqs. If pad is not supplied, produces only complete subseqs of n items, possibly not including some items at the end of coll. If pad is supplied, produces a subseq at every offset present in coll, using any available items from pad to pad shorter subseqs up to n items. If pad is a seq of at least n-1 items, produces only complete padded subseqs of n items. If pad is shorter (or nil) trailing padded subseqs may be shorter."
richhickey said: I'm now convinced we are cramming 2 functions into one, and would prefer to see this new functionality as a new function:
(take-subs n coll) (take-subs n step coll)
'padding' isn't a necessary concept, as we have concat. take implies the possible partial subseqs. Also take-subs can yield a lazy seq of lazy seqs, but partition can't.
(take-subs n coll) (take-subs n step coll)
scgilardi said: [file:bnWYoqzBKr3RKeeJe5afGb]: take-subs + tests
scgilardi said: OK, paddiing isn't necessary because the same thing can be accomplished by using concat to append a padding sequence of exactly n-1 items to coll before processing it with partition.
partition can't return lazy subseqs because it counts them which (in the general case) will realize them.
ticket-120.patch contains modified docs for partition, removed arity 4 case from partition, take-subs implemented, tests for take-subs based on tests for partition, enhanced one sub-test for partition.
digash said: I like the new take-subs, but I cannot figure out how to use concat instead of partition without counting it and realizing the whole sequence.
The initial motivation was to use partition with destructuring and creating matrix from a sequences. I do not see an easy way to reproduce this case with the new implementation.
The old implementation:
(for [[a b c] (partition 3 3 (repeat 0) [1 2 3 4])] [a b c]) ==> ([1 2 3] [4 0 0])
The new implementation:
(take 10 (let [coll [1 2 3 4] p 3] (take (/ (count coll) p) (for [[a b c] (take-subs p (concat coll (repeat 0)))] [a b c])))) ==> ([1 2 3] [4 0 0])
scgilardi said: Here are two options for how I would write the example using the most recent patch:
user=> (for [[a b c] (partition 3 (concat [1 2 3 4] (take 3 (repeat 0))))] [a b c]) ([1 2 3] [4 0 0]) user=> (for [[a b c] (partition 3 (concat [1 2 3 4] [0 0 0]))] [a b c]) ([1 2 3] [4 0 0]) user=> </code></pre>or in the general case: <pre><code>user=> (defn padded-partition [n pad coll] (partition n (concat coll (take (dec n) pad)))) #'user/padded-partition user=> (for [[a b c] (padded-partition 3 (repeat 0) [1 2 3 4])] [a b c]) ([1 2 3] [4 0 0]) user=>
user=> (for [[a b c] (partition 3 (concat [1 2 3 4] (take 3 (repeat 0))))] [a b c]) ([1 2 3] [4 0 0]) user=> (for [[a b c] (partition 3 (concat [1 2 3 4] [0 0 0]))] [a b c]) ([1 2 3] [4 0 0]) user=> </code></pre>or in the general case: <pre><code>user=> (defn padded-partition [n pad coll] (partition n (concat coll (take (dec n) pad)))) #'user/padded-partition user=> (for [[a b c] (padded-partition 3 (repeat 0) [1 2 3 4])] [a b c]) ([1 2 3] [4 0 0]) user=>
scgilardi said: My first code example above was incorrect. For proper operation with all combinations of n and step, the concatenated padding seq needs to be exactly n-1 in length.
Here's the correction:
user=> (for [[a b c] (partition 3 (concat [1 2 3 4] (take 2 (repeat 0))))] [a b c]) ([1 2 3] [4 0 0]) user=> (for [[a b c] (partition 3 (concat [1 2 3 4] [0 0]))] [a b c]) ([1 2 3] [4 0 0]) user=>
user=> (for [[a b c] (partition 3 (concat [1 2 3 4] (take 2 (repeat 0))))] [a b c]) ([1 2 3] [4 0 0]) user=> (for [[a b c] (partition 3 (concat [1 2 3 4] [0 0]))] [a b c]) ([1 2 3] [4 0 0]) user=>
Converted from http://www.assembla.com/spaces/clojure/tickets/120
Attachments:
partition-with-pad.patch - https://www.assembla.com/spaces/clojure/documents/ctfCdww4qr3RbzeJe5afGb/download/ctfCdww4qr3RbzeJe5afGb
part.clj - https://www.assembla.com/spaces/clojure/documents/ckiilEyzqr3PTqeJe5afGb/download/ckiilEyzqr3PTqeJe5afGb
part.clj - https://www.assembla.com/spaces/clojure/documents/dmo9mEyzWr3QuceJe5aVNr/download/dmo9mEyzWr3QuceJe5aVNr
partition-with-pad-2.diff - https://www.assembla.com/spaces/clojure/documents/bolonky_8r3RY8eJe5afGb/download/bolonky_8r3RY8eJe5afGb
ticket-120.patch - https://www.assembla.com/spaces/clojure/documents/bnWYoqzBKr3RKeeJe5afGb/download/bnWYoqzBKr3RKeeJe5afGb