apache spark - Which operations preserve RDD order?

Question

Welcome To Ask or Share your Answers For Others

apache spark - Which operations preserve RDD order?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

apache spark - Which operations preserve RDD order?

RDD has a meaningful (as opposed to some random order imposed by the storage model) order if it was processed by sortBy(), as explained in this reply.

Now, which operations preserve that order?

E.g., is it guaranteed that (after a.sortBy())

a.map(f).zip(a) === 
a.map(x => (f(x),x))

How about

a.filter(f).map(g) === 
a.map(x => (x,g(x))).filter(f(_._1)).map(_._2)

what about

a.filter(f).flatMap(g) === 
a.flatMap(x => g(x).map((x,_))).filter(f(_._1)).map(_._2)

Here "equality" === is understood as "functional equivalence", i.e., there is no way to distinguish the outcome using user-level operations (i.e., without reading logs &c).

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-16T22:35:21+0000

All operations preserve the order, except those that explicitly do not. Ordering is always "meaningful", not just after a sortBy. For example, if you read a file (sc.textFile) the lines of the RDD will be in the order that they were in the file.

Without trying to give a complete list, map, filter and flatMap do preserve the order. sortBy, partitionBy, join do not preserve the order.

The reason is that most RDD operations work on Iterators inside the partitions. So map or filter just has no way to mess up the order. You can take a look at the code to see for yourself.

You may now ask: What if I have an RDD with a HashPartitioner. What happens when I use map to change the keys? Well, they will stay in place, and now the RDD is not partitioned by the key. You can use partitionBy to restore the partitioning with a shuffle.

Categories

apache spark - Which operations preserve RDD order?

apache spark - Which operations preserve RDD order?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags