The Power of Spark Accumulators
Spark Accumulators, discussed in an earlier blog, are significantly more powerful than Hadoop counters because they support multiple types of data. I have not seen discussions of accumulators holding large sets of data, something that some of the classes discussed here could certainly do. The code discussed here is available here.
The only thing required for an accumulator is an AccumulatorParam instance defining how to construct a zero element and how to combine multiple instances.
AccumulatorParam using a Long as a counter (accumulator)
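The original listing is not reproduced here, so the following is a minimal sketch of a counter-style AccumulatorParam. The local Param interface is a stand-in for org.apache.spark.AccumulatorParam&lt;Long&gt; so the merge logic can be shown self-contained; the method names zero and addInPlace mirror the Spark 1.x API.

```java
// Stand-in for org.apache.spark.AccumulatorParam<T>, kept local so the
// example compiles without a Spark dependency.
interface Param<T> {
    T zero(T initial);          // construct a zero element
    T addInPlace(T a, T b);     // combine two partial results
}

// A counter: zero is 0, combining is addition - the same semantics as a
// Hadoop counter.
class LongAccumulatorParam implements Param<Long> {
    @Override
    public Long zero(Long initial) {
        return 0L;
    }

    @Override
    public Long addInPlace(Long a, Long b) {
        return a + b;
    }
}
```

In real Spark 1.x code the accumulator would then be created with something like sc.accumulator(0L, new LongAccumulatorParam()).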
AccumulatorParam to accumulate a single string by concatenation
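A string accumulator follows the same pattern: zero is the empty string and combining is concatenation. Again the local Param interface stands in for org.apache.spark.AccumulatorParam&lt;String&gt;, and the newline separator is an assumption for readability.

```java
// Stand-in for org.apache.spark.AccumulatorParam<T>.
interface Param<T> {
    T zero(T initial);
    T addInPlace(T a, T b);
}

// Accumulates strings by concatenation; a newline between entries keeps
// the combined result readable.
class StringAccumulatorParam implements Param<String> {
    @Override
    public String zero(String initial) {
        return "";
    }

    @Override
    public String addInPlace(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a + "\n" + b;
    }
}
```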
AccumulatorParam using a Set of Strings as an accumulator
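This is the case that goes beyond Hadoop counters: the accumulated value is a whole collection. A sketch, again with a local stand-in for org.apache.spark.AccumulatorParam, might look like this:

```java
import java.util.HashSet;
import java.util.Set;

// Stand-in for org.apache.spark.AccumulatorParam<T>.
interface Param<T> {
    T zero(T initial);
    T addInPlace(T a, T b);
}

// Accumulates a set of strings; zero is an empty set and combining is
// set union, so duplicates across partitions collapse to one entry.
class StringSetAccumulatorParam implements Param<Set<String>> {
    @Override
    public Set<String> zero(Set<String> initial) {
        return new HashSet<>();
    }

    @Override
    public Set<String> addInPlace(Set<String> a, Set<String> b) {
        a.addAll(b);   // merge in place and return the combined set
        return a;
    }
}
```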
How to use accumulators
Accumulators may be used in two ways. First, the accumulator may be created as a final variable in the scope of the function that uses it - this is especially useful for lambdas, functions defined inline.
The following illustrates this approach.
Using an accumulator as a final local variable
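The listing itself is not shown here, so the following is a sketch of the capture pattern. The Accumulator class and countWords method are stand-ins invented for illustration (in real code the accumulator would come from sc.accumulator); what matters is that the lambda closes over the final local variable.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for Spark's Accumulator<Long>; in real code this object would
// be obtained from the SparkContext.
class Accumulator {
    private final AtomicLong value = new AtomicLong();
    void add(long v) { value.addAndGet(v); }
    long value() { return value.get(); }
}

class CaptureExample {
    // Hypothetical example: count words across lines. The accumulator is a
    // final local variable, so the lambda passed to forEach can capture it.
    static long countWords(List<String> lines) {
        final Accumulator wordCount = new Accumulator();
        lines.forEach(line -> wordCount.add(line.split("\\s+").length));
        return wordCount.value();
    }
}
```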
Alternatively, a function may be defined with an accumulator as a member variable. Here the function is defined as a class and used later. I prefer this approach over lambdas, especially if significant work is done in the function.
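A sketch of this member-variable style follows. The Function interface and Accumulator class are local stand-ins for org.apache.spark.api.java.function.Function and Spark's Accumulator, and LineLengthFunction is a hypothetical example: a named function class that receives its accumulator through the constructor.

```java
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for org.apache.spark.api.java.function.Function<T, R>.
interface Function<T, R> {
    R call(T t) throws Exception;
}

// Stand-in for Spark's Accumulator<Long>.
class Accumulator {
    private final AtomicLong value = new AtomicLong();
    void add(long v) { value.addAndGet(v); }
    long value() { return value.get(); }
}

// The accumulator is a member variable, injected at construction time.
// Each call both does the real work (returning the line length) and
// records, as a side effect, how many lines were processed.
class LineLengthFunction implements Function<String, Integer> {
    private final Accumulator linesSeen;

    LineLengthFunction(Accumulator linesSeen) {
        this.linesSeen = linesSeen;
    }

    @Override
    public Integer call(String line) {
        linesSeen.add(1);
        return line.length();
    }
}
```

Because the class is named and its dependencies are explicit, it is easier to unit-test than an inline lambda.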
In a later blog I will discuss using a base class for more sophisticated logging