Calculation
There are two different datasets used for benchmarking, obtained from the util functions simple() and random(). The former is the sum of two Gaussians and results in a small number of relatively short contours. The latter is random data that results in a large number of small contours and a few large contours; this is an extreme dataset designed to stress the contouring algorithms. Both have the option to generate masked data. All of the results shown are for a single chunk with a problem size n (== nx == ny) of 1000.
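As a rough sketch of what the simple dataset looks like (an assumed approximation using NumPy, not the util function's actual implementation), the sum of two Gaussians on an n x n grid can be generated like this; the helper name simple_like and the bump centres/widths are hypothetical:

```python
import numpy as np

def simple_like(n=1000):
    # Hypothetical approximation of the "simple" benchmark dataset:
    # the sum of two Gaussian bumps on an n x n grid.
    x, y = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n))
    z = (np.exp(-((x - 0.3) ** 2 + (y - 0.4) ** 2) / 0.02)
         + np.exp(-((x - 0.7) ** 2 + (y - 0.6) ** 2) / 0.05))
    return x, y, z

x, y, z = simple_like(1000)
```

Contouring such a smooth field at a handful of levels yields the small number of relatively short contours described above.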
As a guide to the complexity of the output, the unmasked datasets generate the following line contours in the benchmarks:

- simple: 38 lines totalling about 36 thousand points.
- random: 850 thousand lines totalling about 7.4 million points.

and the following filled contours:

- simple: 55 boundaries (39 outers and 16 holes) totalling about 76 thousand points.
- random: 1.7 million boundaries (half outers and half holes) totalling about 15 million points.
Contour lines
For the simple dataset above, the performance of serial for contour lines is the same regardless of LineType. It is slightly faster than mpl2005 and significantly faster than mpl2014, with a speedup of 1.7-1.9.
For the random dataset above, the performance of serial varies significantly by LineType. For LineType.SeparateCode, serial is 10-15% faster than mpl2005, and is slightly faster than mpl2014 when z is masked but about 5% slower when z is not masked.
The other LineType options are faster. LineType.Separate has a speedup of about 1.4 compared to LineType.SeparateCode; most of the difference here is the time taken to allocate the extra 850 thousand NumPy arrays (one per line), and a small amount is the time taken to calculate the Matplotlib kind codes to put in them.
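To make the kind-code cost concrete, here is a minimal sketch of what such codes look like. The values 1, 2 and 79 are Matplotlib's actual Path codes (MOVETO, LINETO, CLOSEPOLY); the helper line_kind_codes is hypothetical, for illustration only:

```python
import numpy as np

# Matplotlib Path kind codes (values as defined by matplotlib.path.Path).
MOVETO, LINETO, CLOSEPOLY = 1, 2, 79

def line_kind_codes(num_points, closed=False):
    """Kind codes for one contour line: MOVETO for the first point,
    LINETO for the rest, optionally CLOSEPOLY to end a closed loop."""
    codes = np.full(num_points, LINETO, dtype=np.uint8)
    codes[0] = MOVETO
    if closed:
        codes[-1] = CLOSEPOLY
    return codes

open_codes = line_kind_codes(5)          # [1, 2, 2, 2, 2]
closed_codes = line_kind_codes(5, True)  # [1, 2, 2, 2, 79]
```

One such small array per line means 850 thousand extra allocations for the random dataset, which is where the time goes.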
Both LineType.ChunkCombinedCode and LineType.ChunkCombinedOffset have similar timings, with a speedup of 2.3-2.5 compared to LineType.SeparateCode. The big difference here is again in array allocation: for a single chunk these two LineType allocate just two large arrays, whereas LineType.SeparateCode allocates 1.7 million NumPy arrays, i.e. two per line returned.
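To illustrate why the combined forms avoid most allocations, here is a minimal sketch (an assumed layout, based on the description above, not the library's exact return types) of how all of a chunk's lines can live in one points array with an offsets array marking where each line starts:

```python
import numpy as np

# One (N, 2) array holds every point in the chunk; offsets[i] is the
# index at which line i starts, so line i is points[offsets[i]:offsets[i+1]].
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0],  # line 0 (3 points)
                   [2.0, 2.0], [3.0, 2.0]])             # line 1 (2 points)
offsets = np.array([0, 3, 5])

# Individual lines are views into the big array: no per-line allocation
# is needed to build the result, only to consume it.
lines = [points[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
```

Two arrays per chunk, however many lines it contains, versus two freshly allocated arrays per line for the Separate forms.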
Filled contours
For the simple dataset above, the performance of serial for filled contours is the same regardless of FillType. It has about the same performance as mpl2005 and is significantly faster than mpl2014, with a speedup of 1.75-1.9.
For the random dataset above, the performance of serial varies significantly by FillType. For FillType.OuterCode it is faster than mpl2014, with a speedup of about 1.3. It is also faster than mpl2005, but only the corner_mask=False option is shown in full as the unmasked benchmark here is off the scale at 12.1 seconds. The mpl2005 algorithm calculates points for outer and hole boundaries in an interleaved format that needs to be reordered, and this approach scales badly for a large outer boundary containing many holes, as occurs here for unmasked z.
The other FillType options are faster, although FillType.OuterOffset is only marginally so, as it creates the same number of NumPy arrays as FillType.OuterCode but the arrays are shorter.
The other four FillType options can be grouped in pairs: FillType.ChunkCombinedCodeOffset and FillType.ChunkCombinedOffsetOffset have a speedup of 1.8-1.95 compared to FillType.OuterCode, whereas FillType.ChunkCombinedCode and FillType.ChunkCombinedOffset are marginally faster, with a speedup of 1.9-2.05. The speed improvement has the usual explanation: they allocate only a small number of arrays, whereas FillType.OuterCode allocates 1.7 million. FillType.ChunkCombinedCode and FillType.ChunkCombinedOffset are slightly faster than the other two because they do not determine the relationships between outer boundaries and their holes; they treat all boundaries the same.
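The outer-to-hole bookkeeping that the faster pair skips can be sketched as an "offset of offsets" layout. This is an assumed illustration of the idea, not the library's exact data structure: one offsets array slices the combined points array into boundaries, and a second offsets array groups consecutive boundaries into an outer plus its holes:

```python
import numpy as np

# boundary_offsets slices the chunk's combined points array into
# 3 boundaries; outer_offsets groups those boundaries so that outer i
# owns boundaries outer_offsets[i]:outer_offsets[i+1], the first being
# the outer itself and the rest its holes.
boundary_offsets = np.array([0, 4, 7, 10])  # 3 boundaries, 10 points total
outer_offsets = np.array([0, 2, 3])         # outer 0: boundaries 0-1,
                                            # outer 1: boundary 2

points_per_boundary = np.diff(boundary_offsets)

hole_counts = []
for i in range(len(outer_offsets) - 1):
    first, last = outer_offsets[i], outer_offsets[i + 1]
    hole_counts.append(int(last - first - 1))  # boundaries after the first are holes
```

Computing the second level of grouping is the extra work that FillType.ChunkCombinedCode and FillType.ChunkCombinedOffset avoid by treating every boundary identically.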