Calculation
There are two different datasets used for benchmarking, obtained from the contourpy.util functions simple() and random(). The former is the sum of two Gaussians and results in a small number of relatively short contours. The latter is random data that results in a large number of small contours and a few large contours; it is an extreme dataset designed to stress the contouring algorithms. Both have the option to generate masked data. All of the results shown are for a single chunk with a problem size n (== nx == ny) of 1000.
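As a rough sketch of what the simple dataset looks like, a sum-of-two-Gaussians field can be built with NumPy as below. The peak positions and widths here are made up for illustration, not the parameters used by simple() itself:

```python
import numpy as np

def two_gaussian_field(n=1000):
    """Sum of two Gaussian bumps on an n x n grid, similar in spirit to
    the simple() benchmark dataset (peak positions/widths are invented)."""
    x, y = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n))
    z = (np.exp(-((x - 0.3)**2 + (y - 0.4)**2) / 0.02)
         + 0.5 * np.exp(-((x - 0.7)**2 + (y - 0.6)**2) / 0.05))
    return x, y, z

x, y, z = two_gaussian_field(100)
```

A smooth field like this has few, short contour lines at any level, which is why the simple benchmark is dominated by the contouring sweep rather than by output allocation.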
As a guide to the complexity of the output, the unmasked datasets generate the following line contours in the benchmarks:

- simple: 38 lines of about 36 thousand points.
- random: 850 thousand lines of about 7.4 million points.

and the following filled contours:

- simple: 55 boundaries (39 outers and 16 holes) of about 76 thousand points.
- random: 1.7 million boundaries (half each of outers and holes) of about 15 million points.
Contour lines
For the simple dataset above the performance of serial for contour lines is the same regardless of LineType. It is about 20% faster than mpl2005 and significantly faster than mpl2014, with a speedup of 1.7-1.8.
For the random dataset above the performance of serial varies significantly by LineType. For LineType.SeparateCode, serial is 10-20% faster than mpl2005, and is about the same as mpl2014 if masked and 10% slower if not masked.
Other LineType options are faster. LineType.Separate has a speedup of about 1.4 compared to LineType.SeparateCode; most of the difference here is the time taken to allocate the extra 850 thousand NumPy arrays (one per line), and a small amount is the time taken to calculate the Matplotlib kind codes to put in them.
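The kind codes referred to here are Matplotlib Path code values (MOVETO = 1, LINETO = 2, CLOSEPOLY = 79). As a minimal sketch, not contourpy's own implementation, the per-line code array has this shape:

```python
import numpy as np

MOVETO, LINETO, CLOSEPOLY = 1, 2, 79  # Matplotlib Path code values

def kind_codes(num_points, closed=False):
    """Code array for one contour line: MOVETO for the first point,
    LINETO for the rest, and CLOSEPOLY replacing the final LINETO
    when the line is a closed loop."""
    codes = np.full(num_points, LINETO, dtype=np.uint8)
    codes[0] = MOVETO
    if closed:
        codes[-1] = CLOSEPOLY
    return codes

print(kind_codes(5))        # open line:   [1 2 2 2 2]
print(kind_codes(5, True))  # closed loop: [1 2 2 2 79]
```

The codes themselves are cheap to compute; it is allocating one such array per line, 850 thousand times over, that accounts for most of the extra cost.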
The chunked line types (LineType.ChunkCombinedCode, LineType.ChunkCombinedOffset and LineType.ChunkCombinedNan) have similar timings, with a speedup of 2.3-2.6 compared to LineType.SeparateCode. The big difference here is again array allocation: for a single chunk these LineType options allocate just two large arrays, whereas LineType.SeparateCode allocates 1.7 million NumPy arrays, i.e. two per line returned.
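The difference between the two layouts can be sketched as packing a per-line list of arrays into one combined points array plus an offsets array, in the style of LineType.ChunkCombinedOffset. This is an illustrative conversion, not contourpy's internal code:

```python
import numpy as np

def combine_lines(lines):
    """Pack a list of (N_i, 2) line arrays into a single points array
    plus an offsets array: line i is points[offsets[i]:offsets[i + 1]].
    Two allocations in total, regardless of the number of lines."""
    offsets = np.cumsum([0] + [len(line) for line in lines])
    points = np.concatenate(lines) if lines else np.empty((0, 2))
    return points, offsets

lines = [np.zeros((3, 2)), np.ones((2, 2))]  # two short example lines
points, offsets = combine_lines(lines)
# points has shape (5, 2); offsets is [0, 3, 5]
second = points[offsets[1]:offsets[2]]        # recover line 1 by slicing
```

With 850 thousand lines the separate types pay for that many small allocations (twice over, with codes), while the combined types pay for two.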
Filled contours
For the simple dataset above the performance of serial for filled contours is the same regardless of FillType. It is 10-20% faster than mpl2005 and significantly faster than mpl2014, with a speedup of 1.7-1.8.
For the random dataset above the performance of serial varies significantly by FillType. For FillType.OuterCode it is faster than mpl2014, with a speedup of 1.2-1.3. It is also faster than mpl2005, but only the corner_mask=False option is shown in full as the unmasked benchmark here is off the scale at 11.2 seconds. The mpl2005 algorithm calculates points for outer and hole boundaries in an interleaved format that needs to be reordered, and this approach scales badly for a large outer boundary containing many holes, as occurs here for unmasked z.
Other FillType options are faster, although FillType.OuterOffset is only marginally so as it creates the same number of NumPy arrays as FillType.OuterCode but the arrays are shorter.
The other four FillType can be grouped in pairs: FillType.ChunkCombinedCodeOffset and FillType.ChunkCombinedOffsetOffset have a speedup of 1.8-2 compared to FillType.OuterCode, whereas FillType.ChunkCombinedCode and FillType.ChunkCombinedOffset are marginally faster with a speedup of 1.9-2. The speed improvement has the usual explanation: these types only allocate a small number of arrays, whereas FillType.OuterCode allocates 1.7 million. FillType.ChunkCombinedCode and FillType.ChunkCombinedOffset are slightly faster than the other two because they do not determine the relationships between outer boundaries and their holes; they treat all boundaries the same.
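That extra outer/hole bookkeeping can be pictured as a second level of offsets, in the spirit of FillType.ChunkCombinedOffsetOffset: the first offsets array delimits boundaries within the points array, and a second groups consecutive boundaries into an outer followed by its holes. The data below is made up for illustration and is not output from contourpy:

```python
import numpy as np

# First-level offsets: boundary i is points[offsets[i]:offsets[i + 1]].
offsets = np.array([0, 4, 7, 10, 14])   # 4 boundaries in this chunk
# Second-level offsets: outer j consists of boundaries
# outer_offsets[j]:outer_offsets[j + 1], the first being the outer
# boundary and the rest its holes.
outer_offsets = np.array([0, 3, 4])     # 2 outers

holes_per_outer = np.diff(outer_offsets) - 1
print(holes_per_outer)  # first outer has 2 holes, the second has none
```

Types that skip the second level treat every boundary identically, which is why FillType.ChunkCombinedCode and FillType.ChunkCombinedOffset come out slightly ahead.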