Kurtosis, Fat Tails, and Extremes

[Figure: sketch demonstrating kurtosis]


Why must I explain “kurtosis”?

[Figure: Manilla 21-year rainfall medians]

The annual rainfall at Manilla, NSW, has changed dramatically decade by decade since the record began in 1883. One way that it has changed is in the amount of rain each year, as shown in this graph that I posted earlier.

Another way, unrelated to the amount of rain, is in its kurtosis. Higher kurtosis brings more rainfall values that are extreme; lower kurtosis brings fewer. We would do well to learn more about rainfall kurtosis.

[A comment on the meaning of kurtosis by Peter Westfall is posted below.]

Describing Frequency Distributions

The Normal Distribution

Many things vary in a way that seems random: pure chance causes values to spread above and below the average.
If the values are counted into “bins” of equal width, the pattern of counts is called a frequency-distribution. When many small, independent chance effects add together, this pattern tends towards the bell-shaped curve of the Normal Distribution.

[Figure: Histogram of annual rainfall frequency at Manilla NSW]

The values of annual total rainfall measured each year at Manilla have a frequency-distribution that is rather like that. This graph compares the actual distribution with a curve of Normal Distribution.

Moments of a Normal Distribution: (i) Mean, and (ii) Variance

The shape of any frequency-distribution is described in a simple way by a set of four numbers called moments. A Normal Distribution is described by just the first two of them.
The first moment is the Mean (or average), which says where the middle line of the values is. For Manilla annual rainfall, the Mean is 652 mm.
The second moment is the Variance, which is also the square of the Standard Deviation. This second moment says how widely spread or scattered the values are. For Manilla annual rainfall, the Standard Deviation is 156 mm.

Moments of other (non-normal) distributions: (iii) Skewness, and (iv) Kurtosis

The third moment, Skewness, describes how a frequency-distribution may have one tail longer than the other. When the tail on the right is longer, that is called right-skewness, and the skewness value is positive in that case. For the actual frequency-distribution of Manilla annual rainfall, the Skewness is slightly positive: +0.268. (That is mainly due to one extremely high rainfall value: 1192 mm in 1890.)
Kurtosis is the fourth moment of the distribution. It describes how the distribution differs from Normal by being higher or lower in its peak or its tails, as compared to its shoulders.
As originally defined, kurtosis has the value 3 for a Normal Distribution, but I (and Excel) use the convention of “excess kurtosis”, from which 3 has been subtracted. The excess kurtosis of a Normal Distribution is then zero, while that of other, non-normal distributions is either positive or negative.
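The four numbers described above can be computed from a list of values as in the following minimal sketch. It assumes the bias-corrected sample formulas that Excel documents for its SKEW and KURT functions; the function name `describe` and the sample values are illustrative only and are not the Manilla data.

```python
import math

def describe(values):
    """Return mean, sample SD, skewness and excess kurtosis (Excel-style)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    # Sample variance (divisor n - 1), as in Excel's STDEV.
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    sd = math.sqrt(var)
    z = [(x - mean) / sd for x in values]
    # Bias-corrected skewness (the formula documented for Excel's SKEW).
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # Bias-corrected excess kurtosis (the formula documented for Excel's KURT).
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return mean, sd, skew, kurt

# Illustrative sample only (not Manilla rainfall data).
sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
mean, sd, skew, kurt = describe(sample)
print(f"mean={mean:.3f} sd={sd:.3f} skew={skew:.3f} excess kurtosis={kurt:.3f}")
```

Other software may use the simpler population formulas (divisor n throughout), which give slightly different skewness and kurtosis values for the same data.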

[Figure: Smoothed rainfall frequency and a platykurtic curve]

Manilla’s total frequency distribution of annual rainfall has a Kurtosis of -0.427. As shown here (copied from an earlier post), I fitted a curve with suitably negative kurtosis to Manilla’s (smoothed) annual rainfall distribution.

Platykurtic, Mesokurtic, and Leptokurtic distributions

Karl Pearson invented the terms: platykurtic for (excess) kurtosis well below zero, mesokurtic for kurtosis near zero, and leptokurtic for kurtosis well above zero.
The sketch at the top of this page shows the typical shapes of platykurtic and leptokurtic curves.
(See the Note below: ‘The sketch by “Student”’.)

In the two graphs that follow, I show how a curve of Normal Distribution can be modified to be leptokurtic or platykurtic while remaining near-normal in shape. (See the note “Constructing the kurtosis adjuster”.)
In both of these graphs, I have drawn the curve of Normal Distribution in grey, with call-outs to locate the mean point and the two “shoulder” points that are one Standard Deviation each side of the mean.

A leptokurtic curve

[Figure: A leptokurtic curve]

By adding the “adjuster curve” (red) to the Normal curve, I get the classical leptokurtic shape (green) as was sketched by Gosset. It has a higher peak, lowered shoulders, and fat tails. The shape is like that of a volcanic cone: the peak is narrow, and the upper slopes steep. The slopes get gentler as they get lower, but not as gentle as on the Normal Curve.
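The idea of adding an adjuster to the Normal curve can be tried numerically. The sketch below does not use the adjuster drawn in the graph; it swaps in the standard Gram-Charlier fourth-order correction, which is one common way to give a near-normal curve a chosen excess kurtosis while leaving the mean and variance unchanged. The function names are illustrative. Note the limitation that, for negative excess kurtosis, this curve dips slightly below zero in the far tails.

```python
import math

def normal_pdf(z):
    """Standard normal density at z (z is in Standard Deviations from the mean)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def adjuster(z, excess_kurtosis):
    """Gram-Charlier fourth-order term: the amount added to the normal curve."""
    he4 = z ** 4 - 6.0 * z ** 2 + 3.0  # Hermite polynomial He4(z)
    return normal_pdf(z) * (excess_kurtosis / 24.0) * he4

def adjusted_pdf(z, excess_kurtosis):
    """Normal curve plus adjuster (an approximation, not a true density in all cases)."""
    return normal_pdf(z) + adjuster(z, excess_kurtosis)

# Compare heights at the peak (z=0), a shoulder (z=1) and a tail (z=3).
for k in (+0.5, -0.427):
    print(f"excess kurtosis {k:+.3f}")
    for z in (0.0, 1.0, 3.0):
        ratio = adjusted_pdf(z, k) / normal_pdf(z)
        print(f"  z={z:.0f}: height is {ratio:.3f} x normal")
```

With positive excess kurtosis the ratio is above 1 at the peak and in the tail and below 1 at the shoulder; with negative excess kurtosis the pattern reverses.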

A platykurtic curve

[Figure: A platykurtic curve]



Annual Rainfall Extremes at Manilla NSW: II

II. Platykurtic, Bimodal Annual Rainfall

[Figure: Histogram of annual rainfall frequency at Manilla NSW]

Manilla’s 134 years of rainfall readings yield the graph above. There are several features to notice.


A ragged pattern

Despite having as many as 134 annual rainfall values, the graph is still ragged. Some of the 20 mm “bins” near the middle have less than 2% of the observations, while others have over 5%. The pattern has not yet become smooth.
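Counting values into 20 mm bins and expressing each bin as a percentage of all observations can be sketched as follows. The function name `bin_percentages` and the annual totals are hypothetical and serve only to show the arithmetic.

```python
import math
from collections import Counter

def bin_percentages(values, width=20):
    """Count values into equal-width bins; return {bin_start: percent of all values}."""
    counts = Counter(int(math.floor(v / width)) * width for v in values)
    n = len(values)
    return {start: 100.0 * c / n for start, c in sorted(counts.items())}

# Hypothetical annual totals in mm, for illustration only (not the Manilla record).
totals = [412, 655, 648, 701, 659, 530, 644, 873, 590, 650]
for start, pct in bin_percentages(totals).items():
    print(f"{start}-{start + 20} mm: {pct:.0f}%")
```

With 134 values, each year adds about 0.75% (100/134) to its bin, so a bin at 2% holds about three years and a bin at 5% about seven. A difference of a few years between neighbouring bins is enough to make the graph ragged.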

It is not near a normal distribution

Annual rainfall is often thought of as a random process, so its totals might be expected to follow a normal distribution. On the first two graphs I have drawn the curve of normal distribution that best fits the data.

[Figure: Smoothed annual rainfall frequency at Manilla NSW]

In this second graph, I have smoothed out the ragged shape of the plotted data, using a 9-point Gaussian smoothing. You can see more clearly where the actual curve (black) and the normal curve (magenta) differ. The dotted red line shows the differences directly:

The peak is low;
The shoulders, each side of the “peak”, are high;
Both of the tails are thin.

These three features describe a platykurtic curve: one with low kurtosis. This fact makes the highest and lowest annual rainfalls at Manilla less extreme than would be expected in a normal distribution.

Another departure from normality is that the curve is skewed: the tail on the left is shorter than the one on the right. That is a positive skew, but it is small. (By contrast, most of the rainfall distributions for individual months at Manilla have large positive skew. In them, the peak is well below the mean, and a tail extends to rare high values.)

In summary, four of the leading features of the shape of Manilla’s annual rainfall distribution are:

Mean or average: 652 mm per year.
Standard Deviation (measuring spread or scatter): 156 mm.
Skewness: 0.268 (slightly positive).
Kurtosis: -0.427 (strongly platykurtic).

A platykurtic curve matches the Manilla annual rainfall frequency curve to some extent.

The sum of two Gaussian curves gives a much better match.

Fitting a platykurtic near-normal curve

Much of the poor fit of a normal curve to the data is due to the data having a platykurtic distribution. Being platykurtic produces a reduced peak, high shoulders, and thin tails, as was noted.

[Figure: Smoothed rainfall frequency and a platykurtic curve]

In the third graph, I have drawn (in magenta) a new model distribution that is platykurtic. It is a transform of the normal distribution with a weighted sinusoidal correction. The new curve fits much better up both flanks of the data curve. It cannot be made to fit in the peak area between 500 mm and 820 mm.

Fitting a bimodal model made of two normal curves

The shoulders of the smoothed rainfall distribution curve (black) are not simply high; they are higher than the zone in the middle where the peak would normally be. There is a major mode (peaking at 5.1%) on the left, a minor mode (3.9%) on the right, and an antimode (3.7%) between them.
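A sum of two normal curves can produce this kind of shape, with a major mode, a minor mode, and an antimode between them. The sketch below shows the mechanics only: the component weights, means, and Standard Deviations are hypothetical and are not fitted to the Manilla data, so the heights it prints will not match the figures quoted above.

```python
import math

def normal_pdf(x, mean, sd):
    """Normal density at x for the given mean and Standard Deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_percent(x, comps, bin_width=20):
    """Percent of years expected per bin at x for a weighted sum of normal curves."""
    density = sum(w * normal_pdf(x, m, s) for w, m, s in comps)
    return 100.0 * bin_width * density

def turning_points(comps, lo=200, hi=1200, step=5):
    """Scan a grid of rainfall values (mm) and return the modes and antimodes found."""
    xs = list(range(lo, hi + 1, step))
    ys = [mixture_percent(x, comps) for x in xs]
    modes, antimodes = [], []
    for i in range(1, len(xs) - 1):
        if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]:
            modes.append((xs[i], ys[i]))
        elif ys[i] < ys[i - 1] and ys[i] < ys[i + 1]:
            antimodes.append((xs[i], ys[i]))
    return modes, antimodes

# Hypothetical components (weight, mean mm, SD mm); NOT the author's fitted values.
components = [(0.6, 550.0, 90.0), (0.4, 800.0, 90.0)]
modes, antimodes = turning_points(components)
for x, y in modes:
    print(f"mode near {x} mm: {y:.1f}% per 20 mm bin")
for x, y in antimodes:
    print(f"antimode near {x} mm: {y:.1f}% per 20 mm bin")
```

If the two component means are moved closer together, or one weight is made much smaller than the other, the dip disappears and the sum has a single mode with a shoulder.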

[Figure: Smoothed rainfall frequency and a bimodal curve]
