Kurtosis, Fat Tails, and Extremes

sketch demonstrating kurtosis

PLATYKURTIC left; LEPTOKURTIC right

Why must I explain “kurtosis”?

Manilla 21-year rainfall medians

The annual rainfall at Manilla, NSW has changed dramatically decade by decade since the record began in 1883. One way that it has changed is in the amount of rain each year, as shown in this graph that I posted earlier.

Another way, unrelated to the amount of rain, is in its kurtosis. Higher kurtosis brings more rainfall values that are extreme; lower kurtosis brings fewer. We would do well to learn more about rainfall kurtosis.

[A comment on the meaning of kurtosis by Peter Westfall is posted below.]

Describing Frequency Distributions

The Normal Distribution

Many things vary in a way that seems random: pure chance causes values to spread above and below the average.
If the values are counted into “bins” of equal width, the pattern is called a frequency-distribution. Randomness of this kind tends to make the pattern form the bell-shaped curve of the Normal Distribution.

Histogram of annual rainfall frequency at Manilla NSW

The values of annual total rainfall measured each year at Manilla have a frequency-distribution that is rather like that. This graph compares the actual distribution with a curve of Normal Distribution.

Moments of a Normal Distribution: (i) Mean, and (ii) Variance

The shape of any frequency-distribution is described in a simple way by a set of four numbers called moments. A Normal Distribution is described by just the first two of them.
The first moment is the Mean (or average), which says where the middle line of the values is. For Manilla annual rainfall, the Mean is 652 mm.
The second moment is the Variance, which is also the square of the Standard Deviation. This second moment says how widely spread or scattered the values are. For Manilla annual rainfall, the Standard Deviation is 156 mm.

Moments of other (non-normal) distributions: (iii) Skewness, and (iv) Kurtosis

The third moment, Skewness, describes how a frequency-distribution may have one tail longer than the other. When the tail on the right is longer, that is called right-skewness, and the skewness value is positive in that case. For the actual frequency-distribution of Manilla annual rainfall, the Skewness is slightly positive: +0.268. (That is mainly due to one extremely high rainfall value: 1192 mm in 1890.)
Kurtosis is the fourth moment of the distribution. It describes how the distribution differs from Normal by being higher or lower in its peak or its tails, as compared to its shoulders.
As it was defined at first, a Normal Distribution had the kurtosis value of 3, but I (and Excel) use the convention “excess kurtosis” from which 3 has been subtracted. Then the excess kurtosis value for a Normal Distribution is zero, while the kurtosis of other, non-normal distributions is either positive or negative.
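As a rough illustration of these four numbers, the sketch below computes them in Python. The function name `describe` and the sample values are assumptions made up for this example; they are not the Manilla record. The skewness and kurtosis use the small-sample formulas that are understood to match a spreadsheet's SKEW and KURT functions, so the kurtosis returned is the “excess” kind, with 3 already subtracted.

```python
from math import sqrt

def describe(values):
    """Return (mean, standard deviation, skewness, excess kurtosis)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four values")
    mean = sum(values) / n
    # Sample standard deviation (divisor n - 1); its square is the variance.
    sd = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    # Deviations from the mean, measured in standard deviations.
    z = [(x - mean) / sd for x in values]
    # Small-sample skewness: positive when the right tail is longer.
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # Small-sample excess kurtosis: close to zero for Normal data.
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return mean, sd, skew, kurt

# Made-up annual totals in mm, for illustration only.
sample = [480, 530, 610, 650, 655, 700, 760, 890]
mean, sd, skew, kurt = describe(sample)
print(f"mean={mean:.0f} mm, sd={sd:.0f} mm, "
      f"skewness={skew:+.3f}, excess kurtosis={kurt:+.3f}")
```

A flat, evenly spread set such as 1, 2, 3, 4, 5 gives zero skewness and a negative excess kurtosis, which is the platykurtic direction.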

Smoothed rainfall frequency and a platykurtic curve

Manilla’s total frequency distribution of annual rainfall has a Kurtosis of -0.427. As shown here (copied from an earlier post), I fitted a curve with suitably negative kurtosis to Manilla’s (smoothed) annual rainfall distribution.

Platykurtic, Mesokurtic, and Leptokurtic distributions

Karl Pearson invented the terms: platykurtic for (excess) kurtosis well below zero, mesokurtic for kurtosis near zero, and leptokurtic for kurtosis well above zero.
The sketch at the top of this page shows the typical shapes of platykurtic and leptokurtic curves.
(See the Note below: ‘The sketch by “Student”’.)

In the two graphs that follow, I show how a curve of Normal Distribution can be modified to be leptokurtic or platykurtic while remaining near-normal in shape. (See the note “Constructing the kurtosis adjuster”.)
In both of these graphs, I have drawn the curve of Normal Distribution in grey, with call-outs to locate the mean point and the two “shoulder” points that are one Standard Deviation each side of the mean.

A leptokurtic curve

By adding the “adjuster curve” (red) to the Normal curve, I get the classical leptokurtic shape (green) as was sketched by Gosset. It has a higher peak, lowered shoulders, and fat tails. The shape is like that of a volcanic cone: the peak is narrow, and the upper slopes steep. The slopes get gentler as they get lower, but not as gentle as on the Normal Curve.

A platykurtic curve

Annual Rainfall Extremes at Manilla NSW: IV

IV. Some distributions had heavy tails

Graph of history of heavy tails in Manilla annual rainfall

This graph is based on applying a 21-year sampling window to each year in the Manilla rainfall record, then adding smoothing. (See “Note about Sampling” below.)

“Heavy tails”

In the previous post I plotted only the most extreme high and low values of annual rainfall in each sampling window. Now, I choose two rainfall amounts (very high and very low) to define where the “Tails” of the frequency distribution begin. These Tails are the parts that I will call “extreme”. I count the number of values that qualify as extreme by being within the tails.
In this post, I recognise heavy tails, whereas before I recognised long tails.


Making the graph

The long-term Normal Distribution

The graph relies on the long-term Normal Distribution curve (“L-T Norm. Dist.” in the legend of the graph). That is the curve that I fitted earlier to the 134-year record of annual rainfall values at Manilla NSW. That earlier graph is copied here.

Histogram annual rainfall frequency Manilla NSW

I defined as “Extreme Values” those either below the 5th percentile or above the 95th percentile of the fitted Normal Distribution. That is to say, those that were more than 1.645 times the Standard Deviation (SD = 156 mm) below or above the Mean (M = 652 mm). When expressed in millimetres of annual rainfall, that is less than 395 mm or more than 909 mm.
These ‘Tails’ of the Normal Distribution each totalled 5% of the modelled population, making 10% when added together.
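The two limits can be checked with a short Python sketch using the standard library's `statistics.NormalDist`. The Mean and Standard Deviation are the values quoted above; the variable names are assumptions chosen for this example.

```python
from statistics import NormalDist

MEAN_MM = 652  # long-term Mean quoted in the text
SD_MM = 156    # long-term Standard Deviation quoted in the text

# The fitted long-term Normal Distribution.
fitted = NormalDist(mu=MEAN_MM, sigma=SD_MM)

# Rainfall amounts at the 5th and 95th percentiles of that curve.
low_limit = fitted.inv_cdf(0.05)
high_limit = fitted.inv_cdf(0.95)

# Number of Standard Deviations from the Mean at the 95th percentile.
z = NormalDist().inv_cdf(0.95)

print(f"z = {z:.3f}")
print(f"Low Tail begins below {low_limit:.0f} mm")
print(f"High Tail begins above {high_limit:.0f} mm")
```

Rounded to whole millimetres, this gives the 395 mm and 909 mm limits used here.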

The data

For each year’s 21-year sample, I counted those rainfall values that were lower than 395 mm (for the Low Tail) and those higher than 909 mm (for the High Tail). I added the two to give a count for Both Tails. To get a percentage value, I divided each count by 21 and expressed the result as a percentage.
I then found the ratio of this value to that of the fitted long-term Normal Distribution by dividing by 5% for each tail, and by 10% for both tails together. Ratios above 1.0 are Heavy Tails, and ratios below 1.0 are Light Tails.
That ratio, when smoothed, is plotted on the main graph at the head of the page.
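The counting and ratio steps described above might be sketched in Python as follows. The function names `tail_ratios` and `rolling_tail_ratios` and the sample window are assumptions made up for illustration. The sketch leaves out the smoothing and does not say how each window is aligned with its year, since those details are in the Note about Sampling rather than here.

```python
LOW_LIMIT_MM = 395   # start of the Low Tail (5th percentile of the fitted curve)
HIGH_LIMIT_MM = 909  # start of the High Tail (95th percentile of the fitted curve)
WINDOW = 21          # number of years in each sample

def tail_ratios(window_values, low=LOW_LIMIT_MM, high=HIGH_LIMIT_MM):
    """Ratios of observed to expected tail frequency for one sample."""
    n = len(window_values)
    low_count = sum(1 for v in window_values if v < low)
    high_count = sum(1 for v in window_values if v > high)
    low_pct = 100 * low_count / n
    high_pct = 100 * high_count / n
    return {
        "low": low_pct / 5.0,                 # expected 5% in the Low Tail
        "high": high_pct / 5.0,               # expected 5% in the High Tail
        "both": (low_pct + high_pct) / 10.0,  # expected 10% in Both Tails
    }

def rolling_tail_ratios(series, window=WINDOW):
    """Apply tail_ratios to every complete run of `window` consecutive years."""
    return [tail_ratios(series[i:i + window])
            for i in range(len(series) - window + 1)]

# Made-up 21-year sample: two very dry years and one very wet year.
example = [300, 350] + [650] * 18 + [1000]
ratios = tail_ratios(example)
print({name: round(value, 2) for name, value in ratios.items()})
```

In this made-up sample the Low Tail ratio is well above 1.0 (a Heavy Tail), while the High Tail ratio is just below 1.0 (a slightly Light Tail).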

Results

The resulting pattern of heavier and lighter tails, shown above, is similar to that found by using more and less extreme values, shown in the graph copied here.

Graph of history of extremes of annual rainfall

As before, there were fewer extremes in the 1900’s, 1910’s, 1920’s and 1930’s.
As before, there were more extremes in the 1940’s and 1950’s.
In the 1890’s, the “Tails” graph did not confirm the more extreme values that had been found earlier.

The 1990’s discrepancy

Extremes had been near normal through the last five decades in the earlier graph. By contrast, the “Tails” graph shows extremes in the most recent decade, the 1990’s, that were just as high as those in the 1950’s. Those two episodes differ, however: in the 1950’s only the high tail was heavy; in the 1990’s, only the low tail was heavy.
(For the 1990’s heavy low tail, see the Note below.)

The inadequacy of the data
