IV. Some distributions had heavy tails
This graph is based on applying a 21-year sampling window to each year in the Manilla rainfall record, then smoothing the result. (See “Note about Sampling” below.)
In the previous post, I plotted only the most extreme high and low values of annual rainfall in each sampling window. Now I choose two rainfall amounts, one very low and one very high, to mark where the “Tails” of the frequency distribution begin. Values within these Tails are the ones I call “extreme”, and I count how many of them fall in each sampling window.
In this post I recognise heavy tails, whereas before I recognised long tails.
Making the graph
The long-term Normal Distribution
The graph relies on the long-term Normal Distribution curve (“L-T Norm. Dist.” in the legend of the graph). This is the curve that I fitted earlier to the 134-year record of annual rainfall values at Manilla NSW.
The graph is copied here.
I defined as “Extreme Values” those lying either below the 5th percentile or above the 95th percentile of the fitted Normal Distribution, that is, more than 1.645 times the Standard Deviation (SD = 156 mm) below or above the Mean (M = 652 mm). In millimetres of annual rainfall, that means less than 395 mm or more than 909 mm.
These “Tails” of the Normal Distribution each held 5% of the modelled population, making 10% when added together.
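The tail limits can be checked with Python's standard-library `statistics.NormalDist`. This is a minimal sketch, assuming only the fitted Mean and SD quoted above; it is not the tool actually used to draw the graph.

```python
from statistics import NormalDist

# Fitted long-term parameters quoted in the text (mm of annual rainfall).
MEAN_MM = 652.0
SD_MM = 156.0

fitted = NormalDist(mu=MEAN_MM, sigma=SD_MM)

# The 5th and 95th percentiles mark where the low and high tails begin.
low_limit = fitted.inv_cdf(0.05)
high_limit = fitted.inv_cdf(0.95)

# The same limits expressed as a multiple of the SD (z-score of the 95th percentile).
z95 = NormalDist().inv_cdf(0.95)

print(f"z = {z95:.3f}")                         # z = 1.645
print(f"low tail below {low_limit:.0f} mm")     # low tail below 395 mm
print(f"high tail above {high_limit:.0f} mm")   # high tail above 909 mm
```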
For each year’s 21-year sample, I counted those rainfall values that were lower than 395 mm (for the Low Tail) and those higher than 909 mm (for the High Tail). I added the two to give a count for Both Tails. To express each count as a percentage of the sample, I divided it by 21.
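The counting step for one 21-year sample might look like the sketch below. The function names and the toy sample are hypothetical, for illustration only; the numbers are invented and are not Manilla data.

```python
LOW_LIMIT_MM = 395.0   # values below this are in the Low Tail
HIGH_LIMIT_MM = 909.0  # values above this are in the High Tail


def tail_counts(sample, low=LOW_LIMIT_MM, high=HIGH_LIMIT_MM):
    """Return (low, high, both) counts of tail values in one sample."""
    n_low = sum(1 for v in sample if v < low)
    n_high = sum(1 for v in sample if v > high)
    return n_low, n_high, n_low + n_high


def tail_percentages(sample, **limits):
    """Express each count as a percentage of the sample size (21 for a full window)."""
    n = len(sample)
    return tuple(100.0 * c / n for c in tail_counts(sample, **limits))


# Toy 21-value sample (invented numbers, not the Manilla record).
toy = [350.0, 920.0, 960.0] + [650.0] * 18
print(tail_counts(toy))  # (1, 2, 3)
```

The comparisons are strict, matching “lower than 395 mm” and “higher than 909 mm”, so a value exactly on a limit is not counted.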
I then divided each percentage by the corresponding percentage under the fitted long-term Normal Distribution: 5% for each single tail, and 10% for both tails together. Ratios above 1.0 indicate Heavy Tails, and ratios below 1.0 indicate Light Tails.
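The ratio step can be sketched as follows. `tail_ratio` and `classify` are hypothetical helper names; only the expected shares (5% per tail, 10% for both) come from the text.

```python
EXPECTED_SHARE = {"low": 0.05, "high": 0.05, "both": 0.10}


def tail_ratio(count, n, tail):
    """Observed share of values in a tail divided by the share expected under the fitted normal."""
    return (count / n) / EXPECTED_SHARE[tail]


def classify(ratio):
    """Label a ratio as a heavy (>1), light (<1) or exactly normal (=1) tail."""
    if ratio > 1.0:
        return "heavy"
    if ratio < 1.0:
        return "light"
    return "normal"


# Example: two of 21 values in the low tail.
r = tail_ratio(2, 21, "low")
print(round(r, 2), classify(r))  # 1.9 heavy
```

With a 21-year window the ratio can never be exactly 1.0: one value in a single tail is 1/21 (about 4.8%), slightly under 5%, so that tail is classed as marginally light.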
That ratio, when smoothed, is plotted on the main graph at the head of the page.
The resulting pattern of heavier and lighter tails, shown above, is similar to the pattern of more and less extreme values found in the previous post, shown in the graph copied here.
As before, there were fewer extremes in the 1900’s, 1910’s, 1920’s and 1930’s.
As before, there were more extremes in the 1940’s and 1950’s.
In the 1890’s, the “Tails” graph did not confirm the more extreme values that had been found earlier.
The 1990’s discrepancy
Extremes had been near normal through the last five decades in the earlier graph. By contrast, the “Tails” graph shows extremes in the most recent decade, the 1990’s, that were just as high as those in the 1950’s. Those two episodes differ, however: in the 1950’s only the high tail was heavy; in the 1990’s, only the low tail was heavy.
(For the 1990’s heavy low tail, see the Note below.)
The inadequacy of the data
In the earlier post, each 21-year sample contained only two independent data points: the height of the highest value above the mean, and the depth of the lowest value below the mean.
This analysis uses counts of values that lie within the defined 5% tails of a fixed normal distribution. In a 21-year sample, the expected count for each tail is close to one (5% of 21 is 1.05). Actual counts turned out to be only zero, one, or two. The data are quantitative, but they take very few distinct values. Even the combined count for both tails yields only five possible ratio values: 0.00, 0.476, 0.952, 1.429, or 1.905.
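The short list of possible ratios follows directly from the window size. This sketch enumerates them, assuming counts of zero to two per tail (as observed), hence zero to four for both tails; `possible_ratios` is a hypothetical helper.

```python
WINDOW = 21


def possible_ratios(max_count, expected_share, n=WINDOW):
    """All ratio values reachable when a tail holds 0..max_count of n values."""
    return [round(c / n / expected_share, 3) for c in range(max_count + 1)]


print(round(WINDOW * 0.05, 2))    # 1.05 (expected count per tail)
print(possible_ratios(2, 0.05))   # [0.0, 0.952, 1.905]
print(possible_ratios(4, 0.10))   # [0.0, 0.476, 0.952, 1.429, 1.905]
```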
My use of heavy tails to recognise extreme values has not increased the data density by very much.
Yet, in my view, widening the tails beyond 5% to take in more values would depart too far from the concept of “extreme” to be useful. (See the Note below on “Use of 10% tails”.)
This analysis makes explicit that values are compared to the long-term mean and standard deviation of Manilla’s annual rainfall distribution. This may not be appropriate: both the mean and the standard deviation (or variance) of annual rainfall varied widely decade by decade, as shown in the post “Moments of Manilla’s Yearly Rainfall History”.
[Note added November 2019.
I have since increased the sampling density by a factor of 12. Instead of using 134 annual rainfall totals, I have used 12-monthly totals for each of the 1600 months of record. The post “Moments of Manilla’s Yearly Rainfall History”, linked above, is now replaced by “Moments of Manilla’s 12-monthly Rainfall”. The denser sampling gives more reliable estimates of kurtosis for use as a measure of extremes. See “Rainfall kurtosis vs. HadCRUT4, revised” and “Relations Among Rainfall Moments”.]
Note about Sampling
I chose a 21-year sampling window as a compromise: wide enough to contain enough points for analysis, but not so wide as to lose time resolution or too many years at each end of the record (1883 to 2016).
The first mid-year of a sampling window was 1893 and the last, 2006.
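The loss of ten years at each end of the record can be sketched as below, assuming a gap-free annual series keyed by year. `sliding_windows` is a hypothetical helper, and the placeholder record holds zeros, not real rainfall.

```python
def sliding_windows(values_by_year, window=21):
    """Yield (mid_year, sample) for each complete window centred on a year.

    Assumes values_by_year has one value for every year, with no gaps.
    """
    half = window // 2
    years = sorted(values_by_year)
    for i in range(half, len(years) - half):
        yield years[i], [values_by_year[y] for y in years[i - half:i + half + 1]]


# Placeholder record covering 1883-2016 (zeros, not real rainfall).
record = {year: 0.0 for year in range(1883, 2017)}
mid_years = [mid for mid, _ in sliding_windows(record)]
print(mid_years[0], mid_years[-1], len(mid_years))  # 1893 2006 114
```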
To remove jumps in the trace on the graph, I then applied a nine-point Gaussian smoothing function. That further reduced the years that could be plotted to those from 1897 to 2002.
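A nine-point Gaussian smoother can be sketched as a centred weighted moving average. The text does not give the width of the Gaussian, so `sigma=2.0` below is an assumption, as are the helper names; only the nine-point length and the loss of four points at each end come from the text.

```python
import math


def gaussian_weights(points=9, sigma=2.0):
    """Normalised Gaussian weights for offsets -points//2 .. +points//2.

    sigma is an assumed width; the source does not state it.
    """
    half = points // 2
    raw = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth(years, values, points=9, sigma=2.0):
    """Centred weighted moving average; drops points//2 entries at each end."""
    w = gaussian_weights(points, sigma)
    half = points // 2
    out_years = years[half:len(years) - half]
    out_values = [
        sum(wk * values[i + k] for k, wk in enumerate(w))
        for i in range(len(values) - points + 1)
    ]
    return out_years, out_values


mids = list(range(1893, 2007))   # window mid-years before smoothing
flat = [1.0] * len(mids)         # placeholder ratios, not real results
sm_years, sm_values = smooth(mids, flat)
print(sm_years[0], sm_years[-1])  # 1897 2002
```

Because the weights sum to one, a constant series passes through unchanged; the choice of `sigma` only affects how strongly neighbouring years are blended.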
Note: the heavy low tail in the 1990’s
The different results for low rainfall extremes in the 1990’s arose as follows:
The lowest annual rainfall of 351 mm was not especially low, being only 301 mm below the mean. The low tail was heavy, however.
Since the upper limit for the lower tail had been set at 395 mm, both the value of 351 mm (1994) and another value of 366 mm (2002) fell within the lower tail. In every 21-year sample that included these two years, the percentage within the low tail was 9.5%. When compared to the 5% in the lower tail of the fitted long-term Normal Distribution, that makes a very high ratio of 1.9.
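The arithmetic for those two years can be checked in a few lines. This sketch uses only figures quoted in the text (Mean 652 mm, SD 156 mm, low-tail limit 395 mm); the z-scores are an added illustration, not part of the original analysis.

```python
MEAN_MM, SD_MM = 652.0, 156.0
LOW_LIMIT_MM = 395.0
WINDOW = 21

low_years = {1994: 351.0, 2002: 366.0}  # the two low values named in the text

for year, rain in low_years.items():
    z = (rain - MEAN_MM) / SD_MM
    print(year, rain - MEAN_MM, round(z, 2), rain < LOW_LIMIT_MM)
# 1994 -301.0 -1.93 True
# 2002 -286.0 -1.83 True

share = len(low_years) / WINDOW  # share of a 21-year window in the low tail
print(f"{share:.1%}", round(share / 0.05, 1))  # 9.5% 1.9
```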
Note: Use of 10% tails
The comprehensive US Climate Extremes Index, introduced in 1996, counts the upper and lower 10% of data values as “Extremes”. Counting 20% of available data in this way adds stability to the analysis, but at the expense of including conditions that are not far removed from the ordinary. In a normal distribution, some of the values counted as “Extremes” would be only about 1.3 Standard Deviations from the Mean.
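The 1.3 figure is the z-score of the 90th percentile of a normal distribution. This sketch compares it with the 5% tails used in this post; the millimetre limits for 10% tails are an illustration computed from the fitted Mean and SD, not a result from the original analysis.

```python
from statistics import NormalDist

std = NormalDist()              # standard normal, mean 0, SD 1
z_10pct = std.inv_cdf(0.90)     # boundary of a 10% upper tail
z_5pct = std.inv_cdf(0.95)      # boundary of a 5% upper tail
print(round(z_10pct, 2), round(z_5pct, 3))  # 1.28 1.645

# Illustrative only: where 10% tails would begin for the fitted Manilla curve.
fitted = NormalDist(mu=652.0, sigma=156.0)
print(round(fitted.inv_cdf(0.10)), round(fitted.inv_cdf(0.90)))  # 452 852
```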