Kurtosis, Fat Tails, and Extremes

sketch demonstrating kurtosis

PLATYKURTIC left; LEPTOKURTIC right

Why must I explain “kurtosis”?

Manilla 21-year rainfall medians

The annual rainfall at Manilla, NSW has changed dramatically decade by decade since the record began in 1883. One way that it has changed is in the amount of rain each year, as shown in this graph that I posted earlier.

Another way, unrelated to the amount of rain, is in its kurtosis. Higher kurtosis brings more rainfall values that are extreme; lower kurtosis brings fewer. We would do well to learn more about rainfall kurtosis.

[A comment on the meaning of kurtosis by Peter Westfall is posted below.]

Describing Frequency Distributions

The Normal Distribution

Many things vary in a way that seems random: pure chance causes values to spread above and below the average.
If the values are counted into “bins” of equal width, the pattern is called a frequency-distribution. Randomness makes this pattern form the unique bell-shaped curve of Normal Distribution.

Histogram of annual rainfall frequency at Manilla NSW

The values of annual total rainfall measured each year at Manilla have a frequency-distribution that is rather like that. This graph compares the actual distribution with a curve of Normal Distribution.

Moments of a Normal Distribution: (i) Mean, and (ii) Variance

The shape of any frequency-distribution is described in a simple way by a set of four numbers called moments. A Normal Distribution is described by just the first two of them.
The first moment is the Mean (or average), which says where the middle line of the values is. For Manilla annual rainfall, the Mean is 652 mm.
The second moment is the Variance, which is also the square of the Standard Deviation. This second moment says how widely spread or scattered the values are. For Manilla annual rainfall, the Standard Deviation is 156 mm.

Moments of other (non-normal) distributions: (iii) Skewness, and (iv) Kurtosis

The third moment, Skewness, describes how a frequency-distribution may have one tail longer than the other. When the tail on the right is longer, that is called right-skewness, and the skewness value is positive in that case. For the actual frequency-distribution of Manilla annual rainfall, the Skewness is slightly positive: +0.268. (That is mainly due to one extremely high rainfall value: 1192 mm in 1890.)
Kurtosis is the fourth moment of the distribution. It describes how the distribution differs from Normal by being higher or lower in its peak or its tails, as compared to its shoulders.
As it was defined at first, a Normal Distribution had the kurtosis value of 3, but I (and Excel) use the convention “excess kurtosis” from which 3 has been subtracted. Then the excess kurtosis value for a Normal Distribution is zero, while the kurtosis of other, non-normal distributions is either positive or negative.
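As a minimal sketch of how the four numbers can be computed, the function below uses the sample-corrected formulas that Excel documents for its STDEV, SKEW and KURT functions. The function name `moments` is an illustrative choice, not something from the original analysis.

```python
import math


def moments(values):
    """Return (mean, standard deviation, skewness, excess kurtosis).

    Skewness and excess kurtosis follow the sample-corrected formulas
    documented for Excel's SKEW and KURT functions.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least four values")
    mean = sum(values) / n
    # Sample standard deviation (divisor n - 1), as in Excel's STDEV.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    z = [(v - mean) / sd for v in values]
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return mean, sd, skew, kurt
```

Because 3 is already subtracted inside the formula, a result near zero means near-normal, a negative result platykurtic, and a positive result leptokurtic.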

Smoothed rainfall frequency and a platykurtic curve

Manilla’s total frequency distribution of annual rainfall has a Kurtosis of -0.427. As shown here (copied from an earlier post), I fitted a curve with suitably negative kurtosis to Manilla’s (smoothed) annual rainfall distribution.

Platykurtic, Mesokurtic, and Leptokurtic distributions

Karl Pearson invented the terms: platykurtic for (excess) kurtosis well below zero, mesokurtic for kurtosis near zero, and leptokurtic for kurtosis well above zero.
The sketch at the top of this page shows the typical shapes of platykurtic and leptokurtic curves.
(See the Note below: ‘The sketch by “Student”’.)

In the two graphs that follow, I show how a curve of Normal Distribution can be modified to be leptokurtic or platykurtic while remaining near-normal in shape. (See the note “Constructing the kurtosis adjuster”.)
In both of these graphs, I have drawn the curve of Normal Distribution in grey, with call-outs to locate the mean point and the two “shoulder” points that are one Standard Deviation each side of the mean.

A leptokurtic curve

By adding the “adjuster curve” (red) to the Normal curve, I get the classical leptokurtic shape (green) as was sketched by Gosset. It has a higher peak, lowered shoulders, and fat tails. The shape is like that of a volcanic cone: the peak is narrow, and the upper slopes steep. The slopes get gentler as they get lower, but not as gentle as on the Normal Curve.

A platykurtic curve


Global Warming Bent-Line Regression

HadCRUT global near-surface temperatures

This graph, posted with permission, shows a bent line fitted to the HadCRUT annual data series for global near-surface temperature. Professor Thayer Watkins of San Jose State University Department of Economics posted it on his blog in about 2009.

Without knowing of this work, I constructed the second graph. I used data from the same HadCRUT source, but a data set smoothed by the authors.

In April 2013 I posted it to a forum thread in “weatherzone”.

Next, I added to that graph a logarithmic plot of global carbon emissions, similarly fitted with a series of straight trend lines.

Log from 1850 of world surface air temperature and carbon emissions

This I included in posts to several forums: in a post to “weatherzone”, in a post to the Alternative Technology Association forum, and finally in a post to this blog.

Both Professor Watkins and I have fitted bent lines to the data. I fitted the lines by eye (for which I was accused of “cherry-picking”). Professor Watkins used an explicit process of Bent-Line Regression, minimising the deviations by the method of least squares. Like me, he initially chose by eye the dates of the change points where the straight lines meet. But he then adjusted them so as to minimise the sum of squared deviations.
[See notes below on the method of Bent-Line Regression.]
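The idea can be sketched for the simplest case of a single change point. This is an illustration only, not Professor Watkins’ procedure: it assumes one bend, tries each candidate change point in turn, and keeps the one with the smallest sum of squared deviations. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a*x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the top.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_bent_line(xs, ys, knot):
    """Least-squares fit of y = a + b*(x - xs[0]) + c*max(0, x - knot).

    The two straight segments meet at `knot`; c is the change in slope there.
    The knot must lie strictly inside the range of xs.
    Returns ([a, b, c], sum of squared deviations).
    """
    x0 = xs[0]
    rows = [[1.0, x - x0, max(0.0, x - knot)] for x in xs]
    # Normal equations: (R^T R) coef = R^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve(ata, aty)
    sse = sum(
        (y - sum(c * v for c, v in zip(coef, r))) ** 2 for r, y in zip(rows, ys)
    )
    return coef, sse


def best_knot(xs, ys, candidates):
    """Return the candidate change point with the smallest squared error."""
    return min(candidates, key=lambda k: fit_bent_line(xs, ys, k)[1])
```

A series with several bends, like the HadCRUT record, would need one extra `max(0, x - knot)` column per change point and a search over combinations of knots.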

The trend lines and change points are practically the same in the Thayer Watkins and the “Surly Bond” graphs:
1. (Up to Down) TW: 1881; SB: 1879.
2. (Down to Up) TW: 1911; SB: 1909.
3. (Up to Down) TW: 1940; SB: 1943.
4. (Down to Up) TW: 1970; SB: 1975.
As I said at the time, once straight trend lines are chosen, the dates of change points to fit this data series closely do not allow of much variation.

Relation to the IPO (or PDO) of the Pacific

Not by coincidence, Watkins and I both went on to relate the multi-decadal oscillations of Pacific Ocean temperatures to the global near-surface average temperatures.

My approach

I merely plotted my chosen global temperature change points on to the Pacific graphs. (I chose to cite the IPO, the Inter-decadal Pacific Oscillation.) In two posts I noted (i) the way the change points in the HadCRUT global temperature series were close to those in the IPO, and (ii) the way the IPO seemed able to explain why the trend in global warming was “bent” in 1943 and 1975 but, in that case, could only sharpen the bends of 1910 and 1880.

Professor Watkins’ approach

Professor Watkins did a separate Bent Line Regression Analysis on the Pacific graphs. (He chose to cite the earlier-developed PDO, the Pacific inter-Decadal Oscillation.) His analysis “A Major Source of the Near-Sixty Year Cycle in Average Global Temperatures is the Pacific (Multi)Decadal Oscillation” is here.

He admits the match is poor, with various lags and a different period. He concludes:
“Thus while the Pacific (Multi)Decadal Oscillation appears to be involved in the cycles of the average global temperature there have to be other factors also involved.”

The significance of the IPO


HadCRUT Global Temperature Smoothing

Graph of recent HadCRUT4

As a long-term instrumental record of global temperature, the HadCRUT4 series may be the best we have. [See Ole Humlum’s blog in the notes below.]
I like to use the published smoothed annual series of HadCRUT4. I find that this smoothing gets rid of the “noise” that makes graphs about global warming needlessly hard to read. I used the smoothed HadCRUT series to point out the curious inverse relation between the rate of warming and the rate of growth of carbon emissions in this post from 2014. I will refer again to that post in discussing the use of bent-line regression to describe global warming.

The Met Office Hadley Centre published the smoothing procedure that they used for the time series of smoothed annual average temperature in the HadCRUT3 data set. The smoothing function used is a 21-point binomial filter. The weights are specified in the link above.
The authors discuss the fudge that they use to plot smoothed values up to the current year, even though a validly smoothed value for that year would require ten years of data from future years. Their method is to continue the series by repeating the final value. They had added to the uncertainty by using a final value from just part of a year.
They relate how this procedure had caused consternation when the smoothed graph published in March 2008 showed a curve towards cooling, due to the final value used being very cool.
They show the effect by displaying the graph for that date.
They maintain that the unacceptable smoothed curve (because it shows cooling, not warming) is due mainly to using a final value from an incomplete year, saying:
“The way that we calculate the smoothed series has not changed except that we no longer use data for the current year in the calculation.”
That web-page is annotated:
“Last updated: 08/04/2008 Expires: 08/04/2009”
However, this appears to be the current procedure, used with the HadCRUT4 data set.
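The procedure described above can be sketched as follows. The weights here are the standard binomial coefficients C(20, k) divided by 2^20, which is what a 21-point binomial filter implies; padding the start of the series by repeating the first value is an assumption made for symmetry, and the function names are illustrative.

```python
from math import comb


def binomial_weights(points=21):
    """Weights of a binomial filter: C(n, k) / 2**n with n = points - 1."""
    n = points - 1
    return [comb(n, k) / 2 ** n for k in range(points)]


def smooth_with_end_padding(series, points=21):
    """Smooth a series, extending each end by repeating its end value.

    Repeating the final value stands in for the ten future years that a
    21-point window needs. Repeating the first value at the start of the
    record is an assumption of this sketch.
    """
    half = points // 2
    weights = binomial_weights(points)
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [
        sum(w * padded[i + k] for k, w in enumerate(weights))
        for i in range(len(series))
    ]
```

With this padding the final year carries more than half of the total weight in its own smoothed value, which is why a single cool final value can bend the end of the curve downward.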

For my own interest, I plotted the values from 1990 to 2016 of the annual series of HadCRUT4, averaged over northern and southern hemispheres. [Data sources below.]

On my graph (above), all points 1990 to 2016 are as sourced. I have plotted raw values 2017 to 2026 (uncoloured) as I believe they are used in the smoothing procedure. I have also left uncoloured the smoothed data points from 2007 to 2016, to indicate that their values are not fully supported by data.

I agree with Ole Humlum that it is very good of the Met Office to come clean on the logical shortcomings of their procedure for smoothing, but it would be even better if they ceased plotting smoothed points when the smoothing depends on data points for future years.
In my monthly series of parametric plots of smoothed monthly values of climate anomaly variables, I have faced the same problem. I smooth using a 13-point Gaussian curve. My solution is to plot “fully-smoothed” data points (in colour) up to six months ago. That gives a consistent mapping up to that date. The fifth month before now (plotted uncoloured) is smoothed with an 11-point Gaussian and so on, up to the latest month with a necessarily unsmoothed value.
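A sketch of that shrinking-window scheme is given below. The width of the Gaussian (a standard deviation of half the window's half-width) is a hypothetical choice, since the post does not state it, and the sketch applies the same shrinking at the start of the record as at the end.

```python
import math


def gaussian_weights(points):
    """Normalised Gaussian weights for an odd-length window.

    The standard deviation is set to half of the half-width; this value
    is an assumption of the sketch, not a documented parameter.
    """
    if points == 1:
        return [1.0]
    half = points // 2
    sigma = half / 2.0
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def smooth_tapered(series, full_points=13):
    """Smooth with the widest symmetric window the data allow.

    Returns (values, windows). A window of `full_points` marks a
    fully-smoothed value; smaller windows mark the provisional values
    near the ends, down to 1 (unsmoothed) for the latest month.
    """
    half = full_points // 2
    n = len(series)
    values, windows = [], []
    for i in range(n):
        h = min(half, i, n - 1 - i)
        weights = gaussian_weights(2 * h + 1)
        values.append(sum(w * series[i - h + k] for k, w in enumerate(weights)))
        windows.append(2 * h + 1)
    return values, windows
```

Points whose window is smaller than 13 are the ones that would be plotted uncoloured.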


Notes

1.
Ole Humlum’s blog “Climate4you”

[See: Index\Global Temperature\Recent global air temperature change, an overview\]

2.
HadCRUT4 data
Source of raw annual values:

Source of smoothed annual values:

El Niño and My Climate

ENSO and Manilla NSW temperature anomalies over sixteen years

Temperature

The first graph shows that the temperature at Manilla NSW agreed very closely with El Niño and La Niña temperatures for a good part of the last sixteen years.
The El Niño – Southern Oscillation (ENSO) is shown by NINO3.4 monthly anomaly values, and temperature at Manilla, NSW is smoothed monthly mean daily maximum temperature anomalies. (See the Note below.)
Values of Manilla temperatures agree with those of ENSO through the major temperature peaks and troughs in the spring seasons of 2002, 2006, 2007, 2009, and 2010. In the two highest peaks of 2002 and 2009 and the deep trough of 2010, Manilla temperature extremes were more than a month ahead of ENSO temperature extremes.
Since mid-2011, the two curves do not agree well:
* A very weak La Niña in summer 2011-12 produced the deepest of all troughs in Manilla temperature.
* An El Niño in winter 2012 resulted in heat at Manilla, but not until four months later.
* In spring 2013, when there was no El Niño at all, Manilla had a heat wave just like those with the El Niños of 2002 and 2009.
The record for ENSO since January 2013 is unlike that earlier this century: it flutters rather than cycles.
To show slower changes, I have drawn cubic trend lines for both of the variables. These also agree closely, with ENSO going from a maximum (2004) to a minimum (2011) seven years later. Manilla temperature trends remained ahead of ENSO temperature trends by one or two years.

Rainfall

ENSO and Manilla NSW rainfall anomalies over sixteen years.


Extreme Droughts by Decade at Manilla

Extreme droughts per decade at Manilla NSW

The record of extreme droughts at Manilla, NSW, relates to the Southern Oscillation only now and then, and relates to global warming not at all.

This graph shows some of the same data as I presented earlier in the post “Manilla’s Record of Droughts”. The graph there showed precise dates, but it was hard to tell when extreme droughts were more or less frequent. This graph adds up the number of months of extreme drought in each decade. There are separate columns (getting progressively redder) for extreme droughts of duration 3 months, 1 year, 3 years, and 10 years.
Extreme droughts of 10-year duration occurred only in the 1920’s and 1940’s.
Extreme droughts of 3-year duration occurred in the 1910’s, 1940’s, and 1960’s.
Extreme droughts of 1-year duration occurred in the 1900’s, 1940’s, 1960’s and 2000’s.
Extreme droughts of 3-month duration occurred in the 1880’s, 1900’s, 1910’s, 1920’s, 1940’s, 1970’s and 2000’s.
No extreme droughts at all occurred in five of the fourteen decades: the 1890’s, 1930’s, 1980’s, 1990’s, and 2010’s.
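The adding-up by decade can be sketched as below. The sketch takes as given a list of (year, month) pairs already classed as extreme drought for one duration; how a month is classed is not shown here, and the function name and sample dates in the usage note are hypothetical.

```python
from collections import Counter


def drought_months_per_decade(drought_months, first_decade, last_decade):
    """Count extreme-drought months in each decade.

    drought_months: iterable of (year, month) pairs already classed as
    extreme drought for one duration (for example, 3-month droughts).
    Decades with no such months are reported with a count of zero.
    """
    counts = Counter((year // 10) * 10 for year, _month in drought_months)
    return {
        decade: counts.get(decade, 0)
        for decade in range(first_decade, last_decade + 1, 10)
    }
```

Calling `drought_months_per_decade(months, 1880, 2010)` once for each duration gives the heights of the four columns in each of the fourteen decades.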

Relation to the Southern Oscillation Index

I posted this graph of cumulative values of the SOI earlier.

SOI CUSUM plot

The record of the Southern Oscillation Index relates to the Manilla record of extreme rainfall deficiency only now and then. Persistent El Niños from 1911 to 1915 seem to relate to four months in the decade of the 1910’s having extreme 3-year droughts, carrying forward to two months in the 1920’s having extreme 10-year droughts. Similarly, the catastrophic droughts of short to very long duration in the 1940’s relate to El Niños that persisted from 1939 to 1942.
Other major El Niño events did not produce extreme droughts at Manilla: those of 1896, 1982, and 1997.
Long term trends in the Southern Oscillation Index do not predict Manilla’s extreme droughts at all. The 1940’s droughts …

Hammering Global Warming Into Line

Global temp and IPO graph

In my post of 18 Sep 2014 “The record of the IPO”, I showed a graph of the Inter-decadal Pacific Oscillation, plotted as a cumulative sum of anomalies (CUSUM). This CUSUM plot has a shape that makes it seem that it could be used to straighten the dog-leg (zig-zag) trace of global temperature that we see. A straighter trace of global warming would support the claim that a log-linear growth in carbon dioxide emissions is the main cause of the warming.

My attempt to straighten the trace depends on the surmise (or conjecture) that the angles in the global temperature record are caused by the angles in the IPO CUSUM record. That is, the climatic shifts that appear in the two records are the same shifts.
I have adopted an extremely simple model to link the records:
1. Any global temperature changes due to the Inter-decadal Pacific Oscillation are directly proportional to the anomaly (see Note 1);
2. Temperature changes driven by the IPO are cumulative in this time-frame.

To convert IPO CUSUM values to temperature anomalies in degrees, they must be re-scaled. By trial and error, I found that dividing the values by 160 would straighten most of the trace – the part from 1909 to 2008. (See Note 2.) The first graph shows (i) the actual HadCRUT4 smoothed global temperature trace, (ii) the re-scaled IPO CUSUM trace, and (iii) a model global temperature trace with the supposed cumulative effect of the IPO subtracted.


The second graph compares the actual and model temperature traces. I note, in a text-box, that the cooling trend of the actual trace from 1943 to 1975 has been eliminated by the use of the model.
The graph includes a linear trend fitted to the model trace for the century 1909 to 2008, with its equation: y = 0.0088x – 0.9714 and R² = 0.9715.
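The model and its trend line can be sketched as follows. The sketch assumes that the temperature and IPO series share the same time steps and start date; the divisor of 160 is the trial-and-error value quoted above, and the function names are illustrative.

```python
from itertools import accumulate


def remove_ipo_effect(temps, ipo, scale=160.0):
    """Subtract the re-scaled IPO CUSUM from the temperature trace.

    temps and ipo are assumed to be aligned series of equal length.
    """
    ipo_cusum = accumulate(ipo)
    return [t - c / scale for t, c in zip(temps, ipo_cusum)]


def linear_trend(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, R squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Fitting `linear_trend` to the model trace over a chosen span of years gives a slope, intercept and R² of the kind shown on the second graph.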


The record of the IPO

Graphical record of the IPO, plus CUSUM plot and climate shift dates

My post showing shifting trends in world surface temperature and in carbon emissions brought a suggestion from Martin Shafer that allowing for the PDO could straighten the trend. I think that perhaps it could, but I have tried the IPO (Inter-decadal Pacific Oscillation) rather than the PDO (Pacific inter-Decadal Oscillation). (See below.)

Along the top of the graph I have marked in the climate shifts that prevent the trace of world temperature from being anything like a straight line. The blue line is the IPO, as updated to 2008.
The IPO is positive in the space between the last two climate shifts, negative in the next earlier space, and positive in part of the space before that. Plotting the CUSUM values of the IPO (red) makes it clear that the pattern of the IPO relates very closely to the climate shift dates. Four of the seven extreme points of the IPO CUSUM trace match climate shifts. In addition, since 1925, the CUSUM trace between the sharply-defined extreme points has been a series of nearly straight lines. These represent near-constant values of the IPO, a rising line representing a positive IPO and a falling line a negative one.
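A CUSUM trace and its extreme points can be sketched in a few lines. The sketch assumes the anomalies are measured from a reference of zero unless another is given; the function names are illustrative.

```python
from itertools import accumulate


def cusum(anomalies, reference=0.0):
    """Running total of (value - reference) along the series."""
    return list(accumulate(v - reference for v in anomalies))


def turning_points(trace):
    """Indices where the trace changes from rising to falling or back."""
    return [
        i
        for i in range(1, len(trace) - 1)
        if (trace[i] - trace[i - 1]) * (trace[i + 1] - trace[i]) < 0
    ]
```

For example, three values of +1 followed by three of -1 give a trace that rises in a straight line to a peak and then falls in a straight line, with the single turning point at the last positive value.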

As shown by the map in the Figure copied below, a positive extreme of the IPO has higher than normal sea surface temperatures in the equatorial parts of the Pacific. Could the transfer of heat from the ocean to the atmosphere be enhanced at such times?

This conjecture is developed in the post “Hammering Global Warming Into Line”.


The PDO and the IPO

The PDO is the Pacific Decadal Oscillation (or Pacific inter-Decadal Oscillation). It is one of a number of climate indicators that rise and fall over periods of a decade or more. These indicators have been introduced by different research groups at different times.
A current list of such indicators is in the contribution of Working Group I to the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC). The list is in Chapter 2 (38MB). It is at the end, in a special section: “Box 2.5: Patterns and Indices of Climate Variability”.