Simulations of estimators for extreme value distributions

On Sunday I blogged about the new Stata program I wrote for applying extreme value theory. It includes a novel computation to reduce bias for the generalized extreme value distribution (GEV). To document the efficacy of that correction and of the package as a whole, I set my computer to testing it on simulated data. Since Sunday the little Lenovo has run half a billion regressions.

I simulated three distributions: the generalized Pareto distribution (GPD), the standard GEV, and the GEV extended to five order statistics instead of one. In all cases, N = 50 and μ = ln σ = 0. ξ varies across the simulations from –0.5 to +1.0 in increments of 0.1. For each distribution and value of ξ, I ran 5000 Monte Carlo simulations, applying four variants of the estimator:

1. Standard maximum likelihood (ML).
2. ML followed by the Cox-Snell analytical bias correction if ξ̂ ≥ –0.2, where ξ̂ is the ML estimate. (The correction diverges as ξ → –1/3.) I thus follow Giles, Feng, and Godwin, except that I still make the correction if ξ̂ ≥ 1.
3. ML followed by a parametric bootstrap bias correction with 100 replications.
4. The same, but with 1000 replications.

The graphs below show, for each distribution, the average bias, standard deviation, and root-mean-square error of the estimators for all parameters. In each graph the horizontal axis, labeled ξ, shows the true value of that parameter, while the vertical axes, labeled μ, ln σ, and ξ, show the measures of estimator accuracy and stability (mean bias, SD, RMSE) for each parameter. I reset the pseudorandom number generator to the same state for each value of ξ, creating a kinship between the 5000 simulated...
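To make the design concrete, here is a minimal Python sketch of two ingredients described above: drawing GPD samples with the random number generator reset to the same state for every ξ, and a generic parametric bootstrap bias correction. It is not the Stata package's code. The seed, the function names, and the stand-in estimator (the ML estimate of ln σ when ξ is known to be 0, which is just the log of the sample mean) are all assumptions made for illustration; the real estimators fit every parameter by numerical ML.

```python
import math
import random
from statistics import fmean

N = 50                                      # sample size stated in the post
XI_GRID = [i / 10 for i in range(-5, 11)]   # -0.5, -0.4, ..., 1.0
SEED = 20150000                             # hypothetical seed, not from the post


def gpd_sample(xi, rng, n=N, sigma=1.0):
    """Draw n GPD(sigma, xi) values (threshold 0) by inverse transform."""
    draws = []
    for _ in range(n):
        t = -math.log1p(-rng.random())      # standard exponential variate
        # (exp(xi*t) - 1) / xi, with the xi = 0 limit handled exactly
        draws.append(sigma * (t if xi == 0 else math.expm1(xi * t) / xi))
    return draws


def samples_by_xi(seed=SEED):
    """Reseed for every xi so all samples share the same uniform draws."""
    return {xi: gpd_sample(xi, random.Random(seed)) for xi in XI_GRID}


def bootstrap_bias_correct(estimate, estimator, simulate, reps=100):
    """Return the estimate minus the bootstrap estimate of its bias."""
    boot = [estimator(simulate(estimate)) for _ in range(reps)]
    return estimate - (fmean(boot) - estimate)


def log_sigma_hat(sample):
    """Stand-in estimator: ML estimate of ln(sigma) when xi is known to be 0."""
    return math.log(fmean(sample))


if __name__ == "__main__":
    data = samples_by_xi()[0.0]
    raw = log_sigma_hat(data)
    rng = random.Random(SEED + 1)           # separate stream for the bootstrap
    corrected = bootstrap_bias_correct(
        raw, log_sigma_hat,
        lambda ls: gpd_sample(0.0, rng, sigma=math.exp(ls)),
    )
    print(f"raw ln(sigma) = {raw:.4f}, corrected = {corrected:.4f}")
```

Because every ξ reuses the same uniform draws, the simulated samples differ only through ξ, so Monte Carlo noise is shared across the grid and curves plotted against ξ come out smoother than they would with independent draws.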

New package for extreme value analysis in Stata

One topic I’m studying for my main client, the Open Philanthropy Project, is the risk of geomagnetic storms. I hadn’t heard of them either. Actually, they originate as solar storms, which hurl magnetically charged matter toward Earth, jostling its magnetic field. Routine-sized storms cause the Auroras Borealis and Australis. Big ones happen roughly once a decade (1972, 1982, 1989, 2003, a near-miss in 2012…) and also mostly hit high latitudes. The worry: a really big one could send currents surging through long-distance power lines, frying hundreds of major transformers, knocking out power to continent-scale regions for months or even years, and causing an economic or humanitarian catastrophe.

My best assessment at this point is that if one extrapolates properly from the available modern data, the risk is much lower than the 12%-chance-per-decade cited by the Washington Post last summer. But that’s a preliminary judgment, and I’m not a seasoned expert. And even if the risk is only 1%, it almost certainly deserves more attention. More from me on that in time. (For a mathematician’s doubts about the 12% figure see Stephen Parrott.) You can see “geomagnetic storms” beneath Cari Tuna’s elbow in this photo from a recent Post story about the Open Philanthropy Project.

Geomagnetic storms constitute an extremely rich topic, encompassing (ha ha) solar physics, geophysics, the fundamentals of electromagnetism, dendrochronology, power system dynamics, transformer engineering…and statistics. The statistical question is: given the historical data on the severity and frequency of geomagnetic disruptions, what can we say about the probability per unit time of one at or beyond the frontier of historical experience? And that leads into the branch of statistics called extreme value theory. I think of it this way. A...