Q. Is the freeze duration of Lake Mendota decreasing over the years?
Freeze duration data for Lake Mendota and other Madison lakes is available online. $N=156$ data points for Lake Mendota are available starting in the year 1855 [1]. Let us run a simple statistical test to check for a negative slope trend.
Consider a linear model, i.e.,
$$ y_n = \beta_1 n + \beta_0 + w_n, $$
where $y_n$ denotes the freeze duration at any data point $0 \leq n \leq N-1$ and $w_n$ is i.i.d. Gaussian noise with zero mean and unknown variance. The regression coefficients $\hat{\beta_0}$ and $\hat{\beta_1}$ can be calculated from data using linear regression (see Figure below).
Now, setting up the hypotheses:
$$
H_0: \beta_1 = 0
$$
and
$$
H_1: \beta_1 < 0.
$$
It is known that $$t = \frac{\hat{\beta_1}}{SE}$$ follows a t-distribution with $N-2$ degrees of freedom. Here $$SE = \sqrt{\frac{\frac{1}{N-2}\sum_{n=0}^{N-1} (y_n - \hat{\beta_1} n- \hat{\beta_0} )^2}{\;\;\;\;\;\;\;\sum_{n=0}^{N-1} \left(n-\frac{(N-1)(N-2)}{\;\;\;\;\;2N}\right)^2}}$$ is called the standard error [2].
A. Unless there is an algebra error in my calculations, the t-test shows that there is statistically significant negative slope trend in this data (at both 1% and 5% significance levels).
References:
[1] UW-Madison Department of Atmospheric and Oceanic Sciences. URL: http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html (Accessed: Jan 20, 2012)
[2] T. Tarpey, Course Notes for Stat-630 at Wright State University. URL: http://www.wright.edu/~thaddeus.tarpey/stt630chap8.pdf (Accessed: Jan 20, 2012).
Appendix:
I have copied data from [1] in a .csv file for easy access. You can download it here. And here's some Octave code. Make sure the .csv file is in the same directory.
data = csvread('MendotaFreezeDurations.csv');
N = numel(data(:,1));
ply = polyfit(data(:,1), data(:,2), 1);
% ply = [-0.18829 117.82053]
SSE = sum((data(:,2)-polyval(ply, data(:,1))).^2);
% SSE = 4.6469e+04
MSE = SSE/(N-2);
SE = sqrt(MSE/sum( (data(:,1)-mean(data(:,1))).^2 ));
% SE = 0.030491
tstatistic = ply(1)/SE;
% tstatistic = -6.1754
tthresh1 = -tinv(0.99,N-2)
tthresh5 = -tinv(0.95,N-2)
if tstatistic < tthresh1,
disp('H0 rejected at 1%');
else,
disp('H0 accepted at 1%');
end;
if tstatistic < tthresh5,
disp('H0 rejected at 5%');
else,
disp('H0 accepted at 5%');
end;
% H0 is rejected in both cases.
% There is a statistically significant
% downward sloping trend.
plot(data(:,1), data(:,2),...
data(:,1), polyval(ply, data(:,1)));
axis tight
ylabel('freeze duration (days)')
xlabel('data point')
title('lake Mendota freeze duration linear regression')
print -depsc freezedurplot.eps
Freeze duration data for Lake Mendota and other Madison lakes is available online. $N=156$ data points for Lake Mendota are available starting in the year 1855 [1]. Let us run a simple statistical test to check for a negative slope trend.
Consider a linear model, i.e.,
$$ y_n = \beta_1 n + \beta_0 + w_n, $$
where $y_n$ denotes the freeze duration at any data point $0 \leq n \leq N-1$ and $w_n$ is i.i.d. Gaussian noise with zero mean and unknown variance. The regression coefficients $\hat{\beta_0}$ and $\hat{\beta_1}$ can be calculated from data using linear regression (see Figure below).
Now, setting up the hypotheses:
$$
H_0: \beta_1 = 0
$$
and
$$
H_1: \beta_1 < 0.
$$
Figure: Linear fit to Lake Mendota freeze duration data |
It is known that $$t = \frac{\hat{\beta_1}}{SE}$$ follows a t-distribution with $N-2$ degrees of freedom. Here $$SE = \sqrt{\frac{\frac{1}{N-2}\sum_{n=0}^{N-1} (y_n - \hat{\beta_1} n- \hat{\beta_0} )^2}{\;\;\;\;\;\;\;\sum_{n=0}^{N-1} \left(n-\frac{(N-1)(N-2)}{\;\;\;\;\;2N}\right)^2}}$$ is called the standard error [2].
A. Unless there is an algebra error in my calculations, the t-test shows that there is statistically significant negative slope trend in this data (at both 1% and 5% significance levels).
References:
[1] UW-Madison Department of Atmospheric and Oceanic Sciences. URL: http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html (Accessed: Jan 20, 2012)
[2] T. Tarpey, Course Notes for Stat-630 at Wright State University. URL: http://www.wright.edu/~thaddeus.tarpey/stt630chap8.pdf (Accessed: Jan 20, 2012).
Appendix:
I have copied data from [1] in a .csv file for easy access. You can download it here. And here's some Octave code. Make sure the .csv file is in the same directory.
data = csvread('MendotaFreezeDurations.csv');
N = numel(data(:,1));
ply = polyfit(data(:,1), data(:,2), 1);
% ply = [-0.18829 117.82053]
SSE = sum((data(:,2)-polyval(ply, data(:,1))).^2);
% SSE = 4.6469e+04
MSE = SSE/(N-2);
SE = sqrt(MSE/sum( (data(:,1)-mean(data(:,1))).^2 ));
% SE = 0.030491
tstatistic = ply(1)/SE;
% tstatistic = -6.1754
tthresh1 = -tinv(0.99,N-2)
tthresh5 = -tinv(0.95,N-2)
if tstatistic < tthresh1,
disp('H0 rejected at 1%');
else,
disp('H0 accepted at 1%');
end;
if tstatistic < tthresh5,
disp('H0 rejected at 5%');
else,
disp('H0 accepted at 5%');
end;
% H0 is rejected in both cases.
% There is a statistically significant
% downward sloping trend.
plot(data(:,1), data(:,2),...
data(:,1), polyval(ply, data(:,1)));
axis tight
ylabel('freeze duration (days)')
xlabel('data point')
title('lake Mendota freeze duration linear regression')
print -depsc freezedurplot.eps