Wednesday, September 25, 2013

MATLAB startup.m

Add these lines to your Matlab startup.m file for some amusement in life:

clc
s = RandStream('mt19937ar','Seed','shuffle');
RandStream.setGlobalStream(s);
disp('Why? Because...')
why

clear all

Wednesday, May 15, 2013

Should I fill that survey?

I was one of "very few people" selected to complete a survey for some research study. In return for the time spent clicking radio buttons and checkboxes I was offered entry into a raffle: 2 prizes of \$150 each and 4 prizes of \$50 each. At first sight my reaction was: That's quite a reward!

But, wait a minute: It looked like a really long and tedious survey to fill out, each page laid out in a 9 pt font, with over 10 pages of multiple-choice and open ended questions. It would easily take me 30 minutes from start to finish! Considering the work involved, I did not expect more than 100 people to complete the survey. Was it worth the effort? Let's check:

$$
E[\mbox{prize money}] = \frac{2}{100}\cdot 150 + \frac{4}{100}\cdot 50 = $5.
$$

*On average*, this deal looks like a $10/hr hourly job. I'll pass!

Wednesday, March 20, 2013

Type I or Type II?

The following table has helped me avoid confusing Type I and Type II errors. Write the true hypothesis numbers on the left, and the ones announced on top.



Announced 0
Announced 1
Actually 0
No error
Type I
Actually 1
Type II
No error

Now read out the row and column headers as a binary number:

  • 01 (which is 1 in binary) means H0 was true and we announced H1. That is error Type I.

  • 10 (which is 2 in binary) means H1 was true and we announced H0. That is error Type II.

(Or, perhaps, it is easier to remember that Type I is a false alarm and Type II is a miss.)

Sunday, March 3, 2013

Lost in Translation

I recently came across a translation of a verse from Atharvaveda when browsing through a book on Hindusim by A. L. Herman [1]. Herman cites Maurice Bloomfield [2] as the source of this translation.

As the best of the plants thou art reputed, O herb: turn this man for me to-day into a eunuch that wears his hair dressed!
Turn him into a eunuch that wears his hair dressed, and into one that wears a hood! Then Indra with a pair of stones shall break his testicles both!
O eunuch, into a eunuch thee I have turned; O castrate, into a castrate thee I have turned; O weakling, into a weakling thee I have turned! A hood upon his head, and a hair-net do we place.
The two canals, fashioned by the gods, in which man's power rests, in thy testicles . . . . . . . . . . . . I break them with a club.
As women break reeds for a mattress with a stone, thus do I break thy member.

There is no surprise that Herman refers to these verses as "grisly incantations" that were used against an enemy by specially requesting a priest who was aware of both the whereabouts of this magical herb and the exact procedure of casting this spell.

A similar grisly translation, although with a slightly different interpretation can be found in the book by Griffith (which is freely available [3]):

O Plant, thy fame is spread abroad as best of all the herbs that grow.
  Unman for me to-day this man that he may wear the horn of hair.
Make him a eunuch with a horn, set thou the crest upon his head.
  Let Indra with two pressing-stones deprive him of his manly strength.
I have unmanned thee, eunuch! yea, impotent! made thee impotent, and robbed thee, weakling! of thy strength.
  Upon his head we set the horn, we set the branching ornament.

Griffith decodes this verse as being used by a woman to curse an unfaithful lover.

A translation of the Atharvaveda to Hindi by the modern-day Hindu saint Shriram Sharma also follows this line of interpretation and roughly goes as follows (I am paraphrasing this from Hindi [4]):

O herb, you are the best of all herbs! Render my rival impotent, turn him into a eunuch. 
O herb, turn our enemies into eunuchs. May Indra crumble their organ of manliness and cause the hair on their heads to grow long like a woman. 
O enemy, through this ritual we have turned you into a eunuch, and dressed you up like a woman. We break the veins that go into your testicles. Just as women break weeds and reeds with a stone, we break the effect of your testicles.


Finally, let us look at Dayananda Saraswati's interpretation of the same verse in his Hindi translation [5] which I paraphrase below:

O medicinal herb, you are supreme and renowned! Please turn this weakling into a useful man for me. 
Turn the weakling into a useful man, make him active and hardworking.
And O powerful doctor! Break the illness in this man's testicles with two mighty rock-like weapons.
O debilitating illness, I have rendered you ineffective! I have rendered you powerless. We place ornaments and the strength to work on this man's forehead, who has been treated of his disease.


The first three translations seem to agree more or less, but this last one is a mysteriously distinct interpretation!

Appendix

The original in Sanskrit is as follows.

त्वं वीरुधां श्रेष्ठतमाभिश्रुतास्योषधे ।
इमं मे अद्य पूरुषं क्लीबमोपशिनं कृधि ।।
क्लीबं कृध्योपशिनमथो कुरीरिणं कृधि । 
अथास्येन्द्रो ग्रावभ्यामुभे भिनत्त्वाण्ड्यौ ।।
क्लीबं क्लीबं त्वाकरं वध्रे वध्रिं त्वाकरमरसारसं त्वाकरम् । 
कुरीरमस्य शीर्षणि कुम्बं चाधिनिदध्मसि ।।
ये ते नाड्यौ देवकृते ययोस्तिष्ठति वृष्ण्यम् । 
ते ते भिनद्मि शम्ययामुष्या अधि मुष्कयोः ।।
यथा नडं कशिपुने स्त्रियो भिन्दन्त्यश्मना । 
एवा भिनद्मि ते शेपोSमुष्या अधि मुष्कयोः ।।

References

[1] A. L. Herman, A Brief Introduction to Hindusim, Westview Press, 1991.
[2] M. Bloomfield, Translation of the Atharva-veda, Sacred Books of the East, Vol. 42, 1897. Retrieved from http://www.sacred-texts.com/hin/sbe42/av131.htm
[3] Ralph T. H. Griffith, Hymns of the Atharvaveda, 1895. Retrieved from http://www.sacred-texts.com/hin/av/av06138.htm
[4] Shriram Sharma, Atharvaveda Samhita, Yuga Nirman Yojana, Mathura, India, 2005. Retrieved from http://literature.awgp.org/hindibook/vedPuranDarshan/atharvaved/athaveda1b.99
[5] Dayanada Sarasvati, ``Atharvaveda Bhashabhashyam,'' c.a. 1975. Retrieved from http://www.aryasamajjamnagar.org/athrvaveda/atharvaveda.htm

All links last accessed on August 18, 2012. 

Saturday, February 23, 2013

The tale of intel reports and tail events

I remember posting something about bomb blasts on one occasion expressing utter dismay at how people have become so insensitive to such incidents. This is my second post on the same topic, this time with a completely disinterested and technical disposition.

Time and again we come across a news article claiming that the Center had warned the local or the state police about a possible terrorist attack. The recent Hyderabad blasts are no exception (see [1] for instance). The local government can take appropriate action based on such information provided by central intelligence teams. This time around the Chief minister decided not to take the information too seriously (``Andhra Pradesh Chief Minister Kiran Kumar Reddy had said those were general alerts which often keep coming from the Centre'' [1]).

I am not sure of the format in which such information is transmitted to the local ministers but let me hazard a few guesses: perhaps it is a lengthy report in teletype font typewritten on legal size eggshell finish papers, or maybe a frantic last minute phone call from PM Manmohan Singh saying something that is roughly equivalent to ``Take extra care dude.'' In any case, the minister (Mr. NKK Reddy, in the present case) probably makes up a numerical measure of the seriousness of the intel report.

Let us suppose that a report comes in every day and Mr. Reddy decides to assign a random positive number as the ``alert level of the day.'' This procedure may be as simple as just counting the number of pages in the daily intel report. Now some notation: let $X_n$ denote the alert level for the day $n$. Assume $X_n$ is exponentially distributed with a (normalized) mean of $\lambda=1$. Mr. Reddy likes being conservative---he sees no point in putting the state on high alert just because one particular $X_n$ came out really big. After all, when one observes long sequences of random values, it is quite natural to expect some really huge readings, purely out of luck.

Let $M_n = \max \{X_1, X_2, \cdots, X_n\}$ denote the largest of this sequence of i.i.d. exponentials. It is easy to derive the distribution $F_n(\cdot)$ of $M_n$ in terms of the common distribution $G(\cdot)$ of the i.i.d. exponential random variables:
\begin{array}{rcl}
F_n(t) &:=& Pr(M_n \leq t) \\
           &=& Pr(\max\{X_1,\cdots, X_n\} \leq t) \\
           &=& Pr(\bigcap_{i=1}^n \{X_i \leq t\}) \\
           &=& \prod_{i=1}^n Pr(X_i \leq t) \\
           &=& (G(t))^n \\
           &=& (1-\exp(-t))^n,
\end{array}
for $t\geq 0$.

In order to be conservative and stay on budget, Mr. Reddy can raise an alarm only if the alert number he sees is in the $\alpha \%$ tail of the distribution of the maximum. Suppose $n=822$, which is how many days Mr. Reddy has been the CM [2]. $(100-\alpha) \%$ area under the density of $M_{822}$ can be calculated: $t_\alpha = F_{822}^{-1}(1-\alpha/100)$. If $\alpha = 5\%$, for example, then Mr. Reddy should pronounce high alert when he sees a number at least as big as 9.68197. See Figure below and also see [3].

Figure: CDF of $M_n$ with location of 5% tail (click image for larger view)


I agree, this is an oversimplified (and probably useless) approach. But embellishments are possible. Foremost, the i.i.d. assumption may not be true. Some contextual information may be incorporated to create a mild correlation structure between the $X_n$'s. The other weakness in this model is that rare events like bomb blast may occur even when the alert level is quite low. (Remember what the great logodaedalist Mr. Chidambaram said in July 2011---the absence of intelligence is not failure of intelligence [4].)

References
[1] ``Specific alert was sent to Hyderabad Thursday morning: Centre'' in The Hindu, as reported by PTI available here.
[2] Wikipedia page on a list of Andhra Pradesh chief ministers is here.
[3] A calculation using Wolfram Alpha here.
[4] ``No intel about blasts: Chidambaram,'' July 14, 2011. Available here
(Note: All links last accessed on Feb 23, 2013.)

Sunday, January 20, 2013

Lake Mendota Freeze Durations


Q. Is the freeze duration of Lake Mendota decreasing over the years?

Freeze duration data for Lake Mendota and other Madison lakes is available online. $N=156$ data points for Lake Mendota are available starting in the year 1855 [1]. Let us run a simple statistical test to check for a negative slope trend.

Consider a linear model, i.e.,
$$ y_n = \beta_1 n + \beta_0 + w_n, $$
where $y_n$ denotes the freeze duration at any data point $0 \leq n \leq N-1$ and $w_n$ is i.i.d. Gaussian noise with zero mean and unknown variance. The regression coefficients $\hat{\beta_0}$ and $\hat{\beta_1}$ can be calculated from data using linear regression (see Figure below).

Now, setting up the hypotheses:
$$
H_0: \beta_1 = 0
$$
and
$$
H_1: \beta_1 < 0.
$$

Figure: Linear fit to Lake Mendota freeze duration data


It is known that $$t = \frac{\hat{\beta_1}}{SE}$$ follows a t-distribution with $N-2$ degrees of freedom. Here $$SE = \sqrt{\frac{\frac{1}{N-2}\sum_{n=0}^{N-1} (y_n - \hat{\beta_1} n- \hat{\beta_0} )^2}{\;\;\;\;\;\;\;\sum_{n=0}^{N-1} \left(n-\frac{(N-1)(N-2)}{\;\;\;\;\;2N}\right)^2}}$$ is called the standard error [2].

A. Unless there is an algebra error in my calculations, the t-test shows that there is statistically significant negative slope trend in this data (at both 1% and 5% significance levels).

References:
[1] UW-Madison Department of Atmospheric and Oceanic Sciences. URL: http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html (Accessed: Jan 20, 2012)
[2] T. Tarpey, Course Notes for Stat-630 at Wright State University. URL: http://www.wright.edu/~thaddeus.tarpey/stt630chap8.pdf (Accessed: Jan 20, 2012).


Appendix:
I have copied data from [1] in a .csv file for easy access. You can download it here. And here's some Octave code. Make sure the .csv file is in the same directory.

data = csvread('MendotaFreezeDurations.csv');
N = numel(data(:,1));
ply = polyfit(data(:,1), data(:,2), 1);
% ply = [-0.18829   117.82053]
SSE = sum((data(:,2)-polyval(ply, data(:,1))).^2);
% SSE =  4.6469e+04
MSE = SSE/(N-2);
SE = sqrt(MSE/sum( (data(:,1)-mean(data(:,1))).^2 ));
% SE =  0.030491
tstatistic = ply(1)/SE;
% tstatistic = -6.1754
tthresh1 = -tinv(0.99,N-2)
tthresh5 = -tinv(0.95,N-2)
if tstatistic < tthresh1, 

  disp('H0 rejected at 1%'); 
else, 
  disp('H0 accepted at 1%'); 
end;
if tstatistic < tthresh5, 

  disp('H0 rejected at 5%'); 
else, 
  disp('H0 accepted at 5%'); 
end;
% H0 is rejected in both cases. 

% There is a statistically significant 
% downward sloping trend.
 

plot(data(:,1), data(:,2),... 
  data(:,1), polyval(ply, data(:,1)));
axis tight
ylabel('freeze duration (days)')
xlabel('data point')
title('lake Mendota freeze duration linear regression')
print -depsc freezedurplot.eps