Education and Workshops

Analysis of Variance (ANOVA)

The summary from class is outlined below. It follows our textbook: Lind (2005), Statistical Techniques in Business & Economics (11th ed.), New York: McGraw-Hill, Chapter 8, and (12th ed.) Chapter 12.

Analysis of Variance (ANOVA) is used under these conditions:

The sampled populations follow the normal distribution.
The populations have equal standard deviations.
The samples are randomly selected and are independent.
Two or more samples are being tested.

Characteristics of F-Distribution:

The F distribution is used to test whether two or more sample means came from the same or equal populations.
There is a family of F distributions.
Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.
F cannot be negative, and it is a continuous distribution.
The F distribution is positively skewed.
Its values range from 0 to infinity. As F goes to infinity, the curve approaches the X-axis.

Contributed by Gary Hale, UoP, 2005
Chapter 12: Analysis of Variance (ANOVA)

The probability distribution used in this chapter is the F distribution. It serves as the distribution of the test statistic in several situations. It is used to test whether two samples are from populations having equal variances, and it is also applied when several population means must be compared simultaneously. The simultaneous comparison of several population means is called analysis of variance (ANOVA). In both of these situations, the populations must follow a normal distribution, and the data must be at least interval-scale (Lind, D.M., Marchal, W.G. & Wathen, S.A., 2004).

There are five characteristics of the F distribution:

There is a “family” of F distributions. A particular member of the family is determined by two parameters: the degrees of freedom in the numerator and the degrees of freedom in the denominator. The shape of the distribution is illustrated by the graph below. There is one F distribution for the combination of 29 degrees of freedom in the numerator and 28 degrees of freedom in the denominator. There is another F distribution for 19 degrees of freedom in the numerator and 6 degrees of freedom in the denominator. The shape of the curve changes as the degrees of freedom change (Lind et al., 2004).

The F distribution is continuous, which means that it can assume an infinite number of values between zero and positive infinity (Lind et al., 2004).
The F distribution cannot be negative. The smallest value F can assume is 0 (Lind et al., 2004).
It is positively skewed. The long tail of the distribution is to the right-hand side. As the number of degrees of freedom increases in both the numerator and denominator, the distribution approaches a normal distribution (Lind et al., 2004).
It is asymptotic. As the values of X increase, the F curve approaches the X-axis but never touches it. This is similar to the behavior of the normal distribution (Lind et al., 2004).
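The characteristics listed above can be checked numerically. The sketch below evaluates the standard F density formula for the two degrees-of-freedom combinations mentioned in the text (29 and 28, then 19 and 6). The function name f_pdf and the chosen evaluation points are illustrative assumptions, not part of the textbook.

```python
import math

def f_pdf(x, dfn, dfd):
    """Density of the F distribution with dfn numerator and dfd denominator
    degrees of freedom (hypothetical helper for illustration)."""
    if x <= 0:
        # F cannot be negative; x = 0 is treated as density 0 here,
        # which is exact only when dfn > 2.
        return 0.0
    # Logarithm of the beta function B(dfn/2, dfd/2)
    log_beta = (math.lgamma(dfn / 2) + math.lgamma(dfd / 2)
                - math.lgamma((dfn + dfd) / 2))
    log_density = ((dfn / 2) * math.log(dfn / dfd)
                   + (dfn / 2 - 1) * math.log(x)
                   - ((dfn + dfd) / 2) * math.log(1 + dfn * x / dfd)
                   - log_beta)
    return math.exp(log_density)

# One curve per member of the F family named in the text
for dfn, dfd in [(29, 28), (19, 6)]:
    heights = [f_pdf(x, dfn, dfd) for x in (0.5, 1.0, 2.0, 4.0, 8.0)]
    print(dfn, dfd, [round(h, 4) for h in heights])
```

The printed heights stay positive while shrinking toward zero as X grows, which is the long right tail and the asymptotic behavior described in the list.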

The F distribution is used to test the hypothesis that the variance of one normal population equals the variance of another normal population. The F distribution is also used to validate assumptions for some statistical tests. For example, the pooled two-sample t test for a difference in means assumes that the variances of the two normal populations are the same. The F distribution provides a means for conducting a test regarding the variances of two normal populations (Lind et al., 2004).

Whether we want to determine if one population has more variation than another or to validate an assumption for a statistical test, we first state the null hypothesis. The null hypothesis is that the variance of one normal population equals the variance of the other normal population. The alternate hypothesis could be that the variances differ. The null hypothesis and the alternate hypothesis for a two-sided test are:

Ho: σ1² = σ2²
Ha: σ1² ≠ σ2²

To conduct the test, we select a random sample of n1 observations from one population and a sample of n2 observations from the second population. The test statistic is defined as:

F = s1² / s2²

The terms s1² and s2² are the respective sample variances. If the null hypothesis is true, the test statistic follows the F distribution with n1 – 1 and n2 – 1 degrees of freedom. In order to reduce the size of the table of critical values, the larger sample variance is placed in the numerator; therefore, the tabled F ratio is always larger than 1.00. The right-tail critical value is the only one required. The critical value of F for a two-tailed test is found by dividing the significance level in half (α/2) and then referring to the appropriate degrees of freedom (Lind et al., 2004).

The usual practice is to determine the F ratio by putting the larger of the two sample variances in the numerator. This forces the F ratio to be at least 1.00, so we can always use the right tail of the F distribution and avoid the need for more extensive F tables (Lind et al., 2004).

Test for Equal Variances

For the two tailed test (two sample variances are given)

The null hypothesis is rejected if the computed value of the F test statistic is greater than the critical value from the table.
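As a minimal sketch of the two-tailed test for equal variances, the code below places the larger sample variance in the numerator and compares the ratio with a critical value read from an F table. The function name variance_ratio_f and both samples are made up for illustration. The critical value 5.19 is the tabled upper 5% point for 4 and 5 degrees of freedom, which corresponds to α/2 for a two-tailed test at the assumed 0.10 significance level.

```python
from statistics import variance

def variance_ratio_f(sample_a, sample_b):
    """Return (F, numerator df, denominator df), with the larger
    sample variance in the numerator so that F is at least 1."""
    var_a = variance(sample_a)  # sample variance, n - 1 divisor
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical samples, not taken from the textbook
sample_1 = [2, 4, 6, 8, 10]
sample_2 = [4, 5, 6, 7, 8, 9]

f_value, df_num, df_den = variance_ratio_f(sample_1, sample_2)

# Critical value from an F table: upper 5% point for (4, 5) degrees of
# freedom, i.e. alpha/2 for a two-tailed test at alpha = 0.10 (assumed).
f_critical = 5.19

print("F =", round(f_value, 3), "with df", (df_num, df_den))
print("Reject Ho" if f_value > f_critical else "Do not reject Ho")
```

Because the function always returns a ratio of at least 1, only the right-tail critical value is ever needed, which matches the table-saving convention described earlier.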

Analysis of Variance Procedure

Ho: The null hypothesis states that the population means are the same.
Ha: The alternative hypothesis states that at least one of the means is different.
The test statistic follows the F distribution.
The decision rule is to reject the null hypothesis if F (computed) is greater than F (table) for the given numerator and denominator degrees of freedom.

To compute the F value, we work through the following steps. The ANOVA table summarizes the calculations:

Source	Sum of Squares	Degrees of Freedom	Mean Square	F	P
Treatments	SST	k-1	SST/(k-1) = MST	MST/MSE	P
Error	SSE	n-k	SSE/(n-k) = MSE
Total	SStotal	n-1

If there are k populations being sampled, the numerator degrees of freedom is k – 1.
If there are a total of n observations, the denominator degrees of freedom is n – k.
The test statistic is computed by:

F = MST / MSE = [SST/(k - 1)] / [SSE/(n - k)]

SS Total (SStotal) is the total sum of squares:

SStotal = ΣX² - (ΣX)²/n

SST is the treatment sum of squares:

SST = Σ(Tc²/nc) - (ΣX)²/n

where Tc is the column total, nc is the number of observations in each column, ΣX² is the sum of the squared observations, ΣX is the sum of all the observations, and n is the total number of observations.

SSE is the error sum of squares:

SSE = SStotal - SST
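The ANOVA table entries can be sketched in a few lines of code. This example assumes the usual computational formulas SStotal = ΣX² - (ΣX)²/n and SST = Σ(Tc²/nc) - (ΣX)²/n, with SSE obtained by subtraction as stated above. The function name one_way_anova and the three small samples are hypothetical. The critical value 5.14 is the tabled upper 5% point of F for 2 and 6 degrees of freedom.

```python
def one_way_anova(columns):
    """Compute one-way ANOVA table quantities.
    columns: list of samples, one per treatment (hypothetical helper)."""
    k = len(columns)                          # number of treatments
    n = sum(len(col) for col in columns)      # total number of observations
    sum_x = sum(sum(col) for col in columns)  # sum of all observations
    sum_x_squared = sum(x * x for col in columns for x in col)
    correction = sum_x ** 2 / n               # (sum of X)^2 / n

    ss_total = sum_x_squared - correction
    # Tc is the column total, nc the number of observations in that column
    sst = sum(sum(col) ** 2 / len(col) for col in columns) - correction
    sse = ss_total - sst

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {
        "SStotal": ss_total, "SST": sst, "SSE": sse,
        "df_treatments": k - 1, "df_error": n - k,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }

# Hypothetical data: three treatments with three observations each
treatments = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(treatments)

f_table = 5.14  # F table value at the 0.05 level for (2, 6) degrees of freedom
for name, value in result.items():
    print(name, "=", value)
print("Reject Ho" if result["F"] > f_table else "Do not reject Ho")
```

Returning every intermediate quantity rather than only F makes it easy to fill in each cell of the ANOVA table by hand and to reuse MSE later for confidence intervals.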

We then make inferences about treatment means:

When we reject the null hypothesis, we conclude that the population means are not all equal. When there are more than two samples, we may want to know which treatment means differ. One of the simplest procedures for identifying them is the use of confidence intervals.
Confidence interval for the difference between two means:

(X̄1 - X̄2) ± t √[MSE (1/n1 + 1/n2)]

where X̄1 and X̄2 are the two sample means, n1 and n2 are the two sample sizes, and t is obtained from the t table with (n - k) degrees of freedom. If the interval includes zero, we conclude that the pair of treatment means does not differ.

MSE = [SSE/(n - k)]
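A short sketch of the confidence interval for the difference between two treatment means follows. It assumes the usual form, the difference in sample means plus or minus t times the square root of MSE(1/n1 + 1/n2), with the t value read from a t table. The function name treatment_mean_ci, the samples, and the MSE of 1.0 are hypothetical. The value 2.447 is the tabled two-tailed 95% t value for 6 degrees of freedom.

```python
import math

def treatment_mean_ci(sample_1, sample_2, mse, t_value):
    """Confidence interval for the difference between two treatment means.
    mse comes from the ANOVA table; t_value comes from a t table with
    (n - k) degrees of freedom (hypothetical helper)."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_difference = sum(sample_1) / n1 - sum(sample_2) / n2
    margin = t_value * math.sqrt(mse * (1 / n1 + 1 / n2))
    return mean_difference - margin, mean_difference + margin

# Hypothetical treatments; MSE = 1.0 with n - k = 6 degrees of freedom is assumed
first = [1, 2, 3]
third = [5, 6, 7]
low, high = treatment_mean_ci(first, third, mse=1.0, t_value=2.447)

print("Interval:", (round(low, 3), round(high, 3)))
# If zero lies inside the interval, the two treatment means do not differ
print("Means differ" if low > 0 or high < 0 else "No difference detected")
```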

Two-Factor ANOVA

For the two-factor ANOVA, we test whether there is a significant difference among the treatment effects and whether there is a difference among the blocking effects.
Let Br be the block totals (r for rows).
Let SSB represent the sum of squares for the blocks, where:

SSB = Σ(Br²/k) - (ΣX)²/n

and k is the number of observations in each block.

Source	Sum of Squares	Degrees of Freedom	Mean Square	F	P
Treatments	SST	k-1	SST/(k-1) = MST	MST/MSE	P
Blocks	SSB	b-1	SSB/(b-1) = MSB	MSB/MSE
Error	SSE	(k-1)(b-1)	SSE/((k-1)(b-1)) = MSE
Total	SStotal	n-1
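The two-factor table can be sketched the same way. This example assumes one observation per block and treatment combination, the block sum of squares SSB = Σ(Br²/k) - (ΣX)²/n, and SSE = SStotal - SST - SSB. The function name two_factor_anova and the data are hypothetical.

```python
def two_factor_anova(table):
    """Two-factor ANOVA (treatments and blocks), one observation per cell.
    table[r][c] is the observation for block r (row) and treatment c (column)."""
    b = len(table)      # number of blocks (rows)
    k = len(table[0])   # number of treatments (columns)
    n = b * k
    sum_x = sum(sum(row) for row in table)
    correction = sum_x ** 2 / n

    ss_total = sum(x * x for row in table for x in row) - correction
    column_totals = [sum(row[c] for row in table) for c in range(k)]
    sst = sum(tc ** 2 / b for tc in column_totals) - correction
    # Br is the block (row) total; each block holds k observations
    ssb = sum(sum(row) ** 2 / k for row in table) - correction
    sse = ss_total - sst - ssb

    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / ((k - 1) * (b - 1))
    return {
        "SST": sst, "SSB": ssb, "SSE": sse, "SStotal": ss_total,
        "MST": mst, "MSB": msb, "MSE": mse,
        "F_treatments": mst / mse, "F_blocks": msb / mse,
    }

# Hypothetical data: 3 blocks (rows) by 3 treatments (columns)
data = [
    [1, 2, 5],
    [2, 4, 6],
    [3, 3, 7],
]
for name, value in two_factor_anova(data).items():
    print(name, "=", round(value, 4))
```

Each F value is then compared with the table F: k - 1 and (k - 1)(b - 1) degrees of freedom for treatments, and b - 1 and (k - 1)(b - 1) degrees of freedom for blocks.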