Descriptive Statistics
Graphical Methods to Explain Statistics
Frequency distribution - a grouping of data into
mutually exclusive categories showing the number of observations in each class.
- Class midpoint: A point that divides a class into two equal parts.
This is the average of the upper and lower class limits.
- Class frequency: The number of observations in each class.
- Class interval: The class interval is obtained by subtracting the
lower limit of a class from the lower limit of the next class. The
class intervals should be equal.

- Relative frequency distribution shows the percentage of
observations in each class.
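The definitions above can be sketched in a few lines of Python. This is only an illustration: the scores and class limits are made up, and each class is assumed to run from its lower limit up to (but not including) the lower limit of the next class, so the midpoint is taken as the average of those two numbers.

```python
# Hypothetical sample: 20 exam scores (made up for illustration).
scores = [52, 55, 61, 63, 64, 67, 70, 71, 72, 74,
          75, 76, 78, 81, 83, 85, 88, 90, 94, 97]

# Each class is (lower limit, lower limit of the next class); all intervals are equal.
classes = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]

def frequency_distribution(data, classes):
    """Return one row per class: limits, midpoint, frequency, relative frequency."""
    n = len(data)
    rows = []
    for lower, upper in classes:
        # A value belongs to a class when lower <= value < upper,
        # which keeps the classes mutually exclusive.
        freq = sum(1 for x in data if lower <= x < upper)
        rows.append({
            "class": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "frequency": freq,
            "relative": freq / n,
        })
    return rows

table = frequency_distribution(scores, classes)
for row in table:
    print(row["class"], row["midpoint"], row["frequency"], f"{row['relative']:.0%}")
```

The relative frequencies always add up to 100%, which is a quick check that no observation was counted twice or left out.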
Stem and Leaf Plot - a statistical
technique for displaying a set of data. Each numerical value is divided into two
parts: the leading digits become the stem and the trailing digits the
leaf. One advantage over the frequency distribution is that you keep the
identity of the individual values.
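A minimal sketch of a stem and leaf plot, assuming non-negative whole numbers where the last digit is the leaf and everything before it is the stem. The data values and the function name stem_and_leaf are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 47 -> stem 4, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical two-digit data, made up for illustration.
data = [23, 27, 31, 35, 35, 38, 42, 47, 49, 51]
plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Every original value can be read back from the plot (stem 3, leaf 5 is 35), which is the advantage over a plain frequency count.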

Histogram - a graph in which the classes are marked on
one axis and the class frequencies on the other axis. The class frequencies are
represented by the length of the bars, drawn adjacent to each other.

Steps for a Histogram
- Count the number of observations in the sample
- Determine the range of values in the sample
- Determine the class interval size
- Establish class midpoints
- Determine class boundaries
- Tally the number of observations that fall in each class
- Construct the histogram
Class Width = Range / (Number of Classes You Want)
Class Relative Frequency = (Class Frequency) / (Total Number of Measurements (n))
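The histogram steps and the two formulas above can be sketched as follows. The data are made up, the function name build_histogram is hypothetical, and two choices are assumptions rather than part of the notes: the class width is rounded up to a whole number, and the largest value is placed in the last class. The sketch also assumes the values are not all identical.

```python
import math

def build_histogram(data, num_classes):
    """Return (class width, class boundaries, class counts) for the data."""
    low, high = min(data), max(data)
    data_range = high - low                          # range of values
    # Class Width = Range / (number of classes), rounded up (an assumption).
    width = math.ceil(data_range / num_classes)
    boundaries = [low + i * width for i in range(num_classes + 1)]
    counts = [0] * num_classes
    for x in data:
        # The top value goes in the last class instead of opening a new one.
        index = min(int((x - low) // width), num_classes - 1)
        counts[index] += 1
    return width, boundaries, counts

# Hypothetical sample of 15 measurements.
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 36, 38, 41, 47]
n = len(data)                                        # number of observations
width, boundaries, counts = build_histogram(data, num_classes=5)

# Draw the histogram sideways, one row of stars per class,
# with the class relative frequency (count / n) at the end of each row.
for lower, upper, count in zip(boundaries, boundaries[1:], counts):
    print(f"{lower:>3} to {upper:<3} {'*' * count}  ({count / n:.0%})")
```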
Frequency polygon - consists of
line segments connecting the points formed by the class midpoint and the class
frequency.

Cumulative frequency distribution - used to determine
how many or what proportion of the data values are below or above a certain
value.
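A short sketch of a "less than" cumulative frequency distribution, using made-up class frequencies. Each cumulative count is the running total of the class frequencies up to and including that class.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest.
upper_limits = [60, 70, 80, 90, 100]
frequencies = [2, 4, 7, 4, 3]

cumulative = list(accumulate(frequencies))         # running totals
total = cumulative[-1]                             # equals the number of observations
cumulative_pct = [c / total for c in cumulative]   # proportion below each limit

for limit, c, p in zip(upper_limits, cumulative, cumulative_pct):
    print(f"below {limit}: {c} observations ({p:.0%})")
```

Subtracting a cumulative count from the total gives the number of values at or above that limit.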

Bar chart - used to depict any of the
levels of measurement (nominal, ordinal, interval, or ratio).

Pie chart - useful for displaying a
relative frequency distribution.

Calculation Methods to Explain Simple Statistics
Measures of Central Tendency - describe and locate the
center of the data
- Mean (average of data set)
- Sample mean (Xbar)
Xbar = ΣX / n
Where X is a particular value, Σ indicates adding the values, and n is the total number of values in the sample.
What would be the effect on the mean
if all observations were multiplied by the same constant? The
mean is also multiplied by that constant. The same goes for division.
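A quick check of that claim, using made-up numbers and an arbitrary constant c:

```python
from statistics import mean
import math

sample = [4, 8, 15, 16, 23, 42]   # hypothetical observations
c = 3                             # arbitrary constant

original_mean = mean(sample)
scaled_mean = mean([c * x for x in sample])      # every observation times c
divided_mean = mean([x / c for x in sample])     # every observation divided by c

print(original_mean, scaled_mean, divided_mean)
print(math.isclose(scaled_mean, c * original_mean))    # mean is multiplied by c
print(math.isclose(divided_mean, original_mean / c))   # mean is divided by c
```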
Three Reasons To Use a Sample Rather than a Population
(1) The population is too large to study
(2) The cost might be prohibitive
(3) The test might be destructive (for example, testing how long a
light bulb will burn)
- Population Mean (µ)
µ = ΣX / N
- where µ is the population mean.
- N is the total number of observations.
- X is a particular value.
- Σ (the summation mark) indicates the operation of adding.

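- Median (sample median, written x with a tilde over it = the middle value of the ordered data)
What is more relevant in real estate, mean or
median? Median, because it is not pulled up by a few very expensive homes, so it
better reflects where the majority of the buyers would be for a specific neighborhood.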
- Mode (measurement that occurs most often)
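To see why the median is often preferred for house prices, here is a small sketch with made-up sale prices in which one very expensive home drags the mean upward while the median and mode stay put.

```python
from statistics import mean, median, mode

# Hypothetical sale prices (in thousands of dollars) for one neighborhood.
prices = [180, 195, 200, 200, 210, 225, 950]

price_mean = mean(prices)       # pulled upward by the single 950 sale
price_median = median(prices)   # middle value of the sorted list
price_mode = mode(prices)       # value that occurs most often

print(f"mean   = {price_mean:.1f}")
print(f"median = {price_median}")
print(f"mode   = {price_mode}")
```

Six of the seven homes sold for less than the mean, so the mean says little about a typical sale here.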
Simple Deviation:

Variance - a measure of the spread or dispersion of a set of values
(or of a random variable) around its mean.
- A measure of the spread of a random variable based on the squared
deviations around a mean value.
- The square root of the variance is called the Standard deviation.
- A larger standard deviation generally goes with a wider spread of values.


What is the effect on the variance if all observations are multiplied by
the same constant? The variance is multiplied by the constant squared.
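A quick numerical check of that rule, using made-up values and the population formulas from Python's statistics module. The variance picks up a factor of the constant squared, while the standard deviation picks up the constant itself.

```python
from statistics import pvariance, pstdev
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
c = 3                               # arbitrary constant

var_before = pvariance(values)
var_after = pvariance([c * x for x in values])
sd_before = pstdev(values)
sd_after = pstdev([c * x for x in values])

print(var_before, var_after)   # variance grows by a factor of c squared
print(sd_before, sd_after)     # standard deviation grows by a factor of c
```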
Measures of Variability


Notice the square and square root that link the variance and the standard deviation.
You have to square the values for one simple reason:
you are taking all of the deviations from the mean, and if you did not
square them they would add up to zero. The standard deviation is
where you take the square root to undo the effect of squaring the
deviations in the variance, which puts the result back in the original units.
The standard deviation is the real workhorse.
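The reasoning above can be followed step by step in code. The data are made up, and the sample formula (dividing by n - 1) is an assumption here, since the notes do not show which version they use. The result is cross-checked against Python's statistics module.

```python
from statistics import mean, variance, stdev
import math

data = [3, 5, 7, 9, 11]                  # hypothetical sample
m = mean(data)

deviations = [x - m for x in data]       # distance of each value from the mean
print(sum(deviations))                   # positive and negative deviations cancel

squared = [d ** 2 for d in deviations]   # squaring removes the signs
sample_variance = sum(squared) / (len(data) - 1)   # n - 1: sample formula (assumption)
sample_std = math.sqrt(sample_variance)  # square root returns to the original units

print(sample_variance, sample_std)
print(math.isclose(sample_variance, variance(data)))   # agrees with the library
print(math.isclose(sample_std, stdev(data)))
```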

Standard Deviation
- Unit of measurement that tells us where the values lie in relation to
the mean and to one another
- The larger the standard deviation, the wider the curve

I found this website when trying to get
information on my son's IQ test.
http://2enewsletter.com/arch%20Interpreting%20test%20results%208_04.htm.
It explains standard deviation in a very
matter-of-fact, easy-to-understand way. And the
advantage is that, as people in education, it is
something we can all relate to. I hope this
helps you understand this concept better. It
helped me.
By Theresa Kennedy (UoP 2005)
Sources of Variability
- Lot-to-lot variability
- Within-lot variability
- Stream-to-stream variability
- Within-stream variability
- Time-to-time variability
- Within-time variability
- Piece-positional variability
- Piece-to-piece variability
The Empirical Rule for a Normally Distributed Population
- About 68% of measurements are within 1 standard deviation of the mean
- About 95% of measurements are within 2 standard deviations of the mean
- About 99.7% of measurements are within 3 standard deviations of the
mean
- Nearly every measurement is within +/-3 standard deviations of the mean,
except outliers. And that is only 3 sigma...think about 6 sigma.
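The percentages in the empirical rule can be reproduced from the normal distribution itself. This sketch uses the standard result that the proportion of a normal population within k standard deviations of the mean is erf(k / sqrt(2)); the function name share_within is made up for illustration.

```python
import math

def share_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 6):
    print(f"within {k} sigma: {share_within(k):.7%}")
```

Running it with k = 6 shows how little of a normal population falls outside 6 sigma compared with 3 sigma.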

Here is a link to University of Leicester and a good explanation
of Normally Distributed Populations.
http://www.le.ac.uk/biology/gat/virtualfc/Stats/normal.htm
Measures of Relative Standing:
Measures of relative standing show where measurements lie in
relation to one another. You cannot always compare values from different
places directly. For example, would an "A" in statistics at Harvard carry
more weight than an "A" in statistics at Nashville State Tech?
It is not hard to imagine that a person with a lower grade
at one school could actually have a better education than a person with a
higher grade at another school.
Enter the Z score for relative standing:
Z = (X - mean) / (standard deviation)


The Z score tells us how many standard deviations a measurement is away from
the mean, so a score of 1.50 is 1 1/2 standard deviations to the
right of the mean, -1.50 is 1 1/2 to the left, and 0.0 is on the mean.
When comparing two values it becomes important to read and understand the
problem.
- If more is better, then farther to the right is better (larger positive Z
score).
- If less is better, then farther to the left is better (more negative Z
score).
- If closest to the mean is better, then the smallest absolute value (+ or -) is
better (Z score closest to zero).
Z Scores work with normal distributions. Remember not all data curves
are normal and perfectly bell shaped; some are skewed.
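A small sketch of comparing two results with Z scores. All of the numbers are made up: two students at two different schools, each school with its own class mean and standard deviation. The function name z_score is hypothetical.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

# Hypothetical exam results at two schools (all numbers are made up).
z_a = z_score(85, mean=80, std_dev=10)   # raw score 85 at school A
z_b = z_score(78, mean=70, std_dev=4)    # raw score 78 at school B

print(z_a, z_b)
# Here more is better, so the larger positive Z score has the better relative standing.
print("School B student stands higher" if z_b > z_a else "School A student stands higher")
```

The student with the lower raw score (78) has the higher relative standing, because that score is farther above its own class mean in standard deviation units.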
Skewness:

I like the skiers....but...you could also think of it this way.
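Negatively skewed: the long tail is on the left, so the extreme values are subtracted (-) from the mean and typically pull it below the median.
Positively skewed: the long tail is on the right, so the extreme values are added (+) to the mean and typically pull it above the median.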
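One way to put a number on this is the moment coefficient of skewness (the average cubed deviation divided by the standard deviation cubed). This particular formula is an assumption, since the notes do not name one, and the data are made up. Its sign matches the direction of the long tail.

```python
from statistics import mean, median

def skewness(data):
    """Moment coefficient of skewness: negative = left tail, positive = right tail."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # average squared deviation
    m3 = sum((x - m) ** 3 for x in data) / n   # average cubed deviation keeps the sign
    return m3 / m2 ** 1.5

# Hypothetical data: one large value stretches the tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]
# Mirror image of the same data: the tail now stretches to the left.
left_skewed = [11 - x for x in right_skewed]

for name, data in (("right", right_skewed), ("left", left_skewed)):
    print(name, f"skewness={skewness(data):+.2f}",
          f"mean={mean(data)}", f"median={median(data)}")
```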