Descriptive Statistics

Graphical Methods to Explain Statistics

Frequency distribution - a grouping of data into mutually exclusive categories showing the number of observations in each class.

 

YouTube Video - Craig A. Stevens explains Frequency Distributions

 

  1. Class midpoint: A point that divides a class into two equal parts. This is the average of the upper and lower class limits.
  2. Class frequency: The number of observations in each class.
  3. Class interval: The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class.  The class intervals should be equal.

  1. Relative frequency distribution shows the percentage of observations in each class. 
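As a rough illustration of the terms above, here is a minimal Python sketch that builds a frequency distribution. The scores, the starting lower limit of 50, and the class interval of 10 are all made-up assumptions, not values from these notes.

```python
# Hypothetical sample of 12 test scores (illustrative data only).
scores = [52, 55, 61, 64, 67, 68, 71, 73, 75, 78, 84, 89]

lower_limit = 50      # assumed lower limit of the first class
class_interval = 10   # assumed distance between consecutive lower limits
num_classes = 4       # assumed number of classes

table = []
for k in range(num_classes):
    lo = lower_limit + k * class_interval
    hi = lo + class_interval                 # lower limit of the next class
    frequency = sum(1 for s in scores if lo <= s < hi)   # class frequency
    midpoint = (lo + hi) / 2                 # class midpoint
    relative = frequency / len(scores)       # relative frequency
    table.append((lo, hi, midpoint, frequency, relative))

for lo, hi, midpoint, frequency, relative in table:
    print(f"{lo} up to {hi}: midpoint={midpoint}, "
          f"frequency={frequency}, relative={relative:.1%}")
```

Each score lands in exactly one class (the classes are mutually exclusive), and the relative frequencies add up to 100%.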

Stem and Leaf Plot - a statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits the leaf.  One advantage over the frequency distribution is you keep the identity of the values.
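A small Python sketch of a stem and leaf plot, assuming two-digit values so that the tens digit is the stem and the ones digit is the leaf. The data are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-digit values (illustrative data only).
values = [12, 15, 21, 24, 24, 27, 33, 38, 41]

stems = defaultdict(list)
for v in sorted(values):
    stem, leaf = divmod(v, 10)   # leading digit = stem, trailing digit = leaf
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
# 1 | 2 5
# 2 | 1 4 4 7
# 3 | 3 8
# 4 | 1
```

Notice that every original value can still be read back from the plot, which is the advantage over a frequency distribution.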

Histogram - a graph in which the classes are marked on one axis and the class frequencies on the other axis. The class frequencies are represented by the heights of the bars, which are drawn adjacent to each other.

Steps for a Histogram

  1. Count the number of observations in the sample 

  2. Determine the range of values in the sample 

  3. Determine the class interval size 

  4. Establish class midpoints and boundaries 

  5. Determine class boundaries 

  6. Tally the number of observations that fall in each class 

  7. Construct histogram 

Class Width = Range / (number of classes you want)

Class Relative Frequency = (Class Frequency) / (Total Number of Measurements (n))
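The histogram steps and the two formulas above can be sketched in Python roughly as follows. The data and the choice of 4 classes are assumptions for illustration, and placing the largest value in the last class is one common convention, not the only one.

```python
# Hypothetical sample (illustrative values only).
data = [3, 7, 8, 12, 13, 14, 18, 21, 22, 27]

n = len(data)                            # Step 1: count the observations
data_range = max(data) - min(data)       # Step 2: range = 27 - 3 = 24
num_classes = 4                          # assumed number of classes we want
class_width = data_range / num_classes   # Step 3: 24 / 4 = 6.0

# Steps 4-5: boundaries start at the minimum and step by the class width.
boundaries = [min(data) + k * class_width for k in range(num_classes + 1)]

# Step 6: tally each observation; the maximum is placed in the last class.
counts = [0] * num_classes
for x in data:
    k = min(int((x - min(data)) / class_width), num_classes - 1)
    counts[k] += 1

# Class relative frequency = class frequency / n
relative = [c / n for c in counts]

# Step 7: a plain-text histogram, one row of stars per class.
for k in range(num_classes):
    label = f"{boundaries[k]:>4.0f} up to {boundaries[k + 1]:<4.0f}"
    print(f"{label} | {'*' * counts[k]}  ({relative[k]:.0%})")
```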

Frequency polygon - consists of line segments connecting the points formed by the class midpoint and the class frequency.

Cumulative frequency distribution - used to determine how many or what proportion of the data values are below or above a certain value.
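Here is a short Python sketch of a "less than" cumulative frequency distribution. The class limits and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical classes: upper limit of each class and its frequency.
class_upper_limits = [60, 70, 80, 90]
class_frequencies = [2, 4, 4, 2]

# Running total of the class frequencies.
cumulative = list(accumulate(class_frequencies))   # [2, 6, 10, 12]
total = cumulative[-1]

for upper, cum in zip(class_upper_limits, cumulative):
    print(f"less than {upper}: {cum} observations ({cum / total:.0%})")
```

Subtracting a cumulative count from the total gives the "or more" version, for example 12 - 6 = 6 observations of 70 or more.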

 

 

Bar chart - used to depict any of the levels of measurement (nominal, ordinal, interval, or ratio).

Pie chart - useful for displaying a relative frequency distribution. 

Calculation Methods to Explain Simple Statistics

Measures of Central Tendency - describe and locate the center of the data 

  1. Mean (average of data set) 
  2. Sample mean (Xbar) 

    Xbar = ΣX / n

    Where ΣX is the sum of all the values in the sample and n is the total number of values in the sample.

    What would be the effect on the mean if all observations are multiplied by the same constant? The mean is also multiplied by that constant. The same goes for division.
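A quick Python check of the sample mean and of the multiply-by-a-constant effect. The sample values and the constant 3 are made up for illustration.

```python
from statistics import mean

sample = [2, 4, 6, 8, 10]    # hypothetical sample
x_bar = mean(sample)         # (2 + 4 + 6 + 8 + 10) / 5 = 6

c = 3                        # assumed constant
scaled_mean = mean([x * c for x in sample])     # 18
divided_mean = mean([x / c for x in sample])    # about 2.0

print(x_bar, scaled_mean, divided_mean)
```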

    Three Reasons To Use a Sample Rather than a Population:

(1)    The population is too large to study 

(2)    The cost might be prohibitive 

(3)    The test might be destructive (for example, measuring how long a light bulb will burn)

  1. Population Mean (µ): µ = ΣX / N 
    1. where µ is the population mean. 
    2. N is the total number of observations in the population. 
    3. X is a particular value. 
    4. Σ (the summation sign) indicates the operation of adding.

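  2. Median (sample median = x̃ = the middle value when the observations are sorted; with an even number of observations, the average of the two middle values) 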

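What is more relevant in real estate, mean or median?  Median, because a few unusually expensive (or cheap) homes can pull the mean away from typical prices, while the median stays in the middle of the market, which is where the majority of the buyers would be for a specific neighborhood.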

  1. Mode (measurement that occurs most often)
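To see why the median can be more relevant than the mean in the real-estate question, here is a small Python sketch. The home prices (in thousands of dollars) are invented, with one very expensive house as an outlier.

```python
from statistics import mean, median, mode

# Hypothetical home prices in one neighborhood, in thousands of dollars.
prices = [180, 195, 200, 200, 210, 225, 950]

print(f"mean   = {mean(prices):.1f}")   # about 308.6, pulled up by the 950 house
print(f"median = {median(prices)}")     # 200, the middle of the sorted prices
print(f"mode   = {mode(prices)}")       # 200, the price that occurs most often
```

Only one house in the list costs more than the mean, so the mean says little about what a typical buyer would pay.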

Simple Deviation:

Variance - describes the spread, or dispersion, of the values a random variable can take. 

  1. A measure of the spread of a random variable based on the squared deviations around a mean value. 
  2. The square root of the variance is called the Standard deviation. 
  3. As the standard deviation grows, the values of the random variable spread over a wider range.

 

 

What is the effect on the variance if all observations are multiplied by the same constant? The variance is multiplied by the constant squared.
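A quick Python check of this scaling rule. The values and the constant 10 are assumptions, and the population variance is used here for convenience; the same rule holds for the sample variance.

```python
from math import isclose
from statistics import pstdev, pvariance

values = [1, 2, 3, 4, 5]          # hypothetical observations
c = 10                            # assumed constant
scaled = [x * c for x in values]

print(pvariance(values))          # 2
print(pvariance(scaled))          # 200, which is 10 squared times 2

# The standard deviation is only multiplied by the constant itself.
print(pstdev(values), pstdev(scaled))
```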

Measures of Variability 

Notice the square and square roots between variance and standard deviation.

You have to square the values for one simple reason.  You are taking all of the deviations from the mean, and if you did not square them, the positive and negative deviations would cancel and add up to zero.  Taking the square root to get the standard deviation undoes the effect of squaring and returns the measure to the original units of the data.  The standard deviation is the real workhorse.
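The reasoning above can be followed step by step in Python. The sample is hypothetical, and dividing by n - 1 is an assumption that we want the sample variance (a population variance would divide by N instead).

```python
import math
from statistics import variance

sample = [4, 8, 6, 5, 7]                  # hypothetical sample
n = len(sample)
x_bar = sum(sample) / n                   # 6.0

deviations = [x - x_bar for x in sample]  # -2, 2, 0, -1, 1
print(sum(deviations))                    # 0.0: the raw deviations cancel out

squared = [d ** 2 for d in deviations]    # 4, 4, 0, 1, 1
sample_variance = sum(squared) / (n - 1)  # 10 / 4 = 2.5
sample_sd = math.sqrt(sample_variance)    # back in the original units

print(sample_variance, sample_sd)
print(variance(sample))                   # library result for comparison
```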

Standard Deviation 

  1. A measure, in the same units as the data, of how far the values typically lie from the mean 
  2. The larger the Standard Deviation, the wider the curve

I found this website when trying to get information on my son's IQ test: http://2enewsletter.com/arch%20Interpreting%20test%20results%208_04.htm.  It explains standard deviation in a very matter-of-fact, easy-to-understand way. The advantage is that, as people in education, it is something we can all relate to. I hope this helps you understand this concept better. It helped me.

By Theresa Kennedy (UoP 2005)

Sources of Variability

  1. Lot-to-lot variability 
  2. Within-lot variability 
  3. Stream-to-stream variability 
  4. Within-stream variability 
  5. Time-to-time variability 
  6. Within-time variability 
  7. Piece-positional variability 
  8. Piece-to-piece variability

The Empirical Rule for A Normally Distributed Population

  1. About 68% of measurements are within 1 Standard Deviation of the mean 

  2. About 95% of measurements are within 2 Standard Deviations of the mean 
  3. About 99.7% of measurements are within 3 Standard Deviations of the mean 
  4. Nearly every measurement is within +/-3 Standard Deviations of the mean, except outliers.  And that is only 3 sigma...think about 6 sigma.
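The percentages in the empirical rule can be checked with Python's built-in normal distribution. This is a sketch for a standard normal curve (mean 0, standard deviation 1); the helper name share_within is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3, 6):
    print(f"within {k} standard deviation(s): {share_within(k):.7%}")
```

The first three lines come out close to 68%, 95%, and 99.7%. At 6 standard deviations, almost nothing is left outside the limits.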

Here is a link to the University of Leicester and a good explanation of Normally Distributed Populations.  

 http://www.le.ac.uk/biology/gat/virtualfc/Stats/normal.htm 

Measures of Relative Standing:

Measures of Relative Standing show where measurements lie in relation to one another.  You cannot always compare values from different places.  For example, would an "A" in statistics at Harvard carry more weight than an "A" in statistics at Nashville State Tech?

It would not be hard to imagine that a person with a lower grade in one school could actually have a better education than a person with the higher grade at another school.

Enter the Z score for relative standing: Z = (X - mean) / (standard deviation).

The Z Score tells us how many standard deviations a measurement is away from the mean, so a score of 1.50 would be 1 1/2 standard deviations to the right of the mean, -1.50 would be 1 1/2 standard deviations to the left, and 0.0 would be on the mean.

When comparing two values it becomes important to read and understand the problem.  

  • If more is better, then to the right is better (larger positive Z Score).  
  • If less is better, then to the left is better (more negative Z Score).  
  • If closest to the mean is better, then the smallest absolute value (+ or -) is better (Z Score closest to zero).  
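Here is a small Python sketch of comparing two grades from different schools with Z scores. The class means and standard deviations are invented numbers, used only to show the arithmetic.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical: a 90 in a class with mean 85 and standard deviation 4.
z_school_a = z_score(90, mean=85, std_dev=4)    # 1.25

# Hypothetical: an 85 in a class with mean 70 and standard deviation 10.
z_school_b = z_score(85, mean=70, std_dev=10)   # 1.5

print(z_school_a, z_school_b)
# More is better for grades, so the larger positive Z score stands higher.
print("B stands higher" if z_school_b > z_school_a else "A stands higher")
```

In this made-up case the lower raw grade (85) has the better relative standing, because it sits further above its own class mean.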

Z Scores work best with normal distributions.  Remember that not all data curves are normal and perfectly bell shaped; some are skewed.

Skewness:

I like the skiers....but...you could also think of it this way.

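Negatively Skewed: the long tail of the curve is on the minus (-) side, so the extreme values lie below the mean and pull it down

Positively Skewed: the long tail of the curve is on the plus (+) side, so the extreme values lie above the mean and pull it up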
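One way to check the direction of skew numerically is the moment-based skewness coefficient, sketched below in Python. The data are hypothetical, and this particular coefficient is an assumption on my part; it is one common definition, not necessarily the one used in class.

```python
from statistics import mean, median

def skewness(values):
    """Moment-based skewness: average cubed deviation over (variance ** 1.5)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n   # average squared deviation
    m3 = sum((x - m) ** 3 for x in values) / n   # average cubed deviation
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]    # hypothetical, long tail on the + side
left_tail = [-x for x in right_tail]      # mirror image, long tail on the - side

print(skewness(right_tail))   # positive
print(skewness(left_tail))    # negative
print(mean(right_tail), median(right_tail))   # 3.5 and 3.0: the tail pulls the mean up
```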