Statistical Symbols and Definitions

 

 


 

Michelle Jones, University of Phoenix, August 28, 2006

 

 

Symbols provided by Tracie Williams-Algood (UOPhx QNT 531, Fall 2004)

ANOVA (Analysis of Variance) - A method for testing whether the means of two or more groups differ.

Bayes' Theorem - as Bayes originally intended. (Submitted by Lisa Cox, UOP)  http://www.selfknowledge.org/resources/press/nyt_eakin.htm
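As a quick numeric sketch of the theorem (the rates below are hypothetical, not from the article): a positive result from an imperfect test raises a small prior probability only modestly.

```python
# Hypothetical illustration of Bayes' theorem: P(D | +) from a prior and test rates.
p_d = 0.01                # prior P(D): 1% of the population has the condition
p_pos_given_d = 0.95      # sensitivity, P(+ | D)
p_pos_given_not_d = 0.05  # false-positive rate, P(+ | not D)

# Total probability of a positive test (law of total probability)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(D | +) = P(+ | D) * P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(round(p_d_given_pos, 3))  # ≈ 0.161: most positives are still false positives
```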

Blocking - A portion of the experimental material that should be more homogeneous than the entire set of material (days, shifts, or some other grouping). Used to increase the precision of an experiment. A rational subgroup is selected so that if assignable causes are present, the chance for differences between subgroups will be maximized, while the chance for differences due to these assignable causes within a subgroup will be minimized.

Box Plots -- Used to view the mean differences and the distribution, spread, or variability of the data, and to determine whether the groups appear equal.

Cause and Effect Diagram - How to Construct:
(1) Define the problem or effect to be analyzed;
(2) Form the team;
(3) Draw the effect box and center line;
(4) Specify the major potential cause categories and join them as boxes connected to center line;
(5) Identify the possible causes and classify them into the categories in step 4 and create new categories, if necessary;
(6) Rank order the causes to identify those that seem most likely to impact the problem;
(7) Take corrective action.

Central Limit Theorem - Regardless of the distribution of the population, the sum of n independently distributed random variables is approximately normal. The approximation improves as n increases.
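A small simulation makes this concrete (the sample sizes and seed below are arbitrary choices, not from the text): sums of 50 Uniform(0,1) draws cluster around the theoretical mean n/2 with standard deviation √(n/12).

```python
import random
import statistics

# Illustrative sketch: collect many sums of n independent Uniform(0,1) draws.
random.seed(42)

n = 50         # number of variables summed per trial
trials = 2000  # number of sums collected

sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

# Theory for Uniform(0,1): each draw has mean 0.5 and variance 1/12, so the
# sum has mean n/2 = 25 and standard deviation sqrt(n/12) ≈ 2.04.
print(round(statistics.mean(sums), 1), round(statistics.stdev(sums), 2))
```

A histogram of `sums` would look approximately bell-shaped even though each individual draw is uniform.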

Measures of Central Tendency describe and locate the center of the data:

  1. Mean (average of data set) 

  2. Sample mean (X̄) 

  3. Population mean (μ) 

  4. Median (sample median = x̃ = middle value) 

  5. Mode (measurement that occurs most often)

By Steffanie (UoP 2005)

 

Statistics is defined as the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.


Why are we studying statistics? Because we are all interested in profits, hours worked, and wages. No matter what your subject is, everyone is interested in what a typical value is and how much variation there is in the data.

 

Here are a few examples of statistics:

• The mean time waiting for a phone operator is 4 minutes
• The average starting salary of a college graduate is $22,000/yr
• The average number of deaths from Hurricane Katrina is about 1,000

 

Why is statistics required in so many majors? Because numerical information is everywhere, and we need statistical techniques to make decisions that affect our daily lives, such as insurance for your life, home, and automobile. Another reason is that medical researchers study the cure rates for diseases using different drugs and different forms of treatment. Last but not least, there is environmental protection: agencies in this area are interested in water quality, taking samples from different lakes and rivers to establish the level of contamination and maintain the level of quality.


Another good reason to take a statistics course is that the knowledge of statistical methods will help you understand how decisions are made and give you a better understanding of how they affect you.
In chapter one, we learned that there are two types of statistics.

 

Descriptive statistics are methods of organizing, summarizing, and presenting data in an informative way.  In chapter one, we also learned about variables. A qualitative variable is nonnumeric and is usually summarized in bar charts and graphs. With qualitative variables, we are usually interested in the number or percent of the observations in each of the categories. Quantitative variables are either a continuous variable that can assume any value within a specified range or a discrete variable that can assume only certain values, which have gaps.

 

Examples of qualitative variables:
• Gender
• Religious affiliation
• Type of automobile
• State of birth
• Eye color

 

When the variable studied can be reported numerically, the variable is called a Quantitative Variable.

Examples of quantitative variables:
• The ages of company presidents
• The life of an automobile battery
• The balance in your checking account
• The number of children in a family

 

Quantitative variables are usually reported numerically.

 

There are 4 levels of measurements:

• Nominal Level
• Ordinal Level
• Interval Level
• Ratio Level

In the Nominal Level, there is no particular order to the categories. Observations of a variable can only be classified and counted. Nominal categories are also mutually exclusive and exhaustive.

Mutually Exclusive: A property of a set of categories such that an individual or object is included in only one category.

 

 

 

 

YouTube Video  -  Craig A. Stevens Explains Nominal Data During a Stats Class

 

 

On the Ordinal Level, which is the next highest level of data, one classification is “higher” or “better” than the next one.

 

 

 

YouTube Video  -  Craig A. Stevens Explains Ordinal Data During a Stats Class

 



The Interval level of measurement is the next highest level. It includes all the characteristics of the ordinal level, but in addition, the difference between values is a constant size.

The properties of the interval-level data are:
1. Data classifications are mutually exclusive and exhaustive.
2. Data classifications are ordered according to the amount of the characteristic they possess.
3. Equal differences in the characteristic are represented by equal differences in the measurements.

 

Examples of the Interval Scale of Measurement:
• Shoe Size
• IQ Scores
• Temperature

 

 

 

 

YouTube Video  -  Craig A. Stevens Explains Interval and Ratio Data During a Stats Class

 

 


The Ratio Level is the “highest” level of measurement. It has all the characteristics of the interval level, but in addition, the 0 point is meaningful and the ratio between two numbers is meaningful.
 

Examples of the ratio Scale of Measurement:
• Units of production
• Wages
• Weight
• Height
 

The properties of the ratio-level data are:
1. Data classifications are mutually exclusive and exhaustive.

2. Data classifications are ordered according to the amount of the characteristics they possess.
3. Equal differences in the characteristic are represented by equal differences in the numbers assigned to the classifications.
4. The zero point is the absence of the characteristic.


Inferential statistics comprises the methods used to determine something about a population on the basis of a sample.

 

Population: The entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest.


Sample: A portion, or part, of the population of interest.

Exhaustive: A property of a set of categories such that each individual or object must appear in a category.

 

Common Cause Variability - Controlled Variation

 

Statistical Data

 

by Verta Session-Webb (UoPhx 2008)

 

Quantitative or numerical information may be found almost everywhere. However, not all quantitative information is regarded as statistical data. Quantitative information suitable for statistical analysis must be a set of numbers that are measurable and show significant relationships. In other words, statistical data are numbers that can be subjected to comparison, analysis, and interpretation. The area from which statistical data are collected is generally referred to as the population or universe. A population can be finite or infinite. A finite population has a limited number of objects or cases, whereas an infinite population has an unlimited number.

 

The task of collecting data from a small finite population is relatively simple. If it is desired to obtain a complete set of data on the monthly incomes of the college instructors in a university, we may simply ask each instructor his/her monthly income. However, collecting such data from a large population is sometimes impractical or nearly impossible.

 

In order to avoid the impractical or impossible task, a sample consisting of a group of representative items is usually drawn from the population. The sample is then used for statistical study and the findings from the sample are used as the basis to describe, estimate, or project the characteristics of the population.

 

 

Continuous vs Discrete Data

 

 

 

YouTube Video  -  Craig A. Stevens Explains Continuous vs Discrete Data

 

 

Control Charts - Show graphically the results of many sequential hypothesis tests. UCL = μ + Lσ, Center Line = μ, LCL = μ − Lσ. Benefits: improve productivity, prevent defects, prevent unnecessary process adjustments, provide diagnostic information, provide process capability information. Mathematically equivalent to a series of statistical hypothesis tests. Average Run Length (ARL) = 1/p = average number of points that must be plotted before a point indicates an out-of-control condition, where p = probability that any point exceeds the control limits. Average Time to Signal (ATS) = ARL × h (where h is the sampling interval in hours) = time that elapses, on average, between a shift and its detection. Averages are tracked on the X̄ (average) chart and variability on the R (range) chart.

The Process is out of control if:

  1. One point is out of 3-sigma control limits;

  2. Two out of three consecutive points plot beyond the 2-sigma warning limits;

  3. Four out of five consecutive points plot a distance of 1-sigma or beyond from the center line;

  4. Eight consecutive points plot on one side of the center line;

  5. Six points in a row steadily increasing or decreasing;

  6. Fifteen points in a row in zone C (both above and below the center line);

  7. Fourteen points in a row alternating up and down;

  8. Eight points in a row both sides of the center line with none in zone C;

  9. An unusual or nonrandom pattern in the data;

  10. One or more points near a warning or control limit.
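The limit and ARL formulas above can be sketched numerically; the process mean, sigma, and sampling interval below are made up for illustration.

```python
# Sketch: 3-sigma limits for an X-bar chart and the in-control average run length.
mu = 10.0    # process mean (center line)
sigma = 0.5  # standard deviation of the plotted statistic
L = 3        # width of the control limits in sigma units

ucl = mu + L * sigma  # upper control limit
lcl = mu - L * sigma  # lower control limit

# For a normal, in-control process, p = P(point outside 3-sigma limits) ≈ 0.0027,
# so ARL = 1/p ≈ 370 points plotted between false alarms.
p = 0.0027
arl = 1 / p

# ATS = ARL * h, where h is the sampling interval in hours (hypothetical here)
h = 0.5
ats = arl * h

print(lcl, ucl, round(arl), round(ats))  # 8.5 11.5 370 185
```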

Collectively Exhaustive Events - Events are collectively exhaustive if at least one of the events must occur when an experiment is conducted.

Controlled Variation (common cause variability) - Many inherent sources of variation that affect the process in a random manner. The distribution of process output does not change over time; thus, the pattern of variation is consistent and stable. Results from a large number of chance causes. Removable only by process redesign. Examples: machine vibration, environmental fluctuations, and human variation in setting equipment.

Conway, William B. - Ex-president of Nashua Corp.; founded Conway Quality, Inc. in 1983. Quality management: eliminate waste of material and time. Focused on the right way to manage rather than simply how to improve. Advocated a new system of management whose primary task is continuous improvement, which means changing all the unwritten rules in a company. Advocate of statistical methods to achieve quality gains: 85% of all problems can be solved by simple tools; only about 15% require more complicated statistical process control methods.

6 Tools for Quality Improvement
(1) Human relation skills
(2) Statistical surveys
(3) Simple statistical techniques
(4) Statistical Process Control
(5) Imagineering
(6) Industrial Engineering

Crosby, Philip B. - Book: Quality is Free. Defines quality as conformance to requirements, which can only be measured by the cost of non-conformance. Prevention (meaning perfection) is the word summing up quality. No place for statistical levels of quality. Three ingredients: (1) determination, (2) education, and (3) implementation. Quality is management's responsibility (management should be as concerned about quality as about profit).

Zero Defects as a management performance standard.
(1) Make it clear that management is committed to quality.
(2) Form quality improvement teams with representatives from each department.
(3) Determine where current and potential quality problems lie.
(4) Evaluate the cost of quality and explain its use as a management tool.
(5) Raise the quality awareness and personal concern of all employees.
(6) Take actions to correct problems identified through previous steps.
(7) Establish a committee for the zero defects program.
(8) Train supervisors to actively carry out their part of the quality improvement program.
(9) Hold a zero defects day to let all employees realize that there has been change.
(10) Encourage individuals to establish improvement goals for themselves and their groups.
(11) Encourage employees to communicate to management the obstacles they face in attaining their improvement goals.
(12) Recognize and appreciate those who participate.
(13) Establish quality councils to communicate on a regular basis.
(14) Do it all over again to emphasize improvement never ends.

Data, Types of - Variable (anything measurable) and attribute (counting data, good or bad, conforming or nonconforming).

Distribution, Probability - Mathematical model that relates the value of the variable with the probability of occurrence of the value in the population. Continuous Distribution - The variable being measured is expressed on a continuous scale. Discrete Distribution - The parameter being measured can only take on certain values, such as the integers 0, 1, 2, ….

Degrees of Freedom - The number of independent elements that go into a statistic.

Deming, Dr. W. Edwards - Believed responsibility for quality rests with management, and in the power of statistical methods. "People who expect quick results are doomed to disappointment." Good quality is not necessarily high quality; it is a predictable degree of uniformity and dependability, at low cost and suited to the market. Statistical control does not imply absence of defective items; it is a state of random variation, in which the limits of variation are predictable. Two types of variation: chance and assignable. It is not enough to meet specifications; one has to keep working to reduce the variation. Advocate of worker participation in decision making. Management action is required to improve quality (94% of all issues). Inspection is designed to allow a certain number of defects to enter the system. Vendors should be under statistical control. Advocates single sourcing.

(1) Create a constancy of purpose focused on the improvement of products and services.
(2) Adopt a new philosophy of rejecting poor quality.
(3) Do not rely on mass inspection to control quality.
(4) Do not award business to suppliers on the basis of price alone but also consider quality.
(5) Focus on continuous improvement.
(6) Practice modern training methods and invest in training for all employees.
(7) Practice modern supervision methods.
(8) Drive out fear.
(9) Break down the barriers between functional areas of the business.
(10) Eliminate targets, slogans, and numerical goals for the workforce.
(11) Eliminate numerical quotas and work standards.
(12) Remove the barriers that discourage employees from doing their jobs.
(13) Institute an ongoing program of training and education for all employees.
(14) Create a structure in top management that will vigorously advocate the first 13 points.

Erlang - Developed queuing for handling telephone switchboard problems.

Error -- εij = Yij − Ȳj (each observation minus its group average).

Event - An event is the collection of one or more outcomes of an experiment.

Exhaustive Data - Each individual, object, or measurement must appear in one of the categories.

Experiment - A test in which purposeful changes are made to input - So that we can observe and Identify Changes in Outputs of a Process.  An experiment is the observation of some activity or the act of taking some measurement. 

Experimental Error Variance - The mean square error estimates the error variance: E(MSE) = σ².

Feigenbaum, Dr. Armand V. - First introduced the concept of company-wide quality control in his book Total Quality Control. More concerned with organizational structure and a systems approach to improving quality than with statistical methods. This is important, as quality improvement does not usually spring forth as a "grass roots" activity; it requires a lot of management commitment to make it work. Once suggested that the technical capability be concentrated in a specialized department; this differs from the more modern view that knowledge and use of statistical tools need to be widespread.

Fixed Effects Model - E(MS_Treatments) = σ² + n(Σ τi²)/(a − 1), where i = 1, 2, …, a

Fuzzy Logic - http://plato.stanford.edu/entries/logic-fuzzy  (submitted by Cyndi Cox UOP).

Grand Mean - Overall Mean

Graphical method -- from section 3-5.3 - SD of an average = σ/√n ≈ √(MSE/n). Draw a normal curve with the grand average and the lower and upper limits, then place the other treatment averages on the curve.

Histogram - (1) Count number of observation in the sample (2) Determine the range of values in the sample (3) Determine the class interval size (4) Establish class midpoints and boundaries (5) Determine class boundaries (6) Tally numbers of observations to fall in each class (7) Construct histogram

Class Width = Range / (number of classes you want)

Class Relative Frequency = (Class Frequency) / (Total Number of Measurements (n))
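The steps above can be sketched on a small made-up sample (the data values and class count below are hypothetical):

```python
# Sketch of the histogram construction steps on a hypothetical sample.
data = [2.1, 2.4, 3.0, 3.3, 3.5, 4.1, 4.2, 4.8, 5.0, 5.6]

n = len(data)                  # (1) count the observations
lo, hi = min(data), max(data)  # (2) range of values in the sample
classes = 5                    # (3) chosen number of classes
width = (hi - lo) / classes    # class width = range / number of classes

# (6) tally the number of observations falling in each class
counts = [0] * classes
for x in data:
    k = min(int((x - lo) / width), classes - 1)  # clamp the maximum into the last class
    counts[k] += 1

# class relative frequency = class frequency / total number of measurements
rel_freq = [c / n for c in counts]
print(counts, [round(f, 2) for f in rel_freq])
```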

Hypothesis (from Larry Buess of Trevecca) -- A hypothesis is a measurable statement about a condition you suspect exists today that supports an objective.  The hypothesis is a guess about the strength of a current condition and is usually stated as a proportion (a percent) or in some cases as a significant strength.  The hypothesis does not depend on any intervention or future events.  You know or suspect a problem or need exists today, but you have to measure how big the problem or need is.


Examples:
  
Objective:  Within 3 months after implementing the new tube filler procedure, there will be a 30% increase in production at the Oxydent toothpaste factory in Nolensville, TN.
 
The “what” of this objective is “increase in production”.  The new tube filler procedure is the intervention.  The hypothesis must support “increase in production” and not reasons for the new tube filler procedure.
 
Possible hypotheses could arise from questions such as:
 
   Why does production need to increase?
 
Hypothesis 1.1  Production at the Nolensville plant is 10% below the average of other Oxydent factories in the USA.
 
Hypothesis 1.2  Production at the Nolensville plant is 15% below production of all toothpaste factories in Tennessee.
 
Hypothesis 1.3  Production at the Nolensville plant has decreased by over 26% during the last 2 years.
 
     How do employees feel about production?
 
Hypothesis 1.4  Over 75% of the employees feel that production could be higher.
 
Hypothesis 1.5  Over 80% of the supervisors feel that production must increase.
 
      Another possible hypothesis:
 
Hypothesis 1.6  Over 30% of our Oxydent toothpaste distributors have orders that can’t be filled.

 

 

HYPOTHESIS

 

By Barbara Townsend, (UoPhx 2008)

 

            When trying to understand a hypothesis, it is important to know what it is.  One interpretation is: a hypothesis is a measurable statement about a condition suspected of existing today that supports an objective.

Ex.       Objective:  Within 3 months after implementing the new accounting software, there will  be a 20% increase in payroll efficiencies at the Taylor and Tyler Accounting Firm in  Selma, Alabama.

 

There could be two or more hypotheses arising from this objective.  Two will be explored.

            Hypothesis 1.1 Payroll efficiencies at the Taylor and Tyler Accounting Firm are 10% below the average when compared to other accounting firms in the U.S.

            Hypothesis 1.2 Payroll efficiency at the Taylor and Tyler Accounting Firm has decreased by over 15% during the last 2 years.

In determining various characteristics about a population using a sample, inferential statistics is the method used. Setting up a problem and testing hypotheses is an important part of statistical inference.  

            In the null hypothesis, Ho is presented as a theory that has been put forward but has not been proved. For example, in a clinical trial of laundry detergent, the null hypothesis might be that the new Tide 2 x extra strength laundry detergent is no better, on average, than the regular Tide.

            Ho: there is no difference between the two detergents on average.

            H1:  the new detergent is better than the current detergent, on average.

Special consideration is given to the null hypothesis because it relates to the statement being tested. The alternative hypothesis relates to the statement to be accepted if/when the null is rejected. In conclusion, we either Reject Ho in favor of H1 or Do not reject Ho; but never conclude Reject H1 or even Accept H1.
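As an illustration of the reject/do-not-reject logic (the sample sizes, means, and standard deviations below are hypothetical, and a two-sample z-test on means is just one possible choice of test):

```python
import math

# Sketch mirroring the detergent example with made-up cleaning-score summaries.
# H0: no difference between detergents on average; H1: the new one is better.
n1, mean1, sd1 = 40, 72.0, 8.0  # new detergent
n2, mean2, sd2 = 40, 68.0, 9.0  # regular detergent

se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
z = (mean1 - mean2) / se                   # test statistic under H0

# One-sided p-value from the standard normal distribution
p_value = 0.5 * math.erfc(z / math.sqrt(2))

alpha = 0.05
# We reject H0 in favor of H1 when p < alpha; otherwise we do not reject H0.
print(round(z, 2), p_value < alpha)
```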

 

Independent Events - Events are independent if the occurrence of one event does not affect the occurrence of another.

Juran, Dr. Joseph M. -- One of the founding fathers of Statistical Quality Control. Co-authored the Quality Control Handbook. Less focused than Dr. Deming on statistical methods. Philosophy based on organization for change and the implementation of improvement through "managerial breakthrough," which is a structured problem-solving process. Management action is required to improve quality (80% of all issues). Species of quality: structural, sensory, time-oriented, commercial, and ethical. Two kinds of quality: fitness for use and conformance to specifications. Three basic steps: structured annual improvements combined with devotion and a sense of urgency. All major problems are interdepartmental. Quality is not free. Likes quality circles. Law of diminishing returns applies where changes become too costly. Not in favor of single sourcing. Training for purchasing managers should include rating vendors. No such thing as improvement in general: improvement comes about project by project and no other way.

10-steps to Quality Improvement
(1) Build awareness of the need and opportunity for improvement.
(2) Set goals.
(3) Organize to reach the goals.
(4) Provide training.
(5) Carry out projects to solve problems.
(6) Report progress.
(7) Give recognition.
(8) Communicate results.
(9) Keep score.
(10) Maintain momentum by making annual improvement part of the regular systems and processes of the company.

Levels of Measurement:

  1. Nominal level of data is classified into categories and cannot be arranged in any particular order.

  2. Ordinal level involves data arranged in some order.  However, the differences between data values cannot be determined or are meaningless. (Such as which athlete finishes first or second.)

  3. Interval level is similar to the ordinal level. However, meaningful amounts of differences between data values can be determined. There is no natural zero point. (The example in the book is temperature.)

  4. Ratio level has an inherent zero starting point. Differences and ratios are meaningful for this level of measurement.

Mean = sum of all observations divided by the number of observations.

  1. Discrete Mean and Average

Question: What would be the effect on the mean if all observations are multiplied by the same constant? 

Answer: The mean is also multiplied by the constant. The same holds for division.
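A quick check of this property (the data values and constant below are arbitrary):

```python
import statistics

# Multiplying every observation by a constant multiplies the mean by the same
# constant; dividing every observation divides the mean likewise.
data = [3, 6, 9, 12]
c = 3

assert statistics.mean([c * x for x in data]) == c * statistics.mean(data)
assert statistics.mean([x / c for x in data]) == statistics.mean(data) / c
print(statistics.mean(data), statistics.mean([c * x for x in data]))  # 7.5 22.5
```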

 

Mean, Mode, and Median

 

By Austine Ozubu (UoPhx 2008)

 

 

The mean of a list of numbers is the average of those numbers. Mean is calculated by adding all the numbers in the list and dividing by the number of numbers in the list.

 

Companies can use the mean to calculate the average company salary, as in the table below. The mean salary is higher than all but two salaries because of the owner's salary of $60,000.

 

Table 1.0 – Employee Salaries

Employees                                                      Salaries

Secretary                                                         $12,000

Bookkeeper                                                     $19,000

Machinist, level 1                                           $15,000

Machinist, level 1                                           $15,000

Machinist, level 1                                           $15,000

Machinist, level 2                                           $18,000

Machinist, level 2                                           $18,000

Machine supervisor                                        $22,000

Sales Representative                                      $20,000

Sales Representative                                      $20,000

Owner                                                             $60,000

Total                                                               $234,000

                        Mean or Average        $21,273

 

Mode is the most frequent value. An advantage of the mode is that it can be used for nonnumeric data. The mode can be used to describe the United States Senate by saying the modal sex of the senators is male and the modal race is Caucasian.

 

Median is the middle value of a list. If the list has an odd number of entries, the median is the middle entry after sorting the list into increasing order. If the list has an even number of entries, the median is the average of the two middle entries after sorting. A practical use of the median is to report the typical weekly salary of a group of doctors, since the median is not distorted by a few very large salaries.
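The three measures can be computed directly from the salary table above:

```python
import statistics

# Mean, median, and mode of the eleven salaries in Table 1.0.
salaries = [12000, 19000, 15000, 15000, 15000,
            18000, 18000, 22000, 20000, 20000, 60000]

mean = statistics.mean(salaries)      # pulled upward by the one $60,000 salary
median = statistics.median(salaries)  # middle entry of the sorted list (11 values)
mode = statistics.mode(salaries)      # most frequent value

print(round(mean), median, mode)  # 21273 18000 15000
```

Note how the mean ($21,273) sits well above the median ($18,000) because of the single large salary; the median and mode are unaffected by it.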

 

In conclusion, there is no single basis for judging the accuracy of a given statistical concept or its distribution. This paper has discussed the types of statistical information collected in the team's professional work setting and the types of data that are not collected but should be. The advantages of accurate interpretation of data for improving decision making in the work setting were described, with an example of the positive outcomes that result from analyzing collected data and statistical information.

 

Mutually exclusive - An individual, object, or measurement is included in only one category.  Events are mutually exclusive if the occurrence of any one event means that none of the others can occur at the same time.

Nonparametric - Definition quoted from ...W. J. Conover, Practical Nonparametric Statistics 2nd ed, 1980, John Wiley and Sons, Inc, page 92.

" A statistical method is nonparametric if is satisfies at least one of the following criteria:

  1. The method may be used on data with a nominal scale of measurement.

  2. The method may be used on data with an ordinal scale of measurement.

  3. The method may be used on data with an interval or ratio scale of measurement, where the distribution function of the random variable producing the data is either unspecified or specified except for an infinite number of unknown parameters."

Normal Probability Plots - A plot of data (either raw data or residuals) against percent cumulative normal probability, used to check the assumption of normality. If the data are normal they will follow a straight line, with points near the line; this is sometimes checked informally by seeing whether a pencil laid on the plot covers the points.

Normally Distributed Populations 

Outcome - An outcome is the particular result of an experiment. 

Outliers - If there is reason to suspect that the point was not a legitimate observation, it can be ignored. Otherwise, it points to a need for investigation.

Parameter - A measurable characteristic of a population.

Probability - A measure of the likelihood of an event occurring. Probabilities can be obtained from experience, subjective judgment, or counting rules.

Quality - Fitness for use, which includes fit, form, and function. Fitness for use is based on customer requirements. Quality is inversely proportional to variability.

Quality Characteristics - (1) Physical (Length, weight, voltage, viscosity); (2) Sensory (Taste, appearance, color); (3) Time Orientation (Reliability, durability, serviceability).

Quality Cost -- Reasons to consider: (1) increase in the complexity of manufactured products associated with advances in technology; (2) increasing awareness of life-cycle costs (maintenance, labor, spare parts, and cost of field failures); (3) the need for quality professionals to effectively communicate the cost of quality. Prevention Costs - Quality planning and engineering, new product review, product and process design, process control, burn-in, training, quality data acquisition and analysis. Appraisal Costs - Inspection and test of incoming material, product inspection and test, materials and services consumed, maintaining accuracy of test equipment. Internal Failure Costs - Scrap, rework, retest, failure analysis, downtime, yield losses, downgrading (off-specing). External Failure Costs - Complaint adjustment, returned product/material, warranty charges, liability, indirect costs.

Quality, Dimensions of - (1) Performance (will the product do the intended job?); (2) Reliability (how often does the product fail?); (3) Durability (how long does the product last?); (4) Serviceability (how easy is it to repair the product?); (5) Aesthetics (what does the product look like?); (6) Features (what does the product do?); (7) Perceived Quality (what is the reputation of the company or its product?); (8) Conformance to Standards (is the product made exactly as the designer intended?); (8+1) Use (is the product used as envisioned?)

Quality Engineering - The set of operational, managerial, and engineering activities that a company uses to ensure that the quality characteristics of a product are at the nominal or required levels.

Quality Improvement is the reduction of variability in processes and products.

Quality Program Evaluation - (1) Quality of materials; (2) accuracy and precision of measurement systems; (3) knowledge, ability, techniques, and support (time) to control the process over a long period of time; (4) capability of a process measured over a short period of time; (5) modifying the system for the process to ensure that the control techniques are operating properly; (6) company policies concerning continuous improvement.

Queuing - Developed by Erlang for handling telephone switchboard problems. Uses arrival rates, service rates, and the number of servers to find information related to server use, the probability of waiting times exceeding a certain limit, the average length of the queue, and average waiting times in the queue and in the system.
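The basic single-server (M/M/1) queue formulas can be sketched as follows; the arrival and service rates below are hypothetical.

```python
# Sketch of the standard M/M/1 queue measures (Poisson arrivals,
# exponential service times, one server) with made-up rates.
lam = 4.0  # arrival rate (customers per hour)
mu = 6.0   # service rate (customers per hour)

rho = lam / mu           # server utilization (must be < 1 for a stable queue)
L = rho / (1 - rho)      # average number of customers in the system
Lq = rho**2 / (1 - rho)  # average number waiting in the queue
W = 1 / (mu - lam)       # average time in the system (hours)
Wq = rho / (mu - lam)    # average waiting time in the queue (hours)

print(round(rho, 2), round(L, 1), round(W, 2), round(Wq, 2))
```

Little's law (L = λW) ties the count-based and time-based measures together, which provides a handy sanity check on the numbers.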

Randomization - Both the allocations of the experimental material and the order in which the individual runs or trials of the experiment are to be performed are random - To Eliminate Biases.

Random Effects Experiments - Have the following in common: (a) treatments are a random sample from a larger population; (b) τi, i = 1, 2, 3, …, a, are random variables; (c) the test of hypothesis is about the variability of τi.

Random Effects Model -- E(MS_Treatments) = σ² + n·στ²

Random Error Component - See Experimental Error Variance

Rational Subgroup - Subgroups or samples selected so that if assignable causes are present, the chance for differences between subgroups will be maximized, while the chance for differences due to these assignable causes within a subgroup will be minimized. (See blocking.)

Regression Significance Test -- β0 = μa; β1 = μ1 − μa; β2 = μ2 − μa; …

Residuals - See step five above.

Replication - Duplication of the same test with all controls the same. Used to obtain an estimate of error, and to estimate the central tendency (mean) and variability so that observed differences in the data can be judged statistically different or not. If the sample mean is used, this permits the experimenter to obtain a more precise estimate of effects.

Shewhart, Walter -- Father of SPC. While every process displays variation, some processes display controlled variation (common cause variability), while others display uncontrolled variation. Defined special-cause variability and common-cause variability.

 

Sampling

 

by Terri-Jane Hammerle (UoPhx 2008)

 

Sampling is the process of choosing a few members from a larger group or population. If data can be obtained on every member of a population, sampling is not needed; if it cannot, a sample must be drawn from the population and the data analyzed from it. The television industry uses sampling to measure what television programs people are watching. According to Nielsen Media Research (2003), “We continually measure television viewing with a number of different samples all across the U.S.”

 

The first step is to develop representative samples, which must be done with a scientifically drawn random selection process. No volunteers can be accepted, or the statistical accuracy of the sample would be in jeopardy. Nationally, there are 5,000 television households in which electronic meters (called People Meters) are attached to every TV set, VCR, cable converter box, satellite dish, or other video equipment in the home. The meters continually record all set tuning. In addition, Nielsen asks each member of the household to indicate when they are watching by pressing a pre-assigned button on the People Meter. By matching this button activity to the demographic information (age/gender) collected when the meters are installed, Nielsen can match the set tuning - what is being watched - with who is watching. All these data are transmitted to Nielsen Media Research's computers, where they are processed and released to customers each day.
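
The scientifically drawn random selection described above can be sketched as a simple random sample. The sample size of 5,000 mirrors the Nielsen figure, but the population of household IDs is hypothetical.

```python
import random

# Sketch: a scientifically drawn simple random sample - no volunteers,
# no replacement. Household IDs and population size are hypothetical.
population = list(range(1, 100001))         # 100,000 hypothetical households
random.seed(42)                             # fixed seed for reproducibility
sample = random.sample(population, k=5000)  # 5,000 metered households

print(len(sample), len(set(sample)))  # 5000 5000 - no household drawn twice
```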

 

Reference:

 

Nielsen Media Research (2003). FAQ-Rating Questions. Retrieved August 17, 2008 from http://www.nielsenmedia.com/FAQ/ratings.html

Standard Deviation

 

by Kourtney Tharpe (UoPhx 2008)

 

Standard deviation is one of the most commonly used statistical tools. It gives a precise measure of the amount of variation in any group of numbers. One practical use of standard deviation is to measure the risk of investing in mutual funds or other investment products; it can also serve a number of different purposes in investment decision-making. As a measure of volatility, standard deviation measures the tendency of data to be spread out.

 

When looking at the historic returns of a mutual fund, standard deviation can be used to measure the variation of returns that has taken place in the past, giving a sense of the range of performance that can be expected in the future. According to Harrell (1997), "When used to measure the volatility of the performance of a security or a portfolio of securities, standard deviation is generally calculated for monthly returns over a specific time period--frequently 36 months. And, because most people think about returns on an annual, not monthly, basis, the resulting number is then modified to produce an annualized standard deviation."
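
As a sketch of the annualization convention Harrell describes, a common approach is to scale the monthly standard deviation by the square root of 12; the monthly return figures below are invented for illustration.

```python
import statistics
from math import sqrt

# Sketch of annualizing volatility: take the standard deviation of
# monthly returns, then scale by sqrt(12). Return figures are made up.
monthly_returns = [0.012, -0.008, 0.021, 0.005, -0.015, 0.018,
                   0.002, 0.010, -0.004, 0.007, 0.013, -0.001]

monthly_sd = statistics.stdev(monthly_returns)  # sample std dev (n - 1)
annualized_sd = monthly_sd * sqrt(12)           # annualized volatility

print(round(monthly_sd, 4), round(annualized_sd, 4))
```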

 

 Reference:

 

Harrell, D. (1997). How Standard Deviation Works. Retrieved August 17, 2008 from http://www.morningstar.com

Standard Deviation

by Cicely Y. Peterson (UoPhx 2008)

 

 

The formula for standard deviation is

σ = √( Σ(x − x̄)² / (n − 1) )

Lower-case sigma (σ) means = standard deviation
Capital sigma (Σ) means = the sum of
x bar (x̄) means = the mean

Standard deviation is a term used to measure the range of data around the mean value.

 

When the data are bunched tightly together and the bell-shaped curve is steep, the standard deviation is small. When the data are spread apart and the bell curve is relatively flat, the standard deviation is large.

 

 
Example: 
Find the standard deviation:

4, 9, 11, 12, 17, 5, 8, 12, 14

 

(Example from - Standard Deviation. (2008). Retrieved August 18, 2008 from http://www.gcseguide.co.uk/standard_deviation.htm)

 

Step 1:

 

Find the mean by adding all the numbers and dividing by the number of values (the average): 92/9 = 10.22
 

Step 2:


 Subtract the mean from each number and square the result.

4 - 10.22 = -6.22, (-6.22)² = 38.7;   9 - 10.22 = -1.22, (-1.22)² = 1.49; ...etc.

                    4         9        11       12       17        5       8        12        14

Result       38.7   1.49    0.60    3.16    45.9   27.3   4.94    3.16     14.3

Step 3:

 

Add all of the results to get 139.55 - this is the Σ(x − x̄)² in the formula

 

Step 4:

 

Divide by n − 1, where n = the number of values in the problem: 139.55/8 = 17.44

 

Step 5:

 

Take the square root of 17.44= 4.18
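
The five steps above can be reproduced in a short script (Python used here for illustration):

```python
from math import sqrt

# Reproduce the worked standard-deviation example, step by step.
data = [4, 9, 11, 12, 17, 5, 8, 12, 14]

mean = sum(data) / len(data)                    # Step 1: 92/9 = 10.22...
squared_devs = [(x - mean) ** 2 for x in data]  # Step 2: (x - mean)^2
total = sum(squared_devs)                       # Step 3: about 139.56
variance = total / (len(data) - 1)              # Step 4: divide by n - 1
sd = sqrt(variance)                             # Step 5: square root

print(round(sd, 2))  # 4.18
```

(The exact sum of squared deviations is 139.56 rather than 139.55, because the worked example sums individually rounded values.)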


 

References

Standard Deviation. (2008). Retrieved August 18, 2008 from http://www.gcseguide.co.uk/standard_deviation.htm

 

Niles, R. (2008). Standard Deviation. Retrieved August 18, 2008 from http://www.robertniles.com/stats/stdev.shtml

 

 

 

Statistics - What is statistics? A study of how best to (a) collect data, (b) describe and summarize data, and (c) draw practical conclusions based on data. A way to: test theories and practices (in some cases related to doing work); determine the characteristics of a population by using a sample; make inferences from data under uncertainty. Big Picture - the science of amassing data, taking a portion of it, and seeing what that portion tells us about the whole. Little Picture - the actual statistics themselves; a statistic with a little “s” is any number that represents something else.

Statistical Process Control -- Program elements: (1) management leadership, (2) a team approach, (3) education of employees at all levels, (4) emphasis on continuous improvement, (5) a mechanism for recognizing success and communicating it throughout the organization. The goal is to remove all special-cause variation, so that the process is predictable and characteristic parameters like the mean, standard deviation, and probability distribution are constant. Can determine process capability, whether redesign is economically feasible, and the effect on the final product. Benefits: increased customer satisfaction; decreased scrap, rework, inspection, and operating costs; maximized productivity; and a predictable, consistent level of quality.

Special-Cause Variability -- see Uncontrolled Variability

Special Rule of Addition - If two events A and B are mutually exclusive, the special rule of addition states that the probability of A or B occurring equals the sum of their respective probabilities: P(A or B) = P(A) + P(B)
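
A minimal worked example of the special rule of addition, using exact fractions for a single die roll:

```python
from fractions import Fraction

# Mutually exclusive events on a single die roll:
# A = roll a 1 or a 2, B = roll a 6 (they cannot both occur).
p_a = Fraction(2, 6)
p_b = Fraction(1, 6)
p_a_or_b = p_a + p_b  # special rule of addition: P(A) + P(B)

print(p_a_or_b)  # 1/2
```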

Subjective Approach - Probability assessment based on opinion; different individuals may arrive at different probabilities for the same event.

Total Variability - SST (the total sum of squares)

Transformations - Used to analyze data that are non-normal in their standard form (but become normal when transformed) or that have unequal variance.

Treatment Effects - See 4.3 above

Type I Error -- α = P(reject H0 | H0 is true)

Type II Error -- β = P(fail to reject H0 | H0 is false) --- Power = 1 − β = P(reject H0 | H0 is false)
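
These definitions can be illustrated by simulation: repeatedly run a test when H0 is true to estimate the Type I error rate, and when H0 is false to estimate power. This sketch assumes a one-sided z-type test with known σ; the sample size, effect size, critical value, and trial count are all arbitrary illustrative choices.

```python
import random
import statistics
from math import sqrt

# Estimate alpha and power by simulation for a one-sided test of
# H0: mu = 0 with known sigma (z-type statistic).
random.seed(1)

def reject_h0(true_mu, n=30, sigma=1.0, z_crit=1.645):
    """Draw one sample from N(true_mu, sigma) and test H0: mu = 0."""
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    z = statistics.mean(sample) / (sigma / sqrt(n))
    return z > z_crit

trials = 2000
alpha_hat = sum(reject_h0(0.0) for _ in range(trials)) / trials  # H0 true
power_hat = sum(reject_h0(0.5) for _ in range(trials)) / trials  # H0 false

print(alpha_hat)  # close to 0.05 (estimated Type I error rate)
print(power_hat)  # well above alpha (estimated power = 1 - beta)
```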

Types of Data: 

  1. Qualitative or Attribute variable is nonnumeric.

  2. Quantitative variable information is numeric.

    1. Discrete variables can only assume certain values (such as counting the number of cars).

    2. Continuous variables can assume any value within a specified range (such as height of a person).

    From an example in the book.

 

Uncontrolled Variability - Special-cause variability - typically results from the influence of one or two identifiable sources. These sources, also known as assignable causes, tend to be unpredictable and may come and go. The distribution of process output is unstable and unpredictable. Removable by systematic identification and elimination on the shop floor. Examples: operator error, inferior raw stock, over-adjustment, and poor setup.

Variable Types

 

  1. Category (Less Exact)

    1. Nominal Variable (Name)

    2. Ordinal Variable (relative condition) 

      1. Ordered Categories 

      2. Ranks 

  2. Quantity

    1. Discrete Variable (countable units, integers)

    2. Continuous Variable (infinite but measurable possible events)

Variability Between Treatments - estimated by sT^2 = (MSTRT - MSE)/n (see Random Effects Model)
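
The estimator sT^2 = (MSTRT − MSE)/n can be sketched for a balanced one-way random effects layout; the measurements below are invented for illustration.

```python
import statistics

# Variance component for a balanced one-way random effects design,
# using sT^2 = (MSTRT - MSE) / n. Data are made up: a treatments
# (rows) with n observations each.
data = [
    [9.1, 9.3, 8.8, 9.0],     # treatment 1
    [10.2, 10.0, 10.5, 9.9],  # treatment 2
    [8.5, 8.7, 8.4, 8.6],     # treatment 3
]
a, n = len(data), len(data[0])
grand_mean = statistics.mean(x for row in data for x in row)

ss_trt = n * sum((statistics.mean(row) - grand_mean) ** 2 for row in data)
ss_e = sum((x - statistics.mean(row)) ** 2 for row in data for x in row)

ms_trt = ss_trt / (a - 1)    # E(MSTRT) = s^2 + n * sT^2
ms_e = ss_e / (a * (n - 1))  # E(MSE)   = s^2
var_between = (ms_trt - ms_e) / n

print(round(var_between, 3))  # estimated between-treatment variance sT^2
```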
Variability, Types of - Controlled Variation (Common Cause Variability) and Uncontrolled Variation (Special-Cause Variability)