Module 1.5 Notes
Module 1.4 covered one of the
two classic methods of inferential statistics - confidence interval
estimation. That method is most frequently used to answer exploratory
research questions, such as "what is the average cycle time," or
"what is the average profit contribution?" Explanatory research
questions are frequently answered through the other method of inferential
statistics - the testing of hypotheses, or hypothesis-testing. We start
with a belief, claim, prediction, or assertion (hypothesis) about the parameter
of interest (in this Module, the population mean). Then we
gather a sample, compute sample statistics, and make a conclusion about that
hypothesis. Since we are working with a sample, we have to add an appropriate
measure of reliability. The following notes cover a five-step methodology for
hypothesis-testing.
Step One: State Null and Alternative Hypotheses
To illustrate
statistical statements of hypotheses, I'll present three hypothesis test
scenarios. In Scenario One, suppose someone believes the true average cycle
time is less than 24 days. Recall that cycle time in this illustration is the
time between when a company makes an order for material, and the time the
material is received. But cycle time is a variable closely monitored in many
activities - the time to pay an account, the time to process a customer service
request, the time to hang a bottle of blood for a blood transfusion, and so
forth.
Back to this scenario: the belief or prediction that true average cycle time is
less than 24 days is generally based upon someone's knowledge of the underlying
process. Perhaps they were involved in making an improvement to cycle time
experiences. Last year, the average cycle time may have been 24 days,
improvements were made, and this year they expect cycle time to improve (<
24 days, on average). This belief, or research hypothesis, is generally what
the analyst tries to prove or support by gathering evidence. In statistics, it
is called the alternative hypothesis, also known as the research
hypothesis (symbol Ha; you will also see H1 in some texts and journals). The hypothesis that complements the alternative is
called the null hypothesis (symbol H0), or hypothesis
of equality. The statistical hypothesis statements are written as follows:
Scenario One
Ho: Population Mean = 24 (this is the null hypothesis)
Ha: Population Mean < 24 (this is the alternative hypothesis)
For Scenario Two, suppose someone believes the true average cycle time to be greater than 20.9 (the alternative hypothesis). Perhaps the scenario is that last year the company was using vendors who shipped by less than truckload and the average cycle time experience was 20.9 days (the null hypothesis). This year they switched to vendors who use truckload (cheaper but takes longer), thus they predict the cycle time will go up compared to last year. The null and alternative hypothesis statements are written:
Scenario Two
Ho: Population Mean = 20.9 (this is the null hypothesis)
Ha: Population Mean > 20.9 (this is the alternative hypothesis)
For Scenario Three, now suppose last year the average cycle time was 20 and vendors were replaced so that changes in cycle time are expected but no one knows if the changes lead to increased cycle time or decreased cycle time. The alternative hypothesis would be that the cycle time is not equal to 20. The null hypothesis is that the cycle time is equal to 20. These statements for this test would be written:
Scenario Three
Ho: Population Mean = 20 (this is the null hypothesis)
Ha: Population Mean =/= 20 (=/= is the symbol for not equal)
Note carefully that each scenario involved two statistical hypothesis statements. The first two scenarios involved directional hypothesis tests or one-tailed tests. Specifically, Scenario One is a lower-tail test: to support the alternative hypothesis, we would have to find sample means much lower than the hypothesized mean. Scenario Two is specifically called an upper-tail directional test: to support the alternative hypothesis, we would have to find sample means much higher than the hypothesized mean. The third scenario involved a non directional or two-tailed test. In all tests, the null hypothesis:
1. Contains the equal sign, thus it is sometimes referred to as the hypothesis of no difference or no effect.
Some texts write the null hypothesis as ≥ for an alternative written as <, and as ≤ for an alternative written as >, so that the null is always opposite in sign to the alternative. Classical hypothesis testing simply puts the = sign in the null, which makes it easier to see that the null is always the hypothesis of equality.
2. Is stated in specific
terms regarding what the true value of the population parameter (in this case,
the population mean) is predicted to be (24, 20.9, and 20 were the values for
these three separate scenarios).
3. Is the hypothesis to be tested. We either reject or fail to reject the null.
If we reject the null hypothesis, we do so in favor of the alternative because the evidence we have gathered supports the alternative. If we fail to reject the null hypothesis, we have insufficient evidence to support the alternative. Thus the null hypothesis "presumes innocence until proven guilty."
In all statements of hypothesis tests, the alternative hypothesis:
1. Does not
contain the equal sign.
2. Is the conclusion supported (must be true) when the null hypothesis is
rejected (proven to be false).
Please note that researchers and business data analysts would only test one set of statistical hypothesis statements to answer a specific research question with a sample of data. A typical scenario might be scenario one. Last year, the mean cycle time was 24 days before a continuous improvement program was initiated and they want to see if cycle time decreases because of the continuous improvement. I presented two other scenarios for illustration purposes.
How do we know whether to fail
to reject the null hypothesis, or to reject the null in favor of the
alternative? We gather a sample set of data from the population of interest,
find the sample statistic that best estimates the population parameter under
investigation, find the probability of getting the sample statistic if the null
hypothesis is true, and make a conclusion based on the probability. The concept
is simple: in scenario one, if our sample mean comes out to be 7 we would say,
"there is no way we could get a sample mean equal to 7 if the true
population mean was equal to 24, so reject the null in favor of the
alternative." But what if the sample mean came out to be 23.999. There is
a fairly high probability of getting a sample mean of 23.9999, if the true
population mean was in fact 24, just by chance alone. In this case, we would
fail to reject the null hypothesis. In other words, we haven't gathered enough
evidence to reject the null hypothesis - the continuous improvement program did
not work - the sample mean is only different from the true population mean
because of sampling error.
While many hypothesis tests are supported by observation such as above, we
obviously need more precision in making the decision to reject or fail to
reject the null hypothesis. That precision comes in steps 2 through 5.
Step 2: Determine and Compute the Test Statistic
The general form of
the hypothesis test statistics is shown in Equation 1.5.1:
Eq. 1.5.1: Test Statistic =
(Estimator - Hypothesized Value of Estimator) / Standard Error of the Estimator
There are two test statistics for testing a population mean: the Z and the t. The Z test of hypothesis for a population mean is used when the population standard deviation is known:
Eq. 1.5.2: Z = (Sample Mean - Hypothesized Mean) divided by [Population Standard Deviation / Sq. Rt. (n)]
The assumptions for the Z test are:
1. The Population Standard Deviation is known.
2. Numerical data is independently and randomly drawn from a population known to be normally distributed
3. If the population is not normally distributed, it can be approximated by the normal distribution as long as the sample size is large ( > 30)
Suppose we want to test the following hypotheses and know that the population standard deviation is 3:
Scenario One
Ho: Population Mean = 24 (this is the null hypothesis)
Ha: Population Mean < 24 (this is the alternative hypothesis)
We would gather a sample, compute the sample mean and then solve for Z using Equation 1.5.2. Let's say the sample mean is 21, and the sample size is 30. Then:
Eq. 1.5.3: Z = (21 - 24) / [3 / Sq. Rt. (30)]
= -3 / (3 / 5.5)
= -3 / 0.55
= -5.5
The interpretation is:
the sample mean of 21 is 5.5 standard errors less than the hypothesized mean of
24 (21 is quite far from 24 in terms of standard errors and in the direction of
the alternative hypothesis, casting doubt on the truth of the null hypothesis,
as we will see in Steps 3 - 5).
This formula could be easily constructed in an active cell on an Excel
worksheet. The cell formula would be:
= (21-24)/((3/SQRT(30)))
The other test statistic is the t. The t test is used when the population
standard deviation is unknown and must be estimated by the sample standard
deviation. Because the population standard deviation is generally unknown,
this is the more common test statistic. The formula for the t statistic is:
Eq. 1.5.4: t = (Sample Mean - Hypothesized Mean) divided by
[Sample Standard Deviation / Sq. Rt. ( n )]
The assumptions for the t test:
1. The
population standard deviation is unknown and is estimated by the sample
standard deviation.
2. Numerical data is independently and randomly drawn from a normal
distribution,
3. If the population is not normal, but not very skewed and the sample size is
large (> 30), the t distribution provides a good approximation to the
sampling distribution of the sample mean.
Note the only difference in this
formula and Eq. 1.5.2 is that we use the sample standard deviation, s, rather
than the population standard deviation. If the sample mean is 21, the sample
standard deviation is 3, the sample size is 30, and the hypothesized value of
the population mean is 24, the t statistic has a value of -5.5, the same as the result for Z in Eq. 1.5.3. Any difference between the Z and the t will appear when we compute probabilities in Step 3, although with large sample sizes the Z and the t are nearly identical, as was noted in the Module 1.4 Notes.
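As with the Z statistic, the t could be computed in an active cell of an Excel worksheet. A minimal sketch using the sample values above (sample mean 21, s = 3, n = 30, hypothesized mean 24):
=(21-24)/(3/SQRT(30))
Excel returns approximately -5.48, which rounds to the -5.5 reported above.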
Before we compute the probabilities, let's compute the values of the test
statistics for Scenarios Two and Three. For each scenario, we will assume that
the population standard deviation and the sample standard deviation are the
same (3), the sample size is 30, and the sample mean is 21. Since the
population and sample standard deviations are assumed equal, the Z and the t
values will be equal.
Scenario Two
Eq. 1.5.5: Z = t = (21 - 20.9) / [3 / Sq. Rt. (30)]
= 0.1 / (3 / 5.5)
= 0.1 / 0.55
= 0.18
The interpretation: the sample mean of 21 is just 0.18 standard errors above the hypothesized mean of 20.9 (a sample mean of 21 is a perfectly reasonable outcome if the null hypothesis is indeed true, as we will see in Steps 3 - 5).
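The corresponding Excel cell formula, a sketch assuming the same sample values (sample mean 21, standard deviation 3, n = 30), would be:
=(21-20.9)/(3/SQRT(30))
Excel returns approximately 0.18.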
Scenario Three
Eq. 1.5.6: Z = t = (21 - 20) / [3 / Sq. Rt. (30)]
= 1.0 / (3 / 5.5)
= 1.0 / 0.55
= 1.82
The interpretation: the sample mean of 21 is 1.82 standard errors from the hypothesized mean of 20 (since it isn't a clear case of rejecting the null hypothesis as in Scenario One, or failing to reject the null hypothesis as in Scenario Two, we need the precision of Step 3 to make the decision). Please note that since Scenario Three is a two-tailed test, we have to consider both the possibility of getting a Z or a t equal to 1.82 or -1.82.
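Again, the Excel cell formula for this test statistic, assuming the same sample values, would be:
=(21-20)/(3/SQRT(30))
Excel returns approximately 1.83; the hand calculation above, which rounds the standard error to 0.55, gives 1.82.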
Step 3: Find Probability of Test Statistics (p-Value)
At this point, we
want to know the probability of obtaining a test statistic as small as the
calculated statistic (for < directional alternative hypothesis tests such as
the Scenario One example); the probability of obtaining a test statistic as
large as the calculated statistic (for > directional alternative hypothesis
tests such as the Scenario Two example); or the probability of obtaining a test
statistic as large or as small as the calculated test statistic (for non
directional =/= alternative hypothesis tests such as the Scenario Three
example). In hypothesis testing, these probability values are called p-values.
I should point out that I am following the p-value approach to hypothesis
testing to focus on the approach most widely used in the literature, rather
than the Critical Value approach provided in some texts (Anderson, Sweeney, & Williams, pp. 334-337, Chapter 9).
Probability tables for finding p-values are built into Excel. For
probabilities associated with Z test statistics (Z-Scores), select an active
cell in an open worksheet, select Insert from the Standard Toolbar, then
Function, Statistical, NORMSDIST, and then enter the Z-Score to get the
cumulative probability up to the Z-Score. You may recall the NORMSDIST function
from Module 1.3 Notes.
p-Values for Z Test Statistics
Scenario One
Eq. 1.5.7: =NORMSDIST(-5.5)
This equation is what you enter in an active cell on an Excel worksheet to get Probability (Z < -5.5) for this one-tail test. This is equivalent to stating Probability(Sample Mean < 21 given the true mean is 24). Excel returns 1.9E-08 in the active cell.
Interpretation: 1.9 E -08 is scientific notation, meaning move the
decimal point eight digits to the left giving 0.000000019. This says the
probability of getting a Z-Score of less than -5.5 is 0.000000019, a very small
probability. Remember, the Z-value of -5.5 really represents the number of
standard errors the sample mean of 21 is from the hypothesized mean of 24.
Thus, the probability of us getting a sample mean of 21 is relatively low if
the null hypothesis is true (population mean = 24); so the null hypothesis must
be rejected in favor of the alternative based on evidence in this sample. We
will put more precision in determining what is "relatively low" in
Step 4.
Scenario Two
Eq. 1.5.8: =1 - NORMSDIST(0.18)
This equation is what you enter in an active cell of an Excel worksheet to get Probability(Z > 0.18) for this one-tail test. This is equivalent to Probability(Sample Mean > 21 given the true mean is 20.9). Excel returns a p-value of 0.43 in the active cell. Note that since the NORMSDIST function returns the cumulative probability up to the Z-Score, we have to use =1 - NORMSDIST(0.18) to get the probability above the Z-Score for this upper-tail test.
Interpretation: The probability of obtaining a sample mean of 21 is relatively high if the null hypothesis is true (population mean = 20.9); so the null hypothesis cannot be rejected beyond a shadow of a doubt based on the evidence of this sample. As with Scenario One, we will put more precision in determining what is "relatively high" in Step 4.
Scenario Three
Eq. 1.5.9: =2*(1-NORMSDIST(ABS(1.82)))
This equation is what you enter in an active cell of an Excel worksheet to get Probability(Z > 1.82 or Z < -1.82) for this two-tail test. This is equivalent to Probability (Sample Mean > 21 or < 19 given the true mean is 20). Excel returns a p-value of 0.0688 in the active cell.
Interpretation: The probability of obtaining a sample mean of 21 or higher is 3.44% if the null hypothesis is true (population mean = 20). But since we are doing a two-tail test, we have to multiply 3.44% by 2, since we could just as easily have gotten a sample mean 1.82 standard errors to the left of the hypothesized mean. Note that I used the absolute value function nested within the NORMSDIST function to give you a formula that works for two-tail tests whether Z comes out positive or negative. Further note, to determine whether 6.88% is relatively high or low, we need the precision presented in Step 4. Before doing this, we need to compute the p-values for the t statistics.
p-Values for t Test Statistics
To get probability p-values for the t test statistic from the t
distribution, we use the TDIST function of Microsoft Excel. Select an active
cell for the p-value, and then select Insert from the Standard Toolbar, Function,
Statistical, and TDIST. Note that the TDIST function requires the
absolute value of the t statistic we computed in Step 2, the degrees of freedom
which is sample size minus one, and whether the test is one- or two-tailed.
Scenario One
Eq. 1.5.10: =TDIST(5.5, 30-1,1)
When =TDIST(5.5,30-1,1) is entered in an active cell, Excel returns the p-value associated with the t statistic: 3.16E-06, or 0.00000316. This probability would be interpreted similarly to the p-value for the Z test statistic in Eq. 1.5.7. Note that the t value is always entered as a positive number in the TDIST function.
Scenario Two
Eq. 1.5.11: =TDIST(0.18, 30-1,1)
When =TDIST(0.18,30-1,1) is entered in the active cell, Excel returns 0.43. This probability would be interpreted similarly to the p-value for the Z test statistic in Eq. 1.5.8. Note that you do not have to enter =1 - before the function as was done in Eq. 1.5.8, since the TDIST function returns the tail probability directly.
Scenario Three
Eq. 1.5.12: = TDIST(1.82,30-1,2)
When =TDIST(1.82,30-1,2) is entered in an active cell, Excel returns 0.079. This p-value would be interpreted similarly to the p-value for the Z test statistic in Eq. 1.5.9. Note that you do not have to multiply the p-value by 2, since for the t distribution the number of tails for the test statistic is part of the function.
Have you noticed that the Z and the t values and probabilities are similar? They will be essentially identical at very large sample sizes (above 120) and nearly identical at large sample sizes (30 or more). They will also be closer near the peak of the bell-shaped distribution, where probability values are closest to 0.50. Note that in Scenario Two, the p-values were identical at 0.43.
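You can see this similarity for yourself in Excel. A quick sketch for Scenario Two, where the test statistic is 0.18 and the degrees of freedom are 29:
=1-NORMSDIST(0.18) returns approximately 0.43
=TDIST(0.18,29,1) also returns approximately 0.43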
Step 4: Determine the Level of Statistical Significance
In the above
equations, I have provided practical interpretations of low or high p-values
associated with the Z or t test statistics. When the p-value was low, we
rejected the null hypothesis in favor of the alternative. In hypothesis
testing, this would indicate that the analysis is statistically significant.
Scientific convention has established that in order to declare the result
of a hypothesis test statistically significant, there can be no more than a 5%
likelihood that the difference is due to chance (D. Sheskin, 1997). The 5%
threshold is referred to as the level of significance. Knowing the level
of significance for a study, we can now present a simple decision rule for
rejecting or failing to reject the null hypothesis.
When the p-value is < 0.05, reject the null hypothesis. With such a low p-value, there is little likelihood that the observed difference between the sample mean and the hypothesized mean is due to chance - it must be due to some program, process change, intervention, or other effect.
When the p-value is > 0.05, fail to reject the null hypothesis. With such a high p-value, the observed difference between the sample mean and the hypothesized mean is small enough that it can be attributed to the chance involved in sampling error.
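If the p-value has already been computed in a worksheet cell, the decision rule can even be automated with Excel's IF function. A minimal sketch, assuming purely for illustration that the p-value sits in cell A1:
=IF(A1<0.05,"Reject the null hypothesis","Fail to reject the null hypothesis")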
While those are the basics, let's examine the alpha level of significance in some more detail. Since we are working with a sample, we can make two errors in hypothesis testing:
Type I Error: Rejecting a true null hypothesis. In hypothesis testing, the probability of making a Type I error is labeled alpha, the level of significance.
Type II Error: Failing to reject a false null hypothesis. The probability of making a Type II error is labeled beta.
The complementary probabilities are:
Confidence Coefficient: Failing to reject a true null hypothesis. This probability is labeled (1 - alpha). We already saw this in the Module 1.4 Notes - it is the basis of the confidence interval. An alpha level of significance of 0.05 provides a 95% confidence coefficient.
Power: Rejecting a false null hypothesis. This probability is labeled (1 - beta).
The interested reader is referred to the texts in the References section at the end of these notes for more detail on Type II error and the power of a test.
I close this step by saying that when a researcher believes that alpha = 0.05
is too high, they may elect to employ a 1% level of significance, or even
lower in some cases of medical research. The lower the level of significance,
the less likely one would be to reject the null hypothesis and conclude that
the research project is successful. While 0.05 is common in business
applications, it is a matter of judgment. When the consequences of making a
Type I error are much more severe than the consequences associated with a Type II error, researchers switch to the more conservative alpha = 0.01. This increases the beta probability, which in turn lowers the power of the test,
so researchers recognize the tradeoffs. We will adopt the tradition of using 5%
levels of significance for hypothesis testing.
Step 5: Making the Hypothesis Test Conclusion
The final step puts it all together with a three-part conclusion:
1. Compare the p-value to alpha.
2. Based on the comparison, state whether to reject or fail to reject the null
hypothesis.
3. Express the statistical decision in terms of the particular situation or
scenario.
Here is the application of the three-part hypothesis test conclusion to our scenarios.
Scenario One
Z test: Since the p-value of 0.000000019 is < alpha, reject the null
hypothesis, and conclude the population mean is less than 24 days.
t test: Since the p-value of 0.00000316 is < alpha, reject the null
hypothesis, and conclude the population mean is less than 24 days.
Scenario Two
The Z and t tests had the same p-value: Since the p-value of 0.43 is > alpha, fail to reject the null hypothesis, and conclude that the population mean is equal to 20.9. To take into account the possibility of a Type II error, statisticians prefer this statement: there is no evidence that the population mean cycle time is different from 20.9. I don't think the precise wording is as important as the care needed in conducting the analysis.
Scenario Three
Z test: Since the p-value of 0.0688 is > alpha, fail to reject the null hypothesis, and conclude that the population mean is equal to 20 (again, there is no evidence that the average cycle time is different from 20).
t test: Since the p-value of 0.079 is > alpha, fail to reject the null hypothesis, and conclude that the population mean is equal to 20.
I like the three-part conclusion
since it satisfies the statistician with "good science practice," and
the business person since the conclusion is also in "English." When
one reads Research Level I publications, you often simply see p < 0.05 for
the conclusion. That is shorthand for: since the p-value is less than the alpha of
0.05, reject the null hypothesis in favor of the alternative, and conclude ....
A Note on Comparing the Confidence Interval to the Two-Tail Hypothesis Test
Recall in Module 1.4 that
the 95% confidence interval for the population mean came out to be 21 ± 1.1, or 19.9 to 22.1. In the two-tail hypothesis test of Scenario Three, the
hypothesized mean of 20.0 falls within the range of 19.9 to 22.1. Since this
range includes 20.0, we cannot refute the statement that the population mean is
equal to 20. When the hypothesized mean falls outside the confidence interval,
the p-value of the hypothesis test will be less than the significance level of
0.05 and we will reject the null hypothesis. For example, suppose the null
hypothesis is that the true population mean is 18. This value falls outside the
confidence interval range of 19.9 to 22.1, so we reject the null hypothesis and
conclude that the true population mean is not equal to 18. The p-value for the
hypothesis test will be < 0.05 in this example.
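If you want to reproduce that interval in Excel, one sketch uses the CONFIDENCE function listed in the summary worksheet below, assuming the sample mean of 21, population standard deviation of 3, and sample size of 30 from our scenarios:
=21-CONFIDENCE(0.05,3,30) returns approximately 19.9
=21+CONFIDENCE(0.05,3,30) returns approximately 22.1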
Ethical Issues
Remember that we are
making inferences based on a sample, and it is assumed that the sample is
unbiased without measurement error. Further, when we report the findings of a
hypothesis test, we need to be as complete as possible so that our study can be
replicated if need be.
I just heard a news report that the famous "Mozart Effect" study done
in 1993 is being disputed. That study presented the hypothesis that classical
music in the background would improve student problem-solving performance on
certain categories of problems involving temporal and spatial dimensions. It
has led to many extensions (playing classical music to babies to make them
"smarter," etc.). This year, researchers at several universities
tried to replicate the results and could not (they failed to reject the null
hypothesis of no difference in performance). The original researcher claimed, in a news report the week of August 23, 1999, that the replications did
not follow the original data collection method. That researcher is on somewhat
shaky ground however, since the original study involved a convenience sample of
upper division college students. There is nothing wrong with using convenience
samples but one's conclusions cannot be made beyond that
"population". Certainly not to infants.
The other ethical issue involves data snooping. One cannot look at the data,
test statistic values and related p-values and then decide to use a one-
or two-tail test. Recall in Scenario Three that the two-tail p-value was
0.0688, and we failed to reject the null hypothesis at alpha of 0.05. But, if
we used a one-tail test, we would have rejected the null hypothesis at alpha
level of 0.05 since the p-value was 0.0344.
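Those two p-values are easy to verify with the Excel functions summarized below:
=2*(1-NORMSDIST(1.82)) returns approximately 0.0688 (two-tail)
=1-NORMSDIST(1.82) returns approximately 0.0344 (one-tail)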
Good science includes establishing your hypothesis, setting the level of
significance, and collecting the data before the p-values are compared
to alpha and the conclusion is reached.
Summary of Excel Formulas:
Worksheet 1.5.1 provides important Excel functions we covered in Module
1. Pay attention to the sign of the t
and Z scores and the direction of your test of hypothesis before drawing the
final conclusion. When you use Excel formulas to compute the p-value, you are asked for the number of tails in your test: it can be an upper- or lower-tail test (sometimes indicated as right- or left-tail, respectively) or a two-tail test. To find out which to use, refer back to the alternative hypothesis Ha, which indicates the direction and number of tails in the test.
Worksheet 1.5.1: Excel Summary

=AVERAGE(array)
=STDEV(array)
=QUARTILE(array,1) --- 1 for Q1, 3 for Q3, etc.
=PERCENTILE(array,k) --- kth percentile, with k in decimal form
=STANDARDIZE(x,mean,stdev) --- computes a Z score
=NORMSINV(probability) --- returns a Z score, negative or positive; =NORMSINV(confidence level + alpha/2) returns the Z score for a desired confidence level
=NORMSDIST(Z score) --- probability of a value less than Z occurring
=NORMDIST(x,mean,stdev,TRUE) --- probability of a value less than x occurring
=1-NORMSDIST(Z) --- probability of a score exceeding Z
=CONFIDENCE(alpha,stdev,sample size) --- sampling error computed for a normal distribution
=TINV(alpha,n-1) --- computes the t value for a given alpha and sample size

p-Values in Excel

For a Z test:
Lower (left) tail test: =NORMSDIST(Z)
Upper (right) tail test: =1-NORMSDIST(Z)
Two-tail test: =2*(1-NORMSDIST(ABS(Z)))

For a t test:
Lower (left) tail test:
t-value is positive: =1-TDIST(ABS(t-value),n-1*,number of tails**)
t-value is negative: =TDIST(ABS(t-value),n-1,number of tails**)
Upper (right) tail test:
t-value is positive: =TDIST(ABS(t-value),n-1,number of tails**)
t-value is negative: =1-TDIST(ABS(t-value),n-1,number of tails**)
Two-tail test: =TDIST(ABS(t-value),n-1,number of tails***)

* where n = sample size and n-1 = degrees of freedom
** number of tails = 1
*** number of tails = 2
References:
Anderson, D., Sweeney, D., & Williams, T. (2006). Essentials of Modern Business Statistics with Microsoft Excel.
Groebner, D., Shannon, P., Fry, P., & Smith, K. Business Statistics: A Decision Making Approach (7th ed.). Prentice Hall, Chapters 8 & 9.
Black, K. Business Statistics for Contemporary Decision Making (4th ed.). Wiley, Chapter 9.
Sheskin, D. (1997). Handbook of Parametric and Nonparametric Statistical Procedures.