"Testing Hypothesis" 
Module 1.4 covered one of the two classic
methods of inferential statistics: confidence interval
estimation. That method is most frequently used to answer
exploratory research questions, such as "what is the average
cycle time," or "what is the average profit contribution?"
Explanatory research questions are frequently answered through
the other method of inferential statistics: the testing of
hypotheses, or hypothesis testing. We start with a belief, claim,
prediction, or assertion (hypothesis) about the parameter of interest
(in this Module, the population mean). Then we
gather a sample, compute sample statistics, and draw a conclusion
about that hypothesis. Since we are working with a sample, we have to
add an appropriate measure of reliability. The following notes cover
a five-step methodology for hypothesis testing.
Step 1: State Null and Alternative Hypotheses
To illustrate statistical statements of hypotheses, I'll present
three hypothesis test scenarios. In Scenario One, suppose someone
believes the true average cycle time is less than 24 days. Recall
that cycle time in this illustration is the time between when a
company places an order for material and when the material is
received. But cycle time is a variable closely monitored in many
activities: the time to pay an account, the time to process a
customer service request, the time to hang a bottle of blood for a
blood transfusion, and so forth.
Back to this scenario: the belief or prediction that true average
cycle time is less than 24 days is generally based upon someone's
knowledge of the underlying process. Perhaps they were involved in
an effort to improve cycle time. Last year, the
average cycle time may have been 24 days, improvements were made, and
this year they expect cycle time to improve (< 24 days, on
average). This belief, or research hypothesis, is generally what the
analyst tries to prove or support by gathering evidence. In
statistics, it is called the alternative hypothesis, also
known as the research hypothesis (symbol H_{a}
or you will also see H_{1} in some texts
and journals). The hypothesis that complements the alternative is
called the null hypothesis (symbol H_{0}), or
hypothesis of equality. The statistical hypothesis statements are
written as follows:
Scenario One
H_{0}: Population Mean = 24 (this is the null hypothesis)
H_{a}: Population Mean < 24 (this is the alternative hypothesis)
For Scenario Two, suppose someone believes the true average cycle time to be greater than 20.9 days (the alternative hypothesis). Perhaps the scenario is that last year the company was using vendors who shipped by less-than-truckload and the average cycle time experience was 20.9 days (the null hypothesis). This year they switched to vendors who use truckload (cheaper but takes longer), thus they predict the cycle time will go up compared to last year. The null and alternative hypothesis statements are written:
Scenario Two
H_{0}: Population Mean = 20.9 (this is the null hypothesis)
H_{a}: Population Mean > 20.9 (this is the alternative hypothesis)
For Scenario Three, now suppose last year the average cycle time was 20 and vendors were replaced so that changes in cycle time are expected but no one knows if the changes lead to increased cycle time or decreased cycle time. The alternative hypothesis would be that the cycle time is not equal to 20. The null hypothesis is that the cycle time is equal to 20. These statements for this test would be written:
Scenario Three
H_{0}: Population Mean = 20 (this is the null hypothesis)
H_{a}: Population Mean =/= 20 (=/= is the symbol for not equal)
Note carefully that each scenario involved two statistical hypothesis statements. The first two scenarios involved directional hypothesis tests, or one-tailed tests. Specifically, Scenario One is a lower-tail test: to support the alternative hypothesis, we would have to find sample means much lower than the hypothesized mean. Scenario Two is an upper-tail directional test: to support the alternative hypothesis, we would have to find sample means much higher than the hypothesized mean. The third scenario involved a non-directional, or two-tailed, test. In all tests, the null hypothesis:
1. Contains the equal sign, thus it is sometimes referred to as the hypothesis of no difference or no effect. Some texts write the null hypothesis as >= for an alternative written as <, and <= for an alternative written as >, to make the null hypothesis always opposite in sign. Classical hypothesis testing simply puts the = sign in the null, and that makes it easier to understand that the null is always the hypothesis of equality.
2. Is stated in specific terms regarding what the true value of the population parameter (in this case, the population mean) is predicted to be (24, 20.9, and 20 were the values for these three separate scenarios).
3. Is the hypothesis to be tested. We either reject or fail to reject the null. If we reject the null hypothesis, we do so in favor of the alternative because the evidence we have gathered supports the alternative. If we fail to reject the null hypothesis, we have insufficient evidence to support the alternative. Thus the null hypothesis "presumes innocence until proven guilty."
In all statements of hypothesis tests, the alternative hypothesis:
1. Does not contain the equal sign.
2. Is the conclusion supported (must be true) when the null hypothesis is rejected (proven to be false).
Please note that researchers and business data analysts would only test one set of statistical hypothesis statements to answer a specific research question with a sample of data. A typical scenario might be scenario one. Last year, the mean cycle time was 24 days before a continuous improvement program was initiated and they want to see if cycle time decreases because of the continuous improvement. I presented two other scenarios for illustration purposes.
How do we know whether to fail to reject the null
hypothesis, or to reject the null in favor of the alternative? We
gather a sample set of data from the population of interest, find the
sample statistic that best estimates the population parameter under
investigation, find the probability of getting the sample statistic
if the null hypothesis is true, and make a conclusion based on the
probability. The concept is simple: in scenario one, if our sample
mean comes out to be 7 we would say, "there is no way we could get a
sample mean equal to 7 if the true population mean was equal to 24,
so reject the null in favor of the alternative." But what if the
sample mean came out to be 23.999? There is a fairly high probability
of getting a sample mean of 23.999, if the true population mean was
in fact 24, just by chance alone. In this case, we would fail to
reject the null hypothesis. In other words, we haven't gathered
enough evidence to reject the null hypothesis; as far as this sample
shows, the continuous improvement program did not work, and the sample
mean differs from the hypothesized population mean only because of sampling error.
While many hypothesis tests are supported by observation such as
above, we obviously need more precision in making the decision to
reject or fail to reject the null hypothesis. That precision comes in
steps 2 through 5.
Step 2: Determine and Compute the Test Statistic
The general form of the hypothesis test
statistic is shown in Equation 1.5.1:
Eq. 1.5.1: Test Statistic = (Estimator - Hypothesized Value of Estimator) / Standard Error of the Estimator
There are two test statistics for testing a population mean: the Z and the t. The Z test of hypothesis for a population mean is used when the population standard deviation is known:
Eq. 1.5.2: Z = (Sample Mean - Hypothesized Mean) / [Population Standard Deviation / Sq. Rt. (n)]
The assumptions for the Z test are:
1. The population standard deviation is known.
2. Numerical data is independently and randomly drawn from a population known to be normally distributed.
3. If the population is not normally distributed, the sampling distribution of the sample mean can still be approximated by the normal distribution as long as the sample size is large (> 30).
Suppose we want to test the following hypotheses and know that the population standard deviation is 3:
Scenario One
H_{0}: Population Mean = 24 (this is the null hypothesis)
H_{a}: Population Mean < 24 (this is the alternative hypothesis)
We would gather a sample, compute the sample mean and then solve for Z using Equation 1.5.2. Let's say the sample mean is 21, and the sample size is 30. Then:
Eq. 1.5.3: Z = (21 - 24) / [3 / Sq. Rt. (30)]
= -3 / (3 / 5.5)
= -3 / 0.55
= -5.5
The interpretation is: the sample mean of 21 is 5.5 standard errors less than the hypothesized mean of 24 (21 is quite far from 24 in terms of standard errors and in the direction of the alternative hypothesis, casting doubt on the truth of the null hypothesis, as we will see in Steps 3 through 5).
This formula could easily be entered in an active cell on an Excel worksheet. The cell formula would be: =(21-24)/(3/SQRT(30))
Excel returns -5.48; the hand calculation above rounds to -5.5.
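For readers who want to check Eq. 1.5.2 outside of Excel, here is a minimal Python sketch. Python and the function name z_statistic are my own choices for illustration and are not part of the course materials; the numbers are the Scenario One values from above.

```python
import math

def z_statistic(sample_mean, hypothesized_mean, population_sd, n):
    """Eq. 1.5.2: how many standard errors the sample mean lies
    from the hypothesized mean."""
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Scenario One: sample mean 21, hypothesized mean 24, sigma 3, n 30
z_one = z_statistic(sample_mean=21, hypothesized_mean=24, population_sd=3, n=30)
print(round(z_one, 2))  # -5.48
```

The unrounded result is about -5.48; the notes carry it as -5.5 because the hand calculation rounds the square root of 30 to 5.5.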
The other test statistic is the t. The t test is used when the
population standard deviation is unknown and must be estimated by the
sample standard deviation. Because the population standard
deviation is generally unknown, this is the more common test
statistic. The formula for the t statistic is:
Eq. 1.5.4: t = (Sample Mean - Hypothesized Mean) / [Sample Standard Deviation / Sq. Rt. (n)]
The assumptions for the t test:
1. The population standard deviation is unknown and is estimated by the sample standard deviation.
2. Numerical data is independently and randomly drawn from a normal distribution.
3. If the population is not normal, but not very skewed and the sample size is large (> 30), the t distribution provides a good approximation to the sampling distribution of the sample mean.
Note the only difference between this formula and Eq.
1.5.2 is that we use the sample standard deviation, s, rather than
the population standard deviation. If the sample mean is 21, the
sample standard deviation is 3, the sample size is 30, and the
hypothesized value of the population mean is 24, the t statistic has
a value of -5.5, the same as the result for Z in Eq. 1.5.3. Any
difference between the Z and the t will appear when we compute
probabilities in Step 3, although with large sample sizes, the Z and
the t are nearly identical, as was noted in Module 1.4 Notes.
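In practice the sample mean and sample standard deviation come from raw data. The sketch below shows Eq. 1.5.4 computed from a list of observations in Python; the ten cycle times are hypothetical values I made up for illustration (they are not the course data set), and the function name t_statistic is likewise an assumption.

```python
import math
import statistics

def t_statistic(sample, hypothesized_mean):
    """Eq. 1.5.4: t computed from raw observations."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)   # sample standard deviation (n - 1 divisor)
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical cycle times in days, for illustration only
cycle_times = [18, 24, 21, 19, 23, 20, 22, 17, 25, 21]

t_value = t_statistic(cycle_times, hypothesized_mean=24)
degrees_of_freedom = len(cycle_times) - 1
print(round(t_value, 2), degrees_of_freedom)  # -3.67 9
```

The degrees of freedom (sample size minus one) are printed as well because they are needed to look up the probability for a t statistic.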
Before we compute the probabilities, let's compute the values of the
test statistics for Scenarios Two and Three. For each scenario, we
will assume that the population standard deviation and the sample
standard deviation are the same (3), the sample size is 30, and the
sample mean is 21. Since the population and sample standard
deviations are assumed equal, the Z and the t values will be equal.
Scenario Two
Eq. 1.5.5: Z = t = (21 - 20.9) / [3 / Sq. Rt. (30)]
= 0.1 / (3 / 5.5)
= 0.1 / 0.55
= 0.18
The interpretation: the sample mean of 21 is just 0.18 standard errors from the hypothesized mean of 20.9 (a sample mean of 21 is a reasonable outcome if the null hypothesis is indeed true, as we will see in Steps 3 through 5).
Scenario Three
Eq. 1.5.6: Z = t = (21 - 20) / [3 / Sq. Rt. (30)]
= 1.0 / (3 / 5.5)
= 1.0 / 0.55
= 1.82
The interpretation: the sample mean of 21 is 1.82 standard errors from the hypothesized mean of 20 (since it isn't a clear case of rejecting the null hypothesis as in Scenario One, or failing to reject the null hypothesis as in Scenario Two, we need the precision of Step 3 to make the decision). Please note that since Scenario Three is a two-tailed test, we have to consider the possibility of getting a Z or a t equal to either 1.82 or -1.82.
Step 3: Find Probability of Test Statistics (p-Value)
At this point, we want to know the
probability of obtaining a test statistic as small as the calculated
statistic (for < directional alternative hypothesis tests such as
the Scenario One example); the probability of obtaining a test
statistic as large as the calculated statistic (for > directional
alternative hypothesis tests such as the Scenario Two example); or
the probability of obtaining a test statistic as large or as small as
the calculated test statistic (for non-directional =/= alternative
hypothesis tests such as the Scenario Three example). In hypothesis
testing, these probability values are called p-values. I should point
out that I am following the p-value approach to hypothesis testing to
focus on the approach most widely used in the literature, rather than
the Critical Value approach provided in some texts (Anderson, Sweeney,
Williams, pp. 334-337, Chapter 9).
Probability tables for finding p-values are built into Excel. For
probabilities associated with Z test statistics (Z-Scores), select an
active cell in an open worksheet, select Insert from the
Standard Toolbar, then Function, Statistical, NORMSDIST, and
then enter the Z-Score to get the cumulative probability up to the
Z-Score. You may recall the NORMSDIST function from Module 1.3 Notes.
p-Values for Z Test Statistics
Scenario One
Eq. 1.5.7: =NORMSDIST(-5.5)
This equation is what you enter in an active cell on an Excel worksheet to get Probability(Z < -5.5) for this one-tail test. This is equivalent to stating Probability(Sample Mean < 21 given the true mean is 24). Excel returns 1.9E-08 in the active cell.
Interpretation: 1.9E-08 is scientific notation, meaning move the decimal point eight digits to the left, giving 0.000000019. This says the probability of getting a Z-Score of less than -5.5 is 0.000000019, a very small probability. Remember, the Z value of -5.5 really represents the number of standard errors the sample mean of 21 is from the hypothesized mean of 24. Thus, the probability of our getting a sample mean of 21 or less is very low if the null hypothesis is true (population mean = 24), so the null hypothesis must be rejected in favor of the alternative based on evidence in this sample. We will put more precision in determining what is "relatively low" in Step 4.
Scenario Two
Eq. 1.5.8: =1 - NORMSDIST(0.18)
This equation is what you enter in an active cell of an Excel worksheet to get Probability(Z > 0.18) for this one-tail test. This is equivalent to Probability(Sample Mean > 21 given the true mean is 20.9). Excel returns a p-value of 0.43 in the active cell. Note that since the NORMSDIST function returns a cumulative probability up to the Z-Score, we have to use =1 - NORMSDIST(0.18) for this upper-tail test, since we are interested in the probability above the Z-Score of 0.18.
Interpretation: The probability of obtaining a sample mean of 21 or more is relatively high if the null hypothesis is true (population mean = 20.9), so the null hypothesis cannot be rejected based on the evidence of this sample. As with Scenario One, we will put more precision in determining what is "relatively high" in Step 4.
Scenario Three
Eq. 1.5.9: =2 * (1 - NORMSDIST(ABS(1.82)))
This equation is what you enter in an active cell of an Excel worksheet to get Probability(Z > 1.82 or Z < -1.82) for this two-tail test. This is equivalent to Probability(Sample Mean > 21 or < 19 given the true mean is 20). Excel returns a p-value of 0.0688 in the active cell.
Interpretation: The probability of obtaining a sample mean of 21 or more is 3.44% if the null hypothesis is true (population mean = 20). But since we are doing a two-tail test, we have to multiply 3.44% by 2, since we could just as likely get a sample mean 1.82 standard errors to the left of the hypothesized mean. Note that I used the absolute value function nested within the NORMSDIST function to give you a formula that would work for two-tail tests no matter if Z came out to be positive or negative. Further note, to determine if 6.88% is relatively high or low, we need the precision to be presented in Step 4. Before doing this, we need to compute the p-values for the t statistics.
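The three Z p-value formulas (Eqs. 1.5.7 through 1.5.9) can be cross-checked outside Excel. The following is a minimal Python sketch in which the standard library's NormalDist().cdf stands in for NORMSDIST; the function name z_p_value and its tail labels are my own assumptions, not part of the course materials.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value for a Z statistic; tail is 'lower', 'upper', or 'two'."""
    cdf = NormalDist().cdf              # cumulative probability up to z, like NORMSDIST
    if tail == "lower":                 # alternative written with <
        return cdf(z)
    if tail == "upper":                 # alternative written with >
        return 1 - cdf(z)
    if tail == "two":                   # alternative written with =/=
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

p_one = z_p_value(-5.5, "lower")    # Scenario One, about 1.9e-08
p_two = z_p_value(0.18, "upper")    # Scenario Two, about 0.43
p_three = z_p_value(1.82, "two")    # Scenario Three, about 0.0688
print(p_one, round(p_two, 2), round(p_three, 4))
```

As in the Excel version, taking the absolute value of Z in the two-tail branch keeps the formula correct whether Z comes out positive or negative.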
p-Values for t Test Statistics
To get probability p-values for the t test statistic from the t
distribution, we use the TDIST function of Microsoft Excel. Select an
active cell for the p-value, and then select Insert from the
Standard Toolbar, Function, Statistical, and
TDIST. Note that the TDIST function requires the absolute
value of the t statistic we computed in Step 2, the degrees of
freedom, which is the sample size minus one, and whether the test is
one- or two-tail.
Scenario One
Eq. 1.5.10: =TDIST(5.5, 30-1, 1)
When =TDIST(5.5, 30-1, 1) is entered in an active cell to get the p-value associated with the t statistic, Excel returns 3.16E-06, or 0.00000316. This probability would be interpreted similarly to the p-value for the Z test statistic in Eq. 1.5.7. Note that the t value is always entered as a positive number in the TDIST function.
Scenario Two
Eq. 1.5.11: =TDIST(0.18, 30-1, 1)
When =TDIST(0.18, 30-1, 1) is entered in the active cell, Excel returns 0.43. This probability would be interpreted similarly to the p-value for the Z test statistic in Eq. 1.5.8. Note that you do not have to enter =1 - before the function as was done in Eq. 1.5.8, since the t Table in Excel was only constructed for tail probabilities.
Scenario Three
Eq. 1.5.12: =TDIST(1.82, 30-1, 2)
When =TDIST(1.82, 30-1, 2) is entered in an active cell, Excel returns 0.079. This p-value would be interpreted similarly to the p-value for the Z test statistic in Eq. 1.5.9. Note that you do not have to multiply the p-value by 2, since for the t distribution the number of tails is an argument of the function.
Have you noticed that the Z and the t values and probabilities are
similar? They will be virtually identical at really large sample sizes (above
120) and nearly identical at large sample sizes (30 or more). They
will also be closer near the peak of the bell-shaped distribution,
where probability values are close to 0.50. Note that in Scenario
Two, the p-values were identical (to two decimal places) at 0.43.
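To see the t tail probabilities and their convergence toward the Z values without Excel, here is a rough Python sketch. It is not how TDIST is implemented; it simply integrates the t density numerically with Simpson's rule using only the standard library, and the names t_pdf and t_tail are my own assumptions.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_tail(t, df, tails, steps=2000):
    """Rough analogue of TDIST(t, df, tails) for t >= 0 (steps must be even)."""
    if t < 0:
        raise ValueError("enter the absolute value of t, as with TDIST")
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):           # Simpson's rule on the interval [0, t]
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    one_tail = 0.5 - total * h / 3      # area beyond t in one tail
    return tails * one_tail

print(t_tail(5.5, 29, 1))               # Scenario One, roughly 3e-06
print(round(t_tail(0.18, 29, 1), 2))    # Scenario Two, about 0.43
print(round(t_tail(1.82, 29, 2), 3))    # Scenario Three, about 0.079

# Two-tail p-value for 1.82 as the degrees of freedom grow, next to the Z value
z_p = 2 * (1 - NormalDist().cdf(1.82))
for df in (5, 29, 120, 1000):
    print(df, round(t_tail(1.82, df, 2), 4), round(z_p, 4))
```

The loop at the end illustrates the point made above: the t-based p-value shrinks toward the Z-based 0.0688 as the degrees of freedom increase.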
Step 4: Determine the Level of Statistical Significance
In the above equations, I have provided
practical interpretations of low or high p-values associated with the
Z or t test statistics. When the p-value was low, we rejected the
null hypothesis in favor of the alternative. In hypothesis testing,
this would indicate that the analysis is statistically
significant. Scientific convention has established that in order
to declare the result of a hypothesis test statistically significant,
there can be no more than a 5% likelihood that the difference is due
to chance (D. Sheskin, 1997). The 5% threshold is referred to as the
level of significance. Knowing the level of significance for a
study, we can now present a simple decision rule for rejecting or
failing to reject the null hypothesis.
When the p-value is < 0.05, reject the null hypothesis. With such a low p-value, there is little likelihood that the observed difference between the sample mean and hypothesized mean is due to chance; it must be due to some program, process change, intervention or other effect.
When the p-value is > 0.05, fail to reject the null hypothesis. With such a high p-value, the observed difference between the sample mean and the hypothesized mean is small enough that it can be attributed to chance, that is, to sampling error.
While those are the basics, let's examine the alpha level of significance in some more detail. Since we are working with a sample, we can make two errors in hypothesis testing:
Type I Error: Rejecting a true null hypothesis. In hypothesis testing, the probability of making a type one error is labeled alpha, the level of significance.
Type II Error: Failing to reject a false null hypothesis. The probability of making a type two error is labeled beta.
The complementary probabilities are:
Confidence Coefficient: Failing to reject a true null hypothesis. This probability is labeled (1 - alpha). We already saw this in Module 1.4 Notes: it is the basis of the confidence interval. An alpha level of significance of 0.05 provides a 95% confidence coefficient.
Power: Rejecting a false null hypothesis. This probability is labeled (1 - beta).
The interested reader is referred to the
Anderson, Sweeney and Williams optional reference for additional
details. For our application, remember the simple decision rule. When
the p-value is < alpha = 0.05, reject the null hypothesis; when
the p-value is > alpha = 0.05, fail to reject the null
hypothesis.
I close this step by saying that when a researcher believes that
alpha = 0.05 is too high, they may elect to employ a 1% level of
significance, or even lower in some cases of medical research. The
lower the level of significance, the less likely one would be to
reject the null hypothesis and conclude that the research project is
successful. While 0.05 is common in business applications, it is a
matter of judgment. When the consequences of making a Type I error
are really much more severe than the consequences associated with a
Type II error, then researchers switch to the more conservative alpha
= 0.01. This increases the beta probability which in turn lowers the
power of the test so researchers recognize the tradeoffs. We will
adopt the tradition of using 5% levels of significance for hypothesis
testing.
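The tradeoff between alpha and beta can be made concrete with a small calculation. The sketch below is a Python illustration for the lower-tail Z test of Scenario One (hypothesized mean 24, standard deviation 3, sample size 30); the true mean of 23 is a hypothetical value I chose only to have something to compute beta against, and the function name is an assumption as well.

```python
import math
from statistics import NormalDist

def beta_lower_tail(alpha, hypothesized_mean, true_mean, population_sd, n):
    """Probability of a Type II error for a lower-tail Z test."""
    std_normal = NormalDist()
    standard_error = population_sd / math.sqrt(n)
    # We reject the null when the sample mean falls below this cutoff.
    cutoff = hypothesized_mean + std_normal.inv_cdf(alpha) * standard_error
    # Type II error: the sample mean lands above the cutoff even though
    # the true mean really is below the hypothesized mean.
    return 1 - std_normal.cdf((cutoff - true_mean) / standard_error)

# Hypothetical true mean of 23 days, for illustration only
beta_05 = beta_lower_tail(0.05, 24, 23, 3, 30)
beta_01 = beta_lower_tail(0.01, 24, 23, 3, 30)
print(round(beta_05, 2), round(1 - beta_05, 2))  # beta and power at alpha = 0.05
print(round(beta_01, 2), round(1 - beta_01, 2))  # beta and power at alpha = 0.01
```

Under these assumed numbers, moving from alpha = 0.05 to alpha = 0.01 raises beta from roughly 0.43 to roughly 0.69, which is the loss of power described above.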
Step 5: Making the Hypothesis Test Conclusion
The final step puts it all together with a
three-part conclusion:
1. Compare the p-value to alpha.
2. Based on the comparison, state whether to reject or fail to reject the null hypothesis.
3. Express the statistical decision in terms of the particular situation or scenario.
Here is the application of the three-part hypothesis test conclusion to our scenarios.
Scenario One
Z test: Since the p-value of 0.000000019 is < alpha, reject the null hypothesis, and conclude the population mean is less than 24 days.
t test: Since the p-value of 0.00000316 is < alpha, reject the null hypothesis, and conclude the population mean is less than 24 days.
Scenario Two
The Z and t tests had the same p-value: Since the p-value of 0.43 is > alpha, fail to reject the null hypothesis, and conclude that the population mean is equal to 20.9. To take into account the possibility of a Type II error, statisticians prefer this statement: there is no evidence that the population mean cycle time is different from 20.9. I don't think the precise wording is as important as the care needed in conducting the analysis.
Scenario Three
Z test: Since the p-value of 0.0688 is > alpha, fail to reject the null hypothesis, and conclude that the population mean is equal to 20 (again, there is no evidence that the average cycle time is different from 20).
t test: Since the p-value of 0.079 is > alpha, fail to reject the null hypothesis, and conclude that the population mean is equal to 20.
I like the three-part conclusion since it
satisfies the statistician with "good science practice," and the
business person since the conclusion is also in "English." When you
read Research Level I publications, you often simply see p < 0.05
for the conclusion. That is shorthand for: since the p-value is less
than alpha of 0.05, reject the null hypothesis in favor of the
alternative, and conclude ....
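The three-part conclusion is mechanical enough to write as a small helper. This Python sketch is only an illustration; the function name and the wording of the two scenario statements passed in are my own assumptions.

```python
def three_part_conclusion(p_value, alpha, if_rejected, if_not_rejected):
    """Compare the p-value to alpha, state the decision, and interpret it."""
    if p_value < alpha:
        comparison = f"the p-value of {p_value:.3g} is < alpha of {alpha}"
        decision = "reject the null hypothesis"
        meaning = if_rejected
    else:
        comparison = f"the p-value of {p_value:.3g} is >= alpha of {alpha}"
        decision = "fail to reject the null hypothesis"
        meaning = if_not_rejected
    return f"Since {comparison}, {decision}, and conclude {meaning}."

# Scenario One (Z test) and Scenario Three (Z test), both at alpha = 0.05
conclusion_one = three_part_conclusion(
    0.000000019, 0.05,
    "the population mean is less than 24 days",
    "there is no evidence that the population mean is less than 24 days")
conclusion_three = three_part_conclusion(
    0.0688, 0.05,
    "the population mean is not equal to 20 days",
    "there is no evidence that the population mean differs from 20 days")
print(conclusion_one)
print(conclusion_three)
```

Keeping the scenario wording as an input, rather than hard-coding it, mirrors the third part of the conclusion: the statistical decision always has to be restated in terms of the particular situation.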
A Note on Comparing the Confidence Interval to the Two-Tail Hypothesis Test
Recall in Module 1.4 that the 95%
confidence interval for the population mean came out to be 21
+/- 1.1, or 19.9 to 22.1. In the two-tail hypothesis test of
Scenario Three, the hypothesized mean of 20.0 falls within the range
of 19.9 to 22.1. Since this range includes 20.0, we cannot refute the
statement that the population mean is equal to 20. When the
hypothesized mean falls outside the confidence interval, the p-value
of the hypothesis test will be less than the significance level of
0.05 and we will reject the null hypothesis. For example, suppose the
null hypothesis is that the true population mean is 18. This value
falls outside the confidence interval range of 19.9 to 22.1, so we
reject the null hypothesis and conclude that the true population mean
is not equal to 18. The p-value for the hypothesis test will be <
0.05 in this example.
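The agreement between the confidence interval and the two-tail test can be checked numerically. The sketch below is a Python illustration that assumes the Z-based interval (known standard deviation of 3, sample size 30, sample mean 21); the variable and function names are my own.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
sample_mean, population_sd, n, alpha = 21, 3, 30, 0.05

standard_error = population_sd / math.sqrt(n)
margin = std_normal.inv_cdf(1 - alpha / 2) * standard_error   # about 1.07
lower, upper = sample_mean - margin, sample_mean + margin      # about 19.9 to 22.1

def two_tail_p(hypothesized_mean):
    """Two-tail Z test p-value for the given hypothesized mean."""
    z = (sample_mean - hypothesized_mean) / standard_error
    return 2 * (1 - std_normal.cdf(abs(z)))

for hypothesized_mean in (20, 18):
    inside = lower <= hypothesized_mean <= upper
    p = two_tail_p(hypothesized_mean)
    print(hypothesized_mean, "inside interval:", inside, "reject null:", p < alpha)
```

For 20 the value is inside the interval and the null is not rejected; for 18 it is outside and the null is rejected, so the two methods give the same answer at the 5% level.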
Ethical Issues
Remember that we are making inferences
based on a sample, and it is assumed that the sample is unbiased
without measurement error. Further, when we report the findings of a
hypothesis test, we need to be as complete as possible so that our
study can be replicated if need be.
I just heard a news report that the famous "Mozart Effect" study done
in 1993 is being disputed. That study presented the hypothesis that
classical music in the background would improve student
problem-solving performance on certain categories of problems
involving temporal and spatial dimensions. It has led to many
extensions (playing classical music to babies to make them "smarter,"
etc.). This year, researchers at several universities tried to
replicate the results and could not (they failed to reject the null
hypothesis of no difference in performance). The original researcher
claimed in a news report the week of August 23, 1999, that the
replications did not follow the original data collection method. That
researcher is on somewhat shaky ground however, since the original
study involved a convenience sample of upper division college
students. There is nothing wrong with using convenience samples but
one's conclusions cannot be made beyond that "population". Certainly
not to infants.
The other ethical issue involves data snooping. One cannot look at
the data, test statistic values and related p-values and then
decide to use a one- or two-tail test. Recall in Scenario Three that
the two-tail p-value was 0.0688, and we failed to reject the null
hypothesis at alpha of 0.05. But, if we used a one-tail test, we
would have rejected the null hypothesis at alpha level of 0.05 since
the p-value was 0.0344.
Good science includes establishing your hypothesis, setting the level
of significance, and collecting the data before the p-values
are compared to alpha and the conclusion is reached.
References:
Anderson, D., Sweeney, D., &
Williams, T. (2001). Contemporary Business Statistics
with Microsoft Excel. Cincinnati, OH: South-Western,
Chapter 9 (except Section 9.6).
Sheskin, D. (1997). Handbook of Parametric and Nonparametric
Statistical Procedures. Boca Raton, FL: CRC Press LLC.


