"Comparing Two Samples Using a Nonparametric Test" 
In Module Notes 4.1, we indicated that there are two assumptions when
doing statistical tests that compare two samples to make an inference
about population means. The first is that the samples are drawn from
normally distributed populations (or that the samples are large and
without extreme skew); the second is that the populations have equal
variances. When these assumptions are met, we use the t-Test for Two
Independent Samples Assuming Equal Variances. When the variances are
not equal but the parent populations are normally distributed (or the
samples are large and without extreme skew), we use the t-Test for Two
Independent Samples Assuming Unequal Variances.
What if we cannot assume that the parent populations from which the
samples were drawn are normal, especially when we are working with
small samples? Then a safe alternative is to use a nonparametric test
rather than the parametric tests we have been using. Parametric tests,
such as the t-Test and the F-Test, rely on assumptions about the form
of the population distributions to draw inferences about population
parameters. Nonparametric tests do not require assumptions about the
shape of the population distributions. Instead, nonparametric tests
for numerical data, such as the ones discussed in this module, make
inferences about populations by studying measures of location, such as
the median. Rather than using all of the information available in the
samples, these nonparametric tests typically use only the ranks of the
observations (Aczel, p. 625). As such, these types of tests work well
with ordinal as well as interval or ratio scale numerical data.
This set of Module Notes introduces a nonparametric test for the study
of two independent samples called the Wilcoxon Rank-Sum Test (also
known as the Mann-Whitney U Test; Aczel, p. 644). This is followed by
a nonparametric test for the study of two related (dependent) samples
called the Wilcoxon Signed-Rank Test.
Wilcoxon Rank-Sum Test for
Comparing Two Independent Samples
I will continue with the Brand of Gas data, comparing cars tested with
Brand A against those tested with Brand B. The only assumptions
required for the Wilcoxon Rank-Sum Test are that the samples are
random and that all observations are independent of one another, both
within and between the samples. Therefore, this test cannot be used
for the paired-sample situation of Module 4.2.
The general hypotheses for the Wilcoxon Rank-Sum Test are (Aczel, p.
644):
H_{0}: The distributions of the two populations are identical
H_{a}: The distributions of the two populations are not identical
If we want to state the hypotheses in terms of population medians (Levine, 1999), then we are stating that if there is a difference in the populations, it is a difference in location (Aczel, p. 645):
H_{0}: Median_{A} = Median_{B}
H_{a}: Median_{A} ≠ Median_{B}
Worksheet 4.3.1 provides the original data. Note that I added two
columns in which the data are sorted in ascending order within each
group.
Worksheet 4.3.1
Brand A   Brand B   Brand A (sorted)   Brand B (sorted)
20        20        14                 18
20        20.5      15                 18.5
19        18.5      16                 19
16        20        17                 19
15        19        19                 20
17        19        20                 20
14        18        20                 20.5
To perform the Wilcoxon Rank-Sum Test, we replace the observations in
the two samples, of sizes n_{1} and n_{2}, with their ranks in the
combined sample. There will be n = n_{1} + n_{2} ranks. We let Brand A
be group 1 and Brand B be group 2. So, for our example:
Eq. 4.3.1: n = n_{1} + n_{2} = n_{A} + n_{B} = 7 + 7 = 14 ranks
We assign the ranks so that rank 1 is given to the smallest of the
combined values, rank 2 to the second smallest, and so forth. If
several values are tied, a common practice is to assign each tied
observation "the average of the ranks that would otherwise have been
assigned had there been no ties."
A simple way of assigning the ranks in Excel is shown in Worksheet
4.3.2.
Worksheet 4.3.2
Brand A   Brand B   A Ranks   B Ranks
14                  1
15                  2
16                  3
17                  4
          18                  5
          18.5                6
19                  8
          19                  8
          19                  8
20                  11.5
20                  11.5
          20                  11.5
          20                  11.5
          20.5                14
Sum of ranks:       T_{1} = 41    T_{2} = 64
The smallest number is 14, so it gets rank 1, followed by 15, 16, 17,
18 and 18.5, which get ranks 2, 3, 4, 5 and 6, respectively. The next
three numbers are 19, 19 and 19, which are ties. Since these would get
ranks 7, 8 and 9, we average those ranks and give each of the 19s the
rank of 8. Likewise, the four 20s would occupy ranks 10 through 13, so
each receives (10 + 11 + 12 + 13) / 4 = 11.5. We finish by assigning
rank 14 to the largest number, 20.5. Note how I maintained group
identity.
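The combined ranking with averaged ties can be sketched in Python as a cross-check on Worksheet 4.3.2. This is a minimal illustration rather than part of the Excel procedure; the function name `average_ranks` is hypothetical, and the Brand A and Brand B values are transcribed from Worksheet 4.3.1.

```python
def average_ranks(values):
    """Rank values in ascending order (rank 1 = smallest); tied values
    each receive the average of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their mean.
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


brand_a = [20, 20, 19, 16, 15, 17, 14]       # Worksheet 4.3.1, Brand A
brand_b = [20, 20.5, 18.5, 20, 19, 19, 18]   # Worksheet 4.3.1, Brand B

combined = brand_a + brand_b
ranks = average_ranks(combined)
ranks_a = ranks[:len(brand_a)]   # the first 7 ranks belong to Brand A
ranks_b = ranks[len(brand_a):]   # the last 7 ranks belong to Brand B

t1 = sum(ranks_a)
t2 = sum(ranks_b)
n = len(combined)
print(t1, t2)                        # 41.0 64.0
print(t1 + t2 == n * (n + 1) / 2)    # True (the Eq. 4.3.2 check)
```

Keeping the two groups concatenated in a fixed order plays the same role as "maintaining group identity" in the worksheet: the position of each rank tells us which brand it came from.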
The Wilcoxon Rank-Sum Test statistic is T_{1}, where T_{1} is
assigned to the group with the smaller sample size. If the groups have
the same sample size, as in our example, the assignment is arbitrary.
I'll assign T_{1} to the first group, Brand A. T_{1} is the sum of the
ranks in that group, as shown in Worksheet 4.3.2; T_{2} is the sum of
the ranks in the second group. Equation 4.3.2 shows a check on the
rankings:
Eq. 4.3.2: T_{1} + T_{2} = n * (n + 1) / 2 = 14 * (14 + 1) / 2 = 105
Since T_{1} + T_{2} equals 105 (41 + 64), our rankings check.
The form of the Wilcoxon Rank-Sum Test statistic we can evaluate in Excel is the Z statistic of the normal approximation, used for samples larger than 10 (our samples of size 7 are a little shy of that, but I wanted to use a small problem to illustrate the technique). To compute Z for this test, we need the Wilcoxon Rank-Sum formulas for the mean and the standard deviation of T_{1}.
Eq. 4.3.3: Mean_{T1} = n_{1}(n+1)/2 = 7(14+1)/2 = 52.5
Eq. 4.3.4: Std Dev_{T1} = Sq Rt { n_{1} * n_{2} * (n + 1) / 12 } = Sq Rt { 7 * 7 * (14 + 1) / 12 } = 7.826
Eq. 4.3.5: Z = (T_{1} - Mean_{T1}) / Std Dev_{T1} = (41 - 52.5) / 7.826 = -1.47
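Equations 4.3.3 through 4.3.5 can be sketched as one small Python function. The name `rank_sum_z` is hypothetical; like the equations, it applies no correction to the standard deviation for tied ranks.

```python
import math


def rank_sum_z(t1, n1, n2):
    """Normal approximation for the Wilcoxon rank-sum statistic T1.
    Follows Eqs. 4.3.3-4.3.5 (no tie correction)."""
    n = n1 + n2
    mean_t1 = n1 * (n + 1) / 2                 # Eq. 4.3.3
    sd_t1 = math.sqrt(n1 * n2 * (n + 1) / 12)  # Eq. 4.3.4
    z = (t1 - mean_t1) / sd_t1                 # Eq. 4.3.5
    return mean_t1, sd_t1, z


# Brand A vs Brand B: T1 = 41 with n1 = n2 = 7.
mean_t1, sd_t1, z = rank_sum_z(41, 7, 7)
print(mean_t1, round(sd_t1, 3), round(z, 2))   # 52.5 7.826 -1.47
```

Z is negative because T_{1} = 41 falls below its expected value of 52.5, meaning Brand A's observations tend to sit lower in the combined ranking.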
We can use the Excel function =NORMSDIST(-1.47) to get the p-value.
The value returned by =NORMSDIST(-1.47) in an Excel cell is 0.07,
which is the cumulative probability of a z-score less than or equal
to -1.47. Since two times this p-value (2 * 0.07 = 0.14) is greater
than 0.05, we fail to reject the null hypothesis: the data give no
evidence that the populations differ from a location point of view
(no significant difference in the medians). This is similar to our
conclusion from the t-Test for Two Samples in Module Notes 4.1.
Note that I doubled the p-value because the alternative hypothesis is
two-tailed.
Let's do one more example, this time comparing Brand A against Brand
C. Worksheet 4.3.3 provides the data.
Worksheet 4.3.3
Brand A   Brand C   Brand A (sorted)   Brand C (sorted)
20        20        14                 20
20        26        15                 23
19        23        16                 23
16        24        17                 23
15        23        19                 24
17        25        20                 25
14        23        20                 26
Worksheet 4.3.4 provides the rankings.
Worksheet 4.3.4
Brand A   Brand C   A Ranks   C Ranks
14                  1
15                  2
16                  3
17                  4
19                  5
20                  7
20                  7
          20                  7
          23                  10
          23                  10
          23                  10
          24                  12
          25                  13
          26                  14
Sum of ranks:       T_{1} = 29    T_{2} = 76
Equations 4.3.1 through 4.3.4 are identical to the first example since
we have the same sample sizes in the two groups. The equation to
compute the Z score is repeated as Equation 4.3.6, below:
Eq. 4.3.6: Z = (T_{1} - Mean_{T1}) / Std Dev_{T1} = (29 - 52.5) / 7.826 = -3.003
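Excel's NORMSDIST returns the standard normal cumulative probability, so both p-values can be reproduced with Python's `math.erf`. The helper name `normsdist` is hypothetical and only mirrors the Excel function; the z values are the rounded ones from Eqs. 4.3.5 and 4.3.6 (both negative, since T_{1} falls below its mean), and the decision uses the 0.05 level from the text.

```python
import math


def normsdist(z):
    """Standard normal CDF, the quantity Excel's NORMSDIST(z) returns."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


results = {}
for label, z in [("A vs B", -1.47), ("A vs C", -3.003)]:
    one_tail = normsdist(z)       # P(Z <= z)
    two_tail = 2 * one_tail       # doubled for the two-tailed alternative
    decision = "reject H0" if two_tail < 0.05 else "fail to reject H0"
    results[label] = (one_tail, two_tail, decision)
    print(label, round(one_tail, 4), round(two_tail, 4), decision)
```

The one-tailed values come out near 0.0708 and 0.0013. Doubling the unrounded second value gives about 0.0027; the 0.0026 quoted in the text comes from doubling the already rounded 0.0013, and the decision is the same either way.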
We can use the Excel function =NORMSDIST(-3.003) to get the p-value.
The value returned by =NORMSDIST(-3.003) in an Excel cell is 0.0013.
Since two times this p-value (2 * 0.0013 = 0.0026) is less than 0.05,
we reject the null hypothesis and conclude that the populations, from
a location point of view, are different (there is a significant
difference in the medians). This is similar to our conclusion from the
t-Test for Two Samples in Module Notes 4.1.
This ends our work in comparing two independent samples for the
purpose of making inferences about the two populations from which they
were drawn, and our introduction to nonparametric statistical tests.
There is also a nonparametric test for comparing two related samples,
called the Wilcoxon Signed-Rank Test. We cover that next.
Wilcoxon Signed-Rank Test for
Comparing Two Related Samples
Recall that in Module 4.2 Notes we introduced the situation in which
we wanted to compare two samples, but the observations in Sample 1
are related to those in Sample 2. The example used concerned
automobiles that were run with two different brands of gas. Rather
than randomly assigning 7 cars to Group 1 and a different, randomly
selected set of 7 cars to Group 2, only 7 cars are used. Car number 1
is run with Brand A gas and the mpg is recorded. Then the tank and
lines are purged, the same car is run with Brand B gas, and the mpg is
recorded. The same happens for the other six cars in the sample.
In Module Notes 4.2 we used the Two Group t-Test for Paired Samples to
test the alternative hypothesis that the mean of the first group is
not equal to the mean of the second. That test requires that the
differences follow a normal distribution. When that assumption is
seriously violated, it is better to use a nonparametric test that
doesn't require the normality assumption. For example, if the data
being measured are preference scores on a 1 to 10 scale, with 10 being
the highest, we may find that answers are skewed toward the high end
of the scale. In this case, the Wilcoxon Signed-Rank Test would be a
better choice than the Two Group t-Test.
Recall the data shown in the Brand A and Brand B columns of Worksheet
4.3.5. They come from Worksheet 4.2.1 in Module 4.2 Notes. I have
added some columns in order to perform the Wilcoxon Signed-Rank Test
procedure (Mason, 1999).
Worksheet 4.3.5
Car   Brand A   Brand B   Difference (A - B)   Absolute Difference   Rank   Signed Rank R+   Signed Rank R-
1     20        20        0                    0
2     20        20.5      -0.5                 0.5                   1.5                     -1.5
3     19        18.5      0.5                  0.5                   1.5    1.5
4     16        20        -4                   4                     5                       -5
5     15        19        -4                   4                     5                       -5
6     17        19        -2                   2                     3                       -3
7     14        18        -4                   4                     5                       -5
Total                                                                       1.5             -19.5
The hypotheses being tested in the Wilcoxon Signed-Rank Test
nonparametric method are:
H_{0}: There is no difference in the mpg performance of the two brands
H_{a}: There is a difference in the mpg performance of the two brands
I selected the two-tailed alternative hypothesis since I was simply
trying to determine whether the brands of gas give different mpg
performance. If a special additive had been put into Brand B and we
anticipated higher mpg performance from Brand B, then I would use the
one-tailed alternative that Brand B's mpg performance is higher than
Brand A's.
The first step in the Wilcoxon Signed-Rank Test is to compute the
difference between each pair of mpg scores (Brand A - Brand B), as
shown in the column labeled "Difference" in Worksheet 4.3.5. We are
only interested in positive and negative differences, so pairs with a
difference of 0 are dropped from further analysis; here that removes
Car 1 and leaves n = 6 pairs. Next, the absolute values of the
differences are listed in the column "Absolute Difference." Ranks are
then assigned to the absolute differences using the ranking procedure
covered previously for the Wilcoxon Rank-Sum Test in the first section
of these Module 4.3 Notes, so higher ranks are assigned to greater
absolute differences. Next, each assigned rank is given the same sign
as its entry in the "Difference" column and reported in the "Signed
Rank R+" or "Signed Rank R-" column. Finally, the signed ranks in each
column are summed and reported in the bottom "Total" row. The smaller
of the two rank sums, in absolute value, is used as the test statistic
and referred to as T. Here R+ = 1.5 and R- = -19.5, so T = 1.5. As a
check, 1.5 + 19.5 = 21 = 6 * (6 + 1) / 2.
The critical values for the Wilcoxon Signed-Rank Test are tabled in
Worksheet 4.3.6 (abridged from Mason, Appendix H, 1999). A dash means
that a sample of that size cannot reach significance at that level.
Worksheet 4.3.6
n     2 alpha (two-tailed):   0.10    0.05    0.02    0.01
      alpha (one-tailed):     0.050   0.025   0.010   0.005
4                             -       -       -       -
5                             0       -       -       -
6                             2       0       -       -
7                             3       2       0       -
8                             5       3       1       0
9                             8       5       3       1
10                            10      8       5       3
11                            13      10      7       5
12                            17      13      9       7
13                            21      17      12      9
14                            25      21      15      12
15                            30      25      19      15
16                            35      29      23      19
17                            41      34      27      23
18                            47      40      32      27
19                            53      46      37      32
20                            60      52      43      37
Because this is a two-tailed test at the 0.05 significance level used
throughout these notes, we use the column headed 2 alpha = 0.05 (alpha
= 0.025 in each tail) and go down that column to the n = 6 row, where
n counts only the nonzero differences. The critical value of the
Wilcoxon Signed-Rank Test is 0. We reject the null hypothesis if the
test statistic T is less than or equal to the critical value, and fail
to reject it if T is greater than the critical value. Since T = 1.5 is
greater than 0, we fail to reject the null hypothesis and conclude
that the data do not show a difference in mpg performance.
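The signed-rank steps can be sketched in Python, together with an exact two-tailed p-value as a cross-check. This is an illustration, not Mason's table procedure: the exact p-value here comes from enumerating all 2^n equally likely sign assignments to the observed (tie-averaged) ranks under H_{0}, which is the permutation form of the test. The helper name `average_ranks` is hypothetical, the data are transcribed from the Brand A and Brand B columns of Worksheet 4.3.5, and the zero difference for Car 1 is dropped before ranking, as the procedure describes.

```python
from itertools import product


def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2   # mean of ranks i+1..j+1
        i = j + 1
    return ranks


brand_a = [20, 20, 19, 16, 15, 17, 14]      # Cars 1-7
brand_b = [20, 20.5, 18.5, 20, 19, 19, 18]

diffs = [a - b for a, b in zip(brand_a, brand_b)]
nonzero = [d for d in diffs if d != 0]      # zero differences are dropped
ranks = average_ranks([abs(d) for d in nonzero])

r_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
r_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)   # kept as a magnitude
t = min(r_plus, r_minus)
n = len(nonzero)

# Under H0 each rank is equally likely to carry a + or - sign. Count the
# sign patterns whose R+ is at least as extreme (in either tail) as T.
total = n * (n + 1) / 2
count = 0
for signs in product((0, 1), repeat=n):
    s = sum(r for r, positive in zip(ranks, signs) if positive)
    if s <= t or s >= total - t:
        count += 1
p_two_tailed = count / 2 ** n
print(n, r_plus, r_minus, t, p_two_tailed)   # 6 1.5 19.5 1.5 0.09375
```

Only 6 of the 64 sign patterns are as extreme as the observed T = 1.5, so the exact two-tailed p-value is about 0.094. That is above 0.05, which agrees with failing to reject the null hypothesis at that level.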
When these same data were analyzed with the parametric t-Test for
Means for Paired Samples, we rejected the null hypothesis. This should
remind us of the importance of meeting assumptions. If we are
confident that the underlying distribution of the difference scores is
normal, then the t-Test for Paired Samples is appropriate. But if we
are not sure about the underlying distribution, especially when we
have small samples and extreme values, the more conservative and
ethical approach is to use the nonparametric Wilcoxon Signed-Rank
Test.
This finishes our coverage of nonparametric tests
for comparing two samples. The next topics concern comparing more
than two samples.
References:
Aczel, A. (1993). Complete Business Statistics (2nd ed.). Homewood,
IL: Irwin.
Levine, D., Berenson, M., & Stephan, D. (1999). Statistics for
Managers Using Microsoft Excel (2nd ed.). Upper Saddle River, NJ:
Prentice Hall, Chapter 9.
Mason, R., Lind, D., & Marchal, W. (1999). Statistical Techniques in
Business and Economics (10th ed.). Boston: Irwin/McGraw-Hill,
Chapter 15.


