Descriptive statistics
are one way of organizing data collected during a study. This kind of statistics
allows the researcher to describe and summarize the data collected. (Another
kind of statistics is inferential statistics, which allows the researcher
to estimate how reliably he/she can make predictions and generalize his/her
findings. Inferential statistics will be dealt with in another outline.)
Descriptive statistics include measures of central tendency such as the
mean, median, and mode; measures of variability such as modal percentage,
range, and standard deviation; and depictions of relationships between
variables such as scatter plots.
Measures of central tendency describe the average member of the
sample whereas measures of variability describe how much dispersion or
variability there is in the sample. As the example in the book demonstrates,
it is useful to know both. A sample with an average age of 22 and ages
ranging from 18-25 represents a different population than a sample with
an average age of 22 and ages ranging from 17-45. In both cases the mean
or average age is the same, but the amount of variability or dispersion
is quite different.
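The point can be sketched in a few lines of Python. The ages below are invented for illustration (they are not the book's data), but both samples are constructed to have the same mean with very different spread:

```python
from statistics import mean, stdev

# Hypothetical ages: both samples were built to share a mean of 22,
# but sample_b is far more dispersed than sample_a.
sample_a = [18, 19, 21, 22, 23, 24, 24, 25]   # ages range from 18 to 25
sample_b = [17, 18, 18, 19, 19, 20, 20, 45]   # ages range from 17 to 45

for name, ages in [("A", sample_a), ("B", sample_b)]:
    spread = max(ages) - min(ages)
    print(name, mean(ages), spread, round(stdev(ages), 1))
```

Both samples report a mean of 22, yet the range and standard deviation differ sharply, which is why a mean should not be reported without a measure of variability.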
Understanding levels of measurement: Measurement is the assignment
of numbers to objects or events according to certain rules. Every event
that has a specific number has to be similar to every other event assigned
that number. If you are assigning the number 1 or 2 based on whether a
person is male or female, it is clear that all males must be assigned one
number and all females assigned the other. The level of measurement is determined by the
nature of the object or event being studied. The higher the level of measurement,
the more flexibility the researcher has when choosing statistical procedures.
It is always best to use the highest level of measurement possible. In
ascending order, levels of measurement are nominal, ordinal, interval,
and ratio.
A. Nominal Measurement: this is a way of labeling or categorizing
objects or events. The categories are mutually exclusive; you either have
the characteristic or you don’t. Numbers used here cannot indicate more
or less of a characteristic. Examples include: gender, hair color, marital
status, or religious affiliation. This type of measurement allows the least
amount of manipulation. You usually see the frequency of each event counted.
B. Ordinal Measurement: this level of measurement is used to
show rankings of objects or events. Numbers assigned to categories can
be compared and a member of a higher category is assumed to have more of
a certain characteristic than a member in a lower category. The intervals
between the numbers are not necessarily equal and zero is not absolutely
zero. An example would be class rank. The person who is assigned the #1
has the highest grade point average. All students ranked below #1 have
GPAs lower than #1 but that’s all you know. There may be large or small
differences between the GPAs of the ranked students. The student with the
lowest GPA does not have a total absence of a GPA (at least not usually!)
but simply ranks below the person just above him or her. There
are still limited options in mathematical manipulations but in addition
to frequency counts, one can also find the median, percentiles, and rank
order coefficients of correlation.
C. Interval Measurement: this level of measurement shows rankings
of events or objects on a scale with equal intervals between the numbers.
The zero point remains arbitrary. A good example is the Fahrenheit scale
for measuring temperature. The intervals or distances between the degrees
are equal, but the zero point is still arbitrary. This type of data allows
the researcher more manipulation, including addition and subtraction of
numbers and calculation of means. At this level, adding or subtracting
5 degrees means the same thing no matter where on the scale you are. The
difference between 75 degrees and 70 degrees is the same as the difference
between 55 degrees and 50 degrees. However, even with equal intervals,
you still cannot say that 80 degrees is twice as hot as 40 degrees. (This
is still an improvement over ordinal or ranked data discussed above. If
you move down 5 places in the student ranking from 25th to 30th place,
the difference in GPA may be very different from anywhere else in the student
ranking).
D. Ratio Measurement: this level of measurement shows rankings
of events or objects on scales with equal intervals and absolute zeros.
The number represents the actual amount of the property or characteristic
that the object possesses. This highest level of measurement is usually
only achieved in physical sciences. Examples are height, weight, pulse,
blood pressure. Any mathematical manipulation may be used on data from
ratio scales.
Frequency Distribution: this is one of the most basic ways of
organizing data. The number of times an event occurs is counted, or the
data are grouped and the frequency of each group is counted. With the
example of grades, you could report the frequency of each score or you
could group them into scores that meet the criteria of an A, B, C, D, etc.
and count the frequency of the group. Your groups have to be set up so
that a score can only fall into one group. Your group size needs to be
appropriate for reporting your data. If you grouped scores as those from
0-50 and from 51-100, this would not likely yield very useful information.
Frequency data can also be expressed in graphic form through the histogram
and the frequency polygon. These are similar as they both plot scores or
percentages.
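A grouped frequency distribution like the grade example can be sketched in Python; the scores and letter-grade cutoffs below are hypothetical:

```python
from collections import Counter

scores = [72, 85, 91, 68, 85, 77, 93, 85, 59, 77, 88, 64]  # invented test scores

def letter(score):
    # Mutually exclusive groups: each score falls into exactly one bin.
    if score >= 90: return "A"
    if score >= 80: return "B"
    if score >= 70: return "C"
    if score >= 60: return "D"
    return "F"

by_score = Counter(scores)                      # frequency of each raw score
by_grade = Counter(letter(s) for s in scores)   # frequency of each group
print(by_score[85], dict(by_grade))
```

Reporting the frequency of each raw score and reporting the frequency of each grade group are both frequency distributions; the grouped version trades detail for readability.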
Measures of Central Tendency: these measures always answer questions
about the middle of a sample or group. Because they summarize the members
of a sample, they are summary statistics and are specific to each sample.
Therefore, they also change with each sample. Such summaries might include
a mean and standard deviation for the sample. This tells you the average
of the sample and the amount of variation in the sample. So if one sample
of students taking a test has a mean of 83 with a standard deviation of
5, you have a pretty good sense of where the scores fall. However, it is
important to remember that another sample may look very different. The
term "average" is a non-specific term. There are three terms to remember:
mean, median, and mode. Depending on the distribution, these may not all
give the same answer to the question, "What is the average?"
A. Mode: this is the most frequent score or result obtained.
A distribution can have more than one mode. The number of modes in a distribution
is called the modality of the distribution. For example, if 20 students
take a certain test and 7 of them receive an 85 and 7 receive a 70, with
the other students spread out around these scores, this sample would have
two modes or a bimodal distribution. The mode is most frequently used with
nominal data. The mode can fluctuate widely from one sample to another
and it is unstable. A change in one score can change the mode.
B. Median: this is the middle score, the point above which 50%
of the scores fall and below which 50% fall. It is not sensitive
to extremes in high or low scores. It is best used when the data are skewed
in one direction or another and you want to find out the "typical" score.
It is easy to find and can be used with ordinal or higher data.
C. Mean: this is the true average of all the scores and is used
with interval or ratio data. It is what everyone thinks of when they use
the word "average". It is the most widely used measure of central tendency.
Most tests of statistical significance rely on means. It is affected by
every score but is more stable than either the mode or the median. It is
the least affected by chance. The larger the sample size, the less affected
the mean is by an extreme score.
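Python's statistics module computes all three. The scores below echo the bimodal example from the mode section (7 students at 85, 7 at 70), with the remaining six invented around those values:

```python
from statistics import mean, median, multimode

scores = [85]*7 + [70]*7 + [60, 65, 75, 80, 90, 95]  # 20 hypothetical scores

print(mean(scores))       # arithmetic average of all 20 scores
print(median(scores))     # middle point: half the scores above, half below
print(multimode(scores))  # both modes of this bimodal distribution
```

Here the mean and median agree while the sample has two modes, showing why "average" alone is a non-specific term.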
In summary, when looking at measures of central tendency, the mean is
the most stable and the median is the most typical. If a distribution is
symmetrical and unimodal, the mean, mode, and median will coincide. If
the distribution is skewed, the mean will be pulled in the direction of
the long tail. When you have a skewed distribution, you should report all
three statistics.
Normal Distribution: this is a theoretical concept and is based
on the observation that data from repeated measures of interval or ratio
level data group themselves around a midpoint in a distribution in a manner
that closely approximates a normal curve (bell curve). Also, if the means
of a large number of samples of interval or ratio data are calculated and
plotted on a graph, that curve also approximates the normal curve. This
distribution of sample means is called the sampling distribution of the
mean. The normal curve is symmetrical and unimodal.
The mean, median, and mode are equal. 68% of scores will fall within 1
SD of the mean; 95% within 2 SD of the mean; and 99.7% within 3 SD of the
mean.
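The 68-95-99.7 rule can be checked with a short simulation; the mean of 100 and SD of 15 below are arbitrary choices:

```python
import random
from statistics import mean, stdev

random.seed(0)
# Draw 10,000 simulated scores from a normal distribution
data = [random.gauss(100, 15) for _ in range(10_000)]
m, s = mean(data), stdev(data)

fractions = {}
for k in (1, 2, 3):
    inside = sum(m - k*s <= x <= m + k*s for x in data) / len(data)
    fractions[k] = inside
    print(f"within {k} SD: {inside:.1%}")
```

The printed proportions land very close to 68%, 95%, and 99.7%, as the rule predicts.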
A. Skewness: not all samples of data approximate the normal curve.
If one tail is longer than the other, the distribution is described as
skewed. In a positive skew, most of the data are at the low end of the
range and there is a longer tail pointing to the right. This reflects the
bulk of the scores are to the left, but a few high scores are pulling the
distribution of the scores to the right. In a positive skew, the mean is
to the right of the median. In a negative skew, the bulk of the scores
are in the high range and there is a longer tail pointing to the left or
pulling the distribution of the scores to the left. In a negative skew,
the mean is to the left of the median.
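The pull of the mean toward the long tail is easy to demonstrate; the values below are invented to imitate a positively skewed variable such as income:

```python
from statistics import mean, median

# Most values are low; two large values form a long right tail.
values = [20, 22, 23, 25, 25, 26, 28, 30, 90, 150]
print(mean(values), median(values))  # the mean sits well above the median
```

The median stays with the bulk of the scores while the mean is dragged toward the extreme values, which is why the median is often the more "typical" summary for skewed data.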
B. Symmetry: when the 2 halves of a distribution can be folded and superimposed
on each other, the distribution is said to be symmetrical. The 2 halves
are equal or are mirror images of each other. The shape does not affect
symmetry. In other words, it can be a tall, skinny curve and symmetrical
or a short, flatter curve that is also symmetrical.
C. Kurtosis: this relates to the peakedness or flatness of a distribution,
which depends on how spread out the data are. The farther the
data are spread out, the flatter the peak. High peaks are called "leptokurtic"
and flat distributions are called "platykurtic". Neither curve represents
the normal distribution.
Interpretation of Measures of Variability: this refers to how variable or spread out the data are. Even though two samples have the same mean, the distributions could be different in both kurtosis and skew. It is important to know the variability. Just as with measures of central tendency, the measures of variability are appropriate to specific kinds of measurement and types of distribution. Modal percentage is used with nominal data and is the percentage of cases in the mode. A high modal percentage indicates that there is not very much variability.
A. Range: this is the simplest measure of variability and
is the difference between the highest and lowest scores. A change in either
score changes the range. The range should always be reported with other
measures of variability.
B. Semiquartile Range: this indicates the spread of the middle
50% of the scores and is half the distance between the upper and lower
quartiles. It is more stable than the range, because it is less
likely to be changed by a single extreme score. The upper quartile is the
point below which 75% of the scores fall; the lower quartile is the point
below which 25% of the scores fall.
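Quartiles, and the semiquartile range as half the distance between them (a common definition, assumed here), can be computed with the statistics module; the scores are invented:

```python
from statistics import quantiles

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]  # hypothetical, sorted scores
q1, q2, q3 = quantiles(scores, n=4)  # lower quartile, median, upper quartile
semiquartile_range = (q3 - q1) / 2
print(q1, q3, semiquartile_range)
```

Because it ignores everything outside the middle 50% of the scores, this measure is unaffected by a single extreme value at either end.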
C. Percentile: this simply represents the percentage of cases
that a given score exceeds. A score at the 90th percentile is exceeded
by only 10% of the scores.
D. Standard Deviation: this is the most frequently used measure
of variability and it is based on the idea of a normal curve. It is the
measure of the average deviation of scores away from the mean and should
always be reported with the mean. It takes all scores into account and
is used to interpret individual scores. If you know the mean and the standard
deviation, you know where 68% of the scores fall and you can tell generally
where your score falls in comparison to the rest of the group. One limitation
is that the SD is expressed in terms of the units used in measurement.
The researcher could not compare height in inches and weight in pounds
without converting the scores to Z scores.
E. Z Scores: this is used to compare measurements in standard
units. Each score is converted into a Z score and then Z scores are used
to examine the distance of the scores from the mean. A Z score of 1.5 means
the score is 1.5 SD above the mean whereas a Z score of -2.5 means the
score is 2.5 SD below the mean. This allows the researcher to compare results
that are measured in different units.
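A sketch of the conversion, using invented heights (inches) and weights (pounds) for five people; once standardized, the two measurements can be compared directly:

```python
from statistics import mean, stdev

def z_scores(data):
    # Each score's distance from the mean, in standard deviation units
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

heights = [62, 65, 67, 70, 76]        # inches
weights = [120, 135, 150, 180, 240]   # pounds

zh, zw = z_scores(heights), z_scores(weights)
# The fifth person's standing on each measure, in comparable units
print(round(zh[-1], 2), round(zw[-1], 2))
```

After conversion, every set of Z scores has a mean of 0 and a standard deviation of 1, which is exactly what makes cross-unit comparison possible.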
There are many measures of variability. The modal percentage is the
easiest to calculate and the SD is the most useful. The SD is the most
stable statistic. Transforming scores to Z scores standardizes them and
allows comparison of scores that have different measurement units.
Correlation: correlations answer questions about how strongly
variables are related to each other. They are most commonly used with ordinal
or higher level data. Scatter plots are visual representations of the strength
and direction of the relationship between 2 variables. The strength of
the correlation is seen in how closely the data points approximate a straight
line. In a positive correlation, the higher the score on one variable,
the higher the score on another. In a negative correlation, the higher
the score on one measure, the lower the score on another. No correlation
means there is no relationship between the variables.
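The most common numerical summary of such a relationship is Pearson's correlation coefficient, defined later in this outline; here is a hand-rolled sketch with invented study-hours and test-score data:

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    # Covariance of x and y divided by the product of their spreads
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx)**2 for x in xs) * sum((y - my)**2 for y in ys))
    return num / den

hours  = [1, 2, 3, 4, 5, 6]        # hypothetical hours of study
grades = [60, 65, 72, 70, 82, 88]  # hypothetical test scores
print(round(pearson_r(hours, grades), 2))
```

A value near +1 indicates a strong positive relationship, near -1 a strong negative one, and near 0 no linear relationship.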
Research Methods and Applications
in Health Care Systems
Inferential Statistics
T. Bevins, revised Summer 1999
PLEASE NOTE: While it is helpful to understand how statistical procedures are performed, this knowledge is not critical to understanding published research studies. I will try to summarize what I think you should understand about statistical approaches used in research articles.
Inferential statistics combine mathematics and logic to enable researchers to draw conclusions about populations based on data from samples.
A statistic is a characteristic of a sample (for example the mean or standard deviation of the sample - which we can calculate from our data)
A parameter is a characteristic of the population (the population also has a mean and standard deviation - but we almost never know the characteristics of the population, since we almost never have all of that data)
We use statistics to estimate the parameters. The use of probability tells us how well our statistics estimate the parameter (how this works is beyond the scope of this course).
Inferential statistics are based on certain qualifications about how the sample was obtained and measured.
The principles of probability and the idea of the sampling distribution of the mean are very important to an understanding of statistics. However, they are beyond the scope of this course, so I will not try to cover them here. You should know the following:
There are two errors that can occur as we make conclusions based on our calculated statistic. In the table below, the column headings state the truth in the population; the row headings state your conclusion based on your sample statistic. Combinations of these situations will yield either correct conclusions or erroneous conclusions.

                                 Null hypothesis is true   Null hypothesis is false
You reject the null hypothesis   Type I error              Correct conclusion
(you conclude there is a                                   (this is the power of
difference between groups)                                 your test)

You accept the null hypothesis   Correct conclusion        Type II error
(you conclude there is no
difference between groups)
Power: the probability of rejecting the null hypothesis, when in reality the null hypothesis is not true. (Although this is not statistically proper, in plain English you can think of power as being able to see a difference (or similarity) when one exists). Power depends on factors such as the significance level, the sample size, and the size of the effect being studied.
Assumptions for many parametric statistical tests include interval or ratio level data, normally distributed data, and homogeneity of variance among groups.
Question
1. Similarities or relationships: Are there similarities or relationships between or among variables?
   (dependent variable dichotomous) Logistic regression
   (dependent variable dichotomous, independent variable(s) interval) Multiple discriminant function
   (dependent variable dichotomous, independent variables nominal) Logistic regression
2. Differences: Are there differences between or among groups?
   Dependent samples: Sign test / Wilcoxon signed ranks test
   Multiple independent variables: Factorial ANOVA
   Dependent samples, repeated measurements: Repeated measures ANOVA
Chi-square: a statistical method that tests the fit of categorical data (frequency data) to an expected pattern. It can be used to show whether there is a relationship between variables or a difference between levels of variables. This is a good method for dealing with nominal data where you can't perform mathematical operations such as computing a mean (tests like ANOVA require mathematical operations). It looks at observed v. expected frequencies. At least one variable must be at a level of measurement above the dichotomous, nominal level.
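The observed-versus-expected idea can be sketched directly; the frequencies below are invented, and a real analysis would compare the statistic against a chi-square table to judge significance:

```python
def chi_square(observed, expected):
    # Sum of (O - E)^2 / E across all categories
    return sum((o - e)**2 / e for o, e in zip(observed, expected))

# Hypothetical: 90 patients choosing among three clinic times; the null
# hypothesis expects an even 30/30/30 split.
observed = [40, 30, 20]
expected = [30, 30, 30]
print(round(chi_square(observed, expected), 2))
```

The larger the gap between observed and expected frequencies, the larger the statistic, and the less plausible the expected pattern becomes.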
Confirmatory factor analysis: a statistical method used to analyze the pattern among multiple variables; to confirm whether the patterns among multiple variables are as predicted, and whether variables can be grouped under the expected factor(s).
Exploratory factor analysis: a statistical method used to analyze the pattern among multiple variables; to explore whether patterns may exist among multiple variables, and whether variables can be grouped under one or more factors.
Factorial ANOVA: a statistical method used to measure whether there are significant differences between/among two or more independent treatment groups or between (a) treatment group(s) and a control group that are independent, when the data is at an interval level. The different groups are considered different levels of one variable. Each individual is measured on more than one independent variable.
Friedman two-way ANOVA: a statistical method used to measure whether there are significant differences among two or more dependent (related) treatment groups, when the data is at an ordinal level. The different groups are considered different levels of one variable, and each individual is measured under each level (as in repeated measures).
Independence of groups: basically whether one set of measurements is affected by the other. When there are repeated measurements, as in a pretest and posttest type of situation, the measurements are not independent. Individuals who score high on the pretest are likely to also score high on the posttest, so those measurements are not independent.
Kruskal-Wallis one-way ANOVA: a statistical method used to measure whether there are significant differences among multiple independent treatment groups or between treatment groups and a control group that are independent, when the data is at an ordinal level. The different groups are considered different levels of one variable.
Mann-Whitney U (or Wilcoxon Rank Sum): a statistical method used to measure whether there are significant differences between two independent treatment groups or between a treatment group and a control group that are independent, when the data is at an ordinal level.
Multiple R (from regression analysis): a statistical method used to measure the strength of the relationship among multiple interval level variables.
Multiple regression: a statistical method used to test the strength of predictions or explanations about one (dependent) variable (at an interval level of measurement) based on information from multiple (independent) variables.
Multivariate ANOVA: a statistical method used to measure whether there are significant differences between/among two or more independent treatment groups or between (a) treatment group(s) and a control group that are independent, when the data is at an interval level. The different groups are considered different levels of one variable. Each individual is measured on more than one dependent variable.
One-way ANOVA: a statistical method used to measure whether there are significant differences among multiple independent treatment groups or between treatment groups and a control group that are independent, when the data is at an interval level. The different groups are considered different levels of one variable.
Pearson's correlation coefficient: a statistical method used to measure the strength of the relationship between two interval level variables. There need not be a distinction between the dependent and independent variable; with correlation studies there need not be a manipulated and responding variable.
Phi Coefficient: a statistical method used to measure the strength of the relationship between two dichotomous, nominal level variables.
Repeated measures ANOVA: a statistical method used to measure whether there are significant differences among multiple dependent treatment groups or between treatment groups and a control group that are dependent, when the data is at an interval level. The different groups are considered different levels of one variable.
Sign test (or Wilcoxon signed ranks test): a statistical method used to measure whether there are significant differences between two dependent treatment groups or between a treatment group and a control group that are dependent, when the data is at an ordinal level.
Simple regression: a statistical method used to test the strength of predictions or explanations about one (dependent) variable (at an interval level of measurement) based on information from one (independent) variable.
Spearman's rank correlation coefficient: a statistical method used to measure the strength of the relationship between two ordinal level variables.
t-Test for dependent samples: a statistical method used to measure whether there are significant differences between two dependent treatment groups/samples or between a treatment group and a control group that are dependent, when the data is at an interval level.
t-Test for independent samples: a statistical method used to measure
whether there are significant differences between two independent treatment
groups or between a treatment group and a control group that are independent,
when the data is at an interval level.
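A sketch of the pooled-variance form of this statistic, with invented interval-level scores; a real analysis would compare the result to a critical t value for the appropriate degrees of freedom:

```python
from math import sqrt
from statistics import mean, variance

def t_independent(a, b):
    # Pooled-variance t statistic for two independent samples
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1/na + 1/nb))

treatment = [83, 85, 88, 90, 84]   # hypothetical treatment-group scores
control   = [78, 80, 82, 79, 81]   # hypothetical control-group scores
print(round(t_independent(treatment, control), 2))
```

The statistic is the difference between the two group means scaled by the variability within the groups: a larger t means the observed difference is harder to attribute to chance.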
Study help with practice questions for the use of statistics
in research
1. When testing the hypothesis that the mean posttest score for subjects completing a strengthening program is different from their mean pretest score (when strength is measured in pounds of force), the researcher should use a/an:
b) t-test for dependent samples
c) repeated measures ANOVA
d) Sign test
e) Pearson's correlation coefficient
We are asking if there is a difference between pretest and posttest scores. We are not looking at similarities/relationships, patterns or prediction/explanation.
B. What is the number of groups/samples or variables, or the number of levels of the independent variable?
There is only one group of subjects, but they are measured (sampled) twice (a pretest - posttest situation is a form of repeated measures), so the data can be compared as two samples.
C. What is the scale of measurement?
ratio
interval
ordinal
nominal?
We are measuring strength in pounds of force, and pounds is a ratio variable. Since ratio data meets all of the qualifications for interval data, we can use statistical methods that require at least interval level measurement. Remember, class rank is a rank variable, and gender is a nominal variable.
D. Is there independence of groups/samples? No, a pretest - posttest measure is a dependent measure. You would expect that someone who was the strongest subject on the pretest is expected to be one of the stronger subjects on the posttest - you expect the posttest data to depend to a certain extent on the pretest data - when your data is a repeat of a measure on the same subject you have dependence of groups.
Now follow through the table and see that if the question is
one of difference; the number of samples is two; the scale of
measurement is ratio, and there is dependence of groups/samples,
then
the answer is: t-test for dependent samples, or dependent means t-test.
Be sure you follow your way through the table to get to the t-test for
dependent samples. Also see that the definition of a t-Test for
dependent samples is a statistical method used to measure whether there
are significant differences between two dependent treatment groups/samples
or between a treatment group and a control group that are dependent,
when the data is at least at an interval level.
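The dependent-samples statistic works on the paired pretest-posttest differences; here is a sketch with invented strength scores in pounds of force (significance would still be judged against a t table):

```python
from math import sqrt
from statistics import mean, stdev

def t_dependent(pre, post):
    # t statistic computed on the paired differences (posttest - pretest)
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

pretest  = [100, 110, 95, 120, 105]   # hypothetical pretest strength
posttest = [108, 115, 99, 126, 112]   # hypothetical posttest strength
print(round(t_dependent(pretest, posttest), 2))
```

Working with each subject's own difference is what makes the test "dependent": subjects who are strong on the pretest are expected to be strong on the posttest as well.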
2. When testing the hypothesis that the mean SAT score for FGCU students in the colleges of 1) Arts and Sciences, 2) Health Professions, 3) Professional Studies, and 4) Business are not different from each other, we should use a/an
b) Multivariate ANOVA
c) one-way ANOVA
d) Kruskal-Wallis one-way ANOVA
e) Multiple R
We are asking if there is a difference between colleges. We are not looking at similarities/relationships, patterns or prediction/explanation.
B. What is the number of groups/samples or variables, or the number of levels of the independent variable?
There are multiple levels (4) of the variable college.
C. What is the scale of measurement?
ratio
interval
ordinal
nominal?
We are measuring the students' score on the SAT, and this is an interval variable.
D. Is there independence of groups/samples? There is independence of groups. You would not expect that someone with a high SAT score in Arts and Sciences will influence the SAT score of someone in Health Professions.
Now follow through the table and see that if the question is
one of difference; the number of levels is multiple; the scale
of measurement is interval, and there is independence of groups/samples,
then
the answer is: one-way ANOVA. Be sure you follow your way through
the table to get to the one-way ANOVA. Also see that the definition
of a one-way ANOVA is a statistical method used to measure whether there
are significant differences among multiple independent treatment
groups or between treatment groups and a control group that are
independent, when the data is at an interval level. The different
groups are considered different levels of one variable.
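A one-way ANOVA compares between-group variability to within-group variability; here is a sketch with invented scores for three groups (the F value would be compared to a critical value to judge significance):

```python
from statistics import mean

def one_way_anova_f(groups):
    # F = mean square between groups / mean square within groups
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand)**2 for g in groups)
    ss_within = sum((x - mean(g))**2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores for students from three independent groups
groups = [[500, 520, 510], [540, 560, 550], [480, 470, 490]]
print(round(one_way_anova_f(groups), 1))
```

A large F means the group means differ by more than the scatter within each group would lead you to expect by chance.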
3. When testing the hypothesis that there is a positive relationship between the hours students spend studying and the grades they earn on tests (on a 0-100 scale), the researcher should use a/an
b) Pearson's product-moment correlation coefficient
c) t-test for independent samples
d) Chi-square test
e) Simple regression
We are asking if there is a relationship between the hours students spend studying and the grades they earn on tests.
B. What is the number of groups/samples or variables, or the number of levels of the independent variable?
There are two variables: hours students spend studying and the grades they earn on tests
C. What is the scale of measurement?
ratio
interval
ordinal
nominal?
This is ratio data; hours students spend studying and the grades they earn on tests are ratio variables.
D. Is there independence of groups/samples? Independence of groups is not a question in selecting a method to deal with relationships. The method itself is examining the issue of dependence or relationship.
Now follow through the table and see that if the question is
one of relationship; the number of variables is two; and the scale
of measurement is ratio, then the answer is: Pearson's product-moment
correlation coefficient (or Pearson's correlation coefficient). Be
sure you follow your way through the table to get to the Pearson's correlation
coefficient. Also see that the definition of a Pearson's correlation
coefficient is a statistical method used to measure the strength of
the relationship between two interval level variables.
4. Two samples of fifth grade students are selected; one sample of 100 students from a rural school, and another sample of 100 students from an urban school. When testing the hypothesis that the frequency of students receiving a passing score (scored on a 0-10 scale) on a test of fitness is the same in both schools, the researcher should use a/an:
b) Pearson's product-moment correlation coefficient
c) t-test for independent samples
d) Chi-square test
e) Simple regression
We are asking if there is a difference (or no difference) between the frequency of students receiving a passing score on a test of fitness in rural v. urban schools.
B. What is the number of groups/samples or variables, or the number of levels of the independent variable?
There are two variables: frequency of students receiving a passing score on a test of fitness (v. students not receiving a passing score), and school location.
C. What is the scale of measurement?
ratio
interval
ordinal
nominal?
This is frequency data; here the only data reported is the frequency - you do not have students' individual scores. We are not using the measurement of the fitness score, but looking at frequencies of a passing score.
D. Is there independence of groups/samples? Independence of groups is not a question in the Chi square. The method itself is examining the issue of dependence or relationship.
Now follow through the table and see that if the question is one of difference; the number of variables is two; and the data is frequency, then the answer is: Chi-square test. Be sure you follow your way through the table to get to the Chi-square test. Also see that the definition of a Chi-square is a statistical method that tests the fit of categorical data (frequency data) to an expected pattern. It can be used to show whether there is a relationship between variables or a difference between levels of variables. This is a good method for dealing with nominal data where you can't perform mathematical operations such as computing a mean (tests like ANOVA require mathematical operations). It looks at observed v. expected frequencies. At least one variable must be at a level of measurement above the dichotomous, nominal level.
5. When testing the hypothesis that the length of stay in a hospital (in days) can be predicted from information about the severity of illness (on an assessment instrument yielding scores from 0-10), age, diagnosis (nominal), and previous health of a patient (using another assessment instrument yielding scores from 0-10), the researcher should use a/an
b) Pearson's product-moment correlation coefficient
c) Multiple regression analysis
d) Chi-square test
e) Confirmatory factor analysis
We are asking a prediction question: can you predict the length of stay in a hospital based on information about the diagnosis, severity of illness, age and previous health of a patient?
B. What is the number of groups/samples or variables, or the number of levels of the independent variable?
There are multiple independent variables (information about the diagnosis, severity of illness, age and previous health of a patient) and one dependent variable: length of stay.
C. What is the scale of measurement?
ratio
interval
ordinal
nominal?
The dependent variable (length of stay) is measured at a ratio level.
D. Is there independence of groups/samples? Independence of groups is not a question in selecting a method to deal with prediction. The method itself is examining the issue of dependence or relationship.
Now follow through the table and see that if the question is
one of prediction; the number of independent variables is multiple;
and the scale of measurement of the dependent variable is ratio,
then the answer is: Multiple regression analysis (or multiple regression).
Be sure you follow your way through the table to get to the multiple regression.
Also see that the definition of a Multiple regression is a statistical
method used to test the strength of predictions
or explanations about one (dependent) variable (at an interval level
of measurement) based on information from multiple (independent)
variables.
Answers:
1. b
2. c
3. b
4. d
5. c