Data Collection: Quantitative Measurement
T. Bevins
3/14/99
Main goals for quantitative data collection:
- measurement: quantifying information. Link the researcher's abstractions or theoretical concepts to concrete variables that can be empirically or objectively examined.
- systematic collection: all data are collected in a uniform way.
- objectivity: the data are not biased; the influence of the researcher is removed from data collection.
Methods that quantitative researchers use to collect data (information) should be:
- identifiable
- repeatable, collected with consistency, and systematically recorded
- standardized
- checked and controlled
- defined in terms of the study's major variables
- appropriate to the problem, the hypothesis, the setting, and the population
- consistent with the study's objectives, concepts, and theories
Some Definitions:
- Operationalization: translating concepts into observable and measurable phenomena. An operational definition translates the conceptual definition into behaviors or verbalizations that can be measured.
- Validity (a general definition): the strength of the relationship between an indicator and an underlying concept.
- Reliability (a general definition): the consistency of a measure.
(more about reliability and validity later)
Levels of measurement:
First consider these terms:
- discrete variable: a variable with a finite number of distinct values (e.g., gender)
- continuous variable: a variable with an infinite number of possible values (e.g., height)
- nominal: assign mutually exclusive values to the levels of this variable. Values are not ordered, and values are discrete. This is a way of assigning a name to each level of the variable, but the level of the variable does not imply any order. Gender is an example: you could assign the value 1 to males and 2 to females, but there is no implication that females are twice as good as males (and of course there is no researcher bias here).
- ordinal: assign mutually exclusive values to the levels of this variable, but at this level there is also an ordering or ranking of these values. These values are also discrete, and the spacing between categories is not numerically equivalent. The judging of figure skaters is an example: getting first place is clearly better than getting second place (order counts), but there is no implication that the first-place skater is better than the second-place skater by the same amount that the second-place skater is better than the third-place skater (no implication of equal spacing).
- interval: assign mutually exclusive values to each level of the variable, with the levels being ordered and with equal spacing between each level. These values become continuous, and the zero point is arbitrary, not absolute. Temperature on the Fahrenheit scale is a good example. A temperature of 10 degrees is warmer than a temperature of 5 degrees. A rise in temperature from 5 degrees to 10 degrees is equivalent to a rise from 10 degrees to 15 degrees, or even to a rise from 80 to 85 degrees. However, a temperature of 0 is not a complete lack of "temperature" (although this concept is easier for a Vermonter to understand than for a Floridian). Also, since the zero point is not absolute, you cannot say that a temperature of 10 is twice as warm as a temperature of 5.
- ratio: assign mutually exclusive values to each level of the variable, with the levels being ordered, with equal spacing between each level, and with the zero point being absolute. You can say that a subject with a $40,000 income has a higher income than an individual with a $20,000 income (order), and you can say that the rise in income from $20,000 to $40,000 is equivalent to the rise in income from $40,000 to $60,000 (equal spacing). You can also say that an individual with an income of $0 has no income (the zero point is absolute). Having data at the ratio level allows comparisons to be made in ratio form: for example, you can speak of one subject's income ($40,000) being twice as high as another subject's income ($20,000).
Most researchers seek to measure their variables at the highest level on the measurement scale possible. We will discuss this more when we talk about statistics; for now, let's just say that more sophisticated forms of statistical analysis can be completed when measurement is conducted at higher levels on the measurement scale. Measurement at the higher levels also allows more precise inferences to be made about the data. For example, if you were measuring income at an ordinal level, you could only infer that one subject's income was more or less than another subject's income. However, if you measure income at a ratio level, you have much more information about your subjects' incomes, since you have the qualities of equal spacing and an absolute zero point. With data measured at the ratio level, you can say exactly how much more income one subject has compared to another, and you can make mathematical computations to compare your subjects' incomes. That is not possible when you measure at the ordinal level.
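As an illustration of why the level of measurement matters, this small sketch (with made-up income figures) shows which comparisons are meaningful for the same data coded at the ratio level versus the ordinal level:

```python
# Illustrative sketch with made-up figures: the same income data at two
# levels of measurement, showing which comparisons each level supports.

# Ratio level: actual dollar amounts (equal spacing, absolute zero).
incomes = {"subject_a": 40_000, "subject_b": 20_000}

assert incomes["subject_a"] > incomes["subject_b"]        # order is meaningful
difference = incomes["subject_a"] - incomes["subject_b"]  # 20,000: differences are meaningful
ratio = incomes["subject_a"] / incomes["subject_b"]       # 2.0: "twice as high" is meaningful

# Ordinal level: the same subjects coded into ranked brackets
# (1 = low income, 2 = middle, 3 = high). Order is still meaningful...
brackets = {"subject_a": 3, "subject_b": 2}
assert brackets["subject_a"] > brackets["subject_b"]
# ...but differences and ratios of the codes are not: 3 - 2 == 1 does not
# mean the incomes differ by one "unit", and 3/2 does not mean 1.5x the income.
```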
Qualitative v. Quantitative data collection: Data collection in qualitative research is a dynamic process that involves ongoing analysis and the reformulation of the initial query. For a comparison of qualitative and quantitative data collection, consider the measurement of intelligence under the two approaches. In quantitative studies, intelligence testing relies, in large part, on a structured instrument involving a paper-and-pencil test, usually in a laboratory-type setting. Measurements are obtained on dimensions that have been predefined as comprising the construct of "intelligence". In qualitative studies, rather than using predefined criteria, the investigator might watch and listen to subjects in their natural environment to reveal the meaning of intelligence. Each approach has its value in addressing specific types of research questions/problems.
With the quantitative approach, the structured observation of intelligence would be appropriate for the investigator who wishes to compare populations or individuals, measure individual or group progress and development, or describe population parameters on a standard indicator of intelligence. On the other hand, qualitative observation of intelligence would be useful to the researcher who is attempting to develop new understandings of the construct in populations to which the standard intelligence tests do not seem to apply or fit, such as certain minority groups in the United States, the elderly, or non-Western cultures.
It can also be said that even within the deductive/quantitative process,
there is a place for inductive thinking. Nunnally says: "Although
the data of science must be objective, the scientist must rely on his intuition
for research ideas."
Methods of data collection on objects (research conducted on objects or physical qualities, not subjects):
- measurement of objects: physical qualities such as temperature, hardness, ...
- measurement of qualities of the environment, such as loudness, brightness, ...
Methods of data collection on subjects (not as objective as most measurement of objects):
- physiological
- tests
- observational
- interview
- questionnaire
- available data (records)
1. Physiological - measures (usually with specialized equipment) the physical or biological status of the subject.
advantages:
- objectivity
- precision
- sensitivity
disadvantages:
- expense
- training required
- the presence of some types of devices might change the measurement
2. Tests - A variety of tests can be administered to subjects to gather data.
Some examples of these tests are:
- psychological tests to determine the amount of an attribute present in a subject (for example, depression). There is not a clear distinction between tests and questionnaires: "tests" tend to be more standardized, but the distinction between the two terms is not absolute, and it varies from text to text.
- tests of achievement or mastery (for example, a licensing exam)
- tests of aptitude (for example, an IQ test)
advantages:
- objectivity
- precision
- sensitivity
- standardization
disadvantages:
- bias
- questions of validity and reliability
- often much more work must be done up front to carefully construct the test (the payback is usually that once the test is constructed, gathering and analyzing the data is much easier than with non-standardized forms of assessment)
3. Observational - watching with a trained eye for certain specific events. Observation, interview, and questionnaire techniques are all good for answering research problems/questions pertaining to psycho-social variables. In research, observation must be objective and systematic.
a. consider whether the observer utilizes:
- concealment (whether the subjects know they are being observed or not; remember informed consent)
- intervention (whether the observer provokes an action or not)
b. consider whether the observation is:
- structured (specifying in advance what behaviors or events will be observed and preparing forms for record keeping, such as categorization systems, checklists, and rating scales)
- unstructured (using field notes to record observations)
advantages:
- sometimes observing subjects may give a more accurate picture of the behavior in question than asking them, or may be the only way to gather the data
- suitable in complex situations that are best viewed as total entities and that are difficult to measure in parts
- allows great depth and variety of information to be gathered
disadvantages:
- interaction between the observer and subject can introduce bias or reactivity
- there may be a lot of work in summarizing and interpreting the data
- the more the observer needs to make inferences and judgments about what is being observed, the more likely it is that distortions of the data will occur
4. & 5. Interviews and Questionnaires - ask subjects to report data for themselves.
- Both are ways to gather information such as attitudes and beliefs.
- Questions can be open-ended or close-ended.
open-ended questions:
- used when the researcher wants the subjects to respond in their own words. The researcher may need to use a technique called content analysis to objectively, systematically, and quantitatively translate subjects' responses into usable data.
- used when the researcher may not know all of the possible alternative responses
close-ended questions:
- a fixed number of alternative responses, from which the subjects pick their answer(s)
- Questions can be direct (as in asking the subject's age) or indirect, where the researcher uses a combination of items to estimate to what degree the respondent has some trait or characteristic (combining items to obtain an overall score).
- survey research - relies almost entirely on questioning subjects through either interviews or questionnaires
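The content analysis mentioned above can be sketched as a simple category count. Everything in this example (the responses, the coding categories, the keyword rules) is hypothetical, invented for illustration:

```python
# A minimal content-analysis sketch (hypothetical categories and keyword
# rules, not from the notes): translating open-ended responses into counts.
responses = [
    "I felt supported by the nursing staff",
    "The wait was too long and the room was cold",
    "Staff were kind but the wait was frustrating",
]

# Coding scheme: map each category to keywords that signal it.
categories = {
    "staff_interaction": ["staff", "nurse", "nursing"],
    "wait_time": ["wait"],
    "environment": ["room", "cold", "noise"],
}

counts = {name: 0 for name in categories}
for text in responses:
    lowered = text.lower()
    for name, keywords in categories.items():
        if any(word in lowered for word in keywords):
            counts[name] += 1

print(counts)  # {'staff_interaction': 2, 'wait_time': 2, 'environment': 1}
```

In practice a coding scheme is developed and checked far more carefully (often with multiple raters), but the end product is the same: open-ended text reduced to counts that can be analyzed quantitatively.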
Scales for subject responses on Interviews and Questionnaires:
Scales are tools for the quantitative measurement of the degree to which individuals possess a specific attribute or trait.
1. Dichotomous: the answer must be one of two responses, for example, yes/no.
2. Rank Order Questions (Guttman scale): unidimensional (homogeneous), with graduated intensity. Items are hierarchically arranged so that endorsement of one item implies endorsement of the items below it.
3. Likert Scale: a scaled response, usually with five levels. Can be used to evaluate the amount of agreement, for example:
- strongly agree
- agree
- neither agree nor disagree
- disagree
- strongly disagree
The researcher may be interested in responses to individual questions, or possibly in a total score for the whole questionnaire.
4. Semantic differential: a series of rating scales in which the respondent is asked to give a judgment about something along an ordered dimension, usually of 7 points. Ratings are bipolar in that they specify two opposite ends of a continuum (good-bad). The researcher usually sums the points across all items for a total score for the questionnaire.
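Computing a total questionnaire score from a Likert-type scale can be sketched as below. The items, codes, and the choice of which item is reverse-scored are hypothetical; reverse-scoring negatively worded items before summing is a standard practice, though the notes above do not describe it:

```python
# A minimal Likert-scoring sketch (hypothetical items, not from the notes).
# Responses are coded 1-5 (strongly disagree .. strongly agree); negatively
# worded items are reverse-scored before summing a total questionnaire score.

SCALE_MAX = 5

def score_questionnaire(responses, reverse_items):
    """Sum item scores, reverse-coding the listed item indices."""
    total = 0
    for i, value in enumerate(responses):
        if i in reverse_items:
            value = (SCALE_MAX + 1) - value  # 5 -> 1, 4 -> 2, ...
        total += value
    return total

# One subject's answers to a four-item scale; item 2 is negatively worded.
answers = [5, 4, 1, 4]
print(score_questionnaire(answers, reverse_items={2}))  # 5 + 4 + 5 + 4 = 18
```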
6. Available Data (records):
Not all studies require the researcher to acquire new information. Sometimes existing information can be examined to study a problem. Such data can show trends over time (longitudinal) or comparisons of different groups at the same time (cross-sectional).
advantages:
- may be a much cheaper, faster, and more efficient way to gather data
- with historical data, it may be the only way to "go back in time and get the data", that is, to review the phenomenon in the past or over time
- decreases the problems of reactivity (the Hawthorne effect) and response-set bias; this may be especially true with sensitive issues
disadvantages:
- you cannot be assured that the data are representative and unbiased if you had nothing to do with how the data were collected
Selection of the data collection technique
- should be guided by the nature of the research question
- should be guided by the literature review, where you may gain information about how other researchers have measured your variables of interest
- should be guided by the nature of the sample of subjects being measured
Reliability
- constancy
- reproducibility
- precision
Most measurements that we take have some error. We represent this by conceiving of the observed score as being made up of the true score plus some random error of measurement. Building on this concept, we can describe reliability as the ratio between the true-score variance and the observed-score variance. If we examine this ratio, we can see that if the error is very small, the ratio approaches 1; if the error is very large, the ratio approaches zero. Reliability coefficients range from 0 to 1, with a researcher obviously seeking reliability that approaches 1.
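The "true score plus random error" idea can be illustrated with a small simulation (the means and standard deviations below are assumed, not from the notes): generate true scores, add random measurement error, and estimate reliability as the ratio of true-score variance to observed-score variance:

```python
# A small simulation of the observed = true + error model (all values
# assumed for illustration): reliability is estimated as the ratio of
# true-score variance to observed-score variance.
import random
import statistics

random.seed(1)

n_subjects = 2000
true_scores = [random.gauss(50, 10) for _ in range(n_subjects)]

def observed(true, error_sd):
    """Add independent random measurement error to each true score."""
    return [t + random.gauss(0, error_sd) for t in true]

for error_sd in (2, 10):
    obs = observed(true_scores, error_sd)
    reliability = statistics.variance(true_scores) / statistics.variance(obs)
    print(f"error sd = {error_sd:>2}: estimated reliability = {reliability:.2f}")

# A small error term gives a ratio near 1; a large error term pushes
# the ratio toward zero, matching the description above.
```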
DePoy discusses three forms of reliability.
- stability refers to the consistency of repeated measurements. This test-retest format examines the reliability of an instrument (test) in a situation where the same test is given twice to the same subjects under the same circumstances.
- tests of internal consistency refer to tests of the homogeneity of the items within a test. The two major forms of internal consistency are the split-half coefficient and the alpha coefficient.
- equivalence refers to either inter-rater reliability or the reliability of alternate forms of the same test. Inter-rater reliability refers to a comparison of two observers measuring the same event; this can be measured as a correlation or as a percent of agreement between the observers. Alternate or parallel forms involve the comparison of two versions of the same test: with alternate forms, equivalent forms of one test are administered to the same subjects on the same testing occasion.
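As one concrete example of an internal-consistency measure, here is a sketch of the alpha coefficient (Cronbach's alpha) computed from made-up item responses. The formula used, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is the standard one, though the text above does not give it:

```python
# A sketch of the alpha coefficient (Cronbach's alpha) for internal
# consistency, computed on made-up data. Each row is one subject's
# answers to the k items of a test.
import statistics

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of responses per item
    item_vars = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance(sum(row) for row in rows)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Five subjects answering three items that tend to rise and fall together,
# which is what "homogeneity of the items" looks like in the data.
data = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [1, 2, 1],
    [3, 4, 3],
]
print(f"alpha = {cronbach_alpha(data):.2f}")  # alpha = 0.97
```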
Validity
- Validity refers to how well an instrument measures some underlying concept. The closer the instrument comes to representing the true definition of the concept, the more valid the instrument.
- It is important to note that a scale that is valid is necessarily reliable, but the opposite may not be true. Systematic error makes a measurement invalid.
Three major kinds of validity are discussed in the text:
- content validity addresses the degree to which the measurement reflects the basic content of the phenomenon or domain of interest. In order to establish content validity the researcher should:
- specify the full domain through a literature search
- ensure adequate representation of the domain in the items on the test
- have a panel of experts review each item on the test
- criterion validity involves demonstrating a correlation between the measurement of interest and another standard that has been shown to be an accurate representation of the underlying concept of interest. There are two types of criterion validity:
- concurrent validity involves administering the new instrument, along with an already accepted and validated instrument measuring the same concept, to the same sample of individuals. The correlation between the two measures of the concept represents the validity of the new instrument.
- predictive validity is using the measurement on an instrument to predict or estimate the occurrence of a behavior or event. The correlation between the score on the instrument and the subsequent criterion behavior or event is a measure of this validity.
- construct validity is the most complex and comprehensive form of validation. With construct validity the researcher collects supporting evidence of the relationship of the test instrument to related and distinct variables associated with the construct of interest. The correlation or relationship of the test score to other variables or measures expected to be related to the construct represents the degree of construct validity.
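A concurrent-validity check, as described above, reduces to a correlation between two sets of scores. The sketch below uses made-up scores for a hypothetical new instrument and an established one, with a hand-rolled Pearson correlation:

```python
# A concurrent-validity sketch (made-up scores, hypothetical instruments):
# correlate a new instrument with an already-validated one administered
# to the same subjects on the same occasion.
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: covariance over the product of the spreads."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Scores for eight subjects on the established instrument and the new one.
established = [12, 15, 9, 20, 17, 11, 14, 18]
new_instrument = [30, 36, 25, 48, 41, 28, 33, 44]

r = pearson_r(established, new_instrument)
print(f"concurrent validity (Pearson r) = {r:.2f}")
# A strong positive correlation is taken as evidence that the new
# instrument measures the same underlying concept as the established one.
```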
If you cannot find a data collection or measurement technique, or an instrument, that adequately measures the variable(s) of interest, you may have to develop a newly constructed instrument. This is typically a time-consuming and potentially expensive additional burden for the researcher, who would need to plan on testing the reliability and validity of the instrument. Some investigators combine previously tested and established measures of the variable of interest with the newly constructed ones in order to demonstrate the relationships and relative strengths of the new instrument in their study. Other times, researchers publish a separate article dealing with the reliability and validity of the new instrument, and once that is established in the literature they move forward to conduct research using the new instrument.
DePoy lists 7 steps in instrument construction:
- review the literature for relevant instrumentation
- identify the theory from which to develop a new instrument
- specify the concept or construct to be operationalized into an instrument
- conceptually define the full range and content of the concept
- select an instrument format, or potential type of instrument
- translate the concept into specific items or indicators with appropriate response categories
- test the reliability and validity of the instrument
DePoy also lists some potential sources of error that may reduce the reliability or validity of the instrument:
- format or design of the instrument
- clarity of the instrument
- social desirability of the questions
- variation in administration of the test
- situational contaminants
- response-set biases (all "yes" or all "no" responses)
DePoy also reminds us of the influence of the data collector on the data. The data collector can knowingly or unknowingly bias or confound the data being collected. Also review the terms concealment, intervention, structure, open-ended, direct, and social desirability, which were discussed above; they have a lot to do with how the researcher can influence the data collected.
A concept that contrasts with objectivity and systematic data collection (associated with quantitative research) is the richness of data found with open-ended, unstructured exploration (associated primarily with qualitative research). The data collected in the latter form can be extremely valuable, but they are often more challenging to analyze. The work of determining exactly how to collect and analyze the data is not done beforehand; instead, the challenge of making those decisions continues during and after data collection.
To reiterate, a major distinction between qualitative and quantitative
research is that in quantitative research decisions about theory
and method are made before data collection starts. In qualitative
research the method for data collection may evolve during the study,
and theory is not formulated until after data collection and analysis are
at least underway, if not completed.
copyright, Thomas Bevins (3/14/99)
Sources:
LoBiondo-Wood, G., & Haber, J. (1998). Nursing research: Methods, critical appraisal, and utilization (4th ed.). St. Louis, MO: Mosby.
DePoy, E., & Gitlin, L. (1998). Introduction to research: Understanding and applying multiple strategies. St. Louis, MO: Mosby.