Module 3 Notes
Just a Reminder!
This may be one of the more difficult
modules for you. Do not give up! You will learn this, and the next two
modules will not be nearly as difficult.
In selecting a measurement instrument,
the following are considerations:
Validity:
Does the instrument measure what we think it measures?
Reliability:
Does the instrument measure the same way each time it is used?
Usability:
Is the instrument economical (in money and time) and easy to use (ease of
administration, scoring, interpretation, reporting, application)?
NOTE:
We will cover reliability and usability in future modules.
Validity is the adequacy and appropriateness
of the interpretations made from assessments
-
To what extent will the interpretation
of the scores be appropriate, meaningful, and useful for the intended application
of the results?
-
How well does it fulfill the function for
which it is being used?
-
What are the consequences of the particular
uses and interpretations that are made of the results?
We want evidence that the scores actually
reflect whatever we expect them to measure.
Nature
of Validity
Nature of Validity: Appropriateness
of the interpretation of the results
1. Validity is a matter of degree.
We judge the validity to be high, moderate, or low.
2. Validity is specific to some particular
use or interpretation - we can only answer the question of validity in relation
to a given specific task for a given population of examinees.
3. Validity is a unitary concept - the different kinds of evidence described
below all contribute to a single judgment about validity.
4. Validity is an overall evaluative
judgment.
Types
of Validity
There are five major types of validity.
Your text calls them "Major Considerations in Assessment Validation". These
five types are: content, test-criterion, construct, consequence, and face.
1. Content Validity
-
Performance on a "universe" of items
-
First, you create or locate your list of
behavioral objectives.
-
Second, you set the test next to the objectives
and begin matching them: this test question measures this objective, and
so forth. If the test items can all be linked to a behavioral objective,
this is the first evidence that the validity is high. If there are test
items that cannot be linked to behavioral objectives on your list, or if
there are objectives without a test item matched to them, then the validity
may be moderate or low.
-
Third, count the number of items measuring
each objective. The objectives of greatest importance should have more
items than those of lesser importance. Another rule of thumb: the minimum
number of test items is three per objective. If each objective is measured
by at least three test items, then this is further evidence that the validity is
high.
Questions to Ask:
a) Does the test content parallel the
curricular objectives in content and process?
b) Are the test and curricular emphases
in proper balance?
c) Is the test free from prerequisites
that are irrelevant or incidental to the present measurement task?
d) Is there a logical process linking
Curriculum, Instruction, and Assessment?
Best Evidence -- the Table of Specifications.
This is what you are in the process of creating. You eventually will have
a list of objectives, the level of Bloom's taxonomy addressed by each objective,
the form and type of assessment, the importance of each objective, and
the items on the assessment that measure each objective.
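As a rough illustration of the item-counting step above, here is a minimal sketch in Python. The objective names, the item-to-objective links, and the printout are invented for the example; a real table of specifications would come from your own objectives and assessment.
```python
# Minimal sketch: counting test items per objective, with hypothetical data.
# Objective names and item-to-objective links are invented for illustration.

item_to_objective = {
    1: "define validity", 2: "define validity", 3: "define validity",
    4: "interpret a correlation", 5: "interpret a correlation",
    6: "interpret a correlation", 7: "interpret a correlation",
    8: "build a table of specifications",
}

objectives = {"define validity", "interpret a correlation",
              "build a table of specifications"}

counts = {obj: 0 for obj in objectives}
for obj in item_to_objective.values():
    if obj in counts:
        counts[obj] += 1

for obj, n in counts.items():
    # rule of thumb from the notes: at least three items per objective
    flag = "OK" if n >= 3 else "fewer than 3 items"
    print(f"{obj}: {n} item(s) -- {flag}")
```
In this made-up case, the third objective has only one item, which would count against the content validity of the test.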
2.
Test-Criterion Validity
-
Performance on some criterion
-
First, you find a test that measures the
same thing that your test measures.
-
Second, you give both tests to the same
group or similar groups of people.
-
Third, you correlate the scores on the two tests. This will
give you the concurrent test-criterion validity.
If instead, you hypothesize that your
test will predict future performance at some related activity:
-
First, you identify what behavior / performance
your test predicts (e.g., success in undergraduate programs).
-
Second, you give your test to a group of
people (in this example, high school students).
-
Third, after this group has completed their
first year of college, correlate the test scores with their grades. This
will give you the predictive test-criterion validity.
Best Evidence -- the correlation coefficient
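To make the "correlate the scores" step concrete, here is a minimal sketch in Python. The score values are invented for illustration; in practice you would use your own test scores and the criterion scores (for example, first-year grades).
```python
# Minimal sketch: computing a test-criterion correlation with hypothetical scores.
from statistics import correlation  # Python 3.10+

new_test  = [55, 62, 70, 74, 81, 90]          # scores on your test (invented)
criterion = [2.1, 2.4, 2.9, 3.0, 3.4, 3.8]    # e.g., first-year GPA (invented)

r = correlation(new_test, criterion)  # Pearson correlation coefficient
print(f"test-criterion correlation: r = {r:.2f}")
```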
Go to Exercise 1
What does a correlation look like?
Correlation coefficients range from
+1 to -1. The sign indicates the direction of the relationship. As was
stated earlier, if the scores are moving in the same direction, the regression
line will go from lower left to upper right, and it is a positive relationship;
hence the positive sign (+). If the scores are moving in opposite directions,
the regression line will go from upper left to lower right, and it is a
negative relationship; hence the negative sign (-).
The number indicates the strength of
the relationship. The closer the number is to 1, the stronger the relationship.
In the first example, because all points are on the line, the correlation
would be 1, indicating a perfect relationship. This rarely happens. In
the second example, the correlation would be above .9 because the points
are very close to the line. The further the points are from the line, the
weaker the relationship and the lower the number.
You will be asked to interpret correlations,
and the expectation is that you will address both the direction and the
strength of the relationship. A correct answer for the second example, assuming
I told you the correlation was -.95, would be: "As scores on the Perceived
Anxiety Scale go up, scores on the mid-term exam go down. This is a strong
relationship."
You must (1) use the names of the measures
given, (2) address the direction of the regression line, and (3) judge
the strength of the relationship. You can use the term weak for correlations
below .3, moderate for those falling between .3 and .8, and strong for
those above .8.
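The rule of thumb above can be written out as a small Python sketch. The thresholds follow the notes (weak below .3, moderate between .3 and .8, strong above .8); the function name and example value are my own.
```python
# Minimal sketch of the interpretation rule of thumb described above.
def describe_correlation(r: float) -> str:
    if r > 0:
        direction = "as one score goes up, the other goes up"
    else:
        direction = "as one score goes up, the other goes down"
    size = abs(r)
    if size > 0.8:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{direction}; this is a {strength} relationship"

# e.g., the -.95 example from the notes
print(describe_correlation(-0.95))
```
Remember that a full answer on the exercises must also name the two measures involved, not just the direction and strength.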
Go to Exercise 3
3.
Construct Validity
Construct Validity is the degree to
which certain psychological traits or constructs are actually represented
by the test performance.
What is a construct? It is a psychological
trait that is NOT directly observable, but is believed to exist based on
observable behaviors made in response to the psychological trait.
-
First, define the domain or tasks to be
measured.
-
Second, analyze the mental processes required
by the tasks. For example, people with high anxiety will sweat more, will perform
less efficiently, etc. Then, decide how to measure sweat, performance efficiency,
etc.
-
Third, compare the scores of known groups.
For example, give the test to a group known to be high in anxiety and a
group known to be low in anxiety. Scores should differ greatly between the
groups.
-
Fourth, compare the scores before and after
"treatment". For example, give a treatment known to reduce anxiety and give the test
again. Scores should be lower for those undergoing treatment.
-
Fifth, correlate scores with other measures
that are supposed to measure the same thing.
You may realize that the first and second
steps listed here parallel those for content validity and the third through
fifth steps parallel those of test-criterion validity. Construct validity
does encompass both content and test-criterion validity.
Best Evidence -- Correlation Coefficient
and the Table of Specifications
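Here is a minimal sketch of the known-groups comparison (the third step above) in Python. The anxiety scores for the two groups are invented for illustration.
```python
# Minimal sketch of a known-groups comparison with hypothetical scores.
from statistics import mean

high_anxiety_group = [42, 45, 47, 50, 44]   # people known to be high in anxiety (invented)
low_anxiety_group  = [18, 22, 20, 25, 21]   # people known to be low in anxiety (invented)

diff = mean(high_anxiety_group) - mean(low_anxiety_group)
print(f"mean difference between known groups: {diff:.1f}")

# A large, consistent difference in the expected direction is evidence that the
# scores reflect the construct (here, anxiety) they are supposed to measure.
```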
4.
Consequences
What are the consequences of using
the test results?
High-stakes decisions vs. low-stakes
decisions: high-stakes decisions are those which directly impact the direction of
your future. For example, the tests used to identify students for special
education, college entrance exams, and the High School Competency Test (HSCT)
are considered high-stakes. If you do not pass these, the future of your
life is impacted directly. Classroom tests are NOT high stakes. We never
use the results from one classroom test to determine pass/fail or admittance
to special education.
Intended as well as unintended consequences
must be considered by teachers.
Teachers can judge this validity because
they have the following information. They:
a) know the learning objectives
b) know the learning experiences of their
students
c) have made observations of students
They should consider:
1) Do the tasks match important learning
objectives?
2) Do students study harder in preparation for the assessment?
3) Does the assessment artificially constrain the focus of students' study?
4) Does the assessment encourage or discourage exploration and creativity?
Best Evidence -- Teacher Judgment
5. Face
Validity
Does the test look like it measures
what it is supposed to measure?
Example
If you entered my class for a mid-term
exam and were asked to take out a blank sheet of paper and draw the
face of the person to the left of you, you would question the validity of the
test simply because it does not appear to reflect the content you have
been taught. This is most important
to the person taking the test. Most of the time, the test should have face
validity. There are instances where the test should not have face validity;
for example, when the test is used to determine whether or not you have
a split personality, are a kleptomaniac, or exhibit some other socially unacceptable
behavior.
NOTE:
When a test is evaluated for validity, you should remember that not all
tests will use all five types of validity.
Questions:
-
Which type of validity is most relevant
in the measurement of academic achievement?
-
Which type of validity is most relevant
in the measurement of future success in a related area?
-
Which type of validity is most relevant
in the measurement of a psychological trait?
Factors
in the Instrument Which Influence Validity
1. Unclear directions
2. Reading vocabulary / sentence structure
too difficult
3. Ambiguity
4. Inadequate time limits
5. Inappropriate level of difficulty
of items
6. Poorly constructed test items
7. Inappropriate items for outcomes
being measured
8. Test too short
9. Improper arrangement of items
10. Identifiable pattern of answers
Something to think
about . . .
Are you testing what you intended to test, or are you testing something
else?