Module Five
Reliability
Sources of Measurement Error
Characteristics of Reliability
Methods of Testing Reliability
Factors Influencing Reliability Measures
Reliability refers to the consistency of measurement. The measurement results must be shown to be consistent over:

Different Raters -- measurement error enters when there is less than perfect agreement between raters
Different Samples of a Particular Content Domain -- measurement error enters when one sample contains a greater number of tasks that are more familiar to the examinees than the other

Consistency across these conditions is what gives us confidence in the results.
Methods of Testing Reliability

An estimate of reliability refers to a particular type of consistency, and each method below estimates a different type.
1. Test-Retest -- with time lapse; estimate of stability
2. Parallel Forms -- estimate of equivalence
3. Test-Retest with Parallel Forms -- estimate of stability and equivalence
4. Split-Half -- estimate of internal consistency; used when the content on the test is heterogeneous
5. KR-20 or Coefficient Alpha -- KR-20 is used for items scored right or wrong; Alpha is used when items might receive partial credit; estimate of internal consistency; used when the content on the test is homogeneous (see the sketch after this list)
6. Inter-Rater Reliability -- estimate of consistency across raters
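
To make the internal-consistency estimates concrete, here is a minimal Python sketch of how split-half, KR-20, and coefficient alpha can be computed. It assumes a hypothetical NumPy array with one row per examinee and one column per item; the function names and array shape are illustrative, not part of the module.

    import numpy as np

    def split_half(scores):
        # Correlate odd-item and even-item half scores, then apply the
        # Spearman-Brown correction to project full-length reliability.
        scores = np.asarray(scores, dtype=float)
        odd = scores[:, 0::2].sum(axis=1)
        even = scores[:, 1::2].sum(axis=1)
        r_half = np.corrcoef(odd, even)[0, 1]
        return 2 * r_half / (1 + r_half)

    def kr20(scores):
        # KR-20: items scored right (1) or wrong (0) only.
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        p = scores.mean(axis=0)               # proportion passing each item
        total_var = scores.sum(axis=1).var()  # variance of total scores
        return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

    def coefficient_alpha(scores):
        # Cronbach's alpha: generalizes KR-20 to partial-credit items.
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        item_vars = scores.var(axis=0, ddof=1)
        total_var = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

For right/wrong items, KR-20 and coefficient alpha agree (up to the variance convention used), which is why the module pairs them as one method.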
How high should reliability be? It depends on how much error we are willing to tolerate. Assessments used to classify students for special education placement or for other sorting procedures in which the student's future is seriously affected are called high-stakes assessments. For high-stakes assessments we tolerate very little error, so reliability estimates for these tests should be .9 or higher. The assessments produced by test publishers that measure personality, social adjustment, vocational interests, aptitude, and achievement are usually classified as medium-stakes tests, and we will tolerate more error; reliability estimates for these assessments should be .8 or higher.
Any reliability estimate below .5 is evidence that the test results cannot be trusted, and such results should not be used for any purpose.
Remember: validity coefficients were acceptable as low as .3; those were measures of correlation. Reliability coefficients below .5, however, cannot be tolerated.
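
These rules of thumb are simple enough to state as a decision rule. The sketch below is a hypothetical helper (the function name and stakes labels are invented here) that applies the thresholds above.

    def meets_standard(reliability, stakes):
        # Apply the module's rules of thumb for minimum reliability.
        # stakes is "high" (e.g., special education placement) or
        # "medium" (published personality, interest, aptitude, and
        # achievement tests).
        if reliability < 0.5:
            return False  # below .5, results cannot be trusted at all
        minimum = {"high": 0.9, "medium": 0.8}[stakes]
        return reliability >= minimum

    print(meets_standard(0.85, "medium"))  # True: .85 meets the .8 standard
    print(meets_standard(0.85, "high"))    # False: high stakes require .9+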
Factors Influencing Reliability Measures

1. The spread of test scores; the narrower the spread, the lower the reliability will be
2. The objectivity of the scoring procedures; the more subjective the scoring procedures, the lower the reliability will be
3. The time span between measures (tests); when a test is readministered to the same group, the longer the time span, the lower the reliability will be
4. The level of difficulty of the tasks/items; when all items on a test are easy, all are hard, or all are of moderate difficulty, the lower the reliability
5. The ability of the students being measured (tested); if they are all of the same ability (MH, gifted, or average), the lower the reliability
6. The number of tasks/items; the fewer the number, the lower the reliability (see the sketch after this list)
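
The effect of the number of items (factor 6) can be quantified with the standard Spearman-Brown prophecy formula, which projects reliability when a test is lengthened or shortened. The values below are made up for illustration.

    def spearman_brown(r, length_factor):
        # Projected reliability when test length is multiplied
        # by length_factor (2.0 = doubled, 0.5 = halved).
        return length_factor * r / (1 + (length_factor - 1) * r)

    print(spearman_brown(0.60, 2.0))  # doubling a .60-reliable test -> 0.75
    print(spearman_brown(0.60, 0.5))  # halving it -> about 0.43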
To improve reliability:

1. Create a larger spread of test scores
2. Make scoring procedures as objective as possible
3. Shorten the time span between readministrations of the same test (less than 10 days)
4. Create items that vary in difficulty
5. Create a class that is heterogeneous in terms of ability (see the simulation after this list)
6. Create a large number of quality items
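
The spread and ability factors can be seen in a small simulation. Under the classical assumption that an observed score is a true score plus random error, restricting the group to a narrow ability range shrinks the spread of scores and drives the test-retest correlation down. All numbers here are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 10_000
    true_score = rng.normal(50, 10, n)         # true ability, sd = 10
    test_1 = true_score + rng.normal(0, 5, n)  # observed = true + error
    test_2 = true_score + rng.normal(0, 5, n)  # independent retest error

    def retest_r(keep):
        return np.corrcoef(test_1[keep], test_2[keep])[0, 1]

    everyone = np.full(n, True)
    narrow = np.abs(true_score - 50) < 5       # same-ability subgroup

    print(f"full group:        r = {retest_r(everyone):.2f}")  # about .80
    print(f"homogeneous group: r = {retest_r(narrow):.2f}")    # about .24

The same mechanism explains why a test can look far less reliable when it is given only to a homogeneous group (all gifted or all of average ability) than when it is given to the full range of students.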
Readings
Chapter 4, "Reliability and Other Desired Characteristics," from Linn, R. L., & Gronlund, N. E. (1995). Measurement and assessment in teaching. Englewood Cliffs, NJ: Merrill.