Module Seven -- Interpretation & Communication of Assessment Results
Module 7 Notes 

   



 

Interpreting Scores and Norms 

    Scores from achievement, aptitude, attitude, and psychological measures do not have a true zero point. This means that, in both theory and practice, when a student misses every item on such a test, we do not interpret this to mean the student has no ability, knowledge, or skill in that area. For example, if a person answers none of the items correctly on an intelligence test, we do not say that person has no intelligence. Likewise, if a student misses every question on the math subtest of an achievement test, we do not say that student has no ability, knowledge, or skill in math. As a third example, if a person scores at the very bottom of an anxiety measure, we do not interpret that to mean the person has no anxiety about anything ever. 

    Scores from achievement, aptitude, attitude, and psychological tests cannot be compared directly with each other unless the norming group and the scale on which each score is based are taken into consideration. We addressed this issue in the earlier modules on aptitude and achievement tests.


Methods of Interpreting Test Scores 
 

    1. Criterion-Referenced Interpretation 
     
     
        Based on mastery of a specific set of skills 
     
      ASK 
       
        Are the achievement domains clearly defined? 
        Are there enough items for each skill tested? 
        What is the difficulty level of the items? 
        What type(s) of items are used? 
        What is the match of items to objectives?
     

    2. Norm-Referenced Interpretation 
     
     

        Based on comparison of individuals to clearly defined groups (called norming groups) 
     
      ASK 
       
        Are the test norms relevant? 
        Are the test norms representative? 
        Are the test norms up to date? 
        Are the test norms comparable? 
        Are the test norms adequately described?

Types of Test Scores and Defined Purpose 

    Raw scores -- the number of items answered correctly or the number of points earned; not of much use by themselves 

    Grade Equivalent scores -- the grade group in which the student's raw score is the average; used to estimate or monitor growth 

    Standard scores -- express the distance of a student's raw score from the mean (average) in standard deviation units; used to monitor growth; better at reflecting reality than grade equivalent scores 

    Normal Curve Equivalent scores -- normalized standard scores; used to avoid the problems with grade equivalent scores, to describe group performance, and to show growth over time 

    Percentile Ranks -- the student's relative position in a group, expressed as the percentage of students scoring lower than or equal to that student; used to determine relative areas of strength and weakness; profile analyses can be created from these scores

    National Stanines -- the normal distribution divided into nine parts; used to identify relative areas of strength and weakness 

    Scale Scores -- scores on an arbitrarily set common scale; used to measure students' progress across grades in a subject 
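The percentile-rank definition above ("percentage of students scoring lower than or equal to that student") can be sketched in a few lines of Python. The group scores here are invented purely for illustration:

```python
def percentile_rank(scores, student_score):
    """Percentile rank: percentage of students in the group scoring
    lower than or equal to the given student's score."""
    at_or_below = sum(1 for s in scores if s <= student_score)
    return 100 * at_or_below / len(scores)

# hypothetical group of ten raw scores
group = [55, 60, 60, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(group, 75))  # 50.0 -- as well as or better than half the group
```

Note that some test publishers define percentile rank using "lower than" only, or using half of the students at exactly the score (as in the scale-score section below); the sketch follows the definition given here.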
     

    You will need to create the following seven figures freehand and fax or mail them to me.
    They are not acceptable if they are photocopied, scanned, or otherwise copied.
 
 
Figure 1. Cumulative Percentages Rounded
 
 
 
 
    Note that the mean is 50%, the range is 1% to 99%, and the standard deviation varies at different points along the scale. What is considered average for these scores lies between -1 s and +1 s. This symbol s is the symbol for standard deviation. The cumulative percentages rounded which fall within the average range are between 16% and 84%. 
     

Interpretation of Cumulative Percentages 

    If a student scores 52%, then the student has performed as well as or better than 52% of his/her peers. We can calculate the cumulative percentage if we know the standard score, the percent of people who earned less than that standard score and the number of people who earned exactly that standard score. 
     
 
 
Figure 2. Percentile Equivalents
 
 
 
    Note that the mean is 50%, the range is 1% to 99%, and the standard deviation varies at different points along the scale. What is considered average for these scores lies between -1 s and +1 s. This symbol s is the symbol for standard deviation. The percentile equivalent scores which fall within the average range are between 16% and 84%.
  
 
Figure 3. Z-scores
 
 
 
 
    Note that the mean is 0, the range is infinite, and the standard deviation is 1. What is considered average for these scores lies between -1 s and +1 s. This symbol s is the symbol for standard deviation. The Z-scores which fall within the average range are between -1.0 and +1.0.
 
 
Figure 4. T-scores
 
 
 
    Note that the mean is 50, the range is infinite, and the standard deviation is 10. What is considered average for these scores lies between -1 s and +1 s. This symbol s is the symbol for standard deviation. The T-scores which fall within the average range are between 40 and 60.
 
 
 
Figure 5. Normal Curve Equivalent (NCE) scores
 
 
 
    Note that the mean is 50, the range is 1 to 99, and the standard deviation is 21.06. What is considered average for these scores lies between -1 s and +1 s. This symbol s is the symbol for standard deviation. The NCE scores which fall within the average range are between 28.94 (rounded to 29) and 71.06 (rounded to 71).
 
 
 
 
Figure 6. Stanines
 
 
 
 
    Note that there is no true mean, the range is 1 to 9, and these scores do not parallel the standard deviations. What is considered average for these scores lies between -1 s and +1 s. This symbol s is the symbol for standard deviation. The stanines which fall within the average range are between 4 and 6.
 
 
 
Figure 7. Deviation IQ scores
 
 
 
 
    Note that the mean is 100, the range is approximately 25 to 175, and the standard deviation is 15. What is considered average for these scores lies between -1 s and +1 s. This symbol s is the symbol for standard deviation. The deviation IQ scores which fall within the average range are between 85 and 115. 
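Because each of these scales is defined by a fixed mean and standard deviation, a z-score can be converted to any of the other standard-score scales with the same arithmetic. A minimal sketch, using the means and standard deviations given in the figure notes above:

```python
def z_to_scale(z, mean, sd):
    """Convert a z-score to a scale defined by a fixed mean and standard deviation."""
    return mean + z * sd

z = 1.0  # one standard deviation above the mean
print(z_to_scale(z, 50, 10))     # T-score: 60.0
print(z_to_scale(z, 100, 15))    # deviation IQ: 115.0
print(z_to_scale(z, 50, 21.06))  # NCE: approximately 71.06
```

This also shows why stanines are the odd one out in the figures above: they are a nine-band grouping of the distribution rather than a direct linear rescaling of z.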

    Standard scores are called standard because they have a constant mean and a constant standard deviation. According to this definition, which of the seven scores above are standard scores?


Cautions in Interpreting Any Test Score 

    1. A test score should be interpreted in terms of the specific test from which it was derived. 

    2. A test score should be interpreted in light of all of the student's relevant characteristics. 

    3. A test score should be interpreted according to the type of decision to be made. 

    4. A test score should be interpreted as a band of scores rather than as a specific score. 

    5. A test score should be verified by supplementary evidence. 

    6. Do NOT interpret a grade equivalent score as an estimate of the grade where a student should be placed. 

    7. Do NOT assume that the units are equal at different parts of the scale. 

    8. Do NOT assume that scores on different tests are comparable. 

    9. Do NOT interpret extreme scores as dependable estimates of a student's performance.

    Test publishers provide a variety of ways of presenting results. The figures in your text are just a few of the presentations possible. 

    Look at Figures 17.3 and 17.4 in your text (pages 461 and 463). Notice that each score is presented as a band rather than a point. This is much more accurate and clearly communicates the approximate amount of error the publishers have found in their test. The band is an estimate based on the standard error of measurement. The "true" score, the student's true ability in this area, will fall somewhere within the band. It may be near the top of the band, in the middle of the band, or near the bottom of the band. We do not know exactly where the true score lies, but we assume that the true score for that person lies somewhere within the band.  

    Remember: people create tests, people are not perfect, and so tests are not perfect. All tests contain some error. The length of the band reflects this error: the longer the band, the more error. Recall from the module on reliability that error reduces reliability. Therefore, the longer the band, the more error, and the less reliable the test. 

     The band is also helpful in determining true differences between scores. Remember that we do not know exactly where the "true" score lies for this student, but we know it lies somewhere within the band. Using the student profile on page 461 of your text, determine the range of scores for this student in the areas of numerical reasoning and perceptual speed & accuracy. Do you see that the scores for both could be exactly the same, depending on where each "true" score lies? Now compare numerical reasoning with abstract reasoning: those bands do not overlap, so the scores cannot be the same. There are therefore real differences between the student's scores on numerical reasoning and abstract reasoning. In other words, when the bands share common area along the continuum, the scores cannot be interpreted as being really different from each other. Only when the bands do not share common area can the scores be interpreted as being really different from each other.
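The overlap rule can be sketched in Python. This sketch assumes each band extends one standard error of measurement (SEM) on either side of the reported score; publishers may construct their bands differently, and the scores and SEM below are hypothetical:

```python
def bands_overlap(score_a, score_b, sem):
    """Return True if two score bands (each score +/- one SEM) share
    common area, meaning the scores cannot be called really different."""
    a_low, a_high = score_a - sem, score_a + sem
    b_low, b_high = score_b - sem, score_b + sem
    return a_low <= b_high and b_low <= a_high

# hypothetical scores with an SEM of 3 points
print(bands_overlap(52, 55, 3))  # True  -> do not interpret as really different
print(bands_overlap(40, 55, 3))  # False -> a real difference between the scores
```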

    Using the figures you have created and figure 17.3 in your text (page 461), answer the following:  

    Is this student's score for verbal reasoning: 

    1. above average 
    2. average 
    3. below average 

    Is this student's score for abstract reasoning: 

    1. above average 
    2. average 
    3. below average 
     

  E-mail me if you have questions about this. It will be on the final exam. 



 

Interpreting Scale Scores 

    Scale scores vary from test to test and from grade to grade within the same test. The range, standard deviation, and mean vary by test, subtest, and grade. Scale scores are very often reported, and they can be converted to a cumulative frequency at midpoint, which in turn can be converted to a percentile rank, which is much easier to interpret. 

    Given a scale score, the number of students scoring below it, and the number of students earning exactly that scale score, the cumulative frequency at midpoint can be calculated. The cumulative frequency at midpoint is defined as the number of students who earned scale scores below a given score plus one half of the number of students who earned exactly that score. 

    For example, if we know that 36 students earned scale scores lower than 400 and 6 students earned scale scores of exactly 400, then we take one half of 6 and add that to 36, and we know that the cumulative frequency at midpoint for a scale score of 400 is 39. If we then divide that by the number of students who took the test, we have the percentile rank. Given that 50 students took the test, we divide 39 by 50 and obtain a percentile rank of 78. Now we know that this student has performed as well as or better than 78% of his/her peers. We would also say that this student is average. 
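The arithmetic for this conversion can be sketched as a small Python function, using the counts from the example (36 students below the score, 6 at exactly the score, 50 tested in all). Note that half of 6 added to 36 gives 39, so the percentile rank works out to 78:

```python
def percentile_from_scale_score(n_below, n_at, n_total):
    """Cumulative frequency at midpoint = students below the scale score
    plus half of the students at exactly that score; dividing by the
    total number tested gives the percentile rank."""
    cum_freq_midpoint = n_below + n_at / 2
    return 100 * cum_freq_midpoint / n_total

print(percentile_from_scale_score(36, 6, 50))  # 78.0
```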

Readings 
 
     
    Chapter 17, Interpreting Test Scores and Norms 

    from Linn, R. L., & Gronlund, N. E. (1995). Measurement and assessment in teaching. Englewood Cliffs, NJ: Merrill. 

 
 
Last updated December 1998 by CF&MD staff.  
Copyright 1999 FGCU, All rights reserved.
 
  
Florida Gulf Coast University  
College of Professional Studies  
School of Education