CHAPTER 14 - ASSEMBLING, ADMINISTERING, AND APPRAISING CLASSROOM TESTS AND
ASSESSMENTS
NOTE: This is for both Objective Tests and Performance Assessments
GOAL: To obtain valid evidence of student learning
Validity is built in by matching behavioral objectives to test items / performance assessment rubrics.
Validity can be further increased or decreased by the assembly, administration, and scoring procedures.
To evaluate an item or task during assembly:
- Set the items aside for a few days, then review them. As stated before, the best sequence of events is: first, identify your instructional goal and set of behavioral objectives; second, create your assessment tool; third, deliver instruction; and fourth, administer the assessment. If you have done these activities in this order, then you can review the test just prior to administering it and you will, in fact, have waited at least a few days between initially writing the items and proofing them.
- Ask a peer (fellow teacher) to review and critique the test items. The peer should be someone knowledgeable regarding the content you are assessing. It may be easier to arrange this if you offer to proof their assessment in return.
Guidelines
1. Is the format appropriate for the behavioral objective being measured?
2. Does the knowledge, understanding, or thinking skill called forth by the item or task match the specific behavioral objectives and subject matter being measured?
3. Is the point of the item or task clear?
4. Is the item or task free from excessive verbiage?
5. Does the item have an answer that would be agreed upon by experts? How well would experts agree about the degree of excellence of task performance?
6. Is the item or task free from technical errors and irrelevant clues?
7. Is the item or task free from racial, ethnic, social, and gender bias?
8. Is the formatting effective / not distracting?
Arranging the test items
1) Like items should be grouped together
a) by item type (short answer, matching, binary choice, or multiple choice)
b) by behavioral objective or content area measured
2) Arrange items in ascending order of difficulty or complexity
3) Do not present both performance tasks and objective test items in the same administration (this is strictly an issue of time: is there sufficient time to do both? Nearly always, the answer is no.)
Preparing Directions
1) Should be written; do not rely on students' auditory memory
2) Sections to be included:
a) Purpose of the test or assessment [optional]
b) Time allowed for completion [required]
c) Basis for responding (for example, only one right answer or more than one) [required]
d) Procedure for recording the responses (on bubble sheets or on the test itself) [required]
e) What to do about guessing [optional]
Administering the Test / Assessment
Do not increase anxiety
Allow plenty of time (Rule of Thumb: for objective test items, allow one minute per item; for performance assessments, add 5-10 minutes to however much time the performance took during practice; see the sketch after this list)
Provide a quiet, well-lit, comfortable area with adequate space
Walk around the room. This discourages cheating and makes you more accessible to those who have questions.
For Performance Assessments: Provide students with the rubric you will use to score their performance, and encourage them to use the same instrument (ex: piano), tool (ex: hammer), or equipment (ex: tennis racket) that they practiced with. You put them at an unfair disadvantage if you select the instrument, tool, or equipment for them and inadvertently hand them one they have never used.
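For a rough schedule check, here is a minimal sketch of the time rule of thumb above. The function names and the default 10-minute buffer are illustrative only, not something prescribed by the handout.

```python
# A minimal sketch of the time rule of thumb above; function names and the
# default 10-minute buffer are illustrative, not prescribed by the handout.

def objective_test_minutes(num_items, minutes_per_item=1.0):
    """Rule of thumb: about one minute per objective item."""
    return num_items * minutes_per_item

def performance_assessment_minutes(practice_minutes, buffer_minutes=10.0):
    """Rule of thumb: practice time plus a 5-10 minute buffer."""
    return practice_minutes + buffer_minutes

print(objective_test_minutes(50))              # 50 items -> about 50 minutes
print(performance_assessment_minutes(20, 5))   # 20-minute practice -> about 25 minutes
```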
Scoring the test
Score the test according to pre-determined rules / rubrics
Check for reliability (consistency) of scoring
Correct for guessing when necessary (a sketch of the usual correction formula follows this list)
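The handout does not spell out a correction formula. The one most commonly used subtracts a fraction of the wrong answers: corrected score = rights - wrongs / (options per item - 1), with omitted items left out of the count. A minimal sketch, assuming a standard multiple-choice format:

```python
# A minimal sketch of the common correction-for-guessing formula,
# score = rights - wrongs / (options - 1). The handout does not prescribe a
# particular formula, so treat this as one conventional choice; omitted items
# are not counted as wrong.

def corrected_score(rights, wrongs, options_per_item):
    """Classic correction for guessing; omits are excluded from `wrongs`."""
    if options_per_item < 2:
        raise ValueError("Each item needs at least two options.")
    return rights - wrongs / (options_per_item - 1)

# Example: 40 right, 8 wrong, 2 omitted on a 4-option multiple-choice test.
print(corrected_score(rights=40, wrongs=8, options_per_item=4))  # about 37.33
```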
Analyzing Student Responses
Did the item function as it was intended to?
- Can help evaluate the effectiveness of each item and of the test as a whole
- Will enable the instructor to build an item bank (test file)
- Provides a basis for remedial work
- Provides a basis for improved classroom instruction
How are responses analyzed?
- Create a matrix with students' names in the first column and their response to each item in the following columns. If you have 50 questions, you will have one column for each of the 50 questions. FOR PERFORMANCE ASSESSMENT: The first column is the same; the following columns are for the different aspects or sections of your rubric, with one column per rater for each aspect or section. If you have used three raters, you will need three columns for each aspect or section of the rubric, and in these columns you record the ratings received.
- Mark all responses that are incorrect. A highlighter pen works well for this. FOR PERFORMANCE ASSESSMENT: Mark all responses that do not agree.
- Look down each column. Count the number of correct responses and divide by the number of students taking the exam. If this proportion equals or exceeds 80%, fine. If not, review the item to decide whether you covered the material adequately. If you suspect the problem lies with your instruction or with the item itself, consider throwing the item out; either way, change the item or your instruction before administering the assessment again. FOR PERFORMANCE ASSESSMENTS: Count the number of agreements and divide by the total number of responses. The 80% rule still applies. (The sketches after this list walk through these counts.)
- Again looking down each column, count the number of times each option is selected. If the incorrect responses are spread fairly evenly across the wrong options, the item is fine. If the incorrect responses all fall on the same wrong option, your distractors are not working well and the item will need revision before the assessment is administered again. FOR PERFORMANCE ASSESSMENTS: Areas (aspects or sections) that receive low agreement, less than 80%, indicate that the descriptor for that area is unclear and needs revision.
- Calculate the reliability. If the estimated reliability is below .5, the instrument needs revision. If it is between .5 and .8, examine whether the reliability can still be improved. If it is above .8, there is cause to celebrate! FOR PERFORMANCE ASSESSMENTS: Your reliability is your rater agreement averaged across all aspects or sections. Do not tolerate less than .8. (Both reliability calculations are illustrated in the sketches after this list.)
- You can calculate the discriminating power of your items, but this is of greater concern with standardized, norm-referenced tests than with classroom tests. I do not require this to be done for the project. FOR PERFORMANCE ASSESSMENTS: This cannot be done.
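To make the objective-test steps above concrete, here is a minimal sketch of the student-by-item matrix, the per-item difficulty check against the 80% rule, the distractor counts, and a reliability estimate. The answer key, student names, and responses are made-up data, and KR-20 is used as one common internal-consistency estimate; it is not the only defensible choice.

```python
# A minimal sketch of objective-test item analysis: student-by-item matrix,
# per-item difficulty (80% rule), distractor counts, and a KR-20 reliability
# estimate. The key and responses below are hypothetical data.

from collections import Counter
from statistics import pvariance

key = ["B", "D", "A", "C"]                       # hypothetical answer key
responses = {                                    # hypothetical student responses
    "Ann":  ["B", "D", "A", "C"],
    "Raj":  ["B", "D", "B", "C"],
    "Mia":  ["C", "D", "A", "C"],
    "Theo": ["B", "A", "B", "D"],
}

n_students = len(responses)
n_items = len(key)

# 0/1 matrix: one row per student, one column per item (1 = correct).
scored = {name: [int(ans == k) for ans, k in zip(row, key)]
          for name, row in responses.items()}

# Per-item difficulty (proportion correct) and distractor counts.
for item in range(n_items):
    p = sum(scored[name][item] for name in scored) / n_students
    picks = Counter(responses[name][item] for name in responses)
    verdict = "OK" if p >= 0.80 else "review the item or the instruction"
    print(f"Item {item + 1}: difficulty {p:.2f} ({verdict}); options chosen {dict(picks)}")

# KR-20 reliability: (k / (k - 1)) * (1 - sum(p * q) / variance of total scores).
totals = [sum(row) for row in scored.values()]
p_values = [sum(scored[name][i] for name in scored) / n_students for i in range(n_items)]
sum_pq = sum(p * (1 - p) for p in p_values)
kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / pvariance(totals))
print(f"KR-20 reliability estimate: {kr20:.2f} (aim for .80 or higher)")
```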
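For the performance-assessment version, here is a similar sketch of per-aspect rater agreement averaged into a single reliability figure. The rubric aspects, raters, and ratings are invented, and "agreement" is read here as the share of ratings that match the most common rating a student received on that aspect; the handout does not fix an exact counting rule, so treat this as one reasonable interpretation.

```python
# A minimal sketch of rater-agreement reliability for a performance assessment.
# Aspects, raters, and ratings are made-up; agreement is counted as the share
# of ratings matching the modal rating a student received on each aspect.

from collections import Counter

# ratings[aspect][student] is the list of scores given by the three raters.
ratings = {
    "Tone":      {"Ann": [3, 3, 3], "Raj": [2, 3, 2], "Mia": [4, 4, 4]},
    "Rhythm":    {"Ann": [2, 2, 3], "Raj": [3, 1, 2], "Mia": [4, 4, 3]},
    "Technique": {"Ann": [3, 3, 3], "Raj": [2, 2, 2], "Mia": [4, 4, 4]},
}

aspect_agreement = {}
for aspect, by_student in ratings.items():
    agree, total = 0, 0
    for scores in by_student.values():
        modal_score, modal_count = Counter(scores).most_common(1)[0]
        agree += modal_count          # ratings matching the most common rating
        total += len(scores)
    aspect_agreement[aspect] = agree / total

for aspect, prop in aspect_agreement.items():
    verdict = "OK" if prop >= 0.80 else "descriptor unclear; revise"
    print(f"{aspect}: agreement {prop:.2f} ({verdict})")

reliability = sum(aspect_agreement.values()) / len(aspect_agreement)
print(f"Average agreement (reliability): {reliability:.2f} (do not tolerate less than .80)")
```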
NOTE: Computer software is available that will do this for you for objective tests recorded on bubble sheets; it is often available in the larger high schools.
Remember!
Classroom assessments can be improved by using simple methods to analyze student responses and by building a file of effective items and tasks.