This assessment was constructed to measure students' knowledge in the area of measurement. It was designed to be summative in nature and was given as the final assessment of learning for the first half of a semester course. The students were undergraduates, all majoring in some area of education.
Behavioral objectives were created using syllabi from previous instructors of the course, the textbook, and state guidelines for pre-service teachers. The behavioral objectives for the first half of the semester were used to generate items for the mid-term exam, with at least one test item created to measure each behavioral objective. All items were objective in nature, with a combination of multiple choice, short-answer, two-choice, and matching formats. Objective items were used to maximize reliability and objectivity. The type of objective item used for each objective was chosen based on the best match with the behavioral objective and on student input regarding the clarity of the format. Given this match of items to objectives, this instructor believes the validity to be very good [2].
Table 1. Behavioral Objectives Measured by the Mid-Term Exam [3]

1. Given the prompt, list the six levels of Bloom's Taxonomy in order.
2. Given a behavioral objective, identify the level of Bloom's Taxonomy at which the objective is written.
3. Given four behavioral objectives, select the one that is written best.
4. Given the words: assessment, test, measurement, validity, and reliability, match the appropriate definition to each word.
5. Given a characteristic, identify whether the characteristic is one of criterion-referenced or norm-referenced tests.
6. Given the processes: curriculum, instruction, and assessment, identify the best description of the relationship among these processes.
There were two administrations of the test, one at 12:30 pm and one at 5:00 pm, both on the same day. Most students completed the exam within an hour; a few took up to 90 minutes. All students responded to all of the test items. A total of 63 students took the exam: forty-eight (76%) were female and fifteen (24%) were male. All were education majors taking the course either as a requirement for their bachelor's degree or for certification.
Item analysis revealed that the short answer items, the two-choice items, and the matching items were well constructed (see Tables 2, 3, 4, and 6). Students appeared to have little difficulty understanding what was expected of them and little difficulty supplying the correct answer. The first multiple choice item behaved well (see Table 5). On the second multiple choice item, however, students were able to narrow their choice to two of the options provided. This increases the likelihood of successful guessing and is a threat to validity and reliability. This question will need to be rewritten so that students who do not know the answer see all options as equally plausible.
Table 2. Responses to the short answer item asking students to list the six levels of Bloom's Taxonomy in order [4]

Knowledge -- 62; Comprehension -- 1
Comprehension -- 61; Knowledge -- 1
Analysis -- 63
Synthesis -- 63
Evaluation -- 63
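The distractor problem flagged on the second multiple choice item can be made concrete with a simple tally of option choices. The sketch below, written in Python, uses hypothetical item labels, answer keys, and response strings rather than the actual exam data; it illustrates one common way to compute item difficulty and inspect how the distractors are drawing responses.

    # A minimal sketch of a distractor analysis for multiple choice items.
    # Item labels, keys, and response strings are hypothetical; replace them
    # with the actual examinee responses.
    from collections import Counter

    responses = {
        "MC1": {"key": "B", "choices": list("BBABBCBBDBBB")},  # distractors drawing evenly
        "MC2": {"key": "C", "choices": list("CACACCACACCA")},  # responses fall on only two options
    }

    for item, data in responses.items():
        counts = Counter(data["choices"])
        n = len(data["choices"])
        difficulty = counts[data["key"]] / n  # proportion answering correctly
        print(f"{item}: difficulty p = {difficulty:.2f}")
        for option, count in sorted(counts.items()):
            flag = " (key)" if option == data["key"] else ""
            print(f"  option {option}: {count}{flag}")

An item like the hypothetical MC2 above, where every response falls on one of two options, shows the pattern described in the item analysis: the remaining distractors are doing no work, so a student who can eliminate them has a one-in-two chance of guessing correctly.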
The reliability was calculated using Cronbach's alpha, a measure of internal consistency for homogeneous tests in which items may receive partial credit. Due to the complexity of computing this statistic by hand for a large number of items and respondents, a computer software package (SAS 6.12) was used to do the calculations. The reliability was estimated to be 0.85. This indicates a high level of internal consistency and is acceptable for a classroom test [9].
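For readers without access to SAS, the same estimate can be reproduced in a few lines of code. The sketch below, written in Python with a hypothetical score matrix, implements the standard Cronbach's alpha formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); it is an illustration of the computation, not the original SAS analysis.

    # A minimal sketch of the Cronbach's alpha computation. The score matrix
    # below is hypothetical; substitute the actual item scores (one row per
    # examinee, one column per item, partial credit allowed).
    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for a 2-D matrix of item scores."""
        k = scores.shape[1]                         # number of items
        item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
        total_var = scores.sum(axis=1).var(ddof=1)  # variance of examinees' total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical example: 5 examinees x 4 items, with partial credit on item 3.
    scores = np.array([
        [1, 1, 0.5, 1],
        [1, 0, 0.0, 1],
        [0, 0, 0.0, 0],
        [1, 1, 1.0, 1],
        [1, 1, 0.5, 0],
    ])
    print(f"alpha = {cronbach_alpha(scores):.2f}")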
The validity of this instrument was adequate given the match of items to objectives. Validity was further strengthened by clear directions, well-constructed items, the absence of identifiable answer patterns, and adequate time limits.
Reliability was strengthened by the inclusion of an adequate number of items, all of which were objective. The reliability of this instrument was estimated at 0.85, indicating a high level of internal consistency and providing evidence that, if this test were administered a second time to a similar group of students, the results would be similar.
The results of the item analysis revealed that most of the items behaved as expected. The multiple choice items could be improved by writing new incorrect options that are more plausible to those who do not know the correct answer.
Notes
[1] Page one: "INTRODUCTION". Double-space your work (except for tables); section titles should be in all caps and centered.
[2] You must include a statement regarding the assessment of the validity of the instrument.
[3] note to students: I am only including the first eight items as an example from a previously administered midterm. You need to include all items administered. I am only including the two columns for objectives and test items. You can simply include the last version of your test specification table as Appendix A and reference it in the last sentence of the section on Validity.
[4] This is an example of one way to present your analysis of short answer or fill-in-the-blank items.
[5] This is a second example of how to present short answer items.
[6] This is an example of one way to present matching items; you may use "*" to indicate which choices are correct. Notice that you can abbreviate your prompt.
[7] This is an example of how to present multiple choice items.
[8] This is an example of how to present true-false, alternate-choice, or binary-choice items.
[9] You must include a statement regarding the assessment of the reliability of the instrument, whether or not it is acceptable, given the purpose of the instrument.