Module Six -- Published, Standardized & Norm-Referenced Assessments
Purposes / Advantages of Achievement Tests
Purposes / Advantages of Classroom Tests
Separate Achievement Tests
Intelligence tests
Cautions in Interpreting Test Scores
Individually Administered tests
Administering the test
During the test administration
Improving test taking skills
Purposes for testing
Should NOT be used as the ONLY criteria for
Types of Test Scores
Cautions in Interpreting Any Test Score
2) Directions for administering and scoring are so precisely stated that the procedures are standardized for different users of the test
3) Norms are based on national samples of students
4) Parallel and comparable forms are usually provided
5) Test manual and other accessory materials are included
2) Evaluating student progress during the school year or over a period of years
3) Determining students' relative strengths and weaknesses in general subject or skill areas
4) Comparing students' general level of achievement with their peers
2) Evaluating students' day-to-day progress and their achievement on work units of varying sizes
3) Evaluating knowledge of current developments in rapidly changing content areas such as science and social studies

NOTE: You need to be clear about the advantages of each type of test. Classroom tests are better for some purposes while published, standardized, norm-referenced tests are better for other purposes.
Advantage: More closely matches district curriculum frameworks and learning outcomes

Why don't districts choose to administer achievement tests from different publishers?

Disadvantage: Norming groups will vary, so you could not compare the reading scores to
the math scores, because the same norming group did not take both tests
(also, it is very expensive)
Some districts may send their objectives
(ex: Lee County might send their Curriculum Frameworks) to a test publisher
and ask that a test be especially designed for their students.
Advantage: More closely matches district curriculum frameworks and learning outcomes

Why don't districts choose to have a publisher create a test especially designed for them?

Disadvantage: NO norming group will exist; comparisons can only be made within the district
(also, it is very expensive)
APTITUDE TESTS
2) can be used with students of varying backgrounds
3) can be used when students have had no training in that area

Intelligence tests (usually have a . . .)
2) standard deviation of 10, 15, or 16
3) standard error of between 3 and 7
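
The standard deviation and standard error figures above can be made concrete with a short sketch. It shows how the same relative standing (one standard deviation above the mean) turns into different reported scores depending on the test's standard deviation, and how a standard error turns one score into a band. The mean of 100, the standard error of 5, and the function names are assumptions chosen only for illustration; they do not come from these notes or from any particular test.

```python
def scaled_score(z, mean=100, sd=15):
    """Convert a z-score to a test's reporting scale.

    mean=100 is an assumed value for illustration only.
    """
    return mean + z * sd


def score_band(observed, standard_error):
    """Band of one standard error on either side of the observed score."""
    return (observed - standard_error, observed + standard_error)


# A student one standard deviation above the mean (hypothetical).
z = 1.0

# The same standing reported on scales with SD 10, 15, and 16.
scores_by_sd = {sd: scaled_score(z, sd=sd) for sd in (10, 15, 16)}

# Band around the SD-15 score, using an assumed standard error of 5.
band = score_band(scores_by_sd[15], standard_error=5)

print(scores_by_sd)
print(band)
```

Because the three scales report different numbers for the same student, a score only has meaning once the test's standard deviation is known, and the band is a reminder that the single number is an estimate.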
Binet was the first person known to have been commissioned by his country
(France) to create a test that would identify children who would benefit
from traditional schooling. He called this test an intelligence test, and
it worked very well for that purpose. Those who scored high on his test
were labeled intelligent and sent to regular school. Those who scored low
on his test were labeled mentally retarded and sent somewhere else. This
is still the main purpose of intelligence tests. They were not, and still
are not, designed to measure intelligence as we define it in modern
society. They were and still are designed to predict future success in
traditional school settings and, overall, they do this job very well.
2) seek the causes of low scores
3) verify test results by comparison with other information
4) use the test results to improve learning / teaching
5) be cautious in identifying students as under (or over) achievers --- these terms are used much too frequently
6) use of single score (overall score for school success) versus separate scores for verbal, non-verbal, etc.
Culture Fair Tests have not lived up to expectations, but they have helped initiate conversations about the complexity of the problems involved in assessing students who are not from the dominant culture. There has been a realization that there are many more cultures than there are languages or countries of origin, and that variability exists within cultures. A test made "fair" for one culture may not be "fair" for another culture. One test cannot be made "fair" for ALL cultures, nor can enough tests be created for every culture. We need to acknowledge that these students need multiple assessments conducted in a variety of ways in order to assess their aptitude (ability to learn / succeed). We also need to be clear that we are more often than not trying to predict how well a student will perform in a traditional classroom situation, and that this prediction may or may not be accurate for what, how much, or how well they can learn.

TEST SELECTION, ADMINISTRATION, AND USE
1) Buros Institute of Mental Measurements = Mental Measurements Yearbooks;
2) appraising the role of published tests in relation to other measurement procedures and to the constraints of the school situation
3) locating suitable tests
4) obtaining sample items of the tests
5) reviewing test materials in reference to their intended use
   b) qualifications needed to administer and interpret the test
   c) evidence of validity for each recommended use
   d) evidence of reliability for recommended uses and an indication of equivalence for any equivalent forms provided
   e) directions for administering and scoring the test
   f) adequate norms or other bases for interpreting the scores
6) evaluating the information and making a selection; may want to use evaluation form

Administering the test
2) select a suitable place to administer the test
3) make provisions to prevent distractions
4) practice giving the test
5) motivate the students

During the test administration
2) encourage the students to do their best
3) keep a record of any event during the administration that might impact test scores
4) collect test materials promptly

Following #'s 1-4 is what makes a test standardized. It is important that all students taking the test experience the same conditions. If conditions vary, it will affect the reliability of the test scores and the validity of the inferences made.

Improving test taking skills
2) teach test-taking strategies

Purposes for testing
   b) identifying areas of instruction needing greater emphasis
   c) identifying discrepancies between learning ability and achievement
   d) diagnosing learning errors and planning remedial instruction
   e) clarifying and selecting instructional objectives
2) Individualizing instruction
Should NOT be used as the ONLY criteria for
2) Assignment to remedial programs
3) Retention or Promotion
4) Evaluating teacher effectiveness = leads to teaching to the test

These are stated in the manuals for published, standardized, norm-referenced achievement tests.

INTERPRETING TEST SCORES AND NORMS

Scores from achievement, aptitude, attitude, and psychological measures do not have a true zero point. This means that even when a person cannot answer any of the items or questions correctly, we do not interpret that to mean the person has no knowledge of math or no intelligence or no anxiety. The range of possible scores on any of these tests starts above zero. No person can earn a score of zero on any of these tests. Test scores cannot be compared unless the norming group and the scale on which the score is based are taken into consideration.
Are there enough items for each skill tested? (minimum 3 per objective)
What is the difficulty level of the items? (should be average = .5)
What type(s) of items are used?
What is the match of items to objectives?

ASK
Are the test norms representative? (look at demographics)
Are the test norms up to date? (updated every 5 years or less)
Are the test norms comparable?
Are the test norms adequately described?
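
The first two item-level questions above (at least 3 items per objective, average difficulty near .5) can be checked with a small sketch like the one below. It treats item difficulty as the proportion of students who answered the item correctly. The response data, the objective labels, and the function names are all hypothetical and exist only for illustration.

```python
from collections import Counter


def item_difficulty(responses):
    """Proportion of students answering each item correctly.

    responses: one list per student, with 1 = correct and 0 = incorrect.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def objectives_with_too_few_items(item_objectives, minimum=3):
    """Objectives measured by fewer than `minimum` items."""
    counts = Counter(item_objectives)
    return sorted(obj for obj, n in counts.items() if n < minimum)


# Hypothetical data: 4 students answering 5 items.
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
# Hypothetical objective measured by each of the 5 items.
item_objectives = ["fractions", "fractions", "fractions", "decimals", "decimals"]

difficulties = item_difficulty(responses)
average_difficulty = sum(difficulties) / len(difficulties)
short_objectives = objectives_with_too_few_items(item_objectives)

print(difficulties)
print(round(average_difficulty, 2))
print(short_objectives)
```

In this made-up example the average difficulty is close to .5, but one objective is covered by only two items, so scores for that objective would rest on too little evidence.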
Standard Scores -- distance of student's raw score from the mean (average) in terms of standard deviations; used to monitor growth
Normal Curve Equivalent (NCE) -- have a mean of 50 and a standard deviation of 21.06; used to describe group performance and to show growth over time
Percentile Rank -- student's relative position in a group in terms of the percentage of students scoring lower; used to determine relative areas of strengths and weaknesses
Stanines -- normal distribution is divided into nine parts; used to identify relative areas of strengths and weaknesses

Cautions in Interpreting Any Test Score
2. A test score should be interpreted in light of all of the student's relevant characteristics.
3. A test score should be interpreted according to the type of decision(s) to be made.
4. A test score should be interpreted as a band of scores rather than as a specific score.
5. A test score should be verified by supplementary evidence.
6. Do NOT interpret a grade equivalent score as an estimate of the grade where a student should be placed.
7. Do NOT assume that the units are equal at different parts of the scale.
8. Do NOT assume that scores on different tests are comparable.
9. Do NOT interpret extreme scores as dependable estimates of a student's performance.
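
The four score types defined above are all re-expressions of the same standard (z) score, which the following sketch illustrates. It assumes the norm group's scores are normally distributed, and it uses the usual stanine convention of a mean of 5 and a standard deviation of 2 (the notes only say the distribution is divided into nine parts). The raw score, norm-group mean, and norm-group standard deviation are hypothetical, as are the function names.

```python
import math
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Distance of the raw score from the mean, in standard deviations."""
    return (raw - mean) / sd


def nce(z):
    """Normal Curve Equivalent: mean 50, standard deviation 21.06."""
    return 50 + 21.06 * z


def percentile_rank(z):
    """Percentage of a normally distributed norm group scoring lower."""
    return 100 * NormalDist().cdf(z)


def stanine(z):
    """Stanine (assumed convention: mean 5, SD 2), limited to 1 through 9."""
    return max(1, min(9, math.floor(2 * z + 5 + 0.5)))


# Hypothetical student: raw score 65 on a test whose norm group
# has a mean of 50 and a standard deviation of 10.
z = z_score(65, mean=50, sd=10)

print(z)
print(round(nce(z), 2))
print(round(percentile_rank(z), 1))
print(stanine(z))
```

The units differ across these scales, which is one reason for caution 7: a gain of a few percentile points near the middle of the distribution corresponds to a much smaller change in z-score than the same gain near the extremes.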