Assessment Quality

=Quality=

To ensure that an assessment strategy will provide accurate information, the technical quality of the measures should be considered. Three aspects of assessment quality are of special concern:

Reliability:
How accurate is the information?

Validity:
Does the assessment measure what it is intended to measure?

Fairness:
Is the assessment free of biases against any group of students?

==Reliability==
There are no perfect measuring tools, whether in science, in the kitchen, or in education, so people who use tools to measure things need to know how much error is likely in the information they receive. The reliability of an assessment measure is the degree to which the score on the measure (or on the test as a whole) is accurate. If, for example, a student took the same test again, would he or she get the same score? If students took a comparable test, would a similar result be obtained?

On commercial tests, reliability is usually around .80, which is considered high. A reliability of .80 means that about 80% of the variation in scores reflects "true" differences in performance and 20% reflects measurement error. High reliability comes partly from the fact that commercial tests collect many separate bits of information about what students know; for example, students might answer 30 multiple-choice questions per half hour.

Achieving high reliability with alternative assessment methods is more difficult. The longer and more complex responses supply fewer separate pieces of information about performance. A teacher evaluating a portfolio, for example, is likely to review only a handful of student products, giving limited evidence. Moreover, scoring requires judgment, which inevitably introduces a certain amount of subjective opinion. Subjective judgments may show up as inconsistencies between raters. Interrater reliability asks: Would two raters score the assessment the same way? Would the same rater, repeating the scoring session at a different time, assign the same score?

In selecting an assessment measure, then, we need to consider whether it would give the same result if repeated, how well its scores correlate with those of other assessments measuring comparable knowledge, and how consistent scores are across raters.
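One common way to quantify interrater reliability is to correlate the scores two raters assign to the same set of student work. The sketch below is illustrative only; the rubric scores are hypothetical, and a real study would use more students and often a chance-corrected statistic such as Cohen's kappa.

```python
# Illustrative sketch (hypothetical data): estimating interrater
# reliability as the Pearson correlation between two raters' scores
# on the same set of student portfolios.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical rubric scores (1-5 scale) for eight portfolios
rater_a = [4, 3, 5, 2, 4, 3, 5, 1]
rater_b = [4, 2, 5, 2, 3, 3, 4, 1]

r = pearson(rater_a, rater_b)
print(f"Interrater correlation: {r:.2f}")
```

A coefficient near 1.0 indicates the two raters rank and score the portfolios almost identically; values much below .80 would suggest the scoring rubric or rater training needs improvement.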

==Validity==
The validity of an assessment tells us whether it is measuring what we think it is measuring. If we want to know how well a student can write, a multiple-choice test of spelling and grammar may not be a valid indicator of how successfully the student can write an essay, though it may indicate how well he or she can identify such errors in text. There are several ways to establish or measure a test's validity: a panel of experts in the field can review the contents of the measure; performance on the test can be compared with actual performance on similar tasks in, for example, a work setting; or we can study the pattern of responses among several tasks measuring the same thing. One of the primary motivations for adopting alternative assessments is to increase validity by making the assessment tasks more like the real-world activities they are designed to simulate. Because alternative assessments pose more "authentic" tasks, it is hoped that the scores will more accurately reflect students' ability and knowledge in a given area.

One problem with interpreting the results of some types of alternative assessments, such as senior projects or portfolios, is that they are inherently nonstandardized. The content of each student's submission will differ, and the resources available to students may vary. It is difficult to assign scores fairly to such different products while also taking into account factors such as unequal access to resources (computers, for instance, or expert help).

==Fairness==
If students who otherwise have equal ability score differently on an assessment because of background knowledge or experience that is irrelevant to the assessed skill/knowledge, then the measure is unfair or "biased." Example: A task that assumes the student is familiar with different snow conditions may be biased against students who live in a climate where it never snows.

The fairness of an assessment is usually established by expert committees trained to analyze factors that might disadvantage or benefit particular groups of students. Many advocates believe that alternative assessments are more equitable to all groups because they pose more complete tasks and permit students to address those tasks in ways that are meaningful to them. Nevertheless, all vocational educators selecting and constructing assessments need to be sensitive to the diverse backgrounds of their students.