Updated for Fall 2005.
The purpose of formative assessment is to adjust teaching and learning. For example, a grade one teacher may test a student's decoding skills to determine whether specific phonics instruction is required.
Summative assessments are all forms of assessment that are not formative. For example, a medical student may take medical board exams to determine if she is competent to practice medicine. Another example: a teacher may give a final exam to determine the grade for a course.
From a self-regulated learning perspective, virtually any assessment has the potential to be formative if the student is informed of the results. More detailed feedback increases the formative function of an assessment.
In criterion-referenced assessment, outcomes are presented with respect to the objectives (or curriculum).
In norm-referenced assessment, outcomes are presented in relation to the performance of others.
However, normative information does (and should) drive the design of learning objectives, activities and assessments.
There are three major reference points in assessment: the objective or task (criteria), the abilities of others (norms), the abilities of self.
Self-referenced assessment leads one to compare one's current performance to one's own past performance.
Objective items (e.g., multiple choice) are more difficult to create but easier to score than subjective items (e.g., essay).
Objective items tend to have higher reliability.
Essay items tend to be more valid assessments of writing ability.
Essay items tend to be more valid assessments of ability to produce complex products.
Reliability is the consistency or repeatability of an assessment.
Validity is how well an assessment measures what it is supposed to measure.
Combinations of reliability and validity:
* high reliability and high validity
* low reliability and low validity
* high reliability but low validity (for example, a thermometer that always reads the same temperature gives a highly reliable but invalid measurement)
Reliability is a necessary but not sufficient condition for validity.
Different kinds of reliability (two common estimates are sketched after the list):
* test-retest reliability
* inter-rater reliability
* internal consistency
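Two of these can be made concrete with a short computation: test-retest reliability is commonly estimated as the correlation between two administrations of the same test, and Cronbach's alpha is one common estimate of internal consistency. The sketch below is only meant to illustrate the arithmetic; the scores are made up and the function names are not from any particular package.

    from statistics import mean, pvariance

    def pearson(x, y):
        """Test-retest reliability: correlation between two administrations."""
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
        return cov / (pvariance(x) ** 0.5 * pvariance(y) ** 0.5)

    def cronbach_alpha(students):
        """Internal consistency: 'students' is a list of per-student item-score lists."""
        k = len(students[0])                      # number of items
        by_item = list(zip(*students))            # scores regrouped by item
        totals = [sum(s) for s in students]       # each student's total score
        item_var = sum(pvariance(col) for col in by_item)
        return (k / (k - 1)) * (1 - item_var / pvariance(totals))

    test1 = [12, 15, 9, 18, 14]    # made-up scores, first administration
    test2 = [11, 16, 10, 17, 13]   # made-up scores, second administration
    print(round(pearson(test1, test2), 2))        # test-retest estimate

    answers = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
    print(round(cronbach_alpha(answers), 2))      # internal consistency estimate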
Authentic tests are usually more realistic, more "situated." They have the learner perform the task that is the true target of instruction.
For example, an authentic test of baking a cake would have the student actually bake a cake rather than write down the steps in baking a cake.
Authentic tests tend to be more valid because the test conditions closely match the conditions of real-world performance.
Portfolios have been found to promote learning, but they are difficult to reliably score.
* There is evidence that repeated failure is demotivating (recall learned helplessness).
* There is evidence that unremitting success does not prepare students to deal with failure (Clifford, 1990).
* A mixture of success and failure appears to be highly motivating (Clifford, 1990).
* Learning environments with a high level of inter-individual competition and comparison can be demotivating for lower ability students.
* Intra-individual comparison can be highly motivating.
Dempster (1991) found that testing is an effective way to promote learning.
Feedback that corrects errors and explains how to correctly perform is highly effective for learning.
In giving feedback, answer these questions:
What is the key error?
What is the probable reason the student made the error?
How can I guide the student to avoid the error in the future?
Every assessment (exam, assignment, etc.) is assigned a percent weight.
Factors determining weight:
* proportion of objectives assessed
* importance of assessed objectives
* overlap with other assessments
* validity (and reliability) of assessment
* discrimination index of assessment (one common index is sketched after this list)
* difficulty of assessment
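One common form of the discrimination index compares how often an item is answered correctly by high-scoring students versus low-scoring students (often the top and bottom fraction of the class). A minimal sketch with made-up data:

    def discrimination_index(item_correct, total_scores, fraction=0.27):
        """item_correct: 1/0 per student on one item; total_scores: each student's test total."""
        n = max(1, round(len(total_scores) * fraction))   # size of the top and bottom groups
        order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
        low, high = order[:n], order[-n:]
        p_high = sum(item_correct[i] for i in high) / n
        p_low = sum(item_correct[i] for i in low) / n
        return p_high - p_low                             # values near +1.0 indicate strong discrimination

    totals = [35, 22, 41, 18, 30, 27, 39, 25]   # made-up total test scores
    item = [1, 0, 1, 0, 1, 0, 1, 1]             # made-up right (1) / wrong (0) responses to one item
    print(discrimination_index(item, totals))   # 1.0 for this item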
How will you combine the assessment scores to obtain a total score?
The 100-point system may be the easiest for everyone to understand. In the 100-point system, each point is worth one percent. Each assessment is "out of" the percent weight that is allocated to it. You can simply add a student's scores on each assessment to obtain the student's total score out of 100.
For example, the think paper for this course is assigned a weight of 25%. Each component of the think paper is allocated some points out of 25, e.g., the writing mechanics component is allocated 10 points out of 25.
The main disadvantage of the 100-point system is that, to maintain simplicity, the scoring for an assessment must be consistent with the allocated marks (e.g., writing mechanics must be scored out of 10). This is less flexible and may require scoring with half marks.
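For illustration, a minimal sketch of the bookkeeping; apart from the 25% think paper weight above, the assessments, scores, and letter-grade cutoffs are hypothetical.

    # Hypothetical gradebook under the 100-point system: each assessment is
    # scored "out of" its percent weight, so the total is a simple sum.
    scores = {
        "think paper": (21.5, 25),   # (points earned, points possible = percent weight)
        "final exam":  (33.0, 40),
        "project":     (28.0, 35),
    }

    earned = sum(e for e, _ in scores.values())
    possible = sum(w for _, w in scores.values())
    assert possible == 100           # the weights must sum to 100 in this system

    # Hypothetical letter-grade cutoffs; use whatever scale your institution specifies.
    cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]
    letter = next(g for c, g in cutoffs if earned >= c)
    print(f"{earned}/100 -> {letter}")   # 82.5/100 -> B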
Use a standard system that is familiar to all stakeholders: students, administrators, other institutions.
Provide enough information so that students can calculate their letter grade.
List the learning objectives and select some proportion for assessment. Be aware of which objectives are being assessed and which are not.
An alternative is to use a behavior-content matrix to plan assessments. Using such a matrix can ensure that you have a balanced distribution of test items (or other assessments) over topics and skills.
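For illustration, such a matrix can be kept as a simple grid of planned item counts; the topics, skill categories, and counts below are hypothetical.

    # Hypothetical behavior-content matrix: planned number of test items per
    # (topic, skill) cell. Row and column totals make imbalances easy to spot.
    skills = ["recall", "application", "analysis"]
    matrix = {
        "fractions":   {"recall": 3, "application": 4, "analysis": 1},
        "decimals":    {"recall": 2, "application": 3, "analysis": 1},
        "percentages": {"recall": 2, "application": 2, "analysis": 2},
    }

    for topic, row in matrix.items():
        print(f"{topic:12s}", *(f"{row[s]:3d}" for s in skills), f"| {sum(row.values()):3d}")
    col_totals = [sum(row[s] for row in matrix.values()) for s in skills]
    print(f"{'totals':12s}", *(f"{t:3d}" for t in col_totals), f"| {sum(col_totals):3d}")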
All assessment types (e.g., objective items, subjective items, problem sets, projects) have strengths and weaknesses.
To obtain valid summative evaluations, combine at least two assessment types. For example, assign a project and a test, or an essay and a performance.
Select a task that is highly realistic (authentic), yet practical.
When planning, analyse quality into components. Assign a weight (number of points) to each component.
Provide students with rubrics or specific descriptions for each component. Use these when scoring the students' products.
Be as clear as possible about how the assignment will be scored.
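For illustration, a minimal sketch of component-weighted scoring; apart from the think paper's 25 points and the 10 points for writing mechanics mentioned earlier, the components and scores are hypothetical.

    # Hypothetical rubric for a 25-point assignment, analysed into weighted components.
    rubric = {
        "writing mechanics":   10,   # matches the earlier example: 10 of the 25 points
        "quality of argument": 10,
        "use of sources":       5,
    }
    assert sum(rubric.values()) == 25

    # Scores a marker might assign against each component's descriptors.
    scored = {"writing mechanics": 8, "quality of argument": 7.5, "use of sources": 4}
    total = sum(scored[c] for c in rubric)
    print(f"{total} / {sum(rubric.values())}")   # 19.5 / 25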
Clifford, M. M. (1990). Students need challenge, not easy success. Educational Leadership, 48(1), 22-26.
Dempster, F. N. (1991). Synthesis of research on reviews and tests. Educational Leadership, 48(7), 71-76.