Assessment is the process of documenting, usually in measurable terms, knowledge, skills, attitudes and beliefs. This article covers educational assessment, including the work of institutional researchers, but the term applies to other fields as well, including health and finance.
History of assessment
The earliest recorded example of academic assessment arose in China in 206 BC, when the Han dynasty sought to introduce testing to assist with the selection of civil servants. The objectivity of the assessment was questionable (it being oral and still subject to the whims of the assessors), but it was the first example of introducing merit into the selection process in place of favouritism. In AD 622 the Tang dynasty administered formal written exams to candidates for the civil service; these exams lasted several days and had a pass rate of 2%, and successful candidates were then subjected to an oral assessment by the Emperor. In Europe, tests were used during the Middle Ages to aid the selection of priests and knights, and schoolchildren were tested on their knowledge of the catechism. Oral exams were used to assess knowledge, and demonstrations of skill were used to measure practical abilities. The University of Paris first introduced formal examinations during the 12th century; these were theological oral disputations in which the questions were known in advance, requiring students to memorise and regurgitate answers. In the 1740s, Cambridge University began using (oral) examinations to compare students, similar to the earlier Chinese tests. During the 18th century, Cambridge and Oxford began testing students' mathematical abilities with written tests, and thereafter the use of paper for assessment spread to all subjects. The United States introduced formal written examinations in the 1830s in an attempt to reduce the subjectivity of assessment: Horace Mann introduced written tests in the Boston Public Schools to compare school performance. However, the United States' main contribution to the history of testing came during the First World War, when the US Army introduced large-scale IQ testing to assign massive numbers of recruits to positions within the Army.
The Army Alpha, as it was known, consisted of multiple choice questions and was administered to over two million recruits.
Types of assessment
Assessments can be classified in many different ways. The most important distinctions are: (1) formative and summative; (2) objective and subjective; (3) criterion-referenced and norm-referenced; and (4) informal and formal.
Formative and summative assessments
There are two main types of assessment:
- Summative Assessment - Summative assessment is generally carried out at the end of a course or project. In an educational setting, summative assessments are typically used to assign students a course grade.
- Formative Assessment - Formative assessment is generally carried out throughout a course or project. Formative assessment, also referred to as educative assessment, is used to aid learning. In an educational setting, formative assessment might involve a teacher (or peer) or the learner providing feedback on a student's work, and would not necessarily be used for grading purposes.
Summative and formative assessment are referred to in a learning context as "assessment of learning" and "assessment for learning" respectively.
A common form of formative assessment is diagnostic assessment. Diagnostic assessment measures a student's current knowledge and skills for the purpose of identifying a suitable program of learning. Self-assessment is a form of diagnostic assessment which involves students assessing themselves. Forward-looking assessment asks those being assessed to consider themselves in hypothetical future situations.
Objective and subjective assessment
Assessment (either summative or formative) can be objective or subjective. Objective assessment is a form of questioning which has a single correct answer. Subjective assessment is a form of questioning which may have more than one correct answer (or more than one way of expressing the correct answer). There are various types of objective and subjective questions. Objective question types include true/false, multiple choice, multiple-response and matching questions. Subjective questions include extended-response questions and essays. Objective assessment is becoming more popular due to the increased use of online assessment (e-assessment), since this form of questioning is well-suited to computerisation.
Criterion-referenced and norm-referenced assessments
Criterion-referenced assessment, typically using a criterion-referenced test, as the name implies, occurs when candidates are measured against defined (and objective) criteria. Criterion-referenced assessment is often, but not always, used to establish a person’s competence (whether s/he can do something). The best known example of criterion-referenced assessment is the driving test, when learner drivers are measured against a range of explicit criteria (such as “Not endangering other road users”). Norm-referenced assessment (colloquially known as "grading on the curve"), typically using a norm-referenced test, is not measured against defined criteria. This type of assessment is relative to the student body undertaking the assessment. It is effectively a way of comparing students. The IQ test is the best known example of norm-referenced assessment. Many entrance tests (to prestigious schools or universities) are norm-referenced, permitting a fixed proportion of students to pass (“passing” in this context means being accepted into the school or university rather than an explicit level of ability). This means that standards may vary from year to year, depending on the quality of the cohort; criterion-referenced assessment does not vary from year to year (unless the criteria change).
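The difference between the two approaches can be sketched in a few lines of Python. The scores, the pass mark of 70, and the top-30% proportion below are purely hypothetical illustrations, not figures from any real test:

```python
# Hypothetical cohort of exam scores.
scores = [52, 67, 71, 74, 80, 85, 88, 90, 93, 97]

# Criterion-referenced: pass everyone who meets a fixed, explicit standard
# (here an assumed pass mark of 70). The standard does not depend on the cohort.
criterion_pass = [s for s in scores if s >= 70]

# Norm-referenced: pass a fixed proportion of the cohort (here the top 30%),
# regardless of what absolute standard those candidates reach.
top_n = int(len(scores) * 0.3)
norm_pass = sorted(scores, reverse=True)[:top_n]

print(criterion_pass)  # everyone at or above the fixed criterion
print(norm_pass)       # only the top 30% of this particular cohort
```

With a weaker cohort, `criterion_pass` would shrink while `norm_pass` would stay the same size, which is exactly why norm-referenced standards can vary from year to year while criterion-referenced ones do not.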
Informal and formal assessment
Assessment can be either formal or informal. Formal assessment usually involves a written document, such as a test, quiz, or paper, and is given a numerical score or grade based on student performance. Informal assessment, by contrast, does not contribute to a student's final grade; it usually occurs in a more casual manner and may include observation, inventories, participation, peer and self evaluation, and discussion.
Standards of quality
The considerations of validity and reliability typically are viewed as essential elements for determining the quality of any assessment. However, professional and practitioner associations frequently have placed these concerns within broader contexts when developing standards and making overall judgments about the quality of any assessment as a whole within a given context.
Testing standards
In the field of psychometrics, the Standards for Educational and Psychological Testing place standards about validity and reliability, along with errors of measurement and related considerations, under the general topic of test construction, evaluation and documentation. The second major topic covers standards related to fairness in testing and test use, the rights and responsibilities of test takers, testing individuals of diverse linguistic backgrounds, and testing individuals with disabilities. The third and final major topic covers standards related to testing applications, including the responsibilities of test users, psychological testing and assessment, educational testing and assessment, testing in employment and credentialing, and testing in program evaluation and public policy.
Evaluation standards
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations. The Personnel Evaluation Standards was published in 1988, The Program Evaluation Standards (2nd edition) was published in 1994, and The Student Evaluation Standards was published in 2003.
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.
Validity and reliability
A valid assessment is one which measures what it is intended to measure. For example, it would not be valid to assess driving skills through a written test alone. A more valid way of assessing driving skills would be through a combination of tests that help determine what a driver knows, such as through a written test of driving knowledge, and what a driver is able to do, such as through a performance assessment of actual driving. Teachers frequently complain that some examinations do not properly assess the syllabus upon which the examination is based; they are, effectively, questioning the validity of the exam.
Reliability relates to the consistency of an assessment. A reliable assessment is one which consistently achieves the same results with the same (or similar) cohort of students. Various factors affect reliability – including ambiguous questions, too many options within a question paper, vague marking instructions and poorly trained markers.
A good assessment has both validity and reliability, plus the other quality attributes noted above for a specific context and purpose. In practice, an assessment is rarely totally valid or totally reliable. A ruler which is marked wrongly will always give the same (wrong) measurements: it is very reliable, but not very valid. Asking random individuals to tell the time without looking at a clock or watch is sometimes used as an example of an assessment which is valid but not reliable: the answers will vary between individuals, but the average answer is probably close to the actual time. In many fields, such as medical research, educational testing, and psychology, there will often be a trade-off between reliability and validity. A history test written for high validity will have many essay and fill-in-the-blank questions; it will be a good measure of mastery of the subject, but difficult to score completely accurately. A history test written for high reliability will be entirely multiple choice; it is not as good at measuring knowledge of history, but can easily be scored with great precision.
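The ruler and clock examples above can be made concrete with a short numeric sketch. All the readings below are invented for illustration; the point is only the pattern of spread versus average:

```python
import statistics

# A mis-marked ruler measuring a 10 cm object: every reading is identical
# (highly reliable) but systematically wrong (not valid).
ruler_readings = [10.8, 10.8, 10.8, 10.8]

# Random people guessing the time when the true time is 180 minutes past
# noon (3:00 pm): individual guesses scatter widely (not reliable), but
# their average lands near the truth (valid on average).
time_guesses = [150, 210, 170, 195, 175]

print(statistics.stdev(ruler_readings))  # zero spread: perfectly consistent
print(statistics.mean(time_guesses))     # average close to the true 180
```

The standard deviation captures (un)reliability, while the gap between the average reading and the true value captures (in)validity, which is why the two properties can vary independently.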
Notes and references
- The Standards for Educational and Psychological Testing
- Joint Committee on Standards for Educational Evaluation
- Joint Committee on Standards for Educational Evaluation. (1988). The Personnel Evaluation Standards: How to Assess Systems for Evaluating Educators. Newbury Park, CA: Sage Publications.
- Joint Committee on Standards for Educational Evaluation. (1994). The Program Evaluation Standards, 2nd Edition. Newbury Park, CA: Sage Publications.
- Joint Committee on Standards for Educational Evaluation. (2003). The Student Evaluation Standards: How to Improve Evaluations of Students. Newbury Park, CA: Corwin Press.
See also
- Evaluation is the process of looking at what is being assessed to make sure the right areas are being considered.
- Grading is the process of assigning a (possibly mutually exclusive) ranking to learners.
- Educational Measurement is a process of assessment or an evaluation in which the objective is to quantify level of attainment or competence within a specified domain. See the Rasch model for measurement for elaboration on the conceptual requirements of such processes, including those pertaining to grading and use of raw scores from assessments.
- Educational evaluation deals specifically with evaluation as it applies to an educational setting. No Child Left Behind (NCLB) is a government program that requires educational evaluation.
- Educational psychology
- Electronic portfolio is a personal digital record containing information such as a collection of artifacts or evidence demonstrating what one knows and can do.
- Health Impact Assessment looks at the potential health impacts of policies, programs and projects.
- Program evaluation is essentially a set of philosophies and techniques to determine if a program 'works'.
- Social Impact Assessment looks at the possible social impacts of proposed new infrastructure projects, natural resource projects, or development activities.
- Standardized testing is any test that is used across a variety of schools or other situations.
In educational research, learning targets are very important for defining what learning should be assessed. A learning target is defined as a clear and specific description of what students are expected to learn. If the assessor has defined the target clearly, then the assessment will be easier to complete. It is also important when assessing to recognise that not all learners will accomplish the same learning at the same time. Think of a bull's eye, in which the center ring is the highest level of achievement: each outer ring is further away from the learning target (as described by Stiggins, Richard J.).
Publications
- Problem Difficulty Evaluation with Semantic Network Ontology, J. I. Khan, M. Hardas, Y. Ma, September 2005.
External links
- Assessment in Higher Education web site.
- Stiggins, Richard J., author of Student-Involved Classroom Assessment and Student-Involved Assessment FOR Learning.
- Johnson, Mauritz author of "Intentionality in Education".
- Edutopia: Assessment Overview A collection of media and articles on the topic of assessment from The George Lucas Educational Foundation
- The Standards for Educational and Psychological Testing
- Joint Committee on Standards for Educational Evaluation
- Testing and Assessment Glossary of Terms