Solved Assignment of 8602

 

ASSIGNMENT NO. 2

 EDUCATIONAL ASSESSMENT AND EVALUATION (8602)

WRITTEN BY: MADIHA AFZAL

PROGRAMME: B.ED (1.5)

SEMESTER: FIRST

ALLAMA IQBAL OPEN UNIVERSITY, ISLAMABAD

                                                            

Question No. 1

How can the validity of a test be measured?

 

Answer:

Test validity:

Validity is the most important issue in selecting a test. Validity refers to what characteristic the test measures and how well the test measures that characteristic.

  • Validity tells you if the characteristic being measured by a test is related to job qualifications and requirements.
  • Validity gives meaning to the test scores. Validity evidence indicates that there is linkage between test performance and job performance. It can tell you what you may conclude or predict about someone from his or her score on the test. If a test has been demonstrated to be a valid predictor of performance on a specific job, you can conclude that persons scoring high on the test are more likely to perform well on the job than persons who score low on the test, all else being equal.
  • Validity also describes the degree to which you can make specific conclusions or predictions about people based on their test scores. In other words, it indicates the usefulness of the test.

Principle of Assessment:

                                                        It is important to understand the differences between reliability and validity. Validity will tell you how good a test is for a particular situation; reliability will tell you how trustworthy a score on that test will be. You cannot draw valid conclusions from a test score unless you are sure that the test is reliable. Even when a test is reliable, it may not be valid. You should be careful that any test you select is both reliable and valid for your situation.

A test's validity is established in reference to a specific purpose; the test may not be valid for different purposes.

For example:

                         The test you use to make valid predictions about someone's technical proficiency on the job may not be valid for predicting his or her leadership skills or absenteeism rate. This leads to the next principle of assessment. Similarly, a test's validity is established in reference to specific groups. These groups are called the reference groups. The test may not be valid for different groups. For example, a test designed to predict the performance of managers in situations requiring problem solving may not allow you to make valid or meaningful predictions about the performance of clerical employees. If, for example, the kind of problem-solving ability required for the two positions is different, or the reading level of the test is not suitable for clerical applicants, the test results may be valid for managers, but not for clerical employees.

Test developers have the responsibility of describing the reference groups used to develop the test. The manual should describe the groups for whom the test is valid, and the interpretation of scores for individuals belonging to each of these groups. You must determine if the test can be used appropriately with the particular type of people you want to test. This group of people is called your target population or target group.

Your target group and the reference group do not have to match on all factors; they must be sufficiently similar so that the test will yield meaningful scores for your group. For example, a writing ability test developed for use with college seniors may be appropriate for measuring the writing ability of white-collar professionals or managers, even though these groups do not have identical characteristics. In determining the appropriateness of a test for your target groups, consider factors such as occupation, reading level, cultural differences, and language barriers.
In order to be certain an employment test is useful and valid, evidence must be collected relating the test to a job. The process of establishing the job relatedness of a test is called validation.

Methods for conducting validation studies:

The Uniform Guidelines discuss the following three methods of conducting validation studies. The Guidelines describe conditions under which each type of validation strategy is appropriate. They do not express a preference for any one strategy to demonstrate the job-relatedness of a test.

  • Criterion-related validation requires demonstration of a correlation or other statistical relationship between test performance and job performance. In other words, individuals who score high on the test tend to perform better on the job than those who score low on the test. If the criterion is obtained at the same time the test is given, it is called concurrent validity; if the criterion is obtained at a later time, it is called predictive validity.
  • Content-related validation requires a demonstration that the content of the test represents important job-related behaviors. In other words, test items should be relevant to and measure directly important requirements and qualifications for the job.
  • Construct-related validation requires a demonstration that the test measures the construct or characteristic it claims to measure, and that this characteristic is important to successful performance on the job.

The three methods of validity (criterion-related, content, and construct) should be used to provide validation support depending on the situation. These three general methods often overlap, and, depending on the situation, one or more may be appropriate. French (1990) offers situational examples of when each method of validity may be applied.

First, as an example of criterion-related validity, take the position of millwright. Employees' scores (predictors) on a test designed to measure mechanical skill could be correlated with their performance in servicing machines (criterion) in the mill. If the correlation is high, it can be said that the test has a high degree of validation support, and its use as a selection tool would be appropriate.
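
To make the millwright example concrete, here is a minimal sketch of how such a validity coefficient might be computed in Python; the correlation function is written out in full, and all test scores and job ratings are invented purely for illustration.

```python
# Minimal sketch of a criterion-related validity check: correlating test
# scores (predictor) with job-performance ratings (criterion).
# All numbers below are invented purely for illustration.

def pearson_r(xs, ys):
    """Compute the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical mechanical-skill test scores and supervisor ratings for 8 millwrights.
test_scores = [62, 70, 75, 80, 84, 88, 91, 95]
job_ratings = [2.9, 3.1, 3.4, 3.3, 3.8, 4.0, 4.2, 4.5]

r = pearson_r(test_scores, job_ratings)
print(f"Validity coefficient r = {r:.2f}")  # a high positive r supports criterion-related validity
```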

Second, the content validation method may be used when you want to determine if there is a relationship between behaviors measured by a test and behaviors involved in the job. For example, a typing test would provide high validation support for a secretarial position, assuming much typing is required each day. If, however, the job required only minimal typing, then the same test would have little content validity. Content validity does not apply to tests measuring learning ability or general problem-solving skills (French, 1990).

Finally, the third method is construct validity. This method often pertains to tests that may measure abstract traits of an applicant. For example, construct validity may be used when a bank desires to test its applicants for "numerical aptitude." In this case, an aptitude is not an observable behavior, but a concept created to explain possible future behaviors. To demonstrate that the test possesses construct validation support, ". . . the bank would need to show (1) that the test did indeed measure the desired trait and (2) that this trait corresponded to success on the job" (French, 1990, p. 260).

Professionally developed tests should come with reports on validity evidence, including detailed explanations of how validation studies were conducted. If you develop your own tests or procedures, you will need to conduct your own validation studies. As the test user, you have the ultimate responsibility for making sure that validity evidence exists for the conclusions you reach using the tests. This applies to all tests and procedures you use, whether they have been bought off-the-shelf, developed externally, or developed in-house.

Validity evidence:

                              It is especially critical for tests that have adverse impact. When a test has adverse impact, the Uniform Guidelines require that validity evidence for that specific employment decision be provided.

The particular job for which a test is selected should be very similar to the job for which the test was originally developed. Determining the degree of similarity will require a job analysis. Job analysis is a systematic process used to identify the tasks, duties, responsibilities and working conditions associated with a job and the knowledge, skills, abilities, and other characteristics required to perform that job.

Job analysis information may be gathered by direct observation of people currently in the job, interviews with experienced supervisors and job incumbents, questionnaires, personnel and equipment records, and work manuals. In order to meet the requirements of the Uniform Guidelines, it is advisable that the job analysis be conducted by a qualified professional, for example, an industrial and organizational psychologist or other professional well trained in job analysis techniques. Job analysis information is central in deciding what to test for and which tests to use.

Using validity evidence from outside studies:

Conducting your own validation study is expensive, and, in many cases, you may not have enough employees in a relevant job category to make it feasible to conduct a study. Therefore, you may find it advantageous to use professionally developed assessment tools and procedures for which documentation on validity already exists. However, care must be taken to make sure that validity evidence obtained for an "outside" test study can be suitably "transported" to your particular situation.

The Uniform Guidelines, the Standards, and the SIOP Principles state that evidence of transportability is required. Consider the following when using outside tests:

  • Validity evidence. The validation procedures used in the studies must be consistent with accepted standards.
  • Job similarity. A job analysis should be performed to verify that your job and the original job are substantially similar in terms of ability requirements and work behavior.
  • Fairness evidence. Reports of test fairness from outside studies must be considered for each protected group that is part of your labor market. Where this information is not available for an otherwise qualified test, an internal study of test fairness should be conducted, if feasible.
  • Other significant variables. These include the type of performance measures and standards used, the essential work activities performed, the similarity of your target group to the reference samples, as well as all other situational factors that might affect the applicability of the outside test for your use.

 

Question No: 2

What are the rules for writing multiple-choice test items?

Answer:

RULES FOR WRITING MULTIPLE-CHOICE QUESTIONS:

 There are several rules we can follow to improve the quality of this type of written examination.  

1.      Examine only the Important Facts!

                                  Make sure that every question examines only important knowledge. Avoid overly detailed questions; each question has to be relevant to the previously set instructional goals of the course.

2.      Use Simple Language!

                             Use simple language, taking care of spelling and grammar. Spelling and grammar mistakes (unless you are testing spelling or grammar) only confuse students. Remember that you are examining knowledge about your subject and not language skills. 

3.      Make the Questions Brief and Clear!

                                  Remove all superfluous words and irrelevant content from the body of the question; this helps students understand exactly what is expected of them. It is desirable to formulate a question in such a way that the main part of the text is in the body of the question, without being repeated in the answers.

4.      Form the Questions Correctly!

                               Be careful that the formulation of the question does not (indirectly) give away the key to the correct answer. Students adept at solving tests will recognize such cues easily and find the right answer from word combinations, grammar, etc., rather than from real knowledge.

5.      Take into Consideration the Independence of Questions!

                                       Be careful not to repeat content and terms related to the same theme, since the answer to one question can become the key to solve another. 

6.      Offer Uniform Answers!

                                All offered answers should be uniform, clear and realistic. For example, an obviously implausible answer or an uneven amount of text across the answers can point to the right one; such a question does not test real knowledge. The position of the key should be random. If the answers are numbers, they should be listed in ascending order.

7.      Avoid Asking Negative Questions!

                                    If you use negative questions, negation must be emphasized by using CAPITAL letters, e.g. "Which of the following IS NOT correct..." or "All of the following statements are true, EXCEPT...". 

8.      Avoid Distracters in the Form of "All the answers are correct" or "None of the Answers is Correct"!

                         Teachers use these statements most frequently when they run out of ideas for distracters. Students, knowing what is behind such questions, are rarely misled by it. Therefore, if you do use such statements, sometimes use them as the key answer. Furthermore, if a student recognizes that there are two correct answers (out of 5 options), they will be able to conclude that the key answer is the statement "all the answers are correct", without knowing the accuracy of the other distracters. 

9.      Distracters must be Significantly Different from the Right Answer (key)!

                                    Distracters which only slightly differ from the key answer are bad distracters. Good or strong distracters are statements which themselves seem correct, but are not the correct answer to a particular question. 

10.  Offer an Appropriate Number of Distracters!

                            The greater the number of distracters, the smaller the possibility that a student could guess the right answer (key). In higher education tests, questions with 5 answers are used most often (1 key + 4 distracters). That means a student has a 20% chance of guessing the right answer.
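
To see the arithmetic behind that 20% figure, the short sketch below (assuming a hypothetical 20-item test and a 50% pass mark, neither of which comes from the text) also shows how unlikely it is to pass a whole test by blind guessing.

```python
# Sketch of the blind-guessing argument: with 5 options the chance of
# guessing any single item is 1/5 = 20%, and the chance of "passing" a whole
# test by guessing alone falls off quickly. The test length and pass mark
# below are illustrative assumptions, not figures from the text.
from math import comb

n_items = 20          # assumed test length
p_guess = 1 / 5       # 1 key out of 5 options
pass_mark = 10        # assumed passing score (50%)

# Binomial probability of getting at least `pass_mark` items right by chance.
p_pass = sum(comb(n_items, k) * p_guess**k * (1 - p_guess)**(n_items - k)
             for k in range(pass_mark, n_items + 1))
print(f"Chance of guessing one item: {p_guess:.0%}")
print(f"Chance of passing {n_items} items by guessing alone: {p_pass:.4%}")
```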

Advantages:

                             Multiple-choice test items are not a panacea. They have advantages and disadvantages just as any other type of test item. Teachers need to be aware of these characteristics in order to use multiple-choice items effectively.

Versatility: Multiple-choice test items are appropriate for use in many different subject-matter areas, and can be used to measure a great variety of educational objectives. They are adaptable to various levels of learning outcomes, from simple recall of knowledge to more complex levels, such as the student’s ability to:

  • Analyze phenomena
  • Apply principles to new situations
  • Comprehend concepts and principles
  • Discriminate between fact and opinion
  • Interpret cause-and-effect relationships
  • Interpret charts and graphs
  • Judge the relevance of information
  • Make inferences from given data
  • Solve problems

The difficulty of multiple-choice items can be controlled by changing the alternatives, since the more homogeneous the alternatives, the finer the distinction the students must make in order to identify the correct answer. Multiple-choice items are amenable to item analysis, which enables the teacher to improve the item by replacing distracters that are not functioning properly. In addition, the distracters chosen by the student may be used to diagnose misconceptions of the student or weaknesses in the teacher’s instruction.

Validity: In general, it takes much longer to respond to an essay test question than to a multiple-choice test item, since composing and recording an essay answer is such a slow process. A student is therefore able to answer many multiple-choice items in the time it would take to answer a single essay question. This feature enables the teacher using multiple-choice items to test a broader sample of course content in a given amount of testing time. Consequently, the test scores will likely be more representative of the students’ overall achievement in the course.

Reliability: Well-written multiple-choice test items compare favourably with other test item types on the issue of reliability. They are less susceptible to guessing than are true-false test items, and therefore capable of producing more reliable scores. Their scoring is more clear-cut than short-answer test item scoring because there are no misspelled or partial answers to deal with. Since multiple-choice items are objectively scored, they are not affected by scorer inconsistencies as are essay questions, and they are essentially immune to the influence of bluffing and writing ability factors, both of which can lower the reliability of essay test scores.

Efficiency: Multiple-choice items are amenable to rapid scoring, which is often done by scoring machines. This expedites the reporting of test results to the student so that any follow-up clarification of instruction may be done before the course has proceeded much further. Essay questions, on the other hand, must be graded manually, one at a time.

Overall, multiple-choice tests are:

  • Very effective
  • Versatile at all levels
  • Low in writing demand for the student
  • Less open to guessing
  • Able to cover a broad range of content

Disadvantages:

Versatility: Since the student selects a response from a list of alternatives rather than supplying or constructing a response, multiple-choice test items are not adaptable to measuring certain learning outcomes, such as the student’s ability to:

1)      Articulate explanations

2)      Display thought processes

3)      Furnish information

4)       Organize personal thoughts.

5)       Perform a specific task

6)       Produce original ideas

7)      Provide examples

Such learning outcomes are better measured by short-answer or essay questions, or by performance tests.

Reliability: Although they are less susceptible to guessing than are true-false test items, multiple-choice items are still affected to a certain extent. This guessing factor reduces the reliability of multiple-choice item scores somewhat, but increasing the number of items on the test offsets this reduction in reliability.
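
One standard way to quantify how lengthening a test offsets this loss of reliability is the Spearman-Brown prophecy formula; the sketch below applies it to an assumed starting reliability of 0.70, chosen purely for illustration.

```python
# Illustration of how lengthening a test raises reliability, using the
# standard Spearman-Brown prophecy formula. The starting reliability and
# the lengthening factors are assumptions, not figures from the text.

def spearman_brown(reliability, k):
    """Predicted reliability when a test is lengthened k times with comparable items."""
    return k * reliability / (1 + (k - 1) * reliability)

current_reliability = 0.70   # assumed reliability of the existing test
for k in (1, 1.5, 2, 3):     # keep the test, or make it 1.5x, 2x, 3x as long
    print(f"{k:>3}x items -> predicted reliability {spearman_brown(current_reliability, k):.2f}")
```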

Difficulty of Construction: Good multiple-choice test items are generally more difficult and time-consuming to write than other types of test items. Coming up with plausible distracters requires a certain amount of skill. This skill, however, may be increased through study, practice, and experience. Gronlund (1995) writes that multiple-choice items are difficult to construct: suitable distracters are often hard to come by, and the teacher is tempted to fill the void with a “junk” response, which narrows the range of options available to the test-wise student. They are also exceedingly time-consuming to fashion, one hour per question being by no means the exception. Finally, multiple-choice items generally take students longer to complete (especially items containing fine discriminations) than do other types of objective question.

 

Question No: 3

  Write a detailed note on scales of measurement.

Answer:

Variables are established and classified using measurement scales. Stanley Stevens, a psychologist, created the four most commonly used measurement scales: nominal, ordinal, interval, and ratio. Each scale of measurement has characteristics that influence how data should be analyzed. Identity, magnitude, equal intervals, and a minimum value of zero are the properties that are assessed.

Properties of Scales of Measurement:

Identity: each value has a distinct meaning.

Magnitude: the values have an ordered relationship to one another, implying that the variables are in a definite order.

Equal intervals: data points along the scale are equally spaced, so the difference between adjacent points is always the same. For example, the difference between data points one and two is the same as the difference between data points five and six.

Minimum value of zero: the scale has a true zero point. Degrees, for example, can go below zero, so temperature has no true zero; weight cannot go below zero, and weighing nothing means a weight of exactly zero.

 

 Scales of Measurement:

 

Data scientists can decide the type of statistical test to run by determining the scale of their data measurement.

1.      Nominal Measurement Scale:

                                          The identity property of data is defined by the nominal scale of measurement. This scale sorts data into qualitative categories that have no numerical value. The information can be categorized, but the categories cannot be multiplied, divided, added, or subtracted from one another. It is also impossible to quantify the disparity between data points. Eye color and birth country are two examples of nominal data. Nominal data can be further divided into three groups:

 

2.      Sub-groups of Nominal Data:

Nominal with order: some nominal data, such as "cold, warm, hot, and very hot," can be sub-categorized in an order.

Nominal without order: other nominal data, such as male and female, cannot be placed in any meaningful order.

Dichotomous: dichotomous data has only two levels or categories.

3.      Ordinal Measurement Scale

                                   The ordinal scale is used to describe data that is arranged in a certain order. While each value is ranked, no information is provided as to what separates the categories from one another. These values cannot be added or subtracted. A survey's satisfaction data points, where 'one = happy, two = indifferent, and three = sad,' are an example of this type of data. Ordinal data also describes where someone finished in a race: while first, second, or third place indicates the sequence in which the runners finished, it does not indicate how far ahead the first-place finisher was of the second-place finisher.

4.      Interval Measurement Scale:

                                    Although the interval scale has nominal and ordinal data qualities, the difference between data points can be quantified. This type of information displays both the order of the variables and the exact differences between them. Values can be added to or subtracted from each other, but not meaningfully multiplied or divided: 40 degrees, for example, is not twice as hot as 20 degrees. A distinguishing feature of this scale is how it treats the number zero. On an interval scale, zero does not denote the absence of the quantity; it is simply another point on the scale. Zero degrees, for example, is still a temperature.
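
To make the point that 40 degrees is not "twice" 20 degrees concrete, the small sketch below (with invented values) shows that the apparent ratio between two temperatures changes when the unit changes, while differences behave consistently.

```python
# Sketch of why ratios are not meaningful on an interval scale: the same two
# temperatures give a different "ratio" depending on the unit, whereas equal
# Celsius differences always map to equal Fahrenheit differences.
# The temperature values are illustrative.

def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

low_c, high_c = 20.0, 40.0
low_f, high_f = c_to_f(low_c), c_to_f(high_c)

print(f"Ratio in Celsius:    {high_c / low_c:.2f}")        # 2.00
print(f"Ratio in Fahrenheit: {high_f / low_f:.2f}")        # about 1.53: the ratio depends on the unit
print(f"Difference in Celsius:    {high_c - low_c:.1f}")   # 20.0
print(f"Difference in Fahrenheit: {high_f - low_f:.1f}")   # 36.0, always 1.8x the Celsius gap
```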

5.      Ratio Measurement Scale:

                                     Properties from all four scales of measurement are included in ratio scales. The data is nominal and defined by an identity, can be sorted into categories, contains equal intervals, and can be broken down into exact values. Weight, height, and distance are examples of ratio variables. The ratio scale allows you to add, subtract, divide, and multiply data. Ratio scales are additionally distinguished from interval scales by the presence of a true zero: zero denotes the complete absence of the quantity being measured. No one can be negative centimetres tall or weigh negative kilograms, and a value of zero means no height or no weight at all. Calculating shares or sales are two examples of how this scale can be used. Of all the types of data on the scales of measurement, data scientists can accomplish the most with ratio data.

6.      Interval scale:

                                  It’s a numerical scale in which the order is known and the difference between the values has meaning. The interval scale is the third level of measurement and encompasses both nominal and ordinal scales. This scale can also be referred to as an interval variable scale (interval variable is used to describe the meaningful nature of the difference between values).

Examples of this would be time, temperature (Celsius, Fahrenheit), credit score, and more. In each of these examples, the difference in value is known and easily calculated. Someone with a credit score of 720 has a higher score than someone with 650.  We know one is greater than the other and we know EXACTLY how much larger the value is.

Note: There’s a difference between time and duration. Time of day is an interval scale because there’s no meaningful zero (can you say when time started?). Duration is a ratio scale because there’s a meaningful zero and a starting point can be defined: 10 days is twice as long as 5 days.

This is the first scale on which you can do true statistical analysis. Like the ordinal scale, though, the interval scale doesn’t have a predetermined starting point or a true zero.

For example, credit score is an interval scale, but it starts at 300.

With that being said, every point on the scale is equidistant from the next. On a Celsius scale, each unit is the same size or has the same value. We can, without a doubt, quantify the difference between 5 Celsius and 6 Celsius. There is no true zero because temperature can go into the negatives. Zero is just another point of measurement.

7.      Ratio scale:

Ratio scales are the cream of the crop when it comes to statistical analysis because they have everything you need. A ratio scale has an order, a set value between units, and absolute zero. It’s an interval scale with a true zero.

Examples of ratio scales include concentration, length, weight, duration, and more. Because there’s a zero position, it opens up the doors for inferential and descriptive analysis techniques. Use ratio scales to understand the size of the market, market share, revenue, pricing, etc.

With nominal scales you can only find the mode; with ordinal scales you can also find the median; interval scales lend themselves to the mean, median, and mode. Ratio scales can use all of that plus other methods such as the geometric mean and the coefficient of variation. Arguably, ratio data is the most versatile.
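
As a rough illustration of which summary statistics each scale supports, the sketch below runs the corresponding measures on invented nominal, ordinal, interval, and ratio data using Python's statistics module.

```python
# Sketch of which summary statistics each scale of measurement supports.
# The data values are invented for illustration.
import statistics as st

eye_colors = ["brown", "blue", "brown", "green", "brown"]      # nominal
race_places = [1, 2, 2, 3, 4, 5, 5, 5]                         # ordinal (ranks)
temps_c = [18.0, 20.5, 21.0, 23.5, 25.0]                       # interval (Celsius)
weights_kg = [52.0, 61.5, 70.0, 74.5, 88.0]                    # ratio

print("Nominal  -> mode:", st.mode(eye_colors))                # only the mode is meaningful
print("Ordinal  -> median:", st.median(race_places))           # median (and mode) are meaningful
print("Interval -> mean:", st.mean(temps_c))                   # mean, median, mode all meaningful
print("Ratio    -> geometric mean:", round(st.geometric_mean(weights_kg), 2))
print("Ratio    -> coefficient of variation:",
      round(st.stdev(weights_kg) / st.mean(weights_kg), 3))    # ratios of values are meaningful
```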

Note: The proportion between two values on a ratio scale is meaningful; on an interval scale, it is not. For example, 20 pounds is twice the weight of 10 pounds, but a credit score of 600 is not twice as good as a credit score of 300, because credit score is not a ratio scale.

Examples of ratio scale questions:

What is your weight in pounds?

Less than 70

70 – 120

121 – 150

More than 150

What is your age?

Less than 20

20 – 30

31 – 40

41 – 50

More than 50

 

Question No: 4

  What are the considerations in conducting a parent-teacher conference?

Answer:

Conducting Parent-Teacher Conferences:

                         The first conference is usually arranged at the beginning of the school year to allow parents and teachers to get acquainted and to prepare a plan for the coming months. Teachers usually receive some training to plan and conduct such conferences. The following steps may be observed for holding effective parent-teacher conferences.

1.      Prepare for the conference

a)      Review the goals and objectives

b)      Organize the information to present

c)      If portfolios are to be discussed, make sure they are well arranged

d)      Start with and keep a positive focus

e)      Announce the final date and time as per the convenience of the parents and children

f)      Consider socio-cultural barriers of students / parents

g)      Check with other staff who work with your advisee

h)      Develop a conference packet including the student’s goals, samples of work, and reports or notes from other staff.

2.      Rehearse the conference with students by role-playing

a)      Students present their goals, learning activities, samples of work

b)      Students ask for comments and suggestions from parents 

3.      Conduct conference with student, parent, and advisor.

 Advisee takes the lead to the greatest possible extent

a)      Have a comfortable setting of chairs, tables etc.

b)      Communicate a viable timetable for the conferences

c)       Review goals set earlier

d)     Review progress towards goals

e)       Review progress with samples of work from learning activities

f)      Present the student’s strong points first

g)       Review attendance and handling of responsibilities at school and home

h)      Modify goals for balance of the year as necessary

i)         Determine other learning activities to accomplish goals

j)         Describe upcoming events and activities

k)       Discuss how the home can contribute to learning

l)        Parents should be encouraged to share their thoughts on students’ progress

m)     Ask parents and students for questions, new ideas 

4.      Do’s of parent-teacher conferences

a)      Be friendly

b)       Be honest

c)       Be positive in approach

d)     Be willing to listen and explain

e)       Be willing to accept parents’ feelings

f)        Be careful about giving advice

g)      Be professional and maintain a positive attitude

h)       Begin with student’s strengths

i)        Review student’s cumulative record prior to conference

j)         Assemble samples of student’s work

k)      List questions to ask parents and anticipate parents’ questions

l)         Conclude the conference with an overall summary

m)     Keep a written record of the conference, listing problems and suggestions, with a copy for the parents 

5.      Don’ts of the parent teacher conference

a)      Don’t argue

b)       Don’t get angry

c)       Don’t ask embarrassing questions

d)      Don’t talk about other students, parents and teachers

e)       Don’t bluff if you don’t know

f)        Don’t reject parents’ suggestions

g)      Don’t blame parents

h)       Don’t talk too much; be a good listener (www.udel.edu.) 

 

Activities:

Activity 1: Enlist three pros and cons of test scores.

Activity 2: Give a self-explanatory example of each of the types of test scores. 

Activity 3: Write down the different purposes and functions of test scores in order of importance as per your experience. Add as many more purposes as you can.

Activity 4: Compare the modes of reporting test scores to parents by MEAP and NCCA. Also conclude which is relatively more appropriate in the context of Pakistan as per your point of view. 

Activity 5: In view of the strengths and shortcomings of the above grading and reporting systems, how would you briefly comment on the following characteristics of a multiple grading and reporting system for effective assessment of students’ learning?

a)      Grading and reporting system should be guided by the functions to be served.

b)      It should be developed cooperatively by parents, students, teachers, and other school personnel.

c)      It should be based on clear and specific instructional objectives.

d)      It should be consistent with school standards.

e)      It should be based on adequate assessment.

f)      It should provide detailed information on the student’s progress, particularly diagnostic and practical aspects.

g)      It should have space for conducting parent-teacher conferences.

Activity 6:  Explain the differences between relative grading and absolute grading by giving an example of each. 

Activity 7: Faiza Shaheen, a student of MA Education (Secondary), has earned the following marks, grades and GPA in the 22 courses at the Institute of Education & Research, University of the Punjab. Calculate her CGPA. Note that the maximum value of GPA in each course is 4.
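
For Activity 7, here is a minimal sketch of the usual CGPA arithmetic, a credit-hour-weighted average of course GPAs; the three courses shown are hypothetical stand-ins, since the actual 22-course table is not reproduced here.

```python
# Sketch of the usual CGPA arithmetic: a credit-hour-weighted average of the
# GPA earned in each course (with equal credits this reduces to a simple mean).
# The courses and values below are hypothetical, not Activity 7's real data.

courses = [
    # (course name, credit hours, GPA earned out of 4)
    ("Course A", 3, 3.7),
    ("Course B", 3, 3.3),
    ("Course C", 4, 4.0),
]

total_points = sum(credits * gpa for _, credits, gpa in courses)
total_credits = sum(credits for _, credits, _ in courses)
cgpa = total_points / total_credits
print(f"CGPA = {cgpa:.2f} out of a maximum of 4.00")
```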

Activity 8: Write Do’s and Don’ts in order of priority as per your perception. You may add more points or exclude what have been mentioned above.

 

Question No: 5

     Write a note on the advantages and disadvantages of criterion-referenced testing.

Answer:

Criterion-referenced test:

                        A criterion-referenced test is designed to measure how well test takers have mastered a particular body of knowledge. The term "criterion- referenced test" is not part of the everyday vocabulary in schools, and yet, nearly all students take criterion-referenced tests on a routine basis. These tests generally have an established "passing" score. Students know what the passing score is and an individual's test score is determined by knowledge of the course material.

It is important to distinguish between criterion-referenced tests and norm-referenced tests. The standardized tests used to measure how well an individual does relative to other people who have taken the test are norm-referenced.

Advantage of Criterion Referenced Test:

                                            Following are the major advantages of criterion referenced tests:

1.      First, students are only tested on their knowledge of specific goals or standards. For example, if you have taught a lesson on adding fractions, you give the students a test on adding fractions. If a student scores 85%, that means that particular student has learned 85% of that goal. If a student does not score particularly well, then the teacher can adjust their instruction accordingly.

2.      Another benefit is that if students do not seem to master a particular standard, the teacher will be able to go back and teach that standard again until the student performs better. Let’s say that we taught a lesson on Fahrenheit and Celsius. A student understands Fahrenheit, as shown on an assessment, but their knowledge of Celsius isn’t so good. The teacher then can go back and teach Celsius again. In special education it is nice because we have the freedom to spend more time on specific content and not worry so much about meeting the state standards.

3.      For special educators, we have to focus our teaching based on the students’ IEPs. Being able to focus our instruction based on the students’ needs is another benefit of criterion-referenced assessment. The students need to make progress toward their annual goals and objectives, and the use of this type of assessment allows for that because their scores are compared only with their own performance.

4.      Another good reason to use criterion-referenced assessments in special education is that they only test students on what they can do. Tests like the SAT, which are norm-referenced, score students in relation to how other people score. For students with special needs, norm-referenced assessments do not tell teachers much about their abilities because the material is above their level.

5.      Criterion-referenced assessments are needs based, meaning the tests are created with what the students’ needs are. If a student really needs to improve their knowledge of proper nouns, then a test will be created on proper nouns.

6.      Teachers can create their own tests, which are criterion-referenced as well. Tests that come with textbooks are also criterion-referenced because they only test specific areas of knowledge.

7.      When discussing the advantages of criterion referenced tests, it is also important to mention that since students are only judged against themselves, they have a better chance of scoring high, which will help improve their self-esteem as well. Studies show that students with special needs tend to have lower self-esteem. Any way that we can help students feel better about themselves is a great opportunity.

8.      One thing to remember is that each student is an individual and is different. By using criterion-referenced assessments in your classroom, you can meet the individual needs of the students and differentiate your assessments with the sole purpose of helping the students achieve to their fullest potential.

  

 Disadvantages of Criterion-Referenced Tests:

                                 Criterion-referenced tests have some built-in disadvantages. Creating tests that are both valid and reliable requires fairly extensive and expensive time and effort. In addition, results cannot be generalized beyond the specific course or program. Such tests may also be compromised by students gaining access to test questions prior to exams. Criterion-referenced tests are specific to a program and cannot be used to measure the performance of large groups.

Although these assessments are becoming more popular in the special education field they do have some drawbacks. These include:

1.      It does not allow for comparing the performance of students in a particular location with national norms. For example, a school would be unable to compare 5th grade achievement levels in a district, and therefore be unable to measure how a school is performing against other schools.

2.      It is time-consuming and complex to develop. Teachers will be required to find time to write a curriculum and assessments with an already full work-load. It might require more staff to come in and help.

3.      It costs a lot of money, time and effort. Creating a specific curriculum takes time and money to hire more staff; and most likely the staff will have to be professionals who have experience.

4.      It needs efficient leadership and collaboration, and lack of leadership can cause problems - for instance, if a school is creating assessments for special education students with no well-trained professionals, they might not be able to create assessments that are learner-centered.

5.      It may slow the process of curriculum change if tests are constantly changed. It is difficult for curriculum developers to know what is working and what is not because tests tend to differ from one school to another. It would require years of collecting data to know what is working and what is not.

Despite its flaws, criterion-referenced assessment will still be important in special education, because comparing the scores of students with special needs to those of average students does not achieve much in measuring the student’s current level of performance.

 

