ASSIGNMENT NO. 2
PROGRAMME: B.ED (1.5)
SEMESTER: FIRST
ALLAMA IQBAL
OPEN UNIVERSITY
Question No: 1
How can the validity of a test be measured?
Answer:
Test validity:
Validity is the
most important issue in selecting a test. Validity refers to what
characteristic the test measures and how well the
test measures that characteristic.
- Validity
tells you if the characteristic being measured by a test is related to job
qualifications and requirements.
- Validity
gives meaning to the test scores. Validity evidence
indicates that there is linkage between test performance and job
performance. It can tell you what you may conclude or predict about
someone from his or her score on the test. If a test has been demonstrated
to be a valid predictor of performance on a specific job, you can conclude
that persons scoring high on the test are more likely to perform well on
the job than persons who score low on the test, all else being equal.
- Validity
also describes the degree to which you can make specific conclusions or
predictions about people based on their test scores. In other words, it
indicates the usefulness of the test.
Principle of Assessment:
It is important to understand the
differences between reliability and validity.
Validity will tell you how good a test is for a particular situation;
reliability will tell you how trustworthy a score on that test will be. You
cannot draw valid conclusions from a test score unless you are sure that the
test is reliable. Even when a test is reliable, it may not be valid. You should
be careful that any test you select is both reliable and valid for your
situation.
A test's validity is established in reference to
a specific purpose; the test may not be valid for different purposes.
For example:
The test you use to
make valid predictions about someone's technical proficiency on the job may not
be valid for predicting his or her leadership skills or absenteeism rate. This
leads to the next principle of assessment. Similarly, a test's validity is
established in reference to specific groups. These groups are called the
reference groups. The test may not be valid for different groups. For example,
a test designed to predict the performance of managers in situations requiring
problem solving may not allow you to make valid or meaningful predictions about
the performance of clerical employees. If, for example, the kind of
problem-solving ability required for the two positions is different, or the
reading level of the test is not suitable for clerical applicants, the test
results may be valid for managers, but not for clerical employees.
Test developers have the responsibility of describing the reference groups used
to develop the test. The manual should describe the groups for whom the test is
valid, and the interpretation of scores for individuals belonging to each of
these groups. You must determine if the test can be used appropriately with the
particular type of people you want to test. This group of people is called
your target population or target group.
Your target
group and the reference group do not have to match on all
factors; they must be sufficiently similar so that the test will yield
meaningful scores for your group. For example, a writing ability test developed
for use with college seniors may be appropriate for measuring the writing ability
of white-collar professionals or managers, even though these groups do not have
identical characteristics. In determining the appropriateness of a test for
your target groups, consider factors such as occupation, reading level,
cultural differences, and language barriers.
In order to be certain an employment test is
useful and valid, evidence must be collected relating the test to a job. The
process of establishing the job relatedness of a test is called validation.
Methods for conducting validation studies:
The Uniform
Guidelines discuss the following three methods of conducting
validation studies. The Guidelines describe conditions under
which each type of validation strategy is appropriate. They do not express a
preference for any one strategy to demonstrate the job-relatedness of a test.
- Criterion-related
validation requires demonstration of a
correlation or other statistical relationship between test performance and
job performance. In other words, individuals who score high on the test
tend to perform better on the job than those who score low on the test. If
the criterion is obtained at the same time the test is given, it is called
concurrent validity; if the criterion is obtained at a later time, it is
called predictive validity.
- Content-related
validation requires a demonstration that the
content of the test represents important job-related behaviors. In other
words, test items should be relevant to and measure directly important
requirements and qualifications for the job.
- Construct-related
validation requires a demonstration that the
test measures the construct or characteristic it claims to measure, and
that this characteristic is important to successful performance on the
job.
The three
methods of validity (criterion-related, content, and construct) should be used to
provide validation support depending on the situation. These three general
methods often overlap, and, depending on the situation, one or more may be
appropriate. French (1990) offers situational examples of when each method of
validity may be applied.
First, as an example of criterion-related
validity, take the position of millwright. Employees' scores (predictors) on a
test designed to measure mechanical skill could be correlated with their
performance in servicing machines (criterion) in the mill. If the correlation
is high, it can be said that the test has a high degree of validation support,
and its use as a selection tool would be appropriate.
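To make the millwright example concrete, here is a minimal sketch of the statistical side of a criterion-related validity study: correlating predictor test scores with criterion ratings. All numbers are invented for illustration; they are not taken from French (1990) or the text above.

```python
# Criterion-related validity reduces to a correlation between the predictor
# (mechanical-skill test scores) and the criterion (machine-servicing ratings).
def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

mechanical_test = [62, 75, 58, 81, 90, 70, 66, 85]           # predictor scores (invented)
service_ratings = [3.1, 3.8, 2.9, 4.2, 4.6, 3.5, 3.2, 4.4]   # criterion ratings (invented)

print(f"Validity coefficient r = {pearson_r(mechanical_test, service_ratings):.2f}")
# A high positive r supports using the test as a selection tool; a value near
# zero would mean the test says little about servicing performance.
```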
Second, the content validation method may be
used when you want to determine if there is a relationship between behaviors
measured by a test and behaviors involved in the job. For example, a typing
test would have high validation support for a secretarial position, assuming much
typing is required each day. If, however, the job required only minimal typing,
then the same test would have little content validity. Content validity does
not apply to tests measuring learning ability or general problem-solving skills
(French, 1990).
Finally, the third method is construct validity.
This method often pertains to tests that may measure abstract traits of an
applicant. For example, construct validity may be used when a bank desires to
test its applicants for "numerical aptitude." In this case, an
aptitude is not an observable behavior, but a concept created to explain
possible future behaviors. To demonstrate that the test possesses construct
validation support, ". . . the bank would need to show (1) that the test
did indeed measure the desired trait and (2) that this trait corresponded to
success on the job" (French, 1990, p. 260).
Professionally developed tests should come with
reports on validity evidence, including detailed explanations of how validation
studies were conducted. If you develop your own tests or procedures, you will
need to conduct your own validation studies. As the test user, you have the
ultimate responsibility for making sure that validity evidence exists for the
conclusions you reach using the tests. This applies to all tests and procedures
you use, whether they have been bought off-the-shelf, developed externally, or
developed in-house.
Validity evidence:
It is especially
critical for tests that have adverse impact. When a test has adverse impact,
the Uniform Guidelines require that validity evidence for that
specific employment decision be provided.
The particular job for which a test is selected should be very similar to the
job for which the test was originally developed. Determining the degree of
similarity will require a job analysis. Job analysis is a
systematic process used to identify the tasks, duties, responsibilities and
working conditions associated with a job and the knowledge, skills, abilities,
and other characteristics required to perform that job.
Job analysis information may be gathered by direct observation of people
currently in the job, interviews with experienced supervisors and job
incumbents, questionnaires, personnel and equipment records, and work manuals.
In order to meet the requirements of the Uniform Guidelines, it is
advisable that the job analysis be conducted by a qualified professional, for
example, an industrial and organizational psychologist or other professional
well trained in job analysis techniques. Job analysis information is central in
deciding what to test for and which tests to use.
Using validity evidence from outside studies:
Conducting your
own validation study is expensive, and, in many cases, you may not have enough
employees in a relevant job category to make it feasible to conduct a study.
Therefore, you may find it advantageous to use professionally developed
assessment tools and procedures for which documentation on validity already
exists. However, care must be taken to make sure that validity evidence
obtained for an "outside" test study can be suitably
"transported" to your particular situation.
The Uniform Guidelines,
the Standards, and the SIOP Principles state that
evidence of transportability is required. Consider the following when using
outside tests:
- Validity
evidence. The validation procedures used in the
studies must be consistent with accepted standards.
- Job
similarity. A job analysis should be performed to
verify that your job and the original job are substantially similar in
terms of ability requirements and work behavior.
- Fairness
evidence. Reports of test fairness from outside
studies must be considered for each protected group that is part of your
labor market. Where this information is not available for an otherwise
qualified test, an internal study of test fairness should be conducted, if
feasible.
- Other significant variables. These include the type of performance measures and
standards used, the essential work activities performed, the similarity of
your target group to the reference samples, as well as all other
situational factors that might affect the applicability of the outside
test for your use.
Question No: 2
What are the rules for writing multiple-choice test items?
Answer:
RULES
FOR WRITING MULTIPLE-CHOICE QUESTIONS:
There
are several rules we can follow to improve the quality of this type of written
examination.
1.
Examine
only the Important Facts!
Make sure that every question examines only
the important knowledge. Avoid detailed questions - each question has to be
relevant for the previously set instructional goals of the course.
2.
Use
Simple Language!
Use simple language, taking care of spelling
and grammar. Spelling and grammar mistakes (unless you are testing spelling or
grammar) only confuse students. Remember that you are examining knowledge about
your subject and not language skills.
3.
Make the
Questions Brief and Clear!
Clear the
text of the body of the question from all superfluous words and irrelevant
content. It helps students to understand exactly what is expected of them. It
is desirable to formulate a question in such way that the main part of the text
is in the body of the question, without being repeated in the answers.
4.
Form the
Questions Correctly!
Be careful that the formulation of the
question does not (indirectly) hide the key to the correct answer. Students
adept at solving tests will be able to recognize it easily and will find the
right answer because of the word combination, grammar, etc., and not because of
their real knowledge.
5.
Take
into Consideration the Independence of Questions!
Be careful not to repeat content and terms
related to the same theme, since the answer to one question can become the key
to solve another.
6.
Offer Uniform Answers!
All offered
answers should be uniform, clear and realistic. For example, an implausible
answer or uneven text length across the answers can point
to the right answer. Such a question does not test real knowledge. The position
of the key should be random. If the answers are numbers, they should be listed
in an ascending order.
7.
Avoid
Asking Negative Questions!
If you use negative questions, negation must
be emphasized by using CAPITAL letters, e.g. "Which of the following IS
NOT correct..." or "All of the following statements are true,
EXCEPT...".
8.
Avoid
Distracters in the Form of "All the answers are correct" or
"None of the Answers is Correct"!
Teachers use these statements most frequently
when they run out of ideas for distracters. Students, knowing what is behind
such questions, are rarely misled by it. Therefore, if you do use such
statements, sometimes use them as the key answer. Furthermore, if a student
recognizes that there are two correct answers (out of 5 options), they will be
able to conclude that the key answer is the statement "all the answers are
correct", without knowing the accuracy of the other distracters.
9.
Distracters
must be Significantly Different from the Right Answer (key)!
Distracters which only slightly differ from
the key answer are bad distracters. Good or strong distracters are statements
which themselves seem correct, but are not the correct answer to a particular
question.
10. Offer an Appropriate Number of Distracters!
The greater the
number of distracters, the lesser the possibility that a student could guess
the right answer (key). In higher education tests questions with 5 answers are
used most often (1 key + 4 distracters). That means that a student is 20%
likely to guess the right answer.
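As a quick arithmetic check of this rule, the sketch below shows how the chance of blindly guessing the key falls as distracters are added; the 40-item test used for the expected-score line is an assumption for illustration.

```python
# Probability of guessing the key on a single item with one key and k distracters.
for distracters in (1, 2, 3, 4):
    options = distracters + 1
    print(f"{options} options -> {1 / options:.0%} chance of guessing correctly")

# Expected number of items answered correctly by pure guessing on an
# assumed 40-item test with five options per item.
print("Expected correct by guessing on 40 five-option items:", 40 * (1 / 5))
```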
Advantages:
Multiple-choice test items are not a panacea.
They have advantages and disadvantages, just as any other type of test item.
Teachers need to be aware of these characteristics in order to use multiple-choice
items effectively.
Versatility:
Multiple-choice test items are appropriate for use in many different subject-matter areas, and can
be used to measure a great variety of educational objectives. They are
adaptable to various levels of learning outcomes, from simple recall of
knowledge to more complex levels, such as the student’s ability to:
• Analyze phenomena
• Apply
principles to new situations
• Comprehend concepts and principles
• Discriminate between fact and opinion
•
Interpret cause-and-effect relationships
•
Interpret charts and graphs
• Judge
the relevance of information
• Make inferences from given data
• Solve problems
The difficulty of multiple-choice items can be controlled by changing
the alternatives, since the more homogeneous the alternatives, the finer the
distinction the students must make in order to identify the correct answer.
Multiple-choice items are amenable to item analysis, which enables the teacher
to improve the item by replacing distracters that are not functioning properly.
In addition, the distracters chosen by the student may be used to diagnose
misconceptions of the student or weaknesses in the teacher’s instruction.
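The item-analysis step mentioned here can be illustrated with a small sketch: computing an item's difficulty index and tallying how often each distracter is chosen. The responses and answer key below are hypothetical.

```python
from collections import Counter

responses = ["C", "B", "C", "C", "E", "C", "B", "C", "A", "C"]  # 10 students, one item (invented)
key = "C"

difficulty = sum(r == key for r in responses) / len(responses)
print(f"Item difficulty (proportion answering correctly): {difficulty:.2f}")

# Distracters that almost nobody chooses are not functioning properly
# and are candidates for replacement.
counts = Counter(responses)
for option in "ABCDE":
    label = "(key)" if option == key else ""
    print(option, counts.get(option, 0), label)
```

In this invented example, option D is never chosen, so it would be a candidate for replacement with a more plausible distracter.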
Validity:
In general, it takes much longer to
respond to an essay test question than it does to respond to a multiple-choice
test item, since the composing and recording of an essay answer is such a slow
process. A student is therefore able to answer many multiple-choice items in the
time it would take to answer a single essay question. This feature enables the
teacher using multiple-choice items to test a broader sample of course contents
in a given amount of testing time. Consequently, the test scores will likely be
more representative of the students’ overall achievement in the course.
Reliability:
Well-written multiple-choice test
items compare favourably with other test item types on the issue of
reliability. They are less susceptible to guessing than are true-false test
items, and therefore capable of producing more reliable scores. Their scoring
is more clear-cut than short answer test item scoring because there are no
misspelled or partial answers to deal with. Since multiple-choice items are
objectively scored, they are not affected by scorer inconsistencies as are essay
questions, and they are essentially immune to the influence of bluffing and
writing ability factors, both of which can lower the reliability of essay test
scores.
Efficiency:
Multiple-choice items are amenable to rapid scoring, which is often done
by scoring machines. This expedites the reporting of test results to the
student so that any follow-up clarification of instruction may be done before
the course has proceeded much further. Essay questions, on the other hand, must
be graded manually, one at a time.
Overall, multiple-choice tests are:
- Very effective
- Versatile at all levels
- Minimum of writing for the student
- Guessing reduced
- Can cover a broad range of content
Disadvantages:
Versatility:
Since the student selects a response from a
list of alternatives rather than supplying or constructing a response,
multiple-choice test items are not adaptable to measuring certain learning
outcomes, such as the student’s ability to:
1)
Articulate explanations
2)
Display thought processes
3)
Furnish information
4)
Organize
personal thoughts.
5)
Perform
a specific task
6)
Produce
original ideas
7)
Provide examples.
Such learning outcomes are better measured by short-answer or essay
questions, or by performance tests.
Reliability:
Although they are less susceptible to guessing than are true-false test items,
multiple-choice items are still affected by it to a certain extent. This guessing factor reduces the reliability of
multiple-choice item scores somewhat, but increasing the number of items on the
test offsets this reduction in reliability.
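The claim that adding items offsets the loss of reliability can be quantified with the Spearman-Brown formula, a standard psychometric relationship not stated in the text; the starting reliability of 0.70 for a 20-item test below is an assumed value.

```python
# Spearman-Brown: estimated reliability of a test lengthened by factor k,
# given the reliability r of the original test.
def spearman_brown(r, k):
    return k * r / (1 + (k - 1) * r)

r_20_items = 0.70                      # assumed reliability of a 20-item test
for k in (1, 2, 3):                    # 20, 40 and 60 comparable items
    print(f"{20 * k} items -> estimated reliability {spearman_brown(r_20_items, k):.2f}")
```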
Difficulty of Construction:
Good multiple-choice
test items are generally more difficult and time-consuming to write than other
types of test items. Coming up with plausible distracters requires a certain
amount of skill. This skill, however, may be increased through study, practice,
and experience. Gronlund (1995) writes
that multiple-choice items are difficult to construct. Suitable distracters are
often hard to come by and the teacher is tempted to fill the void with a “junk”
response. This has the effect of narrowing the
range of options available to the test-wise student. They are also exceedingly time-consuming to
fashion, one hour per question being by no means the exception. Finally, multiple-choice items generally take
students longer to complete (especially items requiring fine discrimination)
than do other types of objective questions.
Question No: 3
Write a detailed note on scales of measurement.
Answer:
Variables are
defined and classified using scales of measurement. Stanley Stevens, a
psychologist, created the four most used measurement scales: nominal, ordinal,
interval, and ratio. Each scale of measurement has characteristics that
influence how data should be analyzed. Identity, magnitude, equal intervals,
and a minimum value of zero are the
properties that are assessed.
Properties of Scales of Measurement:
Identity:
Each value has a distinct meaning.
Magnitude:
The values have an ordered relationship to one another, implying that the variables are in
a definite order.
Equal Intervals:
Data points on the scale
are equidistant; the difference between adjacent points is the same. For example, the difference
between data points one and two is the same as the difference between data points
five and six.
Minimum Value of Zero:
When the scale has a minimum value of zero, it signifies the scale has a true zero point. Degrees, for example,
can go below zero without losing their meaning, so temperature has no true zero. Weight does: if you weigh
zero kilograms, you don't weigh anything at all.
Scales of Measurement:
Data scientists can decide the type
of statistical test to run by determining the scale of their data measurement.
1.
Nominal Measurement Scale:
The
identity property of data is defined by the nominal scale of measurement. This
scale contains some qualities, but there is no numerical value to it. The
information can be categorized, but it cannot be multiplied, divided, added, or
subtracted. It is also impossible to quantify the difference between
data points. Eye color and birth country are two examples of nominal data.
Nominal data can be further divided into three groups:
2.
Subcategories of Nominal Data:
Nominal with order: Some nominal data, such as "cold, warm, hot, and very hot,"
can be subcategorized in order.
Nominal without order: Male and female are two examples of nominal data that are
sub-categorized as nominal without order.
Dichotomous: The term dichotomous data refers to data that has
only two levels or classifications.
3.
Ordinal Measurement Scale
The ordinal scale is used to describe data that is arranged in a certain
order. While each value is graded, no information is provided as to what
distinguishes the categories from one another. These numbers cannot be
increased or decreased. A survey's satisfaction data points, where 'one =
happy, two = indifferent, and three = sad,' are an example of this type of
data. Ordinal data also describes where someone finished in a race. While
first, second, or third place indicate the sequence in which the runners
finished, it does not indicate how much ahead the first-place finisher was of
the second-place finisher.
4.
Scale of Measuring with Intervals:
Although the interval scale has nominal and ordered data qualities, the
difference between data points can be quantified. This type of information
displays both the order of the variables as well as the exact differences
between them. They can be added or subtracted, but not multiplied or divided:
40 degrees, for example, is not the same as 20 degrees
multiplied by two. Another feature of this scale is that zero is just another point of
measurement rather than a true zero. On the ratio scale, zero denotes the absence of the
quantity measured; on the interval scale, zero is itself a value; for example, if you measure
degrees, zero has a temperature.
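A short numerical check of why ratios are not meaningful on an interval scale: re-expressing the same two temperatures in Fahrenheit (another interval unit) changes the apparent ratio, because the zero point is arbitrary. The conversion formula is standard; the temperatures are chosen only for illustration.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

print(40 / 20)                                                # 2.0 in Celsius
print(celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20))  # 104/68, roughly 1.53
# The "twice as hot" claim does not survive a change of interval unit,
# so multiplication and division of interval values are not meaningful.
```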
5.
Scale of Measurement Based on a Ratio:
Properties
from all four scales of measurement are included in ratio scales of
measurement. The data is nominal and defined by an identity, and it can be
sorted into categories, contain intervals, and be broken down into exact
values. Ratio factors include things like weight, height, and distance. The
ratio scale allows you to add, subtract, divide, and multiply data. Ratio
scales are additionally distinguished from interval scales by the presence of a
'true zero.' The number zero denotes the absence of a value point in the data.
For example, no one can be 0 centimeters tall or weigh zero kilograms, nor can
they have a negative height or weight. Calculating shares or sales
are two examples of how this scale can be used. Data scientists can accomplish
the most with ratio data points of all the sorts of data on the scales of
measurement.
6. Interval scale:
It’s a numerical scale in which the order is known and the difference
between the values has meaning. The interval scale is the third level of
measurement and encompasses both nominal and ordinal scales. This scale can
also be referred to as an interval variable scale (interval
variable is used to describe the meaningful nature of the
difference between values).
Examples of this would be time, temperature (Celsius,
Fahrenheit), credit score, and more. In each of these examples, the difference
in value is known and easily calculated. Someone with a credit score of 720 has
a higher score than someone with 650. We know one is greater than the
other and we know EXACTLY how much larger the value is.
Note: There’s a difference
between time and duration. Time is an interval scale because there’s no
meaningful zero. Can you say when time started? Duration is a ratio scale
because there's a meaningful zero and a starting point can be defined. 10 days
is twice as long as 5 days.
This is the first scale where you can do true statistical
analysis. Like the ordinal scale, the interval scale doesn't have a predetermined starting
point or a true zero.
For example:
credit score
is an interval scale but it starts at 300.
With that being said, every point on the scale is equidistant
from the next. On a Celsius scale, each unit is the same size or has the same
value. We can, without a doubt, quantify the difference between 5 Celsius and 6
Celsius. There is no true zero because temperature can go into the negatives.
Zero is just another point of measurement.
7. Ratio scale:
Ratio scales are the cream of the crop when it comes to
statistical analysis because they have everything you need. A ratio scale has
an order, a set value between units, and absolute zero. It’s an interval scale
with a true zero.
Examples of ratio scales include concentration, length, weight,
duration, and more. Because there’s a zero position, it opens up the doors for
inferential and descriptive analysis techniques. Use ratio scales to understand
the size of the market, market share, revenue, pricing, etc.
You can only find the mode with nominal scales; you can find the median
with ordinal scales; interval scales lend themselves to the mean, mode, and median.
Ratio scales can use all of that plus other methods such as geometric mean and
coefficient of variation. Arguably, ratio data is the most versatile.
Note: The proportion between two units
of a ratio scale is meaningful. On an interval scale, it is not. For example,
20 pounds is twice the weight of 10 pounds, but a credit score of 600 is not twice
as good as a credit score of 300, because credit score is not a ratio scale.
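The permissible statistics for each scale can be summarised in a short sketch; all of the data values below are invented purely for illustration.

```python
from statistics import mode, median, mean, geometric_mean, stdev

eye_colour = ["brown", "blue", "brown", "green", "brown"]   # nominal
race_finish = [1, 2, 3, 4, 5]                               # ordinal
temps_c = [18.0, 21.5, 19.0, 23.0, 20.5]                    # interval
weights_kg = [54.0, 61.2, 72.5, 58.3, 66.1]                 # ratio

print("Nominal  -> mode only:", mode(eye_colour))
print("Ordinal  -> median:", median(race_finish))
print("Interval -> mean:", round(mean(temps_c), 2))
# Ratio data additionally supports the geometric mean and the coefficient
# of variation (stdev / mean), because it has a true zero.
print("Ratio    -> geometric mean:", round(geometric_mean(weights_kg), 2))
print("Ratio    -> coefficient of variation:", round(stdev(weights_kg) / mean(weights_kg), 3))
```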
Example of
ratio scale question
What is your weight in pounds?
Less than 70
70 – 120
121 – 150
More than 150
What is your age?
Less than 20
20 – 30
31 – 40
41 – 50
More than 50
Question No: 4
What are the considerations in conducting a parent-teacher conference?
Answer:
Conducting
Parent-Teacher Conferences:
The first conference is usually arranged at
the beginning of the school year to allow parents and teachers to get
acquainted and to prepare a plan for the coming months. Teachers usually receive
some training to plan and conduct such conferences. The following steps may be
observed for holding effective parent-teacher conferences.
1.
Prepare
for the conference
a)
Review the goals and objectives
b)
Organize the information to present
c)
If portfolios are to be discussed, ensure they are
well arranged
d)
Start
with and keep a positive focus
e)
Announce the final date and time as per the convenience of the parents and children
f)
Consider socio-cultural barriers of students / parents
g)
Check with other staff who work with your advisee
h)
Develop a conference packet including the student's goals, samples of work, and
reports or notes from other staff.
2.
Rehearse the conference with students by
role-playing
a)
Students present their goals, learning
activities, samples of work
b)
Students ask for comments and suggestions from
parents
3.
Conduct
the conference with student, parent, and advisor. The advisee
takes the lead to the greatest possible extent.
a)
Have a comfortable setting of chairs, tables
etc.
b)
Notify parents of a
viable timetable for the conferences
c)
Review
goals set earlier
d)
Review progress towards goals
e)
Review
progress with samples of work from learning activities
f)
Present
the student's strong points first
g)
Review
attendance and handling of responsibilities at school and home
h)
Modify goals for balance of the year as
necessary
i)
Determine other learning activities to
accomplish goals
j)
Describe
upcoming events and activities
k)
Discuss
how the home can contribute to learning
l)
Parents should be encouraged to share their thoughts
on students’ progress
m)
Ask
parents and students for questions, new ideas
4.
Do’s of
parent-teacher conferences
a)
Be friendly
b)
Be
honest
c)
Be
positive in approach
d)
Be willing to listen and explain
e)
Be willing
to accept parents’ feelings
f)
Be
careful about giving advice
g)
Be professional and maintain a positive
attitude
h)
Begin
with student’s strengths
i)
Review student’s cumulative record prior to
conference
j)
Assemble
samples of student’s work
k)
List questions to ask parents and anticipate
parents’ questions
l)
Conclude
the conference with an overall summary
m)
Keep a
written record of the conference, listing problems and suggestions, with a copy
for the parents
5.
Don’ts
of the parent teacher conference
a)
Don’t argue
b)
Don’t
get angry
c)
Don’t
ask embarrassing questions
d)
Don’t
talk about other students, parents and teachers
e)
Don’t
bluff if you don’t know
f)
Don’t
reject parents’ suggestions
g)
Don’t blame parents
h)
Don’t
talk too much; be a good listener (www.udel.edu.)
Activities
Activity 1: Enlist three pros and
cons of test scores.
Activity 2: Give a self-explanatory example of
each of the types of test scores.
Activity 3: Write down the different purposes
and functions of test scores in order of importance as per your experience. Add
as many more purposes as you can.
Activity 4: Compare the modes of reporting test
scores to parents by MEAP and NCCA. Also conclude which is relatively more
appropriate in the context of Pakistan as per your point of view.
Activity 5: In view of the strengths and
shortcomings in above different grading and reporting systems, how would you
briefly comment on the following characteristics of a multiple grading and
reporting system for effective assessment of students’ learning? a) Grading and reporting system should be
guided by the functions to be served. b) It should be developed cooperatively
by parents, students, teachers, and other school personnel. c) It should be
based on clear and specific instructional objectives. d) It should be
consistent with school standards. e) It should be based on adequate assessment.
f) It should provide detailed information of student’s progress, particularly
diagnostic and practical aspects. g) It should have the space of conducting
parent-teacher conferences.
Activity 6:
Explain the differences between relative grading and absolute grading by
giving an example of each.
Activity 7: Faiza Shaheen, a student of MA
Education (Secondary), has earned the following marks, grades and GPA in the 22
courses at the Institute of Education & Research, University of the Punjab.
Calculate her CGPA (a minimal calculation sketch follows these activities). Note that the maximum value of GPA in each course is
4.
Activity 8: Write Do’s and Don’ts in order of priority as per your perception. You may add more points or exclude what has been mentioned above.
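A minimal sketch of the CGPA calculation asked for in Activity 7. The course GPAs below are invented placeholders (Faiza's actual marks are not reproduced here), and the sketch assumes all 22 courses carry equal credit; with unequal credit hours each GPA would be weighted by its credits.

```python
# 22 invented course GPAs, each out of a maximum of 4.0.
course_gpas = [3.7, 3.3, 4.0, 3.0, 3.7, 3.3, 2.7, 3.7, 4.0, 3.3, 3.0,
               3.7, 3.3, 3.0, 4.0, 3.7, 3.3, 3.7, 3.0, 3.3, 3.7, 4.0]

# With equal-credit courses, CGPA is simply the mean of the course GPAs.
cgpa = sum(course_gpas) / len(course_gpas)
print(f"CGPA = {cgpa:.2f} out of 4.00")
```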
Question No: 5
Write a note on the advantages and disadvantages of criterion-referenced testing.
Answer:
Criterion-referenced
test:
A criterion-referenced test is designed to
measure how well test takers have mastered a particular body of knowledge. The
term "criterion-referenced test" is not part of the everyday
vocabulary in schools, and yet, nearly all students take criterion-referenced
tests on a routine basis. These tests generally have an established
"passing" score. Students know what the passing score is and an
individual's test score is determined by knowledge of the course material.
It is important to distinguish between criterion-referenced
tests and norm-referenced tests. The standardized tests used to measure how
well an individual does relative to other people who have taken the test are
norm-referenced.
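The distinction can be illustrated with a small sketch: the same set of invented scores is interpreted once against a fixed passing criterion and once against the other test takers. The 70% criterion and the student names are assumptions for illustration only.

```python
scores = {"Ali": 85, "Sana": 64, "Bilal": 72, "Hina": 91, "Omar": 58}  # invented
passing_score = 70

# Criterion-referenced interpretation: each student is judged against the fixed criterion.
for name, score in scores.items():
    print(name, "mastered" if score >= passing_score else "not yet mastered")

# Norm-referenced interpretation: the same scores are judged against the group,
# here as the percentage of peers each student outperformed.
values = sorted(scores.values())
for name, score in scores.items():
    percentile = 100 * sum(v < score for v in values) / (len(values) - 1)
    print(name, f"outperformed {percentile:.0f}% of the group")
```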
Advantages
of Criterion-Referenced Tests:
The following are the major advantages of
criterion-referenced tests:
1.
First, students are only tested on their
knowledge of specific goals or standards. For example, if you had taught a
lesson on adding fractions, you would give the student a test on adding
fractions. If he or she scores 85%, that means that particular student has
learned 85% of that goal. If a student does not score particularly well, then
the teacher can adjust their instruction accordingly.
2.
Another benefit is that if students
do not seem to master a particular standard, the teacher will be able to go back
and teach that standard again until the student performs better. Let’s say that
we taught a lesson on Fahrenheit and Celsius. A student understands Fahrenheit,
as shown on an assessment, but their knowledge of Celsius isn’t so good. The
teacher then can go back and teach Celsius again. In special education it is
nice because we have the freedom to spend more time on specific content and not
worry so much about meeting the state standards.
3.
For special educators we have to
focus our teaching based on the students’ IEP’s. Being able to focus our
instruction based on the students’ needs is another benefit of
criterion-referenced assessment. The students need to make progress toward
their annual goals and objectives and the use of this type of assessment allows
for that because again their scores are compared only to how they perform.
4.
Another good reason for using
criterion-referenced assessments in special education is that they only test
students on what they can do. Tests like the SATs, which are norm-referenced,
score students in relation to how they score against other people. For students
with special needs, norm-referenced assessments do not tell teachers much about
their abilities because the material is higher than their level.
5.
Criterion-referenced assessments
are needs-based, meaning the tests are created around the students’ needs.
If a student really needs to improve their knowledge of proper nouns, then
a test will be created on proper nouns.
6.
Teachers can also create their own
tests, which are criterion-referenced as well. Tests that come with
textbooks are also criterion-referenced because they only test on specific
areas of knowledge.
7.
When discussing the advantages of
criterion-referenced tests, it is also important to mention that since students
are only judged against themselves, they have a better chance of scoring high,
which will help improve their self-esteem as well. Studies show that students
with special needs tend to have lower self-esteem. Any way that we can help
students feel better about themselves is a great opportunity.
8.
One thing to remember is that each
student is an individual and is different. By using criterion-referenced
assessments in your classroom, you can meet the individual needs of the
students and differentiate your assessments with the sole purpose of helping
the students achieve to their fullest potential.
Disadvantages of Criterion-Referenced Tests:
Criterion-referenced tests have some built-in
disadvantages. Creating tests that are both valid and reliable requires fairly
extensive and expensive time and effort. In addition, results cannot be
generalized beyond the specific course or program. Such tests may also be
compromised by students gaining access to test questions prior to exams.
Criterion-referenced tests are specific to a program and cannot be used to
measure the performance of large groups.
Although these assessments are becoming more
popular in the special education field they do have some drawbacks. These include:
1.
It does not allow for comparing the
performance of students in a particular location with national norms. For
example, a school would be unable to compare 5th grade achievement levels in a
district, and therefore be unable to measure how a school is performing against
other schools.
2.
It is time-consuming and complex to
develop. Teachers will be required to find time to write a curriculum and
assessments with an already full work-load. It might require more staff to come
in and help.
3.
It costs a lot of money, time and
effort. Creating a specific curriculum takes time and money to hire more
staff; and most likely the staff will have to be professionals who have
experience.
4.
It needs efficient leadership and
collaboration, and lack of leadership can cause problems - for
instance, if a school is creating assessments for special education students
with no well-trained professionals, they might not be able to create
assessments that are learner-centered.
5.
It may slow the process of curriculum
change if tests are constantly changed. It is difficult for curriculum
developers to know what is working and what is not working because tests tend
to be different from one school to another. It would require years of
collecting data to know what is working and what is not.
Despite its flaws, criterion-referenced
assessments will still be important in special education because comparing
scores of students with special needs to average students will not achieve much
in measuring the student’s current level of performance.