FAQ - Top 10 Test Topics
1. Why use tests when you can make judgments without them?
2. What are the three most important characteristics of a good test?
3. What does "reliability" mean?
4. What does "validity" mean, when used to describe a test?
5. What makes a test "useful"?
6. What is a normal distribution?
7. What is a percentile score?
8. What is a T-Score?
9. What is a correlation coefficient?
10. How can I learn about the reliability, validity and usefulness of the tests offered at this web site?
1. Why use tests when you can make judgments without them?
Well-designed tests have several advantages over human judgment based on
less objective information. Tests can more quickly and accurately measure
many traits. School teachers use tests to measure how much students learn,
as it would be too time-consuming for the teacher to interview each student
about the subject matter. Also, human judgment is often biased by factors of
which the person judging is unaware. For example, people tend to
overestimate their intelligence levels, perhaps because being "stupid" is
grounds for social rejection. When we rate another person's personality
traits, as when assessing job applicants, we tend to give people higher
ratings if they are of our same ethnic background or if we see them as
handsome or beautiful. We do this without being aware of it. Well-designed tests
are not contaminated by such biases. Thus, they provide more accurate and
fair measures.
Example: In lectures on this topic I ask a group of people to imagine they
are screening applicants for a job that requires past national government
leadership experience, a college education, high intelligence and good
control of personal sexual feelings on the job. I ask them to rate a
candidate on two of these traits: verbal intelligence and personal feeling
control. They are to use a scale from 1 (low) to 10 (high). The job
candidate I have them rate is William J. Clinton, ex-President of the United
States. The ratings for intelligence that I get typically range from about 3
to 10. For "personal feelings control" they range from 1 to 7. Clinton was a
Rhodes scholar and almost certainly had verbal intelligence above the 90th
percentile (above an I.Q. of 120). Therefore, on a reliable test of verbal
intelligence he would earn a score equivalent to "10" on our rating scale. A
good test of sexual feeling control on the job might have given a score of
4. The test scores would vary little from one examination to another, being
more reliable than
simple human judgment which, in this example, varies widely, from 3 to 10 and
from 1 to 7. For most raters, Clinton's sexual indiscretions probably seemed
"stupid". This impression probably lowers their estimates of his
intelligence, distorting their rating of him on this trait. To be fair to
Mr. Clinton in a hiring situation, tests would be more appropriate than the
judgment of one or another of our raters.
2. What are the three most important characteristics of a good test?
Reliability, validity and usefulness.
3. What does "reliability" mean?
Test reliability is the accuracy with which a given score for a given
individual person measures the trait in question. For a score to be reliable
the person must take the test carefully and conscientiously. As far as the
test itself is concerned, good reliability can be assured if the test
questions are carefully written and there are enough of them. For some
traits, 30 questions are desirable. For other traits, only 6 or 10 questions
are enough. For gender, age, years of education and high school grade point
average, only one question each is enough.
If we wish to measure several aspects of a trait, such as verbal, spatial
and memory aspects of intelligence, we may need 30 questions for each
aspect. If we plan to score a test using different norms for each of several
age levels, then more than 30 questions may be necessary to obtain reliable
measures for the full range of the trait at each age level.
Well-designed tests include items which have been carefully crafted and have
passed one or more statistical tests to assure that they are contributing
well to the total test score.
Reliability is indicated by a statistic, such as an "alpha coefficient".
A reliability of .70 is sometimes adequate, .80 is good, and .90 or above is
excellent.
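To illustrate how such a statistic is obtained, here is a minimal Python sketch of the usual alpha coefficient formula (Cronbach's alpha), together with the Spearman-Brown formula, which estimates how reliability changes when a test is made longer. The function names and the answer data are hypothetical, made up for illustration; they do not come from any test at this web site.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Alpha coefficient for a table of answers.

    responses: one row per person, one column per test question.
    """
    k = len(responses[0])                      # number of questions
    columns = list(zip(*responses))            # answers grouped by question
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

def spearman_brown(reliability, length_factor):
    """Estimated reliability if the test were length_factor times as long."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical data: 5 persons answering 4 questions on a 1-5 scale.
answers = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

print("alpha:", round(cronbach_alpha(answers), 2))
print("reliability .70 after doubling test length:",
      round(spearman_brown(0.70, 2), 2))
```

The second function shows why the number of questions matters: under the Spearman-Brown assumptions (added questions of the same quality as the existing ones), a longer test gives a higher estimated reliability.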
4. What does "validity" mean, when used to describe a test?
Test validity is the accuracy with which a test measures the trait it claims
to measure. The test must first have adequate reliability, as described
above. To be valid, the content of the test questions should look right; a test
claiming to measure arithmetic addition skills should consist of addition
problems, not subtraction or division problems. A test of the personality
trait of Extroversion should contain items about social interactions with
people, not feelings of depression or anxiety. Another way to document
validity is to see if scores on the test are concurrently related as
expected to other information, such as scores on other tests that are
trusted to measure the same thing and other information to which the trait
is related. For example, verbal intelligence is known to be positively
related to school grades; persons with higher intelligence tend to get
higher grades. Therefore, any test of verbal intelligence should show such a
positive relationship.
5. What makes a test "useful"?
A test is useful if it helps someone make decisions more
effectively than without it. Tests are found useful for measuring progress in
school classes, how much teenagers and adults know about state driving rules,
and how much intelligence and background knowledge applicants to college and
the armed services have. They are found useful by
employers when hiring for private industry and by the government, such as
the Postal Service, which uses the Civil Service Examination to screen
postal worker applicants.
Some tests are more useful than others. For example, one test for depression
may be quite reliable and valid but provide only one score for overall
depression. Another test may also provide separate scores for aspects of
depression, such as suicidal tendencies and personal problem areas.
The second test may be more useful because it provides this added detail.
I built my test for depression to provide many scores for
separate aspects of depression, including suicidal tendencies and causes. I
find this test more useful because it provides more information important
when assessing depressed clients.
6. What is a normal distribution?
A normal distribution is a pattern of test scores arranged from lowest to
highest. It shows the frequency of scores at each level. A normal
distribution of scores is highest in the middle and tapers smoothly to each
end, in a bell shape. Most complex biological and psychological traits are
normally distributed. Height, weight, intelligence, Extroversion, depression
and business management aptitude are all examples. These traits are all
"complex" in that the underlying factors contributing to them are numerous.
For example, many facets make up intelligence as a global trait describing a
person's aptitude for understanding and solving problems in general.
Some psychological traits are not normally distributed, but are skewed to
one side. Homicide endorsement is an example. Most persons get very low
scores on a measure of this trait; most persons do not endorse murder as a
way to solve personal problems. A few persons do, and their scores trail off
in a rather thin stream to the right of the majority in a typical frequency
distribution graph or chart.
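The difference between a bell-shaped and a skewed distribution can be sketched with a small Python simulation. It rests on assumptions made only for illustration: the "complex" trait is modeled as the sum of 40 small independent factors, and the skewed trait is modeled with an exponential distribution as a stand-in. Neither model comes from real test data. Skewness is near 0 for a symmetric bell shape and clearly positive when scores trail off to the right.

```python
import random
from statistics import fmean, pstdev

def skewness(scores):
    """Skewness of a list of scores: about 0 if symmetric, positive if
    the scores trail off to the right."""
    mean = fmean(scores)
    sd = pstdev(scores)
    return fmean([((s - mean) / sd) ** 3 for s in scores])

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical complex trait: each score is the sum of 40 small,
# independent contributing factors (an assumption for illustration).
complex_trait = [sum(rng.random() for _ in range(40)) for _ in range(5000)]

# Hypothetical skewed trait: most persons score near zero and a few score
# high. The exponential distribution is only a stand-in.
skewed_trait = [rng.expovariate(1.0) for _ in range(5000)]

print("complex trait skewness:", round(skewness(complex_trait), 2))
print("skewed trait skewness:", round(skewness(skewed_trait), 2))
```

In this sketch the first value comes out close to zero and the second is well above zero, matching the bell shape and the thin right-hand tail described above.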
7. What is a percentile score?
A percentile score is a standard score
which tells where a given raw score falls relative to other persons who have
taken the test. Percentile scores range from 1 to 100. If on a test of 30
questions a raw score of 16 is at the 50th percentile, then 50 out of 100
people who take the test are likely to get raw scores of 15 or lower and the
rest 16 or higher. A percentile score of 90 means 90 of 100 persons who
have taken the test have gotten raw scores lower than the one corresponding
to the 90th percentile. Percentile scores help tested persons understand
what their test scores mean by telling them how they did on the test
compared to other people.
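As a minimal sketch of this idea in Python, the function below counts the share of a comparison group that scored lower than a given raw score. The function name and the ten raw scores are hypothetical, chosen only so that the results mirror the 50th and 90th percentile examples above; a real test would use a much larger norm group.

```python
def percentile_rank(raw_score, norm_scores):
    """Percent of persons in the norm group who scored below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)

# Hypothetical raw scores of 10 persons on a 30-question test.
norm_group = [8, 11, 13, 15, 15, 16, 18, 20, 22, 25]

print(percentile_rank(16, norm_group))  # 5 of 10 scored lower -> 50.0
print(percentile_rank(25, norm_group))  # 9 of 10 scored lower -> 90.0
```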
8. What is a T-Score?
A T-score is another standard score, like the percentile score in some
respects. T-scores are typically set with a mean (average) of 50 and a
standard deviation of 10, so that about two thirds of all scores fall
between 40 and 60. T-scores can also be set with a mean of 50 and two
thirds of scores falling between 22 and 78. In this system, most T-scores
will fall between 1 and 100, approximating percentile scores. T-scores are
more appropriate than percentile scores for research purposes, so they are
often included in test reports.
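The conversion from a raw score to a T-score can be sketched in a few lines of Python. The sketch assumes the usual formula, T = 50 + spread x (raw score - group mean) / group standard deviation, where a spread of 10 places about two thirds of scores between 40 and 60. The spread of 28 for the wider system is inferred here from the 22 to 78 range; the function name and the raw scores are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

def t_score(raw_score, norm_scores, spread=10):
    """Convert a raw score to a T-score with mean 50 and the given spread."""
    z = (raw_score - fmean(norm_scores)) / pstdev(norm_scores)
    return 50 + spread * z

# Hypothetical raw scores (mean 15, standard deviation 2).
norm_group = [12, 14, 14, 14, 15, 15, 17, 19]

print(t_score(15, norm_group))             # an average raw score -> 50.0
print(t_score(17, norm_group))             # one deviation above -> 60.0
print(t_score(17, norm_group, spread=28))  # same score, wider system -> 78.0

# Share of a normal distribution with mean 50 and spread 10 that lies
# between 40 and 60: roughly two thirds.
bell = NormalDist(mu=50, sigma=10)
print(round(bell.cdf(60) - bell.cdf(40), 2))
```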
9. What is a correlation coefficient?
It is a statistic widely used in psychological research, including test
design. It shows the degree of relationship between two measures. It can
range from -1.00 to +1.00. If high scores on one measure (e.g. intelligence)
are associated with high scores on the other measure (e.g. school grades),
then the correlation is positive, e.g. .52. If high scores on one measure
are associated with low scores on the other, then it is negative, e.g. -.68
between a measure of warmongering disposition and intelligence. When
correlations are so high that they are very unlikely to have occurred by
chance alone, then researchers can be confident that the two traits are
significantly related to each other and advise persons to make decisions
based on the test scores for those traits. If correlations are based on very
large samples of persons, e.g. 100 or more, then correlations even as low as
.20 can be significant (not due to chance) and provide valuable information.
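The following Python sketch shows how a correlation coefficient (Pearson's r) is computed, and why sample size matters for significance. The significance check uses the Fisher z approximation, chosen here only because it needs nothing beyond the standard library; it is not necessarily the method used for the tests at this web site. The function names and the small data lists are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = fmean(xs), fmean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

def approx_p_value(r, n):
    """Approximate two-sided chance probability of a correlation r found in
    a sample of n persons (Fisher z approximation, requires n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical scores for five persons on two measures.
test_scores = [10, 12, 15, 18, 20]
grades = [2.1, 2.8, 2.6, 3.4, 3.7]
print("r =", round(pearson_r(test_scores, grades), 2))

# The same correlation of .20 in a small and in a larger sample.
print("p for r = .20, n = 30: ", round(approx_p_value(0.20, 30), 3))
print("p for r = .20, n = 100:", round(approx_p_value(0.20, 100), 3))
```

With 30 persons, a correlation of .20 could easily arise by chance; with 100 persons the approximate chance probability drops just below the conventional .05 level.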
10. How can I learn about the reliability, validity and usefulness of the tests offered at this web site?
You may read the description of each test in the Products section. If you are a professional,
you can register as one and then read the manual for each test on the web site.