TestMaster, Inc. | FAQ & Help

Testing Information

The Product Content

The Product Delivered

Fees

Validation Studies

Validation studies are valuable for two primary reasons. The first is to check the validity of the test battery for screening job applicants in a given company. This sort of study can be done by having 40 current employees who do a given job take a battery of the tests deemed appropriate for that job. For example, the secretary battery discussed immediately above would be appropriate for testing 40 current secretaries. These secretaries are rated by HR personnel on their job performance using in-house performance data. Dr. McConochie can help the department develop rating forms to provide reliable and valid quantitative measures of this in-house data. Then Dr. McConochie would run correlations between the obtained test scores and the in-house performance data. Ideally, there will be significant correlations between the test scores and this data. Such correlations are evidence for the validity of the battery as a screening procedure. This testing process can be conducted before deciding whether or not to use the battery to screen job applicants. Thus, testing 40 secretaries at $60 each, plus a few hundred dollars for the research project, would yield information on which to base a confident decision about the potential value of the testing process for screening job applicants. In this example, for about $3,000 a customer can have objective information on which to base a decision about using the test battery.
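The correlation step in such a study can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in an actual TestMaster study: the function names are hypothetical, the scores and ratings are made-up numbers for eight employees (a real study would use about 40), and significance is judged by comparing a t value against the approximate two-tailed .05 cutoff for 38 degrees of freedom.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def t_statistic(r, n):
    """t value for testing whether a correlation r from n people differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical data: test scores and HR performance ratings for 8 employees.
test_scores = [52, 61, 47, 70, 58, 66, 43, 75]
ratings = [3.1, 3.8, 2.9, 4.0, 3.2, 4.4, 2.5, 4.1]

r = pearson_r(test_scores, ratings)
print(f"r = {r:.2f}")

# With 40 employees (38 degrees of freedom), a t above roughly 2.02
# corresponds to p < .05, two-tailed.
print(f"t for r = 0.35, n = 40: {t_statistic(0.35, 40):.2f}")
```

Under these assumptions, a correlation of .35 from 40 employees clears the cutoff, while a correlation of .30 falls just short of it.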

The second reason is that such a study addresses issues raised by federal regulations about the use of job screening procedures of all sorts, including interviews, background checks, etc. Federal regulations require that companies keep a record of the possible adverse impact of their various hiring procedures. If a given procedure tends to result in a higher rate of hiring for majority candidates, such as Caucasian/White males, than for minority group members, such as women, African-Americans, Mexican-Americans or elders, then the company can continue using the hiring procedure, whether interviewing, background checks or a test battery, as long as it can document that the procedure is valid in its setting. Validity is established by significant correlations between scores on the hiring procedure and in-house performance data. In addition, a company using hiring procedures that have adverse impact is obligated to continue looking for equally valid procedures that have less adverse impact. Because decades of research have shown that good psychological test measures are the most valid way of screening job applicants, and because the tests offered in the current TestMaster, Inc. product have good reliability and high face validity, the chances are that the TestMaster battery of tests will be as good as or better than any other batteries the company may consider.
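The comparison of hiring rates between groups can be sketched as follows. The hiring counts and function names are hypothetical. The 0.8 threshold is an assumption added for illustration: it is the "four-fifths" rule of thumb commonly cited from the federal Uniform Guidelines on Employee Selection Procedures, which the text above does not name. Treat the result as a prompt for closer review, not a legal determination.

```python
def selection_rate(hired, applicants):
    """Fraction of applicants in a group who were hired."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return hired / applicants

def impact_ratio(rate_group, rate_reference):
    """Hiring rate of one group relative to the reference (highest-rate) group."""
    return rate_group / rate_reference

# Hypothetical hiring counts, for illustration only.
majority_rate = selection_rate(hired=30, applicants=100)  # 0.30
minority_rate = selection_rate(hired=12, applicants=60)   # 0.20

ratio = impact_ratio(minority_rate, majority_rate)

# Assumption: flag ratios below 0.8 (the "four-fifths" rule of thumb).
FOUR_FIFTHS = 0.8
flagged = ratio < FOUR_FIFTHS
print(f"impact ratio = {ratio:.2f}, flag for review: {flagged}")
```

A flagged procedure is exactly the case in which the validity evidence described above becomes important.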

Frequently Asked Questions

1. Why use tests when you can make judgments without them?

Well-designed tests have several advantages over human judgment based on less objective information. Tests can measure many traits more quickly and accurately. School teachers use tests to measure how much students learn, as it would be too time-consuming for the teacher to interview each student about the subject matter. Also, human judgment is often biased by factors of which the person judging is unaware. For example, people tend to overestimate their intelligence levels, perhaps because being "stupid" is grounds for social rejection. When we rate another person's personality traits, as when assessing job applicants, we tend to give people higher ratings if they share our ethnic background or if we see them as handsome or beautiful. We do this without being aware of it. Well-designed tests are not contaminated by such biases. Thus, they provide more accurate and fair measures.

Example: In lectures on this topic I ask a group of people to imagine they are screening applicants for a job that requires past national government leadership experience, a college education, high intelligence and good control of personal sexual feelings on the job. I ask them to rate a candidate on two of these traits: verbal intelligence and personal feeling control. They are to use a scale from 1 (low) to 10 (high). The job candidate I have them rate is William J. Clinton, ex-President of the United States. The ratings for intelligence that I get typically range from about 3 to 10. For "personal feelings control" they range from 1 to 7. Clinton was a Rhodes scholar and almost certainly had verbal intelligence above the 90th percentile (above an I.Q. of 120). Therefore, on a reliable test of verbal intelligence he would earn a score equivalent to "10" on our rating scale. A good test of sexual feeling control on the job might have given a score of 4. The test scores would not vary from one examination to another, being more reliable than simple human judgment, which in this example varies widely, from 3 to 10 and from 1 to 7. For most raters, Clinton's sexual indiscretions probably seemed "stupid". This impression probably lowers their estimates of his intelligence, distorting their rating of him on this trait. To be fair to Mr. Clinton in a hiring situation, tests would be more appropriate than the judgment of one or another of our raters.

2. What are the three most important characteristics of a good test?

Reliability, validity and usefulness.

3. What does "reliability" mean?

Test reliability is the accuracy with which a given score for a given individual person measures the trait in question. For a score to be reliable the person must take the test carefully and conscientiously. As far as the test itself is concerned, good reliability can be assured if the test questions are carefully written and there are enough of them. For some traits, 30 questions are desirable. For other traits, only 6 or 10 questions are enough. For gender, age, years of education and high school grade point average, only one question each is enough.

If we wish to measure several aspects of a trait, such as verbal, spatial and memory aspects of intelligence, we may need 30 questions for each aspect. If we plan to score a test using different norms for each of several age levels, then more than 30 questions may be necessary to obtain reliable measures for the full range of the trait at each age level.

Well-designed tests include items which have been carefully crafted and have passed one or more statistical tests to assure that they are contributing well to the total test score.

Reliability is indicated by a statistic, such as an "alpha coefficient". A reliability of .70 is sometimes adequate, .80 is good, and .90 or above is excellent.
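One common alpha coefficient is Cronbach's alpha, which compares the variance of the individual items with the variance of the total score. The sketch below assumes that is the statistic meant here. The answer data and function names are hypothetical, and the labels simply restate the rough levels given above.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of people, each a list of item scores."""
    k = len(responses[0])          # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def describe_reliability(alpha):
    """Map a coefficient onto the rough labels used in the text above."""
    if alpha >= 0.90:
        return "excellent"
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "sometimes adequate"
    return "below the levels listed"

# Hypothetical answers: 5 people, 4 items scored 1-5.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f} ({describe_reliability(alpha)})")
```

Alpha rises when people who score high on one item also tend to score high on the others, which is why carefully written items and a sufficient number of them both help.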

4. What does "validity" mean, when used to describe a test?

Test validity is the accuracy with which a test measures the trait it claims to measure. The test must first have adequate reliability, as described above. To be valid, the content of the test questions should look right; a test claiming to measure arithmetic addition skills should consist of addition problems, not subtraction or division problems. A test of the personality trait of Extroversion should contain items about social interactions with people, not feelings of depression or anxiety. Another way to document validity is to see whether scores on the test are related as expected to other information collected at the same time, such as scores on other tests that are trusted to measure the same thing and other variables to which the trait is known to be related. For example, verbal intelligence is known to be positively related to school grades; persons with higher intelligence tend to get higher grades. Therefore, any test of verbal intelligence should show such a positive relationship.

5. What makes a test "useful"?

A test is useful if it helps someone make decisions more effectively than without it. Tests are found useful for measuring progress in school classes, how much teenagers and adults know about state driving rules, and how much intelligence and background knowledge applicants to college and the armed services have. Employers in private industry find them useful when hiring, as does the government; the Postal Service, for example, uses the Civil Service Examination to screen postal worker applicants.

Some tests are more useful than others. For example, one test for depression may be quite reliable and valid but provide only one score for overall depression. Another test may also provide separate scores for aspects of depression, such as suicidal tendencies and personal problem areas. The second test may be more useful because it provides this added detail. I built my test for depression to provide many scores for separate aspects of depression, including suicidal tendencies and causes. I find this test more useful because it provides more of the information that matters when assessing depressed clients.

6. What is a normal distribution?

A normal distribution is a pattern of test scores arranged from lowest to highest. It shows the frequency of scores at each level. A normal distribution of scores is highest in the middle and tapers smoothly to each end, in a bell shape. Most complex biological and psychological traits are normally distributed. Height, weight, intelligence, Extroversion, depression and business management aptitude are all examples. These traits are all "complex" in that the underlying factors contributing to them are numerous. For example, many facets make up intelligence as a global trait describing a person's aptitude for understanding and solving problems in general.
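The link between "numerous underlying factors" and the bell shape can be illustrated with a small simulation. This is an idealized sketch, not a model of any real trait: it assumes each person's score is the sum of 20 independent, equally weighted random factors, and the function names and numbers are hypothetical.

```python
import random

def simulate_trait(n_people, n_factors, seed=0):
    """Score each simulated person as the sum of many small independent factors."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_factors)) for _ in range(n_people)]

def frequency_table(scores, n_bins, low, high):
    """Count how many scores fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for s in scores:
        index = min(int((s - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

scores = simulate_trait(n_people=5000, n_factors=20)
counts = frequency_table(scores, n_bins=10, low=0, high=20)

# Print a rough text histogram: one '#' per 50 people.
for i, count in enumerate(counts):
    print(f"{2 * i:>2}-{2 * i + 2:<2} {'#' * (count // 50)}")
```

The printed bars pile up in the middle bins and thin out toward both ends, which is the bell shape described above.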

Some psychological traits are not normally distributed, but are skewed to one side. Homicide endorsement is an example. Most persons get very low scores on a measure of this trait; most persons do not endorse murder as a way to solve personal problems. A few persons do, and their scores trail off in a rather thin stream to the right of the majority in a typical frequency distribution graph or chart.
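Skew can be expressed as a single number. The sketch below uses a standard skewness statistic (the average cubed standardized score), which is near zero for a symmetric distribution and positive when a thin tail runs to the right. The two score lists are hypothetical and are not data from any actual test.

```python
from statistics import mean, pstdev

def skewness(scores):
    """Population skewness: near 0 if symmetric, positive if the tail runs right."""
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores must not all be identical")
    return sum(((s - m) / sd) ** 3 for s in scores) / len(scores)

# Hypothetical scores, for illustration only.
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [0, 0, 0, 0, 0, 0, 1, 1, 2, 9]  # most low, a few high

print(f"symmetric:    {skewness(symmetric):.2f}")
print(f"right-skewed: {skewness(right_skewed):.2f}")
```

A clearly positive value corresponds to the pattern described above: most people score very low and a few scores trail off to the right.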

William A. McConochie, Ph.D.