Reliability and Validity in Testing: What Do They Mean?

Michael Callans

It is a common mistake to assume that the terms “reliability” and “validity” mean the same thing. While they are related, the two concepts are distinct. To clear up any confusion, I have defined each one below.

Reliability

Of the two terms, reliability is the simpler concept to explain and understand. When judging the reliability of a test, the key question is: are the results consistent? If I take the test today, a week from now, and a month from now, will my results be essentially the same?

If an assessment is reliable, your results will be very similar no matter when you take the test, provided the ability or trait being measured has not actually changed in the meantime. If the results are inconsistent, the test is not considered reliable.
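Consistency of this kind is often summarized as a test-retest reliability coefficient: the same people take the test twice, and the two sets of scores are correlated. The Python sketch below is illustrative only; the scores are invented for five hypothetical test takers, and `pearson_r` is a helper written here, not part of any testing product. A value near 1.0 suggests highly consistent results, while a value near 0 suggests the results are inconsistent.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five people on two administrations.
first_attempt = [72, 85, 90, 64, 78]
second_attempt = [70, 88, 91, 66, 75]

r = pearson_r(first_attempt, second_attempt)
print(f"test-retest r = {r:.2f}")
```

A correlation only checks that people keep roughly the same relative standing; a test that added the same number of points to everyone on the second attempt would still show a high r, so score levels are worth inspecting as well.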

Validity

Validity is a bit more complex because it is more difficult to establish than reliability. There are various ways to demonstrate that an assessment is valid, but in simple terms, validity refers to how well a test measures what it is supposed to measure.

Common approaches to determining the validity of an assessment include evaluating its content validity, criterion-related validity, and construct validity.

  • An assessment demonstrates content validity when the criteria it measures align with the content of the job. The extent to which that content is essential to job performance (rather than merely useful to know) also factors into how well the assessment demonstrates content validity.

    For example, the ability to type quickly would likely be considered a far more crucial part of the job for an executive secretary than for an executive. While the executive probably needs to type, that skill is not nearly as important to performing the job. Establishing content validity entails judging the degree to which test items and job content match each other.

  • An assessment demonstrates criterion-related validity if its results can be used to predict a facet of job performance. Determining whether an assessment predicts performance requires statistically evaluating assessment scores against a measure of employee performance.

    For example, an employer interested in how well an integrity test identifies individuals who are likely to engage in counterproductive work behaviors might compare applicants’ integrity test scores with how many accidents or injuries those individuals go on to have on the job, whether they use drugs on the job, or how many times they ignore company policies. The more accurately the assessment predicts such behaviors, the stronger its criterion-related validity.

  • An assessment demonstrates construct validity if its scores are related to scores on other assessments measuring the same psychological construct. A construct is a concept used to explain behavior (e.g., intelligence, honesty).

    For example, intelligence is a construct used to explain a person’s ability to understand and solve problems. Construct validity can be evaluated by comparing scores on one intelligence test with scores on other intelligence tests (e.g., comparing the Wonderlic Cognitive Ability Test with the Wechsler Adult Intelligence Scale).
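Criterion-related and construct validity evidence are both commonly expressed as correlation coefficients. The Python sketch below assumes entirely hypothetical data: integrity scores and first-year policy-violation counts for six hires (criterion-related), and scores for the same six people on two different intelligence tests, labeled only as test A and test B (construct). The numbers are invented for illustration and do not reflect the scoring or norms of any published test; `pearson_r` is a helper written here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Criterion-related validity (hypothetical data): do integrity scores
# predict a later measure of on-the-job behavior?
integrity_scores = [82, 75, 91, 60, 68, 88]
policy_violations = [1, 2, 0, 5, 3, 1]  # counted during each hire's first year
criterion_r = pearson_r(integrity_scores, policy_violations)

# Construct validity (hypothetical data): do two tests of the same construct
# rank the same people similarly, even though they use different score scales?
test_a = [22, 31, 18, 27, 35, 25]      # e.g., a raw-score scale
test_b = [98, 112, 92, 105, 121, 104]  # e.g., an IQ-style scale
construct_r = pearson_r(test_a, test_b)

print(f"criterion-related r = {criterion_r:.2f}")
print(f"construct (convergent) r = {construct_r:.2f}")
```

The criterion correlation here is negative because higher integrity scores go with fewer violations; for prediction, the strength of the relationship matters, not its sign. With only six people, coefficients like these would be far too unstable to support real hiring decisions; actual validation studies rely on much larger samples.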

Reliable and Valid?

The tricky part is that a test can be reliable without being valid. However, a test cannot be valid unless it is reliable: if scores fluctuate randomly from one administration to the next, they cannot consistently reflect the trait the test is meant to measure. An assessment can give you consistent results, making it reliable, but unless it measures what it is supposed to measure, it is not valid.
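The asymmetry can be illustrated with a small simulation. The Python sketch below assumes a hypothetical test that captures a stable trait very consistently, while job performance (also simulated) is generated independently of that trait. The test should therefore correlate strongly with itself across two administrations but barely at all with performance: reliable, yet not valid for predicting the job. All names, distributions, and noise levels are assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 1000

# Assumed model: each person has a stable trait that the test picks up...
trait = [rng.gauss(0, 1) for _ in range(n)]
# ...but job performance is simulated independently of that trait.
performance = [rng.gauss(0, 1) for _ in range(n)]

# Two administrations of the test: the same trait plus a little random error.
time_1 = [t + rng.gauss(0, 0.3) for t in trait]
time_2 = [t + rng.gauss(0, 0.3) for t in trait]

reliability = pearson_r(time_1, time_2)    # consistency across administrations
validity = pearson_r(time_1, performance)  # relationship to the criterion

print(f"test-retest r = {reliability:.2f}")
print(f"criterion r   = {validity:.2f}")
```

Raising the measurement-error standard deviation (0.3 here) well above 1 would push the test-retest correlation toward zero as well, which mirrors the other half of the point: scores that are mostly random error cannot correlate strongly with anything, including a job-performance criterion.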

What are the biggest questions you have surrounding reliability and validity? 

