Reliability is the consistency (or repeatability) of an instrument when it measures the same phenomenon more than once, for example over time or by different raters.
Example: Let’s say you want to develop a way to measure ADHD. You come up with a set of questions that you think will distinguish persons with ADHD from persons who do not have the disorder. You call your measure the ADHD test. You want it to be reliable in the following ways:
Test-Retest Reliability: the extent to which a measure gives consistent results when the same people take it at different times.
Since ADHD is considered trait-like (that is, it doesn’t go away in a few days like a cold), you should be able to give the ADHD test to children when they enter first grade and again at winter break and find that the same children still test positive for ADHD (assuming none are being treated for the disorder in the meantime).
Interrater Reliability: the extent to which two or more judges or raters agree on a given measure or assessment.
If the teacher and the psychologist both test a child with your ADHD test, they should both get the same result concerning that child.
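One common way to summarize agreement between two raters on a positive/negative result is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. The sketch below assumes each rater records one yes/no result per child; the ratings are invented.

```python
def cohens_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Cohen's kappa for two raters giving positive/negative results for the same children."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same, non-empty set of children")
    # Observed agreement: share of children the two raters classified the same way
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own rate of positive results
    pos_a = sum(rater_a) / n
    pos_b = sum(rater_b) / n
    p_chance = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters give every child the same result")
    return (p_observed - p_chance) / (1 - p_chance)


# Invented results for ten children (True = tests positive)
teacher = [True, True, False, False, True, False, False, True, False, False]
psychologist = [True, False, False, False, True, False, False, True, False, True]
kappa = cohens_kappa(teacher, psychologist)
print(f"Cohen's kappa = {kappa:.2f}")  # 0.58
```

Here the teacher and psychologist agree on 8 of 10 children (80%), but because each flags 4 of 10, about 52% agreement would be expected by chance alone, so kappa comes out noticeably lower than raw agreement.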
Internal Consistency Reliability: the degree to which different items of an assessment are related to each other.
If the items in your ADHD test correlate with each other, your test has internal consistency. However, if you have thrown in questions that measure something other than ADHD (such as items about food preferences or sensitivity to cold, which are not related to the ADHD construct), these items will not correlate well with other items and will bring down the internal consistency of your test.
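Internal consistency is often reported as Cronbach's alpha: alpha = k/(k - 1) * (1 - (sum of the item variances) / (variance of the total scores)), where k is the number of items. A minimal sketch, assuming each row holds one child's ratings on the items (0 to 3 here); the ratings are invented, and the fifth column stands in for a hypothetical off-construct item such as food preference.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha; each inner list is one respondent's scores on k items."""
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and the same number of items per respondent")
    # Variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of respondents' total scores
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("alpha is undefined when every respondent has the same total")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented ratings for five children; columns are items. The first four are
# ADHD-related; the fifth is a hypothetical unrelated item (food preference).
ratings = [
    [3, 3, 2, 3, 1],
    [0, 1, 0, 0, 2],
    [2, 2, 2, 1, 2],
    [1, 0, 1, 1, 0],
    [3, 2, 3, 3, 1],
]
alpha_adhd_only = cronbach_alpha([row[:4] for row in ratings])
alpha_with_extra = cronbach_alpha(ratings)
print(f"alpha, ADHD items only: {alpha_adhd_only:.2f}")       # 0.94
print(f"alpha, with unrelated item: {alpha_with_extra:.2f}")  # 0.84
```

Adding the unrelated item lowers alpha, which is the effect described above: an item that does not move with the others adds variance to the total without adding shared signal.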
Alternate Form Reliability: the extent to which individuals taking two somewhat different forms of a measure score similarly on both forms.
Let’s say your ADHD test is successful but clinicians tell you it takes too long to administer. So you decide to create a shortened version. Your short version should correlate well with the longer version in order to have alternate form reliability.
Validity is the ability of an instrument to measure what it claims to measure.
Content Validity: the extent to which a measure adequately samples various aspects of the construct of interest.
ADHD involves problems with sustained attention, impulsivity, and hyperactivity. Your ADHD test will have content validity to the extent that it includes items that address all these aspects of the ADHD construct.
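One simple way to make this concrete, assuming each item has been tagged with the facet it is meant to tap (the item wording and tags below are hypothetical), is to count items per facet and flag any facet with none. This checks coverage only; it does not show that the items measure their facets well.

```python
from collections import Counter

# The three aspects of ADHD named above
FACETS = ("sustained attention", "impulsivity", "hyperactivity")


def facet_coverage(items: dict[str, str]) -> tuple[Counter, list[str]]:
    """Count items per facet and list the facets that no item addresses."""
    counts = Counter(items.values())
    missing = [facet for facet in FACETS if counts[facet] == 0]
    return counts, missing


# Hypothetical draft item bank: item text -> facet it is meant to measure
draft_items = {
    "Often loses focus during long tasks": "sustained attention",
    "Has trouble finishing homework": "sustained attention",
    "Blurts out answers before questions are finished": "impulsivity",
}
counts, missing = facet_coverage(draft_items)
print(dict(counts))
print("facets with no items:", missing)
```

In this draft, hyperactivity has no items, so the test would lack content validity for that part of the construct until such items are added.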
Criterion Validity: the extent to which a measure agrees with some established measure of the construct being tested.
The criterion (or established gold standard) for your ADHD test would be the diagnosis rendered by a neuropsychologist who has conducted extensive testing on a child. If your ADHD test finds that ADHD is present in the same children that are identified by the long and expensive neuropsych testing, then you have achieved criterion validity.
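Agreement with a gold-standard diagnosis is commonly summarized by sensitivity (the share of children the neuropsychologist diagnosed whom your test also flags) and specificity (the share of undiagnosed children your test correctly clears). A minimal sketch with invented results for ten children:

```python
def sensitivity_specificity(test: list[bool], gold: list[bool]) -> tuple[float, float]:
    """Compare screening results with a gold-standard diagnosis, child by child."""
    if len(test) != len(gold):
        raise ValueError("need one test result and one diagnosis per child")
    pairs = list(zip(test, gold))
    tp = sum(t and g for t, g in pairs)              # flagged and diagnosed
    fn = sum((not t) and g for t, g in pairs)        # diagnosed but missed by the test
    tn = sum((not t) and (not g) for t, g in pairs)  # correctly cleared
    fp = sum(t and (not g) for t, g in pairs)        # flagged but not diagnosed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both diagnosed and undiagnosed children")
    return tp / (tp + fn), tn / (tn + fp)


# Invented results (True = ADHD present according to that source)
adhd_test = [True, True, False, True, False, False, True, False, False, False]
neuropsych = [True, True, True, True, False, False, False, False, False, False]
sens, spec = sensitivity_specificity(adhd_test, neuropsych)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")  # 75%, 83%
```

Both numbers matter: a test that flagged every child would have perfect sensitivity but no specificity, so it would not really agree with the gold standard.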
Internal Validity: the extent to which experimental results can be attributed to the manipulation of the independent variable. (Note that there are a number of threats to internal validity; see the next page.)
Say you want to determine whether your new treatment for ADHD works. You test a number of children for ADHD using your newly validated ADHD test, you give half of them a treatment, and then you test all the children again, expecting that the children who had the treatment will have improved scores on the ADHD test. If no factor other than your treatment could have been responsible for the improvement in scores, then your experiment has internal validity. However, there are many potential threats to internal validity (see the following section).
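Comparing the treated children with the untreated half is what separates the treatment's effect from change that would have happened anyway (for example, children maturing between the two testing sessions). A minimal sketch, assuming children were randomly assigned to the two groups and that lower ADHD-test scores mean fewer symptoms; the child labels and scores are invented:

```python
from statistics import mean


def mean_change(pre: dict[str, float], post: dict[str, float], group: list[str]) -> float:
    """Average post-minus-pre change in ADHD-test score for the children in `group`."""
    return mean(post[child] - pre[child] for child in group)


# Invented scores; lower is assumed to mean fewer symptoms
pre = {"A": 30, "B": 28, "C": 35, "D": 31, "E": 29, "F": 33}
post = {"A": 22, "B": 21, "C": 27, "D": 29, "E": 28, "F": 30}
treated, control = ["A", "B", "C"], ["D", "E", "F"]  # assumed randomly assigned

treated_change = mean_change(pre, post, treated)
control_change = mean_change(pre, post, control)
# The control group's change estimates what would have happened without treatment
effect = treated_change - control_change
print(f"treated: {treated_change:+.1f}, control: {control_change:+.1f}, difference: {effect:+.1f}")
```

Subtracting the control group's change removes influences that affect both groups equally; it does not by itself rule out every threat, and with real data you would also want a significance test before attributing the difference to the treatment.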
External Validity: the extent to which results can be generalized to other populations and settings.
You originally validated your ADHD test using suburban 5th graders. Now you need to find out whether it will generalize to inner-city 3rd graders, etc. If the test is equally reliable and valid with other samples of subjects, it has external validity.