THE JOURNAL OF EXPERIMENTAL EDUCATION
(Volume 35, Number 4, Summer 1967)
THE MAXIMUM RELIABILITY OF A MULTIPLE-
CHOICE TEST AS A FUNCTION OF NUMBER OF
ITEMS, NUMBER OF CHOICES, AND
GROUP HETEROGENEITY
GRAHAM J. BURKHEIMER
DONALD W. ZIMMERMAN
RICHARD H. WILLIAMS
East Carolina College
IN PREVIOUS papers (7, 8) it has been shown that chance success due to
guessing introduces an unavoidable source of error into multiple-choice
test scores. This particular class of error is negatively correlated with
true scores. The usual equations for test reliability and other
intercorrelations among components of test scores depend upon the
assumption that the correlations between true scores and error scores,
and between error scores on parallel forms of a test, are zero. In
previous papers (6, 8, 9, 10) more general equations for these
intercorrelation terms, which do not depend upon the above assumptions,
have been presented.
Because of the presence of chance success due to guessing, the
reliability of a multiple-choice test has a maximum value. In other
words, if all sources of error other than chance success due to guessing
were eliminated, the reliability of a test would remain at some value
less than unity because of the unavoidable error due to guessing. The
computer simulation method described previously (8) gave reliabilities
for several kinds of tests, under the assumption that only error due to
guessing is present. The purpose of this paper is to determine these
values using analytic methods. An equation for the maximum reliability
of a multiple-choice test, which involves only number of items, number
of choices, and mean and variance of true scores (group heterogeneity),
is derived.
Horst (2) derived equations indicating the maximum correlation between
two different tests. Beginning with these, Roberts (5) derived equations
for maximum reliability of a test. These results involve item
difficulties and are based on assumptions concerning intercorrelations
among items. The relation of number of alternative choices to test
reliability has also been investigated by Carroll (1), Lord (3), and
Plumlee (4). The present paper differs from these approaches to the
problem in that it does not involve item difficulties, but considers
only components of variance of test scores. It involves no assumptions
about intercorrelations among items and holds for the case in which
there is a negative correlation between true scores and error scores
introduced by guessing. The result is relatively simple in form.
VARIANCE OF ERROR SCORES AND OF
OBSERVED SCORES
When chance success due to guessing is the only source of error, the
error scores for those true scores having a fixed value, T, will
approach a binomial distribution as the number of cases considered
increases without limit. Therefore, we can write

[1]   $\bar{E}_T = \frac{\sum_{i=1}^{K_T} E_{Ti}}{K_T} = np,$

where $\bar{E}_T$ is the mean of the error scores for the true scores
having some fixed value, $E_{Ti}$ is an error score for one of these true
scores, and $K_T$ is the number of true scores having that particular
value.
Here np indicates the mean of a binomial distribution. Since n = N - T
and p = 1/a (7), we have

[2]   $\sum_{i=1}^{K_T} E_{Ti} = \frac{K_T N - K_T T}{a},$

where N is the total number of items and a is the number of choices per
item. Summing as the true score value varies from 0 to N, we can write

[3]   $\sum_{T=0}^{N} \sum_{i=1}^{K_T} E_{Ti} = \sum_{T=0}^{N} \frac{K_T N - K_T T}{a},$ or

[4]   $\sum E = \frac{KN - \sum T}{a},$

where $K = \sum_{T=0}^{N} K_T$ is the total number of scores. Equation
[4] gives, in other words, the sum of the error scores in the entire
distribution of test scores.
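The variance of error scores corresponding to the true scores for a
fixed value, T, is

[5]   $s_{E_T}^2 = \frac{\sum_{i=1}^{K_T} E_{Ti}^2}{K_T} - \left( \frac{\sum_{i=1}^{K_T} E_{Ti}}{K_T} \right)^2.$

As $K_T$ increases without limit this variance is also given by the
binomial formula, npq, where q = (1 - p), or (1 - 1/a). Therefore we can
write

[6]   $\frac{\sum_{i=1}^{K_T} E_{Ti}^2}{K_T} - \left( \frac{\sum_{i=1}^{K_T} E_{Ti}}{K_T} \right)^2 = \frac{N - T}{a} \left( 1 - \frac{1}{a} \right).$

Solving [6] for $\sum E_{Ti}^2$ gives

[7]   $\sum_{i=1}^{K_T} E_{Ti}^2 = \frac{\left( \sum_{i=1}^{K_T} E_{Ti} \right)^2}{K_T} + \frac{K_T N - K_T T}{a} \left( 1 - \frac{1}{a} \right).$

Substituting [2] in [7] gives

[8]   $\sum_{i=1}^{K_T} E_{Ti}^2 = \frac{1}{a} \left( 1 - \frac{1}{a} \right) [K_T N - K_T T] + \frac{1}{a^2} [K_T N^2 - 2 K_T N T + K_T T^2].$

Summing as the true score value varies from 0 to N,

[9]   $\sum_{T=0}^{N} \sum_{i=1}^{K_T} E_{Ti}^2 = \frac{1}{a} \left( 1 - \frac{1}{a} \right) \sum_{T=0}^{N} [K_T N - K_T T] + \frac{1}{a^2} \sum_{T=0}^{N} [K_T N^2 - 2 K_T N T + K_T T^2],$

leads to the following result:

[10]  $\sum E^2 = \frac{1}{a} \left( 1 - \frac{1}{a} \right) (KN - \sum T) + \frac{1}{a^2} (KN^2 - 2N \sum T + \sum T^2).$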
Total error variance is given by

[11]  $s_E^2 = \frac{\sum E^2}{K} - \frac{(\sum E)^2}{K^2}.$

Substituting [4] and [10] in [11] and reducing gives

[12]  $s_E^2 = \frac{1}{a^2} \left( \frac{\sum T^2}{K} - \frac{(\sum T)^2}{K^2} \right) + \frac{a - 1}{a^2} (N - \bar{T}),$ which can also be written as

[13]  $s_E^2 = \frac{s_T^2}{a^2} + \frac{a - 1}{a^2} (N - \bar{T}).$

In a similar manner it can be shown that the variance of observed scores
is given by the following equation:

[14]  $s_O^2 = \frac{(a - 1)^2}{a^2} s_T^2 + \frac{a - 1}{a^2} (N - \bar{T}).$

Equations [13] and [14], then, give the variance of error scores and
observed scores under the assumption that chance success due to guessing
is the only source of error. These variances are expressed as a function
of number of items, number of choices, and mean and variance of true scores.
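As a numerical check on [13] and [14], the following Python sketch computes the error and observed-score variances exactly for a small made-up distribution of true scores, under the same assumption that the error score for a fixed T is binomial with n = N - T and p = 1/a. The function names and the example distribution are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction
from math import comb


def exact_variances(true_counts, n_items, n_choices):
    """Limiting mean of T and variances of T, E, O when E | T ~ Binomial(N - T, 1/a).

    true_counts maps each true score T to its (hypothetical) frequency K_T.
    """
    p = Fraction(1, n_choices)
    total = sum(true_counts.values())
    # Running values of E[T], E[T^2], E[E], E[E^2], E[O], E[O^2].
    sums = [Fraction(0)] * 6
    for t, k in true_counts.items():
        n = n_items - t
        for e in range(n + 1):
            weight = Fraction(k, total) * comb(n, e) * p**e * (1 - p) ** (n - e)
            o = t + e
            for idx, value in enumerate((t, t * t, e, e * e, o, o * o)):
                sums[idx] += weight * value
    mean_t = sums[0]
    var_t = sums[1] - sums[0] ** 2
    var_e = sums[3] - sums[2] ** 2
    var_o = sums[5] - sums[4] ** 2
    return mean_t, var_t, var_e, var_o


def error_variance_eq13(n_items, n_choices, mean_t, var_t):
    """Right-hand side of equation [13]."""
    a = n_choices
    return var_t / a**2 + Fraction(a - 1, a**2) * (n_items - mean_t)


def observed_variance_eq14(n_items, n_choices, mean_t, var_t):
    """Right-hand side of equation [14]."""
    a = n_choices
    return Fraction((a - 1) ** 2, a**2) * var_t + Fraction(a - 1, a**2) * (n_items - mean_t)


# Hypothetical example: N = 10 items, a = 4 choices, four examinees.
counts = {2: 1, 5: 2, 8: 1}
mean_t, var_t, var_e, var_o = exact_variances(counts, 10, 4)
print(var_e == error_variance_eq13(10, 4, mean_t, var_t))     # True
print(var_o == observed_variance_eq14(10, 4, mean_t, var_t))  # True
```

Rational arithmetic is used so that the comparison is exact rather than subject to rounding.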
CORRELATION BETWEEN ERROR SCORES ON PARALLEL FORMS OF A TEST
An expression will now be derived for the correlation between error
scores on parallel forms of a test. This correlation can be written as
follows:

[15]  $r_{ee} = \frac{\frac{\sum E_1 E_2}{K} - \bar{E}_1 \bar{E}_2}{s_{E_1} s_{E_2}},$ or

[16]  $r_{ee} = \frac{\sum E_1 E_2 - \frac{(\sum E)^2}{K}}{\sum E^2 - \frac{(\sum E)^2}{K}}.$

Expressions for $\sum E$ and $\sum E^2$ are given in [4] and [10]. An
expression is needed, therefore, for $\sum E_1 E_2$ in order to determine
$r_{ee}$. We begin by finding the sum of $E_1 E_2$ values for a fixed
$E_1$ value and a fixed T value. In other words, we consider a joint
distribution of error scores on parallel forms of a test for each T
value. Writing $K_{E_1}$ for the number of cases having the fixed value
$E_1$ and $K_{E_2}$ for the number of $E_2$ scores paired with them, we
can write

[17]  $\sum_{j=1}^{K_{E_2}} E_1 E_{2j} = E_1 \sum_{j=1}^{K_{E_2}} E_{2j},$ since $E_1$ is fixed. Using [2] gives

[18]  $\sum_{j=1}^{K_{E_2}} E_1 E_{2j} = E_1 K_{E_2} \left( \frac{N - T}{a} \right).$ Since for a fixed value of $E_1$, $K_{E_2} = K_{E_1}$, we have

[19]  $\sum_{j=1}^{K_{E_1}} E_1 E_{2j} = E_1 K_{E_1} \left( \frac{N - T}{a} \right).$

Summing now over the $E_1$ values gives the following:
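[20]  $\sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \frac{N - T}{a} \sum_{E_1=0}^{N-T} E_1 K_{E_1},$ which can also be written as

[21]  $\sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \frac{N - T}{a} \sum_{i=1}^{K_T} E_{1i},$ where $K_T$ indicates the total number of cases for the fixed value of T.

Again using equation [2] we have

[22]  $\sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \frac{N - T}{a} \cdot \frac{K_T N - K_T T}{a},$ or

[23]  $\sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \frac{K_T N^2 - 2 K_T N T + K_T T^2}{a^2}.$

We now need only sum over the T values to obtain $\sum E_1 E_2$ for the
entire distribution. Doing this, we obtain

[24]  $\sum_{T=0}^{N} \sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \sum_{T=0}^{N} \frac{K_T N^2 - 2 K_T N T + K_T T^2}{a^2},$ or

[25]  $\sum E_1 E_2 = \frac{1}{a^2} (KN^2 - 2N \sum T + \sum T^2).$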
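Equation [25] can be checked numerically. The Python sketch below enumerates the joint distribution of the two error scores for each true score and compares the resulting mean product with [25] divided by K. It assumes that, for a fixed T, the error scores on the two forms are independent binomial variables with n = N - T and p = 1/a; the function names and the example frequencies are hypothetical.

```python
from fractions import Fraction
from math import comb


def mean_error_product(true_counts, n_items, n_choices):
    """Exact limiting value of (sum of E1*E2) / K.

    Assumes E1 and E2 are independent Binomial(N - T, 1/a) for each fixed T.
    true_counts maps each true score T to its (hypothetical) frequency K_T.
    """
    p = Fraction(1, n_choices)
    total = sum(true_counts.values())
    result = Fraction(0)
    for t, k in true_counts.items():
        n = n_items - t
        pmf = [comb(n, e) * p**e * (1 - p) ** (n - e) for e in range(n + 1)]
        for e1 in range(n + 1):
            for e2 in range(n + 1):
                result += Fraction(k, total) * pmf[e1] * pmf[e2] * e1 * e2
    return result


def equation_25_over_k(true_counts, n_items, n_choices):
    """Right-hand side of equation [25], divided by the number of cases K."""
    total = sum(true_counts.values())
    sum_t = sum(k * t for t, k in true_counts.items())
    sum_t2 = sum(k * t * t for t, k in true_counts.items())
    numerator = total * n_items**2 - 2 * n_items * sum_t + sum_t2
    return Fraction(numerator, n_choices**2 * total)


# Hypothetical example: N = 10 items, a = 4 choices, four examinees.
counts = {2: 1, 5: 2, 8: 1}
print(mean_error_product(counts, 10, 4) == equation_25_over_k(counts, 10, 4))  # True
```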
Substituting equations [4] and [25] in the numerator of [16] and
simplifying gives

[26]  $r_{ee} = \frac{\sum T^2 - \frac{(\sum T)^2}{K}}{a^2 \left( \sum E^2 - \frac{(\sum E)^2}{K} \right)}.$

Dividing by K in both numerator and denominator leads to the following
result:

[27]  $r_{ee} = \frac{s_T^2}{a^2 s_E^2}.$

Reliability is given by

[28]  $r_{oo} = 1 - \frac{s_E^2}{s_O^2} (1 - r_{ee})$ (Reference 8). Substituting [27] in [28] we have

[29]  $r_{oo} = \frac{a^2 s_O^2 - a^2 s_E^2 + s_T^2}{a^2 s_O^2}.$

Subtracting [13] from [14] and multiplying by $a^2$ gives

[30]  $a^2 s_O^2 - a^2 s_E^2 = (a - 1)^2 s_T^2 - s_T^2.$
Substituting this result in [29] and simplifying, we have

[31]  $r_{oo} = \frac{(a - 1)^2 s_T^2}{a^2 s_O^2}.$

This expression gives maximum reliability in terms of number of choices,
variance of true scores, and variance of observed scores. Substituting
the value for $s_O^2$ given by [14] leads to the following alternative
result:

[32]  $r_{oo} = \frac{(a - 1) s_T^2}{(a - 1) s_T^2 + N - \bar{T}}.$
This equation, then, gives the maximum reliability of a multiple-choice
test as a function of number of items, number of choices, variance of
true scores, and mean of true scores. It indicates that maximum
reliability depends on group heterogeneity as well as test length and
number of choices.
Since $\bar{O} = \bar{T} + \bar{E}$ and, from [4], $\bar{E} = \frac{N - \bar{T}}{a}$, we
can write

[33]  $\bar{T} = \frac{a \bar{O} - N}{a - 1}.$

Solving [14] for $s_T^2$, substituting the result, together with [33],
in [31], and simplifying gives another expression for maximum
reliability:

[34]  $r_{oo} = 1 - \frac{N - \bar{O}}{a s_O^2}.$
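The three expressions for maximum reliability, [31], [32], and [34], should agree when the observed-score mean and variance are themselves derived from the true-score mean and variance through [14] and the relation between the means. A minimal Python sketch of that arithmetic follows; the function names and the example values (N = 10, a = 5, true-score mean 5, true-score variance 4) are illustrative assumptions, not data from the paper.

```python
from fractions import Fraction


def observed_variance(n_items, n_choices, mean_t, var_t):
    """Equation [14]: variance of observed scores under guessing error only."""
    a = Fraction(n_choices)
    return (a - 1) ** 2 / a**2 * var_t + (a - 1) / a**2 * (n_items - mean_t)


def observed_mean(n_items, n_choices, mean_t):
    """Mean observed score: mean of T plus mean of E, with mean E = (N - mean T) / a."""
    return mean_t + Fraction(n_items - mean_t) / n_choices


def max_reliability_eq31(n_choices, var_t, var_o):
    a = Fraction(n_choices)
    return (a - 1) ** 2 * var_t / (a**2 * var_o)


def max_reliability_eq32(n_items, n_choices, mean_t, var_t):
    a = Fraction(n_choices)
    return (a - 1) * var_t / ((a - 1) * var_t + n_items - mean_t)


def max_reliability_eq34(n_items, n_choices, mean_o, var_o):
    a = Fraction(n_choices)
    return 1 - (n_items - mean_o) / (a * var_o)


# Hypothetical example values.
N, a = 10, 5
mean_t, var_t = Fraction(5), Fraction(4)
var_o = observed_variance(N, a, mean_t, var_t)    # 84/25
mean_o = observed_mean(N, a, mean_t)              # 6

print(max_reliability_eq31(a, var_t, var_o))      # 16/21
print(max_reliability_eq32(N, a, mean_t, var_t))  # 16/21
print(max_reliability_eq34(N, a, mean_o, var_o))  # 16/21
```

For these values the maximum reliability is 16/21, or about .76, even though no source of error other than guessing is present.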
ALTERNATIVE EQUATIONS FOR CORRELATION
BETWEEN ERROR SCORES ON PARALLEL
FORMS
Substituting [13] in [27] and simplifying, we have

[35]  $r_{ee} = \frac{s_T^2}{s_T^2 + (a - 1)(N - \bar{T})},$

which is similar in form to [32]. Equation [34] can be written in this
form:

[36]  $s_O^2 (1 - r_{oo}) = \frac{N - \bar{O}}{a}.$

Equation [28] can be written as follows:

[37]  $s_O^2 (1 - r_{oo}) = s_E^2 (1 - r_{ee}).$

Substituting the right hand side of [37] in [36] and simplifying, we
have

[38]  $r_{ee} = 1 - \frac{N - \bar{O}}{a s_E^2}.$
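The equivalence of [27], [35], and [38] can be illustrated in the same way. The Python sketch below derives the error variance from [13] and the observed mean from the relation between the means, then evaluates the three expressions. The function names and the example values (N = 10, a = 5, true-score mean 5, true-score variance 4) are hypothetical.

```python
from fractions import Fraction


def error_variance(n_items, n_choices, mean_t, var_t):
    """Equation [13]: variance of error scores under guessing error only."""
    a = Fraction(n_choices)
    return var_t / a**2 + (a - 1) / a**2 * (n_items - mean_t)


def observed_mean(n_items, n_choices, mean_t):
    """Mean observed score: mean of T plus mean of E, with mean E = (N - mean T) / a."""
    return mean_t + Fraction(n_items - mean_t) / n_choices


def error_correlation_eq27(n_choices, var_t, var_e):
    a = Fraction(n_choices)
    return var_t / (a**2 * var_e)


def error_correlation_eq35(n_items, n_choices, mean_t, var_t):
    a = Fraction(n_choices)
    return Fraction(var_t) / (var_t + (a - 1) * (n_items - mean_t))


def error_correlation_eq38(n_items, n_choices, mean_o, var_e):
    a = Fraction(n_choices)
    return 1 - (n_items - mean_o) / (a * var_e)


# Hypothetical example values.
N, a = 10, 5
mean_t, var_t = Fraction(5), Fraction(4)
var_e = error_variance(N, a, mean_t, var_t)         # 24/25
mean_o = observed_mean(N, a, mean_t)                # 6

print(error_correlation_eq27(a, var_t, var_e))      # 1/6
print(error_correlation_eq35(N, a, mean_t, var_t))  # 1/6
print(error_correlation_eq38(N, a, mean_o, var_e))  # 1/6
```

When a = 2 the factor (a - 1) equals 1, so [35] reduces to the same expression as [32] and the correlation between error scores equals the maximum reliability.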
COMPUTER CHECKS
The equations presented above give the values of $r_{oo}$ and $r_{ee}$
which would be expected if chance success due to guessing were the only
source of error in multiple-choice tests. The reliabilities of actual
tests would be expected to be less than these values because of the
presence of other sources of error. In addition, if reliability were
determined from a finite number of ordered pairs of observed scores on
parallel forms of a test, with only error due to guessing present, there
would be sampling variability of the reliability coefficient. The
binomial distribution of error scores assumed in derivation of the
equations, in other words, would be only approximated for any finite
number of true scores.
As the number of ordered pairs of scores on parallel forms increases
without limit, however, the reliability coefficient would be expected to
come closer and closer to the values given by the equations. In a
previous paper (8) a method of determining the reliability coefficient
by a computer simulation method was described. It was shown that for
fairly large numbers of scores (samples of 100, 400, 700, and 1000) the
estimates given by the method were stable. For example, for ten samples
of 400 scores, the reliability of a 100-item, two-choice test was
indicated as .89, .88, .89, .87, .90, .88, .89, .89, .89, and .87.
In Table 1 the reliabilities given by the computer simulation method are
compared to the values given analytically by equations [31], [32], and
[34] above. Also, the correlations between error scores on parallel
forms given by the computer program are compared to the values given by
equations [27], [35], and [38] above. In making these checks we begin
with a distribution of true scores having a certain mean and a certain
variance. The computer program then generates error scores which depend
upon the magnitude of the true scores, as a model of guessing error, and
these are added to the true scores to give observed scores. Repeating
the procedure gives results comparable to observed scores on parallel
forms of a test, when guessing is the only source of error. Finally,
product-moment correlations between the two sets of observed scores give
an indication of test reliability. Also, correlation between the two
sets of error scores is found, as well as the means and variances of all
distributions.
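The procedure just described can be sketched in Python as follows. This is not the program used in reference 8, whose details are not given here; it is a minimal illustration under the stated model, in which each of the N - T items not known to the examinee is answered correctly with probability 1/a. The function names, seeds, and the evenly spread true-score distribution are assumptions made for the example.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


def guessing_errors(true_scores, n_items, n_choices, rng):
    """One error score per examinee: lucky guesses on the N - T unknown items."""
    p = 1.0 / n_choices
    return [sum(rng.random() < p for _ in range(n_items - t)) for t in true_scores]


def simulate_parallel_forms(true_scores, n_items, n_choices, seed=0):
    """Return (r_oo, r_ee) for two simulated parallel forms with guessing error only."""
    rng = random.Random(seed)
    e1 = guessing_errors(true_scores, n_items, n_choices, rng)
    e2 = guessing_errors(true_scores, n_items, n_choices, rng)
    o1 = [t + e for t, e in zip(true_scores, e1)]
    o2 = [t + e for t, e in zip(true_scores, e2)]
    return pearson(o1, o2), pearson(e1, e2)


# Hypothetical example: 400 true scores drawn evenly from 0..10, N = 10, a = 5.
score_rng = random.Random(1)
true_scores = [score_rng.randint(0, 10) for _ in range(400)]
r_oo, r_ee = simulate_parallel_forms(true_scores, n_items=10, n_choices=5, seed=2)
print(round(r_oo, 2), round(r_ee, 2))
```

With larger samples the two correlations should settle near the values given by [32] and [35] for the mean and variance of the true scores used.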
It can be seen from the table that the values given
by the computer program correspond closely to the
values predicted from the equations presented in
this paper.
TABLE 1
COMPARISON OF COMPUTER RESULTS WITH VALUES PREDICTED FROM EQUATIONS

                 N=10      N=10      N=100     N=100
                 a=2       a=5       a=2       a=5
r_oo *           .44       .74       [illeg.]  .97
r_oo **          .46       .77       .89       .96
r_oo ***         .44       .76       .89       [illeg.]
r_oo ****        .42       .76       .89       .97
r_ee †           .46       .17       .89       .65
r_ee ††          .44       .15       .88       [illeg.]
r_ee †††         .44       [illeg.]  .89       .66
r_ee ††††        .45       [illeg.]  .89       .65

*     Value obtained from computer program
**    Value obtained by substituting computer data in equation [31]
***   Value obtained by substituting computer data in equation [32]
****  Value obtained by substituting computer data in equation [34]
†     Value obtained from computer program
††    Value obtained by substituting computer data in equation [27]
†††   Value obtained by substituting computer data in equation [35]
††††  Value obtained by substituting computer data in equation [38]

REFERENCES
1. Carroll, J. B., "The Effect of Difficulty and Chance Success on
   Correlations Between Items or Between Tests", Psychometrika, X
   (1945), pp. 1-22.
2. Horst, P., "The Maximum Expected Correlation Between Two
   Multiple-Choice Tests", Psychometrika, XIX (1954), pp. 291-296.
3. Lord, F. M., "Reliability of Multiple-Choice Tests as a Function of
   Number of Choices per Item", Journal of Educational Psychology, XXXV
   (1944), pp. 175-180.
4. Plumlee, L. B., "The Effect of Difficulty and Chance Success on
   Item-Test Correlation and on Test Reliability", Psychometrika, XVII
   (1952), pp. 69-86.
5. Roberts, A. O. H., "The Maximum Reliability of a Multiple-Choice
   Test", Psychologia Africana, IX (1962), pp. 286-293.
6. Williams, R. H., and Zimmerman, D. W., "Some Conjectures Concerning
   the Index of Reliability and Related Quantities When True Scores and
   Error Scores on Mental Tests are Not Independent", The Journal of
   Experimental Education, XXXV, No. 2 (Winter 1966), pp. 76-79.
7. Zimmerman, D. W., and Williams, R. H., "Effect of Chance Success Due
   to Guessing on Error of Measurement in Multiple-Choice Tests",
   Psychological Reports, XVI (1965), pp. 1193-1196.
8. Zimmerman, D. W., and Williams, R. H., "Chance Success Due to
   Guessing and Non-independence of True Scores and Error Scores in
   Multiple-Choice Tests: Computer Trials with Prepared Distributions",
   Psychological Reports, XVII (1965), pp. 159-165.
9. Zimmerman, D. W., and Williams, R. H., "Independence and
   Non-independence of True Scores and Error Scores in Mental Tests:
   Assumptions in the Definition of Parallel Forms", The Journal of
   Experimental Education, XXXV, No. 3 (Spring 1967), pp. 59-64.
10. Zimmerman, D. W., Williams, R. H., and Rehm, H. H., "Test
   Reliability When Error Scores Consist of Independent and
   Non-independent Components", The Journal of Experimental Education,
   XXXV, No. 1 (Fall 1966), pp. [illegible].