THE JOURNAL OF EXPERIMENTAL EDUCATION (Volume 35, Number 4, Summer 1967)

THE MAXIMUM RELIABILITY OF A MULTIPLE-CHOICE TEST AS A FUNCTION OF NUMBER OF ITEMS, NUMBER OF CHOICES, AND GROUP HETEROGENEITY

GRAHAM J. BURKHEIMER
DONALD W. ZIMMERMAN
RICHARD H. WILLIAMS
East Carolina College

IN PREVIOUS papers (7, 8) it has been shown that chance success due to guessing introduces an unavoidable source of error into multiple-choice test scores. This particular class of error is negatively correlated with true scores. The usual equations for test reliability and other intercorrelations among components of test scores depend upon the assumption that the correlations between true scores and error scores and between error scores and error scores on parallel forms of a test are zero. In previous papers (6, 8, 9, 10) more general equations for these intercorrelation terms, which do not depend upon the above assumptions, have been presented.

Because of the presence of chance success due to guessing the reliability of a multiple-choice test has a maximum value. In other words, if all sources of error other than chance success due to guessing were eliminated, the reliability of a test would remain at some value less than unity because of the unavoidable error due to guessing. The computer simulation method described previously (8) gave reliabilities for several kinds of tests, under the assumption that only error due to guessing is present. The purpose of this paper is to determine these values using analytic methods. An equation for the maximum reliability of a multiple-choice test, which involves only number of items, number of choices, and mean and variance of true scores (group heterogeneity) is derived.

Horst (2) derived equations indicating the maximum correlation between two different tests. Beginning with these, Roberts (5) derived equations for maximum reliability of a test. These results involve item difficulties and are based on assumptions concerning intercorrelations among items. The relation of number of alternative choices to test reliability has also been investigated by Carroll (1), Lord (3), and Plumlee (4). The present paper differs from these approaches to the problem in that it does not involve item difficulties, but considers only components of variance of test scores. It involves no assumptions about intercorrelations among items and holds for the case in which there is a negative correlation between true scores and error scores introduced by guessing. The result is relatively simple in form.

VARIANCE OF ERROR SCORES AND OF OBSERVED SCORES

When chance success due to guessing is the only source of error, the error scores for those true scores having a fixed value, $T$, will approach a binomial distribution as the number of cases considered increases without limit. Therefore, we can write

$$\bar{E}_T = \frac{\sum_{i=1}^{K_T} E_{Ti}}{K_T} = np \qquad [1]$$

where $\bar{E}_T$ is the mean of the error scores for the true scores having some fixed value, $E_{Ti}$ is an error score for one of these true scores, and $K_T$ is the number of true scores having that particular value.

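Here $np$ indicates the mean of a binomial distribution. Since $n = N - T$ and $p = 1/a$ (7) we have

$$\sum_{i=1}^{K_T} E_{Ti} = \frac{K_T N - K_T T}{a} \qquad [2]$$

where $N$ is the total number of items and $a$ is the number of choices per item.

Summing, as the true score value varies from 0 to $N$, we can write

$$\sum_{T=0}^{N} \sum_{i=1}^{K_T} E_{Ti} = \sum_{T=0}^{N} \frac{K_T N - K_T T}{a} \qquad [3]$$

or

$$\sum E = \frac{KN - \sum T}{a} \qquad [4]$$

Equation [4] gives, in other words, the sum of the error scores in the entire distribution of test scores.

The variance of error scores corresponding to the true scores for a fixed value, $T$, is

$$s_{E_T}^2 = \frac{\sum_{i=1}^{K_T} E_{Ti}^2}{K_T} - \left( \frac{\sum_{i=1}^{K_T} E_{Ti}}{K_T} \right)^2 \qquad [5]$$

As $K_T$ increases without limit this variance is also given by the binomial formula, $npq$, where $q = 1 - \frac{1}{a}$. Therefore we can write

$$\frac{\sum_{i=1}^{K_T} E_{Ti}^2}{K_T} - \left( \frac{\sum_{i=1}^{K_T} E_{Ti}}{K_T} \right)^2 = (N - T) \left( \frac{1}{a} \right) \left( \frac{a - 1}{a} \right) \qquad [6]$$

Solving [6] for $\sum E_{Ti}^2$ gives

$$\sum_{i=1}^{K_T} E_{Ti}^2 = \frac{(a - 1)(K_T N - K_T T)}{a^2} + \frac{\left( \sum_{i=1}^{K_T} E_{Ti} \right)^2}{K_T} \qquad [7]$$

Substituting [2] in [7] gives

$$\sum_{i=1}^{K_T} E_{Ti}^2 = \frac{a - 1}{a^2} \left[ K_T N - K_T T \right] + \frac{1}{a^2} \left[ K_T N^2 - 2 K_T N T + K_T T^2 \right] \qquad [8]$$

Summing, as the true score value varies from 0 to $N$, leads to the following result:

$$\sum_{T=0}^{N} \sum_{i=1}^{K_T} E_{Ti}^2 = \sum_{T=0}^{N} \frac{a - 1}{a^2} \left[ K_T N - K_T T \right] + \sum_{T=0}^{N} \frac{1}{a^2} \left[ K_T N^2 - 2 K_T N T + K_T T^2 \right] \qquad [9]$$

or

$$\sum E^2 = \frac{a - 1}{a^2} \left( KN - \sum T \right) + \frac{1}{a^2} \left( KN^2 - 2N \sum T + \sum T^2 \right) \qquad [10]$$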
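
The binomial step behind [2] and [8] can be checked numerically. The following Python sketch computes the conditional mean and mean square of the error score exactly for one fixed true score and compares them with [2] and [8] after dividing both sides by $K_T$. The function name `conditional_error_moments` and the values N = 10, T = 4, a = 5 are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from math import comb


def conditional_error_moments(N, T, a):
    """Exact mean and mean square of E for a fixed true score T,
    assuming E follows a binomial distribution with n = N - T, p = 1/a."""
    n = N - T
    p = Fraction(1, a)
    q = 1 - p
    mean = Fraction(0)
    mean_sq = Fraction(0)
    for e in range(n + 1):
        prob = comb(n, e) * p**e * q ** (n - e)  # binomial probability of e lucky guesses
        mean += prob * e
        mean_sq += prob * e * e
    return mean, mean_sq


# Illustrative values (assumed, not from the paper)
N, T, a = 10, 4, 5
mean, mean_sq = conditional_error_moments(N, T, a)

# [2] divided by K_T: mean error score is (N - T) / a
assert mean == Fraction(N - T, a)
# [8] divided by K_T: mean squared error score
assert mean_sq == Fraction((a - 1) * (N - T), a**2) + Fraction((N - T) ** 2, a**2)
print(mean, mean_sq)  # prints: 6/5 12/5
```

Exact rational arithmetic is used so that the comparison is an identity rather than an approximation; this matches the limiting case in which $K_T$ increases without limit.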

Total error variance is given by

$$s_E^2 = \frac{\sum E^2}{K} - \frac{\left( \sum E \right)^2}{K^2} \qquad [11]$$

Substituting [4] and [10] in [11] and reducing gives

$$s_E^2 = \frac{1}{a^2} \left( \frac{\sum T^2}{K} - \frac{\left( \sum T \right)^2}{K^2} \right) + \frac{a - 1}{a^2} \left( N - \bar{T} \right) \qquad [12]$$

which can also be written as

$$s_E^2 = \frac{s_T^2}{a^2} + \frac{a - 1}{a^2} \left( N - \bar{T} \right) \qquad [13]$$

In a similar manner it can be shown that the variance of observed scores is given by the following equation:

$$s_O^2 = \frac{(a - 1)^2}{a^2} s_T^2 + \frac{a - 1}{a^2} \left( N - \bar{T} \right) \qquad [14]$$

Equations [13] and [14], then, give the variance of error scores and observed scores under the assumption that chance success due to guessing is the only source of error. These variances are expressed as a function of number of items, number of choices, and mean and variance of true scores.

CORRELATION BETWEEN ERROR SCORES ON PARALLEL FORMS OF A TEST

An expression will now be derived for the correlation between error scores on parallel forms of a test. This correlation can be written as follows:

$$r_{ee} = \frac{\dfrac{\sum E_1 E_2}{K} - \bar{E}_1 \bar{E}_2}{s_{E_1} s_{E_2}} \qquad [15]$$

or

$$r_{ee} = \frac{\sum E_1 E_2 - \dfrac{\left( \sum E \right)^2}{K}}{\sum E^2 - \dfrac{\left( \sum E \right)^2}{K}} \qquad [16]$$

Expressions for $\sum E$ and $\sum E^2$ are given in [4] and [10]. An expression is needed, therefore, for $\sum E_1 E_2$ in order to determine $r_{ee}$. We begin by finding the sum of $E_1 E_2$ values for a fixed $E_1$ value and a fixed $T$ value. In other words, we consider a joint distribution of error scores on parallel forms of a test for each $T$ value. We can write

$$\sum_{j=1}^{K_{E_1}} E_1 E_{2j} = E_1 \sum_{j=1}^{K_{E_1}} E_{2j} \qquad [17]$$

since $E_1$ is fixed. Using [2] gives

$$\sum_{j=1}^{K_{E_1}} E_1 E_{2j} = E_1 K_{E_2} \left( \frac{N - T}{a} \right) \qquad [18]$$

Since for a fixed value of $E_1$, $K_{E_1} = K_{E_2}$, we have

$$\sum_{j=1}^{K_{E_1}} E_1 E_{2j} = E_1 K_{E_1} \left( \frac{N - T}{a} \right) \qquad [19]$$

Summing now over the $E_1$ values gives the following:

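$$\sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \frac{N - T}{a} \sum_{E_1=0}^{N-T} E_1 K_{E_1} \qquad [20]$$

which can also be written as

$$\sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \frac{N - T}{a} \sum_{i=1}^{K_T} E_{Ti} \qquad [21]$$

where $K_T$ indicates the total number of cases for the fixed value of $T$. Again using equation [2] we have

$$\sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \left( \frac{N - T}{a} \right) \left( \frac{K_T N - K_T T}{a} \right) \qquad [22]$$

or

$$\sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \frac{K_T N^2 - 2 K_T N T + K_T T^2}{a^2} \qquad [23]$$

We now need only sum over the $T$ values to obtain $\sum E_1 E_2$ for the entire distribution. Doing this, we obtain

$$\sum_{T=0}^{N} \sum_{E_1=0}^{N-T} \sum_{j=1}^{K_{E_1}} E_1 E_{2j} = \sum_{T=0}^{N} \frac{K_T N^2 - 2 K_T N T + K_T T^2}{a^2} \qquad [24]$$

or

$$\sum E_1 E_2 = \frac{1}{a^2} \left( KN^2 - 2N \sum T + \sum T^2 \right) \qquad [25]$$

Substituting equations [4] and [25] in the numerator of [16] and simplifying gives

$$r_{ee} = \frac{\dfrac{1}{a^2} \left( \sum T^2 - \dfrac{\left( \sum T \right)^2}{K} \right)}{\sum E^2 - \dfrac{\left( \sum E \right)^2}{K}} \qquad [26]$$

Dividing by $K$ in both numerator and denominator leads to the following result:

$$r_{ee} = \frac{s_T^2}{a^2 s_E^2} \qquad [27]$$

Reliability is given by

$$r_{oo} = 1 - \frac{s_E^2}{s_O^2} \left( 1 - r_{ee} \right) \qquad [28]$$

(Reference 8). Substituting [27] in [28] we have

$$r_{oo} = \frac{a^2 s_O^2 - a^2 s_E^2 + s_T^2}{a^2 s_O^2} \qquad [29]$$

Subtracting [13] from [14] gives

$$a^2 s_O^2 - a^2 s_E^2 = (a - 1)^2 s_T^2 - s_T^2 \qquad [30]$$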
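
The component expressions [13], [14], [27], and [28] can be verified exactly for a small test. The sketch below enumerates the binomial guessing distribution on two parallel forms for every true score and compares the resulting population variances and correlations with the equations. It assumes, as in the step leading to [18], that the two error scores are independent for a fixed true score. The function name `exact_components` and the true-score frequencies are hypothetical.

```python
from fractions import Fraction
from math import comb


def exact_components(freq, N, a):
    """freq maps each true score T to its number of cases K_T.
    Returns (mean_T, var_T, var_E, var_O, r_ee, r_oo) as exact fractions,
    assuming binomial guessing errors that are independent across the
    two parallel forms for a fixed true score."""
    K = sum(freq.values())
    p = Fraction(1, a)
    m_T = m_T2 = m_E = m_E2 = m_O2 = m_E1E2 = m_O1O2 = Fraction(0)
    for T, K_T in freq.items():
        w = Fraction(K_T, K)   # share of cases with this true score
        n = N - T              # items answered by guessing
        pmf = [comb(n, e) * p**e * (1 - p) ** (n - e) for e in range(n + 1)]
        m_T += w * T
        m_T2 += w * T * T
        for e1, p1 in enumerate(pmf):
            m_E += w * p1 * e1
            m_E2 += w * p1 * e1 * e1
            m_O2 += w * p1 * (T + e1) ** 2
            for e2, p2 in enumerate(pmf):
                m_E1E2 += w * p1 * p2 * e1 * e2
                m_O1O2 += w * p1 * p2 * (T + e1) * (T + e2)
    var_T = m_T2 - m_T**2
    var_E = m_E2 - m_E**2
    var_O = m_O2 - (m_T + m_E) ** 2
    r_ee = (m_E1E2 - m_E**2) / var_E
    r_oo = (m_O1O2 - (m_T + m_E) ** 2) / var_O
    return m_T, var_T, var_E, var_O, r_ee, r_oo


# Hypothetical true-score distribution for a 10-item, four-choice test
N, a = 10, 4
freq = {2: 1, 5: 2, 8: 1}
mean_T, var_T, var_E, var_O, r_ee, r_oo = exact_components(freq, N, a)

assert var_E == var_T / a**2 + Fraction(a - 1, a**2) * (N - mean_T)                          # [13]
assert var_O == Fraction((a - 1) ** 2, a**2) * var_T + Fraction(a - 1, a**2) * (N - mean_T)  # [14]
assert r_ee == var_T / (a**2 * var_E)                                                        # [27]
assert r_oo == 1 - var_E / var_O * (1 - r_ee)                                                # [28]
print(var_E, var_O, r_ee, r_oo)
```

Because the check uses exact fractions, the assertions hold as identities. The enumeration costs on the order of $N^2$ operations per true score, so it is practical only for short tests.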

Substituting [30] in [29] and simplifying, we have

$$r_{oo} = \frac{(a - 1)^2 s_T^2}{a^2 s_O^2} \qquad [31]$$

This expression gives maximum reliability in terms of number of choices, variance of true scores, and variance of observed scores. Substituting the value for $s_O^2$ given by [14] leads to the following alternative result:

$$r_{oo} = \frac{(a - 1) s_T^2}{(a - 1) s_T^2 + N - \bar{T}} \qquad [32]$$

This equation, then, gives the maximum reliability of a multiple-choice test as a function of number of items, number of choices, variance of true scores, and mean of true scores. It indicates that maximum reliability depends on group heterogeneity as well as test length and number of choices.

Since $\bar{O} = \bar{T} + \bar{E}$ and, from [4], $\bar{E} = \frac{N - \bar{T}}{a}$, we can write

$$\bar{O} = \bar{T} + \frac{N - \bar{T}}{a} \qquad [33]$$

Solving [14] for $s_T^2$, substituting the results, together with [33], in [31], and simplifying gives another expression for maximum reliability:

$$r_{oo} = 1 - \frac{N - \bar{O}}{a s_O^2} \qquad [34]$$

ALTERNATIVE EQUATIONS FOR CORRELATION BETWEEN ERROR SCORES ON PARALLEL FORMS

Substituting [13] in [27] and simplifying, we have

$$r_{ee} = \frac{s_T^2}{s_T^2 + (a - 1)(N - \bar{T})} \qquad [35]$$

which is similar in form to [32]. Equation [34] can be written in this form:

$$s_O^2 \left( 1 - r_{oo} \right) = \frac{N - \bar{O}}{a} \qquad [36]$$

Equation [28] can be written as follows:

$$s_O^2 \left( 1 - r_{oo} \right) = s_E^2 \left( 1 - r_{ee} \right) \qquad [37]$$

Substituting the right hand side of [37] in [36] and simplifying, we have

$$r_{ee} = 1 - \frac{N - \bar{O}}{a s_E^2} \qquad [38]$$

COMPUTER CHECKS

The equations presented above give the values of $r_{oo}$ and $r_{ee}$ which would be expected if chance success due to guessing were the only source of error in multiple-choice tests. The reliabilities of actual tests would be expected to be less than these values because of the presence of other sources of error. In addition, if reliability were determined from a finite number of ordered pairs of observed scores on parallel forms of a test, with only error due to guessing present, there would be sampling variability of the reliability coefficient.
The binomial distribution of error scores assumed in derivation of the equations, in other words, would be only approximated for any finite number of true scores.

As the number of ordered pairs of scores on parallel forms increases without limit, however, the reliability coefficient would be expected to come closer and closer to the values given by the equations. In a previous paper (8) a method of determining the reliability coefficient by a computer simulation method was described. It was shown that for fairly large numbers of scores (samples of 100, 400, 700, and 1000) the estimates given by the method were stable. For example, for ten samples of 400 scores, the reliability of a 100-item, two-choice test was indicated as .89, .88, .89, .87, .90, .88, .89, .89, .89, and .87.

In Table 1 the reliabilities given by the computer simulation method are compared to the values given analytically by equations [31], [32], and [34] above. Also, the correlations between error scores on parallel forms given by the computer program are compared to the values given by equations [27], [35], and [38] above. In making these checks we begin with a distribution of true scores having a certain mean and a certain variance. The computer program then generates error scores which depend upon the magnitude of the true scores, as a model of guessing error, and these are added to the true scores to give observed scores. Repeating the procedure gives results comparable to observed scores on parallel forms of a test, when guessing is the only source of error. Finally, product-moment correlations between the two sets of observed scores give an indication of test reliability. Also, correlation between the two sets of error scores is found, as well as the means and variances of all distributions.

It can be seen from the table that the values given by the computer program correspond closely to the values predicted from the equations presented in this paper.

TABLE 1

COMPARISON OF COMPUTER RESULTS WITH VALUES PREDICTED FROM EQUATIONS

              N=10    N=10    N=100          N=100
              a=2     a=5     a=2            a=5
r_oo *        .44     .74     [illegible]    .97
r_oo **       .46     .77     .89            .96
r_oo ***      .44     .16     .89            [illegible]
r_oo ****     .42     .76     .89            .97
r_ee †        .46     .17     .89            .65
r_ee ††       .44     .15     .88            [illegible]
r_ee †††      .44     [illegible]  .89       .66
r_ee ††††     .45     [illegible]  .89       .65

* Value obtained from computer program
** Value obtained by substituting computer data in equation [31]
*** Value obtained by substituting computer data in equation [32]
**** Value obtained by substituting computer data in equation [34]
† Value obtained from computer program
†† Value obtained by substituting computer data in equation [27]
††† Value obtained by substituting computer data in equation [35]
†††† Value obtained by substituting computer data in equation [38]

REFERENCES

1. Carroll, J. B., "The Effect of Difficulty and Chance Success on Correlations Between Items or Between Tests", Psychometrika, X (1945), pp. 1-22.
2. Horst, P., "The Maximum Expected Correlation Between Two Multiple-Choice Tests", Psychometrika, XIX (1954), pp. 291-296.
3. Lord, F. M., "Reliability of Multiple-Choice Tests as a Function of Number of Choices per Item", Journal of Educational Psychology, XXXV (1944), pp. 175-180.
4. Plumlee, L. B., "The Effect of Difficulty and Chance Success on Item-Test Correlation and on Test Reliability", Psychometrika, XVII (1952), pp. 69-86.
5. Roberts, A. O. H., "The Maximum Reliability of a Multiple-Choice Test", Psychologia Africana, IX (1962), pp. 286-293.
6. Williams, R. H., and Zimmerman, D. W., "Some Conjectures Concerning the Index of Reliability and Related Quantities When True Scores and Error Scores on Mental Tests are Not Independent", The Journal of Experimental Education, XXXV, No. 2 (Winter 1966), pp. 16-79.
7. Zimmerman, D. W., and Williams, R. H., "Effect of Chance Success Due to Guessing on Error of Measurement in Multiple-Choice Tests", Psychological Reports, XVI (1965), pp. 1193-1196.
8. Zimmerman, D. W., and Williams, R. H., "Chance Success Due to Guessing and Non-independence of True Scores and Error Scores in Multiple-Choice Tests: Computer Trials with Prepared Distributions", Psychological Reports, XVII (1965), pp. 159-165.
9. Zimmerman, D. W., and Williams, R. H., "Independence and Non-independence of True Scores and Error Scores in Mental Tests: Assumptions in the Definition of Parallel Forms", The Journal of Experimental Education, XXXV, No. 3 (Spring 1967), pp. 59-64.
10. Zimmerman, D. W., Williams, R. H., and Rehm, H. H., "Test Reliability When Error Scores Consist of Independent and Non-independent Components", The Journal of Experimental Education, XXXV, No. 1 (Fall 1966), pp. [illegible].
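
The kind of check described under COMPUTER CHECKS can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used in reference (8): the true-score distribution (uniform over 0 to N), the sample size, the seed, and all function names are hypothetical. Error scores are drawn as the number of lucky guesses among the N - T items not known, the procedure is repeated to form a parallel form, and the product-moment correlations are compared with equations [32] and [35].

```python
import random


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


def guess_errors(true_scores, N, a, rng):
    """One error score per case: lucky guesses among the N - T unknown items."""
    return [sum(rng.random() < 1.0 / a for _ in range(N - T)) for T in true_scores]


def simulate(N, a, K, seed=0):
    """Return (simulated r_oo, r_oo from [32], simulated r_ee, r_ee from [35])."""
    rng = random.Random(seed)
    # Assumed true-score distribution: uniform over 0..N
    T = [rng.randint(0, N) for _ in range(K)]
    E1 = guess_errors(T, N, a, rng)   # error scores on form 1
    E2 = guess_errors(T, N, a, rng)   # error scores on the parallel form
    O1 = [t + e for t, e in zip(T, E1)]
    O2 = [t + e for t, e in zip(T, E2)]
    mean_T = sum(T) / K
    var_T = sum((t - mean_T) ** 2 for t in T) / K
    r_oo_eq = (a - 1) * var_T / ((a - 1) * var_T + N - mean_T)   # equation [32]
    r_ee_eq = var_T / (var_T + (a - 1) * (N - mean_T))           # equation [35]
    return pearson(O1, O2), r_oo_eq, pearson(E1, E2), r_ee_eq


r_oo_sim, r_oo_eq, r_ee_sim, r_ee_eq = simulate(N=10, a=5, K=5000, seed=1)
print(f"r_oo simulated {r_oo_sim:.2f}, equation [32] {r_oo_eq:.2f}")
print(f"r_ee simulated {r_ee_sim:.2f}, equation [35] {r_ee_eq:.2f}")
```

With a finite number of cases the simulated correlations differ from the predicted values by sampling error, which shrinks as K grows, in line with the discussion of sampling variability above.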