Psychological Reports, 1965, 16, 1193-1196. © Southern Universities Press 1965

EFFECT OF CHANCE SUCCESS DUE TO GUESSING ON ERROR OF MEASUREMENT IN MULTIPLE-CHOICE TESTS

DONALD W. ZIMMERMAN AND RICHARD H. WILLIAMS
East Carolina College

Summary. Chance success due to guessing is treated as a component of the error variance of a multiple-choice test score. It is shown that for a test of given item structure the minimum standard error of measurement can be estimated by the formula $\sqrt{(N - X)/a}$, where N is the total number of items, X is the score, and a is the number of alternative choices per item. The significance of non-independence of true score and this component of error score on multiple-choice tests is discussed.

The reliability of a test is limited by a number of factors which taken together are said to constitute "error." For actual tests these factors and the relative contribution of each are unknown.

One contribution to the error variance of a multiple-choice test score, however, is apparent from examination of the test. This is the chance error due to the guessing inherent in this type of test. Suppose a test consists of N items with a alternative choices for each item. If a person had no knowledge of the subject matter but marked the answers to all items with the aid of a table of random numbers, a score of (1/a)N correct answers could be expected. If the number of alternatives per item is small (in most multiple-choice tests 4 or 5), a substantial component of the total score may be accounted for by successful guessing.

More important, however, is the variability of the component of the total score attributable to successful guessing. It is error variance which limits the reliability of a test. If all persons obtained the same number correct by guessing, there would be no problem. A constant would be added to the score. Different persons, however, will receive different increments in score as a result of more-or-less successful guessing.
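
The variability described above can be illustrated numerically. The following Python sketch is not part of the original paper. It simulates hypothetical examinees who mark every item at random on a test of N items with a alternatives each, and reports the mean and spread of their scores. The function name `simulate_blind_guessing`, the seed, and the number of simulated examinees are assumptions made only for illustration.

```python
import random
import statistics


def simulate_blind_guessing(n_items, n_alternatives, n_examinees=10000, seed=0):
    """Simulate scores of examinees who answer every item purely at random.

    Each item is answered correctly with probability 1 / n_alternatives.
    Returns a list with one chance score per simulated examinee.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_correct = 1.0 / n_alternatives
    scores = []
    for _ in range(n_examinees):
        # Count the items on which this examinee happens to guess correctly.
        score = sum(1 for _ in range(n_items) if rng.random() < p_correct)
        scores.append(score)
    return scores


if __name__ == "__main__":
    scores = simulate_blind_guessing(n_items=100, n_alternatives=4)
    # The mean should lie near (1/a)N = 25; the spread is the point of interest.
    print("mean chance score:", round(statistics.mean(scores), 2))
    print("standard deviation:", round(statistics.pstdev(scores), 2))
    print("lowest and highest:", min(scores), max(scores))
```

In a run of this kind the scores scatter by several points around (1/a)N, so that examinees with identical (here, zero) knowledge receive noticeably different scores.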
In fact, the number of correct guesses will presumably follow a binomial distribution with mean $np$ (where $n$ is the number of items guessed and $p$ is the probability of a correct guess) and variance $npq$ (where $q$ is $1 - p$).

In scoring tests a "correction formula," in which a fraction of the number of wrong answers is subtracted from the number right, is sometimes used (cf. Lord, 1963). The correction makes the scores of those persons who guess more comparable to the scores of those who, for one reason or another, do not guess. It should be noted, however, that the correction has no effect upon variability introduced by guessing. For some scores, in other words, the formula will undercorrect, for others it will overcorrect.

The item structure of a test (the number of items and the number of alternative choices per item) can thus be considered as contributing a component of error of measurement which is unavoidable. A simple formula can be derived by which this value can be estimated for any particular test.

The following symbols will be used.

N = total number of items on the test
X = observed score
T = true score
a = number of alternative choices per item
n = number of items on which guesses are made
$S_{e\,\min}^2$ = minimum error variance (variance of distribution of number of items guessed correctly)
$S_{e\,\min}$ = minimum standard error of measurement.

The error variance in which we are interested is the variance of the distribution of number of items which are guessed correctly. This will be given by the binomial formula, $npq$, where $n$ is the number of items on which guesses are made, $p$ is the probability of a successful guess, and $q$ is $1 - p$. For a multiple-choice test $p$ will be $1/a$, where $a$ is the number of alternative choices per item, and $q$ will be $1 - (1/a)$.

In order to use the binomial formula the number of items on which guesses are made must be estimated. First, the conventional "correction formula" for guessing can be used to estimate the true score.

$T = X - \frac{1}{a-1}(N - X)$ . [1]

Or, as usually expressed, the true score is estimated by subtracting a fraction of the number of items "wrong" from the number "right." This fraction is one divided by one less than the number of alternatives per item (1/4 of number wrong for a test with 5 alternatives, 1/3 for 4 alternatives, and so on).

The number of items on which guesses are made can then be found by subtracting this result from the total number of items on the test.

$n = N - \left\{ X - \frac{1}{a-1}(N - X) \right\}$ . [2]

Finally, this result is substituted in the binomial formula to give the variance of the distribution of number of items guessed correctly,

$S_{e\,\min}^2 = \left[ N - \left\{ X - \frac{1}{a-1}(N - X) \right\} \right] \frac{1}{a} \left[ 1 - \frac{1}{a} \right]$ . [3]

Simplifying (the bracketed term reduces to $a(N - X)/(a - 1)$), the following result is obtained:

$S_{e\,\min}^2 = (N - X)/a$ . [4]

The square root gives the minimum standard error of measurement,

$S_{e\,\min} = \sqrt{(N - X)/a}$ . [5]

The value obtained by the formula is a minimum in that, if all other sources of error were eliminated, a standard error of this value would still be contributed by the item structure of the test.

Derivation of the formula is based on assumptions which are only approximated in an actual situation. Failure of these assumptions to hold precisely would further increase the standard error of measurement. For example, a person taking a test may eliminate some of the alternatives for a given item because he has partial information. There is never a sharp distinction between "knowing the answer" and "guessing" (cf. Horst, 1933). In an actual test, therefore, the probability of a successful guess will be somewhat greater than 1/a.

Also, as said previously, there are many other sources of error in addition to guessing. Results obtained using the formula given above, therefore, must be considered as lower limits. Actual standard errors will be greater than the calculated values. The formula may prove useful, however, in giving a rough idea of what can be expected from any particular type of item structure.

For example, consider a test of 100 items, with 4 alternatives per item, and a score of 50. Use of the formula shows a minimum standard error of measurement of 3.5. Or, as an extreme example, consider a "true-false" test of 10 items and a score of 5. This is a special case of a multiple-choice test, where a is 2. Calculation shows a minimum standard error of measurement of 1.6. In this case the standard error would be almost as large as the standard deviation of the true scores which would be expected. Here the formula confirms what one would suspect, that short "true-false"
tests are quite unreliable.

An important feature of this minimum standard error of measurement is that it varies with true score. The higher the true score, the lower its value will be. The standard error of measurement as usually understood is a fixed value for a given test. The confidence interval for true score which is established is the same width for any observed score. This difference reflects the special characteristics of the class of error variance considered in the present paper.

TABLE 1
MINIMUM STANDARD ERROR OF MEASUREMENT FOR A MULTIPLE-CHOICE TEST WITH N ITEMS AND a ALTERNATIVE CHOICES PER ITEM

N \ a     2     3     4     5        N \ a     2     3     4     5
  10     1.6   1.3   1.1   1.0         90     4.7   3.9   3.4   3.0
  20     2.2   1.8   1.6   1.4        100     5.0   4.1   3.5   3.2
  30     2.7   2.2   1.9   1.7        110     5.2   4.3   3.7   3.3
  40     3.2   2.6   2.2   2.0        120     5.5   4.5   3.9   3.5
  50     3.5   2.9   2.5   2.2        130     5.7   4.7   4.1   3.6
  60     3.9   3.2   2.7   2.4        150     6.1   5.0   4.3   3.9
  70     4.2   3.4   3.0   2.6        200     7.1   5.8   5.0   4.5
  80     4.5   3.7   3.2   2.8        250     8.0   6.5   5.6   5.0

Table 1 shows the values of the minimum standard error for selected values of N and a. These include those which would most often occur in tests. In calculating these values it has been assumed that the score being considered is one-half the total number of items.

An implication of the above consideration concerns non-independence of true score and error score in multiple-choice tests. In test theory it has been assumed often that error score and true score are uncorrelated. For multiple-choice tests where there is chance success due to guessing this assumption cannot be made. Those persons with low true scores will guess on more items and thus receive relatively higher error scores. On the other

hand, those persons with high true scores will guess on fewer items and receive lower error scores. Therefore, there will be a negative correlation between true score and error score. As shown above, minimum standard error of measurement is a decreasing function of true score.

The extent to which non-independence of true score and error score is a serious problem for test theory is not certain. Possibly the inaccuracy introduced by neglecting this relationship is not large. A similar situation has been found to be true in the case of other statistical problems where the fit of the theoretical model to the actual situation is imperfect (Box, 1953; Norton, 1953). In the case of multiple-choice tests, however, the fact of non-independence is clear and its possible effect could be large.

REFERENCES

BOX, G. E. P. Non-normality and tests on variances. Biometrika, 1953, 40, 318-335.
HORST, A. P. The difficulty of a multiple choice test item. J. educ. Psychol., 1933, 24, 229-232.
LORD, F. M. Formula scoring and validity. Educ. psychol. Measmt, 1963, 23, 663-672.
NORTON, D. W. An empirical investigation of some effects of non-normality and heterogeneity on the F-distribution. Unpublished Ph.D. thesis in Education, State Univer. of Iowa. Reported in E. F. Lindquist, Design and analysis of experiments in psychology and education. Boston: Houghton-Mifflin, 1953.

Accepted May 10, 1965.
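
As a computational note on formula [5], the following Python sketch (not part of the original paper) retraces the derivation. It estimates the true score by the correction formula [1], the number of guessed items by [2], and the binomial variance by [3], and then checks that the result agrees with the simplified form (N - X)/a. All function names are hypothetical, and the score is set to one-half of N when tabulating, as was assumed for Table 1.

```python
import math


def estimated_true_score(n_items, score, n_alternatives):
    """Correction formula [1]: T = X - (N - X) / (a - 1)."""
    return score - (n_items - score) / (n_alternatives - 1)


def guessed_items(n_items, score, n_alternatives):
    """Formula [2]: n = N - T, the estimated number of items guessed."""
    return n_items - estimated_true_score(n_items, score, n_alternatives)


def min_error_variance(n_items, score, n_alternatives):
    """Formula [3]: binomial variance n * p * q with p = 1/a, q = 1 - 1/a."""
    p = 1.0 / n_alternatives
    return guessed_items(n_items, score, n_alternatives) * p * (1.0 - p)


def min_standard_error(n_items, score, n_alternatives):
    """Formula [5]: the simplified closed form sqrt((N - X) / a)."""
    return math.sqrt((n_items - score) / n_alternatives)


if __name__ == "__main__":
    # The long form [3] and the simplified form [4] should agree.
    assert math.isclose(min_error_variance(100, 50, 4), (100 - 50) / 4)

    # The two worked examples from the text.
    print(round(min_standard_error(100, 50, 4), 1))  # 3.5
    print(round(min_standard_error(10, 5, 2), 1))    # 1.6

    # Tabulate with the score set to one-half of N, as for Table 1.
    for n_items in (10, 20, 50, 100):
        row = [round(min_standard_error(n_items, n_items / 2, a), 1)
               for a in (2, 3, 4, 5)]
        print(n_items, row)
```

Entries recomputed in this way are rounded to one decimal place and may differ from a printed table in the last digit.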