<?xml version="1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.tei-c.org/ns/1.0 http://digital.lib.ecu.edu/tei/xsd/tei_P5.xsd">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>
        </title>
        <author>
        </author>
        <respStmt>
          <resp>Text encoded by</resp>
          <name>Digital Collections</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <distributor>East Carolina University. J. Y. Joyner Library</distributor>
        <address>
          <addrLine>Digital Collections</addrLine>
          <addrLine>Joyner Library, East Carolina University</addrLine>
          <addrLine>East Fifth Street, Greenville NC 27858-4353 USA</addrLine>
        </address>
        <date>2012</date>
      </publicationStmt>
      <sourceDesc>
        <bibl>
        </bibl>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <samplingDecl>
        <p>All quotation marks retained as data.</p>
        <p>All end-of-line hyphens have been removed, and the trailing part of a word has been joined to the preceding line.</p>
        <p>All smart quotes have been converted into straight quotes.</p>
      </samplingDecl>
      <classDecl>
        <taxonomy xml:id="LCSH">
          <bibl>Library of Congress Subject Headings</bibl>
        </taxonomy>
      </classDecl>
    </encodingDesc>
    <profileDesc>
      <creation>
        <date>
        </date>
      </creation>
      <langUsage xml:lang="en-US">
        <language ident="en-US" usage="100">English</language>
      </langUsage>
      <textClass>
        <keywords scheme="#LCSH">
          <list>
            <item>
            </item>
          </list>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="other">
        <p rend="align(centerbold)">[This text is machine generated and may contain errors.]</p>
        <pb facs="00079309_0001" />
        <p>Psychological Reports, 1965, 17, 159-165. © Southern Universities Press 1965<lb /><lb />
CHANCE SUCCESS DUE TO GUESSING AND NON-INDEPENDENCE<lb />OF TRUE SCORES AND ERROR SCORES IN MULTIPLE-CHOICE<lb />TESTS: COMPUTER TRIALS WITH PREPARED DISTRIBUTIONS¹<lb /><lb />
DONALD W. ZIMMERMAN AND RICHARD H. WILLIAMS<lb />East Carolina College<lb /><lb />
Summary. The effect of chance success due to guessing upon the variance<lb />of multiple-choice test scores was estimated from prepared distributions of large<lb />numbers of scores. Each score consisted of an assumed "true score" component<lb />and an "error score" component generated by a computer. A large negative<lb />correlation was found between true scores and error scores and a positive correlation<lb />between error scores on alternate forms. The equation showing reliability in<lb />terms of components of variance was derived under the more restrictive assumption<lb />that there is a correlation between true scores and error scores, and the result<lb /><lb />
$r_{o_1 o_2} = 1 - [(s_e^2 / s_o^2)(1 - r_{e_1 e_2})]$<lb /><lb />
was obtained. The fact that reliability can be positive even though error variance<lb />and observed variance are equal was discussed.<lb /><lb />
In a previous paper (Zimmerman &amp; Williams, 1965) it was shown<lb />that the minimum standard error of measurement of a multiple-choice test can<lb />be estimated by the formula $\sqrt{N - X}/a$, where $N$ is the total number of items,<lb />$X$ is the score, and $a$ is the number of alternative choices per item. A standard<lb />error of this value could be expected, even if all other sources of error were<lb />eliminated, because of the factor of chance success due to guessing inherent in<lb />multiple-choice tests.<lb /><lb />
The minimum standard error due to guessing varies with true score. The<lb />higher the true score, the smaller the increment possible because of guessing;<lb />the lower the true score, the larger the increment possible. Therefore, there is a<lb />negative correlation between true score and the component of error score attributable<lb />to guessing.<lb /><lb />
In test theory it has often been assumed that true scores and error scores are<lb />uncorrelated. If guessing is a factor, however, this assumption does not hold.<lb />The above considerations have the following implications. (1) In multiple-choice<lb />tests there will be a negative correlation between true scores and error<lb />scores. The degree of the correlation will depend upon the relative contribution<lb />of error variance due to guessing to the total error variance. (2) On alternate<lb />forms of a test there will be a positive correlation between error scores. (3) The<lb />reliability of a test (correlation between observed scores on alternate forms) will<lb />be limited to a maximum value, depending upon the relative contribution of<lb />error variance due to guessing to total error variance.<lb /><lb /><lb />
¹The authors wish to thank F. Milam Johnson and LaJon Hutton of the Department of<lb />Mathematics at East Carolina College for providing the facilities of the East Carolina<lb />Computer Center for this project.</p>
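        <p>The dependence of guessing error on true score can be illustrated with a short Python sketch. It is not taken from the original paper. It assumes, as a simplification the text does not spell out, that every item the examinee does not know is guessed independently and answered correctly with probability 1/a. The function names guessing_error_moments and minimum_standard_error are hypothetical, and the second function encodes the expression sqrt(N - X)/a as it is read here from the text.</p>
        <p xml:space="preserve"><![CDATA[
```python
import math


def guessing_error_moments(n_items, true_score, n_choices):
    """Mean and SD of lucky guesses on the items the examinee does not know.

    Assumption (not stated in the source): each unknown item is guessed
    independently and answered correctly with probability 1 / n_choices.
    """
    unknown = n_items - true_score
    p = 1.0 / n_choices
    mean = unknown * p                        # expected increment from guessing
    sd = math.sqrt(unknown * p * (1.0 - p))   # binomial standard deviation
    return mean, sd


def minimum_standard_error(n_items, score, n_choices):
    """The expression sqrt(N - X) / a, as read from the text."""
    return math.sqrt(n_items - score) / n_choices


if __name__ == "__main__":
    # Ten-item true-false test: both the expected increment and its spread
    # shrink as the true score rises.
    for t in range(0, 11, 2):
        mean, sd = guessing_error_moments(10, t, 2)
        print(t, round(mean, 2), round(sd, 3),
              round(minimum_standard_error(10, t, 2), 3))
```
]]></p>
        <p>For a two-choice test the binomial standard deviation under this assumption coincides with sqrt(N - X)/a. For other values of a the two expressions differ by a factor of sqrt(a - 1), so the sketch reports both rather than choosing between them.</p>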
        <pb facs="00079309_0002" />
        <p>160 D. W. ZIMMERMAN &amp; R. H. WILLIAMS<lb /><lb />
One way to consider this problem is in terms of the equation showing the<lb />relation between variance of true scores, variance of observed scores, and variance<lb />of error scores. The equation is<lb /><lb />
$s_o^2 = s_t^2 + s_e^2 + 2 r_{te} s_t s_e$. [1]<lb /><lb />
The correlation term at the right has usually been assumed to be zero, making<lb />observed variance the sum of true variance and error variance. The above considerations<lb />suggest, however, that the correlation term will not be zero when<lb />scores on multiple-choice tests are considered.<lb /><lb />
Whether or not this fact is of any importance depends upon the extent to<lb />which variance due to guessing contributes to total error variance. If the correlation<lb />term were negligible, the inaccuracy of neglecting the relationship<lb />would not be large.<lb /><lb />
For any particular test it is impossible to determine how much error variance<lb />due to factors other than guessing is present. The reliability of the test depends<lb />upon the total variance due to all sources of error. Furthermore, reliability<lb />depends upon the heterogeneity of the group tested. There is no simple way,<lb />therefore, to estimate the extent to which the correlation of true scores and error<lb />scores will affect reliability.<lb /><lb />
The approach taken here was to estimate the influence of these factors by<lb />correlating hypothetical distributions of large numbers of scores using a computer.<lb />For these distributions, where a certain mean of true scores and a certain<lb />variance of true scores were assumed, estimates were obtained for the above intercorrelations.<lb /><lb />
METHOD<lb /><lb />
A distribution of 1000 "true" scores ($t$) was prepared. The scores ranged<lb />from 0 to 10 and were distributed binomially. The mean was 5 and the standard<lb />deviation, 1.6. This could approximate the distribution of true scores of persons<lb />taking a "true-false" test of 10 items, where the mean is one-half the total number<lb />of items.<lb /><lb />
An IBM 1620 computer was programmed to perform the following operations.<lb />First, each score was subtracted from 10 to determine the number of<lb />"guesses" to be made. The "guesses" were made by entering a table of random<lb />numbers, considering the first $10 - t$ digits, and summing the number of even<lb />digits. This value is comparable to the error score due to guessing for a true-false<lb />test of 10 items, when a true score of $t$ and guesses on $10 - t$ items are<lb />assumed.<lb /><lb />
The same procedure was followed for each of the 1000 scores in the distribution.<lb />This gave a distribution of 1000 "error" scores ($e_1$). Then, the entire<lb />procedure was repeated a second time to give a second distribution of 1000<lb />"error" scores ($e_2$). Each score in the $t$ column was then added to its corresponding<lb />score in the $e_1$ column to give an "observed" score ($o_1$). Each score</p>
        <pb facs="00079309_0003" />
        <p>CHANCE SUCCESS ON MULTIPLE-CHOICE TESTS 161<lb /><lb />
in the $t$ column was also added to its corresponding score in the $e_2$ column to<lb />give another "observed" score ($o_2$). Finally, the Pearson product-moment correlation<lb />coefficient was obtained between the 1000 pairs of "observed" scores<lb />($o_1$ and $o_2$). Correlation coefficients were also obtained between the $t$ scores<lb />and the $e_1$ scores, between the $t$ scores and the $e_2$ scores, and between the $e_1$<lb />scores and the $e_2$ scores.<lb /><lb />
The correlation between the $o_1$ and the $o_2$ scores can be considered an estimate<lb />of the reliability of the test. It would be comparable to the correlation<lb />between alternate forms of a test, where the only source of error is that attributable<lb />to guessing.<lb /><lb />
RESULTS<lb />
The following results were obtained: $s_e^2 = 1.84$, $s_o^2 = 1.83$, $r_{te} = -.59$,<lb /><gap reason="illegible" /><lb /><lb />
The same procedure described above was repeated using distributions of<lb />100, 400, and 700 scores, in order to see how variable the results would be for<lb />different samples. The values obtained for a distribution of 400 scores were<lb />approximately the same as for 1000 scores, the correlation coefficients differing<lb />at most by .02. Therefore, for all the other cases considered a distribution of<lb />400 scores was used.²<lb /><lb />
The same procedure was then repeated for another 10-item 2-choice test<lb />(with different variance of true scores), for a 10-item 5-choice test, for a 100-item<lb />2-choice test, and for a 100-item 5-choice test. In these four cases the prepared<lb />distributions of true scores were normal, had a mean of one-half the total<lb />number of items, and had a standard deviation of approximately 1/5 the total<lb />number of items.<lb /><lb />
Comparison of these four cases shows the way in which the above correlations<lb />vary with test length and number of alternative choices per item. Comparison<lb />of the 10-item 2-choice tests with different variances shows the way<lb />in which the correlations vary with different distributions of true scores. The<lb />results are summarized in Table 1.<lb /><lb />
In all four cases the variance of observed scores is less than the variance of<lb />true scores. Apparently, the fact that chance success adds proportionately more<lb />to low true scores results in observed scores with smaller variance than true<lb />scores in all four cases. For all four cases there is a high negative correlation<lb /><lb /><lb />
²Actually, the computer program yielded 5 columns of error scores and 5 columns of observed<lb />scores as a check upon the variability of the results. The variances and correlation<lb />coefficients given above are the means of the 5 values obtained for $s_e^2$, $s_o^2$, and $r_{te}$ and of<lb />the 10 values obtained for $r_{e_1 e_2}$ and $r_{o_1 o_2}$. The computer program gave all results to four<lb />decimal places and the values reported here have been rounded to two decimal places. The<lb />variability was not large. For example, for the five correlations between true scores and<lb />error scores for the 100-item 5-choice test the values were: −.78, −.81, −.80, −.81,<lb />and −.81. All 10 of the reliability coefficients for the 100-item 5-choice test were .97.</p>
        <pb facs="00079309_0004" />
        <p><table rows="8" cols="5">
<head>TABLE 1<lb />COMPUTER RESULTS FROM PREPARED DISTRIBUTIONS</head>
<row role="label"><cell /><cell>N = 10, a = 2</cell><cell>N = 10, a = 5</cell><cell>N = 100, a = 2</cell><cell>N = 100, a = 5</cell></row>
<row><cell role="label">$r_{o_1 o_2}$</cell><cell>.44</cell><cell>.74</cell><cell>.89</cell><cell>.97</cell></row>
<row><cell role="label">$r_{te}$</cell><cell>−.68</cell><cell>−.42</cell><cell>−.94</cell><cell>−.80</cell></row>
<row><cell role="label">$r_{e_1 e_2}$</cell><cell>.46</cell><cell>.17</cell><cell>.89</cell><cell>.65</cell></row>
<row><cell role="label">$s_e^2$</cell><cell><gap reason="illegible" /></cell><cell>1.04</cell><cell>109.62</cell><cell>22.98</cell></row>
<row><cell role="label">$s_o^2$</cell><cell>2.16</cell><cell>3.32</cell><cell>109.34</cell><cell>259.34</cell></row>
<row><cell role="label">$s_t^2$</cell><cell>3.99</cell><cell>3.99</cell><cell>387.24</cell><cell>387.24</cell></row>
<row><cell role="label">$r_{o_1 o_2}$*</cell><cell>.44</cell><cell>.74</cell><cell>.89</cell><cell>.97</cell></row>
</table><lb />
*Predicted from Equation [10].<lb /><lb />
between true scores and error scores. There is also a positive correlation between<lb />error scores on alternate forms.<lb /><lb />
It is seen that reliability increases with both length of test and number of<lb />alternative choices per item. For the short tests the increase in reliability with<lb />number of alternative choices is large. The results are consistent with the results<lb />obtained by Remmers and his associates (Denney &amp; Remmers, 1940; Remmers<lb />&amp; Ewart, 1941; Remmers &amp; House, 1941), who showed empirically that<lb />the reliability of various tests increases with number of choices. These results<lb />are also consistent with the equations given by Roberts (1962), who expressed<lb />maximum reliability in terms of average difficulty of items, test length, and<lb />number of choices.<lb /><lb />
These two variables interact. Increase in test length from 10 items to 100<lb />items increases reliability from .44 to .89, when $a = 2$. But increase in test<lb />length from 10 items to 100 items increases reliability from .74 to .97, when<lb />$a = 5$. Or, conversely, increase in number of alternative choices per item increases<lb />reliability from .44 to .74, when $N = 10$. And increase in number of<lb />alternative choices per item increases reliability from .89 to .97, when $N = 100$.<lb /><lb />
Also, the variance of the distribution of true scores is important. The 10-item<lb />2-choice test first considered, which has smaller variance (not shown in<lb />table), has lower reliability (.34) than the 10-item 2-choice test with greater<lb />variance shown in the table (.44).<lb /><lb />
In all the cases above the quantities $s_e^2$ and $s_o^2$ and the ratio $s_e^2 / s_o^2$ also<lb />change. The ratio decreases with both increase in test length and increase in<lb />number of alternative choices per item.<lb /><lb />
DISCUSSION<lb />
One fact of interest is that the variance of error scores is approximately the<lb />same as the variance of observed scores for both the 10-item 2-choice test and<lb />the 100-item 2-choice test. Consider the usual equation showing reliability in<lb />terms of error variance and observed variance:</p>
        <pb facs="00079309_0005" />
        <p>$r_{o_1 o_2} = 1 - s_e^2 / s_o^2$. [2]<lb /><lb />
From this equation it is expected that, when error variance is equal to observed<lb />variance, reliability is zero. Nevertheless, the reliability of the 10-item 2-choice<lb />test, as shown by the computer data, is approximately .44 and the reliability of<lb />the 100-item 2-choice test approximately .89. The reason for this can be seen<lb />by considering Equation [1]. In deriving [2] from [1] the correlation term at<lb />the right has been dropped. The present data show, however, that this term is,<lb />in fact, a large negative correlation. If this term is negative, then reliability<lb />can be positive, even though error variance and observed variance are equal.<lb /><lb />
Another way to say this is that chance success due to guessing makes observed<lb />scores less variable because of the negative correlation between error<lb />scores and true scores. Even though observed variance and error variance are<lb />nearly equal, reliability remains a positive value.<lb /><lb />
A check was made by substituting in Equation [1] all values given by the<lb />computer data for the 100-item 5-choice test. The observed variance predicted<lb />from [1], given $s_t^2$, $s_e^2$, $r_{te}$, $s_t$, and $s_e$, is 259.39. The observed variance yielded<lb />by the computer program is 259.34.<lb /><lb />
Because of the importance of these correlation terms it is necessary to derive<lb />the equation showing reliability in terms of components of variance under the<lb />more restrictive assumptions that intercorrelations among true scores, error<lb />scores, and observed scores exist. Reliability of a test (correlation between alternate<lb />forms) can be expressed as follows:<lb /><lb />
$r_{o_1 o_2} = \sum x_1 x_2 / N s_{o_1} s_{o_2}$, [3]<lb /><lb />
where $x_1$ and $x_2$ are deviations of observed scores from the mean of observed<lb />scores. That is, $x_1 = o_1 - M_{o_1}$ and $x_2 = o_2 - M_{o_2}$.<lb /><lb />
Since observed score is the sum of true score and error score, since the true<lb />scores on alternate forms are the same, and since the standard deviations of observed<lb />scores on alternate forms are the same, we can write (with $t$, $e_1$, and $e_2$<lb />now denoting deviations from their respective means)<lb /><lb />
$r_{o_1 o_2} = [\sum (t + e_1)(t + e_2)] / N s_o^2$, [4]<lb /><lb />
or<lb />
$r_{o_1 o_2} = (\sum t^2 + \sum t e_1 + \sum t e_2 + \sum e_1 e_2) / N s_o^2$. [5]<lb /><lb />
This can be rewritten as<lb />
$r_{o_1 o_2} = (1 / s_o^2)(s_t^2 + r_{t e_1} s_t s_{e_1} + r_{t e_2} s_t s_{e_2} + r_{e_1 e_2} s_{e_1} s_{e_2})$. [6]<lb /><lb />
It is assumed that $s_{e_1} = s_{e_2} = s_e$ and, likewise, that $r_{t e_1} = r_{t e_2} = r_{te}$. Therefore,<lb /><lb />
$r_{o_1 o_2} = (1 / s_o^2)(s_t^2 + 2 r_{te} s_t s_e + s_e^2 r_{e_1 e_2})$. [7]<lb /><lb />
Transposing [1] gives</p>
        <pb facs="00079309_0006" />
        <p>$s_t^2 + 2 r_{te} s_t s_e = s_o^2 - s_e^2$. [8]<lb /><lb />
Substituting this result in [7] gives<lb />
$r_{o_1 o_2} = (1 / s_o^2)(s_o^2 - s_e^2 + s_e^2 r_{e_1 e_2})$. [9]<lb /><lb />
Simplifying, the following result is obtained:<lb /><lb />
$r_{o_1 o_2} = 1 - [(s_e^2 / s_o^2)(1 - r_{e_1 e_2})]$. [10]<lb /><lb />
This result differs from [2] only by the factor $(1 - r_{e_1 e_2})$. If $r_{e_1 e_2}$ were<lb />small, reliability would be close to the value given by [2]. The results yielded by<lb />the computer, however, show $r_{e_1 e_2}$ to be large. Equation [10] indicates, then, that<lb />the reliability of a test can be positive, even though error variance is equal to<lb />observed variance, because of the factor $(1 - r_{e_1 e_2})$.<lb /><lb />
As a check, the values yielded from this program were substituted in [10].<lb />The reliability predicted from [10] for the 100-item 5-choice test, given $s_e^2$, $s_o^2$,<lb />and $r_{e_1 e_2}$, is .97. The reliability from this program is .97. The other checks are<lb />reported in Table 1.<lb /><lb />
CONCLUSIONS<lb /><lb />
When chance success due to guessing is the only source of error in a multiple-choice<lb />test, the following can be concluded: (1) There is a large negative<lb />correlation between true scores and error scores for any test length and for any<lb />number of alternative choices per item. (2) The variance of observed scores may<lb />be less than the variance of true scores. (3) For "true-false" tests the variance<lb />of error scores may equal the variance of observed scores. For tests with more<lb />alternative choices per item the variance of error scores becomes less than the<lb />variance of observed scores. (4) Reliability increases with test length. (5) Reliability<lb />increases with number of alternative choices per item. (6) Effects 4<lb />and 5 interact. For "true-false" tests, reliability increases greatly with increase<lb />in test length. For tests with 5 choices per item, reliability increases slightly<lb />with increase in test length. For short tests, reliability increases greatly with increase<lb />in number of alternative choices per item. For long tests, reliability increases<lb />slightly with increase in number of alternative choices per item. (7)<lb />For any test length and for any number of alternative choices per item, there is<lb />a positive correlation between error scores on alternate forms. This correlation<lb />increases with test length and decreases with number of alternative choices per<lb />item. (8) The above correlations depend upon the distribution of true scores.<lb />For increased variance of true scores the (negative) correlation between true scores and<lb />error scores is larger in absolute value, the correlation between error scores on alternate forms is<lb />higher, and reliability is higher. (9) The relationship among these quantities<lb />is expressed by the following equation:<lb /><lb />
$r_{o_1 o_2} = 1 - (s_e^2 / s_o^2)(1 - r_{e_1 e_2})$.</p>
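        <p>The procedure described under METHOD, together with the check of the final equation, can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the authors' IBM 1620 program: a seeded pseudo-random generator stands in for the table of random numbers, true scores are drawn binomially as in the first prepared distribution, and the variances of the two error columns and the two observed columns are simply averaged before being substituted in the equation. The names simulate, pearson, and summarize, and the seed value, are hypothetical.</p>
        <p xml:space="preserve"><![CDATA[
```python
import random
import statistics


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_items=10, n_choices=2, n_persons=1000, seed=1965):
    """Prepare true scores and two independent columns of guessing errors."""
    rng = random.Random(seed)
    p = 1.0 / n_choices  # assumed chance of a lucky guess on one item

    # True scores: binomial with mean n_items / 2.
    t = [sum(0.5 > rng.random() for _ in range(n_items))
         for _ in range(n_persons)]

    def guesses(score):
        # One guess per item not covered by the true score.
        return sum(p > rng.random() for _ in range(n_items - score))

    e1 = [guesses(s) for s in t]
    e2 = [guesses(s) for s in t]
    o1 = [a + b for a, b in zip(t, e1)]
    o2 = [a + b for a, b in zip(t, e2)]
    return t, e1, e2, o1, o2


def summarize(t, e1, e2, o1, o2):
    """Correlations, variances, and the reliability predicted from them."""
    s_e2 = (statistics.pvariance(e1) + statistics.pvariance(e2)) / 2
    s_o2 = (statistics.pvariance(o1) + statistics.pvariance(o2)) / 2
    r_ee = pearson(e1, e2)
    return {
        "r_te": (pearson(t, e1) + pearson(t, e2)) / 2,
        "r_ee": r_ee,
        "r_oo": pearson(o1, o2),
        # 1 - (s_e^2 / s_o^2)(1 - r_ee), the equation stated in the text
        "r_oo_predicted": 1 - (s_e2 / s_o2) * (1 - r_ee),
        "s_t2": statistics.pvariance(t),
        "s_e2": s_e2,
        "s_o2": s_o2,
    }


if __name__ == "__main__":
    for name, value in summarize(*simulate()).items():
        print(name, round(value, 2))
```
]]></p>
        <p>Calling simulate with n_items=100 or n_choices=5 gives a rough analogue of the other cases, although the true scores in this sketch remain binomial rather than normal with a standard deviation of one-fifth the number of items, so the values should not be expected to match Table 1.</p>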
        <pb facs="00079309_0007" />
        <p>REFERENCES<lb /><lb />
DENNEY, H. R., &amp; REMMERS, H. H. Reliability of multiple-choice measuring instruments<lb />as a function of the Spearman-Brown prophecy formula: II. J. educ. Psychol.,<lb />1940, 31, 699-704.<lb /><lb />
REMMERS, H. H., &amp; EWART, E. Reliability of multiple-choice measuring instruments<lb />as a function of the Spearman-Brown prophecy formula: III. J. educ. Psychol.,<lb />1941, 32, 61-66.<lb /><lb />
REMMERS, H. H., &amp; HOUSE, J. M. Reliability of multiple-choice measuring instruments<lb />as a function of the Spearman-Brown prophecy formula: IV. J. educ. Psychol.,<lb />1941, 32, 372-376.<lb /><lb />
ROBERTS, A. O. H. The maximum reliability of a multiple-choice test. Psychologia<lb />Africana, 1962, 9, 286-293.<lb /><lb />
ZIMMERMAN, D. W., &amp; WILLIAMS, R. H. Effect of chance success due to guessing on<lb />error of measurement in multiple-choice tests. Psychol. Rep., 1965, 16, 1193-1196.<lb /><lb />
Accepted July 2, 1965.</p>
      </div>
    </body>
  </text>
</TEI>