Testing with Personal Probabilities: Eleven Year Olds Can Correctly Estimate The …

IFSR Newsletter 1998 Vol. 17 No. 1 March
A. DIRKZWAGER
ARIED@XS4ALL.NL
Testing with personal probabilities is an advantageous alternative to multiple choice testing because guessing is eliminated. A crucial condition is that the subjects should be well calibrated. They are well calibrated when they report their true probabilities of being right. It is shown to be feasible in an ordinary school setting to have even 11-year-olds learn to be well calibrated in quite a short time, especially if probability testing is the method they become accustomed to. TESTBET, the interactive computer program for taking a test that was used in the experiment, is described. Its use in regular education is feasible, and the educational advantages are discussed.
If one does not know the right answer, guessing is the optimal response strategy on a multiple choice test. Guessing can be expected to reduce the reliability and validity of the test scores. A better method is to have the subjects report their personal probabilities for each of the answer alternatives. Someone who knows the answer need not guess: he or she can assign a probability of one to the right alternative. Someone who does not know at all, and to whom all alternatives are equally likely, need not guess either: he or she may assign equal probabilities to all alternatives.
The majority of examinees, however, will be partly unsure about the response alternatives: to them, some alternatives are more likely to be true than others. In that case too, the best one can do is to ask the subject to report the likelihood of each alternative, that is, the personal probability that he or she would be right when picking that alternative in a multiple choice situation.
If the subject is able and motivated to report these probabilities correctly, one has a fine continuous measure of knowledge on that item, namely the probability assigned to the correct answer, or any monotonic function of this probability.
This function is the scoring rule of the test. The simplest scoring rule is the identity function: one takes the probability assigned to the correct alternative as the item score and sums the item scores to obtain the total test score. With this rule, however, the best test-taking strategy is always to assign a probability of one to the most likely alternative, however unlikely that alternative may seem. So pure guessing is again encouraged. A proper scoring rule should be used instead.
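The weakness of the identity rule can be checked with a small calculation. The sketch below is only an illustration: the function name `expected_identity_score` and the three-alternative belief vector are assumptions made for this example and do not come from the original experiment.

```python
def expected_identity_score(true_probs, reported_probs):
    """Expected item score under the identity rule.

    The item score is the probability reported for the correct alternative,
    so the expectation weights each reported value by the subject's own
    (true) probability that this alternative is the correct one.
    """
    return sum(t * r for t, r in zip(true_probs, reported_probs))


# Hypothetical beliefs of a subject about three answer alternatives.
beliefs = [0.5, 0.3, 0.2]

# Strategy 1: report the beliefs honestly.
honest = expected_identity_score(beliefs, beliefs)

# Strategy 2: put all probability on the most likely alternative.
all_in = expected_identity_score(beliefs, [1.0, 0.0, 0.0])

# Under the identity rule the "all in" report has the higher expectation,
# so the rule rewards guessing rather than honest reporting.
assert all_in > honest
```

For these assumed beliefs, honest reporting yields an expected score of 0.38 and the all-in report yields 0.5, which is why the identity rule is not a proper scoring rule.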
A proper scoring rule is a rule such that the subject maximizes the expected score if and only if he or she reports the probabilities truthfully. Dirkzwager derived such a scoring rule as a linear function of the logarithm of the probability assigned to the correct alternative, with constants chosen such that the maximum score per item is 100 points and the score is zero if all alternatives are assigned equal probabilities. The score becomes negative for lower probabilities, meaning that in those cases the subject is not merely “uninformed” but misinformed. In fact, he or she may even be holding a serious fallacy. With multiple choice, no distinction is possible between these cases.
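The two constraints stated above are enough to pin down one candidate formula. A linear function of the logarithm, a + b·log(p), that equals 100 at p = 1 and 0 at p = 1/n (for n alternatives) is 100·(1 + log(p)/log(n)). The sketch below uses this reconstruction; it is inferred from the constraints in the text and is not necessarily the exact form published by the author or used in TESTBET. The function names are likewise assumptions made for illustration.

```python
import math


def log_score(p_correct, n_alternatives):
    """Item score as a linear function of log(p), reconstructed from two
    constraints: 100 points at p = 1, and 0 points at p = 1/n."""
    if p_correct <= 0.0:
        # The logarithm is unbounded below: zero probability on the
        # correct alternative is penalized without limit.
        return -math.inf
    return 100.0 * (1.0 + math.log(p_correct) / math.log(n_alternatives))


def expected_log_score(true_probs, reported_probs):
    """Expected item score when the subject's real beliefs are true_probs
    but the probabilities actually reported are reported_probs."""
    n = len(true_probs)
    return sum(
        t * log_score(r, n)
        for t, r in zip(true_probs, reported_probs)
        if t > 0.0  # alternatives believed impossible contribute nothing
    )


# Hypothetical beliefs about three alternatives.
beliefs = [0.5, 0.3, 0.2]

honest = expected_log_score(beliefs, beliefs)
overconfident = expected_log_score(beliefs, [0.9, 0.05, 0.05])
uniform = expected_log_score(beliefs, [1 / 3, 1 / 3, 1 / 3])

# Honest reporting beats both distortions, as a proper rule requires.
assert honest > overconfident
assert honest > uniform
```

With four alternatives, a reported probability of 0.25 on the correct answer scores zero under this sketch, and anything below 0.25 scores negative, matching the "misinformed" case described in the text.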
Two questions arise with regard to this method:

  1. Is one not measuring two different factors,
    1. knowledge and
    2. some personality trait like (self)confidence?
  2. Are subjects able to report their probabilities correctly, that is to say, are they well calibrated or at least can they learn to be?