Applying optimal scheduling of practice in the Chinese Learnlab

PIs Pavlik, MacWhinney, Wu, Koedinger
Faculty MacWhinney, Wu, Koedinger
Postdocs Pavlik
Others with > 160 hours Dozzi, Lili Wu
Learnlab Chinese
Number of students 450
Total Participant Hours >1150
Datashop? Current to Spring 2007



  • Spring 2006
    • Software debugging and testing
    • Parameterization data collected from approximately 80 students and 160 hours in Elementary Chinese II
    • Parameterization data collected from approximately 20 students and 40 hours in Elementary Spanish I

  • Summer 2006
    • Multi-unit tutor and experiment piloted

  • Fall 2006
    • Multi-unit tutor applied in the following experiment:
The vocabulary tutor will be deployed in both Online and Classroom Chinese I classes for an efficacy test. The first 8 units (excluding Unit 1) of each class will be split into two tutors, each with content for 4 units. Each of these 4-unit tutors will be an experiment replication, so that the experiment design is replicated twice for each class track. During these 4-unit in-vivo experiments, the tutor will alternate between required units and voluntary units, and the order of this alternation will be randomly assigned by the tutor software for each student. In each tutor, the first unit will be assessed before the 3rd unit and the 2nd unit will be assessed before the 4th unit. This design will allow a test of whether requiring the tutor provides a learning advantage at a long-term interval. The tutor will also administer a brief survey to get students' self-reports of vocabulary study time (both inside and outside the tutor). This survey will be given from within the tutor and will take less than 5 minutes total for each 4-unit tutor. The hypothesis is that students will do better when required to use the tutor despite not spending more overall time studying vocabulary (both inside and outside the tutor). Further, Sue-mei has offered to administer an in-class paper-and-pencil vocabulary assessment after each 4-unit tutor. This will give a measure of transfer outside the tutor that is hypothesized to reveal similar effects. The probable benefit to students is learning Chinese vocabulary more easily. All tutor curriculum is matched one-for-one with the words taught in the respective courses.

  • Spring 2007
    • Multi-unit tutor made cumulative
    • Comparison "flashcard" ecological control created
    • Tutor applied to directly compare the flashcard version with the cumulative optimized scheduling version

  • Fall 2007 -- Vocabulary practice
    • Multi-unit tutor now allows flexible student choice of unit or cumulative practice
    • Students may choose a flashcard version or the optimized version
    • Between-subjects preference experiment for flashcard or optimized version
    • Prequiz/postquiz design to measure long-term learning and transfer.


  • Fall 2007 -- Radical practice
    • Between-subjects comparison in which students practiced Chinese radicals or Hanzi characters that were not on the pre/post quizzes
    • Randomized assignment
    • Prequiz/postquiz design to measure accelerated future learning on previously unstudied Hanzi characters


Research question

Does the optimized scheduling of practice produced by the Chinese vocabulary tutor result in measurable difference in performance for students?

Background and significance

Efforts to use practice scheduling algorithms date to the early 1960s. One seminal example is Atkinson's (1972) German vocabulary tutor. While these efforts have often produced positive results, such programs have never been employed in the classroom in a consistent fashion. Perhaps this is due to the many practical issues involved in integrating such a system into the context of a course curriculum.

Dependent variables

Normal post-test
The tutor functions using an "assistments" type task where every drill practice is also a measure of normal learning.
Long-term retention
The experiment includes long-term assessments at various intervals. These include both in-tutor and paper-and-pencil tests of long-term vocabulary performance.
Transfer learning
Long-term assessments may be given (50% of the time) using pairings not drilled by the tutor. These transfer tests will show whether and to what extent students can use what is learned in the tutor flexibly in new contexts.

Accelerated future learning - In the radical study (Fall 2007).

Independent variables

The amount of practice for a particular group of subjects. Also, within subjects, the amount of practice for any individual item.

Radical experiment (Fall 2007) -- we manipulated whether students got radical practice or Hanzi practice.


The dependent variables will reveal benefits for individuals using the tutor as compared to individuals studying with other methods.

The radical study hypothesis (Fall 2007) was that radical training would allow faster learning of previously unlearned Hanzi characters by providing knowledge components that would transfer to accelerate future Hanzi learning.


In Chinese, 7 sections of the Chinese I class participated in an experiment in which students were randomized to either a) have unit 3 voluntary and unit 4 required or b) have unit 3 required and unit 4 voluntary. This crossover within-subjects experiment tested whether there was an advantage for requiring students to use the system for 15 minutes compared to not requiring usage. For each student we computed the score advantage for the required unit vs. the voluntary unit on a paper-and-pencil test of both units (10 items for each unit, given approximately one month later). Results were not significant after a careful reanalysis of the data.
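
The per-student score advantage described above can be sketched as a paired difference. This is only an illustration: the function name, the scores, and the use of a paired t statistic are assumptions, since the page does not say which test was used in the reanalysis.

```python
from math import sqrt
from statistics import mean, stdev


def paired_advantage(required, voluntary):
    """Per-student advantage (required minus voluntary), its mean, and a paired t statistic.

    Each list holds one score per student, in the same student order.
    """
    if len(required) != len(voluntary):
        raise ValueError("each student needs one required and one voluntary score")
    diffs = [r - v for r, v in zip(required, voluntary)]
    mean_diff = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    # t = mean difference divided by its standard error
    t_stat = mean_diff / (sd / sqrt(len(diffs))) if sd > 0 else float("nan")
    return diffs, mean_diff, t_stat


# Hypothetical scores out of 10 items per unit (NOT data from the study).
required_scores = [7, 5, 8, 6, 9]
voluntary_scores = [6, 5, 7, 7, 8]

diffs, mean_diff, t_stat = paired_advantage(required_scores, voluntary_scores)
print(diffs, mean_diff, t_stat)
```

Because each student contributes both a required and a voluntary unit, differencing within student removes overall ability from the comparison, and the randomized order balances any difficulty difference between units 3 and 4 across the two groups.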

For the Spring 2007 semester, the classroom version results were interesting. There are differences in practice amounts between the control (flashcard) and experimental (optimized) between-subjects conditions. Specifically, students get about twice as many drill trials in the optimized condition (significant, p<.001), about twice as many correct responses per minute (p<.001), a reduction in errors of 36% (p<.001), and about 2 minutes longer practice (p<.05). The longer practice and somewhat lower attrition (significant, p<.05, when subjects with performance of less than 10% correct were excluded) for optimized subjects suggest they prefer the optimized condition. Results on the final quiz indicated a small advantage for the optimized subjects (p<.05) for the earlier units in the course. Not surprisingly, these early units were also the ones that showed the greater attrition for the flashcard session. Unfortunately, examination of learning curves for this dataset shows that the optimization model was flawed and not optimal. Specifically, the learning curves show a U-shaped dip (quite visible in the DataShop) where performance was strangely low. Conditional analysis showed that the model was overly optimistic about the learning following a failed drill and prematurely widened schedules. It was surprising that despite this problem the optimized condition did as well as it did.

Of course, the spacing of practice tends to be wider for the control subjects, since they are moving through a random order of the stimuli. This probably accounts for a large portion of the difference above. Further, the control condition allows more metacognitive control, since subjects must decide after each test whether they want that item repeated during the following pass through the set or not. However, both of these procedures might make the differences above during practice unrepresentative of any long-term effects of the conditions, since the wider spacing and metacognitive control of the flashcard control condition might improve long-term efficiency. Further, there is also a cumulative component to the comparison, since the optimization condition allows more efficient review of prior units. In the control condition, subjects are allowed the option of going through the full cumulative set after they finish each pass through the current unit set. Although this allows cumulative review for control subjects, it does not provide it in the efficient manner of the optimized condition, in which cumulative review is interleaved with current practice using an expanding spacing for each old item.
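
To make "interleaved with current practice using an expanding spacing" concrete, here is a minimal sketch. It assumes a fixed rule in which each gap between reviews of an old item is double the previous one. That rule, the function names, and the parameters are illustrative assumptions; the tutor's actual schedule comes from its optimization model, which is not described on this page.

```python
def expanding_review_trials(first_gap=2, factor=2, n_reviews=4):
    """Trial numbers (1-based) at which one old item is reviewed.

    Each gap between reviews is `factor` times the previous gap
    (an assumed rule, not the tutor's model-based schedule).
    """
    trials, trial, gap = [], 0, first_gap
    for _ in range(n_reviews):
        trial += gap
        trials.append(trial)
        gap *= factor
    return trials


def interleave(current_items, old_item, session_length, review_trials):
    """Fill a session with current-unit items, placing the old item on its review trials."""
    review = set(review_trials)
    session, k = [], 0
    for t in range(1, session_length + 1):
        if t in review:
            session.append(old_item)
        else:
            # Cycle through the current unit's items on all other trials.
            session.append(current_items[k % len(current_items)])
            k += 1
    return session


review_trials = expanding_review_trials()
session = interleave(["a", "b", "c"], "OLD", 8, review_trials)
print(review_trials)
print(session)
```

With these assumed defaults the old item recurs at widening intervals while the remaining trials go to the current unit, in contrast to the flashcard control, where cumulative review is a separate optional pass through the full set.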

In the Fall 2007 vocabulary classroom work we are currently seeing a strong preference for the optimized condition. Considering the care that was taken to make this comparison unbiased, this seems to indicate that students perceive a greater advantage in using the optimized version.

In the Fall 2007 radical classroom work we are finding a significant advantage for radical training. This advantage amounts to twice as much improvement (approx. 14% vs. 7%) in the learning rate for subjects who were assigned the one-hour radical practice session.


Assuming the tutor is more efficient than other methods, one would expect that students using it would perform better in less time, perform the same in less time, or perform better in the same amount of time.
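
The three outcomes listed above amount to a simple dominance check on score and study time. A minimal sketch follows; the function name and the numbers are hypothetical and are not results from the study.

```python
def is_more_efficient(score_a, minutes_a, score_b, minutes_b):
    """True if method A does at least as well in no more time than method B,
    and is strictly better on score or strictly lower on time."""
    no_worse = score_a >= score_b and minutes_a <= minutes_b
    strictly_better = score_a > score_b or minutes_a < minutes_b
    return no_worse and strictly_better


# Hypothetical (score, minutes) pairs covering the three cases in the text.
print(is_more_efficient(8, 10, 7, 12))  # better score in less time
print(is_more_efficient(7, 10, 7, 12))  # same score in less time
print(is_more_efficient(8, 12, 7, 12))  # better score in the same time
```

A method that scores higher but also takes longer is deliberately not counted as more efficient here, since that case requires a judgment about how much extra time the gain is worth.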

Transfer results have not yet been analyzed.


Optimizing the practice schedule

