Applying optimal scheduling of practice in the Chinese Learnlab
The vocabulary tutor will be deployed in both the Online and Classroom Chinese I classes for an efficacy test. The first 8 units of each class (excluding Unit 1) will be split into two tutors, each with content for 4 units. Each 4-unit tutor will be one replication of the experiment, so the design is replicated twice for each class track. During these 4-unit in vivo experiments, the tutor will alternate between required units and voluntary units, and the tutor software will randomly assign the order of this alternation for each student. In each tutor, the first unit will be assessed before the 3rd unit and the 2nd unit will be assessed before the 4th unit. This design will allow a test of whether requiring the tutor provides a learning advantage at a long-term interval.
The tutor will also administer a brief survey to collect students' self-reports of vocabulary study time (both inside and outside the tutor). The survey will be given from within the tutor and will take less than 5 minutes in total for each 4-unit tutor. The hypothesis is that students will do better when required to use the tutor despite not spending more overall time studying vocabulary (both inside and outside the tutor). Further, Sue-mei has offered to administer an in-class paper-and-pencil vocabulary assessment after each 4-unit tutor. This will give a measure of transfer outside the tutor, which is hypothesized to reveal similar effects.
Another aspect of the design arises from the fact that many students quit after 15 minutes, which is before the tutor has introduced all the items. Because item introduction will be randomized within subjects, we will be able to conduct within-subjects tests of learning as a function of the amount of practice each student has had with individual words. This will give another direct measure of the benefit of supplementary practice using the tutor.
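The random assignment described above can be illustrated with a minimal sketch. It is not the tutor's actual code: the function names, the seeding scheme, the unit labels, and the sample words are all hypothetical, chosen only to show one way of randomizing the required/voluntary alternation order and the item introduction order for each student.

```python
import random


def assign_unit_requirements(student_id, units, seed=0):
    """Pick at random which alternation order a student receives.

    Returns a dict mapping each unit to "required" or "voluntary",
    alternating across the units of one 4-unit tutor.
    Hypothetical helper, not taken from the tutor software.
    """
    # Seeding with the student id keeps the assignment reproducible.
    rng = random.Random(f"{seed}-{student_id}")
    first = rng.choice(["required", "voluntary"])
    second = "voluntary" if first == "required" else "required"
    return {unit: (first if i % 2 == 0 else second)
            for i, unit in enumerate(units)}


def item_introduction_order(student_id, items, seed=0):
    """Return a per-student random order in which items are introduced."""
    rng = random.Random(f"{seed}-{student_id}-items")
    order = list(items)
    rng.shuffle(order)
    return order


# Unit labels and vocabulary items below are placeholders.
plan = assign_unit_requirements("student-01", [2, 3, 4, 5])
order = item_introduction_order("student-01", ["ni", "hao", "xie", "zai"])
print(plan)
print(order)
```

Because the introduction order differs across students, a student who stops early has practiced a random subset of the items, which is what makes the within-subjects comparison by amount of practice possible.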
The probable benefit to students is that they will learn Chinese vocabulary more easily. All tutor curriculum is matched one-for-one with the words taught in the respective courses.
Do the optimal schedules of practice produced by the Chinese vocabulary tutor result in a measurable difference in performance for students?
Background and significance
Efforts to use practice scheduling algorithms date to the early 1960s. One seminal example is Atkinson's (1972) German vocabulary tutor. While these efforts have often produced positive results, such programs have never been employed in the classroom in a consistent fashion. Perhaps this is due to the many practical issues involved in integrating such a system into the context of a course curriculum.
Normal learning - The tutor functions using an "assistments" type task where every drill practice is also a measure of normal learning.
Long-term learning - The experiment includes long-term assessments at various intervals. These include both in-tutor and paper-and-pencil tests of long-term vocabulary performance.
Transfer learning - Long-term assessments may be given (50% of the time) using pairings not drilled by the tutor. These transfer tests will show whether and to what extent students can use what is learned in the tutor flexibly in new contexts.
Accelerated future learning - Measures of accelerated future learning will be gathered by examining ...
The amount of practice for a particular group of subjects and, within subjects, the amount of practice for any individual item.
The dependent variables will reveal benefits for individuals using the tutor as compared to individuals studying with other methods.
In Chinese, 7 sections of the Chinese I class participated in an experiment in which students were randomized to either a) have unit 3 voluntary and unit 4 required or b) have unit 3 required and unit 4 voluntary. This crossover within-subjects experiment tested whether there was an advantage for requiring students to use the system for 15 minutes compared to not requiring usage. For each student we computed the score advantage for the required unit vs. the voluntary unit on a paper-and-pencil test of both units (10 items for each unit, given approximately one month later). Errors were fewer for required usage (M = 0.90, SD = 1.4) than for voluntary usage (M = 1.5, SD = 1.7), t(53) = 3.0, p < .005, Cohen’s d = 0.41.
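As a rough consistency check on the statistics above, one common convention for a paired (within-subjects) design computes Cohen's d as t divided by the square root of the number of pairs. The source does not say how d was obtained, so the formula and the helper name below are assumptions. Under that assumption, the reported t and degrees of freedom reproduce the reported effect size.

```python
import math


def cohens_d_from_paired_t(t_value, df):
    """Effect size for a paired t-test, assuming d = t / sqrt(n).

    For a paired test, df = n - 1, so n = df + 1 pairs.
    """
    n = df + 1
    return t_value / math.sqrt(n)


# Values reported in the text: t(53) = 3.0
d = cohens_d_from_paired_t(3.0, 53)
print(round(d, 2))  # prints 0.41
```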
For the Spring 2007 semester, the classroom version results so far (as of 2/7/07) look good. The differences in practice amounts between the control (flashcard) and experimental (optimized) between-subjects conditions are of a similar magnitude to those that produced the positive results above. Specifically, students in the optimized condition get about twice as many drill trials (p<.001), about twice as many correct responses per minute (p<.001), a reduction in errors of 36% (p<.001), and about 2 minutes more practice (p<.05). The longer practice and somewhat lower attrition (not yet significant) for optimized subjects suggest they prefer the optimized condition.
Of course, the spacing of practice tends to be wider for the control subjects, since they are moving through a random order of the stimuli. This probably accounts for a large portion of the differences above. Further, the control condition allows more metacognitive control, since subjects must decide after each test whether they want that item repeated during the following pass through the set. However, both of these features might make the differences observed during practice unrepresentative of any long-term effects of the conditions, since the wider spacing and metacognitive control of the flashcard control condition might improve long-term efficiency. There is also a cumulative component to the comparison, since the optimized condition allows more efficient review of prior units. In the control condition, subjects are given the option of going through the full cumulative set after they finish each pass through the current unit set. Although this allows cumulative review for control subjects, it does not provide it in the efficient manner of the optimized condition, in which cumulative review is interleaved with current practice using an expanding spacing for each old item.
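To make the idea of interleaving cumulative review with current practice at an expanding spacing concrete, here is a deliberately simple sketch. It is not the tutor's optimization algorithm, which is not specified in this section. The function name, the fixed doubling rule, and the parameters `first_gap` and `growth` are illustrative assumptions only.

```python
def interleave_with_expanding_review(current_items, old_items, n_trials,
                                     first_gap=2, growth=2):
    """Build a trial sequence mixing current-unit drills with old-item reviews.

    Each old item is reviewed when its due trial arrives. After every
    review, its gap is multiplied by `growth`, so reviews of the same
    item move further and further apart (expanding spacing).
    """
    # Stagger the first reviews so old items do not all come due at once.
    due = {item: first_gap * (i + 1) for i, item in enumerate(old_items)}
    gap = {item: first_gap for item in old_items}
    schedule = []
    next_current = 0
    for trial in range(n_trials):
        ready = [item for item in old_items if due[item] <= trial]
        if ready:
            # Review the most overdue old item, then widen its gap.
            item = min(ready, key=lambda it: due[it])
            gap[item] *= growth
            due[item] = trial + gap[item]
            schedule.append(("review", item))
        else:
            # Otherwise continue cycling through the current unit.
            item = current_items[next_current % len(current_items)]
            next_current += 1
            schedule.append(("current", item))
    return schedule


# Placeholder item names; one old item reviewed among current-unit drills.
schedule = interleave_with_expanding_review(["x", "y"], ["a"], n_trials=16)
review_positions = [i for i, (kind, _) in enumerate(schedule) if kind == "review"]
print(review_positions)  # prints [2, 6, 14]
```

In this toy schedule the old item is reviewed at trials 2, 6, and 14, so the gaps between its reviews grow from 4 to 8 trials while current-unit items fill the remaining slots. A flashcard-style pass through the full cumulative set, by contrast, revisits every old item at roughly the same spacing.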
For these reasons, only long-term assessment (including a transfer component to show robustness) is adequate to assess the effects of the conditions on robust learning. Long-term assessment allows possible differences between the conditions to emerge and shows the practical utility of the approaches. Intermediate-term and final-exam-related assessments for this Spring 2007 study are in the planning stages, and suggestions are welcome.
Assuming the tutor is more efficient than other methods, one would expect that students using it would perform better in less time, perform the same in less time, or perform better in the same amount of time.
Transfer results have not yet been analyzed.
Pavlik Jr., P. I. (2006). Transfer effects in Chinese vocabulary learning. In R. Sun (Ed.), Proceedings of the Twenty-Eighth Annual Conference of the Cognitive Science Society (p. 2579). Mahwah, NJ: Lawrence Erlbaum.
Pavlik Jr., P. I. (in press-a). Timing is an order: Modeling order effects in the learning of information. In F. E. Ritter, J. Nerb, E. Lehtinen & T. O'Shea (Eds.), In order to learn: How order effects in machine learning illuminate human learning. New York: Oxford University Press.
Pavlik Jr., P. I. (in press-b). Understanding and applying the dynamics of test practice and study practice. Instructional Science.
Pavlik Jr., P. I., & Anderson, J. R. (2004, November). Optimizing paired-associate learning. Poster presented at the 45th Annual Meeting of the Psychonomic Society, Minneapolis, MN.
Pavlik Jr., P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29(4), 559-586.