Chi - Induction of Adaptive Pedagogical Tutorial Tactics
The goal of this project is to investigate the application of Reinforcement Learning (RL) to derive adaptive pedagogical strategies directly from pre-existing interaction data. Pedagogical strategies are policies that decide the system's next action when multiple actions are available. More specifically, this project is designed to: 1) help computer tutors employ effective, adaptive pedagogical policies; 2) test the viability of using RL, especially Partially Observable Markov Decision Processes (POMDPs), to induce pedagogical policies; 3) show that pedagogical policies are a potential source of learning power for computer tutors to improve students' learning; and 4) explore the underlying causes of the effectiveness of the induced policies.
For any form of learning environment, including Intelligent Tutoring Systems (ITSs), the system's behavior can be viewed as a sequential decision process in which, at each discrete step, the system is responsible for selecting the next action to take. Each of these system decisions affects the user's subsequent actions and performance. It is unclear how to make each decision effectively, because its impact on learning often cannot be observed immediately and the effectiveness of one decision also depends on the effectiveness of subsequent decisions. Ideally, an effective learning environment should craft and adapt its actions to users' needs. However, there is no well-established theory on how to make these system decisions effectively. Most existing ITSs, for example, either employ fixed pedagogical policies that provide little adaptability or employ hand-coded pedagogical rules that seek to implement existing cognitive or instructional theories. These theories may or may not have been well evaluated.
In this project, we apply RL to improve the effectiveness of an ITS by inducing pedagogical policies directly from pre-existing student-computer interactivity data. More specifically, we focus on two types of tutorial decisions: Elicit vs. Tell (ET) and Justify vs. Skip-Justify (JS). When making an ET decision, the tutor decides whether to elicit the next step from the student or to tell them the step directly. JS decisions arise at points where the tutor may optionally ask students to justify an answer they have given or an entry they have made. Neither type of decision is well understood: there are many theories but no widespread consensus on how or when each action should be taken. Thus, we investigate applying and evaluating RL to induce pedagogical tutorial tactics from pre-existing interactivity data.
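The general idea of inducing a policy from logged interaction data can be illustrated with a minimal sketch. It assumes the log has already been reduced to (state, action, reward, next state) tuples; the state names, reward values, discount factor, and the use of plain value iteration are all illustrative assumptions, not the project's actual representation or toolkit.

```python
from collections import defaultdict

def estimate_model(transitions):
    """Estimate P(s'|s,a) and the mean reward R(s,a) from logged tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    model = {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())
        probs = {s_next: n / total for s_next, n in nexts.items()}
        model[sa] = (probs, reward_sum[sa] / total)
    return model

def q_value(probs, reward, values, gamma):
    """One-step lookahead value of a (state, action) pair."""
    return reward + gamma * sum(p * values[s2] for s2, p in probs.items())

def value_iteration(model, gamma=0.9, tol=1e-8):
    """Return a greedy policy and state values for the estimated MDP."""
    states = {s for s, _ in model}
    states |= {s2 for probs, _ in model.values() for s2 in probs}
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            qs = [q_value(probs, r, values, gamma)
                  for (s0, _), (probs, r) in model.items() if s0 == s]
            if not qs:  # state with no logged actions keeps value 0
                continue
            best = max(qs)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    best_action = {}
    for (s, a), (probs, r) in model.items():
        q = q_value(probs, r, values, gamma)
        if s not in best_action or q > best_action[s][1]:
            best_action[s] = (a, q)
    return {s: a for s, (a, _) in best_action.items()}, values

# Hypothetical log: states describe step difficulty, actions are ET decisions.
log = [
    ("hard", "tell", 0.0, "easy"),
    ("hard", "tell", 0.0, "easy"),
    ("hard", "elicit", 0.0, "hard"),
    ("hard", "elicit", 0.0, "easy"),
    ("easy", "elicit", 1.0, "end"),
    ("easy", "elicit", 1.0, "end"),
    ("easy", "tell", 0.2, "end"),
]
policy, values = value_iteration(estimate_model(log))
print(policy)
```

In this toy log the induced policy depends on the state, which is the sense in which such a policy is adaptive; with real data, the quality of the result depends heavily on how states and rewards are defined.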
Planned accomplishments for PSLC Year 6
Previously, we applied a Markov Decision Process (MDP) model to induce pedagogical policies from the data. However, a framework more suitable for this task is the Partially Observable Markov Decision Process (POMDP). POMDPs allow for realistic modeling of students' knowledge levels, students' intentions, and other hidden state components by incorporating them into the state space. POMDPs explicitly represent two sources of uncertainty: non-determinism in the control process and partial observability of the students' knowledge levels. In the former case, the effects of tutorial actions on students' knowledge levels are not deterministic; in the latter, the underlying knowledge levels are observed only indirectly, via incomplete or imperfect observations. The goal of Year 6 is to explore the use of POMDP models.
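The two sources of uncertainty described above appear as two separate tables in the standard POMDP belief update, in which the new belief in state s' is proportional to O(o | s', a) times the sum over s of T(s' | s, a) b(s). The sketch below applies that update to a hypothetical two-state knowledge model; the state names, observation names, and all probabilities are made up for illustration and are not taken from the project.

```python
def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update.

    T[action][s][s_next] is the transition probability (non-determinism),
    O[action][s_next][observation] is the observation probability
    (partial observability).
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        predicted = sum(T[action][s][s_next] * belief[s] for s in states)
        unnormalized[s_next] = O[action][s_next][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / total for s, v in unnormalized.items()}

# Hypothetical model: the hidden state is whether a knowledge component is learned.
T = {
    "elicit": {"unlearned": {"unlearned": 0.7, "learned": 0.3},
               "learned": {"unlearned": 0.0, "learned": 1.0}},
    "tell": {"unlearned": {"unlearned": 0.8, "learned": 0.2},
             "learned": {"unlearned": 0.0, "learned": 1.0}},
}
O = {
    "elicit": {"unlearned": {"correct": 0.2, "incorrect": 0.8},
               "learned": {"correct": 0.9, "incorrect": 0.1}},
    "tell": {"unlearned": {"none": 1.0}, "learned": {"none": 1.0}},
}

prior = {"unlearned": 0.5, "learned": 0.5}
posterior = belief_update(prior, "elicit", "correct", T, O)
print(posterior)
```

A POMDP policy then maps beliefs such as `posterior`, rather than directly observed states, to tutorial actions.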
Integrated Research Results and High Profile Publication
We focused on two types of tutorial decisions: Elicit vs. Tell (ET) and Justify vs. Skip-Justify (JS). When making an ET decision, the tutor decides whether to elicit the next step from the student or to tell them the step directly. JS decisions arise at points where the tutor may optionally ask students to justify an answer they have given or an entry they have made. Neither type of decision is well understood: there are many theories but no widespread consensus on how or when each action should be taken. Thus, we investigate applying and evaluating RL to induce pedagogical tutorial tactics from pre-existing interactivity data.
Previously, a particular RL model, a Markov Decision Process (MDP), was applied to automatically derive adaptive pedagogical strategies directly from pre-existing student-computer interactivity data. The effectiveness of the RL-induced tutorial tactics was then tested on human subjects with random assignment. Results showed that, after solving the same problems in the same amount of time, students taught with the induced pedagogical policies had learning gains up to about 60% higher than students taught with less effective pedagogical policies: t(55) = 3.058, p = 0.003, d = 0.81 (M = 0.41, SD = 0.19 for the Experimental Group and M = 0.25, SD = 0.21 for the Control Group) (Chi et al., 2010a). Overall, our results showed that these fine-grained tutorial decisions do matter to learning.
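As a rough check on the reported effect size, Cohen's d can be recomputed from the group means and standard deviations given above. The group sizes are not stated here, so this sketch assumes equal-sized groups when pooling the standard deviations; that assumption, and rounding in the reported values, may explain a small difference from the published d = 0.81.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using a pooled SD that assumes equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Reported values: Experimental M = 0.41, SD = 0.19; Control M = 0.25, SD = 0.21.
d = cohens_d(0.41, 0.19, 0.25, 0.21)

# Relative difference between the two group means.
relative_gain = (0.41 - 0.25) / 0.25

print(round(d, 2), round(relative_gain, 2))
```

Under these assumptions d comes out near 0.80, and the relative difference in means is 0.64, which is roughly in line with the "about 60%" figure.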
Moreover, the pedagogical policies employed by the Experimental Group were derived from the log files of two pre-existing training corpora. Since the Experimental Group followed the same procedure and used the same training materials as the students in the pre-existing training corpora, a post-hoc comparison was done among the three groups. Results showed that, while no significant differences were found among the three groups in pre-test scores or time on training, there were significant differences in both post-test scores and normalized learning gain (NLG) scores: F(2, 127) = 5.16, p = 0.007 and F(2, 127) = 7.57, p = 0.001 respectively (Chi et al., 2010b). More specifically, the Experimental Group significantly outperformed the two previous groups in terms of post-test scores and NLG. This result suggests that RL can be fruitfully applied to induce more effective, adaptive pedagogical strategies from less effective pre-existing data.
Compared with previous research on applying RL to induce pedagogical policies for ITSs, this project has so far made at least two major contributions. First, we showed that using a relatively small exploratory corpus as the training corpus for inducing pedagogical policies is a feasible approach. Second, we showed empirically that the RL-induced policies did make students learn more deeply or more effectively.
Moreover, while much previous research on applying RL to ITSs and to non-tutoring natural-language (NL) dialogue systems used pre-defined state representations, our approach in this project is to begin with a large set of features and apply a series of feature-selection methods to reduce them to a tractable subset. Through log analysis, we shed some light on the relative effectiveness of different feature-selection methods and on which of the defined features were most involved in the final induced policies.
The features that appeared most frequently in the induced policies employed by the Experimental Group are:
StepDifficultyPS: a Problem Solving Contextual feature which encodes a step's difficulty level; its value is roughly estimated from the Combined Corpus based on the percentage of answers that were correct on the step.
tutConceptsToWordsPS: a Problem Solving Contextual feature which represents the ratio of the physics concepts to words in the tutor's dialogue.
tellsSinceElicitA: an Autonomy feature which represents the number of tells the student has received since the last elicit.
durationKCBetweenDecisionT: a Temporal Situation feature which represents the time since the last tutorial decision was made on the current KC.
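To make the feature definitions above more concrete, the following sketch computes simplified versions of three of them. The input formats (a list of past tutor actions, a list of (step, correct) pairs, and a set of concept terms) are hypothetical, and the exact definitions used in the project may differ; in particular, the sketch reports the raw fraction of correct answers per step rather than any derived difficulty scale.

```python
def tells_since_elicit(actions):
    """Count tells since the most recent elicit (cf. tellsSinceElicitA).

    If no elicit has occurred yet, all tells so far are counted;
    this is an assumption of the sketch.
    """
    count = 0
    for action in reversed(actions):
        if action == "elicit":
            break
        if action == "tell":
            count += 1
    return count

def fraction_correct_by_step(answers):
    """Fraction of correct answers per step, the raw input to a
    difficulty estimate such as StepDifficultyPS."""
    totals, correct = {}, {}
    for step_id, is_correct in answers:
        totals[step_id] = totals.get(step_id, 0) + 1
        correct[step_id] = correct.get(step_id, 0) + int(is_correct)
    return {step: correct[step] / totals[step] for step in totals}

def concepts_to_words_ratio(utterance, concept_terms):
    """Ratio of concept words to all words in a tutor utterance
    (cf. tutConceptsToWordsPS), using naive whitespace tokenization."""
    words = [w.strip(".,?!").lower() for w in utterance.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in concept_terms)
    return hits / len(words)

# Hypothetical examples.
history = ["tell", "elicit", "tell", "tell"]
answers = [("s1", True), ("s1", False), ("s2", True)]
ratio = concepts_to_words_ratio(
    "The kinetic energy depends on mass.", {"energy", "mass"}
)
print(tells_since_elicit(history), fraction_correct_by_step(answers), ratio)
```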
Currently, we are exploring the use of POMDP models. More specifically, we have defined preliminary structures for the POMDP and selected a training corpus that can be used to learn it. The structures selected for the POMDP are knowledge tracing and the Additive Factor Model (AFM). We also want to investigate how certain factors affect the effectiveness of the induced tutorial tactics; these factors include (1) the choice of training corpus, (2) the choice of state modeling and representation, and (3) the choice of reward function.
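For reference, the Additive Factor Model is commonly written as a logistic model in which the log-odds of a correct response is the student's proficiency plus, for each knowledge component (KC) the step requires, that KC's easiness plus its learning rate times the student's prior practice opportunities. The sketch below implements that common formulation; the parameter names are illustrative, and how the model is embedded in the POMDP is not specified here.

```python
import math

def afm_success_probability(theta, q_row, beta, gamma, practice):
    """Probability of a correct response under the Additive Factor Model.

    theta    -- student proficiency
    q_row    -- Q-matrix row for the step (1 if the KC is required, else 0)
    beta     -- per-KC easiness
    gamma    -- per-KC learning rate
    practice -- per-KC count of the student's prior practice opportunities
    """
    logit = theta + sum(
        q * (b + g * t) for q, b, g, t in zip(q_row, beta, gamma, practice)
    )
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters: one required KC with a positive learning rate.
p_few = afm_success_probability(-0.5, [1, 0], [0.1, 0.0], [0.2, 0.3], [1, 0])
p_many = afm_success_probability(-0.5, [1, 0], [0.1, 0.0], [0.2, 0.3], [6, 0])
print(p_few, p_many)
```

With a positive learning rate, the predicted probability of success rises with practice on the required KC, while KCs with a Q-matrix entry of 0 have no effect.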
Additionally, we have been investigating the causes of the effectiveness of the induced pedagogical tactics in the previous study by applying learning decomposition. We want to compare the RL-induced tutorial tactics with existing learning theories, such as the zone of proximal development (ZPD) (Vygotsky, 1978) and the assistance dilemma (Koedinger et al., 2008).
Chi, M., VanLehn, K., and Litman, D. (2010a). The More the Merrier? Examining Three Interaction Hypotheses. Proceedings of the 32nd Annual Conference of the Cognitive Science Society (CogSci2010), Portland, Oregon.
Chi, M., VanLehn, K., and Litman, D. (2010b). Do Micro-Level Tutorial Decisions Matter: Applying Reinforcement Learning To Induce Pedagogical Tutorial Tactics. Proceedings of the 10th International Conference on Intelligent Tutoring Systems (ITS2010) (pp. 224-234).
Chi, M., VanLehn, K., Litman, D., and Jordan, P. (2010c). Inducing Effective Pedagogical Strategies Using Learning Context Features. Proceedings of the Eighteenth International Conference on User Modeling, Adaptation, and Personalization (UMAP2010) (pp. 147-158).
Chi, M., VanLehn, K., Litman, D., and Jordan, P. (accepted upon revisions). Empirically Evaluating the Application of Reinforcement Learning to the Induction of Effective Pedagogical Tactics.
Year 6 Project Deliverables
1. A selection of POMDP policies, beginning with the ones in step 3 of the 6th month milestones.
2. An exploration of different ways to evaluate these POMDP policies.
6th Month Milestone
By May 1st, 2010 (Min Chi starts Nov 1), we will:
1. Extract initial features and explore the use of additional features based on knowledge tracing and detectors of affect and motivation.
2. Design initial POMDP models.
3. Derive initial policies by exploring the impact of different corpora, different feature selection or extraction approaches, clustering methods, and so on.