Co-Training

Theory Committee Members

Bruce McLaren

Vincent Aleven

Tom Mitchell

Ken Koedinger

Bob Siegler

Charles Perfetti

Noboru Matsuda

Jie Yang

Dave Yaron

Lisa Anthony

William Cohen

Liu Ying

Sharon Nelson-LeGall

Min Wang

Julie Booth

Alex Renkl

Ron Salden

Jordi Cuadros

Dawn McCormick

Indra Rustandi

John LaPlante

Kurt Van Lehn

Much research in machine learning and human learning has advocated various kinds of "multiples" to assist learning:

  • multiple representations (e.g., machine learning: Liere & Tadepalli, 1997; human learning: Ainsworth & Van Labeke, in press);
  • multiple strategies (e.g., machine learning: Michalski & Tecucci, 1997; Saitta, Botta, & Neri, 1993; human learning: Klahr & Siegler, 1978);
  • multiple learning tasks (e.g., machine learning: Caruana, 1997; Case, Jain, Ott, Sharma, & Stephan, 1998; human learning: Holland, Holyoak, Nisbett, & Thagard, 1986);
  • multiple data sources (e.g., machine learning: Blum & Mitchell, 1998; Collins & Singer, 1999).

For instance, experiments in machine learning have demonstrated that more robust, generalizable learning can be achieved by training a single learner on multiple related tasks (Caruana, 1997) or by training multiple learning systems on the same task (Blum & Mitchell, 1998; Cohen, 2002; Collins & Singer, 1999; Mitchell, 1999; Muslea, Minton, & Knoblock, 2002; Riloff & Jones, 1999). Blum and Mitchell (1998) provide both empirical results and a proof of the circumstances under which strategy combinations enhance learning. In particular, the co-training approach to combining multiple learning strategies yields better learning to the extent that the learning strategies produce "uncorrelated errors": when one is wrong, the other is often right. Donmez et al. (2005) demonstrate, using a multi-dimensional collaborative process analysis, that regularities across multiple codings of the same data can be exploited to improve text classification accuracy for difficult codings.
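
The co-training loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Blum and Mitchell's actual procedure: each example is assumed to have two numeric "views", each view gets a simple nearest-centroid learner (a stand-in for whatever classifier one would really use), and the names `fit_centroids`, `predict`, and `co_train` as well as the toy data are hypothetical.

```python
from statistics import mean

def fit_centroids(values, labels):
    """Fit a 1-D nearest-centroid model: the mean feature value of each class."""
    return {c: mean([v for v, y in zip(values, labels) if y == c])
            for c in set(labels)}

def predict(centroids, value):
    """Return (label, confidence); confidence is the gap between the two
    smallest centroid distances, so a larger gap means a surer prediction."""
    dists = sorted((abs(value - m), c) for c, m in centroids.items())
    (d_best, label), (d_next, _) = dists[0], dists[1]
    return label, d_next - d_best

def co_train(labeled, unlabeled, rounds=10, per_round=1):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2).
    Each view's learner takes turns labeling the unlabeled examples it is most
    confident about; those labels are added to the pool shared by both learners."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                break
            model = fit_centroids([x[view] for x, _ in labeled],
                                  [y for _, y in labeled])
            ranked = sorted(unlabeled,
                            key=lambda x: predict(model, x[view])[1],
                            reverse=True)
            for x in ranked[:per_round]:
                labeled.append((x, predict(model, x[view])[0]))
                unlabeled.remove(x)
    # Final models, one per view, trained on seed labels plus self-assigned labels.
    models = [fit_centroids([x[v] for x, _ in labeled], [y for _, y in labeled])
              for v in (0, 1)]
    return models, labeled

# Hypothetical toy data: two seed labels, four unlabeled examples.
seed = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
pool = [(0.1, 0.45), (0.8, 0.55), (0.45, 0.05), (0.6, 0.9)]
models, labeled = co_train(seed, pool)
for x, y in labeled[len(seed):]:
    print(x, "->", y)
```

In this toy run the example (0.45, 0.05) is nearly ambiguous in the first view but clear in the second, so the second view's learner labels it and the first view's learner then benefits from that label. This is the sense in which uncorrelated errors across views help: each learner supplies training signal exactly where the other is unsure.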

Experiments in human learning have demonstrated similar results. For instance, instruction that combines rules (verbal descriptions) and examples yields better results than either alone (Holland, Holyoak, Nisbett, & Thagard, 1986), and iterative instruction of both procedures and concepts yields better learning (Rittle-Johnson & Koedinger, 2002; Rittle-Johnson, Siegler, & Alibali, 2001). However, a rigorous causal theory of these results, along the lines of Blum and Mitchell, is lacking.

A key goal of the co-training cluster is to understand the conditions under which such multiple approaches to learning are effective. Close collaboration between machine learning researchers and human learning researchers is beginning to provide bi-directional payoffs that benefit both fields. In particular, the data-intensive methods of machine learning complement the knowledge-intensive methods of human learning, but both fields are exploring the use of multiple methods. Are there similar principles for combining multiple methods? When does importing methods from one field to the other increase the effectiveness of the learner's set of methods?

More generally, the experiments and theoretical work of this cluster will target a richer, more rigorous theory of the important role of meta-cognitive processes in human learning (Bransford, Brown & Cocking, 2000; Chipman, Segal, & Glaser, 1985). We hypothesize that much of meta-cognition involves a student coordinating conclusions and resolving conflicts resulting from their use of multiple knowledge sources, whether these are internal representations, strategies, external data, or human sources. The rest of this section discusses specific research projects to be conducted in the first few years of the PSLC.

Does video help humans learn spoken Chinese? Machine learning algorithms like co-training have been successful in a variety of tasks, including learning to recognize phonemes based both on audio signals and video of lip motion (de Sa & Ballard, 1998; Roy, 2001); learning word lexicons based on both the word and its linguistic context (Collins & Singer, 1999); and learning to disambiguate word sense based on multiple occurrences of the word within the same passage (Yarowsky, 1995). Can similar principles be used to construct more effective teaching strategies for humans? For instance, do humans learn phonemes of a second language better if both audio and video are provided, and does the improvement obey the theorems of co-training? Drs. Ying Liu, Charles Perfetti, and Min Wang are pursuing a project plan to test these hypotheses with experiments to be conducted in the Chinese LearnLab course. (Note that some of these studies pursue fluency issues and are discussed below.) One study comparing audio-only input with audio plus video input has been carried out. Behavioral measures on several tasks showed that the audio plus video condition produced faster recognition (taken to indicate more robust learning) of Chinese characters than did the audio-only condition, especially for characters whose pronunciations were less familiar to the learner. The site visit report commented in detail on the theoretical interpretation of this study:

"In particular, one of their initial studies in the LearnLab has demonstrated that providing students both audio and visual presentations of the pronunciation of Chinese characters leads to faster learning than either modality alone. This is an interesting validation that the use of multiple representations does indeed improve human learning in a real classroom setting. However, the connection between the details of the specific algorithmic and theoretical results in co-training and the human learning results is still unclear. Co-training exploits the independence and sufficiency of two views to provide self-supervision on a set of unlabeled examples. The existing study on human-learning does not appear to directly exploit this particular use of multiple representations. Therefore, it is unclear to what extent the current theoretical results on co-training explain the results of the current human-learning study."

We whole-heartedly agree that greater theoretical clarity on the interpretation of this study will be fruitful. The concern raised above is whether the examples given to students during training are "unlabeled". On one perspective, the learners receive labels in the form of sounds, namely, the correct pronunciation of the character. The sound is a label insofar as it directs the learner to produce that sound. If the learners have already acquired skill in reproducing sounds, it seems correct to consider the heard sound as a label for the spoken sound, which is the response to be learned.

On an alternative perspective, the heard sound is only another cue: an aural input that is correlated imperfectly with the spoken syllable, just as the video of the speaker is. The heard sound does not tell the learner how to produce the syllable, and Chinese phonemes include some marked deviations from those in English. Further, only some of this production information is present in the video. On this perspective, both the video and the spelling are partially correlated (partly independent) information sources for what is to be learned, and neither one is a label. To press this perspective, we could suggest that a fully labeled example would involve mechanically moving students' lips! On this perspective, our human learners start with some labeled examples, just as co-training does; in this case, the labels are their prior experience with related phonemes in English. But then they must continue learning with (at least partially) unlabeled data.

These subtleties come into focus only because we take seriously the exploration of the relationship between machine learning and human learning. This exploration is leading Mitchell, Perfetti, and Liu into further theoretical and experimental work that should have bi-directional payoffs for both machine learning and psychology. The plan includes the following:

1. Tom Mitchell will work with the others to align the assumptions of co-training with the empirical observations about learning Chinese. Specific theoretical issues concern the relative independence of information sources and what counts as labeled data in our learning paradigm. Co-training theory may need to be expanded to consider the possibility that the component features of the two input sources (audio and video) and the associated response (mouth movements) are partially overlapping, and that learning might be better modeled at this feature level.

2. Develop specific hypotheses based on the theoretical alignment and carry out experiments to test these hypotheses. At this point, two hypotheses have been formulated, and we plan to test at least one of them this year.

a. Hypothesis 1: Learning will be more effective with a third "modality" that provides a meaning cue. On our analysis of what is being learned, the subject learns a triple of graphic form, pronunciation, and meaning. We see co-training advantages on simple lexical decisions as well as naming, suggesting that establishing the character in the orthographic lexicon was facilitated by bi-modal input. But our explanations have focused only on the learning of pronunciations. To the extent that meanings support the learning of pronunciations we will find, according to co-training assumptions, that an independent representation of the meaning will facilitate learning.

b. Hypothesis 2: Learning will be facilitated by pin-yin spelling. If pin-yin does serve as a label, then the co-training advantage will be much larger in the absence of pin-yin than in the presence of it.

Does multi-strategy learning accelerate cognitive skill acquisition? As one test of the co-training hypothesis in human learning, new PSLC postdoc Julie Booth, with advisors Robert Siegler (Psychology) and Ken Koedinger (Human-Computer Interaction), will perform microgenetic studies that contrast the independent use of conceptual (sense-making) and procedural learning strategies with their combined use. The general experimental design, to be applied in multiple domains, contrasts two base conditions that represent single-strategy learning with two multi-strategy conditions: (a) drill-and-practice for foundational skill building; (b) guided discovery for sense making; (c) multiple strategies in sequence: conceptual encoding instruction followed by conceptually grounded procedural practice; (d) multiple strategies integrating hand-over-hand conceptual and procedural learning.

How does error detection transition from supervised to self-supervised? Speaking a second language is a time-stressed task. In the rush to recall appropriate words, students cannot afford the time to recall grammatical or other details, so they make errors without noticing them. If their speech is recorded, they can usually find many of their errors once relieved of the time pressure of recalling the next word. However, even after listening to and transcribing their speech, students may be unable to detect some errors, and it is up to the instructor to find them and point them out. A longitudinal study of student self-correction in the English as a Second Language (ESL) course will test a hypothesized transition from (1) instructor-detected errors, to (2) student-detected errors offline, to (3) student-detected errors during speech, and finally to (4) fluency (no errors). This study will require the identification of knowledge components in this domain (a learning theory activity in its own right) that will be consistently tracked across the four phases. Besides exploring the role of self-supervised learning, a co-training cluster issue, this study may also inform the dialogue cluster as follows. The work of catching errors is shared among three "agents": the student during the speech, the student afterwards, and the instructor. They engage in an extended dialogue over time that should help to develop the feature validity and strength of the knowledge components being learned. This dialogue of student, text, and instructor serves as a mechanism to increase feature validity in the sense that what is learned from self-correction can be transferred to new speaking events. Thus, in terms of second language learning, increased feature validity can be operationalized as increased accuracy across spoken texts without instructor intervention (McCormick & O'Neill).

Simulated Students. PSLC will pursue theoretical breadth in a Simon-style conceptual theory of robust learning. However, within the co-training cluster in particular, we will also pursue theoretical depth in a Newell-style computational theory. The Simulated Student project by postdoc Noboru Matsuda and Drs. William Cohen (machine learning) and Ken Koedinger (Human-Computer Interaction and Psychology) is a key piece of that effort. They are using machine-learning techniques to create simulated students, or SimStudents, that simulate human learning processes, particularly the creation of knowledge components and the refinement of feature validity by learning from examples of problem solutions, descriptions of knowledge components, and feedback on problem-solving practice opportunities. We will use SimStudents to explore the effect of alternative instructional manipulations and use the theory to guide design and predict outcomes before doing expensive experiments with human students.

We build upon prior success in simulated student experiments, for instance, a study by Koedinger and MacLaren (1997) employing Anderson's ACT-R theory (Anderson & Lebière, 1998) and studies employing explanation-based learning techniques (Ur & VanLehn, 1995; VanLehn, Jones & Chi, 1992). By collaborating with leading machine learning researchers, we will bring more recent techniques like Inductive Logic Programming (Bergadano & Gunetti, 1995; Lavrac & Dzeroski, 1994; Muggleton & De Raedt, 1994) to bear and also provide a forcing function to improve those algorithms.

In year 1, we successfully demonstrated a working prototype of the SimStudent algorithm in the Algebra equation-solving domain (Matsuda, Cohen, & Koedinger, 2005). The approach combines three machine learning techniques, especially the FOIL method, a version of Inductive Logic Programming. FOIL is a machine learning method that discriminates relevant from irrelevant features; that is, it increases the feature validity of knowledge components. In this case, the knowledge components are modeled as if-then production rules, and features appear in the if-part of these productions, an approach consistent with cognitive theories like ACT-R (Anderson & Lebière, 1998) and Soar (Newell, 1990). The SimStudent learns production rules that model not only correct steps of human students but also some typical errors. The SimStudent model has been used to inspire a curriculum sequence study in algebra that will be run in conjunction with Anthony's Multi-modal Math Tutor project described below. We also plan to demonstrate the generality of the SimStudent algorithms in the coming year, first in the Chemistry domain and then in other domains. We anticipate these studies influencing and being influenced by the data mining of student interaction data supported by PSLC's Data Shop.
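
To make concrete how a FOIL-style learner can separate relevant from irrelevant features, the sketch below scores a candidate feature with the standard FOIL information-gain heuristic. It illustrates the general technique only and is not taken from the SimStudent implementation; the function name `foil_gain`, the simplifying assumption that the number of retained positive examples equals `p1`, and the example counts are all hypothetical.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for adding one condition to a rule's if-part.

    p0, n0: positive / negative examples covered before adding the condition.
    p1, n1: positive / negative examples still covered after adding it.
    Assumes the simple case in which every positive example counted in p1
    was also covered before, so the weighting term t equals p1.
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # nothing useful is covered, so the condition earns no credit
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return p1 * (info_before - info_after)

# Hypothetical counts: a rule currently fires on 4 correct and 4 incorrect steps.
relevant = foil_gain(p0=4, n0=4, p1=4, n1=0)    # keeps all correct steps, drops all incorrect
irrelevant = foil_gain(p0=4, n0=4, p1=2, n1=2)  # splits both groups evenly
print(relevant, irrelevant)
```

A condition that keeps the correct steps while excluding the incorrect ones earns a high gain, whereas one that leaves the proportion of correct steps unchanged earns none. A greedy learner that adds the highest-gain condition at each step therefore tends to build if-parts out of features with high validity.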

Integrating Robust Learning and Cognitive Load Theory. Further studies are planned within the co-training cluster, and they too address how a learner can achieve sense making by comparing and reasoning about different sources of information or instruction. Prior research on the pedagogical effects of multiple information sources has used the concept of "cognitive load" to explain the results (Clark & Mayer, 2003). If a manipulation that increases some objective measure of student effort, such as time on task or depth of utterances, also increases learning, then it is said to increase "germane" cognitive load. If it decreases learning, then it is said to increase "extraneous" cognitive load. These terms are descriptive, but they lack predictive power. When is an instructional manipulation going to lead to extraneous vs. germane cognitive load? We can use concepts from our developing theory of robust learning to provide a more generative and predictive theory.

An instructional manipulation will improve the learning of a knowledge component to the extent that it engages one of the three kinds of learning processes: 1) creating the knowledge component through active interpretation of a verbal description or analogical induction from an example, 2) refining its feature validity through a sense-making strategy like dialog or co-training, 3) strengthening it through retrieval in practice opportunities. We might say germane load is incurred when one of these germane learning processes is activated. An instructional manipulation that results in other kinds of processes or strategies, extraneous ones, will not enhance learning. Other projects within the co-training cluster provide examples of studies investigating these learning processes.

Does "contiguity" of descriptions and examples aid learning in the classroom? Vincent Aleven, Alexander Renkl (external to PSLC), and Ron Salden are extending laboratory results on the "contiguity effect", the idea that placing verbal descriptions and examples of the same knowledge component in close visual proximity will aid learning. They are extending prior results empirically, by performing in vivo learning experiments in the Geometry LearnLab, and theoretically, by providing a deeper explanation of these results in PSLC terms.

In these studies, when students answer geometry questions about the measures of angles and lines, they can enter their answer directly on the diagram or in a table. Some of the responses refer to other angles and lines in the diagram, which are named either conventionally (e.g., Line PQ, Angle PQR) or with single-letter, color-coded names. The target knowledge components are geometric inference rules, like knowing that the sum of the angles of a triangle is 180 degrees. When students engage in the processing of angle labels, they are distracted from fully engaging in the learning processes, interpreting and inducing, needed to create and refine the target knowledge components. As a further consequence of this distraction, they may engage in guessing strategies rather than reasoning. These less appropriate knowledge events yield shallow knowledge and, hence, lower feature validity and less robust learning.

Does multi-modal entry of math equations improve learning? This project starts with a hypothesis and preliminary evidence that handwritten entry of symbols, like algebra equations, is much faster and easier, that is, more fluent, than keyboard entry. Thus, a student who is handwriting will have more headroom for learning, as a fluent or automatically executed lower-level skill leaves more cognitive capacity for higher-level learning. But the challenge remains that current handwriting recognition is not good enough to accurately process equations written by high school students. We hypothesize that the general idea of co-training can meet this challenge by having students speak the terms of the equation as they enter them. The system can then perform "co-recognition", illustrated as follows. When processing a handwritten "s", the handwriting recognizer is uncertain whether it is a "5" or an "s". When processing the spoken "s", the speech recognizer is uncertain whether it is an "f" or an "s". The meta-level co-recognizer does sense making: because "s" is the only high-probability candidate from each recognizer, it correctly concludes the student input was an "s". In addition to advancing the technology of handwriting recognition, this project will also perform studies to test the fluency hypothesis that easier low-level equation entry will facilitate better higher-level learning of algebraic concepts and procedures. Note that this project tests principles derived from the fluency cluster as well as from co-training (Anthony, Yang & Koedinger).
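
The "co-recognition" example above can be sketched as a simple combination of two candidate lists. This is a minimal illustration rather than the project's actual recognizer: it assumes each recognizer returns a probability per candidate symbol, that the two recognizers' errors are independent (so probabilities can be multiplied), and that all symbols are equally likely a priori. The function name `co_recognize` and the 0.5 probabilities are hypothetical.

```python
def co_recognize(handwriting, speech):
    """Combine two recognizers' candidate probabilities.

    handwriting, speech: dicts mapping candidate symbol -> probability.
    Multiplies the two probabilities for each symbol (treating a missing
    candidate as probability 0) and renormalizes over what remains.
    Returns (best_symbol, posterior), or (None, {}) if the recognizers
    share no candidate.
    """
    joint = {sym: handwriting.get(sym, 0.0) * speech.get(sym, 0.0)
             for sym in set(handwriting) | set(speech)}
    total = sum(joint.values())
    if total == 0:
        return None, {}
    posterior = {sym: p / total for sym, p in joint.items() if p > 0}
    best = max(posterior, key=posterior.get)
    return best, posterior

# The handwriting recognizer cannot tell "5" from "s";
# the speech recognizer cannot tell "f" from "s".
handwriting = {"5": 0.5, "s": 0.5}
speech = {"f": 0.5, "s": 0.5}
best, posterior = co_recognize(handwriting, speech)
print(best, posterior)
```

Each recognizer alone is at chance between its two candidates, yet the combination settles on "s" because it is the only symbol both consider plausible. A real system would need to smooth zero probabilities so that one recognizer's miss cannot veto the correct symbol outright.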

Does personalizing instructional messages improve in vivo learning? A chemistry tutoring system uses either informal, personal hints (e.g., "you convert from grams to moles by ...") or formal, impersonal hints (e.g., "the conversion from grams to moles is done by ..."). A key idea behind the co-training theorem is that redundant input sources can be used to produce better learning. A similar idea in psychology is the "dual code" theory (Paivio, 1986) that explains experimental results showing concrete words (e.g., shoe) are recalled more effectively than equally frequent abstract words (e.g., love). The dual code explanation is that concrete words are stored in an image form (e.g., a mental image of a shoe) as well as their verbal form, whereas abstract words are stored in only their verbal form. Personalized hints may have a similar effect whereby personalized instruction is stored in conjunction with existing strong knowledge of oneself as well as its usual semantic form whereas impersonal hints are stored only in their semantic form. In addition to attempting to replicate in a realistic in vivo classroom setting prior laboratory results demonstrating this personalization effect (Clark & Mayer, 2003), this project will also do theoretical integration work to explain results in connection with co-training theory (McLaren, Yaron & Koedinger).