Baker - Building Generalizable Fine-grained Detectors

From LearnLab

Latest revision as of 08:31, 29 August 2011

Building Generalizable Fine-grained Detectors

Summary Table

Study 1

PIs: Ryan Baker, Vincent Aleven
Other Contributors: Sidney D'Mello (Consultant, University of Memphis), Ma. Mercedes T. Rodrigo (Consultant, Ateneo de Manila University)
Study Start Date: February 2010
Study End Date: February 2011
LearnLab Site: TBD
LearnLab Course: Algebra, Geometry, Chemistry, MathTutor, Science ASSISTments
Number of Students: 78 so far; total TBD
Total Participant Hours: 444 so far; total TBD
Data available in DataShop:

  • Dataset: CMU VlabHomeworks F2010
  • Dataset: Affect Detectors and Questionnaires Greenville 2010-11
  • Pre/Post Test Score Data: TBD
  • Paper or Online Tests: TBD
  • Scanned Paper Tests: TBD
  • Blank Tests: TBD
  • Answer Keys: TBD

Abstract

This project, a joint effort of the Metacognition and Motivation (M&M) and Computational Modeling and Data Mining (CMDM) thrusts, will create a set of fine-grained detectors of affect and M&M behaviors. Future projects in these two thrusts will be able to use these detectors to study the impact of learning interventions on these dimensions of students’ learning experiences, and to study the interrelationships between these constructs and other key PSLC constructs (such as measures of robust learning and motivational questionnaire data). It will also be possible to apply these detectors retrospectively to existing PSLC data in DataShop, in order to reinterpret prior work in light of evidence about students’ affect and M&M behaviors.

Background & Significance

Glossary

Metacognition and Motivation

Computational Modeling and Data Mining

Gaming the system

Off-Task Behavior

Affect

Frustration

Boredom

Flow

Engaged Concentration

Hypotheses

H1: We hypothesize that it will be possible to develop reasonably accurate detectors of student affect for four LearnLabs, using only data from the student’s keyboard and mouse interactions.

H2: We hypothesize that models of behaviors such as gaming the system, and off-task behavior, in combination with models of affect/behavior dynamics, can make affect detectors more accurate.

H3: We hypothesize that models created using data from three LearnLabs will perform significantly better than chance in data from a fourth LearnLab, with no re-training (or limited EM-based modification that requires no new labeled data).

H4: We hypothesize that these affect models will become a valuable component of future research in the M&M and CMDM thrusts.

Research Process

We will develop detectors of the M&M (metacognitive and motivational) behaviors of gaming the system, off-task behavior, proper help use, on-task conversation, help avoidance, and self-explanation without scaffolding. This set of behaviors has already been detected effectively in mathematics LearnLabs. Following prior work in the PSLC and at Memphis, we will model the dynamics between these behaviors and student affect, so that the behavior detectors can be leveraged to create detectors of the affective states of engaged concentration, boredom, confusion, and frustration. The dynamics models will enable us to set Bayesian priors for how likely each affective state is at a given time.
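The idea of using dynamics models as Bayesian priors can be illustrated with a small sketch. It assumes, purely for illustration, that the dynamics model is a first-order transition table over the four affective states, estimated from sequences of coded observations with add-one (Laplace) smoothing, and that a detector supplies a likelihood for each state. The project's actual models also condition on behaviors; the state labels and function names below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the four affective states named above.
STATES = ["engaged_concentration", "boredom", "confusion", "frustration"]


def transition_priors(sequences, states=STATES, alpha=1.0):
    """Estimate P(next state | previous state) from coded affect sequences.

    Add-alpha smoothing keeps transitions never seen in the data
    from receiving zero probability.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    priors = {}
    for prev in states:
        total = sum(counts[prev].values()) + alpha * len(states)
        priors[prev] = {s: (counts[prev][s] + alpha) / total for s in states}
    return priors


def posterior(prior, likelihood):
    """Bayes' rule: combine a prior P(state) with a detector likelihood P(evidence | state)."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    z = sum(unnormalized.values())
    return {s: v / z for s, v in unnormalized.items()}
```

In use, `transition_priors(...)[previous_state]` gives the prior for the next observation, and `posterior` sharpens it with whatever evidence a log-based detector provides; with a uniform likelihood the posterior simply equals the prior.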

These detectors will be developed for multiple LearnLabs, and their generalizability across LearnLabs will be one of the main questions studied in this project. We anticipate developing detectors for Algebra and Geometry, the Chemistry Virtual Lab, MathTutor, and Science ASSISTments. Each of these learning environments is a context where complex learning occurs and fine-grained interaction behavior is logged, and the detectors’ outputs will provide leverage on a number of research questions of interest.

“Ground truth” for the M&M behavior categories will be established through quantitative field observations, and for the affect categories through field observations and infrequent pop-up questions. We will work to raise the reliability of quantitative field observations of affect to a standard considered appropriate by psychology journals, through repeated coding and discussion sessions and through a detailed coding manual based on prior work on coding affect in field settings and on coding emotions from facial expressions.
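The page does not give the reliability standard as a number. Inter-rater agreement for field observations is commonly reported as Cohen’s kappa, so the sketch below computes kappa for two observers who coded the same observations. Treating kappa as the reliability measure, and the labels in the example, are assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same observations."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of observations")
    n = len(coder_a)
    # Observed agreement: fraction of observations given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: computed from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; agreement is perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, two coders who agree on three of four observations, `["bored", "bored", "confused", "concentrating"]` versus `["bored", "confused", "confused", "concentrating"]`, obtain kappa = 7/11 (about 0.64), noticeably lower than their raw 75% agreement because some agreement is expected by chance.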

Models will be developed solely from distilled log file data of the sort currently collected in DataShop; more sophisticated sensors will NOT be included in this project. The models will be built with a combination of machine learning and knowledge engineering, specifically by leveraging and adapting existing knowledge-engineered models such as Aleven et al.’s help-seeking model and Shih et al.’s self-explanation model. Generalization of models across learning environments will involve expectation maximization to adapt models to new data sets, and/or leveraging the CTLVS1 taxonomy to develop meta-models that relate prediction features to design features. We will first develop models for individual learning environments and then extend them across environments.
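Expectation maximization can adapt a model to a new data set without new labels. One well-known instance, swapped in here for illustration and not necessarily the method this project will use, is the prior-adjustment procedure of Saerens, Latinne, and Decaestecker (2002). A detector trained in one LearnLab is run on unlabeled data from another; EM then re-estimates how common each state is in the new setting and rescales the detector’s posteriors accordingly. The function names and the dictionary-based data layout are hypothetical.

```python
def _reweight(posteriors, new_priors, train_priors):
    """Rescale training-set posteriors P(c | x) to reflect new class priors."""
    out = []
    for post in posteriors:
        unnormalized = {c: post[c] * new_priors[c] / train_priors[c] for c in train_priors}
        z = sum(unnormalized.values())
        out.append({c: v / z for c, v in unnormalized.items()})
    return out


def em_adjust_priors(posteriors, train_priors, max_iter=1000, tol=1e-8):
    """EM re-estimation of class priors on unlabeled data from a new environment.

    posteriors: one dict P_train(class | x) per unlabeled example, as output by a
    detector trained elsewhere. train_priors: class frequencies in the training data
    (all assumed > 0). Returns (estimated priors, adjusted posteriors).
    """
    priors = dict(train_priors)
    for _ in range(max_iter):
        # E-step: posteriors under the current prior estimate.
        adjusted = _reweight(posteriors, priors, train_priors)
        # M-step: new priors are the mean adjusted posteriors.
        new_priors = {c: sum(p[c] for p in adjusted) / len(adjusted) for c in train_priors}
        done = max(abs(new_priors[c] - priors[c]) for c in train_priors) < tol
        priors = new_priors
        if done:
            break
    return priors, _reweight(posteriors, priors, train_priors)
```

If the detector’s outputs on the new data lean toward a state more often than in training, the estimated prior for that state rises; if they match the training priors, nothing changes.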

Research Plan

1. Develop software for conducting field observations (cf. Baker et al., 2004) on PDAs and for synchronizing them with DataShop data. Status as of August 2010: software development completed; synchronization verification in progress.
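Synchronization can be sketched as pairing each timestamped field observation with the same student’s log events in a short window ending at the observation. The record fields, the 20-second window, and the single constant offset between the PDA clock and the log clock are all assumptions; the page does not describe how the project’s software performs this step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Observation:
    student: str
    time: datetime  # timestamp from the observer's PDA clock
    code: str       # hypothetical affect or behavior code


@dataclass
class LogEvent:
    student: str
    time: datetime  # timestamp from the tutor's log clock
    action: str


def align(observations, events, window=timedelta(seconds=20), clock_offset=timedelta(0)):
    """Pair each observation with the same student's log events in the window ending at it.

    clock_offset is added to PDA timestamps to map them onto the log clock.
    """
    by_student = {}
    for event in events:
        by_student.setdefault(event.student, []).append(event)
    aligned = []
    for obs in observations:
        end = obs.time + clock_offset
        clip = [e for e in by_student.get(obs.student, []) if end - window <= e.time <= end]
        aligned.append((obs, clip))
    return aligned
```

Each resulting (observation, clip) pair is one labeled training example: the clip supplies log features and the observation supplies the label.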

2. Study and improve quantitative field coding of student affect states

  • The Research Associate and Assistant will conduct multiple coding and discussion sessions with the PI, and develop a detailed coding manual (including some video examples)

3. Collect training data (months 4-7). Status as of August 2010: first data set collected; other data collection in progress.

  • Collect data in one LearnLab at a time, gathering data on all constructs at once and completing the first LearnLab before moving on. The programmer/PI can then start developing detectors for the first LearnLab while the RAs collect data in the second and subsequent LearnLabs.
  • Quantitative field observations (cf. Baker et al., 2004)

4. Develop detectors (months 5-8)

  • Use a combination of existing data mining tools and code previously used by Baker to create Latent Response Model-based detectors of Gaming the System and Off-Task Behavior
  • Develop and leverage behavior-affect temporal dynamics models (cf. D’Mello et al., 2007; Baker, Rodrigo, & Xolocotzin, 2007) to create priors for predicting affect
  • Use log data to predict field observations and student responses to pop-up questions
  • Assess detector goodness with student-level cross-validation
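Student-level cross-validation keeps all of a student’s data in the same fold, so a detector is always tested on students it never saw during training. A minimal sketch, assuming each data row is tagged with a student ID (the function name is hypothetical):

```python
import random


def student_folds(student_ids, k=5, seed=0):
    """Split rows into k folds so that every row for a given student lands in the same fold.

    student_ids[i] is the student who produced row i. Returns a list of
    (train_rows, test_rows) index lists. If k exceeds the number of students,
    some test folds will be empty.
    """
    students = sorted(set(student_ids))
    random.Random(seed).shuffle(students)
    fold_of = {s: i % k for i, s in enumerate(students)}
    folds = []
    for f in range(k):
        test = [i for i, s in enumerate(student_ids) if fold_of[s] == f]
        train = [i for i, s in enumerate(student_ids) if fold_of[s] != f]
        folds.append((train, test))
    return folds
```

scikit-learn’s `GroupKFold` (in `sklearn.model_selection`) provides the same grouping behavior when students are passed as the `groups` argument.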

5. Develop meta-detectors (months 9-12)

  • Use expectation maximization to adapt models to new data sets
  • Leverage the CTLVS1 taxonomy to develop meta-models that relate prediction features to design features
  • Cross-validate at the grain size of transfer between units (or the corresponding grain size within each LearnLab) to validate appropriateness for the whole LearnLab
  • Test model goodness when training on three tutors and transferring to a fourth, to evaluate effectiveness for entirely new tutors
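The train-on-three, test-on-the-fourth evaluation can be sketched by holding out each tutor in turn. The page does not name a goodness metric; A’ is one common choice in this line of work. It is the probability that a detector scores a randomly chosen positive example above a randomly chosen negative one (equivalent to the area under the ROC curve), so 0.5 is chance. Using A’ here, and the `fit`/`predict` callables, are assumptions; a significance test against chance is not shown.

```python
def a_prime(labels, scores):
    """A': probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_tutor_out(data, fit, predict):
    """Train on all tutors but one, test on the held-out tutor, for every tutor.

    data: dict mapping tutor name -> (features, labels).
    fit(features, labels) -> model; predict(model, features) -> scores.
    Returns a dict mapping each held-out tutor to its A'.
    """
    results = {}
    for held_out in data:
        train_x, train_y = [], []
        for tutor, (x, y) in data.items():
            if tutor != held_out:
                train_x.extend(x)
                train_y.extend(y)
        model = fit(train_x, train_y)
        test_x, test_y = data[held_out]
        results[held_out] = a_prime(test_y, predict(model, test_x))
    return results
```

With four tutors, this yields four transfer estimates, one per tutor treated as “entirely new”.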

Independent Variables

n/a (see Research Plan)

Dependent Variables

n/a (see Research Plan)

Affective States and M&M Behaviors to be Modeled

Affective States:

  • Engaged Concentration (a subset of Flow) (cf. Baker et al., 2010)
  • Boredom (Kapoor, Burleson, & Picard, 2007)
  • Frustration (Kapoor, Burleson, & Picard, 2007)

M&M Behaviors:

Planned Studies

In 2010, data will be collected in the Algebra, Geometry, Chemistry, MathTutor, and Science ASSISTments learning environments.

Explanation

Further Information

Connections

Annotated Bibliography

References

Aleven, V., McLaren, B., Roll, I., & Koedinger, K. (2006). Toward meta-cognitive tutoring: A model of help seeking with a Cognitive Tutor. International Journal of Artificial Intelligence in Education, 16, 101-128.

Baker, R.S.J.d. (2007). Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems. Proceedings of ACM CHI 2007: Computer-Human Interaction, 1059-1068.

Baker, R.S., Corbett, A.T., Koedinger, K.R., & Wagner, A.Z. (2004). Off-Task Behavior in the Cognitive Tutor Classroom: When Students "Game The System". Proceedings of ACM CHI 2004: Computer-Human Interaction, 383-390.

Baker, R.S.J.d., Rodrigo, M.M.T., & Xolocotzin, U.E. (2007). The Dynamics of Affective Transitions in Simulation Problem-Solving Environments. Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction.

D'Mello, S.K., Picard, R.W., & Graesser, A.C. (2007). Towards an Affect-Sensitive AutoTutor. IEEE Intelligent Systems, 22(4), 53-61 (special issue on Intelligent Educational Systems).

Kapoor, A., Burleson, W., & Picard, R.W. (2007). Automatic prediction of frustration. International Journal of Human-Computer Studies, 65, 724-736.

Shih, B., Koedinger, K., & Scheines, R. (2008). A Response Time Model for Bottom-Out Hints as Worked Examples. Proceedings of the 1st International Conference on Educational Data Mining, 117-126.

Future Plans