Master’s Thesis

Nel: A Spoken Language Physics Tutor

My thesis consisted of designing and coding an Intelligent Tutoring System (ITS) that included a conversational speech interface and an animated agent (“Nel”) to better emulate human interaction: Nel spoke the problems to the students, and they in turn spoke their answers. I designed Nel in close collaboration with Dr. Nels Madsen of Auburn University’s Physics Department, who was the inspiration and catalyst for the project. Ultimately, the ITS aimed to give students another way to get help with homework assignments when the professor or tutors were not available.

Database Design

Dr. Madsen explained that problems are often solved by conquering a series of smaller steps. Thus, in the database (and throughout the interface), each problem was broken down into individual steps, and each step had a question, an answer, and an optional hint.

Diagram of Nel's Organization

High-level diagram of the database design.
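
As a rough illustration of the organization described above, the problem/step relationship could be modeled as follows. This is a minimal Python sketch, not the thesis's actual schema: the class names, field names, and the sample physics problem are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One step of a solution: a question, its answer, and an optional hint."""
    question: str
    answer: str
    hint: Optional[str] = None  # hints are optional, as in the database design


@dataclass
class Problem:
    """A problem statement plus the ordered steps that lead to its solution."""
    statement: str
    steps: List[Step] = field(default_factory=list)


# Hypothetical sample data, for illustration only.
problem = Problem(
    statement="A 2 kg block is pushed with a net force of 10 N. "
              "What is its acceleration?",
    steps=[
        Step(
            question="Which law relates force, mass, and acceleration?",
            answer="Newton's second law",
            hint="It is usually written F = ma.",
        ),
        Step(
            question="What is the acceleration in meters per second squared?",
            answer="5",
        ),
    ],
)
```

Keeping the steps as an ordered list on the problem mirrors the idea that a student works through them one at a time, in sequence.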

Data Entry Form

The system was built to be subject independent; physics was simply Dr. Madsen’s primary area and, thus, the initial focus. The system included a generic Web interface for the instructor to enter the problems and accompanying steps, following the database’s organization.

Problem Statement page

The textbook’s problem statement wording and any figures from the problem are entered in this first page.

Step Information including initial comments and hints

The solutions are broken into steps by the instructor and each step includes a description and question for the student to answer as well as optional hints for any anticipated points of confusion. The benefit of a Web interface is being able to easily add to, or modify, these steps whenever necessary.

Answers for the step previously entered

When students speak an answer to that step, it is checked not only against the correct response but also against optional anticipated incorrect answers, which the instructor enters along with explanations for further instruction.
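
The answer check described above might look something like the sketch below. It is written in Python for illustration and is not the original implementation; the names `StepAnswers` and `check_answer` are hypothetical, and it assumes the recognized speech is compared to the stored answers as text after simple normalization.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


@dataclass
class StepAnswers:
    """The correct answer plus anticipated wrong answers and their explanations."""
    correct: str
    anticipated_wrong: Dict[str, str] = field(default_factory=dict)


def check_answer(heard: str, answers: StepAnswers) -> Tuple[bool, Optional[str]]:
    """Return (is_correct, explanation).

    The explanation is set only when the response matches an anticipated
    incorrect answer entered by the instructor.
    """
    said = normalize(heard)
    if said == normalize(answers.correct):
        return True, None
    for wrong, explanation in answers.anticipated_wrong.items():
        if said == normalize(wrong):
            return False, explanation
    return False, None  # wrong, but not an answer the instructor anticipated
```

An unanticipated wrong answer returns no explanation, which leaves the caller free to fall back to a generic "try again" prompt or the step's hint.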

Tutoring System User Interface

The system components included the following:

  • Haptek Player Agent – 3-D character created by Haptek, Inc.
  • AT&T Natural Voices – natural-sounding text-to-speech
  • Speech Application Language Tags – captured spoken input for Web interpretation
  • XML Grammars – format for the “speech grammars” (the application’s vocabulary)
  • Web Languages – HTML, PHP, and JavaScript to bring it all together

Because each problem is organized into sub-steps, the system can be viewed as two components: finding and retrieving the problem, and traversing the steps.

Problem Retrieval

The words entered in the Problem Statement of the data entry interface are used to create the main speech grammar for recognizing what the student says. The words from each statement form a “bag of words,” and a “closest match” strategy (comparing what the system heard against the statements in the database) is used to recognize and retrieve the problem the student wants to work on.
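
The “bag of words” and “closest match” idea can be sketched as below. The source does not say how closeness was scored, so this Python example assumes the simplest option, the number of words shared between what was heard and each stored statement. The function names and sample statements are hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional


def bag_of_words(text: str) -> Counter:
    """Count each lowercase word in the text, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def overlap(heard: Counter, statement: Counter) -> int:
    """Number of words the two bags share, counting repeats."""
    return sum((heard & statement).values())


def closest_match(heard_text: str, statements: List[str]) -> Optional[int]:
    """Return the index of the best-matching statement, or None if no words match."""
    heard = bag_of_words(heard_text)
    best_index, best_score = None, 0
    for index, statement in enumerate(statements):
        score = overlap(heard, bag_of_words(statement))
        if score > best_score:  # ties keep the earlier statement
            best_index, best_score = index, score
    return best_index


# Hypothetical problem statements, for illustration only.
statements = [
    "A block slides down a frictionless incline",
    "A ball is thrown straight up from the ground",
]
```

Because word order is ignored, a student does not need to recite the statement exactly; a few distinctive words such as “ball thrown up” are enough to pick out the second statement here.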

Home Page

The home page contains 2 frames – one for introductory content (left pane) and one that contains the animated agent (right). Students press the button under the agent and speak the problem statement they want to work on (and have the system find).

Problem Retrieval

Once the closest match is found, the system displays the problem (and its figure, if applicable) and asks the student to confirm the problem or try again. Notice that the agent turns towards the problem to further emphasize the instructions and emulate human behavior.

Step Traversal

Once the student confirms the problem, they simply repeat the same interaction of answering the question from each step (as entered by the instructor). This process continues until they have reached the final step/answer for the problem.

Problem Retrieval

The system now contains 3 frames: the problem and agent frames, which stay consistent throughout, and a third frame showing each step’s description and question. The animated agent reads each step aloud as it appears. In the agent frame, students can push to speak an answer or ask for a hint (if available).
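
The traversal described in this section might be sketched as a simple loop. This Python example is illustrative only: `run_steps`, `speak`, and `listen` are hypothetical names standing in for the agent's text-to-speech and speech capture, and a hint request is modeled here as the input string "hint" rather than a button press.

```python
from typing import Callable, List, Optional, Tuple

# (question, answer, optional hint) for each step
Step = Tuple[str, str, Optional[str]]


def run_steps(
    steps: List[Step],
    speak: Callable[[str], None],
    listen: Callable[[], str],
) -> int:
    """Walk through the steps in order and return the number of answer attempts."""
    attempts = 0
    for question, answer, hint in steps:
        speak(question)  # the agent reads the step aloud
        while True:
            heard = listen().strip().lower()
            if heard == "hint":  # assumption: stands in for the hint button
                speak(hint if hint else "No hint is available for this step.")
                continue
            attempts += 1
            if heard == answer.lower():
                break  # correct: move on to the next step
            speak("That is not correct. Please try again.")
    speak("You have reached the final answer.")
    return attempts
```

Passing `speak` and `listen` in as functions keeps the loop independent of any particular speech engine, so it can be exercised with scripted input.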