An Introduction to Natural Language Generation


Course description

To pass the venerable Turing Test of intelligence, computers must, among other things, be able to communicate information in a natural language. That is, given some input semantic representation (such as run(peter)), a computer should be able to generate a correct sentence in any human language: "Peter runs" in English, or "Pedro corre" in Spanish. The subfield of Artificial Intelligence that deals with these issues is called Natural Language Generation (or NLG, for short). The results of NLG research are used in several applications, such as adding interactivity to Non-Player Characters (NPCs) in massively multiplayer online role-playing games (MMORPGs), automatic translation, dialogue and tutorial systems, and the generation of online summaries of massive numerical databases, to name but a few.
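As a rough illustration of the mapping from a semantic representation to a sentence, the following minimal Python sketch turns run(peter) into an English or Spanish string. The lexicon, the function names (parse_predicate, realize) and the fixed subject-verb word order are all assumptions made for this example; real NLG systems rely on a grammar rather than a lookup table.

```python
# Toy lexicon (hypothetical): surface forms for each predicate and argument.
LEXICON = {
    "en": {"run": "runs", "peter": "Peter"},
    "es": {"run": "corre", "peter": "Pedro"},
}


def parse_predicate(expr):
    """Split a one-argument predicate such as 'run(peter)' into its parts."""
    head, _, rest = expr.partition("(")
    return head.strip(), rest.rstrip(")").strip()


def realize(expr, lang):
    """Produce a subject-verb sentence for a one-argument predicate.

    Assumes third person singular, present tense, and subject-verb order,
    which happens to hold for both languages in this toy case.
    """
    predicate, argument = parse_predicate(expr)
    words = LEXICON[lang]
    return f"{words[argument]} {words[predicate]}"


print(realize("run(peter)", "en"))
print(realize("run(peter)", "es"))
```

Everything that makes generation hard (agreement, tense, word order, lexical choice) is hard-coded here, which is exactly what the formalisms covered in this course are meant to handle in a principled way.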

This course is an advanced introduction to the problems, methods and techniques of Natural Language Generation. We will be using, testing, re-coding and, when possible, improving on a current well-known general purpose NLG system. Some of the topics to be covered include feature structures (or attribute-value matrices), unification, feature structure typing, and grammatical formalisms like functional unification grammars and head-driven phrase structure grammars.
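To give a first feel for feature structures and unification, here is a minimal Python sketch that models a feature structure as a nested dictionary and unifies two of them. This is a simplification under stated assumptions: atomic values are plain strings, there is no re-entrancy (structure sharing) and no typing, and the function name unify is chosen for this example only. It is not the algorithm used by any particular NLG system.

```python
def unify(fs1, fs2):
    """Return the unification of two feature structures, or None on a clash.

    Feature structures are nested dicts; atomic values are assumed to be
    strings, so None can safely serve as the failure signal.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)  # copy so that the inputs are left unchanged
        for feature, value in fs2.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # incompatible values under this feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Two atoms (or an atom and a structure) unify only if they are equal.
    return fs1 if fs1 == fs2 else None


noun_phrase = {"cat": "np", "agr": {"num": "sg"}}
third_person = {"agr": {"per": "3"}}
plural = {"agr": {"num": "pl"}}

print(unify(noun_phrase, third_person))  # compatible: information is merged
print(unify(noun_phrase, plural))        # clash on agr.num, so None
```

The first call succeeds because the two structures carry compatible information; the second fails because the values sg and pl conflict under the same path.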

General objectives

At the end of this course, the student should be able to:

  1. read, understand and evaluate data structures and algorithms for the efficient automatic generation of natural language;
  2. develop problem-solving skills through the analysis, evaluation, improvement and creation of algorithms for NLG;
  3. develop communicative skills through reading and writing scholarly papers and through oral presentation of results.

Prerequisites

Algorithms and data structures; formal languages and compiler theory. Some prior exposure to Natural Language Processing would be an asset, but it is not strictly necessary. Intermediate programming ability is expected.

Evaluation

  1. Presentation (50% of the final mark).
  2. Final exam (20% of the final mark).
  3. Final report (30% of the final mark).

Specific contents

  1. Week 1: Introduction.
  2. Weeks 2-3: FUF and SURGE.
  3. Weeks 4-6: Feature structures, Re-entrancy.
  4. Weeks 7-10: Traversal, Subsumption.
  5. Weeks 11-13: Unification.
  6. Week 14: Integration: generating Spanish.
  7. Week 16: Conclusions and wrap-up.
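As a preview of the subsumption topic listed above, the following Python sketch checks whether one feature structure is at least as general as another. It assumes feature structures are nested dictionaries with string atoms and no re-entrancy; the function name subsumes is chosen for this example only.

```python
def subsumes(general, specific):
    """Return True if every piece of information in `general` is also
    present in `specific` (so `general` is the less informative one)."""
    if isinstance(general, dict):
        if not isinstance(specific, dict):
            return False
        return all(
            feature in specific and subsumes(value, specific[feature])
            for feature, value in general.items()
        )
    # Atomic values subsume only themselves.
    return general == specific


np_only = {"cat": "np"}
np_singular = {"cat": "np", "agr": {"num": "sg"}}

print(subsumes(np_only, np_singular))
print(subsumes(np_singular, np_only))
```

Under these assumptions the empty structure subsumes every feature structure, since it carries no information at all.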

Resources

  1. To create feature structures in LaTeX, see here.

Important notes

  1. This course will involve programming. All algorithms will be developed in Python. The choice of language is not arbitrary: it is an attempt to contribute to the ongoing efforts of the Natural Language Toolkit group.
  2. This course will involve a high dose of real, cutting-edge computer science research. It is in the nature of scientific work that we don't always know where we are going. Be prepared to spin your wheels at times. Those who need a highly structured course should probably refrain from taking this one.
  3. Classes will be taught in Spanish.