An Introduction to Natural Language Generation
Home Versión español PDF Version
Course description
To pass the venerable Turing Test of Intelligence, computers must be
able to, in part, communicate information in a natural language. That
is, given some input semantic representation (such as run(peter)), a
computer should be able to generate a correct sentence in any human
language: "Peter runs" in English, or "Peter [Pedro?] corre", in
Spanish. The subfield of Artificial Intelligence that deals with these
issues is called Natural Language Generation (or NLG, for short). The
results of NLG research are used in several applications such as
embedding interactivity in Non-Player Characters (NPCs) of Massively
multiplayer online role-playing games (MMORPGs), in automatic
translation, dialogue and tutorial systems, and for the generation of
online summarizations of massive numerical databases, to name but a
few.
This course is an advanced introduction to the problems, methods and
techniques of Natural Language Generation. We will be using, testing,
re-coding and, when possible, improving on a current well-known
general purpose NLG system. Some of the topics to be covered include
feature structures (or attribute-value matrices), unification, feature
structure typing, and grammatical formalisms like functional
unification grammars and head-driven phrase structure grammars.
General objectives
At the end of this course, the student should be able to:
- read, understand and evaluate data structures and algorithms
for the efficient automatic generation of natural language;
- develop problem-solving skills through the analysis,
evaluation, improvement and creation of algorithms for NLG;
- develop communicative skills through reading and writing
scholarly papers and through oral presentation of results.
Prerequisites
Algorithms and data structures, Formal languages and compiler
theory. Some introduction to Natural Language Processing would be an
asset, but it is not strictly necessary. Medium-level programming
abilities.
Evaluation
- Presentation (50% of the final mark):
- This involves a fairly in-depth research on one of the topics of
the course (see Specific contents for a list), the coding of
the n-best structures or algorithms, and the oral presentation
of the results to the class.
- Failing this portion of the evaluation means failing the course.
- Final exam (20% of the final mark):
- A comprehensive final exam at the regularly scheduled exam
period.
- Final report (30% of the final mark):
- This should be a well-organized paper on the topic of your
group's presentation,
- it should include an introduction, the bibliographic review, the
code, an explanation of the code and its documentation,
- it must be written in English, and
- it must be written in LaTeX.
Specific contents
- Week 1: Introduction.
- [intro] K. van der Linden. (2000). Natural language generation. In Speech
and Language Processing: An Introduction to Natural Language
Processing, Computational Linguistics, and Speech Recognition,
Jurafsky, D. and Martin, J., Prentice Hall, First
edition, chapter 19, pages 763-798.
- [theory] E. Reiter and
R. Dale. Building applied natural language generation systems. Natural
Language Engineering, 3: 57-87, 1997.
- Weeks 2-3: FUF and SURGE.
- [intro] Kay, M.
Functional Unification Grammar: a formalism for machine translation. In
Proceedings of the 22nd Annual Meeting on Association For
Computational Linguistics (Stanford, California, July 02 - 06,
1984). Annual Meeting of the ACL. Association for Computational
Linguistics, Morristown, NJ, 75-78, 1984.
- [theory] M. Elhadad and
J. Robin. Controlling Content Realization with Functional Unification Grammars. In
Aspects of Automated Natural Language Generation; R. Dale,
E. Hovy, D. Rosner and O. Stock editors, 89-104, Springer Verlag, 1992.
- [practice] FUF and SURGE pages
- [practice] M. Elhadad. FUF Manual. Technical report. Ben Gurion University
of the Nagev.
- [practice]
M. Elhadad. An Overview of SURGE: A reusable comprehensive syntactic realization component. Technical
report. Ben Gurion University of the Nagev.
- [practice] M. Elhadad &
J. Robin. SURGE: a Comprehensive Plug-in Syntactic Realization Component for Text Generation. Technical
Report 96-03, Dept of Mathematics and Computer Science, Ben
Gurion University, Beer Sheva, Israel.
- Weeks 4-6: Feature structures, Re-entrancy.
- Weeks 7-10: Traversal, Subsumption
- [intro] M. Elhadad. Types in functional unification grammar. In
Proceedings of the 28th meeting of the ACL, Pittsburgh, PA, 1990.
- [intro] A. Copestake. Type constraints and inheritance. Chapter 3, Typed
feature structures made simple. In A. Copestake, Implementing
feature structure grammars. (no e-copy. Copy will be on my office
door for xeroxing), 2000.
- [theory]
A. Copestake. Appendix: definitions of typed feature structures. Natural
Language Engineering (appendix to special issue on efficient
processing with HPSG) 6:1, 2000.
- [theory] S. Wintner and
A. Sarkar. A note on typing feature structures. Computational
Linguistics, 28(3):389-397, 2002.
- [practice] R. Malouf, J. Carroll and
A. Copestake. Efficient feature structure operations without compilation. Natural
Language Engineering 6:1, 2000.
- [practice] Relevant portions of the FUF, CFUF and NLTK code.
- Weeks 11-13: Unification.
- [intro] K. Knight. Unification: A Multidisciplinary Survey. ACM Computing
Surveys, 21(1), 1989.
- [intro] A. Copestake. Unification. Chapter 3,
Typed feature structures made simple. In A. Copestake,
Implementing feature structure grammars, 2000. (no e-copy. Copy
will be on my office door for xeroxing).
- [theory] G. Escalada-Imaz and
M. Ghallab. A practically efficient and almost linear unification algorithm. Artif. Intell. 36,
2 (Sep. 1988), 249-263, 1988.
- [theory] Martelli, A. and Montanari,
U. 1982. An Efficient Unification Algorithm. ACM Transactions on
Programming Languages and Systems 4, 2 (Apr. 1982), 258-282.
- [practice] Relevant portions of the FUF, CFUF and NLTK code.
- Week 14: Integration: generating Spanish.
- This is a practical session on how to amalgamate all the code
into a single operational program. We will test this program by
generating sample Spanish sentences.
- Week 16: Conclusions and wrap-up.
Resources
- To create feature structures in LaTeX, see here.
Important notes
- This course will involve programming. All algorithms will be
developed in Python. The choice of language is not arbitrary, but
in the attempt to contribute to the ongoing efforts of the
Natural Language Toolkit group.
- This course will involve a high dose of real, cutting-edge
computer science research. It is in the nature of scientific work
that we don't always know where we are going. Be prepared to spin
wheels sometimes. Those who need high levels of course structuring
should probably refrain from taking this course.
- Classes will be taught in Spanish.