Declarative Programming for Natural Language Generation

Contents
Overview
Algorithms
Specifications
Publications
Software
Links

Overview
Natural language generation (NLG) research aims at constructing systems that output speech or text to describe the content of internal computer states and representations.

The task brings two kinds of obstacles.

Specifications. NLG systems require large and complex resources that describe how to use language. The research challenge is to construct such resources quickly, correctly and generally.
Algorithms. Once you develop suitable communicative resources, the research challenge is to use them soundly and efficiently. Formulating the needed procedures can be difficult and involved.

I began by studing algorithmic aspects of NLG. This research culminated in the SPUD system described in the next subsection.

Because SPUD constructs sentences by reasoning from a declarative grammar, SPUD brings distinctive opportunities for exploring specifications in NLG. I explore these questions, which occupy me presently, in a later subsection.

Algorithms for NLG
NLG is typically broken down into stages of discourse planning (to select information and organize it into coherent paragraphs), sentence planning (to choose words and structures to fit information into sentence-sized units), and realization (to determine surface form of output, including word order, morphology and final formatting or intonation). The SPUD system combines the generation steps of sentence planning and surface realization by using a lexicalized grammar to construct the syntax and semantics of a sentence simultaneously.

The basic input to SPUD consists of a set of communicative goals to achieve. SPUD seeks to satisfy three kinds of constraint in mapping these goals onto a sentence that accomplishes them. Any fact to be communicated must be contributed directly or by inference from an element in an abstract grammatical structure, including lexical items. Any reference to a domain entity must be elaborated into a description that distinguishes the entities from its distractors (the salient alternatives to it in context). And all this conceptual material must be organized into a grammatical surface form.

SPUD builds its sentence step-by-step. Each step adds to the ongoing sentence a lexicalized entry encoded as an elementary tree in Lexicalized Tree-Adjoining Grammar (LTAG). By using these LTAG structures, SPUD always works with concrete grammatical surface forms. Each tree is paired with logical formulae that, by referring to a rich discourse model, characterize the semantic and pragmatic contribution that the tree makes to the sentence. By keeping track of these contributions as it builds the sentence, SPUD is able to assess the shared information that the sentence uses to identify discourse entities and the new information that the sentence adds about them.

The advantages of this scheme include these:

Syntactic constraints are handled early and naturally, ensuring the expressibility of selected concepts.
The order of adding content is flexible, allowing varied sentences without complicated search.
Grammatical knowledge is stated once only, inside the generator.
It enables efficient descriptions that refer implicitly to parameters from the context.
It enables efficient descriptions that exploit the hearer's ability to recognize inferential links to material elsewhere in the sentence.

These advantages make the SPUD algorithm a natural one to use to formulate natural and concise output for dialogue interfaces to independent applications.

Specifications for NLG
In NLG, the ideal way to specify linguistic resources is by example. In most NLG applications, you can readily find or construct gold-standard texts that you would like your system to generate. These texts themselves should determine NLG resources with minimal further effort. The research problem is to articulate methodologies for preparing texts, and procedures for automatic text analysis, that suffice to bootstrap a generator from data.

Now, clearly we need to annotate the syntactic structure behind the text - a tree or some similar formal description. We also need to know something about the speaker's intentions in producing the text - for example, the domain elements that the speaker intended to refer to. But what else?

We explored this problem manually, by building a SPUD grammar for a small set of sentences. We were surprised to discover a methodology that allows specifications to be developed systematically, by applying simple heuristics connected with clear corpus evidence. We have since been working to automate this methodology. For example, a slightly simpler syntactic formalism allows us to go directly from a treebank to syntactic grammar entries. Meanwhile we hypothesize that we will then be able to go automatically from an annotated utterance to a schema for the context the utterance presupposes and the communicative goals that the utterance addresses. These two steps amount to an algorithm that compiles a potential output sentence into the abstract input and the grammatical resources that would be required for SPUD to create the sentence.

Publications

Stone and Doran 97	Matthew Stone and Christine Doran. Sentence Planning as Description Using Tree-Adjoining Grammar. Proceedings of ACL 1997, pages 198--205. This paper outlines SPUD from first principles, with an emphasis on SPUD's linguistic representations and SPUD's choice of appropriate syntactic and referential forms in discourse context.
Stone and Doran 96	Matthew Stone and Christine Doran. Paying Heed to Collocations. Proceedings of INLG 1996, pages 91--100. This is a survey of computational semantics in SPUD. By combining lexical semantics that refers to a wide variety of entities, a discourse model that assigns those entities differing levels of salience, and a generation process that models reference to those entities, SPUD can generate many kinds of collocations incrementally and compositionally.
Stone and Webber 98	Matthew Stone and Bonnie Webber. Textual Economy through Close Coupling of Syntax and Semantics. Proceedings of INLG 1998, pages 178--187. This paper investigates efficient descriptions of objects that exploit inferential links to information elsewhere in a sentence. Such descriptions are possible only if, like SPUD, a generator can consider syntax and semantics simultaneously, and can assess quickly and reliably how the hearer will interpret the current (possibly incomplete) sentence.
Stone 00	Matthew Stone. On Identifying Sets. INLG 2000, pages 116-123. This paper aims at developing a computational account of the interpretation of plural noun phrases which would allow SPUD to generate descriptions of sets of entities. The challenge is to avoid listing all sets of salient entities in representing the interpretation, and to avoid searching over sets of salient entities in constructing interpretations using the grammar. A preliminary version appeared in the 1999 ESSLLI workshop on the generation of nominal expressions.
Stone et al 00	Matthew Stone, Tonia Bleam, Christine Doran and Martha Palmer. Lexicalized Grammar and the Description of Motion Events. Fifth Workshop on Tree-Adjoining Grammar and Related Formalisms (TAG+) 2000. This paper works out a methodology for designing grammatical resources for verbs that allow SPUD to match uses of verbs in a corpus. In particular, we show how SPUD allows the syntax, the syntax-semantics interface, and the semantics of entries to be formulated separately, relying in each case on specific distributional evidence.
Cassell et al 00	Justine Cassell, Matthew Stone and Hao Yan. Coordination and context-dependence in the generation of embodied conversation. This paper applies the SPUD system and the methodology for constructing SPUD specifications to the problem of generating communicative behaviors, including speech, intonation and gesture, for an embodied conversational agent.
Stone et al 01	Matthew Stone, Christine Doran, Bonnie Webber, Tonia Bleam and Martha Palmer. Microplanning with communicative intentions: the SPUD system. This paper provides a systematic description of SPUD, including motivations, algorithms, examples and methodology for specification. Distributed on arxiv.org, and as RuCCS TR 65, 2001. Comments welcome and appreciated.
Stone 02	Matthew Stone. Lexicalized Grammar 101. ACL Workshop on Tools and Methodologies for Teaching Natural Language Processing, 2002, pages 76-83. This paper advocates a lightweight version of TAG for teaching and exploratory research, which allows the straightforward lexicalization of treebank parses and a natural implementation of the SPUD algorithm.
Stone 03	Matthew Stone. Specifying Generation of Referring Expressions by Example. AAAI Spring Symposium on Natural Language Generation in Spoken and Written Dialogue, 2003, pages 133--140. This paper starts to look at how to construct knowledge bases automatically from annotated data, which allow SPUD to generate specified sentences.

Software

SPUD lite

A new lightweight implementation of SPUD in Prolog, from June 2002. Includes a parser/interpreter for the same formalism and model of interpretation, and code to infer context and communicative goals from examples.

SPUD v0.01

The old, clunky SPUD
This version is provided to evaluate SPUD for research purposes only (Feb 2, 1999).

SPUD is written in SML of New Jersey, a free fast functional programming language.

Examples

SPUD data files
Many people find that the best way to understand SPUD concretely is to take a concrete look at how you write a grammar for SPUD and how you specify to SPUD the knowledge base and communicative goals underlying particular sentences and what output it produce in response.

Links

SPUD Coauthors	Tonia Bleam Justine Cassell Christy Doran Martha Palmer Paul Tepper Bonnie Webber
NLG Resources	ACL SIGGEN John Bateman and Michael Zock's list of NLG systems
Related NLP	InDiGen The XTAG English TAG Grammar Comp Ling Tools at Penn

August 1, 2002