(funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 751250.)
What is ThReDS about?
One of the most fundamental human faculties is reference: the capacity to use signs to identify things in the world – from familiar concrete objects to complex abstract thoughts – and bring them to the mind of another. This extraordinary ability is at the core of many forms of human exchange, from asking for the salt at the dinner table to collaboratively building a solar probe. Still, the process is poorly understood: we do not know how humans build a shared representation of their environment and use it to refer successfully. The goal of this proposal is to put forward a new linguistic model of the process by which people understand and produce references, and to test this model empirically through a computer simulation. This includes modelling the acquisition of the shared knowledge representations that speakers use to generate appropriate identifying descriptions in a variety of linguistic ways.
Let’s imagine that I am telling someone about my weekend. I explain that I visited a place called ‘the Killerton Estate’ and was particularly struck by the magnificent gardens. I utter the sentence ‘The oaks in the park were planted in the 15th Century!’ Later on, my interlocutor has forgotten the name of the estate and asks me the following: ‘What was the name of the place with the old trees?’ This simple exchange involves several fundamental semantic phenomena. First, the hearer has processed my description of Killerton and built a representation of the place which integrates information about the trees in the park (demonstrating knowledge acquisition). Second, when struggling to recall its name, she decides that the oaks are a particularly good identifier for the place (showing the ability to choose discriminative features for an entity). Finally, she does not refer to them in the way I did, using the words ‘oak’ and ‘15th Century’. Instead, she uses the description ‘the old trees’, having inferred that oaks are trees and that a tree planted six centuries ago is old (using inference over background knowledge). I suggest that a full model of reference should provide a joint formalisation of all three phenomena. To date, no such model exists.
In computational linguistics, the linguistic realisation of the act of reference is studied in the field of Referring Expression Generation (REG).1 REG is the task of automatically producing a fragment of language (a ‘referring expression’) that uniquely identifies an object in a domain (e.g. ‘the old trees’). Typically, the domain is modelled via a simplified representation, in the form of a set of attributes for each entity in that domain. The algorithm then selects the attributes that best identify the object under consideration and uses them to construct a natural-sounding referring expression. One limitation of REG is that it relies entirely on having a prior representation of the domain, associated with a fixed vocabulary. This fails to account for the way speakers acquire such prior knowledge. Further, the domains explored in REG contain individual instances only, rather than the actual mix of individuals, pluralities and concepts found in natural language.
In theoretical linguistics, model-theoretic semantics (MTS) gives an account of reference which assumes a correspondence relation between linguistic entities (words, phrases, sentences) and the actual world: e.g. the word ‘tree’ corresponds to the set of individual trees in the world.2 The world itself is represented as a ‘model’, i.e. a precise description of the entities and events in that world, as well as their properties and relations. MTS is truth-conditional, in that it provides tools to check the truth of a sentence given the model. This allows for both lexical and logical inferences to be performed. MTS has also shown interest in accounting for the more dynamic and cognitive side of models (e.g. update processes).3 But there is still no approach that would explain model acquisition on a large scale, from the type of unrestricted input that humans are exposed to.
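The attribute-selection step of REG described above can be illustrated with a small sketch. This is only a simplified, greedy, preference-ordered selection over a toy domain, not the method proposed in ThReDS; the entity names, the attribute names and the preference order are all hypothetical, and turning the selected attributes into a natural-sounding phrase is left out.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy domain: each entity is described by a fixed set of attributes.
DOMAIN: Dict[str, Set[str]] = {
    "killerton": {"estate", "has_garden", "has_old_trees"},
    "hampton": {"estate", "has_garden", "has_maze"},
    "kew": {"has_garden", "has_greenhouse"},
}

# Assumed order in which attributes are tried (an illustrative choice).
PREFERENCE: List[str] = [
    "estate", "has_garden", "has_old_trees", "has_maze", "has_greenhouse",
]


def select_attributes(
    target: str, domain: Dict[str, Set[str]], preference: List[str]
) -> Optional[List[str]]:
    """Pick attributes of `target` until no other entity matches them all.

    Returns None when the target cannot be distinguished from every
    other entity with the attributes available.
    """
    distractors = {entity for entity in domain if entity != target}
    chosen: List[str] = []
    if not distractors:
        return chosen
    for attribute in preference:
        if attribute not in domain[target]:
            continue
        # Entities that lack this attribute are ruled out by mentioning it.
        ruled_out = {e for e in distractors if attribute not in domain[e]}
        if ruled_out:
            chosen.append(attribute)
            distractors -= ruled_out
        if not distractors:
            return chosen
    return None


if __name__ == "__main__":
    print(select_attributes("killerton", DOMAIN, PREFERENCE))
    # -> ['estate', 'has_old_trees']
```

Note that the sketch presupposes exactly what the text identifies as a limitation: the domain representation and its vocabulary are given in advance, and every entity is an individual.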
ThReDS develops the hypothesis that a full simulation of the reference mechanism can be built by combining the logical aspects of MTS, the algorithms of REG, and a representation of meaning which has had little to say about reference so far: distributional semantics (DS). DS defines meaning through usage.4 The meaning of the word tree, for example, is the set of contexts (linguistic or otherwise) associated with tree – its so-called distribution. Each context is weighted so as to show how characteristic it is for the word under consideration (leaf is more characteristic for tree than glass). The resulting representation is a point in a vector space, where the space’s dimensions are contexts. In current experimental settings, distributions are typically obtained from large text corpora, sometimes enriched with perceptual input (images, sounds), suggesting a data-driven account of how word meanings are acquired. DS has proved to be very powerful in modelling a variety of lexical semantics phenomena, from word similarity5 (tree and hedge are more similar than tree and castle) to hyponymy6 (oaks are a subset of trees) and even some aspects of composition7 (e.g. a model of how old and tree combine to form the meaning of old tree). It also provides a cognitively plausible account of how concepts are acquired from raw data: we know that, at least at the coarse-grained level, DS representations can model brain activation.8 It has, however, failed to account for logical phenomena that MTS naturally models, such as quantification. It has also focused only on modelling generic, conceptual information. It is therefore unclear how DS should be transformed to represent the specific attributes of individual entities and sets of entities.
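To make the notions of weighted contexts and vector similarity concrete, here is a minimal sketch. It assumes positive pointwise mutual information (PPMI) as the weighting scheme and cosine as the similarity measure; both are common choices in DS but are assumptions here, since the text does not commit to either. The co-occurrence counts are invented for illustration.

```python
import math
from typing import Dict

Vector = Dict[str, float]

# Hypothetical raw co-occurrence counts: word -> context -> count.
COUNTS: Dict[str, Dict[str, int]] = {
    "tree": {"leaf": 8, "green": 6, "stone": 1, "wall": 0},
    "hedge": {"leaf": 5, "green": 7, "stone": 0, "wall": 1},
    "castle": {"leaf": 0, "green": 1, "stone": 9, "wall": 8},
}


def ppmi_vectors(counts: Dict[str, Dict[str, int]]) -> Dict[str, Vector]:
    """Weight each context by positive pointwise mutual information."""
    total = sum(n for ctx in counts.values() for n in ctx.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    context_totals: Dict[str, int] = {}
    for ctx in counts.values():
        for c, n in ctx.items():
            context_totals[c] = context_totals.get(c, 0) + n

    vectors: Dict[str, Vector] = {}
    for w, ctx in counts.items():
        vec: Vector = {}
        for c, n in ctx.items():
            if n == 0:
                continue  # unseen pairs get weight 0 (left out of the sparse vector)
            pmi = math.log2(n * total / (word_totals[w] * context_totals[c]))
            if pmi > 0:  # keep only contexts that are characteristic of w
                vec[c] = pmi
        vectors[w] = vec
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    vectors = ppmi_vectors(COUNTS)
    print("tree ~ hedge :", round(cosine(vectors["tree"], vectors["hedge"]), 3))
    print("tree ~ castle:", round(cosine(vectors["tree"], vectors["castle"]), 3))
```

With these invented counts, leaf receives a positive weight for tree while stone is filtered out, and tree comes out closer to hedge than to castle. The vectors describe word types only, which is the gap the text points to: nothing here represents a particular oak or a particular group of oaks.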
In ThReDS, I propose to build a computational framework able to derive an MTS-like model of the world from language data (using DS), and to use it to produce references understandable by humans, with REG. The approach has three steps. 1) Standard DS methods are extended to provide representations for individuals and sets of things (e.g. from the sentence ‘The oaks in the park were planted in the 15th Century’, we want to derive a vector for ‘the oaks’ which encodes not only information about co-occurring words such as park and plant, but also general knowledge about oaks). 2) From the resulting distributional model, we derive a world representation in the form of a set-theoretic model (e.g. from what was said about the oaks we infer that they are in the set of trees and the set of old things). 3) From the set-theoretic model, we generate new referring expressions to the entities we know of (e.g. ‘the old trees’).
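As a rough illustration of what step 2 might produce, the sketch below maps entity vectors to a set-theoretic model and then evaluates simple expressions against it. The mapping used here, thresholding each property weight, is purely an illustrative assumption: how such models should actually be derived from distributions is an open question of the project. The entity names, property dimensions, weights and threshold are all hypothetical, and adjective-noun combination is simplified to set intersection.

```python
from typing import Dict, Set

# Hypothetical entity vectors: weights in [0, 1] over property dimensions.
ENTITY_VECTORS: Dict[str, Dict[str, float]] = {
    "oak_1": {"tree": 0.95, "old": 0.80, "building": 0.02},
    "oak_2": {"tree": 0.90, "old": 0.75, "building": 0.05},
    "sapling_1": {"tree": 0.85, "old": 0.10, "building": 0.01},
    "manor_1": {"tree": 0.03, "old": 0.70, "building": 0.97},
}


def derive_model(
    vectors: Dict[str, Dict[str, float]], threshold: float = 0.5
) -> Dict[str, Set[str]]:
    """Map each property to the set of entities whose weight reaches the threshold.

    Thresholding is an assumption made for this sketch only.
    """
    model: Dict[str, Set[str]] = {}
    for entity, vector in vectors.items():
        for prop, weight in vector.items():
            if weight >= threshold:
                model.setdefault(prop, set()).add(entity)
    return model


def denotation(model: Dict[str, Set[str]], *props: str) -> Set[str]:
    """Entities satisfying all the given properties (intersective reading)."""
    extensions = [model.get(p, set()) for p in props]
    return set.intersection(*extensions) if extensions else set()


def every(model: Dict[str, Set[str]], restrictor: str, scope: str) -> bool:
    """Truth of 'every RESTRICTOR is SCOPE', evaluated as a subset test."""
    return model.get(restrictor, set()) <= model.get(scope, set())


if __name__ == "__main__":
    model = derive_model(ENTITY_VECTORS)
    print(sorted(denotation(model, "old", "tree")))  # referents of 'the old trees'
    print(every(model, "tree", "old"))               # is every tree old?
```

In this toy model, the description ‘the old trees’ picks out the two oaks and excludes the sapling and the manor, which is the kind of set that a REG procedure in step 3 would need in order to check that a candidate description identifies its target.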
Several questions will be answered: Q1) Can distributions be built for individuals and groups (rather than just for concepts)? Q2) How can we derive set-theoretic models from distributions? Q3) How should such models be represented to allow the use of REG algorithms over them? Q4) What kind of semantic theory is compatible with Q1-3?