Modelling the Evolution of Language

EPSRC Grant GR/N01118

RESULTS OF RESEARCH

The results are described below under the five main research issues addressed during the project. For more details on the individual models and results, please download the publications.

• Grounding transfer in neural networks

• Evolution of syntax

• Co-evolution of language and brain

• Language universals

• Language in evolutionary robots

1 Symbols and Words: Grounding transfer in neural networks

During the research, two new models were developed to address the symbol-grounding problem (Harnad, 1990) and the grounding-transfer capabilities of neural networks. The first model extended the work of Cangelosi et al. (2000) to deal with larger category sets and to examine different strategies for transferring grounding to new categories. The architecture of the network was significantly changed: connections were organised modularly by dividing the hidden units into two groups, one for processing shapes and one for features. The networks were first trained with the backpropagation algorithm to classify retinal images. Training stimuli consisted of four animal shapes (e.g., horses) and four texture features (e.g., stripes). Subsequently, the nets were trained to name these stimuli (entry-level stage). During these two phases, learning occurs through direct trial-and-error experience supervised by corrective feedback (‘sensorimotor toil’); names acquired this way can therefore be considered symbols grounded in retinal input. During the third training phase (higher-level stage), the nets acquired new names (e.g., "zebra") defined solely by symbolic strings that combine previously grounded names (‘symbolic theft’). In the following test phase (symbol transfer test), new retinal images combining previously learned shapes and features were presented to the networks (e.g., images of zebras obtained by combining a horse shape with the stripe feature). Although the nets had never seen these images before, they correctly categorised and named them with both entry-level and higher-level symbols. This result shows that grounding is "transferred" from directly grounded names to higher-order ones. Moreover, the networks gave the correct sensorimotor response when they received the name of a higher-level category as input (inverse grounding transfer).
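The three training stages and the symbol transfer test can be illustrated with a deliberately simplified sketch. It is not the backpropagation network used in the research: it assumes single threshold units trained with the perceptron rule, hypothetical 5-pixel retina regions and made-up category labels (horse, bird, fish, cat; stripes, spots, wings, plain). Shape-naming units see only the shape region and feature-naming units only the feature region, loosely mirroring the modular connectivity of the first model, while the unit for "zebra" is trained only on strings of already grounded names.

```python
import itertools

# Hypothetical 5-pixel retina regions (the real stimuli were larger images).
SHAPE_PIXELS = {
    "horse": (1, 1, 0, 0, 1),
    "bird": (0, 1, 1, 0, 0),
    "fish": (1, 0, 1, 1, 0),
    "cat": (0, 0, 0, 1, 1),
}
FEATURE_PIXELS = {
    "stripes": (1, 0, 1, 0, 1),
    "spots": (0, 1, 0, 1, 0),
    "wings": (1, 1, 1, 0, 0),
    "plain": (0, 0, 1, 1, 1),
}
NAMES = list(SHAPE_PIXELS) + list(FEATURE_PIXELS)


class Perceptron:
    """A single threshold unit trained with the perceptron learning rule."""

    def __init__(self, n_inputs):
        self.w = [0.0] * n_inputs
        self.b = 0.0

    def __call__(self, x):
        return int(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b > 0)

    def train(self, examples, max_epochs=1000):
        """Repeat until every example is classified correctly; returns self."""
        for _ in range(max_epochs):
            errors = 0
            for x, target in examples:
                delta = target - self(x)  # corrective feedback
                if delta:
                    errors += 1
                    self.w = [wi + delta * xi for wi, xi in zip(self.w, x)]
                    self.b += delta
            if errors == 0:
                return self
        raise RuntimeError("perceptron did not converge")


def ground(patterns):
    """'Sensorimotor toil': one naming unit per category, trained on images."""
    return {
        name: Perceptron(5).train(
            [(pixels, int(other == name)) for other, pixels in patterns.items()])
        for name in patterns
    }


shape_units = ground(SHAPE_PIXELS)
feature_units = ground(FEATURE_PIXELS)


def name_image(shape_region, feature_region):
    """Modular naming: each group of units sees only its own retina region."""
    active = {n: u(shape_region) for n, u in shape_units.items()}
    active.update({n: u(feature_region) for n, u in feature_units.items()})
    return [active[n] for n in NAMES]


def name_string(*words):
    """Symbolic input: a string of already grounded names as a binary vector."""
    return [int(n in words) for n in NAMES]


# 'Symbolic theft': "zebra" is learned from name strings only, never from images.
zebra = Perceptron(len(NAMES)).train(
    [(name_string(s, f), int((s, f) == ("horse", "stripes")))
     for s, f in itertools.product(SHAPE_PIXELS, FEATURE_PIXELS)])


def names_zebra(shape, feature):
    """Transfer test: a never-seen combined image is routed through grounded names."""
    return zebra(name_image(SHAPE_PIXELS[shape], FEATURE_PIXELS[feature]))


print(names_zebra("horse", "stripes"), names_zebra("horse", "spots"))  # 1 0
```

Because the zebra unit reads the outputs of the grounded naming units, a horse-with-stripes image it has never seen is named correctly: in this toy, grounding is inherited through the names rather than learned from images of zebras.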

In a second model, the architecture of the network was changed again (Figure 1 Left). Modular connections were kept only between the hidden and output layers, while the input and hidden layers were fully connected. The earlier separation of connections from regions of the input retina to groups of hidden units was implausible and was possible only because of the particular organisation of the stimuli. This new simulation also used a new, expanded stimulus set (Figure 1 Right). Again, the networks were able to transfer grounding to new categories acquired via language.


Figure 1: (Left) Neural network architecture for the second model, with hidden-output modularity. (Right) Sample of images used for categorisation training and testing.
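The architectural difference between the two models can be written as connectivity masks, where a 0 marks a connection that does not exist. This is a sketch under stated assumptions: the layer sizes are hypothetical, and the output units are assumed to be split into shape and feature groups like the hidden units. In training, each weight would be multiplied by its mask entry after every update so that absent connections stay at zero.

```python
def group_mask(receivers, senders, modular):
    """0/1 connectivity mask between two layers.

    receivers, senders: (shape_group_size, feature_group_size) for each layer.
    modular=True keeps only shape->shape and feature->feature connections;
    modular=False connects every sender to every receiver.
    """
    mask = []
    for r in range(sum(receivers)):
        r_is_shape = r < receivers[0]
        row = []
        for s in range(sum(senders)):
            s_is_shape = s < senders[0]
            row.append(1 if not modular or r_is_shape == s_is_shape else 0)
        mask.append(row)
    return mask


def n_connections(mask):
    return sum(map(sum, mask))


# Hypothetical layer sizes: (shape group, feature group).
RETINA, HIDDEN, OUTPUT = (50, 50), (5, 5), (4, 4)

first_model = {
    "input->hidden": group_mask(HIDDEN, RETINA, modular=True),
    "hidden->output": group_mask(OUTPUT, HIDDEN, modular=True),
}
second_model = {
    "input->hidden": group_mask(HIDDEN, RETINA, modular=False),  # fully connected
    "hidden->output": group_mask(OUTPUT, HIDDEN, modular=True),
}

for name, model in [("first", first_model), ("second", second_model)]:
    print(name, {layer: n_connections(m) for layer, m in model.items()})
```

In the second model the split of the retina into regions no longer matters, since every hidden unit receives input from the whole retina; only the hidden-to-output stage keeps separate pathways for shapes and features.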


All these results support a fully connectionist approach to symbol grounding: the same network handles both sensorimotor grounding and the acquisition of new categories through symbolic learning. The modular organisation of the hidden units suggests that it is important to keep sensorimotor grounding separate for different classification features; when a fully distributed network was used, grounding transfer was difficult to achieve in scaled-up models. We are currently working on various extensions of this model. For example, to improve the psychological plausibility and scalability of connectionist approaches to symbol grounding, new algorithms such as Kohonen’s self-organizing maps and Hebbian learning are being used for the basic categorization stage. These extensions will be part of Thomas Riga’s PhD research programme in Plymouth.

2 Evolution of syntax

Two main simulations were conducted to study the evolution of syntax in grounded multi-agent systems. Both build on the PI’s original model (Cangelosi 1999; 2001), which showed the emergence of early syntactic categories resembling the word classes of verbs and nouns. The first study directly extended the 1999 model (mushroom foraging scenario) and examined the interaction between language and learning in the form of the Baldwin effect. In the second study, a completely new model was developed to analyse the evolutionary acquisition of verbs and nouns in object manipulation tasks.

The Baldwin effect has been explicitly invoked to explain the origins of language and the evolution of a Language Acquisition Device. New simulations on the evolution of compositional languages in foraging agents (Cangelosi 2001) specifically addressed the role of cultural variation and of learning costs in the Baldwin effect. Results showed that when language learning carries a high cost, agents gradually assimilate into their genome some explicit features (e.g. lexical properties) of the specific language they are exposed to. When the structure of the language is instead allowed to vary through cultural transmission, Baldwinian processes cause the assimilation of a predisposition to learn rather than of any structural properties of a specific language (Figure 2). The analysis of the mechanisms underlying this predisposition (using categorical perception techniques) supports Deacon's hypothesis that general underlying cognitive capabilities serving language acquisition are inherited through Baldwinian processes. This contrasts with the thesis that structural properties are assimilated to specify a fully blown Language Acquisition Device (Pinker & Bloom, 1990).

Figure 2 - The difference in learning error between the initial and final generations indicates the presence of the Baldwin effect for a predisposition to learn. Rather than possessing the full lexicon in the first epoch, the network starts with a high error level, which decreases quickly within a few generations.
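The trade-off between learning cost and cultural variation can be made concrete with a deliberately simplified payoff model; it is not the foraging simulation described above. It assumes two hypothetical genotypes: "innate" agents whose genome encodes the current lexicon, which becomes wrong whenever the culturally transmitted language changes (with probability p_change per generation), and "learner" agents that always acquire the current lexicon but pay learning_cost. Discrete replicator dynamics then show which genotype takes over.

```python
def payoff(genotype, learning_cost, p_change):
    """Expected fitness per generation under the toy assumptions."""
    if genotype == "innate":
        return 1.0 - p_change      # correct only if the language did not change
    return 1.0 - learning_cost     # always correct, but learning is costly


def evolve(learning_cost, p_change, innate_share=0.5, generations=200):
    """Discrete replicator dynamics; returns the final share of innate agents."""
    f_innate = payoff("innate", learning_cost, p_change)
    f_learner = payoff("learner", learning_cost, p_change)
    x = innate_share
    for _ in range(generations):
        mean_fitness = x * f_innate + (1 - x) * f_learner
        x = x * f_innate / mean_fitness  # each type grows in proportion to its fitness
    return x


# High learning cost, stable language: the innate lexicon spreads.
print(round(evolve(learning_cost=0.5, p_change=0.1), 3))
# Low learning cost, culturally varying language: learners take over.
print(round(evolve(learning_cost=0.1, p_change=0.5), 3))
```

In this toy the innate genotype wins exactly when p_change < learning_cost, which echoes the qualitative pattern reported above; the actual simulations evolved neural networks rather than two fixed strategies.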

The second model used a different behavioural task for the evolution of verbs and nouns: a simple two-segment arm that had to manipulate objects in a 2-D environment. The lexicon of verbs (names of actions) and nouns (names of objects) was not evolved autonomously by the agents but was provided externally. The results shed light on the reciprocal influences between language and non-linguistic cognition, on the differences between nouns and verbs, and on the internal organization of neural networks that use language in an ecological context. In particular, language was shown to have a beneficial effect on non-linguistic cognition if it emerges on an already existing basis of non-linguistic skills, but not if it evolves together with them. This beneficial influence appears to arise because language produces better internal categorical representations of reality: more similar representations of different situations that must be responded to with the same action, and more different representations of similar situations that must be responded to with different behaviours. The effect is accentuated for verbs (Figure 3). Verbs have a more beneficial effect on behaviour than nouns because verbs, by their nature, tend to covary with the organism's actions, while nouns tend to covary with objects that may be responded to with different actions on different occasions. Finally, the model permits some comparisons between computational models of language evolution and the literature on children’s language acquisition: the evolution of nouns precedes that of verbs, as observed in children’s language development (Tomasello & Brooks, 1999).

Figure 3 - Categorical perception measurements in the model of the evolution of verbs and nouns. Note the increase in between-category distances between no_language and all language conditions, and between noun_only and verb_only conditions.
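Categorical perception effects like those in Figure 3 are usually quantified by comparing distances between internal (hidden-unit) representations. A minimal sketch, assuming Euclidean distance and plain averages over all pairs (the exact measure used in the research may differ): the effect appears as larger between-category distances and/or smaller within-category distances. The activation vectors and the "push"/"pull" labels are hypothetical.

```python
import itertools
import math


def cp_distances(representations):
    """Mean within- and between-category Euclidean distances.

    representations: list of (hidden_activation_vector, category_label) pairs.
    """
    within, between = [], []
    for (a, la), (b, lb) in itertools.combinations(representations, 2):
        (within if la == lb else between).append(math.dist(a, b))
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical hidden activations for two situations per action category.
before = [((0.4, 0.5), "push"), ((0.5, 0.4), "push"),
          ((0.6, 0.5), "pull"), ((0.5, 0.6), "pull")]
after = [((0.1, 0.2), "push"), ((0.2, 0.1), "push"),
         ((0.9, 0.8), "pull"), ((0.8, 0.9), "pull")]

for label, reps in [("no language", before), ("with language", after)]:
    w, b = cp_distances(reps)
    print(f"{label}: within={w:.3f} between={b:.3f}")
```

Here the within-category distances are unchanged while the between-category distance grows, which is the kind of separation the caption describes.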

3 Co-evolution of language and brain

Categorisation and language are among the fundamental abilities of cognitive organisms. Computational modelling with neural networks makes it possible to investigate the functional role of categorical perception in category learning and language acquisition, and in particular the neural mechanisms underlying both learning phenomena. Categorical perception has been hypothesized to constitute the groundwork of cognition, as in the case of the acquisition and evolution of language (Harnad, 1987; Cangelosi & Harnad, 2000). In addition, it has been hypothesised (Deacon, 1997) that the co-evolution of language and other cognitive and symbolic abilities has played a major role in the evolution of human language.

As seen in Figure 3, evolutionary neural networks produce enhanced categorical perception effects for syntactic categories such as nouns and verbs. When the network must respond to the same object in different contexts with different actions (verbs), the similarity space of verbs is optimised with respect to that of nouns (Cangelosi & Parisi, 2001). It was also hypothesized that verbs have a more beneficial effect on behaviour than nouns because verbs tend to covary with the network's sensorimotor tasks (actions), while nouns tend to covary with objects that may be responded to with different actions on different occasions. To better understand the neural mechanisms behind category learning, language processing and sensorimotor knowledge, the method of synthetic brain imaging (Arbib et al., 2000) was applied to these artificial neural networks. The PI adapted this computational neuroscience method of Arbib et al. to the evolutionary connectionist models used in this research. Analyses of data obtained under different experimental conditions (e.g. manipulations of the network architecture) showed that the representations of perceptual categories and syntactic classes are sensitive to the internal organization of the network and to the level of integration of linguistic information with sensorimotor knowledge (Cangelosi & Parisi, 2003). Moreover, these models show functional organizations that reflect those observed in human experiments (Cappa & Perani, 2003). For example, synthetic brain imaging of the evolutionary model showed that verbs are active in the network module specialised for integrating sensorimotor knowledge (corresponding to the prefrontal motor cortex), while nouns are active in the sensory/associative processing module (corresponding to associative, temporal areas of the brain) (Figure 4).

Figure 4 – Data for synthetic brain imaging (fMRI) in the neural network. Note the high activation for nouns in the first hidden layer (sensory processing module) and the high activation for verbs in the second hidden layer (sensorimotor integration module).
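Synthetic brain imaging derives a regional "activation" from the simulated network. A minimal sketch, loosely in the spirit of Arbib et al. (2000), who relate imaging signals to synaptic activity: here each module's activity is taken to be the sum of |weight × input| over the connections entering it, averaged over trials of each word class. The module names, weights and trial inputs are hypothetical numbers chosen only to exercise the code, and the sketch does not reproduce the PI's adaptation of the method.

```python
def regional_activity(weights, inputs):
    """Summed absolute synaptic activity |w_ij * x_j| over all connections
    entering a module; weights has one row of incoming weights per unit."""
    return sum(abs(w * x) for row in weights for w, x in zip(row, inputs))


def imaging_by_word_class(trials, module_weights):
    """Mean regional activity per (word class, module).

    trials: list of (word_class, {module: input vector reaching that module}).
    module_weights: {module: weight matrix of connections into that module}.
    """
    totals, counts = {}, {}
    for word_class, module_inputs in trials:
        counts[word_class] = counts.get(word_class, 0) + 1
        for module, x in module_inputs.items():
            key = (word_class, module)
            totals[key] = totals.get(key, 0.0) + regional_activity(module_weights[module], x)
    return {key: total / counts[key[0]] for key, total in totals.items()}


# Hypothetical two-module network and recorded inputs for one noun and one verb trial.
WEIGHTS = {
    "sensory": [[0.5, -0.5], [1.0, 0.0]],
    "sensorimotor": [[0.2, 0.8], [-0.6, 0.4]],
}
TRIALS = [
    ("noun", {"sensory": [1, 0], "sensorimotor": [0, 0]}),
    ("verb", {"sensory": [0, 0], "sensorimotor": [1, 1]}),
]

for (word_class, module), activity in imaging_by_word_class(TRIALS, WEIGHTS).items():
    print(f"{word_class:>4} {module:<12} {activity:.2f}")
```

Contrasting the noun and verb rows per module gives the kind of comparison shown in Figure 4; with real networks the inputs would be recorded activations rather than hand-written vectors.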

These neural network and synthetic brain imaging studies support a series of general hypotheses on the interaction between category learning, language emergence and the evolution of the brain. First, categorical perception induced by language can be seen as an instance of the Whorfian hypothesis (Whorf, 1964): our language influences the way the world looks to us. Second, the enhancement of dissimilarities in the category similarity space due to language acquisition (symbolic theft), and its beneficial effects on the emergence of language, highlight some of the evolutionary and adaptive advantages of language. This can also be used to support Deacon’s hypothesis on the co-evolution of language and brain. Finally, the use of evolutionary models of category and language learning produces some functional and architectural equivalence between cognitive computational models and real organisms.

4 Language universals under cognitive constraints

It is not always straightforward to draw conclusions about the evolution of characteristically human language from the results of abstract computational models (Turner, 2002). For instance, in Kirby’s (2001) simulations a combinatorial syntax emerges in the agents’ languages, but the ability to combine symbols to create composite meanings is just as characteristic of computer programming ‘languages’ and other language-like systems as it is of natural human languages. Some examples of properties that are unique to the syntax of human language are case constraints (e.g., in English these constraints determine where a speaker can use he versus him), agreement constraints (e.g., in English these constraints make utterances like “I are happy” ungrammatical), binding constraints (e.g., in English these constraints require him to refer to someone other than George in a sentence like “George attacked him”, but allow co-reference in a sentence like “George’s father attacked him”), constraints on displacement (e.g., in English these constraints allow the man to appear in different structural positions in paraphrases like “The dog bit the man” and “The man was bitten”), constraints on word order (e.g., subjects precede objects in the basic word order of almost all languages), and other constraints on structure (e.g., sentences in all languages are hierarchically structured in a way that can be represented using binary trees). All of these constraints are found in some form in all human languages, but there is almost no research in the literature addressing their evolutionary origins. One of the research activities made possible by the grant was the development of a new computational methodology to begin to address this question.

The method, developed by Huck Turner, stems from the realisation that computers now make it possible to perform optimisations very rapidly, so that hypotheses of the form “language constraint X is optimal for Y” can be tested realistically. For instance, if it could be shown computationally that the word orders observed in human language texts are optimal for minimising demands on working memory, then, given that there are many more ways of ordering words that are not optimal, it would be surprising if working memory did not have something to do with the evolution of the specific word orders that we observe. Indeed, this specific hypothesis has received preliminary support from the research undertaken so far.

In exploring the link between word order and working memory, it was necessary to develop a theory of what is represented when an utterance is held in working memory. The theory that Huck Turner developed is essentially a representational variant of the derivational theory of Chomsky (1995, chapter 4), with some modifications. Under this theory, the structure of a sentence is completely specified by the unordered set of ‘dependencies’ that hold between its tokens. This gives a hypothesis about what is held in syntactic working memory once a sentence has been parsed. The descriptive adequacy of the theory was tested by applying it to the first chapter of The Wizard of Oz. It was unclear how the formalism could describe certain phenomena in this sample, including idioms, aspects of non-declarative quoted speech, and co-ordinated structures arising from the use of conjunctions, but most of the rest of the text could be described with only trivial modifications.
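The idea that a sentence's structure is fully specified by an unordered set of dependencies can be illustrated with a small data structure. The sketch is hypothetical: it uses plain (head, dependent) token pairs and an occurrence number only to tell repeated words apart, and it does not reproduce the feature-based dependencies of Huck Turner's theory. Because the set is unordered and tokens carry no positions, word order is not part of the representation.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple


class Token(NamedTuple):
    word: str
    occurrence: int = 0  # distinguishes repeated words; not a position


Dependency = Tuple[Token, Token]  # (head, dependent)


def structure(*deps: Dependency) -> FrozenSet[Dependency]:
    """A sentence structure: an unordered, duplicate-free set of dependencies."""
    return frozenset(deps)


def tokens(struct: FrozenSet[Dependency]) -> Set[Token]:
    """All tokens mentioned by the structure; no word order is recorded."""
    return {tok for dep in struct for tok in dep}


the1, the2 = Token("the", 1), Token("the", 2)
dog, bit, man = Token("dog"), Token("bit"), Token("man")

# "The dog bit the man", with its dependencies listed in two different orders.
s1 = structure((bit, dog), (dog, the1), (bit, man), (man, the2))
s2 = structure((man, the2), (bit, man), (dog, the1), (bit, dog))
print(s1 == s2, len(tokens(s1)))  # True 5
```

A frozenset also makes the structure hashable, so it could be stored or compared cheaply; any particular word order is a separate choice layered on top of it.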

The theory was implemented in a computational parser to demonstrate how this representation can be constructed from input text. The parser identifies dependencies by matching features of lexical tokens. The system has a number of interesting properties that may have useful applications: 1) it analyses the structure of a sentence incrementally, so it does not need to wait until the whole sentence has been input to start processing; 2) it can determine the category of unknown words from the context in which they occur, which offers the potential to acquire syntactic information automatically; and 3) text from different languages can be mixed in the same file and parsed without specifying which part of the text is in which language.
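Feature matching and the first two properties listed above can be illustrated with a toy parser. Everything in it is an assumption made for illustration, not Huck Turner's implementation: the lexicon, the three features per word (a category plus the categories it needs on its left and right), and the "most recent match" attachment strategy. Because it is a generator, it emits each dependency as soon as it can be formed, and an unknown word is assigned the category that the context is currently waiting for.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy lexicon: word -> (category, categories needed on the left,
# categories needed on the right). The real parser's features are richer.
LEXICON: Dict[str, Tuple[str, List[str], List[str]]] = {
    "the": ("D", [], ["N"]),
    "dog": ("N", [], []),
    "man": ("N", [], []),
    "bit": ("V", ["D"], ["D"]),
}

Tok = Tuple[int, str]  # (position, word)


def parse(words: Iterable[str]) -> Iterator[Tuple[Tok, Tok, str]]:
    """Yield (head, dependent, category) as soon as each dependency can be formed."""
    pending: List[Tuple[Tok, str]] = []     # heads still needing a category on the right
    unattached: List[Tuple[Tok, str]] = []  # tokens that still lack a head
    for position, word in enumerate(words):
        token = (position, word)
        if word in LEXICON:
            category, left_needs, right_needs = LEXICON[word]
        elif pending:
            # Unknown word: assume it has the category the context is waiting for.
            category, left_needs, right_needs = pending[-1][1], [], []
        else:
            raise ValueError(f"cannot categorise unknown word {word!r}")
        # Left needs take the most recent unattached token of that category
        # (an unsatisfied left need is simply ignored in this toy).
        for need in left_needs:
            for i in range(len(unattached) - 1, -1, -1):
                if unattached[i][1] == need:
                    yield token, unattached.pop(i)[0], need
                    break
        # The new token attaches to the most recent pending need it satisfies.
        for i in range(len(pending) - 1, -1, -1):
            if pending[i][1] == category:
                yield pending.pop(i)[0], token, category
                break
        else:
            unattached.append((token, category))
        pending.extend((token, need) for need in right_needs)


for head, dependent, category in parse("the wug bit the man".split()):
    print(head, "->", dependent, category)
```

The unknown word "wug" is attached as an N because the preceding determiner is waiting for one; a parser of this kind would need a richer feature set and a way to handle ambiguity before it could cope with real text.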

Word-order optimisation studies are ongoing. Huck Turner is producing a version of the Wizard of Oz sample text with optimised word order to see whether the resulting word order is the same as that of the original English text. The variable being optimised is the length of the dependencies in the structural descriptions of the sentences, a property that depends on word order. If the optimised word order turns out to be the same as English word order, this would suggest an explanation for why English has the word order that it has.
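The quantity being optimised can be sketched directly: give each token a position, sum the distances between heads and their dependents, and search over orders. The dependencies below are a hypothetical analysis of a single sentence (nouns heading their determiners, the verb heading both nouns), not Turner's representation, and the brute-force search is only feasible for very short sentences because the number of orders grows as n!.

```python
import math
from itertools import permutations

# Hypothetical dependencies (head, dependent) for "the dog bit the man";
# the1/the2 distinguish the two determiners.
TOKENS = ("the1", "dog", "bit", "the2", "man")
DEPS = [("bit", "dog"), ("dog", "the1"), ("bit", "man"), ("man", "the2")]


def total_dependency_length(order, dependencies):
    """Sum over dependencies of the distance between head and dependent."""
    position = {token: i for i, token in enumerate(order)}
    return sum(abs(position[head] - position[dep]) for head, dep in dependencies)


def optimal_orders(tokens, dependencies):
    """Brute-force search over all n! orders; returns (minimum, optimal orders)."""
    scores = {order: total_dependency_length(order, dependencies)
              for order in permutations(tokens)}
    best = min(scores.values())
    return best, [order for order, score in scores.items() if score == best]


best, orders = optimal_orders(TOKENS, DEPS)
print("English order:", total_dependency_length(TOKENS, DEPS))
print("minimum:", best, "reached by", len(orders), "of", math.factorial(len(TOKENS)), "orders")
```

This toy result for one sentence under a naive dependency analysis says nothing about the outcome of the actual study; it only shows how the measure is computed and how small the set of optimal orders can be.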

5 Emergence of communication in evolutionary robots

The evolutionary robotics approach has been successfully applied to the synthesis of robots able to exploit sensorimotor coordination (Nolfi, 2002), to body and brain co-evolution (Lipson and Pollack, 2000) and to competitive and cooperative collective behaviours (Baldassarre et al., 2002). Under this grant, Davide Marocco developed a new model to carry out experiments on the emergence of communication, based on an extended version of Nolfi and Marocco’s (2002) model for the emergence of sensorimotor categorization. In this new model, the robotic agents share the explicit categorization of the objects (spheres and cubes) with which they interact (Figure 5). The activation of the linguistic output nodes in a robot’s neural controller is the signal (“name”) sent to another agent to instruct it on what to do with the object. Agents are selected on their ability to manipulate objects correctly, not on their (linguistic) ability to name them correctly. A variety of experiments were run to test the role of different sensorimotor, social and cognitive factors in the emergence of communication.

Figure 5 - View of the evolutionary robotic arm while it interacts with the sphere
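The selection criterion described above, in which fitness depends on what the listener does rather than on which signal is used, can be sketched with lookup tables instead of robots and neural controllers. The objects match the experiments, but the actions, the signals and the correct action for each object are hypothetical, and no evolutionary search is run: the code only enumerates all speaker and listener tables and scores them.

```python
from itertools import product

OBJECTS = ("sphere", "cube")
ACTIONS = ("grasp", "push")   # hypothetical action repertoire
SIGNALS = ("a", "b")          # hypothetical signals with no built-in meaning
CORRECT_ACTION = {"sphere": "grasp", "cube": "push"}  # assumed task


def fitness(speaker, listener):
    """Fraction of objects handled correctly when the listener acts only on the
    speaker's signal. Naming is never scored directly: only the action counts."""
    correct = sum(listener[speaker[obj]] == CORRECT_ACTION[obj] for obj in OBJECTS)
    return correct / len(OBJECTS)


# Every speaker table (object -> signal) and listener table (signal -> action).
speakers = [dict(zip(OBJECTS, sigs)) for sigs in product(SIGNALS, repeat=len(OBJECTS))]
listeners = [dict(zip(SIGNALS, acts)) for acts in product(ACTIONS, repeat=len(SIGNALS))]

perfect = [(s, l) for s in speakers for l in listeners if fitness(s, l) == 1.0]
for s, l in perfect:
    print(s, l)
```

Both perfect pairings use different "names" for the same objects, so a fitness function of this kind rewards successful coordination without prescribing any particular lexicon.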

Simulations of this evolutionary robotics model of the evolution of communication showed that: (a) the emergence of language brings direct benefits to the agents and the population, in terms of increased fitness and comprehension ability; (b) there is a benefit in communicating with kin-related agents (e.g. between parents and children), since this improves the chances of successfully evolving shared lexicons, also because it maintains stable and reliable signals; (c) good sensorimotor and cognitive abilities make it possible to establish a link between production and comprehension/behavioural abilities; (d) the kinship relation between speaking parents and listening offspring does not fully explain the emergence of communication; rather, it is important in the early stages of communication because it exploits the cognitive benefits of positive production/fitness correlations.

Most of these results also have important implications for the theories and hypotheses on the origins of language. For example, this simulation highlights and explains the role of cognitive factors in the emergence of communication (Burling, 1993). In particular, the model supports the hypothesis that the ability to form categories constitutes the grounding for the subsequent evolution of words and language (Harnad, 1996; Cangelosi & Harnad, 2000 cf. also §2.1). In addition, future developments of this model could also have an impact on computational investigations of the mirror neuron hypothesis for the origins of language (Arbib, 2002).