Connectionist Modelling of Quantifiers

EPSRC Grant GR/S26569

RESULTS OF RESEARCH

The following is a summary of the key advances from the experimental and modelling research. For more details on the individual experiments, models and results, see the articles published from this research, which are available on the publications page.

1. Introduction to Vague Quantifiers

 (see background page)

2. Key Advances

2.1 Initial Theoretical Developments and Experimental Findings

Early in the grant we became interested in why context effects exist for quantifiers, and we found that the literature on context effects does not adequately explain their origins. Studies in the number judgement literature have revealed at least three strategies used by the brain: fast and accurate processing of small groups of four or fewer items in almost constant response time (subitizing), a slow process of serial counting for intermediate numbers of objects (more than five and fewer than nine), and a more error-prone estimation process for larger groups of objects (more than nine) (e.g. Trick & Pylyshyn, 1993; Mandler & Shebo, 1982). We suspected that when participants are asked to produce or comprehend linguistic descriptions of scenes, they may base their linguistic judgements on the more error-prone estimation process (as counting objects takes too much time), and we therefore hypothesised that the contextual variables that affect quantifier judgements are precisely those that affect the error-prone number estimation process. We thus predicted that quantifier comprehension and production are based on estimated number rather than actual number, and that the origins of context effects may reside in perceptual processes (consonant with the work of Barsalou, 1999; Zwaan, 2004; and Coventry & Garrod, 2004).

2.1.1 Experiments 1 – 3: Context effects and quantifier judgements: using sentences without pictures

The first experiments[1] in the grant did not involve the use of visual scenes at all. We hoped to establish the relative influence of a range of context variables using sentences alone, with careful pilot work to select objects and materials. Experiment 1 used pretests to establish the estimated set size, size, etc., for a long list of animals (e.g., pandas, horses, ants, ladybirds). We were then able to select animals (NOUNS) controlled for these variables in order to establish the relative extent to which these variables affect the ratings of quantifiers to describe a given number of those animals. For example, “There are 15 pandas” versus “There are 15 bears” controls for size while manipulating set size. As the set size for pandas is smaller than the set size for bears, we expected that low magnitude quantifiers would be more appropriate for the bears than for the pandas. The rating scales used for this (and all later experiments) were of the form “There are QUANTIFIER NOUNS”, and participants had to rate the appropriateness of the sentences to describe the number of objects in the text (or in the pictures in later experiments), using a scale from 1 to 7, where 1 = totally inappropriate and 7 = totally appropriate. The terms used included two low magnitude quantifiers (a few, few) and two high magnitude quantifiers (many, lots of), together with a mid-range magnitude quantifier (several), thus sampling a range of types of vague quantifiers (consistent with those used in previous studies). The results of the first experiment did not reveal any context effects, but on debriefing it became clear that participants’ knowledge of animals was extremely variable, making the manipulations less controlled than was apparent from the pretests. Despite using a multi-level modelling approach to the analysis in order to take this individual variability into account, the data were not clean enough to draw any meaningful conclusions.
In order to alleviate these problems, Experiments 2 and 3 switched to abstract objects so that we could provide participants with the information about set size, size and other context variables. For example, participants were given the following: “ARGS live on the planet BOK. There are 1000 ARGS on BOK. They are small and furry creatures. There are 20 ARGS. There are QUANT ARGS” (the last sentence being rated in relation to the previous sentence). Aside from a clean effect of set size, the results showed no evidence of other context effects. This appeared to be because participants were not paying attention to the information given (despite the experiment being pitched as a memory experiment in Experiment 3), and/or because the information about number was given explicitly in the text. This led us to believe that context effects may have to do with the estimation procedures used for real (or imagined) scenes, and therefore later experiments used real images rather than sentences with numbers explicitly cued.

2.1.2 Experiments 4-8: Manipulating number of objects and a range of contextual variables in visual scenes

The task involved participants rating how appropriate sentences of the form “There are [QUANTIFIER] striped/white fish” were to describe given pictures of fish (whilst we also ensured that the effects found for fish occur across a wider range of materials). The rating scales and quantifiers were the same as those used in Experiments 1-3. The scenes varied the number of striped and white fish present. The fish mentioned in the sentence to be rated will hereafter be called the focus fish, and the fish not mentioned will be called the other fish; we varied whether the white fish or the striped fish were the focus fish. The number of focus fish varied from 3 to 18 in increments of three, and the number of other fish varied from 0 to 18 in increments of three. Participants were instructed to rate how appropriate each sentence was to describe each picture. Sentences were presented together under each picture, always in the same order (from low to high magnitude quantifiers). The order of pictures was randomized.
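The crossing of focus and other fish counts described above can be sketched in a few lines of Python. This is only an illustration of the design: the names are hypothetical, and the order of the two low magnitude quantifiers within the list is an assumption, since the text states only that sentences ran from low to high magnitude.

```python
from itertools import product

# Counts taken from the description above.
FOCUS_COUNTS = range(3, 19, 3)   # 3, 6, ..., 18 focus fish
OTHER_COUNTS = range(0, 19, 3)   # 0, 3, ..., 18 other fish

# Assumed low-to-high ordering of the five quantifiers.
QUANTIFIERS = ["a few", "few", "several", "many", "lots of"]


def design_cells():
    """Return every (focus, other) combination of fish counts."""
    return list(product(FOCUS_COUNTS, OTHER_COUNTS))


def rating_sentences(focus_colour):
    """Build the sentences shown under one picture, in a fixed order."""
    return [f"There are {q} {focus_colour} fish" for q in QUANTIFIERS]


cells = design_cells()
print(len(cells))                      # number of count combinations
print(rating_sentences("striped"))
```

Each count combination is then further crossed with the contextual manipulation of the experiment in question (spacing, grouping or function), which is not modelled here.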

All the experiments manipulated the number of focus fish and the number of other fish present. In addition, Experiment 4 manipulated the spacing between fish (spaced apart or close together), Experiment 5 manipulated grouping (fish mixed together or grouped), Experiment 6 manipulated both spacing and grouping, and Experiment 7 manipulated the function of the fish (whether the focus fish and other fish were all blowing bubbles, or whether only one group was shown blowing bubbles). Figure 1 illustrates the first three manipulations.

Experiment 8 differed from Experiments 4-7 in that it used abstract objects rather than real objects so that we could manipulate set size in a controlled fashion (as in Experiments 2-3). Using E-Prime computer presentation, participants saw a picture showing all the ARGS on the planet ZOG, and then saw a picture showing a number of ARGS (3-18 as before), and had to judge the appropriateness of sentences of the form “There are QUANTIFIER ARGS” to describe the picture.

The results of these experiments show consistent effects of all the contextual variables on quantifier judgements. We therefore identified a range of new context effects for vague quantifiers, as well as delineating their relative importance and how they interact with each other, and with number and quantifier.

2.1.3 Experiments 9-10: Number Judgement Experiments

Using the same pictures used in Experiments 4-8, we wanted to establish if the same new contextual variables that affect quantifier judgements also affect number judgements under time pressure for the same scenes when presented for a short period of time. If so, this would provide evidence that quantifier judgements may be due to estimates of the number of objects present in the scenes to be described. Consistent with the number literature reviewed above, when the number of objects increases beyond a small number, we expected that the estimated numbers given by participants would deviate from the actual number of objects present. Furthermore, we were also interested to establish whether spacing, grouping and set size also affect number estimates, and if so we would be able to extend understanding of the error prone number estimation process as well as tracing the origins of such effects for language comprehension.

Participants in Experiment 9 had the task of estimating how many fish were shown in the scenes used in Experiments 4-7. Scenes were presented in blocks, and at the start of each block participants were told to estimate the number of either striped fish or white fish. Scenes were randomized within blocks. Practice trials were given at the start of each block to ensure participants were estimating the right type of fish, and a reminder prompt was given at the beginning of test trials. For each trial, a fixation cross was presented in the middle of the screen for 500 ms, followed by the scene for 500 ms, followed by a mask (a chequered board) containing a space in which to type estimates. Participants responded by typing in their estimated numbers using the computer keyboard. Experiment 10 used the same methodology, but with the materials from Experiment 8. The results showed clean effects of all the contextual variables, consistent with the effects found for quantifier ratings. We therefore identified, for the first time, a correspondence between number estimates in visual scenes and quantifier judgements.

2.1.4 Experiment 11: Counting versus estimating

If estimates of numbers of objects in a scene provide an adequate explanation for context effects, we reasoned that asking participants to count the number of fish in a visual scene prior to rating quantifiers would result in accurate information about number, and therefore would eliminate the context effects found in the previous experiments. Using a between subjects design, we re-ran Experiment 6, but with a counting condition. As predicted, context effects were found for the non-counting condition, but disappeared for the counting condition.

2.1.5 Experiments 12-14: Varying the similarity of other objects to focus set

Experiment 7 failed to find an effect of the similarity of the function of the focus fish and the other fish present on quantifier judgements. Given that we had previously found effects of function for quantifiers (e.g., Newstead & Coventry, 2000) as well as for other closed class categories (e.g., Coventry & Garrod, 2004), we were surprised by this. We therefore re-ran Experiment 7 using materials involving stronger functional relations. For example, the focus objects included men playing guitars, and the other objects were either women playing guitars or women not playing guitars. In Experiment 12 we varied the number and similarity (same function or no function) of the other objects. In Experiment 13 we varied the similarity of species of the other objects as well as function (e.g., women playing guitars or monkeys playing guitars), and in Experiment 14 we varied the grouping of other objects in addition to species similarity and functional similarity. All of these variables affected quantifier judgements, and furthermore the results suggest that similarity of both form and function is important in determining the relevance of the other objects in the scene.

2.1.6 Additional Experiments

Experiments 15-16 were replications of Experiments 6 and 8, only in the Italian language. The results for Italian directly mirrored the results found for English, indicating that the results we found for English are not language specific.

2.2 Development of Computational Model and Simulation Results

2.2.1 Architecture of the model

The computational model consists of a hybrid artificial vision-connectionist architecture (Figure 2). The model has four main modules: (1) Vision Module, (2) Compression Networks, (3) Quantification Network, and (4) Dual-Route Network. This architecture is partially based on a previous model on the grounding of spatial language (Coventry et al. 2005; Cangelosi et al. 2005). The overall idea was to ground the connectionist and linguistic representation of quantification judgments directly in input visual stimuli.

The Vision Module processes visual scenes involving varying quantities of two types of objects: striped fish and white fish. It uses a series of Ullman-type vision routines to identify the constituent objects in the scene. The input to the Vision module consists of static images with the two kinds of fish. The system must pay attention to striped fish, whilst white fish are only used as distracters (or vice versa). The input images are processed at a variety of spatial scales and resolutions for object features yielding a visual buffer. The processing of each image results in two retinotopically organized arrays of 30x40 activations (one per fish type). The output of the vision module represents data of isotropic receptive fields.

The Compression Networks are needed to convert the output data from the vision module into compressed neural representations of the input scene. This is to reduce the complexity of the vision module output. Two separate auto-associative networks are used, one for each of the object types in the scene (stripy fish and white fish). Both networks have 1200 input and output units, and 30 hidden units. The activation values of the hidden units are utilized by the following networks to make quantification and linguistic judgments. The compression network for each type of fish learned to autoassociate all the stimuli with varying numbers of fish.
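The shape of the two compression networks can be illustrated with a minimal NumPy sketch. Only the layer sizes (1200 inputs and outputs, 30 hidden units, one network per fish type) come from the description above; the sigmoid units, the weight initialisation, the class name and the random placeholder scenes are assumptions made for illustration, and the networks here are untrained.

```python
import numpy as np

N_VISIBLE = 30 * 40   # 1200 retinotopic activations per fish type
N_HIDDEN = 30         # compressed representation size


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class CompressionNet:
    """Auto-associative network: 1200 -> 30 -> 1200 (untrained sketch)."""

    def __init__(self, rng):
        # Small random weights; the initialisation scheme is an assumption.
        self.w_enc = rng.normal(0.0, 0.05, size=(N_VISIBLE, N_HIDDEN))
        self.b_enc = np.zeros(N_HIDDEN)
        self.w_dec = rng.normal(0.0, 0.05, size=(N_HIDDEN, N_VISIBLE))
        self.b_dec = np.zeros(N_VISIBLE)

    def encode(self, x):
        """Hidden activations, passed on to the later networks."""
        return sigmoid(x @ self.w_enc + self.b_enc)

    def decode(self, h):
        return sigmoid(h @ self.w_dec + self.b_dec)

    def rms_error(self, x):
        """Root-mean-square reconstruction error for a batch of scenes."""
        recon = self.decode(self.encode(x))
        return float(np.sqrt(np.mean((recon - x) ** 2)))


rng = np.random.default_rng(0)
striped_net, white_net = CompressionNet(rng), CompressionNet(rng)

# Placeholders for the two vision-module arrays of a single scene.
striped_scene = rng.random((1, N_VISIBLE))
white_scene = rng.random((1, N_VISIBLE))

# Concatenating both hidden codes gives the 60-value scene description.
code = np.concatenate(
    [striped_net.encode(striped_scene), white_net.encode(white_scene)], axis=1
)
print(code.shape)
```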

The Quantification Network is a feedforward multi-layer perceptron trained to reproduce the quantification judgments of the number of fish made by participants during experiments (Experiment 9). Some simulations only focused on the estimation of the number of focus fish, those the participants are asked to consider when making quantification decisions (section 2.2.2). In other simulation experiments, the same network has to estimate both sets of fish (section 2.2.3). In this case, the network has 60 input units (30 per compressed fish type), 50 hidden nodes, and 2 output nodes. Each output node has a modified activation function that produces activation values in the range 0 to 20, to include the actual range of 0 to 18 striped fish used in the stimulus set.
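A forward pass through a network of this shape might look as follows. The layer sizes (60, 50, 2) and the 0 to 20 output range come from the text; the exact form of the modified activation function is not given, so the scaled sigmoid used here is an assumption, as are the class name and the random inputs.

```python
import numpy as np

MAX_COUNT = 20.0  # upper bound of the output range stated in the text


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def scaled_sigmoid(x, upper=MAX_COUNT):
    """Assumed output activation: a sigmoid stretched to the range 0..upper."""
    return upper * sigmoid(x)


class QuantificationNet:
    """Feedforward perceptron, 60 -> 50 -> 2 (untrained sketch)."""

    def __init__(self, rng, n_in=60, n_hidden=50, n_out=2):
        self.w1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def hidden(self, x):
        return sigmoid(x @ self.w1 + self.b1)

    def forward(self, x):
        """Return one estimated count per fish type for each scene."""
        return scaled_sigmoid(self.hidden(x) @ self.w2 + self.b2)


rng = np.random.default_rng(1)
net = QuantificationNet(rng)
compressed_scenes = rng.random((4, 60))   # placeholder compressed codes
estimates = net.forward(compressed_scenes)
print(estimates.shape)
```

Bounding the output at 20 rather than 18 leaves headroom for overestimates of the largest displays, in line with the range described above.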

Figure 2. Modular architecture of the model

The fourth module consists of a Dual-Route Neural Network. This architecture combines visual and linguistic information for both linguistic production and comprehension tasks (Cangelosi et al. 2000; Plunkett et al. 1992). This is the core linguistic component of the model, as it integrates visual and linguistic knowledge to produce a description of the visual scene. The network receives as input information on the scene through the activation values of the compression (and quantification) networks’ hidden units. It then produces as output judgments regarding the appropriateness ratings for the quantifier terms describing the visual scene. The activation values of the linguistic output nodes correspond to the rating values given by participants for the five quantifiers considered: a few, few, several, many, lots of. After training with data from psycholinguistic experiments, the network is capable of producing two different outputs: (1) acceptability ratings for quantifiers given only the vision inputs (language production) and (2) imaginary output pictures, given only a description of the scene in terms of quantifiers (comprehension). Results of the simulation on the production route (predicted ratings for the quantifiers) can be compared to the actual ratings from experiments with human participants.

2.2.2 Simulations on Psychological Number Judgements

Two main directions of research in the connectionist modelling of counting can be identified in the literature. The first approach focuses on the task of learning number sequences and sequential series (e.g. Rodriguez et al., 1999). The second approach has instead focused on the task of counting the number of objects in input visual scenes (e.g. Dehaene & Changeux, 1993). In the computational model we have developed, we draw a distinction between the actual number of objects in a scene presented for description, and “psychological number” (estimation of numbers as in Experiments 9-10 above).

Here we report data on the compression autoassociative networks and the quantification experiments. Results with the autoassociative networks show that the model is able to learn both training stimuli (average RMS error of 0.04) and novel generalisation stimuli (average RMS error of 0.081). This permits a significant reduction of complexity of the 1200 output values of the visual module into only 25 compressed hidden activation values. Results with the quantification networks are also good. Networks have an average training error of 0.042 (about 0.2% of the 0-20 activation range of the single output node) and a generalisation error of 1.56 (about 8%).

The quantification network already provides useful insights for the identification of important factors in psychological quantification. For example, analysis of the hidden activations of the quantification network highlights the factors that play a major role in the production of quantification judgments, and these can be compared with data from psychological experiments. We carried out analyses of the hidden layer activations of the network using principal components analysis (PCA). These indicate that the networks use both the information on the number of fish and the spacing between fish. The first factor, which explains 59.43% of the variance, clearly groups the hidden representations of scenes by the number of fish in them. The second factor groups stimuli by the two sizes of inter-fish distance, explaining 14.41% of the variance. The relevance of these two factors is consistent with the results of the psychological experiments, where both the number and the spacing factors are statistically significant. However, in contrast to the empirical data, the network does not seem to use the information on the grouping of fish (separate groups vs. mixed stripy/white fish). The PCA only shows a marginal effect of the placement of the stripy fish when these are separated by top/bottom position. The lack of grouping effects in the network’s hidden data could be explained by the fact that the network only processes the stripy fish data (for these simulations). As a result, “grouping” does not mean much to the network, except for the cases in which the grouping causes the placement of all the stripy fish in the top or the bottom part of the scene. Further investigation of the grouping effects therefore required the integration of both stripy and white fish in the same quantification network.
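The kind of analysis described above, a PCA of hidden-layer activations with the proportion of variance explained by each component, can be sketched with NumPy. The activations below are synthetic stand-ins driven by a number factor and a spacing factor, so the printed proportions are not the reported 59.43% and 14.41%; the number of scenes, the hidden layer size and the function names are likewise assumptions.

```python
import numpy as np


def pca_explained_variance(hidden):
    """PCA via SVD: return the components and the variance share of each."""
    centred = hidden - hidden.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centred, full_matrices=False)
    variances = singular_values ** 2
    return components, variances / variances.sum()


def project(hidden, components, k=2):
    """Scores of each scene on the first k principal components."""
    centred = hidden - hidden.mean(axis=0)
    return centred @ components[:k].T


rng = np.random.default_rng(2)

# Synthetic scene properties (assumed): fish number and a binary spacing level.
n_scenes, n_hidden = 216, 50
number = rng.integers(1, 7, size=n_scenes) * 3      # 3, 6, ..., 18
spacing = rng.integers(0, 2, size=n_scenes)         # 0 = close, 1 = spaced

# Synthetic hidden activations: two latent factors plus noise.
hidden = (
    0.1 * np.outer(number, rng.normal(size=n_hidden))
    + 0.3 * np.outer(spacing, rng.normal(size=n_hidden))
    + rng.normal(scale=0.05, size=(n_scenes, n_hidden))
)

components, ratio = pca_explained_variance(hidden)
scores = project(hidden, components)
print(ratio[:2])   # variance share of the first two components
```

Plotting `scores` coloured by `number` or by `spacing` is one way to check which scene property each component tracks.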

2.2.3 Simulations on Quantifier Ratings

In the simulations on the ratings of linguistic quantifiers, different combinations of the architecture described in 2.2.1 were used. The Quantification Network could be included, or not, depending on the theoretical assumption about the role of number estimation during linguistic quantifier rating. For example, when only the compression network output is used, the dual-route network requires 60 input visual units and 5 input linguistic nodes, one for each of the 5 quantifiers. The 60 visual nodes correspond to the 30 hidden units of each of the two compression networks (for stripy fish and white fish). The output layer has the same number and type of units as the input layer.
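The input and output layout of the dual-route network can be illustrated as follows. The 60 visual and 5 linguistic units come from the text. The hidden layer size, the sigmoid units, and the way each route is cued (zeroing the units of the modality that is not presented) are assumptions made for this sketch, and the network is untrained.

```python
import numpy as np

N_VISUAL, N_LING = 60, 5
N_HIDDEN = 30   # hypothetical; the hidden layer size is not stated
QUANTIFIERS = ["a few", "few", "several", "many", "lots of"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class DualRouteNet:
    """65 -> hidden -> 65 network with visual and linguistic unit groups."""

    def __init__(self, rng):
        n_io = N_VISUAL + N_LING
        self.w1 = rng.normal(0.0, 0.1, size=(n_io, N_HIDDEN))
        self.w2 = rng.normal(0.0, 0.1, size=(N_HIDDEN, n_io))

    def forward(self, visual, linguistic):
        x = np.concatenate([visual, linguistic])
        out = sigmoid(sigmoid(x @ self.w1) @ self.w2)
        return out[:N_VISUAL], out[N_VISUAL:]

    def produce(self, visual):
        """Production route: visual input only, read the quantifier units."""
        _, ratings = self.forward(visual, np.zeros(N_LING))
        return dict(zip(QUANTIFIERS, ratings))

    def comprehend(self, quantifier):
        """Comprehension route: one quantifier on, read the visual units."""
        linguistic = np.zeros(N_LING)
        linguistic[QUANTIFIERS.index(quantifier)] = 1.0
        visual_out, _ = self.forward(np.zeros(N_VISUAL), linguistic)
        return visual_out


rng = np.random.default_rng(3)
net = DualRouteNet(rng)
ratings = net.produce(rng.random(N_VISUAL))
imagined = net.comprehend("many")
print(sorted(ratings), imagined.shape)
```

In a trained network the five output activations would be rescaled to the 1 to 7 rating scale before comparison with participants' ratings.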

The input stimuli to the vision module are the 216 scenes used in the quantification experiments with participants (Experiments 4-7). The 216 scenes are first presented to the vision module. Its output is then used to train the autoassociative networks of the Compression module. For the training, 195 scenes are used as training stimuli and 21 as generalization test stimuli. The learning rate is 0.01 and the momentum is 0.8. The networks are trained for 2000 epochs. The autoassociative networks are able to learn both training stimuli (average RMS error of 0.019 and 0.014 for stripy and white fish data respectively) and novel generalisation stimuli (average RMS error of 0.080 and 0.070 for stripy and white fish data respectively). This permits a significant reduction of complexity of the 1200 output values of the visual module into only 30 compressed hidden activation values.
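The training setup above can be sketched in plain Python: a 195/21 split of the 216 scenes and a gradient step with learning rate 0.01 and momentum 0.8. How the 21 test scenes were chosen is not stated, so the random split here is an assumption, and the function names are hypothetical. The update uses the standard momentum form, in which each weight change is the scaled negative gradient plus a fraction of the previous change.

```python
import random


def split_scenes(n_scenes=216, n_test=21, seed=0):
    """Randomly hold out n_test scene indices (assumed selection method)."""
    indices = list(range(n_scenes))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


def momentum_step(weight, gradient, velocity, learning_rate=0.01, momentum=0.8):
    """One gradient-descent update with momentum for a single weight."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


train_idx, test_idx = split_scenes()
print(len(train_idx), len(test_idx))

# Toy usage on the error surface E(w) = 0.5 * w**2, whose gradient is w.
w, v = 1.0, 0.0
for _ in range(3):
    w, v = momentum_step(w, gradient=w, velocity=v)
print(w)
```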

For the dual-route training, 195 scenes are used as training stimuli and 21 as generalization test stimuli. The learning rate is 0.001 and the momentum is 0.8. The networks are trained for 1000 epochs. Test results show that this autoassociative network is able to learn both training stimuli (average RMS error of 0.051) and novel generalisation stimuli (average RMS error of 0.084). Such results show the ability of the whole model to map visual scenes of objects onto vague linguistic quantifiers.

2.3 Do Language and Vision Bootstrap each Other?

The experimental results are consistent with the notion that language is grounded in perceptual processes (e.g., Barsalou, 1999). The other side of the coin is whether language affects non-linguistic processes, commonly referred to as the “Linguistic Relativity” hypothesis (Gumperz & Levinson, 1996). To investigate linguistic relativity in number and linguistic quantifier use, we carried out two sets of simulations. In the first set of simulations, we directly addressed the facilitating role of linguistic quantifiers (Q) in number estimation behaviour (N). Here we compared the simulation in which number estimation precedes quantifier judgements (Simulation 1: N-NQ) with the case in which quantifier judgements precede number estimation (Simulation 2: Q-QN). In N-NQ, we first trained the network to count fish using participants’ “number estimation” data (N) until the count error on novel scenes reached 0.06. Subsequently, the network was also trained on the ratings of quantifiers whilst continuing the fish number estimation training (N&Q). In Q-QN, the network first learned to rate quantifiers up to an error of 0.06. Subsequently, the network had to learn to estimate the number of fish whilst continuing to learn to rate quantifiers. When we compare the average number of learning trials needed to reach a 0.06 quantifier rating error in N-NQ vs. Q-QN, we obtain 13493 vs. 11200 iterations respectively. This suggests that in the second simulation (Q-QN) the prior linguistic knowledge of vague quantifiers facilitates (i.e., requires fewer learning trials for) the later acquisition of the ability to estimate the number of fish.

In the second study, we focussed on the role of number estimates as a facilitator in the later acquisition of linguistic quantifiers. Here we compared the simulation in which counting precedes quantifiers (Simulation 3: N-NQ, as above) with the case in which quantifiers are learned in parallel with numbers (Simulation 4: NQ). Training stops when the quantifier rating error on novel scenes is 0.06. In the NQ-only simulation, the network is trained from the beginning to count fish and rate quantifiers, until the quantifier rating error on novel scenes reaches 0.06. When we compare the average number of learning trials needed to reach a 0.06 quantifier rating error in N-NQ vs. NQ, we obtain 800 vs. 2200 iterations respectively. This also suggests that prior knowledge of the estimated number of fish in a scene helps the later acquisition of vague quantifiers by requiring fewer learning trials. Overall, these simulations demonstrate the close interaction between knowledge of estimated number and the use of vague quantifiers. Using simulations in this novel way, the results in the number domain provide the first evidence that knowledge of number and the language used to describe number bootstrap each other.
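The staged training protocol used in these comparisons (train on one task set until a criterion error of 0.06 is reached, then continue on a larger task set and count the iterations to the next criterion) can be sketched as below. The `ToyLearner` is a purely hypothetical stand-in whose two task errors shrink independently, so it cannot show any facilitation between tasks; it only makes the bookkeeping concrete. Whether the reported iteration counts cover the final stage only or all stages is not stated, so the sketch reports each stage separately.

```python
def run_until(step, tasks, criterion_task, threshold=0.06, max_iters=100_000):
    """Call step(tasks) until the criterion task's error is at or below threshold.

    Returns the number of iterations used in this stage.
    """
    for iteration in range(1, max_iters + 1):
        errors = step(tasks)
        if errors[criterion_task] <= threshold:
            return iteration
    return max_iters


class ToyLearner:
    """Stand-in learner: each trained task's error shrinks by 10% per step."""

    def __init__(self):
        self.errors = {"N": 1.0, "Q": 1.0}

    def step(self, tasks):
        for task in tasks:
            self.errors[task] *= 0.9
        return dict(self.errors)


# N-NQ: number estimation alone, then number and quantifiers together.
n_nq = ToyLearner()
n_stage = run_until(n_nq.step, ["N"], criterion_task="N")
nq_stage = run_until(n_nq.step, ["N", "Q"], criterion_task="Q")

# NQ: both tasks trained in parallel from the start.
nq_only = ToyLearner()
nq_only_iters = run_until(nq_only.step, ["N", "Q"], criterion_task="Q")

print(n_stage, nq_stage, nq_only_iters)
```

With a real network sharing hidden units across the two tasks, the second-stage count in N-NQ could differ from the NQ count, which is the comparison the simulations above report.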

 

 References

Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22(4), 577-660.

Bass, B. M., Cascio, W. F., & O’Connor, E. J. (1984). Magnitude estimations of frequency and amount. Journal of Applied Psychology, 53, 313-320.

Cangelosi, A., Coventry, K.R., Rajapakse, R., Bacon, A., & Newstead, S.N. (2005). Grounding language into perception: A connectionist model of spatial terms and vague quantifiers. In A. Cangelosi, G. Bugmann, & R. Borisyuk (Eds.), Modeling Language, Cognition and Action: Proceedings of the 9th Neural Computation and Psychology Workshop. Singapore: World Scientific.

Cangelosi, A., Greco, A., & Harnad, S. (2000). From robotic toil to symbolic theft: Grounding transfer from entry-level to higher-level categories. Connection Science, 12, 143-162.

Coventry, K. R. & Garrod, S. C. (2004). Saying, Seeing and Acting: The Psychological Semantics of Spatial Prepositions. Essays in Cognitive Psychology Series. Psychology Press. Hove and New York.

Coventry, K. R., Cangelosi, A., Rajapakse, R., Bacon, A., Newstead, S., Joyce, D., & Richards, L. V. (2005). Spatial prepositions and vague quantifiers: Implementing the functional geometric framework. In C. Freksa et al. (Eds.), Spatial Cognition, Volume IV. Reasoning, Action and Interaction (pp. 98-110). Lecture Notes in Computer Science. Springer.

Dehaene, S. & Changeux, J.P. (1993). Development of elementary numerical abilities: A neuronal model. Journal of Cognitive Neuroscience, 5(4), 390-407.

Gumperz, J.J. & Levinson, S.C. (Eds.) (1996). Rethinking Linguistic Relativity. Cambridge: Cambridge University Press.

Hormann, H. (1983). The calculating listener, or how many are einige, mehrere and ein paar (some, several and a few). In R. Bauerle, C. Schwarze, & A. von Stechow (Eds.), Meaning, use and interpretation of language. Berlin: De Gruyter.

Mandler, G. & Shebo, B.J. (1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111, 1-22.

Moxey, L. M., Sanford, A. J., & Dawydiak, E. J. (2001). Denials as controllers of negative quantifier focus. Journal of Memory and Language, 44, 427-442.

Moxey, L.M. & Sanford, A.J. (1993). Communicating Quantities: A Psychological Perspective. Hove, East Sussex: Lawrence Erlbaum Associates.

Newstead, S.N. & Coventry, K.R. (2000). The role of expectancy and functionality in the interpretation of quantifiers. European Journal of Cognitive Psychology, 12(2), 243-259.

Plunkett, K., Sinha, C., Møller, M.F., & Strandsby, O. (1992). Symbol grounding or the emergence of symbols? Vocabulary growth in children and a connectionist net. Connection Science, 4, 293-312.

Reyna, V. F. (1981). The language of possibility and probability: Effects of negation on meaning. Memory and Cognition, 9, 642-650.

Rodriguez, P., Wiles, J., & Elman, J.L. (1999). A recurrent neural network that learns to count. Connection Science, 11(1), 5-40.

Trick, L.M. & Pylyshyn, Z. (1993). What enumeration studies can show us about spatial attention: Evidence for preattentive processing. Journal of Experimental Psychology: Human Perception and Performance, 19, 331-351.

Zwaan, R.A. (2004). The immersed experiencer: toward an embodied theory of language comprehension. In B.H. Ross (Ed.), The Psychology of Learning and Motivation, Vol. 44 (pp. 35-62). New York: Academic Press.

[1] All the experiments used within-subject designs unless otherwise indicated, and sample sizes varied from 30 to 40 participants per experiment. The data were analysed using within-subject analyses of variance unless otherwise specified. All results described in the text were statistically significant (with the α level set at 0.05).