RESULTS OF RESEARCH
The following is a summary of the key advances from the experimental and
modelling research. For more details on the individual experiments, models
and results, see the articles published from this research, which are
available on the publications page.
1. Introduction to Vague Quantifiers
2. Key Advances
2.1 Initial Theoretical Developments and Experimental Findings
Early in the grant we became interested in why context effects exist for
quantifiers, and we became aware that the literature on context effects does
not adequately explain the origins of such effects. Studies in the number
judgement literature have revealed at least three strategies used by the
brain: fast and accurate processing of small groups of four or fewer items
in almost constant response time (subitizing), a slower process of serial
counting for groups of roughly five to nine objects, and a more error-prone
estimation process for larger groups of more than nine objects (e.g. Trick &
Pylyshyn, 1993; Mandler & Shebo, 1982). We suspected that when participants
are asked to produce or comprehend linguistic descriptions of scenes, they
may base their linguistic judgements on the more error-prone estimation
process (as counting the objects takes too much time). We therefore
hypothesised that the contextual variables that affect quantifier judgements
are precisely those that affect the error-prone number estimation process.
In other words, we predicted that quantifier comprehension and production
are based on estimated number rather than actual number, and that the
origins of context effects may reside in perceptual processes (consonant
with the work of Barsalou, 1999; Zwaan, 2004; and Coventry & Garrod, 2004).
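As a small illustration of the three regimes summarised above, the sketch
below maps a set size onto a processing strategy. The cut-offs (four or
fewer, five to nine, more than nine) are a simplified reading of the cited
number literature rather than sharp boundaries, and `processing_regime` is a
hypothetical helper name.

```python
def processing_regime(n_items: int) -> str:
    """Return the number-processing strategy expected for n_items objects.

    Hypothetical helper: the cut-offs are approximate readings of the
    subitizing / counting / estimation ranges, not exact boundaries.
    """
    if n_items < 0:
        raise ValueError("set size cannot be negative")
    if n_items <= 4:
        return "subitizing"   # fast, accurate, near-constant response time
    if n_items <= 9:
        return "counting"     # slower serial counting
    return "estimation"       # error-prone estimation of larger groups


# Focus-fish counts used in the picture experiments (3 to 18 in steps of 3).
for n in range(3, 19, 3):
    print(n, processing_regime(n))
```

Applied to the 3-18 focus fish used in the picture experiments, most scenes
fall outside the subitizing range, which is where estimation was expected to
shape quantifier judgements.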
2.1.1 Experiments 1-3: Context effects and quantifier judgements using
sentences without pictures
The first experiments in the grant did not involve visual scenes at all. We
hoped to establish the relative influence of a range of context variables
using sentences alone, with careful pilot work to select objects and
materials. Experiment 1 used pretests to establish the estimated set size,
size, etc., for a long list of animals (e.g., pandas, horses, ants,
ladybirds). We were then able to select animals (NOUNS) controlled for these
variables, in order to establish the relative extent to which each variable
affects the ratings of quantifiers describing a given number of those
animals. For example, "There are 15 pandas" versus "There are 15 bears"
controls for size while manipulating set size. As the set size for pandas is
smaller than the set size for bears, we expected low magnitude quantifiers
to be more appropriate for the bears than for the pandas. In this and all
later experiments, participants rated sentences of the form "There are
QUANTIFIER NOUNS" for how appropriately they described the number of objects
in the text (or in the pictures, in later experiments), on a scale from 1 to
7, where 1 = totally inappropriate and 7 = totally appropriate. The terms
used included two low magnitude quantifiers (a few, few), two high magnitude
quantifiers (many, lots of) and one mid-range magnitude quantifier
(several), thus sampling a range of vague quantifiers (consistent with those
used in previous studies). The results of the first experiment did not
reveal any context effects, but on debriefing it became clear that
participants' knowledge of animals was extremely variable, making the
manipulations less controlled than the pretests had suggested. Despite using
a multi-level modelling approach to the analysis to take this individual
variability into account, the data were not clean enough to draw any
meaningful conclusions. To alleviate these problems, Experiments 2 and 3
switched to novel, abstract creatures, so that information about set size,
size and other context variables could be supplied to participants directly.
For example, participants were given the following: "ARGS live on the planet
BOK. There are 1000 ARGS on BOK. They are small and furry creatures. There
are 20 ARGS." They then rated "There are QUANT ARGS" in relation to the
previous sentence. Aside from a clean effect of set size, the results showed
no evidence of other context effects. This appeared to be because
participants were not paying attention to the information given (even though
Experiment 3 was presented to them as a memory experiment), and/or because
the number of objects was stated explicitly in the text. This led us to
believe that context effects may have to do with the estimation procedures
used for real (or imagined) scenes, and later experiments therefore used
real images rather than sentences with numbers explicitly cued.
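The rating design used from Experiment 1 onwards can be pictured as a table
of (condition, quantifier, rating) records. The sketch below is a
hypothetical illustration, not the analysis we ran (which used multi-level
models): the record layout and the function name `mean_ratings` are
assumptions. It validates each rating against the 1-7 scale and averages
ratings per condition and quantifier, the kind of cell means one might
inspect before modelling.

```python
from collections import defaultdict
from statistics import mean

# The five quantifiers rated in the experiments, in the order listed in the text.
QUANTIFIERS = ("a few", "few", "several", "many", "lots of")


def mean_ratings(trials):
    """Average 1-7 appropriateness ratings per (condition, quantifier) cell.

    `trials` is an iterable of (condition, quantifier, rating) tuples,
    e.g. ("pandas", "many", 6); this layout is a hypothetical choice.
    """
    cells = defaultdict(list)
    for condition, quantifier, rating in trials:
        if quantifier not in QUANTIFIERS:
            raise ValueError(f"unknown quantifier: {quantifier!r}")
        if not 1 <= rating <= 7:
            raise ValueError("ratings must lie on the 1-7 scale")
        cells[(condition, quantifier)].append(rating)
    return {cell: mean(values) for cell, values in cells.items()}
```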

2.1.2 Experiments 4-8: Manipulating number of objects and a range of
contextual variables in visual scenes
The task involved participants rating how appropriate sentences of the form
"There are [QUANTIFIER] striped/white fish" were to describe given pictures
of fish (we also checked that the effects found for fish occur across a
wider range of materials). The rating scales and quantifiers were the same
as those used in Experiments 1-3. The scenes varied the number of striped
and white fish present. The fish mentioned in the sentence to be rated will
hereafter be called the focus fish, and the fish not mentioned will be
called the other fish; we varied whether the white fish or the striped fish
were the focus fish. The number of focus fish varied from 3 to 18 in
increments of three, and the number of other fish varied from 0 to 18 in
increments of three. Participants were instructed to rate how appropriate
each sentence was to describe each picture. The sentences were presented
together under each picture, always in the same order (from low to high
magnitude quantifiers). The order of pictures was randomized.
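The number manipulation in Experiments 4-8 amounts to crossing focus-fish
counts with other-fish counts. A minimal sketch of that grid, assuming every
combination was used; each experiment then crossed it with its own
manipulation (spacing, grouping or function) and with the choice of focus
fish type, which are not modelled here.

```python
from itertools import product

FOCUS_COUNTS = range(3, 19, 3)   # 3, 6, 9, 12, 15, 18 focus fish
OTHER_COUNTS = range(0, 19, 3)   # 0, 3, 6, 9, 12, 15, 18 other fish


def number_grid():
    """All (focus, other) count pairs, assuming a fully crossed design."""
    return list(product(FOCUS_COUNTS, OTHER_COUNTS))


grid = number_grid()
print(len(grid))  # 6 focus levels x 7 other levels = 42
```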
All the experiments manipulated the number of focus fish and the number of
other fish present. In addition, Experiment 4 manipulated the spacing
between fish (spaced apart or close together), Experiment 5 manipulated
grouping (fish mixed together or grouped), Experiment 6 manipulated both
spacing and grouping, and Experiment 7 manipulated the function of the fish
(whether the focus fish and other fish were all blowing bubbles, or only one
group was shown blowing bubbles). Figure 1 illustrates the first three
manipulations.
Experiment 8 differed from Experiments 4-7 in that it used abstract objects
rather than real objects, so that we could manipulate set size in a
controlled fashion (as in Experiments 2-3). Using E-Prime for computer
presentation, participants first saw a picture showing all the ARGS on the
planet ZOG, then saw a picture showing a number of ARGS (3-18, as before),
and had to judge the appropriateness of sentences of the form "There are
QUANTIFIER ARGS" to describe the picture.
The results of these experiments showed consistent effects of the contextual
variables on quantifier judgements (the exception being the function
manipulation in Experiment 7; see section 2.1.5). We therefore identified a
range of new context effects for vague quantifiers, and delineated their
relative importance and how they interact with each other, with number and
with quantifier.
2.1.3 Experiments 9-10: Number Judgement Experiments
Using the same pictures as in Experiments 4-8, we wanted to establish
whether the new contextual variables that affect quantifier judgements also
affect number judgements when the same scenes are presented briefly, under
time pressure. If so, this would provide evidence that quantifier judgements
may be based on estimates of the number of objects present in the scenes to
be described. Consistent with the number literature reviewed above, we
expected that once the number of objects increases beyond a small number,
the numbers estimated by participants would deviate from the actual number
of objects present. We were also interested in establishing whether spacing,
grouping and set size affect number estimates; if so, we would be able to
extend understanding of the error-prone number estimation process as well as
trace the origins of such effects for language comprehension.
Participants in Experiment 9 had the task of estimating how many fish were
shown in the scenes used in Experiments 4-7. Scenes were presented in
blocks, and at the start of each block participants were told to estimate
the number of either striped fish or white fish. Scenes were randomized
within blocks. Practice trials were given at the start of each block to
ensure that participants were estimating the right type of fish, and a
reminder prompt was given at the beginning of the test trials. On each
trial, a fixation cross was presented in the middle of the screen for
500 ms, followed by the scene for 500 ms, followed by a mask (a chequered
board) with a space in which participants typed their estimates using the
computer keyboard. Experiment 10 used the same methodology with the
materials from Experiment 8. The results showed clean effects of all the
contextual variables, consistent with the effects found for quantifier
ratings. We therefore identified, for the first time, a correspondence
between number estimates for visual scenes and quantifier judgements.
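The Experiment 9 procedure can be summarised as a session builder plus a
fixed trial timeline. In this sketch the function names, the seeded
shuffling and the fixed striped-then-white block order are illustrative
assumptions, and practice trials and the reminder prompt are omitted; only
the 500 ms fixation and 500 ms scene durations come from the procedure
described above.

```python
import random

FIXATION_MS = 500   # fixation cross duration
SCENE_MS = 500      # scene exposure before the mask appears


def build_session(scene_ids, fish_types=("striped", "white"), seed=None):
    """Return one block per fish type, each with its own random scene order.

    Hypothetical helper: block order is fixed here; only the within-block
    randomization follows the described procedure.
    """
    rng = random.Random(seed)
    session = []
    for fish_type in fish_types:
        order = list(scene_ids)
        rng.shuffle(order)              # scenes randomized within the block
        session.append((fish_type, order))
    return session


def trial_events():
    """Event onsets (ms from trial start); the mask stays up until a response."""
    return [
        ("fixation", 0),
        ("scene", FIXATION_MS),
        ("mask_and_response", FIXATION_MS + SCENE_MS),
    ]
```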
2.1.4 Experiment 11: Counting versus estimating
If estimates of the number of objects in a scene provide an adequate
explanation of context effects, we reasoned that asking participants to
count the fish in a visual scene before rating quantifiers would give them
accurate information about number, and would therefore eliminate the
context effects found in the previous experiments. Using a between-subjects
design, we re-ran Experiment 6 with an added counting condition. As
predicted, context effects were found in the non-counting condition but
disappeared in the counting condition.
2.1.5 Experiments 12-14: Varying the similarity of other objects to the
focus set
Experiment 7 failed to find an effect on quantifier judgements of the
similarity of function between the focus fish and the other fish present.
Given that we had previously found effects of function for quantifiers
(e.g., Newstead & Coventry, 2000) as well as for other closed-class
categories (e.g., Coventry & Garrod, 2004), this surprised us. We therefore
re-ran Experiment 7 using materials involving stronger functional relations.
For example, the focus objects included men playing guitars, and the other
objects were either women playing guitars or women not playing guitars. In
Experiment 12 we varied the number and similarity (same function or no
function) of the other objects. In Experiment 13 we varied the similarity of
species of the other objects as well as function (e.g., women playing
guitars or monkeys playing guitars), and in Experiment 14 we varied the
grouping of the other objects in addition to species similarity and
functional similarity. All of these variables affected quantifier
judgements, and the results further suggest that similarity of both form
and function is important in determining the relevance of the other objects
in the scene.
2.1.6 Additional Experiments
Experiments 15-16 were replications of Experiments 6 and 8 in Italian. The
results for Italian directly mirrored those found for English, indicating
that the effects we found are not specific to English.
2.2 Development of Computational Model and Simulation Results
2.2.1 Architecture of the model
The computational model consists of a hybrid artificial vision-connectionist
architecture (Figure 2). The model has four main modules: (1) Vision Module,
(2) Compression Networks, (3) Quantification Network, and (4) Dual-Route
Network. This architecture is partially based on a previous model of the
grounding of spatial language (Coventry et al. 2005; Cangelosi et al. 2005).
The overall idea was to ground the connectionist and linguistic
representations of quantification judgements directly in the visual input.
The Vision Module processes visual scenes containing varying quantities of
two types of objects: striped fish and white fish. It uses a series of
Ullman-type vision routines to identify the constituent objects in the
scene. The input to the Vision Module consists of static images containing
the two kinds of fish. The system must attend to the striped fish, whilst
the white fish serve only as distracters (or vice versa). The input images
are processed for object features at a variety of spatial scales and
resolutions, yielding a visual buffer. Processing each image results in two
retinotopically organized arrays of 30x40 activations (one per fish type).
Each activation in the output of the Vision Module corresponds to an
isotropic receptive field.
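The Ullman-type routines themselves are beyond a short example, but the
format of the Vision Module's output is easy to show. The toy function below
is a hypothetical stand-in, assuming each fish simply activates the grid
cell at its (row, column) location; it builds the two 30x40 arrays (one per
fish type) and flattens each into the 1200 values that a compression network
receives.

```python
import numpy as np

GRID_ROWS, GRID_COLS = 30, 40   # retinotopic array size per fish type


def toy_vision_output(striped_cells, white_cells):
    """Hypothetical stand-in for the Vision Module's output format.

    Each fish sets the activation of the grid cell at its (row, col)
    position to 1 in the array for its type. The real module used
    Ullman-type routines at several scales; only the shape is reproduced.
    """
    maps = np.zeros((2, GRID_ROWS, GRID_COLS))
    for channel, cells in enumerate((striped_cells, white_cells)):
        for row, col in cells:
            maps[channel, row, col] = 1.0
    # Flatten each 30x40 array into a 1200-value vector (row-major order).
    return maps.reshape(2, GRID_ROWS * GRID_COLS)
```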
The Compression Networks convert the output of the Vision Module into
compressed neural representations of the input scene, reducing the
complexity of the data passed on to later modules. Two separate
auto-associative networks are used, one for each object type in the scene
(stripy fish and white fish). Both networks have 1200 input and output units
(one per activation in a 30x40 array) and 30 hidden units. The activation
values of the hidden units are used by the subsequent networks to make
quantification and linguistic judgements. The compression network for each
type of fish learned to autoassociate all the stimuli with varying numbers
of fish.
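A compression network of this shape (1200 inputs, 30 hidden units, 1200
outputs) can be sketched as a sigmoid autoencoder trained by
backpropagation. This NumPy version is an illustrative assumption rather
than the model's code: it uses full-batch gradient descent with momentum,
borrows the learning rate (0.01) and momentum (0.8) reported for the
compression networks in section 2.2.3, and runs far fewer epochs. The
`encode` helper returns the hidden activations that later modules consume.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden=30, lr=0.01, momentum=0.8, epochs=200, seed=0):
    """Full-batch backprop for a single-hidden-layer sigmoid autoencoder.

    X: array of shape (n_scenes, n_inputs), e.g. n_inputs = 1200.
    Returns [W1, b1, W2, b2] and the RMS reconstruction error recorded at
    the start of each epoch.
    """
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    params = [
        rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden),   # encoder
        rng.normal(0.0, 0.1, (n_hidden, n_in)), np.zeros(n_in),       # decoder
    ]
    velocity = [np.zeros_like(p) for p in params]
    history = []
    for _ in range(epochs):
        W1, b1, W2, b2 = params
        H = sigmoid(X @ W1 + b1)        # compressed code (hidden units)
        Y = sigmoid(H @ W2 + b2)        # reconstruction of the input
        err = Y - X
        history.append(float(np.sqrt(np.mean(err ** 2))))
        # Gradients of 0.5 * squared error summed over units, averaged over scenes.
        d_out = err * Y * (1.0 - Y) / len(X)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        grads = [X.T @ d_hid, d_hid.sum(axis=0), H.T @ d_out, d_out.sum(axis=0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum
            v -= lr * g
            p += v                      # in-place, so params stays up to date
    return params, history


def encode(X, params):
    """Hidden-layer activations used as the compressed scene representation."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)
```

Trained separately on the striped-fish and white-fish arrays, two such
networks give 30 + 30 = 60 values per scene, matching the 60 visual inputs
of the downstream networks.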
The Quantification Network is a feedforward multi-layer perceptron trained
to reproduce the number estimates made by participants in Experiment 9. Some
simulations focused only on estimating the number of focus fish, i.e. those
the participants are asked to consider when making quantification decisions
(section 2.2.2). In other simulations, the same network has to estimate both
sets of fish (section 2.2.3). In this case, the network has 60 input units
(30 per compressed fish type), 50 hidden nodes and 2 output nodes. Each
output node has a modified activation function that produces activation
values in the range 0 to 20, covering the actual range of 0 to 18 fish used
in the stimulus set.
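The text does not specify the "modified activation function", so this
sketch assumes a logistic function scaled by 20, which keeps each of the two
outputs inside (0, 20). The 60-50-2 layer sizes follow the description
above; the sigmoid hidden layer, the initialisation and the function names
are assumptions, and the network shown is untrained.

```python
import numpy as np

OUT_MAX = 20.0  # output range covering the 0-18 fish used in the stimuli


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_quantification_net(n_in=60, n_hidden=50, n_out=2, seed=0):
    """Random initial weights for a 60-50-2 perceptron (sizes from the text)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }


def estimate_counts(codes, net, top=OUT_MAX):
    """Map 60 compressed values per scene (30 striped + 30 white) to two
    estimated counts. The output uses top * sigmoid(z), an ASSUMED form of
    the 'modified activation function' that maps into (0, top)."""
    hidden = sigmoid(codes @ net["W1"] + net["b1"])
    return top * sigmoid(hidden @ net["W2"] + net["b2"])
```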

Figure 2. Modular architecture of the model
The fourth module is a Dual-Route Neural Network. This architecture combines
visual and linguistic information for both linguistic production and
comprehension tasks (Cangelosi et al. 2000; Plunkett et al. 1992). It is the
core linguistic component of the model, as it integrates visual and
linguistic knowledge to produce a description of the visual scene. As input,
the network receives information about the scene through the activation
values of the hidden units of the compression (and quantification) networks.
As output, it produces appropriateness ratings for the quantifier terms
describing the visual scene. The activation values of the linguistic output
nodes correspond to the rating values given by participants for the five
quantifiers considered: a few, few, several, many, lots of. After training
with data from the psycholinguistic experiments, the network is capable of
producing two different outputs: (1) acceptability ratings for quantifiers,
given only the visual input (language production), and (2) imagined output
pictures, given only a description of the scene in terms of quantifiers
(comprehension). The results of simulations on the production route
(predicted ratings for the quantifiers) can be compared with the actual
ratings given by human participants.
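The routing of the Dual-Route Network can be sketched as one network whose
input and output layers both hold 60 visual and 5 linguistic units (the
sizes given in section 2.2.3). Production clamps the linguistic inputs to
zero and reads the five linguistic outputs; comprehension does the reverse.
The hidden-layer size, activation functions, zero-clamping convention and
function names are assumptions, and because outputs lie in (0, 1) the 1-7
ratings would need rescaling before training or comparison.

```python
import numpy as np

QUANTIFIERS = ("a few", "few", "several", "many", "lots of")
N_VISUAL, N_LING = 60, len(QUANTIFIERS)   # 30 + 30 compressed units, 5 quantifiers


def init_dual_route(n_hidden=40, seed=0):
    """Weights for a single-hidden-layer network; n_hidden=40 is a
    hypothetical choice, not a value from the model description."""
    rng = np.random.default_rng(seed)
    n_units = N_VISUAL + N_LING
    return (rng.normal(0.0, 0.1, (n_units, n_hidden)), np.zeros(n_hidden),
            rng.normal(0.0, 0.1, (n_hidden, n_units)), np.zeros(n_units))


def forward(x, params):
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # every output in (0, 1)


def produce(visual, params):
    """Production route: visual code in, linguistic inputs clamped to 0,
    one predicted rating per quantifier out."""
    x = np.concatenate([visual, np.zeros(N_LING)])
    return dict(zip(QUANTIFIERS, forward(x, params)[N_VISUAL:]))


def comprehend(ratings, params):
    """Comprehension route: quantifier ratings in, visual inputs clamped
    to 0, an 'imagined' 60-value visual code out."""
    x = np.concatenate([np.zeros(N_VISUAL), ratings])
    return forward(x, params)[:N_VISUAL]
```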
2.2.2 Simulations on Psychological Number Judgements
Two main directions of research in connectionist models of counting can be
identified in the literature. The first approach focuses on learning number
sequences and sequential series (e.g. Rodriguez et al., 1999). The second
approach focuses instead on counting the number of objects in input visual
scenes (e.g. Dehaene & Changeux, 1993). In the computational model we have
developed, we draw a distinction between the actual number of objects in a
scene presented for description and "psychological number" (the estimated
number, as in Experiments 9-10 above).
Here we report data on the compression autoassociative networks and the
quantification simulations. Results with the autoassociative networks show
that the model is able to learn both the training stimuli (average RMS error
of 0.04) and novel generalisation stimuli (average RMS error of 0.081). This
permits a significant reduction in complexity, from the 1200 output values
of the Vision Module to only 25 compressed hidden activation values. Results
with the quantification networks are also good. The networks have an
average training error of 0.042 (about 0.2% of the 0-20 activation range of
the single output node) and a generalisation error of 1.56 (about 8%).
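For readers checking the percentages, here is a short sketch of the RMS
error and of expressing an absolute error on a 0-20 output node as a
percentage of that range. The helper names are illustrative.

```python
import math


def rms_error(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target) or not target:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


def as_percent_of_range(error, low=0.0, high=20.0):
    """Express an absolute error relative to the output node's activation range."""
    return 100.0 * error / (high - low)


print(round(as_percent_of_range(1.56), 1))   # 7.8, i.e. roughly 8%
print(round(as_percent_of_range(0.042), 2))  # 0.21
```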
The quantification network already provides useful insights into the
factors that are important in psychological quantification. For example,
analysing the hidden activations of the quantification network highlights
the factors that play a major role in producing quantification judgements,
and allows these to be compared with data from psychological experiments. We
have carried out analyses of the hidden-layer activations of the network
using principal components factor analysis (PCA). These indicate that the
networks use information both on the number of fish and on the spacing
between fish. The first factor, which explains 59.43% of the variance,
clearly groups the hidden representations of the scenes by the number of
fish they contain. The second factor, which explains 14.41% of the variance,
groups the stimuli by the two inter-fish distances. The relevance of these
two factors is consistent with the results of the psychological
experiments, where both number and spacing had statistically significant
effects. However, in contrast to the empirical data, the network does not
seem to use the information on the grouping of fish (separate groups vs.
mixed stripy/white fish). The PCA shows only a marginal effect of the
placement of the stripy fish when they are confined to the top or bottom
part of the scene. The lack of a grouping effect in the hidden
representations may be explained by the fact that, in these simulations,
the network only processes the stripy fish data. As a result, "grouping"
does not mean much to the network, except in cases where grouping places
all the stripy fish in the top or the bottom part of the scene. Further
investigation of the grouping effects therefore required the integration of
both stripy and white fish in the same quantification network.
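A minimal sketch of the principal components step, assuming plain PCA
computed from a singular value decomposition of mean-centred hidden
activations (rows = scenes, columns = hidden units); any factor rotation
used in the original analysis is not reproduced, and the function name is
hypothetical. The ratios it returns play the role of the 59.43% and 14.41%
figures above, and correlating the first column of `scores` with the number
of fish per scene is one way to check which component tracks number.

```python
import numpy as np


def pca_explained_variance(activations, n_components=2):
    """Plain PCA via SVD of mean-centred activations (scenes x hidden units).

    Returns per-scene component scores and the fraction of total variance
    explained by each retained component.
    """
    centred = activations - activations.mean(axis=0)
    U, S, _ = np.linalg.svd(centred, full_matrices=False)
    variance = S ** 2                   # proportional to component variances
    ratios = variance / variance.sum()
    scores = U[:, :n_components] * S[:n_components]
    return scores, ratios[:n_components]
```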
2.2.3 Simulations on Quantifier Ratings
In the simulations of the ratings of linguistic quantifiers, different
combinations of the architecture described in 2.2.1 were used. The
Quantification Network could be included or omitted, depending on the
theoretical assumption about the role of number estimation during linguistic
quantifier rating. For example, when only the compression network output is
used, the dual-route network requires 60 visual input units and 5 linguistic
input nodes, one for each of the five quantifiers. The 60 visual nodes
correspond to the 30 hidden units of each of the two compression networks
(stripy fish and white fish). The output layer has the same number and type
of units as the input layer.
As input to the Vision Module, the model uses the 216 scenes from the
quantification experiments with participants (Experiments 4-7). The 216
scenes are first presented to the Vision Module, and its output is then used
to train the autoassociative networks of the Compression module. For this
training, 195 scenes are used as training stimuli and 21 as generalisation
test stimuli. The learning rate is 0.01 and the momentum 0.8, and the
networks are trained for 2000 epochs. The autoassociative networks are able
to learn both the training stimuli (average RMS error of 0.019 and 0.014 for
the stripy and white fish data respectively) and novel generalisation
stimuli (average RMS error of 0.080 and 0.070 for the stripy and white fish
data). This permits a significant reduction in complexity, from the 1200
output values of the Vision Module to only 30 compressed hidden activation
values.
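The 195/21 split of the 216 scenes can be sketched as below. The text does
not say how the 21 generalisation scenes were chosen, so a seeded random
hold-out and the function name `split_scenes` are assumptions.

```python
import random


def split_scenes(n_scenes=216, n_test=21, seed=0):
    """Randomly hold out n_test scene indices for generalisation testing.

    Returns (train_indices, test_indices), each sorted.
    """
    indices = list(range(n_scenes))
    random.Random(seed).shuffle(indices)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = split_scenes()
print(len(train_idx), len(test_idx))  # 195 21
```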
For the dual-route training, 195 scenes are used as training stimuli and 21
as generalisation test stimuli. The learning rate is 0.001 and the momentum
0.8, and the networks were trained for 1000 epochs. Test results showed that
the dual-route network is able to learn both the training stimuli (average
RMS error of 0.051) and novel generalisation stimuli (average RMS error of
0.084). These results show that the whole model can map visual scenes of
objects onto vague linguistic quantifiers.
2.3 Do Language and Vision Bootstrap Each Other?
The experimental results are consistent with the notion that language is
grounded in perceptual processes (e.g., Barsalou, 1999). The other side of
the coin is whether language affects non-linguistic processes, commonly
referred to as the "Linguistic Relativity" hypothesis (Gumperz & Levinson,
1996). To investigate linguistic relativity in number and linguistic
quantifier use, we carried out two sets of simulations. In the first set of
simulations, we directly addressed the facilitating role of linguistic
quantifiers (Q) in number estimation behaviour (N). Here we compared the
simulation in which number estimation precedes quantifier judgements
(Simulation 1: N-NQ) with the case in which quantifier judgements precede
number estimation (Simulation 2: Q-QN). In N-NQ, we first trained the
network to count fish using participants' "number estimation" data (N) until
the count error for novel scenes reached 0.06. Subsequently, the network was
also trained on the quantifier ratings whilst continuing the fish number
estimation training (N&Q). In Q-QN, the network first learned to rate
quantifiers down to an error of 0.06. Subsequently, the networks had to
learn to estimate the number of fish whilst continuing to learn to rate
quantifiers. Comparing the average number of learning trials needed to
reach a 0.06 quantifier rating error in N-NQ vs. Q-QN, we obtained 13493 vs.
11200 iterations respectively. This suggests that in the second simulation
(Q-QN), prior linguistic knowledge of vague quantifiers facilitates (i.e.
requires fewer learning trials for) the later acquisition of the ability to
estimate the number of fish.
In the second set of simulations, we focused on the role of number estimates
as a facilitator in the later acquisition of linguistic quantifiers. Here we
compared the simulation in which counting precedes quantifiers (Simulation
3: N-NQ, as above) with the case in which quantifiers are learned in
parallel with numbers (Simulation 4: NQ). Training stops when the quantifier
rating error for novel scenes reaches 0.06. In the NQ-only simulation, the
network is trained from the beginning to count fish and rate quantifiers,
until the quantifier rating error for novel scenes reaches 0.06. Comparing
the average number of learning trials needed to reach a 0.06 quantifier
rating error in N-NQ vs. NQ, we obtained 800 vs. 2200 iterations
respectively. This also suggests that prior knowledge of the estimated
number of fish in a scene helps the later acquisition of vague quantifiers,
by requiring fewer learning trials.
Overall, these simulations demonstrate the close interaction between
knowledge of estimated number and the use of vague quantifiers. By using
simulations in this novel way, we obtained the first evidence in the number
domain that knowledge of number and the language used to describe number
bootstrap each other.
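The training regimes compared above differ only in the order in which tasks
are trained and in which error is monitored at the end of each phase. The
sketch encodes the two schedules of the second study (N-NQ and NQ) as phases
of (tasks trained, task monitored) and runs them against a hypothetical toy
learner whose errors decay at a fixed rate. The toy's iteration counts are
meaningless and are not meant to reproduce the results reported here; a
real run would replace `ToyLearner` with the dual-route network trained on
participants' data. Q-QN would be the mirror image of N-NQ, with quantifiers
trained first.

```python
# Each schedule is a list of phases: (tasks trained together, task monitored).
SCHEDULES = {
    "N-NQ": [(("N",), "N"), (("N", "Q"), "Q")],   # numbers first, then both
    "NQ": [(("N", "Q"), "Q")],                    # both from the start
}


class ToyLearner:
    """HYPOTHETICAL stand-in for the network: each trained task's error
    shrinks by `rate` per iteration, untrained tasks by rate * transfer."""

    def __init__(self, rate=0.01, transfer=0.3):
        self.errors = {"N": 1.0, "Q": 1.0}
        self.rate = rate
        self.transfer = transfer

    def train_step(self, tasks):
        for task in self.errors:
            step = self.rate if task in tasks else self.rate * self.transfer
            self.errors[task] *= 1.0 - step

    def error(self, task):
        return self.errors[task]


def run_schedule(phases, learner, threshold=0.06, max_iters=1_000_000):
    """Train phase by phase; each phase ends once its monitored error
    reaches the threshold. Returns the total number of iterations."""
    total = 0
    for tasks, monitored in phases:
        while learner.error(monitored) > threshold:
            if total >= max_iters:
                raise RuntimeError("threshold not reached")
            learner.train_step(tasks)
            total += 1
    return total


for name, phases in SCHEDULES.items():
    print(name, run_schedule(phases, ToyLearner()))
```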
References
Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain
Sciences, 22(4), 577-660.
Bass, B. M., Cascio, W. F., & O'Connor, E. J. (1984). Magnitude estimations
of frequency and amount. Journal of Applied Psychology, 53, 313-320.
Cangelosi, A., Coventry, K. R., Rajapakse, R., Bacon, A., & Newstead, S. N.
(2005). Grounding language into perception: A connectionist model of spatial
terms and vague quantifiers. In A. Cangelosi, G. Bugmann, & R. Borisyuk
(Eds.), Modeling Language, Cognition and Action: Proceedings of the 9th
Neural Computation and Psychology Workshop. Singapore: World Scientific.
Cangelosi, A., Greco, A., & Harnad, S. (2000). From robotic toil to symbolic
theft: Grounding transfer from entry-level to higher-level categories.
Connection Science, 12, 143-162.
Coventry, K. R., & Garrod, S. C. (2004). Saying, Seeing and Acting: The
Psychological Semantics of Spatial Prepositions. Essays in Cognitive
Psychology Series. Hove and New York: Psychology Press.
Coventry, K. R., Cangelosi, A., Rajapakse, R., Bacon, A., Newstead, S.,
Joyce, D., & Richards, L. V. (2005). Spatial prepositions and vague
quantifiers: Implementing the functional geometric framework. In C. Freksa
et al. (Eds.), Spatial Cognition, Volume IV: Reasoning, Action and
Interaction (pp. 98-110). Lecture Notes in Computer Science. Springer.
Dehaene, S., & Changeux, J. P. (1993). Development of elementary numerical
abilities: A neuronal model. Journal of Cognitive Neuroscience, 5(4),
390-407.
Gumperz, J. J., & Levinson, S. C. (Eds.) (1996). Rethinking Linguistic
Relativity. Cambridge: Cambridge University Press.
Hormann, H. (1983). The calculating listener, or how many are einige,
mehrere and ein paar (some, several and a few). In R. Bauerle, C. Schwarze,
& A. von Stechow (Eds.), Meaning, Use and Interpretation of Language.
Berlin: De Gruyter.
Mandler, G., & Shebo, B. J. (1982). Subitizing: An analysis of its component
processes. Journal of Experimental Psychology: General, 111, 1-22.
Moxey, L. M., Sanford, A. J., & Dawydiak, E. J. (2001). Denials as
controllers of negative quantifier focus. Journal of Memory and Language,
44, 427-442.
Moxey, L. M., & Sanford, A. J. (1993). Communicating Quantities: A
Psychological Perspective. Hove, East Sussex: Lawrence Erlbaum Associates.
Newstead, S. N., & Coventry, K. R. (2000). The role of expectancy and
functionality in the interpretation of quantifiers. European Journal of
Cognitive Psychology, 12(2), 243-259.
Plunkett, K., Sinha, C., Møller, M. F., & Strandsby, O. (1992). Symbol
grounding or the emergence of symbols? Vocabulary growth in children and a
connectionist net. Connection Science, 4, 293-312.
Reyna, V. F. (1981). The language of possibility and probability: Effects of
negation on meaning. Memory and Cognition, 9, 642-650.
Rodriguez, P., Wiles, J., & Elman, J. L. (1999). A recurrent neural network
that learns to count. Connection Science, 11(1), 5-40.
Trick, L. M., & Pylyshyn, Z. (1993). What enumeration studies can show us
about spatial attention: Evidence for preattentive processing. Journal of
Experimental Psychology: Human Perception and Performance, 19, 331-351.
Zwaan, R. A. (2004). The immersed experiencer: Toward an embodied theory of
language comprehension. In B. H. Ross (Ed.), The Psychology of Learning and
Motivation (Vol. 44, pp. 35-62). New York: Academic Press.