RESULTS OF RESEARCH
The following is a
summary of the experimental and modelling research results obtained during
the project. For more details on the individual experiments, models and
results, please visit the publications page.
1. Initial Theoretical Developments and Experimental Findings
2. Development of Computational Model
3. Interplay between Experimental and Computational Work
References
1 Initial Theoretical Developments and Experimental Findings
Given the myriad findings in the literature
regarding the influence of extra-geometric variables, it was important to
review the findings, classify them into types of extra-geometric
influences, and develop a framework in which to understand these
influences. The “functional geometric framework” that emerged from this
work early on in the grant (see Coventry & Garrod, 2004; in press),
formed the basis for the generation of the experimental work reported here,
and provided theoretical and empirical constraints for the modelling work
outlined below. An edited volume on spatial language from cognitive and
computational perspectives also emerged partly from the early work in the
grant (see Coventry & Olivier, 2002). We first overview the main
results from the experimental work, prior to outlining the model.
2.1.1 Experiments 1 – 4: Object features/object function and Over/Under/Above/Below
The starting point for the experimental work was to
examine the relative influence of geometric and extra-geometric factors on
the comprehension of over, under, above and below. These terms were a
particularly useful starting point as they have been shown to be
differentially affected by geometric and extra-geometric relations (Coventry
et al., 2001); the comprehension of over/under is more affected by
functional relations than the comprehension of above/below while conversely
the comprehension of above/below is more influenced by geometric relations
than the comprehension of over/under.
Experiments 1-4 addressed the issue of the
relative extent to which the weightings for geometry and function are
driven by individual prepositions and lexical entries for nouns versus
information in the visual scene about how the objects are functioning. For
example, a large golf umbrella affords greater protection from rain, while
an umbrella full of holes diminishes the object’s usefulness as a protector
from rain. In Experiments 1 and 2, the scenes of the type used on the left
hand side of Figure 1 were compared with scenes involving larger objects in
the same position (centre-of-mass controlled, Experiment 1), or with
objects the same size but full of holes (thus compromising their protection
function, Experiment 2). The methodology used for these experiments (and
the other experiments unless otherwise indicated) involved the presentation
of pictures together with sentences of the form “The located object is
preposition the reference object”, and the task for participants was to rate
the appropriateness of each sentence to describe each picture using a
Likert scale (range from 1 = totally unacceptable to 9 = totally
acceptable). For the first experiment, increasing the size of the
protecting object was found to increase the size of the function effect for
all four prepositions, and effectively reduce the differences between
prepositions. In other words, as the function depicted by an object in a
scene becomes greater, function tends to determine acceptability for all
four prepositions. In contrast, in the second experiment, the function
effect was much diminished for the objects with the holes in them as
expected given that the rain in the “functional” condition when the
umbrella has holes will still pass through the umbrella and wet the man.
Furthermore, for Experiment 2, we also asked a second group of participants
to estimate the percentage of rain that was likely to make contact with the
person for each scene used. This allowed a direct examination of the
relationship between judgements about what will happen in the scenes and
acceptability for spatial prepositions. We found a significant correlation
between judgements and acceptability for over, under, above and below
overall (r = -0.52), although the correlation was higher for over/under (r
= -0.78) than for above/below (r = -0.33) as expected.
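The correlation analysis described above can be sketched as follows. The values are invented placeholders, not data from Experiment 2, and the variable names are hypothetical; only the computation (Pearson's r between per-scene rain estimates and mean acceptability ratings) is illustrated.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-scene values (illustrative only, not the experimental data).
rain_reaching_person = [5, 20, 40, 60, 85, 95]      # estimated % of rain hitting the person
mean_rating_over = [8.1, 7.4, 6.0, 4.9, 3.2, 2.5]   # mean 1-9 acceptability rating for "over"

r = pearson_r(rain_reaching_person, mean_rating_over)
print(f"r = {r:.2f}")  # negative: the more rain reaches the person, the lower the rating
```

A negative coefficient is the expected direction here: scenes in which the protecting object fails to protect should be rated as less acceptable.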
Experiments 3 and 4 involved the same
manipulations as Experiments 1 and 2, only this time the scenes used
involved conflicts between reference frames, manipulated by rotating the
man in Figure 1 rather than the protecting object (see for example the
Viking/shield/spear scenes in Figure 1). Rotating the Viking away from the
vertical plane (scenes in the penultimate and last columns in Figure 1),
for example, produces conflicts between the absolute (gravitational) and
intrinsic (object-centred) frames of reference. Thus for these scenes “the
shield is above the Viking” is appropriate for the absolute frame of
reference but inappropriate for the intrinsic frame of reference.
Experiment 3 manipulated the size of the protecting object and Experiment 4
manipulated the completeness of the protecting object. The results for
these experiments were consistent with the results from Experiments 1 and
2. Increasing the size of the object magnified the effect of the function
of the object, while adding holes to the object diminished the size of the
function effect. Additionally, frame of reference conflicts were more in
evidence for above and below than for over and under. Finally, for
Experiment 4 we also asked a second group of participants to estimate the
percentage of rain, for example, that was likely to make contact with the
person for each scene. Correlations between these judgements and ratings
for over, under, above and below mirrored the results for Experiment 2.
Summary of Results. Changing the degree of
protection function affects acceptability ratings. Furthermore, predictions
of what will happen in a scene correlated with acceptability judgements
(but more so for over/under than for above/below). These results indicate
that processing of how objects are functioning in context in the visual
scene is essential to establish the appropriateness of over, under,
above and below, consistent with the work of Glenberg (1997) and Barsalou
(1999). Please note that Experiments 1-4 are in submission to Journal of
Memory and Language (Coventry et al., in submission).
2.1.2 Experiments 5-10: The time course of processing of geometric and extra-geometric information
Experiments 1-4 used acceptability ratings as a
dependent variable. Such a measure does not give information regarding the
time-course of the processing of information present in the scene to be
described. For that reason, we ran a series of sentence-picture
verification tasks where participants were presented with sentences of the
form “The located object is preposition the reference object” followed by a
picture. Participants had to indicate as quickly as possible whether a
sentence was a correct description of the picture that followed.
Experiments 5 and 6 used materials such as those displayed in Figure 1 (but
modified to control for visual complexity, etc.). Experiments 7 and 8
compared small and large objects where reference frames were always aligned
(Experiment 7) or where they conflicted to varying degrees (Experiment 8).
Experiments 9 and 10 mirrored Experiments 7 and 8, only this time complete
and incomplete objects were compared.
Summary of Results. For all experiments, the data
for the mean number of true responses replicate the rating data analyses in
Experiments 1-4. In relation to the time taken to make a true response, the
data were informative about the speed of processing of geometry and
function. Quicker responses were found for functional than for
non-functional scenes, and quicker responses were found for scenes where
the located and reference objects were aligned than when they were
misaligned. Please note that
Experiments 5-10 are in preparation for journal submission (Coventry et
al., in preparation).

Figure 1. Sample scenes used in Coventry, Prat-Sala and Richards (2001)
2.1.3 Experiments 11-15: Experiments for the model
Given the constraints imposed by the visual
processing modules in the computational model we outline below, we ran a number
of experiments using images/movies generated to be easy for the model to
process (see Figure 2), using the same basic methodology as
Experiments 1-4. Experiments 11-15 involved three different reference
objects (a plate, a dish and a bowl), pre-tested in a sorting task and a
rating task to have the prototypical dimensions of these objects, and a
variety of other objects which were all containers (e.g., a jug). Each
container was presented in each of 3 x 2 positions “higher” than the other
objects (representing 3 levels of distance on the x axis and two levels on
the y axis from the other object). Crucially the container was shown to
pour liquid such that it ended up reaching the plate/dish/bowl (the
functional condition), or missed the plate/dish/bowl (non-functional
condition), or liquid was not present. In Experiment 11 participants saw
movies of the pouring scenes (or static scenes for the no liquid condition
given that no movement was involved). The results showed effects of
geometry and function together with interactions between these variables
and over/under versus above/below. Experiment 12 compared the full movies
with just the (single frame) end states, and this established that seeing
the full movie makes no difference to acceptability ratings; it is what
happens to the liquid that counts. Experiment 13 then compared end states
to an earlier frame in the movie showing the liquid starting to protrude
from the pouring container in order to assess whether participants predict
what will happen to the liquid in order to make judgements about the
appropriateness of over/under/above/below. Although acceptability ratings
were overall lower for the predicted scenes rather than the end state
scenes, effects of geometry, function and interactions between these
variables and over/under versus above/below were still present, indicating
that participants do predict where the liquid will go in order to ascertain
the appropriateness of these prepositions. Experiment 14 confirmed this by
finding a correlation between judgements of how much of the liquid will
make contact with the appropriate part of the plate/dish/bowl and
acceptability ratings for over/under/above/below.

Figure 2. An example of frames taken from a movie sequence for a functional scene in
Experiments 11-13.
Experiment 15 tested the relative importance of
geometric and extra-geometric variables for in/on/over/above. The scenes
used involved a solid on top of a pile of other solids in/on a
plate/dish/bowl, and location control was manipulated such that the located
and reference objects were shown to move together at the same rate (strong
location control condition), or the located object was shown to move
independently of the plate/dish/bowl while still remaining in contact with
the other solids in/on the plate/dish/bowl (non-location control
condition), or the scene was presented statically. The geometry of the
scene was also manipulated by varying the height of the pile of objects in/on
the plate/dish/bowl. Results showed that both geometry and degree of
location control affect the appropriateness of in and on to describe
scenes, and these variables interacted with the type of reference object
(plate/dish/bowl; objects are more likely to be on a plate and in a bowl
than vice versa).
Summary of Results. The results from these
experiments show that actual movement or predicted movement over time
influences judgements for over/under/above/below/in/on, but that this
information is weighted as a function of preposition, and of the objects
(plate/dish/bowl) in the scene. These data were used in the modelling work
described in section 2.2.
2.1.4 Additional Experiments
In addition to the experiments described above, in
collaboration with additional researchers we ran a number of further
studies, which included the following:
(a) We also ran Experiments 1-4 in Spanish,
revealing similar effects to English (see Coventry & Guijarro-Fuentes,
2004).
(b) We found the same effects of location control
and geometry found in Experiment 15 using a production methodology with
children aged from 4;1 to 7;1 (see Richards et al., 2004; Richards &
Coventry, in press).
(c) We also found similar influences of geometric
and extra-geometric variables on the time taken to arrange objects given a
set of spatial instructions (see Coventry et al., 2003), but weaker
influences of these variables on “mental” versions of similar problems (see
Coventry et al., 2002).
(d) We also ran 7 experiments using abstract two
dimensional shapes (following the experiments of Regier and Carlson, 2001).
These experiments examined the influence of the shape of the located object
and of the reference object on the acceptability of over/under/above/below.
The analyses (involving multi-level modelling) are still in progress.
2 Development of Computational Model
The computational model for the processing of
visual scenes and the identification of the appropriate spatial preposition
consists of three main modules: (1) Vision Processing, (2) Elman Network,
(3) Dual-Route Network (cf. Figure 3). The first module uses a series of
Ullman-type visual routines to identify the constituent objects of a visual
scene (reference object, located object and liquid). The Elman network
module utilises the output information from the vision module to produce a
compressed neural representation of the dynamics of the scene (e.g.
movement of liquid flow between the reference and located objects). This
compressed representation is given as input to the dual-route (vision and
language) feedforward neural network to produce a judgement regarding the
appropriate spatial terms describing the visual scene. We describe each of
these modules and their development in turn.

Figure 3. Architecture of the computational model. The dotted arrows indicate
functional connections between the three modules.
2.2.1 Vision processing module
In our computational model for spatial language,
visual object recognition, spatial location and motion information are
functionally necessary for the cognitive task. Beginning with the
distinction between “what” versus “where” pathways (classically assumed to
be the functionally segregated ventral and dorsal streams after Ungerleider
and Mishkin, 1982), we also needed to consider the integration of object,
location and motion information when deriving a neurocomputational model.
Our novel neurocomputational approach to object recognition for spatial
cognition represents a compromise between the dynamic operation of the
recurrent neurodynamical models of Deco and Lee (2001) for selective
attention, and Edelman’s (1999) feedforward chorus model for object
recognition, and is conceptually congruent with Ballard et al.’s (1997)
model (i.e. the output of our system is a plausible deictic pointer to
objects in the visual scene). Image sequences (real object images composed
into moving videos) are presented to the model, which processes them at a
variety of spatial scales and resolutions for object form and motion
features yielding a visual buffer (functionally analogous to processing in
the striate visual cortex). In addition to the basic scale representation,
texture, edge and region boundary features are extracted. Motion cells (in
the magnocellular pathway) are modelled as uni-directional brightness
gradient-sensitive cells whose outputs are combined. This is outlined in Figure
4.
The attentional saliency map (Figure 4, Right) is
a very low resolution (retinotopic) array of neurons which receive
bottom-up activation from the static and motion features in the visual
buffer, but which can be strongly inhibited when the region they code for
is attended to or when object recognition is strong enough to require
little further processing of a region. This represents information
integration that might take place involving the kinds of information
processed in the posterior parietal cortex. This is used to direct
attention and once a region is selected (analogous to a kind of spotlight
of attention), the higher-resolution information contained in the visual
buffer is allowed to feedforward to the object recognition stream. Since
attention selects only a windowed region of the whole visual buffer for
processing in IT, our system represents a chorus of object fragments. We
use Gaussian adaptive resonance models to learn the space of fragments for
each object (Williamson, 1996), leading to a probabilistic
implementation.
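The attention-selection loop described above can be illustrated with a minimal sketch. It assumes that bottom-up saliency is a plain sum of static and motion feature maps and that attended regions are damped by a fixed factor; the function name `attend`, the grid size and both parameters are hypothetical and are not taken from the model itself.

```python
import numpy as np

def attend(static_map, motion_map, n_fixations, inhibition=0.9, window=1):
    """Pick fixation cells from a coarse saliency map, inhibiting each attended region."""
    # Bottom-up activation: simple sum of the two feature maps (an assumption).
    saliency = (static_map + motion_map).astype(float)
    fixations = []
    for _ in range(n_fixations):
        row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
        fixations.append((int(row), int(col)))
        # Strongly inhibit the attended neighbourhood so that attention moves on.
        r0, r1 = max(row - window, 0), min(row + window + 1, saliency.shape[0])
        c0, c1 = max(col - window, 0), min(col + window + 1, saliency.shape[1])
        saliency[r0:r1, c0:c1] *= 1.0 - inhibition
    return fixations

# Toy low-resolution maps (grid size chosen for illustration only).
static_features = np.zeros((9, 12))
motion_features = np.zeros((9, 12))
static_features[2, 3] = 0.8   # e.g. a high-contrast object edge
motion_features[6, 7] = 1.0   # e.g. moving liquid

print(attend(static_features, motion_features, n_fixations=2))  # [(6, 7), (2, 3)]
```

In the full system each selected cell would gate which window of the higher-resolution visual buffer is passed on to object recognition; that stage is not sketched here.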

Figure 4. Left: Constituents of the Vision Processing
Module and their relationships with known neural substrates. Right (Top):
Snapshots of the overall saliency map after 9 fixations. Right (Bottom): Multiple
Fragments of Teapot Object (A) Full visual buffer (B) Edges (C)
Region/Boundary and (D) Texture
In the ICONIP02
conference publication (Joyce et al. 2002), we elaborate on the visual
processing and selective attention mechanism and its role in a novel chorus
of fragments framework for object recognition. We show how this may form part of a larger system for spatial
language comprehension and speculatively for prefrontal cortex short term
visual memory and object-place binding (via the perirhinal – entorhinal –
hippocampal network), all of which further ground the understanding of the
visuo-spatial processing in a computational framework.
2.2.2 Elman network module
This module consists of
a predictive, time-delay connectionist network similar to Elman’s (1990)
simple recurrent network, which we refer to hereafter as the Connectionist
Perceptual Symbol System Network (CPSSN; Joyce et al., 2003). Figure 3, middle image, shows the CPSSN
network as an Elman SRN. As a
suitable (and plausible) input representation for the CPSSN, we propose a
“what+where” code (see also Edelman, 2002). That is, the input consists of
an array of some 9x12 activations (representing retinotopically organised
and isotropic receptive fields) where each activation records some visual
stimulus in that area of the visual field. This is the output information
produced by the Vision module. In addition to the “field” representation,
we augment the input with a distributed object identity code. These codes were produced by
an object representation system (Joyce et al. 2002; based on Edelman’s
(1999) theory) using the same videos.
The CPSSN is given one set of activations as input, which feed forward
to the hidden units. In addition, the previous state of the hidden units is
fed to the hidden units simultaneously (to provide a temporal context, as in
Elman’s (1990) SRN model). The hidden units feed forward, producing an output
which is a prediction of the next sequence item. Then, using the actual
next sequence item, back propagation is used to modify weights (see Figure
3) to account for the error. The actual next sequence item is then used as
the new input to predict the subsequent item and so on. Using the coding
scheme discussed, we have a total input vector of length 116 (the 9x12 = 108
field activations plus 8 elements that code for object identity, e.g. liquid,
bowl, cup etc.). The
output is similarly dimensioned, and there were 20 hidden units (and 20
corresponding time-delayed hidden state nodes) to represent movement of the
liquid.
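The prediction step described above can be sketched as a small Elman-style network in NumPy. The layer sizes (116 inputs and outputs, 20 hidden and 20 context units) follow the text; the sigmoid activation, the absence of bias terms, the weight initialisation and the toy one-cell "liquid" sequence are all assumptions made for illustration, not details of the CPSSN.

```python
import numpy as np

N_FIELD, N_CODE, N_HIDDEN = 9 * 12, 8, 20   # 108 field cells + 8 identity units = 116

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SimpleRecurrentNetwork:
    """Input plus copied previous hidden state -> hidden -> prediction of next item."""

    def __init__(self, n_io=N_FIELD + N_CODE, n_hidden=N_HIDDEN, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.1, (n_hidden, n_io))       # input -> hidden
        self.w_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.w_out = rng.normal(0.0, 0.1, (n_io, n_hidden))      # hidden -> output
        self.context = np.zeros(n_hidden)

    def reset(self):
        self.context = np.zeros_like(self.context)

    def step(self, x, target=None, lr=0.25):
        """Predict the next item; if a target is given, apply one backprop update."""
        context = self.context
        hidden = sigmoid(self.w_in @ x + self.w_ctx @ context)
        output = sigmoid(self.w_out @ hidden)
        if target is not None:
            delta_out = (output - target) * output * (1.0 - output)
            delta_hid = (self.w_out.T @ delta_out) * hidden * (1.0 - hidden)
            self.w_out -= lr * np.outer(delta_out, hidden)
            self.w_in -= lr * np.outer(delta_hid, x)
            self.w_ctx -= lr * np.outer(delta_hid, context)
        self.context = hidden   # time-delayed copy used at the next step
        return output

def make_frame(cell, code=(1, 0, 1, 0, 0, 1, 0, 0)):
    """Toy frame: one active field cell plus a made-up 8-unit identity code."""
    field = np.zeros(N_FIELD)
    field[cell] = 1.0
    return np.concatenate([field, np.asarray(code, dtype=float)])

def present(net, sequence, lr=None):
    """Present one sequence (training if lr is given); return the RMS prediction error."""
    net.reset()
    squared_errors = []
    for current, following in zip(sequence[:-1], sequence[1:]):
        target = following if lr is not None else None
        prediction = net.step(current, target=target, lr=lr if lr is not None else 0.0)
        squared_errors.append(np.mean((prediction - following) ** 2))
    return float(np.sqrt(np.mean(squared_errors)))

sequence = [make_frame(cell) for cell in range(0, 70, 10)]   # 7 toy frames
net = SimpleRecurrentNetwork()
rms_before = present(net, sequence)
for _ in range(200):
    present(net, sequence, lr=0.25)
rms_after = present(net, sequence)
print(f"RMS prediction error: {rms_before:.3f} -> {rms_after:.3f}")
```

As in Elman's scheme, the error is propagated back only one step: the context units are treated as ordinary inputs rather than being unrolled through time.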
The network training
regime was as follows: a collection of sequences are shown to the network
in random order (but of course, the inputs within a sequence are presented
one after another). Each sequence contains a field and object code for the
“liquid” in the videos. Multiple CPSSN networks would be required to
account for the other objects in the scenes. A root-mean-square error measure is used to monitor the
network’s performance, and the ordering of sequences is changed each time
(to prevent destructive interference between the storage of each sequence).
Initially, the network is trained with a learning rate of 0.25, and after
the RMS error stabilises, this is reduced to 0.05 to allow finer
modifications to weights. For 6 sequences, a total of about 150
presentations are required (each sequence is therefore presented 25 times)
to reduce RMS averaged over the whole training set from around 35 to around
0.4.
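The presentation order and two-stage learning rate described above can be sketched as below. The rule used to decide that the RMS error has "stabilised" (a small range over the last few values) is an assumption, as are the names `presentation_order` and `TwoStageRate`; the rates 0.25 and 0.05 and the 6 x 25 = 150 presentations come from the text.

```python
import random

def presentation_order(n_sequences, n_passes, seed=0):
    """Yield sequence indices, reshuffled on every pass over the training set."""
    rng = random.Random(seed)
    indices = list(range(n_sequences))
    for _ in range(n_passes):
        rng.shuffle(indices)
        yield from indices

class TwoStageRate:
    """Coarse learning rate until the RMS error stabilises, then a fine rate (latched)."""

    def __init__(self, coarse=0.25, fine=0.05, window=3, tolerance=0.01):
        self.rate, self.fine = coarse, fine
        self.window, self.tolerance = window, tolerance
        self.history = []

    def update(self, rms):
        """Record the latest RMS error and return the rate to use next."""
        self.history.append(rms)
        recent = self.history[-(self.window + 1):]
        # Hypothetical stability test: the last few errors barely differ.
        if len(recent) > self.window and max(recent) - min(recent) < self.tolerance:
            self.rate = self.fine
        return self.rate

order = list(presentation_order(n_sequences=6, n_passes=25))
print(len(order))  # 150 presentations, 25 per sequence
```

In use, each index in `order` would select the sequence to present next, and `update` would be called with the RMS error measured after each pass.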
This network hetero-associates successive steps in the sequence of
fields, but in addition, the network is performing compression and
redundancy reduction (in the hidden layer) as well as utilising the state
information in the time-delayed state nodes. It is also coding for the
changes between sequence items (e.g. the dynamics of how the object moves
over time) rather than coding individual sequence items (which would be
auto-association). The model
embodies the idea that representation is inherently dynamic (cf. Freyd,
Pantzer & Cheng, 1988). The network should, naturally, be able to make
a prediction about a sequence given any item in the sequence. Intuitively,
the network should be capable of this in the case where a cue is the first
item of a sequence, since the time-delayed state is irrelevant (i.e., there
can be no temporal context accumulated in the time-delay nodes). However,
we propose that the network is a mechanism for implementing perceptual
symbols, and therefore, a requirement is that it can “replay” the
properties of the visual episode that was learned. Given a cue, the network
should produce a prediction, which can be fed-back as the next input to
produce a sequence of “auto-generated” predictions about a sequence (viz, a
perceptual symbol). Indeed, this network is able to predict the final
outcome of the visual scenes. Prediction data were reported in the ICCM
conference publication (Joyce et al., 2003). These were also used in the
final part of the project, to study the predictive ability of the overall
computational model.
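The "replay" idea above, in which predictions are fed back as the next input, can be sketched independently of any particular network. Here `predict_next` stands for any one-step predictor; the stand-in used in the example simply shifts a one-hot vector and is purely hypothetical, not a trained CPSSN.

```python
import numpy as np

def replay(predict_next, cue_frames, n_generate):
    """Prime a one-step predictor with cue frames, then feed its predictions back in."""
    if len(cue_frames) == 0:
        raise ValueError("at least one cue frame is required")
    prediction = None
    for frame in cue_frames:          # open loop: real frames build up temporal context
        prediction = predict_next(frame)
    generated = []
    for _ in range(n_generate):       # closed loop: each prediction becomes the next input
        generated.append(prediction)
        prediction = predict_next(prediction)
    return generated

def shift_right(frame):
    """Stand-in predictor: the active cell moves one position per step."""
    return np.roll(frame, 1)

cue = [np.eye(7)[i] for i in range(3)]             # active cell at positions 0, 1, 2
generated = replay(shift_right, cue, n_generate=4)
print([int(np.argmax(f)) for f in generated])      # [3, 4, 5, 6]
```

With a recurrent predictor, the cue frames matter beyond the last one because they set the hidden state before the closed-loop phase begins.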
2.2.3 Dual-route network
The dual-route network
is a feedforward neural network (3-layer perceptron) that receives as input
the grounded “visual” information (hidden activations of the Elman
networks) and linguistic data (name of located object, name of reference
object, name of liquid + 4 spatial
prepositions over, above, below, under). As output it must reproduce
(auto-associate) the same visual data, and produce the names of the objects,
which are directly grounded in the input visual data. In addition, the four
output units for the spatial prepositions will encode the rating values given
by subjects. This architecture is directly inspired by dual-route networks
for the grounding of language (Plunkett et al., 1992; Cangelosi et al.,
2000).
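A forward pass through a network of this general shape can be sketched as follows. The unit counts of 30 vision, 6 object-name and 4 preposition units are those reported for the output layer later in this section; the mirrored input layout, the hidden layer size, the sigmoid activation and the random untrained weights are assumptions made only to keep the example runnable.

```python
import numpy as np

N_VISION, N_NAMES, N_PREPS = 30, 6, 4   # unit counts reported for the output layer
N_HIDDEN = 25                            # hypothetical; not stated in the report
N_IO = N_VISION + N_NAMES + N_PREPS

rng = np.random.default_rng(0)
w_hidden = rng.normal(0.0, 0.1, (N_HIDDEN, N_IO))   # input -> hidden (untrained)
w_output = rng.normal(0.0, 0.1, (N_IO, N_HIDDEN))   # hidden -> output (untrained)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(vision, names, prepositions):
    """Map visual and linguistic input units to the three groups of output units."""
    x = np.concatenate([vision, names, prepositions])
    hidden = sigmoid(w_hidden @ x)
    y = sigmoid(w_output @ hidden)
    return {
        "vision": y[:N_VISION],                       # auto-associated visual data
        "names": y[N_VISION:N_VISION + N_NAMES],      # object names
        "prepositions": y[N_VISION + N_NAMES:],       # one unit per preposition
    }

# Visual route only: linguistic inputs are left at zero and language is read off the output.
scene = rng.random(N_VISION)
out = forward(scene, np.zeros(N_NAMES), np.zeros(N_PREPS))
print(out["prepositions"])
```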
This network is trained
via the error backpropagation algorithm. The training and test sets consist
of the 216 scenes. These are the same as those used in the experiment on
the rating of over, above, under, below (Experiment 11 above). Of these
stimuli, 195 are used for the training and 21 for the generalisation test.
The overall objective of the training is that the network must learn to
produce the same average ratings for the four prepositions. We did not use
the average ratings as the teaching input, because this was against the
principle of mutual exclusivity (Markman, 1987). During standard backpropagation
training, the use of the ratings as teaching input assumes that the same
scene must be simultaneously associated to the use of all four prepositions
(each with an activation value proportional to the subjects’ average
rating). Instead, during developmental learning subjects tend to choose
only one preposition to describe a scene. Naturally, the probability of
choosing one preposition to describe a spatial relation is correlated to
its level of appropriateness (i.e. similar to ratings). Therefore, to
simulate such a learning strategy better, the original ratings of each
scene-preposition pair were converted into frequency of presentation of a
stimulus with an associated localist teaching input (where the output unit
of the chosen preposition is 1 and the other three units are 0). To obtain
such a frequency, the original average ratings were scaled and normalised
within each scene and also within the whole training set. For example,
individual prepositions’ ratings of 7.08 (above), 7.12 (below), 3.96
(over), 4.32 (under) respectively correspond to presentation frequencies of
28, 28, 7 and 9. The conversion of ratings into presentation frequencies resulted in an
epoch of 2100 stimuli.
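The general idea of turning ratings into presentation frequencies with localist targets can be sketched as below. The exact scaling and normalisation used in the project are not specified here, so this example uses plain proportional sharing as a hypothetical stand-in; it does not reproduce the counts 28, 28, 7 and 9 quoted above, which evidently result from a sharper scaling. The budget of 72 presentations is simply the sum of those four counts.

```python
PREPOSITIONS = ("above", "below", "over", "under")

def ratings_to_frequencies(ratings, presentations_per_scene):
    """Share out a scene's presentations in proportion to its mean ratings.

    Hypothetical scheme: the report's own scaling/normalisation is not reproduced.
    """
    total = sum(ratings.values())
    return {prep: round(presentations_per_scene * rating / total)
            for prep, rating in ratings.items()}

def localist_targets(frequencies):
    """Expand frequencies into one-hot teaching vectors, one per presentation."""
    targets = []
    for index, prep in enumerate(PREPOSITIONS):
        one_hot = tuple(1 if i == index else 0 for i in range(len(PREPOSITIONS)))
        targets.extend([one_hot] * frequencies[prep])
    return targets

# Mean ratings for the example scene given in the text.
example = {"above": 7.08, "below": 7.12, "over": 3.96, "under": 4.32}
freqs = ratings_to_frequencies(example, presentations_per_scene=72)
targets = localist_targets(freqs)
print(freqs)
print(len(targets), "presentations, each with exactly one active preposition unit")
```

Whatever scaling is used, the key property is the same: higher-rated prepositions are presented more often, and every individual presentation teaches exactly one preposition.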
Three networks were
trained using different initial random weights and different random sets of
21 generalisation test stimuli. The training parameters included a
learning rate of 0.01 and momentum of 0.8, and a total number of training
epochs of 500. The average final error (RMS) for the 30 vision units was
0.008 for both training and testing data, and 0.003 for the 6 output units
of the object names. More importantly, for the 4 spatial preposition output
units, the error was 0.044 with training data and was 0.05 with
generalisation data. The error values in the preposition units were
calculated off-line by comparing the actual output of the 4 preposition
units and the rating data (from Experiment 11) converted to produce the
stimulus frequencies (the actual error values used for the weight
correction are always higher because they use localist teaching input).
These results indicate that the networks produce rating values
similar to those of experimental subjects. They also indicate that the
training algorithm based on presentation frequency, instead of rating
teaching input, works well and provides a psychologically-plausible
learning regime. Similar results have also been found for in/on/over/above
using the scenes and data from Experiment 15.
3 Interplay between Experimental and Computational Work
During the research, the development of the
computational model has been conducted in parallel with experimental
investigations. However, in the first part of the project the experimental
work mostly influenced the model design. For example, in the previous
section we explained that the training/test stimuli and the rating values
were taken directly from one experiment. In the final months of the
research, it was the model that shaped some of the directions and
objectives of the experimental investigation. In particular, new simulations
produced some predictions that were subsequently tested in new experiments.
Research on the design and test of the Elman
module had shown that these networks were able to predict and auto-generate
the final outcome of the visual scenes, once they were given an initial cue
(e.g. a few initial frames). The network would produce the next prediction
frames, which were fed-back as the next input. To integrate such prediction
ability in the overall spatial language model, the hidden activation values
of these auto-generated sequences were used as visual input of the
dual-route network. The model was then run as usual to produce the ratings
of the 4 prepositions.
To establish if the new ratings provided by the
model were consistent with those produced by real subjects, a new
experiment was conducted (Experiment 13, see above). The results for this
experiment, together with the results of Experiment 14, strongly suggest
that subjects had to mentally “play” the visual scene and auto-generate the
outcome of the scene to rate the linguistic utterance. This is very similar
to what the model does, when the Elman network auto-generates the visual
scene, and the dual-route network uses the Elman net’s activations to
produce new ratings. The Elman network used the first 3 out of 7 frames.
This corresponds to the frames 0, 10 and 20 (Elman networks only see a
frame every 10). The comparison of the subjects’ rating data and the
networks’ output of the 4 prepositions resulted in an RMS error of 0.051
(Figure 5). This is a very low error level, and confirms that the model had
predicted the ratings very accurately. Overall, this result and those on
the dual-route tests support the development of a psychologically-plausible
model for spatial language.
A paper which combines the results of Experiments
11-16 with a comprehensive outline of the development and testing of the
full model is in preparation for submission to Cognitive Psychology
(Coventry et al., in preparation).

Figure 5. Output of the Elman network for the auto-generated
prediction of the outcome of the liquid. The top 7 figures correspond to
the actual output of the network. The frames in the bottom row indicate the
final 4 target frames (the network receives the first 3 target frames).
References
Ballard, D. H., Hayhoe, M. M., Pook, P. K. and Rao,
R.P.N. (1997) Deictic Codes for the Embodiment of Cognition. Behavioral
and Brain Sciences, Vol. 20, pp. 723-767.
Barsalou, L. W. (1999). Perceptual symbol systems.
Behavioral and Brain Sciences, 22(4), 577-660.
Cangelosi A., Coventry K.R. et al. (in preparation),
Grounding spatial language in perception. 9th Neural Computation and
Psychology Workshop: Modelling Language, Cognition and Action
Cangelosi A., Greco A. & Harnad S. (2000). From
robotic toil to symbolic theft: Grounding transfer from entry-level to
higher-level categories. Connection Science, 12(2), 143-162
Cangelosi A., Martinez G.C. (2001). Neural networks for
spatial language processing using virtual reality. World Congress on
Neuroinformatics: Part II Proceedings, ARGESIM Verlag.
Coventry K.R., Cangelosi A. et al. (in preparation), A
computational and experimental model for the geometric functions framework
in spatial language. Cognitive Psychology
Coventry K.R., Richards L., Joyce D. & Cangelosi A. (in
preparation), Towards a psychological plausible model of spatial language
processing. Journal of Memory and Language
Coventry, K. R. & Garrod, S. C. (2001). Towards the
development of a psychologically-plausible model for spatial language
comprehension embodying geometric and functional relations. Proceedings of
the 2nd Annual Language and Space workshop: Defining Functional and Spatial
Features. University of Notre Dame, Indiana.
Coventry, K. R. & Garrod, S. C. (2004). Saying,
Seeing and Acting: The Psychological Semantics of Spatial Prepositions.
Essays in Cognitive Psychology Series. Psychology Press. Hove and New York.
Coventry, K. R. & Garrod, S. C. (in press). Spatial
prepositions and the functional geometric framework. Towards a
classification of extra-geometric influences. In L. A. Carlson & E. van
der Zee (Eds.), Functional features in language and space: Insights from
perception, categorization and development. Oxford University Press.
Coventry, K. R. & Guijarro-Fuentes, P. (2004). Las
preposiciones en español y en inglés: la importancia relativa del espacio y
función. (Spatial prepositions in Spanish and English: the relative
importance of space and function). Cognitiva, 16(1), 73-93.
Coventry, K. R. & Olivier, P. (Eds.) (2002). Spatial
Language. Cognitive and Computational Perspectives. Dordrecht, the
Netherlands; Kluwer Academic Publishers, pp283.
Coventry, K. R. (2003). Spatial prepositions, spatial
templates and “semantic” versus “pragmatic” visual representations. In E.
van der Zee and J. Slack (Eds.), Representing Direction in Language and
Space, pp255-267. Oxford University Press.
Coventry, K. R., Cangelosi, A., Joyce, D. &
Richards, L. V. (2002). Putting geometry and function together - Towards a
psychologically-plausible computational model for spatial language
comprehension. In W. D. Gray & C. D. Schunn (Eds.), Proceedings of the
Twenty-fourth Annual Conference of the Cognitive Science Society, p33.
Lawrence Erlbaum Associates, Mahwah, NJ.
Coventry, K. R., Prat-Sala, M., & Richards, L.
(2001). The interplay between geometry and function in the comprehension of
‘over’, ‘under’, ‘above’ and ‘below’. Journal of Memory and Language, 44,
376-398.
Coventry, K. R., Venn, S. & Armstead, P. (2002).
Object knowledge and the construction of spatial mental models. Cahiers de
Psychologie Cognitive, 21(6), 635-652.
Coventry, K. R., Venn, S. F., Smith, G. D. & Morley,
A. M. (2003). Spatial problem solving and functional relations. European
Journal of Cognitive Psychology, 15(1), 71-99.
Deco, G. and T.S. Lee (2002) A Unified Model of Spatial
and Object Attention Based on Inter-cortical Biased Competition. In Press,
Neural Computation.
Edelman, S. Representation and Recognition in Vision,
MIT Press, 1999.
Edelman, S. (2002) Constraining the Neural Representation
of the Visual World. Trends in Cognitive Sciences, Vol. 6, pp. 125-131
Elman, J.L. (1990). Finding structure in time. Cognitive
Science, Vol. 14, 179-211
Freyd, J.J. and Finke, R.A. (1984) Representational
Momentum. Journal of Experimental Psychology: Learning, Memory and
Cognition, Vol. 10, pp. 126-132
Gapp, K.-P. (1995). Angle, distance, shape, and their
relationship to projective relations. In J. D. Moore, & J. F. Lehman
(Eds.), Proceedings of the 17th Annual Conference of the Cognitive Science
Society (pp. 112-117). Mahwah, NJ: Cognitive Science Society.
Glenberg, A. M. (1997). What memory is for. Behavioral
and Brain Sciences, 20(1), 1-55.
Herskovits, A. (1986). Language and Spatial Cognition.
An interdisciplinary study of the prepositions in English. Cambridge
University Press.
Joyce D., Richards L., Cangelosi A., Coventry K.R.
(2002), Object representation-by-fragments in the visual system: A
neurocomputational model. In L. Wang, J.C. Rajapakse, K. Fukushima, S.Y.
Lee, X. Yao (Eds), Proceedings of the 9th International Conference on
Neural Information Processing (ICONIP'02). IEEE Press, Singapore.
Joyce, D. W., Richards, L. V., Cangelosi, A. &
Coventry, K. R. (2003). On the foundations of perceptual symbol systems:
Specifying embodied representations via connectionism. In F. Detje, D.
Dörner & H. Schaub (Eds.), The Logic of Cognitive Systems. Proceedings
of the Fifth International Conference on Cognitive Modelling, pp147-152.
Universitäts-Verlag Bamberg, Germany.
Landau, B., & Jackendoff, R. (1993). 'What' and
'where' in spatial language and cognition. Behavioral and Brain Sciences,
16(2), 217-265.
Landauer, T., & Dumais, S. (1997). A solution to
Plato’s problem: the latent semantic analysis theory of acquisition, induction
and representation of knowledge. Psychological Review, 104, 211-240.
Logan, G. D., & Sadler, D. D. (1996). A
computational analysis of the apprehension of spatial relations. In P.
Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (Eds.), Language and
Space (pp. 493-530). Cambridge, Mass.: MIT Press.
Martinez, G. C., Cangelosi, A. & Coventry, K. R.
(2001). A hybrid neural network and virtual reality system for spatial
language processing. Proceedings of the 2001 International Joint Conference
on Neural Networks. IEEE Press. vol. 1, 16-21 Washington DC.
Plunkett, K., Sinha, C., Møller, M. F. & Strandsby, O.
(1992). Symbol grounding or the emergence of symbols? Vocabulary growth in
children and a connectionist net. Connection Science, 4(3-4), 293-312.
Regier, T. (1996). The human semantic potential: Spatial
language and constrained connectionism. Cambridge Mass.: MIT Press.
Regier, T., & Carlson, L.A. (2001) Grounding spatial
language in perception: An empirical and computational investigation.
Journal of Experimental Psychology: General, 130(2), 273-298.
Richards, L. V. & Coventry, K. R. (2001). Children’s
production of locative prepositions in English; the influence of geometric
and extra-geometric factors. In Proceedings of the 2nd Annual Language and
Space workshop: Defining Functional and Spatial Features. University of
Notre Dame, Indiana.
Richards, L. V. & Coventry, K. R. (in press).
Children’s production of locative prepositions in English; the influence of
geometric and extra-geometric factors. In L. A. Carlson & E. van der
Zee (Eds.), Functional features in language and space: Insights from
perception, categorization and development. Oxford University Press.
Richards, L. V., Coventry, K. R. & Clibbens, J.
(2004). Where’s the orange? Geometric and extra-geometric factors in
English children’s talk of spatial locations. Journal of Child Language,
31, 153-175.
Talmy, L. (1983). How language structures space. In H.
Pick, & L. Acredolo (Eds.), Spatial Orientation: Theory, research and
application (pp. 225-282). New York: Plenum Press.
Williamson, J. R. (1996). Gaussian ARTMAP: A Neural Network for Fast
Incremental Learning of Noisy Multidimensional Maps. Neural Networks,
9(5), pp. 881-897.