Objectives
The aim of this research project is to
investigate the processes and mechanisms leading to an integration of vision,
action and language in natural cognitive systems (namely human
participants) and to use this advancement in knowledge for the design of
psychologically plausible artificial cognitive agents (simulated robots)
able to communicate about the world they perceive and act upon it. This
scientific and technological aim will be achieved through the following
research objectives:
o To explore the interface between language, action and vision through
eye-tracking experiments and studies of the action component of object
representation (e.g. microaffordances)
o To identify the time-course of action, vision and language integration
processes in tasks requiring selective attention, object search, and object
manipulation and assembly under verbal instructions
o To develop a cognitive robotic model capable of object manipulation and
language use, based on the psychologically plausible embodied cognition
principles identified through the first two objectives above.
Methodology
This collaborative research project reflects an
interdisciplinary approach based on a combination of computational
modelling (cognitive robotics and neural networks) and experimental
methodologies (stimulus-response compatibility studies and eye-tracking experiments).
The cognitive robotic platform developed during the project will serve as a
tool to test the feasibility of the vision/action/language integration
mechanisms identified during experimental studies, in addition to
demonstrating the technological potential of such an approach. Observation
and analysis of the robot’s cognitive and linguistic capabilities will also
allow new predictions about the mechanisms integrating vision, action and
language to be generated and tested. The replication in a robotic model
of the psychological phenomena observed in experimental studies will have
the advantage of permitting the fine analysis and understanding of the
neural and behavioural processes that contribute to action-vision-language
integration (Cangelosi & Parisi 2002).
Empirical studies will be based on two main
experimental approaches: (i) a stimulus-response compatibility paradigm and
(ii) eye-tracking studies. The stimulus-response compatibility (SRC)
procedure will vary the relationship between the response to a target
object and the actions associated with that object and with non-target
objects. For example, participants will respond with precision or power
grips to some property (category or shape) of a three-dimensional target
object. Both targets and non-targets may be compatible or incompatible with
the response. Two sets of human behavioural investigations will employ this
procedure. The first set (experiments 1.1–1.4 described in section 2.3) will
investigate the role of attention in visuo-motor integration. Ellis, Tucker, Symes and
Ellis (in press) show that in selecting a target object, the actions
associated with non-target (distractor) objects are inhibited. We will
extend these findings to gather behavioural data on the time course of
action inhibition and potentiation during target object selection in
multi-object scenes. These data will inform the robotic model (see below)
in which object selection is the outcome of competition between
vision-action assemblies in a distributed system. The second set of SRC
experiments (experiments 2.8–2.12 described in section 2.3) will
investigate the interface between language and visual objects by
introducing object names as distractors and targets.
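As an illustration of the measure that the SRC procedure yields, the compatibility effect can be sketched as the difference in mean reaction time between incompatible and compatible trials. The trial records, field order and values below are hypothetical and serve only to show the calculation; they are not data or analysis code from the project.

```python
from statistics import mean

# Hypothetical trial records: (grip used to respond, grip afforded by the
# target object, reaction time in ms). Values are invented for illustration.
trials = [
    ("precision", "precision", 520),
    ("precision", "power", 575),
    ("power", "power", 510),
    ("power", "precision", 560),
]


def compatibility_effect(trials):
    """Mean RT on incompatible trials minus mean RT on compatible trials."""
    compatible = [rt for response, afforded, rt in trials if response == afforded]
    incompatible = [rt for response, afforded, rt in trials if response != afforded]
    return mean(incompatible) - mean(compatible)


# A positive value means responses were slower when the response grip did not
# match the grip afforded by the object.
effect_ms = compatibility_effect(trials)
```

The same calculation could be applied with the compatibility of a non-target (distractor) object as the grouping factor, which is the comparison of interest for the inhibition studies described above.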
A complementary set of behavioural studies will
be based on the eye-tracking methodology. This permits the identification
of the time-course of visuo-attentional processes in action and language
processing and will provide evidence converging with that of the SRC studies on object
selection. Eye-tracking data will also be used to constrain the behavioural
and attentional strategies used by the simulated cognitive robots during
tasks involving object naming and selection. In eye-tracking experiments we
will show arrays of novel objects and study three levels of action representation.
At the encoding level, we manipulate the location and onset time of a
visual detection probe in this array to reveal how observers attend and
prepare their actions (Fischer et al., in press). At the
representational/linguistic level, we present auditory object names and
register the observer’s eye movements towards the named objects (visual
world paradigm, e.g. Altmann & Kamide, 2004). Linguistic manipulations,
such as using phonological competitors (“candle-candy”), reveal the time
course of the interplay between covert and overt attention and the relative
strength of top-down (linguistic) vs bottom-up visual control over action
prediction. Finally, at the execution level, we instruct participants to
pick up the named object and record their overt manual responses (e.g.,
Chambers et al., 2002, 2004). Orthogonal to these three levels of
embodiment, we gradually associate each novel object with a particular name
and manual response, and we design object arrays with congruent and
incongruent response requirements. This learning approach enables us to
track embodied concept acquisition and its implications for action control,
separately at the encoding, linguistic/representational, and execution
levels.
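Time-course data from the visual world paradigm are commonly summarised as the proportion of trials on which each object is fixated in successive time bins after the onset of the spoken name. The sketch below assumes a simplified, hypothetical data layout (one fixated-object label per time bin per trial) and invented values; it is not the project's analysis pipeline.

```python
# Hypothetical eye-tracking samples: for each trial, the object fixated in
# successive time bins after the onset of the spoken name (e.g. "candle",
# with "candy" as the phonological competitor). Values are invented.
trials = [
    ["none", "competitor", "target", "target"],
    ["none", "target", "target", "target"],
    ["none", "competitor", "competitor", "target"],
    ["distractor", "target", "target", "target"],
]


def fixation_proportions(trials, label):
    """Proportion of trials on which `label` is fixated, per time bin."""
    n_bins = len(trials[0])
    return [
        sum(trial[b] == label for trial in trials) / len(trials)
        for b in range(n_bins)
    ]


target_curve = fixation_proportions(trials, "target")
competitor_curve = fixation_proportions(trials, "competitor")
```

In this toy example the competitor attracts as many early fixations as the target and then loses them, which is the kind of pattern the phonological competitor manipulation is designed to reveal.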
These behavioural studies will motivate the development
of a new robotic model that allows mutual interactions between language and
visual object representation and supports analysis of the time course of vision,
action and language processes. The robotic agents developed in this project
will consist of a simulated robot with a head, a torso and two arms and
hands. The simulator will implement a body configuration (sensors and
actuators) based on the upper torso of the humanoid platform iCub. Robotic
agents will be trained to interact with objects, such as artefacts and
tools (grasp cup, use hammer), so that agents acquire a sensorimotor and
functional (microaffordance) representation of the objects through eye and
hand movements. The visual input to the robot’s neural controller will
consist of pre-processed information regarding object properties (e.g.
size, colour and location). This information will be obtained directly
from the physics simulator. The extraction of visual features for further
processing and integration (within connectionist networks) with motor
representation will be based on Ullman’s visual routines approach. This
hybrid vision/connectionist approach was developed by Cangelosi and
collaborators in their previous EPSRC grant GR/N38145 on the perceptual
grounding of spatial terms (e.g. Joyce et al. 2003). In addition to vision,
the robot will receive tactile information and proprioceptive data on its
own body posture. Agents will also be trained to label objects and actions.
A connectionist network will be used to learn and guide the behaviour of
the robot and to acquire embodied representations of objects and actions.
The neural architecture will also have recurrent structures to permit
information integration and the execution of actions such as grasping (e.g.
Marocco et al. 2003). After training, the robot will be used to simulate
vision-action-language experiments.
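A minimal sketch of the kind of recurrent controller described above is given here, assuming a simple Elman-style network in which the hidden state from the previous time step feeds back into the current one. The class name, layer sizes, input encoding and random untrained weights are all assumptions made for illustration; the architecture and training procedure actually used in the project are not specified in this section.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SimpleRecurrentController:
    """Hypothetical Elman-style controller: sensors -> hidden -> motors."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(n_hidden, n_in, rng)
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(n_out, n_hidden, rng)
        self.hidden = [0.0] * n_hidden

    def step(self, inputs):
        """Advance one time step and return motor activations in (-1, 1)."""
        from_input = matvec(self.w_in, inputs)
        from_memory = matvec(self.w_rec, self.hidden)
        self.hidden = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
        return [math.tanh(o) for o in matvec(self.w_out, self.hidden)]


# Assumed input encoding: [size, colour, x, y, touch sensor, joint angle].
controller = SimpleRecurrentController(n_in=6, n_hidden=8, n_out=3)
observation = [0.4, 0.9, 0.1, -0.3, 0.0, 0.25]
first = controller.step(observation)
second = controller.step(observation)
```

Because the hidden state carries over between calls, the same observation produces different motor activations on successive steps, which is the property that lets a recurrent controller integrate information over the course of an action such as grasping.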
The work on the development of robotic agents
will be based on the combination of epigenetic robotics methodologies and
“embodied connectionist” modelling. Epigenetic (developmental) robotics is
based on the use of embodied robotic systems that are situated in a
physical and social environment and are subject to a prolonged epigenetic
developmental process for the acquisition of cognitive capabilities (Weng
et al. 2001). Embodied connectionism refers to the use of artificial neural
networks for the learning and control of behaviour in cognitive robotic
agents. The integration of robotics and connectionist methodologies permits
the transfer of the principles and advantages of connectionism and parallel
distributed processing systems into embodied robotic agents (Cangelosi
& Riga 2006). Cangelosi and collaborators at the ABC group in Plymouth
have already used such a methodology to study basic action manipulation
tasks (Cangelosi & Riga 2006; Massera, Nolfi & Cangelosi 2006) and
successfully extended it to a repertoire of more than 100 action combinations (Tikhanoff
et al. 2007). Detailed analyses of the neural network activity in
controlling behaviour and of the time-course of the processes and
representations activated by the robot’s neural controller will be used to
better understand behaviour observed in human participants and to derive
novel predictions about interactions between vision, action and language.