Resistive grid as a value map
Application to route finding and obstacle avoidance, planning a sequence of robot arm
movements, support for learning in delayed reward cases, pole balancing.
Guido Bugmann, Raju Bapi, Brendan D'Cruz, Mike Denham,
Kaspar Althoefer and John G. Taylor (King's College London).
A study of the literature on reinforcement learning in the field of control, covering
methods such as the ACE/ASE system of Barto, Sutton and Anderson (1983) and
temporal difference learning by Sutton (1988), indicated that the generation of a relevant value
map was central in these methods. It was also a very slow process due, in part, to the aim of
finding an optimal value map, which requires that updates result from following an optimal
policy. Thus, only the value of the current state was updated. Further, as dynamic programming
techniques are used, a state had to be visited several times before the correct value map emerged.
Slowness was also due to a random exploratory component required to ensure a satisfactory
convergence of values and policy. The design of the exploration method was very much a matter of
personal taste and affected the final map.
1. In our proposed method, the concept of optimal map is abandoned and the emphasis is put on a
complete exploration. From each state, all accessible neighbours are visited and their values used
to update the state's value. Several updating cycles are also required to produce a smooth
interpolating map. But, as there is no need to follow a path (policy), this is an inherently parallel
process, potentially very fast.
2. As an example, a neural network is proposed which implements a resistive grid. The nodes in the
net corresponding to obstacles or forbidden states have their output clamped to zero. The node
corresponding to the goal state has its value clamped to 1. All other nodes set their output to
intermediate values, equal to the average of the output of their neighbours. This implements a
Laplacian net (Bugmann, Taylor, Denham, 1994).
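As an illustrative sketch (the grid size, the four-neighbour connectivity and the greedy route extraction below are assumptions for the example, not the network of the cited papers), the Laplacian net amounts to a clamped relaxation, after which a route can be read off by always stepping uphill in value:

```python
import numpy as np

def laplacian_value_map(shape, goal, obstacles, n_iter=2000):
    """Resistive-grid relaxation: the goal node is clamped to 1, obstacle
    nodes to 0, and every free node is repeatedly set to the mean of its
    four nearest neighbours (Jacobi iteration on the Laplace equation)."""
    v = np.zeros(shape)
    for _ in range(n_iter):
        padded = np.pad(v, 1, mode="edge")          # replicate borders
        v = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
             padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        v[goal] = 1.0                               # clamp goal node
        for rc in obstacles:                        # clamp forbidden nodes
            v[rc] = 0.0
    return v

def route(v, start, goal):
    """Greedy route extraction: from each cell, step to the neighbour
    with the highest value until the goal is reached."""
    path, pos = [start], start
    for _ in range(v.size):                         # safety bound
        if pos == goal:
            break
        r, c = pos
        neighbours = [(r + dr, c + dc)
                      for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= r + dr < v.shape[0] and 0 <= c + dc < v.shape[1]]
        pos = max(neighbours, key=lambda rc: v[rc])
        path.append(pos)
    return path
```

Because the relaxed map is (approximately) harmonic, it has no spurious local maxima in free space, so the greedy ascent detours around walls of clamped-to-zero nodes without getting stuck.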
3. This form of value map has been applied mainly to problems where actions caused only
transitions between nearest neighbours: Route finding in 2-Dimensional problems (Bugmann,
Taylor and Denham, 1994; Bugmann, 1996) or planning a sequence of robot arm movements
(Althoefer and Bugmann, 1995).
4. The collaboration with Kaspar Althöfer in the field of robot arm control led to a number of
interesting side results. i) It was found that the forbidden regions in configuration space due to
obstacles in the workspace can be computed by superposing the elementary forbidden regions due to
all elementary components of the obstacles (Althöfer et al., 1995). ii) It was found that,
due to the typical shapes of the free space in the robot arm problem, a wave-type updating method
allows a useful value map to be computed in fewer than ten iterations (Althöfer, Fraser and Bugmann,
1995). iii) A new sequence learning network based on RBF neurons was designed to produce a
smooth and precise movement, despite the coarse angular coding in the resistive grid (Althöfer and
Bugmann, 1995).
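The wave-type updating is not spelled out above; as a hedged sketch of the idea, a single breadth-first wave expanding from the goal already produces a value map that decreases monotonically with grid distance, in one pass over the free space (the decay factor is an illustrative assumption):

```python
from collections import deque

def wavefront_value_map(shape, goal, obstacles, decay=0.95):
    """One breadth-first wave from the goal: each newly reached free
    node is assigned decay times the value of the node that reached it,
    so values fall off monotonically with grid distance from the goal."""
    values = {goal: 1.0}
    blocked = set(obstacles)
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < shape[0] and 0 <= nxt[1] < shape[1]
                    and nxt not in blocked and nxt not in values):
                values[nxt] = decay * values[(r, c)]
                frontier.append(nxt)
    return values
```

Each free node is visited exactly once, so the cost is linear in the number of grid nodes, in contrast to the many sweeps needed by plain relaxation.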
5. The resistive grid technique was also used in a case where actions do not necessarily cause
transitions between nearest neighbours in the grid: the pole balancing problem, where there are
only 2 possible actions, which cause the system to move within a 4-dimensional state
space. In this case, the neuro-resistive grid proved a useful replacement for the ACE element and led
to faster learning. The controller using a resistive grid as value map also showed a smaller dynamic
range in terms of starting positions of the pole, possibly indicating that the resistive grid may have
settled to a given value map too fast (Bapi, D'Cruz and Bugmann, 1996).
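As a hedged illustration of control from a value map with only two actions, the controller can simulate both candidate actions one step ahead and apply the one whose successor state scores highest. The dynamics below are the standard cart-pole equations (as in Barto, Sutton and Anderson, 1983); the bang-bang force magnitude and the value function passed in are assumptions for this example, not the neuro-resistive grid of the cited paper:

```python
import math

def pole_step(state, force, tau=0.02):
    """One Euler step of the standard cart-pole dynamics; a stand-in
    simulator for the 4-dimensional state (x, x_dot, theta, theta_dot)."""
    x, x_dot, th, th_dot = state
    g, m_cart, m_pole, length = 9.8, 1.0, 0.1, 0.5
    total = m_cart + m_pole
    tmp = (force + m_pole * length * th_dot ** 2 * math.sin(th)) / total
    th_acc = (g * math.sin(th) - math.cos(th) * tmp) / (
        length * (4.0 / 3.0 - m_pole * math.cos(th) ** 2 / total))
    x_acc = tmp - m_pole * length * th_acc * math.cos(th) / total
    return (x + tau * x_dot, x_dot + tau * x_acc,
            th + tau * th_dot, th_dot + tau * th_acc)

def choose_action(state, value):
    """Greedy control: apply the bang-bang force whose successor state
    scores highest under the given value map."""
    return max((-10.0, 10.0), key=lambda f: value(pole_step(state, f)))
```

In the actual system the value function would be interpolated from the neuro-resistive grid over the discretized 4-dimensional state space; here any callable mapping states to values can be plugged in.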
Future work:
1. Relating the resistive grid method formally to other map building techniques such as Q-learning.
2. Exploring nets where the action linking two nodes is not represented by the relative positions of the
nodes but encoded separately.
3. Deepening our understanding of the learning dynamics in the pole balancing problem with a
References:
"Route finding by neural nets"
Bugmann G., Taylor J.G. and Denham M. (1994)
in Taylor J.G. (ed.) "Neural Networks", Alfred Waller Ltd, Henley-on-Thames, pp. 217-230.
"The configuration Space Transformation for Articulated Manipulators: A Novel
Approach Based on RBF-Networks"
Althöfer K., Fraser D.A., Bugmann G. and Turan J. (1995)
in Proc. of the 4th International Conference on Artificial Neural Networks (ANN'95), Cambridge,
UK, IEE Conference Publication 409, pp 245-249.
"The configuration Space for Manipulators Computed by a Basis-Function Network"
Althöfer K., Fraser D.A. and Bugmann G. (1995)
in Proceedings of the International Conference on Engineering Applications of Neural Networks
(EANN'95), Espoo, Finland, August 21-23, pp. 469-472.
"Neuro-Resistive Grid Approach to Pole Balancing Problem"
Bapi R., D'Cruz B. and Bugmann G. (1995)
Proc. of ICANN'95, Paris, Vol. 2, pp. 539-544.
"Planning and Learning Goal-Directed Sequences of Robot-Arm
Althöfer K. and Bugmann G.(1995)
Proc. of ICANN'95, Paris, Vol. 1, 449-454.
"Asymmetric-B-Splines for the fast Calculation of C Space
Patterns of Robot Arm"
Althöfer K., Fraser D.A., Bugmann G. and Plumbley M.D. (1995)
Proc. of ICANN'95, Paris, Vol. 2, 387-392.
"Rapid path planning for robotic manipulator using an
emulated resistive grid"
Althöfer K., Fraser D.A. and Bugmann G. (1995)
Electronics Letters, 31, 1960-1961.elec_www.zip (ps
"Neuro-resistive grid approach to trainable controllers:
A pole balancing example".
D'Cruz B., Bapi R. and Bugmann G. (1995)
To appear in Neural Computing and Application Journal
ole.zip (ps file)(144294)
"Value Maps for planning and learning implemented with cellular automata "
Bugmann G. (1996)
in Parmee I.C. (ed) "Proc. of the 2nd International Conf. on Adaptive Computing in Engineering
Design and Control (ACEDC'96)", Plymouth, 26-28 March 1996, ISBN 0 905227 61 1, p. 307-