Evolving behavior in developing robot bodies
controlled by quasi-Hebbian neural networks

Richard K. Belew (1)

Craig Mautner (1)

Toshi Kondo (2)

(1) CSE Dept., UC San Diego
(2) Tokyo Inst. Technology

Submitted to 7th Joint Symposium on Neural Computation
7 Apr 00


Abstract

We are concerned with the interaction between three specific adaptive systems: evolutionary change by a species, ontogenic change by an individual as it matures, and learning by the individual as it acquires experience. We present experiments in which each member of a population of individuals is grown from a single cell, according to its particular genome, into an adult form corresponding to a simple robot body and NNet that allow it to function in and learn about its environment.

Extended abstract

We use a grammatical GA: genes correspond to production rules that collectively define a translation from a ``gamete'' (start symbol) to a final phenotypic form. Our phenotype includes features of both the robot's body and its NNet. Beginning from an initial ``gamete'' cell, the grammar rules specify the ``re-writing'' of cells in one state into one or more other cells. This growth process can be observed through this animation of the developmental process. Terminal states of this expression correspond to cells of particular ``fates'': light-sensor cells, motor cells, structural elements connecting these, and a neural network (NNet) controlling the flow of information from sensors to motors. (A second developmental animation shows all cells dividing simultaneously, as well as their ultimate cell fates.) After placement of the NNet's somatic body within the robot's other cells, synaptogenesis causes processes of each neuron to grow in genetically determined directions and synapse upon any neurons found at that location.
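As a minimal sketch of this kind of rewriting (the rule encoding, state names, and rule set below are illustrative placeholders, not the actual grammar carried in the genomes), development can be pictured as parallel rewriting of non-terminal cell states until only terminal fates remain:

    # Illustrative grammar: each "gene" maps one cell state to the
    # list of states it divides into; terminal states are cell fates.
    RULES = {
        "GAMETE": ["A", "B"],
        "A":      ["SENSOR", "NEURON"],
        "B":      ["NEURON", "MOTOR", "STRUCT"],
    }
    TERMINALS = {"SENSOR", "MOTOR", "NEURON", "STRUCT"}

    def grow(cells, rules, max_steps=10):
        """Rewrite every non-terminal cell in parallel at each step."""
        for _ in range(max_steps):
            if all(c in TERMINALS for c in cells):
                break
            next_gen = []
            for c in cells:
                # terminal fates are copied unchanged; other states divide
                next_gen.extend([c] if c in TERMINALS else rules.get(c, [c]))
            cells = next_gen
        return cells

    print(grow(["GAMETE"], RULES))
    # -> ['SENSOR', 'NEURON', 'NEURON', 'MOTOR', 'STRUCT']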

Once built, the NNet uses a variation of a standard Hebbian learning rule common to robotic tasks in which reinforcement information is available. Specifically, we imagine a single, extrinsic reinforcement signal, of the ``temporal difference'' form, that is perfectly correlated with successful behavior in the environment. Because we are currently building only relatively small bundles of $O(100)$ cells, which can be expected to have diameters of $O(10)$ cells, we make the further assumption that this reinforcement signal is available to all neurons. With these simplifications, our learning rule becomes:

\[ \Delta w_{ij} = \eta \, a_i \, \mathrm{Corr}(a_j, Val) \]

where $a_i$ and $a_j$ are the activities of the pre- and post-synaptic neurons, respectively, $Val$ is the reinforcement signal, and $\mathrm{Corr}()$ is the correlation function.
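As a rough sketch of how this rule might be applied (assuming, purely for illustration, that $\mathrm{Corr}()$ is estimated as the sample correlation over a short sliding window of recent activity; the actual estimator is not specified here), the update can be written as:

    import numpy as np

    def quasi_hebbian_update(W, pre_hist, post_hist, val_hist, eta=0.05):
        """
        W         : (n_pre, n_post) weights, w[i, j] from presynaptic i to postsynaptic j
        pre_hist  : (T, n_pre)  recent presynaptic activities a_i
        post_hist : (T, n_post) recent postsynaptic activities a_j
        val_hist  : (T,)        recent values of the reinforcement signal Val
        """
        # Corr(a_j, Val): sample correlation of each postsynaptic activity with Val
        v = val_hist - val_hist.mean()
        p = post_hist - post_hist.mean(axis=0)
        corr = (p * v[:, None]).sum(axis=0) / (
            np.sqrt((p ** 2).sum(axis=0)) * np.sqrt((v ** 2).sum()) + 1e-8
        )
        # delta w_ij = eta * a_i * Corr(a_j, Val), using the most recent a_i
        dW = eta * np.outer(pre_hist[-1], corr)
        return W + dW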

Our current implementation is motivated by our instantiation of these robot Bauplans as Lego Mindstorm creatures. This means that we have constrained ourselves to the very limited capacities of the RCX controller (8-bit CPU, 32K RAM, 6 I/O channels) and its crude light sensors. All locomotion is accomplished by excitation or inhibition of forward and backward motor cells associated with each of a pair of coaxially mounted wheels.
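One simple way such motor cells might be mapped onto wheel commands (illustrative only; the power range and scaling below are assumptions, not the actual drive logic) is to take each wheel's net excitation minus inhibition as a signed, clipped speed:

    def wheel_speed(forward_act, backward_act, max_power=7):
        """Net excitation minus inhibition, clipped to [-max_power, max_power]."""
        net = forward_act - backward_act
        return max(-max_power, min(max_power, round(net * max_power)))

    left  = wheel_speed(forward_act=0.9, backward_act=0.1)   # ->  6: drive forward
    right = wheel_speed(forward_act=0.2, backward_act=0.8)   # -> -4: reverse, so the robot turns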

Testing the effectiveness of these designs can be accomplished in two ways. The most obvious is to actually build each Lego creature and test its performance. But this construction step is by far the most labor-intensive, and hence the most expensive, so we also need a cheaper alternative. We have therefore developed a simulation environment in which these designs can be tested. This environment allows us to model a single ``light'' source and the effects of our simulated robots' motions in this world.
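A sketch of the kind of world model such a simulator might use (illustrative; the simulator's actual kinematics and sensor model are not detailed here) is a differential-drive pose update plus a light-sensor reading that falls off with distance to the single light source:

    import math

    def step(pose, v_left, v_right, wheel_base=1.0, dt=0.1):
        """Advance the robot pose (x, y, heading) under differential drive."""
        x, y, th = pose
        v = (v_left + v_right) / 2.0           # forward velocity
        w = (v_right - v_left) / wheel_base    # turn rate
        return (x + v * math.cos(th) * dt,
                y + v * math.sin(th) * dt,
                th + w * dt)

    def light_reading(sensor_xy, light_xy):
        """Sensor intensity falls off with squared distance to the light."""
        dx = light_xy[0] - sensor_xy[0]
        dy = light_xy[1] - sensor_xy[1]
        return 1.0 / (1.0 + dx * dx + dy * dy)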

From entirely random initial genomes, we reliably evolve simulated robots that are effective at the light-seeking behavior. An example of a good solution, as it behaves in the simulated environment, is shown in this animation of an evolved robot's behavior. The same individual is shown at two scales, on the right and left. The larger image on the right shows sensor, motor, neuron, and body cells (as blue, yellow, green, and black squares, respectively), together with excitatory (red) and inhibitory (white) synapses connecting the neurons. These weighted edges change according to the learning rule and the ambient reinforcement (signalled by the red dot at the bottom). The robot's light-seeking behavior is shown simultaneously, at a smaller scale, on the left.

Despite our many simplifying assumptions, the range of alternative solutions discovered across multiple runs is quite remarkable; many of them are nearly equivalent in performance. This variability seems an inevitable consequence of the composition of evolutionary, ontogenic, and learned adaptations. Current work is focused on identifying features that remain constant in the face of this variation, on extending to larger designs and more complex task domains, and on implementing our designs in real Lego Mindstorm robots.

