Survival Mechanisms in Artificial Life

Michael Blackadar
Dept. of Computer Science, Faculty of Science, University of Calgary,
2500 University Drive N.W., Calgary, Alberta, Canada T2N 1N4
[email protected]

Abstract. We created a simulation of artificial life and examined the behaviours that occurred between the life forms and their environment. These interactions include gathering food and moving around the environment, as well as fighting or mating with other creatures. Each creature was given characteristics such as speed and strength that differentiate it from the others. Creatures learn as they explore the world, using a method known as neural fitted Q-learning. Many parameters can be changed by the user in order to modify the simulation. Much of this work was based on Polyworld, an earlier exploration of artificial life created by Larry Yaeger [1, 2]. The main differences between that previous project and our work are the learning method and the vision system. We attempted to find interesting creature behaviour through the learning system and by changing world parameters, but had difficulties due to the slowness of the system.

1 Introduction

Artificial life is an area of computer science that studies the processes and systems of life through simulations. One purpose of artificial life is to create life-like simulations and study them to better understand the processes involved. Originally, simple modeling techniques such as cellular automata were used for this purpose; a popular example is Conway's Game of Life, created in the 1970s. Other simulations model creatures as agents, which consider the world state and interact with other agents and the environment. In order to choose an action, the agents may use a neural network. This is the basic idea behind Polyworld, which our project is largely based on. In our project, we created our own artificial life simulation, and explored what behaviours creatures exhibit in order to survive in their environment. We also examined how their behaviour changes due to changing their own capabilities or the environment.

2 Previous Work

Polyworld is an artificial life simulation created by Larry Yaeger in 1994 and still being developed [1]. In [2], a description is given of the original system. The world consists of a flat plane which contains creatures, food pieces, and obstacles. Each creature is represented by a polygon whose colour is affected by its
current state. Each one has unique characteristics such as size and metabolic rate. The most important property of creatures is that each one has an energy level which it must keep above zero to avoid death. To do this, it must be capable of collecting the food pieces scattered across the world. Creatures have many ways of interacting with each other, such as signaling, fighting and mating, and they are also capable of moving around the world to collect food. Through mating, creatures evolve by passing on their physical characteristics such as size. In order to choose what action to take, each creature has a complex brain consisting of a neural network that can be structured differently depending on the genome of the creature. The output of this network is one of the discussed actions, and the input is a one-dimensional image of the environment made of RGB pixels. One of the most interesting results of this work was how creatures would learn to respond to this visual stimulus, for example by running away from strong creatures which might be dangerous. Another interesting aspect was how the structure of the creature's neural network can be passed on to future generations, and how creatures learn through their lifetime.

3 Implementation

Although our project was based on this previous work, we made a number of changes. In this section we fully explain our system and how it was implemented.

3.1 World

The world created consisted of a flat grid of tiles. At initialization, the majority of the tiles are empty. Some of the tiles are set to be obstacles, which creatures can never move through. Others are set as food patches, which can provide energy needed by the creatures. When a creature eats food from a patch, the food regrows at a fixed rate modifiable by the user. These food patches are scattered randomly around the world, and their number can also be set by the user. Each world tile is fully described by its type as mentioned earlier, and by how many food pieces are on the tile. At the beginning only food patches will have food on them, but when creatures die they also leave food on regular tiles, and this food does not regrow. An example of the initial setup is shown in Fig. 1.
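
Although the world itself is implemented in NetLogo (described in the Architecture section), the tile model can be summarized with a short sketch. The Python snippet below only illustrates the data each tile carries and the regrowth rule described above; the names (TileType, regrow_amount, and so on) are invented for this illustration and do not appear in the actual implementation.

    from dataclasses import dataclass
    from enum import Enum

    class TileType(Enum):
        EMPTY = 0       # passable; only holds food if a creature died here
        OBSTACLE = 1    # creatures can never move onto this tile
        FOOD_PATCH = 2  # food regrows here at a fixed, user-set rate

    @dataclass
    class Tile:
        kind: TileType
        food: int = 0  # number of food pieces currently on the tile

        def tick(self, regrow_amount: int, max_food: int) -> None:
            """Called once per round: only food patches refill."""
            if self.kind is TileType.FOOD_PATCH and self.food < max_food:
                self.food = min(max_food, self.food + regrow_amount)

        def on_creature_death(self, dropped_food: int) -> None:
            """Dead creatures leave food behind that never regrows."""
            self.food += dropped_food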

3.2 Creature Characteristics

A creature is an agent in the simulation which can sense the environment and interact with it by performing actions. The behaviour of the agent is affected by its personal characteristics, as well as the state of the environment. The most important characteristic is the energy of a creature. Initially, each creature has maximum energy, and energy is lost on every turn and for every action taken. Once its energy is exhausted, the creature dies and is converted into food on the tile where it was standing. In order to keep its energy high, a creature must eat food either from regrowing food patches or from dead creatures.

Fig. 1. A world is created with creatures and food patches scattered around. The gray tiles are obstacles creatures cannot pass through.

In order to sense the environment, each creature is given information about nearby tiles that it can see. The amount of vision is unique to each creature, and can be shaped in various ways. Some creatures may be able to see for a long distance in the direction they are facing, while others may see for a short distance in a cone-shaped fashion. This was implemented by giving each creature a vision list which describes how its vision system is structured. As an example, a creature with a vision list of [1, 1, 1, 1, 1, 1] can see for a long distance in a single direction, whereas a creature with the vision list [1, 2, 3, 4] can see tiles within a cone ahead of it (a sketch of one possible interpretation of these lists is given at the end of this subsection). The vision system gives information about each tile, such as its type, as well as information about any creature on it. This information is then passed on to the brain system, which decides what action to take. In addition to the vision system, each creature also has speed and strength characteristics. The speed characteristic affects how far the creature can move when executing a move command. The strength attribute affects the creature's
ability to fight others, as well as how much food it leaves behind when dead. These characteristics may be passed on to child creatures if the creature mates with another. An important point to note is that creatures lose some energy every turn based on these characteristics, even if they take no action. This was implemented to prevent creatures from evolving towards being as fast and strong as possible, and we believe it creates a more realistic simulation.
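
As a concrete illustration of the vision lists described above, the following sketch shows one way such a list could be expanded into a set of visible tile offsets. The interpretation used here (entry i gives the width of the visible row i + 1 tiles ahead) is an assumption made for illustration; the actual implementation may structure its vision differently.

    def visible_offsets(vision_list):
        """Expand a vision list into (tiles ahead, sideways offset) pairs.
        Row i of the list is i + 1 tiles ahead of the creature and is
        vision_list[i] tiles wide, roughly centred on the facing direction."""
        offsets = []
        for ahead, width in enumerate(vision_list, start=1):
            start = -(width // 2)              # even widths lean to one side
            for side in range(start, start + width):
                offsets.append((ahead, side))
        return offsets

    # A long, narrow line of sight versus a short cone:
    print(visible_offsets([1, 1, 1, 1, 1, 1]))  # [(1, 0), (2, 0), ..., (6, 0)]
    print(visible_offsets([1, 2, 3, 4]))        # rows widen from 1 to 4 tiles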

3.3 Actions

In each turn of the simulation, each creature has vision of its nearby tiles, gives this information to its brain, and then decides on an action to execute. We chose a simple set of possible actions, although this can be easily extended. The actions are: move, turn, stop, eat, fight and mate. The move action causes the creature to move forward at a rate proportional to the creature's speed. If moving forward would cause the creature to move onto an obstacle tile or one occupied by another creature, the creature does not move. The amount of energy lost is also proportional to the speed. Creatures can turn left or right 90 degrees at the cost of some energy. The stop action leaves the creature doing nothing, which saves energy since all normal actions require energy. The eat action causes the creature to eat one piece of food from the current tile, providing a large amount of energy. If there is no food on the current tile, nothing is done.

The fighting and mating actions are more complex, as they involve interactions with other creatures. To execute either of these actions, a creature must be next to another and facing it. If there is no creature in one of the three tiles directly in front, no action is executed. If the fighting action occurs, both creatures lose energy proportional to their strength. Then their strengths are compared, and the weaker creature loses additional energy. As an example, consider a creature A with a strength of 3 attacking a creature B with strength 5. Both creatures first lose energy proportional to their strength. If the parameter for fighting is 3, then creature A loses 3 ∗ 3 = 9 energy while creature B loses 3 ∗ 5 = 15 energy. Since creature A is weaker than B, it also loses additional energy proportional to the difference in strengths. So here, creature A would lose an additional 10 ∗ (5 − 3) = 20 energy given a losing parameter of 10. In total, A loses 29 energy while B loses 15 energy, so it was probably a bad idea for A to attack B. This system was used because it provides many parameters for the user to modify, which should affect the fighting behaviour. If the mating action is triggered, a new creature is created with a random amount of energy taken from both parents. This energy level is a world parameter modifiable by the user. When the child is created, each characteristic is either copied from one of the parents or generated in a random fashion.
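
The fight arithmetic from the example above can be written out directly. This is only a sketch of the calculation as described; the parameter names fight_cost and losing_cost stand in for the user-modifiable fighting and losing parameters.

    def fight_energy_loss(strength_a, strength_b, fight_cost=3, losing_cost=10):
        """Return the energy lost by each creature in a fight.
        Both lose energy proportional to their own strength; the weaker
        creature additionally loses energy proportional to the strength gap."""
        loss_a = fight_cost * strength_a
        loss_b = fight_cost * strength_b
        if strength_a < strength_b:
            loss_a += losing_cost * (strength_b - strength_a)
        elif strength_b < strength_a:
            loss_b += losing_cost * (strength_a - strength_b)
        return loss_a, loss_b

    # The example from the text: A (strength 3) attacks B (strength 5).
    print(fight_energy_loss(3, 5))  # (29, 15): A loses 9 + 20, B loses 15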

3.4 Learning

Every turn, the brain system of the creature is given information about nearby tiles and must decide on an action to take. Creatures do not act based on rules, but are instead driven by a process called neural fitted Q-learning. This algorithm, described in [3], combines the idea of reinforcement learning with neural networks. Each time we desire an action, we provide the learner with the environmental state, largely based on the vision of the creature. This vector of integers contains information about the creature such as its ID and energy level, and, for each visible tile, the type of the tile, how much food is on it, and whether there are any creatures on it. The algorithm then chooses an action based on this input and its memory of previous situations. Once the action is executed, we give the creature a reward based on the worth of the action. We decided to give a small negative reward for any action which decreases the energy of the creature, and a larger positive reward for actions such as eating and mating. When provided with this information, the system trains itself to learn which actions provide the greatest reward. In this way, we can train creatures to behave appropriately.

Initially, we wished to implement the system using a simple neural network. This consists of a network of nodes that outputs a vector when given an input vector. The final output depends on the given inputs, but also on the weights of the connections between nodes in the network. For our system, the input vector was the environmental state, and there was one output for each possible action. We would then execute the action that had the highest value in the output vector. In many neural networks, the network is trained by letting it choose an action and then telling it what the correct action was given the input state. The network considers the error it made and uses this to change the weights of the network. However, we believed this process was not very realistic when simulating the minds of creatures, because it requires that we tell them what actions to take. We instead wished to provide the creature with rewards, and let the creature figure out for itself which action is best.
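
To make the learner's input and reward concrete, the sketch below shows one plausible way of flattening the creature and tile information into a vector and of assigning rewards along the lines just described. The exact field order and reward values are illustrative assumptions rather than the values used in our implementation.

    def encode_state(creature, visible_tiles):
        """Flatten creature information plus one entry per visible tile
        into a single vector of numbers for the Q-learner."""
        state = [creature["id"], creature["energy"]]
        for tile in visible_tiles:
            state.extend([
                tile["type"],                   # empty, obstacle or food patch
                tile["food"],                   # food pieces on the tile
                1 if tile["occupied"] else 0,   # is another creature on it?
            ])
        return state

    def reward(action, energy_change):
        """Small penalty for anything that costs energy, larger bonus for
        eating and mating (values chosen only for illustration)."""
        if action in ("eat", "mate"):
            return 5.0
        return -0.1 if energy_change < 0 else 0.0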

Fig. 2. An example neural network.

Instead of directly telling the creature which action is best, we can give it rewards based on its performance. These rewards are used by the learner to decide on the proper actions in the future. The goal of the learning is to maximize the long-term rewards received. In a popular variant known as Q-learning, each pair of an environmental state and an action taken in that state is given a Q-value. This represents the long-term reward of executing a specific action when encountering a specific state. When choosing an action, we usually choose the one with the largest Q-value given our current state. When a reward is received, we modify the Q-values. The Q-value for a particular state-action pair (s_t, a_t) is modified with the following formula:

    Q(s_t, a_t) = Q(s_t, a_t) + α [ r + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]        (1)

Here, α is the learning rate, which describes how quickly the Q-values are modified, r is the reward received for executing the action, and γ is the discount factor, which controls how strongly future rewards are weighted. Using this formula, we modify the Q-values in order to learn what the best actions are. We do not always choose the action with the highest Q-value, however, because there is a trade-off between exploitation and exploration: we can exploit current knowledge by selecting the best known action, or choose another one in the hope of finding one that is even better. The greatest problem with this approach is that it requires storing a Q-value for every state-action pair. Although we have only a small number of actions, the number of possible states is very high, so we cannot possibly store a value for each pair, and we cannot use this basic approach directly. Below is an example of the Q-table used by Q-learning for a small number of states and actions.

    State   Action 1   Action 2
      1     Q(1, 1)    Q(1, 2)
      2     Q(2, 1)    Q(2, 2)
      3     Q(3, 1)    Q(3, 2)

Instead of using either of these approaches (a plain neural network or tabular Q-learning) directly, we used a combination of the two. This is known as neural fitted Q-learning, and the complete algorithm can be found in [3]. It works in the following manner. When given an input, we choose an action based on the Q-values, which are initially random. We then save the state, our chosen action, and the reward we receive in memory. After a certain number of turns, we undergo a learning phase. This is done by creating a dataset for a neural network, where each element is a state-action-reward tuple. The neural network is trained by going through this dataset and updating the Q-values stored earlier. Eventually, we must delete some of this memory due to speed considerations.
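
Putting these pieces together, one learning phase looks roughly like the loop below. This is only a schematic of the neural fitted Q-learning procedure, written against a generic regressor with predict and fit methods rather than the actual PyBrain network; the names, the exploration rate, and the discount value are placeholders. The target for each stored transition is r + γ max_a Q(s', a), after which the network is refit on the whole memory in one batch.

    import random

    GAMMA = 0.9    # discount factor (illustrative value)
    EPSILON = 0.1  # exploration rate (illustrative value)

    def choose_action(q_net, state, actions):
        """Mostly greedy: pick the action with the largest predicted Q-value,
        but occasionally try a random action to keep exploring."""
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: q_net.predict(state, a))

    def learning_phase(q_net, memory, actions):
        """Neural fitted Q-learning step: turn the stored
        (state, action, reward, next_state) tuples into a supervised
        dataset and refit the Q-network on it in one batch."""
        inputs, targets = [], []
        for state, action, reward, next_state in memory:
            best_next = max(q_net.predict(next_state, a) for a in actions)
            inputs.append((state, action))
            targets.append(reward + GAMMA * best_next)
        q_net.fit(inputs, targets)  # one batch supervised training pass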

3.5 Architecture

We will now describe how the project was developed. To simulate the world and the interactions between agents, we used the NetLogo tool [4], which is an educational tool for multi-agent systems. The brain system for each creature was implemented in Python using the machine learning library PyBrain [5]. These two systems communicate by saving and loading files, due to NetLogo's lack of network communication ability. Each round, NetLogo lets each agent observe its nearby tiles and saves this information to a file, along with the reward for the previous action. The learning system reads this file and uses it as input. It then decides on an action for every creature and saves the actions to a file. NetLogo reads this file, executes the actions, and modifies the world if necessary. This includes updating the energy and position of creatures, as well as regrowing food. Once this is done, a new round can start. When running the simulation, the user has access to a variety of parameters that affect how the simulation behaves.
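
The Python side of this exchange amounts to a simple polling loop. The sketch below shows the idea; the file names, the JSON format, and the decide_action callback are placeholders invented for illustration, since the actual files exchanged with NetLogo use their own format.

    import json
    import os
    import time

    STATE_FILE = "world_state.json"  # written by NetLogo each round (placeholder name)
    ACTION_FILE = "actions.json"     # read back by NetLogo (placeholder name)

    def run_brain_loop(decide_action):
        """Wait for NetLogo to write the observations and rewards for each
        creature, choose an action per creature, and write the choices back."""
        while True:
            if not os.path.exists(STATE_FILE):
                time.sleep(0.05)  # NetLogo has not finished the round yet
                continue
            with open(STATE_FILE) as f:
                observations = json.load(f)  # {creature_id: {"state": [...], "reward": r}}
            os.remove(STATE_FILE)            # consume this round's data
            actions = {cid: decide_action(obs["state"], obs["reward"])
                       for cid, obs in observations.items()}
            with open(ACTION_FILE, "w") as f:
                json.dump(actions, f)        # NetLogo executes these next round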

Fig. 3. The user controls for the simulation. These parameters affect the behaviour of the world and creatures.

4 Evaluation

Here we describe the types of behaviour we were attempting to find, what we discovered, and the problems we had when evaluating the system.

4.1 Expected Behaviour

When creating the project, we had several goals. First, we were interested in having creatures learn throughout their lifetime. Initially we expected creatures to act randomly, but after several turns they should learn to eat food, explore the world, and interact with others in an appropriate fashion. Another goal was to demonstrate how changing the parameters of the world affected the actions chosen by the agents. For example, we expected that food patches that regrow quickly would cause creatures to stay on patches without moving, while having patches regrow slowly would cause creatures to explore more often. Finally, we wished to see how these changing world parameters affected the fitness of creatures with different capabilities. For example, if we increased the energy cost of being strong, we would expect the population to evolve towards creatures with less strength.

4.2 Discoveries

First we examined the behaviour of creatures without learning. This meant that actions were chosen randomly, so we would see the creatures attempt all possible actions. When food is eaten from a food patch, it regrows in the proper fashion by relying on a timer. Fighting and mating actions also behaved as expected, causing stronger creatures to kill the weaker ones, and new creatures to be created with the proper characteristics. Modifying world parameters such as the number of creatures also had the expected effect. Obviously, increasing the food regrowth rate or decreasing energy costs caused creatures to live for a longer period of time. Although the actions taken by creatures were random, we still saw some interesting behaviour. In Fig. 4, you can see creatures that were bunched together at the beginning, causing them to breed. This creates more creatures, but also means the creatures have lower energy, represented by giving them a darker colour. Creatures far away from others could not breed, which means they could save up energy, giving them a longer life span.

Next, we examined the learning system for a small number of agents. We gave each agent a small vision range and created 10 of them spread around the world. Then we tested with various learning parameters. We define the learn rate as how often creatures learn using their neural networks and their memory. The reset rate determines how often the memory of each creature is cleared. In our first test, we had learning occur every turn, with memory reset every 20 turns. This appeared to work rather well, with creatures learning to eat but still moving around the world. For the second test, we had learning occur every 5 turns and resetting every 100 turns. No learning seemed to occur, and behaviour seemed random. For the third test, we had learning occur every second turn and resetting occur every 20 turns. However, we also changed a learning parameter which caused the learner to behave in a more greedy fashion, choosing the action with the highest Q-value. In this case, the creatures learned quickly to eat food from patches, but seemed to stay on the food patch without exploring. The graphs for these tests can be seen in Figs. 5–7.


Fig. 4. Creatures in the top left and centre have bred, creating many children and also leaving food behind.

Unfortunately, when large numbers of creatures were added to the world, learning did not seem to occur. This caused a lack of interesting behaviour, as most actions were chosen randomly. Creatures were capable of learning to sit on a food patch and eat, or of breeding with others due to the reward given, but the more interesting behaviours did not occur. We did not see evidence of creatures learning to react to other creatures nearby, such as running away or moving to fight. In addition, we did not find the creatures capable of learning to use their strengths, such as having stronger creatures fight more often and faster ones explore more often.

Fig. 5. First test case, with learning every turn.

Fig. 6. Second test case, learning infrequently but with a large memory.

Fig. 7. Third test case, learning frequently and selecting actions greedily.

4.3 Issues

The main issue we had was with the learning system. When creatures were simply executing their actions, the system ran fairly quickly. If a creature was learning, however, the simulation would slow down greatly due to the training of the creature's neural network. In addition to this problem, the training speed decreased over time, due to each creature keeping previous states, actions, and rewards in memory. Clearing this memory frequently improved speed, but caused a lack of learning. Finally, when a creature is created, it has a completely untrained system, so it must learn from scratch; when creatures die, all of the learning that occurred is wasted. Considering all these factors, it is easy to understand why the learning was poor. Creatures frequently behaved in a random fashion, which made it difficult to find interesting behaviour such as creatures reacting to others or exploring the world in an appropriate manner.

5 Future Work

As described earlier, the largest issue with our simulation was the poor learning ability of the creatures. This could be improved in a few ways. First, a better simulation environment would speed up the simulation, as NetLogo and Python programs are not particularly fast, and reading and writing files every round could be eliminated. The second possible improvement would be to experiment with different reinforcement learning parameters and techniques to improve the efficiency of the learner. Ultimately, it may not be possible to create an adequate simulation that runs in real time on a single computer, due to the large number of neural networks being used and then discarded when creatures are destroyed. Provided that a workable learning system is possible, there are many possibilities to explore in this kind of simulation. More actions could be added, such as allowing creatures to move at different speeds to conserve energy or move quickly if needed. A hibernation action could be added to allow agents to save energy at the cost of less information about the world. A wide variety of world parameters could be changed to examine the resulting behaviour of the creatures. Finally, creatures could have unique learning parameters which are passed on to future generations in a similar manner to physical characteristics such as speed, in order to examine the effect of having different learning systems.

6 Conclusion

In our simulation, we examined the survival behaviour of creatures with differing characteristics such as speed and vision ability. Creatures were able to learn throughout their lifetime, learning how to eat food from the regrowing food patches. Changing world parameters such as the food regrowth rate and the energy cost for actions affected the creatures as expected. However, we were unable to find more interesting behaviours such as exploring the world or running away from other creatures; we believe this failure to be due to the slowness of the learning system. We were able to demonstrate an artificial life system which is easily modifiable by the user, who can adjust the simulation as desired. This work is easily extensible, as more actions could be added, and more parameters could be examined to look for interesting behaviours.

References

1. Polyworld website. http://www.beanblossom.in.us/larryy/polyworld.html (2011)
2. Yaeger, L.: Computational genetics, physiology, metabolism, neural systems, learning, vision, and behavior or PolyWorld: Life in a new context. In: Proceedings of the Artificial Life III Conference. (1994) 263–298
3. Riedmiller, M.: Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In: 16th European Conference on Machine Learning, Springer (2005) 317–328
4. NetLogo. http://ccl.northwestern.edu/netlogo/ (2011)
5. PyBrain. http://pybrain.org/ (2011)