Chris

Extending Ordering Structures and Targeted Operators in a Harbour Security Setting Chris Thornton1 Dept. of Computer Sci...

1 downloads 218 Views 539KB Size
Extending Ordering Structures and Targeted Operators in a Harbour Security Setting Chris Thornton1 Dept. of Computer Science, Faculty of Science, University of Calgary, 2500 University Drive N.W., Calgary, Alberta, Canada T2N 1N4 [email protected]

Using particle swarms to nd emergent behaviour has been demonstrated with success, but only with a small focus on eciency. The purpose of this project is to investigate ways to improve the speed at which the swarm can nd an emergent mis-behaviour in a harbour security scenario. Abstract.

1 Introduction In multi-agent systems, emergent behaviour is often seen as something that is desirable, where agents work together to exhibit behaviours that they were not explicitly programmed to do. However, there is always the possibility that emergent mis-behaviour can occur, that is where agents do something that has an undesirable outcome. Testing for these emergent behaviours can be a complicated task for humans to perform (and is subject to biases), so automated tools are desired. For example, consider the problems involved in patrolling a large harbour that is home to both military and civilian activities. It is not feasible to provide constant sensor coverage over the entire area simply due to cost, so a eet of mobile sensor platforms are required. The platforms must protect an area that is determined to be of high importance, and prevent intruder vessels. These platforms need to be given a set of rules on how they act in the case of an intruder or suspected intruder, but it is not always clear what the results of such actions are. [1] presents a system that uses multi-objective particle swarms to learn strategies for a set of intruder vessels that can defeat the patrollers. [2] further develops the system by presenting `ordering structures', a tool that can be used to direct the types of weaknesses that are found in the patrol policy. The authors presented a basic overview of the eciency for the di erent types of ordering structures, although this was not their primary focus. This project will evaluate di erent variants of ordering structures, along with other components and the resulting e ect on the above system.

2 Project Background This section provides an overview of the testing framework used, particle swarm optimization, and ordering structures. (In some cases, parts of the content for these subsections comes directly from either [1] or [2]).

2

Chris Thornton

2.1 Multi-Agent System Testing

...

Ag tested,1

Ag tested,m Ag byst,1

. . .

E nv

Ag byst,k Ag attack,1

...

Ag attack,n

Machine learner Fig. 1.

Mutli-Agent Learning System

Figure 1 describes the general structure of an evolutionary learning system on top of a multiagent system. At the core of such a system is the environment E nv where all agents can interact with one another. Within E nv, there can be three kinds of agents. The set Atest = fAgtest;1 ,...,Agtest;m g is the set of agents that we are testing. The agents in the set Abyst = fAgbyst;1 ,..., Agbyst;k g, which can be empty, are not under the control of a learner, but still participate in the simulation. Finally, the set Aattack = fAgattack;1 ,...,Agattack;n g represents the agents in the simulation that the machine learner can control. In the harbour security instance, the set of Atest consists of the vessels that are responsible for defending the harbour. Abyst could consist of small civilian watercraft or large commercial freighters - in this instance it will be empty. Finally, Aattack contains the set of intruder vessels that are under control of the machine learner. Two ctitious patrol policies were created for evaluation (due to military security reasons, it was not possible to use actual policies). The rst policy divides the agents in Atest into two sub types, namely patrollers and interceptors. Patrollers are used to spot potential intruders and noti es the interceptors which will then approach the potential intruder, identify it, and, if necessary, disable it. A patroller updates an alarmed interceptor about the navigational course of the potential intruder as long as the potential intruder remains in sensor range. If a potential intruder comes close enough to a patroller to be identi ed, then the patroller will take it out if necessary. With the exception of this case, patrollers maintain their predetermined route throughout the harbour. The interceptors wait at prede ned positions in the harbour and only become active when alarmed by a patroller. When active, an interceptor determines the best

Title Suppressed Due to Excessive Length

3

position to approach near to a potential intruder, based on the information from the patroller. If the intruder is not detected at that position then the interceptor returns to its waiting position. If for some reason one of the patrollers becomes incapacitated, an interceptor will change groups to ll the role of the missing interceptor. By splitting the agents in Atest into two types, resources can be used in a more dedicated fashion, with the patrollers more specialized in detection and the interceptors with man power and other resources to deal with intruders. This is policy pat int. The second policy, all pat, does not distinguish the agents in Atest . All patrollers follow a circuit around the harbour (like a patroller in pat int) and they are evenly spaced on this circuit. In the event that a patroller becomes incapacitated, the remaining agents will spread out along the circuit to maintain even spacing. If an agent detects a potential intruder, the available agent closest to the intruder is sent to identify the boat. If the identi cation is unfriendly, the agent will attempt to intercept the intruder and disable it.

Fig. 2.

pat

int in Esquimalt Harbour, B.C.

Both of these patterns were instantiated for two harbours - Esquimalt Harbour in British Columbia, and Halifax Harbour in Nova Scotia. Figure 2 shows what pat int looks like in Esquimalt harbour. There are two patrollers, P at 1 which has a large circuit, while the other, P at 2, makes a much tighter loop closer to the goal (the red target). The two interceptors, I nt 1 and I nt2 start near the military base. In the case of all pat, there are four patrollers that make a large circuit from the mouth of the harbour up into the norther parts.

4

Chris Thornton

2.2 Particle Swarm Optimization Particle Swarm Systems (PSS, see [3]) are inspired by physics and biology, enhancing the idea of a moving particle with the attraction behaviour of members of a swarm to perform a search in a solution space (or an extension of such a space). In a PSS, the search state is represented by a set of l particles pi each of which is characterised by its current position posi , its current velocity vi , and its best position besti in the past. The position of a particle is usually a vector of continuous variables that represent a solution to the instance of the search problem the PSS is aimed at solving. In the basic case of a PSS there is a single function f describing the quality of a solution/position, where the goal of the search is to nd a solution that is as good as possible with regard to f . The search in the basic case is performed by updating each particle in the state according to the following equations: new

vi

= W vi + C1 r1 (besti new

posi

posi

) + C2 r2 (Best

= posi + vinew ;

)

posi ;

(1) (2)

where W is a weight parameter controlling the in uence of the previous velocity, C1 is the so-called cognitive learning factor, C2 the so-called social learning factor and r1 ,r2 2 [0; 1] are random values chosen by the search control. Best is the best position the whole swarm has found so far. If a particle reaches a new best position, i.e. f (posnew ) is better than f (besti ), then besti is updated, i.e. bestnew i i new = posi , else it stays, i.e. bestnew = besti . From the point of view of a particle, i a sequence of updates has it ying through the solution space and the whole algorithm terminates either after a given number of update rounds (with Best being the output of the system) or if Best ful lls certain conditions. In the harbour security problem, each particle position encodes a string of waypoints for each of the intruders. A waypoint is de ned by a location in the world, as well as a speed at which the intruders are to move towards a given waypoint. As such, a particle that represents a strategy with ten waypoints per attacker is a sixty dimensional vector.

2.3 Ordering Structures Since there is no one clear objective function or tness that can be assigned to a particle, a multi-objective optimisation method must be used (see [4]). For every run of the simulator, it produces a trace (e0 ; : : : ; ex ) of environmental states that happen in sequence. We can then de ne a number of functions that measure `success' of the intruders, but it is clear that there are some functions which are more important than others. For example, if all of the intruders have been destroyed by the defenders, then this solution is worse then one where all intruders have remained active a the end of the simulation. To meet this requirement, [1] introduces `ordering structures' - a way to deal with the hierarchy of measures. The general idea of a goal ordering structure  can be formally described as follows. For two particle position vectors pos1 and pos2 , that need to be

Title Suppressed Due to Excessive Length

5

compared, a goal ordering structure has the form (ff11 ,...,f1q1 g,...,ffu1 ,...,fuqu g), or (f1 ,...,fu ) for short, where fij is either a quality function assigning a number to a trace e0 ,e1 ,...,ex of environmental states produced by the strategy for the attack agents represented by a position when applied in E nv interacting with the other agents, or another ordering structure. If  denotes the ordering that is created by this ordering structure, then pos1

 pos2 ; if pos1

f1

pos2 ; or =f1 pos2 and pos1 f2 pos2 ; or ::: or pos1 =f1 ;:::;fu 1 pos2 and pos1 fu pos2

pos1

pos

[2] de nes a number of di erent measures of success for a particle position and trace e0 ,...,ex produced by the simulator as follows:

fintercept

80 > < )= > :1

((e0 ; :::; ex ); pos

if there is a j , such that all Agattack;i are intercepted in ej ; else ;

let pos1 intercept pos2 if fintercept ((e0 ; :::; ex ); pos1 ) > fintercept ((e0 ; :::; ex ); pos2 ).

fsuccess

81 > < )= > :0

if there are j; i, such that Agattack;i reached the target spot in ej ; else

((e0 ; :::; ex ); pos

;

with pos1 success pos2 if fsuccess ((e0 ; :::; ex ); pos1 ) > fsuccess ((e0 ; :::; ex ); pos2 ).

bX 100c x=

fdist;i

((e0 ; :::; ex ); pos) =

j

=1

(

dist e100j ;

A

gattack;i

)

+dist(ex ; Agattack;i ) where dist(e; Agattack;i ) is the length of the shortest path created from the position of Agattack;i in e to the target spot (computed using path nding). Let pos1 dist;i pos2 if fdist;i ((e0 ; :::; ex ); pos1 ) < fdist;i ((e0 ; :::; ex ); pos2 ).

bX 100c x=

fhide;i

((e0 ; :::; ex ); pos) =

j

=1

(

ndist e100j ;

A

gattack;i

)

+ndist(ex ; Agattack;i ) where ndist(e; Agattack;i ) is the shortest distance between Agattack;i and any of the vessels in Atested in e. Let pos1 hide;i pos2 , if fhide;i ((e0 ; :::; ex ); pos1 ) > fhide;i ((e0 ; :::; ex ); pos2 ).

6

Chris Thornton

fhidesum

((e0 ; :::; ex ); pos) =

X n

=1

fhide;i

((e0 ; :::; ex ); pos)

i

let pos1 hidesum pos2 if fhidesum ((e0 ; :::; ex ); pos1 ) > fhidesum ((e0 ; :::; ex ); pos2 ). [2] also de nes ve di erent ordering structures, and performs a number of experiments on each. They also present some of the di erences in the attacks that the machine learner generates with each di erent ordering structure. The ordering structures they used are as follows: base hidebef ore hideaf ter hsumbef ore hsumaf ter

= (ffintercept g; ffdist;1 ; :::; fdist;n g; ffsuccess g) = (ffintercept g; ffhide;1 ; :::; fhide;n g; ffdist;1 ; :::; fdist;n g; ffsuccess g) = (ffintercept g; ffdist;1 ; :::; fdist;n g; ffhide;1 ; :::; fhide;n g; ffsuccess g) = (ffintercept g; ffhidesum g; ffdist;1 ; :::; fdist;n g; ffsuccess g) = (ffintercept g; ffdist;1 ; :::; fdist;n g; ffhidesum g; ffsuccess g)

3 Project Details After three years of work on this project, there are still a large number of areas that have yet to be explored. This project was to explore the e ects four modi cations to the evolutionary learner used in [2] on the systems ability to nd weaknesses, and how quickly it would nd those weaknesses. These modi cations were to investigate the use of new kinds of ordering structures, vary the base topology of the swarm, both from the beginning and during the learning process, and nally the e ect of applying targeted operators to an attack. [2] also used an unrealistic sensor model, where a defender vessel could maintain complete sensor coverage all around them. Instead, a model that more accurately re ects reality, where the vessels only can get valid sensor readings in an two dimensional arc that is in front of the boat, will be used.

3.1 The Weak Unordered Structure In 2.3, ordering structures use Pareto dominance when comparing the vectors of elements. This means that for two vectors fi = ffi1 ; : : : ; fiqi g and fj = ffj1 ,...,fjqj g where qi = ji , fi  fj if 9k such that fik  fjk and 8l, either fil  fjl or fil = fjl . Informally, this means that there is at least one element in fi that is better then it's corresponding element in fj , but no element in fi is worse than it's corresponding element in fj . However, it is not clear the second condition (the requirement that nothing is worse) is required. There may be some advantage gained by the learner if it considers solutions that have at least one element in fi that is better then it's corresponding element in fj , without constraints on the others. More formally, a new type of ordering structure can be de ned as using this criterion - (< f11 ,...,f1q1 >,...,< fu1 ,...,fuqu >), or (fw1 ,...,fwu ) for short. If  denotes the ordering that is created by this ordering structure, then

Title Suppressed Due to Excessive Length

pos1

 pos2 ; if pos1

7

f1

pos2 ; or =f1 pos2 and pos1 f2 pos2 ; or ::: or pos1 =f1 ;:::;fu 1 pos2 and pos1 fu pos2

pos1

where pos1  pos2 if 9j such that pos1 fij pos2 . Using this new structure, a number of new ordering structures can be de ned: w base w hidebef ore w hideaf ter w hsumbef ore w hsumaf ter

= (< fintercept >; < fdist;1 ; :::; fdist;n >; < fsuccess >) = (< fintercept >; < fhide;1 ; :::; fhide;n >; < fdist;1 ; :::; fdist;n >; < fsuccess >) = (< fintercept >; < fdist;1 ; :::; fdist;n >; < fhide;1 ; :::; fhide;n >; < fsuccess >) = (< fintercept >; < fhidesum >; < fdist;1 ; :::; fdist;n >; < fsuccess >) = (< fintercept >; < fdist;1 ; :::; fdist;n >; < fhidesum >; < fsuccess >)

These structures can be compared to their non-weak variants in regards to the average number of iterations it takes for a weakness to be found, and for the number of times that a weakness is able to be found.

3.2 Swarm Connectivity There are a large number of di erent ways to connect the swarm of particles together (See [4]). The implementation of the particle swarm in [2] has 20 particles connected in a ring. That decision was completely arbitrary, and no other connective structures were investigated. Using the same ordering structures as in [2], but increasing the connectivity between particles (see Figure 3), the eciency of the system can again be related to the less connected swarm.

Fig. 3.

Left: Depiction of original swarm. Right: Depiction of highly connected swarm

3.3 Modi cation of Swarm Topology Figure 4 is an example of an interesting result from [2]. The particle generated has an unique aw. Attacker 1 is moving too fast to avoid collision with the

8

Chris Thornton

Top Left: Attacker 1 approaches shore. Top Right: Attacker 1 is moving too fast to avoid crashing into the shoreline. Bottom Left: Attacker 2 makes a similar approach at a lower speed. Bottom Right: Attacker 2 makes it to the target by use of a `timing' based attack. Fig. 4.

Title Suppressed Due to Excessive Length

9

shore before making its next waypoint, and as such removes itself from the simulation, even though attacker 2 is able to complete its mission. In this instance, a modi cation to the learner should generate a new particle in the swarm that is identical to the awed particle, with the exception that the speed of the waypoint at the collision is reduced (thereby reducing the chance of a collision with the shore). Figure 5 shows an example of the newly inserted particle in the swarm. This new particle should then dominate its awed predecessor (since the rst member in the ordering structure counts the number of remaining intruders at the end of an attack strategy), and as such it should prevent the swarm from exploring the search space in an area that is not helpful.

Fig. 5.

Left: original swarm. Right: Swarm with modi ed member

3.4 Targeted Operator In nearly all cases, additional knowledge can be applied to a system to help increase performance. For example, when an attacking vessel reaches the end of its waypoint string, the implementation in [2] simply causes the vessel to stop. However, it makes sense for the last available intruder to make a high speed run on the target, since now the intruders have nothing to lose. This operator is applied independently of the attack strategy, since the decision on which boat will make the nal attack run is decided during execution of the simulation, and, as such, it is not captured entirely by the particle's position in the search space. Only with the combination of a patrol/interception policy can this be applied. It is expected that applying this targeted operator to simulations of attack patterns will greatly increase the completion rate and decrease the time needed to nd a weakness.

4 Results The experiments were run on a lab of 25 2Ghz iMacs over the course of 5 days. Due to the large number of experiments that needed to execute and time constraints, only ten to fteen experiments were performed for each of the one hundred di erent variations of the system, so results are not as conclusive. However,

10

Chris Thornton

they still are able to illustrate the di erences between the di erent variations of the system. For each modi cation, the results are compared side by side with the `base' system.

4.1 Base Results Due to the changes made to the sensor model in [2] to this version of the simulator, not all the trends observed are the same. Average Success Time (Iterations) Policy: pat int all pat Num. Attackers 2 3 2 3 base 25.85 21.44 17.54 17.33 hidebef ore 27.00 20.00 20.20 21.50 hideaf ter 18.17 16.17 22.44 25.13 hsumbef ore 36.33 18.67 14.50 18.67 hsumaf ter 17.50 21.40 16.18 23.36 Completion Rate (%)

Policy: Num. Attackers base hidebef ore hideaf ter hsumbef ore hsumaf ter

pat

int

2 3 77.77 90.00 36.36 20.00 60.00 66.67 30.00 33.33 90.91 45.45

all

pat

2 3 100 54.54 55.55 25.00 90.00 80.00 22.22 33.33 100 100

The rst of the above two tables shows the average number of iterations that the particle swarm took in order to nd a weakness in the various patrol policies with di erent numbers of attackers. The second table shows the percentage of all experiments that were able to nd a solution within one hundred iterations of the particle swarm. Since most weaknesses are found in the rst fty iterations, this resouce limitation is reasonable.

4.2 Weak Unordered Structure Average Success Time (Iterations) Base Weak Ordering Structure Policy: pat int all pat Policy: pat int all pat Num. Attackers 2 3 2 3 Num. Attackers 2 3 2 3 base 25.85 21.44 17.54 17.33 w base 35.75 34.81 33.75 39.61 hidebef ore 27.00 20.00 20.20 21.50 w hidebef ore 6.00 16.00 26.26 30.60 hideaf ter 18.17 16.17 22.44 25.13 w hideaf ter 32.60 34.76 30.95 49.21 hsumbef ore 36.33 18.67 14.50 18.67 w hsumbef ore 17.60 13.60 24.16 14.88 hsumaf ter 17.50 21.40 16.18 23.36 w hsumaf ter 33.76 36.78 36.40 33.20

Title Suppressed Due to Excessive Length

Policy: Num. Attackers base hidebef ore hideaf ter hsumbef ore hsumaf ter

Completion Rate (%)

Base pat

11

int

2 3 77.77 90.00 36.36 20.00 60.00 66.67 30.00 33.33 90.91 45.45

Weak Ordering Structure Policy: pat int all pat 2 3 Num. Attackers 2 3 2 3 100 54.54 w base 100 91.67 92.31 100 55.55 25.00 w hidebef ore 7.17 28.57 46.66 33.33 90.00 80.00 w hideaf ter 83.33 94.44 100 100 22.22 33.33 w hsumbef ore 31.25 31.25 35.29 47.06 100 100 w hsumaf ter 85.00 70.00 95.24 95.25 all

pat

For the most part, the weak unordered structure causes solutions to be found in a larger number of iterations, however it increases the number of attacks that succeed. This is likely due to the fact that the weak unordered structure is more likely to dominate another structure (since all elements of one structure will need to be worse than another for it not to dominate). During an update phase, all of the particles that dominate are used when computing the forces that act on a particle. As such, each particle in the swarm is under the in uence of more con icting forces than when compared to the normal unordered structure, so the force is lower. Thus, the swarm will move slower towards a solution, but it explores the local area more thoroughly, causing a higher success rate.

4.3 Swarm Connectivity Average Success Time (Iterations) Base Weak Ordering Structure Policy: pat int all pat Policy: pat int all pat Num. Attackers 2 3 2 3 Num. Attackers 2 3 2 3 base 25.85 21.44 17.54 17.33 base 26.07 29.00 25.86 26.96 hidebef ore 27.00 20.00 20.20 21.50 hidebef ore 32.80 33.00 33.00 34.00 hideaf ter 18.17 16.17 22.44 25.13 hideaf ter 22.78 25.95 28.63 22.63 hsumbef ore 36.33 18.67 14.50 18.67 hsumbef ore - 41.33 28.33 22.00 hsumaf ter 17.50 21.40 16.18 23.36 hsumaf ter 25.78 25.14 24.66 26.18 Policy: Num. Attackers base hidebef ore hideaf ter hsumbef ore hsumaf ter

Completion Rate (%)

Base pat

int

all

pat

2 3 2 3 77.77 90.00 100 54.54

36.36 20.00 55.55 60.00 66.67 90.00 30.00 33.33 22.22 90.91 45.45 100

25.00 80.00 33.33 100

Increased Swarm Connectivity Policy: pat int all pat Num. Attackers 2 3 2 3 base 84.37 87.5 96.95 93.93 hidebef ore 14.70 9.09 35.48 25.81 hideaf ter 76.00 88.00 84.61 73.07 hsumbef ore 10.00 22.22 30.76 hsumaf ter 76.00 91.66 84.00 88.00

When the swarm connectivity was increased, hsumbef ore did not produce a single weakness. The unmodi ed learner was clearly far better at nding weaknesses in a shorter number of iterations, however there is no clear advantage

12

Chris Thornton

to using either the base system or the increased connectivity when it comes to measuring the completion rate. These two observations are possibly due to the fact that the larger swarm connectivity causes the particle that is deemed 'best' to constantly change for each update. The particle will then jump around the search space since it is not pulled consistently in one direction. This erratic movement means that the system takes longer to nd a weakness, but it still has the same chances of nding one.

4.4 Modi cation of Swarm Topology Average Success Time (Iterations) Base Modi ed Swarm Topology Policy: pat int all pat Policy: pat int all pat Num. Attackers 2 3 2 3 Num. Attackers 2 3 2 3 base 25.85 21.44 17.54 17.33 base 26.94 26.31 27.84 25.67 hidebef ore 27.00 20.00 20.20 21.50 hidebef ore 33.00 - 36.00 15.00 hideaf ter 18.17 16.17 22.44 25.13 hideaf ter 17.25 22.00 23.00 26.60 hsumbef ore 36.33 18.67 14.50 18.67 hsumbef ore 31.00 18.00 34.00 20.25 hsumaf ter 17.50 21.40 16.18 23.36 hsumaf ter 23.29 26.17 21.88 21.75

Policy: Num. Attackers base hidebef ore hideaf ter hsumbef ore hsumaf ter

Completion Rate (%)

Base pat

2

int

3

all

pat

Modi ed Swarm Topology Policy: pat int all pat Num. Attackers 2 3 2 3 base 72.00 61.90 72.22 75.00 hidebef ore 23.53 - 27.27 10.00 hideaf ter 50.00 62.50 100 62.5 hsumbef ore 20.00 22.22 12.50 57.14

2 3 77.77 90.00 100 54.54 36.36 20.00 55.55 25.00 60.00 66.67 90.00 80.00 30.00 33.33 22.22 33.33 90.91 45.45 100 100 hsumaf ter

77.78 75.00 100 100

Surprisingly, inserting the modi ed particle into the swarm did not substantially increase either the completion rate or the average time it takes to nd a weakness. This might be due to the swarm's momentum that can not easily be swayed by adding one additional particle - many particles that have this weakness xed may have to be added. Similarly, by adding the extra particle, it might be the case that the two particles in close proximity in the search space act as an anchor for the swarm, trapping it so it can not move on to a solution. Future work with the system should investigate further why this is the case.

Title Suppressed Due to Excessive Length

13

4.5 Targeted Operators Average Success Time (Iterations) Base Targeted Operators Applied Policy: pat int all pat Policy: pat int all Num. Attackers 2 3 2 3 Num. Attackers 2 3 2 base 25.85 21.44 17.54 17.33 base 11.31 9.85 11.34 hidebef ore 27.00 20.00 20.20 21.50 hidebef ore 16.76 13.00 16.54 hideaf ter 18.17 16.17 22.44 25.13 hideaf ter 11.96 10.10 11.10 hsumbef ore 36.33 18.67 14.50 18.67 hsumbef ore 20.44 19.21 19.44 hsumaf ter 17.50 21.40 16.18 23.36 hsumaf ter 11.56 11.12 11.69 Completion Rate (%) Targeted Operators Applied Policy: pat int all pat Policy: pat int all pat Num. Attackers 2 3 2 3 Num. Attackers 2 3 2 3 base 77.77 90.00 100 54.54 base 100 100 100 100 hidebef ore 36.36 20.00 55.55 25.00 hidebef ore 97.91 100 100 100 hideaf ter 60.00 66.67 90.00 80.00 hideaf ter 86.53 100 100 100 hsumbef ore 30.00 33.33 22.22 33.33 hsumbef ore 100 100 100 100 hsumaf ter 90.91 45.45 100 100 hsumaf ter 100 100 100 100 Base

Targeted operators without a doubt provide a huge boost in both the average search time, as well as the completion rate. However, when the weaknesses that are generated are analyzed, they are almost all of a timing based attack (where the intruder slips between the patroler(s) at the mouth of the harbour without being spotted). These types of attacks are useful, but easily countered with more sensor platforms. As such, it is likely that if the timing based weaknesses are plugged, the advantage gained by using this targeted operator will diminish.

5 Conclusion Of all the di erent variations on the system, targeted operators make the largest di erence when comparing the eciency of the system from [2]. Other methods, in particular the new weak unordered structure, can increase the completion rate by a small margin. However, changing the connectedness of the swarm or inserting new particles while the system is learning appear to have little e ect on the outcome. These last two results remain slightly at odds with their expected outcome, requiring further study to determine why exactly they do not improve performance.

References 1. Flanagan, T., Thornton, C., Denzinger, J.: Testing harbour patrol and interception policies using particle-swarm-based learning of cooperative behaviour. In: Proc. CISDA 2009. (2009) 1{8

pat

3

10.23 15.23 10.34 18.70

10.98

14

Chris Thornton

2. C. Thornton, T. Flanagan, J.D., Boyd, J.: Testing harbour patrol and interception policies using particle-swarm-based learning of cooperative behaviour. In: To appear in Proc. CISDA 2011. (2011) 3. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proc. IEEE ICNN 1995. (1995) 1942{1948 4. Reyes-Sierra, M., Coello, C.C.: Multi-objective particle swarm optimizers: A survey of the state-of-the-art. Int. Jour. Comp. Int. Res. 2(3) (2006) 287{308