Sadly, when I heard about this event last year, it was only a handful of days before the close of the submission period, and I did not have time to join the competition. I was, however, in time to participate as a judge, and seeing all these videos was the final push I needed to start thinking about my entry for the next year.

I am very passionate about making math more accessible and have already been making “mathematical videos” for about three years now. So far, I had mostly created videos to enhance my academic presentations and make some of the concepts I describe easier to understand. These videos are usually short (less than 30 seconds), without sound, and focus on a single specific property. For example, I have a video showing how counting the visits of a Markov chain gives a value close to the stationary measure, or how we can see a transition in the behaviour of particle systems as the temperature increases.

With this pre-existing interest and experience in making videos about mathematical concepts, I saw this competition as a way to challenge myself and to develop new video-editing skills, while also reaching a larger and more diverse audience. For this reason, I wanted to participate in this event the moment I heard about it. I just waited a year before having the opportunity to do so.

While all levels of difficulty are accepted, I personally wanted to create a video whose topic was accessible but whose solution was not easily guessed. Working in probability and being particularly interested in the relationship this field has with many concrete examples, I wanted to talk about a “relatable” random phenomenon. In the end, I decided to cover properties of randomly moving chess pieces, and more precisely the number of moves it takes them to return to their original position when moving around the board.

I chose this topic because I believed it fit my two main goals: accessibility and complexity. Accessibility, since the idea of a chess piece moving around the board in a random way seems simple enough. Complexity, since the question at first appears complicated, without a clear way to tackle it. As a bonus point for choosing this topic, the final solution is actually quite easy to compute and based on a sequence of rather simple ideas.

With the topic chosen, I just had to present it properly.

In general, I would advise anyone who wants to join such a competition to use whatever skill they already have: coding, drawing, using animation software, facing the camera… or any combination of these! I was personally most comfortable with coding animations, so this is the approach I used and the one I will detail now.

My process for making the video can mostly be split into three steps.

- First, I wrote a draft of the video I wanted to make. I spent about a week on this step, to review the story with fresh eyes a couple of times and discuss it with people around me. At the end of this step, I had a document with the text I wanted to read and notes on what should happen in the video at the same time.

- Second, I coded all the different parts of the video. I started this step in parallel with the previous one, as I already knew some of the animations I wanted to make, and spent about two weeks on it. At the end of this step, I had a set of Python scripts that could create all the frames of the video and that contained a lot of parameters to be fine-tuned.

- And third, I recorded myself and edited the video. I spent a couple of days on this step, cleaning and organising the recordings, tuning the code to match the speed of the frames with the pace of my voice, and adding the subtitles. At the end of this step is the final video that I submitted to the competition.

A lot! I would definitely recommend that anyone interested in participating in this competition do so. It was a very fun challenge to create a (longer) video explaining math topics from start to finish. I also now have an example of what I am able to make and can see what my strengths and weaknesses are: I am quite happy with the final quality of the animations and their pacing, but I need to improve on the audio part of the video.

In the coming months, I intend to improve my coding practices for creating such animations and work on the quality of my audio recordings. Hopefully, by next year I will submit an even better entry to the competition, as I already have a possible topic in mind… For now, enjoy the video!

*We will use the standard Monopoly set from the US from 2008 onwards*.

For example, in Monopoly we throw with two dice, not just one! This means that the probability distribution for each turn won’t be uniform anymore. The probability of throwing a 2 is 1/36, by throwing two 1’s. The probability of throwing a 7 is much higher, 6/36 = 1/6, since we can throw 7 in six different ways. Can you list all possible ways to do this?
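The full two-dice distribution is easy to tabulate; here is a small sketch in Python that counts all 36 equally likely ordered rolls:

```python
from collections import Counter
from fractions import Fraction

# Count the 36 equally likely ordered rolls of two dice by their sum
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
probs = {total: Fraction(c, 36) for total, c in counts.items()}

print(probs[2])  # 1/36: only 1+1
print(probs[7])  # 1/6: six ways (1+6, 2+5, 3+4, 4+3, 5+2, 6+1)
```

Exact `Fraction` arithmetic avoids any floating-point rounding, which is convenient when we later want probabilities like 49/576 to come out exactly.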

Now let’s construct the much-desired transition matrix P. There are a total of 40 spaces on a Monopoly board, and we treat the Jail and Just Visiting spaces as the same one. Let’s first construct the first row of P.

This row corresponds to the probability distribution when starting on Go. A first attempt could be the following:

But here we neglect an important feature of Monopoly: the Chance and Community Chest cards. Usually a card orders you to pay a small fee, or maybe receive a prize, but sometimes it tells you to move to another space. We have to take this into account when constructing P.

There are many editions of Monopoly, but we will use the standard Monopoly set from the US from 2008 onwards. In that set, there are nine Chance cards that advance you to another square. This can be anything from *Advance to Go (Collect $200)* to *Go Back 3 Spaces*. There are only two Community Chest cards which move you to another space: *Advance to Go (Collect $200)* and *Go directly to jail, do not pass Go, do not collect $200*. There are a total of 16 Chance cards, and also a total of 16 Community Chest cards.

How does this affect the first row of our matrix? Square 2 is a Community Chest square and Square 7 is a Chance square. These are the only squares we can reach from the Go square where we have to pull a card.

Let’s tackle the Community Chest first. How big is the probability that we end up there if we start from Go? Well, first we have a probability of 1/36 of throwing a 2 and actually landing on the Community Chest. Then we grab a random card from the pile. What happens next? Either we get a card instructing us where to go next, or it tells us to pay something and stay put. There is a 14/16 chance that the card tells us to stay put, a 1/16 chance that we have to go to Jail, and a 1/16 chance that we have to advance to Go. As an exercise, verify these probabilities on your own (you can use this list with all the Community Chest and Chance cards).

What does this mean for our matrix? We have just seen that the probability of landing on Community Chest and then staying there is 1/36 × 14/16 = 14/576. Then we also have a 1/576 chance that we end up in Jail and a 1/576 chance that we get back on Go again. Notice that there was already a 3/36 chance of ending up on the Jail square (by just visiting), so the total probability of ending up on the Jail square becomes 3/36 + 1/576 = 49/576. In the matrix this looks like this:
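This bookkeeping can also be done in code. The sketch below builds the first row of the matrix and then redistributes the Community Chest probability over the three possible card outcomes; the Chance square at position 7 would be treated analogously with its own cards, so it is left untouched here:

```python
from fractions import Fraction

N = 40                      # squares; Jail and Just Visiting are merged (square 10)
GO, CHEST, JAIL = 0, 2, 10

# Probability of throwing each total s with two dice is (6 - |s - 7|) / 36
row0 = [Fraction(0)] * N
for s in range(2, 13):
    row0[GO + s] += Fraction(6 - abs(s - 7), 36)

# Community Chest: 14 of the 16 cards leave you on the square,
# one sends you to Go and one sends you to Jail.
p = row0[CHEST]
row0[CHEST] = p * Fraction(14, 16)
row0[GO] += p * Fraction(1, 16)
row0[JAIL] += p * Fraction(1, 16)

print(row0[JAIL])  # 49/576: 3/36 from the dice plus 1/576 via the card
```

Because the card rules only move probability mass around, the row still sums to 1, as every row of a transition matrix must.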

You might have noticed we are ignoring an exciting rule of Monopoly: throwing doubles! If you throw doubles, you are allowed to throw again, but if you throw doubles three times in a row, you have to go to Jail. We can actually model this in the Markov chain by having three possible states for every square, resulting in a 120 × 120 matrix. We won’t explore this version of the Markov chain, but you can try to write it down yourself!

After calculating all the other rows, we can assemble the full transition matrix P. This is quite some work, but behind all beautiful and interesting things there is always a lot of work and effort. "There is no royal road to geometry", as Euclid said when asked if there is an easier way to master geometry than reading the *Elements*. This article is just the final result of all this work, so don't be deterred by how much work it is to set up such matrices; once you have them, you can do many nice and interesting computations with them. Let's do some computations!

One piece of information that is useful to know is which squares have the highest probability of being visited, not just after one throw, but throughout the entire game. Let’s look at the probability of landing on Go. We know the probability of starting at Go and ending up on Go. We can add to this the probability of going from Mediterranean Avenue to Go. But we also need to add the probability of going from Community Chest to Go. In other words, we need to add all probabilities of going from any square to Go, which is the same as adding up all the probabilities in P in the column under Go. We can plot this in a bar chart, so we can interpret these probabilities a bit more easily:

The probability of landing on the In Jail/Just Visiting square is huge! This is not really surprising though: you never end a turn on the Go To Jail square, so In Jail/Just Visiting is essentially two squares in one. Let's not worry too much about this. Look at **Illinois Avenue**! It is the most visited property on the Monopoly board. Surely that means it’s the best property to buy, right? We will see whether that’s the case in just a moment, but let’s first look at the rest of the graph.

We also see that the expensive **Park Place** has a relatively low probability of being visited, about the same as **Mediterranean Avenue** and **Baltic Avenue**. So which one should we have our eyes on? We need a way to factor in money!

The probabilities of landing on **Park Place** or on either **Mediterranean Avenue** or **Baltic Avenue** are similar, but clearly **Park Place** is more profitable: if somebody lands on it, we get more money. We can calculate the expected income by multiplying the probability of landing on a square by the rent somebody has to pay when they land on it.

And look at that, **Park Place** and **Boardwalk** shoot out as the properties with the most expected income! Other than that, the train stations are also looking really good. This effect disappears when we look at the expected return of *fully developed properties*, which is something we will look at in just a moment.

When someone lands on your Electric Company or Water Works, they pay either 4 or 10 times the value of their dice throw, depending on how many Utility properties you own. To calculate the expected rent per turn, we assume you own both Utilities, so that the 10-times multiplier is always used. What makes calculating the expected return difficult is that it depends on the dice throw of your opponent, which can be different every time.

To solve this problem, we make a table. First we list all possible dice throws that somebody could throw to land on our Utility square. Underneath that, we list the probabilities of throwing these dice values. The probability of throwing a value of 3 is 2/36 = 1/18, since out of all 36 possibilities, there are two ways of throwing a 3: first a 1 and then a 2, or first a 2 and then a 1.

Underneath that we list the amount of money somebody has to pay when they land on your Utility square. This is simply 10 times their dice throw. Then we can calculate the expected rent contribution of each throw by multiplying its probability by the rent to pay. Adding all these values gives an expected rent of $70. Can you calculate the expected rent when you only own one of the Utilities?

Dice value | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |

Probability of value | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |

Rent to pay | $20 | $30 | $40 | $50 | $60 | $70 | $80 | $90 | $100 | $110 | $120 |

Expected rent | $20/36 | $60/36 | $120/36 | $200/36 | $300/36 | $420/36 | $400/36 | $360/36 | $300/36 | $220/36 | $120/36 |

*Table: Amount of money somebody has to pay when they land on your Utility square*
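The table's computation is just a weighted sum over the eleven possible throws, and it can be checked in a couple of lines. As a sanity check, the answer is ten times the expected value of two dice, 10 × 7 = 70:

```python
from fractions import Fraction

# Expected rent on a Utility when you own both (rent = 10 x dice throw)
expected_rent = sum(
    Fraction(6 - abs(v - 7), 36) * 10 * v  # P(throw v) * rent to pay
    for v in range(2, 13)
)
print(expected_rent)  # 70
```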

When another player lands on a street you own, they have to pay you rent. The amount of rent depends on how many houses and hotels you have built on that street. If another player lands on your square with a hotel built on it, great news for you! They have to pay you a lot of money. Since you cannot build more on this street, we say this property is fully developed.

Stations and Utilities can also be fully developed. This just means that you own all four Stations, or both of the Utilities.

It seems that the further along the board you get, the more money you can expect to earn. But the properties also get more expensive to buy the further along the board you go, and so do houses and hotels. A house on **Baltic Avenue** or **Vermont Avenue** costs only $50, while a house on **Pacific Avenue** or **Boardwalk** costs a whopping $200. In other words: we have to take the expenses into account. For this we can use the expected return of a property: the expected income it generates, divided by the total amount invested in it (the purchase price plus the cost of its houses and hotels):

Doing this for all fully developed properties gives us the following histogram:

Suddenly the **Deep Blue** properties don’t seem so attractive anymore! Especially **Park Place**, which has one of the lowest returns of all properties, only ahead of **Mediterranean Avenue**. Out of this a new champion emerges! The orange properties are now looking very attractive. We already saw that the probabilities for these properties were relatively high, and it seems that **New York Avenue** hits the sweet spot when combining this with the lower prices.

Of course, this all holds for fully developed properties, when in reality one rarely builds hotels in the game. How does this compare to all the different stages of development?

The clear standouts here are the **Orange** properties, but we also see a surprising second place for **Light Blue**, especially when building more than 2 houses. The **Green** and **Deep Blue** properties now look like terrible investments! When building more than three houses, the expected return actually goes down. To see this more clearly we can look at the change in return:

What do we see in this histogram? At '0 Houses' we see the same data as in the histogram before: the expected return of all colors. At '1 House', we get the *change* in expected return: how much extra return we can expect if we go from 0 houses to 1 house. Similarly, the bars above '2 Houses' represent how much extra return we can expect if we go from 1 house to 2 houses on a property. In other words, this graph answers the question: how much does the expected return go up if we build another house or hotel?

How can we use these numbers to win Monopoly at our next game night? Go for **Orange** and **Light Blue**; don’t bother with **Brown**, **Green** and **Deep Blue**. And when playing any board game which involves dice or probability, try to see if you can call on Mr. Markov for help!

Lizzie Magie (1866-1948) was an American progressive writer, feminist and board game designer who was dissatisfied with the inequality between men and women in society. She was also a Georgist: somebody who believes that private ownership of nature is not possible; it should belong to everyone, equally. Magie wanted to create a board game that promoted the principles of Georgism, a game she would call 'The Landlord's Game'.

The Landlord's Game is what we now know as Monopoly, but Magie actually included two sets of rules. One set emulated the Georgist system, where players would try to produce equity: wealth was evenly distributed, and winning was a collaborative effort. The other set of rules emulated a capitalist system, with the rules of Monopoly as we know them today: you can only win by crushing your opponents and obtaining a literal monopoly. Magie tried to sell the game to Parker Brothers, but they declined, deeming the game 'too political'. Charles Darrow learned about The Landlord's Game through a friend and sold it to Parker Brothers in the 1930s, including only the capitalist set of rules.

Monopoly has now sold over 275 million units worldwide, making it one of the most popular board games of all time. I am sure that the best strategy to win Monopoly has been discussed and debated on many game nights. Luckily, we can use mathematics to obtain the optimal strategy, so you can use it to beat your family and friends in Monopoly!

But before we explore the optimal strategy of Monopoly, we have to look at another classic: Snakes and Ladders.

Snakes and Ladders is easier to analyze than Monopoly for two reasons. Firstly, it is less complex: there are no Chance or Community Chest cards, and there is no jail. Secondly, Snakes and Ladders is a game of pure chance. There is no strategy involved; the only way to win is simply to be lucky, and that absence of decisions makes the game easier to analyze. This analysis will give us some tools (spoiler: *probabilities*, *matrices* and *Markov chains*) to explore the strategies of Monopoly later on. And in case you are orienting yourself towards a study in a technology- or science-related field, it is good to know that the mathematics we will show is very important and is typically taught in the first year of such studies ;).

Snakes and Ladders (or sometimes Chutes and Ladders) is a game for two or more players played on a 10x10 grid where every square is numbered. Since the two players don’t affect each other’s gameplay, we might as well analyze the game when a single player is playing it.

Since Snakes and Ladders is a game of pure chance, there is no strategy to the game, only hoping. Then what is there to analyze? Well, we can calculate how long we expect the game to go on for: how many turns do you need, on average, to get to square 100 and win Snakes and Ladders? This analysis will give us some tools, which we can later use to determine the best properties to buy in Monopoly!

In Snakes and Ladders we use a single die. This means that every number from 1 to 6 is equally likely to come up when we throw it. Since there are a total of six possible outcomes on a die, they all come up with probability 1/6. We call this a *uniform probability distribution*: all possible outcomes of the die are equally probable.

The game always starts on square 0 (outside the board). This means that after turn 1, Player 1 has a probability of 1/6 to end up on square 1, a probability of 1/6 to end up on square 2, and so on. But there are ladders on some of these squares! There is a ladder from square 1 to square 38, and a ladder from square 4 to square 14. Since we immediately go up a ladder when we land on it, we never end our turn on square 1 or square 4: if we start the game, we actually go to square 38 with probability 1/6 and to square 14 with probability 1/6, and the probabilities to end up on squares 1 and 4 are 0.

We can summarize all this information in one big table: a matrix. This matrix will contain all the probabilities of going from one square to another. In the case of Snakes and Ladders, there are 101 possible positions, since we start on square 0 and end on square 100. This means we need to build a 101 × 101 matrix, where the entries encode the probabilities of going from one square to another. For example, the first row of this matrix will contain all probabilities of going from square 0 to all other squares, which we discussed above. And the second row will contain all probabilities of going from square 1 to all other squares. But since square 1 has a ladder taking us to square 38, we never actually start a turn there: if we hit square 1, we immediately take the ladder to square 38.

0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 14 | ... | 38 | ... | 98 | 99 | 100 | |

0 | 0 | 0 | 1/6 | 1/6 | 0 | 1/6 | 1/6 | ... | 1/6 | ... | 1/6 | ... | 0 | 0 | 0 |

Then, in the next turn, row 38 gives the probabilities for this square:

0 | ... | 39 | 40 | 41 | 42 | 43 | 44 | ... | 99 | 100 | |

38 | 0 | ... | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | ... | 0 | 0 |
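Rows like these can be generated programmatically. The sketch below builds the matrix in Python; note that only the two ladders named in the text are filled in, while a complete board would list every snake and ladder in the `jumps` table:

```python
import numpy as np

# Jump table: ladder start -> ladder end. Only the two ladders named in
# the text are included here; a full board lists every snake and ladder.
jumps = {1: 38, 4: 14}

P = np.zeros((101, 101))            # positions 0 (start) through 100 (finish)
for s in range(100):
    for die in range(1, 7):
        t = s + die
        if t > 100:                 # overshooting square 100 means staying put
            t = s
        t = jumps.get(t, t)         # climb a ladder (or slide down a snake)
        P[s, t] += 1 / 6
P[100, 100] = 1.0                   # the finish is absorbing: the game is over

print(P[0, 38], P[0, 1])  # 1/6 via the ladder, and 0: we never stop on square 1
```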

Similar to ladders you can also end up on a square with a snake, which takes you back to a previous square. Using the game board above, can you fill in the tables for squares 5 and 59?

6 | 7 | 8 | 9 | 10 | 11 | 12 | ... | 10 | 11 | 12 | ... | 98 | 99 | 100 | |

5 | ... | ... |

18 | ... | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | ... | 97 | 98 | 99 | 100 | |

59 | ... | ... |

If we do this for all squares on the board, we get the huge matrix below! One important rule is that to finish the game, you have to throw the exact number needed to land on square 100. If you throw more, you stay where you are. This is why the probability to go from square 99 to square 99 is 5/6. Can you see why the entire row corresponding to square 97 is all zeroes?

*The matrix P contains all probabilities to go from square i to square j. This probability is found at position (i, j).*

This is all nice and interesting, but what can we do with it? The matrix P is called the *transition matrix*, and it represents the probability distribution of a single turn, i.e. the probability of going from any square on the board to any other square. However, in the game we throw the die many times: how do we model that? For people working in disciplines where mathematics plays an important role, matrices are very familiar; they are like generalizations of numbers. An important property of matrices is that we can define an addition and a multiplication on them, just like with numbers. You don't need to know how these work, just that it is possible (but if you are curious, don't hesitate to have a look; it is a little technical but doable). And since multiplication of two matrices can be defined, taking powers of a matrix can be defined as well! These powers give the full solution to how a game will evolve.

It turns out that if we square this matrix, we get the probability distribution of two turns! In general, if we want to know the probability of going from square i to square j in n turns, we can look at the matrix P^n (the n-th power of P), and look at the number in row i and column j.
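To see why squaring works, here is a toy three-square chain (not the real board, just an illustration) where the two-turn probabilities can be checked by hand:

```python
import numpy as np

# Toy chain with squares 0, 1, 2, where square 2 is absorbing.
P = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])

P2 = np.linalg.matrix_power(P, 2)   # probabilities after two turns
# From 0 to 2 in two turns: via square 1 (0.5 * 0.5) or by reaching 2 in
# the first turn and staying (0.5 * 1.0), giving 0.75 in total.
print(P2[0, 2])  # 0.75
```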

How can we use this to calculate the average number of throws needed to finish the game? We know that P represents the probability distribution of the first throw. We can look at the entry corresponding to going from square 0 to square 100: this is the number in the first row and in the last column. Obviously this probability is 0, since we cannot go from square 0 to square 100 in a single throw.

But what about two throws? As we have seen previously, if we square the matrix, we get information about the probability distribution of two turns. We can look up the entry corresponding to going from square 0 to square 100, but it will also be 0, since we cannot go from square 0 to square 100 in just two turns. Maybe you can try to figure out for yourself what the least number of turns is in which a player can complete the game.

We need to keep going: now we have to cube the matrix to see the probability distribution of three turns. After that we also need four turns, and five turns, and six turns. This is an infinite amount of work! Luckily, there’s a neat mathematical trick that can help us.

**Intermezzo - Did Mr Markov play Snakes and Ladders?**

To compute how many rounds a player needs, on average, to reach the final square, I rely on two very important fields within modern mathematics: linear algebra and Markov chains (named after the Russian mathematician Andrey Andreyevich Markov (1856–1922)). Linear algebra is the area of mathematics studying matrices, like the one above. The theory of Markov chains relies heavily on linear algebra and probability theory, but because of the rich ideas it has produced, and its importance in various other fields and applications, it has become a research area of its own. All the results I will use later originate from the field of Markov chains. Although I would find it really exciting to discuss all of these results in detail, I will just mention them, and you will have to believe me that they are true. Markov chains are so interesting that learning more about them is motivation enough to study mathematics! If you are interested, you can have a look at this article written by Nelly Litvak.

Before the intermezzo, we had the matrices P^n, which give the probability that you reach square j starting from square i in n turns. What we want is the expected number of turns to reach square 100 starting from square 0. To find this we need all the probabilities that you reach square 100 in n steps; these are exactly the numbers in row 1 and column 101 of the matrices P^n. We denote these probabilities by p_n.

Mathematicians working on Markov chains have proven that the expected number of turns to reach square 100 starting from square 0 is equal to

(1·p_1 + 2·p_2 + 3·p_3 + ...) / (p_1 + p_2 + p_3 + ...).

Such results carry the name *times to absorption*, if you want to read further. But how can we compute this number? In the denominator we need to add infinitely many probabilities! Here comes a creative idea offered by linear algebra!

First we rephrase the problem slightly. Instead of looking up the correct probability in each of the matrices P, P², P³, and so on, separately, and adding them up, we can also first add all these matrices, and then look up the correct probability! So we need to calculate

P + P² + P³ + ...

and then look at the value in the first row and in the last column. You will wonder: instead of adding up infinitely many probabilities, we are now adding up infinitely many matrices, which is still a lot of work. But watch what happens when we multiply everything by I − P (here I is the identity matrix; don't worry too much about it, think of it as the 1 for matrices: when you multiply a matrix with I you get the same matrix back):

(P + P² + P³ + ...)(I − P) = P + P² + P³ + ... − P² − P³ − ... = P.

With some rearranging, we get

P + P² + P³ + ... = P (I − P)⁻¹.

Again, don't worry too much about what happened here: linear algebra tells us precisely when we can "divide" by a matrix, as we did here. So instead of adding an infinite amount of matrices, we can simply calculate the matrix I − P, invert it, and multiply this inverse with P. Then we look up the entry in the first row and last column of that matrix. Doing this by hand would take a lot of time; luckily for us, computers can do it faster. After crunching the numbers, the computer gives us a value of approximately 39.2. This means that, on average, a game of Snakes and Ladders takes 39.2 throws! Thank you, Mr. Markov!
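Putting everything together, here is a sketch of the whole computation in Python. It assumes the classic Chutes and Ladders layout (the exact board used in this article may place its snakes and ladders differently), and it restricts the matrix to the non-winning squares, which is what makes I − Q invertible:

```python
import numpy as np

# Jump table for the classic Chutes and Ladders board (an assumed layout;
# the board in the article may differ). Ladders go up, chutes go down.
jumps = {1: 38, 4: 14, 9: 31, 21: 42, 28: 84, 36: 44, 51: 67, 71: 91,
         80: 100,                         # ladders
         16: 6, 47: 26, 49: 11, 56: 53, 62: 19, 64: 60, 87: 24,
         93: 73, 95: 75, 98: 78}          # chutes

P = np.zeros((101, 101))
for s in range(100):
    for die in range(1, 7):
        t = s + die
        if t > 100:
            t = s                         # must land on square 100 exactly
        t = jumps.get(t, t)
        P[s, t] += 1 / 6
P[100, 100] = 1.0

Q = P[:100, :100]                         # transitions among non-winning squares
N = np.linalg.inv(np.eye(100) - Q)        # expected number of visits per square
expected_turns = N[0].sum()               # expected turns starting from square 0
print(round(expected_turns, 1))
```

On this assumed layout the result comes out close to the 39.2 quoted above. The matrix N here is what Markov-chain textbooks call the *fundamental matrix*.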

To see how we can use these tools to come up with the best strategy for Monopoly, come back next week!

Maya was looking out of the window as the train departed from Rotterdam Central Station; she let her imagination wander while gazing at the blue sky and the green scenery. Suddenly she thought of her puzzles. She recalled a tricky puzzle from her childhood called the three utilities problem. Her grandmother used to give her such puzzles for fun. She was a mathematician, and every now and then she would share some new discoveries she had made in her research. In this puzzle you have three vacation houses, and each house needs to be provided with water, electricity and gas. For each of these facilities, you have a provider which can supply you with it, but you need to connect your house to the facility source with a tube. And as tubes are quite massive, they can neither be buried under nor lifted above the other ones. So, to solve the puzzle you need to connect each house to each facility with a line, and these lines should not intersect.

In terms of Maya’s diagrams, you have two sets of three points, and each point should be connected to all the points from the other set, but there shouldn't be any connections between points in the same set. Even back then she knew that this puzzle was impossible, but she was never completely sure about it. Maybe now she can dispel her doubts.

**Three utilities problem**

The graph related to this problem is called the *complete bipartite graph* with 3 and 3 vertices. The term bipartite means that you can separate the points of your graph into two sets in such a way that every connection connects points from different sets. In the case of the ‘three utilities problem’, the first set contains the points which denote houses, and the second contains the points which denote utilities. The term *complete* in this case means that every pair of points from different sets is joined by a connection.

Now Maya has 6 points, and she needs to add 9 connections which shouldn't intersect each other. She started looking through her notes from the trip to Rotterdam; she knew she had found a nice result she could maybe use to solve the problem.

"Arghh, where did I write that down? I remember I had found something about the maximum number of connections you can add if there are no intersections. Yes! Here it is, how was it again?"

Maya started reading her previous notes and trying to remember what she had done.

"Right, the amount of connections can't exceed 3 times (the amount of points minus 2). Let's see what we get for this graph. The maximal amount of connections possible is 3 × (6 − 2) = 12, which is larger than the 9 connections I want to add in this case. Hm, weird. This doesn't seem to help. According to this result it could be possible."

Maya was stuck. It seems that the approach she just invented does not work for this case. So she comes back to drawing; maybe there is something that she missed (in the following pictures, all the utilities are marked in blue and all the houses are marked in red).

Indeed, after some attempts Maya realizes that she cannot draw a face with exactly 3 connections around it. And there is a reason behind this. If you follow the boundary of a face, the points which represent houses and facilities should alternate, as the connections link only houses with facilities. With this observation in mind, Maya notices that for this particular diagram, each face needs not 3, but at least 4 connections around it! So the amount of faces cannot exceed half of the amount of connections (try to prove this yourself by following the ideas of our previous article).

"This is much better! In my previous puzzle I found that the amount of faces cannot be larger than 2/3 of the number of connections. Using the additional property that this graph should be bipartite makes this estimate better! Let's see what I get now from Euler's formula."

Using Euler's formula, this means that the maximal amount of connections in this case is at most 2 times (the amount of points minus 2). For 6 points you get 2 × (6 − 2) = 8, which is smaller than the required 9! This provides Maya with the complete argument for why the "three utilities problem" puzzle is unsolvable.
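Maya's two counting bounds are easy to check mechanically. The sketch below (in Python, with hypothetical helper names) evaluates both: the general bound of 3 × (points − 2) connections for any drawable diagram, and the sharper bipartite bound of 2 × (points − 2):

```python
def max_edges_planar(v):
    """General bound from Euler's formula: at most 3(v - 2) connections."""
    return 3 * (v - 2)

def max_edges_bipartite_planar(v):
    """Bipartite bound: every face needs >= 4 connections, giving 2(v - 2)."""
    return 2 * (v - 2)

# Five pairwise-connected points have 10 connections, but only 9 are allowed:
print(max_edges_planar(5))            # 9  -> 10 > 9, impossible to draw
# Three houses and three utilities need 9 connections, but only 8 are allowed:
print(max_edges_bipartite_planar(6))  # 8  -> 9 > 8, impossible to draw
```

Note that these bounds only ever prove impossibility: staying under the bound does not guarantee a diagram can be drawn, as Maya is about to discover.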

However, this second puzzle raised a question: is the recently discovered formula enough to test whether a particular diagram can be represented without intersecting connections? Thinking about this question for a while, Maya faces the main problem. Even if she manages to construct a diagram for which Euler's formula does not lead to a contradiction, she still needs some different argument to be sure that the representation of this diagram is impossible. The solution she finds is quite elegant.

A diagram certainly cannot be represented if at least some part of it cannot. Indeed, knowing that it is impossible to draw five pairwise connected points, you certainly cannot draw six. With this insight in mind, Maya easily manages to find an example of a diagram which is impossible to represent, even though this fact cannot be derived from Euler's formula. You can simply take 6 points, connect five of them to each other, and connect the last one to two of the others.

This diagram contains 6 points, meaning that the maximal possible amount of connections is 3 × (6 − 2) = 12, which is exactly the amount of connections it has: 10 among the five pairwise-connected points, plus the 2 connections of the last point. So Euler's formula gives no contradiction. But as Maya knows, such a diagram is impossible to draw: it contains a part, 5 pairwise connected points, which you cannot draw. This example shows Maya that Euler's formula is not that reliable, but she finds a different rule.

A diagram is impossible to draw if it contains either a diagram of 5 pairwise connected points (the graph of the first puzzle), or a diagram with two sets of three points, each connected to all points from the other set (the graph of the "three utilities problem").

“Is that all?”

The question arises in Maya's head just as Antwerp Central Station is announced.

“Are these really the only two reasons why a diagram cannot be represented without intersections?”

She gets off the train to walk around a little, deep in her thoughts. During her ten-minute stop in Antwerp she tries to imagine different diagrams: maybe she can find a graph that is impossible to draw but does not involve either of the two forbidden patterns. And it seems that she finds one. After getting back on the train, Maya draws the following diagram.

It definitely does not contain either of the two forbidden diagrams Maya has in mind, but it looks so similar to the first one that there seems to be no chance it can be drawn without intersections. Or can it? Looking closely at this diagram, Maya notices that it actually contains the “houses and utilities” diagram, except that some connections are interrupted by an extra point.

This example gives Maya another insight. If one of your points is connected to only two of the others, you can erase this point while saving the drawn connections: the two points which were connected to it then appear to be connected to each other, and the erased point disappears.

As you can see, such an action does not change the picture of the diagram, but it does change the diagram itself. It means that if the initial diagram is possible to draw without intersections, the new one, which appears after erasing some points, is possible to draw as well. And, even more usefully, Maya can combine her two ideas: first delete some connections from the initial diagram, and only after that erase some points.

In this case it means that if the diagram on the left is possible to draw without intersections, then it is possible to draw the diagram on the right without intersections as well. But Maya knows that the diagram she obtains is the three utilities diagram, which cannot be drawn without intersections. Hence, the diagram on the left cannot be pictured in this way either!

This idea gives Maya a new procedure for building non-drawable diagrams. Instead of simply adding points and connections to her forbidden diagrams, Maya can come up with a graph from which, after erasing some points, she can get either the 5 pairwise connected points or the three utilities diagram. But is there a simple way to create such graphs?

Apparently, there is. You simply need to invert the erasing procedure. You may notice that after erasing a point you obtain a simple connection. Meaning, if you want to go backwards, you can choose a connection and interrupt it with a point. And you can in fact perform such an interruption not only once, but many times.

Here you need to be a bit careful, as you should do this only for one connection at a time. Remember that you are allowed to erase only those points which have exactly two connections. If you try to put a point exactly on the intersection of two different connections, such a point would have 4 connections, meaning that you are not allowed to erase it.

Combining these three actions (adding new points, adding new connections, and interrupting connections), Maya manages to build an enormous number of diagrams which are certainly impossible to draw, based only on the two forbidden diagrams.

This thought sounds a bit heavier to Maya than the previous one, but after spending some time and energy she convinces herself of it: at the end of the day, there are only two fundamentally different patterns which prevent you from drawing a diagram; the rest are just their variations. Can there be anything else?

Searching the internet, Maya indeed finds the following fact:

*Every non-planar graph must contain either a subdivision of the complete graph on 5 vertices, or a subdivision of the complete bipartite graph on two sets of 3 vertices.*

This fact is called Kuratowski’s theorem, and it is an extremely remarkable result in the mathematical field of graph theory. In terms of Maya's diagrams it means exactly what she just formulated from her observations: a diagram is impossible to draw without intersections if and only if it either contains an ’interrupted’ diagram of five pairwise connected points, or an ’interrupted’ diagram from the ‘three utilities problem’!

The train arrives at Brussels Central Station. Finally, Maya reaches her destination. She is extremely proud of herself: she has managed to complete not only the journey from Cologne to Rotterdam and then to Brussels, but also the whole path from a simple question about five bordering countries to the complete and definitive answer about the possibility of drawing diagrams without intersections. With this feeling she heads toward the Atomium to enjoy its architecture!

Maya likes travelling. This year she spends her vacation in Germany. She has already been to Berlin and Frankfurt, and now she is in Cologne. Walking around after visiting the monumental Cologne Cathedral and looking for nearby attractions, she sees pictures of the Atomium in Brussels and the Markthal in Rotterdam. They immediately attract her attention. She thinks: why not go there right now? Both these cities are close to each other, and it should not be a problem to go there by train. So, she comes up with a plan: first go from Cologne to Brussels to visit the Atomium, and then from Brussels to Rotterdam for the Markthal.

However, when she comes to the train station, she realizes that all trains to Brussels are cancelled due to strikes. Trying to solve this issue, Maya comes up with an idea: she can do the same trip, but reversed. Going from Cologne first to Rotterdam, and then to Brussels, she fulfils the same goal. And this reverse way is possible because all three countries border each other. Reflecting on this idea, she becomes more ambitious. Maybe next time she can visit four different countries, or five, or even more? But this situation has taught her a lesson. Next time she would like to be as independent as possible of the unreliable cross-border traffic.

If she wants to go from one country to another, her route should not pass through any third country. And this rule should apply not only to her original route, because in that case she could still get stuck, but to all possible travels between the countries. In fact, it means that any two of these countries should share a common border. In this case, even if her initial travel plan is ruined for any reason, she can still change the order of the countries she wants to visit and continue her journey. But first she should buy a ticket to Rotterdam.

Taking her seat in the train, Maya starts scrolling through maps, curious how many such countries she can find. Finding four countries where any two of them share a border does not take much time; they are located right next to her: Germany, France, Belgium and Luxembourg. However, finding five is much more challenging. Convinced that there are no such countries in Europe, Maya proceeds to scan other areas. Though, even after passing through all the other continents, she only manages to find one more example of four countries, namely Brazil, Bolivia, Paraguay and Argentina (fun puzzle: open a map and try this yourself, you will definitely improve your geography skills!). But she can't find five countries, no matter how much she looks. At this point she starts wondering if such a configuration is actually possible. So, she begins drawing.

*Some of Maya's drawings trying to make five countries such that any two of them share a border.*

Time flies, and the train passes by the small German town of Neuss. The more Maya tries, the more complicated her pictures become, but none of them achieves her goal. At some point she decides to simplify her sketches.

“Maybe I don’t need to draw the whole country” - Maya thinks.

“Maybe it is enough just to mark the city I want to visit with a point, and then simply picture all possible routes between them by lines which connect pairs of points. But I need a picture in which no route crosses a third country. So maybe it is enough to draw these connections in such a way that they do not intersect each other?”

*For Germany, Belgium, Luxembourg and France you can see that each pair of them shares a border. Hence you can connect all the capitals so that the connections don't intersect.*

Going through her pictures, Maya notices that indeed, for each of her examples, you can put a point inside each country and draw every connection line exactly through the shared border, so that the lines do not intersect each other.

*By making a graph of the cities, the countries and the connections between them the drawings become simpler.*

Graphs which can be pictured in such a way that their edges do not intersect each other are called *planar graphs*. And, what is much more remarkable, it works the other way around as well: if you can draw a diagram without intersections, you can make a map out of it.

"But I have to be careful when drawing the graph!"

Usually, the particular image of a graph does not play a role. The crucial property of a graph is the way its points are connected to one another. However, to verify whether a graph is planar or not, we should look at its pictures. And it is not enough to look at only one image, because the same graph can be drawn both with and without intersections. It suffices to find one picture without intersections!

Maya starts wondering

"If I want to find five cities, in five neighboring countries, such that every pair of countries shares a border, then I need to draw a graph with five points, where all points are connected to each other, and make sure that no connections intersect!"

This realization excites Maya, who starts working on the puzzle. Using this new insight, Maya's pictures become more readable, but she still keeps failing to draw such a graph. Meanwhile, the train reaches her first stop - Mönchengladbach station.

”Is it actually possible?” - Maya wonders -

”Maybe there is a way to show that, but how…”

Looking at her new diagrams, Maya realizes that the new pictures again look similar to maps. You can again see different countries with their borders, which are now bounded by the connections Maya is drawing.

"Maybe I can just start adding connections one by one and see how I can add them all. I can now connect the purple and the black point without intersecting any other connection. Then I only need to connect the purple and the green points. Aaaah, frustrating! The purple point lies trapped in the red-black-blue face and I can't reach the green point without intersecting one of these three connections. Maybe it is just not possible to do this... Let's start from the beginning and see how it goes."

While adding connections to the graph, Maya realizes there are three numbers that seem to play a role in how the graph looks: the number of points, the number of connections, and the number of faces. A face is a closed area which is surrounded by connections. Faces seem somehow important, since they may render points inaccessible to each other, as with the purple and green points above. Maya starts from a single point, adding points and connections one by one, and writing down how many points, connections, and faces appear in the graph.

"Ok, I am going to build the graph from 1 point and keep adding points and connections to see how this works".

While drawing another diagram, Maya notices that each time she draws a new connection, she either adds a new point or a new face:

So, there must be some relation which ties them together. Sure enough, going through all her diagrams, Maya finds one remarkable property:

"Wow, the sum of the number of points and the number of different faces which appear, including the giant one around the image, is always two more than the total number of connections!"

If we write this down compactly in a formula, it is

*points + faces = connections + 2.*

Puzzle time

At this point we welcome the reader to verify this fact for different planar graphs. To construct a graph, look at some area of the map, for example Europe or South America. For each country, mark its capital. These will be the points of your graph. Then, for each two neighbouring countries, draw a line which connects the respective capitals and passes only through these two countries. Please keep in mind that your connections should not intersect each other. This way you construct a planar graph. Now you can verify the formula by simply counting the points, connections, and faces. Do not forget that the area around your picture counts as a face as well.
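If you prefer counting in code, here is a short Python check of the relation points + faces = connections + 2 on a few hand-counted planar drawings; the triples below are standard examples chosen for illustration, not the article's figures:

```python
# (points, connections, faces) for planar drawings, counting the outer
# area around the picture as a face as well.
examples = [
    (3, 3, 2),    # a triangle: one inner face plus the outer face
    (4, 6, 4),    # four pairwise connected points, drawn without crossings
    (8, 12, 6),   # the corners and edges of a cube, flattened onto the plane
]

for points, connections, faces in examples:
    assert points + faces == connections + 2
print("Euler's formula holds for all examples")
```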

The formula discovered by Maya is called Euler’s formula. We kindly invite our reader to visit this article for more details. You may notice, however, that Euler’s formula presented in the aforementioned article is slightly different from what we discuss here. This difference comes from our convention of considering the area around the diagram as a face. From a visual perspective such a convention is not natural. However, it is natural from a mathematical point of view and will be useful later.

Maya really likes her recent observation. But can it actually help her with her problem? Remember that she is still trying to connect 5 points pairwise without any intersections of the connections. Maybe her recent insight can tell her that 10 connections are too many for 5 points. Scanning her pictures, Maya notices that each face needs at least three connections surrounding it (why is this? Look at the figure above and try to understand why it is true). Moreover, each connection can take part in constructing only two faces. It means that the number of faces cannot be larger than 2/3 of the number of connections. Using this finding in combination with Euler's formula from above, Maya obtains that the number of connections can't exceed 3 times (the number of points minus 2). In particular, for 5 points we can't draw more than 3 × (5 − 2) = 9 connections, while connecting all 5 points pairwise requires 10. So it is indeed impossible to draw 5 points connected to each other without an intersection of these connections!
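The same arithmetic in a few lines of Python (a sketch for checking the numbers, not a proof by itself):

```python
from math import comb

points = 5
needed = comb(points, 2)        # all pairwise connections: 5 choose 2 = 10
max_allowed = 3 * (points - 2)  # the Euler-based bound: 3 * (5 - 2) = 9

print(needed, max_allowed)  # prints: 10 9 -- one connection too many!
```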

"Whaaaat!" - Maya utters in surprise at what she has found.

From the dramatically deteriorating weather, Maya understands that the train has entered the Netherlands. She is slightly dissatisfied with her recent discovery, as she must limit herself to only four countries! On the other hand, she has gained some knowledge, so maybe she can use it on a different occasion. She opens the map again and looks at all these countries, destined to satisfy this mathematical restriction forever. While thinking about her time in Rotterdam and what she will do there, she remembers another puzzle from her childhood which sounded similar.

The train conductor announces the stop at ’s-Hertogenbosch, the birthplace of the famous Dutch painter Hieronymus Bosch. Maya recalls that she saw paintings by Bosch in the Städel Museum in Frankfurt a couple of days ago. But despite the nice memories of the art exhibition, she is still puzzled.

"I don't want to think about it anymore, I am almost there. I will think about that on my way to Brussels!"

Imagine a group of students who just received their points on an exam. Besides knowing a little bit about the grading scheme – only full points are awarded – the only information we have is that the average result was 13.6 points; we don't know anything about the number of students, their individual performances, or the maximum number of possible points for the exam. What conclusions can we draw from this information? For example, could we conclude that there must have been a student that did particularly well?

Of course, the answer depends on what we mean by "doing well", yet we can still infer a little bit about this: At least one student must have scored 14 points or better – otherwise, everyone would have received at most 13 points, and the average score would have been at most 13 points, too.

Something interesting happened here: Usually, if I wanted to convince you that at least one student did well in the exam, then exhibiting for example the best student would suffice (assuming we ignore any concerns about the confidentiality of examination results...). But this is not what we did! Instead, we managed to show that there is a good student – at least, a student who exceeds the average performance of their peers – without needing to find them.

In fact, this observation works with any data set: There is always at least one data point that is at least equal to the average taken over the entire data set. Intuitively, this makes a lot of sense: The average should be somewhere in the middle of all the data, so there should at least be one value above it.
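This observation is easy to check on any finite data set. The numbers below are a made-up illustration that happens to reproduce the 13.6-point average from the exam example:

```python
def max_is_at_least_average(data):
    # At least one data point is >= the average of the whole data set:
    # if every value were strictly below the average, the average of
    # those values would have to be smaller than itself.
    return max(data) >= sum(data) / len(data)

scores = [12, 13, 14, 15, 14]           # hypothetical exam results
assert sum(scores) / len(scores) == 13.6
assert max_is_at_least_average(scores)  # someone scored 14 or better
```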

We want to think about this in more probabilistic terms to obtain a useful mathematical tool. The data will then be the outcome of a random experiment, written as some random variable X. The weighted average over all possible outcomes – weighted according to the probability of each outcome – is the expected value E[X]. In this language, our observation takes the following form:

There is always an outcome of a random variable X that is at least equal to its expected value E[X].

Just like the example at the beginning, about the exam results of the students, this statement guarantees the existence of something above average without actually exhibiting it. In the case of the student group, that might seem like a drawback. But if we want to make mathematical statements, this might even be an advantage! In fact, the statement above is one version of the so-called probabilistic method, which generally states that if a randomly chosen object has a desirable property with positive probability, then there exists an object having the desirable property.

The probabilistic method was pioneered by the Hungarian mathematician Paul Erdős (1913 - 1996), famous for his many contributions to combinatorics and graph theory, and it has since become an important tool in these areas of mathematics.

One of Erdős' earlier applications of the probabilistic method was to a problem in Ramsey theory, an area of mathematics that, roughly speaking, deals with the question of how large we need to make a certain structure to be guaranteed a given smaller structure. In more concrete terms, consider n points, called vertices, and connect each pair of them by a line, called an edge. The resulting object is called the complete graph on n vertices, K_n, where the attribute complete means that all possible edges are present in the graph. If you now colour every edge of a K_n arbitrarily by either red or blue, you might notice that there is a smaller complete graph K_k all of whose edges share the same colour. For a given k, we may ask how large we need to make n to ensure the existence of a monochromatic K_k -- the smallest such n is the Ramsey number R(k).

Let us see a concrete example, say k = 3. Then we ask how large we need to make n to ensure the existence of a monochromatic K_3, which is either a blue triangle or a red triangle. There is a nice mathematical argument showing that for every n larger than or equal to 6, K_n always has a monochromatic triangle. Try this yourself: take an n at least equal to 6, draw n points, and start colouring all the edges either blue or red. You will soon see that you cannot avoid making a monochromatic triangle. Like in the drawing below.

*On the left you see K_6; if you colour all edges either blue or red you will always get a monochromatic triangle. Do you see the two blue triangles on the right?*

You can also puzzle yourself and convince yourself that if you take fewer than 6 vertices, say 3, 4 or 5, then you can find colourings of the edges such that the graph has neither a blue nor a red triangle.
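Both claims are small enough to verify by brute force. The sketch below (illustrative, with each colouring of K_6 encoded in the bits of an integer) checks all 2^15 colourings of K_6 and exhibits a triangle-free colouring of K_5:

```python
from itertools import combinations

def has_mono_triangle(n, colouring):
    # colouring maps each edge (i, j) with i < j to 0 (red) or 1 (blue)
    return any(
        colouring[(a, b)] == colouring[(a, c)] == colouring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

# Every one of the 2**15 red/blue colourings of K_6 has a monochromatic triangle.
edges6 = list(combinations(range(6), 2))
assert all(
    has_mono_triangle(6, {e: (m >> k) & 1 for k, e in enumerate(edges6)})
    for m in range(2 ** 15)
)

# But K_5 can be coloured without one: colour the outer 5-cycle red (0)
# and the inner pentagram blue (1).
pentagon = {(i, (i + 1) % 5) for i in range(5)}
colouring5 = {
    (a, b): 0 if (a, b) in pentagon or (b, a) in pentagon else 1
    for a, b in combinations(range(5), 2)
}
assert not has_mono_triangle(5, colouring5)
print("R(3) = 6 confirmed by brute force")
```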

Determining the values of R(k) is a notoriously difficult question, with only the values for k up to 4 being known (R(3) = 6, for example; if you want to see why, you can have a look at this article), so mathematicians have moved their goalposts. Rather than trying to get the exact values, trying to understand the growth of R(k) is – slightly – more feasible, but still very difficult. Indeed, it was on this question that Erdős in 1947 employed the probabilistic method, to show that R(k) grows at least exponentially. This means that for a given k, there needs to be a red-/blue-colouring of K_n without a monochromatic K_k, where we can choose n as an exponential function of k.

Rather than trying to construct a red-/blue-colouring that avoids monochromatic substructures, Erdős had the fantastic idea to instead consider a random colouring: What happens if we colour each edge blue or red with probability 1/2 each, independently of one another? Since a K_k consists of k(k − 1)/2 edges, all of which need to have the same colour, the probability that k fixed vertices create a monochromatic K_k of a fixed colour is (1/2)^(k(k − 1)/2). One can in turn use this to show that the expected number of monochromatic K_k's in a K_n is given by

(n choose k) · 2 · (1/2)^(k(k − 1)/2).

The additional factor of two in the multiplication comes from the fact that the K_k can have either of the two colours, red or blue.

It can be shown that this expression is strictly smaller than 1 when n is at most exponentially large in k (for instance, when n is at most 2^(k/2)). Thus, we can now use a version of the probabilistic method above (instead of looking for something above average, we want something below): the expected number of monochromatic K_k's is less than 1, hence there is an outcome of the random red-/blue-colouring without a monochromatic K_k, and that is exactly what we wanted to achieve!
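In code, the expectation is a one-liner; the concrete values of n and k below are chosen purely for illustration:

```python
from math import comb

def expected_mono(n, k):
    # Each of the C(n, k) potential K_k's is monochromatic with
    # probability 2 * (1/2) ** C(k, 2) under the uniform random colouring.
    return comb(n, k) * 2 * 0.5 ** comb(k, 2)

print(expected_mono(6, 3))   # prints: 5.0 -- above 1, consistent with R(3) = 6
print(expected_mono(4, 4))   # 0.03125 < 1, so R(4) > 4 (a weak but valid bound)
print(expected_mono(1000, 20) < 1)  # True: even 1000 vertices admit a
                                    # colouring with no monochromatic K_20
```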

Let us look at another fun application of our observation above. This one is a puzzle by Naoki Inaba, popularized in an article by mathematician and puzzle aficionado Peter Winkler.

Assume we have ten dots drawn arbitrarily on a table, and we want to cover them by using 1€-coins. Is this possible? Surely yes, we can take ten coins and place one of them over each of the dots. But what if the coins are not meant to overlap? Then the question is trickier, and it seems very difficult to present a clear answer – the nonoverlapping coins leave gaps in between them and the dots might be positioned in such a way that some of them always end up in such a gap.

But again, we can make use of the probabilistic method -- only, where do we get the probability from? The dots are fixed, so we have to try to cover them at random. However, throwing coins at random onto the table would be too chaotic, and eventually the coins would overlap. Instead, here's how to do it:

Pretend the dots on the table are really points on the plane. Arrange the coins into a hexagonal pattern and extend the pattern infinitely across the entire plane. Here, it is convenient to be a mathematician and replace the coins by abstract "disks", because in practice you may run out of coins. Then, randomly shift the disks while maintaining the orientation of the hexagonal grid. Here, it is inconvenient to be a mathematician, because this is not well-defined: for reasons rooted deep within probability theory, there is no uniform random shift across the plane. However, we can work around this by using the periodicity of the hexagonal grid -- it is enough to shift by less than one hexagon, in whatever direction.

*The hexagonal arrangement of disks used for covering the points. The red arrows indicate by how much we may shift the entire configuration from a centre point.*

Now that we have introduced randomness into the problem, we need to use our observation above. It turns out that the hexagonal arrangement of coins covers a proportion of π/(2√3), or roughly 90.69%, of the plane. After applying the random shift, this means that each point is covered by a coin with a probability of approximately 90.69%. With a little bit of probability theory, it follows that the expected number of points being covered is ten times this probability, roughly 9.069. Since this is larger than 9, there must be some outcome of the random shift that covers more than nine points, and therefore all ten points.
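The arithmetic behind this step is short enough to verify directly (a sketch; the density value is the standard one for the hexagonal circle packing):

```python
from math import pi, sqrt

# A disk of radius r inscribed in a regular hexagon of area 2*sqrt(3)*r**2
# covers pi / (2*sqrt(3)) of the plane in the hexagonal arrangement.
density = pi / (2 * sqrt(3))
print(round(density, 4))     # prints: 0.9069

expected_covered = 10 * density
print(expected_covered > 9)  # prints: True -- some shift covers all 10 dots

# With 11 dots the expectation is only about 9.98 < 10, so this argument
# no longer forces a shift covering all of them.
print(11 * density)
```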

It is now natural to ask whether we can also always cover 11 points. According to a paper by Aloupis, Hearn, Iwasawa and Uehara, the answer is yes. In fact, we can even do it for 12 points, but these cases require more complicated constructions -- the probabilistic method alone is not powerful enough to answer these questions. Can you see why the argument above breaks down when you have 11 points or more?

On the other hand, there are configurations of 45 points that cannot be covered, and it is an open problem if it always works for any number between 13 and 45.

Nonetheless, similar to our observation about the above-average student, the probabilistic method gave us a very elegant approach to two difficult questions, completely avoiding the need to describe a specific mathematical construction. As it turned out, for both the question in Ramsey theory and the problem of covering points, having an outcome above or below the average was good enough to solve the problem!

*Figure 1: the poster for IMAGINARY in Amsterdam. Can you find the mathematics in the poster?*

*Figure 2: Personal project: Life on a cell membrane*.

“After high school I didn’t really know what to do. I was interested in biology, but I liked all scientific topics. My parents convinced me to try medicine, mostly because of its professional perspectives. I started, but it wasn’t my thing. I liked biology; for me, biology and medicine are two sides of the same coin, namely how life works. During my bachelor project I had to think more about my future. I was doubting whether I wanted to go into research, which demands long-term commitment; I liked short-term projects more. During my bachelor project I started making comics and I really liked it, so I decided to start a master in graphic design and science illustration. I fell in love with it! I only wanted to learn more! In the end I did my master thesis on the visualization of proteins.

I was enthusiastic about making comics. I started contacting publishers and newspapers to ask if they would be interested in a collaboration. My first comic was about Xylella, a parasite infecting olive trees in the south of Italy. My second comic, in 2015, was about vaccination, a hot topic. After that I started getting requests.

My first big project was with Donna Moderna, a very popular fashion magazine in Italy. The director wanted to also communicate some scientific discoveries to the readers. She thought that comics would be a less scary form to communicate science. You can create characters, make jokes, or make a funny drawing of some bacteria!”

*Figure 3: Comic commissioned by Donna Moderna magazine (issue n. 27, year XXIX, 1st June 2016) about antibiotic-resistant bacteria. Full story to be found here.*

On the occasion of the 2018 European Girls’ Mathematical Olympiad, the magazine Comics&Science has dedicated an issue to the topic of women in maths. Claudia was asked to write a comic story about the EGMO. She decided to write the story of four young girls going to the Olympiad: Emmy Noether, Ada Lovelace, Sofia Kovalevskaja and Ipazia!

*Figure 4: Emmy Noether, Ada Lovelace and Ipazia preparing for the EGMO.*

“This comic shares bonding moments of students doing mathematics. It was a very touching comic. I remember I started reading a diary of the tutor of a group who went to the Mathematics Olympiad. About the traveling, their daily routines, being there, and participating in the Olympiad. So, I came up with the idea to make the comic as a diary. And since the Olympiad was in Florence the tutor’s name is Dante! I decided to take famous women mathematicians participating in the team because this would make it more appealing for mathematicians, I could also make some inside jokes.

I studied the biographies of these mathematicians a little, and thought about what people with such biographies would do as teenagers participating in the Olympiad in our time. For example, Emmy Noether was not getting paid as a woman doing research; hence her character in the comic is a tutor at her school who is not getting paid. Sofia Kovalevskaja was involved in politics, and so is her character in the comic. Reading about them helped me build the characters and understand their way of thinking.”

“Each medium has its strengths. When you write, you can delve into a lot of details within just one page. In comics, however, you need more space to achieve the same depth in the subject. On the other hand, comics have the advantage of having pictures, which can make difficult concepts easier to understand. Indeed, you have two levels of understanding that work in synergy: the text, and the picture. There are other advantages: comics are perceived as less scary, which may help you reach a broader audience; people can relate to the characters and be involved in the narrative. Finally, comics can be a good way to show the people behind science, including mathematicians.”

“My very first project was about Infinity. I was contacted by the mathematician Bruno Codenotti, who wanted to write a book for teenagers. We decided on a combination of comics followed by more detailed explanations. We spent many hours discussing the content with Bruno: how we wanted to present some concepts, and how the comics and the scientific explanations could strengthen each other. What I realized when working with Bruno is that mathematicians need to be very precise with the language they use. You cannot just play and mix words or make jokes the whole time, because you can communicate something that is wrong.

I think the hardest part with mathematics is when you want to communicate a topic to readers who have no experience with it. You must motivate them to spend time and energy on your explanation. In biology it is different: you may need many details to explain some concepts, but they concern phenomena readers can relate to. Understanding mathematics often needs a lot of background, so you need to be a very good communicator. But the societal impact of such collaborations between scientists and artists can be huge!”

*Figure 5: Left and right: celebrating women in mathematics, middle: Book about infinity.*

“Absolutely! But before starting such a collaboration you need to be clear about what your goal is: do you want to explain the science, or be inspired by it? As an artist I don’t only want to explain science, but also to be inspired by it. When you want to explain science, you need to be very careful not to be misunderstood. When you do art, it is not to explain the science, but to get a feeling from it.

We have in Cambridge a version of “Pint of Science” (hosted at nine locations in the Netherlands), an international project where scientists are asked to give short presentations about their research in their local pub, bar, café or public space. In Cambridge they have paired this event with another cool exhibition called “Creative Reactions”, where volunteers from the arts are paired with the scientists and create a piece of work inspired by the science. And you see many different results, so the outcomes depend on how they react to each other. In a collaboration both sides should be open about how the other one will interpret a piece of work, so there should be room for creativity. Sometimes mathematicians expect a precise representation of their work, but the funny thing is that when working with an artist you can get something which is totally opposite to what you expect! I really love this interplay!”

*All the illustrations are taken from Claudia's personal website. If you want to reuse any of these illustrations you need her consent.*

One of the main building blocks of modern AI tools are *artificial neural networks*, abstract models inspired by the structure and functions of biological neural networks. These artificial neural networks enable machines to be trained and to learn. But do they actually learn? Or do they just imitate something that looks like learning? This is a question that has sparked many debates and discussions. In this article, I will discuss some thoughts on this topic and present my viewpoints on this question.

Neural networks and their associated field of Deep Learning are ubiquitous in today’s tech sector. Due to dazzling breakthroughs in recent years, such as beating the world’s best Go player, solving the decades-long protein folding problem, and of course the introduction of ChatGPT, Artificial Intelligence (AI) has become a household term in society at large as well. Progress appears to be dizzying, with the world’s largest tech companies pumping vast amounts of resources into the sector. Therefore, the following question arises: what does the future hold? Are current architectures, such as those powering today’s Large Language Models (LLMs) like GPT-4, sufficient to build truly intelligent machines?

In order to answer this question, it is worth probing these architectures to determine potential pitfalls and limitations. For example, despite impressive results, machine learning models are still highly dependent on the data they are trained on, relying on large numbers of previously observed examples. The training process, and hence the model’s capabilities, are limited accordingly. This results in both a lack of flexibility in new problem settings and a reproduction of biases and patterns present in the data used to train the system. You can find many articles about these problems, for example, this article on the website of the United Nations Development Programme about how stereotypes are inherited by AI systems like DALL·E. It is questionable whether a system that isn’t able to change itself in response to changing conditions can truly be called intelligent. Therefore, the study of *adaptive* systems, both natural and artificial, could provide the solution.

A field of science that concerns itself with the study of adaptive systems is known as *complexity theory*. More specifically, it aims to understand phenomena exhibited by systems of many interacting components. Such phenomena comprise self-organisation, emergence, and adaptive behaviours, which can be summarised as *collective intelligence*. The power of such systems is nicely illustrated by ant colony behaviour.

*Left: Trajan’s Bridge at Alcantara, built in 106 AD by the Romans. Right: Army ants forming a bridge. Want to read more about ants building bridges, and see a video of them doing it? Have a look at this article.*

Before diving further into the topic you can have a look at this short, nice, and relaxing video about emergence, made by Kurzgesagt.

Through the coordination of a large number of individuals, ants are able to perform a variety of tasks. One such task is the construction of bridges for terrain traversal. When compared to the stone (or steel) bridges built by humans, ant bridges exhibit remarkable flexibility. The width and length of the bridge can be adapted to the gap that needs to be traversed. Human bridges, on the other hand, are built for one specific location only. If it were to be taken out of its context and placed elsewhere, the bridge could no longer fulfill its intended purpose. Current deep learning architectures resemble stone bridges in this regard, rather than ant bridges. Once trained, the learned parameters of the neural network remain statically fixed. This rigidity leads to a lack of robustness, leading to failure in the face of changing data. For example, an artificial agent trained on data from a given video game will fail at that game if a small number of pixels are modified during playtime. This has been investigated by researchers from the University of Singapore, who analyzed various methods of tricking AI models. They have shown that even changing a few pixels can lead to mistakes.

Furthermore, a neural network has a very strict expectation of input structure. If it is trained on a certain number of inputs, pixels in an image for example, it cannot take a different number of pixels as input without first being re-trained. A bridge formed by ants, on the other hand, has much less rigid assumptions about the environment it perceives and acts within. To overcome this shortcoming we need to *bridge* the gap between artificial intelligence and collective intelligence. Here again, the field of complexity science offers adequate tools. This intersection of disciplines could be referred to as AI/CI (not to be confused with the famous Australian rock band).

Rather than employing real-world biological systems, such as ant colonies, we make use of simulated systems within virtual environments. Complex phenomena exhibited by collections of biological agents can often be captured by relatively simple rulesets in a simulated setting. For example, bird flocking behaviour has been simulated using boids.

Three simple rules govern the behaviour of each virtual bird:

- Separation: Avoid collision with other flockmates
- Alignment: Steer towards the average direction of local flockmates
- Cohesion: Steer towards the average position of local flockmates

This is sufficient for the system to exhibit the *emergence* of flocking behaviour, where large numbers of individuals are coordinated in 2-dimensional or 3-dimensional space.
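The three rules above can be sketched in a few dozen lines of code. The following is a minimal simulation, not any particular published implementation; the perception radius, rule weights, and time step are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 50          # number of boids
RADIUS = 2.0    # perception radius defining "local flockmates"

pos = rng.uniform(0, 10, size=(N, 2))   # positions in a 10x10 plane
vel = rng.normal(0, 1, size=(N, 2))     # initial velocities

def step(pos, vel, dt=0.1, w_sep=0.05, w_ali=0.05, w_coh=0.01):
    """One update applying the three boids rules to every bird."""
    new_vel = vel.copy()
    for i in range(N):
        diff = pos - pos[i]                      # vectors towards the others
        dist = np.linalg.norm(diff, axis=1)
        mates = (dist < RADIUS) & (dist > 0)     # local flockmates
        if not mates.any():
            continue
        # Separation: steer away from flockmates that are too close
        close = mates & (dist < RADIUS / 2)
        new_vel[i] -= w_sep * diff[close].sum(axis=0)
        # Alignment: steer towards the average direction of flockmates
        new_vel[i] += w_ali * (vel[mates].mean(axis=0) - vel[i])
        # Cohesion: steer towards the average position of flockmates
        new_vel[i] += w_coh * diff[mates].mean(axis=0)
    return pos + dt * new_vel, new_vel

for _ in range(100):
    pos, vel = step(pos, vel)
```

Notice that no rule mentions a flock: each bird only looks at its own neighbourhood, and the global pattern emerges from those purely local updates.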

*Left: Photograph of real birds exhibiting flocking. Right: 2D boids simulation. Image taken from Le Monde du PC.*

While this is an intuitive introduction to simulated collective intelligence, there is another example that is even simpler, yet it allows the creation of fascinating patterns. Imagine a field of squares, like the checkerboard of a chess game. Think of a chess board where the colours are not ordered, but any square could be black or white. At the same time, the squares can change colour. Just like the behaviour of individual birds in our previous example, the colour of each square is controlled by simple rules. More specifically, the rules tell the squares whether they should flip from black to white, or vice versa, based on the surroundings of each square. I hear you say “Wait a minute, I thought this was supposed to be simple!” Okay, here’s an example. The rule book says:

- If a white square has fewer than 2 white neighbouring squares, it will turn black.
- If it has 2 or 3 neighbours that are white, it will stay white.
- If a white square has more than 3 white neighbours, it will turn black.
- If exactly 3 white squares surround a black one, the black one will turn white.

*John Conway’s Game of Life by Sam Twidale. GIF used with explicit permission from the author.*

Also simply known as *Life*, its name is inspired by the fact that the squares, or *cells*, can live, reproduce, and die. Therefore, white and black represent life and death, or more simply on and off.

The rules that govern how a cell responds to its neighbourhood, such as the ones outlined above, are called *update rules*. Given these rules, we can begin with any pattern of cells. From this starting point, the grid will start changing the values of its cells, some flipping on and some flipping off. This changes the starting pattern, so that each cell is once again in a new neighbourhood, and so on for as long as the simulation runs. At each time step, the *local neighbourhood* determines the on/off patterns of the grid, evolving the system over time. This is an example of emergent behaviour arising from the local interactions of cells, determined by the update rules. Such emergent behaviour includes the formation of large patterns that persist in time, known in the community as “spaceships” or “demonoids”, among many others. Enthusiasts are actively exploring all the possible “life forms” existing in this virtual microcosm. Here is an interactive example of a spaceship!
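The whole rule book fits in one function. Here is a short sketch of the update rule, with 1 for a white (live) cell and 0 for a black (dead) one; the grid wraps around at the edges, which is a common simplification. Starting it from a “glider”, the most famous spaceship, shows the pattern crawling diagonally across the grid:

```python
import numpy as np

def life_step(grid):
    """One Game of Life update. grid is a 2D array of 0s (dead) and 1s (live)."""
    # Count the 8 neighbours of every cell by summing shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A live cell survives with 2 or 3 live neighbours;
    # a dead cell comes alive with exactly 3 live neighbours.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

# Place a glider on an 8x8 grid.
grid = np.zeros((8, 8), dtype=int)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[y, x] = 1
start = grid.copy()

# After 4 steps the glider reappears, shifted one cell down and one right.
for _ in range(4):
    grid = life_step(grid)
```

Four lines of logic, and yet this system is known to support self-copying patterns and even full computation.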

Cellular automata are key in helping researchers understand the complexity of the real world. Take the example of ants forming a bridge. Each ant on its own cannot see the full picture of the bridge, it is only aware of what its neighboring ants are doing. In this sense, its neighborhood controls how the ant will behave. As a collective, the ants are able to create an emergent, functional structure, much like the spaceships of Conway’s Game of Life.

The origins of deep learning trace back to the first half of the 20th century, when neuroscientist Warren McCulloch and mathematician Walter Pitts conceptualised a simplified model of the human neuron: the McCulloch-Pitts neuron. This architecture was first implemented in 1958 by Frank Rosenblatt, an American psychologist. It seems fitting that neural networks, the second component of our AI/CI dichotomy, were also inspired by a complex biological system from the very beginning.

Rosenblatt’s Mark I Perceptron was initially designed to function as a “photo-interpreter”, foreshadowing the image recognition capabilities of modern neural networks. However, it was not until the 2010s that neural networks, inspired by such early architectures as the Perceptron, truly entered the mainstream following significant enhancements in complexity and performance. These models process inputs through a series of neural activation patterns, culminating in an output specific to the task—for example, identifying a dog in an image and labeling it accordingly. The learned weights in a neural network encapsulate the 'concept' of an image, which is understood through low-level features such as lines, edges, textures, and colors.

In certain advanced generative models, such as the Generative Adversarial Networks (GANs) introduced by computer scientist Ian Goodfellow and colleagues in their seminal article Generative Adversarial Nets, once a network has been trained on such concepts, it is possible to run the model in reverse to generate images from labels, effectively synthesizing new visuals based on learned data. This generative component has a strong connection to our following example, where we explore biological *re*generation, bringing together collective intelligence and artificial intelligence.

Now that we have introduced both the AI and the CI, it is time to combine them in a meaningful way. Let’s consider a scenario where artificial intelligence converges with biological growth and regeneration, introduced by Mordvintsev et al. We revisit the realm of images. It isn’t too much of a stretch to imagine a digital picture as a grid of cells, formed by the individual pixels of the image. This should ring a bell, reminding us of cellular automata, also acting on grids. Recall the on/off state switches flickering across the grid through time. Now, instead of simple on/off values for the cells, each pixel is composed of three primary colour channels: red, green and blue. For a given pixel, each of the three channels has a certain value between zero and one (a digital image is simply a grid of many pixels, each of which has differing values for the three primary colours).

For example, a low-resolution drawing of a gecko on a white background has a distinct shape and colour and can be captured with relatively few pixels. This is no ordinary picture of a gecko, however. It is a sort of digital life-form, since its pixels, or cells, belong to a special kind of cellular automaton. This means that if something happens to a part of the gecko, changing the values of some pixels on one of its body parts, the neighboring pixels will also react to this change. For example, introducing a “wound” to the gecko by destroying several pixels will lead to a regeneration of the damaged area, based on a correct set of update rules and the local neighborhood interactions. The twist is that in this case, the update rules are *learned* by a neural network to achieve the desired image. This regenerative behaviour based on collective, localised information is a prime example of the adaptive behaviour of collective systems. It demonstrates how ideas from both complexity science and deep learning can be symbiotic, allowing for novel methods with unprecedented capabilities. An interactive demo of the neural cellular automaton exhibiting regeneration can be tried here.

*Left to right: Image of a gecko regenerating as a neural cellular automaton. Image taken from Distill (an online academic journal in the area of Machine Learning).*

Another exciting intersection of the two disciplines comes in the form of *multi-agent reinforcement learning*. The field of reinforcement learning (RL) is dedicated to training artificial agents that learn via interactions with their environment in virtual “playgrounds”. The agents receive feedback in the form of rewards and penalties in response to correctly and incorrectly performed actions, respectively. Typically, single or few agents are trained within the context of a single simulation, so some form of limited interaction is possible. However, traditional reinforcement learning doesn’t simulate enough agents for collective intelligence, such as ant colony coordination, to emerge. Some more recent works have explored multi-agent RL settings containing vast numbers of agents. For example, Lianmin Zheng, a PhD researcher from UC Berkeley, introduced MAgent. MAgent allows for the simulation of hundreds to millions of agents within a single environment, allowing an in-depth exploration of complex social behaviors such as the emergence of societal hierarchies and languages. This vast number is critical for studying phenomena that only emerge from large-scale agent interactions. You can watch this nice video to see how MAgent works.

In MAgent, each agent operates within a highly configurable grid-world environment where they can move, interact, and evolve based on the actions of their neighbors. An important difference to the grids of cellular automata is that the agents live *on* the grid, freely moving around it. The platform supports a range of scenarios from cooperative tasks where agents must work together to achieve common goals, to competitive settings where they fight for limited resources. The real-time, interactive nature of MAgent allows users to directly observe and tweak these interactions, providing a powerful tool for studying the collective behaviors of these micro-societies.

We have touched upon the profound implications of merging collective intelligence with traditional deep learning frameworks. The rigid, static nature of current deep learning architectures, much like the unyielding Roman bridges, limits their application in dynamic, real-world environments. By integrating principles of self-organization and adaptability observed in natural systems—exemplified by ant bridges and flock behaviours—we move towards more versatile and resilient AI systems. This includes models that can self-regulate, self-repair, and autonomously improve over time through interactions with their environment. In doing so, artificial systems may start to closely resemble the complexity and adaptability of natural systems. There is a lot to learn from systems of collective intelligence, so let’s help our machines learn from them as well!

This article was based on the original post Marcello wrote for his blog.

The scientific world would not be what it is today without the normal distribution. It is the foundation of many statistical models for several good reasons. Most importantly, it appears commonly in nature. For instance, if you collect height data from people at your workplace or school and create a histogram, you will likely observe the familiar bell curve. This is because human characteristics, such as weight and height, follow a normal distribution, like many other natural phenomena. One could even go so far as to call the normal distribution nature’s default pattern of randomness.

But this is only the tip of the iceberg. For example, roll a die many times and keep track of the average number of sixes you have rolled. In the beginning, your results might appear quite chaotic: three sixes in a row and then none for a long time… However, as you continue rolling, a familiar pattern emerges; the distribution of the average starts to resemble a normal distribution. This phenomenon turns out to be quite universal, as the shape of the original data typically does not matter; if you add up enough of it, the result starts looking like a normal distribution.

This principle is known as the Central Limit Theorem (CLT), and it is responsible for the widespread use of normal distributions in many models. So, is this it? Is the normal distribution all you need to remember from your statistics course? Definitely not! What is more, things can go terribly wrong when we assume that something is normal, when it is not. Let me show you that this is true with one simple example.
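You can watch the CLT at work in a few lines of code. The sketch below repeats the die experiment many times and checks the averages against the CLT's prediction; the number of rolls and experiments are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n_rolls = 1000          # rolls of the die per experiment
n_experiments = 5000    # how many times we repeat the experiment

# Each row is one experiment; each entry is a die roll from 1 to 6.
rolls = rng.integers(1, 7, size=(n_experiments, n_rolls))
averages = rolls.mean(axis=1)       # one average per experiment

# The CLT predicts these averages are approximately normal with
# mean 3.5 and standard deviation sqrt(35/12) / sqrt(n_rolls) ~ 0.054.
print(averages.mean())
print(averages.std())
```

A histogram of `averages` would show the familiar bell curve, even though a single die roll is as far from bell-shaped as a distribution can be.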

You have probably heard about the financial crisis in 2008. Up until that point, pricing and risk models in finance, such as the Black-Scholes model, relied heavily on the assumption of a normal distribution. However, this assumption was more often than not violated in practice. In consequence, the models were underestimating the risk of extreme or rare events which is commonly called the tail risk.

This means that people and financial institutions did not fully realize how bad things could get if an extreme event happened, and therefore were not prepared or insured against it. Although the 2008 crisis was years in the making and already in 2006 we could observe its first effects on the U.S. housing market, it is the fall of Wall Street bank Lehman Brothers in September 2008, the largest bankruptcy in U.S. history, that tipped the scales.

Chaos ensued as everyone abruptly began losing huge amounts of money, leading to a worldwide financial crisis.

While the causes of the crisis were numerous and complex, it is the underestimation of tail risk and improper risk management that can be considered the primary culprits. So, how do we properly account for the tail risk? Heavy tails can help with this! Whether this is a totally new territory for you or if you are already versed in heavy tails but seeking an engaging read, join me on this journey where we (re)discover heavy tails and some of their magical properties and applications.

Heavy tails, or more precisely *heavy-tailed distributions*, represent a type of data where the likelihood of extreme events is greater compared to more common distributions, such as the normal or the exponential distribution. They are used to describe situations where rare or unusual things occur more often than you would think. For example, earthquake magnitudes are heavy-tailed. This means that small earthquakes occur frequently, almost continuously, and typically we do not even notice them. But, once in a while, an extreme earthquake happens, like the Japan earthquake in 2011 or the Indian Ocean earthquake in 2004. Prediction models based on the normal distribution would deem an earthquake on such a scale almost impossible, while two of them already happened in this century, causing hundreds of thousands of casualties. This shows that heavy tails are crucial to properly understand and model the risk of extreme earthquakes.

The name heavy tail comes from a visual representation of the distributions (see Figure 1 below). Here we compare the right tails of an exponential, normal, and Pareto distribution — the most famous example of a heavy-tailed distribution. The line corresponding to the Pareto distribution is the highest for large values of x, indicating that the probability of a very large data point or event is higher than for the other two distributions.

*Figure 1: Tail comparison of Pareto, Normal and Exponential distribution.*
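You can reproduce the comparison behind Figure 1 numerically by evaluating the three tail probabilities P(X > x) directly. The parameter choices below (Pareto with tail index 2 and scale 1, unit-rate exponential, standard normal) are illustrative.

```python
import math

def pareto_tail(x, alpha=2.0):
    """P(X > x) for a Pareto distribution with scale 1 and tail index alpha."""
    return x ** (-alpha)

def exp_tail(x, rate=1.0):
    """P(X > x) for an exponential distribution."""
    return math.exp(-rate * x)

def normal_tail(x):
    """P(X > x) for a standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for x in (2, 5, 10):
    print(x, pareto_tail(x), exp_tail(x), normal_tail(x))
```

At x = 10 the Pareto tail is still a one-in-a-hundred event, the exponential tail is already down to a few in a hundred thousand, and the normal tail is astronomically small — which is exactly why a normal model declares large events "impossible".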

Heavy tails may seem mysterious simply because they are less known. In the early evolution of probability theory, the focus primarily rested on the elegance of normal distributions and their widespread applicability. It wasn’t until the 20th century that scientists like Vilfredo Pareto and Paul Lévy began arguing for the existence of distributions with heavier tails. However, this was not enough to convince the scientific world to depart from the comforts of the normal distribution and venture towards the unknown heavy tails.

For many years, heavy-tailed theory was studied only by a few and considered more as a mathematical curiosity rather than a tool that is useful in practice. People simply were not convinced that such a high likelihood of extreme events could be true in real life. However, nature has mysterious ways of surprising us and this holds true for heavy tails as well. With increasing digitization, we became more and more capable of collecting and analyzing data and suddenly we realized that there is an entire world of heavy tails beyond the "bell curve" and that examples of heavy tails are found all around us. To name a few, the following can be heavy-tailed:

- Natural disasters such as magnitudes of earthquake distributions;
- City sizes;
- Packet sizes in Internet traffic;
- Insurance claim sizes;
- Number of connections in real-world networks;
- Sizes of disease outbreaks.

With these new findings, we began adapting our mathematical models to reflect the possibility of rare events that many of the classical approaches ignored. Unfortunately, this revolution has been primarily driven by catastrophic events like the financial crisis, but better late than never!

Another reason why, for a long time, people did not believe in the occurrence of heavy tails in practice is their somewhat unorthodox properties. For example, a heavy-tailed distribution can have an infinite variance, or even an infinite mean. This is problematic for a couple of reasons. First, classical statistics revolves around averages and variances. We use them to describe and compare data in a meaningful way, perform hypothesis testing, etc. However, if the mean or the variance is infinite, none of these methods can be applied. Second, imagine that some natural phenomenon has an infinite mean; think for example of earthquake distributions. If you collect a sample, no matter how large, you will be able to compute its average value and it will always be finite. This is a bit counterintuitive and makes the estimation of heavy-tailed phenomena less straightforward than that of light-tailed (not heavy-tailed) phenomena.
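This counterintuitive behaviour is easy to observe in simulation. The sketch below draws from a Pareto distribution with tail index 0.8, which has an infinite theoretical mean (the index and sample sizes are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.8     # tail index below 1 means the theoretical mean is infinite

# Inverse-transform sampling: if U ~ Uniform(0,1), then U**(-1/alpha)
# follows a Pareto distribution with scale 1 and tail index alpha.
sample = rng.uniform(size=1_000_000) ** (-1 / alpha)

# Every finite sample still has a perfectly finite average...
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, sample[:n].mean())
# ...but instead of settling down as n grows (as the law of large numbers
# would guarantee for a finite mean), the running averages keep drifting
# upwards, driven by ever-larger extreme observations.
```

No matter how much data you collect, the sample average exists, yet it estimates a quantity that does not.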

What about the Central Limit Theorem that “magically” transforms distributions into a normal distribution? Can we use it to make some sense out of the heavy-tailed distributions? Yes! … and no. The CLT requires the variance to be finite, and, as we know by now, not all heavy tails have that. However, it does not mean that there is no regularity to these heavy-tailed distributions. Instead of converging to a normal distribution, their sums converge to another heavy-tailed distribution with infinite variance (a so-called stable distribution).

Yet another non-conforming feature of heavy tails becomes evident when examining Figure 2. There, we took a sample of 1000 data points and, for each value n on the x-axis, we plotted the sum of the first n data points in our set. Mathematically, we would call this object a random walk, which is an extremely useful model for analyzing time-dependent processes such as the movement of particles or stock market prices.

*Figure 2: Different behavior of random walks with heavy- and light-tailed increments.*

Looking back at the graph, if the sample comes from a light-tailed distribution, we could approximate the plot with a straight line. However, for the Pareto case (which is a heavy-tailed distribution), we observe visible jumps, caused by extreme events. In this case, a straight-line approximation no longer seems like a good idea. It seems that some distributions just do not want to conform and there is nothing more we can do other than accept them as they are. But that is alright, because, as it turns out, some of their properties are intuitive, well-understood, and can make analysis quite simple.
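A quick experiment in the spirit of Figure 2 makes the contrast concrete. We build two random walks of 1000 steps, one with exponential (light-tailed) increments and one with Pareto (heavy-tailed) increments, and measure how large the single biggest step is relative to the final value of the walk. The distribution parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

light = rng.exponential(scale=1.0, size=n)     # light-tailed increments
heavy = rng.uniform(size=n) ** (-1 / 1.1)      # Pareto increments, tail index 1.1

light_walk = np.cumsum(light)                  # running sums = the random walks
heavy_walk = np.cumsum(heavy)

# Share of the whole walk contributed by its single largest step:
light_share = light.max() / light_walk[-1]     # tiny: no single step stands out
heavy_share = heavy.max() / heavy_walk[-1]     # sizeable: one jump is visible
print(light_share, heavy_share)
```

For the light-tailed walk, every step is a small, forgettable contribution and the path hugs a straight line; for the heavy-tailed walk, a single increment can account for a visible fraction of the entire trajectory, which is exactly the jump you see in the figure.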

A well-known and intuitive characteristic principle of heavy tails is the catastrophe principle. Let me explain it using an example. If the total wealth of people in a train is a few million dollars, then most likely you are traveling with one millionaire while the rest of the passengers have average wealth. This is because wealth distribution is typically heavy-tailed. This example can be generalized to a *catastrophe principle*, which tells us that if a sum of heavy-tailed data points is large, it is most likely due to one data point being extremely large, a.k.a. a catastrophe.

Now, imagine that you travel in a train and the average height of the passengers is more than two meters. Does that mean that you are traveling with a 10-meter-tall giant? Probably not! It is more likely that you travel with a basketball team where all players are exceptionally tall. This is an example of a *conspiracy principle*, as all data points in your height sample “conspired” by having an above-average height. Height distribution is light-tailed because extremely tall people do not occur due to biological constraints. This is why the conspiracy principle applies in this case. These two examples illustrate fundamental differences in the behavior of heavy-tailed distributions, as opposed to the light-tailed distributions we are more familiar with.

The intuition that comes from the catastrophe principle is extremely useful when analyzing processes related to heavy tails and especially their extrema. For example, imagine a supermarket queue where the number of items each customer buys is heavy-tailed. We are interested in the probability of a very large waiting time. Although infinitely many different scenarios could lead to this, we only need to care about one! Most likely the large waiting time is caused by only one extreme event, for example, a customer who decided to stock up for the entire year. This is where the beauty of heavy tails lies: using the catastrophe principle we can bring down a complex problem to the analysis of a single instance that is tractable. But there is so much more.
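The catastrophe and conspiracy principles can also be checked empirically. The sketch below conditions on the total of a sample being unusually large, and then asks what fraction of that total came from the single biggest data point; the tail index, sample sizes, and quantile threshold are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 100, 50_000     # n data points per sample, many samples

def max_share_given_large_sum(samples, quantile=0.999):
    """Among samples whose total is unusually large, the average fraction
    of the total contributed by the single largest data point."""
    sums = samples.sum(axis=1)
    big = sums > np.quantile(sums, quantile)    # condition on a "large" total
    return (samples[big].max(axis=1) / sums[big]).mean()

heavy = rng.uniform(size=(trials, n)) ** (-1 / 1.5)   # Pareto, tail index 1.5
light = rng.exponential(size=(trials, n))             # light-tailed comparison

heavy_share = max_share_given_large_sum(heavy)   # close to 1: one catastrophe
light_share = max_share_given_large_sum(light)   # small: a conspiracy of many
print(heavy_share, light_share)
```

When the heavy-tailed total is large, one "millionaire" dominates it; when the light-tailed total is large, it is because all the data points conspired to be slightly above average.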

Over the years, this idea has been polished and perfected, resulting in theorems for heavy-tailed processes which allow us to understand more and more complex heavy-tailed problems. The recently published mathematics book *The Fundamentals of Heavy Tails* provides a comprehensive account of the properties, emergence, and estimation of heavy tails. This book can help you navigate through the world of heavy tails and reveal other properties that could not be covered in this text.

To sum up, through this blog, I aim to show that there is more to statistics than the familiar bell curve and other light-tailed distributions. Heavy tails are prevalent, and they adhere to non-standard yet intuitive principles. What is more, things can go very wrong when we ignore tail behavior. So, the next time you stumble upon heavy tails, do not ignore them; embrace them. Despite appearances, they turn out to be quite tamable.
