
Crowd wisdom solves a maze.

As I was reading through the book The Wisdom of Crowds by James Surowiecki, I came across a fairly straightforward example meant to show that crowd-averaged decisions can be better than every individual decision taken separately.

The actual experiment was performed by Norman L. Johnson, a researcher at Los Alamos National Laboratory. Johnson ran several computer simulations of agents in a maze, trying to find a path between an entrance and an exit. The agents used simple strategies to solve the maze, which are not detailed in the book. Regardless, after running the agents, Johnson took all the recorded trajectories and averaged their decisions in order to produce an "average" crowd path. Surprisingly, the crowd path was better (in terms of length) than any individual path recorded, so much better that it matched the optimal shortest path, which no single agent had found.

Because I found this result somewhat intriguing, I wondered if I could replicate it. As the setup is fairly simple, I will describe one version in this post and see whether I observe the same results.

Building the maze

First, we need a maze. Many maze-generation algorithms exist; I will opt for a version of randomized depth-first search. The algorithm iteratively builds a graph in a depth-first fashion, randomly choosing a new node among the unvisited neighbors until none is available (either because the maze bounds are reached or because all neighbors have already been visited). For a 2D maze, each node describes a position in the maze, while the edges connect positions to form corridors.

Below is my Python implementation, using a networkx graph to store and represent the resulting maze.

import random
from collections import deque

import networkx as nx

def randomized_dfs(width, height, init_cell):
    # Iterative randomized depth-first search over the grid cells.
    # Cells are 1-indexed tuples; the first coordinate is bounded by
    # `height` and the second by `width`.
    G = nx.Graph()
    visited = set()
    stack = deque()
    G.add_node(init_cell, pos=init_cell)
    visited.add(init_cell)
    stack.append(init_cell)
    while len(stack) > 0:
        current_cell = stack.pop()
        r, c = current_cell
        G.add_node(current_cell, pos=current_cell)
        # Collect the in-bounds neighbors of the current cell.
        neigh = []
        if r+1 <= height: neigh.append((r+1, c))
        if r-1 > 0: neigh.append((r-1, c))
        if c+1 <= width: neigh.append((r, c+1))
        if c-1 > 0: neigh.append((r, c-1))
        if any(n not in visited for n in neigh):
            # Keep the current cell on the stack so we can backtrack to it,
            # then carve a corridor to a randomly chosen unvisited neighbor.
            stack.append(current_cell)
            next_cell = random.choice([n for n in neigh if n not in visited])
            G.add_node(next_cell, pos=next_cell)
            G.add_edge(current_cell, next_cell, weight=1)
            visited.add(next_cell)
            stack.append(next_cell)
    return G
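
For reference, here is a minimal usage sketch of the function above. The dimensions, the initial cell and the plotting calls are my own assumptions for illustration; the pos attributes stored during construction are reused for the layout.

import matplotlib.pyplot as plt

# Hypothetical call: with this implementation the first cell coordinate
# runs from 1 to `height` and the second from 1 to `width`.
G = randomized_dfs(width=8, height=16, init_cell=(1, 1))

# Reuse the stored grid positions to lay out the drawing.
pos = nx.get_node_attributes(G, 'pos')
nx.draw(G, pos, node_size=10)
plt.show()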

If we were to plot the graph during its construction, this is what it would look like:

Figure $1$: Randomized DFS maze building

Simplifying the maze

I will work with the following $8\times 16$ maze. The start and end points are marked in red.

Figure $2$: Reference maze

As you can see, there are many chains, or corridors, in this maze. One reasonable assumption is that our agents won't simply turn around in the middle of a corridor for no reason; they will keep going until they reach either a dead end or a room (a cell with multiple choices).

This means that any node with exactly 2 neighbors is part of a corridor and can be simplified away. By simplified I mean that between dead ends and rooms there should be only one edge, with a weight equal to the number of steps needed to go from one to the other.

So I will proceed and simplify the graph by iteratively deleting any node with exactly two neighbors, and connecting those neighbors with an edge whose weight is the sum of the weights of the two removed edges. A single pass is enough: removing a degree-2 node and reconnecting its two neighbors leaves every other node's degree unchanged.

In Python, this could be implemented as such:

def simplify(G):
    g = G.copy()
    # Degrees are read once from the original graph: removing a degree-2 node
    # and reconnecting its two neighbors never changes another node's degree.
    for node, degree in dict(g.degree()).items():
        if degree == 2:
            # Replace the corridor node by a single weighted edge between its neighbors.
            one, two = list(g.neighbors(node))
            new_weight = g.edges[(node, one)]['weight'] + g.edges[(node, two)]['weight']
            g.add_edge(one, two, weight=new_weight)
            g.remove_node(node)
    return g
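
As a quick extra check of my own (not part of the original experiment), no node of degree $2$ should survive the simplification: every remaining node is either a dead end or a room.

g = simplify(G)
# Dead ends have degree 1, rooms have degree 3 or more.
assert all(degree != 2 for _, degree in g.degree())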

This gives us a simplified graph, which I plot in lime green below:

Figure $3$: Simplified maze graph (lime green)

Sanity checks

In order to check that the simplification is implemented correctly, we can compute the shortest path on both graphs: they should give the same length.

First, on the original maze graph:

start, end = (2, 1), (16, 1)
# The path length in edges is the number of nodes minus one.
shortest_path = nx.shortest_path(G, start, end)
print(len(shortest_path) - 1)
> 52

This means there are $52$ edges between the two red endpoints on the original graph.

Figure $4$: Shortest path on the original maze graph

Next, we check that the sum of the edge weights along the shortest path between the two ends on the simplified graph is also equal to $52$:

g = simplify(G)
# The maze graph is a tree, so this path is unique; weight='weight' makes
# the weighted intent explicit anyway.
red_sp = nx.shortest_path(g, start, end, weight='weight')
sp_len = sum(g.edges[(s, e)]['weight'] for (s, e) in zip(red_sp, red_sp[1:]))
print(sp_len)
> 52
Figure $5$: Shortest path on the simplified graph

Everything checks out. Now that we know the optimal number of steps, let's implement our dumb agents.

Stochastic agents

The goal here is to design dumb agents that explore the maze with a really simple strategy. The simplest I could think of was:

  1. Given a choice, the agent will take an option it hasn't taken yet.
  2. If that's not possible, it will choose randomly between all available choices.

The agents actually explore the simplified graph, which encodes the reasonable trait I mentioned earlier: they can't turn back in the middle of a corridor. Also, because the agents jump from rooms / dead ends to other rooms / dead ends, the implementation is straightforward and we do not waste cycles walking the corridors step by step.

One step of an agent's walk is implemented below:

def agent_walk(g, position, history):
    # Prefer neighbors that were never visited; otherwise pick any neighbor at random.
    options = set(g.neighbors(position))
    already_visited = options.intersection(history)
    not_visited = options - already_visited
    if len(not_visited) == 0:
        next_position = random.choice(list(options))
    else:
        next_position = random.choice(list(not_visited))
    history.append(next_position)
    return history, next_position

Then, the fun begins. We can instantiate a large number of agents and make them explore the maze until they find the exit. I chose a number of agents big enough to observe the crowd effect but small enough that no agent actually finds the optimal path ($100$ runs).

Also, for each node of the simplified graph, I record all of the actions taken by the agents. After the simulation is finished, I can see which way was taken most often at each node.

This is what it looks like:

# For each node, count how many times each outgoing move was taken.
statistics = {n: {nn: 0 for nn in g.neighbors(n)} for n in g.nodes}
n_runs = 100
for run in range(n_runs):
    history, position = [start], start
    while position != end:
        history, next_position = agent_walk(g, position, history)
        statistics[position][next_position] += 1
        position = next_position
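
The lengths below are measured in maze steps, i.e. the sum of the edge weights along each agent's walk on the simplified graph. That bookkeeping is not shown in the loop above; a minimal sketch, with a hypothetical walk_length helper applied to each run's history, could be:

def walk_length(g, history):
    # Total weight of the consecutive edges along a recorded walk,
    # i.e. the number of maze steps it represents.
    return sum(g.edges[(a, b)]['weight'] for a, b in zip(history, history[1:]))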

In this simulation, the average length was $233.42$ steps, while the best agent found a path of $62$ steps (still more than $52$, so not optimal).

Checking the wisdom of the crowd

Now it's time to check whether averaging all the agents' decisions indeed produces a better path than the individual ones. I start a path at the start node, then look at the records and take the action that was taken most often by the agents. I repeat this rule until the end is reached.

def follow_the_crowd(statistics, start, end):
    # At each node, follow the move that the agents took most often.
    history = [start]
    next_position = start
    while next_position != end:
        next_position = max(statistics[next_position], key=statistics[next_position].get)
        history.append(next_position)
    return history
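
The crowd path can then be extracted and measured in maze steps with the same weighted sum as before (a sketch of the bookkeeping, not shown in the original run):

crowd_path = follow_the_crowd(statistics, start, end)
crowd_len = sum(g.edges[(a, b)]['weight'] for (a, b) in zip(crowd_path, crowd_path[1:]))
print(crowd_len)
> 52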

The resulting path is the crowd-averaged path and, guess what, it is only $52$ steps long. So there you go: even though none of the individuals got it right, and with a very bad mean of $233$-ish steps across all agents, the collective average found a better path, which happens to be the shortest path.

Fascinating, right?

Figure $6$: Crowd path, same as the shortest path

Scaling

But does it scale? Maybe I was just lucky. I tested with a bigger maze, a $32\times 64$ one. The simulation obviously took longer to run. The mean number of steps was $11820.2$ while the best agent scored $1314$.

The shortest path is far from that: $802$ steps.

Figure $7$: Scaled version

Then I performed the same averaging and found $802$ steps as well: this thing scales!

Figure $8$: Crowd path, same as the shortest path

Hope you enjoyed this little experiment with the wisdom of the crowd; now, back to reading.
