Setup notebook for experiment
Setting up the workspace – assuming:
you’ve vscode (setup with python, jupyter and anaconda) – just my preferred dev env
have cloned the repo
are in the parent folder called sandbox
and have created a conda environment in which you’re running this notebook
In [1]:
import sys
import os
sys.path.append(os.path.abspath(".."))
Imports
In [2]:
import networkx as nx import numpy as np import pandas as pd
In [3]:
from src.plotting_func import * from src.strategies import * from src.utils import *
Set up Network
watts_strogatz_graph – mimics social networks
https://www.kth.se/social/files/5605669af2765468be471eda/lecture%204%20%282015%29.pdf
https://snap.stanford.edu/class/cs224w-readings/watts98smallworld.pdf
agents – nodes – people /regions
create a graph with n nodes and k nearest neighbors with p probability of rewiring (long range connections)
n ~30 for testing, k ~ n/10 (even, > 2 for clusters), p=0.3 introduces global connections
refer to notebook sandbox_social_networks for more details
In [4]:
def generate_graph(num_nodes=50, k=6, p=0.3):
G = nx.watts_strogatz_graph(n=num_nodes, k=k, p=p)
return G
G = generate_graph()
Each node has two options – to defect (0) or to cooperate (1) – its not a value more like option A and option B
this information is captured in a node’s current state – this represents the action (not the value of that action) Value of decisions is assigned further below in payoffs
Simple Initialization
Simple here means nodes are randomly initialized
and all of them have same strategy (which in a way is learning from their neighbors to maximize the reward)
In [5]:
def initialize_actions(G, random_init=True): #all nodes randomly initialized
for node in G.nodes:
G.nodes[node]['current_action'] = np.random.choice([0, 1]) if random_init else 1
initialize_actions(G)
Displaying this network here – you could change n,k and p above if you’re not satisfied with this one or want to model something else
you can also refer to other sandbox notebook on social networks i have created to see some other samples and play around with more model environments you would like to experiment with
In [6]:
current_actions = [G.nodes[node]['current_action'] for node in G.nodes] visualize_graph(G, current_actions, title="Our network to model the problem on")

where its initialized with some people defecting (0) and some cooperating (1) – this can be changed later – but for now we just choose to initialize them like this
In [7]:
plot_node_attribute_dist(G.nodes, 'current_action', title="Current Distribution (Initialized)")

Rules
Setting rewards and penalties – its a cooperation game between 2 – so rewards and penalties are on 2^2 => 4 combinations
We want to reward cooperation most
i was however thinking that we could also add something for whoever defects after gets more that’d be interesting but maybe later
In [8]:
payoff_matrix = {
(1, 1): (3, 3), # Both cooperate
(1, 0): (0, 2), # Cooperator gets 0, defector gets 2
(0, 1): (2, 0), # Defector gets 2, cooperator gets 0
(0, 0): (0, 0) # Both defect
}
Right now no node has any payoff assigned because we have initialized them but we havent done message passing yet
Round – Interaction
This defines one round of prisoners dilemma
our graph has already been initialized to cooperating and defecting nodes
we start with setting the payoffs for everyone to 0
we take their initialized actions
calculate the payoffs for each node based on its interaction with its neighbors
assign payoff to each node
In [9]:
def play_pd_round(G):
payoffs = {node: 0 for node in G.nodes} #initializing payoff for each to 0
for node in G.nodes: #each node can only play with its neighbors which in our case was defined by k
for neighbor in G.neighbors(node): # for each neighbor
s1 = G.nodes[node]['current_action'] # we get action of the node
s2 = G.nodes[neighbor]['current_action'] # we get action of the neighbor
p1, p2 = payoff_matrix[(s1, s2)] # we calculate payoff of each interaction - where p1 is payoff of node in question
payoffs[node] += p1 #adding the payoff of the node in question (based on interaction with all its neighbors)
G.nodes[node]['payoff'] = payoffs[node] #updating the node with its payoff
In [10]:
play_pd_round(G)
print("Payoffs:", [G.nodes[node]['payoff'] for node in G.nodes])
Payoffs: [8, 9, 12, 10, 15, 9, 9, 18, 4, 6, 9, 10, 9, 6, 9, 6, 9, 4, 6, 6, 6, 6, 0, 6, 8, 8, 3, 3, 2, 8, 0, 4, 0, 2, 4, 6, 0, 4, 4, 4, 6, 4, 6, 4, 4, 3, 6, 6, 4, 9]
Now each node has payoff – right now it was a mix of luck (cause it was randomly initialized to have defecting or cooperating neighbors)
and its own decisions (some of them were nice (cooperative) – some of them were nasty (defecting))
the nodes are also initialized to be nice or nasty randomly – so we’ve about 50% nice and 50% nasty in our sample of 50
In [11]:
plot_node_attribute_dist(G.nodes, 'current_action', title="Current Distribution (Initialized)") plot_node_attribute_dist(G.nodes, 'payoff', title="Payoff Distribution after 1st round")


In [12]:
plot_node_attribute_grouped_dist(G, group_by='current_action', value_attr='payoff',
title="Payoff Distribution by Action")

in this particular instance:
This graph shows us that high payoffs (>10) – are only obtained by nodes with cooperation strategy – that is nodes that were initialized as 1s this also shows that the lowest payoff is also a cooperative node and most of the defectors were in the middle
what we should look for is the peak of the blue and orange distributions – where they lie
on the payoff scale – are the peak of distribution towards the right (most of theme have high payoff) on the frequency scale – is the peak higher (smoother distribution or not – a not smooth distribution indicates that most of them are likely to have that payoff value)
Updating action after interaction
Each neighbor mimics the neighbor with highest payoff in the last round
In [13]:
def update_actions(G):
for node in G.nodes:
payoffs = {n: G.nodes[n]['payoff'] for n in G.neighbors(node)}
neighbor_payoffs = {neighbor: payoffs[neighbor] for neighbor in G.neighbors(node)}
if neighbor_payoffs:
best_neighbor = max(neighbor_payoffs, key=neighbor_payoffs.get)
G.nodes[node]['current_action'] = G.nodes[best_neighbor]['current_action']
In [14]:
update_actions(G) current_actions2 = [G.nodes[node]['current_action'] for node in G.nodes] visualize_graph(G, current_actions2, title="Updated Agent Actions After 1 Round")

Well i guess cooperation works – most of the agents realized that lol – niceness spreads fast! (i might run it multiple times and that might not be the case)
In [15]:
def compute_neighbor_counts(G):
for node in G.nodes:
neighbors = list(G.neighbors(node))
neighbor_actions = [G.nodes[neighbor]['current_action'] for neighbor in neighbors]
G.nodes[node]['defector_neighbors'] = neighbor_actions.count(0)
G.nodes[node]['coop_neighbors'] = neighbor_actions.count(1)
compute_neighbor_counts(G)
plot_node_scatter(G, x_attr='defector_neighbors', y_attr='payoff', color_attr='current_action',
title="Payoff vs. defector neighbors (Colored by Current Action (Round 2))")
plot_node_scatter(G, x_attr='coop_neighbors', y_attr='payoff', color_attr='current_action',
title="Payoff vs. coop neighbors (Colored by Current Action (Round 2))")


Multiple strategies
above we just randomly initialized and planned to use mimic down the road to dictate the next decision – mimic or imitator in this case just copies the move of its highest paid neighbor
if you watched the youtube video you see they have a lot of actors / agents / strategies their like tit-for-tat etc.
i (with the help of chatgpt) have written some of those out in there – now the method below actually has another initialization option – in which each of your node has an actual strategy and wont be all mimicing down the road
In [16]:
def initialize_agent_actions(G, config=None, random_init=True):
for node in G.nodes:
G.nodes[node]['strategy_type'] = config['strategy_type'] if config and 'strategy_type' in config else np.random.choice(list(STRATEGY_FUNCTIONS.keys()))
G.nodes[node]['current_action'] = np.random.choice([0, 1]) if random_init else 1
G.nodes[node]['memory'] = {}
G.nodes[node]['triggered'] = False
G.nodes[node]['payoff'] = 0
G.nodes[node]['prev_payoff'] = 0
In [17]:
def initialize_agent_actions(G, config=None, random_init=True):
strategy_types = list(STRATEGY_FUNCTIONS.keys())
nodes = list(G.nodes)
total_nodes = len(nodes)
if config:
strategy_counts = []
assigned_nodes = 0
for strategy in config:
count = int(config[strategy] * total_nodes)
strategy_counts.append((strategy, count))
assigned_nodes += count
remaining_nodes = total_nodes - assigned_nodes
# If there are remaining nodes, assign them randomly
remaining_strategies = np.random.choice(strategy_types, size=remaining_nodes)
strategy_assignment = []
for strategy, count in strategy_counts:
strategy_assignment.extend([strategy] * count)
strategy_assignment.extend(remaining_strategies)
# Shuffle the assignments
strategy_assignment = np.array(strategy_assignment)
np.random.shuffle(strategy_assignment)
else:
# Assign all nodes randomly
strategy_assignment = np.random.choice(strategy_types, size=total_nodes)
# Assign strategies and initialize other attributes
for node, strategy in zip(nodes, strategy_assignment):
G.nodes[node]['strategy_type'] = strategy
G.nodes[node]['current_action'] = np.random.choice([0, 1]) if random_init else 1
G.nodes[node]['memory'] = {}
G.nodes[node]['triggered'] = False
G.nodes[node]['payoff'] = 0
G.nodes[node]['prev_payoff'] = 0
In [18]:
initialize_agent_actions(G)
In [19]:
current_actions = [G.nodes[node]['current_action'] for node in G.nodes] visualize_graph(G, current_actions, title="Our network to model the problem on")

In [20]:
agent_strategies = [G.nodes[node]['strategy_type'] for node in G.nodes] visualize_graph(G, agent_strategies, title="Agents in our network")

In [21]:
plot_node_attribute_dist(G.nodes, 'current_action', title="Current Distribution (Initialized)") plot_node_attribute_dist(G.nodes, 'strategy_type', title="Strategy type Distribution")


so, starting the round again on this new graph – doing interactions – calculating payoffs – this shouldn’t be any different than the first time because we still have initialized the nodes randomly and again payoffs is luck – much like real life these actors will be advantaged or disadvantaged (because they could be always cooperate but initialized to defect)
In [22]:
play_pd_round(G)
In [23]:
current_actions2 = [G.nodes[node]['current_action'] for node in G.nodes] visualize_graph(G, current_actions2, title="our Graph after Round 1")

Updating action – after interaction
In [24]:
def update_agent_actions(G):
for node in G.nodes:
# payoffs = {n: G.nodes[n]['payoff'] for n in G.neighbors(node)}
strategy_type = G.nodes[node]['strategy_type']
strategy_func = STRATEGY_FUNCTIONS[strategy_type]
neighbors = list(G.neighbors(node))
next_action = strategy_func(node, G, neighbors)
G.nodes[node]['current_action'] = next_action
G.nodes[node]['prev_payoff'] = G.nodes[node]['payoff']
for neighbor in neighbors:
G.nodes[node]['memory'][neighbor] = G.nodes[neighbor]['current_action']
the above code is the one that’ll be making things interesting thing because now instead of mimicing the behaviour of their most well performing neighbor – they’ll act according to their own characterstics
In [25]:
update_agent_actions(G)
In [26]:
current_actions2 = [G.nodes[node]['current_action'] for node in G.nodes] visualize_graph(G, current_actions2, title="our Graph after Round 1")

In [27]:
plot_node_attribute_grouped_dist(G, group_by='current_action', value_attr='payoff', title="Payoff Distribution by Action")

In [28]:
plot_node_attribute_grouped_dist(G, group_by='strategy_type', value_attr='payoff', title="Payoff Distribution by Action")

Okay that sjust initial round – thats what we start with
we didnt start with all cooperate – equal footing payoffs 0
we let first payoff decided by the neighbors (coop or def) and their own behaviour was randomly initialized
In [29]:
play_pd_round(G) update_agent_actions(G) current_actions3 = [G.nodes[node]['current_action'] for node in G.nodes] visualize_graph(G, current_actions3, title="our Graph after Round 2") plot_node_attribute_grouped_dist(G, group_by='strategy_type', value_attr='payoff', title="Payoff Distribution by (Round 2)")


In [30]:
analyze_strategy_performance(G)
Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Always_C 15.00 12.00 18.00 30.00 2 Grim 12.00 9.00 15.00 48.00 4 TFTT 12.00 3.00 15.00 60.00 5 Prober 10.75 6.00 15.00 86.00 8 Random 10.25 4.00 15.00 41.00 4 ZD_Extortion 8.80 4.00 16.00 44.00 5 TFT 8.40 3.00 15.00 42.00 5 GTFT 8.00 6.00 9.00 24.00 3 Always_D 6.80 2.00 14.00 34.00 5 Pavlov 6.75 4.00 9.00 27.00 4 Imitator 6.40 2.00 18.00 32.00 5
Run for multiple rounds
Now that we have defined the steps – we will create a new graph – call those methods and run multiple rounds – and store each round
In [ ]:
def run_simulation(num_rounds=10):
G = generate_graph(num_nodes=50, k=6, p=0.3)
# initialize_actions(G, random=False)
initialize_agent_actions(G, random_init=False) #config=load_config(),
current_actions = [G.nodes[node]['current_action'] for node in G.nodes]
visualize_graph(G, current_actions, title="Our network to model the problem on (Initialized)")
plot_node_attribute_dist(G.nodes, 'current_action', title="Current Distribution (Initialized)")
all_data = []
for round_num in range(num_rounds): # In every round
# Calculate payoffs
play_pd_round(G)
# Collect node features and labels
for node in G.nodes:
degree = G.degree[node]
current_action = G.nodes[node]['current_action']
strategy_type = G.nodes[node]['strategy_type']
payoffs = G.nodes[node]['payoff']
prev_payoff = G.nodes[node]['prev_payoff']
memory = G.nodes[node]['memory']
triggered = G.nodes[node]['triggered']
neighbors = list(G.neighbors(node))
neighbor_actions = [G.nodes[neighbor]['current_action'] for neighbor in neighbors]
defector_neighbors = neighbor_actions.count(0)
coop_neighbors = neighbor_actions.count(1)
all_data.append({
'node': node,
'round': round_num,
'degree': degree,
'current_action': current_action,
'strategy_type': strategy_type,
'payoff': payoffs,
'memory': memory,
'current_action': current_action,
'memory': memory,
'triggered': triggered,
'prev_payoff': prev_payoff,
'coop_neighbors': coop_neighbors,
'defector_neighbors': defector_neighbors
})
# Update actions based on payoffs
# update_actions(G)
update_agent_actions(G)
# Visualize the graph after each round
current_actions = [G.nodes[node]['current_action'] for node in G.nodes]
visualize_graph(G, current_actions, title=f"Graph after Round {round_num + 1}")
plot_node_attribute_dist(G.nodes, 'current_action', title=f"Current Distribution (Round {round_num + 1})")
plot_node_attribute_grouped_dist(G, group_by='current_action', value_attr='payoff', title=f"Payoff by Action (Round {round_num + 1})")
plot_node_attribute_grouped_dist(G, group_by='strategy_type', value_attr='payoff', title=f"Payoff by Strategy (Round {round_num + 1})")
analyze_strategy_performance(G)
return pd.DataFrame(all_data)
In [36]:
df = run_simulation(G, num_rounds=10)
df.to_csv('../data/agent_simulation2.csv', index=False)
df.head(5)






Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Pavlov 19.50 12.00 27.00 117.00 6 ZD_Extortion 18.75 15.00 24.00 75.00 4 TFTT 18.75 18.00 21.00 75.00 4 TFT 18.50 15.00 24.00 111.00 6 Imitator 18.00 15.00 21.00 126.00 7 Grim 18.00 15.00 21.00 90.00 5 Random 18.00 18.00 18.00 18.00 1 Always_D 17.25 15.00 18.00 69.00 4 GTFT 17.14 12.00 24.00 120.00 7 Prober 17.00 15.00 18.00 51.00 3 Always_C 16.00 15.00 18.00 48.00 3




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count TFTT 13.50 12.00 18.00 54.00 4 GTFT 11.57 3.00 18.00 81.00 7 Grim 11.40 6.00 15.00 57.00 5 Prober 11.00 9.00 12.00 33.00 3 TFT 10.50 6.00 18.00 63.00 6 ZD_Extortion 9.00 6.00 12.00 36.00 4 Imitator 8.29 4.00 15.00 58.00 7 Always_C 8.00 3.00 12.00 24.00 3 Random 8.00 8.00 8.00 8.00 1 Pavlov 7.00 4.00 12.00 42.00 6 Always_D 6.50 4.00 10.00 26.00 4




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count GTFT 10.29 3.00 18.00 72.00 7 TFTT 8.75 6.00 12.00 35.00 4 TFT 8.50 3.00 15.00 51.00 6 ZD_Extortion 8.00 4.00 12.00 32.00 4 Prober 7.67 6.00 9.00 23.00 3 Always_C 7.00 0.00 15.00 21.00 3 Imitator 6.00 4.00 9.00 42.00 7 Random 6.00 6.00 6.00 6.00 1 Grim 5.60 2.00 8.00 28.00 5 Pavlov 5.00 0.00 10.00 30.00 6 Always_D 4.50 2.00 8.00 18.00 4




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Random 12.00 12.00 12.00 12.00 1 ZD_Extortion 6.75 2.00 12.00 27.00 4 Always_C 6.00 0.00 12.00 18.00 3 Grim 5.60 4.00 8.00 28.00 5 GTFT 5.57 0.00 12.00 39.00 7 TFT 5.50 2.00 12.00 33.00 6 TFTT 5.00 2.00 10.00 20.00 4 Pavlov 4.67 2.00 8.00 28.00 6 Prober 4.67 2.00 6.00 14.00 3 Always_D 4.50 4.00 6.00 18.00 4 Imitator 4.14 2.00 6.00 29.00 7




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Random 9.00 9.00 9.00 9.00 1 Always_C 6.00 0.00 9.00 18.00 3 ZD_Extortion 5.50 2.00 8.00 22.00 4 Prober 5.33 4.00 8.00 16.00 3 Grim 5.20 2.00 10.00 26.00 5 GTFT 4.57 0.00 10.00 32.00 7 Always_D 4.50 4.00 6.00 18.00 4 TFT 4.17 2.00 9.00 25.00 6 Imitator 3.86 0.00 9.00 27.00 7 Pavlov 3.67 0.00 10.00 22.00 6 TFTT 3.25 0.00 9.00 13.00 4




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Random 15.00 15.00 15.00 15.00 1 Prober 7.67 6.00 9.00 23.00 3 TFT 6.17 2.00 15.00 37.00 6 Always_C 6.00 0.00 12.00 18.00 3 Grim 4.80 4.00 6.00 24.00 5 ZD_Extortion 4.50 0.00 9.00 18.00 4 Imitator 4.00 2.00 10.00 28.00 7 Pavlov 3.50 0.00 8.00 21.00 6 Always_D 3.50 2.00 4.00 14.00 4 GTFT 3.14 0.00 9.00 22.00 7 TFTT 3.00 0.00 10.00 12.00 4




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Random 8.00 8.00 8.00 8.00 1 ZD_Extortion 7.25 4.00 9.00 29.00 4 Grim 6.40 2.00 8.00 32.00 5 Always_D 6.00 4.00 8.00 24.00 4 Prober 6.00 6.00 6.00 18.00 3 GTFT 5.86 3.00 9.00 41.00 7 TFT 5.17 0.00 12.00 31.00 6 Pavlov 4.33 0.00 8.00 26.00 6 Always_C 4.00 0.00 9.00 12.00 3 TFTT 4.00 0.00 8.00 16.00 4 Imitator 3.29 0.00 8.00 23.00 7




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Random 12.00 12.00 12.00 12.00 1 Always_C 6.00 0.00 9.00 18.00 3 Grim 6.00 4.00 8.00 30.00 5 Prober 6.00 4.00 8.00 18.00 3 Pavlov 4.83 0.00 12.00 29.00 6 GTFT 4.57 0.00 8.00 32.00 7 Always_D 4.00 2.00 6.00 16.00 4 TFT 3.83 0.00 9.00 23.00 6 ZD_Extortion 3.75 0.00 8.00 15.00 4 TFTT 3.25 0.00 9.00 13.00 4 Imitator 3.14 0.00 6.00 22.00 7




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Random 8.00 8.00 8.00 8.00 1 ZD_Extortion 7.25 2.00 12.00 29.00 4 TFT 5.67 2.00 12.00 34.00 6 Prober 5.67 2.00 12.00 17.00 3 Always_D 5.50 4.00 6.00 22.00 4 Grim 5.20 4.00 8.00 26.00 5 Always_C 5.00 0.00 12.00 15.00 3 GTFT 4.71 0.00 12.00 33.00 7 Imitator 4.57 0.00 12.00 32.00 7 TFTT 3.50 0.00 10.00 14.00 4 Pavlov 3.00 0.00 8.00 18.00 6




Strategy Avg Payoff Min Payoff Max Payoff Total Payoff Node Count Random 9.00 9.00 9.00 9.00 1 Always_C 6.00 0.00 12.00 18.00 3 ZD_Extortion 5.75 2.00 12.00 23.00 4 TFT 5.67 2.00 12.00 34.00 6 Grim 5.20 4.00 6.00 26.00 5 Always_D 4.00 2.00 6.00 16.00 4 GTFT 4.00 2.00 6.00 28.00 7 Prober 3.67 3.00 4.00 11.00 3 TFTT 3.00 0.00 8.00 12.00 4 Imitator 2.71 0.00 8.00 19.00 7 Pavlov 2.67 0.00 6.00 16.00 6
Out[36]:
Do pay attention to the attributes i have run the simulation with – my graph is initialized to be all cooperative here (not random), i have assigned nodes different strategies randomly (you could specify a certain distribution by updating the config file and passing the config argument maybe you want 20% tit for tat, 20% grim (rest of them would be random) and run those iterations).
this has saved data of these iterations which we will now train our GNNs on – if doing more iterations – maybe comment out plotting functions.
In [4]:
from src.prisoner_dilemma import run_simulation
df = run_simulation(num_rounds=200, num_nodes=50, average_connection=6, rewiring=0.3)
df.to_csv('../data/agent_simulation200.csv', index=False)
df.head(5)
Out[4]:

Leave a Reply