Prisoner’s Dilemma – Graph setup

Setup notebook for experiment

Setting up the workspace – assuming:
you’ve vscode (setup with python, jupyter and anaconda) – just my preferred dev env
have cloned the repo
are in the parent folder called sandbox
and have created a conda environment in which you’re running this notebook

In [1]:

import sys
import os
sys.path.append(os.path.abspath("..")) 

Imports

In [2]:

import networkx as nx
import numpy as np
import pandas as pd

In [3]:

from src.plotting_func import *
from src.strategies import *
from src.utils import *

Set up Network

watts_strogatz_graph – mimics social networks
https://www.kth.se/social/files/5605669af2765468be471eda/lecture%204%20%282015%29.pdf
https://snap.stanford.edu/class/cs224w-readings/watts98smallworld.pdf
agents – nodes – people /regions
create a graph with n nodes and k nearest neighbors with p probability of rewiring (long range connections)

n ~30 for testing, k ~ n/10 (even, > 2 for clusters), p=0.3 introduces global connections
refer to notebook sandbox_social_networks for more details

In [4]:

def generate_graph(num_nodes=50, k=6, p=0.3):
    G = nx.watts_strogatz_graph(n=num_nodes, k=k, p=p)
    return G

G = generate_graph()

Each node has two options – to defect (0) or to cooperate (1) – its not a value more like option A and option B
this information is captured in a node’s current state – this represents the action (not the value of that action) Value of decisions is assigned further below in payoffs

Simple Initialization

Simple here means nodes are randomly initialized
and all of them have same strategy (which in a way is learning from their neighbors to maximize the reward)

In [5]:

def initialize_actions(G, random_init=True): #all nodes randomly initialized
    for node in G.nodes:
        G.nodes[node]['current_action'] = np.random.choice([0, 1]) if random_init else 1
        
initialize_actions(G)

Displaying this network here – you could change n,k and p above if you’re not satisfied with this one or want to model something else
you can also refer to other sandbox notebook on social networks i have created to see some other samples and play around with more model environments you would like to experiment with

In [6]:

current_actions = [G.nodes[node]['current_action'] for node in G.nodes]
visualize_graph(G, current_actions, title="Our network to model the problem on")

where its initialized with some people defecting (0) and some cooperating (1) – this can be changed later – but for now we just choose to initialize them like this

In [7]:

plot_node_attribute_dist(G.nodes, 'current_action', title="Current Distribution (Initialized)")

Rules

Setting rewards and penalties – its a cooperation game between 2 – so rewards and penalties are on 2^2 => 4 combinations
We want to reward cooperation most
i was however thinking that we could also add something for whoever defects after gets more that’d be interesting but maybe later

In [8]:

payoff_matrix = {
    (1, 1): (3, 3),  # Both cooperate
    (1, 0): (0, 2),  # Cooperator gets 0, defector gets 2
    (0, 1): (2, 0),  # Defector gets 2, cooperator gets 0
    (0, 0): (0, 0)   # Both defect
}

Right now no node has any payoff assigned because we have initialized them but we havent done message passing yet

Round – Interaction

This defines one round of prisoners dilemma
our graph has already been initialized to cooperating and defecting nodes
we start with setting the payoffs for everyone to 0
we take their initialized actions
calculate the payoffs for each node based on its interaction with its neighbors
assign payoff to each node

In [9]:

def play_pd_round(G):
    payoffs = {node: 0 for node in G.nodes} #initializing payoff for each to 0
    for node in G.nodes: #each node can only play with its neighbors which in our case was defined by k
        for neighbor in G.neighbors(node): # for each neighbor
            s1 = G.nodes[node]['current_action']  # we get action of the node
            s2 = G.nodes[neighbor]['current_action'] # we get action of the neighbor
            p1, p2 = payoff_matrix[(s1, s2)] # we calculate payoff of each interaction - where p1 is payoff of node in question
            payoffs[node] += p1 #adding the payoff of the node in question (based on interaction with all its neighbors)
        G.nodes[node]['payoff'] = payoffs[node] #updating the node with its payoff

In [10]:

play_pd_round(G)
print("Payoffs:", [G.nodes[node]['payoff'] for node in G.nodes])
Payoffs: [8, 9, 12, 10, 15, 9, 9, 18, 4, 6, 9, 10, 9, 6, 9, 6, 9, 4, 6, 6, 6, 6, 0, 6, 8, 8, 3, 3, 2, 8, 0, 4, 0, 2, 4, 6, 0, 4, 4, 4, 6, 4, 6, 4, 4, 3, 6, 6, 4, 9]

Now each node has payoff – right now it was a mix of luck (cause it was randomly initialized to have defecting or cooperating neighbors)
and its own decisions (some of them were nice (cooperative) – some of them were nasty (defecting))
the nodes are also initialized to be nice or nasty randomly – so we’ve about 50% nice and 50% nasty in our sample of 50

In [11]:

plot_node_attribute_dist(G.nodes, 'current_action', title="Current Distribution (Initialized)")
plot_node_attribute_dist(G.nodes, 'payoff', title="Payoff Distribution after 1st round")

In [12]:

plot_node_attribute_grouped_dist(G, group_by='current_action', value_attr='payoff', 
                            title="Payoff Distribution by Action")

in this particular instance:
This graph shows us that high payoffs (>10) – are only obtained by nodes with cooperation strategy – that is nodes that were initialized as 1s this also shows that the lowest payoff is also a cooperative node and most of the defectors were in the middle

what we should look for is the peak of the blue and orange distributions – where they lie
on the payoff scale – are the peak of distribution towards the right (most of theme have high payoff) on the frequency scale – is the peak higher (smoother distribution or not – a not smooth distribution indicates that most of them are likely to have that payoff value)

Updating action after interaction

Each neighbor mimics the neighbor with highest payoff in the last round

In [13]:

def update_actions(G):
    for node in G.nodes:
        payoffs = {n: G.nodes[n]['payoff'] for n in G.neighbors(node)}
        neighbor_payoffs = {neighbor: payoffs[neighbor] for neighbor in G.neighbors(node)}
        if neighbor_payoffs:
            best_neighbor = max(neighbor_payoffs, key=neighbor_payoffs.get)
            G.nodes[node]['current_action'] = G.nodes[best_neighbor]['current_action']

In [14]:

update_actions(G)
current_actions2 = [G.nodes[node]['current_action'] for node in G.nodes]
visualize_graph(G, current_actions2, title="Updated Agent Actions After 1 Round")

Well i guess cooperation works – most of the agents realized that lol – niceness spreads fast! (i might run it multiple times and that might not be the case)

In [15]:

def compute_neighbor_counts(G):
    for node in G.nodes:
        neighbors = list(G.neighbors(node))
        neighbor_actions = [G.nodes[neighbor]['current_action'] for neighbor in neighbors]
        G.nodes[node]['defector_neighbors'] = neighbor_actions.count(0)
        G.nodes[node]['coop_neighbors'] = neighbor_actions.count(1)
compute_neighbor_counts(G)
plot_node_scatter(G, x_attr='defector_neighbors', y_attr='payoff', color_attr='current_action', 
                  title="Payoff vs. defector neighbors (Colored by Current Action (Round 2))")
plot_node_scatter(G, x_attr='coop_neighbors', y_attr='payoff', color_attr='current_action', 
                  title="Payoff vs. coop neighbors (Colored by Current Action (Round 2))")

Multiple strategies

above we just randomly initialized and planned to use mimic down the road to dictate the next decision – mimic or imitator in this case just copies the move of its highest paid neighbor

if you watched the youtube video you see they have a lot of actors / agents / strategies their like tit-for-tat etc.
i (with the help of chatgpt) have written some of those out in there – now the method below actually has another initialization option – in which each of your node has an actual strategy and wont be all mimicing down the road

In [16]:

def initialize_agent_actions(G, config=None, random_init=True):
    for node in G.nodes:
        G.nodes[node]['strategy_type'] = config['strategy_type'] if config and 'strategy_type' in config else np.random.choice(list(STRATEGY_FUNCTIONS.keys()))
        G.nodes[node]['current_action'] = np.random.choice([0, 1]) if random_init else 1
        G.nodes[node]['memory'] = {}
        G.nodes[node]['triggered'] = False
        G.nodes[node]['payoff'] = 0
        G.nodes[node]['prev_payoff'] = 0

In [17]:

def initialize_agent_actions(G, config=None, random_init=True):
    strategy_types = list(STRATEGY_FUNCTIONS.keys())
    nodes = list(G.nodes)
    total_nodes = len(nodes)

    if config:
        strategy_counts = []
        assigned_nodes = 0
        for strategy in config:
            count = int(config[strategy] * total_nodes)
            strategy_counts.append((strategy, count))
            assigned_nodes += count
        remaining_nodes = total_nodes - assigned_nodes
        # If there are remaining nodes, assign them randomly
        remaining_strategies = np.random.choice(strategy_types, size=remaining_nodes)

        strategy_assignment = []
        for strategy, count in strategy_counts:
            strategy_assignment.extend([strategy] * count)
        strategy_assignment.extend(remaining_strategies)
        
        # Shuffle the assignments
        strategy_assignment = np.array(strategy_assignment)
        np.random.shuffle(strategy_assignment)
    else:
        # Assign all nodes randomly
        strategy_assignment = np.random.choice(strategy_types, size=total_nodes)

    # Assign strategies and initialize other attributes
    for node, strategy in zip(nodes, strategy_assignment):
        G.nodes[node]['strategy_type'] = strategy
        G.nodes[node]['current_action'] = np.random.choice([0, 1]) if random_init else 1
        G.nodes[node]['memory'] = {}
        G.nodes[node]['triggered'] = False
        G.nodes[node]['payoff'] = 0
        G.nodes[node]['prev_payoff'] = 0

In [18]:

initialize_agent_actions(G)

In [19]:

current_actions = [G.nodes[node]['current_action'] for node in G.nodes]
visualize_graph(G, current_actions, title="Our network to model the problem on")

In [20]:

agent_strategies = [G.nodes[node]['strategy_type'] for node in G.nodes]
visualize_graph(G, agent_strategies, title="Agents in our network")

In [21]:

plot_node_attribute_dist(G.nodes, 'current_action', title="Current Distribution (Initialized)")
plot_node_attribute_dist(G.nodes, 'strategy_type', title="Strategy type Distribution")

so, starting the round again on this new graph – doing interactions – calculating payoffs – this shouldn’t be any different than the first time because we still have initialized the nodes randomly and again payoffs is luck – much like real life these actors will be advantaged or disadvantaged (because they could be always cooperate but initialized to defect)

In [22]:

play_pd_round(G)

In [23]:

current_actions2 = [G.nodes[node]['current_action'] for node in G.nodes]
visualize_graph(G, current_actions2, title="our Graph after Round 1")

Updating action – after interaction

In [24]:

def update_agent_actions(G):             
    for node in G.nodes:
        # payoffs = {n: G.nodes[n]['payoff'] for n in G.neighbors(node)}
        strategy_type = G.nodes[node]['strategy_type']
        strategy_func = STRATEGY_FUNCTIONS[strategy_type]
        neighbors = list(G.neighbors(node))
        next_action = strategy_func(node, G, neighbors)
        G.nodes[node]['current_action'] = next_action
        G.nodes[node]['prev_payoff'] = G.nodes[node]['payoff']
        for neighbor in neighbors:
            G.nodes[node]['memory'][neighbor] = G.nodes[neighbor]['current_action']

the above code is the one that’ll be making things interesting thing because now instead of mimicing the behaviour of their most well performing neighbor – they’ll act according to their own characterstics

In [25]:

update_agent_actions(G)

In [26]:

current_actions2 = [G.nodes[node]['current_action'] for node in G.nodes]
visualize_graph(G, current_actions2, title="our Graph after Round 1")

In [27]:

plot_node_attribute_grouped_dist(G, group_by='current_action', value_attr='payoff', title="Payoff Distribution by Action")

In [28]:

plot_node_attribute_grouped_dist(G, group_by='strategy_type', value_attr='payoff', title="Payoff Distribution by Action")

Okay that sjust initial round – thats what we start with
we didnt start with all cooperate – equal footing payoffs 0
we let first payoff decided by the neighbors (coop or def) and their own behaviour was randomly initialized

In [29]:

play_pd_round(G)
update_agent_actions(G)
current_actions3 = [G.nodes[node]['current_action'] for node in G.nodes]    
visualize_graph(G, current_actions3, title="our Graph after Round 2")
plot_node_attribute_grouped_dist(G, group_by='strategy_type', value_attr='payoff', title="Payoff Distribution by (Round 2)")

In [30]:

analyze_strategy_performance(G)
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Always_C       15.00       12.00       18.00       30.00          2         
Grim           12.00       9.00        15.00       48.00          4         
TFTT           12.00       3.00        15.00       60.00          5         
Prober         10.75       6.00        15.00       86.00          8         
Random         10.25       4.00        15.00       41.00          4         
ZD_Extortion   8.80        4.00        16.00       44.00          5         
TFT            8.40        3.00        15.00       42.00          5         
GTFT           8.00        6.00        9.00        24.00          3         
Always_D       6.80        2.00        14.00       34.00          5         
Pavlov         6.75        4.00        9.00        27.00          4         
Imitator       6.40        2.00        18.00       32.00          5         

Run for multiple rounds

Now that we have defined the steps – we will create a new graph – call those methods and run multiple rounds – and store each round

In [ ]:

def run_simulation(num_rounds=10):

    G = generate_graph(num_nodes=50, k=6, p=0.3)
    # initialize_actions(G, random=False)
    initialize_agent_actions(G, random_init=False) #config=load_config(), 
    
    current_actions = [G.nodes[node]['current_action'] for node in G.nodes]
    visualize_graph(G, current_actions, title="Our network to model the problem on (Initialized)")
    plot_node_attribute_dist(G.nodes, 'current_action', title="Current Distribution (Initialized)")
    
    all_data = []

    for round_num in range(num_rounds): # In every round
        # Calculate payoffs
        play_pd_round(G)
        # Collect node features and labels
        for node in G.nodes:
            degree = G.degree[node]
            current_action = G.nodes[node]['current_action']
            strategy_type = G.nodes[node]['strategy_type']
            payoffs = G.nodes[node]['payoff']
            prev_payoff = G.nodes[node]['prev_payoff']
            memory = G.nodes[node]['memory']
            triggered = G.nodes[node]['triggered']
            neighbors = list(G.neighbors(node))
            neighbor_actions = [G.nodes[neighbor]['current_action'] for neighbor in neighbors]
            defector_neighbors = neighbor_actions.count(0)
            coop_neighbors = neighbor_actions.count(1)
            all_data.append({
                'node': node,
                'round': round_num,
                'degree': degree,
                'current_action': current_action,
                'strategy_type': strategy_type,
                'payoff': payoffs,
                'memory': memory,
                'current_action': current_action,
                'memory': memory,
                'triggered': triggered,
                'prev_payoff': prev_payoff,
                'coop_neighbors': coop_neighbors,
                'defector_neighbors': defector_neighbors
            })
        # Update actions based on payoffs
        # update_actions(G)
        update_agent_actions(G)
        # Visualize the graph after each round  

        current_actions = [G.nodes[node]['current_action'] for node in G.nodes]

        visualize_graph(G, current_actions, title=f"Graph after Round {round_num + 1}")
        
        plot_node_attribute_dist(G.nodes, 'current_action', title=f"Current Distribution (Round {round_num + 1})")
        plot_node_attribute_grouped_dist(G, group_by='current_action', value_attr='payoff', title=f"Payoff by Action (Round {round_num + 1})")
        plot_node_attribute_grouped_dist(G, group_by='strategy_type', value_attr='payoff', title=f"Payoff by Strategy (Round {round_num + 1})")
        
        analyze_strategy_performance(G)

    return pd.DataFrame(all_data)

In [36]:

df = run_simulation(G, num_rounds=10)
df.to_csv('../data/agent_simulation2.csv', index=False)
df.head(5)
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Pavlov         19.50       12.00       27.00       117.00         6         
ZD_Extortion   18.75       15.00       24.00       75.00          4         
TFTT           18.75       18.00       21.00       75.00          4         
TFT            18.50       15.00       24.00       111.00         6         
Imitator       18.00       15.00       21.00       126.00         7         
Grim           18.00       15.00       21.00       90.00          5         
Random         18.00       18.00       18.00       18.00          1         
Always_D       17.25       15.00       18.00       69.00          4         
GTFT           17.14       12.00       24.00       120.00         7         
Prober         17.00       15.00       18.00       51.00          3         
Always_C       16.00       15.00       18.00       48.00          3         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
TFTT           13.50       12.00       18.00       54.00          4         
GTFT           11.57       3.00        18.00       81.00          7         
Grim           11.40       6.00        15.00       57.00          5         
Prober         11.00       9.00        12.00       33.00          3         
TFT            10.50       6.00        18.00       63.00          6         
ZD_Extortion   9.00        6.00        12.00       36.00          4         
Imitator       8.29        4.00        15.00       58.00          7         
Always_C       8.00        3.00        12.00       24.00          3         
Random         8.00        8.00        8.00        8.00           1         
Pavlov         7.00        4.00        12.00       42.00          6         
Always_D       6.50        4.00        10.00       26.00          4         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
GTFT           10.29       3.00        18.00       72.00          7         
TFTT           8.75        6.00        12.00       35.00          4         
TFT            8.50        3.00        15.00       51.00          6         
ZD_Extortion   8.00        4.00        12.00       32.00          4         
Prober         7.67        6.00        9.00        23.00          3         
Always_C       7.00        0.00        15.00       21.00          3         
Imitator       6.00        4.00        9.00        42.00          7         
Random         6.00        6.00        6.00        6.00           1         
Grim           5.60        2.00        8.00        28.00          5         
Pavlov         5.00        0.00        10.00       30.00          6         
Always_D       4.50        2.00        8.00        18.00          4         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Random         12.00       12.00       12.00       12.00          1         
ZD_Extortion   6.75        2.00        12.00       27.00          4         
Always_C       6.00        0.00        12.00       18.00          3         
Grim           5.60        4.00        8.00        28.00          5         
GTFT           5.57        0.00        12.00       39.00          7         
TFT            5.50        2.00        12.00       33.00          6         
TFTT           5.00        2.00        10.00       20.00          4         
Pavlov         4.67        2.00        8.00        28.00          6         
Prober         4.67        2.00        6.00        14.00          3         
Always_D       4.50        4.00        6.00        18.00          4         
Imitator       4.14        2.00        6.00        29.00          7         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Random         9.00        9.00        9.00        9.00           1         
Always_C       6.00        0.00        9.00        18.00          3         
ZD_Extortion   5.50        2.00        8.00        22.00          4         
Prober         5.33        4.00        8.00        16.00          3         
Grim           5.20        2.00        10.00       26.00          5         
GTFT           4.57        0.00        10.00       32.00          7         
Always_D       4.50        4.00        6.00        18.00          4         
TFT            4.17        2.00        9.00        25.00          6         
Imitator       3.86        0.00        9.00        27.00          7         
Pavlov         3.67        0.00        10.00       22.00          6         
TFTT           3.25        0.00        9.00        13.00          4         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Random         15.00       15.00       15.00       15.00          1         
Prober         7.67        6.00        9.00        23.00          3         
TFT            6.17        2.00        15.00       37.00          6         
Always_C       6.00        0.00        12.00       18.00          3         
Grim           4.80        4.00        6.00        24.00          5         
ZD_Extortion   4.50        0.00        9.00        18.00          4         
Imitator       4.00        2.00        10.00       28.00          7         
Pavlov         3.50        0.00        8.00        21.00          6         
Always_D       3.50        2.00        4.00        14.00          4         
GTFT           3.14        0.00        9.00        22.00          7         
TFTT           3.00        0.00        10.00       12.00          4         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Random         8.00        8.00        8.00        8.00           1         
ZD_Extortion   7.25        4.00        9.00        29.00          4         
Grim           6.40        2.00        8.00        32.00          5         
Always_D       6.00        4.00        8.00        24.00          4         
Prober         6.00        6.00        6.00        18.00          3         
GTFT           5.86        3.00        9.00        41.00          7         
TFT            5.17        0.00        12.00       31.00          6         
Pavlov         4.33        0.00        8.00        26.00          6         
Always_C       4.00        0.00        9.00        12.00          3         
TFTT           4.00        0.00        8.00        16.00          4         
Imitator       3.29        0.00        8.00        23.00          7         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Random         12.00       12.00       12.00       12.00          1         
Always_C       6.00        0.00        9.00        18.00          3         
Grim           6.00        4.00        8.00        30.00          5         
Prober         6.00        4.00        8.00        18.00          3         
Pavlov         4.83        0.00        12.00       29.00          6         
GTFT           4.57        0.00        8.00        32.00          7         
Always_D       4.00        2.00        6.00        16.00          4         
TFT            3.83        0.00        9.00        23.00          6         
ZD_Extortion   3.75        0.00        8.00        15.00          4         
TFTT           3.25        0.00        9.00        13.00          4         
Imitator       3.14        0.00        6.00        22.00          7         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Random         8.00        8.00        8.00        8.00           1         
ZD_Extortion   7.25        2.00        12.00       29.00          4         
TFT            5.67        2.00        12.00       34.00          6         
Prober         5.67        2.00        12.00       17.00          3         
Always_D       5.50        4.00        6.00        22.00          4         
Grim           5.20        4.00        8.00        26.00          5         
Always_C       5.00        0.00        12.00       15.00          3         
GTFT           4.71        0.00        12.00       33.00          7         
Imitator       4.57        0.00        12.00       32.00          7         
TFTT           3.50        0.00        10.00       14.00          4         
Pavlov         3.00        0.00        8.00        18.00          6         
Strategy       Avg Payoff  Min Payoff  Max Payoff  Total Payoff   Node Count
Random         9.00        9.00        9.00        9.00           1         
Always_C       6.00        0.00        12.00       18.00          3         
ZD_Extortion   5.75        2.00        12.00       23.00          4         
TFT            5.67        2.00        12.00       34.00          6         
Grim           5.20        4.00        6.00        26.00          5         
Always_D       4.00        2.00        6.00        16.00          4         
GTFT           4.00        2.00        6.00        28.00          7         
Prober         3.67        3.00        4.00        11.00          3         
TFTT           3.00        0.00        8.00        12.00          4         
Imitator       2.71        0.00        8.00        19.00          7         
Pavlov         2.67        0.00        6.00        16.00          6         

Out[36]:

Do pay attention to the attributes i have run the simulation with – my graph is initialized to be all cooperative here (not random), i have assigned nodes different strategies randomly (you could specify a certain distribution by updating the config file and passing the config argument maybe you want 20% tit for tat, 20% grim (rest of them would be random) and run those iterations).
this has saved data of these iterations which we will now train our GNNs on – if doing more iterations – maybe comment out plotting functions.

In [4]:

from src.prisoner_dilemma import run_simulation
df = run_simulation(num_rounds=200, num_nodes=50, average_connection=6, rewiring=0.3)
df.to_csv('../data/agent_simulation200.csv', index=False)
df.head(5)

Out[4]:

Share:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *