Tag Archives: Complex Systems

Math is the Path: Degree Distribution of the Prelims Network and Other Random Graphs

Over the past few weeks, I have been trying to learn the basics of statistical analysis of graphs, more specifically, complex networks. Here I must be honest, with high school level algebra as my only mathematical tool, trying to work through an article such as Albert and Barabási’s “Statistical mechanics of complex networks” is a daunting task. However, I have been able to get through the basic concepts and begin applying them to my work.

Something I found particularly interesting is the concept of Degree Distribution. As we already know, the degree of a node refers to the number of edges that are connected to that node, and not all of the nodes in a graph have the same degree. The distribution of degree in a network is “characterized by a distribution function P(k), which gives the probability that a randomly selected node has exactly k edges” (Albert, Barabási 2002). The degree distribution of a random graph, such as the variants of the Erdős–Rényi model, is a Poisson distribution, because “in a random graph the edges are placed randomly, the majority of nodes have approximately the same degree, close to the average degree <k> of the network” (Albert, Barabási 2002). However, is has been demonstrated that many real world graph’s degree distribution differ greatly from the Poisson distribution, instead, they demonstrate a power-law tail, P(k) ~ kγ ,and are called scale free networks (Albert, Barabási 2002).

Thinking about this, I became curious about the Preliminaries graph and its degree distribution. Preliminaries is a real-world network, its based on real information gathered from physical objects; however, the process of designing the schemas and relationships requires a fair amount of human manipulation. More about Preliminaries here, here, and here. So I decided to find the Prelims degree distribution and compare it with some random graphs. Now, as I said, my math skills are lacking a bit, something I plan on really working on over the next few years, but I do have what is takes to model  degree distribution using the Python modules NetworkX and Pylab. Here I will briefly describe my methodology(code) and the resulting plots.

First I decided to generate two random graphs and plot their degree distribution:

GNP Graph (Erdős–Rényi model):

This graph is generated by inputing the number of nodes in the graph and the probability that there is an edge between each pair of nodes. Because I wanted to imitate the node and edge count of the Prelims graph, I found a probability that would generate approximately 3464 edges. The formula for computing the probability is included in the code snippet used to generate the graph:


import pylab as pl
from networkx import *

### generate gnp_random_graph
### n = number of nodes
### m = expected number of edges

n = 1616
m = 3464

### p = probablity of edge creation
### m = p*n(n-1)
### 3464 = p*1616(1615)
### p = 0.0013272844312295006

p = 0.0013272844312295006
# generate graph
G = gnp_random_graph(n,p,directed=True)

# print basic stats
print ("Number of Nodes : %i" % (n))
print ("Number of Edges : %i" % (number_of_edges(G)))

# make a list of each node's degree
degree_list = list(G.degree().values())

# compute and print average node degree
print ("Avg. Node Degree: %f" %
 (float(sum(degree_list))/n))

# generate a list degree distribution
degree_hist = degree_histogram(G)
if len(degree_hist) < 15:
 print ("Degree Fequency List:")
 print ("Degree : # of Nodes")

# print the degree and number of nodes that have that degree
 for degree,number_of_nodes in enumerate(degree_hist):
   print ("%i : %i" % (degree,number_of_nodes))
else:
 print ("Degree Frequency List Too Long to Print")

# generate x,y values for degree dist. scatterplot
x_list = []
y_list = []
for degree,num_of_nodes in enumerate(degree_hist):
 if num_of_nodes > 0:
 x_list.append(degree)
 y_list.append(num_of_nodes)

# label the graph
pl.title('Degree Distribution\nGNP Graph')
pl.xlabel('Degree')
pl.ylabel('Frequency')

# plot degree distribution
pl.scatter(x_list,y_list)
pl.show()

This script results in the terminal output:

Number of Nodes : 1616
Number of Edges : 3605
Avg. Node Degree: 4.461634
Degree Fequency List:
Degree : # of Nodes
0 : 17
1 : 76
2 : 201
3 : 284
4 : 299
5 : 259
6 : 218
7 : 119
8 : 82
9 : 32
10 : 19
11 : 7
12 : 2
13 : 1

And the following scatter plot:

GNP_graph

As you can see this resembles a Poisson distribution, like this one taken from the WolframAlpha website:

Poisson

Scale Free Random Graph:

Next I generated a random scale free graph. The script I used was very similar to the previous script, except I used a different graph generator with only the node count as a parameter and I set the tighter limits for the x and y axes:


n = 1616
G = scale_free_graph(n)

# set limits for the axes
pl.gca().set_xlim([-10,70])
pl.gca().set_ylim([-10,120])

This script results in the following terminal output:


Number of Nodes : 1616
Number of Edges : 3428
Avg. Node Degree: 4.242574
Degree Frequency Too Long to Print

And the scatter plot:

TheScaleFree

This plot resembles a power law tail, such as this one from WolframAlpha:

PowerLawWol

Scale free distributions are commonly plotted in using log-log plots, such as those used by Albert and Barabási in the previously mentioned article. To produce a log-log plot, you can simply change the Pylab scale to log, also for better visualization change the axes limits:


# set limits for the axes
pl.gca().set_ylim([0.9,1000])
pl.gca().set_xlim([0.9,1000])

# log-log plot
pl.gca().set_xscale("log")
pl.gca().set_yscale("log")

The random scale free graph plotted in log-log looks like this:

NewScaleLog

So, now that we have seen what a PNG random graph and a scale free random graph degree distribution looks like, let’s take a look at the degree distribution of the Preliminaries graph. Although I should be able to read the Prelims .gexf file with NetworkX, I was generating error after error, so I decided to simply use the Gephi scripting console to generate a .txt file with the degree of each node. This should have been fairly straightforward, but I found the formatting of the degree values in the .txt file to be extremely difficult to work with, so I wrote a fairly ugly script to clean up the data so I can process it and generate plots for degree distribution. Using the following script I was able to generate some plots:

import pylab as pl

def degree_distribution(degree_list):
 """
 set up a dictionary with degree as key
 and frequency as value
 """
 degree_dict = {}
 for degree in node_list:
   degree_dict.setdefault(degree,0)
   degree_dict[degree] += 1
 return degree_dict

f = open('prelims_degree.txt','r')
line = f.readline().split()
f.close()

# clean up the data
clean_line = []
for degree in line:
 degree = list(degree)
 degree.pop()
 degree = ''.join([num for num in degree])
 clean_line.append(degree)

# make a list to be used in the in the-
# degree_distribution function
degree_list = []
for degree in clean_line:
 try:
   degree = int(degree)
   degree_list.append(degree)
 except ValueError:
   pass

# compute and print basic graph stats
# num. of nodes, egdes, and average degree
avg_node_degree = float(sum(degree_list))/len(degree_list)
print ("Number of Nodes : %i" % (len(degree_list)))
print ("Number of Edges : %f" %
 (len(degree_list)*avg_node_degree/2))
print ("Avg. Node Degree: %f" %
 (avg_node_degree))

# set up a dict with degree frequency values
degree_dict = degree_distribution(degree_list)

# generate x,y values for degree dist. scatterplot
x_list = []
y_list = []
for degree,frequency in degree_dict.items():
 x_list.append(degree)
 y_list.append(frequency)

# label the graph
pl.title('Degree Distribution\nPrelims Graph')
pl.xlabel('Degree')
pl.ylabel('Frequency')

# set limits for the axes
pl.gca().set_xlim([-10,75])
pl.gca().set_ylim([-10,125])

# plot degree distribution
pl.scatter(x_list,y_list)
pl.show()

Which generates the following scatter-plot:

NewPrelimLin

This plot looks like it also has the power law tail, although I can’t be sure, and as you can see is quite similar to the random scale free graph’s degree distribution. Alternatively, if we plot the Prelims data in a log-log plot we generate the following image:

PrelimLog

Once again we see that the Preliminaries graph’s degree distribution is quite similar to the random scale free graph. This leads me to believe that the Preliminaries graph is indeed scale free, as many real world networks are. What does this mean? Preliminaries is a ‘real’ network? If anything it further validates the study of early cultural production using a network based methodology, as we can see that the network we have generated for this study does indeed share characteristics with modern day networks, and thus provides a comparative methodology for analyzing network evolution throughout human history.

@dbrownbeta

all the code for this post is available in this gist

Leave a Comment

Filed under Uncategorized