A fresh look at Prelims’ degree distribution…is it really scale free?

As Statistical Methods pointed out in their comment on my post, the methodology I used when I proposed that the Prelims graph’s degree distribution was scale free is outdated and not conclusive. This afternoon I decided to take a fresh look at the data following the methodology of Clauset et al., using the Python powerlaw library. After fitting the data and plotting the probability density function, I evaluated the goodness of fit of the power law distribution by comparing it with the fits of other distributions. The results indicated that a power law may not be the best fit (although it fits better than an exponential distribution), and that a stretched exponential distribution might be a better fit (p > .05). In the following figure you can see the actual data (blue line), a power law fit (red dotted line), a log-normal fit (green dotted line), and a stretched exponential fit (blue dotted line). More about this in a couple of weeks.

loglikelihood
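For anyone wondering what the Clauset et al. method does differently from a straight-line fit: it estimates the exponent by maximum likelihood. The powerlaw library handles all of this (including the search for the lower cutoff $k_{\min}$ and the comparisons with other distributions); the sketch below only illustrates the core estimator, using the approximation for discrete data given by Clauset et al., $\hat{\alpha} \approx 1 + n\left[\sum_{i=1}^{n} \ln \frac{k_i}{k_{\min} - 1/2}\right]^{-1}$ over the $n$ degrees $k_i \ge k_{\min}$. The function name and the toy degree list are hypothetical, and the cutoff (called `xmin` in the code) is fixed by hand rather than fitted.

```python
import math

def estimate_alpha_discrete(degrees, xmin):
    """Approximate maximum-likelihood exponent for integer data, using
    Clauset et al.'s discrete approximation:
        alpha ~ 1 + n / sum(ln(k / (xmin - 0.5)))  over all k >= xmin.
    Hypothetical helper for illustration; xmin is supplied, not estimated."""
    tail = [k for k in degrees if k >= xmin]
    if not tail:
        raise ValueError("no values at or above xmin")
    log_sum = sum(math.log(k / (xmin - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum

# toy degree sequence, not the Prelims data
degrees = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 45]
alpha = estimate_alpha_discrete(degrees, xmin=2)
print("estimated alpha: %.3f" % alpha)
```

With the library itself, `fit = powerlaw.Fit(degrees, discrete=True)` chooses the cutoff and fits the exponent (`fit.power_law.alpha`), and `fit.distribution_compare('power_law', 'stretched_exponential')` returns the log-likelihood ratio R together with a p-value for whether the sign of R is significant. Clauset et al. note that the discrete approximation above is most reliable when the cutoff is not too small.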

Leave a Comment

Filed under Uncategorized

sonnet 0.1.6

Over the last year, I have been focused on studying networks at the CulturePlex, and it has been a heck of a learning process. After reading over my older posts, I’ve begun to realize that it is time to start updating my methods as I learn more about programming and statistics. My next blog post (June 22) will describe the renovated and re-focused Preliminaries Project as it will be presented in San Juan, Puerto Rico.

As I mentioned in my last post, I have been working on a small Python library to simplify visualization for the upcoming Preliminaries interactive website. Surprisingly, after I published it on PyPI two weeks ago, quite a few people installed it. To my horror, version 0.1.0 had a bug; if it got you, I apologize. Tonight I will be releasing version 0.1.6, in which I have added very basic support for Matplotlib. Of course there is a lot to do, and hopefully as the year goes on my releases, testing, and docs will get better and better. For now it’s pretty minimal. Check out a full usage example here:

sonnet example ipython notebook

source is here

Prelims Round 2

It’s a new school year and the Preliminaries Project is alive and well. After having a great time and meeting some cool folks at DH2013, it’s time to move forward and think about plans for the future.

Later this month, at the Sixteenth Century Society and Conference in San Juan, we will be presenting a new look at the Prelims data set called Networks of Culture: A Graph-Driven Approach to Understanding Publishing in the Spanish Golden Age. This will be a more expansive look at publication in the Spanish Empire from 1598 to 1643.

Also, we are in the process of building the official website for the Preliminaries Project. It will be based on Skeleton, a Django project template system we are currently developing at the CulturePlex Lab. The site will describe the Prelims, with information about the methodology and lines of research associated with the project. It will also include an interactive feature that allows users to explore the Prelims data set using the JavaScript D3 library. The interactive feature will be based on the Sonnet Python library, a NetworkX-based library that produces detailed node data and statistics in JSON format. As of about ten minutes ago, Sonnet is available through PyPI.

Later this week I will publish several examples using Sonnet to produce custom D3 graphs. Until then…Python Rules!

A Family Affair

Well, the second phase of the Preliminaries project is well under way, and today I would like to talk about a subsection of the Preliminaries dataset I have been developing over the past month. Here at the lab we are quite interested in Pedro Calderón de la Barca, and because he only published a few editions during the administration of the Count-Duke of Olivares (1621-1643), I thought it would be interesting to extract a sub-graph (701 nodes, 1297 edges) that only shows the networks related to the publication of theatre in this time period. This allows us to compare Calderón directly with the other prominent playwrights of the era without an excess of ‘noise’ produced by the inclusion of poetry and prose in the graph. As usual, we can see that Lope de Vega dominates the model. However, we can also see the other principal dramatists of the era: Pedro Calderón de la Barca, Juan Ruiz de Alarcón, Tirso de Molina, and Juan Pérez de Montalbán:


theatre_degree

Sized for Degree, Colored for Modularity

Calderón (pink modularity group, upper right) isn’t very prominent in the visualization, so I decided to take a look at his publication network. A publication network, if we remember, shows the neighbors of a node up to four degrees of separation. This allows us to see not only people who had some kind of direct relationship with an author, such as Calderón’s brother Josef, but everyone who signed a document in an edition containing his pieces, the printers who published this edition, etc. Calderón’s publication network actually seems quite compact (106 nodes) when compared to someone like Lope de Vega (504 nodes) or Tirso de Molina (491 nodes). Here we see an image of all of the people who are part of Calderón’s publication network (labelled nodes positioned manually):

calderon_pub

Calderon’s Publication Neighbors

The 25 labelled nodes in this image represent around a tenth of the total number of people in the theatre portion of the database. Curious about Calderón’s relationship to the other playwrights, I decided to see how Calderón’s set of ‘publication neighbors’ compared with those of his contemporaries. After a bit of Gephi magic, I produced a set containing the people who appear in the publication networks of all five principal playwrights (the intersection of the five networks):

core_theatre

Core Theatre

This set, which I call the “Core Theatre” group, consists of 17 people who are all, in one way or another, closely related to the production of the major dramatists of the period. This demonstrates the relative ‘tightness’ of the group responsible for the production of theatre at this time, in particular the production of theatre in Madrid, as all of the members of the Core Theatre group are in one way or another connected to Madrid. Interestingly, this set accounts for almost all of Calderón’s publication neighbors. What does this tell us? Well, at least in terms of his preliminaries, Calderón’s network was fairly limited.
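A publication network as described above (everyone within four steps of an author) is what a breadth-first search returns, and the Core Theatre group is the intersection of five such sets. The sketch below shows both steps in plain Python. It is only an illustration: the actual work was done in Gephi, the graph and node names are hypothetical, every edge is treated as undirected (an assumption), and the toy example uses a depth of 2 so the result is easy to check by hand.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency lists from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph

def publication_network(graph, source, max_depth=4):
    """All nodes within max_depth steps of source (source included),
    found by breadth-first search."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def core_group(graph, authors, max_depth=4):
    """Nodes that appear in the publication network of every author."""
    return set.intersection(*(publication_network(graph, a, max_depth)
                              for a in authors))

# hypothetical toy graph, not the Prelims data
graph = build_graph([
    ("Author A", "Edition 1"), ("Edition 1", "Printer P"),
    ("Edition 1", "Censor C"), ("Printer P", "Edition 2"),
    ("Edition 2", "Author B"), ("Author B", "Edition 3"),
    ("Edition 3", "Patron D"),
])
print(sorted(publication_network(graph, "Author A", max_depth=2)))
print(core_group(graph, ["Author A", "Author B"], max_depth=2))  # {'Printer P'}
```

In the toy graph the only node the two authors share within two steps is the printer, which mirrors, in miniature, how printers show up in the Core Theatre group.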

Looking a bit more at Calderón’s network, another interesting fact jumps out: all of his early editions were published by the women printers María de Quiñones and the “Widow of Juan Sánchez”. Both of these women appear in the Core Theatre group as well, so I decided to investigate these two women a bit more, particularly Quiñones, who is an oddity because at this time female printers were generally presented as the “Widow of Some Famous Printer”. As it turns out, María de Quiñones is also the widow of a famous printer; as a matter of fact, she is the widow of two famous printers. The first, Pedro Madrigal, a prominent late 16th-century printer, appears as the printer of some of the earliest editions in the first Preliminaries data set. Upon his death in 1603, Quiñones married Juan de la Cuesta, who promptly set up shop in Madrigal’s print shop and became one of the most famous printers of the era. Why was he so famous? Maybe taking a look at the first Preliminaries graph can help us remember:

Ego Network of Juan de la Cuesta to 2 Degrees of Separation

If we look closely we can see that de la Cuesta published some important editions. Here’s a look at just the editions he printed:

Editions Published by de la Cuesta

Recognize any of the abbreviated titles? Here we see that an edition of every work of prose written by Cervantes, including the first edition of Don Quixote, was published by Juan de la Cuesta. Although Juan de la Cuesta was (presumably) dead 10 years before the first publication of Calderón’s comedies in 1636, it was his print shop, and that of Pedro Madrigal before him, under the direction of the twice-widowed María de Quiñones, that would first print the timeless play “La vida es sueño” by Pedro Calderón de la Barca. In this case, it seems that literary production in 17th-century Spain truly was a family affair.

Scale Free Cultural Systems

This morning I had the pleasure of reading Barabási and Bonabeau’s article “Scale-Free Networks,” published in Scientific American’s May 2003 issue. Yes, in the scientific world it may be ancient history, but I found it a very appealing article due to its readability and its discussion of the real-world implications of scale-free networks. As I have recently been trying to better understand the topology, particularly the degree distribution, of the Preliminaries network (and I am way overdue for some blogging), I thought I would take this opportunity to further reflect on the true nature of the Preliminaries database and its implications for early modern cultural systems.

Before I begin this discussion, I need to be very clear on one point: I have not proved, mathematically, the scale-free nature of the Preliminaries network. A power-law degree distribution (typical of scale-free networks), $p(k) \propto k^{-\alpha}$, is a straight line for all values when presented in a log-log plot, because taking logarithms gives $\log p(k) = -\alpha \log k + \text{const}$:

powerlaw

This image is simply the Prelims network degree distribution fed into the above equation using ~ -1.94 as the exponent parameter. However, most empirical data do not follow this distribution exactly, and a power-law tail can be tricky to establish (Clauset et al., 2009), especially for the mathematically uninitiated. For example, if we plot the degree distribution probabilities in log-log, we do not see a straight line; instead we see variation, especially at very low and very high values of x:

prelims_degreedist_loglog

This can be explained quite simply. First of all, due to the Preliminaries database methodology and data set, we see no nodes with a degree of 0 and relatively few with a degree of 1. A node is not created unless it is found in the preliminaries of an edition, and therefore all nodes in the database have a degree of at least 1. Also, due to the nature of the data, most names, places, etc. are recurring, and most works have multiple editions, so most nodes have at least two or three connections.

For high values of x, because of the way the probability of each degree is calculated (the number of nodes with that degree divided by the total number of nodes), we find a minimum y value of 0.00061881188, or 1/1616. The Preliminaries graph has 1616 nodes, and each of the very high degrees belongs to a single node, so all we can say is that 1 in 1616 nodes has that degree; the tail therefore flattens out at 1/1616 instead of continuing to fall. Eventually I would like to employ a more sophisticated analysis, such as the one proposed in Clauset et al., to determine whether these high x values truly fit a power-law distribution.
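The floor at 1/1616 follows directly from how an empirical degree distribution is computed: $P(k)$ is the number of nodes with degree $k$ divided by the total number of nodes $N$, so a degree held by only one node can never score lower than $1/N$. A minimal sketch, on a made-up twelve-node degree sequence (hypothetical, not the Prelims data):

```python
from collections import Counter

def empirical_degree_distribution(degrees):
    """Map each observed degree k to P(k) = (nodes with degree k) / (total nodes)."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: c / n for k, c in sorted(counts.items())}

# toy sequence: most nodes have small degree, the two hubs are unique
degrees = [1, 2, 2, 3, 3, 3, 2, 4, 1, 3, 57, 91]
p = empirical_degree_distribution(degrees)
print(p[91], 1 / len(degrees))  # a unique degree sits exactly at the floor 1/N
print(1 / 1616)                 # the corresponding floor for a 1616-node graph
```

Every unique high degree lands on the same floor value, which is why the right end of a log-log degree plot tends to flatten into a horizontal row of points.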

Currently, just to investigate, I have experimented with making a rough estimate of the exponent. First I calculated the exponent -α in $p(k) \propto k^{-\alpha}$ for each degree value in the Prelims graph. This produces a range of values between -2.356155 and -1.241804, with a rough fit at -1.945687. I also calculated, following Ghoshal and Barabási 2011, the relative degree distance between the top two nodes (Madrid and Lope de Vega), Δk = 1.3375. While not as great as the distances observed in much larger scale-free networks, it is considerably larger than those observed in large exponential networks. Finally, I plotted the power-law distribution and the real-world data together:

prelims_degreedist_bestfit
Due to the generally good fit found with these simple calculations, the relatively large distance between the top two nodes, and the general look of the network when represented visually (think hubs), I would simply like to state here that the data may suggest that the Prelims graph is scale free. (code for visualization)
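The rough estimate described above amounts to fitting a straight line to the log-log plot, since $\log p(k) = -\alpha \log k + \text{const}$ for a pure power law. A minimal sketch of that idea, as an ordinary least-squares slope of $\log p(k)$ against $\log k$, is below. It is not necessarily how the numbers above were computed, the helper name and the synthetic data are assumptions, and Clauset et al. caution that exponents estimated this way can be badly biased, which is one reason to prefer maximum-likelihood estimation.

```python
import math

def loglog_slope(ks, ps):
    """Ordinary least-squares slope of log(p) against log(k).
    For p(k) = C * k**(-alpha) the points lie on a line with slope -alpha."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# synthetic, noise-free power law with alpha = 1.94 (not the Prelims data)
alpha = 1.94
ks = list(range(1, 60))
ps = [0.5 * k ** (-alpha) for k in ks]
print("estimated exponent: %.2f" % -loglog_slope(ks, ps))  # prints 1.94
```

On noise-free synthetic data the slope recovers the exponent exactly; on real data the low-degree points and the single-count floor at high degree pull the line around, so the result is only a rough guide.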

So what are the implications of this? As far as I can tell, after a very brief and perhaps elementary review of the literature on scale free graphs, there are two primary factors: growth and robustness. Here I hope to briefly discuss each of these factors and its relevance to cultural networks.

Growth:

A scale-free network has two principal growth mechanisms: continuous expansion and preferential attachment (Barabási, Albert 1999). Nodes are continuously added to the network in an incremental fashion, but they do not connect to other nodes randomly. Instead, they connect preferentially to nodes that are already well connected. Let’s think about this in terms of a network of cultural objects, people, institutions, and places such as the Preliminaries network. A new writer comes on the scene. First of all, if he wants to actually publish, he will somehow connect himself, perhaps based on his own geographic limitations, to a major publishing center, let’s say Madrid. Now, publishing a book in 1607 in Madrid was no simple process. First you needed to pass through a process of censorship: ecclesiastic approval, licensing, revisions, pricing. In an ideal world, this process would be completed based on the merit of the work; however, in the bureaucratic and favor-currying world of early modern Spain, it seems that being well connected could greatly expedite it. Considering that much licensing was done directly by the Consejo de Castilla, being well connected at court could never hurt. The practice of authors soliciting the patronage of powerful nobles at this time is well documented, and it appears that this patronage not only protected and funded them, but also opened doors into the world of publication.

For example, for a number of years Lope de Vega was the personal secretary of Pedro Fernández de Castro y Andrade, VII Count of Lemos, who also received regular dedications from important writers such as Cervantes, Góngora, Quevedo, and Lope himself. The prestige and influence of these authors cannot be questioned, and their relationship with the Count of Lemos is hardly coincidental. This intuitive association is well grounded, and has been the subject of serious study. However, in terms of network growth, this kind of preferential attachment, through jockeying for position within a network of cultural influence, can help to explain the heterogeneous nature of literary influence and publication we see in this time period. Referring to their model for scale-free growth, Barabási and Albert anticipate this phenomenon in their 1999 article:

“Similar mechanisms could explain the origin of the social and economic disparities governing competitive systems, because the scale-free inhomogeneities are the inevitable consequence of self-organization due to the local decisions made by the individual vertices, based on information that is biased toward the more visible (richer) vertices, irrespective of the nature and origin of this visibility.” (512)

Now, in terms of the durability and influence of a certain novel, it is perhaps hasty to say that 17th-century network structure has resulted in the enduring popularity and fame of a work such as Don Quixote. Even in Cervantes’ day, the fact that Don Quixote was so widely published and appreciated was obviously the result of varied factors. But I believe that it is very important to recognize that this imbalance, or heterogeneity, is at least partially a natural consequence of self-organizing scale-free networks.
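The two growth mechanisms described above (continuous expansion plus preferential attachment) can be simulated in a few lines. This is a simplified sketch of the Barabási–Albert idea, not a model of how the Prelims network actually grew: the starting graph (a complete graph on m + 1 nodes), the function names, and the parameter values are all assumptions. Running it with `preferential=False` attaches new nodes uniformly at random instead, which makes the “rich get richer” effect visible in the size of the largest hub.

```python
import random
from collections import Counter

def grow_network(n, m, preferential=True, seed=None):
    """Grow an undirected network to n nodes, starting from a complete graph
    on m + 1 nodes. Each new node links to m distinct existing nodes, chosen
    either in proportion to their degree (preferential attachment) or
    uniformly at random. Returns the list of edges."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # every node appears here once per edge it touches, so a uniform
    # choice from this list is a degree-proportional choice of node
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            if preferential:
                targets.add(rng.choice(endpoints))
            else:
                targets.add(rng.randrange(new))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

def degree_counts(edges):
    """Degree of every node that appears in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

pa = degree_counts(grow_network(2000, 2, preferential=True, seed=1))
ua = degree_counts(grow_network(2000, 2, preferential=False, seed=1))
print("largest hub, preferential attachment:", max(pa.values()))
print("largest hub, uniform attachment:     ", max(ua.values()))
```

Both networks grow by the same number of edges, so the difference in their largest hubs comes entirely from the attachment rule.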

Robustness:

Scale-free networks are generally robust, unless they are subject to a targeted attack (Barabási, Bonabeau 2003). What does this mean exactly? Well, first of all, most nodes are not hubs, so removing a single randomly chosen node generally does not really affect the structure of the network. Indeed, even removing a hub will generally not cause the network to fracture, because other hubs remain. However, a coordinated attack on a scale-free network that removes a significant number of hubs can threaten the integrity of the entire system. In terms of cultural networks, I believe that this robustness is crucial. Culture as a system is very resilient, and does not rely too heavily on any one person, object or location. After removing Madrid from the Prelims network, the graph is still connected (there is a path between every pair of nodes in the network). However, thinking about a cultural network, removing a physical hub like Madrid, or an author like Lope, does not reflect the reality of the situation. If we remove Madrid, wouldn’t we have to remove all of the nodes associated with Madrid? All of the editions published there, all of the editors who lived there, are somehow inextricably linked to this hub. But if Madrid had never existed, would these same editions have been published in different places? Would the people associated with Madrid exist? Would they have existed in another space and attached themselves differently to the network? These questions, although perhaps impossible to answer, must be posed when considering the robustness of a cultural network.
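The difference between random failure and targeted attack can be illustrated by deleting nodes and measuring the largest connected component that survives. The sketch below uses a hypothetical two-hub toy network rather than the Prelims graph, and the helper names are illustrative; it compares removing randomly chosen nodes with removing the highest-degree nodes.

```python
import random
from collections import deque

def two_hub_graph(leaves=10):
    """Toy network: two linked hubs, each with its own set of leaf nodes."""
    adj = {"H1": ["H2"], "H2": ["H1"]}
    for hub, prefix in (("H1", "a"), ("H2", "b")):
        for i in range(leaves):
            leaf = "%s%d" % (prefix, i)
            adj[hub].append(leaf)
            adj[leaf] = [hub]
    return adj

def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def attack(adj, k, targeted, seed=0):
    """Remove k nodes, either the k highest-degree nodes (targeted attack)
    or k nodes picked uniformly at random (random failure), and return the
    size of the largest surviving component."""
    nodes = list(adj)
    if targeted:
        victims = sorted(nodes, key=lambda v: len(adj[v]), reverse=True)[:k]
    else:
        victims = random.Random(seed).sample(nodes, k)
    return largest_component_size(adj, victims)

adj = two_hub_graph()
for k in (1, 2):
    print("%d removed | random: %d | targeted: %d"
          % (k, attack(adj, k, targeted=False), attack(adj, k, targeted=True)))
```

Removing both hubs leaves only isolated nodes, while removing two random nodes almost always takes out leaves and leaves most of the network connected.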

The other side of the robustness issue, the vulnerability to concentrated attacks, has interesting implications when considering the idea of cultural conquest. If, during the expansion of the Spanish Empire, the hubs of the indigenous American culture were attacked, destroyed, or taken over, this would certainly be the most effective way of causing failure in the previous cultural network. For example, when Cortés took Tenochtitlan, he was taking control of one of the most important cultural hubs in the Aztec empire. However, he did not simply remove this node and cause cultural collapse. Instead, this became the beginning of a process in which new information was passed through the network using an old hub as a key location. Today, Mexico City continues to be a center for Mexican culture. It is interesting to observe that, however effective the Spaniards’ cultural campaign was, the resulting Mexican culture was a mix of Prehispanic and European cultural practices. This seems to suggest that even under directed cultural attack, cultural information present in a network has lasting effects, something that speaks to the ability of information to keep circulating through a network even in the presence of new, sometimes conflicting information, especially, as many authors claim, when there is some degree of compatibility between the two types of information.

These ideas are just the beginning of a reflection on the implications of scale free structures in cultural systems. To fully develop this idea, I must continue to research, as I have only just begun to understand the science and history that I am writing about. Until next time…

@dbrownbeta

Barabási, Albert-László, and Réka Albert. “Emergence of Scaling in Random Networks.” Science 286.5439 (1999): 509–512. Web. 20 Mar. 2013.
Barabási, Albert-László, and Eric Bonabeau. “Scale-Free Networks.” Scientific American (2003): 50-59. Web. 20 Mar. 2013.
Clauset, Aaron, Cosma Rohilla Shalizi, and M. E. J. Newman. “Power-law Distributions in Empirical Data.” arXiv:0706.1062 (2007): n. pag. Web. 20 Mar. 2013.
Ghoshal, Gourab, and Albert-László Barabási. “Ranking Stability and Super-stable Nodes in Complex Networks.” Nature Communications 2 (2011): 394. Web. 20 Mar. 2013.

Math is the Path: Degree Distribution of the Prelims Network and Other Random Graphs

Over the past few weeks, I have been trying to learn the basics of statistical analysis of graphs, more specifically, complex networks. Here I must be honest: with high-school-level algebra as my only mathematical tool, working through an article such as Albert and Barabási’s “Statistical mechanics of complex networks” is a daunting task. However, I have been able to get through the basic concepts and begin applying them to my work.

Something I found particularly interesting is the concept of degree distribution. As we already know, the degree of a node refers to the number of edges that are connected to that node, and not all of the nodes in a graph have the same degree. The distribution of degree in a network is “characterized by a distribution function P(k), which gives the probability that a randomly selected node has exactly k edges” (Albert, Barabási 2002). The degree distribution of a random graph, such as the variants of the Erdős–Rényi model, is a Poisson distribution, because “in a random graph the edges are placed randomly, the majority of nodes have approximately the same degree, close to the average degree <k> of the network” (Albert, Barabási 2002). However, it has been demonstrated that the degree distributions of many real-world graphs differ greatly from the Poisson distribution; instead, they show a power-law tail, $P(k) \sim k^{-\gamma}$, and such graphs are called scale-free networks (Albert, Barabási 2002).

Thinking about this, I became curious about the Preliminaries graph and its degree distribution. Preliminaries is a real-world network; it’s based on real information gathered from physical objects. However, the process of designing the schemas and relationships requires a fair amount of human manipulation. More about Preliminaries here, here, and here. So I decided to find the Prelims degree distribution and compare it with some random graphs. Now, as I said, my math skills are lacking a bit, something I plan on really working on over the next few years, but I do have what it takes to model degree distribution using the Python modules NetworkX and Pylab. Here I will briefly describe my methodology (code) and the resulting plots.

First I decided to generate two random graphs and plot their degree distribution:

GNP Graph (Erdős–Rényi model):

This graph is generated by inputting the number of nodes in the graph and the probability that there is an edge between each pair of nodes. Because I wanted to imitate the node and edge count of the Prelims graph, I chose a probability that would generate approximately 3464 edges (since the graph is directed, there are n(n-1) possible edges). The formula for computing the probability is included in the code snippet used to generate the graph:


import pylab as pl
import networkx as nx

### generate gnp_random_graph
### n = number of nodes
### m = expected number of edges

n = 1616
m = 3464

### p = probability of edge creation
### a directed graph has n(n-1) possible edges, so
### m = p*n(n-1)
### 3464 = p*1616(1615)
### p = 0.0013272844312295006

p = m / float(n * (n - 1))

# generate graph
G = nx.gnp_random_graph(n, p, directed=True)

# print basic stats
print("Number of Nodes : %i" % n)
print("Number of Edges : %i" % G.number_of_edges())

# make a list of each node's degree (in-degree + out-degree);
# G.degree() yields (node, degree) pairs in NetworkX 2.x and later
degree_list = [degree for _, degree in G.degree()]

# compute and print average node degree
print("Avg. Node Degree: %f" % (float(sum(degree_list)) / n))

# generate the degree distribution as a list:
# degree_hist[k] = number of nodes with degree k
degree_hist = nx.degree_histogram(G)
if len(degree_hist) < 15:
    print("Degree Frequency List:")
    print("Degree : # of Nodes")
    # print the degree and number of nodes that have that degree
    for degree, number_of_nodes in enumerate(degree_hist):
        print("%i : %i" % (degree, number_of_nodes))
else:
    print("Degree Frequency List Too Long to Print")

# generate x,y values for degree dist. scatterplot
# (skipping degrees that no node has)
x_list = []
y_list = []
for degree, num_of_nodes in enumerate(degree_hist):
    if num_of_nodes > 0:
        x_list.append(degree)
        y_list.append(num_of_nodes)

# label the graph
pl.title('Degree Distribution\nGNP Graph')
pl.xlabel('Degree')
pl.ylabel('Frequency')

# plot degree distribution
pl.scatter(x_list, y_list)
pl.show()

This script results in the terminal output:

Number of Nodes : 1616
Number of Edges : 3605
Avg. Node Degree: 4.461634
Degree Frequency List:
Degree : # of Nodes
0 : 17
1 : 76
2 : 201
3 : 284
4 : 299
5 : 259
6 : 218
7 : 119
8 : 82
9 : 32
10 : 19
11 : 7
12 : 2
13 : 1

And the following scatter plot:

GNP_graph

As you can see this resembles a Poisson distribution, like this one taken from the WolframAlpha website:

Poisson
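To check the resemblance numerically, the observed histogram printed above can be compared with the counts a Poisson distribution with the same mean degree $\lambda$ would predict, $N e^{-\lambda} \lambda^k / k!$. Because this GNP graph is directed, each node’s degree is its in-degree plus its out-degree, which for a sparse random graph is still approximately Poisson. The function name is an assumption for illustration; the observed counts are the terminal output shown above.

```python
import math

def poisson_expected_counts(n_nodes, mean_degree, max_degree):
    """Expected number of nodes with degree k (k = 0..max_degree) if
    degrees were Poisson distributed with the given mean."""
    return [n_nodes * math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)
            for k in range(max_degree + 1)]

# observed histogram from the GNP run above (degrees 0 through 13)
observed = [17, 76, 201, 284, 299, 259, 218, 119, 82, 32, 19, 7, 2, 1]
n = sum(observed)                                     # 1616 nodes
mean_k = sum(k * c for k, c in enumerate(observed)) / float(n)
expected = poisson_expected_counts(n, mean_k, len(observed) - 1)

for k, (obs, pois) in enumerate(zip(observed, expected)):
    print("%2i : observed %3i   Poisson %6.1f" % (k, obs, pois))
```

Both the observed and the Poisson counts peak at degree 4 and fall off quickly on either side, with no node anywhere near the hub-sized degrees seen in the scale-free graphs below.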

Scale Free Random Graph:

Next I generated a random scale-free graph. The script I used was very similar to the previous script, except that I used a different graph generator, with only the node count as a parameter, and I set tighter limits for the x and y axes:


n = 1616
G = scale_free_graph(n)

# set limits for the axes
pl.gca().set_xlim([-10,70])
pl.gca().set_ylim([-10,120])

This script results in the following terminal output:


Number of Nodes : 1616
Number of Edges : 3428
Avg. Node Degree: 4.242574
Degree Frequency Too Long to Print

And the scatter plot:

TheScaleFree

This plot resembles a power law tail, such as this one from WolframAlpha:

PowerLawWol

Scale-free distributions are commonly plotted using log-log plots, such as those used by Albert and Barabási in the previously mentioned article. To produce a log-log plot, you can simply change the Pylab axis scales to log; for better visualization, also change the axes limits:


# set limits for the axes
pl.gca().set_ylim([0.9,1000])
pl.gca().set_xlim([0.9,1000])

# log-log plot
pl.gca().set_xscale("log")
pl.gca().set_yscale("log")

The random scale free graph plotted in log-log looks like this:

NewScaleLog

So, now that we have seen what the degree distributions of a GNP random graph and a scale-free random graph look like, let’s take a look at the degree distribution of the Preliminaries graph. Although I should be able to read the Prelims .gexf file with NetworkX, I kept getting error after error, so I decided to simply use the Gephi scripting console to generate a .txt file with the degree of each node. This should have been fairly straightforward, but I found the formatting of the degree values in the .txt file extremely difficult to work with, so I wrote a fairly ugly script to clean up the data so I could process it and generate plots for degree distribution. Using the following script I was able to generate some plots:

import pylab as pl
from collections import Counter


def degree_distribution(degree_list):
    """
    set up a dictionary with degree as key
    and frequency as value
    """
    return dict(Counter(degree_list))


# the Gephi console export is read as a single line of degree values
with open('prelims_degree.txt', 'r') as f:
    line = f.readline().split()

# clean up the data: drop the last character of each value
# (leftover formatting from the export)
clean_line = [degree[:-1] for degree in line]

# make a list to be used in the degree_distribution function,
# skipping any value that is not an integer
degree_list = []
for degree in clean_line:
    try:
        degree_list.append(int(degree))
    except ValueError:
        pass

# compute and print basic graph stats
# num. of nodes, edges, and average degree
# (every edge adds 2 to the sum of degrees)
avg_node_degree = float(sum(degree_list)) / len(degree_list)
print("Number of Nodes : %i" % len(degree_list))
print("Number of Edges : %f" % (sum(degree_list) / 2.0))
print("Avg. Node Degree: %f" % avg_node_degree)

# set up a dict with degree frequency values
degree_dict = degree_distribution(degree_list)

# generate x,y values for degree dist. scatterplot
x_list = list(degree_dict.keys())
y_list = list(degree_dict.values())

# label the graph
pl.title('Degree Distribution\nPrelims Graph')
pl.xlabel('Degree')
pl.ylabel('Frequency')

# set limits for the axes
pl.gca().set_xlim([-10, 75])
pl.gca().set_ylim([-10, 125])

# plot degree distribution
pl.scatter(x_list, y_list)
pl.show()

Which generates the following scatter-plot:

NewPrelimLin

This plot looks like it also has a power-law tail, although I can’t be sure, and as you can see it is quite similar to the random scale-free graph’s degree distribution. Alternatively, if we plot the Prelims data in a log-log plot we get the following image:

PrelimLog

Once again we see that the Preliminaries graph’s degree distribution is quite similar to the random scale free graph. This leads me to believe that the Preliminaries graph is indeed scale free, as many real world networks are. What does this mean? Preliminaries is a ‘real’ network? If anything it further validates the study of early cultural production using a network based methodology, as we can see that the network we have generated for this study does indeed share characteristics with modern day networks, and thus provides a comparative methodology for analyzing network evolution throughout human history.

@dbrownbeta

all the code for this post is available in this gist

It’s Hard to Listen to your Own Voice

Today I would like to present a short video I made that shows what it is like to work with the Preliminaries data in digital space. Here you will see a few examples of Sylva’s functionality, what a Sylva database looks like in Gephi and a few tricks for working with Gephi’s scripting console…

Thinking Ahead: My Academic Future

Well, Winter term has started and it is time to get back to work. I know this blog is called Preliminaries Project, but for now I don’t have much to say about Preliminaries. We are beginning a new round of research and model building, but it is still in the, ahem, preliminary phase. I would like instead to take this opportunity to talk a bit about my academic future.

Here in the Hispanic Studies department at Western University, instead of doing comprehensive exams we design courses. I like this; it seems that comps represent the old guard of academia, an antiquated rite of passage that often provides more stress than benefit. Is it important to know your field of study inside and out? Of course it is. However, designing courses forces doctoral students to engage creatively with the material that interests them, and seems more realistic, or more representative of what you will actually be doing in the real world. They say that teaching a subject is the best way to learn it. Cliché, yes, but in my experience very true, and I believe planning a course and then defending it in front of a critical audience simulates this kind of learning process. Plus, designing courses gives you a leg up in the job market: you already have two courses more or less ready when you (hopefully) get that new job.

To graduate from the Ph.D. program here there are three requirements: coursework, 2 course designs, and a doctoral thesis. Although I am still in my first year, it is already time to begin thinking about my thesis and course designs. A year from now, I will be presenting an official thesis proposal and defending my first course. These are big projects, so it is crucial to find a topic that you find very engaging.

What engages me? Good question. I guess the point of this blog is to get some of my ideas out on paper, even if they are still in the formative stages. First of all, thinking about my thesis, I think my primary interest in terms of Hispanic Studies is early print culture and the interactions between political or ecclesiastic structures, print culture, and authors. Sounds a bit like Preliminaries, doesn’t it? This is a BIG topic, so obviously I have to narrow it down. I find that I am also quite interested in New Spain, the Spanish viceroyalty located in present-day Mexico, Central America, and the southwest of the United States. Something about its history really intrigues me, so it seems a given that I will focus my studies there. Interestingly enough, my interest in New Spain was sparked by my studies of New Spanish painting, a subject that recently led me to investigate the literary phenomenon surrounding its most famous iconographic subject: the Virgin of Guadalupe. Here I found all sorts of interesting results, above all the heavy influence on literary production of certain affiliates of the Royal and Pontifical University of Mexico. These individuals were involved in a campaign to historicize the guadalupan legend during the second half of the 17th century and most of the 18th century, and I strongly suspect that they had a hand in much of the literary production of the era. For my thesis, I hope to explore these kinds of relationships while at the same time engaging deeply with the literary discourse of the era, as I expect to discover strong and interesting relationships between discourse and politics.

As far as my course designs go, the jury is still out. Here I will limit myself to a few ideas:

Techniques and Technologies of the Digital Humanities: A survey course that focuses on some of the common technologies of the digital humanities, how they have been applied in research projects, and how to use them.

Hispanic Literature, the Nineties and Beyond: This course would focus on some of my favorite Hispanic authors: Junot Díaz, Roberto Bolaño, Cristina Rivera Garza, etc. Using books like 2666, The Brief Wondrous Life of Oscar Wao, and Verde Shanghai, we would look at what it means to be a writer, and furthermore a Latin American author, in the 21st century.

New Spain and the Creole Consciousness: This course would focus on the literatures of New Spain and their relationships with the emerging Mexican identity. We would read authors such as Bernardo de Balbuena, Sor Juana Inés de la Cruz, Carlos de Sigüenza y Góngora, and the founders of the Guadalupan legend: Laso de la Vega, Miguel Cabrera, etc.

These are just a few ideas. Thankfully, I still have a while to think about my academic future. But as we all know, time flies around here, and before I know it I will be writing the first chapters of my thesis. Can you ever truly be ready? Maybe not. But it doesn’t hurt to try.

@dbrownbeta

Winter Break @CulturePlex

Well, it’s already the 20th of December. Classes have been over for two weeks, and things are a bit more relaxed around here at Western. Campus is strangely deserted, and it’s hard to get that 4 p.m. cup of coffee that, as you know, every grad student requires for survival. Everyone is closing up shop, mopping the floor and stepping out early. It’s that time of year.

What is dbrownbeta doing for vacation? Drinking margaritas in Cabo? A bit of SCUBA in Roatán? Maybe a quick road trip to Utah?

Not this year, folks. Just sitting tight in Canada, waiting for the snow.

I am taking advantage of this time to get a few things done. When school is in session, it’s tough to get much real work done with the nonstop schedule of classes, meetings, readings, and speakers. Your job is grad school, and grad school is your life. Does that make sense? So your job is really to live the life of a grad student. Confusing? Yes, it confuses me as well.

Moving on to more technical and less ridiculous topics, I want to talk a bit about what I am doing this Winter Break. Let’s do this.

Finishing Up My Coursework

Last weekend I finished my project for the class Máquina cultural. Although the essay wasn’t my finest work, the models I made turned out quite nicely, I got a chance to experiment with Gephi’s Geo Layout, and I got to add a new function to my Gephi/Python library.

The graph consisted of the metadata of a corpus of literary and critical texts, laid out in Gephi. Most nodes were standard nodes with ordinary attributes; however, the nodes that represented geographical locations (cities) were positioned according to their latitude/longitude attributes using Gephi’s Geo Layout.
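Gephi’s Geo Layout does this positioning itself. Purely to illustrate the idea (this is not Gephi’s implementation), here is a minimal standalone Python sketch that turns latitude/longitude into x/y with a Mercator projection. The `mercator_xy` and `geo_positions` names, the `scale` factor, and the sample cities with their approximate coordinates are all assumptions made for the example.

```python
import math

# Example city nodes: label -> (latitude, longitude) in decimal degrees (approximate).
CITIES = {
    "Madrid": (40.4168, -3.7038),
    "Sevilla": (37.3891, -5.9845),
    "Mexico City": (19.4326, -99.1332),
}


def mercator_xy(lat, lng, scale=1000.0):
    """Project a latitude/longitude pair (degrees) to Mercator x/y coordinates."""
    # The Mercator projection is undefined at the poles.
    if not -90.0 < lat < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90")
    # Longitude maps linearly to x; latitude is stretched toward the poles.
    x = scale * math.radians(lng)
    y = scale * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def geo_positions(cities, scale=1000.0):
    """Return a {label: (x, y)} map of projected positions for every city."""
    return {name: mercator_xy(lat, lng, scale) for name, (lat, lng) in cities.items()}


if __name__ == "__main__":
    for name, (x, y) in geo_positions(CITIES).items():
        print(f"{name}: x={x:.1f}, y={y:.1f}")
```

Because the projection is a pure function of each node’s own coordinates, the city nodes land in the same places no matter how the rest of the graph is connected, which is what makes them useful as anchors.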

These geonodes were then fixed in place using my new fix_set function in combination with other functions from the Gephi/Python library. Then the other nodes were arranged around the geonodes using ForceAtlas 2.
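The post doesn’t show fix_set’s code, so what follows is only a hypothetical sketch of the idea in standalone Python: a `Node` class with a `fixed` flag, a `fix_set` that sets that flag, and one iteration of a simple Fruchterman-Reingold-style spring layout that skips pinned nodes. The spring step is a stand-in for ForceAtlas 2, which computes its forces differently; the point is only that fixed nodes keep their geographic positions while everything else settles around them.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a graph node; `fixed` pins it in place."""
    label: str
    x: float
    y: float
    fixed: bool = False


def fix_set(nodes, fixed=True):
    """Hypothetical fix_set: pin (or unpin) every node in `nodes`."""
    for node in nodes:
        node.fixed = fixed


def spring_step(nodes, edges, k=1.0, step=0.05, max_move=1.0):
    """Run one iteration of a simple spring layout (not ForceAtlas 2).

    All node pairs repel, connected nodes attract, each move is capped at
    `max_move`, and nodes with fixed=True never move.
    """
    force = {id(n): [0.0, 0.0] for n in nodes}
    # Repulsion: every pair of nodes pushes apart with strength k^2 / distance.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = a.x - b.x, a.y - b.y
            dist = math.hypot(dx, dy) or 1e-9
            push = k * k / dist
            force[id(a)][0] += push * dx / dist
            force[id(a)][1] += push * dy / dist
            force[id(b)][0] -= push * dx / dist
            force[id(b)][1] -= push * dy / dist
    # Attraction: each edge pulls its endpoints together with strength distance^2 / k.
    for a, b in edges:
        dx, dy = a.x - b.x, a.y - b.y
        dist = math.hypot(dx, dy) or 1e-9
        pull = dist * dist / k
        force[id(a)][0] -= pull * dx / dist
        force[id(a)][1] -= pull * dy / dist
        force[id(b)][0] += pull * dx / dist
        force[id(b)][1] += pull * dy / dist
    # Move only unpinned nodes, capping each displacement at max_move.
    for n in nodes:
        if n.fixed:
            continue  # pinned geonodes keep their projected positions
        dx, dy = step * force[id(n)][0], step * force[id(n)][1]
        length = math.hypot(dx, dy)
        if length > max_move:
            dx, dy = dx * max_move / length, dy * max_move / length
        n.x += dx
        n.y += dy
```

With two pinned geonodes and one free node connected to both, repeated calls to `spring_step` pull the free node into the space between them while the pinned nodes never move.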

Pretty neat, huh?

CulturePlex Projects

I am also working on a few CulturePlex projects this Winter Break. I recently got the chance to help out with the Sylva project. The lab is getting ready to release Sylva officially to the public, and because ease of use is one of Sylva’s priorities, we want to provide comprehensive documentation, whose content I am helping to develop. We are creating three types of documentation: a user guide that describes all of Sylva’s features, a step-by-step tutorial on creating your first graph with Sylva, and a help menu with FAQs, solutions, etc.

We are also beginning work on a new phase of the Preliminaries project. In this phase we will focus on the period 1643-1661, during the administration of Luis de Haro. We are particularly interested in this period because this is when Pedro Calderón de la Barca began to be published prolifically. In this case, the graph will be used not only for general network analysis but also to supplement studies of the contemporary reception of Calderón’s work. We have barely begun to assemble the list of first editions for this phase, but we plan to finish it before May.

Personal Projects

I have three personal projects this winter break: learn HTML/CSS, learn JavaScript for use in web pages and with the Google Maps API, and build a personal web page. I started learning HTML last Sunday evening, and my colleague Roberto showed me Bootstrap on Wednesday. Bit by bit, my website is coming along.

It’s called xitōmatl, and it will provide links to my social networking sites, descriptions of my projects (personal and CulturePlex) with their associated image galleries, my personal profile and CV, etc. I also plan on creating a page that focuses specifically on research on New Spain. There I will provide a variety of content, supplemented with links to digitized rare New Spanish books, various websites useful in the study of New Spain, and a few resources for learning Classical Nahuatl (another project coming soon). xitōmatl is available at this Gist if you want to take a look. It’s still a bit sloppy (a bunch of style elements still need to be moved into a CSS file), but you get the idea.

That’s it for today…time to get back to work.

Happy Holidays

@dbrownbeta

Preliminaries Project: A Gephi/Python Library

This week I would like to present a short report describing the Gephi/Python library I have been slowly developing over the past two months. All the code is available at this gist; please try it, use it, or, even better, contribute!

Introduction

Preliminaries is a CulturePlex Laboratory project that focuses on social network analysis of Early Modern Spanish literature. Since Summer 2012, we have been collecting extensive bibliographic information from the preliminaries sections of these texts and storing it in Sylva, a graph database also designed at the CulturePlex. This information is then visualized as a graph using Gephi. The Python scripts presented here were written to run in Gephi’s Python scripting console, in response to a need to access information in the graph efficiently, to modify the graph, and to perform statistical operations not included in the Gephi application. What follows is a brief explanation of each function in the mini-library, along with an example of its use.

The Preliminaries graph, with nodes sized by betweenness centrality:

A Gephi/Python Library

1. find_neighbors(degree_sep,node)

This function finds all neighbors of any node in the graph up to n degrees of separation. Gephi includes an ego network filter that extends only to 3 degrees of separation, but we needed to find ego networks to 4 or more degrees. Also, I just fixed the recursive version of this function (thanks, versae)!
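The gist with the actual code isn’t reproduced here, so this is a hypothetical sketch only. It is written as standalone Python 3, takes an extra `graph` argument (an adjacency dict mapping each node ID to a set of neighbor IDs, standing in for the graph in Gephi’s console), and uses an iterative breadth-first search rather than the recursive approach mentioned above. Tracking visited nodes means each node is expanded at most once, even in a graph full of cycles.

```python
from collections import deque


def find_neighbors(degree_sep, node, graph):
    """Return every node within `degree_sep` steps of `node`, excluding `node`.

    `graph` is assumed to be an adjacency dict: {node_id: set_of_neighbor_ids}.
    """
    seen = {node}
    found = set()
    frontier = deque([(node, 0)])  # (node, distance from the starting node)
    while frontier:
        current, depth = frontier.popleft()
        if depth == degree_sep:
            continue  # don't expand beyond the requested distance
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


if __name__ == "__main__":
    # Toy path graph A-B-C-D-E (made-up data).
    toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
    print(sorted(find_neighbors(2, "A", toy)))  # ['B', 'C']
```

Breadth-first order guarantees that a node is first reached along a shortest path, so its recorded depth is its true degree of separation.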

Here I will filter the visible graph to find the neighbors of Miguel de Cervantes to 4 degrees:

2. color_set(set,color)

Since find_neighbors can’t create a subgraph, I wrote another function that sets the color of a set of nodes, so that subsets can be marked visually on the graph.

3. size_set(set,size)

And yet another to size the nodes.
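A minimal standalone sketch of color_set and size_set, assuming a hypothetical `Node` dataclass with `color` (an RGB tuple) and `size` fields in place of Gephi’s own node objects. The `set` parameter from the original signatures is renamed `nodes` here, since `set` would shadow Python’s built-in type.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a Gephi node (not Gephi's API)."""
    node_id: str
    label: str
    color: tuple = (153, 153, 153)  # default grey, as (r, g, b)
    size: float = 10.0


def color_set(nodes, color):
    """Give every node in `nodes` the same RGB color, e.g. (255, 0, 0) for red."""
    # Validate before touching any node so a bad color changes nothing.
    if len(color) != 3 or not all(0 <= c <= 255 for c in color):
        raise ValueError("color must be an (r, g, b) tuple with values 0-255")
    for node in nodes:
        node.color = tuple(color)


def size_set(nodes, size):
    """Give every node in `nodes` the same size."""
    if size <= 0:
        raise ValueError("size must be positive")
    for node in nodes:
        node.size = size
```

Marking a subset such as the Cervantes neighbors would then be `color_set(subset, (255, 0, 0))` followed by `size_set(subset, 20)`.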

Miguel de Cervantes’ neighbors, colored red and sized at 20:

4. filter_by_type(set_to_be_filtered,nodetype)

If I am only concerned with a certain type of node (text, person, etc.), I can filter by type using this function.
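A sketch of filter_by_type under the assumption that each node’s type can be looked up in a `node_types` dict (node ID to type name). That extra parameter is hypothetical and stands in for the type attribute the Preliminaries nodes carry in Gephi; apart from ‘Persona’, the type names below are illustrative.

```python
def filter_by_type(set_to_be_filtered, nodetype, node_types):
    """Keep only the nodes whose type equals `nodetype`.

    `node_types` is an assumed {node_id: type_name} lookup; nodes missing
    from it are treated as having no type and are dropped.
    """
    return {n for n in set_to_be_filtered if node_types.get(n) == nodetype}


if __name__ == "__main__":
    types = {"n1": "Persona", "n2": "Text", "n3": "Persona"}  # made-up data
    print(sorted(filter_by_type({"n1", "n2", "n3"}, "Persona", types)))  # ['n1', 'n3']
```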

‘Persona’ nodes in the Cervantes subset:

5. return_label(set)

Node IDs don’t tell me much. What if I want to see the labels of these nodes?
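A hypothetical sketch of return_label, assuming an extra `labels` dict from node ID to label (again a stand-in for Gephi’s node attributes). IDs without a label fall back to the ID itself so nothing silently disappears, and the result is sorted so long lists are easy to scan; the IDs below are made up.

```python
def return_label(node_ids, labels):
    """Return the labels for `node_ids`, sorted alphabetically.

    `labels` is an assumed {node_id: label} lookup.
    """
    return sorted(labels.get(node_id, str(node_id)) for node_id in node_ids)


if __name__ == "__main__":
    labels = {"n1": "Miguel de Cervantes", "n2": "El Inca Garcilaso"}
    print(return_label({"n1", "n2"}, labels))  # ['El Inca Garcilaso', 'Miguel de Cervantes']
```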

Labels of the ‘Persona’ nodes:

6. set_intersect(set1,set2)

Finally, if I am comparing two subsets of the graph, I can see which nodes they have in common.
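set_intersect maps directly onto Python’s built-in set intersection. This standalone sketch assumes the subsets may arrive as lists or other iterables rather than sets, so it converts both inputs before intersecting them; the node IDs in the example are made up.

```python
def set_intersect(set1, set2):
    """Return the nodes that two subsets have in common, as a new set."""
    return set(set1) & set(set2)


if __name__ == "__main__":
    cervantes_subset = ["n1", "n4", "n7"]
    garcilaso_subset = {"n4", "n7", "n9"}
    print(sorted(set_intersect(cervantes_subset, garcilaso_subset)))  # ['n4', 'n7']
```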

Set intersection of the Miguel de Cervantes and El Inca Garcilaso subsets:

@dbrownbeta