Hello! Welcome back to the Preliminaries Project blog!
This week, as promised, I would like to give you all a bit more information about the project, including the current status of the Preliminaries database and the methodology used in constructing it. However, the primary focus of this entry will be on the techniques I have been using to analyze the Preliminaries graph, since that is what I have spent the last few days figuring out. But first, let me give you a bit of background about me.
My educational background is primarily literature and linguistics. I did my undergrad work at the University of Oregon, where I studied Spanish with a fair bit of Linguistics as a secondary focus. Last year, I started my graduate work as a master's student here at Western studying Hispanic Literature. My first contact with using technological means to study literary topics came last spring in Professor Suárez's class on the Hispanic Baroque. As a class project we started building an early version of the Preliminaries database in Sylva. I ended up doing my final project on the social networks involved in the production of early editions of Don Quixote, and I haven't looked back. Last summer, I began officially working here at the CulturePlex Lab on the Preliminaries Project. So, to make a long story short, I am a rookie when it comes to digital humanities, computer modeling, and programming. This fall I have been taking a class focused on Python, a high-level programming language popular amongst scientists of all types, as well as a Coursera course on social network analysis. I am just learning how to use this technology, but I hope I can share some of this learning process with you, and in the end maybe everyone will benefit. Okay, enough about me…let's get back to the project.
As I mentioned before, the Prelims Project is ongoing, and although it isn't 100% complete, the database is sufficiently developed to begin doing a bit of analysis. Currently the first editions list (Duque de Lerma, 1598-1618) consists of 330 editions, of which I have been able to obtain 228 scanned copies of preliminary sections, approximately 70%, which isn't bad considering that these texts were published 400 years ago. Of these scans, around 120 have been entered into the database, producing a graph with 1612 nodes and 3472 relationships. Rendered in Gephi using the built-in Yifan Hu Multilevel layout, colored by modularity, and sized by betweenness centrality, the graph looks like this:
This visualization is nice because you can see the general structure of the graph, and the coloring gives you a good idea of the communities within the network as a whole. However, the amount of information presented here is overwhelming, so I have been looking for ways to control the visualization, and the information on which it is based, to allow for some detailed comparative analysis.
One of the nice features of Gephi is that it has a variety of built-in filters that allow the user to limit the information that appears in the graph. Something that we are interested in regarding the Prelims Project is the community structure within the graph. Let's use a filter to see the modules of various famous writers of the period:
First Miguel de Cervantes, author of Don Quixote
Then Lope de Vega, author of the Comedias
There is another type of subset within a graph called the Ego Network. These are based on direct connections between a node and its neighbors. Although Gephi also has a filter for Ego Networks, I encountered a small problem here: Gephi only allows filtering up to three degrees of separation. This presents a challenge with the Preliminaries graph because of the schema design of the database.
In order to establish a connection between an author and an edition there are two steps: Author->Obra, Obra->Edition. This is due to organizational/editorial concerns that I hope to address in the next blog. Furthermore, for the author to be related to the people involved in the approval, licensing, and publication of an edition, two more steps are required, e.g. Edition->Approval, Approval->Censor. Therefore, to establish what I call a Publication Network, somewhat equivalent to an Ego Network, I need to be able to find neighbors up to four degrees of separation. Thankfully, Gephi includes a scripting console based on the Python programming language. Using functions based on the following patterns, I am able to mimic Gephi's filtering abilities and create a way to isolate and compare subsets of the graph in order to generate these Publication Networks:
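The core idea is a breadth-first walk that stops after a fixed number of hops. The following is a minimal sketch in plain Python rather than the actual console code (the function name, the adjacency-dict representation, and the sample node names are mine, not the project's):

```python
from collections import deque

def neighbors_within(graph, start, max_degree=4):
    """Collect every node within max_degree hops of start.

    graph is a plain adjacency dict: node -> list of neighbors.
    Returns the set of reached nodes, including start itself.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_degree:
            continue  # do not expand past the hop limit
        for nb in graph.get(node, []):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

# Toy chain mirroring the schema: Author->Obra->Edition->Approval->Censor
g = {
    "Author": ["Obra"],
    "Obra": ["Author", "Edition"],
    "Edition": ["Obra", "Approval"],
    "Approval": ["Edition", "Censor"],
    "Censor": ["Approval"],
}
print(sorted(neighbors_within(g, "Author")))
# four degrees is just enough to reach the Censor from the Author
```

With `max_degree=4`, the walk reaches from an author all the way to the censors and licensers of an edition, which is exactly what Gephi's three-degree Ego Network filter cannot do.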
It is also important to note that the subsets generated by these functions need to be combined, which I have done with the function “completelist”, and then checked for stray ‘NoneType’s and duplicates, which I have done with “masterlist”:
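A sketch of what those two helpers might look like, reconstructed from the description above (the bodies are my own guess at the logic, keeping the original function names):

```python
def completelist(*subsets):
    """Concatenate several node lists into one combined list.

    The result may still contain None entries and duplicates.
    """
    combined = []
    for subset in subsets:
        combined.extend(subset)
    return combined

def masterlist(nodes):
    """Drop None entries and duplicates, preserving first-seen order."""
    seen = set()
    cleaned = []
    for n in nodes:
        if n is not None and n not in seen:
            seen.add(n)
            cleaned.append(n)
    return cleaned

print(masterlist(completelist(["a", None, "b"], ["b", "c", None])))
# -> ['a', 'b', 'c']
```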
Then, using the subsets generated here I can color and size the Publication Networks using the following functions:
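As an illustration of the coloring and sizing step, here is a plain-Python sketch; the `Node` stand-in and the function names are mine (in the Gephi console, nodes expose similar display attributes, but this is not the console API):

```python
class Node:
    """Stand-in for a graph node with display attributes."""
    def __init__(self, name):
        self.name = name
        self.color = None
        self.size = None

def color_subset(nodes, rgb):
    """Assign one RGB color to every node in a Publication Network."""
    for n in nodes:
        n.color = rgb

def size_by_degree(nodes, degrees, base=10, step=5):
    """Scale node size with degree so well-connected nodes stand out."""
    for n in nodes:
        n.size = base + step * degrees.get(n.name, 0)

# Hypothetical subset with made-up degree counts
subset = [Node("Cervantes"), Node("Lope")]
color_subset(subset, (255, 0, 0))                      # red
size_by_degree(subset, {"Cervantes": 6, "Lope": 4})
print(subset[0].color, subset[0].size)                 # (255, 0, 0) 40
```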
I can also find the intersections of various Publication Networks using the following function:
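The intersection itself is simple set logic; a hedged sketch with invented sample node names:

```python
def intersect_networks(net_a, net_b):
    """Return the nodes that appear in both Publication Networks."""
    return set(net_a) & set(net_b)

# Hypothetical Publication Networks sharing one printer
balbuena = {"Balbuena", "Printer1", "CensorX"}
torquemada = {"Torquemada", "Printer1", "CensorY"}
print(sorted(intersect_networks(balbuena, torquemada)))
# -> ['Printer1']
```

Nodes in the intersection are exactly the people and documents the two authors' publication circles have in common, which is what gets colored yellow in the example below.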
Thus, with a very basic knowledge of Python I am able to manipulate the graph and compare any subsets of nodes that I would like.
An applied example of these functions would be the following:
Publication network of Bernardo de Balbuena, author of Grandeza mexicana: Red
Publication network of Juan de Torquemada, author of Monarquía indiana: Blue
Their intersecting Publication Networks: Yellow
That’s it for today folks. Over the next week and a half I hope to generate some definite results to talk about and some more refined visualizations using my newfound techie skills.
Hope to see you next time around. For more information you can always email me at: email@example.com or follow me on twitter @dbrownbeta