Networks, Maps, and NodeGoat

I haven’t been around in a while. The reason is that many changes have been taking place, and a lot of information had to sink in before I could write this post. Last time I discussed the presentations on corpus analysis, and little did I know that it was only the beginning of this field called digital humanities.

There is a whole lot more. During my last couple of classes, I have learned about networks and maps, and what I have learnt is only a small part of what there is to learn. But I am still glad I have a starting point now.

Let’s begin with networks.

Networks, in digital humanities, are an amazing concept. They allow one to visualize everything one reads about in a book, on Wikipedia, or in a table. A network visually connects the information and provides an easy, logical form for understanding its content. Before I move on to the use of networks in different types of projects, I would like to give a short definition and description of a network.

A network consists of nodes, edges, and relationships. A node is any item in our data that we are focusing our analysis on: the person, object, or phenomenon of our study. Edges are the links (connections) between two nodes and show that those nodes have a common characteristic. Lastly, the relationship is the idea behind the edge, the criterion used in determining the connections between the nodes before the visuals are displayed. An easy way to picture a network is by thinking about one’s genealogical tree: each member of the family represents a node, each arrow (or double arrow) from one to the other is an edge, and the meaning of the arrow (as well as the position of one node in comparison with another, since the genealogical tree is a hierarchically structured network) is the relationship.
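The family-tree picture can be written down in a few lines of Python. This is only a minimal sketch to make the three terms concrete: the family members, the `parent_of` label, and the `neighbors` helper are all made up for illustration and do not come from any particular network tool.

```python
from collections import namedtuple

# An edge links two nodes; the relationship names the criterion behind the link.
Edge = namedtuple("Edge", ["source", "target", "relationship"])

# Hypothetical family members (the nodes).
nodes = {"Grandmother", "Mother", "Uncle", "Me"}

# Each arrow in the family tree becomes one edge with a named relationship.
edges = [
    Edge("Grandmother", "Mother", "parent_of"),
    Edge("Grandmother", "Uncle", "parent_of"),
    Edge("Mother", "Me", "parent_of"),
]

def neighbors(node, relationship):
    """Return the nodes reached from `node` by edges of the given relationship."""
    return sorted(
        e.target for e in edges
        if e.source == node and e.relationship == relationship
    )

print(neighbors("Grandmother", "parent_of"))  # ['Mother', 'Uncle']
```

Keeping the relationship as a separate field is what lets the same set of nodes be redrawn under a different criterion (for example, marriage instead of parenthood).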

An example of what one can do with networks (and a bit of programming knowledge!) in digital humanities is the amazing project by Silvia Gutiérrez, New Maps for the Lettered City. A few blog posts ago I referred to the Mapping the Republic of Letters project created by a team of researchers at Stanford University, which offered a visualization of the travels of Republic of Letters writers that could be further analyzed and interpreted. This new map, which looks at members of the 19th-century salons in Mexico, does even more than the Stanford project did. For example, the generations’ problem shows who met whom and where, and what literary movement(s) and salon(s) each of them was part of. This is extremely helpful for thinking about the human relationships that were formed in each salon and what new ideas each of them might have brought in from other salons and/or other literary movements.

Coming back to the Digital Humanities class, my classmates and I, together with our professor, have started a network project of our own! We used NodeGoat to enter our data and create a network visualization. The subject we chose was Egyptian Cinema, and we created several categories of nodes: title of the film, author/other authors/main cast, and release year (the very first release, regardless of the country/region). Then, for each person, we decided to attach some information which would help us study the social relationships between them. Thus, we added their spouse and their date and place of birth and of death. Towards the end, our database looked like this, with 44 entries:

egyptian-cinema-jpg

The network project had two distinct parts: data gathering and network visualization. In order to gather all the information on each film, we created an Excel table where we entered most of the required details (title, author, year) and then entered them in the Film or Person forms on NodeGoat. Once each classmate had filled in the data for around 10 films, we were ready to test the network visualization functionalities of the website! In the picture below you can see a social relationship network visualization of the connections between different film directors. Some of them appear more connected (Chahine Youssef, Mazar Ahmed, and Kamal Hussein are relevant examples). Other authors, such as Mohamed Khan, Abouseif Salah, and others, seem to be less connected with the rest of the Egyptian film industry.

network1

But wait! NodeGoat only displays the user’s input. The reason why some authors look less well-connected than others is simply that the “Person” category for each film was not filled in completely, which is an important detail to remember. Had the data collection been done in more depth, our social network would have looked very different! (And we might have discovered that, in reality, they are all interconnected.)
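To see how much the picture depends on what was typed in, here is a small Python sketch. It is not how NodeGoat works internally (I do not know that); it simply assumes that two people are connected when they appear in the same film. The film titles and names are placeholders, not our class data.

```python
from itertools import combinations

# Hypothetical film records; titles and names are placeholders.
films = {
    "Film A": ["Director 1", "Actor X", "Actor Y"],
    "Film B": ["Director 2", "Actor X"],
    "Film C": ["Director 3"],  # the cast was never entered
}

def connection_counts(records):
    """Count, for each person, how many distinct people they share a film with."""
    linked = {}
    for people in records.values():
        for person in people:
            linked.setdefault(person, set())
        # Every pair of people in the same film gets connected.
        for a, b in combinations(people, 2):
            linked[a].add(b)
            linked[b].add(a)
    return {person: len(others) for person, others in linked.items()}

before = connection_counts(films)
films["Film C"].append("Actor Y")  # fill in one missing cast member
after = connection_counts(films)
print(before["Director 3"], after["Director 3"])  # 0 1
```

One extra name in one form is enough to pull an "isolated" director into the network, which is exactly why an empty-looking corner of the visualization may say more about the data entry than about the film industry.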

network2

Above, you can see the films, out of all 44 entries, that were associated with the film director Mohamed Khan. In a specific research project (for example, on the themes addressed in various films), such a network would serve the analysis and exploration of the research subject by providing an easy visual means of analyzing it.

All in all, NodeGoat was interesting to use. After also checking out Palladio, I believe both websites are handy and can generate basic good-quality visualizations. Some shortcomings of NodeGoat would be:

  • the large amount of manual labor one still has to put into gathering and entering the data (instead of simply giving it a .csv file to read, we have to manually enter data on each film, which takes up quite a lot of time). In comparison with web-scraping software, it looks terribly inefficient to have to google the information and organize it;
  • its limited visualization options (only geographical and social);
  • its unnecessary pop-up menus and tabs, which slow down the process of entering data.
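For comparison, this is roughly what reading a .csv file looks like in Python. It is only a sketch of the idea using the standard `csv` module: the column names (`title`, `director`, `year`) and the sample rows are assumptions for illustration, and nothing here talks to NodeGoat itself.

```python
import csv
import io

# Hypothetical CSV export of a spreadsheet; column names are assumptions.
sample_csv = """title,director,year
Film A,Director 1,1958
Film B,Director 2,1969
"""

def load_films(text):
    """Parse CSV text into a list of film records with integer years."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"title": row["title"], "director": row["director"], "year": int(row["year"])}
        for row in reader
    ]

films = load_films(sample_csv)
print(len(films), films[0]["title"])  # 2 Film A
```

With a real file one would pass an opened file object to `csv.DictReader` instead of the `io.StringIO` wrapper used here to keep the example self-contained.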

However, considering that it is a start-up website and that the purpose of my class was only learning through experience, NodeGoat helped by showing us the “behind-the-scenes” of network visualization.

What happens in the DH world

I am at a point in the semester where, before I start working on my own digital projects, I need to look back and reflect on what I have discovered so far about the world of digital humanities. And what I have learnt is a lot in comparison with what I previously knew. I will do this, however, in two different ways: I will update my digital narrative and I will write down some remarks I have made or thought about during these almost two months since the beginning of the class. The latter first.

As I have seen so far, digital humanities covers, as the humanities do, a whole lot of possibilities in terms of what (research) project to do with the knowledge it provides. One can choose to map out the places where the first historical sources were found (I would love to see such a project on Romania’s early history); another could simply digitize the collection of letters of a famous historical figure; or, if someone feels more ambitious, they could gather data on Neo-modernist literature in ex-communist countries and see how the regime influenced the authors’ themes, ways of expression, and purpose of writing.

Among all the possibilities there are in developing a digital humanities project, I have noticed there are some themes and ideas that project initiators and researchers lean towards the most. Three such examples are: online collections, visual representations, and research and process.

The first one, online collections, is one of the initial forms of digital projects, which started when contemporary humanists learnt the benefits of having text in a digital form. Online collections presuppose the existence of a physical collection that would be photographed or scanned and then either transcribed by the project team or digitized using OCR software. Examples of online collections are: Arabic Collections Online, Early English Books Online, Al-Maktaba Al-Shamela, Eighteenth Century Collections Online, and the Blue Mountain Project.

How should one read such collections (and why can they be considered digital humanities projects rather than simple collections of authors’ works)? Let’s take the example of EEBO (link above). After going through the search process and reaching the desired book to view, the fun begins! There are two ways in which the text is presented on EEBO:

  • First, the photocopies. If the text hasn’t been digitized, the viewer is confronted with photocopies of a printed edition of the book. What is extremely valuable in having the book displayed in such a way is the preservation of the forms, spelling, and grammar of those works. Many of the books available online today are “adjusted” (edited) so that the contemporary casual reader can understand them without further research. Moreover, any element that was intentionally preserved, or any old form of a word that had to be kept (e.g. to preserve the verse length), is more often than not explained in a footnote. This is not the case, however, with online collections of old books, where the creators of the collections only reproduce the works in their initial form.
  • Second, the photocopies might be accompanied by digitized text of their contents. For example, in the image below (a print screen from EEBO), we are given the digitized form of the work (a randomly selected discourse by Pierre Ayrault):

eebo

First, we must notice that the text formatting was preserved (the writing in italics or bold). However, if we open the link above the title, which sends the user to the original photocopy of the text, we are faced with a completely different representation of it (seen below):

eebo

Being offered such representations is invaluable for humanist researchers who have little to no access to the original forms of the works they are studying. Not only are they given a photocopy of the work they need (with all the bonus annotations that could help guide their research), but they are also given a “translated” form of the text, which preserves the word forms and (as much as possible) the text formatting. This makes the reading process easier for our researcher, without taking away from him or her the incredibly interesting facts of the original form of the work on which to continue their research.

An extremely interesting feature of having such collections in digital formats is the different ways to access the contents of a book, manuscript, or article in order to further analyze data. For example, Austen Said contains some of Jane Austen’s most popular novels and allows the user to “explore Austen’s pattern of diction” through features such as word frequencies or other novel visualizations. Which brings us to…
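Word frequencies are easy to try out on any digitized text. The snippet below is a minimal sketch of the general idea, not a description of how Austen Said computes its results; the `word_frequencies` helper and its very simple tokenizer (lowercase letters and apostrophes only) are my own assumptions. The sample is the opening sentence of Pride and Prejudice.

```python
import re
from collections import Counter

def word_frequencies(text, top=3):
    """Return the `top` most frequent words in `text` with their counts."""
    # Deliberately simple tokenizer: lowercase, keep letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)

sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)

print(word_frequencies(sample, top=1))  # [('a', 4)]
```

On a single sentence the most frequent word is unsurprisingly an article; on a whole novel, and after filtering out such function words, the counts start to say something about an author’s diction.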

… Visual representations (maps, graphs, charts, etc.), which are also extremely popular among digital humanities projects. Maps are an interesting and useful tool for visualizing (and, consequently, better grasping) the different distributions of data out there. For example, this map from the Linguistic Landscapes of Beirut (a project by David J. Wrisley) beautifully shows what one would take hours to learn: where the different occurrences of Arabic, Latin, or mixed scripts appear in a delimited area of Beirut. By using colors to represent each type of script, the author(s) have significantly decreased a reader’s work. Readers no longer have to picture in their minds, while reading a text, where each of these scripts would be found. They are already given the visualization, making it possible for them to immediately start analyzing the no-longer-raw data (e.g. to determine in which region the occurrence of Arabic script is higher than that of Latin script, and to attempt to explain why). Other such projects, which output a map, a chart, or even an interactive graphic, are: Mapping the Republic of Letters, Digital Karnak: Timemap, and Ibn Jubayr.

The third type of theme I have noticed to occupy a large space in the digital humanist world is the user-input-based research project. This kind of project’s primary purpose, before diving into data analysis, is gathering data from the users (the general public). For example, Zooniverse asks its visitors to help recognize faces of wild animals, which would probably further lead to the development of an AI tool that would do that for us, but which currently lacks the data to operate in such a way. This type of project is valuable in the sense that it familiarizes users with the problems and topics digital humanists are studying and involves them in the process. This could easily mean that, once a person offers their input, he or she would also be interested in checking the progress and completion of the project, and in supporting it all the way through, something that happens less often or not at all when one randomly comes across a digital project online. We tend to look over a finished research project, read a little about it, and later almost forget it existed, unless we need it for other purposes.

This novelty is recognized as the so-called social turn in scholarship because the impact of the new research methods is, well, social. By engaging the wider public in their research projects and problems, scholars benefit in two ways. First, they gather the necessary data for the development of the research; second, they disseminate its process and results in real time. This dissemination happens because the users are the creators of their small piece of data (thus, they already know that much), but also because, more often than not, the user will be curious to find out about the outcome of a project they took part in. At the same time, the user benefits from the status of ‘collaborator’ and from a feeling of accomplished social responsibility.

As we have seen, there is a lot happening in the field of digital humanities, yet the processes are not always visible or engaging for the general public beyond the period in which they happen to be interested in the topic explored. Let’s hope that, as the subject gets wider academic recognition, people will also become better acquainted with it.

 

The Citizen Scholars in Context

The class on the 26th of September had a very inspiring topic (which led me to day-dream about implementing a crowdsourcing system in my hometown sometime in the future).

Citizen Science is, according to Wikipedia, “a scientific research conducted, in whole or in part, by amateur or nonprofessional scientists,” or simply put “public participation in scientific research.” A citizen scholar is, thus, any person who takes part in the research and contributes to its progress. An example of such a project is Zooniverse, where people are invited to help recognize and classify faces of animals, which would further contribute to the development of an AI feature that computers will use to recognize those faces automatically.

In class, not only did we discuss the benefits of such a mechanism, but we even tried it ourselves! Crowdtranscription is a subcategory of crowdsourcing which requires the user’s help with recognizing and transcribing text in scanned images. My classmates and I, together with our professor, went to 18thConnect and edited the Memoir of a chart of the east coast of Arabia from Dofar to the Island Maziera. The document had previously been digitized by an OCR program, but as we learned last time, the digitization of a text comes with occasional errors which, so far, only a human brain can correct. It was an amazing activity for me, as I could take responsibility and contribute to other people’s attempts to create great online resources for the general public. At the same time, I was able to notice, as last time, other errors that appear in the process of text digitization, and also what decisions an editor needs to make when transcribing and/or editing a text. For example, he or she needs to decide whether to preserve the italics, size, indentations, or superscripts that appear in a text, or simply to replace them and justify the decision in a note.

Since the text was documenting the journey of a sailor around the Arabian coasts, a thought popped into my mind. I realized I know very little about the old history of the geographical area I am currently living in (Abu Dhabi, United Arab Emirates). Then I realized there is an incredible number of research projects that could be conducted using citizen science. The UAE, and the Arab World in general, is still so little known to those outside of it, especially when it comes to fields such as history, language, literature, culture, and even (old or traditional) cuisine (if you ask me). Research on almost anything in these categories would contribute to the dissemination of information beyond the Arab borders, out into the curious and intrigued world. After a quick search on Google I found that there are some projects (ongoing or already finished) on the topic. For example, the team behind the Arabic language collection claims that their collection comprises more than 100,000 books and more than 15,000 manuscripts. Still, very little of this is available online to the general public and, what is more, even less must have been translated into English. However, there is good news, as some of the manuscripts are going through the process of digitization.
