What happens in the DH world

I am at a point in the semester where, before I start working on my own digital projects, I need to look back and reflect on what I have discovered so far about the world of digital humanities. I have learnt a lot compared with what I knew before. I will reflect in two ways: I will update my digital narrative, and I will write down some remarks I have made or thought about in the almost two months since the class began. The latter first.

As I’ve seen so far, digital humanities covers, as the humanities do, a wide range of possible (research) projects that build on the knowledge it provides. One can choose to map out the places where the first historical sources were found (I would love to see such a project on Romania’s early history); another could simply digitize the collected letters of a famous historical figure; or, if someone feels more ambitious, they could gather data on Neo-modernist literature in ex-communist countries and see how the regime influenced the authors’ themes, ways of expression, and purpose in writing.

Among all the possibilities for developing a digital humanities project, I have noticed some themes and ideas that project initiators and researchers lean towards the most. Three examples are: online collections, visual representations, and research and process.

The first one, online collections, is one of the earliest forms of digital project, which started when contemporary humanists learnt the benefits of having text in digital form. Online collections presuppose the existence of a physical collection that is photographed or scanned and then either transcribed by the project team or digitized using OCR software. Examples of online collections are: Arabic Collections Online, Early English Books Online, Al-Maktaba Al-Shamela, Eighteenth Century Collections Online, and the Blue Mountain Project.

How does one read such collections? (And why can they be considered digital humanities projects rather than simple collections of an author’s work?) Let’s take the example of EEBO (link above). After going through the search process and reaching the desired book, the fun begins! There are two ways in which the text is presented on EEBO:

  • First, the photocopies. If the text hasn’t been digitized, the viewer is presented with photocopies of a printed edition of the book. What is extremely valuable in having the book displayed in such a way is the preservation of the forms, spelling, and grammar of those works. Many of the books available online today are “adjusted” (edited) so that the contemporary casual reader can understand them without further research. Moreover, any element that was intentionally preserved, or any old form of a word that had to be kept (e.g. to preserve the verse length), is more often than not explained in a footnote. This is not the case, however, with online collections of old books, where the creators of the collections simply reproduce the works in their original form.
  • Second, the photocopies might be accompanied by a digitized text of their contents. For example, in the image below (a screenshot from EEBO), we are given the digitized form of the work (a randomly selected discourse by Pierre Ayrault):


First, we must notice that the text formatting was preserved (the writing in italics or bold). However, if we open the link above the title, which sends the user to the original photocopy of the text, we are faced with a completely different representation of it (seen below):


Being offered such representations is invaluable for humanities researchers who have little to no access to the original forms of the works they are studying. Not only are they given a photocopy of the work they need (with all the bonus annotations that could help guide their research), but they are also given a “translated” form of the text, which preserves the word forms and (as much as possible) the text formatting. This makes the reading process easier for researchers, without taking away from them the incredibly interesting features of the original form of the work on which to continue their research.

An extremely interesting feature of having such collections in digital formats is the different ways of accessing the contents of a book, manuscript, or article in order to analyze the data further. For example, Austen Said contains some of Jane Austen’s most popular novels and allows the user to “explore Austen’s pattern of diction” through word frequencies and other visualizations of the novels. Which brings us to…

… Visual representations (maps, graphs, charts, etc.), which are also extremely popular among digital humanities projects. Maps are an interesting and useful tool for visualizing (and, consequently, better grasping) how data is distributed. For example, this map from the Linguistic Landscapes of Beirut (a project by David J. Wrisley) beautifully shows what one would otherwise take hours to learn: where Arabic, Latin, or mixed scripts appear in a delimited area of Beirut. By using colors to represent each type of script, the author(s) have significantly decreased the reader’s work. Readers no longer have to picture in their minds, while reading a text, where each of these scripts would be found. They are already given the visualization, which makes it possible for them to start analyzing the no-longer-raw data immediately (e.g. to determine in which region the occurrence of Arabic script is higher than that of Latin script, and to attempt to explain why). Other such projects, which output a map, a chart, or even an interactive graphic, are: Mapping the Republic of Letters, Digital Karnak: Timemap, and Ibn Jubayr.

The third theme I have noticed occupying a large space in the digital humanities world is the research project based on user input. The primary purpose of this kind of project, before diving into data analysis, is gathering data from its users (the general public). For example, Zooniverse asks its visitors to help recognize the faces of wild animals, which would probably lead further to the development of an AI tool that would do that for us, but which currently lacks the database to operate in such a way. Projects of this type are valuable in the sense that they familiarize users with the problems and topics digital humanists are studying and involve them in the process. This could easily mean that, once people offer their input, they would also be interested in checking the progress and completion of the project, and in supporting it all the way through. That happens less often, or not at all, when one randomly comes across a digital project online. We tend to look over a finished piece of research, read a little about it, and later almost forget it existed, unless we need it for other purposes.

This novelty is recognized as the so-called social turn in scholarship because the impact of the new research methods is, well, social. By engaging the wider public in their research projects and problems, scholars benefit in two ways. First, they gather the data necessary for the development of the research. Second, they disseminate its process and results in real time. This dissemination happens because the users are the creators of their small piece of data (thus, they already know that much), but also because, more often than not, users will be curious to find out about the outcome of a project they took part in. At the same time, the user benefits from the status of ‘collaborator’ and from a feeling of fulfilled social responsibility.

As we have seen, there is a lot happening in the field of digital humanities, yet its processes are not always visible to the general public, or engaging for longer than the period in which people have a particular interest in the topic explored. Let’s hope that, as the subject gains wider academic recognition, people will also become better acquainted with it.


The Citizen Scholars in Context

The class on the 26th of September had a very inspiring topic (which led me to daydream about implementing a crowdsourcing system in my hometown sometime in the future).

Citizen Science is, according to Wikipedia, “a scientific research conducted, in whole or in part, by amateur or nonprofessional scientists,” or simply put “public participation in scientific research.” A citizen scholar is, thus, any person who takes part in the research and contributes to its progress. An example of such a project is Zooniverse, where people are invited to help recognize and classify the faces of animals, which would contribute further to the development of an AI feature that computers will use to recognize those faces automatically.

In class, we not only discussed the benefits of such a mechanism, but we even tried it ourselves! Crowdtranscription is a subcategory of crowdsourcing that requires the user’s help with recognizing and transcribing text in scanned images. My classmates and I, together with our professor, went to 18thConnect and edited the Memoir of a chart of the east coast of Arabia from Dofar to the Island Maziera. The document had previously been digitized by an OCR program, but as we learned last time, the digitization of a text comes with occasional errors which, so far, only a human brain can correct. It was an amazing activity for me, as I could take responsibility and contribute to other people’s attempts to create great online resources for the general public. At the same time, I was able to notice, as last time, other errors that appear in the process of text digitization, and also what decisions an editor needs to make when transcribing and/or editing a text. For example, editors need to decide whether to preserve the italics, sizes, indentations, or superscripts that appear in a text, or simply to replace them and justify their decisions in a note.

Since the text documented the journey of a sailor around the Arabian coasts, a thought popped into my mind. I realized I know very little about the early history of the geographical area I am currently living in (Abu Dhabi, United Arab Emirates). Then I realized there is an incredible number of research projects that could be conducted using citizen science. The UAE, and the Arab World in general, are still so little known to those outside, especially when it comes to fields such as history, language, literature, culture, and even (old or traditional) cuisine (if you ask me). Research on almost anything in these categories would contribute to the dissemination of information beyond the Arab borders, out into the curious and intrigued world. After a quick search on Google I found that there are some projects (ongoing or already finished) on the topic. For example, the team behind the Arabic language collection claims that their collection comprises more than 100,000 books and more than 15,000 manuscripts. Still, very little of this is available online to the general public, and, what is more, even less must have been translated into English. However, there is good news, as some of the manuscripts are going through the process of digitization.
