Foods of Abu Dhabi

Foods of Abu Dhabi was a pilot group project that my class and I started during the Introduction to Digital Humanities course. The aim of the project was to collect data on as many dining places in Abu Dhabi as possible and to create a map visualization of them. Our professor designed a form on Fulcrum where all of us could enter the data we gathered.

Screenshot of the Fulcrum form.

The form included the following questions:

  • name of the place
  • food origin
  • food subvariety
  • location (either by using GPS or typing in the coordinates from Google maps)
  • date of establishment
  • average price (AED)
  • number of tables
  • comments
  • delivery to Saadiyat (yes/no)
  • last delivery time

At the beginning of the data collection process, we ventured out into the city and visited restaurants, cafeterias, and cafés to ask the owners or staff for details about their dining places. The interactions were interesting: we would come to learn more about each place than the Fulcrum form asked for, and also “get a feel” of the place, which a map cannot recreate. Then, however, we learnt that there is a much easier way to “collect” spatial data: using the coordinates from Google Maps. With Zomato.com and other similar websites, we continued to add entries to our data set.

Our next task was to create and export a map using CARTO, a map that would highlight a particular aspect of dining in Abu Dhabi. I chose to focus on Khalifa City and the dining options it provides. My map can be consulted at the link below:

Foods of Abu Dhabi

I chose to focus on Khalifa City because the area is known to be one of the richest in the city, so I had set my expectations quite high regarding food prices. It turned out not to be the case: there are many affordable dining places in Khalifa City. The viewer can hover over the points on the map and see the name, the origin, and the average price of the dishes served at each place.
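Since the CARTO map itself lives at the link above, here is a minimal sketch of how the same kind of interactive map could be reproduced in Python with the folium library, assuming the collected data were exported to a CSV file. The file name, column names, and coordinates below are hypothetical, not the actual Fulcrum export.

```python
# A hedged sketch, not the actual CARTO map: plot dining places with hover
# tooltips (name, origin, average price) using folium.
# "foods_of_abu_dhabi.csv" and its column names are assumptions for illustration.
import pandas as pd
import folium

places = pd.read_csv("foods_of_abu_dhabi.csv")  # columns: name, origin, avg_price_aed, lat, lon

# Centre the view roughly on Khalifa City (approximate coordinates)
khalifa_map = folium.Map(location=[24.42, 54.60], zoom_start=13)

for _, row in places.iterrows():
    folium.CircleMarker(
        location=[row["lat"], row["lon"]],
        radius=5,
        tooltip=f'{row["name"]} ({row["origin"]}) – {row["avg_price_aed"]} AED',
    ).add_to(khalifa_map)

khalifa_map.save("khalifa_city_dining.html")  # open in a browser and hover over the points
```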

Findings:

  • many of the dining areas have very affordable average prices (10-25 AED)
  • there is a variety of cuisines in Khalifa City: Middle Eastern (Emirati, Lebanese, Saudi), European (Spanish, French, Italian, British), Asian (Indian, Japanese, Chinese), American, and other international ones.
  • the average prices vary from 10-15 to 80-90 AED
  • there are extremely expensive dining places in Khalifa City (which was initially my expectation)
  • there are many Indian restaurants
  • many of the restaurants are centered around the Etihad building (center of the city)
  • none of the places deliver to the faraway land of Saadiyat

This project will be restarted next semester at NYU Abu Dhabi in Professor Wrisley’s class on mapping. For the students in that class, I would like to offer the following pieces of advice:

  • try to collect data by going out and interacting with the owners and staff of the restaurants; it is very difficult (hence the little data we have on those fields) to gather information such as the number of tables or the date of establishment without talking to someone who works there. Plus, it is a lot of fun, although you might need to pretend you are organizing a big party when you ask for, or count, the number of tables!
  • try to add more fields to the Fulcrum form: opening hours, special menus, general rankings on different websites. Some research on opening hours would be really interesting!
  • set a “national” granularity and try to add as many “national” restaurants as possible. It would be very interesting in the end to see how many nation-representative restaurants there are and where they are located. Also, a comparison between the city’s demographics and the origins of the food would definitely say something!
  • Good luck with the project and do add as many entries as possible!

Georeferencing the UAE maps

On the 23rd of November, our class met in the Center for Digital Scholarship at the NYUAD Library to discuss and practice georeferencing. Georeferencing is the process of matching the internal coordinates of a map or aerial photo with a known set of geographic coordinates (e.g. an old map can be matched – overlapped, superimposed – with a current map of the same geographic area).

Matt Sumner, a Data Services Librarian at NYUAD, led the ArcGIS workshop alongside the class discussions on georeferencing, data comparison, and the limitations of such a process.

What georeferencing software does to maps is very similar to what OCR software does to images: it makes them “digitally useful.” The resulting digital map can then be used in other digital exercises: it can be compared with other digitized maps or, as I said above, overlapped with them. Doing so helps create a visualization of land transformations over time. We can see whether the borders between two countries have stayed the same, for example, or look at the evolution of land distribution over time (for climate change research). Georeferencing software such as ArcGIS helps with all of that. During the class, Matt showed us the steps involved and georeferenced an old Soviet map of the UAE.
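To make the idea concrete, here is a minimal sketch of the arithmetic behind a first-order (affine) georeferencing transform, one of the standard options such software offers. The pixel positions and coordinates below are invented for illustration, not taken from the Soviet map.

```python
# A toy illustration of first-order georeferencing: estimate an affine transform
# from four control points, then map any pixel of the scan to (lon, lat).
# The pixel positions and coordinates are made up for this example.
import numpy as np

# (col, row) pixel positions of the four corners of the scanned map
pixels = np.array([[0, 0], [4000, 0], [4000, 3000], [0, 3000]], dtype=float)
# (lon, lat) values written at those corners
coords = np.array([[55.0, 25.5], [56.0, 25.5], [56.0, 25.0], [55.0, 25.0]])

# Solve coords ≈ [col, row, 1] @ A for the 3x2 affine matrix A (least squares)
design = np.hstack([pixels, np.ones((4, 1))])
affine, *_ = np.linalg.lstsq(design, coords, rcond=None)

def pixel_to_geo(col, row):
    """Return the (lon, lat) of a pixel under the fitted transform."""
    return np.array([col, row, 1.0]) @ affine

print(pixel_to_geo(2000, 1500))  # centre of the scan -> approx. [55.5, 25.25]
```

With more control points than the minimum, the same least-squares fit is what produces the small “disagreements” mentioned below: the transform cannot honour every point exactly, so each corner pulls the image towards itself.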

The maps we were working with (already scanned and ready to use) were easy to handle because each of their four corners had the geographic coordinates written down. From there, all we had to do was match the current coordinates with the corners of the images. Had the coordinates not been available, we would have had to find reference points (a building, a street, the center of the city) to help with the overlapping. After both the current map and the old map were imported, the workspace looked like this:

Red contour of a recent UAE map (front); old Soviet map of the Sharjah Emirate (scan, background) – before georeferencing

In order to place the map on the contour, we had to create four reference points, which would be placed at the coordinates specified at the corners of the image. Then, once these were created, the last step was to match each point with its corresponding corner and …

Voilà! With very little disagreement (because each corner pulls the image towards itself), the two objects overlapped! Now it’s time to look at how “things” changed over time.

The two maps (red contour and scanned image) overlap

The Soviet map was created in 1978 with materials from 1975, while the red contour is not older than ten years. We can clearly notice some changes (and imagine how many more we could have spotted if we had a full recent map and not just a contour!): the lakes have diminished in size and the coastline has expanded.

Matt had more scans like this (and he had already placed all the other necessary reference points beforehand), so, in a few minutes, while my classmates and I were discussing the interesting things we had learnt, he created this:

The collection of Soviet maps of the UAE, georeferenced on the current shape of the country

As a fun exercise, and because we were lucky to have a copy of it, we also georeferenced a map of the city of Abu Dhabi dating from somewhere between 1964 and 1971. The map provides a simplified shape of the city, together with important landmarks and streets, which helped us georeference it (since we had no coordinates for it).

An old map of the city of Abu Dhabi overlapped with a current map of Abu Dhabi (the process is not complete)

We learned some interesting facts about mapping in general, and about georeferencing, during the ArcGIS workshop. Some things I found noteworthy are:

  • people believe what a map says, although, as we have seen, a map might not always be 100% accurate;
  • in georeferencing, we have to make compromises and choose between fitting the entire map in a shape and having “disagreements” with the contour, or perfectly fitting a part of it and neglecting the rest;
  • a map can be seen as a “purposeful simplification of reality.”

Networks, Maps, and NodeGoat

I haven’t been around in a while. The reason is that many changes have been taking place and a lot of information had to sink in before I wrote this post. Last time I discussed the presentations on corpus analysis, and little did I know that it was only the beginning of this field called digital humanities.

There is a whole lot more. During my last couple of classes, I have learned about networks and maps – and what I learnt is only a small part of what there is to be learnt. But I am glad I have a starting point now.

Let’s begin with networks.

The network – in digital humanities – is an amazing concept. It allows one to visualize everything they read about in a book, on Wikipedia, or in a table. It visually connects the information and provides an easy, logical form for understanding its content. Before I move on to its use in different types of projects, I would like to give a short definition and description of a network.

A network consists of nodes, edges, and relationships. A node is any item in our data that we are focusing our analysis on; it is the person/ object/ phenomenon of our study. Edges are the links/ connections between two nodes and show that those nodes have a common characteristic. Lastly, the relationship is the idea behind the edge, the criterion used in determining the connections between the nodes before the visuals are displayed. An easy way to picture a network is by thinking about one’s genealogical tree: each member of the family represents a node, each arrow (or double arrow) from one member to another is an edge, and the meaning of the arrow (as well as the position of one node relative to another, since the genealogical tree is a hierarchically structured network) is the relationship.
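As a toy illustration of these three ingredients, here is how the family-tree example could be written down with Python’s networkx library; the names and relationships are invented purely for illustration.

```python
# Nodes, edges, and relationships in the family-tree example, using networkx.
# The family members are invented for illustration.
import networkx as nx

family = nx.DiGraph()  # directed, so the hierarchy (parent -> child) is preserved
family.add_edge("Grandmother", "Mother", relationship="parent of")
family.add_edge("Grandmother", "Uncle", relationship="parent of")
family.add_edge("Mother", "Me", relationship="parent of")

print(list(family.nodes()))                     # the nodes
print(list(family.edges(data="relationship")))  # the edges with their relationship labels
```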

An example of what one can do with networks (and a bit of programming knowledge!) in digital humanities is the amazing project by Silvia Gutiérrez, New Maps for the Lettered City. A few blog posts ago I referenced the Mapping the Republic of Letters project, created by a team of researchers at Stanford University, which offered a visualization of the travels of the Republic of Letters writers – one that could be further analyzed and interpreted. This new project, which looks at members of 19th-century salons in Mexico, does even more than the Stanford project did. For example, the section on generations shows who met whom and where, and what literary movement(s) and salon(s) each of them was part of. This is extremely helpful for thinking about the human relationships that were formed in each salon and what new ideas each member might have brought in from other salons and/or other literary movements.

Coming back to the Digital Humanities class, my classmates and I, together with our professor, have started a network project of our own! We used nodegoat to enter our data and create a network visualization. The subject we chose was Egyptian cinema, and we created multiple categories of nodes: title of the film, author/ other authors/ main cast, and release year (the very first release, regardless of country/ region). Then, for each person, we decided to attach some information that would help us study the social relationships between them: their spouse, and their dates and places of birth and death. Towards the end, our database looked like this, with 44 entries:

egyptian-cinema-jpg

The network project had two distinct parts: data gathering and network visualization. In order to gather all the information on each film, we created an Excel table where we entered most of the required details (title, author, year) and then entered them in the Film or Person forms on Nodegoat. Once each classmate had filled in the data for around 10 films, we were ready to test the network visualization functionalities of the website! In the picture below you can see a social-relationship network visualization of the connections between different film directors. Some are more connected than others (Chahine Youssef, Mazar Ahmed, and Kamal Hussein are relevant examples in this sense), while other directors, such as Mohamed Khan, Abouseif Salah, and others, seem less connected with the rest of the Egyptian film industry.

network1

But wait! Nodegoat only displays the user’s input. The reason some authors look less well-connected than others is simply that the “Person” category for each film was not filled in thoroughly, which is an important detail to remember. If the data collection had been done in more depth, our social network would have looked terribly different! (And we might have discovered that, in reality, they are all interconnected.)

network2

Above, you can see the films – out of all 44 entries – that were associated with the film director Mohamed Khan. In a specific research project (say, on the themes addressed in various films), such a network would serve the analysis and exploration of the research subject by providing an easy visual means of examining it.

All in all, NodeGoat was interesting to use. After also checking out Palladio, I believe both websites are handy and can generate basic good-quality visualizations. Some shortcomings of NodeGoat would be:

  • the large amount of manual labor one still has to put into gathering and entering the data (instead of simply giving it a .csv file to read, we have to manually enter data on each film, which takes up quite a lot of time; see the sketch after this list). In comparison with web-scraping software, it looks terribly inefficient to have to google the information and organize it by hand;
  • its limited visualization options (only geographical and social);
  • its unnecessary pop-up menus and tabs, which slow down the process of entering data.
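To show what the .csv route mentioned in the first point could look like, here is a minimal sketch that reads a film table with pandas and builds a co-occurrence network of the people involved using networkx; the file name and column layout are hypothetical, not NodeGoat’s actual import format.

```python
# A hedged sketch of the .csv workflow: build a person-to-person network from a
# film table instead of typing every entry by hand. The file name and the
# "directors" column format ("Name A; Name B") are assumptions for illustration.
import pandas as pd
import networkx as nx
from itertools import combinations

films = pd.read_csv("egyptian_cinema.csv")  # columns: title, year, directors

graph = nx.Graph()
for _, film in films.iterrows():
    people = [name.strip() for name in str(film["directors"]).split(";")]
    # connect every pair of people credited on the same film
    for a, b in combinations(people, 2):
        graph.add_edge(a, b, film=film["title"])

print(graph.number_of_nodes(), "people,", graph.number_of_edges(), "connections")
```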

However, considering that it is a start-up website and that my class’s purpose was only to learn through experience, NodeGoat helped show us the “behind the scenes” of network visualization.

Corpus Presentations on Text Analysis

On Monday, October 17th, my class had its presentations on corpus analysis. After working on them over the past couple of weeks of classes, my classmates and I presented the results of our work to each other, to the professor, and to our guests. Not only did each of us learn from each presentation, but we also learnt from the feedback we received from the audience.

The five text collections the analyses were based on came from different categories, which shows the diversity of material a digital humanist can work with. We had a really fun and interesting analysis of the top 10 Billboard songs of the past decades, which looked at the patterns of writing a hit and how they varied over the years. Then, I presented my corpus analysis and quantification, which will serve for the development of a poem generator in a Bacovian style. Third, another classmate provided us with an extremely detailed overview and comparison of two dictionaries of Costa Rican slang. Another classmate followed with a presentation on the portrayal of Islam and Arab culture in Western media, an attempt to create a tool to dismantle stereotypes (which I found very interesting, inspiring, useful, and applicable in other fields). The last presentation was an analysis of Paris’s 2005 race riots and their portrayal in the media.

As I already said, these presentations showed the wide range of topics one can choose from when approaching text analysis in a digital humanities way.

Looking back on my work, I think the process of quantifying the chosen poems using AntConc was incredibly useful and gave me a starting point for the development of the program I am currently working on. Using Voyant was also a good way to see the similarities and differences between what I knew about the poems and what the tool allows one to find out about them (without ever having read them). Now that I think of it, I could have done a more detailed analysis of the connections between the words used inside the poems. This would not only help me when programming the generator, but also offer me and others a better understanding of how, when, and why Bacovia chooses his words.
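For what it is worth, the kind of quantification I mean can also be scripted: the sketch below counts word frequencies and adjacent word pairs in a plain-text file of poems, a rough stand-in for what AntConc and Voyant report. The file name is hypothetical.

```python
# A minimal sketch of word-frequency and word-pair (bigram) counting over a
# plain-text corpus; "bacovia_poems.txt" is a hypothetical file name.
from collections import Counter
import re

with open("bacovia_poems.txt", encoding="utf-8") as f:
    words = re.findall(r"\w+", f.read().lower())

frequencies = Counter(words)               # how often each word appears
bigrams = Counter(zip(words, words[1:]))   # how often each adjacent word pair appears

print(frequencies.most_common(10))
print(bigrams.most_common(10))
```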

As a closing thought, I am eagerly looking forward to working on other digital humanities projects and applying what I learnt through this experience and by looking at others’ work.

 

 

What happens in the DH world

I am at a point in the semester where, before I start working on my own digital projects, I need to look back and reflect on what I have discovered so far about the world of digital humanities. And what I have learnt is a lot compared with what I previously knew. I will do this in two different ways: I will update my digital narrative, and I will write down some remarks I have made or thought about during the almost two months since the beginning of the class. The latter first.

As I’ve seen so far, digital humanities covers – as the humanities do – a whole lot of possibilities in terms of what to do with it, what (research) project to build with the knowledge it provides. One can choose to map the places where the earliest historical sources were found (I would love to see such a project on Romania’s early history); another could simply digitize the collection of letters of a famous historical figure; or, if someone feels more ambitious, they could gather data on Neo-modernist literature in ex-communist countries and see how the regime influenced the authors’ themes, ways of expression, and purposes of writing.

Among all the possibilities there are in developing a digital humanities project, I have noticed some themes and ideas that project initiators and researchers lean towards the most. Three such examples are: online collections, visual representations, and user-input-based research.

The first one – online collections – is one of the earliest forms of digital projects, which started when contemporary humanists learnt the benefits of having text in digital form. Online collections presuppose the existence of a physical collection that is photographed or scanned and then either transcribed by the project team or digitized using OCR software. Examples of online collections are: Arabic Collections Online, Early English Books Online, Al-Maktaba Al-Shamela, Eighteenth Century Collections Online, and the Blue Mountain Project.

How does one read such collections? (And why can they be considered digital humanities projects rather than simple collections of an author’s work?) Let’s take the example of EEBO (linked above). After going through the search process and reaching the desired book, the fun begins! There are two ways in which the text is presented on EEBO:

  • First, the photocopies. If the text hasn’t been digitized, the viewer is confronted with photocopies of a printed edition of the book. What is extremely valuable here, in having the book displayed in such a way, is the preservation of the forms, spelling, and grammar of those works. Many of the books available online today are “adjusted” (edited) so that the contemporary casual reader can understand them without further research. Moreover, any element that was intentionally preserved, or any old form of a word that had to be kept (e.g. to preserve the verse length), is more often than not explained in a footnote. This is not the case, however, with online collections of old books, where the creators of the collections simply reproduce the works in their original form.
  • Second, the photocopies might be accompanied by digitized text of their contents. For example, in the image below (a screenshot from EEBO), we are given the digitized form of the work (a randomly selected discourse by Pierre Ayrault):

eebo

First, we notice that the text formatting was preserved (the writing in italics or bold). However, if we open the link above the title, which sends the user to the original photocopy of the text, we are faced with a completely different representation of it (seen below):

eebo

Being offered such representations is invaluable for humanist researchers who have little to no access to the original forms of the works they are studying. Not only are they given a photocopy of the work they need (with any bonus annotations that could help guide their research), but they are also given a “translated” form of the text, which preserves the word forms and (as much as possible) the text formatting. This makes the reading process easier for the researcher, without taking away the incredibly interesting features of the original form of the work on which they can continue their research.

An extremely interesting feature of having such collections in digital formats is the different ways of accessing the contents of a book/ manuscript/ article in order to further analyze the data. For example, Austen Said contains some of Jane Austen’s most popular novels and allows the user to “explore Austen’s pattern of diction” through word frequencies and other novel visualizations. Which brings us to…

… visual representations – maps, graphs, charts, etc. – which are also extremely popular among digital humanities projects. Maps are an interesting and useful tool for visualizing (and, consequently, better grasping) different distributions of data. For example, this map from Linguistic Landscapes of Beirut (a project by David J. Wrisley) beautifully shows what would otherwise take hours to learn: where Arabic, Latin, or mixed scripts occur within a delimited area of Beirut. By using colors to represent each type of script, the author(s) have significantly decreased the reader’s work. Readers no longer have to picture in their minds, while reading a text, where each of these scripts would be found. They are given the visualization directly, making it possible to immediately start analyzing the no-longer-raw data (e.g. to determine in which region – and attempt to explain why – the occurrence of Arabic script is higher than that of Latin script). Other such projects, which output a map, a chart, or even an interactive graphic, are: Mapping the Republic of Letters, Digital Karnak: Timemap, and Ibn Jubayr.

The third theme I have noticed occupying a large space in the digital humanities world is the user-input-based research project. This kind of project’s primary purpose, before diving into data analysis, is gathering data from users (the general public). For example, Zooniverse asks its visitors to help recognize the faces of wild animals – which will probably lead to the development of an AI tool that could do this for us, but which currently lacks the database needed to operate that way. These projects are valuable in the sense that they familiarize users with the problems and topics digital humanists are studying and involve them in the process. This could easily mean that, once people offer their input, they will also be interested in checking the progress and completion of the project, and in supporting it all the way through – something that happens less often, or not at all, when one randomly comes across a digital project online. We tend to look over a finished piece of research, read a little about it, and later almost forget it existed, unless we need it for other purposes.

This novelty is recognized as the so-called social turn in scholarship, because the impact of the new research methods is, well, social. By engaging the wider public in their research projects and problems, scholars benefit in two ways: they gather the data necessary for the development of the research, and they disseminate its process and results in real time. This dissemination happens because users are the creators of their own small pieces of data – so they already know that much – but also because, more often than not, users will be curious to find out about the outcome of a project they took part in. At the same time, the user gains the status of ‘collaborator’ and a feeling of fulfilled social responsibility.

As we have seen, there is a lot happening in the field of digital humanities, yet the processes are not always visible or engaging for the general public beyond the period in which they have a particular interest in the topic explored. Let’s hope that, as the subject gains wider academic recognition, people will also become more acquainted with it.

 

The Citizen Scholars in Context

The class on the 26th of September had a very inspiring topic (which led me to daydreaming about implementing a crowdsourcing system in my hometown sometime in the future).

Citizen science is, according to Wikipedia, “scientific research conducted, in whole or in part, by amateur or nonprofessional scientists,” or, simply put, “public participation in scientific research.” Citizen scholars are, thus, any people who take part in research and contribute to its progress. An example of such a project is Zooniverse, where people are invited to help recognize and classify the faces of animals, which will further contribute to the development of an AI feature that computers will use to recognize those faces automatically.

In class, not only did we discuss the benefits of such a mechanism, but we even tried it ourselves! Crowd transcription is a subcategory of crowdsourcing that asks for the user’s help in recognizing and transcribing text in scanned images. My classmates and I, together with our professor, went to 18thConnect and edited the Memoir of a chart of the east coast of Arabia from Dofar to the Island Maziera. The document had previously been digitized by an OCR program, but, as we learned last time, the digitization of a text comes with occasional errors which, so far, only a human brain can correct. It was an amazing activity for me, as I could take responsibility and contribute to other people’s attempts to create great online resources for the general public. At the same time, I was able to notice, as last time, other errors that appear in the process of text digitization, and also what decisions an editor needs to make when transcribing and/or editing a text. For example, he or she needs to decide whether to preserve the italics, sizes, indentations, or superscripts that appear in a text, or simply to replace them and explain their decisions in a note.

Since the text documented the journey of a sailor along the Arabian coasts, a thought popped into my mind. I realized I know very little about the old history of the geographical area I am currently living in (Abu Dhabi, United Arab Emirates). Then I realized there is an incredible amount of research that could be conducted using citizen science. The UAE, and the Arab world in general, is still so little known to those outside it, especially when it comes to fields such as history, language, literature, culture, and even (old or traditional) cuisine (if you ask me). Research on almost anything in these categories would contribute to the dissemination of information beyond the Arab borders, out into the curious and intrigued world. After a quick Google search I found that there are some projects (ongoing or already finished) on the topic. For example, the team behind the Arabic language collection claims that their collection comprises more than 100,000 books and more than 15,000 manuscripts. Still, very little of this is available online to the general public, and, what is more, even less must have been translated into English. However, there is good news, as some of the manuscripts are going through the process of digitization.


Text Digitization and Ideas for Personal Corpus

On Wednesday, September 21st, my classmates and I in the Digital Humanities class tested out text digitization using Abbyy FineReader. It was enriching to see and learn how historical documents, administrative papers, or any other sort of text in physical format can be transformed into digital text using only a scanner and Optical Character Recognition (OCR) software.

The process is very easy and can be done in very few steps (a scripted sketch of the same pipeline follows the list):

  1. Scan the paper and save it as an image (I think both .jpg and .png work) or as a PDF
  2. Open the saved file using Abbyy FineReader
  3. Select the language of the text
  4. Command Abbyy FineReader to “read” the paper
  5. Adjust, select, delete as you prefer
  6. Export the text as an .RTF document (for better portability when switching between operating systems)
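For readers without an Abbyy licence, here is a minimal sketch of the same scan–recognise–export pipeline using the open-source Tesseract engine via pytesseract; this is an alternative tool I am assuming for illustration, not Abbyy FineReader’s own interface, and it requires Tesseract plus the Romanian language data to be installed.

```python
# A hedged sketch of the OCR pipeline with the open-source Tesseract engine
# (via pytesseract) rather than Abbyy FineReader. The image file name is assumed.
from PIL import Image
import pytesseract

page = Image.open("scanned_page.png")                  # steps 1-2: the scanned image
text = pytesseract.image_to_string(page, lang="ron")   # steps 3-4: recognise Romanian text

with open("scanned_page.txt", "w", encoding="utf-8") as out:
    out.write(text)                                    # step 6: export as plain text
```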

During class time I had the opportunity both to digitize a text and to analyze it, in order to observe the capabilities, as well as the shortcomings, of using OCR.

First of all, I was impressed by how easily it can reconstruct the text in a digital format, and by the multitude of options for selecting which parts of the text you need for export. For example, Abbyy recognizes page numbers, any annotations made, and even where the spine of a book (if there is one) was scanned, and lightens it. Also, just before the export step, the user can choose to preserve the format of the initial page. However, it does not correctly interpret handwriting; one piece of scanned handwriting was interpreted as being written in Arabic.

Once the text is exported and opened as either a .doc or an .RTF, the even more interesting part of the digitization process takes place. In class, I analyzed a short fragment of a Bible published in Romanian, and a short piece of Arabic text. Screenshots of both are attached below:

Short fragment from the Bible (Genesis), published in 2001 in Bucharest, Romania. The language is Romanian.
A short piece of Arabic text, both in vowelled and non-vowelled script.

For the text in Romanian, a few things I noticed while looking over both the original text and the digital version of it are:

  • the export does not preserve the symbol of the cross, changing it (depending on the context) into “t” or “f;”
  • it preserved, in some cases, the cursive ‘D’ in “Domnul,” whereas in other cases it replaced it with the copyright symbol ©;
  • it replaced some of the superscript letters (e.g. 1 instead of “i” or “!” instead of “1”);
  • it didn’t preserve all the whitespace between words, joining them into unreadable strings of characters

For the Arabic text, the OCR interpreted the two scripts (vowelled and non-vowelled) as different ones (in the picture we can see that one is highlighted in green and the other in red). By far the most interesting comparison is in the image below, where the export of the scanned image produced a new page layout (realigning the entire front page of a book to the right, in line with Arabic writing standards), emphasized some words over others, and did not preserve the artistic aspect of the calligraphy (where it was present).

On the left: Original scanned page of a text in Arabic. The text is centered and stylized. On the right: the Abbyy FineReader processed and exported version of the image on the left. The style is not preserved, some words are in bold, and the alignment has shifted.

After this extremely fascinating exercise my classmates and I did, and after discovering a few of the ways computing helps in dealing with text, I have thought about some project ideas and a personal corpus to work with. Two ideas come to mind:

  1. An anthology of poems by Lucian Blaga (Romanian poet and philosopher), in which to find the most recurrent words/ series of words that are also associated with concepts in his philosophy; or
  2. A comparison between lines in screenplays and the actual dialogue that is used in a film (for those films for which I can find both data sets).

 

 

What I discovered about digital projects

I was very pleased to learn about the different functions a digital project can have, the forms it can take, and the purposes it serves.

During one of the seminars for my digital humanities course, I discovered that digital projects all come with a set of general steps “to follow” before and after their implementation. This means that goals need to be set, methodologies must be determined, and resources committed to them. There is usually a team working on a project, a process that observes changes over time, as well as media (often visual) employed in it. The two main characteristics of such a project are its interdisciplinarity and its generative aspect. The first means that it resorts to more than one discipline/ field of study in order to achieve its goal (for example, computing and literature, or history and data science), while the latter suggests that the aim of the project is trial before success, and that learning by doing and failure are two recurrent occurrences in the process.

Projects, then, differ in their form and purpose. Some of them use the online platform as their main means of interaction with the user, whereas others mostly use the online medium to disseminate information about offline events and for networking purposes. Projects can be person-based, they can follow various historical periods in different locations (the routes of the letters of great philosophers of the Renaissance, http://republicofletters.stanford.edu/), or they can simply focus on one moment in time and space and recreate it online (e.g. the World’s Fair in Italy that took place in 1911, http://www.italyworldsfairs.org/). Last, but not least, a project’s content may be front-ended or back-ended, which requires a different type of engagement from the user with the material. In the first case, the user is not concerned with studying all the data and drawing conclusions (as in the second case); rather, he or she is given the results (sometimes displayed in an interactive form) of long-term research conducted by the digital project team.