Corpus Presentation – Oct. 17

It all started when I wanted to create a poem generator in the writing style of a Romanian poet, George Bacovia, for one of my projects in the Introduction to Digital Humanities class. For that, I had to discover the patterns in his writing, such as the most frequently used words, the average length of a poem, and the average vocabulary variation. I decided to compile a corpus of some of Bacovia’s most famous poems and quantify them in the hope of getting the values I needed.

According to Voyant, the document has a total of 2,057 words and 736 unique word forms.

From the Excel file: 2,083 and 1,404.*
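Totals like these depend on how a tool splits text into words, so two tools can report different numbers for the same document. As a rough illustration only, here is a minimal Python sketch of counting total words and unique word forms. The tokenization rule (lowercase everything, keep runs of letters) and the function names are assumptions made for this example; they are not the rules Voyant or AntConc apply, and the sample sentence is a made-up placeholder, not a line from the corpus.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumed rule for this sketch: a word is any run of letters.
    The pattern is Unicode-aware, so Romanian diacritics are kept.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def corpus_counts(text):
    """Return (total words, unique word forms) for a text."""
    tokens = tokenize(text)
    return len(tokens), len(set(tokens))


# Placeholder text, not taken from the poems.
sample = "Rain, rain; autumn in the town. Rain!"
total, unique = corpus_counts(sample)
print(f"{total} words, {unique} unique word forms")
```

Changing the rule (for example, treating hyphenated words as one token, or not lowercasing) changes both numbers, which is worth keeping in mind when comparing totals from different tools.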

Before that, I used the tools Voyant offers to give you a better visualization of his work (more specifically, of the poems selected for my corpus).

WordCloud of the most frequent words in all 30 poems. The wordcloud gives us an idea of what the poems (and the volume, or the author’s entire work) are about without reading a single one of them: autumn, rain, snow, town, leaves, etc. These words already point to a landscape portrait of late autumn or early winter. The wordcloud includes the colors used to describe such landscapes (white, pale, red, violet, black), the places observed (town, park), and the elements of movement (rain, snow, ghosts).
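The ranking behind a word cloud is a word-frequency count. The sketch below shows one way to compute it in Python with collections.Counter. It is an illustration under stated assumptions, not what Voyant does internally: the tokenization rule, the top_words helper, and the tiny stop-word set are all hypothetical, and the sample text is a placeholder rather than text from the poems.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed rule: lowercase, then keep runs of letters (Unicode-aware).
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(text, n=10, stopwords=frozenset()):
    """Return the n most frequent words, skipping any listed stop words."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)


# Placeholder text and a hypothetical stop-word set for the example.
sample = "the rain and the snow and the rain"
for word, freq in top_words(sample, n=2, stopwords={"the", "and"}):
    print(word, freq)
```

Without a stop-word list, function words such as articles and conjunctions would dominate the ranking, so the choice of list strongly shapes which words stand out.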



dark blue: rain; green: autumn; pink: town; purple: snow; light blue: black


TextArc of the document. The TextArc shows which words were used the most (through size contrast) and which other words they were used with (the arc pulls each word closer to those words on the ellipse).

Then I used AntConc to quantify Bacovia’s work.*

Screenshot of the Excel file with data gathered using AntConc.


Poem Ranking: Shortest to Longest


Poem Ranking: Least Unique Words to Most Unique Words


Poem Ranking: Vocabulary Variation: Least to Most Varied
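The three rankings above can be reproduced from per-poem counts. The following Python sketch shows one possible way, under assumptions that should be read as such: vocabulary variation is taken here to be the ratio of unique words to total words (a type-token ratio), which may not be the exact measure used in the Excel file; the function names are hypothetical; and the poem titles and texts are placeholders, not poems from the corpus.

```python
import re


def tokenize(text):
    # Assumed rule: lowercase, then keep runs of letters (Unicode-aware).
    return re.findall(r"[^\W\d_]+", text.lower())


def poem_stats(text):
    """Return length, unique-word count, and variation for one poem."""
    tokens = tokenize(text)
    length = len(tokens)
    unique = len(set(tokens))
    # Assumption: variation = unique words / total words (type-token ratio).
    variation = unique / length if length else 0.0
    return {"length": length, "unique": unique, "variation": variation}


def rank_poems(poems, key):
    """Return poem titles sorted from lowest to highest value of `key`."""
    stats = {title: poem_stats(text) for title, text in poems.items()}
    return sorted(stats, key=lambda title: stats[title][key])


# Placeholder poems for the example only.
poems = {
    "A": "rain rain rain snow",
    "B": "autumn leaves fall",
    "C": "town park town park town park",
}
print(rank_poems(poems, "length"))     # shortest to longest
print(rank_poems(poems, "unique"))     # least to most unique words
print(rank_poems(poems, "variation"))  # least to most varied
```

A ratio like this tends to be higher for shorter texts, so comparing poems of very different lengths with it calls for some caution.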



Random Poem Attempt: (program under development)

Once I had completed the quantification of Bacovia’s work, I could start working on the poem generator program. A piece of the initial code can be found below.