Chair: Theodorus Fransen
As more and more medieval texts are being digitised, it becomes reasonable to apply computational methods which have proved reliable for modern languages in various NLP tasks such as morphological analysis, topic modelling or stylometry. This paper describes a series of experiments using cluster analysis for the automatic dating of medieval Irish texts, comparing different clustering algorithms and paying special attention to how well the automatically obtained results correspond to those stemming from human judgement.
The texts used for the research are digitised editions of Early Irish narrative published on UCC's Corpus of Electronic Texts (CELT) website. The results of the experiments include similarity trees, which show how close the texts are to each other, and feature sets, which represent distinctions between texts and serve as a basis for assigning a chronological label to each of them. Texts were analysed both at word level and character level. In the case of words, we cannot distinguish between temporal and topical differences, while going down to the level of characters allows us to concentrate on orthographic features which play a very important role in text dating (leaving aside semantics). Both configurations reveal interesting tendencies, probably invisible to the human eye.
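As an illustration of the character-level configuration, the following is a minimal sketch only, not the procedure used in the experiments. It assumes character trigram counts as features, cosine distance as the dissimilarity measure and average-linkage agglomerative clustering; none of these choices is specified in the abstract. The four sample strings are invented placeholders with spelling variants, not CELT texts, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(a, b):
    """1 - cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur.

    Each merge is (cluster_a, cluster_b, distance), where clusters are
    tuples of text names. Read in order, the merges describe a similarity tree.
    """
    def dist(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(profiles[x], profiles[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Invented placeholder strings with orthographic variants (NOT corpus data).
texts = {
    "A": "the olde shoppe solde olde bookes",
    "B": "ye olde shoppe solde goode bookes",
    "C": "the old shop sold old books",
    "D": "the old shop sold good books",
}
profiles = {name: char_ngrams(text) for name, text in texts.items()}
merges = average_linkage(profiles)

for left, right, distance in merges:
    print(f"{left} + {right} at distance {distance:.3f}")
```

Replacing `char_ngrams` with a word-level counter would give the word-level configuration, where topical and temporal differences are harder to separate.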
The relative uniformity of Old Irish has led some scholars to believe that the language as it appears in extant manuscripts conforms to a strict literary standard that was not representative of the spoken language. This suggestion accounts for the lack of obvious dialectal evidence in the Old Irish corpus that would naturally be expected from a language that was so geographically widespread. Yet, despite this assertion, ample variations are present, with persistent oddities that cannot be explained by diachronic or other linguistic change. In his Grammar of Old Irish, Rudolf Thurneysen briefly addresses the question of dialects and provides a handful of these variations from the Old Irish glosses as potential dialectal evidence. His comments have served as the basis of further investigations into the question of dialects, though as yet no firm conclusions have been reached.
My current doctoral work broadly concerns synchronic language variation in Old Irish, specifically within the glosses of Würzburg, Milan and St Gall, with a view to identifying evidence of dialectal variation within Old Irish, using Thurneysen’s suggestions as a starting point. This paper will discuss a selection of variations found in the Old Irish glosses, their relative frequencies within the glosses, and the implications of these variations for the possibility of dialect or, perhaps, register. The paper will likewise discuss the challenges faced when attempting to categorise a set of variants as evidence of dialect or register.