Chair: Catherine McKenna
Welsh marks indefinite direct objects with lenition: Gwelodd Mair dŷ [Sawº Mair house (tŷ)] ‘Mair saw a house’. Welsh also applies 'syntactic mutation' in a number of other circumstances: gwelwyd tŷ ar y bryn [was.seenº house on the hill], but with the prepositional phrase interposed: gwelwyd ar y bryn dŷ [was.seenº on the hill house] ‘a house was seen on the hill’.
Formal Arabic marks indefinite direct objects with -an (indefinite accusative): ra’at miryam bait.an [sawºf Miryam house.indef.acc] ‘Miryam saw a house’. “Faulty indefinite accusative” is applied frequently by proficient users of Formal Arabic where there should be an indefinite nominative: ru’iya ‘alà t-tall bait.(un) [was.seenºm on the-hill house.(indef.nom)] → ru’iya ‘alà t-tall bait.an [was.seenºm on the-hill house.indef.acc] ‘a house was seen on the hill’. Every time 'faulty indefinite accusative' is found in Arabic, the Welsh equivalent would have 'syntactic mutation'. This seems to be the result of an identical head-trigger-dependent rule marking the dependent, which accounts for all cases of syntactic mutation in Welsh, and for both correct and faulty indefinite accusative in Arabic.
Welsh appears to have gone from sandhi to object-marking (case) to intercalated trigger (configuration). Arabic, too, appears to have gone from case to configuration, with the same trigger rule. A widely studied rule of Welsh thus helps to explain a persistent, but little studied 'faulty' pattern in Formal Arabic, and the likely evolution of that 'faulty' rule in Arabic, in turn, may shed light, in this typological exercise, on the origin and development of the Welsh rule.
Welsh syntax has been researched within several frameworks in recent decades, e.g. HPSG by Borsley and LFG by Sadler. Dependency syntax as such, however, has been applied to Welsh less frequently (though it has been used by Tesnière).
In computational linguistics, dependency syntax is gaining momentum, since dependency trees can serve as the starting point for further semantic analysis such as semantic role labelling. Many tools (based on machine learning) exist to produce correct dependency trees from raw text. In order to do so, however, these tools need to be trained on annotated corpora, or treebanks.
The Universal Dependencies initiative under Nivre has defined a set of 17 part-of-speech categories as well as some 30 syntactic relations such as 'nominal subject', 'direct object' or 'determiner', so that sentences of different languages can be annotated in a coherent manner. Evidently, some of the syntactic relations do not apply to all languages, and some languages need more specific relations. For instance, the Welsh predicative yn (as in mae hi'n dda or bu Gwyn yn athro) is neither a prepositional relation nor an auxiliary relation.
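As a rough illustration of what such an annotation might look like, the sketch below encodes mae hi'n dda in the ten-column CoNLL-U format used by Universal Dependencies and lists the resulting head-relation-dependent triples. The analysis shown (adjective as root, mae as copula) and the label case:pred for predicative yn are assumptions made for this example only; they are not taken from the treebank described here. The helper names parse_conllu and dependency_edges are likewise hypothetical.

```python
# Illustrative CoNLL-U-style annotation of "Mae hi'n dda" ('she is good').
# ASSUMPTION: the analysis below, including the relation label "case:pred"
# for predicative yn, is a guess made for this sketch, not the treebank's
# documented annotation.
SAMPLE = """\
1\tMae\tbod\tAUX\t_\t_\t4\tcop\t_\t_
2\thi\thi\tPRON\t_\t_\t4\tnsubj\t_\tSpaceAfter=No
3\t'n\tyn\tPART\t_\t_\t4\tcase:pred\t_\t_
4\tdda\tda\tADJ\t_\t_\t0\troot\t_\t_
"""

# The ten columns of the CoNLL-U format, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(block):
    """Parse one sentence into a list of token dicts.

    Simplification: multiword-token ranges (e.g. "1-2") and empty nodes
    (e.g. "1.1") are not handled; every ID is expected to be an integer.
    """
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        tokens.append(token)
    return tokens


def dependency_edges(tokens):
    """Return (head form, relation, dependent form) triples; head 0 is ROOT."""
    forms = {t["id"]: t["form"] for t in tokens}
    forms[0] = "ROOT"
    return [(forms[t["head"]], t["deprel"], t["form"]) for t in tokens]


if __name__ == "__main__":
    for head, rel, dep in dependency_edges(parse_conllu(SAMPLE)):
        print(f"{head} --{rel}--> {dep}")
```

Each line of the annotation gives the token's form, lemma (here the unmutated da for dda), part-of-speech category, head and relation, which is the information a dependency parser is trained to predict.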
To date, 80 different treebanks in about 50 languages have been annotated by linguists. Currently the Celtic languages are represented by an Irish treebank (1020 sentences) and a Breton one (888). Since Welsh is the most widely spoken Celtic language, with the largest number of native speakers, a Welsh treebank seems indispensable.
This paper will present the state of the work and how sentences are chosen and preprocessed (including lemmatisation and part-of-speech tagging). The linguistic modelling, as well as similarities and differences (such as infinitives, verbnouns or composite tenses)
My illustrated talk will reconsider the evidence behind the conflicting hypotheses, put forward by historians, archaeologists and linguists, about where a distinct Celtic dialect first emerged from Indo-European. I hope to concentrate on the most recent research, up to 2019.