Arabic Morphological Lexicon (Aramolex)

Synopsis

Aramolex is an Arabic morphological lexicon: a dictionary database generated to serve as a full-form lexicon for the entire regular vocabulary of the Arabic language, together with the non-regular surface forms of that vocabulary. Its scope is Modern Standard Arabic (MSA), with all possible morphological content within that scope. The dictionary is continuously undergoing optimization, maintenance, and extension, accompanied by subsequent filtering of candidates not recognized in the literature.


Please download the PDF specification manual of the lexicon dataset (250 KB, 30 pages). The manual contains all the necessary information; no extra reference is needed.

Arabic is a non-concatenative language. It can be described as a derivational language, meaning that the morphotactics depend on affixation, i.e. adding morphemes to the word without changing the radicals or the "root", that is, preserving the order of the verb core binyanim. This results in the highly regular inflectional and derivational patterns that distinguish the Arabic language.

This regularity also allows the highly prolific vocabulary that characterizes Arabic morphology to be reproduced by any suitable production tool carefully designed around a root-based algorithm. For example, a root like [ksr] "to break" may be seeded into the module to yield roughly 30,000 conjugations: the complete listing of the verb paradigm plus all inflectional morphemes, applied using specific orthographic and heuristic rules. In theory, this holds for any other triconsonantal sound root, with minor exceptions.
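The root-based idea can be illustrated with a minimal sketch in Python. It is not the Aramolex generator: the pattern table and the function names are hypothetical, and the transliteration conventions (A for alif, & for sukun, ~ for shadda) are inferred from the KATS column of the sample rows rather than taken from the specification manual. It only shows how a handful of stems can be produced by slotting the three radicals into templates.

```python
# Hypothetical templates: the digits 1, 2, 3 mark the slots of the three radicals.
# Transliteration symbols are inferred from the KATS column of the sample rows.
PATTERNS = {
    "form I perfect": "1a2a3a",
    "form II perfect": "1a2~a3a",
    "form III perfect": "1aA2a3a",
    "active participle": "1aA2i3",
    "passive participle": "ma1&2uw3",
}


def apply_pattern(root: str, pattern: str) -> str:
    """Fill the radical slots of a template with the consonants of a root."""
    if len(root) != 3:
        raise ValueError("expected a triconsonantal root")
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    # Characters that are not slot digits are copied through unchanged.
    return "".join(slots.get(ch, ch) for ch in pattern)


def stems(root: str) -> dict:
    """Return every template in PATTERNS applied to the given root."""
    return {name: apply_pattern(root, p) for name, p in PATTERNS.items()}


if __name__ == "__main__":
    for name, form in stems("ksr").items():
        print(f"{name}: {form}")
```

A real generator would go on to attach the subject and object affixes and apply the orthographic rules mentioned above; this sketch stops at the stem level.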

Current inflection/derivation tools are limited to the third person masculine singular past tense form, which serves as the "dictionary form" used to identify a verb, in place of the infinitive used in English dictionaries, for example. This basic form has only minor use in NLP operations such as tokenization, stemming, or POS tagging, and is of little practical value there. It is therefore important to generate the "real world" derivational candidates found in ordinary texts by means of an exhaustive morphological lexicon.
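To make the point concrete, the following Python sketch shows how a full-form lexicon can be used as a plain lookup table from surface form to analyses. The three rows are copied from the sample data on this page; the index layout and the function names are assumptions made for illustration, not part of the Aramolex distribution.

```python
from collections import defaultdict

# Rows copied from the sample: (KATS surface form, lemma, root, arguments, gloss).
ROWS = [
    ("kaAsirihimaA", "كَاسِر", "كسر", "3DM•SM", "APP-GEN"),
    ("kaAsirihimaA", "كَاسِر", "كسر", "3DF•SM", "APP-GEN"),
    ("nuSiruwA", "نَصَرَ", "نصر", "•••3PM", "PRF-IND-PAS"),
]


def build_index(rows):
    """Group the analyses of each surface form under one key."""
    index = defaultdict(list)
    for surface, lemma, root, arguments, gloss in rows:
        index[surface].append(
            {"lemma": lemma, "root": root, "arguments": arguments, "gloss": gloss}
        )
    return dict(index)


INDEX = build_index(ROWS)


def analyses(surface: str) -> list:
    """Return all recorded analyses of a surface form (empty list if unknown)."""
    return INDEX.get(surface, [])


if __name__ == "__main__":
    for analysis in analyses("kaAsirihimaA"):
        print(analysis)
```

Because every inflected form is stored explicitly, an ambiguous surface form simply returns several analyses, and no stemming step is needed to reach the lemma or the root.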


ID: Unique ID, character+8 digits ID number (C########)
Category: main category part of speech
Subcategory: extended POS subcategory for sorting and statistical purposes
POS: part of speech code, KTagset is used for tagging purposes
Root: Arabic root, concatenated radicals
Lemma: Arabic headword (canonical form)
Vocalized: vocalized Arabic, word surface form fully diacritized
KATS: KATS version of "Vocalized" field for software coding compatibility purposes
Arguments: Person/Number/Gender combination marking the subject and/or object
Affixes: prefixes and suffixes, pairs in brackets
Gloss: POS glossary, English description of POS field
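As an illustration of the Arguments field, the Python sketch below splits a six-character code such as 3DM•SM into two Person/Number/Gender triples. The letter meanings (S, D, P for singular, dual, plural; M, F for masculine, feminine; • for an unset position) are inferred from the sample rows and are an assumption; the specification manual remains the authority, including on which triple marks which role.

```python
# Assumed code tables, inferred from the sample rows on this page.
PERSON = {"1": "first", "2": "second", "3": "third"}
NUMBER = {"S": "singular", "D": "dual", "P": "plural"}
GENDER = {"M": "masculine", "F": "feminine"}
UNSET = "•"  # placeholder for a position that carries no value


def _lookup(table: dict, ch: str):
    """Translate one code character; the placeholder maps to None."""
    if ch == UNSET:
        return None
    return table[ch]  # raises KeyError on an unexpected character


def decode_arguments(code: str) -> list:
    """Split a six-character code into two (person, number, gender) triples."""
    if len(code) != 6:
        raise ValueError("expected a six-character Arguments code")
    triples = []
    for start in (0, 3):
        p, n, g = code[start:start + 3]
        triples.append((_lookup(PERSON, p), _lookup(NUMBER, n), _lookup(GENDER, g)))
    return triples


if __name__ == "__main__":
    print(decode_arguments("2SM3SM"))
    print(decode_arguments("3DM•SM"))
```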

Please download larger samples in TXT format from Aramolex sample.

ID Category POS SubPOS Root Lemma Vocalized KATS Arguments Affixes Gloss
N001697 N NCGI •1 كسر كَاسِر كَاسِرِهِمَا kaAsirihimaA 3DM•SM [-,ihimaA] APP-GEN
N001698 N NCGI •1 كسر كَاسِر كَاسِرِهِمَا kaAsirihimaA 3DF•SM [-,ihimaA] APP-GEN
N001699 N NCGI •1 كسر كَاسِر كَاسِرَتِهِمَا kaAsiratihimaA 3DM•SF [-,atihimaA] APP-GEN
N001700 N NCGI •1 كسر كَاسِر كَاسِرَتِهِمَا kaAsiratihimaA 3DF•SF [-,atihimaA] APP-GEN
N001701 N NCGI •1 كسر كَاسِر كَاسِرِهِم kaAsirihim 3PM•SM [-,ihim] APP-GEN
N001702 N NCGI •1 كسر كَاسِر كَاسِرِهِن kaAsirihin 3PF•SM [-,ihin] APP-GEN
N001703 N NCGI •1 كسر كَاسِر كَاسِرَتِهِم kaAsiratihim 3PM•SF [-,atihim] APP-GEN
N001704 N NCGI •1 كسر كَاسِر كَاسِرَتِهِن kaAsiratihin 3PF•SF [-,atihin] APP-GEN
N001705 N NCGI •1 كسر كَاسِر كَاسِرَيْهِ kaAsiray&hi 3SM•DM [-,ay&hi] APP-GEN
N001706 N NCGI •1 كسر كَاسِر كَاسِرَيْهَا kaAsiray&haA 3SF•DM [-,ay&haA] APP-GEN
N001707 N NCGI •1 كسر كَاسِر كَاسِرَتَيْهِ kaAsiratay&hi 3SM•DF [-,atay&hi] APP-GEN
N001957 N NPNI •1 كسر مَكْسُور مَكْسُورَاي mak&suwraAy 1SM•DM [-,Ay] PPP-NOM
N001958 N NPNI •1 كسر مَكْسُور مَكْسُورَاي mak&suwraAy 1SF•DM [-,aAy] PPP-NOM
N001959 N NPNI •1 كسر مَكْسُور مَكْسُورَتَاي mak&suwrataAy 1SM•DF [-,ataAy] PPP-NOM
N001960 N NPNI •1 كسر مَكْسُور مَكْسُورَتَاي mak&suwrataAy 1SF•DF [-,ataAy] PPP-NOM
N001961 N NPNI •1 كسر مَكْسُور مَكْسُورَانَا mak&suwraAnaA 1DM•DM [-,AnaA] PPP-NOM
N001962 N NPNI •1 كسر مَكْسُور مَكْسُورتَانَا mak&suwrtaAnaA 1DF•DM [-,taAnaA] PPP-NOM
N001963 N NPNI •1 كسر مَكْسُور مَكْسُورَتَانَا mak&suwrataAnaA 1DM•DF [-,ataAnaA] PPP-NOM
N001964 N NPNI •1 كسر مَكْسُور مَكْسُورَتَانَا mak&suwrataAnaA 1DF•DF [-,ataAnaA] PPP-NOM
N001965 N NPNI •1 كسر مَكْسُور مَكْسُورَانَا mak&suwraAnaA 1PM•DM [-,AnaA] PPP-NOM
N001966 N NPNI •1 كسر مَكْسُور مَكْسُورَانَا mak&suwraAnaA 1PF•DM [-,AnaA] PPP-NOM
N001967 N NPNI •1 كسر مَكْسُور مَكْسُورَتَانَا mak&suwrataAnaA 1PM•DF [-,ataAnaA] PPP-NOM
V007206 V VISA T2 نصر نَصَّرَ تُنَصِّرَهُ tunaS~irahu 2SM3SM [tu, hu] IMF-SUB-ACT
V007207 V VISA T2 نصر نَصَّرَ تُنَصِّرَهَا tunaS~irahaA 2SM3SF [tu, haA] IMF-SUB-ACT
V007208 V VISA T2 نصر نَصَّرَ تُنَصِّرِيهِ tunaS~iriyhi 2SF3SM [tu, yhi] IMF-SUB-ACT
V007209 V VISA T2 نصر نَصَّرَ تُنَصِّرِيهَا tunaS~iriyhaA 2SF3SF [tu, yhaA] IMF-SUB-ACT
V007210 V VISA T2 نصر نَصَّرَ تُنَصِّرَهُمَا tunaS~irahumaA 2SM3DM [tu, humaA] IMF-SUB-ACT
V007211 V VISA T2 نصر نَصَّرَ تُنَصِّرَهُمَا tunaS~irahumaA 2SM3DF [tu, humaA] IMF-SUB-ACT
V007212 V VISA T2 نصر نَصَّرَ تُنَصِّرِيهِمَا tunaS~iriyhimaA 2SF3DM [tu, yhimaA] IMF-SUB-ACT
V007213 V VISA T2 نصر نَصَّرَ تُنَصِّرِيهِمَا tunaS~iriyhimaA 2SF3DF [tu, yhimaA] IMF-SUB-ACT
V007214 V VISA T2 نصر نَصَّرَ تُنَصِّرَهُم tunaS~irahum 2SM3PM [tu, hum] IMF-SUB-ACT
V007215 V VISA T2 نصر نَصَّرَ تُنَصِّرَهُن tunaS~irahun 2SM3PF [tu, hun] IMF-SUB-ACT
V007216 V VISA T2 نصر نَصَّرَ تُنَصِّرِيهِم tunaS~iriyhim 2SF3PM [tu, yhim] IMF-SUB-ACT
V008920 V VPIA T1 نصر نَصَرَ نَصَرْنَاهَا naSar&naAhaA 1DF3SF [-,naAhaA] PRF-IND-ACT
V008921 V VPIA T1 نصر نَصَرَ نَصَرْنَاهُمَا naSar&naAhumaA 1DM3DM [-,naAhumaA] PRF-IND-ACT
V008922 V VPIA T1 نصر نَصَرَ نَصَرْنَاهُمَا naSar&naAhumaA 1DM3DF [-,naAhumaA] PRF-IND-ACT
V008923 V VPIA T1 نصر نَصَرَ نَصَرْنَاهُمَا naSar&naAhumaA 1DF3DM [-,naAhumaA] PRF-IND-ACT
V008924 V VPIA T1 نصر نَصَرَ نَصَرْنَاهُمَا naSar&naAhumaA 1DF3DF [-,naAhumaA] PRF-IND-ACT
V008925 V VPIA T1 نصر نَصَرَ نَصَرْنَاهُم naSar&naAhum 1DM3PM [-,naAhum] PRF-IND-ACT
V010406 V VPSA T3 نصر نَاصَرَ نَاصَرْنَاكُمَا naASar&naAkumaA 1DF2DF [-,naAkumaA] PRF-SUB-ACT
V010407 V VPSA T3 نصر نَاصَرَ نَاصَرْنَاكُم naASar&naAkum 1DM2PM [-,naAkum] PRF-SUB-ACT
V010408 V VPSA T3 نصر نَاصَرَ نَاصَرْنَاكُن naASar&naAkun 1DM2PF [-,naAkun] PRF-SUB-ACT
V010409 V VPSA T3 نصر نَاصَرَ نَاصَرْنَاكُم naASar&naAkum 1DF2PM [-,naAkum] PRF-SUB-ACT
V010410 V VPSA T3 نصر نَاصَرَ نَاصَرْنَاكُن naASar&naAkun 1DF2PF [-,naAkun] PRF-SUB-ACT
V015010 V VPIP T1 نصر نَصَرَ نُصِرْنَ nuSir&na •••3PF [-,na] PRF-IND-PAS
V015011 V VPIP T1 نصر نَصَرَ نُصِرُوا nuSiruwA •••3PM [-,wA] PRF-IND-PAS
V015012 V VPIP T1 نصر نَصَرَ نُصِرْنَ nuSir&na •••3PF [-,na] PRF-IND-PAS
V015013 V VPIP T1 نصر نَصَرَ نُصِرْتُ nuSir&tu •••1SM [-,tu] PRF-IND-PAS
V015014 V VPIP T1 نصر نَصَرَ نُصِرْتُ nuSir&tu •••1SF [-,tu] PRF-IND-PAS
V015015 V VPIP T1 نصر نَصَرَ نُصِرْتُ nuSir&tu •••1SM [-,tu] PRF-IND-PAS
V015016 V VPIP T1 نصر نَصَرَ نُصِرْتُ nuSir&tu •••1SF [-,tu] PRF-IND-PAS
V015017 V VPIP T1 نصر نَصَرَ نُصِرْنَا nuSir&naA •••1DM [-,naA] PRF-IND-PAS
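
The sample rows above can be read with a few lines of Python. This is a minimal sketch, assuming the eleven-column layout shown in the header row and whitespace as the separator; the Entry type and parse_row function are hypothetical names, and the actual TXT download may use a different delimiter. Note that the Affixes field of the verb rows contains a space inside the brackets, so a plain split on whitespace is not enough.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    id: str
    category: str
    pos: str
    subpos: str
    root: str
    lemma: str
    vocalized: str
    kats: str
    arguments: str
    affixes: Tuple[str, str]  # (prefix, suffix); "-" means none
    gloss: str


def parse_row(line: str) -> Entry:
    """Parse one whitespace-separated sample row into an Entry."""
    parts = line.split(None, 9)  # nine leading fields, then the remainder
    if len(parts) != 10:
        raise ValueError("too few fields")
    *fields, rest = parts
    # The remainder is "[prefix, suffix] GLOSS"; the gloss is the last token.
    affix_text, gloss = rest.rsplit(None, 1)
    if not (affix_text.startswith("[") and affix_text.endswith("]")):
        raise ValueError("affixes are expected in square brackets")
    prefix, suffix = (p.strip() for p in affix_text[1:-1].split(",", 1))
    return Entry(*fields, (prefix, suffix), gloss)


if __name__ == "__main__":
    row = "V007206 V VISA T2 نصر نَصَّرَ تُنَصِّرَهُ tunaS~irahu 2SM3SM [tu, hu] IMF-SUB-ACT"
    print(parse_row(row))
```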

Category Databases | Reference Aramolex | Entries 200,500,000+ | Last updated 3/5/2019