Arabic Morphological Lexicon (Aramolex)

Synopsis

Aramolex is an Arabic morphological lexicon: a dictionary database generated to serve as a full-form lexicon covering the entire regular vocabulary of the Arabic language, together with the non-regular surface forms of that vocabulary. The scope of the lexicon is Modern Standard Arabic (MSA), with all possible morphological content within that scope. The dictionary is continuously undergoing optimization, maintenance, and extension, accompanied by filtering of candidates not recognized in the literature.


Please download the PDF specification manual of the lexicon dataset (250 KB, 25 pages). The manual contains all necessary information; no extra reference is needed.

Arabic is a non-concatenative language. It can be described as a derivational language, meaning that its morphotactics depend on affixation, i.e. adding morphemes onto the word without changing the radicals or "root", that is, preserving the order of the radicals across the verb patterns (binyanim). This results in the highly regular inflectional and derivational patterns that distinguish the Arabic language.

This regularity also allows the highly prolific vocabulary that characterizes Arabic morphology to be reproduced by any suitable production tool carefully designed around a root-based algorithm. For example, a root like [ksr] "to break" may be seeded into the module to yield roughly 30,000 conjugations: the complete listing of the verb paradigm plus all inflectional morphemes, applied using specific orthographic and heuristic rules. This is theoretically true for any other triconsonantal sound root, with minor exceptions.
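The root-based idea can be illustrated with a minimal Python sketch. It is not the Aramolex production tool: the digit-slot pattern notation (`1`, `2`, `3` marking the radicals) and the function names `apply_pattern` and `inflect` are assumptions made for illustration, and the real orthographic and heuristic rules are not modeled. The stems and suffixes are written in the KATS spelling used in the sample rows on this page.

```python
def apply_pattern(root, pattern):
    """Fill the slots 1, 2, 3 of `pattern` with the radicals of `root`.

    The digit-slot notation is a hypothetical convention for this sketch,
    not something defined by the lexicon.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles triconsonantal roots")
    pieces = []
    for ch in pattern:
        if ch in "123":
            # A digit is a radical slot: 1 -> first radical, and so on.
            pieces.append(root[int(ch) - 1])
        else:
            # Everything else (vowels, pattern consonants) is copied as-is.
            pieces.append(ch)
    return "".join(pieces)


def inflect(stem, prefix="", suffix=""):
    """Attach a prefix and/or suffix by plain concatenation (no sandhi rules)."""
    return prefix + stem + suffix


root = ("k", "s", "r")  # the root [ksr] "to break"

active_participle = apply_pattern(root, "1aA2i3")      # kaAsir
passive_participle = apply_pattern(root, "ma1&2uw3")   # mak&suwr

print(inflect(active_participle, suffix="ihim"))       # kaAsirihim
print(inflect(passive_participle, suffix="aAnaA"))     # mak&suwraAnaA
```

Both printed forms appear in the KATS column of the sample listing. A full generator would iterate over every pattern and every affix pair and then apply the spelling rules that plain concatenation ignores.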

Current inflection/derivation tools are limited to the third person masculine singular past tense form, which serves as the "dictionary form" used to identify a verb, in place of the infinitive used in English dictionaries, for example. This basic form is of only minor practical use in NLP operations such as tokenization, stemming, or POS tagging. It is therefore important to generate the "real world" derivational candidates found in ordinary texts by means of an exhaustive morphological lexicon.


ID: Unique ID, character+8 digits ID number (V########)
Category: main category part of speech
Subcategory: extended POS subcategory for sorting and statistical purposes
POS: part of speech code, used for tagging purposes
Root: Arabic root, concatenated radical
Vocalized: vocalized Arabic, word surface form fully diacritized
KATS: KATS version of "Vocalized" field for software coding compatibility purposes
Arguments: Person/Number/Gender combination marking the subject and/or object
Affixes: prefixes and suffixes, pairs in brackets
Gloss: POS glossary, English description of POS field
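As a rough illustration of the Arguments field, the sketch below unpacks a code into person/number/gender parts. The layout it assumes is inferred from the sample rows on this page, not taken from the specification manual: verb rows seem to carry one or two person+number+gender triples (e.g. `2SM3SM`), while noun rows seem to start with a number+gender pair for the noun itself (e.g. `SM3DM`). The function name `decode_arguments` and the label strings are hypothetical.

```python
import re

PERSON = {"1": "first", "2": "second", "3": "third"}
NUMBER = {"S": "singular", "D": "dual", "P": "plural"}
GENDER = {"M": "masculine", "F": "feminine"}

# Assumed layout (inferred from sample rows only): an optional
# number+gender pair, then one or more person+number+gender triples.
ARGUMENTS_RE = re.compile(r"([SDP][MF])?((?:[123][SDP][MF])+)")


def decode_arguments(code):
    """Return a list of (person, number, gender) tuples for an Arguments code."""
    match = ARGUMENTS_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized Arguments code: {code!r}")
    head, triples = match.groups()
    decoded = []
    if head:
        # A leading pair has no person slot, so person is left as None.
        decoded.append((None, NUMBER[head[0]], GENDER[head[1]]))
    for i in range(0, len(triples), 3):
        person, number, gender = triples[i:i + 3]
        decoded.append((PERSON[person], NUMBER[number], GENDER[gender]))
    return decoded


print(decode_arguments("2SM3SM"))
print(decode_arguments("SM3DM"))
```

Codes that do not fit the assumed layout raise `ValueError`, so a mismatch with the real specification shows up immediately rather than producing silent misreadings.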

Please download larger samples in TXT format from Aramolex sample.

ID Category Subcategory POS Root Vocalized KATS Arguments Affixes Gloss
N001697 N X-X-X NCGS كسر كَاسِرِهِمَا kaAsirihimaA SM3DM [-,ihimaA] APP-GEN
N001698 N X-X-X NCGS كسر كَاسِرِهِمَا kaAsirihimaA SM3DF [-,ihimaA] APP-GEN
N001699 N X-X-X NCGS كسر كَاسِرَتِهِمَا kaAsiratihimaA SF3DM [-,atihimaA] APP-GEN
N001700 N X-X-X NCGS كسر كَاسِرَتِهِمَا kaAsiratihimaA SF3DF [-,atihimaA] APP-GEN
N001701 N X-X-X NCGS كسر كَاسِرِهِم kaAsirihim SM3PM [-,ihim] APP-GEN
N001702 N X-X-X NCGS كسر كَاسِرِهِن kaAsirihin SM3PF [-,ihin] APP-GEN
N001703 N X-X-X NCGS كسر كَاسِرَتِهِم kaAsiratihim SF3PM [-,atihim] APP-GEN
N001704 N X-X-X NCGS كسر كَاسِرَتِهِن kaAsiratihin SF3PF [-,atihin] APP-GEN
N001705 N X-X-X NCGD كسر كَاسِرَيْهِ kaAsiray&hi DM3SM [-,ay&hi] APP-GEN
N001706 N X-X-X NCGD كسر كَاسِرَيْهَا kaAsiray&haA DM3SF [-,ay&haA] APP-GEN
N001707 N X-X-X NCGD كسر كَاسِرَتَيْهِ kaAsiratay&hi DF3SM [-,atay&hi] APP-GEN
N001957 N X-X-X NPND كسر مَكْسُورَاي mak&suwraAy DM1SM [-,Ay] PPP-NOM
N001958 N X-X-X NPND كسر مَكْسُورَاي mak&suwraAy DM1SF [-,aAy] PPP-NOM
N001959 N X-X-X NPND كسر مَكْسُورَتَاي mak&suwrataAy DF1SM [-,ataAy] PPP-NOM
N001960 N X-X-X NPND كسر مَكْسُورَتَاي mak&suwrataAy DF1SF [-,ataAy] PPP-NOM
N001961 N X-X-X NPND كسر مَكْسُورَانَا mak&suwraAnaA DM1DM [-,AnaA] PPP-NOM
N001962 N X-X-X NPND كسر مَكْسُورتَانَا mak&suwrtaAnaA DM1DF [-,taAnaA] PPP-NOM
N001963 N X-X-X NPND كسر مَكْسُورَتَانَا mak&suwrataAnaA DF1DM [-,ataAnaA] PPP-NOM
N001964 N X-X-X NPND كسر مَكْسُورَتَانَا mak&suwrataAnaA DF1DF [-,ataAnaA] PPP-NOM
N001965 N X-X-X NPND كسر مَكْسُورَانَا mak&suwraAnaA DM1PM [-,AnaA] PPP-NOM
N001966 N X-X-X NPND كسر مَكْسُورَانَا mak&suwraAnaA DM1PF [-,AnaA] PPP-NOM
N001967 N X-X-X NPND كسر مَكْسُورَتَانَا mak&suwrataAnaA DF1PM [-,ataAnaA] PPP-NOM
V007206 V X-T-V02 VISA نصر تُنَصِّرَهُ tunaS~irahu 2SM3SM [tu, hu] IMF-SUB-ACT
V007207 V X-T-V02 VISA نصر تُنَصِّرَهَا tunaS~irahaA 2SM3SF [tu, haA] IMF-SUB-ACT
V007208 V X-T-V02 VISA نصر تُنَصِّرِيهِ tunaS~iriyhi 2SF3SM [tu, yhi] IMF-SUB-ACT
V007209 V X-T-V02 VISA نصر تُنَصِّرِيهَا tunaS~iriyhaA 2SF3SF [tu, yhaA] IMF-SUB-ACT
V007210 V X-T-V02 VISA نصر تُنَصِّرَهُمَا tunaS~irahumaA 2SM3DM [tu, humaA] IMF-SUB-ACT
V007211 V X-T-V02 VISA نصر تُنَصِّرَهُمَا tunaS~irahumaA 2SM3DF [tu, humaA] IMF-SUB-ACT
V007212 V X-T-V02 VISA نصر تُنَصِّرِيهِمَا tunaS~iriyhimaA 2SF3DM [tu, yhimaA] IMF-SUB-ACT
V007213 V X-T-V02 VISA نصر تُنَصِّرِيهِمَا tunaS~iriyhimaA 2SF3DF [tu, yhimaA] IMF-SUB-ACT
V007214 V X-T-V02 VISA نصر تُنَصِّرَهُم tunaS~irahum 2SM3PM [tu, hum] IMF-SUB-ACT
V007215 V X-T-V02 VISA نصر تُنَصِّرَهُن tunaS~irahun 2SM3PF [tu, hun] IMF-SUB-ACT
V007216 V X-T-V02 VISA نصر تُنَصِّرِيهِم tunaS~iriyhim 2SF3PM [tu, yhim] IMF-SUB-ACT
V008920 V X-T-V01 VPIA نصر نَصَرْنَاهَا naSar&naAhaA 1DF3SF [-,naAhaA] PRF-IND-ACT
V008921 V X-T-V01 VPIA نصر نَصَرْنَاهُمَا naSar&naAhumaA 1DM3DM [-,naAhumaA] PRF-IND-ACT
V008922 V X-T-V01 VPIA نصر نَصَرْنَاهُمَا naSar&naAhumaA 1DM3DF [-,naAhumaA] PRF-IND-ACT
V008923 V X-T-V01 VPIA نصر نَصَرْنَاهُمَا naSar&naAhumaA 1DF3DM [-,naAhumaA] PRF-IND-ACT
V008924 V X-T-V01 VPIA نصر نَصَرْنَاهُمَا naSar&naAhumaA 1DF3DF [-,naAhumaA] PRF-IND-ACT
V008925 V X-T-V01 VPIA نصر نَصَرْنَاهُم naSar&naAhum 1DM3PM [-,naAhum] PRF-IND-ACT
V010406 V X-T-V03 VPSA نصر نَاصَرْنَاكُمَا naASar&naAkumaA 1DF2DF [-,naAkumaA] PRF-SUB-ACT
V010407 V X-T-V03 VPSA نصر نَاصَرْنَاكُم naASar&naAkum 1DM2PM [-,naAkum] PRF-SUB-ACT
V010408 V X-T-V03 VPSA نصر نَاصَرْنَاكُن naASar&naAkun 1DM2PF [-,naAkun] PRF-SUB-ACT
V010409 V X-T-V03 VPSA نصر نَاصَرْنَاكُم naASar&naAkum 1DF2PM [-,naAkum] PRF-SUB-ACT
V010410 V X-T-V03 VPSA نصر نَاصَرْنَاكُن naASar&naAkun 1DF2PF [-,naAkun] PRF-SUB-ACT
V015010 V X-T-V01 VPIP نصر نُصِرْنَ nuSir&na 3PF [-,na] PRF-IND-PAS
V015011 V X-T-V01 VPIP نصر نُصِرُوا nuSiruwA 3PM [-,wA] PRF-IND-PAS
V015012 V X-T-V01 VPIP نصر نُصِرْنَ nuSir&na 3PF [-,na] PRF-IND-PAS
V015013 V X-T-V01 VPIP نصر نُصِرْتُ nuSir&tu 1SM [-,tu] PRF-IND-PAS
V015014 V X-T-V01 VPIP نصر نُصِرْتُ nuSir&tu 1SF [-,tu] PRF-IND-PAS
V015015 V X-T-V01 VPIP نصر نُصِرْتُ nuSir&tu 1SM [-,tu] PRF-IND-PAS
V015016 V X-T-V01 VPIP نصر نُصِرْتُ nuSir&tu 1SF [-,tu] PRF-IND-PAS
V015017 V X-T-V01 VPIP نصر نُصِرْنَا nuSir&naA 1DM [-,naA] PRF-IND-PAS
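
Rows like the ones above could be loaded into a full-form lookup table along the following lines. This is a sketch under stated assumptions, not an official loader: it assumes the ten fields are separated by whitespace, that the bracketed Affixes field may contain a space (as in `[tu, hu]`), and that `-` marks an absent affix. The names `Entry`, `parse_row`, and `build_index` are hypothetical.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    entry_id: str
    category: str
    subcategory: str
    pos: str
    root: str
    vocalized: str
    kats: str
    arguments: str
    prefix: Optional[str]
    suffix: Optional[str]
    gloss: str


# Eight whitespace-free fields, the bracketed affix pair, then the gloss.
ROW_RE = re.compile(r"((?:\S+\s+){8})\[([^\]]*)\]\s+(\S+)")


def _affix(text):
    """Normalize one side of the affix pair; '-' or empty means no affix (assumed)."""
    text = text.strip()
    return None if text in ("", "-") else text


def parse_row(line):
    """Parse one sample row into an Entry; raise ValueError if it does not fit."""
    match = ROW_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"row does not match the assumed layout: {line!r}")
    head, affixes, gloss = match.groups()
    prefix, _, suffix = affixes.partition(",")
    return Entry(*head.split(), _affix(prefix), _affix(suffix), gloss)


def build_index(lines):
    """Map each KATS surface form to every entry that shares it."""
    index = defaultdict(list)
    for line in lines:
        if line.strip():
            entry = parse_row(line)
            index[entry.kats].append(entry)
    return index


sample = [
    "N001701 N X-X-X NCGS كسر كَاسِرِهِم kaAsirihim SM3PM [-,ihim] APP-GEN",
    "V007210 V X-T-V02 VISA نصر تُنَصِّرَهُمَا tunaS~irahumaA 2SM3DM [tu, humaA] IMF-SUB-ACT",
    "V007211 V X-T-V02 VISA نصر تُنَصِّرَهُمَا tunaS~irahumaA 2SM3DF [tu, humaA] IMF-SUB-ACT",
]
index = build_index(sample)
for entry in index["tunaS~irahumaA"]:
    print(entry.entry_id, entry.arguments, entry.gloss)
```

Keeping a list per surface form matters because one spelling can carry several analyses: in the sample, `tunaS~irahumaA` is listed once with a masculine and once with a feminine dual object. The header line of the listing has no bracketed field and would be rejected by `parse_row`, so it should be skipped before indexing.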

Category Databases | Reference Aramolex | Entries 200,500,000+ | Last updated 2/9/2018