Orthography - Annotated Arabic Corpus


Annotated Arabic corpora.

A highly precise set of grammatically tagged Arabic corpora. These POS-tagged lexical resources are invaluable for users who carry out research on the corpus; linguists with other interests and differing perspectives may also exploit this resource in many other language disciplines.


Raw input text

Unvocalized KATS version

Annotated corpus text

Annotated KATS version

Please download more organized samples in TXT format from dbsamples.htm.

Arabic Full-form Monolingual Dictionary

This huge lexicon is automatically generated using Kalmasoft linguistic engines and is a very useful tool for research in statistical NLP. It generally conforms to Modern Standard Arabic (MSA) and offers insight into the most regular content of the whole Arabic language, including the most dominant grammatical classes. This resource is valuable for purposes that include corpus annotation, abstraction, and analysis in general. [300 million entries]

Entry: Arabic entry
Class: Entry grammatical class
Vocalized: Vocalized Arabic
Sense: Entry sense
KATS: KATS version of field "Vocalized" (Kalmasoft KATS)
Type: Grammatical class
IPA: International Phonetic Alphabet

Please download more organized samples in TXT format from dbsamples.htm.

ID Entry Vocalized KATS IPA Class Sense Type
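
For readers who want to load the TXT samples programmatically, the column order above can be mapped onto a simple record type. The following is only a minimal sketch: it assumes one entry per line with tab-separated fields, which the sample files may or may not use, and the names DictionaryEntry and parse_line are hypothetical helpers rather than part of any Kalmasoft software. The sample record is a placeholder made up for illustration, not an actual dictionary entry.

```python
from dataclasses import dataclass, fields


@dataclass
class DictionaryEntry:
    """One dictionary record, in the column order listed above."""
    id: str
    entry: str        # Arabic entry
    vocalized: str    # vocalized Arabic
    kats: str         # KATS version of the "Vocalized" field
    ipa: str          # International Phonetic Alphabet
    entry_class: str  # "Class" column
    sense: str        # entry sense
    type: str         # "Type" column


def parse_line(line: str, delimiter: str = "\t") -> DictionaryEntry:
    """Split one line into the eight fields.

    The tab delimiter is an assumption; pass a different one if the
    sample files turn out to use another separator.
    """
    parts = line.rstrip("\n").split(delimiter)
    expected = len(fields(DictionaryEntry))
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}")
    return DictionaryEntry(*(p.strip() for p in parts))


# Placeholder record (illustrative only, not taken from the dictionary).
sample = "\t".join(
    ["1", "كتب", "كَتَبَ", "<kats>", "<ipa>", "<class>", "<sense>", "<type>"]
)
record = parse_line(sample)
print(record.entry, record.vocalized)
```

Keeping the delimiter as a parameter means the same helper can be reused once the actual layout of the files from dbsamples.htm has been checked.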


Category Databases | Reference DBCORPUS | Entries 50+ | Last updated 5/01/2015