|MAPSSeman Family Members||Related Databases||Other software of interest|
|Arabic Text Parser||Arabic Corpus||Arabic Text Stemmer|
|Arabic Part of Speech Tagger||Database of Arabic Roots||Arabic Verb Conjugator|
|Arabic Ontology Processor||Database of Arabic Stems||Arabic Text Diacritizer|
|Database of Loan Words||Arabic Noun Inflector|
|Database of Loan Terms||Personal Names Retrieval System|
|Database of Colloquial Arabic||Geographical Names Romanizer|
|Database of English/Arabic Entity Names|
|Arabic Text Parser|
|Parsing is key to accurate translation: once text is correctly disassembled, it is much easier to transfer to a different (or simplified) language.
Kalmasoft is in the process of developing a unique high-end Arabic parser that can correctly analyze natural text, represent it as abstract elements and relationships, and then use that representation to generate text in a new language. Kalmasoft's parser, which uses a bottom-up parsing technique, is language dependent for now. However, migration to other languages is possible and requires only a different rule set and a number of lexicons for each additional language. Please refer to Arabic Text Parser for details.|
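To make the idea of bottom-up parsing concrete, the sketch below runs a CYK-style chart recognizer over a sequence of part-of-speech tags. It is only an illustration under stated assumptions: the grammar `BINARY_RULES`, the tag names, and the function `cyk_parse` are hypothetical and are not taken from Kalmasoft's parser. The rule set is kept separate from the algorithm, which mirrors the point that moving to another language would mainly mean swapping rules and lexicons.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parent.
# These rules are illustrative only, not Kalmasoft's rule set.
BINARY_RULES = {
    ("DET", "NOUN"): "NP",
    ("VERB", "NP"): "VP",
    ("NP", "VP"): "S",
}


def cyk_parse(tags, rules=BINARY_RULES):
    """Return the set of categories that span the whole tag sequence."""
    n = len(tags)
    if n == 0:
        return set()
    # chart[i][j] holds every category that covers tags[i..j] inclusive.
    chart = [[set() for _ in range(n)] for _ in range(n)]
    # Bottom layer: each tag covers its own position.
    for i, tag in enumerate(tags):
        chart[i][i].add(tag)
    # Build wider spans from pairs of adjacent narrower spans.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width - 1
            for split in range(start, end):
                pairs = product(chart[start][split], chart[split + 1][end])
                for left, right in pairs:
                    parent = rules.get((left, right))
                    if parent is not None:
                        chart[start][end].add(parent)
    return chart[0][n - 1]


if __name__ == "__main__":
    print(cyk_parse(["DET", "NOUN", "VERB", "DET", "NOUN"]))
```

The chart is filled from single tags upward, so larger constituents are only ever built from smaller ones that have already been recognized.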
|Arabic PoS Tagger|
|Part of speech tagging is the process of selecting the most likely sequence of syntactic categories for the words in a sentence. It determines grammatical characteristics of the words, such as part of speech, grammatical number, gender, person, etc. In the case of Arabic, this task is not trivial, since most words are ambiguous as a result of the absence of written short vowels.|
For each word, we want at a minimum to identify its main lexical category (noun, verb etc.) and inflectional features if any (plural, past tense etc.). We might also identify some quasi-semantic features (proper noun) or even specify a word sense relative to some lexicon.
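One common way to select "the most likely sequence" of categories is the Viterbi algorithm over a hidden Markov model. The sketch below is a statistical illustration only and is not how Kalmasoft's tagger works. All probabilities are invented, and the tokens `ktb` and `Alwld` are illustrative transliterations; unvowelled `ktb` can be read as a verb ("he wrote") or a noun ("books"), which is the kind of ambiguity described above.

```python
# Hypothetical toy model: every number below is made up for illustration.
TAGS = ["NOUN", "VERB"]
START_P = {"NOUN": 0.4, "VERB": 0.6}
TRANS_P = {
    "NOUN": {"NOUN": 0.5, "VERB": 0.5},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
EMIT_P = {
    "NOUN": {"ktb": 0.3, "Alwld": 0.7},
    "VERB": {"ktb": 1.0},
}


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for words under a toy HMM."""
    if not words:
        return []
    # best[i][t] = (probability of the best path ending in tag t at word i,
    #               tag chosen for word i-1 on that path)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(word, 0.0), p)
                for p in tags
            )
        best.append(column)
    # Pick the best final tag, then follow the back-pointers.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))


if __name__ == "__main__":
    print(viterbi(["ktb", "Alwld"], TAGS, START_P, TRANS_P, EMIT_P))
```

With these toy numbers the ambiguous first word is resolved by context: the path that tags it as a verb followed by a noun scores higher than the noun-noun path.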
Kalmasoft's PoS tagger returns syntax-free solutions for each token through an extensive set of rules. The output is in XML format, but a CSV listing is also possible.
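The actual output schema is not given here, so the following is only a sketch of what serializing tagged tokens to XML and CSV might look like. The element names (`sentence`, `token`), the attribute and column names, and the sample tokens are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tagged tokens: (surface form, part of speech, features).
TAGGED = [
    ("ktb", "VERB", "past"),
    ("Alwld", "NOUN", "singular"),
]


def to_xml(tagged):
    """Serialize tagged tokens as one <sentence> of <token> elements."""
    root = ET.Element("sentence")
    for form, pos, features in tagged:
        token = ET.SubElement(root, "token", pos=pos, features=features)
        token.text = form
    return ET.tostring(root, encoding="unicode")


def to_csv(tagged):
    """Serialize tagged tokens as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["form", "pos", "features"])
    writer.writerows(tagged)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_xml(TAGGED))
    print(to_csv(TAGGED))
```

XML suits nested annotations per token, while the CSV listing is easier to load into a spreadsheet or a statistics package.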
A tagged corpus is more useful than an untagged corpus because there is more information there than in the raw text alone. Once a corpus is tagged, it can be used to extract information from the corpus. This can then be used for creating dictionaries and grammars of a language using real language data. Tagged corpora are also useful for detailed quantitative analysis of text.
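As a small illustration of the kind of information a tagged corpus yields, the sketch below counts tag frequencies and builds a simple word-to-tags lexicon. The corpus contents and the function names are hypothetical, and the tokens are illustrative transliterations rather than data from Kalmasoft's corpus.

```python
from collections import Counter, defaultdict

# Hypothetical tagged corpus: (word, tag) pairs.
CORPUS = [
    ("ktb", "VERB"),
    ("Alwld", "NOUN"),
    ("ktb", "NOUN"),
    ("fy", "PREP"),
    ("Albyt", "NOUN"),
]


def tag_frequencies(corpus):
    """Count how often each tag occurs in the corpus."""
    return Counter(tag for _, tag in corpus)


def build_lexicon(corpus):
    """Map each word to the set of tags it was observed with."""
    lexicon = defaultdict(set)
    for word, tag in corpus:
        lexicon[word].add(tag)
    return dict(lexicon)


if __name__ == "__main__":
    print(tag_frequencies(CORPUS).most_common())
    print(build_lexicon(CORPUS))
```

Neither result could be computed from the raw text alone, since the lexicon entry for an ambiguous word only exists because each occurrence carries its tag.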
Please refer to Tag-set for the Arabic corpus tag set. You may also get a sample of our tagged corpus from the Arabic corpus page. Please refer to Arabic Part of Speech Tagger for details.
|Arabic Ontology Processor|
|This module is currently being developed. Please refer to Arabic Ontology Processor for details.|