Arabic Semantic Processing System (MAPSSeman©)


MAPSSeman© is the family name of our Arabic semantics processing package: a set of specialized modules tuned for applications such as information retrieval, document clustering, rule-based machine translation (RBMT), example-based machine translation (EBMT), and many others.
Arabic Text Parser | Arabic Corpus | Arabic Root Extractor
Arabic Part of Speech Tagger | Arabic Roots | Arabic Verb Conjugator
Arabic Ontology Processor | Arabic Stems | Arabic Text Diacritizer
| Loan Words | Arabic Noun Inflector
| Loan Terms | Personal Names Retrieval
| Colloquial Arabic | Toponym Romanizer
| English/Arabic Entity Names |

Arabic Text Parser
Parsing is key to accurate translation: once text is correctly disassembled, it is much easier to transfer to a different (or simplified) language. Kalmasoft is developing a unique high-end Arabic parser that can correctly analyze natural text, represent it as abstract elements and relationships, and then use that representation as the seed for generating text in a new language. The parser, which uses a bottom-up parsing technique, is language dependent for now; however, migration is possible and requires only a different rule set and a number of lexicons for each additional language. Please refer to Arabic Text Parser for details.
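
To illustrate what bottom-up parsing with a swappable rule set and lexicon can look like, here is a minimal sketch of a CYK chart recognizer in Python. It is not Kalmasoft's parser: the transliterated words, the category names (V, NP, VS, S), and the two rules are hypothetical and chosen only to show the idea.

```python
from itertools import product

# Hypothetical toy lexicon: transliterated word -> set of categories.
LEXICON = {
    "kataba": {"V"},      # "wrote"
    "al-waladu": {"NP"},  # "the boy"
    "risalatan": {"NP"},  # "a letter"
}

# Hypothetical binary rules: (left child, right child) -> parent.
# They model a verb-initial clause: VS -> V NP, S -> VS NP.
RULES = {
    ("V", "NP"): "VS",
    ("VS", "NP"): "S",
}


def cyk_parse(tokens, lexicon, rules):
    """Return the set of categories that span the whole token list."""
    n = len(tokens)
    if n == 0:
        return set()
    # chart[i][j] holds the categories covering tokens[i..j] inclusive.
    chart = [[set() for _ in range(n)] for _ in range(n)]
    # Bottom layer: look each word up in the lexicon.
    for i, tok in enumerate(tokens):
        chart[i][i] = set(lexicon.get(tok, ()))
    # Build longer spans from pairs of shorter, adjacent spans.
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span - 1
            for split in range(start, end):
                pairs = product(chart[start][split], chart[split + 1][end])
                for left, right in pairs:
                    parent = rules.get((left, right))
                    if parent is not None:
                        chart[start][end].add(parent)
    return chart[0][n - 1]


result = cyk_parse(["kataba", "al-waladu", "risalatan"], LEXICON, RULES)
print("S" in result)  # True when the tokens form a sentence under RULES
```

Because the grammar and lexicon are passed in as plain data, handling another language in this sketch would mean supplying a different RULES table and LEXICON, while the parsing function stays unchanged.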

MAPS orthographic processor
A screenshot of the program showing the output interface. You can also view the technical specifications.

Arabic PoS Tagger
Part-of-speech tagging is the process of selecting the most likely sequence of syntactic categories for the words in a sentence. It determines grammatical characteristics of the words, such as part of speech, grammatical number, gender, and person. In Arabic this task is not trivial, since most words are ambiguous because short vowels are usually not written.
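
As an illustration of "selecting the most likely sequence" of categories, the sketch below uses the Viterbi algorithm over a toy hidden Markov model. This is a generic statistical technique, not the rule-based method of Kalmasoft's tagger, and every probability, tag name, and unvocalized transliteration in it is invented for the example. The form "dhhb" stands for a word that can be read as a verb ("went") or a noun ("gold") when vowels are not written.

```python
import math

FLOOR = 1e-12  # stand-in probability for events missing from the toy model


def viterbi(tokens, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for tokens under a toy HMM."""
    if not tokens:
        return []
    # best[t][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        p = start_p.get(tag, FLOOR) * emit_p[tag].get(tokens[0], FLOOR)
        best[0][tag] = (math.log(p), None)
    for t in range(1, len(tokens)):
        best.append({})
        for tag in tags:
            emit = math.log(emit_p[tag].get(tokens[t], FLOOR))
            score, prev = max(
                (best[t - 1][p][0] + math.log(trans_p[p].get(tag, FLOOR)) + emit, p)
                for p in tags
            )
            best[t][tag] = (score, prev)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag][0])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


# Hypothetical model parameters, for illustration only.
TAGS = ["NOUN", "VERB"]
START_P = {"NOUN": 0.4, "VERB": 0.6}
TRANS_P = {
    "NOUN": {"NOUN": 0.5, "VERB": 0.5},
    "VERB": {"NOUN": 0.9, "VERB": 0.1},
}
EMIT_P = {
    "NOUN": {"dhhb": 0.3, "alwld": 0.7},
    "VERB": {"dhhb": 1.0},
}

print(viterbi(["dhhb", "alwld"], TAGS, START_P, TRANS_P, EMIT_P))
```

In this toy model the ambiguous first word is resolved by its context: a verb followed by a noun scores higher than two nouns in a row.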

For each word, we want at a minimum to identify its main lexical category (noun, verb etc.) and inflectional features if any (plural, past tense etc.). We might also identify some quasi-semantic features (proper noun) or even specify a word sense relative to some lexicon.

Kalmasoft's PoS tagger returns syntax-free solutions for each token by applying an extensive set of rules. The output is in XML format, but a CSV listing is also possible.
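
The following Python sketch shows one way context-free ("syntax-free") analyses per token could be produced by rules and then written out as XML or CSV. The three rules, the tag names, and the element and column names are assumptions made for illustration; they do not reflect the actual rule set or output schema of Kalmasoft's tagger.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical closed-class list: a few common prepositions.
PREPOSITIONS = {"في", "من", "على", "إلى"}


def tag_token(token):
    """Return every candidate analysis for one token, ignoring context."""
    if token in PREPOSITIONS:
        return [{"pos": "PREP"}]
    # Naive rule: a leading definite article suggests a definite noun.
    if token.startswith("ال") and len(token) > 2:
        return [{"pos": "NOUN", "definite": "yes"}]
    # Unvocalized forms are often ambiguous, so both readings are kept.
    return [{"pos": "NOUN"}, {"pos": "VERB"}]


def to_xml(tokens):
    """Serialize all analyses as an XML string (hypothetical schema)."""
    root = ET.Element("sentence")
    for tok in tokens:
        node = ET.SubElement(root, "token", form=tok)
        for analysis in tag_token(tok):
            ET.SubElement(node, "analysis", **analysis)
    return ET.tostring(root, encoding="unicode")


def to_csv(tokens):
    """Serialize all analyses as CSV text, one row per analysis."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["form", "pos", "definite"])
    for tok in tokens:
        for analysis in tag_token(tok):
            writer.writerow([tok, analysis["pos"], analysis.get("definite", "")])
    return buf.getvalue()


sentence = ["كتب", "الولد", "في", "البيت"]
print(to_xml(sentence))
print(to_csv(sentence))
```

Keeping the analyses as plain dictionaries means both serializers read from the same data, so the XML and CSV listings cannot drift apart.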

A tagged corpus is more useful than an untagged corpus because there is more information there than in the raw text alone. Once a corpus is tagged, it can be used to extract information from the corpus. This can then be used for creating dictionaries and grammars of a language using real language data. Tagged corpora are also useful for detailed quantitative analysis of text.
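
As a small illustration of the kind of information a tagged corpus makes available, the Python sketch below counts tag frequencies and builds a simple word-to-tag dictionary. The six (word, tag) pairs are a made-up, transliterated sample and are not taken from Kalmasoft's corpus.

```python
from collections import Counter, defaultdict

# Hypothetical tagged sample: (transliterated word, tag) pairs.
TAGGED = [
    ("kataba", "VERB"), ("al-waladu", "NOUN"), ("risalatan", "NOUN"),
    ("qara'a", "VERB"), ("al-waladu", "NOUN"), ("kitaban", "NOUN"),
]


def tag_frequencies(tagged):
    """Count how often each tag occurs in the corpus."""
    return Counter(tag for _, tag in tagged)


def build_lexicon(tagged):
    """Map each word to a count of the tags it was observed with."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged:
        lexicon[word][tag] += 1
    return lexicon


print(tag_frequencies(TAGGED))
print(dict(build_lexicon(TAGGED)["al-waladu"]))
```

Neither count could be derived from the raw words alone, which is the practical difference between a tagged and an untagged corpus.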

Please refer to the Tag-set page for the list of tags used in the Arabic corpus. A sample of our tagged corpus is available on the Arabic corpus page. Please refer to Arabic Part of Speech Tagger for details.

MAPS semantic processor
A screenshot of the program showing the output interface. You can also view the technical specifications.

Arabic Ontology Processor
This module is currently being developed. Please refer to Arabic Ontology Processor for details.


Category Software | Reference MAPSSEMANL | Family MAPSSEMAN | Last updated 30/6/2011