Terminology
We use several terms throughout this document that we would like to clarify. Each task performed by MAPS is called a "process". In the following table, the first four columns show the source and target language and script, and the column headers give the "direction" of the process; the name of the process itself is shown in the last column. Please follow the links for details on each process.
|Source Language||Input Script||Target Language||Output Script||Process Name|
|Arabic||Unvocalized Arabic||Arabic||Vocalized Arabic||Vocalization|
|Non-Arabic||Language dependent||Arabic||Arabic Alphabet||Retrieval (Arabic)|
|Arabic||Arabic Alphabet||Non-Arabic||Language dependent||Transcription (Phonemic)|
|Arabic||Arabic Alphabet||Latin script based||Phonemic Latin||Romanization|
|Arabic||Arabic Alphabet||Multilingual||Language dependent||Retrieval (Multilingual)|
|Multilingual||Language dependent||Arabic||Arabic Alphabet||Arabicization|
Features and specifications
|A diagram showing MAPS Suite families and their sub-modules|
|Module pages layout||Family layout|
| MAPS pages are laid out as follows:||
|Arabic is one of the official languages of the UN and is read from right to left. It has an inflectional system that is known for its rich vocabulary and complex morphology. The Arabic abjad consists of twenty-eight letters, twenty-five of which are consonants; the remaining three are long vowels. A distinguishing feature of Arabic is that no letters are used to represent short vowels. Instead, they are represented by short strokes called diacritics, which are placed either above or below the preceding consonant.|
Another feature is that Arabic text is normally written unvocalized, except for classical themes and Koranic text; this is a major stumbling block for any NLP system. The Kalmasoft diacritizing module is designed to perform full and partial vocalization of raw input text. This module is currently being developed. Please refer to Arabic Text Diacritizer for details.
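The relationship between vocalized and unvocalized text is easiest to see in the reverse direction. The sketch below is not the Kalmasoft diacritizer; it only removes the short-vowel marks from a vocalized word, assuming they are the Unicode combining marks U+064B to U+0652 plus the superscript alef U+0670. The name `strip_diacritics` is hypothetical.

```python
# Arabic diacritics as Unicode combining marks: fathatan..sukun plus
# superscript alef. Treating exactly this set as "the diacritics" is an
# assumption of this sketch.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Return the unvocalized form of `text` (diacritics removed)."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


if __name__ == "__main__":
    # Fully vocalized spelling of the name "Burhan".
    vocalized = "\u0628\u064F\u0631\u0652\u0647\u064E\u0627\u0646"
    print(strip_diacritics(vocalized))
```

Vocalization is the much harder inverse of this function, because a single unvocalized form can usually be read with several different sets of diacritics.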
|Arabic Root Extractor|
|Arabic is a highly inflectional language, meaning that it has a productive system for generating and deriving words. Stemming is the process of removing affixes from such words and reducing them to their roots. Our full-fledged morphological analyzer uses a light stemmer that performs not only affix removal but also root extraction. It uses specialized techniques to deal with all forms of assimilated, hollow, and defective tokens: the morphological analyzer performs the pattern recognition needed to complete the task and returns the correct form of the root or stem. A root dictionary is included to improve the system, which can be used in Arabic monolingual document retrieval. This module is currently being developed. Please refer to Arabic Text Stemmer/Root Extractor for details.|
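The affix-removal step of light stemming can be sketched as follows. This is a minimal illustration, not the Kalmasoft analyzer: the prefix and suffix lists, the minimum stem length, and the name `light_stem` are all assumptions, loosely modelled on published light stemmers. Root extraction for assimilated, hollow, and defective words needs pattern matching that this sketch does not attempt.

```python
# Hypothetical affix lists for illustration only.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]
MIN_STEM = 3  # never cut a word below three letters


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, trying longest affixes first."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المعلمون"]:
        print(w, light_stem(w))
```

The length guard is the main design choice: without it, short words that merely begin or end with an affix-like sequence would be cut down to fragments.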
|Arabic is a non-concatenative language. It can also be described as a derivational language, meaning that its morphotactics depend on affixation, i.e. adding morphemes to the word without changing the root, thus preserving the core order of the verb binyanim. This results in the highly regular inflectional pattern that distinguishes the language.|
The Inflection Generator (or simply conjugator) is a full-form lexical production module built on a root-based algorithm. A root like [ksr] "to break" may be seeded into the system, yielding roughly 30,000 conjugations; in theory the same holds for any other sound triconsonantal root.
What Kalmasoft offers here is not an exhaustive listing of the verb conjugation paradigm but rather the software that can be used to regenerate the whole inflectional model of the language, or just the conjugation table of a specific verb form; binary scripts are also available. Please refer to the list of tagged roots for further information. This module is currently being developed; please refer to Arabic Conjugator for details.
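Seeding a root into a template can be sketched for one small corner of the paradigm. The example below is an illustration under stated assumptions, not the Kalmasoft algorithm: it covers only the perfect (past) tense of a sound Form I verb, uses a simple Latin transliteration in which doubled vowels mark length, and the names `PERFECT_SUFFIXES` and `conjugate_perfect` are hypothetical.

```python
# Perfect-tense suffixes for a sound Form I verb (a subset of the persons).
PERFECT_SUFFIXES = {
    "3ms": "a",    # he
    "3fs": "at",   # she
    "2ms": "ta",   # you (masc. sing.)
    "2fs": "ti",   # you (fem. sing.)
    "1s": "tu",    # I
    "3mp": "uu",   # they (masc.)
    "2mp": "tum",  # you (masc. pl.)
    "1p": "naa",   # we
}


def conjugate_perfect(root: str, stem_vowel: str = "a") -> dict:
    """Fill the C1-a-C2-V-C3 template with a triconsonantal root."""
    if len(root) != 3:
        raise ValueError("expected a triconsonantal root such as 'ksr'")
    c1, c2, c3 = root
    stem = f"{c1}a{c2}{stem_vowel}{c3}"
    return {person: stem + suffix for person, suffix in PERFECT_SUFFIXES.items()}


if __name__ == "__main__":
    for person, form in conjugate_perfect("ksr").items():
        print(person, form)
```

A full generator would repeat this template filling across the other persons, tenses, moods, voices, and derived verb forms.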
|Arabic noun declension is the process of inflecting nouns into their grammatical sub-categories. MAPS inflects every Arabic noun into more than a dozen categories, including classes such as Verbal Noun, Noun of Instrument, Active Participle, Passive Participle, Noun of Place, and Noun of Time, and the three cases Nominative, Accusative, and Genitive. The first group is derived directly from the corresponding verbs but is included here because its members are grammatically classified as nouns.|
Other stem-inherent or generic characteristics, e.g. semantic classification, are not reflected in the table below; they are instead handled by direct hard-coding throughout the declension. This module is currently being developed; please refer to Arabic Inflector for details.
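Deriving noun classes from a root and then applying case endings can be sketched in the same template style. This is a simplified illustration rather than the MAPS inflector: it assumes a sound Form I root, covers only the active and passive participles, uses indefinite singular endings in Latin transliteration, and the names `PATTERNS`, `CASE_ENDINGS`, and `decline` are hypothetical.

```python
# Form I participle templates; the digits 1-3 mark the root-consonant slots.
PATTERNS = {
    "active participle": "1aa2i3",    # e.g. kaatib "writer"
    "passive participle": "ma12uu3",  # e.g. maktuub "written"
}
# Indefinite singular case endings (with nunation).
CASE_ENDINGS = {"nominative": "un", "accusative": "an", "genitive": "in"}


def decline(root: str) -> dict:
    """Return {noun class: {case: form}} for a triconsonantal root."""
    if len(root) != 3:
        raise ValueError("expected a triconsonantal root such as 'ktb'")
    table = {}
    for noun_class, pattern in PATTERNS.items():
        stem = pattern
        for slot, consonant in zip("123", root):
            stem = stem.replace(slot, consonant)
        table[noun_class] = {case: stem + end for case, end in CASE_ENDINGS.items()}
    return table


if __name__ == "__main__":
    for noun_class, cases in decline("ktb").items():
        print(noun_class, cases)
```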
|Arabic POS Tagger|
|POS tagging is the process of assigning a part-of-speech tag, such as noun, verb, pronoun, preposition, adverb, or adjective, to each word in a sentence. The tag reflects the word's syntactic category based on its context, for the purpose of resolving lexical ambiguity.|
This is a rule-based module that makes use of an extensive knowledge base of rules, developed by our linguists, that define precisely when to apply each tag. Please refer to Arabic POS Tagger for details.
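The idea of rule-based tagging can be sketched with a handful of toy rules. Everything in this example is invented for illustration and is far simpler than a linguist-built knowledge base: the word lists, the tag names, and the function `tag` are assumptions. In particular, treating every word with the definite article as a noun ignores adjectives, which also take the article.

```python
# Toy closed-class word lists (hypothetical, not from the MAPS rule base).
PREPOSITIONS = {"في", "من", "إلى", "على", "عن"}
PRONOUNS = {"هو", "هي", "هم", "أنا", "نحن"}


def tag(tokens):
    """Assign one tag per token using lexical and contextual rules."""
    tags = []
    for token in tokens:
        previous = tags[-1] if tags else None
        if token in PREPOSITIONS:
            tags.append("PREP")
        elif token in PRONOUNS:
            tags.append("PRON")
        elif token.startswith("ال") or previous == "PREP":
            # Lexical rule: the definite article marks a noun.
            # Contextual rule: the word after a preposition is a noun.
            tags.append("NOUN")
        else:
            tags.append("UNK")
    return list(zip(tokens, tags))


if __name__ == "__main__":
    print(tag(["في", "البيت"]))
```

The contextual rule is what lets the tagger resolve a word it cannot classify in isolation, which is the purpose of using context described above.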
|Parsing is key to accurate translation: once text is correctly disassembled, it is much easier to transfer to a different language. Kalmasoft has developed a unique parser for Arabic that can correctly analyze natural text, represent it as abstract elements and relationships, and then use that representation as the seed for generating text in a new language. This technology is language dependent but requires only a few changes, a different rule set, and a dictionary for each additional language. This module is currently being developed. Please refer to Arabic Parser for details.|
|Arabic Ontology Processor|
|This module is currently being developed. Please refer to Arabic Ontology Processor for details.|
|Personal Names Romanizer|
|Both Arabic and English lack some of each other’s sounds and letters. For example, English has no perfect match for the pharyngeals [Haa', Ein] or the uvulars [Qaf, Khaa', Ghain], and Arabic has none for (P, V). This leads to ambiguities during the transliteration process: an Arabic name containing one of these sounds will end up with variant spellings in English.|
This is in fact a major stumbling block when converting non-Western scripts such as the Arabic alphabet to Roman characters. It is especially challenging for Arabic-to-English conversion because Arabic is written mostly with consonants and only rarely uses diacritics for disambiguation, making it difficult to return a single accurate English version of an Arabic name.
Our system takes these peculiarities into account and supports many transliteration standards, including UNGEGN, ALA-LC, DIN 31635, SATTS, and ISO 233, as well as academic transliteration systems such as Buckwalter, Khoja, and Qalam. This makes it suitable as an integral transliteration module in NLP applications such as Machine Translation (MT) and Cross-Language Information Retrieval (CLIR). Please refer to Name Romanizer for details.
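Of the systems listed, Buckwalter is the simplest to illustrate because it is a strict one-to-one mapping from Arabic code points to ASCII characters. The sketch below follows the commonly published Buckwalter table and is not the Kalmasoft implementation; the names `BUCKWALTER` and `to_buckwalter` are hypothetical. The phonemic standards need context-sensitive rules that a plain character table cannot express.

```python
# U+0621..U+063A (hamza .. ghain) and U+0640..U+0652 (tatweel .. sukun)
# are contiguous, so each range is zipped against its ASCII counterparts.
BUCKWALTER = dict(zip(map(chr, range(0x0621, 0x063B)), "'|>&<}AbptvjHxd*rzs$SDTZEg"))
BUCKWALTER.update(zip(map(chr, range(0x0640, 0x0653)), "_fqklmnhwYyFNKaui~o"))
BUCKWALTER["\u0670"] = "`"  # superscript alef


def to_buckwalter(text: str) -> str:
    """Transliterate Arabic characters; leave everything else unchanged."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(to_buckwalter("\u0645\u062D\u0645\u062F"))  # the name Muhammad, unvocalized
```

Because the mapping is one-to-one it is fully reversible, which suits machine processing, but the output of unvocalized input carries no short vowels and is not meant for human readers.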
|Arabic Name Indexer/Geolocator|
|This module relies on the fact that different geographical regions have different naming patterns: most regions have a set of names unique to them, alongside patterns they share with other regions. For example, the names "حفني" /ħaf'ni/, "مرسي" /mursi/, and "مدبولي" /mad'bu:li/ are unique to Egypt, while names such as "أحمد" /ʔaħ'mad/, "محمد" /muħam'mad/, and "علي" /ʕali/ cannot be assigned to a specific geographical region because they are among the most frequent names in all Arabic-speaking countries. The module also gives hints (a "gist") about gender and guesses the religion for some names of non-Arabic origin, e.g. "جرجس" /girgis/, "مينا" /mi:na/, and "حنا" /ħan'na/, which are Coptic or Christian names common in Egypt and Iraq. MAPS uses this embedded module to give highly reliable results. Please refer to Name Indexer for details.|
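The region lookup can be sketched as a small table. The data below is limited to the examples given in the text, and the structure is an assumption: a real index would presumably hold per-region frequency statistics rather than hard-coded sets. The names `REGION_SPECIFIC`, `PAN_REGIONAL`, and `guess_region` are hypothetical.

```python
# Names the text describes as unique to one region.
REGION_SPECIFIC = {"حفني": "Egypt", "مرسي": "Egypt", "مدبولي": "Egypt"}
# Names too frequent in every region to say anything about origin.
PAN_REGIONAL = {"أحمد", "محمد", "علي"}


def guess_region(name: str):
    """Return a region for region-specific names, otherwise None."""
    if name in PAN_REGIONAL:
        return None
    return REGION_SPECIFIC.get(name)


if __name__ == "__main__":
    for n in ["مرسي", "محمد"]:
        print(n, guess_region(n))
```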
|Personal Names Arabicizer|
|Transliteration is the process of representing words of one language using the alphabet of another language. The challenge of importing non-Arabic ("foreign") names into Arabic is no less important than the reverse process. In MAPS terminology this is called "Arabicization": the process of representing in the Arabic alphabet names that are written in other scripts. Transliteration does not pose such great challenges for language pairs that use very similar alphabets and sound systems, such as Spanish/English or French/Spanish.
A distinction should be made here between two important points:|
|Personal Names Transcription System|
|Representing names written in Arabic script in other languages, and vice versa, has always been described as a challenge for most cross-language content management and data mining systems. MAPS works not only for Romanization but also for a dozen languages. The integral transliteration system makes it possible to take names in native script or Romanized form, perform the transcription, and return the results in the target native script. The output "transcribed names" are formatted for readers of the particular geographic region. For instance, the Arabic name "بُرْهَان" is rendered "Бурхан" for Russian speakers, "Burhan" for English, Czech, or Spanish speakers, "Borhane" for Francophones, and "Borhan" for German and Polish speakers. Please refer to the Global Personal Name Transcription System for details and samples.|
|Personal Names Retrieval System|
|The system is capable of regenerating names in their original languages and returning the result in the native script of the target language. This rebuilding capability makes it well suited to applications such as Named Entity Recognition (NER) and Cross-Language Information Retrieval (CLIR). For now, the retrieval feature works only for Arabic and a subset of Latin names. This module makes heavy use of detailed conversion rules and heuristics to rebuild each input name correctly, no matter how badly the original name was damaged by the transliteration process. Please refer to Name Retrieval System for a detailed output sample.|
|Geographic Names Romanizer|
|This module performs the most commonly required Arabic place-name Romanization processes in more than 10 official Romanization systems. Please refer to Geographical Names Arabicizer for details.|
|Geographical Names Arabicizer|
|This module is currently being developed. Please refer to Geographical Names Arabicizer for details.|
|Geographic Names Transcription System|
|This module is currently being developed. Please refer to Geographic Names Transcription System for details.|
|Geographic Names Retrieval System|
|This module is currently being developed. Please refer to Geographic Names Retrieval System for details.|