Thousands of Arabic given names in native script with Arabic diacritical marks (short vowels), English transcription, and gender fields. The transcription follows the most common English spelling of each name, but other transcription systems are also available. A separate database of frequency statistics for each name can be supplied upon request.
Perhaps the most interesting database for developers of NER applications or name-scoring software packages. Like the database above, it includes Arabic diacritical marks and full English transcription. Frequency statistics for each name can be supplied as a separate database upon request.
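As a rough illustration of how the diacritical marks in these name databases might be handled, the sketch below strips the short-vowel marks from a vocalized name so that it can be compared with unvocalized text. The function name `strip_diacritics` and the sample name are illustrative assumptions, not part of the databases themselves.

```python
# Arabic diacritical marks: tanwin, short vowels, shadda, sukun (U+064B..U+0652).
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Return `text` with Arabic short-vowel marks removed (hypothetical helper)."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


# A vocalized spelling of the name "Muhammad", written with escapes for clarity:
# meem, damma, hah, fatha, meem, shadda, fatha, dal.
vocalized = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
plain = strip_diacritics(vocalized)
print(plain)  # the four base letters: meem, hah, meem, dal
```

Keeping the vocalized form in the database and stripping marks only at comparison time preserves the pronunciation information while still allowing matches against ordinary unvocalized text.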
A database of millions of real-world Arabic names collected from many sources and supplied with gender and locale fields, covering the entire Arab region as well as three additional countries known to be under the strong influence of Arabic culture.
Names from all over the world. What is new in this database is the Arabic transcription added to every English name. A gender field is also included, and most records carry additional information such as locale and meaning.
A full suite of bilingual databases covering almost all aspects of life, e.g. sports, politics, science, and more. Each database may have additional fields such as "type", but all of them include the "locale" field.
Unique and valuable, this is a database of all entity keywords found in Arabic. It also comprises the Arabic counterparts of entity keywords such as company, society, union, factory committee, and other terms. It is very useful for NER applications, web crawlers, search engines, and CLIR applications.
The first of its kind: this long-awaited database of odonyms (street names) is now available in Arabic. It contains world street names for more than two million geographical entities and is very useful for information retrieval systems, e.g. NER applications, web crawlers, search engines, and CLIR applications.
World gazetteer (populated places only) database with multiple fields, including latitude and longitude in DMS format together with feature type and other minor fields. An Arabic transcription of the feature's name is added for each place name. This is very useful for localized software, entity-scoring applications, navigation software, and other geographical software.
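Since coordinates are given in DMS (degrees, minutes, seconds), software that expects decimal degrees needs a conversion step. The sketch below shows one way to do it. The exact field layout of the gazetteer is not described here, so the function takes the components as separate arguments; `dms_to_decimal` and its parameters are hypothetical names.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert a DMS coordinate to signed decimal degrees (hypothetical helper).

    `hemisphere` is one of "N", "S", "E", "W"; south and west become negative.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    hemi = hemisphere.upper()
    if hemi not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemi in ("S", "W") else value


print(dms_to_decimal(30, 30, 0, "N"))  # 30.5
print(dms_to_decimal(31, 15, 0, "W"))  # -31.25
```

Parsing the raw DMS string into its components is left out on purpose, because it depends on how the field is actually delimited in the delivered files.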
The World gazetteer is a full-featured database of geographical information on the geographical makeup of all countries of the world and their physical features, such as mountains, waterways, or roads. This database complements the two databases above.
Tagged Arabic corpus encoded in UTF-8, Windows-1256, or the Kalmasoft generic transliteration system "KATS". It is essential for MT applications based on statistical techniques, and serves as a reference for POS taggers and text parsers.
Extended database of Arabic roots. The database comes in two forms: in native script encoded in either UTF-8 or Windows-1256, or in the Kalmasoft native transliteration system "KATS", which uses ASCII characters to facilitate text processing. It is essential for every root-based Arabic processing system, in particular POS taggers and inflection generation systems.
Presented in native script encoded in UTF-8 or Windows-1256, or in the Kalmasoft native transliteration system "KATS", which uses plain ASCII characters, this database is a major asset for many kinds of NLP applications. Uses include conjugation generators, spell checkers, and more.
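Several of the databases above are offered in either UTF-8 or Windows-1256. For a pipeline that works in UTF-8 but receives Windows-1256 data, a conversion along the following lines may be enough. This is a minimal sketch using Python's standard `cp1256` codec; the function name `cp1256_to_utf8` is an assumption for illustration, and the KATS transliteration is not covered because its mapping is not described here.

```python
def cp1256_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1256 bytes as UTF-8 (hypothetical helper).

    Raises UnicodeDecodeError if `data` contains bytes that are not
    defined in the Windows-1256 code page.
    """
    text = data.decode("cp1256")
    return text.encode("utf-8")


# Round trip on a three-letter sample word (kaf, teh, beh), written with escapes.
sample = "\u0643\u062A\u0628"
legacy_bytes = sample.encode("cp1256")   # one byte per letter in Windows-1256
utf8_bytes = cp1256_to_utf8(legacy_bytes)
print(len(legacy_bytes), len(utf8_bytes))  # 3 6
```

Arabic letters take one byte each in Windows-1256 and two bytes each in UTF-8, which is why the converted data is larger.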
Full information on 5,000+ loanwords of multiple origins, including English, French, Turkish, etc., encoded in native Arabic script and Kalmasoft "KATS", together with their possible original parallels. This database is well suited to text abridging, parsers, and other kinds of NLP applications.
Full information on 50,000+ loanterms of multiple origins, including English, Spanish, Italian, French, Turkish, etc., encoded in native Arabic script and Kalmasoft "KATS", together with their possible original parallels. This database is well suited to text abridging, parsers, and other kinds of NLP applications.