Kalmasoft Databases and Glossaries

Synopsis

Kalmasoft maintains and manages a central repository of bilingual term datasets that make up the largest multilingual set of databases. These datasets have been collected progressively over the last two decades, with the main goal of providing effective and accurate reference materials to assist in developing Arabic language support for software packages and to boost the ever-growing domains of new web technologies related to language engineering and NLP. We keep our data updated and organized to ensure that correct information is available whenever needed. All materials presented here are commercially available and can be customized and fine-tuned to meet your specific requirements.

 Anthroponyms (personal names)

Database of Arabic Given Names
  • Thousands of Arabic given names in native script with Arabic diacritical marks (short vowels), Roman transcription, and gender fields; transcription here follows the common way a name is spelled in English but other transcription systems are available too. A separate database of frequency statistics on each name can be supplied upon request.
Database of Arabic Surnames
  • This is perhaps the most interesting database for those involved in developing NER applications or name-scoring software packages. Like the given names database above, it includes Arabic diacritical marks and full Roman transcription. Frequency statistics on each name can be supplied as a separate database upon request.
Database of Arab Full Names
  • A database of millions of real-world Arabic names collected from many sources and supplied with gender and locale fields, covering the entire Arab region as well as three additional countries known to be under the strong influence of Arabic culture.
Database of Romanized Arabic Names
  • An extended database of more than 1 million records of Arabic names romanized for 6 languages: English, German, Dutch, Spanish, Italian, and French, based on 300K original Arabic names.
Database of Arabic Name Variants
  • A huge database of 40 million records covering Arabic names with all their possible Roman variants, based on 300K original Arabic names.
Database of Transcripted Arabic Names
  • A database of 3.6M records based on 300K original Arabic names, canonically transcribed into 12 languages: Amharic, Hebrew, Greek, Japanese, Russian, Armenian, Georgian, Hindi, Thai, Bengali, Tagalog, and Malayalam.
Database of Names of Arabic Origins
  • A few hundred names of Arabic origin, mostly from countries known to have been under the umbrella of Islamic culture, e.g. Turkey, India, Spain, Persia and a few African countries.
Database of non-Arab Names
  • Names from all over the world. What is new in this database is that an Arabic transcription is added to every name, along with a gender field; most records also show additional information, e.g. locale and meaning.
Database of Unique and Indigenous Names
  • Unique and indigenous names from all Arabic-speaking countries.
Database of Heterophonic Names
  • A database of names that share the same spelling across multiple languages but differ in meaning and pronunciation.
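
The given names and surnames databases above carry Arabic diacritical marks (short vowels), while running text is usually written without them. As a minimal sketch, assuming the names are plain Unicode strings, one way to compare a diacritized record with an undiacritized query is to drop all combining marks first. The function name `strip_diacritics` is hypothetical and is not part of any Kalmasoft product.

```python
import unicodedata

def strip_diacritics(name: str) -> str:
    """Return `name` without combining marks (fatha, damma, kasra, shadda, sukun, ...).

    Assumption: every character with a non-zero Unicode combining class
    is a diacritic that can be ignored for matching purposes.
    """
    return "".join(ch for ch in name if unicodedata.combining(ch) == 0)

# Diacritized form of the name "Muhammad", written with escapes for clarity.
diacritized = "\u0645\u064f\u062d\u064e\u0645\u0651\u064e\u062f"
plain = "\u0645\u062d\u0645\u062f"

print(strip_diacritics(diacritized) == plain)
```

This only removes marks; it does not normalize letter variants (for example the different forms of alef), which a real name-matching pipeline would probably need to handle separately.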

 Entities

Database of Famous Names and Celebrities
  • Famous names and celebrities from over 100 countries.
Database of Entity Names
  • A full suite of bilingual databases covering almost all aspects of life, e.g. sports, politics, science, and more; each database may have additional fields, e.g. "type", but all of them include the "locale" field.
Database of Arabic Entity Keywords
  • Unique and valuable, this is a database of all entity keywords found in Arabic. It also comprises the Arabic counterparts of entity keywords such as company, society, union, factory committee and other terms. It is very useful for NER applications, web crawlers, search engines and CLIR applications.
Database of Street Names
  • First of its kind, this long-awaited database of odonyms (street names) is now available in Arabic. It covers world street names for more than two million geographic entities and is very useful for information retrieval systems, e.g. NER applications, web crawlers, search engines and CLIR applications.
Database of Arabic Colloquial Entity Names
  • A valuable listing of indigenous and rare names found in Arab countries; the biggest database of its kind, now available electronically.

 Acronyms and Initialisms

Database of Acronyms and Abbreviations
  • Thousands of acronyms and abbreviations covering many fields such as aviation, aerospace, military, sports, education, science, engineering, media, law, recreation and entertainment, and more.

 Toponyms (place names)

Database of Arabic Place Names
  • A highly organized gazetteer (populated places only) of thousands of Arabic place names, available in many Arabic transcription systems and ready for publishing on the internet.
Database of World Place Names
  • A world gazetteer (populated places only) database with a range of information including latitude and longitude in DMS format, together with feature type and other minor fields. An Arabic transcription of the feature's name is added for each place name in this database, which is very useful for localized software, entity scoring applications, navigation software, and other geographic software.
World Gazetteer
  • World Gazetteer is a full-featured database of geographic information concerning the geographic makeup of all world countries and physical features, such as mountains, waterways, or roads. This database complements the two databases above.
Database of Geographic Terms
  • Most geographic terms can be found here; this database is compiled for use with electronic dictionaries and MT applications.
Database of Famous Places
  • This is part of the above database; it has all common geographic features, e.g. valley, creek, summit, etc., as well as some world-famous features including oceans, continents and major cities.
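
The World Place Names database above stores latitude and longitude in DMS (degrees, minutes, seconds) format, whereas many mapping and navigation libraries expect decimal degrees. The exact field layout is not documented here, so the following is only a sketch under an assumed layout such as `24°28'12"N`; the function name `dms_to_decimal` and the pattern are illustrative assumptions.

```python
import re

# Assumed (hypothetical) layout: degrees°minutes'seconds"hemisphere, e.g. 24°28'12"N.
# The real field format in the database may differ.
_DMS = re.compile(r"""^\s*(\d{1,3})°\s*(\d{1,2})'\s*(\d{1,2}(?:\.\d+)?)"\s*([NSEW])\s*$""")

def dms_to_decimal(dms: str) -> float:
    """Convert one DMS coordinate string to signed decimal degrees."""
    match = _DMS.match(dms)
    if match is None:
        raise ValueError(f"unrecognized DMS value: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative in the usual decimal-degree convention.
    return -value if hemisphere in "SW" else value

print(dms_to_decimal('24°28\'12"N'))
```

With this layout, `24°28'12"N` comes out at about 24.47 degrees, and a western longitude comes out negative.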

 Fauna and Flora

Database of Domestic Animal Names
  • under construction
Database of Domestic Plant Names
  • under construction

 Orthographic Databases

Arabic Corpus
  • Tagged Arabic corpus encoded in UTF-8, Windows-1256, or the Kalmasoft generic transliteration system "KATS"; essential for MT applications based on statistical techniques, and as a reference for POS taggers and text parsers.
Database of Arabic Roots
  • Extended database of Arabic roots. The database is available in two forms: in native script, encoded in either UTF-8 or Windows-1256, or in the Kalmasoft native transliteration system "KATS", which uses ASCII characters to facilitate text processing. It is essential for every root-based Arabic processing system, in particular POS taggers and inflection generation systems.
Database of Arabic Full-form Verbs
  • Arabic full-form verbs actually found in ordinary running text; this database includes all regular conjugated verbs.
Database of Arabic Full-form Nouns
  • Arabic full-form nouns actually found in ordinary running text; this database includes all regular inflected nouns.
Arabic Morphological Lexicon (Aramolex)
  • Aramolex is an Arabic morphological lexicon, a dictionary database generated to serve as a full-form lexicon for the entire regular vocabulary of the Arabic language, besides the non-regular surface forms of the Arabic vocabulary.
Monolingual Database of Arabic Stems
  • Presented in native script encoded in UTF-8 or Windows-1256, or in the Kalmasoft native transliteration system "KATS", which uses plain ASCII characters. This database is a major asset for many kinds of NLP applications; usages include conjugation generators, spell checkers and more.
Database of Arabic words of Amharic origin
  • Full information about thousands of Amharic loanwords found in present-day Arabic as well as classical Arabic. This database is compiled for the purposes of CLIR and other IR disciplines.
Bilingual Database of Loan Words in Arabic
  • Full information about 5,000+ loanwords of multiple origins including English, French, Turkish, etc., coded in native Arabic script and Kalmasoft "KATS" with their possible original parallels. This database is good for text abridging, parsers and other kinds of NLP applications.
Bilingual Database of Loan Terms in Arabic
  • Full information about 50,000+ loan terms of multiple origins including English, Spanish, Italian, French, Turkish, etc., coded in native Arabic script and Kalmasoft "KATS" with their possible original parallels. This database is good for text abridging, parsers and other kinds of NLP applications.
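
Several databases in this section are offered in either UTF-8 or Windows-1256. As a minimal sketch of moving data from one encoding to the other, the snippet below uses Python's standard `cp1256` codec; the function name `cp1256_to_utf8` is hypothetical, and the KATS transliteration is not covered because its mapping is not described here.

```python
def cp1256_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1256 bytes as UTF-8.

    Assumption: `raw` really is Windows-1256; bytes that are undefined
    in that code page raise UnicodeDecodeError instead of being guessed.
    """
    return raw.decode("cp1256").encode("utf-8")

# Round trip with a short Arabic root (k-t-b) as sample data.
sample = "\u0643\u062a\u0628"
legacy_bytes = sample.encode("cp1256")
print(cp1256_to_utf8(legacy_bytes).decode("utf-8") == sample)
```

Decoding strictly (the default) is a deliberate choice in this sketch: a file in the wrong encoding fails loudly rather than silently producing corrupted Arabic text.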

 Semantic Databases

Database of Arabic Idiomatic Expressions
  • Hundreds of Arabic idiomatic expressions with their meanings and English parallels; important for MT applications.
Database of Arabic Newspaper Expressions
  • Thousands of Arabic newspaper expressions with their meanings and English parallels; important for MT applications.
Database of Arabic Proverbs
  • Thousands of Arabic proverbs with their English parallels; important for MT and TMM.

 Ontology and Semantic Databases

Database of Arabic Noun Ontology
  • Arabic noun ontology database (under construction).
Database of Arabic Verb Ontology
  • Arabic verb ontology database (under construction).

 Taxonomy

Database of Arabic Noun Taxonomy
  • Arabic noun taxonomy database (under construction).

Category Dictionaries | Reference DBASES | Entries +250,000,000 | Last updated 22/5/2020