Kalmasoft Databases and Glossaries

Synopsis

Kalmasoft maintains a central repository of datasets that together form one of the largest collections of multilingual databases. These datasets have been collected progressively over the last two decades, with the major goal of providing effective and accurate reference material to assist in developing linguistic support for software packages and to support the ever-growing range of web technologies related to language engineering and NLP. We keep our data updated and organized so that correct information is available whenever it is needed.

All materials presented here are commercially available and can be customized and fine-tuned to meet your specific requirements.

Potential applications include:

- Anti-Money Laundering
- Customer Data Management
- Employment Diversity
- Electronic Health Records
- Fraud Detection
- Identity Matching
- Identity Resolution
- Immigration Control
- Intelligence Analysis
- KYC and Due Diligence
- Law Enforcement
- Passenger Screening
- Voter List Correction

Information

Reference: DBASES

Total entries: 250,000,000+

Last updated: 14/1/2023

Anthroponyms (personal names)
KDBAGIV
Database of Arabic Given Names

Thousands of Arabic given names in native script with Arabic diacritical marks (short vowels), Roman transcription, and gender fields. The transcription follows the common English spelling of each name, but other transcription systems are also available. A separate database of frequency statistics for each name can be supplied upon request.
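
Because entries carry diacritical marks while ordinary running text usually does not, a consumer of such a database typically needs an unvocalized lookup key. The following Python fragment is a minimal sketch of that idea, not the actual Kalmasoft record layout: the field names `name`, `roman`, and `gender` and the sample record are assumptions made for illustration only.

```python
# Arabic diacritics U+064B..U+0652 (tanwin, short vowels, shadda, sukun)
# plus the superscript alef U+0670. These are standard Unicode code points.
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(name: str) -> str:
    """Return the name with Arabic diacritical marks removed."""
    return "".join(ch for ch in name if ch not in ARABIC_DIACRITICS)


def build_index(records):
    """Map each unvocalized name to the records that share it."""
    index = {}
    for record in records:
        key = strip_diacritics(record["name"])
        index.setdefault(key, []).append(record)
    return index


# Hypothetical sample record; the field names are assumed, not documented.
SAMPLE_RECORDS = [
    {
        "name": "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F",  # vocalized
        "roman": "Muhammad",
        "gender": "M",
    },
]

INDEX = build_index(SAMPLE_RECORDS)
UNVOCALIZED = "\u0645\u062D\u0645\u062F"  # the same name without diacritics
MATCHES = INDEX.get(UNVOCALIZED, [])
```

The marks are removed by explicit code point rather than by Unicode decomposition, so letters such as alef with madda are left untouched.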

KDBASN
Database of Arabic Surnames

This database is of particular interest to those developing NER applications or name-scoring software. As with the given names database above, entries include Arabic diacritical marks and full Roman transcription. Frequency statistics for each name can be supplied as a separate database upon request.

KDBGIVE
Database of Romanized Arabic Names (Extended)

An extended database of Arabic names exceeding 1 million records, romanized for six languages (English, German, Dutch, Spanish, Italian, and French) and based on 300K original Arabic names.

KDBFULL
Database of Arab Full Names

A database of millions of real-world Arabic names collected from many sources and supplied with gender and locale fields. It covers the entire Arabic region as well as three additional countries known to be under the strong influence of Arabic culture.

KDBNAO
Database of Names of Arabic Origins

A few hundred names of Arabic origin, mostly from countries known to have been under the umbrella of Islamic culture, e.g. Turkey, India, Spain, Persia, and a few African countries.

KDBATRANS
Database of Transcripted Arabic Names

A database of 3.6M records based on 300K original Arabic names, canonically transcribed into 12 languages: Amharic, Hebrew, Greek, Japanese, Russian, Armenian, Georgian, Hindi, Thai, Bengali, Tagalog, and Malayalam.

KDBVAROM
Database of Arabic Name Variants

A huge database of 40 million records covering Arabic names with all their possible Roman variants, based on 300K original Arabic names.
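
A variants database of this kind is typically consumed as a reverse lookup, from a Roman spelling back to the original Arabic name. The sketch below illustrates that usage in Python under stated assumptions: the `(arabic, variant)` row shape, the sample rows, and the normalization rule are all hypothetical and do not describe the actual KDBVAROM format.

```python
from collections import defaultdict


def normalize(variant: str) -> str:
    """Case-fold a spelling and drop hyphens, spaces, and apostrophes."""
    return "".join(ch for ch in variant.casefold() if ch.isalnum())


def build_variant_index(rows):
    """Map each normalized Roman variant to the set of Arabic originals."""
    index = defaultdict(set)
    for arabic, variant in rows:
        index[normalize(variant)].add(arabic)
    return index


def lookup(index, spelling: str):
    """Return the Arabic originals for a Roman spelling, sorted, or []."""
    return sorted(index.get(normalize(spelling), ()))


# Hypothetical sample rows: (Arabic original, Roman variant).
SAMPLE_ROWS = [
    ("\u0645\u062D\u0645\u062F", "Muhammad"),
    ("\u0645\u062D\u0645\u062F", "Mohammed"),
    ("\u0645\u062D\u0645\u062F", "Mohamed"),
    ("\u0639\u0628\u062F \u0627\u0644\u0644\u0647", "Abdallah"),
]

VARIANT_INDEX = build_variant_index(SAMPLE_ROWS)
```

With this index, `lookup(VARIANT_INDEX, "MOHAMED")` and `lookup(VARIANT_INDEX, "Abd-Allah")` both resolve to a single Arabic original, while an unknown spelling returns an empty list.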

KDBUNIQ
Database of Unique and Indigenous Names

Unique and indigenous names from all Arabic speaking countries.

KDBWNAN
Database of World Names

Names from all over the world. What is new in this database is that an Arabic transcription and a gender field are added to every name. Most records also show additional information, e.g. locale and meaning.

KDBCOUNTER
Database of Counterintuitive Names

A database of names whose counterintuitive pronunciations make them difficult to spell or read even when written in Latin characters, because the phonemic characteristics of the original language affect the way these names are used.

KDBHETERO
Database of Heterophonic Names

A database of names that share the same spelling across multiple languages but have different meanings and pronunciations.

KDBETHAR
Database of Arabized Ethiopic Names

A database of Ethiopic names with parallel Arabic names that closely share the same meaning.

Toponyms (place names)
KDBTOPO
Database of Arabic Place Names

A highly organized gazetteer (populated places only) of thousands of Arabic place names, ready for publishing on the internet, with many Arabic transcription systems.

KDBTOPO
Database of World Place Names

A highly organized gazetteer (populated places only) of thousands of Arabic place names, ready for publishing on the internet, with many Arabic transcription systems.

KDBGAZET
Database of World Gazetteer

The World Gazetteer is a full-featured database of geographic information covering the geographic makeup of all countries of the world and physical features such as mountains, waterways, and roads. This database complements the two databases above.

KDBODON
Database of World Street Names

The first of its kind, this long-awaited database of odonyms (street names) is now available in Arabic. It covers world street names for more than two million geographic entities and is very useful for information retrieval systems, e.g. NER applications, web crawlers, search engines, and CLIR applications.

KDBGEOFAM
Database of World Landmarks

This is part of the above database. It contains all common geographic feature types, e.g. valley, creek, summit, as well as some world-famous features including oceans, continents, and major cities.

KDBGEOTERM
Database of Geographic Terms

Most geographic terms can be found here. This database is compiled for use with electronic dictionaries and MT applications.

Entity Names Databases
KDBFN
Database of Famous and Celebrity Names

Famous names and celebrities from over 100 countries.

KDBENTTY
Database of Entity Names

A full suite of bilingual databases covering almost all aspects of life, e.g. sports, politics, science, and more. Each database may have additional fields, e.g. "type", but all of them include the "locale" field.

KDBKK
Database of Arabic Entity Keywords

Unique and valuable, this is a database of the entity keywords found in Arabic. It comprises the Arabic counterparts of entity keywords such as company, society, union, factory, committee, and other terms. It is very useful for NER applications, web crawlers, search engines, and CLIR applications.

KDBCOLNOS
Database of Arabic Colloquial Entity Names

A valuable listing of indigenous and rare names found in Arabic countries. It is the biggest database of its kind now available electronically.

KDBAUTO
Database of Automobile Names

Industrial entities, e.g. consumer electronics, heavy industries, construction, automobiles, housing, information technology, medical equipment, military industries, etc. Good for MT software developers and web-based search engines.

Acronyms and Initialisms
KDBACR
Database of Acronyms and Abbreviations

Thousands of acronyms and abbreviations covering many fields such as aviation, aerospace, military, sports, education, science, engineering, media, law, recreation and entertainment, and more.

Orthographic Databases
KDBCORPUS
Arabic Corpus

A tagged Arabic corpus encoded in UTF-8, Windows-1256, or the Kalmasoft generic transliteration system "KATS". It is essential for MT applications based on statistical techniques and serves as a reference for POS taggers and text parsers.
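
Since the corpus may be delivered in either UTF-8 or Windows-1256, a pipeline that expects one encoding may need to convert from the other. The following is a minimal Python sketch using the standard `cp1256` codec; the function names and sample text are illustrative assumptions, and the KATS transliteration is not covered because its mapping is not described here.

```python
def cp1256_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1256 bytes as UTF-8."""
    return data.decode("cp1256").encode("utf-8")


def utf8_to_cp1256(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Windows-1256.

    Raises UnicodeEncodeError if the text contains characters that
    Windows-1256 cannot represent.
    """
    return data.decode("utf-8").encode("cp1256")


# Illustrative sample: a four-letter Arabic word.
SAMPLE_TEXT = "\u0645\u062D\u0645\u062F"
LEGACY_BYTES = SAMPLE_TEXT.encode("cp1256")   # one byte per letter
UTF8_BYTES = cp1256_to_utf8(LEGACY_BYTES)     # two bytes per letter
```

Windows-1256 is a single-byte encoding, so the same Arabic text roughly doubles in size when converted to UTF-8.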

KDBORTHO
Database of Arabic Roots

An extended database of Arabic roots. The database is available in two forms: in native script, encoded in either UTF-8 or Windows-1256, or in the Kalmasoft native transliteration system "KATS", which uses ASCII characters to facilitate text processing. It is essential for every root-based Arabic processing system, in particular POS taggers and inflection generation systems.

KDBORTHO
Database of Arabic Full-form Verbs

Arabic full-form verbs that are actually found in ordinary running text. This database includes all regular conjugated verbs.

KDBORTHO
Database of Arabic Full-form Nouns

Arabic full-form nouns that are actually found in ordinary running text. This database includes all regular inflected nouns.

ARAMOLEX
Arabic Morphological Lexicon (Aramolex)

Aramolex is an Arabic morphological lexicon: a dictionary database generated to serve as a full-form lexicon for the entire regular vocabulary of the Arabic language, in addition to the non-regular surface forms of the Arabic vocabulary.

KDBLW
Bilingual Database of Loan Words in Arabic

Full information about 5,000+ loanwords of multiple origins, including English, French, Turkish, etc., coded in native Arabic script and Kalmasoft "KATS" with their possible original parallels. This database is good for text abridging, parsers, and other kinds of NLP applications.

KDBLT
Bilingual Database of Loan Terms in Arabic

Full information about 50,000+ loan terms of multiple origins, including English, Spanish, Italian, French, Turkish, etc., coded in native Arabic script and Kalmasoft "KATS" with their possible original parallels. This database is good for text abridging, parsers, and other kinds of NLP applications.

KDBAMHARA
Database of Arabic Words of Amharic Origin

Full information about thousands of Amharic loanwords found in present-day Arabic and in classical Arabic. This database is compiled for the purposes of CLIR and other IR disciplines.

KDBSYARA
Database of Arabic Words of Syriac Origin

Full information about thousands of Syriac loanwords found in present-day Arabic and in classical Arabic. This database is compiled for the purposes of CLIR and other IR disciplines.

KDBAMSYR
Database of Amharic and Syriac Common Words

Full information about thousands of words that are common to both Amharic and Syriac.

KDBAMSYAR
Database of Arabic Loan Words from Amharic and Syriac

Full information about thousands of loanwords from both Amharic and Syriac found in present-day Arabic and in classical Arabic. This database is compiled for the purposes of CLIR and other IR disciplines.

Semantic Databases
KDBGAIE
Database of Arabic Idiomatic Expressions

Hundreds of Arabic idiomatic expressions with their meanings and English parallels; important for MT applications.

KDBPRVRB
Database of Arabic Proverbs

Thousands of Arabic proverbs with their English parallels; important for MT and TMM.

KDBGANT
Database of Arabic Newspaper Expressions

Thousands of Arabic newspaper expressions with their meanings and English parallels; important for MT applications.

Fauna and Flora
KDBFAUNA
KDBFLORA
Ontology and Semantic Databases
KDBONTNOS
Database of Arabic Noun Ontology

Arabic noun ontology database (under construction).

KDBONTVRB
Database of Arabic Verb Ontology

Arabic verb ontology database (under construction).

Taxonomy Databases
KDBATAXON
Database of Arabic Noun Taxonomy

Arabic noun taxonomy database (under construction).