Arabic Named Entity Extractor
Synopsis
Named Entity Recognition (NER) is the process of locating the informative lexical items in unstructured text, such as the names of persons, places, and organizations, dates, etc., and labeling each one with its type. The process relies on the syntactic and semantic characteristics of the words, and it also classifies the entities into subcategories according to the taxonomy implemented.
Preview
Kalmasoft NERSys is an Arabic Named Entity Recognition/Extraction tool aimed at preparing annotated Arabic corpora. It is a context-sensitive, rule-based solution that uses a comprehensive set of hand-crafted semantic and syntactic rules to deal with unstructured Arabic text. The output is an annotated, structured corpus in XML or JSON format; SQL database and TXT are among the other output alternatives. For quick review, HTML, XLSX, and PDF are also available.
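To make the idea of a context-sensitive rule with structured output concrete, here is a minimal Python sketch. It is not the NERSys rule set or its output schema: the trigger words, the function names (`tag_entities`, `annotate`), and the JSON field names are all hypothetical, chosen only to illustrate how a word's left context can decide the entity type of the token that follows it.

```python
import json

# Hypothetical trigger words: the token FOLLOWING each trigger is tagged
# with the associated type. Illustrative only, not the NERSys rules.
TRIGGERS = {
    "الدكتور": "PERSON",        # "Dr."
    "مدينة": "LOCATION",        # "city of"
    "شركة": "ORGANIZATION",     # "company"
}

def tag_entities(text):
    """Return one entity record per token that follows a trigger word."""
    tokens = text.split()  # naive whitespace tokenization
    entities = []
    # Stop before the last token: a trigger at the end has nothing to tag.
    for i, token in enumerate(tokens[:-1]):
        label = TRIGGERS.get(token)
        if label is not None:
            entities.append(
                {"text": tokens[i + 1], "type": label, "token_index": i + 1}
            )
    return entities

def annotate(text):
    """Serialize the text and its entities as JSON (assumed field names)."""
    record = {"text": text, "entities": tag_entities(text)}
    # ensure_ascii=False keeps the Arabic characters readable in the output.
    return json.dumps(record, ensure_ascii=False, indent=2)

# "Dr. Ahmed visited the city of Khartoum"
sentence = "زار الدكتور أحمد مدينة الخرطوم"
print(annotate(sentence))
```

A real rule-based system would need morphological analysis, multi-token entity spans, and far more rules; the sketch only shows the context-then-label pattern and a JSON serialization step.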
NERSys is designed to prepare structured Arabic datasets, since documents of unstructured text are difficult to use in their raw form in NLP applications such as machine translation (MT), information retrieval (IR), entity linking, semantic search, and search engines. An annotated corpus carries more information than the raw text alone.
NERSys also implements an advanced classification algorithm that categorizes text into more than 20 predefined subject domains.
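The classification algorithm itself is not described here, so the following Python sketch substitutes a much simpler technique, plain keyword counting, just to show what assigning a text to a predefined subject domain looks like. The domain names, keyword lists, and the `classify` function are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical keyword lists per subject domain (illustrative only).
DOMAIN_KEYWORDS = {
    "sports": {"مباراة", "فريق", "لاعب"},      # match, team, player
    "economy": {"سوق", "بنك", "أسعار"},        # market, bank, prices
}

def classify(text):
    """Return the domain with the most keyword hits, or None if no hits."""
    counts = Counter()
    for token in text.split():  # naive whitespace tokenization
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if token in keywords:
                counts[domain] += 1
    if not counts:
        return None
    # On a tie, the domain that was counted first is returned.
    return counts.most_common(1)[0][0]

# "Al-Hilal team won yesterday's match"
print(classify("فاز فريق الهلال في مباراة أمس"))
```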
Resources
The sample given below contains multilingual text. Without proper rendering support, you may see question marks, boxes, or other symbols instead of the intended characters. You may either get suitable fonts from our lingual support page or download the TXT version, which requires no extra arrangement.