Arabic Named Entity Extractor


A context-sensitive Arabic Named Entity Extractor suited to the pre-processing stages of unstructured Arabic text classification.

A professional productivity tool for extraction from Arabic text. The system incorporates a morphological analyzer, which works on morphotactic and contextual probabilities before any token is extracted.

The system approaches named entity recognition with rule-based techniques that go beyond simple string manipulation such as stemming, tokenization, morphological analysis, or other classical techniques, and it leverages idiosyncratic meaning efficiently.

Exports output to JSON, XML, SQL, and TXT for processing, and to HTML, XLSX, and PDF for viewing.
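As an illustration of the processing-oriented formats, the sketch below serializes a list of entity records to JSON in Python. The record layout (`text`, `label`, `start`, `end`) and the `LOCATION` label are assumptions made for this example only; they are not the schema or tagset the product actually exports.

```python
import json

def entities_to_json(entities):
    """Serialize entity records to a JSON string.

    ensure_ascii=False keeps Arabic characters readable instead of
    escaping them as \\uXXXX sequences.
    """
    return json.dumps(entities, ensure_ascii=False, indent=2)

# Hypothetical record layout, for illustration only.
records = [{"text": "القاهرة", "label": "LOCATION", "start": 0, "end": 7}]
exported = entities_to_json(records)
```

Loading the string back with `json.loads` returns the same records, so a downstream script can consume such a file without any Arabic-specific handling.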

This tool is designed for use with large Arabic corpora; however, many simplified features have been added to assist novice users, making it suitable for academic use as well.

The system is equipped with a powerful tagset editor so users can edit the built-in tagsets or start compiling a completely new tagset of their own.

Accepts three Arabic varieties as input: Classical Arabic, Modern Standard Arabic (MSA), and colloquial Arabic.

A multi-level, fine-grained, highly descriptive NER tagset with over 20 tags.

Supports semantic joining, semantic extension of entities, semantic bridging, and optional gazetteer lookup.
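For readers unfamiliar with gazetteer lookup, the following is a minimal sketch of the general technique: a greedy longest-match search of a token list against a list of known names. It is not the product's implementation; the `GAZETTEER` entries, the labels, and the function name `gazetteer_lookup` are all hypothetical.

```python
# Hypothetical gazetteer: token tuple -> entity label.
# Labels are illustrative and not taken from the Kalmasoft Tagset.
GAZETTEER = {
    ("القاهرة",): "LOCATION",
    ("الأمم", "المتحدة"): "ORGANIZATION",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def gazetteer_lookup(tokens):
    """Return (start, end, label) spans using greedy longest-match lookup."""
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-word names win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label is not None:
                match = (i, i + n, label)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans

tokens = "زار وفد الأمم المتحدة القاهرة".split()
spans = gazetteer_lookup(tokens)
```

Here `spans` holds one two-token organization span followed by one single-token location span. A real system would normally combine such lookups with contextual rules, since a name list alone cannot resolve ambiguous tokens.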

Output view interface

Specification            Description
Hardware platform        x86, 32-bit, 64-bit
Operating system         Windows 10, Windows 8, Windows 7
Hard disk free space     1 GB minimum
Processor                Pentium at 1 GHz or higher
Main memory (RAM)        1 GB or more

Criteria                 Details
Extraction speed         100,000 names/min
Multi-user               This software does not support multi-user environments.


Output variants

For the Kalmasoft named entity labels, please refer to the Kalmasoft Tagset.


An Arabic named entity extraction and content classification system that provides a powerful tool to recognize and tokenize Arabic entities.

Implements an advanced classification algorithm, a compact tokenizer, and a morphological analyzer to support categorizing text into more than 20 predefined subject domains.

True "multi-pass extraction" that bases its output on multilevel classification.

Three input methods suitable for texts of any size, including batch processing of files.

Exports to six file formats: TXT, XLSX, JSON, HTML, SQL, and XML.

Maintains the proper encoding for your multilingual corpus.
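Encoding matters because Arabic corpora commonly arrive as either UTF-8 or the legacy Windows-1256 code page. The sketch below shows one generic way to decode such input in Python by trying candidate encodings in order; it illustrates the problem and is not a description of how the product handles encodings. The function name `decode_arabic` and the candidate order are assumptions.

```python
def decode_arabic(raw: bytes, candidates=("utf-8-sig", "cp1256")) -> str:
    """Decode bytes by trying each candidate encoding in order.

    "utf-8-sig" accepts UTF-8 with or without a byte order mark.
    "cp1256" (Windows-1256) is tried last as a fallback.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")

word = "مرحبا"
from_utf8 = decode_arabic(word.encode("utf-8"))
from_cp1256 = decode_arabic(word.encode("cp1256"))
```

Both calls recover the original word. Trying UTF-8 first is deliberate: bytes that are valid Windows-1256 Arabic are rarely valid UTF-8, so a failed strict UTF-8 decode is a reasonable signal to fall back.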

Highly customizable UI with many options.





Reference: MNERSYS

Last updated: 21/1/2023