Arabic Named Entity Extractor
Specifications
A context-sensitive Arabic Named Entity Extractor suited to the preprocessing stages of unstructured Arabic text classification.
Professional productivity tool for extracting entities from Arabic text; the system incorporates a morphological analyzer, which in turn evaluates morphotactic and contextual probabilities before any token is extracted.
The system approaches named entity recognition with rule-based techniques that go beyond simple string manipulation such as stemming, tokenization, morphological analysis, or other classical techniques, leveraging idiosyncratic meaning in an efficient way.
Exports output to JSON, XML, SQL, and TXT for processing, and to HTML, XLSX, and PDF for viewing.
This tool is designed for use with large Arabic corpora; however, many simplified features have been added to assist novice users, making it well suited to academic use as well.
The system is equipped with a powerful tagset editor so users can edit the built-in tagsets or start compiling a completely new tagset of their own.
Accepts input in three Arabic varieties: Classical, Modern Standard Arabic (MSA), and colloquial.
A highly descriptive, multi-level NER tagset with over 20 tags.
Supports semantic joining, semantic extension of entities, semantic bridging, and optional gazetteer lookup.
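The optional gazetteer lookup mentioned above can be illustrated with a minimal sketch. This is not the product's implementation: the `GAZETTEER` entries, the tag names, and the `gazetteer_lookup` function are all hypothetical, and the greedy longest-match strategy is an assumption chosen because it is a common way to handle multi-word names.

```python
# Hypothetical gazetteer: token sequence -> entity tag. Entries and tag names
# are illustrative only and are not taken from the product's tagset.
GAZETTEER = {
    ("القاهرة",): "LOCATION",
    ("الأمم", "المتحدة"): "ORGANIZATION",
    ("نجيب", "محفوظ"): "PERSON",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def gazetteer_lookup(tokens):
    """Return (start, end, tag) spans using greedy longest-match lookup."""
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-word names win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            tag = GAZETTEER.get(tuple(tokens[i:i + n]))
            if tag is not None:
                match = (i, i + n, tag)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


# Whitespace splitting stands in for a real Arabic tokenizer here.
tokens = "زار نجيب محفوظ مقر الأمم المتحدة في القاهرة".split()
print(gazetteer_lookup(tokens))
```

A pure lookup like this misses inflected or cliticized forms, which is presumably why the system pairs it with morphological analysis and treats the gazetteer as optional.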
Requirements
Specification | Description |
---|---|
Hardware platform | x86, 32-bit or 64-bit |
Operating system | Windows 10, Windows 8, Windows 7 |
Free hard disk space | 1 GB minimum |
Processor | Pentium at 1 GHz or higher |
Main memory (RAM) | 1 GB or more |
Performance
Criteria | Details |
---|---|
Extraction speed | 100,000 names/min |
Multi-user | This software does not support multi-user environments. |
Highlights
Arabic Named Entity Extraction and content classification system that provides a powerful tool to recognize and tokenize Arabic entities.
Implements an advanced classification algorithm, a compact tokenizer, and a morphological analyzer to support categorizing text into more than 20 predefined subject domains.
True "multi-pass extraction" that bases its output on multilevel classification.
Three input methods suitable for texts of any size, including batch processing of files.
Exports to six file formats: TXT, XLSX, JSON, HTML, SQL, and XML.
Maintains the proper encoding for your multilingual corpus.
Highly customizable UI with many options.
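For the export formats and encoding preservation listed in the highlights above, the following sketch shows one way entity records could be serialized to JSON and XML without mangling Arabic text. The record fields (`text`, `tag`, `start`, `end`), the element names, and both helper functions are assumptions made for illustration; the product's actual export schema is not documented here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical entity records; field names are assumptions for illustration.
entities = [
    {"text": "القاهرة", "tag": "LOCATION", "start": 7, "end": 8},
    {"text": "نجيب محفوظ", "tag": "PERSON", "start": 1, "end": 3},
]


def to_json(records):
    # ensure_ascii=False keeps Arabic characters instead of \uXXXX escapes.
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_xml(records):
    root = ET.Element("entities")
    for rec in records:
        attrs = {
            "type": rec["tag"],
            "start": str(rec["start"]),
            "end": str(rec["end"]),
        }
        node = ET.SubElement(root, "entity", attrs)
        node.text = rec["text"]
    # encoding="unicode" returns a str with non-ASCII characters intact.
    return ET.tostring(root, encoding="unicode")


json_text = to_json(entities)
xml_text = to_xml(entities)
print(json_text)
print(xml_text)
```

When writing such output to disk, opening the file with an explicit `encoding="utf-8"` avoids depending on the platform default, which matters on Windows systems where the default code page may not be UTF-8.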
Information
Reference: MNERSYS
Last updated: 21/1/2023