Arabic Part of Speech Tagger
Specifications
A context-sensitive Arabic tagger suited to large Arabic corpora.
A professional productivity tool for tagging Arabic text. It is a hybrid tagger: rather than working only at the morphological level with heavy reliance on linguistic principles and lexical characteristics, as ordinary taggers do, it also performs semantic and syntactic analysis. The system incorporates a solid lexical analyzer (stemmer) that prepares the text for the morphological analyzer, which in turn works on morphotactic and contextual probabilities before tagging any token.
The system approaches part-of-speech tagging with techniques that go beyond superficial "linguistic mechanics" and string manipulation (stemming, tokenization, morphological analysis, and other classical techniques), leveraging idiosyncratic meaning through a completely new technology.
Supports tagsets as specified by CG, Brill, Penn Treebank, CLAWS, Brown, LOB, Khoja.
Exports output to JSON, XML, SQL, and TXT for processing, and to HTML, XLSX, and PDF for viewing.
This tool is designed for use with large Arabic corpora; however, many simplified features have been added to assist novice users, making it well suited to academic use as well.
The system is equipped with a powerful tagset editor so users can edit the built-in tagsets or start compiling a completely new tagset of their own.
Arabic text transliteration (KATS) for readability by non-Arabic speakers.
Three Arabic varieties supported in the input (Classical, MSA, colloquial).
Verbose display of tagged text includes some 10 categories (root, clitics, tense, case, mood, voice, form, gloss, etc.).
Sliding window adjustment.
A highly descriptive, multi-level, fine-grained NER tagset with over 30 tags.
Joining identical entities.
Related entities extension.
Lookup gazetteers.
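As an illustration of the gazetteer lookup and identical-entity joining features listed above, here is a minimal Python sketch. It is not the product's implementation, which is not documented here: the `GAZETTEER` contents, the `LOC`/`ORG` labels, the function names, and the greedy longest-match strategy are all assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, label)

# Hypothetical gazetteer: a tuple of tokens mapped to an entity label.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("القاهرة",): "LOC",
    ("جامعة", "القاهرة"): "ORG",
}

def lookup_entities(tokens: List[str],
                    gazetteer: Dict[Tuple[str, ...], str],
                    max_len: int = 3) -> List[Span]:
    """Greedy longest-match lookup of token sequences in the gazetteer."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-word entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label is not None:
                match = (i, i + n, label)
                break
        if match is not None:
            spans.append(match)
            i = match[1]  # continue after the matched entity
        else:
            i += 1
    return spans

def join_identical(tokens: List[str], spans: List[Span]):
    """Group spans that share the same surface text and label."""
    grouped = defaultdict(list)
    for start, end, label in spans:
        surface = " ".join(tokens[start:end])
        grouped[(surface, label)].append((start, end))
    return dict(grouped)

tokens = ["زار", "الطالب", "جامعة", "القاهرة", "في",
          "القاهرة", "ثم", "غادر", "القاهرة"]
spans = lookup_entities(tokens, GAZETTEER)
joined = join_identical(tokens, spans)
```

In this sketch the two-token entry is matched as a single organization before the single-token location is considered, and the two later mentions of the same location end up under one key in `joined`.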
Requirements
Specification | Description |
---|---|
Hardware platform | x86, 32bit, 64bit |
Operating system | Windows 10, Windows 8, Windows 7 |
Hard disc free space | 1GB minimum |
Processor | Pentium at 1GHz or higher |
Main memory (RAM) | 1GB or more |
Performance
Criteria | Details |
---|---|
Tagging speed | 100,000 tokens/min |
Multi-user | This software does not support multi-user environments. |
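For planning purposes, the quoted tagging speed can be turned into a rough run-time estimate. The short Python sketch below does only that arithmetic; the function name is hypothetical, and it assumes the rate of 100,000 tokens per minute holds steadily, which in practice may vary with hardware and settings.

```python
TOKENS_PER_MINUTE = 100_000  # rate quoted in the Performance table

def estimated_minutes(corpus_tokens: int,
                      rate: int = TOKENS_PER_MINUTE) -> float:
    """Return the approximate tagging time in minutes for a corpus."""
    if corpus_tokens < 0 or rate <= 0:
        raise ValueError("corpus_tokens must be >= 0 and rate must be > 0")
    return corpus_tokens / rate

# A corpus of 10 million tokens at the quoted rate:
minutes = estimated_minutes(10_000_000)  # 100.0 minutes
```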
Highlights
MAPSSeman© PoS Tagger is an Arabic part-of-speech tagger that provides a powerful tool for tokenizing Arabic corpora at multiple syntactic levels, exceeding the capabilities of many other functional taggers.
Four different input methods, suitable for texts of any size.
Uses a compact tokenizer, lemmatizer, and morphological analyzer.
Supports more than 12 tagsets with a robust built-in tokenizer.
True "multi-pass tagging": this professional tool can tag Arabic 5 levels beyond any other commercial or academic tagger currently available.
Advanced and versatile tagset editor with 12 built-in known tagsets ready for use.
Highly customizable UI with many options.
Export to SQL, JSON, XML and worksheet.
Built-in XML viewer for browsing the content of already tagged corpora.
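To give a sense of what working with exported JSON or XML might involve, the Python sketch below serializes a few tagged tokens with the standard library. The record layout (`token`, `tag`, `root`, `gloss`), the element names, and the helper functions are assumptions made for illustration; the tool's actual export schema is not described here and may differ.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical tagged tokens; field names are illustrative only.
tagged: List[Dict[str, str]] = [
    {"token": "ذهب", "tag": "VBD", "root": "ذهب", "gloss": "went"},
    {"token": "الولد", "tag": "NN", "root": "ولد", "gloss": "the boy"},
]

def to_json(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON, keeping Arabic text unescaped."""
    return json.dumps(records, ensure_ascii=False, indent=2)

def to_xml(records: List[Dict[str, str]]) -> str:
    """Serialize records as XML: one <token> element per record,
    with the analysis fields stored as attributes."""
    corpus = ET.Element("corpus")
    for rec in records:
        attrs = {k: v for k, v in rec.items() if k != "token"}
        element = ET.SubElement(corpus, "token", attrs)
        element.text = rec["token"]
    return ET.tostring(corpus, encoding="unicode")

json_text = to_json(tagged)
xml_text = to_xml(tagged)
```

Passing `ensure_ascii=False` keeps the Arabic characters readable in the JSON output instead of emitting `\u` escapes, which matters if the file is meant to be inspected by people as well as programs.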
Information
Reference: MSLTAG
Last updated: 21/1/2023