Arabic Part of Speech Tagger

Specifications

A context-sensitive Arabic tagger suited to large Arabic corpora.

A professional productivity tool for tagging Arabic text. It is a hybrid tagger: it performs semantic and syntactic analysis, whereas ordinary taggers work at the morphological level and rely heavily on linguistic principles and lexical characteristics. The system incorporates a solid lexical analyzer (stemmer) that prepares the text for the morphological analyzer, which in turn evaluates morphotactic and contextual probabilities before tagging any token.

The system approaches part-of-speech tagging with techniques that go beyond superficial "linguistic mechanics" and string manipulation (stemming, tokenization, morphological analysis, and other classical techniques), leveraging idiosyncratic meaning in a completely new technology.

Supports tagsets as specified by CG, Brill, Penn Treebank, CLAWS, Brown, LOB, Khoja.

Exports output to JSON, XML, SQL, and TXT for processing, and to HTML, XLSX, and PDF for viewing.
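
As a rough illustration of how exported JSON or XML might be produced or consumed downstream, the sketch below serializes a list of (token, tag) pairs with the Python standard library. The record layout, the element names (`sentence`, `w`), and the tags are assumptions made for this example; the product's actual export schema is not described here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical tagged output: (token, tag) pairs. Tags are illustrative only.
tagged = [("كتب", "VERB"), ("الولد", "NOUN"), ("الدرس", "NOUN")]


def to_json(pairs):
    """Serialize (token, tag) pairs as a JSON array of objects."""
    records = [{"token": tok, "tag": tag} for tok, tag in pairs]
    # ensure_ascii=False keeps the Arabic script readable in the output.
    return json.dumps(records, ensure_ascii=False)


def to_xml(pairs):
    """Serialize (token, tag) pairs as <sentence><w tag="...">token</w></sentence>."""
    root = ET.Element("sentence")
    for tok, tag in pairs:
        word = ET.SubElement(root, "w", tag=tag)
        word.text = tok
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(tagged))
    print(to_xml(tagged))
```

JSON suits record-oriented pipelines, while the XML form keeps the token text as element content and the tag as an attribute, which is convenient for viewers that browse a corpus.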

This tool is designed for use with large Arabic corpora; however, many simplified features have been added to assist novice users, making it well suited to academic use as well.

The system is equipped with a powerful tagset editor so users can edit the built-in tagsets or start compiling a completely new tagset of their own.

Arabic text transliteration (KATS) for readability by non-Arabic speakers.
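
The KATS mapping itself is not given here, so the sketch below uses the widely known Buckwalter scheme as a stand-in to show how character-level transliteration generally works. The function name `transliterate` and the choice to cover only the basic letters are assumptions for illustration.

```python
# Basic-letter subset of the Buckwalter transliteration scheme.
# This is a stand-in: it is NOT the KATS scheme used by the product.
BUCKWALTER = {
    "ا": "A", "ب": "b", "ت": "t", "ث": "v", "ج": "j", "ح": "H",
    "خ": "x", "د": "d", "ذ": "*", "ر": "r", "ز": "z", "س": "s",
    "ش": "$", "ص": "S", "ض": "D", "ط": "T", "ظ": "Z", "ع": "E",
    "غ": "g", "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m",
    "ن": "n", "ه": "h", "و": "w", "ي": "y", "ة": "p", "ى": "Y",
}


def transliterate(text):
    """Map each Arabic letter to its Latin symbol; pass other characters through."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("كتاب"))
```

Characters outside the table (spaces, digits, punctuation, diacritics) are left unchanged, so the output stays aligned with the input token by token.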

Accepts three Arabic varieties as input (Classical, MSA, colloquial).

Verbose display of tagged text includes some 10 categories (root, clitics, tense, case, mood, voice, form, gloss, etc.).

Sliding window adjustment.
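
How the product applies its sliding window is not specified. Assuming the setting controls how many neighboring tokens are considered around each token, a minimal sketch of the general technique looks like this (the function name, the `size` parameter, and the padding symbol are hypothetical):

```python
def context_windows(tokens, size=2, pad="<PAD>"):
    """Return one window per token: `size` neighbors on each side, padded at the edges."""
    padded = [pad] * size + list(tokens) + [pad] * size
    width = 2 * size + 1
    return [tuple(padded[i:i + width]) for i in range(len(tokens))]


if __name__ == "__main__":
    for window in context_windows(["كتب", "الولد", "الدرس"], size=1):
        print(window)
```

A larger `size` gives each token more context at the cost of more work per token, which is the usual trade-off when adjusting a window.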

A multi-level, fine-grained, highly descriptive NER tagset with over 30 tags.

Joining identical entities.

Related entities extension.

Lookup gazetteers.
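
To illustrate the general idea behind gazetteer lookup and joining identical entities, the sketch below does a greedy longest-match search over token n-grams and then groups mentions that share the same surface form and label. The function names, the `max_len` parameter, and the dictionary-based gazetteer are assumptions for this example, not the product's implementation.

```python
def find_entities(tokens, gazetteer, max_len=3):
    """Greedy longest-match lookup of token n-grams in a gazetteer dict."""
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazetteer:
                found.append((i, i + n, phrase, gazetteer[phrase]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found


def join_identical(entities):
    """Group mentions that share the same surface form and label."""
    grouped = {}
    for start, end, phrase, label in entities:
        grouped.setdefault((phrase, label), []).append((start, end))
    return grouped


if __name__ == "__main__":
    gazetteer = {"القاهرة": "LOC", "جامعة القاهرة": "ORG"}
    tokens = "زار الطالب جامعة القاهرة ثم غادر القاهرة".split()
    print(join_identical(find_entities(tokens, gazetteer)))
```

Matching the longest phrase first keeps a multi-word name such as a university from being split into a shorter place name that happens to appear inside it.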

Requirements

Hardware platform: x86, 32-bit, 64-bit

Operating system: Windows 10, Windows 8, Windows 7

Hard disk free space: 1 GB minimum

Processor: Pentium at 1 GHz or higher

Main memory (RAM): 1 GB or more

Performance

Tagging speed: 100,000 tokens/min

Multi-user: This software does not support multi-user environments.

Screenshots

Output view interface

Output variants

Highlights

MAPSSeman© PoS Tagger is an Arabic part-of-speech tagger that provides a powerful tool to tokenize Arabic corpora at multiple syntactic levels, beyond the capabilities of many other functional taggers.

Four input methods, suitable for texts of any size.

Utilizes a compact tokenizer, lemmatizer, and morphological analyzer.

Supports more than 12 tagsets with a robust built-in tokenizer.

True "multi-pass tagging": this professional tool can tag Arabic at 5 levels beyond any other commercial or academic tagger currently available.

Advanced and versatile tagset editor with 12 built-in known tagsets ready for use.

Highly customizable UI with many options.

Export to SQL, JSON, XML and worksheet.

Built-in XML viewer for browsing the content of already-tagged corpora.

Downloads

Documentation

User Manual

Download

Evaluation copy

Information

Reference: MSLTAG

Last updated: 21/1/2023