Semantics - Arabic Part of Speech Tagger
Kalmasoft PoS Tagger addresses most of the problems of Arabic corpus tagging. It is a context-sensitive, rule-based solution built on a comprehensive, hand-crafted set of syntactic rules for Arabic datasets. Output is produced as structured XML or JSON, with SQL database and CSV available as alternatives. The tagger is designed to prepare annotated Arabic corpora: a tagged corpus is more useful than an untagged one because it carries more information than the raw text alone. Once a corpus is tagged, it can be used to extract information, which in turn supports building dictionaries and grammars from real language data. Tagged corpora are also useful for detailed quantitative analysis of text. The processed corpus is therefore intended for machines rather than human readers, although a view interface exists for testing purposes and works well for short texts; output can also be saved as an HTML or TXT file.
A screenshot of the MAPSSeman PoS Tagger interface. The technical specifications are available for viewing, and an evaluation copy can be downloaded.
V: verb | A: adjective | C: conjunction |
N: noun | Pr: preposition | a: adverb |
d: demonstrative | r: relative | F: foreign word |
O: ordinal number | E: verbal noun | |
R: pronoun | T: typographic error | X: no solution |

P: perfective | S: singular | F: feminine |
I: imperfective | D: dual | 1: first person |
M: imperative | P: plural | 2: second person |
E: emphatic | M: masculine | 3: third person |
See the full Kalmasoft tagset documentation for the complete list of codes.
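As a rough illustration of how the feature codes in the legend combine, the sketch below decodes a bare person-number-gender triple such as `3SF`, which appears inside the Arguments column of the sample output. The function name `decode_png` is hypothetical, and the exact position layout of the full Arguments field is not documented on this page, so the sketch deliberately handles only a three-character triple. The `•` character is treated as "unspecified" because that is how it appears in the sample; codes missing from the legend (for example `B`) are reported as unknown rather than guessed.

```python
# Feature codes copied from the legend on this page.
PERSON = {"1": "first person", "2": "second person", "3": "third person"}
NUMBER = {"S": "singular", "D": "dual", "P": "plural"}
GENDER = {"M": "masculine", "F": "feminine"}

# Placeholder character seen in the sample output (assumed to mean "unspecified").
BLANK = "\u2022"


def decode_png(triple):
    """Decode a person-number-gender triple such as '3SF' into readable labels.

    Hypothetical helper: the order person, number, gender is an assumption
    based on how triples like '3SF' and '3PM' appear in the sample table.
    """
    if len(triple) != 3:
        raise ValueError("expected exactly three characters, got %r" % triple)
    labels = []
    for code, table in zip(triple, (PERSON, NUMBER, GENDER)):
        if code == BLANK:
            continue  # slot left unspecified
        labels.append(table.get(code, "unknown (%s)" % code))
    return labels


print(decode_png("3SF"))
print(decode_png(BLANK + "PF"))
```

Keeping the three lookup tables separate avoids the ambiguity in the legend, where `P` means perfective in one column and plural in another, and `M` means imperative in one and masculine in another.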
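تعددت وتنوعت الأزمات التي خلفتها الحرب في اليمن وأزمة الانقطاع الكامل لخدمة الكهرباء ضاعفت من معاناة سكان هذه البلاد ودفعتهم نحو مصادر الطاقة البديلة للتخفيف من آثار تلك الأزمة
(English: "The crises left behind by the war in Yemen have multiplied and diversified, and the crisis of the complete outage of electricity service has compounded the suffering of this country's inhabitants and pushed them toward alternative energy sources to mitigate the effects of that crisis.")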
KATS: |
toddt wtnwot Al!zmAt Alty KlfthA AlHrb fy Alymn w!zm: AlAnqTAo AlkAml lKdm: AlkhrbA' DAoft mn moAnA: skAn hch AlblAd wdfothm nHw mSAdr AlTAq: Albdyl: lltKfyf mn |xAr tlk Al!zm: |
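The Arabic sentence and its KATS form above are enough to infer part of the character mapping. The following is a minimal sketch under that assumption: the table `ARABIC_TO_KATS` and the function `to_kats` are hypothetical names, the mapping covers only the letters that occur in this sample, and it is not the official Kalmasoft transliteration table. Characters outside the inferred mapping are passed through unchanged.

```python
import unicodedata

# Partial Arabic-to-KATS mapping inferred ONLY from the sample sentence above.
# This is an assumption for illustration, not the official KATS table.
ARABIC_TO_KATS = {
    "\u0621": "'",  # hamza
    "\u0622": "|",  # alef with madda above
    "\u0623": "!",  # alef with hamza above
    "\u0627": "A",  # alef
    "\u0628": "b",  # beh
    "\u0629": ":",  # teh marbuta
    "\u062a": "t",  # teh
    "\u062b": "x",  # theh
    "\u062d": "H",  # hah
    "\u062e": "K",  # khah
    "\u062f": "d",  # dal
    "\u0630": "c",  # thal
    "\u0631": "r",  # reh
    "\u0632": "z",  # zain
    "\u0633": "s",  # seen
    "\u0635": "S",  # sad
    "\u0636": "D",  # dad
    "\u0637": "T",  # tah
    "\u0639": "o",  # ain
    "\u0641": "f",  # feh
    "\u0642": "q",  # qaf
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
    "\u0645": "m",  # meem
    "\u0646": "n",  # noon
    "\u0647": "h",  # heh
    "\u0648": "w",  # waw
    "\u064a": "y",  # yeh
}


def to_kats(text):
    """Transliterate Arabic text using the partial mapping above.

    Input is NFC-normalized first so that decomposed forms (for example
    alef followed by a combining madda) match the precomposed keys.
    Unmapped characters, including spaces, are returned unchanged.
    """
    text = unicodedata.normalize("NFC", text)
    return "".join(ARABIC_TO_KATS.get(ch, ch) for ch in text)


# First token of the sample sentence.
print(to_kats("\u062a\u0639\u062f\u062f\u062a"))  # toddt
```

Note that the KATS form of alef with madda is a literal `|`, which collides with the column delimiter if the transliteration is later written into a pipe-delimited table or a naive CSV-like format; such values need escaping or quoting.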
ID | Token | KATS | Syntax | Arguments | Prefix | Suffix | Gloss* |
---|---|---|---|---|---|---|---|
1 | تعددت | toddt | VPIA | 3PF••• | | | |
2 | وتنوعت | wtnwot | VPIA | 053PF••• | PC | | |
3 | الأزمات | Al!zmAt | NNG | ••••PF | PD | | |
4 | التي | Alty | PL | | | | |
5 | خلفتها | KlfthA | VPIA | 023SF3SF | | | |
6 | الحرب | AlHrb | NNN | ••••S• | PD | | |
7 | في | fy | PP | | | | |
8 | اليمن | Alymn | NN•G | | | | |
9 | وأزمة | w!zm: | NF•G | ••••SF | PC | | |
10 | الانقطاع | AlAnqTAo | NF•G | 07•SM | PD | | |
11 | الكامل | AlkAml | NA•G | ••••SM | PD | | |
12 | لخدمة | lKdm: | NF•G | ••••SF | | | |
13 | الكهرباء | AlkhrbA' | N••G | | PD | | |
14 | ضاعفت | DAoft | VPIA | 033SF••• | | | |
15 | من | mn | PP | | | | |
16 | معاناة | moAnA: | NF•G | 03•SF | | | |
17 | سكان | skAn | NQ•G | ••••BM | | | |
18 | هذه | hch | PD | | | | |
19 | البلاد | AlblAd | N••G | | PD | | |
20 | ودفعتهم | wdfothm | VPIA | 013SF3PM | PC | | |
21 | نحو | nHw | NV | | | | |
22 | مصادر | mSAdr | NF•A | ••••PM | | | |
23 | الطاقة | AlTAq: | N••G | ••••SF | PD | | |
24 | البديلة | Albdyl: | NA•G | ••••SF | PD | | |
25 | للتخفيف | lltKfyf | NF•G | | PP | | |
26 | من | mn | PP | | | | |
27 | آثار | \|xAr | NF•G | ••••B• | | | |
28 | تلك | tlk | PD | | | | |
29 | الأزمة | Al!zm: | NF•G | ••••SF | | | |
(*) Glosses are for reference only; the actual module outputs a simplified gloss (stems only). (**) A larger XML output sample is also available.
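To make the point about quantitative analysis concrete, here is a minimal sketch that counts Syntax tags and Prefix codes over the first seven rows of the sample table. The `TaggedToken` record and its field names are assumptions that mirror the table columns; the schema of the tagger's real XML, JSON, CSV, or SQL output is not shown on this page and may differ.

```python
from collections import Counter, namedtuple

# Hypothetical record type mirroring the sample table's columns.
TaggedToken = namedtuple("TaggedToken", "id kats syntax arguments prefix")

# First seven rows of the sample table (KATS form used as the token).
ROWS = [
    TaggedToken(1, "toddt", "VPIA", "3PF\u2022\u2022\u2022", ""),
    TaggedToken(2, "wtnwot", "VPIA", "053PF\u2022\u2022\u2022", "PC"),
    TaggedToken(3, "Al!zmAt", "NNG", "\u2022\u2022\u2022\u2022PF", "PD"),
    TaggedToken(4, "Alty", "PL", "", ""),
    TaggedToken(5, "KlfthA", "VPIA", "023SF3SF", ""),
    TaggedToken(6, "AlHrb", "NNN", "\u2022\u2022\u2022\u2022S\u2022", "PD"),
    TaggedToken(7, "fy", "PP", "", ""),
]


def tag_counts(rows, field):
    """Count non-empty values of one column across the tagged rows."""
    return Counter(getattr(row, field) for row in rows if getattr(row, field))


syntax_counts = tag_counts(ROWS, "syntax")
prefix_counts = tag_counts(ROWS, "prefix")

for tag, count in syntax_counts.most_common():
    print(tag, count)
```

The same counting step applies unchanged once the rows are loaded from whichever export format is used, provided the loader fills the same fields.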