Principles for the Manual Revision of Morphological Annotation in Talbanken

The new morphological annotation was manually checked against the SUC tagging guidelines (Ejerhed et al., 1992) and against the way the tags are used in the SUC corpus. Two approaches were used: i) checking of specific word types (a type-based transversal revision) and ii) checking of parts of speech (a part-of-speech-based transversal revision). The first approach was used for frequent function words, which are often ambiguous between different parts of speech. The second approach was used for all remaining words. In both cases, the MAMBA layer of lexical annotation served as an aid. Where the new SUC tag and the original MAMBA tag agreed on the same analysis, the analysis was not checked further. Where the two tagsets disagreed, the analysis was investigated further.
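The agreement filter described above can be sketched in Python. This is a minimal illustration, not the procedure actually used for Talbanken. It assumes that each token carries a SUC tag whose part-of-speech code comes first (separated from the features by `.`, `|` or a space) and a two-letter MAMBA part-of-speech code. The `Token` type, the `MAMBA_TO_SUC` mapping and the function names are all hypothetical. Because the tagsets differ, agreement can only be tested after mapping one onto the other, and here the comparison is made at the coarse part-of-speech level only.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token record holding both lexical annotation layers."""
    form: str       # word form as it appears in the text
    suc_tag: str    # new SUC tag, part-of-speech code first (assumed format)
    mamba_tag: str  # original MAMBA part-of-speech code (assumed two-letter code)


# Hypothetical, partial mapping from MAMBA part-of-speech codes to coarse
# SUC part-of-speech codes; the real correspondence between the tagsets is richer.
MAMBA_TO_SUC = {
    "NN": "NN",  # noun
    "VV": "VB",  # verb
    "AJ": "JJ",  # adjective
    "PO": "PN",  # pronoun
    "AB": "AB",  # adverb
    "PR": "PP",  # preposition
}


def suc_pos(tag: str) -> str:
    """Return the part-of-speech code at the start of a SUC tag."""
    return re.split(r"[.| ]", tag, maxsplit=1)[0]


def agree(tok: Token) -> bool:
    """True if the MAMBA tag maps to the same part of speech as the SUC tag."""
    return MAMBA_TO_SUC.get(tok.mamba_tag) == suc_pos(tok.suc_tag)


def needs_review(tokens: list[Token]) -> list[Token]:
    """Agreeing tokens are accepted unchecked; only disagreements are returned."""
    return [t for t in tokens if not agree(t)]


tokens = [
    Token("hus", "NN.NEU.SIN.IND.NOM", "NN"),  # noun in both layers: accepted
    Token("andra", "JJ", "PO"),                # adjective vs pronoun: flagged
]
print([t.form for t in needs_review(tokens)])  # → ['andra']
```

A MAMBA code that is missing from the mapping never counts as agreement, so such tokens are always flagged for review rather than silently accepted.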

Thus, in the first approach, a single word type was checked and corrected across all parts of speech in which it occurred, and within each part of speech the SUC tags were checked for consistency with the MAMBA tags. In the second approach, which was applied to less frequent words, a list of all words within a given part of speech, together with their SUC tags and MAMBA tags, was compiled and checked. Whenever the SUC tag and the MAMBA tag disagreed, the token was investigated further.
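The two traversal orders differ mainly in how the same (word, SUC tag, MAMBA tag) triples are grouped for inspection. The sketch below groups the occurrences of selected word types by SUC part of speech (approach i) and lists the remaining words of one part of speech with their tag pairs (approach ii). It assumes a hypothetical `Token` record with a word form, a SUC tag whose part-of-speech code comes first, and a MAMBA tag. All names are illustrative. The text does not say how the frequent function words were selected, so the `covered` set is simply passed in by the caller.

```python
import re
from collections import Counter, defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token record: word form, SUC tag, MAMBA tag."""
    form: str
    suc_tag: str
    mamba_tag: str


def suc_pos(tag: str) -> str:
    """Part-of-speech code at the start of a SUC tag (assumed tag format)."""
    return re.split(r"[.| ]", tag, maxsplit=1)[0]


def type_based_view(tokens, word_types):
    """Approach i): for each selected word type, group its occurrences by SUC
    part of speech and count the (SUC tag, MAMBA tag) pairs in each group."""
    view = {w: defaultdict(Counter) for w in word_types}
    for t in tokens:
        w = t.form.lower()
        if w in view:
            view[w][suc_pos(t.suc_tag)][(t.suc_tag, t.mamba_tag)] += 1
    return view


def pos_based_list(tokens, pos, covered):
    """Approach ii): list each (word, SUC tag, MAMBA tag) triple with its count
    for one SUC part of speech, skipping word types already covered by i)."""
    counts = Counter(
        (t.form.lower(), t.suc_tag, t.mamba_tag)
        for t in tokens
        if suc_pos(t.suc_tag) == pos and t.form.lower() not in covered
    )
    return sorted(counts.items())


tokens = [
    Token("andra", "JJ", "PO"),
    Token("andra", "PN", "PO"),
    Token("hus", "NN", "NN"),
    Token("Hus", "NN", "NN"),
]
covered = {"andra"}  # hypothetical set of word types handled by approach i)
print(dict(type_based_view(tokens, covered)["andra"]))
print(pos_based_list(tokens, "NN", covered))  # → [(('hus', 'NN', 'NN'), 2)]
```

Counting the tag pairs, rather than listing every token, keeps each list short: a reviewer sees one line per distinct analysis and can open the individual tokens only where a pair looks suspicious.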

A disagreement between the SUC tag and the MAMBA tag could indicate an error, and this was always the case for pairs such as noun versus verb. However, a disagreement could also mean that the two tagsets had chosen different analyses. Examples of this are tokens such as andra ("other") and flera ("more, several"). According to the SUC scheme, these tokens can appear as either adjectives or pronouns in positions where the MAMBA scheme allows only a pronoun analysis. In such cases, we always followed the SUC scheme.
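The decision rule for disagreements can be summarized as a small classification function. This is a hedged sketch. It assumes that the MAMBA tag has already been mapped to a coarse SUC part-of-speech code. The two pair lists are hypothetical and seeded only with the examples named above: noun versus verb, and adjective versus pronoun for andra and flera. Any disagreement outside both lists falls through to manual investigation.

```python
# Hypothetical, partial lists of coarse SUC part-of-speech pairs, seeded only
# with the examples named in the text; a real revision would need fuller lists.
ALWAYS_ERROR = {frozenset({"NN", "VB"})}        # e.g. noun vs verb
SCHEME_DIFFERENCE = {frozenset({"JJ", "PN"})}   # e.g. andra, flera


def classify(suc: str, mamba: str) -> str:
    """Decide how to treat a token, given its SUC part of speech and its MAMBA
    tag already mapped (by assumption) to a coarse SUC part-of-speech code."""
    if suc == mamba:
        return "agree"            # accepted without further checking
    pair = frozenset({suc, mamba})
    if pair in ALWAYS_ERROR:
        return "error"            # disagreement known to signal an error
    if pair in SCHEME_DIFFERENCE:
        return "follow_suc"       # tagsets differ by design: use the SUC scheme
    return "investigate"          # unknown kind of disagreement: check by hand


for suc, mamba in [("NN", "NN"), ("NN", "VB"), ("JJ", "PN"), ("AB", "PP")]:
    print(suc, mamba, classify(suc, mamba))
```

The pairs are unordered because, in this sketch, the direction of the disagreement does not change the decision.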

In the Talbanken part of The Swedish Treebank, both the SUC tagging and the MAMBA tagging are present at the lexical level.