Flags for Deviant Structures

The table below lists the flags used to mark nodes whose annotation is probably incorrect as a result of parsing errors, together with the frequency of each flag in the SUC part of the Swedish Treebank. The flags are not mutually exclusive, and a single annotation error often triggers more than one flag. In total, 30,588 sentences (about 41% of the 74,243 sentences) have at least one flag, while 43,655 sentences have none.

During the manual revision of the gold standard section of SUC, we observed that a correctly annotated sentence never receives a flag. Unfortunately, the converse does not hold: a sentence without flags is not necessarily correct. Even so, the absence of flags usually indicates that the sentence contains only minor annotation errors.

| Flag | Frequency | Explanation | Examples |
|---|---|---|---|
| Unary | 2764 | Unary branching nonterminal node with a nonterminal child. Permitted exception: ROOT with XP child. | |
| Nonterminal | 2353 | Node with (probably) incorrect phrase label. | Phrase label is ?? (unknown). Phrase label is XP but the phrase has the typical structure of a more specific phrase (e.g., PR+PA for PP). |
| Function | 7318 | Node with incorrect function label. | Function label is ??. |
| ForbiddenFunction | 7383 | Node with function label that does not occur with this phrase type in Talbanken. | Preposition with function head (HD) instead of prepositional (PR). |
| ForbiddenChild | 13535 | Node with child whose function label does not occur under this phrase type in Talbanken. | Subject (SS) under noun phrase (NP). Functions other than MS and punctuation under ROOT. |
| ForbiddenSibling | 27796 | Node with function label that is incompatible with the function label of a sibling node. | Multiple occurrences of IV, SS, OO inside the same phrase. Formal subject (FS) together with ordinary subject (SS). |
| ObligatoryChild | 17942 | Node whose phrase label requires a child with a specific function label but no such child exists. | A noun phrase (NP) without either a head (HD) or at least one conjunct (CJ). |
| ObligatorySibling | 592 | Node whose function label requires a sibling with a specific function label but no such sibling exists. | A logical subject (ES) without a formal subject (FS). |
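
Several of the flag definitions above can be read as simple checks on a node and its children. The following Python sketch illustrates four of them (Unary, ForbiddenSibling, ObligatoryChild, ObligatorySibling), using only the examples given in the table. It is not the procedure used to produce the treebank flags: the Node class, the function names and the toy tree are hypothetical, "multiple occurrences of IV, SS, OO" is assumed to mean the same label occurring more than once, and sibling-based flags are detected while visiting the parent and reported per sentence rather than per node.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; a node without children is a terminal."""
    label: str       # phrase label (e.g. "NP") or part-of-speech tag
    function: str    # function label (e.g. "SS", "HD")
    children: list = field(default_factory=list)


# Assumption: each of these function labels may occur at most once per phrase.
UNIQUE_FUNCTIONS = {"IV", "SS", "OO"}


def flags_for(node):
    """Return the flags triggered at one nonterminal node (partial sketch)."""
    flags = set()
    kids = node.children
    funcs = Counter(child.function for child in kids)

    # Unary: a single child that is itself a nonterminal.
    # Permitted exception: ROOT with an XP child.
    if len(kids) == 1 and kids[0].children:
        if not (node.label == "ROOT" and kids[0].label == "XP"):
            flags.add("Unary")

    # ForbiddenSibling: repeated IV/SS/OO, or FS together with SS.
    repeated = any(funcs[f] > 1 for f in UNIQUE_FUNCTIONS)
    if repeated or (funcs["FS"] and funcs["SS"]):
        flags.add("ForbiddenSibling")

    # ObligatoryChild: an NP needs a head (HD) or at least one conjunct (CJ).
    if node.label == "NP" and not (funcs["HD"] or funcs["CJ"]):
        flags.add("ObligatoryChild")

    # ObligatorySibling: a logical subject (ES) needs a formal subject (FS).
    if funcs["ES"] and not funcs["FS"]:
        flags.add("ObligatorySibling")

    return flags


def sentence_flags(root):
    """Collect the flags of every nonterminal node in a sentence tree."""
    flags = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            flags |= flags_for(node)
            stack.extend(node.children)
    return flags


# Toy tree (labels are illustrative): an NP without HD or CJ,
# inside a clause that contains two subjects (SS).
np = Node("NP", "SS", [Node("DT", "DT")])
clause = Node("S", "MS", [np, Node("PN", "SS"), Node("VB", "FV")])
root = Node("ROOT", "--", [clause, Node(".", "IP")])
print(sorted(sentence_flags(root)))
```

For the toy tree this prints ['ForbiddenSibling', 'ObligatoryChild'], which also shows how a single sentence can carry more than one flag. The remaining flags (Nonterminal, Function, ForbiddenFunction, ForbiddenChild) depend on label inventories and co-occurrence statistics from Talbanken and are left out of the sketch.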