<sentence id="2" user="malt" date="">
  <word id="1" form="Genom" postag="pp" head="3" deprel="ADV"/>
  <word id="2" form="skattereformen" postag="nn.utr.sin.def.nom" head="1" deprel="PR"/>
  <word id="3" form="införs" postag="vb.prs.sfo" head="0" deprel="ROOT"/>
  <word id="4" form="individuell" postag="jj.pos.utr.sin.ind.nom" head="5" deprel="ATT"/>
  <word id="5" form="beskattning" postag="nn.utr.sin.ind.nom" head="3" deprel="SUB"/>
  <word id="6" form="(" postag="pad" head="5" deprel="IP"/>
  <word id="7" form="särbeskattning" postag="nn.utr.sin.ind.nom" head="5" deprel="APP"/>
  <word id="8" form=")" postag="pad" head="5" deprel="IP"/>
  <word id="9" form="av" postag="pp" head="5" deprel="ATT"/>
  <word id="10" form="arbetsinkomster" postag="nn.utr.plu.ind.nom" head="9" deprel="PR"/>
  <word id="11" form="." postag="mad" head="3" deprel="IP"/>
</sentence>

The tagsets used for parts-of-speech and dependency relations must be specified in the header of the XML document. An example document can be found here. An XML schema for Malt-XML treebanks can be found here.
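As a rough illustration of how a Malt-XML sentence might be read programmatically, the following minimal sketch uses Python's standard `xml.etree.ElementTree` module. The function names `read_sentence` and `root_forms` are hypothetical helpers introduced here, not part of MaltParser, and the embedded XML is an abbreviated three-word excerpt of the sentence above. It assumes that `head="0"` marks the root, as in the example.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt of the example sentence (first three words only).
SENTENCE_XML = """
<sentence id="2" user="malt" date="">
  <word id="1" form="Genom" postag="pp" head="3" deprel="ADV"/>
  <word id="2" form="skattereformen" postag="nn.utr.sin.def.nom" head="1" deprel="PR"/>
  <word id="3" form="införs" postag="vb.prs.sfo" head="0" deprel="ROOT"/>
</sentence>
"""


def read_sentence(xml_text):
    """Return (id, form, postag, head, deprel) tuples for one <sentence> element."""
    sentence = ET.fromstring(xml_text.strip())
    words = []
    for w in sentence.findall("word"):
        words.append((
            int(w.get("id")),
            w.get("form"),
            w.get("postag"),
            int(w.get("head")),
            w.get("deprel"),
        ))
    return words


def root_forms(words):
    """Return the forms of all words whose head is 0 (assumed to mean 'root')."""
    return [form for (_id, form, _postag, head, _deprel) in words if head == 0]


words = read_sentence(SENTENCE_XML)
roots = root_forms(words)
```

This sketch reads a single `<sentence>` element only; it does not handle the document header in which the tagsets are declared.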
Malt-TAB is a text-based representation used mainly by MaltParser. It contains a subset of the features in Malt-XML, and attributes are defined implicitly by their position. Each word is represented on one line, with attribute values separated by tabs. The required order of attributes is as follows:
form (required) < postag (required) < head (optional) < deprel (optional)
Although head and deprel are optional, they must either both be included or both be omitted. (Normally, all four columns are present in the input when training the parser and in the output when parsing, while only form and postag are present in the input when parsing.) Please note also that the id attribute is not represented explicitly at all. Words in a sentence are separated by one newline; sentences are separated by one additional newline. A dependency tree for the Swedish sentence "Genom skattereformen införs individuell beskattning (särbeskattning) av arbetsinkomster." can be represented as follows:

Genom	pp	3	ADV
skattereformen	nn.utr.sin.def.nom	1	PR
införs	vb.prs.sfo	0	ROOT
individuell	jj.pos.utr.sin.ind.nom	5	ATT
beskattning	nn.utr.sin.ind.nom	3	SUB
(	pad	5	IP
särbeskattning	nn.utr.sin.ind.nom	5	APP
)	pad	5	IP
av	pp	5	ATT
arbetsinkomster	nn.utr.plu.ind.nom	9	PR
.	mad	3	IP

An example document can be found here.
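The Malt-TAB layout described above (one word per line, tab-separated columns, a blank line between sentences) can be read with a few lines of Python. The following is a minimal sketch, not MaltParser's own reader: `parse_malt_tab` is a hypothetical helper name, and numbering words from 1 by position is an assumption based on the head values in the example, since the id attribute is not stored in the file.

```python
def parse_malt_tab(text):
    """Parse Malt-TAB text into a list of sentences.

    Each sentence is a list of dicts with the keys id, form, postag and,
    when the optional columns are present, head and deprel.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        # head and deprel must be both present or both absent.
        if len(cols) not in (2, 4):
            raise ValueError(f"expected 2 or 4 columns, got {len(cols)}: {line!r}")
        # Assumption: ids are implicit and count from 1 within each sentence.
        word = {"id": len(current) + 1, "form": cols[0], "postag": cols[1]}
        if len(cols) == 4:
            word["head"] = int(cols[2])
            word["deprel"] = cols[3]
        current.append(word)
    if current:
        sentences.append(current)
    return sentences


# First three words of the example sentence, followed by the sentence separator.
EXAMPLE = (
    "Genom\tpp\t3\tADV\n"
    "skattereformen\tnn.utr.sin.def.nom\t1\tPR\n"
    "införs\tvb.prs.sfo\t0\tROOT\n"
    "\n"
)

parsed = parse_malt_tab(EXAMPLE)
```

Rejecting lines with three columns mirrors the rule that head and deprel are either both included or both omitted; a stricter reader might also check that every line in a file uses the same number of columns.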