Knowledge bases are located in the resources/ directory at the root of bilbo2. They are split into three groups: corpora, lists, and models.
Corpora are used to train bilbo's automatic annotation. They consist of annotated data for supervised machine learning algorithms. XML/TEI corpora are available in four languages (pt, fr, de, en) for bibliographic references. Only a mixed French and English corpus is available for footnotes.
Lists can be simple (single words) or multiword; you must specify the list type in the options:
- Authors (fullname, surname, forename).
- Abbreviations (month, page, editor).
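
To illustrate why the list type matters, here is a minimal sketch of how a simple list and a multiword list might be matched against a token sequence. The function names, the `max_len` parameter, and the sample entries are hypothetical and are not taken from bilbo2's code; they only show the difference between the two lookup strategies.

```python
def tag_simple(tokens, lexicon):
    """Flag each token that appears in a single-word list (a set of lowercased words)."""
    return [tok.lower() in lexicon for tok in tokens]


def tag_multiword(tokens, entries, max_len=4):
    """Flag tokens covered by a multiword entry (a set of tuples of lowercased words)."""
    flags = [False] * len(tokens)
    lowered = [tok.lower() for tok in tokens]
    for start in range(len(lowered)):
        # Try the longest window first so that longer entries win.
        for size in range(min(max_len, len(lowered) - start), 0, -1):
            if tuple(lowered[start:start + size]) in entries:
                for i in range(start, start + size):
                    flags[i] = True
                break
    return flags


# Hypothetical sample data, for illustration only.
tokens = ["Victor", "Hugo", ",", "ed."]
simple_flags = tag_simple(tokens, {"ed."})
multi_flags = tag_multiword(tokens, {("victor", "hugo")})
```

A simple list only needs a per-token membership test, while a multiword list has to look at windows of consecutive tokens, which is presumably why the type has to be declared in the options.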
Models are split into two directories (bibl and note). They contain feature template patterns (CRF++ format); see the documentation. Note that we use CRF++ templates with CRFsuite for convenience. A script handles the conversion between the two input formats.
For notes, we use both a CRF model and an SVM model. For the SVM model, see the installation and data format documentation in the libsvm README.
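
The idea behind such a conversion can be sketched as follows. This is not bilbo2's actual script: the function names are hypothetical, only unigram (`U...`) template lines are handled, and the boundary placeholders (`_B-1`, `_B+1`) are an assumption modeled on CRF++ conventions. Each `%x[row,col]` macro is expanded relative to the current token, and the result is written as one tab-separated CRFsuite item (label first, then attributes), with colons escaped because CRFsuite reserves them for attribute weights.

```python
import re

# Matches CRF++ macros such as %x[-1,0] (relative row, absolute column).
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_template(template_line, rows, pos):
    """Expand one CRF++ template line at token position `pos`.

    `rows` is a list of token rows, each a list of column values.
    """
    def substitute(match):
        row = pos + int(match.group(1))
        col = int(match.group(2))
        if row < 0:
            return "_B%d" % row                      # before the sequence start
        if row >= len(rows):
            return "_B+%d" % (row - len(rows) + 1)   # past the sequence end
        return rows[row][col]
    return MACRO.sub(substitute, template_line)


def to_crfsuite_line(label, templates, rows, pos):
    """Build one CRFsuite item line from the unigram templates."""
    attrs = []
    for line in templates:
        line = line.strip()
        if not line.startswith("U"):
            continue  # skip blanks, comments, and bigram (B) templates
        feature = expand_template(line, rows, pos)
        # Escape backslashes and colons for the CRFsuite data format.
        attrs.append(feature.replace("\\", "\\\\").replace(":", "\\:"))
    return "\t".join([label] + attrs)


# Hypothetical sample data: column 0 is the token, column 1 a shape feature.
rows = [["Hugo", "UPPER"], [",", "PUNC"], ["1862", "NUM"]]
templates = ["# Unigram", "U00:%x[0,0]", "U01:%x[-1,0]/%x[0,1]", "B"]
first_item = to_crfsuite_line("author", templates, rows, 0)
```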
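
As a quick reminder of the data format described in the libsvm README, each example is one line of the form `<label> <index>:<value> ...`, with indices in ascending order and zero-valued features usually omitted. The helper below is a minimal sketch; its name and the sample feature values are hypothetical and are not part of bilbo2.

```python
def to_libsvm_line(label, features):
    """Format one example; `features` maps a 1-based feature index to its value."""
    pairs = [
        "%d:%g" % (index, value)
        for index, value in sorted(features.items())  # ascending indices
        if value != 0                                  # sparse: drop zeros
    ]
    return " ".join([str(label)] + pairs)


# Hypothetical feature vector for one note.
line = to_libsvm_line(1, {3: 1.0, 1: 0.5, 7: 0})
```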