Annotator bilbo usage¶

For bibliography (Standard tagging)¶

imp = Importer('resources/corpus/bibl/test_bibl.xml')
doc = imp.parse_xml('bibl')

Bilbo.load('bibl')

bilbo = Bilbo(doc)
bilbo.run_pipeline('tag', '/tmp/output.xml', format_= None)

For bibliography (With Lang Detection tagging)¶

imp = Importer('resources/corpus/bibl/test_bibl.xml')
doc = imp.parse_xml('bibl')

Bilbo.load('bibl_lang')

bilbo = Bilbo(doc')
bilbo.run_pipeline('tag', '/tmp/output.xml', format_= None)

For note¶

imp = Importer('resources/corpus/note/test_note.xml')
doc = imp.parse_xml('note')
bilbo = Bilbo(doc, 'pipeline_note.cfg')
bilbo.run_pipeline('tag', '/tmp/output.xml', format_= None)

Train¶

Just modify tag parameter to train parameter!! Note: output could be some binaries constructed model (They must be specified in pipeline_bibl.cfg not as parameters in run_pipeline() function.

Evaluation (end to end)¶

For evaluate the models just launch bilbo on your datatest annotated as:

imp = Importer('resources/corpus/bibl/data_test.xml')
doc = imp.parse_xml('bibl')
bilbo = Bilbo(doc, 'pipeline_bibl.cfg')
bilbo.run_pipeline('evaluate', None, None)

-----------------------------------------------------------
         label  precision     rappel  f-measure  occurences
-----------------------------------------------------------
          abbr      0.874      0.765      0.816        452
     biblScope      0.887      0.571      0.695        594
     booktitle      0.903      0.629      0.742         89
          date      0.716      0.915      0.803        614
       edition      0.690      0.460      0.552        126
          emph      1.000      1.000      1.000          2
        extent      1.000      0.979      0.989         48
      forename      0.929      0.956      0.942        942
       genName      1.000      1.000      1.000          1
       journal      0.823      0.732      0.774        514
      nameLink      0.282      1.000      0.440         11
       orgName      0.902      0.836      0.868        110
         place      0.824      0.933      0.875         15
      pubPlace      0.962      0.934      0.948        379
     publisher      0.936      0.732      0.821        920
           ref      1.000      0.071      0.133         14
       surname      0.937      0.934      0.936        823
         title      0.868      0.889      0.879       5740
-----------------------------------------------------------
          mean      0.863      0.797      0.828      11394
 weighted mean      0.877      0.852      0.864      11394
-----------------------------------------------------------

Evaluation by component¶

You can evaluate each component. In this case we use bilbo as toolkit usage. Load your annotated data : data format annotated is depended of component used. You have to always generate this data first. And just launch (for svm for instance)

svm.evaluate(input_svm_data_format)