bilbo.tokenizers package
Submodules
bilbo.tokenizers.en module

class bilbo.tokenizers.en.EnglishTokenizer
   Bases: bilbo.tokenizers.tokenizers.DefaultTokenizer

   tokenize(option)

bilbo.tokenizers.fr module

class bilbo.tokenizers.fr.FrenchTokenizer
   Bases: bilbo.tokenizers.tokenizers.DefaultTokenizer

   tokenize(text)
      Tokenize the given sentence and return a list of tokens. This is a two-step process:
      1. tokenize the text on punctuation marks;
      2. merge over-tokenized units using the lexicon or a regex (for compounds, '^[A-Z][a-z]+-[A-Z][a-z]+$').

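The two-step process described for FrenchTokenizer.tokenize can be illustrated with a minimal sketch. This is not bilbo's implementation: the function names, the LEXICON contents, and the max_len window are all assumptions made for illustration; only the compound regex is taken from the description above.

```python
import re

# Compound pattern quoted in the FrenchTokenizer.tokenize description.
COMPOUND_RE = re.compile(r"^[A-Z][a-z]+-[A-Z][a-z]+$")

# Hypothetical lexicon of units that must stay whole (not bilbo's data).
LEXICON = {"aujourd'hui"}


def split_on_punctuation(text):
    """Step 1: return (start, end) spans for words and single punctuation marks."""
    return [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]


def merge_units(text, spans, max_len=5):
    """Step 2: re-join adjacent spans that form a lexicon entry or a compound."""
    tokens = []
    i = 0
    while i < len(spans):
        # Try the longest window first; a window of one span needs no merge.
        for size in range(min(max_len, len(spans) - i), 1, -1):
            candidate = text[spans[i][0]:spans[i + size - 1][1]]
            if candidate in LEXICON or COMPOUND_RE.match(candidate):
                tokens.append(candidate)
                i += size
                break
        else:
            # No merge applied: keep the single over-tokenized unit as is.
            tokens.append(text[spans[i][0]:spans[i][1]])
            i += 1
    return tokens


def tokenize_sketch(text):
    """Run both steps and return the list of tokens."""
    return merge_units(text, split_on_punctuation(text))
```

Because candidates are sliced from the original text, any whitespace between units is preserved in the candidate, so "Jean - Pierre" is not merged while "Jean-Pierre" is.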
bilbo.tokenizers.tokenizers module
Tokenizer module

class bilbo.tokenizers.tokenizers.DefaultTokenizer
   Bases: object

   lexicon = None
      The dictionary containing the lexicon.

   loadlist(path)
      Load a resource list and generate the corresponding regexp part.

   resources = None
      The path of the resources folder.

   tokenize(text)

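The description of loadlist(path) suggests turning a list file into a regex alternation. The sketch below shows one way this could work; it is an assumption, not the library's code. The one-entry-per-line file format, the standalone function name loadlist_sketch, and the non-capturing group in the return value are all hypothetical.

```python
import re


def loadlist_sketch(path):
    """Read one entry per line and return a regex part such as '(?:a|b)'."""
    with open(path, encoding="utf-8") as handle:
        # Assumed format: one entry per line, blank lines ignored.
        entries = [line.strip() for line in handle if line.strip()]
    # Longest entries first so a short entry cannot shadow a longer one.
    entries.sort(key=len, reverse=True)
    # Escape each entry so characters like '.' are matched literally.
    return "(?:" + "|".join(re.escape(entry) for entry in entries) + ")"
```

The returned string is meant to be embedded in a larger pattern, for example to keep listed abbreviations from being split at their final period.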
class bilbo.tokenizers.tokenizers.Tokenizer
   Bases: object

   Tokenizer class: tokenizes a given string.

Module contents
Tokenizers modules