bilbo.tokenizers package¶
Submodules¶
bilbo.tokenizers.en module¶
class bilbo.tokenizers.en.EnglishTokenizer¶
Bases: bilbo.tokenizers.tokenizers.DefaultTokenizer

tokenize(option)¶
bilbo.tokenizers.fr module¶
class bilbo.tokenizers.fr.FrenchTokenizer¶
Bases: bilbo.tokenizers.tokenizers.DefaultTokenizer

tokenize(text)¶
Tokenize the sentence given as parameter and return a list of tokens. This is a two-step process: 1. tokenize the text using punctuation marks; 2. merge over-tokenized units using the lexicon or a regex for compounds (‘^[A-Z][a-z]+-[A-Z][a-z]+$’).
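The two-step process described above can be sketched as follows. This is a minimal illustration, not bilbo's actual implementation: the punctuation split, the merge loop, and the sample lexicon are simplifying assumptions; only the compound regex is taken from the docstring.

```python
import re

# Compound pattern quoted in the docstring above.
COMPOUND = re.compile(r'^[A-Z][a-z]+-[A-Z][a-z]+$')

def tokenize(text, lexicon=frozenset()):
    """Two-step tokenization sketch: split on punctuation, then re-merge."""
    # Step 1: split off punctuation marks as separate tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text, re.UNICODE)
    # Step 2: merge over-tokenized units (e.g. "Jean" "-" "Pierre")
    # when the merged form is in the lexicon or matches the regex.
    merged = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] == "-":
            candidate = "".join(tokens[i:i + 3])
            if candidate in lexicon or COMPOUND.match(candidate):
                merged.append(candidate)
                i += 3
                continue
        merged.append(tokens[i])
        i += 1
    return merged
```

For example, `tokenize("Jean-Pierre arrive.")` re-merges the compound and returns `["Jean-Pierre", "arrive", "."]`, while a lowercase compound such as `bien-aimé` stays split unless it appears in the lexicon.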
bilbo.tokenizers.tokenizers module¶
Tokenizer module.

class bilbo.tokenizers.tokenizers.DefaultTokenizer¶
Bases: object

lexicon = None¶
The dictionary containing the lexicon.

loadlist(path)¶
Load a resource list and generate the corresponding regexp part.
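One plausible reading of loadlist, sketched below under stated assumptions: the resource file holds one entry per line (bilbo's actual resource format may differ), and the "regexp part" is an alternation of the escaped entries that can be embedded in a larger tokenization pattern.

```python
import re

def loadlist(path):
    """Load a resource list (assumed one entry per line) and return the
    corresponding regexp part: a non-capturing alternation group.

    Hypothetical reconstruction; the real file format used by bilbo's
    resources folder is not shown in this documentation.
    """
    with open(path, encoding="utf-8") as handle:
        entries = [line.strip() for line in handle if line.strip()]
    # Escape each entry so punctuation in abbreviations (e.g. "etc.")
    # is matched literally. Sort longest-first because regex alternation
    # is leftmost-first: longer entries must win over their prefixes.
    parts = "|".join(re.escape(e) for e in sorted(entries, key=len, reverse=True))
    return "(?:%s)" % parts
```

Sorting longest-first is a common defensive choice when building alternations, since `(?:M.|M.A.)` would otherwise never match the longer form.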
resources = None¶
The path of the resources folder.

tokenize(text)¶
class bilbo.tokenizers.tokenizers.Tokenizer¶
Bases: object

Tokenizer class that tokenizes a given string.
Module contents¶
Tokenizers modules.