text_quality.settings

Global settings.

Module Contents

text_quality.settings.MINIMUM_PAGE_LENGTH: int = 5

Texts shorter than this are considered empty.

text_quality.settings.EMPTY_PAGE_OUTPUT: int | None = 0

Output value for empty pages. If None, empty pages are handled through the standard pipeline.

text_quality.settings.SHORT_COLUMN_WIDTH: int = 5

If all lines (columns) in a page are shorter than this, the page is considered broken.
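A check of this kind might look like the following sketch. The function `is_broken_page` is hypothetical, and the choices to ignore blank lines, to count length in characters, and to split on "\n" are assumptions made for illustration (the actual separator is governed by LINE_SEPARATOR, whose value is not documented here).

```python
# Local copy of the documented default, so the sketch runs without the package.
SHORT_COLUMN_WIDTH: int = 5


def is_broken_page(text: str, line_separator: str = "\n") -> bool:
    """Hypothetical helper: flag pages where every line is very short."""
    # Assumption: blank lines carry no information and are ignored.
    lines = [line for line in text.split(line_separator) if line.strip()]
    # A page without any non-blank lines is left to the empty-page handling.
    if not lines:
        return False
    return all(len(line) < SHORT_COLUMN_WIDTH for line in lines)
```

A page consisting only of fragments such as "a", "bc", "d" would be flagged, whereas a single line of normal length is enough to pass the check.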

text_quality.settings.ENCODING = 'utf-8'

Encoding to be used throughout all text file processing operations.
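As a brief illustration, a caller could pass the setting explicitly whenever it opens a text file, so that results do not depend on the platform default. The helper `read_page` is hypothetical and only sketches this usage.

```python
from pathlib import Path

# Local copy of the documented default, so the sketch runs without the package.
ENCODING = "utf-8"


def read_page(path: Path) -> str:
    """Hypothetical helper: read a text file using the configured encoding."""
    return path.read_text(encoding=ENCODING)
```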

text_quality.settings.LOG_LEVEL
text_quality.settings.LINE_SEPARATOR
text_quality.settings.Q_GRAM_LENGTH: int
text_quality.settings.Q_GRAMS_GAMMA: int
text_quality.settings.SOURCE_DIR
text_quality.settings.DATA_DIR
text_quality.settings.DICTS_DIR
text_quality.settings.HUNSPELL_DIR
text_quality.settings.QGRAMS_DIR
text_quality.settings.CLASSIFIER_DIR
text_quality.settings.DEFAULT_LANGUAGE = 'nl'
text_quality.settings.HUNSPELL_LANGUAGE
text_quality.settings.TOKEN_DICT_FILE: pathlib.Path
text_quality.settings.QGRAMS_FILE: pathlib.Path
text_quality.settings.PIPELINE_FILE: pathlib.Path