text_quality.feature.featurize

Module Contents

Classes

Scorers

A configuration of features and their respective scorers.

Featurizer

A collection of scorers to featurize an input text.

class text_quality.feature.featurize.Scorers

Bases: TypedDict

A configuration of features and their respective scorers.

dict_score: text_quality.feature.scorer.dictionary.HunspellDictionary
dict_score_gt: text_quality.feature.scorer.dictionary.TokenDictionary
n_gram_score: text_quality.feature.scorer.q_gram.QGram
garbage_score: text_quality.feature.scorer.garbage.GarbageDetector
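
Because Scorers is a TypedDict, a configuration is an ordinary dict whose keys are the four feature names above. The sketch below mirrors those keys under stated assumptions: the Scorer protocol and its score(tokens) method are hypothetical and are not the documented interface of HunspellDictionary, TokenDictionary, QGram, or GarbageDetector, and ScorersSketch and ConstantScorer are illustrative stand-ins, not part of text_quality. Unlike the real Scorers, which types each key with a different scorer class, the sketch uses one generic value type.

```python
from typing import Protocol, TypedDict


class Scorer(Protocol):
    """Hypothetical scorer interface: maps a token list to a float."""

    def score(self, tokens: list[str]) -> float: ...


class ScorersSketch(TypedDict):
    """Same keys as text_quality.feature.featurize.Scorers, generic values."""

    dict_score: Scorer
    dict_score_gt: Scorer
    n_gram_score: Scorer
    garbage_score: Scorer


class ConstantScorer:
    """Stand-in scorer that always returns a fixed value (illustration only)."""

    def __init__(self, value: float) -> None:
        self.value = value

    def score(self, tokens: list[str]) -> float:
        return self.value


# One stand-in scorer per configured feature.
scorers: ScorersSketch = {
    "dict_score": ConstantScorer(0.9),
    "dict_score_gt": ConstantScorer(0.8),
    "n_gram_score": ConstantScorer(0.7),
    "garbage_score": ConstantScorer(0.1),
}

# A TypedDict is a plain dict at runtime, so the feature names are its keys.
print(list(scorers))
```

The TypedDict only adds static type checking; at runtime the configuration behaves like any other dict.
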
class text_quality.feature.featurize.Featurizer(scorers: Scorers, tokenizer: text_quality.feature.tokenizer.Tokenizer)

A collection of scorers to featurize an input text.

property features: List[str]
featurize(text: str) -> tuple[dict[str, float], List[str]]
featurize_as_dataframe(text: str) -> tuple[pandas.DataFrame, List[str]]
static as_dataframe(features: dict[str, float]) -> pandas.DataFrame
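
The members above suggest a pipeline of tokenize, score, and optionally tabulate. The following is a minimal sketch of how a class with this interface could be organized, not the library's implementation. It assumes that each scorer exposes a hypothetical score(tokens) -> float method, that plain whitespace splitting stands in for Tokenizer, that the List[str] returned by featurize is the token list, and that as_dataframe produces a single-row frame with one column per feature. FeaturizerSketch and TypeTokenRatio are illustrative names that do not exist in text_quality.

```python
from typing import Protocol

import pandas as pd


class Scorer(Protocol):
    """Hypothetical scorer interface: maps a token list to a float."""

    def score(self, tokens: list[str]) -> float: ...


class TypeTokenRatio:
    """Toy stand-in scorer: share of distinct tokens (illustration only)."""

    def score(self, tokens: list[str]) -> float:
        return len(set(tokens)) / len(tokens) if tokens else 0.0


class FeaturizerSketch:
    """Mirrors the Featurizer interface; not the text_quality implementation."""

    def __init__(self, scorers: dict[str, Scorer]) -> None:
        self._scorers = scorers

    @property
    def features(self) -> list[str]:
        # Assumption: feature names are the keys of the scorer configuration.
        return list(self._scorers)

    def tokenize(self, text: str) -> list[str]:
        # Hypothetical whitespace tokenizer standing in for Tokenizer.
        return text.split()

    def featurize(self, text: str) -> tuple[dict[str, float], list[str]]:
        tokens = self.tokenize(text)
        # Run every configured scorer over the same token list.
        features = {name: s.score(tokens) for name, s in self._scorers.items()}
        # Assumption: the second tuple element is the token list.
        return features, tokens

    def featurize_as_dataframe(self, text: str) -> tuple[pd.DataFrame, list[str]]:
        features, tokens = self.featurize(text)
        return self.as_dataframe(features), tokens

    @staticmethod
    def as_dataframe(features: dict[str, float]) -> pd.DataFrame:
        # One row for the text, one column per feature.
        return pd.DataFrame([features])


featurizer = FeaturizerSketch({"ttr": TypeTokenRatio()})
features, tokens = featurizer.featurize("the cat saw the dog")
print(features)  # {'ttr': 0.8}: 4 distinct tokens out of 5
```

Keeping as_dataframe static lets callers tabulate a feature dict produced elsewhere without holding a Featurizer instance.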