text_quality.page.page

Module Contents

Classes

Page: A wrapper around a PageXML file.

class text_quality.page.page.Page(page_doc: pagexml.model.physical_document_model.PageXMLScan)

A wrapper around a PageXML file.

property id: str

The page id.

lines() → List[str]

Return the text lines of the page as a list of strings.
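In a PageXML document, each line of text is a TextLine element whose own TextEquiv/Unicode child holds the transcription; Word elements inside the line carry their own TextEquiv, which should not be counted as separate lines. The following is a minimal, hypothetical sketch of that extraction using only the standard library. It is not the text_quality implementation (which works on a pagexml PageXMLScan object), and the helper names extract_lines and _child, as well as the sample document, are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def _child(elem: ET.Element, name: str) -> Optional[ET.Element]:
    """Return the first *direct* child with the given local name, if any."""
    for c in elem:
        if _local(c.tag) == name:
            return c
    return None


def extract_lines(xml_text: str) -> List[str]:
    """Hypothetical helper: text of every TextLine, in document order.

    Only the TextLine's own TextEquiv/Unicode is read, so Word-level
    TextEquiv elements nested inside the line are ignored. Namespaces are
    ignored, so any PAGE schema version is accepted.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if _local(elem.tag) != "TextLine":
            continue
        equiv = _child(elem, "TextEquiv")
        uni = _child(equiv, "Unicode") if equiv is not None else None
        if uni is not None:
            lines.append(uni.text or "")
    return lines


# Illustrative sample document (made up for this sketch).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="scan_0001.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <Word id="w1"><TextEquiv><Unicode>Hello</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <TextEquiv><Unicode>Second line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(extract_lines(SAMPLE))  # ['Hello world', 'Second line']
```

Restricting the lookup to direct children is the design choice that keeps the word "Hello" from appearing as an extra line.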

get_text()

Get the entire text of the page.

classmethod from_file(file: pathlib.Path)
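To show how the documented members fit together (construct with from_file, then read id, lines() and get_text()), here is a hypothetical stand-in class, SketchPage, that mirrors the interface using only the standard library. It is a sketch under stated assumptions, not the library's behavior: the id is taken from the file name, and get_text() joins lines with newlines. The real Page wraps a pagexml PageXMLScan, and its id and text joining may differ.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


class SketchPage:
    """Hypothetical stand-in mirroring the documented Page interface."""

    def __init__(self, page_id: str, lines: List[str]):
        self._id = page_id
        self._lines = lines

    @classmethod
    def from_file(cls, file: Path) -> "SketchPage":
        root = ET.parse(file).getroot()
        lines = []
        for elem in root.iter():
            if _local(elem.tag) != "TextLine":
                continue
            # Use the line's own TextEquiv, not Word-level ones.
            equiv = next((c for c in elem if _local(c.tag) == "TextEquiv"), None)
            if equiv is None:
                continue
            uni = next((c for c in equiv if _local(c.tag) == "Unicode"), None)
            if uni is not None:
                lines.append(uni.text or "")
        # Assumption: the file name without extension serves as the page id.
        return cls(file.stem, lines)

    @property
    def id(self) -> str:
        return self._id

    def lines(self) -> List[str]:
        return list(self._lines)

    def get_text(self) -> str:
        # Assumption: the full text is the lines joined by newlines.
        return "\n".join(self._lines)


xml = (
    '<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">'
    "<Page><TextRegion>"
    "<TextLine><TextEquiv><Unicode>First line</Unicode></TextEquiv></TextLine>"
    "<TextLine><TextEquiv><Unicode>Second line</Unicode></TextEquiv></TextLine>"
    "</TextRegion></Page></PcGts>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scan_0001.xml"
    path.write_text(xml, encoding="utf-8")
    page = SketchPage.from_file(path)

print(page.id)          # scan_0001
print(page.lines())     # ['First line', 'Second line']
print(page.get_text())
```

Reading the file eagerly in from_file means the page object stays usable after the temporary file is deleted.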