layoutlm.rst

Created: Mon, Jul 14, 14:13


	..
	    Copyright 2020 The HuggingFace Team. All rights reserved.

	    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
	    the License. You may obtain a copy of the License at

	        http://www.apache.org/licenses/LICENSE-2.0

	    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
	    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
	    specific language governing permissions and limitations under the License.

	LayoutLM
	-----------------------------------------------------------------------------------------------------------------------

	.. _Overview:

	Overview
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	The LayoutLM model was proposed in the paper `LayoutLM: Pre-training of Text and Layout for Document Image
	Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and
	Ming Zhou. It's a simple but effective pretraining method of text and layout for document image understanding and
	information extraction tasks, such as form understanding and receipt understanding. It obtains state-of-the-art results
	on several downstream tasks:

	- form understanding: the `FUNSD <https://guillaumejaume.github.io/FUNSD/>`__ dataset (a collection of 199 annotated
	  forms comprising more than 30,000 words).
	- receipt understanding: the `SROIE <https://rrc.cvc.uab.es/?ch=13>`__ dataset (a collection of 626 receipts for
	  training and 347 receipts for testing).
	- document image classification: the `RVL-CDIP <https://www.cs.cmu.edu/~aharley/rvl-cdip/>`__ dataset (a collection of
	  400,000 images belonging to one of 16 classes).

	The abstract from the paper is the following:

	*Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the
	widespread use of pretraining models for NLP applications, they almost exclusively focus on text-level manipulation,
	while neglecting layout and style information that is vital for document image understanding. In this paper, we propose
	the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is
	beneficial for a great number of real-world document image understanding tasks such as information extraction from
	scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM.
	To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for
	document-level pretraining. It achieves new state-of-the-art results in several downstream tasks, including form
	understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification
	(from 93.07 to 94.42).*

	Tips:

	- In addition to :obj:`input_ids`, :meth:`~transformers.LayoutLMModel.forward` also expects the input :obj:`bbox`,
	  which holds the bounding boxes (i.e. 2D positions) of the input tokens. These can be obtained using an external OCR
	  engine such as Google's `Tesseract <https://github.com/tesseract-ocr/tesseract>`__ (there's a `Python wrapper
	  <https://pypi.org/project/pytesseract/>`__ available). Each bounding box should be in (x0, y0, x1, y1) format, where
	  (x0, y0) corresponds to the position of the upper left corner of the bounding box, and (x1, y1) represents the
	  position of the lower right corner. Note that the bounding boxes first need to be normalized to a 0-1000 scale. To
	  normalize, you can use the following function:

	.. code-block::

	    def normalize_bbox(bbox, width, height):
	        return [
	            int(1000 * (bbox[0] / width)),
	            int(1000 * (bbox[1] / height)),
	            int(1000 * (bbox[2] / width)),
	            int(1000 * (bbox[3] / height)),
	        ]

	Here, :obj:`width` and :obj:`height` correspond to the width and height of the original document in which the token
	occurs. Those can be obtained using the Python Imaging Library (PIL), for example, as follows:

	.. code-block::

	    from PIL import Image

	    # PIL reads raster images (png, jpg, tiff, ...); a PDF page must be converted to an image first
	    image = Image.open("name_of_your_document.png")

	    width, height = image.size

	- For a demo which shows how to fine-tune :class:`~transformers.LayoutLMForTokenClassification` on the `FUNSD dataset
	  <https://guillaumejaume.github.io/FUNSD/>`__ (a collection of annotated forms), see `this notebook
	  <https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
	  It includes an inference part, which shows how to use Google's Tesseract on a new document.
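
	The OCR and normalization steps described in the tips can be combined roughly as follows. This is a minimal sketch,
	not part of the library: it assumes the OCR result is a dict of parallel lists with the keys ``text``, ``left``,
	``top``, ``width`` and ``height`` (the layout that ``pytesseract.image_to_data`` returns with
	``output_type=Output.DICT``). The function name :obj:`ocr_words_to_boxes` and the clamping to the 0-1000 range are
	illustrative assumptions.

	.. code-block:: python

	    def ocr_words_to_boxes(ocr, page_width, page_height):
	        """Return (words, boxes) with boxes as [x0, y0, x1, y1] on a 0-1000 scale.

	        `ocr` is assumed to be a dict of parallel lists keyed by
	        "text", "left", "top", "width" and "height" (pixel units).
	        """

	        def scale(value, size):
	            # Scale a pixel coordinate to 0-1000 and clamp boxes that
	            # slightly overshoot the page edge.
	            return min(1000, max(0, int(1000 * value / size)))

	        words, boxes = [], []
	        for text, left, top, w, h in zip(
	            ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"]
	        ):
	            if not text.strip():
	                continue  # skip empty entries (page, block or line records)
	            # (left, top, width, height) -> upper left and lower right corners
	            x0, y0, x1, y1 = left, top, left + w, top + h
	            words.append(text)
	            boxes.append([
	                scale(x0, page_width),
	                scale(y0, page_height),
	                scale(x1, page_width),
	                scale(y1, page_height),
	            ])
	        return words, boxes


	    # Hand-written OCR output for a 500 x 1000 pixel page (illustrative data)
	    ocr = {
	        "text": ["", "Invoice", "42"],
	        "left": [0, 50, 300],
	        "top": [0, 100, 100],
	        "width": [500, 200, 100],
	        "height": [1000, 50, 50],
	    }
	    words, boxes = ocr_words_to_boxes(ocr, page_width=500, page_height=1000)
	    print(words)  # ['Invoice', '42']
	    print(boxes)  # [[100, 100, 500, 150], [600, 100, 800, 150]]

	Multiplying before dividing keeps the arithmetic exact for integer pixel coordinates; the result can differ by one
	unit from :obj:`normalize_bbox` in rare rounding cases.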

	The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
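
	Because :obj:`bbox` must contain one box per input token while OCR engines report one box per word, the word-level
	boxes usually have to be repeated for every sub-word token. The sketch below illustrates this under stated
	assumptions: :obj:`align_boxes_to_tokens` and :obj:`toy_tokenize` are hypothetical helpers, and the boxes used for
	the ``[CLS]`` and ``[SEP]`` tokens (``[0, 0, 0, 0]`` and ``[1000, 1000, 1000, 1000]``) are a common convention
	assumed here, not something this page specifies.

	.. code-block:: python

	    CLS_BOX = [0, 0, 0, 0]              # assumed convention for [CLS]
	    SEP_BOX = [1000, 1000, 1000, 1000]  # assumed convention for [SEP]


	    def align_boxes_to_tokens(words, boxes, tokenize):
	        """Repeat each word-level box once per sub-word token of that word.

	        `tokenize` is any callable mapping a word to a list of tokens.
	        """
	        tokens, token_boxes = ["[CLS]"], [CLS_BOX]
	        for word, box in zip(words, boxes):
	            pieces = tokenize(word)
	            tokens.extend(pieces)
	            token_boxes.extend([box] * len(pieces))
	        tokens.append("[SEP]")
	        token_boxes.append(SEP_BOX)
	        return tokens, token_boxes


	    def toy_tokenize(word):
	        # Stand-in for a real tokenizer: cuts a word into 4-character
	        # chunks and marks continuation chunks with "##".
	        if not word:
	            return []
	        chunks = [word[i:i + 4] for i in range(0, len(word), 4)]
	        return [chunks[0]] + ["##" + chunk for chunk in chunks[1:]]


	    tokens, token_boxes = align_boxes_to_tokens(
	        ["Invoice", "42"],
	        [[100, 100, 500, 150], [600, 100, 800, 150]],
	        toy_tokenize,
	    )
	    print(tokens)       # ['[CLS]', 'Invo', '##ice', '42', '[SEP]']
	    print(token_boxes)

	In practice, the ``tokenize`` method of a :class:`~transformers.LayoutLMTokenizer` instance could be passed in place
	of :obj:`toy_tokenize`, so that the boxes line up with the tokens the model actually sees.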


	LayoutLMConfig
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	.. autoclass:: transformers.LayoutLMConfig
	    :members:


	LayoutLMTokenizer
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	.. autoclass:: transformers.LayoutLMTokenizer
	    :members:


	LayoutLMTokenizerFast
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	.. autoclass:: transformers.LayoutLMTokenizerFast
	    :members:


	LayoutLMModel
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	.. autoclass:: transformers.LayoutLMModel
	    :members:


	LayoutLMForMaskedLM
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	.. autoclass:: transformers.LayoutLMForMaskedLM
	    :members:


	LayoutLMForSequenceClassification
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	.. autoclass:: transformers.LayoutLMForSequenceClassification
	    :members:


	LayoutLMForTokenClassification
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	.. autoclass:: transformers.LayoutLMForTokenClassification
	    :members:
