testing.rst

File Metadata

Created: Thu, Jul 3, 16:29

	..
	Copyright 2020 The HuggingFace Team. All rights reserved.

	Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
	an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
	specific language governing permissions and limitations under the License.

	Testing
	=======================================================================================================================


	Let's take a look at how 🤗 Transformer models are tested and how you can write new tests and improve the existing ones.

	There are 2 test suites in the repository:

	1. ``tests`` -- tests for the general API
	2. ``examples`` -- tests primarily for various applications that aren't part of the API

	How transformers are tested
	-----------------------------------------------------------------------------------------------------------------------

	1. Once a PR is submitted it gets tested with 9 CircleCI jobs. Every new commit to that PR gets retested. These jobs
	are defined in this :prefix_link:`config file <.circleci/config.yml>`, so that if needed you can reproduce the same
	environment on your machine.

	These CI jobs don't run ``@slow`` tests.

	2. There are 3 jobs run by `GitHub Actions <https://github.com/huggingface/transformers/actions>`__:

	* :prefix_link:`torch hub integration <.github/workflows/github-torch-hub.yml>`: checks whether torch hub
	integration works.

	* :prefix_link:`self-hosted (push) <.github/workflows/self-push.yml>`: runs fast tests on GPU only on commits on
	``master``. It only runs if a commit on ``master`` has updated the code in one of the following folders: ``src``,
	``tests``, ``.github`` (to prevent running on added model cards, notebooks, etc.)

	* :prefix_link:`self-hosted runner <.github/workflows/self-scheduled.yml>`: runs normal and slow tests on GPU in
	``tests`` and ``examples``:

	.. code-block:: bash

	RUN_SLOW=1 pytest tests/
	RUN_SLOW=1 pytest examples/

	The results can be observed `here <https://github.com/huggingface/transformers/actions>`__.



	Running tests
	-----------------------------------------------------------------------------------------------------------------------





	Choosing which tests to run
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	This document goes into many details of how tests can be run. If after reading everything, you need even more details
	you will find them `here <https://docs.pytest.org/en/latest/usage.html>`__.

	Here are some of the most useful ways of running tests.

	Run all:

	.. code-block:: console

	pytest

	or:

	.. code-block:: bash

	make test

	Note that the latter is defined as:

	.. code-block:: bash

	python -m pytest -n auto --dist=loadfile -s -v ./tests/

	which tells pytest to:

	* run as many test processes as there are CPU cores (which could be too many if you don't have a ton of RAM!)
	* ensure that all tests from the same file will be run by the same test process
	* do not capture output
	* run in verbose mode



	Getting the list of all tests
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	All tests of the test suite:

	.. code-block:: bash

	pytest --collect-only -q

	All tests of a given test file:

	.. code-block:: bash

	pytest tests/test_optimization.py --collect-only -q



	Run a specific test module
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	To run an individual test module:

	.. code-block:: bash

	pytest tests/test_logging.py


	Run specific tests
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Since unittest is used inside most of the tests, to run specific subtests you need to know the name of the unittest
	class containing those tests. For example, it could be:

	.. code-block:: bash

	pytest tests/test_optimization.py::OptimizationTest::test_adam_w

	Here:

	* ``tests/test_optimization.py`` - the file with tests
	* ``OptimizationTest`` - the name of the class
	* ``test_adam_w`` - the name of the specific test function

	If the file contains multiple classes, you can choose to run only tests of a given class. For example:

	.. code-block:: bash

	pytest tests/test_optimization.py::OptimizationTest


	will run all the tests inside that class.

	As mentioned earlier you can see what tests are contained inside the ``OptimizationTest`` class by running:

	.. code-block:: bash

	pytest tests/test_optimization.py::OptimizationTest --collect-only -q


	You can run tests by keyword expressions.

	To run only tests whose name contains ``adam``:

	.. code-block:: bash

	pytest -k adam tests/test_optimization.py

	To run all tests except those whose name contains ``adam``:

	.. code-block:: bash

	pytest -k "not adam" tests/test_optimization.py

	And you can combine the two patterns in one:


	.. code-block:: bash

	pytest -k "ada and not adam" tests/test_optimization.py



	Run only modified tests
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	You can run the tests related to the unstaged files or the current branch (according to Git) by using `pytest-picked
	<https://github.com/anapaulagomes/pytest-picked>`__. This is a great way of quickly checking that your changes didn't
	break anything, since it won't run the tests related to files you didn't touch.

	.. code-block:: bash

	pip install pytest-picked

	.. code-block:: bash

	pytest --picked

	All tests will be run from files and folders which are modified, but not yet committed.

	Automatically rerun failed tests on source modification
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	`pytest-xdist <https://github.com/pytest-dev/pytest-xdist>`__ provides a very useful feature: it detects all failed
	tests, then waits for you to modify files and continuously reruns those failing tests until they pass while you fix
	them, so that you don't need to restart pytest after each fix. This is repeated until all tests pass, after which a
	full run is performed again.

	.. code-block:: bash

	pip install pytest-xdist

	To enter the mode: ``pytest -f`` or ``pytest --looponfail``

	File changes are detected by looking at ``looponfailroots`` root directories and all of their contents (recursively).
	If the default for this value does not work for you, you can change it in your project by setting a configuration
	option in ``setup.cfg``:

	.. code-block:: ini

	[tool:pytest]
	looponfailroots = transformers tests

	or ``pytest.ini``/``tox.ini`` files:

	.. code-block:: ini

	[pytest]
	looponfailroots = transformers tests

	This would lead to only looking for file changes in the respective directories, specified relative to the ini-file's
	directory.

	`pytest-watch <https://github.com/joeyespo/pytest-watch>`__ is an alternative implementation of this functionality.


	Skip a test module
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	If you want to run all test modules except a few, you can exclude them by giving an explicit list of tests to run. For
	example, to run all except ``test_modeling_*.py`` tests:

	.. code-block:: bash

	    pytest `ls -1 tests/*py | grep -v test_modeling`


	Clearing state
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	For CI builds, and whenever isolation matters more than speed, the cache should be cleared:

	.. code-block:: bash

	pytest --cache-clear tests

	Running tests in parallel
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	As mentioned earlier ``make test`` runs tests in parallel via ``pytest-xdist`` plugin (``-n X`` argument, e.g. ``-n 2``
	to run 2 parallel jobs).

	``pytest-xdist``'s ``--dist=`` option allows one to control how the tests are grouped. ``--dist=loadfile`` puts the
	tests located in one file onto the same process.

	Since the order of executed tests is different and unpredictable, if running the test suite with ``pytest-xdist``
	produces failures (meaning we have some undetected coupled tests), use `pytest-replay
	<https://github.com/ESSS/pytest-replay>`__ to replay the tests in the same order, which should help with reducing
	that failing sequence to a minimum.

	Test order and repetition
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	It's good to repeat the tests several times, in sequence, randomly, or in sets, to detect any potential
	inter-dependency and state-related bugs (tear down). And the straightforward multiple repetition is just good to detect
	some problems that get uncovered by randomness of DL.


	Repeat tests
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	* `pytest-flakefinder <https://github.com/dropbox/pytest-flakefinder>`__:

	.. code-block:: bash

	pip install pytest-flakefinder

	And then run every test multiple times (50 by default):

	.. code-block:: bash

	pytest --flake-finder --flake-runs=5 tests/test_failing_test.py

	.. note::
	This plugin doesn't work with ``-n`` flag from ``pytest-xdist``.

	.. note::
	There is another plugin ``pytest-repeat``, but it doesn't work with ``unittest``.


	Run tests in a random order
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	.. code-block:: bash

	pip install pytest-random-order

	Important: the presence of ``pytest-random-order`` will automatically randomize tests; no configuration changes or
	command line options are required.

	As explained earlier this allows detection of coupled tests - where one test's state affects the state of another. When
	``pytest-random-order`` is installed it will print the random seed it used for that session, e.g.:

	.. code-block:: bash

	pytest tests
	[...]
	Using --random-order-bucket=module
	Using --random-order-seed=573663

	So that if the given particular sequence fails, you can reproduce it by adding that exact seed, e.g.:

	.. code-block:: bash

	pytest --random-order-seed=573663
	[...]
	Using --random-order-bucket=module
	Using --random-order-seed=573663

	It will only reproduce the exact order if you use the exact same list of tests (or no list at all). Once you start
	manually narrowing down the list you can no longer rely on the seed, but have to list the tests manually in the exact
	order they failed and tell pytest not to randomize them by using ``--random-order-bucket=none``, e.g.:

	.. code-block:: bash

	pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.py
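
	To see why the seed alone stops being enough once the list of tests changes, here is a minimal sketch that uses
	Python's standard ``random`` module. It only illustrates seeded shuffling in general and is not how
	``pytest-random-order`` is implemented; ``shuffled`` is a hypothetical helper:

	.. code-block:: python

	    import random


	    def shuffled(tests, seed):
	        """Return a copy of ``tests`` shuffled deterministically by ``seed``."""
	        order = list(tests)
	        random.Random(seed).shuffle(order)
	        return order


	    full = ["test_a", "test_b", "test_c", "test_d"]
	    # Same seed and same list: the order is reproduced exactly.
	    assert shuffled(full, 573663) == shuffled(full, 573663)
	    # Same seed but a narrowed list: this is a different shuffle, so the relative
	    # order of the remaining tests is not guaranteed to match the full run.
	    narrowed = shuffled(["test_a", "test_c", "test_d"], 573663)
	    assert sorted(narrowed) == ["test_a", "test_c", "test_d"]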

	To disable the shuffling for all tests:

	.. code-block:: bash

	pytest --random-order-bucket=none

	By default ``--random-order-bucket=module`` is implied, which will shuffle the files on the module levels. It can also
	shuffle on ``class``, ``package``, ``global`` and ``none`` levels. For the complete details please see its
	`documentation <https://github.com/jbasko/pytest-random-order>`__.

	Another randomization alternative is `pytest-randomly <https://github.com/pytest-dev/pytest-randomly>`__. This
	module has very similar functionality and interface, but it doesn't have the bucket modes available in
	``pytest-random-order``. It has the same problem of imposing itself once installed.

	Look and feel variations
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	pytest-sugar
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	`pytest-sugar <https://github.com/Frozenball/pytest-sugar>`__ is a plugin that improves the look-n-feel, adds a
	progress bar, and shows failing tests and their asserts instantly. It gets activated automatically upon installation.

	.. code-block:: bash

	pip install pytest-sugar

	To run tests without it, run:

	.. code-block:: bash

	pytest -p no:sugar

	or uninstall it.



	Report each sub-test name and its progress
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	For a single or a group of tests via ``pytest`` (after ``pip install pytest-pspec``):

	.. code-block:: bash

	pytest --pspec tests/test_optimization.py



	Instantly shows failed tests
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	`pytest-instafail <https://github.com/pytest-dev/pytest-instafail>`__ shows failures and errors instantly instead of
	waiting until the end of test session.

	.. code-block:: bash

	pip install pytest-instafail

	.. code-block:: bash

	pytest --instafail

	To GPU or not to GPU
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	On a GPU-enabled setup, to test in CPU-only mode add ``CUDA_VISIBLE_DEVICES=""``:

	.. code-block:: bash

	CUDA_VISIBLE_DEVICES="" pytest tests/test_logging.py

	or if you have multiple GPUs, you can specify which one is to be used by ``pytest``. For example, to use only the
	second GPU if you have GPUs ``0`` and ``1``, you can run:

	.. code-block:: bash

	CUDA_VISIBLE_DEVICES="1" pytest tests/test_logging.py

	This is handy when you want to run different tasks on different GPUs.

	Some tests must be run on CPU-only, others on either CPU or GPU or TPU, yet others on multiple-GPUs. The following skip
	decorators are used to set the requirements of tests CPU/GPU/TPU-wise:

	* ``require_torch`` - this test will run only under torch
	* ``require_torch_gpu`` - as ``require_torch`` plus requires at least 1 GPU
	* ``require_torch_multi_gpu`` - as ``require_torch`` plus requires at least 2 GPUs
	* ``require_torch_non_multi_gpu`` - as ``require_torch`` plus requires 0 or 1 GPUs
	* ``require_torch_tpu`` - as ``require_torch`` plus requires at least 1 TPU

	Let's depict the GPU requirements in the following table:


	+----------+----------------------------------+
	| n gpus   | decorator                        |
	+==========+==================================+
	| ``>= 0`` | ``@require_torch``               |
	+----------+----------------------------------+
	| ``>= 1`` | ``@require_torch_gpu``           |
	+----------+----------------------------------+
	| ``>= 2`` | ``@require_torch_multi_gpu``     |
	+----------+----------------------------------+
	| ``< 2``  | ``@require_torch_non_multi_gpu`` |
	+----------+----------------------------------+
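
	To make the skip semantics concrete, here is a rough sketch of how a decorator of this kind could be built on
	``unittest.skipUnless``. It is not the library's implementation: ``fake_gpu_count`` and ``require_at_least_gpus`` are
	hypothetical names, and the GPU count is hard-coded to 1 so that the example runs on any machine:

	.. code-block:: python

	    import unittest


	    def fake_gpu_count():
	        """Hypothetical stand-in for a real device query; pretends 1 GPU is present."""
	        return 1


	    def require_at_least_gpus(n):
	        """Hypothetical decorator: skip the test unless at least ``n`` GPUs are reported."""
	        return unittest.skipUnless(fake_gpu_count() >= n, f"test requires at least {n} GPU(s)")


	    class DeviceTests(unittest.TestCase):
	        @require_at_least_gpus(1)
	        def test_single_gpu(self):
	            self.assertGreaterEqual(fake_gpu_count(), 1)

	        @require_at_least_gpus(2)
	        def test_multi_gpu(self):
	            self.fail("never runs while fewer than 2 GPUs are reported")


	    def run_device_tests():
	        """Run ``DeviceTests`` and return the ``unittest.TestResult``."""
	        result = unittest.TestResult()
	        unittest.defaultTestLoader.loadTestsFromTestCase(DeviceTests).run(result)
	        return result

	With one GPU reported, ``test_single_gpu`` runs and ``test_multi_gpu`` is recorded as skipped rather than failed.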


	For example, here is a test that must be run only when there are 2 or more GPUs available and pytorch is installed:

	.. code-block:: python

	@require_torch_multi_gpu
	def test_example_with_multi_gpu():

	If a test requires ``tensorflow`` use the ``require_tf`` decorator. For example:

	.. code-block:: python

	@require_tf
	def test_tf_thing_with_tensorflow():

	These decorators can be stacked. For example, if a test is slow and requires at least one GPU under pytorch, here is
	how to set it up:

	.. code-block:: python

	@require_torch_gpu
	@slow
	def test_example_slow_on_gpu():

	Some decorators like ``@parameterized`` rewrite test names, therefore ``@require_*`` skip decorators have to be listed
	last for them to work correctly. Here is an example of the correct usage:

	.. code-block:: python

	@parameterized.expand(...)
	@require_torch_multi_gpu
	def test_integration_foo():

	This order problem doesn't exist with ``@pytest.mark.parametrize``: you can put it first or last and it will still
	work. But it only works with non-unittests.

	Inside tests:

	* How many GPUs are available:

	.. code-block:: python

	    from transformers.testing_utils import get_gpu_count

	    n_gpu = get_gpu_count()  # works with torch and tf



	Distributed training
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	``pytest`` can't deal with distributed training directly. If this is attempted - the sub-processes don't do the right
	thing and end up thinking they are ``pytest`` and start running the test suite in loops. It works, however, if one
	spawns a normal process that then spawns off multiple workers and manages the IO pipes.

	This is still under development but you can study 2 different tests that perform this successfully:

	* :prefix_link:`test_seq2seq_examples_multi_gpu.py <examples/seq2seq/test_seq2seq_examples_multi_gpu.py>` - a
	``pytorch-lightning``-running test (had to use PL's ``ddp`` spawning method which is the default)
	* :prefix_link:`test_finetune_trainer.py <examples/seq2seq/test_finetune_trainer.py>` - a normal (non-PL) test

	To jump right into the execution point, search for the ``execute_subprocess_async`` function in those tests.

	You will need at least 2 GPUs to see these tests in action:

	.. code-block:: bash

	CUDA_VISIBLE_DEVICES="0,1" RUN_SLOW=1 pytest -sv examples/seq2seq/test_finetune_trainer.py \
	examples/seq2seq/test_seq2seq_examples_multi_gpu.py
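
	The idea of a normal process that spawns workers and manages their IO pipes can be sketched with the standard
	library alone. The following is a generic illustration under simplifying assumptions (tiny inline workers, no GPUs
	involved); it is not the ``execute_subprocess_async`` helper, and ``launch_workers`` and ``WORKER_CODE`` are
	hypothetical names:

	.. code-block:: python

	    import subprocess
	    import sys

	    # Each worker is an ordinary Python process, so it cannot mistake itself for pytest.
	    WORKER_CODE = "import sys; print(f'worker {sys.argv[1]} done')"


	    def launch_workers(n_workers):
	        """Start ``n_workers`` plain Python processes and collect their stdout."""
	        procs = [
	            subprocess.Popen(
	                [sys.executable, "-c", WORKER_CODE, str(rank)],
	                stdout=subprocess.PIPE,
	                text=True,
	            )
	            for rank in range(n_workers)
	        ]
	        outputs = []
	        for proc in procs:
	            out, _ = proc.communicate()  # wait for the worker and drain its pipe
	            if proc.returncode != 0:
	                raise RuntimeError(f"worker exited with code {proc.returncode}")
	            outputs.append(out.strip())
	        return outputs

	A test can then call ``launch_workers(2)`` and assert on the collected output instead of running the workers inside
	the ``pytest`` process itself.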


	Output capture
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	During test execution any output sent to ``stdout`` and ``stderr`` is captured. If a test or a setup method fails, its
	corresponding captured output will usually be shown along with the failure traceback.

	To disable output capturing and to get the ``stdout`` and ``stderr`` normally, use ``-s`` or ``--capture=no``:

	.. code-block:: bash

	pytest -s tests/test_logging.py

	To send test results to JUnit format output:

	.. code-block:: bash

	    pytest tests --junitxml=result.xml


	Color control
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	To have no color (e.g., yellow on white background is not readable):

	.. code-block:: bash

	pytest --color=no tests/test_logging.py



	Sending test report to online pastebin service
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Creating a URL for each test failure:

	.. code-block:: bash

	pytest --pastebin=failed tests/test_logging.py

	This will submit test run information to a remote Paste service and provide a URL for each failure. You may select
	tests as usual or add, for example, ``-x`` if you only want to send one particular failure.

	Creating a URL for a whole test session log:

	.. code-block:: bash

	pytest --pastebin=all tests/test_logging.py



	Writing tests
	-----------------------------------------------------------------------------------------------------------------------

	🤗 transformers tests are based on ``unittest``, but run by ``pytest``, so most of the time features from both systems
	can be used.

	You can read `here <https://docs.pytest.org/en/stable/unittest.html>`__ which features are supported, but the important
	thing to remember is that most ``pytest`` fixtures don't work. Parametrization doesn't work either, but we use the
	module ``parameterized`` that works in a similar way.


	Parametrization
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Often, there is a need to run the same test multiple times, but with different arguments. It could be done from within
	the test, but then there is no way of running that test for just one set of arguments.

	.. code-block:: python

	    # test_this1.py
	    import math
	    import unittest

	    from parameterized import parameterized


	    class TestMathUnitTest(unittest.TestCase):
	        @parameterized.expand([
	            ("negative", -1.5, -2.0),
	            ("integer", 1, 1.0),
	            ("large fraction", 1.6, 1),
	        ])
	        def test_floor(self, name, input, expected):
	            self.assertEqual(math.floor(input), expected)

	Now, by default this test will be run 3 times, each time with the last 3 arguments of ``test_floor`` being assigned the
	corresponding arguments in the parameter list.

	You can run just the ``negative`` and ``integer`` sets of params with:

	.. code-block:: bash

	    pytest -k "negative or integer" tests/test_mytest.py

	or all but ``negative`` sub-tests, with:

	.. code-block:: bash

	pytest -k "not negative" tests/test_mytest.py

	Besides using the ``-k`` filter that was just mentioned, you can find out the exact name of each sub-test and run any
	or all of them using their exact names.

	.. code-block:: bash

	pytest test_this1.py --collect-only -q

	and it will list:

	.. code-block:: bash

	test_this1.py::TestMathUnitTest::test_floor_0_negative
	test_this1.py::TestMathUnitTest::test_floor_1_integer
	test_this1.py::TestMathUnitTest::test_floor_2_large_fraction

	So now you can run just 2 specific sub-tests:

	.. code-block:: bash

	pytest test_this1.py::TestMathUnitTest::test_floor_0_negative test_this1.py::TestMathUnitTest::test_floor_1_integer
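
	The sub-test names listed above follow a visible pattern: the function name, the index of the parameter set, and a
	sanitized form of the first parameter. The sketch below merely mimics that pattern for illustration. It is an
	assumption based on the names shown here, not the actual code of ``parameterized``, and ``expand_names`` is a
	hypothetical helper:

	.. code-block:: python

	    import re


	    def expand_names(func_name, param_sets):
	        """Build one sub-test name per parameter set, mimicking the listing above."""
	        names = []
	        for index, params in enumerate(param_sets):
	            # Replace every run of non-word characters (e.g. spaces) with "_".
	            suffix = re.sub(r"\W+", "_", str(params[0]))
	            names.append(f"{func_name}_{index}_{suffix}")
	        return names


	    names = expand_names(
	        "test_floor",
	        [("negative", -1.5, -2.0), ("integer", 1, 1.0), ("large fraction", 1.6, 1)],
	    )
	    assert names == ["test_floor_0_negative", "test_floor_1_integer", "test_floor_2_large_fraction"]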

	The module `parameterized <https://pypi.org/project/parameterized/>`__ which is already in the developer dependencies
	of ``transformers`` works for both: ``unittests`` and ``pytest`` tests.

	If, however, the test is not a ``unittest``, you may use ``pytest.mark.parametrize`` (or you may see it being used in
	some existing tests, mostly under ``examples``).

	Here is the same example, this time using ``pytest``'s ``parametrize`` marker:

	.. code-block:: python

	    # test_this2.py
	    import math

	    import pytest


	    @pytest.mark.parametrize(
	        "name, input, expected",
	        [
	            ("negative", -1.5, -2.0),
	            ("integer", 1, 1.0),
	            ("large fraction", 1.6, 1),
	        ],
	    )
	    def test_floor(name, input, expected):
	        assert math.floor(input) == expected

	Same as with ``parameterized``, with ``pytest.mark.parametrize`` you can have a fine control over which sub-tests are
	run, if the ``-k`` filter doesn't do the job. Except, this parametrization function creates a slightly different set of
	names for the sub-tests. Here is what they look like:

	.. code-block:: bash

	pytest test_this2.py --collect-only -q

	and it will list:

	.. code-block:: bash

	test_this2.py::test_floor[integer-1-1.0]
	test_this2.py::test_floor[negative--1.5--2.0]
	test_this2.py::test_floor[large fraction-1.6-1]

	So now you can run just the specific test:

	.. code-block:: bash

	pytest test_this2.py::test_floor[negative--1.5--2.0] test_this2.py::test_floor[integer-1-1.0]

	as in the previous example.



	Files and directories
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	In tests often we need to know where things are relative to the current test file, and it's not trivial since the test
	could be invoked from more than one directory or could reside in sub-directories with different depths. A helper class
	:obj:`transformers.testing_utils.TestCasePlus` solves this problem by sorting out all the basic paths and provides easy
	accessors to them:

	* ``pathlib`` objects (all fully resolved):

	- ``test_file_path`` - the current test file path, i.e. ``__file__``
	- ``test_file_dir`` - the directory containing the current test file
	- ``tests_dir`` - the directory of the ``tests`` test suite
	- ``examples_dir`` - the directory of the ``examples`` test suite
	- ``repo_root_dir`` - the directory of the repository
	- ``src_dir`` - the directory of ``src`` (i.e. where the ``transformers`` sub-dir resides)

	* stringified paths---same as above but these return paths as strings, rather than ``pathlib`` objects:

	- ``test_file_path_str``
	- ``test_file_dir_str``
	- ``tests_dir_str``
	- ``examples_dir_str``
	- ``repo_root_dir_str``
	- ``src_dir_str``

	To start using those all you need is to make sure that the test resides in a subclass of
	:obj:`transformers.testing_utils.TestCasePlus`. For example:

	.. code-block:: python

	    from transformers.testing_utils import TestCasePlus


	    class PathExampleTest(TestCasePlus):
	        def test_something_involving_local_locations(self):
	            data_dir = self.examples_dir / "seq2seq/test_data/wmt_en_ro"

	If you don't need to manipulate paths via ``pathlib`` or you just need a path as a string, you can always invoke
	``str()`` on the ``pathlib`` object or use the accessors ending with ``_str``. For example:

	.. code-block:: python

	    from transformers.testing_utils import TestCasePlus


	    class PathExampleTest(TestCasePlus):
	        def test_something_involving_stringified_locations(self):
	            examples_dir = self.examples_dir_str
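
	For intuition about how such accessors can be derived no matter where the test is invoked from, here is a minimal
	sketch based on ``pathlib``. It is not the ``TestCasePlus`` implementation: ``locate_paths`` is a hypothetical helper,
	and it assumes that the repository root is the closest ancestor directory containing both ``src`` and ``tests``:

	.. code-block:: python

	    from pathlib import Path


	    def locate_paths(test_file, marker_dirs=("src", "tests")):
	        """Resolve the basic paths for ``test_file`` by walking up to the repository root."""
	        path = Path(test_file).resolve()
	        for parent in path.parents:
	            # Assumption: the repo root is the first ancestor holding all marker dirs.
	            if all((parent / name).is_dir() for name in marker_dirs):
	                return {
	                    "test_file_path": path,
	                    "test_file_dir": path.parent,
	                    "repo_root_dir": parent,
	                    "tests_dir": parent / "tests",
	                    "src_dir": parent / "src",
	                }
	        raise ValueError(f"no ancestor of {path} contains {marker_dirs}")

	Because the lookup starts from the resolved test file rather than from the current working directory, the result is
	the same whether ``pytest`` is launched from the repository root or from a sub-directory.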




	Temporary files and directories
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Using unique temporary files and directories is essential for parallel test running, so that the tests won't overwrite
	each other's data. Also we want to get the temporary files and directories removed at the end of each test that created
	them. Therefore, using packages like ``tempfile``, which address these needs, is essential.

	However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want
	to know its exact path and not have it randomized on every test re-run.

	A helper class :obj:`transformers.testing_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
	:obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.

	Here is an example of its usage:

	.. code-block:: python

	    from transformers.testing_utils import TestCasePlus


	    class ExamplesTests(TestCasePlus):
	        def test_whatever(self):
	            tmp_dir = self.get_auto_remove_tmp_dir()

	This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.

	* Create a unique temporary dir:

	.. code-block:: python

	    def test_whatever(self):
	        tmp_dir = self.get_auto_remove_tmp_dir()

	``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
	test.

	* Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.

	.. code-block:: python

	    def test_whatever(self):
	        tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

	This is useful for debugging when you want to monitor a specific directory and want to make sure the previous tests
	didn't leave any data in there.

	* You can override the default behavior by directly overriding the ``before`` and ``after`` args, leading to one of the
	following behaviors:

	- ``before=True``: the temporary dir will always be cleared at the beginning of the test.
	- ``before=False``: if the temporary dir already existed, any existing files will remain there.
	- ``after=True``: the temporary dir will always be deleted at the end of the test.
	- ``after=False``: the temporary dir will always be left intact at the end of the test.

	.. note::
	    In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
	    an explicit :obj:`tmp_dir` is used, so that no ``/tmp`` or similar important part of the filesystem gets removed
	    by mistake. In other words, always pass paths that start with ``./``.

	.. note::
	Each test can register multiple temporary directories and they all will get auto-removed, unless requested
	otherwise.
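
	The ``before``/``after`` semantics described above can be approximated with ``tempfile``, ``shutil`` and
	``unittest``'s ``addCleanup``. The following is a simplified sketch, not the ``get_auto_remove_tmp_dir``
	implementation: ``TmpDirTestCase`` and ``make_tmp_dir`` are hypothetical names, and the safety check that restricts
	explicit paths to the repository checkout is deliberately left out:

	.. code-block:: python

	    import shutil
	    import tempfile
	    import unittest
	    from pathlib import Path


	    class TmpDirTestCase(unittest.TestCase):
	        """Hypothetical helper mimicking the ``before``/``after`` behaviors."""

	        def make_tmp_dir(self, tmp_dir=None, before=None, after=None):
	            if tmp_dir is None:
	                # Unique dir: nothing to clear beforehand, removed afterwards by default.
	                path = Path(tempfile.mkdtemp())
	                after = True if after is None else after
	            else:
	                # Explicit dir: cleared before and kept after by default.
	                path = Path(tmp_dir).resolve()
	                before = True if before is None else before
	                after = False if after is None else after
	                if before and path.exists():
	                    shutil.rmtree(path)
	                path.mkdir(parents=True, exist_ok=True)
	            if after:
	                # Cleanups registered here run even when the test fails.
	                self.addCleanup(shutil.rmtree, path, ignore_errors=True)
	            return path

	Since every call registers its own cleanup, a single test can create several directories and each one is handled
	according to its own ``after`` setting.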


	Skipping tests
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	This is useful when a bug is found and a new test is written, but the bug is not fixed yet. In order to be able to
	commit it to the main repository we need to make sure it's skipped during ``make test``.

	Methods:

	- A skip means that you expect your test to pass only if some conditions are met, otherwise pytest should skip
	running the test altogether. Common examples are skipping windows-only tests on non-windows platforms, or skipping
	tests that depend on an external resource which is not available at the moment (for example a database).

	- An xfail means that you expect a test to fail for some reason. A common example is a test for a feature not yet
	implemented, or a bug not yet fixed. When a test passes despite being expected to fail (marked with
	``pytest.mark.xfail``), it's an xpass and will be reported in the test summary.

	One of the important differences between the two is that ``skip`` doesn't run the test, and ``xfail`` does. So if the
	code that's buggy causes some bad state that will affect other tests, do not use ``xfail``.
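
	This difference can be observed with the standard library alone. The sketch below uses ``unittest.expectedFailure``
	as a stand-in for ``pytest.mark.xfail`` (a substitution made only to keep the example free of third-party
	dependencies) and records which test bodies were actually executed. The names ``calls`` and ``run_demo`` are
	hypothetical:

	.. code-block:: python

	    import unittest

	    calls = []  # records which test bodies were actually executed


	    class SkipVersusExpectedFailure(unittest.TestCase):
	        @unittest.skip("bug not fixed yet")
	        def test_skipped(self):
	            calls.append("skipped")  # never reached: the body is not run
	            self.fail("unreachable")

	        @unittest.expectedFailure
	        def test_expected_failure(self):
	            calls.append("expected_failure")  # reached: the body does run
	            self.fail("known bug")


	    def run_demo():
	        """Run both tests and return the ``unittest.TestResult``."""
	        result = unittest.TestResult()
	        unittest.defaultTestLoader.loadTestsFromTestCase(SkipVersusExpectedFailure).run(result)
	        return result

	After ``run_demo()``, only ``"expected_failure"`` appears in ``calls``, which is why side effects of buggy code can
	leak into other tests when the expected-failure approach is used.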

	Implementation
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	- Here is how to skip a whole test unconditionally:

	.. code-block:: python

	@unittest.skip("this bug needs to be fixed")
	def test_feature_x():

	or via pytest:

	.. code-block:: python

	    @pytest.mark.skip(reason="this bug needs to be fixed")
	    def test_feature_x():

	or the ``xfail`` way:

	.. code-block:: python

	@pytest.mark.xfail
	def test_feature_x():

	- Here is how to skip a test based on some internal check inside the test:

	.. code-block:: python

	    def test_feature_x():
	        if not has_something():
	            pytest.skip("unsupported configuration")

	or the whole module:

	.. code-block:: python

	    import sys
	    import pytest

	    if not sys.platform.startswith("win"):
	        pytest.skip("skipping windows-only tests", allow_module_level=True)

	or the ``xfail`` way:

	.. code-block:: python

	    def test_feature_x():
	        pytest.xfail("expected to fail until bug XYZ is fixed")

	- Here is how to skip all tests in a module if some import is missing:

	.. code-block:: python

	docutils = pytest.importorskip("docutils", minversion="0.3")

	- Skip a test based on a condition:

	.. code-block:: python

	@pytest.mark.skipif(sys.version_info < (3,6), reason="requires python3.6 or higher")
	def test_feature_x():

	or:

	.. code-block:: python

	@unittest.skipIf(torch_device == "cpu", "Can't do half precision")
	def test_feature_x():

	or skip a whole class:

	.. code-block:: python

	    @pytest.mark.skipif(sys.platform == "win32", reason="does not run on windows")
	    class TestClass:
	        def test_feature_x(self):

	More details and examples are `here <https://docs.pytest.org/en/latest/skipping.html>`__.

	Slow tests
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	The library of tests is ever-growing, and some of the tests take minutes to run, so we can't afford to wait an hour
	for the test suite to complete on CI. Therefore, with some exceptions for essential tests, slow tests should be
	marked as in the example below:

	.. code-block:: python

	from transformers.testing_utils import slow
	@slow
	def test_integration_foo():

	Once a test is marked as ``@slow``, to run such tests set ``RUN_SLOW=1`` env var, e.g.:

	.. code-block:: bash

	RUN_SLOW=1 pytest tests
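
	As a rough idea of how an environment-variable switch of this kind can work, here is a minimal sketch built on
	``unittest.skipUnless``. It is not the ``slow`` decorator from ``transformers.testing_utils``: ``run_slow_enabled``
	and ``slow_sketch`` are hypothetical names, and the set of accepted truthy values is an assumption:

	.. code-block:: python

	    import os
	    import unittest


	    def run_slow_enabled():
	        """Return True when the RUN_SLOW environment variable holds a truthy value."""
	        return os.environ.get("RUN_SLOW", "0").lower() in ("1", "true", "yes")


	    def slow_sketch(test_item):
	        """Hypothetical decorator: skip the test unless RUN_SLOW is enabled."""
	        # The variable is read when the decorator is applied, i.e. at import time.
	        return unittest.skipUnless(run_slow_enabled(), "slow test; set RUN_SLOW=1 to run it")(test_item)

	With this approach the decision is made once, when the test module is imported, so the variable has to be set before
	the test run starts.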

	Some decorators like ``@parameterized`` rewrite test names, therefore ``@slow`` and the rest of the skip decorators
	``@require_*`` have to be listed last for them to work correctly. Here is an example of the correct usage:

	.. code-block:: python

	@parameterized.expand(...)
	@slow
	def test_integration_foo():

	As explained at the beginning of this document, slow tests get to run on a scheduled basis, rather than in PR CI
	checks. So it's possible that some problems will be missed during a PR submission and get merged. Such problems will
	get caught during the next scheduled CI job. But it also means that it's important to run the slow tests on your
	machine before submitting the PR.

	Here is a rough decision-making mechanism for choosing which tests should be marked as slow:

	If the test is focused on one of the library's internal components (e.g., modeling files, tokenization files,
	pipelines), then we should run that test in the non-slow test suite. If it's focused on another aspect of the library,
	such as the documentation or the examples, then we should run these tests in the slow test suite. And then, to refine
	this approach, we should have exceptions:

	* All tests that need to download a heavy set of weights or a dataset that is larger than ~50MB (e.g., model or
	  tokenizer integration tests, pipeline integration tests) should be set to slow. If you're adding a new model, you
	  should create and upload to the hub a tiny version of it (with random weights) for integration tests. This is
	  discussed in the following paragraphs.
	* All tests that need to run training that is not specifically optimized to be fast should be set to slow.
	* We can introduce exceptions if some of these should-be-non-slow tests are excruciatingly slow, and set them to
	  ``@slow``. Auto-modeling tests, which save and load large files to disk, are a good example of tests that are marked
	  as ``@slow``.
	* If a test completes in under 1 second on CI (including downloads, if any), then it should be a normal test regardless.
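
	The rules above can be summarized as a small decision function. This is only one reading of the guidelines, written
	as a sketch: the ``CandidateTest`` fields and the ``should_be_slow`` function are hypothetical and do not exist in the
	repository, and the order in which the exceptions are checked is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateTest:
    """Hypothetical description of a test, used only for this illustration."""

    targets_internals: bool  # modeling files, tokenization files, pipelines, ...
    download_mb: float = 0.0  # weights or datasets fetched by the test
    does_unoptimized_training: bool = False
    excruciatingly_slow: bool = False
    ci_runtime_s: Optional[float] = None  # measured on CI, downloads included


def should_be_slow(test: CandidateTest) -> bool:
    """Return True if the test should be marked as slow under the rules above."""
    # A test that finishes in under 1 second on CI stays a normal test regardless.
    if test.ci_runtime_s is not None and test.ci_runtime_s < 1.0:
        return False
    # Heavy downloads, unoptimized training and excruciatingly slow tests are slow.
    if test.download_mb > 50 or test.does_unoptimized_training or test.excruciatingly_slow:
        return True
    # Otherwise: internals run in the fast suite, everything else in the slow one.
    return not test.targets_internals
```

	For example, a pipeline test that downloads 200MB of weights would be slow, while the same test rewritten against a
	tiny model that completes in half a second on CI would not.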

	Collectively, the non-slow tests need to fully cover the different internals while remaining fast. For example,
	significant coverage can be achieved by testing with specially created tiny models with random weights. Such models
	have a minimal number of layers (e.g., 2), vocab size (e.g., 1000), etc. Then the ``@slow`` tests can use large
	slow models to do qualitative testing. To see how these are used, look for tiny models with:

	.. code-block:: bash

	    grep tiny tests examples

	Here is an example of a :prefix_link:`script <scripts/fsmt/fsmt-make-tiny-model.py>` that created the tiny model
	`stas/tiny-wmt19-en-de <https://huggingface.co/stas/tiny-wmt19-en-de>`__. You can easily adjust it to your specific
	model's architecture.

	It's easy to measure the run-time incorrectly if, for example, there is an overhead of downloading a huge model:
	when you test locally, the downloaded files are cached, so the download time is not measured. Hence, check the
	execution speed report in the CI logs instead (the output of ``pytest --durations=0 tests``).

	That report is also useful to find slow outliers that aren't marked as such, or which need to be re-written to be fast.
	If you notice that the test suite starts getting slow on CI, the top listing of this report will show the slowest
	tests.


	Testing the stdout/stderr output
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	In order to test functions that write to ``stdout`` and/or ``stderr``, the test can access those streams using
	``pytest``'s `capsys system <https://docs.pytest.org/en/latest/capture.html>`__. Here is how this is accomplished:

	.. code-block:: python

	    import sys


	    def print_to_stdout(s):
	        print(s)


	    def print_to_stderr(s):
	        sys.stderr.write(s)


	    def test_result_and_stdout(capsys):
	        msg = "Hello"
	        print_to_stdout(msg)
	        print_to_stderr(msg)
	        out, err = capsys.readouterr()  # consume the captured output streams
	        # optional: if you want to replay the consumed streams:
	        sys.stdout.write(out)
	        sys.stderr.write(err)
	        # test:
	        assert msg in out
	        assert msg in err

	And, of course, most of the time, ``stderr`` will come as a part of an exception, so try/except has to be used in such
	a case:

	.. code-block:: python

	    def raise_exception(msg):
	        raise ValueError(msg)


	    def test_something_exception():
	        msg = "Not a good value"
	        error = ""
	        try:
	            raise_exception(msg)
	        except Exception as e:
	            error = str(e)
	        assert msg in error, f"{msg} is not in the exception:\n{error}"
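
	As an alternative to an explicit try/except, the standard library's ``assertRaises`` context manager can check both
	the exception type and its message. The following is a sketch of that variant; the class name
	``ExceptionMessageTest`` is made up for this illustration.

```python
import unittest


def raise_exception(msg):
    raise ValueError(msg)


class ExceptionMessageTest(unittest.TestCase):
    def test_something_exception(self):
        msg = "Not a good value"
        # the block fails the test if no ValueError is raised inside it
        with self.assertRaises(ValueError) as ctx:
            raise_exception(msg)
        # the caught exception is available for further checks
        self.assertIn(msg, str(ctx.exception))
```

	Unlike the try/except version, this one also fails when no exception is raised at all, or when an exception of a
	different type is raised.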

	Another approach to capturing stdout is via ``contextlib.redirect_stdout``:

	.. code-block:: python

	    import sys
	    from contextlib import redirect_stdout
	    from io import StringIO


	    def print_to_stdout(s):
	        print(s)


	    def test_result_and_stdout():
	        msg = "Hello"
	        buffer = StringIO()
	        with redirect_stdout(buffer):
	            print_to_stdout(msg)
	        out = buffer.getvalue()
	        # optional: if you want to replay the consumed streams:
	        sys.stdout.write(out)
	        # test:
	        assert msg in out

	An important potential issue with capturing stdout is that it may contain ``\r`` characters, which in a normal
	``print`` reset everything that has been printed so far on the current line. There is no problem with plain
	``pytest``, but with ``pytest -s`` these characters get included in the buffer. So, for the test to work both with and
	without ``-s``, you have to clean up the captured output with ``re.sub(r'^.*\r', '', buf, 0, re.M)``.
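
	The effect of that cleanup can be seen in isolation with the following sketch. The function name
	``strip_carriage_return_resets`` is hypothetical; the regular expression removes, on every line, everything up to and
	including the last ``\r``.

```python
import re


def strip_carriage_return_resets(buf: str) -> str:
    """Emulate a terminal's carriage-return reset on each line of ``buf``."""
    # ``^`` matches at the start of every line thanks to re.M, and the greedy
    # ``.*`` runs up to the last "\r" on that line (``.`` never crosses "\n").
    return re.sub(r"^.*\r", "", buf, flags=re.M)


if __name__ == "__main__":
    captured = "Secret message\rHello World\n"
    print(repr(strip_carriage_return_resets(captured)))
```

	Lines without any ``\r`` are left untouched, so the same cleanup can be applied whether or not ``-s`` was used.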

	Fortunately, we have a helper context manager wrapper that takes care of all of this automatically, regardless of
	whether the output contains any ``\r`` characters, so it's as simple as:

	.. code-block:: python

	    from transformers.testing_utils import CaptureStdout

	    with CaptureStdout() as cs:
	        function_that_writes_to_stdout()
	    print(cs.out)

	Here is a full test example:

	.. code-block:: python

	    from transformers.testing_utils import CaptureStdout

	    msg = "Secret message\r"
	    final = "Hello World"
	    with CaptureStdout() as cs:
	        print(msg + final)
	    assert cs.out == final + "\n", f"captured: {cs.out}, expecting {final}"

	If you'd like to capture ``stderr`` use the :obj:`CaptureStderr` class instead:

	.. code-block:: python

	    from transformers.testing_utils import CaptureStderr

	    with CaptureStderr() as cs:
	        function_that_writes_to_stderr()
	    print(cs.err)

	If you need to capture both streams at once, use the parent :obj:`CaptureStd` class:

	.. code-block:: python

	    from transformers.testing_utils import CaptureStd

	    with CaptureStd() as cs:
	        function_that_writes_to_stdout_and_stderr()
	    print(cs.err, cs.out)
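
	For readers curious how such a helper might work, here is a simplified stand-in built from the standard library. It
	is a sketch under assumptions, not the code of ``CaptureStd``: the class name ``SimpleCaptureStd`` is hypothetical,
	and it does not perform the ``\r`` cleanup discussed earlier.

```python
import sys
from contextlib import ExitStack, redirect_stderr, redirect_stdout
from io import StringIO


class SimpleCaptureStd:
    """Hypothetical context manager capturing both stdout and stderr."""

    def __init__(self):
        self.out = ""
        self.err = ""

    def __enter__(self):
        self._out_buf, self._err_buf = StringIO(), StringIO()
        self._stack = ExitStack()
        # swap sys.stdout / sys.stderr for in-memory buffers
        self._stack.enter_context(redirect_stdout(self._out_buf))
        self._stack.enter_context(redirect_stderr(self._err_buf))
        return self

    def __exit__(self, *exc_info):
        # restore the original streams, then expose what was written
        self._stack.close()
        self.out = self._out_buf.getvalue()
        self.err = self._err_buf.getvalue()
        return False  # do not swallow exceptions


with SimpleCaptureStd() as cs:
    print("to stdout")
    print("to stderr", file=sys.stderr)
```

	After the ``with`` block, ``cs.out`` and ``cs.err`` hold the text written to each stream, and the real streams are
	back in place.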



	Capturing logger stream
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	If you need to validate the output of a logger, you can use :obj:`CaptureLogger`:

	.. code-block:: python

	    from transformers import logging
	    from transformers.testing_utils import CaptureLogger

	    msg = "Testing 1, 2, 3"
	    logging.set_verbosity_info()
	    logger = logging.get_logger("transformers.models.bart.tokenization_bart")
	    with CaptureLogger(logger) as cl:
	        logger.info(msg)
	    assert cl.out == msg + "\n"
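
	The idea behind such a helper can be sketched with the standard ``logging`` module: temporarily attach a handler that
	writes into an in-memory buffer. The class ``SimpleCaptureLogger`` and the logger name ``capture_demo`` below are
	hypothetical and only illustrate the approach; they are not the library's implementation.

```python
import logging
from io import StringIO


class SimpleCaptureLogger:
    """Hypothetical context manager collecting the records emitted by ``logger``."""

    def __init__(self, logger):
        self.logger = logger
        self.io = StringIO()
        self.handler = logging.StreamHandler(self.io)
        self.out = ""

    def __enter__(self):
        self.logger.addHandler(self.handler)
        return self

    def __exit__(self, *exc_info):
        self.logger.removeHandler(self.handler)
        self.out = self.io.getvalue()
        return False  # do not swallow exceptions


logger = logging.getLogger("capture_demo")
logger.setLevel(logging.INFO)  # make sure INFO records are not filtered out

msg = "Testing 1, 2, 3"
with SimpleCaptureLogger(logger) as cl:
    logger.info(msg)
```

	Since the handler has no formatter attached, each record is written as its plain message followed by a newline.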


	Testing with environment variables
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	If you want to test the impact of environment variables for a specific test, you can use the helper decorator
	``transformers.testing_utils.mockenv``:

	.. code-block:: python

	    import os
	    import unittest

	    from transformers.testing_utils import mockenv


	    class HfArgumentParserTest(unittest.TestCase):
	        @mockenv(TRANSFORMERS_VERBOSITY="error")
	        def test_env_override(self):
	            env_level_str = os.getenv("TRANSFORMERS_VERBOSITY", None)
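
	A decorator with this behavior can be sketched on top of ``unittest.mock.patch.dict``, which overrides entries of a
	mapping for the duration of a call and restores them afterwards. ``mockenv_sketch`` and the variable
	``MOCKENV_SKETCH_DEMO`` are hypothetical names for this illustration, and this is not necessarily how ``mockenv``
	itself is implemented.

```python
import os
import unittest
from unittest import mock


def mockenv_sketch(**kwargs):
    """Hypothetical decorator: set environment variables only while the test runs."""
    return mock.patch.dict(os.environ, kwargs)


class EnvOverrideTest(unittest.TestCase):
    @mockenv_sketch(TRANSFORMERS_VERBOSITY="error")
    def test_env_override(self):
        self.assertEqual(os.getenv("TRANSFORMERS_VERBOSITY"), "error")


@mockenv_sketch(MOCKENV_SKETCH_DEMO="1")
def read_demo_flag():
    return os.getenv("MOCKENV_SKETCH_DEMO")
```

	The override is visible only inside the decorated function; once it returns, ``os.environ`` is back to its previous
	state.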

	At times an external program needs to be called, which requires setting ``PYTHONPATH`` in ``os.environ`` to include
	multiple local paths. The helper class :obj:`transformers.testing_utils.TestCasePlus` helps with this:

	.. code-block:: python

	    from transformers.testing_utils import TestCasePlus


	    class EnvExampleTest(TestCasePlus):
	        def test_external_prog(self):
	            env = self.get_env()
	            # now call the external program, passing ``env`` to it

	Depending on whether the test file is under the ``tests`` test suite or ``examples``, it will correctly set up
	``env[PYTHONPATH]`` to include one of these two directories, followed by the ``src`` directory to ensure the testing
	is done against the current repo, and finally whatever ``env[PYTHONPATH]`` was already set to before the test was
	called, if anything.

	This helper method creates a copy of the ``os.environ`` object, so the original remains intact.
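
	The general pattern of building a modified copy of the environment and handing it to an external program can be
	sketched as follows. The function ``build_env`` and the ``/hypothetical/repo/...`` paths are placeholders for this
	illustration; ``get_env`` computes the real paths itself.

```python
import os
import subprocess
import sys


def build_env(extra_paths):
    """Return a copy of os.environ with ``extra_paths`` prepended to PYTHONPATH."""
    env = os.environ.copy()  # the original os.environ is left untouched
    paths = list(extra_paths)
    if env.get("PYTHONPATH"):
        paths.append(env["PYTHONPATH"])  # keep whatever was set before
    env["PYTHONPATH"] = os.pathsep.join(paths)
    return env


env = build_env(["/hypothetical/repo/tests", "/hypothetical/repo/src"])

# call an external program (here: a child Python process), passing ``env`` to it
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PYTHONPATH'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

	The child process sees the extended ``PYTHONPATH``, while the parent's environment stays as it was.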


	Getting reproducible results
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	In some situations you may want to remove randomness from your tests. To get identical, reproducible results, you
	will need to fix the seed:

	.. code-block:: python

	    seed = 42

	    # python RNG
	    import random
	    random.seed(seed)

	    # pytorch RNGs
	    import torch
	    torch.manual_seed(seed)
	    torch.backends.cudnn.deterministic = True
	    if torch.cuda.is_available():
	        torch.cuda.manual_seed_all(seed)

	    # numpy RNG
	    import numpy as np
	    np.random.seed(seed)

	    # tf RNG
	    import tensorflow as tf
	    tf.random.set_seed(seed)
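
	To check that seeding has the intended effect, you can reseed and compare two runs. The sketch below does this for
	the Python RNG, and for numpy when it is installed; ``set_all_seeds`` is a hypothetical helper name, not a function
	provided by the library.

```python
import random


def set_all_seeds(seed):
    """Hypothetical helper: seed the Python RNG and, if installed, numpy's RNG."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # numpy is optional for this sketch
    np.random.seed(seed)


set_all_seeds(42)
first = [random.random() for _ in range(3)]

set_all_seeds(42)
second = [random.random() for _ in range(3)]

# the two runs produce the same sequence because the seed was reset in between
assert first == second
```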

	Debugging tests
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	To start a debugger at the point of the warning, do this:

	.. code-block:: bash

	    pytest tests/test_logging.py -W error::UserWarning --pdb
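
	The ``-W error::UserWarning`` option turns every ``UserWarning`` into an exception, which is what gives ``--pdb`` a
	failure point to stop at. The same filter can be tried in plain Python, as in the following sketch; the functions
	``noisy`` and ``run_with_warnings_as_errors`` are made up for this illustration.

```python
import warnings


def noisy():
    """Hypothetical function that emits a warning and then carries on."""
    warnings.warn("something looks off", UserWarning)
    return "finished"


def run_with_warnings_as_errors(func):
    """Call ``func`` with UserWarning promoted to an exception."""
    with warnings.catch_warnings():
        # programmatic equivalent of the ``error::UserWarning`` filter
        warnings.simplefilter("error", UserWarning)
        return func()


if __name__ == "__main__":
    try:
        run_with_warnings_as_errors(noisy)
    except UserWarning as e:
        print(f"warning raised as an exception: {e}")
```

	The traceback of the raised warning points at the line that called ``warnings.warn``, which is where the debugger
	would open.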



	Testing Experimental CI Features
	-----------------------------------------------------------------------------------------------------------------------

	Testing CI features can be problematic, as it can interfere with the normal CI functioning. Therefore, if a new CI
	feature is to be added, it should be done as follows:

	1. Create a new dedicated job that tests what needs to be tested.
	2. The new job must always succeed so that it gives us a green ✓ (details below).
	3. Let it run for some days to see that a variety of different PR types get to run on it (user fork branches,
	   non-forked branches, branches originating from github.com UI direct file edit, various forced pushes, etc. - there
	   are so many) while monitoring the experimental job's logs (not the overall job status, as it's purposefully always
	   green).
	4. When it's clear that everything is solid, then merge the new changes into existing jobs.

	That way experiments on CI functionality itself won't interfere with the normal workflow.

	Now how can we make the job always succeed while the new CI feature is being developed?

	Some CIs, like TravisCI, support ignore-step-failure and will report the overall job as successful, but CircleCI and
	GitHub Actions as of this writing don't support that.

	So the following workaround can be used:

	1. Put ``set +euo pipefail`` at the beginning of the run command to suppress most potential failures in the bash
	   script.
	2. Make sure the last command succeeds: ``echo "done"`` or just ``true`` will do.

	Here is an example:

	.. code-block:: yaml

	    - run:
	        name: run CI experiment
	        command: |
	            set +euo pipefail
	            echo "setting run-all-despite-any-errors-mode"
	            this_command_will_fail
	            echo "but bash continues to run"
	            # emulate another failure
	            false
	            # but the last command must be a success
	            echo "during experiment do not remove: reporting success to CI, even if there were failures"

	For simple commands you could also do:

	.. code-block:: bash

	    cmd_that_may_fail || true

	Of course, once satisfied with the results, integrate the experimental step or job with the rest of the normal jobs,
	while removing ``set +euo pipefail`` or any other things you may have added to ensure that the experimental job doesn't
	interfere with the normal CI functioning.

	This whole process would have been much easier if only we could set something like ``allow-failure`` for the
	experimental step, and let it fail without impacting the overall status of PRs. But, as mentioned earlier, CircleCI
	and GitHub Actions don't support it at the moment.

	You can vote for this feature and follow its status in these CI-specific threads:

	* `GitHub Actions <https://github.com/actions/toolkit/issues/399>`__
	* `CircleCI <https://ideas.circleci.com/ideas/CCI-I-344>`__
