AI-DRIVEN DATA INTEROPERABILITY USING ZERO-SHOT LEARNERS
In 2017, a new network architecture called the Transformer was introduced, leveraging a self-attention mechanism to achieve state-of-the-art results on sequence-to-sequence tasks in Natural Language Processing (NLP). In this thesis, three deep learning language models based on the Transformer architecture (GPT-2, RoBERTa, and XLM-R) are fine-tuned to extract information, such as name and birth date, from ID cards. The models are trained on Swiss ID cards and are shown to perform well on out-of-sample data such as German or even Finnish ID cards. The models adapt to unseen datasets through unsupervised fine-tuning. These unseen datasets not only contain different label names and value representations but may also include novel labels with no equivalent in the training data. The best model extracts information with an average accuracy of 97% on out-of-sample datasets after unsupervised fine-tuning on 200 examples; zero-shot accuracy is roughly 50%. In addition, the models are resistant to spelling mistakes, and fine-tuning on new datasets does not lead to catastrophic forgetting.
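The extraction task described above can be illustrated with a minimal data-preparation sketch. The thesis does not prescribe an exact input format, so the linearization below (serializing an ID card's label-value pairs into a single text sequence of the kind a language model such as GPT-2 could be fine-tuned on, and reading values back out by label) is an assumption made purely for illustration; the field names and separator layout are hypothetical.

```python
# Hypothetical sketch: serializing ID-card fields into text sequences
# suitable for language-model fine-tuning. Field names and the
# "label: value | label: value" layout are illustrative assumptions,
# not the format used in the thesis.

def linearize(card):
    """Turn a dict of label/value pairs into one training sequence."""
    return " | ".join(f"{label}: {value}" for label, value in card.items())

def extract(sequence, label):
    """Recover the value for a given label from a linearized sequence."""
    for part in sequence.split(" | "):
        key, _, value = part.partition(": ")
        if key == label:
            return value
    return None  # label absent, e.g. a novel label not seen in training

swiss_card = {"Name": "Muster", "Geburtsdatum": "01.01.1990"}
seq = linearize(swiss_card)
print(seq)                    # Name: Muster | Geburtsdatum: 01.01.1990
print(extract(seq, "Name"))   # Muster
```

In the fine-tuned setting, the `extract` step is performed by the language model itself rather than by exact string matching, which is what allows the models to tolerate spelling mistakes and unfamiliar label names.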