Web: http://arxiv.org/abs/2107.06777

Jan. 26, 2022, 2:10 a.m. | Christian Bartz, Hendrik Rätz, Jona Otholt, Christoph Meinel, Haojin Yang

cs.CV updates on arXiv.org

One of the most pressing problems in the automated analysis of historical
documents is the availability of annotated training data. The problem is that
labeling samples is a time-consuming task because it requires human expertise
and thus cannot be automated well. In this work, we propose a novel method to
construct synthetic labeled datasets for historical documents where no
annotations are available. We train a StyleGAN model to synthesize document
images that capture the core features of the original documents. …

arxiv cv data segmentation semantic synthetic data

