Ph.D. Dissertation Defense: Yi Liu
Friday, July 14, 2023
10:00 AM
347 Avery Hall
“Image Processing Powered Convolutional Neural Network for Document Images and Beyond”
Deep learning, specifically Convolutional Neural Networks (CNNs), has achieved success in document image classification. However, the convolutional layer alone cannot replace conventional image processing techniques. Existing approaches use conventional image processing techniques as preprocessing steps for CNNs, but the CNN cannot adapt these steps during training. Integrating the CNN's powerful learning capabilities with these techniques can enable adaptation and enhance classification tasks. We identify three needs for improving the effectiveness of CNNs in document classification. First, preprocessing steps for document images have been task- and dataset-specific, so there is a need to understand the principles for choosing preprocessing strategies for CNNs and document image tasks. Second, CNNs require extensive training data, while manual labeling is expensive. Pseudo-groundtruth (machine labeling) is needed to support CNN-based document image classification. Third, conventional image processing techniques can extract features and perform classification, but these abilities are not integrated with CNN training. Integrating these techniques with CNN training can unlock the potential for better generalization and improved performance in document classification tasks.
This dissertation investigates the impact and potential of image processing techniques on CNN-based approaches. First, we examine the effects of different image processing strategies on CNN models. Second, we develop a pseudo-groundtruth generation method that uses visual-based and text-based information to label document images from a small amount of labeled data. Third, we propose two approaches to integrate conventional image processing techniques into CNN models. The first approach enables adaptive selection of preprocessing strategies using the critic-actor method for CNNs. The second approach introduces a novel projection profile attention layer that highlights layout information to enhance document image classification performance. These investigations provide valuable insights and tools for document image classification. We identify insights for selecting image processing techniques in preprocessing and introduce a pseudo-groundtruth generation workflow for CNN training in document image classification. Our proposed meta image processing strategies, which integrate conventional image processing techniques with CNN training, enable adaptive selection of preprocessing strategies and leverage document layout information to enhance classification performance.
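For readers unfamiliar with the term, a projection profile is conventionally the count of ink pixels along each row or column of a binarized page, which makes text lines and column gaps visible as peaks and valleys. The sketch below is a generic illustration of that conventional technique only, not the attention layer proposed in the dissertation; the function name `projection_profiles` and the toy `page` are assumptions made for the example.

```python
def projection_profiles(image):
    """Return (horizontal, vertical) projection profiles of a binary image.

    image: list of equal-length rows, where 1 is an ink pixel and 0 is background.
    This is an illustrative helper, not code from the dissertation.
    """
    horizontal = [sum(row) for row in image]        # ink count per row
    vertical = [sum(col) for col in zip(*image)]    # ink count per column
    return horizontal, vertical


# Toy 5x6 "page": a wide text line, a blank row, then a short text line.
page = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

horizontal, vertical = projection_profiles(page)
print(horizontal)  # [4, 4, 0, 2, 0]
print(vertical)    # [3, 3, 2, 2, 0, 0]
```

Zeros in the horizontal profile mark blank rows between text lines, and zeros in the vertical profile mark empty margins or column gaps, which is the kind of layout information such profiles expose.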
Committee members:
Committee Chair: Leen-Kiat Soh
Readers: Ashok Samal and Stephen Scott
Outside Representative: Liz Lorang