Ph.D. Dissertation Defense: Chulwoo Pack
Wednesday, April 12, 2023
3:00 PM
Location: 211 Schorr Center
"Enhancing Document Layout Analysis on Historical Newspapers: Visual Representation, Pseudo-ground-truth, and Downscaling"
One objective of document image segmentation is to decompose a digitized document image into a set of homogeneous regions by distinguishing text zones from non-textual ones. While page segmentation on constrained layouts and clean images has been successfully addressed in the past, segmentation on unconstrained layouts and noisy images, such as historical newspapers, remains an open problem for the following reasons. First, heuristic rules built on conventional image processing techniques are less than optimal for covering the variations in layout and image quality present in historical newspapers. Second, robust segmentation performance from a deep convolutional neural network (DCNN) typically requires a vast amount of accurately curated ground-truth, which is costly to produce. Third, DCNNs usually require large historical newspaper images to be downscaled to fit in GPU memory, which often leads to precision loss. Given these challenges, we intend to improve the accuracy and time-efficiency of the process of identifying a set of textual regions in unconstrained, noisy, and large historical newspaper images. First, we investigate whether it is worthwhile to utilize conventional geometric feature-based visual representation for segmenting historical newspapers with and without a DCNN. Second, we investigate whether we can generate effective pseudo-ground-truth using document degradation models to address the need for expensive labeling of datasets. Third, we investigate whether we can adaptively downscale large images while preserving visual cues relevant to the layout structure to mitigate the precision loss. Our research contributes to document image segmentation and analysis in general for noisy document images, especially in the domain of historical newspapers.
Specifically, our research proposes and evaluates novel methods that utilize document image quality assessment and three image processing techniques: (1) region-growing merging based on the Docstrum geometric feature, (2) fusion of a DCNN with Gravity-map, a Voronoi-tessellation-based visual representation, and (3) adaptive image downscaling that combines the strengths of content-independent and content-aware strategies. These methods in turn help generate more accurate spatial information efficiently, requiring fewer computing resources and lower ground-truthing cost.
Committee:
Dr. Leen-Kiat Soh, Advisor
Dr. Ashok Samal
Dr. Stephen Scott
Dr. Elizabeth Lorang