Table Structure Understanding and Its Performance Evaluation
Yalin Wang, Ihsin T. Phillips, and Robert M. Haralick
Abstract
This paper presents a table structure understanding algorithm designed using
optimization methods. The algorithm is probability based: its probabilities
are estimated from geometric measurements made on the various entities in a
large training set. The methodology includes a global parameter optimization
scheme, a novel automatic table ground truth generation system, and a table
structure understanding performance evaluation protocol. On a document data
set containing 518 table entities and 10,934 cell entities, the algorithm
achieved an accuracy rate of 96.76% at the cell level and 98.32% at the
table level.