Exploration and Research on OCR Technology for Standardized Yi Character
DOI:
https://doi.org/10.54097/babd6q58
Keywords:
Standardized Yi character; Optical Character Recognition; Deep learning; OpenCV; Tesseract-OCR.
Abstract
Optical Character Recognition (OCR) is the process of analyzing and recognizing textual information in image files, and it is useful for extracting and collecting standardized Yi text from paper materials. Based on the coding standard and font characteristics of standardized Yi characters, a Long Short-Term Memory (LSTM) neural network model was trained with Tesseract-OCR for standardized Yi character recognition, and its recognition results were validated. A standardized Yi character recognition system was then built in Python using the OpenCV, Pytesseract, and PyQt5 libraries. Experiments show that the system recognizes Yi characters in the Baiti, Songti, Heiti, and Xiheiti typefaces, runs offline, and achieves high recognition accuracy.
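The recognition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the language name `yi` (standing in for a custom LSTM-trained `.traineddata` file), the Otsu binarization step, the page segmentation mode, and the helper names `build_config`, `yi_ratio`, and `recognize` are all assumptions made for illustration. The only facts relied on are that Pytesseract passes `lang` and `config` through to Tesseract, that `--oem 1` selects Tesseract's LSTM engine, and that the Unicode Yi Syllables block covers U+A000 to U+A48C.

```python
"""Sketch of an offline Yi OCR call. Names marked 'assumed' are not from the paper."""

# Unicode Yi Syllables block: U+A000..U+A48C (1165 code points).
YI_SYLLABLES = range(0xA000, 0xA48D)

# Assumed (hypothetical) name of the LSTM-trained Tesseract language file.
ASSUMED_LANG = "yi"


def build_config(psm: int = 6, oem: int = 1) -> str:
    """Build Tesseract options: --oem 1 is the LSTM engine,
    --psm 6 (an assumed choice) treats the image as one uniform text block."""
    return f"--oem {oem} --psm {psm}"


def yi_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are Yi syllables.

    A simple sanity check on OCR output, not an accuracy metric.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(ord(c) in YI_SYLLABLES for c in chars) / len(chars)


def recognize(image_path: str, lang: str = ASSUMED_LANG) -> str:
    """Binarize an image with OpenCV and run Tesseract on it via Pytesseract."""
    # Imported lazily so the helpers above work without OpenCV or Tesseract.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding is an assumed preprocessing step, not taken from the paper.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary, lang=lang, config=build_config())
```

Calling `recognize("page.png")` would require a local Tesseract installation with the trained language file in its `tessdata` directory, which is consistent with the offline operation the abstract describes. `yi_ratio` can then flag outputs that contain few Yi code points.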
License
Copyright (c) 2025 International Journal of Education and Social Development

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.









