Integrating YOLOv8, EasyOCR, and GTTS for text detection in assistive technology for the visually impaired
Keywords:
Text detection, Assistive technology, Visually impaired
Abstract
Technology for visually impaired individuals has advanced, but accessing text-based information remains challenging. Accurate text detection, reliable text recognition, and conversion of the recognized text to speech are all essential. YOLOv8, EasyOCR, and Google Text-to-Speech (GTTS) are current technologies that can be integrated to address this need. This study aims to develop a system that combines YOLOv8 for text detection, EasyOCR for text recognition, and GTTS for text-to-speech conversion, with a focus on improving accessibility for the visually impaired. The system operates in three stages. First, YOLOv8 detects text regions in images in real time. Next, EasyOCR extracts the text from the detected regions. Finally, GTTS converts the recognized text into clear speech. A diverse text image dataset was used to train and test the detection model, and user testing was conducted to assess the system's usability and effectiveness. The developed system detects and reads text with high accuracy and converts it into clear speech. The evaluation showed significant improvements in information accessibility for the visually impaired, with users responding positively to the system's speed, accuracy, and ease of use. Integrating YOLOv8, EasyOCR, and GTTS into a single solution is an innovative approach to text detection, recognition, and speech conversion for visually impaired individuals. The system shows significant potential to enhance independence and quality of life by improving access to text-based information. The study contributes to the development of assistive technology and opens the way for further research into practical applications and system refinement.
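The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`reading_order`, `read_aloud`, `build_real_stages`), the weights file `text_detector.pt`, the output file `speech.mp3`, and the top-to-bottom reading order are all assumptions made for illustration. The stages are passed in as plain callables, so the pipeline logic can be exercised without the three libraries installed; `build_real_stages` shows one plausible way to wire in YOLOv8 (via the `ultralytics` package), EasyOCR, and the `gtts` package.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def reading_order(boxes: List[Box]) -> List[Box]:
    """Sort boxes top-to-bottom, then left-to-right (an assumed heuristic)."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def read_aloud(image,
               detect: Callable[[object], List[Box]],
               recognize: Callable[[object, Box], str],
               speak: Callable[[str], None]) -> str:
    """Run detection, recognition, and speech in sequence.

    Returns the text that was spoken, or an empty string if nothing was read.
    """
    boxes = reading_order(detect(image))                      # stage 1: detection
    texts = [recognize(image, box).strip() for box in boxes]  # stage 2: OCR
    sentence = " ".join(t for t in texts if t)
    if sentence:
        speak(sentence)                                       # stage 3: speech
    return sentence


def build_real_stages(weights: str = "text_detector.pt",  # hypothetical weights file
                      lang: str = "en",
                      out_path: str = "speech.mp3"):       # hypothetical output file
    """Wire the stages to the real libraries (imported lazily).

    Assumes `weights` is a YOLOv8 model trained to detect text regions and
    that images are NumPy arrays of shape (height, width, channels).
    """
    from ultralytics import YOLO
    import easyocr
    from gtts import gTTS

    model = YOLO(weights)
    reader = easyocr.Reader([lang])

    def detect(image) -> List[Box]:
        result = model(image)[0]
        return [tuple(int(v) for v in xyxy) for xyxy in result.boxes.xyxy.tolist()]

    def recognize(image, box: Box) -> str:
        x1, y1, x2, y2 = box
        crop = image[y1:y2, x1:x2]
        # detail=0 makes EasyOCR return only the recognized strings
        return " ".join(reader.readtext(crop, detail=0))

    def speak(text: str) -> None:
        # gTTS writes an MP3 file; playing it back is left to the caller
        gTTS(text=text, lang=lang).save(out_path)

    return detect, recognize, speak
```

With the libraries installed and a trained detector available, a call such as `read_aloud(frame, *build_real_stages())` would run all three stages on one image. Keeping the stages as separate callables also allows each one to be tested or replaced on its own.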
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.