OCR implementation in archiving system at Borobudur Subdistrict using regular expression and TextRank methods
Keywords:
Digitization, Archive, OCR, Regular expression, TextRank
Abstract
Borobudur Subdistrict, as part of a government agency, is required to manage archives, including the registration of incoming letters. Over the past five years, approximately 10,000 incoming letters have accumulated in the Borobudur Subdistrict, and all of them must be registered and digitized to support the implementation of an Electronic-Based Government System. Registration and digitization have so far been carried out with conventional methods, which are time-consuming. This study therefore aims to develop a system that assists archive registration and digitization by combining Optical Character Recognition (OCR) with the Regular Expression and TextRank methods. The system uses OCR to convert physical documents into digital text, Regular Expressions to automatically detect the required text patterns, and TextRank to summarize each document. The results show that the number of archives registered and digitized in one day rose from 20 to 80, a 300% increase in throughput. These results indicate that the system substantially speeds up archive management in the Borobudur Subdistrict and offers an effective solution for the registration and digitization process.
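The two text-processing stages described above can be illustrated with a minimal Python sketch. It is not the system built in the study: the abstract does not give the actual patterns or summarizer settings, so the field names, the regular expressions (written for a typical Indonesian letter header with `Nomor` and `Perihal` lines), the naive sentence splitter, and the parameters `top_n`, `damping`, and `iterations` are all assumptions made for illustration. The OCR stage is left out, and the input is assumed to be text already produced by an OCR engine. The similarity measure follows the word-overlap formula from the original TextRank paper, simplified here to count unique words.

```python
import math
import re

# Hypothetical patterns for a letter header; the study's real patterns are not published here.
FIELD_PATTERNS = {
    "number": re.compile(r"Nomor\s*:\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"\b(\d{1,2}\s+[A-Za-z]+\s+\d{4})\b"),
    "subject": re.compile(r"\b(?:Perihal|Hal)\s*:\s*(.+)", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return the first match of each pattern, or None when a field is absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1).strip() if match else None
    return fields


def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased set of unique word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def similarity(a, b):
    """Shared words divided by the sum of the log sentence lengths."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    return len(words_a & words_b) / denom if denom > 0 else 0.0


def textrank_summary(text, top_n=1, damping=0.85, iterations=50):
    """Rank sentences with weighted PageRank and return the top_n in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    # Symmetric similarity graph without self-loops.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    # Fixed number of power iterations instead of a convergence test.
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    # Made-up OCR output for demonstration only.
    letter = (
        "Nomor : 005/123/2024\n"
        "Perihal : Undangan rapat koordinasi\n"
        "Magelang, 12 Maret 2024\n"
    )
    print(extract_fields(letter))
```

In a pipeline of this kind, the extracted fields would populate the registration record, while the TextRank summary would give archivists a short description of the letter body. Real OCR output is noisy, so production patterns would likely need to tolerate misread characters and varying header layouts.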
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.