Indonesian Volunteers Transcribe and Digitize 20,000 Indigenous Manuscripts
Monday, February 20, 2023
The Wikimedia Foundation will use Transkribus, an AI-driven handwriting recognition tool, to assist in the digitization process. #Infotempo
Indonesian volunteers from Wikimedia have launched a project to digitize and transcribe over 20,000 pages of Indonesian manuscripts. The project, called Wikisource Loves Manuscripts, was launched at the National Library of Indonesia on February 21, 2023, in commemoration of the 24th International Mother Language Day.
Eighty participants, including librarians, academics, representatives of international institutions, and members of the Indonesian Wikimedia and Wikisource volunteer communities, attended the event, which was hosted by Pusat Pengkajian Islam dan Masyarakat (PPIM), a prominent manuscript research institute and a lead partner for the project.
Professor Ismatu Ropi, Executive Director at PPIM, said that over the last five years PPIM has gained experience in digitizing ancient manuscripts in Southeast Asia through the DREAMSEA program.
“PPIM seeks to encourage the digitization of manuscripts to a wider audience through the Wikisource Loves Manuscripts program, where more communities will be involved in utilizing digitized manuscript imagery into digital text that can be further processed, to support the development of infrastructure and cultural research resources,” said Ropi, Tuesday.
The project will digitize manuscripts from three different regions of Indonesia: Bali, Java, and Sumatra. The Balinese Wikimedia community’s initiative in building WikiPustaka, a digital library of Balinese language manuscripts that runs on Wikisource software, inspired the project. More than 3,000 culturally relevant texts were cataloged in an open-access scholarly publication and transcribed on Wikisource.
Carma Citrawati, a Wikimedia contributor who spearheaded the WikiPustaka project, said many manuscript owners in Bali don’t have a catalog for their manuscript collections. This digitization, she believes, will help Balinese people and researchers.
“Through this project, I can assist the Balinese people in preserving their manuscript data on a digital platform, and I have a chance to learn and create manuscript metadata, to retype and to proofread manuscripts,” Citrawati said.
Satdeep Gill, the Wikimedia Foundation’s Program Officer for Culture & Heritage, explained that the Balinese WikiPustaka is already being used in university programs, where young people are learning to type in their own script using a custom-built on-screen keyboard. This project will support multilingual education, which is the theme of this year’s International Mother Language Day.
“Wikisource Loves Manuscripts closely aligns with the Wikimedia Foundation’s effort to improve digital access to reliable and locally relevant sources that are crucial for Wikipedia and Wikimedia projects, as well as the wider internet,” Gill said.
The larger Wikisource Loves Manuscripts project has been funded by the Wikimedia Foundation to promote knowledge equity. The UNESCO Jakarta Office will support the project with its digitization expertise and connections with relevant Indonesian institutions. The Wikimedia Foundation has also partnered with READ-COOP to integrate Transkribus, an AI-driven handwriting recognition tool, to assist in the digitization process.
The Transkribus integration will enable volunteers to train Optical Character Recognition (OCR) models to recognize manuscript text accurately using their own transcriptions and corrections. Rather than having to manually transcribe every manuscript, volunteers can simply check and correct the machine transcriptions. Many other OCR services are available, but they don’t serve underrepresented languages like Balinese.