Semi-automatic Transcription with the Transcript Tool

Transcript Tool screenshot

Our team has developed the Transcript Tool, a platform for semi-automatic, AI-assisted transcription (Szigeti and Héder, 2022). The tool uses Handwritten Text Recognition (HTR) models tailored to a range of cipher symbol systems, as described by Chen et al. (2020) and Souibgui et al. (2022). It simplifies the digitization process, from cropping the image region to be transcribed to improving legibility through binarization and splitting the page into lines through line segmentation. State-of-the-art models are used for these steps.
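The binarization and line segmentation steps mentioned above can be illustrated with a minimal sketch. It does not show how the Transcript Tool itself works: it assumes a greyscale image given as a list of rows of 0-255 values, uses Otsu's classical thresholding instead of a learned model, and finds lines with a simple horizontal projection profile. The function names `otsu_threshold`, `binarize`, and `segment_lines` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level that best separates dark ink from light paper."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_level, best_variance = 0, -1.0
    n_low, sum_low = 0, 0
    for level in range(256):
        n_low += hist[level]
        if n_low == 0:
            continue  # no pixels at or below this level yet
        n_high = total - n_low
        if n_high == 0:
            break  # every pixel is already in the low class
        sum_low += level * hist[level]
        mean_low = sum_low / n_low
        mean_high = (sum_all - sum_low) / n_high
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        variance = n_low * n_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(image):
    """Map a greyscale image to 1 (ink) and 0 (paper), assuming dark ink."""
    threshold = otsu_threshold([v for row in image for v in row])
    return [[1 if v <= threshold else 0 for v in row] for row in image]


def segment_lines(binary, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, for each text line."""
    lines = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y  # a new line begins
        elif not has_ink and start is not None:
            lines.append((start, y))  # the current line ends
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


# Toy page: light paper (220) with two bands of dark ink (30).
page = [[220] * 6 for _ in range(8)]
for row in (1, 2, 5, 6):
    for col in range(1, 5):
        page[row][col] = 30

print(segment_lines(binarize(page)))  # [(1, 3), (5, 7)]
```

A projection profile like this only works for roughly horizontal, well-separated lines; historical manuscripts with skewed or touching lines generally need more robust methods.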

The tool includes models trained on specific ciphers with diverse symbol systems, including the Borg, Copiale, and digit-based ciphers. It also provides two generic models: one trained on the widely used Omniglot dataset (Lake et al., 2015), a standard handwriting-recognition benchmark of handwritten characters from many alphabets, and another trained on the Cipherglot dataset, which is enriched with common cipher symbols such as digits, Zodiac, and alchemical signs (Souibgui et al., 2022).

A distinctive feature of the Transcript Tool is user-driven model refinement: users can fine-tune the AI models by correcting a few lines of transcribed text, which improves accuracy on that specific source. The tool also makes it easier to compare the output of different transcription models, a task that is often difficult but essential for consistent document processing.
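One common way to compare transcription models is to score each model's output against a manually corrected reference line. The sketch below is an illustration under stated assumptions, not the Transcript Tool's own comparison feature: it assumes a transcription is a string of space-separated symbol names, and it measures the symbol error rate as the Levenshtein edit distance divided by the reference length. The names `edit_distance`, `symbol_error_rate`, and `rank_models`, and the sample transcriptions, are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def symbol_error_rate(reference, hypothesis):
    """Edit distance over symbols, normalized by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def rank_models(reference, outputs):
    """Return (model_name, error_rate) pairs, best model first."""
    scored = [(name, symbol_error_rate(reference, text))
              for name, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical reference line and model outputs.
reference = "sun moon 3 7 mars"
outputs = {
    "model_b": "sun moon 8 7",       # one substitution, one deletion
    "model_a": "sun moon 3 7 mars",  # exact match
}
print(rank_models(reference, outputs))  # [('model_a', 0.0), ('model_b', 0.4)]
```

Scoring whole symbol names rather than individual characters keeps multi-character symbol labels (such as `mars`) from being penalized more heavily than single digits.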

Early studies suggest that the models developed in the DECRYPT project outperform the conventional models found in tools such as Transkribus (2024), in particular conventional models trained on standard alphabets and handwriting styles. Even the DECRYPT models have limitations, however: when they encounter an unfamiliar writing system, their output often requires significant manual correction.

References

Szigeti, F., and Héder, M. (2022) The TRANSCRIPT Tool for Historical Ciphers by the DECRYPT Project. In Proceedings of the 5th International Conference on Historical Cryptology, pp. 208–211.

Chen, J., Souibgui, M. A., Fornés, A., and Megyesi, B. (2020) A Web-based Interactive Transcription Tool for Encrypted Manuscripts. In Proceedings of the 3rd International Conference on Historical Cryptology (HistoCrypt 2020), pp. 52–59. Linköping Electronic Press.

Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. (2015) Human-level concept learning through probabilistic program induction. Science 350(6266), pp. 1332–1338. American Association for the Advancement of Science.

Souibgui, M. A., Bensalah, A., Chen, J., Fornés, A., and Waldispühl, M. (2022) A User Perspective on HTR Methods for the Automatic Transcription of Rare Scripts: The Case of Codex Runicus. ACM Journal on Computing and Cultural Heritage. DOI: https://doi.org/10.1145/3519306

Training Videos