Caveat: given my lacklustre tech savviness, our archivists or volunteers are probably the best people to discuss this topic, but here’s an attempt to share what we’ve been up to…
As part of our HLF-funded project, New Perspectives, we are digitising the first six volumes of the NUWT’s journal, The Woman Teacher. Each volume contains 44 or 45 issues. The fact that these women managed to publish a journal every week… in addition to teaching… and campaigning… is making us all seriously re-evaluate how we spend our free time.
We have been beyond fortunate to have a wonderful group of volunteers who recently joined our Archive Team. Since arriving six weeks ago, they have been hard at work making The Woman Teacher accessible and available for research online. Thanks to their efficiency, we are just half a volume away from having all six volumes scanned!
In addition to the basic scan, each copy has Optical Character Recognition (OCR) software applied to it. OCR electronically converts scanned images of handwritten, typewritten or printed text into encoded, searchable text. Essentially, the recognised text sits in an invisible layer behind the page image, which is what makes the scanned document searchable.
Once all six volumes are ready to go, the volunteers will add the metadata, which embeds descriptive information directly into the PDF file. Because the metadata lives inside the file, the important information about the document can never become separated from the PDF itself, which makes the document more (for lack of a better word) ‘findable’.
Once the metadata and the ongoing quality control checks are complete, we can put everything online, making it available to anyone with an internet connection and further widening access to our archive collections.
To view a (searchable!) PDF of the very first issue of The Woman Teacher, follow this link: The Woman Teacher Vol. 1 No. 1
The next paragraph may be a bit of a bore if you’re not interested in the logistics of OCR software…
Since this is a new process for everyone, there has been plenty of trial and error as we and the volunteers experiment with the OCR software to find the best approach. Although we have been using specialist OCR software, it turns out that our fancy scanner’s own built-in OCR actually produces a more accurate text-searchable PDF. However, as anyone involved in digitising archives will likely agree, what works best seems to be decided case by case (or archive by archive).