I converted this reference book from source files on archive.org that were created by scanning the physical book. The "external" reason was to make the book more accessible and easier to use than a 30 MB PDF download:
- To have one entry per page
- To have any associated images on the same page
- To allow access via an indexed list of links
- To have a free text search facility (Still to be done)
- To create a concordance listing of key terms (Still to be done)
The "internal" reason was that I wanted to understand the actual process of conversion:
- What are the steps in the conversion process?
- To what extent can this process be automated?
- What kind of architecture is needed to support automation? (In progress)
- What data model can be used to represent context and content? (In progress)
In summary, I have achieved most of the goals that I set for myself and believe that I can move forward in the remaining areas. During the course of conversion I developed a number of scripts and techniques for data processing, and have sketched out an overall systems architecture that can be used to apply these scripts in an automated fashion. I intend to validate this by converting another work, with a higher level of automation.
I also have some outline academic papers describing this process, which I would like to bring to completion one day!