Client Profile.
An Australian publisher of material from a broad base of medical and paramedical researchers and experts.
Industry: Periodicals – Publishing or Publishing & Printing
Objective.
Collect and interpret data from medical journal documents and convert it from PDF format to XML format.
Solution.
Data from the medical journal documents was keyed into an XML editor and tagged according to the PubMed Central tagging guidelines. The resulting XML file was then validated against the XML schema, which completed the conversion from PDF format to XML format.
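To illustrate what the tagging step produces, here is a minimal Python sketch that builds a small JATS-style article tree of the kind PubMed Central accepts. The function name `build_article`, the sample title and the sample text are hypothetical, and the structure is heavily simplified; a real article carries far more metadata than is shown here.

```python
import xml.etree.ElementTree as ET


def build_article(title, sections):
    """Build a minimal JATS-style article tree (illustrative only).

    sections is a list of (sec_type, heading, paragraphs) tuples.
    """
    article = ET.Element("article")

    # Front matter: only the article title is modelled in this sketch.
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    # Body: one <sec> per article section, each with a title and paragraphs.
    body = ET.SubElement(article, "body")
    for sec_type, heading, paragraphs in sections:
        sec = ET.SubElement(body, "sec", {"sec-type": sec_type})
        ET.SubElement(sec, "title").text = heading
        for text in paragraphs:
            ET.SubElement(sec, "p").text = text
    return article


# Hypothetical sample content standing in for text keyed from a PDF.
article = build_article(
    "Sample article title",
    [
        ("intro", "Introduction", ["Background text keyed from the PDF."]),
        ("methods", "Materials and Methods", ["Method text keyed from the PDF."]),
    ],
)
xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)
```

Building the tree programmatically, rather than concatenating strings, guarantees that the output is well-formed and that special characters in the text are escaped.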
Technology / Software.
After reviewing the client’s requirements, the data entry experts decided to use a PDF reader and the double key-in method on the client’s application, accessed through a remote login facility.
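In the double key-in method, two operators enter the same source text independently and any disagreement between the two entries is flagged for review. The Python sketch below shows one way such a comparison could work, line by line. The function name `compare_keyings` and the sample data are hypothetical; the client application used in the project is not described in the source.

```python
from itertools import zip_longest


def compare_keyings(first, second):
    """Return (line_number, first_text, second_text) for every line
    on which two independent keyings of the same source disagree."""
    mismatches = []
    pairs = zip_longest(first.splitlines(), second.splitlines(), fillvalue="")
    for number, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a != line_b:
            mismatches.append((number, line_a, line_b))
    return mismatches


# Hypothetical entries from two operators keying the same passage.
operator_a = "Aspirin 75 mg daily\nn = 120 patients"
operator_b = "Aspirin 75 mg daily\nn = 102 patients"

print(compare_keyings(operator_a, operator_b))
# [(2, 'n = 120 patients', 'n = 102 patients')]
```

An empty result means the two keyings agree; each reported line would go back to a reviewer, who checks it against the PDF.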
Challenges.
Although highly skilled researchers were assigned to convert the PDF documents into XML format, two major challenges had to be managed:
- Data Integrity Issues – During conversion from PDF to XML, close attention was paid to eliminating errors caused by poor character recognition of uncommon fonts, and to keeping intact the tags that mark sections of the article, such as the introduction, materials and methods, and conclusion.
- Copyright Implications – The right to text-mine the content for commercial purposes was obtained promptly. Care was also taken to avoid plagiarism while converting PDFs intended for human reading into XML.
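The character recognition problem mentioned above can be partly screened for automatically. The Python sketch below flags characters that commonly signal a failed extraction from a PDF with an unusual font: the Unicode replacement character, private-use and unassigned code points, and stray control characters. The function name `find_suspect_characters` and the choice of categories are assumptions made for illustration, not the checks used in the project.

```python
import unicodedata


def find_suspect_characters(text):
    """Return (position, code_point, name) for characters that often
    indicate a font or encoding problem in extracted text."""
    findings = []
    for position, ch in enumerate(text):
        category = unicodedata.category(ch)
        suspect = (
            ch == "\ufffd"                 # replacement character
            or category in ("Co", "Cn")    # private use or unassigned
            or (category == "Cc" and ch not in "\n\r\t")  # stray control
        )
        if suspect:
            name = unicodedata.name(ch, "UNNAMED")
            findings.append((position, f"U+{ord(ch):04X}", name))
    return findings


# Hypothetical extracted text with two damaged characters.
sample = "dose 5 \ufffdg and \ue001"
for finding in find_suspect_characters(sample):
    print(finding)
```

A check like this only finds characters that are visibly broken. A character that was recognised as a different but valid letter still needs the double key-in comparison or a manual proofread.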
Benefits.
- The conversion process improved the efficiency of data access.
- The client was able to retain all hyperlinks from the PDF documents.
- Text could be extracted without images when required.
- The converted output could be edited and formatted for redistribution.
Hi-Tech’s Solution.
The client selected professionals skilled in both data research and data entry from a team of experienced, focused online data entry specialists proficient in understanding and interpreting medical terminology.
The process for converting PDF documents into XML format was defined, discussed and implemented as follows:
- Receipt of Input Data: Collect the PDF files from the website specified by the client.
- Naming the Files: Rename the files in adherence to PubMed guidelines.
- XML conversion: Insert the XML tags as per the XML schema approved by PubMed Central.
- Quality Control: Validate the XML file against the XML schema.
- Style Checker QC: Check the final XML with the online PubMed style checker to confirm that the file conforms to PMC Style as defined in the PMC XML Tagging Guidelines.
- View on PubMed Article Previewer: View the article as it would appear in PubMed Central.
- Dispatch: Upload the XML file to the client’s FTP server.
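The quality control step in the list above can be approximated with a short pre-check run before the file goes to the real validator. The Python sketch below confirms that a file is well-formed XML and that a few expected elements are present. It is a simplified stand-in only: it does not validate against the schema approved by PubMed Central and does not replace the PMC style checker. The function name `check_article` and the list of required paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal set of elements expected in every article.
REQUIRED_PATHS = [
    "front/article-meta/title-group/article-title",
    "body",
]


def check_article(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A file that is not well-formed cannot be checked any further.
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != "article":
        problems.append(f"unexpected root element: {root.tag}")
    for path in REQUIRED_PATHS:
        if root.find(path) is None:
            problems.append(f"missing element: {path}")
    return problems


good = (
    "<article><front><article-meta><title-group>"
    "<article-title>Sample</article-title>"
    "</title-group></article-meta></front><body/></article>"
)
print(check_article(good))                            # []
print(check_article("<article><front/></article>"))   # two missing elements
```

Catching these basic faults locally keeps obviously incomplete files from reaching the schema validation and style checking stages.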
Technology / Software.
The solution met the client’s requirement for the most effective and economical way to publish their data on the web. The XML format is adaptable, uses less space to store information, and is supported across platforms, which made it convenient to create large flat files such as periodic medical journals.