Back to Journals » International Journal of Nanomedicine » Volume 12

An annotated corpus with nanomedicine and pharmacokinetic parameters

Authors Lewinski NA, Jimenez I, McInnes BT

Received 15 March 2017

Accepted for publication 21 July 2017

Published 12 October 2017 Volume 2017:12 Pages 7519—7527


Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Dr Thomas Webster

Nastassja A Lewinski,1 Ivan Jimenez,1 Bridget T McInnes2

1Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA, 2Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA

A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.

Keywords: nanotechnology, informatics, natural language processing, text mining, corpora

Creative Commons License This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.

Download Article [PDF]  View Full Text [HTML][Machine readable]