Citation

Material Information

Title:
Evaluating semantic relations in neural word embeddings with biomedical and general domain knowledge bases
Creator:
Zhiwei Chen
Zhe He
Xiuwen Liu
Jiang Bian
Publisher:
BMC (BMC Medical Informatics and Decision Making)
Publication Date:
2018
Language:
English

Subjects

Subjects / Keywords:
Word embedding (fast)
Semantic relation (fast)
UMLS (fast)
WordNet (fast)

Notes

Abstract:
Background: In the past few years, neural word embeddings have been widely used in text mining. However, the vector representations of word embeddings mostly act as a black box in downstream applications, limiting their interpretability. Even though word embeddings are able to capture semantic regularities in free-text documents, it is not clear how different kinds of semantic relations are represented by word embeddings and how semantically related terms can be retrieved from them.

Methods: To improve the transparency of word embeddings and the interpretability of the applications using them, in this study we propose a novel approach for evaluating the semantic relations in word embeddings using external knowledge bases: Wikipedia, WordNet, and the Unified Medical Language System (UMLS). We trained multiple word embeddings on health-related Wikipedia articles and then evaluated their performance on analogy and semantic relation term retrieval tasks. We also assessed whether the evaluation results depend on the domain of the textual corpora by comparing the embeddings of health-related Wikipedia articles with those of general Wikipedia articles.

Results: In the semantic relation retrieval task, we were able to retrieve semantically related terms. Meanwhile, the two popular word embedding approaches, Word2vec and GloVe, obtained comparable results on both the analogy retrieval task and the semantic relation retrieval task, while dependency-based word embeddings performed much worse on both tasks. We also found that word embeddings trained on health-related Wikipedia articles outperformed those trained on general Wikipedia articles in the health-related relation retrieval tasks.

Conclusion: It is evident from this study that word embeddings can group terms with diverse semantic relations together. The domain of the training corpus does have an impact on the semantic relations represented by word embeddings. We thus recommend using a domain-specific corpus to train word embeddings for domain-specific text mining tasks.

Keywords: Word embedding, Semantic relation, UMLS, WordNet
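The analogy retrieval task described in the abstract is conventionally implemented as vector arithmetic followed by a cosine-similarity nearest-neighbour search over the embedding space. A minimal sketch of that mechanic follows; the toy vocabulary and 3-dimensional vectors here are illustrative assumptions, not the embeddings trained in the paper (which used Word2vec, GloVe, and dependency-based models over Wikipedia text):

```python
import numpy as np

# Toy embedding table for illustration only; in the paper these vectors
# would come from models trained on (health-related) Wikipedia articles.
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.8, 0.1, 0.6]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def analogy(a, b, c, table):
    """Solve a : b :: c : ? by finding the nearest cosine neighbour
    of the offset vector (b - a + c), excluding the query words."""
    target = table[b] - table[a] + table[c]
    best, best_sim = None, -2.0
    for word, vec in table.items():
        if word in (a, b, c):  # the query words themselves are excluded
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("man", "king", "woman", embeddings))  # -> queen
```

The semantic relation retrieval task evaluated in the paper uses the same nearest-neighbour machinery, but checks whether the top-ranked neighbours of a term stand in a known relation (from UMLS or WordNet) to the query term.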
General Note:
Chen et al. BMC Medical Informatics and Decision Making 2018, 18(Suppl 2):65; https://doi.org/10.1186/s12911-018-0630-x; Pages 1-157

Record Information

Source Institution:
UF Special Collections
Holding Location:
UF Special Collections
Rights Management:
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

UFDC Membership

Aggregations:
University of Florida Institutional Repository