DNER Clinical (Named Entity Recognition) from Free Clinical Text to Snomed-CT Concept

WSEAS Transactions on Computers

Print ISSN: 1109-2750
E-ISSN: 2224-2872

Volume 16, 2017

AUTHORS: Ignacio Martinez Soriano, Juan Luis Castro Peña

ABSTRACT: We have developed a new approach to the named entity recognition (NER) problem in specific domains such as the medical environment. The main idea is to recognize clinical concepts in free-text clinical reports. Most of the information contained in the clinical reports of a hospital's Electronic Health Record (EHR) system is written as natural-language free text, so we study the automatic recognition of clinical named entities in such reports. For this kind of text we design a new NER approach that is a hybrid of three techniques: dictionary-based matching, machine learning, and a fuzzy function. First, we apply word2vec, an unsupervised shallow neural network, to the free text of the clinical reports to represent its words as “word vectors”. Second, we use a domain-specific, dictionary-based gazetteer built on the Snomed-CT ontology, which provides the standard clinical code for each clinical concept. To match the correct concept and recognize the named entity as a clinical concept, we use the distance and similarity between the word vectors of the terms in the document and the word vector of the Snomed-CT description term, and apply a fuzzy function, “DNER”, to obtain the best degree of identification for the recognized named entity. We have applied our approach to a dataset of 318,585 clinical reports in Spanish from the emergency service of the Hospital “Rafael Méndez” in Lorca (Murcia), Spain, and the preliminary results are encouraging.

KEYWORDS: Snomed-CT, word2vec, doc2vec, clinical information extraction, skip-gram, medical terminologies, semantic search, named entity recognition, NER, medical entity recognition
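
The pipeline outlined in the abstract (word vectors, a Snomed-CT gazetteer, and a fuzzy degree of identification) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional vectors are invented rather than trained with word2vec, the concept codes are placeholders rather than real Snomed-CT identifiers, and `fuzzy_degree` is a simple piecewise-linear membership function standing in for the paper's "DNER" function, whose actual definition is not given on this page.

```python
import math

# Toy "word vectors" (invented values). In the described approach these
# would come from word2vec trained on the clinical reports.
WORD_VECTORS = {
    "dolor": [0.9, 0.1, 0.0],
    "toracico": [0.1, 0.9, 0.1],
    "pecho": [0.2, 0.8, 0.1],
    "fiebre": [0.0, 0.1, 0.9],
}

# Toy gazetteer: description term -> placeholder concept code.
# These are NOT real Snomed-CT identifiers.
GAZETTEER = {
    "dolor toracico": "CONCEPT-A",
    "fiebre": "CONCEPT-B",
}


def mean_vector(tokens):
    """Average the vectors of the known tokens; None if no token is known."""
    vectors = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuzzy_degree(similarity, low=0.5, high=0.95):
    """Hypothetical membership function: 0 below `low`, 1 above `high`,
    linear in between. The thresholds are arbitrary assumptions."""
    if similarity <= low:
        return 0.0
    if similarity >= high:
        return 1.0
    return (similarity - low) / (high - low)


def recognize(phrase):
    """Return (concept code, degree) for the best-matching gazetteer term."""
    query = mean_vector(phrase.lower().split())
    if query is None:
        return None, 0.0
    best_code, best_degree = None, 0.0
    for term, code in GAZETTEER.items():
        target = mean_vector(term.split())
        if target is None:
            continue
        degree = fuzzy_degree(cosine(query, target))
        if degree > best_degree:
            best_code, best_degree = code, degree
    return best_code, best_degree


if __name__ == "__main__":
    # "pecho" is not in the gazetteer, but its vector is close to "toracico".
    print(recognize("dolor pecho"))
    print(recognize("desconocido"))
```

With these toy values, a phrase whose words are absent from the gazetteer can still be matched to a concept when its vector lies close to that of a description term, which is the intuition behind combining a dictionary with vector similarity.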

WSEAS Transactions on Computers, ISSN / E-ISSN: 1109-2750 / 2224-2872, Volume 16, 2017, Art. #10, pp. 83-91

Copyright © 2017 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0
