A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining

Donghyeon Kim, Jinhyuk Lee, Chan Ho So, Hwisang Jeon, Minbyul Jeong, Yonghwa Choi, Wonjin Yoon, Mujeen Sung and Jaewoo Kang

We propose a neural biomedical entity recognition and multi-type normalization tool (BERN) that uses neural-network-based NER models (BioBERT (Lee et al., 2019)) to recognize known entities and discover new ones. BERN also applies decision rules to identify the types of overlapping entities, and it integrates several named entity normalization models to assign a distinct ID to each recognized entity. BERN provides a RESTful web service for tagging entities in PubMed articles or raw text.

News: Check out BERN2, an improved version of BERN with much faster and more accurate inference!


APIs

https://bern.korea.ac.kr/pubmed/<one or more PMIDs>[/<pubtator or json>]

Single PMID, output in PubAnnotation JSON (default)

https://bern.korea.ac.kr/pubmed/29446767
https://bern.korea.ac.kr/pubmed/29446767/json

Single PMID, output in PubTator

https://bern.korea.ac.kr/pubmed/29446767/pubtator

Multiple PMIDs

https://bern.korea.ac.kr/pubmed/29446767,25681199
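
The URL patterns above can also be assembled programmatically. The following is a minimal sketch that uses only the Python standard library. The helper names build_pubmed_url and fetch_pubmed are hypothetical and not part of BERN, and the sketch assumes the service is reachable at the address shown above.

```python
import json
import urllib.request

BASE_URL = "https://bern.korea.ac.kr/pubmed"  # endpoint taken from the pattern above


def build_pubmed_url(pmids, fmt=None):
    """Build a request URL for one or more PMIDs (hypothetical helper).

    fmt may be None (server default, PubAnnotation JSON), "json", or "pubtator".
    """
    if isinstance(pmids, (str, int)):
        pmids = [pmids]
    ids = ",".join(str(p).strip() for p in pmids)
    if not ids:
        raise ValueError("at least one PMID is required")
    if fmt is None:
        return f"{BASE_URL}/{ids}"
    if fmt not in ("json", "pubtator"):
        raise ValueError("fmt must be 'json' or 'pubtator'")
    return f"{BASE_URL}/{ids}/{fmt}"


def fetch_pubmed(pmids, fmt=None, timeout=60):
    """Fetch annotations: parsed JSON by default, raw text for PubTator."""
    url = build_pubmed_url(pmids, fmt)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return body if fmt == "pubtator" else json.loads(body)
```

For example, build_pubmed_url(["29446767", "25681199"]) produces the multiple-PMID URL listed above, and passing fmt="pubtator" appends the /pubtator suffix.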

Raw Text

For raw text, use the following Python code.

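import requests


def query_raw(text, url="https://bern.korea.ac.kr/plain"):
    """POST raw text to BERN and return the parsed JSON response."""
    response = requests.post(url, data={'sample_text': text})
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    return response.json()


if __name__ == '__main__':
    print(query_raw("YOUR TEXT HERE"))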
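
Once a JSON response is available, the tagged entities can be listed by walking its annotations. The sketch below assumes the response follows the PubAnnotation layout: a top-level "text" string and a "denotations" list whose items carry "span" (with "begin" and "end"), "obj" (the entity type), and "id" (the normalized IDs). These field names are an assumption and should be checked against an actual response. The function name list_entities is hypothetical, and the sample dictionary is made-up illustrative data, not real BERN output.

```python
def list_entities(response):
    """Return (mention, type, ids) tuples from a PubAnnotation-style dict.

    Assumes half-open spans, i.e. text[begin:end] is the mention.
    """
    text = response.get("text", "")
    entities = []
    for den in response.get("denotations", []):
        span = den.get("span", {})
        begin, end = span.get("begin"), span.get("end")
        if begin is None or end is None:
            continue  # skip malformed entries rather than guessing offsets
        entities.append((text[begin:end], den.get("obj"), den.get("id", [])))
    return entities


# Made-up example data for illustration only (not real BERN output).
sample = {
    "text": "Aspirin relieves headache.",
    "denotations": [
        {"span": {"begin": 0, "end": 7}, "obj": "drug", "id": ["EXAMPLE:1"]},
        {"span": {"begin": 17, "end": 25}, "obj": "disease", "id": ["EXAMPLE:2"]},
    ],
}

for mention, entity_type, ids in list_entities(sample):
    print(mention, entity_type, ids)
```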

Bulk Request

Please e-mail us for bulk tagging requests for non-PubMed data.

Download

Annotations (NER and normalization) for more than 18.4 million PubMed articles (from pubmed19n0001 to pubmed19n1200, 2019.5.22) are available for download (compressed, 23.1 GB).

Note that start and end offsets are calculated based on the concatenation of title, a space, and abstract of an article.
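
The offset convention can be applied as in the following minimal sketch. The function name mention_from_offsets is hypothetical, the title and abstract are made-up, and the sketch assumes zero-based offsets with an exclusive end, which is not stated explicitly here and should be verified against the downloaded data.

```python
def mention_from_offsets(title, abstract, start, end):
    """Recover a mention from offsets into title + " " + abstract.

    Assumes zero-based offsets with an exclusive end offset.
    """
    combined = title + " " + abstract
    if not 0 <= start <= end <= len(combined):
        raise ValueError("offsets fall outside the combined text")
    return combined[start:end]


# Made-up article fields for illustration only.
title = "BRCA1 and cancer"
abstract = "Mutations in BRCA1 raise risk."

# The abstract begins at len(title) + 1 because of the joining space.
abstract_start = len(title) + 1
print(mention_from_offsets(title, abstract, 0, 5))
print(mention_from_offsets(title, abstract, abstract_start + 13, abstract_start + 18))
```

Under these assumptions, an offset that falls in the abstract must be shifted by len(title) + 1 before it can be used to index the abstract on its own.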

A list of external entity IDs corresponding to the BERN entity IDs is also available for download.

The data provided by BERN is post-processed and may differ from the most current and accurate data available from the U.S. National Library of Medicine (NLM).

Implementation

Our BERN implementation is available at https://github.com/dmis-lab/bern.

Publication

"A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining" Donghyeon Kim, Jinhyuk Lee, Chan Ho So, Hwisang Jeon, Minbyul Jeong, Yonghwa Choi, Wonjin Yoon, Mujeen Sung and Jaewoo Kang. 2019, IEEE Access

Citation

@article{kim2019neural,
  title={A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining},
  author={Kim, Donghyeon and Lee, Jinhyuk and So, Chan Ho and Jeon, Hwisang and Jeong, Minbyul and Choi, Yonghwa and Yoon, Wonjin and Sung, Mujeen and Kang, Jaewoo},
  journal={IEEE Access},
  volume={7},
  pages={73729--73740},
  year={2019},
  publisher={IEEE}
}

Contacts

If you have any questions or have found a bug, please contact donghyeon@korea.ac.kr and kangj@korea.ac.kr