We propose a neural biomedical entity recognition and multi-type normalization tool (BERN) that uses neural-network-based NER models (BioBERT (Lee et al., 2019)) to recognize known entities and discover new entities. BERN also uses decision rules to identify the types of overlapping entities. In addition, several named entity normalization models are integrated into BERN to assign a distinct ID to each recognized entity. BERN provides a RESTful web service for tagging entities in PubMed articles or raw text.
News: Check out BERN2, an improved version of BERN with much faster and more accurate inference!
https://bern.korea.ac.kr/pubmed/<one or more PMIDs>[/<pubtator or json>]
https://bern.korea.ac.kr/pubmed/29446767
https://bern.korea.ac.kr/pubmed/29446767/json
https://bern.korea.ac.kr/pubmed/29446767/pubtator
https://bern.korea.ac.kr/pubmed/29446767,25681199
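The URL patterns above can also be assembled programmatically. The following is a minimal sketch, not part of the BERN code base: the helper names build_pubmed_url and query_pubmed are hypothetical, and query_pubmed assumes that the service is reachable and answers the /json form with a JSON body.

```python
import json
import urllib.request

# Base URL taken from the example links above.
BERN_PUBMED_URL = "https://bern.korea.ac.kr/pubmed"


def build_pubmed_url(pmids, fmt=None):
    """Return the request URL for one or more PMIDs (hypothetical helper).

    fmt is None (no format suffix), "json", or "pubtator".
    """
    if not pmids:
        raise ValueError("at least one PMID is required")
    if fmt not in (None, "json", "pubtator"):
        raise ValueError("fmt must be None, 'json', or 'pubtator'")
    # Multiple PMIDs are joined with commas, as in the last example link.
    url = BERN_PUBMED_URL + "/" + ",".join(str(p) for p in pmids)
    if fmt is not None:
        url += "/" + fmt
    return url


def query_pubmed(pmids, timeout=60):
    """Fetch annotations; assumes the server returns a JSON body."""
    url = build_pubmed_url(pmids, "json")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_pubmed_url([29446767, 25681199]))
    # prints https://bern.korea.ac.kr/pubmed/29446767,25681199
```

Calling query_pubmed([29446767]) would issue the actual request; it is left out of the demo because the result depends on the live service.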
For raw text, use the following Python code.
import requests

def query_raw(text, url="https://bern.korea.ac.kr/plain"):
    return requests.post(url, data={'sample_text': text}).json()

if __name__ == '__main__':
    print(query_raw("YOUR TEXT HERE"))
Please e-mail us for bulk tagging requests for non-PubMed data.
Click here to download the annotations (NER and normalization) for more than 18.4 million PubMed articles (from pubmed19n0001 to pubmed19n1200 (2019.5.22)) (compressed, 23.1 GB).
Note that start and end offsets are calculated based on the concatenation of title, a space, and abstract of an article.
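To illustrate the offset convention, here is a small sketch that rebuilds the text the offsets refer to and slices a mention out of it. The function names and the sample title and abstract are made up for illustration, and the sketch assumes that the end offset is exclusive (Python slice style); check this against the downloaded annotations before relying on it.

```python
def article_text(title, abstract):
    """Rebuild the string that offsets refer to: title, one space, abstract."""
    return title + " " + abstract


def mention_at(title, abstract, start, end):
    """Return the annotated span, ASSUMING the end offset is exclusive."""
    text = article_text(title, abstract)
    if not 0 <= start <= end <= len(text):
        raise ValueError("offsets fall outside the concatenated text")
    return text[start:end]


if __name__ == "__main__":
    # Made-up example article, not taken from PubMed.
    title = "BRCA1 and breast cancer."
    abstract = "Mutations in BRCA1 raise risk."
    print(mention_at(title, abstract, 10, 23))  # prints breast cancer
    print(mention_at(title, abstract, 38, 43))  # prints BRCA1
```

Because the title is followed by a single joining space, an offset inside the abstract equals the offset within the abstract plus len(title) + 1.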
You can download an external entity ID list corresponding to the BERN entity ID list from here.
The data provided by BERN is post-processed and may differ from the most current/accurate data available from the U.S. National Library of Medicine (NLM).
Our BERN implementation is available at https://github.com/dmis-lab/bern.
@article{kim2019neural,
  title={A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining},
  author={Kim, Donghyeon and Lee, Jinhyuk and So, Chan Ho and Jeon, Hwisang and Jeong, Minbyul and Choi, Yonghwa and Yoon, Wonjin and Sung, Mujeen and Kang, Jaewoo},
  journal={IEEE Access},
  volume={7},
  pages={73729--73740},
  year={2019},
  publisher={IEEE}
}
If you have any questions or have found a bug, please contact donghyeon@korea.ac.kr and kangj@korea.ac.kr.