---
language:
- en
license: mit
widget:
- text: "Lou Gehrig who works for XCorp and lives in New York suffers from [MASK]"
  example_title: "Test for entity type: Disease"
- text: "Overexpression of [MASK] occurs across a wide range of cancers"
  example_title: "Test for entity type: Gene"
- text: "Patients treated with [MASK] are vulnerable to infectious diseases"
  example_title: "Test for entity type: Drug"
- text: "An eGFR level below [MASK] indicates chronic kidney disease"
  example_title: "Test for entity type: Measure"
- text: "In the [MASK], increased daily imatinib dose induced MMR"
  example_title: "Test for entity type: STUDY/TRIAL"
- text: "Paul Erdos died at [MASK]"
  example_title: "Test for entity type: TIME"
tags:
- fill-mask
---


This model was pretrained from scratch, with a custom vocabulary, on PubMed, a clinical trials corpus, and a small subset of BookCorpus.

It was used to perform NER as is, **with no fine-tuning**, as described [in this post](https://ajitrajasekharan.github.io/2021/01/02/my-first-post.html).

[Towards Data Science link](https://twitter.com/TDataScience/status/1486300137366466560?s=20) to the same post

[GitHub link](https://github.com/ajitrajasekharan/unsupervised_NER) to the NER code, which uses this model in an ensemble with bert-base-cased to detect 69 entity types (17 broad entity groups).

 <img src="https://ajitrajasekharan.github.io/images/1.png" width="600">
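
Since the model is used as is, its predictions for a masked position can be inspected directly. The sketch below is one possible way to do that, not the code from the repository above. It assumes the model loads with the standard `transformers` fill-mask pipeline and uses `[MASK]` as its mask token, as the widget examples suggest. `MODEL_ID` is a hypothetical placeholder to replace with this model's Hub id or a local path, and the helper names `top_predictions` and `predict_mask` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder: replace with this model's actual Hub id or a local path.
MODEL_ID = "path-or-hub-id-of-this-model"


def top_predictions(results: List[Dict], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output.

    `results` is the list of dicts that the fill-mask pipeline returns for a
    sentence with a single mask; each dict has `token_str` and `score` keys.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"].strip(), round(float(r["score"]), 4)) for r in ranked[:k]]


def predict_mask(sentence: str, model_id: str = MODEL_ID, k: int = 5) -> List[Tuple[str, float]]:
    """Run the fill-mask pipeline on a sentence containing one [MASK] token."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    return top_predictions(fill(sentence, top_k=k), k)
```

With a real model id in place, `predict_mask("Patients treated with [MASK] are vulnerable to infectious diseases")` returns the top candidate tokens for the masked position along with their scores. These candidates are the kind of signal the linked post uses to infer an entity type without fine-tuning.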