---
license: cc-by-nc-sa-4.0
datasets:
- starmpcc/Asclepius-Synthetic-Clinical-Notes
language:
- en
pipeline_tag: text2text-generation
tags:
- medical
---

# Model Card for Asclepius-7B

This is the official model checkpoint for Asclepius-7B ([arXiv](todo)).

This model is the first publicly shareable clinical LLM, trained with synthetic data.

## Model Details

### Model Description

- **Model type:** Clinical LLM (Large Language Model)
- **Language(s) (NLP):** English
- **License:** CC-BY-NC-SA 4.0
- **Finetuned from model:** LLaMA-7B

### Model Sources

- **Repository:** https://github.com/starmpcc/Asclepius
- **Paper:** TODO (arXiv)
- **Data:** https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes

## Uses

This model can perform the following 8 clinical NLP tasks on clinical notes:

- Named Entity Recognition
- Abbreviation Expansion
- Relation Extraction
- Temporal Information Extraction
- Coreference Resolution
- Paraphrasing
- Summarization
- Question Answering

### Direct Use

[More Information Needed]

### Downstream Use

[More Information Needed]

### Out-of-Scope Use

ONLY USE THIS MODEL FOR RESEARCH PURPOSES! It is not validated for real-world clinical use.

## How to Get Started with the Model

```python
prompt = """You are an intelligent clinical language model.
Below is a snippet of a patient's discharge summary and a following instruction from a healthcare professional.
Write a response that appropriately completes the instruction.
The response should provide the accurate answer to the instruction, while being concise.

[Discharge Summary Begin]
{note}
[Discharge Summary End]

[Instruction Begin]
{question}
[Instruction End]
"""

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("starmpcc/Asclepius-7B")
# AutoModelForCausalLM (rather than AutoModel) is needed so that
# generate() has a language-modeling head to decode with.
model = AutoModelForCausalLM.from_pretrained("starmpcc/Asclepius-7B")

note = "This is a sample note"
question = "What is the diagnosis?"

model_input = prompt.format(note=note, question=question)
input_ids = tokenizer(model_input, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=256)  # cap the response length
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Training Details

### Training Data

https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes

### Training Procedure

- Initial training was conducted with a causal language-modeling objective on synthetic clinical notes.
- The model was then fine-tuned on clinical instruction-response pairs (an illustrative sketch of this step appears in the appendix at the end of this card).
- Our upcoming paper will provide a comprehensive overview of our methods.

#### Training Hyperparameters

- We followed the configuration used in [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca).

#### Speeds, Sizes, Times

- Pre-training (1 epoch): 1h 33m on 8x A100 80GB
- Instruction fine-tuning (3 epochs): 7h 26m on 8x A100 80GB

## Citation

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]
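
## Appendix: Illustrative Instruction Fine-Tuning Format

The training procedure above is described only at a high level. As a concrete illustration of the instruction fine-tuning stage, the sketch below formats a (note, instruction, response) triple and masks the prompt tokens out of the loss, as in Alpaca-style supervised fine-tuning. Reusing the inference prompt template, the field names (`note`, `question`, `answer`), the `-100` label masking, and the maximum sequence length are all assumptions on our part for illustration; this is not the authors' released training code.

```python
# Illustrative sketch only -- NOT the authors' released training code.
# Assumes dataset records with "note", "question", and "answer" fields,
# and Alpaca-style supervised fine-tuning in which the loss is computed
# on the response tokens only (prompt tokens are masked with -100).
import torch
from torch.utils.data import Dataset

# Same template as in "How to Get Started with the Model" above.
PROMPT = """You are an intelligent clinical language model.
Below is a snippet of a patient's discharge summary and a following instruction from a healthcare professional.
Write a response that appropriately completes the instruction.
The response should provide the accurate answer to the instruction, while being concise.

[Discharge Summary Begin]
{note}
[Discharge Summary End]

[Instruction Begin]
{question}
[Instruction End]
"""


class InstructionDataset(Dataset):
    """Turns instruction-response pairs into causal-LM training examples."""

    def __init__(self, examples, tokenizer, max_len=2048):
        self.records = []
        for ex in examples:
            source = PROMPT.format(note=ex["note"], question=ex["question"])
            target = ex["answer"] + tokenizer.eos_token
            source_ids = tokenizer(source, add_special_tokens=False).input_ids
            target_ids = tokenizer(target, add_special_tokens=False).input_ids
            input_ids = (source_ids + target_ids)[:max_len]
            # Mask the prompt so the loss is computed on the response only.
            labels = ([-100] * len(source_ids) + target_ids)[:max_len]
            self.records.append({
                "input_ids": torch.tensor(input_ids),
                "labels": torch.tensor(labels),
            })

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        return self.records[idx]
```

A dataset built this way can be passed to `transformers.Trainer` together with a collator that pads `input_ids` and `labels` to a common length, mirroring the Stanford Alpaca reference implementation cited above.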