---
license: cc-by-nc-sa-4.0
datasets:
- starmpcc/Asclepius-Synthetic-Clinical-Notes
language:
- en
pipeline_tag: text2text-generation
tags:
- medical
---

# Model Card for Asclepius-7B

This is the official model checkpoint for Asclepius-7B ([arXiv](todo)).

This model is the first publicly shareable clinical LLM, trained with synthetic data.

## Model Details

### Model Description

- **Model type:** Clinical LLM (Large Language Model)
- **Language(s) (NLP):** English
- **License:** CC-BY-NC-SA 4.0
- **Finetuned from model:** LLaMA-7B

### Model Sources

- **Repository:** https://github.com/starmpcc/Asclepius
- **Paper:** TODO (arXiv)
- **Data:** https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes

## Uses

This model can perform the following 8 clinical NLP tasks on clinical notes:

- Named Entity Recognition
- Abbreviation Expansion
- Relation Extraction
- Temporal Information Extraction
- Coreference Resolution
- Paraphrasing
- Summarization
- Question Answering

### Direct Use

[More Information Needed]

### Downstream Use

[More Information Needed]

### Out-of-Scope Use

ONLY USE THIS MODEL FOR RESEARCH PURPOSES! It is not validated for real-world clinical use.

## How to Get Started with the Model

```python
prompt = """You are an intelligent clinical language model.
Below is a snippet of a patient's discharge summary and a following instruction from a healthcare professional.
Write a response that appropriately completes the instruction.
The response should provide the accurate answer to the instruction, while being concise.

[Discharge Summary Begin]
{note}
[Discharge Summary End]

[Instruction Begin]
{question}
[Instruction End]
"""

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("starmpcc/Asclepius-7B")
# AutoModelForCausalLM (rather than AutoModel) is needed so that
# generate() has a language-modeling head to decode with.
model = AutoModelForCausalLM.from_pretrained("starmpcc/Asclepius-7B")

note = "This is a sample note"
question = "What is the diagnosis?"

model_input = prompt.format(note=note, question=question)
input_ids = tokenizer(model_input, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=256)  # cap the response length
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Training Details

### Training Data

https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes

### Training Procedure

- Initial training was conducted with a causal language-modeling objective on synthetic clinical notes.
- The model was then fine-tuned on clinical instruction-response pairs (an illustrative sketch of this step appears in the appendix at the end of this card).
- Our upcoming paper will provide a comprehensive overview of our methods.

#### Training Hyperparameters

- We followed the configuration used in [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca).

#### Speeds, Sizes, Times

- Pre-training (1 epoch): 1h 33m on 8x A100 80GB
- Instruction fine-tuning (3 epochs): 7h 26m on 8x A100 80GB

## Citation

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]
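
## Appendix: Illustrative Instruction Fine-Tuning Format

The training procedure above is described only at a high level. As a concrete illustration of the instruction fine-tuning stage, the sketch below formats a (note, instruction, response) triple and masks the prompt tokens out of the loss, as in Alpaca-style supervised fine-tuning. Reusing the inference prompt template, the field names (`note`, `question`, `answer`), the `-100` label masking, and the maximum sequence length are all assumptions on our part for illustration; this is not the authors' released training code.

```python
# Illustrative sketch only -- NOT the authors' released training code.
# Assumes dataset records with "note", "question", and "answer" fields,
# and Alpaca-style supervised fine-tuning in which the loss is computed
# on the response tokens only (prompt tokens are masked with -100).
import torch
from torch.utils.data import Dataset

# Same template as in "How to Get Started with the Model" above.
PROMPT = """You are an intelligent clinical language model.
Below is a snippet of a patient's discharge summary and a following instruction from a healthcare professional.
Write a response that appropriately completes the instruction.
The response should provide the accurate answer to the instruction, while being concise.

[Discharge Summary Begin]
{note}
[Discharge Summary End]

[Instruction Begin]
{question}
[Instruction End]
"""


class InstructionDataset(Dataset):
    """Turns instruction-response pairs into causal-LM training examples."""

    def __init__(self, examples, tokenizer, max_len=2048):
        self.records = []
        for ex in examples:
            source = PROMPT.format(note=ex["note"], question=ex["question"])
            target = ex["answer"] + tokenizer.eos_token
            source_ids = tokenizer(source, add_special_tokens=False).input_ids
            target_ids = tokenizer(target, add_special_tokens=False).input_ids
            input_ids = (source_ids + target_ids)[:max_len]
            # Mask the prompt so the loss is computed on the response only.
            labels = ([-100] * len(source_ids) + target_ids)[:max_len]
            self.records.append({
                "input_ids": torch.tensor(input_ids),
                "labels": torch.tensor(labels),
            })

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        return self.records[idx]
```

A dataset built this way can be passed to `transformers.Trainer` together with a collator that pads `input_ids` and `labels` to a common length, mirroring the Stanford Alpaca reference implementation cited above.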