Model Card for Model longluu/Medical-QA-gatortrons-COVID-QA

The model performs extractive question answering (QA): given a question and a passage of text, it returns the segment of the passage that answers the question.

Model Details

Model Description

The base pretrained model is GatorTronS, which was trained on billions of words of clinical text (https://huggingface.co/UFNLP/gatortronS). I then fine-tuned it on the COVID-QA dataset (https://huggingface.co/datasets/covid_qa_deepset) for extractive question answering, so that it answers a question by locating the answer span within a given text.
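A minimal usage sketch, assuming the checkpoint loads through the standard `question-answering` pipeline of the `transformers` library. The helper name `answer_question` and its `qa` parameter are hypothetical, introduced here only so the pipeline can be swapped for a stub; they are not part of the model or its repository.

```python
def answer_question(question, context, qa=None):
    """Return the answer text extracted from `context` for `question`.

    `qa` is any callable with the question-answering pipeline interface
    (keyword arguments `question` and `context`, returning a dict with an
    "answer" key). When omitted, the fine-tuned checkpoint is loaded from
    the Hugging Face Hub, which requires `transformers`, a backend such as
    PyTorch, and network access.
    """
    if qa is None:
        # Imported lazily so the helper can be used with a stub offline.
        from transformers import pipeline

        qa = pipeline(
            "question-answering",
            model="longluu/Medical-QA-gatortrons-COVID-QA",
        )
    result = qa(question=question, context=context)
    # The pipeline result also carries "score", "start" and "end".
    return result["answer"]
```

Calling `answer_question(question, context)` without a `qa` argument downloads the model on first use; passing a pipeline that was built once avoids reloading it for every question.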

Model Sources

The GitHub code associated with the model can be found here: https://github.com/longluu/Medical-QA-extractive.

Training Details

Training Data

The COVID-QA dataset contains 2,019 question/answer pairs annotated by volunteer biomedical experts on scientific articles regarding COVID-19 and other medical issues. The dataset can be found here: https://github.com/deepset-ai/COVID-QA. The preprocessed data can be found here: https://huggingface.co/datasets/covid_qa_deepset.
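For readers unfamiliar with extractive QA data, the sketch below shows how an answer is typically stored as a character span inside its context. It assumes a SQuAD-style record layout (`question`, `context`, and `answers` with parallel `text` and `answer_start` lists); the record itself is invented for illustration and is not taken from the dataset, and `answer_span` is a hypothetical helper.

```python
# Hypothetical record in an assumed SQuAD-style layout (not real dataset content).
context = "Studies suggest that the virus enters cells through the ACE2 receptor."
answer_text = "the ACE2 receptor"
record = {
    "question": "How does the virus enter cells?",
    "context": context,
    "answers": {
        "text": [answer_text],
        # Character offset of the answer inside the context.
        "answer_start": [context.index(answer_text)],
    },
}


def answer_span(rec, i=0):
    """Return the (start, end) character offsets of the i-th answer."""
    start = rec["answers"]["answer_start"][i]
    end = start + len(rec["answers"]["text"][i])
    return start, end


start, end = answer_span(record)
# Slicing the context with the span recovers the answer text exactly.
assert record["context"][start:end] == answer_text
```

Because the answer is always a slice of the context, the model only has to predict a start and an end position rather than generate free text.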

Training Hyperparameters

The hyperparameters are:
--per_device_train_batch_size 4
--learning_rate 3e-5
--num_train_epochs 2
--max_seq_length 512
--doc_stride 250
--max_answer_length 200
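The `--max_seq_length` and `--doc_stride` settings matter because the source articles are much longer than 512 tokens, so each context has to be split into overlapping windows. The sketch below illustrates that windowing in plain Python, assuming `doc_stride` means the number of tokens shared by consecutive windows (as with the `stride` argument of Hugging Face tokenizers). The function `window_spans` and the figure of 500 context tokens per window are hypothetical; in practice the room left for context depends on the question length and special tokens.

```python
def window_spans(n_tokens, window, overlap):
    """Return (start, end) token spans that cover `n_tokens` tokens.

    Each span holds at most `window` tokens, and consecutive spans share
    `overlap` tokens, so an answer near a boundary appears whole in at
    least one span as long as it is no longer than `overlap`.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap  # how far each new window advances
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += step
    return spans


# Assumed example: a 1,000-token context, 500 tokens of room per window,
# and an overlap of 250 tokens as in --doc_stride 250.
spans = window_spans(1000, window=500, overlap=250)
# spans == [(0, 500), (250, 750), (500, 1000)]
```

A large overlap such as 250 produces more windows per document, which costs more compute but reduces the chance that an answer is cut in half at a window boundary.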

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was trained on the training set and evaluated on the validation set of the COVID-QA data.

Metrics

Here we use two standard metrics for QA tasks: exact match and F1.
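For reference, the sketch below shows how these two metrics are commonly computed, assuming the SQuAD-style definitions (answers are lowercased, stripped of punctuation and English articles, then compared as strings for exact match and as bags of tokens for F1). The card does not state which implementation produced the reported numbers, so treat this as an illustration rather than the evaluation code.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are usually the mean of these per-example values multiplied by 100. A prediction that contains the reference plus extra words scores 0 on exact match but still earns partial F1, which is why F1 is typically the higher of the two numbers.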

Results

{'exact_match': 37.12871287128713, 'f1': 64.90491019877854}

Model Card Contact

Feel free to reach out to me at thelong20.4@gmail.com if you have any questions or suggestions.