Update README.md
README.md
CHANGED
@@ -22,9 +22,11 @@ This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentence
This model is a fine-tune of [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) on the High Court of Australia (HCA) case law in the [Open Australian Legal Corpus](https://huggingface.co/datasets/umarbutler/open-australian-legal-corpus) by Umar Butler. Cases sourced from PDF/OCR were not used.

The cases were split into context chunks of fewer than 512 tokens using the bge-small-en tokeniser and [semchunk](https://github.com/umarbutler/semchunk).

[mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) was used to generate a legal question for each context chunk.

129,137 context-question pairs were used for training.

14,348 context-question pairs were used for evaluation (see the table below for results).
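
The evaluation results are reported as hit rate. The README does not define the metric or the cut-off used, so the following is only a minimal sketch under the common reading: a question counts as a hit when the chunk it was generated from appears among the top-k retrieved chunks. The function name `hit_rate_at_k`, the default `k`, and the toy chunk ids are all hypothetical and not taken from the actual evaluation code.

```python
from typing import Sequence


def hit_rate_at_k(
    ranked_chunk_ids: Sequence[Sequence[str]],
    source_chunk_ids: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of questions whose source chunk is among the top-k retrieved chunks.

    ranked_chunk_ids[i] is the retrieval ranking (best first) for question i;
    source_chunk_ids[i] is the chunk that question i was generated from.
    """
    if len(ranked_chunk_ids) != len(source_chunk_ids):
        raise ValueError("need exactly one ranking per question")
    if not source_chunk_ids:
        return 0.0
    hits = sum(
        1
        for ranking, source in zip(ranked_chunk_ids, source_chunk_ids)
        if source in ranking[:k]
    )
    return hits / len(source_chunk_ids)


# Toy data with made-up chunk ids (assumption: not from the real dataset).
rankings = [
    ["c1", "c7", "c3"],  # question 0: source chunk c1 is ranked first
    ["c4", "c2", "c9"],  # question 1: source chunk c2 is ranked second
    ["c5", "c6", "c8"],  # question 2: source chunk c0 is not retrieved
]
sources = ["c1", "c2", "c0"]

print(hit_rate_at_k(rankings, sources, k=1))
print(hit_rate_at_k(rankings, sources, k=2))
```

In practice the rankings would come from embedding each question and each chunk with the model under test and sorting chunks by similarity; that retrieval step is left out here to keep the sketch self-contained.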

On a 10% subset of the validation dataset, the model reached the following hit rates, shown alongside the base model and OpenAI's default ada embedding model.