khang119966 committed
Commit c38c780 • 1 Parent(s): 0e819ae

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -50,7 +50,7 @@ The fine-tuning dataset was meticulously sampled in part from the following data
 
 ## Benchmarks 📈
 
-Since there are still many different metrics that need to be tested, we chose a quick and simple metric first to guide the development of our model. Our metric is inspired by Lavy[4]. For the time being, we are using GPT-4 to evaluate the quality of answers on two datasets: OpenViVQA and ViTextVQA. Detailed results can be found at the provided . The inputs are images, questions, labels, and predicted answers. The model will return a score from 0 to 10 for the corresponding answer quality. The results table is shown below.
+Since there are still many different metrics that need to be tested, we chose a quick and simple metric first to guide the development of our model. Our metric is inspired by Lavy[4]. For the time being, we are using GPT-4 to evaluate the quality of answers on two datasets: OpenViVQA and ViTextVQA. Detailed results can be found [here](https://huggingface.co/datasets/5CD-AI/Vintern-1B-v2-Benchmark-gpt4o-score). The inputs are images, questions, labels, and predicted answers. The model will return a score from 0 to 10 for the corresponding answer quality. The results table is shown below.
 
 <table border="1" cellspacing="0" cellpadding="5">
   <tr align="center">
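
The changed line above describes a GPT-4-as-judge protocol: each sample's image, question, reference label, and predicted answer are sent to the judge, which returns a quality score from 0 to 10. The commit does not include the evaluation script itself, so the following is only a minimal sketch under assumptions: it uses the OpenAI Python SDK with `gpt-4o` as a stand-in judge, and the prompt wording and the `score_answer` helper are illustrative placeholders, not the authors' code.

```python
# Minimal sketch of the GPT-4-as-judge scoring described in the README line above.
# Assumptions: OpenAI Python SDK, "gpt-4o" as the judge model, placeholder prompt text.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def score_answer(image_path: str, question: str, label: str, prediction: str) -> str:
    """Ask the judge model to rate a predicted VQA answer from 0 to 10."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    prompt = (
        "You are grading a Vietnamese VQA answer. Given the image, the question, "
        "the reference answer, and the model's answer, reply with a single integer "
        "score from 0 (completely wrong) to 10 (perfect).\n"
        f"Question: {question}\n"
        f"Reference answer: {label}\n"
        f"Model answer: {prediction}"
    )

    # Send the image together with the text prompt as a multimodal chat message.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the commit does not name the exact judge model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()
```

Averaging such per-sample scores over the OpenViVQA and ViTextVQA test sets is what would populate the results table referenced in the diff; the detailed per-sample results are the ones linked in the `5CD-AI/Vintern-1B-v2-Benchmark-gpt4o-score` dataset.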