khang119966 committed on
Commit 37f825a • 1 Parent(s): 00f78b0

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -28,12 +28,12 @@ tags:
  <img src="Vintern_logo.png" width="700"/>
  </div>
 
- [\[🤗 HF Demo\]](https://huggingface.co/spaces/khang119966/Vintern-v2-Demo)
-
  ## Vintern-1B-v2 ❄️ (Viet-InternVL2-1B-v2) - The LLaVA 🌋 Challenger
 
  We are excited to introduce **Vintern-1B-v2** the Vietnamese 🇻🇳 multimodal model that combines the advanced Vietnamese language model [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct)[1] with the latest visual model, [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px)[2], CVPR 2024. This model excels in tasks such as OCR-VQA, Doc-VQA, and Chart-VQA,... With only 1 billion parameters, it is **4096 context length** finetuned from the Viet-InternVL-1B model on over 3 million specialized image-question-answer pairs for optical character recognition 🔍, text recognition 🔤, document extraction 📑, and general QA. The model can be integrated into various on-device applications 📱, demonstrating its versatility and robust capabilities.
 
+ [**\[🤗 HF Demo\]**](https://huggingface.co/spaces/khang119966/Vintern-v2-Demo)
+
  ## Model Details
 
  | Model Name | Vision Part | Language Part |