Thai-TrOCR Model

Introduction

ThaiTrOCR is a fine-tuned version of the TrOCR base handwritten model for optical character recognition (OCR) in Thai and English. It reads handwritten text-line images in both languages using the TrOCR architecture, here a Vision Transformer encoder paired with an Electra-based text decoder. The model is compact and lightweight, intended for deployment in resource-constrained environments; its accuracy is reported under Model Performance Comparison.

  • Encoder: TrOCR Base Handwritten
  • Decoder: Electra Small (Trained with Thai corpus)

Training Dataset

  • pythainlp/thai-wiki-dataset-v3
  • pythainlp/thaigov-corpus
  • Salesforce/wikitext

How to Use

Here’s how to use this model in PyTorch:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Load processor and model
processor = TrOCRProcessor.from_pretrained('openthaigpt/thai-trocr')
model = VisionEncoderDecoderModel.from_pretrained('openthaigpt/thai-trocr')

# Load an image
url = 'your_image_url_here'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Process and generate text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
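
When a page has already been split into text-line images, the same three calls can be applied to several lines at once. The sketch below is one possible way to do that, not part of the model's own API: `chunked` and `recognize_lines` are hypothetical helper names, and `batch_size=8` and `max_new_tokens=64` are illustrative defaults rather than tuned values. It assumes `processor` and `model` are the objects loaded above and that every image is already an RGB PIL image.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def recognize_lines(images, processor, model, batch_size=8, max_new_tokens=64):
    """Run OCR on a list of text-line images and return one string per image.

    `processor` and `model` are the objects returned by from_pretrained.
    `batch_size` and `max_new_tokens` are assumptions made for this sketch.
    """
    texts = []
    for batch in chunked(images, batch_size):
        # The processor accepts a list of images and stacks them into one tensor.
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
        # batch_decode returns one decoded string per row of generated_ids.
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

Calling `recognize_lines([image], processor, model)` should give the same result as the single-image snippet, wrapped in a list. Smaller batches trade speed for lower peak memory, which matters on CPU-only machines.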

Model Performance Comparison

The table below compares the character error rate (CER) of ThaiTrOCR, EasyOCR, and Tesseract across document types. The last row gives the adjusted mean score for each model:

| Document Type        | ThaiTrOCR | EasyOCR  | Tesseract |
|----------------------|-----------|----------|-----------|
| Handwritten          | 0.190034  | 0.410738 | 1.032375  |
| PDF Document         | 0.057597  | 0.085937 | 0.761595  |
| PDF Document (EN-TH) | 0.053968  | 0.308075 | 1.061107  |
| Real Document        | 0.147440  | 0.293482 | 0.915707  |
| Scene Text           | 0.134182  | 0.390583 | 2.408704  |
| Adjusted Mean        | 0.123600  | 0.298474 | 1.269101  |
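
To make the gaps in the table easier to read, the scores can be turned into relative error reductions. The sketch below copies the values from the table and computes, for each document type, the fraction of each baseline's error that ThaiTrOCR removes. The helper name `relative_reduction` is an illustration introduced here, not part of the benchmark code.

```python
# Scores copied from the table above (lower is better).
cer = {
    "Handwritten":          {"ThaiTrOCR": 0.190034, "EasyOCR": 0.410738, "Tesseract": 1.032375},
    "PDF Document":         {"ThaiTrOCR": 0.057597, "EasyOCR": 0.085937, "Tesseract": 0.761595},
    "PDF Document (EN-TH)": {"ThaiTrOCR": 0.053968, "EasyOCR": 0.308075, "Tesseract": 1.061107},
    "Real Document":        {"ThaiTrOCR": 0.147440, "EasyOCR": 0.293482, "Tesseract": 0.915707},
    "Scene Text":           {"ThaiTrOCR": 0.134182, "EasyOCR": 0.390583, "Tesseract": 2.408704},
    "Adjusted Mean":        {"ThaiTrOCR": 0.123600, "EasyOCR": 0.298474, "Tesseract": 1.269101},
}


def relative_reduction(baseline_score, model_score):
    """Fraction of the baseline's error removed by the model (0.5 = half the errors)."""
    return (baseline_score - model_score) / baseline_score


reductions = {
    doc_type: {
        baseline: relative_reduction(row[baseline], row["ThaiTrOCR"])
        for baseline in ("EasyOCR", "Tesseract")
    }
    for doc_type, row in cer.items()
}

for doc_type, row in reductions.items():
    print(f"{doc_type:22s} vs EasyOCR: {row['EasyOCR']:.1%}   vs Tesseract: {row['Tesseract']:.1%}")
```

The smallest gap to EasyOCR is on plain PDF documents, where both models already score below 0.1. The largest is on the mixed English-Thai PDFs.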

Notes

  • Scores are character error rates (CER); lower is better.
  • Tesseract was benchmarked with its Thai language model only.
  • Benchmarks were run on a Google Colab CPU runtime.
  • The evaluation data comes from the openthaigpt/thai-ocr-evaluation dataset.
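
For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between the predicted text and the reference, divided by the length of the reference. The sketch below implements that common definition in plain Python. It counts Unicode code points (so Thai vowel and tone marks count as separate characters) and applies no text normalization. Both are assumptions; the benchmark's exact preprocessing is not described here.

```python
def levenshtein(reference, hypothesis):
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by reference length; 0.0 means a perfect match."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Because the denominator is the reference length, a prediction with many spurious extra characters can score above 1, which is why some values in the table exceed 1.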

Sponsors

Authors
