Fine-Tuning PaddleOCR’s Recognition Model For Dummies by A Dummy

Anush Somasundaram
10 min read · Mar 8, 2024

There can only be three possible reasons for you to have landed on this article. The first is that I sent you the link and you're visiting out of pity. The second, and most likely, is that like me you're frustrated with the PaddleOCR documentation, and even after hours of breaking your head on it you haven't been able to start fine-tuning the PaddleOCR recognition model. The third is that you've only just started researching how to fine-tune PaddleOCR; if you belong to this category, I seriously envy you.

Let's lay out a path for you to fine-tune PaddleOCR with ease:

  1. Create the first of two conda environments with Python 3.8 (one for training and one for inference… don't question me on why this is necessary… I found that this works best).
  2. Download the right version of PaddlePaddle based on your GPU setup.
  3. Install the PaddleOCR library directly from the GitHub repository.
  4. Create your dataset and annotation files.
  5. Download the required pre-trained weights from the PaddleOCR GitHub repository and set them up for fine-tuning.
  6. Configure your YAML file with the required training parameters.
  7. Fine-tune the model.
  8. Export to an inference model.
  9. Create the second conda environment with Python 3.8 for the PaddleOCR API and pip install PaddleOCR.
  10. Write an inference script to test out your fine-tuned model.

And that’s about it, let’s look at each step in detail now.

1. Creating the first of two conda environments

OK, let's not get into what conda is. If you need help with that, just Google it; there are thousands of articles that will guide you through downloading and setting up an Anaconda environment.

conda create --name paddleocr python=3.8
conda activate paddleocr

2. Download PaddlePaddle

Now that we have our conda environment created and activated, let's pip install PaddlePaddle. Based on your GPU configuration and CUDA version (if you have an NVIDIA GPU), you will have to select the right version of PaddlePaddle from this website: https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html

I'm using a Mac, so the pip install command I require is:

python -m pip install paddlepaddle==2.6.0 -i https://mirror.baidu.com/pypi/simple
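Once the install finishes, it's worth a quick sanity check before going any further. A minimal snippet (the printed version assumes you installed 2.6.0 as above):

import paddle

# Verify the install and report whether Paddle can run on your CPU/GPU.
paddle.utils.run_check()
print(paddle.__version__)  # e.g. 2.6.0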

3. Download the PaddleOCR package

The next step is to navigate to the python3.8 site-packages folder inside the conda environment. Once you're in that directory, git clone PaddleOCR from its GitHub repository into the site-packages folder.

cd anaconda3/envs/paddleocr/lib/python3.8/site-packages
git clone https://github.com/PaddlePaddle/PaddleOCR

After the clone completes, move into the PaddleOCR directory inside the python3.8 site-packages folder and pip install all the requirements of the PaddleOCR library with the following command.

cd PaddleOCR
pip3 install -r requirements.txt

With this, we have completed the conda environment setup for fine-tuning PaddleOCR.

4. Create your dataset and annotations file

I'm assuming you have a dataset that you want to fine-tune PaddleOCR on. Split the dataset into three parts: train, test, and eval (in whatever ratio your heart desires, but preferably 80:10:10). Create three separate folders with the train, test, and eval images. The next step is to create a CSV for each split. The CSV should contain two columns: the first contains the path to the image, and the second contains the ground truth, i.e. the text present in that image.

[Image: label CSV files with path to image and ground truth]
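If you'd rather script this than do it by hand, here's a minimal sketch; the rows and file names are made up, so substitute your own. Note there's no header row, since the conversion script in the next step splits each line on the comma.

import csv

# Illustrative rows: (image path, ground-truth text).
rows = [
    ("train/img_0001.png", "5412751234123456"),
    ("train/img_0002.png", "4024007151129339"),
]

# One CSV per split (repeat for test and eval). No header row.
with open("train.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)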

Now if you think you're done with prepping the dataset, think again, my friend. PaddleOCR only takes annotations in a .txt file, where the path of the image and the ground truth are separated by a tab. Now you might say, "You moron, why didn't we just do that in the first place instead of creating a CSV?" Trust me on this: the only way the PaddleOCR training script will run without any errors is when you generate the txt file with the help of a script called gen_label.py, found in the PaddleOCR/ppocr/utils folder of the PaddleOCR package. Remember, you have to run this script once per split, as you need to generate annotations for every split in your data.

To run the script, just go into ppocr/utils in the PaddleOCR directory and run:

python gen_label.py --mode="rec" --input_path={path to csv file} --output_label={path to output txt file}

Replace the curly braces in the command with the required paths on your machine. Upon running the script, a .txt file should appear with the labels/annotations that we can feed into the PaddleOCR training script, and it should run without errors. Do this for all three CSV files: train, test, and eval. This is probably the most tedious part of the entire process, but we're almost there, so hang on.

[Image: gen_label.py terminal output]

The content of the generated txt file will look like this:

[Image: generated txt file]
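In plain text, each line of the generated file pairs an image path with its ground truth, separated by a tab (these rows are illustrative):

train/img_0001.png	5412751234123456
train/img_0002.png	4024007151129339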

This completes the process of setting up the dataset.

4.1 Specifying a dictionary

Based on your requirements, you might want to specify a custom dictionary. To do that, all you have to do is create a .txt file and list the characters you need, making sure each character gets its own line. For example, a dictionary of just numbers would look like this:

[Image: custom dictionary]
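Written out, that's simply one character per line:

0
1
2
3
4
5
6
7
8
9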

This step is not compulsory as you can use the default dictionaries in PaddleOCR as well.

This ends the process of setting up the dataset and dictionary for training.

5. Downloading the pre-trained weights of the model to be fine-tuned

PaddleOCR has many pre-trained recognition models for different languages and character sets; please select the one that suits your use case best. The model list can be found in the GitHub repository. Remember, there are both detection and recognition models, so make sure to choose the right one. For the sake of this tutorial, we're going to use the en_PP-OCRv3_rec model. It is an English character recognition model, and it can be downloaded from the repo. Before downloading the model, you will have to create a folder called pretrain_models in the PaddleOCR directory inside site-packages.

[Image: create directory]

Once you have created the folder, download the required pre-trained model into it using wget. If you're copying the link from the models page, always copy the trained-model link and not the inference-model link (in the case of fine-tuning). Run the following from the PaddleOCR directory:

[Image: model list screenshot from the PaddleOCR GitHub repo]
mkdir -p pretrain_models
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar
cd pretrain_models && tar -xf en_PP-OCRv3_rec_train.tar && rm en_PP-OCRv3_rec_train.tar
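Once extracted, you should find the raw training checkpoint inside. The folder typically contains something like this (names may differ slightly between releases):

ls en_PP-OCRv3_rec_train/
# best_accuracy.pdopt  best_accuracy.pdparams  best_accuracy.states  train.log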

Now that we have our pre-trained model downloaded, we can get our configuration file set up.

6. Configure your YAML file with the required training parameters

Every PaddleOCR model available on the GitHub repository also has a default yml file; download it and make the required adjustments. You can change parameters like the number of epochs, the learning rate, the GPU specification, the number of epochs after which the model state is saved, and so on. It is pretty evident at first glance which parameters can be messed around with. You will also have to provide the path to your training data and training labels under the Train section; similarly, provide the path to the evaluation data and evaluation labels in the Eval section. This file also contains the architecture of the model, but do not touch any of that. Let's look at all the lines you will have to change in this file in order to run the training script.

[Image: screenshot of config file]

1. save_model_dir -> Path to the folder where you want to save the checkpoints of the fine-tuned model.

2. epoch_num -> Number of epochs.

3. eval_batch_step -> How often evaluation runs, measured in training iterations (steps), not epochs.

4. character_dict_path (if you have a custom character dictionary) -> Path to your custom character dictionary (ref. step 4.1).

5. use_space_char -> Whether the model should recognize the space character; this varies depending on the type of data you have.

6. save_res_path -> The location where you want to store your results.

[Image: screenshot of config file]

7. data_dir and label_file_list -> Paths to your data and labels file (change these for both the Train and Eval sections).

Make sure your batch size is less than the number of images in your dataset.
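To make this concrete, here is an abridged sketch of the relevant sections of a recognition config. The key names follow the en_PP-OCRv3_rec config, but every path and number below is illustrative; replace them with your own:

Global:
  epoch_num: 500
  save_model_dir: ./output/en_PP-OCRv3_rec_finetune
  eval_batch_step: [0, 2000]
  character_dict_path: ppocr/utils/en_dict.txt
  use_space_char: true
  save_res_path: ./output/rec/predicts.txt

Train:
  dataset:
    data_dir: /path/to/train_images/
    label_file_list: [/path/to/train_labels.txt]
  loader:
    batch_size_per_card: 64

Eval:
  dataset:
    data_dir: /path/to/eval_images/
    label_file_list: [/path/to/eval_labels.txt]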

With this we have our config file ready, and finally we can move on to training the model.

7. Fine Tuning the Model (Yipee! Finally!):

Finally, all the prep for fine-tuning the model is done. Surprisingly, this is probably the second easiest step in the entire process. Make sure you're in the PaddleOCR directory; from here, just change the paths in the command below and hit enter. The script should start running.

python3 tools/train.py -c {path to config file} -o Global.pretrained_model={path to pretrained model}/best_accuracy

The command on my machine looks like this:

python3 tools/train.py -c /Users/software/anaconda3/envs/paddleocr/lib/python3.8/site-packages/PaddleOCR/pretrain_models/en_PP-OCRv3_rec-2.yml -o Global.pretrained_model=/Users/software/anaconda3/envs/paddleocr/lib/python3.8/site-packages/PaddleOCR/pretrain_models/en_PP-OCRv3_rec_train/best_accuracy

(Note: Global.pretrained_model loads just the weights, which is what you want when fine-tuning; Global.checkpoints is for resuming one of your own interrupted training runs.)
[Image: terminal screenshot when the fine-tuning script starts]

You will see the architecture and eval info from your config file printed out before the model starts training. If there are any issues or errors, they will be reported right after this bit. Most of the issues that pop up can be solved with a small tweak to the config file.

This is what the last part of the training log should look like. I've used dummy data for the sake of demonstration, so the numbers you see in the image should not worry you about the performance you will achieve from fine-tuning the model.

[Image: best metrics, path to model, and other info provided by the script]

The best model weights, along with the checkpoints, will be saved in the location you specified in the config file.

WARNING: Make sure you have enough space on your machine, ideally 10+ GB free. If you do not have a ton of space, consider increasing the save_epoch_step parameter in the config file so that you don't end up with a disk-full error.

8. Exporting the fine-tuned model to an inference model

To use the fine-tuned model, we have to export it to an inference model. This can be done with a simple command; just make sure you're in the PaddleOCR directory in site-packages.

python3 tools/export_model.py -c {path to yml file inside the fine-tuned model folder} -o Global.pretrained_model={path to model folder} Global.save_inference_dir={path to inference model folder}

In my case the command is:

python3 tools/export_model.py -c /Users/software/anaconda3/envs/paddleocr/lib/python3.8/site-packages/PaddleOCR/pretrain_models/model/config.yml -o Global.pretrained_model=/Users/software/anaconda3/envs/paddleocr/lib/python3.8/site-packages/PaddleOCR/pretrain_models/model Global.save_inference_dir=/Users/software/anaconda3/envs/paddleocr/lib/python3.8/site-packages/PaddleOCR/pretrain_models/model_inference
[Image: terminal output of export_model.py]

Now you should have a new folder (at the path you provided) with the inference files (inference.pdmodel, inference.pdiparams, and friends) that are required to run the model.

[Image: inference model folder]

9. Creating a second conda environment for inferring with the model

Pretty much the same as the first step, except instead of git cloning the PaddleOCR repository, we're just gonna pip install PaddleOCR.

conda  create  --name paddleocr-inference python=3.8
conda activate paddleocr-inference
python -m pip install paddlepaddle==2.6.0 -i https://mirror.baidu.com/pypi/simple
pip install paddleocr
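If you want a quick smoke test before writing any Python, the pip package also installs a paddleocr command-line tool (the image path here is a stand-in for one of your own):

paddleocr --image_dir /path/to/some_test_image.png --lang en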

That's pretty much it; all we have to do now is write a script to infer with the fine-tuned model.

10. Write an inference script to test out the fine-tuned model.

Just copy the code below and store it in a file called test.py.

from paddleocr import PaddleOCR
from PIL import Image

# Point rec_model_dir at the inference model folder exported in step 8.
# This needs to run only once to download and load the model into memory.
ocr = PaddleOCR(rec_model_dir="/Users/software/anaconda3/envs/paddleocr/lib/python3.8/site-packages/PaddleOCR/pretrain_models/model_inference", use_angle_cls=True, lang='en')

img_path = '/Users/software/Desktop/credit_card_number_rec_Dataset/Val/5.png'
result = ocr.ocr(img_path, cls=True)

# Print every recognized line.
for res in result:
    for line in res:
        print(line)

# Unpack the boxes, texts, and confidence scores from the result.
result = result[0]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
print(txts)
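PaddleOCR also ships a draw_ocr helper for visualizing results. If you want a picture instead of printed tuples, a minimal sketch appended to test.py could look like this (the font path is a placeholder; point it at any .ttf you have):

from paddleocr import draw_ocr

# draw_ocr returns the annotated image as a numpy array.
annotated = draw_ocr(image, boxes, txts, scores, font_path='/path/to/a/font.ttf')
Image.fromarray(annotated).save('result.jpg')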

Change the rec_model_dir argument to the path of your fine-tuned model, and img_path to the path of the image you want to perform OCR on.

Now let's run the script in the paddleocr-inference environment.

conda activate paddleocr-inference

python3 /Users/software/Desktop/test.py

Input image:

[Image: test image]

Terminal Output:

[Image: PaddleOCR terminal output]

OCR reading:

5412751234123456

Now that we're able to infer with our fine-tuned model, we have successfully fine-tuned PaddleOCR's recognition model on a custom dataset. I hope this article helped you get the job done without frying your brain.
