OCR: Optical Character Recognition

Optical character recognition for translation.

OCR stands for optical character recognition. OCR software converts files such as PDFs or faxes into editable documents such as Microsoft Word files. This is done before translation work begins.

OCR can also be used to recreate documents when you have lost your originals.

Conversion

Why do we need to convert the files?

PDF files are not readily editable, and they are not the original source files. If we can’t edit the file, we can’t translate it, so we need to convert it into a format we can edit and modify. This conversion can be done with OCR software, which allows PDF files to be converted into Microsoft Word files for translation. It can also convert faxes to editable formats.

What is OCR?

The Translation Process

Before translation begins, we have to convert the file, including its layout, using Desktop Publishing (DTP) software and OCR software. In effect, we recreate your file in Microsoft Word: layout, text, graphics, everything. We then review the document and fix any sentence or segment errors so that they do not carry over into the translated document. It’s a very labour-intensive process, but once completed it makes the translation process much easier for the translator.

What are the different ways to convert a file?

It depends on how the PDF was produced. If the document contains only text that can be selected, the text can be copied and pasted into a Word document. Some PDFs, however, have security features that prevent copying and pasting.
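As a rough illustration of this decision, the sketch below chooses between copying the text and falling back to OCR. It is only a sketch under stated assumptions: the names `PdfInfo` and `choose_conversion`, the 20-character threshold, and the rule that copy-restricted files go to OCR are all hypothetical, and the per-page text is assumed to have been extracted beforehand by whatever PDF tool is in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PdfInfo:
    # Text already extracted from each page by some PDF tool (assumed input).
    page_texts: List[str]
    # True if the PDF's security settings block copying (assumed flag).
    copy_restricted: bool = False


def choose_conversion(info: PdfInfo, min_chars_per_page: int = 20) -> str:
    """Return "copy_text" if every page has selectable text, else "ocr"."""
    # Assumption: a file that blocks copy and paste is treated like a scan.
    if info.copy_restricted:
        return "ocr"
    # No pages, or any page with (almost) no text, suggests a scanned document.
    if not info.page_texts:
        return "ocr"
    for text in info.page_texts:
        if len(text.strip()) < min_chars_per_page:
            return "ocr"
    return "copy_text"
```

For example, `choose_conversion(PdfInfo(["", ""]))` returns `"ocr"`, because neither page contains any selectable text.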

If the PDF is a scanned document or the text cannot be selected, OCR software is needed. The OCR tool reads each character as an image and tries to convert it into an editable character in Word. It’s about 95% accurate, which is a great starting point, but the output still has to be checked. Keeping the layout is the hard part.
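To put the 95% figure in perspective, here is a small back-of-the-envelope calculation. It assumes the figure means character-level accuracy and that errors are spread evenly through the text; the function name `expected_ocr_errors` and the 2,000-character page are illustrative, not measured values.

```python
def expected_ocr_errors(char_count: int, accuracy: float = 0.95) -> int:
    """Estimate how many characters need correcting after OCR."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Characters expected to be wrong = total characters x error rate.
    return round(char_count * (1.0 - accuracy))


# A hypothetical page of 2,000 characters at 95% accuracy:
print(expected_ocr_errors(2000))  # 100
```

Roughly 100 characters to correct on a single page is why the converted file is reviewed before translation starts.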

Why is it better to avoid OCR?

Cost and time are the main reasons to avoid using OCR. Converting a PDF file takes time, which will affect your delivery schedule. The extra work of converting and checking documents also means we have to charge conversion costs.

Quality is another reason: the conversion process can often reduce the quality of your final documents. Images can degrade slightly depending on the resolutions used. We deliver great quality, but with OCR we can seldom match the quality of an original source document. It is always better to work with original source files for the highest-quality output.