Text recognition, commonly referred to as optical character recognition (OCR), is a widespread technology for recognizing text in images or documents; in our case, the text comes from a live camera feed. OCR has many everyday uses, such as automated data entry, identification, invoice checking, document copying and handwriting recognition, among many others.
Google’s ML Kit text recognition API can recognize text written in any Latin-based character set. For more information on this specific API, check out the overview on Google’s ML Kit website here.
Text Recognition on the VIA VAB-950
We have implemented text recognition on the VIA VAB-950 using Google’s ML Kit. This API allows you to detect text in almost any context, be that traffic signs, a computer screen, a PDF document or handwritten letters, and converts it into text that the machine can use. In this case, boxes are drawn around words and phrases and labelled with the recognized text. Check out the GitHub here in order to download it for yourself, or read our tutorial on installing Google’s ML Kit here. The app uses ML Kit alongside the CameraX API, and the physical camera used is the MIPI CSI-2 Camera.
There are three main classes in this app that are fundamentally important to how it works: “CameraXLivePreviewActivity”, “TextGraphic” and “TextRecognitionProcessor”. The last is arguably the most important, as this is where the main process of taking a camera frame and running it through a TextRecognizer object happens.
“CameraXLivePreviewActivity” is where the CameraX use cases (Preview and Analysis) are set up. You can find a more detailed explanation of this class in the face detection blog here, but essentially it provides a bridge between the live feed coming from the camera and the main logic of the program. The main methods here are “bindAnalysisUseCase” and “bindPreviewUseCase”: the latter provides the back end for the user’s image preview, and the former acts as the bridge. Within “bindAnalysisUseCase”, the case for text recognition is shown here:
Simpler than the face detection case, this case need only set the imageProcessor to a “TextRecognitionProcessor”, nothing more.
The “Log.i” call simply writes a message to your device logs; for this program it is not strictly important.
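As a rough illustration of the pattern just described, the sketch below picks an image processor based on the selected model and logs the choice. It is not the code from the ML Kit sample: it is plain Java with no Android dependencies, and the names “FrameProcessor”, “TextProcessorStub”, “selectProcessor” and the “Text Recognition” label are all hypothetical stand-ins chosen for this example.

```java
import java.util.logging.Logger;

final class AnalysisBindingSketch {
    /** Hypothetical, simplified stand-in for the app's image-processor type. */
    interface FrameProcessor {
        String name();
    }

    /** Hypothetical stand-in for the text recognition processor. */
    static final class TextProcessorStub implements FrameProcessor {
        @Override
        public String name() {
            return "text-recognition";
        }
    }

    private static final Logger LOG = Logger.getLogger("AnalysisBindingSketch");

    // Assumed label for the text recognition entry in the model picker.
    static final String TEXT_RECOGNITION = "Text Recognition";

    /** Returns the processor for the selected model; unknown models are rejected. */
    static FrameProcessor selectProcessor(String selectedModel) {
        switch (selectedModel) {
            case TEXT_RECOGNITION:
                // Informational only, playing the role of Log.i in the Android app.
                LOG.info("Using text recognition processor");
                return new TextProcessorStub();
            default:
                throw new IllegalStateException("Invalid model name: " + selectedModel);
        }
    }

    public static void main(String[] args) {
        FrameProcessor processor = selectProcessor(TEXT_RECOGNITION);
        System.out.println("Selected processor: " + processor.name());
    }
}
```

The point of the sketch is only that the text recognition case has a single job: assigning the processor. Everything else in the analysis use case stays the same regardless of which model is selected.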
Depending on which imageProcessor is selected, the code will process each input frame from the camera (delivered as an ImageProxy):
Then, the two use case methods “bindAnalysisUseCase” and “bindPreviewUseCase” are wrapped together by the “bindAllCameraUseCases” method. Next is the “TextRecognitionProcessor” class. As mentioned before, it is arguably the most important class, but it is also the simplest. Most of the actual recognition is done “under the hood” by Google’s library; this class instantiates a “TextRecognizer” object from the text recognition API and then calls its “process” method, using frames from the live image feed as input.
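The shape of this class can be sketched in plain Java. In ML Kit, the recognizer is obtained through “TextRecognition.getClient” and its “process” method returns an asynchronous Task; the sketch below does not use those classes. Instead, it assumes a hypothetical “Recognizer” interface and uses “CompletableFuture” as a stand-in for the asynchronous result, so that it runs without Android. The method name “detectInFrame” and the result strings are also illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

final class TextProcessorSketch {
    /** Hypothetical stand-in for a recognizer: a frame goes in, text comes out asynchronously. */
    interface Recognizer {
        CompletableFuture<String> process(byte[] frame);
    }

    private final Recognizer recognizer;

    TextProcessorSketch(Recognizer recognizer) {
        this.recognizer = recognizer;
    }

    /** Hands one frame to the recognizer and maps the outcome to a success or failure message. */
    CompletableFuture<String> detectInFrame(byte[] frame) {
        return recognizer.process(frame)
                .handle((text, error) -> error == null
                        ? "Recognized: " + text
                        : "Text detection failed");
    }

    public static void main(String[] args) {
        // A fake recognizer that simply "reads" the frame bytes as ASCII text.
        Recognizer fake = frame ->
                CompletableFuture.completedFuture(new String(frame, StandardCharsets.US_ASCII));
        TextProcessorSketch processor = new TextProcessorSketch(fake);
        byte[] frame = "VIA VAB-950".getBytes(StandardCharsets.US_ASCII);
        System.out.println(processor.detectInFrame(frame).join());
    }
}
```

Keeping the recognizer behind an interface is a choice made for this sketch so that it can be exercised with a fake; the real processor class holds the ML Kit recognizer directly.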
The third main class is the “TextGraphic” class. It contains the methods for drawing boxes around detected text and displaying the recognized text alongside the boxes. As with the code for the face detection graphic, the main method is the “draw” method, which takes a Canvas object as input. Within this, there are three nested “for” loops. The first loop iterates over the text blocks, the second (nested within the first) iterates over the lines within each block, and the third (nested within the second) iterates over the elements within each line. Most of the drawing is done in the second “for” loop. First, a bounding box is drawn around the text; if the image is flipped, the left edge is translated to the right, and vice versa. The text and the boxes are then rendered on the canvas, with the text drawn using the “drawText” method. The code for this is shown here:
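The loop structure and the flip handling can be approximated with a small, self-contained sketch. It is not the “TextGraphic” source: the “Block” and “Line” classes are simplified stand-ins for the recognition result, the canvas is replaced by a list of text commands so the example runs without Android, and “translateX” is an assumed helper that mirrors an x coordinate across the overlay width.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TextOverlaySketch {
    /** Simplified stand-in for one recognized line: text, bounding box and its elements (words). */
    static final class Line {
        final String text;
        final int left, top, right, bottom;
        final List<String> elements;

        Line(String text, int left, int top, int right, int bottom, List<String> elements) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
            this.elements = elements;
        }
    }

    /** Simplified stand-in for one recognized text block. */
    static final class Block {
        final List<Line> lines;

        Block(List<Line> lines) {
            this.lines = lines;
        }
    }

    /** Mirrors an x coordinate across the overlay when the image is flipped. */
    static int translateX(int x, int overlayWidth, boolean flipped) {
        return flipped ? overlayWidth - x : x;
    }

    /** Walks blocks, lines and elements, recording what would be drawn on a canvas. */
    static List<String> draw(List<Block> blocks, int overlayWidth, boolean flipped) {
        List<String> commands = new ArrayList<>();
        for (Block block : blocks) {
            for (Line line : block.lines) {
                // After mirroring, the old right edge becomes the new left edge.
                int x0 = translateX(line.left, overlayWidth, flipped);
                int x1 = translateX(line.right, overlayWidth, flipped);
                int left = Math.min(x0, x1);
                int right = Math.max(x0, x1);
                commands.add("rect " + left + " " + line.top + " " + right + " " + line.bottom);
                commands.add("text " + line.text + " at " + left + "," + line.bottom);
                for (String element : line.elements) {
                    // Per-element handling would go here; this sketch only records the element.
                    commands.add("element " + element);
                }
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        Line line = new Line("VIA VAB-950", 20, 10, 100, 40, Arrays.asList("VIA", "VAB-950"));
        List<Block> blocks = Arrays.asList(new Block(Arrays.asList(line)));
        for (String command : draw(blocks, 640, true)) {
            System.out.println(command);
        }
    }
}
```

Recording draw commands rather than painting keeps the sketch testable; in the app, the equivalent calls go to the Canvas passed into “draw”.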
The algorithm seems to be very accurate with words displayed on a screen, but slightly less so with handwriting. Visit our website’s news section to find all the latest updates and tutorials on the VIA VAB-950!