A Text Recognition Augmented Deep Learning Approach for Logo Identification

Moushumi Medhi1     Shubham Sinha2     Rajiv Ranjan Sahay1
1. Indian Institute of Technology Kharagpur, India     2. Intern, IIT Kharagpur

Abstract

Logo/brand name detection and recognition in unstructured and highly unpredictable natural images has always been a challenging problem. We notice that in most natural images logos are accompanied by associated text. Therefore, we address the problem of logo recognition by first detecting and isolating text of varying color, font size and orientation in the input image using affine-invariant maximally stable extremal regions (MSERs). Using an off-the-shelf OCR engine, we identify the text associated with the logo image. Then an effective grouping technique is employed to combine the remaining stable regions based on the spatial proximity of MSERs. Deep learning has the advantage that optimal features can be learned automatically from image pixel data. This motivates us to feed the clustered logo candidate image regions to a pre-trained deep convolutional neural network (DCNN) to generate a set of complex features, which are then input to a multiclass support vector machine (SVM) for classification. We tested our proposed logo recognition system on logo classes and a non-logo class obtained by combining the FlickrLogos-32 and MICC logo databases, amounting to a total of 23582 training and testing images. Our method yields robust recognition performance, outperforming state-of-the-art techniques: it achieves 97.8% precision, 95.7% recall and 95.7% average accuracy on the combined MICC and FlickrLogos-32 datasets, and 98.6% precision, 97.9% recall and 99.6% average accuracy on the FlickrLogos-32 dataset alone.

Method Overview



MSER Detection
Candidate regions are extracted from grayscale images using Maximally Stable Extremal Regions (MSER). Shape properties like solidity, extent, and eccentricity help filter meaningful regions of interest while removing background noise.
MSER detection and filtering: (a) Original image. (b) Detected MSERs. (c) MSER filtered image using shape parameters.
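As a rough illustration of the shape-based filtering step, the sketch below computes extent, solidity and eccentricity for a region given as a list of pixel coordinates and then applies thresholds. The function names and threshold values are hypothetical and are not the paper's settings. The definitions follow common region-property conventions: extent is area over bounding-box area, solidity is area over convex-hull area, and eccentricity comes from the second central moments.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def shape_properties(pixels):
    """pixels: list of (x, y) integer coordinates belonging to one region."""
    area = len(pixels)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    extent = area / (width * height)
    # Treat each pixel as a unit square so hull area and pixel count are comparable.
    corners = [(x + dx, y + dy) for x, y in pixels for dx in (0, 1) for dy in (0, 1)]
    solidity = area / polygon_area(convex_hull(corners))
    # Eccentricity of the ellipse with the same second central moments.
    mx, my = sum(xs) / area, sum(ys) / area
    mu20 = sum((x - mx) ** 2 for x in xs) / area
    mu02 = sum((y - my) ** 2 for y in ys) / area
    mu11 = sum((x - mx) * (y - my) for x, y in pixels) / area
    common = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    major = (mu20 + mu02 + common) / 2
    minor = (mu20 + mu02 - common) / 2
    eccentricity = math.sqrt(max(0.0, 1 - minor / major)) if major > 0 else 0.0
    return {"extent": extent, "solidity": solidity, "eccentricity": eccentricity}

def keep_region(props, min_solidity=0.3, min_extent=0.2, max_extent=0.9,
                max_eccentricity=0.995):
    """Hypothetical thresholds, for illustration only."""
    return (props["solidity"] >= min_solidity
            and min_extent <= props["extent"] <= max_extent
            and props["eccentricity"] <= max_eccentricity)

# An L-shaped blob passes the filter; a thin 10-pixel line does not.
l_shape = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
line = [(x, 0) for x in range(10)]
print(keep_region(shape_properties(l_shape)), keep_region(shape_properties(line)))
```

In practice the pixel lists would come from an MSER detector; the filter itself only needs the three shape properties.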
Text Region Proposals
Bounding boxes are created around detected MSERs. Overlaps are reduced using Jaccard similarity and graph-based clustering, producing compact proposals for potential text regions.
(a) Detected MSER. (b) Pixels of the detected MSER enclosed by its convex envelope. (c) Binary convex hull image with pixels within the hull set to 1. (d) Bounding box of the binary convex hull image. (e) Bounding box for the MSER in the original image. (f) MBRs of all the filtered MSERs. (g) Elimination of inner MBRs and merging of partially overlapping MBRs. (h) Graph of the connected components of MSER regions.
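The box-level steps (Jaccard similarity, elimination of inner rectangles, and graph-based merging of overlapping rectangles) can be sketched as follows. This is a minimal illustration under assumptions: boxes are `(x1, y1, x2, y2)` tuples, the overlap threshold of 0.1 is hypothetical, and connected components are found with a small union-find rather than whatever graph routine the authors used.

```python
def box_area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def jaccard(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0, iw) * max(0, ih)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0

def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def merge_boxes(boxes, threshold=0.1):
    """Drop boxes nested inside another box, then merge each group of
    overlapping boxes into its enclosing rectangle. `threshold` is a
    hypothetical Jaccard cut-off."""
    boxes = list(dict.fromkeys(boxes))  # remove exact duplicates, keep order
    kept = [b for b in boxes
            if not any(o != b and contains(o, b) for o in boxes)]

    # Graph: one node per box, an edge when the Jaccard index exceeds the
    # threshold. Connected components are found with union-find.
    parent = list(range(len(kept)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if jaccard(kept[i], kept[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, b in enumerate(kept):
        groups.setdefault(find(i), []).append(b)

    merged = [(min(b[0] for b in g), min(b[1] for b in g),
               max(b[2] for b in g), max(b[3] for b in g))
              for g in groups.values()]
    return sorted(merged)

example = [(0, 0, 10, 10), (2, 2, 4, 4), (8, 0, 18, 10), (30, 30, 40, 40)]
print(merge_boxes(example))
```

Here the small box nested inside the first one is discarded, the two partially overlapping boxes are merged into one proposal, and the isolated box is left unchanged.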
Text Detection
A pre-trained CNN (AlexNet) extracts features from text region proposals. These are classified using an SVM, and OCR is applied to recognized words, improving robustness in logo-related text detection.
Stages of text detection: (a) Character proposals. (b) Detected characters. (c) Chain of overlapping characters. (d) Word level detection. (e) OCR text recognition on full image. (f) OCR text recognition on our detected text regions.
Logo Detection & Classification
Remaining non-text MSERs are clustered into logo candidates. AlexNet-based deep features (4096-d) are extracted and classified using an SVM into logo classes, achieving high recognition accuracy.
(a) Architecture of the pre-trained AlexNet used in our work. (b) Features extracted by several hidden layers of the pre-trained CNN from sample input logo images.
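To make the "deep features into a multiclass SVM" step concrete, here is a small self-contained sketch. It does not run AlexNet: short hand-made vectors stand in for the 4096-d features, and the classifier is a simple one-vs-rest linear SVM trained by subgradient descent on the hinge loss. The trainer, its hyperparameters, the toy data and the class labels are all illustrative assumptions, not the authors' implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_binary_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM for labels in {-1, +1}, trained with per-sample
    subgradient steps on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (dot(w, x) + b)
            w = [wi * (1 - lr * lam) for wi in w]  # regularisation shrink
            if margin < 1:  # sample violates the margin: move towards it
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class against all the others."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_binary_svm(X, y, **kwargs)
    return models

def predict(models, x):
    """Pick the class whose SVM gives the largest decision value."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])

# Toy stand-ins for CNN feature vectors (hypothetical data and labels).
TOY_FEATURES = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.9],
]
TOY_LABELS = ["adidas", "adidas", "cocacola", "cocacola", "non-logo", "non-logo"]

models = train_one_vs_rest(TOY_FEATURES, TOY_LABELS)
print(predict(models, [0.8, 0.2, 0.0]))
```

With real data, each row of `TOY_FEATURES` would be replaced by the feature vector extracted from one logo candidate region, and the "non-logo" class would absorb background regions.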

Experiments

Headline results, as stated in the abstract:
MICC + FlickrLogos-32: precision 97.8%, recall 95.7%, average accuracy 95.7%
FlickrLogos-32 only: precision 98.6%, recall 97.9%, average accuracy 99.6%

Text Recognition Augmentation Impact

Without OCR (base CNN+SVM method): 95.74% accuracy
With OCR (enhanced method): 97.17% accuracy

Accuracy and ROC Analysis

Our logo classification method achieved an average accuracy of 95.74% using the proposed CNN + SVM classifier alone. However, since many logos in natural images often appear alongside brand-related text, we integrated text identification into the pipeline. This augmentation helped recover cases where the base classifier failed or where logo candidate regions were not generated correctly, resulting in an improved overall accuracy of 97.17%.
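One way such an augmentation could work is sketched below: when the classifier has no candidate or low confidence, the OCR output is matched against known brand names and a close match is used instead. The fusion rule, both thresholds, and the brand list are hypothetical; the page does not state the exact rule used. String similarity comes from Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

def best_brand_match(ocr_words, brand_names, min_ratio=0.8):
    """Return the brand name most similar to any OCR word, or None if no
    similarity ratio reaches `min_ratio` (a hypothetical cut-off)."""
    best, best_ratio = None, 0.0
    for word in ocr_words:
        for brand in brand_names:
            ratio = SequenceMatcher(None, word.lower(), brand.lower()).ratio()
            if ratio > best_ratio:
                best, best_ratio = brand, ratio
    return best if best_ratio >= min_ratio else None

def fuse(classifier_label, classifier_score, ocr_words, brand_names,
         min_score=0.5):
    """Fall back to the OCR-derived brand when the classifier produced no
    label or a low-confidence one. `min_score` is hypothetical."""
    if classifier_label is None or classifier_score < min_score:
        text_label = best_brand_match(ocr_words, brand_names)
        if text_label is not None:
            return text_label
    return classifier_label

brands = ["adidas", "heineken", "starbucks"]  # illustrative brand list
print(fuse(None, 0.0, ["STARBUCKS", "COFFEE"], brands))
print(fuse("adidas", 0.9, ["heineken"], brands))
```

The fuzzy match tolerates small OCR errors, so a misread such as "Heinekem" would still map to "heineken" under the 0.8 cut-off.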

ROC curves for 32 logo classes show near-ideal separation, reflecting high precision and recall across the dataset.
Accuracy plot of the proposed logo detection algorithm.

ROC curve.
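For readers unfamiliar with how a per-class ROC curve is obtained, the following sketch builds one from decision scores in a one-vs-rest setting and integrates it with the trapezoidal rule. The scores and labels are made-up illustration data, not results from the paper.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate) obtained by
    sweeping the decision threshold from high to low.
    labels: 1 for the class of interest, 0 for every other class."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal integration of the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up scores for one class against the rest.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(area_under_curve(roc_points(scores, labels)))
```

A curve that hugs the top-left corner, with area close to 1, corresponds to the "near-ideal separation" described for the logo classes.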


Assessment of the proposed method using three different types of logo images. (a) Word formation and subsequent OCR application. (b) Logo candidate generation. (c) Final output images.


| Method | Year | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Romberg et al. [3] | 2011 | 0.982 | 0.610 | 0.752 |
| Revaud et al. [18] | 2012 | ≥ 0.980 | 0.726 | 0.834 |
| Romberg et al. [19] | 2013 | 0.999 | 0.832 | 0.908 |
| Farajzadeh [20] | 2015 | 0.931 | 0.857 | 0.892 |
| Iandola et al. [9] - AlexNet | 2015 | 0.735 | Not reported | Not reported |
| Liu et al. [21] | 2016 | 0.962 | 0.864 | 0.910 |
| Oliveira et al. [11] - Caffenet | 2016 | 0.928 | 0.891 | 0.909 |
| Proposed method (only FlickrLogos-32) | - | 0.986 | 0.979 | 0.982 |
On FlickrLogos-32, the proposed method attains the highest recall (0.979) and F-measure (0.982) among the listed methods; its precision (0.986) is exceeded only by Romberg et al. [19] (0.999).
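The F-measure column is the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it for a few rows of the comparison; the function name is ours. Because the tabulated precision and recall are themselves rounded, a recomputed value can occasionally differ from the table in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs taken from the comparison above.
rows = {
    "Farajzadeh [20]": (0.931, 0.857),
    "Liu et al. [21]": (0.962, 0.864),
    "Proposed method": (0.986, 0.979),
}
for name, (p, r) in rows.items():
    print(name, round(f_measure(p, r), 3))
```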



Limitations
A failure case.

Our algorithm may fail on blurred images or when adjacent text lines (noise) are misclassified as part of the logo. It also cannot handle purely text-based logos (e.g., FEDEX, GOOGLE), which are suppressed as regular text and therefore excluded from our database.

Related work on text detection in GUI applications can be found on GitHub: Text Detection in GUI Applications.

BibTeX


@InProceedings{10.1007/978-3-319-68124-5_13,
author="Medhi, Moushumi
and Sinha, Shubham
and Sahay, Rajiv Ranjan",
title="A Text Recognition Augmented Deep Learning Approach for Logo Identification",
booktitle="Computer Vision, Graphics, and Image Processing",
year="2017",
pages="145--156"
}