A Text Recognition Augmented Deep Learning Approach for Logo Identification

Abstract

Detecting and recognizing logos and brand names in unstructured, highly unpredictable natural images has always been a challenging problem. We observe that in most natural images logos are accompanied by associated text. We therefore address logo recognition by first detecting and isolating text of varying color, font size and orientation in the input image using affine-invariant maximally stable extremal regions (MSERs). Using an off-the-shelf OCR engine, we identify the text associated with the logo image. An effective grouping technique then combines the remaining stable regions based on the spatial proximity of the MSERs. Deep learning has the advantage that optimal features can be learned automatically from image pixel data. This motivates us to feed the clustered logo candidate regions to a pre-trained deep convolutional neural network (DCNN) to generate a set of complex features, which are then input to a multiclass support vector machine (SVM) for classification. We tested our proposed logo recognition system on logo classes and a non-logo class obtained by combining the FlickrLogos-32 and MICC logo databases, amounting to a total of 23582 training and testing images. Our method yields robust recognition performance and outperforms state-of-the-art techniques, achieving 97.8% precision, 95.7% recall and 95.7% average accuracy on the combined MICC and FlickrLogos-32 datasets, and 98.6% precision, 97.9% recall and 99.6% average accuracy on the FlickrLogos-32 dataset alone.

Method Overview

MSER Detection

Candidate regions are extracted from grayscale images using Maximally Stable Extremal Regions (MSERs). Shape properties such as solidity, extent, and eccentricity are used to keep meaningful regions of interest and discard background noise.

MSER detection and filtering: (a) Original image. (b) Detected MSERs. (c) MSER filtered image using shape parameters.
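
The shape-based filtering step can be illustrated with a minimal pure-Python sketch. It assumes each region is given as a list of (x, y) pixel coordinates; the function names and the thresholds in `keep_region` are hypothetical and are not the values used in the paper. Solidity is approximated here as the pixel count divided by the area of the convex hull of the pixel corners.

```python
from math import sqrt


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order around the hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def shape_properties(pixels):
    """Solidity, extent and eccentricity of a region given as (x, y) pixels."""
    pixels = list(set(pixels))
    area = len(pixels)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    # Extent: region area divided by the area of its bounding box.
    extent = area / ((max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1))
    # Solidity: region area divided by the convex hull area (hull of pixel corners).
    corners = [(x + dx, y + dy) for x, y in pixels for dx in (0, 1) for dy in (0, 1)]
    solidity = area / polygon_area(convex_hull(corners))
    # Eccentricity of the ellipse with the same second central moments.
    mx, my = sum(xs) / area, sum(ys) / area
    mu20 = sum((x - mx) ** 2 for x in xs) / area
    mu02 = sum((y - my) ** 2 for y in ys) / area
    mu11 = sum((x - mx) * (y - my) for x, y in pixels) / area
    spread = sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    major = (mu20 + mu02 + spread) / 2
    minor = (mu20 + mu02 - spread) / 2
    eccentricity = sqrt(max(0.0, 1 - minor / major)) if major > 0 else 0.0
    return {"solidity": solidity, "extent": extent, "eccentricity": eccentricity}


def keep_region(props, min_solidity=0.3, min_extent=0.2, max_eccentricity=0.995):
    """Hypothetical thresholds, chosen only for illustration."""
    return (props["solidity"] >= min_solidity
            and props["extent"] >= min_extent
            and props["eccentricity"] <= max_eccentricity)
```

With these illustrative thresholds, a compact blob passes the filter while a one-pixel-wide streak is rejected because its eccentricity is close to 1.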

Text Region Proposals

Bounding boxes are created around the detected MSERs. Overlaps are reduced using Jaccard similarity and graph-based clustering, producing compact proposals for potential text regions.

(a) Detected MSER. (b) Pixels of the detected MSER enclosed by its convex envelope. (c) Binary convex hull image with pixels within the hull set to 1. (d) Bounding box of the binary convex hull image. (e) Bounding box for the MSER in the original image. (f) Minimum bounding rectangles (MBRs) of all the filtered MSERs. (g) Elimination of inner MBRs and merging of partially overlapping MBRs. (h) Graph of the connected components of MSER regions.
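
One plausible way to combine Jaccard similarity with graph-based clustering is sketched below. It assumes boxes are (x1, y1, x2, y2) tuples, treats two boxes as connected when their Jaccard index exceeds a hypothetical threshold or when one contains the other, and replaces each connected component with its enclosing rectangle. The names and the threshold value are illustrative, not taken from the paper.

```python
def box_area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def jaccard(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0, iw) * max(0, ih)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def contains(outer, inner):
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def merge_boxes(boxes, threshold=0.1):
    """Group linked boxes with union-find and return one rectangle per group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i], boxes[j]
            if jaccard(a, b) > threshold or contains(a, b) or contains(b, a):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Each connected component becomes the rectangle enclosing all its members.
    return sorted((min(b[0] for b in g), min(b[1] for b in g),
                   max(b[2] for b in g), max(b[3] for b in g))
                  for g in groups.values())


boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (2, 2, 4, 4), (30, 30, 40, 40)]
print(merge_boxes(boxes))  # [(0, 0, 15, 10), (30, 30, 40, 40)]
```

The containment test is there because a small box nested inside a large one can have a very low Jaccard index and would otherwise survive as a separate proposal.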

Text Detection

A pre-trained CNN (AlexNet) extracts features from text region proposals. These are classified using an SVM, and OCR is applied to recognized words, improving robustness in logo-related text detection.

Stages of text detection: (a) Character proposals. (b) Detected characters. (c) Chain of overlapping characters. (d) Word level detection. (e) OCR text recognition on full image. (f) OCR text recognition on our detected text regions.
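
The step from detected characters to word-level boxes could look roughly like the sketch below. It is an assumption-laden illustration, not the paper's procedure: character boxes are (x1, y1, x2, y2) tuples on a single text line, and two hypothetical parameters (`max_gap_ratio`, `min_v_overlap`) decide when a character joins the current word.

```python
def vertical_overlap(a, b):
    """Vertical overlap of two boxes as a fraction of the shorter box height."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(0, overlap) / shorter if shorter > 0 else 0.0


def chain_characters(char_boxes, max_gap_ratio=0.5, min_v_overlap=0.5):
    """Chain character boxes left to right into word boxes (single text line)."""
    words = []
    for box in sorted(char_boxes):
        if words:
            last = words[-1]
            gap = box[0] - last[2]        # horizontal distance to the current word
            height = last[3] - last[1]
            if (gap <= max_gap_ratio * height
                    and vertical_overlap(last, box) >= min_v_overlap):
                # Extend the current word so that it encloses the new character.
                words[-1] = (min(last[0], box[0]), min(last[1], box[1]),
                             max(last[2], box[2]), max(last[3], box[3]))
                continue
        words.append(tuple(box))
    return words


chars = [(0, 0, 8, 10), (9, 0, 17, 10), (18, 1, 26, 11), (60, 0, 68, 10)]
print(chain_characters(chars))  # [(0, 0, 26, 11), (60, 0, 68, 10)]
```

Each resulting word box would then be cropped and passed to the OCR engine. Images with several text lines would need the characters to be split by line first, which this sketch does not attempt.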

Logo Detection & Classification

Remaining non-text MSERs are clustered into logo candidates. AlexNet-based deep features (4096-d) are extracted and classified using an SVM into logo classes, achieving high recognition accuracy.

(a) Architecture of the pre-trained AlexNet used in our work. (b) Features extracted by several hidden layers of the pre-trained CNN from sample input logo images.
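
To make the "deep features into a multiclass SVM" step concrete, here is a small self-contained sketch of a one-vs-rest linear SVM trained by subgradient descent on the hinge loss. Everything in it is an assumption made for illustration: tiny 3-d vectors stand in for the 4096-d AlexNet features, the class names are invented, and the solver and hyperparameters are not those of the paper, which does not specify them here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_binary_svm(features, targets, epochs=300, lr=0.1, reg=0.01):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss.

    `targets` holds +1.0 or -1.0 for each feature vector.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, targets):
            violated = y * (dot(w, x) + b) < 1.0
            w = [wi * (1.0 - lr * reg) for wi in w]  # regularisation shrink
            if violated:                              # hinge-loss subgradient step
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def train_one_vs_rest(features, labels):
    """Train one binary SVM per class (that class against all the others)."""
    models = {}
    for name in sorted(set(labels)):
        targets = [1.0 if lab == name else -1.0 for lab in labels]
        models[name] = train_binary_svm(features, targets)
    return models


def predict(models, x):
    """Return the class whose SVM gives the highest decision score."""
    return max(models, key=lambda name: dot(models[name][0], x) + models[name][1])


# Toy stand-ins for deep features of logo candidate regions (hypothetical data).
features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
            [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
            [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
labels = ["logo-A", "logo-A", "logo-B", "logo-B", "no-logo", "no-logo"]
models = train_one_vs_rest(features, labels)
print(predict(models, [0.95, 0.05, 0.0]))
```

In a real pipeline the feature vectors would come from a fully connected layer of the pre-trained network, and an established SVM library would replace this hand-written trainer.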

Experiments

[Metric panels: Precision, Recall, F-score, and Accuracy, each shown for MICC + Flickr-32 and for Flickr-32; the numeric values are not present in this text.]

Text Recognition Augmentation Impact

Without OCR (base CNN+SVM method): 95.74%

With OCR (enhanced method): 97.17%

Accuracy and ROC Analysis

Our logo classification method achieved an average accuracy of 95.74% using the proposed CNN + SVM classifier alone. However, since logos in natural images often appear alongside brand-related text, we integrated text identification into the pipeline. This augmentation recovered cases where the base classifier failed or where logo candidate regions were not generated correctly, raising the overall accuracy to 97.17%.
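
The text above does not spell out how the OCR output and the classifier decision are combined, so the following is only one possible fusion rule, sketched under stated assumptions: when the classifier returns no logo (or no candidate region exists), OCR words are fuzzily matched against a list of brand names. The function names, the `no-logo` label, the brand list, and the cutoff are all hypothetical; `difflib.get_close_matches` is from the Python standard library.

```python
from difflib import get_close_matches


def match_brand(ocr_words, brand_names, cutoff=0.8):
    """Return the first brand name that closely matches an OCR word, else None."""
    by_lower = {name.lower(): name for name in brand_names}
    for word in ocr_words:
        hits = get_close_matches(word.lower(), list(by_lower), n=1, cutoff=cutoff)
        if hits:
            return by_lower[hits[0]]
    return None


def fuse(classifier_label, ocr_words, brand_names, non_logo="no-logo"):
    """Fall back to OCR text when the classifier found no logo.

    `classifier_label` is None when no logo candidate region was generated.
    """
    if classifier_label is None or classifier_label == non_logo:
        return match_brand(ocr_words, brand_names) or non_logo
    return classifier_label


brands = ["Starbucks", "Heineken"]  # illustrative brand list
print(fuse(None, ["Drink", "STARBUCKS", "coffee"], brands))  # Starbucks
print(fuse("Heineken", [], brands))                          # Heineken
```

A stricter variant could also let the OCR result override a low-confidence classifier decision, but that would require classifier scores, which are not described here.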

ROC curves for 32 logo classes show near-ideal separation, reflecting high precision and recall across the dataset.

Accuracy plot of the proposed logo detection algorithm.

ROC curve.

Assessment of the proposed method using three different types of logo images. (a) Word formation and subsequent OCR application. (b) Logo candidate generation. (c) Final output images.

Method	Year	Precision	Recall	F-Measure
Romberg et al. [3]	2011	0.982	0.610	0.752
Revaud et al. [18]	2012	≥ 0.980	0.726	0.834
Romberg et al. [19]	2013	0.999	0.832	0.908
Farajzadeh [20]	2015	0.931	0.857	0.892
Iandola et al. [9] - AlexNet	2015	0.735	Not reported	Not reported
Liu et al. [21]	2016	0.962	0.864	0.910
Oliveira et al. [11] - Caffenet	2016	0.928	0.891	0.909
Proposed method (only FlickrLogos-32)	–	0.986	0.979	0.982

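While earlier methods made valuable progress, our approach attains the highest recall (0.979) and F-measure (0.982) among the compared methods while keeping precision at 0.986, close to the best reported value of 0.999 [19].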
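
The F-measure column in the table is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short check below assumes that definition and recomputes two rows from their published precision and recall values.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Proposed method on FlickrLogos-32: P = 0.986, R = 0.979
print(round(f_measure(0.986, 0.979), 3))  # 0.982
# Liu et al. [21]: P = 0.962, R = 0.864
print(round(f_measure(0.962, 0.864), 3))  # 0.91
```

Small differences in the third decimal for other rows can arise because the published precision and recall values are themselves rounded.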

A failure case.

Limitations

Our algorithm may fail on blurred images or when adjacent text lines (noise) are misclassified as part of the logo. It also cannot handle purely text-based logos (e.g., FEDEX, GOOGLE), which are suppressed as regular text and therefore excluded from our database.

BibTeX


@InProceedings{10.1007/978-3-319-68124-5_13,
author="Medhi, Moushumi
and Sinha, Shubham
and Sahay, Rajiv Ranjan",
title="A Text Recognition Augmented Deep Learning Approach for Logo Identification",
booktitle="Computer Vision, Graphics, and Image Processing",
year="2017",
pages="145--156"
}