for large-scale retrieval applications
Frederick Tung Julieta Martinez Holger H. Hoos James J. Little
University of British Columbia
{ftung,julm,hoos,little}@cs.ubc.ca
Abstract R={ , , }
We explore a novel paradigm in learning binary codes
Database ( y1 , ), ( y2 , ) , ( y3 , ), ( y4 , )
for large-scale image retrieval applications. Instead of
learning a single globally optimal quantization model as
in previous approaches, we encode the database points in
a data-specific manner using a bank of quantization mod- Rotate , sgn
Query xq yq Compute dhamm(yq , y3)
els. Each individual database point selects the quantization
model that minimizes its individual quantization error. We
apply the idea of a bank of quantization models to data- Rotate , sgn
yq Compute dhamm(yq , y2),
independent and data-driven hashing methods for learn- dhamm(yq , y4)
ing binary codes, obtaining state-of-the-art performance on
Rotate , sgn
three benchmark datasets.
yq Compute dhamm(yq , y1)
Figure 1. Toy example illustrating an online query with a bank
of random rotations. Distances are computed adaptively, using
1. Introduction each database point’s selected rotation. Note that we perform K
rotations but only the usual n Hamming distance computations
The continuing growth of online image and video collec- (where n is the number of database points).
tions has motivated many advances in algorithms for large-
scale recognition and retrieval. The goal of large-scale re-
trieval algorithms is to support efficient and accurate query- supported directly in modern hardware. Techniques for fast
ing into the collection under memory and time constraints. exact search in Hamming space [17] can further accelerate
Methods for large-scale retrieval may be divided into two retrieval. Finally, hashing methods produce binary codes
main groups: hashing methods [3, 4, 6, 9, 12, 13, 15, 20, 21, that can be used directly in downstream clustering or clas-
22, 23] and lookup-based methods [2, 7, 8, 10, 16]. Hashing sification modules, which is not possible with lookup ta-
methods learn a mapping from high-dimensional feature de- ble methods [5]. In this paper, we propose a novel hash-
scriptors to compact binary codes such that the locality rela- ing paradigm in learning binary codes for large-scale re-
tionships in the original feature space are closely preserved trieval applications. Previous hashing methods for learn-
in the reduced Hamming space. In large-scale applications, ing binary codes optimize for a single global quantization
binary codes greatly reduce both memory requirements and model. In contrast, we employ a bank of multiple quan-
query time. Lookup-based methods adaptively quantize the tization models, and encode the database points in a data-
feature space and create a lookup table from cluster cen- specific manner. Each individual database point selects the
troids to full-dimensional feature descriptors. quantization model that minimizes its individual quantiza-
Which group of methods to choose is application depen- tion error. In other words, we take quantization error min-
dent. State-of-the-art lookup-based methods tend to achieve imization to the extreme case and quantize the database in
higher recall at the cost of longer query times [6]. Hash- a data point specific manner. The framework can accom-
ing methods can take advantage of hardware-accelerated modate both data-independent (Section 3.1) and data-driven
Hamming distance computation: the Hamming distance be- (Section 3.2) models. We show that the proposed approach
tween two binary codes can be computed by performing yields state-of-the-art hashing performance on three stan-
an XOR and counting the ones, which is a fast operation dard retrieval benchmarks (Section 4).
, 2. Related work zation error (Eq. 1). However, quantization error may
still be high for individual data points.
Retrieval performance has been shown to be closely re-
lated to quantization error [2, 4, 6]. Therefore, it is useful 2. For bit ranges of 32 to 128, several bits can be dis-
to view retrieval approaches as lossy compression, whose carded with minimal effects on quantization perfor-
objective is to reduce the quantization error. For a set of mance.
d-dimensional vectors X = {x1 , x2 , . . . , xn } and a set of
m cluster centers (i.e., codebook) C = {c1 , c2 , . . . , cm } the The first observation motivates the search for multiple mod-
objective is to reduce the quantization error by solving els to further reduce quantization error. The second obser-
vation leads us to allocate bits from the quantization model
X
E = min kc(x) − xk2 (1) to index the best model per data point.
C Formally, our objective function is
x∈X
where c is a quantizer function that maps x to its nearest
X
cluster center in the codebook. Without any restrictions, E = min kc(x) − xk2
the above function can be heuristically optimized by the k- C
x∈X
means algorithm, which iteratively optimizes cluster loca- (3)
C = {Ci | i = 1 . . . 2k }
tions and assignments. Although k-means exhibits better
performance as the number of clusters increases, the algo- c ∈ Ci = {c | c · Ri = {−a, a}d−k }, Ri⊤ · Ri = I
rithm is very hard to scale, as the required computation is
quadratic in the number of clusters. Moreover, for k bits one where i indicates the index of the quantization model as-
would like to express 2k clusters in order to achieve maxi- signed to the data point x. Note that we have 2k models,
mal compression, which makes the algorithm infeasible for use k bits to index them and therefore are still able to ex-
large values of k. press 2k (2d−k ) = 2d cluster centers in total. However, the
It has been noticed, however, that the above formulation orthogonality constraint imposed by Ri does not span all the
can be made tractable under certain restrictions. For exam- dimensions; rather, we have 2k subspaces of d − k dimen-
ple, product quantization (PQ) [8] looks for orthogonal sub- sions each. This union of orthogonal models can be seen as
spaces that, taken together, can express a superlinear num- a generalization of the orthogonality constraint of ITQ [4].
ber of clusters. However, the method still requires a lookup We now apply the simplest instantiation of our bank
table. On the other hand, the iterative quantization (ITQ) of quantization models idea to create an effective data-
method [4] minimizes the objective independent hashing method.
3.1. Data-independent hashing: Bank of random
X rotations (BRR)
2
E = min kc(x) − xk ,
C
x∈X (2) The simplest instantiation of our idea is inspired by the
d ⊤ observation that a random rotation of the cluster centers is a
with c ∈ C = {c | c · R = {−a, a} }, R · R = I.
remarkably strong benchmark in both PQ and ITQ. There-
Since the distance between codewords remains constant, fore, we propose to instantiate Eq. 3 by generating a collec-
the resulting codes can be efficiently compared using Ham- tion of K = 2k random rotations R = {R1 , R2 , ..., RK }.
ming distance. Intuitively, ITQ finds a rotation R of the We encode the database as follows. We first preprocess
PCA-projected, zero-centred data that minimizes the quan- the database points by zero-centering and PCA embedding.
tization error when mapping the data points to the closest Next, for each individual data point x we find R∗ ∈ R that
corners of a binary hypercube, which represents 2d clus- satisfies
ter centers. This rotation is determined iteratively by alter-
nating between updating the assignments given the current R∗ = argmaxkxRk1 (4)
rotation, and updating the rotation to minimize the quan- R∈R
tization error given the current assignments. The iterative where || · || is the L1 norm (for space reasons we omit a
optimization is initialized using a random rotation. proof showing that this has the equivalent effect as select-
ing based on individual quantization error). Encoding must
3. Proposed approach be performed by assigning x to its nearest cluster center.
Two observations motivate the bank of quantization Due to the orthogonality constraint, this is achieved by sim-
models. ply thresholding at zero: y = sgn(xR∗ ). Finally, we store
the pair (y, j), where j is the index of R∗ in the bank of
1. Most data-driven hashing methods try to solve for the random rotations R. That is, j identifies the rotation used
quantization model that minimizes the global quanti- to produce the binary code y. Note that we allocate bits