caffe-based implementation of the Deep Similarity Learning algorithm used in the MM 2014 paper.
Other similarity learning algorithms are still under development.
caffe-sl is fully compatible with caffe. Please follow the instructions on the caffe project site to build caffe-sl before you start.
# get the MNIST dataset
./data/mnist/get_mnist.sh
# extract images and generate triplets
./examples/mnist_sl/create_mnist.sh
# training
./examples/mnist_sl/train_lenet_naive.sh
caffe-sl loads data with TripletImageDataLayer, which works with a triplet list file that contains image paths:
caffe-sl $ head examples/mnist_sl/data/train.tri
040846.png 051449.png 041185.png
024899.png 033969.png 039096.png
000520.png 022406.png 006979.png
025207.png 020904.png 020818.png
054660.png 040836.png 023925.png
009412.png 035528.png 003730.png
033029.png 011219.png 017586.png
053240.png 033959.png 007701.png
021132.png 042217.png 015489.png
021732.png 028399.png 031010.png
...
The first column consists of query images, the second column consists of positive images, and the third column consists of negative images.
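As a rough illustration of this file format, the following Python sketch parses triplet lines into (query, positive, negative) tuples. The function name `read_triplets` is hypothetical and not part of caffe-sl; the sketch only assumes three whitespace-separated paths per line, as in the listing above.

```python
from typing import Iterable, List, Tuple

Triplet = Tuple[str, str, str]


def read_triplets(lines: Iterable[str]) -> List[Triplet]:
    """Parse whitespace-separated lines into (query, positive, negative) tuples.

    Hypothetical helper for illustration; not part of caffe-sl.
    """
    triplets: List[Triplet] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 paths, got {len(fields)}")
        query, positive, negative = fields
        triplets.append((query, positive, negative))
    return triplets


# Two lines copied from the listing above.
sample = [
    "040846.png 051449.png 041185.png",
    "024899.png 033969.png 039096.png",
]
for query, positive, negative in read_triplets(sample):
    print(f"query={query} positive={positive} negative={negative}")
```

Because the function accepts any iterable of strings, an open file handle for `examples/mnist_sl/data/train.tri` could be passed in directly.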
The example uses cosine similarity. The output of the last feature layer is normalized and then fed to the NaiveTripletLossLayer. The NaiveTripletLossLayer splits the features into three parts (query, positive, and negative features) and computes the hinge loss on each triplet:
l(qry, pos, neg) = max{0, margin - S(qry, pos) + S(qry, neg)}
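To make the formula concrete, here is a minimal pure-Python sketch of the loss for a single triplet, taking S to be cosine similarity as in this example. The function names and the margin value of 0.2 are illustrative assumptions, not values taken from caffe-sl, and this is not the NaiveTripletLossLayer implementation itself.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after L2 normalization.

    Assumes both vectors are non-zero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_hinge_loss(
    qry: Sequence[float],
    pos: Sequence[float],
    neg: Sequence[float],
    margin: float = 0.2,  # assumed value, for illustration only
) -> float:
    """l(qry, pos, neg) = max{0, margin - S(qry, pos) + S(qry, neg)}."""
    return max(
        0.0,
        margin - cosine_similarity(qry, pos) + cosine_similarity(qry, neg),
    )


qry = [1.0, 0.0]
pos = [2.0, 0.0]  # same direction as the query
neg = [0.0, 3.0]  # orthogonal to the query

# Correctly ranked triplet: the positive beats the negative by more than the margin.
print(triplet_hinge_loss(qry, pos, neg))
# Swapping positive and negative violates the ranking, so the loss is positive.
print(triplet_hinge_loss(qry, neg, pos))
```

The loss is zero whenever the query is more similar to the positive than to the negative by at least the margin, so only violating or near-violating triplets contribute to the gradient.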
- TripletImageDataLayer: loads image data as triplets
- TripletBinaryDataLayer: loads binary data as triplets
- BinaryDataLayer: similar to ImageDataLayer
- L2NormLayer: L2 normalization
- DotProductSimilarityLayer: element-wise dot-product similarity
- EuclideanSimilarityLayer: element-wise Euclidean similarity
- BatchTripletLossLayer: triplet-based similarity learning in batch mode
- NaiveTripletLossLayer: triplet-based similarity learning in list mode
- PairwiseRankingLossLayer: pairwise learning to rank
- RankAccuracyLayer: ranking accuracy
- HDMLLossUpperBoundLayer: Hamming distance metric learning
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.
Check out the project site for all the details like
- DIY Deep Learning for Vision with Caffe
- Tutorial Documentation
- BVLC reference models and the community model zoo
- Installation instructions
and step-by-step examples.
Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.
Happy brewing!
Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.
Please cite Caffe in your publications if it helps your research:
@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}