TRANSFORMING IMAGES TO FEATURE VECTORS

I’m keen to explore some challenges in multimodal learning, such as jointly learning visual and textual semantics. However, I would rather not start by attempting to train an image recognition system from scratch, and prefer to leave this part to researchers who are more experienced in vision and image analysis.

Therefore, the goal is to use an existing image recognition system to extract useful features for a dataset of images, which can then be used as input to a separate machine learning system or neural network. We start with a directory of images and end up with a text file containing a feature vector for each image.

1. Install Caffe

Caffe is an open-source neural network library developed at Berkeley, with a focus on image recognition. It can be used to construct and train your own network, or to load one of the pretrained models. A web demo is available if you want to test it out.

Follow the installation instructions to compile Caffe. You will need to install quite a few dependencies (Boost, OpenCV, ATLAS, etc.), but at least on Ubuntu 14.04 they were all available in the public repositories.
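For reference, on Ubuntu 14.04 a package list roughly like the one below should cover most of the dependencies; treat it as an approximation and defer to the official installation instructions if anything is missing:

sudo apt-get install libboost-all-dev libopencv-dev libatlas-base-dev \
    libprotobuf-dev protobuf-compiler libgoogle-glog-dev libgflags-dev \
    libhdf5-serial-dev libleveldb-dev liblmdb-dev libsnappy-dev python-dev python-numpy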

Once you’re done, run


make test
make runtest

This will run the tests and make sure the installation is working properly.
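The script in step 4 uses Caffe’s Python bindings, which are built with a separate make target. A quick import check (run from the main Caffe directory) will tell you whether they work; this assumes the default build layout:

make pycaffe
python -c "import sys; sys.path.insert(0, 'python'); import caffe"

If the second command exits without errors, the Python interface is ready.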

2. Prepare your dataset

Put all the images you want to process into one directory. Then generate a file containing the path to each image, one image per line. We will use this file to read the images, and it will also help you map images to the correct vectors later.

You can run something like this:


find `pwd`/images -type f -exec echo {} \; > images.txt

This will find all files in the subdirectory called “images” and write their full paths to images.txt.
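If you want to make sure every path in images.txt actually points to a readable image before starting a long extraction run, a minimal sketch along these lines will do; it uses scikit-image, which Caffe’s Python interface relies on anyway, and assumes the images.txt generated above:

import skimage.io

with open('images.txt') as f:
    for line in f:
        path = line.strip()
        try:
            skimage.io.imread(path)   # just try to load the image
        except Exception as e:
            print 'Could not read', path, ':', e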

3. Download the model

There are a number of pretrained models publicly available for Caffe. Four main models are part of the original Caffe distribution, but more are available on the Model Zoo wiki page, provided by community members and other researchers.

We’ll be using the BVLC GoogLeNet model, which is based on the model described in Going Deeper with Convolutions by Szegedy et al. (2014). It is a 22-layer deep convolutional network, trained on ImageNet data to classify images into 1,000 different categories. Just for fun, here’s a diagram of the network, rotated 90 degrees:

The Caffe models consist of two parts:

  1. A description of the model (in the form of *.prototxt files)
  2. The trained parameters of the model (in the form of a *.caffemodel file)

The prototxt files are small, and they come included with the Caffe code. But the parameters are large and need to be downloaded separately. Run the following command in your main Caffe directory to download the parameters for the GoogLeNet model:


python scripts/download_model_binary.py models/bvlc_googlenet

This will find out where to download the caffemodel file, based on information already in the models/bvlc_googlenet/ directory, and will then place it into the same directory.
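A quick way to confirm that the download finished is to check that the caffemodel file now sits next to the prototxt files:

ls -lh models/bvlc_googlenet/bvlc_googlenet.caffemodel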

In addition, run this command:


./data/ilsvrc12/get_ilsvrc_aux.sh

It will download some auxiliary files for the ImageNet dataset, including the file of class labels, which we will use later.
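The file we will actually use is synset_words.txt, which pairs a WordNet synset ID with the human-readable class names, one class per line. For example, the entries behind the predictions shown further down look roughly like this:

n02389026 sorrel
n09428293 seashore, coast, seacoast, sea-coast
n11939491 daisy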

4. Process images and print vectors

Now is the time to load the model into Caffe, process each image, and print a corresponding vector into a file. I created a script for that (see below, also available as a Gist):


import numpy as np
import os, sys, getopt

# Main path to your caffe installation
caffe_root = '/path/to/your/caffe/'

# Model prototxt file
model_prototxt = caffe_root + 'models/bvlc_googlenet/deploy.prototxt'

# Model caffemodel file
model_trained = caffe_root + 'models/bvlc_googlenet/bvlc_googlenet.caffemodel'

# File containing the class labels
imagenet_labels = caffe_root + 'data/ilsvrc12/synset_words.txt'

# Path to the mean image (used for input processing)
mean_path = caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy'

# Name of the layer we want to extract
layer_name = 'pool5/7x7_s1'

sys.path.insert(0, caffe_root + 'python')
import caffe

def main(argv):
    inputfile = ''
    outputfile = ''
    try:
        opts, args = getopt.getopt(argv, "hi:o:", ["ifile=", "ofile="])
    except getopt.GetoptError:
        print 'caffe_feature_extractor.py -i <inputfile> -o <outputfile>'
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print 'caffe_feature_extractor.py -i <inputfile> -o <outputfile>'
            sys.exit()
        elif opt in ("-i"):
            inputfile = arg
        elif opt in ("-o"):
            outputfile = arg

    print 'Reading images from "', inputfile
    print 'Writing vectors to "', outputfile

    # Setting this to CPU, but feel free to use GPU if you have CUDA installed
    caffe.set_mode_cpu()

    # Loading the Caffe model, setting preprocessing parameters
    net = caffe.Classifier(model_prototxt, model_trained,
                           mean=np.load(mean_path).mean(1).mean(1),
                           channel_swap=(2, 1, 0),
                           raw_scale=255,
                           image_dims=(256, 256))

    # Loading class labels
    with open(imagenet_labels) as f:
        labels = f.readlines()

    # This prints information about the network layers (names and sizes)
    # You can uncomment this to have a look inside the network and choose which layer to extract
    #print [(k, v.data.shape) for k, v in net.blobs.items()]
    #exit()

    # Processing one image at a time, printing predictions and writing the vector to a file
    with open(inputfile, 'r') as reader:
        with open(outputfile, 'w') as writer:
            writer.truncate()
            for image_path in reader:
                image_path = image_path.strip()
                input_image = caffe.io.load_image(image_path)
                prediction = net.predict([input_image], oversample=False)
                print os.path.basename(image_path), ' : ', labels[prediction[0].argmax()].strip(), ' (', prediction[0][prediction[0].argmax()], ')'
                np.savetxt(writer, net.blobs[layer_name].data[0].reshape(1, -1), fmt='%.8g')

if __name__ == "__main__":
    main(sys.argv[1:])

You will first need to set the caffe_root variable to point to your Caffe installation. Then run it with:


python caffe_feature_extractor.py -i <inputfile> -o <outputfile>

It will first print out a lot of model-specific debugging information, and will then print a line for each input image containing the image name, the label of the most probable class, and the class probability.


flower.jpg  :  n11939491 daisy  ( 0.576037 )
horse.jpg  :  n02389026 sorrel  ( 0.996444 )
beach.jpg  :  n09428293 seashore, coast, seacoast, sea-coast  ( 0.568305 )

At the same time, it will also print vectors into the output file. By default, it will extract the layer pool5/7x7_s1 after processing each image. This is the last layer before the final softmax, and it contains 1024 elements. I haven’t experimented with choosing different layers yet, but this seemed like a reasonable place to start: it should contain all the high-level processing done in the network, but before forcing it to choose a specific class. Feel free to choose a different layer, though; just change the layer_name variable at the top of the script (see the sketch below). If you find that specific layers work better, let me know as well.
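For example, uncommenting the net.blobs line in the script prints all layer names together with their shapes, and switching layers is then a one-line change at the top of the script. The layer name below comes from the GoogLeNet deploy.prototxt; treat the comment about its size as an approximation:

# print [(k, v.data.shape) for k, v in net.blobs.items()]   # list the available layers
layer_name = 'loss3/classifier'   # the 1000-dimensional pre-softmax class scores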

The output file will contain the vectors for each image: one line of values per input image, with 1024 values per line (if you printed the default layer). Mission accomplished!
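Since the vectors are written with np.savetxt, one row per image in the same order as images.txt, reading them back and matching them to file names is straightforward; a minimal sketch, assuming the output file was called vectors.txt and the image list is the images.txt from above:

import numpy as np

vectors = np.loadtxt('vectors.txt')   # shape (num_images, 1024) for the default layer
with open('images.txt') as f:
    names = [line.strip() for line in f]
# names[i] is the image that produced vectors[i]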

Epilogue

There you have it – going from images to vectors. Now you can use these vectors to represent your images in various tasks, such as classification, multi-modal learning, or clustering. Ideally, you will probably want to train the whole network on a specific task, including the visual component, but for starters these pretrained vectors should be quite helpful as well.
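Just as a small illustration, a few lines of scikit-learn are enough to, say, cluster the images based on these vectors; the file name, the choice of KMeans, and the number of clusters below are arbitrary placeholders:

import numpy as np
from sklearn.cluster import KMeans

vectors = np.loadtxt('vectors.txt')
kmeans = KMeans(n_clusters=5).fit(vectors)
print kmeans.labels_   # one cluster id per image, in images.txt order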

These instructions and the script are loosely based on Caffe examples on ImageNet classification and filter visualisation. If the code here isn’t doing quite what you want it to, it’s worth looking at these other similar applications.

If you have any suggestions or fixes, let me know and I’ll be happy to incorporate them in this post.
