TRANSFORMING IMAGES TO FEATURE VECTORS

I’m keen to explore some challenges in multimodal learning, such as jointly learning visual and textual semantics. However, I would rather not start by attempting to train an image recognition system from scratch, and prefer to leave this part to researchers who are more experienced in vision and image analysis.

Therefore, the goal is to use an existing image recognition system to extract useful features for a dataset of images, which can then be used as input to a separate machine learning system or neural network. We start with a directory of images and end up with a text file containing a feature vector for each image.

1. Install Caffe

Caffe is an open-source neural network library developed at Berkeley, with a focus on image recognition. It can be used to construct and train your own network, or to load one of the pretrained models. A web demo is available if you want to test it out.

Follow the installation instructions to compile Caffe. You will need to install quite a few dependencies (Boost, OpenCV, ATLAS, etc.), but at least on Ubuntu 14.04 they were all available in the public repositories.
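For reference, on Ubuntu 14.04 a package list roughly like the one below should cover most of the dependencies; treat it as an approximation and defer to the official installation instructions if anything is missing:

sudo apt-get install libboost-all-dev libopencv-dev libatlas-base-dev \
    libprotobuf-dev protobuf-compiler libgoogle-glog-dev libgflags-dev \
    libhdf5-serial-dev libleveldb-dev liblmdb-dev libsnappy-dev python-dev python-numpy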

Once you’re done, run


make test
make runtest

This will run the tests and make sure the installation is working properly.
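The script in step 4 uses Caffe’s Python bindings, which are built with a separate make target. A quick import check (run from the main Caffe directory) will tell you whether they work; this assumes the default build layout:

make pycaffe
python -c "import sys; sys.path.insert(0, 'python'); import caffe"

If the second command exits without errors, the Python interface is ready.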

2. Prepare your dataset

Put all the images you want to process into one directory. Then generate a file containing the path to each image, one image per line. We will use this file to read the images, and it will also help you map images to the correct vectors later.

You can run something like this:


find `pwd`/images -type f -exec echo {} \; > images.txt

This will find all files in the subdirectory called “images” and write their full paths to images.txt.
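If you want to make sure every path in images.txt actually points to a readable image before starting a long extraction run, a minimal sketch along these lines will do; it uses scikit-image, which Caffe’s Python interface relies on anyway, and assumes the images.txt generated above:

import skimage.io

with open('images.txt') as f:
    for line in f:
        path = line.strip()
        try:
            skimage.io.imread(path)   # just try to load the image
        except Exception as e:
            print 'Could not read', path, ':', e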

3. Download the model

There are a number of pretrained models publicly available for Caffe. Four main models are part of the original Caffe distribution, but more are available on the Model Zoo wiki page, provided by community members and other researchers.

We’ll be using the BVLC GoogLeNet model, which is based on the model described in Going Deeper with Convolutions by Szegedy et al. (2014). It is a 22-layer deep convolutional network, trained on ImageNet data to classify images into 1,000 different categories. Just for fun, here’s a diagram of the network, rotated 90 degrees:

The Caffe models consist of two parts:

  1. A description of the model (in the form of *.prototxt files)
  2. The trained parameters of the model (in the form of a *.caffemodel file)

The prototxt files are small, and they come included with the Caffe code. But the parameters are large and need to be downloaded separately. Run the following command in your main Caffe directory to download the parameters for the GoogLeNet model:


python scripts/download_model_binary.py models/bvlc_googlenet

This will find out where to download the caffemodel file, based on information already in the models/bvlc_googlenet/ directory, and will then place it into the same directory.
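A quick way to confirm that the download finished is to check that the caffemodel file now sits next to the prototxt files:

ls -lh models/bvlc_googlenet/bvlc_googlenet.caffemodel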

In addition, run this command:


./data/ilsvrc12/get_ilsvrc_aux.sh

It will download some auxiliary files for the ImageNet dataset, including the file of class labels, which we will use later.
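The file we will actually use is synset_words.txt, which pairs a WordNet synset ID with the human-readable class names, one class per line. For example, the entries behind the predictions shown further down look roughly like this:

n02389026 sorrel
n09428293 seashore, coast, seacoast, sea-coast
n11939491 daisy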

4. Process images and print vectors

Now is the time to load the model into Caffe, process each image, and print a corresponding vector into a file. I created a script for that (see below, also available as a Gist):


import numpy as np
import os, sys, getopt

# Main path to your caffe installation
caffe_root = '/path/to/your/caffe/'

# Model prototxt file
model_prototxt = caffe_root + 'models/bvlc_googlenet/deploy.prototxt'

# Model caffemodel file
model_trained = caffe_root + 'models/bvlc_googlenet/bvlc_googlenet.caffemodel'

# File containing the class labels
imagenet_labels = caffe_root + 'data/ilsvrc12/synset_words.txt'

# Path to the mean image (used for input processing)
mean_path = caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy'

# Name of the layer we want to extract
layer_name = 'pool5/7x7_s1'

sys.path.insert(0, caffe_root + 'python')
import caffe

def main(argv):
    inputfile = ''
    outputfile = ''
    try:
        opts, args = getopt.getopt(argv, "hi:o:", ["ifile=", "ofile="])
    except getopt.GetoptError:
        print 'caffe_feature_extractor.py -i <inputfile> -o <outputfile>'
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print 'caffe_feature_extractor.py -i <inputfile> -o <outputfile>'
            sys.exit()
        elif opt in ("-i"):
            inputfile = arg
        elif opt in ("-o"):
            outputfile = arg

    print 'Reading images from "', inputfile
    print 'Writing vectors to "', outputfile

    # Setting this to CPU, but feel free to use GPU if you have CUDA installed
    caffe.set_mode_cpu()

    # Loading the Caffe model, setting preprocessing parameters
    net = caffe.Classifier(model_prototxt, model_trained,
                           mean=np.load(mean_path).mean(1).mean(1),
                           channel_swap=(2, 1, 0),
                           raw_scale=255,
                           image_dims=(256, 256))

    # Loading class labels
    with open(imagenet_labels) as f:
        labels = f.readlines()

    # This prints information about the network layers (names and sizes)
    # You can uncomment this to have a look inside the network and choose which layer to extract
    #print [(k, v.data.shape) for k, v in net.blobs.items()]
    #exit()

    # Processing one image at a time, printing predictions and writing the vector to a file
    with open(inputfile, 'r') as reader:
        with open(outputfile, 'w') as writer:
            writer.truncate()
            for image_path in reader:
                image_path = image_path.strip()
                input_image = caffe.io.load_image(image_path)
                prediction = net.predict([input_image], oversample=False)
                print os.path.basename(image_path), ' : ', labels[prediction[0].argmax()].strip(), ' (', prediction[0][prediction[0].argmax()], ')'
                np.savetxt(writer, net.blobs[layer_name].data[0].reshape(1, -1), fmt='%.8g')

if __name__ == "__main__":
    main(sys.argv[1:])

You will first need to set the caffe_root variable to point to your Caffe installation. Then run it with:


python caffe_feature_extractor.py -i <inputfile> -o <outputfile>

It will first print out a lot of model-specific debugging information, and will then print a line for each input image containing the image name, the label of the most probable class, and the class probability.


flower.jpg  :  n11939491 daisy  ( 0.576037 )
horse.jpg  :  n02389026 sorrel  ( 0.996444 )
beach.jpg  :  n09428293 seashore, coast, seacoast, sea-coast  ( 0.568305 )

At the same time, it will also print vectors into the output file. By default, it will extract the layer pool5/7x7_s1 after processing each image. This is the last layer before the final softmax, and it contains 1024 elements. I haven’t experimented with choosing different layers yet, but this seemed like a reasonable place to start: it should contain all the high-level processing done in the network, but before forcing it to choose a specific class. Feel free to choose a different layer, though; just change the layer_name variable at the top of the script (see the sketch below). If you find that specific layers work better, let me know as well.
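For example, uncommenting the net.blobs line in the script prints all layer names together with their shapes, and switching layers is then a one-line change at the top of the script. The layer name below comes from the GoogLeNet deploy.prototxt; treat the comment about its size as an approximation:

# print [(k, v.data.shape) for k, v in net.blobs.items()]   # list the available layers
layer_name = 'loss3/classifier'   # the 1000-dimensional pre-softmax class scores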

The output file will contain the vectors for each image: one line of values per input image, with 1024 values per line (if you printed the default layer). Mission accomplished!
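Since the vectors are written with np.savetxt, one row per image in the same order as images.txt, reading them back and matching them to file names is straightforward; a minimal sketch, assuming the output file was called vectors.txt and the image list is the images.txt from above:

import numpy as np

vectors = np.loadtxt('vectors.txt')   # shape (num_images, 1024) for the default layer
with open('images.txt') as f:
    names = [line.strip() for line in f]
# names[i] is the image that produced vectors[i]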

Epilogue

There you have it – going from images to vectors. Now you can use these vectors to represent your images in various tasks, such as classification, multi-modal learning, or clustering. Ideally, you will probably want to train the whole network on a specific task, including the visual component, but for starters these pretrained vectors should be quite helpful as well.
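Just as a small illustration, a few lines of scikit-learn are enough to, say, cluster the images based on these vectors; the file name, the choice of KMeans, and the number of clusters below are arbitrary placeholders:

import numpy as np
from sklearn.cluster import KMeans

vectors = np.loadtxt('vectors.txt')
kmeans = KMeans(n_clusters=5).fit(vectors)
print kmeans.labels_   # one cluster id per image, in images.txt order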

These instructions and the script are loosely based on Caffe examples on ImageNet classification and filter visualisation. If the code here isn’t doing quite what you want it to, it’s worth looking at these other similar applications.

If you have any suggestions or fixes, let me know and I’ll be happy to incorporate them in this post.
