These days, all modern digital
cameras include a sensor that detects which way the camera is being held when a photo is taken. This
metadata is then included in the image file, so that image organization programs know the correct orientation,
i.e., which way is “up” in the image. But for photos scanned in from film or from older digital cameras,
rotating images to be in the correct orientation must typically be done by hand.
The task in this project is to create a classifier that decides the correct orientation of a given image.
Data: The dataset consists of images from the Flickr photo-sharing website.
The images were taken and uploaded by real users from across the world, making this a challenging task
on a very realistic dataset. We’ll simply treat the raw
images as numerical feature vectors, on which we can then apply standard machine learning techniques. In
particular, we’ll take an n × m × 3 color image (the third dimension is because color images are stored as
three separate planes – red, green, and blue), and append all of the rows together to produce a single vector
of size 1 × 3mn. Two ASCII text files, one for the training dataset and one for the test dataset, contain the
feature vectors. The training dataset consists of about 10,000 images, while the test set contains about 1,000.
To generate these files, each image has been rescaled
to a very tiny “micro-thumbnail” of 8 × 8 pixels, resulting in an 8 × 8 × 3 = 192 dimensional feature vector.
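As a rough sketch of the flattening step described above, the snippet below turns a small color image into a single feature vector. The nested-list layout image[row][column] = (r, g, b) and the name flatten_image are assumptions made for illustration only; the actual thumbnails may have been produced differently.

```python
def flatten_image(image):
    """Flatten an n x m x 3 nested list into one list of length 3*m*n.

    Rows are appended one after another; within a row, pixels keep their
    column order and each pixel contributes its r, g, b values in turn.
    """
    return [channel for row in image for pixel in row for channel in pixel]


# A hypothetical 8 x 8 image in which every pixel is (10, 20, 30).
image = [[(10, 20, 30) for _ in range(8)] for _ in range(8)]
vector = flatten_image(image)
print(len(vector))  # 8 * 8 * 3 values
```

With this layout the values come out pixel by pixel (red, green, blue for the first pixel, then the second, and so on).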
The text files have one row per image, where each row is formatted like:
photo_id correct_orientation r11 g11 b11 r12 g12 b12 ...
where:
• photo_id is a photo ID for the image.
• correct_orientation is 0, 90, 180, or 270.
• r11 refers to the red pixel value at row 1, column 1, r12 refers to the red pixel value at row 1, column 2, etc.,
each in the range 0-255.
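A row in this format can be read with a few lines of Python. The following is a minimal sketch, not necessarily how orient.py does it; the function names parse_line and load_dataset and the sample photo ID are hypothetical.

```python
def parse_line(line):
    """Split one row into (photo_id, correct_orientation, pixel_values)."""
    fields = line.split()
    photo_id = fields[0]
    orientation = int(fields[1])            # 0, 90, 180, or 270
    pixels = [int(v) for v in fields[2:]]   # r11 g11 b11 r12 g12 b12 ...
    return photo_id, orientation, pixels


def load_dataset(path):
    """Read every non-empty row of a training or test file."""
    with open(path) as f:
        return [parse_line(line) for line in f if line.strip()]


# A made-up row with a hypothetical photo ID and 192 pixel values.
sample = "example_photo.jpg 90 " + " ".join(str(i % 256) for i in range(192))
photo_id, orientation, pixels = parse_line(sample)
print(photo_id, orientation, len(pixels))
```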
To run the code for training:
./orient.py train train_file.txt model_file.txt [model]
where [model] is one of nearest, adaboost, or nnet. This program uses the data in train_file.txt to
produce a trained classifier of the specified type, and saves the parameters in model_file.txt.
To run the code for testing:
./orient.py test test_file.txt model_file.txt [model]
The program loads the trained
parameters from model_file.txt, runs each test example through the model, displays the classification accuracy
(as the percentage of correctly classified images), and writes a file called output.txt containing the
estimated label for each image in the test file. The output file has one line per test image,
with the photo_id, a space, and then the estimated label.
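The accuracy figure and the output file layout described above could be produced along these lines. This is only a sketch: the helper names and the sample photo IDs are hypothetical, and the real program may organize this differently.

```python
def accuracy_percent(true_labels, predicted_labels):
    """Percentage of images whose estimated label matches the correct one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def format_output_lines(photo_ids, predicted_labels):
    """One line per test image: photo_id, a space, then the estimated label."""
    return ["%s %d" % (pid, label) for pid, label in zip(photo_ids, predicted_labels)]


def write_output(path, photo_ids, predicted_labels):
    """Write the formatted lines to a file such as output.txt."""
    with open(path, "w") as f:
        f.write("\n".join(format_output_lines(photo_ids, predicted_labels)) + "\n")


# Made-up results for four test images.
ids = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
truth = [0, 90, 180, 270]
guesses = [0, 90, 0, 270]
print(accuracy_percent(truth, guesses))
print(format_output_lines(ids, guesses))
```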
Here are more detailed specifications for each classifier.
• nearest: At test time, for each image to be classified, the program finds the k “nearest” images
in the training file, i.e. the ones whose feature vectors have the smallest Euclidean distance to it,
and has them vote on the correct orientation.
• adaboost: Uses very simple decision stumps that compare one entry in the image matrix to
another, e.g. compare the red pixel value at position 1,1 to the green pixel value at position 3,8.
• nnet: A fully-connected feed-forward network that classifies image orientation and is trained with
the backpropagation algorithm using gradient descent.
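To make the nearest classifier concrete, here is a minimal k-nearest-neighbors sketch in plain Python. The function names, the toy two-dimensional vectors, and the tie-breaking behavior are assumptions for illustration; the source does not specify k or how ties are resolved.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance; it ranks neighbors the same as the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_vectors, train_labels, query, k):
    """Let the k closest training vectors vote on the label of query."""
    nearest = sorted(
        range(len(train_vectors)),
        key=lambda i: squared_distance(train_vectors[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    # On a tie, most_common keeps the label that was counted first,
    # i.e. the tied label with the closest neighbor.
    return votes.most_common(1)[0][0]


# Toy data: two clusters standing in for two orientations.
train_vectors = [[0, 0], [1, 1], [10, 10], [11, 11], [12, 12]]
train_labels = [0, 0, 90, 90, 90]
print(knn_predict(train_vectors, train_labels, [2, 2], 3))
print(knn_predict(train_vectors, train_labels, [9, 9], 3))
```

Because every test image is compared against every training image, this brute-force version costs roughly (number of training images) × 192 operations per query.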
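For the adaboost classifier, one boosting round over pixel-comparison stumps might look like the sketch below. Several things here are assumptions not stated in the source: labels are binary (+1 or -1, for example "orientation is 0" versus "anything else" in a one-vs-rest setup), the stump predicts +1 when entry i is greater than entry j, and the candidate pairs are supplied by the caller.

```python
import math


def stump_predict(x, i, j):
    """Decision stump: +1 if entry i of the vector exceeds entry j, else -1."""
    return 1 if x[i] > x[j] else -1


def weighted_error(vectors, labels, weights, i, j):
    """Total weight of the examples the stump (i, j) gets wrong."""
    return sum(w for x, y, w in zip(vectors, labels, weights)
               if stump_predict(x, i, j) != y)


def adaboost_round(vectors, labels, weights, candidate_pairs):
    """Pick the best stump, compute its vote alpha, and reweight the examples."""
    best = min(candidate_pairs,
               key=lambda p: weighted_error(vectors, labels, weights, p[0], p[1]))
    err = weighted_error(vectors, labels, weights, best[0], best[1])
    err = min(max(err, 1e-10), 1 - 1e-10)       # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified examples gain weight, correctly classified ones lose weight.
    new_weights = [w * math.exp(-alpha * y * stump_predict(x, best[0], best[1]))
                   for x, y, w in zip(vectors, labels, weights)]
    total = sum(new_weights)
    return best, alpha, [w / total for w in new_weights]


# Toy data: four 3-entry vectors with binary labels and uniform weights.
vectors = [[5, 1, 3], [4, 2, 9], [1, 6, 2], [2, 7, 8]]
labels = [1, 1, -1, -1]
weights = [0.25, 0.25, 0.25, 0.25]
best, alpha, new_weights = adaboost_round(vectors, labels, weights, [(0, 2), (1, 2)])
print(best, alpha, new_weights)
```

In the toy run, the chosen stump misclassifies one example, and after reweighting that example carries half of the total weight, which is what pushes the next round to focus on it.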
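For the nnet classifier, the sketch below shows one way backpropagation with gradient descent can be written for a small fully-connected network. The architecture is entirely an assumption (one sigmoid hidden layer, a softmax output with cross-entropy loss, per-example updates), as are the layer sizes, learning rate, and toy inputs; class indices 0 to 3 are meant to stand for the four orientations.

```python
import math
import random


def init_network(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases for a one-hidden-layer network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [0.0] * n_out,
    }


def forward(net, x):
    """Return (hidden activations, class probabilities) for input x."""
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
              for row, b in zip(net["w1"], net["b1"])]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(net["w2"], net["b2"])]
    top = max(scores)                          # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]


def train_step(net, x, target, lr):
    """One gradient-descent update on a single example; returns its loss."""
    hidden, probs = forward(net, x)
    # Output error for softmax + cross-entropy: probabilities minus one-hot target.
    d_out = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # Propagate the error back through w2 and the sigmoid derivative h * (1 - h).
    d_hid = [h * (1.0 - h) * sum(d_out[k] * net["w2"][k][j] for k in range(len(d_out)))
             for j, h in enumerate(hidden)]
    for k, d in enumerate(d_out):
        for j, h in enumerate(hidden):
            net["w2"][k][j] -= lr * d * h
        net["b2"][k] -= lr * d
    for j, d in enumerate(d_hid):
        for i, v in enumerate(x):
            net["w1"][j][i] -= lr * d * v
        net["b1"][j] -= lr * d
    return -math.log(probs[target])


# Toy problem: four inputs, each belonging to its own class.
samples = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
targets = [0, 1, 2, 3]
net = init_network(4, 5, 4)
epoch_losses = []
for epoch in range(200):
    epoch_losses.append(sum(train_step(net, x, t, 0.5) for x, t in zip(samples, targets)))
print(epoch_losses[0], epoch_losses[-1])
```

For the real data, the input layer would have 192 units, and scaling pixel values from 0-255 down to 0-1 before training is a common choice, though the source does not say whether that is done.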