Detecting People in Photographs Using Skin Tone
SAN FRANCISCO, CA — As a Data Scientist at OpenTable, my computer screen often fills up with images of scrumptious food items (effectively keeping my metabolic rate on a high gear!). Many of these photographs are professionally or semi-professionally taken by food photographers or enthusiasts (aka FoodSpotters!).
Annoyingly, a lot of photos, especially those from social media channels, come with portraits of eaters posing with the eaten, such as the one shown here. Of course, one could use face detection algorithms and other sophisticated techniques to weed out these photos, but these techniques often carry a lot of overhead and depend on external libraries. Also, there may not even be a face in the photo to detect; a hand or some portion of the torso might be all that is showing. Over the weekend, I thought about very fast ways of finding a human subject in a photograph, so that I could generate some quick features for classifying photos. It turns out that one promising way to do that is to detect human skin pixels in the photos.
Detecting skin pixels
A quick dig into the subject of detecting skin pixels revealed a rich literature. As this is a blog post and not a review paper, I will only describe the bits I used, and leave the reader with this paper or this one as an entry point into the subject. The fundamental concept behind pixel-based skin detection is that the color of human skin (across various races and ethnicities) occupies a very tight region in the space of colors. In brief, there are three main ways to detect skin pixels:
- Explicit Skin Model Based Method: This class of methods tries to use machine learning to find the best colorspace and a simple decision rule that defines the boundaries of the skin cluster in that colorspace.
- Non-parametric Methods: The key idea here is to estimate the skin color distribution from training data without deriving an explicit model of the skin color, e.g. a Naive-Bayes classifier. A skin/non-skin training data set can be found here.
- Parametric Methods: Here, one models the skin color distributions as parameterized probability distributions, such as Gaussians, or mixtures of Gaussians.
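To make the parametric family a little more concrete, here is a minimal sketch that fits a single Gaussian to skin chromaticities and labels a pixel as skin when it lies close to that cluster. Everything specific in it is an assumption for illustration: the training points are synthetic stand-ins for labeled skin pixels, the function names `fit_gaussian` and `mahalanobis_sq` are hypothetical, and the cutoff of 9.0 is arbitrary.

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate the mean and covariance of an (N, 2) array of (r, g) values."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    return mean, cov

def mahalanobis_sq(points, mean, cov):
    """Squared Mahalanobis distance of each (r, g) point to the fitted Gaussian."""
    diff = points - mean
    inv_cov = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Hypothetical training data: synthetic (r, g) chromaticities standing in
# for labeled skin pixels. Real use would load an annotated data set.
rng = np.random.default_rng(0)
skin_rg = rng.normal(loc=[0.45, 0.31], scale=[0.03, 0.02], size=(500, 2))

mean, cov = fit_gaussian(skin_rg)

# Label a point as skin when it is near the cluster (cutoff is illustrative).
candidates = np.array([[0.45, 0.31], [0.20, 0.50]])
is_skin = mahalanobis_sq(candidates, mean, cov) < 9.0
```

A mixture of Gaussians follows the same pattern, with one mean and covariance per component and a threshold on the combined likelihood instead of a single distance.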
As a first stab, I decided to go with the first approach above, the explicit skin model. A little more investigation led me to the Gomez and Morales (2002) paper, where the authors used a constructive induction algorithm that produces a single rule defining the skin color boundary conditions in the RGB colorspace. Their method leads to an extremely simple rule that goes as follows:
- Extract R, G, B pixel values from the input image.
- Normalize: $r = R/(R+G+B)$, $g = G/(R+G+B)$, and $b = B/(R+G+B)$, so that $r + g + b = 1$ for each pixel.
- Next, generate three quantities (note that $(r+g+b)$ being unity is redundant here, but the authors probably left it in to make the normalization explicit):
- Finally, a pixel is categorized as “skin” if the pixel satisfies all of the following three conditions:
, and .
Note that these rules were based on the training set available when the paper was written, and I should probably regenerate them with the additional training examples available now. But the color of skin has not changed significantly in 10 years, so let's continue!
Here is a quick Python implementation:
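A minimal NumPy sketch of the procedure follows. The three inequalities and their thresholds (1.185, 0.107, 0.112) are the form of the Gomez and Morales rule commonly quoted in survey literature, so treat them as assumptions that may differ from the exact quantities used above. The function name `skin_mask` and its parameter names are likewise hypothetical. Because $r + g + b = 1$ after normalization, the $(r+g+b)^2$ denominators are dropped.

```python
import numpy as np

def skin_mask(image, ratio_min=1.185, rb_min=0.107, rg_min=0.112):
    """Return a boolean (H, W) mask that is True where a pixel looks like skin.

    `image` is an (H, W, 3) array of non-negative R, G, B values.
    The default thresholds are assumed values, not tuned on new data.
    """
    rgb = np.asarray(image, dtype=np.float64)

    # Normalize each pixel so that r + g + b = 1; black pixels are left at 0.
    total = rgb.sum(axis=-1)
    valid = total > 0
    safe_total = np.where(valid, total, 1.0)
    r = rgb[..., 0] / safe_total
    g = rgb[..., 1] / safe_total
    b = rgb[..., 2] / safe_total

    # r / g > ratio_min, written without a division so g == 0 needs no special case.
    ratio_ok = r > ratio_min * g
    rb_ok = r * b > rb_min
    rg_ok = r * g > rg_min

    # A pixel is skin only if all three conditions hold.
    return valid & ratio_ok & rb_ok & rg_ok

# Tiny demo image: a pale pink pixel, a pure blue pixel, and a black pixel.
demo = np.array([[[190, 155, 155], [0, 0, 255], [0, 0, 0]]], dtype=np.uint8)
mask = skin_mask(demo)
```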
Applying the rule to the photo gives the following result, where white represents pixels labeled as skin (the original image is also below for easy comparison):
As you can see, this one simple rule does remarkably well in isolating skin pixels.
Here is another example: