After being inspired by the Computer Vision and Mobile Robotics modules at uni, I decided to build a Lego robot using the Mindstorms kit and a webcam that would be able to chase a ball. This page gives details of the colour tracker application I developed in order to guide the robot towards a particular object of interest. The article about the robot can be found here.
This was designed to be a quick and (relatively) simple application, but it was also supposed to be general enough to be re-used in other projects. From my computer vision course it seemed that the easiest way to identify objects is by colour. This has the obvious problem that there may be multiple objects of the same colour in the scene, but solving that requires knowing something about the shape of the object you wish to track. Edge detection is another method of image segmentation, but to identify objects you then have to follow the edges round and check whether the resulting shape matches that of the object you wish to track. Also, in one of my robotics modules at uni I used a robot with a camera to follow objects, and the vision system there also worked by tracking colour.
So what we need to do is capture images from an imaging device (e.g. a webcam) and segment each image into regions of different colour. As long as we know the colour of the object we want to track, we can then select the regions whose average colour falls within a user-specified range. Dealing with multiple objects of the same colour is tricky, so it was decided that the largest region would be tracked. The method used to perform the region detection is blob colouring (covered later).
Once we have identified the region we wish to track, we can easily work out the bounding box for the region and get its coordinates. These coordinates then need to be passed to another application to decide what to do. In this particular case a robot was to track a ball, so a remote control client would receive the coordinates from the colour tracker and then instruct the robot to move forwards, left or right as appropriate. To allow other programs to get the coordinates of the object being tracked, the vision program implements a multi-threaded internet server allowing clients to connect locally or over the network. They can then query the vision program and get the coordinates of the object being tracked.
3. Blob Colouring
Blob colouring is a technique used to find regions of similar colour in an image. The algorithm works by passing a “backwards L” shaped template over the image from left to right and top to bottom.
The template is used to calculate the distance between the current pixel and the one to the left, and between the current pixel and the one above. The distance between two pixels can be calculated in a number of ways depending on the image:
- For greyscale images the distance is simply the difference of the two grey levels.
- For RGB images the distance is equal to the Euclidean distance in the RGB colour space.
- For HSI images the distance is the difference in hue or intensity.
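As an illustration, the RGB case can be written as a small helper. The method name and signature here are mine, not the actual ColourSpaceConversions API:

```java
// Euclidean distance between two pixels in RGB colour space.
// A plain static helper; the name and signature are illustrative.
static double rgbDistance(int r1, int g1, int b1, int r2, int g2, int b2) {
    int dr = r1 - r2, dg = g1 - g2, db = b1 - b2;
    return Math.sqrt(dr * dr + dg * dg + db * db);
}
```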
A pixel is considered to belong to a different region if the distance between it and an adjacent pixel is greater than a certain threshold T. There are four possible outcomes when comparing the current pixel to its left neighbour (distance d1) and the one above (distance d2):
- (d1 > T) and (d2 > T) – Current pixel is different to both neighbours, so assign it a new region.
- (d1 < T) and (d2 > T) – Current pixel is different to the pixel above but similar to the left hand one, so assign it the same region as the pixel to the left.
- (d1 > T) and (d2 < T) – Current pixel is different to the left hand pixel but similar to the one above, so assign it the same region as the pixel above.
- (d1 < T) and (d2 < T) – The pixel is similar to both neighbours, so assign it to the same region as the neighbours.
A problem occurs in case 4 when the current pixel is similar to both neighbours, but the regions for the neighbouring pixels differ. In this situation it has become apparent that although both the neighbouring pixels have different regions, the regions are in fact equivalent.
In implementing the blob colouring algorithm, a two-dimensional integer array is used whose size is equal to that of the image. This array contains the region number for each pixel in the image. So, for a given pixel in the image its region number is equal to the value at the same x,y co-ordinate in the array.
The image is then scanned from left to right and from top to bottom using two for loops. The RGB values of the pixels are converted to HSI. A series of if statements then checks each of the cases outlined in the previous section, where T is a user-defined threshold.
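The scan and the four cases can be sketched as follows. This is a simplified version using a greyscale image for brevity (the real program converts RGB to HSI first), and case 4 simply joins the left region here; the equivalence bookkeeping it also requires is covered in the next section.

```java
// Sketch of the blob colouring scan (greyscale for brevity).
// The distance is the absolute grey-level difference; T is the
// user-defined threshold. Method and variable names are illustrative.
static int[][] blobColour(int[][] img, int T) {
    int h = img.length, w = img[0].length;
    int[][] region = new int[h][w];  // region number for each pixel
    int next = 0;                    // next unused region number
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            // d1 = distance to left neighbour, d2 = distance to the one above;
            // treat missing neighbours (image border) as "different"
            int d1 = x > 0 ? Math.abs(img[y][x] - img[y][x - 1]) : Integer.MAX_VALUE;
            int d2 = y > 0 ? Math.abs(img[y][x] - img[y - 1][x]) : Integer.MAX_VALUE;
            if (d1 > T && d2 > T) {
                region[y][x] = next++;            // case 1: new region
            } else if (d1 <= T && d2 > T) {
                region[y][x] = region[y][x - 1];  // case 2: join left region
            } else if (d1 > T && d2 <= T) {
                region[y][x] = region[y - 1][x];  // case 3: join region above
            } else {
                region[y][x] = region[y][x - 1];  // case 4: join left region...
                // ...and record that the left and above regions are
                // equivalent (omitted here, described in the next section)
            }
        }
    }
    return region;
}
```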
So far, this is fairly straightforward. The complexity arises in the implementation of the region equivalence mapping. An equivalence mapping is required in case 4, where the current pixel is found to be the same as both neighbours but the neighbours have different region numbers. In this situation it has been found that both regions are equivalent and need to be made the same. One way is to renumber the entire region array taking this new fact into account. However, this would take a lot of processing time, as the situation may be encountered numerous times throughout the processing of the image. A better method is to use an array that maps one region to another.
For example, suppose that at some point it is discovered that region 2 is in fact equivalent to region 1, and also that region 3 is equivalent to region 0. Once the image has been scanned, the region array can be re-calculated taking this mapping into account, so that anywhere there is a 2 in the region array it is replaced with a 1, etc.
At first this method may seem an efficient way of solving the equivalence problem, but there are many situations where it breaks down. Continuing the example above, what if it is later discovered that region 2 is also equivalent to region 0? The equivalent region entry has already been used to state that region 2 is equivalent to region 1, so if it is changed to make it equivalent to region 0, the piece of information stating that region 2 is equivalent to region 1 will be lost. In this situation, before region 2 can be mapped to 0, region 1 must first be mapped to 0.
In the scenario described above only one other region had to be remapped, but in large complicated images with thousands of regions this process can take a long time, and it must be done every time a region needs to be re-mapped. In fact, using a large (1024×768) test image, the blob colouring algorithm using the region equivalence map took over 15 minutes to complete on an AMD Athlon 2600 CPU with 512MB RAM.
A more efficient alternative to the region equivalence map is required, one that can handle multiple equivalence pointers. A solution was found using Java's Vector and TreeSet classes: a Vector is simply a one-dimensional resizable array, and a TreeSet is simply an ordered set. Together these provide an almost tree-like structure. The second figure below shows the region equivalences found in the first figure below and also incorporates the scenario described earlier, where region 2 is equivalent to region 0 as well as to region 1.
It is important to ensure that regions always point to lower regions and that the list of equivalent regions is ordered from lowest to highest. This makes it easier to flatten the tree after the image has been processed so it is possible to re-calculate the region array.
To flatten the trees it is easiest to start with the last region in the table and determine the lowest region that it points to (this may involve following multiple links and paths). Once the lowest region is found, it is known what each intermediate equivalent region points to, so each of these intermediate regions can be remapped directly to the lowest region. In other words, the lowest equivalent region is propagated down the tree in order to flatten it.
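A flattening pass might look like the sketch below. I am assuming the structure is a Vector of TreeSets, where equiv.get(r) holds the lower region numbers that region r was found equivalent to (the TreeSet keeps them ordered, so first() is the lowest direct link); the names are mine, not the program's actual API. Because every link points at a lower region number, processing regions from lowest to highest means each link has already been resolved by the time it is followed, which achieves the same flattening as walking each tree from the top.

```java
import java.util.TreeSet;
import java.util.Vector;

// Flatten the equivalence trees into a direct region -> lowest-region map.
// equiv.get(r) holds the lower regions that r is equivalent to (assumed
// representation); regions with no links map to themselves.
static int[] flatten(Vector<TreeSet<Integer>> equiv) {
    int[] map = new int[equiv.size()];
    for (int r = 0; r < equiv.size(); r++) {
        map[r] = r;  // default: region maps to itself
        if (!equiv.get(r).isEmpty()) {
            // first() is the lowest direct link; it is a lower region
            // number, so it has already been flattened in map
            map[r] = map[equiv.get(r).first()];
        }
    }
    return map;
}
```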
It was found that this implementation also took a long time to complete using the same test image as before. Upon investigation the problem occurred when the program was attempting to find the lowest region pointed to by a given region (e.g. 3 points to 2 points to 1, so 3 points to 1). The solution was to work out the lowest equivalent region when adding a region to the table rather than work it out at the end. This made the paths down the tree much shorter and therefore much quicker to search.
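Resolving the lowest equivalent at insertion time might be sketched like this, again over the same assumed Vector-of-TreeSets structure with illustrative names:

```java
import java.util.TreeSet;
import java.util.Vector;

// Record that regions a and b are equivalent. The lower of the two is
// first followed down to its lowest known equivalent, so the recorded
// link skips intermediate regions and the paths stay short.
static void addEquivalence(Vector<TreeSet<Integer>> equiv, int a, int b) {
    int hi = Math.max(a, b), lo = Math.min(a, b);
    while (!equiv.get(lo).isEmpty()) {
        lo = equiv.get(lo).first();  // follow the lowest link downwards
    }
    if (hi != lo) {
        equiv.get(hi).add(lo);  // always point higher regions at lower ones
    }
}
```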
Using the second method of mapping equivalent regions, the blob colouring algorithm only took on average 4 seconds rather than 15 minutes to process the test image (including time to flatten the tree and re-calculate the region array).
4. The Java Classes
In all, this program consists of 13 classes:
BoundingBox– A class to represent the bounding box of the object being tracked.
BoundingBoxUpdateListener– A listener interface used to notify classes when the bounding box for the object being tracked has been updated.
CameraController– Initialises the webcam and then grabs frames. Uses the Java Media Framework.
ColourSpaceConversions– Contains the methods to convert RGB to HSI.
ColourTracker– The main application class. This class initialises the camera, the image processor, the internet server, and the GUI.
ColourTrackerGUI– The graphical user interface (see screenshot at top of page). The live camera feed is displayed allowing the user to select a region of colour for tracking. Six sliders for minimum and maximum HSI values can also be used to set the colour to be tracked. A second image displays the processed image. The bottom third of the window contains a message console.
ImageProcessor– Performs the blob colouring algorithm on frames from the camera and locates the largest object of the required colour.
ImageUpdateListener– A listener interface used to notify classes when an image has been updated.
JImagePanel– A panel on which images are drawn.
A subclass of JImagePanel– Extends JImagePanel to allow drawing of boxes around sections of the image.
NetServer– A multi-threaded internet server allowing multiple clients to connect and receive tracking information. By default the server listens on port 5500.
SelectionEvent– A class to store information about a selection event. Stores the top-left and bottom-right coordinates of the selection, as well as the object that the selection was drawn on.
SelectionListener– A selection listener interface used to notify classes when a selection has been drawn on an object.
5. Running the Code
First you need to download the JMF for your operating system. I’m using Windows XP but this program should work under Linux with only a slight modification. Once you’ve downloaded the JMF you need to install it. Under Windows you simply need to run the EXE file. The installer will then detect your video and audio capture devices and install JMF to your hard drive. The installer will even modify your classpath, so there’s no worrying about locations of JAR files later on.
Once you have downloaded, installed and configured JMF with your capture device, you can download the Java Colour Tracker JAR file and/or source. Note that you may need to edit two lines in the ColourTracker.java file and re-compile, especially if you are not using Windows. These two lines are at the top of ColourTracker.java and will currently be set to:
public static final String CAPTURE_DEVICE_NAME = "vfw:Microsoft WDM Image Capture (Win32):0";
public static final String CAPTURE_DEVICE_LOCATION = "vfw://0";
If these two lines are not correct for your system then the Java Colour Tracker will not run and you will receive an error similar to:
Initialising video capture device: vfw:Microsoft WDM Image Capture (Win32):0
java.io.IOException: Could not connect to capture device
Error Initialising video capture device!
javax.media.NoDataSourceException: Error instantiating class: com.sun.media.protocol.vfw.DataSource : java.io.IOException: Could not connect to capture device
        at javax.media.Manager.createDataSource(Manager.java:1012)
        at CameraController.<init>(CameraController.java:57)
        at ColourTracker.<init>(ColourTracker.java:36)
        at ColourTracker.main(ColourTracker.java:88)
Assuming everything is ok, you should see a window similar to the one in the screenshot at the top of this page. You can then adjust the sliders to set the minimum and maximum HSI values for the colour you wish to track. You can also drag a box around the region of colour you wish to track in the left-hand 'live stream' window (note that it does not always work!). Once you've done that, you should see only the colour range you have specified in the right-hand 'processed image' window, where a red box will be drawn around the largest region. The bounding coordinates of the region will be displayed in the text area at the bottom of the screen. If you telnet to port 5500, type Get coords and then press enter, you should receive the current coordinates of the object being tracked. You can then use this in your own application to control something like a robot, as I did. If you haven't already, have a look at my ball following Lego robot project.
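Instead of telnet, a client application can query the server directly. A minimal sketch in Java follows; the Get coords command and port 5500 are as described above, but the method name is mine and the single-line reply format is an assumption:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Connect to the tracker's NetServer, send the "Get coords" command and
// return the reply line (assumed to be a single line of coordinates).
static String queryCoords(String host, int port) throws IOException {
    try (Socket s = new Socket(host, port);
         PrintWriter out = new PrintWriter(s.getOutputStream(), true);
         BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream()))) {
        out.println("Get coords");  // same command as in the telnet session
        return in.readLine();
    }
}
```

A robot controller could call this in a loop and steer towards the reported bounding box.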