Bee Tracking

Tom Goddard
March 15, 2015

Captured full day video on February 2, 2015 at the front of the hive using two cameras. Sunrise was at 7:13 AM and sunset at 5:34 PM Pacific Standard Time. February 2, 2015 was a Monday.

Previous bee tracking report from 2011.

Camera 2 video Camera 1 video

Recorded with two cameras to check reliability of tracking and counts of bee flights in and out. The two cameras can also give a 3-dimensional stereoscopic view. You can see this by crossing your eyes (viewing a point half way between your face and your computer screen) with this stereo video which simply places the two videos side by side. The cameras are separated by about twice human eye separation so the depth effect is exaggerated.

Counting Bee Flights

Plotting the counts of bees leaving and entering the video frame shows problems in the tracking. One camera recorded a net loss of 560 bees and the other 754 bees for the day. These numbers are too high. The cumulative out plot gives clues about the tracking errors. From 1:00 to 1:30 PM there is a large increase in flights. Inspecting the video shows this was when new bees were making orientation flights. Both cameras show an increase of about 300-400 bees out during this time, but in fact the number of bees out probably changed very little. Apparently during orientation flights many more bees were detected leaving the video frame than entering.

Inspecting the tracked paths suggests that the trouble is that bees that come close to the camera are not tracked (they appear too big and confuse the tracking). The thumbnail images below show that the bees leaving usually head out in a narrow corridor, while returning bees come in from a wider range of directions, so incoming bees come close to the camera more often. So more tracking failures occur with incoming bees. In support of this theory, the number of lost bees is significantly lower in camera 1, which is near the wall and fence adjacent to the hive, in the direction the bees cannot fly. Direct observation shows many more bee flights close to the camera for camera 2 as the bees come in at 45 degrees to the landing board. The mistracking likely occurs throughout the day -- notice that camera 1 has 300 bees out just before 1:00 while camera 2 has 400 bees out. The easiest fix is to move the cameras so the bees do not fly close to them. Using a camera with a narrower field of view positioned further away would help, although there are bushes that block the view of my hive. I am now trying placing the cameras near the wall aimed perpendicular to the typical flight path.

Another sign that tracking is failing frequently is that the pass through flights (magenta in top plot) are too few during the orientation flight period. The orientation flight bees are flying in and out of the video frame, which should result in many tracks entering and then leaving the frame (the definition of "pass through"). Pass through tracks are not counted as "in" or "out" and I would expect them to outnumber the in/out counts. That they do not suggests that the tracking is often losing the bees as they pass through the image frame, resulting in detection of an "in" flight and an "out" flight instead of a pass through flight.
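The in / out / pass through distinction can be sketched as follows. This is a minimal illustration, not the code used for the plots: it assumes a track is a list of (x, y) pixel positions in time order, and that an endpoint counts as "at the frame edge" when it lies within a hypothetical `margin` of the border. The names `near_edge`, `classify_track` and `margin` are made up for this example.

```python
def near_edge(point, width=640, height=480, margin=10):
    """True if the point lies within `margin` pixels of the frame border."""
    x, y = point
    return (x < margin or y < margin or
            x >= width - margin or y >= height - margin)

def classify_track(track, width=640, height=480, margin=10):
    """Label a track by where it starts and ends (assumed rule, see text)."""
    starts_at_edge = near_edge(track[0], width, height, margin)
    ends_at_edge = near_edge(track[-1], width, height, margin)
    if starts_at_edge and ends_at_edge:
        return "pass through"
    if ends_at_edge:
        return "out"      # appeared inside the frame, left through the border
    if starts_at_edge:
        return "in"       # entered through the border, vanished inside
    return "unknown"      # neither end touches the border

# A pass through flight that the tracker loses mid-frame is split in two
# and gets counted as one "in" plus one "out".
whole = [(2, 240), (200, 240), (400, 240), (637, 240)]
first_half, second_half = whole[:2], whole[2:]
print(classify_track(whole))
print(classify_track(first_half), classify_track(second_half))
```

With a rule like this, every mid-frame tracking failure converts one pass through flight into an extra "in" and an extra "out", which matches the shortage of pass through tracks described above.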

Camera 1 tracks

Green lines are outgoing bees (accelerating), red lines are incoming bees (decelerating), and yellow dots mark every bee location in every frame. Each image includes all bee tracks for 10 minutes.

Camera 2 tracks

Video Capture

I used the command-line program ffmpeg to capture the video in 10 minute files. The capture command for camera 1, putting video files named 000.mp4, 001.mp4, ... in a directory named cam1 and recording for 11 hours (39600 = 11 * 60 * 60 seconds), was

ffmpeg -i rtsp://admin:12345@ -an -vcodec copy -t 39600 -map 0 -f segment -segment_time 600 -reset_timestamps 1 cam1/%03d.mp4 >& cam1/log &

I use a similar command to simultaneously capture video from camera 2. Each ffmpeg process takes less than 1% CPU on a MacBook Pro, mid 2012 model, Intel i7 processor. The CPU use is very low because the streamed video is simply being copied to disk (a solid state drive), and is not re-encoded.
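To start both captures from one script, the command can be built as an argument list in Python. This is only a sketch: the function name `capture_command` and the placeholder address `rtsp://CAMERA2_ADDRESS` are made up for illustration, and the ffmpeg options are the ones from the command above.

```python
def capture_command(stream_url, out_dir, hours=11, segment_minutes=10):
    """Build the ffmpeg argument list for segmented stream capture."""
    return [
        "ffmpeg",
        "-i", stream_url,
        "-an",                                       # cameras have no audio
        "-vcodec", "copy",                           # copy stream, no re-encoding
        "-t", str(hours * 60 * 60),                  # total recording time, seconds
        "-map", "0",
        "-f", "segment",
        "-segment_time", str(segment_minutes * 60),  # seconds per file
        "-reset_timestamps", "1",
        out_dir + "/%03d.mp4",
    ]

# Placeholder address, not a real camera URL.
cam2 = capture_command("rtsp://CAMERA2_ADDRESS", "cam2")
print(" ".join(cam2))
```

The list can be passed to `subprocess.Popen` to launch the capture in the background. With 600 second segments, an 11 hour recording gives 66 files per camera.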

Attempted to start recording at 7:00 AM but the automatic start failed and I manually started recording at 7:35 AM. A cron job was supposed to start the camera recording but the Mac laptop was asleep so it did not start. Energy saving settings were set to never sleep, but the laptop lid was closed, which puts it to sleep in spite of the never sleep setting. Leaving the laptop lid open avoids this problem.


I used two Hikvision DS-2CD2032-I security cameras, 3 Mpixels, 2048x1536 max resolution, with a 4 mm lens giving a 75.8 degree field of view. Video and power are carried by an ethernet cable, one for each camera, running about 100 feet to my house into a power-over-ethernet gigabit switch (model TP-Link TL-SG1008P). The switch connects to my wireless router and video is recorded on a Mac laptop connected via wireless. I record only 640x480 video streams at 30 frames per second. Higher resolutions take more disk space, are slow to process, and the extra detail such as seeing the legs of the bees is not useful for tracking. Video is recorded with a maximum bit rate setting of 512 Kbits/second. Recording 11 hours of video took 2 Gbytes per camera (average bit rate 430 Kbits/sec). The cameras do not have audio.
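The disk space figures can be checked with a few lines of arithmetic. This sketch assumes Kbits means 1000 bits and Gbytes means 2^30 bytes; neither is stated above, and the function name is made up.

```python
def recording_size_bytes(bit_rate_kbits_per_sec, seconds):
    """Bytes written for a stream at the given average bit rate."""
    return bit_rate_kbits_per_sec * 1000 * seconds / 8

seconds = 11 * 60 * 60                         # 11 hours of recording
average = recording_size_bytes(430, seconds)   # at the observed average rate
ceiling = recording_size_bytes(512, seconds)   # at the configured maximum rate
print(round(average / 2**30, 2), "Gbytes at 430 Kbits/sec")
print(round(ceiling / 2**30, 2), "Gbytes at 512 Kbits/sec")
```

Under those unit assumptions the 430 Kbits/sec average comes to just under 2 Gbytes per camera, and recording at the full 512 Kbits/sec limit for the whole day would need roughly 2.4 Gbytes.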

This camera was a poor choice for a few reasons. The field of view is too wide (76 degrees), requiring a camera position very close to the hive (about 2 feet away) if I want the image frame to be just twice the hive width. A camera position further back could be used but then more moving plants in the field of view cause problems tracking bees. The close position causes trouble tracking bees that come too close to the camera lens -- they appear too big in the image. The light sensitivity of this camera is relatively poor: specifications say minimum illumination is 0.19 lux at F2.0 with automatic gain control (AGC) on. A better choice would be the Hikvision DS-2CD2012-I with 7 times more light sensitivity, 0.028 lux at F2.0, AGC on, max resolution 1280x960 and a 12 mm lens to provide a narrower 22 degree field of view. The higher light sensitivity allows seeing the bees near sunset and sunrise. Seeing the bees in dim conditions can also be done by turning on the camera infrared LEDs and switching it to infrared mode (this can be done automatically in dim light conditions).
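The trade-off between field of view and camera distance follows from simple geometry. The sketch below assumes an ideal pinhole camera and that the quoted angles are horizontal fields of view (both are assumptions), and the function names are made up.

```python
import math

def view_width(distance, fov_degrees):
    """Width of scene covered at `distance` for a given field of view."""
    return 2 * distance * math.tan(math.radians(fov_degrees) / 2)

def distance_for_width(width, fov_degrees):
    """Camera distance at which the frame spans `width`."""
    return width / (2 * math.tan(math.radians(fov_degrees) / 2))

width_now = view_width(2.0, 75.8)            # feet, 4 mm lens at 2 feet
far = distance_for_width(width_now, 22.0)    # feet, 12 mm lens, same width
print(round(width_now, 1), "feet wide;", round(far, 1), "feet away")
```

Under these assumptions the 75.8 degree lens covers roughly 3 feet of width from 2 feet away, and a 22 degree lens would cover the same width from roughly 8 feet, far enough that an approaching bee is much less likely to fill the frame.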

Tracking algorithm

The tracking algorithm works roughly as follows. It uses a background image which is a running average of the 30 previous and 30 next frames (i.e. 1 second of video before and after the current frame). It subtracts the current video frame from this average, then smooths the difference with several iterations (usually 8) of nearest pixel averaging, so that each bee becomes a single local maximum in the difference image. It then extracts all the local maxima above some threshold. A typical threshold would be 20 for pixel values ranging from 0-255, and this is about 5 to 10 times the mean standard deviation of all pixels from the background image. Then the spots are ordered from highest to lowest local maximum value and extra spots within a specified distance (e.g. 12 pixels) of a higher neighbor are removed. This is so that when a smoothed bee appears as two bumps (body and head) it gets reduced to one spot.
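The spot finding steps above can be sketched with numpy as follows. This is an illustration written for this page, not the actual code: it assumes grayscale frames, takes "nearest pixel averaging" to mean averaging each pixel with its 4 nearest neighbors, and all function names are made up.

```python
import numpy as np

def background_average(frames, i, half_window=30):
    """Mean of up to `half_window` frames before and after frame i."""
    before = frames[max(0, i - half_window):i]
    after = frames[i + 1:i + 1 + half_window]
    return np.mean(np.concatenate([before, after]), axis=0)

def smooth(image, iterations=8):
    """Repeatedly average each pixel with its 4 nearest neighbors."""
    s = image.astype(np.float32)
    for _ in range(iterations):
        p = np.pad(s, 1, mode="edge")
        s = (p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
             + p[1:-1, :-2] + p[1:-1, 2:]) / 5.0
    return s

def find_spots(frame, background, threshold=20, min_separation=12,
               iterations=8):
    """Return (x, y, height) of candidate bee spots, strongest first."""
    diff = smooth(background.astype(np.float32) - frame, iterations)
    p = np.pad(diff, 1, mode="constant", constant_values=-np.inf)
    # Local maximum: at least as large as its 4 neighbors, above threshold.
    is_max = ((diff >= p[:-2, 1:-1]) & (diff >= p[2:, 1:-1]) &
              (diff >= p[1:-1, :-2]) & (diff >= p[1:-1, 2:]) &
              (diff > threshold))
    ys, xs = np.nonzero(is_max)
    order = np.argsort(-diff[ys, xs])          # highest maxima first
    spots = []
    for i in order:
        x, y = int(xs[i]), int(ys[i])
        # Drop a weaker spot lying close to a stronger spot already kept.
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_separation ** 2
               for kx, ky, _ in spots):
            spots.append((x, y, float(diff[y, x])))
    return spots

# Synthetic test: uniform bright background with one dark "bee".
frames = np.full((61, 60, 80), 200, dtype=np.uint8)
frames[30, 26:35, 36:45] = 50                  # centered at x=40, y=30
background = background_average(frames, 30)
spots = find_spots(frames[30], background)
print(spots)
```

Because the sketch subtracts the frame from the background as described, only objects darker than the background give positive spots. Whether the real code also handles lighter objects is not covered here.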

Next all these image spots are assembled into tracks. It looks for nearby spots in consecutive video frames. If spot s0 in frame i has nearest spot s1 in frame i+1, and if s0 is the nearest in frame i to s1, then the two spots are connected to form the first segment of a path. Then we linearly extrapolate to an expected location of a spot in frame i+2 and find the spot s2 nearest to that point. If s1 is also the nearest spot in frame i+1 to s2 then s2 is connected to the growing track. We also extend backwards in time to frame i-1 in the same way. There are some additional rules. We don't extend if the nearest spot is too far from the expected location (more than 4 times the linear extrapolation step). Also we sometimes extend without requiring the end of the path to be the nearest spot to the new spot if the extension is close enough to the expected spot location (threshold 0.5 times the linear extrapolation step).
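A minimal sketch of the linking rules follows, covering only forward extension. It assumes the spots of each frame are a numpy array of (x, y) rows, and it applies the relaxed 0.5 rule every time rather than "sometimes", since the exact condition is not given above. Function and parameter names are made up.

```python
import numpy as np

def nearest(point, spots):
    """Index of the spot nearest to `point`, and its distance."""
    d = np.hypot(spots[:, 0] - point[0], spots[:, 1] - point[1])
    j = int(np.argmin(d))
    return j, float(d[j])

def mutual_pairs(spots_a, spots_b):
    """Pairs (i, j) where b[j] is nearest to a[i] and a[i] is nearest to b[j]."""
    pairs = []
    for i, s0 in enumerate(spots_a):
        j, _ = nearest(s0, spots_b)
        back, _ = nearest(spots_b[j], spots_a)
        if back == i:
            pairs.append((i, j))
    return pairs

def extend(track, current_spots, next_spots, max_factor=4.0, relaxed=0.5):
    """Try to extend `track` by one frame; return True if a spot was added."""
    prev = np.asarray(track[-2], dtype=float)
    last = np.asarray(track[-1], dtype=float)
    step = last - prev
    step_len = float(np.hypot(step[0], step[1]))
    expected = last + step                     # linear extrapolation
    j, dist = nearest(expected, next_spots)
    if dist > max_factor * step_len:           # too far from the prediction
        return False
    if dist > relaxed * step_len:
        # Not close enough to skip the check: the track end must be the
        # nearest current-frame spot to the candidate.
        k, _ = nearest(next_spots[j], current_spots)
        if not np.allclose(current_spots[k], last):
            return False
    track.append((float(next_spots[j][0]), float(next_spots[j][1])))
    return True

# Two bees over three frames (made-up coordinates).
f0 = np.array([[10.0, 10.0], [100.0, 50.0]])
f1 = np.array([[15.0, 10.0], [100.0, 60.0]])
f2 = np.array([[20.0, 10.0], [100.0, 70.0]])
pairs = mutual_pairs(f0, f1)
track = [tuple(f0[0]), tuple(f1[0])]
extended = extend(track, f1, f2)
print(pairs, extended, track)
```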

Plants blowing in a breeze, and shadows as the sun moves create spots. Those don't move far. So we reject tracks that are short in space (e.g. < 50 pixels long in 640x480 video) or are short in time (e.g. fewer than 6 video frames).
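The rejection rule can be sketched as a simple filter. Here "length in space" is taken to be the diagonal of the track's bounding box, which is an assumption (path length or end-to-end distance would be other readings), and the function names are made up.

```python
import math

def track_extent(track):
    """Diagonal of the bounding box of a track of (x, y) points, in pixels."""
    xs = [x for x, y in track]
    ys = [y for x, y in track]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))

def keep_track(track, min_extent=50, min_frames=6):
    """Keep tracks that last long enough and travel far enough."""
    return len(track) >= min_frames and track_extent(track) >= min_extent

# Made-up examples: a leaf jittering by 3 pixels and a bee crossing the frame.
leaf = [(100 + (i % 2) * 3, 200) for i in range(40)]
bee = [(100 + 15 * i, 200 + 5 * i) for i in range(8)]
print(keep_track(leaf), keep_track(bee))
```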

Another source of noise spots that is likely to cause trouble is the shadows of flying bees. My hive has the entrance facing north and does not get exposed to sun since I am in the northern hemisphere (San Francisco, California), and a shade tree shields the hive to the south. I recorded video today and I also opened the hive, and I expect people walking in the video are going to cause chaos for the tracking.

There may be some nuances to the algorithm I've left out. The algorithm needs changes to make it more reliable.

The initial implementation used Python and numpy and was about 20x slower than the current code, where I have ported some of the time-critical routines to C. Still it only runs at about real time for 640x480 video -- in other words, it takes about 10 hours to calculate tracks for 10 hours of video. This is on a single core of a mid-2012 MacBook Pro with 2.6 GHz Intel i7 processor, 16 Gbytes RAM, 512 Gbyte solid state drive. Half of the time is taken smoothing the images. The code allows binning the images (e.g. averaging 2x2 pixels down to 1 pixel) which can increase the processing speed. Further C optimization or some GPU code could speed it up. I tried parallel processing four 10 minute videos on this same 4-core laptop and it took about 25 minutes -- while a single 10 minute video took 8 minutes. Processing the four videos one after another would take about 32 minutes, so only a small speed-up (about 1.3x) was achieved, possibly because the speed is memory bandwidth limited, and the per-core 256 Kbyte cache is too small to hold a single image. Smoothing could probably be optimized to fit in per-core cache.
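Binning is straightforward to express in numpy. The sketch below is not taken from the actual code: it averages `factor` x `factor` blocks by reshaping, and the function name is made up.

```python
import numpy as np

def bin_image(image, factor=2):
    """Average factor x factor blocks of pixels down to one pixel."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor      # crop to a multiple of factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

frame = np.arange(480 * 640, dtype=np.float32).reshape(480, 640)
small = bin_image(frame)
print(small.shape, frame.nbytes, small.nbytes)
```

A 2x2 binned 640x480 frame has a quarter of the pixels, so each smoothing pass touches a quarter of the memory. The price is coarser spot positions.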


The code is a mess. I've made no effort yet to allow others to easily run it. I include it here, for fanatics, or a newer October 2015 version here. It uses Python 2.7.6, numpy 1.8.0, ffvideo 0.0.13, and C accessed from Python with ctypes. The ffvideo library is a Python interface to libav which decodes the video (H.264 encoding from camera) into individual image frames. I also used matplotlib for making the plot of counts in this web page.