Scientific Study Description
Read below how we aim to improve autonomous distance estimation through this robotic crowdsourcing experiment.
Visual Swarm Learning
Animals can only learn from their own visual experiences. Robots are not limited in this respect: they can directly exchange raw camera images or, to limit bandwidth, abstract visual features extracted from their images. This exchange can potentially speed up the visual learning process, while providing robots with a much broader visual experience than in a single-robot scenario. Below we describe the specific visual skill we aim to learn in the current project.
Appearance variation cue
In the Astro Drone project, we investigate whether it is possible to learn the distance to obstacles by just looking at the variation in their appearance. The assumption behind this cue is that there is less variation in textures and colors when a robot approaches a wall or large object. To measure this variation, the probability distribution of the occurrence of different textures is estimated [1]. Then the Shannon entropy [2] of this distribution is calculated to measure the amount of variation. In [1] it is shown that, out of a series of 385 image sequences, in 88.6% of the cases the entropy decreases as the camera approaches an obstacle. However, the dataset used to test the cue had some limitations: most image series were simulated by scaling a single image, and the real sequences were made with handheld cameras and probably had limited variation in video locations. This brings us to one of the goals of the crowdsourcing experiment, which is to gather larger and more varied datasets made with real robots. With such datasets, novel cues like the appearance variation cue can be trained and tested more thoroughly. By letting users all over the world help create datasets, we hope to enhance the robustness of monocular distance cues.
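As an illustration of this computation, the Python/NumPy sketch below estimates the texture distribution of a grayscale image by assigning randomly sampled patches to their nearest entry in a small texture dictionary, and then computes the Shannon entropy of that distribution. It is a minimal sketch under stated assumptions: the dictionary, the sampling scheme, and the patch size are illustrative, not the exact pipeline used in the project.

import numpy as np

def texture_entropy(image, dictionary, patch_size=5, n_samples=500, rng=None):
    # image:      2-D grayscale array
    # dictionary: (k, patch_size * patch_size) array of reference patches,
    #             an illustrative stand-in for a learned texture dictionary
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = image.shape
    # Sample random patch locations and flatten each patch to a vector.
    ys = rng.integers(0, h - patch_size, n_samples)
    xs = rng.integers(0, w - patch_size, n_samples)
    vecs = np.stack([image[y:y + patch_size, x:x + patch_size].ravel()
                     for y, x in zip(ys, xs)])
    # Assign each patch to its nearest dictionary entry (Euclidean distance).
    dists = ((vecs[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # Estimate the probability distribution over texture classes ...
    p = np.bincount(labels, minlength=len(dictionary)) / n_samples
    p = p[p > 0]
    # ... and return its Shannon entropy H = -sum(p * log2(p)).
    return -(p * np.log2(p)).sum()

Under the appearance variation cue, this entropy is expected to decrease over a sequence of images taken while approaching an obstacle.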
During the game, images will be taken at 10 evenly spaced distances from the marker. When the level is finished, very small 5x5-pixel patches will be extracted from these images. If the player agrees, these image patches are sent to a database, together with the state information of the drone (height, velocities, attitude angles, etc.). No complete images will be sent, nor can they be reconstructed from the sent information. Several examples of drone images, together with the extracted patches, can be seen in the images below.
Top: Sample images taken with an AR.Drone during flight (which will not be sent)
Bottom: The extracted 5-by-5 pixel patches (which will be sent)
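A minimal sketch of how such 5x5 patches could be extracted, assuming camera frames stored as NumPy arrays; the number of patches per image and the random sampling strategy are assumptions for illustration, not the app's exact procedure:

import numpy as np

def extract_patches(frame, n_patches=50, size=5, rng=None):
    # Cut n_patches random size-by-size patches out of a camera frame.
    # Individual 5x5 patches reveal local texture statistics, but not
    # the scene itself, so the original image cannot be reconstructed.
    rng = rng if rng is not None else np.random.default_rng()
    h, w = frame.shape[:2]
    ys = rng.integers(0, h - size, n_patches)
    xs = rng.integers(0, w - size, n_patches)
    return np.stack([frame[y:y + size, x:x + size]
                     for y, x in zip(ys, xs)])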
Besides gathering the image patches, the experiment focuses on measuring the variation in the crowdsourced dataset. To obtain information on how the scene in the image is structured, the squared magnitude of the Discrete Fourier Transform (DFT) of one image will be sent. Before calculating this power spectrum, the image is converted to grayscale and a Hanning window is applied. An example of an image taken with the drone, the preprocessed image, and the corresponding power spectrum can be seen below.
Top: Sample image taken with an AR.Drone during flight (will not be sent)
Middle: Grayscale image with Hanning window applied (will not be sent)
Bottom: The power spectrum of the windowed image (will be sent)
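In Python/NumPy, the preprocessing and power-spectrum computation described above could look roughly like the sketch below; the grayscale weights and the exact windowing and normalization are assumptions, and the app's implementation may differ.

import numpy as np

def power_spectrum(rgb_frame):
    # Convert to grayscale with standard luminance weights.
    gray = rgb_frame[..., :3] @ np.array([0.299, 0.587, 0.114])
    # Apply a 2-D Hanning window (outer product of two 1-D windows)
    # to suppress edge artifacts in the DFT.
    h, w = gray.shape
    windowed = gray * np.outer(np.hanning(h), np.hanning(w))
    # Squared magnitude of the DFT; fftshift moves the low frequencies
    # to the center, as in typical spectrum visualizations.
    return np.abs(np.fft.fftshift(np.fft.fft2(windowed))) ** 2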
The magnitude spectrum cannot be used to restore the original image, because the Fourier transform of an image consists of complex numbers: to invert the transform and recover the original image, both the real and imaginary components (equivalently, magnitude and phase) are required. As the data extracted in the app only contains the magnitude of the transformed image, insufficient information is available to restore the images. However, the power spectrum does contain information about the geometrical structure of the image [3-5]. The power spectrum will therefore aid in analyzing the variety of images captured with 'Astro Drone'.
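To make the phase argument concrete, the short Python/NumPy sketch below inverts the transform once with the full complex spectrum and once with the magnitude alone; only the former recovers the image. The random test image is, of course, just a stand-in:

import numpy as np

gray = np.random.rand(64, 64)      # stand-in for a windowed grayscale image
spectrum = np.fft.fft2(gray)

# Full complex spectrum (magnitude and phase): inversion recovers the image.
restored = np.fft.ifft2(spectrum).real
assert np.allclose(restored, gray)

# Magnitude only (phase discarded): inversion yields something else entirely.
from_magnitude = np.fft.ifft2(np.abs(spectrum)).real
print(np.abs(from_magnitude - gray).max())   # large error: image not recovered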
References
1. G.C.H.E. de Croon, E. de Weerdt, C. de Wagter, and B. Remes. "The appearance variation cue for obstacle avoidance." In 2010 IEEE International Conference on Robotics and Biomimetics, pp. 1606-1611.
2. C.E. Shannon. "A mathematical theory of communication." The Bell System Technical Journal, 27:379-423, 623-656, 1948.
3. I.T. Young, J.J. Gerbrands, and L.J. van Vliet. "Fundamentals of Image Processing." Delft University of Technology, version 2.3, 2007.
4. A. Torralba and A. Oliva. "Statistics of natural image categories." Network: Computation in Neural Systems, 14(3):391-412, 2003.
5. A. Oliva, A. Torralba, A. Guerin-Dugue, and J. Herault. "Global semantic classification of scenes using power spectrum templates." In Challenge of Image Retrieval, Newcastle, 1999.
For questions or comments, please contact: