Vision Component

The emotion detection was achieved with a third-party library. While it is great at recognising emotions on faces in front of a laptop screen, it wasn't designed for multi-camera use. The first step was to develop a clustered camera system: each host PC ran one or two camera instances and performed its own local emotion analysis. The results were logged locally and sent via OSC to the combiner. At this level, an initial layer of heuristics was applied to ensure background faces were not detected.
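One simple background-face heuristic is to reject detections whose bounding box is too small or whose confidence is too low, since distant background faces appear small in frame. A minimal sketch of that idea, with illustrative thresholds and a hypothetical `Face` structure (not the actual library's API):

```python
# Hypothetical sketch of a background-face heuristic: reject detections whose
# bounding box is too small (distant/background faces) or whose confidence
# score is low. Thresholds and the Face fields are illustrative.
from dataclasses import dataclass

@dataclass
class Face:
    x: int            # bounding-box origin in pixels
    y: int
    width: int        # bounding-box size in pixels
    height: int
    confidence: float # detector confidence, 0.0-1.0

MIN_FACE_AREA = 80 * 80   # pixels^2; anything smaller is treated as background
MIN_CONFIDENCE = 0.6

def is_foreground(face: Face) -> bool:
    """True if the face is large and confident enough to be a real viewer."""
    return (face.width * face.height >= MIN_FACE_AREA
            and face.confidence >= MIN_CONFIDENCE)

faces = [Face(10, 10, 120, 130, 0.9),   # close to the camera -> keep
         Face(300, 40, 40, 45, 0.8)]    # small background face -> drop
foreground = [f for f in faces if is_foreground(f)]
```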
The next component is the camera combiner. This application receives the face information from all cameras and combines it. Simply merging face detections would not work, as the view frustums of the cameras overlap, so this layer applies heuristics to filter out detections that are potentially the same face. A number of complex heuristics, both spatial and temporal filters, were applied at this layer to ensure the interaction application received accurate, clean information. As well as this filtering, the camera combiner acted as a reliability layer, detecting any stalled camera tracking processes and restarting them as necessary.
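The spatial side of the de-duplication can be pictured as clustering by distance: once each camera's detections are mapped into a shared wall coordinate space, two detections closer than some merge radius are treated as the same person. An illustrative sketch of that idea (not the production heuristics; the merge radius is an assumed tuning value):

```python
# Illustrative spatial-merge sketch: faces reported by overlapping cameras are
# treated as the same person if their positions in a shared wall coordinate
# space fall within MERGE_RADIUS of an already-kept detection.
import math

MERGE_RADIUS = 0.3  # metres; assumed tuning value for this sketch

def merge_detections(detections):
    """detections: list of (camera_id, x, y) in shared wall coordinates.
    Greedily keeps the first detection in each cluster of nearby points."""
    merged = []
    for cam, x, y in detections:
        if all(math.hypot(x - mx, y - my) > MERGE_RADIUS for _, mx, my in merged):
            merged.append((cam, x, y))
    return merged

dets = [("camA", 1.00, 0.50),   # person 1 seen by camera A
        ("camB", 1.05, 0.52),   # same person seen by overlapping camera B
        ("camB", 2.40, 0.48)]   # a second person
unique = merge_detections(dets)  # two unique people remain
```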
Excluding hot spares, the system ran on 13 computers: 6 vision-processing PCs, 6 display PCs, and a master PC. This required a significant amount of system administration at bump-in to ensure all computers were in sync (to implement and deploy client requests across the cluster pre-show), carried the same data set, and were remotely accessible (for remote support). A number of Windows PowerShell scripts were developed to assist with this task.
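One part of that sync check can be sketched as hashing the data set on each machine and comparing digests against the master's copy. This is a minimal illustration of the idea only, written in Python rather than the actual PowerShell scripts, and the paths and reporting format are assumptions:

```python
# Minimal data-set sync check: compute one combined digest over every file's
# relative path and contents, in sorted order, so identical data sets on
# different machines yield identical digests.
import hashlib
import os
import pathlib
import tempfile

def dataset_digest(root: str) -> str:
    """Combined SHA-256 over relative paths and file contents."""
    h = hashlib.sha256()
    for dirpath, _, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()

# Demo against a throwaway directory standing in for the show data set.
with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, "layer01.mov").write_bytes(b"show data")
    digest = dataset_digest(d)
```

Each machine would report its digest back to the master; any mismatch flags a PC that needs the data set re-deployed.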
We had to use more than one display PC for this system (fortunate in that it gave me the opportunity to further some interesting tech; unfortunate in that it was very complex and pushed up both the sysadmin effort and the PC hire cost!). A decision had to be made on the exact number of PCs required: the trade-off was between the number of PCs in the system and the number of videos each PC had to decode. In the end we went with 6 PCs for the 24 screens (although I have since optimised video playback so that I could now reduce the number of PCs significantly). As you will see in the video, the wall was seamless: interactions could spread across multiple screens depending on where the users were standing and how much they swayed around as they interacted. Because of this, the videos had to play in sync across the PCs, so a multi-movie clustered video library was developed.
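The core of clock-driven video sync can be sketched very simply: the master broadcasts a shared start time, and each display PC derives the frame to show from that clock, so all PCs agree without per-frame chatter. A sketch of that idea under assumed values (the actual clustered video library is Boffswana's own):

```python
# Clock-driven frame selection sketch: every display PC computes the same
# frame index from a shared start time and its (synchronised) wall clock.
FPS = 25.0  # assumed playback rate for this sketch

def frame_for(now: float, start_time: float, frame_count: int) -> int:
    """Frame index every PC should display at wall-clock time `now`,
    looping the movie when it reaches the end."""
    elapsed = max(0.0, now - start_time)
    return int(elapsed * FPS) % frame_count

# Two PCs with synchronised clocks and the same start time agree on the frame.
assert frame_for(10.0, 0.0, 500) == frame_for(10.0, 0.0, 500) == 250
```

In practice this depends on the PCs' clocks being synchronised (e.g. via NTP or a master heartbeat), which is where most of the real complexity lives.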
Content Customisation System
There were around 30 layers of non-interactive media in the system. Some were animated videos and some were static layers. Tweaking these assets programmatically proved to be time-consuming. To make the visual tweaking process more efficient for the designers, I developed a front-end editor using Qt which allowed all the interactive object properties (scale, position, depth, alpha/blend values, etc.) to be modified and saved out in real time, while viewing the results on the clustered 24-screen wall.
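The editor's data model can be pictured as a per-layer property record that round-trips through a serialisation format, so tweaks can be saved and pushed to the wall live. A hypothetical sketch using JSON; the field names mirror the properties listed above, but the schema itself is an assumption:

```python
# Hypothetical layer-property schema: each interactive object's tweakable
# values serialised to JSON so the editor can save and the wall can reload.
import json
from dataclasses import dataclass, asdict

@dataclass
class LayerProps:
    name: str
    x: float = 0.0
    y: float = 0.0
    scale: float = 1.0
    depth: int = 0        # draw order within the 30-odd layers
    alpha: float = 1.0
    blend: str = "normal"

def save_layers(layers) -> str:
    """Serialise all layer properties for distribution to the display PCs."""
    return json.dumps([asdict(l) for l in layers], indent=2)

def load_layers(text: str):
    """Rebuild LayerProps records from saved JSON."""
    return [LayerProps(**d) for d in json.loads(text)]

layers = [LayerProps("clouds", x=12.5, depth=3, alpha=0.8)]
restored = load_layers(save_layers(layers))
```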
A master controller, running on a separate PC, was the brain of the system. It received a clean stream of face information from the face combiner application and chose appropriate animations to play to the users in the space. It also handled more straightforward programming tasks, such as preventing two similar animations from playing on users standing next to each other and keeping track of all smiles for the viewer statistics logging, before finally sending the appropriate commands to the display and audio systems to bring the stunning artwork to life across the wall.
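The neighbour-avoidance rule above can be sketched as a greedy left-to-right assignment: each user gets the highest-preference animation their adjacent neighbour isn't already playing. The function and animation names here are illustrative, not the controller's actual API:

```python
# Greedy neighbour-avoidance sketch: assign each user (sorted left to right
# along the wall) the first preferred animation their neighbour isn't playing.
def assign_animations(users, animations):
    """users: positions sorted left-to-right; animations: ordered by preference.
    Returns one animation per user, differing from the previous user's pick."""
    assigned = []
    for _ in users:
        prev = assigned[-1] if assigned else None
        choice = next(a for a in animations if a != prev)
        assigned.append(choice)
    return assigned

picks = assign_animations([0.5, 1.1, 1.6], ["smile_burst", "ripple", "sparkle"])
# adjacent users never share an animation
assert all(a != b for a, b in zip(picks, picks[1:]))
```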
I'm proud to have helped Boffswana deliver such a cutting-edge interaction to industry. Boffswana have already been recognised for this project, with two official commendations: the AIMIA 17 Awards and the FWA Site of the Day for 3rd April, 2011.