New system mixes iPhone videos for advanced 4D viewing


Carnegie Mellon University researchers have combined iPhone videos taken from separate cameras to create 4D visualizations, which allow viewers to see fast action from various angles, according to a new press release. The new shooting process can even remove people or objects from view.

This new method, which enables video editors to showcase new tricks in real time, could have implications at a time when face-swapping technology is already poised to cause a seismic shift in the reliability of video content.


Creating 4D scenes from multiple videos

Imagine a live event, a concert or a sports game, for example, captured from every smartphone in the theater or arena, with any obstruction to the field of view quickly removed.

It is dizzying, but also exciting.

Videos of wedding guests, each shot independently from a different point of view, could be combined to put viewers there, amid moments that last forever, Aayush Bansal, a Ph.D. student at CMU's Robotics Institute, explained in the press release.

Another application is to record actors in one environment and then insert them in another, he added.

“We are only limited by the number of cameras,” Bansal said, explaining that there is no upper limit on the number of videos that can be combined.

Bringing film studios to iPhones

Virtualized reality, as the Carnegie Mellon press release calls it, is nothing new; until now, however, it has been limited to studio setups such as CMU's Panoptic Studio, which has more than 500 video cameras embedded in its geodesic walls. What makes the new method of video capture important is that it is easy and accessible to use.

Combining visual information from real-world scenes, captured by multiple independent handheld cameras, into a single comprehensive model that reconstructs a 3D scene had simply not been possible before.

To develop their method, Bansal and his colleagues used convolutional neural networks (CNNs), a type of deep learning program adept at analyzing visual data. The team found that scene-specific CNNs can compose different parts of a scene into a single, complete 4D visualization.
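To make the idea of a scene-specific CNN more concrete, here is a minimal PyTorch sketch, not the researchers' published code: a small, hypothetical network (FusionCNN) that composites RGB frames reprojected from several handheld cameras into a single target-view image, trained by holding out one camera and reconstructing its frame from the others. The layer sizes, three-camera setup, and hold-out training scheme are all illustrative assumptions.

```python
# Minimal, hypothetical sketch of a scene-specific fusion CNN (PyTorch).
# NOT the CMU team's published code; shapes and training scheme are
# illustrative assumptions.
import torch
import torch.nn as nn

class FusionCNN(nn.Module):
    """Toy encoder-decoder that composites N reprojected views into one frame."""

    def __init__(self, num_views: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            # Input: the RGB frames of all views, stacked along channels.
            nn.Conv2d(num_views * 3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # Output: a single fused RGB frame for the target viewpoint.
            nn.Conv2d(32, 3, kernel_size=3, padding=1),
        )

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views * 3, H, W)
        return self.net(views)

# Self-supervised training step: hold out one camera and ask the network
# to reconstruct its frame from the remaining views.
model = FusionCNN(num_views=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

inputs = torch.rand(1, 9, 128, 128)   # 3 stacked views (placeholder data)
target = torch.rand(1, 3, 128, 128)   # held-out camera's frame (placeholder)

optimizer.zero_grad()
pred = model(inputs)
loss = loss_fn(pred, target)
loss.backward()
optimizer.step()
```

The appeal of a scene-specific network of this kind is that it only has to learn one scene well, so even a small model can plausibly fill in viewpoints that no single camera captured.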

‘The world is our studio’

To demonstrate their method, the researchers used up to 15 iPhones to capture various scenes, including dances, martial arts demonstrations, and even flamingos at the National Aviary in Pittsburgh, in the United States.

“The goal of using iPhones was to demonstrate that anyone can use this system,” said Bansal. “The world is our studio.”

Bansal and his colleagues presented their 4D visualization method at the virtual Computer Vision and Pattern Recognition (CVPR) conference last month. Though unprecedented, the technology is likely only the beginning of a new era for video and media capture.
