How to Start with VV?
According to my sources, a volumetric video studio typically generates a point cloud or mesh from around 100 synchronized DSLR cameras, similar to a photogrammetry setup. Such “Multi-View Stereo (MVS)” approaches produce dense point clouds, but they are too expensive, right? Don’t worry. You can start with a few depth sensors such as the Azure Kinect, each of which captures a point cloud directly and independently. The result is a little noisy, but the noise can be reduced in post-processing.
Dataset using Multi-View Kinect
First of all, I considered using the Azure Kinect for my experiment, but it was still expensive for me… (around 450 USD per Kinect). Moreover, capturing 360-degree VV requires at least three or four Kinects. Then I found the “PanopticStudio 3D PointCloud Dataset”, a 3D point cloud dataset captured with 10 Kinect v2 sensors.
Unfortunately, however, this dataset is not distributed as 3D point clouds but as the color video and depth recorded by each camera, the camera parameters (position, orientation, focal length, and distortion), and timestamps for camera synchronization. In other words, you need to run the code in their GitHub repository to generate the point clouds yourself while synchronizing the images from each Kinect. Furthermore, their point cloud generation code is written in MATLAB, which requires a license…, so I rewrote it in Python with the “Open3D” library (I’ll describe it later). It’s free 🙂
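The core of that rewrite is the pinhole back-projection from each Kinect’s depth image to 3D points. Here is a minimal NumPy sketch of that step, assuming Kinect-v2-style depth stored in millimetres; the intrinsics (fx, fy, cx, cy) are placeholders for the values in the dataset’s calibration files:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project a depth image (uint16, millimetres) to an Nx3 point cloud in metres."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                  # pixel row/column grids
    z = depth.astype(np.float64) / depth_scale # metres
    x = (u - cx) * z / fx                      # inverse pinhole projection
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                  # drop pixels with no depth reading
```

Per-camera point clouds produced this way are then transformed into the common world frame using each camera’s extrinsics before merging.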
Finally, I could export the point clouds acquired from the multi-view Kinects as one .ply file per frame, as shown in Fig. 4. A single colored point cloud is already very large, and a multi-frame sequence is larger still. Therefore I also implemented code to remove the points belonging to the wall (dome), based on a threshold on the depth values in the depth image. Eventually, I reduced the point cloud to 25 MB per frame.
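The dome removal boils down to zeroing depth pixels beyond a distance threshold before back-projection, so those pixels never become points. A tiny sketch (the 2500 mm threshold here is illustrative, not the value I used):

```python
import numpy as np

def remove_background(depth, max_depth_mm=2500):
    """Zero out depth pixels beyond a threshold so the dome wall is never unprojected."""
    cleaned = depth.copy()                     # keep the original frame intact
    cleaned[cleaned > max_depth_mm] = 0        # 0 = "no reading", dropped later
    return cleaned
```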
Making a colored point cloud was slightly tricky because color and depth have different resolutions on the Kinect (color: 1920×1080 (.jpg), depth: 512×424 (.png)). To pick up color information, the point cloud generated from depth must be projected into the color image space. After picking up a color for each point, I re-projected the points back into 3D space. Honestly, you can skip this step, because we will apply texture mapping later anyway.
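The round-trip can be sketched as follows: transform each depth-derived point into the color camera’s frame, project it with the color intrinsics, and sample a pixel. The parameter names and the optional depth-to-color extrinsic here are assumptions for illustration:

```python
import numpy as np

def colorize_points(pts, color_img, fx, fy, cx, cy, T_depth_to_color=None):
    """Project depth-derived 3D points into the color camera and sample one color per point."""
    if T_depth_to_color is not None:           # 4x4 extrinsic between the two sensors
        pts = pts @ T_depth_to_color[:3, :3].T + T_depth_to_color[:3, 3]
    u = np.round(fx * pts[:, 0] / pts[:, 2] + cx).astype(int)
    v = np.round(fy * pts[:, 1] / pts[:, 2] + cy).astype(int)
    h, w = color_img.shape[:2]
    ok = (pts[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colors = np.zeros((len(pts), 3), np.uint8)
    colors[ok] = color_img[v[ok], u[ok]]       # points that miss the image stay black
    return colors, ok
```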
Visualization on Blender
To visualize the point clouds, I used Blender 2.82 with the “Point Cloud Visualizer” add-on, which was very convenient for viewing colored point cloud sequences. Loading a point cloud sequence (110 frames in my case) can take a long time and may require a lot of memory. My machine runs Windows 10 (64-bit) with 16 GB of RAM and a GeForce RTX2020.
This “offline” process (mainly the data loading) requires patience. I’d like to look into real-time optimization once I get multiple Kinects in the near future 🙂
3D Library for Meshing
Both point cloud generation and meshing can be done with “Open3D”, a 3D-computer-vision-friendly library. I believe it might be the best choice for beginners implementing an offline VV system. Acquiring a mesh (.ply) from a point cloud was easy [sample]. Note, however, that Open3D uses an “OpenCV”-like axis convention (X-right, Y-down, Z-front), so be careful when bringing its output into “OpenGL”-style software (Fig. 5).
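Converting between the two conventions is just a sign flip on Y and Z. A small sketch of the transform I mean, applied to a raw point array:

```python
import numpy as np

# Open3D / OpenCV convention: X-right, Y-down, Z-forward.
# OpenGL convention:          X-right, Y-up,   Z-backward.
CV_TO_GL = np.diag([1.0, -1.0, -1.0])

def cv_points_to_gl(points):
    """Flip Y and Z so OpenCV-convention points display upright in OpenGL-style viewers."""
    return points @ CV_TO_GL.T
```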
Fig. 6 shows the mesh generated from the dataset, visualized with lighting in Blender. When you use the “Truncated Signed Distance Function (TSDF)” functionality in Open3D for meshing, you can choose the voxel resolution parameter (I used 5.0 in Fig. 6); I’ll describe the details later. The mesh for each frame was under 20 MB.
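To make the voxel resolution parameter concrete, here is a deliberately simplified single-frame TSDF integration in NumPy. Real pipelines, including Open3D’s, additionally accumulate weighted averages over many frames and extract the mesh at the zero crossing; the camera model and grid below are toy assumptions, with the grid placed in front of the camera:

```python
import numpy as np

def integrate_tsdf(depth, fx, fy, cx, cy, grid_min, dims, voxel_size, trunc):
    """Single-frame TSDF: per-voxel signed distance to the observed surface,
    truncated to [-1, 1]. Camera at the origin looking along +Z, units in metres."""
    ii, jj, kk = np.meshgrid(*(np.arange(d) for d in dims), indexing="ij")
    centers = grid_min + (np.stack([ii, jj, kk], axis=-1) + 0.5) * voxel_size
    x, y, z = centers[..., 0], centers[..., 1], centers[..., 2]
    u = np.round(fx * x / z + cx).astype(int)  # project voxel centre into the depth image
    v = np.round(fy * y / z + cy).astype(int)
    h, w = depth.shape
    inside = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sdf = np.where(inside, depth[v.clip(0, h - 1), u.clip(0, w - 1)] - z, np.nan)
    return np.clip(sdf / trunc, -1.0, 1.0)     # NaN marks voxels the camera never saw
```

Shrinking `voxel_size` raises mesh resolution but also lets depth noise survive into the surface, which is exactly the trade-off I hit in Fig. 8.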
As a tip, the Blender add-on “Stop-motion-OBJ” was really convenient for loading continuous mesh (.ply or .obj) sequences, even though loading takes a long time. Removing the floor vertices also speeds up loading. I implemented a floor removal function using a Blender script.
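The floor removal logic is the same whether it runs inside Blender via `bpy` or on raw arrays. A NumPy sketch of the idea on an indexed triangle mesh (the floor height and up axis are illustrative):

```python
import numpy as np

def remove_floor(vertices, faces, floor_y=0.02, up_axis=1):
    """Keep only faces whose three vertices all sit above the floor plane, then reindex."""
    keep_v = vertices[:, up_axis] > floor_y
    keep_f = keep_v[faces].all(axis=1)          # face survives only if all its vertices do
    remap = np.full(len(vertices), -1)
    remap[keep_v] = np.arange(keep_v.sum())     # old vertex index -> new vertex index
    new_faces = remap[faces[keep_f]]
    return vertices[keep_v], new_faces
```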
3D Tool for Texture Mapping
Texture mapping was the most difficult part of this experiment because there is no popular library or tool for it. Fortunately, I found a texture mapping tutorial using MeshLab. The inputs are the mesh, the camera intrinsic/extrinsic parameters, and the color image from each camera. Because I couldn’t find many details about MeshLab’s texture mapping process, I spent a lot of time automating the input of meshes, color images, and camera parameters as a batch process (necessary for applying it to all frames). Finally, I achieved texture mapping (at 2K texture resolution in my setup) for all frames (Fig. 7) with a process found by trial and error:
- Synthesize a MeshLab scene file with self-developed Python code
- Load it via a MeshLab script
- Apply the texture mapping function from the MeshLab script
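The scene file synthesis step can be sketched like this. Note that the element and attribute names follow my reading of MeshLab’s .mlp project format, but real projects carry more attributes (lens distortion, pixel size, and so on), so compare against a scene saved from MeshLab itself before relying on it:

```python
import xml.etree.ElementTree as ET

def make_mlp(mesh_file, cameras):
    """Build a MeshLab project (.mlp) string referencing one mesh and N camera rasters.
    `cameras`: list of dicts with 'label', 'image', 'rotation' (row-major floats),
    'translation', 'focal_mm', 'center_px', 'viewport_px'."""
    root = ET.Element("MeshLabProject")
    mg = ET.SubElement(root, "MeshGroup")
    ET.SubElement(mg, "MLMesh", label=mesh_file, filename=mesh_file)
    rg = ET.SubElement(root, "RasterGroup")
    for cam in cameras:
        raster = ET.SubElement(rg, "MLRaster", label=cam["label"])
        ET.SubElement(raster, "VCGCamera",
                      RotationMatrix=" ".join(map(str, cam["rotation"])),
                      TranslationVector=" ".join(map(str, cam["translation"])),
                      FocalMm=str(cam["focal_mm"]),
                      CenterPx=" ".join(map(str, cam["center_px"])),
                      ViewportPx=" ".join(map(str, cam["viewport_px"])))
        ET.SubElement(raster, "Plane", fileName=cam["image"], semantic="1")
    return "<!DOCTYPE MeshLabDocument>\n" + ET.tostring(root, encoding="unicode")
```

Generating one such file per frame is what makes the batch loop possible: write the .mlp, then invoke MeshLab’s scripting interface on it.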
Trial for Quality Improvement
To improve the final visual quality of the texture-mapped result, the TSDF parameter (mentioned in the meshing section) is quite important. Through several rounds of tuning the TSDF parameters that control mesh resolution, I found that a high-resolution mesh caused texture mapping artifacts in some cases, such as when the original point cloud was too noisy (Fig. 8).
Additionally, I tried different meshing approaches, using Wrap3 for retopology and MeshLab for Poisson surface reconstruction (Fig. 9). The Poisson surface mesh looked better, but its surface was too smooth. In my trials there is still a trade-off between texture quality and shape quality.
In this article, I broke my volumetric video experiments down into stages: direction, dataset, point cloud generation, visualization, meshing, texturing, and parameter tuning. I believe this is the shortest path to an “offline” volumetric video system, even though loading takes time… The textured results still show some artifacts caused by noisy depth. That noise could at least be reduced by moving each Kinect closer to the target, so I need to purchase several Kinects for my next experiment.
Additionally, to address the fundamental noise issues, it would be necessary to develop a surface tracking system such as “Embedded Deformation” with a “Deformation Graph”, shown in Fig. 10. The technique can reduce noise by accumulating the point clouds acquired in each frame.
What I Wanted To Do
Volumetric video technology has great potential to create new experiences, visuals, and live performances. Unfortunately, my current experiment is still “offline”, and I can easily guess how hard it would be to make it run in real time: it requires GLSL knowledge and skills, and debugging is also difficult… For now, I’ll continue the offline experiments to reach studio quality. Even this might soon be usable for offline content such as music videos or movies that combine volumetric video with a “Head-Mounted Display (HMD)”.
Volumetric video is just one technology, a means to achieve your purpose. I always think the important thing is the story behind the technology: the story should be meaningful and emotional, and the technology should be new and impressive. This idea might be similar to “Volumetric Filmmaking”. The “USC Light Stage” (Fig. 11), an offline system, has also contributed to producing many works with stories. That might be my current ideal.
Eventually, I would be proud if I could expand my own world by collaborating with many artists and creators on new artworks. In that way, I’d like to build a meaningful and exciting future through volumetric video technology.
AUTHOR: Naoya Iwamoto
Researcher in the computer graphics field. After receiving a Ph.D. in engineering, I joined a smartphone company in 2017. Currently I develop new animation technology combined with Augmented Reality.