How to Create Volumetric Video Without Owning a Kinect?

Volumetric capture experiment.

Hi everyone, I’m Naoya Iwamoto, a researcher in the computer graphics field. I’m interested in character animation technology: deformation, facial animation, motion synthesis, and control.
Recently I’ve been focusing on “Volumetric Video” (VV) technology, which captures the dynamic movement of live subjects. Since the beginning of this year I’ve read several research papers and run some experiments on making VV. In this article, I’d like to share tips from the experiments described below.

How to Start with VV? 

According to my sources, a volumetric video studio normally generates point clouds or meshes using around 100 synchronized DSLR cameras, much like a photogrammetry setup. These “Multi-View Stereo (MVS)” approaches produce dense point clouds, but they’re too expensive, right? Don’t worry. You can start with a few depth sensors such as the Azure Kinect, which captures a point cloud directly and independently. The result is a little noisy, but the noise can be reduced in post-processing.

Dataset using Multi-View Kinect

First of all, I considered using the Azure Kinect for my experiment, but it was still expensive for me (around 450 USD per unit), and capturing 360-degree VV requires at least 3 or 4 of them. Then I found the “PanopticStudio 3D PointCloud Dataset”, a 3D point cloud dataset captured with 10 Kinect v2 sensors.

Unfortunately, this dataset is distributed not as 3D point clouds but as the color video and depth acquired from each camera, the camera parameters (position, angle, focal length, and distortion), and time stamps for camera synchronization. This means you need to run the code in their GitHub repository to generate the point clouds yourself, synchronizing the images from each Kinect. Furthermore, their point cloud generation code is written in Matlab, which requires a license, so I rewrote it in Python with the “Open3D” library (described later). It’s free 🙂
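The core of that Matlab-to-Python port is pinhole back-projection: turning each depth pixel into a 3D point using the camera intrinsics. Here is a minimal numpy sketch of that step; the function name and the millimetre depth scale are my assumptions, not the dataset’s code:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project a depth image (uint16, millimetres) into an Nx3 point
    cloud in the camera frame using the pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth.
    """
    v, u = np.indices(depth.shape)            # pixel rows (v) and columns (u)
    z = depth.astype(np.float64) / depth_scale
    valid = z > 0                             # Kinect reports 0 for no-return pixels
    z = z[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)        # points in OpenCV-style camera axes
```

A full multi-view pipeline would additionally apply each camera’s extrinsics to move every per-Kinect cloud into the shared dome frame.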

Finally, I could export the point clouds acquired from the multi-view Kinects in .ply format, one file per frame, as shown in Fig. 4. A single colored point cloud is already large, and multiple frames multiply the problem, so I also implemented code to remove the points belonging to the wall (dome) based on a threshold on the depth image. Eventually, I got the point cloud down to 25MB per frame.
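The wall removal described above is just a depth cutoff applied before back-projection. A sketch, where the helper name and the 2.5 m cutoff are my illustrative assumptions:

```python
import numpy as np

def clip_far_depth(depth, max_mm=2500):
    """Zero-out depth readings beyond max_mm so the dome wall never enters
    the back-projection step. In a capture dome the subject stands near
    the centre, so a fixed cutoff separates the performer from the wall.
    A value of 0 is treated downstream as "no measurement"."""
    clipped = depth.copy()
    clipped[clipped > max_mm] = 0
    return clipped
```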

Making a colored point cloud was slightly tricky because the color and depth resolutions on the Kinect differ (color: 1920×1080 (.jpg), depth: 512×424 (.png)). To pick up color information, the point cloud generated from depth has to be projected onto the color image space; after picking up a color for each point, I re-projected it into 3D space. Honestly, you can skip this step, because we will apply texture mapping later anyway.
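The color pickup is the forward projection: each 3D point is mapped through the color camera’s intrinsics and the nearest pixel is sampled. A simplified numpy sketch, assuming the points are already transformed into the color camera’s frame and ignoring lens distortion (both of which the real pipeline must handle):

```python
import numpy as np

def colorize_points(points, color_img, fx, fy, cx, cy):
    """Project Nx3 points onto the color image with its intrinsics and
    sample one color per point:
        u = fx * X / Z + cx,   v = fy * Y / Z + cy   (nearest pixel).
    Points that land outside the image keep a grey default."""
    h, w = color_img.shape[:2]
    u = np.round(points[:, 0] * fx / points[:, 2] + cx).astype(int)
    v = np.round(points[:, 1] * fy / points[:, 2] + cy).astype(int)
    colors = np.full((len(points), 3), 128, dtype=np.uint8)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colors[inside] = color_img[v[inside], u[inside]]
    return colors
```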

Visualization on Blender

For visualization of the point cloud, I used Blender 2.82 with the add-on “Point Cloud Visualizer”, which was very convenient for visualizing colored point cloud sequences. Loading a point cloud sequence (110 frames in my case) can take a long time and a lot of memory. My machine runs Windows 10 (64-bit) with 16GB RAM and a GeForce RTX2020.

This “offline” process (mainly the data loading) requires patience. I’d like to think about optimizing it for real-time once I get multiple Kinects in the near future 🙂

3D Library for Meshing

Point cloud generation and meshing can both be done with “Open3D”, a 3D-computer-vision-friendly library. I believe it may be the best choice for beginners implementing an offline VV system. Getting a mesh (.ply) from a point cloud was easy [sample]. Note, however, that Open3D uses an “OpenCV”-like axis convention (X-right, Y-down, Z-front), so be careful when moving data into “OpenGL”-style software (Fig. 5).
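The axis mismatch above is fixed by one sign flip: OpenCV-style axes (X-right, Y-down, Z-forward) become OpenGL-style axes (X-right, Y-up, Z-backward) by negating Y and Z. A tiny sketch of that conversion (helper name is mine):

```python
import numpy as np

# Diagonal matrix that negates Y and Z, mapping OpenCV camera axes
# (X-right, Y-down, Z-forward) onto OpenGL axes (X-right, Y-up, Z-backward).
CV_TO_GL = np.diag([1.0, -1.0, -1.0])

def cv_to_gl(points):
    """Convert an Nx3 point array from the OpenCV to the OpenGL convention."""
    return points @ CV_TO_GL.T
```

Applying the same flip to camera poses keeps the scene consistent when you hand Open3D output to an OpenGL-based viewer.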

Fig. 6 shows the mesh generated from the dataset, rendered with lighting in Blender. When you use “Truncated Signed Distance Function (TSDF)” integration in Open3D for meshing, you can choose the voxel resolution parameter (I used 5.0 in Fig. 6); I’ll describe the details later. The mesh for each frame was under 20MB.
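To make the voxel resolution trade-off concrete, here is the idea behind TSDF integration reduced to a single voxel update in numpy (Open3D’s volume does this over a whole voxel grid; the function name and values here are illustrative). Each voxel keeps a running weighted average of truncated signed distances observed across frames, which is exactly why a finer grid averages fewer samples per voxel and lets sensor noise survive into the surface:

```python
import numpy as np

def integrate_tsdf(tsdf, weight, sdf_obs, trunc):
    """One TSDF integration step for a voxel: clamp the observed signed
    distance to the truncation band [-trunc, trunc], normalise it, then
    fold it into the running weighted average. Smaller voxels -> finer
    mesh, but fewer depth samples averaged per voxel -> noisier surface."""
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)   # truncated, normalised SDF
    new_w = weight + 1.0
    tsdf = (tsdf * weight + d) / new_w        # running weighted mean
    return tsdf, new_w
```

Two noisy observations on opposite sides of the true surface cancel out in the average, which is the denoising effect lost when each voxel only ever sees one sample.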

As a tip, the Blender add-on “Stop-motion-OBJ” was really convenient for loading continuous mesh (.ply or .obj) sequences, even though loading takes a long time. Removing the floor vertices also speeds up loading; I implemented a floor removal function as a Blender script.
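My floor removal runs inside Blender via its scripting API, but the core logic is just filtering vertices by height and dropping the faces that reference them. A plain-numpy sketch of that logic, assuming a Z-up frame after axis conversion (the function name and threshold are mine):

```python
import numpy as np

def remove_floor(vertices, faces, floor_z=0.02):
    """Drop vertices at or below floor_z and every face that used them,
    re-indexing the surviving faces so they reference the compacted
    vertex array."""
    keep = vertices[:, 2] > floor_z
    remap = -np.ones(len(vertices), dtype=int)     # old index -> new index
    remap[keep] = np.arange(keep.sum())
    faces_kept = faces[np.all(keep[faces], axis=1)]  # faces with all verts kept
    return vertices[keep], remap[faces_kept]
```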

3D Tool for Texture Mapping

Texture mapping was the most difficult part of this experiment because there are no popular libraries or tools for it. Fortunately, I found a texture mapping tutorial using MeshLab. The inputs are the mesh, the camera intrinsic/extrinsic parameters, and the color image acquired from each camera. Because I couldn’t find many details about MeshLab’s texture mapping process, I spent a lot of time automating the input of meshes, color images, and camera parameters as a batch process (necessary for applying it to all frames). In the end I achieved texture mapping (2K texture resolution in my setting) for all frames (Fig. 7) with a process found by trial and error:

  • Synthesizing a MeshLab scene file with self-developed Python code
  • Loading it with a MeshLab script
  • Applying the texture mapping function from the MeshLab script
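The batch step above can be driven from Python by building one `meshlabserver` command per frame. A sketch, assuming `meshlabserver` is on the PATH and a per-frame directory layout of my own invention (`scene.mlp` is the generated project file, `texture.mlx` the exported filter script; check the flags against your MeshLab version):

```python
import subprocess
from pathlib import Path

def meshlab_cmd(frame_dir, mlx_script="texture.mlx"):
    """Build one meshlabserver invocation for a frame directory:
    -p loads the generated project (mesh + rasters + cameras),
    -s runs the texture mapping filter script,
    -o writes the result, -m wt keeps wedge texture coordinates."""
    frame = Path(frame_dir)
    return ["meshlabserver",
            "-p", str(frame / "scene.mlp"),
            "-o", str(frame / "textured.ply"),
            "-m", "wt",
            "-s", mlx_script]

# Hypothetical batch loop over all frames:
# for frame in sorted(Path("frames").iterdir()):
#     subprocess.run(meshlab_cmd(frame), check=True)
```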

Trial for Quality Improvement 

To improve the final visual quality of the texture mapping, the TSDF parameter (mentioned in the meshing section) is quite important. Through several trials tuning the TSDF parameters that control mesh resolution, I found that a high-resolution mesh caused texture mapping artifacts in some cases, e.g. when the original point cloud is too noisy (Fig. 8).

TSDF parameter and mesh resolution

Additionally, I tried different meshing algorithms: Wrap3 for retopology and MeshLab for Poisson surface reconstruction (Fig. 9). The Poisson surface mesh looked better, but its surface is too smooth. In my trials there is still a trade-off between texture quality and shape quality.

Volumetric video meshing


In this article, I broke my volumetric video experiments down into processing stages: direction, dataset, point cloud generation, visualization, meshing, texturing, and parameter tuning. I believe this is the shortest path to developing an “offline” volumetric video system, even though loading takes time. The textured result still has some artifacts caused by noisy depth; at minimum, that noise could be reduced by moving each Kinect closer to the target. So I need to buy several Kinects for my next experiment.

Additionally, to address the underlying noise issue, it would be worth developing a surface tracking system such as “Embedded Deformation” with a “Deformation Graph”, shown in Fig. 10. That technique can reduce noise by accumulating the point clouds acquired in each frame.

What I Wanted To Do 

Volumetric video technology has great potential for new experiences, visuals, and live events. Unfortunately, my current experiment is still “offline”, and I can easily guess how hard it would be to make it run in real-time: it requires GLSL knowledge and skills, and debugging is difficult. For now, I’ll continue the offline experiments toward studio quality. Even that level should soon be usable for offline content such as music videos or movies that combine volumetric video with a “Head Mounted Display (HMD)”.

Volumetric video is just one technology, a means to an end. I always think the important thing is the story behind the technology: the story should be meaningful and emotional, and the technology new and impressive. This thinking is similar to “Volumetric Filmmaking”. One offline system, the “USC Light Stage” (Fig. 11), has also contributed to producing much story-driven content. That might be my current ideal.

Eventually, I would be proud to expand my own world by collaborating with many artists and creators on new artworks. I’d like to build a meaningful and exciting future through volumetric video technology.

AUTHOR: Naoya Iwamoto

Researcher in the computer graphics field. After receiving a Ph.D. in engineering, I joined a smartphone company in 2017. I currently develop new animation technology combined with Augmented Reality.





  1. Hey, nice article! I have a question… If you had to do it again and had more finances, would you still code it yourself, or would you use an existing program?

    1. Thanks Tony! That’s an essential question for me. I suppose I would develop it again, for three reasons. The first is “democratization”: I wanted to offer a choice other than the financial solution. The second is a “sense of accomplishment”: it feels special when I achieve something through my own development. The last is “awareness”: through DIY I may discover my next idea or the next issue to be solved, which is important in academia and engineering. Thanks again, Tony, for the opportunity to think about that.

  2. This is a great article. I was looking for something similar: someone working toward a result who shares their workflow. Thank you and good luck with your creativity!

    1. Thanks for enjoying the article! I’m happy to hear it. I’ll keep it up and share progress on my Instagram. If you’re interested in my posts, please send a DM anytime :)
