BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image

KAIST, Imperial College London
CVPR 2024

Taking a single image as input, our method renders the personalized textures of two hands at novel views, poses, and lighting conditions by utilizing the texture symmetry of the left and right hands together with a hand texture parametric model.



Abstract

Creating personalized hand avatars is important to offer a realistic experience to users on AR / VR platforms. While most prior studies focused on reconstructing 3D hand shapes, some recent work has tackled the reconstruction of hand textures on top of shapes. However, these methods are often limited to capturing pixels on the visible side of a hand, requiring diverse views of the hand in a video or multiple images as input. In this paper, we propose a novel method, BiTT (Bi-directional Texture reconstruction of Two hands), which is the first end-to-end trainable method for relightable, pose-free texture reconstruction of two interacting hands taking only a single RGB image, by three novel components: 1) bi-directional (left ↔︎ right) texture reconstruction using the texture symmetry of the left / right hands, 2) utilizing a texture parametric model for hand texture recovery, and 3) an overall coarse-to-fine stage pipeline for reconstructing the personalized texture of two interacting hands. BiTT first estimates the scene light condition and albedo image from an input image, then reconstructs the textures of both hands through the texture parametric model and the bi-directional texture reconstructor. In experiments using the InterHand2.6M and RGB2Hands datasets, our method significantly outperforms state-of-the-art hand texture reconstruction methods quantitatively and qualitatively.

Two Hands Texture Reconstruction from Single Image

Comparison with Prior Arts

BiTT achieves state-of-the-art two-hand reconstruction accuracy on InterHand2.6M and RGB2Hands. From a single image containing two interacting hands, BiTT reconstructs the full, personalized textures of both hands. Please also refer to the paper for quantitative comparisons with more baseline methods.


Rendering Two Hands with Novel Poses, Viewpoints, and Relighting

As our method estimates the texture UV maps of both hands, we can freely change the pose and viewpoint of the hands. Moreover, since we estimate the scene lighting and the hands' albedo, we can relight the hands under novel lighting conditions.
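To give a sense of how relighting from an estimated albedo and light works, here is a minimal sketch of single-light Phong shading with numpy. This is an illustration of the classic shading model, not the paper's implementation; the function name and parameter defaults are hypothetical.

```python
import numpy as np

def phong_shade(albedo, normals, light_dir, view_dir,
                light_color=1.0, ambient=0.2, k_spec=0.3, shininess=16.0):
    """Shade a per-pixel albedo map with one directional light (Phong model).

    albedo:  (H, W, 3) array in [0, 1]
    normals: (H, W, 3) unit surface normals
    light_dir, view_dir: 3-vectors pointing toward the light / camera
    """
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    n_dot_l = np.clip((normals * l).sum(-1, keepdims=True), 0.0, None)
    # Reflect the light direction about the surface normal for the specular term.
    r = 2.0 * n_dot_l * normals - l
    r_dot_v = np.clip((r * v).sum(-1, keepdims=True), 0.0, None)
    diffuse = albedo * n_dot_l
    specular = k_spec * r_dot_v ** shininess
    return np.clip(light_color * (ambient * albedo + diffuse + specular), 0.0, 1.0)
```

Changing `light_dir` or `light_color` relights the same albedo without re-estimating texture, which is what makes the albedo / lighting decomposition useful.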

Results on InterHand2.6M


Results on Re:InterHand


Shadow Rendering

Our method relies on mesh-based rendering, which makes it easily compatible with traditional computer graphics techniques. Since our method involves two hands, it faces significant occlusion challenges: each hand occludes the other, and each hand also occludes itself with respect to the light source. Despite these complexities, it effectively captures the shadows caused by both self-occlusion and inter-hand occlusion.
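The core of such mesh-based hard shadows can be sketched as an occlusion test: a surface point is in shadow if the ray from it toward the light hits any triangle of either hand's mesh. The sketch below uses the standard Möller–Trumbore ray/triangle intersection; it is a generic graphics illustration under our own assumptions, not the paper's renderer, and both function names are hypothetical.

```python
import numpy as np

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-8):
    """Möller–Trumbore ray/triangle test; True if the ray hits at t > eps."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:
        return False                      # ray is parallel to the triangle
    inv = 1.0 / det
    s = origin - v0
    u = (s @ p) * inv
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = (direction @ q) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    return (e2 @ q) * inv > eps           # hit lies in front of the origin

def in_shadow(point, light_dir, triangles):
    """Shadowed if any mesh triangle blocks the ray toward the light."""
    return any(ray_hits_triangle(point, light_dir, *tri) for tri in triangles)
```

Running this test against the union of both hand meshes yields both self-shadows and inter-hand shadows from the same mechanism.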


Model Architecture

Our method consists of three stages: (1) scene estimation, (2) a coarse stage, and (3) a fine stage. The scene estimation stage predicts the albedo image and lighting conditions from the given input image. Full detailed textures of both hands are then reconstructed from the single image: the coarse stage adopts the hand texture parametric model, and the fine stage refines the personalized hand textures through bi-directional texture reconstruction, exploiting the texture symmetry of the left and right hands. Finally, we render both hands with Phong illumination.
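The intuition behind the bi-directional step can be sketched in a few lines: texels unobserved on one hand are borrowed from the mirrored texel of the other hand. This toy version assumes the two UV layouts are horizontal mirrors of each other; the function name, mask convention, and the simple copy (in place of the paper's learned reconstructor) are our own illustrative assumptions.

```python
import numpy as np

def mirror_fill(left_tex, right_tex, left_mask, right_mask):
    """Fill unobserved texels of each hand's UV map from the other hand.

    left_tex, right_tex:   (H, W, 3) partial texture maps
    left_mask, right_mask: (H, W) boolean masks of observed texels
    Assumes the left/right UV layouts mirror each other along the U axis.
    """
    flip = lambda t: t[:, ::-1]
    left_out, right_out = left_tex.copy(), right_tex.copy()
    # Borrow the mirrored right-hand texel wherever the left one is unseen.
    fill_l = ~left_mask & flip(right_mask)
    left_out[fill_l] = flip(right_tex)[fill_l]
    # And symmetrically in the other direction.
    fill_r = ~right_mask & flip(left_mask)
    right_out[fill_r] = flip(left_tex)[fill_r]
    return left_out, right_out
```

Because the transfer runs in both directions, a single view of two interacting hands can cover much more of each hand's texture map than either hand alone.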


BibTeX

@InProceedings{kim2024bitt,
  author    = {Kim, Minje and Kim, Tae-Kyun},
  title     = {BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024}
}

Acknowledgements

This work was supported in part by an NST grant (CRC 21011, MSIT), a KOCCA grant (R2022020028, MCST), and an IITP grant (RS-2023-00228996, MSIT).