Creating personalized hand avatars is important to offer a realistic experience to users on AR / VR platforms. While most prior studies focused on reconstructing 3D hand shapes, some recent work has tackled the reconstruction of hand textures on top of shapes. However, these methods are often limited to capturing pixels on the visible side of a hand, requiring diverse views of the hand in a video or multiple images as input. In this paper, we propose a novel method, BiTT (Bi-directional Texture reconstruction of Two hands), which is the first end-to-end trainable method for relightable, pose-free texture reconstruction of two interacting hands taking only a single RGB image, by three novel components: 1) bi-directional (left ↔ right) texture reconstruction using the texture symmetry of left / right hands, 2) utilizing a texture parametric model for hand texture recovery, and 3) the overall coarse-to-fine stage pipeline for reconstructing personalized texture of two interacting hands. BiTT first estimates the scene light condition and albedo image from an input image, then reconstructs the texture of both hands through the texture parametric model and bi-directional texture reconstructor. In experiments using InterHand2.6M and RGB2Hands datasets, our method significantly outperforms state-of-the-art hand texture reconstruction methods quantitatively and qualitatively.
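The core of component 1 can be illustrated with a minimal sketch. Assuming the two hands share horizontally mirrored UV layouts (as in symmetric hand templates), a first estimate of the occluded hand's texture is the horizontal flip of the visible hand's UV map; BiTT's learned reconstructor then refines this estimate. The function name `mirror_texture` is illustrative, not from the paper:

```python
import numpy as np

def mirror_texture(uv_map):
    """Map a left-hand UV texture to a right-hand initial estimate
    (or vice versa) by flipping along the horizontal axis.

    uv_map: (H, W, 3) texture image. Assumes the two hands use
    horizontally mirrored UV layouts; the learned bi-directional
    reconstructor refines this coarse, symmetry-based guess.
    """
    return np.flip(uv_map, axis=1).copy()
```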
BiTT achieves state-of-the-art two-hand texture reconstruction accuracy on InterHand2.6M and RGB2Hands. From a single RGB image containing two interacting hands, BiTT can reconstruct the full, relightable textures of both hands. Please also refer to the paper for quantitative comparisons with more baseline methods.
Since our method estimates the texture UV maps of both hands, we can freely change the pose and viewpoint of the hands. In addition, by estimating the scene lighting and the hands' albedo, we can relight the hands under novel lighting conditions.
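Relighting from a factored albedo and lighting estimate can be sketched as follows, assuming a simple directional light and Lambertian shading (the actual model uses the full illumination pipeline described below; `relight` and its signature are illustrative):

```python
import numpy as np

def relight(albedo, normals, light_dir, ambient=0.2):
    """Re-render per-pixel colors under a new directional light.

    albedo:    (H, W, 3) per-pixel reflectance in [0, 1]
    normals:   (H, W, 3) unit surface normals
    light_dir: (3,) direction toward the light
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    # Lambertian diffuse term, clamped at zero for back-facing pixels
    diffuse = np.clip(normals @ l, 0.0, None)[..., None]
    return np.clip(albedo * (ambient + diffuse), 0.0, 1.0)
```

Because albedo and lighting are estimated separately, swapping `light_dir` (or the full lighting estimate) re-renders the same hands under a new illumination without re-estimating texture.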
Our method relies on mesh-based rendering, which makes it easily compatible with traditional computer graphics concepts. Because our method involves two hands, it faces significant occlusion challenges: each hand occludes the other, and each hand also self-occludes with respect to the light source. Despite these complexities, it effectively captures the shadows caused by self-occlusion and inter-hand occlusion.
Our method consists of three steps: (1) scene estimation, (2) coarse-stage estimation, and (3) fine-stage estimation. Scene estimation interprets the input image by predicting the albedo image and the lighting conditions. Full, detailed textures of both hands are then reconstructed from this single image input: the coarse stage adopts the hand texture parametric model, and the fine stage's bi-directional texture reconstructor refines the personalized hand textures using the texture symmetry of the left and right hands. Finally, we render both hands with Phong illumination.
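The final rendering step uses the standard Phong illumination model (ambient + diffuse + specular). A minimal per-pixel sketch, with illustrative coefficient values (`ka`, `kd`, `ks`, `shininess` are assumptions, not the paper's settings):

```python
import numpy as np

def phong_shade(albedo, normals, view_dir, light_dir,
                ka=0.2, kd=0.7, ks=0.3, shininess=16.0):
    """Per-pixel Phong illumination: ambient + diffuse + specular.

    albedo:  (H, W, 3) reflectance, normals: (H, W, 3) unit normals,
    view_dir / light_dir: (3,) unit directions toward camera / light.
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    v = np.asarray(view_dir, dtype=float)
    v = v / np.linalg.norm(v)
    n_dot_l = np.clip(normals @ l, 0.0, None)
    # Reflect the light direction about the normal: r = 2(n.l)n - l
    r = 2.0 * n_dot_l[..., None] * normals - l
    # Specular highlight, zeroed where the surface faces away from the light
    spec = np.clip(r @ v, 0.0, None) ** shininess * (n_dot_l > 0)
    color = albedo * (ka + kd * n_dot_l[..., None]) + ks * spec[..., None]
    return np.clip(color, 0.0, 1.0)
```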
@InProceedings{kim2024bitt,
author = {Kim, Minje and Kim, Tae-Kyun},
title = {BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024}
}
This work was in part supported by the NST grant (CRC 21011, MSIT), the KOCCA grant (R2022020028, MCST), and the IITP grant (RS-2023-00228996, MSIT).