Table 2: DAVIS Jaccard mean, recall, and decay on the YouTube-VOS and DAVIS datasets. Each bird is detected and tracked by the LWL tracker, and the image is cropped to the bounding box that the tracker predicts for that bird. The cropped patch is processed by a pretrained mesh reconstruction model for camera pose prediction, which solves the perspective-n-point (PnP) problem using keypoint correspondences.
Method                               Jaccard-Mean ↑   Jaccard-Recall ↑   Jaccard-Decay ↓
Quaternion Goel et al. (2020)        0.45             0.48               0.020
Gram-Schmidt Zhou et al. (2020)      0.40             0.27               0.007
SVD Levinson et al. (2020)           0.44             0.43               0.028
Keypoint (ours)                      0.45             0.53               0.018
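
For readers unfamiliar with the three columns above, the following is a minimal sketch of how DAVIS-style Jaccard statistics are commonly computed from per-frame mask overlaps. It is not the evaluation code behind Table 2. The function names `jaccard` and `jaccard_statistics`, the 0.5 recall threshold, and the simplified quartile split used for decay are assumptions made for illustration; the official DAVIS toolkit bins frames slightly differently.

```python
from statistics import mean


def jaccard(pred, gt):
    """Intersection over union of two binary masks given as nested lists.

    Assumption: if both masks are empty, the overlap is defined as 1.0.
    """
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return 1.0 if union == 0 else inter / union


def jaccard_statistics(per_frame, threshold=0.5):
    """Summarise per-frame Jaccard values for one tracked sequence.

    mean   : average overlap over all frames (higher is better)
    recall : fraction of frames whose overlap exceeds `threshold`
    decay  : mean of the first quarter of frames minus mean of the
             last quarter (lower means quality holds up over time)
    """
    if not per_frame:
        raise ValueError("per_frame must contain at least one value")
    quarter = max(1, len(per_frame) // 4)  # simplified quartile size
    return {
        "mean": mean(per_frame),
        "recall": sum(v > threshold for v in per_frame) / len(per_frame),
        "decay": mean(per_frame[:quarter]) - mean(per_frame[-quarter:]),
    }


# Hypothetical per-frame overlaps for a sequence whose quality degrades.
example = jaccard_statistics([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
```

Under these assumptions, a method with a high mean but a large decay tracks well early in a sequence and drifts later, which is why the table reports all three statistics rather than the mean alone.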