Table 2: DAVIS Jaccard Mean, Recall, and Decay for the YouTube-VOS and DAVIS Datasets.
All birds are detected and tracked by the LWL tracker, and the image is cropped to the bounding
box that the tracker predicts for each bird. Each cropped patch is then passed to a pretrained
mesh reconstruction model, which predicts the camera pose by solving the perspective-n-point
(PnP) problem from keypoint correspondences.
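
The crop and pose steps can be illustrated with a minimal sketch. It is not the implementation used here: the function names crop_to_box and solve_pnp_dlt are hypothetical, the intrinsic matrix K is assumed to be known, and a plain direct linear transform (DLT) stands in for whichever PnP solver the pretrained model relies on. The sketch assumes at least six non-coplanar 3D keypoints with matching 2D image locations.

```python
import numpy as np


def crop_to_box(image, box):
    """Crop an H x W (x C) array to an (x0, y0, x1, y1) box, clamped to the image."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = (int(round(float(v))) for v in box)
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    return image[y0:y1, x0:x1]


def solve_pnp_dlt(points_3d, points_2d, K):
    """Estimate R, t such that x ~ K (R X + t), using a DLT (illustrative only).

    points_3d: (n, 3) keypoints in the object frame, n >= 6 and non-coplanar.
    points_2d: (n, 2) matching pixel coordinates.
    K:         (3, 3) camera intrinsics, assumed known.
    """
    X = np.asarray(points_3d, dtype=float)
    x = np.asarray(points_2d, dtype=float)
    n = X.shape[0]

    # Remove the intrinsics so the unknown 3x4 matrix is a scaled [R | t].
    xn = (np.linalg.inv(K) @ np.hstack([x, np.ones((n, 1))]).T).T
    xn = xn[:, :2] / xn[:, 2:3]
    Xh = np.hstack([X, np.ones((n, 1))])

    # Each correspondence contributes two linear equations in the 12 entries of P.
    A = np.zeros((2 * n, 12))
    for i in range(n):
        u, v = xn[i]
        A[2 * i, 0:4] = Xh[i]
        A[2 * i, 8:12] = -u * Xh[i]
        A[2 * i + 1, 4:8] = Xh[i]
        A[2 * i + 1, 8:12] = -v * Xh[i]

    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)

    # Choose the sign that places the keypoints in front of the camera.
    if np.mean(Xh @ P[2]) < 0:
        P = -P

    # Project the left 3x3 block onto the nearest rotation and recover the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    t = P[:, 3] / S.mean()
    return R, t
```

With noise-free correspondences the DLT recovers the pose up to numerical precision. With noisy keypoints it is usually treated only as an initial estimate that is then refined by minimizing reprojection error.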