Table 2: Performance of our proposed MoE-VRD, with K = 2 active experts out of a total of N = 10, compared against state-of-the-art approaches on the ImageNet-VidVRD dataset. Relation detection is evaluated by mAP, R@50, and R@100; relation tagging by P@1, P@5, and P@10. Entries of the form a ± b report the mean and standard deviation over repeated runs, and a dash marks a result not reported by the original authors. A sketch of how the R@k and P@k metrics are computed follows the table.
| Approach | mAP | R@50 | R@100 | P@1 | P@5 | P@10 |
|---|---|---|---|---|---|---|
| VidVRD (Shang et al., 2017) | 8.58 | 5.54 | 6.37 | 43.00 | 28.90 | 20.80 |
| GSTEG (Tsai et al., 2019) | 9.52 | 7.05 | 7.67 | 51.50 | 39.50 | 28.23 |
| VRD-GCN (Qian et al., 2019) | 14.23 | 7.43 | 8.75 | 59.50 | 40.50 | 27.85 |
| 3DRN (Cao et al., 2021) | 14.68 | 5.53 | 6.39 | 57.89 | 41.80 | 29.15 |
| VRD-STGC (Liu et al., 2020) | 18.38 | 11.21 | 13.69 | 60.00 | 43.10 | 32.24 |
| SFE (Chen et al., 2021) | 20.08 | 13.73 | 16.88 | 62.50 | 49.20 | 38.45 |
| IVRD (Li et al., 2021) | 22.97 | 12.40 | 14.46 | 68.83 | 49.87 | 35.57 |
| BIG-C (Gao et al., 2022) | 26.08 | 14.10 | 16.25 | 73.00 | 55.10 | 40.00 |
| CKERN (Cao et al., 2022) | - | - | - | 74.50 | 55.59 | 41.34 |
| VidVRD-II (Shang et al., 2021) | 29.37 ± 0.40 | 19.63 ± 0.19 | 22.92 ± 0.48 | 70.40 ± 1.53 | 53.88 ± 0.31 | 40.16 ± 0.70 |
| MoE-VRD (K = 2, ours) | 33.02 ± 0.23 | 22.77 ± 0.28 | 24.20 ± 0.22 | 74.12 ± 1.44 | 56.47 ± 0.17 | 42.05 ± 0.92 |
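For concreteness, the sketch below shows how the R@k and P@k columns are typically computed, assuming the standard VidVRD evaluation protocol: each video yields a ranked list of (subject, predicate, object) triplet predictions, and a prediction counts as a hit when it matches a ground-truth relation (the detection setting additionally requires spatio-temporal overlap with ground-truth trajectories, elided here; mAP further averages precision over the ranked detection list). All function and variable names are illustrative and not taken from the MoE-VRD codebase.

```python
def precision_at_k(ranked_preds, gt_triplets, k):
    """P@k: fraction of the top-k tagging predictions that are correct."""
    top_k = ranked_preds[:k]
    hits = sum(1 for p in top_k if p in gt_triplets)
    return hits / k

def recall_at_k(ranked_preds, gt_triplets, k):
    """R@k: fraction of ground-truth relations recovered in the top-k.

    Simplified to distinct triplets; the full protocol matches individual
    ground-truth relation instances.
    """
    top_k = set(ranked_preds[:k])
    hits = sum(1 for g in gt_triplets if g in top_k)
    return hits / max(len(gt_triplets), 1)

# The table entries are these per-video quantities averaged over the test set.
preds = [("dog", "chase", "ball"), ("dog", "run_past", "person"),
         ("person", "throw", "ball")]
gt = {("dog", "chase", "ball"), ("person", "throw", "ball")}
print(precision_at_k(preds, gt, 1))   # 1.0
print(recall_at_k(preds, gt, 3))      # 1.0
```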
Table 3: Comparison with the state of the art, using the same metrics and conventions as Table 2, here on the VidOR dataset. A generic sketch of the top-K expert gating underlying MoE-VRD follows the table.
| Approach | mAP | R@50 | R@100 | P@1 | P@5 | P@10 |
|---|---|---|---|---|---|---|
| 3DRN (Cao et al., 2021) | 2.47 | 1.58 | 1.85 | 33.05 | 35.27 | - |
| VRD-STGC (Liu et al., 2020) | 6.85 | 8.21 | 9.90 | 48.92 | 36.78 | - |
| IVRD (Li et al., 2021) | 7.42 | 7.36 | 9.41 | 53.40 | 42.70 | - |
| CKERN (Cao et al., 2022) | - | - | - | 58.80 | 46.07 | 34.29 |
| BIG (Gao et al., 2022) | 8.54 | 8.03 | 10.04 | 64.42 | 51.80 | 40.96 |
| Ens-5 (Gao et al., 2021) | 9.48 | 8.56 | 10.43 | 63.46 | 54.07 | 41.94 |
| SFE (Chen et al., 2021) | 11.21 | 9.99 | 11.94 | 68.86 | 55.16 | - |
| VidVRD-II (Shang et al., 2021) | 8.65 ± 0.11 | 8.59 ± 0.11 | 10.69 ± 0.08 | 57.40 ± 0.57 | 44.54 ± 0.68 | 33.30 ± 0.31 |
| MoE-VRD (K = 2, ours) | 9.44 ± 0.21 | 9.54 ± 0.13 | 11.51 ± 0.31 | 58.92 ± 0.67 | 45.11 ± 0.19 | 34.85 ± 0.22 |
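The K = 2 in the model name refers to sparse top-K routing: for each input, a gating network scores all N = 10 experts, keeps the K highest-scoring ones, and mixes only their outputs. The following is a minimal sketch of this standard mixture-of-experts routing pattern in PyTorch, not the exact MoE-VRD implementation; all module names and dimensions (e.g. TopKGate, in_dim=256) are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKGate(nn.Module):
    """Generic sparse top-K gate over a pool of experts (illustrative)."""

    def __init__(self, in_dim: int, num_experts: int = 10, k: int = 2):
        super().__init__()
        self.k = k
        self.w_gate = nn.Linear(in_dim, num_experts, bias=False)

    def forward(self, x):
        logits = self.w_gate(x)                      # (batch, N) expert scores
        top_vals, top_idx = logits.topk(self.k, -1)  # keep the K largest
        weights = F.softmax(top_vals, dim=-1)        # renormalise over top-K
        return weights, top_idx                      # mixing weights + experts

# Only the K selected experts are evaluated; their outputs are combined with
# the gating weights, so inference cost grows with K rather than N.
gate = TopKGate(in_dim=256, num_experts=10, k=2)
x = torch.randn(4, 256)                              # e.g. relation features
weights, idx = gate(x)
print(weights.shape, idx.shape)                      # both torch.Size([4, 2])
```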