Tversky index

The Tversky index, named after Amos Tversky,[1] is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Jaccard index.

For sets X and Y the Tversky index is a number between 0 and 1 given by

$$S(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha |X \setminus Y| + \beta |Y \setminus X|}$$

Here, X ∖ Y denotes the relative complement of Y in X.

Further, α, β ≥ 0 are parameters of the Tversky index. Setting α=β=1 produces the Jaccard index; setting α=β=0.5 produces the Sørensen–Dice coefficient.
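The definition above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `tversky_index` is hypothetical, and returning 0.0 when the denominator is zero is an assumption, since the definition does not cover that case.

```python
def tversky_index(x, y, alpha, beta):
    """Tversky index of x (prototype) and y (variant) for weights alpha, beta >= 0."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    x, y = set(x), set(y)
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X ∖ Y|
    only_y = len(y - x)   # |Y ∖ X|
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # Assumption: the 0/0 case is not defined in the text; this sketch returns 0.0.
        return 0.0
    return common / denominator


X = {1, 2, 3, 4}
Y = {3, 4, 5}

jaccard = tversky_index(X, Y, 1, 1)      # 2 / (2 + 2 + 1) = 2/5
dice = tversky_index(X, Y, 0.5, 0.5)     # 2 / (2 + 1 + 0.5) = 4/7

# With unequal weights the result depends on the argument order.
forward = tversky_index(X, Y, 1, 0)      # 2 / (2 + 2) = 1/2
backward = tversky_index(Y, X, 1, 0)     # 2 / (2 + 1) = 2/3
```

For these example sets, α=β=1 matches the Jaccard index |X ∩ Y| / |X ∪ Y| = 2/5, and α=β=0.5 matches the Sørensen–Dice coefficient 2|X ∩ Y| / (|X| + |Y|) = 4/7.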

If we consider X to be the prototype and Y to be the variant, then α corresponds to the weight of the prototype and β corresponds to the weight of the variant. Tversky measures with α+β=1 are of special interest.[2]
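One way to see what the constraint α+β=1 does is to work it through the denominator. The following short derivation is an illustration and is not taken from the cited source; it uses only |X| = |X ∩ Y| + |X ∖ Y| and |Y| = |X ∩ Y| + |Y ∖ X|.

$$|X \cap Y| + \alpha |X \setminus Y| + \beta |Y \setminus X| = \alpha \left(|X \cap Y| + |X \setminus Y|\right) + \beta \left(|X \cap Y| + |Y \setminus X|\right) = \alpha |X| + \beta |Y|$$

$$S(X, Y) = \frac{|X \cap Y|}{\alpha |X| + \beta |Y|} \qquad (\alpha + \beta = 1)$$

In this case the index compares the size of the intersection with a weighted average of the sizes of the two sets. For example, α=1, β=0 gives |X ∩ Y| / |X|, the fraction of the prototype that is shared with the variant.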

Because of the inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. However, if symmetry is needed, a variant of the original formulation that uses max and min functions has been proposed:[3]

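$$S(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \beta \left(\alpha a + (1 - \alpha) b\right)}$$

$$a = \min\left(|X \setminus Y|, |Y \setminus X|\right), \qquad b = \max\left(|X \setminus Y|, |Y \setminus X|\right)$$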

This formulation also changes the roles of the parameters α and β. Here, α controls the balance between |X ∖ Y| and |Y ∖ X| in the denominator, while β controls the weight of the symmetric difference |X △ Y| relative to the intersection |X ∩ Y| in the denominator.
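The symmetric variant can be sketched in the same way. The function name `symmetric_tversky_index`, the parameter check (0 ≤ α ≤ 1 and β ≥ 0, chosen so that the denominator cannot become negative), and the zero-denominator handling are assumptions of this sketch, not part of the cited formulation.

```python
def symmetric_tversky_index(x, y, alpha, beta):
    """Symmetric Tversky variant built from the min and max of the two set differences."""
    # Assumption: these ranges are not stated in the text; they keep the denominator >= 0.
    if not 0 <= alpha <= 1 or beta < 0:
        raise ValueError("expected 0 <= alpha <= 1 and beta >= 0")
    x, y = set(x), set(y)
    common = len(x & y)                    # |X ∩ Y|
    a = min(len(x - y), len(y - x))        # smaller set difference
    b = max(len(x - y), len(y - x))        # larger set difference
    denominator = common + beta * (alpha * a + (1 - alpha) * b)
    if denominator == 0:
        # Assumption: the 0/0 case is not defined in the text; this sketch returns 0.0.
        return 0.0
    return common / denominator


X = {1, 2, 3, 4}
Y = {3, 4, 5}

# Swapping the arguments leaves a and b unchanged, so the value is the same.
s_xy = symmetric_tversky_index(X, Y, 0.3, 1.0)
s_yx = symmetric_tversky_index(Y, X, 0.3, 1.0)
assert s_xy == s_yx
```

Because a + b = |X △ Y|, setting α=0.5 makes the denominator |X ∩ Y| + (β/2)|X △ Y|. In this sketch, α=0.5 with β=1 therefore gives the Sørensen–Dice value (4/7 for the sets above) and α=0.5 with β=2 gives the Jaccard value (2/5).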

Notes

  1. Template:Cite journal
  2. Template:Cite web
  3. Jimenez, S., Becerra, C., Gelbukh, A. SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pp. 194–201, June 7–8, 2013, Atlanta, Georgia, USA.