
Tversky index

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.

The Tversky index, named after Amos Tversky, is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Jaccard index.

For sets $X$ and $Y$, the Tversky index is a number between 0 and 1 given by

$$S(X,Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha |X \setminus Y| + \beta |Y \setminus X|}$$

Here, $X \setminus Y$ denotes the relative complement of $Y$ in $X$.
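The definition translates almost directly into code. The following is a minimal Python sketch, assuming finite sets of hashable elements and Python's built-in set operators (`&` for intersection, `-` for relative complement). The function name `tversky_index` and the choice to raise an error when the denominator is zero (rather than pick a convention for 0/0) are illustrative assumptions, not part of the definition.

```python
from __future__ import annotations

from collections.abc import Hashable, Set


def tversky_index(x: Set[Hashable], y: Set[Hashable], alpha: float, beta: float) -> float:
    """Return the Tversky index S(X, Y) for finite sets x and y.

    alpha weights |X \\ Y| (elements only in x) and beta weights
    |Y \\ X| (elements only in y); both must be non-negative.
    """
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X \ Y|, the relative complement of Y in X
    only_y = len(y - x)   # |Y \ X|, the relative complement of X in Y
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # Illustrative choice: report 0/0 as an error rather than pick a value.
        raise ValueError("Tversky index is undefined here (0/0)")
    return common / denominator


if __name__ == "__main__":
    prototype = {"a", "b", "c", "d"}
    variant = {"c", "d", "e"}
    # |X ∩ Y| = 2, |X \ Y| = 2, |Y \ X| = 1, so S = 2 / (2 + 1 + 0.5) = 4/7
    print(tversky_index(prototype, variant, alpha=0.5, beta=0.5))
```

Running the example prints approximately 0.5714, that is, 4/7.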

Further, $\alpha, \beta \geq 0$ are parameters of the Tversky index. Setting $\alpha = \beta = 1$ produces the Jaccard index; setting $\alpha = \beta = 0.5$ produces the Sørensen–Dice coefficient.
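The two special cases can be checked directly. With $\alpha = \beta = 1$ the denominator becomes $|X \cap Y| + |X \setminus Y| + |Y \setminus X| = |X \cup Y|$, and with $\alpha = \beta = 0.5$ it becomes $(|X| + |Y|)/2$. The Python sketch below uses example sets chosen only for illustration, and the helper names `jaccard` and `sorensen_dice` are hypothetical, written from the standard definitions. It uses `fractions.Fraction` so the comparisons are exact instead of being subject to floating-point rounding.

```python
from fractions import Fraction


def tversky_index(x, y, alpha, beta):
    """Tversky index S(X, Y); exact when alpha and beta are ints or Fractions."""
    common = len(x & y)
    return Fraction(common) / (common + alpha * len(x - y) + beta * len(y - x))


def jaccard(x, y):
    """Jaccard index |X ∩ Y| / |X ∪ Y|."""
    return Fraction(len(x & y), len(x | y))


def sorensen_dice(x, y):
    """Sørensen–Dice coefficient 2|X ∩ Y| / (|X| + |Y|)."""
    return Fraction(2 * len(x & y), len(x) + len(y))


x = {1, 2, 3, 4, 5}
y = {4, 5, 6}
half = Fraction(1, 2)

# alpha = beta = 1 reproduces the Jaccard index
assert tversky_index(x, y, 1, 1) == jaccard(x, y)
# alpha = beta = 0.5 reproduces the Sørensen–Dice coefficient
assert tversky_index(x, y, half, half) == sorensen_dice(x, y)
print(jaccard(x, y), sorensen_dice(x, y))  # prints: 1/3 1/2
```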

If we consider $X$ to be the prototype and $Y$ to be the variant, then $\alpha$ corresponds to the weight of the prototype and $\beta$ corresponds to the weight of the variant. Tversky measures with $\alpha + \beta = 1$ are of special interest.
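The asymmetry shows up as soon as the roles of prototype and variant are swapped while $\alpha \neq \beta$. In this Python sketch the feature sets and the weights $\alpha = 0.8$, $\beta = 0.2$ (so that $\alpha + \beta = 1$) are made-up values for illustration; exact fractions again avoid rounding noise.

```python
from fractions import Fraction


def tversky_index(x, y, alpha, beta):
    """Tversky index S(X, Y): x is the prototype, y the variant."""
    common = len(x & y)
    return Fraction(common) / (common + alpha * len(x - y) + beta * len(y - x))


# Hypothetical feature sets, chosen only to illustrate the asymmetry.
prototype = {"wings", "feathers", "beak", "flies", "sings"}
variant = {"wings", "feathers", "beak", "swims"}

alpha, beta = Fraction(4, 5), Fraction(1, 5)
assert alpha + beta == 1

forward = tversky_index(prototype, variant, alpha, beta)   # S(prototype, variant)
backward = tversky_index(variant, prototype, alpha, beta)  # roles swapped
print(forward, backward)  # prints: 5/8 5/7

# With alpha == beta the index is symmetric again.
half = Fraction(1, 2)
assert tversky_index(prototype, variant, half, half) == tversky_index(variant, prototype, half, half)
```

Here the prototype's two distinctive features are weighted by $\alpha = 0.8$, so the index with the prototype as $X$ (5/8) is lower than with the roles swapped (5/7).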

Because of this inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. If symmetry is needed, however, a variant of the original formulation has been proposed that uses the min and max functions:

$$S(X,Y) = \frac{|X \cap Y|}{|X \cap Y| + \beta\left(\alpha a + (1-\alpha) b\right)}$$

where $a = \min\left(|X \setminus Y|, |Y \setminus X|\right)$

and $b = \max\left(|X \setminus Y|, |Y \setminus X|\right)$.
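A direct transcription of the symmetric variant, again as a Python sketch with exact fractions; the function name `symmetric_tversky_index` and the example sets are hypothetical. Because $a$ and $b$ are a minimum and a maximum, swapping the two sets leaves them unchanged, which is where the symmetry comes from.

```python
from fractions import Fraction


def symmetric_tversky_index(x, y, alpha, beta):
    """Symmetric Tversky variant with a = min and b = max of the two
    relative complements |X \\ Y| and |Y \\ X|."""
    common = len(x & y)
    only_x, only_y = len(x - y), len(y - x)
    a = min(only_x, only_y)
    b = max(only_x, only_y)
    # Raises ZeroDivisionError if the denominator is 0 (e.g. two empty sets).
    return Fraction(common) / (common + beta * (alpha * a + (1 - alpha) * b))


x = {1, 2, 3, 4, 5}
y = {4, 5, 6}
alpha, beta = Fraction(1, 4), Fraction(1)

s_xy = symmetric_tversky_index(x, y, alpha, beta)
s_yx = symmetric_tversky_index(y, x, alpha, beta)
assert s_xy == s_yx  # a and b do not depend on argument order
print(s_xy)  # prints: 4/9
```

In the example, $a = 1$ and $b = 3$, so the denominator is $2 + 1 \cdot (\tfrac{1}{4} + \tfrac{3}{4} \cdot 3) = \tfrac{9}{2}$ and the index is $4/9$ in either argument order.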

This formulation also rearranges the roles of the parameters $\alpha$ and $\beta$. Here, $\alpha$ controls the balance between $|X \setminus Y|$ and $|Y \setminus X|$ in the denominator, through their minimum $a$ and maximum $b$. Similarly, $\beta$ controls the weight of the symmetric difference $|X \,\triangle\, Y| = a + b$ relative to $|X \cap Y|$ in the denominator.
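As a quick consistency check derived from the formulas above, setting $\alpha = 0.5$ weights the two relative complements equally, and two particular values of $\beta$ then recover the symmetric coefficients mentioned earlier:

```latex
\begin{aligned}
a + b &= |X \setminus Y| + |Y \setminus X| = |X \,\triangle\, Y| \\
\alpha = \tfrac{1}{2} &\implies \alpha a + (1-\alpha) b = \tfrac{1}{2}\,|X \,\triangle\, Y| \\
\beta = 1 &\implies S(X,Y) = \frac{|X \cap Y|}{|X \cap Y| + \tfrac{1}{2}|X \,\triangle\, Y|}
  = \frac{2\,|X \cap Y|}{|X| + |Y|} \quad \text{(Sørensen–Dice)} \\
\beta = 2 &\implies S(X,Y) = \frac{|X \cap Y|}{|X \cap Y| + |X \,\triangle\, Y|}
  = \frac{|X \cap Y|}{|X \cup Y|} \quad \text{(Jaccard)}
\end{aligned}
```

The last two steps use $|X| + |Y| = 2|X \cap Y| + |X \,\triangle\, Y|$ and $|X \cup Y| = |X \cap Y| + |X \,\triangle\, Y|$.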

Notes

  1. Tversky, Amos (1977). "Features of Similarity". Psychological Review. 84 (4): 327–352. doi:10.1037/0033-295x.84.4.327.
  2. "Daylight Theory: Fingerprints".
  3. Jimenez, S.; Becerra, C.; Gelbukh, A. (2013). "SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity". Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. Atlanta, Georgia, USA, June 7–8, 2013. pp. 194–201.