Mahalanobis distance

Contents

The main point of using the Mahalanobis metric

1. Terms and definitions

2. Mahalanobis distance between two points and between a point and a class

2.1. Theoretical information

2.2. Algorithm for computing the distance between two points and between a point and a class

2.3. Example of computing the distance between two points and between a point and a class

3. Mahalanobis distance between two classes

3.1. Theoretical information

3.2. Algorithm for computing the distance between two classes

3.3. Example of computing the distance between two classes

4. Mahalanobis distance and the k-nearest-neighbors method

5. Weighted Mahalanobis distance

6. Conclusion





If you find errors or have comments, please write to quwarm@gmail.com or in the comments.






The main point of using the Mahalanobis distance

In Figure 1, two observations are shown as red points.

The center of the class is shown as a blue point.





Figure 1. 2D data with prediction ellipses

The question is: which observation is closer to the center of the class?

The answer depends on how the distance is measured.





If we measure the distance with the Euclidean metric, we find that the distance from the class center (0, 0) to the point (-4, 4) equals \sqrt{32}, while the distance to the point (5, 5) equals \sqrt{50}; that is, the point (-4, 4) is closer to the class center.





However, the features X and Y are correlated: the point (-4, 4) lies "across" the direction of the scatter (outside the prediction ellipses), while the point (5, 5) lies along it.





If the correlation is taken into account, the point (5, 5) turns out to be closer to the class center than (-4, 4). Under such a measure the distance from the center (0, 0) to (-4, 4) is 0.15686, while the distance to (5, 5) is 0.07519, so (5, 5) is the closer point. This measure is the Mahalanobis distance.





The rest of the article examines this distance in detail.






1. Terms and definitions

A feature space is the set \mathbb{R}^n, where n is the number of features.





C denotes a class, a set of objects: C = \{X_1, \ldots, X_m\}, where m is the number of objects in the class C.





An object X is described by n features: X = (x_1, \ldots, x_n).





All objects are described by the same n features; x_i denotes the value of the i-th feature.





A "point" here is an n-dimensional vector; the terms object and point are used interchangeably below.





Transposition: {\square}^T denotes the transpose of a matrix or vector.





The i-th feature of an object X from class k is denoted X_{(k)i}.





Vectors are treated as row vectors (1 \times n matrices).





2. Mahalanobis distance between two points and between a point and a class

This section covers the distance between two points (objects) and between a point and a class (its center).





2.1. Theoretical information

The Mahalanobis distance between two points U and V of a class (sample) C with covariance matrix COV is defined as follows:





d_M(U, V, COV^{-1}) = \sqrt {(U - V) \cdot COV^{-1} \cdot (U - V)^T}

where {\square}^T is transposition and COV^{-1} is the inverse of the covariance matrix of the class.





If the covariance matrix is diagonal, the Mahalanobis distance reduces to the standardized Euclidean distance (considered below).

If the covariance matrix is the identity matrix (ones on the diagonal, zeros elsewhere), the Mahalanobis distance coincides with the ordinary Euclidean distance.
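A quick numerical check of the identity-matrix case (a minimal sketch; the points are the ones from Figure 1):

import numpy as np

u = np.array([0., 0.])
v = np.array([-4., 4.])
delta = u - v

# Mahalanobis distance with COV = E (the identity matrix)
d_mahalanobis = np.sqrt(delta @ np.linalg.inv(np.identity(2)) @ delta)
# Ordinary Euclidean distance
d_euclidean = np.linalg.norm(delta)

print(d_mahalanobis, d_euclidean)  # both are sqrt(32), about 5.6569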





With a positive-definite covariance matrix, the Mahalanobis distance is a full-fledged metric.





That is (see the proof [internet archive]), for any points U, V, W and a fixed matrix COV the distance d_M satisfies:

1. Identity of indiscernibles: d_M(U, V, COV^{-1}) = 0 \iff U = V;

2. Symmetry: d_M(U, V, COV^{-1}) = d_M(V, U, COV^{-1});

3. Triangle inequality: d_M(U, W, COV^{-1}) \le d_M(U, V, COV^{-1}) + d_M(V, W, COV^{-1}).

As a consequence, non-negativity also holds: d_M(U, V, COV^{-1}) \ge 0.





In practice, because of floating-point rounding errors, the expression under the square root can come out slightly negative (for example, \mathrm {-1e^{-17}} \approx 0); sqrt (or raising to the power 0.5) then returns NaN. To avoid this, the value is clamped to zero before the root is taken, for example with max(0.0, value). This is done in all the code examples below.
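A minimal sketch of such a guarded computation (the same clamping with max is used in the code examples below):

import numpy as np

def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    # The quadratic form can come out as a tiny negative number (e.g. -1e-17)
    # due to rounding errors; clamp it to 0 before taking the square root.
    squared = np.dot(np.dot(delta, inverse_covariance_matrix), delta)
    return max(np.float64(0), squared) ** 0.5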





The distance from a point to a class is usually understood as the distance from the point to the class center (centroid).





In this case the covariance matrix is the covariance matrix of that class, and the second argument of the formula is the class centroid (the mean vector; see below).





The distance to a class does not have to be measured through its center, however.

Some methods (for example, k nearest neighbors, section 4) instead compute the distances from the point to the individual objects of the class.





In what follows we will need the centroid* of a class and its covariance matrix.





* The centroid of a class is computed as the component-wise arithmetic mean of its objects:





\mu_i = \frac {1} {|C|} \sum_{X \in C} {X_i}

where \mu_i is the mean of the i-th feature over the class C, |C| is the number of objects in C, and X_i is the i-th feature of the object X.





The whole mean vector (centroid) \mu of the class C:





\mu = ( \mu_1, \ldots, \mu_n ) = \left ( \frac {1} {|C|} \sum_{X \in C} {X_i} \middle| i=1 \ldots n \right )

The centroid is itself an n-dimensional vector; it need not coincide with any object of the class.

Next, the covariance matrix of the class is needed.





For a class described by n features, the covariance matrix has size n \times n:





COV= \begin{pmatrix} cov_{1,1} & cov_{1,2} & \cdots & cov_{1,n} \\ cov_{2,1} & cov_{2,2} & \cdots & cov_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ cov_{n,1} & cov_{n,2} & \cdots & cov_{n,n} \end{pmatrix}

Each element is the covariance of a pair of features; the unbiased (corrected) estimate, the so-called "sample covariance", is:





cov_{a,b} = \frac {1} {|C|-1} \sum_{X \in C} {(X_a - \mu_a) \cdot (X_b - \mu_b)} \tag {SC}

where \mu_a and \mu_b are the means of features a and b over the class, and |C| is the number of objects in the class C.





Formula \mathrm {(SC)} is used when the true expectations \operatorname E_a and \operatorname E_b are unknown and the sample means are used instead. If the expectations are known, the biased estimate, the so-called "population covariance", is used:





cov_{a,b} = \frac {1} {|C|} \sum_{X \in C} {(X_a - \operatorname E_a) \cdot (X_b - \operatorname E_b)} \tag {PC}
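Formulas (SC) and (PC) correspond to what numpy.cov computes with ddof=1 and ddof=0 respectively (using the sample mean in place of the expectation); a minimal check on the data of the second class from section 2.3:

import numpy as np

C = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
mu = C.mean(axis=0)

# (SC): unbiased "sample covariance", divisor |C| - 1
cov_sc = (C - mu).T @ (C - mu) / (len(C) - 1)
# (PC): biased "population covariance", divisor |C|
cov_pc = (C - mu).T @ (C - mu) / len(C)

print(np.allclose(cov_sc, np.cov(C, rowvar=False, ddof=1)))  # True
print(np.allclose(cov_pc, np.cov(C, rowvar=False, ddof=0)))  # True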

Some properties of covariance:





  1. If features a and b tend to increase (and decrease) together, then cov_{a,b}>0;





  2. If feature b tends to decrease as feature a increases (and vice versa), then cov_{a,b}<0;





  3. If features a and b are independent, then cov_{a,b}=0 (the converse is not true*);





  4. Covariance is symmetric: cov_{a,b} = cov_{b,a}.





  5. The absolute value of the covariance does not exceed the product of the standard deviations: \left|{cov_{a,b}}\right| \leq \sigma_a \sigma_b.





* An example of dependent features with zero covariance: let X be uniformly distributed on [-1, 1] and let Y=X^2.

Then Y is completely determined by X, yet their covariance is zero:





\operatorname {cov}(X, Y) = \operatorname {cov}(X, X^2) = M[X \cdot X^2] - M[X] \cdot M[X^2] = \\ = M[X^3] - M[X] \cdot M[X^2] =0-0 \cdot M[X^2] = 0

where M denotes the mathematical expectation.
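The same can be checked numerically (a sketch; a symmetric grid of X values on [-1, 1] is used instead of a random sample, so the sample means of X and X^3 are zero):

import numpy as np

x = np.linspace(-1.0, 1.0, 1001)  # symmetric around 0
y = x ** 2                        # Y is completely determined by X

# Population covariance: M[X * Y] - M[X] * M[Y]
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)  # ~0 (up to floating-point error), although Y depends on X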










Figure 2. The dependence Y = X^2: the features are dependent, but their covariance is zero

For the Mahalanobis distance to be computable, the matrix COV must be invertible (non-singular); if its determinant is zero, the inverse COV^{-1} does not exist.





The covariance matrix is singular, in particular, in the following cases:

1. Some feature i has the same value for all objects of the class (its variance is zero).

Example: \{ (1, 1), (2, 1), (3, 1) \}.

2. The features are perfectly correlated (\forall a \forall b \space cov_{a,b}=\sigma_a \sigma_b, "perfect covariance"). Examples:

\{(1, 1), (2, 2), (3, 3)\}: perfect positive correlation;

\{(1, 3), (2, 2), (3, 1)\}: perfect negative correlation.

3. The number of objects in the class is less than the number of features plus one:

|C|<n+1

In this case the covariance matrix is always singular. (All three cases are checked numerically in the sketch below.)
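A minimal numpy check of these cases through the determinant of the covariance matrix (the sample sets are the ones listed above, plus a two-object class for case 3):

import numpy as np

degenerate_classes = [
    np.array([[1., 1.], [2., 1.], [3., 1.]]),  # case 1: a constant feature
    np.array([[1., 1.], [2., 2.], [3., 3.]]),  # case 2: perfect positive correlation
    np.array([[1., 3.], [2., 2.], [3., 1.]]),  # case 2: perfect negative correlation
    np.array([[1., 2.], [3., 4.]]),            # case 3: |C| = 2 < n + 1 = 3
]

for class_ in degenerate_classes:
    cov = np.cov(class_, rowvar=False, ddof=1)
    print(np.linalg.det(cov))  # 0 (up to rounding) in every case: COV is singular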





What can be done when the covariance matrix is singular?

There is no universal answer.

Which approach to take depends on the data and the problem.

The options include:





1. Modify the data:
  1. Remove (or transform) the features i that are constant within the class.





  2. Remove one of each pair of perfectly correlated features i, or add more objects to the class.





2. Use the "Euclidean-Mahalanobis" distance, a combination of the Euclidean and Mahalanobis distances in which the identity matrix is added to the covariance matrix (making it invertible):





d_{E-M}(U, V, (COV+E)^{-1}) = \sqrt {(U - V) \cdot {(COV+E)}^{-1} \cdot (U - V)^T}

where E is the identity matrix of the same size as COV.





The matrix COV + E is always invertible, so this distance is always defined.





3. Use the pseudo-inverse matrix instead of the inverse.

Here {\square}^{+} denotes the (Moore-Penrose) pseudo-inverse.

Functions that compute it:

ginv from the MASS package (R);

pinv from numpy (Python);

pinv in MATLAB;

pinv in Octave.





Informally, the pseudo-inverse A^{+} gives a "best fit" (least-squares) solution of a system of linear equations: Ax=b \implies x=A^{+}b, where A is a matrix and x, b are vectors; among all candidates it yields the x that minimizes the norm of the residual between Ax and b.

The pseudo-inverse always exists and is unique.

If the ordinary inverse A^{-1} exists (that is, A is square and non-singular), the pseudo-inverse coincides with it: \mathrm {det}(A_{n \times n}) \ne 0 \iff A^{+}=A^{-1}.
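A small illustration with numpy (a sketch; the singular matrix is the covariance matrix of the first class from the example in section 2.3):

import numpy as np

singular = np.array([[2.5, 2.5], [2.5, 2.5]])  # det = 0, no ordinary inverse
regular = np.array([[1.7, 0.3], [0.3, 1.7]])   # det = 2.8 != 0

# The pseudo-inverse exists even for the singular matrix
print(np.linalg.pinv(singular))  # [[0.1 0.1] [0.1 0.1]]
# For an invertible matrix the pseudo-inverse coincides with the ordinary inverse
print(np.allclose(np.linalg.pinv(regular), np.linalg.inv(regular)))  # True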





The distance with the pseudo-inverse covariance matrix:





d_M^+ (U, V, COV^{+}) = \sqrt {(U - V) \cdot COV^{+} \cdot (U - V)^T}

Note a caveat: with a singular covariance matrix the pseudo-inverse effectively ignores the directions of zero variance, so two distinct points can end up at zero distance from each other (this happens in the example of section 2.3).





Whether such behavior is acceptable depends on the problem (in some tasks it may even be desirable).





4. Apply shrinkage to the covariance matrix.

Shrinkage is a regularization of the covariance matrix that makes it well-conditioned (and invertible).

The matrix COV is replaced by COV_{(*)} = \left ((1 - \lambda) COV + \lambda T \right), where T is a target matrix and \lambda \in (0,1] is the shrinkage coefficient; the larger \lambda, the closer COV_{(*)} is to the target T.

The distance is then computed with the shrunk matrix:





d_{M{(*)}}(U, V, COV^{-1}_{(*)}) = \sqrt {(U - V) \cdot COV^{-1}_{(*)} \cdot (U - V)^T}

Olivier Ledoit and Michael Wolf proposed the estimator ((1 - \lambda) COV + \lambda \mu E), where \mu=\mathrm{trace}(COV)/n is the average of the diagonal elements of COV (the average variance), E is the identity matrix, and \lambda is computed from the data by a formula they derived.

This estimator is implemented, in particular, in the Python library scikit-learn (sklearn.covariance.LedoitWolf, sklearn.covariance.ledoit_wolf, sklearn.covariance.ledoit_wolf_shrinkage).





With \lambda \in (0,1] and a positive-definite target T, the shrunk matrix COV_{(*)} is positive definite and therefore invertible (these conditions, together with a check on the smallest eigenvalue of COV_{(*)}, are verified in the code below).

Note that \lambda must not be zero: with \lambda = 0 there is no shrinkage, and a singular COV stays singular.





For example, take the class C=\{ (1, 1), (2, 2) \} and \lambda=0 (no shrinkage). Its covariance matrix is singular, but a numerical "inverse" may still come out of the computation because of rounding, and the resulting values are unstable (computed in Python):

distance from (1,1) to (1.5,1.5): \approx 0.7071;

distance from (2,2) to (1.5,1.5): \approx 0.7071;

distance from (1,1) to (1.51,1.5): \approx 671088.64 \ldots {63} \ldots;

distance from (2,2) to (1.51,1.5): \approx 671088.64 \ldots 04 \ldots.

Another common choice of the target matrix is the diagonal of COV:

T=\mathrm {diag}(COV) \implies COV_{(*)}= ((1 - \lambda) COV + \lambda \mathrm {diag}(COV))

where \mathrm {diag}(COV) is the diagonal matrix made of the diagonal elements of COV.





scikit-learn also provides Shrunk Covariance (sklearn.covariance.ShrunkCovariance, sklearn.covariance.shrunk_covariance), where \lambda is specified by the user (the default is \lambda_{SC}=0.1).

It uses the same form of target as Ledoit-Wolf: ((1 - \lambda) COV + \lambda \mu E).
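A minimal sketch of shrinkage by this formula (the value \lambda = 0.1 mirrors the scikit-learn default; the data is the first class from section 2.3, whose covariance matrix is singular):

import numpy as np

class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
cov = np.cov(class_, rowvar=False, ddof=0)  # population covariance, as in scikit-learn
lam = 0.1

# Target T = mu * E with mu = trace(COV) / n, as in Ledoit-Wolf / ShrunkCovariance
mu = np.trace(cov) / cov.shape[0]
cov_shrunk = (1 - lam) * cov + lam * mu * np.identity(cov.shape[0])

print(np.linalg.det(cov))         # 0.0: the original matrix is singular
print(np.linalg.det(cov_shrunk))  # 0.76: the shrunk matrix is invertible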





Other choices of the target matrix are possible as well.





Note that LedoitWolf and ShrunkCovariance estimate the covariance (by default) with empirical_covariance, that is, with the biased estimate (see "population covariance", formula \mathrm {(PC)}).





5. Use the standardized Euclidean distance.

If the covariances are ignored and only the variances of the features are used, we get the "standardized (normalized) Euclidean distance":





d_{std}(U, V, \sigma) = \sqrt {\sum_{i=1}^n {\left (\frac {U_i - V_i} {\sigma_i} \right)^2}}

where \sigma_i is the standard deviation of the i-th feature over the class containing U and/or V (the corrected sample standard deviation):





\sigma_i = \sqrt {\frac {1} {|C|-1} \sum_{X \in C} {(X_i - \mu_i)^2}} \tag {CSSD}

where \mu_i is the mean of the i-th feature over the class.

It is computed as before:





\mu_i = \frac {1} {|C|} \sum_{X \in C} {X_i}

Formula \mathrm {(CSSD)} is used when the true expectation \operatorname E_i is unknown. If it is known, the uncorrected ("population") standard deviation is used:





\sigma_i = \sqrt {\frac {1} {|C|} \sum_{X \in C} {(X_i - \operatorname E_i)^2}} \tag {USSD}
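Formulas (CSSD) and (USSD) correspond to numpy's std with ddof=1 and ddof=0 respectively (again with the sample mean in place of the expectation); a quick check:

import numpy as np

C = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
mu = C.mean(axis=0)

sigma_cssd = np.sqrt(((C - mu) ** 2).sum(axis=0) / (len(C) - 1))  # (CSSD)
sigma_ussd = np.sqrt(((C - mu) ** 2).sum(axis=0) / len(C))        # (USSD)

print(np.allclose(sigma_cssd, C.std(axis=0, ddof=1)))  # True
print(np.allclose(sigma_ussd, C.std(axis=0, ddof=0)))  # True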

An example of computing this distance is given in section 2.3.





6. Use a scaled variant of the standardized Euclidean distance, in which the previous distance is additionally multiplied by the geometric mean of the feature variances:





d_{diag}(U, V, \sigma) = d_{std}(U, V, \sigma) \cdot \sqrt[n] {\prod^n_{i=1} \sigma^2_i}

Written out in full:





d_{diag}(U, V, \sigma) = \sqrt {\sum_{i=1}^n {\left (\frac {U_i - V_i} {\sigma_i} \right)^2}} \cdot \sqrt[n] {\prod^n_{i=1} \sigma^2_i}

An example of computing this distance is also given in section 2.3.





Each of the listed options has its own trade-offs.





Which option to prefer depends on the data and the problem, for example on how many objects per class are available (with only about 10 objects the covariance estimates are unreliable), and is best decided experimentally.





2.2. Algorithm for computing the distance between two points and between a point and a class

1. Compute the centroid of the class.





2. Compute the standard deviations of the features of the class.





3. Compute the covariance matrix of the class and its determinant.





4. Compute the distance between the points (or between the point and the centroid) by the formula. If the covariance matrix is singular, use one of the approaches from section 2.1.





2.3. Example of computing the distance between two points and between a point and a class

Let us compute the distance from the point (4, 2) to each of two classes (Fig. 3):





C_{(1)} = \{ ( 1 , 1 ) , ( 2 , 2 ) , ( 3 , 3 ) , ( 4 , 4 ) , ( 5 , 5 ) \} \\ C_{(2)} = \{ ( 3 , 1 ) , ( 4 , 0 ) , ( 6 , 0 ) , ( 6 , 2 ) , ( 5 , 3 ) \}
Figure 3. The point (4, 2) and the classes C_{(1)} and C_{(2)}

1. Compute the centroids of the classes.





\mu_{(1)} = \left (\frac {1 + 2 + 3 + 4 + 5} {5}, \frac {1 + 2 + 3 + 4 + 5} {5} \right) = (3, 3) \\ \mu_{(2)} = \left (\frac {3 + 4 + 6 + 6 + 5} {5}, \frac {1 + 0 + 0 + 2 + 3} {5} \right) = (4.8, 1.2)

2. Compute the standard deviations of the features.





For the first class:





\sigma_{(1)1} = \sqrt {\frac {(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2} {5 - 1}} = \sqrt {2.5} \\ \sigma_{(1)2} = \sqrt {\frac {(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2} {5 - 1}} = \sqrt {2.5}

For the second class:





\sigma_{(2)1} = \sqrt {\frac {(3-4.8)^2+(4-4.8)^2+(6-4.8)^2+(6-4.8)^2+(5-4.8)^2} {5 - 1}} = \sqrt {1.7} \\ \sigma_{(2)2} = \sqrt {\frac {(1-1.2)^2+(0-1.2)^2+(0-1.2)^2+(2-1.2)^2+(3-1.2)^2} {5 - 1}} = \sqrt {1.7}

3. Compute the covariance matrices of the classes.





For the first class:





cov_{(1)1,1} = \sigma^2_{(1)1} = 2.5 \quad cov_{(1)1,2} = cov_{(1)2,1} \quad cov_{(1)2,2} = \sigma^2_{(1)2} = 2.5 \\ cov_{(1)1,2} = \frac {1} {5-1} \sum_{X \in C_{(1)}} {(X_1 - \mu_1) \cdot (X_2 - \mu_2)} = \\ = \frac {1} {4} \left ( (1 - 3) (1 - 3) + (2 - 3) (2 - 3) + (3 - 3) (3 - 3) + \\ + (4 - 3) (4 - 3) + (5 - 3) (5 - 3) \right) = \frac {10} {4} = 2.5

The covariance matrix of the first class:





COV_{(1)} = \begin{pmatrix} cov_{(1)1,1} & cov_{(1)1,2} \\ cov_{(1)2,1} & cov_{(1)2,2} \end{pmatrix} = \begin{pmatrix} 2.5 & 2.5 \\ 2.5 & 2.5 \end{pmatrix}

Its determinant: 2.5 \cdot 2.5 - 2.5 \cdot 2.5 = 0. Hence the matrix COV_{(1)} is singular (not invertible).





For the second class:





cov_{(2)1,1} = \sigma^2_{(2)1} = 1.7 \quad cov_{(2)1,2} = cov_{(2)2,1} \quad cov_{(2)2,2} = \sigma^2_{(2)2} = 1.7 \\ cov_{(2)1,2} = \frac {1} {5-1} \sum_{X \in C_{(2)}} {(X_1 - \mu_1) \cdot (X_2 - \mu_2)} = \\ = \frac {1} {4} \left ( (3 - 4.8) (1 - 1.2) + (4 - 4.8) (0 - 1.2) + (6 - 4.8) (0 - 1.2) + \\ + (6 - 4.8) (2 - 1.2) + (5 - 4.8) (3 - 1.2) \right) = \frac {1.2} {4} = 0.3

The covariance matrix of the second class:





COV_{(2)} = \begin{pmatrix} cov_{(2)1,1} & cov_{(2)1,2} \\ cov_{(2)2,1} & cov_{(2)2,2} \end{pmatrix} = \begin{pmatrix} 1.7 & 0.3 \\ 0.3 & 1.7 \end{pmatrix}

Its determinant: 1.7 \cdot 1.7 - 0.3 \cdot 0.3 = 2.8 \neq 0. Hence the matrix COV_{(2)} is invertible.





Python 3.6 numpy 1.19.5
import numpy as np

classes = [
    np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]),
    np.array([[3, 1], [4, 0], [6, 0], [6, 2], [5, 3]])
]

centroids = [class_.mean(axis=0) for class_ in classes]
standard_deviations = [class_.std(axis=0, ddof=1) for class_ in classes]
covariance_matrices = np.array([np.cov(class_, rowvar=False, ddof=1) for class_ in classes])
det_covariance_matrices = [np.linalg.det(cov) for cov in covariance_matrices]

print("Centroids:", *centroids)
print("Standard deviations:", *standard_deviations)
print("Covariance matrices:", *covariance_matrices.tolist())
print("Determinants of covariance matrices:", det_covariance_matrices)

      
      



Output:





Centroids: [3. 3.] [4.8 1.2]
Standard deviations: [1.58113883 1.58113883] [1.30384048 1.30384048]
Covariance matrices: [[2.5, 2.5], [2.5, 2.5]] [[1.7, 0.3], [0.3, 1.7]]
Determinants of covariance matrices: [0.0, 2.8]
      
      



4. Compute the distances from the point to each class. The covariance matrix of the first class is singular, so for it the approaches from section 2.1 are used; for the second class the usual Mahalanobis distance can be computed.





For comparison, in each case the distances are computed from the point both to the class centroid and to every object of the class.

First, the distances from the point (4, 2) to the first class, the one with the singular covariance matrix.





Five variants are computed for it: (1) the Euclidean-Mahalanobis distance, (2) shrinkage (LedoitWolf), (3) the pseudo-inverse matrix, (4) the standardized Euclidean distance, (5) the scaled standardized Euclidean distance:





1. The Euclidean-Mahalanobis distance.





d_{E-M}\left((4,2), (1,1), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.8257 \\ d_{E-M}\left((4,2), (2,2), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.5275 \\ d_{E-M}\left((4,2), (3,3), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.4142 \\ d_{E-M}\left((4,2), (4,4), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.5275 \\ d_{E-M}\left((4,2), (5,5), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.8257
Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
inverse_covariance_matrix = np.linalg.inv(covariance_matrix + np.identity(covariance_matrix.shape[0]))
print("  :\n", inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_E-M (", test_point, ", ", point_to, ", (COV+E)^(-1)) = ",
          mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

      
      



Output:





Inverse covariance matrix (COV+E)^(-1):
[[ 0.58333333 -0.41666667]
 [-0.41666667  0.58333333]]
d_E-M ([4. 2.], [3. 3.], (COV+E)^(-1)) = 1.414213562373095
d_E-M ([4. 2.], [1. 1.], (COV+E)^(-1)) = 1.8257418583505538
d_E-M ([4. 2.], [2. 2.], (COV+E)^(-1)) = 1.5275252316519465
d_E-M ([4. 2.], [3. 3.], (COV+E)^(-1)) = 1.414213562373095
d_E-M ([4. 2.], [4. 4.], (COV+E)^(-1)) = 1.5275252316519465
d_E-M ([4. 2.], [5. 5.], (COV+E)^(-1)) = 1.8257418583505536
      
      



The first line is the distance from (4, 2) to the class centroid (3, 3); the remaining lines are the distances to the objects of the class.





2. Shrinkage (LedoitWolf).





d_{M{(*)}}\left((4,2), (1,1), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.4284 \\ d_{M{(*)}}\left((4,2), (2,2), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.0378 \\ d_{M{(*)}}\left((4,2), (3,3), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 1.8898 \\ d_{M{(*)}}\left((4,2), (4,4), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.0378 \\ d_{M{(*)}}\left((4,2), (5,5), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.4284
Python 3.6 numpy 1.19.5 scikit-learn 0.24.1
import numpy as np
from sklearn.covariance import LedoitWolf


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
lw = LedoitWolf().fit(class_)
lw_covariance_matrix = lw.covariance_
lw_lambda = lw.shrinkage_
covariance_matrix = np.cov(class_, rowvar=False, ddof=0)
mu = np.sum(np.trace(covariance_matrix)) / class_.shape[0]
T = mu * np.identity(class_.shape[1])
print("T:", *T)
print("COV(*):", *lw_covariance_matrix)
print("Lambda:", lw_lambda)

# Condition 1: the target matrix T must be positive definite
# (here: all eigenvalues of T are positive)
# ddof=0, because LedoitWolf uses empirical_covariance (see the note above)
first_condition = (np.linalg.eig(T)[0] > approx(0., sign=+1)).all()
print("All(", np.linalg.eig(T)[0], ") > 0 ? -> ", first_condition, sep='')

# Condition 2: the shrinkage coefficient lambda must lie in (0, 1]
second_condition = approx(0., sign=+1) < lw_lambda <= 1
print("Lambda =", lw_lambda, "in (0, 1] ? ->", second_condition)

# Condition 3: the smallest eigenvalue of COV(*) must be
# at least lambda times the smallest eigenvalue of T
cov_eig = min(np.linalg.eig(lw_covariance_matrix)[0])
lambda_t_eig = lw_lambda * min(np.linalg.eig(T)[0])
third_condition = cov_eig >= lambda_t_eig
print(cov_eig, ">=", lambda_t_eig, "? ->", third_condition)
conditions = [first_condition, second_condition, third_condition]

if all(conditions):
    print("   ")
    #  
    inverse_lw_covariance_matrix = np.linalg.inv(lw_covariance_matrix)
    print("  :\n", inverse_lw_covariance_matrix, sep='')
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_M(*) (", test_point, ", ", point_to, ", COV(*)) = ",
              mahalanobis(test_point, point_to, inverse_lw_covariance_matrix), sep='')
else:
    print("  (1-3): ", [i for i, x in enumerate(conditions, 1) if not x])

      
      



Output:





T: [0.8 0. ] [0.  0.8]
COV(*): [2.   1.44] [1.44 2.  ]
Lambda: 0.27999999999999997
All([0.8 0.8]) > 0 ? -> True
Lambda = 0.27999999999999997 in (0, 1] ? -> True
0.56 >= 0.22399999999999998 ? -> True
All conditions are satisfied
Inverse shrunk covariance matrix:
[[ 1.03820598 -0.74750831]
 [-0.74750831  1.03820598]]
d_M(*) ([4. 2.], [3. 3.], COV(*)) = 1.889822365046136
d_M(*) ([4. 2.], [1. 1.], COV(*)) = 2.4283759936997833
d_M(*) ([4. 2.], [2. 2.], COV(*)) = 2.037847864848056
d_M(*) ([4. 2.], [3. 3.], COV(*)) = 1.889822365046136
d_M(*) ([4. 2.], [4. 4.], COV(*)) = 2.037847864848056
d_M(*) ([4. 2.], [5. 5.], COV(*)) = 2.4283759936997833
      
      



Again, the first line is the distance to the centroid (3, 3).





3. The pseudo-inverse matrix.





The pseudo-inverse of the covariance matrix of the first class is computed (numpy does this through a singular value decomposition). The distances:





d_{M^+}\left((4,2), (1,1), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 1.2649 \\ d_{M^+}\left((4,2), (2,2), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 0.6324 \\ d_{M^+}\left((4,2), (3,3), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) = 0.0000 \\ d_{M^+}\left((4,2), (4,4), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 0.6324 \\ d_{M^+}\left((4,2), (5,5), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 1.2649

Note that the distance from (4, 2) to (3, 3) equals zero even though these are different points: a consequence of the singular covariance matrix, whose degenerate direction the pseudo-inverse ignores.





Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
# The pseudo-inverse is computed via singular value decomposition (SVD)
# of the covariance matrix
pseudo_inverse_covariance_matrix = np.linalg.pinv(covariance_matrix)
print("  :\n", pseudo_inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_M+ (", test_point, ", ", point_to, ", COV+) = ",
          mahalanobis(test_point, point_to, pseudo_inverse_covariance_matrix), sep='')

      
      



Output:





Pseudo-inverse covariance matrix:
[[0.1 0.1]
 [0.1 0.1]]
d_M+ ([4. 2.], [3. 3.], COV+) = 0.0
d_M+ ([4. 2.], [1. 1.], COV+) = 1.2649110640673513
d_M+ ([4. 2.], [2. 2.], COV+) = 0.6324555320336757
d_M+ ([4. 2.], [3. 3.], COV+) = 0.0
d_M+ ([4. 2.], [4. 4.], COV+) = 0.6324555320336757
d_M+ ([4. 2.], [5. 5.], COV+) = 1.2649110640673513
      
      



The distance to the centroid (3, 3) here is exactly zero (see the note above).





4. The standardized Euclidean distance.





d_{std}((4,2), (1,1), (\sqrt {2.5}, \sqrt {2.5})) = 2.0000 \\ d_{std}((4,2), (2,2), (\sqrt {2.5}, \sqrt {2.5})) \approx 1.2649 \\ d_{std}((4,2), (3,3), (\sqrt {2.5}, \sqrt {2.5})) \approx 0.8944 \\ d_{std}((4,2), (4,4), (\sqrt {2.5}, \sqrt {2.5})) \approx 1.2649 \\ d_{std}((4,2), (5,5), (\sqrt {2.5}, \sqrt {2.5})) = 2.0000
Python 3.6 numpy 1.19.5
import numpy as np


def euclid_std(point_from, point_to, standard_deviations):
    return sum(((point_from - point_to) / standard_deviations) ** 2) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
standard_deviations = class_.std(axis=0, ddof=1)

# Check that no standard deviation is (close to) zero
std_le_0 = standard_deviations <= approx(0., sign=+1, epsilon=1e-6)
print(" :\n", standard_deviations, sep='')

if std_le_0.any():
    print("     0: ", np.where(std_le_0)[0])
else:
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_std (", test_point, ", ", point_to, ", sigma) = ",
              euclid_std(test_point, point_to, standard_deviations), sep='')

      
      



Output:





Standard deviations:
[1.58113883 1.58113883]
d_std ([4. 2.], [3. 3.], sigma) = 0.8944271909999159
d_std ([4. 2.], [1. 1.], sigma) = 1.9999999999999998
d_std ([4. 2.], [2. 2.], sigma) = 1.2649110640673518
d_std ([4. 2.], [3. 3.], sigma) = 0.8944271909999159
d_std ([4. 2.], [4. 4.], sigma) = 1.2649110640673518
d_std ([4. 2.], [5. 5.], sigma) = 1.9999999999999998
      
      



The first line is, as before, the distance to the centroid (3, 3).





5. The scaled standardized Euclidean distance.





d_{diag}((4,2), (1,1), (\sqrt {2.5}, \sqrt {2.5})) = 5.0000 \\ d_{diag}((4,2), (2,2), (\sqrt {2.5}, \sqrt {2.5})) \approx 3.1623 \\ d_{diag}((4,2), (3,3), (\sqrt {2.5}, \sqrt {2.5})) \approx 2.2360 \\ d_{diag}((4,2), (4,4), (\sqrt {2.5}, \sqrt {2.5})) \approx 3.1623 \\ d_{diag}((4,2), (5,5), (\sqrt {2.5}, \sqrt {2.5})) = 5.0000
Python 3.6 numpy 1.19.5
import numpy as np


def euclid_std(point_from, point_to, standard_deviations):
    return sum(((point_from - point_to) / standard_deviations) ** 2) ** 0.5


def euclid_diag(point_from, point_to, standard_deviations):
    return euclid_std(point_from, point_to, standard_deviations) \
           * (np.prod(standard_deviations ** 2)) ** (1. / point_from.shape[0])


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
standard_deviations = class_.std(axis=0, ddof=1)

# Check that no standard deviation is (close to) zero
std_le_0 = standard_deviations <= approx(0., sign=+1, epsilon=1e-6)
print(" :\n", standard_deviations, sep='')

if std_le_0.any():
    print("     0: ", np.where(std_le_0)[0])
else:
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_diag (", test_point, ", ", point_to, ", sigma) = ",
              euclid_diag(test_point, point_to, standard_deviations), sep='')

      
      



Output:





Standard deviations:
[1.58113883 1.58113883]
d_diag ([4. 2.], [3. 3.], sigma) = 2.2360679774997902
d_diag ([4. 2.], [1. 1.], sigma) = 5.0
d_diag ([4. 2.], [2. 2.], sigma) = 3.16227766016838
d_diag ([4. 2.], [3. 3.], sigma) = 2.2360679774997902
d_diag ([4. 2.], [4. 4.], sigma) = 3.16227766016838
d_diag ([4. 2.], [5. 5.], sigma) = 5.0
      
      



The first line is the distance to the centroid (3, 3).





Now the distances from the point (4, 2) to the second class, whose covariance matrix is invertible. First compute the inverse covariance matrix (through the adjugate matrix):





COV^{-1}_{(2)} = \frac {1} {\Delta_{(2)}} A^{T}_{(2)}

where \Delta_{(2)} is the determinant of COV_{(2)} and A^{T}_{(2)} is the transposed matrix of cofactors.





A^{T}_{(2)} = \begin{pmatrix} A_{(2)1,1} & A_{(2)2,1} \\ A_{(2)1,2} & A_{(2)2,2} \end{pmatrix} = \begin{pmatrix} 1.7 & -0.3 \\ -0.3 & 1.7 \end{pmatrix} \\ COV^{-1}_{(2)} = \frac {1} {2.8} \begin{pmatrix} 1.7 & -0.3 \\ -0.3 & 1.7 \end{pmatrix} = \begin{pmatrix} 0.6071 & -0.1071 \\ -0.1071 & 0.6071 \end{pmatrix}

An example calculation for the object (3, 1):

d_{M}((4,2), (3,1), COV^{-1}_{(2)}) = \sqrt {((4,2)-(3,1)) \cdot COV^{-1}_{(2)} \cdot ((4,2)-(3,1))^T} = \\ = \sqrt {(4-3, 2-1) \cdot \begin{pmatrix} 0.6071 & -0.1071 \\ -0.1071 & 0.6071 \end{pmatrix} \cdot (4-3, 2-1)^T} = \sqrt {(0.5, 0.5) \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix}} = 1

The Mahalanobis distances to the centroid and to the objects of the second class:

d_{M}((4,2), (4.8,1.2), COV^{-1}_{(2)}) \approx 0.9562 \\ d_{M}((4,2), (3,1), COV^{-1}_{(2)}) = 1.0000 \\ d_{M}((4,2), (4,0), COV^{-1}_{(2)}) \approx 1.5584 \\ d_{M}((4,2), (6,0), COV^{-1}_{(2)}) \approx 2.3905 \\ d_{M}((4,2), (6,2), COV^{-1}_{(2)}) \approx 1.5584 \\ d_{M}((4,2), (5,3), COV^{-1}_{(2)}) = 1.0000

The Euclidean-Mahalanobis distances, for comparison:

d_{E-M}((4,2), (4.8,1.2), (COV_{(2)}+E)^{-1}) \approx 0.7303 \\ d_{E-M}((4,2), (3,1), (COV_{(2)}+E)^{-1}) \approx 0.8165 \\ d_{E-M}((4,2), (4,0), (COV_{(2)}+E)^{-1}) \approx 1.2247 \\ d_{E-M}((4,2), (6,0), (COV_{(2)}+E)^{-1}) \approx 1.8257 \\ d_{E-M}((4,2), (6,2), (COV_{(2)}+E)^{-1}) \approx 1.2247 \\ d_{E-M}((4,2), (5,3), (COV_{(2)}+E)^{-1}) \approx 0.8165
Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
if abs(np.linalg.det(covariance_matrix)) <= approx(0., sign=+1):
    print("  0.  .")
else:
    inverse_covariance_matrix = np.linalg.inv(covariance_matrix)
    print("   (d_M):\n", inverse_covariance_matrix, sep='')
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_M (", test_point, ", ", point_to, ", COV^(-1)) = ",
              mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

covariance_matrix = covariance_matrix + np.identity(class_.shape[1])
inverse_covariance_matrix = np.linalg.inv(covariance_matrix)
print("   (d_E-M):\n", inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_E-M (", test_point, ", ", point_to, ", (COV+E)^(-1)) = ",
          mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

      
      



Output:





Inverse covariance matrix (d_M):
[[ 0.60714286 -0.10714286]
 [-0.10714286  0.60714286]]
d_M ([4. 2.], [4.8 1.2], COV^(-1)) = 0.9561828874675149
d_M ([4. 2.], [3. 1.], COV^(-1)) = 1.0
d_M ([4. 2.], [4. 0.], COV^(-1)) = 1.5583874449479593
d_M ([4. 2.], [6. 0.], COV^(-1)) = 2.3904572186687876
d_M ([4. 2.], [6. 2.], COV^(-1)) = 1.5583874449479593
d_M ([4. 2.], [5. 3.], COV^(-1)) = 1.0
Inverse matrix (COV+E)^(-1) (d_E-M):
[[ 0.375      -0.04166667]
 [-0.04166667  0.375     ]]
d_E-M ([4. 2.], [4.8 1.2], (COV+E)^(-1)) = 0.7302967433402214
d_E-M ([4. 2.], [3. 1.], (COV+E)^(-1)) = 0.8164965809277259
d_E-M ([4. 2.], [4. 0.], (COV+E)^(-1)) = 1.224744871391589
d_E-M ([4. 2.], [6. 0.], (COV+E)^(-1)) = 1.8257418583505536
d_E-M ([4. 2.], [6. 2.], (COV+E)^(-1)) = 1.224744871391589
d_E-M ([4. 2.], [5. 3.], (COV+E)^(-1)) = 0.8164965809277259
      
      



The first line is the distance to the centroid of the second class, (4.8, 1.2).









A summary. The distances from the point (4, 2) to the nearest object of the first class, (3, 3), by the five variants:

1. Euclidean-Mahalanobis: \approx 1.4142;

2. Shrinkage (LedoitWolf): \approx 1.8898;

3. Pseudo-inverse matrix: 0.0000;

4. Standardized Euclidean: \approx 0.8944;

5. Scaled standardized Euclidean: \approx 2.2360.





The distance to the nearest object of the second class (Mahalanobis): 1.0000.

The distance to the nearest object of the second class (Euclidean-Mahalanobis): \approx 0.8165.





Comparing the Euclidean-Mahalanobis values (the only variant computed for both classes), the point (4, 2) is closer to the second class (0.8165 versus 1.4142), which agrees with Figure 3; only the pseudo-inverse variant, which collapses the degenerate direction of the first class, suggests otherwise.









The distances from the point (4, 2) to the farthest object of the first class (the objects (1, 1) and (5, 5)):

1. Euclidean-Mahalanobis: \approx 1.8257;

2. Shrinkage (LedoitWolf): \approx 2.4284;

3. Pseudo-inverse matrix: \approx 1.2649;

4. Standardized Euclidean: 2.0000;

5. Scaled standardized Euclidean: 5.0000.





The distance to the farthest object of the second class, (6, 0) (Mahalanobis): \approx 2.3905.

The distance to the farthest object of the second class, (6, 0) (Euclidean-Mahalanobis): \approx 1.8257.





By the Euclidean-Mahalanobis distance, the farthest objects of the two classes turn out to be equally far from the point (4, 2): \approx 1.8257 in both cases.









The distances from the point (4, 2) to the centroid of the first class, (3, 3):

1. Euclidean-Mahalanobis: \approx 1.4142;

2. Shrinkage (LedoitWolf): \approx 1.8898;

3. Pseudo-inverse matrix: 0.0000;

4. Standardized Euclidean: \approx 0.8944;

5. Scaled standardized Euclidean: \approx 2.2360.





The distance to the centroid of the second class, (4.8, 1.2) (Mahalanobis): \approx 0.9562.

The distance to the centroid of the second class (Euclidean-Mahalanobis): \approx 0.7303.





By the Euclidean-Mahalanobis distance, the point (4, 2) is closer to the centroid of the second class (0.7303 versus 1.4142), which agrees with Figure 3.









So different variants of the distance can lead to different conclusions about which class a point is closer to; the choice has to be made per problem.





3. Mahalanobis distance between two classes

This section covers the distance between two classes (between their centers).





Strictly speaking, this distance between classes is not a metric; of the metric axioms it satisfies only the following:

d_M(C_i, C_j, COV_0^{-1}) \ge 0, \quad d_M(C_i, C_j, COV_0^{-1})=0 \impliedby C_i=C_j, \quad d_M(C_i, C_j, COV_0^{-1})=d_M(C_j, C_i, COV_0^{-1}).

The converse of the second property (identity of indiscernibles) does not hold: d_M(C_i, C_j, COV_0^{-1})=0 \implies C_i=C_j is false, because two different classes can have the same centroid.

So it should be treated simply as a measure of dissimilarity between classes.





3.1. Theoretical information

The Mahalanobis distance between classes C_1 and C_2 with centroids \overline {C_1} and \overline {C_2} and covariance matrices COV_{(1)} and COV_{(2)} is defined as:





d_M \left(\overline {C_1}, \overline {C_2}, COV^{-1}_0 \right) = \sqrt {\left (\overline {C_1} - \overline {C_2} \right) \cdot COV^{-1}_0 \cdot \left (\overline {C_1} - \overline {C_2} \right)^T} \\ COV_0 = \frac {1} {|C_1| + |C_2| - 2} \left (COV_{(1)} + COV_{(2)} \right)

where COV_0 is the pooled covariance matrix, COV^{-1}_0 is its inverse, COV_{(1)} and COV_{(2)} are the covariance matrices of the classes, and |C_1| and |C_2| are the numbers of objects in them.





A simplified variant of the pooled matrix is also used:





COV_0 = COV_{(1)} + COV_{(2)}

that is, simply the sum of the covariance matrices.

The two variants of COV_0 differ only by the scalar factor 1/(|C_1| + |C_2| - 2), so the corresponding distances differ by a constant factor as well.





If one of the "classes" consists of a single point X_{(1)}, its covariance matrix is the zero matrix, and the formula reduces to the distance between a point and a class from section 2 (the "point to class" distance):





d_M \left (X_{(1)}, \overline C_2, COV^{-1}_0 \right) = \sqrt {\left(X_{(1)} - \overline {C_2} \right) \cdot COV^{-1}_0 \cdot \left (X_{(1)} - \overline {C_2} \right)^T} \\ COV_0 = 0 + COV_{(2)} = COV_{(2)}

The Euclidean-Mahalanobis variant for two classes is defined analogously:





d_{E-M} \left(\overline {C_1}, \overline {C_2}, \left (COV_0+E \right)^{-1} \right) = \sqrt {\left (\overline {C_1} - \overline {C_2} \right) \cdot (COV_0+E)^{-1} \cdot \left (\overline {C_1} - \overline {C_2} \right)^T}

where E is the identity matrix of the same size as COV_0.





3.2. Algorithm for computing the distance between two classes

1. Compute the centroids of the classes.





2. Compute the standard deviations of the features of the classes.





3. Compute the covariance matrices of the classes and their determinants.





4. Compute the pooled covariance matrix COV_0 and the distance between the centroids by the formula. If COV_0 is singular, use one of the approaches from section 2.1, for example the Euclidean-Mahalanobis distance.





3.3. Example of computing the distance between two classes

The same classes as in the previous example are used.





C_{(1)} = \{ ( 1 , 1 ) , ( 2 , 2 ) , ( 3 , 3 ) , ( 4 , 4 ) , ( 5 , 5 ) \} \\ C_{(2)} = \{ ( 3 , 1 ) , ( 4 , 0 ) , ( 6 , 0 ) , ( 6 , 2 ) , ( 5 , 3 ) \}

Let us compute the distance between them.

Steps 1 through 3 repeat section 2.2; the intermediate results were obtained in section 2.3.

Step 4 differs.





4. Compute the pooled covariance matrix and the distance between the centroids. The pooled matrix here turns out to be invertible, so the Mahalanobis distance can be computed directly; the Euclidean-Mahalanobis distance is also shown for comparison.





The pooled covariance matrix:





COV_0 = \frac {1} {5 + 5 - 2} \left (\begin{pmatrix} 2.5 & 2.5 \\ 2.5 & 2.5 \end{pmatrix} + \begin{pmatrix} 1.7 & 0.3 \\ 0.3 & 1.7 \end{pmatrix} \right) = \\ = \frac {1} {8} \begin{pmatrix} 4.2 & 2.8 \\ 2.8 & 4.2 \end{pmatrix} = \begin{pmatrix} 0.525 & 0.35 \\ 0.35 & 0.525 \end{pmatrix}

Its inverse:





COV^{-1}_0 \approx \begin{pmatrix} 3.42857143 & -2.28571429 \\ -2.28571429 & 3.42857143 \end{pmatrix}





The Mahalanobis distance between the centroids: \approx 6.0851.





d_M \left((3, 3), (4.8, 1.2), COV^{-1}_0 \right) = \\ = \sqrt {\left ((3, 3) - (4.8, 1.2) \right) \cdot COV^{-1}_0 \cdot \left ((3, 3) - (4.8, 1.2) \right)^T} = \\ = \sqrt {(-1.8, 1.8) \cdot \begin{pmatrix} 3.42857143 & -2.28571429 \\ -2.28571429 & 3.42857143 \end{pmatrix} \cdot (-1.8, 1.8)^T} \approx 6.0851

The Euclidean-Mahalanobis distance between the centroids: \approx 2.3484.





With the simplified pooled matrix COV_0 = COV_{(1)} + COV_{(2)}:

the Mahalanobis distance between the centroids is \approx 2.1514;

the Euclidean-Mahalanobis distance is \approx 1.6432.









For reference, the minimum Mahalanobis distance between objects of the two classes (with the pooled matrix COV_0) is \approx 3.3806, attained for the pairs (2,2)-(3,1) and (4,4)-(5,3).





The minimum Euclidean-Mahalanobis distance between objects is \approx 1.3047, for the same pairs (2,2)-(3,1) and (4,4)-(5,3).





With COV_0 = COV_{(1)} + COV_{(2)}:

the minimum Mahalanobis distance between objects is \approx 1.1952, for the pairs (2,2)-(3,1) and (4,4)-(5,3);

the minimum Euclidean-Mahalanobis distance is \approx 0.9129, for the same pairs.









The maximum Mahalanobis distance between objects of the two classes (with COV_0) is \approx 10.5830, attained for the pairs (1,1)-(6,0) and (5,5)-(6,0).





The maximum Euclidean-Mahalanobis distance between objects is \approx 4.4256, for the same pairs (1,1)-(6,0) and (5,5)-(6,0).





With COV_0 = COV_{(1)} + COV_{(2)}:

the maximum Mahalanobis distance between objects is \approx 3.7417, for the pairs (1,1)-(6,0) and (5,5)-(6,0);

the maximum Euclidean-Mahalanobis distance is \approx 2.9155, for the same pairs.





The calculation can be reproduced in Python 3.6 with numpy 1.19.5.
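A minimal sketch (the helper function mahalanobis is the same one used in the earlier examples; the printed values match those above up to rounding):

import numpy as np

def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5

class_1 = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
class_2 = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])

centroid_1, centroid_2 = class_1.mean(axis=0), class_2.mean(axis=0)
cov_1 = np.cov(class_1, rowvar=False, ddof=1)
cov_2 = np.cov(class_2, rowvar=False, ddof=1)
identity = np.identity(class_1.shape[1])

# Pooled covariance matrix and the distances between the centroids
cov_0 = (cov_1 + cov_2) / (len(class_1) + len(class_2) - 2)
print(mahalanobis(centroid_1, centroid_2, np.linalg.inv(cov_0)))             # ~6.0851
print(mahalanobis(centroid_1, centroid_2, np.linalg.inv(cov_0 + identity)))  # ~2.3484

# Simplified pooled matrix COV_0 = COV_(1) + COV_(2)
cov_sum = cov_1 + cov_2
print(mahalanobis(centroid_1, centroid_2, np.linalg.inv(cov_sum)))             # ~2.1514
print(mahalanobis(centroid_1, centroid_2, np.linalg.inv(cov_sum + identity)))  # ~1.6432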





4. Mahalanobis distance and the k-nearest-neighbors method

The Mahalanobis distance can be used in the k-nearest-neighbors (k-NN) method, in which an object is assigned to the class that is most common among its k nearest training objects.





Below is an implementation of k-NN with the Mahalanobis distance.

Implementation notes:

the Euclidean-Mahalanobis variant is used: the matrices (COV+E)^{-1} are precomputed for each class, so singular covariance matrices are not a problem;

the squared distances are compared (taking the square root is unnecessary when only the ordering matters), and ties between classes are broken in favor of the class with the smaller index.





Python 3.6 numpy 1.19.5
# Classification by the k-nearest-neighbors method with the Mahalanobis distance

import heapq
from collections import Counter
from operator import itemgetter

import numpy as np


class MkNN:
    def __init__(self, k, classes, inv_cov_matrices):
        self.n = len(classes)
        self.k = k
        self.classes = classes
        self.inv_cov_matrices = inv_cov_matrices

    @staticmethod
    def mahalanobis_sqr(point_from, point_to, inverse_covariance_matrix):
        delta = point_from - point_to
        return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta))

    def _get_k_smallest(self, test_point):
        generator = (
            (MkNN.mahalanobis_sqr(test_point, point, inv_cov), i)
            for i, (class_, inv_cov) in enumerate(zip(self.classes, self.inv_cov_matrices))
            for point in class_
        )
        return heapq.nsmallest(self.k, generator, key=itemgetter(0))

    def predict(self, test_point):
        return heapq.nlargest(1, Counter((i for _, i in self._get_k_smallest(test_point))).items(),
                              key=lambda t: (t[1], -t[0]))[0][0]

    def predict_proba(self, test_point):
        most_common = Counter((i for _, i in self._get_k_smallest(test_point)))
        classes_proba = np.array([most_common.get(i, 0) for i in range(self.n)])
        return classes_proba / classes_proba.sum()

    def predict_all_max(self, test_point):
        p = self.predict_proba(test_point)
        return np.where(p == max(p))[0]


def main():
    # Test points; each is classified against the training classes in `classes`
    test_points = np.array([[4., 2.]])
    # Training sample: two classes
    classes = [
        np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]]),
        np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
    ]
    # Total number of training objects
    n_train_points = sum(class_.shape[0] for class_ in classes)
    # Covariance matrix of each class
    cov_matrices = [np.cov(class_, rowvar=False, ddof=1) for class_ in classes]
    # Inverse matrices (COV + E)^(-1) for each class (the Euclidean-Mahalanobis variant)
    inv_cov_matrices = [np.linalg.inv(cov + np.identity(cov.shape[0])) for cov in cov_matrices]
    for test_point in test_points:
        print("Point:", test_point)
        # k from 1 to the total number of training objects minus 1
        for i in range(1, n_train_points):
            classifier = MkNN(i, classes, inv_cov_matrices)
            print(f"{i}nn:",
                  1 + classifier.predict(test_point),
                  classifier.predict_proba(test_point),
                  classifier.predict_all_max(test_point))


if __name__ == "__main__":
    main()

      
      



Output. The format of each line is:

"knn: [predicted class (numbered from 1)] [class probabilities] [indices of all classes with the maximum probability (numbered from 0)]".





Point: [4. 2.]
1nn: 2 [0. 1.] [1]
2nn: 2 [0. 1.] [1]
3nn: 2 [0. 1.] [1]
4nn: 2 [0. 1.] [1]
5nn: 2 [0.2 0.8] [1]
6nn: 2 [0.33333333 0.66666667] [1]
7nn: 2 [0.42857143 0.57142857] [1]
8nn: 1 [0.5 0.5] [0 1]
9nn: 2 [0.44444444 0.55555556] [1]
      
      



The classification of the plane for different k is shown in Figure 4.





Figure 4. Visualization of the classification results
Python 3.6
# ...

from operator import sub

import numpy as np  # 1.19.5
from matplotlib import colors, pyplot as plt  # 3.3.4

# Visualization of the classifier's decision regions
def show_data_on_mesh(k, classes, inv_cov_matrices):
    # Mesh (grid) boundaries
    min_ = np.min([np.min(class_, axis=0) for class_ in classes], axis=1) - 1
    max_ = np.max([np.max(class_, axis=0) for class_ in classes], axis=1) + 1
    min_c = min(min_[0], min_[1])
    max_c = max(max_[0], max_[1])
    h = 0.05
    test_mesh = np.meshgrid(np.arange(min_c, max_c, h), np.arange(min_c, max_c, h))
    test_points = np.c_[test_mesh[0].ravel(), test_mesh[1].ravel()]
    # Classify every point of the mesh (color = difference of class probabilities)
    classifier = MkNN(k, classes, inv_cov_matrices)
    test_mesh_labels = [sub(*classifier.predict_proba(x)) for x in test_points]
    # Plotting
    plt.figure(figsize=(6, 5), dpi=90)
    class_colormap = colors.ListedColormap(['#070648', '#480607'])
    plt.pcolormesh(test_mesh[0], test_mesh[1],
                   np.asarray(test_mesh_labels).reshape(test_mesh[0].shape),
                   cmap='coolwarm', shading='nearest')
    plt.colorbar()
    plt.scatter([point[0] for class_ in classes for point in class_],
                [point[1] for class_ in classes for point in class_],
                c=[-i for i, class_ in enumerate(classes) for _ in class_],
                cmap=class_colormap)
    plt.axis([min_c, max_c, min_c, max_c])
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("k=" + str(k))
    plt.show()

# ...

      
      



Changing the parameter k gives the decision regions for other values of k.





5. Weighted Mahalanobis distance

If the features are of different importance, a diagonal matrix of weights \Lambda can be introduced; the weighted Mahalanobis distance is then:





d_{M-\Lambda} ( U , V, COV^{-1}, \Lambda ) = \sqrt { ( U - V ) \cdot \Lambda \cdot COV^{-1} \cdot \Lambda^T \cdot ( U - V )^T }

With equal (unit) weights the matrix \Lambda is the identity matrix and the distance reduces to the usual Mahalanobis distance:





\Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

The weighted Euclidean-Mahalanobis distance is defined in the same way:





d_{EM-\Lambda}(U, V, \left(COV + E \right)^{-1}, \Lambda) = \sqrt {(U - V) \cdot \Lambda \cdot \left(COV + E \right)^{-1} \cdot \Lambda^T \cdot (U - V)^T}
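A minimal sketch of the weighted variant (the weight values here are purely illustrative; with unit weights it reduces to the ordinary Mahalanobis distance from section 2.3):

import numpy as np

def weighted_mahalanobis(point_from, point_to, inverse_covariance_matrix, weights):
    delta = point_from - point_to
    lam = np.diag(weights)  # the diagonal weight matrix Lambda
    m = lam @ inverse_covariance_matrix @ lam.T
    return max(np.float64(0), delta @ m @ delta) ** 0.5

test_point = np.array([4., 2.])
class_ = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
inv_cov = np.linalg.inv(np.cov(class_, rowvar=False, ddof=1))
centroid = class_.mean(axis=0)

# Unit weights: the usual Mahalanobis distance to the centroid (~0.9562)
print(weighted_mahalanobis(test_point, centroid, inv_cov, np.array([1., 1.])))
# The first feature weighted twice as heavily as the second
print(weighted_mahalanobis(test_point, centroid, inv_cov, np.array([2., 1.])))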

6. Conclusion

The Mahalanobis distance takes the correlations between features into account; it can be used to measure distances between points, between a point and a class, and between classes, as well as in classification methods such as k nearest neighbors.





What else can be studied on this topic?

1. …

2. The book "Metric Learning" (authors: Aurélien Bellet, Amaury Habrard, Marc Sebban).

3. The use of the Mahalanobis distance in clustering (for example, in k-means: articles 1, 2, 3).

4. … ().

5. … (article 1, article 2, article 3).






If you find errors or have comments, please write to quwarm@gmail.com or in the comments.







