Mahalanobis distance

Contents

The main point of using the Mahalanobis metric

1. Terms and definitions

2. Mahalanobis distance between two points and between a point and a class

2.1. Theoretical information

2.2. Algorithm for computing the distance between two points and between a point and a class

2.3. Example of computing the distance between two points and between a point and a class

3. Mahalanobis distance between two classes

3.1. Theoretical information

3.2. Algorithm for computing the distance between two classes

3.3. Example of computing the distance between two classes

4. Mahalanobis distance and the k-nearest-neighbors method

5. Weighted Mahalanobis distance

6. Conclusion





If you find errors or have comments, please write to quwarm@gmail.com or in the comments.






The main point of using the Mahalanobis distance

In Figure 1, two observations are shown as red points.

The center of the class is shown as a blue point.





Figure 1. 2D data with prediction ellipses

The question is: which observation is closer to the center of the class?

The answer depends on how the distance is measured.





If we measure the distance with the Euclidean metric, we find that the distance from the class center (0, 0) to the point (-4, 4) equals \sqrt{32}, while the distance to the point (5, 5) equals \sqrt{50}; that is, the point (-4, 4) is closer to the class center.





However, the features X and Y are correlated: the point (-4, 4) lies "across" the direction of the scatter (outside the prediction ellipses), while the point (5, 5) lies along it.





If the correlation is taken into account, the point (5, 5) turns out to be closer to the class center than (-4, 4). Under such a measure the distance from the center (0, 0) to (-4, 4) is 0.15686, while the distance to (5, 5) is 0.07519, so (5, 5) is the closer point. This measure is the Mahalanobis distance.





The rest of the article examines this distance in detail.






1. Terms and definitions

A feature space is the set \mathbb{R}^n, where n is the number of features.





C denotes a class, a set of objects: C = \{X_1, \ldots, X_m\}, where m is the number of objects in the class C.





An object X is described by n features: X = (x_1, \ldots, x_n).





All objects are described by the same n features; x_i denotes the value of the i-th feature.





A "point" here is an n-dimensional vector; the terms object and point are used interchangeably below.





Transposition: {\square}^T denotes the transpose of a matrix or vector.





The i-th feature of an object X from class k is denoted X_{(k)i}.





Vectors are treated as row vectors (1 \times n matrices).





2. Mahalanobis distance between two points and between a point and a class

This section covers the distance between two points (objects) and between a point and a class (its center).





2.1. Theoretical information

The Mahalanobis distance between two points U and V of a class (sample) C with covariance matrix COV is defined as follows:





d_M(U, V, COV^{-1}) = \sqrt {(U - V) \cdot COV^{-1} \cdot (U - V)^T}

where {\square}^T is transposition and COV^{-1} is the inverse of the covariance matrix of the class.





If the covariance matrix is diagonal, the Mahalanobis distance reduces to the standardized Euclidean distance (considered below).

If the covariance matrix is the identity matrix (ones on the diagonal, zeros elsewhere), the Mahalanobis distance coincides with the ordinary Euclidean distance.
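A quick numerical check of the identity-matrix case (a minimal sketch; the points are the ones from Figure 1):

import numpy as np

u = np.array([0., 0.])
v = np.array([-4., 4.])
delta = u - v

# Mahalanobis distance with COV = E (the identity matrix)
d_mahalanobis = np.sqrt(delta @ np.linalg.inv(np.identity(2)) @ delta)
# Ordinary Euclidean distance
d_euclidean = np.linalg.norm(delta)

print(d_mahalanobis, d_euclidean)  # both are sqrt(32), about 5.6569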





With a positive-definite covariance matrix, the Mahalanobis distance is a full-fledged metric.





That is (see the proof [internet archive]), for any points U, V, W and a fixed matrix COV the distance d_M satisfies:

1. Identity of indiscernibles: d_M(U, V, COV^{-1}) = 0 \iff U = V;

2. Symmetry: d_M(U, V, COV^{-1}) = d_M(V, U, COV^{-1});

3. Triangle inequality: d_M(U, W, COV^{-1}) \le d_M(U, V, COV^{-1}) + d_M(V, W, COV^{-1}).

As a consequence, non-negativity also holds: d_M(U, V, COV^{-1}) \ge 0.





In practice, because of floating-point rounding errors, the expression under the square root can come out slightly negative (for example, \mathrm {-1e^{-17}} \approx 0); sqrt (or raising to the power 0.5) then returns NaN. To avoid this, the value is clamped to zero before the root is taken, for example with max(0.0, value). This is done in all the code examples below.
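A minimal sketch of such a guarded computation (the same clamping with max is used in the code examples below):

import numpy as np

def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    # The quadratic form can come out as a tiny negative number (e.g. -1e-17)
    # due to rounding errors; clamp it to 0 before taking the square root.
    squared = np.dot(np.dot(delta, inverse_covariance_matrix), delta)
    return max(np.float64(0), squared) ** 0.5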





The distance from a point to a class is usually understood as the distance from the point to the class center (centroid).





In this case the covariance matrix is the covariance matrix of that class, and the second argument of the formula is the class centroid (the mean vector; see below).





The distance to a class does not have to be measured through its center, however.

Some methods (for example, k nearest neighbors, section 4) instead compute the distances from the point to the individual objects of the class.





In what follows we will need the centroid* of a class and its covariance matrix.





* The centroid of a class is computed as the component-wise arithmetic mean of its objects:





\mu_i = \frac {1} {|C|} \sum_{X \in C} {X_i}

where \mu_i is the mean of the i-th feature over the class C, |C| is the number of objects in C, and X_i is the i-th feature of the object X.





The whole mean vector (centroid) \mu of the class C:





\mu = ( \mu_1, \ldots, \mu_n ) = \left ( \frac {1} {|C|} \sum_{X \in C} {X_i} \middle| i=1 \ldots n \right )

The centroid is itself an n-dimensional vector; it need not coincide with any object of the class.

Next, the covariance matrix of the class is needed.





For a class described by n features, the covariance matrix has size n \times n:





COV= \begin{pmatrix} cov_{1,1} & cov_{1,2} & \cdots & cov_{1,n} \\ cov_{2,1} & cov_{2,2} & \cdots & cov_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ cov_{n,1} & cov_{n,2} & \cdots & cov_{n,n} \end{pmatrix}

Each element is the covariance of a pair of features; the unbiased (corrected) estimate, the so-called "sample covariance", is:





cov_{a,b} = \frac {1} {|C|-1} \sum_{X \in C} {(X_a - \mu_a) \cdot (X_b - \mu_b)} \tag {SC}

where \mu_a and \mu_b are the means of features a and b over the class, and |C| is the number of objects in the class C.





Formula \mathrm {(SC)} is used when the true expectations \operatorname E_a and \operatorname E_b are unknown and the sample means are used instead. If the expectations are known, the biased estimate, the so-called "population covariance", is used:





cov_{a,b} = \frac {1} {|C|} \sum_{X \in C} {(X_a - \operatorname E_a) \cdot (X_b - \operatorname E_b)} \tag {PC}
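Formulas (SC) and (PC) correspond to what numpy.cov computes with ddof=1 and ddof=0 respectively (using the sample mean in place of the expectation); a minimal check on the data of the second class from section 2.3:

import numpy as np

C = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
mu = C.mean(axis=0)

# (SC): unbiased "sample covariance", divisor |C| - 1
cov_sc = (C - mu).T @ (C - mu) / (len(C) - 1)
# (PC): biased "population covariance", divisor |C|
cov_pc = (C - mu).T @ (C - mu) / len(C)

print(np.allclose(cov_sc, np.cov(C, rowvar=False, ddof=1)))  # True
print(np.allclose(cov_pc, np.cov(C, rowvar=False, ddof=0)))  # True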

Some properties of covariance:





  1. If features a and b tend to increase (and decrease) together, then cov_{a,b}>0;





  2. If feature b tends to decrease as feature a increases (and vice versa), then cov_{a,b}<0;





  3. If features a and b are independent, then cov_{a,b}=0 (the converse is not true*);





  4. Covariance is symmetric: cov_{a,b} = cov_{b,a}.





  5. The absolute value of the covariance does not exceed the product of the standard deviations: \left|{cov_{a,b}}\right| \leq \sigma_a \sigma_b.





* An example of dependent features with zero covariance: let X be uniformly distributed on [-1, 1] and let Y=X^2.

Then Y is completely determined by X, yet their covariance is zero:





\operatorname {cov}(X, Y) = \operatorname {cov}(X, X^2) = M[X \cdot X^2] - M[X] \cdot M[X^2] = \\ = M[X^3] - M[X] \cdot M[X^2] =0-0 \cdot M[X^2] = 0

where M denotes the mathematical expectation.
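The same can be checked numerically (a sketch; a symmetric grid of X values on [-1, 1] is used instead of a random sample, so the sample means of X and X^3 are zero):

import numpy as np

x = np.linspace(-1.0, 1.0, 1001)  # symmetric around 0
y = x ** 2                        # Y is completely determined by X

# Population covariance: M[X * Y] - M[X] * M[Y]
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)  # ~0 (up to floating-point error), although Y depends on X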










Figure 2. The dependence Y = X^2: the features are dependent, but their covariance is zero

For the Mahalanobis distance to be computable, the matrix COV must be invertible (non-singular); if its determinant is zero, the inverse COV^{-1} does not exist.





The covariance matrix is singular, in particular, in the following cases:

1. Some feature i has the same value for all objects of the class (its variance is zero).

Example: \{ (1, 1), (2, 1), (3, 1) \}.

2. The features are perfectly correlated (\forall a \forall b \space cov_{a,b}=\sigma_a \sigma_b, "perfect covariance"). Examples:

\{(1, 1), (2, 2), (3, 3)\}: perfect positive correlation;

\{(1, 3), (2, 2), (3, 1)\}: perfect negative correlation.

3. The number of objects in the class is less than the number of features plus one:

|C|<n+1

In this case the covariance matrix is always singular. (All three cases are checked numerically in the sketch below.)
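A minimal numpy check of these cases through the determinant of the covariance matrix (the sample sets are the ones listed above, plus a two-object class for case 3):

import numpy as np

degenerate_classes = [
    np.array([[1., 1.], [2., 1.], [3., 1.]]),  # case 1: a constant feature
    np.array([[1., 1.], [2., 2.], [3., 3.]]),  # case 2: perfect positive correlation
    np.array([[1., 3.], [2., 2.], [3., 1.]]),  # case 2: perfect negative correlation
    np.array([[1., 2.], [3., 4.]]),            # case 3: |C| = 2 < n + 1 = 3
]

for class_ in degenerate_classes:
    cov = np.cov(class_, rowvar=False, ddof=1)
    print(np.linalg.det(cov))  # 0 (up to rounding) in every case: COV is singular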





What can be done when the covariance matrix is singular?

There is no universal answer.

Which approach to take depends on the data and the problem.

The options include:





1. Modify the data:
  1. Remove (or transform) the features i that are constant within the class.





  2. Remove one of each pair of perfectly correlated features i, or add more objects to the class.





2. Use the "Euclidean-Mahalanobis" distance, a combination of the Euclidean and Mahalanobis distances in which the identity matrix is added to the covariance matrix (making it invertible):





d_{E-M}(U, V, (COV+E)^{-1}) = \sqrt {(U - V) \cdot {(COV+E)}^{-1} \cdot (U - V)^T}

where E is the identity matrix of the same size as COV.





The matrix COV + E is always invertible, so this distance is always defined.





3. Use the pseudo-inverse matrix instead of the inverse.

Here {\square}^{+} denotes the (Moore-Penrose) pseudo-inverse.

Functions that compute it:

ginv from the MASS package (R);

pinv from numpy (Python);

pinv in MATLAB;

pinv in Octave.





Informally, the pseudo-inverse A^{+} gives a "best fit" (least-squares) solution of a system of linear equations: Ax=b \implies x=A^{+}b, where A is a matrix and x, b are vectors; among all candidates it yields the x that minimizes the norm of the residual between Ax and b.

The pseudo-inverse always exists and is unique.

If the ordinary inverse A^{-1} exists (that is, A is square and non-singular), the pseudo-inverse coincides with it: \mathrm {det}(A_{n \times n}) \ne 0 \iff A^{+}=A^{-1}.
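A small illustration with numpy (a sketch; the singular matrix is the covariance matrix of the first class from the example in section 2.3):

import numpy as np

singular = np.array([[2.5, 2.5], [2.5, 2.5]])  # det = 0, no ordinary inverse
regular = np.array([[1.7, 0.3], [0.3, 1.7]])   # det = 2.8 != 0

# The pseudo-inverse exists even for the singular matrix
print(np.linalg.pinv(singular))  # [[0.1 0.1] [0.1 0.1]]
# For an invertible matrix the pseudo-inverse coincides with the ordinary inverse
print(np.allclose(np.linalg.pinv(regular), np.linalg.inv(regular)))  # True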





The distance with the pseudo-inverse covariance matrix:





d_M^+ (U, V, COV^{+}) = \sqrt {(U - V) \cdot COV^{+} \cdot (U - V)^T}

Note a caveat: with a singular covariance matrix the pseudo-inverse effectively ignores the directions of zero variance, so two distinct points can end up at zero distance from each other (this happens in the example of section 2.3).





Whether such behavior is acceptable depends on the problem (in some tasks it may even be desirable).





4. Apply shrinkage to the covariance matrix.

Shrinkage is a regularization of the covariance matrix that makes it well-conditioned (and invertible).

The matrix COV is replaced by COV_{(*)} = \left ((1 - \lambda) COV + \lambda T \right), where T is a target matrix and \lambda \in (0,1] is the shrinkage coefficient; the larger \lambda, the closer COV_{(*)} is to the target T.

The distance is then computed with the shrunk matrix:





d_{M{(*)}}(U, V, COV^{-1}_{(*)}) = \sqrt {(U - V) \cdot COV^{-1}_{(*)} \cdot (U - V)^T}

Olivier Ledoit and Michael Wolf proposed the estimator ((1 - \lambda) COV + \lambda \mu E), where \mu=\mathrm{trace}(COV)/n is the average of the diagonal elements of COV (the average variance), E is the identity matrix, and \lambda is computed from the data by a formula they derived.

This estimator is implemented, in particular, in the Python library scikit-learn (sklearn.covariance.LedoitWolf, sklearn.covariance.ledoit_wolf, sklearn.covariance.ledoit_wolf_shrinkage).





With \lambda \in (0,1] and a positive-definite target T, the shrunk matrix COV_{(*)} is positive definite and therefore invertible (these conditions, together with a check on the smallest eigenvalue of COV_{(*)}, are verified in the code below).

Note that \lambda must not be zero: with \lambda = 0 there is no shrinkage, and a singular COV stays singular.





For example, take the class C=\{ (1, 1), (2, 2) \} and \lambda=0 (no shrinkage). Its covariance matrix is singular, but a numerical "inverse" may still come out of the computation because of rounding, and the resulting values are unstable (computed in Python):

distance from (1,1) to (1.5,1.5): \approx 0.7071;

distance from (2,2) to (1.5,1.5): \approx 0.7071;

distance from (1,1) to (1.51,1.5): \approx 671088.64 \ldots {63} \ldots;

distance from (2,2) to (1.51,1.5): \approx 671088.64 \ldots 04 \ldots.

Another common choice of the target matrix is the diagonal of COV:

T=\mathrm {diag}(COV) \implies COV_{(*)}= ((1 - \lambda) COV + \lambda \mathrm {diag}(COV))

where \mathrm {diag}(COV) is the diagonal matrix made of the diagonal elements of COV.





scikit-learn also provides Shrunk Covariance (sklearn.covariance.ShrunkCovariance, sklearn.covariance.shrunk_covariance), where \lambda is specified by the user (the default is \lambda_{SC}=0.1).

It uses the same form of target as Ledoit-Wolf: ((1 - \lambda) COV + \lambda \mu E).
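A minimal sketch of shrinkage by this formula (the value \lambda = 0.1 mirrors the scikit-learn default; the data is the first class from section 2.3, whose covariance matrix is singular):

import numpy as np

class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
cov = np.cov(class_, rowvar=False, ddof=0)  # population covariance, as in scikit-learn
lam = 0.1

# Target T = mu * E with mu = trace(COV) / n, as in Ledoit-Wolf / ShrunkCovariance
mu = np.trace(cov) / cov.shape[0]
cov_shrunk = (1 - lam) * cov + lam * mu * np.identity(cov.shape[0])

print(np.linalg.det(cov))         # 0.0: the original matrix is singular
print(np.linalg.det(cov_shrunk))  # 0.76: the shrunk matrix is invertible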





Other choices of the target matrix are possible as well.





Note that LedoitWolf and ShrunkCovariance estimate the covariance (by default) with empirical_covariance, that is, with the biased estimate (see "population covariance", formula \mathrm {(PC)}).





5. Use the standardized Euclidean distance.

If the covariances are ignored and only the variances of the features are used, we get the "standardized (normalized) Euclidean distance":





d_{std}(U, V, \sigma) = \sqrt {\sum_{i=1}^n {\left (\frac {U_i - V_i} {\sigma_i} \right)^2}}

where \sigma_i is the standard deviation of the i-th feature over the class containing U and/or V (the corrected sample standard deviation):





\sigma_i = \sqrt {\frac {1} {|C|-1} \sum_{X \in C} {(X_i - \mu_i)^2}} \tag {CSSD}

where \mu_i is the mean of the i-th feature over the class.

It is computed as before:





\mu_i = \frac {1} {|C|} \sum_{X \in C} {X_i}

Formula \mathrm {(CSSD)} is used when the true expectation \operatorname E_i is unknown. If it is known, the uncorrected ("population") standard deviation is used:





\sigma_i = \sqrt {\frac {1} {|C|} \sum_{X \in C} {(X_i - \operatorname E_i)^2}} \tag {USSD}
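Formulas (CSSD) and (USSD) correspond to numpy's std with ddof=1 and ddof=0 respectively (again with the sample mean in place of the expectation); a quick check:

import numpy as np

C = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
mu = C.mean(axis=0)

sigma_cssd = np.sqrt(((C - mu) ** 2).sum(axis=0) / (len(C) - 1))  # (CSSD)
sigma_ussd = np.sqrt(((C - mu) ** 2).sum(axis=0) / len(C))        # (USSD)

print(np.allclose(sigma_cssd, C.std(axis=0, ddof=1)))  # True
print(np.allclose(sigma_ussd, C.std(axis=0, ddof=0)))  # True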

An example of computing this distance is given in section 2.3.





6. Use a scaled variant of the standardized Euclidean distance, in which the previous distance is additionally multiplied by the geometric mean of the feature variances:





d_{diag}(U, V, \sigma) = d_{std}(U, V, \sigma) \cdot \sqrt[n] {\prod^n_{i=1} \sigma^2_i}

Written out in full:





d_{diag}(U, V, \sigma) = \sqrt {\sum_{i=1}^n {\left (\frac {U_i - V_i} {\sigma_i} \right)^2}} \cdot \sqrt[n] {\prod^n_{i=1} \sigma^2_i}

An example of computing this distance is also given in section 2.3.





Each of the listed options has its own trade-offs.





Which option to prefer depends on the data and the problem, for example on how many objects per class are available (with only about 10 objects the covariance estimates are unreliable), and is best decided experimentally.





2.2. Algorithm for computing the distance between two points and between a point and a class

1. Compute the centroid of the class.





2. Compute the standard deviations of the features of the class.





3. Compute the covariance matrix of the class and its determinant.





4. Compute the distance between the points (or between the point and the centroid) by the formula. If the covariance matrix is singular, use one of the approaches from section 2.1.





2.3. Example of computing the distance between two points and between a point and a class

Let us compute the distance from the point (4, 2) to each of two classes (Fig. 3):





C_{(1)} = \{ ( 1 , 1 ) , ( 2 , 2 ) , ( 3 , 3 ) , ( 4 , 4 ) , ( 5 , 5 ) \} \\ C_{(2)} = \{ ( 3 , 1 ) , ( 4 , 0 ) , ( 6 , 0 ) , ( 6 , 2 ) , ( 5 , 3 ) \}
Figure 3. The point (4, 2) and the classes C_{(1)} and C_{(2)}

1. Compute the centroids of the classes.





\mu_{(1)} = \left (\frac {1 + 2 + 3 + 4 + 5} {5}, \frac {1 + 2 + 3 + 4 + 5} {5} \right) = (3, 3) \\ \mu_{(2)} = \left (\frac {3 + 4 + 6 + 6 + 5} {5}, \frac {1 + 0 + 0 + 2 + 3} {5} \right) = (4.8, 1.2)

2. Compute the standard deviations of the features.





For the first class:





\sigma_{(1)1} = \sqrt {\frac {(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2} {5 - 1}} = \sqrt {2.5} \\ \sigma_{(1)2} = \sqrt {\frac {(1-3)^2+(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2} {5 - 1}} = \sqrt {2.5}

For the second class:





\sigma_{(2)1} = \sqrt {\frac {(3-4.8)^2+(4-4.8)^2+(6-4.8)^2+(6-4.8)^2+(5-4.8)^2} {5 - 1}} = \sqrt {1.7} \\ \sigma_{(2)2} = \sqrt {\frac {(1-1.2)^2+(0-1.2)^2+(0-1.2)^2+(2-1.2)^2+(3-1.2)^2} {5 - 1}} = \sqrt {1.7}

3. Compute the covariance matrices of the classes.





For the first class:





cov_{(1)1,1} = \sigma^2_{(1)1} = 2.5 \quad cov_{(1)1,2} = cov_{(1)2,1} \quad cov_{(1)2,2} = \sigma^2_{(1)2} = 2.5 \\ cov_{(1)1,2} = \frac {1} {5-1} \sum_{X \in C_{(1)}} {(X_1 - \mu_1) \cdot (X_2 - \mu_2)} = \\ = \frac {1} {4} \left ( (1 - 3) (1 - 3) + (2 - 3) (2 - 3) + (3 - 3) (3 - 3) + \\ + (4 - 3) (4 - 3) + (5 - 3) (5 - 3) \right) = \frac {10} {4} = 2.5

The covariance matrix of the first class:





COV_{(1)} = \begin{pmatrix} cov_{(1)1,1} & cov_{(1)1,2} \\ cov_{(1)2,1} & cov_{(1)2,2} \end{pmatrix} = \begin{pmatrix} 2.5 & 2.5 \\ 2.5 & 2.5 \end{pmatrix}

Its determinant: 2.5 \cdot 2.5 - 2.5 \cdot 2.5 = 0. Hence the matrix COV_{(1)} is singular (not invertible).





For the second class:





cov_{(2)1,1} = \sigma^2_{(2)1} = 1.7 \quad cov_{(2)1,2} = cov_{(2)2,1} \quad cov_{(2)2,2} = \sigma^2_{(2)2} = 1.7 \\ cov_{(2)1,2} = \frac {1} {5-1} \sum_{X \in C_{(2)}} {(X_1 - \mu_1) \cdot (X_2 - \mu_2)} = \\ = \frac {1} {4} \left ( (3 - 4.8) (1 - 1.2) + (4 - 4.8) (0 - 1.2) + (6 - 4.8) (0 - 1.2) + \\ + (6 - 4.8) (2 - 1.2) + (5 - 4.8) (3 - 1.2) \right) = \frac {1.2} {4} = 0.3

The covariance matrix of the second class:





COV_{(2)} = \begin{pmatrix} cov_{(2)1,1} & cov_{(2)1,2} \\ cov_{(2)2,1} & cov_{(2)2,2} \end{pmatrix} = \begin{pmatrix} 1.7 & 0.3 \\ 0.3 & 1.7 \end{pmatrix}

Its determinant: 1.7 \cdot 1.7 - 0.3 \cdot 0.3 = 2.8 \neq 0. Hence the matrix COV_{(2)} is invertible.





Python 3.6 numpy 1.19.5
import numpy as np

classes = [
    np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]),
    np.array([[3, 1], [4, 0], [6, 0], [6, 2], [5, 3]])
]

centroids = [class_.mean(axis=0) for class_ in classes]
standard_deviations = [class_.std(axis=0, ddof=1) for class_ in classes]
covariance_matrices = np.array([np.cov(class_, rowvar=False, ddof=1) for class_ in classes])
det_covariance_matrices = [np.linalg.det(cov) for cov in covariance_matrices]

print("Centroids:", *centroids)
print("Standard deviations:", *standard_deviations)
print("Covariance matrices:", *covariance_matrices.tolist())
print("Determinants of covariance matrices:", det_covariance_matrices)

      
      



Output:





Centroids: [3. 3.] [4.8 1.2]
Standard deviations: [1.58113883 1.58113883] [1.30384048 1.30384048]
Covariance matrices: [[2.5, 2.5], [2.5, 2.5]] [[1.7, 0.3], [0.3, 1.7]]
Determinants of covariance matrices: [0.0, 2.8]
      
      



4. Compute the distances from the point to each class. The covariance matrix of the first class is singular, so for it the approaches from section 2.1 are used; for the second class the usual Mahalanobis distance can be computed.





For comparison, in each case the distances are computed from the point both to the class centroid and to every object of the class.

First, the distances from the point (4, 2) to the first class, the one with the singular covariance matrix.





Five variants are computed for it: (1) the Euclidean-Mahalanobis distance, (2) shrinkage (LedoitWolf), (3) the pseudo-inverse matrix, (4) the standardized Euclidean distance, (5) the scaled standardized Euclidean distance:





1. The Euclidean-Mahalanobis distance.





d_{E-M}\left((4,2), (1,1), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.8257 \\ d_{E-M}\left((4,2), (2,2), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.5275 \\ d_{E-M}\left((4,2), (3,3), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.4142 \\ d_{E-M}\left((4,2), (4,4), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.5275 \\ d_{E-M}\left((4,2), (5,5), \begin{pmatrix} 0.5833 & -0.4167 \\ -0.4167 & 0.5833 \end{pmatrix} \right) \approx 1.8257
Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
inverse_covariance_matrix = np.linalg.inv(covariance_matrix + np.identity(covariance_matrix.shape[0]))
print("  :\n", inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_E-M (", test_point, ", ", point_to, ", (COV+E)^(-1)) = ",
          mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

      
      



Output:





Inverse covariance matrix (COV+E)^(-1):
[[ 0.58333333 -0.41666667]
 [-0.41666667  0.58333333]]
d_E-M ([4. 2.], [3. 3.], (COV+E)^(-1)) = 1.414213562373095
d_E-M ([4. 2.], [1. 1.], (COV+E)^(-1)) = 1.8257418583505538
d_E-M ([4. 2.], [2. 2.], (COV+E)^(-1)) = 1.5275252316519465
d_E-M ([4. 2.], [3. 3.], (COV+E)^(-1)) = 1.414213562373095
d_E-M ([4. 2.], [4. 4.], (COV+E)^(-1)) = 1.5275252316519465
d_E-M ([4. 2.], [5. 5.], (COV+E)^(-1)) = 1.8257418583505536
      
      



The first line is the distance from (4, 2) to the class centroid (3, 3); the remaining lines are the distances to the objects of the class.





2. Shrinkage (LedoitWolf).





d_{M{(*)}}\left((4,2), (1,1), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.4284 \\ d_{M{(*)}}\left((4,2), (2,2), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.0378 \\ d_{M{(*)}}\left((4,2), (3,3), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 1.8898 \\ d_{M{(*)}}\left((4,2), (4,4), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.0378 \\ d_{M{(*)}}\left((4,2), (5,5), \begin{pmatrix} 1.0382 & -0.7475 \\ -0.7475 & 1.0382 \end{pmatrix} \right) \approx 2.4284
Python 3.6 numpy 1.19.5 scikit-learn 0.24.1
import numpy as np
from sklearn.covariance import LedoitWolf


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
lw = LedoitWolf().fit(class_)
lw_covariance_matrix = lw.covariance_
lw_lambda = lw.shrinkage_
covariance_matrix = np.cov(class_, rowvar=False, ddof=0)
mu = np.sum(np.trace(covariance_matrix)) / class_.shape[0]
T = mu * np.identity(class_.shape[1])
print("T:", *T)
print("COV(*):", *lw_covariance_matrix)
print("Lambda:", lw_lambda)

# Condition 1: the target matrix T must be positive definite
# (here: all eigenvalues of T are positive)
# ddof=0, because LedoitWolf uses empirical_covariance (see the note above)
first_condition = (np.linalg.eig(T)[0] > approx(0., sign=+1)).all()
print("All(", np.linalg.eig(T)[0], ") > 0 ? -> ", first_condition, sep='')

# Condition 2: the shrinkage coefficient lambda must lie in (0, 1]
second_condition = approx(0., sign=+1) < lw_lambda <= 1
print("Lambda =", lw_lambda, "in (0, 1] ? ->", second_condition)

# Condition 3: the smallest eigenvalue of COV(*) must be
# at least lambda times the smallest eigenvalue of T
cov_eig = min(np.linalg.eig(lw_covariance_matrix)[0])
lambda_t_eig = lw_lambda * min(np.linalg.eig(T)[0])
third_condition = cov_eig >= lambda_t_eig
print(cov_eig, ">=", lambda_t_eig, "? ->", third_condition)
conditions = [first_condition, second_condition, third_condition]

if all(conditions):
    print("   ")
    #  
    inverse_lw_covariance_matrix = np.linalg.inv(lw_covariance_matrix)
    print("  :\n", inverse_lw_covariance_matrix, sep='')
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_M(*) (", test_point, ", ", point_to, ", COV(*)) = ",
              mahalanobis(test_point, point_to, inverse_lw_covariance_matrix), sep='')
else:
    print("  (1-3): ", [i for i, x in enumerate(conditions, 1) if not x])

      
      



Output:





T: [0.8 0. ] [0.  0.8]
COV(*): [2.   1.44] [1.44 2.  ]
Lambda: 0.27999999999999997
All([0.8 0.8]) > 0 ? -> True
Lambda = 0.27999999999999997 in (0, 1] ? -> True
0.56 >= 0.22399999999999998 ? -> True
All conditions are satisfied
Inverse shrunk covariance matrix:
[[ 1.03820598 -0.74750831]
 [-0.74750831  1.03820598]]
d_M(*) ([4. 2.], [3. 3.], COV(*)) = 1.889822365046136
d_M(*) ([4. 2.], [1. 1.], COV(*)) = 2.4283759936997833
d_M(*) ([4. 2.], [2. 2.], COV(*)) = 2.037847864848056
d_M(*) ([4. 2.], [3. 3.], COV(*)) = 1.889822365046136
d_M(*) ([4. 2.], [4. 4.], COV(*)) = 2.037847864848056
d_M(*) ([4. 2.], [5. 5.], COV(*)) = 2.4283759936997833
      
      



Again, the first line is the distance to the centroid (3, 3).





3. The pseudo-inverse matrix.





The pseudo-inverse of the covariance matrix of the first class is computed (numpy does this through a singular value decomposition). The distances:





d_{M^+}\left((4,2), (1,1), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 1.2649 \\ d_{M^+}\left((4,2), (2,2), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 0.6324 \\ d_{M^+}\left((4,2), (3,3), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) = 0.0000 \\ d_{M^+}\left((4,2), (4,4), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 0.6324 \\ d_{M^+}\left((4,2), (5,5), \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix} \right) \approx 1.2649

Note that the distance from (4, 2) to (3, 3) equals zero even though these are different points: a consequence of the singular covariance matrix, whose degenerate direction the pseudo-inverse ignores.





Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
# The pseudo-inverse is computed via singular value decomposition (SVD)
# of the covariance matrix
pseudo_inverse_covariance_matrix = np.linalg.pinv(covariance_matrix)
print("  :\n", pseudo_inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_M+ (", test_point, ", ", point_to, ", COV+) = ",
          mahalanobis(test_point, point_to, pseudo_inverse_covariance_matrix), sep='')

      
      



Output:





Pseudo-inverse covariance matrix:
[[0.1 0.1]
 [0.1 0.1]]
d_M+ ([4. 2.], [3. 3.], COV+) = 0.0
d_M+ ([4. 2.], [1. 1.], COV+) = 1.2649110640673513
d_M+ ([4. 2.], [2. 2.], COV+) = 0.6324555320336757
d_M+ ([4. 2.], [3. 3.], COV+) = 0.0
d_M+ ([4. 2.], [4. 4.], COV+) = 0.6324555320336757
d_M+ ([4. 2.], [5. 5.], COV+) = 1.2649110640673513
      
      



The distance to the centroid (3, 3) here is exactly zero (see the note above).





4. The standardized Euclidean distance.





d_{std}((4,2), (1,1), (\sqrt {2.5}, \sqrt {2.5})) = 2.0000 \\ d_{std}((4,2), (2,2), (\sqrt {2.5}, \sqrt {2.5})) \approx 1.2649 \\ d_{std}((4,2), (3,3), (\sqrt {2.5}, \sqrt {2.5})) \approx 0.8944 \\ d_{std}((4,2), (4,4), (\sqrt {2.5}, \sqrt {2.5})) \approx 1.2649 \\ d_{std}((4,2), (5,5), (\sqrt {2.5}, \sqrt {2.5})) = 2.0000
Python 3.6 numpy 1.19.5
import numpy as np


def euclid_std(point_from, point_to, standard_deviations):
    return sum(((point_from - point_to) / standard_deviations) ** 2) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
standard_deviations = class_.std(axis=0, ddof=1)

# Check that no standard deviation is (close to) zero
std_le_0 = standard_deviations <= approx(0., sign=+1, epsilon=1e-6)
print(" :\n", standard_deviations, sep='')

if std_le_0.any():
    print("     0: ", np.where(std_le_0)[0])
else:
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_std (", test_point, ", ", point_to, ", sigma) = ",
              euclid_std(test_point, point_to, standard_deviations), sep='')

      
      



Output:





Standard deviations:
[1.58113883 1.58113883]
d_std ([4. 2.], [3. 3.], sigma) = 0.8944271909999159
d_std ([4. 2.], [1. 1.], sigma) = 1.9999999999999998
d_std ([4. 2.], [2. 2.], sigma) = 1.2649110640673518
d_std ([4. 2.], [3. 3.], sigma) = 0.8944271909999159
d_std ([4. 2.], [4. 4.], sigma) = 1.2649110640673518
d_std ([4. 2.], [5. 5.], sigma) = 1.9999999999999998
      
      



The first line is, as before, the distance to the centroid (3, 3).





5. The scaled standardized Euclidean distance.





d_{diag}((4,2), (1,1), (\sqrt {2.5}, \sqrt {2.5})) = 5.0000 \\ d_{diag}((4,2), (2,2), (\sqrt {2.5}, \sqrt {2.5})) \approx 3.1623 \\ d_{diag}((4,2), (3,3), (\sqrt {2.5}, \sqrt {2.5})) \approx 2.2360 \\ d_{diag}((4,2), (4,4), (\sqrt {2.5}, \sqrt {2.5})) \approx 3.1623 \\ d_{diag}((4,2), (5,5), (\sqrt {2.5}, \sqrt {2.5})) = 5.0000
Python 3.6 numpy 1.19.5
import numpy as np


def euclid_std(point_from, point_to, standard_deviations):
    return sum(((point_from - point_to) / standard_deviations) ** 2) ** 0.5


def euclid_diag(point_from, point_to, standard_deviations):
    return euclid_std(point_from, point_to, standard_deviations) \
           * (np.prod(standard_deviations ** 2)) ** (1. / point_from.shape[0])


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
standard_deviations = class_.std(axis=0, ddof=1)

# Check that no standard deviation is (close to) zero
std_le_0 = standard_deviations <= approx(0., sign=+1, epsilon=1e-6)
print(" :\n", standard_deviations, sep='')

if std_le_0.any():
    print("     0: ", np.where(std_le_0)[0])
else:
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_diag (", test_point, ", ", point_to, ", sigma) = ",
              euclid_diag(test_point, point_to, standard_deviations), sep='')

      
      



Output:





Standard deviations:
[1.58113883 1.58113883]
d_diag ([4. 2.], [3. 3.], sigma) = 2.2360679774997902
d_diag ([4. 2.], [1. 1.], sigma) = 5.0
d_diag ([4. 2.], [2. 2.], sigma) = 3.16227766016838
d_diag ([4. 2.], [3. 3.], sigma) = 2.2360679774997902
d_diag ([4. 2.], [4. 4.], sigma) = 3.16227766016838
d_diag ([4. 2.], [5. 5.], sigma) = 5.0
      
      



The first line is the distance to the centroid (3, 3).





Now the distances from the point (4, 2) to the second class, whose covariance matrix is invertible. First compute the inverse covariance matrix (through the adjugate matrix):





COV^{-1}_{(2)} = \frac {1} {\Delta_{(2)}} A^{T}_{(2)}

where \Delta_{(2)} is the determinant of COV_{(2)} and A^{T}_{(2)} is the transposed matrix of cofactors.





A^{T}_{(2)} = \begin{pmatrix} A_{(2)1,1} & A_{(2)2,1} \\ A_{(2)1,2} & A_{(2)2,2} \end{pmatrix} = \begin{pmatrix} 1.7 & -0.3 \\ -0.3 & 1.7 \end{pmatrix} \\ COV^{-1}_{(2)} = \frac {1} {2.8} \begin{pmatrix} 1.7 & -0.3 \\ -0.3 & 1.7 \end{pmatrix} = \begin{pmatrix} 0.6071 & -0.1071 \\ -0.1071 & 0.6071 \end{pmatrix}

An example calculation for the object (3, 1):

d_{M}((4,2), (3,1), COV^{-1}_{(2)}) = \sqrt {((4,2)-(3,1)) \cdot COV^{-1}_{(2)} \cdot ((4,2)-(3,1))^T} = \\ = \sqrt {(4-3, 2-1) \cdot \begin{pmatrix} 0.6071 & -0.1071 \\ -0.1071 & 0.6071 \end{pmatrix} \cdot (4-3, 2-1)^T} = \sqrt {(0.5, 0.5) \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix}} = 1

The Mahalanobis distances to the centroid and to the objects of the second class:

d_{M}((4,2), (4.8,1.2), COV^{-1}_{(2)}) \approx 0.9562 \\ d_{M}((4,2), (3,1), COV^{-1}_{(2)}) = 1.0000 \\ d_{M}((4,2), (4,0), COV^{-1}_{(2)}) \approx 1.5584 \\ d_{M}((4,2), (6,0), COV^{-1}_{(2)}) \approx 2.3905 \\ d_{M}((4,2), (6,2), COV^{-1}_{(2)}) \approx 1.5584 \\ d_{M}((4,2), (5,3), COV^{-1}_{(2)}) = 1.0000

The Euclidean-Mahalanobis distances, for comparison:

d_{E-M}((4,2), (4.8,1.2), (COV_{(2)}+E)^{-1}) \approx 0.7303 \\ d_{E-M}((4,2), (3,1), (COV_{(2)}+E)^{-1}) \approx 0.8165 \\ d_{E-M}((4,2), (4,0), (COV_{(2)}+E)^{-1}) \approx 1.2247 \\ d_{E-M}((4,2), (6,0), (COV_{(2)}+E)^{-1}) \approx 1.8257 \\ d_{E-M}((4,2), (6,2), (COV_{(2)}+E)^{-1}) \approx 1.2247 \\ d_{E-M}((4,2), (5,3), (COV_{(2)}+E)^{-1}) \approx 0.8165
Python 3.6 numpy 1.19.5
import numpy as np


def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5


def approx(number, *, sign, epsilon=1e-4):
    return number + np.sign(sign) * epsilon


test_point = np.array([4., 2.])
class_ = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
covariance_matrix = np.cov(class_, rowvar=False, ddof=1)
if abs(np.linalg.det(covariance_matrix)) <= approx(0., sign=+1):
    print("  0.  .")
else:
    inverse_covariance_matrix = np.linalg.inv(covariance_matrix)
    print("   (d_M):\n", inverse_covariance_matrix, sep='')
    for point_to in [class_.mean(axis=0), *class_]:
        print("d_M (", test_point, ", ", point_to, ", COV^(-1)) = ",
              mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

covariance_matrix = covariance_matrix + np.identity(class_.shape[1])
inverse_covariance_matrix = np.linalg.inv(covariance_matrix)
print("   (d_E-M):\n", inverse_covariance_matrix, sep='')

for point_to in [class_.mean(axis=0), *class_]:
    print("d_E-M (", test_point, ", ", point_to, ", (COV+E)^(-1)) = ",
          mahalanobis(test_point, point_to, inverse_covariance_matrix), sep='')

      
      



Output:





Inverse covariance matrix (d_M):
[[ 0.60714286 -0.10714286]
 [-0.10714286  0.60714286]]
d_M ([4. 2.], [4.8 1.2], COV^(-1)) = 0.9561828874675149
d_M ([4. 2.], [3. 1.], COV^(-1)) = 1.0
d_M ([4. 2.], [4. 0.], COV^(-1)) = 1.5583874449479593
d_M ([4. 2.], [6. 0.], COV^(-1)) = 2.3904572186687876
d_M ([4. 2.], [6. 2.], COV^(-1)) = 1.5583874449479593
d_M ([4. 2.], [5. 3.], COV^(-1)) = 1.0
Inverse matrix (COV+E)^(-1) (d_E-M):
[[ 0.375      -0.04166667]
 [-0.04166667  0.375     ]]
d_E-M ([4. 2.], [4.8 1.2], (COV+E)^(-1)) = 0.7302967433402214
d_E-M ([4. 2.], [3. 1.], (COV+E)^(-1)) = 0.8164965809277259
d_E-M ([4. 2.], [4. 0.], (COV+E)^(-1)) = 1.224744871391589
d_E-M ([4. 2.], [6. 0.], (COV+E)^(-1)) = 1.8257418583505536
d_E-M ([4. 2.], [6. 2.], (COV+E)^(-1)) = 1.224744871391589
d_E-M ([4. 2.], [5. 3.], (COV+E)^(-1)) = 0.8164965809277259
      
      



The first line is the distance to the centroid of the second class, (4.8, 1.2).









A summary. The distances from the point (4, 2) to the nearest object of the first class, (3, 3), by the five variants:

1. Euclidean-Mahalanobis: \approx 1.4142;

2. Shrinkage (LedoitWolf): \approx 1.8898;

3. Pseudo-inverse matrix: 0.0000;

4. Standardized Euclidean: \approx 0.8944;

5. Scaled standardized Euclidean: \approx 2.2360.





The distance to the nearest object of the second class (Mahalanobis): 1.0000.

The distance to the nearest object of the second class (Euclidean-Mahalanobis): \approx 0.8165.





Comparing the Euclidean-Mahalanobis values (the only variant computed for both classes), the point (4, 2) is closer to the second class (0.8165 versus 1.4142), which agrees with Figure 3; only the pseudo-inverse variant, which collapses the degenerate direction of the first class, suggests otherwise.









The distances from the point (4, 2) to the farthest object of the first class (the objects (1, 1) and (5, 5)):

1. Euclidean-Mahalanobis: \approx 1.8257;

2. Shrinkage (LedoitWolf): \approx 2.4284;

3. Pseudo-inverse matrix: \approx 1.2649;

4. Standardized Euclidean: 2.0000;

5. Scaled standardized Euclidean: 5.0000.





The distance to the farthest object of the second class, (6, 0) (Mahalanobis): \approx 2.3905.

The distance to the farthest object of the second class, (6, 0) (Euclidean-Mahalanobis): \approx 1.8257.





By the Euclidean-Mahalanobis distance, the farthest objects of the two classes turn out to be equally far from the point (4, 2): \approx 1.8257 in both cases.









The distances from the point (4, 2) to the centroid of the first class, (3, 3):

1. Euclidean-Mahalanobis: \approx 1.4142;

2. Shrinkage (LedoitWolf): \approx 1.8898;

3. Pseudo-inverse matrix: 0.0000;

4. Standardized Euclidean: \approx 0.8944;

5. Scaled standardized Euclidean: \approx 2.2360.





The distance to the centroid of the second class, (4.8, 1.2) (Mahalanobis): \approx 0.9562.

The distance to the centroid of the second class (Euclidean-Mahalanobis): \approx 0.7303.





By the Euclidean-Mahalanobis distance, the point (4, 2) is closer to the centroid of the second class (0.7303 versus 1.4142), which agrees with Figure 3.









So different variants of the distance can lead to different conclusions about which class a point is closer to; the choice has to be made per problem.





3. Mahalanobis distance between two classes

This section covers the distance between two classes (between their centers).





Strictly speaking, this distance between classes is not a metric; of the metric axioms it satisfies only the following:

d_M(C_i, C_j, COV_0^{-1}) \ge 0, \quad d_M(C_i, C_j, COV_0^{-1})=0 \impliedby C_i=C_j, \quad d_M(C_i, C_j, COV_0^{-1})=d_M(C_j, C_i, COV_0^{-1}).

The converse of the second property (identity of indiscernibles) does not hold: d_M(C_i, C_j, COV_0^{-1})=0 \implies C_i=C_j is false, because two different classes can have the same centroid.

So it should be treated simply as a measure of dissimilarity between classes.





3.1. Theoretical information

The Mahalanobis distance between classes C_1 and C_2 with centroids \overline {C_1} and \overline {C_2} and covariance matrices COV_{(1)} and COV_{(2)} is defined as:





d_M \left(\overline {C_1}, \overline {C_2}, COV^{-1}_0 \right) = \sqrt {\left (\overline {C_1} - \overline {C_2} \right) \cdot COV^{-1}_0 \cdot \left (\overline {C_1} - \overline {C_2} \right)^T} \\ COV_0 = \frac {1} {|C_1| + |C_2| - 2} \left (COV_{(1)} + COV_{(2)} \right)

where COV_0 is the pooled covariance matrix, COV^{-1}_0 is its inverse, COV_{(1)} and COV_{(2)} are the covariance matrices of the classes, and |C_1| and |C_2| are the numbers of objects in them.





A simplified variant of the pooled matrix is also used:





COV_0 = COV_{(1)} + COV_{(2)}

that is, simply the sum of the covariance matrices.

The two variants of COV_0 differ only by the scalar factor 1/(|C_1| + |C_2| - 2), so the corresponding distances differ by a constant factor as well.





If one of the "classes" consists of a single point X_{(1)}, its covariance matrix is the zero matrix, and the formula reduces to the distance between a point and a class from section 2 (the "point to class" distance):





d_M \left (X_{(1)}, \overline C_2, COV^{-1}_0 \right) = \sqrt {\left(X_{(1)} - \overline {C_2} \right) \cdot COV^{-1}_0 \cdot \left (X_{(1)} - \overline {C_2} \right)^T} \\ COV_0 = 0 + COV_{(2)} = COV_{(2)}

The Euclidean-Mahalanobis variant for two classes is defined analogously:





d_{E-M} \left(\overline {C_1}, \overline {C_2}, \left (COV_0+E \right)^{-1} \right) = \sqrt {\left (\overline {C_1} - \overline {C_2} \right) \cdot (COV_0+E)^{-1} \cdot \left (\overline {C_1} - \overline {C_2} \right)^T}

where E is the identity matrix of the same size as COV_0.





3.2. Algorithm for computing the distance between two classes

1. Compute the centroids of the classes.





2. Compute the standard deviations of the features of the classes.





3. Compute the covariance matrices of the classes and their determinants.





4. Compute the pooled covariance matrix COV_0 and the distance between the centroids by the formula. If COV_0 is singular, use one of the approaches from section 2.1, for example the Euclidean-Mahalanobis distance.





3.3. Example of computing the distance between two classes

The same classes as in the previous example are used.





C_{(1)} = \{ ( 1 , 1 ) , ( 2 , 2 ) , ( 3 , 3 ) , ( 4 , 4 ) , ( 5 , 5 ) \} \\ C_{(2)} = \{ ( 3 , 1 ) , ( 4 , 0 ) , ( 6 , 0 ) , ( 6 , 2 ) , ( 5 , 3 ) \}

Let us compute the distance between them.

Steps 1 through 3 repeat section 2.2; the intermediate results were obtained in section 2.3.

Step 4 differs.





4. Compute the pooled covariance matrix and the distance between the centroids. The pooled matrix here turns out to be invertible, so the Mahalanobis distance can be computed directly; the Euclidean-Mahalanobis distance is also shown for comparison.





The pooled covariance matrix:





COV_0 = \frac {1} {5 + 5 - 2} \left (\begin{pmatrix} 2.5 & 2.5 \\ 2.5 & 2.5 \end{pmatrix} + \begin{pmatrix} 1.7 & 0.3 \\ 0.3 & 1.7 \end{pmatrix} \right) = \\ = \frac {1} {8} \begin{pmatrix} 4.2 & 2.8 \\ 2.8 & 4.2 \end{pmatrix} = \begin{pmatrix} 0.525 & 0.35 \\ 0.35 & 0.525 \end{pmatrix}

Its inverse:





COV^{-1}_0 \approx \begin{pmatrix} 3.42857143 & -2.28571429 \\ -2.28571429 & 3.42857143 \end{pmatrix}





The Mahalanobis distance between the centroids: \approx 6.0851.





d_M \left((3, 3), (4.8, 1.2), COV^{-1}_0 \right) = \\ = \sqrt {\left ((3, 3) - (4.8, 1.2) \right) \cdot COV^{-1}_0 \cdot \left ((3, 3) - (4.8, 1.2) \right)^T} = \\ = \sqrt {(-1.8, 1.8) \cdot \begin{pmatrix} 3.42857143 & -2.28571429 \\ -2.28571429 & 3.42857143 \end{pmatrix} \cdot (-1.8, 1.8)^T} \approx 6.0851

The Euclidean-Mahalanobis distance between the centroids: \approx 2.3484.





With the simplified pooled matrix COV_0 = COV_{(1)} + COV_{(2)}:

the Mahalanobis distance between the centroids is \approx 2.1514;

the Euclidean-Mahalanobis distance is \approx 1.6432.









For reference, the minimum Mahalanobis distance between objects of the two classes (with the pooled matrix COV_0) is \approx 3.3806, attained for the pairs (2,2)-(3,1) and (4,4)-(5,3).





The minimum Euclidean-Mahalanobis distance between objects is \approx 1.3047, for the same pairs (2,2)-(3,1) and (4,4)-(5,3).





With COV_0 = COV_{(1)} + COV_{(2)}:

the minimum Mahalanobis distance between objects is \approx 1.1952, for the pairs (2,2)-(3,1) and (4,4)-(5,3);

the minimum Euclidean-Mahalanobis distance is \approx 0.9129, for the same pairs.









The maximum Mahalanobis distance between objects of the two classes (with COV_0) is \approx 10.5830, attained for the pairs (1,1)-(6,0) and (5,5)-(6,0).





The maximum Euclidean-Mahalanobis distance between objects is \approx 4.4256, for the same pairs (1,1)-(6,0) and (5,5)-(6,0).





With COV_0 = COV_{(1)} + COV_{(2)}:

the maximum Mahalanobis distance between objects is \approx 3.7417, for the pairs (1,1)-(6,0) and (5,5)-(6,0);

the maximum Euclidean-Mahalanobis distance is \approx 2.9155, for the same pairs.





The calculation can be reproduced in Python 3.6 with numpy 1.19.5.
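A minimal sketch (the helper function mahalanobis is the same one used in the earlier examples; the printed values match those above up to rounding):

import numpy as np

def mahalanobis(point_from, point_to, inverse_covariance_matrix):
    delta = point_from - point_to
    return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta)) ** 0.5

class_1 = np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])
class_2 = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])

centroid_1, centroid_2 = class_1.mean(axis=0), class_2.mean(axis=0)
cov_1 = np.cov(class_1, rowvar=False, ddof=1)
cov_2 = np.cov(class_2, rowvar=False, ddof=1)
identity = np.identity(class_1.shape[1])

# Pooled covariance matrix and the distances between the centroids
cov_0 = (cov_1 + cov_2) / (len(class_1) + len(class_2) - 2)
print(mahalanobis(centroid_1, centroid_2, np.linalg.inv(cov_0)))             # ~6.0851
print(mahalanobis(centroid_1, centroid_2, np.linalg.inv(cov_0 + identity)))  # ~2.3484

# Simplified pooled matrix COV_0 = COV_(1) + COV_(2)
cov_sum = cov_1 + cov_2
print(mahalanobis(centroid_1, centroid_2, np.linalg.inv(cov_sum)))             # ~2.1514
print(mahalanobis(centroid_1, centroid_2, np.linalg.inv(cov_sum + identity)))  # ~1.6432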





4. Mahalanobis distance and the k-nearest-neighbors method

The Mahalanobis distance can be used in the k-nearest-neighbors (k-NN) method, in which an object is assigned to the class that is most common among its k nearest training objects.





Below is an implementation of k-NN with the Mahalanobis distance.

Implementation notes:

the Euclidean-Mahalanobis variant is used: the matrices (COV+E)^{-1} are precomputed for each class, so singular covariance matrices are not a problem;

the squared distances are compared (taking the square root is unnecessary when only the ordering matters), and ties between classes are broken in favor of the class with the smaller index.





Python 3.6 numpy 1.19.5
# Classification by the k-nearest-neighbors method with the Mahalanobis distance

import heapq
from collections import Counter
from operator import itemgetter

import numpy as np


class MkNN:
    def __init__(self, k, classes, inv_cov_matrices):
        self.n = len(classes)
        self.k = k
        self.classes = classes
        self.inv_cov_matrices = inv_cov_matrices

    @staticmethod
    def mahalanobis_sqr(point_from, point_to, inverse_covariance_matrix):
        delta = point_from - point_to
        return max(np.float64(0), np.dot(np.dot(delta, inverse_covariance_matrix), delta))

    def _get_k_smallest(self, test_point):
        generator = (
            (MkNN.mahalanobis_sqr(test_point, point, inv_cov), i)
            for i, (class_, inv_cov) in enumerate(zip(self.classes, self.inv_cov_matrices))
            for point in class_
        )
        return heapq.nsmallest(self.k, generator, key=itemgetter(0))

    def predict(self, test_point):
        return heapq.nlargest(1, Counter((i for _, i in self._get_k_smallest(test_point))).items(),
                              key=lambda t: (t[1], -t[0]))[0][0]

    def predict_proba(self, test_point):
        most_common = Counter((i for _, i in self._get_k_smallest(test_point)))
        classes_proba = np.array([most_common.get(i, 0) for i in range(self.n)])
        return classes_proba / classes_proba.sum()

    def predict_all_max(self, test_point):
        p = self.predict_proba(test_point)
        return np.where(p == max(p))[0]


def main():
    # Test points; each is classified against the training classes in `classes`
    test_points = np.array([[4., 2.]])
    # Training sample: two classes
    classes = [
        np.array([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]]),
        np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
    ]
    # Total number of training objects
    n_train_points = sum(class_.shape[0] for class_ in classes)
    # Covariance matrix of each class
    cov_matrices = [np.cov(class_, rowvar=False, ddof=1) for class_ in classes]
    # Inverse matrices (COV + E)^(-1) for each class (the Euclidean-Mahalanobis variant)
    inv_cov_matrices = [np.linalg.inv(cov + np.identity(cov.shape[0])) for cov in cov_matrices]
    for test_point in test_points:
        print("Point:", test_point)
        # k from 1 to the total number of training objects minus 1
        for i in range(1, n_train_points):
            classifier = MkNN(i, classes, inv_cov_matrices)
            print(f"{i}nn:",
                  1 + classifier.predict(test_point),
                  classifier.predict_proba(test_point),
                  classifier.predict_all_max(test_point))


if __name__ == "__main__":
    main()

      
      



Output. The format of each line is:

"knn: [predicted class (numbered from 1)] [class probabilities] [indices of all classes with the maximum probability (numbered from 0)]".





Point: [4. 2.]
1nn: 2 [0. 1.] [1]
2nn: 2 [0. 1.] [1]
3nn: 2 [0. 1.] [1]
4nn: 2 [0. 1.] [1]
5nn: 2 [0.2 0.8] [1]
6nn: 2 [0.33333333 0.66666667] [1]
7nn: 2 [0.42857143 0.57142857] [1]
8nn: 1 [0.5 0.5] [0 1]
9nn: 2 [0.44444444 0.55555556] [1]
      
      



The classification of the plane for different k is shown in Figure 4.





Figure 4. Visualization of the classification results
Python 3.6
# ...

from operator import sub

import numpy as np  # 1.19.5
from matplotlib import colors, pyplot as plt  # 3.3.4

# Visualization of the classifier's decision regions
def show_data_on_mesh(k, classes, inv_cov_matrices):
    # Mesh (grid) boundaries
    min_ = np.min([np.min(class_, axis=0) for class_ in classes], axis=1) - 1
    max_ = np.max([np.max(class_, axis=0) for class_ in classes], axis=1) + 1
    min_c = min(min_[0], min_[1])
    max_c = max(max_[0], max_[1])
    h = 0.05
    test_mesh = np.meshgrid(np.arange(min_c, max_c, h), np.arange(min_c, max_c, h))
    test_points = np.c_[test_mesh[0].ravel(), test_mesh[1].ravel()]
    # Classify every point of the mesh (color = difference of class probabilities)
    classifier = MkNN(k, classes, inv_cov_matrices)
    test_mesh_labels = [sub(*classifier.predict_proba(x)) for x in test_points]
    # Plotting
    plt.figure(figsize=(6, 5), dpi=90)
    class_colormap = colors.ListedColormap(['#070648', '#480607'])
    plt.pcolormesh(test_mesh[0], test_mesh[1],
                   np.asarray(test_mesh_labels).reshape(test_mesh[0].shape),
                   cmap='coolwarm', shading='nearest')
    plt.colorbar()
    plt.scatter([point[0] for class_ in classes for point in class_],
                [point[1] for class_ in classes for point in class_],
                c=[-i for i, class_ in enumerate(classes) for _ in class_],
                cmap=class_colormap)
    plt.axis([min_c, max_c, min_c, max_c])
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("k=" + str(k))
    plt.show()

# ...

      
      



Changing the parameter k gives the decision regions for other values of k.





5. Weighted Mahalanobis distance

If the features are of different importance, a diagonal matrix of weights \Lambda can be introduced; the weighted Mahalanobis distance is then:





d_{M-\Lambda} ( U , V, COV^{-1}, \Lambda ) = \sqrt { ( U - V ) \cdot \Lambda \cdot COV^{-1} \cdot \Lambda^T \cdot ( U - V )^T }

With equal (unit) weights the matrix \Lambda is the identity matrix and the distance reduces to the usual Mahalanobis distance:





\Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

The weighted Euclidean-Mahalanobis distance is defined in the same way:





d_{EM-\Lambda}(U, V, \left(COV + E \right)^{-1}, \Lambda) = \sqrt {(U - V) \cdot \Lambda \cdot \left(COV + E \right)^{-1} \cdot \Lambda^T \cdot (U - V)^T}
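A minimal sketch of the weighted variant (the weight values here are purely illustrative; with unit weights it reduces to the ordinary Mahalanobis distance from section 2.3):

import numpy as np

def weighted_mahalanobis(point_from, point_to, inverse_covariance_matrix, weights):
    delta = point_from - point_to
    lam = np.diag(weights)  # the diagonal weight matrix Lambda
    m = lam @ inverse_covariance_matrix @ lam.T
    return max(np.float64(0), delta @ m @ delta) ** 0.5

test_point = np.array([4., 2.])
class_ = np.array([[3., 1.], [4., 0.], [6., 0.], [6., 2.], [5., 3.]])
inv_cov = np.linalg.inv(np.cov(class_, rowvar=False, ddof=1))
centroid = class_.mean(axis=0)

# Unit weights: the usual Mahalanobis distance to the centroid (~0.9562)
print(weighted_mahalanobis(test_point, centroid, inv_cov, np.array([1., 1.])))
# The first feature weighted twice as heavily as the second
print(weighted_mahalanobis(test_point, centroid, inv_cov, np.array([2., 1.])))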

6. Conclusion

The Mahalanobis distance takes the correlations between features into account; it can be used to measure distances between points, between a point and a class, and between classes, as well as in classification methods such as k nearest neighbors.





What else can be studied on this topic?

1. …

2. The book "Metric Learning" (authors: Aurélien Bellet, Amaury Habrard, Marc Sebban).

3. The use of the Mahalanobis distance in clustering (for example, in k-means: articles 1, 2, 3).

4. … ().

5. … (article 1, article 2, article 3).






If you find errors or have comments, please write to quwarm@gmail.com or in the comments.







