Learning Important Features Through Propagating Activation Differences: DeepLIFT

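Abstract

The perceived black-box nature of neural networks is a barrier to their use in applications where interpretability matters. Here we present DeepLIFT (Deep Learning Important FeaTures), a method that decomposes the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons (nodes) in the network to every feature of the input. DeepLIFT compares the activation of each neuron with its "reference activation" and assigns contribution scores according to the difference. By optionally considering positive and negative contributions separately, DeepLIFT can also reveal dependencies that other approaches miss. Scores can be computed efficiently in a single backward pass. We apply DeepLIFT to models trained on MNIST and on simulated genomic data and show significant advantages over gradient-based methods.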

Video tutorial: http://goo.gl/qKb7pL

ICML slides: bit.ly/deeplifticmlslides

ICML talk: https://vimeo.com/238275076

Code: http://goo.gl/RM8jvH

1. Introduction

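As neural networks become more popular, their "black box" reputation is a barrier to adoption wherever interpretability is paramount. Here we present DeepLIFT (Deep Learning Important FeaTures), an algorithm that assigns importance scores to the inputs for a given output. The approach is distinctive in two ways. First, it frames the question of importance in terms of differences from a "reference" state, where the "reference" is chosen according to the problem at hand. Unlike most gradient-based methods, using a difference-from-reference lets DeepLIFT propagate an importance signal even where the gradient is zero, and it avoids artifacts caused by discontinuities in the gradient. Second, by optionally treating positive and negative contributions separately at nonlinearities, DeepLIFT can reveal dependencies that other approaches miss. Because DeepLIFT scores are computed with a backpropagation-like algorithm, they can be obtained efficiently in a single backward pass after a prediction has been made.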

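2. Previous work

2.1. Perturbation-based forward propagation approaches

These approaches perturb individual inputs or neurons and observe the effect on later neurons in the network. Zeiler & Fergus (Zeiler & Fergus, 2013 [12]) occluded different segments of an input image and visualized the change in the activations of later layers. "In-silico mutagenesis" (Zhou & Troyanskaya, 2015 [13]) introduced virtual mutations at individual positions in a genomic sequence and quantified their impact on the output. Zintgraf et al. (Zintgraf et al., 2017 [14]) proposed a strategy for analyzing the difference in a prediction after marginalizing over each input patch. Such methods can be computationally expensive, because each perturbation requires a separate forward pass through the network. They can also underestimate the importance of features whose contribution to the output has saturated (Fig. 1).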

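Fig. 1. Perturbation-based and gradient-based approaches fail to model saturation.

Illustrated is a simple network whose output saturates in the signal from its inputs. At the point where i₁ = 1 and i₂ = 1, perturbing either i₁ or i₂ to 0 produces no change in the output. Note that the gradient of the output with respect to the inputs is also zero when i₁ + i₂ > 1.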
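The saturation effect can be checked numerically. The sketch below assumes one concrete network with this behaviour, y = 1 − max(0, 1 − (i₁ + i₂)); the function names and the finite-difference helper are illustrative and are not taken from the DeepLIFT code.

```python
def saturating_net(i1, i2):
    """Assumed toy network: y grows with i1 + i2 and saturates at 1."""
    h = max(0.0, 1.0 - (i1 + i2))
    return 1.0 - h

def numerical_gradient(f, i1, i2, eps=1e-6):
    """Central finite-difference gradient of f at (i1, i2)."""
    d1 = (f(i1 + eps, i2) - f(i1 - eps, i2)) / (2 * eps)
    d2 = (f(i1, i2 + eps) - f(i1, i2 - eps)) / (2 * eps)
    return d1, d2

y_full = saturating_net(1.0, 1.0)

# Perturbation: set one input to 0 at a time and record the drop in output.
drop_i1 = y_full - saturating_net(0.0, 1.0)
drop_i2 = y_full - saturating_net(1.0, 0.0)

# Gradient at the saturated point.
grad = numerical_gradient(saturating_net, 1.0, 1.0)

# Difference from the all-zero reference input.
delta_y = y_full - saturating_net(0.0, 0.0)
```

Both the perturbation drops and the gradient vanish at this point, while the difference from the all-zero reference does not, which is the signal a difference-from-reference method can work with.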

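2.2. Backpropagation-based approaches

Unlike perturbation methods, backpropagation approaches propagate an importance signal from an output neuron backwards through the layers to the input in a single pass, which makes them efficient. DeepLIFT is one such approach.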

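2.2.1. Gradients, deconvolutional networks and Guided Backpropagation

Simonyan et al. (Simonyan et al., 2013 [9]) proposed using the gradient of the output with respect to the pixels of an input image to compute a "saliency map" of the image. The authors showed that this is similar to deconvolutional networks (Zeiler & Fergus, 2013 [12]), except for the handling of the nonlinearity at rectified linear units (ReLU). When importance is backpropagated using gradients, the gradient entering a ReLU during the backward pass is zeroed out if the input to the ReLU during the forward pass was negative. In deconvolutional networks, by contrast, the importance signal entering a ReLU during the backward pass is zeroed out if and only if the signal itself is negative, regardless of the sign of the ReLU input during the forward pass. Springenberg et al. (Springenberg et al., 2014 [10]) combined the two approaches into Guided Backpropagation, which zeroes out the importance signal at a ReLU if either the forward-pass input to the ReLU or the backward-pass importance signal is negative. Guided Backpropagation can be viewed as computing gradients, with the caveat that any gradient that becomes negative during the backward pass is discarded at ReLUs. Because negative gradients are zeroed out, both Guided Backpropagation and deconvolutional networks can fail to highlight inputs that contribute negatively to the output. Moreover, none of the three approaches addresses the saturation problem illustrated in Fig. 1: the gradient of y with respect to h is negative (so Guided Backprop and deconvolutional networks assign zero importance), and the gradient of h with respect to both i₁ and i₂ is zero when i₁ + i₂ > 1 (so both gradients and Guided Backprop are zero). Discontinuities in the gradient can also cause undesirable artifacts (Fig. 2).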

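2.2.2. Layerwise relevance propagation and gradient × input

Bach et al. (Bach et al., 2015 [1]) proposed an approach for propagating importance scores called Layerwise Relevance Propagation (LRP). Shrikumar et al. and Kindermans et al. (Shrikumar et al., 2016 [8]; Kindermans et al., 2016 [4]) showed that, absent modifications for numerical stability, the original LRP rules for ReLU networks are equivalent, up to a scaling factor, to an elementwise product between the saliency maps of Simonyan et al. and the input (in other words, gradient × input). In our experiments we compare DeepLIFT with gradient × input, since the latter is easily implemented on a GPU, whereas LRP has no GPU implementation available to our knowledge.

Gradient × input is often preferable to gradients alone because it uses the sign and strength of the input, but it still does not address the saturation problem in Fig. 1 or the thresholding artifact in Fig. 2.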

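2.2.3. Integrated gradients

Instead of computing gradients only at the current value of the input, one can integrate the gradients as the inputs are scaled up from some starting value (for example, all zeros) to their current value (Sundararajan et al., 2016 [11]). This addresses the saturation and thresholding problems of Fig. 1 and Fig. 2, but obtaining high-quality integrals numerically adds computational overhead. The approach can also still give misleading results (see Section 3.5.3).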

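2.3. Grad-CAM and Guided CAM

Grad-CAM (Selvaraju et al., 2016 [7]) computes a coarse-grained feature-importance map by associating the feature maps in the final convolutional layer with particular classes, based on the gradient of each class with respect to each feature map, and then using the weighted activations of the feature maps as an indication of which inputs matter most. To obtain finer-grained feature importance, the authors proposed an elementwise product between the Grad-CAM scores and the scores from Guided Backpropagation, termed Guided Grad-CAM. This strategy inherits the limitations of Guided Backpropagation caused by zeroing out negative gradients during backpropagation. It is also specific to convolutional neural networks.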

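3. The DeepLIFT method

3.1. The DeepLIFT philosophy

DeepLIFT explains the difference of the output from some "reference" output in terms of the difference of the input from some "reference" input. The "reference" input represents a default or "neutral" input, chosen according to the problem at hand (see Section 3.3). Formally, let t be a target output neuron of interest, and let x₁, x₂, ..., x_n be neurons in some intermediate layer or set of layers that are necessary and sufficient to compute t. Let t⁰ be the reference activation of t. We define Δt as the difference-from-reference, Δt = t − t⁰. DeepLIFT assigns contribution scores $C_{\Delta x_i \Delta t}$ to $\Delta x_i$ such that

$\sum_{i=1}^{n} C_{\Delta x_i \Delta t} = \Delta t \quad (1)$

We call Eq. 1 the summation-to-delta property. $C_{\Delta x_i \Delta t}$ can be non-zero even when $\frac{\partial t}{\partial x_i}$ is zero.

This lets DeepLIFT overcome a fundamental limitation of gradients: as Fig. 1 illustrates, a neuron can signal meaningful information even in the regime where its gradient is zero. A second drawback of gradients that DeepLIFT addresses is shown in Fig. 2, where the discontinuous gradient (caused by the bias term) produces sudden jumps in the importance score over infinitesimal changes in the input. The difference-from-reference, by contrast, is continuous, so DeepLIFT avoids these artifacts.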

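Fig. 2. Discontinuous gradients can produce misleading importance scores.

Response of a single rectified linear unit with a bias of −10. Both gradient and gradient × input have a discontinuity at x = 10; at x = 10 + ε, gradient × input assigns a contribution of 10 + ε to x and −10 to the bias term (ε is a small positive number). When x < 10, the contributions of x and of the bias term are both 0. By contrast, the difference-from-reference (red arrow, top panel) gives a continuous increase in the contribution score.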

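3.2. Multipliers and the chain rule

3.2.1. Definition of multipliers

For an input neuron x with difference-from-reference Δx and a target neuron t with difference-from-reference Δt whose contribution we want to compute, we define the multiplier $m_{\Delta x \Delta t}$ as:

$m_{\Delta x \Delta t} = \frac{C_{\Delta x \Delta t}}{\Delta x} \quad (2)$

In other words, the multiplier $m_{\Delta x \Delta t}$ is the contribution of Δx to Δt, divided by Δx. Note the close analogy with partial derivatives: ∂t/∂x is the infinitesimal change in t caused by an infinitesimal change in x, divided by the infinitesimal change in x. The multiplier is similar in spirit to a partial derivative, but over finite rather than infinitesimal differences.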

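3.2.2. The chain rule for multipliers

Assume an input layer with neurons x₁, ..., x_n, a hidden layer with neurons y₁, ..., y_n, and some target output neuron t.

Given values for $m_{\Delta x_i \Delta y_j}$ and $m_{\Delta y_j \Delta t}$, the following definition of $m_{\Delta x_i \Delta t}$

is consistent with the summation-to-delta property in Eq. 1 (see the appendix for the proof):

$m_{\Delta x_i \Delta t} = \sum_j m_{\Delta x_i \Delta y_j} m_{\Delta y_j \Delta t} \quad (3)$

We refer to Eq. 3 as the chain rule for multipliers. Given the multipliers of each neuron with respect to its immediate successors, the multipliers of any neuron with respect to a given target neuron can be computed efficiently by backpropagation, just as the chain rule for partial derivatives lets us compute gradients with respect to the output by backpropagation.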
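A minimal sketch of the chain rule for multipliers on a two-layer linear network, where each layer-wise multiplier is simply the corresponding weight. The weights, inputs and helper names are made up for illustration, and the reference input is assumed to be all zeros.

```python
def chain_multipliers(m_xy, m_yt):
    """Combine layer-wise multipliers: m_xt[i] = sum_j m_xy[i][j] * m_yt[j]."""
    return [sum(a * b for a, b in zip(row, m_yt)) for row in m_xy]

# Hypothetical linear network: y_j = sum_i W[i][j] * x_i, t = sum_j v[j] * y_j.
W = [[1.0, -2.0],
     [3.0, 0.5]]
v = [2.0, -1.0]

def forward(x):
    """Compute the target activation t for input x."""
    y = [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(len(v))]
    return sum(vj * yj for vj, yj in zip(v, y))

x, x_ref = [1.0, 2.0], [0.0, 0.0]
delta_x = [a - b for a, b in zip(x, x_ref)]
delta_t = forward(x) - forward(x_ref)

# For linear layers the multipliers are the weights themselves.
m_xt = chain_multipliers(W, v)
contributions = [m * dx for m, dx in zip(m_xt, delta_x)]
```

In this example the contributions add up to `delta_t`, as summation-to-delta requires.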

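3.3. Defining the reference

In formulating the DeepLIFT rules described in Section 3.5, we assume that the reference of a neuron is its activation on the reference input. Formally, say we have a neuron y with inputs x₁, x₂, ... such that y = f(x₁, x₂, ...).

Given the reference activations x₁⁰, x₂⁰, ... of the inputs, the reference activation y⁰ of the output is:

$y^0 = f(x^0_1, x^0_2, \ldots) \quad (4)$

That is, the references of all neurons can be found by choosing a reference input and propagating activations through the network.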

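The choice of reference input is critical for obtaining insightful results from DeepLIFT. In practice, choosing a good reference relies on domain-specific knowledge, and in some cases it may be best to compute DeepLIFT scores against several different references. As a guiding principle, we can ask "what am I interested in measuring differences against?". For MNIST we use an all-zeros reference input, since this is the background of the images. For the binary classification tasks on DNA sequence inputs (strings over the alphabet {A, C, G, T}), we obtained sensible results either with a reference input containing the expected background frequencies of ACGT (Fig. 5), or by averaging the results over multiple reference inputs for each sequence, generated by shuffling the original sequence (Appendix J).

It is important to note that gradient × input implicitly uses an all-zeros reference (it is equivalent to a first-order Taylor approximation of gradient × Δinput, where Δ is measured against an input of zeros). Similarly, integrated gradients (Section 2.2.3) requires the user to specify a starting point for the integral, which is conceptually similar to specifying a reference for DeepLIFT. Guided Backprop and pure gradients use no reference, but we argue that this is a limitation: these methods describe only the local behaviour of the output at the specific input value, without considering how the output behaves over a range of inputs.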

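3.4. Separating positive and negative contributions

In Section 3.5.3 we will see that in some situations it is essential to treat positive and negative contributions differently. To do this, for every neuron y we introduce Δy⁺ and Δy⁻, the positive and negative components of Δy, such that:

$\Delta y = \Delta y^+ + \Delta y^-$

$C_{\Delta y \Delta t} = C_{\Delta y^+ \Delta t} + C_{\Delta y^- \Delta t}$

For linear neurons, Δy⁺ and Δy⁻ are found by writing Δy as a sum of terms involving its inputs Δx_i and grouping the positive and negative terms together. This matters when applying the RevealCancel rule (Section 3.5.3), where for a given target neuron t the multipliers $m_{\Delta y^+ \Delta t}$ and $m_{\Delta y^- \Delta t}$ may differ. When only the Linear and Rescale rules are applied (Sections 3.5.1 and 3.5.2), they coincide: $m_{\Delta y \Delta t} = m_{\Delta y^+ \Delta t} = m_{\Delta y^- \Delta t}$.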

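3.5. Rules for assigning contribution scores

We now present the rules for assigning contribution scores from each neuron to its immediate inputs. Together with the chain rule for multipliers (Section 3.2), these rules can be used to find the contribution of any input (not just the immediate inputs) to a target output via backpropagation.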

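3.5.1. The Linear rule

This rule applies to dense and convolutional layers (excluding nonlinearities). Let y be a linear function of its inputs x_i, such that

$y = b + \sum_{i=1}^{n} w_i x_i$

$\Delta y = \sum_{i} w_i \Delta x_i$

We define the positive and negative parts of Δy as:

$\Delta y^+ = \sum_i 1\{w_i \Delta x_i > 0\} w_i \Delta x_i = \sum_i 1\{w_i \Delta x_i > 0\} w_i (\Delta x_i^+ + \Delta x_i^-)$

$\Delta y^- = \sum_i 1\{w_i \Delta x_i < 0\} w_i \Delta x_i = \sum_i 1\{w_i \Delta x_i < 0\} w_i (\Delta x_i^+ + \Delta x_i^-)$

The contributions of $\Delta x_i^+$ and $\Delta x_i^-$ to Δy⁺ and Δy⁻ are the corresponding terms of these sums, and the multipliers follow from the definition in Section 3.2.1:

$m_{\Delta x^+_i \Delta y^+} = m_{\Delta x^-_i \Delta y^+} = 1\{w_i \Delta x_i > 0\} w_i$

$m_{\Delta x^+_i \Delta y^-} = m_{\Delta x^-_i \Delta y^-} = 1\{w_i \Delta x_i < 0\} w_i$

What about the case Δx_i = 0? Setting the multipliers to 0 would be consistent with summation-to-delta, but $\Delta x_i^+$ and $\Delta x_i^-$ may both be non-zero (and cancel each other out), in which case a multiplier of 0 would fail to propagate importance to them. To avoid this, we set

$m_{\Delta x^+_i \Delta y^+} = m_{\Delta x^+_i \Delta y^-} = 0.5 w_i$

when Δx_i is 0 (and similarly for Δx⁻).

See Appendix B for how to compute these multipliers using standard neural network operations.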
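As an illustration of the Linear rule, the sketch below splits Δy of a single linear neuron into its positive and negative parts. The weights, bias and inputs are arbitrary example values, and `linear_rule` is a hypothetical helper rather than a function of the DeepLIFT package.

```python
def linear_rule(weights, x, x_ref):
    """Split delta_y of y = b + sum_i w_i * x_i into positive and negative parts.

    Returns the per-input terms w_i * delta_x_i, delta_y_pos and delta_y_neg.
    The bias b cancels in the difference-from-reference, so it gets no score.
    """
    contribs = [w * (xi - x0) for w, xi, x0 in zip(weights, x, x_ref)]
    delta_y_pos = sum(c for c in contribs if c > 0)
    delta_y_neg = sum(c for c in contribs if c < 0)
    return contribs, delta_y_pos, delta_y_neg

weights, bias = [2.0, -1.0, 0.5], 1.0
x, x_ref = [1.0, 3.0, -4.0], [0.0, 0.0, 0.0]
contribs, dy_pos, dy_neg = linear_rule(weights, x, x_ref)

# Check against the actual change in the neuron's output.
y = bias + sum(w * xi for w, xi in zip(weights, x))
y_ref = bias + sum(w * xi for w, xi in zip(weights, x_ref))
```

The positive and negative parts add up to the actual change `y - y_ref`, and the bias never receives a contribution.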

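3.5.2. The Rescale rule

This rule applies to nonlinear transformations that take a single input, such as ReLU, tanh or sigmoid. Let neuron y be a nonlinear transformation of its input x, such that y = f(x). Because y has only one input, the summation-to-delta property gives

$C_{\Delta x \Delta y} = \Delta y$, and consequently $m_{\Delta x \Delta y} = \frac{\Delta y}{\Delta x}$

For the Rescale rule, we set Δy⁺ and Δy⁻ proportional to Δx⁺ and Δx⁻:

$\Delta y^+ = \frac{\Delta y}{\Delta x} \Delta x^+ = C_{\Delta x^+ \Delta y^+}$

$\Delta y^- = \frac{\Delta y}{\Delta x} \Delta x^- = C_{\Delta x^- \Delta y^-}$

From this we obtain:

$m_{\Delta x^+ \Delta y^+} = m_{\Delta x^- \Delta y^-} = m_{\Delta x \Delta y} = \frac{\Delta y}{\Delta x}$

In the limit

$x → x^0, \; \; \Delta x → 0 \; \; \text{and} \; \; \Delta y → 0.$

the definition of the multiplier becomes numerically unstable. To avoid this, note that

$m_{\Delta x \Delta y} → \frac{dy}{dx}$, where $\frac{dy}{dx}$ is the gradient of y with respect to x at $x = x^0.$

When x is close to its reference, we can therefore use the gradient in place of the multiplier, avoiding the numerical instability caused by a small denominator.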

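Note that the Rescale rule addresses both the saturation and the thresholding problems illustrated in Fig. 1 and Fig. 2. In the case of Fig. 1, if

$i^0_1 = i^0_2 = 0$, then at $i_1 + i_2 > 1$ we have $\Delta h = -1$ and $\Delta y = 1$, giving $m_{\Delta h \Delta y} = \frac{\Delta y}{\Delta h} = -1$ even though the gradient with respect to $i_1$ and $i_2$ is zero

(in other words, using the difference-from-reference lets information flow even when the gradient is zero). In the case of Fig. 2, assuming $x^0 = y^0 = 0$, at x = 10 + ε we have Δy = ε, giving $m_{\Delta x \Delta y} = \frac{\epsilon}{10 + \epsilon}$ and $C_{\Delta x \Delta y} = \Delta x \times m_{\Delta x \Delta y} = \epsilon$.

By contrast, gradient × input assigns a contribution of 10 + ε to x and −10 to the bias term (DeepLIFT never assigns importance to bias terms).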
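The Fig. 2 calculation can be reproduced with a short sketch of the Rescale rule, including the fallback to a gradient near the reference. The tolerance, the value of ε and the finite-difference fallback are assumptions made for this example.

```python
def rescale_rule(f, x, x_ref, tol=1e-7):
    """Return (multiplier, contribution) of x to y = f(x) under the Rescale rule."""
    delta_x = x - x_ref
    delta_y = f(x) - f(x_ref)
    if abs(delta_x) < tol:
        # Near the reference the ratio is unstable; use a numerical gradient.
        multiplier = (f(x_ref + tol) - f(x_ref - tol)) / (2 * tol)
    else:
        multiplier = delta_y / delta_x
    return multiplier, multiplier * delta_x

def thresholded_relu(x):
    """ReLU with a bias of -10, as in Fig. 2."""
    return max(0.0, x - 10.0)

eps = 0.5
x = 10.0 + eps
multiplier, contribution = rescale_rule(thresholded_relu, x, x_ref=0.0)

# Gradient x input at the same point: the gradient is 1 once x > 10.
grad_times_input_x = 1.0 * x
grad_times_bias = 1.0 * -10.0
```

With these values the Rescale contribution of x is ε, whereas gradient × input splits the same output into 10 + ε on x and −10 on the bias.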

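As shown in previous work (Lundberg & Lee, 2016 [6]), there is a connection between DeepLIFT and Shapley values. Briefly, the Shapley values measure the average marginal effect of including an input, over all possible orderings in which inputs can be included. If we define "including" an input as setting it to its actual value instead of its reference value, DeepLIFT can be thought of as a fast approximation of the Shapley values. At the time, Lundberg & Lee cited a preprint of DeepLIFT that described only the Linear and Rescale rules, with no separate treatment of positive and negative contributions.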

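3.5.3. An improved approximation of the Shapley values: the RevealCancel rule

While the Rescale rule improves on simply using gradients, there are still situations where it gives misleading results. Consider the min(i₁, i₂) operation depicted in Fig. 3, with reference values i₁ = 0 and i₂ = 0. Under the Rescale rule, all importance is assigned either to i₁ or to i₂ (whichever is smaller). This can obscure the fact that both inputs are relevant to the min operation.

To understand why this happens, consider the case

$i_1 > i_2$, so that $h_1 = (i_1 - i_2) > 0$ and $h_2 = \max(0, h_1) = h_1.$

By the Linear rule,

$C_{\Delta i_1 \Delta h_1} = i_1$ and $C_{\Delta i_2 \Delta h_1} = -i_2.$

By the Rescale rule, the multiplier is

$m_{\Delta h_1 \Delta h_2} = \frac{\Delta h_2}{\Delta h_1} = 1,$

and therefore

$C_{\Delta i_1 \Delta h_2} = m_{\Delta h_1 \Delta h_2} C_{\Delta i_1 \Delta h_1} = i_1$ and $C_{\Delta i_2 \Delta h_2} = m_{\Delta h_1 \Delta h_2} C_{\Delta i_2 \Delta h_1} = -i_2.$

The total contribution of i₁ to the output o becomes

$(i_1 - C_{\Delta i_1 \Delta h_2}) = (i_1 - i_1) = 0,$

and the total contribution of $i_2$ to $o$ is $-C_{\Delta i_2 \Delta h_2} = i_2.$

This calculation is misleading, because it discounts the fact that

$C_{\Delta i_2 \Delta h_2}$ would be 0 if $i_1$ were 0.

In other words, it ignores the dependency between i₁ and i₂ that arises because i₂ cancels out i₁ in the nonlinear neuron h₂. A similar failure occurs when i₁ < i₂; the Rescale rule then gives

$C_{\Delta i_1 \Delta o} = i_1$ and $C_{\Delta i_2 \Delta o} = 0.$

Note that gradients, gradient × input, Guided Backpropagation and integrated gradients would also assign all importance to either i₁ or i₂, because for any given input the gradient is zero for one of i₁ or i₂ (see Appendix C).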

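One way to address this is to treat the positive and negative contributions separately. Consider again the nonlinear neuron y = f(x). Instead of assuming that Δy⁺ and Δy⁻ are proportional to

$\Delta x^+$ and $\Delta x^-$, with $m_{\Delta x^+ \Delta y^+} = m_{\Delta x^- \Delta y^-} = m_{\Delta x \Delta y}$

(as is done for the Rescale rule), we define them as follows:

$\Delta y^+ = \frac{1}{2}\left(f(x^0 + \Delta x^+) - f(x^0)\right) + \frac{1}{2}\left(f(x^0 + \Delta x^- + \Delta x^+) - f(x^0 + \Delta x^-)\right)$

$\Delta y^- = \frac{1}{2}\left(f(x^0 + \Delta x^-) - f(x^0)\right) + \frac{1}{2}\left(f(x^0 + \Delta x^+ + \Delta x^-) - f(x^0 + \Delta x^+)\right)$

$m_{\Delta x^+ \Delta y^+} = \frac{\Delta y^+}{\Delta x^+}, \; \; m_{\Delta x^- \Delta y^-} = \frac{\Delta y^-}{\Delta x^-}$

In other words, we set Δy⁺ to the average impact of Δx⁺ after no terms have been added and after Δx⁻ has been added, and we set Δy⁻ to the average impact of Δx⁻ after no terms have been added and after Δx⁺ has been added. This can be thought of as the Shapley values of Δx⁺ and Δx⁻ contributing to y.

By considering the impact of the positive terms in the absence of the negative terms, and of the negative terms in the absence of the positive terms, we alleviate some of the problems that arise when positive and negative terms cancel each other out. In the case of Fig. 3, RevealCancel assigns a contribution of 0.5 min(i₁, i₂) to both inputs (see Appendix C).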
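A small sketch of this calculation for the min network, assuming it is built as o = i₁ − max(0, i₁ − i₂) with an all-zero reference and positive inputs. Δy⁺ is taken as the average effect of Δx⁺ with and without Δx⁻ already added, and symmetrically for Δy⁻. The helper names are illustrative only.

```python
def relu(z):
    return max(0.0, z)

def reveal_cancel(f, x_ref, dx_pos, dx_neg):
    """Return (dy_pos, dy_neg) for y = f(x) under the RevealCancel rule."""
    dy_pos = 0.5 * (f(x_ref + dx_pos) - f(x_ref)) \
        + 0.5 * (f(x_ref + dx_neg + dx_pos) - f(x_ref + dx_neg))
    dy_neg = 0.5 * (f(x_ref + dx_neg) - f(x_ref)) \
        + 0.5 * (f(x_ref + dx_pos + dx_neg) - f(x_ref + dx_pos))
    return dy_pos, dy_neg

def min_net_reveal_cancel(i1, i2):
    """Contributions of (i1, i2) to o = i1 - relu(i1 - i2); assumes i1, i2 > 0."""
    # Linear rule on h1 = i1 - i2: i1 supplies the positive part, i2 the negative.
    dh2_pos, dh2_neg = reveal_cancel(relu, 0.0, i1, -i2)
    # dh2_pos is attributed to i1 and dh2_neg to i2; then o = i1 - h2.
    return i1 - dh2_pos, -dh2_neg

def min_net_rescale(i1, i2):
    """Same network under the Rescale rule; assumes i1 != i2."""
    dh1 = i1 - i2
    m = (relu(dh1) - relu(0.0)) / dh1
    return i1 - m * i1, m * i2

rc_scores = min_net_reveal_cancel(3.0, 1.0)
rescale_scores = min_net_rescale(3.0, 1.0)
```

For i₁ = 3 and i₂ = 1, the Rescale version puts the whole output on i₂, while the RevealCancel version gives each input half of min(i₁, i₂).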

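The RevealCancel rule also avoids the saturation and thresholding pitfalls illustrated in Fig. 1 and Fig. 2, but there are circumstances in which the Rescale rule may be preferable. Specifically, consider a thresholded ReLU where Δy > 0 iff Δx ≥ b. If Δx < b merely indicates noise, we would want to assign contributions of 0 to both Δx⁺ and Δx⁻ (as the Rescale rule does) in order to suppress the noise. RevealCancel may assign non-zero contributions, because it considers Δx⁺ in the absence of Δx⁻ and vice versa.

In the network of Fig. 3, assume $i^0_1 = i^0_2 = 0.$ When $i_1 < i_2$ we have $\frac{do}{di_2} = 0$, and when $i_2 < i_1$ we have $\frac{do}{di_1} = 0.$

Consequently, any of the backpropagation approaches described in Section 2.2 assigns importance exclusively to either i₁ or i₂. With the RevealCancel rule, the network assigns an importance of 0.5 min(i₁, i₂) to both inputs.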

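3.6. Choice of target layer

For softmax or sigmoid outputs, it may be preferable to compute contributions to the linear layer preceding the final nonlinearity rather than to the final nonlinearity itself. This avoids an attenuation caused by the summation-to-delta property described in Section 3.1. For example, consider a sigmoid output o = σ(y), where y is the logit of the sigmoid function.

Assume $y = x_1 + x_2$, where $x^0_1 = x^0_2 = 0.$ When $x_1 = 50$ and $x_2 = 0,$

the output o saturates very close to 1, and the contributions of x₁ and x₂ are 0.5 and 0 respectively. However, when x₁ = 100 and x₂ = 100, the output o is still very close to 1, but the contributions of x₁ and x₂ are now both 0.25. This can be misleading when comparing scores across different inputs, because a stronger contribution to the logit does not always translate into a higher DeepLIFT score. To avoid this, we compute contributions to y rather than to o.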
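The attenuation is easy to reproduce. The sketch below scores the sigmoid example with the Linear and Rescale rules against an all-zero reference; the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def contributions_to_output(x1, x2):
    """Scores for o = sigmoid(x1 + x2) with reference x1 = x2 = 0."""
    delta_y = x1 + x2
    delta_o = sigmoid(delta_y) - sigmoid(0.0)
    m = delta_o / delta_y  # Rescale multiplier; assumes delta_y != 0
    return m * x1, m * x2

def contributions_to_logit(x1, x2):
    """Scores for the logit y = x1 + x2 with the same reference."""
    return x1, x2

weak = contributions_to_output(50.0, 0.0)
strong = contributions_to_output(100.0, 100.0)
```

Measured at the sigmoid output, x₁ scores lower in the second case even though its value doubled; measured at the logit, the scores are 50 and 100, which preserves the ordering.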

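Adjustments for softmax layers

If we compute contributions to the linear layer preceding the softmax rather than to the softmax output, one issue is that the final softmax output involves a normalization over all classes, whereas the linear layer before the softmax does not. To address this, we can normalize the contributions to the linear layer by subtracting the mean contribution to all classes. Formally, if n is the number of classes,

$C_{\Delta x \Delta c_i}$

is the unnormalized contribution to class c_i in the linear layer, and

$C'_{\Delta x \Delta c_i}$

is the normalized contribution, then:

$C'_{\Delta x \Delta c_i} = C_{\Delta x \Delta c_i} - \frac{1}{n} \sum_{j=1}^{n} C_{\Delta x \Delta c_j}$

As a justification for this normalization, note that subtracting a fixed value from all inputs to the softmax leaves the softmax output unchanged.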
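The mean-subtraction step is a one-liner in practice. A minimal sketch for a single input feature, with made-up contribution values for four classes:

```python
def normalize_softmax_contributions(contribs_per_class):
    """Subtract the mean contribution over classes from each class's contribution."""
    mean = sum(contribs_per_class) / len(contribs_per_class)
    return [c - mean for c in contribs_per_class]

# Hypothetical contributions of one input to the pre-softmax logits of 4 classes.
raw = [2.0, -1.0, 0.5, 0.5]
normalized = normalize_softmax_contributions(raw)
```

After normalization the contributions of the feature sum to zero across classes, so only class-discriminating evidence remains.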

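4. Results

4.1. Digit classification (MNIST)

We trained a convolutional neural network on MNIST (LeCun et al., 1999 [5]) using Keras (Chollet, 2015 [2]) and obtained 99.2% test-set accuracy. The architecture consists of two convolutional layers, followed by a fully connected layer, followed by the softmax output layer (see Appendix D for details). We used convolutions with stride > 1 instead of pooling layers, which did not reduce performance, consistent with previous work (Springenberg et al., 2014 [10]). For DeepLIFT we used an all-zeros reference input.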

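To evaluate the importance scores obtained by the different methods, we designed the following task: given an image that originally belongs to class c_o, identify which pixels to erase in order to convert the image to some target class c_t. We do this by computing

$S_{x_i diff} = S_{x_i c_o} - S_{x_i c_t}$ (where $S_{x_i c}$ is the score of pixel $x_i$ for class $c$)

and erasing up to 157 pixels (20% of the image), ranked in descending order of

$S_{x_i diff}$, for which $S_{x_i diff} > 0.$

We then evaluate the change in the log-odds score between classes c_o and c_t for the original image and the image with the pixels erased.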
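The pixel-selection step can be sketched as follows for a flattened image. The scores here are toy values, and setting erased pixels to 0 is an assumption of this example rather than a detail stated in the text.

```python
def pixels_to_erase(scores_orig, scores_target, max_pixels=157):
    """Indices of pixels with the largest positive S_diff = S_orig - S_target."""
    diff = [so - st for so, st in zip(scores_orig, scores_target)]
    positive = [i for i, d in enumerate(diff) if d > 0]
    positive.sort(key=lambda i: diff[i], reverse=True)
    return positive[:max_pixels]

def erase(image, indices, fill=0.0):
    """Return a copy of the flattened image with the chosen pixels set to `fill`."""
    erased = list(image)
    for i in indices:
        erased[i] = fill
    return erased

# Toy 4-pixel example; a real MNIST image has 784 pixels.
scores_orig = [0.9, 0.1, 0.4, 0.0]
scores_target = [0.2, 0.3, 0.1, 0.0]
image = [1.0, 1.0, 1.0, 0.0]

selected = pixels_to_erase(scores_orig, scores_target)
masked = erase(image, selected)
```

Only pixels that favour the original class over the target class are candidates, and the 157-pixel cap corresponds to 20% of a 28 × 28 image.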

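Fig. 4. Top: result of masking the pixels ranked as most important for the original class (8) relative to the target class (3 or 6). Importance scores for classes 8, 3 and 6 are also shown. The selected image had the highest change in log-odds score for the 8→6 conversion when gradient*input or integrated gradients was used to rank pixels. Bottom: box plots of the increase in log-odds score of the target class versus the original class after the mask is applied, for 1K images belonging to the original class in the test set. "Integrated gradients-n" refers to numerically integrating the gradients over n intervals.

Fig. 5. (a) Scatter plots of importance score versus strength of TAL1 motif match for different tasks and methods (see Appendix G for GATA1). For each region, the top 5 motif matches are plotted. X-axes: log-odds of the TAL1 motif match versus background. Y-axes: total importance assigned to the match for the specified task. Red dots are from regions where both TAL1 and GATA1 motifs were inserted during simulation; blue dots have GATA1 only, green dots have TAL1 only, and black dots have no motifs inserted. "DeepLIFT-fc-RC-conv-RS" refers to using RevealCancel on the fully connected layer and Rescale on the convolutional layers, which appears to reduce noise relative to using RevealCancel on all layers.

(b) Proportion of strong matches (log-odds > 7) to the TAL1 motif, in regions containing both TAL1 and GATA1, that had a total score <= 0 for task 0. Guided Backprop*inp and DeepLIFT with RevealCancel have no false negatives, but Guided Backprop has false positives for task 1 (panel (a)).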

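4.2. Classifying regulatory DNA (genomics)

Next, we compared the importance-scoring methods on classification tasks with DNA sequence inputs (strings over the alphabet {A, C, G, T}). The human genome contains millions of DNA sequence elements (200-1000 in length) with specific combinations of short functional words to which regulatory proteins (RPs) bind in order to regulate gene activity. Each RP (for example, GATA1) has binding affinity for specific collections of short DNA words (motifs) (for example, GATAA and GATTA). A key problem in computational genomics is the discovery of motifs in regulatory DNA elements that give rise to distinct molecular signatures (labels), which can be measured experimentally. To benchmark DeepLIFT and competing methods at uncovering predictive patterns in DNA sequences, we designed a simple simulation that captures the essence of the motif discovery problem described above.

Background DNA sequences of length 200 were generated by sampling the letters ACGT at each position with probabilities 0.3, 0.2, 0.2 and 0.3 respectively. Motif instances were sampled at random from previously known probabilistic motif models (see Appendix F) of two RPs, GATA1 and TAL1 (Fig. 6) (Kheradpour & Kellis, 2014 [3]), and 0-3 instances of a given motif were inserted at random non-overlapping positions in the DNA sequences. We trained a multi-task neural network on 3 binary classification tasks. Positively labelled sequences in task 0 represented "both GATA1 and TAL1 present", in task 1 "GATA1 present", and in task 2 "TAL1 present". 1/4 of the sequences contained both GATA1 and TAL1 motifs (labelled 111), 1/4 contained only GATA1 (labelled 010), 1/4 contained only TAL1 (labelled 001), and 1/4 contained no motifs (labelled 000). Details of the simulation, network architecture and predictive performance are given in Appendix F. As the reference input we used the expected frequencies of ACGT at each position (that is, the ACGT channel axis was set to 0.3, 0.2, 0.2, 0.3; see Appendix J for results using shuffled sequences as a reference). For a fair comparison, this reference was also used for gradient × input and Guided Backprop × input ("input" is more accurately called Δinput, where Δ is measured against the reference). For DNA sequence inputs, we found that Guided Backprop × input performed better than plain Guided Backprop; we therefore used the former.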
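The background sampling and the expected-frequency reference can be sketched with the standard library alone. Motif insertion is omitted, and the helper names, the channel order A, C, G, T and the seed are choices made for this example.

```python
import random

ALPHABET = "ACGT"
BACKGROUND = [0.3, 0.2, 0.2, 0.3]  # P(A), P(C), P(G), P(T) from the text

def sample_background(length=200, rng=None):
    """Sample a background DNA sequence with the given base frequencies."""
    rng = rng or random.Random()
    return "".join(rng.choices(ALPHABET, weights=BACKGROUND, k=length))

def one_hot(seq):
    """One-hot encode a sequence as rows of 4 channels (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq]

def delta_from_reference(seq):
    """Difference between the one-hot input and the expected-frequency reference."""
    return [[v - p for v, p in zip(row, BACKGROUND)] for row in one_hot(seq)]

seq = sample_background(rng=random.Random(0))
delta = delta_from_reference(seq)
```

Each row of `delta` is the Δinput that multiplier-based scores would be applied to at that position; because the reference frequencies sum to 1, each row sums to zero.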

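Given a particular subsequence, it is possible to compute the log-odds score that the subsequence was sampled from a particular motif rather than from the background distribution of ACGT. To evaluate the importance-scoring methods, we found the top 5 matches (ranked by log-odds score) to each motif for each sequence in the test set, together with the total importance that each method allocated to the match for each task. The results are shown in Fig. 5 (for TAL1) and Appendix E (for GATA1). Ideally, an importance-scoring method should show the following properties: (1) high scores for TAL1 motifs on task 2 and (2) low scores for TAL1 on task 1, with (3) higher scores corresponding to stronger log-odds matches; an analogous pattern for GATA1 motifs (high for task 1, low for task 2); (4) high scores for both TAL1 and GATA1 motifs on task 0, with (5) higher scores on sequences containing both kinds of motifs than on sequences containing only one kind (revealing cooperativity; this corresponds to red dots lying above green dots in Fig. 5).

Guided Backprop × input fails (2) by assigning positive importance to TAL1 on task 1 (see Appendix H for an example sequence). It also fails (4), as it does not identify cooperativity in task 0 (red dots overlay green dots). Both Guided Backprop × input and gradient × input behave suboptimally with respect to (3): importance increases suddenly when the log-odds score is around 7, with little differentiation at higher log-odds scores (the other methods show a more gradual increase). As a result, Guided Backprop × input and gradient × input can assign unduly high importance to weak motif matches (Fig. 6). This is a practical consequence of the thresholding problem from Fig. 2. The large discontinuous jumps in the gradient also inflate the scores relative to other methods (note the scale on the y-axes).

We explored three versions of DeepLIFT: Rescale at all nonlinearities (DeepLIFT-Rescale), RevealCancel at all nonlinearities (DeepLIFT-RevealCancel), and Rescale at the convolutional layers with RevealCancel at the fully connected layer (DeepLIFT-fc-RC-conv-RS). In contrast to the results on MNIST, we found that DeepLIFT-fc-RC-conv-RS reduced noise relative to pure RevealCancel. We think this is due to the noise-suppression property discussed at the end of Section 3.5.3; if the convolutional layers act like motif detectors, the input to convolutional neurons that do not fire may just be noise, and importance should not be propagated to it (see Fig. 6 for an example sequence).

Gradient × inp, integrated gradients and DeepLIFT-Rescale occasionally miss the relevance of TAL1 for task 0 (Fig. 5b), which is corrected by using RevealCancel on the fully connected layer (see the example sequence in Fig. 6). Note that the RevealCancel scores appear to be tiered. As illustrated in Appendix I, this is related to having multiple instances of a given motif in a sequence (for example, when there are multiple TAL1 motifs, the importance assigned to the presence of TAL1 is distributed across all of them).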

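Fig. 6. RevealCancel highlights both TAL1 and GATA1 motifs for task 0.

(a) PWM representations of the GATA1 and TAL1 motifs used in the simulation. (b) Scores for an example sequence containing both TAL1 and GATA1 motifs. Letter height reflects the score. The blue box marks the location of the embedded GATA1 motif and the green box the location of the embedded TAL1 motif. The red underline marks a chance occurrence of a weak match to TAL1 (CAGTTG instead of CAGATG). Both TAL1 and GATA1 motifs should be highlighted for task 0. RevealCancel on only the fully connected layer reduces noise compared with RevealCancel on all layers.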

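5. Conclusion

We have presented DeepLIFT, an approach for computing importance scores that explains the difference of the output from some "reference" output in terms of the differences of the inputs from their "reference" inputs. Using the difference-from-reference lets information propagate even when the gradient is zero (Fig. 1), which could be particularly useful in recurrent neural networks, where saturating activations such as sigmoid or tanh are popular. DeepLIFT avoids placing potentially misleading importance on bias terms (in contrast to gradient*input; see Fig. 2). By treating positive and negative contributions separately, the DeepLIFT-RevealCancel rule can identify dependencies missed by other methods (Fig. 3). Open questions include (a) applying DeepLIFT to RNNs, (b) computing a good reference empirically from the data, and (c) how best to propagate importance through "max" operations (as in Maxout or Maxpooling neurons) beyond simply using the gradients.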

[1] Bach, Sebastian, Binder, Alexander, Montavon, Grégoire, Klauschen, Frederick, Müller, Klaus-Robert, and Samek, Wojciech. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One, 10(7):e0130140, 10 July 2015.

[2] Chollet, François. Keras. https://github.com/fchollet/keras, 2015.

[3] Kheradpour, Pouya and Kellis, Manolis. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Research, 42(5):2976–2987, 2014.

[4] Kindermans, Pieter-Jan, Schütt, Kristof, Müller, Klaus-Robert, and Dähne, Sven. Investigating the influence of noise and distractors on the interpretation of neural networks. CoRR, abs/1611.07270, 2016. URL https://arxiv.org/abs/1611.07270.

[5] LeCun, Yann, Cortes, Corinna, and Burges, Christopher J.C. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1999.

[6] Lundberg, Scott and Lee, Su-In. An unexpected unity among methods for interpreting model predictions. CoRR, abs/1611.07478, 2016. URL http://arxiv.org/abs/1611.07478.

[7] Selvaraju, Ramprasaath R., Das, Abhishek, Vedantam, Ramakrishna, Cogswell, Michael, Parikh, Devi, and Batra, Dhruv. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. CoRR, abs/1610.02391, 2016. URL http://arxiv.org/abs/1610.02391.

[8] Shrikumar, Avanti, Greenside, Peyton, Shcherbina, Anna, and Kundaje, Anshul. Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713, 2016.

[9] Simonyan, Karen, Vedaldi, Andrea, and Zisserman, Andrew. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[10] Springenberg, Jost Tobias, Dosovitskiy, Alexey, Brox, Thomas, and Riedmiller, Martin A. Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806, 2014. URL http://arxiv.org/abs/1412.6806.

[11] Sundararajan, Mukund, Taly, Ankur, and Yan, Qiqi. Gradients of counterfactuals. CoRR, abs/1611.02639, 2016. URL http://arxiv.org/abs/1611.02639.

[12] Zeiler, Matthew D. and Fergus, Rob. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901, 2013. URL http://arxiv.org/abs/1311.2901.

[13] Zhou, Jian and Troyanskaya, Olga G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods, 12:931–4, October 2015. ISSN 1548-7105. doi: 10.1038/nmeth.3547.

[14] Zintgraf, Luisa M, Cohen, Taco S, Adel, Tameem, and Welling, Max. Visualizing deep neural network decisions: Prediction difference analysis. ICLR, 2017. URL https://openreview.net/pdf?id=BJ5UeU9xx
