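Introducing a customizable and interactive decision tree framework written in Python. This implementation is suited to extracting knowledge from data, testing your intuition, getting a better understanding of the inner workings of decision trees, and exploring alternative cause-and-effect relationships in your learning problem. It can be used as part of more complex algorithms, for visualization and reports, for any research purpose, and as an accessible platform for quickly testing ideas about decision tree algorithms.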
TL;DR
- HDTree repository
- The complementary notebook in the examples directory of the repository is here (every figure shown in this post is generated in that notebook). You can create the illustrations yourself.
What is this post about?
Yet another implementation of decision trees, which I wrote as part of my thesis. The post is split into three parts:
- I explain why I took the time to come up with my own implementation of decision trees. I list some of its features, but also the shortcomings of the current implementation.
- I show the basic usage of HDTree with some code snippets, explaining details along the way.
- Hints on how to customize and extend HDTree with your own ideas.
Motivation and background
For my thesis I started working with decision trees. My current goal is to implement a human-centered ML model in which HDTree (Human Decision Tree, for that matter) is one additional ingredient, applied as part of an actual user interface for that model. This post is solely about HDTree, but I may write a follow-up describing the other components in detail.
HDTree features and comparison with scikit-learn decision trees
Naturally, I stumbled upon the scikit-learn implementation of decision trees [4]. The scikit-learn implementation has many advantages:
- It is fast and streamlined.
- It is written in the Cython dialect. Cython compiles to C code (which in turn compiles to binary) while staying interoperable with the Python interpreter.
- It is simple and convenient.
- Many people in ML know how to work with scikit-learn models, so thanks to its user base you can get help everywhere.
- It is battle-tested (many people use it).
- It just works.
- It supports a variety of pre-pruning and post-pruning techniques [6] and provides many features (for example, minimal cost-complexity pruning and sample weights).
- It supports basic rendering [7].
However, it certainly has some shortcomings:
- It is not trivial to modify, partly because of the rather uncommon Cython dialect (see the advantages above).
- There is no way to take the user's domain knowledge into account or to alter the learning process.
- The visualization is quite minimal.
- Categorical features are not supported.
- Missing values are not supported.
- The interface for accessing nodes and traversing the tree is cumbersome and unintuitive.
- Binary splits only (see below).
- No multivariate splits (see below).
HDTree features
HDTree offers a solution to most of these problems, but it sacrifices many of the advantages of the scikit-learn implementation. We will come back to these points later, so don't worry if you don't understand the whole list yet.
- It lets you interact with the learning behavior.
- The core components are modular, and extending them (by implementing an interface) is fairly easy.
- It is written in pure Python (more approachable).
- It has rich visualization.
- It supports categorical data.
- It supports missing values.
- It supports multivariate splits.
- It has a convenient interface for navigating the tree structure.
- It supports n-ary splits (more than 2 child nodes).
- It provides textual representations of decisions.
- It encourages explainability by printing human-readable text.
Cons:
- It is slow.
- It is not battle-tested.
- The software quality is mediocre.
- There are not many pruning options, although the implementation supports some basic parameters.
The list of shortcomings is short, but they are critical. Let's make this clear right away: do not feed big data to this implementation, or you will wait forever. Do not use it in production, because it may break unexpectedly. You have been warned! Some of the issues above may be solved over time. However, training will probably remain slow (inference is fine). Fixing that would require a better solution, and you are very welcome to contribute. So what are the possible applications?
- Extracting knowledge from data.
- Testing the intuition you have about your data.
- Understanding the inner workings of decision trees.
- Exploring alternative causal relationships related to your learning problem.
- Using it as part of more complex algorithms.
- Creating reports and visualizations.
- Using it for any research purpose.
- Using it as an accessible platform for quickly testing ideas about decision tree algorithms.
Structure of decision trees
This article does not cover decision trees in detail, but it recaps their main building blocks. That gives us a basis for understanding the examples later and highlights some of HDTree's features. The following figure shows actual HDTree output (except for the markers).
Nodes
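- ai: the test (split rule) applied at this node; its possible outcomes are the edges described under bi. The rest of this description, which ends with a pointer to part 3, did not survive in the source text.
- aii: the description of this marker did not survive in the source text; only its closing mention of HDTree remains.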
- aiii: the border of a node indicates how many data points pass through it. The thicker the border, the more data flows through the node.
- aiv: the list of prediction targets and labels of the data points that pass through this node. The most common class is marked.
- av: optionally, the visualization can mark the path that an individual data point follows (illustrating the decisions made as the data point passes through the tree). The path is marked with a line at the corner of the decision tree.
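Edges
- bi: an arrow connects each possible outcome of the split (ai) with its child node. The more data, relative to the parent, flows along an edge, the thicker it is drawn.
- bii: each edge carries a human-readable text representation of the corresponding split outcome.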
Where do the different splits and tests come from?
At this point you may wonder how HDTree differs from a scikit-learn tree (or any other implementation) and why we would want different kinds of splits. Let's try to clear this up. You probably have an intuitive understanding of feature space. All the data we work with lives in some multi-dimensional space, which is determined by the number and type of the features in the data. The task of a classification algorithm is to partition this space into non-overlapping regions and to assign a class to each region. Let's visualize that. Since our brains have trouble handling high dimensions, we stick to a 2D example and a very simple two-class problem like the following:
You see a very simple data set made up of two dimensions (features/attributes) and two classes. The generated data points are normally distributed around the center. A street, which is just the linear function f(x) = y, separates the two classes: class 1 (lower right) and class 2 (upper left). Some random noise was also added (blue data points in the orange area and vice versa) to illustrate the effects of overfitting later on. The job of a classification algorithm like HDTree (although it can also be used for regression tasks) is to find out which class each data point belongs to. In other words, given a pair of coordinates (x, y) such as (6, 2), the goal is to determine whether that coordinate belongs to the orange class 1 or the blue class 2. A discriminative model will try to divide the feature space (here, the (x, y) axes) into blue and orange regions.
Given this data, the decision (rule) for classifying the data seems very easy. A reasonable person would say (think about it yourself first): "It is class 1 if x > y, otherwise class 2." The function y=x, drawn as a dotted line, creates a perfect separation. Indeed, a maximum-margin classifier such as a support vector machine [8] would suggest a similar solution. But let's see how decision trees solve the problem differently:
The image shows the regions in which a standard decision tree of increasing depth classifies a data point as class 1 (orange) or class 2 (blue).
A decision tree approximates the linear function with a step function. This is due to the type of test and split rule that decision trees use. They all work according to the pattern attribute < threshold, which results in axis-parallel hyperplanes. In 2D space, rectangles are cut out; in 3D these would be cuboids, and so on. Furthermore, with 8 levels the decision tree already starts to model the noise in the data, that is, it overfits. Yet it never finds a good approximation of the actual linear function. To verify this, I used a typical 2-to-1 split into training and test data and calculated the accuracy of the trees: 93.84%, 93.03%, and 90.81% on the test set and 94.54%, 96.57%, and 98.81% on the training set (ordered by tree depths 4, 8, and 16). The test accuracy decreases while the training accuracy increases.
Rising training performance together with falling test results is a hallmark of overfitting. The resulting decision trees are also quite complex for such a simple function. Even the simplest one (depth 4), rendered with scikit-learn, already looks like this:
I will spare you the more complex trees. In the next section we start solving this problem with the HDTree package. HDTree lets the user apply knowledge about the data (such as the knowledge about linear separability in the example). It also lets you find alternative solutions to the problem.
Applying the HDTree package
This section introduces the basics of HDTree and touches on some parts of its API. Feel free to ask questions in the comments or to contact me directly. I will gladly answer and extend the article where needed. Installing HDTree is a little more complicated than pip install hdtree, sorry. First, you need Python 3.5 or newer.
- Create an empty directory and, inside it, a folder named hdtree (your_folder/hdtree).
- Clone the repository into the hdtree directory (not into another subdirectory).
- Install the required dependencies: numpy, pandas, graphviz, sklearn.
- Add your_folder to your PYTHONPATH. This adds the directory to Python's import mechanism, so you can use hdtree like a normal Python package.
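If you would rather not change PYTHONPATH globally, the last step can also be done per session from inside Python. The following is a minimal sketch, assuming the placeholder directory name your_folder from the steps above; it only touches sys.path and does not import or verify hdtree itself.

```python
import sys
from pathlib import Path

# "your_folder" is the placeholder directory name from the steps above;
# it is expected to contain the cloned "hdtree" folder.
your_folder = Path("your_folder").resolve()

# Make the directory visible to Python's import machinery for this session only.
if str(your_folder) not in sys.path:
    sys.path.insert(0, str(your_folder))

print(your_folder)
```

The change lasts only for the running interpreter, which makes it convenient for notebooks.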
Alternatively, add hdtree to the site-packages folder of your python installation. I may add an install file later. At the time of writing, the code is not available in the pip repository. All the code that generates the graphics and outputs below (and those shown earlier) is in the repository and is hosted directly here.
Solving the linear problem with a one-level tree
Let's start right away with the code:
from hdtree import HDTreeClassifier, SmallerThanSplit, EntropyMeasure
hdtree_linear = HDTreeClassifier(allowed_splits=[SmallerThanSplit.build()],  # split rule of the form a < b
                                 information_measure=EntropyMeasure(),  # use information gain for the scores
                                 attribute_names=['x', 'y'])  # give the attributes some interpretable names
# standard sklearn-like interface
hdtree_linear.fit(X_street_train, y_street_train)
# create tree graph
hdtree_linear.generate_dot_graph()
Yes, the resulting tree is only one level high, and it provides a perfect solution to this problem. It is an artificial example meant to show the effect, but I hope it makes the point clear: having an intuition about the data, or simply offering the decision tree different options for partitioning the feature space, can yield a simpler and sometimes even more accurate solution. Imagine you had to interpret the rules of the trees shown here to find useful information. Which interpretation would you understand first, and which would you trust more: the complex one with its multi-step function, or the small accurate tree? I think the answer is quite easy. But let's dig a little deeper into the code itself. When initializing HDTreeClassifier, the most important thing you have to provide is allowed_splits. Here you pass a list of the possible split rules that the algorithm tries at each node during training in order to find a good local partitioning of the data. In this case we provided SmallerThanSplit exclusively. This split does exactly what you see: it takes two attributes (trying every combination) and splits the data according to the schema a_i < a_j, which (not too coincidentally) matches our data as well as possible.
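To make the difference between a single-attribute threshold and the a_i < a_j schema concrete, here is a minimal, self-contained sketch in plain Python. It does not use HDTree and is not how SmallerThanSplit is implemented: the noise-free toy data, the extra irrelevant attribute z, the accuracy-based scoring, and all function names are assumptions made for illustration.

```python
import random
from itertools import permutations

random.seed(0)

# Toy "street" data: class 1 if x > y, else class 2; z is an irrelevant attribute.
names = ["x", "y", "z"]
points = [tuple(random.gauss(0, 1) for _ in names) for _ in range(300)]
labels = [1 if x > y else 2 for x, y, _ in points]


def accuracy_of_partition(mask, labels):
    """Accuracy when each side of a binary partition predicts its majority class."""
    correct = 0
    for side in (True, False):
        side_labels = [lab for m, lab in zip(mask, labels) if m == side]
        if side_labels:
            correct += max(side_labels.count(c) for c in set(side_labels))
    return correct / len(labels)


def best_threshold_split(points, labels):
    """Univariate rule 'attribute < threshold', trying every observed value."""
    best = (0.0, None)
    for i in range(len(names)):
        for t in sorted({p[i] for p in points}):
            acc = accuracy_of_partition([p[i] < t for p in points], labels)
            if acc > best[0]:
                best = (acc, f"{names[i]} < {t:.2f}")
    return best


def best_smaller_than_split(points, labels):
    """Multivariate rule 'a_i < a_j', trying every ordered pair of attributes."""
    best = (0.0, None)
    for i, j in permutations(range(len(names)), 2):
        acc = accuracy_of_partition([p[i] < p[j] for p in points], labels)
        if acc > best[0]:
            best = (acc, f"{names[i]} < {names[j]}")
    return best


threshold_acc, threshold_rule = best_threshold_split(points, labels)
pair_acc, pair_rule = best_smaller_than_split(points, labels)
print("best threshold rule:", threshold_rule, round(threshold_acc, 3))
print("best pair rule:     ", pair_rule, round(pair_acc, 3))
```

On this noise-free data the pair rule separates the classes perfectly with a single test, while the best single threshold stays well below that. This is the reason a conventional tree needs many levels to imitate the diagonal.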
This type of split is called a multivariate split, meaning that the split uses more than one feature to make its separation decision. This is unlike the univariate splits used in most other trees, such as the scikit-tree (see above for details), which take exactly one attribute into account. Of course, HDTree also has options for the "normal" kind of split found in scikit-trees: the family of QuantileSplit rules. You will see more of them as the article progresses. The other unfamiliar thing you may notice in the code is the hyperparameter information_measure. This parameter is the measure used to evaluate the value of a single node or of a complete split (a parent node with its children). The option chosen here is based on entropy [10]. You may also have heard of the Gini coefficient, which would be another valid option. Of course, you can provide your own measure simply by implementing the appropriate interface. If you like, implement a Gini index, which you can then use in the tree without reimplementing anything else. Just copy EntropyMeasure() and adapt it to your needs.
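For readers who have not met these measures before, the following plain-Python sketch shows the textbook definitions of entropy, Gini impurity, and information gain. It is only an illustration under stated assumptions: the function names are made up here, and HDTree's EntropyMeasure may scale or normalize its scores differently (its printed node scores, for instance, report 1 for a pure node).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = ["survived"] * 4 + ["death"] * 4
pure_split = [["survived"] * 4, ["death"] * 4]
useless_split = [["survived", "survived", "death", "death"]] * 2

print(information_gain(parent, pure_split))            # split removes all uncertainty
print(information_gain(parent, useless_split))         # children as mixed as the parent
print(information_gain(parent, pure_split, impurity=gini))
```

Swapping the impurity function is the only change needed to move from entropy to Gini, which mirrors the idea of an exchangeable measure described above.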
Let's dig deeper with the Titanic disaster. I love learning from examples, so here you will see some more HDTree features on a concrete example rather than on generated data.
Data set
We use the famous beginner-level machine learning data set: the Titanic disaster data set. It is a fairly simple data set that is not too big, yet not completely trivial either. It contains several different data types as well as missing values. It is also easy for humans to understand, and many people have already worked with it. The data looks like this:
You can see attributes of all kinds: numerical, categorical, integer types, and even missing values (look at the Cabin column). The task is to predict whether a passenger survived the Titanic disaster based on the available passenger information. A description of the attribute values can be found here. When you study ML tutorials that use this data set, you do all sorts of preprocessing so that common machine learning models can work with it: removing missing values (NaN) by imputing [12] values or dropping rows/columns, one-hot encoding [13] categorical data (for example Embarked and Sex), or binning data to obtain a valid data set that the ML model accepts. Technically, HDTree does not need this kind of cleanup. You can feed the data as is, and the model will happily accept it. Change the data only when you do actual feature engineering. I simplified everything to get started.
Training the first HDTree on the Titanic data
Let's take the data as is and feed it to the model. The basic code is similar to the code above, but in this example many more kinds of data splits are allowed.
hdtree_titanic = HDTreeClassifier(allowed_splits=[FixedValueSplit.build(), # e.g., Embarked = 'C'
SingleCategorySplit.build(), # e.g., Embarked -> ['C', 'Q', 'S']
TwentyQuantileRangeSplit.build(), # e.g., IN Quantile 3-5
TwentyQuantileSplit.build()], # e.g., BELOW Quantile 7
information_measure=EntropyMeasure(),
attribute_names=col_names,
max_levels=3) # restrict to grow to a max of 3 levels
hdtree_titanic.fit(X_titanic_train.values, y_titanic_train.values)
hdtree_titanic.generate_dot_graph()
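Let's take a closer look at what is happening. We created a decision tree with three levels, which chose to use 3 of the 4 possible SplitRules. They are marked with the letters S1, S2, and S3. I will briefly explain what they do.
- S1: FixedValueSplit. This split works on categorical data and picks one of the possible values. The data is then split into one part that has this value and another part that does not, for example Pclass = 1 and Pclass ≠ 1.
- S2: (Twenty)QuantileRangeSplit. Most of this description did not survive in the source text. From the remaining fragments and the code comment above (IN Quantile 3-5), the split works on a range of quantiles, for example quantiles 1 to 5, chosen with respect to the information measure (measure_information), and divides the data into (i) values inside the range and (ii) values outside it.
- S3: (Twenty)QuantileSplit. Similar to the range split (S2), but it splits the data on a threshold value. This is basically what regular decision trees do, except that they usually try all possible thresholds rather than a fixed number of them.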
You may have noticed that SingleCategorySplit is not involved. I will take the chance to explain it anyway, because the omission of this split comes up again later:
- S4: SingleCategorySplit works like FixedValueSplit but creates a child node for every possible value. For the PClass attribute, for example, that gives 3 child nodes (one each for class 1, class 2, and class 3). Note that FixedValueSplit is identical to SingleCategorySplit when there are only two possible categories.
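The partitions these rules produce can be sketched in a few lines of plain Python. This is not HDTree code: the function names, the toy rows, and in particular the choice to set rows with a missing value aside are assumptions for illustration only.

```python
from collections import defaultdict

passengers = [
    {"Pclass": 1, "Fare": 80.0},
    {"Pclass": 2, "Fare": 19.5},
    {"Pclass": 3, "Fare": 7.9},
    {"Pclass": 3, "Fare": None},   # missing value
    {"Pclass": 1, "Fare": 150.0},
]


def fixed_value_partition(rows, attribute, value):
    """Binary split (like S1): rows matching `value` versus all other rows."""
    matching = [r for r in rows if r[attribute] == value]
    others = [r for r in rows if r[attribute] != value]
    return matching, others


def single_category_partition(rows, attribute):
    """n-ary split (like S4): one child per distinct value of the attribute."""
    children = defaultdict(list)
    for r in rows:
        children[r[attribute]].append(r)
    return dict(children)


def range_partition(rows, attribute, low, high):
    """Binary split on the half-open range [low, high[ (like S2); missing values set aside."""
    known = [r for r in rows if r[attribute] is not None]
    inside = [r for r in known if low <= r[attribute] < high]
    outside = [r for r in known if not low <= r[attribute] < high]
    missing = [r for r in rows if r[attribute] is None]
    return inside, outside, missing


first_class, not_first_class = fixed_value_partition(passengers, "Pclass", 1)
by_class = single_category_partition(passengers, "Pclass")
inside, outside, missing = range_partition(passengers, "Fare", 10.0, 100.0)
print(len(first_class), len(not_first_class))
print(sorted(by_class))
print(len(inside), len(outside), len(missing))
```

With only two distinct values in a column, the first two functions would produce the same two groups, which is the equivalence noted for S1 and S4.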
The individual splits are somewhat "smart" with respect to the data types and values they accept. To some extent, they know under which circumstances they apply and when they do not. The tree was trained with a 2-to-1 split into training and test data. Its accuracy is 80.37% on the training data and 81.69% on the test data. Not that bad.
Restricting splits
Suppose you are not quite happy with the solution that was found, for whatever reason. Maybe you decide that the very first split at the top of the tree is too trivial (a split on the attribute sex). HDTree has you covered. The easiest solution is to prevent FixedValueSplit (and, for that matter, the equivalent SingleCategorySplit) from appearing at the top. That is quite easy. Change the initialization of the splits as follows:
- SNIP -
...allowed_splits=[FixedValueSplit.build_with_restrictions(min_level=1),
SingleCategorySplit.build_with_restrictions(min_level=1),...],
- SNIP -
I show the resulting HDTree as a whole, since the previously missing split (S4) can now be seen in the newly generated tree. By preventing the split on sex from appearing at the root with the parameter min_level=1 (hint: you can, of course, also specify max_level), we restructured the tree completely. Its accuracy is now 80.37% and 81.69% (training/test). It did not change at all, even though we took away the supposedly best split at the root node.
Because decision trees are built greedily, only the locally best split is found at each node, which is not necessarily the globally best option. In fact, finding an ideal solution to the decision tree problem is NP-complete, as proven in [15]. So the best we can ask for are heuristics. Back to the example: note that we already have a non-trivial insight into the data. It is trivial to say that men have a low chance of survival. It is less trivial to conclude that being in first or second class (PClass) and departing from Cherbourg (Embarked=C) may increase your chance of survival, or that your chances also increase if you are a man in PClass 3 and younger than 33. Remember: women and children first. It is a good exercise to draw these conclusions yourself by interpreting the visualization. These conclusions were only possible because we restricted the tree. Who knows what else might be uncovered by applying other restrictions? Try it out!
As a last example of this kind, I want to show how to restrict splits to particular attributes. This is useful not only for keeping the tree from learning unwanted correlations or for forcing alternative solutions, but also for narrowing the search space. This approach can cut the run time dramatically, especially when multivariate splits are used. If you go back to the previous example, you may spot a node that checks the attribute PassengerId. We may not want to model that, since it should at least not contribute any information about survival. A check on the passenger ID may be a sign of overfitting. Let's change the situation with the parameter blacklist_attribute_indices.
- SNIP -
...allowed_splits=[TwentyQuantileRangeSplit.build_with_restrictions(blacklist_attribute_indices=['PassengerId']),
FixedValueSplit.build_with_restrictions(blacklist_attribute_indices=['Name Length']),
...],
- SNIP -
You may ask why name length shows up at all. Consider that long names (double names or [noble] titles) may indicate a wealthy background, which increases the chance of survival.
Additional hint: you can always add the same SplitRule twice. If you want to blacklist an attribute only on certain levels of the HDTree, simply add the same SplitRule a second time without the level restriction.
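How level limits and a blacklist could combine when the same rule is registered twice can be sketched as follows. This is a hypothetical model, not HDTree's implementation: the RuleRestrictions class, its allows method, and the may_split helper are invented here, and the root is assumed to be level 0, as in the text view of the tree.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class RuleRestrictions:
    """Hypothetical container for the restrictions discussed above."""
    min_level: int = 0
    max_level: Optional[int] = None
    blacklist: Set[str] = field(default_factory=set)

    def allows(self, level: int, attribute: str) -> bool:
        """True if a rule may test `attribute` at tree depth `level` (root = 0)."""
        if level < self.min_level:
            return False
        if self.max_level is not None and level > self.max_level:
            return False
        return attribute not in self.blacklist


# The same rule type registered twice: blacklisted near the root, unrestricted deeper down.
near_root = RuleRestrictions(max_level=1, blacklist={"PassengerId"})
deeper = RuleRestrictions(min_level=2)


def may_split(level: int, attribute: str) -> bool:
    """A split is possible if at least one registered copy of the rule allows it."""
    return any(r.allows(level, attribute) for r in (near_root, deeper))


for level in range(3):
    print(level, may_split(level, "PassengerId"), may_split(level, "Sex"))
```

In this model PassengerId is excluded on levels 0 and 1 only, while every other attribute stays available on all levels.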
Predicting data points
As you may have noticed already, you can use the common scikit-learn interface for prediction: predict(), predict_proba(), and also score(). But you can go further. There is also explain_decision(), which prints a textual representation of the decision.
print(hdtree_titanic_3.explain_decision(X_titanic_train[42]))
This assumes the tree from the last modification above. The code prints the following:
Query:
{'PassengerId': 273, 'Pclass': 2, 'Sex': 'female', 'Age': 41.0, 'SibSp': 0, 'Parch': 1, 'Fare': 19.5, 'Cabin': nan, 'Embarked': 'S', 'Name Length': 41}
Predicted sample as "Survived" because of:
Explanation 1:
Step 1: Sex doesn't match value male
Step 2: Pclass doesn't match value 3
Step 3: Fare is OUTSIDE range [134.61, ..., 152.31[(19.50 is below range)
Step 4: Leaf. Vote for {'Survived'}
This works for missing data as well. Let's set the attribute at index 2 (Sex) to missing (None):
passenger_42 = X_titanic_train[42].copy()
passenger_42[2] = None
print(hdtree_titanic_3.explain_decision(passenger_42))
Query:
{'PassengerId': 273, 'Pclass': 2, 'Sex': None, 'Age': 41.0, 'SibSp': 0, 'Parch': 1, 'Fare': 19.5, 'Cabin': nan, 'Embarked': 'S', 'Name Length': 41}
Predicted sample as "Death" because of:
Explanation 1:
Step 1: Sex has no value available
Step 2: Age is OUTSIDE range [28.00, ..., 31.00[(41.00 is above range)
Step 3: Age is OUTSIDE range [18.00, ..., 25.00[(41.00 is above range)
Step 4: Leaf. Vote for {'Death'}
---------------------------------
Explanation 2:
Step 1: Sex has no value available
Step 2: Pclass doesn't match value 3
Step 3: Fare is OUTSIDE range [134.61, ..., 152.31[(19.50 is below range)
Step 4: Leaf. Vote for {'Survived'}
---------------------------------
This prints all the decision paths (there is more than one, because at some nodes no decision can be made!). The final result is the most common class among all the leaves that were reached.
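The behavior described here (follow every child when a value is missing, then vote over the leaves that were reached) can be sketched with a tiny hand-built tree. This is an illustration in plain Python, not HDTree's code: the dictionary-based tree, the helper names, and the toy rules are assumptions, and ties between classes are broken arbitrarily in this sketch.

```python
from collections import Counter


def leaf(label):
    return {"label": label}


def node(attribute, test, yes, no):
    """Binary node; `test` maps an attribute value to True or False."""
    return {"attribute": attribute, "test": test, "yes": yes, "no": no}


def collect_leaves(tree, sample):
    """Follow the sample down the tree; if a tested value is missing, follow every child."""
    if "label" in tree:
        return [tree["label"]]
    value = sample.get(tree["attribute"])
    if value is None:
        return collect_leaves(tree["yes"], sample) + collect_leaves(tree["no"], sample)
    branch = "yes" if tree["test"](value) else "no"
    return collect_leaves(tree[branch], sample)


def predict(tree, sample):
    """Most common class among all leaves that were reached."""
    return Counter(collect_leaves(tree, sample)).most_common(1)[0][0]


toy_tree = node(
    "Sex", lambda v: v == "male",
    yes=node("Age", lambda v: v < 18, yes=leaf("Survived"), no=leaf("Death")),
    no=node("Pclass", lambda v: v == 3, yes=leaf("Death"), no=leaf("Survived")),
)

complete = {"Sex": "female", "Pclass": 2, "Age": 41.0}
sex_missing = {"Sex": None, "Pclass": 2, "Age": 41.0}
sex_missing_third_class = {"Sex": None, "Pclass": 3, "Age": 41.0}

print(collect_leaves(toy_tree, complete))
print(collect_leaves(toy_tree, sex_missing))
print(predict(toy_tree, sex_missing_third_class))
```

With Sex missing, both subtrees are explored, so two leaves are collected instead of one, just as the output above shows two explanations.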
...and other useful things
You can go ahead and get a view of the tree as text:
Level 0, ROOT: Node having 596 samples and 2 children with split rule "Split on Sex equals male" (Split Score:
0.251)
-Level 1, Child #1: Node having 390 samples and 2 children with split rule "Age is within range [28.00, ..., 31.00["
(Split Score: 0.342)
--Level 2, Child #1: Node having 117 samples and 2 children with split rule "Name Length is within range [18.80,
..., 20.00[" (Split Score: 0.543)
---Level 3, Child #1: Node having 14 samples and no children with
- SNIP -
Or access all clean nodes (those with a high score):
[str(node) for node in hdtree_titanic_3.get_clean_nodes(min_score=0.5)]
['Node having 117 samples and 2 children with split rule "Name Length is within range [18.80, ..., 20.00[" (Split
Score: 0.543)',
'Node having 14 samples and no children with split rule "no split rule" (Node Score: 1)',
'Node having 15 samples and no children with split rule "no split rule" (Node Score: 0.647)',
'Node having 107 samples and 2 children with split rule "Fare is within range [134.61, ..., 152.31[" (Split Score:
0.822)',
'Node having 102 samples and no children with split rule "no split rule" (Node Score: 0.861)']
Extending HDTree
The most important thing you may want to add to the system is your own SplitRule. A split rule can really do whatever it wants in order to separate the data. You implement a SplitRule by implementing AbstractSplitRule. This is a bit tricky, because you have to handle data intake, performance evaluation, and so on yourself. For these reasons, the package contains mixins that you can add to your implementation depending on the type of split. The mixins do most of the hard part for you.
Bibliography
- [1] Wikipedia article on Decision Trees
- [2] Medium 101 article on Decision Trees
- [3] Breiman, Leo, Joseph H Friedman, R. A. Olshen and C. J. Stone. "Classification and Regression Trees." (1983).
- [4] scikit-learn documentation: Decision Tree Classifier
- [5] Cython project page
- [6] Wikipedia article on pruning
- [7] sklearn documentation: plot a Decision Tree
- [8] Wikipedia article Support Vector Machine
- [9] MLExtend Python library
- [10] Wikipedia Article Entropy in context of Decision Trees
- [12] Wikipedia Article on imputing
- [13] Hackernoon article about one-hot-encoding
- [14] Wikipedia Article about Quantiles
- [15] Hyafil, Laurent; Rivest, Ronald L. "Constructing optimal binary decision trees is NP-complete" (1976)
- [16] Hackernoon Article on Decision Trees