As data scientists, our main focus is on processing data and on developing and improving machine learning models. There is a common view that data processing is the most time-consuming stage of the whole project and that the accuracy of the model determines the success of a data product. However, the industry is now in a period of transition "from the age of discovery to the age of implementation" in artificial intelligence (Kai-Fu Lee, "AI Superpowers: China, Silicon Valley, and the New World Order"). The picture is getting wider: the focus is shifting from building the model to delivering the model to users as a service, and from the model's performance to its business value. The best-known example is Netflix, which never used the winning model of its $1 million algorithm prize because of the engineering costs, despite the significant performance gain that model promised (WIRED).
From understanding to reality: a slide from the Strata Data Conference talk "Kubeflow explained: Portable machine learning on Kubernetes"
Deploying the model is extremely important. A data product can be treated as a software product because the two share a similar project structure, management, and lifecycle. We are therefore entitled to use every technique known in software development to deploy machine learning models to production.
Containerization is a widely used method for deploying software products to both cloud platforms and local servers. In essence, it means packaging code and its dependencies into a box called a container. Here is the definition of a container in the context of software development.
From the Docker website:
A container is a standard unit of software that packages up code and all its dependencies, so that the application runs quickly and reliably across different computing environments.
Docker is a platform that helps speed up the development, containerization, and deployment of machine learning models to other environments. In this series of articles I will show how to save a model, use it as an API endpoint, containerize the ML application, and run it on the Docker engine.
Question 1: "Why Docker?"
Before you start, register for a Docker ID if you do not have one, then use that ID to download Docker and install it on your machine.
When I first started working at a bank, I was assigned a project that involved data processing, and the first MVP (minimum viable product) had to be delivered within a month. That sounds stressful, but our team uses Agile methods to develop all of its major products. The main goal of this MVP was to test hypotheses about the practicality and effectiveness of the product (for more on Agile methods, see Eric Ries's book "The Lean Startup"). My manager wanted me to deploy the model on his laptop, that is, to run the model there and use it for predictions.
If you picture all the steps needed to prepare the manager's laptop to run the project, you may end up with a lot of questions, such as:
- He uses both a Macbook and a ThinkPad, so which operating system does the model need to run on? Of course, I could simply ask him, but let's assume that at that moment in my life my boss was a very difficult person who did not want me to know this. (This thought is only here to make you aware of the operating system dependency problem; my boss is actually a really nice person.)
- Second question: "Does he have Python installed?" If so, which version, 2 or 3? Is it 2.6, 2.7, or 3.7?
- What about required packages such as scikit-learn, pandas, and numpy? Does he have the same versions that I have on my machine?
With all these questions in mind, this is what I would have had to do on his computer to run my model:
- Install Python.
- Install all the packages.
- Set the environment variables.
- Transfer the code to the machine.
- Run the code with the required parameters.
All of these steps take a lot of effort, and running the code in a different environment carries a risk of incompatibility.
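To make the questions above concrete, here is a minimal sketch of the checks you would otherwise have to run by hand on someone else's machine. It uses only the Python standard library; the function name `describe_environment` and the package list are illustrative assumptions, not part of the original project.

```python
import platform
from importlib import metadata

# Packages named in the text; extend this tuple for your own project.
REQUIRED_PACKAGES = ("scikit-learn", "pandas", "numpy")


def describe_environment(packages=REQUIRED_PACKAGES):
    """Collect the OS, Python version, and installed package versions."""
    report = {
        "os": platform.system(),              # e.g. "Darwin", "Linux", "Windows"
        "python": platform.python_version(),  # e.g. "3.7.4"
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["packages"][name] = None  # not installed on this machine
    return report


if __name__ == "__main__":
    env = describe_environment()
    print(f"OS: {env['os']}, Python: {env['python']}")
    for name, version in env["packages"].items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running this on two machines and comparing the output shows quickly where the environments differ. That difference is the dependency problem a container is meant to remove.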
With Docker already installed and running, on the other hand, you can simply open a terminal and run the following command:
docker run --rm -p 5000:5000 datascienceexplorer/classifier
After a few minutes you should see something like this in the terminal:
* Serving Flask app "main" (lazy loading)
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
Next, open your favorite browser and go to this address:
http://localhost:5000/apidocs/
Click the predict row of the API and then the "Try it out" button on the right, and the interface will look like this:
Swagger page for the back-end API
Remember the standard Iris Flowers dataset? This small application uses a classification model to predict the type of a flower from a few of its measurements. In fact, you are already using my machine learning model as a service, and the only thing you installed was Docker. You did not need to install Python or any packages on your machine.
This is the strength of Docker. It helps solve dependency problems and lets you deploy code quickly to different environments, in this case to your machine.
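The same service can also be called from code instead of from the browser. The sketch below is only an illustration: the `/predict` path and the parameter names `s_length`, `s_width`, `p_length`, and `p_width` are assumptions, so check the Swagger page at http://localhost:5000/apidocs/ for the real route and parameters before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # port published by `docker run -p 5000:5000`


def build_predict_url(measurements, path="/predict"):
    """Build a GET URL; `path` and the parameter names are hypothetical."""
    return f"{BASE_URL}{path}?{urlencode(measurements)}"


def request_prediction(measurements, timeout=5):
    """Send the request and return the raw response body as text."""
    with urlopen(build_predict_url(measurements), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical parameter names for the four Iris measurements (in cm).
sample = {"s_length": 5.1, "s_width": 3.5, "p_length": 1.4, "p_width": 0.2}
print(build_predict_url(sample))
# Call request_prediction(sample) only while the container is running.
```

Because the client only needs an HTTP library, it needs neither scikit-learn nor the model file on its own machine.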
DevOps for Data Science
Hopefully I have given you enough motivation to keep reading. If you would rather skip these sections and go straight to the code, that is fine too: it means you want to containerize your machine learning model with Docker and expose it as a service. For now, though, let's pause for a moment, put all the material on machine learning and Docker aside, and think about DevOps in data science and why it is needed.
What is DevOps?
DevOps is a set of practices that combines software development and IT operations. Its aim is to shorten the systems development life cycle and to deliver high-quality software continuously. (Wikipedia)
A software developer's goal is to deliver code with all the required functionality on time, whereas usability, reliability, scalability, networking, firewalls, infrastructure, and so on often remain operational problems. Because their end goals, and possibly their KPIs, differ, these teams usually do not get along well under the same roof. A DevOps specialist can therefore act as a liaison and help the teams work together, or even take on the responsibilities of both sides, so that in the end you have a single team that leads development from start to finish. After all, you cannot simply hand your computer to the client just because the code works fine on it.
But I am happy with my Jupyter notebook!!!
Data scientists have a similar story, because here, too, you cannot just pick up the laptop running your Jupyter Notebook and hand it to the client to use. We need a way to serve the model to a large number of users, anytime and anywhere, and to bring it up with minimal downtime (usability, reliability, scalability).
For this reason, companies are looking for data analysts with DevOps skills who can deploy and roll out machine learning models in production and deliver business value to the company, rather than just proving a concept and focusing on improving model accuracy. Such people are called unicorns.
There are many ways to deploy a machine learning model, but Docker is a powerful tool that gives you the flexibility you need while keeping your code robust and encapsulated. Of course, we will never ask customers to install Docker and open a terminal to run the model. However, this containerization phase eventually becomes the foundation once you start working on real projects in which models have to be deployed to a cloud platform or an on-premises server.
Saving the trained model
Back at university, I learned that a data science project consists of six stages, as shown in the picture below. If automating the model and deploying it to production is our ultimate goal, how do we "get" the model into the deployment stage?
The six stages of a data science project
The simplest approach you can think of is to copy everything from the notebook, paste it into a .py file, and run it. However, every time you need a prediction you would run this file and train the model again on the same data. That may be acceptable for simple models with a small training dataset, but it is not practical for complex models with a lot of training data (think of how long it takes to train an ANN or CNN model). It means that when a user sends a prediction request to the model, they have to wait anywhere from minutes to hours for the result, because the training stage has to finish first.
How do you save the model right after training?
In most cases, a machine learning model in Python lives in memory as a Python object while the code is running and is discarded when the program exits. If we can save this object to disk right after the model is trained, then the next time we need a prediction we can simply load the finished model into memory and skip the initialization and training stages. In computer science, the process of converting an object into a byte stream for storage is called serialization. In Python this is easy to do with a package called pickle, which is part of the standard library and works out of the box. Python developers also call the process of serializing an object with pickle "pickling".
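As a minimal illustration of serialization itself, the sketch below turns a Python object into bytes and back using `pickle.dumps` and `pickle.loads`. The dictionary and its values are hypothetical stand-ins for a trained model object.

```python
import pickle

# A plain dictionary stands in for a trained model object (hypothetical values).
model_state = {"n_neighbors": 3, "classes": ["setosa", "versicolor", "virginica"]}

# Serialization ("pickling"): Python object -> byte stream.
payload = pickle.dumps(model_state)

# Deserialization ("unpickling"): byte stream -> equal, but separate, object.
restored = pickle.loads(payload)

print(type(payload).__name__, restored == model_state, restored is model_state)
# prints: bytes True False
```

Note that unpickling can execute arbitrary code, so only load pickle data that comes from a source you trust.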
In a Jupyter notebook, you can easily save the model object (in my case "knn") to a pkl file in the same directory as the code:
import pickle

# Serialize the trained model object (knn) to model.pkl in the current directory
with open('./model.pkl', 'wb') as model_pkl:
    pickle.dump(knn, model_pkl)
Saving the model to the current directory
I recommend starting from my notebook, so that the results you get from here on are similar to mine. Alternatively, you can use your own model, but make sure you have all the required packages and the correct model inputs.
With this first step done, the trained model is saved. Later we will reuse the model for predictions, which is covered in more detail further on in the article.
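Until then, here is a minimal sketch of what that reuse could look like: the model is saved once after training and then loaded at prediction time instead of being retrained. To keep the example self-contained, a tiny nearest-neighbour model written in plain Python stands in for the scikit-learn "knn" object. The `train` and `predict` helpers, the sample values, and the temporary directory are illustrative assumptions.

```python
import math
import os
import pickle
import tempfile


def train(samples, labels):
    """'Training' a nearest-neighbour model just means storing the data."""
    return {"samples": samples, "labels": labels}


def predict(model, point):
    """Return the label of the stored sample closest to `point`."""
    distances = [math.dist(sample, point) for sample in model["samples"]]
    return model["labels"][distances.index(min(distances))]


# Toy measurements (sepal length, petal length); values are illustrative only.
samples = [(5.1, 1.4), (7.0, 4.7), (6.3, 6.0)]
labels = ["setosa", "versicolor", "virginica"]

with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "model.pkl")

    # Step 1 (done once, after training): save the model to disk.
    with open(model_path, "wb") as model_pkl:
        pickle.dump(train(samples, labels), model_pkl)

    # Step 2 (done at prediction time): load the model instead of retraining.
    with open(model_path, "rb") as model_pkl:
        loaded_model = pickle.load(model_pkl)

prediction = predict(loaded_model, (5.0, 1.5))
print(prediction)  # prints: setosa
```

In a real service, step 1 would run in the training notebook and step 2 once at application start-up, so each prediction request only pays for the `predict` call.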