With the move to self-isolation in March this year, our company, like many others, took all of its events online. You have probably seen that wonderful picture about webinars with the monkey. Over the past six months, on the data center topics my team is responsible for alone, we have accumulated about 25 recorded two-hour webinars, 50 hours of video in total. The problem that has grown to full height is how to work out which video to look in for the answer to a particular question. A catalog, tags, and short descriptions are fine, but suppose we finally discover that there are four two-hour videos on the topic. Then what? Watch them on fast-forward? Is there some other way? What if we do the fashionable thing and try to bolt AI onto the problem?
A spoiler for the impatient: I did not manage to find a complete miracle system, or to hack one together on my knee, so this article does not deliver one. But after a few days (evenings, rather) of research I ended up with a working MVP, and that is what I want to describe. The goals of this article are to gauge the level of interest in the problem, to get advice from knowledgeable people, and perhaps to find someone else who has the same problem.
What I want to do
At first glance everything looks simple: take a video, run it through a neural network, get the text, and then search the text for the fragment that covers the topic of interest. It is even more convenient to search all the videos in the catalog at once. Uploading a text transcript together with a video is nothing new, YouTube and most educational platforms can do it, although people clearly edit those texts by hand. You can skim the text quickly and see whether it answers your question. A handy extra would be to click a place of interest in the text and hear what the lecturer said and showed at that moment. If the words in the text are marked up with timestamps, that is not too hard. I am already daydreaming about possible directions for development, though, and I will come back to them at the end. For now, let's try to implement this chain as efficiently as possible:
Video file -> text fragment -> fuzzy text search.
At first I thought it would all be very simple. This use case has been discussed at every AI conference for four years now, so I assumed such a system must exist ready-made. A few hours of searching and reading articles showed otherwise. Video is mostly used to look for faces, cars, and other visual objects (masks, helmets), and audio to look for songs and tracks or for the speaker's tone and intonation, mainly as part of call center solutions. The only thing I found was a mention of the Deepgram system, and unfortunately it does not support Russian. Microsoft has very similar functionality in Streams, but I could not find any mention of Russian support there either, so apparently it is missing too.
Well then, let's reinvent the wheel. I am not a professional programmer (constructive criticism of the code is welcome, by the way), but from time to time I write something "for myself". Neural networks that turn speech into text are called, surprise, surprise, speech-to-text. If I can find a public speech-to-text service, I can use it to digitize the speech in all the webinars and then run a fuzzy search over the text, which is a simpler task. At first I had no plans to go to the cloud and wanted to assemble everything locally, but after reading an article on Habr I decided that speech recognition really is better done in the cloud.
Looking for a cloud speech-to-text service
A search for services that can convert speech to text showed that there are plenty of them, including some developed in Russia, and that the global cloud providers such as Google, Amazon, and MS Azure are among them. Descriptions of several services that support Russian can be found here. In general, the first 20 lines of search-engine results will each turn up something different. There is one more consideration, though: in the future I would like to move this system to production, and that costs money. I work at Cisco, which has global contracts with the major cloud providers, so out of the whole list I decided to consider only those for now.
So my list is Google, Amazon, Azure, and IBM Watson (the links in the names are the same as in the table below). Every service has an API through which it can be used. After analyzing the remaining capabilities I put together a small table.
IBM Watson dropped out of the race at this stage, because everything I have recorded is in Russian. I decided to test the remaining providers on a short excerpt from a webinar and set up accounts on AWS and Azure. Looking ahead, Microsoft turned out to be a tough nut to crack when it came to account setup. I work on a corporate network that "lands" on the Internet somewhere in Amsterdam. During registration the system asked me twice whether I was sure my address was in Russia, and then displayed a message saying the account was administratively blocked. Five days later, as I write this article, nothing has changed, so I have not been able to test Azure yet. That is a pity. I understand, it is security, but I still cannot try the service. I will try again later, once the situation is resolved.
Separately, I would like to test this functionality in Yandex.Cloud. In theory it should be the best at recognizing Russian speech. Unfortunately, the service's trial page only has a "speak the text" feature, and uploading a file is not offered. So it is postponed to a second round, together with Azure.
That leaves Google and Amazon, so let's test them right away. Before writing any code you can check and compare everything by hand, since both providers have a management console in addition to the API. For the test I first prepared a 10-minute fragment of as general a nature as possible, with a minimum of specialized terminology. It then turned out that Google supports fragments of at most one minute in test mode, so I used a 57-second fragment to compare the services.
Both services returned recognized text, so their results can be compared over this one-minute interval.
Frankly, the result was not what I expected; it is not for nothing that the models offer various customization options. As you can see, the Google engine recognized most of the text more clearly out of the box and even caught the names of some products, though not all of them. This suggests that their model can handle multilingual text. Amazon (as I confirmed later) has no such ability: it was told the language is Russian, so everything is transcribed as Russian, and product names come out as nonsense words.
However, Amazon's ability to return tagged JSON seemed very interesting to me. After all, it will make it possible to jump straight to the part of the file where the desired fragment was found. Google may well have the same capability, since all speech-recognition neural networks work this way, but a quick look through the documentation did not turn it up.
This JSON consists of three sections: the transcribed text (transcript), an array of words (items), and a set of segments (segments). For every element of the word and segment arrays it gives the start and end time and the neural network's confidence (confidence) that the element was recognized correctly.
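To make that structure more concrete, here is a minimal Python sketch that walks the items array and lists the words the network was least sure about. The sample document is invented for illustration and only mimics the field names used by the scripts in this article (results, items, alternatives, confidence, content, start_time); real output carries more fields, and the function name is my own.

```python
import json

# Hypothetical, hand-written sample that mimics the layout described above.
SAMPLE = json.loads("""
{
  "results": {
    "transcripts": [{"transcript": "cisco data center."}],
    "items": [
      {"start_time": "0.04", "end_time": "0.51", "type": "pronunciation",
       "alternatives": [{"confidence": "0.98", "content": "cisco"}]},
      {"start_time": "0.51", "end_time": "0.80", "type": "pronunciation",
       "alternatives": [{"confidence": "0.64", "content": "data"}]},
      {"start_time": "0.80", "end_time": "1.30", "type": "pronunciation",
       "alternatives": [{"confidence": "0.91", "content": "center"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
""")


def low_confidence_words(data, threshold=0.7):
    """Return (word, start_time, confidence) for words below the threshold."""
    found = []
    for item in data["results"]["items"]:
        if "start_time" not in item:
            continue  # entries without timing (punctuation in this sample)
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            found.append((best["content"], float(item["start_time"]), confidence))
    return found


print(low_confidence_words(SAMPLE))
```

The same loop, pointed at a real output file, gives a quick list of candidate words for a custom vocabulary.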
Teaching the neural network to understand the data center
So, at the end of this stage I chose Amazon Transcribe for further experiments and decided to try tuning the model. If stable recognition cannot be achieved, I will go back to Google. The further tests were run on the 10-minute fragment.
AWS Transcribe offers two options for tuning what the neural network recognizes, and two features for post-processing the text:
- Custom Vocabularies â «» , , «» , . : «, , » Word 97- . , , .. .
- Custom Language Models â «» 10 . , . , , , .
- , , -. , â , .. -, .
For the test I decided to build my own vocabulary. Obviously it includes words such as network, servers, profile, data center, device, controller, and infrastructure. After two or three test runs my vocabulary had grown to 60 words. The vocabulary has to be prepared as a plain text file, one word per line, all in capital letters. There is also a more complex option (described here) that lets you specify how each word is pronounced, but for the first stage I decided to use a simple list.
Before a vocabulary can be used, it has to be created. On the [ Custom vocabulary ] tab of Amazon Transcribe, click [ Create Vocabulary ], upload the text file, select Russian, answer the remaining questions, and the creation process starts. As soon as the status changes from processing to ready, the vocabulary can be used.
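The same step can also be scripted. The sketch below is not what I did for this article (I clicked through the console); it assumes a boto3 Transcribe client, whose `create_vocabulary` and `get_vocabulary` calls it uses, and the helper name, vocabulary name, and words are placeholders.

```python
import time


def create_vocabulary_and_wait(client, name, phrases, language_code="ru-RU",
                               poll_seconds=5.0, max_polls=60):
    """Create a custom vocabulary and poll until it is no longer PENDING.

    `client` is expected to behave like boto3.client("transcribe").
    Returns the last state seen, for example "READY" or "FAILED".
    """
    client.create_vocabulary(
        VocabularyName=name,
        LanguageCode=language_code,
        Phrases=list(phrases),
    )
    for _ in range(max_polls):
        state = client.get_vocabulary(VocabularyName=name)["VocabularyState"]
        if state != "PENDING":
            return state
        time.sleep(poll_seconds)
    return "PENDING"  # gave up waiting


# Usage (needs AWS credentials; the name and words are placeholders):
#   import boto3
#   state = create_vocabulary_and_wait(boto3.client("transcribe"),
#                                      "dc-terms-ru", ["word1", "word2"])
```

Passing the client in as an argument keeps the helper easy to try out with a stub before pointing it at a real account.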
One question remains: how do you recognize "English" terms? Remember that a vocabulary supports only one language. My first idea was to create a separate vocabulary of English terms and run the same recording through it. If terms such as Cisco, VLAN, or UCS are detected with a probability of 100%, take them for the given time fragment. I will say right away that this did not work: the English analyzer failed to recognize more than half of the terms in the text. On reflection I decided this is logical. We pronounce all these terms with a "Russian accent", so even native English speakers do not understand us the first time. That led to the idea of simply adding these terms to the Russian vocabulary on the principle "write it the way you hear it": Cisco, usies, eisiai, vilan, viikslan. Honestly, that is how we say them when we talk to each other. This added a few dozen words to the vocabulary and, looking ahead, improved recognition quality by an order of magnitude.
As the saying goes, a good idea always comes afterwards. The first vocabulary had already been created, so I decided to create a second one, add all the abbreviations to it, and compare the results.
Starting recognition with a vocabulary is just as easy. In the Transcribe service, open the Transcription jobs tab, create a job (Create job), select Russian, and do not forget to specify the vocabulary you need. There is one more useful setting: you can ask the neural network to return several alternative results. I set the Alternative results option to Yes and asked for 3 alternatives. This will come in handy later, when we run the fuzzy text search.
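For reference, the same job settings can be expressed as the arguments of the Transcribe `start_transcription_job` API call. This is only a sketch of the request shape under my own assumptions: the helper name, job name, S3 address, and vocabulary name are all hypothetical.

```python
def build_job_request(job_name, media_uri, vocabulary_name,
                      language_code="ru-RU", max_alternatives=3):
    """Build keyword arguments for a Transcribe start_transcription_job call."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
        "Settings": {
            "VocabularyName": vocabulary_name,   # the custom vocabulary to apply
            "ShowAlternatives": True,            # ask for alternative transcriptions
            "MaxAlternatives": max_alternatives,
        },
    }


request = build_job_request(
    "webinar-10min",                      # placeholder job name
    "s3://my-bucket/webinar-10min.mp4",   # placeholder media location
    "dc-terms-ru",                        # placeholder vocabulary name
)
print(request["Settings"])

# With AWS credentials configured:
#   import boto3
#   boto3.client("transcribe").start_transcription_job(**request)
```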
Transcribing the 10-minute recording takes 4 to 5 minutes. To avoid wasting time, I decided to write a small tool that makes comparing the results easier. It displays the final text from the JSON file in the browser and at the same time highlights how "confident" the neural network was about each individual word (the same confidence parameter). I have three versions of the resulting text: the default transcription, the vocabulary without terms, and the vocabulary with terms. All three texts are shown side by side in three columns. Words are highlighted in green at 95% confidence and above, in yellow between 70% and 95%, and in red below 70%. The code of the resulting HTML page, thrown together in a hurry, is given below. The JSON files must be in the same directory as the page, and their names are set in the FILENAME1 and similar variables.
HTML page code for displaying the results
<!DOCTYPE html>
<html lang="en">
<head> <meta charset="UTF-8"> <title>Title</title> </head>
<body onload="initText()">
<hr> <table> <tr valign="top">
<td width="400"> <h2 >- </h2><div id="text-area-1"></div></td>
<td width="400"> <h2 > 1: / </h2><div id="text-area-2"></div></td>
<td width="400"> <h2 > 2: / </h2><div id="text-area-3"></div></td>
</tr> </table> <hr>
<style>
.known { background-image: linear-gradient(90deg, #f1fff4, #c4ffdb, #f1fff4); }
.unknown { background-image: linear-gradient(90deg, #ffffff, #ffe5f1, #ffffff); }
.badknown { background-image: linear-gradient(90deg, #feffeb, #ffffc2, #feffeb); }
</style>
<script>
// File names
const FILENAME1 = "1-My_CiscoClub_transcription_10min-1-default.json";
const FILENAME2 = '2-My_CiscoClub_transcription_10min-2-Russian_only.json';
const FILENAME3 = '3-My_CiscoClub_transcription_10min-v3_Russian_terminilogy.json';
// Read file from disk and call callback if success
function readTextFile(file, textBlockName, callback) {
let rawFile = new XMLHttpRequest();
rawFile.overrideMimeType("application/json");
rawFile.open("GET", file, true);
rawFile.onreadystatechange = function() {
if (rawFile.readyState === 4 && rawFile.status == "200") {
callback(textBlockName, rawFile.responseText);
}
};
rawFile.send(null);
}
// Insert text to text block and color words confidence level
function updateTextBlock(textBlockName, text) {
var data = JSON.parse(text);
let translatedTextList = data['results']['items'];
const listLen = translatedTextList.length;
const textBlock = document.getElementById(textBlockName);
for (let i=0; i<listLen; i++) {
let addWord = translatedTextList[i]['alternatives'][0];
// load word probability and setup color depends on it
let wordProbability = parseFloat(addWord['confidence']);
let wordClass = 'unknown';
// setup the color
if (wordProbability > 0.95) {
wordClass = 'known';
} else if (wordProbability > 0.7) {
wordClass = 'badknown';
}
// insert colored word to the end of block
let insText = '<span class="' + wordClass+ '">' + addWord['content'] + ' </span>';
textBlock.insertAdjacentHTML('beforeEnd', insText)
}
}
function initText() {
// read each of the three files into its own area
readTextFile(FILENAME1, "text-area-1", function(textBlockName, text){
updateTextBlock(textBlockName, text);
});
readTextFile(FILENAME2, "text-area-2", function(textBlockName, text) {
updateTextBlock(textBlockName, text);
});
readTextFile(FILENAME3, "text-area-3", function(textBlockName, text) {
updateTextBlock(textBlockName, text);
});
}
</script>
</body></html>
After downloading the asrOutput.json files of all three jobs and renaming them to the names used in the HTML script, you get the following.
It is clear that adding Russian terms let the neural network recognize specific terms ("services", "profile", and so on) more accurately. And adding the Russian spelling of the English terms in the second step turned CSKA into cisco. The text is still rather "dirty", but it should already be good enough for my contextual search task. As new webinars are added and processed, the vocabulary will gradually grow. This is part of maintaining such a system and must not be forgotten.
Fuzzy search over the recognized text
There are probably a dozen approaches to the fuzzy search problem. In most cases they rest on a small set of mathematical algorithms, such as Levenshtein distance. There is a good article about this, and another, and one more. But I wanted to find something ready-made, of the "start it and it works" kind.
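For readers who have not met it, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal textbook sketch in Python, included only to illustrate the idea; it is not the code a search engine actually runs.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("vlan", "vilan"))      # 1
print(levenshtein("kitten", "sitting"))  # 3
```

A fuzzy search with a maximum edit distance of 2 would therefore treat "vilan" as a match for "vlan".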
After a little research into ready-made solutions for local document search, I found the relatively old SPHINX project, and full-text search also seems to be available in PostgreSQL, as described here. Most of the material I found, including material in Russian, was about Elasticsearch. After reading some good getting-started and setup guides (this post, this lesson, and another one), as well as the documentation and the Python API guide, I decided to use it.
I have long used Docker for all my local experiments, and I strongly recommend it to anyone who for some reason has not looked into it yet. In practice I try not to run anything on my local operating system except development environments, a browser, and "viewers". Apart from the absence of compatibility problems, this lets you try out a new product quickly and see whether it works for you.
Download and start a container with Elasticsearch with the following two commands:
$ docker pull elasticsearch:7.9.1
$ docker run -d --name elasticsearch -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.9.1
Once the container is up, the Elasticsearch interface is available at
http://localhost:9200
You can reach it from a browser or call the REST API from a tool such as Postman. I found a convenient Chrome plugin, though.
Here is what the plugin window looks like, using the funny kittens example described in one of the guides mentioned above.
The request is on the left and the response on the right, with autocompletion, syntax highlighting, and auto-formatting. What else do you need to be productive? The plugin can also recognize the curl command-line format in text pasted from the clipboard and format it correctly. For example,
paste the line "curl -X GET $ES_URL" and see what happens. All in all, a handy thing.
What are we going to store, and how are we going to search it? Elasticsearch takes any JSON documents and stores them in structures called indices. You can have as many indices as you like, but a single index holds homogeneous data: documents with a similar field structure that are searched in the same way.
To explore the fuzzy search capabilities, I decided to load and search the phrases (segments) section of the transcription file obtained in the previous step. In the segments section of the JSON file the data is stored in the following form:
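- Segment 1 (segment)
-> start time / end time
-> alternatives
--> alternative 1
----> transcript text
----> words, each with its confidence (confidence)
--> alternative 2
----> transcript text
----> words, each with its confidence (confidence)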
To increase the chance of a successful search, we load all the alternatives into the database, search across them, and then pick the found fragment with the higher overall confidence.
A small Python script reformats the JSON document and loads it into Elasticsearch. Its logic is as follows:
- First, iterate over all elements of the segments section and over all their alternative transcriptions.
- For each alternative, compute its overall recognition confidence. I simply took the arithmetic mean of the confidence of the individual words, although this probably deserves a more careful approach in the future.
- For each alternative, load a record of the following form into Elasticsearch:
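{ "recording_id" : <recording name>, "seg_id" : <segment id>, "alt_id" : <alternative id>, "start_time" : <segment start time>, "end_time" : <segment end time>, "transcribe_score" : <mean confidence of the words>, "transcript" : <text of the alternative> }
Python script that loads records from the JSON file into Elasticsearch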
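As a small illustration of how one such record can be assembled from a segment and one of its alternatives, here is a sketch. The helper name `build_record` and the sample values are mine, and the guard for an alternative without words is an extra precaution, not something the original script does.

```python
from statistics import mean


def build_record(recording_id, seg_id, alt_id, segment, alternative):
    """Turn one alternative of one segment into a flat record for indexing."""
    confidences = [float(item["confidence"]) for item in alternative["items"]]
    return {
        "recording_id": recording_id,
        "seg_id": seg_id,
        "alt_id": alt_id,
        "start_time": float(segment["start_time"]),
        "end_time": float(segment["end_time"]),
        # arithmetic mean of the word confidences; 0.0 if there are no words
        "transcribe_score": mean(confidences) if confidences else 0.0,
        "transcript": alternative["transcript"],
    }


# Made-up sample data in the shape of the segments section
segment = {"start_time": "374.24", "end_time": "377.165"}
alternative = {
    "transcript": "data center network",
    "items": [{"confidence": "1.0"}, {"confidence": "0.5"}],
}
record = build_record("rec_1", 39, 0, segment, alternative)
print(record["transcribe_score"])
```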
from elasticsearch import Elasticsearch
import json
from statistics import mean

# Settings
TRANSCRIBE_FILE_NAME = "3-My_CiscoClub_transcription_10min-v3_Russian_terminilogy.json"
LOCAL_IP = "192.168.2.35"
INDEX_NAME = 'ciscorecords'

# Set up the Elasticsearch connection and stop if it is not reachable
es = Elasticsearch([{'host': LOCAL_IP, 'port': 9200}])
if not es.ping():
    raise SystemExit("ES connection error, check IP and port")

# Create the index for our recordings unless it already exists
if not es.indices.exists(index=INDEX_NAME):
    es.indices.create(index=INDEX_NAME)

# Open and load the transcription file
with open(TRANSCRIBE_FILE_NAME, encoding="utf-8") as json_file:
    data = json.load(json_file)
res = data['results']

# Load every alternative of every segment as a separate record
index = 1
for idx, seq in enumerate(res['segments']):
    # enumerate segments
    for jdx, alt in enumerate(seq['alternatives']):
        # enumerate the alternatives of each segment
        # overall score: arithmetic mean of the word confidences
        score_list = [float(item['confidence']) for item in alt['items']]
        score = mean(score_list) if score_list else 0.0
        obj = {
            "recording_id" : "rec_1",
            "seg_id" : idx,
            "alt_id" : jdx,
            "start_time" : seq["start_time"],
            "end_time" : seq["end_time"],
            "transcribe_score" : score,
            "transcript" : alt["transcript"]
        }
        es.index(index=INDEX_NAME, id=index, body=obj)
        index += 1
If you do not have Python installed, do not worry: Docker helps out again. I usually use a container with Jupyter Notebook. You connect to it from an ordinary browser and do everything you need there. The one thing to keep in mind is saving your results, because all information is lost when the container is destroyed. If you have never used this tool, here is a good article for beginners. You can skip the installation section, by the way.
Start a container with a Python notebook using the following command:
$ docker run -p 8888:8888 jupyter/base-notebook sh -c 'jupyter notebook --allow-root --no-browser --ip=0.0.0.0 --port=8888'
Then connect with any browser to the address that is printed on the screen once the container has started successfully,
http://127.0.0.1:8888
together with the security key shown there.
Create a new notebook and enter the following in the first cell:
!pip install "elasticsearch>=7,<8"
Run the cell, wait until the package for working with ES through its API is installed, then copy the script into the second cell and run it. If everything went well, you can check in the Elasticsearch console that the data was loaded. Enter the command
GET /ciscorecords/_search
and the loaded records appear in the response window, 173 in total, as shown in the hits.total.value field.
Now it is time to try the fuzzy search, which is what all of this was for. For example, to search for the phrase "data center network core", run the following command:
POST /ciscorecords/_search
{
"size" : 20,
"min_score" : 1,
"sort": { "_score": { "order": "desc" } },
"query": {
"multi_match": {
"query" : " ",
"fuzziness" : 2,
"fields": [ "transcript" ],
"analyzer" : "russian"
}
},
"_source": [ "transcript", "transcribe_score" ]
}
We get as many as 47 results!
Naturally, most of them are different variations of the same fragment. So let's write another script that keeps only the single best-scoring record for each segment.
Python script that queries the Elasticsearch database
from elasticsearch import Elasticsearch

# Search settings
# PHRASE = " "
# PHRASE = " "
PHRASE = " "
LOCAL_IP = "192.168.2.35"
INDEX_NAME = 'ciscorecords'

# Fuzzy query against the transcript field
elastic_query = {
    "size" : 40,
    "min_score" : 1,
    "sort": { "_score": { "order": "desc" } },
    "query": {
        "multi_match": {
            "query" : PHRASE,
            "fuzziness" : 2,
            "fields": [ "transcript" ],
            "analyzer" : "russian"
        }
    },
}

# Set up the Elasticsearch connection and stop if it is not reachable
es = Elasticsearch([{'host': LOCAL_IP, 'port': 9200}])
if not es.ping():
    raise SystemExit("ES connection error, check IP and port")

res = es.search(index=INDEX_NAME, body=elastic_query)
print("Got %d Hits:" % res['hits']['total']['value'])

# Keep only the best-scoring hit for each segment
search_results = {}
for hit in res['hits']['hits']:
    seg_id = hit["_source"]['seg_id']
    if seg_id not in search_results or search_results[seg_id]['score'] < hit["_score"]:
        _res = hit["_source"]
        _res["score"] = hit["_score"]
        search_results[seg_id] = _res

print("%s unique results \n-----" % len(search_results))
for rec in search_results:
    print("seg %(seg_id)s: %(score).4f : start(%(start_time)s)-end(%(end_time)s) -- %(transcript)s" %
          search_results[rec])
Sample output:
Got 47 Hits:
16 unique results
-----
seg 39: 7.2885 : start(374.24)-end(377.165) -- , ..
seg 49: 7.0923 : start(464.44)-end(468.065) -- , ...
seg 41: 4.5401 : start(385.14)-end(405.065) -- . , , , , , ...
seg 30: 4.3556 : start(292.74)-end(298.265) -- , , ,
seg 44: 2.1968 : start(415.34)-end(426.765) -- , , , . -
seg 48: 2.0587 : start(449.64)-end(464.065) -- , , , , , .
seg 26: 1.8621 : start(243.24)-end(259.065) -- . . , . ...
You can see that there are far fewer results now, so we can look through them and pick the one we are most interested in.
And since we have the start and end times of each video fragment, we can build a page with a video player and "rewind" it to the desired fragment programmatically.
I will leave that task for a separate article, though, if there is interest in further publications on this topic.
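As a rough illustration of the idea, and an untested assumption on my part: if the recording is served as a plain file that an HTML5 video element can play, the Media Fragments URI syntax lets you append `#t=start,end` (in seconds) to the video address. The helper name and the file name below are hypothetical.

```python
from html import escape


def video_tag(video_url: str, start_time: float, end_time: float) -> str:
    """Return an HTML5 <video> tag that plays only the given time range."""
    # Media Fragments URI: "#t=start,end" with times in seconds
    src = f"{video_url}#t={start_time:.3f},{end_time:.3f}"
    return f'<video controls src="{escape(src, quote=True)}"></video>'


# Start and end times taken from one of the search results above
print(video_tag("webinar.mp4", 374.24, 377.165))
```

The start and end values map directly onto the start_time and end_time fields stored with each record.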
Instead of a conclusion
So, in this article I showed how I approached the task of building a text search system over recordings of webinars on technical topics, using the tools at hand. The result is what is usually called an MVP: a minimal working algorithm that produces a result and proves that the goal is achievable in principle with existing technologies.
There is still a long way to go to a finished product. Ideas that could be implemented in the near future:
- embed a video player so that you can watch and listen to the fragments that were found
- think about text editing: words recognized with 100% confidence can be left in place as anchors, and only the fragments with "reduced" recognition quality need to be edited
- elasticsearch, -
- speech-to-text, Google, Yandex, Azure. â
- , «»
- BERT (Bi-directional Encoder Representation from Transformer), . â « xx yy».
- , - - . Youtube , 15-20 , ,
- â , , ,
If you have any questions or comments, feel free to ask. I am also looking forward to suggestions on how to improve or simplify the whole process. This is my first technical article on Habr, and I sincerely hope it turned out useful and interesting.
Good luck to everyone in your creative search, and may the Force be with you!