
I am sharing, from personal experience, what proved useful, where and when. This is an overview with key points, so that it is clear what to dig into next and where. My experience here is purely subjective, though, and yours may be completely different.
Why is it important to know and be able to use query languages? At its core, Data Science consists of several important stages of work, and the very first and most important one (without it, of course, nothing works!) is obtaining or extracting data. Most often the data sits somewhere in some form and has to be retrieved from there.
Query languages let you extract exactly that data. Today I will talk about the query languages that have been useful to me, and explain and show where and how exactly, and why they are worth studying.
In total there are three main groups of data query languages, which we will go through in this article:
- "Standard" query languages are what is usually meant when people talk about a query language, such as relational algebra or SQL.
- Scripting query languages: for example, the Python tools pandas and numpy, or shell scripting.
- Query languages for knowledge graphs and graph databases.
Everything written here is simply personal experience that proved useful, with a description of the situations and of why it was needed. You can try these situations on for size and prepare for them in advance by getting to grips with these languages before you have to apply them (urgently) on a project, or before you land on a project where they are needed.
"Standard" query languages
Standard query languages are standard precisely in the sense that they are what we usually think of when we talk about queries.
Relational algebra
Why is relational algebra needed today? To understand well why query languages are built the way they are, and to use them consciously, you need to understand the core that underlies them.
What is relational algebra?
The formal definition is as follows: relational algebra is a closed system of operations on relations in the relational data model. Put more humanly, it is a system of operations on tables such that the result is always a table. See this Habr article for all the relational operations; here we explain why you need to know it and where it comes in handy.
Why?
Once you start to understand what query languages are about in general, and which operations stand behind the expressions of a particular query language, you often gain a deeper understanding of what works in query languages and how.

Taken from this article. Example operation: join, which combines tables.
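To make the "tables in, table out" idea concrete, here is a minimal Python sketch of three relational operations (selection, projection and natural join) over tables stored as lists of dictionaries. The function names and the sample tables are invented for this illustration and do not come from any library.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]


def select(table: Table, predicate: Callable[[Row], bool]) -> Table:
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def project(table: Table, columns: List[str]) -> Table:
    """Projection: keep only the named columns, dropping duplicate rows."""
    seen = set()
    result = []
    for row in table:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def natural_join(left: Table, right: Table) -> Table:
    """Natural join: combine rows that agree on all shared column names."""
    result = []
    for left_row in left:
        for right_row in right:
            shared = set(left_row) & set(right_row)
            if all(left_row[c] == right_row[c] for c in shared):
                result.append({**left_row, **right_row})
    return result


# Hypothetical sample tables, invented for this illustration.
people = [
    {"person": "Ann", "city_id": 1},
    {"person": "Bob", "city_id": 2},
    {"person": "Eve", "city_id": 1},
]
cities = [
    {"city_id": 1, "city": "Oslo"},
    {"city_id": 2, "city": "Riga"},
]

# Every operation returns a table, so operations compose freely.
in_oslo = project(
    select(natural_join(people, cities), lambda row: row["city"] == "Oslo"),
    ["person"],
)
print(in_oslo)  # [{'person': 'Ann'}, {'person': 'Eve'}]
```

Because the result of each step is again a table, the join, the filter and the projection can be nested in any order that makes sense, which is the closure property the definition refers to.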
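Materials for study:
A good introductory course from Stanford. In general there is a lot of material on relational algebra and theory: Coursera, Udacity, and a huge amount of other online material, including good academic courses. My personal advice is to understand relational algebra very well, since it is the foundation.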
SQL

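Taken from this article.
In essence, SQL is an implementation of relational algebra, with one important caveat: SQL is declarative. That is, when you write a query in the language of relational algebra, you are actually saying how to compute the result, whereas with SQL you specify what you want to extract, and the DBMS then generates (efficient) expressions in the language of relational algebra itself (their equivalence is known to us as Codd's theorem).

Taken from this article.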
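The difference between saying what you want and saying how to compute it can be sketched with Python's built-in sqlite3 module. The tables and values below are hypothetical. The SQL query only describes the result, the hand-written loop spells out one possible way to compute it, and EXPLAIN QUERY PLAN (a SQLite feature) shows the plan the engine chose on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 5), (1, 20), (2, 15), (2, 30);
    """
)

# Declarative: describe WHAT we want; the DBMS decides how to get it.
sql = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount > 10
    GROUP BY c.name
    ORDER BY c.name
"""
declarative = conn.execute(sql).fetchall()

# Procedural: spell out HOW (join, then filter, then aggregate).
customers = conn.execute("SELECT id, name FROM customers").fetchall()
orders = conn.execute("SELECT customer_id, amount FROM orders").fetchall()
totals = {}
for customer_id, name in customers:
    for order_customer_id, amount in orders:
        if order_customer_id == customer_id and amount > 10:
            totals[name] = totals.get(name, 0) + amount
procedural = sorted(totals.items())

# Peek at the plan SQLite picked for the declarative query.
plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()

print(declarative)  # [('Ann', 20), ('Bob', 45)]
print(procedural)   # [('Ann', 20), ('Bob', 45)]
```

Both routes give the same rows; the point is only that with SQL the second route is the engine's job, not yours.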
Why?
Relational DBMSs such as Oracle, Postgres and SQL Server are still virtually everywhere, and there is a very high chance that you will have to interact with them. That means you will have to either read SQL (very likely) or write it (also not unlikely).
What to read and study
Follow the same links as above (on relational algebra); there is an enormous amount of material, for example this.
By the way, what is NoSQL?
"It is worth emphasizing once again that the term 'NoSQL' has an entirely spontaneous origin and has no generally accepted definition or scientific institution behind it." The corresponding article is on Habr.
In practice, people realized that the full relational model is not needed for many tasks, especially where performance is fundamental and certain simple queries with aggregation dominate. There it is critical to compute metrics quickly and write them to the database, and most relational features turned out to be not just unnecessary but harmful: why normalize anything if it ruins the thing that matters most to us (for a specific task), namely performance?
Flexible schemas are also often needed instead of the fixed mathematical schemas of the classical relational model. This greatly simplifies application development when it is critical to deploy the system, start working quickly and process the results, or when the schema and types of the stored data matter less.
For example, suppose we are building an expert system and want to store information about a specific domain together with some meta-information. We may not know all the fields and can simply store JSON for each record. This gives a very flexible environment for extending the data model and iterating quickly, so in this case NoSQL is even preferable and more readable. An example entry (from one of my projects where NoSQL was exactly the right fit):
{"en_wikipedia_url":"https://en.wikipedia.org/wiki/Johnny_Cash",
"ru_wikipedia_url":"https://ru.wikipedia.org/wiki/?curid=301643",
"ru_wiki_pagecount":149616,
"entity":[42775," ","ru"],
"en_wiki_pagecount":2338861}
You can read more about NoSQL here.
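As a rough illustration of what a flexible schema buys you, here is a small Python sketch that treats a dictionary of JSON documents as a toy document store. This is not how any particular NoSQL system works internally; the records, field values and the find helper are all hypothetical.

```python
import json

# Hypothetical records: only "entity" is shared, other fields vary per record.
raw_records = [
    '{"entity": "Q1", "en_wiki_pagecount": 120, "en_wikipedia_url": "https://example.org/a"}',
    '{"entity": "Q2", "ru_wiki_pagecount": 45}',
    '{"entity": "Q3", "en_wiki_pagecount": 300, "tags": ["music"]}',
]

# Toy document store keyed by entity id; no schema is declared anywhere.
store = {}
for line in raw_records:
    doc = json.loads(line)
    store[doc["entity"]] = doc


def find(documents, field, minimum):
    """Return ids of documents where `field` exists and is >= minimum."""
    return sorted(
        key
        for key, doc in documents.items()
        if doc.get(field, float("-inf")) >= minimum
    )


popular = find(store, "en_wiki_pagecount", 100)
print(popular)  # ['Q1', 'Q3']

# Extending the data model is just adding a key to one document;
# the other documents are untouched and no migration is needed.
store["Q2"]["reviewed"] = True
```

Documents that lack a field are simply skipped by the query, which is the trade-off in miniature: fast iteration on the model, at the cost of every reader having to cope with missing fields.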
What to study?
Here you rather need to analyze your task thoroughly: what properties it has, and which available NoSQL systems fit that description. Then start studying that system.
Scripting query languages
At first it may seem: what does Python have to do with this at all? It is a programming language and is not about queries.

- Pandas is the Swiss Army knife of Data Science; a huge amount of data transformation, aggregation and so on happens in it.
- Numpy: vector computation, matrices and linear algebra.
- Scipy: a lot of mathematics lives in this package, especially statistics.
- Jupyter lab: a lot of exploratory data analysis fits well into notebooks, so it is useful to know.
- Requests: working with the network.
- Pyspark is very popular among data engineers; most likely you will have to interact with it or with Spark simply because of their popularity.
- Selenium is very useful for collecting data from sites and resources; sometimes there is simply no other way to get the data.
My main advice: learn Python!
Pandas
Let's take the following code as an example:
import pandas as pd

df = pd.read_csv("data/dataset.csv")

# Keep only return trips, then aggregate per route with readable column names
all_together = (
    df[df["trip_type"] == "return"]
    .groupby(["start_station_name", "end_station_name"])
    .agg(
        num_trips=("trip_duration_seconds", "size"),
        avg_duration_seconds=("trip_duration_seconds", "mean"),
        min_duration_seconds=("trip_duration_seconds", "min"),
        max_duration_seconds=("trip_duration_seconds", "max"),
    )
)
In essence, we can see that the code fits the classic SQL pattern.
SELECT start_station_name, end_station_name, COUNT(trip_duration_seconds) AS size, ...
FROM dataset
WHERE trip_type = 'return'
GROUP BY start_station_name, end_station_name
The important part, however, is that this code is part of a script and a pipeline: in effect, we are embedding queries into a Python pipeline. In this situation the query language comes to us from libraries such as Pandas or pySpark.
In general, in pySpark we see a similar kind of data transformation through a query language, along these lines:
# df is assumed to be an existing pySpark DataFrame
(df.filter(df.trip_type == "return")
   .groupby("day")
   .agg({"duration": "mean"})
   .sort("day"))
Where and what to read
Finding material for studying Python itself is generally not a problem. There are a huge number of tutorials online on pandas and pySpark, and courses on Spark (as well as on DS itself). In general this material is easy to find with a search engine, and if I had to pick one package to focus on, it would of course be pandas. There is also plenty of material on the DS + Python combination.
Shell as a query language
Quite a few of the data processing and analysis projects I have worked on were, in fact, shell scripts that call code in Python and Java along with shell commands themselves. So in general, pipelines in bash/zsh/etc. can be regarded as a kind of high-level query (you can of course stuff loops in there, but that is not typical for DS code in shell languages). Here is a simple example: I needed to build a mapping from wikidata QIDs to the full links in the Russian and English wikis. For this I wrote a simple query out of bash commands and a simple Python script for the output, and put them together like this:
# jq filter: keep entity ids and the enwiki/ruwiki sitelink titles
JQ_QUERY='select((.[0][1] == "sitelinks" and (.[0][2] == "enwiki" or .[0][2] == "ruwiki") and .[0][3] == "title") or .[0][1] == "id")'

pv "data/latest-all.json.gz" |
unpigz -c |
jq --stream "$JQ_QUERY" |
python3 scripts/post_process.py "output.csv"
This was, in essence, the whole pipeline that produced the required mapping. As we can see, everything ran in streaming mode:
- pv filepath shows a progress bar based on the file size and passes the file contents on
- unpigz -c reads a portion of the archive and hands it to jq
- jq with the --stream flag emits results immediately and passes them to the post-processor in Python (as in the very first example)
- internally, the post-processor was a simple state machine that formatted the output
Altogether, this is a complex pipeline that works in streaming mode on big data (0.5 TB) without significant resources, built from a simple pipeline and a couple of tools.
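The post-processor itself is not shown in the article, so the following is only a guess at its shape: a minimal state machine that consumes jq --stream events and emits one CSV row per entity. It assumes one compact event per line (as with jq -c), that each entity's "id" event arrives before its sitelinks, and the column names qid, enwiki and ruwiki; all of these are assumptions made for the sketch, and the sample events are hand-written.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator

FIELDS = ["qid", "enwiki", "ruwiki"]  # assumed output columns


def records_from_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Turn a stream of jq --stream events into one record per entity.

    State: `current` is the entity being assembled, or None before the
    first "id" event. A new "id" event flushes the previous entity.
    """
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if len(event) != 2:
            continue  # closing events carry a path but no value
        path, value = event
        if len(path) == 2 and path[1] == "id":
            if current is not None:
                yield current
            current = {"qid": value, "enwiki": "", "ruwiki": ""}
        elif (
            current is not None
            and len(path) == 4
            and path[1] == "sitelinks"
            and path[2] in ("enwiki", "ruwiki")
        ):
            current[path[2]] = value
    if current is not None:
        yield current


def write_csv(records: Iterable[Dict[str, str]], handle) -> None:
    """Write the records as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Hand-written sample events in the assumed format (titles are placeholders).
sample_events = [
    '[[0,"id"],"Q42"]',
    '[[0,"sitelinks","enwiki","title"],"EN title for Q42"]',
    '[[0,"sitelinks","ruwiki","title"],"RU title for Q42"]',
    '[[1,"id"],"Q64"]',
    '[[1,"sitelinks","enwiki","title"],"EN title for Q64"]',
]
buffer = io.StringIO()
write_csv(records_from_events(sample_events), buffer)
print(buffer.getvalue())
```

Because the function is a generator, only one entity is held in memory at a time, which is what lets this kind of script keep up with a streaming pipeline over a very large dump.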
Another important tip: learn to work well and efficiently in the terminal and to write in bash/zsh/etc. Where will it come in handy? Almost everywhere. Again, there is a LOT of study material online, in particular my previous article.
R scripting
Again, the reader may exclaim: but this is a complete programming language! And of course they will be right. However, I usually ran into R in contexts where it was, in fact, very similar to a query language.
R is a statistical computing environment and a language for statistical computing and visualization (according to this).

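Taken from here. By the way, I recommend it; it is good material.
Why should a data scientist know R? If nothing else, because there is a huge layer of non-IT people who analyze data in R. I have come across it in the following places:
- The pharmaceutical sector.
- Biologists.
- The financial sector.
- People with a purely mathematical education who work with statistics.
- Specialized statistical and machine learning models (which can often be found only in the author's version, as an R package).
Why is it actually a query language? In the form in which it is often encountered, it really is a request to build a model, including reading the data and fixing the query (model) parameters. Visualizing data with packages such as ggplot2 is likewise a form of writing queries.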
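Example of a visualization query
library(ggplot2)

# beav is assumed to be a data frame with columns id, temp and activ (activ as a factor)
ggplot(data = beav,
       aes(x = id, y = temp,
           group = activ, color = activ)) +
  geom_line() +
  geom_point() +
  scale_color_manual(values = c("red", "blue"))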
In general, many ideas from R, such as dataframes and data vectorization, have migrated into Python packages like pandas, numpy and scipy. So many things in R will feel familiar and convenient.
There are many sources for studying it, for example this.
Knowledge graphs
Here my experience is slightly unusual, because I still have to work with knowledge graphs and graph query languages quite often. Since this part is a bit more exotic, let's just briefly go over the basics.
In classical relational databases we have a fixed schema. Here the schema is flexible: each predicate is effectively a "column", and more than that.
Imagine modeling a person and wanting to describe the key facts about them. For example, take a specific person, Douglas Adams, and use his description as the basis.

www.wikidata.org/wiki/Q42
If we used a relational database, we would have to create one or more huge tables with a huge number of columns, most of which would be NULL or filled with some default False value. For example, it is unlikely that many of us have an entry in the Korean national library. We could of course move such things into separate tables, but in the end that would be an attempt to model flexible predicate logic using a fixed relational one.

So imagine that all data is stored as a graph, or as binary and unary logical expressions.
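A tiny Python sketch may help to see the difference. The same hypothetical records are laid out once as a wide table, where every record needs a cell for every column, and once as a list of (subject, predicate, value) facts, where only known facts are stored. All identifiers and values below are invented for the illustration.

```python
# Hypothetical records with different sets of known properties.
records = [
    {"id": "person_1", "name": "Douglas Adams", "occupation": "writer", "library_id": "LIB-1"},
    {"id": "person_2", "name": "Jane Roe", "spouse": "person_3"},
    {"id": "person_3", "name": "John Roe"},
]

# Relational style: one column per property that appears anywhere.
columns = sorted({key for rec in records for key in rec if key != "id"})
wide_table = [[rec.get(col) for col in columns] for rec in records]
null_cells = sum(cell is None for row in wide_table for cell in row)

# Graph style: one fact per known property, nothing stored for unknowns.
triples = [
    (rec["id"], key, value)
    for rec in records
    for key, value in rec.items()
    if key != "id"
]

print(columns)       # ['library_id', 'name', 'occupation', 'spouse']
print(null_cells)    # 6
print(len(triples))  # 6
```

Half of the cells in the wide table are empty even in this toy case, and every new kind of property adds another mostly empty column, whereas the list of facts only grows by what is actually known.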
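Where can you run into this? First of all when working with wiki data, and with any graph databases or linked data.
Below are the main query languages that I have used and worked with.
SPARQL
Wiki:
SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language) is a query language for data represented in the RDF model, and also a protocol for sending these queries and responding to them. SPARQL is a W3C recommendation.
In reality, though, it is a query language over logical unary and binary predicates. You simply specify, conditionally speaking, what is fixed in a Boolean expression and what is not (very simplified).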
The RDF (Resource Description Framework) store itself, over which SPARQL queries are executed, consists of triples (subject, predicate, object), and a query selects the required triples according to the specified constraints, in the spirit of: find an X such that p_55(X, q_33) is true, where p_55 is of course some relation with ID 55 and q_33 is an object with ID 33 (that is the whole story, again with all sorts of details omitted).
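The "fix some positions, leave others open" idea can be sketched in a few lines of Python. This is not SPARQL and not an RDF library, just a list of tuples and a matching function; the IDs are made up in the same p_55 / q_33 style.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples with made-up IDs.
triples: List[Triple] = [
    ("q_1", "p_55", "q_33"),
    ("q_2", "p_55", "q_33"),
    ("q_3", "p_55", "q_40"),
    ("q_1", "p_7", "q_33"),
]


def match(
    store: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples that agree with every fixed (non-None) position."""
    return [
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# "Find X such that p_55(X, q_33) is true": predicate and object are fixed,
# the subject position is the variable.
xs = [s for s, _, _ in match(triples, predicate="p_55", obj="q_33")]
print(xs)  # ['q_1', 'q_2']
```

A real engine adds indexes, joins between several patterns and much more, but each individual pattern in a query boils down to this kind of filtering.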
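Example of data representation:

The pictures and the country example are taken from here.
Basic query example

In effect, we want to find the values of the ?country variable such that, for the predicate member_of, member_of(?country, q458) is true, where q458 is the ID of the European Union.
Example of a real SPARQL query inside a Python engine: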

As a rule, I have had to read SPARQL rather than write it. In that situation, understanding the language at least at a basic level is likely to be a useful skill, so that you understand exactly how the data is retrieved.
There is a lot of study material online. I usually search for specific constructs and examples myself, and so far that has been enough.
Logical query languages
You can read more on this topic in my article here. Here we will only briefly look at why logic languages are well suited for writing queries. In essence, RDF is just a collection of logical statements of the form p(X) and h(X, Y), and a logical query looks like this:
output(X) :- country(X), member_of(X, "EU").
Here we are describing the creation of a new predicate output/1 (/1 means unary), which holds for X when country(X) is true, that is, X is a country, and member_of(X, "EU") is also true.
That is, both the data and the rules are represented in the same general way here, which makes modeling tasks very easy and clean.
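To show how such a rule reads operationally, here is a minimal Python sketch that evaluates the conjunction country(X), member_of(X, "EU") over sets of facts. A real logic programming system does far more than this (recursion, negation, search); the facts below are illustrative only, and the non-country entry is a made-up placeholder.

```python
# Unary facts country(X), stored as a set of names.
country = {"France", "Germany", "Norway"}

# Binary facts member_of(X, Y), stored as a set of pairs.
member_of = {
    ("France", "EU"),
    ("Germany", "EU"),
    ("Norway", "EFTA"),
    ("SomeAgency", "EU"),  # placeholder non-country; the rule must skip it
}


def output(country_facts, member_facts, group):
    """output(X) :- country(X), member_of(X, group)."""
    return sorted(x for x in country_facts if (x, group) in member_facts)


result = output(country, member_of, "EU")
print(result)  # ['France', 'Germany']
```

Facts and the derived predicate have the same shape (sets of tuples), so the result of one rule can be fed straight into another, which is the uniformity of data and rules described above.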
Where have I met this in industry? An entire large project with a company that writes queries in such a language, and also the current project, in the core of the system. It seems like a rather exotic thing, but it does come up from time to time.
Example of a code fragment in a logic language that processes wikidata:

Materials: here are a few links on Answer Set Programming, a modern logic programming language. I recommend studying it.
- http://peace.eas.asu.edu/aaai12tutorial/asp-tutorial-aaai.pdf
- http://ceur-ws.org/Vol-1145/tutorial1.pdf
- https://www.youtube.com/watch?v=gVQ0bP8zyHw
- https://www.youtube.com/watch?v=kdcd7Je2glc
- https://potassco.org/book/
- http://potassco.sourceforge.net/teaching.html
- https://www.cs.uni-potsdam.de/~torsten/Potassco/Tutorials/fmcad12.pdf
