ãã®èšäºã§ã¯ãBigQueryã®äž»ãªæ©èœã«ã€ããŠèª¬æãããããã®æ©èœãå ·äœçãªäŸãšãšãã«ç€ºããŸããåºæ¬çãªã¯ãšãªãäœæããŠããã¢ããŒã¿ã§è©Šãããšãã§ããŸãã
SQLãšã¯äœã§ããïŒSQLã«ã¯ã©ã®ãããªæ¹èšããããŸãã
SQLïŒStructured Query LanguageïŒã¯ãããŒã¿ããŒã¹ãæäœããããã®æ§é åç §äŒèšèªã§ãã倧éã®ããŒã¿ã®åä¿¡ãããŒã¿ããŒã¹ãžã®è¿œå ãããã³å€æŽã«äœ¿çšã§ããŸããGoogle BigQueryã¯ãæšæºSQLãšã¬ã¬ã·ãŒSQLã®2ã€ã®æ¹èšããµããŒãããŠããŸãã
ã©ã®æ¹èšãéžæãããã¯å¥œã¿ã«ãã£ãŠç°ãªããŸãããããã€ãã®å©ç¹ããããããæšæºSQLã®äœ¿çšããå§ãããŸãã
- ãã¹ãããããã£ãŒã«ããšç¹°ãè¿ããã£ãŒã«ããæäœãããšãã®æè»æ§ãšæ©èœã
- DMLããã³DDLèšèªã®ãµããŒããããã«ãããããŒãã«å ã®ããŒã¿ãå€æŽããããGBQã§ããŒãã«ããã¥ãŒãæäœãããã§ããŸãã
- 倧éã®ããŒã¿ã®åŠçã¯ãLegasySQLãããé«éã§ãã
- çŸåšããã³å°æ¥ã®ãã¹ãŠã®BigQueryã¢ããããŒãã®ãµããŒãã
æ¹èšã®éãã«ã€ããŠè©³ããã¯ããã«ããã芧ãã ããã
ããã©ã«ãã§ã¯ãGoogleBigQueryã¯ãšãªã¯ã¬ã¬ã·ãŒSQLã«å¯ŸããŠå®è¡ãããŸãã
æšæºSQLã«åãæ¿ããæ¹æ³ã¯ããã€ããããŸãã
- BigQueryã€ã³ã¿ãŒãã§ã€ã¹ã®ã¯ãšãªç·šéãŠã£ã³ããŠã§[ãªãã·ã§ã³ã衚瀺]ãéžæãã[ã¬ã¬ã·ãŒSQLã䜿çšãã]ãªãã·ã§ã³ã®æšªã«ãããã§ãã¯ããã¯ã¹ããªãã«ããŸã
- ã¯ãšãªã®åã«#standardSQLè¡ãè¿œå ããæ°ããè¡ã§ã¯ãšãªãéå§ããŸã
ã©ãããå§ããã
èšäºãèªãã®ãšäžŠè¡ããŠã¯ãšãªã®å®è¡ãç·Žç¿ã§ããããã«ããã¢ããŒã¿ãå«ãè¡šãçšæããŸãããã¹ãã¬ããã·ãŒãããGoogleBigQueryãããžã§ã¯ãã«ããŒã¿ãèªã¿èŸŒã¿ãŸãã
GBQãããžã§ã¯ãããŸã ãæã¡ã§ãªãå Žåã¯ãäœæããŠãã ããããããè¡ãã«ã¯ãGoogle CloudPlatformã«ã¢ã¯ãã£ããªè«æ±å ã¢ã«ãŠã³ããå¿ èŠã§ããã«ãŒãããªã³ã¯ããå¿ èŠããããŸãããç¥ããªããã¡ã«ã«ãŒããããéãåŒãèœãšãããããšã¯ãããŸãããããã«ãç»é²æã«ã12ãæé300ãã«ãåãåããããŒã¿ã®ä¿åãšåŠçã«è²»ããããšãã§ããŸãã
GoogleBigQueryã®æ©èœ
ã¯ãšãªãäœæãããšãã«æãäžè¬çã«äœ¿çšãããé¢æ°ã®ã°ã«ãŒãã¯ãéèšé¢æ°ãæ¥ä»é¢æ°ãæååé¢æ°ãããã³ãŠã£ã³ããŠé¢æ°ã§ãã次ã«ãããããã«ã€ããŠè©³ãã説æããŸãã
éèšæ©èœ
éèšé¢æ°ã䜿çšãããšãããŒãã«å šäœã®èŠçŽå€ãååŸã§ããŸããããšãã°ãå¹³åå°åæãæéç·åå ¥ãèšç®ããããæå€§è³Œå ¥æ°ãéæãããŠãŒã¶ãŒã®ã»ã°ã¡ã³ãã匷調衚瀺ãããããŸãã
ãã®ã»ã¯ã·ã§ã³ã§æã人æ°ã®ããæ©èœã¯æ¬¡ã®ãšããã§ãã
ã¬ã¬ã·ãŒSQL | æšæºSQL | é¢æ°ã®æ©èœ |
---|---|---|
AVGïŒãã£ãŒã«ãïŒ | AVGïŒ[DISTINCT]ïŒãã£ãŒã«ãïŒïŒ | ãã£ãŒã«ãåã®å¹³åãè¿ããŸããæšæºSQLã§ã¯ãDISTINCTå¥ãè¿œå ãããšããã£ãŒã«ãåã®äžæã®ïŒéè€ããªãïŒå€ãæã€è¡ã«ã€ããŠã®ã¿å¹³åãèšç®ãããŸãã |
MAXïŒãã£ãŒã«ãïŒ | MAXïŒãã£ãŒã«ãïŒ | ãã£ãŒã«ãåããæ倧å€ãè¿ããŸã |
MINïŒãã£ãŒã«ãïŒ | MINïŒãã£ãŒã«ãïŒ | ãã£ãŒã«ãåããæå°å€ãè¿ããŸã |
SUMïŒãã£ãŒã«ãïŒ | SUMïŒãã£ãŒã«ãïŒ | ãã£ãŒã«ãåããå€ã®åèšãè¿ããŸã |
COUNTïŒãã£ãŒã«ãïŒ | COUNTïŒãã£ãŒã«ãïŒ | åãã£ãŒã«ãã®è¡æ°ãè¿ããŸã |
EXACT_COUNT_DISTINCTïŒãã£ãŒã«ãïŒ | COUNTïŒ[DISTINCT]ïŒãã£ãŒã«ãïŒïŒ | ãã£ãŒã«ãåã®äžæã®è¡ã®æ°ãè¿ããŸã |
ãã¹ãŠã®é¢æ°ã®ãªã¹ãã«ã€ããŠã¯ããã«ãïŒã¬ã¬ã·ãŒSQLããã³æšæºSQLãåç §ããŠãã ããã
ãªã¹ããããé¢æ°ããµã³ãã«ããŒã¿ãã¢ã§ã©ã®ããã«æ©èœããããèŠãŠã¿ãŸãããããã©ã³ã¶ã¯ã·ã§ã³ããã®å¹³ååå ¥ãæé«é¡ãšæäœé¡ã®è³Œå ¥ãç·åå ¥ãããã³ãã¹ãŠã®ãã©ã³ã¶ã¯ã·ã§ã³ã®æ°ãèšç®ããŠã¿ãŸããããè³Œå ¥ãéè€ããŠãããã©ããã確èªããããã«ãäžæã®ãã©ã³ã¶ã¯ã·ã§ã³ã®æ°ãèšç®ããŸãããããè¡ãã«ã¯ãGoogle BigQueryãããžã§ã¯ããããŒã¿ã»ãããããã³ããŒãã«ã®ååã瀺ãã¯ãšãªãäœæããŸãã
#legasy SQL
SELECT
AVG(revenue) as average_revenue,
MAX(revenue) as max_revenue,
MIN(revenue) as min_revenue,
SUM(revenue) as whole_revenue,
COUNT(transactionId) as transactions,
EXACT_COUNT_DISTINCT(transactionId) as unique_transactions
FROM
[owox-analytics:t_kravchenko.Demo_data]
#standard SQL
SELECT
AVG(revenue) as average_revenue,
MAX(revenue) as max_revenue,
MIN(revenue) as min_revenue,
SUM(revenue) as whole_revenue,
COUNT(transactionId) as transactions,
COUNT(DISTINCT(transactionId)) as unique_transactions
FROM
`owox-analytics.t_kravchenko.Demo_data`
ãã®çµæã次ã®ãããªçµæãåŸãããŸãã
æšæºã®Google Sheetsé¢æ°ïŒSUMãAVGãªã©ïŒãŸãã¯ããããããŒãã«ã䜿çšããŠããã¢ããŒã¿ã䜿çšããŠå ã®ããŒãã«ã®èšç®çµæã確èªã§ããŸãã
äžã®ã¹ã¯ãªãŒã³ã·ã§ãããããããããã«ããã©ã³ã¶ã¯ã·ã§ã³ã®æ°ãšäžæã®ãã©ã³ã¶ã¯ã·ã§ã³ã¯ç°ãªããŸãã
ããã¯ãtransactionIdãéè€ããŠããããŒãã«ã«2ã€ã®ãã©ã³ã¶ã¯ã·ã§ã³ããã
ããšã瀺ããŠããŸãããããã£ãŠãäžæã®ãã©ã³ã¶ã¯ã·ã§ã³ã«é¢å¿ãããå Žåã¯ãäžæã®è¡ãã«ãŠã³ãããé¢æ°ã䜿çšããŠãã ããããŸãã¯ãGROUP BYå¥ã䜿çšããŠããŒã¿ãã°ã«ãŒãåããéèšæ©èœã䜿çšããåã«éè€ãåãé€ãããšãã§ããŸãã
æ¥ä»ãæäœããããã®é¢æ°ïŒæ¥ä»é¢æ°ïŒ
ãããã®æ©èœã䜿çšãããšãæ¥ä»ãåŠçã§ããŸãã圢åŒã®å€æŽãå¿ èŠãªéšåïŒæ¥ãæããŸãã¯å¹ŽïŒã®éžæãæ¥ä»ãç¹å®ã®ééã§ã·ããããŸãã
ãããã¯ã次ã®å Žåã«åœ¹ç«ã¡ãŸãã
- ãšã³ãããŒãšã³ãã®åæãèšå®ããå Žå-ããŸããŸãªãœãŒã¹ããã®æ¥ä»ãšæå»ãåäžã®åœ¢åŒã«å€æããŸãã
- èªåçã«æŽæ°ãããã¬ããŒããŸãã¯ããªã¬ãŒãããã¡ãŒãªã³ã°ãäœæããå Žåãããšãã°ãéå»2æéã1é±éããŸãã¯1ãæã®ããŒã¿ãå¿ èŠãªå Žåã§ãã
- æ¥ãé±ãæã®ã³ã³ããã¹ãã§ããŒã¿ãååŸããå¿ èŠãããã³ããŒãã¬ããŒããäœæããå Žåã
æãäžè¬çã«äœ¿çšãããæ¥ä»é¢æ°ïŒ
ã¬ã¬ã·ãŒSQL | æšæºSQL | é¢æ°ã®æ©èœ |
---|---|---|
çŸåšã®æ¥ä» ïŒïŒ | çŸåšã®æ¥ä» ïŒïŒ | çŸåšã®æ¥ä»ãïŒ YYYY-ïŒ MM-ïŒ DDã®åœ¢åŒã§è¿ããŸã |
æ¥ä»ïŒã¿ã€ã ã¹ã¿ã³ãïŒ | æ¥ä»ïŒã¿ã€ã ã¹ã¿ã³ãïŒ | æ¥ä»ãïŒ YYYY-ïŒ MM-ïŒ DDïŒ HïŒïŒ MïŒïŒ Sã®åœ¢åŒããå€æããŸããïŒ YYYY-ïŒ MM-ïŒ DDã®åœ¢åŒã§ |
DATE_ADDïŒtimestampãintervalãinterval_unitsïŒ | DATE_ADD(timestamp, INTERVAL interval interval_units) | timestamp, interval.interval_units.
Legacy SQL YEAR, MONTH, DAY, HOUR, MINUTE SECOND, Standard SQL â YEAR, QUARTER, MONTH, WEEK, DAY |
DATE_ADD(timestamp, â interval, interval_units) | DATE_SUB(timestamp, INTERVAL interval interval_units) | timestamp, interval |
DATEDIFF(timestamp1, timestamp2) | DATE_DIFF(timestamp1, timestamp2, date_part) | timestamp1 timestamp2.
Legacy SQL , Standard SQL â date_part (, , , , ) |
DAY(timestamp) | EXTRACT(DAY FROM timestamp) | timestamp. 1 31 |
MONTH(timestamp) | EXTRACT(MONTH FROM timestamp) | timestamp. 1 12 |
YEAR(timestamp) | EXTRACT(YEAR FROM timestamp) | timestamp |
ãã¹ãŠã®æ©èœã®ãªã¹ãã«ã€ããŠã¯ãã¬ã¬ã·ãŒSQLããã³æšæºSQLã®ãã«ããåç §ããŠãã ããã
äžèšã®åé¢æ°ãã©ã®ããã«æ©èœããããããŒã¿ã®ãã¢ãèŠãŠã¿ãŸããããããšãã°ãçŸåšã®æ¥ä»ãååŸããå ã®ããŒãã«ã®æ¥ä»ãïŒ YYYY-ïŒ MM-ïŒ DDã®åœ¢åŒã«å€æãã1æ¥ãæžç®ããŠå ç®ããŸãã次ã«ãçŸåšã®æ¥ä»ãšå ã®ããŒãã«ã®æ¥ä»ãšã®å·®ãèšç®ããçŸåšã®æ¥ä»ã幎ãæãæ¥ã«å¥ã ã«åå²ããŸãããããè¡ãã«ã¯ã以äžã®ã¯ãšãªäŸãã³ããŒããŠããããžã§ã¯ããããŒã¿ã»ãããããã³ããŒã¿ããŒãã«ã®ååãç¬èªã®ãã®ã«çœ®ãæããããšãã§ããŸãã
#legasy SQL
SELECT
CURRENT_DATE() AS today,
DATE( date_UTC ) AS date_UTC_in_YYYYMMDD,
DATE_ADD( date_UTC,1, 'DAY') AS date_UTC_plus_one_day,
DATE_ADD( date_UTC,-1, 'DAY') AS date_UTC_minus_one_day,
DATEDIFF(CURRENT_DATE(), date_UTC ) AS difference_between_date,
DAY( CURRENT_DATE() ) AS the_day,
MONTH( CURRENT_DATE()) AS the_month,
YEAR( CURRENT_DATE()) AS the_year
FROM
[owox-analytics:t_kravchenko.Demo_data]
#standard SQL
SELECT
today,
date_UTC_in_YYYYMMDD,
DATE_ADD( date_UTC_in_YYYYMMDD, INTERVAL 1 DAY) AS date_UTC_plus_one_day,
DATE_SUB( date_UTC_in_YYYYMMDD,INTERVAL 1 DAY) AS date_UTC_minus_one_day,
DATE_DIFF(today, date_UTC_in_YYYYMMDD, DAY) AS difference_between_date,
EXTRACT(DAY FROM today ) AS the_day,
EXTRACT(MONTH FROM today ) AS the_month,
EXTRACT(YEAR FROM today ) AS the_year
FROM (
SELECT
CURRENT_DATE() AS today,
DATE( date_UTC ) AS date_UTC_in_YYYYMMDD
FROM
`owox-analytics.t_kravchenko.Demo_data`)
ãªã¯ãšã¹ããé©çšãããšã次ã®ã¬ããŒããå±ããŸãã
æååãæäœããããã®é¢æ°ïŒæååé¢æ°ïŒ
æååé¢æ°ã䜿çšãããšãæååã®åœ¢æããµãæååã®åŒ·èª¿è¡šç€ºãšçœ®æãæååã®é·ãã®èšç®ãããã³å ã®æååã®ãµãæååã®é åºã€ã³ããã¯ã¹ãèšç®ã§ããŸãã
ããšãã°ã圌ãã®å©ããåããŠã次ã®ããšãã§ããŸãã
- ããŒãžURLã«æž¡ãããUTMã¿ã°ã«ãã£ãŠã¬ããŒãã«ãã£ã«ã¿ãŒãäœæããŸãã
- ãœãŒã¹ãšãã£ã³ããŒã³ã®ååãç°ãªãã¬ãžã¹ã¿ã«æžã蟌ãŸããŠããå Žåã¯ãããŒã¿ãçµ±äžããã圢åŒã«ããŸãã
- ãã£ã³ããŒã³åãã¿ã€ããã¹ã§éä¿¡ãããå Žåãªã©ãã¬ããŒãå ã®èª€ã£ãããŒã¿ã眮ãæããŸãã
æååãæäœããããã®æãäžè¬çãªé¢æ°ïŒ
ã¬ã¬ã·ãŒSQL | æšæºSQL | é¢æ°ã®æ©èœ |
---|---|---|
CONCATïŒ 'str1'ã 'str2'ïŒãŸã㯠'str1' + 'str2' | CONCATïŒ 'str1'ã 'str2'ïŒ | è€æ°ã®æååãstr1ããšãstr2ãã1ã€ã«é£çµããŸã |
'str1' CONTAINS'str2 ' | REGEXP_CONTAINS('str1', 'str2') 'str1' LIKE â%str2%â | true 'str1' âstr2â.
Standard SQL âstr2â re2 |
LENGTH('str' ) | CHAR_LENGTH('str' )
CHARACTER_LENGTH('str' ) |
'str' ( ) |
SUBSTR('str', index [, max_len]) | SUBSTR('str', index [, max_len]) | max_len, index 'str' |
LOWER('str') | LOWER('str') | 'str' |
UPPER(str) | UPPER(str) | 'str' |
INSTR('str1', 'str2') | STRPOS('str1', 'str2') | æåå 'str1'ã§æåå 'str2'ãæåã«åºçŸããã€ã³ããã¯ã¹ãè¿ããŸãããã以å€ã®å Žåã¯-0 |
REPLACEïŒ 'str1'ã 'str2'ã 'str3'ïŒ | REPLACEïŒ 'str1'ã 'str2'ã 'str3'ïŒ | æåå 'str1'ã®ãµãæåå 'str2'ããµãæåå 'str3'ã«çœ®ãæããŸã |
詳现-ãã«ãïŒã¬ã¬ã·ãŒSQLããã³æšæºSQLã
説æããé¢æ°ã®äœ¿çšæ¹æ³ã®ãã¢ããŒã¿ã®äŸãèŠãŠã¿ãŸããããæ¥ãæã幎ã®å€ãå«ã3ã€ã®å¥ã ã®åããã
ãšããŸãããã®åœ¢åŒã§æ¥ä»ãæäœããã®ã¯ããŸã䟿å©ã§ã¯ãªãã®ã§ã1ã€ã®åã«ãŸãšããŸãããããããè¡ãã«ã¯ã以äžã®SQLã¯ãšãªã䜿çšãããããžã§ã¯ããããŒã¿ã»ãããããã³GoogleBigQueryããŒãã«ã®ååãå«ããããšãå¿ããªãã§ãã ããã
#legasy SQL
SELECT
CONCAT(the_day,'-',the_month,'-',the_year) AS mix_string1,
the_day+'-'+the_month+'-'+the_year AS mix_string2
FROM (
SELECT
'31' AS the_day,
'12' AS the_month,
'2018' AS the_year
FROM
[owox-analytics:t_kravchenko.Demo_data])
GROUP BY
mix_string1,
mix_string2
#standard SQL
SELECT
CONCAT(the_day,'-',the_month,'-',the_year) AS mix_string1
FROM (
SELECT
'31' AS the_day,
'12' AS the_month,
'2018' AS the_year
FROM
`owox-analytics.t_kravchenko.Demo_data`)
GROUP BY
mix_string1
ãªã¯ãšã¹ããå®è¡ããåŸã1ã€ã®åã«æ¥ä»ã衚瀺ãããŸãïŒ
å€ãã®å Žåããµã€ãã®ç¹å®ã®ããŒãžãèªã¿èŸŒããšãã«ãURLã«ã¯ãŠãŒã¶ãŒãéžæããå€æ°ã®å€ãå«ãŸããŠããŸããããã¯ãæ¯æããŸãã¯é éã®æ¹æ³ããã©ã³ã¶ã¯ã·ã§ã³çªå·ã顧客ãã¢ã€ãã ãåãåãããå®åºèã®ã€ã³ããã¯ã¹ãªã©ã§ããSQLã¯ãšãªã䜿çšããŠãããŒãžã¢ãã¬ã¹ãããããã®ãã©ã¡ãŒã¿ãæœåºã§ããŸãã
ãããè¡ãæ¹æ³ãšçç±ã®2ã€ã®äŸãèŠãŠã¿ãŸãããã
äŸ1ããŠãŒã¶ãŒãå®åºèããååãåãåãè³Œå ¥æ°ãç¥ããããšããŸãããããè¡ãã«ã¯ãURLã«ãµãã¹ããªã³ã°shop_idïŒå®åºèã®ã€ã³ããã¯ã¹ïŒãå«ãŸããŠããããŒãžããéä¿¡ããããã©ã³ã¶ã¯ã·ã§ã³ã®æ°ãã«ãŠã³ãããå¿ èŠããããŸããããã¯ã次ã®ã¯ãšãªã䜿çšããŠè¡ããŸãã
ïŒlegasy SQL
SELECT
COUNT(transactionId) AS transactions,
check
FROM (
SELECT
transactionId,
page CONTAINS 'shop_id' AS check
FROM
[owox-analytics:t_kravchenko.Demo_data])
GROUP BY
check
#standard SQL
SELECT
COUNT(transactionId) AS transactions,
check1,
check2
FROM (
SELECT
transactionId,
REGEXP_CONTAINS( page, 'shop_id') AS check1,
page LIKE '%shop_id%' AS check2
FROM
`owox-analytics.t_kravchenko.Demo_data`)
GROUP BY
check1,
check2
çµæã®ããŒãã«ãããshop_idïŒcheck = trueïŒãå«ãããŒãžãã5502ãã©ã³ã¶ã¯ã·ã§ã³ãéä¿¡ãããããšãããããŸãã
äŸ2ãdelivery_idãåé ä¿¡æ¹æ³ã«å²ãåœãŠããã®ãã©ã¡ãŒã¿ãŒã®å€ãããŒãžã®URLã«æžã蟌ãã ãšããŸãããŠãŒã¶ãŒãéžæããé ä¿¡æ¹æ³ã確èªããã«ã¯ãå¥ã®åã§delivery_idãéžæããŸãã
ããã«ã¯æ¬¡ã®ã¯ãšãªã䜿çšããŸãïŒ
#legasy SQL
SELECT
page_lower_case,
page_length,
index_of_delivery_id,
selected_delivery_id,
REPLACE(selected_delivery_id, 'selected_delivery_id=', '') as delivery_id
FROM (
SELECT
page_lower_case,
page_length,
index_of_delivery_id,
SUBSTR(page_lower_case, index_of_delivery_id) AS selected_delivery_id
FROM (
SELECT
page_lower_case,
LENGTH(page_lower_case) AS page_length,
INSTR(page_lower_case, 'selected_delivery_id') AS index_of_delivery_id
FROM (
SELECT
LOWER( page) AS page_lower_case,
UPPER( page) AS page_upper_case
FROM
[owox-analytics:t_kravchenko.Demo_data])))
ORDER BY
page_lower_case ASC
#standard SQL
SELECT
page_lower_case,
page_length,
index_of_delivery_id,
selected_delivery_id,
REPLACE(selected_delivery_id, 'selected_delivery_id=', '') AS delivery_id
FROM (
SELECT
page_lower_case,
page_length,
index_of_delivery_id,
SUBSTR(page_lower_case, index_of_delivery_id) AS selected_delivery_id
FROM (
SELECT
page_lower_case,
CHAR_LENGTH(page_lower_case) AS page_length,
STRPOS(page_lower_case, 'selected_delivery_id') AS index_of_delivery_id
FROM (
SELECT
LOWER( page) AS page_lower_case,
UPPER( page) AS page_upper_case
FROM
`owox-analytics.t_kravchenko.Demo_data`)))
ORDER BY
page_lower_case ASC
ãã®çµæãGoogleBigQueryã«æ¬¡ã®ããŒãã«ã衚瀺ãããŸãã
ããŒã¿ã®ãµãã»ãããŸãã¯ãŠã£ã³ããŠé¢æ°ãæäœããããã®é¢æ°ïŒãŠã£ã³ããŠé¢æ°ïŒ
ãããã®é¢æ°ã¯ãåè¿°ã®éèšé¢æ°ã«äŒŒãŠããŸãããããã®äž»ãªéãã¯ãã¯ãšãªã䜿çšããŠéžæãããããŒã¿ã®ã»ããå šäœã§ã¯ãªãããã®äžéšïŒãµãã»ãããŸãã¯ãŠã£ã³ããŠïŒã§èšç®ãå®è¡ãããããšã§ãã
ãŠã£ã³ããŠé¢æ°ã䜿çšãããšãJOINæŒç®åã䜿çšããŠè€æ°ã®ã¯ãšãªãçµåããªããŠããã°ã«ãŒãããšã«ããŒã¿ãéçŽã§ããŸããããšãã°ãåºåãã£ã³ããŒã³ããšã®å¹³ååçãããã€ã¹ããšã®ãã©ã³ã¶ã¯ã·ã§ã³æ°ãèšç®ããŸããã¬ããŒãã«å¥ã®ãã£ãŒã«ããè¿œå ããããšã§ãããšãã°ããã©ãã¯ãã©ã€ããŒã®åºåãã£ã³ããŒã³ããã®åçã®ã·ã§ã¢ãã¢ãã€ã«ã¢ããªã±ãŒã·ã§ã³ããè¡ããããã©ã³ã¶ã¯ã·ã§ã³ã®ã·ã§ã¢ãç°¡åã«èŠã€ããããšãã§ããŸãã
åé¢æ°ãšãšãã«ããŠã£ã³ããŠã®å¢çãå®çŸ©ããOVERåŒããªã¯ãšã¹ãã«æžã蟌ãå¿ èŠããããŸããOVERã«ã¯ãæäœã§ãã3ã€ã®ã³ã³ããŒãã³ããå«ãŸããŠããŸãã
- PARTITION BY-ãœãŒã¹ããŒã¿ããµãã»ããã«åå²ããããã®å±æ§ãå®çŸ©ããŸãïŒäŸïŒPARTITION BY clientIdãDayTimeïŒã
- ORDER BY-ãµãã»ããå ã®è¡ã®é åºãå®çŸ©ããŸãïŒäŸïŒORDER BYæéDESCïŒã
- WINDOWFRAME-ç¹å®ã®ç¹æ§ã«åŸã£ãŠãµãã»ããå ã®è¡ãåŠçã§ããŸããããšãã°ããŠã£ã³ããŠå ã®ãã¹ãŠã®è¡ã§ã¯ãªããçŸåšã®è¡ã®åã®æåã®5è¡ã®ã¿ã®åèšãèšç®ã§ããŸãã
ãã®è¡šã¯ãæãäžè¬çã«äœ¿çšããããŠã£ã³ããŠæ©èœããŸãšãããã®ã§ãã
ã¬ã¬ã·ãŒSQL | æšæºSQL | é¢æ°ã®æ©èœ |
---|---|---|
AVGïŒãã£ãŒã«ãïŒ
COUNTïŒãã£ãŒã«ãïŒ COUNTïŒDISTINCTãã£ãŒã«ãïŒ MAXïŒïŒ MINïŒïŒ SUMïŒïŒ |
AVGïŒ[DISTINCT]ïŒãã£ãŒã«ãïŒïŒ
COUNTïŒãã£ãŒã«ãïŒ COUNTïŒ[DISTINCT]ïŒãã£ãŒã«ãïŒïŒ MAXïŒãã£ãŒã«ãïŒ MINïŒãã£ãŒã«ãïŒ SUMïŒãã£ãŒã«ãïŒ |
, , , field .
DISTINCT , () |
'str1' CONTAINS 'str2' | REGEXP_CONTAINS('str1', 'str2') 'str1' LIKE â%str2%â | true 'str1' âstr2â.
Standard SQL âstr2â re2 |
DENSE_RANK() | DENSE_RANK() | |
FIRST_VALUE(field) | FIRST_VALUE (field[{RESPECT | IGNORE} NULLS]) | field .
field . RESPECT IGNORE NULLS , NULL |
LAST_VALUE(field) | LAST_VALUE (field [{RESPECT | IGNORE} NULLS]) | field .
field . RESPECT IGNORE NULLS , NULL |
LAG(field) | LAG (field[, offset [, default_expression]]) | field .
Offset , . . Default_expression â , , |
LEAD(field) | LEAD (field[, offset [, default_expression]]) | field .
Offset , . . Default_expression â , , |
ãã¹ãŠã®é¢æ°ã®ãªã¹ãã¯ãã¬ã¬ã·ãŒSQLããã³æšæºSQLã®ãã«ãã§ç¢ºèªã§ããŸãïŒéèšåæé¢æ°ãããã²ãŒã·ã§ã³é¢æ°ã
äŸ1.å°±æ¥æéãšéå°±æ¥æéäžã®ãã€ã€ãŒã®æŽ»åãåæããããšããŸãããããè¡ãã«ã¯ããã©ã³ã¶ã¯ã·ã§ã³ã2ã€ã®ã°ã«ãŒãã«åå²ããé¢å¿ã®ããã¡ããªãã¯ãèšç®ããå¿ èŠããããŸãã
- ã°ã«ãŒã1-å¶æ¥æéã®9:00ãã18:00ãŸã§ã®è³Œå ¥ã
- ã°ã«ãŒã2-å¶æ¥æéå€ã®00:00ãã9:00ããã³18:00ãã00:00ãŸã§ã®è³Œå ¥ã
å°±æ¥æéãšéå°±æ¥æéã«å ããŠããŠã£ã³ããŠã圢æããããã®å¥ã®èšå·ã¯clientIdã§ããã€ãŸãããŠãŒã¶ãŒããšã«2ã€ã®ãŠã£ã³ããŠããããŸãã
ãµãã»ããïŒãŠã£ã³ããŠïŒ | ã¯ã©ã€ã¢ã³ãID | æŒé |
---|---|---|
1ãŠã£ã³ã㊠| clientId 1 | å€åæé |
2ãŠã£ã³ã㊠| clientId 2 | éå°±æ¥æé |
3ãŠã£ã³ã㊠| clientId 3 | å€åæé |
4ãŠã£ã³ã㊠| clientId 4 | éå°±æ¥æé |
NãŠã£ã³ã㊠| clientId N | å€åæé |
N +1ãŠã£ã³ã㊠| clientId N + 1 | éå°±æ¥æé |
ãã¢ããŒã¿ã§ãå¹³åãæ倧ãæå°ãããã³ç·åå ¥ããã©ã³ã¶ã¯ã·ã§ã³æ°ãããã³åãŠãŒã¶ãŒã®å°±æ¥æéãšéå°±æ¥æéäžã®äžæã®ãã©ã³ã¶ã¯ã·ã§ã³ã®æ°ãèšç®ããŠã¿ãŸãããã以äžã®ã¯ãšãªã¯ããããè¡ãã®ã«åœ¹ç«ã¡ãŸãã
#legasy SQL
SELECT
date,
clientId,
DayTime,
avg_revenue,
max_revenue,
min_revenue,
sum_revenue,
transactions,
unique_transactions
FROM (
SELECT
date,
clientId,
DayTime,
AVG(revenue) OVER (PARTITION BY date, clientId, DayTime) AS avg_revenue,
MAX(revenue) OVER (PARTITION BY date, clientId, DayTime) AS max_revenue,
MIN(revenue) OVER (PARTITION BY date, clientId, DayTime) AS min_revenue,
SUM(revenue) OVER (PARTITION BY date, clientId, DayTime) AS sum_revenue,
COUNT(transactionId) OVER (PARTITION BY date, clientId, DayTime) AS transactions,
COUNT(DISTINCT(transactionId)) OVER (PARTITION BY date, clientId, DayTime) AS unique_transactions
FROM (
SELECT
date,
date_UTC,
clientId,
transactionId,
revenue,
page,
hour,
CASE
WHEN hour>=9 AND hour<=18 THEN ' '
ELSE ' '
END AS DayTime
FROM
[owox-analytics:t_kravchenko.Demo_data]))
GROUP BY
date,
clientId,
DayTime,
avg_revenue,
max_revenue,
min_revenue,
sum_revenue,
transactions,
unique_transactions
ORDER BY
transactions DESC
#standard SQL
SELECT
date,
clientId,
DayTime,
avg_revenue,
max_revenue,
min_revenue,
sum_revenue,
transactions,
unique_transactions
FROM (
SELECT
date,
clientId,
DayTime,
AVG(revenue) OVER (PARTITION BY date, clientId, DayTime) AS avg_revenue,
MAX(revenue) OVER (PARTITION BY date, clientId, DayTime) AS max_revenue,
MIN(revenue) OVER (PARTITION BY date, clientId, DayTime) AS min_revenue,
SUM(revenue) OVER (PARTITION BY date, clientId, DayTime) AS sum_revenue,
COUNT(transactionId) OVER (PARTITION BY date, clientId, DayTime) AS transactions,
COUNT(DISTINCT(transactionId)) OVER (PARTITION BY date, clientId, DayTime) AS unique_transactions
FROM (
SELECT
date,
date_UTC,
clientId,
transactionId,
revenue,
page,
hour,
CASE
WHEN hour>=9 AND hour<=18 THEN ' '
ELSE ' '
END AS DayTime
FROM
`owox-analytics.t_kravchenko.Demo_data`))
GROUP BY
date,
clientId,
DayTime,
avg_revenue,
max_revenue,
min_revenue,
sum_revenue,
transactions,
unique_transactions
ORDER BY
transactions DESC
clientId = '102041117.1428132012'ã®ãŠãŒã¶ãŒã®1人ã®äŸã䜿çšããŠãçµæãšããŠäœãèµ·ãã£ãããèŠãŠã¿ãŸãããããã®ãŠãŒã¶ãŒã®å ã®ããŒãã«ã«ã¯ã次ã®ããŒã¿ã
ãããŸããã¯ãšãªãé©çšããããšã«ããããã®ãŠãŒã¶ãŒããã®å¹³åãæå°ãæ倧ãåèšã®åå ¥ãšãã©ã³ã¶ã¯ã·ã§ã³æ°ãå«ãã¬ããŒããåãåããŸããã以äžã®ã¹ã¯ãªãŒã³ã·ã§ããã§ãããããã«ããŠãŒã¶ãŒã¯å¶æ¥æéäžã«äž¡æ¹ã®ãã©ã³ã¶ã¯ã·ã§ã³ãå®è¡ããŸããã
äŸ2ãããã§ã¯ãã¿ã¹ã¯ãå°ãè€éã«ããŸãããã
- å®è¡æéã«å¿ããŠããŠã£ã³ããŠå ã®ãã¹ãŠã®ãã©ã³ã¶ã¯ã·ã§ã³ã®ã·ãŒã±ã³ã¹çªå·ãæžãçããŸãããããŠãŒã¶ãŒãšçšŒåæé/é皌åæéã«ãã£ãŠãŠã£ã³ããŠãå®çŸ©ããããšãæãåºããŠãã ããã
- ãŠã£ã³ããŠå ã«æ¬¡/åã®ãã©ã³ã¶ã¯ã·ã§ã³ã®åçïŒçŸåšã®ãã©ã³ã¶ã¯ã·ã§ã³ãšæ¯èŒããŠïŒã衚瀺ããŠã¿ãŸãããã
- æåãšæåŸã®ãã©ã³ã¶ã¯ã·ã§ã³ã®åçããŠã£ã³ããŠã«è¡šç€ºããŠã¿ãŸãããã
ãã®ããã«ã次ã®ã¯ãšãªã䜿çšããŸãã
ïŒlegasy SQL
SELECT
date,
clientId,
DayTime,
hour,
rank,
revenue,
lead_revenue,
lag_revenue,
first_revenue_by_hour,
last_revenue_by_hour
FROM (
SELECT
date,
clientId,
DayTime,
hour,
DENSE_RANK() OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS rank,
revenue,
LEAD( revenue, 1) OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS lead_revenue,
LAG( revenue, 1) OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS lag_revenue,
FIRST_VALUE(revenue) OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS first_revenue_by_hour,
LAST_VALUE(revenue) OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS last_revenue_by_hour
FROM (
SELECT
date,
date_UTC,
clientId,
transactionId,
revenue,
page,
hour,
CASE
WHEN hour>=9 AND hour<=18 THEN ' '
ELSE ' '
END AS DayTime
FROM
[owox-analytics:t_kravchenko.Demo_data]))
GROUP BY
date,
clientId,
DayTime,
hour,
rank,
revenue,
lead_revenue,
lag_revenue,
first_revenue_by_hour,
last_revenue_by_hour
ORDER BY
date,
clientId,
DayTime,
hour,
rank,
revenue,
lead_revenue,
lag_revenue,
first_revenue_by_hour,
last_revenue_by_hour
#standard SQL
SELECT
date,
clientId,
DayTime,
hour,
rank,
revenue,
lead_revenue,
lag_revenue,
first_revenue_by_hour,
last_revenue_by_hour
FROM (
SELECT
date,
clientId,
DayTime,
hour,
DENSE_RANK() OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS rank,
revenue,
LEAD( revenue, 1) OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS lead_revenue,
LAG( revenue, 1) OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS lag_revenue,
FIRST_VALUE(revenue) OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS first_revenue_by_hour,
LAST_VALUE(revenue) OVER (PARTITION BY date, clientId, DayTime ORDER BY hour) AS last_revenue_by_hour
FROM (
SELECT
date,
date_UTC,
clientId,
transactionId,
revenue,
page,
hour,
CASE
WHEN hour>=9 AND hour<=18 THEN ' '
ELSE ' '
END AS DayTime
FROM
`owox-analytics.t_kravchenko.Demo_data`))
GROUP BY
date,
clientId,
DayTime,
hour,
rank,
revenue,
lead_revenue,
lag_revenue,
first_revenue_by_hour,
last_revenue_by_hour
ORDER BY
date,
clientId,
DayTime,
hour,
rank,
revenue,
lead_revenue,
lag_revenue,
first_revenue_by_hour,
last_revenue_by_hour
clientId = '102041117.1428132012'ã«æ¢ã«ç²ŸéããŠãããŠãŒã¶ãŒã®äŸã䜿çšããŠãèšç®çµæã確èªããŠã¿ãŸãããã
äžã®ã¹ã¯ãªãŒã³ã·ã§ããããã次ã®ããšãããããŸãã
- æåã®ãã©ã³ã¶ã¯ã·ã§ã³ã¯15:00ã§ã2çªç®ã®ãã©ã³ã¶ã¯ã·ã§ã³ã¯16:00ã§ããã
- 15:00ã®çŸåšã®ãã©ã³ã¶ã¯ã·ã§ã³ã®åŸã16ïŒ00ã«ãã©ã³ã¶ã¯ã·ã§ã³ãããããã®åçã¯25066ïŒålead_revenueïŒã§ãã
- ååŸ4æã®çŸåšã®ãã©ã³ã¶ã¯ã·ã§ã³ã®åã«ãååŸ3æã«ãã©ã³ã¶ã¯ã·ã§ã³ããããåçã¯3699ã§ããïŒålag_revenueïŒã
- ãŠã£ã³ããŠå ã®æåã®ãã©ã³ã¶ã¯ã·ã§ã³ã¯15:00ã®ãã©ã³ã¶ã¯ã·ã§ã³ã§ããã®åçã¯3699ïŒåfirst_revenue_by_hourïŒã§ãã
- ãªã¯ãšã¹ãã¯ããŒã¿ã1è¡ãã€åŠçãããããæ€èšäžã®ãã©ã³ã¶ã¯ã·ã§ã³ã®å ŽåããŠã£ã³ããŠã®æåŸã«ãªããlast_revenue_by_houråãšrevenueåã®å€ã¯åãã«ãªããŸãã
çµè«
ãã®èšäºã§ã¯ãAggregateé¢æ°ãDateé¢æ°ãStringé¢æ°ãWindowé¢æ°ã®ã»ã¯ã·ã§ã³ããæã人æ°ã®ããé¢æ°ã«ã€ããŠèª¬æããŸããããã ããGoogle BigQueryã«ã¯ã次ã®ãããªå€ãã®äŸ¿å©ãªæ©èœããããŸãã
- ãã£ã¹ãé¢æ°-ããŒã¿ãç¹å®ã®åœ¢åŒã«ãã£ã¹ãã§ããŸãã
- ããŒãã«ã¯ã€ã«ãã«ãŒãé¢æ°-ããŒã¿ã»ããããè€æ°ã®ããŒãã«ã«ã¢ã¯ã»ã¹ã§ããŸãã
- éåžžã®åŒé¢æ°-æ£ç¢ºãªå€ã§ã¯ãªããæ€çŽ¢ã¯ãšãªã®ã¢ãã«ãèšè¿°ã§ããŸãã
ã³ã¡ã³ãã«ã€ããŠåã詳现ãæžãããšãçã«ããªã£ãŠããå Žåã¯ãã³ã¡ã³ãã«æžã蟌ãã§ãã ããã