Open sources hold a huge amount of useful information. Collected, stored, and analyzed properly, it can point to the best business opportunities.
A group of young entrepreneurs was considering opening their own photo studio in Moscow. They needed to find out:
- What is the overall state of the photo studio market: growing, stable, or declining?
- How seasonal is the market?
- How much can they earn?
- Where is the best place to open halls?
- How much should be invested in the project?
- How intense is the competition in the market?
The simple parser, database, and analysis presented in this series of articles helped answer these and many other questions.

In the first article, we looked at parsing the photo studio aggregator site ugoloc.ru and downloaded general information about photo studios and halls, together with booking data. In the second article, we wrote the collected data to a database, read it back, and set up the parser to run according to what the database already contained. In this article, we carry out a simple analysis of the collected data. The finished project, with examples of the database tables, intermediate tables, charts, and additional comments, is available on my GitHub page.
Analysis directions used
- Determine the dynamics of photo studio openings.
- Calculate photo studio income depending on the month of opening.
- Determine the seasonality of the business.
- Calculate the average income per hall and the optimal number of halls in a photo studio.
- Investigate how income depends on the studio's location.
- Find out how many halls competing studios have.
- Calculate the effect of other parameters on income, such as ceiling height, hall area, and booking price.
- Consider other possible directions of analysis.
Loading the data from the database
To load the data, follow these steps.
Establish a connection to the database
directory = './/'
conn = sqlite3.connect(directory + 'photostudios_moscow1.sqlite')
cur = conn.cursor()
Load the studio data
studios = db_to_studios(conn)
studios
Load the hall data
halls = db_to_halls(conn)
halls
Load the booking data
booking = db_to_booking(conn)
booking
Keep only the studios that have an opening date, and exclude dressing rooms from the list of halls
studios = studios[[x.year > 0 for x in studios['established_date']]]
halls = halls[halls['is_hall'] == 1]
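The helpers `db_to_studios`, `db_to_halls`, and `db_to_booking` come from the previous article and are not shown here. As a rough idea of what such a loader might look like, here is a minimal sketch. The function name `load_table`, the table layout, and the in-memory database are assumptions made for illustration and are not taken from the project.

```python
import sqlite3
import pandas as pd

def load_table(conn, table, date_columns=()):
    """Read a whole table into a DataFrame, parsing the given columns as dates."""
    # The table name is interpolated directly, so pass only trusted names.
    return pd.read_sql_query(f"SELECT * FROM {table}", conn,
                             parse_dates=list(date_columns))

# Tiny in-memory database standing in for photostudios_moscow1.sqlite (hypothetical rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studios (studio_id INTEGER, name TEXT, established_date TEXT)")
conn.executemany("INSERT INTO studios VALUES (?, ?, ?)",
                 [(1, "Studio A", "2018-03-01"), (2, "Studio B", "2019-10-15")])
conn.commit()

studios = load_table(conn, "studios", date_columns=["established_date"])
print(studios.dtypes)
```

Parsing the date column on load is what makes later expressions such as `x.year` work on `established_date`.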
Dynamics of photo studio openings by year
Let's build a frequency histogram of photo studios by opening year. To do this, we count the number of periods (years) and plot the histogram.
Plot the histogram
num_bins = np.max(studios['established_date']).year - np.min(studios['established_date']).year + 1
plt.hist([x.year for x in studios['established_date']], num_bins)
plt.show()

The histogram shows a clear increase in the number of new photo studios every year. A pattern like this, close to doubling each year, reflects the growth of the aggregator itself rather than real growth of the market.
This means the studios have to be split into two categories: those that registered with the aggregator when they opened ("new") and those that registered long after opening ("old"). That is the next task.
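Identifying new photo studios
Which photo studios can be considered new? Those that are still being promoted and are still gaining clients. Visual inspection of the booking calendars from the moment of opening shows that a studio builds up a steady flow of customers within a few months.
To tell new studios from old ones (those that did not join the aggregator right away), we compare the income for the first half-month after "opening" with the income for the same period one year later. The income of a new studio should grow substantially over the year, while the income of an old studio should stay at roughly the same level.
First, we merged all the tables and kept only the rows for working hours (`is_working_hour == 1`).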
# merge all tables
data = (booking
.merge(halls, left_on = 'hall_id', right_on = 'hall_id', how = 'inner')
.merge(studios, left_on ='studio_id', right_on = 'studio_id', how = 'inner')
)
data = data[data['is_working_hour'] == 1]
data['date'] = pd.to_datetime(data['date'])
data
Next, compute each studio's income for its first half-month of operation
first_month = (data[data['date'] <= [x + datetime.timedelta(days = 15) for x in data['established_date']]]
.loc[:, ['studio_id', 'price', 'duration']]
)
first_month['income'] = first_month['price'] * first_month['duration']
first_month = first_month.groupby('studio_id').agg(np.sum)
first_month
Then compute the income for the same half-month window one year later
month_after_year = (data[(data['date'] >= [x + datetime.timedelta(days = 365) for x in data['established_date']])
& (data['date'] <= [x + datetime.timedelta(days = 365 + 15) for x in data['established_date']])
]
.loc[:, ['studio_id', 'price', 'duration']]
)
month_after_year['income'] = month_after_year['price'] * month_after_year['duration']
month_after_year = month_after_year.groupby('studio_id').agg(np.sum)
month_after_year
Divide the figure one year on by the same figure at opening
month_diff = (month_after_year.merge(first_month, left_on = 'studio_id', right_on = 'studio_id', how = 'inner')
.merge(halls.groupby('studio_id').count()
, left_on = 'studio_id', right_on = 'studio_id', how = 'inner')
)[['income_x', 'income_y', 'is_hall']]
month_diff['income_diff'] = (month_diff['income_x'] / month_diff['income_y']) ** (1 / month_diff['is_hall'])
month_diff.sort_values('income_diff')
This gives the income growth ratio after one year. Across studios the ratio is spread from 0.75 to 2.1 with no sharp jumps, which suggests that studios join the aggregator at very different times: right after opening, a week later, a month later, a year later, and so on.
To identify new studios, we take the median, 1.18, as the threshold growth ratio. In other words, a studio whose income grew by more than 18% over the year is considered new. There were 22 such studios.
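The thresholding step can be sketched on toy data as follows. The four studios and their incomes are invented for illustration, and the column names `income_first`, `income_year_later`, and `n_halls` are assumptions; only the ratio formula and the use of the median as the cut-off follow the text.

```python
import pandas as pd

# Hypothetical per-studio figures: income in the first half-month,
# income in the same window one year later, and number of halls.
month_diff = pd.DataFrame(
    {"income_first": [10000.0, 40000.0, 90000.0, 20000.0],
     "income_year_later": [40000.0, 44000.0, 81000.0, 80000.0],
     "n_halls": [2, 1, 2, 1]},
    index=pd.Index([1, 2, 3, 4], name="studio_id"),
)

# Growth ratio, with the root taken over the number of halls as in the article's formula
month_diff["income_diff"] = (
    month_diff["income_year_later"] / month_diff["income_first"]
) ** (1 / month_diff["n_halls"])

# Use the median ratio as the cut-off between "new" and "old" studios
threshold = month_diff["income_diff"].median()
month_diff["is_new"] = (month_diff["income_diff"] > threshold).astype(int)
print(month_diff[["income_diff", "is_new"]])
```

In this toy set, studios 1 and 4 end up flagged as new. A flag like this is what the `is_new` column used in the next section presumably holds.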
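Which month is best for opening a photo studio?
We have identified the studios that registered with the aggregator right after opening. For these studios, the opening date in our data can be treated as the actual opening date.
For the calculation, we take the new studios, compute income as the sum of the booking prices of all booked hours, group it by hall (taking the opening month into account), and compute the average first-year income for each opening month.
Average annual income by month of opening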
new = studios['is_new'].reset_index().merge(data, left_on = 'studio_id', right_on = 'studio_id', how = 'inner')
new = new[new['is_new'] == 1]
new = new[new['date'] <= [x + datetime.timedelta(days = 365) for x in new['established_date']]]
new['est_year'] = [x.year for x in new['established_date']]
new['est_month'] = [x.month for x in new['established_date']]
new['income'] = new['price'] * new['is_booked']
mean_income = (new
.groupby(['hall_id', 'est_year', 'est_month']).agg('sum')['income'].reset_index()
.groupby('est_month').agg('mean')['income']
)
# use the months actually present as x positions (range(1, 12) would cover only 11 months)
plt.bar(mean_income.index, mean_income)
plt.show()

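The histogram shows a clear relationship:
- The best months to open a photo studio are at the start of the year (January to April).
- September and October are also good months to open.
- The worst months are May and June.
It will be interesting to compare this data with the seasonality of the market.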
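Determining the seasonality of the business
Seasonality is the change in the number of orders from month to month. Let's analyze it over the year.
For the calculation, we take the studios that opened before 2018 and look at their bookings for 2018-2020. Studio income is defined as the sum of the prices of the booked hours. We then compute the total income of all studios for each month of the selected period.
Seasonality calculation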
# the opening date column is named established_date elsewhere in this article
season = data[(data['established_date'] < '2018-01-01') & (data['date'] > '2018-01-01')].copy()
season['income'] = season['price'] * season['duration']
season['year'] = [x.year for x in season['date']]
season['month'] = [x.month for x in season['date']]
incomes = season.groupby(['year', 'month'])['income'].sum()
Plotting
# drop the last three months of the series
incomes = incomes[: -3]
plt.figure(figsize = (20, 10))
plt.plot([str(x[0]) + '-' + str(x[1]) for x in incomes.index], incomes)
plt.xticks(rotation=60)
plt.grid()
plt.show()

The chart shows pronounced seasonality. Orders are highest from October to April and drop sharply from May to September. This fits the logic of the business: in summer people take photos in the streets and parks, while in winter that is not an option and photo sessions have to be held indoors. So there are few clients in summer and many in winter. Orders peak in December, most likely because of New Year photo shoots and corporate events.
The best month to open depends on the season. It is advisable to open a studio during the season or a month before it starts. Avoid opening between May and August, because the studio would start out in the off-season.
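One simple way to put a number on seasonality is a seasonal index: each month's income divided by the average monthly income. This is a sketch on invented monthly totals, not the article's data, and the 0.8 cut-off for "low season" is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical monthly income totals for one full year (index = month number)
monthly = pd.Series(
    [120, 110, 115, 105, 70, 60, 55, 65, 80, 110, 125, 185],
    index=range(1, 13), name="income", dtype=float,
)

# Seasonal index: a month's income relative to the average month (1.0 = average)
seasonal_index = monthly / monthly.mean()

peak_month = seasonal_index.idxmax()
# Months that fall clearly below the average (assumed cut-off: 0.8)
low_season = seasonal_index[seasonal_index < 0.8].index.tolist()
print(seasonal_index.round(2))
```

Applied to the real `incomes` series, the same ratio would make it easier to compare years with different overall income levels.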
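Calculating hall income
A key metric when opening a studio is the income from a single hall.
To calculate it, we group income by hall and month, exclude 2020 as an abnormal year because of the quarantine, and examine the resulting income sample with the .describe() function.
Income of a single hall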
hall_income = season.groupby(['studio_id','hall_id', 'year', 'month']).agg(sum)['income'].reset_index()
hall_income = hall_income[hall_income['year'] < 2020]
hall_income['income'].describe()
count 648.000000
mean 184299.691358
std 114304.925311
min 0.000000
25% 95575.000000
50% 170350.000000
75% 256575.000000
max 617400.000000
Name: income, dtype: float64
Monthly income per hall, in rubles.
The percentiles show that half of the halls earn between 95,000 and 256,000 rubles a month. The median is 170,000 rubles.
From the mean and the standard deviation, the one-sigma rule says that about two thirds of the halls earn between 70,000 and 300,000 rubles, around a mean of 184,000 rubles.
So an average hall can expect an income of 170,000-180,000 rubles, plus or minus 80,000 rubles.
Such a wide spread is explained by the influence of other factors, which we try to identify below.
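The one-sigma bounds quoted above follow directly from the `describe()` output. A short check, assuming the income distribution is close enough to normal for the rule to apply:

```python
# Figures taken from the describe() output above
mean_income = 184299.691358
std_income = 114304.925311

# One-sigma interval: roughly two thirds of observations fall inside it
# when the distribution is close to normal
low = mean_income - std_income
high = mean_income + std_income
print(f"{low:,.0f} to {high:,.0f} rubles")  # prints: 69,995 to 298,605 rubles
```

Since the minimum is 0 and the maximum is well above the upper bound, the distribution is skewed to the right, so the interval is only a rough guide.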
How many halls should a photo studio open?
To calculate this, we compute the average monthly income of each hall, then the average hall income within each studio, count the halls in each studio, group the data by number of halls, and compute the average income per hall.
Hall income by number of halls in the studio
(hall_income
.groupby(['studio_id', 'hall_id']).agg('mean').reset_index()
.groupby('studio_id').agg(['count', 'mean'])['income']
.groupby('count').agg('mean')
)
mean
count
1 134847.916667
2 146531.944444
3 300231.944444
4 222202.604167
This gives the average monthly income of one hall depending on the number of halls in the studio. Note the pattern: the more halls, the higher the income per hall, with the maximum at three-hall studios.
A likely explanation is that a client using one hall of a studio sees the other halls and may book them next time. In this way each hall of a studio is advertised by the others.
Dependence of hall income on location
The location of a hall can strongly affect its income. In the city center a hall is easier for customers to reach, so income should be higher. Let's check the hypothesis.
For the calculation, we compute the average monthly income of each hall, group by metro station, and sort in ascending order.
Hall income by distance from the center
data['income'] = data['price'] * data['duration']
data['year'] = [x.year for x in data['date']]
data['month'] = [x.month for x in data['date']]
(data
.groupby(['hall_id', 'metro', 'year', 'month']).agg('sum')['income'].reset_index()
.groupby(['hall_id', 'metro']).agg('mean')['income'].reset_index()
.groupby('metro').agg('mean')['income'].sort_values()
)[-59:]
We obtained the following data.
metro
5016.666667
10485.264378
11925.000000
18116.666667
19000.000000
21963.333333
30667.051729
31031.250000
37787.500000
39357.142857
44354.375000
45888.888889
46566.666667
48541.666667
49086.503623
55340.659341
55944.444444
59771.111111
66780.000000
66847.058824
67692.545788
70090.341880
70337.676411
72974.494949
79987.083333
88800.000000
95550.000000
98326.086957
99216.279070
99925.000000
102835.622784
104956.521739
111050.684459
111090.000000
111909.090909
116426.892180
117450.000000
118382.236364
122626.500000
123258.518519
124557.894737
126300.000000
129222.916667
135281.642512
138945.454545
152246.883469
168484.500000
169079.381010
172618.798439
173777.659900
178254.545455
181041.818182
187283.444198
189140.857975
250975.000000
252685.714286
264164.473684
277162.791991
556621.746032
Name: income, dtype: float64
Note that the metro data was left as is. For a more accurate picture, entries such as "Baumanskaya, Elektrozavodskaya", "Elektrozavodskaya metro station", and "Electrozavodskaya" need to be brought to a common format.
The data shows that income per hall is higher in areas with expensive real estate, such as Maryina Roshcha, Novye Cheryomushki, and Krylatskoye.
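Bringing the station names to a common format could look roughly like this. It is a sketch under assumptions: the `ALIASES` table, the list of separators, and the "metro station" suffix are illustrative guesses based on the three examples above, and the real data would need a longer alias list.

```python
import re

# Hypothetical alias table: spelling variants mapped to one canonical name
ALIASES = {"electrozavodskaya": "elektrozavodskaya"}

def normalize_metro(raw):
    """Return a sorted tuple of canonical station names found in a raw field."""
    stations = []
    for part in re.split(r"[,/\\;]", raw):
        name = part.lower().replace("metro station", "").strip(" .-")
        if name:
            stations.append(ALIASES.get(name, name))
    return tuple(sorted(set(stations)))

print(normalize_metro("Baumanskaya, Elektrozavodskaya"))
print(normalize_metro("Elektrozavodskaya metro station"))
print(normalize_metro("Electrozavodskaya"))
```

Returning a tuple keeps halls that list several stations distinguishable. For a per-station average, such rows could be expanded to one row per station before grouping.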
How many halls do competing studios have?
How many halls do studios on the market have? To answer this, we join the hall table to the studio table, group by studio, count the halls, and build a frequency histogram.
Number of halls per studio
hall_num = studios.merge(halls, left_on='studio_id', right_on='studio_id').groupby('studio_id').agg('count')['is_hall']
plt.hist(hall_num, range(np.min(hall_num), np.max(hall_num)+1))
plt.show()
hall_num.describe()

count 105.000000
mean 2.685714
std 2.292606
min 1.000000
25% 1.000000
50% 2.000000
75% 3.000000
max 13.000000
The data shows that most photo studios (75%) have three halls or fewer. Across the market as a whole, studios as a rule have no more than five halls.
Influence of other parameters on studio income
Ceiling height
Photography needs a lot of light, and large windows in rooms with high ceilings let in plenty of natural light. In addition, the higher the ceiling, the better the light reaches the floor and diffuses. Ceiling height may therefore affect a studio's income. Let's check this hypothesis.
We keep the ceiling height data, compute the average monthly income of each hall, then average it by ceiling height and plot the result.
Hall income by ceiling height (in meters)
halls_sq_ceil = (data
.groupby(['hall_id', 'ceiling', 'square', 'year', 'month']).agg('sum')['income'].reset_index()
.groupby(['hall_id', 'ceiling', 'square']).agg('mean')['income'].reset_index()
)
ceiling = halls_sq_ceil.groupby('ceiling').agg('mean')['income']
# drop the two largest ceiling heights from both axes so x and y have equal length
plt.bar(ceiling.index[:-2],
ceiling.iloc[: len(ceiling) - 2]
)
plt.show()

The data shows that, up to 6 meters, hall income depends directly on ceiling height. The optimal height is 5-6 meters.
Hall area
Hypothesis: the larger the hall, the more income it brings in.
Let's test the hypothesis. Reusing the previous calculation, we compute the average income by area and plot it.
Hall income by area
square = halls_sq_ceil.groupby('square').agg('mean')['income']
plt.bar(square.index[:-3],
square.iloc[: len(square) - 3]
)
plt.show()

The chart shows a clear pattern: the larger the area, the more income the hall brings in.
Booking price
Hypothesis: there is an optimal price that clients are willing to pay for almost any hall. Above it, clients pay more only for high quality.
To test the hypothesis, we first look at current price levels. We grouped the combined booking table by hall, price, year, and month and summed the income. We then grouped by hall and booking price and computed the average income, and finally grouped by price and averaged again. The result is the average monthly income for each booking price.
Average monthly income by hall booking price
price = (data
.groupby(['hall_id', 'price', 'year', 'month']).agg('sum')['income'].reset_index()
.groupby(['hall_id', 'price']).agg('mean')['income'].reset_index()
.groupby('price').agg('mean')['income']
)
Number of halls at each hourly rental price
plt.figure(figsize = (20, 10))
plt.hist(price.iloc[: len(price) - 5].index)
plt.show()

The frequency histogram shows that most studios set rental prices between 500 and 2,000 rubles per hour. Prices below 500 rubles are rare. The maximum rental price for a hall is 3,500 rubles.
Average monthly income versus hall rental price
price = price[price > 10000]
plt.figure(figsize = (20, 10))
plt.scatter(price.index, price)
plt.show()

The chart shows a clear direct relationship up to 2,000 rubles: the higher the booking price, the more income the studio earns. Above 2,000 rubles, a hall's income can be either low or high. Apparently, above 2,000 rubles clients are prepared to pay only for high quality of service: a convenient location, good equipment, a large area, or a high-quality interior.
Other areas of market analysis
Equipment analysis
The site ugoloc.ru has information on studio equipment, such as the availability of colored backdrops and the brand of flash units. Equipment may also affect income, so a complete analysis should take this factor into account.
Not all studios list their additional equipment, so the estimate of this factor's influence may be inaccurate.
Analysis of the combined influence of several parameters
Parameters do not affect income in isolation from one another. For example, area and booking price are linked, and together they affect the studio's overall income. It is therefore more reasonable to consider their influence jointly. Which parameters to combine should be decided from the specifics of the client's request.
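One common way to look at several parameters together is a multiple linear regression. The sketch below fits income against area and price with ordinary least squares. The data is synthetic and exactly linear by construction, so it only illustrates the mechanics. Linear regression is my suggestion here, not a method used in the project.

```python
import numpy as np

# Hypothetical hall data: area (m^2), booking price (rubles/hour), monthly income
area = np.array([40.0, 60.0, 80.0, 100.0, 120.0])
price = np.array([800.0, 1000.0, 1500.0, 1200.0, 2000.0])
income = 500.0 * area + 50.0 * price + 10000.0  # synthetic, exactly linear

# Design matrix with an intercept column; solve by ordinary least squares
X = np.column_stack([np.ones_like(area), area, price])
coef, *_ = np.linalg.lstsq(X, income, rcond=None)
intercept, b_area, b_price = coef
print(round(b_area, 2), round(b_price, 2))
```

On real data the coefficients would show how much income changes with area at a fixed price and vice versa, which the one-parameter charts cannot separate.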
Extended data collection
The photo studios on ugoloc.ru account for less than a third of the market by number. From this aggregator alone, it is not possible to estimate the studios' share of income or their market segments. For a more accurate picture, it is worth collecting data from AppEvent, Google Calendar, and in some cases custom booking applications.
Accounting for costs
You may have noticed that costs are often missing from the picture. For example, the larger the hall, the more income it brings in. That is a fine conclusion, but the cost of renting the hall also grows with its area. It would therefore be useful to plot the growth of rental costs on the same chart. A project's profitability lies in the optimal ratio of income to costs for a given parameter.
Renovation costs also depend on area: the larger the area, the higher the cost of renovation.
As the number of halls grows, staff costs per hall fall. One administrator can serve one hall or three halls equally well.
Analysis of distance from the metro
When assessing how a studio's location affects hall income, an important factor left out so far is the distance from the metro. It has to be entered manually, or someone familiar with the Google API could automate this step.
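If coordinates are available for both a studio and its nearest metro entrance, the straight-line distance can be computed without any external service using the haversine formula. This is a sketch: the coordinates below are made up, and getting real coordinates (for example from a geocoding API) is outside its scope.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical coordinates of a studio and the nearest metro entrance
studio = (55.7800, 37.7000)
metro = (55.7820, 37.7050)
print(f"{haversine_km(*studio, *metro):.2f} km")
```

Straight-line distance understates the actual walking distance, so it is best treated as a lower bound.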
Distance from competitors
Studios are often located close to one another. At Elektrozavod alone there are about 40 of them. One hypothesis is that being close to other photo studios increases income: the location (a building or business center) may be familiar and trusted by customers, which benefits every studio there.
Photo studio workload
Separately, the workload of photo studios can be investigated:
- what percentage of a hall's working hours is booked;
- how bookings relate to the day of the week (spoiler: bookings are more frequent on weekends);
- whether there are days with no bookings (days when the administrator may not need to come in);
- which hours are booked most often (weekdays are especially interesting);
- and so on.
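The first few workload questions can be answered with simple group-bys over the hourly booking table. Below is a minimal sketch on invented slots. The columns `date`, `hour`, and `is_booked` are assumed to look like those in the booking data used above.

```python
import pandas as pd

# Hypothetical hourly slots: one row per hall-hour, is_booked is 0 or 1
slots = pd.DataFrame({
    "date": pd.to_datetime(["2019-11-04", "2019-11-04", "2019-11-09",
                            "2019-11-09", "2019-11-10", "2019-11-10"]),
    "hour": [10, 18, 10, 18, 10, 18],
    "is_booked": [0, 1, 1, 1, 0, 1],
})

# Share of booked slots overall and by day of the week
overall = slots["is_booked"].mean()
by_weekday = slots.groupby(slots["date"].dt.day_name())["is_booked"].mean()
# Hour of the day with the most bookings
busiest_hour = slots.groupby("hour")["is_booked"].sum().idxmax()
print(by_weekday)
```

On the real table the rows would first be restricted to working hours, so that closed hours do not dilute the occupancy share.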
Photo studios in the off-season
In summer, when there are few orders, studios often close. At the same time, orders at some studios barely drop. What sets apart the studios that stay popular in the off-season? This is another area worth investigating.
Competitor profitability analysis
With information on the cost of renting studio premises and on average staff salaries, the financial condition of competitors can be assessed. Some studios may turn out to be on the verge of closing, so their mistakes can be identified and avoided.
Likewise, the experience of the most profitable studios can be studied and applied in your own studio.
Initial stage
The analysis above is a first step that gives a rough overall picture of the market. For further analysis, the client needs to decide on the kind of studio to open, the price segment, possible locations, rental price, equipment, and so on.
Ideally, several rental options are identified, and for each of them the area, ceiling height, approximate number of halls, cost, and nearest competitors are determined.
The analysis can then be carried out in more depth and with greater accuracy.
Results
This series of articles described how to collect data from open sources, store it in a database, and analyze it. The result is a general understanding of the photo studio services market.
The calculations above can be applied:
- when drawing up the revenue part of a business plan, which will then rest on statistically supported data;
- when assessing the feasibility and profitability of a project by comparing income and costs for different opening options;
- when running a photo studio. Many studios stand idle without orders or operate at a loss, so something is going wrong. The analysis above can help a studio find the cause.
I enjoyed this project.
I decided to share my experience in case it is useful to you.
How useful was the information in these three articles?
Please share your opinion.
The finished project is available on my GitHub page.