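Once, on the eve of the client conference that the DAN group holds every year, we were thinking about what we could come up with so that our partners and clients would leave with pleasant impressions and memories of the event. We decided to build an archive of thousands of photos from this conference and several previous ones (there had been 18 in total by then): a person sends us their photo, and a couple of seconds later we send back a selection of photos of them taken over several years from our archives.
We did not reinvent the wheel: we took the well-known dlib library and obtained an embedding (vector representation) for each face.
For convenience we added a Telegram bot, and everything worked fine. From the point of view of the face recognition algorithms everything went well, but the conference ended, and we did not want to part with a proven, tested technology. We wanted to move from thousands of faces to hundreds of millions, but we had no specific business task. Some time later our colleagues got a task that required working with exactly such large volumes of data.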
The task was to build a system for monitoring bot networks inside Instagram. Here our thinking produced a simple approach and a hard one:
The simple way: look at all accounts that have far more subscriptions than subscribers, no avatar, no full name filled in, and so on. The result is an incomprehensible crowd of half-dead accounts.
The hard way: modern bots have become much smarter. They now post content, "sleep", and even write posts, so the question is how to catch them. First, watch their friends closely, because those are often bots too, and track duplicate photos. Second, bots rarely know how to generate their own photos (although it is possible), which means that duplicate photos of the same people in different Instagram accounts are a good trigger for finding a bot network.
What next?
The simple way is quite predictable and quickly gives some results. The hard way is hard precisely because implementing it means vectorizing and indexing incredibly large volumes of photos for later similarity comparison. How do we do this in practice? Technical questions arise immediately:
- Search speed and accuracy
- Disk space occupied by the data
- Amount of RAM used
If there were only a few photos, say no more than ten thousand, we could get by with simple solutions based on vector clustering. To work with huge volumes of vectors and find the nearest neighbors of a given vector, however, complex and optimized algorithms are needed.
There are well-known, proven technologies such as Annoy, FAISS, and HNSW. The fast HNSW nearest-neighbor search algorithm, available in the nmslib and hnswlib libraries, shows state-of-the-art results on CPU according to benchmarks. We ruled it out right away, though, because we were not happy with the amount of memory it uses on really large volumes of data. We then chose between Annoy and FAISS and eventually settled on FAISS for its convenience, lower memory usage, the option of running on GPU, and its performance benchmarks. Incidentally, FAISS also implements the HNSW algorithm as an option.
What is FAISS?
Facebook AI Research Similarity Search is a library developed by the Facebook AI Research team for fast nearest-neighbor search and clustering in vector space. Its high search speed makes it possible to work with very large data, up to several billion vectors.
The main advantage of FAISS is its state-of-the-art results on GPU, while its CPU implementation is only slightly behind hnsw (nmslib). We wanted to be able to search on both CPU and GPU. In addition, FAISS is optimized for memory usage and for searching with large batches.
FAISS lets you quickly find the k nearest vectors for a given vector x. But how does this search work under the hood?
Indexes
The central concept in FAISS is the index, which is essentially just a collection of parameters and vectors. The sets of parameters differ completely from index to index and depend on the user's needs. Vectors can be stored unchanged or can be restructured. Some indexes are ready for use as soon as vectors are added, while others require prior training. Vector names are stored in the index either as a numbering from 0 to n or as a number that fits into the Int64 type.
The first index, and the simplest one, which we already used at the conference, is Flat. It only stores all the vectors, and search for a given vector is done by exhaustive comparison, so it does not need to be trained (more on training below). On a small amount of data such a simple index can fully cover the search needs.
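To make "exhaustive search" concrete, here is a minimal NumPy sketch of what a flat index conceptually does. It is an illustration only, not FAISS code: the function name `flat_search` and the toy sizes are assumptions made for this example.

```python
import numpy as np

def flat_search(base, queries, topn):
    """Exhaustive k-NN: compare every query with every stored vector."""
    # Squared L2 distance via ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2
    q_norms = (queries ** 2).sum(axis=1)[:, None]
    b_norms = (base ** 2).sum(axis=1)[None, :]
    dists = q_norms - 2.0 * queries @ base.T + b_norms
    # Sort each row and keep the topn closest stored vectors
    idx = np.argsort(dists, axis=1)[:, :topn]
    return np.take_along_axis(dists, idx, axis=1), idx

rng = np.random.default_rng(0)
base = rng.random((1000, 32), dtype=np.float32)  # toy "index" contents
D, I = flat_search(base, base[:3], topn=5)
print(I[:, 0])  # each query vector finds itself first
```

The cost grows linearly with the number of stored vectors, which is why this approach only works for small collections.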
Example:
import numpy as np
dim = 512  # consider arbitrary vectors of dimension 512
nb = 10000  # number of vectors in the index
nq = 5  # number of query vectors
np.random.seed(228)
vectors = np.random.random((nb, dim)).astype('float32')
query = np.random.random((nq, dim)).astype('float32')
Create a Flat index and add the vectors without any training:
import faiss
index = faiss.IndexFlatL2(dim)
print(index.ntotal)  # 0: the index is empty so far
index.add(vectors)
print(index.ntotal)  # 10000: all vectors have been added
Now let's find the 7 nearest neighbors for the first five vectors from vectors:
topn = 7
D, I = index.search(vectors[:5], topn)  # returns: Distances, Indices
print(I)
print(D)
Output
[[0 5662 6778 7738 6931 7809 7184]
[1 5831 8039 2150 5426 4569 6325]
[2 7348 2476 2048 5091 6322 3617]
[3 791 3173 6323 8374 7273 5842]
[4 6236 7548 746 6144 3906 5455]]
[[ 0. 71.53578 72.18823 72.74326 73.2243 73.333244 73.73317 ]
[ 0. 67.604805 68.494774 68.84221 71.839905 72.084335 72.10817 ]
[ 0. 66.717865 67.72709 69.63666 70.35903 70.933304 71.03237 ]
[ 0. 68.26415 68.320595 68.82381 68.86328 69.12087 69.55179 ]
[ 0. 72.03398 72.32417 73.00308 73.13054 73.76181 73.81281 ]]
The nearest neighbor at distance 0 is the vector itself; the rest are ranked by increasing distance. Now let's search with our vectors from query:
D, I = index.search(query, topn)
print(I)
print(D)
Output
[[2467 2479 7260 6199 8640 2676 1767]
[2623 8313 1500 7840 5031 52 6455]
[1756 2405 1251 4136 812 6536 307]
[3409 2930 539 8354 9573 6901 5692]
[8032 4271 7761 6305 8929 4137 6480]]
[[73.14189 73.654526 73.89804 74.05615 74.11058 74.13567 74.443436]
[71.830215 72.33813 72.973885 73.08897 73.27939 73.56996 73.72397 ]
[67.49588 69.95635 70.88528 71.08078 71.715965 71.76285 72.1091 ]
[69.11357 69.30089 70.83269 71.05977 71.3577 71.62457 71.72549 ]
[69.46417 69.66577 70.47629 70.54611 70.57645 70.95326 71.032005]]
The distances in the first column of the results are no longer zero, because the vectors from query are not in the index.
The index can be saved to disk and then loaded back:
faiss.write_index(index, "flat.index")
index = faiss.read_index("flat.index")
It all looks very simple: a few lines of code and we already have a structure for searching over high-dimensional vectors. But such an index with only about ten million 512-dimensional vectors weighs about 20 GB and takes up the same amount of RAM when in use.
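The 20 GB figure follows from simple arithmetic, sketched below. The helper name `flat_index_bytes` is made up for this illustration, and the estimate assumes the flat index stores nothing but raw float32 components (4 bytes each).

```python
def flat_index_bytes(n_vectors, dim, bytes_per_component=4):
    """Rough size of a flat index that stores raw float32 vectors."""
    return n_vectors * dim * bytes_per_component

size = flat_index_bytes(10_000_000, 512)
print(size / 1e9)  # 20.48 (GB)
```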
In the conference project we used exactly this basic approach with a flat index. Everything was fine thanks to the relatively small amount of data, but now we are talking about tens and hundreds of millions of high-dimensional vectors.
Speeding up search with inverted lists
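The main and coolest feature of FAISS is the IVF index, or Inverted File index. The idea of inverted files is concise and can be explained in very simple terms:
Imagine a gigantic army made up of the most varied warriors, numbering, say, 1,000,000. Commanding the whole army at once is impossible. As is customary in military affairs, we need to split the army into units. Let's split it and pick a representative of each unit for the role of commander. We try to put warriors who are as similar as possible in character, origin, physical data, and so on into the same unit, and we choose the commander so that he represents his unit as accurately as possible. As a result, our task is reduced from commanding a million soldiers to commanding 1000 units through their commanders.
This is the idea behind the IVF index: group a large set of vectors into parts with the k-means algorithm and assign each part a centroid, a vector that is the chosen center of that cluster. The search first finds the minimum distance to a centroid and only then looks for the minimum distance among the vectors of the cluster that corresponds to this centroid. Taking k equal to sqrt(n), where n is the number of vectors in the index, gives an optimal search on two levels: first among sqrt(n) centroids, then among sqrt(n) vectors in each cluster. Compared to exhaustive search, this speeds the search up many times over, which solves one of our problems with many millions of vectors.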
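The two-level idea can be sketched in plain NumPy. This is a toy illustration under stated assumptions, not how FAISS is implemented: the helper names (`sq_dists`, `build_ivf`, `ivf_search`), the fixed number of k-means iterations, and the data sizes are all invented for the example.

```python
import numpy as np

def sq_dists(a, b):
    """Squared L2 distances between every row of a and every row of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def build_ivf(x, k, iters=10, seed=0):
    """Cluster x with a few k-means steps; return centroids and inverted lists."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = sq_dists(x, centroids).argmin(axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members):  # keep the old centroid if a cluster went empty
                centroids[c] = members.mean(axis=0)
    assign = sq_dists(x, centroids).argmin(axis=1)
    # One list of vector ids per centroid: the "inverted lists"
    lists = [np.flatnonzero(assign == c) for c in range(k)]
    return centroids, lists

def ivf_search(q, x, centroids, lists, topn, nprobe=1):
    """Level 1: pick the nprobe nearest centroids. Level 2: scan only their lists."""
    probe = np.argsort(((centroids - q) ** 2).sum(axis=1))[:nprobe]
    cand = np.concatenate([lists[c] for c in probe])
    d = ((x[cand] - q) ** 2).sum(axis=1)
    order = np.argsort(d)[:topn]
    return d[order], cand[order]

rng = np.random.default_rng(228)
x = rng.random((2000, 16))
k = int(np.sqrt(len(x)))  # 44 clusters for 2000 vectors
centroids, lists = build_ivf(x, k)
D, I = ivf_search(x[7], x, centroids, lists, topn=5, nprobe=k)
print(I)
```

With `nprobe=k` every list is scanned, so the result matches exhaustive search; smaller values of `nprobe` scan fewer candidates and trade accuracy for speed.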
The vector space is partitioned into k clusters by the k-means method. Each cluster is assigned a centroid.
Sample code:
dim = 512
k = 1000  # number of centroids (the "commanders")
quantiser = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantiser, dim, k)
vectors = np.random.random((1000000, dim)).astype('float32')  # 1,000,000 vectors (the "warriors")
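Or, much more elegantly, the same thing can be written with FAISS's convenient index factory:
index = faiss.index_factory(dim, "IVF1000,Flat")
Run training:
print(index.is_trained)  # False
index.train(vectors)  # train on our set of vectors
# Training is done, but the index holds no vectors yet, so add them:
print(index.is_trained)  # True
print(index.ntotal)  # 0
index.add(vectors)
print(index.ntotal)  # 1000000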
index.add(vectors)
print(index.ntotal) # 1000000
By moving from Flat to this type of index, we solved one of our potential problems, search speed: search becomes several times faster than exhaustive search.
D, I = index.search(query, topn)
print(I)
print(D)
Output
[[19898 533106 641838 681301 602835 439794 331951]
[654803 472683 538572 126357 288292 835974 308846]
[588393 979151 708282 829598 50812 721369 944102]
[796762 121483 432837 679921 691038 169755 701540]
[980500 435793 906182 893115 439104 298988 676091]]
[[69.88127 71.64444 72.4655 72.54283 72.66737 72.71834 72.83057]
[72.17552 72.28832 72.315926 72.43405 72.53974 72.664055 72.69495]
[67.262115 69.46998 70.08826 70.41119 70.57278 70.62283 71.42067]
[71.293045 71.6647 71.686615 71.915405 72.219505 72.28943 72.29849]
[73.27072 73.96091 74.034706 74.062515 74.24464 74.51218 74.609695]]
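There is one "but", however: search accuracy, like speed, depends on the number of clusters visited, which can be set with the nprobe parameter:
print(index.nprobe)  # 1: by default only the single nearest cluster is searched
index.nprobe = 16  # visit the 16 nearest centroids when searching for the top-n neighbors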
D, I = index.search(query, topn)
print(I)
print(D)
Output
[[ 28707 811973 12310 391153 574413 19898 552495]
[540075 339549 884060 117178 878374 605968 201291]
[588393 235712 123724 104489 277182 656948 662450]
[983754 604268 54894 625338 199198 70698 73403]
[862753 523459 766586 379550 324411 654206 871241]]
[[67.365585 67.38003 68.17187 68.4904 68.63618 69.88127 70.3822]
[65.63759 67.67015 68.18429 68.45782 68.68973 68.82755 69.05]
[67.262115 68.735535 68.83473 68.88733 68.95465 69.11365 69.33717]
[67.32007 68.544685 68.60204 68.60275 68.68633 68.933334 69.17106]
[70.573326 70.730286 70.78615 70.85502 71.467674 71.59512 71.909836]]
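As you can see, after increasing nprobe we get completely different results, and the top of the smallest distances in D has improved.
You can set nprobe equal to the number of centroids in the index. This is equivalent to exhaustive search: accuracy is maximal, but search speed drops noticeably.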
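One way to quantify this trade-off is to measure how many of the reference neighbors an approximate search recovers. The sketch below is an assumption-laden illustration: `recall_at_k` is a helper written for this example, and the ids are the first query's rows from the two outputs above, with the nprobe = 16 row used as a stand-in reference (a true reference would come from an exhaustive search).

```python
def recall_at_k(approx_ids, reference_ids):
    """Share of reference neighbours that the approximate search also returned."""
    hits = 0
    total = 0
    for approx_row, ref_row in zip(approx_ids, reference_ids):
        hits += len(set(approx_row) & set(ref_row))
        total += len(ref_row)
    return hits / total

# First query row with nprobe = 1 and with nprobe = 16 (copied from the outputs above)
nprobe_1 = [[19898, 533106, 641838, 681301, 602835, 439794, 331951]]
nprobe_16 = [[28707, 811973, 12310, 391153, 574413, 19898, 552495]]
recall = recall_at_k(nprobe_1, nprobe_16)
print(round(recall, 3))  # 0.143: only id 19898 appears in both rows
```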
Searching on disk: On Disk Inverted Lists
Great, the first problem is solved: we now get acceptable search speed on tens of millions of vectors. But all of this is useless as long as our huge index does not fit into RAM.
For our task in particular, the main advantage of FAISS is the ability to store the inverted lists of an IVF index on disk and load only the metadata into RAM.
Here is how we build such an index: we train an IndexIVF with the required parameters on the largest amount of data that fits in memory, then add vectors to the trained index in parts (not only the vectors it was trained on) and write the index for each part to disk.
index = faiss.index_factory(512, "IVF65536,Flat", faiss.METRIC_L2)
Training the index on a GPU is done as follows:
res = faiss.StandardGpuResources()
index_ivf = faiss.extract_index_ivf(index)
index_flat = faiss.IndexFlatL2(512)
clustering_index = faiss.index_cpu_to_gpu(res, 0, index_flat)  # 0 is the GPU number
index_ivf.clustering_index = clustering_index
faiss.index_cpu_to_gpu(res, 0, index_flat) can be replaced with faiss.index_cpu_to_all_gpus(index_flat) to use all GPUs together.
It is highly desirable for the training sample to be as representative as possible and uniformly distributed, so we build the training dataset in advance from the required number of vectors, picking them at random from the whole dataset.
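train_vectors = ...  # a representative training sample prepared in advance
index.train(train_vectors)
# Save the empty trained index, which contains only the parameters:
faiss.write_index(index, "trained_block.index")
# Create new indexes one at a time from the trained one
# and add the dataset to them block by block:
for bno in range(first_block, last_block + 1):
    block_vectors = vectors_parts[bno]
    block_vectors_ids = vectors_parts_ids[bno]  # vector ids, if needed
    index = faiss.read_index("trained_block.index")
    index.add_with_ids(block_vectors, block_vectors_ids)
    faiss.write_index(index, "block_{}.index".format(bno))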
After that we merge all the inverted lists together. This is possible because each block is essentially the same trained index, just with different vectors inside.
ivfs = []
for bno in range(first_block, last_block + 1):
    index = faiss.read_index("block_{}.index".format(bno), faiss.IO_FLAG_MMAP)
    ivfs.append(index.invlists)
    # Treat the index and its inverted lists as independent objects,
    # so that the data is not lost on the next iteration:
    index.own_invlists = False
# Create the final index:
index = faiss.read_index("trained_block.index")
# Prepare the final invlists;
# all invlists from the blocks are merged into the file merged_index.ivfdata
invlists = faiss.OnDiskInvertedLists(index.nlist, index.code_size, "merged_index.ivfdata")
ivf_vector = faiss.InvertedListsPtrVector()
for ivf in ivfs:
    ivf_vector.push_back(ivf)
ntotal = invlists.merge_from(ivf_vector.data(), ivf_vector.size())
index.ntotal = ntotal  # update the vector count, then swap in the merged lists
index.replace_invlists(invlists)
faiss.write_index(index, data_path + "populated.index")  # save everything to disk
Result: our index now consists of the files populated.index and merged_index.ivfdata.
populated.index stores the original full path to the file with the on-disk lists. If the path to the ivfdata file changes for some reason, pass the faiss.IO_FLAG_ONDISK_SAME_DIR flag when reading the index; it makes FAISS look for the ivfdata file in the same directory as populated.index:
index = faiss.read_index('populated.index', faiss.IO_FLAG_ONDISK_SAME_DIR)
We used the demo example from the FAISS project's GitHub repository as the basis.
A mini guide to choosing an index can be found in the FAISS Wiki. For example, we were able to fit a training dataset of 12 million vectors into RAM, so we chose an IVFFlat index with 262144 centroids, with the aim of later scaling to hundreds of millions of vectors. The guide also suggests the IVF262144_HNSW32 index, in which the cluster a vector belongs to is determined by the HNSW algorithm with 32 nearest neighbors (that is, using the IndexHNSWFlat quantizer), but in our later tests search with such an index turned out to be less accurate. Keep in mind as well that such a quantizer rules out running on GPU.
Product quantization dramatically reduces disk usage
Thanks to the on-disk search method we took the load off RAM, but an index with one million vectors still required about 2 GB of disk space. Of course, that volume is not so large if you set yourself the goal and allocate extra disk space, but it bothered us a little.
This is where vector encoding comes to the rescue, namely scalar quantization (SQ) and product quantization (PQ). SQ encodes each vector component with n bits (usually 8, 6, or 4 bits). The idea of encoding one float32 component with 8 bits looked too gloomy in terms of accuracy loss, so we considered the PQ option. In some cases, though, SQfp16 compression to float16 loses almost no accuracy.
The essence of product quantization is as follows: a vector of dimension 512 is split into n parts, and each part is clustered into one of 256 possible clusters (1 byte), so a vector is represented with n bytes. In the FAISS implementation n usually does not exceed 64. This quantization is applied not to the dataset vectors themselves but to the differences between those vectors and their corresponding centroids obtained when the inverted lists are built. In other words, the inverted lists become encoded sets of offsets between the vectors and their centroids.
index = faiss.index_factory(dim, "IVF262144,PQ64", faiss.METRIC_L2)
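It turns out that we no longer need to store the vectors in full: n bytes per vector and 2048 bytes per centroid vector (512 float32 components of 4 bytes each) are enough. In our case we took n = 64, so each subvector has length 512 / 64 = 8 and is described by one of 256 clusters.
When searching with a vector x, the nearest centroids are first determined by the ordinary flat quantizer, and then x is also split into subvectors. Each subvector is encoded by the number of one of the 256 corresponding centroids, and the distance to a vector is defined as the sum of 64 distances between subvectors.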
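The encoding and distance computation described above can be sketched in NumPy. This is a simplified illustration, not the FAISS implementation: the function names, the tiny k-means, and the toy sizes (4 subvectors and 16 clusters instead of 64 and 256) are assumptions. The sketch also uses the asymmetric variant, where the query's subvectors are compared directly against the codebook entries via a lookup table rather than being quantized first.

```python
import numpy as np

def train_codebooks(residuals, m, ksub=256, iters=5, seed=0):
    """One small k-means per sub-vector position; returns an (m, ksub, dsub) array."""
    rng = np.random.default_rng(seed)
    n, d = residuals.shape
    dsub = d // m
    parts = residuals.reshape(n, m, dsub)
    books = np.empty((m, ksub, dsub), dtype=residuals.dtype)
    for j in range(m):
        sub = parts[:, j, :]
        cent = sub[rng.choice(n, size=ksub, replace=False)].copy()
        for _ in range(iters):
            d2 = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)
            for c in range(ksub):
                members = sub[assign == c]
                if len(members):
                    cent[c] = members.mean(axis=0)
        books[j] = cent
    return books

def pq_encode(residuals, books):
    """Replace each sub-vector with the 1-byte id of its nearest codebook entry."""
    m, ksub, dsub = books.shape
    parts = residuals.reshape(len(residuals), m, dsub)
    codes = np.empty((len(residuals), m), dtype=np.uint8)
    for j in range(m):
        d2 = ((parts[:, j, None, :] - books[j][None, :, :]) ** 2).sum(axis=2)
        codes[:, j] = d2.argmin(axis=1)
    return codes

def pq_distances(query_residual, codes, books):
    """Sum of per-sub-vector squared distances, read from a lookup table."""
    m, ksub, dsub = books.shape
    q = query_residual.reshape(m, dsub)
    table = ((books - q[:, None, :]) ** 2).sum(axis=2)  # shape (m, ksub)
    return table[np.arange(m), codes].sum(axis=1)

rng = np.random.default_rng(0)
# Stand-in for "vector minus its centroid" residuals
residuals = rng.normal(size=(500, 32)).astype(np.float32)
books = train_codebooks(residuals, m=4, ksub=16)
codes = pq_encode(residuals, books)  # 4 bytes per vector instead of 128
dists = pq_distances(residuals[0], codes, books)
print(codes.shape, dists.shape)
```

With the article's parameters (512 dimensions, 64 subvectors, 256 clusters each), the same scheme shrinks a 2048-byte vector to 64 bytes, a 32-fold reduction.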
What is the result?
We stopped our experiments at the "IVF262144,PQ64" index, because it fully met all our needs for search speed and accuracy and also keeps disk usage reasonable as the index scales further. To be specific: at the moment, with 315 million vectors, the index occupies 22 GB of disk space and about 3 GB of RAM when in use.
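As a rough sanity check of these numbers, the sketch below estimates the size of such an index. It rests on assumptions that are not stated in the article: each stored vector costs its 64-byte PQ code plus an 8-byte Int64 id, and the centroids are kept as raw float32 vectors. The function names are invented for the example.

```python
def ivfpq_lists_bytes(n_vectors, code_bytes=64, id_bytes=8):
    """Approximate size of the inverted lists: one PQ code and one id per vector."""
    return n_vectors * (code_bytes + id_bytes)

def centroid_bytes(n_centroids, dim=512, bytes_per_component=4):
    """Approximate size of the coarse centroids stored as float32."""
    return n_centroids * dim * bytes_per_component

lists_size = ivfpq_lists_bytes(315_000_000)
coarse_size = centroid_bytes(262144)
print(lists_size / 1e9)             # 22.68 (GB)
print(round(coarse_size / 1e9, 2))  # 0.54 (GB)
```

The estimate lands close to the reported 22 GB on disk, which suggests the assumptions are at least plausible.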
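Another interesting detail that we did not mention earlier is the metric used by the index. By default, the distance between any two vectors is computed in the Euclidean L2 metric, that is, in plainer terms, as the square root of the sum of squared coordinate-wise differences. (FAISS actually reports the squared distance and skips the final square root, which does not change the ranking.) A different metric can be set, however. In particular, we tested the METRIC_INNER_PRODUCT metric, which acts as a cosine distance metric between vectors. It is "cosine" because the cosine of the angle between two vectors in a Euclidean coordinate system is the ratio of their scalar (coordinate-wise) product to the product of their lengths, and if all vectors in our space have length 1, the cosine of the angle is exactly equal to the scalar product. In that case, the closer two vectors are in space, the closer their scalar product is to one.
The L2 metric is directly related to the scalar product metric mathematically. Still, when we compared the two metrics experimentally, we got the impression that the inner product metric helps to analyze image similarity coefficients more successfully. Besides, the embeddings of our photos were obtained with InsightFace, which implements the ArcFace architecture and uses cosine distance. FAISS indexes support other metrics as well.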
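The relationship between the two metrics for unit-length vectors is ||a - b||^2 = 2 - 2 (a . b), which can be checked numerically. The sketch below uses plain NumPy with random vectors chosen only for illustration; in a FAISS pipeline the normalization step would typically be done with `faiss.normalize_L2` before adding vectors to an inner-product index.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 512))

# Scale both vectors to unit length
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

squared_l2 = float(((a - b) ** 2).sum())
inner = float(a @ b)

# For unit vectors the two quantities determine each other
print(squared_l2, 2.0 - 2.0 * inner)
```

Because of this identity, ranking by smallest L2 distance and ranking by largest inner product give the same order once the vectors are normalized.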
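A few words about GPU
Conclusions and curious examples
So, back to where it all started. It started, as you may recall, with the motivation to solve the problem of finding bots on Instagram, or more specifically, finding duplicate posts with people or avatars within certain sets of users. While writing this material, it became clear that a detailed description of our bot-search methodology deserves a separate article, which we will cover in future publications. For now we will limit ourselves to examples of our experiments with FAISS.
Pictures or faces can be vectorized in different ways; we chose the InsightFace technology (vectorizing images and extracting n-dimensional features from them is a separate long story). In the course of our experiments with the resulting infrastructure, we discovered some rather interesting and amusing properties.
For example, with the permission of colleagues and acquaintances, we uploaded their faces into the search and quickly found photos in which they appear: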
A picnic with a large group of friends; the photo comes from a friend's account.
Just passing by. An unknown photographer captured the guys for his themed profile. They did not know where their photo ended up, and five years later they had completely forgotten being photographed.
In this case the photographer is unknown, and the photo was taken in secret! A suspicious girl with an SLR camera, sitting opposite at that moment, immediately came to mind :)
So, with a few simple steps, FAISS lets you put together a homemade analogue of the well-known FindFace.
Another interesting property: in a FAISS index, the more similar two faces are, the closer the corresponding vectors are in space. I decided to take a closer look at the less accurate search results for my own face and found eerily similar clones :)
Some of the author's clones.
Generally speaking, FAISS opens up a huge field for implementing all kinds of creative ideas. For example, using the same principle of vector proximity between similar faces, you could build paths from one face to another. Or, as a last resort, turn FAISS into a factory for producing memes.
Thank you for your attention. We hope this material will be useful to Habr readers!
The article was written with the support of my colleagues Artyom Korolev (korolevart), Timur Kadyrov, and Arina Reshetnikova.
R&D, Dentsu Aegis Network Russia.