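We have prepared a two-part translation of Ryan Sears' article on processing Google's Certificate Transparency logs. The first part gives an overview of the log structure and provides sample Python code for parsing records from these logs. The second part retrieves all certificates from the available logs and configures Google BigQuery to store the data and organize searches over it.
Three years have passed since the original was written, and the number of available logs, and with it the number of entries, has grown many times over. If the goal is to collect as much data as possible, approaching log processing correctly matters even more.
Part 1. Parsing Certificate Transparency Logs Like a Boss
During the development of our first project, phisfinder, I spent a lot of time thinking about the anatomy of phishing attacks and about data sources that would let us spot traces of upcoming phishing campaigns before they do real damage.
One of the sources we integrated (and undoubtedly one of the best) is the Certificate Transparency Log (CTL), a project started by Ben Laurie and Adam Langley at Google. Essentially, a CTL is a log containing an immutable list of certificates issued by CAs, stored in a Merkle tree so that each certificate can be cryptographically verified if needed.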
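To make the Merkle tree idea concrete, here is a minimal sketch of the tree hash as RFC 6962 defines it: leaves are hashed with a 0x00 prefix, interior nodes with a 0x01 prefix, and the list is split at the largest power of two below its length. The function names are illustrative, and real logs hash serialized MerkleTreeLeaf structures rather than the arbitrary byte strings used here.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # Leaves get a 0x00 prefix so a leaf can never be mistaken for an interior node.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes get a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_tree_hash(entries: list) -> bytes:
    """Root hash over an ordered list of byte strings."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_tree_hash(entries[:k]), merkle_tree_hash(entries[k:]))
```

Because every entry feeds into the root, changing or reordering any logged certificate changes the root hash, which is what makes the list effectively immutable.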
To get a sense of the scale, here is how many certificates each CTL holds:
import requests
import locale

# Thousands separators in the output depend on this locale being installed.
locale.setlocale(locale.LC_ALL, 'en_US')

ctl_log = requests.get('https://www.gstatic.com/ct/log_list/log_list.json').json()

total_certs = 0

# locale.format() was removed in Python 3.12; format_string() replaces it.
human_format = lambda x: locale.format_string('%d', x, grouping=True)

for log in ctl_log['logs']:
    log_url = log['url']
    try:
        log_info = requests.get('https://{}/ct/v1/get-sth'.format(log_url), timeout=3).json()
        total_certs += int(log_info['tree_size'])
    except Exception:
        # Skip logs that are unreachable or return malformed data.
        continue

    print("{} has {} certificates".format(log_url, human_format(log_info['tree_size'])))

print("Total certs -> {}".format(human_format(total_certs)))
This prints:
ct.googleapis.com/pilot has 92,224,404 certificates
ct.googleapis.com/aviator has 46,466,472 certificates
ct1.digicert-ct.com/log has 1,577,183 certificates
ct.googleapis.com/rocketeer has 89,391,361 certificates
ct.ws.symantec.com has 3,562,198 certificates
ctlog.api.venafi.com has 94,797 certificates
vega.ws.symantec.com has 200,401 certificates
ctserver.cnnic.cn has 5,081 certificates
ctlog.wosign.com has 1,387,492 certificates
ct.startssl.com has 293,374 certificates
ct.googleapis.com/skydiver has 1,249,079 certificates
ct.googleapis.com/icarus has 48,585,765 certificates
Total certs -> 285,037,607
That is 285,037,607 certificates in total.
, API , PreCerts ( ) . , , , 6 , Chrome. , , .
For comparison, the log list currently used by Google Chrome contains 46 logs holding 6,861,473,804 entries.
CTL
CTLs are served over HTTP. A request for a single entry looks like this:
// curl -s 'https://ct1.digicert-ct.com/log/ct/v1/get-entries?start=0&end=0' | jq .
{
  "entries": [
    {
      "leaf_input": "AAAAAAFIyfaldAAAAAcDMIIG/zCCBeegAwIBAgI...",
      "extra_data": "AAiJAAS6MIIEtjCCA56gAwIBAgIQDHmpRLCMEZU..."
    }
  ]
}
The `leaf_input` and `extra_data` fields are base64-encoded. According to RFC6962, `leaf_input` is a MerkleTreeLeaf structure and `extra_data` is a PrecertChainEntry.
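As a rough illustration of what sits inside `leaf_input`, the sketch below decodes the fixed-size header using only the standard library. The field layout follows the MerkleTreeLeaf definition in RFC 6962 (one byte version, one byte leaf type, 64-bit timestamp in milliseconds, 16-bit entry type). The function names are made up for this example, and extensions and PreCert bodies are not handled.

```python
import base64
import struct

# version, leaf type, timestamp (ms since epoch), entry type; big-endian, no padding
HEADER = struct.Struct(">BBQH")


def parse_leaf_header(leaf_input_b64: str) -> dict:
    """Decode the 12-byte MerkleTreeLeaf header and keep the rest as raw bytes."""
    raw = base64.b64decode(leaf_input_b64)
    version, leaf_type, timestamp_ms, entry_type = HEADER.unpack_from(raw, 0)
    return {
        "version": version,
        "leaf_type": leaf_type,
        "timestamp_ms": timestamp_ms,
        "entry_type": "x509_entry" if entry_type == 0 else "precert_entry",
        "entry": raw[HEADER.size:],
    }


def read_x509_cert(entry: bytes) -> bytes:
    """For x509 entries the certificate is prefixed with a 24-bit big-endian length."""
    length = int.from_bytes(entry[:3], "big")
    return entry[3:3 + length]
```

The first 20 characters of the sample `leaf_input` above (`AAAAAAFIyfaldAAAAAcD`) are already enough to decode the header, even though the certificate body itself is truncated in the listing.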
PreCerts
, , PreCert ( , RFC, , , . PreCerts :
PreCerts , CA , “” . , , x509 v3, `poison` . , , , PreCert, , .
, , , x509/ASN.1 , PreCert. , , , PreCerts CTL , CA, .
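Binary formats like this are familiar territory for anyone who has played CTFs. Python's `struct` module can parse them, but the Construct library makes the job much easier. The structures needed here are: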
# ctl_parser_structures.py
from construct import Struct, Byte, Int16ub, Int64ub, Enum, Bytes, Int24ub, this, GreedyBytes, GreedyRange, Terminated

MerkleTreeHeader = Struct(
    "Version" / Byte,
    "MerkleLeafType" / Byte,
    "Timestamp" / Int64ub,
    "LogEntryType" / Enum(Int16ub, X509LogEntryType=0, PrecertLogEntryType=1),
    "Entry" / GreedyBytes
)

# A certificate is a 24-bit length followed by that many bytes of DER data.
Certificate = Struct(
    "Length" / Int24ub,
    "CertData" / Bytes(this.Length)
)

CertificateChain = Struct(
    "ChainLength" / Int24ub,
    "Chain" / GreedyRange(Certificate),
)

# Construct 2.10 removed Embedded, so the chain fields are listed inline here.
PreCertEntry = Struct(
    "LeafCert" / Certificate,
    "ChainLength" / Int24ub,
    "Chain" / GreedyRange(Certificate),
    Terminated
)
import json
import base64

import ctl_parser_structures

from OpenSSL import crypto

# The base64 values below are truncated, as in the API example above.
entry = json.loads("""
{
  "entries": [
    {
      "leaf_input": "AAAAAAFIyfaldAAAAAcDMIIG/zCCBeegAwIBAgIQ...",
      "extra_data": "AAiJAAS6MIIEtjCCA56gAwIBAgIQDHmpRLCMEZUg..."
    }
  ]
}
""")['entries'][0]

leaf_cert = ctl_parser_structures.MerkleTreeHeader.parse(base64.b64decode(entry['leaf_input']))

print("Leaf Timestamp: {}".format(leaf_cert.Timestamp))
print("Entry Type: {}".format(leaf_cert.LogEntryType))

if leaf_cert.LogEntryType == "X509LogEntryType":
    # The leaf is a regular X509 certificate
    cert_data_string = ctl_parser_structures.Certificate.parse(leaf_cert.Entry).CertData
    chain = [crypto.load_certificate(crypto.FILETYPE_ASN1, cert_data_string)]

    # The rest of the chain is in `extra_data`
    extra_data = ctl_parser_structures.CertificateChain.parse(base64.b64decode(entry['extra_data']))
    for cert in extra_data.Chain:
        chain.append(crypto.load_certificate(crypto.FILETYPE_ASN1, cert.CertData))
else:
    # The leaf is a PreCert; the certificate and its chain are both in `extra_data`
    extra_data = ctl_parser_structures.PreCertEntry.parse(base64.b64decode(entry['extra_data']))
    chain = [crypto.load_certificate(crypto.FILETYPE_ASN1, extra_data.LeafCert.CertData)]

    for cert in extra_data.Chain:
        chain.append(
            crypto.load_certificate(crypto.FILETYPE_ASN1, cert.CertData)
        )
X509 leaf_input
, Construct Python.
, , CTL , - .
Part 2. Retrieving, Storing and Querying 250M+ Certificates Like a Boss
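According to the RFC, certificates are retrieved through the `get-entries` endpoint. Most logs return at most 64 entries per request (selected with the `start` and `end` parameters), while Google's CTLs allow up to 1024 per request.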
Google (Argon, Xenon, Aviator, Icarus, Pilot, Rocketeer, Skydiver) 32 , , , .
1024 , CTL, Google, 256 .
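Since each log caps how many entries one `get-entries` call returns, a full download has to be split into ranges. A minimal sketch of that paging logic follows. The helper names are illustrative; the URL pattern is the one used in the examples above, and `end` is treated as inclusive, matching the `start=0&end=0` request that returns a single entry.

```python
def entry_ranges(tree_size: int, batch_size: int, start: int = 0):
    """Yield inclusive (start, end) index pairs covering start..tree_size-1."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    while start < tree_size:
        end = min(start + batch_size, tree_size) - 1
        yield start, end
        start = end + 1


def get_entries_url(log_url: str, start: int, end: int) -> str:
    # Same URL shape as the curl example shown earlier.
    return "https://{}/ct/v1/get-entries?start={}&end={}".format(log_url, start, end)


if __name__ == "__main__":
    for s, e in entry_ranges(tree_size=150, batch_size=64):
        print(get_entries_url("ct.googleapis.com/pilot", s, e))
```

A log may still return fewer entries than requested, so a robust client would count what actually came back and continue from there rather than trusting the requested range.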
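The job is both IO-bound (fetching over http) and CPU-bound (parsing certificates), so it benefits from both concurrency and multiprocessing.
To download and parse the CTLs we wrote Axeman, which uses asyncio and aioprocessing and stores the results in CSV files.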
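The general pattern of mixing IO-bound fetching with CPU-bound parsing can be sketched with the standard library alone. This is not Axeman's actual code: `fetch_batch` and `parse_batch` are hypothetical stand-ins (a sleep instead of an HTTP request, a base64 decode instead of certificate parsing), and `run_in_executor` stands in for aioprocessing.

```python
import asyncio
import base64
from concurrent.futures import Executor
from typing import Optional


def parse_batch(entries: list) -> list:
    # CPU-bound stand-in: real code would parse certificates here.
    return [len(base64.b64decode(e["leaf_input"])) for e in entries]


async def fetch_batch(start: int, end: int) -> list:
    # IO-bound stand-in for an HTTP get-entries request.
    await asyncio.sleep(0.01)
    return [
        {"leaf_input": base64.b64encode(bytes(i % 7 + 1)).decode()}
        for i in range(start, end + 1)
    ]


async def crawl(ranges, executor: Optional[Executor] = None, max_in_flight: int = 4) -> list:
    loop = asyncio.get_running_loop()
    gate = asyncio.Semaphore(max_in_flight)  # limit concurrent downloads

    async def worker(start: int, end: int) -> list:
        async with gate:
            entries = await fetch_batch(start, end)
        # Parsing is handed off so it does not block the event loop.
        return await loop.run_in_executor(executor, parse_batch, entries)

    results = await asyncio.gather(*(worker(s, e) for s, e in ranges))
    return [size for batch in results for size in batch]


if __name__ == "__main__":
    print(asyncio.run(crawl([(0, 2), (3, 4)])))
```

With `executor=None` the parsing runs in the default thread pool. Passing a `concurrent.futures.ProcessPoolExecutor` instead spreads it across CPU cores, which is the point of combining the two approaches.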
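For this I used a Google Cloud VM with 16 cores, 32 GB of memory and a 750 GB SSD (thanks to Google's $300 in free credits!), ran Axeman on it, and stored the results in `/tmp/certificates/$CTL_DOMAIN/`.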
?
Postgres, , , Postgres 250 ( , 20 !), , :
, , (AWS RDS, Heroku Postgres, Google Cloud SQL) . , , .
, , map/reduce , , Spark Hadoop Pig. “big data” ( ), Google BigQuery, .
BigQuery
BigQuery , Google gsutil. :
First set up `gsutil` with `gsutil config`, then upload the certificates to Google storage (from which BigQuery can load them):
gsutil -o GSUtil:parallel_composite_upload_threshold=150M \
    -m cp \
    /tmp/certificates/* \
    gs://all-certificates
:
BigQuery:
. , BigQuery “, ”, CTL , . ( ):
For the schema, switch to “Edit as Text” and enter the following:
[
    {
        "name": "url",
        "type": "STRING",
        "mode": "REQUIRED"
    },
    {
        "mode": "REQUIRED",
        "name": "cert_index",
        "type": "INTEGER"
    },
    {
        "mode": "REQUIRED",
        "name": "chain_hash",
        "type": "STRING"
    },
    {
        "mode": "REQUIRED",
        "name": "cert_der",
        "type": "STRING"
    },
    {
        "mode": "REQUIRED",
        "name": "all_dns_names",
        "type": "STRING"
    },
    {
        "mode": "REQUIRED",
        "name": "not_before",
        "type": "FLOAT"
    },
    {
        "mode": "REQUIRED",
        "name": "not_after",
        "type": "FLOAT"
    }
]
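To show how a parsed certificate could map onto this schema, here is a hedged sketch that builds one CSV row per certificate with the columns in the same order. Several details are assumptions rather than facts from the article: `chain_hash` is computed as SHA-256 over the concatenated chain, `cert_der` is base64-encoded, `all_dns_names` is space-separated, and the validity dates are Unix timestamps.

```python
import base64
import csv
import hashlib
import io

# Column order matches the BigQuery schema above.
FIELDS = ["url", "cert_index", "chain_hash", "cert_der",
          "all_dns_names", "not_before", "not_after"]


def make_row(url, cert_index, leaf_der, chain_ders, dns_names, not_before, not_after):
    # Assumption: the chain hash is SHA-256 over the concatenated DER certificates.
    chain_hash = hashlib.sha256(b"".join(chain_ders)).hexdigest()
    return {
        "url": url,
        "cert_index": int(cert_index),
        "chain_hash": chain_hash,
        "cert_der": base64.b64encode(leaf_der).decode("ascii"),  # assumption: base64
        "all_dns_names": " ".join(dns_names),                    # assumption: space-separated
        "not_before": float(not_before),
        "not_after": float(not_after),
    }


def rows_to_csv(rows) -> str:
    """Serialize rows without a header line, one certificate per line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()
```

The header row is left out on purpose in this sketch, so that every line in the file is data and the column names come from the schema alone.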
. , ( , , ). :
.
For example, to find punycode domains:
SELECT
    all_dns_names
FROM
    [ctl-lists:certificate_data.scan_data]
WHERE
    (REGEXP_MATCH(all_dns_names,r'\b?xn\-\-'))
    AND NOT all_dns_names CONTAINS 'cloudflare'
15 punycode CTL!
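The same kind of filter can be tried locally on a handful of rows. The sketch below picks out names that contain a punycode label and decodes them with Python's built-in `idna` codec. It assumes that `all_dns_names` holds whitespace-separated names, and the helper names are illustrative.

```python
import re

# A punycode label starts with "xn--" at the beginning of a name or after a dot.
PUNYCODE_LABEL = re.compile(r"(?:^|\.)xn--", re.IGNORECASE)


def punycode_names(all_dns_names: str) -> list:
    """Return the names in a whitespace-separated list that contain a punycode label."""
    return [name for name in all_dns_names.split() if PUNYCODE_LABEL.search(name)]


def to_unicode(name: str) -> str:
    """Decode punycode labels for display; fall back to the raw name on failure."""
    try:
        return name.encode("ascii").decode("idna")
    except UnicodeError:
        return name


if __name__ == "__main__":
    row = "example.com xn--mnchen-3ya.de www.xn--mnchen-3ya.de"
    for name in punycode_names(row):
        print(name, "->", to_unicode(name))
```

Seeing the decoded form next to the raw name makes lookalike domains much easier to spot than scanning `xn--` strings by eye.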
Another example: certificates for Coinbase domains recorded in Certificate Transparency:
SELECT
    all_dns_names
FROM
    [ctl-lists:certificate_data.scan_data]
WHERE
    (REGEXP_MATCH(all_dns_names,r'.*\.coinbase.com[\s$]?'))
:
- , - .
, . `flowers-to-the-world.com` . , :
SELECT
    url,
    COUNT(*) AS total_certs
FROM
    [ctl-lists:certificate_data.scan_data]
WHERE
    (REGEXP_MATCH(all_dns_names,r'.*flowers-to-the-world.*'))
GROUP BY
    url
ORDER BY
    total_certs DESC
Whois , Google, , - . Google, - , Certificate Transparency, .
, . Certificate Transparency.
`flowers-to-the-world.com` Google. , CTL RFC6962. , .
, , , , , .
The certificates for `flowers-to-the-world.com` all chain up to the same root: “C=GB, ST=London, O=Google UK Ltd., OU=Certificate Transparency, CN=Merge Delay Monitor Root”
, .
— NetLas.io. , , , .
, , . , . , — , . Netlas.io " ". — .