How We Orchestrate Data Processing Workflows with Apache Airflow

! , Lamoda. Airflow , Hadoop , ML , , , , /- .



:



  • Airflow,
  • Airflow: , DAG, Operator
  • Airflow
  • « »
  • Airflow


Airflow



Airflow is an open source platform written in Python; it was created in 2014 at Airbnb. In 2016 Airflow entered the Apache Software Foundation incubator, and in 2019 it became a top-level Apache project.



ETL-, ETL , , , Pentaho, Informatica PowerCenter, Talend . Airflow – , “cron ”: , , , . Hive Spark .



Airflow, worker ( ), . , , .



Airflow - Hadoop . Python-, Bash , Docker Kubernetes, .



Airflow



Airflow, Lamoda . - scheduler, . , ML Vowpal Wabbit. .



Airflow ( ) , - . , .



Airflow



Webserver



Webserver – -, , . :



- . . , : , , , .



, Graph View. .



Graph View Tree View. , . , – .



– , – . – . , , , , .



Scheduler – , , . Python-, , , . , Scheduler – Airflow.



  • , Scheduler’a. , High Availability ( Scheduler HA Airflow 2.0).
  • : , - . , - , .


- Airflow, . , Airflow – real-time . ( ), . , 5 – , 10 . , 10 , .



Worker



Worker – , . Airflow :



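  • The default executor is SequentialExecutor. It runs tasks one at a time, so it is suitable for testing rather than production.
  • LocalExecutor runs tasks in parallel as separate processes, but only on the machine where LocalExecutor itself runs. Note: with SQLite as the metadata database, LocalExecutor is not an option and you are limited to SequentialExecutor.
  • CeleryExecutor distributes tasks across several worker machines. Celery is a distributed task queue that needs a message broker such as RabbitMQ or Redis.
  • DaskExecutor runs tasks on a Dask cluster; Dask is a Python library for parallel and distributed computing.
  • KubernetesExecutor runs each task in its own pod in Kubernetes.
  • DebugExecutor is meant for running and debugging from an IDE.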


Apache Airflow



, DAG



Airflow – DAG, , . , , .
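
To make the DAG idea more concrete, here is a minimal sketch in plain Python. It does not use Airflow's API: the task names and the `execution_order` and `is_valid_dag` helpers are hypothetical and chosen only for illustration. It only shows what "directed acyclic graph" means for a workflow. Each task lists the tasks it depends on, any valid run order must respect those dependencies, and a cycle makes the graph invalid.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each key is a task, each value is the set of
# tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "train_model": {"transform"},
    "build_report": {"transform"},
    "notify": {"train_model", "build_report"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_valid_dag(deps):
    """Return True if the dependency graph contains no cycles."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return False
    return True


order = execution_order(dependencies)
print(order)
```

In this sketch `train_model` and `build_report` do not depend on each other, so either may come first in the printed order. That independence is what lets an orchestrator run such tasks in parallel when the executor allows it.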



, . : , , SLA. , .





