{"id":1987,"date":"2024-06-27T10:02:12","date_gmt":"2024-06-27T10:02:12","guid":{"rendered":"https:\/\/www.w3computing.com\/articles\/?p=1987"},"modified":"2024-06-27T10:02:17","modified_gmt":"2024-06-27T10:02:17","slug":"data-pipelines-with-apache-airflow-and-python","status":"publish","type":"post","link":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/","title":{"rendered":"Data Pipelines with Apache Airflow and Python"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Managing and processing data efficiently has become critical for most software and analytics teams. Data pipelines are essential for the seamless flow of data from source to destination, ensuring that data is collected, processed, and available for analysis and reporting. Apache Airflow, an open-source platform for orchestrating complex workflows, has become a popular choice for building and managing data pipelines. This tutorial guides you through setting up Apache Airflow and using it with Python to create robust data pipelines.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prerequisites<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before we dive into building data pipelines with Apache Airflow and Python, make sure you have the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Basic knowledge of Python programming<\/li>\n\n\n\n<li>Familiarity with data processing concepts and ETL (Extract, Transform, Load)<\/li>\n\n\n\n<li>A working Python environment (Python 3.8 or newer; check which Python versions your Airflow release supports)<\/li>\n\n\n\n<li>Apache Airflow 2.x (installation is covered below)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Overview of Apache Airflow<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Apache Airflow is a platform designed to programmatically author, schedule, and monitor workflows. It allows you to define workflows as Directed Acyclic Graphs (DAGs) using Python code. 
Each node in a DAG represents a task, and edges define dependencies between these tasks. Airflow executes tasks on a defined schedule and handles task dependencies, retries, logging, and more.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Key Features of Apache Airflow<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Dynamic DAG Creation<\/strong>: DAGs are created using Python code, allowing dynamic generation and customization.<\/li>\n\n\n\n<li><strong>Extensible<\/strong>: Airflow supports custom plugins to extend its capabilities.<\/li>\n\n\n\n<li><strong>Scalable<\/strong>: Airflow can scale horizontally with multiple workers.<\/li>\n\n\n\n<li><strong>Integrations<\/strong>: Airflow supports a wide range of integrations with third-party services and tools.<\/li>\n\n\n\n<li><strong>UI and Monitoring<\/strong>: A rich web interface for monitoring and managing workflows.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Setting Up Apache Airflow<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Installation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To install Apache Airflow, you can use pip. 
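It&#8217;s recommended to install Airflow in a virtual environment and to use the official constraints file, which pins tested versions of Airflow&#8217;s dependencies.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\"><span class=\"hljs-comment\"># Create and activate a virtual environment<\/span>\npython -m venv airflow_env\n<span class=\"hljs-built_in\">source<\/span> airflow_env\/bin\/activate\n\n<span class=\"hljs-comment\"># Install Apache Airflow with the matching constraints file<\/span>\n<span class=\"hljs-comment\"># (example versions: replace 2.9.2 and 3.8 with your Airflow and Python versions)<\/span>\npip install <span class=\"hljs-string\">\"apache-airflow==2.9.2\"<\/span> --constraint <span class=\"hljs-string\">\"https:\/\/raw.githubusercontent.com\/apache\/airflow\/constraints-2.9.2\/constraints-3.8.txt\"<\/span>\n\n<span class=\"hljs-comment\"># Initialize the Airflow metadata database<\/span>\n<span class=\"hljs-comment\"># (on Airflow versions before 2.7, run: airflow db init)<\/span>\nairflow db migrate\n\n<span class=\"hljs-comment\"># Create an admin account for the web UI (you will be prompted for a password)<\/span>\nairflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Starting the Web Server and Scheduler<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Airflow consists of several components, including the web server and scheduler. 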
Start these components using the following commands:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\"><span class=\"hljs-comment\"># Start the web server<\/span>\nairflow webserver --port 8080\n\n<span class=\"hljs-comment\"># Open a new terminal and activate the virtual environment, then start the scheduler<\/span>\nairflow scheduler<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Access the Airflow web interface by navigating to <code>http:\/\/localhost:8080<\/code> in your web browser.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Creating Your First DAG<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">DAG Structure<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A DAG in Airflow is defined as a Python script. The script defines the DAG\u2019s structure (tasks and their dependencies) and metadata (such as schedule and tags). 
Let\u2019s create a simple DAG that prints \u201cHello, World!\u201d as an example.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example DAG: Hello World<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Create a Python file named <code>hello_world_dag.py<\/code> in the DAGs folder (usually located at <code>~\/airflow\/dags<\/code>).<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow <span class=\"hljs-keyword\">import<\/span> DAG\n<span class=\"hljs-keyword\">from<\/span> airflow.operators.python <span class=\"hljs-keyword\">import<\/span> PythonOperator\n<span class=\"hljs-keyword\">from<\/span> datetime <span class=\"hljs-keyword\">import<\/span> datetime, timedelta\n\n<span class=\"hljs-comment\"># Default arguments applied to every task in the DAG<\/span>\ndefault_args = {\n    <span class=\"hljs-string\">'owner'<\/span>: <span class=\"hljs-string\">'airflow'<\/span>,\n    <span class=\"hljs-string\">'depends_on_past'<\/span>: <span class=\"hljs-literal\">False<\/span>,\n    <span class=\"hljs-string\">'start_date'<\/span>: datetime(<span class=\"hljs-number\">2023<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">1<\/span>),\n    <span class=\"hljs-string\">'email_on_failure'<\/span>: <span class=\"hljs-literal\">False<\/span>,\n    <span class=\"hljs-string\">'email_on_retry'<\/span>: <span class=\"hljs-literal\">False<\/span>,\n    <span class=\"hljs-string\">'retries'<\/span>: <span class=\"hljs-number\">1<\/span>,\n    <span class=\"hljs-string\">'retry_delay'<\/span>: timedelta(minutes=<span class=\"hljs-number\">5<\/span>),\n}\n\n<span class=\"hljs-comment\"># Define the DAG<\/span>\ndag = DAG(\n    <span class=\"hljs-string\">'hello_world'<\/span>,\n    default_args=default_args,\n    description=<span class=\"hljs-string\">'A simple hello world DAG'<\/span>,\n    schedule_interval=timedelta(days=<span class=\"hljs-number\">1<\/span>),\n    catchup=<span class=\"hljs-literal\">False<\/span>,  <span class=\"hljs-comment\"># do not create a backfill run for every day since start_date<\/span>\n)\n\n<span class=\"hljs-comment\"># Python function to be executed<\/span>\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">print_hello<\/span><span class=\"hljs-params\">()<\/span>:<\/span>\n    print(<span class=\"hljs-string\">\"Hello, World!\"<\/span>)\n\n<span class=\"hljs-comment\"># Define the task using PythonOperator<\/span>\nhello_task = PythonOperator(\n    task_id=<span class=\"hljs-string\">'print_hello'<\/span>,\n    python_callable=print_hello,\n    dag=dag,\n)\n\n<span class=\"hljs-comment\"># Set task dependencies (in this case, there are none)<\/span>\nhello_task<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">This DAG is scheduled to run once a day and executes a Python function that prints &#8220;Hello, World!&#8221;. Because <code>catchup=False<\/code> is set, Airflow schedules only the most recent daily interval instead of backfilling one run for every day since January 1, 2023. The printed text appears in the task\u2019s log in the web interface, not in your terminal.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Deploying the DAG<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Save the <code>hello_world_dag.py<\/code> file in the Airflow DAGs folder. The scheduler scans this folder periodically (by default it looks for new files every five minutes), so the DAG may take a few minutes to appear in the web interface. New DAGs are paused by default: unpause it with the toggle in the UI, or with <code>airflow dags unpause hello_world<\/code>, before Airflow will schedule it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Building a Data Pipeline<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Now that we have a basic understanding of DAGs and tasks, let\u2019s create a more complex data pipeline. 
In this section, we will build a pipeline that:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Extracts data from an API.<\/li>\n\n\n\n<li>Transforms the data.<\/li>\n\n\n\n<li>Loads the data into a database.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Extract Data from an API<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">First, we need to define a task to extract data from an API. For this example, let\u2019s use JSONPlaceholder, a free fake REST API whose <code>\/posts<\/code> endpoint returns a list of sample posts as JSON. When this function runs inside a <code>PythonOperator<\/code>, its return value is stored automatically as an XCom so that downstream tasks can read it.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> requests\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">extract_data<\/span><span class=\"hljs-params\">()<\/span>:<\/span>\n    <span class=\"hljs-comment\"># Fail the task on timeouts or HTTP errors instead of hanging or passing bad data downstream<\/span>\n    response = requests.get(<span class=\"hljs-string\">'https:\/\/jsonplaceholder.typicode.com\/posts'<\/span>, timeout=<span class=\"hljs-number\">30<\/span>)\n    response.raise_for_status()\n    data = response.json()\n    <span class=\"hljs-keyword\">return<\/span> data<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Step 2: Transform Data<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Next, we define a task to transform the extracted data. 
For simplicity, let\u2019s filter out posts that have userId greater than 5.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">transform_data<\/span><span class=\"hljs-params\">(**kwargs)<\/span>:<\/span>\n    data = kwargs&#91;<span class=\"hljs-string\">'ti'<\/span>].xcom_pull(task_ids=<span class=\"hljs-string\">'extract_data'<\/span>)\n    filtered_data = &#91;post <span class=\"hljs-keyword\">for<\/span> post <span class=\"hljs-keyword\">in<\/span> data <span class=\"hljs-keyword\">if<\/span> post&#91;<span class=\"hljs-string\">'userId'<\/span>] &lt;= <span class=\"hljs-number\">5<\/span>]\n    <span class=\"hljs-keyword\">return<\/span> filtered_data<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Step 3: Load Data into a Database<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, we define a task to load the transformed data into a database. 
For this example, we\u2019ll print the data to simulate loading it into a database.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">load_data<\/span><span class=\"hljs-params\">(**kwargs)<\/span>:<\/span>\n    data = kwargs&#91;<span class=\"hljs-string\">'ti'<\/span>].xcom_pull(task_ids=<span class=\"hljs-string\">'transform_data'<\/span>)\n    <span class=\"hljs-keyword\">for<\/span> post <span class=\"hljs-keyword\">in<\/span> data:\n        print(post)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Full Data Pipeline DAG<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now, let\u2019s combine these tasks into a single DAG file named <code>data_pipeline.py<\/code>. In Airflow 2, the task context (including the task instance <code>ti<\/code>) is passed automatically to any callable that accepts <code>**kwargs<\/code>, and the Airflow 1.x <code>provide_context=True<\/code> argument was removed, so it no longer appears in the operators.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow <span class=\"hljs-keyword\">import<\/span> DAG\n<span class=\"hljs-keyword\">from<\/span> airflow.operators.python <span class=\"hljs-keyword\">import<\/span> PythonOperator\n<span class=\"hljs-keyword\">from<\/span> datetime <span class=\"hljs-keyword\">import<\/span> datetime, timedelta\n<span class=\"hljs-keyword\">import<\/span> requests\n\n<span class=\"hljs-comment\"># Default arguments<\/span>\ndefault_args = {\n    <span class=\"hljs-string\">'owner'<\/span>: <span class=\"hljs-string\">'airflow'<\/span>,\n    <span class=\"hljs-string\">'depends_on_past'<\/span>: <span class=\"hljs-literal\">False<\/span>,\n    <span class=\"hljs-string\">'start_date'<\/span>: datetime(<span class=\"hljs-number\">2023<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">1<\/span>),\n    <span class=\"hljs-string\">'email_on_failure'<\/span>: <span class=\"hljs-literal\">False<\/span>,\n    <span class=\"hljs-string\">'email_on_retry'<\/span>: <span class=\"hljs-literal\">False<\/span>,\n    <span class=\"hljs-string\">'retries'<\/span>: <span class=\"hljs-number\">1<\/span>,\n    <span class=\"hljs-string\">'retry_delay'<\/span>: timedelta(minutes=<span class=\"hljs-number\">5<\/span>),\n}\n\n<span class=\"hljs-comment\"># Define the DAG<\/span>\ndag = DAG(\n    <span class=\"hljs-string\">'data_pipeline'<\/span>,\n    default_args=default_args,\n    description=<span class=\"hljs-string\">'A simple data pipeline'<\/span>,\n    schedule_interval=timedelta(days=<span class=\"hljs-number\">1<\/span>),\n    catchup=<span class=\"hljs-literal\">False<\/span>,  <span class=\"hljs-comment\"># do not backfill runs since start_date<\/span>\n)\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">extract_data<\/span><span class=\"hljs-params\">()<\/span>:<\/span>\n    response = requests.get(<span class=\"hljs-string\">'https:\/\/jsonplaceholder.typicode.com\/posts'<\/span>, timeout=<span class=\"hljs-number\">30<\/span>)\n    response.raise_for_status()\n    <span class=\"hljs-keyword\">return<\/span> response.json()  <span class=\"hljs-comment\"># pushed to XCom as 'return_value'<\/span>\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">transform_data<\/span><span class=\"hljs-params\">(**kwargs)<\/span>:<\/span>\n    <span class=\"hljs-comment\"># 'ti' (the task instance) is injected into kwargs by Airflow<\/span>\n    data = kwargs&#91;<span class=\"hljs-string\">'ti'<\/span>].xcom_pull(task_ids=<span class=\"hljs-string\">'extract_data'<\/span>)\n    filtered_data = &#91;post <span class=\"hljs-keyword\">for<\/span> post <span class=\"hljs-keyword\">in<\/span> data <span class=\"hljs-keyword\">if<\/span> post&#91;<span class=\"hljs-string\">'userId'<\/span>] &lt;= <span class=\"hljs-number\">5<\/span>]\n    <span class=\"hljs-keyword\">return<\/span> filtered_data\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">load_data<\/span><span class=\"hljs-params\">(**kwargs)<\/span>:<\/span>\n    data = kwargs&#91;<span class=\"hljs-string\">'ti'<\/span>].xcom_pull(task_ids=<span class=\"hljs-string\">'transform_data'<\/span>)\n    <span class=\"hljs-comment\"># Printing stands in for a real database insert<\/span>\n    <span class=\"hljs-keyword\">for<\/span> post <span class=\"hljs-keyword\">in<\/span> data:\n        print(post)\n\n<span class=\"hljs-comment\"># Define tasks<\/span>\nextract_task = PythonOperator(\n    task_id=<span class=\"hljs-string\">'extract_data'<\/span>,\n    python_callable=extract_data,\n    dag=dag,\n)\n\ntransform_task = PythonOperator(\n    task_id=<span class=\"hljs-string\">'transform_data'<\/span>,\n    python_callable=transform_data,\n    dag=dag,\n)\n\nload_task = PythonOperator(\n    task_id=<span class=\"hljs-string\">'load_data'<\/span>,\n    python_callable=load_data,\n    dag=dag,\n)\n\n<span class=\"hljs-comment\"># Set task dependencies<\/span>\nextract_task &gt;&gt; transform_task &gt;&gt; load_task<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Deploying the Data Pipeline<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Save the <code>data_pipeline.py<\/code> file in the Airflow DAGs folder. 
The new DAG will appear in the Airflow UI. Once you unpause it, you can trigger it manually or wait for the next scheduled run.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Advanced Airflow Concepts<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Task Dependencies<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In the previous examples, we defined simple linear dependencies between tasks. However, Airflow supports more complex dependency structures: tasks can fan out to run in parallel and then fan back in, and branching makes dependencies conditional.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Example of parallel tasks (task1 to task4 are operators defined in the same DAG)<\/span>\n<span class=\"hljs-comment\"># task2 and task3 both start after task1 and can run in parallel; task4 waits for both<\/span>\ntask1 &gt;&gt; &#91;task2, task3] &gt;&gt; task4<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Branching<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Branching allows you to execute different tasks based on certain conditions. 
The <code>BranchPythonOperator<\/code> implements branching: its callable returns the <code>task_id<\/code> (or a list of task IDs) to follow, and Airflow skips the other tasks directly downstream of the branch.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow.operators.python <span class=\"hljs-keyword\">import<\/span> BranchPythonOperator, PythonOperator\n\n<span class=\"hljs-comment\"># Assumes a `dag` object and an upstream task 'some_task' that pushes a value to XCom<\/span>\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">branch_function<\/span><span class=\"hljs-params\">(**kwargs)<\/span>:<\/span>\n    value = kwargs&#91;<span class=\"hljs-string\">'ti'<\/span>].xcom_pull(task_ids=<span class=\"hljs-string\">'some_task'<\/span>)\n    <span class=\"hljs-comment\"># Return the task_id of the branch to run; the other branch is skipped<\/span>\n    <span class=\"hljs-keyword\">if<\/span> value == <span class=\"hljs-string\">'condition_a'<\/span>:\n        <span class=\"hljs-keyword\">return<\/span> <span class=\"hljs-string\">'task_a'<\/span>\n    <span class=\"hljs-keyword\">else<\/span>:\n        <span class=\"hljs-keyword\">return<\/span> <span class=\"hljs-string\">'task_b'<\/span>\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">task_a_function<\/span><span class=\"hljs-params\">()<\/span>:<\/span>\n    print(<span class=\"hljs-string\">\"Running branch A\"<\/span>)\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">task_b_function<\/span><span class=\"hljs-params\">()<\/span>:<\/span>\n    print(<span class=\"hljs-string\">\"Running branch B\"<\/span>)\n\nbranch_task = BranchPythonOperator(\n    task_id=<span class=\"hljs-string\">'branch_task'<\/span>,\n    python_callable=branch_function,\n    dag=dag,\n)\n\ntask_a = PythonOperator(\n    task_id=<span class=\"hljs-string\">'task_a'<\/span>,\n    python_callable=task_a_function,\n    dag=dag,\n)\n\ntask_b = PythonOperator(\n    task_id=<span class=\"hljs-string\">'task_b'<\/span>,\n    python_callable=task_b_function,\n    dag=dag,\n)\n\nbranch_task &gt;&gt; &#91;task_a, task_b]<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Sensors<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Sensors are a special type of operator that waits until a condition is met, checking (&#8220;poking&#8221;) at a regular interval, before downstream tasks can run. They are useful for tasks that depend on external events, such as the arrival of a file in an S3 bucket. The <code>S3KeySensor<\/code> comes from the Amazon provider package (<code>apache-airflow-providers-amazon<\/code>).<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow.providers.amazon.aws.sensors.s3 <span class=\"hljs-keyword\">import<\/span> S3KeySensor\n\n<span class=\"hljs-comment\"># Assumes a `dag` object and an AWS connection named 'my_aws_conn'<\/span>\ns3_sensor = S3KeySensor(\n    task_id=<span class=\"hljs-string\">'s3_sensor'<\/span>,\n    bucket_key=<span class=\"hljs-string\">'s3:\/\/my_bucket\/my_key'<\/span>,  <span class=\"hljs-comment\"># full S3 URL, so bucket_name is omitted<\/span>\n    wildcard_match=<span class=\"hljs-literal\">True<\/span>,\n    aws_conn_id=<span class=\"hljs-string\">'my_aws_conn'<\/span>,\n    timeout=<span class=\"hljs-number\">18<\/span>*<span class=\"hljs-number\">60<\/span>*<span class=\"hljs-number\">60<\/span>,  <span class=\"hljs-comment\"># give up after 18 hours<\/span>\n    poke_interval=<span class=\"hljs-number\">120<\/span>,  <span class=\"hljs-comment\"># check every 2 minutes<\/span>\n    mode=<span class=\"hljs-string\">'reschedule'<\/span>,  <span class=\"hljs-comment\"># free the worker slot between checks<\/span>\n    dag=dag,\n)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">XComs<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">XComs (short for &#8220;cross-communication&#8221;) are a mechanism that allows tasks to exchange messages or small amounts of data. 
Tasks can push and pull XComs using the <code>xcom_push<\/code> and <code>xcom_pull<\/code> methods. A value returned from a <code>PythonOperator<\/code> callable is also pushed automatically under the key <code>return_value<\/code>, which is how the data pipeline above passes data between tasks. Because XComs are stored in Airflow\u2019s metadata database, keep them small.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-11\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow.operators.python <span class=\"hljs-keyword\">import<\/span> PythonOperator\n\n<span class=\"hljs-comment\"># Assumes a `dag` object defined as in the earlier examples<\/span>\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">push_xcom<\/span><span class=\"hljs-params\">(**kwargs)<\/span>:<\/span>\n    kwargs&#91;<span class=\"hljs-string\">'ti'<\/span>].xcom_push(key=<span class=\"hljs-string\">'my_key'<\/span>, value=<span class=\"hljs-string\">'my_value'<\/span>)\n\npush_task = PythonOperator(\n    task_id=<span class=\"hljs-string\">'push_task'<\/span>,\n    python_callable=push_xcom,\n    dag=dag,\n)\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">pull_xcom<\/span><span class=\"hljs-params\">(**kwargs)<\/span>:<\/span>\n    value = kwargs&#91;<span class=\"hljs-string\">'ti'<\/span>].xcom_pull(task_ids=<span class=\"hljs-string\">'push_task'<\/span>, key=<span class=\"hljs-string\">'my_key'<\/span>)\n    print(value)  <span class=\"hljs-comment\"># logs my_value<\/span>\n\npull_task = PythonOperator(\n    task_id=<span class=\"hljs-string\">'pull_task'<\/span>,\n    python_callable=pull_xcom,\n    dag=dag,\n)\n\npush_task &gt;&gt; pull_task<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-11\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Integrating with External Systems<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Airflow provides extensive integration capabilities with various external systems and 
databases, mostly through separately installed provider packages (<code>apache-airflow-providers-*<\/code>). Here are some examples of common integrations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Integrating with AWS S3<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To interact with AWS S3, you can use the <code>S3Hook<\/code> (for calling S3 directly from Python code) and the <code>S3FileTransformOperator<\/code> (which downloads a file, runs a transformation script on it, and uploads the result). Both come from the <code>apache-airflow-providers-amazon<\/code> package.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-12\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow.providers.amazon.aws.hooks.s3 <span class=\"hljs-keyword\">import<\/span> S3Hook\n<span class=\"hljs-keyword\">from<\/span> airflow.providers.amazon.aws.operators.s3 <span class=\"hljs-keyword\">import<\/span> S3FileTransformOperator\n\n<span class=\"hljs-comment\"># The hook is typically used inside task code, e.g. s3_hook.check_for_key(...)<\/span>\ns3_hook = S3Hook(aws_conn_id=<span class=\"hljs-string\">'my_aws_conn'<\/span>)\n\ntransform_task = S3FileTransformOperator(\n    task_id=<span class=\"hljs-string\">'transform_s3_file'<\/span>,\n    source_s3_key=<span class=\"hljs-string\">'s3:\/\/my_bucket\/source_file.csv'<\/span>,\n    dest_s3_key=<span class=\"hljs-string\">'s3:\/\/my_bucket\/dest_file.csv'<\/span>,\n    <span class=\"hljs-comment\"># Executable script; it receives the local source and destination file paths as arguments<\/span>\n    transform_script=<span class=\"hljs-string\">'transform_script.py'<\/span>,\n    source_aws_conn_id=<span class=\"hljs-string\">'my_aws_conn'<\/span>,\n    dest_aws_conn_id=<span class=\"hljs-string\">'my_aws_conn'<\/span>,\n    replace=<span class=\"hljs-literal\">True<\/span>,\n    dag=dag,\n)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-12\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Integrating with Databases<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Airflow provides operators and hooks for interacting with various databases, such as MySQL, PostgreSQL, and BigQuery.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Example: MySQL<\/h4>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-13\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow.providers.common.sql.operators.sql <span class=\"hljs-keyword\">import<\/span> SQLExecuteQueryOperator\n\n<span class=\"hljs-comment\"># Requires apache-airflow-providers-mysql; recent provider releases deprecate<\/span>\n<span class=\"hljs-comment\"># MySqlOperator in favor of the generic SQLExecuteQueryOperator<\/span>\nmysql_task = SQLExecuteQueryOperator(\n    task_id=<span class=\"hljs-string\">'mysql_task'<\/span>,\n    conn_id=<span class=\"hljs-string\">'my_mysql_conn'<\/span>,  <span class=\"hljs-comment\"># a connection of type MySQL<\/span>\n    sql=<span class=\"hljs-string\">'SELECT * FROM my_table;'<\/span>,\n    dag=dag,\n)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-13\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h4 class=\"wp-block-heading\">Example: BigQuery<\/h4>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-14\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow.providers.google.cloud.operators.bigquery <span class=\"hljs-keyword\">import<\/span> BigQueryInsertJobOperator\n\n<span class=\"hljs-comment\"># Requires apache-airflow-providers-google; this operator replaces the deprecated<\/span>\n<span class=\"hljs-comment\"># BigQueryOperator and takes a BigQuery job configuration<\/span>\nbigquery_task = BigQueryInsertJobOperator(\n    task_id=<span class=\"hljs-string\">'bigquery_task'<\/span>,\n    configuration={\n        <span class=\"hljs-string\">'query'<\/span>: {\n            <span class=\"hljs-string\">'query'<\/span>: <span class=\"hljs-string\">'SELECT * FROM `my_project.my_dataset.my_table`;'<\/span>,\n            <span class=\"hljs-string\">'useLegacySql'<\/span>: <span class=\"hljs-literal\">False<\/span>,\n        }\n    },\n    gcp_conn_id=<span class=\"hljs-string\">'my_bigquery_conn'<\/span>,\n    dag=dag,\n)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-14\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Integrating with Hadoop and Spark<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For big data processing, Airflow integrates with Hadoop and Spark through provider packages. The Spark example uses <code>SparkSubmitOperator<\/code> from <code>apache-airflow-providers-apache-spark<\/code> to run an application with <code>spark-submit<\/code>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Example: Spark<\/h4>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-15\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> airflow.providers.apache.spark.operators.spark_submit <span class=\"hljs-keyword\">import<\/span> SparkSubmitOperator\n\n<span class=\"hljs-comment\"># Assumes a Spark connection named 'my_spark_conn' pointing at your cluster<\/span>\nspark_task = SparkSubmitOperator(\n    task_id=<span class=\"hljs-string\">'spark_task'<\/span>,\n    application=<span class=\"hljs-string\">'\/path\/to\/my_spark_application.py'<\/span>,\n    conn_id=<span class=\"hljs-string\">'my_spark_conn'<\/span>,\n    dag=dag,\n)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-15\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Best Practices<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Modularize Your Code<\/strong> &#8211; Keep your DAGs clean and modular by separating logic into different files or functions. Use Python modules to manage complex workflows.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Use Variables and Connections<\/strong> &#8211; Leverage Airflow\u2019s built-in variables and connections to manage configuration settings and credentials. This promotes reusability and reduces hardcoding.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Monitor and Handle Failures<\/strong> &#8211; Implement proper monitoring and alerting to handle task failures. 
Use retries, alerts, and SLAs (Service Level Agreements) to ensure robust pipelines.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Optimize Performance<\/strong> &#8211; Optimize your DAGs for performance by keeping top-level code in DAG files lightweight (the scheduler parses these files repeatedly), minimizing task execution time, and avoiding bottlenecks. Use parallelism and task concurrency settings to improve throughput.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Security<\/strong> &#8211; Ensure your Airflow deployment is secure by following best practices such as using secure connections, managing user access, and regularly updating Airflow and its dependencies.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Apache Airflow is a powerful tool for building and managing data pipelines. With its flexible and scalable architecture, it can handle complex workflows and integrate with a variety of external systems. By leveraging Airflow\u2019s capabilities and following best practices, you can create efficient and reliable data pipelines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Managing and processing data efficiently has become critical for most software and analytics teams. Data pipelines are essential for the seamless flow of data from source to destination, ensuring that data is collected, processed, and available for analysis and reporting. 
Apache Airflow, an open-source platform for orchestrating complex workflows, has become a popular choice for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[4,6],"tags":[],"class_list":["post-1987","post","type-post","status-publish","format-standard","category-programming-languages","category-python","entry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Pipelines with Apache Airflow and Python<\/title>\n<meta name=\"description\" content=\"Data pipelines are essential for the seamless flow of data from source to destination, ensuring that data is collected\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Pipelines with Apache Airflow and Python\" \/>\n<meta property=\"og:description\" content=\"Data pipelines are essential for the seamless flow of data from source to destination, ensuring that data is collected\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-06-27T10:02:12+00:00\" \/>\n<meta property=\"article:modified_time\" 
content=\"2024-06-27T10:02:17+00:00\" \/>\n<meta name=\"author\" content=\"w3compadmin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"w3compadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/data-pipelines-with-apache-airflow-and-python\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/data-pipelines-with-apache-airflow-and-python\\\/\"},\"author\":{\"name\":\"w3compadmin\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"headline\":\"Data Pipelines with Apache Airflow and Python\",\"datePublished\":\"2024-06-27T10:02:12+00:00\",\"dateModified\":\"2024-06-27T10:02:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/data-pipelines-with-apache-airflow-and-python\\\/\"},\"wordCount\":989,\"articleSection\":[\"Programming Languages\",\"Python\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/data-pipelines-with-apache-airflow-and-python\\\/\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/data-pipelines-with-apache-airflow-and-python\\\/\",\"name\":\"Data Pipelines with Apache Airflow and Python\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\"},\"datePublished\":\"2024-06-27T10:02:12+00:00\",\"dateModified\":\"2024-06-27T10:02:17+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"description\":\"Data pipelines are 
essential for the seamless flow of data from source to destination, ensuring that data is collected\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/data-pipelines-with-apache-airflow-and-python\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/data-pipelines-with-apache-airflow-and-python\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/data-pipelines-with-apache-airflow-and-python\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Articles Home\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Programming Languages\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/programming-languages\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Data Pipelines with Apache Airflow and Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\",\"name\":\"Developer Articles Hub\",\"description\":\"\",\"alternateName\":\"Developer 
Articles\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\",\"name\":\"w3compadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"contentUrl\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"caption\":\"w3compadmin\"},\"sameAs\":[\"http:\\\/\\\/w3computing.com\\\/articles\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Pipelines with Apache Airflow and Python","description":"Data pipelines are essential for the seamless flow of data from source to destination, ensuring that data is collected","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/","og_locale":"en_US","og_type":"article","og_title":"Data Pipelines with Apache Airflow and Python","og_description":"Data pipelines are essential for the seamless flow of data from source to destination, ensuring that data is collected","og_url":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/","article_published_time":"2024-06-27T10:02:12+00:00","article_modified_time":"2024-06-27T10:02:17+00:00","author":"w3compadmin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"w3compadmin","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/#article","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/"},"author":{"name":"w3compadmin","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"headline":"Data Pipelines with Apache Airflow and Python","datePublished":"2024-06-27T10:02:12+00:00","dateModified":"2024-06-27T10:02:17+00:00","mainEntityOfPage":{"@id":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/"},"wordCount":989,"articleSection":["Programming Languages","Python"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/","url":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/","name":"Data Pipelines with Apache Airflow and Python","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/#website"},"datePublished":"2024-06-27T10:02:12+00:00","dateModified":"2024-06-27T10:02:17+00:00","author":{"@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"description":"Data pipelines are essential for the seamless flow of data from source to destination, ensuring that data is collected","breadcrumb":{"@id":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.w3computing.com\/articles\/data-pipelines-with-apache-airflow-and-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Articles 
Home","item":"https:\/\/www.w3computing.com\/articles\/"},{"@type":"ListItem","position":2,"name":"Programming Languages","item":"https:\/\/www.w3computing.com\/articles\/programming-languages\/"},{"@type":"ListItem","position":3,"name":"Data Pipelines with Apache Airflow and Python"}]},{"@type":"WebSite","@id":"https:\/\/www.w3computing.com\/articles\/#website","url":"https:\/\/www.w3computing.com\/articles\/","name":"Developer Articles Hub","description":"","alternateName":"Developer Articles","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.w3computing.com\/articles\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561","name":"w3compadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","url":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","contentUrl":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","caption":"w3compadmin"},"sameAs":["http:\/\/w3computing.com\/articles"]}]}},"featured_image_src":null,"featured_image_src_square":null,"author_info":{"display_name":"w3compadmin","author_link":"https:\/\/www.w3computing.com\/articles\/author\/w3compadmin\/"},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/1987","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/
v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/comments?post=1987"}],"version-history":[{"count":1,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/1987\/revisions"}],"predecessor-version":[{"id":1988,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/1987\/revisions\/1988"}],"wp:attachment":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/media?parent=1987"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/categories?post=1987"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/tags?post=1987"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}