{"id":2046,"date":"2024-07-04T18:19:04","date_gmt":"2024-07-04T18:19:04","guid":{"rendered":"https:\/\/www.w3computing.com\/articles\/?p=2046"},"modified":"2024-07-04T18:19:09","modified_gmt":"2024-07-04T18:19:09","slug":"building-custom-data-pipelines-with-pandas","status":"publish","type":"post","link":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/","title":{"rendered":"Building Custom Data Pipelines with Pandas"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A data pipeline is a series of processes that automate the extraction, transformation, and loading (ETL) of data from various sources to a destination where it can be analyzed and used. Pandas, a powerful data manipulation library in Python, offers a versatile toolkit for constructing custom data pipelines. This tutorial walks through building custom data pipelines with Pandas and is aimed at readers who already know the library&#8217;s basics.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prerequisites<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before diving into the tutorial, you should be familiar with the following concepts:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Basic Python programming<\/li>\n\n\n\n<li>Fundamentals of Pandas (DataFrames, Series, basic operations)<\/li>\n\n\n\n<li>ETL concepts (Extraction, Transformation, Loading)<\/li>\n\n\n\n<li>Basic understanding of common data sources (CSV, Excel, and JSON files, and SQL databases)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Overview of a Data Pipeline<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A typical data pipeline involves the following steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Data Ingestion<\/strong>: Extracting data from various sources.<\/li>\n\n\n\n<li><strong>Data Cleaning<\/strong>: Handling missing values, removing duplicates, and correcting errors.<\/li>\n\n\n\n<li><strong>Data Transformation<\/strong>: Aggregating, filtering, and 
reshaping data.<\/li>\n\n\n\n<li><strong>Data Validation<\/strong>: Ensuring the data meets the required quality and standards.<\/li>\n\n\n\n<li><strong>Data Loading<\/strong>: Storing the processed data into a destination for further analysis or use.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Step 1: Data Ingestion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Data ingestion is the first step in any data pipeline. It involves extracting data from various sources such as CSV files, Excel files, databases, APIs, and more. Pandas provides a variety of functions to read data from these sources.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Reading Data from CSV<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">CSV (Comma Separated Values) files are one of the most common data formats. Pandas provides the <code>pd.read_csv<\/code> function to read data from CSV files.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n\n<span class=\"hljs-comment\"># Reading data from a CSV file<\/span>\ndata = pd.read_csv(<span class=\"hljs-string\">'data.csv'<\/span>)\nprint(data.head())<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Reading Data from Excel<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Excel files are another popular format for storing data. 
Pandas offers the <code>pd.read_excel<\/code> function to read data from Excel files.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Reading data from an Excel file<\/span>\ndata = pd.read_excel(<span class=\"hljs-string\">'data.xlsx'<\/span>, sheet_name=<span class=\"hljs-string\">'Sheet1'<\/span>)\nprint(data.head())<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Reading Data from Databases<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To read data from SQL databases, you can use the <code>pd.read_sql<\/code> function along with a database connection. 
For this example, we&#8217;ll use SQLite.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> sqlite3\n\n<span class=\"hljs-comment\"># Establishing a database connection<\/span>\nconn = sqlite3.connect(<span class=\"hljs-string\">'data.db'<\/span>)\n\n<span class=\"hljs-comment\"># Reading data from a SQL database<\/span>\ndata = pd.read_sql(<span class=\"hljs-string\">'SELECT * FROM table_name'<\/span>, conn)\nprint(data.head())<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Step 2: Data Cleaning<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Data cleaning is a crucial step in the data pipeline process. 
It involves handling missing values, removing duplicates, and correcting data errors to ensure the dataset is accurate and reliable.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Handling Missing Values<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Missing values can be handled in several ways, such as dropping the affected rows or columns or filling the gaps with suitable values. The snippets below are alternatives rather than consecutive steps. Once rows with missing values have been dropped, there is nothing left to fill.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Option 1: dropping rows that contain any missing value<\/span>\ndata.dropna(inplace=<span class=\"hljs-literal\">True<\/span>)\n\n<span class=\"hljs-comment\"># Option 2: filling missing values with a specific value<\/span>\ndata.fillna(<span class=\"hljs-number\">0<\/span>, inplace=<span class=\"hljs-literal\">True<\/span>)\n\n<span class=\"hljs-comment\"># Option 3: filling missing values with the mean of each numeric column<\/span>\n<span class=\"hljs-comment\"># (numeric_only=True skips non-numeric columns; recent pandas versions raise a TypeError otherwise)<\/span>\ndata.fillna(data.mean(numeric_only=<span class=\"hljs-literal\">True<\/span>), inplace=<span class=\"hljs-literal\">True<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Removing Duplicates<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Duplicates can skew analysis and lead to incorrect conclusions. 
Pandas provides the <code>drop_duplicates<\/code> method to remove duplicate rows.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Removing duplicate rows<\/span>\ndata.drop_duplicates(inplace=<span class=\"hljs-literal\">True<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Correcting Data Errors<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data errors, such as incorrect data types or invalid values, need to be corrected to ensure data quality. Replace invalid values before converting the type, because the conversion fails if they are still present. Assigning the result back to the column also avoids calling <code>inplace=True<\/code> on a single column, which may not update the original DataFrame in recent pandas versions.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Replacing invalid values with a missing value (pd.NA) first<\/span>\ndata&#91;<span class=\"hljs-string\">'column_name'<\/span>] = data&#91;<span class=\"hljs-string\">'column_name'<\/span>].replace(<span class=\"hljs-string\">'invalid_value'<\/span>, pd.NA)\n\n<span class=\"hljs-comment\"># Then converting the column to a nullable integer type;<\/span>\n<span class=\"hljs-comment\"># errors='coerce' turns any remaining non-numeric entries into missing values<\/span>\ndata&#91;<span class=\"hljs-string\">'column_name'<\/span>] = pd.to_numeric(data&#91;<span class=\"hljs-string\">'column_name'<\/span>], errors=<span class=\"hljs-string\">'coerce'<\/span>).astype(<span class=\"hljs-string\">'Int64'<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span 
class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Step 3: Data Transformation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Data transformation involves aggregating, filtering, and reshaping data to prepare it for analysis. Pandas provides a wide range of functions to perform these operations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Aggregating Data<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Aggregation involves summarizing data by grouping it based on specific criteria.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Grouping data and calculating the sum<\/span>\ngrouped_data = data.groupby(<span class=\"hljs-string\">'column_name'<\/span>).sum()\nprint(grouped_data)\n\n<span class=\"hljs-comment\"># Grouping data and calculating multiple aggregations<\/span>\nagg_data = data.groupby(<span class=\"hljs-string\">'column_name'<\/span>).agg({<span class=\"hljs-string\">'col1'<\/span>: <span class=\"hljs-string\">'sum'<\/span>, <span class=\"hljs-string\">'col2'<\/span>: <span class=\"hljs-string\">'mean'<\/span>})\nprint(agg_data)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Filtering Data<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Filtering involves selecting a subset of the data based on specific conditions.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># 
Filtering data based on a condition<\/span>\nfiltered_data = data&#91;data&#91;<span class=\"hljs-string\">'column_name'<\/span>] &gt; <span class=\"hljs-number\">100<\/span>]\nprint(filtered_data)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Reshaping Data<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Reshaping changes the structure of the data. For example, pivoting turns a long table into a wide one, and melting turns a wide table into a long one. Note that <code>pivot<\/code> raises a <code>ValueError<\/code> when an index\/column pair occurs more than once. In that case, <code>pivot_table<\/code> with an <code>aggfunc<\/code> such as <code>'sum'<\/code> combines the duplicates.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Pivoting data (long to wide)<\/span>\npivoted_data = data.pivot(index=<span class=\"hljs-string\">'index_column'<\/span>, columns=<span class=\"hljs-string\">'columns_column'<\/span>, values=<span class=\"hljs-string\">'values_column'<\/span>)\nprint(pivoted_data)\n\n<span class=\"hljs-comment\"># Melting data (wide to long)<\/span>\nmelted_data = data.melt(id_vars=&#91;<span class=\"hljs-string\">'id_column'<\/span>], value_vars=&#91;<span class=\"hljs-string\">'value_column1'<\/span>, <span class=\"hljs-string\">'value_column2'<\/span>])\nprint(melted_data)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Step 4: Data Validation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Data validation ensures that the data meets the required quality and 
standards. This step involves checking for data consistency, completeness, and accuracy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Checking Data Consistency<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data consistency checks ensure that the data follows certain rules and constraints. Examples include a key column containing no duplicates or numeric columns having the expected type.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Checking that a column contains no duplicate values<\/span>\nis_unique = data&#91;<span class=\"hljs-string\">'column_name'<\/span>].is_unique\nprint(is_unique)\n\n<span class=\"hljs-comment\"># Checking which columns have an integer data type<\/span>\n<span class=\"hljs-comment\"># (is_integer_dtype matches int32, int64, and the nullable Int64 alike)<\/span>\nconsistent_dtypes = data.dtypes.apply(pd.api.types.is_integer_dtype)\nprint(consistent_dtypes)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Ensuring Data Completeness<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data completeness checks ensure that all required data is present.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-11\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Checking for missing values<\/span>\nmissing_values = data.isnull().sum()\nprint(missing_values)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-11\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span 
class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Verifying Data Accuracy<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data accuracy checks ensure that the data is correct and reliable. One example is flagging values that fall outside a plausible range. The bounds below use the common 1.5 &times; IQR (interquartile range) rule as one possible choice. Limits based on domain knowledge are often more appropriate.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-12\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Checking for outliers, with bounds derived from the 1.5 * IQR rule<\/span>\nq1, q3 = data&#91;<span class=\"hljs-string\">'column_name'<\/span>].quantile(&#91;<span class=\"hljs-number\">0.25<\/span>, <span class=\"hljs-number\">0.75<\/span>])\niqr = q3 - q1\nlower_bound, upper_bound = q1 - <span class=\"hljs-number\">1.5<\/span> * iqr, q3 + <span class=\"hljs-number\">1.5<\/span> * iqr\noutliers = data&#91;(data&#91;<span class=\"hljs-string\">'column_name'<\/span>] &lt; lower_bound) | (data&#91;<span class=\"hljs-string\">'column_name'<\/span>] &gt; upper_bound)]\nprint(outliers)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-12\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Step 5: Data Loading<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The final step in the data pipeline is loading the processed data into a destination for further analysis or use. 
This could be a file, a database, or a data warehouse.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Writing Data to CSV<\/h3>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-13\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Writing data to a CSV file<\/span>\ndata.to_csv(<span class=\"hljs-string\">'processed_data.csv'<\/span>, index=<span class=\"hljs-literal\">False<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-13\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Writing Data to Excel<\/h3>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-14\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Writing data to an Excel file<\/span>\ndata.to_excel(<span class=\"hljs-string\">'processed_data.xlsx'<\/span>, sheet_name=<span class=\"hljs-string\">'Sheet1'<\/span>, index=<span class=\"hljs-literal\">False<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-14\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Writing Data to Databases<\/h3>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-15\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Writing data to a SQL 
database<\/span>\ndata.to_sql(<span class=\"hljs-string\">'table_name'<\/span>, conn, if_exists=<span class=\"hljs-string\">'replace'<\/span>, index=<span class=\"hljs-literal\">False<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-15\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Example: Building a Complete Data Pipeline<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s build a complete data pipeline that ingests data from a CSV file, cleans it, transforms it, validates it, and loads it into an Excel file.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Step 1: Data Ingestion<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-16\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n\n<span class=\"hljs-comment\"># Reading data from a CSV file<\/span>\ndata = pd.read_csv(<span class=\"hljs-string\">'data.csv'<\/span>)\nprint(data.head())<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-16\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\"><strong>Step 2: Data Cleaning<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-17\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Dropping rows with 
missing values<\/span>\ndata.dropna(inplace=<span class=\"hljs-literal\">True<\/span>)\n\n<span class=\"hljs-comment\"># Removing duplicate rows<\/span>\ndata.drop_duplicates(inplace=<span class=\"hljs-literal\">True<\/span>)\n\n<span class=\"hljs-comment\"># Correcting data errors: replace invalid values first, then convert<\/span>\n<span class=\"hljs-comment\"># to a nullable integer type (non-numeric leftovers become missing values)<\/span>\ndata&#91;<span class=\"hljs-string\">'column_name'<\/span>] = data&#91;<span class=\"hljs-string\">'column_name'<\/span>].replace(<span class=\"hljs-string\">'invalid_value'<\/span>, pd.NA)\ndata&#91;<span class=\"hljs-string\">'column_name'<\/span>] = pd.to_numeric(data&#91;<span class=\"hljs-string\">'column_name'<\/span>], errors=<span class=\"hljs-string\">'coerce'<\/span>).astype(<span class=\"hljs-string\">'Int64'<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-17\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\"><strong>Step 3: Data Transformation<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-18\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Aggregating data: summing the numeric columns for each (index_column, columns_column) pair<\/span>\ngrouped_data = data.groupby(&#91;<span class=\"hljs-string\">'index_column'<\/span>, <span class=\"hljs-string\">'columns_column'<\/span>], as_index=<span class=\"hljs-literal\">False<\/span>).sum(numeric_only=<span class=\"hljs-literal\">True<\/span>)\n\n<span class=\"hljs-comment\"># Filtering the aggregated rows based on a condition<\/span>\nfiltered_data = grouped_data&#91;grouped_data&#91;<span class=\"hljs-string\">'filter_column'<\/span>] &gt; <span class=\"hljs-number\">100<\/span>]\n\n<span class=\"hljs-comment\"># Reshaping data by pivoting; the groupby above guarantees each pair is unique<\/span>\npivoted_data = filtered_data.pivot(index=<span class=\"hljs-string\">'index_column'<\/span>, columns=<span class=\"hljs-string\">'columns_column'<\/span>, values=<span class=\"hljs-string\">'values_column'<\/span>)<\/code><\/span><small class=\"shcb-language\" 
id=\"shcb-language-18\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\"><strong>Step 4: Data Validation<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-19\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Checking that each row label (an index_column value) appears only once<\/span>\nis_unique = pivoted_data.index.is_unique\nprint(is_unique)\n\n<span class=\"hljs-comment\"># Checking for missing values (pivoting leaves gaps wherever a pair had no data)<\/span>\nmissing_values = pivoted_data.isnull().sum()\nprint(missing_values)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-19\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\"><strong>Step 5: Data Loading<\/strong><\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-20\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Writing processed data to an Excel file; the index is kept because it<\/span>\n<span class=\"hljs-comment\"># holds the index_column labels produced by the pivot<\/span>\npivoted_data.to_excel(<span class=\"hljs-string\">'processed_data.xlsx'<\/span>, sheet_name=<span class=\"hljs-string\">'Sheet1'<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-20\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span 
class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Advanced Topics<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Once the basic pipeline works, several techniques can make Pandas-based pipelines more flexible, faster, and easier to run on a schedule.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Using Custom Functions<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Custom functions let you express row-level logic that the built-in methods do not cover. With <code>axis=1<\/code>, <code>apply<\/code> calls the function once per row. This is flexible but slow on large DataFrames. For simple arithmetic like this example, the vectorized expression <code>data&#91;'column1'] * data&#91;'column2']<\/code> gives the same result much faster.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-21\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-comment\"># Defining a custom function<\/span>\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">custom_function<\/span><span class=\"hljs-params\">(row)<\/span>:<\/span>\n    <span class=\"hljs-keyword\">return<\/span> row&#91;<span class=\"hljs-string\">'column1'<\/span>] * row&#91;<span class=\"hljs-string\">'column2'<\/span>]\n\n<span class=\"hljs-comment\"># Applying the custom function to each row (axis=1)<\/span>\ndata&#91;<span class=\"hljs-string\">'new_column'<\/span>] = data.apply(custom_function, axis=<span class=\"hljs-number\">1<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-21\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Parallel Processing with Dask<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Dask is a parallel computing library whose DataFrame API mirrors much of Pandas. It splits a large dataset into partitions, so it can handle data that does not fit in memory. Operations are evaluated lazily until <code>compute()<\/code> is called.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-22\" 
data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> dask.dataframe <span class=\"hljs-keyword\">as<\/span> dd\n\n<span class=\"hljs-comment\"># Reading data with Dask<\/span>\ndata = dd.read_csv(<span class=\"hljs-string\">'large_data.csv'<\/span>)\n\n<span class=\"hljs-comment\"># Performing operations with Dask<\/span>\ndata = data.dropna().drop_duplicates()\ngrouped_data = data.groupby(<span class=\"hljs-string\">'group_column'<\/span>).sum().compute()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-22\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Automating Data Pipelines with Airflow<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Apache Airflow is a platform for scheduling and orchestrating data pipelines. Wrapping the pipeline in a function lets Airflow run it as a task on a schedule, such as once a day.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-23\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n<span class=\"hljs-keyword\">from<\/span> airflow <span class=\"hljs-keyword\">import<\/span> DAG\n<span class=\"hljs-comment\"># Airflow 2.x import path (airflow.operators.python_operator is deprecated)<\/span>\n<span class=\"hljs-keyword\">from<\/span> airflow.operators.python <span class=\"hljs-keyword\">import<\/span> PythonOperator\n<span class=\"hljs-keyword\">from<\/span> datetime <span class=\"hljs-keyword\">import<\/span> datetime\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">data_pipeline<\/span><span class=\"hljs-params\">()<\/span>:<\/span>\n    <span class=\"hljs-comment\"># Data pipeline code: ingest, clean, transform, and load<\/span>\n    data = pd.read_csv(<span class=\"hljs-string\">'data.csv'<\/span>)\n    data.dropna(inplace=<span 
class=\"hljs-literal\">True<\/span>)\n    data.drop_duplicates(inplace=<span class=\"hljs-literal\">True<\/span>)\n    data&#91;<span class=\"hljs-string\">'column_name'<\/span>] = data&#91;<span class=\"hljs-string\">'column_name'<\/span>].replace(<span class=\"hljs-string\">'invalid_value'<\/span>, pd.NA)\n    data&#91;<span class=\"hljs-string\">'column_name'<\/span>] = pd.to_numeric(data&#91;<span class=\"hljs-string\">'column_name'<\/span>], errors=<span class=\"hljs-string\">'coerce'<\/span>).astype(<span class=\"hljs-string\">'Int64'<\/span>)\n    grouped_data = data.groupby(&#91;<span class=\"hljs-string\">'index_column'<\/span>, <span class=\"hljs-string\">'columns_column'<\/span>], as_index=<span class=\"hljs-literal\">False<\/span>).sum(numeric_only=<span class=\"hljs-literal\">True<\/span>)\n    filtered_data = grouped_data&#91;grouped_data&#91;<span class=\"hljs-string\">'filter_column'<\/span>] &gt; <span class=\"hljs-number\">100<\/span>]\n    pivoted_data = filtered_data.pivot(index=<span class=\"hljs-string\">'index_column'<\/span>, columns=<span class=\"hljs-string\">'columns_column'<\/span>, values=<span class=\"hljs-string\">'values_column'<\/span>)\n    pivoted_data.to_excel(<span class=\"hljs-string\">'processed_data.xlsx'<\/span>, sheet_name=<span class=\"hljs-string\">'Sheet1'<\/span>)\n\n<span class=\"hljs-comment\"># Defining the DAG (the schedule argument requires Airflow 2.4 or later)<\/span>\ndag = DAG(<span class=\"hljs-string\">'data_pipeline'<\/span>, description=<span class=\"hljs-string\">'Simple data pipeline'<\/span>, schedule=<span class=\"hljs-string\">'@daily'<\/span>, start_date=datetime(<span class=\"hljs-number\">2023<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">1<\/span>), catchup=<span class=\"hljs-literal\">False<\/span>)\n\n<span class=\"hljs-comment\"># Defining the task<\/span>\ntask = PythonOperator(task_id=<span class=\"hljs-string\">'run_data_pipeline'<\/span>, python_callable=data_pipeline, dag=dag)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-23\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span 
class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Building custom data pipelines with Pandas involves a series of well-defined steps: data ingestion, cleaning, transformation, validation, and loading. With the power and flexibility of Pandas, along with integration with other tools and libraries, you can create robust and efficient data pipelines tailored to your specific needs. This tutorial has provided a comprehensive guide to get you started and take your data engineering skills to the next level.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction A data pipeline is a series of processes that automate the extraction, transformation, and loading (ETL) of data from various sources to a destination where it can be analyzed and utilized. Pandas, a powerful data manipulation library in Python, offers a versatile toolkit for constructing custom data pipelines. 
This tutorial aims to provide a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[18,4,6],"tags":[],"class_list":["post-2046","post","type-post","status-publish","format-standard","category-artificial-intelligence","category-programming-languages","category-python","entry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Building Custom Data Pipelines with Pandas<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building Custom Data Pipelines with Pandas\" \/>\n<meta property=\"og:description\" content=\"Introduction A data pipeline is a series of processes that automate the extraction, transformation, and loading (ETL) of data from various sources to a destination where it can be analyzed and utilized. Pandas, a powerful data manipulation library in Python, offers a versatile toolkit for constructing custom data pipelines. 
This tutorial aims to provide a [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-07-04T18:19:04+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-07-04T18:19:09+00:00\" \/>\n<meta name=\"author\" content=\"w3compadmin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"w3compadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/building-custom-data-pipelines-with-pandas\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/building-custom-data-pipelines-with-pandas\\\/\"},\"author\":{\"name\":\"w3compadmin\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"headline\":\"Building Custom Data Pipelines with Pandas\",\"datePublished\":\"2024-07-04T18:19:04+00:00\",\"dateModified\":\"2024-07-04T18:19:09+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/building-custom-data-pipelines-with-pandas\\\/\"},\"wordCount\":765,\"articleSection\":[\"Artificial Intelligence\",\"Programming Languages\",\"Python\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/building-custom-data-pipelines-with-pandas\\\/\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/building-custom-data-pipelines-with-pandas\\\/\",\"name\":\"Building Custom Data Pipelines with 
Pandas\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\"},\"datePublished\":\"2024-07-04T18:19:04+00:00\",\"dateModified\":\"2024-07-04T18:19:09+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/building-custom-data-pipelines-with-pandas\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/building-custom-data-pipelines-with-pandas\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/building-custom-data-pipelines-with-pandas\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Articles Home\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Uncategorized\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/uncategorized\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Building Custom Data Pipelines with Pandas\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\",\"name\":\"Developer Articles Hub\",\"description\":\"\",\"alternateName\":\"Developer 
Articles\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\",\"name\":\"w3compadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"contentUrl\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"caption\":\"w3compadmin\"},\"sameAs\":[\"http:\\\/\\\/w3computing.com\\\/articles\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Building Custom Data Pipelines with Pandas","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/","og_locale":"en_US","og_type":"article","og_title":"Building Custom Data Pipelines with Pandas","og_description":"Introduction A data pipeline is a series of processes that automate the extraction, transformation, and loading (ETL) of data from various sources to a destination where it can be analyzed and utilized. Pandas, a powerful data manipulation library in Python, offers a versatile toolkit for constructing custom data pipelines. 
This tutorial aims to provide a [&hellip;]","og_url":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/","article_published_time":"2024-07-04T18:19:04+00:00","article_modified_time":"2024-07-04T18:19:09+00:00","author":"w3compadmin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"w3compadmin","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/#article","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/"},"author":{"name":"w3compadmin","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"headline":"Building Custom Data Pipelines with Pandas","datePublished":"2024-07-04T18:19:04+00:00","dateModified":"2024-07-04T18:19:09+00:00","mainEntityOfPage":{"@id":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/"},"wordCount":765,"articleSection":["Artificial Intelligence","Programming Languages","Python"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/","url":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/","name":"Building Custom Data Pipelines with 
Pandas","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/#website"},"datePublished":"2024-07-04T18:19:04+00:00","dateModified":"2024-07-04T18:19:09+00:00","author":{"@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"breadcrumb":{"@id":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.w3computing.com\/articles\/building-custom-data-pipelines-with-pandas\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Articles Home","item":"https:\/\/www.w3computing.com\/articles\/"},{"@type":"ListItem","position":2,"name":"Uncategorized","item":"https:\/\/www.w3computing.com\/articles\/uncategorized\/"},{"@type":"ListItem","position":3,"name":"Building Custom Data Pipelines with Pandas"}]},{"@type":"WebSite","@id":"https:\/\/www.w3computing.com\/articles\/#website","url":"https:\/\/www.w3computing.com\/articles\/","name":"Developer Articles Hub","description":"","alternateName":"Developer 
Articles","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.w3computing.com\/articles\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561","name":"w3compadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","url":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","contentUrl":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","caption":"w3compadmin"},"sameAs":["http:\/\/w3computing.com\/articles"]}]}},"featured_image_src":null,"featured_image_src_square":null,"author_info":{"display_name":"w3compadmin","author_link":"https:\/\/www.w3computing.com\/articles\/author\/w3compadmin\/"},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2046","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/comments?post=2046"}],"version-history":[{"count":3,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2046\/revisions"}],"predecessor-version":[{"id":2049,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2046\/revisions\/2
049"}],"wp:attachment":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/media?parent=2046"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/categories?post=2046"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/tags?post=2046"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}