{"id":2000,"date":"2024-06-27T21:10:09","date_gmt":"2024-06-27T21:10:09","guid":{"rendered":"https:\/\/www.w3computing.com\/articles\/?p=2000"},"modified":"2024-06-27T21:10:13","modified_gmt":"2024-06-27T21:10:13","slug":"implementing-hyperloglog-for-cardinality-estimation-in-postgresql","status":"publish","type":"post","link":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/","title":{"rendered":"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Cardinality estimation is a critical aspect of database management, particularly in optimizing query performance and planning. Traditional methods of exact counting can be resource-intensive, especially when dealing with large datasets. HyperLogLog (HLL) is an algorithm that provides an efficient way to estimate the cardinality (number of distinct elements) of a dataset using a probabilistic approach. PostgreSQL, a powerful open-source relational database system, supports HyperLogLog through extensions, making it possible to leverage this algorithm for efficient cardinality estimation. This tutorial will guide you through the process of implementing HyperLogLog for cardinality estimation in PostgreSQL.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Understanding HyperLogLog<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1.1 What is HyperLogLog?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">HyperLogLog (HLL) is a probabilistic algorithm used for estimating the number of distinct elements (cardinality) in a large dataset. It provides an approximate result with a known, adjustable error rate, making it significantly more space-efficient than exact counting methods.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1.2 How HyperLogLog Works<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">HLL works by hashing the input elements and using the leading zero bits in the hash values to estimate the cardinality. The algorithm uses a fixed-size data structure (typically a register array) to keep track of the maximum number of leading zeros observed for different hash value segments. The cardinality estimate is then derived from these observations using harmonic mean and bias correction techniques.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1.3 Advantages of HyperLogLog<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Space Efficiency<\/strong>: HLL requires significantly less memory compared to exact counting methods.<\/li>\n\n\n\n<li><strong>Scalability<\/strong>: HLL can handle very large datasets efficiently.<\/li>\n\n\n\n<li><strong>Mergeability<\/strong>: HLL registers can be merged to estimate the cardinality of the union of multiple sets.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">2. Setting Up PostgreSQL and the HyperLogLog Extension<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">2.1 Installing PostgreSQL<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">First, ensure you have PostgreSQL installed on your system. You can download and install it from the <a href=\"https:\/\/www.postgresql.org\/download\/\">official PostgreSQL website<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2.2 Installing the HyperLogLog Extension<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">PostgreSQL does not come with built-in support for HyperLogLog, but you can use the <code>hll<\/code> extension, which provides HyperLogLog functionality.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Step 1: Install PostgreSQL&#8217;s contrib package<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Depending on your operating system, you can install the contrib package using the appropriate package manager.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For Debian-based systems (e.g., Ubuntu):<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">sudo apt-get install postgresql-contrib<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">For Red Hat-based systems (e.g., CentOS):<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">sudo yum install postgresql-contrib<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h4 class=\"wp-block-heading\">Step 2: Install the <code>hll<\/code> extension<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Download and compile the <code>hll<\/code> extension from the <a href=\"https:\/\/github.com\/citusdata\/postgresql-hll\">pg_hll GitHub repository<\/a>.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">git clone https:\/\/github.com\/citusdata\/postgresql-hll.git\ncd postgresql-hll\nmake\nsudo make install<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h4 class=\"wp-block-heading\">Step 3: Create the <code>hll<\/code> extension in your database<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Connect to your PostgreSQL database and create the extension:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">CREATE<\/span> EXTENSION hll;<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">3. Using HyperLogLog for Cardinality Estimation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">3.1 Creating an HLL Column<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To use HLL in your tables, you need to define a column of type <code>hll<\/code>.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">CREATE<\/span> <span class=\"hljs-keyword\">TABLE<\/span> user_activities (\n    user_id <span class=\"hljs-built_in\">INT<\/span>,\n    activity <span class=\"hljs-built_in\">VARCHAR<\/span>,\n    activity_time <span class=\"hljs-built_in\">TIMESTAMP<\/span>,\n    hll_col HLL\n);<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">3.2 Inserting Data into HLL Columns<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To insert data into an <code>hll<\/code> column, you need to use the <code>hll_add_agg<\/code> aggregate function to convert the data into the HLL format.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">INSERT<\/span> <span class=\"hljs-keyword\">INTO<\/span> user_activities (user_id, activity, activity_time, hll_col)\n<span class=\"hljs-keyword\">VALUES<\/span> (<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'login'<\/span>, <span class=\"hljs-string\">'2024-06-01 10:00:00'<\/span>, hll_empty()),\n       (<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-string\">'logout'<\/span>, <span class=\"hljs-string\">'2024-06-01 11:00:00'<\/span>, hll_empty()),\n       (<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'purchase'<\/span>, <span class=\"hljs-string\">'2024-06-01 12:00:00'<\/span>, hll_empty());\n\n<span class=\"hljs-keyword\">UPDATE<\/span> user_activities\n<span class=\"hljs-keyword\">SET<\/span> hll_col = hll_add(hll_col, hll_hash_integer(user_id));<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">3.3 Estimating Cardinality<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To estimate the cardinality of the <code>user_id<\/code> column, use the <code>hll_cardinality<\/code> function.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">SELECT<\/span> hll_cardinality(hll_union_agg(hll_col)) <span class=\"hljs-keyword\">AS<\/span> estimated_unique_users\n<span class=\"hljs-keyword\">FROM<\/span> user_activities;<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">4. Advanced Usage and Optimization<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">4.1 Tuning HLL Parameters<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">HLL has several parameters that can be tuned to balance between accuracy and memory usage. The <code>log2m<\/code> parameter controls the number of registers, and <code>regwidth<\/code> controls the width of each register.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-comment\">-- Create an HLL with custom parameters<\/span>\n<span class=\"hljs-keyword\">SELECT<\/span> hll_create_empty(<span class=\"hljs-number\">12<\/span>, <span class=\"hljs-number\">5<\/span>) <span class=\"hljs-keyword\">AS<\/span> custom_hll;\n\n<span class=\"hljs-comment\">-- Use custom HLL parameters in a table<\/span>\n<span class=\"hljs-keyword\">CREATE<\/span> <span class=\"hljs-keyword\">TABLE<\/span> custom_user_activities (\n    user_id <span class=\"hljs-built_in\">INT<\/span>,\n    activity <span class=\"hljs-built_in\">VARCHAR<\/span>,\n    activity_time <span class=\"hljs-built_in\">TIMESTAMP<\/span>,\n    hll_col HLL\n);\n\n<span class=\"hljs-comment\">-- Insert data with custom HLL parameters<\/span>\n<span class=\"hljs-keyword\">INSERT<\/span> <span class=\"hljs-keyword\">INTO<\/span> custom_user_activities (user_id, activity, activity_time, hll_col)\n<span class=\"hljs-keyword\">VALUES<\/span> (<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'login'<\/span>, <span class=\"hljs-string\">'2024-06-01 10:00:00'<\/span>, hll_create_empty(<span class=\"hljs-number\">12<\/span>, <span class=\"hljs-number\">5<\/span>)),\n       (<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-string\">'logout'<\/span>, <span class=\"hljs-string\">'2024-06-01 11:00:00'<\/span>, hll_create_empty(<span class=\"hljs-number\">12<\/span>, <span class=\"hljs-number\">5<\/span>)),\n       (<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'purchase'<\/span>, <span class=\"hljs-string\">'2024-06-01 12:00:00'<\/span>, hll_create_empty(<span class=\"hljs-number\">12<\/span>, <span class=\"hljs-number\">5<\/span>));\n\n<span class=\"hljs-keyword\">UPDATE<\/span> custom_user_activities\n<span class=\"hljs-keyword\">SET<\/span> hll_col = hll_add(hll_col, hll_hash_integer(user_id));<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">4.2 Merging HLLs<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">One of the powerful features of HLL is the ability to merge multiple HLLs to get the union&#8217;s cardinality.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">SELECT<\/span> hll_cardinality(hll_union(hll1, hll2)) <span class=\"hljs-keyword\">AS<\/span> merged_cardinality\n<span class=\"hljs-keyword\">FROM<\/span> (\n    <span class=\"hljs-keyword\">SELECT<\/span> hll_add(hll_empty(), hll_hash_integer(<span class=\"hljs-number\">1<\/span>)) <span class=\"hljs-keyword\">AS<\/span> hll1,\n           hll_add(hll_empty(), hll_hash_integer(<span class=\"hljs-number\">2<\/span>)) <span class=\"hljs-keyword\">AS<\/span> hll2\n) <span class=\"hljs-keyword\">AS<\/span> subquery;<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">4.3 Combining HLL with Other PostgreSQL Features<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You can combine HLL with other PostgreSQL features such as window functions, CTEs, and more for advanced analysis.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">WITH<\/span> activity_hll <span class=\"hljs-keyword\">AS<\/span> (\n    <span class=\"hljs-keyword\">SELECT<\/span> activity, hll_add(hll_empty(), hll_hash_integer(user_id)) <span class=\"hljs-keyword\">AS<\/span> hll_col\n    <span class=\"hljs-keyword\">FROM<\/span> user_activities\n    <span class=\"hljs-keyword\">GROUP<\/span> <span class=\"hljs-keyword\">BY<\/span> activity\n)\n<span class=\"hljs-keyword\">SELECT<\/span> activity, hll_cardinality(hll_col) <span class=\"hljs-keyword\">AS<\/span> estimated_unique_users\n<span class=\"hljs-keyword\">FROM<\/span> activity_hll;<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">5. Practical Examples and Use Cases<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">5.1 Web Analytics<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In a web analytics context, you might want to estimate the number of unique visitors to your site.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-11\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">CREATE<\/span> <span class=\"hljs-keyword\">TABLE<\/span> web_logs (\n    visitor_id <span class=\"hljs-built_in\">INT<\/span>,\n    page_url <span class=\"hljs-built_in\">VARCHAR<\/span>,\n    visit_time <span class=\"hljs-built_in\">TIMESTAMP<\/span>,\n    hll_col HLL\n);\n\n<span class=\"hljs-keyword\">INSERT<\/span> <span class=\"hljs-keyword\">INTO<\/span> web_logs (visitor_id, page_url, visit_time, hll_col)\n<span class=\"hljs-keyword\">VALUES<\/span> (<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'\/home'<\/span>, <span class=\"hljs-string\">'2024-06-01 10:00:00'<\/span>, hll_empty()),\n       (<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-string\">'\/about'<\/span>, <span class=\"hljs-string\">'2024-06-01 11:00:00'<\/span>, hll_empty()),\n       (<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'\/products'<\/span>, <span class=\"hljs-string\">'2024-06-01 12:00:00'<\/span>, hll_empty());\n\n<span class=\"hljs-keyword\">UPDATE<\/span> web_logs\n<span class=\"hljs-keyword\">SET<\/span> hll_col = hll_add(hll_col, hll_hash_integer(visitor_id));\n\n<span class=\"hljs-keyword\">SELECT<\/span> hll_cardinality(hll_union_agg(hll_col)) <span class=\"hljs-keyword\">AS<\/span> estimated_unique_visitors\n<span class=\"hljs-keyword\">FROM<\/span> web_logs;<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-11\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">5.2 Monitoring Systems<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In monitoring systems, you might want to estimate the number of unique errors or events over time.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-12\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">CREATE<\/span> <span class=\"hljs-keyword\">TABLE<\/span> system_events (\n    event_id <span class=\"hljs-built_in\">SERIAL<\/span> PRIMARY <span class=\"hljs-keyword\">KEY<\/span>,\n    event_type <span class=\"hljs-built_in\">VARCHAR<\/span>,\n    event_time <span class=\"hljs-built_in\">TIMESTAMP<\/span>,\n    hll_col HLL\n);\n\n<span class=\"hljs-keyword\">INSERT<\/span> <span class=\"hljs-keyword\">INTO<\/span> system_events (event_type, event_time, hll_col)\n<span class=\"hljs-keyword\">VALUES<\/span> (<span class=\"hljs-string\">'error'<\/span>, <span class=\"hljs-string\">'2024-06-01 10:00:00'<\/span>, hll_empty()),\n       (<span class=\"hljs-string\">'warning'<\/span>, <span class=\"hljs-string\">'2024-06-01 11:00:00'<\/span>, hll_empty()),\n       (<span class=\"hljs-string\">'error'<\/span>, <span class=\"hljs-string\">'2024-06-01 12:00:00'<\/span>, hll_empty());\n\n<span class=\"hljs-keyword\">UPDATE<\/span> system_events\n<span class=\"hljs-keyword\">SET<\/span> hll_col = hll_add(hll_col, hll_hash_integer(event_id));\n\n<span class=\"hljs-keyword\">SELECT<\/span> hll_cardinality(hll_union_agg(hll_col)) <span class=\"hljs-keyword\">AS<\/span> estimated_unique_events\n<span class=\"hljs-keyword\">FROM<\/span> system_events;<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-12\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">5.3 Marketing Analysis<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In marketing, you might want to estimate the number of unique customers participating in different campaigns.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-13\" data-shcb-language-name=\"SQL (Structured Query Language)\" data-shcb-language-slug=\"sql\"><span><code class=\"hljs language-sql\"><span class=\"hljs-keyword\">CREATE<\/span> <span class=\"hljs-keyword\">TABLE<\/span> marketing_campaigns (\n    campaign_id <span class=\"hljs-built_in\">INT<\/span>,\n    customer_id <span class=\"hljs-built_in\">INT<\/span>,\n    participation_time <span class=\"hljs-built_in\">TIMESTAMP<\/span>,\n    hll_col HLL\n);\n\n<span class=\"hljs-keyword\">INSERT<\/span> <span class=\"hljs-keyword\">INTO<\/span> marketing_campaigns (campaign_id, customer_id, participation_time, hll_col)\n<span class=\"hljs-keyword\">VALUES<\/span> (<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'2024-06-01 10:00:00'<\/span>, hll_empty()),\n       (<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">2<\/span>, <span class=\"hljs-string\">'2024-06-01 11:00\n\n:00'<\/span>, hll_empty()),\n       (<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-string\">'2024-06-01 12:00:00'<\/span>, hll_empty());\n\n<span class=\"hljs-keyword\">UPDATE<\/span> marketing_campaigns\n<span class=\"hljs-keyword\">SET<\/span> hll_col = hll_add(hll_col, hll_hash_integer(customer_id));\n\n<span class=\"hljs-keyword\">SELECT<\/span> campaign_id, hll_cardinality(hll_union_agg(hll_col)) <span class=\"hljs-keyword\">AS<\/span> estimated_unique_customers\n<span class=\"hljs-keyword\">FROM<\/span> marketing_campaigns\n<span class=\"hljs-keyword\">GROUP<\/span> <span class=\"hljs-keyword\">BY<\/span> campaign_id;<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-13\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">SQL (Structured Query Language)<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">sql<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">HyperLogLog is a powerful tool for cardinality estimation, offering a balance between accuracy and efficiency that is well-suited for large datasets. By leveraging the <code>hll<\/code> extension in PostgreSQL, you can easily implement HyperLogLog for various use cases such as web analytics, monitoring systems, and marketing analysis. This tutorial provided a comprehensive guide on setting up and using HyperLogLog in PostgreSQL, from basic setup to advanced usage and practical examples. By incorporating HLL into your PostgreSQL workflows, you can achieve efficient and scalable cardinality estimation, enhancing the performance and capabilities of your database applications.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Cardinality estimation is a critical aspect of database management, particularly in optimizing query performance and planning. Traditional methods of exact counting can be resource-intensive, especially when dealing with large datasets. HyperLogLog (HLL) is an algorithm that provides an efficient way to estimate the cardinality (number of distinct elements) of a dataset using a probabilistic [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[14,22],"tags":[],"class_list":["post-2000","post","type-post","status-publish","format-standard","category-database-development","category-postgresql","entry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Implementing HyperLogLog for Cardinality Estimation in PostgreSQL<\/title>\n<meta name=\"description\" content=\"Cardinality estimation is a critical aspect of database management, particularly in optimizing query performance and planning.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL\" \/>\n<meta property=\"og:description\" content=\"Cardinality estimation is a critical aspect of database management, particularly in optimizing query performance and planning.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-06-27T21:10:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-27T21:10:13+00:00\" \/>\n<meta name=\"author\" content=\"w3compadmin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"w3compadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\\\/\"},\"author\":{\"name\":\"w3compadmin\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"headline\":\"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL\",\"datePublished\":\"2024-06-27T21:10:09+00:00\",\"dateModified\":\"2024-06-27T21:10:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\\\/\"},\"wordCount\":667,\"articleSection\":[\"Database Development\",\"PostgreSQL\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\\\/\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\\\/\",\"name\":\"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\"},\"datePublished\":\"2024-06-27T21:10:09+00:00\",\"dateModified\":\"2024-06-27T21:10:13+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"description\":\"Cardinality estimation is a critical aspect of database management, particularly in optimizing query performance and planning.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Articles Home\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Database Development\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/database-development\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\",\"name\":\"Developer Articles Hub\",\"description\":\"\",\"alternateName\":\"Developer Articles\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\",\"name\":\"w3compadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"contentUrl\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266\",\"caption\":\"w3compadmin\"},\"sameAs\":[\"http:\\\/\\\/w3computing.com\\\/articles\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL","description":"Cardinality estimation is a critical aspect of database management, particularly in optimizing query performance and planning.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/","og_locale":"en_US","og_type":"article","og_title":"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL","og_description":"Cardinality estimation is a critical aspect of database management, particularly in optimizing query performance and planning.","og_url":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/","article_published_time":"2024-06-27T21:10:09+00:00","article_modified_time":"2024-06-27T21:10:13+00:00","author":"w3compadmin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"w3compadmin","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/#article","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/"},"author":{"name":"w3compadmin","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"headline":"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL","datePublished":"2024-06-27T21:10:09+00:00","dateModified":"2024-06-27T21:10:13+00:00","mainEntityOfPage":{"@id":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/"},"wordCount":667,"articleSection":["Database Development","PostgreSQL"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/","url":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/","name":"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/#website"},"datePublished":"2024-06-27T21:10:09+00:00","dateModified":"2024-06-27T21:10:13+00:00","author":{"@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"description":"Cardinality estimation is a critical aspect of database management, particularly in optimizing query performance and planning.","breadcrumb":{"@id":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.w3computing.com\/articles\/implementing-hyperloglog-for-cardinality-estimation-in-postgresql\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Articles Home","item":"https:\/\/www.w3computing.com\/articles\/"},{"@type":"ListItem","position":2,"name":"Database Development","item":"https:\/\/www.w3computing.com\/articles\/database-development\/"},{"@type":"ListItem","position":3,"name":"Implementing HyperLogLog for Cardinality Estimation in PostgreSQL"}]},{"@type":"WebSite","@id":"https:\/\/www.w3computing.com\/articles\/#website","url":"https:\/\/www.w3computing.com\/articles\/","name":"Developer Articles Hub","description":"","alternateName":"Developer Articles","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.w3computing.com\/articles\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561","name":"w3compadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","url":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","contentUrl":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1780141266","caption":"w3compadmin"},"sameAs":["http:\/\/w3computing.com\/articles"]}]}},"featured_image_src":null,"featured_image_src_square":null,"author_info":{"display_name":"w3compadmin","author_link":"https:\/\/www.w3computing.com\/articles\/author\/w3compadmin\/"},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2000","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/comments?post=2000"}],"version-history":[{"count":1,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2000\/revisions"}],"predecessor-version":[{"id":2001,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2000\/revisions\/2001"}],"wp:attachment":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/media?parent=2000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/categories?post=2000"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/tags?post=2000"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}