{"id":2039,"date":"2024-07-03T20:15:11","date_gmt":"2024-07-03T20:15:11","guid":{"rendered":"https:\/\/www.w3computing.com\/articles\/?p=2039"},"modified":"2024-07-03T20:15:16","modified_gmt":"2024-07-03T20:15:16","slug":"use-kubernetes-with-apache-flink-for-real-time-data-processing","status":"publish","type":"post","link":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/","title":{"rendered":"Use Kubernetes with Apache Flink for Real-Time Data Processing"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Apache Flink is a powerful stream processing framework used for processing large volumes of data in real-time. Kubernetes, on the other hand, is a container orchestration platform that automates the deployment, scaling, and management of containerized applications. Combining Apache Flink with Kubernetes provides a robust solution for managing and scaling real-time data processing applications. This tutorial aims to provide an in-depth guide on how to deploy and manage Apache Flink applications on Kubernetes for real-time data processing. 
We assume that the reader has a basic understanding of both Apache Flink and Kubernetes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prerequisites<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before we dive into the details, ensure you have the following prerequisites:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>A working Kubernetes cluster (Minikube, GKE, EKS, or AKS can be used for development and testing).<\/li>\n\n\n\n<li>kubectl command-line tool configured to interact with your Kubernetes cluster.<\/li>\n\n\n\n<li>Apache Flink binaries or a Docker image of Apache Flink.<\/li>\n\n\n\n<li>Basic understanding of Kubernetes concepts such as Pods, Services, Deployments, and ConfigMaps.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Setting Up Kubernetes Cluster<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If you do not have a Kubernetes cluster set up, you can use Minikube for local development and testing. Here is a quick guide to set up Minikube:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Install Minikube<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Follow the installation guide for your operating system from the <a href=\"https:\/\/minikube.sigs.k8s.io\/docs\/start\/\">official Minikube documentation<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Start Minikube<\/h3>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">minikube start --cpus 4 --memory 8192<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">This command starts a Minikube cluster with 4 CPUs and 8GB of memory, which should be 
sufficient for running Apache Flink.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Verify Minikube<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Ensure Minikube is running correctly:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">kubectl get nodes<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">You should see a node named <code>minikube<\/code> in the <code>Ready<\/code> state.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Deploying Apache Flink on Kubernetes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Now that we have a Kubernetes cluster up and running, let&#8217;s deploy Apache Flink. 
We will use a Docker image of Apache Flink for this purpose.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Create a Namespace<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">First, create a namespace for Flink to keep the resources isolated:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">kubectl create namespace flink<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Flink Docker Image<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We will use the official Apache Flink Docker image. You can also build your own custom image if needed. The official Flink Docker images are available on Docker Hub.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Deploy Flink JobManager<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The JobManager is the central coordinator of a Flink cluster, responsible for scheduling tasks, managing checkpoints, and handling job lifecycle events. 
We will create a Deployment and a Service for the JobManager.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">JobManager Deployment<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Create a file named <code>jobmanager-deployment.yaml<\/code> with the following content:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">apiVersion:<\/span> <span class=\"hljs-string\">apps\/v1<\/span>\n<span class=\"hljs-attr\">kind:<\/span> <span class=\"hljs-string\">Deployment<\/span>\n<span class=\"hljs-attr\">metadata:<\/span>\n  <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-jobmanager<\/span>\n  <span class=\"hljs-attr\">namespace:<\/span> <span class=\"hljs-string\">flink<\/span>\n<span class=\"hljs-attr\">spec:<\/span>\n  <span class=\"hljs-attr\">replicas:<\/span> <span class=\"hljs-number\">1<\/span>\n  <span class=\"hljs-attr\">selector:<\/span>\n    <span class=\"hljs-attr\">matchLabels:<\/span>\n      <span class=\"hljs-attr\">app:<\/span> <span class=\"hljs-string\">flink<\/span>\n      <span class=\"hljs-attr\">component:<\/span> <span class=\"hljs-string\">jobmanager<\/span>\n  <span class=\"hljs-attr\">template:<\/span>\n    <span class=\"hljs-attr\">metadata:<\/span>\n      <span class=\"hljs-attr\">labels:<\/span>\n        <span class=\"hljs-attr\">app:<\/span> <span class=\"hljs-string\">flink<\/span>\n        <span class=\"hljs-attr\">component:<\/span> <span class=\"hljs-string\">jobmanager<\/span>\n    <span class=\"hljs-attr\">spec:<\/span>\n      <span class=\"hljs-attr\">containers:<\/span>\n        <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">jobmanager<\/span>\n          <span class=\"hljs-attr\">image:<\/span> <span class=\"hljs-string\">flink:latest<\/span>\n          <span class=\"hljs-attr\">args:<\/span> <span 
class=\"hljs-string\">&#91;\"jobmanager\"]<\/span>\n          <span class=\"hljs-attr\">ports:<\/span>\n            <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">containerPort:<\/span> <span class=\"hljs-number\">6123<\/span>\n              <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">rpc<\/span>\n            <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">containerPort:<\/span> <span class=\"hljs-number\">8081<\/span>\n              <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">web<\/span>\n          <span class=\"hljs-attr\">env:<\/span>\n            <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">JOB_MANAGER_RPC_ADDRESS<\/span>\n              <span class=\"hljs-attr\">value:<\/span> <span class=\"hljs-string\">flink-jobmanager<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h4 class=\"wp-block-heading\">JobManager Service<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Create a file named <code>jobmanager-service.yaml<\/code> with the following content:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">apiVersion:<\/span> <span class=\"hljs-string\">v1<\/span>\n<span class=\"hljs-attr\">kind:<\/span> <span class=\"hljs-string\">Service<\/span>\n<span class=\"hljs-attr\">metadata:<\/span>\n  <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-jobmanager<\/span>\n  <span class=\"hljs-attr\">namespace:<\/span> <span class=\"hljs-string\">flink<\/span>\n<span 
class=\"hljs-attr\">spec:<\/span>\n  <span class=\"hljs-attr\">ports:<\/span>\n    <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">port:<\/span> <span class=\"hljs-number\">6123<\/span>\n      <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">rpc<\/span>\n    <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">port:<\/span> <span class=\"hljs-number\">8081<\/span>\n      <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">web<\/span>\n  <span class=\"hljs-attr\">selector:<\/span>\n    <span class=\"hljs-attr\">app:<\/span> <span class=\"hljs-string\">flink<\/span>\n    <span class=\"hljs-attr\">component:<\/span> <span class=\"hljs-string\">jobmanager<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Deploy the JobManager resources:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">kubectl apply -f jobmanager-deployment.yaml\nkubectl apply -f jobmanager-service.yaml<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Deploy Flink TaskManagers<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">TaskManagers are the worker nodes in a Flink cluster. 
They execute the tasks assigned by the JobManager.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">TaskManager Deployment<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Create a file named <code>taskmanager-deployment.yaml<\/code> with the following content:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">apiVersion:<\/span> <span class=\"hljs-string\">apps\/v1<\/span>\n<span class=\"hljs-attr\">kind:<\/span> <span class=\"hljs-string\">Deployment<\/span>\n<span class=\"hljs-attr\">metadata:<\/span>\n  <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-taskmanager<\/span>\n  <span class=\"hljs-attr\">namespace:<\/span> <span class=\"hljs-string\">flink<\/span>\n<span class=\"hljs-attr\">spec:<\/span>\n  <span class=\"hljs-attr\">replicas:<\/span> <span class=\"hljs-number\">2<\/span>\n  <span class=\"hljs-attr\">selector:<\/span>\n    <span class=\"hljs-attr\">matchLabels:<\/span>\n      <span class=\"hljs-attr\">app:<\/span> <span class=\"hljs-string\">flink<\/span>\n      <span class=\"hljs-attr\">component:<\/span> <span class=\"hljs-string\">taskmanager<\/span>\n  <span class=\"hljs-attr\">template:<\/span>\n    <span class=\"hljs-attr\">metadata:<\/span>\n      <span class=\"hljs-attr\">labels:<\/span>\n        <span class=\"hljs-attr\">app:<\/span> <span class=\"hljs-string\">flink<\/span>\n        <span class=\"hljs-attr\">component:<\/span> <span class=\"hljs-string\">taskmanager<\/span>\n    <span class=\"hljs-attr\">spec:<\/span>\n      <span class=\"hljs-attr\">containers:<\/span>\n        <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">taskmanager<\/span>\n          <span class=\"hljs-attr\">image:<\/span> <span class=\"hljs-string\">flink:latest<\/span>\n          <span class=\"hljs-attr\">args:<\/span> <span 
class=\"hljs-string\">&#91;\"taskmanager\"]<\/span>\n          <span class=\"hljs-attr\">ports:<\/span>\n            <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">containerPort:<\/span> <span class=\"hljs-number\">6121<\/span>\n              <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">data<\/span>\n            <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">containerPort:<\/span> <span class=\"hljs-number\">6122<\/span>\n              <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">rpc<\/span>\n          <span class=\"hljs-attr\">env:<\/span>\n            <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">JOB_MANAGER_RPC_ADDRESS<\/span>\n              <span class=\"hljs-attr\">value:<\/span> <span class=\"hljs-string\">flink-jobmanager<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Deploy the TaskManager resources:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">kubectl apply -f taskmanager-deployment.yaml<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Verify the Deployment<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Check the status of 
the deployments:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">kubectl get deployments -n flink<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">You should see both the <code>flink-jobmanager<\/code> and <code>flink-taskmanager<\/code> deployments in the <code>Available<\/code> state.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Check the status of the pods:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">kubectl get pods -n flink<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">You should see one <code>flink-jobmanager<\/code> pod and two <code>flink-taskmanager<\/code> pods running.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Accessing Flink Web Dashboard<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To access the Flink web dashboard, you need to expose the JobManager service. 
You can use a port-forward for local development:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-11\" data-shcb-language-name=\"Shell Session\" data-shcb-language-slug=\"shell\"><span><code class=\"hljs language-shell\">kubectl port-forward svc\/flink-jobmanager 8081:8081 -n flink<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-11\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Shell Session<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">shell<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Now, you can access the Flink web dashboard at <code>http:\/\/localhost:8081<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Running a Flink Job on Kubernetes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">With the Flink cluster up and running, let&#8217;s run a Flink job. For demonstration purposes, we will use a sample Flink job that performs word count on a stream of text data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Sample Flink Job<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Save the following Flink job code in a file named <code>WordCount.java<\/code>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-12\" data-shcb-language-name=\"Java\" data-shcb-language-slug=\"java\"><span><code class=\"hljs language-java\"><span class=\"hljs-keyword\">import<\/span> org.apache.flink.api.common.functions.FlatMapFunction;\n<span class=\"hljs-keyword\">import<\/span> org.apache.flink.api.java.tuple.Tuple2;\n<span class=\"hljs-keyword\">import<\/span> org.apache.flink.streaming.api.datastream.DataStream;\n<span class=\"hljs-keyword\">import<\/span> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;\n<span class=\"hljs-keyword\">import<\/span> org.apache.flink.util.Collector;\n\n<span class=\"hljs-keyword\">public<\/span> <span 
class=\"hljs-class\"><span class=\"hljs-keyword\">class<\/span> <span class=\"hljs-title\">WordCount<\/span> <\/span>{\n    <span class=\"hljs-function\"><span class=\"hljs-keyword\">public<\/span> <span class=\"hljs-keyword\">static<\/span> <span class=\"hljs-keyword\">void<\/span> <span class=\"hljs-title\">main<\/span><span class=\"hljs-params\">(String&#91;] args)<\/span> <span class=\"hljs-keyword\">throws<\/span> Exception <\/span>{\n        <span class=\"hljs-comment\">\/\/ set up the execution environment<\/span>\n        <span class=\"hljs-keyword\">final<\/span> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();\n\n        <span class=\"hljs-comment\">\/\/ create a DataStream from socket text stream<\/span>\n        DataStream&lt;String&gt; text = env.socketTextStream(<span class=\"hljs-string\">\"localhost\"<\/span>, <span class=\"hljs-number\">9999<\/span>);\n\n        <span class=\"hljs-comment\">\/\/ split lines into words, key by word, and keep a running count per word (no windowing)<\/span>\n        DataStream&lt;Tuple2&lt;String, Integer&gt;&gt; counts = text\n                .flatMap(<span class=\"hljs-keyword\">new<\/span> Tokenizer())\n                .keyBy(value -&gt; value.f0)\n                .sum(<span class=\"hljs-number\">1<\/span>);\n\n        <span class=\"hljs-comment\">\/\/ print the results to the TaskManager's standard output<\/span>\n        counts.print();\n\n        <span class=\"hljs-comment\">\/\/ execute program<\/span>\n        env.execute(<span class=\"hljs-string\">\"Streaming WordCount\"<\/span>);\n    }\n\n    <span class=\"hljs-comment\">\/\/ Tokenizer function to split text into words<\/span>\n    <span class=\"hljs-keyword\">public<\/span> <span class=\"hljs-keyword\">static<\/span> <span class=\"hljs-keyword\">final<\/span> <span class=\"hljs-class\"><span class=\"hljs-keyword\">class<\/span> <span class=\"hljs-title\">Tokenizer<\/span> <span class=\"hljs-keyword\">implements<\/span> <span class=\"hljs-title\">FlatMapFunction<\/span>&lt;<span 
class=\"hljs-title\">String<\/span>, <span class=\"hljs-title\">Tuple2<\/span>&lt;<span class=\"hljs-title\">String<\/span>, <span class=\"hljs-title\">Integer<\/span>&gt;&gt; <\/span>{\n        <span class=\"hljs-meta\">@Override<\/span>\n        <span class=\"hljs-function\"><span class=\"hljs-keyword\">public<\/span> <span class=\"hljs-keyword\">void<\/span> <span class=\"hljs-title\">flatMap<\/span><span class=\"hljs-params\">(String value, Collector&lt;Tuple2&lt;String, Integer&gt;&gt; out)<\/span> <\/span>{\n            <span class=\"hljs-keyword\">for<\/span> (String word : value.split(<span class=\"hljs-string\">\"\\\\s\"<\/span>)) {\n                <span class=\"hljs-keyword\">if<\/span> (word.length() &gt; <span class=\"hljs-number\">0<\/span>) {\n                    out.collect(<span class=\"hljs-keyword\">new<\/span> Tuple2&lt;&gt;(word, <span class=\"hljs-number\">1<\/span>));\n                }\n            }\n        }\n    }\n}<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-12\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Java<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">java<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Build and Package the Flink Job<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Compile and package the Flink job into a JAR file using Maven or any other build tool. 
Ensure you have Maven installed, then run the following command in the directory containing the <code>pom.xml<\/code> file:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-13\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">mvn clean package<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-13\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">This command will generate a JAR file in the <code>target<\/code> directory.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Submit the Flink Job<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Submit the Flink job to the Flink cluster running on Kubernetes using the Flink web dashboard or the Flink CLI.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Using Flink Web Dashboard<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open the Flink web dashboard at <code>http:\/\/localhost:8081<\/code>.<\/li>\n\n\n\n<li>Navigate to the &#8220;Submit new job&#8221; page.<\/li>\n\n\n\n<li>Upload the JAR file and submit the job.<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading\">Using Flink CLI<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">You can also use the Flink CLI to submit the job. 
Because <code>kubectl cp<\/code> needs the actual pod name rather than the Deployment name, first look up the JobManager pod and then copy the JAR file to it:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-14\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">JM_POD=$(kubectl get pods -n flink -l app=flink,component=jobmanager -o jsonpath=<span class=\"hljs-string\">'{.items&#91;0].metadata.name}'<\/span>)\nkubectl cp target\/your-flink-job.jar flink\/${JM_POD}:\/your-flink-job.jar<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-14\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Then, open a shell in the JobManager pod and submit the job with the Flink CLI (the second command runs inside that shell):<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-15\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl <span class=\"hljs-built_in\">exec<\/span> -it $(kubectl get pods -n flink -l app=flink,component=jobmanager -o jsonpath=<span class=\"hljs-string\">'{.items&#91;0].metadata.name}'<\/span>) -n flink -- \/bin\/sh\nflink run \/your-flink-job.jar<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-15\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Replace <code>\/your-flink-job.jar<\/code> with the path to your JAR file in the JobManager pod.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Verifying the Job<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Once the job is submitted, you can verify its execution through the Flink web dashboard. 
You should see the job listed under &#8220;Running Jobs&#8221; with details about its execution.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Sending Data to the Flink Job<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The sample Flink job reads a text stream from a socket at <code>localhost:9999<\/code>. Because the job runs inside a TaskManager pod, <code>localhost<\/code> refers to that pod, so the socket server must be reachable from there (otherwise, change the hostname in the job to a host the pod can reach). To send data to the Flink job, you can use <code>nc<\/code> (netcat) to create a socket server:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-16\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">nc -lk 9999<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-16\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Now, type some text into the terminal. The job writes the word counts with <code>print()<\/code>, so they appear in the standard output of the TaskManager pods, which you can read with <code>kubectl logs<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Scaling Flink Cluster on Kubernetes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">One of the key benefits of running Apache Flink on Kubernetes is the ease of scaling the cluster. 
You can scale the number of TaskManager replicas to increase the processing capacity.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scaling TaskManagers<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To scale the TaskManagers, update the <code>replicas<\/code> field in the <code>taskmanager-deployment.yaml<\/code> file and apply the changes:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-17\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">spec:<\/span>\n  <span class=\"hljs-attr\">replicas:<\/span> <span class=\"hljs-number\">4<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-17\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Apply the changes:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-18\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl apply -f taskmanager-deployment.yaml<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-18\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Verify the scaling operation:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-19\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl get pods -n flink<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-19\"><span class=\"shcb-language__label\">Code 
language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">You should see four <code>flink-taskmanager<\/code> pods running.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Auto-scaling with Kubernetes<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Kubernetes supports Horizontal Pod Autoscaling (HPA) based on CPU utilization or custom metrics. To enable auto-scaling for Flink TaskManagers, you need to set up metrics and create an HPA resource.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Metrics Server<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">First, install the Kubernetes Metrics Server:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-20\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl apply -f https:\/\/github.com\/kubernetes-sigs\/metrics-server\/releases\/latest\/download\/components.yaml<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-20\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Verify the Metrics Server:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-21\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl get deployment metrics-server -n kube-system<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-21\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span 
class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h4 class=\"wp-block-heading\">Create HPA for TaskManagers<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Create a file named <code>taskmanager-hpa.yaml<\/code> with the following content:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-22\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">apiVersion:<\/span> <span class=\"hljs-string\">autoscaling\/v1<\/span>\n<span class=\"hljs-attr\">kind:<\/span> <span class=\"hljs-string\">HorizontalPodAutoscaler<\/span>\n<span class=\"hljs-attr\">metadata:<\/span>\n  <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-taskmanager-hpa<\/span>\n  <span class=\"hljs-attr\">namespace:<\/span> <span class=\"hljs-string\">flink<\/span>\n<span class=\"hljs-attr\">spec:<\/span>\n  <span class=\"hljs-attr\">scaleTargetRef:<\/span>\n    <span class=\"hljs-attr\">apiVersion:<\/span> <span class=\"hljs-string\">apps\/v1<\/span>\n    <span class=\"hljs-attr\">kind:<\/span> <span class=\"hljs-string\">Deployment<\/span>\n    <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-taskmanager<\/span>\n  <span class=\"hljs-attr\">minReplicas:<\/span> <span class=\"hljs-number\">2<\/span>\n  <span class=\"hljs-attr\">maxReplicas:<\/span> <span class=\"hljs-number\">10<\/span>\n  <span class=\"hljs-attr\">targetCPUUtilizationPercentage:<\/span> <span class=\"hljs-number\">50<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-22\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Apply the HPA resource:<\/p>\n\n\n<pre 
class=\"wp-block-code\" aria-describedby=\"shcb-language-23\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl apply -f taskmanager-hpa.yaml<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-23\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Verify the HPA:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-24\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl get hpa -n flink<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-24\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">The HPA keeps between 2 and 10 TaskManager replicas and adds or removes pods to hold average CPU utilization near 50% of each pod's CPU request, so the TaskManager containers must declare <code>resources.requests.cpu<\/code>. A job that is already running does not use the extra task slots on its own; it has to be rescaled, for example by restarting it from a savepoint with a higher parallelism.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Advanced Configuration and Monitoring<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Configuring Flink Properties<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You can configure Flink properties using ConfigMaps in Kubernetes. 
Create a ConfigMap with the desired Flink configuration and mount it as a volume in the JobManager and TaskManager pods.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Create a file named <code>flink-configmap.yaml<\/code> with the following content, replacing <code>your-bucket<\/code> with your own bucket (the <code>s3:\/\/<\/code> paths require one of Flink's S3 filesystem plugins to be enabled in the image):<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-25\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">apiVersion:<\/span> <span class=\"hljs-string\">v1<\/span>\n<span class=\"hljs-attr\">kind:<\/span> <span class=\"hljs-string\">ConfigMap<\/span>\n<span class=\"hljs-attr\">metadata:<\/span>\n  <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-config<\/span>\n  <span class=\"hljs-attr\">namespace:<\/span> <span class=\"hljs-string\">flink<\/span>\n<span class=\"hljs-attr\">data:<\/span>\n  <span class=\"hljs-attr\">flink-conf.yaml:<\/span> <span class=\"hljs-string\">|\n    jobmanager.rpc.address: flink-jobmanager\n    taskmanager.numberOfTaskSlots: 2\n    state.backend: filesystem\n    state.checkpoints.dir: s3:\/\/your-bucket\/checkpoints\/\n    state.savepoints.dir: s3:\/\/your-bucket\/savepoints\/<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-25\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Apply the ConfigMap:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-26\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl apply -f flink-configmap.yaml<\/code><\/span><small 
class=\"shcb-language\" id=\"shcb-language-26\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Update the pod template of the JobManager and TaskManager deployments to mount the ConfigMap. The JobManager is shown below; the TaskManager change is the same apart from the container name. Because the volume replaces the whole <code>\/opt\/flink\/conf<\/code> directory, add any logging configuration files you rely on to the ConfigMap as well:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-27\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">spec:<\/span>\n  <span class=\"hljs-attr\">containers:<\/span>\n    <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">jobmanager<\/span>\n      <span class=\"hljs-attr\">image:<\/span> <span class=\"hljs-string\">flink:latest<\/span>\n      <span class=\"hljs-attr\">volumeMounts:<\/span>\n        <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-config-volume<\/span>\n          <span class=\"hljs-attr\">mountPath:<\/span> <span class=\"hljs-string\">\/opt\/flink\/conf<\/span>\n  <span class=\"hljs-attr\">volumes:<\/span>\n    <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-config-volume<\/span>\n      <span class=\"hljs-attr\">configMap:<\/span>\n        <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">flink-config<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-27\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Apply the updated deployments:<\/p>\n\n\n<pre 
class=\"wp-block-code\" aria-describedby=\"shcb-language-28\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">kubectl apply -f jobmanager-deployment.yaml\nkubectl apply -f taskmanager-deployment.yaml<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-28\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Monitoring Flink with Prometheus and Grafana<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Monitoring is essential for managing a Flink cluster in production. You can use Prometheus and Grafana for monitoring Flink metrics.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Deploy Prometheus and Grafana<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">You can deploy Prometheus and Grafana using Helm charts:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-29\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">helm repo add prometheus-community https:\/\/prometheus-community.github.io\/helm-charts\nhelm repo add grafana https:\/\/grafana.github.io\/helm-charts\nhelm repo update\n\n<span class=\"hljs-comment\"># Deploy Prometheus<\/span>\nhelm install prometheus prometheus-community\/prometheus\n\n<span class=\"hljs-comment\"># Deploy Grafana<\/span>\nhelm install grafana grafana\/grafana<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-29\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h4 class=\"wp-block-heading\">Configure Flink for 
Prometheus<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Add the following lines to the existing <code>flink-conf.yaml<\/code> entry of the ConfigMap, keeping the earlier settings, to enable the Prometheus reporter. Current Flink documentation configures reporters through the <code>factory.class<\/code> key:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-30\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">data:<\/span>\n  <span class=\"hljs-attr\">flink-conf.yaml:<\/span> <span class=\"hljs-string\">|\n    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory\n    metrics.reporter.prom.port: 9250<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-30\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Apply the ConfigMap and restart the JobManager and TaskManager pods so that they load the new configuration.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Create Prometheus Scrape Config<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Add a scrape configuration for Flink in Prometheus:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-31\" data-shcb-language-name=\"YAML\" data-shcb-language-slug=\"yaml\"><span><code class=\"hljs language-yaml\"><span class=\"hljs-attr\">scrape_configs:<\/span>\n  <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">job_name:<\/span> <span class=\"hljs-string\">'flink'<\/span>\n    <span class=\"hljs-attr\">static_configs:<\/span>\n      <span class=\"hljs-bullet\">-<\/span> <span class=\"hljs-attr\">targets:<\/span> <span class=\"hljs-string\">&#91;'&lt;jobmanager-pod-ip&gt;:9250',<\/span> <span class=\"hljs-string\">'&lt;taskmanager-pod-ip&gt;:9250'<\/span><span class=\"hljs-string\">]<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-31\"><span class=\"shcb-language__label\">Code language:<\/span> <span 
class=\"shcb-language__name\">YAML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">yaml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\">Replace <code>&lt;jobmanager-pod-ip&gt;<\/code> and <code>&lt;taskmanager-pod-ip&gt;<\/code> with the actual IP addresses of the JobManager and TaskManager pods. Pod IPs change whenever a pod is rescheduled, and every TaskManager replica needs its own target, so treat static targets as a quick test; Prometheus's Kubernetes service discovery (<code>kubernetes_sd_configs<\/code>) is better suited to a cluster that scales.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Import Grafana Dashboard<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Add Prometheus as a data source in Grafana, then import a Flink dashboard to visualize the metrics. You can find pre-built dashboards on the Grafana website or create your own.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In this tutorial, we walked through deploying Apache Flink on Kubernetes for real-time data processing: setting up a Kubernetes cluster, deploying the Flink components, running a Flink job, scaling the cluster, configuring Flink properties, and monitoring the cluster with Prometheus and Grafana. With these pieces in place, you can use Kubernetes to manage and scale your Apache Flink applications.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Apache Flink is a powerful stream processing framework used for processing large volumes of data in real-time. Kubernetes, on the other hand, is a container orchestration platform that automates the deployment, scaling, and management of containerized applications. Combining Apache Flink with Kubernetes provides a robust solution for managing and scaling real-time data processing applications. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[21],"tags":[],"class_list":["post-2039","post","type-post","status-publish","format-standard","category-containers","entry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Use Kubernetes with Apache Flink for Real-Time Data Processing<\/title>\n<meta name=\"description\" content=\"Apache Flink is a powerful stream processing framework used for processing large volumes of data in real-time.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Use Kubernetes with Apache Flink for Real-Time Data Processing\" \/>\n<meta property=\"og:description\" content=\"Apache Flink is a powerful stream processing framework used for processing large volumes of data in real-time.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-07-03T20:15:11+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-07-03T20:15:16+00:00\" \/>\n<meta name=\"author\" content=\"w3compadmin\" \/>\n<meta name=\"twitter:card\" 
content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"w3compadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/use-kubernetes-with-apache-flink-for-real-time-data-processing\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/use-kubernetes-with-apache-flink-for-real-time-data-processing\\\/\"},\"author\":{\"name\":\"w3compadmin\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"headline\":\"Use Kubernetes with Apache Flink for Real-Time Data Processing\",\"datePublished\":\"2024-07-03T20:15:11+00:00\",\"dateModified\":\"2024-07-03T20:15:16+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/use-kubernetes-with-apache-flink-for-real-time-data-processing\\\/\"},\"wordCount\":1154,\"articleSection\":[\"Containers\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/use-kubernetes-with-apache-flink-for-real-time-data-processing\\\/\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/use-kubernetes-with-apache-flink-for-real-time-data-processing\\\/\",\"name\":\"Use Kubernetes with Apache Flink for Real-Time Data Processing\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\"},\"datePublished\":\"2024-07-03T20:15:11+00:00\",\"dateModified\":\"2024-07-03T20:15:16+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"description\":\"Apache Flink is a powerful stream 
processing framework used for processing large volumes of data in real-time.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/use-kubernetes-with-apache-flink-for-real-time-data-processing\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/use-kubernetes-with-apache-flink-for-real-time-data-processing\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/use-kubernetes-with-apache-flink-for-real-time-data-processing\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Articles Home\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Containers\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/containers\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Use Kubernetes with Apache Flink for Real-Time Data Processing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\",\"name\":\"Developer Articles Hub\",\"description\":\"\",\"alternateName\":\"Developer 
Articles\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\",\"name\":\"w3compadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1779536135\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1779536135\",\"contentUrl\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1779536135\",\"caption\":\"w3compadmin\"},\"sameAs\":[\"http:\\\/\\\/w3computing.com\\\/articles\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Use Kubernetes with Apache Flink for Real-Time Data Processing","description":"Apache Flink is a powerful stream processing framework used for processing large volumes of data in real-time.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/","og_locale":"en_US","og_type":"article","og_title":"Use Kubernetes with Apache Flink for Real-Time Data Processing","og_description":"Apache Flink is a powerful stream processing framework used for processing large volumes of data in real-time.","og_url":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/","article_published_time":"2024-07-03T20:15:11+00:00","article_modified_time":"2024-07-03T20:15:16+00:00","author":"w3compadmin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"w3compadmin","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/#article","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/"},"author":{"name":"w3compadmin","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"headline":"Use Kubernetes with Apache Flink for Real-Time Data Processing","datePublished":"2024-07-03T20:15:11+00:00","dateModified":"2024-07-03T20:15:16+00:00","mainEntityOfPage":{"@id":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/"},"wordCount":1154,"articleSection":["Containers"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/","url":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/","name":"Use Kubernetes with Apache Flink for Real-Time Data Processing","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/#website"},"datePublished":"2024-07-03T20:15:11+00:00","dateModified":"2024-07-03T20:15:16+00:00","author":{"@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"description":"Apache Flink is a powerful stream processing framework used for processing large volumes of data in 
real-time.","breadcrumb":{"@id":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.w3computing.com\/articles\/use-kubernetes-with-apache-flink-for-real-time-data-processing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Articles Home","item":"https:\/\/www.w3computing.com\/articles\/"},{"@type":"ListItem","position":2,"name":"Containers","item":"https:\/\/www.w3computing.com\/articles\/containers\/"},{"@type":"ListItem","position":3,"name":"Use Kubernetes with Apache Flink for Real-Time Data Processing"}]},{"@type":"WebSite","@id":"https:\/\/www.w3computing.com\/articles\/#website","url":"https:\/\/www.w3computing.com\/articles\/","name":"Developer Articles Hub","description":"","alternateName":"Developer 
Articles","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.w3computing.com\/articles\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561","name":"w3compadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1779536135","url":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1779536135","contentUrl":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1779536135","caption":"w3compadmin"},"sameAs":["http:\/\/w3computing.com\/articles"]}]}},"featured_image_src":null,"featured_image_src_square":null,"author_info":{"display_name":"w3compadmin","author_link":"https:\/\/www.w3computing.com\/articles\/author\/w3compadmin\/"},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2039","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/comments?post=2039"}],"version-history":[{"count":4,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2039\/revisions"}],"predecessor-version":[{"id":2043,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2039\/revisions\/2
043"}],"wp:attachment":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/media?parent=2039"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/categories?post=2039"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/tags?post=2039"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}