What is Apache Flink?
Apache Flink is an open-source stream processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming dataflow engine written in Java and Scala. Apache Flink’s dataflow programming model provides event-at-a-time processing on both finite and infinite datasets. At a basic level, Flink programs consist of streams and transformations. Conceptually, a stream is a (potentially never-ending) flow of data records, and a transformation is an operation that takes one or more streams as input and produces one or more output streams as a result. Programs can be written in Java, Scala, Python, and SQL, and are automatically compiled and optimized into dataflow programs that are executed in a cluster or cloud environment.
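The streams-and-transformations idea can be illustrated with a small conceptual sketch in plain Python (standard library only). This is not Flink's API: it models a stream as an iterator and a single-input transformation as a function from one stream to another, which is enough to show event-at-a-time processing over both a finite and an infinite source. The names `bounded_source`, `unbounded_source`, `map_stream`, and `filter_stream` are hypothetical helpers introduced for illustration; in real Flink programs the comparable operators are methods on a `DataStream`, such as `map` and `filter`.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


# Hypothetical model (not Flink's API): a stream is an iterator whose records
# are pulled one at a time; it may be finite (bounded) or never-ending (unbounded).
def bounded_source(records: Iterable[T]) -> Iterator[T]:
    """Finite stream: emits each record of a fixed collection, then stops."""
    yield from records


def unbounded_source() -> Iterator[int]:
    """Infinite stream: emits 0, 1, 2, ... forever."""
    yield from count()


# A transformation takes a stream as input and produces a new stream.
# Each record is handled as soon as it is pulled (event-at-a-time),
# without waiting for the input to end.
def map_stream(fn: Callable[[T], U], stream: Iterable[T]) -> Iterator[U]:
    for record in stream:
        yield fn(record)


def filter_stream(pred: Callable[[T], bool], stream: Iterable[T]) -> Iterator[T]:
    for record in stream:
        if pred(record):
            yield record


# Finite input: keep words starting with "s", then upper-case them.
words = bounded_source(["flink", "stream", "batch", "state"])
upper = list(map_stream(str.upper, filter_stream(lambda w: w.startswith("s"), words)))
print(upper)  # ['STREAM', 'STATE']

# Infinite input: square the even numbers; islice stops after five results,
# so the unbounded source is only consumed as far as needed.
squares_of_evens = map_stream(lambda n: n * n,
                              filter_stream(lambda n: n % 2 == 0, unbounded_source()))
first_five = list(islice(squares_of_evens, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```

Because each transformation pulls records lazily, the pipeline over the infinite source never materializes its whole input. A real Flink job, by contrast, distributes such operators across parallel tasks in a cluster rather than running them in a single Python process.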
Why Apache Flink?
Flink provides a high-throughput, low-latency streaming engine with support for event-time processing and state management. Flink applications are fault-tolerant in the event of machine failure and, through periodic checkpointing, support exactly-once semantics for application state. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, and its pipelined runtime can execute both batch and stream processing programs. Furthermore, Flink’s runtime supports the execution of iterative algorithms natively.
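To make event-time processing and state management more concrete, here is a simplified single-process model in plain Python, again not Flink's API. It sums values per key in tumbling event-time windows, keeps the per-key, per-window sums as state, and uses a watermark that trails the largest timestamp seen by a fixed bound (in the spirit of Flink's bounded-out-of-orderness watermark strategy) to decide when a window is complete. The function name `tumbling_event_time_sum`, the `(key, event_time, value)` tuple layout, and the rule of dropping events whose window has already fired are assumptions made for illustration; Flink's actual windowing offers more options, such as allowed lateness and custom triggers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int, int]         # (key, event_time, value)
WindowResult = Tuple[str, int, int]  # (key, window_start, sum of values)


def tumbling_event_time_sum(
    events: Iterable[Event], window_size: int, max_out_of_orderness: int
) -> Iterator[WindowResult]:
    # Keyed state: running sum per (key, window_start), kept until the window fires.
    state: Dict[Tuple[str, int], int] = defaultdict(int)
    # Watermark: an estimate of how far event time has progressed.
    watermark: float = float("-inf")

    for key, ts, value in events:
        # Assign by the event's own timestamp, not by arrival order.
        window_start = ts - ts % window_size
        if window_start + window_size <= watermark:
            continue  # late: this window has already fired, so drop the event
        state[(key, window_start)] += value

        # Simplified bounded-out-of-orderness watermark: trails the largest
        # timestamp seen so far by a fixed bound.
        watermark = max(watermark, ts - max_out_of_orderness)

        # Fire (emit and clear) every window whose end the watermark has passed.
        ready = sorted(k for k in state if k[1] + window_size <= watermark)
        for k in ready:
            yield (k[0], k[1], state.pop(k))

    # Bounded input has ended: flush whatever windows remain.
    for k in sorted(state):
        yield (k[0], k[1], state.pop(k))


events = [
    ("a", 1, 1),
    ("b", 3, 10),
    ("a", 12, 2),
    ("a", 8, 4),    # out of order, still inside window [0, 10)
    ("b", 17, 1),   # watermark reaches 12, so both [0, 10) windows fire
    ("a", 2, 100),  # late: window [0, 10) has already fired
]
results = list(tumbling_event_time_sum(events, window_size=10, max_out_of_orderness=5))
print(results)  # [('a', 0, 5), ('b', 0, 10), ('a', 10, 2), ('b', 10, 1)]
```

In this run the event at time 8 arrives after the event at time 12 but is still counted in window [0, 10), because window assignment uses the event's own timestamp. The event at time 2 arrives after the watermark has passed 10, so its window has already been emitted and the event is dropped.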
Deploying Flink on Google Cloud and AWS
DataStream API
Batch Processing API
Connectors to various systems
Structured data handling using the Table API
Accessing the registered table
Complex event processing
Introduction to CEP and Flink CEP
Selecting from patterns
Flink Graph Library – Gelly
Iterative Graph Processing
Integration between Flink and Hadoop
Job Submission to Flink
Execution of a Flink job on YARN
Flink and YARN interaction details