Kafka – Spark Streaming Integration

Spark Streaming is a distributed stream processing engine that can ingest data from various sources. One of the most popular sources is Apache Kafka, a distributed streaming platform that provides the publish and subscribe features of an enterprise messaging system while also supporting stream processing. In this blog we will create a real-time streaming pipeline for ingesting credit card data and finding Merchants […]

Spark Streaming with MongoDB

Spark Streaming enables real-time processing of data streams. In this blog post we will see how a data stream arriving at Spark over a TCP socket can be processed and the result saved into MongoDB. You can adapt this example to your own applications that use MongoDB as the data sink after processing by Spark. We will use the word count example […]
