August 2018 – Vaibhav Sharma

Kafka – Spark Streaming Integration

Spark Streaming is a distributed stream processing engine that can ingest data from various sources. One of the most popular sources is Apache Kafka, a distributed streaming platform that provides the publish and subscribe features of an enterprise messaging system while also supporting data stream processing. In this blog we will create a real-time streaming pipeline for ingesting credit card data and finding Merchants […]

Posted: 07/08/2018
Under: Big Data, Kafka, Spark


Setup Standalone Apache Kafka Instance

Apache Kafka is a distributed streaming platform that provides the publish and subscribe features of an enterprise messaging system while also supporting data stream processing. In this blog we will set up a standalone Kafka topic on a local machine running the Windows operating system. Treat this setup as a Hello World application; it is not meant for production use. Software versions used in […]

Posted: 05/08/2018
Under: Big Data, Kafka


Apache Hive with MongoDB Integration

Apache Hive is a tool from the Apache Hadoop ecosystem that converts SQL-like queries into Hadoop jobs for data summarization, querying, and analysis. In this blog post we will see how data stored in MongoDB can be imported into a Hive table. The data from the Hive table is then processed and the result is stored in another Hive table. We will use a 1 minute […]

Posted: 04/08/2018
Under: Big Data, Hadoop, MongoDB

