Kafka – Spark Streaming Integration

Spark Streaming is a distributed stream-processing engine that can ingest data from many sources. One of the most popular sources is Apache Kafka, a distributed streaming platform that provides the publish/subscribe features of an enterprise messaging system while also supporting stream processing. In this blog post we will create a real-time streaming pipeline that ingests credit card data and finds merchants […]
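A minimal sketch of the ingestion side of such a pipeline follows. It assumes the spark-streaming-kafka-0-10 integration (`KafkaUtils.createDirectStream`), a broker at `localhost:9092`, a hypothetical topic named `transactions`, and a hypothetical CSV record format `cardNumber,merchant,amount`. The per-merchant count is only an illustration of working with the stream, not the analysis the full post builds.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaMerchantStream {

  // Hypothetical record format: "cardNumber,merchant,amount".
  // Returns the merchant field, or None for malformed lines.
  def merchantOf(csv: String): Option[String] = {
    val fields = csv.split(",")
    if (fields.length == 3 && fields(1).trim.nonEmpty) Some(fields(1).trim) else None
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaMerchantStream").setMaster("local[2]")
    // 10-second micro-batches
    val ssc = new StreamingContext(conf, Seconds(10))

    // Consumer settings; broker address, group id and topic are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "merchant-demo",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream: each Kafka partition maps to one Spark partition.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Array("transactions"), kafkaParams)
    )

    // Drop malformed records and count transactions per merchant in each batch.
    val perMerchant = stream
      .flatMap(record => merchantOf(record.value).toList)
      .map(merchant => (merchant, 1))
      .reduceByKey(_ + _)

    perMerchant.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The direct (receiver-less) stream is used here because it is the approach the 0-10 integration offers; offsets are not auto-committed, so a production job would still need its own offset handling.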

Spark Streaming with MongoDB

Spark Streaming enables real-time processing of data streams. In this blog post we will see how a data stream arriving at Spark over a TCP socket can be processed and the result saved into MongoDB. You can adapt this example to your own applications wherever MongoDB serves as the data sink after processing in Spark. We will use the word count example […]
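A minimal sketch of that flow follows, assuming the MongoDB Spark Connector 2.x/3.x RDD API (`com.mongodb.spark.MongoSpark.save`), which takes the target collection from the `spark.mongodb.output.uri` setting; the newer 10.x connectors dropped this RDD API. The URI `mongodb://127.0.0.1/test.wordcounts`, the port 9999 and the document fields `word`/`count` are placeholders, not necessarily what the post uses. A test stream can be fed with `nc -lk 9999`.

```scala
import com.mongodb.spark.MongoSpark
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.bson.Document

object SocketWordCountToMongo {

  // Split a line on runs of whitespace, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SocketWordCountToMongo")
      .setMaster("local[2]") // a socket receiver occupies one thread, so use at least 2
      // Placeholder output collection, picked up by MongoSpark.save(rdd).
      .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.wordcounts")

    val ssc = new StreamingContext(conf, Seconds(5))

    // Text lines arriving on a TCP socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Classic word count, computed per micro-batch.
    val counts = lines.flatMap(line => tokenize(line)).map(word => (word, 1)).reduceByKey(_ + _)

    // Convert each non-empty batch to BSON documents and write it to MongoDB.
    counts.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val docs = rdd.map { case (word, count) =>
          new Document("word", word).append("count", Int.box(count))
        }
        MongoSpark.save(docs)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each batch inserts new documents rather than updating running totals; keeping cumulative counts would need a stateful operation such as `updateStateByKey` or an upsert on the MongoDB side.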

Deep Learning at Scale References

Check out the following projects for deep learning at scale.

TensorFlowOnSpark: Developed by Yahoo, TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features of the deep learning framework TensorFlow with the big-data frameworks Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. https://github.com/yahoo/TensorFlowOnSpark

BigDL: Distributed Deep Learning on Apache Spark. Another distributed deep learning library to directly […]

Spark MLlib Data Types

Spark MLlib has special data types because machine learning code normally deals with vectors and matrices. Note: all sample code uses Scala syntax.

Overview

Local Vector: A local vector has integer-typed, 0-based indices and double-typed values, stored on a single machine. MLlib supports two types of local vectors: dense and sparse.

Dense Vector: A dense vector has […]
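A short sketch of the two local vector types follows, using the `org.apache.spark.mllib.linalg.Vectors` factory methods and assuming spark-mllib is on the classpath. It encodes the same vector (1.0, 0.0, 3.0) both ways: the dense form stores every value, while the sparse form stores only the size, the 0-based indices of the non-zero entries, and their values.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object LocalVectorsDemo {
  // Dense: every entry is stored, including the zero.
  val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)

  // Sparse: size 3, non-zero values at indices 0 and 2.
  val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

  // Sparse again, built from (index, value) pairs instead of two arrays.
  val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

  def main(args: Array[String]): Unit = {
    println(dv)
    println(sv1)
    // Both encodings expand to the same values.
    println(sv1.toArray.sameElements(dv.toArray))
  }
}
```

The sparse form pays off when most entries are zero, for example in high-dimensional feature vectors, since storage grows with the number of non-zeros rather than with the vector's size.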
