Apache Hive with MongoDB Integration

Apache Hive is a tool from the Apache Hadoop ecosystem that converts SQL-like queries into Hadoop jobs for data summarization, querying, and analysis. In this blog post we will see how data stored in MongoDB can be imported into a Hive table. The data from the Hive table is then processed and the result is stored in another Hive table. We will use a 1 minute […]

Spark Streaming with MongoDB

Spark Streaming enables real-time processing of data streams. In this blog post we will see how a data stream arriving at Spark over a TCP socket can be processed and the result saved into MongoDB. You can adapt this example to your own applications that use MongoDB as the data sink after processing by Spark. We will use the word count example […]

Deep Learning at Scale References

Check out the following projects for deep learning at scale.

TensorFlowOnSpark: Developed by Yahoo, TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features from the deep learning framework TensorFlow and the big-data frameworks Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. https://github.com/yahoo/TensorFlowOnSpark

BigDL: Distributed Deep Learning on Apache Spark. Another distributed deep learning library to directly […]

Spark MLlib Data Types

Spark MLlib has special data types because in machine learning we normally work with data in the form of vectors and matrices. Note: all sample code follows Scala syntax.

Overview

Local Vector: A local vector has integer-typed, 0-based indices and double-typed values, stored on a single machine. MLlib supports two types of local vectors: dense and sparse.

Dense Vector: A dense vector has […]
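To make the dense/sparse distinction concrete, here is a minimal sketch in plain Python rather than the MLlib API. The class names `DenseLocalVector` and `SparseLocalVector` are hypothetical and exist only for illustration. The sparse form keeps the vector size plus parallel lists of indices and values, which is the layout MLlib's sparse vectors are built from.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DenseLocalVector:
    """Hypothetical illustration: stores every entry, including zeros."""
    values: List[float]

    def to_list(self) -> List[float]:
        return list(self.values)


@dataclass
class SparseLocalVector:
    """Hypothetical illustration: stores only the non-zero entries."""
    size: int             # total length of the vector
    indices: List[int]    # 0-based positions of the stored entries
    values: List[float]   # entries found at those positions

    def to_list(self) -> List[float]:
        # Start from all zeros and fill in the stored entries.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


# The same vector (1.0, 0.0, 3.0) in both representations.
dense = DenseLocalVector([1.0, 0.0, 3.0])
sparse = SparseLocalVector(size=3, indices=[0, 2], values=[1.0, 3.0])
```

Both objects expand to the same list of values. The sparse form pays off when most entries are zero, since only the non-zero positions are stored.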

Artificial Neural Network

So far we have seen how basic calculations work in TensorFlow. The computational graph we have built resembles the biological neural network of the human brain, which is why it is commonly known as an Artificial Neural Network, or simply a Neural Network. The neurons in a neural network are organized across three types of layers: Input Layer: This layer is used to feed the input data […]

TensorFlow Introduction

TensorFlow is an open source library by Google for deep learning. A Tensor is a multi-dimensional data node with the following three parts: name, shape, and data type. For example:

    Tensor("Const:0", shape=(), dtype=string)

TensorFlow Hello World example: First use the following command to install TensorFlow on Windows:

    pip3 install --upgrade tensorflow

    import tensorflow as tf
    hello = tf.constant('Hello World')
    print(hello)

If you execute the above program, the text will not […]

Gridsearch

Grid search is a systematic way to tune hyper-parameters: it evaluates a model for every combination of candidate values and keeps the best one. Hyper-parameters are parameters that are not directly learnt within estimators. We will compare SVM models for different C and gamma values using grid search. Refer to the blog post on SVM if you want to learn more about Support Vector Machines. We will dive directly into a practical example of a breast cancer classifier and optimize the classifier using […]
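The core loop of a grid search can be sketched in plain Python without any machine learning library. Everything here is an assumption made for illustration: `grid_search` is a hypothetical helper, and `toy_score` is a stand-in for a real cross-validated accuracy, hand-built so that its maximum lies at C=10 and gamma=0.01.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    # Hypothetical stand-in for model accuracy; peaks at C=10, gamma=0.01.
    return -abs(C - 10) - abs(gamma - 0.01)


# Candidate values for the two SVM hyper-parameters named in the text.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
best_params, best_score = grid_search(toy_score, param_grid)
```

In a real experiment the scoring function would train and validate a classifier for each combination, so the cost grows with the product of the grid sizes (here 4 × 3 = 12 evaluations).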

Support Vector Machines

Support Vector Machines are supervised learning models that can be used for classification as well as regression problems. For classification, each data item is represented as a point in space, and classification is achieved by dividing the points with a hyperplane chosen so that it has the maximum distance from the two classes being separated. The margin of separation is with respect to the vector points […]
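The geometry described above can be illustrated with a small Python sketch. It does not train an SVM: the hyperplane `w`, `b` and the two sample points are hand-picked assumptions, and the helper names are hypothetical. It only shows how a point is assigned to a side of a hyperplane and how its distance to that hyperplane is measured.

```python
import math


def decision_value(w, b, x):
    """Signed value of w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm


# Hypothetical hyperplane x1 + x2 = 3 and one point from each class.
w, b = [1.0, 1.0], -3.0
negative_point = [1.0, 1.0]
positive_point = [2.0, 2.0]

labels = [classify(w, b, negative_point), classify(w, b, positive_point)]
distances = [
    distance_to_hyperplane(w, b, negative_point),
    distance_to_hyperplane(w, b, positive_point),
]
```

Here both points lie at the same distance from the hyperplane, which is the situation a maximum-margin separator aims for with the closest points of each class.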

Basic Constructs of Scala

Type Inference: The first thing to understand about Scala is that it is a statically typed language with type inference, which means that the data types of values and variables are deduced by the compiler rather than having to be declared everywhere.

    scala> val a = 10.0
    a: Double = 10.0

Although types are inferred, you can still specify the data type explicitly.

    scala> val b: Double = 10.0
    b: Double = 10.0

[…]

Understanding Functional Programming with Scala

Introduction: Functional languages are those that treat functions as first-class citizens. You can check whether functions have first-class status by asking the following questions:
1. Can you assign a function to a value or a variable?
2. Can you return a function from another function?
3. Can you pass a function as a parameter to another function?
Scala supports both Object […]
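The three checks listed above can be tried out in a few lines. This sketch uses Python rather than Scala, purely for illustration, and the function names are hypothetical. Each numbered comment corresponds to one of the checks.

```python
def square(x):
    return x * x


# 1. Assign a function to a variable.
f = square


# 2. Return a function from another function.
def make_adder(n):
    def add(x):
        return x + n
    return add


# 3. Pass a function as a parameter to another function.
def apply_twice(fn, x):
    return fn(fn(x))


add_three = make_adder(3)
result = apply_twice(add_three, 10)  # adds 3 twice, starting from 10
```

A language passes all three checks when each of these steps works without special wrappers, which is what first-class status means in practice.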
