Jul 15

Logistic Regression

Logistic regression is one of the most widely used classification algorithms in machine learning. It addresses a variety of use cases, including credit card fraud detection, spam detection, and customer attrition. This blog is a continuation of the machine learning blog series and explains the logistic regression algorithm in detail. Consider an... read more →
Jul 08

Linear Regression

Machine learning is the science of getting computers to act without being explicitly programmed. Machine learning is so pervasive today that you probably see it dozens of times a day without knowing it. Recently we covered different techniques involved in machine learning. Here we will take a look at linear... read more →
Jul 01

Introduction to Recommender Systems

The push marketing model is embraced by many organisations to promote their products to potential customers. For example, you may see Amazon promotions on a random web page. If you look closely, such promotions are custom tailored for each user. Personalised recommendations have become crucial because there are thousands and millions of items... read more →
Jun 25

Real Time Processing

The demand for real-time processing has increased significantly, as processing huge volumes of data alone is not enough to react to changing business conditions in real time. Real-time processing is required when data needs to be processed fast and actions need to be computed and initiated in real time.... read more →
Jun 17

Introduction To Machine Learning

Humans have the ability to learn and make decisions; some of these are logical, while some are fuzzy by nature. Some of these abilities can be modelled as complex mathematical equations that mimic human behaviour. Just as humans learn from past experience, machines learn from historical data. To make... read more →
May 20

Hive on HBase

Hive provides insights into the data present in HBase (and HDFS) by responding to ad hoc queries. Queries are written in HQL (Hive Query Language), which is SQL-like. Hive queries are internally converted into MapReduce jobs, which run in a distributed fashion over the HBase and HDFS systems. Hive vs... read more →
May 08

Introduction to Service Discovery

Web-scale architectures and horizontal scalability have led to the growth of microservices and the breakdown of monolithic applications into a set of standalone services. The rise of microservices enables each service to be scaled, monitored and upgraded independently of the other services. Also, innovative interaction of these services... read more →
May 02

Graph databases – What, Why & When

This blog talks about the three 'W's (What, Why and When) of graph databases. Before we start, let us take a quick look at the relevance of graph databases to today's big data needs and their place among other NoSQL databases. In the last decade, big data technologies really took... read more →
Apr 07

Mesos: Introduction to a data center operating system

The growing demand for distributed applications poses the need to re-strategise application deployments and management in data centers. Resource optimization, network management, resource scheduling and fault tolerance are some of the challenges to be addressed in deploying distributed applications. A number of technologies have evolved to address these challenges. In... read more →
Apr 01

Overview of Kubernetes Architecture

In this post, let’s list the challenges associated with any container cluster management tool and understand how Kubernetes architecture addresses them. I would recommend reading my earlier post on what Kubernetes is and its features to get started. At an infrastructure level: resource utilization, container provisioning (scheduling container on appropriate node... read more →