At Cognitree, most of our customer projects involve analysing a large amount of data to gather insights. These projects involve design and development of ETL pipelines, real-time analysis, report generation and training models using statistical learning algorithms. The solutions are often built using open source tools and although the components... read more →
In this blog post, we’d like to outline how we defined policies for time series data management in Elasticsearch. Background: As part of an IoT security solution built for a startup client, we have a typical real-time data processing pipeline: data from sensors is received into Kafka topics and consumed... read more →
Apart from the pre-built functions available for data analysis, Spark enables developers to write custom user-defined functions that can be applied to a single row, a group of rows or a window of rows to analyse data. In this blog, we will explore in detail how we implemented a... read more →
The increasing need for insights from vast data sources has given rise to data-driven business intelligence products that build and execute complex data workflows. A data workflow is a set of inter-dependent data-driven tasks. Simple solutions use a cron-based approach, which works well for simple workflows with few or no... read more →
Version 2.0.0 of the Flume sink plugin is now available. The release includes support for Flume 1.8.0 and Elasticsearch 6.2.4. Many thanks to Alexey Mikka for his contributions. Please use Cognitree's fork of Apache Flume 1.8.0, which uses class loaders to load the plugin.
Last year, Cognitree enhanced Apache Flume 1.7.0 with support for using class loaders to load plugins. We have now ported this support to Flume 1.8.0. The binary distribution can be downloaded here.
Cognitree has open sourced a Flume sink plugin for Elasticsearch 5.4. The sink plugin is compatible with Flume version 1.7. To avoid dependency versioning conflicts, we highly recommend using this plugin with Cognitree's fork of Flume. Motivation: We were looking to analyze large amounts of streaming... read more →
Apache Flume is a tool for moving large amounts of data from various sources to a centralized data store. It provides an extensible framework to expand its applicability to various sources and stores. The plugin framework currently lacks the ability to provide isolated class loading for plugins. Today,... read more →
Logistic regression is one of the most widely used classification algorithms in machine learning. It addresses a variety of use cases, including credit card fraud detection, spam detection and customer attrition. This blog is a continuation of the machine learning blog series and will explain the logistic regression algorithm in detail. Consider an... read more →
Machine learning is the science of getting computers to act without being explicitly programmed. Machine learning is so pervasive today that you probably see it dozens of times a day without knowing it. Recently we covered different techniques involved in machine learning. Here we will take a look at linear... read more →