Blog | Cognitree

Apr 13

Kubernetes cluster autoscaler

This post shows how we used the Kubernetes cluster autoscaler with Amazon EKS to build a cost-effective solution for on-demand deployment of microservices in a dynamically scaling environment. Along with a detailed explanation of the use case, the post also provides a step-by-step guide to... read more →

Jul 25

Traffic insights in real time using Sankey charts in Kibana

Visualizing application traffic patterns involves several challenges. First, we need to visualize the sequence of components along each traffic flow. Then we need to filter the traffic by dimensions such as protocol, client ID, etc., and finally we need to view metrics such as volume,... read more →
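A Sankey chart is drawn from weighted source→target links, so the three steps above boil down to turning each flow's path of components into consecutive component pairs, filtering by a dimension, and summing a metric per pair. The plain-Python sketch below assumes hypothetical flow records with `path`, `protocol` and `bytes` fields; it is not Kibana's or Elasticsearch's data model, only an illustration of the link aggregation a Sankey view needs.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def sankey_links(
    flows: Iterable[dict], protocol: Optional[str] = None
) -> Dict[Tuple[str, str], int]:
    """Sum traffic volume per (source, target) hop, optionally filtered by protocol.

    Each flow is a hypothetical record: {"path": [...], "protocol": str, "bytes": int}.
    """
    links: Counter = Counter()
    for flow in flows:
        # Filtering step: drop flows that do not match the selected dimension.
        if protocol is not None and flow["protocol"] != protocol:
            continue
        path = flow["path"]
        # Sequence step: every consecutive pair of components is one Sankey link.
        for src, dst in zip(path, path[1:]):
            # Metric step: accumulate volume on that link.
            links[(src, dst)] += flow["bytes"]
    return dict(links)


flows = [
    {"path": ["client", "lb", "api", "db"], "protocol": "http", "bytes": 500},
    {"path": ["client", "lb", "api"], "protocol": "http", "bytes": 200},
    {"path": ["client", "lb", "cache"], "protocol": "tcp", "bytes": 100},
]

http_links = sankey_links(flows, protocol="http")
all_links = sankey_links(flows)
print(http_links)
```

Swapping the `protocol` filter for another field (for example a client ID) or summing a different metric only changes the two marked lines, which keeps the same link structure usable across dimensions.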

Jul 06

Big data stack on Kubernetes

At Cognitree, most of our customer projects involve analysing large amounts of data to gather insights. These projects involve the design and development of ETL pipelines, real-time analysis, report generation, and training models using statistical learning algorithms. The solutions are often built using open-source tools, and although the components... read more →

Jul 02

Time series data management in Elasticsearch

In this blog post, we’d like to outline how we defined policies for time series data management in Elasticsearch. Background: as part of an IoT security solution built for a startup client, we have a typical real-time data processing pipeline: data from sensors is received into Kafka topics and consumed... read more →

Jun 27

User-defined aggregate functions (UDAF) in Spark

Apart from the pre-built functions available for data analysis, Spark enables developers to write custom user-defined functions that can be applied to a single row, a group of rows or a window of rows to analyse data. In this blog, we will explore in detail how we implemented a... read more →
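An aggregate function that works over a group of rows has to cope with rows spread across partitions, so it is usually expressed as four steps: create an empty buffer, fold each row into it, merge partial buffers from different partitions, and evaluate the final result. The plain-Python sketch below illustrates that contract with a hypothetical average aggregate; the method names mirror those of Spark's `UserDefinedAggregateFunction` (`initialize`, `update`, `merge`, `evaluate`), but this is not Spark code and not the implementation described in the post.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AvgBuffer:
    """Intermediate state carried between rows and partitions."""
    total: float = 0.0
    count: int = 0


class Average:
    """Hypothetical average aggregate following the
    initialize / update / merge / evaluate contract."""

    def initialize(self) -> AvgBuffer:
        # Empty buffer: the starting point for every partition.
        return AvgBuffer()

    def update(self, buf: AvgBuffer, value: Optional[float]) -> AvgBuffer:
        # Fold one input row into the buffer; skip nulls like built-in SQL aggregates.
        if value is None:
            return buf
        return AvgBuffer(buf.total + value, buf.count + 1)

    def merge(self, a: AvgBuffer, b: AvgBuffer) -> AvgBuffer:
        # Combine partial buffers computed on different partitions.
        return AvgBuffer(a.total + b.total, a.count + b.count)

    def evaluate(self, buf: AvgBuffer) -> Optional[float]:
        # Produce the final value; a group with no non-null rows yields None.
        return buf.total / buf.count if buf.count else None


def run_aggregate(fn: Average, partitions: Iterable[List[Optional[float]]]) -> Optional[float]:
    # Each partition is reduced independently (as separate executors would),
    # then the partial buffers are merged and evaluated once.
    partials = [reduce(fn.update, part, fn.initialize()) for part in partitions]
    return fn.evaluate(reduce(fn.merge, partials, fn.initialize()))


result = run_aggregate(Average(), [[1.0, 2.0], [3.0, None], []])
print(result)  # 2.0
```

Keeping the running sum and count in the buffer, rather than a running average, is what makes `merge` correct: two partial averages cannot be combined without knowing how many rows each one covers.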

Jun 22

Kronos – a cron replacement to schedule complex data workflows

The increasing need for insights from vast data sources has given rise to data-driven business intelligence products that build and execute complex data workflows. A data workflow is a set of interdependent data-driven tasks. Simple solutions use a cron-based approach, which works well for simple workflows with few or no... read more →
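The difference between cron and a workflow scheduler is that cron fires each job on a fixed schedule without knowing about its dependencies, while a workflow scheduler runs a task only after all of its upstream tasks have finished. The sketch below models a workflow as a dependency graph and uses Python's standard `graphlib.TopologicalSorter` to group tasks into waves that could run in parallel. The task names are hypothetical, and this is an illustration of dependency-ordered execution, not Kronos's API.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_users": set(),
    "join": {"extract_orders", "extract_users"},
    "report": {"join"},
    "train_model": {"join"},
}


def execution_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave has all dependencies satisfied."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real scheduler would mark a task done only after it succeeds.
        sorter.done(*ready)
    return waves


waves = execution_waves(workflow)
for i, wave in enumerate(waves):
    print(i, wave)
```

With cron, `join` would need a start time chosen to fall safely after both extracts; with the graph, it starts as soon as both have completed.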

Jun 13

Flume sink plugin for Elasticsearch 6.x

Version 2.0.0 of the Flume sink plugin is now available. The release includes support for Flume 1.8.0 and Elasticsearch 6.2.4. Many thanks to Alexey Mikka for his contributions. To load the plugin through classloaders, please use Cognitree's fork of Apache Flume 1.8.0.

Jun 11

Classloaders for Apache Flume plugins in 1.8.0

Last year, Cognitree enhanced Apache Flume 1.7.0 with support for using classloaders to load plugins. We have now ported this support to Flume 1.8.0. The binary distribution can be downloaded here.

Jul 14

Flume sink plugin for Elasticsearch 5.x

Cognitree has open-sourced a Flume sink plugin for Elasticsearch 5.4. The sink plugin is compatible with Flume version 1.7. To avoid dependency version conflicts, we highly recommend using this plugin with Cognitree's fork of Flume. Motivation: we were looking to analyze large amounts of streaming... read more →

Jul 11

Classloaders for Apache Flume plugins

Apache Flume is a tool for moving large amounts of data from various sources to a centralized data store. It provides an extensible framework that broadens its applicability to various sources and stores. However, the plugin framework currently cannot provide isolated class loading for plugins. Today,... read more →