Featured solutions

Featured libraries and frameworks that have improved our time to deliver.

Kronos

“cron” replacement to orchestrate big data workflows

Features:

  • Orchestrate complex big data workflows
  • API-driven
  • Polyglot task handler support
  • Namespace support

License – Apache 2.0
Source – GitHub

Data pipeline framework for Spark 2.x

Framework to build stream and batch data pipelines on Spark

Features:

  • Abstracts the underlying Spark integration to pull data
  • Unified representation of data from multiple data sources
  • Streaming constructs to handle stream processing
  • Abstractions optimised for batch processing of large volumes of data
  • Simple Java handlers to manage data manipulations
  • Integrates with Kafka, Elasticsearch, MongoDB and HDFS

Classloader support for Flume plugins

A fork of Apache Flume with classloader support for plugins

Features:

  • Available for both Flume 1.7.0 and 1.8.0
  • Classloader support to avoid JAR hell
  • Eases integration when working with Flume plugins

License – Apache 2.0
Source – GitHub
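
As a rough illustration of what per-plugin classloading can look like, the sketch below builds one class loader per plugin directory so that each plugin resolves its own jars. It is not taken from the fork: the class name PluginClassLoaders and the method forPluginDir are hypothetical, and the fork may isolate plugins differently.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache Flume or the fork described above.
final class PluginClassLoaders {
    private PluginClassLoaders() {}

    /**
     * Builds a class loader whose search path is every *.jar file directly
     * inside pluginDir. Each plugin directory gets its own loader, so two
     * plugins can ship different versions of the same library.
     */
    static URLClassLoader forPluginDir(Path pluginDir, ClassLoader parent) throws IOException {
        List<URL> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toUri().toURL());
            }
        }
        // URLClassLoader delegates to the parent first, so classes already
        // visible to the parent still win over the plugin's own copies.
        return new URLClassLoader(jars.toArray(new URL[0]), parent);
    }
}
```

Because the standard URLClassLoader asks its parent first, this sketch only keeps plugins apart from each other; shielding a plugin from libraries on the host classpath would need a child-first loader.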

Flume sink plugin for Elasticsearch

A Flume sink plugin to stream data into Elasticsearch.

Features:

  • Integrates with Elasticsearch 5.x and 6.x
  • Support for CSV, JSON and Avro
  • Bulk indexing for higher throughput
  • Customizable indexing options

License – Apache 2.0
Source – GitHub
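
To show why bulk indexing helps throughput, here is a minimal sketch of the request body that the Elasticsearch _bulk endpoint accepts: one action line followed by one document line per event, so many documents travel in a single HTTP request. The class BulkBody and its build method are hypothetical and are not the plugin's code. The sketch assumes each document is already single-line JSON and that the index and type names need no JSON escaping.

```java
import java.util.List;

// Hypothetical helper, not part of the sink plugin described above.
final class BulkBody {
    private BulkBody() {}

    /**
     * Builds a newline-delimited body for the Elasticsearch _bulk endpoint.
     * The _type field is included because Elasticsearch 5.x and 6.x expect a
     * mapping type for each indexed document.
     */
    static String build(String index, String type, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: tells Elasticsearch where the next document goes.
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type).append("\"}}\n");
            // Source line: the document itself, terminated by a newline.
            body.append(doc).append('\n');
        }
        return body.toString();
    }
}
```

The resulting string would be sent with a POST to /_bulk using the Content-Type application/x-ndjson. The final line must also end with a newline, which the loop above guarantees.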