The increasing need for insights from vast data sources has given rise to data-driven business intelligence products that build and execute complex data workflows.
A data workflow is a set of inter-dependent, data-driven tasks. Simple solutions use a cron-based approach, which works well for simple workflows with few or no task dependencies. However, cron schedules purely by time and has no notion of one task waiting on another, so it breaks down when tasks have complex dependencies.
At Cognitree, we build and execute complex data workflows for our customers to gather data insights. For our data pipelines we built Kronos, a scheduling tool that adds more features on top of cron.
What is Kronos?
Kronos is a Java-based replacement for cron to build, run and monitor complex data pipelines with flexible deployment options. It handles dependency resolution, workflow management and failure handling. Kronos is built on top of Quartz and uses a DAG (Directed Acyclic Graph) to manage the tasks.
Examples of data pipelines include batch jobs, chains of multiple tasks and machine learning jobs.
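To make the DAG idea concrete, here is a minimal sketch of how a run order can be derived from task dependencies using a topological sort (Kahn's algorithm). It is not Kronos code: the class `DagSketch`, the method `topologicalOrder` and the task names are hypothetical and exist only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: DagSketch is a hypothetical class, not part of the Kronos API.
class DagSketch {

    // dependsOn maps each task to the tasks that must finish before it can start.
    static List<String> topologicalOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> pending = new LinkedHashMap<>();          // unmet dependency count per task
        Map<String, List<String>> dependents = new LinkedHashMap<>();  // reverse edges: task -> tasks waiting on it
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            pending.putIfAbsent(entry.getKey(), 0);
            for (String dependency : entry.getValue()) {
                pending.putIfAbsent(dependency, 0);
                pending.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dependency, k -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Tasks with no unmet dependencies are ready to run.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            // Finishing a task may unblock the tasks that were waiting on it.
            for (String next : dependents.getOrDefault(task, Collections.<String>emptyList())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        // Any task left unscheduled sits on a cycle, so the graph is not a DAG.
        if (order.size() != pending.size()) {
            throw new IllegalStateException("Task dependencies contain a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("extract", Collections.<String>emptyList());
        dependsOn.put("transform", Arrays.asList("extract"));
        dependsOn.put("train", Arrays.asList("transform"));
        dependsOn.put("report", Arrays.asList("transform", "train"));
        System.out.println(topologicalOrder(dependsOn));
    }
}
```

A time-based cron entry cannot express that `report` must wait for both `transform` and `train`. A dependency graph states that constraint directly, and a cycle is rejected up front instead of leaving tasks waiting forever.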
The architecture is flexible and extensible, with each component of Kronos designed to be pluggable.
- Dependency Management: Define/manage dependencies among tasks.
- Dynamic: Define/modify workflow and task dependencies at runtime.
- Extensible: Define custom source of tasks, task handlers and the persistence store.
- Policy Driven: Define custom policies to handle timeouts.
- Fault Tolerant: Handle system/process faults.
- Flexible deployment model: Embed as a library or deploy in standalone or distributed mode.
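As a rough illustration of what pluggable task handlers and policy-driven timeouts can look like in Java, consider the sketch below. The names `PluggableSketch`, `TaskHandler`, `TimeoutPolicy` and `run` are assumptions made up for this example and are not taken from the Kronos API; only the `java.util.concurrent` calls are real.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: every type here is hypothetical, not the Kronos API.
class PluggableSketch {

    // A pluggable unit of work: callers supply their own implementation.
    interface TaskHandler {
        String handle(String taskName) throws Exception;
    }

    // A pluggable policy deciding what to report when a task exceeds its time budget.
    interface TimeoutPolicy {
        String onTimeout(String taskName);
    }

    // Runs the handler on a worker thread and applies the policy if it does not finish in time.
    static String run(String taskName, TaskHandler handler, long timeoutMillis, TimeoutPolicy policy)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = executor.submit(() -> handler.handle(taskName));
            try {
                return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                result.cancel(true); // interrupt the overrunning task
                return policy.onTimeout(taskName);
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A handler that finishes well within its budget.
        System.out.println(run("export", name -> "done:" + name, 1000, name -> "skipped:" + name));
        // A handler that overruns, so the timeout policy decides the outcome.
        System.out.println(run("export", name -> { Thread.sleep(5000); return "late"; }, 50,
                name -> "skipped:" + name));
    }
}
```

Keeping the handler and the policy behind small interfaces means either can be swapped without touching the code that schedules the task, which is the general idea behind a pluggable design.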
Today, we are proud to open source and share Kronos, our workflow management framework.
Do give it a try by heading over to the Getting Started section, and help us improve Kronos by giving us feedback.
In upcoming posts, we will talk about the use cases we have solved with Kronos.