Graph databases – What, Why & When

This blog talks about the three ‘W’s (What, Why and When) of graph databases. Before we start, let us take a quick look at the relevance of graph databases to today’s big data needs and their place among other NoSQL databases.

In the last decade, big data technologies really took off, thanks to the massive growth in the data being generated. The limitations of relational databases in handling the volume of big data were immediately evident. What followed was the evolution of NoSQL (Not Only SQL) databases. The most popular NoSQL databases today are key-value stores such as Redis and Riak, column-oriented stores such as HBase and Cassandra, document-oriented databases such as MongoDB and CouchDB, and graph systems such as Neo4j (a graph database) and Giraph (a graph processing framework). The image below places these databases according to the scale and complexity of the data they handle.

Figure: Database complexity and scale variance

Clearly, there is a tradeoff between volume and complexity when choosing an appropriate database. Graph databases fare really well at handling complex relationships and complex queries over them.

Let’s now get back to the three ‘W’s of graph databases.

What is a graph database?

A graph is a data structure representing a collection of nodes and the relations among them. A graph database supports creating, updating, and querying or processing a graph, mainly by means of the relations among nodes. Graph databases also allow properties on both nodes and relations. Properties let us capture granular details of the nodes and of the relationships among them.
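To make the idea of nodes, relations and properties concrete, here is a minimal in-memory sketch. It is not how any particular graph database is implemented; the class name `PropertyGraph`, its methods and the sample data are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    source: str
    target: str
    kind: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy graph: nodes and relations both carry free-form properties."""

    def __init__(self):
        self.nodes = {}      # node id -> property dict
        self.relations = []  # list of Relation objects

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_relation(self, source, target, kind, **properties):
        # Both endpoints must already exist as nodes.
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.relations.append(Relation(source, target, kind, properties))

    def neighbours(self, node_id, kind=None):
        """Targets of outgoing relations, optionally filtered by relation kind."""
        return [
            r.target
            for r in self.relations
            if r.source == node_id and (kind is None or r.kind == kind)
        ]


# Hypothetical sample data.
graph = PropertyGraph()
graph.add_node("alice", name="Alice")
graph.add_node("bob", name="Bob")
graph.add_node("acme", name="Acme Corp")
graph.add_relation("alice", "bob", "KNOWS", since=2010)
graph.add_relation("alice", "acme", "WORKS_AT", role="engineer")
```

Note that the query (`neighbours`) is phrased in terms of relations rather than table lookups, which is the point the paragraph above makes.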

In the context of graph databases, a graph is not a chart such as this:

Figure: Line graph

But a graph is a structure of nodes connected by relations, such as this:

Figure: Simple graph structure

Motivation

There are mainly two aspects of data that graphs are good at: data complexity and data representation. Let’s first address the challenges with respect to these aspects.

Complexity

Does a data set become more complex as it grows in volume? Not exactly. Volume certainly poses challenges for scalability, but it does not complicate the API for manipulating and querying the data. Data may get complex when one of the following happens:

When new answers are sought from existing data.
When the schema gets an update as a result of business requirements.

Graphs usually don’t get into trouble in these scenarios, as there is no strict schema associated with them. Business entities and the relations among them can be added or updated at any time without redesigning the data model or schema.

Data model

Interacting with data is usually at its best when it is represented in its true form. For example, relational databases are good at handling tabular data. Other NoSQL databases are good at handling key-value pairs, sparse matrices and documents. Similarly, graph databases are good at handling strongly connected data.

Strongly connected data is not just about the number of connections. It also has to do with the details of the connections or relations. Take an employee-organization relation, for example. A simple relation such as this can be represented in a relational database with the help of a foreign key. Now say that we also need to capture and work with details of the relationship, such as the age of the relation, the employee’s views on the organization and the organization’s views on the employee, the number of designations an employee held in each organization they worked in, all of the employee’s salaries in all of those organizations, and so on. On top of this, such requirements are not necessarily known upfront when the data model is being designed. Graph databases do well in these scenarios, where relations are many, complex and updated from time to time.
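The employee-organization case can be sketched as relations that carry their own properties, where new details are attached later without redesigning anything. This is only an illustration in plain Python; the `relate` helper, the `WORKED_AT` label and all names and figures are invented for the example.

```python
# Each relation is keyed by (source, kind, target) and maps to a free-form
# property dict, so there is no fixed set of columns to migrate.
relations = {}


def relate(source, kind, target, **properties):
    """Create the relation if it is missing, then merge in the properties."""
    props = relations.setdefault((source, kind, target), {})
    props.update(properties)
    return props


# Initial requirement: just record who worked where, and when.
relate("ravi", "WORKED_AT", "acme", start_year=2008, end_year=2012)
relate("ravi", "WORKED_AT", "globex", start_year=2012)

# A later requirement: designations and salaries per organization.
relate("ravi", "WORKED_AT", "acme",
       designations=["engineer", "senior engineer"],
       salaries=[50000, 65000])
relate("ravi", "WORKED_AT", "globex", designations=["architect"])


def designation_counts(employee):
    """Number of designations the employee held in each organization."""
    return {
        target: len(props.get("designations", []))
        for (source, kind, target), props in relations.items()
        if source == employee and kind == "WORKED_AT"
    }
```

The second batch of `relate` calls adds properties to relations that already exist, which is the kind of late-arriving requirement described above.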

Why can’t I use relational databases instead?

Relational databases fit the bill for handling relations until about a decade ago. Today’s applications, such as social networks and master data management systems, need a lot more than what an RDBMS has to offer. In an RDBMS, relationships are usually worked out with joins, and every additional hop in a relationship typically means another join. Imagine writing SQL queries for social networks (friends of friends of friends…) or recommendation engines (collaborative filtering). Such requirements may end up as a hundred-line query on a relational database, while graph query languages can often express them in a line or two.
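As a rough illustration of why multi-hop questions suit a graph model, the sketch below answers "who is within N hops of this person?" with a breadth-first traversal over an adjacency map. The `friends` data and the `within_hops` function are hypothetical; a real graph database would run this kind of traversal inside its query engine.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency map.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat", "fay"},
    "eve": {"cat"},
    "fay": {"dan"},
}


def within_hops(graph, start, max_hops):
    """Return {person: hop_count} for everyone reachable in 1..max_hops hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the starting person is not their own friend
    return distance


friends_of_friends = within_hops(friends, "ann", 2)
```

Changing the depth is a matter of changing one argument, whereas the join-based equivalent needs one more self-join per hop. In Neo4j’s Cypher, for instance, a variable-length pattern of the form `-[:FRIEND*1..3]-` plays a similar role (the `FRIEND` label here is an assumption).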

Why can’t I use other NoSQL databases instead?

NoSQL databases such as key-value stores and document-oriented stores don’t provide intuitive APIs for working with relationships. Graph databases, out of the box, provide semantics and algorithms to query and process data mainly by means of the relations within it.

When should I be using Graph databases?

Use graph databases when there are a lot of relationships present in the data and those relationships may evolve over time. A few examples are social networks, master data management (of any cluster management system), web pages (connected through links), and recommendation systems (user-product relationships such as a user’s rating or usage of a product).
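For the recommendation case, the user-product relationships can be walked in two hops: from a user to the products they liked, then to other users who liked the same products, and on to what those users liked. The following is a deliberately naive sketch of that idea, not a production collaborative filtering algorithm; the `ratings` data, the `recommend` function and the rating threshold are assumptions.

```python
from collections import Counter

# Hypothetical user -> {product: rating} relationships (ratings from 1 to 5).
ratings = {
    "u1": {"book": 5, "lamp": 4},
    "u2": {"book": 4, "desk": 5},
    "u3": {"lamp": 5, "desk": 4, "pen": 2},
}


def recommend(user, ratings, min_rating=4):
    """Rank unseen products liked by users who share liked products with `user`."""
    liked = {p for p, r in ratings[user].items() if r >= min_rating}
    scores = Counter()
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        other_liked = {p for p, r in other_ratings.items() if r >= min_rating}
        overlap = len(liked & other_liked)
        if overlap == 0:
            continue  # no shared taste, so ignore this user
        # Score only products the target user has not rated yet.
        for product in other_liked - set(ratings[user]):
            scores[product] += overlap
    return [product for product, _ in scores.most_common()]


suggestions = recommend("u1", ratings)
```

Here the rating is a property on the user-product relation, and the recommendation falls out of following relations rather than joining tables.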

What Next
Watch this space if you are interested in graph databases. We are going to follow up with more posts that cover existing implementations of graph databases, comparisons and use cases.