Graph Data Structures and Databases

Rachel Cross
5 min readMar 11, 2021

There are many constructs to model and relate data, and depending on the application and uses of that data, a different structure can be more efficient or applicable. I talk about this at a high level in my post about relational vs. non-relational databases here.

Here I’ll be talking about graph data structures. What they are, and also how they are used.

What is a Graph?

A graph is a set of nodes and vertices.

In plainer terms, a graph shows a set of entities and their relationships to one another.

source: https://www.geeksforgeeks.org/graph-data-structure-and-algorithms/

While the above may be understandable in an abstract way, one of the most intuitive representations of a graph database is a social network. Thinking in terms of people as nodes and their connections as vertices, you can start to see how graphs are a flexible and dynamic way to show relationships between data points in a scalable way.

Source: https://dist.neo4j.com/wp-content/uploads/2009/09/socnet-start.png

Graphs have a few key characteristics:

  • Graphs can be directed or undirected. In a directed graph the relationship can be said to go both ways. For example, in the above graph, if node1 and node2 know each other, that is an undirected (bidirectional) relationship. Conversely, if node5 knows who node1 is, but node1 has no idea who node5 is (kinda creepy), that is a directed (unidirectional) relationship. Facebook is an example of an undirected graph.
  • Graphs can be weighted or unweighted. In other words values or (weights) can be associated with each edge. Edge weights can be used to represent things like cost, length or distance.
  • Graphs can be cyclic or acyclic. A cyclic graph is a directed graph which contains a path from at least one node back to itself. This is called a cycle. As you’d expect an acyclic graph is a directed graph that does not contain a cycle, that is, it does not have any nodes that can be traversed back to themselves.[1]
  • Graphs can be dense or sparse. The density and sparsity in a graph is in reference to the number of edges in the graph relative to the maximal or minimal number of edges possible. You can see in the lower right hand corner of the below graphic the comparison of a dense graph vs a sparse one in terms of the relative connectivity of the same number of nodes.
Source: https://neo4j.com/blog/graph-algorithms-neo4j-graph-algorithm-concepts/

So How Does this Translate to Databases?

We have a high level summary of some of the main, high level characteristics of the graph data structure. So how does this translate to graphical databases? What are the implications of the graph data structure within the context of data storage at a large scale?

To start, let’s revisit the idea of relational vs. non-relational databases. A relational database (also known as a SQL database) stores data in tables and rows.

A relational database works by linking information from multiple tables through the use of “keys.” A key is a unique identifier which can be assigned to a row of data contained within a table. This unique identifier, called a “primary key,” can then be included in a record located in another table when that record has a relationship to the primary record in the main table.[2]

Non-relational databases (AKA NoSQL databases) are databases that do not use the table, primary key, foreign key layout of relational databases. There are a variety of different types of non relational databases, including graphs.

Ironically, the non-relational graph databases, is, in many ways more strongly based in relationships than a relational database. Relational databases work by linking multiple databases together through the use of identification keys. In many cases, querying relationships between data in relational databases can involve using a string of searches querying shared keys throughout many tables (called joins).

With graph databases on the other hand, connected data is stored as connected data.

Some of the strengths of graph databases include:

Query Performance

With relational databases, query performance tends to deteriorate as datasets get bigger due to the need to have long, join-intensive searches. Graph databases on the other hand localize their queries to a specific portion of the graph, and so performance remains relatively constant as the dataset grows.[3]

Flexibility

Unlike relational databases, graph databases do not need to establish a schema upfront. The relationships and structure can grow in tandem with the growth of the problem/ dataset. With graphs, new data can be added without affecting the existing data structures.[3]

Agility

Related to the above, graph databases are able to grow and evolve in line with incremental and iterative development, which aligns it with many commonly practiced processes like test-driven development.

Conclusion

The graph data structure is one that can seem intimidating at first glance but surprisingly intuitive in terms of its high level shape (traversing and querying graph databases is a whole different post). According to Allied Market Research the global graph database market was pegged at $651 million in 2018 and is estimated to hit $3.73 billion by 2026, registering a CAGR of 24.5% from 2019 to 2026.[4] With the use of graph databases by applications like Facebook and Google Maps, as well as the strong projected growth of the graph database market, this structure is one worth learning.

  1. https://study.com/academy/lesson/cyclic-acyclic-sparse-dense-graphs.html#:~:text=A%20cyclic%20graph%20is%20a,cyclic%20graphs%20contain%20a%20cycle.&text=An%20acyclic%20graph%20is%20a,be%20traversed%20back%20to%20itself.
  2. https://www.pluralsight.com/blog/software-development/relational-vs-non-relational-databases
  3. Robinson, Ian et al. Graph Databases: New Opportunities for Connected Data. Sebastopol, CA. O’Reilly. 2015
  4. https://www.globenewswire.com/news-release/2020/05/05/2027700/0/en/Global-Graph-Database-Market-Is-Expected-to-Reach-3-73-Billion-by-2026-Says-AMR.html

--

--