Using Gephi for RDB Schema Visualization


I spent a little bit of time this weekend looking at using Gephi for visualizing graphs and networks. I heard about Gephi from this post on the dataists web site.

This is a directed graph of a relational database schema. The node size reflects the count of rows within the table, and the node color reflects the out-degree measurement of the node, that is the number of outgoing references the node has. The node color is ramped from blue to red. I’ve left off node labels for confidentiality.

The most central highly connected node has a large number of incoming edges because it is used for system wide auditing purposes. Certain subsystems are able to be identified, and the ring of stand-alone nodes mainly represent setup and configuration tables that do not participate in cross-domain operations.

I investigated the filtering capabilities of Gephi and found them pretty easy to use, I was able to perform some basic k-core analysis of the node degrees to be able to identify the most linked tables in the schema. I think making an animation of this could prove to be informative.

One feature that I would like but have not determined whether it is possible with Gephi is to identify dominators and cliques.

I used Groovy + gexf4j to generate the GEXF XML graph. The script is specific for Oracle, but should be able to be easily modified to interrogate the metadata of other relational database systems.

/*
 Retrieve database metadata about table names and relationships.
 This information will be used to generate GEXL for use in Gephi.
*/

import groovy.sql.Sql
import com.ojn.gexf4j.core.data.*
import com.ojn.gexf4j.core.impl.*
import com.ojn.gexf4j.core.impl.data.*

def sql = Sql.newInstance(
	"jdbc:oracle:thin:@server:port:sid",
	"username",
	"password",
	"oracle.jdbc.OracleDriver"
)

def query_tables_and_count(sql) {
	def tables = []
	sql.eachRow("select * from user_objects where object_type = 'TABLE'") {
		def name  = it.OBJECT_NAME
		def countQuery = "select count(*) as cnt from ${name}".toString()
		def count = sql.firstRow(countQuery).cnt
		tables << ['name': name, 'count' : count]
	}
	tables
}

/**
 	@see http://www.conandalton.net/2008/09/list-foreign-key-constraints-in-oracle.html
*/
def query_foreign_relationships(sql) {
	String foreignKeyQuery = """
	select
		distinct col.table_name as key,
		rel.table_name as references
	from
		user_tab_columns col
		join user_cons_columns con
		  on col.table_name = con.table_name
		 and col.column_name = con.column_name
		join user_constraints cc
		  on con.constraint_name = cc.constraint_name
		join user_cons_columns rel
		  on cc.r_constraint_name = rel.constraint_name
		 and con.position = rel.position
	where
		cc.constraint_type = 'R'
	order by
		col.table_name, rel.table_name
	"""
	def results = []
	sql.eachRow(foreignKeyQuery) {
		results << ['key':it.key, 'ref':it.references]
	}
	results
}

def write_xml(sql) {
	def gexf = new GexfImpl()

	// Metadata
	gexf.getMetadata()
		.setLastModified(Calendar.getInstance().getTime())
		.setCreator("Jason B. Smith")
		.setDescription("DB Graph")

	attrList = new AttributeListImpl(AttributeClass.NODE);
	gexf.getGraph().getAttributeLists().add(attrList);
	attrCount = attrList.createAttribute("0", AttributeType.INTEGER, "count")

	// Nodes
	node_cache = [:]
	query_tables_and_count(sql).each { t ->
		node = gexf.getGraph().createNode(t.name)
		node
			.setLabel(t.name)
			.getAttributeValues()
				.addValue(attrCount, "${t.count}")	// convert from BigDecimal
		node_cache[t.name] = node
	}

	// Relationships
	query_foreign_relationships(sql).each { r ->
		node_cache[r.key].connectTo(node_cache[r.ref])
	}

	// Write XML
	def writer = new StaxGraphWriter()
	writer.writeToStream(gexf, System.out)
}
write_xml(sql)
This entry was posted in Uncategorized and tagged , , . Bookmark the permalink.

One Response to Using Gephi for RDB Schema Visualization

  1. Pingback: Tweets that mention Using Gephi for RDB Schema Visualization | devjason -- Topsy.com

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>