Using Gephi for RDB Schema Visualization


I spent a little bit of time this weekend looking at using Gephi for visualizing graphs and networks. I heard about Gephi from this post on the dataists web site.

This is a directed graph of a relational database schema. The node size reflects the count of rows within the table, and the node color reflects the out-degree measurement of the node, that is the number of outgoing references the node has. The node color is ramped from blue to red. I’ve left off node labels for confidentiality.

The most central highly connected node has a large number of incoming edges because it is used for system wide auditing purposes. Certain subsystems are able to be identified, and the ring of stand-alone nodes mainly represent setup and configuration tables that do not participate in cross-domain operations.

I investigated the filtering capabilities of Gephi and found them pretty easy to use, I was able to perform some basic k-core analysis of the node degrees to be able to identify the most linked tables in the schema. I think making an animation of this could prove to be informative.

One feature that I would like but have not determined whether it is possible with Gephi is to identify dominators and cliques.

I used Groovy + gexf4j to generate the GEXF XML graph. The script is specific for Oracle, but should be able to be easily modified to interrogate the metadata of other relational database systems.

/*
 Retrieve database metadata about table names and relationships.
 This information will be used to generate GEXL for use in Gephi.
*/

import groovy.sql.Sql
import com.ojn.gexf4j.core.data.*
import com.ojn.gexf4j.core.impl.*
import com.ojn.gexf4j.core.impl.data.*

def sql = Sql.newInstance(
	"jdbc:oracle:thin:@server:port:sid",
	"username",
	"password",
	"oracle.jdbc.OracleDriver"
)
	
def query_tables_and_count(sql) {
	def tables = []
	sql.eachRow("select * from user_objects where object_type = 'TABLE'") {
		def name  = it.OBJECT_NAME
		def countQuery = "select count(*) as cnt from ${name}".toString()
		def count = sql.firstRow(countQuery).cnt
		tables << ['name': name, 'count' : count]
	}
	tables
}

/**
 	@see http://www.conandalton.net/2008/09/list-foreign-key-constraints-in-oracle.html
*/
def query_foreign_relationships(sql) {
	String foreignKeyQuery = """
	select
		distinct col.table_name as key,
		rel.table_name as references
	from
		user_tab_columns col
		join user_cons_columns con 
		  on col.table_name = con.table_name 
		 and col.column_name = con.column_name
		join user_constraints cc 
		  on con.constraint_name = cc.constraint_name
		join user_cons_columns rel 
		  on cc.r_constraint_name = rel.constraint_name 
		 and con.position = rel.position
	where
		cc.constraint_type = 'R'
	order by
		col.table_name, rel.table_name
	"""
	def results = []
	sql.eachRow(foreignKeyQuery) {
		results << ['key':it.key, 'ref':it.references]
	}
	results
}

def write_xml(sql) {
	def gexf = new GexfImpl()
	
	// Metadata
	gexf.getMetadata()
		.setLastModified(Calendar.getInstance().getTime())
		.setCreator("Jason B. Smith")
		.setDescription("DB Graph")
		
	attrList = new AttributeListImpl(AttributeClass.NODE);
	gexf.getGraph().getAttributeLists().add(attrList);
	attrCount = attrList.createAttribute("0", AttributeType.INTEGER, "count")
	
	// Nodes
	node_cache = [:]
	query_tables_and_count(sql).each { t ->
		node = gexf.getGraph().createNode(t.name)
		node
			.setLabel(t.name)
			.getAttributeValues()
				.addValue(attrCount, "${t.count}")	// convert from BigDecimal
		node_cache[t.name] = node
	}
	
	// Relationships
	query_foreign_relationships(sql).each { r ->
		node_cache[r.key].connectTo(node_cache[r.ref])
	}
		
	// Write XML
	def writer = new StaxGraphWriter()	
	writer.writeToStream(gexf, System.out)
}
write_xml(sql)

3 thoughts on “Using Gephi for RDB Schema Visualization”

  1. Great example Jason. I would suggest you avoid counting rows for each table however. This can become onerous for large tables and you can run into errors if the database user does not have select permissions to that table.

    def query_tables_and_count(sql) {
    def tables = []
    sql.eachRow(“select owner||’.’||table_name OBJECT_NAME, NUM_ROWS CNT from all_tables where owner not in (‘SYS’,’SYSTEM’)”) {
    def name = it.OBJECT_NAME
    def count = it.CNT
    tables << ['name': name, 'count' : count]
    }
    tables
    }

    def query_foreign_relationships(sql) {
    String foreignKeyQuery = """
    select
    distinct col.owner||'.'||col.table_name as key,
    rel.owner||'.'||rel.table_name as references
    from
    all_tab_columns col
    join all_cons_columns con
    on col.table_name = con.table_name
    and col.column_name = con.column_name
    join all_constraints cc
    on con.constraint_name = cc.constraint_name
    join all_cons_columns rel
    on cc.r_constraint_name = rel.constraint_name
    and con.position = rel.position
    where
    cc.constraint_type = 'R'
    and col.owner not in ('SYS','SYSTEM')
    order by 1,2
    """
    def results = []
    sql.eachRow(foreignKeyQuery) {
    results << ['key':it.key, 'ref':it.references]
    }
    results
    }

Leave a Reply