a Groovy Database Compare

In a typical software development environment there are a ton of databases for each of the stages: development databases, system test database, …, up to the production database.

Tools that compare databases come in quite handy in these cases. Database meta-data or schema’s can be compared to discover which new tables, columns or views were added, removed or changed. And the data in particular tables can be diff’ed. For instance to check if a list of static data values has changed.

I tried some tools when I wanted to compare database tables in two similar databases, but they would not suffice. A simple diff between the two tables would show too much information. It was not useful.

The table contained the list of all database update scripts that were run on that particular database. Scripts sometimes failed, or were run multiple times. I had a simple question: what scripts were run on one database, but not on the other.

I needed a smart compare solution to filter out double entries, or entries that were failed runs, run on other dates, etc. Simply skipping some columns from the compare would not suffice.

So…. scripting to the rescue. I composed a Groovy script that makes complex compares possible, using the default Java Set collections.

Here is the code:

/**
 * This groovy script compares two database tables.
 *
 * Usage: groovy compareDatebaseTables.groovy
 *
 * Note: the database drivers must be on the classpath
 * Use something like:
 *   set CLASSPATH=%CLASSPATH%;C:oracle_driversclasses12.zip
 *
 * 2008 Peter Paul Bakker
 */
import groovy.sql.Sql
import java.io.File
import java.util.Properties

class DatabaseUpdate implements Comparable {
	String scriptNumber
	Date runDate
	String scriptFile
	String result

	public DatabaseUpdate(String scriptNumber, Date runDate,
		String scriptFile, String result) {

		this.scriptNumber = scriptNumber
		this.runDate = runDate
		this.scriptFile = scriptFile
		this.result = result
	}

	/*
		The DatabaseUpdate is considered equal when the scriptnumber
		and the result string are equal
	*/
	public boolean equals(Object o) {
		return this.scriptNumber.equals(o.scriptNumber)
			&& this.result.equals(o.result)
	}

	public int hashCode() {
		return this.scriptNumber.hashCode()
			+ this.result.hashCode()
	}

	public String toString() {
		return "${this.scriptNumber}t${this.runDate}" +
			"t${this.scriptFile}t${this.result}"
	}

	public int compareTo(Object o) {a

		// convert null to "null" to avoid nullpointers
		String compResult = (this.result == null ? "null" : this.result)
		String compOtherResult = (o.result == null ? "null" : o.result)

		int resultScriptNumber = this.scriptNumber.compareTo(o.scriptNumber)
		// scriptnumber is leading in the order
		if (resultScriptNumber != 0) { return resultScriptNumber }
		else { return  compResult.compareTo(compOtherResult) }
	}

}

public Set getDatabaseUpdates(connectionString, user, password ) {
	println("Start fetch data from ${connectionString}")

	Set databaseUpdates = new TreeSet()

	// get the complete list of scripts
	sql = Sql.newInstance(connectionString, user, password,
                  "oracle.jdbc.driver.OracleDriver") 

	sql.eachRow("select * from DB_UPDATES") {
		databaseUpdates.add(
			new DatabaseUpdate(
				it.script_number, it.run_date, it.script_file, it.result))
	}
	sql.close()

	return databaseUpdates
}

public void writeDatabaseUpdates(filename, databaseUpdates) {
	println("Writing data to ${filename}")
	file = new java.io.File(filename)
	writer = file.newPrintWriter()
	databaseUpdates.each { writer.println(it) }
	writer.close()
}

public void doYourThing() {
	// read in the config file
	def props = new Properties();
	props.load(new java.io.FileInputStream('compareDatabaseScripts.props'))

	DATABASE1_NAME = props.getProperty("DATABASE1_NAME")
	DATABASE1_ADDRESS = props.getProperty("DATABASE1_ADDRESS")
	DATABASE1_PORT = props.getProperty("DATABASE1_PORT")
	DATABASE1_LOGIN = props.getProperty("DATABASE1_LOGIN")
	DATABASE1_PASSWORD = props.getProperty("DATABASE1_PASSWORD")

	DATABASE2_NAME = props.getProperty("DATABASE2_NAME")
	DATABASE2_ADDRESS = props.getProperty("DATABASE2_ADDRESS")
	DATABASE2_PORT = props.getProperty("DATABASE2_PORT")
	DATABASE2_LOGIN = props.getProperty("DATABASE2_LOGIN")
	DATABASE2_PASSWORD = props.getProperty("DATABASE2_PASSWORD")

	// TreeSet is an implementation of SortedSet
	Set databaseUpdates1 = getDatabaseUpdates(
		"jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS =
			(PROTOCOL = TCP)(HOST = ${DATABASE1_ADDRESS})(PORT = ${DATABASE1_PORT})))
			(CONNECT_DATA = (SID = ${DATABASE1_NAME})))",
		DATABASE1_LOGIN, DATABASE1_PASSWORD)

	Set databaseUpdates2 = getDatabaseUpdates(
		"jdbc:oracle:thin:" +
                "@${DATABASE2_ADDRESS}:${DATABASE2_PORT}:${DATABASE2_NAME}",
		DATABASE2_LOGIN, DATABASE2_PASSWORD)

	// make sure to create a new set, so that the original set is left intact
	def databaseUpdatesOnlyIn1 = new TreeSet(databaseUpdates1)
	def databaseUpdatesOnlyIn2 = new TreeSet(databaseUpdates2)

	databaseUpdatesOnlyIn1.removeAll(databaseUpdates2)
	databaseUpdatesOnlyIn2.removeAll(databaseUpdates1)

	writeDatabaseUpdates(
                "databaseUpdatesIn_${DATABASE1_NAME}_AndNotIn_${DATABASE2_NAME}.txt",
		databaseUpdatesOnlyIn1)
	writeDatabaseUpdates(
                 "databaseUpdatesIn_${DATABASE2_NAME}_AndNotIn_${DATABASE1_NAME}.txt",
		databaseUpdatesOnlyIn2)

	println("Done!")
}

doYourThing() 

(Note that some lines have additional line breaks for readability.)

The script makes use of a class called DatabaseUpdate that implements Comparable. This makes the class usable in the standard Java Set collections. Sets have the property that thay contain only one item of an element that is considered equal.

The script starts by reading the database connection information from a properties file. This is for two databases that have the same schema (or at least the same table).

As you can see it is then very easy to retrieve all data from a table. With a closure of sql.eachRow we create DatabaseUpdate elements that are added to a TreeSet.

With simple removeAll() methods of the Sets, we can determine the outer areas of two overlapping circles (ven-diagrams they are called in Holland).

The tricky bit is to make sure which DatabaseUpdate elements are equal to eachother. This is also the flexible part, because you can decide to ignore certain variables of the elements or to put some other logic in the compare method.

In this case we only look at the script number and the result. The result can be success or failure for instance. A succes should not equal a failure! Also, the rundate is left out, because scripts can be run on different days on different databases.

Note that you need to implement three methods for this to work: compareTo(), equals() and hash(). See also the javadoc of Set, TreeSet and Comparable.

This simple example can easily be extended in more complex compare or diffs of database tables!