Your Computer Science Resource

HOW-TO: Use the Levenshtein Distance Algorithm

For an anti-plagiarism application I developed about a year ago, I needed a way to calculate the similarity between two strings. I wasn’t sure how to go about doing this at first, but then I came across the Levenshtein Distance Algorithm which gets the job done well. In this example, I’ll show you my Distance class implementing the Levenshtein Distance Algorithm, and how to use it to compare strings (specifically the minimum distance between two strings needed to make them equal). This is also very useful for determining how similar (or dissimilar) two strings are.

The Code

The following is my Distance class. To use it, pass in two strings that you want to compare. The resulting integer will be the “distance” between those strings. The bulk of the work is done in the calculateDistance() method, which is called from the overloaded constructor automatically. The algorithm (in pseudo-code) is mentioned in detail here.

public class Distance {

	private int distance;

	public Distance() {}

	/**
	 * Distance Constructor
	 * @param s
	 * @param t
	 */

	public Distance( String s, String t ) {

		this.distance = calculateDistance( s, t );

	} // end Distance constructor

	/**
	 * This method is a public helper to
	 * call the calculateDistance method.
	 *
	 * @param String s
	 * @param String t
	 */

	public void compare( String s, String t ) {

		this.distance = calculateDistance( s, t );

	} // end method compare()

	/**
	 * This method returns the distance.
	 *
	 * @return int distance
	 */

	public int getDistance() {

		return distance;

	} // end method getDistance()

	/**
	 * This method calculates the distance
	 * from String s to String t.
	 *
	 * @param String s
	 * @param String t
	 * @return int
	 */

	private int calculateDistance( String s, String t ) {

		int matrix[][];	// 2-dimensional array
		int sLength;	// Length of s
		int tLength;	// Length of t
		int i;			// Iterates through s
		int j;			// Iterates through t
		char s_i;		// ith character of s
		char t_j;		// jth character of t
		int cost;		// Cost

		sLength = s.length();	// Length of s
		tLength = t.length();	// Length of t

		/**
		 * Check if string lengths are 0.
		 */

		if ( sLength == 0 ) return 0;
		if ( tLength == 0 ) return 0;

		/**
		 * Check if strings are equal
		 */

		if ( s.equals( t ) ) return 0;

		/**
		 * Initialize Matrix
		 */

		matrix = new int[ sLength + 1 ][ tLength + 1 ];

		for ( i = 0; i <= sLength; i++ )
			matrix[i][0] = i;

		for ( j = 0; j <= tLength; j++ )
			matrix[0][j] = j;

		/**
		 * Perform Levenshtein Distance
		 * algorithm
		 */

		for ( i = 1; i <= sLength; i++ ) {

			s_i = s.charAt( i - 1 );

			for ( j = 1; j <= tLength; j++ ) {

				t_j = t.charAt( j - 1 );

				if ( s_i == t_j )
					cost = 0;

				else
					cost = 1;

				matrix[i][j] = getMinimum( matrix[ i - 1 ][ j ] + 1, matrix[ i ][ j - 1 ] + 1, matrix[ i - 1 ][ j - 1 ] + cost );

			} // end for loop

		} // end for loop

		return matrix[ sLength ][ tLength ];

	} // end method calculateDistance()

	/**
	 * This method returns the minimum of
	 * three integers.
	 *
	 * @param int a
	 * @param int b
	 * @param int c
	 * @return int
	 */

	private int getMinimum( int a, int b, int c ) {

		int min = a; // Set min to a

		/**
		 * b is min
		 */

		if ( b < min ){

			min = b;

		} // end if

		/**
		 * c is min
		 */

		if ( c < min ) {

			min = c;

		} // end if

		return min;

	} // end method getMinimum()

} // end class Distance

Example Usage

Example #1

Distance d = new Distance();
d.compare( "testing1", "testing2" ); // Compare two strings
int distance = d.getDistance();  // Get the distance

Example #2

Distance d = new Distance( "testing1", "testing2" );  // The compare method is called automatically
int distance = d.getDistance();  // Get the distance

Note that the smaller the distance, the more similar the strings are (where a distance of zero indicates equality). Conversely, the greater the distance, the more different the strings are.

Questions or comments? Leave them below.

Tags: , , ,

Leave a Reply