Unlocking the Power of Java Strings: A Guide to Dynamic Programming, Backtracking, and Pattern Matching Algorithms

Venkat
12 min readMay 15, 2024

--

Also study about Modules in JDK 9 here.

String has been one of the most fascinating topics to gain knowledge on. From your birth, you have been introduced to strings, firstly as your name and then as words of different sizes or length. Since childhood, we have seen words as a whole. Here in this article, we would focus mainly on understanding the working of Strings and dealing with complex problems related to String. After that, we will also explore the updates Java has brought about in its subsequent versions in regard with Strings.

Prerequisites

This article expects that the user has a profound knowledge of strings in Java and has used it extensively in their programs. This article expects the user to be sound of some common methods of the String class.

String

Unlike other languages that implement strings as an array of characters, Java implements strings as objects of type String. Implementation of strings as objects makes it easier for the programmer to use it efficiently by exposing them to numerous String class methods. As a matter of fact, Java added to its language specification further the support of creating a String in a number of ways including creating a string from an character array. Constructors that helped making this happen were made available so as to nullify the programming language barrier for those who had started up their career in Java in the recent past and were used to languages like C and C++.

Introduction to StringBuffer and StringBuilder

The String class in Java is indeed immutable, which means that once a String object is created, its content cannot be changed. If you want to modify a String, it will create a new String object with the modified content.

To address this limitation and allow for mutable string manipulation, Java introduced two classes:

  1. StringBuffer: StringBuffer is a class that provides a mutable sequence of characters. You can use StringBuffer to perform various string operations without creating a new object each time you make a change. It's designed to be thread-safe, which means it can be used in a multi-threaded environment without the risk of data corruption. However, this thread safety comes with a performance cost, as there are locks involved to ensure thread safety.
  2. StringBuilder: StringBuilder is similar to StringBuffer in that it provides mutable character sequences. The key difference is that StringBuilder is not thread-safe, which means it's more efficient in a single-threaded context. Since there are no locks, it can be faster when you don't need to worry about concurrent modifications from multiple threads. It's the preferred choice for string manipulation in most situations where thread safety isn't a concern.

In summary, StringBuffer and StringBuilder were introduced in Java to provide alternatives to the immutable String class for situations where you need to perform efficient, mutable string operations. StringBuffer is thread-safe but potentially slower in single-threaded applications, while StringBuilder is not thread-safe but more efficient when used in a single-threaded context. Developers choose between them based on the specific requirements of their application.

String Pattern Matching Algorithms

1. Naive String Search

This is the simplest pattern matching algorithm. It checks for a pattern by moving through the text one character at a time.

Pseudocode:

for i from 0 to N-M
for j from 0 to M
if text[i+j] != pattern[j]
break
if j == M
pattern found at position i

Where N is the length of the text and M is the length of the pattern.

2. KMP (Knuth Morris Pratt) Algorithm

This algorithm avoids unnecessary comparisons by using a prefix table.

Pseudocode:

Preprocessing:
Compute the prefix table for pattern
Searching:
while i < N
if pattern[j] == text[i]
increment i and j
if j == M
pattern found at position i-j
set j to prefix[j-1]
else if i < N and pattern[j] != text[i]
if j != 0
set j to prefix[j-1]
else
increment i

3. Boyer-Moore Algorithm

This algorithm skips characters while matching, making it more efficient in many cases.

Pseudocode:

Preprocessing:
Compute the bad character and good suffix table for pattern
Searching:
while i <= N-M
set j to M-1
while j >= 0 and pattern[j] == text[i+j]
decrement j
if j < 0
pattern found at position i
if i < N-M
set i to i + M - min(j, 1 + badChar[text[i+M]])
else
increment i
else
set i to i + max(1, j - badChar[text[i+j]])

4. Rabin-Karp Algorithm

This algorithm uses hashing to find any one of a set of pattern strings in a text.

Pseudocode:

Compute hash value for pattern and for text's first window of pattern's length
for each position i in text from 1 to N-M+1
if hash values match
check for a character by character match
if hash values do not match
compute hash value for next window of text
remove leading digit, add trailing digit for next window

Here is a modular and functional implementation of the Rabin-Karp algorithm in Java:

import java.util.Arrays;
public class RabinKarp {
private static final long Q = 1000000007; // A large prime number
private static final int R = 256; // Alphabet size
public static int search(String text, String pattern) {
int M = pattern.length();
int N = text.length();
if (M > N) return -1; // Pattern is longer than text
long patternHash = computeHash(pattern, M);
long textHash = computeHash(text, M);
if (patternHash == textHash && check(0, text, pattern))
return 0; // Match at the beginning
long RM = 1; // Compute R^(M-1) % Q for later use
for (int i = 1; i <= M - 1; i++)
RM = (R * RM) % Q;
for (int i = M; i < N; i++) {
// Remove leading digit, add trailing digit, check for match
textHash = (textHash + Q - RM * text.charAt(i - M) % Q) % Q;
textHash = (textHash * R + text.charAt(i)) % Q;
if (patternHash == textHash && check(i - M + 1, text, pattern))
return i - M + 1; // Match
}
return -1; // No match found
}
private static long computeHash(String key, int M) {
// Compute hash for key[0..M-1]
long h = 0;
for (int j = 0; j < M; j++)
h = (R * h + key.charAt(j)) % Q;
return h;
}
private static boolean check(int i, String text, String pattern) {
// Check if pattern[0..M-1] equals text[i..i-M+1]
return pattern.equals(text.substring(i, i + pattern.length()));
}
public static void main(String[] args) {
String text = "THIS IS A TEST TEXT";
String pattern = "TEST";
System.out.println("Pattern found at index: " + search(text, pattern));
}
}

This code first computes the hash for the pattern and the initial window of text. It then slides the window over the text one character at a time, updating the hash and checking for a match at each step. The computeHash() method calculates the hash of a string, and the check() method checks if the pattern matches the text at a given position.

String Pattern Matching

String pattern matching is a fundamental aspect of many applications, from validating user input to parsing text files. In Java, string pattern matching is performed using the Pattern and Matcher classes, which are part of the java.util.regex package.

The Pattern class represents a compiled representation of a regular expression. This class provides several static factory methods for creating a pattern, such as Pattern.compile(String regex), which returns a new instance of Pattern that can be used to create a Matcher for a given input.

The Matcher class is used to perform match operations on a character sequence using the previously created Pattern. The Matcher class provides a number of methods for various types of matching operations, such as matches(), find(), and group().

Here’s a simple example of using Pattern and Matcher to perform string pattern matching:

String text = "Hello, World!";
Pattern pattern = Pattern.compile("World");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) {
System.out.println("Match found!");
} else {
System.out.println("Match not found.");
}

In the above example, we first create a Pattern using the static compile() method, passing in the regex pattern we want to match. Then we create a Matcher for our input string using the matcher() method of our Pattern object. Finally, we use the find() method of the Matcher to see if our pattern can be found in the input string.

It’s important to note that the Pattern and Matcher classes are not limited to simple string matching. They support a wide variety of complex pattern matching operations, including group matching, backreferences, and lookahead/lookbehind assertions. This makes them a powerful tool for any Java developer working with text processing.

Solving Subsequence Problems

Subsequence problems in string processing often involve questions about finding, counting, or manipulating subsequences of a string. A subsequence is any sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements. In Java, you can solve these types of problems using various methods and techniques.

One common approach is to use dynamic programming, which is a method for solving complex problems by breaking them down into simpler subproblems and solving each of these subproblems only once, storing their results using a memory-based data structure (array, map, etc.). For example, if you wanted to find the longest common subsequence between two strings, you could use a two-dimensional array to store the length of the longest common subsequence for each pair of prefixes of the two strings.

Another approach is to use recursion, which solves a problem by making the solution depend on solutions to smaller instances of the same problem. For example, if you wanted to find all subsequences of a string, you could use a recursive method that, for each character, either includes it in the subsequence or not.

Here’s a simple example of using recursion to find all subsequences of a string:

void printSubsequences(String input, String output) {
// Base Case - If the input is empty, print the output string
if (input.isEmpty()) {
System.out.println(output);
return;
}
// Recursive Case
// Exclude the first character and solve the rest of the string
printSubsequences(input.substring(1), output);
// Include the first character and solve the rest of the string
printSubsequences(input.substring(1), output + input.charAt(0));
}

In the above example, we recursively call the method printSubsequences twice: once without the first character of the input string, and once with the first character added to the output string. This ensures that we explore all possible subsequences of the input string.

Dynamic Programming with Strings

Dynamic programming is a powerful technique that allows us to solve complex problems by breaking them down into simpler subproblems. When dealing with strings, dynamic programming can be incredibly useful for solving a variety of problems.

One common type of problem involves finding the longest common subsequence (LCS) between two strings. The LCS problem can be solved using a two-dimensional dynamic programming table where the cell at the i-th row and j-th column represents the length of the LCS of the prefixes of the two strings up to the i-th and j-th character, respectively.

Here’s a simplified Java code snippet for solving the LCS problem:

int[][] dp = new int[str1.length() + 1][str2.length() + 1];
for (int i = 1; i <= str1.length(); i++) {
for (int j = 1; j <= str2.length(); j++) {
if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
int lengthOfLCS = dp[str1.length()][str2.length()];

In the above code, we first initialize a 2D array dp where dp[i][j] will hold the length of the LCS of the first i characters of str1 and the first j characters of str2. Then we iterate over the characters of str1 and str2. If the current characters of str1 and str2 are the same, we add 1 to the length of the LCS found so far. If they are not the same, we take the maximum length of the LCS found so far. The length of the LCS will be in the cell dp[str1.length()][str2.length()].

Another common problem involves finding the shortest common supersequence (SCS) between two strings. The SCS is the shortest sequence that has both strings as subsequences. This problem can be solved similarly to the LCS problem, but instead of looking for the longest common subsequence, we look for the shortest sequence that includes both strings.

Solving the Shortest Common Supersequence (SCS) Problem

The shortest common supersequence problem, like the longest common subsequence problem, can also be solved using dynamic programming. In this problem, we want to find the shortest sequence that includes both input strings as subsequences.

Here is a Java code snippet for solving the SCS problem:

int[][] dp = new int[str1.length() + 1][str2.length() + 1];
// Initialization
for (int i = 0; i <= str1.length(); i++) {
dp[i][0] = i;
}
for (int j = 0; j <= str2.length(); j++) {
dp[0][j] = j;
}
// Fill dp table
for (int i = 1; i <= str1.length(); i++) {
for (int j = 1; j <= str2.length(); j++) {
if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
dp[i][j] = 1 + dp[i - 1][j - 1];
} else {
dp[i][j] = 1 + Math.min(dp[i - 1][j], dp[i][j - 1]);
}
}
}
int lengthOfSCS = dp[str1.length()][str2.length()];

In the above code, we initialize a 2D array dp where dp[i][j] will hold the length of the SCS of the first i characters of str1 and the first j characters of str2. The essential difference from the LCS problem is that when the current characters of str1 and str2 are not the same, we add 1 to the minimum length of the SCS found so far. The length of the SCS will be in the cell dp[str1.length()][str2.length()].

Dynamic programming can also be used to solve many other string-related problems, such as the edit distance problem, palindrome-related problems, and many more. The key to solving these problems is to identify the subproblems, define the recurrence relation, and fill up the dynamic programming table in the correct order.

Backtracking with Strings

Backtracking is another powerful technique used in string processing. It is an efficient algorithmic technique that solves problems recursively by trying to build a solution incrementally, one piece at a time. If a solution cannot be found after a sequence of steps, it backtracks to try another path.

One common string-related problem where backtracking can be used is generating all permutations of a string. Here is a simple Java function that uses backtracking to solve this problem:

void permute(String str, int left, int right) {
if (left == right)
System.out.println(str);
else {
for (int i = left; i <= right; i++) {
str = swap(str, left, i);
permute(str, left + 1, right);
str = swap(str, left, i); // backtrack
}
}
}
String swap(String str, int i, int j) {
char[] charArray = str.toCharArray();
char temp = charArray[i];
charArray[i] = charArray[j];
charArray[j] = temp;
return String.valueOf(charArray);
}

In the above code, we recursively call the function permute() for each character in the string. If the left index equals the right index, it means we have a permutation, so we print it. If not, we swap the left character with each of the characters from left to right and make a recursive call to generate the remaining permutations. After each recursive call, we perform a second swap to backtrack and restore the original string.

Enhancements in String Handling and Pattern Matching in Java

Java has seen significant enhancements in how it handles Strings and pattern matching in its subsequent versions, notably the introduction of text blocks and new APIs for String processing.

Text Blocks (Since Java 13)

Java 13 introduced text blocks as a preview feature, which became a standard feature in Java 15. Text blocks allow you to embed multiline string literals in your code without the need for most escape sequences. It makes the creation of HTML, XML, and JSON text, as well as multiline log messages and SQL queries, more convenient and readable. Here is how you may use text blocks:

String textBlock = """
Hello, World!
Welcome to Java.
Enjoy coding!
""";

String Methods (Since Java 11)

Java 11 added a handful of useful instance methods to the String class:

  • String::strip, String::stripLeading, String::stripTrailing: These methods are "Unicode-aware" evolution of trim.
  • String::isBlank: Checks if the string is empty or contains only white space.
  • String::lines: Returns a stream of lines extracted from this string, separated by line terminators.
  • String::repeat: Returns a string whose value is the concatenation of this string repeated count times.

Pattern Matching for instanceof (Since Java 16)

Java 16 introduced Pattern Matching for instanceof as a standard feature, which simplifies common coding patterns. It allows us to combine testing whether an object is of a certain type and casting that object to that type.

Object obj = "Hello, World!";
if (obj instanceof String s) {
System.out.println(s.toLowerCase());
}

Pattern Matching for Switch (Preview in Java 17)

Java 17 introduced Pattern Matching for switch as a preview feature. It brings several enhancements to the switch statement, including pattern matching and switching on patterns.

Object obj = "Hello, World!";
switch (obj) {
case String s -> System.out.println(s.toLowerCase());
case Integer i -> System.out.println(Math.abs(i));
default -> System.out.println("Unknown type");
}

These enhancements make string handling and pattern matching in Java more powerful and flexible, improving the readability and maintainability of your code.

Conclusion

To wrap things up, string processing and pattern matching are essential elements of Java programming. They provide us with the ability to work with and modify text, discover patterns, and so much more. The String, StringBuffer, StringBuilder, Pattern, and Matcher classes in Java form a powerful toolkit for string manipulation, offering efficient and effective solutions. Additionally, techniques like dynamic programming and backtracking bring an extra level of flexibility and efficiency, allowing us to tackle complex string-related problems.

Recently, Java has seen several upgrades such as the addition of text blocks and new APIs for string processing. These enhancements have made Java even more potent and user-friendly. So, whether you’re building a straightforward application or wrestling with a complex algorithm, a solid grasp of these topics will no doubt be extremely beneficial in your programming journey.

--

--

Venkat
Venkat

Written by Venkat

A High School student making Java simple to comprehend and understand with Project Java.

Responses (1)