Longest Common Substring

You are given two strings str1 and str2. Your task is to find the length of the longest common substring among the given strings.

Examples:

Input: str1 = "ABCDGH", str2 = "ACDGHR"
Output: 4
Explanation: The longest common substring is "CDGH" which has length 4.
Input: str1 = "ABC", str2 = "ACB"
Output: 1
Explanation: The longest common substrings are "A", "B", "C" all having length 1.

Expected Time Complexity: O(n*m).
Expected Auxiliary Space: O(n*m).

Constraints:
1<= str1.size(), str2.size()<=1000
Both strings may contain upper and lower case alphabets.


Approach 01:

#include <string>
#include <unordered_set>
#include <algorithm>

using namespace std;

class Solution {
public:
    // Function to check if there is a common substring of given windowSize
    bool isPossible(const string& str1, const string& str2, int windowSize) {
        unordered_set<string> substringsSet;
        int len1 = str1.size();
        int len2 = str2.size();

        // Add all substrings of length windowSize from str1 to the set
        for (int i = 0; i <= len1 - windowSize; ++i) {
            substringsSet.insert(str1.substr(i, windowSize));
        }

        // Check if any substring of length windowSize from str2 is in the set
        for (int i = 0; i <= len2 - windowSize; ++i) {
            string substring = str2.substr(i, windowSize);
            if (substringsSet.find(substring) != substringsSet.end()) {
                return true;
            }
        }

        return false;
    }

    // Function to find the length of the longest common substring between str1 and str2
    int longestCommonSubstr(const string& str1, const string& str2) {
        int low = 0;
        int high = str1.size();
        int longestLength = 0;

        // Binary search to find the length of the longest common substring
        while (low <= high) {
            int mid = (low + high) / 2;
            if (isPossible(str1, str2, mid)) {
                longestLength = mid;
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }

        return longestLength;
    }
};
class Solution:
    def isPossible(self, str1, str2, windowSize):
        substringsSet = set()
        for i in range(0, len(str1)):
            if i + windowSize > len(str1): break
            substringsSet.add(str1[i: i + windowSize])
            
        for i in range(0, len(str2)):
            if i + windowSize > len(str2): break
            substring = str2[i: i + windowSize]
            if substring in substringsSet:
                return True
        
        return False
            
    def longestCommonSubstr(self, str1, str2):
        low, high = 0, len(str1)
        longestLength = 0
        while low <= high:
            mid = (low + high) // 2
            if self.isPossible(str1, str2, mid):
                longestLength = mid
                low = mid + 1
            else:
                high = mid - 1
        
        return longestLength

Time Complexity

  • Binary Search:

    The binary search in the longestCommonSubstr function performs \( \log_2(n) \) iterations, where \( n \) is the length of the longer of the two input strings. Each iteration involves checking if a substring of length \( k \) (where \( k \) is the current mid-value) is common between the two strings.

  • Substring Check:

    In each binary search iteration, the isPossible function is called to check substrings of length \( k \). The time complexity for building the set of substrings from str1 is \( O((m – k + 1) \cdot k) \), where \( m \) is the length of str1. Checking if a substring of str2 exists in the set is \( O((n – k + 1) \cdot k) \), where \( n \) is the length of str2. Overall, for each binary search iteration, the complexity is \( O((m + n) \cdot k) \).

  • Overall Time Complexity:

    The overall time complexity combines binary search and substring checks. Given that binary search runs in \( O(\log n) \) and each iteration involves \( O((m + n) \cdot k) \), the complexity is \( O((m + n) \cdot \log n \cdot k) \). Since \( k \) can be at most \( \min(m, n) \), the overall complexity is approximately \( O((m + n) \cdot \log \min(m, n) \cdot \min(m, n)) \).

Space Complexity

  • Space for Substrings Set:

    The space complexity is dominated by the storage needed for the set of substrings in isPossible. In the worst case, this set could contain up to \( O(m \cdot k) \) substrings if all substrings of length \( k \) are unique. Hence, the space complexity for storing the set is \( O(m \cdot k) \).

  • Overall Space Complexity:

    The overall space complexity is \( O(m \cdot k) \) due to the set of substrings. This accounts for both storing substrings and auxiliary variables used in the algorithm.


Approach 02:

#include <string>
#include <algorithm>

using namespace std;

class Solution {
public:
    // Function to find the length of the longest common substring between str1 and str2
    int longestCommonSubstr(const string& str1, const string& str2) {
        int length1 = str1.size();
        int length2 = str2.size();
        int dp[length1 + 1][length2 + 1];

        // Initialize the dp table
        for (int i = 0; i <= length1; ++i) {
            for (int j = 0; j <= length2; ++j) {
                if (i == 0 || j == 0) {
                    dp[i][j] = 0;
                }
            }
        }

        int maxLength = 0;  // To store the length of the longest common substring

        // Fill the dp table
        for (int i = 1; i <= length1; ++i) {
            for (int j = 1; j <= length2; ++j) {
                if (str1[i - 1] == str2[j - 1]) {
                    dp[i][j] = 1 + dp[i - 1][j - 1];
                    maxLength = max(maxLength, dp[i][j]);
                } else {
                    dp[i][j] = 0;
                }
            }
        }

        return maxLength;  // Return the length of the longest common substring
    }
};
class Solution:
    def longestCommonSubstr(self, str1: str, str2: str) -> int:
        length1 = len(str1)
        length2 = len(str2)
        dp = [[0] * (length2 + 1) for _ in range(length1 + 1)]

        maxLength = 0  # To store the length of the longest common substring

        # Fill the dp table
        for i in range(1, length1 + 1):
            for j in range(1, length2 + 1):
                if str1[i - 1] == str2[j - 1]:
                    dp[i][j] = 1 + dp[i - 1][j - 1]
                    maxLength = max(maxLength, dp[i][j])
                else:
                    dp[i][j] = 0

        return maxLength  # Return the length of the longest common substring

Time Complexity

  • Initialization:

    The initialization of the dp table involves setting all values to 0. This operation takes \( O(m \cdot n) \) time, where \( m \) is the length of str1 and \( n \) is the length of str2.

  • Filling the DP Table:

    Filling the dp table involves iterating over each cell in the table. Since there are \( (m + 1) \cdot (n + 1) \) cells, and each cell computation takes constant time, the time complexity is \( O(m \cdot n) \).

  • Overall Time Complexity:

    The overall time complexity is \( O(m \cdot n) \), as the most time-consuming operation is filling the DP table.

Space Complexity

  • DP Table:

    The space complexity is dominated by the dp table, which is of size \( (m + 1) \cdot (n + 1) \). Therefore, the space complexity is \( O(m \cdot n) \).

  • Overall Space Complexity:

    The overall space complexity is \( O(m \cdot n) \) due to the storage requirements for the DP table.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top