LeetCode 524 - Longest Word in Dictionary through Deleting


https://leetcode.com/problems/longest-word-in-dictionary-through-deleting
Given a string and a string dictionary, find the longest string in the dictionary that can be formed by deleting some characters of the given string. If there is more than one possible result, return the longest word with the smallest lexicographical order. If there is no possible result, return the empty string.
Example 1:
Input:
s = "abpcplea", d = ["ale","apple","monkey","plea"]

Output: 
"apple"
Example 2:
Input:
s = "abpcplea", d = ["a","b","c"]

Output: 
"a"
Note:
  1. All the strings in the input will only contain lower-case letters.
  2. The size of the dictionary won't exceed 1,000.
  3. The length of all the strings in the input won't exceed 1,000.


https://www.careercup.com/question?id=5757216146587648

https://techdevguide.withgoogle.com/paths/foundational/find-longest-word-in-dictionary-that-subsequence-of-given-string/#!
If you know how to check for subsequences, and are trying to optimize: Can you preprocess S in some way to make checking dictionary words against it very efficient?


How would you approach the question if you assumed that W*N is a lot larger than L, and that more memory can be used to speed things up? For example, what if you preprocessed D, not S?

Would your approach change if the size of the searched-in string was much larger than the size of the dictionary?

Consider the trade-offs among different data structures for representing preprocessed data. What kind of preprocessing would be accommodated by a tree? A dictionary? A tree or dictionary of … and so on?
Brute force
Generate all 2^n possible subsequences, and check each one against the dictionary for a match. A slight optimization is to at least ensure the dictionary is represented by a hash table or prefix tree so that lookups are efficient.
Check each dictionary word using a greedy algorithm
You can prove that a greedy algorithm works for checking if a word w is a subsequence of S. Scan S from the beginning for w[0]. After you find it, continue scanning from that point for w[1], and so on, until either you run out of characters in S (w is not a subsequence), or you find all the characters in w (w is a subsequence).
You could sort the dictionary words in order of descending length, run the above decision procedure for each word, and take the first word that is a subsequence of S. The running time is O(N*W) where W is the number of words in D and N is the number of characters in S.
O(N*W) is asymptotically optimal if most dictionary words are close to size N (because then the size of the input is O(N*W)). However, it is far from optimal in a worst-case scenario where S may be long and the dictionary strings quite short.
Define L to be the total number of characters in the dictionary over all words. Then the best possible time complexity (assuming words shorter than S) is O(N + L). However, the existing algorithm is O(N * W), which might be O(N * L) in the case where dictionary words are a small constant in size.
Improving the greedy approach
Notice that to check a dictionary word w, you end up needing to know, for each character c in w, the least index i in S where S[i] == c, such that i is greater than some given index j (the index of the previously-matched letter). Observe that the greedy algorithm is slow precisely because it naively scans for this index i.
You can preprocess S to find such indices much faster. Build a map of letter -> sorted list of indices where letter occurs. For example, if S == "abppplee":
a -> [0]
b -> [1]
p -> [2, 3, 4]
l -> [5]
e -> [6, 7]
When you need to know "Given my last matched character was at index X and my next character to match is Y, where does this match occur?", you would look up Y in the map, and then binary search the occurrence list to find the smallest number > X in that list, or report that none exists (the word is not a subsequence then).
This approach requires one binary search per dictionary letter. Since it takes only O(N) time to preprocess S in this way, the total complexity is O(N + L * log N), which is quite close to the best possible complexity.
Advanced approach: You might note that since this is a classic "successor-value" problem, specialized data structures exist that can run the search even faster. A vEB tree, for example, improves the runtime to O(N + L * log log N).
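As a rough sketch of the binary-search lookup described above (a minimal illustration; the helper names buildOccurrences and isSubsequence are my own, not from the source):

private Map<Character, List<Integer>> buildOccurrences(String s) {
    // For each character, the sorted list of indices where it occurs in s. O(N) time and space.
    Map<Character, List<Integer>> occ = new HashMap<>();
    for (int i = 0; i < s.length(); i++)
        occ.computeIfAbsent(s.charAt(i), k -> new ArrayList<>()).add(i);
    return occ;
}

private boolean isSubsequence(String word, Map<Character, List<Integer>> occ) {
    int prev = -1; // index in s of the previously matched letter
    for (char c : word.toCharArray()) {
        List<Integer> idx = occ.get(c);
        if (idx == null) return false;
        // Binary search for the smallest occurrence index strictly greater than prev.
        int lo = 0, hi = idx.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (idx.get(mid) > prev) hi = mid;
            else lo = mid + 1;
        }
        if (lo == idx.size()) return false; // no occurrence of c after prev
        prev = idx.get(lo);
    }
    return true;
}

Each dictionary letter costs one O(log N) binary search, matching the O(N + L * log N) bound above.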
An optimal O(N + L) approach for small alphabets
If the size of the alphabet is a-z or a small constant, we can give a solution based off the previous idea that runs in O(N + L).
The real complexity is O(N*A + L), where A is the size of the alphabet. Instead of making the representation sparse, you could make it dense. Instead of p -> [2, 3, 4] you could use p -> [2, 2, 3, 4, -1, -1, -1, -1], where the i-th index directly yields the answer to the query. This requires O(N) space per alphabet letter (O(NA) total) and also that much time during preprocessing. It is therefore not suitable for large alphabets.
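A minimal sketch of such a dense table for a lowercase alphabet (the names nextOcc and buildNextOccurrence are my own; this variant stores the first occurrence at or after each index and advances a pos pointer, which answers the same successor query):

private int[][] buildNextOccurrence(String s) {
    // nextOcc[i][c] = smallest index j >= i with s.charAt(j) == 'a' + c, or -1 if none.
    // O(N * 26) space and preprocessing time; each query is then O(1).
    int n = s.length();
    int[][] nextOcc = new int[n + 1][26];
    Arrays.fill(nextOcc[n], -1);
    for (int i = n - 1; i >= 0; i--) {
        nextOcc[i] = nextOcc[i + 1].clone();
        nextOcc[i][s.charAt(i) - 'a'] = i;
    }
    return nextOcc;
}

private boolean isSubsequence(String word, int[][] nextOcc) {
    int pos = 0; // earliest position in s that may still be matched
    for (char c : word.toCharArray()) {
        int j = nextOcc[pos][c - 'a'];
        if (j < 0) return false; // no occurrence of c at or after pos
        pos = j + 1;
    }
    return true;
}

With the table built, each dictionary word is checked in O(|word|), giving O(N*A + L) overall.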
An optimal O(N + L) approach for any alphabet
Instead of processing the words one at a time, we will process all words simultaneously.
First, for each word w in D, create a 2-tuple containing w and the number 0 (i.e. (w, 0)). The number refers to the number of characters in w that have already been found in S; initially, no characters have been found. For a tuple t, we'll refer to the word as t.w, and to the number as t.i (since it is an index).
Group the words by t.w[t.i] (initially t.i is 0, so initially it is by first letter). For our example dictionary D = {"able", "ale", "apple", "bale", "kangaroo"}, we’ll have:
a -> [("able", 0), ("ale", 0), ("apple", 0)]b -> [("bale", 0)]k -> [("kangaroo", 0)]
What this is doing is telling you what words you will make progress on finding for each possible letter that you might see next in S.
Scan S one character at a time. If the character you just scanned is c, for each tuple t in map[c], increment t.i by 1 and transfer t to map[t.w[t.i]]. The case where t.i == t.w.length is special - transfer these words to a "found" list. In the end, the output is the longest word in the found list.
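A hedged Java sketch of this single-pass grouping idea (the WordState class and variable names are my own illustration, not from the source):

private static class WordState {
    final String word;
    int i;                 // number of characters of word already matched in S
    WordState(String word) { this.word = word; }
}

public String findLongestWord(String s, List<String> d) {
    // waiting.get(c) holds the words whose next needed character is c.
    Map<Character, List<WordState>> waiting = new HashMap<>();
    List<String> found = new ArrayList<>();
    for (String w : d) {
        if (w.isEmpty()) { found.add(w); continue; }
        waiting.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(new WordState(w));
    }
    for (char c : s.toCharArray()) {
        List<WordState> ready = waiting.remove(c);
        if (ready == null) continue;
        for (WordState t : ready) {
            t.i++;
            if (t.i == t.word.length()) found.add(t.word);          // fully matched
            else waiting.computeIfAbsent(t.word.charAt(t.i), k -> new ArrayList<>()).add(t);
        }
    }
    // Longest match; smallest lexicographical order on ties.
    String best = "";
    for (String w : found)
        if (w.length() > best.length() || (w.length() == best.length() && w.compareTo(best) < 0))
            best = w;
    return best;
}

Each character of S is inspected once and each dictionary character causes at most one map transfer, which gives the claimed O(N + L) behavior (expected time, given hashing).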
import collections

def find_longest_word_in_string(letters, words):
    letter_positions = collections.defaultdict(list)

    # For each letter in 'letters', collect all the indices at which it appears.
    # O(#letters) space and speed.
    for index, letter in enumerate(letters):
        letter_positions[letter].append(index)

    # For words, in descending order by length...
    # Bails out early on the first matched word, and within a word on
    # impossible letter/position combinations, but the worst case is
    # O(#words * avg-len) * O(#letters / 26) time; constant space.
    # With some work, this could be O(#words * avg-len) * log2(#letters / 26),
    # but since binary search has more overhead than simple iteration,
    # log2(#letters) is about as expensive as simple iteration as long as
    # the list of positions for each letter is "small". If letters are
    # randomly present in the search string, the log2 is about equal in
    # speed to simple traversal up to lengths of a few hundred characters.
    for word in sorted(words, key=lambda w: len(w), reverse=True):
        pos = 0
        for letter in word:
            if letter not in letter_positions:
                break

            # Find any remaining valid positions in the search string where this
            # letter appears. It would be better to do this with binary search,
            # but this is very Python-ic.
            possible_positions = [p for p in letter_positions[letter] if p >= pos]
            if not possible_positions:
                break
            pos = possible_positions[0] + 1
        else:
            # We didn't break out of the loop, so all letters have valid positions.
            return word




public String findLongestWordInString(String str, List<String> words) {
    // Sort by descending length, breaking ties lexicographically, so the
    // first word that checks out is the answer.
    Collections.sort(words, (a, b) -> a.length() != b.length()
            ? Integer.compare(b.length(), a.length()) : a.compareTo(b));
    char[] chArr = str.toCharArray();

    // Map each character to the sorted list of indices where it occurs in str.
    HashMap<Character, List<Integer>> charToIndexMap = new HashMap<>();
    for (int i = 0; i < chArr.length; i++) {
        char c = chArr[i];
        if (charToIndexMap.containsKey(c)) {
            charToIndexMap.get(c).add(i);
        } else {
            ArrayList<Integer> indexes = new ArrayList<>();
            indexes.add(i);
            charToIndexMap.put(c, indexes);
        }
    }

    for (String word : words) {
        int pos = 0;
        boolean matched = true;
        for (char c : word.toCharArray()) {
            if (!charToIndexMap.containsKey(c)) {
                matched = false;
                break;
            }
            // Find the first occurrence of c at or after pos
            // (a binary search could replace this linear scan).
            int next = -1;
            for (int p : charToIndexMap.get(c)) {
                if (p >= pos) {
                    next = p;
                    break;
                }
            }
            if (next < 0) {
                matched = false;
                break;
            }
            pos = next + 1;
        }
        if (matched)
            return word;
    }
    return "";
}


X.
http://www.cnblogs.com/grandyang/p/6523344.html
The method above uses sorting, which works well when the array has few words but each word is very long. However, if there are many words and each is short, the O(n log n) cost of sorting may no longer be the best choice, and we can solve the problem without sorting. We iterate over the words in the dictionary and, just as in the method above, verify whether the current word can be obtained from string s by deleting characters. If it can, and its length is greater than or equal to the length of the result res, we check whether res needs to be updated. There are two cases where res must be updated: the current word is longer than res, or it has the same length as res but is lexicographically smaller. In either case, update res.
https://discuss.leetcode.com/topic/81474/fast-java-solution-19ms-beats-97-using-indexof/3
public String findLongestWord(String s, List<String> d) {
    String longest = "";
    for (String word : d)
        if (isBetter(word, longest) && isSubsequence(word, s))
            longest = word;
    return longest;
}

private boolean isBetter(String a, String b) {
    return a.length() > b.length() ||
           a.length() == b.length() && a.compareTo(b) < 0;
}

private boolean isSubsequence(String a, String b) {
    int start = -1;
    for (int i = 0; i < a.length(); i++){
        start = b.indexOf(a.charAt(i), start + 1);
        if (start < 0)
            return false;
    }
    return true;
}

https://discuss.leetcode.com/topic/80799/short-java-solutions-sorting-dictionary-and-without-sorting
We sort the input dictionary by descending length and then lexicographically. Then we iterate through the sorted dictionary exactly once; the first dictionary word that appears as a subsequence of the input string s must be the desired solution.
public String findLongestWord(String s, List<String> d) {
    Collections.sort(d, (a,b) -> a.length() != b.length() ? -Integer.compare(a.length(), b.length()) :  a.compareTo(b));
    for (String dictWord : d) {
        int i = 0;
        for (char c : s.toCharArray()) 
            if (i < dictWord.length() && c == dictWord.charAt(i)) i++;
        if (i == dictWord.length()) return dictWord;
    }
    return "";
}
An alternate, more efficient solution which avoids sorting the dictionary:
public String findLongestWord(String s, List<String> d) {
    String longest = "";
    for (String dictWord : d) {
        int i = 0;
        for (char c : s.toCharArray()) 
            if (i < dictWord.length() && c == dictWord.charAt(i)) i++;

        if (i == dictWord.length() && dictWord.length() >= longest.length()) 
            if (dictWord.length() > longest.length() || dictWord.compareTo(longest) < 0)
                longest = dictWord;
    }
    return longest;
}
Time Complexity: O(nk), where n is the length of string s and k is the number of words in the dictionary.

https://leetcode.com/articles/longest-word-in-dictionary-through-deletion/
  • Time complexity : O(2^n). 2^n strings are generated.
  • Space complexity : O(2^n). List l contains 2^n strings.
  public String findLongestWord(String s, List<String> d) {
    HashSet<String> set = new HashSet<>(d);
    List<String> l = new ArrayList<>();
    generate(s, "", 0, l);
    String max_str = "";
    for (String str : l) {
      if (set.contains(str))
        if (str.length() > max_str.length() || (str.length() == max_str.length() && str.compareTo(max_str) < 0))
          max_str = str;
    }
    return max_str;
  }

  public void generate(String s, String str, int i, List<String> l) {
    if (i == s.length())
      l.add(str);
    else {
      generate(s, str + s.charAt(i), i + 1, l);
      generate(s, str, i + 1, l);
    }

  }

  public String findLongestWord(String s, List<String> d) {
    HashSet<String> set = new HashSet<>(d);
    List<String> l = new ArrayList<>();
    for (int i = 0; i < (1 << s.length()); i++) {
      String t = "";
      for (int j = 0; j < s.length(); j++) {
        if (((i >> j) & 1) != 0)
          t += s.charAt(j);
      }
      l.add(t);
    }
    String max_str = "";
    for (String str : l) {
      if (set.contains(str))
        if (str.length() > max_str.length() || (str.length() == max_str.length() && str.compareTo(max_str) < 0))
          max_str = str;
    }
    return max_str;

  }

X. DFS
The biggest difference between your code and the topic poster's is that you use the dictionary word as the pattern and try to match it against the string, which can be very long (the words in the dictionary can also be very long; if they have the same length as string s, there should be no difference). But @compton_scatter uses the string as the pattern and tries to match the dictionary words against it. The test cases may affect the result to some extent, and the sorted solution could be more reliable than the unsorted one.
BTW, I also had the TLE problem here, even though I used a Trie.
    String max;
    public String findLongestWord(String s, List<String> d) {
        max = "";
        backtrack(s, d, new StringBuilder(), 0);
        return max;
    }
    void backtrack(String s, List<String> d, StringBuilder sb, int start){
        if(d.contains(sb.toString())&&sb.length()>max.length()){
            max = sb.toString();
        }else if(d.contains(sb.toString())&&sb.length()==max.length()){
            max = max.compareTo(sb.toString())>0?(sb.toString()):max;
        }
        for(int i = start;i<s.length();i++){
            sb.append(s.charAt(i));
            backtrack(s, d, sb, i+1);
            sb = sb.deleteCharAt(sb.length()-1);
        }
    }
private void helper(String s, int i, TrieNode curr){
    if(curr.word!=null){
        if(curr.word.length()>max.length()){
            max = curr.word;
        }
        else if(curr.word.length()==max.length()){
            if(curr.word.compareTo(max)<0){
                max = curr.word;
            }
        }
    }
    if(i==s.length()||curr.maxlen<max.length()) return;
    char c = s.charAt(i);
    if(curr.children[c-'a']!=null){
        helper(s,i+1,curr.children[c-'a']);
    }
    helper(s,i+1,curr);
}

private void buildTrie(TrieNode root, List<String> d){
    for(String s: d){
        TrieNode curr = root;
        curr.maxlen = Math.max(curr.maxlen,s.length());
        for(int i = 0;i<s.length();i++){
            char c = s.charAt(i);
            if(curr.children[c-'a']==null){
                curr.children[c-'a'] = new TrieNode();
            }
            curr = curr.children[c-'a'];
            curr.maxlen = Math.max(curr.maxlen,s.length());
        }
        curr.word = s;
    }
}


class TrieNode{
    String word;
    TrieNode[] children;
    int maxlen;//even adding this var, trying to make the search end earlier, still TLE.
    public TrieNode(){
        word = null;
        maxlen = 0;
        children = new TrieNode[26];
    }
}


