Class AbstractMultiSearchProcessorFactory

  • All Implemented Interfaces:
    MultiSearchProcessorFactory, SearchProcessorFactory
    Direct Known Subclasses:
    AhoCorasicSearchProcessorFactory

    public abstract class AbstractMultiSearchProcessorFactory
    extends java.lang.Object
    implements MultiSearchProcessorFactory
    Base class for precomputed factories that create MultiSearchProcessors.
    The purpose of MultiSearchProcessor is to search efficiently for multiple needles in the haystack simultaneously, scanning every byte of the input sequentially and only once. While it can also be used to search for a single needle, a SearchProcessorFactory is more efficient for that case.
    See the documentation of AbstractSearchProcessorFactory for a comprehensive description of common usage. In addition to the functionality provided by SearchProcessor, MultiSearchProcessor adds a method that returns the index of the needle found at the current position of the MultiSearchProcessor: MultiSearchProcessor.getFoundNeedleId().
    Note: one needle can be a suffix of another, e.g. {"BC", "ABC"}, so multiple needles may be found ending at the same position of the haystack. In such a case, MultiSearchProcessor.getFoundNeedleId() returns the index of the longest matching needle in the array of needles.
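    The longest-match rule can be stated as a brute-force check. The sketch below is a hypothetical reference (the class LongestSuffixMatch is not part of Netty): for one end position of the haystack it returns the index of the longest needle ending there, or -1. It only illustrates the result getFoundNeedleId() is documented to report, not how the Aho–Corasick factory computes it.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical brute-force reference for the "longest needle wins" rule. */
final class LongestSuffixMatch {

    /** Returns the index of the longest needle ending at haystack[end], or -1 if none. */
    static int longestNeedleEndingAt(byte[] haystack, int end, byte[][] needles) {
        int best = -1;
        for (int id = 0; id < needles.length; id++) {
            byte[] needle = needles[id];
            int start = end - needle.length + 1;
            if (needle.length == 0 || start < 0) {
                continue; // empty needle, or needle longer than the bytes seen so far
            }
            boolean matches = true;
            for (int k = 0; k < needle.length; k++) {
                if (haystack[start + k] != needle[k]) {
                    matches = false;
                    break;
                }
            }
            // Among all needles ending at this position, keep the longest one.
            if (matches && (best == -1 || needle.length > needles[best].length)) {
                best = id;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[][] needles = {
            "BC".getBytes(StandardCharsets.US_ASCII),
            "ABC".getBytes(StandardCharsets.US_ASCII)
        };
        byte[] haystack = "ABC".getBytes(StandardCharsets.US_ASCII);
        // Both "BC" (id 0) and "ABC" (id 1) end at index 2; the longer one wins.
        System.out.println(longestNeedleEndingAt(haystack, 2, needles)); // prints 1
    }
}
```

    At index 1 neither needle has ended yet, so the same call returns -1 there.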
    Usage example (given that the haystack is a ByteBuf containing "ABCD" and the needles are "AB", "BC" and "CD"):
          import io.netty.buffer.ByteBuf;
          import io.netty.buffer.Unpooled;
          import io.netty.buffer.search.AbstractMultiSearchProcessorFactory;
          import io.netty.buffer.search.MultiSearchProcessor;
          import io.netty.buffer.search.MultiSearchProcessorFactory;
          import io.netty.util.CharsetUtil;

          ByteBuf haystack = Unpooled.copiedBuffer("ABCD", CharsetUtil.UTF_8);

          // newAhoCorasicSearchProcessorFactory is a static method of AbstractMultiSearchProcessorFactory
          MultiSearchProcessorFactory factory = AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(
              "AB".getBytes(CharsetUtil.UTF_8), "BC".getBytes(CharsetUtil.UTF_8), "CD".getBytes(CharsetUtil.UTF_8));
          MultiSearchProcessor processor = factory.newSearchProcessor();
    
          int idx1 = haystack.forEachByte(processor);
          // idx1 is 1 (index of the last character of the occurrence of "AB" in the haystack)
          // processor.getFoundNeedleId() is 0 (index of "AB" in needles[])
    
          int continueFrom1 = idx1 + 1;
          // continue the search starting from the next character, up to the end of the readable bytes
    
          int idx2 = haystack.forEachByte(continueFrom1, haystack.writerIndex() - continueFrom1, processor);
          // idx2 is 2 (index of the last character of the occurrence of "BC" in the haystack)
          // processor.getFoundNeedleId() is 1 (index of "BC" in needles[])
    
          int continueFrom2 = idx2 + 1;
    
          int idx3 = haystack.forEachByte(continueFrom2, haystack.writerIndex() - continueFrom2, processor);
          // idx3 is 3 (index of the last character of the occurrence of "CD" in the haystack)
          // processor.getFoundNeedleId() is 2 (index of "CD" in needles[])
    
          int continueFrom3 = idx3 + 1;
    
          int idx4 = haystack.forEachByte(continueFrom3, haystack.writerIndex() - continueFrom3, processor);
          // idx4 is -1 (no more occurrences of any of the needles)
    
          // This search session is complete; the processor should be discarded.
          // To search for the same needles again, reuse the same AbstractMultiSearchProcessorFactory
          // to get a new MultiSearchProcessor.
     
    • Constructor Detail

      • AbstractMultiSearchProcessorFactory

        public AbstractMultiSearchProcessorFactory()
    • Method Detail

      • newAhoCorasicSearchProcessorFactory

        public static AhoCorasicSearchProcessorFactory newAhoCorasicSearchProcessorFactory(byte[]... needles)
        Creates a MultiSearchProcessorFactory based on the Aho–Corasick string search algorithm.
        Precomputation (this method) takes time linear in the total size of the needles, O(Σ|needles|).
        The factory allocates and retains an array of 256 * X ints plus another array of X ints, where X is the sum of lengths of each entry of needles minus the sum of lengths of repeated prefixes of the needles.
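        To make the size formula concrete, the sketch below evaluates X under one reading of it: X is the number of distinct non-empty prefixes of the needles, since a prefix shared by several needles is counted only once. The class TrieSize and its methods are hypothetical helpers, not part of Netty. For the needles "AB", "BC" and "CD" nothing is shared, so X = 6 and the formula gives 256 * 6 + 6 = 1542 ints; for "AB" and "ABC" the shared prefix "AB" is counted once, so X = 3.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: evaluates X from the documented size formula. */
final class TrieSize {

    /** Counts distinct non-empty prefixes; a prefix shared by several needles counts once. */
    static int distinctPrefixes(byte[]... needles) {
        Set<ByteBuffer> prefixes = new HashSet<>();
        for (byte[] needle : needles) {
            for (int len = 1; len <= needle.length; len++) {
                // ByteBuffer equality and hashing depend only on the remaining bytes,
                // so wrapped prefixes with equal content collapse into one set entry.
                prefixes.add(ByteBuffer.wrap(needle, 0, len));
            }
        }
        return prefixes.size();
    }

    /** Ints retained according to the documented formula: 256 * X plus X. */
    static long documentedIntCount(byte[]... needles) {
        long x = distinctPrefixes(needles);
        return 256 * x + x;
    }
}
```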
        Search (the actual application of MultiSearchProcessor) takes time linear in the size of the ByteBuf being searched, O(|haystack|). Every byte of the ByteBuf is processed only once, sequentially, regardless of the number of needles being searched for.
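        The precomputation and search costs above can be illustrated with a compact, self-contained Aho–Corasick sketch. It is a hypothetical illustration, not Netty's AhoCorasicSearchProcessorFactory: the class DenseAhoCorasickFactory, its nested Processor and the scan helper are assumptions that only mirror the factory/processor split and the ByteProcessor-style "return false to stop" convention. Construction inserts the needles into a trie, then a breadth-first pass computes failure links and fills every missing entry of a dense 256-way transition table (linear in the number of states, with the alphabet size as a constant factor). Searching is then one table lookup per byte, with no backtracking. This sketch keeps one row per trie state including the root, so it allocates 256 * (X + 1) transition ints plus X + 1 needle-id ints, and it assumes non-empty needles.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sketch: immutable precomputed tables, the analogue of a factory. */
final class DenseAhoCorasickFactory {
    static final int ALPHABET = 256;

    final int[] next;     // next[state * 256 + (b & 0xFF)] = state after reading byte b
    final int[] needleId; // id of the longest needle ending in each state, or -1

    DenseAhoCorasickFactory(byte[]... needles) {
        int maxStates = 1; // the root, plus at most one state per needle byte
        for (byte[] needle : needles) {
            if (needle.length == 0) {
                throw new IllegalArgumentException("empty needle");
            }
            maxStates += needle.length;
        }
        int[] trie = new int[maxStates * ALPHABET];
        Arrays.fill(trie, -1);
        int[] ids = new int[maxStates];
        Arrays.fill(ids, -1);
        int states = 1; // state 0 is the root (empty prefix)

        // 1) Insert every needle into the trie; shared prefixes reuse existing states.
        for (int id = 0; id < needles.length; id++) {
            int s = 0;
            for (byte b : needles[id]) {
                int slot = s * ALPHABET + (b & 0xFF);
                if (trie[slot] == -1) {
                    trie[slot] = states++;
                }
                s = trie[slot];
            }
            ids[s] = id; // state s spells exactly needles[id]
        }

        // 2) Breadth-first pass: failure links and completion of missing transitions.
        int[] fail = new int[states];
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int c = 0; c < ALPHABET; c++) {
            if (trie[c] == -1) {
                trie[c] = 0; // a byte that starts no needle keeps us at the root
            } else {
                fail[trie[c]] = 0;
                queue.add(trie[c]);
            }
        }
        while (!queue.isEmpty()) {
            int s = queue.poll();
            if (ids[s] == -1) {
                ids[s] = ids[fail[s]]; // inherit the longest needle that is a suffix of s
            }
            for (int c = 0; c < ALPHABET; c++) {
                int slot = s * ALPHABET + c;
                int viaFail = trie[fail[s] * ALPHABET + c]; // already complete (lower depth)
                if (trie[slot] == -1) {
                    trie[slot] = viaFail;
                } else {
                    fail[trie[slot]] = viaFail;
                    queue.add(trie[slot]);
                }
            }
        }
        next = Arrays.copyOf(trie, states * ALPHABET);
        needleId = Arrays.copyOf(ids, states);
    }

    Processor newProcessor() {
        return new Processor(this);
    }

    /** Per-search mutable state, the analogue of a processor. */
    static final class Processor {
        private final DenseAhoCorasickFactory tables;
        private int state; // starts at the root

        Processor(DenseAhoCorasickFactory tables) {
            this.tables = tables;
        }

        /** Consumes one byte; returns false (stop) when some needle ends here. */
        boolean process(byte b) {
            state = tables.next[state * ALPHABET + (b & 0xFF)];
            return tables.needleId[state] == -1;
        }

        int foundNeedleId() {
            return tables.needleId[state];
        }
    }

    /** Feeds haystack[from..] to the processor; returns the stop index, or -1. */
    static int scan(byte[] haystack, int from, Processor p) {
        for (int i = from; i < haystack.length; i++) {
            if (!p.process(haystack[i])) {
                return i;
            }
        }
        return -1;
    }
}
```

        With the needles "AB", "BC" and "CD" over "ABCD", resuming each scan at the previous stop index + 1 with the same processor stops at indices 1, 2 and 3 with needle ids 0, 1 and 2, and then returns -1, matching the usage example in the class description.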
        Parameters:
        needles - a varargs array of arrays of bytes to search for
        Returns:
        a new instance of AhoCorasicSearchProcessorFactory precomputed for the given needles