Package io.netty.buffer.search
Class AbstractSearchProcessorFactory
- java.lang.Object
-
- io.netty.buffer.search.AbstractSearchProcessorFactory
-
- All Implemented Interfaces:
SearchProcessorFactory
- Direct Known Subclasses:
BitapSearchProcessorFactory,KmpSearchProcessorFactory
public abstract class AbstractSearchProcessorFactory extends java.lang.Object implements SearchProcessorFactory
Base class for precomputed factories that createSearchProcessors.
Different factories implement different search algorithms with performance characteristics that depend on a use case, so it is advisable to benchmark a concrete use case with different algorithms before choosing one of them.
A concrete instance ofAbstractSearchProcessorFactoryis built for searching for a concrete sequence of bytes (theneedle), it contains precomputed data needed to perform the search, and is meant to be reused whenever searching for the sameneedle.
Note: implementations ofSearchProcessorscan theByteBufsequentially, one byte after another, without doing any random access. As a result, when usingSearchProcessorwith such methods asByteBuf.forEachByte(io.netty.util.ByteProcessor), these methods return the index of the last byte of the found byte sequence within theByteBuf(which might feel counterintuitive, and different fromByteBufUtil.indexOf(io.netty.buffer.ByteBuf, io.netty.buffer.ByteBuf)which returns the index of the first byte of found sequence).
ASearchProcessoris implemented as a Finite State Automaton that contains a small internal state which is updated with every byte processed. As a result, an instance ofSearchProcessorshould not be reused across independent search sessions (eg. for searching in differentByteBufs). A new instance should be created withAbstractSearchProcessorFactoryfor every search session. However, aSearchProcessorcan (and should) be reused within the search session, eg. when searching for all occurrences of theneedlewithin the samehaystack. That way, it can also detect overlapping occurrences of theneedle(eg. a string "ABABAB" contains two occurrences of "BAB" that overlap by one character "B"). For this to work correctly, after an occurrence of theneedleis found ending at indexidx, the search should continue starting from the indexidx + 1.
Example (given that thehaystackis aByteBufcontaining "ABABAB" and theneedleis "BAB"):SearchProcessorFactory factory = SearchProcessorFactory.newKmpSearchProcessorFactory(needle.getBytes(CharsetUtil.UTF_8)); SearchProcessor processor = factory.newSearchProcessor(); int idx1 = haystack.forEachByte(processor); // idx1 is 3 (index of the last character of the first occurrence of the needle in the haystack) int continueFrom1 = idx1 + 1; // continue the search starting from the next character int idx2 = haystack.forEachByte(continueFrom1, haystack.readableBytes() - continueFrom1, processor); // idx2 is 5 (index of the last character of the second occurrence of the needle in the haystack) int continueFrom2 = idx2 + 1; // continue the search starting from the next character int idx3 = haystack.forEachByte(continueFrom2, haystack.readableBytes() - continueFrom2, processor); // idx3 is -1 (no more occurrences of the needle) // After this search session is complete, processor should be discarded. // To search for the same needle again, reuse the same factory to get a new SearchProcessor.
-
-
Constructor Summary
Constructors Constructor Description AbstractSearchProcessorFactory()
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static BitapSearchProcessorFactorynewBitapSearchProcessorFactory(byte[] needle)Creates aSearchProcessorFactorybased on Bitap string search algorithm.static KmpSearchProcessorFactorynewKmpSearchProcessorFactory(byte[] needle)Creates aSearchProcessorFactorybased on Knuth-Morris-Pratt string search algorithm.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface io.netty.buffer.search.SearchProcessorFactory
newSearchProcessor
-
-
-
-
Method Detail
-
newKmpSearchProcessorFactory
public static KmpSearchProcessorFactory newKmpSearchProcessorFactory(byte[] needle)
Creates aSearchProcessorFactorybased on Knuth-Morris-Pratt string search algorithm. It is a reasonable default choice among the provided algorithms.
Precomputation (this method) time is linear in the size of input (O(|needle|)).
The factory allocates and retains an int array of sizeneedle.length + 1, and retains a reference to theneedleitself.
Search (the actual application ofSearchProcessor) time is linear in the size ofByteBufon which the search is performed (O(|haystack|)). Every byte ofByteBufis processed only once, sequentially.- Parameters:
needle- an array of bytes to search for- Returns:
- a new instance of
KmpSearchProcessorFactoryprecomputed for the givenneedle
-
newBitapSearchProcessorFactory
public static BitapSearchProcessorFactory newBitapSearchProcessorFactory(byte[] needle)
Creates aSearchProcessorFactorybased on Bitap string search algorithm. It is a jump free algorithm that has very stable performance (the contents of the inputs have a minimal effect on it). The limitation is that theneedlecan be no more than 64 bytes long.
Precomputation (this method) time is linear in the size of the input (O(|needle|)).
The factory allocates and retains a long[256] array.
Search (the actual application ofSearchProcessor) time is linear in the size ofByteBufon which the search is performed (O(|haystack|)). Every byte ofByteBufis processed only once, sequentially.- Parameters:
needle- an array of no more than 64 bytes to search for- Returns:
- a new instance of
BitapSearchProcessorFactoryprecomputed for the givenneedle
-
-