public abstract class AbstractSearchProcessorFactory extends Object implements SearchProcessorFactory
SearchProcessor
s.
AbstractSearchProcessorFactory
is built for searching for a concrete sequence of bytes
(the needle
), it contains precomputed data needed to perform the search, and is meant to be reused
whenever searching for the same needle
.
SearchProcessor
scan the ByteBuf
sequentially,
one byte after another, without doing any random access. As a result, when using SearchProcessor
with such methods as ByteBuf.forEachByte(io.netty.util.ByteProcessor)
, these methods return the index of the last byte
of the found byte sequence within the ByteBuf
(which might feel counterintuitive,
and different from ByteBufUtil.indexOf(io.netty.buffer.ByteBuf, io.netty.buffer.ByteBuf)
which returns the index of the first byte
of found sequence).
SearchProcessor
is implemented as a
Finite State Automaton that contains a
small internal state which is updated with every byte processed. As a result, an instance of SearchProcessor
should not be reused across independent search sessions (eg. for searching in different
ByteBuf
s). A new instance should be created with AbstractSearchProcessorFactory
for
every search session. However, a SearchProcessor
can (and should) be reused within the search session,
eg. when searching for all occurrences of the needle
within the same haystack
. That way, it can
also detect overlapping occurrences of the needle
(eg. a string "ABABAB" contains two occurrences of "BAB"
that overlap by one character "B"). For this to work correctly, after an occurrence of the needle
is
found ending at index idx
, the search should continue starting from the index idx + 1
.
haystack
is a ByteBuf
containing "ABABAB" and
the needle
is "BAB"):
SearchProcessorFactory factory = SearchProcessorFactory.newKmpSearchProcessorFactory(needle.getBytes(CharsetUtil.UTF_8)); SearchProcessor processor = factory.newSearchProcessor(); int idx1 = haystack.forEachByte(processor); // idx1 is 3 (index of the last character of the first occurrence of the needle in the haystack) int continueFrom1 = idx1 + 1; // continue the search starting from the next character int idx2 = haystack.forEachByte(continueFrom1, haystack.readableBytes() - continueFrom1, processor); // idx2 is 5 (index of the last character of the second occurrence of the needle in the haystack) int continueFrom2 = idx2 + 1; // continue the search starting from the next character int idx3 = haystack.forEachByte(continueFrom2, haystack.readableBytes() - continueFrom2, processor); // idx3 is -1 (no more occurrences of the needle) // After this search session is complete, processor should be discarded. // To search for the same needle again, reuse the same factory to get a new SearchProcessor.
Constructor and Description |
---|
AbstractSearchProcessorFactory() |
Modifier and Type | Method and Description |
---|---|
static BitapSearchProcessorFactory |
newBitapSearchProcessorFactory(byte[] needle)
Creates a
SearchProcessorFactory based on Bitap string search algorithm. |
static KmpSearchProcessorFactory |
newKmpSearchProcessorFactory(byte[] needle)
Creates a
SearchProcessorFactory based on
Knuth-Morris-Pratt
string search algorithm. |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
newSearchProcessor
public static KmpSearchProcessorFactory newKmpSearchProcessorFactory(byte[] needle)
SearchProcessorFactory
based on
Knuth-Morris-Pratt
string search algorithm. It is a reasonable default choice among the provided algorithms.
O(|needle|)
).
needle.length + 1
, and retains a reference
to the needle
itself.
SearchProcessor
) time is linear in the size of
ByteBuf
on which the search is performed (O(|haystack|)
).
Every byte of ByteBuf
is processed only once, sequentially.needle
- an array of bytes to search forKmpSearchProcessorFactory
precomputed for the given needle
public static BitapSearchProcessorFactory newBitapSearchProcessorFactory(byte[] needle)
SearchProcessorFactory
based on Bitap string search algorithm.
It is a jump free algorithm that has very stable performance (the contents of the inputs have a minimal
effect on it). The limitation is that the needle
can be no more than 64 bytes long.
O(|needle|)
).
SearchProcessor
) time is linear in the size of
ByteBuf
on which the search is performed (O(|haystack|)
).
Every byte of ByteBuf
is processed only once, sequentially.needle
- an array of no more than 64 bytes to search forBitapSearchProcessorFactory
precomputed for the given needle
Copyright © 2008–2024 The Netty Project. All rights reserved.