Class BitSequenceReader.BitArrayWorker<C extends Compound>
java.lang.Object
org.biojava.nbio.core.sequence.storage.BitSequenceReader.BitArrayWorker<C>
- Type Parameters:
C - The Compound to use
- Direct Known Subclasses:
FourBitSequenceReader.FourBitArrayWorker, TwoBitSequenceReader.TwoBitArrayWorker
- Enclosing class:
- BitSequenceReader<C extends Compound>
The bit-manipulation logic has been separated out into this class so that
developers can create the bit data structures directly, without first
putting the sequence into an intermediate format, and can reuse the format
without copying this code.
This class behaves just like a Sequence, but without implementing the interface.
- Author:
- ayates
-
Field Summary
-
Constructor Summary
Constructor
BitArrayWorker(String sequence, CompoundSet<C> compoundSet)
BitArrayWorker(CompoundSet<C> compoundSet, int length)
BitArrayWorker(CompoundSet<C> compoundSet, int[] sequence)
BitArrayWorker(Sequence<C> sequence)
-
Method Summary
Modifier and Type, Method: Description
protected abstract byte bitMask(): This method should return the bit mask to be used to extract the bytes you are interested in working with.
protected int bitsPerCompound(): Returns how many bits are used to represent a compound, e.g. 2 if using 2bit encoding.
protected abstract int compoundsPerDatatype(): Should return the maximum number of compounds we can encode per int.
boolean equals
generateCompoundsToIndex(): Returns what the value of a compound is in the backing bit storage, i.e. in 2bit storage the value 0 is encoded as 00 (in binary).
generateIndexToCompounds: Should return the inverse information that generateCompoundsToIndex() returns.
getCompoundAt(int position): Returns the compound at the specified biological index.
getCompoundSet: Returns the compound set backing this store.
getCompoundsToIndexLookup: Returns a map which converts from compound to an integer representation.
getIndexToCompoundsLookup: Returns a list of compounds, the index position of which is used to translate from the byte representation into a compound.
int getLength()
int hashCode()
void populate: Loops through the chars in a String and passes them on to setCompoundAt(char, int).
void populate: Loops through the Compounds in a Sequence and passes them on to setCompoundAt(Compound, int).
protected byte processUnknownCompound(C compound, int position): Since bit encoding only supports a finite number of bases, it is more than likely that when processing a sequence you will encounter a compound which is not covered by the encoding, e.g. N in a 2bit sequence.
int seqArraySize(int length)
void setCompoundAt(char base, int position): Converts from char to Compound and sets it at the given biological index.
void setCompoundAt(C compound, int position): Sets the compound at the specified biological index.
-
Field Details
-
BYTES_PER_INT
public static final int BYTES_PER_INT
- See Also:
-
-
Constructor Details
-
BitArrayWorker
-
BitArrayWorker
-
BitArrayWorker
-
BitArrayWorker
-
-
Method Details
-
bitMask
protected abstract byte bitMask()
This method should return the bit mask to be used to extract the bytes you are interested in working with. See the concrete implementations for how to create these.
-
compoundsPerDatatype
protected abstract int compoundsPerDatatype()
Should return the maximum number of compounds we can encode per int.
-
generateIndexToCompounds
Should return the inverse information that generateCompoundsToIndex() returns, i.e. if the Compound C returns 1 from compoundsToIndex then we should find that compound here in position 1.
-
generateCompoundsToIndex
Returns what the value of a compound is in the backing bit storage, i.e. in 2bit storage the value 0 is encoded as 00 (in binary).
-
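The inverse relationship between the two generate methods can be sketched as follows. This is a minimal illustration and not the BioJava implementation: the class LookupSketch is hypothetical, plain Strings stand in for Compound objects, and the A, C, G, T ordering is an assumption chosen only for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an index-to-compound list and its inverse map. */
final class LookupSketch {

    /** Position in the list is the stored bit value (assumed ordering). */
    static List<String> indexToCompounds() {
        return Arrays.asList("A", "C", "G", "T");
    }

    /** Builds the inverse: each compound maps to its position in the list. */
    static Map<String, Integer> compoundsToIndex(List<String> indexToCompounds) {
        Map<String, Integer> lookup = new HashMap<>();
        for (int i = 0; i < indexToCompounds.size(); i++) {
            lookup.put(indexToCompounds.get(i), i);
        }
        return lookup;
    }

    public static void main(String[] args) {
        List<String> byIndex = indexToCompounds();
        Map<String, Integer> byCompound = compoundsToIndex(byIndex);
        // "C" maps to 1, so position 1 of the list must hold "C".
        System.out.println(byIndex.get(byCompound.get("C")));
    }
}
```

Deriving one structure from the other, as done here, keeps the two lookups consistent by construction.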
bitsPerCompound
protected int bitsPerCompound()
Returns how many bits are used to represent a compound, e.g. 2 if using 2bit encoding.
-
seqArraySize
public int seqArraySize(int length)
-
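The arithmetic connecting bits per compound, compounds per int, and the size of the backing array can be sketched as below. This is a self-contained illustration under stated assumptions, not the BioJava source: the class PackingArithmetic and its method names are hypothetical, and rounding the array size up is an assumption about how a packed int[] would need to be sized.

```java
/** Hypothetical helper illustrating the packing arithmetic; not part of BioJava. */
final class PackingArithmetic {

    /** Number of bits in a Java int. */
    static final int BITS_PER_INT = Integer.SIZE;

    /** How many compounds fit in one int for a given encoding width. */
    static int compoundsPerInt(int bitsPerCompound) {
        return BITS_PER_INT / bitsPerCompound;
    }

    /** Number of ints needed to hold the given number of compounds, rounding up. */
    static int arraySize(int length, int compoundsPerInt) {
        return (length + compoundsPerInt - 1) / compoundsPerInt;
    }

    public static void main(String[] args) {
        int perInt = compoundsPerInt(2);           // 2bit encoding
        System.out.println(perInt);                // prints 16
        System.out.println(arraySize(17, perInt)); // prints 2
    }
}
```

With 2bit encoding a 17-compound sequence spills into a second int, which is why the division has to round up rather than truncate.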
populate
Loops through the Compounds in a Sequence and passes them on to setCompoundAt(Compound, int).
-
populate
Loops through the chars in a String and passes them on to setCompoundAt(char, int).
-
setCompoundAt
public void setCompoundAt(char base, int position)
Converts from char to Compound and sets it at the given biological index.
-
setCompoundAt
Sets the compound at the specified biological index.
-
getCompoundAt
Returns the compound at the specified biological index.
-
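How populate, setCompoundAt and getCompoundAt might cooperate over a packed int array can be sketched as follows. This is a simplified stand-in, not the BioJava code: the class TwoBitStore is hypothetical, it works on chars rather than Compound objects, the A, C, G, T ordering is assumed, and treating the biological index as 1-based is also an assumption.

```java
/** Hypothetical 2bit packed store; illustrates the idea only. */
final class TwoBitStore {
    private static final int BITS = 2;                      // bits per compound
    private static final int PER_INT = Integer.SIZE / BITS; // 16 compounds per int
    private static final int MASK = 0b11;                   // isolates one 2bit value
    private static final String ALPHABET = "ACGT";          // index = stored value (assumed order)

    private final int[] data;
    private final int length;

    TwoBitStore(int length) {
        this.length = length;
        this.data = new int[(length + PER_INT - 1) / PER_INT];
    }

    int getLength() {
        return length;
    }

    /** Loops through the chars and stores each at its 1-based position. */
    void populate(String sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            setCompoundAt(sequence.charAt(i), i + 1);
        }
    }

    /** Clears the 2bit slot for the position, then writes the new value. */
    void setCompoundAt(char base, int position) {
        int value = ALPHABET.indexOf(Character.toUpperCase(base));
        if (value < 0) {
            throw new IllegalStateException("Cannot encode " + base);
        }
        int zeroBased = position - 1;
        int slot = zeroBased / PER_INT;
        int shift = (zeroBased % PER_INT) * BITS;
        data[slot] = (data[slot] & ~(MASK << shift)) | (value << shift);
    }

    /** Shifts the slot down and masks off everything but one 2bit value. */
    char getCompoundAt(int position) {
        int zeroBased = position - 1;
        int slot = zeroBased / PER_INT;
        int shift = (zeroBased % PER_INT) * BITS;
        return ALPHABET.charAt((data[slot] >>> shift) & MASK);
    }
}
```

The unsigned shift (>>>) matters for the last compound in each int, whose bits occupy the sign position.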
processUnknownCompound
Since bit encoding only supports a finite number of bases, it is more than likely that when processing a sequence you will encounter a compound which is not covered by the encoding, e.g. N in a 2bit sequence. You can override this to convert the unknown base into one you can process, or to store the locations of unknown bases for a level of post-processing in your subclass.
- Parameters:
compound - Compound to process
- Returns:
Byte representation of the compound
- Throws:
IllegalStateException - Done whenever this method is invoked
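The override pattern described above can be sketched with stand-in classes. Nothing here is BioJava API: StrictEncoder and LenientEncoder are hypothetical, chars replace Compound objects, and mapping an unknown base to value 0 is an arbitrary choice made only for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical encoder whose default is to reject unknown bases. */
class StrictEncoder {
    byte encode(char base, int position) {
        int index = "ACGT".indexOf(base);
        return index >= 0 ? (byte) index : processUnknown(base, position);
    }

    /** Default behaviour: always throw, mirroring the documented contract. */
    byte processUnknown(char base, int position) {
        throw new IllegalStateException(
                "Cannot encode " + base + " at position " + position);
    }
}

/** Hypothetical subclass that records unknown positions and substitutes a value. */
class LenientEncoder extends StrictEncoder {
    final List<Integer> unknownPositions = new ArrayList<>();

    @Override
    byte processUnknown(char base, int position) {
        unknownPositions.add(position); // kept for later post-processing
        return 0;                       // arbitrary substitute (assumption)
    }
}
```

Recording the positions lets a caller restore or report the unknown bases later, at the cost of extra storage outside the packed array.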
-
getIndexToCompoundsLookup
Returns a list of compounds, the index position of which is used to translate from the byte representation into a compound.
-
getCompoundsToIndexLookup
Returns a map which converts from compound to an integer representation.
-
getCompoundSet
Returns the compound set backing this store.
-
getLength
public int getLength()
-
hashCode
public int hashCode()
-
equals
-