org.biojava.nbio.core.sequence.loader.UniprotProxySequenceReader<C>

Type Parameters: C - the type of Compound held by the sequence

All Implemented Interfaces: Iterable<C>, DatabaseReferenceInterface, FeaturesKeyWordInterface, Accessioned, ProxySequenceReader<C>, Sequence<C>, SequenceReader<C>

public class UniprotProxySequenceReader<C extends Compound> extends Object implements ProxySequenceReader<C>, FeaturesKeyWordInterface, DatabaseReferenceInterface

Pass a UniProt ID to this ProxySequenceReader; when the reader is then passed to a ProteinSequence, it retrieves the sequence data and the other data elements that UniProt associates with the protein. This is an example of how to map external databases of proteins and features to the BioJava3 ProteinSequence. It is important to call setUniprotDirectoryCache so that downloaded XML files are cached and do not need to be reloaded each time. This class does not manage the cache.
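The caching idea mentioned above can be pictured as a lookup that checks a local directory before downloading. The following is a minimal sketch using only the JDK, not the library's implementation: the class name `XmlCacheSketch`, the `<accession>.xml` file naming, and the `downloader` function are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical helper, not part of BioJava: a directory-backed XML cache.
final class XmlCacheSketch {
    private final Path cacheDir;
    private final Function<String, String> downloader; // stand-in for the real HTTP fetch

    XmlCacheSketch(Path cacheDir, Function<String, String> downloader) {
        this.cacheDir = cacheDir;
        this.downloader = downloader;
    }

    // Returns the XML for an accession, reading the cached file when it exists
    // and otherwise downloading it and writing it into the cache directory.
    String fetch(String accession) {
        Path cached = cacheDir.resolve(accession + ".xml"); // file naming is an assumption
        try {
            if (Files.exists(cached)) {
                return new String(Files.readAllBytes(cached), StandardCharsets.UTF_8);
            }
            String xml = downloader.apply(accession);
            Files.createDirectories(cacheDir);
            Files.write(cached, xml.getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Like the class described here, the sketch never evicts or refreshes entries; cleaning the directory is left to the caller.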

Field Summary

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final Pattern | UP_AC_PATTERN | |

Constructor Summary

| Constructor | Description |
| --- | --- |
| UniprotProxySequenceReader(String accession, CompoundSet<C> compoundSet) | The UniProt id is used to retrieve the UniProt XML which is then parsed as a DOM object so we know everything about the protein. |
| UniprotProxySequenceReader(Document document, CompoundSet<C> compoundSet) | The xml is passed in as a DOM object so we know everything about the protein. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| int | countCompounds(C... compounds) | Returns the number of times we found a compound in the Sequence |
| AccessionID | getAccession() | Returns the AccessionID this location is currently bound with |
| ArrayList<AccessionID> | getAccessions() | Pulls the UniProt accessions associated with this sequence |
| ArrayList<String> | getAliases() | Pulls the UniProt protein aliases associated with this sequence |
| List<C> | getAsList() | Returns the Sequence as a List of compounds |
| C | getCompoundAt(int position) | Returns the Compound at the given biological index |
| CompoundSet<C> | getCompoundSet() | Gets the compound set used to back this Sequence |
| LinkedHashMap<String,ArrayList<DBReferenceInfo>> | getDatabaseReferences() | The UniProt mappings to other database identifiers for this sequence |
| String | getGeneName() | Get the gene name associated with this sequence. |
| int | getIndexOf(C compound) | Scans through the Sequence looking for the first occurrence of the given compound |
| SequenceView<C> | getInverse() | Does the right thing to get the inverse of the current Sequence. |
| ArrayList<String> | getKeyWords() | Pulls the UniProt keywords, a mixed bag of terms associated with this sequence |
| int | getLastIndexOf(C compound) | Scans through the Sequence looking for the last occurrence of the given compound |
| int | getLength() | The sequence length |
| String | getOrganismName() | Get the organism name assigned to this sequence |
| String | getSequenceAsString() | Returns the String representation of the Sequence |
| String | getSequenceAsString(Integer bioBegin, Integer bioEnd, Strand strand) | |
| SequenceView<C> | getSubSequence(Integer bioBegin, Integer bioEnd) | Returns the portion of the sequence between the given positions. |
| static String | getUniprotbaseURL() | The current UniProt URL to deal with caching issues. |
| static String | getUniprotDirectoryCache() | Local directory in which downloaded XML files are cached |
| Iterator<C> | iterator() | |
| static void | main(String[] args) | |
| static <C extends Compound> UniprotProxySequenceReader<C> | parseUniprotXMLString(String xml, CompoundSet<C> compoundSet) | The passed-in XML is parsed as a DOM object so we know everything about the protein. |
| void | setCompoundSet(CompoundSet<C> compoundSet) | |
| void | setContents(String sequence) | Once the sequence is retrieved, set the contents and make sure everything is valid |
| static void | setUniprotbaseURL(String aUniprotbaseURL) | |
| static void | setUniprotDirectoryCache(String aUniprotDirectoryCache) | |
| String | toString() | |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.lang.Iterable
forEach, spliterator

Field Details
- UP_AC_PATTERN
  
  public static final Pattern UP_AC_PATTERN
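
The field carries no description. As a hedged illustration of what an accession pattern can look like, the sketch below uses the regular expression that UniProt documents for UniProtKB accession numbers. Whether `UP_AC_PATTERN` is defined exactly this way is an assumption, and `AccessionPatternSketch` and `looksLikeAccession` are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
final class AccessionPatternSketch {
    // Regular expression for UniProtKB accession numbers as documented by UniProt.
    // Assumption: the library's UP_AC_PATTERN may differ from this expression.
    static final Pattern ACCESSION = Pattern.compile(
            "[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}");

    // True when the whole candidate string has the shape of an accession number.
    static boolean looksLikeAccession(String candidate) {
        return candidate != null && ACCESSION.matcher(candidate).matches();
    }
}
```

A check like this separates accession numbers such as `P12345` from entry names such as `YA745_GIBZE`, which contain an underscore and do not match.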
Constructor Details
- UniprotProxySequenceReader
  
  public UniprotProxySequenceReader(String accession, CompoundSet<C> compoundSet) throws CompoundNotFoundException, IOException
  
  The UniProt ID is used to retrieve the UniProt XML, which is then parsed as a DOM object so that everything known about the protein is available. An exception is thrown if an error occurs, for example because of a bad UniProt ID or a network error.
  
  Parameters:
  
  accession - the UniProt accession ID
  
  compoundSet - the compound set used to back the sequence
  
  Throws:
  
  CompoundNotFoundException
  
  IOException - if problems while reading the UniProt XML
- UniprotProxySequenceReader
  
  public UniprotProxySequenceReader(Document document, CompoundSet<C> compoundSet) throws CompoundNotFoundException
  
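  The XML is passed in as a DOM object so that everything known about the protein is available. An exception is thrown if an error occurs, for example because of a bad UniProt ID.
  
  Parameters:
  
  document - the UniProt XML as a DOM Document
  
  compoundSet - the compound set used to back the sequence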
  
  Throws:
  
  CompoundNotFoundException
Method Details
- parseUniprotXMLString
  
  public static <C extends Compound> UniprotProxySequenceReader<C> parseUniprotXMLString(String xml, CompoundSet<C> compoundSet)
  
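  The passed-in XML string is parsed as a DOM object so that everything known about the protein is available. An exception is thrown if an error occurs, for example because of a bad UniProt ID.
  
  Parameters:
  
  xml - the UniProt XML as a String
  
  compoundSet - the compound set used to back the sequence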
  
  Returns:
  
  UniprotProxySequenceReader
  
  Throws:
  
  Exception
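
Parsing an XML string into a DOM object, as described above, can be sketched with the JDK parser alone. This is an illustration under assumptions, not the library's code: `UniprotXmlDomSketch` and its methods are hypothetical names, and the sketch assumes the residues sit in a `sequence` element that is a direct child of an `entry` element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of BioJava.
final class UniprotXmlDomSketch {
    // Parses an XML string into a DOM Document; parse failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the text of the <sequence> child of the first <entry>, without whitespace.
    // Assumption: this element layout matches the XML being read.
    static String sequenceOf(Document document) {
        NodeList entries = document.getElementsByTagName("entry");
        if (entries.getLength() == 0) {
            throw new IllegalArgumentException("No <entry> element found");
        }
        NodeList children = entries.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "sequence".equals(child.getNodeName())) {
                return child.getTextContent().replaceAll("\\s+", "");
            }
        }
        throw new IllegalArgumentException("No <sequence> child found");
    }
}
```

Only direct children of the entry are inspected, so a nested element that happens to share the name is not mistaken for the protein sequence.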
- setCompoundSet
  
  public void setCompoundSet(CompoundSet<C> compoundSet)
  
  Specified by:
  
  setCompoundSet in interface SequenceReader<C extends Compound>
- setContents
  
  public void setContents(String sequence) throws CompoundNotFoundException
  
  Once the sequence is retrieved, set the contents and make sure everything is valid.
  
  Specified by:
  
  setContents in interface SequenceReader<C extends Compound>
  
  Parameters:
  
  sequence - the sequence as a String
  
  Throws:
  
  CompoundNotFoundException
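
The validation step can be pictured as checking every character of the sequence against the compound set. The sketch below is an assumption-laden stand-in, not the library's logic: `ContentsValidationSketch` is a hypothetical name, the 20 standard amino acid letters replace a real CompoundSet, and an unchecked exception replaces CompoundNotFoundException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava.
final class ContentsValidationSketch {
    // The 20 standard amino acid one-letter codes; a simplified stand-in for a CompoundSet.
    private static final String ALLOWED = "ACDEFGHIKLMNPQRSTVWY";

    // Converts the sequence into a list of one-letter compounds and rejects any
    // character that the stand-in compound set does not contain.
    static List<Character> parse(String sequence) {
        List<Character> compounds = new ArrayList<>(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (ALLOWED.indexOf(c) < 0) {
                // The library declares CompoundNotFoundException for setContents;
                // an unchecked exception keeps this sketch self-contained.
                throw new IllegalArgumentException(
                        "Unknown compound '" + c + "' at position " + (i + 1));
            }
            compounds.add(c);
        }
        return compounds;
    }
}
```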
- getLength
  
  public int getLength()
  
  The sequence length
  
  Specified by:
  
  getLength in interface Sequence<C extends Compound>
  
  Returns:
- getCompoundAt
  
  public C getCompoundAt(int position)
  
  Description copied from interface: Sequence
  
  Returns the Compound at the given biological index
  
  Specified by:
  
  getCompoundAt in interface Sequence<C extends Compound>
  
  Parameters:
  
  position -
  
  Returns:
- getIndexOf
  
  public int getIndexOf(C compound)
  
  Description copied from interface: Sequence
  
  Scans through the Sequence looking for the first occurrence of the given compound
  
  Specified by:
  
  getIndexOf in interface Sequence<C extends Compound>
  
  Parameters:
  
  compound -
  
  Returns:
- getLastIndexOf
  
  public int getLastIndexOf(C compound)
  
  Description copied from interface: Sequence
  
  Scans through the Sequence looking for the last occurrence of the given compound
  
  Specified by:
  
  getLastIndexOf in interface Sequence<C extends Compound>
  
  Parameters:
  
  compound -
  
  Returns:
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
  
  Returns:
- getSequenceAsString
  
  public String getSequenceAsString()
  
  Description copied from interface: Sequence
  
  Returns the String representation of the Sequence
  
  Specified by:
  
  getSequenceAsString in interface Sequence<C extends Compound>
  
  Returns:
- getAsList
  
  public List<C> getAsList()
  
  Description copied from interface: Sequence
  
  Returns the Sequence as a List of compounds
  
  Specified by:
  
  getAsList in interface Sequence<C extends Compound>
  
  Returns:
- getInverse
  
  public SequenceView<C> getInverse()
  
  Description copied from interface: Sequence
  
  Does the right thing to get the inverse of the current Sequence. This means reversing the Sequence and optionally complementing it.
  
  Specified by:
  
  getInverse in interface Sequence<C extends Compound>
  
  Returns:
  
  a view of the inverse of this Sequence
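
The phrase "reversing and optionally complementing" can be made concrete with plain strings. This is a sketch under assumptions rather than the library's implementation: `InverseSketch` is a hypothetical name, protein sequences are assumed to be reversed only, and complementing is shown for DNA letters.

```java
// Hypothetical helper, not part of BioJava.
final class InverseSketch {
    // Reverses a sequence without complementing it (assumed behaviour for proteins).
    static String reverse(String sequence) {
        return new StringBuilder(sequence).reverse().toString();
    }

    // Reverses a DNA sequence and complements each base.
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            out.append(complement(dna.charAt(i)));
        }
        return out.toString();
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }
}
```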
- getSequenceAsString
  
  public String getSequenceAsString(Integer bioBegin, Integer bioEnd, Strand strand)
  
  Parameters:
  
  bioBegin -
  
  bioEnd -
  
  strand -
  
  Returns:
- getSubSequence
  
  public SequenceView<C> getSubSequence(Integer bioBegin, Integer bioEnd)
  
  Description copied from interface: Sequence
  
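  Returns the portion of the sequence between the given positions. Positions are indexed from 1.
  
  Specified by:
  
  getSubSequence in interface Sequence<C extends Compound>
  
  Parameters:
  
  bioBegin - the 1-based start position
  
  bioEnd - the 1-based end position
  
  Returns:
  
  a view of the requested portion of the sequence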
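
Indexing from 1 differs from Java's usual 0-based indexing, so a small translation is needed when working with raw strings. The sketch below shows that translation; `BioIndexSketch` is a hypothetical name, and treating `bioEnd` as inclusive is an assumption.

```java
// Hypothetical helper, not part of BioJava.
final class BioIndexSketch {
    // Returns the residues from bioBegin to bioEnd, both 1-based.
    // Assumption: bioEnd is inclusive.
    static String subSequence(String sequence, int bioBegin, int bioEnd) {
        if (bioBegin < 1 || bioEnd > sequence.length() || bioBegin > bioEnd) {
            throw new IndexOutOfBoundsException(
                    "Invalid range " + bioBegin + ".." + bioEnd + " for length " + sequence.length());
        }
        // substring is 0-based with an exclusive end, hence the shift on the start only.
        return sequence.substring(bioBegin - 1, bioEnd);
    }

    // Returns the residue at a 1-based position.
    static char compoundAt(String sequence, int position) {
        return sequence.charAt(position - 1);
    }
}
```

Under these assumptions, `subSequence("MKTAYIAK", 2, 4)` yields `KTA` and `compoundAt("MKTAYIAK", 1)` yields `M`.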
- iterator
  
  public Iterator<C> iterator()
  
  Specified by:
  
  iterator in interface Iterable<C extends Compound>
  
  Returns:
- getCompoundSet
  
  public CompoundSet<C> getCompoundSet()
  
  Description copied from interface: Sequence
  
  Gets the compound set used to back this Sequence
  
  Specified by:
  
  getCompoundSet in interface Sequence<C extends Compound>
  
  Returns:
- getAccession
  
  public AccessionID getAccession()
  
  Description copied from interface: Accessioned
  
  Returns the AccessionID this location is currently bound with
  
  Specified by:
  
  getAccession in interface Accessioned
  
  Returns:
- getAccessions
  
  public ArrayList<AccessionID> getAccessions() throws XPathExpressionException
  
  Pulls the UniProt accessions associated with this sequence
  
  Returns:
  
  Throws:
  
  XPathExpressionException
- getAliases
  
  public ArrayList<String> getAliases() throws XPathExpressionException
  
  Pulls the UniProt protein aliases associated with this sequence
  
  Returns:
  
  Throws:
  
  XPathExpressionException
- countCompounds
  
  public int countCompounds(C... compounds)
  
  Description copied from interface: Sequence
  
  Returns the number of times we found a compound in the Sequence
  
  Specified by:
  
  countCompounds in interface Sequence<C extends Compound>
  
  Parameters:
  
  compounds - the compounds to look for
  
  Returns:
  
  the number of occurrences found
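
Counting with a varargs parameter can be sketched over a plain string. This is an illustration only: `CountCompoundsSketch` is a hypothetical name, and the assumption is that a position counts once when its residue equals any of the given compounds.

```java
// Hypothetical helper, not part of BioJava.
final class CountCompoundsSketch {
    // Counts the positions whose residue equals any of the given compounds.
    static int countCompounds(String sequence, char... compounds) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char residue = sequence.charAt(i);
            for (char compound : compounds) {
                if (residue == compound) {
                    count++;
                    break; // count each position at most once
                }
            }
        }
        return count;
    }
}
```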
- getUniprotbaseURL
  
  public static String getUniprotbaseURL()
  
  The current UniProt base URL, which can be changed to deal with caching issues. www.uniprot.org is load balanced, but pir.uniprot.org can be accessed directly.
  
  Returns:
  
  the uniprotbaseURL
- setUniprotbaseURL
  
  public static void setUniprotbaseURL(String aUniprotbaseURL)
  
  Parameters:
  
  aUniprotbaseURL - the uniprotbaseURL to set
- getUniprotDirectoryCache
  
  public static String getUniprotDirectoryCache()
  
  Local directory in which downloaded XML files are cached
  
  Returns:
  
  the uniprotDirectoryCache
- setUniprotDirectoryCache
  
  public static void setUniprotDirectoryCache(String aUniprotDirectoryCache)
  
  Parameters:
  
  aUniprotDirectoryCache - the uniprotDirectoryCache to set
- main
  
  public static void main(String[] args)
- getGeneName
  
  public String getGeneName()
  
  Get the gene name associated with this sequence.
  
  Returns:
- getOrganismName
  
  public String getOrganismName()
  
  Get the organism name assigned to this sequence
  
  Returns:
- getKeyWords
  
  public ArrayList<String> getKeyWords()
  
  Pulls the UniProt keywords, a mixed bag of terms associated with this sequence
  
  Specified by:
  
  getKeyWords in interface FeaturesKeyWordInterface
  
  Returns:
- getDatabaseReferences
  
  public LinkedHashMap<String,ArrayList<DBReferenceInfo>> getDatabaseReferences()
  
  The Uniprot mappings to other database identifiers for this sequence
  
  Specified by:
  
  getDatabaseReferences in interface DatabaseReferenceInterface
  
  Returns:
