SIDDbase

By | September 22, 2011

SIDDbase-WS is a SOAP based Web Service created in a collaboration between the
Comparative Microbial Genomics Group at CBS, The Technical University of Denmark and
Prof. Craig Benham’s research group at the UC Davis Genome Center. It provides
interoperable access to the SIDD software, and access to the repository of stored
results from calculations previously performed on complete bacterial genomes.

SIDD (Stress-induced DNA Duplex Destabilization) is the propensity for the DNA duplex to
be destabilized within genomic regions that are experiencing a superhelical stress. This
is a complex, interactive attribute of genomic DNA, that has been implicated in a wide
variety of regulatory processes. Different strategies are used to calculate SIDD
properties of short (i.e. < ~10kb) regions (Bi C, Benham CJ., WebSIDD: A server for
predicting stress-induced duplex destabilized (SIDD) sites in superhelical DNA,
Bioinformatics, 2004 Jun 12;20(9):1477-9), and of long genomic sequences, up to complete
chromosomes (Benham CJ and Bi C, The Analysis of Stress-Induced Duplex Destabilization
in Long Genomic DNA Sequences, J. Comp. Biol. 11: 519-543, 2004). The extent of
destabilization is given by sigma, the superhelix density. For each base pair and each
sigma value two results are reported. These are the probability p(x) of base pair x
being open (i.e. separated) under the given conditions, and the increment G(x) of free
energy needed to insure that base pair x is always open then.

This service consist of two parts:

1. REPOSITORY A repository of pre-calculated SIDD values (for four different sigma
values -0.025, -0.035,-0.045,-0.055,-0.065). This repository is updated daily by
synchronizing prokaryotic genome sequences against NCBI Entrez Genome Projects
(http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). Any new or changed genomes will be
updated automatically every day. The DNA geometry (i.e., whether the chromosome is
circular or linear) is by default circular, which will be used except in the cases where
the GenBank record indicates a linear chromosome. This part of the service runs
synchronously, which means that the user will receive the output directly as a response
to a request message

2. CALCULATION SERVICE An SOAP compliant interface for making SIDD predictions. This
part of the service runs /a/synchronously, and is therefore split into three different operations
submit, poll, and fetch operations

REPOSITORY
1. getSIDD
Get the genomic sequence by specifying the GenBank accession number
Input:
* ‘accession’ – The Genbank accession number (e.g. AL111168)
of the genome sequence you want to fetch from the
repository.
* ‘sigma’ – The sigma value of the prediction (-0.025,-0.035,-0.045,-0.055,-0.065)
* ‘energetics’ – Energetics used in the calculations. Only copolymeric is supported
by the repository (c)
* ‘weight’ – The overlap/weighting scheme. Only 10 is supported by the repository (10)
* ‘from’ – (optional) From (and including this) postion
* ‘to’ – (optional) To (and including this) postion
* ‘Gformat’ – e.g. ‘%.2f’
* ‘Pformat’ – e.g. ‘%.2e’
* ‘format’ – Either ‘string’ or ‘element’ – determins if output should be
provided as XML entities are comma separated strings. The latter
is much faster.
Output:
* ‘accession’ – The ganbank accession of the record used.
* ‘digest’ – The MD5 checksum of the entire genomic DNA sequence.
* ‘from’ – From (and including this) postion.
* ‘to’ – To (and including this) postion.
* ‘total_length’ – Total length of genome sequence.
* ‘version’ – SIDD back-end program version.
* ‘method’ – The energetics method used.
* ‘weight’ – The overlap/weighting scheme.
* ‘sigma’ – The sigma value of the prediction.

The following two elements consists of a choice (depending on
‘format’ specified in request)
* ‘element’
* ‘x’ – Absolute chromosomal position
* ‘nt’ – The nucleotide at position ‘x’
* ‘P’ – The helix opening probability.
* ‘G’ – Free energy, G.
* ‘string’
* ‘P’ – Comma separated list of probabilities
* ‘G’ – Comma separated list of free energies

CALCULATION
2a. runService
Submit a custom DNA sequnce to the SIDD program
Input:
* ‘sequencedata’
* ‘sequence’
* ‘seq’ – The raw DNA sequence
* ‘id’ – Any sequence identifier
* ‘sigma’ – The sigma value of the prediction (e.g. -0.055)
* ‘energetics’ – Either copolymeric (c) or nearest-neighbor energetics (n)
* ‘weight’ – The overlap/weighting scheme. Only 10 is supported by the repository (10)
* ‘Gformat’ – e.g. ‘%.2f’
* ‘Pformat’ – e.g. ‘%.2e’
* ‘format’ – Either ‘string’ or ‘element’ – determins if output should be
provided as XML entities are comma separated strings. The latter
is much faster.
Output:
* ‘jobid’ – The 32 byte identification string of the job
* ‘datetime’ – The last timepoint at which the status of the job has changed
* ‘status’ – Possible values are QUEUED, ACTIVE, FINISHED, WAITING, REJECTED,
UNKNOWN JOBID or QUEUE DOWN

2b. pollQueue
Once obtained from ‘runService’, a job identification can be used to poll the
status to see if the result is ready for download.

Input:
* ‘jobid’ – The 32 byte identification string of the job
Output:
* ‘jobid’ – The 32 byte identification string of the job
* ‘datetime’ – The last timepoint at which the status of the job has changed
* ‘status’ – Possible values are QUEUED, ACTIVE, FINISHED, WAITING, REJECTED,
UNKNOWN JOBID or QUEUE DOWN

2c. fetchResult
Once the status is ‘FINISHED’ the results generated by the Web Service can be retrieved by
specifying the jobid;

Input:
* ‘jobid’ – The 32 byte identification string of the job
Output:
* ‘total_length’ – Total length of genome sequence.
* ‘version’ – SIDD back-end program version.
* ‘method’ – The energetics method used.
* ‘weight’ – The overlap/weighting scheme.
* ‘sigma’ – The sigma value of the prediction.

The following two elements consists of a choice (depending on
‘format’ specified in request)
* ‘element’
* ‘x’ – Absolute chromosomal position
* ‘nt’ – The nucleotide at position ‘x’
* ‘P’ – The helix opening probability.
* ‘G’ – Free energy, G.
* ‘string’
* ‘P’ – Comma separated list of probabilities
* ‘G’ – Comma separated list of free energies

For more information, please contact Peter F. Hallin: pfh@cbs.dtu.dk

Name
SIDDbase
Documentation
http://www.cbs.dtu.dk/ws/ws.php?entry=SIDDbase
Protocol
SOAP
WSDL
Endpoint
http://ws.cbs.dtu.dk/cgi-bin/soap/ws/simple.cgi
Topic
Biology
Type
Tags
, ,
Description

SIDDbase-WS is a SOAP based Web Service created in a collaboration between the Comparative Microbial Genomics Group at CBS, The [...]

Further information

SIDDbase-WS is a SOAP based Web Service created in a collaboration between the
Comparative Microbial Genomics Group at CBS, The Technical University of Denmark and
Prof. Craig Benham’s research group at the UC Davis Genome Center. It provides
interoperable access to the SIDD software, and access to the repository of stored
results from calculations previously performed on complete bacterial genomes.

SIDD (Stress-induced DNA Duplex Destabilization) is the propensity for the DNA duplex to
be destabilized within genomic regions that are experiencing a superhelical stress. This
is a complex, interactive attribute of genomic DNA, that has been implicated in a wide
variety of regulatory processes. Different strategies are used to calculate SIDD
properties of short (i.e. < ~10kb) regions (Bi C, Benham CJ., WebSIDD: A server for
predicting stress-induced duplex destabilized (SIDD) sites in superhelical DNA,
Bioinformatics, 2004 Jun 12;20(9):1477-9), and of long genomic sequences, up to complete
chromosomes (Benham CJ and Bi C, The Analysis of Stress-Induced Duplex Destabilization
in Long Genomic DNA Sequences, J. Comp. Biol. 11: 519-543, 2004). The extent of
destabilization is given by sigma, the superhelix density. For each base pair and each
sigma value two results are reported. These are the probability p(x) of base pair x
being open (i.e. separated) under the given conditions, and the increment G(x) of free
energy needed to insure that base pair x is always open then.

This service consist of two parts:

1. REPOSITORY A repository of pre-calculated SIDD values (for four different sigma
values -0.025, -0.035,-0.045,-0.055,-0.065). This repository is updated daily by
synchronizing prokaryotic genome sequences against NCBI Entrez Genome Projects
(http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). Any new or changed genomes will be
updated automatically every day. The DNA geometry (i.e., whether the chromosome is
circular or linear) is by default circular, which will be used except in the cases where
the GenBank record indicates a linear chromosome. This part of the service runs
synchronously, which means that the user will receive the output directly as a response
to a request message

2. CALCULATION SERVICE An SOAP compliant interface for making SIDD predictions. This
part of the service runs /a/synchronously, and is therefore split into three different operations
submit, poll, and fetch operations

REPOSITORY
1. getSIDD
Get the genomic sequence by specifying the GenBank accession number
Input:
* ‘accession’ – The Genbank accession number (e.g. AL111168)
of the genome sequence you want to fetch from the
repository.
* ‘sigma’ – The sigma value of the prediction (-0.025,-0.035,-0.045,-0.055,-0.065)
* ‘energetics’ – Energetics used in the calculations. Only copolymeric is supported
by the repository (c)
* ‘weight’ – The overlap/weighting scheme. Only 10 is supported by the repository (10)
* ‘from’ – (optional) From (and including this) postion
* ‘to’ – (optional) To (and including this) postion
* ‘Gformat’ – e.g. ‘%.2f’
* ‘Pformat’ – e.g. ‘%.2e’
* ‘format’ – Either ‘string’ or ‘element’ – determins if output should be
provided as XML entities are comma separated strings. The latter
is much faster.
Output:
* ‘accession’ – The ganbank accession of the record used.
* ‘digest’ – The MD5 checksum of the entire genomic DNA sequence.
* ‘from’ – From (and including this) postion.
* ‘to’ – To (and including this) postion.
* ‘total_length’ – Total length of genome sequence.
* ‘version’ – SIDD back-end program version.
* ‘method’ – The energetics method used.
* ‘weight’ – The overlap/weighting scheme.
* ‘sigma’ – The sigma value of the prediction.

The following two elements consists of a choice (depending on
‘format’ specified in request)
* ‘element’
* ‘x’ – Absolute chromosomal position
* ‘nt’ – The nucleotide at position ‘x’
* ‘P’ – The helix opening probability.
* ‘G’ – Free energy, G.
* ‘string’
* ‘P’ – Comma separated list of probabilities
* ‘G’ – Comma separated list of free energies

CALCULATION
2a. runService
Submit a custom DNA sequnce to the SIDD program
Input:
* ‘sequencedata’
* ‘sequence’
* ‘seq’ – The raw DNA sequence
* ‘id’ – Any sequence identifier
* ‘sigma’ – The sigma value of the prediction (e.g. -0.055)
* ‘energetics’ – Either copolymeric (c) or nearest-neighbor energetics (n)
* ‘weight’ – The overlap/weighting scheme. Only 10 is supported by the repository (10)
* ‘Gformat’ – e.g. ‘%.2f’
* ‘Pformat’ – e.g. ‘%.2e’
* ‘format’ – Either ‘string’ or ‘element’ – determins if output should be
provided as XML entities are comma separated strings. The latter
is much faster.
Output:
* ‘jobid’ – The 32 byte identification string of the job
* ‘datetime’ – The last timepoint at which the status of the job has changed
* ‘status’ – Possible values are QUEUED, ACTIVE, FINISHED, WAITING, REJECTED,
UNKNOWN JOBID or QUEUE DOWN

2b. pollQueue
Once obtained from ‘runService’, a job identification can be used to poll the
status to see if the result is ready for download.

Input:
* ‘jobid’ – The 32 byte identification string of the job
Output:
* ‘jobid’ – The 32 byte identification string of the job
* ‘datetime’ – The last timepoint at which the status of the job has changed
* ‘status’ – Possible values are QUEUED, ACTIVE, FINISHED, WAITING, REJECTED,
UNKNOWN JOBID or QUEUE DOWN

2c. fetchResult
Once the status is ‘FINISHED’ the results generated by the Web Service can be retrieved by
specifying the jobid;

Input:
* ‘jobid’ – The 32 byte identification string of the job
Output:
* ‘total_length’ – Total length of genome sequence.
* ‘version’ – SIDD back-end program version.
* ‘method’ – The energetics method used.
* ‘weight’ – The overlap/weighting scheme.
* ‘sigma’ – The sigma value of the prediction.

The following two elements consists of a choice (depending on
‘format’ specified in request)
* ‘element’
* ‘x’ – Absolute chromosomal position
* ‘nt’ – The nucleotide at position ‘x’
* ‘P’ – The helix opening probability.
* ‘G’ – Free energy, G.
* ‘string’
* ‘P’ – Comma separated list of probabilities
* ‘G’ – Comma separated list of free energies

For more information, please contact Peter F. Hallin: pfh@cbs.dtu.dk

Original source
BioCatalogue

Leave Your Comment

Your email will not be published or shared. Required fields are marked *

*

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>