Introduction
Structure alignment finds an optimal superposition of the 3D coordinates of biological macromolecules in order to establish a residue-residue correspondence between the sequences of related structures. This user guide explains how to use the Alignment API to run structure alignment calculations programmatically.
- Reference Documentation: Alignment API Reference
- Query Editor: Alignment API Query Editor
- Examples: Structure Alignment Examples
Stay current with API announcements by subscribing to the RCSB PDB API mailing list:
- sign in with an existing Google account and subscribe
- or send an email to api+subscribe@rcsb.org
API Basics
The Alignment API computes structure alignments. Atomic structure coordinates can be referenced in several ways. One option is to use the unique entry identifier assigned by the Protein Data Bank (PDB) upon deposition of an experimentally determined structure, or the RCSB.org identifier of an incorporated Computed Structure Model (CSM).
Alternatively, users can provide a URL to a file hosted elsewhere, which makes structural data distributed by external resources usable without manual data transfer. Users who have the data in a local file can instead upload a file containing atomic structure coordinates directly.
Several alignment algorithms are available, some emphasizing global structural similarity and others focusing on local structure, so users can choose the algorithm that fits their objective. The API also exposes parameters for the chosen algorithm, which can be adjusted to suit the characteristics of the structures under examination.
Alignment Options
Rigid vs Flexible Alignments
Alignment methods can be classified based on whether the two structures to be aligned are considered as rigid bodies or whether internal flexibility between domains or subdomains is accommodated in the alignment.
Rigid alignments are built based on rigid-body superimposition of structures. Rigid-body aligners are well suited for identification of structural equivalences between related proteins of similar shape.
Introducing flexibility to structural alignment becomes useful for two main reasons. First, a protein may be present in multiple conformational states due to phosphorylation, interaction with other proteins, or ligand binding. Second, distantly related proteins contain twists and bends in their structures that cannot be detected by rigid alignment alone.
Pairwise Alignment
Pairwise structure alignment identifies structural equivalences and an optimal superimposition for a pair of protein structures. Every structure in the input list is superimposed onto the first structure in the list, and one pairwise alignment is reported for each superimposed structure. A number of algorithms are provided to perform pairwise structural alignments:
- Java port of the original FATCAT algorithm. Two flavors are available:
  - jFATCAT-rigid uses a rigid-body superposition to align the two structures.
  - jFATCAT-flexible introduces twists between different parts of the proteins, which are superimposed independently.
- Java port of the original CE algorithm. Two flavors are available:
  - jCE - obtains an optimal rigid-body superposition of the proteins by employing a combinatorial extension (CE) of an alignment path defined by aligned fragment pairs (AFPs).
  - jCE-CP - Combinatorial Extension with Circular Permutations (CE-CP) allows the structural comparison of circularly permuted proteins.
- TM-align - uses heuristic dynamic programming iterations to generate a sequence-independent residue-to-residue alignment based on structural similarity.
- Smith-Waterman 3D - aligns residues based on Smith and Waterman's 1981 algorithm for local sequence alignment using the Blosum65 scoring matrix. The two structures are superimposed based on this alignment. Be aware that errors in locating gaps can lead to a high RMSD in the resulting superposition due to a small number of badly aligned residues, and that the method only works for structures with significant sequence similarity. However, this method is faster than the structure-based methods.
CE and FATCAT both assume that aligned residues occur in the same order in both proteins (i.e. they are both sequence-order dependent algorithms). In proteins related by a circular permutation, the N-terminal part of one protein is related to the C-terminal part of the other, and vice versa. jCE-CP allows circularly permuted proteins to be compared.
Calculate Alignments
This section provides details on the Alignment API endpoints:
Endpoint | Description | Parameters | Returns |
---|---|---|---|
/submit | Submits a structure alignment job as a GET request | Request object with the structure alignment query as JSON data | A unique job identifier (ticket) |
/submit | Submits a structure alignment job as a POST request | Request object with the structure alignment query as JSON data and (optionally) uploaded files as binary data | A unique job identifier (ticket) |
/results | Returns (via GET) the status and available results of a submitted structure alignment query | A unique job identifier (ticket) | The results data for the structure alignment in JSON format |
Refer to the API Reference for full API documentation.
Submit Alignment Job
The base URL for the structure alignment calculations is as follows:
The /submit endpoint allows users to programmatically initiate alignment calculations. Whether you prefer HTTP GET or HTTP POST, the API provides both methods for initiating the alignment process.
The request body can be constructed with the following parameters:
- query (required) - contains the query data in JSON format. Query data MUST include the following properties:
  - options - specifies optional query parameters
  - context - contains the query body that describes the structure alignment job
- files (optional) - contains user-provided files as binary data.
Here is an example of the query data to perform an alignment between the human insulin single mutant INS-Q and the triple mutant INS-RQD
To initiate an alignment job using HTTP GET, construct a URL with the necessary parameters. The parameters can be appended to the endpoint URL as query parameters. For users preferring HTTP POST, construct a POST request with a JSON payload containing the required parameters. The type of the body of the request should be indicated by the Content-Type header: multipart/form-data.
Upon successful submission, the API will provide a response containing a unique job identifier (ticket) for tracking the job status.
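As a rough illustration of the GET variant, the sketch below assembles a query object and appends it to the /submit endpoint as a URL parameter. The top-level `options` and `context` properties and the `pairwise` / `fatcat-rigid` values come from this guide; the property names inside `context` (`mode`, `method`, `structures`, `entry_id`), the `query` URL parameter name, and `BASE_URL` are assumptions made for the example and should be checked against the API Reference.

```python
import json
from urllib.parse import urlencode

# Placeholder, NOT the real service address: use the base URL from the API Reference.
BASE_URL = "https://alignment.example.org/api"


def build_query(entry_ids, method="fatcat-rigid"):
    """Assemble a pairwise alignment query.

    The property names inside "context" are assumptions for illustration.
    """
    return {
        "options": {},
        "context": {
            "mode": "pairwise",
            "method": {"name": method},
            "structures": [{"entry_id": entry_id} for entry_id in entry_ids],
        },
    }


def build_submit_url(base_url, query):
    """Serialize the query to JSON and URL-encode it as the `query` parameter."""
    return f"{base_url}/submit?{urlencode({'query': json.dumps(query)})}"


query = build_query(["121P", "1R2Q"])
url = build_submit_url(BASE_URL, query)
print(url)
```

Letting `urlencode` do the escaping avoids hand-quoting the braces and quotation marks in the JSON text.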
Atoms Used for Fitting
The algorithms select atoms that are used for the superposition of 3D structures using the following criteria:
- Only backbone atoms: C-alpha for protein structures
- Only the first model found in a given PDB entry for structures with multiple submitted conformers
- Only atoms in the first conformation for atoms with multiple alternate conformations
- Only the first residue in cases of microheterogeneity
File Upload
Files should be supplied with a request as binary data and MUST appear in the order in which they are specified in the query part of the request. If an input structure is supplied as a user-provided file, its structure identifier MUST include a property describing the file format.
The server can recognize the contents of the following structure file formats:
Files in one of the above formats compressed with Gzip algorithm (.gz) are also allowed.
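The ordering rule above could be handled along the following lines. This is a sketch under stated assumptions: the multipart part names `query` and `files` follow the request parameters named in this guide, while the helper names, the exact encoding of each part, and the idea that the ticket comes back as plain response text are hypothetical and should be verified against the API Reference. The `requests` import is deferred so that the field-building helper has no third-party dependency.

```python
import json


def build_multipart_fields(query, uploads):
    """Return multipart fields in the form accepted by requests' `files=` argument.

    `uploads` is a list of (filename, bytes) tuples that must already be in the
    same order as the file-based structures listed in `query`.
    """
    # A part with filename None is sent as an ordinary form field.
    fields = [("query", (None, json.dumps(query)))]
    for filename, content in uploads:
        fields.append(("files", (filename, content, "application/octet-stream")))
    return fields


def submit_with_files(submit_url, query, uploads):
    """POST the query and files; assumes the ticket is the response body text."""
    import requests  # third-party dependency, imported only when actually submitting

    response = requests.post(submit_url, files=build_multipart_fields(query, uploads))
    response.raise_for_status()
    return response.text.strip()
```

Passing a list of tuples rather than a dictionary keeps the repeated `files` parts in a fixed order, which is what the ordering requirement relies on.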
Monitor Job Status
Alignment jobs run in an asynchronous mode. Each user request is assigned a unique identifier in the form of a ticket, e.g. 095be615-a8ad-4c33-8e9c-c7612fbf6c9f. This ticket serves as a key to track the progress of the alignment job. Users can check the status of their ticket, allowing them to monitor the processing stages until the job reaches completion.
To monitor the job status, use the provided ticket in subsequent requests to the /results endpoint:
Replace {job_ticket} with the ticket received upon job submission.
When querying the status of a submitted job, three distinct statuses are possible: RUNNING, COMPLETE, and ERROR. The RUNNING status signifies that the alignment calculation job is still in progress, and users need to wait for completion before retrieving results.
The ERROR status informs users that an issue has occurred during the alignment process; additional details about the error are typically provided to aid in troubleshooting and resolving the issue.
Finally, the COMPLETE status indicates that the alignment calculation job has been successfully processed, and the results are returned in the same response.
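A polling loop over these three statuses might look like the sketch below. The HTTP call is passed in as a `fetch` callable so that the loop itself stays independent of any particular client library. The location of the status inside the response (`info.status` here) is an assumption, as are the function name and the default timing values.

```python
import time


def wait_for_results(fetch, ticket, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until the job leaves the RUNNING state.

    `fetch(ticket)` must return the parsed JSON response of the /results endpoint.
    """
    for _ in range(max_attempts):
        response = fetch(ticket)
        # Assumed location of the status field; adjust to the actual schema.
        status = response.get("info", {}).get("status")
        if status == "COMPLETE":
            return response  # results are included in the same response
        if status == "ERROR":
            raise RuntimeError(f"alignment job {ticket} failed: {response}")
        if status != "RUNNING":
            raise ValueError(f"unexpected job status: {status!r}")
        sleep(interval)
    raise TimeoutError(f"job {ticket} still running after {max_attempts} checks")
```

Bounding the number of attempts keeps a stalled job from blocking the caller indefinitely.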
Alignment Results
The alignment results are returned in a structured JSON format that describes the aligned structures. The meta section outlines key parameters of the performed alignment process, e.g. specifying the alignment mode as "pairwise" and the alignment method as "fatcat-rigid".
The results section presents details about the aligned structures. Aligned structures are listed in the structures section (a list of size M, where M is the number of aligned structures).
The structure_alignment section provides transformation details, including translation and rotation matrices, along with alignment regions. A summary of scores, including Root Mean Square Deviation (RMSD) and similarity scores, offers a quantitative assessment of the structural alignment.
This section divides the alignment into structurally equivalent blocks, each with a single rigid-body transformation. The division can be due to non-topological rearrangements (e.g. circular permutations) or due to flexible parts (e.g. domain or region swaps). Each block includes:
- regions - List of size M that holds information about structurally equivalent residues from a given block, where M is the number of aligned structures
- transformations - List of size M that holds block transformations, where M is the number of aligned structures. Each transformation is a 4x4 matrix in a column major (j * 4 + i indexing) format
- summary - Scores, alignment coverage, number of alpha carbon pairs matched by the superposition, etc. relevant to the block alignment
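To make the column-major indexing concrete, here is a minimal sketch that applies one transformation to a single coordinate. It assumes the matrix arrives as a flat list of 16 numbers in which the element at row i, column j sits at index j * 4 + i, and that coordinates are treated as column vectors in homogeneous form (so the translation occupies the last column). The function name is made up for this example.

```python
def apply_transform(matrix, point):
    """Apply a flat, column-major 4x4 transformation to an (x, y, z) coordinate."""
    if len(matrix) != 16:
        raise ValueError("expected 16 matrix elements")
    vec = (point[0], point[1], point[2], 1.0)  # homogeneous coordinate
    # Row i of the result is the dot product of matrix row i with the vector;
    # element (i, j) is stored at index j * 4 + i.
    return tuple(
        sum(matrix[j * 4 + i] * vec[j] for j in range(4))
        for i in range(3)
    )


# Identity rotation combined with a translation by (1, 2, 3).
translation = [
    1, 0, 0, 0,   # first column
    0, 1, 0, 0,   # second column
    0, 0, 1, 0,   # third column
    1, 2, 3, 1,   # fourth column holds the translation
]
moved = apply_transform(translation, (10.0, 20.0, 30.0))
```

With a numerical library, the same layout can usually be obtained by reshaping the flat list to 4x4 and transposing.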
The sequence_alignment section parallels the structural alignment but focuses on sequence-level details. It includes the aligned sequences and their corresponding regions, providing insights into sequence similarity and identity. The overall summary consolidates scores for sequence similarity, identity, RMSD, and other metrics, offering a holistic evaluation of the alignment.
Understanding Results Data
This section explains how to build alignment information from an API response object. You can use any JSON parsing library to make the data returned more manageable.
Here are some expressions you can use to access objects and fields returned by your query:
- results: List that contains objects that each represent an individual alignment. For example, when the alignment mode is set to pairwise and more than 2 structures are specified in the query, the results array will contain multiple objects, one per pairwise alignment. Let's say you want to align 3 structures in a pairwise manner: A, B and C. The results will report 2 pairwise alignments: B to A and C to A
- results[0]: Access the first alignment in the list of alignments
- results[i].structures[0]: Access the first structure used for the alignment. When the alignment mode is set to pairwise, results[i].structures[0] corresponds to the reference structure and results[i].structures[1] corresponds to the structure that was superimposed onto the reference structure. For pairwise alignments the length of the results[i].structures array will always be equal to 2
- results[i].structure_alignment[0]: Access the first block that defines a transformation. For rigid-body methods (jFATCAT-rigid, TM-align) there will always be exactly 1 object in the results[i].structure_alignment array. For methods that calculate flexible alignments (jFATCAT-flexible, jCE-CP) this array may contain multiple objects, each corresponding to parts of the structures that were transformed independently
- results[i].sequence_alignment[0]: Access the data that defines structure based sequence alignments for the first aligned structure
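The access expressions above can be combined into a small traversal. The sketch below walks a parsed response and reports, for every pairwise alignment, the reference structure, the superimposed structure, and the number of independently transformed blocks. It relies only on the `results`, `structures`, and `structure_alignment` names described in this guide; the function name and the shape of the returned rows are illustrative choices.

```python
def summarize_alignments(response):
    """Return one summary row per pairwise alignment in a parsed response."""
    rows = []
    for alignment in response["results"]:
        # For pairwise alignments this list always has exactly two entries.
        reference, superimposed = alignment["structures"]
        rows.append({
            "reference": reference,
            "superimposed": superimposed,
            # 1 block for rigid-body methods, possibly more for flexible ones
            "blocks": len(alignment["structure_alignment"]),
        })
    return rows
```

For three input structures A, B and C aligned in pairwise mode, this would yield two rows, both with A as the reference.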
Sequence Alignment Results
Sequence alignment data establishes residue correspondences between the sequences of aligned structures. This section provides practical guidance on how to use API response data to build sequence alignments.
Each object in the results[i].sequence_alignment array corresponds to a row in the sequence alignment, and the order of objects will match the order of structures entered into the alignment query. For pairwise alignments the length of this array will always be equal to 2.
You can get residue correspondences by combining the full sequence with a list of regions and gaps:
- regions define ranges from the full sequence that are included in the sequence alignment. regions[0].beg_seq_id gives a residue number according to the 1-based sequential numbering. regions[0].beg_index gives a position in the sequence alignment according to the 0-based numbering. regions[0].length gives the length of the residue range. For example, the regions object {"beg_seq_id": 4, "beg_index": 1, "length": 8} indicates that in the second column of the alignment matrix (index 1 in 0-based numbering) there is a residue with sequence number 4. The seven successive positions should be filled with residues 5, 6, 7, 8, 9, 10 and 11
- gaps define where in the alignment gaps should be inserted. For example, the gaps object {"beg_index": 0, "length": 1} indicates that in the first column of the alignment matrix (index 0 in 0-based numbering) there is a gap
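The region and gap rules can be turned into an alignment row with a few lines of code. The sketch below is one possible reading of those rules: the `beg_seq_id`, `beg_index`, and `length` properties are taken from the description above, whereas the function name and the way the full sequence is passed in (as a plain string) are assumptions for the example.

```python
def build_alignment_row(sequence, regions, gaps, gap_char="-"):
    """Build one row of a sequence alignment from its regions and gaps."""
    ends = [r["beg_index"] + r["length"] for r in regions]
    ends += [g["beg_index"] + g["length"] for g in gaps]
    # Start from a row made only of gap characters; the gap objects therefore
    # only influence the row length, and regions overwrite the aligned columns.
    row = [gap_char] * max(ends, default=0)
    for region in regions:
        first = region["beg_seq_id"] - 1  # 1-based residue number to 0-based index
        for offset in range(region["length"]):
            row[region["beg_index"] + offset] = sequence[first + offset]
    return "".join(row)


row = build_alignment_row(
    "ABCDEFGHIJKLMN",
    regions=[{"beg_seq_id": 4, "beg_index": 1, "length": 8}],
    gaps=[{"beg_index": 0, "length": 1}],
)
print(row)  # -DEFGHIJK
```

Building both rows of a pairwise alignment this way and reading them column by column gives the residue correspondences: any column in which neither row holds a gap character pairs two residues.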
The graphic below illustrates the process of using sequence alignment data from the results to build the sequence alignments:
Handling Errors
The following error scenarios are possible:
- If an unexpected error happens during the job submission, the server returns the HTTP 500 Internal Server Error status code.
- When the request object does not comply with the API specification, the server returns the HTTP 400 Bad Request status code.
- If the request was processed successfully but the alignment job failed to complete, the server returns the HTTP 200 OK response status code with the status field set to ERROR.
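Client code may want to tell these three scenarios apart explicitly, since a 200 OK response does not by itself mean that the alignment succeeded. The following sketch maps an HTTP status code and a parsed response body to a short label. The labels, the function name, and the assumed location of the status field (`info.status`) are illustrative and not part of the API.

```python
def classify_response(status_code, body=None):
    """Map an HTTP status code and parsed JSON body to an outcome label."""
    if status_code == 400:
        return "bad_request"    # request does not comply with the API specification
    if status_code == 500:
        return "server_error"   # unexpected error during job submission
    if status_code == 200:
        # Assumed location of the status field; adjust to the actual schema.
        status = (body or {}).get("info", {}).get("status")
        return "job_failed" if status == "ERROR" else "ok"
    return "unexpected"
```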
Examples
G proteins
G proteins (guanine nucleotide-binding proteins) are important in signal transduction. They act as molecular switches, changing conformation and interaction partners depending on whether GTP or GDP is bound. Many diverse structures are known. The two main subsets are the small monomeric G proteins, such as Ras, and the larger heterotrimeric G proteins, which act immediately downstream of G-protein-coupled receptors. The α subunits of heterotrimeric G proteins are homologous to the small G proteins.
PDB structure 1TAD contains three copies of the α subunit of transducin, a heterotrimeric G protein. Structures of the monomeric G proteins H-Ras, Rab5a, and ADP-ribosylation factor 1 are available as 121P, 1R2Q, and 1J2J, respectively.
Different Conformations of the Same Protein
Calmodulin is a calcium binding protein. It is composed of two similar domains, each of which binds two calcium atoms. The two domains of calmodulin can undergo large changes in relative orientation. Flexible structure alignment can highlight relative mobility between domains, when superposition by rigid alignment alone does not yield meaningful results.
A calmodulin in the open conformation is aligned with a calmodulin in the closed conformation
TIM barrel fold
The ubiquitous TIM barrel structural fold is an example of a protein family whose members have divergent protein sequences and yet share high structural, topological, and/or fold similarity.
A TIM barrel aligned with a multi-domain protein that contains a TIM barrel
Code Examples
Python
To send a POST request to the Alignment API in Python, you can use the requests library. Here's an example of how to do it:
After running this script, it will print the ticket, e.g. 095be615-a8ad-4c33-8e9c-c7612fbf6c9f. Use this ticket to issue a subsequent request to the /results endpoint to get the alignment results.
You may want to upload files as part of your request. Here's a script that does that: