Introduction

Structure alignment seeks an optimal superposition of the 3D coordinates of biological macromolecules in order to establish a residue-residue correspondence between the sequences of related structures. This user guide explains how to use the Alignment API to run structure alignment calculations programmatically.

Reference Documentation: Alignment API Reference
Query Editor: Alignment API Query Editor
Examples: Structure Alignment Examples

Stay current with API announcements by subscribing to the RCSB PDB API mailing list:

sign in with an existing Google account and subscribe
or send an email to api+subscribe@rcsb.org

API Basics

The Alignment API computes structure alignments and lets users reference atomic structure coordinates in several ways. One option is to use the unique entry identifier assigned by the Protein Data Bank (PDB) upon deposition of an experimentally determined structure, or the identifier that RCSB.org uses for an incorporated Computed Structure Model (CSM).

Alternatively, users can provide a URL to a file hosted elsewhere, which makes it possible to use structural data distributed by external resources without transferring it manually. Users can also upload a file containing atomic structure coordinates directly, which is convenient when the data is available in a local file.

Several alignment algorithms are available, each addressing a different aspect of structural alignment, such as global structural similarity or local structure, so users can pick the one that matches their objective. The chosen algorithm can also be parameterized, letting users adjust its settings to suit the structures under examination.

Alignment Options

Rigid vs Flexible Alignments

Alignment methods can be classified based on whether the two structures to be aligned are considered as rigid bodies or whether internal flexibility between domains or subdomains is accommodated in the alignment.

Rigid alignments are built based on rigid-body superimposition of structures. Rigid-body aligners are well suited for identification of structural equivalences between related proteins of similar shape.

Introducing flexibility to structural alignment becomes useful for two main reasons. First, a protein may be present in multiple conformational states due to phosphorylation, interaction with other proteins, or ligand binding. Second, distantly related proteins contain twists and bends in their structures that cannot be detected by rigid alignment alone.

Pairwise Alignment

Pairwise structure alignment identifies structural equivalences and an optimal superimposition for a pair of protein structures. Every structure in the input list is superimposed onto the first one, producing one pairwise alignment per additional structure. A number of algorithms are provided to perform pairwise structural alignments:

Java port of the original FATCAT algorithm. Two flavors are available:
- jFATCAT-rigid uses a rigid-body superposition to align the two structures.
- jFATCAT-flexible introduces twists between different parts of the proteins which are superimposed independently.
Java port of the original CE algorithm. Two flavors are available:
- jCE - obtains an optimal rigid-body superposition of the proteins by employing a combinatorial extension (CE) of an alignment path defined by aligned fragment pairs (AFPs).
- jCE-CP - Combinatorial Extension with Circular Permutations (CE-CP) allows the structural comparison of circularly permuted proteins.
TM-align - uses heuristic dynamic programming iterations to generate a sequence-independent residue-to-residue alignment based on structural similarity.
Smith-Waterman 3D - aligns residues based on Smith and Waterman's 1981 algorithm for local sequence alignment using the Blosum65 scoring matrix. The two structures are superimposed based on this alignment. This method is faster than the structure-based methods, but it only works for structures with significant sequence similarity. Be aware that errors in locating gaps can leave a small number of badly aligned residues, which can lead to a high RMSD in the resulting superposition.

CE and FATCAT both assume that aligned residues occur in the same order in both proteins (i.e. they are both sequence-order dependent algorithms). In proteins related by a circular permutation, the N-terminal part of one protein is related to the C-terminal part of the other, and vice versa. jCE-CP allows circularly permuted proteins to be compared.

Calculate Alignments

This section provides details on the Alignment API endpoints:

Endpoint	Description	Parameters	Returns
`/submit`	Submits a structure alignment job as a GET request	Request object with the structure alignment query as JSON data	A unique job identifier (ticket)
`/submit`	Submits a structure alignment job as a POST request	Request object with the structure alignment query as JSON data and (optionally) uploaded files as binary data	A unique job identifier (ticket)
`/results`	Returns the status and available results of a submitted structure alignment query in response to a GET request	A unique job identifier (ticket)	The results data for the structure alignment in JSON format

Refer to the API Reference for full API documentation.

Submit Alignment Job

The base URL for the structure alignment calculations is as follows:

The /submit endpoint allows users to programmatically initiate alignment calculations. Whether you prefer HTTP GET or HTTP POST, the API provides both methods for initiating the alignment process.

The request body can be constructed with the following parameters:

query (required) - contains the query data in JSON format. Query data MUST include the following properties:

`options`	Specifies optional query parameters
`context`	Contains query body that describes structure alignment job

files (optional) - contains user-provided files as binary data.

Here is an example of the query data to perform alignment between human insulin single mutant INS-Q and triple mutant INS-RQD

To initiate an alignment job using HTTP GET, construct a URL with the necessary parameters appended to the endpoint URL as query parameters. To use HTTP POST instead, construct a POST request with a JSON payload containing the required parameters, and set the Content-Type header of the request to multipart/form-data.

Upon successful submission, the API will provide a response containing a unique job identifier (ticket) for tracking the job status.
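The GET variant described above can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the definitive request format: the base URL is a placeholder, and the property names inside `context` (`mode`, `method`, `structures`, `entry_id`) are hypothetical stand-ins for the schema given in the API Reference. Only `query`, `options`, and `context` are taken from this guide.

```python
import json
from urllib.parse import urlencode

# Placeholder: substitute the base URL given in the API Reference.
BASE_URL = "https://alignment.example.org/api"


def build_query(entry_ids, method="fatcat-rigid", mode="pairwise"):
    """Assemble a query object with the two top-level properties."""
    return {
        # Optional query parameters; left empty in this sketch.
        "options": {},
        # Hypothetical layout of the job description (check the API Reference).
        "context": {
            "mode": mode,
            "method": {"name": method},
            "structures": [{"entry_id": entry_id} for entry_id in entry_ids],
        },
    }


def build_submit_url(query, base_url=BASE_URL):
    """Return a /submit URL that carries the JSON query as the `query` parameter."""
    encoded = urlencode({"query": json.dumps(query, separators=(",", ":"))})
    return f"{base_url}/submit?{encoded}"
```

Calling `build_submit_url(build_query(["121P", "1R2Q"]))` yields a URL that can be fetched with any HTTP client. `urlencode` takes care of percent-encoding the braces and quotes of the JSON document.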

Atoms Used for Fitting

The algorithms select atoms that are used for the superposition of 3D structures using the following criteria:

Only backbone atoms: C-alpha for protein structures
Only the first model found in a given PDB entry for structures with multiple submitted conformers
Only atoms in the first conformation for atoms with multiple alternate conformations
Only the first residue in cases of microheterogeneity

File Upload

Files should be supplied with a request as binary data and MUST appear in the order in which they are specified in the query part of the request. If an input structure is supplied as a user-provided file, its structure identifier MUST include a property describing the file format.

The server can recognize the contents of the following structure file formats:

Files in one of the above formats compressed with Gzip algorithm (.gz) are also allowed.

Monitor Job Status

Alignment jobs run in an asynchronous mode. Each user request is assigned a unique identifier in the form of a ticket, e.g. 095be615-a8ad-4c33-8e9c-c7612fbf6c9f. This ticket serves as a key to track the progress of the alignment job. Users can check the status of their ticket, allowing them to monitor the processing stages until the job reaches completion.

To monitor the job status, use the provided ticket in subsequent requests to the /results endpoint:

Replace {job_ticket} with the ticket received upon job submission.

When querying the status of a submitted job, one of three distinct statuses is returned: RUNNING, COMPLETE, or ERROR. The RUNNING status signifies that the alignment calculation job is still in progress, and users need to wait for completion before retrieving results.

The ERROR status indicates that an issue occurred during the alignment process. Additional details about the error are typically provided to help users troubleshoot and resolve the issue.

Finally, the COMPLETE status indicates that the alignment calculation job has been successfully processed, and the results are returned in the same response.
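The three statuses above suggest a simple polling loop. The sketch below is one possible way to write it and makes two assumptions that this guide does not confirm: that the status sits either at the top level of the response or under an `info` object, and that the caller supplies a hypothetical `fetch` callable that performs the actual /results request and returns the parsed JSON.

```python
import time


def wait_for_results(fetch, ticket, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until the job leaves the RUNNING state and return the final response.

    `fetch` is any callable that takes a ticket and returns the parsed JSON
    body of a /results response.
    """
    for _ in range(max_attempts):
        payload = fetch(ticket)
        # Assumption: status is either top-level or nested under "info".
        status = payload.get("status") or payload.get("info", {}).get("status")
        if status == "COMPLETE":
            return payload
        if status == "ERROR":
            raise RuntimeError(f"Alignment job {ticket} failed: {payload}")
        if status != "RUNNING":
            raise ValueError(f"Unexpected status {status!r} for job {ticket}")
        sleep(interval)
    raise TimeoutError(f"Job {ticket} still RUNNING after {max_attempts} checks")
```

Passing `fetch` and `sleep` in as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.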

Alignment Results

The alignment results are encapsulated in a structured JSON format, providing comprehensive information about the aligned structures. The meta section outlines key parameters of the performed alignment process, e.g. specifying the alignment mode as "pairwise" and the alignment method as "fatcat-rigid."

Moving to the results section, details about the aligned structures are presented. Aligned structures are listed in the structures section (a list of size M, where M is the number of aligned structures). The structure_alignment section provides transformation details, including translation and rotation matrices, along with alignment regions and a summary of scores, such as Root Mean Square Deviation (RMSD) and similarity scores, that offers a quantitative assessment of the structural alignment. The structure_alignment section divides the alignment into structurally equivalent blocks, each with a single rigid-body transformation. The division can be due to non-topological rearrangements (e.g. circular permutations) or due to flexible parts (e.g. domain or region swaps). Each block includes:

regions - List of size M that holds information about structurally equivalent residues from a given block, where M is the number of aligned structures
transformations - List of size M that holds block transformations, where M is the number of aligned structures. Each transformation is a 4x4 matrix stored as a flat list in column-major order, so the element in row i and column j is found at index j * 4 + i
summary - Scores, alignment coverage, number of alpha carbon pairs matched by the superposition, etc. relevant to the block alignment
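To make the column-major layout concrete, here is a small sketch that unpacks a flat 16-value transformation and applies it to a coordinate. It assumes the usual homogeneous-coordinate convention, in which the matrix multiplies a column vector (x, y, z, 1) and the translation therefore occupies the last column (flat indices 12 to 14). That convention is an assumption here and should be checked against the API Reference.

```python
def unflatten_column_major(flat):
    """Turn 16 column-major values into a 4x4 nested list indexed [row][column]."""
    if len(flat) != 16:
        raise ValueError("expected 16 values for a 4x4 matrix")
    # Element (row i, column j) lives at flat index j * 4 + i.
    return [[flat[col * 4 + row] for col in range(4)] for row in range(4)]


def transform_point(flat, point):
    """Apply the transformation to an (x, y, z) coordinate."""
    matrix = unflatten_column_major(flat)
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    # Only the first three rows are needed for the transformed x, y, z.
    return tuple(
        sum(matrix[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    )
```

For example, a pure translation by (1, 2, 3) is stored as `[1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 2, 3, 1]` under this convention.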

The sequence_alignment section parallels the structural alignment but focuses on sequence-level details. It includes the aligned sequences and their corresponding regions, providing insights into sequence similarity and identity. The overall summary consolidates scores for sequence similarity, identity, RMSD, and other metrics, offering a holistic evaluation of the alignment.

Understanding Results Data

This section explains how to build alignment information from an API response object. You can use any JSON parsing library to make the data returned more manageable.

Here are some expressions you can use to access objects and fields returned by your query:

results: List of objects, each representing an individual alignment. For example, when the alignment mode is set to pairwise and more than 2 structures are specified in the query, the results array will contain multiple objects, one per pairwise alignment. Suppose you want to align 3 structures in a pairwise manner: A, B and C. The results will report 2 pairwise alignments: B to A and C to A
results[0]: Access the first alignment in the list of alignments
results[i].structures[0]: Access the first structure used for the alignment. When alignment mode is set to pairwise, results[i].structures[0] corresponds to reference structure and results[i].structures[1] corresponds to the structure that was superimposed onto the reference structure. For pairwise alignments the length of the results[i].structures array will always be equal to 2
results[i].structure_alignment[0]: Access the first block that defines a transformation. For rigid-body methods (jFATCAT-rigid, TM-align) there will always be exactly 1 object in the results[i].structure_alignment array. For methods that calculate flexible alignments (jFATCAT-flexible, jCE-CP) this array may contain multiple objects - each corresponding to parts of the structures that were transformed independently
results[i].sequence_alignment[0]: Access the data that defines the structure-based sequence alignment for the first aligned structure
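The access expressions above can be combined into a short traversal. The following sketch walks a parsed response and collects one summary per alignment. It relies only on the `results`, `structures`, `structure_alignment`, and `sequence_alignment` names described in this section and treats the contents of each structure object as opaque, since their exact fields are not listed here.

```python
def summarize_alignments(response):
    """Return one summary dict per alignment found in the response."""
    summaries = []
    for alignment in response.get("results", []):
        structures = alignment["structures"]
        summaries.append(
            {
                # Pairwise mode: index 0 is the reference structure,
                # index 1 is the structure superimposed onto it.
                "reference": structures[0],
                "superimposed": structures[1],
                # One block for rigid-body methods, possibly more for flexible ones.
                "block_count": len(alignment["structure_alignment"]),
                "sequence_rows": len(alignment["sequence_alignment"]),
            }
        )
    return summaries
```

A `block_count` greater than 1 signals that parts of the superimposed structure were transformed independently, so each block's transformation has to be applied to its own regions.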

Sequence Alignment Results

Sequence alignment data establishes residue correspondences between the sequences of aligned structures. This section provides practical guidance on how to use API response data to build sequence alignments.

Each object in the results[i].sequence_alignment array corresponds to a row in the sequence alignment, and the order of objects matches the order of structures entered into the alignment query. For pairwise alignments the length of this array will always be equal to 2.

You can get residue correspondences by combining the full sequence with a list of regions and gaps:

regions define ranges from the full sequence that are included in the sequence alignment. regions[0].beg_seq_id gives a residue number according to the 1-based sequential numbering. regions[0].beg_index gives a position in the sequence alignment according to the 0-based numbering. regions[0].length gives the length of this residue range. For example, the regions object {"beg_seq_id": 4, "beg_index": 1, "length": 8} indicates that in the second column of the alignment matrix (index 1 in 0-based numbering) there is a residue with sequence number 4. The seven successive positions should be filled with residues 5, 6, 7, 8, 9, 10 and 11
gaps define where in the alignment gaps should be inserted. For example, the gaps object {"beg_index": 0, "length": 1} indicates that in the first column of alignment matrix (index 0 in 0-based numbering) there is a gap
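The rules for regions and gaps can be turned into a row of the alignment with a few lines of code. In this sketch the full sequence is passed in as a plain string, because the name of the response field that holds it is not given here. The `beg_seq_id`, `beg_index`, and `length` names follow the description above, and the helper names themselves are illustrative.

```python
def build_alignment_row(sequence, regions, gaps, gap_char="-"):
    """Render one alignment row from a full sequence plus its regions and gaps."""
    spans = [(r["beg_index"], r["length"]) for r in regions]
    spans += [(g["beg_index"], g["length"]) for g in gaps]
    width = max((beg + length for beg, length in spans), default=0)
    # Start with gaps everywhere; regions then overwrite their columns.
    row = [gap_char] * width
    for region in regions:
        start = region["beg_seq_id"] - 1  # beg_seq_id is 1-based
        residues = sequence[start:start + region["length"]]
        for offset, residue in enumerate(residues):
            row[region["beg_index"] + offset] = residue  # beg_index is 0-based
    return "".join(row)


def aligned_pairs(row_a, row_b, gap_char="-"):
    """List the residue pairs that share a column without a gap in either row."""
    return [(a, b) for a, b in zip(row_a, row_b) if gap_char not in (a, b)]
```

With the sequence `"ABCDEFGHIJKLMN"` and the two example objects above, `build_alignment_row` returns `"-DEFGHIJK"`: a gap in column 0 followed by residues 4 through 11.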

The graphic below illustrates the process of using sequence alignment data from the results to build the sequence alignments:

Handling Errors

The following error scenarios are possible:

If an unexpected error happens during the job submission, the server returns HTTP 500 Internal Server Error status code.
When the request object does not comply with the API specification, the server returns HTTP 400 Bad Request status code.
If the request was processed successfully but the alignment job failed to complete, the server returns HTTP 200 OK response status code with status field set to ERROR.
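A client can fold these three scenarios into a single check. The helper below is a hypothetical sketch: the outcome labels are invented for illustration, and the location of the `status` field (top level or under an `info` object) is an assumption to verify against the API Reference.

```python
def classify_outcome(http_status, body=None):
    """Map an HTTP status code and parsed JSON body to a coarse outcome label."""
    if http_status == 400:
        return "invalid_request"  # request does not match the API specification
    if http_status == 500:
        return "server_error"  # unexpected failure during job submission
    if http_status == 200:
        body = body or {}
        # Assumption: status is either top-level or nested under "info".
        status = body.get("status") or body.get("info", {}).get("status")
        return "job_failed" if status == "ERROR" else "ok"
    return "unexpected_status"
```

The point of the third branch is that an HTTP 200 response alone does not mean the alignment succeeded, so the body still has to be inspected.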

Examples

G proteins

G proteins (guanine nucleotide-binding proteins) are important in signal transduction. They act as molecular switches, changing conformation and interaction partners depending on whether GTP or GDP is bound. Many diverse structures are known. The two main subsets are the small monomeric G proteins, such as Ras, and the larger heterotrimeric G proteins, which act immediately downstream of G-protein-coupled receptors. The α subunits of heterotrimeric G proteins are homologous to the small G proteins.

PDB structure 1TAD contains three copies of the α subunit of transducin, a heterotrimeric G protein. Structures of the monomeric G proteins H-Ras, Rab5a, and ADP-ribosylation factor 1 are available as 121P, 1R2Q, and 1J2J, respectively.

Different Conformations of the Same Protein

Calmodulin is a calcium-binding protein. It is composed of two similar domains, each of which binds two calcium ions. The two domains of calmodulin can undergo large changes in relative orientation. Flexible structure alignment can highlight relative mobility between domains when superposition by rigid alignment alone does not yield meaningful results.

A calmodulin in an open conformation is aligned with a calmodulin in a closed conformation

TIM barrel fold

The ubiquitous TIM barrel structural fold is an example of a protein family whose members have divergent sequences and yet share high structure, topology, and/or fold similarity.

A TIM barrel aligned with a multi-domain protein that contains a TIM barrel

Code Examples

Python

To send a POST request to the Alignment API in Python, you can use the requests library. Here's an example of how to do it:
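What follows is a minimal sketch rather than the official script. The base URL is a placeholder, the property names inside `context` are hypothetical, and the code assumes the ticket comes back as the plain-text body of the response. The requests library is imported inside the function so the helper that builds the form fields can be used without it.

```python
import json

# Placeholder: substitute the base URL given in the API Reference.
BASE_URL = "https://alignment.example.org/api"

# Hypothetical query layout; only "options" and "context" come from this guide.
EXAMPLE_QUERY = {
    "options": {},
    "context": {
        "mode": "pairwise",
        "method": {"name": "fatcat-rigid"},
        "structures": [{"entry_id": "121P"}, {"entry_id": "1R2Q"}],
    },
}


def build_form_fields(query):
    """Encode the query as the `query` part of a multipart/form-data body."""
    # A (None, value) tuple makes requests send a plain form field, which
    # forces multipart/form-data encoding even when no file is attached.
    return {"query": (None, json.dumps(query))}


def submit_alignment(query, base_url=BASE_URL, timeout=30):
    """POST the query to /submit and return the ticket from the response body."""
    import requests  # third-party: pip install requests

    response = requests.post(
        f"{base_url}/submit", files=build_form_fields(query), timeout=timeout
    )
    response.raise_for_status()  # surfaces HTTP 400 and 500 as exceptions
    return response.text.strip()  # assumption: ticket is returned as plain text


# Example call (performs a network request once BASE_URL is set correctly):
# print(submit_alignment(EXAMPLE_QUERY))
```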

When run, this script prints the ticket, e.g. 095be615-a8ad-4c33-8e9c-c7612fbf6c9f. Use this ticket to issue a subsequent request to the /results endpoint to get the alignment results.

You may want to upload files as part of your request. Here's a script that does that:
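One way to write such a script is sketched below. It is an illustration under assumptions: the base URL is a placeholder, the `format` property used in the commented example is a hypothetical name for the file-format property mentioned under File Upload, and the ticket is assumed to be returned as plain text. The multipart part names `query` and `files` follow the request parameters described earlier, and files are attached in the same order as they are listed.

```python
import json
from pathlib import Path

# Placeholder: substitute the base URL given in the API Reference.
BASE_URL = "https://alignment.example.org/api"


def build_upload_parts(query, file_paths):
    """Return multipart parts: the query first, then each file in the given order."""
    parts = [("query", (None, json.dumps(query)))]
    for file_path in file_paths:
        path = Path(file_path)
        # Files are sent as binary data under the repeated "files" part name.
        parts.append(
            ("files", (path.name, path.read_bytes(), "application/octet-stream"))
        )
    return parts


def submit_with_files(query, file_paths, base_url=BASE_URL, timeout=60):
    """POST the query and the uploaded files to /submit and return the ticket."""
    import requests  # third-party: pip install requests

    response = requests.post(
        f"{base_url}/submit",
        files=build_upload_parts(query, file_paths),
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text.strip()  # assumption: ticket is returned as plain text


# Example call (hypothetical query layout; file order matches the query order):
# query = {"options": {}, "context": {"mode": "pairwise",
#          "method": {"name": "fatcat-rigid"},
#          "structures": [{"format": "mmcif"}, {"format": "pdb"}]}}
# print(submit_with_files(query, ["first.cif", "second.pdb"]))
```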