RCSB PDB: Structure Alignment and Comparison API Documentation

Introduction

Structure alignment focuses on making an optimal superposition of the 3D coordinates of biological macromolecules to establish a residue-residue correspondence between sequences of related structures. This user guide will help you understand how to use the Alignment API for running the structure alignment calculations programmatically.

Stay current with API announcements by subscribing to the RCSB PDB API mailing list:

API Basics

The Alignment API computes structure alignments, and atomic structure coordinates can be referenced in several ways. One option is to use the unique entry identifier assigned by the Protein Data Bank (PDB) upon deposition of an experimentally determined structure, or the RCSB.org identifier of an incorporated Computed Structure Model (CSM).

Alternatively, users can provide a URL to a file hosted elsewhere, which makes it possible to use structural data distributed by external resources without transferring it manually. The API also accepts direct upload of a file containing atomic structure coordinates, which is useful when the data is already available in a local file.

Users can choose among several alignment algorithms, each addressing a different aspect of structural alignment, such as emphasizing global structural similarity or focusing on local structure. The API also exposes parameters for the chosen algorithm, so users can adjust the settings to suit the structures under examination and the goal of the analysis.

Alignment Options

Rigid vs Flexible Alignments

Alignment methods can be classified based on whether the two structures to be aligned are considered as rigid bodies or whether internal flexibility between domains or subdomains is accommodated in the alignment.

Rigid alignments are built based on rigid-body superimposition of structures. Rigid-body aligners are well suited for identification of structural equivalences between related proteins of similar shape.

Introducing flexibility to structural alignment becomes useful for two main reasons. First, a protein may be present in multiple conformational states due to phosphorylation, interaction with other proteins, or ligand binding. Second, distantly related proteins contain twists and bends in their structures that cannot be detected by rigid alignment alone.

Pairwise Alignment

Pairwise structure alignment identifies structural equivalences and the optimal superimposition for a pair of protein structures. Each structure in the input list is superimposed onto the first one, and one pairwise alignment is produced for each such pair. A number of algorithms are provided to perform pairwise structural alignments:

CE and FATCAT both assume that aligned residues occur in the same order in both proteins (i.e. they are both sequence-order dependent algorithms). In proteins related by a circular permutation, the N-terminal part of one protein is related to the C-terminal part of the other, and vice versa. jCE-CP allows circularly permuted proteins to be compared.

Calculate Alignments

This section provides details on the Alignment API endpoint:

Endpoint | Description | Parameters | Returns
/submit | Submits a structure alignment job as a GET request | Request object with the structure alignment query as JSON data | A unique job identifier (ticket)
/submit | Submits a structure alignment job as a POST request | Request object with the structure alignment query as JSON data and (optionally) uploaded files as binary data | A unique job identifier (ticket)
/results | Gets the status and available results of a submitted structure alignment query (GET request) | A unique job identifier (ticket) | The results data for the structure alignment in JSON format

Refer to the API Reference for the full API documentation.

Submit Alignment Job

The base URL for the structure alignment calculations is as follows:

The /submit endpoint allows users to programmatically initiate alignment calculations. Whether you prefer HTTP GET or HTTP POST, the API provides both methods for initiating the alignment process.

The request body can be constructed with the following parameters:

Here is an example of the query data to perform an alignment between the human insulin single mutant INS-Q and the triple mutant INS-RQD:

To initiate an alignment job using HTTP GET, construct a URL with the necessary parameters appended to the endpoint URL as query parameters. To use HTTP POST instead, construct a POST request with a JSON payload containing the required parameters, and set the Content-Type header of the request to multipart/form-data.
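As a rough illustration of the GET variant, the sketch below serializes a query object to compact JSON and URL-encodes it into a single query parameter. The base URL is a placeholder, and the `query` parameter name and every field name inside the request object (`context`, `mode`, `method`, `structures`, `entry_id`) are assumptions, not values taken from this guide; check them against the API Reference before use.

```python
import json
from urllib.parse import quote

# Placeholder: substitute the base URL given in the API Reference.
BASE_URL = "https://alignment.example.org/api"


def build_query(entry_ids, method="fatcat-rigid", mode="pairwise"):
    """Build a request object. All field names here are assumed."""
    return {
        "context": {
            "mode": mode,
            "method": {"name": method},
            "structures": [{"entry_id": entry_id} for entry_id in entry_ids],
        }
    }


def build_get_url(query, base_url=BASE_URL, param="query"):
    """Encode the query as compact JSON in one URL query parameter."""
    payload = json.dumps(query, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including "/" and ":".
    return f"{base_url}/submit?{param}={quote(payload, safe='')}"
```

Percent-encoding the whole JSON string keeps braces, quotes, and colons from being misread as URL syntax.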

Upon successful submission, the API will provide a response containing a unique job identifier (ticket) for tracking the job status.

Atoms Used for Fitting

The algorithms select atoms that are used for the superposition of 3D structures using the following criteria:

File Upload

Files should be supplied with a request as binary data and MUST appear in the order in which they are specified in the query part of the request. If an input structure is supplied as a user-provided file, its structure identifier MUST include a property describing the file format.
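One way to keep uploaded files in the same order as the query is to derive both from a single input list. The sketch below does that; the local `path` key, the `format` property name, and the format values used in the usage note are hypothetical and should be replaced with the names from the API Reference.

```python
from pathlib import Path


def split_inputs(inputs):
    """Return (structures, upload_paths) with uploads in query order.

    Each input is either {"entry_id": ...} or {"path": ..., "format": ...}.
    """
    structures, upload_paths = [], []
    for item in inputs:
        if "path" in item:
            # "format" is the assumed name of the property that
            # describes the file format of an uploaded structure.
            structures.append({"format": item["format"]})
            upload_paths.append(Path(item["path"]))
        else:
            structures.append({"entry_id": item["entry_id"]})
    return structures, upload_paths
```

For example, `split_inputs([{"path": "b.pdb", "format": "pdb"}, {"entry_id": "1TAD"}])` yields one file-based structure, one entry-based structure, and a single upload path, so the files can be attached to the request in exactly the order the query lists them.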

The server can recognize the contents of the following structure file formats:

Files in one of the above formats compressed with Gzip algorithm (.gz) are also allowed.
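If you want to compress a coordinate file before uploading it, Python's standard gzip module is sufficient. This sketch compresses the raw bytes unless they already start with the gzip magic number; the helper name is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gzip_payload(data, filename):
    """Return (filename, bytes), gzip-compressing data if needed."""
    if data[:2] == GZIP_MAGIC:
        # Already compressed: pass it through unchanged.
        return filename, data
    return filename + ".gz", gzip.compress(data)
```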

Monitor Job Status

Alignment jobs run in an asynchronous mode. Each user request is assigned a unique identifier in the form of a ticket, e.g. 095be615-a8ad-4c33-8e9c-c7612fbf6c9f. This ticket serves as a key to track the progress of the alignment job. Users can check the status of their ticket, allowing them to monitor the processing stages until the job reaches completion.

To monitor the job status, use the provided ticket in subsequent requests to the /results endpoint:

Replace {job_ticket} with the ticket received upon job submission.
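The example ticket shown above has the form of a UUID, so a client can reject malformed tickets before sending a request. The sketch below assumes that tickets are always UUIDs; the URL is a placeholder and the `uuid` parameter name is an assumption.

```python
import uuid
from urllib.parse import urlencode

# Placeholder: substitute the /results URL given in the API Reference.
RESULTS_URL = "https://alignment.example.org/api/results"


def results_url(ticket, base=RESULTS_URL, param="uuid"):
    """Build the status URL for a ticket, validating its format first."""
    # uuid.UUID raises ValueError if the ticket is not a well-formed UUID.
    canonical = str(uuid.UUID(ticket))
    return f"{base}?{urlencode({param: canonical})}"
```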

When querying the status of a submitted job, one of three statuses is returned: RUNNING, COMPLETE, or ERROR. The RUNNING status means that the alignment calculation is still in progress, and users need to wait for completion before retrieving results.

The ERROR status means that an issue occurred during the alignment process. Additional details about the error are typically provided to help with troubleshooting.

Finally, the COMPLETE status indicates that the alignment calculation job has been successfully processed, and the results are returned in the same response.
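The three statuses lend themselves to a simple polling loop. In the sketch below, `fetch` is any callable that takes a ticket and returns the parsed JSON response, which keeps the loop independent of the HTTP library. The location of the status (`info.status`) and of the error text (`info.message`) are assumptions.

```python
import time


def wait_for_results(fetch, ticket, interval=2.0, timeout=300.0,
                     sleep=time.sleep):
    """Poll until the job is COMPLETE, raising on ERROR or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        response = fetch(ticket)
        info = response["info"]  # assumed location of the job status
        if info["status"] == "COMPLETE":
            return response  # results arrive in the same response
        if info["status"] == "ERROR":
            raise RuntimeError(info.get("message", "alignment job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {ticket} still running after {timeout}s")
        sleep(interval)
```

A fixed interval is the simplest choice; for long-running jobs you may prefer to increase the delay between requests.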

Alignment Results

The alignment results are encapsulated in a structured JSON format, providing comprehensive information about the aligned structures. The meta section outlines key parameters of the performed alignment process, e.g. specifying the alignment mode as "pairwise" and the alignment method as "fatcat-rigid."

Moving to the results section, details about the aligned structures are presented. Aligned structures are listed in the structures section (a list of size M, where M is the number of aligned structures). The structure_alignment section furnishes transformation details, including translation and rotation matrices, along with alignment regions. A summary of scores, including Root Mean Square Deviation (RMSD) and similarity scores, offers a quantitative assessment of the structural alignment. The structure_alignment section divides the alignment into structurally equivalent blocks, each with a single rigid-body transformation. The division can be due to non-topological rearrangements (e.g. circular permutations) or due to flexible parts (e.g. domain or region swaps). Each block includes:

The sequence_alignment section parallels the structural alignment but focuses on sequence-level details. It includes the aligned sequences and their corresponding regions, providing insights into sequence similarity and identity. The overall summary consolidates scores for sequence similarity, identity, RMSD, and other metrics, offering a holistic evaluation of the alignment.
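To make the rotation, translation, and RMSD values mentioned above concrete, the following sketch applies a rigid-body transformation to a list of coordinates and computes the RMSD between two equally long coordinate lists. It assumes the convention x' = Rx + t with a 3x3 rotation matrix and a 3-element translation vector; the layout in which the API actually returns its matrices is not described here and may differ.

```python
import math


def apply_transform(points, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) point."""
    return [
        tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(
        (x - y) ** 2
        for p, q in zip(coords_a, coords_b)
        for x, y in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))
```

With one transformation per block, a flexible alignment would apply each block's transformation only to the residues in that block's regions.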

Understanding Results Data

This section explains how to build alignment information from an API response object. You can use any JSON parsing library to make the data returned more manageable.

Here are some expressions you can use to access objects and fields returned by your query:
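In Python, the parsed response is an ordinary nested dictionary, so fields can be reached with plain indexing. The sketch below flattens a response into one record per result. The section names (meta, results, structures, summary) come from this guide; the field names inside them (`alignment_method`, `entry_id`, `scores`, `type`, `value`) are assumptions, and SAMPLE contains made-up placeholder values rather than real alignment output.

```python
# Made-up fragment for illustration only; not real API output.
SAMPLE = {
    "meta": {"alignment_mode": "pairwise", "alignment_method": "fatcat-rigid"},
    "results": [
        {
            "structures": [{"entry_id": "ENTRY_1"}, {"entry_id": "ENTRY_2"}],
            "summary": {"scores": [{"type": "RMSD", "value": 1.23}]},
        }
    ],
}


def summarize(response):
    """Return one flat record per result in the response."""
    meta = response.get("meta", {})
    records = []
    for result in response.get("results", []):
        scores = result.get("summary", {}).get("scores", [])
        records.append({
            "method": meta.get("alignment_method"),
            "entries": [s.get("entry_id") for s in result.get("structures", [])],
            "scores": {s["type"]: s["value"] for s in scores},
        })
    return records
```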

Sequence Alignment Results

Sequence alignment data establish residue correspondences between the sequences of aligned structures. This section provides practical guidance on how to use API response data to build sequence alignments.

Each object in results[i].sequence_alignment array corresponds to a row in sequence alignment and the order of objects will match the order of structures entered into the alignment query. For pairwise alignments the length of this array will always be equal to 2.

You can get residue correspondences by combining the full sequence with a list of regions and gaps:

The graphic below illustrates the process of using sequence alignment data from the results to build the sequence alignments:

Sequence Alignment
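The idea of combining the full sequence with regions and gaps can be sketched as follows. The field names and their semantics are assumptions: `beg_seq_id` is taken to be a 1-based position in the sequence, `beg_index` a 0-based column in the alignment row, and `length` the number of residues or gap columns. Verify these against the API Reference before relying on the output.

```python
def build_row(sequence, regions, gaps, gap_char="-"):
    """Build one gapped alignment row from a sequence, regions, and gaps."""
    spans = [*regions, *gaps]
    width = max((s["beg_index"] + s["length"] for s in spans), default=0)
    # Start from an all-gap row, then copy each region into its columns.
    # Gaps only contribute to the row width; their columns stay as gap_char.
    row = [gap_char] * width
    for region in regions:
        start = region["beg_seq_id"] - 1
        residues = sequence[start:start + region["length"]]
        if len(residues) != region["length"]:
            raise ValueError("region extends past the end of the sequence")
        col = region["beg_index"]
        row[col:col + region["length"]] = residues
    return "".join(row)


def aligned_pairs(row_a, row_b, gap_char="-"):
    """Return residue pairs from columns where neither row has a gap."""
    return [(a, b) for a, b in zip(row_a, row_b)
            if a != gap_char and b != gap_char]
```

Building every row this way and stacking the rows in the order of the sequence_alignment array gives the full alignment, and `aligned_pairs` then reads the residue correspondences column by column.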

Handling Errors

The following error scenarios are possible:

Examples

G proteins

G proteins (guanine nucleotide-binding proteins) are important in signal transduction. They act as molecular switches, changing conformation and interaction partners depending on whether GTP or GDP is bound. Many diverse structures are known. The two main subsets are the small monomeric G proteins, such as Ras, and the larger heterotrimeric G proteins, which act immediately downstream of G-protein-coupled receptors. The α subunits of heterotrimeric G proteins are homologous to the small G proteins.

PDB structure 1TAD contains three copies of the α subunit of transducin, a heterotrimeric G protein. Structures of the monomeric G proteins H-Ras, Rab5a, and ADP-ribosylation factor 1 are available as 121P, 1R2Q, and 1J2J, respectively.
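Because pairwise mode superimposes every structure onto the first one in the input list, placing 1TAD first yields three alignments, one per small G protein. The sketch below lists those pairs and builds a matching request object; the field names in the request object are assumptions, and chain selection (1TAD holds three copies of the subunit) is left out because its syntax is not covered here.

```python
G_PROTEIN_ENTRIES = ["1TAD", "121P", "1R2Q", "1J2J"]


def pairwise_plan(entry_ids):
    """List the (reference, target) pairs implied by the input order."""
    if len(entry_ids) < 2:
        raise ValueError("at least two structures are needed")
    reference, *targets = entry_ids
    return [(reference, target) for target in targets]


def build_query(entry_ids, method="fatcat-rigid"):
    """Build a pairwise request object. All field names are assumed."""
    return {
        "context": {
            "mode": "pairwise",
            "method": {"name": method},
            "structures": [{"entry_id": e} for e in entry_ids],
        }
    }
```

Reordering the list changes the reference structure, so put the structure you want the others superimposed onto first.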

Different Conformations of the Same Protein

Calmodulin is a calcium binding protein. It is composed of two similar domains, each of which binds two calcium atoms. The two domains of calmodulin can undergo large changes in relative orientation. Flexible structure alignment can highlight relative mobility between domains, when superposition by rigid alignment alone does not yield meaningful results.

A calmodulin in the open conformation is aligned with a calmodulin in the closed conformation

TIM barrel fold

The ubiquitous TIM barrel fold is an example of a protein family whose members have divergent sequences and yet share high similarity in structure, topology, and/or fold.

A TIM barrel aligned with a multi-domain protein that contains a TIM barrel

Code Examples

Python

To send a POST request to the alignment API in Python, you can use the requests library. Here's an example of how to do it:
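A minimal sketch of such a request might look like the following. The URL is a placeholder, and the `query` form field name and the plain-text ticket in the response body are assumptions to check against the API Reference. The `post` argument exists only so that the function can be exercised without network access; by default it falls back to `requests.post`.

```python
import json

# Placeholder: substitute the /submit URL given in the API Reference.
SUBMIT_URL = "https://alignment.example.org/api/submit"


def submit_alignment(query, post=None, url=SUBMIT_URL):
    """POST the query as multipart/form-data and return the ticket."""
    if post is None:
        import requests  # third-party: pip install requests
        post = requests.post
    # Passing files= makes requests encode the body as multipart/form-data;
    # a (None, value) tuple sends a plain form field without a filename.
    form = {"query": (None, json.dumps(query))}
    response = post(url, files=form, timeout=30)
    response.raise_for_status()
    return response.text.strip()  # assumed: the body holds the ticket
```

Calling `print(submit_alignment(query))` with a valid request object prints the ticket.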

After running this script, it will print the ticket, e.g. 095be615-a8ad-4c33-8e9c-c7612fbf6c9f. Use this ticket to issue a subsequent request to the /results endpoint to get the alignment results.

You may want to upload files as part of your request. Here's a script that does that:
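A file upload could be sketched along these lines. As before, the URL is a placeholder, and the `query` and `files` form field names are assumptions. Uploads are passed as (filename, bytes) pairs and appended in the given order, which should match the order of the file-based structures in the query.

```python
import json

# Placeholder: substitute the /submit URL given in the API Reference.
SUBMIT_URL = "https://alignment.example.org/api/submit"


def submit_with_files(query, uploads, post=None, url=SUBMIT_URL):
    """POST a query plus (filename, bytes) uploads; return the ticket."""
    if post is None:
        import requests  # third-party: pip install requests
        post = requests.post
    # A list of tuples keeps the multipart parts in a fixed order.
    parts = [("query", (None, json.dumps(query)))]
    for filename, data in uploads:
        parts.append(("files", (filename, data, "application/octet-stream")))
    response = post(url, files=parts, timeout=60)
    response.raise_for_status()
    return response.text.strip()  # assumed: the body holds the ticket
```

To upload a local file, read it first, for example `uploads = [("model.pdb", pathlib.Path("model.pdb").read_bytes())]`.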