Purpose and Scope
This document guides users through the process of analyzing protein structures with 3DiPhy. It covers converting PDB structure files to 3Di character representations and explains how these representations enable phylogenetic inference beyond sequence similarity thresholds.
For information on performing sequence alignment using these structures, see Sequence Alignment. For details on constructing phylogenetic trees using structural data, see Phylogenetic Tree Construction.
Protein Structure Analysis Workflow
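The protein structure analysis in 3DiPhy follows these main steps: obtain a PDB structure file, convert it to a 3Di character representation with Foldseek, visualize the result in Jalview, and then align multiple 3Di sequences for phylogenetic inference.
[Diagram: Protein Structure Analysis Pipeline]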
Sources: jalview/2jd7_A_AAs_to_3Di.pdb, jalview/2jd7_A_AAs_to_3Di_DONE.pdb
Understanding PDB Files in 3DiPhy
Protein Data Bank (PDB) files contain atomic coordinates and other information about protein structures. In 3DiPhy, these files serve as the primary input for structural analysis.
A typical PDB file in the system includes:
- ATOM records: Atomic coordinates for each atom in the protein
- Chain identifiers: Indicating different polypeptide chains
- Residue information: Amino acid type and sequence number
Example of ATOM records from a PDB file used in 3DiPhy:
ATOM      1  N   MET A   1     -54.085  20.473  56.805  1.00 58.74           N
ATOM      2  CA  MET A   1     -53.039  21.326  56.142  1.00 60.47           C
ATOM      3  C   MET A   1     -52.821  20.927  54.687  1.00 52.59           C
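As a rough illustration of how the fields in these records can be read, the sketch below slices one ATOM line at the fixed column positions defined by the PDB format. The `Atom` tuple and the `parse_atom_line` function are hypothetical names chosen for this example; they are not part of 3DiPhy.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM record (hypothetical helper, not a 3DiPhy API)."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),        # atom serial number
        name=line[12:16].strip(),      # atom name, e.g. N, CA, C
        res_name=line[17:20].strip(),  # three-letter residue name
        chain=line[21],                # chain identifier
        res_seq=int(line[22:26]),      # residue sequence number
        x=float(line[30:38]),          # orthogonal coordinates in angstroms
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# First record of the example above, laid out in standard PDB columns.
record = "ATOM      1  N   MET A   1     -54.085  20.473  56.805  1.00 58.74           N"
atom = parse_atom_line(record)
print(atom.res_name, atom.chain, atom.res_seq, atom.x)
```

Fixed-column slicing is used instead of splitting on whitespace because adjacent fields in real PDB files can run together without a separating space.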
Sources: jalview/2jd7_A_AAs_to_3Di.pdb
Converting Amino Acid Sequences to 3Di Characters
The conversion from amino acid residues to 3Di characters is a central step in 3DiPhy. Foldseek derives a 3Di state for each residue from the structure's 3D coordinates, so the resulting characters describe tertiary structure interactions rather than residue chemistry.
[Diagram: 3Di Conversion Process]
Sources: jalview/2jd7_A_AAs_to_3Di.pdb, jalview/2jd7_A_AAs_to_3Di_DONE.pdb
Example of Conversion
The following table illustrates the conversion from amino acids to 3Di characters for a segment of protein 2JD7 (chain A):
Original Amino Acid | Position | Converted to 3Di Character |
---|---|---|
MET | A1 | ASP |
LEU | A2 | ASP |
SER | A3 | ASP |
GLU | A4 | PRO |
ARG | A5 | VAL |
MET | A6 | LEU |
LEU | A7 | LEU |
LYS | A8 | VAL |
ALA | A9 | LEU |
LEU | A10 | LEU |
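This conversion preserves structural information while creating a format suitable for phylogenetic analysis. Each 3Di character represents the local structural environment of a residue, encoding backbone geometry and tertiary contacts. The 3Di alphabet reuses the 20 amino acid letters, so in the table the 3Di states appear as three-letter residue names: ASP stands for the 3Di state D, not for an aspartate residue.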
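To make the table easier to read as sequences, the following sketch collapses the three-letter residue names into one-letter strings. It assumes that the converted PDB file stores each 3Di state in the residue-name field, as the table suggests; `THREE_TO_ONE` and `to_one_letter` are illustrative names, not 3DiPhy functions.

```python
# Standard three-letter to one-letter amino acid codes. The same letters
# double as the 3Di alphabet in this sketch (an assumption based on the table).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(residue_names):
    """Join a list of three-letter names into a one-letter string."""
    return "".join(THREE_TO_ONE[name] for name in residue_names)


# Columns of the table above for positions A1 to A10.
amino_acids = ["MET", "LEU", "SER", "GLU", "ARG", "MET", "LEU", "LYS", "ALA", "LEU"]
three_di = ["ASP", "ASP", "ASP", "PRO", "VAL", "LEU", "LEU", "VAL", "LEU", "LEU"]

print(to_one_letter(amino_acids))  # MLSERMLKAL
print(to_one_letter(three_di))     # DDDPVLLVLL
```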
Sources: jalview/2jd7_A_AAs_to_3Di.pdb, jalview/2jd7_A_AAs_to_3Di_DONE.pdb
Integration with Jalview for Visualization
The 3DiPhy system integrates with Jalview to provide visualization capabilities for protein structures and their 3Di representations.
[Diagram: Jalview Visualization Components]
Key features of the Jalview integration:
- Display of 3Di character sequences
- Application of the 3Di-gecos color scheme for enhanced visualization
- Tools for structure comparison and analysis
- Interactive manipulation of alignments
Sources: jalview/2jd7_A_AAs_to_3Di.pdb, jalview/2jd7_A_AAs_to_3Di_DONE.pdb
Structural Comparison and Analysis
Once protein structures have been converted to 3Di character representations, they can be compared and analyzed for evolutionary relationships.
[Diagram: Structural Analysis Workflow]
The 3Di character-based approach allows for:
- Comparison of structurally similar but sequence-divergent proteins
- Detection of distant evolutionary relationships
- Robust phylogenetic inference based on structural conservation
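One simple way to see why structurally similar but sequence-divergent proteins remain comparable is to compute percent identity separately on the amino acid and 3Di versions of the same aligned pair. The sketch below does this with plain identity counting; it is not the scoring that Foldseek or 3DiPhy uses, and the second sequence in each pair is invented purely for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical characters over aligned columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# First sequences: 2JD7 chain A, positions A1 to A10 from the table above.
# Second sequences: hypothetical homolog, made up for this example.
aa_pair = ("MLSERMLKAL", "MKTDQ-LRGI")
di_pair = ("DDDPVLLVLL", "DDDPV-LVLV")

print(f"amino acid identity: {percent_identity(*aa_pair):.1f}%")
print(f"3Di identity:        {percent_identity(*di_pair):.1f}%")
```

In this made-up pair the amino acid identity is low while the 3Di identity stays high, which is the situation the 3Di approach is designed to exploit.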
For further details on phylogenetic analysis using this structural data, see Phylogenetic Tree Construction.
Sources: jalview/2jd7_A_AAs_to_3Di.pdb, jalview/2jd7_A_AAs_to_3Di_DONE.pdb
File Formats Used in Protein Structure Analysis
The protein structure analysis workflow in 3DiPhy involves several file formats:
File Format | Description | Usage in 3DiPhy |
---|---|---|
PDB (.pdb) | Standard format containing 3D coordinates of atoms in proteins | Input files with atomic coordinates |
3Di PDB (.pdb) | Modified PDB files with 3Di character assignments | Output of Foldseek 3Di conversion |
FASTA (.fasta) | Text-based format for representing sequences | Used for sequence and 3Di sequence storage |
Alignment Files | Files containing multiple aligned sequences | Result of structure-based alignment |
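As a small illustration of the FASTA row, the sketch below writes an amino acid sequence and its 3Di counterpart to FASTA text and reads them back. The record names and the helper functions `write_fasta` and `read_fasta` are hypothetical; 3DiPhy may name and store its sequences differently.

```python
import io
from typing import Dict, List, TextIO


def write_fasta(records: Dict[str, str], handle: TextIO, width: int = 60) -> None:
    """Write name/sequence pairs as FASTA, wrapping sequences at `width`."""
    for name, seq in records.items():
        handle.write(f">{name}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Read FASTA text into a dict mapping record name to sequence."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


# Hypothetical record names; sequences are positions A1 to A10 of 2JD7 chain A.
records = {"2jd7_A_AA": "MLSERMLKAL", "2jd7_A_3Di": "DDDPVLLVLL"}
buffer = io.StringIO()
write_fasta(records, buffer)
buffer.seek(0)
round_trip = read_fasta(buffer)
print(round_trip)
```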
Sources: jalview/2jd7_A_AAs_to_3Di.pdb, jalview/2jd7_A_AAs_to_3Di_DONE.pdb
Practical Implementation
To analyze a protein structure with 3DiPhy:
- Obtain a PDB file of your protein structure
- Process the PDB file through Foldseek to obtain 3Di character representation
- Load the resulting 3Di file into Jalview for visualization
- For phylogenetic analysis, combine multiple 3Di character files
- Perform structure-based alignment
- Apply maximum likelihood methods for phylogenetic inference
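The Foldseek step in the list above could be scripted roughly as follows. This is a sketch under assumptions: the subcommand sequence (`createdb`, `lndb`, `convert2fasta`) follows the recipe commonly given in Foldseek's own documentation for extracting 3Di sequences, the source does not state how 3DiPhy itself invokes Foldseek, and the `work/2jd7_A` database prefix is a made-up path. Check the command names against your installed Foldseek version before relying on them.

```python
import subprocess
from typing import List


def foldseek_3di_commands(pdb_path: str, db_prefix: str) -> List[List[str]]:
    """Build the command lines for extracting 3Di sequences (assumed recipe)."""
    return [
        # Build a Foldseek structure database from the PDB file.
        ["foldseek", "createdb", pdb_path, db_prefix],
        # Reuse the header database for the 3Di ("_ss") database.
        ["foldseek", "lndb", f"{db_prefix}_h", f"{db_prefix}_ss_h"],
        # Export the 3Di sequences as FASTA.
        ["foldseek", "convert2fasta", f"{db_prefix}_ss", f"{db_prefix}_3di.fasta"],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical output prefix; the input path is the example file named in this document.
commands = foldseek_3di_commands("jalview/2jd7_A_AAs_to_3Di.pdb", "work/2jd7_A")
for command in commands:
    print(" ".join(command))
```

The example only prints the commands; calling `run_commands(commands)` would execute them and requires Foldseek on the `PATH`.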
The example files jalview/2jd7_A_AAs_to_3Di.pdb and jalview/2jd7_A_AAs_to_3Di_DONE.pdb demonstrate the input and output of the 3Di conversion process for protein 2JD7.
Sources: jalview/2jd7_A_AAs_to_3Di.pdb, jalview/2jd7_A_AAs_to_3Di_DONE.pdb
Summary
Protein structure analysis in 3DiPhy involves converting amino acid sequences from PDB files into 3Di character representations that capture tertiary structure information. These 3Di characters can then be used for structural alignment, visualization, and phylogenetic analysis. The integration with Jalview provides tools for interactive visualization and analysis of protein structures.
The 3Di approach enables phylogenetic analysis beyond the "twilight zone" of sequence similarity (roughly 20-35% pairwise identity, where sequence-based alignments become unreliable) by relying on tertiary structural features, which are more conserved than sequence.