EMBL Trans Extractor

Extract protein translations from EMBL files and convert to FASTA format

Tool Configuration
Configure the parameters for EMBL Trans Extractor

Paste the contents of one or more EMBL files. The tool will extract protein translation sequences from the /translation qualifiers in each feature table.

1. Prepare Your EMBL File

Paste the contents of your EMBL format file. The file must contain features with /translation qualifiers.
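Because the input may hold several EMBL entries, a tool like this has to separate them and check each one for /translation qualifiers. The sketch below is one possible approach, not the tool's actual code. It assumes each entry ends with a "//" terminator line, as in the EMBL flat-file format, and that feature-table lines begin with the "FT" line code. The names `split_entries` and `has_translation` are hypothetical.

```python
def split_entries(text: str) -> list[str]:
    """Split pasted text into EMBL entries; each entry ends with a '//' line."""
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing entry even if its '//' terminator was not pasted.
    if any(l.strip() for l in current):
        entries.append("\n".join(current))
    return entries


def has_translation(entry: str) -> bool:
    """True if any feature-table (FT) line carries a /translation qualifier."""
    return any(
        line.startswith("FT") and "/translation=" in line
        for line in entry.splitlines()
    )
```

For example, `[e for e in split_entries(pasted) if has_translation(e)]` keeps only the entries that can produce output.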

2. Execute Extraction

Click "Execute Tool" to extract all protein translation sequences from the /translation qualifiers in the feature table.
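The extraction step can be approximated by collecting the feature-table lines and pulling out every quoted /translation value. This is a minimal regex-based sketch under stated assumptions, not the tool's implementation. It assumes feature lines start with "FT" and that translation values contain no quote characters, which holds for one-letter amino acid codes. `extract_translations` is a hypothetical name.

```python
import re


def extract_translations(entry: str) -> list[str]:
    """Return every /translation value found in one EMBL entry."""
    # Keep only feature-table lines and drop the 'FT' line code and column padding.
    ft_text = "\n".join(
        line[2:].strip() for line in entry.splitlines() if line.startswith("FT")
    )
    # Non-greedy match up to the closing quote; DOTALL lets it cross line breaks.
    matches = re.findall(r'/translation="(.*?)"', ft_text, flags=re.DOTALL)
    # Sequences carry no internal spaces, so strip all whitespace left by wrapping.
    return [re.sub(r"\s+", "", m) for m in matches]
```

A full parser would also track which feature each value belongs to so that /gene or /product can be carried into the output. A regex is enough to show the core idea.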

3. Review Results

View the extracted protein sequences along with summary statistics: protein count, total residues, and average length.
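The three statistics follow directly from the list of extracted sequences. The sketch below shows the arithmetic; the function name and dictionary keys are illustrative assumptions, not the tool's output format. Since /translation values normally omit the terminal stop codon, no "*" needs to be subtracted from the residue count.

```python
def translation_stats(proteins: list[str]) -> dict:
    """Summarize extracted protein sequences."""
    count = len(proteins)
    total = sum(len(p) for p in proteins)
    return {
        "protein_count": count,
        "total_residues": total,
        # Guard against division by zero when nothing was extracted.
        "average_length": total / count if count else 0.0,
    }
```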

4. Download Translations

Download the protein sequences in FASTA format for BLAST searches, protein analysis, or database submission.
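A FASTA file is a sequence of records, each a ">" header line followed by the sequence wrapped to a fixed width. The sketch below writes records wrapped at 60 residues per line, matching the width described under Output Format. The `(header, sequence)` tuple input and the name `to_fasta` are assumptions made for illustration.

```python
def to_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Render (header, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        # Slice the sequence into chunks of at most `width` residues.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""
```

Writing the result to a file with a `.fasta` or `.fa` extension gives input suitable for BLASTp or other protein tools.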

About Protein Translation Extraction
Understanding protein translations in EMBL files

What are Protein Translations?

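EMBL files often include protein sequences in the feature table under the /translation qualifier, most commonly on CDS (coding sequence) features. These are conceptual translations: the amino acid sequences obtained by translating each annotated coding sequence with the genetic code given by /transl_table (the standard code when none is specified).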

Translation Qualifier Format

FT   CDS             join(265..402,673..781)
FT                   /gene="fem-2"
FT                   /product="PP2C protein phosphatase"
FT                   /translation="MSDSLNHPSSSTVHADDGF..."

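The translation is stored as a quoted string that may span multiple FT lines. Each continuation line repeats the FT line code and indentation, and the value ends at the closing quote.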
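How the wrapped pieces are rejoined depends on the qualifier. Sequence values such as /translation contain no spaces, so their pieces are concatenated directly. Free-text values such as /product are conventionally rejoined with a single space. In free text, a literal double quote is written as two double quotes. The sketch below assumes the qualifier's lines have already been collected with the "FT" prefix removed. `join_qualifier` is a hypothetical helper, not part of any library.

```python
def join_qualifier(name: str, raw_lines: list[str]) -> str:
    """Rebuild a qualifier value from its wrapped lines (FT prefix already removed)."""
    pieces = [l.strip() for l in raw_lines]
    prefix = f"/{name}="
    if pieces[0].startswith(prefix):
        pieces[0] = pieces[0][len(prefix):]
    # Sequence values are glued directly; free text is rejoined with one space.
    sep = "" if name == "translation" else " "
    value = sep.join(pieces)
    # Drop the surrounding quotes, then collapse "" (an escaped quote) to ".
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return value.replace('""', '"')
```

Applied to the snippet above, a /product value wrapped as `"PP2C protein` / `phosphatase"` becomes `PP2C protein phosphatase`. A wrapped /translation is joined with no separator.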

Advantages Over DNA Translation

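  • Consistent with the annotation - Translations are generally checked against the annotated CDS during database submission and processing
  • Handles complexity - Already reflects spliced join() locations, /codon_start, non-standard genetic codes (/transl_table), and exceptions such as /transl_except or /ribosomal_slippage; alternatively spliced isoforms are annotated as separate CDS features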
  • Saves time - No need to extract CDS and translate manually
  • Database ready - Direct use in protein databases and BLAST searches
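For comparison, translating a CDS yourself means splicing the exon segments named in the location, such as the join(265..402,673..781) example above, and then applying a genetic code codon by codon. The sketch below does this with the standard code (NCBI translation table 1) for simple forward-strand locations only. It deliberately ignores complement(), partial-end markers (< and >), /codon_start, alternative start codons, and /transl_except. That gap is exactly the work the /translation qualifier saves. Mapping ambiguous codons to "X" is an assumption of this sketch.

```python
import re

# NCBI translation table 1 (standard code); codons indexed with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def splice(seq: str, location: str) -> str:
    """Concatenate segments of a simple 'join(a..b,c..d)' or 'a..b' location (1-based, inclusive)."""
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    return "".join(seq[int(a) - 1:int(b)] for a, b in spans)


def translate(cds: str) -> str:
    """Translate a CDS with the standard code, dropping the terminal stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if any(b not in BASES for b in codon):
            protein.append("X")  # ambiguous or non-ACGT base
            continue
        idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        protein.append(AMINO[idx])
    # /translation values omit the stop codon, so trim trailing '*'.
    return "".join(protein).rstrip("*")
```

For example, `translate(splice("ATGGCCNNNNTGGTAA", "join(1..6,11..16)"))` skips the four-base "intron" and yields `MAW`.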

Common Use Cases

Protein Analysis

Extract proteins for domain analysis, structural prediction, or functional annotation.

BLAST Searches

Use extracted proteins for BLASTp searches to find homologs or orthologs.

Comparative Genomics

Compare protein sequences across different species or strains.

Database Submission

Prepare protein sequences for submission to UniProt or other databases.

Output Format

The tool returns protein sequences in FASTA format with headers containing the feature type (usually CDS) and relevant qualifiers like gene name or product. Sequences are formatted with 60 amino acids per line for readability.
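The exact header layout the tool uses is not specified here. The sketch below shows one plausible way to build a header from the feature key, a running index, and whichever common qualifiers are present. The `KEY_index name=value` layout, the qualifier order, and the name `fasta_header` are all assumptions for illustration. /gene, /locus_tag, /protein_id, and /product are standard CDS qualifiers.

```python
def fasta_header(feature_key: str, qualifiers: dict[str, str], index: int) -> str:
    """Build a FASTA header line such as '>CDS_1 gene=fem-2 product=...'."""
    parts = [f"{feature_key}_{index}"]
    # Include only the identifying qualifiers that this feature actually carries.
    for name in ("gene", "locus_tag", "protein_id", "product"):
        if name in qualifiers:
            parts.append(f"{name}={qualifiers[name]}")
    return ">" + " ".join(parts)
```

Putting an identifier first keeps the header usable by tools that read the FASTA ID only up to the first space.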