Protein Pattern Find

Search for regex patterns in protein sequences and locate all matches.

Tool Configuration
Configure the parameters for Protein Pattern Find

Paste one or more FASTA sequences (max 500,000,000 characters).

Enter a regular expression pattern to search for. Example: S[^S]{0,5}S (two serines with 0-5 non-serine residues between them)

1

Enter Protein Sequences

Paste one or more protein sequences in FASTA format (up to 500 million characters). The tool searches all sequences simultaneously, reporting matches with sequence name and position. Ideal for screening proteomes or gene families.
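As a rough illustration of the input this step expects, the sketch below splits multi-record FASTA text into (name, sequence) pairs. The function name `parse_fasta` and the sample records are hypothetical, and the tool's own parser may treat whitespace, case, or malformed headers differently.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Sequence lines are joined and upper-cased (an assumption).
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Made-up input for illustration only.
fasta_text = """>seq1 example protein
MKRSAS
NGTC
>seq2
CAAC
"""
records = parse_fasta(fasta_text)
```

Each record keeps its header text as the name, so matches can later be reported per sequence.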

2

Create Regex Patterns

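Use regular expressions to define search patterns. Examples: [RK] finds basic residues (arginine or lysine), N[^P][ST] finds N-glycosylation sequons (asparagine, any residue except proline, then serine or threonine), C.{2}C finds the two-residue cysteine spacing found in many zinc finger motifs, and [FYW] finds aromatic residues. Square brackets define character sets, a leading ^ inside the brackets negates the set, a dot matches any single residue, and curly braces define repeat counts. In standard regex syntax the letter X matches only a literal X, so use a dot to mean "any residue".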
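The character sets and repeat counts described in this step can be tried with Python's standard `re` module, as in the minimal sketch below. The toy sequence is made up, and "any residue" is written as a dot because a literal X in a standard regex matches only the letter X. The tool's regex engine may differ from Python's in details.

```python
import re

seq = "MKRCAACWFNGT"  # made-up toy sequence

# Character set: any single residue listed in the brackets.
basic = re.findall(r"[RK]", seq)
aromatic = re.findall(r"[FYW]", seq)

# Repeat count: a dot (any residue) repeated exactly twice between cysteines.
cxxc = re.findall(r"C.{2}C", seq)
```

Here `basic` collects the K and R near the start, `aromatic` collects W and F, and `cxxc` holds the single CAAC stretch.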

3

Understanding Match Results

Results show each match with its position (1-indexed), the matched sequence, and surrounding context. Positions help locate motifs for further analysis or mutagenesis. Multiple matches in one sequence are all reported individually.
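A result table of this kind could be produced along the lines of the sketch below, which converts Python's 0-based match offsets to 1-indexed positions and attaches a few flanking residues. The function name `find_matches`, the context width, and the bracketed context format are assumptions for illustration, not the tool's actual output format.

```python
import re


def find_matches(name, seq, pattern, context=3):
    """Return one dict per match with 1-indexed start/end and flanking context."""
    hits = []
    for m in re.finditer(pattern, seq):
        left = seq[max(0, m.start() - context):m.start()]
        right = seq[m.end():m.end() + context]
        hits.append({
            "name": name,
            "start": m.start() + 1,  # convert 0-based offset to 1-indexed
            "end": m.end(),          # 1-indexed, inclusive
            "match": m.group(),
            "context": f"{left}[{m.group()}]{right}",
        })
    return hits


hits = find_matches("demo", "MKRCAACWFNGTKR", r"[RK]{2}")
# Two hits: KR at positions 2-3 and KR at positions 13-14.
```

Note that `re.finditer` reports non-overlapping matches only; whether the tool also reports overlapping matches is not stated here.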

4

Common Applications

Find post-translational modification sites (phosphorylation, glycosylation, ubiquitination), locate functional motifs (ATP-binding sites, DNA-binding domains), identify conserved regions, search for protease cleavage sites, or scan for specific structural features like leucine zippers or coiled-coils.

Pattern Matching in Protein Sequences
Finding functional motifs and structural patterns.

Regular Expression Patterns

This tool searches protein sequences using regular expressions (regex), a compact notation for describing a set of matching strings. A single pattern can express alternatives at a position, excluded residues, and variable-length gaps.

Common Pattern Examples

S[^S]{0,5}S - Two serines with 0-5 non-serine residues between them
[RK] - Basic amino acids (Arginine or Lysine)
[FYW] - Aromatic amino acids (Phenylalanine, Tyrosine, or Tryptophan)
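N[^P][ST] - N-glycosylation sequon (Asparagine, any residue except Proline, then Serine or Threonine)
[^P]P - Proline preceded by any residue other than Proline (the match includes the preceding residue)
C.{2}C - Cysteine, any two amino acids, then Cysteine (a dot matches any residue; X would match only a literal X)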
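To see how the negated sets and gap ranges in these examples behave, the sketch below runs three of them over a made-up sequence with Python's `re` module and records each match with its 1-indexed start. The N-glycosylation sequon is written here as N[^P][ST], which assumes the usual "any residue except proline" middle position.

```python
import re

seq = "MSAASPPNGTS"  # made-up toy sequence

patterns = [
    r"S[^S]{0,5}S",  # two serines with 0-5 non-serine residues between them
    r"[^P]P",        # proline preceded by a non-proline residue
    r"N[^P][ST]",    # N-glycosylation sequon (assumed form)
]

# Map each pattern to a list of (1-indexed start, matched text) pairs.
results = {
    p: [(m.start() + 1, m.group()) for m in re.finditer(p, seq)]
    for p in patterns
}

for p, found in results.items():
    print(p, found)
```

Only the first proline of the PP pair is reported for [^P]P, because the second one is preceded by a proline.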

Applications

Pattern matching is useful for:
  • Identifying functional domains and motifs
  • Finding post-translational modification sites
  • Locating enzyme recognition sequences
  • Searching for conserved regions across sequences
  • Identifying potential signal peptides and transmembrane regions