Split FASTA

Split a FASTA sequence into multiple fixed-length fragments, with optional overlap

Tool Configuration
Configure the parameters for Split FASTA

Paste FASTA sequences. Input limit is 500,000,000 characters.

The length of each new sequence fragment, in bases (or residues for protein input).

The number of bases to overlap between consecutive fragments (0 for no overlap). The overlap must be smaller than the fragment length.

1. Enter FASTA Sequences

Paste your FASTA sequences. The tool accepts DNA, RNA, or protein sequences and automatically removes formatting characters.

2. Set Fragment Parameters

Specify the desired fragment length and overlap. Use an overlap of 0 for non-overlapping fragments, or a positive value smaller than the fragment length for sliding window analysis.

3. Execute Splitting

Run the tool to split the sequences into fragments. Each fragment's FASTA header records its fragment number, position range, length, and the length of the source sequence.

4. Download Fragments

Save the fragmented sequences for sliding window analysis, parallel processing, k-mer studies, or assembly simulation.
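
For readers who want to reproduce the input step offline, the following is a minimal Python sketch of reading pasted FASTA text into (title, sequence) pairs. It is not the tool's own code. The function name parse_fasta is hypothetical, and "formatting characters" is assumed here to mean whitespace and digits (for example, position numbers copied from a sequence viewer); the tool may strip a different set.

```python
import re


def parse_fasta(text):
    """Return a list of (title, sequence) pairs from FASTA-formatted text.

    Assumption: whitespace and digits inside sequence lines are
    formatting residue and are removed.
    """
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        else:
            if title is None:
                title = ""  # sequence data before any header line
            chunks.append(re.sub(r"[\s\d]", "", line))
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


records = parse_fasta(">seq1 demo\n1 ACGT ACGT\n9 TTGA\n>seq2\nMKV\n")
```

Each pair in records can then be split independently using the chosen fragment length and overlap.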

Use Cases
Common applications for the Split FASTA tool

Sliding Window Analysis

Create overlapping fragments for sliding window analysis of sequence features, GC content, or motif distribution.

Fragment-based Processing

Split long sequences into manageable fragments for parallel processing or analysis with tools that have length limits.

k-mer Analysis

Generate overlapping k-mers or fixed-length subsequences for pattern recognition and sequence comparison.

Assembly Simulation

Create overlapping reads to simulate sequencing data or test assembly algorithms.
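
Two of these use cases map directly onto the length and overlap settings. The Python sketch below is illustrative only; the helper names windows, gc_fraction, and kmers are hypothetical and are not part of the tool. It assumes that only full-length windows are wanted, so any shorter trailing piece is dropped, which differs from the tool's own output. Under that assumption, the k-mers of a sequence are simply its windows of length k with an overlap of k - 1.

```python
def windows(seq, length, overlap):
    """Return full-length windows; neighbours share `overlap` characters."""
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    step = length - overlap  # how far each window advances
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def gc_fraction(fragment):
    """Fraction of G and C characters in a fragment (0.0 if empty)."""
    if not fragment:
        return 0.0
    fragment = fragment.upper()
    return (fragment.count("G") + fragment.count("C")) / len(fragment)


def kmers(seq, k):
    """All k-mers in order: windows of length k that advance by one."""
    return windows(seq, k, k - 1)


# Sliding-window GC content: window length 4, overlap 2.
gc_profile = [gc_fraction(w) for w in windows("GGCCAATT", 4, 2)]

# 3-mers of a short sequence.
three_mers = kmers("ACGTAC", 3)
```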

How Split FASTA Works

Example 1: No Overlap

Input: ABCDEFGHIJ (10 bases)
Fragment length: 3 bases
Overlap: 0 bases
Result:
Fragment 1: ABC (positions 1-3)
Fragment 2: DEF (positions 4-6)
Fragment 3: GHI (positions 7-9)
Fragment 4: J (positions 10-10)

Example 2: With Overlap

Input: ABCDEFGHIJ (10 bases)
Fragment length: 4 bases
Overlap: 2 bases
Result:
Fragment 1: ABCD (positions 1-4)
Fragment 2: CDEF (positions 3-6) ← overlaps by 2
Fragment 3: EFGH (positions 5-8) ← overlaps by 2
Fragment 4: GHIJ (positions 7-10) ← overlaps by 2
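
The two examples can be reproduced with a short Python sketch. This is not the tool's source code. The function name split_sequence is hypothetical, and the sketch assumes that each fragment starts (length - overlap) positions after the previous one and that splitting stops as soon as a fragment reaches the end of the sequence. That assumption matches both examples above, but the tool's handling of other edge cases may differ.

```python
def split_sequence(seq, length, overlap=0):
    """Split seq into fragments of at most `length` characters.

    Returns (start, end, fragment) tuples with 1-based, inclusive
    positions, as in the examples above.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    step = length - overlap  # distance between consecutive fragment starts
    fragments = []
    start = 0
    while start < len(seq):
        end = min(start + length, len(seq))
        fragments.append((start + 1, end, seq[start:end]))
        if end == len(seq):
            break  # assumed: stop once the end of the sequence is covered
        start += step
    return fragments


example_1 = split_sequence("ABCDEFGHIJ", 3, 0)
example_2 = split_sequence("ABCDEFGHIJ", 4, 2)
```

With no overlap the last fragment may be shorter than the requested length, as with fragment 4 in Example 1.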

Output Format

Each fragment is output as a FASTA record with a descriptive header:

>fragment_N;original_title_start=X;end=Y;length=Z;source_length=T
N: Fragment number (1, 2, 3, ...)
X: Starting position in the original sequence (1-based)
Y: Ending position in the original sequence (inclusive)
Z: Length of this fragment
T: Total length of the original sequence
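
As an illustration of this header layout, the Python sketch below builds one output record from a fragment and its position. The function name format_fragment is hypothetical. The sketch assumes that original_title in the template stands for the title of the source record and that the sequence is written on a single unwrapped line; neither detail is confirmed by the tool's description.

```python
def format_fragment(number, title, start, fragment, source_length):
    """Build a FASTA record for one fragment.

    `start` is the 1-based position of the fragment's first character
    in the source sequence; the end position is derived from it.
    Assumption: `title` takes the place of `original_title`.
    """
    end = start + len(fragment) - 1
    header = (
        f">fragment_{number};{title}_start={start};end={end};"
        f"length={len(fragment)};source_length={source_length}"
    )
    return f"{header}\n{fragment}"


# Fragment 2 from Example 2, for a source record titled "seq1".
record = format_fragment(2, "seq1", 3, "CDEF", 10)
```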