Sequence editing using BioEdit

This week's lab will be held in Room B8218 (the Biology PC lab). Raw sequence files will be edited this week, and the edited sequence files will be analyzed next week.

Editing (and most of the analysis) will be done using BioEdit, a freeware sequence analysis program developed by Tom Hall at North Carolina State University.

Logging on

Each group should log on to a PC using the class ID bisc431 and the password pseud

Do not save your individual settings.

Sequence files

There are 4 disks containing sequence files. Each group should choose one of the sequence files on the disk, and copy it from drive A to the desktop. Also copy the file pstblue1vector.txt to the desktop. Return the disk as soon as this is done.

Opening BioEdit

Click on Start, Programs, and BioEdit. (You may have to scroll down the program list to find it.)

Opening files

Click on File menu, Open. Look in Desktop, files of type All files. Click on the sequence file you transferred to open it. (BioEdit should recognise it as an ABI Autosequencer Trace file and open it as a chromatogram.)

Editing sequence

Adjust the size of the chromatogram trace with the Horizontal scale and Vertical scale bars at the top left of the image. You should be able to clearly see the peaks of the trace.

Click on the View menu, and check Editable sequence. This creates a duplicate sequence that can be edited without changing the original sequence.

The computer will already have called most of the bases from the peaks present. However, you will still see some “N”s in the sequence where the computer cannot make a call. By looking at the trace below the N, you should be able to make a visual judgement about which base should be present instead of N. (Each line in the trace is colour-coded to match the colour in which one of the 4 bases is displayed.) Select the N and replace it by typing in the appropriate base. (Note that sequences become increasingly unreliable after 400-500 bases, and are not worth spending much time on.)
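If you want to check an exported text sequence for remaining uncalled bases, the idea can be sketched in a few lines of Python. This is only an illustration and is not part of BioEdit; the function name and the example sequence are made up for this handout.

```python
def ambiguous_positions(sequence):
    """Return the 1-based position of every N (uncalled base) in a sequence."""
    return [
        position
        for position, base in enumerate(sequence.upper(), start=1)
        if base == "N"
    ]


# Made-up example sequence, not real lab data.
example = "ACGTNACGNT"
print(ambiguous_positions(example))  # [5, 9]
```

Positions well beyond base 400-500 can reasonably be left as N, as noted above.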

Saving the edited file

Click on the File menu, Export as text. Save the edited file to the desktop (or, preferably, to your own disk or network account).

Preparing the Reverse Complement

The sequence present in the original file is the sequence of the newly synthesized strand. To get the sequence of the original template strand, the Reverse Complement must be prepared.

Click on the View menu (for the original unedited file), and check Reverse Complement. Note that this is also displayed in a 5'-3' direction, so the sequence complementary to the beginning of your original unedited forward sequence will be at the end of the reverse complement. Save the reverse complement as a text file under a different name.
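To see what the Reverse Complement option does to a sequence, here is a minimal Python sketch. It is an illustration only, not BioEdit's own code; the function name and the example sequence are made up, and N is assumed to stay N.

```python
# Translation table: each base maps to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then reverse, so the result reads 5'-3'."""
    return sequence.translate(COMPLEMENT)[::-1]


# Made-up example sequence, not real lab data.
forward = "ATGCCGNA"
print(reverse_complement(forward))  # TNCGGCAT
```

Note that the first base of the forward sequence (A) ends up complemented (T) at the last position of the output, as described above.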

Removing vector sequences

The sequences you are working with were prepared by the Davidson lab from DNA fragments cloned in the pSTBlue-1 plasmid vector. (See sequence analysis references for full map.) The clones were sequenced using either the T7 or SP6 promoter primers that flank the multiple cloning site in this vector. Because of this, the bases at the beginning of each sequence file you have are vector sequence, rather than cloned sequence. Since this may interfere with analysis of the sequence, these will have to be edited out.

To identify vector sequences, alignments will be prepared between your edited forward and reverse complement sequences and the sequence in the pstblue1vector.txt file. (This file contains the sequence of the multiple cloning site region of pSTBlue-1.)

Click on the File menu, New alignment.

Click on the File menu, Import. Look in the Desktop (or wherever else you saved the edited sequence files), files of type All files. Click on the edited forward sequence file to open it. Repeat this process for the pstblue1vector.txt file.

Select both sequences with the mouse by dragging it over the sequence names at the left. Click on Sequence menu, Pairwise alignment, Align two sequences (allow ends to slide).

If the vector sequence is on the same strand as the forward sequence, the vector should have a region of exact (or almost exact) match with the beginning of the forward sequence. If this does not occur, repeat the process with the reverse complement sequence file (in a New alignment). If the vector sequence given is the opposite strand to the forward sequence, then there should be a region of (almost) exact match with the end of the reverse complement.
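The idea of letting the ends slide can be illustrated with a simplified Python sketch. It slides the vector along the read without inserting gaps and scores each placement, which is a simpler technique than the gapped pairwise alignment BioEdit performs. The function name, the scoring (matches minus mismatches), the minimum overlap, and both sequences are assumptions made up for illustration; the vector shown is not the real pSTBlue-1 sequence.

```python
def best_sliding_match(read, vector, min_overlap=4):
    """Slide the vector along the read (no gaps) and score every placement.

    Returns (offset, score). offset is the read index that lines up with
    vector[0]; a negative offset means the vector starts before the read.
    score is matches minus mismatches over the overlapping bases.
    """
    read, vector = read.upper(), vector.upper()
    best_offset, best_score = None, None
    for offset in range(min_overlap - len(vector), len(read) - min_overlap + 1):
        read_start = max(offset, 0)
        vector_start = max(-offset, 0)
        overlap = min(len(read) - read_start, len(vector) - vector_start)
        if overlap < min_overlap:
            continue
        matches = sum(
            read[read_start + k] == vector[vector_start + k]
            for k in range(overlap)
        )
        score = matches - (overlap - matches)
        if best_score is None or score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Made-up sequences: the read begins with the last 5 bases of the vector.
vector = "ACGTTGCA"
read = "TTGCA" + "GGGGGGG"
offset, score = best_sliding_match(read, vector)
print(offset, score)         # -3 5
print(offset + len(vector))  # 5, the first read index past the vector
```

A negative offset here means that part of the vector lies upstream of where the read starts, which is the expected situation when the vector match is at the beginning of the forward sequence.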

Identify the region of vector sequences. (These should show an almost exact match to the forward or reverse sequence. If the alignment starts showing gaps and/or mismatches, the end of the vector sequence has been reached.) Return to your edited forward sequence file, delete the vector sequences, and save for next week.
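The trimming rule above (vector ends where gaps or mismatches begin) can be sketched in Python for an alignment that has already been produced. This is an illustration under simplifying assumptions: the two aligned sequences are given as equal-length strings with "-" for gaps, the vector match is at the start of the read, and the vector is taken to end at the very first mismatch or gap, whereas in practice an occasional mismatch inside the vector region may be tolerated. The function name and sequences are made up.

```python
def trim_vector(aligned_read, aligned_vector):
    """Remove the leading vector bases from a read, given a pairwise alignment.

    Both arguments are equal-length aligned strings using '-' for gaps.
    Returns the ungapped read with the leading vector region deleted.
    """
    vector_bases = 0  # number of leading read bases judged to be vector
    for read_base, vector_base in zip(aligned_read, aligned_vector):
        if read_base == "-":
            if vector_bases:
                break     # gap in the read after matching began: vector ended
            continue      # vector overhangs the start of the read
        if vector_base == "-" or read_base.upper() != vector_base.upper():
            break         # first gap or mismatch: end of vector reached
        vector_bases += 1
    ungapped_read = aligned_read.replace("-", "")
    return ungapped_read[vector_bases:]


# Made-up alignment: 3 vector-only columns, 5 matching columns, then insert.
aligned_read = "---" + "TTGCA" + "GGGGGGG"
aligned_vector = "ACG" + "TTGCA" + "-------"
print(trim_vector(aligned_read, aligned_vector))  # GGGGGGG
```

Always check the result against the alignment by eye before saving, since a strict rule like this one will stop too early if a base in the vector region was miscalled.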