CS50 Final Project: Introduction to Programming with Python

Description: The program aligns two DNA sequences globally using the Needleman-Wunsch algorithm, which is particularly useful for identifying the differences between two similar sequences. Done by hand, the alignment requires building a scoring matrix, computing the best score for each cell, and then tracing a path back through the matrix to locate the matches, mismatches, and gaps between the sequences. For longer sequences, these manual steps quickly become time-consuming and error-prone. This program automates the whole process: you enter your two sequences, and it produces the alignment for you.
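The matrix fill and traceback steps can be sketched in Python as follows. This is a minimal sketch, not the project's own code. The function name “needleman_wunsch” and the scoring scheme (match +1, mismatch -1, gap -1) are assumptions, since the scores the program uses are not stated here. Ties during traceback are broken in favor of the diagonal step, and the sketch returns strings rather than lists for brevity.

```python
def needleman_wunsch(query: str, target: str,
                     match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    """Globally align two sequences; return both aligned strings with '-' for gaps.

    The scoring values are illustrative assumptions, not the project's settings.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # score[i][j] holds the best score for aligning query[:i] with target[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # query prefix aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # target prefix aligned entirely against gaps

    def pair(a: str, b: str) -> int:
        return match if a == b else mismatch

    # Fill: each cell keeps the best of a diagonal step or a gap in either sequence
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + pair(query[i - 1], target[j - 1])
            up = score[i - 1][j] + gap    # gap in the target
            left = score[i][j - 1] + gap  # gap in the query
            score[i][j] = max(diag, up, left)

    # Traceback: walk from the bottom-right cell back to (0, 0) along one optimal path
    aligned_q, aligned_t = [], []
    i, j = len(query), len(target)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(query[i - 1], target[j - 1]):
            aligned_q.append(query[i - 1])
            aligned_t.append(target[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(target[j - 1])
            j -= 1
    # The path was collected backwards, so reverse it
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_t))
```

For example, needleman_wunsch("ACGT", "AGT") returns ("ACGT", "A-GT"), placing a single gap opposite the C. Other scoring values or tie-breaking rules can yield a different, equally optimal alignment.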

Program structure: Besides the main function, the program contains four functions:
1. “validity_checker” checks the query and target sequences to make sure they are valid DNA sequences.
2. “needleman_wunsch_algorithm” builds the scoring matrix, computes the highest score for each cell, and returns the aligned query and target as two lists, with “-” inserted for gaps where needed.
3. “needle_to_string” builds a match line containing “|” at each position where the two aligned sequences match, and returns this line together with the two sequences as strings.
4. “alignment_type_checker” returns the strings split into lines of 100 characters, either in color or without color, according to the user’s preference.
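Items 3 and 4 can be sketched as below. This is a hedged illustration, not the project's code. The names “match_line”, “colorize”, and “format_alignment” are hypothetical stand-ins for “needle_to_string” and “alignment_type_checker”. Coloring matches green and mismatches or gaps red with ANSI escape codes is an assumption about what the colorful format looks like. The sketch expects the two aligned sequences as equal-length strings.

```python
# ANSI escape codes (an assumption; the project may color its output differently)
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"


def match_line(aligned_q: str, aligned_t: str) -> str:
    """Return '|' at each position where the aligned sequences share a base, ' ' elsewhere."""
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(aligned_q, aligned_t))


def colorize(seq: str, other: str) -> str:
    """Color each character green if it matches `other` at that position, red otherwise."""
    return "".join(f"{GREEN if a == b else RED}{a}{RESET}" for a, b in zip(seq, other))


def format_alignment(aligned_q: str, aligned_t: str, width: int = 100, color: bool = False) -> str:
    """Split the alignment into blocks of `width` columns: query, match line, target."""
    bars = match_line(aligned_q, aligned_t)
    blocks = []
    for start in range(0, len(aligned_q), width):
        q = aligned_q[start:start + width]
        t = aligned_t[start:start + width]
        # Slice before coloring so the escape codes never count toward the width
        if color:
            q, t = colorize(q, t), colorize(t, q)
        blocks.append("\n".join([q, bars[start:start + width], t]))
    return "\n\n".join(blocks)
```

For example, format_alignment("ACGT", "A-GT") produces the three lines "ACGT", "| ||", and "A-GT". Slicing the plain strings before adding color keeps each printed block at most “width” visible characters wide.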

Testing the program: The program was tested with the “pytest” module and passed all tests.
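A test file in this style might look like the sketch below. It is illustrative only. “is_valid_dna” is a hypothetical stand-in for “validity_checker”, and treating a valid sequence as a non-empty string of A, C, G, and T (in any case) is an assumption about what “valid” means. In the real project, the function would be imported from the main program file rather than defined in the test file.

```python
VALID_BASES = set("ACGT")


def is_valid_dna(sequence: str) -> bool:
    """Return True if the sequence is non-empty and contains only A, C, G, or T (any case)."""
    return bool(sequence) and set(sequence.upper()) <= VALID_BASES


# pytest collects functions whose names start with "test_" and checks their asserts
def test_accepts_standard_bases():
    assert is_valid_dna("GATTACA")
    assert is_valid_dna("gattaca")


def test_rejects_other_characters():
    assert not is_valid_dna("GATXACA")
    assert not is_valid_dna("GAT TACA")


def test_rejects_empty_sequence():
    assert not is_valid_dna("")
```

Saved as a file whose name starts with “test_”, this runs with the command pytest followed by that file name. Because the tests use plain assert statements, they also run without pytest if called directly.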

Limitations: The program can only align two sequences at a time, which may be restrictive for some analyses. It also does not save its results, so you must read them directly in the terminal.