DNA SEQUENCE ANALYSIS BASED ON REPETITIVE PATTERNS /
Afuohsaa, Boris Neba
DNA SEQUENCE ANALYSIS BASED ON REPETITIVE PATTERNS / BORIS NEBA AFUOHSAA; SUPERVISOR: PROF. DR. AHMET ADALIER - viii, 65 sheets; 31 cm. Includes CD
Thesis (MSc) - Cyprus International University. Institute of Graduate Studies and Research. Information Technologies Department
Includes bibliography (sheets 51-55)
ABSTRACT
The volume of DNA data held in databanks worldwide keeps growing as more
sequences are gathered. Storing large DNA data sets raises two major
difficulties: the space needed to store them and the time required to encode
and decode them. To store and process DNA data effectively, encryption and
compression algorithms have been developed. The aim of this thesis is to
implement a DNA sequence compression algorithm based on repetitive patterns.
This compression technique not only reduces the number of characters in the
DNA sequence but also converts the sequence into binary, reducing the storage
cost of one character from 1 byte to 2 bits. The experiments were conducted
using DNA data from the National Center for Biotechnology Information (NCBI).
Because the compressed output is smaller, the time needed to transmit the DNA
data is also reduced. The results show an average compression ratio of 0.77, a
compression rate of 1.8 bpb, and an average compression time of 3 seconds. The
average space saving was 77%. The disarrangement of the DNA sequence of any
living organism could therefore be determined easily using this type of DNA
sequence compression.
Keywords: Algorithm, Compression, DNA, Repetitive Patterns, Sequence.
Algorithms--Dissertations, Academic
DNA--Dissertations, Academic
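
The abstract's reduction from 1 byte to 2 bits per character can be illustrated with a minimal sketch. It covers only the binary conversion step, not the repetitive-pattern stage, and it is not taken from the thesis: the code table (A=00, C=01, G=10, T=11) and the names pack and unpack are assumptions chosen for illustration.

```python
# Hypothetical 2-bit code table; the thesis may assign the codes differently.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASE = "ACGT"  # index i is the base whose code is i


def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for ch in chunk:
            byte = (byte << 2) | CODE[ch]
        # Left-align a short final chunk so every base sits at a fixed offset.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, n):
    """Recover the first n bases from bytes produced by pack."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE[(byte >> shift) & 0b11])
    # The original length n is needed to drop padding in the last byte.
    return "".join(bases[:n])
```

With this sketch, an 8-base sequence such as ACGTACGT occupies 2 bytes instead of 8. The sequence length has to be stored alongside the packed bytes, because padding in the final byte is otherwise indistinguishable from a run of A.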