Computing DNA copy number variations of cancer cells by using edge detectors
Purpose: Aberrations in DNA copy number have been found to underlie many types
of cancers, autism and other genetic disorders. DNA copy number variations have been
measured by many techniques, the most popular being: 1) Local fluorescence in situ
hybridization (FISH) based techniques 2) Classical comparative genomic hybridization
(CGH) 3) Microarray-based CGH.
The common problem faced by all these techniques is that the signal is usually
corrupted by noise. As a result, true edges are sometimes missed and false edges
are amplified, which degrades the overall result. Hence, in this project I plan to
carry out the de-noising and smoothing of the DNA signals by using edge detectors,
which are commonly used in image processing as a pre-processing step.
Review: An edge can be defined as a sudden/abrupt change in the state of a signal.
A good edge detector is supposed to have the following two qualities:
1) It should be robust against background noise: it should not falsely detect
edges caused by noise, nor should it miss weak actual edges.
2) It should localize the edges accurately.
However, these two qualities are difficult to achieve simultaneously due to the
nature of the smoothing filter; they are related by an uncertainty principle.
An edge is usually modeled as a step function. Edge detection is usually a four-step
process: 1) Smoothing (to remove the noise) 2) Differentiation (to pick up
the edges) 3) Finding the maxima and deciding on confidence intervals 4) Thresholding
(optional).
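As a rough 1-D sketch of these four steps (using numpy; the kernel width, sigma, threshold factor and the simulated signal are illustrative choices of mine, not values from the literature):

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled Gaussian, normalized to unit sum."""
    radius = int(4 * sigma)
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t**2 / (2.0 * sigma**2))
    return g / g.sum()

def detect_edges(signal, sigma=3.0, k=6.0):
    """Four-step detector: smooth, differentiate, locate maxima, threshold."""
    smoothed = np.convolve(signal, gaussian_kernel(sigma), mode="same")  # 1) smoothing
    deriv = np.gradient(smoothed)                                        # 2) derivative
    mag = np.abs(deriv)
    # 3) local maxima of the derivative magnitude
    peaks = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:]))[0] + 1
    # 4) optional threshold: keep only peaks well above the noise floor
    thresh = k * np.median(mag)
    return peaks[mag[peaks] > thresh]

# Noisy piecewise-constant signal with copy-number-like steps at 100 and 250.
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(100), np.ones(150), np.zeros(100)]) + rng.normal(0, 0.15, 350)
print(detect_edges(x))
```

The returned indices should cluster around the two true breakpoints; the median-based threshold is just one simple way to set the cut-off relative to the noise.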
Choice of a derivative: The first- and second-order derivatives are usually used
to ascertain the existence of an edge. The first derivative gives a much smoother
response, with a maximum centered on the location of the edge. The second derivative
gives a more accurate location of the edge, i.e. the edge occurs where the
derivative crosses the zero axis. Hybrid detectors combine both derivatives using a
logic circuit to obtain a smoother signal with a lower probability of error.
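A small numerical illustration of the two behaviours (a sketch with numpy; the step location and smoothing width are arbitrary choices):

```python
import numpy as np

# On a smoothed step edge, the first derivative peaks at the edge, while the
# second derivative crosses zero at (nearly) the same location.
x = np.where(np.arange(100) >= 50, 1.0, 0.0)            # ideal step at index 50
t = np.arange(-8, 9)
g = np.exp(-t**2 / (2 * 2.0**2)); g /= g.sum()          # small Gaussian smoother
s = np.convolve(x, g, mode="same")

d1 = np.gradient(s)                                     # first derivative
d2 = np.gradient(d1)                                    # second derivative
peak = int(np.argmax(d1))                               # maximum of d1
zc = int(np.where((d2[:-1] > 0) & (d2[1:] < 0))[0][0])  # +/- zero crossing of d2
print(peak, zc)
```

Both estimates land on the step; in noisy data the zero crossing tends to localize more sharply, while the first-derivative maximum is easier to threshold.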
Choice of Filter: The differentiation used to detect an edge cannot distinguish
between an edge and noise, so it reduces the overall SNR of the signal. To avoid
this problem, the signal is first regularized by passing it through a filter, i.e.
by convolving the input signal with a filter, f(t), whose Fourier transform, F(w),
satisfies the following conditions, known as Tikhonov's regularizing conditions:
1) F(w) must be bounded 2) F(w) is an even function of frequency 3) F(w) and jwF(w)
must be finite-energy functions 4) F(w,sigma) tends to 0 as w tends to infinity,
for sigma > 0 5) F(w,sigma) = 1 for sigma = 0, where w is the radian frequency.
Band-limited filters satisfy Tikhonov's regularizing conditions but have infinite
support in time and are therefore computationally inefficient. Support-limited
filters are computationally efficient but unsatisfactory in view of Tikhonov's
criteria. The compromise is met by filters with minimal uncertainty, where
uncertainty is defined as a measure of the spread of a filter in time and frequency.
The known functions which satisfy the minimal-uncertainty condition are the
Gaussian, Gabor and Hermite functions.
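The minimal-uncertainty property of the Gaussian can be checked numerically: its time spread times its frequency spread sits at the lower bound of 1/2 (a sketch with numpy; the sampling grid is an arbitrary choice):

```python
import numpy as np

# Time-frequency spread (uncertainty) of a unit-sigma Gaussian.
dt = 0.01
t = np.arange(-20, 20, dt)
g = np.exp(-t**2 / 2.0)                       # Gaussian, sigma = 1

p_t = g**2 / np.sum(g**2)                     # energy density in time
dt_spread = np.sqrt(np.sum(p_t * t**2))       # Delta_t

G = np.fft.fft(g)
w = 2 * np.pi * np.fft.fftfreq(t.size, dt)    # radian frequencies
p_w = np.abs(G)**2 / np.sum(np.abs(G)**2)     # energy density in frequency
dw_spread = np.sqrt(np.sum(p_w * w**2))       # Delta_w

product = dt_spread * dw_spread
print(product)                                # ~0.5, the uncertainty lower bound
```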
The Gaussian is most popularly used because of the following benefits: 1) it is
smooth and infinitely differentiable 2) it decays to zero rapidly (compared to sinc
functions) 3) it is separable in multiple dimensions. The choice of the sigma of
the Gaussian filter greatly affects the quality of the output. A large value of
sigma causes too much smoothing and removes some actual edges, while a small value
of sigma lets more noise-induced edges through.
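The effect of sigma can be seen numerically (a sketch with numpy; the noise level and the two sigma values are arbitrary illustrative choices):

```python
import numpy as np

def smooth(x, sigma):
    """Convolve with a unit-sum sampled Gaussian of the given sigma."""
    r = int(4 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2)); g /= g.sum()
    return np.convolve(x, g, mode="same")

rng = np.random.default_rng(1)
step = np.where(np.arange(400) >= 200, 1.0, 0.0)
noisy = step + rng.normal(0, 0.2, 400)

for sigma in (1.0, 8.0):
    d = np.gradient(smooth(noisy, sigma))
    peak = np.abs(d[195:205]).max()   # edge response strength
    floor = d[:150].std()             # residual noise in a flat region
    print(sigma, round(peak, 3), round(floor, 4))
# Larger sigma suppresses the noise floor but also weakens (smears) the edge
# peak -- a weak edge can disappear entirely if sigma is too large.
```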
Threshold: Once the signal has been smoothed by the filter, and if the detector
uses a first-order derivative, a threshold is applied at this stage to limit
noise-induced edges: if the derivative is greater than the pre-selected threshold,
an edge is declared.
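One way to pick that threshold is from a robust estimate of the derivative's noise scale, e.g. the median absolute deviation; this is my own illustrative choice, not a method from the cited literature (a sketch with numpy; the factor of 5 and the noise model are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
signal = np.concatenate([np.zeros(200), np.ones(200)]) + rng.normal(0, 0.05, 400)
deriv = np.gradient(signal)

# Median absolute deviation estimates the noise scale of the derivative;
# 1.4826 rescales MAD to a Gaussian standard deviation. Assumes edges are
# sparse, so the median is dominated by noise samples.
noise_sigma = 1.4826 * np.median(np.abs(deriv - np.median(deriv)))
threshold = 5 * noise_sigma
edges = np.where(np.abs(deriv) > threshold)[0]
print(edges)
```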
The most efficient edge detectors are usually hybrid structures of the above
mentioned choices of derivatives and filters.
Some other techniques: Some authors have used histogram equalization prior to edge
detection to improve the signal quality. The technique is based on the idea that
the histogram of the signal should be made as flat as possible.
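A minimal 1-D version of this idea is to equalize via the empirical CDF, i.e. map each sample to its normalized rank (a sketch with numpy; the skewed test signal is an arbitrary choice):

```python
import numpy as np

def equalize(x):
    """Histogram equalization of a 1-D signal via its empirical CDF."""
    ranks = np.argsort(np.argsort(x))   # rank 0 .. n-1 for each sample
    return ranks / (x.size - 1)         # rescale to [0, 1]

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 1000)**3           # heavily skewed signal
y = equalize(x)

counts, _ = np.histogram(y, bins=10)
print(counts)                           # ~100 per bin: a flat histogram
```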
Other than the zero-crossing and Laplacian edge detection mentioned earlier, two
other popular edge detection techniques are:
1) Sobel edge detection: It looks for edges in both the horizontal and vertical
directions and then combines this information into a single metric. It uses
gradient operators along the x and y directions to compute the gradient magnitude.
It can be used in 1-D cases as well.
2) Prewitt edge detection: It works in a similar way to the Sobel edge detector
but with a different set of pre-defined weights. It can also work in 1-D.
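A sketch of how these kernels are built and applied (numpy; the toy image is an illustrative assumption):

```python
import numpy as np

# Sobel and Prewitt kernels are outer products of a 1-D smoothing vector and
# the 1-D central-difference kernel [-1, 0, 1]; only the weights differ.
deriv = np.array([-1, 0, 1])
sobel_x = np.outer(np.array([1, 2, 1]), deriv)     # responds to vertical edges
prewitt_x = np.outer(np.array([1, 1, 1]), deriv)

def correlate2(img, k):
    """Valid-mode 2-D cross-correlation with a 3x3 kernel."""
    out = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

# Toy image with a vertical edge between columns 3 and 4.
img = np.zeros((8, 8)); img[:, 4:] = 1.0
gx = correlate2(img, sobel_x)      # x-direction gradient
gy = correlate2(img, sobel_x.T)    # y-direction gradient
mag = np.hypot(gx, gy)             # combined single metric
print(mag[0])
```

The magnitude is nonzero only at the two output columns straddling the edge; substituting prewitt_x gives the same structure with slightly different weights.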
In the case of extremely noisy signals, neural networks (FANN architecture) have
also been used to compute the edges.
Conclusion: I hope to choose one of the above-mentioned techniques to de-noise and
smooth the DNA signals.
