ClassSequenceDataCreatorApplication.hpp
Created: Fri, Oct 4, 08:41
Size: 5 KB
Mime Type: text/x-c
Expires: Sun, Oct 6, 08:41 (2 d)
Engine: blob
Format: Raw Data
Handle: 21352095
Attached To: R8820 scATAC-seq
#ifndef CLASSSEQUENCEDATACREATORAPPLICATION_HPP
#define CLASSSEQUENCEDATACREATORAPPLICATION_HPP
#include <ApplicationInterface.hpp>
#include <cstddef>                      // size_t
#include <string>
#include <SequenceMatrixCreator.hpp>    // SequenceMatrixCreator::methods
/*!
* \brief The ClassSequenceDataCreatorApplication is a wrapper
* around a ClassSequenceDataCreator that creates a sequence
* matrix containing the data assigned to a given class, given
* a partition of these data.
*
* ClassSequenceDataCreator is an autonomous application that extracts the data
* that have been assigned to a given class K.
*
* Given posterior probabilities and a sequence matrix, the corresponding
* class models can be computed. They are the weighted aggregations of the
* DNA sequences assigned to each given class. However, because DNA sequences
* cannot be summed, the aggregations are represented as probability matrices
* or consensus sequences (A+C is represented as 50%A, 50%C, 0%G, 0%T). Instead
* of aggregating, this program creates the unfolded matrix that, if summed over
* the columns, gives the model of class K.
*
* For a hard clustering method, this procedure would simply correspond to the
* creation of a matrix of dimensions N'xL, where N'<=N is the number of
* sequences assigned to class K among the N overall sequences and L is the
* length of each sequence.
*
* In the case of a soft clustering method, this procedure creates a 3D matrix of
* dimensions NxL'x4. This matrix contains N probability matrices, each one of
* dimensions L'x4, where L'=L-S+1, 4 corresponds to A, C, G, T and S is the
* shifting freedom allowed during the classification. The resulting matrix
* contains as many rows as the starting matrix because, in soft clustering, all
* sequences (rows) are assigned to all classes.
*
* To construct a final matrix M3 of dimensions NxL3, where L3 covers a given
* range <from>/<to>, the original matrix M1 of dimensions NxL1 is computed and
* extended into a matrix M2 of dimensions NxL2 with L2>=L1. The final matrix M3
* of dimensions NxL3 is then computed, for class K, using the given posterior
* probabilities. A row of the final matrix M3 is the weighted average of the S
* possible slices of the corresponding row in M2, represented as a probability
* matrix. The weights used are the probabilities with which this row was assigned
* to class K, for each of the S shift states, in each flip state.
*
* The original matrix M1 that was partitioned with shifting freedom S is
* generated using the BED and fasta files that were originally used to
* create it.
* The posterior probabilities should be a 4D matrix in binary format, with
* dimensions:
* 1) number of sequences
* 2) number of classes
* 3) number of shift states
* 4) number of flip states
* The result is returned as a 3D binary matrix of dimensions:
* 1) number of sequences
* 2) length of the sequences, as defined by the <from>/<to> range
* 3) 4, for A, C, G, T
*/
class ClassSequenceDataCreatorApplication : public ApplicationInterface
{
    public:
        ClassSequenceDataCreatorApplication() = delete;
        ClassSequenceDataCreatorApplication(const ClassSequenceDataCreatorApplication& app) = delete;
        /*!
         * \brief Constructs an object from the command line
         * options.
         * \param argn the number of options passed to the
         * main() function.
         * \param argv the vector of options passed to the
         * main() function.
         */
        ClassSequenceDataCreatorApplication(int argn, char** argv);
        /*!
         * \brief TODO
         * \return an exit code EXIT_SUCCESS or EXIT_FAILURE
         * to return to the OS.
         */
        virtual int run() override;
    private:
        /*!
         * \brief Parses the program command line options and
         * sets the object fields accordingly.
         * If the help option is detected, the "runnable"
         * field is set to false and subsequent calls to
         * run() will produce nothing.
         * \param argn the number of options passed to the
         * main() function.
         * \param argv the vector of options passed to the
         * main() function.
         * \throw std::invalid_argument if an error is found
         * in the program options.
         */
        void parseOptions(int argn, char** argv);
        /*!
         * \brief the path to the bed file.
         */
        std::string file_bed;
        /*!
         * \brief the path to the fasta file.
         */
        std::string file_fasta;
        /*!
         * \brief the path to the posterior probability
         * file (the partition).
         */
        std::string file_prob;
        /*!
         * \brief the path to the file in which the
         * results will be written.
         */
        std::string file_out;
        /*!
         * \brief the coordinate of the most upstream
         * position that was in the original matrix, in
         * relative coordinates.
         */
        int from;
        /*!
         * \brief the coordinate of the most downstream
         * position that was in the original matrix, in
         * relative coordinates.
         */
        int to;
        /*!
         * \brief the class of interest (0-based).
         */
        size_t class_k;
        /*!
         * \brief a flag indicating whether the core of run() can be
         * run or not.
         */
        bool runnable;
};
#endif // CLASSSEQUENCEDATACREATORAPPLICATION_HPP
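The soft clustering case described in the class documentation can be illustrated with a minimal sketch. It is not taken from the repository: `ProbMatrix`, `base_index()` and `class_row()` are hypothetical names, and the sketch assumes that flip state 0 reads a slice as-is while flip state 1 reads its reverse complement. For one sequence of length L and the posterior probabilities of class K over the S shift states and 2 flip states, it accumulates the weighted contribution of every slice into an L'x4 matrix, with L'=L-S+1. The weights are left unnormalized here; dividing by their sum would be needed to obtain a strict average.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One output row: L' positions, each holding weights for A, C, G, T.
using ProbMatrix = std::vector<std::array<double, 4>>;

// Maps a base to its column (A=0, C=1, G=2, T=3), or -1 for anything else.
// With this ordering, the complement of column b is column 3-b.
int base_index(char base)
{
    switch (base)
    {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return -1;
    }
}

// Hypothetical helper, not part of the original code base.
// seq  : one sequence (row) of length L
// prob : prob[s][f], probability that this row belongs to class K
//        with shift state s and flip state f (0 forward, 1 reverse)
// Returns the L'x4 matrix of weighted base contributions, L' = L-S+1.
ProbMatrix class_row(const std::string& seq,
                     const std::vector<std::array<double, 2>>& prob)
{
    const std::size_t n_shift = prob.size();               // S
    if (n_shift == 0 || seq.size() < n_shift)
    {
        throw std::invalid_argument("need 1 <= S <= L");
    }
    const std::size_t l_slice = seq.size() - n_shift + 1;  // L'
    ProbMatrix row(l_slice, std::array<double, 4>{});      // all zeros

    for (std::size_t s = 0; s < n_shift; ++s)
    {
        for (std::size_t j = 0; j < l_slice; ++j)
        {
            assert(s + j < seq.size());
            const int b = base_index(seq[s + j]);
            if (b < 0) { continue; }   // e.g. 'N' contributes nothing
            const std::size_t c = static_cast<std::size_t>(b);
            // forward: base c observed at position j of the slice
            row[j][c] += prob[s][0];
            // reverse (assumed reverse complement): mirrored position,
            // complemented base
            row[l_slice - 1 - j][3 - c] += prob[s][1];
        }
    }
    return row;
}
```

Calling `class_row()` once per sequence and stacking the results would give the NxL'x4 matrix; summing these per-sequence matrices over all sequences then yields the aggregated model of class K.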