Analytical Process Manager
Overview of Analytical Processes
Analytical processes are special SAS macro programs that perform data manipulations and statistical calculations on the experimental data that you have loaded into SAS Scientific Discovery Solutions. These processes use SAS to generate analysis data sets, listings, statistical results, and graphs. Analytical processes are reusable and flexible. They can range in functionality from very simple data displays to complex statistical modeling.
Scientists and statisticians can run analytical processes against any experiment data set by providing appropriate values for input parameters. After you select an analytical process to run, a parameters window requests the information required for successful execution. The parameter values are specific to the data set at hand, but you do not need to edit the code after you have written and loaded the analytical process into SAS Scientific Discovery Solutions.
The following sections describe the analytical processes provided with SAS Scientific Discovery Solutions for use with your experiment data:
Input Engines for importing data
Input engines are tools that pull data from instrumentation systems and load it into the data warehouses for SAS Scientific Discovery Solutions. Input engines are specific to input data structures: for each analytical instrumentation system, a separate input engine knows how to pull data out of that system's data structure. An input engine can support instrumentation from more than one vendor when the vendors use standardized data structures.
Refer to Overview of Input Engines and Input Engine Recommendations for more information.
Analytical Processes for Microarray-Specific Data Formats
- Array Pseudo Image generates an interactive pseudo image of an array by plotting a color-coded intensity variable using X- and Y-coordinates.
- Hierarchical Clustering creates a tree of relationships of the observations (rows) of a SAS data set by using one of several methods.
- K-Means Clustering creates optimally separated groups of observations (rows) of a SAS data set using one of several methods.
- Loess Model Normalization normalizes data by using a loess smoothing model.
- Quantile Normalization normalizes data with the mean of the quantile.
- Ratio Analysis converts two-channel expression data between log intensities and log ratios.
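The idea behind Quantile Normalization can be sketched in a few lines of Python. This is a minimal illustration of the standard technique, assuming each array is an equal-length list of intensities; the function name `quantile_normalize` and the sample values are hypothetical, and the sketch is not the code that the SAS process runs.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list per array).

    Illustrative sketch only; ties are broken by position, which a
    production implementation would usually handle more carefully.
    """
    n = len(columns[0])
    # Sort each column, then average the values that share a rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n)]
    normalized = []
    for col in columns:
        # order[r] is the index of the r-th smallest value in this column.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for two arrays measured on three features.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization every array contains the same set of values (the rank means), so the arrays share one empirical distribution while each feature keeps its within-array rank.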
Analytical Processes for Proteomics-Specific Data Formats
- Spectral Bin reduces the total number of index values by binning them and averaging the spectra within bins.
- Spectral Detrend removes a baseline trend from each member of a group of spectra in order to make them statistically comparable.
- Spectral Peak Find finds and quantifies peaks in a group of spectra.
- Spectral Plot creates plots of spectra.
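As a rough illustration of the binning described for Spectral Bin, the following Python sketch groups index values into fixed-width bins and averages the intensities in each bin. The fixed bin width, the use of bin midpoints as the new index values, and the name `bin_spectrum` are assumptions made for this example; the actual process may bin differently.

```python
def bin_spectrum(index_values, intensities, bin_width):
    """Average intensities within fixed-width bins of the index variable."""
    bins = {}
    for x, y in zip(index_values, intensities):
        key = int(x // bin_width)  # bin number for this index value
        bins.setdefault(key, []).append(y)
    # Represent each bin by its midpoint and its mean intensity.
    return [((k + 0.5) * bin_width, sum(v) / len(v))
            for k, v in sorted(bins.items())]


# Hypothetical spectrum: six index values reduced to three bins.
mz = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
intensity = [1.0, 3.0, 2.0, 4.0, 10.0, 20.0]
binned = bin_spectrum(mz, intensity, bin_width=2.0)
```

Here six index values collapse to three, which is the kind of reduction the process description refers to.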
Analytical Processes for Microarray and Proteomics Data Formats
- Array Group Correlation computes correlations and scatterplot matrices for expression measurements across groups of arrays.
- Create Gene Annotation Process takes a user-provided list of gene identifiers as query keys, searches various biomedical databases, and creates a gene annotation file in HTML format and in SAS data set format.
- Discriminant Analysis is a classical statistical method for predicting a classification variable from a set of continuous responses.
- Distance Matrix computes various measures of distance or dissimilarity between the observations (rows) of a SAS data set.
- Experimental Design 1-Way randomly assigns levels of a single treatment to arrays. You specify the number of treatment levels, the number of arrays, and the number of dyes, and the process tells you how to optimally hybridize the arrays.
- Mixed Model Analysis fits a mixed linear model on a gene-by-gene basis to pre-normalized data and creates numerous output displays.
- Mixed Model Normalization normalizes microarray data by fitting a mixed linear model across all of the arrays in an experiment, and is a natural precursor to a gene-by-gene mixed model analysis of variance.
- Mixed Model Power computes the statistical power of a set of hypothesis tests that arise from a mixed linear model. You specify the experimental design, PROC MIXED statements, variance components, and ranges of values for alpha and effect sizes, and the process outputs a table of power values using a noncentral t-distribution.
- Multidimensional Scaling is a method that estimates the coordinates of a set of objects in a space of specified dimensionality from data that measure the distances between pairs of objects.
- Partial Least Squares is a method for simultaneously modeling variability in both dependent variables and predictor variables.
- Partition Trees recursively partitions data according to optimal splitting relationships that are created between dependent and predictor variables. It creates simple tree-based rules for predicting the dependent variable.
- Principal Components is a multivariate technique for examining relationships among several quantitative variables.
- Surface Summary works with three-dimensional data, with variables denoted X, Y, and Z. It summarizes the Z variable over gridded blocks of the X and Y variables and then creates a surface plot over the grid.
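The gridding step described for Surface Summary can be sketched as follows. This Python example assumes the summary statistic is the mean and that blocks are equal-sized rectangles; both are choices made for illustration, as is the name `grid_mean`. The plotting step is omitted.

```python
def grid_mean(points, x_step, y_step):
    """Mean of Z over rectangular (X, Y) blocks of the given size."""
    cells = {}
    for x, y, z in points:
        # Integer block coordinates identify the grid cell for this point.
        key = (int(x // x_step), int(y // y_step))
        cells.setdefault(key, []).append(z)
    return {key: sum(zs) / len(zs) for key, zs in cells.items()}


# Hypothetical (X, Y, Z) observations.
points = [(0.2, 0.1, 1.0), (0.8, 0.4, 3.0), (1.5, 0.3, 5.0), (1.1, 1.9, 7.0)]
summary = grid_mean(points, x_step=1.0, y_step=1.0)
```

Each key in `summary` is a grid cell, and each value is the height that a surface plot would draw over that cell.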
Analytical Processes for Genetic Marker Data Formats
- Case-Control Association tests for association between markers and a binary trait by using case-control data.
- Haplotype Estimation estimates haplotype frequencies via the EM algorithm by using PROC HAPLOTYPE.
- Haplotype Trend Regression uses an output data set from PROC HAPLOTYPE that contains individuals' probabilities for haplotype pairs to test for association with a quantitative trait by using PROC REG, or with a binary trait by using PROC LOGISTIC.
- Linkage Disequilibrium calculates various LD measures and an overall test for LD between pairs of markers by using PROC ALLELE.
- Marker Properties calculates various single-marker measures, such as marker informativeness, tests for HWE, and allele and genotype frequencies, by using PROC ALLELE.
- Phenotype Summary computes frequencies and creates histograms for categorical phenotypic variables.
- Quantitative TDT performs various versions of the quantitative TDT.
- Quantitative Trait Association tests for association between a quantitative trait and marker genotypes or alleles.
- TDT family tests for association between markers and a binary trait by using pedigree data.
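To make the single-marker measures listed under Marker Properties more concrete, here is a small Python sketch that computes allele frequencies, genotype frequencies, and a Pearson chi-square statistic for Hardy-Weinberg equilibrium (HWE) at one biallelic marker. It is a textbook calculation under stated assumptions (two observed alleles, genotypes coded as two-character strings) and does not reproduce what PROC ALLELE computes internally; the name `marker_summary` and the sample data are hypothetical.

```python
from collections import Counter


def marker_summary(genotypes):
    """Allele frequencies, genotype frequencies, and an HWE chi-square
    statistic for one biallelic marker.

    Genotypes are 2-character strings such as "AB". The sketch assumes
    exactly two alleles are observed; a monomorphic marker would fail.
    """
    n = len(genotypes)
    # Treat "AB" and "BA" as the same genotype.
    geno_counts = Counter("".join(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    allele_freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    a, b = sorted(allele_freq)
    p, q = allele_freq[a], allele_freq[b]
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = {a + a: n * p * p, a + b: 2 * n * p * q, b + b: n * q * q}
    chi_square = sum((geno_counts.get(g, 0) - e) ** 2 / e
                     for g, e in expected.items())
    geno_freq = {g: c / n for g, c in geno_counts.items()}
    return allele_freq, geno_freq, chi_square


# Hypothetical sample of ten individuals.
sample = ["AA"] * 4 + ["AB"] * 4 + ["BB"] * 2
allele_freq, geno_freq, chi_square = marker_summary(sample)
```

For a biallelic marker the statistic is compared against a chi-square distribution with one degree of freedom; small values indicate genotype counts close to those expected under HWE.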
Analytical Processes for Generic Data
- Data Contents displays the contents of a SAS data set. It calls PROC CONTENTS and outputs the results in HTML format.
- Data Correlation computes correlations between numeric variables. The default type is standard Pearson product-moment correlations.
- Data Export exports a SAS data set in a variety of text-based formats and opens the data set by using your default viewer.
- Data Filter statistically filters out features with flat profiles across a set of experimental conditions.
- Data Merge merges two SAS data sets, and can optionally be used to create a subset of an existing data set.
- Data Rank ranks a SAS data set by user-provided key variables.
- Data Sort sorts a SAS data set by user-provided key variables.
- Data Standardize standardizes values of numeric variables in a SAS data set by using a variety of different methods.
- Data Step modifies a SAS data set by executing user-specified SAS DATA step commands on it.
- Data Summary computes summary statistics of a SAS data set.
- Data Transpose transposes a SAS data set in stacked format to one or both of two types, tall and wide.
- Data Transpose Rectangular transposes a block of variables in a SAS data set, creating a new data set in which the observations (rows) become variables (columns).
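As an example of the kind of operation that Data Standardize performs, the Python sketch below centers a numeric variable to mean 0 and scales it to a sample standard deviation of 1. This is only one common method, chosen here as an assumption; the process offers several methods, and the name `standardize` is hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical values of one numeric variable.
z = standardize([2.0, 4.0, 6.0])
```

The standardized values keep their order and relative spacing, which makes variables measured on different scales directly comparable.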
See also:
Overview of the Analytical Process Selector
Overview of Analytical Process Results
SAS Scientific Discovery Solutions Analytical Processes