/*-------------------------------------------------------------------* * From "SAS System for Statistical Graphics, First Edition" * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * *-------------------------------------------------------------------* * This material is provided "as is" by SAS Institute Inc. There * * are no warranties, express or implied, as to merchantability or * * fitness for a particular purpose regarding the materials or code * * contained herein. The Institute is not responsible for errors * * in this material as it now exists or will exist, nor does the * * Institute provide technical support for it. Questions or * * problem reports concerning this material may be addressed to the * * author, Michael Friendly, by electronic mail: * * * * Internet: * * * *-------------------------------------------------------------------*/ /* Michael Friendly York University */ /* SAS Macro Programs for Statistical Graphics October 30, 1991 */ /* ----------------------------------------------------------------- */ This document describes the SAS macro programs included in Appendix A1 in SAS System for Statistical Graphics, First Edition. Section references contained here indicate the sections in the book where the macro programs are illustrated. In some cases there are minor differences between the programs on this diskette and those in the appendix. These changes were made to account for minor syntax differences between Version 5 and Version 6 of the SAS System. The changes should not affect the appearance of graphs produced by the programs. In cases where syntax changes between versions of the SAS System made it impossible for a single version of a macro program to produce the same results under all releases, two versions of the program have been provided on this diskette; one to be used with Version 5 of the SAS System, and one to be used with Version 6. Both versions of each macro have the same macro names, the same parameter lists, and should produce identical results. Program Requirements The programs were developed under the VM/SP CMS mainframe version of the SAS System, Version 5.18. Where there is a single version of the program, the program should run under all releases (5.18 and later) of the SAS System on all operating systems. Where there are two versions of a macro provided, one version should run under Release 5.18 on all operating systems, and the other should run under all Version 6 releases on all systems. The programs all require the base SAS and SAS/GRAPH products; many require SAS/STAT and SAS/IML or both as well. General Usage Notes You may receive unanticipated results if you use multiple macros in a single SAS session, because global statement parameters set in one macro may be carried over to subsequent macros used in the same session. If you are using Version 6 of the SAS system, we recommend adding the statment GOPTIONS RESET=ALL; to the beginning of each macro. Under Release 5.18, you can modify SYMBOL and PATTERN statements to explicitly specify null values for parameters (V=NONE, for example). The macros were originally written to produce hardcopy output on a white background, and some use black as a foreground color. If you are using the macros to generate graphics on a device with a black background, you may need to change all references to BLACK in the macro to another color. Macro Programs BIPLOT Implements the biplot technique (e.g., Gabriel, 1971) for plotting multivariate observations and variables together in a single display. BOXANNO Provides univariate marginal boxplot annotations for two-dimensional and three-dimensional scatterplots. BOXPLOT Produces standard and notched boxplots for a single response variable with one or more grouping variables. CONTOUR The CONTOUR macro plots a bivariate scatterplot with a bivariate data ellipse for one or more groups with one or more confidence coefficients. CORRESP Performs correspondence analysis (also known as "dual scaling") on a table of frequencies in a two-way (or higher-way) classification. In Version 6 of the SAS System, this analysis is also performed by PROC CORRESP. This version of the macro should only be used with Version 5 of the SAS System. CORRESP2 A version of the CORRESP macro that should be used with Version 6 of the SAS System. DENSITY Calculates a nonparametric density estimate for histogram smoothing of a univariate data distribution The program uses the Gaussian kernel and calculates an optimal window half-width (Silverman, 1986) if not specified by the user. DOTPLOT Produces grouped and ungrouped dot charts of a single variable. (Cleveland, 1984, 1985). LOWESS Performs robust, locally weighted scatterplot smoothing (Cleveland, 1979). LOWESS2 A version of the LOWESS macro that should be used with Version 6 of the SAS System. NQPLOT Produces theoretical normal quantile-quantile (Q-Q) plots for single variable. Options provide a classical (mu, sigma) or robust (median, IQR) comparison line, standard error envelope, and a detrended plot. OUTLIER Multivariate outlier detection. The OUTLIER macro calculates robust Mahalanobis distances by iterative multivariate trimming (Gnanadesikan & Kettenring, 1972; Gnanadesikan, 1977), and produces a chisquare Q-Q plot. PARTIAL Partial regression residual plots. Observations with high leverage and/or large studentized residuals can be individually labeled. This version of the macro should only be used with Version 5 of the SAS System. PARTIAL2 A version of the PARTIAL macro that should be used with Version 6 of the SAS System. SCATMAT Draws a scatterplot matrix for all pairs of variables. A classification variable may be used to assign the plotting symbol and/or color of each point. STARS Draws a star plot of the multivariate observations in a data set. Each observation is depicted by a star- shaped figure with one ray for each variable, whose length is proportional to the size of that variable. SYMPLOT Produces a variety of diagnostic plots for assessing symmetry of a data distribution and finding a power transformation to make the data more symmetric. TWOWAY Analysis of two-way tables. The TWOWAY macro carries out analysis of two-way experimental design data with one observation per cell, including Tukey's (1949) 1 degree of freedom test for non-additivity. Two plots may be produced: a graphical display of the fit and residuals for the additive model, and a diagnostic plot for a power transformation for removable non- additivity. This version of the macro should only be used with Version 5 of the SAS System. TWOWAY2 A version of the TWOWAY macro that should be used with Version 6 of the SAS System. SAS Macro Programs All of the macro programs use keywords for the required and optional parameters. Default values (if any) are given after the "=" sign in the parameter list. Thus, it is only necessary to specify parameters which differ from the default value, and these parameters may be specified in any order in the macro call. The following conventions (which generally follow SAS usage in PROC steps) are used for naming parameters and default values: Parameter Description DATA= The name of the input data set to be analyzed or plotted. The default is usually DATA=_LAST_, which means that the most recently created data set is the default if no data set is specified. VAR= The name(s) of the input variable(s) in the DATA= data set. VAR=_NUMERIC_ means that all numeric variables in the data set are analyzed if no variables list is specified. Some of the macros understand a variable list specified as a range of variables, such as VAR=X1-X5, or VAR=EDUC--INCOME, as in the VAR statement. Others, especially those using PROC IML, require the variables to be listed individually, for example VAR=X1 X2 X3 X4 X5. ID= The name of an input variable used to label observations. There is usually no default ID variable. CLASS= GROUP= Specifies the name of an input variable used to classify observations into groups. OUT= The name of the output data set created by the macro. OUT=_DATA_ means that the output data set is named automatically according to the DATAn convention: the first such data set created is called DATA1, the second is called DATA2, and so on. Typically this contains the data which is plotted. In some cases the macro leaves it to the user to plot the OUT= data set, so that axis labels, values, and ranges can be controlled. ANNO= The name of an input or output data set used for annotating the plot. NAME= The name assigned to the graph(s) in the graphic catalog. The default is usually the name of the macro. GOUT= Specifies the name of the graphics catalog used to save the output for later replay. The default is WORK.GSEG, which is erased at the end of your session. To save graphs in a permanent catalog, use a two-part name. In the listings below, each of the macro parameters is briefly described in the comment at the right of the program line in the %MACRO statement. Where further description is necessary, it is given in the section labeled Parameters. BIPLOT macro The BIPLOT macro uses PROC IML to carry out the calculations for the biplot display described in Section 8.7. The program produces a printer plot of the observations and variables by default, but does not produce a PROC GPLOT graph, since a proper graph should equate the axes. Instead, the coordinates to be plotted and the labels for observations are returned in two data sets, specified by the parameters OUT= and ANNO=, respectively. A typical plotting step, using the defaults OUT=BIPLOT and ANNO=BIANNO would be: proc gplot data=BIPLOT; plot dim2 * dim1 / anno=BIANNO frame href=0 vref=0 vaxis=axis2 haxis=axis1 vminor=1 hminor=1; axis1 length=5 in offset=(2) label=(h=1.5 'Dimension 1'); axis2 length=5 in offset=(2) label=(h=1.5 a=90 r=0 'Dimension 2'); symbol v=none; The axes in the plot should be equated. Parameters DATA=_LAST_ Name of the input data set for the biplot. VAR =_NUMERIC_ Variables for biplot. The list of variables must be given explicitly; the range notation X1-Xn cannot be used. ID =ID Name of a character variable used to label the rows (observations) in the biplot display. DIM =2 Number of biplot dimensions. FACTYPE=SYM Biplot factor type: GH, SYM, or JK SCALE=1 Scale factor for variable vectors. The coordinates for the variables are multiplied by this value. OUT =BIPLOT Output data set containing biplot coordinates. ANNO=BIANNO Output data set containing Annotate labels. STD=MEAN Specifies how to standardize the data matrix before the singular value decomposition is computed. If STD=NONE, only the grand mean is subtracted from each value in the data matrix. This option is typically used when row and column means are to be represented in the plot, as in the diagnosis of two-way tables (Section 7.6.3). If STD=MEAN, the mean of each column is subtracted. This is the default, and assumes that the variables are measured on commensurable scales. If STD=STD, the column means are subtracted and each column is standardized to unit variance. PPLOT=YES Produce printer plot? If PPLOT=YES, the first two dimensions are plotted. The OUT= data set The results from the analysis are saved in the OUT= data set. This data set contains two character variables (_TYPE_ and _NAME_) which identify the observations and numeric variables (DIM1, DIM2, ...) which give the coordinates of each point. The value of the _TYPE_ variable is 'OBS' for the observations that contain the coordinates for the rows of the data set, and is 'VAR' for the observations that contain the coordinates for the columns. The _NAME_ variable contains the value of ID= variable for the row observations and the variable name for the column observations in the output data set. Missing data The program makes no provision for missing values on any of the variables to be analyzed. BOXANNO macro BOXANNO SAS contains two SAS macros to annotate a scatterplot with marginal boxplots of one or more of the variables plotted with either PROC GPLOT or PROC G3D. BOXAXIS Creates an Annotate data set to draw a boxplot for one axis in a 2D or 3D scatterplot. BOXANNO Uses two calls to BOXAXIS to create an Annotate data set for boxplots on both axes. Use BOXANNO to draw the boxplots for both variables in a scatterplot. For a G3D scatterplot, use one call to BOXANNO for two of the variables and BOXAXIS for the third. See the examples in Section 4.5. Parameters for BOXAXIS DATA=_LAST_ Name of the input data set OUT=_DATA_ Name of the output Annotate data set VAR= Variable for which a boxplot is constructed BAXIS=X Axis on which it goes. Must be X, Y, or Z. OAXIS=Y The other axis in the plot PAXIS=Z The 3rd axis (ignored in GPLOT) BOXWIDTH=4 Width of the box in percent of the data range POS=98 Position of the center of the box on OAXIS in data percent. POS - BOXWIDTH/2 and POS + BOXWIDTH/2 must both be between 0 and 100. Parameters for BOXANNO DATA=_LAST_ Data set to be plotted XVAR= Horizontal variable YVAR= Vertical variable OUT=BOXANNO Output Annotate data set BOXPLOT macro The BOXPLOT macro draws side-by-side boxplots for the groups defined by one or more grouping (CLASS) variables in a data set. Parameters DATA=_LAST_ Name of the input data set. CLASS= Grouping variable(s). The CLASS= variables can be character or numeric. VAR= The name of the variable to be plotted on the ordinate. ID= A character variable to identify each observation. If an ID= variable is specified, outside variables are labelled on the graph, using the first 8 characters of the value of the ID variable (to reduce overplotting). Otherwise, outside points are not labelled. WIDTH=.5 Box width as proportion of the maximum. The default, WIDTH=.5, means that the maximum box width is half the spacing between boxes. NOTCH=0 Specifies whether or not to draw notched boxes. 1=draw notched boxes; 0=do not. CONNECT=0 Specifies the line style used to connect medians of adjacent groups. If CONNECT=0, the medians of adjacent groups are not to be connected. F=0.5 For a notched boxplot, the parameter F determines the notch depth, from the center of the box as a fraction of the halfwidth of each box. F must be between 0 and 1; the larger the value, the less deep is the notch. FN=1 Box width proportionality factor. The default, FN=1 means all boxes are the same width. If you specify FN=sqrt(n), the boxes width will be proportional to the square root of the sample size of each group. Other functions of n are possible as well. VARFMT= The name of a format for the ordinate variable. CLASSFMT= The name of a format for the class variable(s). If the CLASS variable is a character variable, or there are two or more CLASS variables, the program maps the sorted values of the class variable(s) into the integers 1, 2, ... levels, where levels is the number of distinct values of the class variable(s). A format provided for CLASSFMT should therefore provide labels corresponding to the numbers 1, 2, ... levels. VARLAB= Label for the ordinate variable. If not specifed, the ordinate is labelled with the variable name. CLASSLAB= Label for the class variable(s) used to label the horizontal axis. YORDER= Tick marks, and range for ordinate, in the form YORDER = low TO high BY tick. ANNO= The name of an (optional) additional ANNOTATE data set to be used in drawing the plot. OUT=BOXSTAT Name of the output data set containing statistics used in drawing the boxplot. There is one observation for each group. The variables are N, MEAN, MEDIAN, Q1, Q3, IQR, LO_NOTCH, HI_NOTCH, LO_WHISK, HI_WHISK. NAME=BOXPLOT The name assigned to the graph in the graphic catalog. GOPTIONS required If there are many groups and/or the formatted labels of group names are long, you may need to increase the HPOS= option to allow a sufficient number of character positions for the labels. CONTOUR macro The CONTOUR macro plots a bivariate scatterplot with a bivariate data ellipse for one or more groups. Parameters DATA=_LAST_ Name of the input data set X= Name of the X variable Y= Name of the Y variable GROUP= Group variable. If a GROUP= variable is specified, one ellipse is produced for each value of this variable in the data set. If no GROUP= variable is specified, a single ellipse is drawn for the entire sample. The GROUP= variable may be character or numeric. PVALUE=.5 Confidence coefficient(s) ( 1 - alpha ). This is the proportion of data from a bivariate normal distribution contained within the ellipse. Several values may be specified in a list (e.g., PVALUE=.5 .9), in which case one ellipse is generated for each value. STD=STDERR Error bar metric. STD=STDERR gives error bars equal to each mean +- one standard error (s / sqrt n) for both variables. STD=STD gives error bars whose length is one standard deviation for both variables. POINTS=40 The number of points on each contour. ALL=NO Specifies whether the contour for the total sample should be drawn in addition to those for each group. If there is no GROUP= variable, ALL=YES just draws the ellipse twice. OUT=CONTOUR Name of the output Annotate data set used to draw the ellipses, error bars and group labels. PLOT=YES If YES, the macro plots the data together with the generated ellipses. Otherwise, only the output Annotate data set is generated. I=NONE SYMBOL statement interpolate option for drawing points. Use I=RL to include the regression line as well. NAME=CONTOUR The name assigned to the graph(s) in the graphic catalog. COLORS=RED GREEN BLUE BLACK PURPLE YELLOW BROWN ORANGE List of colors to use for each of the groups. If there are g groups, specify g colors if ALL=NO, and g + 1 colors if ALL=YES. COLORS(i) is used for group i. SYMBOLS=+ SQUARE STAR - PLUS : $ = List of symbols, separated by spaces, to use for plotting points in each of the groups. SYMBOLS(i) is used for group i. Usage Notes When using the CONTOUR macro with the SAS System for Personal Computers, it may be neccessary to add the option WORKSIZE=100 to the PROC IML statement. When displaying output from the macro on a terminal, you should change occurences of the color BLACK to WHITE. CORRESP macro CORRESP performs correspondence analysis on a table of frequencies in a two-way (or higher-way) classification. The VAR= variables list specify one of the classification variables. The observations in the input data set form the other classification variable(s). The coordinates of the row and column points are output to the data set specified by the OUT= parameter. The labels for the points are output to the data set specified by the ANNO= parameter. See Section 10.3.2 for details about plotting the results and equating axes. Parameters DATA=_LAST_ Name of the input data set VAR= Column variables ID= ID variable: row labels OUT=COORD Output data set for coordinates ANNO=LABEL Name of the Annotate data set for row and column labels ROWHT=1 Height (in character cells) for the row labels COLHT=1 Height (in character cells) for the column labels The OUT= data set The results from the analysis are saved in the OUT= data set. This data set contains two character variables (_TYPE_ and _NAME_) which identify the observations and two numeric variables (DIM1 and DIM2) which give the locations of each point in two dimensions. The value of the _TYPE_ variable is 'OBS' for the observations that contain the coordinates for the rows of the table, and is 'VAR' for the observations that contain the coordinates for the columns. The _NAME_ variable contains the value of ID= variable for the row observations and the variable name for the column observations in the output data set. DENSITY macro The DENSITY macro calculates a nonparametric density estimate of a data distribution as described in Section 3.4. The macro produces the output data set specified by the OUT= parameter, but leaves it to the user to call PROC GPLOT, so that the plot can be properly labeled. The output data set contains the variables DENSITY and WINDOW in addition to the variable specified by the VAR= parameter. A typical plotting step, using the defaults, OUT=DENSPLOT and VAR=X, would be: proc gplot data=densplot; plot density * X ; symbol1 i=join v=none; Parameters DATA=_LAST_ Name of the input data set OUT=DENSPLOT Name of the output data set VAR=X Name if the input variable (numeric) WINDOW= Bandwidth (H) for kernel density estimate XFIRST=. Smallest X value at which density estimate is computed. If XFIRST = ., the minimum value of the VAR= variable is used. XLAST=. Largest X value at which density estimate is computed. If XLAST = ., the maximum value of the VAR= variable is used. XINC=. Step-size (increment) for computing density estimates. If XINC = ., the increment is calculated as values, XINC = (XLAST-XFIRST)/60. DOTPLOT macro DOTPLOT produces grouped and ungrouped dot charts, as described in Section 2.5. Parameters DATA=_LAST_ Name of the input data set XVAR= Horizontal (response) variable XORDER= Plotting range for response. Specify XORDER in the form XORDER = low TO high BY step. XREF= Specifies the horizontal values at which reference lines are drawn for the response variable. If not specified, no reference lines are drawn. YVAR= Vertical variable (observation label) for the dot chart. This should specify a character variable. At most 16 characters of the value are used for the label. YSORTBY=&XVAR How to sort the observations. The default, YSORTBY=&XVAR, indicates that observations are sorted in ascending order of the response variable. YLABEL= Label for y variable. If not specified, the vertical axis is labelled with the name of the YVAR= variable. GROUP= Vertical grouping variable. If specified, a grouped dot chart is produced with a separate panel for each value of the GROUP= variable. GPFMT= format for printing group variable value (include the "." at the end of the format name). CONNECT=DOT Specifies how to draw horizontal lines for each observation. Valid values are ZERO, DOT, AXIS, or NONE. The default, CONNECT=DOT, draws a dotted line from the Y axis to the point. CONNECT=ZERO draws a line from an X value of 0 to the point. CONNECT=AXIS draws a line from the Y axis to the plot frame at the maximum X value. CONNECT=NONE does not draw a line for the observation. DLINE=2 Line style for horizontal lines for each observation. DCOLOR=BLACK Color of horizontal lines ERRBAR= Name of an input variable giving length of error bar for each observation. If not specified, no error bars are drawn. NAME=DOTPLOT The name assigned to the graph in the graphic catalog. GOPTIONS required DOTPLOT plots each observation in a row of the graphics output area. Therefore the VPOS= graphics option should specify a sufficient number of vertical character cells. The value for VPOS= should be VPOS ge number of observations + number of groups + 8 LOWESS macro The LOWESS macro performs robust, locally weighted scatterplot smoothing as described in Section 4.4.2. The data and the smoothed curve are plotted if PLOT=YES is specified. The smoothed response variable is returned in the output data set named by the OUT= parameter. Parameters DATA=_LAST_ Name of the input data set. X = X Name of the independent (X) variable. Y = Y Name of the dependent (Y) variable to be smoothed. ID= Name of an optional character variable to identify observations. OUT=SMOOTH Name of the output data set. The output data set contains the X=, Y=, and ID= variables plus the variables _YHAT_, _RESID_, and _WEIGHT_. _YHAT_ is the smoothed value of the Y= variable, _RESID_ is the residual, and _WEIGHT_ is the combined weight for that observation in the final iteration. F = .50 Lowess window width, the fraction of the observtions used in each locally-weighted regression. ITER=2 Total number of iterations. PLOT=NO Draw the plot? If you specify PLOT=YES, a high- resolution plot is drawn by the macro. NAME=LOWESS The name assigned to the graph in the graphic catalog. When using the LOWESS macro with the SAS System for Personal Computers, it may be neccessary to add the option WORKSIZE=100 to the PROC IML statement. NQPLOT macro NQPLOT produces normal Q-Q plots for single variable. The parameters MU= and SIGMA= determine how the comparison line, representing a perfect fit to a normal distribution, is estimated. Parameters DATA=_LAST_ Name of the input data set VAR=X Name of the variable to be plotted OUT=NQPLOT Name of the output data set MU=MEDIAN Estimate of the of mean of the reference normal distribution: Specify MU=MEAN, MU=MEDIAN, or MU=. SIGMA=HSPR Estimate of the standard deviation of the reference normal distribution: Specify SIGMA=STD, SIGMA=HSPR, or SIGMA=. STDERR=YES Plot std errors around curves? DETREND=YES Plot detrended version? If DETREND=YES the detrended version is plotted too. LH=1.5 Height, in character cells, for the axis labels. ANNO= Name of an optional input Annotate data set NAME=NQPLOT The name assigned to the graph(s) in the graphic catalog. GOUT= The name of the graphic catalog used to store the graph(s) for later replay. OUTLIER macro The OUTLIER macro calculates robust Mahalanobis distances for each observation in a data set. The results are robust in that potential outliers do not contribute to the distance of any other observations. A high-resolution plot may be constructed from the output data set; see the examples in Section 9.3 The macro makes one or more passes through the data. Each pass assigns 0 weight to observations whose DSQ value has Prob ( chisquare ) < PVALUE. The number of passes should be determined empirically so that no new observations are trimmed on the last step. Parameters DATA=_LAST_ Name of the data set to analyze VAR=_NUMERIC_ List of input variables ID= Name of an optional ID variable to identify observations OUT=CHIPLOT Name of the output data set for plotting. The robust squared distances are named DSQ. The corresponding theoretical quantiles are named EXPECTED. The variable _WEIGHT_ has the value 0 for observations identified as possible outliers. PVALUE=.1 Probability value of chi sup 2 statistic used to trim observations. PASSES=2 Number of passes of the iterative trimming procedure. PRINT=YES Print the OUT= data set? PARTIAL macro The PARTIAL macro draws partial regression residual plots as described in Section 5.5. Parameters DATA = _LAST_ Name of the input data set. YVAR = Name of the dependent variable. XVAR = List of independent variables. The list of variables must be given explicitly; the range notation X1-Xn cannot be used. ID = Name of an optional character variable used to label observations. If ID= is not specified, the observations are identified by the numbers 1, 2, ... LABEL=INFL Specifies which points in the plot should be labelled with the value of the ID= variable. If LABEL=NONE, no points are labelled; if LABEL=ALL, all points are labelled; otherwise (LABEL=INFL) only potentially influential observations (those with large leverage values or large studentized residuals) are labelled. OUT = Name of the output data set containing partial residuals. This data set contains ( p + 1 ) pairs of variables, where p is the number of XVAR= variables. The partial residuals for the intercept are named UINTCEPT and VINTCEPT. If XVAR=X1 X2 X3, the partial residuals for X1 are named UX1 and VX1, and so on. In each pair, the U variable contains the partial residuals for the independent (X) variable, and the V variable contains the partial residuals for the dependent (Y) variable. GOUT=GSEG Name of graphic catalog used to store the graphs for later replay. NAME=PARTIAL The name assigned to the graphs in the graphic catalog. Computing note In order to follow the description in the text, the program computes one regression analysis for each regressor variable (including the intercept). Vellman & Welsh (1981) show how the partial regression residuals and other regression diagnostics can be computed more eficiently--from the results of a single regression using all predictors. They give an outline of the computations in the PROC MATRIX language. Usage Note When using the PARTIAL macro with the SAS System for Personal Computers, it may be neccessary to add the option WORKSIZE=100 to the PROC IML statement. SCATMAT macro The SCATMAT macro draws a scatterplot matrix for all pairs of variables specified in the VAR= parameter. The program will not do more than 10 variables. You could easily extend this, but the plots would most likely be too small to see. If a classification variable is specified with the GROUP= parameter, the value of that variable determines the shape and color of the plotting symbol. The macro GENSYM defines the SYMBOL statements for the different groups, which are assigned according to the sorted value of the grouping variable. The default values for the SYMBOLS= and COLORS= parameters allow for up to eight different plotting symbols and colors. If no GROUP= variable is specified, all observations are plotted using the first symbol and color. Parameters DATA=_LAST_ Name of the data set to be plotted. VAR= List of variables to be plotted. GROUP= Name of an optional grouping variable used to define the plot symbols and colors. SYMBOLS=%str(- + : $ = X _ Y) List of symbols, separated by spaces, to use for plotting points in each of the groups. The i-th element of SYMBOLS is used for group i. COLORS=BLACK RED GREEN BLUE BROWN YELLOW ORANGE PURPLE List of colors to use for each of the groups. If there are g groups, specify g colors. The i-th element of COLORS is used for group i. GOUT=GSEG Name of the graphics catalog used to store the final scatterplot matrix constructed by PROC GREPLAY. The individual plots are stored in WORK.GSEG. Note that if the SCATMAT macro is invoked after other graphs have been created in the same SAS session, the previously created graphs will be replayed into the template panels instead of the plots generated in the macro. To avoid this problem, either delete existing graphs from the WORK.GSEG catalog before invoking the macro, or modify the macro to store the individual plots in a catalog other than WORK.GSEG. STARS macro The STARS macro draws a star plot of the multivariate observations in a data set, as described in Section 8.4. Each observation is depicted by a star-shaped figure with one ray for each variable, whose length is proportional to the size of that variable. Missing data The scaling of the data in the PROC IML step makes no allowance for missing values. Parameters DATA=_LAST_ Name of the data set to be displayed. VAR= List of variables, in the order to be placed around the star, starting from angle=0 (horizontal), and proceeding counterclockwise. ID= Character observation identifier variable (required). MINRAY=.1 Minimum ray length, 0<=MINRAY<1. ACROSS=5 Number of stars across a page. DOWN=6 Number of stars down a page. If the product of ACROSS and DOWN is less than the number of observations, multiple graphs are produced. SYMPLOT macro The SYMPLOT macro produces any of the plots for diagnosing symmetry of a distribution described in Section 3.6. Parameters DATA=_LAST_ Name of the input data to be analyzed. VAR= Name of the variable to be plotted. Only one variable may be specified. PLOT=MIDSPR Type of plot(s): NONE, or one or more of UPLO, MIDSPR, MIDZSQ, or POWER. One plot is produced for each keyword included in the PLOT= parameter. TRIM=0 A number or percent of extreme observations to be trimmed. If you specify TRIM=number, the highest and lowest number observations are not plotted. If you specify TRIM=percent PCT, the highest and lowest percent% of the observations are not plotted. The TRIM= option is most useful in the POWER plot. OUT=SYMPLOT Name of the output data set NAME=SYMPLOT The name assigned to the graph(s) in the graphic catalog. TWOWAY macro The TWOWAY macro carries out analysis of two-way experimental design data with one observation per cell, including Tukey's 1 degree of freedom test for non-additivity as described in Section 7.6. Two plots may be produced: a graphical display of the fit and residuals for the additive model, and a diagnostic plot for removable non-additivity. Parameters DATA=_LAST_ Name of the data set to be analyzed. One factor in the design is specified by the list of variables in the VAR= parameter. The other factor is defined by the observations in the data set. VAR= List of variables (columns of the table) to identify the levels of the first factor. ID= Row identifier, a character variable to identify the levels of the second factor. RESPONSE=Response Label for the response variable on the vertical axis of the two-way FIT plot. PLOT=FIT DIAGNOSE Specifies the plots to be done. The PLOT parameter can contain one or more of the keywords FIT, DIAGNOSE and PRINT. FIT requests a high- resolution plot of fitted values and residuals for the additive model. DIAGNOSE requests a high-resolution diagnostic plot for removable non-additivity. PRINT produces both of these plots in printed form. NAME= The name assigned to the graphs in the graphic catalog. GOUT= Specifies the name of the graphics catalog used to save the output for later replay. The default is WORK.GSEG, which is erased at the end of your session. To save graphs in a permanent catalog, use a two-part name. GOPTIONS You should adjust the HSIZE= and VSIZE= values on the GOPTIONS statement to equate the data units in the horizontal and vertical axes of the FIT plot so that the corners are square. /*-------------------------------------------------------------------* * BIPLOT SAS - Macro to construct a biplot of observations and * * variables. Uses IML. * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 1 Mar 1989 13:16:36 * * Revised: 20 Dec 1989 09:54:19 * * Version: 1.2 * * * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * *-------------------------------------------------------------------*/ %macro BIPLOT( data=_LAST_, /* Data set for biplot */ var =_NUMERIC_, /* Variables for biplot */ id =ID, /* Observation ID variable */ dim =2, /* Number of biplot dimensions */ factype=SYM, /* Biplot factor type: GH, SYM, or JK */ scale=1, /* Scale factor for variable vectors */ out =BIPLOT, /* Output dataset: biplot coordinates */ anno=BIANNO, /* Output dataset: annotate labels */ std=MEAN, /* How to standardize columns: NONE|MEAN|STD*/ pplot=YES); /* Produce printer plot? */ %let factype=%upcase(&factype); %if &factype=GH %then %let p=0; %else %if &factype=SYM %then %let p=.5; %else %if &factype=JK %then %let p=1; %else %do; %put BIPLOT: FACTYPE must be GH, SYM, or JK. "&factype" is not valid.; %goto done; %end; Proc IML; Start BIPLOT(Y,ID,VARS,OUT, power, scale); N = nrow(Y); P = ncol(Y); %if &std = NONE %then Y = Y - Y[:] %str(;); /* remove grand mean */ %else Y = Y - J(N,1,1)*Y[:,] %str(;); /* remove column means */ %if &std = STD %then %do; S = sqrt(Y[##,] / (N-1)); Y = Y * diag (1 / S ); %end; *-- Singular value decomposition: Y is expressed as U diag(Q) V prime Q contains singular values, in descending order; call svd(u,q,v,y); reset fw=8 noname; percent = 100*q##2 / q[##]; *-- cumulate by multiplying by lower triangular matrix of 1s; j = nrow(q); tri= (1:j)`*repeat(1,1,j) >= repeat(1,j,1)*(1:j) ; cum = tri*percent; c1={'Singular Values'}; c2={'Percent'}; c3={'Cum % '}; Print "Singular values and variance accounted for",, q [colname=c1 format=9.4 ] percent [colname=c2 format=8.2 ] cum [colname=c3 format=8.2 ]; d = &dim ; *-- Extract first d columns of U & V, and first d elements of Q; U = U[,1:d]; V = V[,1:d]; Q = Q[1:d]; *-- Scale the vectors by QL, QR; * Scale factor 'scale' allows expanding or contracting the variable vectors to plot in the same space as the observations; QL= diag(Q ## power ); QR= diag(Q ## (1-power)); A = U * QL; B = V * QR # scale; OUT=A // B; *-- Create observation labels; id = id // vars`; type = repeat({"OBS "},n,1) // repeat({"VAR "},p,1); id = concat(type, id); factype = {"GH" "Symmetric" "JK"}[1 + 2#power]; print "Biplot Factor Type", factype; cvar = concat(shape({"DIM"},1,d), char(1:d,1.)); print "Biplot coordinates", out[rowname=id colname=cvar]; %if &pplot = YES %then call pgraf(out,substr(id,5),'Dimension 1', 'Dimension 2', 'Biplot'); ; create &out from out[rowname=id colname=cvar]; append from out[rowname=id]; finish; use &data; read all var{&var} into y[colname=vars rowname=&id]; power = &p; scale = &scale; run biplot(y, &id,vars,out, power, scale ); quit; /*----------------------------------* | Split ID into _TYPE_ and _NAME_ | *----------------------------------*/ data &out; set &out; drop id; length _type_ $3 _name_ $16; _type_ = scan(id,1); _name_ = scan(id,2); /*--------------------------------------------------* | Annotate observation labels and variable vectors | *--------------------------------------------------*/ data &anno; set &out; length function text $8; xsys='2'; ysys='2'; text = _name_; if _type_ = 'OBS' then do; /* Label the observation */ color='BLACK'; x = dim1; y = dim2; position='5'; function='LABEL '; output; end; if _type_ = 'VAR' then do; /* Draw line from */ color='RED '; x = 0; y = 0; /* the origin to */ function='MOVE' ; output; x = dim1; y = dim2; /* the variable point */ function='DRAW' ; output; if dim1 >=0 then position='6'; /* left justify */ else position='2'; /* right justify */ function='LABEL '; output; /* variable name */ end; %done: %mend BIPLOT; /*-------------------------------------------------------------------* * BOXANNO SAS Annotate a scatter plot with univariate boxplots * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 20 Apr 1988 11:32:44 * * Revised: 17 May 1990 09:51:13 * * Version: 1.5 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * *-------------------------------------------------------------------*/ /*------------------------------------------------------------------* | BOXAXIS macro - create an annotate dataset to draw a boxplot for | | ONE axis in a scatterplot. Can be used with Proc GPLOT | | or Proc G3D scatterplots. | | This macro just creates the annotate dataset. It is up to| | the user to call the appropriate plot procedure. | | e.g., Proc GPLOT data= ; | | Plot Y * X / annotate= ... ; | *------------------------------------------------------------------*/ %macro BOXAXIS( data=_LAST_, /* Input dataset */ out=_DATA_, /* Output ANNOTATE dataset */ var=, /* Variable to be plotted */ baxis=x, /* Axis on which it goes- X, Y, or Z */ oaxis=y, /* The other axis in the plot */ paxis=z, /* The 3rd axis (ignored in GPLOT) */ boxwidth=4, /* width of box in data percent */ pos=98); /* position of box on OAXIS 0 (q3+1.5*iqr) then outside=2; if &var < (q1-3.0*iqr) or &var > (q3+3.0*iqr) then outside=3; run; /*----------------------------------------------------* | Whiskers go from quartiles to most extreme values | | which are *NOT* outside. | *----------------------------------------------------*/ data whis; set plotdat; if outside = 1; proc univariate data=whis noprint; var &var; output out=whisk min=lo_whisk max=hi_whisk; run; data boxfile; merge quartile whisk; proc print data=boxfile; /*-----------------------------------------------* | Annotate data set to draw boxes & whiskers | *-----------------------------------------------*/ %let bx = &oaxis; %let by = &baxis; %let bz = &paxis; data &out; set boxfile; drop n lo_whisk hi_whisk q1 q3 iqr median mean center halfwid; length function $8 text $8; halfwid= &boxwidth / 2; %if ( &pos > 50 ) %then %do; center= &pos - halfwid; %end; %else %do; center= &pos + halfwid; %end; &bx.sys = '1'; /* data percentage coordinates for 'other' */ &by.sys = '2'; /* data value coordinates for box axis */ %if ( &paxis ^= %str() ) %then %do; &bz.sys = '1'; /* data percentage coordinates for 3rd axis*/ &bz = 1 ; %end; &bx =center-halfwid ; &by = q1; dot=1 ; link out; * box ; &bx =center+halfwid ; &by = q1; dot=21; link out; &bx =center+halfwid ; &by = q3; dot=22; link out; &bx =center-halfwid ; &by = q3; dot=23; link out; &bx =center-halfwid ; &by = q1; dot=24; link out; * box ; &bx =center-halfwid ; &by = median ; dot=3 ; link out; * median; &bx =center+halfwid ; &by = median ; dot=4 ; link out; &bx =center ; &by = q1 ; dot=5 ; link out; * lo ; &bx =center ; &by = lo_whisk; dot=6 ; link out; * whisker; &bx =center ; &by = q3 ; dot=7 ; link out; * hi ; &bx =center ; &by = hi_whisk; dot=8 ; link out; * whisker; &bx =center-halfwid/2 ; &by = lo_whisk; dot=9 ; link out; &bx =center+halfwid/2 ; &by = lo_whisk; dot=10; link out; &bx =center-halfwid/2 ; &by = hi_whisk; dot=11; link out; &bx =center+halfwid/2 ; &by = hi_whisk; dot=12; link out; &bx =center ; &by = mean ; dot=13; link out; return; out: select; when (dot=1 | dot=3 | dot=5 | dot=7 | dot=9 | dot=11) do; line = .; function = 'MOVE'; output; end; when (dot=4 | dot=6 | dot=8 | dot=10 | dot=12 | dot=21| dot=22| dot=23| dot=24 ) do; if dot=6 | dot=8 then line = 3; else line = 1; function = 'DRAW'; output; end; when (dot = 13) do; text = 'STAR'; function = 'SYMBOL'; output; end; otherwise; end; return; run; %mend boxaxis; /*---------------------------------------------------------* | BOXANNO macro - creates annotate dataset for both X & Y | *---------------------------------------------------------*/ %macro boxanno( data=_last_, /* Data set to be plotted */ xvar=, /* Horizontal variable */ yvar=, /* Vertical variable */ out=boxanno /* Output annotate dataset */ ); %boxaxis( data=&data, var=&xvar, baxis=x, oaxis=y, out=xanno); %boxaxis( data=&data, var=&yvar, baxis=y, oaxis=x, out=yanno); /*----------------------------------------* | Concatenate the two annotate datasets | *----------------------------------------*/ data &out; set xanno yanno; %mend boxanno; /*-------------------------------------------------------------------* * BOXPLOT SAS SAS/Graph Box and Whisker plot * * * * This SAS macro constructs and plots side-by-side Box and whisker * * plots for ONE quantitative variable classified by ONE OR MORE * * grouping (CLASS) variables. The CLASS variables may be character * * or numeric. * * * * The box for each group shows the location of the median, mean and * * quartiles. Whisker lines extend to the most extreme observations * * which are no more than 1.5*IQR beyond the quartiles. Observations * * beyond the whiskers are plotted individually. * * * * Optional NOTCHES are drawn to show approximate 95% confidence * * intervals for each group median. Other options are provided to * * connect group medians, draw box widths proportional to sample * * size, and allow formatted labels for both variables. * * References: * * Olmstead, A. "Box Plots using SAS/Graph Software", SAS SUGI, * * 1985, 888-894. * * McGill, R., Tukey, J.W., & Larsen, W. "Variations of Box Plots",* * American Statistician, 1978, 32, 12-16. * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 12 Apr 1988 10:19:15 * * Revised: 12 Oct 1990 09:25:10 * * Version: 1.3 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * *-------------------------------------------------------------------*/ /* Description of Parameters: */ %macro BOXPLOT( /* ------------------------- */ data=_LAST_, /* Input dataset */ class=, /* Grouping variable(s) */ var=, /* Ordinate variable */ id=, /* Observation ID variable */ width=.5, /* Box width as proportion of maximum */ notch=0, /* =0|1, 1=draw notched boxes */ connect=0, /* =0 or line style to connect medians*/ f=0.5, /* Notch depth, fraction of halfwidth */ fn=1, /* Box width proportional to &FN */ varfmt=, /* Format for ordinate variable */ classfmt=, /* Format for class variable(s) */ varlab=, /* Label for ordinate variable */ classlab=, /* Label for class variable(s) */ yorder=, /* Tick marks, range for ordinate */ anno=, /* Addition to ANNOTATE set */ out=boxstat, /* Output data set: quartiles, etc. */ name=BOXPLOT /* Name for graphic catalog entry */ ); options nonotes; %let _DSN_ = %upcase(&DATA); %if &classlab = %str() %then %let classlab = &class; %let CLASS = %upcase(&CLASS); proc sort data=&DATA; by &CLASS; run; %let clvars = %nvar(&class); /*----------------------------------------* | Determine if &CLASS is char or numeric | *----------------------------------------*/ %let cltype=; proc contents data=&DATA out=work noprint; data _NULL_; length label2 $40; set work; if name="&CLASS" then if type=1 then call symput('CLTYPE', 'NUM'); else call symput('CLTYPE', 'CHAR'); *-- find length of variable label and set y label angle --; %if &varlab ^= %str() %then %str( label2 = "&varlab"; ); %else %str( if name="&VAR" then label2=label; ); if length(label2) <=8 then call symput('YANGLE',''); else call symput('YANGLE','a=90 r=0'); run; /* Run required here */ /*----------------------------------------------------------------* | If there are more than one class variables or class variable | | is CHAR, create a numeric class variable, XCLASS. XCLASS | | numbers the groups from 1,...number-of-groups. It is up to | | the user to supply a format to associate proper group labels | | with the XCLASS value. | *----------------------------------------------------------------*/ %if ( &cltype=CHAR or &clvars > 1 ) %then %do; %let lclass = %scan( &CLASS, &clvars ); data work; set &DATA; by &CLASS; if (first.&LCLASS) then xclass + 1; %if &cltype=CHAR and &clvars=1 and &classfmt=%str() %then %do; call symput('val'||left(put( xclass, 2. )), trim(&class) ); %end; run; %let KLASS = xclass; %let data = work; run; %end; %else %let KLASS = &CLASS; /*------------------------------------------------* | Determine number of groups & quartiles of each | *------------------------------------------------*/ proc means noprint data=&data; var &KLASS; output out=_grsum_ min=grmin max=grmax ; run; proc univariate data=&data noprint; by &KLASS; var &VAR; output out=_qtile_ n=n q1=q1 q3=q3 median=median qrange=iqr mean=mean; data _qtile_; set _qtile_; By &KLASS; Lo_Notch = Median - 1.58*IQR / sqrt(N); Hi_Notch = Median + 1.58*IQR / sqrt(N); run; data merged; merge &DATA _qtile_; by &KLASS; /*-----------------------------------------------* | Find outside & farout points | *-----------------------------------------------*/ data plotdat; set merged; keep &KLASS &VAR &ID outside; if &VAR ^= .; outside=1; if &VAR < (Q1 -1.5*IQR) or &VAR > (Q3 +1.5*IQR) then outside=2; if &VAR < (Q1 -3.0*IQR) or &VAR > (Q3 +3.0*IQR) then outside=3; run; data _out_; set plotdat; if outside > 1 ; proc sort data=_out_; by &KLASS &VAR ; proc print data=_out_; id &ID &KLASS; title3 "Outside Observations in Data Set &_DSN_ "; run; /*-----------------------------------------------------* | If connnecting group medians, find them and append | *-----------------------------------------------------*/ %if ( &connect ) %then %do; data connect; set _qtile_(keep=&KLASS Median rename=(Median=&VAR)); outside=0; proc append base=plotdat data=connect; run; %end; /*----------------------------------------------------* | Whiskers go from quartiles to most extreme values | | which are *NOT* outside. | *----------------------------------------------------*/ data _in_; set plotdat; if outside = 1; /* select inside points */ proc univariate data=_in_ noprint; by &KLASS; var &VAR; /* find min and max */ output out=_whisk_ min=lo_whisk max=hi_whisk; run; data &out; merge _qtile_ _whisk_ end=lastobs; by &KLASS; retain halfmax 1e23 fnmax -1e23; drop span halfmax fnmax offset grps; span = dif ( &KLASS ); /* x(k+1) - x(k) */ if (_n_ > 1 ) then halfmax = min( halfmax, span/2); fnmax = max( fnmax, &FN ); if ( lastobs ) then do; if _n_=1 then halfmax=.5; call symput ('HALFMAX', left(put(halfmax,best.)) ); put ' Maximum possible halfwidth is: ' halfmax /; call symput ('FNMAX', left(put(fnmax,best.)) ); grps=_n_; offset=max(5, 35-5*grps); call symput('OFFSET',left(put(offset,2.)) ); put ' Number of groups: ' grps 'offset=' offset ; end; proc print ; id &KLASS; title3 'BOXPLOT: Quartiles, notches and whisker values'; run; /*-----------------------------------------------* | Annotate data set to draw boxes & whiskers | *-----------------------------------------------*/ data _dots_; set &out; retain halfmax &HALFMAX k ; drop k halfmax halfwid hi_notch lo_notch iqr median mean q1 q3 ; drop grmin grmax ; if ( _n_ = 1) then do; set _grsum_; K = &WIDTH * HalfMax; end; halfwid = K * &FN / &FNMax ; length function text $8; XSYS = '2'; YSYS = '2'; /* Produce connect-the-dots X, Y pairs */ X = &KLASS ; Y= Lo_Whisk ; dot = 1; link out; X = &KLASS ; Y= Q1 ; dot = 2; link out; X = &KLASS - halfwid ; Y= Q1 ; dot = 3; link out; %if ( ¬ch ) %then %do; X = &KLASS - halfwid ; Y= Lo_Notch ; dot = 4; link out; X = &KLASS - (1-&F)*halfwid ; Y= Median ; dot = 5; link out; X = &KLASS - halfwid ; Y= Hi_Notch ; dot = 6; link out; %end; X = &KLASS - halfwid ; Y= Q3 ; dot = 7; link out; X = &KLASS ; Y= Q3 ; dot = 8; link out; X = &KLASS ; Y= Hi_Whisk ; dot = 9; link out; X = &KLASS ; Y= Q3 ; dot = 10; link out; X = &KLASS + halfwid ; Y= Q3 ; dot = 11; link out; %if ( ¬ch ) %then %do; X = &KLASS + halfwid ; Y= Hi_Notch ; dot = 12; link out; %end; X = &KLASS+(1-&NOTCH*&F)*halfwid; Y= Median ; dot = 13; link out; X = &KLASS-(1-&NOTCH*&F)*halfwid; Y= Median ; dot = 14; link out; X = &KLASS+(1-&NOTCH*&F)*halfwid; Y= Median ; dot = 15; link out; %if ( ¬ch ) %then %do; X = &KLASS + halfwid ; Y= Lo_Notch ; dot = 16; link out; %end; X = &KLASS + halfwid ; Y= Q1 ; dot = 17; link out; X = &KLASS ; Y= Q1 ; dot = 18; link out; X = &KLASS - halfwid/3 ; Y= Lo_Whisk ; dot = 19; link out; X = &KLASS - halfwid/3 ; Y= Hi_Whisk ; dot = 19; link out; X = &KLASS ; Y= Mean ; dot = 20; link out; return; out: Select; when ( dot=1 ) do; FUNCTION = 'MOVE'; output; FUNCTION = 'POLY'; output; End; when ( 1< dot <=18) do; FUNCTION = 'POLYCONT'; output; End; when ( dot=19) do; FUNCTION = 'MOVE'; output; X = X + 2*halfwid/3 ; FUNCTION = 'DRAW'; output; End; when ( dot=20) do; FUNCTION = 'MOVE'; output; FUNCTION = 'SYMBOL'; TEXT='STAR'; output; End; Otherwise ; End; Return; run; /*-----------------------------------------------------* | Annotate data set to plot and label outside points | *-----------------------------------------------------*/ data _label_; set _out_; /* contains outliers only */ by &KLASS; keep xsys ysys x y function text style position; length text function style $8; xsys = '2'; ysys = '2'; y = &VAR; x = &KLASS ; function = 'SYMBOL'; /* draw the point */ style = ' '; position = ' '; if OUTSIDE=2 then do; text='DIAMOND'; size=1.7; end; else do; text='SQUARE '; size=2.3; end; output; %if &ID ^= %str() %then %do; /* if ID variable, */ if first.&KLASS then out=0; out+1; function = 'LABEL'; /* .. then label it */ text = &ID; size=.9; style='SIMPLEX'; x = &KLASS; if mod(out,2)=1 /* on alternating sides*/ then do; x=x -.05; position='4'; end; else do; x=x +.05; position='6'; end; output; %end; data _dots_; set _dots_ _label_ &anno ; /*------------------------------------* | Clean up datasets no longer needed | *------------------------------------*/ proc datasets nofs nolist library=work memtype=(data); delete work _grsum_ merged _in_ _whisk_ _qtile_ _label_; options notes; /*--------------------------------------* | Symbols for connecting group medians | *--------------------------------------*/ %if &connect ^= 0 %then %do; symbol1 C=BLACK V=NONE I=JOIN L=&connect r=1; /* connected medians */ symbol2 C=BLACK V=NONE R=3; /* rest done by annotate */ %end; %else %do; symbol1 C=BLACK V=NONE R=3 i=none; /* all done by annotate */ %end; title3; proc gplot data=plotdat ; plot &VAR * &KLASS = outside / frame nolegend name="&name" vaxis=axis1 haxis=axis2 hminor=0 annotate=_dots_; %if %length(&yorder) > 0 %then %let yorder = order=(&yorder); axis1 &yorder value=(h=1.2) label =(&yangle h=1.5); axis2 value=(h=1.2) label =(h=1.5) offset=(&offset pct); %if &varfmt ^= %str() %then %do; format &var &varfmt ; %end; %if &classfmt^= %str() %then %do; format &KLASS &classfmt ; %end; %if &varlab ^= %str() %then %do; label &var = "&varlab"; %end; %if &classlab^= %str() %then %do; label &KLASS = "&classlab"; %end; run; %mend boxplot; /*----------------------------------* | Count number of &CLASS variables | *----------------------------------*/ %macro nvar(varlist); %local wvar result; %let result = 1; %let wvar = %nrbquote(%scan( &varlist, &result)); %do %until ( &wvar= ); %let result = %eval( &result + 1); %let wvar = %nrbquote(%scan( &varlist, &result)); %end; %eval( &result - 1) %mend nvar; /*-------------------------------------------------------------------* * Name: CONTOUR SAS * * Title: IML macro to plot elliptical contours for X, Y data * * Ref: IML User's Guide, version 5 Edition * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 8 Jun 1988 12:33:21 * * Revised: 10 May 1990 11:15:56 * * Version: 2.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro CONTOUR( data=_LAST_, /* input data set */ x=, /* X variable */ y=, /* Y variable */ group=, /* Group variable (optional) */ pvalue= .5, /* Confidence coefficient (1-alpha) */ std=STDERR, /* error bar metric: STD or STDERR */ points=40, /* points on each contour */ all=NO, /* include contour for total sample?*/ out=CONTOUR, /* output data set */ plot=YES, /* plot the results? */ i=none, /* SYMBOL statement interpolate opt */ name=CONTOUR, /* Name for graphic catalog entry */ colors=RED GREEN BLUE BLACK PURPLE YELLOW BROWN ORANGE, symbols=+ square star - plus : $ = ); %let all = %upcase(&all); %if &x=%str() or &y=%str() %then %do; %put CONTOUR: X= and Y= variables must be specified; %goto DONE; %end; proc iml; start ellipse(c, x, y, npoints, pvalues, formean); /*----------------------------------------------------------------* | Computes elliptical contours for a scatterplot | | C returns the contours as consecutive pairs of columns | | X,Y coordinates of the points | | NPOINTS scalar giving number of points around a contour | | PVALUES column vector of confidence coefficients | | FORMEAN 0=contours for observations, 1=contours for means | *----------------------------------------------------------------*/ xx = x||y; n = nrow(x); *-- Correct for the mean --; mean = xx[+,]/n; xx = xx - mean @ j(n,1,1); *-- Find principal axes of ellipses --; xx = xx` * xx / (n-1); print 'Variance-Covariance Matrix',xx; call eigen(v, e, xx); *-- Set contour levels --; c = 2*finv(pvalues,2,n-1,0); if formean=1 then c = c / (n-1) ; print 'Contour values',pvalues c; a = sqrt(c*v[ 1 ] ); b = sqrt(c*v[ 2 ] ); *-- Parameterize the ellipse by angles around unit circle --; t = ( (1:npoints) - {1}) # atan(1)#8/(npoints-1); s = sin(t); t = cos(t); s = s` * a; t = t` * b; *-- Form contour points --; s = ( ( e*(shape(s,1)//shape(t,1) )) + mean` @ j(1,npoints*ncol(c),1) )` ; c = shape( s, npoints); *-- C returned as NCOL pairs of columns for contours--; finish; start dogroups(x, y, gp, pvalue); d = design(gp); %if &all=YES %then %do; d = d || j(nrow(x),1,1); %end; do group = 1 to ncol(d); Print group; *-- select observations in each group; col = d[, group ]; xg = x[ loc(col), ]; yg = y[ loc(col), ]; *-- Find ellipse boundary ; run ellipse(xyg,xg,yg,&points, pvalue, 0 ); nr = nrow(xyg); *-- Output contour data for this group; cnames = { X Y PVALUE GP }; do c=1 to ncol(pvalue); col=(2*c)-1 : 2*c ; xygp = xyg[,col] || j(nr,1,pvalue[c]) || j(nr,1,group); if group=1 & c=1 then create contour from xygp [colname=cnames]; append from xygp; end; end; finish; *-- Get input data: X, Y, GP; use &data; read all var {&x} into x [colname=lx]; read all var {&y} into y [colname=ly]; %if &group ^= %str() %then %do; read all var {&group} into gp [colname=lg] ; %end; %else %do; gp = j(nrow(x),1,1); %end; close &data; *-- Find contours for each group; run dogroups(x, y, gp, { &pvalue} ); /*-----------------------------------* | Plot the contours using ANNOTATE | *-----------------------------------*/ data contour; set contour ; by gp pvalue notsorted; length function color $8; xsys='2'; ysys='2'; if first.pvalue then function='POLY'; else function='POLYCONT'; color=scan("&colors",gp); line = 5; run; /*----------------------------* | Crosses at Mean +- StdErr | *----------------------------*/ proc summary data=&data nway; class &group; var &x &y; output out=sumry mean=mx my &std=sx sy; proc print; data bars; set sumry end=eof; %if &group ^= %str() %then %str(by &group;); length function color $8; retain g 0; drop _freq_ _type_ mx my sx sy g; xsys='2'; ysys='2'; %if &group ^= %str() %then %do; if first.&group then g+1; %end; color=scan("&colors",g); line=3; x = mx-sx; y=my; function='MOVE'; output; x = mx+sx; function='DRAW'; output; x = mx ; y=my-sy; function='MOVE'; output; y=my+sy; function='DRAW'; output; *-- Write group label (convert numeric &group to character); %if &group ^= %str() %then %do; length text $16; text = left(&group); position='3'; size = 1.4; x = mx+.2*sx ; y=my+.2*sy; function='LABEL'; output; %end; if eof then call symput('NGROUP',put(g,best.)); run; data &out; set contour bars; %if &group = %str() %then %let group=1; %if %upcase(&plot)=YES %then %do; %gensym(n=&ngroup, h=1.2, i=&i, colors=&colors, symbols=&symbols ); proc gplot data=&data ; plot &y * &x = &group / annotate=&out nolegend frame vaxis=axis1 vminor=0 haxis=axis2 hminor=0 name="&name"; axis1 offset=(3) value=(h=1.5) label=(h=1.5 a=90 r=0); axis2 offset=(3) value=(h=1.5) label=(h=1.5); run; %end; %done: ; %mend contour; /*----------------------------------------------------* | Macro to generate SYMBOL statement for each GROUP | *----------------------------------------------------*/ %macro gensym(n=1, h=1.5, i=none, symbols=%str(- + : $ = X _ Y), colors=BLACK RED GREEN BLUE BROWN YELLOW ORANGE PURPLE); %*-- note: only 8 symbols & colors are defined; %*-- revise if more than 8 groups (recycle); %local chr col k; %do k=1 %to &n ; %let chr =%scan(&symbols, &k,' '); %let col =%scan(&colors, &k, ' '); symbol&k h=&h v=&chr c=&col i=&i; %end; %mend gensym; /*-------------------------------------------------------------------* * CORRESP SAS Correspondence analysis of contingency tables * * * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 19 Jan 1990 15:23:09 * * Revised: 27 Jun 1991 11:48:09 * * Version: 1.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro CORRESP( data=_LAST_, /* Name of input data set */ var=, /* Column variables */ id=, /* ID variable: row labels */ out=COORD, /* output data set for coordinates */ anno=LABEL, /* name of annotate data set for labels */ rowht=1, /* height for row labels */ colht=1 /* height for col labels */ ); /*------------------------------------------* | IML routine for Correspondence Analysis | *------------------------------------------*/ Proc IML; Start CORRESP(F,RowId,Vars); I = nrow(F); J = ncol(F); R = F[,+]; * Row totals; C = F[+,]; * Col totals; N = F[+]; * Grand total; E = R * C / N; * Expected frequencies; D = (F - E) / sqrt(E); * Standardized deviates; D = D / sqrt( N ); DPD = D` * D; Inertia = trace(DPD); Chisq = N * Inertia; * Total chi-square; DF = (I-1)*(J-1); reset noname; Print 'Overall Association', CHISQ[colname={'ChiSq'}] DF[colname={DF} format=6.0]; call eigen(values, vectors, dpd); k = min(I,J)-1; * number of non-zero eigenvalues; values = values[1:k]; cancorr = sqrt(values); * singular values = Can R; chisq = n * values ; * contribution to chi-square; percent = 100* values / inertia; *-- Cumulate by multiplying by lower triangular matrix of 1s; tri= (1:k)`*repeat(1,1,k) >= repeat(1,k,1)*(1:k) ; cum = tri*percent; print 'Singular values, Inertia, and Chi-Square Decomposition',, cancorr [colname={' Singular Values'} format=9.4] values [colname={'Principal Inertias'} format=9.4] chisq [colname={' Chi- Squares'} format=9.3] percent [colname={'Percent'} format=8.2] cum [colname={'Cum % '} format=8.2]; L = values[1:2]; U = vectors[,1:2]; Y = diag(1/sqrt(C/N)) * U * diag(sqrt(L)); X = diag(N/R) * (F / N) * Y * diag(sqrt(1/L)); Print 'Row Coordinates' , X [Rowname=RowId Colname={DIM1 DIM2}]; Print 'Column Coordinates', Y [Rowname=Vars Colname={DIM1 DIM2}]; OUT = X // Y; ID = RowId // Vars`; * Call PGRAF(OUT,ID,'Dimension 1', 'Dimension 2', 'Row/Col Association'); TYPE = repeat({"OBS "},I,1) // repeat({"VAR "},J,1); ID = concat(TYPE, ID); Create &out from OUT[rowname=ID colname={"Dim1" "Dim2"}]; Append from OUT[rowname=ID]; Finish; Use &data; Read all VAR {&var} into F [Rowname=&id Colname=Vars]; Run CORRESP(F,&id,Vars); quit; /*----------------------------------* | Split ID into _TYPE_ and _NAME_ | *----------------------------------*/ data &out; set &out; drop id; length _type_ $3 _name_ $16; _type_ = scan(id,1); _name_ = scan(id,2); proc print data=&out; id _type_ _name_; /*--------------------------------------------------* | Annotate row and column labels | *--------------------------------------------------*/ data &anno; set &out; length function $8 text $16; xsys='2'; ysys='2'; text = _name_; style='DUPLEX'; x = dim1; y = dim2; if _type_ = 'OBS' then do; size= &rowht ; color='BLACK'; position='5'; function='LABEL '; output; end; if _type_ = 'VAR' then do; color='RED '; size= &colht; if dim1 >=0 then position='6'; /* left justify */ else position='4'; /* right justify */ function='LABEL '; output; end; %mend CORRESP; /*-------------------------------------------------------------------* * CORRESP2 SAS Correspondence analysis of contingency tables * * **** USE WITH VERSION 6 ONLY **** * * * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 19 Jan 1990 15:23:09 * * Revised: 27 Jun 1991 11:48:09 * * Version: 1.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro CORRESP( data=_LAST_, /* Name of input data set */ var=, /* Column variables */ id=, /* ID variable: row labels */ out=COORD, /* output data set for coordinates */ anno=LABEL, /* name of annotate data set for labels */ rowht=1, /* height for row labels */ colht=1 /* height for col labels */ ); /*------------------------------------------* | IML routine for Correspondence Analysis | *------------------------------------------*/ Proc IML; Start CORRESP(F,RowId,Vars); I = nrow(F); J = ncol(F); R = F[,+]; * Row totals; C = F[+,]; * Col totals; N = F[+]; * Grand total; E = R * C / N; * Expected frequencies; D = (F - E) / sqrt(E); * Standardized deviates; D = D / sqrt( N ); DPD = T(D) * D; Inertia = trace(DPD); Chisq = N * Inertia; * Total chi-square; DF = (I-1)*(J-1); reset noname; C1={"ChiSq"}; C2={"DF"}; Print 'Overall Association', CHISQ[colname=C1] DF[colname=C2 format=6.0]; call eigen(values, vectors, dpd); k = min(I,J)-1; * number of non-zero eigenvalues; values = values[1:k]; cancorr = sqrt(values); * singular values = Can R; chisq = n * values ; * contribution to chi-square; percent = 100* values / inertia; *-- Cumulate by multiplying by lower triangular matrix of 1s; tri= T(1:k)*repeat(1,1,k) >= repeat(1,k,1)*(1:k) ; cum = tri*percent; C1={'Singular Values'}; C2={'Inertias'}; C3={'Chi-Squares'}; C4={'Percent'}; C5={' Cum % '}; print 'Singular values, Inertia, and Chi-Square Decomposition',, cancorr [colname=C1 format=9.4] values [colname=C2 format=9.4] chisq [colname=C3 format=9.3] percent [colname=C4 format=8.2] cum [colname=C5 format=8.2]; L = values[1:2]; U = vectors[,1:2]; Y = diag(1/sqrt(C/N)) * U * diag(sqrt(L)); X = diag(N/R) * (F / N) * Y * diag(sqrt(1/L)); D2={'DIM1' 'DIM2'}; Print 'Row Coordinates' , X [Rowname=RowId Colname=D2]; Print 'Column Coordinates', Y [Rowname=Vars Colname=D2]; OUT = X // Y; ID = RowId // T(Vars); * Call PGRAF(OUT,ID,'Dimension 1', 'Dimension 2', 'Row/Col Association'); TYPE = repeat({"OBS "},I,1) // repeat({"VAR "},J,1); ID = concat(TYPE, ID); Create &out from OUT[rowname=ID colname={"Dim1" "Dim2"}]; Append from OUT[rowname=ID]; Finish; Use &data; Read all VAR {&var} into F [Rowname=&id Colname=Vars]; Run CORRESP(F,&id,Vars); quit; /*----------------------------------* | Split ID into _TYPE_ and _NAME_ | *----------------------------------*/ data &out; set &out; drop id; length _type_ $3 _name_ $16; _type_ = substr(id,1,3); _name_ = substr(id,5); proc print data=&out; id _type_ _name_; /*--------------------------------------------------* | Annotate row and column labels | *--------------------------------------------------*/ data &anno; set &out; length function $8 text $16; xsys='2'; ysys='2'; text = _name_; style='DUPLEX'; x = dim1; y = dim2; if _type_ = 'OBS' then do; size= &rowht ; color='BLACK'; position='5'; function='LABEL '; output; end; if _type_ = 'VAR' then do; color='RED '; size= &colht; if dim1 >=0 then position='6'; /* left justify */ else position='4'; /* right justify */ function='LABEL '; output; end; %mend CORRESP; /*-------------------------------------------------------------------* * DENSITY SAS Nonparametric density estimates from a sample. * * * * User chooses a bandwidth parameter to balance smoothness and bias * * and the range of the data over which the density is to be fit. * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 23 Mar 1989 16:21:12 * * Revised: 11 Jun 1991 12:05:09 * * Version: 1.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------* * Original program by: C. ROGER LONGBOTHAM * * while at Rockwell International, Rocky Flats Plant * * From: SAS SUGI 12, 1987, 907-909. * *-------------------------------------------------------------------*/ %macro DENSITY( data=_LAST_, /* Name of input data set */ out=DENSPLOT, /* Name of output data set */ var=X, /* Input variable (numeric) */ window=, /* Bandwidth (H) */ xfirst=., /* . or any real; smallest X value */ xlast=., /* . or any real; largest X value */ xinc=. ); /* . or value>0; X-value increment */ /* Default: (XLAST-XFIRST)/60 */ data _in_; set &data; keep &var; if &var ^= .; proc sort data=_in_; by &var; proc iml; start WINDOW; *-- Calculate default window width; mean = xa[+,]/n; css = ssq(xa - mean); stddev = sqrt(css/(n-1)); q1 = floor(((n+3)/4) || ((n+6)/4)); q1 = (xa[q1,]) [+,]/2; q3 = ceil(((3*n+1)/4) || ((3*n-2)/4)); q3 = (xa[q3,]) [+,]/2; quartsig = (q3 - q1)/1.349; h = .9*min(stddev,quartsig) * n##(-.2); * Silvermans formula; finish; start INITIAL; *-- Translate parameter options; if xf=. then xf=xa[1,1]; if xl=. then xl=xa[n,1]; if xl <= xf then do; print 'Either largest X value chosen is too small'; print 'or all data values are the same'; stop; end; if dx=. | dx <= 0 then do; inc = (xl-xf)/60; rinc = 10 ## (floor(log10(inc))-1); dx = round(inc,rinc); end; if xf=xa[1,1] then xf=xf-dx; nx = int((xl-xf)/dx) + 3; finish; *-- calculate density at specified x values; start DENSITY; fnx = j(nx,3,0); vars = {"DENSITY" "&VAR" "WINDOW"}; create &out from fnx [colname=vars]; sigmasqr = .32653; * scale constant for kernel ; gconst = sqrt(2*3.14159*sigmasqr); nuh = n*h; x = xf - dx; do i = 1 to nx; x = x + dx; y = (j(n,1,x) - xa)/h; ky = exp(-.5*y#y / sigmasqr) / gconst; * Gaussian kernel; fnx[i,1] = sum(ky)/(nuh); fnx[i,2] = x; end; fnx[,3] = round(h,.001); append from fnx; finish; *-- Main routine ; use _in_; read all var "&var" into xa [colname=invar]; n = nrow(xa); %if &window=%str() %then %do; run window; %end; %else %do; h = &window ; %end; xf = &xfirst; xl = &xlast; dx = &xinc; run initial; run density; close &out; quit; %mend DENSITY; /*-------------------------------------------------------------------* * DOTPLOT SAS Macro for dot charts * * * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 14 May 1989 09:12:26 * * Revised: 25 Sep 1991 16:39:06 * * Version: 1.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro dotplot( data=_LAST_, /* input data set */ xvar=, /* horizontal variable (response) */ xorder=, /* plotting range of response */ xref=, /* reference lines for response variable */ yvar=, /* vertical variable (observation label) */ ysortby=&xvar, /* how to sort observations */ ylabel=, /* label for y variable */ group=, /* vertical grouping variable */ gpfmt=, /* format for printing group variable */ /* value (include the . at the end) */ connect=DOT, /* draw lines to ZERO, DOT, AXIS, or NONE */ dline=2, /* style of horizontal lines */ dcolor=BLACK, /* color of horizontal lines */ errbar=, /* variable giving length of error bar */ /* for each observation */ name=DOTPLOT); /* Name for graphic catalog entry */ %if &yvar= %str() %then %do; %put DOTPLOT: Must specify y variable; %goto ENDDOT; %end; %let connect=%upcase(&connect); %if &ylabel = %str() %then %let ylabel=%upcase(&yvar); %global nobs vref; /*--------------------------------------------------* | Sort observations in the desired order on Y axis | *--------------------------------------------------*/ %if &group ^= %str() OR &ysortby ^= %str() %then %do; proc sort data=&data; by &group &ysortby; %end; /*-----------------------------------------------------* | Add Sort_Key variable and construct macro variables | *-----------------------------------------------------*/ data _dot_dat; set &data; %if &group = %str() %then %do; %let group= _GROUP_; _group_ = 1; %end; run; data _dot_dat; set _dot_dat end=eof; retain vref ; drop vref; length vref $60; by &group; sort_key + 1; call symput( 'val' || left(put( sort_key, 3. )), trim(&yvar) ); output; /* output here so sort_key is in sync */ if _n_=1 then vref=''; if last.&group & ^eof then do; sort_key+1; vref = trim(vref) || put(sort_key, 5.); call symput('val'|| left(put(sort_key, 3.)), ' ' ); end; if eof then do; call symput('nobs', put(sort_key, 4.)); call symput('vref', trim(vref)); end; run; %if &nobs=0 %then %do; %put DOTPLOT: Data set &data has no observations; %goto ENDDOT; %end; %makefmt(&nobs); /*---------------------------------------------------* | Annotate data set to draw horizontal dotted lines | *---------------------------------------------------*/ data _dots_; set _dot_dat; by &group; length function $ 8 text $ 20; text = ' '; %if &connect = ZERO %then %str(xsys = '2';) ; %else %str(xsys = '1';) ; ysys = '2'; line = &dline; color = "&dcolor"; y = sort_key; x = 0; function ='MOVE'; output; function ='DRAW'; %if &connect = DOT | &connect = ZERO %then %do; xsys = '2'; x = &xvar; output; %end; %else %if &connect = AXIS %then %do; function='POINT'; do x = 0 to 100 by 2; output; end; %end; %if &group ^= _GROUP_ %then %do; if first.&group then do; xsys = '1'; x = 98; size=1.5; function = 'LABEL'; color='BLACK'; position = 'A'; %if &gpfmt ^= %str() %then %str(text = put(&group, &gpfmt ) ;) ; %else %str(text = &group ;) ; output; end; %end; %if &errbar ^= %str() %then %do; data _err_; set _dot_dat; xsys = '2'; ysys = '2'; y = sort_key; x = &xvar - &errbar ; function = 'MOVE '; output; text = '|'; function = 'LABEL'; output; x = &xvar + &errbar ; function = 'DRAW '; output; function = 'LABEL'; output; data _dots_; set _dots_ _err_; %end; /*-----------------------------------------------* | Draw the dot plot, plotting formatted Y vs. X | *-----------------------------------------------*/ proc gplot data= _dot_dat ; plot sort_key * &xvar /vaxis=axis1 vminor=0 haxis=axis2 frame name="&name" %if &vref ^= %str() %then vref=&vref ; %if &xref ^= %str() %then href=&xref lhref=21 chref=red ; annotate=_dots_; label sort_key="&ylabel"; format sort_key _yname_.; symbol1 v='-' h=1.4 c=black; axis1 order=(1 to &nobs by 1) label=(f=duplex) major=none value=(j=r f=simplex); axis2 %if %length(&xorder)>0 %then order=(&xorder) ; label=(f=duplex) offset=(1); run; %enddot: %mend dotplot; /*-----------------------------------------* | Macro to generate a format of the form | | 1 ="&val1" 2="&val2" ... | | for observation labels on the y axis. | *-----------------------------------------*/ %macro makefmt(nval); %if &sysver < 6 & "&sysscp"="CMS" %then %do; x set cmstype ht; /* For SAS 5.18 on CMS, must */ x erase _yname_ text *; /* erase format so that dotplot */ x set cmstype rt; /* can be used more than once */ %end; /* in a single SAS session */ %local i ; proc format; value _yname_ %do i=1 %to &nval ; &i = "&&val&i" %end; ; %mend makefmt; /*-------------------------------------------------------------------* * LOWESS SAS Locally weighted robust scatterplot smoothing * * * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 21 Apr 1989 09:21:55 * * Revised: 11 Jun 1991 12:10:16 * * Version: 1.3 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro LOWESS( data=_LAST_, /* name of input data set */ out=SMOOTH, /* name of output data set */ x = X, /* name of independent variable */ y = Y, /* name of Y variable to be smoothed */ id=, /* optional row ID variable */ f = .50, /* lowess window width */ iter=2, /* total number of iterations */ plot=NO, /* draw the plot? */ name=LOWESS); /* name for graphic catalog entry */ proc sort data=&data; by &x; proc iml; start WLS( X, Y, W, B, I ); *-- Weighted least squares; x = j(nrow(x), 1, 1) || x; xpx = x` * diag( w ) * x; xpy = x` * diag( w ) * y; if abs(det(xpx)) > .00001 then b = inv(xpx) * xpy; else do; b = (y[loc(w^=0)])[:] // { 0 } ; print 'Singular matrix for observation', I; end; finish; start MEDIAN( W, M); * calculate median ; n = nrow( W ); R = rank( W ); i = int((n+1)/2); i = i || n-i+1; M = W[ R[i] ]; M = .5 # M[+]; finish; start ROBUST( R, WTS); * calculate robustness weights; run median(abs(R), M); W = R / (6 # M); * bisquare function; WTS = (abs(W) < 1) # (1 - W##2) ## 2; finish; start LOWESS( X, Y, F, STEPS, YHAT, RES, DELTA); n = nrow(X); if n < 2 then do; yhat = y; return; end; q = round( f * n); * # nearest neighbors; res = y; yhat = J(n,1,0); delta= J(n,1,1); * robustness weights; if steps <= 0 then steps=1; do it = 1 to steps; do i = 1 to n; dist = abs( x - x[i] ); * distance to each other pt; r = rank( dist ); s = r; s[r]=1:n; near = s[1:q] ; * find the q nearest; nx = x [ near ]; ny = y [ near ]; d = dist[ near[q] ]; * distance to q-th nearest; if d > 0 then do; u = abs( nx - x[i] ) / d ; wts = (u < 1) # (1 - u##3) ## 3; * neighborhood wts; wts = delta[ near ] # wts; if sum(wts[2:q]) > .0001 then do; run wls( nx, ny, wts, b, i ); yhat[i] = (1 || x[i]) * b; * smoothed value; end; else yhat[i] = y[i]; end; else do; yhat[i] = ny [+] /q; end; end; res = y - yhat; run robust(res,delta); end; finish; *-- Main routine; use &data; %if &id.NULL=NULL %then %let rowid=; %else %let rowid=rowname=&id; read all var{&x &y} into xy[ colname=vars &rowid ]; close &data; x = xy[,1]; y = xy[,2]; run lowess(x, y, &f, &iter, yhat, res, weight); xyres =x || y || yhat || res || weight; cname = vars || {"_YHAT_" "_RESID_" "_WEIGHT_" }; print "Data, smoothed fit, residuals and weights", xyres[ colname=cname &rowid ]; *-- Output results to data set &out ; xys = yhat || res || weight; cname = {"_YHAT_" "_RESID_" "_WEIGHT_" }; create &out from xys [ colname=cname ]; append from xys; quit; /*--------------------------------------------* | Merge data with smoothed results. | | (In a data step to retain variable labels) | *--------------------------------------------*/ data &out; merge &data(keep=&x &y &id) &out ; label _yhat_ = "Smoothed &y" _weight_='Lowess weight'; %if %upcase(&PLOT)=YES %then %do; proc gplot data=&out ; plot &y * &x = 1 _yhat_ * &x = 2 / overlay frame vaxis=axis1 haxis=axis2 name="&name" ; symbol1 v=+ h=1.5 i=none c=black; symbol2 v=none i=join c=red; axis1 label=(h=1.5 f=duplex a=90 r=0) value=(h=1.3); axis2 label=(h=1.5 f=duplex) value=(h=1.3); run; %end; %mend LOWESS; /*-------------------------------------------------------------------* * LOWESS2 SAS Locally weighted robust scatterplot smoothing * * **** USE WITH VERSION 6 ONLY **** * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 21 Apr 1989 09:21:55 * * Revised: 11 Jun 1991 12:10:16 * * Version: 1.3 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro LOWESS( data=_LAST_, /* name of input data set */ out=SMOOTH, /* name of output data set */ x = X, /* name of independent variable */ y = Y, /* name of Y variable to be smoothed */ id=, /* optional row ID variable */ f = .50, /* lowess window width */ iter=2, /* total number of iterations */ plot=NO, /* draw the plot? */ name=LOWESS); /* name for graphic catalog entry */ proc sort data=&data; by &x; proc iml; start WLS( X, Y, W, B, I ); *-- Weighted least squares; x = j(nrow(x), 1, 1) || x; xpx = t(x) * diag( w ) * x; xpy = t(x) * diag( w ) * y; if abs(det(xpx)) > .00001 then b = inv(xpx) * xpy; else do; b = (y[loc(w^=0)])[:] // { 0 } ; print 'Singular matrix for observation', I; end; finish; start MEDIAN( W, M); * calculate median ; n = nrow( W ); R = rank( W ); i = int((n+1)/2); i = i || n-i+1; M = W[ R[i] ]; M = .5 # M[+]; finish; start ROBUST( R, WTS); * calculate robustness weights; run median(abs(R), M); W = R / (6 # M); * bisquare function; WTS = (abs(W) < 1) # (1 - W##2) ## 2; finish; start LOWESS( X, Y, F, STEPS, YHAT, RES, DELTA); n = nrow(X); if n < 2 then do; yhat = y; return; end; q = round( f * n); * # nearest neighbors; res = y; yhat = J(n,1,0); delta= J(n,1,1); * robustness weights; if steps <= 0 then steps=1; do it = 1 to steps; do i = 1 to n; dist = abs( x - x[i] ); * distance to each other pt; r = rank( dist ); s = r; s[r]=1:n; near = s[1:q] ; * find the q nearest; nx = x [ near ]; ny = y [ near ]; d = dist[ near[q] ]; * distance to q-th nearest; if d > 0 then do; u = abs( nx - x[i] ) / d ; wts = (u < 1) # (1 - u##3) ## 3; * neighborhood wts; wts = delta[ near ] # wts; if sum(wts[2:q]) > .0001 then do; run wls( nx, ny, wts, b, i ); yhat[i] = (1 || x[i]) * b; * smoothed value; end; else yhat[i] = y[i]; end; else do; yhat[i] = ny [+] /q; end; end; res = y - yhat; run robust(res,delta); end; finish; *-- Main routine; use &data; %if &id.NULL=NULL %then %let rowid=; %else %let rowid=rowname=&id; read all var{&x &y} into xy[ colname=vars &rowid ]; close &data; x = xy[,1]; y = xy[,2]; run lowess(x, y, &f, &iter, yhat, res, weight); xyres =x || y || yhat || res || weight; cname = vars || {"_YHAT_" "_RESID_" "_WEIGHT_" }; print "Data, smoothed fit, residuals and weights", xyres[ colname=cname &rowid ]; *-- Output results to data set &out ; xys = yhat || res || weight; cname = {"_YHAT_" "_RESID_" "_WEIGHT_" }; create &out from xys [ colname=cname ]; append from xys; quit; /*--------------------------------------------* | Merge data with smoothed results. | | (In a data step to retain variable labels) | *--------------------------------------------*/ data &out; merge &data(keep=&x &y &id) &out ; label _yhat_ = "Smoothed &y" _weight_='Lowess weight'; %if %upcase(&PLOT)=YES %then %do; proc gplot data=&out ; plot &y * &x = 1 _yhat_ * &x = 2 / overlay frame vaxis=axis1 haxis=axis2 name="&name" ; symbol1 v=+ h=1.5 i=none c=black; symbol2 v=none i=join c=red; axis1 label=(h=1.5 f=duplex a=90 r=0) value=(h=1.3); axis2 label=(h=1.5 f=duplex) value=(h=1.3); run; %end; %mend LOWESS; /*-------------------------------------------------------------------* * NQPLOT SAS SAS macro for normal quantile-comparison plot * * * * minimal syntax: %nqplot (data=dataset,var=variable); * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 16 Feb 1989 21:43:25 * * Revised: 11 Jun 1991 12:12:55 * * Version: 1.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro nqplot ( data=_LAST_, /* input data set */ var=x, /* variable to be plotted */ out=nqplot, /* output data set */ mu=MEDIAN, /* est of mean of normal distribution: */ /* MEAN, MEDIAN or literal value */ sigma=HSPR, /* est of std deviation of normal: */ /* STD, HSPR, or literal value */ stderr=YES, /* plot std errors around curves? */ detrend=YES, /* plot detrended version? */ lh=1.5, /* height for axis labels */ anno=, /* name of input annotate data set */ name=NQPLOT, /* name of graphic catalog entries */ gout=); /* name of graphic catalog */ %let stderr=%UPCASE(&stderr); %let sigma=%UPCASE(&sigma); %let detrend=%UPCASE(&detrend); %if &sigma=HSPR %then %let sigma=HSPR/1.349; %if &anno^=%str() %then %let anno=ANNOTATE=&anno; %if &gout^=%str() %then %let gout=GOUT=&gout; data pass; set &data; _match_=1; if &var ne . ; * get rid of missing data; proc univariate noprint; * find n, median and hinge-spread; var &var; output out=n1 n=nobs median=median qrange=hspr mean=mean std=std; data n2; set n1; _match_=1; data nqplot; merge pass n2; drop _match_; by _match_; proc sort data=nqplot; by &var; run; data &out; set nqplot; drop sigma hspr nobs median std mean ; sigma = σ _p_=(_n_ - .5)/nobs; * cumulative prob.; _z_=probit(_p_); * unit-normal Quantile; _se_=(sigma/((1/sqrt(2*3.1415926))*exp(-(_z_**2)/2))) *sqrt(_p_*(1-_p_)/nobs); * std. error for normal quantile; _normal_= sigma * _z_ + &mu ; * corresponding normal quantile; _resid_ = &var - _normal_; * deviation from normal; _lower_ = _normal_ - 2*_se_; * +/- 2 SEs around fitted line; _upper_ = _normal_ + 2*_se_; _reslo_ = -2*_se_; * +/- 2 SEs ; _reshi_ = 2*_se_; label _z_='Normal Quantile' _resid_='Deviation From Normal'; run; /*- proc plot; plot &var * _z_='*' _normal_ * _z_='-' _lower_ * _z_='+' _upper_ * _z_='+' / overlay; * observed and fitted values; plot _resid_ * _z_='*' _reslo_ * _z_='+' _reshi_ * _z_='+' / overlay vref=0; * deviation from fitted line; run; -*/ proc gplot data=&out &anno &gout ; plot &var * _z_= 1 _normal_ * _z_= 2 %if &stderr=YES %then %do; _lower_ * _z_= 3 _upper_ * _z_= 3 %end; / overlay frame vaxis=axis1 haxis=axis2 hminor=1 vminor=1 name="&name" ; %if &detrend=YES %then %do; plot _resid_ * _z_= 1 %if &stderr=YES %then %do; _reslo_ * _z_= 3 _reshi_ * _z_= 3 %end; / overlay vaxis=axis1 haxis=axis2 vref=0 frame hminor=1 vminor=1 name="&name" ; %end; %let vh=1; *-- value height; %if &lh >= 1.5 %then %let vh=1.5; %if &lh >= 2.0 %then %let vh=1.8; symbol1 v=+ h=1.1 i=none c=black l=1; symbol2 v=none i=join c=blue l=3 w=2; symbol3 v=none i=join c=green l=20; axis1 label=(f=duplex a=90 r=0 h=&lh) value=(h=&vh); axis2 label=(f=duplex h=&lh) value=(h=&vh); run; %mend; /*-------------------------------------------------------------------* * OUTLIER SAS Robust multivariate outlier detection * * * * Macro to calculate robust Mahalanobis distances for each * * observation in a dataset. The results are robust in that * * potential outliers do not contribute to the distance of any * * other observations. * * * * The macro makes one or more passes through the data. Each * * pass assigns 0 weight to observations whose DSQ value * * has prob < PVALUE. The number of passes should be determined * * empirically so that no new observations are trimmed on the * * last pass. * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 16 Jan 1989 18:38:18 * * Revised: 11 Jun 1991 12:16:31 * * Version: 1.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro OUTLIER( data=_LAST_, /* Data set to analyze */ var=_NUMERIC_, /* input variables */ id=, /* ID variable for observations */ out=CHIPLOT, /* Output dataset for plotting */ pvalue=.1, /* Prob < pvalue -> weight=0 */ passes=2, /* Number of passes */ print=YES); /* Print OUT= data set? */ /*-------------------------------------------------------* | Add WEIGHT variable. Determine number of observations | | and variables, and create macro variables. | *-------------------------------------------------------*/ data in; set &data end=lastobs; array invar{*} &var; _weight_ = 1; /* Add weight variable */ if ( lastobs ) then do; call symput('NOBS', _n_); call symput('NVAR', left(put(dim(invar),3.)) ); end; %do pass = 1 %to &PASSES; %if &pass=1 %then %let in=in; %else %let in=trimmed; /*--------------------------------------------------------------* | Transform variables to scores on principal components. | | Observations with _WEIGHT_=0 are not used in the calculation,| | but get component scores based on the remaining observations.| *--------------------------------------------------------------*/ proc princomp std noprint data=&in out=prin; var &var; freq _weight_; /*-----------------------------------------------------------* | Calculate Mahalanobis D**2 and its probability value. For | | standardized principal components, D**2 is just the sum | | of squares. Output potential outliers to separate dataset.| *-----------------------------------------------------------*/ data out1 (keep=pass case &id dsq prob) trimmed (drop=pass case ); set prin ; pass = &pass; case = _n_; dsq = uss(of prin1-prin&nvar); /* Mahalanobis D**2 */ prob = 1 - probchi(dsq, &nvar); _weight_ = (prob > &pvalue); output trimmed; if _weight_ = 0 then do; output out1 ; end; run; proc append base=outlier data=out1; %end; proc print data=outlier; title2 'Observations trimmed in calculating Mahalanobis distance'; /*------------------------------------------* | Prepare for Chi-Square probability plot. | *------------------------------------------*/ proc sort data=trimmed; by dsq; data &out; set trimmed; drop prin1 - prin&nvar; _weight_ = prob > &pvalue; expected = 2 * gaminv(_n_/(&nobs+1), (&nvar/2)); %if &print=yes %then %do; proc print data=&out; %if &id ^=%str() %then %str(id &id;); title2 'Possible multivariate outliers have _WEIGHT_=0'; %end; %if &ID = %str() %then %let SYMBOL='*'; %else %let SYMBOL=&ID; proc plot data=&out; plot dsq * expected = &symbol expected * expected = '.' /overlay hzero vzero; title2 'Chi-Squared probability plot for multivariate outliers'; run; %done: proc datasets nofs nolist; delete outlier out1; %mend outlier; /*-------------------------------------------------------------------* * PARTIAL SAS - IML macro program for partial regression residual * * plots. (Version 5 only) * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 23 Jan 1989 16:11:15 * * Revised: 9 Jul 1991 14:37:40 * * Version: 1.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro PARTIAL( data = _LAST_, /* name of input data set */ yvar =, /* name of dependent variable */ xvar =, /* list of independent variables */ id =, /* ID variable */ label=INFL, /* label ALL, NONE, or INFLuential obs */ out =, /* output data set: partial residuals */ gout=gseg, /* name of graphic catalog */ name=PARTIAL); /* name of graphic catalog entries */ %let label = %UPCASE(&label); Proc IML; start axes (xa, ya, origin, len, labels); col= 'BLACK'; ox = origin[1,1]; oy = origin[1,2]; call gxaxis(origin,len,xa,7) color=col format='7.1'; call gyaxis(origin,len,ya,7) color=col format='7.1'; xo = ox + len/2 - length(labels[1])/2; yo = oy - 8; call gscript(xo,yo,labels[1]) color=col; xo = ox - 12; yo = oy + len; if nrow(labels)>1 | ncol(labels)>1 then call gscript(xo,yo,labels[2]) angle=270 rotate=90 color=col; finish; *-----Find partial residuals for each variable-----; start partial(x, y, names, obs, uv, uvname ); k = ncol(x); n = nrow(x); yname = names[,k+1]; k1= k + 1; *-- number of predictors; x = j( n , 1 , 1) || x; *-- add column of 1s; name1 = { 'INTCEPT'}; names = name1 || names[,1:k]; *----- module to fit one regression ----------; start reg (y, x, b, res, yhat, h, rstudent); n = nrow(x); p = ncol(x); xpx = x` * x; xpy = x` * y; xpxi= inv(xpx); b = xpxi * xpy; yhat= x * b; res = y - yhat; h = vecdiag(x * xpxi * x`); sse = ssq(res); sisq= j(n,1,sse) - (res##2) / (1-h); sisq= sisq / (n-p-1); rstudent = res / sqrt( sisq # (1-h) ); finish; run reg( y, x, b, res, yhat, hat, rstudent ); print "Full regression"; print "Regression weights" , b[ rowname=names ]; lev = hat > 2*k1/n; if any( lev ) then do; l = loc(lev)`; xl= x[l ,]; Print "High leverage points", L XL [colname=names ]; end; flag = lev | (rstudent > 2); do i = 1 to k1; name = names[,i]; reset noname; free others; do j = 1 to k1; if j ^=i then others = others || j; end; run reg( y, x[, others], by, ry, fy, hy, sry ); run reg( x[,i],x[, others], bx, rx, fx, hx, srx ); uv = uv || ry ||rx; uvname = uvname || concat({'U'},name) || concat({'V'},name); if i>1 then do; /**--------------------------------** | Start IML graphics | **--------------------------------**/ %if &sysver < 6 %then %do; %let lib=%scan(&gout,1,'.'); %let cat=%scan(&gout,2,'.'); %if &cat=%str() %then %do; %let cat=&lib; %let lib=work; %end; call gstart gout={&lib &cat} name="&name" descript="Partial regression plot for &data"; %end; %else %do; /* Version 6 */ call gstart("&gout"); call gopen("&name",1,"Partial regression plot for &data"); %end; labels = concat( {"Partial "}, name ) || concat( {"Partial "}, yname ) ; run axes(rx,ry,{15 15}, 75, labels) ; call gpoint(rx,ry) color = 'BLACK'; *-- Draw regression line from slope; xs = rx[<>] // rx[><]; ys = b[i] * xs; call gdraw(xs, ys, 3, 'BLUE'); *-- Mark influential points and large residuals; %if &label ^= NONE %then %do; outy = ry[ loc(flag) ]; outx = rx[ loc(flag) ]; outl = obs[ loc(flag) ]; call gpoint(outx, outy,19) color ='RED'; call gtext(outx,outy,outl) color ='RED'; %end; %if &label = ALL %then %do; outy = ry[ loc(^flag) ]; outx = rx[ loc(^flag) ]; outl = obs[ loc(^flag) ]; call gtext(outx,outy,outl) color ='BLACK'; %end; call gshow; end; end; print "Partial Residuals", uv[ colname=uvname]; finish; /* end of partial */ *-----read the data and prepare partial regression plots----; use &data; %if &id ^= %str() %then %do; read all var{&xvar} into x[ rowname=&id colname=xname ]; %end; %else %do; read all var{&xvar} into x[ colname=xname ]; %let id = obs; obs = char(1:nrow(x),3,0); %end; read all var{&yvar } into y[ colname=yname ]; names = xname || yname; run partial(x, y, names, &id, uv, uvname); %if &out ^= %str() %then %do; create &out from uv; append from uv; %end; quit; %mend PARTIAL; /*-------------------------------------------------------------------* * PARTIAL2 SAS- IML macro program for partial regression residual * * plots. * * ***** USE IN VERSION 6 ONLY ***** * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 23 Jan 1989 16:11:15 * * Revised: 9 Jul 1991 14:37:40 * * Version: 1.0 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * * * *-------------------------------------------------------------------*/ %macro PARTIAL( data = _LAST_, /* name of input data set */ yvar =, /* name of dependent variable */ xvar =, /* list of independent variables */ id =, /* ID variable */ label=INFL, /* label ALL, NONE, or INFLuential obs */ out =, /* output data set: partial residuals */ gout=gseg, /* name of graphic catalog */ name=PARTIAL); /* name of graphic catalog entries */ %let label = %UPCASE(&label); Proc IML worksize=100; start axes (origin, len, labels, xa, ya); col= 'BLACK'; ox = origin[1,1]; oy = origin[1,2]; /*---obtain min and max---*/ xmin=min(xa); xmax=max(xa); ymin=min(ya); ymax=max(ya); /*---define viewport and window for the application---*/ p=(origin[1]||origin[2])//((origin[1]+len)||(origin[2]+len)); call gport(p); w=(xmin||ymin)//(xmax||ymax); call gwindow(w); call gxaxis(xmin||ymin,xmax-xmin,10,,,'7.1',,,col,'n'); call gyaxis(xmin||ymin,ymax-ymin,8,,,'7.1',,,col,'n'); /*---restore to default window and viewport for axes labels---*/ call gport({0,0,100,100}); call gwindow({0,0,100,100}); xo = ox + len/2 - length(labels[1])/2; yo = oy - 8; call gscript(25,95,'Partial regression residuals', ,,1.5,'complex','BLACK'); call gscript(xo,yo,labels[1],,,,,col); xo = ox - 5; yo = oy + len; if nrow(labels)>1 | ncol(labels)>1 then call gscript(xo-5,yo,labels[2],-90,90,,,col); p=(origin[1]||origin[2])//((origin[1]+len)||(origin[2]+len)); call gport(p); w=(xmin||ymin)//(xmax||ymax); call gwindow(w); finish; *-----Find partial residuals for each variable-----; start partial(x, y, names, obs, uv, uvname ); k = ncol(x); n = nrow(x); yname = names[,k+1]; k1= k + 1; *-- number of predictors; x = j( n , 1 , 1) || x; *-- add column of 1s; name1 = { 'INTCEPT'}; names = name1 || names[,1:k]; *----- module to fit one regression ----------; start reg (y, x, b, res, yhat, h, rstudent); n = nrow(x); p = ncol(x); xpx = t(x) * x; xpy = t(x) * y; xpxi= inv(xpx); b = xpxi * xpy; yhat= x * b; res = y - yhat; h = vecdiag(x * xpxi * x`); sse = ssq(res); sisq= j(n,1,sse) - (res##2) / (1-h); sisq= sisq / (n-p-1); rstudent = res / sqrt( sisq # (1-h) ); finish; run reg( y, x, b, res, yhat, hat, rstudent ); print "Full regression"; print "Regression weights" , b[ rowname=names ]; lev = hat > 2*k1/n; if any( lev ) then do; l = loc(lev)`; xl= x[l ,]; Print "High leverage points", L XL [colname=names ]; end; flag = lev | (rstudent > 2); do i = 1 to k1; name = names[,i]; reset noname; free others; do j = 1 to k1; if j ^=i then others = others || j; end; run reg( y, x[, others], by, ry, fy, hy, sry ); run reg( x[,i],x[, others], bx, rx, fx, hx, srx ); uv = uv || ry ||rx; uvname = uvname || concat({'U'},name) || concat({'V'},name); if i>1 then do; /**--------------------------------** | Start IML graphics | **--------------------------------**/ %if &sysver < 6 %then %do; %let lib=%scan(&gout,1,'.'); %let cat=%scan(&gout,2,'.'); %if &cat=%str() %then %do; %let cat=&lib; %let lib=work; %end; call gstart gout={&lib &cat} name="&name" descript="Partial regression plot for &data"; %end; %else %do; /* Version 6 */ call gstart("&gout"); call gopen("&name",1,"Partial regression plot for &data"); %end; labels = concat( {"Partial "}, name ) || concat( {"Partial "}, yname ) ; run axes({10 10}, 75, labels, rx, ry) ; call gpoint(rx,ry,,'BLACK'); *-- Draw regression line from slope; xs = rx[<>] // rx[><]; ys = b[i] * xs; call gdraw(xs, ys, 3, 'BLUE'); *-- Mark influential points and large residuals; %if &label ^= NONE %then %do; outy = ry[ loc(flag) ]; outx = rx[ loc(flag) ]; outl = obs[ loc(flag) ]; call gpoint(outx, outy,,'RED'); call gtext(outx,outy,outl,'RED'); %end; %if &label = ALL %then %do; outy = ry[ loc(^flag) ]; outx = rx[ loc(^flag) ]; outl = obs[ loc(^flag) ]; call gtext(outx,outy,outl,'BLACK'); %end; call gshow; end; end; print "Partial Residuals", uv[ colname=uvname]; finish; /* end of partial */ *-----read the data and prepare partial regression plots----; use &data; %if &id ^= %str() %then %do; read all var{&xvar} into x[ rowname=&id colname=xname ]; %end; %else %do; read all var{&xvar} into x[ colname=xname ]; %let id = obs; obs = char(1:nrow(x),3,0); %end; read all var{&yvar } into y[ colname=yname ]; names = xname || yname; run partial(x, y, names, &id, uv, uvname); %if &out ^= %str() %then %do; create &out from uv; append from uv; %end; quit; %mend PARTIAL; /*-------------------------------------------------------------------* * SCATMAT SAS Construct a scatterplot matrix - all pairwise * * plots for n variables. * * * * %scatmat(data=, var=, group=); * *-------------------------------------------------------------------* * Author: Michael Friendly * * Created: 4 Oct 1989 11:07:50 * * Revised: 24 May 1991 17:33:41 * * Version: 1.1 * * From ``SAS System for Statistical Graphics, First Edition'' * * Copyright(c) 1991 by SAS Institute Inc., Cary, NC, USA * *-------------------------------------------------------------------*/ %macro SCATMAT( data =_LAST_, /* data set to be plotted */ var =, /* variables to be plotted */ group=, /* grouping variable (plot symbol) */ symbols=%str(- + : $ = X _ Y), colors=BLACK RED GREEN BLUE BROWN YELLOW ORANGE PURPLE, gout=GSEG); /* graphic catalog for plot matrix */ options nonotes dquote; proc greplay igout=gseg nofs; delete _all_; quit; *-- Parse variables list; %let var = %upcase(&var); data _null_; set &data (obs=1); %if %index(&var,-) > 0 or "&var"="_NUMERIC_" %then %do; * find the number of variables in the list and convert shorthand variable list to long form; length _vname_ $ 8 _vlist_ $ 200; array _xx_ &var; _vname_ = ' '; do over _xx_; call vname(_xx_,_vname_); if _vname_ ne "&group" then do; nvar + 1; if nvar = 1 then startpt = 1; else startpt = length(_vlist_) + 2; endpt = length(_vname_); substr(_vlist_,startpt,endpt) = _vname_; end; end; call symput( 'VAR', _vlist_ ); %end; %else %do; * find the number of variables in the list; nvar = n(of &var) + nmiss(of &var); %end; call symput('NVAR',trim(left(put(nvar,2.)))); RUN; %if &nvar < 2 or &nvar > 10 %then %do; %put Cannot do a scatterplot matrix for &nvar variables ; %goto DONE; %end; /*----------------------------------------------------* | Determine grouping variable and plotting symbol(s) | *----------------------------------------------------*/ %if &group = %str() %then %do; %let NGROUPS=1; %let plotsym=1; /* SYMBOL for data panels */ %let plotnam=2; /* for variable name panel */ %end; %else %do; %let plotsym=&group; *-- How many levels of group variable? --; proc freq data = &data; tables &group / noprint out=_DATA_; data _null_; set end=eof; ngroups+1; if eof then do; call symput( 'NGROUPS', put(ngroups,3.) ); end; run; %let plotnam=%eval(&ngroups+1); %end; %gensym(n=&ngroups, ht=&nvar, symbols=&symbols, colors=&colors); goptions NODISPLAY; * device dependent; title h=0.1 ' '; %let plotnum=0; * number of plots made; %let replay = ; * replay list; %do i = 1 %to &nvar; /* rows */ %let vi = %scan(&var , &i ); proc means noprint data=&data; var &vi; output out=minmax min=min max=max; %do j = 1 %to &nvar; /* cols */ %let vj = %scan(&var , &j ); %let plotnum = %eval(&plotnum+1); %let replay = &replay &plotnum:&plotnum ; %*put plotting &vi vs. &vj ; %if &i = &j %then %do; /* diagonal panel */ data title; length text $8; set minmax; xsys = '1'; ysys = '1'; x = 50; y = 50; text = "&vi"; size = 2 * &nvar; function = 'LABEL'; output; x = 6; y = 6; position = '6'; text = left(put(min, best6.)); size = &nvar; output; x = 95; y = 95; position = '4'; text = trim(put(max, best6.)); size = &nvar; output; proc gplot data = &data; plot &vi * &vi = &plotnam / frame anno=title vaxis=axis1 haxis=axis1; axis1 label=none value=none major=(h=-%eval(&nvar-1)) minor=none offset=(2); run; %end; %else %do; /* off-diagonal panel */ proc gplot data = &data; plot &vi * &vj = &plotsym / frame nolegend vaxis=axis1 haxis=axis1; axis1 label=none value=none major=none minor=none offset=(2); run; %end; %end; /* cols */ %end; /* rows */ goptions DISPLAY; * device dependent; %macro TDEF(nv, size, shift ); %* ---------------------------------------------------------------; %* Generate a TDEF statement for a scatterplot matrix ; %* Start with (1,1) panel in upper left, and copy it across & down; %* ---------------------------------------------------------------; %local i j panl panl1 lx ly; TDEF scat&nv DES="scatterplot matrix &nv x &nv" %let panl=0; %let lx = &size; %let ly = %eval(100-&size); %do i = 1 %to &nv; %do j = 1 %to &nv; %let panl = %eval(&panl + 1); %if &j=1 %then %do; %if &i=1 %then %do; %* (1,1) panel; &panl/ ULX=0 ULY=100 URX=&lx URY=100 LLX=0 LLY=&ly LRX=&lx LRY=&ly %end; %else %do; %* (i,1) panel; %let panl1 = %eval(&panl - &nv ); &panl/ copy= &panl1 xlatey= -&shift %end; %end; %else %do; %let panl1 = %eval(&panl - 1); &panl/ copy= &panl1 xlatex= &shift %end; %end; %end; %str(;); %* end the TDEF statement; %mend TDEF; proc greplay igout=gseg gout=&gout nofs template=scat&nvar tc=templt ; %if &nvar = 2 %then %do; TDEF scat2 DES="scatterplot matrix 2x2" 1/ ULX=0 ULY=100 URX=52 URY=100 LLX=0 LLY=52 LRX=52 LRY=52 2/ copy=1 XLATEX= 48 /* Panels are numbered: */ 3/ copy=1 XLATEY=-48 /* 1 2 */ 4/ copy=3 XLATEX= 48; /* 3 4 */ %end; %if &nvar = 3 %then %TDEF(&nvar,34,33); %if &nvar = 4 %then %TDEF(&nvar,25,25); %if &nvar = 5 %then %TDEF(&nvar,20,20); %if &nvar = 6 %then %TDEF(&nvar,17,16); %if &nvar = 7 %then %TDEF(&nvar,15,14); %if &nvar = 8 %then %TDEF(&nvar,13,12); %if &nvar = 9 %then %TDEF(&nvar,12,11); %if &nvar =10 %then %TDEF(&nvar,10,10); TREPLAY &replay; %DONE: options notes; %mend SCATMAT; /*----------------------------------------------------* | Macro to generate SYMBOL statement for each GROUP | *----------------------------------------------------*/ %macro gensym(n=1, ht=1.5, symbols=%str(- + : $ = X _ Y), colors=BLACK RED GREEN BLUE BROWN YELLOW ORANGE PURPLE); %*-- note: only 8 symbols & colors are defined; %*-- revise if more than 8 groups (recycle); %local chr col k; %do k=1 %to &n ; %let chr =%scan(&symbols, &k,' '); %let col =%scan(&colors,