ISUW, version JUN2009

========================================================================
This is ISUWHELP.TXT, the complete ISUW "on-line" manual.
========================================================================

--------------------------------------------------------------------------------
HELP ON HELP

F1 (from the command field or the program editor) displays the help pages.
F1 once more (from the help pages) displays a list of ISU commands. Select
help on a specific command with the cursor up/down arrows and Return.
Ctrl-F1 from the command field displays help on the present command.
Shift-F1 disables/enables "hint mode", where hints on active keys are
displayed in a small pop-up window at the mouse cursor.

In the help pages, search for a word or phrase by Ctrl-F. Repeat the search
by Return or Ctrl-L. The search runs forwards from the cursor's position,
ignores case, interprets a new line as a blank and ignores multiple blanks.

If you want a hardcopy of the "on-line manual", print all pages by Ctrl-P
(88 pages of 72 lines). But wait: it is much easier to search for commands
and phrases "on-line" and later, perhaps, print out selected sections
(select by Shift and the cursor arrows, then press Ctrl-P). We suggest that
you start softly with a printout of the article "Introduction to ISUW".

--------------------------------------------------------------------------------
HOW TO RUN ISUW INTERACTIVELY - a brief summary.

Notes on installation and directory structure are given in a later section
approximately 18 screenfuls below (search for "install").

ISUW is essentially mouse free. You can use the mouse to resize or move the
ISUW window, and it works as usual when you are editing or selecting from a
menu. But a general principle behind the design of ISUW is that there should
not be more on the screen than necessary at any time.
The buttons that control ISUW are not placed on the screen; they can be
found on the keyboard, where they are a lot easier to hit.

No rule without exceptions, and a useful exception is this. When you place
the mouse cursor in a window on the screen, a small message with summary
hints (in particular concerning the keys you can use) will pop up. At some
point, this "hint mode" becomes more irritating than useful. Shift-F1 can be
used to switch it off and on.

In the entry dialog for selection of working directory, move around in the
directory tree with the cursor arrows. The Up/Down arrows work in the
obvious way, the Left/Right arrows shift between "show siblings" and "show
children" (and the space bar can be used for both). Press Return when the
desired working directory is highlighted. If you want to create a new
working directory, press Escape instead of Return, then edit the selected
directory name and press Return. Notice that the working directory must be a
subdirectory of the ISUW root directory (probably C:\ISUW). An easy way of
getting started is to select DEMOS as your working directory and run some of
the demonstration programs placed there. To select the same working
directory as last time - which is what you do most of the time - this dialog
can be skipped by Return.

In general, the Escape key is used when you leave a window or dialog box.
Sometimes, in particular in dialogs, you can also leave by Return. In this
case Return means "perform action" (if there is anything to perform) and
Escape means "leave without performing".

In interactive mode, commands are written line by line in the command field
at the bottom of ISUW's front page and executed when Return is pressed. In
addition to standard editing keys, the following keys are active.

Escape clears the command line.
Escape from an empty command line allows you to QUIT.
Ctrl-Return truncates the command line from the cursor's position.
Cursor Up/Down allows you to reuse (and re-edit) earlier commands.
Cursor Right/Left with Ctrl completes/replaces command names
lexicographically, when the cursor is in the first connected word starting
in position 1.
Cursor Right/Left with Ctrl completes/replaces vector names
lexicographically, when the cursor is in a connected word starting after
position 1.

In addition you can define your own shortkeys, see the description of the
KEYS command.

When a command is written from position 1 of the line, the command name is
completed automatically as soon as it is unique. Also, some standard
beginnings of commands are completed (I -> INCLUDE, LI -> LIST, OP -> OPEN,
SA -> SAVE, SK -> SKIP). This means that the keys you must use to write
INCLUDEALL are actually IA (here you can even use A), whereas to write
SKIPLINE you must use SKL. You will soon learn this; it is impossible to
write anything other than a command from position 1. You can also import a
command to the command window from the command list associated with the help
pages. Press F1 F1, select a command with Cursor Up/Down or the mouse, and
press BackSpace or LeftArrow. With a blank in position 1, the command line
is interpreted as a COMPUTE command.

The general syntax is

   commandname [parameter1 [parameter2 [...]]]

Thus, parameters are in general separated by blanks. In special cases (PLOT,
TABULATE, ... ) a "pseudo blank" |, which can be written with the ½ key in
the upper left corner of the keyboard, is used for subdivision of
parameters.

When a command produces output, this is shown in a (light green) preview
window. Leave this window by Return if you want the output appended to the
session's output file, Escape if you don't. Press Ctrl-P to print it out.

Use the command SHOW (without parameters) to look at the session's output
file. From the corresponding window you can also use Ctrl-P to print out the
whole output file or a selected (marked) portion of it.

It is possible, and often more convenient, to write an ISUW program (i.e.
a sequence of ISUW commands, each occupying a line) and execute it by a RUN
command, or directly from the program editor by pressing F9. See the command
descriptions for EDIT and RUN. In this case all output is written to the
output file without any previewing.

********************************************************************************
INTRODUCTION.

The following approximately 17 screenfuls give an overview of ISUW. It is
roughly identical to the material you can find in the article "Introduction
to ISUW", see my homepage ezlearn.cbs.dk/stat/hamat-2/tt/. To learn about
the possibilities, read this and simultaneously try to perform some simple
operations. You can leave the help file by Escape and return to the same
position by F1.

After this come the detailed descriptions of all commands. To find the
description of a command, write it in the command line and press Ctrl-F1. Or
just press Ctrl-F1 from an empty command line or press F1 twice to get a
menu for selection (by Return) of a command.

--------------------------------------------------------------------------------
A SIMPLE EXAMPLE.

Suppose we have an ASCII (plain text) file EX1.TXT of the form

   Dose   Response
   0.968  0
   0.909  1
   ...
   1.689  1
   0.524  0

consisting of a heading and 415 lines, each containing a value of a
covariate x and a binary response y. This could be data from an experiment
where 415 animals have been given a dose x of some drug, y being the binary
response, e.g. 1 for reaction, 0 for no reaction.

The following commands read this data set and fit a standard logit linear
model with the base 10 logarithm of x as the independent variable, then fit
the model with slope zero and perform the likelihood-ratio test for this
hypothesis ("no drug effect").

   VAR X Y 415                  { declares the variates to hold data }
   OPEN EX1.TXT                 { opens the data file for input }
   SKIPLINE                     { skips the heading line }
   READ X Y                     { reads the two variates in parallel }
   COMPUTE LOG10X=LN(X)/LN(10)  { computes the log10 transform of X }
   FITLOGIT Y=1+LOG10X          { fits the logistic regression model }
   LISTPARAMETERS               { lists parameter estimates }
   FITLOGIT Y=1                 { fits the reduced model }
   TEST                         { tests last against previous model }

--------------------------------------------------------------------------------
VARIATES AND FACTORS.

The basic structures in StatUnit are called vectors. A vector can be a
factor (an array of bytes, for storage of qualitative variables) or a
variate (an array of single precision real numbers, 7-8 significant digits).
Variates and factors are created by the commands VARIATE and FACTOR. In
addition to this, variates are often declared implicitly, for example by
COMPUTE commands, like LOG10X in the example above.

EXAMPLE. To declare two variates X and Y of length 100, simply write

   VAR X Y 100

Notice that we have written VAR, not VARIATE. Actually V would have been
enough here. In ISUW, any command can be truncated as long as it is
unambiguous. When written from the first position of the command window in
interactive mode, the command is simply completed as soon as it is
recognised.

A value of a variate can be missing (which internally means that it has the
value -1.0E-37). Missing values are recognised by most commands and treated
appropriately as such.

A factor has, apart from its length, a property called its number of levels,
an integer from 1 to 255 which specifies the maximal level allowed. This is
given as an additional parameter in the declaration.

EXAMPLE. To declare a factor SEX of length 100 on 2 levels, write

   FACTOR SEX 100 2

Names of vectors can be of length up to 8. The first character must be a
letter A..Z or an underbar _ ; the remaining characters can also be digits
0..9.
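
The basic naming rule just stated (at most 8 characters, first a letter or
underbar, then letters, underbars or digits) can be checked mechanically.
The following is a minimal Python sketch, not part of ISUW; the function
name is_plain_vector_name is hypothetical, and the check deliberately
ignores the special characters discussed next.

```python
import re

# Hypothetical helper, not an ISUW feature: checks only the basic rule
# (1-8 characters, first a letter or underbar, rest letters/underbars/digits).
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_plain_vector_name(name: str) -> bool:
    """Return True if `name` follows the basic vector naming rule."""
    return _PLAIN_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("LOG10X", "logit_p", "_TMP", "1ABC", "TOOLONGNAME"):
        print(candidate, is_plain_vector_name(candidate))
```

Both upper and lower case letters are accepted here because vector names are
case insensitive.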
Actually, the special characters #, &, %, $ and @ can also be used (even as
first characters), but we recommend that you do not do so because these
characters are sometimes used for special purposes by ISUW. For example,
vectors starting with a dollar sign have the special property that they are
always deleted at exit from a program (!)

Vector names are case insensitive; in output from ISUW they are usually
written in capitals.

At most 255 vectors can be in use simultaneously. Vectors can be removed
from memory (to save capacity or to release their names) by the command
DELETE, and their names can be changed by the command RENAME. The command
RAMSTATUS displays a list of the vectors present and the space they occupy.

--------------------------------------------------------------------------------
INPUT FROM TEXT FILES.

The OPENINFILE, READ, SKIPITEM and SKIPLINE commands are designed for input
from ASCII (plain text) files in free format.

EXAMPLE. An ISUW program dealing with a data set of 178 units to be read
from a file A:\HEIGHTS.DAT might begin something like this.

   FACTOR SEX 178 2
   VARIATE AGE HEIGHT 178
   FACTOR GROUP 178 6
   OPEN A:\HEIGHTS.DAT
   READ * SEX AGE HEIGHT GROUP

This READ command assumes that the file has the data in standard format,
like

   001  1  23.1  178.2  1
   002  2  43.6  173.1  4
   ...

where the first unit (or here, person) is a male (SEX=1), of age 23.1, etc.
The only separators allowed (when nothing else is specified) are blanks,
newline symbols (and, in fact, any control characters in the range 0-31),
commas and semicolons. The asterisk in the READ command implies that the
first item for each unit (here the unit or line number) is skipped.

The READ command above assumes that factor levels are represented by their
numerical levels. If this is not the case, an equality sign followed by a
comma separated list of level names can be appended to the factor name.
For example,

   READ * SEX=*,Male,Female AGE HEIGHT GROUP

would work if the file looked like this:

   001  Male    23.1  178.2  1
   002  Female  43.6  173.1  4
   ...
   114  *       32.9  167.0  2
   ...

with level 1 of SEX coded as 'Male', level 2 as 'Female' and level 0 ("the
missing level") as *.

Values of variates must be in standard format (like 1.2, -0.22, +2.0E7). The
symbol * is recognised as a missing value (for variates only).

--------------------------------------------------------------------------------
LISTING IN ASCII FORMAT.

This is done by the commands LIST and LIST1. LIST is for parallel listing of
vectors (usually of equal lengths). LIST1 is for listing of single vectors
across the page. For both commands, formats can be used to determine width
and number of digits after the decimal point. With LIST it is also possible
to write factor levels as names.

--------------------------------------------------------------------------------
COMMENTS ON THE OUTPUT FILE.

To write a comment to the output file, use the command REMARK.

--------------------------------------------------------------------------------
DATA STORAGE in an internal binary file format is handled by the commands
SAVEDATA and GETDATA. In their simplest form, these commands are used to
dump and restore all vectors present in an ISUW session. The command SHOW
can display the contents of a data set without importing it, with indication
of potential name conflicts. ISUW data sets are files with extension .SUD -
do not try to edit them or handle them with tools other than ISUW (or the
DOS version ISU).

--------------------------------------------------------------------------------
GRAPHICS.

The commands PLOT and HISTOGRAM are used for graphics. PLOT produces scatter
plots (one variate against another). Colors and plot symbols can be chosen
according to the levels of factors. Points can be connected by lines as
desired, and overlaid plots can be produced.
HISTOGRAM produces histograms for variates (or factors), optionally parallel
histograms grouped by the levels of a factor.

Headings and axis titles are controlled by the commands FRAMETEXT, XTEXT and
YTEXT. Without these specifications reasonable default texts (variate names
etc.) are used.

Graphics can be saved in JPEG format (*.JPG) or as bitmap (*.BMP) files, and
thereafter imported to text handling programs (like Microsoft Word or
WordPerfect) or image processing programs. Hardcopies can be printed as
PostScript files (see the descriptions of the commands OPENPSFILE, PSFRAME
and CLOSEPSFILE).

Interactively rotatable 3-d graphics can be produced by PLOT and HISTOGRAM,
by specification of an extra variate or factor.

--------------------------------------------------------------------------------
PARALLEL SORTING OF VECTORS is performed by the command SORT.

--------------------------------------------------------------------------------
RESTRICTIONS.

In most applications, data are given as a rectangular data set, i.e. a
number of variates and factors of the same length, which is the number of
"records" or "experimental units" or "patients" or "persons" or "runs" or
"plots" or whatever, depending on the applied context. We shall use the word
units. To restrict attention to a subset of the set of units, use the
commands EXCLUDE, INCLUDE, INCLUDEALL, FOCUSONLEVEL, EXCLUDELEVEL and
EXCLUDEMISSING. These commands control a hidden array of booleans (all TRUE
from the beginning), telling which units are "present". All ISUW commands
for which this is relevant obey restrictions, in the sense that only units
present are taken into account. For example, model fit commands and COMPUTE
commands act only on the subset of data specified as "present".

WARNING. Restrictions act in parallel on all vectors, independently of their
lengths. Parallel restrictions on vectors of different lengths are usually
meaningless.
Be careful - use INCLUDEALL as soon as restrictions are no longer required.
Special care should be taken in connection with SORT, SAVEDATA, TABULATE and
the short form of TRANSFER - see the command descriptions.

For convenience, the important command INCLUDEALL can be executed by
pressing A from an empty command window.

--------------------------------------------------------------------------------
COMPUTATIONS.

Unit-by-unit computations are performed by COMPUTE. For example, if P is a
variate of length 100 with values between 0 and 1, and you want to create
another variate LOGIT_P of length 100 holding its logit transformed values,
write

   COMPUTE LOGIT_P=LN(P)-LN(1-P)

If LOGIT_P is not previously declared, it will be declared as a variate of
length 100. If it has been declared before, it must be a variate of length
100. Values of P that are not in the range ]0,1[ will result in a missing
value of LOGIT_P, and a warning about this is written to the output file.

Factors can also be handled in this way, and many transformations that are
not just "unit by unit" are also possible. See the description of the
COMPUTE command.

A COMPUTE command without a left hand side simply displays the result. For
example, to display the mean and standard deviation of (the values in) a
variate X, just write

   COMPUTE mean(X)
   COMPUTE sqrt(variance(X))

For convenience, the COMPUTE command can be generated from an empty command
window by pressing the key + (without an equality sign) or = (including the
equality sign). Moreover, if the line starts with a blank it is interpreted
as a COMPUTE command.

--------------------------------------------------------------------------------
OTHER WAYS OF ASSIGNING VALUES/LEVELS TO VARIATES/FACTORS.

GENERATE assigns systematically varying levels to a factor. For example, if
G is a factor of length 20 on 4 levels, the command

   GENERATE G 3

will assign levels

   1 1 1 2 2 2 3 3 3 4 4 4 1 1 1 2 2 2 3 3

to the factor.
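
The level pattern in this example can be reproduced outside ISUW. The
following is a small Python sketch written only to illustrate the pattern;
the function generate_levels and its parameter names are hypothetical and
say nothing about how ISUW implements GENERATE internally.

```python
def generate_levels(length: int, n_levels: int, lag: int) -> list[int]:
    """Levels 1..n_levels, each held for `lag` units, repeating cyclically."""
    return [(unit // lag) % n_levels + 1 for unit in range(length)]


if __name__ == "__main__":
    # Factor of length 20 on 4 levels, lag 3, as in the GENERATE G 3 example.
    print(*generate_levels(20, 4, 3))
```

With length 20, 4 levels and lag 3 this prints the same sequence as listed
above.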
The level changes cyclically, the last parameter (here 3) determining the
lag between change points.

GROUP is used for construction of a factor by interval grouping of a
variate.

TRANSFER can be used to copy subvector into subvector. For example, if X is
a vector of length 100, to split it into two vectors of length 50, write

   VARIATE X1 X2 50
   TRANSFER X 1 50 X1 1 50
   TRANSFER X 51 100 X2 1 50

TRANSFER can also be used to copy the values/levels present in a vector into
a new vector of the appropriate length.

--------------------------------------------------------------------------------
SUMMARIES, TABLES, TABULAR SUMMATION.

SUMMARY displays summary descriptions of variates or factors.

ONEWAYTABLE produces one-way tables of counts for factors (number of units
for each level) or variates (counts of values in specified intervals).

Two-way or three-way tables of counts of units or sums of a given variate
over level combinations for two or three factors are produced by the
commands TWOWAYTABLE and THREEWAYTABLE.

The command TABULATE performs counting (of units) or summation (of variate
values) over the cells of a cross classification determined by an arbitrary
number of factors.

For convenience, these commands can be generated by a single key from an
empty command window as follows.

   Press key    to write command
       0        SUMMARY
       1        ONEWAYTABLE
       2        TWOWAYTABLE
       3        THREEWAYTABLE
       4        TABULATE

--------------------------------------------------------------------------------
STATISTICAL MODELS.

FITLINEARNORMAL is for analysis of variance and regression.

FITLOGLINEAR is for multiplicative or log-linear models for Poisson or
multinomial data.

FITLOGITLINEAR is for logistic regression models for binary or binomial
data.

FITNONLINEAR is for a class of nonlinear regression models, including the
generalised linear models with overdispersion, user-specified mean (=inverse
link) and variance functions.

FITCOXMODEL is for proportional hazards models by Cox's likelihood,
optionally with right censoring, left truncation and stratification.

FITMCLOGIT, FITMCPROBIT and FITMCCLOGLOG are for ordered categorical
response models as described by P. McCullagh (JRSS B 42, 109-142), where the
responses are assumed to be the result of a grouping with unknown cutpoints
of continuous data from a linear position parameter model with error
distribution logistic, normal or Gompertz.

FITCLOGIT is for conditional logistic regression (as in matched
case-control studies).

FITCRASCH is for a special case of this, the conditional Rasch model (two
way logit-additive model for binary data by conditioning on the row sums,
arbitrary linear structure for column parameters).

FITNEGBIN is for log-linear models for negative binomial data, usually
coming up as "Poisson data with over-dispersion".

FITANOVA is for analysis of variance, including random effects models, in
balanced orthogonal designs.

After any model fit command except FITANOVA, the command LISTPARAMETERS
produces a listing of the estimated parameters in the last model fitted, and
the command SAVEFITTED can be used for extraction of fitted values,
residuals and normed residuals from the last model fitted (whenever this
makes sense, see the command descriptions). See also the command
SAVENORMEDRESIDUALS, which can be used after FITLINEARNORMAL to compute
exact T-distributed ("studentized") normed residuals.

SAVEPARAMETERS saves parameter estimates and their estimated standard
deviations as variates (except after FITANOVA).

ESTIMATE outputs the estimates of specified linear combinations of the
parameters and their estimated standard deviations.

TESTMODELCHANGE can be used for computation of the likelihood ratio test
(Chi-square or F) for model reduction after fit of two nested models by any
FIT... command except FITANOVA.

All model fit commands involve the concept of a model formula, i.e.
a code for a linear expression involving linear effects of covariates,
effects of factors, interactions between factors etc. This concept, which is
more or less common to all statistics packages, is explained carefully in
the description of FITLINEARNORMAL.

WARNING. The design matrix determined by a model formula is not physically
stored. What is kept is a code telling how to compute its elements from
values or levels of existing variates and factors. Hence, commands referring
to the last model fit use the current values/levels of the vectors occurring
in the last model. If these have been changed, incorrect results will come
out. If some of them have been deleted, the information that can be
extracted is reduced accordingly. For example, if some of the independent
variables (or an offset variable, if such was present) have been deleted,
SAVEFITTED will not work. If the response variate has been deleted,
SAVEFITTED will be able to produce fitted values, but not residuals and
normed residuals. Similarly, if a weight variate has been deleted, only
fitted values and residuals, but not normed residuals, can be produced.

--------------------------------------------------------------------------------
TESTS, NON-PARAMETRICS, DESCRIPTIVE STATISTICS.

The command WILCOXON performs a two-sample Wilcoxon or Mann-Whitney test.

The command SPEARMAN computes Spearman's rank correlation and performs the
test for "no ordinal correlation".

The command BARTLETT performs Bartlett's test for variance homogeneity in a
one-way setting.

The command CORMAT writes the matrix of correlations for a set of variates,
with optional indication of significances.

--------------------------------------------------------------------------------
CALLING OTHER PROGRAMS.

Other programs (or documents to be opened by applications determined by
their file extensions) can be called directly from ISUW by a SHELL command.
For example, to edit a file PRG.ISU with NotePad (if you prefer this to
ISUW's own EDIT command), use the command

   SHELL NOTEPAD PRG.ISU

--------------------------------------------------------------------------------
PROGRAMMING THE KEYBOARD.

The function keys F2..F12, alone or in combination with Alt, Ctrl or Shift,
and the keys A..Z and 0..9 in combination with Alt or Ctrl can (with certain
exceptions that are used for other things) be programmed. For example, the
command

   KEY ao show!

will imply that the session's output file is displayed whenever Alt-O is
pressed (the exclamation sign means "Return"). Your programmed keys are
automatically saved on exit and recovered at next startup under the same
ISUW root directory.

--------------------------------------------------------------------------------
ISUW PROGRAMS.

ISUW commands can be written line by line on text files by commands of the
form

   EDIT programname

and executed by

   RUN programname

Programs are saved on files with extension .ISU . Make sure that you include
this extension in the program name if you use another editor than ISUW's own
EDIT command.

To avoid long lines in programs, you can split them up in pieces by the
"continue line symbol" \. For example, rather than

   FITLINEAR Y = 1 + ROWS + COLUMNS + LAYERS + ROWS*COLUMNS + COLUMNS*LAYERS + ROWS*LAYERS

it is usually preferable to write

   FITLINEAR Y = 1 \
       + ROWS + COLUMNS + LAYERS \
       + ROWS*COLUMNS + COLUMNS*LAYERS + ROWS*LAYERS

Empty lines can be inserted and lines can be indented as desired to make the
program more readable.

Comments can be included in two ways:
(1) Lines starting with a percentage sign % are ignored.
(2) Text in curly braces {} within a line is ignored.
As opposed to REMARKs, such comments are not echoed to the output file.

A primitive device for parameter substitution is also available, see the
description of the command SUBSTITUTE.

The command GOTO can be used for conditional branching and loops.

In programs, command names (but not vector names) can be truncated. See the
list below of shortest unique truncations. The command name COMPUTE can be
omitted for COMPUTE commands with a left hand side. COMPUTE commands without
a left hand side can be written with an equality sign '=' or a plus '+' as
the first character of the expression to display.

A more primitive device for program execution is implemented in the command
OPENCOMMANDFILE. The effect of this command is that the commands on a file
can be imported one by one to the command field by the CursorDown arrow.
This can be used if you want to execute a sequence of commands, with the
optional possibility of modifying the commands and inserting other commands.
The demonstration programs on the directory DEMOS under the ISUW root
directory are executed in this way (but with the additional feature that
explanatory text is shown in a separate "supervisor" window).

--------------------------------------------------------------------------------
INSTALLATION, DIRECTORY STRUCTURE AND CALL OF ISUW.

To install ISUW, download (as you have probably already done) the file
ISUWINST.EXE to an empty directory on your hard disk. We recommend C:\ISUW
to keep file names short, but in principle a directory like

   C:\Program Files\Danish Mouse Free Software\Interactive StatUnit

could also be used. Unpack this self-unpacking file by executing it, for
example by double clicking on it from Windows Explorer, or calling it by its
name from the Run window or a command (MS-DOS or equivalent) window from the
same directory. This operation results in the creation of four files,

   ISUW.EXE      (the executable program)
   ISUWHELP.TXT  (the file you are reading here)
   BORLNDMM.DLL  (a Delphi system file required for memory management)
   DEMOS.EXE     (a self-unpacking file containing the DEMOS files)

After this ISUWINST.EXE can be deleted. The entire ISUW package takes up
less than 2 MB of space on the hard disk.
To install a new version of ISUW, overwriting the old version, simply repeat
this. Here you can avoid four confirm-overwrite dialogs by use of the option
-o, i.e. by writing

   ISUWINST -o

rather than just ISUWINST.

In the following we refer to two directories:

1. The ISUW root directory. This is where the files SHORTKEY.TXT and
SHORTKEY.BIN, holding your shortkey definitions, are kept, along with the
file LASTDIR, which holds the name of the last session's working directory,
your latest selection of size and position on the screen of the ISUW window
and your choice of having "hint mode" on or off. You can have several ISUW
root directories for different applications, if you wish. If you ever get
the (probably rather useless) idea of having two or more ISUW sessions
running at the same time, make sure you do it from different ISUW root
directories, otherwise there will be file sharing conflicts. However, ISUW
makes some initial file checks that will usually prevent this error.

By default, the ISUW root directory becomes the directory where you unpacked
the four system files. But the ISUW root directory does not have to be the
directory where these files are located. You can (and should, if these files
are placed e.g. on a read-only network drive) redefine this to any other
directory by giving a valid directory name (including drive letter and
colon) as the first parameter in the call of ISUW.EXE.

2. The working directory. This directory, which (with an exception to be
mentioned below) must be a subdirectory (or sub-sub etc.) of the ISUW root
directory, is selected (and sometimes created) from a dialog box when the
session begins. Typically, you will use the same directory again and again;
in this case you can skip the dialog by Return. The working directory is the
place where all sorts of output are placed by default, and also the place
where you will typically place data files before or during the session.
The working directory becomes the current Windows directory throughout the
session, which means that file names without a path specification refer to
files on that directory.

The working directory can be selected directly in the call of ISUW.EXE by
specification of the full name (including drive letter and colon) of an
existing directory as the second parameter in the call of ISUW.EXE. In this
case the working directory does not have to be a subdirectory of the ISUW
root directory. The entry dialog is skipped, and the file LASTDIR is left
unchanged.

Here, you can also extend the name of the desired working directory to the
full name of an existing file with extension either ISU or SUD on that
directory. In the first case, the ISU program is executed, in the second
case the ISU data set is imported. This is useful if you want to set up your
Windows computer in such a way that double clicking from Windows Explorer on
an ISU program or an ISU data set results in a startup of ISUW with the
appropriate action. Write a command file - say ISUWBAT.BAT - containing a
single line of the form

   C:\isuw\ISUW.EXE c:\isuw %1

and make this your Windows default application for opening .ISU and .SUD
files (Click Tools -> Folder Options -> File Types from Windows Explorer).

If both the ISUW root and the working directory are specified as the first
two parameters to ISUW.EXE, additional parameters can be added. These
parameters should constitute a valid ISUW command, which will be copied to
the command field. If, in addition, the last of these parameters ends with
an exclamation sign, this command will be executed immediately after entry
to the program. In this way you can build some standard initialization into
the call of ISUW, like import of a data set, execution of an ISUW program
defining some local shortkeys etc.

EXAMPLE. On the computer I use at work, I have placed the ISUW system files
on a directory named C:\DELPHI\ISUW1 because I am developing ISUW under
Borland Delphi.
However, my (only) ISUW root directory is C:\ISUW. Thus, the shortcut
starting ISUW from my desktop has as its "target property" the command

   C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW

However, I have a main application of ISUW related to a course called MPAS.
For that reason, I have another shortcut on my desktop with the command

   C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW C:\ISUW\MPAS\06

in the target field, which means that I can go directly to this application
without any entry dialog. Next year I am probably going to change 06 to 07
(after having created C:\ISUW\MPAS\07). In between, I use the form

   C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW C:\ISUW\MPAS\06\DATA.SUD

to load a certain data set automatically at startup.

AUTOEXEC.ISU. Another way of specifying automatic initialization goes as
follows. If an ISUW program named AUTOEXEC.ISU exists on a working
directory, this program will be executed automatically at startup from that
working directory. This works quite generally - i.e. also when the working
directory is selected from the entry dialog.

Uninstallation. In the very unlikely case that you want to uninstall ISUW,
simply remove the root directory (or directories, if you have more than one)
and whatever you want to get rid of among files you have created elsewhere
by ISUW on working directories that are not subdirectories of the root
directory. This can be tedious, of course, but not more tedious than the
occasional garbage collection which is required anyway. ISUW does not make
any hidden changes to your computer's setup. The shortcuts that you may have
created are easy to delete.

--------------------------------------------------------------------------------
OUTPUT.

ISUW writes output to a temporary file on the ISUW root directory named
ISUWOUT.TMP. This is the file you look at in a somewhat modified form by the
command SHOW. When an ISUW session ends you will be given the option of
saving this file on the working directory under another name.
If you answer "No" here, the output file is lost.

In spite of some special effects created by the SHOW command (ISUW prompts being replaced with special colors of command lines, error messages and notes being printed in special colors, REMARKS in italics etc.), ISUW output files are ordinary text files which can be edited and printed e.g. by Notepad or imported into standard text handling programs.

Whenever a command sent from the command window produces written output, this is shown in a light green preview window, which you leave by Return if you want the output appended to ISUWOUT.TMP, or by Escape if you do not. For commands in a program executed by a RUN command the rule is different. Here, all output is written directly to ISUWOUT.TMP, unless you redirect it explicitly to another file (or the "paper basket" NUL) by an OUTFILE command.

Commands and error messages are echoed to the output file by default. This means that the ISUW output file will contain a complete log of what has happened during the session. In some cases (for example to avoid echoes of the same GOTO loop again and again) you may prefer to switch this default off by the ECHO command.

********************************************************************************
Alphabetic list of ISUW commands

The last column indicates (when it makes sense) whether a command does (+) or does not (-) take restrictions into account. For details, see the command description.
Command               Shortest    Equivalent  Restrictions
                      truncation  brief form
                                  Shortkey from empty command window

BARTLETT              B                       +
CLOSEPSFILE           CL
COMPUTE               COM+ = + = +
CONSTRUCTMINIMUM      CON                     -
CORMAT                COR                     +
DELETE                D
ECHO                  EC
EDIT                  ED
ESTIMATE              ES
EXCLUDE               EXCLUDE
EXCLUDELEVEL          EXCLUDEL
EXCLUDEMISSING        EXCLUDEM
FACTOR                FA
FITANOVA              FITA                    +
FITCLOGIT             FITCL                   +
FITCOXMODEL           FITCO                   +
FITCRASCH             FITCR                   +
FITLINEARNORMAL       FITLI                   +
FITLOGLINEAR          FITLOGL                 +
FITLOGITLINEAR        FITLOGI                 +
FITMCCLOGLOG          FITMCC                  +
FITMCLOGIT            FITMCL                  +
FITMCPROBIT           FITMCP                  +
FITNEGBIN             FITNE                   +
FITNONLINEAR          FITNO                   +
FOCUSONLEVEL          FO
FRAMETEXT             FR
GENERATELEVELS        GEN                     -
GETDATA               GET                     -
GOTO                  GO                      -
GROUP                 GR                      +
HISTOGRAM             H                       +
INCLUDE               INCLUDE     I
INCLUDEALL            INCLUDEA    A
KEYS                  K
LIST                  LIST        L           +
LIST1                 LIST1       L1          +
LISTPARAMETERS        LISTP
ONEWAYTABLE           ON          1           +
OPENCOMMANDFILE       OPENC
OPENINFILE            OPENI       OPEN
OPENPSFILE            OPENP
OUTFILE               OU
PLOT                  PL                      +
PSFRAME               PS
QUIT                  Q
RAMSTATUS             RA                      -
READ                  REA                     +
REMARK                REM
RENAME                REN
RUN                   RU
SAVEDATA              SAVED       SAVE        +
SAVEFITTED            SAVEF
SAVENORMEDRESIDUALS   SAVEN                   +
SAVEPARAMETERS        SAVEP
SHELL                 SHE
SHOW                  SHO
SKIPITEM              SKIPI
SKIPLINE              SKIPL
SORT                  SO                      -
SPEARMAN              SP                      +
SUBSTITUTE            SUB
SUMMARY               SUM         0           +
TABULATE              TA          4           +
TESTMODELCHANGE       TE
THREEWAYTABLE         TH          3           +
TRANSFER              TR                      -/+
TWOWAYTABLE           TW          2           +
VARIATE               V
WILCOXON              W                       +
XTEXT                 X
YTEXT                 Y

********************************************************************************

========================
COMMAND DESCRIPTIONS
========================

--------------------------------------------------------------------------------
VARIATE   Declaration of variates.
................................................................................
Syntax: VARIATE name1 [name2 [...]] length

Creates new variates named name1 ... of the same length. The length must be a positive integer or integer expression.

EXAMPLE.
   VAR X Y 10*48
creates two variates X and Y of length 480.

Variates are arrays or vectors of single precision real numbers. Formally, there is no upper limit to the length of variates and factors.
Or, rather, the realistic limit is set by the computer's RAM. But things will work rather slowly if the RAM is filled up. Depending on your Windows version, the computer may start using disk cache (which will slow it down to a speed where it is almost useless), or the session will crash. Another problem is that for single precision numbers greater than approximately 15 million, rounding to integer values will not be correct. This means that for a variate X of length greater than 15 million you cannot use expressions like X(UNIT) for all possible values of UNIT. For most data sets this is no problem at all - but now you are warned.

There is, however, an upper limit of 255 to the number of variates and factors that can be present simultaneously in an ISUW session.

At declaration, all values are set to zero.

WARNING. If an error occurs during execution, like in the command
   VAR X Y 1A Z 100
where 1A is an illegal variate name, the command is interrupted by an error message. However, the command is executed up to the place where the error is met. In the above example X and Y are declared, but not Z.

--------------------------------------------------------------------------------
FACTOR   Declaration of factors.
................................................................................
Syntax: FACTOR name1 [name2 [...]] length levels

Creates new factors named name1 ... of given length and number of levels. The length and the number of levels are integer expressions, both positive.

EXAMPLE.
   FAC SEX TREAT 132 2
   FAC GROUP 132 6
creates three factors of length 132, SEX and TREAT on 2 levels and GROUP on 6 levels.

Factors are stored as arrays of bytes. For this reason, the maximal number of levels is 255.

At declaration, all levels (present or not) are set to zero.

WARNING. See the warning to VARIATE (just above).

--------------------------------------------------------------------------------
RAMSTATUS   Writes information about existing vectors in memory.
................................................................................ No parameters. Writes information about the vectors present and the dynamically allocated memory they occupy. The information includes Names of variates and their lengths. Names of factors and their lengths and numbers of levels. The number of bytes occupied by each vector and totally. In addition, RAMSTATUS tells whether restrictions are present or not. -------------------------------------------------------------------------------- DELETE or DEL Deletes existing vectors. ................................................................................ Syntax: DELETE [name1 [name2 [...]]] Deletes existing vectors, releasing the space they occupy and their names. WARNING. If an error occurs during execution, like in the command DEL X Y 1ST Z where 1ST is an illegal variate name, the command is interrupted by an error message. However, the command is executed up to the place where the error is met. In the above example X and Y are deleted (if they both exist), but not Z. If the command is used without parameters, all vectors present are deleted. In addition, all restrictions are removed. If an input file, an alternative output file or a PostScript output file is open it is closed, information from last model fit is lost, and the parameters determining text for PLOT and HISTOGRAM are set to their defaults. In addition, if an OUTFILE command is in force, output is redirected back to the sessions output file, and command echo is set "on". Thus, a DELETE command without parameters is a sort of "reset" command. The only difference from closing the session and starting a new is that the output file is still there, and if DELETE is used in this way in a program after a SUBSTITUTE command the effects of that command are still in force. DELETE without parameters is very often useful as the first command in a program. 
So are the commands ECHO 0 and OUTFILE 0, if you want to avoid commands echoes and output from loops, or OUTFILE if you want to direct output to another file than the sessions output file (useful when a program is tested). It follows from what was said above, that the DELETE command must come before the two other commands (but a SUBSTITUTE command may be placed before it). -------------------------------------------------------------------------------- RENAME Gives an existing vector a new name. ................................................................................ Syntax: RENAME oldname newname oldname should, of course, be the name of an existing vector, and newname must be a valid vector name which is not in use. The command can be used to solve name coincidence conflicts before import of a data set. Use a SHOW command to see if such conflicts are present. -------------------------------------------------------------------------------- EXCLUDE Excludes specified units, marking them as "non-present". ................................................................................ Syntax: EXCLUDE range1 [range2 [...]] A range can be an integer expression or an expression of the form integer1:integer2 where integer1 and integer2 are integer expressions. In the last case, 0 < integer1 <= integer2 <= length of longest vector is required, and the units from integer1 to integer2 are excluded. EXAMPLE. To remove all units <= 10 and >= 91 write EXCLUDE 1:10 91:100 provided that the relevant vector length is 100. Ranges are handled one by one, and units for each range in the natural order. Thus, if (in the above situation, with 100 as the length of all existing vectors) you write EXCLUDE 1:10 91:101 an error would occur for unit 101. Nevertheless, the desired restrictions would actually be imposed. Whereas EXCLUDE 91:101 1:10 would remove only unit 91 to 100, but not 1 to 10 since this comes after the error interrupt. 
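The order dependence described above can be illustrated with a small Python model. This is only a sketch of the documented behaviour, not ISUW's actual implementation; the function name `exclude`, the list of presence flags and the wording of the error text are assumptions made for the illustration, and only plain integer ranges are handled.

```python
def exclude(present, ranges):
    """Model of EXCLUDE for integer ranges such as "7" or "1:10".

    present: list of booleans, present[i] is True if unit i+1 is present.
    Returns None on success, or an error text at the first invalid unit.
    """
    for spec in ranges:                      # ranges are handled one by one
        first, _, last = spec.partition(":")
        start = int(first)
        stop = int(last) if last else start
        for unit in range(start, stop + 1):  # units in their natural order
            if not 1 <= unit <= len(present):
                # Error interrupt: what has been done so far is kept,
                # the remaining units and ranges are never reached.
                return f"unit {unit} does not exist"
            present[unit - 1] = False
    return None


units = [True] * 100
print(exclude(units, ["1:10", "91:101"]), units.count(False))

units = [True] * 100
print(exclude(units, ["91:101", "1:10"]), units.count(False))
```

In this model the first call reports the error for unit 101 only after units 1 to 10 and 91 to 100 have been excluded, whereas the second call stops with only units 91 to 100 excluded.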
A range can also be specified as the name of a variate, or an expression that would be valid as a right hand side of a COMPUTE command. In this case, the units excluded are those for which the variate value is defined, non-missing and positive. For example, to exclude all units for which the variate X takes a value which is not in the interval [0,100], write EXCLUDE (X<0)+(X>100) or just (specifying two ranges) EXCLUDE X<0 X>100 Notice that missing values of X in this command, or more generally units for which the variate or expression is missing or results in a missing value, are not excluded. For example, EXCLUDE X>ln(0) has no effect, and no warning is given. WARNING. The command EXCLUDE 100 excludes unit 100, whereas EXCLUDE 100.2 excludes unit 1 (!). Since 100.2 is not interpretable as a range, it is assumed to be the right hand side of a compute statement, resulting in a variate of length 1 with the value 100.2. Similarly, EXCLUDE -3 results in an error message, whereas EXCLUDE -3.2 has no effect. Units that are already excluded are not touched by EXCLUDE. Thus, the two commands EXCLUDE SEX=0 EXCLUDE CODE=999 will do exactly the same as the single statement EXCLUDE SEX=0 CODE=999 -------------------------------------------------------------------------------- EXCLUDELEVEL Excludes all units on specified levels of a factor. ................................................................................ Syntax: EXCLUDELEVEL factor level1 [level2 [...]] EXAMPLE. To remove all units on level 0 or 2 of the factor SEX, write EXCLUDELEVEL SEX 0 2 As for EXCLUDE (see above), levels are handled one by one, and the command is interrupted with an error message if an error occurs. Thus, if SEX has 2 levels (which is the usual state of affairs), the command EXCLUDELEVEL SEX 0 5 2 would exclude level 0, but not level 2. 
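The interrupt rule for EXCLUDELEVEL can be pictured in the same way. The following Python fragment is a hedged sketch of the behaviour described above, not the program's real code; `exclude_level`, its arguments and the error text are hypothetical names chosen for the illustration.

```python
def exclude_level(present, factor, n_levels, levels):
    """Model of EXCLUDELEVEL.

    factor:   one level (0..n_levels) per unit
    present:  one boolean per unit, updated in place
    levels:   the levels to exclude, in the order they were written
    Returns None on success, or an error text at the first invalid level.
    """
    for level in levels:                     # levels are handled one by one
        if not 0 <= level <= n_levels:
            # Error interrupt: levels listed after this one are not handled.
            return f"level {level} out of range"
        for i, value in enumerate(factor):
            if value == level:
                present[i] = False
    return None


sex = [1, 2, 0, 2, 1]
present = [True] * len(sex)
error = exclude_level(present, sex, 2, [0, 5, 2])
print(error, present)
```

Here only the unit on level 0 ends up excluded, because the invalid level 5 interrupts the command before level 2 is reached.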
-------------------------------------------------------------------------------- FOCUSONLEVEL Excludes all units that are not on a specified level of a factor. ................................................................................ Syntax: FOCUSONLEVEL factor level Equivalent to EXCLUDELEVEL ... (see above) where ... stands for the list of levels different from the level specified in the FOCUS command. EXAMPLE. To focus on the females in group 1, write something like FOCUSONLEVEL SEX 2 FOCUSONLEVEL GROUP 1 Equivalently, you could use EXCLUDE SEX<>2 GROUP<>1 -------------------------------------------------------------------------------- EXCLUDEMISSING Excludes units for which values of given variates are missing. ................................................................................ Syntax: EXCLUDEMISSING name1 [name2 [...]] This command is typically used before a model fit command. EXAMPLE. EXCLUDEMISSING HEIGHT WEIGHT EXCLUDELEVEL SEX 0 FITLINEARNORMAL WEIGHT=1+SEX+HEIGHT+SEX*HEIGHT The rules for error interrupts are similar to what has been said about EXCLUDE and EXCLUDELEVEL. -------------------------------------------------------------------------------- INCLUDE Includes specified units, marking them as "present". ................................................................................ Syntax: INCLUDE range1 [range2 [...]] Opposite to EXCLUDE. Units specified are included, units not specified are untouched. Ranges (and the rules for error interrupts) are explained a few screenfuls above under EXCLUDE. When a range is expressed as a valid right hand side of a COMPUTE command, the computation does, of course, take place also for non-present units - otherwise nothing would happen. -------------------------------------------------------------------------------- INCLUDEALL Includes all units, i.e. removes all restrictions. ................................................................................ No parameters. 
Reestablishes the initial state of affairs, where no restrictions are present. Use this command whenever restrictions are not required any more. From an empty command window, just press A. -------------------------------------------------------------------------------- OPENINFILE or just OPEN Opens a text file for input. ................................................................................ Syntax: OPEN [filename] ASCII files for input are handled by the commands READ, SKIPITEM and SKIPLINE, see below. If another file is already open for input it is closed. To close an input file without opening a new, use the command without parameters. This is often necessary if you want to EDIT an input file to correct errors detected by a READ command, because you will not be allowed to make changes to an input file while it is open. -------------------------------------------------------------------------------- READ Input of data from a text file. ................................................................................ Syntax: READ [[separators]] name1 [name2 [...]] Data are read in parallel from the file opened by OPEN (see above). Items on the file must be separated by blanks, newline symbols (or other characters in the range 0-32), commas or semicolons. Other separator characters can be specified, see approximately 4 screenfuls below. name1, name2 etc. must be names of existing vectors of equal lengths. The symbol '*' can be used for "skip next item", and '/' means "skip to start of next line". EXAMPLE. Suppose that A:\PROJECT.DAT contains 100 lines, beginning with 001 12.32 1.19 Male 009 1 002 11.15 1.23 Female 009 1 003 11.91 1.18 Female 004 1 ... To read columns 2 and 3 as variates named AGE and INC and column 4 as a factor SEX on 2 levels with Male and Female represented by levels 1 and 2, write OPEN A:\PROJECT.DAT VAR AGE INC 100 FAC SEX 100 2 READ * AGE INC SEX=,Male,Female / Notice the use of a list of level names. 
This is required when at least one factor level is coded as something other than its integer numerical level. Level names are case sensitive, i.e. 'male' instead of 'Male' would not work. In the example, the comma right after the equality sign means that level 0 is not given any name because it does not occur. If unknown sex occurred and was coded as *, we could write
   READ * AGE INC SEX=*,Male,Female /
instead, to assign level 0 of SEX to the unknowns.

The symbol '/', meaning "skip remainder of line", is only required if there is actually something to skip. A single slash '/' right at the point where a line has just been read has no effect; whereas two slashes with a blank between them will imply that every second line is skipped.

For variates, standard format of numbers is assumed (exponential notation is allowed, like -1.2e3 instead of -1200). Decimal points must be periods, not commas. An asterisk '*' or a period '.' will be interpreted as a missing value.

Restrictions are obeyed by the READ command, in the sense that only the units present are read. This is useful if you have to piece together variates or factors of segments from different files. But in the standard situation, it means that you must remember to remove all restrictions by INCLUDEALL before you READ.

An error (an invalid real number, a named level not in the list of level names or a numerical level out of range) results in an error message written to the output file, and the corresponding value/level is set to missing value/level zero. But the reading is not interrupted. If you want to correct errors that have to do with the data file, you must edit that file. Otherwise, if the error has to do with the READ command, you can correct it immediately and repeat it. But before this, the file must be OPENed again, otherwise reading continues from the position where it stopped. The same happens if reading finishes before the end of a file. In this case, a new READ command will continue from where the last ended.
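The item rules for READ described above (default separators, missing value symbols, case sensitive level names) can be summarized in a short Python sketch. It is an illustration under stated assumptions, not ISUW's parser: the helper names `split_items`, `parse_variate` and `parse_level` are hypothetical, numerical level codes are not handled, and Python's `float` accepts a few spellings that ISUW may well reject.

```python
import math
import re

# Default separators: characters 0-32, commas and semicolons.
SEPARATORS = re.compile(r"[\x00-\x20,;]+")


def split_items(text):
    """Split raw file text into items, ignoring empty pieces."""
    return [item for item in SEPARATORS.split(text) if item]


def parse_variate(item):
    """Return the value, or NaN for a missing or invalid number."""
    if item in ("*", "."):
        return math.nan          # explicit missing value
    try:
        return float(item)
    except ValueError:
        return math.nan          # READ would also write an error message


def parse_level(item, names):
    """Return the level whose name equals item (case sensitive), else 0.

    names[k] is the name of level k; an empty string means "no name".
    """
    for level, name in enumerate(names):
        if name and item == name:
            return level
    return 0                     # unknown name: level zero


names = "SEX=,Male,Female".split("=", 1)[1].split(",")
items = split_items("001 12.32 1.19 Male 009 1")
print(parse_variate(items[1]), parse_variate(items[2]), parse_level(items[3], names))
```

With the level list `SEX=*,Male,Female` the same lookup maps an item `*` to level 0, which is how the "unknown" coding above would be handled in this model.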
EXAMPLE. Suppose that a data file EXAMPLE.DAT contains the values of two variates of lengths 10 and 4 in the following obscure layout: The first six values of X are 1.2, 1.3, 2.l, 4.3, 5.0, 1.2; Here are the four values of Y: 34.1 32.2 45.0 23.6; Finally, the last four values of X are 5.4 1.7 1.9 3.3; And here comes some junk: 1234567890 You could read X and Y as follows: VAR X 10 VAR Y 4 OPEN EXAMPLE.DAT EXCLUDE 7:10 SKIPITEM 7 { or SKIPLINE } READ X INCLUDEALL SKIPITEM 7 { or SKIPLINE } READ Y EXCLUDE 1:6 SKIPITEM 8 { or SKIPLINE } READ X INCLUDEALL Here, you would receive a warning concerning the third value of X, READ ERROR: Invalid real number 2.l for variate X at unit 3 where a lower case L has been typed instead of 1, and the corresponding entry of X will contain a missing value. However, SHOW EXAMPLE.DAT would tell you what went wrong, and you could then correct the error by COMPUTE X(3)=2.1 A more permanent solution would be to edit EXAMPLE.DAT to correct the error, and then do the READing once more. However, since EXAMPLE.DAT is still open as an input file, you would not be allowed to save the changes. To do this you would have to close it first, which can be done by an OPEN command without parameters. For example, OPEN EDIT EXAMPLE.DAT {make the correction and save} OPEN EXAMPLE.DAT ... would work. As this example also illustrates, a data file must not necessarily end exactly where the reading terminates. In the tail of the file you can keep e.g. a description of data. An attempt to read through the end of an input file is, of course, an error. The reading is interrupted, an error message is given, and all values/levels not read yet are left unchanged (except for the value/level that was read when the EOF mark was met; this may or may not be set to missing value/level zero, depending on some circumstances around the termination of the file). 
If a data file uses other separators than blank, newline, comma and semicolon, you can specify this by adding a bracket containing these additional separator characters as the first argument of the READ command. For example, to read from a file where the characters / , [ and ] should be interpreted as blanks, use
   READ [/[]] ...
Notice that such additional separating characters are only in force within the READ command where they are specified, e.g. not in a following (or preceding) SKIPITEM command. Notice also that separating characters must not occur in level names for factors.

Fixed format files (with data in fixed positions, no delimiters) must be edited or handled by other tools.

--------------------------------------------------------------------------------
SKIPITEM   Skips next item on a data file.
................................................................................
Syntax: SKIPITEM [integer]

The next "integer" items on the file opened by OPEN (see above) are skipped. If the integer parameter is missing, 1 is assumed. This command is useful if a data file contains headings or other comments.

EXAMPLE. Suppose that A:\PROJECT.DAT contains 100 lines plus a "header line", beginning with
   NO  AGE   INC  SEX
   001 12.32 1.19 Male
   002 11.15 1.23 Female
   003 11.91 1.18 Female
   ...
To read columns 2 and 3 as variates named AGE and INC, and column 4 as a factor SEX on 2 levels with Male and Female represented by levels 1 and 2, write
   VAR AGE INC 100
   FAC SEX 100 2
   OPEN A:\PROJECT.DAT
   SKIPITEM 4 { or SKIPLINE }
   READ * AGE INC SEX=,Male,Female

--------------------------------------------------------------------------------
SKIPLINE   Skips to beginning of next line on a data file.
................................................................................
Syntax: SKIPLINE [integer]

Without the parameter, SKIPLINE reads through the present line, to the beginning of the next.
If a line has just been read through, the next will be skipped, otherwise the remainder of the present line is skipped. If an integer parameter is given, this operation is simply performed "integer" times, i.e. the remainder of the present line and the next "integer"-1 whole lines are skipped.

Whereas SKIPITEM interprets newline symbols as delimiters and thus skips as many empty lines as necessary to reach the items to be skipped, SKIPLINE also counts empty lines. For example, if a data file begins with 4 empty lines followed by a line consisting of two variable names, these 5 lines can be skipped either by
   SKIPITEM 2
or
   SKIPLINE 5

--------------------------------------------------------------------------------
LIST or L   Parallel print of data.
................................................................................
Syntax: LIST vector1 [vector2 [...]]

EXAMPLE. If AGE is a variate and SEX a factor on 2 levels, the statement
   LIST AGE SEX
will produce output like
          AGE SEX
      24.7100   2
      42.1200   1
      32.4300   1
      ...
      22.1300   2
The default format for variates is :10:4, which means width 10 with 4 decimals after the decimal point. For factors it is :4, i.e. width 4 or the length of the factor's name if this is more than 4. You can change this by addition of a format to the vector name. For example,
   LIST AGE:4:1 SEX:3
would result in something like
    AGE SEX
   24.7   2
   42.1   1
   32.4   1
   ...
   22.1   2
In addition to this, you can add a list of level names to a factor, like
   LIST AGE:4:1 SEX=,M,F:3
which would result in a listing like
    AGE SEX
   24.7   F
   42.1   M
   32.4   M
   ...
   22.1   F
Notice that the list of levels comes before the format, if both are present. Notice also that the equality sign, which indicates that a list of level names will follow, is followed immediately by a comma. This is because the name for level 0 is here set to an empty string.
If "missing sex", or rather "sex unknown", does actually occur, one would perhaps prefer something like LIST AGE:4:1 SEX=Unknown,Male,Female:7 which might produce a listing like AGE SEX 24.7 Female 42.1 Male 32.4 Male ... 20.2 Unknown ... 22.1 Female Notice the format :7, which is necessary here because the longest level name is of length 7 > 4. Otherwise, level names would be truncated. If LIST is used without parameters, all vectors present are listed with default formats. Restrictions are obeyed, in the sense that the lines corresponding to hidden units are not printed. -------------------------------------------------------------------------------- LIST1 or L1 Condensed print of data (across the page) ................................................................................ Syntax: LIST1 vector1 [vector2 [...]] EXAMPLE. If AGE is a variate of length 4, SEX a factor on 2 levels also of length 4, the statement LIST1 AGE SEX will produce output like AGE 24.7100 42.1200 32.4300 22.1300 SEX 2 1 1 2 Formats can be used, just as for LIST (see above), and the standard formats are the same. Level names for factors can not be used. Restrictions are obeyed, in the sense that the values/levels corresponding to hidden units are not printed. -------------------------------------------------------------------------------- SAVEDATA or SAVE Creates a StatUnit data set. ................................................................................ Syntax: SAVEDATA dataset [vector1 [vector2 [...]]] StatUnit data sets are files written in an internal binary format for fast storage and recovery of data. If only the data set name is specified, all vectors present are stored in the data set. Physically, the data set becomes a file with the name specified followed by the extension .SUD (for "StatUnit Data"). For example, SAVE C:\PROJECTS\A_SCHEME\DATA1 will create a file DATA1.SUD on the directory C:\PROJECTS\A_SCHEME. It is an error if this directory does not exist. 
If such a file exists already, you will be asked to confirm that you want to overwrite it. However, this is only in interactive mode; if the SAVEDATA command occurs in a program, the file is overwritten without warning.

If a list of vector names is added, only these vectors are stored. The names must be names of existing vectors in the present session, otherwise the file is not created.

Restrictions are taken into account in the sense that only units present are stored. This means that you can use SAVEDATA to create "physically restricted" sub data sets, i.e. data sets where the excluded units are not only marked as non-present, but are actually not there at all.

EXAMPLE. Suppose we have variates AGE and HEIGHT and a factor SEX on two levels, all of the same length. To create a data set MALES that contains only the part of data with SEX=1, and import this to our session, we could do something like the following (assuming no restrictions present from the beginning).
   FOCUSONLEVEL SEX 1
   SAVE MALES AGE HEIGHT
   DELETE
   GET MALES
Notice that the DELETE command is without parameters here. This form of the DELETE command also removes all restrictions, and this is important because the restrictions imposed on the "long data set" will almost certainly be meaningless for the "short data set". If DELETE cannot be used in this way because other vectors are to be kept, INCLUDEALL must be used.

The GETDATA command of the DOS version (ISU) can import data sets created by ISUW, provided that ISU's length constraint (MaxLength = 16379) is satisfied.

--------------------------------------------------------------------------------
GETDATA   Imports data from StatUnit data set.
................................................................................
Syntax: GETDATA [dataset [vector1 [vector2 [...]]]]

If only the data set name is specified, all vectors in the data set are read.
The name, say A:\PROJECT\DATA1, is the name of the corresponding file A:\PROJECT\DATA1.SUD, created by SAVEDATA (written without the extension .SUD). If the file name "dataset" is omitted or replaced with an asterisk or a directory name, a file selection menu appears.

If a list of vector names is added, only these vectors are read. These names should, of course, be names of vectors in the data set. However, if other names of non-existing vectors are included by mistake, the remaining vectors will still be imported.

Names of vectors imported must not coincide with names of existing vectors. SHOW dataset.SUD will tell you if this is the case. Use RENAME as necessary. The reading is stopped if an error of this type occurs. This means that part of the command may be executed. However, the interrupt point does not depend on the order of vectors in the list, if present, but rather on the order in which vectors were stored originally by SAVEDATA. Try a RAMSTATUS if something goes wrong.

Restrictions are not taken into account and not changed by this command.

Data sets created by the DOS version ISU can be imported by GETDATA, provided that the special characters of the Danish-Norwegian alphabet do not occur in vector names.

--------------------------------------------------------------------------------
SUMMARY   Summary statistics for variates and factors.
................................................................................
Syntax: SUMMARY [vector1 [vector2 [...]]]

The parameters must be names of factors or variates. Information about the vectors is written. For a factor, the information includes length, number of levels, number of units present and the number of levels=0. For a variate, the information includes length, units present, the number of missing values, MAX, MIN, MEAN and standard deviation (for present and non-missing values).

If the command is used without parameters, summaries of all vectors present are given.
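As an illustration of what SUMMARY reports, the quantities listed above can be computed as in the following Python sketch. It is not ISUW's code: the function names are hypothetical, missing values are represented by NaN, and two details that the description leaves open are assumptions here, namely that missing values and levels=0 are counted among the units present, and that the standard deviation uses the divisor n-1. The variate sketch also assumes at least two usable values.

```python
import math
import statistics


def summarize_variate(values, present):
    """values: floats with NaN as missing; present: one boolean per unit."""
    shown = [v for v, p in zip(values, present) if p]
    used = [v for v in shown if not math.isnan(v)]   # present and non-missing
    return {
        "length": len(values),
        "present": len(shown),
        "missing": len(shown) - len(used),   # assumption: counted among present units
        "max": max(used),
        "min": min(used),
        "mean": statistics.mean(used),
        "sd": statistics.stdev(used),        # assumption: divisor n-1
    }


def summarize_factor(levels, n_levels, present):
    """levels: one integer level per unit; present: one boolean per unit."""
    shown = [v for v, p in zip(levels, present) if p]
    return {
        "length": len(levels),
        "levels": n_levels,
        "present": len(shown),
        "level0": shown.count(0),            # assumption: counted among present units
    }


age = [24.71, 42.12, math.nan, 22.13]
sex = [2, 1, 0, 2]
present = [True, True, True, False]
print(summarize_variate(age, present))
print(summarize_factor(sex, 2, present))
```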
--------------------------------------------------------------------------------
ONEWAYTABLE   One-way tables of counts.
................................................................................
Syntax: ONEWAYTABLE vector1 [vector2 [...]]

The parameters may be names of variates or factors.

If a parameter is the name of a factor, a one-way table of counts is produced, which for each level gives the number of units present. You can extend the name of the factor by a list of names separated by commas, as for the READ command. For example,
   ONEWAYTABLE SEX=Unknown,Male,Female
could produce something like
   Factor SEX, 100 units present.
   Unknown    1
   Male      43
   Female    56
If the parameter is the name of a variate, a table of counts is produced with cutpoints chosen by ISUW. But you can extend the name to specify lower limit, number of intervals and upper limit. For example
   ONEWAYTABLE X=2,10,7
would produce a table of counts of X-values in the ten intervals between cutpoints 2.0, 2.5, 3.0, ... , 6.5, 7.0.

ONEWAYTABLE obeys restrictions in the sense that non-present units are ignored. If the parameter is a variate, missing values are treated as non-present.

The commands TWOWAYTABLE and THREEWAYTABLE (see below) have additional options which enable the formation of tables of variate sums for a given variate instead of tables of counts. This is not implemented for ONEWAYTABLE, because it is usually just as easy to use TABULATE and LIST (see 3 screenfuls below). For example, to produce a one-way table of sums of a given variate Y over the groups determined by a factor F of the same length, use
   TABULATE Y1=Y F F1
   LIST F1 Y1

--------------------------------------------------------------------------------
TWOWAYTABLE   Two-way tables of counts or variate sums.
................................................................................
Syntax: TWOWAYTABLE [variate] factor1 factor2 The two last parameters must be names of factors of the same length, optionally extended by lists of level names (as for ONEWAYTABLE above). If the first parameter "variate" is not given (or written as the pseudo variate name 1) the command writes a table of counts of units present in the two-way classification (factor 1 as rows, factor 2 as columns). If the parameter "variate" is present, it must be the name of a variate of the same length, and a table of sums of its values over level combinations of the two factors is produced. The variate name can be extended by a format, like in a LIST command. However, this is mainly to let you decide the accuracy displayed when sums of variate values are tabulated. For tables of counts, the figures are displayed as integers, and if the width given by the format is too small, it will be increased as necessary. The default action of TWOWAYTABLE is to produce tables with row sums and column sums. To avoid one of these (or both), add a minus sign as the first character to the name of the factor(s) for which the additional "total" level should not be displayed. EXAMPLE. Suppose we have factors AGEGR and ATTITUDE, classifying some survey sample data according to age and answer to an attitude related question. The table of counts in this cross classification is produced by TWOWAYTABLE agegr attitude A table showing the distribution of ATTITUDEs within AGEGRoups in percentages with one digit after the decimal point can be produced by TABULATE rowsum agegr {see two screenfuls below} COMPUTE pct=100/rowsum(agegr) TWOWAYTABLE pct::1 -agegr attitude Notice the minus sign before AGEGR in the last command. The row sums in this table are 100, and they are displayed to make the interpretation of the percentages clear. But the column sums (and the total sum) are irrelevant and therefore suppressed. Restrictions are taken into account. 
Units with one or both factor levels equal to zero, or with a missing value of the variate (if specified), are handled as non-present. Thus, if you want tables where factor level zero is taken into account, you must recode the factors first.

WARNING. Most output producing commands in ISUW break up lines in such a way that the width of the output file does not exceed 80 characters. TWOWAYTABLE (and THREEWAYTABLE below) are exceptions to this. If the second factor has many levels, a very wide table is produced, and this may result in lines of length > 80 (up to 1024, in fact). This gives you the option of producing such a table and later printing it in a readable format after some editing, such as a change to a smaller font.

--------------------------------------------------------------------------------
THREEWAYTABLE    Three-way tables of counts or variate sums.
................................................................................

Syntax:  THREEWAYTABLE [variate] factor1 factor2 factor3

Exactly as TWOWAYTABLE, except that three factors are specified and a three-way table is written. For each level of factor1, a factor2 by factor3 table is produced, and unless factor1 is preceded by a minus sign, an additional factor2 by factor3 table of totals (summed over factor1) is written.

Notice that level 0 of factor1 is excluded also in the "total" two-way table by factor2 and factor3, which comes last in the listing. Hence, this final table will not always coincide with the table one would get by

    TWOWAYTABLE [variate] factor2 factor3

--------------------------------------------------------------------------------
TABULATE    Computation and storage of counts or variate sums in a k-way table.
................................................................................

Syntax:  TABULATE newvar[=oldvar] oldfactors [newfactors]

EXAMPLE.
If ROW and COL are existing factors of the same (arbitrary) length on 3 and 4 levels respectively, then

    TABULATE COUNT ROW*COL ROW1*COL1

will do as follows. A variate COUNT and two factors ROW1 and COL1 of length 12 (=3*4) will be created, ROW1 on 3 and COL1 on 4 levels. If some of these exist already, they must be of correct types and dimensions. Factor levels will be generated such that each level combination occurs exactly once, COL1 varying fastest, and the corresponding counts of units in the original setting, corrected for restrictions (units with a level 0 counting as excluded), will be stored in COUNT.

Generally, the two string parameters oldfactors and newfactors must contain the same number (not necessarily two) of factor names, separated by asterisks * or "pseudoblanks" | . The factors in oldfactors must be previously declared and of equal lengths. The names in newfactors and the new variate name(s) occurring in the first parameter must not be names of existing vectors, unless they just happen to be of correct types, lengths and numbers of levels (e.g. created by a similar TABULATE command). The common length of the vectors created becomes the product of the numbers of levels for the "old factors".

TABULATE can also be used to form sums of values of a given variate. If COUNT above is replaced with Y_SUM=Y, for Y a variate of length equal to the lengths of the factors ROW and COL, then the values of Y_SUM will become the sums over the corresponding (product) factor levels of the values of Y. If Y was a variate filled with 1s, the result would be the same as above.

The first parameter may contain several specifications separated by pseudoblanks.
For example, TABULATE COUNT|Y_SUM=Y ROW*COL ROW1*COL1 would perform both tasks mentioned above, and is thus equivalent to the two commands TABULATE COUNT ROW*COL ROW1*COL1 TABULATE Y_SUM=Y ROW*COL ROW1*COL1 In the last command here, we could actually have written TABULATE Y_SUM=Y ROW*COL omitting the last argument ROW1*COL1. This is legal, and in general it implies that only the variate is formed. In the present case it makes no difference since the factors ROW1 and COL1 are generated by the first command. Restrictions are taken into account in the sense that non-present units are not counted, or the corresponding variate values are treated as zeroes in the summation. Missing values of a summand are treated as zeroes. WARNING. Notice that if restrictions are present, it will usually be unavoidable to continue with INCLUDEALL. The restrictions on the "long" vectors are not likely to be relevant for the resulting "short" vectors. EXAMPLE. To produce a list of average INCOMEs of persons in the 30 groups of a 2 x 5 x 3 classification by factors SEX (2 levels), SITE (5 levels) and SOC (3 levels), do something like this: EXCLUDEMISSING INCOME TABULATE COUNT|INCOME0=INCOME SEX*SITE*SOC SEX0*SITE0*SOC0 INCLUDEALL INCOME0=INCOME0/COUNT LIST SEX0 SITE0 SOC0 INCOME0::0 Notice the EXCLUDEMISSING and INCLUDEALL commands, which are required if INCOME has missing values. Without this, the COUNTs would include units for which INCOME were missing, and this would result in incorrect averages (since missing INCOMEs are treated as zeroes). EXAMPLE. 
If Y is a variate, R and C two (row and column) factors that arrange the values of Y in a balanced two-way table, fitted values in the additive two-way model (which would usually be computed by SAVEFITTED FIT after FITLIN Y=1+R+C) can be computed by TABULATE rowsums=y|rowcoun R TABULATE colsums=y|colcoun C rowmeans=rowsums/rowcoun colmeans=colsums/colcoun fit=rowmeans(r)+colmeans(c)-mean(y) (to understand the last line, see the section VECTORS AS FUNCTIONS OF UNIT INDEX in the description of the COMPUTE command). -------------------------------------------------------------------------------- PLOT Scatter plots on screen and paper. ................................................................................ Syntax for 2-dimensional version: PLOT xvariates yvariates [colors [symbols]] and for the 3-d version: PLOT xvariates yvariates zvariates [colors] We begin with a description of the 2-dimensional version. A description of the modifications required for the 3-d version follows approximately 11 screenfuls below. In the simplest case, xvariates and yvariates are just names of single variates and the two other parameters are not written, like PLOT X Y which will produce a scatter plot of the points ( X(i) , Y(i) ). Restrictions are obeyed in the sense that non-present points are not plotted, and also points with one or both coordinates missing are skipped. Endpoints of axis intervals are chosen in such a way that the coordinate frame becomes the smallest rectangle containing all the points to be plotted. The variate names can be extended by axis specifications of the form =LowerLimit,NumberOfLabels,UpperLimit. For example, PLOT X=-1.0,10,9 Y implies that the horizontal axis will go from -1 to 9, labels will be displayed at integer multiples of 1=(9-(-1))/10, and one decimal after the comma will be written (since the lower bound for X is given with one decimal). A minus sign before the number of intervals (e.g. 
-10 instead of 10 in the example above) will produce lattice lines at the label points. Any field can be left empty, and the first field may contain a "pseudo number" determining only the number of digits. For example

    PLOT X=-1.0,10,9.0 Y=d.dd,10,

will imply that the vertical axis is equipped with 10(+1) labels with 2 digits after the decimal point, but the defaults MIN(Y) and MAX(Y) will be used as the limits, since these are not given.

Points that are not within the specified limits (in one or both directions) are not plotted.

A heading can be given in a separate FRAMETEXT command before the plot command. Similarly, XTEXT and YTEXT can be used to specify texts to be written at the two axes, otherwise the variate names are used.

EXAMPLE.

    VAR X Y 100
    X=RANDOM
    Y=RANDOM
    FRAMETEXT 100 random points
    PLOT X=0.00,10,1.00 Y=0.00,10,1.00 =red =*

These five commands would produce something like this (except that the points will be red):

    [Character plot: heading "100 random points"; a rectangular frame with the vertical axis Y labelled 0.00, 0.10, ... , 1.00 and the horizontal axis X labelled 0.00, 0.10, ... , 1.0; the 100 points drawn as stars inside the frame.]

In the above example, the last two parameters imply that the points will be plotted as red stars. More generally, the third and fourth parameters specify colors and symbols according to the following rules.

The third parameter colors, if non-empty, can be specified as an equality sign followed by a color number or a color name. Color numbers and their names can be found in the table below. This is merely for the case where you want to plot all points in a color different from the default 0 (=black).
A more relevant application of the color specification is to choose color according to the level of a factor. The color parameter can be specified as the name of a factor of length equal to the common length of the two variates, followed by a comma separated list of color codes. For example, if WEIGHT and HEIGHT are variates, SEX a factor on two levels, all of the same length, then

    PLOT HEIGHT WEIGHT SEX=,9,12

will produce a scatter plot with light blue points for SEX=1 and light red points for SEX=2. You can also let ISUW assign the colors in standard order, by

    PLOT HEIGHT WEIGHT SEX

which is equivalent to

    PLOT HEIGHT WEIGHT SEX=15,1,2

In general, if only the factor is specified, but not the colors, the list '15,1,2,3,4,5,6,7,8,9,10,11,12,13,14,0,0,0,0...' is implied. By changing 15 to something else you could select a color different from 15 (white, no symbol plotted) for the factor level 0.

TABLE OF COLORS.

    No.  Name
     0   BLACK
     1   BLUE
     2   GREEN
     3   CYAN
     4   RED
     5   MAGENTA
     6   BROWN
     7   GRAY or LIGHTGRAY
     8   DARKGRAY
     9   LIGHTBLUE
    10   LIGHTGREEN
    11   LIGHTCYAN
    12   LIGHTRED
    13   LIGHTMAGENTA
    14   YELLOW
    15   NONE or WHITE

Names of colors can be used instead of the numbers. Only the first eight characters of a color name need to be specified. To see the 16 colors of ISUW, copy and paste the following program into the editor and RUN it:

    fac $colfac 16 15
    $colfac=#-1
    var $x $y 16
    $x=$colfac
    $y=1
    xtext Colors
    ytext |
    frametext Colors in ISUW
    plot $x $y=0 $colfac=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 #

The fourth and last parameter symbols plays a role similar to that of the color parameter, but it determines the plot symbol instead of the color. For example (referring to the WEIGHT HEIGHT SEX example above),

    PLOT HEIGHT WEIGHT | SEX=,+,o

would plot points corresponding to SEX=1 as plusses, and points with SEX=2 as small circles. If the list of symbols is omitted, '=0,1,2,3,4,5,6,7,8,2,2,2...' is implied. Notice the "pseudo blank" | occurring here as parameter 3.
It simply means that we want the default color (black). Symbols can be identified by numbers or names, according to the following table: Table of plot symbols: No. Name 0 NONE 1 X or CROSS 2 + or PLUS 3 O or CIRCLE 4 * or STAR 5 DELTA or TRIANGLE 6 NAPLA {upside down triangle} 7 SQUARE 8 DIAMOND {45 degrees rotated square} To see the 8 symbols and 16 colors, copy and paste the following program into the editor and RUN it: fac $colfac 16*9 16 gen $colfac 1 fac $symfac 16*9 9 gen $symfac 16 var $col $sym 16*9 $col=$colfac-1 $sym=$symfac-1 xtext Colors ytext Symbols frametext Colors and plot symbols in ISUW plot $col=-1,17,16 \ $sym=-1,10,9 \ $colfac=,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \ $symfac=,1,2,3,4,5,6,7,8 A special use of the symbol parameter takes the form =L which has the effect that points on the same level of the factor are connected by lines, provided that they come right after each other in the ordering by unit number. Thus, the effect of this depends strongly on the order of units. A sorting by the factor with the X-variate as the secondary criterion is the most common application, drawing factor groups as broken lines (Y-variate as a function of X-variate). EXAMPLE. Suppose that TIME and TEMP are variates, LOCALITY a factor on 4 levels, all of the same length. The commands SORT LOCALITY TIME TEMP PLOT TIME TEMP LOCALITY=12,13,14,0 LOCALITY=L will produce a plot where, for each LOCALITY, TEMP is drawn as a (linearly interpolated) function of TIME, and the color (light red, light magenta, yellow, black) follows the LOCALITY. As for the color parameter, the factor identifier in "symbols" can be omitted, meaning that a factor with constant level zero is assumed. 
For example, to connect all points by a broken line use PLOT X Y | =L Another special use of the symbol parameter, which involves no factor name and no equality sign, takes the form +v1-v2 (or, equivalently, -v2+v1 ) where v1 and v2 are names of variates of the same length as xvariates and yvariates. This is used when vertical lines through the points should be drawn to indicate e.g. confidence bounds. The typical application (for symmetric confidence intervals) is PLOT X Y ... +SD-SD where SD is a variate holding standard deviations or double standard deviations. In the typical application, both variates will have non-negative values. Hence, it is a natural requirement that the two signs must be different when two variates are specified. If only one is specified, a half line from each point is drawn, the sign determining which way the line goes. Notice that PLOT X Y=0 =14 -Y will produce a plot with (yellow) points sitting on top of "sticks" (only relevant if Y is nonnegative). A third option for the last parameter "symbols" is to let it consist of the single character #. In this case each point is represented by a box standing on the x-axis, of suitable width with the invisible point right in the middle of its top. The width of these boxes becomes 0.8 times the range for the x-variate, divided by the number of units present. This is usually only relevant if the values of the x-variate are equidistant. Notice that the y-axis bounds are determined as usual if nothing else is specified. Usually, a specification of the form y=0,... is required if the lower endpoint of the y-axis should be 0. This can be used to draw histograms when counts (or percentages) are given as the values of a variate (e.g. produced by a TABULATE command). Notice that automatic definition of x-axis bounds as Max and Min in this case would imply that the first and last box hang halfway outside the frame. For this reason, a small correction is made in this case. 
But for overlaid plots, this will only work if the histogram is the first plot in the parallel list (see below).

If colors are specified, they will be used as fill colors of the boxes. If you produce 'stacked' histograms by overlaying such plots (see below), take care that the histograms with the lower boxes come after those with the higher boxes. Similarly, if you plot a histogram together with a curve - e.g. to show the fit of a distribution - plot the curve after the histogram, unless you want to hide it partially behind the boxes.

OVERLAID PLOTS. Overlaid plots are produced by 'merging' of PLOT commands as follows. Let the four parameters xvariates, yvariates, colors and symbols each contain two, three or more parallel specifications, separated by pseudo blanks. For example,

    PLOT X=0.0,5,10|X Y|FITTED GROUP=,14,13,12|=WHITE =|GROUP=L

will (roughly) overlay the results of the two commands

    PLOT X=0.0,5,10 Y GROUP=,14,13,12 =
    PLOT X FITTED =WHITE GROUP=L

Notice the use of an equality sign alone as indicating an empty element of a parallel list.

The limits and labelling of axes are taken from the specifications in xvariates and yvariates for the first element of the parallel lists; later specifications are ignored. Notice that such specifications are usually necessary, unless the ranges of variation for the first elements happen to cover the ranges for later elements. This is why Y comes before FITTED in the above example (but things may go wrong, also in this case).

HARDCOPIES. The immediate effect of a PLOT or (see below) HISTOGRAM command is that a picture is displayed on the screen. The picture is removed by Escape. There are two ways of getting the plots out on paper.

The simplest is to press Return instead of Escape to remove the picture from the screen. This brings up a "Save picture as ..." dialog box, in which you can select a file name and save the picture as a *.JPG file.
A *.BMP (bitmap) file can also be selected, but these files are much bigger, and since most image processing programs can import both types, there is generally no reason to do so. If, for some reason, you need to produce a *.BMP file, you should be aware that this will only work if you write the file name without the extension or explicitly with extension .BMP. Both file types can be imported to image processing programs and most text handling programs running under Windows, where they can be modified, merged with text and other pictures and printed out.

To produce graphics of a somewhat higher resolution (but without colors), optionally with several plots per page, you can use the commands OPENPS, PSFRAME and CLOSEPS for handling of PostScript files. For example, to produce a single plot in landscape format on a page, use

    OPENPS filename    { opens a file for PostScript output }
    PSFRAME 1 1        { selects "first picture out of one" }
    PLOT ...
    CLOSEPS            { closes the file }

After this, the file filename.PS will contain PostScript code that can be sent directly to a PostScript printer. OPENPS can also produce encapsulated PostScript files, which can be imported e.g. by Microsoft Word. See the command descriptions.

When PostScript code is generated, color codes are handled as follows. When points are plotted, all colors are translated to black, except no. 15 which is translated to white (no point plotted). For boxes (symbol code #), where the color usually controls the fill color, color codes are translated to "whiteness" proportionally to their numerical values, with 0 meaning black and 15 meaning white.

THE 3-DIMENSIONAL VERSION OF THE PLOT COMMAND. This is activated when the third parameter is (or begins with) a variate name. In the simplest case

    PLOT X Y Z

a 3-dimensional scatter plot is produced.
As for the 2-d version, the variates can be extended by information about their domain and the desired numbers of cutpoints, and the four parameters may be parallel lists of any (common) length, to produce overlaid 3-d plots. The fourth parameter "colors" determines the colors of points. The syntax here is exactly the same as for the corresponding parameter in the 2-d version.

The differences from the 2-d version are explained here:

The plot symbol is fixed (a three-dimensional cross), and points can not be connected by lines.

Negative numbers of cutpoints are interpreted as 0, except in the third parameter zvariates, where the sign has an effect quite different from the one it has for the 2-d version, see below.

The effect of specifying numbers of cutpoints is that whenever two of the three variates have a positive number of cutpoints, the corresponding lattice will be drawn on the corresponding (back/bottom/left) side of the frame box. For example,

    PLOT X=0,10,1 Y=0,10,1 Z

or equivalently

    PLOT X=0,10,1 Y=0,10,1 Z=,0

will produce a plot with a 10 by 10 lattice drawn at the bottom of the frame box. If instead

    PLOT X=0,10,1 Y=0,10,1 Z=,1

is used, vertical lines will also be drawn on the back and left sides of the frame box.

A minus sign before the number of cutpoints for the Z-variate has the effect that vertical "sticks" are drawn from points to the bottom of the frame box. Similarly, a plus sign here has the effect that the points will "hang in strings from the roof". This option may be specified without an actual number of cutpoints, like

    PLOT X Y Z=,+

For hardcopies, this option is recommended, because it is the only way to give the copies a taste of the 3-d perspective (since it does not help much to rotate the paper).

Once the picture is on the screen, the following keys are active (this applies also to the 3-d version of HISTOGRAM):

- The four cursor movement keys rotate the frame box (or rather: move the flying observer around the coordinate box).
- The keys + and - on the numerical keyboard zoom and unzoom. - Escape or Return terminates. Use Return if you want to save the final picture as a .JPG file. Labels and axis titles can not be created by the 3-d version of the PLOT command. The initial picture on the screen tells which axis is which, but after this you are on your own. Here some lattice lines may help you to avoid losing orientation. Hardcopies of the final picture via .JPG files or .BMP files can be produced exactly as described above, but PostScript code can not. A PostScript output file may be open, but 3-d plots will not write any code to it. 3-d can only be produced in interactive mode, not in programs. -------------------------------------------------------------------------------- HISTOGRAM Histograms on screen and paper. ................................................................................ Syntax for 2-dimensional version: HISTOGRAM [weight*]variate [factor] and for the 3-dimensional version (3-d histogram for a 2-dimensional empirical distribution) HISTOGRAM variate1 variate2 Without the second parameter "factor" and the specification of "weight", the command produces an ordinary histogram, showing the empirical distribution of the (present and non-missing) values of the variate. Notice that HISTOGRAM computes the counts. If the counts are known in advance and stored in a variate (for example by a TABULATE command), use PLOT instead (with fourth parameter '#', see 6 screenfuls above). Or use the "weighted" version of HISTOGRAM explained 1-2 screenfuls below. If the second parameter "factor" is included and is the name of a factor of the same length, one histogram for each factor level except level 0 is produced, for comparison of the distributions of variate values on the different factor levels. The factor name can be extended by a list of level names as usual. However, level 0 is always ignored. 
In some cases (long level names, many intervals) the level names are written in such a way that the first box is more or less destroyed by the text string. The variate name can be extended by a specification of the form =lower,intervals,upper to determine the number of intervals and its endpoints. For example, HIST AGE=20,16,100 SEX will produce a histogram for each SEX, with the AGE interval [20,100] divided into 5-year groups. AGEs outside the interval [20,100] are ignored. If no extension of this form is given, min(AGE) and max(AGE) are taken as the endpoints, and the number of intervals is suitably chosen. This specification may be more or less incomplete. For example, HIST AGE=d.dd,16 SEX would use the default endpoints max(AGE) and min(AGE), but the number of intervals would become 16. The layout of the "pseudo left endpoint" d.dd implies that x-axes labels are written with two significant digits after the decimal point. The first parameter may also be the name of a factor. In this case, the factor is interpreted as a variate with integer values (0,)1,2,..., and the histogram is drawn in the obvious way, with one box for each level of the factor. However, if the factor name is extended by a list of level names, these names will be written under the frame instead of the integer levels. In this case, a non-empty level name for level 0 will imply that a box for level 0 is also drawn. For a factor with many levels, the level names must be short. If the number of levels exceeds, say, 15, it is usually preferable to let the procedure write the integer levels. Weights. In the versions of the HISTOGRAM command explained above, the heights of the boxes drawn are counts of units, summing up to the number of units present and non-missing in the variate or factor given as the first argument. 
Sometimes, in particular when dealing with aggregated data obtained by grouping of a variate or factor, it is desirable to make a histogram where each unit i counts some value w[i] instead of 1. The syntax for this is to "multiply" the first argument by this "weight" variate from the left. EXAMPLE. To display a distribution of individuals according to the levels of a factor AGEGR for each SEX level we can use HIST AGEGR SEX Suppose, however, that we have at our disposal only the counts in the AGEGR*SEX groups - for example as they would be after TABULATE COUNT AGEGR*SEX AGEGR0*SEX0 To produce the same two parallel histograms as before, we could then use HIST COUNT*AGEGR0 SEX0 To produce a single histogram, showing the distribution of individuals in age groups with the vertical axis scaled such that the box heights are percentages (summing up to 100), we could use PERCENT=100*COUNT/sum(COUNT) HIST PERCENT*AGEGR0 THE 3-DIMENSIONAL VERSION OF THE HISTOGRAM COMMAND. If the second parameter is included and is the name of a variate, a different StatUnit procedure, making a 3-dimensional picture of the 2-dimensional histogram, is activated. The two variate names can be extended by limits etc. as described above. The picture can be rotated, zoomed/unzoomed etc. as described for the 3-d version of the PLOT command, see approximately 4 screenfuls above. Weights can not be used, and parallel histograms can not be produced. HARDCOPIES are produced exactly as for PLOTs. But the 3-d version of the HISTOGRAM command can not be used in programs, and cannot produce PostScript code -------------------------------------------------------------------------------- FRAMETEXT Text above frame for 2-d versions of PLOT and HISTOGRAM. ................................................................................ Syntax: FRAMETEXT [text] The default action of PLOT is to draw scatter plots without any heading. 
For HISTOGRAM, the default is "Histogram for " followed by the variate name, or (in case of two parameters) a heading of the form "Histogram for ... by ..." with the variate name and the factor name inserted. These defaults are suppressed as long as a FRAMETEXT command is in force, which remains until the command FRAMETEXT (without parameters) or a DELETE command without parameters is given.

Multiple blanks in a string are usually removed, but you can use "pseudo blanks" to enforce multiple blanks. In particular,

    FRAMETEXT |

will suppress headings entirely in the following PLOT and HISTOGRAM commands.

--------------------------------------------------------------------------------
XTEXT    Text below frame for PLOT and HISTOGRAM.
................................................................................

Syntax:  XTEXT [text]

The default action of PLOT is to write the variate name(s) at the x-axis. For HISTOGRAM, the default is to write nothing there in case of a single parameter, and the name of the variate if two parameters are specified. These defaults are overridden by an XTEXT command. Other rules (for pseudo blanks, cancellation of action etc.) are exactly as for FRAMETEXT above.

--------------------------------------------------------------------------------
YTEXT    Text at vertical axis for PLOT and HISTOGRAM.
................................................................................

Syntax:  YTEXT [text]

The default action of PLOT is to write the variate name(s) at the y-axis. For HISTOGRAM, the default is to write "Count" there in case of a single parameter, the factor name if two parameters are given. These defaults are overridden by a YTEXT command. Other rules (for pseudo blanks, cancellation of action etc.) are exactly as for FRAMETEXT above. The command

    YTEXT |

is often useful for enlargement of the labels on the y-axis.

--------------------------------------------------------------------------------
OPENPSFILE    Open file for output of PostScript code
................................................................................

Syntax:  OPENPSFILE [filename]

If the file name is empty the default name ISUW.PS is used.

Use this together with PSFRAME and CLOSEPSFILE to send graphics to a printer in PostScript format.

EXAMPLE (two plots on a page).

    OPENPS
    PSFRAME 1 2    {can be omitted, since this is the default}
    PLOT ... or HIST ...
    PSFRAME 2 2    { "second of two", i.e. lower half of the paper }
    PLOT ... or HIST ...
    CLOSEPS

... to be followed e.g. by

    COPY ISUW.PS LPT2

from a command window, or a similar operation via Windows Explorer, GhostScript or whatever.

If "filename" is specified explicitly with extension .EPS, encapsulated PostScript code is produced. This is only for a single plot per page (the PSFRAME command can not be used), but these files can be imported to some text handling programs.

When a PostScript file is open in interactive mode, the images produced are shown on the screen as usual, and you still have the option of saving them (in addition) as .JPG or .BMP files. In programs this option is not available. The images are formed on the screen as usual, but they are removed immediately. This enables you to write programs that can run unattended while they produce one or more PostScript pages.

--------------------------------------------------------------------------------
PSFRAME    Selecting the position on the page for PostScript graphics output.
................................................................................

Syntax:  PSFRAME select total

The two parameters must be integers or integer expressions, satisfying

    select = 1,2,...,total,
    total = 1, 2, 4, 8, 9, 16, 18, 25, 32, 36, 49, 50, 64 or 72.

The idea is that the command selects frame no. "select" out of "total" equally sized frames on the paper. The values total = 2, 8, 18, 32, 50, 72 (double squares) produce pages in "portrait" format. The remaining values 1, 4, 9, 16, 25, 36, 49 and 64 (squares) produce pages in "landscape" format.

EXAMPLE. To produce 3 plots in a 4 x 2 arrangement, leaving the lower half of the paper and the lower right corner of the upper half blank,

    -------------
    |  1  |  2  |
    -------------
    |  3  |     |
    -------------
    |     |     |
    -------------
    |     |     |
    -------------

do something like this:

    OPENPS
    PSFRAME 1 8
    PLOT ...    { 1 }
    PSFRAME 2 8
    PLOT ...    { 2 }
    PSFRAME 3 8
    PLOT ...    { 3 }
    CLOSEPS

A special version of the command takes the form

    PSFRAME # total

Here, the number of frames on the page is given by the last parameter, but the specification of the frame by the symbol # implies that the first parameter "select" will take the values 1, 2, ..., shifting by 1 each time a PLOT or HISTOGRAM command is met.

EXAMPLES. In the example above, we could obtain exactly the same by

    OPENPS
    PSFRAME # 8
    PLOT ...    { 1 }
    PLOT ...    { 2 }
    PLOT ...    { 3 }
    CLOSEPS

The following program generates a PostScript page with 32 probit diagrams for 100 simulated normal observations.

    DEL
    VAR x probit 100
    probit=phiinv(#/101)
    OPENPS
    $i=0
    %%%
    $i=$i+1
    PSFRAME $i 32
    x=normal
    SORT x
    PLOT x probit =15 =L
    GOTO %%% $i<32
    CLOSEPS

Here, we could also have put the PSFRAME command before the loop if we had given it the form PSFRAME # 32.

If no PSFRAME command is used, ISUW uses the default which corresponds to PSFRAME 1 2.

The command can not be used when OPENPS has specified an encapsulated PostScript file (extension .eps).

--------------------------------------------------------------------------------
CLOSEPSFILE    Close file for PostScript graphics output.
................................................................................

No parameters. Closes the file for output of PostScript code. Use with OPENPSFILE, PSFRAME, PLOT and HISTOGRAM.

--------------------------------------------------------------------------------
SORT    Parallel sorting of vectors.
................................................................................

Syntax:  SORT vector1 [vector2 [...]] [+]

The vectors occurring as parameters must be of the same length.
The effect of this command is that the vectors are sorted in parallel, such that the resulting vectors are ordered increasingly, primarily by values/levels of vector1, then (within constant value/level of vector1) by vector2, and so on. A final plus sign has the effect that all other vectors of the same length are sorted in parallel with those in the list; but the ordering within tie groups (if any) determined by the vectors in the list becomes arbitrary.

EXAMPLE. Let X be a variate and SEX a factor on two levels, both of length 10, with values/levels as LISTed here:

    SEX  X
     1   1.5933
     1   1.8114
     1   1.5953
     2   2.0208
     2   1.6493
     1   1.9338
     1   1.9628
     2   1.2509
     1   1.2309
     2   1.5081

Then, after the command

    SORT SEX X

a similar LISTing would look like this:

    SEX  X
     1   1.2309
     1   1.5933
     1   1.5953
     1   1.8114
     1   1.9338
     1   1.9628
     2   1.2509
     2   1.5081
     2   1.6493
     2   2.0208

In this case, you might want to perform the sorting by SEX only, preserving the order of X-values unchanged within SEX-groups. However, you can not do this by just writing SORT SEX, because only the vectors in the list are sorted; nor can you do it by SORT SEX +, because the QuickSort algorithm will create arbitrary permutations of units within SEX groups. A solution to this problem goes as follows: Create (if it doesn't exist already) a vector which is ordered by unit number, for example by

    VAR UNIT 10
    COMPUTE UNIT=#

Include this as the second vector in the list,

    SORT SEX UNIT X

or just (if other vectors of length 10 are present and should be sorted in parallel)

    SORT SEX UNIT +

With the original unit number as the secondary criterion, the original order within SEX-groups is preserved.

Notice that a SORT command without a final plus sign should usually have all vectors of the relevant length in the list of parameters. Otherwise, the unit-to-unit correspondence between vectors is lost, and this is rarely useful. If you make a sorting without some vectors of the relevant length, a warning is given.
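
The unit-number trick described above can be imitated outside ISUW. The following Python sketch is not ISUW code; the helper parallel_sort is a hypothetical name introduced only for this illustration. It sorts the example data lexicographically by the key vectors and carries the remaining vectors along. Because the combined key (SEX, UNIT) is different for every unit, the result does not depend on how the underlying sorting algorithm treats ties, which is the point of adding UNIT as the secondary criterion.

```python
# Data from the SORT example: SEX levels and X values for units 1..10.
sex = [1, 1, 1, 2, 2, 1, 1, 2, 1, 2]
x = [1.5933, 1.8114, 1.5953, 2.0208, 1.6493,
     1.9338, 1.9628, 1.2509, 1.2309, 1.5081]
unit = list(range(1, len(sex) + 1))  # plays the role of COMPUTE UNIT=#


def parallel_sort(keys, *others):
    """Sort all vectors in parallel (hypothetical helper, not an ISUW call).

    The ordering is lexicographic in the key vectors; the vectors in
    `others` are only carried along. Returns the sorted key vectors
    followed by the sorted other vectors.
    """
    n = len(keys[0])
    order = sorted(range(n), key=lambda i: tuple(k[i] for k in keys))
    return [[v[i] for i in order] for v in list(keys) + list(others)]


# Analogue of SORT SEX X: X is itself a sorting criterion.
sex_a, x_a = parallel_sort([sex, x])

# Analogue of SORT SEX UNIT X: X is carried along, and the original
# order of units is kept within each SEX group.
sex_b, unit_b, x_b = parallel_sort([sex, unit], x)

for row in zip(sex_b, unit_b, x_b):
    print(row)
```

In the second call the X-values within each SEX group appear in their original order, whereas the first call reorders them increasingly, as in the listing above.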
SORT ignores restrictions, and if restrictions are present the restriction indicator is NOT sorted in parallel. Thus, the excluded units, if any, are not the same as before. For this reason, a warning is given if restrictions are present under a SORT command. If you forgot to do it before, you may as well right after a SORT command perform an INCLUDEALL command. If you want to preserve the restrictions, you will have to construct your own "restriction indicator". For example by

   FAC PRESENT N 1       { where N stands for the common length of     }
                         { the vectors to be sorted.                   }
   COMPUTE PRESENT=1     { PRESENT becomes the "presence indicator",   }
                         { since the default value 0 remains unchanged }
                         { for the non-present units.                  }
   INCLUDEALL            { this command can also be placed after the   }
                         { SORT command, it doesn't matter.            }
   SORT ...              { where the argument list should include      }
                         { PRESENT or end with a +                     }
   EXCLUDE PRESENT=0
   DELETE PRESENT

Missing values of variates are treated according to their physical representation, which is the numerical value -1E-37. Thus, after a SORTing by values of a variate, the missing values will occur after the negative values and before the zeroes.
--------------------------------------------------------------------------------
COMPUTE Computation of variates and factors from other variates and factors.
................................................................................
Syntax: COMPUTE vectorname=expression
or      COMPUTE vectorname(index) = expression
or      COMPUTE expression[:[width]:[decimals]]

In interactive mode, the command name COMPUTE can be replaced with a blank in position 1 of the command window. In programs you may even omit the blank provided that the COMPUTE command has a left hand side. In this case the command is identified by the equality sign after the first connected string. In interactive mode you can generate a COMPUTE command from position 1 by = (with equality sign) or + (without equality sign).
In programs, COMPUTE commands without a left hand side can be preceded by = or + instead of the command name.

In the simplest case, the action taken by COMPUTE is a unit-by-unit computation of values of the vector on the left hand side. For example, if Y is a variate of length 100, the statement

   LOGY=ln(Y)/ln(10)

will give LOGY values ln(Y(1))/ln(10), ... ,ln(Y(100))/ln(10). If LOGY is declared in advance it must be a variate of length 100, otherwise it will be declared as such.

The expression on the right hand side may involve the algebraic operators +, -, *, / and ^ (meaning "raised to the power", e.g. (-2)^3=-8), six relational operators (see 2 screenfuls below), explicit constants, standard functions (see 5 screenfuls below) and parentheses as necessary. All the vectors occurring on the right hand side must be existing vectors of the same length, with some exceptions to be explained later. We call them "parallel" vectors, to emphasise the one-to-one correspondence between their entries and the entries of the resulting vector on the left hand side.

A COMPUTE command without a left hand side computes and displays the result, rather than storing it in a vector. For example, the result of the command

   COMPUTE X/Y:6:2

for existing variates X and Y of the same length is roughly the same as you would obtain by

   COMPUTE RATIO=X/Y
   LIST1 RATIO:6:2
   DELETE RATIO

For this reason, most of what follows is only explained for the case where a left hand side is present.

MISSING VALUES. Missing values are taken into account by COMPUTE in the sense that the result will always become missing for entries where one of the parallel vectors on the right hand side has a missing value. Even if you write

   COMPUTE Y=X+0*Z

a missing value of Z will result in a missing value of Y. Algebraically or numerically undefined quantities, such as ln(0) or ln(-4), sqrt(12-20), (-7)^1.3, exp(100) etc., are set to the missing value, and a warning about this is given.
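The propagation rule for missing values can be mimicked in a few lines of Python. This is only a sketch of the rule as described above, not ISUW code: the marker MISSING and the helpers add, mul and safe_ln are assumptions of the illustration, and ISUW's warnings are not reproduced.

```python
import math

MISSING = None  # stand-in for ISUW's missing value (assumption of this sketch)


def add(a, b):
    """a + b, missing as soon as one operand is missing."""
    return MISSING if a is MISSING or b is MISSING else a + b


def mul(a, b):
    """a * b, missing as soon as one operand is missing (even if the other is 0)."""
    return MISSING if a is MISSING or b is MISSING else a * b


def safe_ln(v):
    """ln(v), or MISSING when the argument is missing or not positive."""
    if v is MISSING or v <= 0:
        return MISSING
    return math.log(v)


x = [1.0, 2.0, 3.0]
z = [5.0, MISSING, 7.0]

# Counterpart of COMPUTE Y=X+0*Z: the missing Z(2) makes Y(2) missing,
# although 0*Z would be 0 for every non-missing Z.
y = [add(xi, mul(0, zi)) for xi, zi in zip(x, z)]
print(y)

# Undefined quantities such as ln(0) and ln(-4) also become missing.
logs = [safe_ln(v) for v in [1.0, 0.0, -4.0]]
print(logs)
```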
RESTRICTIONS. Restrictions are obeyed in the sense that for non-present units the vector on the left hand side is left unchanged. An exception occurs when the result is a variate of length 1; this is always computed.

EXAMPLE. The statements

   FOCUSONLEVEL F 3
   COMPUTE Y=1/0
   INCLUDEALL

can be used to give Y missing values for all units on level 3 of the factor F. However, a shorter way of doing this is by the single command

   COMPUTE Y=Y/(F<>3)

RELATIONAL OPERATORS. In the command line above, the denominator (F<>3) becomes 1 for units on an F-level different from 3, 0 for units on level 3. The following six relational operators

   =    equal to
   <>   different from
   <    less than
   <=   less than or equal to
   >    greater than
   >=   greater than or equal to

can be used on the right hand side of a COMPUTE command, and the resulting boolean expressions are given values 1 (for TRUE) or 0 (for FALSE). For example,

   COMPUTE INDIC=exp(X)>Y+3

will return a variate INDIC of zeroes and ones, 1 when exp(X)>Y+3. As a more complicated example,

   COMPUTE MINXY=(X<=Y)*X+(Y<X)*Y

[...] 1 on the right hand side. If the result is to become a factor, it must be declared in advance, see 6-7 screenfuls below.

FACTORS ON THE RIGHT HAND SIDE. Factor dummies can be written like (F=2), meaning the variate which is 1 when the factor F takes the level 2, 0 otherwise. More generally, factors of the correct length may occur on the right hand side, where they are interpreted as parallel variates with their numerical levels as values.

EXAMPLE. If F is a factor on three levels,

   COMPUTE X= 1.1*(F=1) + 2.3*(F=2) + 4.8*(F=3)

will result in a variate X with values 1.1, 2.3 and 4.8, determined by the levels of F.

UNIT-BY-UNIT FUNCTIONS.
Vector valued functions, operating unit by unit, are

   EXP()      the exponential function
   LN()       log (base e, use LN()/LN(10) if you want base 10)
   SQR()      x -> x*x
   SQRT()     x -> square root of x
   ABS()      x -> |x|
   SIN()
   COS()      well-known trigonometric functions
   ARCTAN()
   INT()      x -> [x] (integer part, upwards rounding for negative argument)
   PHI()      The c.d.f. of the standard normal distribution.
   PHIINV()   The inverse of PHI().
   NORMAL     simulated standardised normal values.
   POISSON()  random Poisson distributed values with the argument as
              parameter. If the parameter is 0 or negative, 0 is returned.
   RANDOM     random uniform on [0,1]

EXAMPLES. To generate discretely uniform random values in the range 1, 2, ... , 6, say, use constructions like

   FAC DICE 10000 6
   COMPUTE DICE=1+INT(6*RANDOM)

To fill an existing variate Y with random zeroes and ones, 1 occurring with probability 1/3, write

   COMPUTE Y=RANDOM<1/3

If FITTED is a variate holding supposedly correct means in a multiplicative Poisson model (produced e.g. by FITLOGLINEAR ... and SAVEFITTED FITTED), then

   COMPUTE SIMDATA=POISSON(FITTED)

will produce a simulated response variate under the estimated model. To make a probit diagram for the observations in a variate X write something like

   SORT X +
   COMPUTE N=##(X)                    { ##() is explained below }
   VAR PROBIT N
   COMPUTE PROBIT=PHIINV(#/(N+1))     { # is explained below }
   PLOT X PROBIT

SCALARS ON THE RIGHT HAND SIDE. Until now, all vectors occurring on the right hand side have been assumed to be parallel vectors, i.e. vectors whose lengths must coincide with (and sometimes determine) the length of the resulting vector on the left hand side. Here comes the first exception: Variates of length 1 may occur on the right hand side, where they are treated exactly as explicitly written constants. The length of the resulting vector will then be determined by other parallel vectors, or by its own length if it exists. For example,

   VAR X 100
   COMPUTE A=2
   COMPUTE X=#^A    { # is explained below, but have a guess ...
} will create a vector of length 100 with values 1, 4, 9, ... , 10000, provided that A has not been created earlier as something else than a variate of length 1. SCALAR VALUED VECTOR FUNCTIONS. The following scalar valued functions are available: SUM() returns the sum of the (present and non-missing) values of a variate. MEAN() returns the average of the (present and non-missing) values of a variate. If no values are present and non-missing, 0 is returned. MIN() and MAX() return minimum and maximum of the (present and non-missing) values of a variate. If no values are present and non-missing, 0 is returned. VARIANCE() returns the (denominator n-1) sample variance of the (present and non-missing) values of a variate. If no values or only a single value are present and non-missing, 0 is returned. ##() returns the full length of a vector, without correction for restrictions or missing values. If the argument is the name of a vector that does not exist, 0 is returned. This convention is useful if you want to check for existence of a vector in a program. #() returns the number of units present of a variate or factor. For variates, missing values are counted as non-present. But units with level 0 for factors are regarded as present. Thus, if F is a factor and no restrictions are present, we have ##(F) = #(F). The argument of a scalar-valued function must be the name of a vector, not an expression, and this vector is NOT a parallel vector, i.e. it may be of any length, and this length has no influence on automatic declaration of the vector on the left hand side. EXAMPLE. if X is a variate of length 100, and SD is not declared (or of length 1), you can write COMPUTE SD=sqrt(variance(X)) to obtain the standard deviation (as a variate of length 1), and then COMPUTE X0=(X-mean(X))/SD to compute the vector X0 of standardised values. This can also be done in a single step by COMPUTE X0=(X-mean(X))/sqrt(variance(X)) THE UNIT INDEX # AND THE NUMBER OF UNITS ##. 
The identifier # (not to be confused with the scalar valued vector function #() ) has a special meaning as a (non-existing) variate with the values 1, 2, 3, ... . For example, if UNIT is an existing variate of length 100, the command

   COMPUTE UNIT=#

gives UNIT the values 1, 2, ... , 100.

The identifier ## (not to be confused with the scalar valued vector function ##() ) has another special meaning, as the length of the resulting left hand side. Thus, if you write, for an existing vector X of length N,

   COMPUTE X=#/##

you will give X the values 1/N, 2/N, ... , 1.

EXAMPLE. Suppose we have a vector X of length 100, and want to split it into two vectors, one containing the odd-numbered entries and the other containing the even-numbered ones. This can be done by

   VARIATE X1 X2 50
   COMPUTE X1=X(2*#-1)
   COMPUTE X2=X(2*#)

(cf. VECTORS AS FUNCTIONS OF UNIT INDEX a few screenfuls below)

VECTOR CONSTANTS. A list of real numbers, separated by blanks and enclosed in brackets [], can be used in COMPUTE commands to represent an unnamed vector with given values. For example, to declare and simultaneously give values to (short) variates, simply use commands like

   COMPUTE X=[1.1 -1.2 0.2 3.4]

EXAMPLE. In an earlier example, the statement

   COMPUTE X= 1.1*(F=1) + 2.3*(F=2) + 4.8*(F=3)

was suggested as a way of giving values 1.1, 2.3 and 4.8 to a variate, depending on the level of a factor. An easier solution is

   COMPUTE X=[1.1 2.3 4.8](F)

(cf. VECTORS AS FUNCTIONS OF UNIT INDEX a few lines below)

It is also possible to include vector names in the list, representing the list of the vector's values (or levels, in case of a factor). For example

   COMPUTE A=[1 2 3]
   COMPUTE B=[a 0 a]

will result in a vector B of length 7 with values 1 2 3 0 1 2 3. However, these "bracketed lists" must not (and need not) be nested. For example,

   COMPUTE B=[[1 2 3] 0 [1 2 3]]

would NOT work.

FACTORS ON THE LEFT HAND SIDE. The vector on the left hand side may be a factor.
If the result is non-integer or out of range, the level 0 will be assigned, and a warning will be given.

EXAMPLE. If F is a factor of length 100 on 5 levels, and you want to collapse it to a factor G on three levels, representing the groups {1}, {2,3} and {4,5} of F-levels, you can write

   FACTOR G 100 3
   COMPUTE G=[1 2 2 3 3](F)

- where the explanation of the last line follows now.

VECTORS AS FUNCTIONS OF UNIT INDEX. Variates and factors may occur as "functions" on the right hand side. In this case the argument must be integer and is interpreted as a unit index. The vector is NOT a parallel vector in this case (but its argument may very well be so). The following example illustrates this point.

EXAMPLE. The "display" command (i.e. a COMPUTE command without a left hand side, here extended by a format)

   COMPUTE [10 20 30 40 50]([1 5 2]):3:0

will produce the output

   10 50 20

A more useful example follows here.

EXAMPLE. If X and X1 are variates of the same length,

   COMPUTE X1=X(#-1)

will give X1 the "lagged" values of X. The first value X1(1) will become missing, because X(0) is undefined (and a warning about this will be given). Notice that X1 must be declared first, since X on the right hand side is not a parallel vector. If X1 was undeclared, the statement would result in a variate of length one with a single missing value.

WARNING. The statement

   COMPUTE X=X(#-1)

will not work as one might expect, since the computations are performed unit by unit in the natural order. This statement would actually result in a vector of missing values, since we would get

   first for unit 1    X(1) = X(0) = *
   then for unit 2     X(2) = X(1) = *
   then for unit 3     X(3) = X(2) = *
   etc.

A similar warning comes here: Suppose you want to transform a variate X by subtraction of its first value from all entries. Then

   COMPUTE X=X-X(1)

will not work, because X(1) is set to zero before the later entries are computed.
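The two pitfalls just described can be reproduced in a small Python sketch that updates a list in place, unit by unit, in the natural order. It is an illustration under stated assumptions, not ISUW code: None stands in for the missing value, and the helpers at and sub are hypothetical names.

```python
MISSING = None  # stand-in for ISUW's missing value (assumption of this sketch)


def at(vec, i):
    """1-based lookup; an index outside 1..len(vec) gives MISSING."""
    return vec[i - 1] if 1 <= i <= len(vec) else MISSING


def sub(a, b):
    """a - b, missing as soon as one operand is missing."""
    return MISSING if a is MISSING or b is MISSING else a - b


# Counterpart of COMPUTE X=X(#-1): unit 1 reads the undefined X(0), and
# every later unit reads an entry that has already been overwritten.
x = [3.0, 5.0, 8.0]
for unit in range(1, len(x) + 1):
    x[unit - 1] = at(x, unit - 1)
lag_in_place = list(x)
print(lag_in_place)

# Counterpart of COMPUTE X=X-X(1): X(1) becomes 0 at unit 1, so the
# later units subtract 0 rather than the original first value.
x = [3.0, 5.0, 8.0]
for unit in range(1, len(x) + 1):
    x[unit - 1] = sub(x[unit - 1], at(x, 1))
shift_in_place = list(x)
print(shift_in_place)
```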
Instead, you would have to do something like

   COMPUTE X1=X(1)    { X1 undeclared, thus becoming of length 1 }
   COMPUTE X=X-X1
   DEL X1

In general one has to be very careful when the variate on the left hand side occurs as a function on the right hand side. The computations are performed unit by unit, and if entries of the variate have been changed by earlier steps, this may give unexpected results. However, as long as you know the rules, the dynamic execution can be useful. For example, to produce a vector S holding the cumulated values X(1), X(1)+X(2), X(1)+X(2)+X(3), ... of an existing variate X, write (provided that no restrictions are present)

   COMPUTE S=X
   EXCLUDE 1
   COMPUTE S=S+S(#-1)
   INCLUDE 1

The exclusion of unit 1 is necessary here, because otherwise the reference to unit 0 would produce a missing value, and this would persist all the way through, producing a vector of missing values. Notice that it is OK to refer to S(#-1) also for #=2; the exclusion of unit 1 does not prevent this. The restrictions refer to the unit index for the vector on the left hand side.

Another example, clearly demonstrating how and why vectors are not parallel when they occur as functions, follows here.

EXAMPLE. Suppose we have some monthly data over 20 years. Let MONTH be a factor of length 240 on 12 levels, holding ... guess what. To create a variate DAYS of length 240, holding the number of days in each month, simply write

   DAYS=[31 28 31 30 31 30 31 31 30 31 30 31](MONTH)

Here, MONTH is a parallel vector, the vector [31 28 ... 31] is not.

SINGLE VECTOR ENTRIES ON THE LEFT HAND SIDE. To get the leap years correct in the example above, you could add a few statements of the form

   DAYS((1984-1981)*12+2)=29
   DAYS((1988-1981)*12+2)=29
   etc.

(pretending here that the first month is January 1981). Quite generally, the left hand side in a COMPUTE command may be of the form vectorname(integer expression).
In this case the right hand side must be interpretable as a vector of length 1, and the result is stored in the corresponding entry of the vector on the left hand side. For example (provided that A and B are undeclared, or variates of length 1)

   VAR X 10
   COMPUTE A=3
   COMPUTE B=23.5
   COMPUTE X(Sqr(A))=B
   DEL A B

is an extremely complicated way of setting the 9th value of X to 23.5, which could also be done by the single command

   COMPUTE X(9)=23.5

In this case too, the vector on the left may be a factor, provided that the expression on the right hand side is an integer in the range of valid levels.

RULES FOR NAME CONFLICTS. Names of vectors may be ISUW function names, but in this case the corresponding functions are no longer available. For example, if you declare a variate named EXP you can no longer use the exponential function, because e.g. exp(1.3) will be interpreted as the (missing) 1.3rd value of the variate EXP. Similarly, if you declare vectors named RANDOM, MEAN, SUM, ... these functions can no longer be used.

THE CONSTANT PI. The constant PI=3.14159.. can be constructed (if 3.14159 is not good enough) by

   COMPUTE PI=4*ARCTAN(1)

REGISTER OVERFLOW AND STACK OVERFLOW. Large formulas may result in an error message reporting "register overflow" or "stack overflow". The reason for this is that COMPUTE is based on a parser procedure (formula interpreter) that calls itself, and also that intermediate results are stored in a limited number of registers. If this appears to be a problem, you will have to perform the computations in two or more steps. For example, computing a sum of more than (approximately, depending on other circumstances) 100 terms may result in such an error. Splitting it up as a sum of two, like

   COMPUTE X=A1+...+A50
   COMPUTE X=X+A51+...+A100

will solve this problem (which you will hardly ever meet).
--------------------------------------------------------------------------------
GENERATELEVELS Assigning levels to a factor in a systematic (cyclic) way.
................................................................................
Syntax: GENERATELEVELS factorname lag

Assigns cyclically varying levels to a factor. For example, if F is a factor on 3 levels,

   GEN F 2

will assign levels 1 1 2 2 3 3 1 1 2 2 ... to the factor. Hence, the second parameter (2, in this case) determines the lag between change points. Restrictions are NOT taken into account.

EXAMPLE. A file contains the 3 by 4 table

   1.2 1.4 1.1 1.9
   1.2 1.1 1.3 1.6
   1.1 1.4 1.2 1.3

We can read these values into a variate of length 12 by

   VAR Y 12
   OPEN filename
   READ Y

The two factors reflecting the two-way structure can be constructed by

   FAC ROW 12 3
   GEN ROW 4
   FAC COL 12 4
   GEN COL 1
--------------------------------------------------------------------------------
GROUP Construction of a factor by interval grouping of a variate.
................................................................................
Syntax: GROUP variatename factorname [levels [cutpoints]]

Constructs a factor by interval grouping of an existing variate. The number of levels is set by the integer parameter "levels". If the factor exists in advance, it must be of the same length as the variate, and "levels" must be the number of levels for that factor, if specified (or it can be specified as zero, or marked with an asterisk). If the factor does not exist, it is automatically declared. If "levels" is a positive integer, it is taken as the number of levels for the new factor. If "levels" is specified as 0 or not specified at all, the new factor is automatically declared with its number of levels set to the rounded value of the square root of the length of the variate, but at most 255.

The final parameter(s) cutpoints, if specified, must contain levels-1 real numbers separated by blanks. These must be in increasing order, and they determine the cutpoints between intervals. This parameter can also be the name of a variate, holding the desired cutpoints.
The length of this variate must be levels-1, and its values must be nonmissing and increasing. If this parameter is not specified, the cutpoints are placed equidistantly between MIN and MAX of the variate. EXAMPLE. GROUP AGE AGEGRP 4 20 40 60 is equivalent to FACTOR AGEGRP ##(AGE) 4 COMPUTE AGEGRP=1+(AGE>20)+(AGE>40)+(AGE>60) Ties are handled according to the convention "] , ]", meaning that a value falling exactly at a cutpoint is put in the lower of the two possible categories. Restrictions are obeyed in the following sense. If the factor exists in advance, it is unchanged for the hidden units. If the factor is automatically declared, it becomes 0 for those units. Notice that if cutpoints are selected automatically, they will be based on MIN and MAX of the present observations only. -------------------------------------------------------------------------------- TRANSFER Transfers subvector to subvector, or compresses vector by removal of hidden entries. ................................................................................ Syntax: TRANSFER name1 s1 e1 name2 s2 e2 or TRANSFER name1 name2 For the first form (with six parameters), the two names must be names of either factors or variates, and the four integer parameters s1, e1, s2 and e2, specifying the start and end of the vector segments, must satisfy obvious consistency requirements involving the lengths of the two vectors and the two subvectors. EXAMPLE. If X and Y are vectors of lengths (at least) 10 and 100, the command TRANSFER X 1 10 Y 91 100 will set Y-values equal to X-values according to the scheme Y(91) = X(1) Y(92) = X(2) ... Y(100) = X(10) Transfer of a subvector with values/levels in reversed order is performed when either s1 is greater than e1 or s2 is greater than e2. Special care should be taken in the case where name1 = name2. This is allowed, but the transfer takes place in the order s1 first ... e1 last, and this may give unexpected results. EXAMPLE. 
If X is a vector of length 100, the command TRANSFER X 1 99 X 2 100 will produce a vector with the same value for all units, because the same value (originally X(1)) is transferred again and again. Whereas TRANSFER X 99 1 X 100 2 COMPUTE X(1)=1/0 will produce a proper "lagged" vector (with the first value set to a missing value, by the last statement). TRANSFER with six parameters does NOT take restrictions into account. For the short form (two parameters), name1 must be the name of an existing vector. name2 should usually be specified as a valid vector name which is not in use, but if earlier defined it must be of length equal to the number of entries present in name1, and its type (including number of levels, if it is a factor) must coincide with the type of name1. The action taken is to create name2 if necessary and transfer the values/levels present in name1 to name2 in the obvious order. EXAMPLE. Suppose we have variates AGE and HEIGHT and a factor SEX on two levels, all of the same length. To create a data set MALES that contains only the part of data with SEX=1, and import this to our session, we can (as explained earlier in the description of SAVEDATA) do as follows (assuming no restrictions present from the beginning). FOCUSONLEVEL SEX 1 SAVE MALES AGE HEIGHT DELETE GET MALES However, "short" vectors AGE_MEN and HEIG_MEN can also be created directly by FOCUSONLEVEL SEX 1 TRANSFER AGE AGE_MEN TRANSFER HEIGHT HEIG_MEN If this is to be followed by some relevant operations on the new vectors, we should obviously proceed with INCLUDEALL since the restrictions imposed on the long vectors are not likely to be relevant for the short vectors. -------------------------------------------------------------------------------- FITLINEARNORMAL Regression and analysis of variance ................................................................................ Syntax: FITLINEARNORMAL variate=modelformula[/weight] Fits a standard linear model for normal observations. 
"variate" is the dependent variable, and "modelformula" is a code for the linear expression for the mean in the model. The model formula consists of terms separated by plusses or blanks. Each term may be a factor, a variate, or a formal product of factors and variates. The special term '1' represents a constant term in the model (like a factor on one level or a variate filled with 1's). As opposed to most other statistics packages, ISUW requires explicit specification of the constant term when it should be included (and it does not have to be the first term). All vectors occuring on the right hand side must be of the same length as the reponse variate on the left. If "weight" is specified, it must be a variate of the same length. Its values must be positive, and they are interpreted as weights of observations - implying that the assumption of constant variance is replaced with the assumption that the variances are proportional to the inverse weights. In this case, the unknown proportionality factor takes over the role of the variance in the homoscedastic case. Restrictions are obeyed in the sense that the analysis is performed only for the data present. Missing values of variates must not occur. Factors in the model formula may take the level 0, but this complicates the interpretation and is not recommended in general. See, however, 2-3 screenfuls below. EXAMPLES. X, Y and COUNT are variates, F and G are factors. When more than one model formula is given, they correspond to different parameterisations of the same model, provided that none of the factors take the level zero. 

   FITLIN Y=F
   FITLIN Y=1+F           one-way analysis of variance

   FITLIN Y=1+X           ordinary linear regression

   FITLIN Y=F+G           two-way analysis of variance,
   FITLIN Y=1+F+G         additive model

   FITLIN Y=F*G           two-way analysis of variance,
   FITLIN Y=1+F+F*G       with interaction
   FITLIN Y=1+F+G+F*G

   FITLIN Y=F+F*X         a regression line for each F-level
   FITLIN Y=1+F+F*X       or 1+F*(1+X) (this is legal - the "distributive
                          law" is built into the syntax)

   FITLIN Y=F+X
   FITLIN Y=1+F+X         parallel regression lines

   FITLIN Y=1+F*X         regression lines with common intercept

   FITLIN Y=F*X           regression lines through (0,0)

   FITLIN Y=1+X+X*X       second degree polynomial regression

   FITLIN Y=1+X/COUNT     linear regression with weights COUNT (referring
                          to a standard context where Y holds the averages
                          in groups of observations with common X-value,
                          COUNT holding the group sizes).

Quite generally, the rules for translation of the model formula to a model can be described as follows. The model states that the mean vector for the response variate is a linear combination of columns of a MODEL MATRIX (or DESIGN MATRIX), the (number of observations) times (number of parameters) matrix which is usually called X when the model is treated mathematically. The model matrix can be thought of as generated from the model formula by the following simple algorithm.

Each term of the model formula generates a number of columns of the matrix. A term which is a product of factors generates a number of columns which is the product of the numbers of levels for the factors. These columns become "dummies", i.e. 0/1-indicators for the levels of the corresponding product factor or cross classification. Multiplication of a term by a variate does not change the number of columns generated, but each column is multiplied (entry by entry) with the values of the variate. In particular, a term consisting of a single variate simply creates a column holding the values of the variate.
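As an illustration of this algorithm, here is a minimal Python sketch that generates the columns for the formula 1+F+F*X. It follows the description above and makes no claim about ISUW's internal implementation; the function term_columns and the small data set are made up for the example, and the constant term is treated as a term with no factors and no variates.

```python
from itertools import product


def term_columns(n, factors=(), variates=()):
    """Columns of the model matrix generated by one term of a model formula.

    n        -- number of units
    factors  -- sequence of (levels_per_unit, number_of_levels) pairs
    variates -- sequence of value lists multiplied into every column
    """
    columns = []
    level_ranges = [range(1, levels + 1) for _, levels in factors]
    # One column per level combination; with no factors there is exactly one.
    for combo in product(*level_ranges):
        column = []
        for u in range(n):
            # 0/1 dummy for this level combination; level 0 never matches.
            hit = all(f[u] == lev for (f, _), lev in zip(factors, combo))
            value = 1.0 if hit else 0.0
            for v in variates:
                value *= v[u]
            column.append(value)
        columns.append(column)
    return columns


n = 4
F = ([1, 1, 2, 2], 2)          # a factor on 2 levels
X = [0.5, 1.5, 2.5, 3.5]       # a variate

# Model formula 1+F+F*X: 1 + 2 + 2 = 5 columns.
model_matrix = (term_columns(n)
                + term_columns(n, factors=[F])
                + term_columns(n, factors=[F], variates=[X]))
for column in model_matrix:
    print(column)
```

The columns are stored one list per column, so len(model_matrix) is the number of linear parameters before any of them are set to zero.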
The term 1 generates a single column filled with 1's, and is thus equivalent to a variate filled with 1's. Hence, to count the number of columns (which is also the number of linear parameters, including those that are set to zero due to overparameterisation), simply add the orders of terms in the model formula, where the order of a term is computed as the product of the numbers of levels for factors in the term (1 if no factors are involved). If the model matrix has linearly dependent columns (which is almost always the case), parameters are set to zero according to the following rule: Whenever a column is a linear combination of the preceding columns, the corresponding parameter (i.e. the coefficient to that column in the expression for the mean vector) is set to zero. In this way, a unique parameterisation is obtained. For models involving factors, specified with a constant term first, then main effects, then first order interactions etc., this results in the usual "corner point parameterisation" where the parameters corresponding to last levels or (for interactions) level combinations involving a last level are set to zero. As an implicit consequence of these rules, the level 0 for a factor (or a level combination for a cross classifications involving a level 0) does not create a dummy column of the model matrix. Our general advice is not to use factors taking the level 0. If the level 0 is used at all it should be as a "missing level", and accordingly units with a level 0 should be excluded before FITLINEARNORMAL. However, level 0 as a "relevant level" can be used to avoid overparametrisations, if desired. For example, in a design with many factors on two levels, it is sometimes preferable to declare these factors with one level and use levels 0 and 1 as the two levels. This actually means that you take over the complete control of the dummies that form the model matrix. 
Multiplication of two factors on one level in a model formula literally means multiplication of the two dummies, etc. In this case you will have to be completely aware of what you are doing. In particular, you should be aware that main effects without a constant term, interactions without main effects etc., which can usually be specified without much danger of confusion if you avoid using level 0, may result in meaningless models when the level 0 occurs.

EXAMPLE. To fit a one-way ANOVA model in such a way that the parameter estimates are directly interpretable as expectations in the groups, a command of the form

   FITLIN Y=F

can be used. However, this will only work if the factor F does not take the level 0, because the specification assumes E(Y)=0 for units with F=0. If the level 0 actually occurs, you can use

   FITLIN Y=1+F

to avoid this. But notice that the interpretation of the estimated parameters will be very different from what it is when the level 0 does not occur. Level 0 takes over the role as "baseline level", which is usually taken by the last level.

Output from FITLINEARNORMAL consists of an analysis of variance table, giving the degrees of freedom, square sums, mean square sums, F-statistics and P-values for successive reduction of the model by removal of the terms from the model formula, beginning with the last. This is what SAS users call the "type I" ANOVA table. However, there are some differences. Firstly, the constant term will (if it is specified) occupy a line, just as any other model term. Accordingly, the last "Total" line will have the total number n of observations (rather than n-1) as "degrees of freedom" and the total square sum of the observations (rather than the square sum of deviations from the mean) as its "sum of squares".
Secondly, the tests performed are in accordance with the usual rules for successive model reduction, in the sense that the denominators of the F-statistics are "pooled variance estimates", not just the variance estimate from the original model. As a consequence of this, the order of terms determines the model reductions to be tested, and also which parameters to be set to zero to avoid overparameterisation. For this reason, the supposedly least important terms (e.g. higher order interactions) should be put last in the model formula. For example, if in a simple regression analysis a line through the origin can be expected, it is more natural to write "Y=X+1" than "Y=1+X". If the hypothesis "intercept=0" is accepted, the proportionality model is fitted by "Y=X". The estimated variance and standard deviation of the model is also printed out. In addition, if the model formula has a constant term as its first term, the usual R-square statistic (the proportion of the total square sum of deviations explained by the model) is displayed. To list the parameter estimates and their standard deviations for the last model fitted by FITLINEARNORMAL, use the command LISTPARAMETERS. To save these estimates (and optionally their estimated standard deviations) in a variate use SAVEPARAMETERS. To extract fitted values, residuals and normed residuals, use SAVEFITTED. See also the descriptions of ESTIMATE (for estimation of specific contrasts) and SAVENORMEDRESIDUALS (for extraction of studentized residuals). TESTMODELCHANGE, which refers to the last two models fitted, can be used for test of model reductions if more than one term is removed in a single step. -------------------------------------------------------------------------------- FITLOGLINEAR Multiplicative Poisson models ................................................................................ Syntax: FITLOGLINEAR variate[-offset]=modelformula Fits a log-linear or multiplicative Poisson model. 
It is well known how conditioning on the total sum or a set of marginal sums in such models results in models for multinomial data, which can be handled by the same computational methods. But in the following, we refer to the "independent Poisson" interpretation. "variate" is the dependent (non-negative integer) variable, and the model formula is a code for the linear expression for the logarithmised mean in the model, cfr. the description of FITLINEARNORMAL above. EXAMPLE. It should be emphasised that the following example is not relevant at all; it is just a fast and easy way of explaining the relation between model formulas and some well-known concepts related to contingency tables. Let BIRTHS be a variate of length 24 holding the monthly counts of births of boys and girls in a certain area during a year. Let MONTH be the factor of length 24 on 12 levels holding the month number 1..12, and let SEX be the factor of length 24 on 2 levels holding the sex (boys/girls). The model fitted by FITLOGLIN BIRTHS=MONTH*SEX is the full or saturated model, where each observation has its own freely varying expectation. The model fitted by FITLOGLIN BIRTHS=MONTH+SEX corresponds to independence in the 12*2 table, i.e. the sex proportion is constant from month to month. The further reduction to FITLOGLIN BIRTHS=MONTH corresponds to the assumption that the sex proportion is exactly 1:1, whereas FITLOGLIN BIRTHS=SEX assumes (more or less, see below) constant birth intensity over the year. Finally, FITLOGLIN BIRTHS=1 assumes both constant sex proportion and equal distribution over months. An OFFSET is a given variate of values that should be added to the linear expression determining the mean. In the context of multiplicative models, the usual expectation E(Y) = exp(linear expression) becomes, with an offset variate OFFSET, E(Y) = exp(OFFSET)*exp(linear expression) EXAMPLE. The above mentioned model "BIRTHS=SEX" does not take into account the fact that months are unequally long.
A more interesting hypothesis would be that the numbers of births per day are i.i.d., or that the expected number per month is proportional to the number of days in the month. This model could be fitted (assuming that the year is not a leap year) by OFFS=LN([31 28 31 30 31 30 31 31 30 31 30 31](MONTH)) FITLOGLIN BIRTHS-OFFS=SEX The output from FITLOGLINEAR includes the likelihood ratio test and Pearson's approximation to it for the model against the full model (i.e. the model where each observation has its own free parameter). Be careful with these tests, they are not reliable for small (expected) counts. -------------------------------------------------------------------------------- FITLOGITLINEAR Logit linear models for binary or binomial data. ................................................................................ Syntax: FITLOGITLINEAR variate[-offset]=modelformula[/total] Fits a logit-linear or logistic regression model. "variate" is the dependent variable. The command has two different forms, depending on whether "total" is specified or not: 1. BINARY RESPONSE. Here, the response variate is assumed to be binary, i.e. only the values 0 and 1 are allowed. The model states that these 0-1 variables (let us call them Y(i)) are independent, with P(Y(i)=1) = p(i) = exp(linear expression) / (1 + exp(linear expression)) where the linear expression is determined by the model formula, exactly as for FITLINEARNORMAL. "total" should not be specified. 2. BINOMIAL (FREQUENCY) RESPONSE. Here, the responses are assumed to be relative frequencies (notice: FREQUENCIES, not COUNTS) of the form Y(i)/M(i), where the Y(i) are independent, binomially distributed with binomial totals or indices M(i) and probability parameters p(i) parameterised as above. The variate "total" after the slash must contain the binomial totals M(i). Notice that model 1 is a special case of model 2, namely if "total" is a variate filled with 1's.
Conversely, model 2 can always be regarded as a model derived from model 1 by sufficient reduction, namely summation of binary responses over covariate classes (i.e. groups in which all explanatory variables and factors are constant). The rules for translation of the model formula to a linear expression, the handling of overparameterisations etc. are exactly as for FITLINEARNORMAL, see 6-8 screenfuls above. EXAMPLE. See the "SIMPLE EXAMPLE" starting approximately 4 screenfuls below top of this file. The interpretation of the optional "offset variate" "offset" is exactly as for FITLOGLINEAR. The offset is a vector of values that are added in advance to the linear expression. You may think of it as a covariate in the model with its coefficient "frozen" at the value 1. In the binomial case, the output from FITLOGITLINEAR includes the likelihood ratio test for the model against the full model, which merely assumes that the responses are binomial frequencies with freely varying probability parameters. Pearson's approximation is not given, but it can be computed as the square sum of the normed residuals, and it also comes out if you fit the corresponding overdispersion model by FITNONLINEAR, see below. Be careful with these tests, they are not reliable when the binomial totals are small or when relative frequencies are close to 0 or 1. -------------------------------------------------------------------------------- FITNONLINEAR Nonlinear regression and generalised linear models with overdispersion. ................................................................................ Syntax: FITNONLINEAR modelspec m Dm v [init [worky [workw]]] where "modelspec" has the form variate[-offset]=modelformula[/weight] The syntax for the model specification is exactly as for FITLINEARNORMAL, except that an offset variate is allowed (cfr. FITLOGLINEAR above). 
However, for obvious reasons there must not be any blanks in the model specification (use +, not blank, to separate terms), nor in the following function specifications. The parameters m, Dm and v are expressions for the three functions that define the model (closely following the notation in Tjur (1998): Nonlinear regression, quasi likelihood, and overdispersion in generalized linear models, The American Statistician 52 pp. 222-227). m is the mean (or inverse link) function, Dm is the first derivative of m (required for the numerical procedure) and v is the variance (up to a common scale factor and the weights) as a function of the mean. In these expressions, the argument is written as a period '.'. Apart from this, the syntax is exactly as for the right hand side of a COMPUTE statement, with '.' entering as a (parallel) variate of the relevant length. EXAMPLES. To fit a log-linear model with variance proportional to the mean (multiplicative Poisson type structure) write FITNONLINEAR ... exp(.) exp(.) . To fit a linear model with a variance proportional to the squared mean (constant coefficient of variation) write FITNONLINEAR ... . 1 sqr(.) To fit a logit-linear model for binomial frequencies FREQ=COUNT/M (including an overdispersion parameter) write FITNONL FREQ=.../M exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.) To fit a probit-linear model instead, write FITNONLINEAR FREQ=.../M phi(.) exp(-sqr(.)/2)/sqrt(2*3.14159) .*(1-.) The nonlinear regression models that can be handled by FITNONLINEAR are characterised as follows: The observations are (in principle) independent, normally distributed. The expectation for each observation is given as a known function (m) of the "linear parameter" associated with each unit. These linear parameters are, in turn, linear combinations of covariates specified by a model formula, just like the mean in a linear model. 
The variance is specified as a known function (v) of the mean, multiplied by an unknown "overdispersion" or squared scale parameter, common to all observations, optionally divided by known weights. The procedure estimates a nonlinear regression model of this kind by the method known as Iteratively Reweighted Least Squares (IRLS) or Quasi Likelihood. In case of a constant variance function, this reduces to ordinary (optionally weighted) least squares or maximum likelihood. In full detail, the function parameters m, Dm and v and the model formula determine the model as follows. The i'th observation Y(i) is normally distributed with mean E[Y(i)] = mu(i) = m( [ offset(i)+ ] eta(i) ) = m( [ offset(i)+ ] beta(1)*x(i,1) + ... + beta(p)*x(i,p) ) where x(i,j) are the elements of the model matrix determined by the model formula, and variance var[Y(i)] = lambda*v(mu(i))/w(i) where w(i) is the i'th value of the weight variate (or 1, if a weight is not specified) and lambda is the overdispersion (or squared scale) parameter. The parameters to be estimated are beta(1), ... , beta(p) and lambda. The last three command parameters INIT, WORKY and WORKW are optional. They can be omitted, or replaced with an asterisk to be skipped. INIT, if specified, should be a variate of the same length as all other vectors involved. The values of INIT are taken as the initial linear parameters. This may speed things up, and in some cases (in particular when the mean function has a singularity at zero, like m='1/.') such initial values are necessary for the iterative method to get started. Indeed, if INIT is not specified, a variate of zeroes is used. EXAMPLE. For a model with m(eta)=exp(eta) and variance proportional to the mean you could do something like this (if not for any other reason, then to save computing time): LNY=LN(Y+0.5) {'+0.5' can be omitted if Y has no zeroes} FITLINEARNORMAL LNY=1+SEX+AGE SAVEFITTED ETA0 FITNONLINEAR Y=1+AGE+SEX exp(.) exp(.) .
ETA0 SAVEFITTED ETA0 FITNONLINEAR Y=1+AGE exp(.) exp(.) . ETA0 etc. In general, suitable initial values of the linear parameters can be produced as the fitted values in a (possibly weighted) linear regression where the response variate has values computed as the observations transformed by the inverse mean function (also called the link function). In later FITNONLINEAR commands with a slightly modified model formula (same y-variate, same offset, same weights) the estimated linear parameters from a fit can usually be taken as the initial values for the next. GENERALISED LINEAR MODELS WITH OVERDISPERSION. It is a property of the IRLS method that if Y has nonnegative integer values, the estimates in the above example will actually coincide with the maximum likelihood estimates in a multiplicative Poisson model given by the same model formula. Similarly, with observations FREQ=Y/M defined as relative frequencies, after a command of the form FITNONL FREQ=.../M exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.) the IRLS estimates will coincide with the maximum likelihood estimates in the corresponding logistic regression model. Quite generally, this equivalence holds for any generalised linear model in the sense of Nelder and Wedderburn (1972, JRSS A p. 370-84). The results produced by FITNONLINEAR are thus, to a large extent, valid for generalised linear models with overdispersion as described e.g. in the book on Generalised Linear Models by McCullagh and Nelder (Chapman and Hall 1989). The standard deviations produced by LISTPARAMETERS are corrected for overdispersion, and the tests for beta(j)=0 are based on the relevant T-distribution. The F-tests in the analysis of variance table can be regarded as second order approximations to the usual likelihood ratio tests based on the Chi-square approximation, with correction for overdispersion and (using F rather than Chi square) for random variation of the estimate of the overdispersion parameter.
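As a short check of the coincidence with maximum likelihood stated above, the equations solved by IRLS at convergence can be written out. The derivation is a standard argument and is not taken from the ISUW sources. It uses the notation of the FITNONLINEAR description (x(i,j), mu(i), w(i)), and eta(i) is taken here to include the offset, if any.

```latex
% Estimating equations at an IRLS fixed point, for j = 1, ..., p
\sum_i w_i \, \frac{Dm(\eta_i)}{v(\mu_i)} \, x_{ij} \, (y_i - \mu_i) = 0 ,
\qquad \mu_i = m(\eta_i)

% Log-linear case: m = Dm = \exp, v(\mu) = \mu, w_i = 1, so Dm(\eta_i) = \mu_i
\sum_i x_{ij} \, (y_i - \mu_i) = 0
% which is the likelihood equation of the multiplicative Poisson model.

% Logit-linear case: y_i = FREQ_i, w_i = M_i, Dm(\eta_i) = v(\mu_i) = \mu_i (1 - \mu_i)
\sum_i M_i \, x_{ij} \, (y_i - \mu_i) = 0
% which is the likelihood equation of the binomial model for the counts M_i y_i.
```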
According to the quasi likelihood interpretation (Wedderburn, Biometrika 1974, p. 439-447), the IRLS method is in fact valid quite generally for "distribution free" models specified by their first and second moments only. However, standard central-limit-theorem-type assumptions are, of course, required for F-tests, Chi-square tests etc. to be asymptotically valid. The computations are performed iteratively. Each iteration calls (with output suppressed) the StatUnit procedure FitLinearNormal with a dependent variable WORKY and a weight variate WORKW computed from the results of the previous iteration as WORKY = (Y-m(FITTED))/Dm(FITTED) + FITTED [-OFFSET] WORKW = W*sqr(Dm(FITTED))/v(m(FITTED)) (Y is the original dependent variate, W the original weight vector, FITTED the variate of fitted values from the previous fit, including the offset, if specified). The iterations are stopped when the weighted square sum of changes, defined as the sum of the quantities W*sqr(m(NEWFITTED)-m(LASTFITTED))/v(m(LASTFITTED)) is less than (1.0E-12)*ModelVariance*ResDF where ModelVariance is the present estimate of the overdispersion parameter and ResDF its degrees of freedom. A final iteration is performed resulting in an analysis of variance table with approximate F-tests for removal of terms (beginning with the last, as usual). Notice that a new fit after removal of insignificant terms from the bottom of the table will result in square sums for the remaining terms that are slightly changed, as opposed to what happens in the case of a linear model with constant variance function. After the ANOVA table FITLINEARNORMAL prints the estimate of the overdispersion parameter and its square root, and finally FITNONLINEAR adds the Chi-square test for "no overdispersion" (i.e. overdispersion parameter = 1), which coincides with Pearson's goodness-of-fit test in the multiplicative Poisson and logistic regression type situations mentioned above.
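The iteration just described can be sketched in a few lines of Python. This is a minimal illustration and not the Pascal code of ISUW. The function names (solve, weighted_ls, irls), the example data and the use of ln(y+0.5) as initial values are assumptions made here, as is the reading of ModelVariance*ResDF as the weighted residual square sum of the working regression. The final iteration that produces the ANOVA table is left out.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def weighted_ls(X, z, w):
    """Weighted least squares: minimise sum of w*(z - X beta)^2."""
    p = len(X[0])
    A = [[sum(wi * xi[a] * xi[b] for xi, wi in zip(X, w)) for b in range(p)]
         for a in range(p)]
    c = [sum(wi * xi[a] * zi for xi, zi, wi in zip(X, z, w)) for a in range(p)]
    return solve(A, c)

def irls(X, y, m, Dm, v, w=None, offset=None, init=None, max_iter=50):
    """IRLS with mean function m, derivative Dm and variance function v.

    init holds initial linear parameters (assumed here to exclude the offset).
    Returns (beta, fitted, lam); fitted includes the offset, as in the text.
    """
    n, p = len(y), len(X[0])
    w = w if w is not None else [1.0] * n
    offset = offset if offset is not None else [0.0] * n
    eta0 = init if init is not None else [0.0] * n   # default: variate of zeroes
    fitted = [e + o for e, o in zip(eta0, offset)]
    for _ in range(max_iter):
        # WORKY = (Y-m(FITTED))/Dm(FITTED) + FITTED - OFFSET
        worky = [(yi - m(f)) / Dm(f) + f - o for yi, f, o in zip(y, fitted, offset)]
        # WORKW = W*sqr(Dm(FITTED))/v(m(FITTED))
        workw = [wi * Dm(f) ** 2 / v(m(f)) for wi, f in zip(w, fitted)]
        beta = weighted_ls(X, worky, workw)
        eta = [sum(b * xij for b, xij in zip(beta, xi)) for xi in X]
        # Assumed meaning of ModelVariance*ResDF: residual square sum of this fit
        rss = sum(ww * (z - e) ** 2 for ww, z, e in zip(workw, worky, eta))
        new = [e + o for e, o in zip(eta, offset)]
        change = sum(wi * (m(a) - m(b)) ** 2 / v(m(b))
                     for wi, a, b in zip(w, new, fitted))
        fitted = new
        if change <= 1e-12 * rss:
            break
    lam = rss / (n - p)   # estimate of the overdispersion parameter
    return beta, fitted, lam

# Hypothetical data: log-linear mean, variance proportional to the mean
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 4.0, 9.0, 14.0, 20.0]
X = [[1.0, xi] for xi in x]                      # model formula 1+X
init = [math.log(yi + 0.5) for yi in y]
beta, fitted, lam = irls(X, y, m=math.exp, Dm=math.exp, v=lambda mu: mu, init=init)
```

At convergence the working residual square sum equals the sum of w*(y-mu)^2/v(mu), so lam times the residual degrees of freedom is Pearson's statistic in this example.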
Notice that the test for "no overdispersion" is irrelevant in proper nonlinear regression situations, where the value 1 of the scale parameter plays no particular role, and also in the case of a logistic regression model for binary responses. If nothing else is specified, the variates referred to as WORKY and WORKW above are saved under the names #NL_WY and #NL_WW. Existing variates of these names will be deleted. These variates are saved because they are used by SAVEFITTED. However, you can specify other names for them as the last two parameters to FITNONLINEAR. After a call to FITNONLINEAR, ISUW commands referring to the last model fit refer to the call of FITLINEARNORMAL in the final iteration. This implies that LISTPARAMETERS will return the IRLS estimates, with standard deviations and T-tests based on the approximating linear model. SAVEFITTED FIT0 RES0 NRES0 will result in the following: FIT0 will contain the estimated linear parameters. To produce estimated means, transform to FIT = m( [ OFFSET + ] FIT0 ). RES0 will contain the quantities (Y-FIT)/Dm( [ OFFSET + ] FIT0 ). To obtain residuals in the usual sense (observations minus estimated means) multiply by Dm( [ OFFSET + ] FIT0 ), or compute more directly (with FIT computed as above) RES=Y-FIT. NRES0 will actually contain the correct normed residuals (residuals divided by their estimated standard deviations). Notice that you have also, at your disposal after a model fit, the variate of "linearised observations" (default name #NL_WY) and the workweights for the last fit (default name #NL_WW). The class of models that can be handled by FITNONLINEAR is actually broader than indicated above, because variates may occur in the formulas for m, Dm and v. This means that some kinds of unit-specific mean and variance functions are allowed. As a trivial example (which obviously does not extend the class), notice that the command FITNONLINEAR Y=1+AGE+SEX OFFS+.
1 1/W will perform exactly the same as FITNONLINEAR Y-OFFS=1+AGE+SEX/W . 1 1 which, in turn, could be obtained simply by Y0=Y-OFFS FITLINEARNORMAL Y0=1+AGE+SEX/W In this sense, FITNONLINEAR's conventions for offsets and weights are unnecessary, because they can be built into the functions. Restrictions are obeyed in the obvious way. -------------------------------------------------------------------------------- FITCOXMODEL Proportional hazards models for survival data by Cox's partial likelihood. ................................................................................ Syntax: FITCOXMODEL exittime[*deathind][-entrytime][/stratum]=modelformula exittime must be a variate holding the times of death/censoring, and this must come first on the left hand side. The order of the three optional specifications deathind (indicator of the event death, as opposed to censoring), entrytime (times of left truncation) and stratum (a factor dividing individuals into groups with common underlying intensity) is irrelevant, they are identified by the preceding characters ( * , - or / ). "deathind" must be specified in case of right censoring. It must be a factor on a single level and of length equal to the length of exittime. It is interpreted as an indicator of the event death. Thus, censored individuals should have the level of this factor equal to zero. entrytime should hold the times of entrance when survival times are left truncated. For each individual, the time under observation must be positive, i.e. entrytime < exittime (strictly). "stratum", if specified, must be a factor of the same length as exittime. This specification means that each stratum has its own underlying (unknown) intensity. Only the parameters of interest (given by the right hand side of the model specification) are common to the strata. A typical example is stratification by SEX, which is often required.
If the same factor occurs also on the right hand side in interaction with everything else, the model is equivalent to a separate Cox model for each stratum. "modelformula" must be a model formula, involving variates and factors of the same length as exittime. Notice that a constant term 1 should not be specified, because a constant factor on the intensity is absorbed by the unknown underlying intensity. In the expression for Cox's likelihood, a constant factor simply cancels out. The model considered is given by the following expression for the death intensity of an individual: DeathIntensity(time)=lambda0(time)*exp(linear expression in covariates) Here, lambda0 is the "underlying intensity", or the intensity for an individual with all covariates = 0. The linear expression involves individual specific information like (in a medical context) sex, age, weight, smoking habits, treatment or whatever, with unknown parameters as coefficients, just as in a multiple regression model or any other generalised linear model. Cox's partial likelihood for the parameters of interest (based on the order in which events took place, disregarding the actual times where they took place) can be written as the product over all dead individuals of the fractions exp(linear expression for dead individual) / (sum over all individuals at risk at that time of exp(...)) where ... stands for the linear expression for the single individual in the set of individuals under risk at the time of the death. "Under risk" means present at that time (entrytime < t <= exittime) and in the same stratum as the one who died. Ties (coincident times of death/censoring) are handled according to Breslow's method, which is to include all individuals under risk just before a death in the risk set for that event. For coincident deaths this means that the individuals occur in each other's risk sets.
This may seem artificial, but at least it gives a non-arbitrary correction for ties, which is acceptable in case of only few ties. Time dependent covariates cannot be handled, with the following exception: If you have one or a few time dependent but PIECEWISE CONSTANT covariates, these can be handled as follows. Whenever a covariate shifts its value, let the corresponding individual "change its identity", i.e. remove it by right censoring and introduce a new one (with the new values of the covariate) by entrance (left truncation) at the same time point. This may seem restrictive, but in principle any time dependent covariate can be handled in this way, because what matters is only the values of covariates at the finitely many timepoints where a death takes place. The problem is the construction of the data set, and the fact that this data set may become very long. Restrictions are taken into account in the obvious way. Missing values of variates for units present must not occur. Levels 0 for factors are treated as usual (no dummy generated, take care that main effects are present when interactions are specified etc.). The level 0 of the factor stratum is treated as any other level, if it occurs. In rarely occurring situations, involving few individuals and/or many covariates, the output will contain the following warning: Iterations stopped, fitted values are out of range - probably because deterministic model fits. This happens when a covariate or a linear combination of such is a monotone function of the time of death (or in situations where this is close to the truth). In this case, the maximum-likelihood estimate of the parameters does not exist, unless the values plus/minus infinity are allowed. Iterations are stopped due to numerical problems. The results are not reliable (and hardly of any interest either) in this case. After FITCOXMODEL, the command SAVEFITTED can be used, but only for storage of fitted values, i.e.
the quantities referred to as "linear expressions" above. Residuals and normed residuals are not defined. The commands LISTPARAMETERS, ESTIMATE, SAVEPARAMETERS etc. will work in the obvious way. Similarly, TESTMODELCHANGE can be used after fit of a model and a reduced model. Notice that a model reduction here means removal of one or more terms from the model formula. Removal of a stratifying factor, for example, cannot be tested in this way. ESTIMATION OF THE INTEGRATED UNDERLYING INTENSITY. This can be done simultaneously with the fit. The syntax for this is to add a final variate name to the model formula, preceded by a slash / , like FITCOXMODEL EXITTIME*DEAD-ENTRYTIM/SEX=AGE+TREATMT/INTINT which will imply that the usual estimate of the integrated underlying intensity is saved in a variate named INTINT. The name specified after the slash must be a valid variate name, and if this variate is already declared it must be of the same length as all other vectors in play. The resulting variate will have missing values in all entries, except those corresponding to individuals that are "present" and dead. In a plot of INTINT against EXITTIME, these are the upper left breakpoints of the broken line, when the function is drawn as a step function in the usual way. To produce the usual plot, connect consecutive points by a horizontal line (from left to right) and a vertical line (upwards). In case of a stratified model, the integrated intensities are estimated correctly, but notice that a point plot of this variate against exittime will be rather confusing, unless different colors or symbols are used for the strata, or all but one stratum are excluded. Notice that a fit of a Cox model without covariates, like FITCOXMODEL EXITTIME*DEAD-ENTRYTIM=/INTINT (empty model formula) will result in the usual nonparametric estimate of the integrated intensity for a homogeneous sample of individuals.
A good approximation to the non-parametric Kaplan-Meier estimate of the survival function (i.e. one minus the c.d.f. of the survival time distribution) can then be computed as KM=exp(-INTINT) From a plot of KM against EXITTIME, the usual plot of the Kaplan-Meier estimate is obtained when consecutive points (beginning with (0,1)) are connected by a horizontal line (from left to right) and a vertical line (downwards). The estimated standard deviation of the estimated baseline intensity can also be computed. The syntax for this is to add an extra variate name after a plus sign, like FITCOXMODEL EXITTIME*DEAD-ENTRYTIM/SEX=AGE+TREATMT/INTINT+IISD After this, pointwise confidence limits for the estimated baseline intensity can be computed by UPPER=INTINT+1.96*IISD LOWER=INTINT-1.96*IISD or better (by log-transformation) UPPER=EXP( LN(INTINT)+1.96*IISD/INTINT ) LOWER=EXP( LN(INTINT)-1.96*IISD/INTINT ) CONSTRAINT. Since the Pascal code for this command is taken over directly from the DOS version ISU, the length of the vectors entering in the model specification must not exceed 16379. Hopefully to be changed in the future, but not a matter of highest priority. -------------------------------------------------------------------------------- FITMCLOGIT FITMCPROBIT FITMCCLOGLOG P. McCullagh's model for ordered qualitative responses. ................................................................................ Syntax: FITMC... response=modelformula where the left hand side "response" is either (1) A factor on k levels, holding the responses, or (2) A list C1 C2 ... Ck of k variates, holding the multinomial counts of individuals in the k response groups. In the first case, the "unit by unit" (no aggregation) case, "response" must be the name of a factor on k levels, where k is the number of possible (ordered) responses. The levels of this factor must be in the range 1..k (no zeroes). 
The length n of this factor (the number of units) must coincide with the lengths of all vectors occurring in the model formula. In the second case, C1, C2, ... Ck must be names of variates, containing the multinomial counts of the k responses in n "covariate groups". Only in this case will the command compute goodness of fit statistics and fitted values. This representation is only relevant when the model considered is specified by factors or covariates with several units for each combination of factor levels / variate values. The common length of C1..Ck must coincide with the lengths of all vectors occurring in the model formula. The model formula after the equality sign MUST contain a constant term 1 as its first term, since only k-2 cutpoint parameters are used. Thus, for k=2, no cutpoint parameters are defined, and the model is equivalent to a logit-linear, probit-linear or cloglog-linear model for binary data. For k>2, the first parameter CONSTANT represents the cutpoint between level k-1 and k, and the cutpoints THR[1]..THR[k-2] are negative, representing the differences between cutpoints 1, 2 ... k-2 and cutpoint k-1. Mathematically, the model can be stated as follows. The probability that an individual responds r (=1..k) is F(CUTPT[r] + linear expression ) - F(CUTPT[r-1] + linear expression ) (subsuming CUTPT[0]=minus infinity, CUTPT[k]=plus infinity). Hence, the response can be regarded as a discretised version of a non-observable continuous variable X with c.d.f. F(linear expression + x). In other words, the model is really a linear position parameter model with error distribution F, but the observations are only available in "rounded" form, where the "rounding" is a grouping in k intervals with unknown cutpoints. In the parameterisation chosen here, these cutpoints are CONSTANT+THR[1] < CONSTANT+THR[2] < ... < CONSTANT+THR[k-2] < CONSTANT The "linear expression" is determined by the model formula as usual. 
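The response probabilities defined above can be computed directly from CONSTANT, the THR values and the linear expression. The following Python sketch is only an illustration: the function names, the numerical parameter values and the choice of the logistic c.d.f. for F are assumptions made here, and "linear" stands for the linear expression without the constant term, which enters through the cutpoints.

```python
import math

def logistic_cdf(x):
    """Logistic c.d.f., used here as an example of F."""
    return 1.0 / (1.0 + math.exp(-x))

def response_probabilities(constant, thr, linear, F=logistic_cdf):
    """Return [P(response = 1), ..., P(response = k)] for a single unit.

    constant : CONSTANT, the cutpoint between levels k-1 and k
    thr      : [THR[1], ..., THR[k-2]], negative and increasing
    linear   : linear expression without the constant term
    """
    # CUTPT[1..k-1] = CONSTANT+THR[1], ..., CONSTANT+THR[k-2], CONSTANT
    cut = [constant + t for t in thr] + [constant]
    # F at CUTPT[0] = minus infinity is 0, F at CUTPT[k] = plus infinity is 1
    cum = [0.0] + [F(c + linear) for c in cut] + [1.0]
    return [cum[r] - cum[r - 1] for r in range(1, len(cum))]

# Hypothetical parameter values for k = 4 response levels
p = response_probabilities(constant=1.0, thr=[-2.0, -0.5], linear=0.3)

# For k = 2 there are no THR parameters and the binary model comes out
p2 = response_probabilities(constant=0.4, thr=[], linear=-1.1)
```

With an empty list of THR values the first probability is simply F(CONSTANT + linear expression), in line with the remark that the model for k=2 is an ordinary model for binary data.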
The three models available in ISUW are defined by their c.d.f.'s as follows:
COMMAND        F(x)                       distribution   inverse F
FITMCLOGIT     F(x) = exp(x)/(1+exp(x))   Logistic       logit
FITMCPROBIT    F(x) = PHI(x)              Normal         probit
FITMCCLOGLOG   F(x) = 1-exp(-exp(x))      Gompertz       cloglog
(cloglog is short for "complementary log log"). For F(x) = exp(x)/(1+exp(x)) (the logistic c.d.f.) and k=2, the usual logit-linear model for binary or binomial data comes out of it. For k>2, this model becomes a "union" of several such logistic models, in the sense that the marginal models obtained by dichotomisations of the ordered scale are ordinary logistic regression models for binary data. But the point is that the interesting parameters (the coefficients to covariates in the linear expression) are common to these marginal models; only the constant term (the cutpoint) does, of course, depend on the choice of dichotomisation. EXAMPLE. Consider a classical dose-response situation, where doses of some drug are given to animals. A standard model for analysis of this situation states that the probability of death for an animal depends logit-linearly on log(dose). A similar model could be used to describe dose-dependence of an event like "death or serious damage". This corresponds to the two possible dichotomisations of the ordered three-point scale
1: No effect
2: Seriously damaged, but not dead
3: Dead
The Logistic McCullagh model for the full three-level response, which can be fitted by a command of the form FITMCLOGIT RESP=1+LOGDOSE (RESP a factor on three levels) can be regarded as a way of incorporating both binary models in one, subsuming that the interesting parameters (here the slope, i.e. the coefficient to log(dose)) are the same in the two models. Notice that FITMCPROBIT can be used to fit classical probit linear models for binary data (k=2). Similarly FITMCLOGIT can be used to fit logistic regression models for binary data - but FITLOGITLINEAR is much faster.
FITMCCLOGLOG is useful in survival analysis, where this model comes out when survival times in a proportional hazards model are grouped. After any of the FITMC... commands, the command SAVEFITTED can be used, but only for storage of fitted values, i.e. the quantities referred to as "linear expressions" above. Residuals and normed residuals are not defined (or, rather, if they were they should be variates of length n*k, not n). The commands LISTPARAMETERS, ESTIMATE, SAVEPARAMETERS etc. will work in the obvious way. Similarly, TESTMODELCHANGE can be used after fit of a model and a reduced model. CONSTRAINT. Each response must occur at least once. If this is not the case, the response scale must be reduced by collapse of neighbour levels. -------------------------------------------------------------------------------- FITCLOGIT Conditional logistic regression. ................................................................................ Syntax: For binary data: FITCLOGIT response/groups=modelformula For binomial data: FITCLOGIT response/groups=modelformula/totals Consider a logit linear model of the following form. A binary response y (values 0 and 1) is assumed to have independent elements with logit( P(y=1) ) = a(g) + general linear expression. In principle, this is an ordinary logistic regression model, but the factor level g (for group) has a particular role in the following. It is assumed to represent a classification of units in groups, for which the parameters a(g) are regarded as nuisance parameters. In analogy with well-known variance analysis concepts, we may think of the groups as blocks, and the conditional analysis performed is similar to the intra-block analysis. Let S(g) denote the sum of the responses y(i) over all units in group g. It is easy to prove that the model obtained by conditioning on these group sums has a likelihood which does not depend on the nuisance parameters a(g).
This is an exclusive property of the logit-linear model, which is not shared e.g. by probit-linear models or other generalised linear models for binary data. An important special case occurs in connection with case-control studies, where each case (response y=1) is matched with a given number of controls (y=0). The controls are selected at random from a large population in such a way that they match the case with respect to characteristics that are not to be analyzed in the context (age, sex, ...). Considering the grouping into case-control groups, and conditioning on the corresponding sums (which are all 1), we obtain a model where the group parameters (and, in turn, all parameters representing effects of matched factors) disappear. "response" must be the name of a variate, containing the binary responses (0/1). In case of binomial data (i.e. when the binary responses are aggregated in covariate groups within the "conditioning groups"), response must contain the relative frequencies, and the name of the variate holding the binomial totals must be given after a slash after the model formula. "groups" must be the name of a factor or variate of the same length as response. The levels/values of this must be increasing, and they determine the groups. Typically, this vector could hold the group numbers. In a case-control data set, where the case occurs first in each group, these values can be obtained by cumulation of the responses. But the values are actually irrelevant, only their order matters. A positive increase means that a new group begins, a proper decrease must never occur. Variates and factors in the model formula must be of the same length as response and groups. The string "modelformula" has exactly the same form and interpretation as in a call to FITLOGITLINEAR, except that the conditioning factor must be given after a slash before the equality sign, and an offset is not allowed.
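The statement that conditioning on the group sums removes the parameters a(g) can be checked numerically for a single small group. The following Python sketch is a hypothetical illustration with one covariate and made-up values for x, beta and a. The function name conditional_probability is an assumption, and the brute-force enumeration is not the algorithm used by FITCLOGIT.

```python
import math
from itertools import accumulate, combinations

def conditional_probability(y, x, beta, a=0.0):
    """P(observed y | sum of y) in one group, when logit P(y_i=1) = a + beta*x_i.

    Computed from the unconditional probabilities, so that the role of the
    group parameter a is visible.
    """
    n, s = len(y), sum(y)

    def joint(config):
        prob = 1.0
        for yi, xi in zip(config, x):
            p = 1.0 / (1.0 + math.exp(-(a + beta * xi)))
            prob *= p if yi == 1 else 1.0 - p
        return prob

    # Sum over all response patterns in the group with the same sum s
    total = 0.0
    for ones in combinations(range(n), s):
        total += joint([1 if i in ones else 0 for i in range(n)])
    return joint(y) / total

# One case (listed first) matched with three controls
y = [1, 0, 0, 0]
x = [2.0, 0.5, 1.0, 0.0]
beta = 0.7

# The conditional probability is the same whatever the value of a
vals = [conditional_probability(y, x, beta, a) for a in (-3.0, 0.0, 2.5)]

# For a 1:m matched set it reduces to exp(beta*x_case) / sum of exp(beta*x_i)
closed = math.exp(beta * x[0]) / sum(math.exp(beta * xi) for xi in x)

# Group numbers by cumulation of the responses (case first in each group)
case_ind = [1, 0, 0, 1, 0, 0, 0]
groups = list(accumulate(case_ind))
```

Here groups becomes 1 1 1 2 2 2 2, which is an increasing variate of the kind required after the slash.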
A typical call in connection with case-control studies might look like

   FITCLOGIT CASE_IND/CCGROUP=SOCGR+EXPOSURE

Here, CASE_IND is a variate with the value 1 for cases, 0 for controls. CCGROUP is a variate or factor with increasing values/levels, holding the number of the case control group. SOCGR could be a factor, containing some information about social covariates, and EXPOSURE could be a covariate holding the exposures for some suspected toxic matter or whatever. Usually relevant effects like AGE, SEX, COHORT etc. can be left out, if taken into account by the matching. If they are included in the model formula, the corresponding parameters will be set to zero, and so will the intercept parameter corresponding to a constant term 1, if included.

Groups, in which all units respond 1, or all units respond 0, will obviously not contribute to the likelihood.

Restrictions are taken into account in the obvious way. Units excluded are simply excluded from their group, and if this results in a "trivial group" in the above sense, the entire group is ignored. However, the grouping factor (or variate) GROUP must still be sorted, also as regards the missing values.

After a call to FITCLOGIT, SAVEFITTED can be used for storage of fitted values (but not residuals and normed residuals). However, these fitted values are not very useful in themselves, because they do not contain the contributions of the conditioning factor or effects confounded with it. Exact fitted values can not be easily obtained (cf. the problems with computation of a mean in a non-central hypergeometric distribution), but a good approximation can be obtained as follows.
Use SAVEFITTED FIT and after this, fit a logit linear model (unconditional) with the conditioning group factor as the only explanatory variable and with the variate FIT as offset, like

   FITLOGIT Y-FIT=GROUP/M
   SAVEFITTED NEWFIT
   NEWFIT=M*NEWFIT

After this, NEWFIT will contain a good approximation to the expectations of the binomial observations under the estimated model.

CONSTRAINT. The number of positive responses in a group must not exceed 255.

--------------------------------------------------------------------------------
FITCRASCH   Conditional estimation in the Rasch model.
................................................................................
Syntax: FITCRASCH response/sgsizes=modelformula

In its simplest form, the model considered here is the logit additive model for a two-way table of binary responses (referred to as the full Rasch model below), stating that the probability of a positive response is additive on the logit scale. That is, if y(row,col) denotes the binary response in the (row,col)'th cell,

   logit( P( y(row,col) = 1 ) ) = alpha(row) + beta(col).

The usual maximum likelihood estimates for column parameters are known to have bad asymptotic properties when the number of rows (and thereby the total number of parameters) tends to infinity. This problem disappears when the conditional estimates, given the row sums, are used instead.

More generally, we are considering a special case of the conditional logistic regression model (see above) defined by the following two properties concerning the groups defined by the conditioning factor (here the rows of our table):

   (1) The groups are equally sized
   (2) All covariates are functions of the internal unit number in the group.

Or, in other words: Our vector of binary responses can be set up in a two-way table such that the sums conditioned on are the row sums, and such that the covariates in the logistic model (disregarding those that are "conditioned away") are columnwise constant.
In particular, the largest model satisfying this is the "full Rasch model", the model with a free parameter for each column. In this context, rows are often regarded as subjects and columns as items. The idea is that each subject responds 0 or 1 to each item, and the conditioning taking place here is on subject sums or subject "scores". The command FITCLOGIT is unnecessarily slow for estimation of these models, because many quantities that need only be computed once in each iteration will be computed once for each subject in each iteration. Under the above assumptions, the conditional likelihood turns out to depend only on the item totals (i.e. sums of responses for each item) and the sizes of the score groups, i.e. the number of subjects with 1,2,...,k "correct" answers. In the command (see syntax description above) "response" is a variate of length k, holding the item totals. "sgsizes" is a variate of the same length k, holding the sizes of score groups. Its first value must be the number of subjects with score 1, its second value the number of subjects with score 2, etc. The number of individuals with score 0, i.e. no correct answers, is left out because it is irrelevant. In fact, the last value (the number of subjects with all answers correct) is also irrelevant, but it is included for cosmetic reasons (to keep all variates of the same length) and to allow for the check of consistency mentioned below. The model formula is, in the simplest case (the full Rasch model) a factor of the same length k with distinct levels 1,...,k. But it may also take the more general form of a model formula in variates/factors of length k. The command takes restrictions into account in the following sense: If a unit is missing, it means that the corresponding item is left out. Think of a table where a column is deleted. However, this will change the row sums, and accordingly one will have to change the score group sizes. 
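How the two input variates arise from a subjects-by-items table, and why deleting an item changes the score group sizes, can be sketched as follows. This is a Python illustration, not ISUW code; the function names are hypothetical and the small 10 x 3 table serves only as an example.

```python
def item_totals(table):
    """Column sums: number of positive responses for each item."""
    return [sum(col) for col in zip(*table)]

def score_group_sizes(table):
    """Number of subjects with score 1, 2, ..., k (score 0 is left out)."""
    k = len(table[0])
    sizes = [0] * k
    for row in table:
        score = sum(row)
        if score >= 1:
            sizes[score - 1] += 1
    return sizes

def is_consistent(totals, sizes):
    """Total of item sums must equal the total of subject scores."""
    return sum(totals) == sum((s + 1) * n for s, n in enumerate(sizes))

def drop_item(table, j):
    """Delete column j (0-based); the row sums change with it."""
    return [row[:j] + row[j + 1:] for row in table]

table = [[1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 0], [1, 1, 0],
         [0, 0, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0], [1, 1, 0]]

print(item_totals(table), score_group_sizes(table))      # [5, 7, 3] [4, 4, 1]
reduced = drop_item(table, 2)
print(item_totals(reduced), score_group_sizes(reduced))  # [5, 7] [2, 5]
```

After the third item is dropped, the score group sizes are not a subset of the old ones; they have to be recomputed from the reduced table.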
Entries in the vector of score group sizes are never, in any sense, regarded as missing. This means that you can not remove an item from the analysis merely by excluding the unit number, you must modify the variate of score group sizes accordingly.

In practice, a more relevant kind of restrictions has to do with fit of the Rasch model to a subset of the set of subjects. Notice that this can be done here, but it has nothing to do with restrictions on the formal set of units, which is the set of items. The values of "response" and "sgsizes" are sums over the set of subjects, and any change of the set of subjects can be performed by accordingly changing the values of these variates.

EXAMPLE. Consider the table

              item   1   2   3   sum
   subject
      1              1   1   0    2
      2              0   1   0    1
      3              0   1   0    1
      4              1   1   0    2
      5              1   1   0    2
      6              0   0   1    1
      7              0   0   1    1
      8              1   1   1    3
      9              0   0   0    0
     10              1   1   0    2
    sum              5   7   3

Let ITEM be a factor of length 3 with three levels, SUM and SG_SIZE variates of length 3 with values given by

   ITEM   SUM   SG_SIZE
     1     5       4
     2     7       4
     3     3       1

Then

   FITCRASCH SUM / SG_SIZE = ITEM

will fit a full Rasch model. There is an obvious check of consistency, which in the above example takes the form

   Total sum of responses = 5 + 7 + 3 = 1*0 + 4*1 + 4*2 + 1*3

This check is performed by FITCRASCH, and the command is interrupted with an error message if the check fails.

CONSTRAINT. The number of items must not exceed 255.

--------------------------------------------------------------------------------
FITNEGBIN   Estimation in log linear models for negatively binomially distributed counts.
................................................................................
Syntax: FITNEGBIN response=modelformula[/weight] [initalpha]

The negative binomial distribution is the distribution on {0,1,2,...} with point probabilities

   P(Y=y) = (a+y-1 over y) * (1-p)^a * p^y      ( a > 0 ).

(with an obvious notation '( ... over ... )' for binomial coefficients).
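The point probabilities can be evaluated for non-integer a by writing the binomial coefficient in terms of the gamma function. The following Python sketch is an illustration only; the function name is hypothetical, and nothing is implied about how FITNEGBIN computes its likelihood.

```python
from math import comb, exp, isclose, lgamma, log

def negbin_pmf(y, a, p):
    """P(Y=y) = (a+y-1 over y) * (1-p)**a * p**y for a > 0 and p in (0,1).
    The generalised binomial coefficient is Gamma(a+y)/(Gamma(a)*y!)."""
    log_coef = lgamma(a + y) - lgamma(a) - lgamma(y + 1)
    return exp(log_coef + a * log(1.0 - p) + y * log(p))

# The probabilities sum to one, also for a non-integer a.
total = sum(negbin_pmf(y, 2.5, 0.4) for y in range(200))
print(isclose(total, 1.0, rel_tol=1e-9))        # True

# For integer a the ordinary binomial coefficient is recovered.
print(isclose(negbin_pmf(2, 3, 0.4), comb(4, 2) * 0.6**3 * 0.4**2, rel_tol=1e-9))  # True
```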
For integer values of the parameter a (called ALPHA in procedure output) this is the distribution of the waiting time to (or rather, the number of unsuccessful outcomes before) the a'th success in a sequence of independent identical binary experiments with probability 1-p of "success".

For arbitrary a>0, the negative binomial distribution can be characterised as a mixture of Poisson distributions with respect to a Gamma distribution in the following sense. If Y is Poisson distributed with a random parameter lambda, which is drawn from a Gamma distribution with form parameter a and scale parameter b, then the resulting distribution of Y is a negative binomial with "a=a" (the form parameter of the Gamma distribution takes the role of the a in the negative binomial) and p=b/(1+b).

The last interpretation justifies the use of negative binomial models in situations where the usual log linear Poisson models fail due to overdispersion. The mean in the negative binomial distribution is m=a*p/(1-p), the variance m*(1+m/a) > m.

The simplest kind of models (without a "weight") that can be estimated by the command FITNEGBIN can be characterised as follows. Each (nonnegative integer) observation y[i] has a negative binomial distribution. The parameter a is common to all observations, the parameter p = p[i] depends logit-linearly on background variates/factors, specified as usual by a model formula. Thus, the mean of the i'th observation becomes

   E(Y[i]) = a * exp( par1*X(i,1) + par2*X(i,2) + ... ) ,

which is of the usual log-linear form. Models of this kind are fitted by commands like (in case of a model with a constant term and a single covariate)

   FITNEGBIN Y=1+X

"WEIGHTED" MODELS. A useful generalisation is the following. Rather than assuming the parameter a to be the same for all units, we assume that there are parameters a[i] = a*n[i], proportional to a given variate n. This is typically the case if each y[i] has come out by summation of i.i.d. counts for n[i] individuals.
If these counts (on a nonobservable "micro-level") follow a model of the simpler kind described above, a model with such proportional parameters a[i]=a*n[i] comes out of it. The interpretation of the unknown parameter ALPHA (=a) is exactly as before (but on the micro-level). Since the variate n here has a role very similar to a weight variate in a generalised linear model, the syntax for this has been chosen such that the "weight" variate should follow the model formula, separated by a slash, like

   FITNEGBIN Y=1+X/N

Notice, however, that the variate N here is not merely a weight occurring directly in the summation of the log likelihood function. Moreover, this kind of "aggregation" over "micro-units" is not quite as simple as the corresponding concept in the log-linear Poisson models. In the Poisson case, this is simply a sufficient reduction. For the present kind of models, the aggregation is not a sufficient reduction, but the marginal model for the aggregated data set is a similar model, due to the convolution property of the negative binomial distribution.

Sometimes the iterative estimation procedure will be interrupted by an error message stating that the information matrix is not positive semi-definite. Typically, this happens if ALPHA has become too large. The default action of FITNEGBIN is to take ALPHA=1 as the starting value. If this problem appears, give an initial value of ALPHA as the last parameter, like

   FITNEGBIN Y=1+X/N 0.01

An initial value of ALPHA can also help if ALPHA becomes negative during the iterative maximisation (which results in an error message).

If there is no overdispersion at all, convergence may fail. The result should not be trusted in this case, use a Poisson model (FITLOGLINEAR) instead. Or, if there is some overdispersion, use FITNONLINEAR (see one screenful below).

Restrictions are taken into account in the obvious way. Missing values of variates among the units present must not occur.

WARNINGS, LIMITATIONS.
Do not rely too much on the standard deviation estimated for the parameter ALPHA (called a above). The log likelihood is not well approximated by a quadratic function in this parameter. Moreover, the test for ALPHA=0 is irrelevant. The log linear Poisson model corresponds to the value plus infinity of ALPHA. To test for ALPHA = plus infinity (no overdispersion), fit the corresponding log linear Poisson model (with log of the "weights", if present, as an offset) and look at the test against the full model.

Do not use the negative binomial model when overdispersion is not present - what happens is simply that the estimate of ALPHA becomes very large, so that the model is almost a Poisson model.

The computation of the log likelihood and its derivatives involves a summation from 1 to y for each unit. Thus, for large observations (e.g. y > 1000) the procedure is slow. An alternative (if you insist on a variance function of the same shape as for the negative binomial) is to use

   FITNONLINEAR modelformula exp(.) exp(.) .*(1+./a)

trying with different values of a until the residual plot is OK. You can also handle the weighted case by FITNONLINEAR (use variance function .*(1+./(a*n)), where n is the variate of "weights").

--------------------------------------------------------------------------------
LISTPARAMETERS   Lists parameter estimates and their estimated standard deviations.
................................................................................
No parameters.

After any model fit command except FITANOVA, this command lists the parameter estimates, their estimated standard deviations and the T- or approximate U-tests for hypotheses of the form "parameter=0". There is one parameter for each column of the model matrix, but some of these are usually set to zero due to overparameterisations (see the description of FITLINEARNORMAL). For this reason, these tests, given by the two last columns produced by LISTPARAMETERS, should be interpreted with some care.

EXAMPLE.
In a simple one-way situation, after

   FITLIN Y=F
   LISTP

the T-tests reported are for hypotheses of the form "mean in group f equals zero", which is rarely relevant. After fit of the same model (over-) parameterised by

   FITLIN Y=1+F
   LISTP

the T-tests reported correspond to pairwise comparisons of each F-level with the last F-level (which is sometimes relevant) - except for the first line labelled "CONSTANT", which returns the test for "mean in last group equals zero" (!). Notice that the last form "Y=1+F" is the relevant one, if the test for "no effect of F" is to be derivable from the analysis of variance table (for FITLINEARNORMAL and FITNONLINEAR only).

--------------------------------------------------------------------------------
TESTMODELCHANGE   Likelihood ratio test for reduction/extension given by the two last models fitted.
................................................................................
Syntax: TESTMODELCHANGE [ test-statistic [p-value]]

For log-linear, logit-linear models and many other models, for which an approximate chi-square test on the likelihood-ratio test statistic is appropriate, TESTMODELCHANGE performs a simple subtraction of the log-likelihoods from the two latest model fits and a similar computation of the change in number of parameters, and computes the likelihood-ratio statistic and the relevant tail probability in the approximating Chi-square distribution.

For linear normal models fitted by FITLINEARNORMAL, the relevant F-test is performed.

For nonlinear models fitted by FITNONLINEAR, the approximate F-test, based on the weighted residual sums of squares from the last two models, is performed. Notice that this is not always reliable, since the weights may have changed if there is a non-constant variance function involved. The tests given in the approximate ANOVA table are probably more reliable.
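The chi-square case described above amounts to the following arithmetic. This is a minimal Python sketch with hypothetical function names and made-up log-likelihood values; it assumes the usual statistic, twice the difference of the maximised log-likelihoods, with degrees of freedom equal to the change in number of parameters. The tail probability is computed by a standard recurrence for integer degrees of freedom.

```python
from math import erfc, exp, gamma, isclose, sqrt

def chi2_sf(x, df):
    """Upper tail probability P(X > x) of a chi-square distribution
    with a positive integer number of degrees of freedom."""
    if 0 >= x:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, nu = erfc(sqrt(half)), 1     # exact value for df = 1
    else:
        q, nu = exp(-half), 2           # exact value for df = 2
    while df > nu:
        # step from nu to nu + 2 degrees of freedom
        q += half ** (nu / 2.0) * exp(-half) / gamma(nu / 2.0 + 1.0)
        nu += 2
    return q

def lr_test(loglik_small, npar_small, loglik_large, npar_large):
    """Likelihood-ratio statistic, degrees of freedom and P-value."""
    stat = 2.0 * (loglik_large - loglik_small)
    df = npar_large - npar_small
    return stat, df, chi2_sf(stat, df)

# Made-up values: a model with 5 parameters and a submodel with 3.
stat, df, pval = lr_test(-120.0, 3, -115.0, 5)
print(stat, df, pval)
```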
If the first parameter "test-statistic" is specified it must be the name of a vector of length 1 or a valid vector name which is not in use. In the latter case it becomes a vector of length 1, and the value of the (chi-square or F) test statistic is stored in it. Similarly, if a second parameter is specified, the P-value (tail probability) of the test is stored in this. To keep only the P-value but not the test statistic, write the first parameter as an asterisk.

The command does not work after FITANOVA.

WARNING. It is a requirement - and mainly your own responsibility - that the test makes sense. In particular, the two last models fitted must be of the same type (log-linear, logistic or whatever), the number of observations must be the same (no restrictions changed in between, weights and offsets unchanged). For non-linear models, the three functions specifying such a model must be the same for the last two models, for Cox models the stratifying factor must not have changed, etc. etc. In brief, one of the two last models fitted must be a submodel of the other.

--------------------------------------------------------------------------------
SAVEFITTED   Computes fitted values, residuals and normed residuals after a model fit command.
................................................................................
Syntax: SAVEFITTED fitted [residuals [normedresiduals]]

After FITLINEARNORMAL, FITLOGLINEAR, FITLOGITLINEAR and FITNEGBIN, this command saves

   - Fitted values    ( = estimated means of observations)
   - Residuals        ( = differences between observations and fitted values)
   - Normed residuals ( = residuals divided by estimated standard deviations)

The three parameters must be unused vector names or names of variates of length equal to the number of units in the last model fit directive. An asterisk * or a pseudoblank | (or simply omission, for the parameters coming last) means that the variate is not to be computed.

EXAMPLE.
   SAVEF * RES

saves residuals in RES (and creates RES, if required), but fitted values and normed residuals are not computed.

Normed residuals are computed without correction for the fact that the corresponding observation contributes to the estimation of parameters. Thus, normed residuals are typically less dispersed than i.i.d. observations from a normalised normal distribution. For normal linear models, this is emphasised by the fact that the square sum of the normed residuals will always equal the number of observations minus the number of parameters estimated. However, this means that an extremely large normed residual can be taken as a (conservative but) safe indication of an outlying observation. For more exact outlier detection in the normal linear case, use SAVENORMEDRESIDUALS.

Restrictions are not obeyed by SAVEFITTED, except in the obvious sense that if the last model fit was made under restrictions, the parameter estimates used by SAVEFITTED will be influenced by this. But the fitted values (and also residuals and normed residuals) are computed for non-present observations in exactly the same way as they are for the present observations. This actually means that the fitted values corresponding to observations that were not present when the model was fitted can be regarded as predictions of these "new" observations. This makes the command SAVEFITTED useful for many different purposes, such as cross-validation, replacement of missing observations and prediction in time series.

SAVEFITTED can also be used after FITCOXMODEL, FITMC... , FITCLOGIT and FITCRASCH, but only for storage of fitted values (residuals and normed residuals are not defined). Here, "fitted value" means "estimated linear expression", not "estimated mean of observation".

After FITNONLINEAR the command will work, but the resulting variates are not fitted values and residuals in the usual sense. See the description of FITNONLINEAR.
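The statement about the square sum of the normed residuals in normal linear models can be checked on a small example. The following Python sketch uses a simple linear regression with made-up data; the function names are hypothetical and the code only mimics the three quantities described above.

```python
from math import isclose, sqrt

def fit_line(x, y):
    """Least squares estimates (intercept, slope) for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def fitted_residuals_normed(x, y):
    """Fitted values, residuals, and residuals divided by the estimated
    standard deviation (two parameters are estimated)."""
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    res = [yi - fi for yi, fi in zip(y, fitted)]
    sd = sqrt(sum(r * r for r in res) / (len(x) - 2))
    return fitted, res, [r / sd for r in res]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
fitted, res, normed = fitted_residuals_normed(x, y)
print(sum(v * v for v in normed))   # 6 observations - 2 parameters = 4
```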
--------------------------------------------------------------------------------
SAVEPARAMETERS   Save estimated parameters from last model fit command.
................................................................................
Syntax: SAVEPARAMETERS estimates [estsd [variance] ]

After any model fit command but FITANOVA, this command saves the estimated parameters in a variate. There is one parameter for each column of the model matrix, as generated by the model formula. The order and interpretation of the parameters (which can be displayed by a LISTPARAMETERS command) follows from the rules described under FITLINEARNORMAL. Notice that parameters which are set to zero due to overparameterisation are also saved.

If the second command parameter is specified, a variate holding the estimated standard deviations of estimates is also created. The parameters must be valid names of non-existing variates, or names of variates of the correct length (which is the number of columns of the model matrix, or the number of lines in the table produced by LISTPARAMETERS).

The last parameter "variance" makes sense only after FITLINEARNORMAL and FITNONLINEAR. It must be the name of a variate of length 1, or a valid vector name which is not in use. The effect of this is that the estimate of the error variance (or the corresponding squared scale parameter, in case of a weighted model or a nonlinear model with variance function different from 1) is stored in this variate.

To skip a parameter, write it as an asterisk. For example, to save only linear estimates and the estimated model variance, write e.g.

   SAVEPAR pars * var

EXAMPLE. Suppose a log-linear model has been fitted by a command like

   FITLOGLIN COUNT-LOGSIZE=1+TREAT+SEX+...

where TREAT is a factor on four levels.
To compute and list standard approximate 95% confidence limits for the relative multiplicative effects of TREAT with level 4 as baseline, do something like this:

   SAVEP PARS1 SD1
   VAR PARS SD 4
   TRANSFER PARS1 2 5 PARS 1 4
   TRANSFER SD1 2 5 SD 1 4
   DEL PARS1 SD1
   ESTIMATE=EXP(PARS)
   LOWER=EXP(PARS-1.96*SD)
   UPPER=EXP(PARS+1.96*SD)
   DEL PARS SD
   LIST ESTIMATE LOWER UPPER

Notice: This relies on the convention that the last parameter of TREAT is set to zero, because this is where the linear dependence of columns in the model matrix is met for the first time. The last line of the listing will have ESTIMATE=1 (=EXP(0)) and LOWER=UPPER=ESTIMATE (since SD=0). To select level 1 as the baseline level, begin (before the model fit command) by a "baselining" of that level, e.g. by TREAT=TREAT-1 (setting the desired baseline level to 0, which implies that no "dummy" column is generated for that level), or make a permutation of the levels such that the desired baseline level becomes the last.

See also the command ESTIMATE below.

--------------------------------------------------------------------------------
ESTIMATE   Outputs specified contrasts and their standard deviation. The command can be used after any model fit command.
................................................................................
Syntax: ESTIMATE term1 [term2 [...]]
   or   ESTIMATE variate1 [variate2 [...]]

In the first case, the parameters must be terms of the model formula, separated by blanks or plusses. For terms involving factors, all possible differences between parameter estimates are listed with their estimated standard deviations. For terms with variates only, the estimate of the regression coefficient and its standard deviation is given. If the term involves only a single factor, a simple plot showing how the estimates are positioned on the line is added. If more than 20 parameters are involved this is the only output you will get, since the list of pairwise comparisons is too long to be of any use.
The second form can be used for estimation of quite general linear combinations of the parameters. variate1 , variate2 etc. must be of length equal to the number of linear parameters in the model, including those that are set to zero due to linear dependence (i.e. the number of lines written by LISTPARAMETERS), and the corresponding linear combinations (with the variate's values as coefficients) are estimated.

EXAMPLE. If TREAT is a factor on 3 levels, the two ESTIMATE commands in the following program will give roughly the same output (except that the last one will not produce a plot)

   FITLIN Y=1+TREAT
   INCLUDEALL { required if restrictions on units 1..4 }
   COEFF1=[0 -1 1 0]
   COEFF2=[0 -1 0 1]
   COEFF3=[0 0 -1 1]
   ESTIMATE TREAT
   ESTIMATE COEFF1 COEFF2 COEFF3

The two kinds of parameters can be mixed as desired. For example,

   ESTIMATE TREAT COEFF1 COEFF2 COEFF3

would be OK in the above example.

WARNING. The quantities estimated are defined in a simple way in relation to the model matrix, namely as either coefficients to the corresponding covariates (in case of a variate argument), differences between such coefficients (in case of an argument involving at least one factor), or linear combinations of such coefficients (in case of a variate argument with length equal to the number of parameters in the model). However, in relation to the model they are not always meaningful. For example, in a R(ow) x C(olumn) two-way setup,

   ESTIMATE R*C

makes perfect sense after

   FITLINEAR Y=R*C     or     FITLINEAR Y=1+R*C

(where it performs all the pairwise comparisons of (r,c)-means), but not after

   FITLINEAR Y=1+R+C+R*C

because the presence of main effect terms will destroy the simple interpretation of the differences between interaction parameters. Similarly, ESTIMATE R makes no sense at all after FITLINEAR Y=1+R+C+R*C - or perhaps we should say that the sense it makes is somewhat complicated. The quantities estimated would be differences between cell means in the last column of the two-way table.
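The second form of ESTIMATE, with coefficient variates, corresponds to the following arithmetic on the parameter estimates and their covariance matrix. This is a Python sketch under stated assumptions: the function name is hypothetical, and the parameter values and covariance matrix are made up (they correspond to a one-way layout with three groups of two observations and error variance 1, with the last TREAT parameter set to zero).

```python
from math import isclose, sqrt

def estimate_contrast(coeff, params, cov):
    """Estimate of sum(coeff[j]*params[j]) and its standard deviation,
    sqrt(coeff' * cov * coeff)."""
    n = len(coeff)
    est = sum(c * b for c, b in zip(coeff, params))
    var = sum(coeff[i] * cov[i][j] * coeff[j]
              for i in range(n) for j in range(n))
    return est, sqrt(var)

# Parameters in LISTPARAMETERS order: CONSTANT, TREAT 1, TREAT 2, TREAT 3.
params = [10.0, -3.0, -1.0, 0.0]
cov = [[0.5, -0.5, -0.5, 0.0],
       [-0.5, 1.0, 0.5, 0.0],
       [-0.5, 0.5, 1.0, 0.0],
       [0.0, 0.0, 0.0, 0.0]]

for coeff in ([0, -1, 1, 0], [0, -1, 0, 1], [0, 0, -1, 1]):
    print(coeff, estimate_contrast(coeff, params, cov))
```

A pairwise difference between two factor levels is the special case where the coefficient vector has one entry -1, one entry 1 and zeros elsewhere.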
After FITANOVA, the command ESTIMATE can also be used, but here it activates a quite different procedure adapted to the case of a mixed model, where contrast variances can be sums of contributions from different error strata. The only arguments allowed are fixed terms from the model formula of the last FITANOVA command, and the estimates coming out of this are always the means of observations in the groups defined by this factor or product factor. The command outputs a table of means, together with information that enables you to compute standard deviations of simple contrasts (differences between means). In the simplest case (effects of equally replicated factors with no partially confounded random factors) the standard deviation is given explicitly. In more complicated situations you must compute it from the contributions to the variance from the different strata. Since the treatment structure is always specified by the maximal model formula, estimability of linear parameters is not taken into account by this form of the ESTIMATE command.

--------------------------------------------------------------------------------
SAVENORMEDRESIDUALS   Computes "studentized" (T-distributed) normed residuals and the corresponding tail probabilities after FITLINEARNORMAL and FITNONLINEAR.
................................................................................
Syntax: SAVENORMEDRESIDUALS nres [pvalues]

When SAVEFITTED is used after FITLINEARNORMAL, it computes normed residuals simply as residuals divided by the estimated standard deviation. When hunting outliers, a more relevant definition of normed residuals is the one that makes them T-distributed with ResDF-1 degrees of freedom.

A "studentized" residual can be computed by removal of the observation from the data set and fit of the model to the remaining observations.
Or by extension of the model with a dummy that allows the observation to have its own, freely varying mean, and performing the T-test for the hypothesis that this term can be removed from the model. But a faster way of computing all these quantities goes as follows. Let ModelVariance*h(i) be the estimated variance of the i'th fitted value, computable as the double sum over (j1,j2) of the quantities Xmatrix(i,j1)*Xmatrix(i,j2)*ParameterCov(j1,j2). h(i) can also be interpreted as the i'th diagonal element of the orthogonal projection matrix for the linear subspace of means associated with the model. The (estimated) variance on the i'th residual r(i) is then V(i) = ModelVariance*(1-h(i)). The i'th studentized residual can now be computed as

                           r(i)/sqrt(1-h(i))
   NRES(i) = -------------------------------------------------- .
             sqrt( ( SS(res) - sqr(r(i))/(1-h(i)) )/(ResDF-1) )

A similar formula exists for the weighted case.

The command SAVENORMEDRESIDUALS performs this computation and saves the result in the variate "nres" given as the first parameter. If this is an existing vector, it must be a variate of the correct length, otherwise it is declared as such. The second variate "pvalues", if specified, is also declared automatically, and the command will store in this the two-sided tail probabilities in the relevant T-distribution. A reasonable (conservative) criterion for an observation being "outlying" is that this tail probability is less than, say, 0.05 divided by the number of observations (because this ensures that an outlier is found in a correct model with probability at most 0.05).

The command can also be used after FITNONLINEAR, but here the T-distribution must obviously be taken as an approximation. The "studentized residuals" produced in this case are those associated with the weighted regression performed in the last iteration.

Restrictions are obeyed in the following sense: For non-present units normed residuals are set to zero and tail probabilities to one.
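That the fast formula for NRES(i) agrees with the "remove the observation and refit" definition can be verified numerically. The following Python sketch does this for an unweighted simple linear regression with made-up data; the function names are hypothetical, and for this model h(i) = 1/n + sqr(x(i)-mean(x))/Sxx and ResDF = n-2.

```python
from math import isclose, sqrt

def line_fit(x, y):
    """Least squares line; returns intercept, slope, mean of x and Sxx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b, xbar, sxx

def studentized_residuals(x, y):
    """NRES(i) computed from one fit, using the formula above."""
    n = len(x)
    a, b, xbar, sxx = line_fit(x, y)
    r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    ss_res = sum(ri * ri for ri in r)
    res_df = n - 2
    return [(ri / sqrt(1 - hi)) / sqrt((ss_res - ri * ri / (1 - hi)) / (res_df - 1))
            for ri, hi in zip(r, h)]

def studentized_by_deletion(x, y, i):
    """Same quantity obtained by refitting without observation i and
    standardising the prediction error for that observation."""
    xr, yr = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    a, b, xbar, sxx = line_fit(xr, yr)
    m = len(xr)
    s2 = sum((yj - (a + b * xj)) ** 2 for xj, yj in zip(xr, yr)) / (m - 2)
    pred_var = s2 * (1.0 + 1.0 / m + (x[i] - xbar) ** 2 / sxx)
    return (y[i] - (a + b * x[i])) / sqrt(pred_var)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
nres = studentized_residuals(x, y)
for i in range(len(x)):
    print(nres[i], studentized_by_deletion(x, y, i))   # the two columns agree
```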
In weighted models, weight=0 produces missing values for the corresponding studentized residuals (and P-values). -------------------------------------------------------------------------------- FITANOVA Analysis of variance in orthogonal designs. ................................................................................ Syntax: FITANOVA response=fixedterms+[randomterms] where the brackets here are not "syntax brackets", they should actually be there if the model contains random effects. This command performs analysis of variance for models in orthogonal designs, including certain variance component models, closely following the exposition Tjur, T. (1984): Analysis of Variance Models in Orthogonal Designs International Statistical Review 52, pp. 33-81 The theory given in that paper will not be repeated in full detail here. The syntax for FITANOVA is similar to that of FITLINEARNORMAL, with the following modifications: (1) A weight can not be specified. (2) Only factors, not variates, are allowed in the model specification on the right hand side of the equality sign. (3) Random effects in addition to the "unit-to-unit variation" (which is always assumed to be present) can be included in a final bracket. For example FITANOVA Y = 1 + ROW + COL + [ROW*COL] will estimate a model with fixed effects of ROW and COL (and a constant term), random ROW*COL effect (interaction) and a (mandatory) random UNIT effect (e.g. a measurement error, i.e. independent i.i.d. error terms as in a linear normal model). (4) The model specified must satisfy the following conditions, which - apart from non-essential modifications - are those given in Tjur (1984): A) The entire set of factors, occurring in the model specification, must be closed under the formation of minima. For example, for a balanced two-way table, FITANOVA Y = ROW + COL + [ROW*COL] will not work, because the minimum of ROW and COL (the trivial factor, represented by the constant term 1) is not present. 
B) The set of random factors (those occurring in the bracket) must be closed under the formation of minima, and they must all be balanced (i.e. they must group data into equally sized classes).

C) Any two factors or product factors in the model must be orthogonal.

It follows from these assumptions that the set of factors occurring in the model specification constitutes an orthogonal design (Tjur 1984, p. 41-42), and the set of fixed (non-bracketed) factors constitutes the maximal model formula specifying the treatment structure in this design.

The analysis of variance table, output by FITANOVA, contains a line for each factor or product factor occurring in the model specification, including the random UNIT effect, with omission of square sums that are "structurally zero" (i.e. with zero degrees of freedom). The lines of the table are ordered by strata, each stratum containing the sums of squares for fixed effects in that stratum, with the sum of squares for the associated random factor as the last line, labeled "Residual". The lines for fixed effects give the F-tests for removal of the corresponding factors from the model. The "Residual" lines (for random effects) give the F-tests for removal of the corresponding random terms from the model, whenever this is a legal hypothesis (see Tjur 1984, p. 58).

The estimated eigenvalues of the covariance matrix are found in the analysis of variance table as the "Residual" MS's. In the list of estimated variance components, the column SD gives the estimated standard deviations of these estimates, based on their interpretation as linear combinations of the Chi square distributed, independent eigenvalue estimates. Normal confidence limits, based on these standard deviations, are only reliable for large residual degrees of freedom in the corresponding stratum and all higher strata. The estimate "Total variance" (the sum of the variance components, i.e.
the variance on a single observation) is similarly equipped with a standard deviation, computed in the same way. Missing minima of (product) factors, which can not be generated in a simple way, can be constructed by the command CONSTRUCTMINIMUM, see below. Estimation of fixed effects can be performed by the command ESTIMATE, see approximately 8 screenfuls above. LISTPARAMETERS, SAVEFITTED, SAVEPARAMETERS and SAVENORMEDRESIDUALS can not be used. PSEUDO STRATA. The formal requirement that the set of random factors should be closed under the formation of minima implies, e.g., that a model for a two-way table with random row and column effects can not be estimated by FITANOVA Y=1+[ROW+COL+ROW*COL] since the minimum 1 of ROW and COL is not among the random effects. You will have to use FITANOVA Y=1+[1+ROW+COL+ROW*COL] instead. However, this implies that the eigenvalue parameter for the "pseudo" stratum CONSTANT STRATUM is set to zero, because it can not be estimated. Accordingly, the variance on the grand mean is formally estimated as zero. However, the procedure ESTIMATE assumes that variance components for such strata should be set to zero, so the variance of the grand mean in this case (in output from an ESTIMATE 1 command) will be computed from the contributions to the variance from the three "non-pseudo" strata. This may result in a negative value for the estimated variance on the grand mean. More generally, variance components corresponding to strata with zero degrees of freedom for the residual will be skipped when contrast variance estimates are computed, and this may result in negative values of these estimates. The response variate must not contain missing values, and the factors involved must not have units on level zero. Restrictions are taken into account in the obvious way. However, the exclusion of units with missing or outlying responses will usually destroy orthogonality and balancedness. 
To obtain an approximate solution, replace (a few) missing values or outliers with suitable "typical values", e.g. group averages or predicted values from a linear model.

CONSTRAINTS. The maximal number of levels for a product factor is 2049. The maximal number of fixed terms is 40, the maximal number of random terms is 10, and no term may be a product of more than 8 factors.

--------------------------------------------------------------------------------
CONSTRUCTMINIMUM   Construction of the minimum of two (products of) factors.
................................................................................

Syntax:  CONSTRUCTMINIMUM prodfac1 prodfac2 minimum

CONSTRUCTION OF PSEUDO FACTORS. Sometimes the closedness-under-minima constraint in FITANOVA enforces the inclusion of fixed-effect factors which are not statistically meaningful. Such factors are called pseudo-factors. Very often such factors will not be given in the data set, so you will have to construct them.

The minimum of two given (products of) factors can be constructed by the command CONSTRUCTMINIMUM. The three parameters are strings. The first two must be names of existing factors or products of such, the last must be a valid identifier of a non-existing vector. The minimum of the factors given as parameters 1 and 2 is stored in a factor with the name given as parameter 3.

CONSTRAINTS. Since the result is stored as a single factor, the number of levels for the minimum must not exceed 255. Restrictions are NOT taken into account.

--------------------------------------------------------------------------------
BARTLETT   Bartlett's test for variance homogeneity.
................................................................................

Syntax:  BARTLETT variate factor

The variate and the factor must be of the same length. Bartlett's test for equality of the variances in the "variate"-samples in the one-way setup defined by "factor" is performed.
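The statistic behind such a test can be sketched in a few lines of Python. This is only an illustration of the textbook form of Bartlett's statistic with its correction factor, not ISUW's own code; the function name `bartlett_statistic` and the list-of-lists input are assumptions made for the example. Groups with fewer than two observations are skipped, and a group with zero variance is not handled.

```python
import math

def bartlett_statistic(groups):
    """Return (statistic, degrees_of_freedom) for Bartlett's test.

    `groups` is a list of lists of numbers (hypothetical input format).
    Groups with fewer than two observations are skipped.
    """
    usable = [g for g in groups if len(g) >= 2]
    k = len(usable)
    if k < 2:
        raise ValueError("need at least two groups with two or more observations")

    dfs = [len(g) - 1 for g in usable]
    variances = []
    for g in usable:
        mean = sum(g) / len(g)
        variances.append(sum((x - mean) ** 2 for x in g) / (len(g) - 1))

    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df

    # Uncorrected likelihood ratio statistic.
    raw = total_df * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    # Bartlett's correction factor.
    correction = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return raw / correction, k - 1


stat, df = bartlett_statistic([[1, 2, 3], [0, 10, 20], [5]])
print(round(stat, 3), df)
```

Under the hypothesis of equal variances the statistic is compared with a chi-square distribution with the returned number of degrees of freedom.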
Restrictions are taken into account, and empty groups or groups with a single observation are ignored. The variances in groups are listed, and Bartlett's test with bias correction is performed. If only two samples are present, also the relevant F-test on the proportion between the two variances is performed. Here, the right tail probability in the F-distribution is given for the ratio between largest and smallest variance. To check for constant variance across the groups determined by a factor in a linear model which is not just the one-way model determined by the factor, use BARTLETT with the variate of residuals as the first argument. This is a reasonable approximation when the number of observations is large compared to the number of parameters in the model. -------------------------------------------------------------------------------- CORMAT Writes a table of correlation coefficients for a set of variates. ................................................................................ Syntax: CORMAT variate1 variate2 [variate3 [...]] [stars] The variates must be of the same length. The procedure writes a table of correlations between the variates. The last (optional) parameter "stars" controls the printing of significance indicating stars. Without this parameter, no such printing takes place. If stars = * , a single star indicates significance on (two-sided) level 5%. If stars = ** , a double star in addition indicates significance on (two-sided) level 1%. For stars = *** , a triple star means significance on (two-sided) level 0.1%. To avoid line overflow, the number of decimals for the correlation coefficient becomes smaller when the number of stars is increased. The number of decimals after the point is 5 minus the number of stars. If you specify four or more stars, the number of decimals becomes 5 and no stars are printed, but a table of P-values is displayed instead. 
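The rule for stars and decimals can be written out as a small Python sketch. The function names are hypothetical, and the strict comparison `p < level` is an assumption; the manual only gives the levels 5%, 1% and 0.1%.

```python
LEVELS = (0.05, 0.01, 0.001)  # two-sided levels for *, ** and ***

def cormat_format(stars):
    """Return (decimals, prints_stars, prints_p_table) for a star count."""
    if stars < 0:
        raise ValueError("star count cannot be negative")
    if stars >= 4:
        # Four or more stars: full precision, no stars, P-value table instead.
        return 5, False, True
    return 5 - stars, stars > 0, False

def star_label(p_value, stars):
    """Return the stars printed next to a correlation with this P-value."""
    if stars >= 4:
        return ""
    return "*" * sum(p_value < level for level in LEVELS[:stars])


print(cormat_format(2))        # decimals and flags for stars = **
print(star_label(0.004, 2))    # significant at 5% and at 1%
```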
In this context "significance", of course, refers to the distribution of the empirical correlation coefficient in the normal case when the theoretical correlation is zero.

Restrictions are obeyed and missing values are treated as non-present. But for each pair of variates, only the missing values of the two variates involved are excluded, not the units corresponding to missing values for other variates in the list. Thus, if you want a proper (positive semidefinite) empirical correlation matrix, EXCLUDEMISSING for all variates involved must be used before CORMAT.

--------------------------------------------------------------------------------
WILCOXON   Wilcoxon (or Mann-Whitney) two sample test
................................................................................

Syntax:  WILCOXON variate factor level1 level2

Performs Wilcoxon's nonparametric rank sum test for comparison of two empirical distributions, with the normal approximation to the distribution of the test statistic. The two samples of observations are defined as the values of the variate corresponding to units given by the two factor levels of the factor. Only present values are taken into account, and missing values are treated as non-present. Ties are corrected for by simple averaging, and the two extreme values of the test statistic, consistent with the ties, are also computed.

--------------------------------------------------------------------------------
SPEARMAN   Spearman's rank correlation test for independence
................................................................................

Syntax:  SPEARMAN variate1 variate2

The two parameters must be names of variates of the same length, with at least two values present and no missing values present. Spearman's rank correlation (for the values present) is computed, and the approximate test for no dependence (assuming that sqrt(n-1)*RankCor is standard normal) is performed.
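As an illustration of this approximation, here is a minimal Python sketch for data without ties. The function name is hypothetical, the two-sided P-value is an assumption (the manual does not say which tail is reported), and tied values are rejected rather than handled.

```python
import math

def spearman_normal_test(x, y):
    """Return (rank_correlation, z, two_sided_p) for untied data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")

    def ranks(values):
        if len(set(values)) != len(values):
            raise ValueError("ties are not handled in this sketch")
        order = sorted(range(n), key=lambda i: values[i])
        result = [0] * n
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    rho = 1 - 6 * d2 / (n * (n * n - 1))   # valid when there are no ties
    z = math.sqrt(n - 1) * rho             # treated as standard normal
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability
    return rho, z, p


rho, z, p = spearman_normal_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(rho, z, round(p, 4))
```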
If ties are present, the result comes out as the average of the two extreme values obtainable by arbitrary ranking within tie groups (and these "extreme rank correlations" are also reported). As for WILCOXON (see above), this averaged test is conservative. Rejection of the hypothesis "no dependence" is reliable, but acceptance is, in principle, only possible when the two "extreme" tests agree on this.

--------------------------------------------------------------------------------
SHOW   Shows the contents of a file.
................................................................................

Syntax:  SHOW [filename, directory or path]

When the command is used without parameters, the session's output file is shown. Here, some color effects are used to make the file more readable. For this reason, the output file should in general be kept short, otherwise it will take a long time to form the image. Do not save LISTings of very long data sets, they are useless anyway. If you really want to save such listings (for example as input to other programs), redirect the output to another file by an OUTFILE command, see 4 screenfuls below. Also, if you write programs with loops, use ECHO 0 and OUTFILE 0 before the loops, otherwise the output file will be filled up with command echoes and output from each loop.

If, by mistake, you have created a very long output file, you can reset it by an OUTFILE * command (unless the file contains things that are important; in this case you will have to QUIT the session and start a new one, or copy and paste to an "emergency backup file" which you create by an EDIT command).

If the parameter is a valid directory name or a path, a file selection menu appears. For example,

    SHOW *.*            { or just SHOW . }

allows you to search among all files in the present directory (i.e. the working directory), whereas

    SHOW ..\proj2\*.JPG

can be used if you want to have a look at the JPG pictures on the sibling directory proj2.
If the parameter is the name of an existing file, this file is chosen. The action of the SHOW command depends on the file extension:

    .OUT : The file is shown as an ISUW output file (to view output files from earlier ISUW sessions).

    .JPG, .BMP : The picture is shown in ISUW's own picture viewer. Use a SHELL command if you want Windows' default application for these extensions.

    .SUD : Writes the list of vectors stored in the StatUnit data set, their types, lengths, number of levels for factors and the space they occupy in bytes. Vector names that coincide with names of existing vectors are marked with an asterisk.

    .ISU, .TXT, .DAT, .BAT, .CMD, .PS, .EPS, no extension : Shows the (plain text) file in ISUW's "read only editor".

When a text file is viewed, you can use standard Windows features to select a portion of the text and copy (Ctrl-C) it to the clipboard, from which you can paste it into e.g. the editor by Ctrl-V. Use Ctrl-F to search for a given text string. The search is forwards from the cursor's position, ignores case and multiple blanks and interprets newline symbols as blanks. Repeat the search by Ctrl-L. Use Ctrl-P to print the whole file or the selected portion.

SHOW commands are ignored in programs, except that the last SHOW command in a program is executed after the program has ended. More about this in the description of RUN.

To open a file by its default Windows application determined by the file extension, use the SHELL command.

--------------------------------------------------------------------------------
REMARK   Writes a message to the output file.
................................................................................

Syntax:  REMARK text

Writes "text" to the output file. If ECHO is off (see below) or an alternative output file has been selected by OUTFILE (see 2 screenfuls below), the text is simply written (leading blanks are ignored, multiple blanks are replaced with single blanks).
Avoid text exceeding 80 characters, the lines will not be broken. -------------------------------------------------------------------------------- QUIT Terminates the ISUW session. ................................................................................ No parameters. In interactive mode, the same happens if you press Escape from an empty command line, and then press Return. Before the session is terminated you are given the option to save the output file (temporarily created under the name ISUWOUT.TMP on the ISUW root directory) as a file with extension .OUT. These files are plain text files and can as such be imported to text handling programs, or viewed later from other ISUW sessions by the SHOW command. -------------------------------------------------------------------------------- ECHO Turns ISUW command and error message echo on/off ................................................................................ Syntax: ECHO [realexpression] The default action of ISUW is to echo all commands and error messages to the output file, which means that the output file will contain a complete log of the session. If you want to create a condensed output file, consisting only of selected output, use ECHO as the first command. When ECHO is used without parameters, it simply toggles ECHO on or off. With a real number or expression as the parameter, a positive value means that ECHO is set on, whereas zero or a negative value sets it off. Typically, you will use the parameters 1 for on, 0 for off. Another way of creating a condensed output file without command echoes, error messages etc., is to select an alternative output file by OUTFILE, see below. ECHO 0 should be used in most cases before loops in a program. Without this, the sessions output file will very quickly be filled up with echoes of the commands in the loop. 
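The on/off rule for ECHO is simple enough to state as a short Python sketch. The function name `next_echo_state` is hypothetical; the logic follows the description above (no parameter toggles, a positive value switches echo on, zero or a negative value switches it off).

```python
def next_echo_state(current, parameter=None):
    """Return the new echo state after an ECHO command.

    `current` is the present state (True = on); `parameter` is the value
    of the optional real expression, or None if ECHO had no parameter.
    """
    if parameter is None:
        return not current      # plain ECHO toggles
    return parameter > 0        # positive: on; zero or negative: off


state = True                             # echo is on by default
state = next_echo_state(state, 0)        # ECHO 0  -> off
state = next_echo_state(state)           # ECHO    -> toggled back on
print(state)
```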
--------------------------------------------------------------------------------
OUTFILE   For redirection of StatUnit output to a file other than the session's output file.
................................................................................

Syntax:  OUTFILE [filename]

EXAMPLE. To write the output from a LIST command on a file LIST.TXT (which is the simplest way of exporting data to another program or statistics package), write

    OUTFILE LIST.TXT
    LIST ...
    OUTFILE
    % and just to check that everything is OK
    SHOW LIST.TXT

OUTFILE without parameters (the third command in the program) redirects output to the primary output file. If this is already done, nothing happens. Do not write the name of the session's temporary output file explicitly; this will overwrite the file instead of appending to it. Quite generally, if the parameter is the name of an existing file, that file will be overwritten without warning.

When output is redirected in interactive mode, the result of an output generating command is previewed as usual, and the way you leave the read-only-editor (by Return or Escape) determines whether output is saved or suppressed. The only difference is that it is written to another file. Command echoes, error messages etc. are still written to the session's original output file.

You can use the file name 0 or NUL to suppress output entirely. In interactive mode, output will still be shown in the preview window, but regardless of whether you leave this window by Return or Escape the output will be lost. The command OUTFILE 0 should be used before loops in a program if the loop contains output generating commands - unless you really want to see the output from all the loops.

A special emergency version of the OUTFILE command has the form

    OUTFILE *

The effect of this is that the session's output file is "reset". All lines (except the first four) are deleted.
You can use this to get rid of all output produced until now, for example if you have created a very long and unhandy output file by LISTing a long data set or forgetting to turn command echo off before the execution of a program with loops. But notice that all output produced earlier in the session is lost. If there is something you want to keep, you will have to save it on another file by copy and paste before you use the OUTFILE * command. Notice also that this command acts on the sessions "official" output file, not on a file to which you have redirected output by an earlier OUTFILE command. Such files can be reset simply by reopening them. -------------------------------------------------------------------------------- EDIT Edits ISUW programs and other plain text files. ................................................................................ Syntax: EDIT [filename] If the command is given without parameters or with a path or file mask as its parameter, a file selection menu appears. For ISUW programs (see the RUN command below), the file extension .ISU can be omitted. To edit a file without extension, add a period as the last character of the file name. When you edit a file you can - press Ctrl-Return to truncate the line from the cursor's position. - press F1 to get help. - press F2 to save changes. - press F10 to display the sessions output file. - search for a word or phrase by Ctrl-F. - repeat last search by Ctrl-L. - press Escape to leave the editor. When you edit an ISUW program you can, in addition, - press Ctrl-F1 to get help on the command in the present line. - press F9 to execute the program without leaving the editor. If an error occurs, the corresponding line of the program will be marked. Use a SHOW command as the last command of the program if you want to have a look at the result right after execution. If a portion of the text is marked as a block, only the block is executed. 
- use shortkeys defined by the KEYS command (with the obvious exceptions Ctrl-F, Ctrl-L, F2, F9 and F10). They will act roughly as they do from the command line, except that a final exclamation sign is replaced with a line shift, and the text of the present line will only be substituted for question marks in the KEYS string if it is marked (highlighted). However, the marked portion of the text must be contained in a single line, otherwise nothing happens. In addition to this, the editor has the following standard features of Windows editors. Mark (highlight) a block by the cursor arrows with the Shift key down, or use the mouse. To mark the entire text as a block, press Ctrl-A. When a block has been marked you can do the following: - Delete it by the Delete key (or overwrite it by any character key). - Delete it and copy its contents to a hidden "clipboard" by Ctrl-X. (these two operations can be regretted by Ctrl-Z) - Copy its contents (without deleting it) to the clipboard by Ctrl-C. Later, you can (optionally while editing another file, perhaps even in another editor) - Paste the contents of the clipboard into the text at the cursors position (or to replace marked text) by Ctrl-V. EDIT commands in programs are ignored. -------------------------------------------------------------------------------- RUN Executes a program. ................................................................................ Syntax: RUN [filename [par1 [par2 [...]]]] In many situations it is convenient to write a sequence of commands line by line on a text (ASCII) file, for later error correction, modification and reuse. Such a program file - call it PROGRAM.ISU - can be created and executed by EDIT PROGRAM RUN PROGRAM It is also possible to execute a program directly from the editor, see the description of EDIT right above. Notice that the extension .ISU - which is required - is not written as part of the file name in the EDIT and RUN commands. 
If the file name is omitted or replaced with an asterisk, a file selection menu (path *.ISU) appears. The optional additional parameters par1, par2, ... are for parameter substitution in the program, see the description of SUBSTITUTE approximately 4 screenfuls below.

The file must contain one command per line, except that a backslash \ at the end of a line means "append next line". The backslash can break the command at any point, also in the middle of a connected word. This means that a blank is not inserted automatically - don't forget to insert a blank before the backslash or at the beginning of the next line, if there should be one.

The program executes roughly as if you were typing the file line by line from the keyboard, unless GOTO commands are used to break the order. However, the commands are not stored in the upper "reuse" window of ISUW's front window, and all output is written directly to the session's output file without any previewing. Thus, if you want output suppressed for certain commands, you must surround these commands by OUTFILE 0 ... OUTFILE commands.

Commands can be truncated as long as they remain unique. However, command field shortkeys (like A for INCLUDEALL) cannot be used. COMPUTE commands can be written directly without the command name and without a leading blank; they are recognized by the appearance of an equality sign right after the first connected word. However, COMPUTE commands without a right hand side must start with = or + (or COMPUTE).

Graphics output produced by PLOT and HISTOGRAM commands is previewed as in interactive mode. However, an exception from this occurs when a PostScript file is open. In this case, the images are removed from the screen immediately, implying that you do not have to press Escape each time a picture is formed and sent to the PostScript file. The 3-d versions of PLOT and HISTOGRAM cannot be used in programs.

If an error occurs during the execution of a program, it will abort with an error message.
The message on the screen contains information about which line the error occurred in and which loop (if not the first). The number of loops, in this context, is the number of times that the program file was opened and read from the beginning, see the GOTO command 7 screenfuls below. Edit the program to correct the error, remove vectors that were imported to or created by the program before the error interrupt, and RUN it again.

The Escape key can be used to interrupt a program if it takes more time than expected or is stuck in an infinite loop. Escape will force the program to exit after execution of the command presently handled, just as if an error had occurred in that command.

Notice that vectors that are (explicitly or implicitly) declared in the program or imported to the program by GETDATA will be present when the program terminates (except for "$-vectors", see below). This means that you will typically run into name coincidence conflicts if you forget to delete these vectors before you run it again. For larger programs, it is usually a good idea to write the program such that it imports and creates its own data, and let the first line of the program be a DELETE command without parameters.

A useful convention for "local variables" is this: If the name of a variate or factor begins with a dollar sign, it is deleted at exit from a program, also if the exit is due to an error or a keyboard interrupt. Notice that ALL vectors beginning with a $ are deleted, also those that were present when the program started. Thus, it is good advice not to use $-names for anything else.

Lines can be indented as desired, and empty lines can be inserted, to make the program more readable. Comments can be inserted in two ways:

(1) Lines starting with a percentage sign '%' (optionally after some blanks) are ignored.

(2) Text in curly braces {} within a line is ignored.

Such comments are not echoed to the output file (use REMARK for this).

Nested programs are not allowed, i.e.
RUN commands must not occur in programs. Also, OPENCOMMANDFILE and the 3-D versions of PLOT and HISTOGRAM are forbidden. EDIT and SHELL commands in programs are ignored.

SHOW commands have a special role in programs. Or rather, the last SHOW command in a program has a special role, since this is the only one that is executed, and this is done AFTER the program has executed. For this reason, it is usually most convenient to place the SHOW command as the last command of the program.

A construction which is particularly useful when a large program is tested from the editor is the following. Let the program start with a command like

    OUTFILE tmp

and end with

    SHOW tmp

This implies that all output is written to a temporary file (here called tmp), which is overwritten each time the program runs. As long as there are errors in the program you will stay in the editor, but as soon as the program executes without errors, its output will be shown immediately.

--------------------------------------------------------------------------------
SUBSTITUTE   Parameter substitution in programs.
................................................................................

Syntax:  SUBSTITUTE string1 [string2 [...]]

where string1, ... are connected strings (i.e. without blanks).

This command, which makes sense only in programs, represents the closest ISUW comes to a macro or procedure facility. The action of a SUBSTITUTE command is best explained by a small example: Suppose that a program LOGIT1.ISU, to be executed by a RUN command, consists of the two lines

    SUBSTITUTE model
    FITNONLINEAR model exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)

Then the effect of writing e.g.

    RUN LOGIT1 freq=1+group+sex+age/count

is that the (long and tedious) command

    FITNONLINEAR freq=1+group+sex+age/count \
    exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)

is executed - which actually means that a logistic regression model with overdispersion is analysed.
The pseudo-parameter "model" is simply replaced with the first connected string after the file name in the RUN command.

The general idea is that the SUBSTITUTE command specifies a list of strings separated by blanks. Whenever one of these strings occurs later in the same program, it is replaced with the corresponding element of the list of parameters in the RUN command that called the program. For this reason, the one and only SUBSTITUTE command in a program should quite generally be placed at the beginning, and certainly not in a loop.

It is also possible to give the replacement strings directly in the SUBSTITUTE command. This is particularly useful when you are writing and testing a program in the editor. The syntax for this is, for the program LOGIT1.ISU above, to write

    SUBSTITUTE model=freq=1+group+sex+age/count
    FITNONLINEAR model exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)

This program will RUN without additional parameters. The word "model" will be replaced with "freq=1+group+sex+age/count" whenever it occurs in the lines following the SUBSTITUTE command. Even if the RUN command specifies an additional parameter, this will be overwritten by the specification following the first equality sign in the SUBSTITUTE command. The general rule is that if a parameter in the SUBSTITUTE command contains at least one equality sign, everything before the first equality sign becomes the string to be substituted, and everything after it becomes the string that will replace it. An obvious consequence of this is that the words to be substituted in a program must not contain equality signs.

The substitution works by simple case-sensitive comparison of substrings. There are some obvious problems with this. Things can break down if one string is a substring of another, or a substring of a string somewhere in the program, or of a string which is already substituted for another string.
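The substitution rules and the substring problem can be imitated in a few lines of Python. This is a sketch of the behaviour as described here, not ISUW's implementation: the function names are hypothetical, the error raised for a missing RUN parameter is an assumption, and so is the order in which the replacements are applied (one after the other, in the order of the SUBSTITUTE parameters).

```python
def build_substitutions(substitute_params, run_params):
    """Pair each SUBSTITUTE parameter with its replacement string."""
    pairs = []
    for position, param in enumerate(substitute_params):
        if "=" in param:
            # Everything before the first '=' is the name, the rest the value;
            # this overrides any RUN parameter in the same position.
            name, value = param.split("=", 1)
        elif position < len(run_params):
            name, value = param, run_params[position]
        else:
            raise ValueError("no replacement given for " + param)
        pairs.append((name, value))
    return pairs

def substitute_line(line, pairs):
    """Apply plain, case-sensitive substring replacement to one line."""
    for name, value in pairs:
        line = line.replace(name, value)
    return line


pairs = build_substitutions(["model"], ["freq=1+group+sex+age/count"])
print(substitute_line("FITNONLINEAR model .*(1-.)", pairs))

# The pitfall: a short name also matches inside command names.
print(substitute_line("PLOT X P", [("P", "0.4")]))
```

The second call turns the command name PLOT into "0.4LOT", which is exactly the kind of breakdown described above.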
An easy way of avoiding such problems is to let all strings in the SUBSTITUTE command begin and end with special characters. A simple and useful convention is to let substitute strings be names in brackets. Since brackets are not used for much else in ISUW, this is rather safe.

EXAMPLES. The following program BINSIM.ISU can be used for simulation of binomial observations. When called in the form, say,

    RUN BINSIM Y 0.4 30

it will fill the existing variate Y with simulated observations from a binomial distribution with probability parameter 0.4 and binomial total (index) 30.

BINSIM.ISU:

    ECHO 0                     { to avoid echoes of all the loops }
    SUBST [VAR] [P] [N]
    VAR $BIN [N]
    $I=0                       {since $I is undefined it becomes a variate of length 1}
    %LABEL
    $I=$I+1
    $BIN=(RANDOM<[P])          { $BIN is filled with Bernoulli variables }
    [VAR]($I)=SUM($BIN)        { ...and the $I'th entry of [VAR] becomes }
                               { the sum of these }
    GOTO %LABEL $I<##([VAR])   { GOTO is described below }
    ECHO 1

Notice that the two auxiliary variates $BIN and $I are deleted automatically at exit, because they have a dollar sign as the first character of their names.

The following program OUTLIERS.ISU, when called in the form, say,

    RUN OUTLIERS Y 1+SEX+SEX*AGE 0.05

performs some outlier detection in connection with the linear regression model specified by the first two parameters (here 'Y=1+SEX+SEX*AGE'). A plot of fitted values against studentized residuals (see the description of SAVENORMEDRESIDUALS), with a color marking of outliers, is produced, and the outliers, if any, are listed. Here, an outlier is defined conservatively in such a way that the probability of finding a positive number of outliers in a correct model will not exceed the number specified as the third parameter, here 0.05. The program is complicated because the listing of outliers is performed under restrictions, and the original restrictions, if any, must be reestablished. It is also complicated because we want a special action to be taken in the case where no outliers are detected.
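The conservative outlier rule can be illustrated outside ISUW. The Python sketch below assumes a Bonferroni-type criterion, i.e. a unit is flagged when its P-value is below alpha divided by the number of units present; this reading is an assumption based on the expression `$p<[alpha]/#([y])` in the program, and the function name and the use of `None` for missing values are hypothetical.

```python
def flag_outliers(p_values, alpha):
    """Return a list of booleans, True where a unit is flagged as an outlier.

    `p_values` holds one P-value per unit, with None for units that are
    missing or excluded (assumed convention for this sketch).
    """
    present = [p for p in p_values if p is not None]
    if not present:
        return [False] * len(p_values)
    threshold = alpha / len(present)   # Bonferroni-type bound
    return [p is not None and p < threshold for p in p_values]


# Three units present, so the threshold is 0.05/3.
print(flag_outliers([0.5, 0.001, None, 0.02], 0.05))
```

With this threshold the probability of flagging at least one unit in a correct model is at most alpha, whatever the dependence between the P-values.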
The contents of OUTLIERS.ISU:

    SUBST [y] [model] [alpha]
    FITLIN [y]=[model]
    SAVEFIT $fitted
    SAVENORMED $nres $p
    FAC $sign $pres ##([y]) 1
    VAR $unit ##([y])
    $pres=1                      { $pres becomes 1 for units present, 0 otherwise }
    $unit=#
    $sign=($p<[alpha]/#([y]))    { $sign becomes 'outlier-indicator' }
    XTEXT Fitted values
    YTEXT Studentized residuals
    PLOT $fitted $nres $sign=7,12 =*
    % Reestablishing default labels for plots:
    XTEXT
    YTEXT
    % Restrict such that only the outliers are present:
    FOCUS $sign 1
    GOTO %no outliers #($sign)=0
    REMARK Strictly significant outliers (alpha=[alpha]).
    LIST $unit:5:0 [y] $fitted $nres $p
    GOTO %continue 1
    %no outliers
    REMARK No outliers detected (alpha=[alpha]).
    %continue
    % Reestablishing initial restrictions:
    INCLUDEALL
    FOCUS $pres 1
    SHOW

When a program with a SUBSTITUTE command is executed, the echo of command lines after the SUBSTITUTE command will appear as they are after the substitution.

--------------------------------------------------------------------------------
GOTO   Controls conditional jumps to labels in a program.
................................................................................

Syntax:  GOTO label realexpression

The first parameter "label" must be a text string holding the full contents of a line somewhere else in the program. It is preferable to use comment lines as labels (either "echoed" comments REMARK ... or "non-echoed" comments % ...). Comments in curly braces cannot be used because they would be removed from the GOTO command before its execution.

EXAMPLE. A program of the form

    ...
    % Loop starts here
    ...
    GOTO % Loop starts here 1
    ...

will create an infinite loop which - unless the program creates some overflow error - can only be broken by the Escape key. Notice the last parameter 1, which (as would any other positive constant) implies that the GOTO statement is actually executed.
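The jump rule that this section describes (the jump is taken only for a positive last parameter; the label is then searched for from the line after the GOTO, wrapping once to the top of the file; execution resumes after the matching line; blanks are normalised but case is not) can be imitated in Python. This is only a sketch with hypothetical names, working on a list of program lines rather than on a file.

```python
def normalize(line):
    """Drop leading/trailing blanks and collapse multiple blanks."""
    return " ".join(line.split())

def goto_target(lines, goto_index, label, condition_value):
    """Return the index of the next line to execute, or None.

    None means that the label was not found and the program ends.
    An index equal to len(lines) means "past the last line".
    """
    if condition_value <= 0:
        return goto_index + 1              # jump not taken
    wanted = normalize(label)
    n = len(lines)
    # Read on from the line after the GOTO; at the end of the file,
    # start once more from the top.
    for step in range(1, n + 1):
        i = (goto_index + step) % n
        if normalize(lines[i]) == wanted:  # case-sensitive comparison
            return i + 1                   # resume AFTER the label line
    return None


program = ["I=0", "%LOOPSTART", "I=I+1", "GOTO %LOOPSTART (I<10)", "ECHO 1"]
print(goto_target(program, 3, "%LOOPSTART", 1))   # jump taken
print(goto_target(program, 3, "%LOOPSTART", 0))   # jump not taken
```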
In more relevant constructions, the last parameter is a real expression, and the GOTO statement is only executed if this expression returns a positive value. With the usual translation of reals to booleans, you may think of the statement as having an invisible IF before the last connected string.

When a GOTO command with a positive value of its last parameter is executed, the following lines of the program are read and skipped until a line consisting of the text "label" is found. If the program file is read through, it is reopened and read once more from the beginning. If a line with the correct text is found, execution of the program is taken up again right after (notice, AFTER) that line, otherwise the program is left regularly (i.e. without any warning or error message). This implies that a construction like

    ...
    GOTO %EndOfProgram n=100
    ...
    %EndOfProgram

will work as intended, even if the last line is forgotten or misspelled (provided that no other line with exactly this content is found).

The line to be searched for must match "label" literally, also as regards upper/lower case of letters. But leading, trailing and multiple blanks are ignored in the comparison.

The second parameter "realexpression" - in this case defined as the last connected word of the command - must be a valid right hand side of a COMPUTE statement with a variate of length 1 on its left hand side.

WARNING. Programs with loops that are executed many times should in general have the command ECHO 0 somewhere before (or in) the loop. If the loop contains output producing commands, the command OUTFILE 0 should also be found somewhere before (or in) the loop. Otherwise, all commands (and their output) will be appended to the output file each time, and this takes an unnecessarily long time and fills up the output file with useless garbage.

EXAMPLES. The following program gives a variate I of length 1 the values 1, 2, ..., 10.
I=0
ECHO 0                 { commands in the loop not echoed }
%LOOPSTART
I=I+1
COMPUTE I              { write I to the session's output file }
GOTO %LOOPSTART (I<10)
ECHO 1                 { reestablish command echo }
SHOW                   { to see what came out of this interesting program }

A more interesting example is this. Suppose that X and Y are variates of
length 365. Think of them as time series sampled over a period of 365 days.
The following program fits the linear regression model "Y=1+X" 346 times on
data from the 20-day period up to and including day I, where I takes all
the possible values 20, 21, ..., 365. The slope estimates are stored in the
variate BETA, and the final plot shows the development of this locally
estimated regression coefficient over time. Similar programs could be used
for kernel smoothing by local polynomial regression, optionally with other
kernels than the rectangular (using weights rather than restrictions).

VAR DAY BETA 365
DAY=#
I=19
OUTFILE 0              { To avoid 346 ANOVA tables }
%loopstart
I=I+1
EXCLUDE DAY>I DAY<=I-20
FITLIN Y=1+X
INCLUDEALL
SAVEPAR PARS
BETA(I)=PARS(2)
ECHO 0                 { To avoid 345 additional echoes of the loop }
GOTO %loopstart I<365
OUTFILE
ECHO 1
PLOT DAY BETA =7 =L

Another useful example follows here. It is well known that even experienced
statisticians tend to be overcritical when normality assumptions are
checked graphically. The only really valid way of doing it is by comparison
of the histogram (or the probit diagram) with a set of similar plots for
data sets of the same length, where the normality assumption holds. The
following program NORMHIST, when called in the form, say,

RUN NORMHIST 20 100 -4.0,8,4.0

will display 20 histograms of 100 pseudo-normal variables with the vertical
axis from -4.0 to 4.0 divided into 8 intervals.

NORMHIST.ISU:

ECHO 0
SUBST [Hist] [Obs] [Int]
VAR $U [Obs]
VAR $I 1
$I=0
FRAMETEXT Histogram for [Obs] normal observations
XTEXT |
%Label
$I=$I+1
$U=normal
$U=$U-MEAN($U)
$U=$U/SQRT(VARIANCE($U))
HIST $U=[Int]
GOTO %Label $I<[Hist]
FRAMETEXT
XTEXT
ECHO 1

A final example. In the description of SAVEDATA it was explained how this
command can be used to create sub-datasets where excluded units are
"physically deleted". This is the easiest way of doing it, but just for
illustration, we show here how it can be done in a more direct way. The
following program EXTRACT.ISU, when called in the form

RUN EXTRACT COUNTER X

constructs a variate COUNTER, which contains the indices for the units
present in vectors of the same length as X. Thus, the length of COUNTER
becomes the number of units present in X. After this, you can construct a
"short" version of X (or any vector of the same length) by

INCLUDEALL
COMPUTE SHORT_X=X(COUNTER)

EXTRACT.ISU:

subst [c] [x]
var [c] #([x])
var $present $unit ##([x])
$present=1              { $present becomes "restrict indicator" }
includeall
$unit=$present
exclude 1
$unit=$unit+$unit(#-1)  { $unit becomes cumulated restrict indicator }
includeall
$i=0                    { running unit number }
%loopstart
$i=$i+1
goto %skip $present($i)=0
[c]($unit($i))=$i
%skip
echo 0                  { loop echoed only first time }
goto %loopstart $i<##([x])
echo 1
exclude $present=0      { reestablish restrictions }

--------------------------------------------------------------------------------
OPENCOMMANDFILE    Enables you to import commands from a text file.
................................................................................
Syntax: OPENCOMMANDFILE [filename]

An ISUW command file is roughly the same as an ISUW program. The file must
have extension .ISU, and in the command syntax it must be written without
extension. If the command is used when a command file is already open, the
present command file is closed and the new one is opened.
In particular, if the command is used without parameters, the present
command file - if any - is closed. A blue command field indicates that a
command file is open. Escape from a blue command field generates an
OPENCOMMANDFILE command.

The effect of this command is that the lines of the file specified can be
imported one by one to the command field. The next line of the file is
imported whenever the CursorDown arrow is pressed from an empty command
field. Thus, it is essentially nothing more than an editor device
associated with the command field. As long as you don't press the
CursorDown arrow, everything works as usual in interactive mode.

Conversely, if you import commands one by one and press Return as soon as a
command has been imported, this is just a slow way of executing an ISUW
program. You could obtain almost the same by a RUN command. However, there
are some important differences. Output is previewed as usual, and you can
append it to the output file by Return or suppress it by Escape as usual in
interactive mode. The commands that are forbidden in programs executed by
RUN (RUN, EDIT, SHELL and the 3d versions of PLOT and HISTOGRAM) can be
used when a program is executed in this way. Conversely, GOTO and
SUBSTITUTE cannot be used.

The advantage of executing a sequence of commands in this way is that you
can skip commands as desired, insert other commands and make changes to the
commands imported from the file before you execute them. If an error occurs
you can usually correct it and continue - the command file is still open
for import to the command field.

The rules for breaking of long lines and truncation of commands are as
described for RUN. Comments in the form of lines starting with a percentage
sign are skipped, but comments in curled parentheses within a line cannot
be used (unless they are removed manually before Return is pressed).
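
The line-by-line import described above can be pictured with a small Python
sketch. It is only a model and not part of ISUW. The name importable_lines
is invented for the illustration, each call of next() stands for one press
of CursorDown from an empty command field, and the treatment of blanks in
front of the percentage sign is an assumption.

```python
def importable_lines(lines):
    """Yield, one at a time, the lines of a command file that would be
    offered to the command field. Lines starting with a percentage sign
    are comments and are skipped.
    Assumption: blanks in front of the percentage sign are ignored."""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.lstrip().startswith("%"):
            continue
        yield line


command_file = [
    "% Fit the model",
    "FITLIN Y=1+X",
    "% Look at the output",
    "SHOW",
]
pending = importable_lines(command_file)
first = next(pending)   # first press of CursorDown
second = next(pending)  # second press of CursorDown
```

Nothing is executed by the import itself. In ISUW the imported line can
still be edited, skipped or replaced before Return is pressed.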

The demonstration programs in the directory DEMOS under the ISUW root
directory are executed in this way, with the additional feature that the
%-comments before each command are displayed in a separate window. You can
actually write your own demonstration programs for special purposes and
place them in the directory DEMOS.

--------------------------------------------------------------------------------
SHELL              Starts another Windows program, or opens a file by its
                   default application.
................................................................................
Syntax: SHELL [command and/or filename]

EXAMPLES. To edit a program PRG.ISU with NotePad, if you prefer this to
ISUW's own EDIT command, write

SHELL NOTEPAD PRG.ISU

To view a PostScript file page1.ps (created e.g. by OPENPS etc.) and
optionally print it, simply use

SHELL page1.ps

provided that GhostScript or a similar device is set up as the default
handler of .PS files on your computer.

To invoke Windows Explorer, use the SHELL command without parameters or
with the name of a directory as the parameter. Be careful, there are many
things you can do to generate an immediate uncontrolled crash, like
deleting the file ISUWOUT.TMP on the ISUW root directory, or starting a
second ISUW session under the same ISUW root. If this happens, it may be
necessary to perform an emergency closedown (press Ctrl-Alt-Del and use the
Windows Task Manager).

SHELL commands in programs are ignored.

--------------------------------------------------------------------------------
KEYS               Programming the keyboard.
................................................................................
Syntax: KEYS [keyname [string]]

EXAMPLE. After the command

KEY AR RAMSTATUS!

the key combination Alt-R gives the same result as you would obtain by
writing RAMSTATUS in the command field and pressing Return.

Programmable keys are

F2..F12          (F2..F12)
Alt-F2..F12      (AF2..AF12)
Ctrl-F2..F12     (CF2..CF12)
Shift-F2..F12    (SF2..SF12)
Alt-A..Z         (AA..AZ)
Alt-0..9         (A0..A9)
Ctrl-A..Z        (CA..CZ)     except... (see below)
Ctrl-0..9        (C0..C9)

where the exceptions of the type Ctrl-letter correspond to the letters C,
X, V, A and Z, which are reserved for standard Windows editing purposes
(copy, cut, paste, select all, undo). The lists in parentheses show the way
keys are referred to in the first parameter keyname. For example,

KEY AF4 DELETE!

will imply that Alt-F4 performs a DELETE command without parameters (thus
overwriting the default Windows action of Alt-F4, which is to terminate the
ISUW session).

Without the final '!', a programmed key simply inserts "string" in the
command line at the cursor's position, and no execution takes place before
you press Return. If a portion of the text in the command window is
selected (marked), this text is replaced with "string".

EXAMPLE. Suppose you have just executed a READ command like

READ UNIT SEX=Unknown,Male,Female HEIGHT GROUP

Since ISUW has no other facility for saving names of factor levels, you
might want to save the string 'SEX=Unknown,Male,Female' for later use. To
get this string written whenever Alt-6 is pressed, write (reusing some of
the previous command line)

KEY a6 SEX=Unknown,Male,Female

The effect of an exclamation sign - which must be the last character of the
string parameter - is that the command is executed immediately. In this
case, the string must be a command, possibly with some parameter
substituted with a question mark, see below. Think of the '!' as a code for
the Return key. If the string is a COMPUTE command, the command name cannot
be omitted. In this case the text in the command window is overwritten,
unless the string defined by KEYS has a question mark somewhere.

If the KEYS string contains a question mark "?" somewhere, this is
substituted by the presently selected text in the command window.
However, if the string is a command (that is, ends with an exclamation
sign) the rule is different. In this case the entire contents of the
command window, with the exception of the leading blank (which has to be
there), will be inserted at the question mark's place.

EXAMPLES. After

KEY AP (?)

you can put a pair of parentheses around the cursor or the selected text by
Alt-P. After

KEY AE SHELL NOTEPAD ?.ISU!

you will be able to edit a program with NotePad simply by writing its name
without extension (after a blank) in the command line and then pressing
Alt-E. After

KEY AV COMPUTE Variance(?)!

you will be able to display the variance of a variate by writing its name
(after a blank) and pressing Alt-V. Whereas the command

KEYS AL FITNONLINEAR ? exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)!

will enable you to fit a logistic regression with overdispersion just by
writing the model formula (after a blank) in the command field and then
pressing Alt-L.

To get a list of presently active shortkeys write KEYS (without
parameters).

Your present shortkey definitions are automatically saved on exit and
recovered next time you start an ISUW session under the same root
directory. You can, however,

 - add the startup shortkey configuration (if you have destroyed it) by
       KEY +
 - delete a single shortkey function by programming it as an empty string,
       KEY keyname
 - delete all shortkey functions by
       KEY -

Notice that if you use the last command and then terminate the session, the
startup configuration is lost. If you have some shortkey definitions that
you want to keep safely, write a program KEYS.ISU like the following.

KEY -
KEY F10 SHOW!                  { the output file is displayed by F10. A     }
                               { natural choice, since this is what happens }
                               { when you press F10 from the editor.        }
KEY AA EDIT AUTOEXEC.ISU!      { edit autoexec program by Alt-A             }
KEY AR RAMSTATUS!              { a RAMSTATUS is shown when Alt-R is pressed }
KEY AD COMP sqrt(Variance(?))! { display s.d. of a variate by               }
                               { writing its name and pressing Alt-D        }
KEY AX QUIT!                   { exit with Alt-X                            }
KEY AP (?)                     { surround selected text with parentheses by }
                               { Alt-P                                      }
KEY F2 SHELL ?.PS!             { open postscript file by Windows default    }
                               { device by writing its name without         }
                               { extension and pressing F2                  }
KEY AN SHELL ezlearn.cbs.dk/stat/hamat-2/tt/!  { open ISUW site             }
                               { by Alt-N. If you download ISUWINST.EXE,    }
                               { don't execute it while the session is open.}
KEY AT SHELL C:\EXE\WINT.EXE!  { open WinT - Desk Calculator                }
                               { with Statistical Tables - by Alt-T         }
KEY AK KEYS!                   { display present shortkeys by Alt-K         }
KEY CK EDIT C:\ISUW\KEYS!      { edit (and execute by F9, if desired)       }
                               { this file by Ctrl-K                        }
...

Save this program on a suitable working directory or (as an exclusive
exception) on the ISUW root directory. Local shortkeys - i.e. shortkeys you
want to be active only on a particular working directory - can conveniently
be placed as KEY commands in an AUTOEXEC.ISU program on that directory.

********************************************************************************

ISUW - the Windows version of Interactive StatUnit ISU - is a Delphi 5
application, based on a collection of Turbo Pascal (later Borland Pascal)
units for statistical analysis, developed from around 1990.

ISUW is public domain software. Accordingly, I take no responsibility for
errors in the program; but I would certainly like to hear about them, and
correct them if I can.

The latest version of ISUW can be downloaded from my download page

ezlearn.cbs.dk/stat/hamat-2/tt/

where other useful stuff can also be found.

Tue Tjur                                   e-mail tuetjur@cbs.dk
Copenhagen Business School
The Statistics Group
Solbjerg Plads 3
DK-2000 Frederiksberg
Denmark

@@@ Copyright Tue Tjur 2006 @@@

======================================================== End of ISUWHELP.TXT ===