ISUW, version JUN2009

    ========================================================================
           This is ISUWHELP.TXT, the complete ISUW "on-line" manual.
    ========================================================================

--------------------------------------------------------------------------------

    HELP ON HELP

    F1 (from the command field or the program editor) displays the help
    pages.

    F1 once more (from the help pages) displays a list of ISU commands.
    Select help on a specific command by the cursor up/down arrows and
    Return.

    Ctrl-F1 from the command field displays help on the present command.

    Shift-F1 disables/enables "hint mode" where hints on active keys are
    displayed in a small pop-up window at the mouse cursor.

    In the help pages, search for a word or phrase by Ctrl-F. Repeat the
    search by Return or Ctrl-L. The search is forwards from the cursor's
    position, ignores case, interprets new line as blank and ignores
    multiple blanks.
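
    The search rules just described can be illustrated outside ISUW. The
    following Python lines are only a sketch of the stated rules, not
    ISUW's own implementation; the function name find_forward is made up
    for the illustration.

```python
import re

def find_forward(page, phrase, cursor=0):
    """Return the offset of the first match at or after `cursor`, or -1.

    Case is ignored, and any run of blanks or newlines in the page
    matches any run of blanks in the phrase.
    """
    words = phrase.split()
    if not words:
        return -1
    # Join the words with "one or more whitespace characters".
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words),
                         re.IGNORECASE)
    match = pattern.search(page, cursor)
    return match.start() if match else -1

page = "Repeat the\n    search by Return"
position = find_forward(page, "the  SEARCH")
```

    Here the phrase is found although it is split over two lines in the
    page and written with different case and spacing.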

    If you want to have a hardcopy of the "on-line-manual", print all
    pages by Ctrl-P (88 pages of 72 lines). But wait, it is much easier to
    search for commands and phrases "on-line"; and later, perhaps, print
    out selected sections (Select by Shift and the cursor arrows, then
    press Ctrl-P). We suggest that you start softly with a printout of the
    article "Introduction to ISUW".

--------------------------------------------------------------------------------

    HOW TO RUN ISUW INTERACTIVELY - a brief summary.

    Notes on installation and directory structure are given in a later
    section approximately 18 screenfuls below (search for "install").

    ISUW is essentially mouse free. You can use the mouse to resize or
    move the ISUW window, and it works as usual when you are editing or
    selecting from a menu. But a general principle behind the design of
    ISUW is that there should not be more on the screen than necessary at
    any time. The buttons that control ISUW are not placed on the screen;
    they can be found on the keyboard where they are a lot easier to hit.

    No rule without exceptions, and a useful exception is this. When you
    place the mouse cursor in a window on the screen, a small message with
    summary hints (in particular concerning the keys you can use) will pop
    up. At some point, this "hint mode" becomes more irritating than
    useful. Shift-F1 can be used to switch it off and on.

    In the entry dialog for selection of working directory, move around in
    the directory tree by the cursor arrows. The Up/Down arrows work in
    the obvious way, the Left/Right arrows shift between "show siblings"
    and "show children" (and the space bar can be used for both). Press
    Return when the desired working directory is highlighted. If you want
    to create a new working directory, press Escape instead of Return,
    then edit the selected directory name and press Return. Notice that
    the working directory must be a subdirectory of the ISUW root
    directory (probably C:\ISUW).

    An easy way of getting started is to select DEMOS as your working
    directory and run some of the demonstration programs placed there.

    To select the same working directory as last time - which is what you
    do most of the time - this dialog can be skipped by Return.

    In general, the Escape key is used when you leave a window or dialog
    box. Sometimes, in particular in dialogs, you can also leave by Return.
    In this case Return means "perform action" (if there is anything to
    perform), whereas Escape means "leave without performing".

    In interactive mode, commands are written line by line in the command
    field in the bottom of ISUW's front page and executed when Return is
    pressed. In addition to standard editing keys, the following keys are
    active.

        Escape clears the command line.

        Escape from an empty command line allows you to QUIT.

        Ctrl-Return truncates the command line from the cursor's position.

        Cursor Up/Down allows you to reuse (and reedit) earlier commands.

        Cursor Right/Left with Ctrl completes/replaces command names
        lexicographically, when the cursor is in the first connected word
        starting in position 1.

        Cursor Right/Left with Ctrl completes/replaces vector names
        lexicographically, when the cursor is in a connected word starting
        after position 1.

    In addition you can define your own shortkeys, see the description of
    the KEYS command.

    When a command is written from position 1 of the line, the command
    name is completed automatically as soon as it is unique. Also, some
    standard beginnings of commands are completed (I -> INCLUDE, LI ->
    LIST, OP -> OPEN, SA -> SAVE, SK -> SKIP). This means that the keys
    you must use to write INCLUDEALL are actually IA (here you can even
    use A), whereas to write SKIPLINE you must use SKL. You will soon
    learn this: it is impossible to write anything other than a command
    from position 1.
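
    The completion rule can be mimicked in a few lines of Python. This is
    a sketch under assumptions, not ISUW's own code: the command list is
    a small subset chosen for the illustration, and the function name
    type_keys is hypothetical. Only the standard beginnings I, LI, OP, SA
    and SK are taken from the text above.

```python
# A small, illustrative subset of command names (assumption).
COMMANDS = ["INCLUDE", "INCLUDEALL", "LIST", "LIST1", "SAVEDATA",
            "SAVEFITTED", "SKIPITEM", "SKIPLINE", "SHOW", "SORT",
            "VARIATE"]

# Standard beginnings that are completed although they are not unique.
BEGINNINGS = {"I": "INCLUDE", "LI": "LIST", "OP": "OPEN",
              "SA": "SAVE", "SK": "SKIP"}

def type_keys(keys):
    """Simulate typing `keys` from position 1 of the command line."""
    text = ""
    for key in keys.upper():
        text += key
        if text in BEGINNINGS:
            text = BEGINNINGS[text]
            continue
        matches = [c for c in COMMANDS if c.startswith(text)]
        if len(matches) == 1:
            text = matches[0]   # unique: complete the command name
    return text

includeall = type_keys("IA")
skipline = type_keys("SKL")
```

    With this subset, the keys IA give INCLUDEALL and the keys SKL give
    SKIPLINE, as described above.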

    You can also import a command to the command window from the command
    list associated with the help pages. Press F1 F1, select command with
    Cursor Up/Down or the mouse, and press BackSpace or LeftArrow.

    With a blank in position 1, the command line is interpreted as a
    COMPUTE command.

    The general syntax is

        commandname [parameter1 [parameter2 [...]]]

    Thus, parameters are in general separated by blanks. In special cases
    (PLOT, TABULATE, ... ) a "pseudo blank" |, which can be written by the 
    key in the upper left corner of the keyboard, is used for subdivision of
    parameters.

    When a command produces output, this is shown in a (light green)
    preview window. Leave this window by Return if you want output
    appended to the session's output file, Escape if you don't. Press
    Ctrl-P to print it out. Use the command SHOW (without parameters) to
    look at the session's output file. From the corresponding window you
    can also use Ctrl-P to print out the whole output file or a selected
    (marked) portion of it.

    It is possible, and often more convenient, to write an ISUW program
    (i.e. a sequence of ISUW commands, each occupying a line) and execute
    it by a RUN command, or directly from the program editor by pressing
    F9. See the command descriptions for EDIT and RUN. In this case all
    output is written to the output file without any previewing.

********************************************************************************

    INTRODUCTION.

    The following approximately 17 screenfuls give an overview of ISUW. It
    is roughly identical to the material you can find in the article
    "Introduction to ISUW", see my homepage ezlearn.cbs.dk/stat/hamat-2/tt/.
    To learn about the possibilities, read this and try simultaneously to
    perform some simple operations. You can leave the help file by Escape
    and return to the same position by F1.

    After this come the detailed descriptions of all commands. To find the
    description of a command, write it in the command line and press
    Ctrl-F1. Or just press Ctrl-F1 from an empty command line or press F1
    twice to get a menu for selection (by Return) of command.

--------------------------------------------------------------------------------
    A SIMPLE EXAMPLE.

    Suppose we have an ASCII (plain text) file EX1.TXT of the form

           Dose   Response
          0.968          0
          0.909          1
          ...
          1.689          1
          0.524          0

    consisting of a heading and 415 lines, each containing a value of a
    covariate x and a binary response y. This could be data from an
    experiment where 415 animals have been given a dose x of some drug, y
    being the binary response, e.g. 1 for reaction, 0 for no reaction. The
    following commands read this data set and fit a standard logit linear
    model with the base 10 logarithm of x as the independent variable,
    then fit the model with slope zero and perform the likelihood-ratio
    test for this hypothesis ("no drug effect").

        VAR X Y 415                  { declares the variates to hold data }
        OPEN EX1.TXT                 { opens the data file for input }
        SKIPLINE                     { skips the heading line }
        READ X Y                     { reads the two variates in parallel }
        COMPUTE LOG10X=LN(X)/LN(10)  { computes the log10 transform of X }
        FITLOGIT Y=1+LOG10X          { fits the logistic regression model }
        LISTPARAMETERS               { lists parameter estimates }
        FITLOGIT Y=1                 { fits the reduced model }
        TEST                         { tests last against previous model }
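
    For readers who want to see what such a test amounts to, here is a
    Python sketch of the same steps. It is not ISUW code and does not
    claim to reproduce ISUW's algorithm: the eight data points are made up
    for the illustration (they are not from EX1.TXT), the fit uses plain
    Newton-Raphson, and the function names are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of responses y given probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def fit_logit(x, y, iterations=25):
    """Fit logit P(y=1) = a + b*x by Newton-Raphson; return (a, b, loglik)."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))            # score for a
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        h00 = sum(w)                                         # information
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    return a, b, log_likelihood(y, p)

# Made-up illustration data: dose and binary response.
dose = [0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9]
response = [0, 0, 1, 0, 1, 1, 0, 1]

log10x = [math.log(d) / math.log(10) for d in dose]   # as in COMPUTE above
intercept, slope, ll_full = fit_logit(log10x, response)

# Reduced model (slope zero): the fitted probability is the mean response.
p0 = sum(response) / len(response)
ll_reduced = log_likelihood(response, [p0] * len(response))

# Likelihood-ratio statistic, referred to chi-square with 1 degree of
# freedom; for 1 df the upper tail probability is erfc(sqrt(lr/2)).
lr = max(2.0 * (ll_full - ll_reduced), 0.0)
p_value = math.erfc(math.sqrt(lr / 2.0))
```

    A small p_value would speak against the hypothesis "no drug effect".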

--------------------------------------------------------------------------------
    VARIATES AND FACTORS.

    The basic structures in StatUnit are called vectors. A vector can be a
    factor (an array of bytes, for storage of qualitative variables) or a
    variate (an array of single precision real numbers, 7-8 significant
    digits). Variates and factors are created by the commands VARIATE and
    FACTOR. In addition to this, variates are often declared implicitly,
    for example by COMPUTE commands, like LOG10X in the example above.

    EXAMPLE. To declare two variates X and Y of length 100, simply write

        VAR X Y 100

    Notice that we have written VAR, not VARIATE. Actually V would have
    been enough here. In ISUW, any command can be truncated as long as it
    is unambiguous. When written from first position of the command window
    in interactive mode, the command is simply completed as soon as it is
    recognised.

    A value of a variate can be missing (which internally means that it has
    the value -1.0E-37). Missing values are recognised by most commands and
    treated appropriately as such.

    A factor has, apart from its length, a property called its number of
    levels, an integer from 1 to 255 which specifies the maximal level
    allowed. This is given as an additional parameter in the declaration.

    EXAMPLE. To declare a factor SEX of length 100 on 2 levels, write

        FACTOR SEX 100 2

    Names of vectors can be of length up to 8. The first character must be
    a letter A..Z or an underbar _ , the remaining characters can also be
    digits 0..9. Actually, the special characters #, &, %, $ and @ can
    also be used (even as first characters), but we recommend that you do
    not do so because these characters are sometimes used for special
    purposes by ISUW. For example, vectors starting with a dollar sign
    have the special property that they are always deleted at exit from a
    program (!)
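
    The naming rules can be summarised as a pattern. The Python sketch
    below merely restates the rules given above; the function names are
    hypothetical, and the check is not taken from ISUW itself.

```python
import re

# Up to 8 characters; digits are not allowed in first position.
# The special characters # & % $ @ are accepted but not recommended.
ALLOWED = re.compile(r"[A-Z_#&%$@][A-Z0-9_#&%$@]{0,7}",
                     re.IGNORECASE | re.ASCII)
RECOMMENDED = re.compile(r"[A-Z_][A-Z0-9_]{0,7}",
                         re.IGNORECASE | re.ASCII)

def is_valid_name(name):
    """True if `name` follows the rules for vector names stated above."""
    return ALLOWED.fullmatch(name) is not None

def is_recommended_name(name):
    """True if `name` is valid and avoids the special characters."""
    return RECOMMENDED.fullmatch(name) is not None
```

    For example, LOG10X and _tmp pass both checks, $TMP is valid but not
    recommended, and 1ABC is rejected because it starts with a digit.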

    Vector names are case insensitive; in output from ISUW they are
    usually written in capitals.

    At most 255 vectors can be in use simultaneously.

    Vectors can be removed from memory (to save capacity or to release their
    names) by the command DELETE, and their names can be changed by the
    command RENAME. The command RAMSTATUS displays a list of vectors present
    and the space they occupy.

--------------------------------------------------------------------------------
    INPUT FROM TEXT FILES.

    The OPENINFILE, READ, SKIPITEM and SKIPLINE commands are designed for
    input from ASCII (plain text) files in free format.

    EXAMPLE. An ISUW program dealing with a data set of 178 units to be read
    from a file A:\HEIGHTS.DAT might begin something like this.

        FACTOR SEX 178 2
        VARIATE AGE HEIGHT 178
        FACTOR GROUP 178 6
        OPEN A:\HEIGHTS.DAT
        READ * SEX AGE HEIGHT GROUP

    This READ command assumes that the file has the data in standard format,
    like

        001  1  23.1  178.2  1
        002  2  43.6  173.1  4
         ...

    where the first unit (or here, person) is a male (SEX=1), of age 23.1,
    etc. etc. The only separators allowed (when nothing else is specified)
    are blanks, newline symbols (and, in fact, any control characters in
    the range 0-31), commas and semicolons. The asterisk in the READ
    command implies that the first item for each unit (here the unit or
    line number) is skipped.

    The READ command above assumes that factor levels are represented by
    their numerical levels. If this is not the case, an equality sign
    followed by a comma separated list of level names can be appended to
    the factor name. For example,

        READ * SEX=*,Male,Female AGE HEIGHT GROUP

    would work if the file looked like this:

        001  Male    23.1  178.2  1
        002  Female  43.6  173.1  4
         ...
        114  *       32.9  167.0  2
         ...

    with level 1 of SEX coded as 'Male', level 2 as 'Female' and level 0
    ("the missing level") as *.

    Values of variates must be in standard format (like 1.2, -0.22, +2.0E7).
    The symbol * is recognised as a missing value (for variates only).
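
    To make the free format concrete, the following Python sketch reads
    the small file shown above in the same spirit. It is an illustration
    only: the function names and the way fields are described are
    assumptions, level names are matched exactly as written, and None
    stands in for a missing variate value.

```python
import re

def tokenize(text):
    """Split on blanks, control characters (0-31), commas and semicolons."""
    return [t for t in re.split(r"[\x00-\x20,;]+", text) if t]

def read_units(text, fields):
    """Read items unit by unit.

    `fields` is a list of (name, kind, level_names) with kind one of
    "skip", "variate" or "factor". For a factor with level names, the
    first name denotes level 0 (the missing level).
    """
    columns = {name: [] for name, kind, _ in fields if kind != "skip"}
    tokens = tokenize(text)
    for start in range(0, len(tokens), len(fields)):
        unit = tokens[start:start + len(fields)]
        for (name, kind, levels), token in zip(fields, unit):
            if kind == "skip":
                continue
            if kind == "variate":
                value = None if token == "*" else float(token)
            elif levels is not None:
                value = levels.index(token)      # position = level number
            else:
                value = int(token)
            columns[name].append(value)
    return columns

text = ("001  Male    23.1  178.2  1\n"
        "002  Female  43.6  173.1  4\n"
        "114  *       32.9  167.0  2\n")

# Corresponds to READ * SEX=*,Male,Female AGE HEIGHT GROUP
fields = [(None, "skip", None),
          ("SEX", "factor", ["*", "Male", "Female"]),
          ("AGE", "variate", None),
          ("HEIGHT", "variate", None),
          ("GROUP", "factor", None)]
data = read_units(text, fields)
```

    After this, data["SEX"] holds the levels 1, 2 and 0 for the three
    units, the last one being the missing level.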

--------------------------------------------------------------------------------
    LISTING IN ASCII FORMAT.

    This is done by the commands LIST and LIST1. LIST is for parallel
    listing of vectors (usually of equal lengths). LIST1 is for listing of
    single vectors across the page. For both commands, formats can be used
    to determine width and number of digits after the decimal point. With
    LIST it is also possible to write factor levels as names.

--------------------------------------------------------------------------------
    COMMENTS ON THE OUTPUT FILE.

    To write a comment to the output file, use the command REMARK.

--------------------------------------------------------------------------------
    DATA STORAGE

    in an internal binary file format is handled by the commands SAVEDATA and
    GETDATA. In their simplest form, these commands are used to dump and
    restore all vectors present in an ISUW session. The command SHOW can
    display the contents of a data set without importing it, with indication
    of potential name conflicts. ISUW data sets are files with extension .SUD
    - do not try to edit them or handle them with other tools than ISUW (or
    the DOS version ISU).

--------------------------------------------------------------------------------
    GRAPHICS.

    The commands PLOT and HISTOGRAM are used for graphics. PLOT produces
    scatter plots (one variate against another). Colors and plot symbols can
    be chosen according to the levels of factors. Points can be connected by
    lines as desired, and overlaid plots can be produced. HISTOGRAM produces
    histograms for variates (or factors), optionally parallel histograms
    grouped by the levels of a factor. Headings and axis titles are controlled
    by the commands FRAMETEXT, XTEXT and YTEXT. Without these specifications
    reasonable default texts (variate names etc.) are used.

    Graphics can be saved in JPEG format (*.JPG) or as bitmap (*.BMP) files,
    and thereafter imported to text handling programs (like Microsoft Word or
    WordPerfect) or image processing programs. Hardcopies can be printed as
    PostScript files (see the descriptions of commands OPENPSFILE, PSFRAME and
    CLOSEPSFILE). Interactively rotatable 3-d graphics can be produced by PLOT
    and HISTOGRAM, by specification of an extra variate or factor.

--------------------------------------------------------------------------------
    PARALLEL SORTING OF VECTORS

    is performed by the command SORT.

--------------------------------------------------------------------------------
    RESTRICTIONS.

    In most applications, data are given as a rectangular data set, i.e. a
    number of variates and factors of the same length, which is the number
    of "records" or "experimental units" or "patients" or "persons" or
    "runs" or "plots" or whatever, depending on the applied context. We
    shall use the word units. To restrict attention to a subset of the set
    of units, use the commands EXCLUDE, INCLUDE, INCLUDEALL, FOCUSONLEVEL,
    EXCLUDELEVEL and EXCLUDEMISSING. These commands control a hidden array
    of booleans (all TRUE from the beginning), telling which units are
    "present". All ISUW commands for which this is relevant obey
    restrictions, in the sense that only units present are taken into
    account. For example, model fit commands and COMPUTE commands act only
    on the subset of data specified as "present".
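
    The idea of the hidden array of booleans can be pictured as follows.
    This Python sketch is an analogy only, with hypothetical function
    names and made-up data; it models just the effect of excluding units
    and of INCLUDEALL, not the exact syntax or semantics of the ISUW
    commands.

```python
def exclude(present, condition):
    """Mark the units for which `condition` is true as not present."""
    for i, flag in enumerate(condition):
        if flag:
            present[i] = False

def include_all(present):
    """Make all units present again."""
    for i in range(len(present)):
        present[i] = True

def mean_present(values, present):
    """Mean over the units that are present, as a command would see them."""
    kept = [v for v, keep in zip(values, present) if keep]
    return sum(kept) / len(kept)

height = [178.2, 173.1, 167.0, 181.5]
sex = [1, 2, 2, 1]

present = [True] * len(height)           # all TRUE from the beginning
exclude(present, [s == 2 for s in sex])  # restrict to units with SEX=1
mean_restricted = mean_present(height, present)

include_all(present)                     # remove the restriction again
mean_all = mean_present(height, present)
```

    The data themselves are never changed; only the array telling which
    units are present.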

    WARNING. Restrictions act in parallel on all vectors, independently of
    their lengths. Parallel restrictions on vectors of different lengths are
    usually meaningless. Be careful - use INCLUDEALL as soon as restrictions
    are no longer required. Special care should be taken in connection
    with SORT, SAVEDATA, TABULATE and the short form of TRANSFER - see the
    command descriptions.

    For convenience, the important command INCLUDEALL can be executed by
    pressing A from an empty command window.

--------------------------------------------------------------------------------
    COMPUTATIONS.

    Unit-by-unit computations are performed by COMPUTE. For example, if P is
    a variate of length 100 with values between 0 and 1, and you want to
    create another variate LOGIT_P of length 100 holding its logit
    transformed values, write

        COMPUTE LOGIT_P=LN(P)-LN(1-P)

    If LOGIT_P is not previously declared, it will be declared as a variate
    of length 100. If it is declared before, it must be a variate of length
    100. Values of P that are not in the range ]0,1[ will result in a
    missing value of LOGIT_P, and a warning about this is written to the
    output file. Factors can also be handled in this way, and many
    transformations that are not just "unit by unit" are also possible. See
    the description of the COMPUTE command.
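
    The handling of values outside ]0,1[ can be sketched in Python as
    follows. This is an illustration, not ISUW's code: the constant
    MISSING uses the internal value mentioned earlier, the function name
    logit is hypothetical, and the warning is omitted.

```python
import math

MISSING = -1.0e-37   # internal code for a missing value, as stated earlier

def logit(p):
    """Return LN(P)-LN(1-P), or MISSING when P is missing or not in ]0,1[."""
    if p == MISSING or not (0.0 < p < 1.0):
        return MISSING
    return math.log(p) - math.log(1.0 - p)

P = [0.5, 0.9, 0.0, 1.5, MISSING]
LOGIT_P = [logit(p) for p in P]
```

    The last three elements of LOGIT_P become missing, because 0.0 and 1.5
    lie outside the open interval and the last value of P is itself
    missing.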

    A COMPUTE command without a left hand side simply displays the result.
    For example, to display the mean and standard deviation of (the values
    in) a variate X, just write

        COMPUTE mean(X)
        COMPUTE sqrt(variance(X))

    For convenience, the COMPUTE command can be generated from an empty
    command window by pressing the key + (without an equality sign) or =
    (including the equality sign). Moreover, if the line starts with a blank
    it is interpreted as a COMPUTE command.

--------------------------------------------------------------------------------
    OTHER WAYS OF ASSIGNING VALUES/LEVELS TO VARIATES/FACTORS.

    GENERATE assigns systematically varying levels to a factor. For example,
    if G is a factor of length 20 on 4 levels, the command

        GENERATE G 3

    will assign levels

        1 1 1 2 2 2 3 3 3 4 4 4 1 1 1 2 2 2 3 3

    to the factor. The level changes cyclically, the last parameter (here 3)
    determining the lag between change points.
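
    The pattern can be written as a one-line rule: unit number i (counted
    from 0) gets level (i div lag) mod levels, plus 1. The Python sketch
    below reproduces the example; the function name is hypothetical and
    the code is only meant to illustrate the rule.

```python
def generate_levels(length, n_levels, lag):
    """Levels changing cyclically, with `lag` units between change points."""
    return [(i // lag) % n_levels + 1 for i in range(length)]

G = generate_levels(20, 4, 3)   # corresponds to GENERATE G 3
```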

    GROUP is used for construction of a factor by interval grouping of a
    variate.

    TRANSFER can be used to copy subvector into subvector. For example, if
    X is a vector of length 100, to split it into two vectors of length 50,
    write

        VARIATE X1 X2 50
        TRANSFER X  1  50   X1 1 50
        TRANSFER X 51 100   X2 1 50

    TRANSFER can also be used to copy the values/levels present in a
    vector into a new vector of the appropriate length.
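
    The split of X into X1 and X2 corresponds to the following Python
    sketch. Positions are counted from 1 and both end points are included,
    as in the TRANSFER example; the function name and the length check are
    assumptions made for the illustration.

```python
def transfer(source, s_from, s_to, target, t_from, t_to):
    """Copy source[s_from..s_to] into target[t_from..t_to] (1-based)."""
    if s_to - s_from != t_to - t_from:
        raise ValueError("subvectors must have the same length")
    target[t_from - 1:t_to] = source[s_from - 1:s_to]

X = [float(i) for i in range(1, 101)]   # a vector of length 100
X1 = [0.0] * 50
X2 = [0.0] * 50
transfer(X, 1, 50, X1, 1, 50)
transfer(X, 51, 100, X2, 1, 50)
```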

--------------------------------------------------------------------------------
    SUMMARIES, TABLES, TABULAR SUMMATION.

    SUMMARY displays summary descriptions of variates or factors.

    ONEWAYTABLE produces one-way tables of counts for factors (number of
    units for each level) or variates (counts of values in specified
    intervals). Two- or three-way tables of counts of units or sums of a given
    variate over level combinations for two or three factors are produced by
    the commands TWOWAYTABLE and THREEWAYTABLE.

    The command TABULATE performs counting (of units) or summation (of
    variate values) over the cells of a cross classification determined by
    an arbitrary number of factors.

    For convenience, these commands can be generated by a single key from an
    empty command window as follows.

                    Press key     to write command
                            0              SUMMARY
                            1              ONEWAYTABLE
                            2              TWOWAYTABLE
                            3              THREEWAYTABLE
                            4              TABULATE

--------------------------------------------------------------------------------
    STATISTICAL MODELS.

    FITLINEARNORMAL is for analysis of variance and regression.

    FITLOGLINEAR is for multiplicative or log-linear models for Poisson or
    multinomial data.

    FITLOGITLINEAR is for logistic regression models for binary or binomial
    data.

    FITNONLINEAR is for a class of nonlinear regression models, including the
    generalised linear models with overdispersion, user-specified mean
    (=inverse link) and variance functions.

    FITCOXMODEL is for proportional hazards models by Cox's likelihood,
    optionally with right censoring, left truncation and stratification.

    FITMCLOGIT, FITMCPROBIT and FITMCCLOGLOG are for ordered categorical
    response models as described by P. McCullagh (JRSS B 42, 109-142), where
    the responses are assumed to be the result of a grouping with unknown
    cutpoints of continuous data from a linear position parameter model with
    error distribution logistic, normal or Gompertz.

    FITCLOGIT is for conditional logistic regression (like in matched
    case-control studies). FITCRASCH is for a special case of this, the
    conditional Rasch model (two way logit-additive model for binary data by
    conditioning on the row sums, arbitrary linear structure for column
    parameters).

    FITNEGBIN is for log-linear models for negative binomial data, usually
    coming up as "Poisson data with over-dispersion".

    FITANOVA is for analysis of variance, including random effects models,
    in balanced orthogonal designs.

    After any model fit command except FITANOVA, the command LISTPARAMETERS
    produces a listing of the estimated parameters in the last model fitted,
    and the command SAVEFITTED can be used for extraction of fitted values,
    residuals and normed residuals from the last model fitted (whenever this
    makes sense, see the command descriptions). See also the command
    SAVENORMEDRESIDUALS, which can be used after FITLINEARNORMAL to compute
    exact T-distributed ("studentized") normed residuals. SAVEPARAMETERS saves
    parameter estimates and their estimated standard deviations as variates
    (except after FITANOVA). ESTIMATE outputs the estimates of specified
    linear combinations of the parameters and their estimated standard
    deviations. TESTMODELCHANGE can be used for computation of the likelihood
    ratio test (Chi-square or F) for model reduction after fit of two nested
    models by any FIT... command except FITANOVA.

    All model fit commands involve the concept of a model formula, i.e. a
    code for a linear expression involving linear effects of covariates,
    effects of factors, interactions between factors etc. This concept,
    which is more or less common to all statistics packages, is explained
    carefully in the description of FITLINEARNORMAL.

    WARNING. The design matrix determined by a model formula is not
    physically stored. What is kept is a code telling how to compute its
    elements from values or levels of existing variates and factors. Hence,
    commands referring to the last model fit use the actual values/levels of
    vectors occurring in the last model. If these have been changed,
    incorrect results will come out. If some of them have been deleted, the
    information that can be extracted is reduced accordingly. For example,
    if one of the independent variables (or an offset variable, if such was
    present) has been deleted, SAVEFITTED will not work. If the response
    variate has been deleted, SAVEFITTED will be able to produce fitted
    values, but not residuals and normed residuals. Similarly, if a weight
    variate has been deleted, only fitted values and residuals, but not
    normed residuals, can be produced.

--------------------------------------------------------------------------------
    TESTS, NON-PARAMETRICS, DESCRIPTIVE STATISTICS.

    The command WILCOXON performs a two-sample Wilcoxon or Mann-Whitney test.

    The command SPEARMAN computes Spearman's rank correlation and performs
    the test for "no ordinal correlation".

    The command BARTLETT performs Bartlett's test for variance homogeneity
    in a one-way setting.

    The command CORMAT writes the matrix of correlations for a set of
    variates, with optional indication of significances.

--------------------------------------------------------------------------------
    CALLING OTHER PROGRAMS.

    Other programs (or documents to be opened by applications determined by
    their file extensions) can be called directly from ISUW by a SHELL
    command. For example, to edit a file PRG.ISU with NotePad (if you prefer
    this to ISUW's own EDIT command), use the command

        SHELL NOTEPAD PRG.ISU

--------------------------------------------------------------------------------
    PROGRAMMING THE KEYBOARD.

    The function keys F2..F12, alone or in combination with Alt, Ctrl or
    Shift, and the keys A..Z and 0..9 in combination with Alt or Ctrl can
    (with certain exceptions that are used for other things) be
    programmed. For example, the command

        KEY ao show!

    will imply that the session's output file is displayed whenever Alt-O
    is pressed (the exclamation sign means "Return").

    Your programmed keys are automatically saved on exit and recovered at
    next startup under the same ISUW root directory.

--------------------------------------------------------------------------------
    ISUW PROGRAMS.

    ISUW commands can be written line by line on text files by commands of
    the form

        EDIT programname

    and executed by

        RUN programname

    Programs are saved on files with extension .ISU . Make sure that you
    include this extension in the program name if you use another editor
    than ISUW's own EDIT command.

    To avoid long lines in programs, you can split them up in pieces by the
    "continue line symbol" \. For example, rather than

        FITLINEAR Y = 1 + ROWS + COLUMNS + LAYERS + ROWS*COLUMNS + COLUMNS*LAYERS + ROWS*LAYERS

    it is usually preferable to write

        FITLINEAR Y = 1 \
        + ROWS + COLUMNS + LAYERS \
        + ROWS*COLUMNS + COLUMNS*LAYERS + ROWS*LAYERS

    Empty lines can be inserted and lines can be indented as desired to make
    the program more readable. Comments can be included in two ways:

       (1) Lines starting with a percentage sign % are ignored.

       (2) Text in curly braces {} within a line is ignored.

    As opposed to REMARKs, such comments are not echoed to the output file.
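
    The conventions for continued lines and comments can be pictured by
    the following Python sketch, which turns a program text into a list of
    complete commands. It is an illustration under assumptions, not ISUW's
    own parser: the function name is hypothetical, and details such as a
    percent sign after indentation are handled in one plausible way.

```python
import re

def preprocess(source):
    """Return the commands of a program with comments removed and
    continued lines joined."""
    commands = []
    pending = ""
    for raw in source.splitlines():
        line = re.sub(r"\{[^}]*\}", "", raw).strip()   # drop {...} comments
        if line.startswith("%"):                        # comment line
            continue
        if line.endswith("\\"):                         # continue line symbol
            pending += line[:-1].rstrip() + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if line:                                        # skip empty lines
            commands.append(line)
    if pending.strip():
        commands.append(pending.strip())
    return commands

program = r"""
% fit the model with all two-factor interactions
FITLINEAR Y = 1 \
    + ROWS + COLUMNS + LAYERS \
    + ROWS*COLUMNS + COLUMNS*LAYERS + ROWS*LAYERS

LISTPARAMETERS   { lists parameter estimates }
"""
commands = preprocess(program)
```

    The result consists of two commands, the first being the FITLINEAR
    command written on a single line.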

    A primitive device for parameter substitution is also available, see the
    description of the command SUBSTITUTE. The command GOTO can be used for
    conditional branching and loops.

    In programs, command names (but not vector names) can be truncated. See
    the list below of shortest unique truncations. The command name COMPUTE
    can be omitted for COMPUTE commands with a left hand side. COMPUTE
    commands without a left hand side can be written with an equality sign
    '=' or a plus '+' as the first character of the expression to display.

    A more primitive device for program execution is implemented in the
    command OPENCOMMANDFILE. The effect of this command is that the
    commands on a file can be imported one by one to the command field by
    the CursorDown arrow. This can be used if you want to execute a
    sequence of commands, with the optional possibility of modifying the
    commands and inserting other commands. The demonstration programs on
    the directory DEMOS under the ISUW root directory are executed in this
    way (but with the additional feature that explanatory text is shown in
    a separate "supervisor" window).

--------------------------------------------------------------------------------
    INSTALLATION, DIRECTORY STRUCTURE AND CALL OF ISUW.

    To install ISUW, download (as you have probably already done) the file
    ISUWINST.EXE to an empty directory on your hard disk. We recommend
    C:\ISUW to keep file names short, but in principle a directory like

      C:\Program Files\Danish Mouse Free Software\Interactive StatUnit

    could also be used. Unpack this self-unpacking file by executing it,
    for example by double clicking on it from Windows Explorer, or calling
    it by its name from the Run window or a command (MS-DOS or equivalent)
    window from the same directory. This operation results in the creation
    of four files,

      ISUW.EXE      (the executable program)
      ISUWHELP.TXT  (the file you are reading here)
      BORLNDMM.DLL  (a Delphi system file required for memory management)
      DEMOS.EXE     (a self-unpacking file containing the DEMOS files)

    After this ISUWINST.EXE can be deleted. The entire ISUW package takes up
    less than 2 MB of space on the hard disk.

    To install a new version of ISUW, overwriting the old version, simply
    repeat this. Here you can avoid four confirm-overwrite-dialogs by use
    of the option -o, i.e. by writing

        ISUWINST -o

    rather than just ISUWINST.

    In the following we refer to two directories:

    1. The ISUW root directory. This is where the files SHORTKEY.TXT and
       SHORTKEY.BIN, holding your shortkey definitions, are kept, and the
       file LASTDIR holding the name of the last session's working
       directory, your latest selection of size and position on the screen
       of the ISUW window and your choice of having "hint mode" on or
       off. You can have several ISUW root directories for different
       applications, if you wish. If you ever get the (probably rather
       useless) idea of having two or more ISUW sessions running at the
       same time, make sure you do it from different ISUW root
       directories, otherwise there will be file sharing conflicts.
       However, ISUW makes some initial file checks that will usually
       prevent this error.

    By default, the ISUW root directory becomes the directory where you
    unpacked the four system files. But the ISUW root directory does not
    have to be the directory where these files are located. You can (and
    should, if these files are placed e.g. on a read-only network drive)
    redefine this to any other directory by giving a valid directory name
    (including drive letter and colon) as the first parameter in the call
    of ISUW.EXE.

    2. The working directory. This directory, which (with an exception to be
       mentioned below) must be a subdirectory (or sub-sub etc.) of the
       ISUW root directory, is selected (and sometimes created) from a
       dialog box when the session begins. Typically, you will use the
       same directory again and again; in this case you can skip the
       dialog by Return. The working directory is the place where all
       sorts of output is placed by default, and also the place where you
       will typically place data files before or during the session. The
       working directory becomes the current Windows directory throughout
       the session, which means that file names without a path
       specification refer to files on that directory.

    The working directory can be selected directly in the call of ISUW.EXE
    by specification of the full name (including drive letter and colon)
    of an existing directory as the second parameter in the call of
    ISUW.EXE. In this case the working directory does not have to be a
    subdirectory of the ISUW root directory. The entry dialog is skipped,
    and the file LASTDIR is left unchanged.

    Here, you can also extend the name of the desired working directory to
    the full name of an existing file with extension either ISU or SUD on
    that directory. In the first case, the ISU program is executed, in the
    second case the ISU data set is imported. This is useful if you want
    to set up your Windows computer in such a way that double clicking
    from Windows Explorer on an ISU program or an ISU data set results in a
    startup of ISUW with the appropriate action. Write a command file -
    say ISUWBAT.BAT - containing a single line of the form

        C:\isuw\ISUW.EXE c:\isuw %1

    and make this your Windows default application for opening .ISU and
    .SUD files (Click Tools -> Folder Options -> File Types from Windows
    Explorer).

    If both the ISUW root and the working directory are specified as the
    first two parameters to ISUW.EXE, additional parameters can be added.
    These parameters should constitute a valid ISUW command, which will be
    copied to the command field. If, in addition, the last of these
    parameters ends with an exclamation mark, this command will be
    executed immediately after entry to the program. In this way you can
    build some standard initialization into the call of ISUW, like import
    of a data set, execution of an ISUW program defining some local
    shortkeys etc.

    EXAMPLE. On the computer I use at work, I have placed the ISUW system
    files on a directory named C:\DELPHI\ISUW1 because I am developing
    ISUW under Borland Delphi. However, my (only) ISUW root directory is
    C:\ISUW. Thus, the shortcut starting ISUW from my desktop has as its
    "target property" the command

        C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW

    However, I have a main application of ISUW related to a course called
    MPAS. For that reason, I have another shortcut on my desktop with the
    command

        C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW C:\ISUW\MPAS\06

    in the target field, which means that I can go directly to this
    application without any entry dialog. Next year I am probably going to
    change 06 to 07 (after having created C:\ISUW\MPAS\07). In between, I
    use the form

        C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW C:\ISUW\MPAS\06\DATA.SUD

    to load a certain data set automatically at startup.
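
    The parameter rules described above can be summarized in a small
    Python sketch. It is only an illustration of the rules as stated in
    this section, not ISUW's own code; the function name and the
    dictionary keys are hypothetical, and the command RAMSTATUS is used
    merely as an example of a command without parameters.

```python
import ntpath  # Windows path rules, regardless of where this sketch runs


def interpret_parameters(params):
    """Model how the parameters after ISUW.EXE are described above.

    Illustration only; the dictionary keys are hypothetical names.
    """
    plan = {"root": None, "workdir": None, "run_program": None,
            "import_dataset": None, "command": None, "execute_now": False}
    if len(params) >= 1:
        plan["root"] = params[0]                 # first parameter: ISUW root
    if len(params) >= 2:
        second = params[1]
        extension = ntpath.splitext(second)[1].upper()
        if extension in (".ISU", ".SUD"):
            # A file name: its directory becomes the working directory.
            plan["workdir"] = ntpath.dirname(second)
            key = "run_program" if extension == ".ISU" else "import_dataset"
            plan[key] = ntpath.basename(second)
        else:
            plan["workdir"] = second             # a plain directory name
    if len(params) >= 3:
        # The remaining parameters form one command for the command field.
        plan["command"] = " ".join(params[2:])
        # A trailing exclamation mark asks for immediate execution.
        plan["execute_now"] = params[-1].endswith("!")
    return plan


print(interpret_parameters([r"C:\ISUW", r"C:\ISUW\MPAS\06\DATA.SUD"]))
print(interpret_parameters([r"C:\ISUW", r"C:\ISUW\MPAS\06", "RAMSTATUS!"]))
```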

    AUTOEXEC.ISU. Another way of specifying automatic initialization goes
    as follows. If an ISUW program named AUTOEXEC.ISU exists on a working
    directory, this program will be executed automatically at startup from
    that working directory. This works quite generally - i.e. also when
    the working directory is selected from the entry dialog.

    Uninstallation. In the very unlikely case that you want to uninstall
    ISUW, simply remove the root directory (or directories, if you have
    more than one) and whatever you want to get rid of among files you
    have created elsewhere by ISUW on working directories that are not
    subdirectories of the root directory. This can be tedious, of course,
    but not more tedious than the occasional garbage collection which is
    required anyway. ISUW does not make any hidden changes to your
    computer's setup. The shortcuts that you may have created are easy to
    delete.

--------------------------------------------------------------------------------
    OUTPUT.

    ISUW writes output to a temporary file on the ISUW root directory
    named ISUWOUT.TMP. This is the file you look at in a somewhat modified
    form by the command SHOW. When an ISUW session ends you will be given
    the option of saving this file on the working directory under another
    name. If you answer "No" here, the output file is lost. In spite of
    some special effects created by the SHOW command (ISUW prompts being
    replaced with special colors of command lines, error messages and
    notes being printed in special colors, REMARKS in italics etc.), ISUW
    output files are ordinary text files which can be edited and printed
    e.g. by NotePad or imported to standard text handling programs.

    Whenever a command sent from the command window produces written
    output, this is shown in a light green preview window, which you leave
    by Return if you want output appended to ISUWOUT.TMP, Escape if you do
    not. For commands in a program executed by a RUN command the rule is
    different. Here, all output is written directly to ISUWOUT.TMP, unless
    you redirect it explicitly to another file (or the "paper basket"
    NUL) by an OUTFILE command.

    Commands and error messages are echoed to the output file by default.
    This means that the ISUW output file will contain a complete log of
    what has happened during the session. In some cases (for example to
    avoid echoes of the same GOTO loop again and again) you may prefer to
    switch this default off by the ECHO command.

********************************************************************************

                       Alphabetic list of ISUW commands

    The last column indicates (when it makes sense) whether a command does
    (+) or does not (-) take restrictions into account. For details, see the
    command description.

       Command               Shortest      Equivalent           Restrictions
                             truncation    brief form
                                                       Shortkey
                                                       from empty
                                                       command window


       BARTLETT              B                                       +
       CLOSEPSFILE           CL
       COMPUTE               COM          + =    + =   +
       CONSTRUCTMINIMUM      CON                                     -
       CORMAT                COR                                     +
       DELETE                D
       ECHO                  EC
       EDIT                  ED
       ESTIMATE              ES
       EXCLUDE               EXCLUDE
       EXCLUDELEVEL          EXCLUDEL
       EXCLUDEMISSING        EXCLUDEM
       FACTOR                FA
       FITANOVA              FITA                                    +
       FITCLOGIT             FITCL                                   +
       FITCOXMODEL           FITCO                                   +
       FITCRASCH             FITCR                                   +
       FITLINEARNORMAL       FITLI                                   +
       FITLOGLINEAR          FITLOGL                                 +
       FITLOGITLINEAR        FITLOGI                                 +
       FITMCCLOGLOG          FITMCC                                  +
       FITMCLOGIT            FITMCL                                  +
       FITMCPROBIT           FITMCP                                  +
       FITNEGBIN             FITNE                                   +
       FITNONLINEAR          FITNO                                   +
       FOCUSONLEVEL          FO
       FRAMETEXT             FR
       GENERATELEVELS        GEN                                     -
       GETDATA               GET                                     -
       GOTO                  GO                                      -
       GROUP                 GR                                      +
       HISTOGRAM             H                                       +
       INCLUDE               INCLUDE                   I
       INCLUDEALL            INCLUDEA                  A
       KEYS                  K
       LIST                  LIST                 L                  +
       LIST1                 LIST1                L1                 +
       LISTPARAMETERS        LISTP
       ONEWAYTABLE           ON                        1             +
       OPENCOMMANDFILE       OPENC
       OPENINFILE            OPENI                OPEN
       OPENPSFILE            OPENP
       OUTFILE               OU
       PLOT                  PL                                      +
       PSFRAME               PS
       QUIT                  Q
       RAMSTATUS             RA                                      -
       READ                  REA                                     +
       REMARK                REM
       RENAME                REN
       RUN                   RU
       SAVEDATA              SAVED                SAVE               +
       SAVEFITTED            SAVEF
       SAVENORMEDRESIDUALS   SAVEN                                   +
       SAVEPARAMETERS        SAVEP
       SHELL                 SHE
       SHOW                  SHO
       SKIPITEM              SKIPI
       SKIPLINE              SKIPL
       SORT                  SO                                      -
       SPEARMAN              SP                                      +
       SUBSTITUTE            SUB
       SUMMARY               SUM                       0             +
       TABULATE              TA                        4             +
       TESTMODELCHANGE       TE
       THREEWAYTABLE         TH                        3             +
       TRANSFER              TR                                     -/+
       TWOWAYTABLE           TW                        2             +
       VARIATE               V
       WILCOXON              W                                       +
       XTEXT                 X
       YTEXT                 Y

********************************************************************************
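
    The "shortest truncation" column above appears to follow a simple
    rule: the shortest prefix that no other command name starts with, or
    the full name when it is itself a prefix of a longer command (as for
    EXCLUDE, INCLUDE and LIST). The Python sketch below checks this
    reading against the table. It is an observation about the table, not
    a description of how ISUW actually parses commands, and the function
    name is hypothetical.

```python
COMMANDS = """
BARTLETT CLOSEPSFILE COMPUTE CONSTRUCTMINIMUM CORMAT DELETE ECHO EDIT
ESTIMATE EXCLUDE EXCLUDELEVEL EXCLUDEMISSING FACTOR FITANOVA FITCLOGIT
FITCOXMODEL FITCRASCH FITLINEARNORMAL FITLOGLINEAR FITLOGITLINEAR
FITMCCLOGLOG FITMCLOGIT FITMCPROBIT FITNEGBIN FITNONLINEAR FOCUSONLEVEL
FRAMETEXT GENERATELEVELS GETDATA GOTO GROUP HISTOGRAM INCLUDE INCLUDEALL
KEYS LIST LIST1 LISTPARAMETERS ONEWAYTABLE OPENCOMMANDFILE OPENINFILE
OPENPSFILE OUTFILE PLOT PSFRAME QUIT RAMSTATUS READ REMARK RENAME RUN
SAVEDATA SAVEFITTED SAVENORMEDRESIDUALS SAVEPARAMETERS SHELL SHOW SKIPITEM
SKIPLINE SORT SPEARMAN SUBSTITUTE SUMMARY TABULATE TESTMODELCHANGE
THREEWAYTABLE TRANSFER TWOWAYTABLE VARIATE WILCOXON XTEXT YTEXT
""".split()


def shortest_truncation(name, commands=COMMANDS):
    """Return the shortest prefix of `name` matching no other command."""
    for k in range(1, len(name) + 1):
        prefix = name[:k]
        if [c for c in commands if c.startswith(prefix)] == [name]:
            return prefix
    # `name` is a prefix of a longer command, so it must be typed in full.
    return name


for command in ("BARTLETT", "COMPUTE", "EXCLUDE", "FITLOGLINEAR", "READ"):
    print(command, shortest_truncation(command))
```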



    ========================  COMMAND DESCRIPTIONS  ========================

--------------------------------------------------------------------------------
    VARIATE

    Declaration of variates.
................................................................................

    Syntax: VARIATE name1 [name2 [...]] length

    Creates new variates named name1 ... of the same length. The length
    must be a positive integer or integer expression.

    EXAMPLE.

        VAR X Y 10*48

    creates two variates X and Y of length 480.

    Variates are arrays or vectors of single precision real numbers.

    Formally, there is no upper limit to the length of variates and
    factors. Or, rather, the realistic limit is set by the computer's RAM.
    But things will work rather slowly if the RAM is filled up. Depending
    on your Windows version, the computer may start swapping to disk (which
    will slow it down to a speed where it is almost useless), or the
    session will crash.

    Another problem is that for single precision numbers greater than
    approx. 15 million, rounding to integer values will not be correct.
    This means that for a variate X of length greater than 15 million you
    cannot use expressions like X(UNIT) for all possible values of UNIT.
    For most data sets this is no problem at all - but now you are warned.
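
    The effect can be reproduced outside ISUW. The Python sketch below
    assumes that a single precision number is an IEEE 754 32-bit float
    (an assumption; the manual only says "single precision"). Under that
    assumption every integer up to 2^24 = 16777216 is stored exactly,
    and the first integer that is not is 16777217, so the figure of
    about 15 million quoted above is a round and slightly cautious one.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_exact_in_single(n):
    """True if the integer n survives a round trip through single precision."""
    return to_single(float(n)) == n


LIMIT = 2 ** 24  # 24 significant bits in an IEEE 754 single

print(is_exact_in_single(15_000_000))   # below the limit
print(is_exact_in_single(LIMIT))        # the limit itself
print(is_exact_in_single(LIMIT + 1))    # first integer that is not exact
```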

    There is, however, an upper limit of 255 to the number of variates and
    factors that can be present simultaneously in an ISUW session.

    At declaration, all values are set to zero.

    WARNING. If an error occurs during execution, like in the command

        VAR X Y 1A Z 100

    where 1A is an illegal variate name, the command is interrupted by an
    error message. However, the command is executed up to the place where
    the error is met. In the above example X and Y are declared, but not Z.

--------------------------------------------------------------------------------
    FACTOR

    Declaration of factors.
................................................................................

    Syntax: FACTOR name1 [name2 [...]] length levels

    Creates new factors named name1 ... of given length and number of
    levels. The length and the number of levels are integer expressions,
    both positive.

    EXAMPLE.

        FAC SEX TREAT 132 2
        FAC GROUP 132 6

    creates three factors of length 132, SEX and TREAT on 2 levels and GROUP
    on 6 levels.

    Factors are stored as arrays of bytes. For this reason, the maximal
    number of levels is 255.

    At declaration, all levels (present or not) are set to zero.

    WARNING. See the warning to VARIATE (just above).

--------------------------------------------------------------------------------
    RAMSTATUS

    Writes information about existing vectors in memory.
................................................................................

    No parameters.

    Writes information about the vectors present and the dynamically
    allocated memory they occupy. The information includes

        Names of variates and their lengths.
        Names of factors and their lengths and numbers of levels.
        The number of bytes occupied by each vector and totally.

    In addition, RAMSTATUS tells whether restrictions are present or not.

--------------------------------------------------------------------------------
    DELETE
    or
    DEL

    Deletes existing vectors.
................................................................................

    Syntax: DELETE [name1 [name2 [...]]]

    Deletes existing vectors, releasing the space they occupy and their
    names.

    WARNING. If an error occurs during execution, like in the command

        DEL X Y 1ST Z

    where 1ST is an illegal variate name, the command is interrupted by an
    error message. However, the command is executed up to the place where
    the error is met. In the above example X and Y are deleted (if they
    both exist), but not Z.

    If the command is used without parameters, all vectors present are
    deleted. In addition, all restrictions are removed. If an input file,
    an alternative output file or a PostScript output file is open it is
    closed, information from last model fit is lost, and the parameters
    determining text for PLOT and HISTOGRAM are set to their defaults. In
    addition, if an OUTFILE command is in force, output is redirected back
    to the session's output file, and command echo is set "on". Thus, a
    DELETE command without parameters is a sort of "reset" command. The
    only difference from closing the session and starting a new one is that
    the output file is still there, and if DELETE is used in this way in a
    program after a SUBSTITUTE command the effects of that command are
    still in force.

    DELETE without parameters is very often useful as the first command in
    a program. So are the commands ECHO 0 and OUTFILE 0, if you want to
    avoid command echoes and output from loops, or OUTFILE with a file
    name if you want to direct output to a file other than the session's
    output file (useful when a program is tested). It follows from what
    was said above that the DELETE command must come before the two other
    commands (but a SUBSTITUTE command may be placed before it).

--------------------------------------------------------------------------------
    RENAME

    Gives an existing vector a new name.
................................................................................

    Syntax: RENAME oldname newname

    oldname should, of course, be the name of an existing vector, and
    newname must be a valid vector name which is not in use. The command
    can be used to solve name coincidence conflicts before import of a
    data set. Use a SHOW command to see if such conflicts are present.

--------------------------------------------------------------------------------
    EXCLUDE

    Excludes specified units, marking them as "non-present".
................................................................................

    Syntax: EXCLUDE range1 [range2 [...]]

    A range can be an integer expression or an expression of the form

        integer1:integer2

    where integer1 and integer2 are integer expressions.
    In the last case,

        0 < integer1 <= integer2 <= length of longest vector

    is required, and the units from integer1 to integer2 are excluded.

    EXAMPLE. To remove all units <= 10 and >= 91 write

        EXCLUDE 1:10 91:100

    provided that the relevant vector length is 100.

    Ranges are handled one by one, and units for each range in the natural
    order. Thus, if (in the above situation, with 100 as the length of all
    existing vectors) you write

        EXCLUDE 1:10 91:101

    an error would occur for unit 101. Nevertheless, the desired
    restrictions would actually be imposed. Whereas

        EXCLUDE 91:101 1:10

    would remove only units 91 to 100, but not 1 to 10 since this comes after
    the error interrupt.
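
    The order dependence described above can be mimicked by a small
    Python model. It is a sketch of the behaviour as described here,
    not ISUW's implementation: restrictions are represented by a
    hypothetical list of booleans (True = present), and the error is
    simply returned as a text.

```python
def exclude(present, ranges):
    """Apply integer ranges one by one, unit by unit.

    `present[i]` describes unit i+1. Returns an error text, or None.
    Units handled before an error stay excluded.
    """
    n = len(present)
    for r in ranges:
        first, _, last = r.partition(":")
        lo = int(first)
        hi = int(last) if last else lo
        for unit in range(lo, hi + 1):
            if not 1 <= unit <= n:
                return f"unit {unit} does not exist"   # interrupt here
            present[unit - 1] = False
    return None


a = [True] * 100
print(exclude(a, ["1:10", "91:101"]), a.count(True))

b = [True] * 100
print(exclude(b, ["91:101", "1:10"]), b.count(True))
```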

    A range can also be specified as the name of a variate, or an expression
    that would be valid as a right hand side of a COMPUTE command. In this
    case, the units excluded are those for which the variate value is
    defined, non-missing and positive. For example, to exclude all units for
    which the variate X takes a value which is not in the interval [0,100],
    write

        EXCLUDE (X<0)+(X>100)

    or just (specifying two ranges)

        EXCLUDE  X<0  X>100

    Notice that missing values of X in this command, or more generally units
    for which the variate or expression is missing or results in a missing
    value, are not excluded. For example,

        EXCLUDE X>ln(0)

    has no effect, and no warning is given.
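
    The rule "excluded if defined, non-missing and positive" can be
    illustrated by the following Python sketch. It uses NaN as a
    stand-in for ISUW's missing value and assumes, in line with the
    X>ln(0) example above, that a comparison involving a missing value
    is itself missing. The helper names are hypothetical.

```python
import math

MISSING = float("nan")   # stand-in for ISUW's missing value


def greater(x, limit):
    """1.0 or 0.0 like a logical expression, MISSING if x is missing."""
    return MISSING if math.isnan(x) else float(x > limit)


def less(x, limit):
    return MISSING if math.isnan(x) else float(x < limit)


def exclude_where(present, values):
    """Exclude the units whose value is non-missing and positive."""
    for i, v in enumerate(values):
        if not math.isnan(v) and v > 0:
            present[i] = False


X = [-5.0, 50.0, 150.0, MISSING]
present = [True] * len(X)
# Corresponds to EXCLUDE (X<0)+(X>100); NaN + anything stays NaN.
exclude_where(present, [less(x, 0) + greater(x, 100) for x in X])
print(present)
```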

    WARNING. The command

        EXCLUDE 100

    excludes unit 100, whereas

        EXCLUDE 100.2

    excludes unit 1 (!). Since 100.2 is not interpretable as a range, it is
    assumed to be the right hand side of a compute statement, resulting in a
    variate of length 1 with the value 100.2. Similarly,

        EXCLUDE -3

    results in an error message, whereas

        EXCLUDE -3.2

    has no effect.

    Units that are already excluded are not touched by EXCLUDE. Thus, the
    two commands

        EXCLUDE SEX=0
        EXCLUDE CODE=999

    will do exactly the same as the single statement

        EXCLUDE SEX=0 CODE=999

--------------------------------------------------------------------------------
    EXCLUDELEVEL

    Excludes all units on specified levels of a factor.
................................................................................

    Syntax: EXCLUDELEVEL factor level1 [level2 [...]]

    EXAMPLE. To remove all units on level 0 or 2 of the factor SEX, write

        EXCLUDELEVEL SEX 0 2

    As for EXCLUDE (see above), levels are handled one by one, and the
    command is interrupted with an error message if an error occurs. Thus,
    if SEX has 2 levels (which is the usual state of affairs), the command

        EXCLUDELEVEL SEX 0 5 2

    would exclude level 0, but not level 2.

--------------------------------------------------------------------------------
    FOCUSONLEVEL

    Excludes all units that are not on a specified level of a factor.
................................................................................

    Syntax: FOCUSONLEVEL factor level

    Equivalent to

        EXCLUDELEVEL ...

    (see above) where ... stands for the factor followed by the list of
    levels different from the level specified in the FOCUSONLEVEL command.

    EXAMPLE. To focus on the females in group 1, write something like

        FOCUSONLEVEL SEX 2
        FOCUSONLEVEL GROUP 1

    Equivalently, you could use

        EXCLUDE SEX<>2 GROUP<>1
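
    The equivalence between FOCUSONLEVEL and EXCLUDELEVEL can be
    spelled out in a few lines of Python. This is a sketch under the
    assumption that a factor on k levels can take the levels 0, 1, ...,
    k (level 0 being the "unknown" level, as in the READ examples);
    the function names and the data are hypothetical.

```python
def exclude_level(present, factor, levels):
    """Exclude every unit whose factor level is in `levels`."""
    for i, level in enumerate(factor):
        if level in levels:
            present[i] = False


def focus_on_level(present, factor, level, n_levels):
    """Exclude every unit that is not on `level`."""
    others = [k for k in range(0, n_levels + 1) if k != level]
    exclude_level(present, factor, others)


SEX = [1, 2, 2, 0, 2]
GROUP = [1, 1, 2, 1, 1]
present = [True] * 5
focus_on_level(present, SEX, 2, n_levels=2)
focus_on_level(present, GROUP, 1, n_levels=6)
print(present)
```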

--------------------------------------------------------------------------------
    EXCLUDEMISSING

    Excludes units for which values of given variates are missing.
................................................................................

    Syntax: EXCLUDEMISSING name1 [name2 [...]]

    This command is typically used before a model fit command.

    EXAMPLE.

        EXCLUDEMISSING HEIGHT WEIGHT
        EXCLUDELEVEL SEX 0
        FITLINEARNORMAL  WEIGHT=1+SEX+HEIGHT+SEX*HEIGHT

    The rules for error interrupts are similar to what has been said about
    EXCLUDE and EXCLUDELEVEL.

--------------------------------------------------------------------------------
    INCLUDE

    Includes specified units, marking them as "present".
................................................................................

    Syntax: INCLUDE range1 [range2 [...]]

    Opposite to EXCLUDE. Units specified are included, units not specified
    are untouched. Ranges (and the rules for error interrupts) are explained
    a few screenfuls above under EXCLUDE.

    When a range is expressed as a valid right hand side of a COMPUTE
    command, the computation does, of course, take place also for
    non-present units - otherwise nothing would happen.

--------------------------------------------------------------------------------
    INCLUDEALL

    Includes all units, i.e. removes all restrictions.
................................................................................

    No parameters.

    Reestablishes the initial state of affairs, where no restrictions are
    present. Use this command whenever restrictions are not required any
    more. From an empty command window, just press A.

--------------------------------------------------------------------------------
    OPENINFILE
    or just
    OPEN

    Opens a text file for input.
................................................................................

    Syntax: OPEN [filename]

    ASCII files for input are handled by the commands READ, SKIPITEM and
    SKIPLINE, see below. If another file is already open for input it is
    closed. To close an input file without opening a new one, use the command
    without parameters. This is often necessary if you want to EDIT an
    input file to correct errors detected by a READ command, because you
    will not be allowed to make changes to an input file while it is open.

--------------------------------------------------------------------------------
    READ

    Input of data from a text file.
................................................................................

    Syntax: READ [[separators]] name1 [name2 [...]]

    Data are read in parallel from the file opened by OPEN (see above).
    Items on the file must be separated by blanks, newline symbols (or
    other characters in the range 0-32), commas or semicolons. Other
    separator characters can be specified, see approximately 4 screenfuls
    below. name1, name2 etc. must be names of existing vectors of equal
    lengths. The symbol '*' can be used for "skip next item", and '/'
    means "skip to start of next line".

    EXAMPLE.  Suppose that A:\PROJECT.DAT contains 100 lines, beginning with

        001  12.32   1.19  Male    009   1
        002  11.15   1.23  Female  009   1
        003  11.91   1.18  Female  004   1
        ...

    To read columns 2 and 3 as variates named AGE and INC and column 4 as a
    factor SEX on 2 levels with Male and Female represented by levels 1 and
    2, write

        OPEN A:\PROJECT.DAT
        VAR AGE INC 100
        FAC SEX 100 2
        READ * AGE INC SEX=,Male,Female /

    Notice the use of a list of level names. This is required when at least
    one factor level is coded as something else than its integer numerical
    level. Level names are case sensitive, i.e. 'male' instead of 'Male' would
    not work. In the example, the comma right after the equality sign means
    that level 0 is not given any name because it does not occur. If unknown
    sex occurred and was coded as *, we could write

        READ * AGE INC SEX=*,Male,Female /

    instead, to assign level 0 of SEX to the unknowns.

    The symbol '/' meaning "skip remainder of line", is only required if
    there is actually something to skip. A single slash '/' right at the
    point where a line is read has no effect; whereas two slashes with a
    blank between will imply that every second line is skipped.

    For variates, standard format of numbers is assumed (exponential
    notation is allowed, like -1.2e3 instead of -1200). Decimal points
    must be periods, not commas. An asterisk '*' or a period '.' will be
    interpreted as a missing value.
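
    How a single item is interpreted can be sketched in Python as
    follows. This is a simplified model of the rules stated above, not
    the ISUW reader itself: None stands for a missing value, the
    function names and error texts are hypothetical, and Python's
    float() accepts a few spellings that ISUW may not.

```python
import re

# Characters 0-32, commas and semicolons separate items.
SEPARATORS = re.compile(r"[\x00-\x20,;]+")


def split_items(line):
    """Split a line of a data file into items."""
    return [item for item in SEPARATORS.split(line) if item]


def parse_variate(item):
    """Return (value, error). None is the missing value."""
    if item in ("*", "."):
        return None, None
    try:
        return float(item), None
    except ValueError:
        return None, f"invalid real number {item}"


def parse_level(item, names):
    """Return (level, error); names[k] names level k, case sensitive."""
    if item in names:
        return names.index(item), None
    return 0, f"level name {item} not in list"


# Corresponds to READ * AGE INC SEX=,Male,Female /
items = split_items("001  12.32   1.19  Male    009   1")
age, _ = parse_variate(items[1])          # items[0] is skipped by '*'
inc, _ = parse_variate(items[2])
sex, _ = parse_level(items[3], ["", "Male", "Female"])
print(age, inc, sex)                      # the rest is skipped by '/'
```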

    Restrictions are obeyed by the READ command, in the sense that only the
    units present are read. This is useful if you have to piece together
    variates or factors of segments from different files. But in the
    standard situation, it means that you must remember to remove all
    restrictions by INCLUDEALL before you READ.

    An error (an invalid real number, a named level not in the list of level
    names or a numerical level out of range) results in an error message
    written to the output file, and the corresponding value/level is set to
    missing value/level zero. But the reading is not interrupted.

    If you want to correct errors that have to do with the data file, you
    must edit that file. Otherwise, if the error has to do with the READ
    command, you can correct it immediately and repeat it. But before this,
    the file must be OPENed again, otherwise reading continues from the
    position where it stopped. The same happens if reading finishes before
    the end of a file. In this case, a new READ command will continue from
    where the last ended.

    EXAMPLE. Suppose that a data file EXAMPLE.DAT contains the values of
    two variates of lengths 10 and 4 in the following obscure layout:

        The first six values of X are
        1.2, 1.3, 2.l, 4.3,

        5.0, 1.2;
        Here are the four values of Y:
        34.1  32.2  45.0  23.6;
        Finally, the last four values of X are
        5.4 1.7 1.9 3.3;
        And here comes some junk: 1234567890

    You could read X and Y as follows:

        VAR X 10
        VAR Y 4
        OPEN EXAMPLE.DAT
        EXCLUDE 7:10
        SKIPITEM 7                   { or SKIPLINE }
        READ X
        INCLUDEALL
        SKIPITEM 7                   { or SKIPLINE }
        READ Y
        EXCLUDE 1:6
        SKIPITEM 8                   { or SKIPLINE }
        READ X
        INCLUDEALL

    Here, you would receive a warning concerning the third value of X,

        READ ERROR: Invalid real number 2.l for variate X at unit 3

    where a lower case L has been typed instead of 1, and the
    corresponding entry of X will contain a missing value. However,

        SHOW EXAMPLE.DAT

    would tell you what went wrong, and you could then correct the error by

        COMPUTE X(3)=2.1

    A more permanent solution would be to edit EXAMPLE.DAT to correct the
    error, and then do the READing once more. However, since EXAMPLE.DAT
    is still open as an input file, you would not be allowed to save the
    changes. To do this you would have to close it first, which can be
    done by an OPEN command without parameters. For example,

        OPEN
        EDIT EXAMPLE.DAT         {make the correction and save}
        OPEN EXAMPLE.DAT
        ...

    would work.

    As this example also illustrates, a data file need not end
    exactly where the reading terminates. In the tail of the file you can
    keep e.g. a description of data.
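
    The item counts used by the SKIPITEM commands in the EXAMPLE.DAT
    example can be checked with a small Python model of the input
    stream. It is a sketch only: the class and its methods are
    hypothetical, None stands for a missing value, and restrictions
    are passed as a list of booleans (True = present).

```python
import re

DATA = """The first six values of X are
1.2, 1.3, 2.l, 4.3,

5.0, 1.2;
Here are the four values of Y:
34.1  32.2  45.0  23.6;
Finally, the last four values of X are
5.4 1.7 1.9 3.3;
And here comes some junk: 1234567890
"""


class ItemStream:
    """An open input file, seen as a sequence of items."""

    def __init__(self, text):
        self.items = [t for t in re.split(r"[\x00-\x20,;]+", text) if t]
        self.pos = 0

    def skipitem(self, n=1):
        self.pos += n

    def read(self, vector, present):
        """Read one item for each present unit; bad numbers become None."""
        for unit, is_present in enumerate(present):
            if is_present:
                item = self.items[self.pos]
                self.pos += 1
                try:
                    vector[unit] = float(item)
                except ValueError:
                    vector[unit] = None


X, Y = [0.0] * 10, [0.0] * 4
f = ItemStream(DATA)
f.skipitem(7)
f.read(X, [True] * 6 + [False] * 4)     # after EXCLUDE 7:10
f.skipitem(7)
f.read(Y, [True] * 4)                   # after INCLUDEALL
f.skipitem(8)
f.read(X, [False] * 6 + [True] * 4)     # after EXCLUDE 1:6
print(X)
print(Y)
```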

    An attempt to read through the end of an input file is, of course, an
    error. The reading is interrupted, an error message is given, and all
    values/levels not read yet are left unchanged (except for the
    value/level that was read when the EOF mark was met; this may or may not
    be set to missing value/level zero, depending on some circumstances
    around the termination of the file).

    If a data file uses other separators than blank, newline, comma and
    semicolon, you can specify this by adding a bracket containing these
    additional separator characters as the first argument of the READ
    command. For example, to read from a file where the characters / , [
    and ] should be interpreted as blanks, use

        READ [/[]] ...

    Notice that such additional separating characters are only in force
    within the READ command where they are specified, e.g. not in a
    following (or preceding) SKIPITEM command.

    Notice also that separating characters must not occur in level names
    for factors.

    Fixed format files (with data in fixed positions, no delimiters) must
    be edited or handled by other tools.

--------------------------------------------------------------------------------
    SKIPITEM

    Skips next item on a data file.
................................................................................

    Syntax: SKIPITEM [integer]

    The next "integer" items on the file opened by OPEN (see above) are
    skipped. If the integer parameter is missing, 1 is assumed. This
    command is useful if a data file contains headings or other comments.

    EXAMPLE. Suppose that A:\PROJECT.DAT contains 100 lines plus a "header
    line", beginning with

        NO   AGE     INC   SEX
        001  12.32   1.19  Male
        002  11.15   1.23  Female
        003  11.91   1.18  Female
        ...

    To read columns 2 and 3 as variates named AGE and INC, and column 4 as
    a factor SEX on 2 levels with Male and Female coded as levels 1 and 2,
    write

        VAR AGE INC 100
        FAC SEX 100 2
        OPEN A:\PROJECT.DAT
        SKIPITEM 4                       { or SKIPLINE }
        READ * AGE INC SEX=,Male,Female

--------------------------------------------------------------------------------
    SKIPLINE

    Skips to beginning of next line on a data file.
................................................................................

    Syntax: SKIPLINE [integer]

    Without the parameter, SKIPLINE reads through the present line, to the
    beginning of the next. If a line has just been read through the next
    will be skipped, otherwise the remainder of the present line is
    skipped. If an integer parameter is given, this operation is simply
    performed "integer" times, i.e. the remainder of the present line and
    the next "integer"-1 whole lines are skipped.

    Whereas SKIPITEM interprets newline symbols as delimiters and thus
    skips as many empty lines as necessary to reach the items to be
    skipped, SKIPLINE counts also empty lines. For example, if a data file
    begins with 4 empty lines followed by a line consisting of two
    variable names, these 5 lines can be skipped either by

        SKIPITEM 2

    or

        SKIPLINE 5
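
    The difference between the two commands can be made concrete with
    a character-level Python sketch. It models the file as a string
    and a position, which is an assumption made for illustration; the
    function names are hypothetical, and the data line after the
    header is invented for the example.

```python
SEPARATORS = {chr(c) for c in range(33)} | {",", ";"}


def skipitem(text, pos, n=1):
    """Skip n items; leading separators (also newlines) are passed over."""
    for _ in range(n):
        while pos < len(text) and text[pos] in SEPARATORS:
            pos += 1
        while pos < len(text) and text[pos] not in SEPARATORS:
            pos += 1
    return pos


def skipline(text, pos, n=1):
    """Move to the start of the next line, n times; empty lines count."""
    for _ in range(n):
        newline = text.find("\n", pos)
        pos = len(text) if newline == -1 else newline + 1
    return pos


def next_item(text, pos):
    """The item a following READ would see first."""
    start = skipitem(text, pos, 0)
    while start < len(text) and text[start] in SEPARATORS:
        start += 1
    return text[start:skipitem(text, start, 1)]


# Four empty lines, a header of two names, then data.
text = "\n\n\n\nAGE INC\n12.32 1.19\n"
print(next_item(text, skipitem(text, 0, 2)))
print(next_item(text, skipline(text, 0, 5)))
```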

--------------------------------------------------------------------------------
    LIST
    or
    L

    Parallel print of data.
................................................................................

    Syntax: LIST vector1 [vector2 [...]]

    EXAMPLE. If AGE is a variate and SEX a factor on 2 levels, the statement

        LIST AGE SEX

    will produce output like

                   AGE  SEX
               24.7100    2
               42.1200    1
               32.4300    1
               ...
               22.1300    2

    The default format for variates is :10:4, which means width 10 with 4
    decimals after the decimal point. For factors it is :4, i.e. width 4 or
    the length of the factor's name if this is more than 4. You can change
    this by addition of a format to the vector name. For example,

        LIST AGE:4:1 SEX:3

    would result in something like

          AGE SEX
         24.7   2
         42.1   1
         32.4   1
         ...
         22.1   2

    In addition to this, you can add a list of level names to a factor, like

        LIST AGE:4:1 SEX=,M,F:3

    which would result in a listing like

          AGE SEX
         24.7   F
         42.1   M
         32.4   M
         ...
         22.1   F

    Notice that the list of levels comes before the format, if both are
    present. Notice also that the equality sign, which indicates that a list
    of level names will follow, is followed immediately by a comma. This is
    because the name for level 0 is here set to an empty string. If "missing
    sex", or rather "sex unknown", does actually occur, one would perhaps
    prefer something like

        LIST AGE:4:1 SEX=Unknown,Male,Female:7

    which might produce a listing like

          AGE     SEX
         24.7  Female
         42.1    Male
         32.4    Male
         ...
         20.2 Unknown
         ...
         22.1  Female

    Notice the format :7, which is necessary here because the longest level
    name is of length 7 > 4. Otherwise, level names would be truncated.

    If LIST is used without parameters, all vectors present are listed with
    default formats.

    Restrictions are obeyed, in the sense that the lines corresponding to
    hidden units are not printed.
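
    EXAMPLE (a sketch only; it assumes the AGE and SEX data shown above,
    and that FOCUSONLEVEL hides all units not on the given level, as in
    the SAVEDATA example below). The commands

        FOCUSONLEVEL SEX 1
        LIST AGE:4:1 SEX=,M,F:3
        INCLUDEALL

    would be expected to list only the units with SEX=1, something like

          AGE SEX
         42.1   M
         32.4   M
         ...

    INCLUDEALL is used here to remove the restriction again.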

--------------------------------------------------------------------------------
    LIST1
    or
    L1

    Condensed print of data (across the page)
................................................................................

    Syntax: LIST1 vector1 [vector2 [...]]

    EXAMPLE. If AGE is a variate of length 4, SEX a factor on 2 levels also
    of length 4, the statement

        LIST1 AGE SEX

    will produce output like

        AGE
            24.7100    42.1200    32.4300    22.1300

        SEX
            2    1    1    2

    Formats can be used, just as for LIST (see above), and the standard
    formats are the same. Level names for factors cannot be used.

    Restrictions are obeyed, in the sense that the values/levels
    corresponding to hidden units are not printed.

--------------------------------------------------------------------------------
    SAVEDATA
    or
    SAVE

    Creates a StatUnit data set.
................................................................................

    Syntax: SAVEDATA dataset [vector1 [vector2 [...]]]

    StatUnit data sets are files written in an internal binary format for
    fast storage and recovery of data.

    If only the data set name is specified, all vectors present are stored
    in the data set. Physically, the data set becomes a file with the name
    specified followed by the extension .SUD (for "StatUnit Data"). For
    example,

        SAVE C:\PROJECTS\A_SCHEME\DATA1

    will create a file DATA1.SUD in the directory C:\PROJECTS\A_SCHEME. It
    is an error if this directory does not exist. If such a file exists
    already, you will be asked to confirm that you want to overwrite it.
    However, this is only in interactive mode; if the SAVEDATA command
    occurs in a program, the file is overwritten without warning.

    If a list of vector names is added, only these vectors are stored. The
    names must be names of existing vectors in the present session,
    otherwise the file is not created.
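
    For example (a sketch, assuming that vectors AGE and SEX exist in the
    present session),

        SAVE C:\PROJECTS\A_SCHEME\DATA1 AGE SEX

    would store only AGE and SEX in DATA1.SUD, leaving out any other
    vectors present.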

    Restrictions are taken into account in the sense that only units present
    are stored. This means that you can use SAVEDATA to create "physically
    restricted" sub data sets, i.e. data sets where the excluded units are
    not only marked as non-present, but are actually not there at all.

    EXAMPLE. Suppose we have variates AGE and HEIGHT and a factor SEX on
    two levels, all of the same length. To create a data set MALES that
    contains only the part of data with SEX=1, and import this to our
    session, we could do something like the following (assuming no
    restrictions present from the beginning).

        FOCUSONLEVEL SEX 1
        SAVE MALES AGE HEIGHT
        DELETE
        GET MALES

    Notice that the DELETE command is without parameters here. This form
    of the DELETE command removes all restrictions also, and this is
    important because the restrictions imposed on the "long data set" will
    almost certainly be meaningless for the "short data set". If DELETE
    cannot be used in this way because other vectors are to be kept,
    INCLUDEALL must be used.

    The DOS version's GETDATA command can import data sets created by ISUW,
    provided that ISU's length constraint (MaxLength = 16379) is satisfied.

--------------------------------------------------------------------------------
    GETDATA

    Imports data from StatUnit data set.
................................................................................

    Syntax: GETDATA [dataset [vector1 [vector2 [...]]]]

    If only the data set name is specified, all vectors in the data set are
    read. The name, say A:\PROJECT\DATA1, is the name of the corresponding
    file A:\PROJECT\DATA1.SUD, created by SAVEDATA (written without the
    extension .SUD).

    If the file name "dataset" is omitted or replaced with an asterisk or a
    directory name, a file selection menu appears.

    If a list of vector names is added, only these vectors are read. These
    names should, of course, be names of vectors in the data set. However,
    if other names of non-existing vectors are included by mistake, the
    remaining vectors will still be imported.
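
    For example (a sketch, assuming the data set MALES created in the
    SAVEDATA example above, and that no vector named AGE exists in the
    present session),

        GETDATA MALES AGE

    would import AGE only, leaving HEIGHT in the data set unread.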

    Names of vectors imported must not coincide with names of existing
    vectors. SHOW dataset.SUD will tell you if this is the case. Use
    RENAME as necessary. The reading is stopped if an error of this type
    occurs. This means that part of the command may be executed. However,
    the interrupt point depends not on the order of vectors in the list,
    if present, but rather on the order in which vectors were stored
    originally by SAVEDATA. Try a RAMSTATUS if something goes wrong.

    Restrictions are not taken into account and not changed by this command.

    Data sets created by the DOS version ISU can be imported by GETDATA,
    provided that the special characters of the Danish-Norwegian alphabet do
    not occur in vector names.

--------------------------------------------------------------------------------
    SUMMARY

    Summary statistics for variates and factors.
................................................................................

    Syntax: SUMMARY [vector1 [vector2 [...]]]

    The parameters must be names of factors or variates. Information about
    the vectors is written. For a factor, the information includes length,
    number of levels, number of units present and the number of units with
    level 0. For a variate, the information includes length, units present,
    the number of missing values, MAX, MIN, MEAN and standard deviation (for
    present and non-missing values).

    If the command is used without parameters, summaries of all vectors
    present are given.

--------------------------------------------------------------------------------
    ONEWAYTABLE

    One-way tables of counts.
................................................................................

    Syntax: ONEWAYTABLE vector1 [vector2 [...]]

    The parameters may be names of variates or factors. If a parameter is
    the name of a factor, a one-way table of counts is produced, which for
    each level gives the number of units present. You can extend the name of
    the factor by a list of names separated by commas, as for the READ
    command. For example,

        ONEWAYTABLE SEX=Unknown,Male,Female

    could produce something like

        Factor SEX, 100 units present.
                       Unknown       1
                          Male      43
                        Female      56

    If the parameter is the name of a variate, a table of counts is produced
    with cutpoints chosen by ISUW. But you can extend the name to specify
    lower limit, number of intervals and upper limit. For example

        ONEWAYTABLE X=2,10,7

    would produce a table of counts of X-values in the ten intervals between
    cutpoints 2.0, 2.5, 3.0, ... , 6.5, 7.0.
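
    The interval width is (upper limit - lower limit)/(number of
    intervals), here (7-2)/10 = 0.5. As a further sketch (AGE is just a
    hypothetical variate name),

        ONEWAYTABLE AGE=20,6,50

    would, by the same rule, count AGE-values in the six intervals of
    width 5 between cutpoints 20, 25, 30, 35, 40, 45, 50.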

    ONEWAYTABLE obeys restrictions in the sense that non-present units are
    ignored. If the parameter is a variate, missing values are treated as
    non-present.

    The commands TWOWAYTABLE and THREEWAYTABLE (see below) have additional
    options which enable the formation of tables of variate sums for a
    given variate instead of tables of counts. This is not implemented for
    ONEWAYTABLE, because it is usually just as easy to use TABULATE and
    LIST (see 3 screenfuls below). For example, to produce a one-way table
    of sums of a given variate Y over the groups determined by a factor F
    of the same length, use

        TABULATE Y1=Y F F1
        LIST F1 Y1

--------------------------------------------------------------------------------
    TWOWAYTABLE

    Two-way tables of counts or variate sums.
................................................................................

    Syntax: TWOWAYTABLE [variate] factor1 factor2

    The two last parameters must be names of factors of the same length,
    optionally extended by lists of level names (as for ONEWAYTABLE above).

    If the first parameter "variate" is not given (or written as the pseudo
    variate name 1) the command writes a table of counts of units present in
    the two-way classification (factor 1 as rows, factor 2 as columns).

    If the parameter "variate" is present, it must be the name of a
    variate of the same length, and a table of sums of its values over
    level combinations of the two factors is produced. The variate name
    can be extended by a format, as in a LIST command. However, this is
    mainly to let you decide the accuracy displayed when sums of variate
    values are tabulated. For tables of counts, the figures are displayed
    as integers, and if the width given by the format is too small, it
    will be increased as necessary.

    The default action of TWOWAYTABLE is to produce tables with row sums
    and column sums. To avoid one of these (or both), add a minus sign as
    the first character to the name of the factor(s) for which the
    additional "total" level should not be displayed.
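
    For example (a sketch using the factors AGEGR and ATTITUDE of the
    example below),

        TWOWAYTABLE -agegr -attitude

    would be expected to write the table of counts without row sums and
    without column sums.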

    EXAMPLE. Suppose we have factors AGEGR and ATTITUDE, classifying some
    survey sample data according to age and answer to an attitude related
    question. The table of counts in this cross classification is produced
    by

        TWOWAYTABLE agegr attitude

    A table showing the distribution of ATTITUDEs within AGEGRoups in
    percentages with one digit after the decimal point can be produced by

        TABULATE rowsum agegr    {see two screenfuls below}
        COMPUTE pct=100/rowsum(agegr)
        TWOWAYTABLE pct::1 -agegr attitude

    Notice the minus sign before AGEGR in the last command. The row sums
    in this table are 100, and they are displayed to make the
    interpretation of the percentages clear. But the column sums (and the
    total sum) are irrelevant and therefore suppressed.

    Restrictions are taken into account. Units with one or both factor
    levels equal to zero, or with a missing value of the variate (if
    specified), are handled as non-present. Thus, if you want tables where
    factor level zero is taken into account, you must recode the factors
    first.

    WARNING. Most output producing commands in ISUW break up lines in such
    a way that the width of the output file does not exceed 80 characters.
    TWOWAYTABLE (and THREEWAYTABLE below) are exceptions to this. If
    the second factor has many levels, a very wide table is produced, and
    this may result in lines of length > 80 (up to 1024, in fact). This
    gives you the option of producing such a table and printing it out
    later in a readable format after some editing, such as changing to a
    smaller font.

--------------------------------------------------------------------------------
    THREEWAYTABLE

    Three-way tables of counts or variate sums.
................................................................................

    Syntax: THREEWAYTABLE [variate] factor1 factor2 factor3

    Exactly as TWOWAYTABLE, except that three factors are specified and a
    three-way table is written. For each level of factor1, a factor2 by
    factor3 table is produced, and unless factor1 is preceded by a minus
    sign, an additional factor2 by factor3 table of totals (summed over
    factor1) is written.

    Notice that level 0 of factor1 is excluded also in the "total" two-way
    table by factor2 and factor3, which comes last in the listing. Hence,
    this final table will not always coincide with the table one would get
    by

        TWOWAYTABLE [variate] factor2 factor3

--------------------------------------------------------------------------------
    TABULATE

    Computation and storage of counts or variate sums in a k-way table.
................................................................................

    Syntax:  TABULATE newvar[=oldvar] oldfactors [newfactors]

    EXAMPLE. If ROW and COL are existing factors of the same (arbitrary)
    length on 3 and 4 levels respectively, then

        TABULATE COUNT ROW*COL ROW1*COL1

    will do as follows. A variate COUNT and two factors ROW1 and COL1 of
    length 12 (=3*4) will be created, ROW1 on 3 and COL1 on 4 levels. If
    some of these exist already, they must be of correct types and
    dimensions. Factor levels will be generated such that each level
    combination occurs exactly once, COL1 varying fastest, and the
    corresponding counts of units in the original setting, corrected for
    restrictions (units with a level 0 counting as excluded) will be stored
    in COUNT.
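
    As an illustration (a sketch of the expected layout only; the counts
    depend on the data and are therefore not shown), the command

        LIST ROW1 COL1

    given after the TABULATE command above should display the 12 level
    combinations in the order

        ROW1 COL1
           1    1
           1    2
           1    3
           1    4
           2    1
           ...
           3    4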

    Generally, the two string parameters oldfactors and newfactors must
    contain the same number (not necessarily two) of factor names,
    separated by asterisks * or "pseudoblanks" | . The factors in
    oldfactors must be previously declared and of equal lengths, the names
    in newfactors and the new variate name(s) occurring in the first
    parameter must not be names of existing vectors, unless they just
    happen to be of correct types, lengths and numbers of levels (e.g.
    created by a similar TABULATE command). The common length of the
    vectors created becomes the product of the numbers of levels for the
    "old factors".

    TABULATE can also be used to form sums of values of a given variate.
    If COUNT above is replaced with Y_SUM=Y, for Y a variate of length
    equal to the lengths of the factors ROW and COL, then the values of
    Y_SUM will become the sums over the corresponding (product) factor
    levels of the values of Y. If Y was a variate filled with 1s, the
    result would be the same as above.

    The first parameter may contain several specifications separated by
    pseudoblanks. For example,

        TABULATE COUNT|Y_SUM=Y ROW*COL ROW1*COL1

    would perform both tasks mentioned above, and is thus equivalent to the
    two commands

        TABULATE COUNT    ROW*COL ROW1*COL1
        TABULATE Y_SUM=Y  ROW*COL ROW1*COL1

    In the last command here, we could actually have written

        TABULATE Y_SUM=Y  ROW*COL

    omitting the last argument ROW1*COL1. This is legal, and in general it
    implies that only the variate is formed. In the present case it makes
    no difference since the factors ROW1 and COL1 are generated by the
    first command.

    Restrictions are taken into account in the sense that non-present
    units are not counted, or the corresponding variate values are treated
    as zeroes in the summation. Missing values of a summand are treated as
    zeroes.

    WARNING. Notice that if restrictions are present, it will usually be
    unavoidable to continue with INCLUDEALL. The restrictions on the
    "long" vectors are not likely to be relevant for the resulting "short"
    vectors.

    EXAMPLE. To produce a list of average INCOMEs of persons in the 30
    groups of a 2 x 5 x 3 classification by factors SEX (2 levels), SITE (5
    levels) and SOC (3 levels), do something like this:

        EXCLUDEMISSING INCOME
        TABULATE COUNT|INCOME0=INCOME SEX*SITE*SOC SEX0*SITE0*SOC0
        INCLUDEALL
        INCOME0=INCOME0/COUNT
        LIST SEX0 SITE0 SOC0 INCOME0::0

    Notice the EXCLUDEMISSING and INCLUDEALL commands, which are required if
    INCOME has missing values. Without this, the COUNTs would include units
    for which INCOME is missing, and this would result in incorrect
    averages (since missing INCOMEs are treated as zeroes).

    EXAMPLE. If Y is a variate, R and C two (row and column) factors that
    arrange the values of Y in a balanced two-way table, fitted values in
    the additive two-way model (which would usually be computed by
    SAVEFITTED FIT after FITLIN Y=1+R+C) can be computed by

        TABULATE rowsums=y|rowcoun R
        TABULATE colsums=y|colcoun C
        rowmeans=rowsums/rowcoun
        colmeans=colsums/colcoun
        fit=rowmeans(r)+colmeans(c)-mean(y)

    (to understand the last line, see the section VECTORS AS FUNCTIONS OF
    UNIT INDEX in the description of the COMPUTE command).

--------------------------------------------------------------------------------
    PLOT

    Scatter plots on screen and paper.
................................................................................

    Syntax for 2-dimensional version:

        PLOT xvariates yvariates [colors [symbols]]

    and for the 3-d version:

        PLOT xvariates yvariates zvariates [colors]


    We begin with a description of the 2-dimensional version. A description
    of the modifications required for the 3-d version follows approximately
    11 screenfuls below.

    In the simplest case, xvariates and yvariates are just names of single
    variates and the two other parameters are not written, like

        PLOT X Y

    which will produce a scatter plot of the points ( X(i) , Y(i) ).
    Restrictions are obeyed in the sense that non-present points are not
    plotted, and also points with one or both coordinates missing are
    skipped. Endpoints of axis intervals are chosen in such a way that the
    coordinate frame becomes the smallest rectangle containing all the
    points to be plotted.

    The variate names can be extended by axis specifications of the form
    =LowerLimit,NumberOfLabels,UpperLimit. For example,

        PLOT  X=-1.0,10,9  Y

    implies that the horizontal axis will go from -1 to 9, labels will be
    displayed at integer multiples of 1=(9-(-1))/10, and one decimal after
    the decimal point will be written (since the lower bound for X is given
    with one decimal). A minus sign before the number of intervals (e.g. -10
    instead of 10 in the example above) will produce lattice lines at the
    label points. Any field can be left empty, and the first field may
    contain a "pseudo number" determining only the number of digits. For
    example

        PLOT  X=-1.0,10,9.0  Y=d.dd,10,

    will imply that the vertical axis is equipped with 10(+1) labels with 2
    digits after the decimal point, but the defaults MIN(Y) and MAX(Y) will
    be used as the limits, since these are not given.

    Points that are not within the specified limits (in one or both
    directions) are not plotted.

    A heading can be given in a separate FRAMETEXT command before the plot
    command. Similarly, XTEXT and YTEXT can be used to specify texts to be
    written at the two axes, otherwise the variate names are used.

    EXAMPLE.

        VAR X Y 100
        X=RANDOM
        Y=RANDOM
        FRAMETEXT 100 random points
        PLOT  X=0.00,10,1.00  Y=0.00,10,1.00 =red =*

    These five commands would produce something like this (except that the
    points will be red):

                                  100 random points
   1.00 |-----------*-------------------------*-------------------------------|
 Y      |                            *     *  *                               |
   0.90 | *           *          **                *       *                  |
        *                            *                                 *      |
   0.80 |             *       **      *                     *              *  |
        |               *              *    *       *   *               *     |
   0.70 |*                   *                   *              *           * |
        |  *                       *     *                                    |
   0.60 |        *     *     *                  *      **             *    *  |
        |                               *        *      *                     |
   0.50 | *      *      *                         *              *   *        |
        |*                        **   *       *       *                      |
   0.40 |                  *           *  *   *                               |
        |       *      *                             *        *  * *      *   |
   0.30 |     **          *        *          *  *                            |
        |     *                 *       *                                *    |
   0.20 |                  *                   *        *      *              |
        |              *          *   *        *        *                     |
   0.10 |     *    *      *                                   *            *  |
        |             *    *        *             *       *                   |
   0.00 |*-------*------------------------------------------------------------|
       0.00   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90   1.0
                                           X

    In the above example, the last two parameters imply that the points will
    be plotted as red stars. More generally, the third and fourth parameters
    specify colors and symbols according to the following rules.

    The third parameter colors, if non-empty, can be specified as an
    equality sign followed by a color number or a color name. Color
    numbers and their names can be found in the table below.

    This form is only needed to plot all points in a color different
    from the default 0 (=black). A more relevant application of the color
    specification is to choose color according to the level of a factor.
    The color parameter can be specified as the name of a factor of length
    equal to the common length of the two variates, followed by a comma
    separated list of color codes. For example, if WEIGHT and HEIGHT are
    variates, SEX a factor on two levels, all of the same length, then

        PLOT HEIGHT WEIGHT SEX=,9,12

    will produce a scatter plot with light blue points for SEX=1 and light
    red points for SEX=2. You can also let ISUW assign the colors in
    standard order, by

        PLOT HEIGHT WEIGHT SEX

    which is equivalent to

        PLOT HEIGHT WEIGHT SEX=15,1,2

    In general, if only the factor is specified, but not the colors, the list
    '15,1,2,3,4,5,6,7,8,9,10,11,12,13,14,0,0,0,0...' is implied.

    By changing 15 to something else you could select a color different from
    15 (white, no symbol plotted) for the factor level 0.

    TABLE OF COLORS.

                       No.   Name

                        0    BLACK
                        1    BLUE
                        2    GREEN
                        3    CYAN
                        4    RED
                        5    MAGENTA
                        6    BROWN
                        7    GRAY or LIGHTGRAY
                        8    DARKGRAY
                        9    LIGHTBLUE
                       10    LIGHTGREEN
                       11    LIGHTCYAN
                       12    LIGHTRED
                       13    LIGHTMAGENTA
                       14    YELLOW
                       15    NONE or WHITE

    Names of colors can be used instead of the numbers. Only the first eight
    characters of a color name need to be specified.
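
    For example (a sketch, assuming that color names are accepted in a
    factor's color list in the same way as numbers), the command

        PLOT HEIGHT WEIGHT SEX=,LIGHTBLUE,LIGHTRED

    should be equivalent to PLOT HEIGHT WEIGHT SEX=,9,12 above.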

    To see the 16 colors of ISUW, copy and paste the following program into
    the editor and RUN it:

        fac $colfac 16 15
        $colfac=#-1
        var $x $y 16
        $x=$colfac
        $y=1
        xtext Colors
        ytext |
        frametext Colors in ISUW
        plot $x $y=0 $colfac=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 #

    The fourth and last parameter symbols plays a role similar to that of
    the color parameter, but it determines the plot symbol instead of the
    color. For example (referring to the WEIGHT HEIGHT SEX example above),

        PLOT HEIGHT WEIGHT | SEX=,+,o

    would plot points corresponding to SEX=1 as plusses, and points with
    SEX=2 as small circles. If the list of symbols is omitted,
    '=0,1,2,3,4,5,6,7,8,2,2,2...' is implied.

    Notice the "pseudo blank" | occurring here as parameter 3. It simply
    means that we want the default color (black).

    Symbols can be identified by numbers or names, according to the
    following table:

    Table of plot symbols:

                       No.      Name

                        0       NONE
                        1       X or CROSS
                        2       + or PLUS
                        3       O or CIRCLE
                        4       * or STAR
                        5       DELTA or TRIANGLE
                        6       NAPLA            {upside down triangle}
                        7       SQUARE
                        8       DIAMOND          {45 degrees rotated square}

    To see the 8 symbols and 16 colors, copy and paste the following program
    into the editor and RUN it:

        fac $colfac 16*9 16
        gen $colfac 1
        fac $symfac 16*9 9
        gen $symfac 16
        var $col $sym 16*9
        $col=$colfac-1
        $sym=$symfac-1
        xtext Colors
        ytext Symbols
        frametext Colors and plot symbols in ISUW
        plot $col=-1,17,16 \
        $sym=-1,10,9 \
        $colfac=,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
        $symfac=,1,2,3,4,5,6,7,8

    A special use of the symbol parameter takes the form

       =L

    which has the effect that points on the same level of the factor are
    connected by lines, provided that they come right after each other in
    the ordering by unit number. Thus, the effect of this depends strongly
    on the order of units. A sorting by the factor with the X-variate as the
    secondary criterion is the most common application, drawing factor
    groups as broken lines (Y-variate as a function of X-variate).

    EXAMPLE. Suppose that TIME and TEMP are variates, LOCALITY a factor on 4
    levels, all of the same length. The commands

        SORT LOCALITY TIME TEMP
        PLOT TIME TEMP LOCALITY=,12,13,14,0 LOCALITY=L

    will produce a plot where, for each LOCALITY, TEMP is drawn as a
    (linearly interpolated) function of TIME, and the color (light red,
    light magenta, yellow, black) follows the LOCALITY.

    As for the color parameter, the factor identifier in "symbols" can be
    omitted, meaning that a factor with constant level zero is assumed. For
    example, to connect all points by a broken line use

        PLOT X Y | =L

    Another special use of the symbol parameter, which involves no factor
    name and no equality sign, takes the form

        +v1-v2  (or, equivalently,  -v2+v1 )

    where v1 and v2 are names of variates of the same length as xvariates
    and yvariates. This is used when vertical lines through the points
    should be drawn to indicate e.g. confidence bounds. The typical
    application (for symmetric confidence intervals) is

        PLOT  X  Y  ...  +SD-SD

    where SD is a variate holding standard deviations or double standard
    deviations.
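
    For example (a sketch; YBAR and SD are hypothetical variates of the
    same length as X, holding group means and their standard deviations,
    and it is assumed that SD2 can be created by a simple assignment),

        SD2=2*SD
        PLOT X YBAR | +SD2-SD2

    should draw, through each point, a vertical line from YBAR-2*SD to
    YBAR+2*SD, using the default color.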

    In the typical application, both variates will have non-negative values.
    Hence, it is a natural requirement that the two signs must be different
    when two variates are specified. If only one is specified, a half line
    from each point is drawn, the sign determining which way the line goes.
    Notice that

        PLOT X Y=0 =14 -Y

    will produce a plot with (yellow) points sitting on top of "sticks"
    (only relevant if Y is nonnegative).

    A third option for the last parameter "symbols" is to let it consist
    of the single character #. In this case each point is represented by a
    box standing on the x-axis, of suitable width with the invisible point
    right in the middle of its top. The width of these boxes becomes 0.8
    times the range for the x-variate, divided by the number of units
    present. This is usually only relevant if the values of the x-variate
    are equidistant. Notice that the y-axis bounds are determined as usual
    if nothing else is specified. Usually, a specification of the form
    y=0,... is required if the lower endpoint of the y-axis should be 0.
    This can be used to draw histograms when counts (or percentages) are
    given as the values of a variate (e.g. produced by a TABULATE command).
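
    EXAMPLE (a sketch; AGEGR is a hypothetical factor, and it is assumed
    that a factor can be copied to a new variate by a simple assignment,
    as in the color program above; if not, declare X by VAR first). A
    histogram of the counts on the levels of AGEGR could be produced by

        TABULATE COUNT AGEGR AGEGR1
        INCLUDEALL           {only needed if restrictions are present}
        X=AGEGR1
        PLOT X COUNT=0 | #

    Here COUNT=0 sets the lower endpoint of the y-axis to 0, and the
    pseudo blank | selects the default color.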

    Notice that automatic definition of x-axis bounds as Max and Min in this
    case would imply that the first and last box hang halfway outside the
    frame. For this reason, a small correction is made in this case. But for
    overlayed plots, this will only work if the histogram is the first plot
    in the parallel list (see below).

    If colors are specified, they will be used as fill colors of the boxes. If
    you produce 'stacked' histograms by overlaying such plots (see below) take
    care that the histograms with the lower boxes come after those with the
    higher boxes. Similarly, if you plot a histogram together with a curve -
    e.g. to show the fit of a distribution - plot the curve after the
    histogram, unless you want to hide it partially behind the boxes.


    OVERLAYED PLOTS.

    Overlayed plots are produced by 'merging' of PLOT commands as follows.
    Let the four parameters xvariates, yvariates, colors and symbols each
    contain two, three or more parallel specifications, separated by
    pseudo blanks. For example,

        PLOT  X=0.0,5,10|X  Y|FITTED  GROUP=,14,13,12|=WHITE  =|GROUP=L

    will (roughly) overlay the results of the two commands

        PLOT  X=0.0,5,10    Y         GROUP=,14,13,12         =
        PLOT             X    FITTED                  =WHITE    GROUP=L

    Notice the use of an equality sign alone as indicating an empty
    element of a parallel list. The limits and labelling of axes are taken
    from the specifications in xvariates and yvariates for the first
    element of the parallel lists, later specifications are ignored.
    Notice that such specifications are usually necessary, unless the
    ranges of variation for the first elements happen to cover the ranges
    for later elements. This is why Y comes before FITTED in the above
    example (but things may go wrong, also in this case).


    HARDCOPIES.

    The immediate effect of a PLOT or (see below) HISTOGRAM command is
    that a picture is displayed on the screen. The picture is removed by
    Escape. There are two ways of getting the plots out on paper. The
    simplest is to press Return instead of Escape to remove the picture
    from the screen. This brings up a "Save picture as ..." dialog box, in
    which you can select a file name and save the picture as a *.JPG file.
    A *.BMP (bitmap) file can also be selected, but these files are much
    bigger, and since most image processing programs can import both
    types, there is generally no reason to do so. If, for some reason, you
    need to produce a *.BMP file, you should be aware that this will only
    work if you write the file name without the extension or explicitly
    with extension .BMP. Both file types can be imported to image
    processing programs and most text handling programs running under
    Windows, where they can be modified, merged with text and other
    pictures and printed out.

    To produce graphics of a somewhat higher resolution (but without
    colors), optionally with several plots per page, you can use the
    commands OPENPS, PSFRAME and CLOSEPS for handling of PostScript files.
    For example, to produce a single plot in landscape format on a page,
    use

        OPENPS filename                { opens a file for PostScript output }
        PSFRAME 1 1                    { selects "first picture out of one" }
        PLOT ...
        CLOSEPS                        { closes the file }

    After this, the file filename.PS will contain PostScript code that can
    be sent directly to a PostScript printer. OPENPS can also produce
    encapsulated PostScript files, which can be imported e.g. by Microsoft
    Word. See the command descriptions.

    When PostScript code is generated, color codes are handled as follows.
    When points are plotted, all colors are translated to black, except no.
    15 which is translated to white (no point plotted). For boxes (symbol
    code #), where the color usually controls the fill color, color codes
    are translated to "whiteness" proportionally to their numerical values,
    with 0 meaning black and 15 meaning white.
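
    As an illustration only, the translation rule can be modelled in
    Python (this is not ISUW code). The function names are hypothetical,
    and a strictly linear mapping from color code to gray level is an
    assumption, since the text only says "proportionally". Gray levels
    follow the PostScript setgray convention, 0 = black and 1 = white.

```python
def point_gray(color):
    """Gray level for a plotted point: white for code 15, else black."""
    return 1.0 if color == 15 else 0.0

def box_gray(color):
    """Gray level for a box fill: assumed linear in the color code."""
    if not 0 <= color <= 15:
        raise ValueError("color code must be in 0..15")
    return color / 15

# A few sample codes, from black to white.
for code in (0, 5, 10, 15):
    print(code, point_gray(code), round(box_gray(code), 3))
```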


    THE 3-DIMENSIONAL VERSION OF THE PLOT COMMAND.

    This is activated when the third parameter is (or begins with) a variate
    name. In the simplest case

        PLOT X Y Z

    a 3-dimensional scatter plot is produced. As for the 2-d version, the
    variates can be extended by information about their domain and the
    desired numbers of cutpoints, and the four parameters may be parallel
    lists of any (common) length, to produce overlaid 3-d-plots. The fourth
    parameter "colors" determines the colors of points. The syntax here is
    exactly the same as for the corresponding parameter in the 2-d version.

    The differences from the 2-d version are explained here:

    The plot symbol is fixed (a three-dimensional cross), and points can not
    be connected by lines.

    Negative numbers of cutpoints are interpreted as 0, except in the third
    parameter zvariates, where the sign has an effect quite different
    from the one it has for the 2-d version, see below. The effect of
    specifying numbers of cutpoints is that whenever two of the three
    variates have a positive number of cutpoints, the corresponding lattice
    will be drawn on the corresponding (back/bottom/left) side of the frame
    box. For example,

        PLOT  X=0,10,1   Y=0,10,1   Z

    or equivalently

        PLOT  X=0,10,1   Y=0,10,1   Z=,0

    will produce a plot with a 10 by 10 lattice drawn at the bottom of the
    frame box. If instead

        PLOT  X=0,10,1   Y=0,10,1   Z=,1

    is used, vertical lines will also be drawn on the back and left sides of
    the frame box.

    A minus sign before the number of cutpoints for the Z-variate has the
    effect that vertical "sticks" are drawn from points to the bottom of the
    frame box. Similarly, a plus sign here has the effect that the points
    will "hang in strings from the roof". This option may be specified
    without an actual number of cutpoints, like

        PLOT  X   Y   Z=,+

    For hardcopies, this option is recommended, because it is the only way
    to give the copies a taste of the 3-d perspective (since it does not
    help much to rotate the paper).

    Once the picture is on the screen, the following keys are active (this
    applies also to the 3-d version of HISTOGRAM):

      - The four cursor movement keys rotate the frame box (or rather:
        move the flying observer around the coordinate box).

      - The keys + and - on the numerical keyboard zoom and unzoom.

      - Escape or Return terminates. Use Return if you want to save
        the final picture as a .JPG file.

    Labels and axis titles can not be created by the 3-d version of the PLOT
    command. The initial picture on the screen tells which axis is which,
    but after this you are on your own. Here some lattice lines may help
    you to avoid losing orientation.

    Hardcopies of the final picture via .JPG files or .BMP files can be
    produced exactly as described above, but PostScript code can not. A
    PostScript output file may be open, but 3-d plots will not write any
    code to it.

    3-d can only be produced in interactive mode, not in programs.

--------------------------------------------------------------------------------
    HISTOGRAM

    Histograms on screen and paper.
................................................................................

    Syntax for 2-dimensional version:

            HISTOGRAM [weight*]variate [factor]

    and for the 3-dimensional version (3-d histogram for a 2-dimensional
    empirical distribution)

            HISTOGRAM variate1 variate2

    Without the second parameter "factor" and the specification of
    "weight", the command produces an ordinary histogram, showing the
    empirical distribution of the (present and non-missing) values of the
    variate.

    Notice that HISTOGRAM computes the counts. If the counts are known in
    advance and stored in a variate (for example by a TABULATE command), use
    PLOT instead (with fourth parameter '#', see 6 screenfuls above). Or
    use the "weighted" version of HISTOGRAM explained 1-2 screenfuls below.

    If the second parameter "factor" is included and is the name of a
    factor of the same length, one histogram for each factor level except
    level 0 is produced, for comparison of the distributions of variate
    values on the different factor levels. The factor name can be extended
    by a list of level names as usual. However, level 0 is always ignored.
    In some cases (long level names, many intervals) the level names are
    written in such a way that the first box is more or less destroyed by
    the text string.

    The variate name can be extended by a specification of the form

        =lower,intervals,upper

    to determine the number of intervals and their endpoints. For example,

        HIST AGE=20,16,100 SEX

    will produce a histogram for each SEX, with the AGE interval [20,100]
    divided into 5-year groups. AGEs outside the interval [20,100] are
    ignored. If no extension of this form is given, min(AGE) and max(AGE)
    are taken as the endpoints, and the number of intervals is suitably
    chosen. This specification may be more or less incomplete. For example,

        HIST AGE=d.dd,16 SEX

    would use the default endpoints min(AGE) and max(AGE), but the number
    of intervals would become 16. The layout of the "pseudo left endpoint"
    d.dd implies that x-axis labels are written with two digits after the
    decimal point.
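
    The arithmetic behind the specification =lower,intervals,upper can be
    sketched in Python (an illustration, not ISUW code). The function
    names are hypothetical. How ISUW treats a value that falls exactly on
    an interior endpoint is not stated here, so the rule used below is an
    assumption.

```python
def interval_edges(lower, intervals, upper):
    """Endpoints of `intervals` equally wide intervals on [lower, upper]."""
    width = (upper - lower) / intervals
    return [lower + k * width for k in range(intervals + 1)]

def histogram_counts(values, lower, intervals, upper):
    """Count values per interval; values outside [lower, upper] are ignored."""
    width = (upper - lower) / intervals
    counts = [0] * intervals
    for v in values:
        if lower <= v <= upper:
            # Assumption: a value on an interior endpoint goes to the
            # interval on its right; `upper` itself goes to the last one.
            k = min(int((v - lower) / width), intervals - 1)
            counts[k] += 1
    return counts

# AGE=20,16,100 gives 16 intervals of width 5: 20-25, 25-30, ..., 95-100.
edges = interval_edges(20, 16, 100)
ages = [18, 20, 23.5, 25, 61, 99, 100, 104]
counts = histogram_counts(ages, 20, 16, 100)
```

    With these sample ages, 18 and 104 fall outside [20,100] and do not
    contribute to any box.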

    The first parameter may also be the name of a factor. In this case,
    the factor is interpreted as a variate with integer values
    (0,)1,2,..., and the histogram is drawn in the obvious way, with one
    box for each level of the factor. However, if the factor name is
    extended by a list of level names, these names will be written under
    the frame instead of the integer levels. In this case, a non-empty
    level name for level 0 will imply that a box for level 0 is also
    drawn. For a factor with many levels, the level names must be short.
    If the number of levels exceeds, say, 15, it is usually preferable to
    let the procedure write the integer levels.

    Weights. In the versions of the HISTOGRAM command explained above, the
    heights of the boxes drawn are counts of units, summing up to the
    number of units present and non-missing in the variate or factor given
    as the first argument. Sometimes, in particular when dealing with
    aggregated data obtained by grouping of a variate or factor, it is
    desirable to make a histogram where each unit i counts some value w[i]
    instead of 1. The syntax for this is to "multiply" the first argument
    by this "weight" variate from the left.

    EXAMPLE. To display a distribution of individuals according to the
    levels of a factor AGEGR for each SEX level we can use

        HIST AGEGR SEX

    Suppose, however, that we have at our disposal only the counts in
    the AGEGR*SEX groups - for example as they would be after

        TABULATE COUNT AGEGR*SEX AGEGR0*SEX0

    To produce the same two parallel histograms as before, we could then
    use

        HIST COUNT*AGEGR0 SEX0

    To produce a single histogram, showing the distribution of individuals
    in age groups with the vertical axis scaled such that the box heights
    are percentages (summing up to 100), we could use

        PERCENT=100*COUNT/sum(COUNT)
        HIST PERCENT*AGEGR0
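
    The relation between counted and weighted histograms can be sketched
    in Python (an illustration, not ISUW code; the function name and the
    sample data are made up). Each unit adds 1 to the height of its box,
    or its weight if a weight vector is given.

```python
from collections import Counter

def histogram(levels, weights=None):
    """Box heights per level; each unit counts 1, or its weight if given."""
    if weights is None:
        weights = [1] * len(levels)
    heights = Counter()
    for level, w in zip(levels, weights):
        heights[level] += w
    return dict(heights)

# Individual data: one age group per person.
agegr = [1, 1, 2, 3, 3, 3, 2, 1]
raw = histogram(agegr)

# Aggregated data: one unit per age group, with the count as weight.
agegr0 = [1, 2, 3]
count = [3, 2, 3]
weighted = histogram(agegr0, count)

# Percentages, as in PERCENT=100*COUNT/sum(COUNT).
percent = [100 * c / sum(count) for c in count]
as_percent = histogram(agegr0, percent)
```

    Here raw and weighted hold the same box heights, and the heights in
    as_percent sum to 100.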



    THE 3-DIMENSIONAL VERSION OF THE HISTOGRAM COMMAND.

    If the second parameter is included and is the name of a variate, a
    different StatUnit procedure, making a 3-dimensional picture of the
    2-dimensional histogram, is activated. The two variate names can be
    extended by limits etc. as described above. The picture can be rotated,
    zoomed/unzoomed etc. as described for the 3-d version of the PLOT
    command, see approximately 4 screenfuls above. Weights can not be
    used, and parallel histograms can not be produced.

    HARDCOPIES are produced exactly as for PLOTs. But the 3-d version of
    the HISTOGRAM command can not be used in programs, and cannot produce
    PostScript code.

--------------------------------------------------------------------------------
    FRAMETEXT

    Text above frame for 2-d versions of PLOT and HISTOGRAM.
................................................................................

    Syntax: FRAMETEXT [text]

    The default action of PLOT is to draw scatter plots without any heading.
    For HISTOGRAM, the default is "Histogram for" followed by the variate
    name, or (in case of two parameters) "Histogram for ... by ...", with
    the names from the command filled in.
    These defaults are suppressed as long as a FRAMETEXT command is in
    force, which remains until the command

        FRAMETEXT

    (without parameters) or a DELETE command without parameters is given.

    Multiple blanks in a string are usually removed, but you can use "pseudo
    blanks" to enforce multiple blanks. In particular,

        FRAMETEXT |

    will suppress headings entirely in the following PLOT and HISTOGRAM
    commands.

--------------------------------------------------------------------------------
    XTEXT

    Text below frame for PLOT and HISTOGRAM.
................................................................................

    Syntax: XTEXT [text]

    The default action of PLOT is to write the variate name(s) at the
    x-axis. For HISTOGRAM, the default is to write nothing there in case of
    a single parameter, and the name of the variate if two parameters are
    specified. These defaults are overwritten by an XTEXT command. Other
    rules (for pseudo blanks, cancellation of action etc.) are exactly as
    for FRAMETEXT above.

--------------------------------------------------------------------------------
    YTEXT

    Text at vertical axis for PLOT and HISTOGRAM.
................................................................................

    Syntax: YTEXT [text]

    The default action of PLOT is to write the variate name(s) at the
    y-axis. For HISTOGRAM, the default is to write "Count" there in case of
    a single parameter, and the factor name if two parameters are given. These
    defaults are overwritten by a YTEXT command. Other rules (for pseudo
    blanks, cancellation of action etc.) are exactly as for FRAMETEXT above.

    The command

        YTEXT |

    is often useful for enlargement of the labels on the y-axis.

--------------------------------------------------------------------------------
    OPENPSFILE

    Open file for output of PostScript code
................................................................................

    Syntax: OPENPSFILE [filename]

    If the file name is empty the default name ISUW.PS is used.

    Use this together with PSFRAME and CLOSEPSFILE to send graphics to a
    printer in PostScript format.

    EXAMPLE (two plots on a page).

        OPENPS
        PSFRAME 1 2           {can be omitted, since this is the default}
        PLOT ...  or HIST ...
        PSFRAME 2 2           { "second of two", i.e. lower half of the paper }
        PLOT ...  or HIST ...
        CLOSEPS

    ... to be followed e.g. by COPY ISUW.PS LPT2 from a command window, or
    a similar operation via Windows Explorer, Ghostscript or whatever.

    If "filename" is specified explicitly with extension .EPS,
    encapsulated PostScript code is produced. This is only for a single
    plot per page (the PSFRAME command can not be used), but these files
    can be imported to some text handling programs.

    When a PostScript file is open in interactive mode, the images
    produced are shown on the screen as usual, and you still have the
    option of saving them (in addition) as .JPG or .BMP files. In programs
    this option is not available. The images are formed on the screen as
    usual, but they are removed immediately. This enables you to write
    programs that can run unattended while they produce one or more
    PostScript pages.

--------------------------------------------------------------------------------
    PSFRAME

    Selecting the position on the page for PostScript graphics output.
................................................................................

    Syntax: PSFRAME select total

    The two parameters must be integers or integer expressions, satisfying

        select = 1,2,...,total,
        total  = 1, 2, 4, 8, 9, 16, 18, 25, 32, 36, 49, 50, 64 or 72.

    The idea is that the command selects frame no. "select" out of "total"
    equally sized frames on the paper. The values total = 2, 8, 18, 32,
    50, 72 (double squares) produce pages in "portrait" format. The
    remaining values 1, 4, 9, 16, 25, 36, 49 and 64 (squares) produce
    pages in "landscape" format.
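
    The pattern in the allowed values of "total" can be sketched in
    Python (an illustration, not ISUW code; the function names are
    hypothetical). Squares n*n are laid out as n by n frames, and double
    squares 2*n*n are assumed to be laid out as 2n rows by n columns,
    which agrees with total = 2 (upper and lower half) and total = 8
    (4 rows by 2 columns). Row-wise numbering of the frames is likewise
    an assumption for the values of "total" not illustrated in this
    section.

```python
import math

def psframe_layout(total):
    """Return (format, rows, columns) for a PSFRAME `total` value."""
    n = math.isqrt(total)
    if n * n == total and 1 <= n <= 8:
        return ("landscape", n, n)          # square: n x n frames
    if total % 2 == 0:
        n = math.isqrt(total // 2)
        if n * n == total // 2 and 1 <= n <= 6:
            return ("portrait", 2 * n, n)   # double square: 2n x n frames
    raise ValueError(f"total = {total} is not allowed")

def frame_position(select, total):
    """1-based (row, column) of frame `select`, assuming row-wise numbering."""
    _, rows, cols = psframe_layout(total)
    if not 1 <= select <= total:
        raise ValueError("select must be in 1..total")
    return ((select - 1) // cols + 1, (select - 1) % cols + 1)

# The squares up to 64 and the double squares up to 72.
allowed = sorted({n * n for n in range(1, 9)} |
                 {2 * n * n for n in range(1, 7)})
```

    The list "allowed" reproduces the fourteen values given above, and
    frame_position(3, 8) is row 2, column 1.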

                                                 -------------
    EXAMPLE.    To produce  3 plots in a         |  1  |  2  |
    4 x 2 arrangement, leaving the lower         -------------
    half   of  the  paper  and the lower         |  3  |     |
    right corner of the upper half blank,        -------------
    do something like this:                      |     |     |
                                                 -------------
                                                 |     |     |
        OPENPS                                   -------------
        PSFRAME 1 8
        PLOT ... { 1 }
        PSFRAME 2 8
        PLOT ... { 2 }
        PSFRAME 3 8
        PLOT ... { 3 }
        CLOSEPS

    A special version of the command takes the form

        PSFRAME # total

    Here, the number of frames on the page is given by the last parameter,
    but the specification of the frame by the symbol # implies that the
    first parameter "select" will take the values 1, 2, ..., shifting by 1
    each time a PLOT or HISTOGRAM command is met.

    EXAMPLES. In the example above, we could obtain exactly the same by

        OPENPS
        PSFRAME # 8
        PLOT ... { 1 }
        PLOT ... { 2 }
        PLOT ... { 3 }
        CLOSEPS

    The following program generates a PostScript page with 32 probit
    diagrams for 100 simulated normal observations.

        DEL
        VAR x probit 100
        probit=phiinv(#/101)
        OPENPS
        $i=0
          %%%
          $i=$i+1
          PSFRAME $i 32
          x=normal
          SORT x
          PLOT x probit =15 =L
          GOTO %%% $i<32
        CLOSEPS

    Here, we could also have put the PSFRAME command before the loop if we
    had given it the form PSFRAME # 32.

    If no PSFRAME command is used, ISUW uses the default which corresponds
    to PSFRAME 1 2. The command can not be used when OPENPS has specified
    an encapsulated PostScript file (extension .EPS).

--------------------------------------------------------------------------------
    CLOSEPSFILE

    Close file for PostScript graphics output.
................................................................................

    No parameters.

    Closes the file for output of PostScript code. Use with OPENPSFILE,
    PSFRAME, PLOT and HISTOGRAM.

--------------------------------------------------------------------------------
    SORT

    Parallel sorting of vectors.
................................................................................

    Syntax: SORT vector1 [vector2 [...]] [+]

    The vectors occurring as parameters must be of the same length. The
    effect of this command is that the vectors are sorted in parallel,
    such that the resulting vectors are ordered increasingly, primarily by
    values/levels of vector1, then (within constant value/level of
    vector1) by vector2, etc. etc. A final plus sign has the effect that
    all other vectors of the same length are sorted in parallel with those
    in the list; but the ordering within tie groups (if any) determined by
    the vectors in the list, becomes arbitrary.

    EXAMPLE. Let X be a variate and SEX a factor on two levels, both of
    length 10, with values/levels as LISTed here:

        SEX          X
          1     1.5933
          1     1.8114
          1     1.5953
          2     2.0208
          2     1.6493
          1     1.9338
          1     1.9628
          2     1.2509
          1     1.2309
          2     1.5081

    Then, after the command

        SORT SEX X

    a similar LISTing would look like this:

        SEX          X
          1     1.2309
          1     1.5933
          1     1.5953
          1     1.8114
          1     1.9338
          1     1.9628
          2     1.2509
          2     1.5081
          2     1.6493
          2     2.0208

    In this case, you might want to perform the sorting by SEX only,
    preserving the order of X-values unchanged within SEX-groups. However,
    you can not do this by just writing SORT SEX, because only the vectors
    in the list are sorted; nor can you do it by SORT SEX +, because the
    QuickSort algorithm will create arbitrary permutations of units within
    SEX groups. A solution to this problem goes as follows: Create (if it
    doesn't exist already) a vector which is ordered by unit number, for
    example by

        VAR UNIT 10
        COMPUTE UNIT=#

    Include this as the second vector in the list,

        SORT SEX UNIT X

    or just (if other vectors of length 10 are present and should be sorted
    in parallel)

        SORT SEX UNIT +

    With the original unit number as the secondary criterion, the original
    order within SEX-groups is preserved.
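
    The effect of the extra UNIT key can be sketched in Python (an
    illustration, not ISUW code; the helper parallel_sort is
    hypothetical). The data are those of the example above. Python's own
    sort happens to be stable, but with UNIT as an explicit second key
    the result does not depend on that property.

```python
sex = [1, 1, 1, 2, 2, 1, 1, 2, 1, 2]
x = [1.5933, 1.8114, 1.5953, 2.0208, 1.6493,
     1.9338, 1.9628, 1.2509, 1.2309, 1.5081]
unit = list(range(1, len(sex) + 1))        # like COMPUTE UNIT=#

def parallel_sort(keys, *others):
    """Sort all vectors in parallel, ordered by the key vectors in turn."""
    order = sorted(range(len(keys[0])),
                   key=lambda i: tuple(k[i] for k in keys))
    return [[v[i] for i in order] for v in (*keys, *others)]

# SORT SEX X: X ends up increasing within each SEX group.
sex_a, x_a = parallel_sort([sex, x])

# SORT SEX UNIT X: the original order of X is kept within each group.
sex_b, unit_b, x_b = parallel_sort([sex, unit], x)
```

    After the second call, unit_b is 1 2 3 6 7 9 4 5 8 10, so the
    X-values appear in their original order within each SEX-group.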

    Notice that a SORT command without a final plus sign should usually have
    all vectors of the relevant length in the list of parameters. Otherwise,
    the unit-to-unit correspondence between vectors is lost, and this is
    rarely useful. If you make a sorting without some vectors of the
    relevant length, a warning is given.

    SORT ignores restrictions, and if restrictions are present the
    restriction indicator is NOT sorted in parallel. Thus, the excluded
    units, if any, are not the same as before. For this reason, a warning
    is given if restrictions are present under a SORT command. If you
    forgot to issue an INCLUDEALL command beforehand, you may as well do
    so right after the SORT command.

    If you want to preserve the restrictions, you will have to construct your
    own "restriction indicator". For example by

        FAC PRESENT N 1     { where N stands for the common length of     }
                            { the vectors to be sorted.                   }

        COMPUTE PRESENT=1   { PRESENT becomes the "presence indicator",   }
                            { since the default value 0 remains unchanged }
                            { for the non-present units.                  }

        INCLUDEALL          { this command can also be placed after the   }
                            { SORT command, it doesn't matter.            }

        SORT ...            { where the argument list should include      }
                            { PRESENT or end with a +                     }

        EXCLUDE PRESENT=0

        DELETE PRESENT

    Missing values of variates are treated according to their physical
    representation, which is the numerical value -1E-37. Thus, after a
    SORTing by values of a variate, the missing values will occur after the
    negative values and before the zeroes.
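
    A short Python illustration (not ISUW code) of where the missing
    value ends up in an ordinary numerical sort. The constant name
    MISSING and the sample values are made up.

```python
MISSING = -1e-37   # the physical representation named in the text

values = [0.0, -2.5, MISSING, 1.0, MISSING, -0.1]

# -1E-37 is negative but closer to zero than any ordinary negative value,
# so it lands between the negative values and the zeroes.
ordered = sorted(values)
```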

--------------------------------------------------------------------------------
    COMPUTE

    Computation of variates and factors from other variates and factors.
................................................................................

    Syntax:   COMPUTE vectorname=expression

    or        COMPUTE vectorname(index) = expression

    or        COMPUTE expression[:[width]:[decimals]]

    In interactive mode, the command name COMPUTE can be replaced with a
    blank in position 1 of the command window. In programs you may even omit
    the blank provided that the COMPUTE command has a left hand side. In
    this case the command is identified by the equality sign after the first
    connected string.

    In interactive mode you can generate a COMPUTE command from position 1
    by = (with equality sign) or + (without equality sign). In programs,
    COMPUTE commands without a left hand side can be preceded by = or +
    instead of the command name.

    In the simplest case, the action taken by COMPUTE is a unit-by-unit
    computation of values of the vector on the left hand side. For example,
    if Y is a variate of length 100, the statement

         LOGY=ln(Y)/ln(10)

    will give LOGY values ln(Y(1))/ln(10), ... ,ln(Y(100))/ln(10). If LOGY
    is declared in advance it must be a variate of length 100, otherwise it
    will be declared as such. The expression on the right hand side may
    involve the algebraic operators +, -, *, / and ^ (meaning "raised to the
    power", e.g. (-2)^3=-8), six relational operators (see 2 screenfuls below),
    explicit constants, standard functions (see 5 screenfuls below) and
    parentheses as necessary. All the vectors occurring on the right hand
    side must be existing vectors of the same length, with some exceptions
    to be explained later. We call them "parallel" vectors, to emphasise the
    one-to-one correspondence between their entries and the entries of the
    resulting vector on the left hand side.

    The result of a COMPUTE command without a left hand side is that the
    result is computed and displayed, rather than stored in a vector. For
    example, the result of the command

        COMPUTE X/Y:6:2

    for existing variates X and Y of the same length is roughly the same as
    you would obtain by

        COMPUTE RATIO=X/Y
        LIST1 RATIO:6:2
        DELETE RATIO

    For this reason, most of what follows is only explained for the case
    where a left hand side is present.


    MISSING VALUES.

    Missing values are taken into account by COMPUTE in the sense that the
    result will always become missing for entries where one of the parallel
    vectors on the right hand side has a missing value. Even if you write

        COMPUTE Y=X+0*Z

    a missing value of Z will result in a missing value of Y.

    Algebraically or numerically undefined quantities, such as ln(0) or
    ln(-4), sqrt(12-20), (-7)^1.3, exp(100) etc. etc., are set to the
    missing value, and a warning about this is given.
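
    The propagation rule can be sketched in Python (an illustration, not
    ISUW code). The helper compute is hypothetical, and None stands in
    for the missing value. Which results count as undefined depends on
    ISUW's own number range; the sketch only catches the errors that
    Python itself raises.

```python
import math

def compute(fn, *vectors):
    """Apply fn unit by unit; None stands for the missing value."""
    result = []
    for args in zip(*vectors):
        if any(a is None for a in args):
            result.append(None)            # a missing input gives missing
            continue
        try:
            result.append(fn(*args))
        except (ValueError, ZeroDivisionError, OverflowError):
            result.append(None)            # undefined result gives missing
    return result

x = [1.0, 2.0, 3.0]
z = [5.0, None, 7.0]
y = compute(lambda a, b: a + 0 * b, x, z)        # like COMPUTE Y=X+0*Z
logs = compute(math.log, [1.0, 0.0, -4.0])       # ln(1), ln(0), ln(-4)
```

    The second entry of y is missing although Z is multiplied by 0, and
    ln(0) and ln(-4) both come out as missing.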


    RESTRICTIONS.

    Restrictions are obeyed in the sense that for non-present units the
    vector on the left hand side is left unchanged. An exception from this
    occurs when the result is a variate of length 1, this will always be
    computed.

    EXAMPLE. The statements

        FOCUSONLEVEL F 3
        COMPUTE Y=1/0
        INCLUDEALL

    can be used to give Y missing values for all units on level 3 of the
    factor F. However, a shorter way of doing this is by the single command

        COMPUTE Y=Y/(F<>3)


    RELATIONAL OPERATORS.

    In the command line above, the denominator (F<>3) becomes 1 for units
    on an F-level different from 3, 0 for units on level 3. The following
    six relational operators

        =     equal to
        <>    different from
        <     less than
        <=    less than or equal to
        >     greater than
        >=    greater than or equal to

    can be used on the right hand side of a COMPUTE command, and the
    resulting boolean expressions are given values 1 (for TRUE) or 0 (for
    FALSE). For example,

        COMPUTE INDIC=exp(X)>Y+3

    will return a variate INDIC of zeroes and ones, 1 when exp(X)>Y+3. As
    a more complicated example,

        COMPUTE MINXY=(X<=Y)*X+(Y<X)*Y

    will give MINXY the smaller of the two values X and Y for each unit.
    If the result is to become a factor, it must be declared in advance,
    see 6-7 screenfuls below.
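
    The use of relational expressions as 0/1 switches can be sketched in
    Python (an illustration, not ISUW code; all function names are
    hypothetical, and None stands in for the missing value). The first
    function mirrors an expression of the form (X<=Y)*X+(Y<X)*Y, the
    second the form Y/(F<>3) used earlier.

```python
def indicator(condition):
    """Value of a relational expression: 1 for TRUE, 0 for FALSE."""
    return 1 if condition else 0

def smaller(x, y):
    """(X<=Y)*X + (Y<X)*Y: exactly one indicator is 1 for each unit."""
    return indicator(x <= y) * x + indicator(y < x) * y

def missing_on_level(y, f, level):
    """Y/(F<>level): division by 0 yields the missing value (None here)."""
    d = indicator(f != level)
    return y / d if d else None

xs = [3.0, -1.0, 2.0]
ys = [1.0, 4.0, 2.0]
mins = [smaller(a, b) for a, b in zip(xs, ys)]
```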


    FACTORS ON THE RIGHT HAND SIDE.

    Factor dummies can be written like (F=2), meaning the variate which is 1
    when the factor F takes the level 2, 0 otherwise. More generally,
    factors of the correct length may occur on the right hand side, where
    they are interpreted as parallel variates with their numerical levels as
    values.

    EXAMPLE. If F is a factor on three levels,

        COMPUTE X= 1.1*(F=1) + 2.3*(F=2) + 4.8*(F=3)

    will result in a variate X with values 1.1, 2.3 and 4.8, determined by
    the levels of F.


    UNIT-BY-UNIT FUNCTIONS.

    Vector valued functions, operating unit by unit, are

        EXP()         the exponential function

        LN()          log (base e, use LN()/LN(10) if you want base 10)

        SQR()         x -> x*x

        SQRT()        x -> square root of x

        ABS()         x -> |x|

        SIN()
        COS()         well-known trigonometric functions
        ARCTAN()

        INT()         x -> [x]
                      (integer part, upwards rounding for negative argument)

        PHI()         The c.d.f. of the standard normal distribution.

        PHIINV()      The inverse of PHI().

        NORMAL        simulated standardised normal values.

        POISSON()     random Poisson distributed values with the argument as
                      parameter. If the parameter is 0 or negative, 0 is
                      returned.

        RANDOM        random uniform on [0,1]

    EXAMPLES.

    To generate discretely uniform random values in the range 1, 2, ... , 6,
    say, use constructions like

        FAC DICE 10000 6
        COMPUTE DICE=1+INT(6*RANDOM)

    To fill an existing variate Y with random zeroes and ones, 1 occurring
    with probability 1/3, write

        COMPUTE Y=RANDOM<1/3

    If FITTED is a variate holding supposedly correct means in a
    multiplicative Poisson model (produced e.g. by FITLOGLINEAR ... and
    SAVEFITTED FITTED), then

        COMPUTE SIMDATA=POISSON(FITTED)

    will produce a simulated response variate under the estimated model.

    To make a probit diagram for the observations in a variate X write
    something like

        SORT X +
        COMPUTE N=##(X)                     { ##() is explained below }
        VAR PROBIT N
        COMPUTE PROBIT=PHIINV(#/(N+1))      { # is explained below }
        PLOT X PROBIT
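
    The probit positions PHIINV(#/(N+1)) can be reproduced in Python (an
    illustration, not ISUW code; the function names and sample data are
    made up). NormalDist().inv_cdf from the Python standard library
    plays the role of PHIINV().

```python
from statistics import NormalDist

def probit_positions(n):
    """PHIINV(#/(N+1)) for units # = 1, ..., N."""
    phiinv = NormalDist().inv_cdf          # inverse standard normal c.d.f.
    return [phiinv(i / (n + 1)) for i in range(1, n + 1)]

def probit_pairs(x):
    """Pairs (sorted observation, probit), the points of the diagram."""
    xs = sorted(x)
    return list(zip(xs, probit_positions(len(xs))))

pairs = probit_pairs([2.1, -0.4, 0.7, 1.3, 0.2])
```

    For normally distributed observations the points in such a diagram
    lie close to a straight line.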


    SCALARS ON THE RIGHT HAND SIDE.

    Until now, all vectors occurring on the right hand side have been
    assumed to be parallel vectors, i.e. vectors whose lengths must coincide
    with (and sometimes determine) the length of the resulting vector on the
    left hand side. Here comes the first exception: Variates of length 1 may
    occur on the right hand side, where they are treated exactly as
    explicitly written constants. The length of the resulting vector will
    then be determined by other parallel vectors, or by its own length if it
    exists. For example,

        VAR X 100
        COMPUTE A=2
        COMPUTE X=#^A     { # is explained below, but have a guess ... }

    will create a vector of length 100 with values 1, 4, 9, ... , 10000,
    provided that A has not been created earlier as something else than a
    variate of length 1.


    SCALAR VALUED VECTOR FUNCTIONS.

    The following scalar valued functions are available:

    SUM() returns the sum of the (present and non-missing) values of a
    variate.

    MEAN() returns the average of the (present and non-missing) values of a
    variate. If no values are present and non-missing, 0 is returned.

    MIN() and MAX() return minimum and maximum of the (present and
    non-missing) values of a variate. If no values are present and
    non-missing, 0 is returned.

    VARIANCE() returns the (denominator n-1) sample variance of the
    (present and non-missing) values of a variate. If no values or only a
    single value are present and non-missing, 0 is returned.

    ##() returns the full length of a vector, without correction for
    restrictions or missing values. If the argument is the name of a vector
    that does not exist, 0 is returned. This convention is useful if you
    want to check for existence of a vector in a program.

    #() returns the number of units present of a variate or factor. For
    variates, missing values are counted as non-present. But units with
    level 0 for factors are regarded as present. Thus, if F is a factor
    and no restrictions are present, we have ##(F) = #(F).

    The argument of a scalar-valued function must be the name of a vector,
    not an expression, and this vector is NOT a parallel vector, i.e. it
    may be of any length, and this length has no influence on automatic
    declaration of the vector on the left hand side.

    EXAMPLE. If X is a variate of length 100, and SD is not declared (or of
    length 1), you can write

        COMPUTE SD=sqrt(variance(X))

    to obtain the standard deviation (as a variate of length 1), and then

        COMPUTE X0=(X-mean(X))/SD

    to compute the vector X0 of standardised values. This can also be done
    in a single step by

        COMPUTE X0=(X-mean(X))/sqrt(variance(X))
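
    The conventions for MEAN() and VARIANCE() and the standardisation
    above can be sketched in Python (an illustration, not ISUW code; the
    helper names are hypothetical). None stands in for a missing value,
    and the optional list "present" models restrictions.

```python
import math

def present_values(v, present=None):
    """Values that are present (not excluded) and non-missing (not None)."""
    if present is None:
        present = [True] * len(v)
    return [a for a, p in zip(v, present) if p and a is not None]

def mean(v, present=None):
    vals = present_values(v, present)
    return sum(vals) / len(vals) if vals else 0        # 0 if nothing is left

def variance(v, present=None):
    vals = present_values(v, present)
    if len(vals) < 2:
        return 0                                       # 0 for 0 or 1 values
    m = sum(vals) / len(vals)
    return sum((a - m) ** 2 for a in vals) / (len(vals) - 1)

x = [2.0, 4.0, None, 6.0, 8.0]
sd = math.sqrt(variance(x))
# Unit by unit, a missing X gives a missing X0.
x0 = [None if a is None else (a - mean(x)) / sd for a in x]
```

    The standardised values in x0 have mean 0 and sample variance 1.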


    THE UNIT INDEX # AND THE NUMBER OF UNITS ##.

    The identifier # (not to be confused with the scalar valued vector
    function #() ) has a special meaning as a (non-existing) variate with
    the values 1, 2, 3,... . Writing e.g., for an existing variate UNIT of
    length 100,

        COMPUTE UNIT=#

    the variate UNIT will get values 1, 2, ... , 100.

    The identifier ## (not to be confused with the scalar valued vector
    function ##() ) has another special meaning, as the length of the
    resulting left hand side. Thus, if you write, for an existing vector X
    of length N,

        COMPUTE X=#/##

    you will give X the values 1/N, 2/N, ... , 1.

    EXAMPLE. Suppose we have a vector X of length 100, and want to split it
    up in two vectors, one containing the odd-numbered entries and the other
    containing the even. This can be done by

        VARIATE X1 X2 50
        COMPUTE X1=X(2*#-1)
        COMPUTE X2=X(2*#)

    (cf. VECTORS AS FUNCTIONS OF UNIT INDEX a few screenfuls below)
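
    The roles of # and ## can be sketched in Python (an illustration,
    not ISUW code; the helper names are hypothetical). Unit indices are
    1-based as in ISUW, and None stands in for the missing value
    returned for an index outside the vector.

```python
def from_index(n, fn):
    """Fill a vector of length n; fn receives # (1-based) and ## (= n)."""
    return [fn(i, n) for i in range(1, n + 1)]

def at(v, i):
    """v(i) with a 1-based unit index; None (missing) outside 1..len(v)."""
    return v[i - 1] if 1 <= i <= len(v) else None

x = from_index(100, lambda i, n: i / n)               # COMPUTE X=#/##
x1 = from_index(50, lambda i, n: at(x, 2 * i - 1))    # COMPUTE X1=X(2*#-1)
x2 = from_index(50, lambda i, n: at(x, 2 * i))        # COMPUTE X2=X(2*#)
```

    Note that ## is 100 in the first statement but 50 in the other two,
    since it refers to the length of the left hand side.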


    VECTOR CONSTANTS.

    A list of real numbers, separated by blanks and enclosed in brackets
    [], can be used in COMPUTE commands to represent an unnamed vector
    with given values. For example, to declare and simultaneously give
    values to (short) variates, simply use commands like

        COMPUTE X=[1.1 -1.2 0.2 3.4]

    EXAMPLE. In an earlier example, the statement

        COMPUTE X= 1.1*(F=1) + 2.3*(F=2) + 4.8*(F=3)

    was suggested as a way of giving values 1.1, 2.3 and 4.8 to a variate,
    depending on the level of a factor. An easier solution is

        COMPUTE X=[1.1 2.3 4.8](F)

    (cf. VECTORS AS FUNCTIONS OF UNIT INDEX a few lines below)

    It is also possible to include vector names in the list, representing
    the list of the vector's values (or levels, in case of a factor). For
    example

        COMPUTE A=[1 2 3]
        COMPUTE B=[a 0 a]

    will result in a vector B of length 7 with values 1 2 3 0 1 2 3.
    However, these "bracketed lists" must not (and need not) be nested.
    For example,

        COMPUTE B=[[1 2 3] 0 [1 2 3]]

    would NOT work.


    FACTORS ON THE LEFT HAND SIDE.

    The vector on the left hand side may be a factor. If the result is
    non-integer or out of range, the level 0 will be assigned, and a warning
    will be given.

    EXAMPLE. If F is a factor of length 100 on 5 levels, and you want to
    collapse it to a factor G on three levels, representing the groups {1},
    {2,3} and {4,5} of F-levels, you can write

        FACTOR G 100 3
        COMPUTE G=[1 2 2 3 3](F)

    - where the explanation of the last line follows now.


    VECTORS AS FUNCTIONS OF UNIT INDEX.

    Variates and factors may occur as "functions" on the right hand side. In
    this case the argument must be integer and is interpreted as a unit
    index. The vector is NOT a parallel vector in this case (but its
    argument may very well be so). The following example illustrates this
    point.

    EXAMPLE. The "display" command (i.e. a COMPUTE command without a left
    hand side, here extended by a format)

        COMPUTE [10 20 30 40 50]([1 5 2]):3:0

    will produce the output

          10  50  20

    A more useful example follows here.

    EXAMPLE. If X and X1 are variates of the same length,

        COMPUTE X1=X(#-1)

    will give X1 the "lagged" values of X. The first value X1(1) will become
    missing, because X(0) is undefined (and a warning about this will be
    given). Notice that X1 must be declared first, since X on the right hand
    side is not a parallel vector. If X1 was undeclared, the statement would
    result in a variate of length one with a single missing value.

    WARNING. The statement

        COMPUTE X=X(#-1)

    will not work as - perhaps - expected, since the computations are
    performed unit by unit in the natural order. This statement would
    actually result in a vector of missing values, since we would get

        first for unit 1         X(1) = X(0) = *
        then  for unit 2         X(2) = X(1) = *
        then  for unit 3         X(3) = X(2) = *
        etc. etc.

    A similar warning comes here: Suppose you want to transform a variate X
    by subtraction of its first value from all entries. Then

        COMPUTE X=X-X(1)

    will not work, because X(1) is set to zero before the later entries are
    computed. Instead, you would have to do something like

        COMPUTE X1=X(1)        { X1 undeclared, thus becoming of length 1 }
        COMPUTE X=X-X1
        DEL X1

    In general one has to be very careful when the variate on the left hand
    side occurs as a function on the right hand side. The computations are
    performed unit by unit, and if entries of the variate have been changed
    by earlier steps, this may give unexpected results.
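
    The two pitfalls above can be reproduced with a small Python model of
    the unit-by-unit evaluation. This is only a sketch of the behaviour
    described in the text, not of the ISUW interpreter itself; None plays
    the role of a missing value, and all function names are invented for
    the illustration.

```python
MISSING = None

def at(vec, i):
    """1-based lookup; indices outside 1..len(vec) give a missing value."""
    return vec[i - 1] if 1 <= i <= len(vec) else MISSING

def minus(a, b):
    """Subtraction that propagates missing values."""
    return MISSING if a is MISSING or b is MISSING else a - b

def compute_in_place(x, rhs):
    """Overwrite x unit by unit in natural order; rhs sees earlier updates."""
    for unit in range(1, len(x) + 1):
        x[unit - 1] = rhs(x, unit)
    return x

# COMPUTE X=X(#-1): unit 1 becomes missing, and this is copied onwards.
lagged = compute_in_place([5, 7, 9], lambda x, i: at(x, i - 1))

# COMPUTE X=X-X(1): X(1) becomes 0 first, so later entries are unchanged.
shifted = compute_in_place([5, 7, 9], lambda x, i: minus(at(x, i), at(x, 1)))
```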

    However, as long as you know the rules, the dynamic execution can be
    useful. For example, to produce a vector S holding the cumulated values
    X(1), X(1)+X(2), X(1)+X(2)+X(3), ... of an existing variate X, write
    (provided that no restrictions are present)

        COMPUTE S=X
        EXCLUDE 1
        COMPUTE S=S+S(#-1)
        INCLUDE 1

    The exclusion of unit 1 is necessary here, because otherwise the
    reference to unit 0 would produce a missing value, and this would
    persist all the way through, producing a vector of missing values.
    Notice that it is OK to refer to S(#-1) also for #=2; the exclusion of
    unit 1 does not prevent this. The restrictions refer to the unit index
    for the vector on the left hand side.
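
    As an illustration only (Python, not ISUW code), the cumulation trick
    amounts to the following loop, where skipping unit 1 corresponds to
    EXCLUDE 1. The function name "cumulate" is invented for the example.

```python
def cumulate(x):
    """Cumulated sums, computed the way the three COMPUTE/EXCLUDE steps do it."""
    s = list(x)                                  # COMPUTE S=X
    for unit in range(2, len(s) + 1):            # unit 1 is excluded
        # S(unit-1) has already been updated, so it holds the running sum.
        s[unit - 1] = s[unit - 1] + s[unit - 2]  # COMPUTE S=S+S(#-1)
    return s

running = cumulate([1, 2, 3, 4])
```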

    Another example, clearly demonstrating how and why vectors are not
    parallel when they occur as functions, follows here.

    EXAMPLE. Suppose we have some monthly data over 20 years. Let MONTH be a
    factor of length 240 on 12 levels, holding ... guess what. To create a
    variate DAYS of length 240, holding the number of days in each month,
    simply write

        DAYS=[31 28 31 30 31 30 31 31 30 31 30 31](MONTH)

    Here, MONTH is a parallel vector, the vector [31 28 ... 31] is not.
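
    The role of the two vectors can be spelled out in a Python sketch
    (an illustration of the lookup semantics, not ISUW code). The bracket
    list acts as a table, and the parallel vector supplies one index per
    unit. The function name "index_lookup" is invented, and None stands
    for a missing value.

```python
def index_lookup(table, index_vector, missing=None):
    """Return table(index) for each unit; out-of-range indices give missing."""
    result = []
    for i in index_vector:
        if isinstance(i, int) and 1 <= i <= len(table):
            result.append(table[i - 1])
        else:
            result.append(missing)
    return result

# DAYS=[31 28 31 30 31 30 31 31 30 31 30 31](MONTH)
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
month = [m for _year in range(20) for m in range(1, 13)]   # length 240
days = index_lookup(days_in_month, month)

# Collapsing a factor on 5 levels to 3 levels: COMPUTE G=[1 2 2 3 3](F)
f = [1, 2, 3, 4, 5, 3]
g = index_lookup([1, 2, 2, 3, 3], f)
```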


    SINGLE VECTOR ENTRIES ON THE LEFT HAND SIDE.

    To get the leap years correct in the example above, you could add a few
    statements of the form

        DAYS((1984-1981)*12+2)=29
        DAYS((1988-1981)*12+2)=29
        etc.

    (pretending here that the first month is January 1981).

    Quite generally, the left hand side in a COMPUTE command may be of the
    form

        vectorname(integer expression).

    In this case the right hand side must be interpretable as a vector of
    length 1, and the result is stored in the corresponding entry of the
    vector on the left hand side. For example (provided that A and B are
    undeclared, or variates of length 1)

        VAR X 10
        COMPUTE A=3
        COMPUTE B=23.5
        COMPUTE X(Sqr(A))=B
        DEL A B

    is an extremely complicated way of setting the 9th value of X to 23.5
    (since Sqr(3)=9), which could also be done by the single command

        COMPUTE X(9)=23.5

    In this case, too, the vector on the left can be a factor, provided
    that the expression on the right hand side is integer and in the range
    of valid levels.


    RULES FOR NAME CONFLICTS.

    Names of vectors may be ISUW function names, but in this case the
    corresponding functions are no longer available. For example, if you
    declare a variate named EXP you can no longer use the exponential
    function, because e.g. exp(1.3) will be interpreted as the (missing)
    1.3'rd value of the variate EXP. Similarly, if you declare vectors named
    RANDOM, MEAN, SUM, ... these functions can no longer be used.


    THE CONSTANT PI.

    The constant PI=3.14159.. can be constructed (if 3.14159 is not good
    enough) by

        COMPUTE PI=4*ARCTAN(1)


    REGISTER OVERFLOW AND STACK OVERFLOW.

    Large formulas may result in an error message reporting "register
    overflow" or "stack overflow". The reason for this is that COMPUTE is
    based on a parser procedure (formula interpreter) that calls itself, and
    also that intermediate results are stored in a limited number of
    registers. If this appears to be a problem, you will have to perform the
    computations in two or more steps. For example, computing a sum of more
    than (approximately, depending on other circumstances) 100 terms may
    result in such an error. Splitting it up as a sum of two, like

        COMPUTE X=A1+...+A50
        COMPUTE X=X+A51+...+A100

    will solve this problem (which you will hardly ever meet).

--------------------------------------------------------------------------------
    GENERATELEVELS

    Assigning levels to a factor in a systematic (cyclic) way.
................................................................................

    Syntax: GENERATELEVELS factorname lag

    Assigns cyclically varying levels to a factor. For example, if F is a
    factor on 3 levels,

        GEN F 2

    will assign levels 1 1 2 2 3 3 1 1 2 2 ... to the factor. Hence, the
    second parameter (2, in this case) determines the lag between change
    points.

    Restrictions are NOT taken into account.

    EXAMPLE. A file contains the 3 by 4 table

        1.2  1.4  1.1  1.9
        1.2  1.1  1.3  1.6
        1.1  1.4  1.2  1.3

    We can read these values into a variate of length 12 by

        VAR Y 12
        OPEN filename
        READ Y

    The two factors reflecting the two-way structure can be constructed by

        FAC ROW 12 3
        GEN ROW 4
        FAC COL 12 4
        GEN COL 1
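
    The cyclic pattern can be written as a one-line formula. The Python
    sketch below is only an illustration of the rule stated above; the
    function name "generate_levels" is invented for the example.

```python
def generate_levels(length, levels, lag):
    """Levels 1..levels, each repeated lag times, cycling until length is reached."""
    return [((unit - 1) // lag) % levels + 1 for unit in range(1, length + 1)]

f = generate_levels(10, 3, 2)      # GEN F 2, F on 3 levels
row = generate_levels(12, 3, 4)    # FAC ROW 12 3, GEN ROW 4
col = generate_levels(12, 4, 1)    # FAC COL 12 4, GEN COL 1
```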

--------------------------------------------------------------------------------
    GROUP

    Construction of a factor by interval grouping of a variate.
................................................................................

    Syntax: GROUP variatename factorname [levels [cutpoints]]

    Constructs a factor by interval grouping of an existing variate. The
    number of levels is set by the integer parameter "levels". If the
    factor already exists, it must be of the same length as the
    variate, and "levels" must be the number of levels for that factor, if
    specified (or it can be specified as zero, or marked with an asterisk).
    If the factor does not exist, it is automatically declared. If
    "levels" is a positive integer, it is taken as the number of levels
    for the new factor. If "levels" is specified as 0 or not specified at
    all, the new factor is automatically declared with its number of
    levels set to the rounded value of the square root of the length of
    the variate, but at most 255.

    The final parameter(s) cutpoints, if specified, must contain levels-1
    real numbers separated by blanks. These must be in increasing order,
    and they determine the cutpoints between intervals. This parameter can
    also be the name of a variate, holding the desired cutpoints. The
    length of this variate must be levels-1, and its values must be
    nonmissing and increasing. If this parameter is not specified, the
    cutpoints are placed equidistantly between MIN and MAX of the variate.

    EXAMPLE.

        GROUP AGE AGEGRP 4 20 40 60

    is equivalent to

        FACTOR AGEGRP ##(AGE) 4
        COMPUTE AGEGRP=1+(AGE>20)+(AGE>40)+(AGE>60)

    Ties are handled according to the convention "] , ]", meaning that a
    value falling exactly at a cutpoint is put in the lower of the two
    possible categories.
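
    The grouping rule and the tie convention can be illustrated by the
    following Python sketch (not ISUW code). The function names are
    invented for the example. The level of a value is 1 plus the number
    of cutpoints strictly below it, which is exactly what the COMPUTE
    statement above expresses.

```python
import math
from bisect import bisect_left

def group(values, cutpoints):
    """Level = 1 + number of cutpoints strictly below the value ("] , ]")."""
    return [1 + bisect_left(cutpoints, v) for v in values]

def default_levels(length):
    """Rounded square root of the vector length, but at most 255."""
    return min(255, int(math.sqrt(length) + 0.5))

age = [5, 20, 20.5, 40, 61]
agegrp = group(age, [20, 40, 60])      # GROUP AGE AGEGRP 4 20 40 60
```

    Here the values 20 and 40 fall exactly at cutpoints and therefore end
    up in the lower of the two neighbouring groups.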

    Restrictions are obeyed in the following sense. If the factor exists in
    advance, it is unchanged for the hidden units. If the factor is
    automatically declared, it becomes 0 for those units. Notice that if
    cutpoints are selected automatically, they will be based on MIN and MAX
    of the present observations only.

--------------------------------------------------------------------------------
    TRANSFER

    Transfers subvector to subvector, or compresses vector by removal of
    hidden entries.
................................................................................

    Syntax: TRANSFER  name1 s1 e1  name2 s2 e2
    or      TRANSFER  name1 name2

    For the first form (with six parameters), the two names must be names
    of either factors or variates, and the four integer parameters s1, e1,
    s2 and e2, specifying the start and end of the vector segments, must
    satisfy obvious consistency requirements involving the lengths of the
    two vectors and the two subvectors.

    EXAMPLE. If X and Y are vectors of lengths (at least) 10 and 100, the
    command

        TRANSFER  X 1 10  Y 91 100

    will set Y-values equal to X-values according to the scheme

        Y(91)  = X(1)
        Y(92)  = X(2)
         ...
        Y(100) = X(10)

    Transfer of a subvector with values/levels in reversed order is
    performed when either s1 is greater than e1 or s2 is greater than e2.
    Special care should be taken in the case where name1 = name2. This is
    allowed, but the transfer takes place in the order s1 first ... e1
    last, and this may give unexpected results.

    EXAMPLE. If X is a vector of length 100, the command

        TRANSFER  X 1 99  X 2 100

    will produce a vector with the same value for all units, because the
    same value (originally X(1)) is transferred again and again. Whereas

        TRANSFER  X 99 1  X 100 2
        COMPUTE X(1)=1/0

    will produce a proper "lagged" vector (with the first value set to a
    missing value, by the last statement).

    TRANSFER with six parameters does NOT take restrictions into account.
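
    The order dependence in the two examples above can be checked with a
    small Python model of the six-parameter form. This is a sketch of the
    behaviour described in the text, not the ISUW implementation; the
    function name "transfer" and the error raised on unequal segment
    lengths are assumptions of the example.

```python
def transfer(src, s1, e1, dst, s2, e2):
    """Copy src(s1..e1) to dst(s2..e2), one entry at a time, s1 first."""
    step1 = 1 if e1 >= s1 else -1
    step2 = 1 if e2 >= s2 else -1
    src_units = range(s1, e1 + step1, step1)
    dst_units = range(s2, e2 + step2, step2)
    if len(src_units) != len(dst_units):
        raise ValueError("the two segments must have the same length")
    for i, j in zip(src_units, dst_units):
        dst[j - 1] = src[i - 1]          # 1-based unit indices
    return dst

x = [1, 2, 3, 4, 5]
same = transfer(x, 1, 4, x, 2, 5)        # like TRANSFER X 1 99 X 2 100

y = [1, 2, 3, 4, 5]
lagged = transfer(y, 4, 1, y, 5, 2)      # like TRANSFER X 99 1 X 100 2

reversed_copy = transfer([1, 2, 3], 1, 3, [0, 0, 0], 3, 1)
```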


    For the short form (two parameters), name1 must be the name of an
    existing vector. name2 should usually be specified as a valid vector
    name which is not in use, but if earlier defined it must be of length
    equal to the number of entries present in name1, and its type
    (including number of levels, if it is a factor) must coincide with the
    type of name1. The action taken is to create name2 if necessary and
    transfer the values/levels present in name1 to name2 in the obvious
    order.

    EXAMPLE. Suppose we have variates AGE and HEIGHT and a factor SEX on
    two levels, all of the same length. To create a data set MALES that
    contains only the part of data with SEX=1, and import this to our
    session, we can (as explained earlier in the description of SAVEDATA)
    do as follows (assuming no restrictions present from the beginning).

        FOCUSONLEVEL SEX 1
        SAVE MALES AGE HEIGHT
        DELETE
        GET MALES

    However, "short" vectors AGE_MEN and HEIG_MEN can also be created
    directly by

        FOCUSONLEVEL SEX 1
        TRANSFER AGE AGE_MEN
        TRANSFER HEIGHT HEIG_MEN

    If this is to be followed by some relevant operations on the new
    vectors, we should obviously proceed with

        INCLUDEALL

    since the restrictions imposed on the long vectors are not likely to
    be relevant for the short vectors.

--------------------------------------------------------------------------------
    FITLINEARNORMAL

    Regression and analysis of variance
................................................................................

    Syntax: FITLINEARNORMAL variate=modelformula[/weight]

    Fits a standard linear model for normal observations. "variate" is the
    dependent variable, and "modelformula" is a code for the linear
    expression for the mean in the model. The model formula consists of
    terms separated by plusses or blanks. Each term may be a factor, a
    variate, or a formal product of factors and variates. The special term
    '1' represents a constant term in the model (like a factor on one
    level or a variate filled with 1's). As opposed to most other
    statistics packages, ISUW requires explicit specification of the
    constant term when it should be included (and it does not have to be
    the first term).

    All vectors occurring on the right hand side must be of the same length
    as the response variate on the left.

    If "weight" is specified, it must be a variate of the same length. Its
    values must be positive, and they are interpreted as weights of
    observations - implying that the assumption of constant variance is
    replaced with the assumption that the variances are proportional to
    the inverse weights. In this case, the unknown proportionality factor
    takes over the role of the variance in the homoscedastic case.

    Restrictions are obeyed in the sense that the analysis is performed
    only for the data present. Missing values of variates must not occur.
    Factors in the model formula may take the level 0, but this complicates
    the interpretation and is not recommended in general. See, however,
    2-3 screenfuls below.

    EXAMPLES.

    X, Y and COUNT are variates, F and G are factors. When more than one
    model formula is given, they correspond to different parameterisations
    of the same model, provided that none of the factors take the level
    zero.

        FITLIN Y=F
        FITLIN Y=1+F            one-way analysis of variance

        FITLIN Y=1+X            ordinary linear regression

        FITLIN Y=F+G            two-way analysis of variance,
        FITLIN Y=1+F+G          additive model

        FITLIN Y=F*G            two-way analysis of variance,
        FITLIN Y=1+F+F*G        with interaction
        FITLIN Y=1+F+G+F*G

        FITLIN Y=F+F*X          a regression line for each F-level
        FITLIN Y=1+F+F*X
        FITLIN Y=1+F*(1+X)      (this is legal - the "distributive
                                 law" is built into the syntax)

        FITLIN Y=F+X
        FITLIN Y=1+F+X          parallel regression lines

        FITLIN Y=1+F*X          regression lines with common intercept

        FITLIN Y=F*X            regression lines through (0,0)

        FITLIN Y=1+X+X*X        second degree polynomial regression

        FITLIN Y=1+X/COUNT      linear regression with weights COUNT
                                (referring to a standard context where
                                Y holds the averages in groups of
                                observations with common X-value, COUNT
                                holding the group sizes).

    Quite generally, the rules for translation of the model formula to a
    model can be described as follows.

    The model states that the mean vector for the response variate is a
    linear combination of columns of a MODEL MATRIX (or DESIGN MATRIX), the

          (number of observations) times (number of parameters)

    matrix which is usually called X when the model is treated
    mathematically. The model matrix can be thought of as generated from
    the model formula by the following simple algorithm. Each term of the
    model formula generates a number of columns of the matrix. A term
    which is a product of factors generates a number of columns which is
    the product of the numbers of levels for the factors. These columns
    become "dummies", i.e. 0/1-indicators for the levels of the
    corresponding product factor or cross classification. Multiplication
    of a term by a variate does not change the number of columns
    generated, but each column is multiplied (entry by entry) with the
    values of the variate. In particular, a term consisting of a single
    variate simply creates a column holding the values of the variate. The
    term 1 generates a single column filled with 1's, and is thus
    equivalent to a variate filled with 1's.

    Hence, to count the number of columns (which is also the number of
    linear parameters, including those that are set to zero due to
    overparameterisation), simply add the orders of terms in the model
    formula, where the order of a term is computed as the product of the
    numbers of levels for factors in the term (1 if no factors are involved).
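
    The counting rule is easy to check by hand, or by a few lines of
    Python as sketched below (an illustration, not ISUW code). In this
    sketch a model formula is represented as a list of terms, each term
    being a tuple of vector names, with the empty tuple standing for the
    term 1. This representation and the function name are assumptions of
    the example.

```python
from math import prod

def count_columns(terms, factor_levels):
    """Sum over terms of the product of the numbers of levels of its factors."""
    total = 0
    for term in terms:
        # Variates (names not found in factor_levels) do not add columns.
        total += prod(factor_levels[name] for name in term if name in factor_levels)
    return total

levels = {"F": 3, "G": 4}                                         # X is a variate
two_way = count_columns([(), ("F",), ("G",), ("F", "G")], levels)  # 1+F+G+F*G
common_intercept = count_columns([(), ("F", "X")], levels)         # 1+F*X
quadratic = count_columns([(), ("X",), ("X", "X")], levels)        # 1+X+X*X
```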

    If the model matrix has linearly dependent columns (which is almost
    always the case), parameters are set to zero according to the
    following rule: Whenever a column is a linear combination of the
    preceding columns, the corresponding parameter (i.e. the coefficient
    to that column in the expression for the mean vector) is set to zero.
    In this way, a unique parameterisation is obtained. For models
    involving factors, specified with a constant term first, then main
    effects, then first order interactions etc., this results in the usual
    "corner point parameterisation" where the parameters corresponding to
    last levels or (for interactions) level combinations involving a last
    level are set to zero.
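
    The rule can be illustrated by the following Python sketch, which
    builds the model matrix for "1+F" and finds the columns that are
    linear combinations of preceding columns. It uses exact rational
    arithmetic to keep the example simple, and it says nothing about how
    ISUW performs this computation internally. All function names are
    invented for the illustration.

```python
from fractions import Fraction

def dummies(factor, levels):
    """0/1 indicator columns for levels 1..levels (level 0 gives a zero row)."""
    return [[1 if f == lev else 0 for lev in range(1, levels + 1)] for f in factor]

def aliased_columns(matrix):
    """Indices of columns that are linear combinations of earlier columns."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    basis = []       # (pivot row, reduced column) for each independent column
    aliased = []
    for j in range(n_cols):
        col = [Fraction(matrix[i][j]) for i in range(n_rows)]
        for pivot_row, b in basis:
            if col[pivot_row] != 0:
                factor = col[pivot_row] / b[pivot_row]
                col = [c - factor * bc for c, bc in zip(col, b)]
        pivot = next((i for i, c in enumerate(col) if c != 0), None)
        if pivot is None:
            aliased.append(j)            # parameter set to zero
        else:
            basis.append((pivot, col))
    return aliased

f = [1, 1, 2, 2, 3, 3]                                   # factor on 3 levels
model_matrix = [[1] + row for row in dummies(f, 3)]      # formula 1+F
zeroed = aliased_columns(model_matrix)
```

    With the constant term first, the column for the last level of F is
    the one that is dropped, in agreement with the corner point
    parameterisation described above.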

    As an implicit consequence of these rules, the level 0 for a factor
    (or a level combination for a cross classification involving a level
    0) does not create a dummy column of the model matrix. Our general
    advice is not to use factors taking the level 0. If the level 0 is
    used at all it should be as a "missing level", and accordingly units
    with a level 0 should be excluded before FITLINEARNORMAL.

    However, level 0 as a "relevant level" can be used to avoid
    overparameterisation, if desired. For example, in a design with many
    factors on two levels, it is sometimes preferable to declare these
    factors with one level and use levels 0 and 1 as the two levels. This
    actually means that you take over the complete control of the dummies
    that form the model matrix. Multiplication of two factors on one level
    in a model formula literally means multiplication of the two dummies,
    etc. In this case you will have to be completely aware of what you are
    doing. In particular, you should be aware that main effects without a
    constant term, interactions without main effects etc., which can
    usually be specified without much danger of confusion if you avoid
    using level 0, may result in meaningless models when the level 0 occurs.

    EXAMPLE. To fit a one-way ANOVA model in such a way that the parameter
    estimates are directly interpretable as expectations in the groups, a
    command of the form

        FITLIN Y=F

    can be used. However, this will only work if the factor F does not
    take the level 0, because the specification assumes E(Y)=0 for units
    with F=0. If the level 0 actually occurs, you can use

        FITLIN Y=1+F

    to avoid this. But notice that the interpretation of the estimated
    parameters will be very different from what it is when the level 0
    does not occur. Level 0 takes over the role as "baseline level", which
    is usually taken by the last level.

    Output from FITLINEARNORMAL consists of an analysis of variance table,
    giving the degrees of freedom, square sums, mean square sums,
    F-statistics and P-values for successive reduction of the model by
    removal of the terms from the model formula, beginning with the last.
    This is what SAS users call the "type I" ANOVA table. However, there
    are some differences. Firstly, the constant term will (if it is
    specified) occupy a line, just as any other model term. Accordingly,
    the last "Total" line will have the total number n of observations
    (rather than n-1) as "degrees of freedom" and the total square sum of
    the observations (rather than the square sum of deviations from the
    mean) as its "sum of squares". Secondly, the tests performed are in
    accordance with the usual rules for successive model reduction, in the
    sense that the denominators of the F-statistics are "pooled variance
    estimates", not just the variance estimate from the original model.

    As a consequence of this, the order of terms determines the model
    reductions to be tested, and also which parameters are set to zero
    to avoid overparameterisation. For this reason, the supposedly least
    important terms (e.g. higher order interactions) should be put last in
    the model formula. For example, if in a simple regression analysis a
    line through the origin can be expected, it is more natural to write
    "Y=X+1" than "Y=1+X". If the hypothesis "intercept=0" is accepted, the
    proportionality model is fitted by "Y=X".

    The estimated variance and standard deviation of the model are also
    printed out. In addition, if the model formula has a constant term as
    its first term, the usual R-square statistic (the proportion of the
    total square sum of deviations explained by the model) is displayed.

    To list the parameter estimates and their standard deviations for the
    last model fitted by FITLINEARNORMAL, use the command LISTPARAMETERS.
    To save these estimates (and optionally their estimated standard
    deviations) in a variate use SAVEPARAMETERS. To extract fitted values,
    residuals and normed residuals, use SAVEFITTED. See also the
    descriptions of ESTIMATE (for estimation of specific contrasts) and
    SAVENORMEDRESIDUALS (for extraction of studentized residuals).
    TESTMODELCHANGE, which refers to the last two models fitted, can be
    used for test of model reductions if more than one term is removed in
    a single step.

--------------------------------------------------------------------------------
    FITLOGLINEAR

    Multiplicative Poisson models
................................................................................

    Syntax: FITLOGLINEAR variate[-offset]=modelformula

    Fits a log-linear or multiplicative Poisson model. It is well known how
    conditioning on the total sum or a set of marginal sums in such models
    results in models for multinomial data, which can be handled by the same
    computational methods. But in the following, we refer to the
    "independent Poisson" interpretation.

    "variate" is the dependent (non-negative integer) variable, and the
    model formula is a code for the linear expression for the logarithmised
    mean in the model, cfr. the description of FITLINEARNORMAL above.

    EXAMPLE. It should be emphasised that the following example is not
    relevant at all; it is just a fast and easy way of explaining the
    relation between model formulas and some well-known concepts related to
    contingency tables.

    Let BIRTHS be a variate of length 24 holding the monthly counts of
    births of boys and girls in a certain area during a year. Let MONTH be
    the factor of length 24 on 12 levels holding the month number 1..12, and
    let SEX be the factor of length 24 on 2 levels holding the sex
    (boys/girls). The model fitted by

        FITLOGLIN BIRTHS=MONTH*SEX

    is the full or saturated model, where each observation has its own
    freely varying expectation. The model fitted by

        FITLOGLIN BIRTHS=MONTH+SEX

    corresponds to independence in the 12*2 table, i.e. the sex proportion
    is constant from month to month. The further reduction to

        FITLOGLIN BIRTHS=MONTH

    corresponds to the assumption that the sex proportion is exactly 1:1,
    whereas

        FITLOGLIN BIRTHS=SEX

    assumes (more or less, see below) constant birth intensity over the year.
    Finally,

        FITLOGLIN BIRTHS=1

    assumes both constant sex proportion and equal distribution over months.

    An OFFSET is a given variate of values that should be added to the
    linear expression determining the mean. In the context of multiplicative
    models, the usual expectation

        E(Y) = exp(linear expression)

    becomes, with an offset variate OFFSET,

        E(Y) = exp(OFFSET)*exp(linear expression)

    EXAMPLE. The above mentioned model "BIRTHS=SEX" does not take into
    account the fact that months are unequally long. A more interesting
    hypothesis would be that the numbers of births per day are i.i.d., or
    that the expected number per month is proportional to the number of days
    in the month. This model could be fitted (assuming that the year is not
    a leap year) by

        OFFS=LN([31 28 31 30 31 30 31 31 30 31 30 31](MONTH))
        FITLOGLIN  BIRTHS-OFFS=SEX
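
    The effect of the offset can be verified numerically. In the Python
    sketch below (an illustration, not ISUW code) the linear expression
    is a constant corresponding to 2 births per day; this value is purely
    hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def expected_counts(linear, offset):
    """E(Y) = exp(OFFSET)*exp(linear expression), unit by unit."""
    return [math.exp(o) * math.exp(eta) for eta, o in zip(linear, offset)]

days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
offs = [math.log(d) for d in days]     # OFFS=LN([31 28 ... 31](MONTH))
eta = [math.log(2.0)] * 12             # hypothetical: 2 births per day
mu = expected_counts(eta, offs)        # expected monthly counts
```

    The expected count in each month becomes the number of days times the
    daily rate, i.e. proportional to the length of the month.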

    The output from FITLOGLINEAR includes the likelihood ratio test and
    Pearson's approximation to it for the model against the full model
    (i.e. the model where each observation has its own free parameter).
    Be careful with these tests; they are not reliable for small
    (expected) counts.

--------------------------------------------------------------------------------
    FITLOGITLINEAR

    Logit linear models for binary or binomial data.
................................................................................

    Syntax:

    FITLOGITLINEAR variate[-offset]=modelformula[/total]

    Fits a logit-linear or logistic regression model. "variate" is the
    dependent variable. The command has two different forms, depending on
    whether "total" is specified or not:

    1. BINARY RESPONSE. Here, the response variate is assumed to be binary,
       i.e. only the values 0 and 1 are allowed. The model states that these
       0-1 variables are independent, with (let us call them Y(i))

                                      exp(linear expression)
           P(Y(i)=1) =   p(i)  =  -----------------------------
                                   1 +  exp(linear expression)

       where the linear expression is determined by the model formula,
       exactly as for FITLINEARNORMAL. "total" should not be specified.

    2. BINOMIAL (FREQUENCY) RESPONSE. Here, the responses are assumed to be
       relative frequencies (notice: FREQUENCIES, not COUNTS) of the form
       Y(i)/M(i), where the Y(i) are independent, binomially distributed
       with binomial totals or indices M(i) and probability parameters
       p(i) parameterised as above. The variate "total" after the slash
       must contain the binomial totals M(i).

    Notice that model 1 is a special case of model 2, namely if "total" is a
    variate filled with 1's. Conversely, model 2 can always be regarded as a
    model derived from model 1 by sufficient reduction, namely summation of
    binary responses over covariate classes (i.e. groups in which all
    explanatory variables and factors are constant).

    The rules for translation of the model formula to a linear expression,
    the handling of overparameterisations etc. are exactly as for
    FITLINEARNORMAL, see 6-8 screenfuls above.

    EXAMPLE. See the "SIMPLE EXAMPLE" starting approximately 4 screenfuls
    below the top of this file.

    The interpretation of the optional "offset variate" "offset" is exactly
    as for FITLOGLINEAR. The offset is a vector of values that are added in
    advance to the linear expression. You may think of it as a covariate in
    the model with its coefficient "frozen" at the value 1.

    In the binomial case, the output from FITLOGITLINEAR includes the
    likelihood ratio test for the model against the full model, which
    merely assumes that the responses are binomial frequencies with freely
    varying probability parameters. Pearson's approximation is not given,
    but it can be computed as the square sum of the normed residuals, and
    it also comes out if you fit the corresponding overdispersion model by
    FITNONLINEAR, see below. Be careful with these tests; they are not
    reliable when the binomial totals are small or when relative
    frequencies are close to 0 or 1.

--------------------------------------------------------------------------------
    FITNONLINEAR

    Nonlinear regression and generalised linear models with overdispersion.
................................................................................

    Syntax: FITNONLINEAR modelspec m Dm v [init [worky [workw]]]

    where "modelspec" has the form

        variate[-offset]=modelformula[/weight]

    The syntax for the model specification is exactly as for
    FITLINEARNORMAL, except that an offset variate is allowed (cfr.
    FITLOGLINEAR above). However, for obvious reasons there must not be
    any blanks in the model specification (use +, not blank, to separate
    terms), nor in the following function specifications.

    The parameters m, Dm and v are expressions for the three functions
    that define the model (closely following the notation in Tjur (1998):
    Nonlinear regression, quasi likelihood, and overdispersion in
    generalized linear models, The American Statistician 52 pp. 222-227).
    m is the mean (or inverse link) function, Dm is the first derivative
    of m (required for the numerical procedure) and v is the variance (up
    to a common scale factor and the weights) as a function of the mean.
    In these expressions, the argument is written as a period '.'. Apart
    from this, the syntax is exactly as for the right hand side of a
    COMPUTE statement, with '.' entering as a (parallel) variate of the
    relevant length.

    EXAMPLES. To fit a log-linear model with variance proportional to the
    mean (multiplicative Poisson type structure) write

    FITNONLINEAR ...  exp(.)  exp(.)  .

    To fit a linear model with a variance proportional to the squared mean
    (constant coefficient of variation) write

    FITNONLINEAR ...  .  1  sqr(.)

    To fit a logit-linear model for binomial frequencies FREQ=COUNT/M
    (including an overdispersion parameter) write

    FITNONL FREQ=.../M exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)

    To fit a probit-linear model instead, write

    FITNONLINEAR FREQ=.../M phi(.) exp(-sqr(.)/2)/sqrt(2*3.14159) .*(1-.)
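
    Since Dm must be typed in by hand, it is worth checking that it
    really is the derivative of m. The Python sketch below (not ISUW
    code) compares the two Dm expressions above with a numerical
    derivative. It assumes that phi denotes the standard normal
    distribution function, and all function names are invented for the
    illustration.

```python
import math

def logistic(x):
    return math.exp(x) / (1 + math.exp(x))             # exp(.)/(1+exp(.))

def d_logistic(x):
    return math.exp(x) / (1 + math.exp(x)) ** 2        # exp(.)/sqr(1+exp(.))

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))      # phi(.), assumed meaning

def d_normal_cdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def numeric_derivative(f, x, h=1e-6):
    """Central difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

points = [-2.0, -0.5, 0.0, 1.0, 3.0]
logit_error = max(abs(numeric_derivative(logistic, x) - d_logistic(x)) for x in points)
probit_error = max(abs(numeric_derivative(normal_cdf, x) - d_normal_cdf(x)) for x in points)
```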

    The nonlinear regression models that can be handled by FITNONLINEAR are
    characterised as follows:

    The observations are (in principle) independent, normally distributed.
    The expectation for each observation is given as a known function (m) of
    the "linear parameter" associated with each unit. These linear
    parameters are, in turn, linear combinations of covariates specified by
    a model formula, just like the mean in a linear model. The variance is
    specified as a known function (v) of the mean, multiplied by an unknown
    "overdispersion" or squared scale parameter, common to all observations,
    optionally divided by known weights.

    The procedure estimates a nonlinear regression model of this kind by the
    method known as Iteratively Reweighted Least Squares (IRLS) or Quasi
    Likelihood. In case of a constant variance function, this reduces to
    ordinary (optionally weighted) least squares or maximum likelihood.

    In full detail, the function parameters m, Dm and v and the model
    formula determine the model as follows. The i'th observation Y(i) is
    normally distributed with mean

       E[Y(i)] = mu(i) = m( [ offset(i)+ ] eta(i) )
             = m( [ offset(i)+ ] beta(1)*x(i,1) + ... + beta(p)*x(i,p) )

    where x(i,j) are the elements of the model matrix determined by the
    model formula, and variance

        var[Y(i)] = lambda*v(mu(i))/w(i)

    where w(i) is the i'th value of the weight variate (or 1, if a weight is
    not specified) and lambda is the overdispersion (or squared scale)
    parameter. The parameters to be estimated are beta(1), ... , beta(p) and
    lambda.

    The last three command parameters INIT, WORKY and WORKW are optional.
    They can be omitted, or replaced with an asterisk to be skipped.

    INIT, if specified, should be a variate of the same length as all other
    vectors involved. The values of INIT are taken as the initial linear
    parameters. This may speed things up, and in some cases (in particular
    when the mean function has a singularity at zero, like m='1/.') such
    initial values are necessary for the iterative method to get started.
    Indeed, if INIT is not specified a variate of zeroes is used.

    EXAMPLE. For a model with m(eta)=exp(eta) and variance proportional to
    the mean you could do something like this (if not for any other reason,
    then to save computing time):

    LNY=LN(Y+0.5)                 {'+0.5' can be omitted if Y has no zeroes}
    FITLINEARNORMAL LNY=1+SEX+AGE
    SAVEFITTED ETA0
    FITNONLINEAR Y=1+AGE+SEX  exp(.)  exp(.)  .  ETA0
    SAVEFITTED ETA0
    FITNONLINEAR Y=1+AGE  exp(.)  exp(.)  .  ETA0
    etc.

    In general, suitable initial values of the linear parameters can be
    produced as the fitted values in a (possibly weighted) linear regression
    where the response variate has values computed as the observations
    transformed by the inverse mean function (also called the link
    function). In later FITNONLINEAR commands with a slightly modified model
    formula (same y-variate, same offset, same weights) the estimated linear
    parameters from a fit can usually be taken as the initial values for the
    next.


    GENERALISED LINEAR MODELS WITH OVERDISPERSION.

    It is a property of the IRLS method that if Y has nonnegative integer
    values, the estimates in the above example will actually coincide with
    the maximum likelihood estimates in a multiplicative Poisson model given
    by the same model formula. Similarly, with observations FREQ=Y/M defined
    as relative frequencies, after a command of the form

      FITNONL FREQ=.../M  exp(.)/(1+exp(.))  exp(.)/sqr(1+exp(.))  .*(1-.)

    the IRLS estimates will coincide with the maximum likelihood estimates
    in the corresponding logistic regression model. Quite generally, this
    equivalence holds for any generalised linear model in the sense of
    Nelder and Wedderburn (1972, JRSS A p. 370-84). The results produced by
    FITNONLINEAR are thus, to a large extent, valid for generalised linear
    models with overdispersion as described e.g. in the book on Generalised
    Linear Models by McCullagh and Nelder (Chapman and Hall 1989). The
    standard deviations produced by LISTPARAMETERS are corrected for
    overdispersion, and the tests for beta(j)=0 are based on the relevant
    T-distribution. The F-tests in the analysis of variance table can be
    regarded as second order approximations to the usual likelihood ratio
    tests based on the Chi-square approximation, with correction for
    overdispersion and (using F rather than Chi square) for random variation
    of the estimate of the overdispersion parameter.

    According to the quasi likelihood interpretation (Wedderburn, Biometrika
    1974, p. 439-447), the IRLS method is in fact valid quite generally for
    "distribution free" models specified by their first and second moments
    only. However, standard central-limit-theorem-type assumptions are, of
    course, required for F-tests, Chi-square tests etc. to be asymptotically
    valid.

    The computations are performed iteratively. Each iteration calls (with
    output suppressed) the StatUnit procedure FitLinearNormal with a
    dependent variable WORKY and a weight variate WORKW computed from the
    results of the previous iteration as

        WORKY = (Y-m(FITTED))/Dm(FITTED) + FITTED  [-OFFSET]
        WORKW = W*sqr(Dm(FITTED))/v(m(FITTED))

    (Y is the original dependent variate, W the original weight vector,
    FITTED the variate of fitted values from the previous fit, including the
    offset, if specified).

    The iterations are stopped when the weighted square sum of changes,
    defined as the sum of the quantities

        W*sqr(m(NEWFITTED)-m(LASTFITTED))/v(m(LASTFITTED))

    is less than (1.0E-12)*ModelVariance*ResDF

    where ModelVariance is the present estimate of the overdispersion
    parameter, ResDF its degrees of freedom.
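
    The iteration described above can be sketched outside ISUW as follows.
    This is a minimal illustration in Python with NumPy, not the ISUW
    code. The function name irls, the made-up data, the omission of the
    offset and the Pearson-type formula used for the overdispersion
    estimate are assumptions made for the sketch.

```python
import numpy as np

def irls(X, y, m, dm, v, w=None, init=None, max_iter=100):
    """Hypothetical IRLS sketch following the WORKY/WORKW formulas (no offset).

    Returns (beta, lam), where lam is a Pearson-type overdispersion estimate.
    """
    n, p = X.shape
    res_df = n - p
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    # A variate of zeroes is the default start, as for INIT.
    eta = np.zeros(n) if init is None else np.asarray(init, dtype=float)
    for _ in range(max_iter):
        mu = m(eta)
        work_y = (y - mu) / dm(eta) + eta        # linearised observations
        work_w = w * dm(eta) ** 2 / v(mu)        # working weights
        root = np.sqrt(work_w)
        # Weighted least squares of work_y on X with weights work_w.
        beta, *_ = np.linalg.lstsq(X * root[:, None], work_y * root, rcond=None)
        new_eta = X @ beta
        new_mu = m(new_eta)
        # Weighted square sum of changes, as in the stopping rule.
        change = np.sum(w * (new_mu - mu) ** 2 / v(mu))
        # Assumed form of ModelVariance: Pearson statistic divided by ResDF.
        lam = np.sum(w * (y - new_mu) ** 2 / v(new_mu)) / res_df
        eta = new_eta
        if change < 1.0e-12 * lam * res_df:
            break
    return beta, lam

# Made-up data: log-linear model with variance proportional to the mean.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 2.0, 6.0, 9.0, 14.0])
X = np.column_stack([np.ones_like(x), x])
beta, lam = irls(X, y, np.exp, np.exp, lambda mu: mu, init=np.log(y + 0.5))
print(beta, lam)
```

    For this choice of m and v the estimates should satisfy the Poisson
    likelihood equations, i.e. the columns of X are orthogonal to
    y - exp(X*beta) up to rounding.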

    A final iteration is performed resulting in an analysis of variance
    table with approximate F-tests for removal of terms (beginning with
    the last, as usual). Notice that a new fit after removal of
    insignificant terms from the bottom of the table will result in square
    sums for the remaining terms that are slightly changed, as opposed to
    what happens in the case of a linear model with constant variance
    function.

    After the ANOVA table FITLINEARNORMAL prints the estimate of the
    overdispersion parameter and its square root, and finally FITNONLINEAR
    adds the Chi-square test for "no overdispersion" (i.e. overdispersion
    parameter = 1), which coincides with Pearson's goodness-of-fit test in
    the multiplicative Poisson and logistic regression type situations
    mentioned above. Notice that the test for "no overdispersion" is
    irrelevant in proper nonlinear regression situations, where the value
    1 of the scale parameter plays no particular role. It is also
    irrelevant in the case of a logistic regression model for binary
    responses.

    If nothing else is specified, the variates referred to as WORKY and
    WORKW above are saved under the names #NL_WY and #NL_WW. Existing
    variates of these names will be deleted. These variates are saved
    because they are used by SAVEFITTED. However, you can specify other
    names for them as the last two parameters to FITNONLINEAR.

    After a call to FITNONLINEAR, ISUW commands referring to the last
    model fit refer to the call of FITLINEARNORMAL in the final iteration.
    This implies that

        LISTPARAMETERS

    will return the IRLS estimates, with standard deviations and T-tests
    based on the approximating linear model.

        SAVEFITTED FIT0 RES0 NRES0

    will result in the following:

    FIT0 will contain the estimated linear parameters. To produce estimated
    means, transform to FIT = m( [ OFFSET + ] FIT0 ).

    RES0 will contain the quantities (Y-FIT)/Dm( [ OFFSET + ] FIT0 ). To
    obtain residuals in the usual sense (observations minus estimated means)
    multiply by Dm( [ OFFSET + ] FIT0 ), or compute more directly (with FIT
    computed as above) RES=Y-FIT.

    NRES0 will actually contain the correct normed residuals (residuals
    divided by their estimated standard deviations).

    Notice that you have also, at your disposal after a model fit, the
    variate of "linearised observations" (default name #NL_WY) and the
    workweights for the last fit (default name #NL_WW).

    The class of models that can be handled by FITNONLINEAR is actually
    broader than indicated above, because variates may occur in the
    formulas for m, Dm and v. This means that some kinds of unit-specific
    mean and variance functions are allowed. As a trivial example (which
    obviously does not extend the class), notice that the command

        FITNONLINEAR  Y=1+AGE+SEX  OFFS+.  1  1/W

    will perform exactly the same as

        FITNONLINEAR  Y-OFFS=1+AGE+SEX/W  .  1  1

    which, in turn, could be obtained simply by

        Y0=Y-OFFS
        FITLINEARNORMAL  Y0=1+AGE+SEX/W

    In this sense, FITNONLINEAR's conventions for offsets and weights are
    unnecessary, because they can be built into the functions.

    Restrictions are obeyed in the obvious way.

--------------------------------------------------------------------------------
    FITCOXMODEL

    Proportional hazards models for survival data by Cox's partial
    likelihood.
................................................................................

    Syntax:

    FITCOXMODEL exittime[*deathind][-entrytime][/stratum]=modelformula

    exittime must be a variate holding the times of death/censoring, and
    this must come first on the left hand side. The order of the three
    optional specifications deathind (indicator of the event death, as
    opposed to censoring), entrytime (times of left truncation) and
    stratum (a factor dividing individuals into groups with common
    underlying intensity) is irrelevant; they are identified by the
    preceding characters ( * , - or / ).

    "deathind" must be specified in case of right censoring. It must be a
    factor on a single level and of length equal to the length of exittime.
    It is interpreted as an indicator of the event death. Thus, censored
    individuals should have the level of this factor equal to zero.

    entrytime should hold the times of entrance when survival times are left
    truncated. For each individual, the time under observation must be
    positive, i.e. entrytime < exittime (strict inequality).

    "stratum", if specified, must be a factor of the same length as exittime.
    This specification means that each stratum has its own underlying
    (unknown) intensity. Only the parameters of interest (given by the right
    hand side of the model specification) are common to the strata. A
    typical example is stratification by SEX, which is often required. If
    the same factor occurs also on the right hand side in interaction with
    everything else, the model is equivalent to a separate Cox model for
    each stratum.

    "modelformula" must be a model formula, involving variates and factors of
    the same length as exittime. Notice that a constant term 1 should not be
    specified, because a constant factor on the intensity is absorbed by the
    unknown underlying intensity. In the expression for Cox's likelihood, a
    constant factor simply cancels out.


    The model considered is given by the following expression for the death
    intensity of an individual:

    DeathIntensity(time)=lambda0(time)*exp(linear expression in covariates)

    Here, lambda0 is the "underlying intensity", or the intensity for an
    individual with all covariates = 0. The linear expression involves
    individual specific information like (in a medical context) sex, age,
    weight, smoking habits, treatment or whatever, with unknown parameters
    as coefficients, just as in a multiple regression model or any
    other generalised linear model.

    Cox's partial likelihood for the parameters of interest (based on the
    order in which events took place, disregarding the actual times where
    they took place) can be written as the product over all dead individuals
    of the fractions

                    exp(linear expression for dead individual)
             ---------------------------------------------------------
             Sum over all individuals at risk at that time of exp(...)

    where ... stands for the linear expression for the single individual
    in the set of individuals under risk at the time of the death. "Under
    risk" means present at that time (entrytime < t <= exittime) and in the
    same stratum as the one who died.

    Ties (coincident times of death/censoring) are handled according to
    Breslow's method, which is to include all individuals under risk just
    before a death in the risk set for that event. For coincident deaths
    this means that the individuals occur in each other's risk sets. This
    may seem artificial, but at least it gives a non-arbitrary correction
    for ties, which is acceptable in case of only few ties.
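
    As an illustration of the expression above, including the treatment of
    ties, strata and left truncation, the logarithm of the partial
    likelihood can be evaluated as in the following Python sketch. The
    function name and the small data set are hypothetical, and the code is
    not taken from ISUW; it only evaluates the likelihood for given
    parameter values and does not maximise it.

```python
import math

def breslow_log_partial_likelihood(beta, exittime, dead, x,
                                   entrytime=None, stratum=None):
    """Log of Cox's partial likelihood with Breslow's handling of ties.

    x[i] is the list of covariate values of individual i (hypothetical layout).
    """
    n = len(exittime)
    entrytime = [-math.inf] * n if entrytime is None else entrytime
    stratum = [0] * n if stratum is None else stratum
    # Linear expression for each individual.
    lin = [sum(b * xij for b, xij in zip(beta, x[i])) for i in range(n)]
    total = 0.0
    for i in range(n):
        if not dead[i]:
            continue                     # censored: no factor in the product
        t = exittime[i]
        # Under risk: same stratum and entrytime < t <= exittime.
        risk = [j for j in range(n)
                if stratum[j] == stratum[i] and entrytime[j] < t <= exittime[j]]
        total += lin[i] - math.log(sum(math.exp(lin[j]) for j in risk))
    return total

# Made-up data: three deaths at times 1, 2, 3 and one covariate.
exittime = [1.0, 2.0, 3.0]
dead = [1, 1, 1]
x = [[1.0], [0.0], [0.0]]
value = breslow_log_partial_likelihood([0.5], exittime, dead, x)
print(value)
```

    Since the risk set of each death contains everybody with the same
    exittime, two coincident deaths automatically occur in each other's
    risk sets, as described above.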

    Time dependent covariates can not be handled, with the following
    exception: If you have one or a few time dependent but PIECEWISE
    CONSTANT covariates, these can be handled as follows. Whenever a
    covariate shifts its value, let the corresponding individual "change its
    identity", i.e. remove it by right censoring and introduce a new one
    (with the new values of the covariate) by entrance (left truncation) at
    the same time point.
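
    The construction of such a data set can be sketched as follows. This
    is a hypothetical Python helper, not an ISUW command; it assumes that
    the change times are sorted and lie strictly between the entry time
    and the exit time of the individual.

```python
def split_record(entry, exit_, dead, change_times, values):
    """Split one individual into rows (entrytime, exittime, deathind, covariate).

    values[0] is the covariate value before the first change time and
    values[j] the value after change_times[j-1], so that
    len(values) == len(change_times) + 1.
    """
    rows = []
    start = entry
    for t, value in zip(change_times, values):
        # The "old identity" leaves by right censoring at the change time.
        rows.append((start, t, 0, value))
        # The "new identity" enters by left truncation at the same time.
        start = t
    # Only the last row carries the original death indicator.
    rows.append((start, exit_, dead, values[-1]))
    return rows

# Made-up individual: observed from 0 to 10, dies, covariate shifts at time 4.
rows = split_record(0.0, 10.0, 1, [4.0], [0, 1])
print(rows)
```

    With the convention entrytime < t <= exittime for being under risk,
    the individual is represented by exactly one row at any time point.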

    This may seem restrictive, but in principle any time dependent covariate
    can be handled in this way, because what matters is only the values of
    covariates at the finitely many timepoints where a death takes place.
    The problem is the construction of the data set, and the fact that this
    data set may become very long.

    Restrictions are taken into account in the obvious way. Missing values
    of variates for units present must not occur. Levels 0 for factors are
    treated as usual (no dummy generated, take care that main effects are
    present when interactions are specified etc.). The level 0 of the factor
    stratum is treated as any other level, if it occurs.

    In rarely occurring situations, involving few individuals and/or many
    covariates, the output will contain the following warning:

        Iterations stopped, fitted values are out
        of range - probably because deterministic
        model fits.

    This happens when a covariate or a linear combination of such is a
    monotone function of the time of death (or in situations where this is
    close to the truth). In this case, the maximum-likelihood estimate of
    the parameters does not exist, unless the values plus/minus infinity are
    allowed. Iterations are stopped due to numerical problems. The results
    are not reliable (and hardly of any interest either) in this case.

    After FITCOXMODEL, the command SAVEFITTED can be used, but only for
    storage of fitted values, i.e. the quantities referred to as "linear
    expressions" above. Residuals and normed residuals are not defined. The
    commands LISTPARAMETERS, ESTIMATE, SAVEPARAMETERS etc. will work in the
    obvious way. Similarly, TESTMODELCHANGE can be used after fit of a model
    and a reduced model. Notice that a model reduction here means removal of
    one or more terms from the model formula. Removal of a stratifying
    factor, for example, can not be tested in this way.


    ESTIMATION OF THE INTEGRATED UNDERLYING INTENSITY.

    This can be done simultaneously with the fit. The syntax for this is to
    add a final variate name to the model formula, preceded by a slash / ,
    like

        FITCOXMODEL EXITTIME*DEAD-ENTRYTIM/SEX=AGE+TREATMT/INTINT

    which will imply that the usual estimate of the integrated underlying
    intensity is saved in a variate named INTINT. The name specified after
    the slash must be a valid variate name, and if this variate is already
    declared it must be of the same length as all other vectors in play.

    The resulting variate will have missing values in all entries, except
    those corresponding to individuals that are "present" and dead. In a plot
    of INTINT against EXITTIME, these are the upper left breakpoints of the
    broken line, when the function is drawn as a step function in the usual
    way. To produce the usual plot, connect consecutive points by a
    horizontal line (from left to right) and a vertical line (upwards).

    In case of a stratified model, the integrated intensities are estimated
    correctly, but notice that a point plot of this variate against exittime
    will be rather confusing, unless different colors or symbols are used
    for the strata, or all but one stratum are excluded.

    Notice that a fit of a Cox model without covariates, like

        FITCOXMODEL EXITTIME*DEAD-ENTRYTIM=/INTINT

    (empty model formula) will result in the usual nonparametric estimate of
    the integrated intensity for a homogeneous sample of individuals. A good
    approximation to the non-parametric Kaplan-Meier estimate of the
    survival function (i.e. one minus the c.d.f. of the survival time
    distribution) can then be computed as

        KM=exp(-INTINT)

    From a plot of KM against EXITTIME, the usual plot of the Kaplan-Meier
    estimate is obtained when consecutive points (beginning with (0,1)) are
    connected by a horizontal line (from left to right) and a vertical line
    (downwards).
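
    The quality of the approximation KM=exp(-INTINT) can be examined on a
    small example. The Python sketch below is not ISUW code. It assumes
    that the "usual nonparametric estimate" is the sum, over death times,
    of the number of deaths divided by the number at risk (without left
    truncation), and it computes the exact Kaplan-Meier product alongside.
    The function name and data are made up.

```python
import math

def integrated_intensity_and_km(exittime, dead):
    """Return rows (time, intint, exp(-intint), kaplan_meier) at death times."""
    death_times = sorted({t for t, d in zip(exittime, dead) if d})
    intint, km, rows = 0.0, 1.0, []
    for t in death_times:
        at_risk = sum(1 for u in exittime if u >= t)
        deaths = sum(1 for u, d in zip(exittime, dead) if d and u == t)
        intint += deaths / at_risk          # step of the integrated intensity
        km *= 1.0 - deaths / at_risk        # Kaplan-Meier factor
        rows.append((t, intint, math.exp(-intint), km))
    return rows

# Made-up sample of four individuals, the third one censored.
rows = integrated_intensity_and_km([1.0, 2.0, 3.0, 4.0], [1, 1, 0, 1])
for row in rows:
    print(row)
```

    Since exp(-d/n) is never smaller than 1-d/n, the approximation lies on
    or above the Kaplan-Meier estimate. The difference is small as long as
    the risk sets are large compared with the number of deaths at each
    time point, and it grows towards the end where few individuals remain.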

    The estimated standard deviation of the estimated integrated underlying
    intensity can also be computed. The syntax for this is to add an extra
    variate name after a plus sign, like

         FITCOXMODEL EXITTIME*DEAD-ENTRYTIM/SEX=AGE+TREATMT/INTINT+IISD

    After this, pointwise confidence limits for the integrated underlying
    intensity can be computed by

        UPPER=INTINT+1.96*IISD
        LOWER=INTINT-1.96*IISD

    or better (by log-transformation)

        UPPER=EXP( LN(INTINT)+1.96*IISD/INTINT )
        LOWER=EXP( LN(INTINT)-1.96*IISD/INTINT )

    CONSTRAINT. Since the Pascal code for this command is taken over
    directly from the DOS version ISU, the length of the vectors entering in
    the model specification must not exceed 16379. Hopefully to be changed
    in the future, but not a matter of highest priority.

--------------------------------------------------------------------------------
    FITMCLOGIT
    FITMCPROBIT
    FITMCCLOGLOG

    P. McCullagh's model for ordered qualitative responses.
................................................................................

    Syntax:  FITMC... response=modelformula

    where the left hand side "response" is either

          (1)  A factor on k levels, holding the responses,
    or
          (2)  A list  C1 C2 ... Ck  of k variates, holding the
               multinomial counts of individuals in the k response groups.

    In the first case, the "unit by unit" (no aggregation) case, "response"
    must be the name of a factor on k levels, where k is the number of
    possible (ordered) responses. The levels of this factor must be in the
    range 1..k (no zeroes). The length n of this factor (the number of
    units) must coincide with the lengths of all vectors occurring in the
    model formula.

    In the second case, C1, C2, ... Ck must be names of variates, containing
    the multinomial counts of the k responses in n "covariate groups". Only
    in this case will the command compute goodness of fit statistics and
    fitted values. This representation is only relevant when the model
    considered is specified by factors or covariates with several units for
    each combination of factor levels / variate values. The common length of
    C1..Ck must coincide with the lengths of all vectors occurring in the
    model formula.

    The model formula after the equality sign MUST contain a constant term 1
    as its first term, since only k-2 cutpoint parameters are used. Thus,
    for k=2, no cutpoint parameters are defined, and the model is equivalent
    to a logit-linear, probit-linear or cloglog-linear model for binary
    data. For k>2, the first parameter CONSTANT represents the cutpoint
    between level k-1 and k, and the cutpoints THR[1]..THR[k-2] are
    negative, representing the differences between cutpoints 1, 2 ... k-2
    and cutpoint k-1.

    Mathematically, the model can be stated as follows. The probability that
    an individual responds r (=1..k) is

      F(CUTPT[r] + linear expression ) - F(CUTPT[r-1] + linear expression )

    (with the conventions CUTPT[0]=minus infinity, CUTPT[k]=plus infinity).
    Hence, the response can be regarded as a discretised version of a
    non-observable continuous variable X with c.d.f. F(linear expression +
    x). In other words, the model is really a linear position parameter
    model with error distribution F, but the observations are only
    available in "rounded" form, where the "rounding" is a grouping in k
    intervals with unknown cutpoints. In the parameterisation chosen here,
    these cutpoints are

    CONSTANT+THR[1] < CONSTANT+THR[2] < ... < CONSTANT+THR[k-2] < CONSTANT

    The "linear expression" is determined by the model formula as usual.

    The three models available in ISUW are defined by their c.d.f.'s as
    follows:

        COMMAND        F(x)                       distribution  inverse F

        FITMCLOGIT     F(x) = exp(x)/(1+exp(x))   Logistic      logit
        FITMCPROBIT    F(x) = PHI(x)              Normal        probit
        FITMCCLOGLOG   F(x) = 1-exp(-exp(x))      Gompertz      cloglog

    (cloglog is short for "complementary log log").
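
    To make the parameterisation concrete, the response probabilities for
    a single unit can be computed from CONSTANT, THR[1]..THR[k-2] and the
    value of the linear expression as in the following Python sketch. The
    function and argument names are hypothetical and the parameter values
    are made up; nothing here is taken from the ISUW code.

```python
import math

# The three c.d.f.'s from the table above.
CDF = {
    "logit": lambda x: 1.0 / (1.0 + math.exp(-x)),   # = exp(x)/(1+exp(x))
    "probit": lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))),
    "cloglog": lambda x: 1.0 - math.exp(-math.exp(x)),
}

def response_probabilities(constant, thr, lin, link="logit"):
    """Probabilities of the responses 1..k, where k = len(thr) + 2.

    thr holds the negative, increasing values THR[1]..THR[k-2]; lin is the
    linear expression without the constant term.
    """
    F = CDF[link]
    # Finite cutpoints CUTPT[1]..CUTPT[k-1]; the last one is CONSTANT.
    cutpoints = [constant + t for t in thr] + [constant]
    upper = [F(c + lin) for c in cutpoints] + [1.0]   # F at plus infinity
    lower = [0.0] + upper[:-1]                        # F at minus infinity
    return [u - l for u, l in zip(upper, lower)]

# Made-up values for a three-point scale (k=3, one THR parameter).
probs = response_probabilities(1.0, [-2.0], 0.3, "logit")
print(probs)
```

    For k=2 the list thr is empty, and the two probabilities are
    F(CONSTANT + lin) and its complement.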

    For F(x) = exp(x)/(1+exp(x)) (the logistic c.d.f.) and k=2, the usual
    logit-linear model for binary or binomial data comes out of it. For
    k>2, this model becomes a "union" of several such logistic models, in
    the sense that the marginal models obtained by dichotomisations of the
    ordered scale are ordinary logistic regression models for binary data.
    But the point is that the interesting parameters (the coefficients to
    covariates in the linear expression) are common to these marginal
    models; only the constant term (the cutpoint) does, of course, depend
    on the choice of dichotomisation.

    EXAMPLE. Consider a classical dose-response situation, where doses of
    some drug are given to animals. A standard model for analysis of this
    situation states that the probability of death for an animal depends
    logit-linearly on log(dose). A similar model could be used to describe
    dose-dependence of an event like "death or serious damage". This
    corresponds to the two possible dichotomisations of the ordered
    three-point scale

        1: No effect
        2: Seriously damaged, but not dead
        3: Dead

    The Logistic McCullagh model for the full three-level response, which
    can be fitted by a command of the form

        FITMCLOGIT RESP=1+LOGDOSE

    (RESP a factor on three levels) can be regarded as a way of
    incorporating both binary models in one, assuming that the
    parameter of interest (here the slope, i.e. the coefficient to
    log(dose)) is the same in the two models.

    Notice that FITMCPROBIT can be used to fit classical probit linear
    models for binary data (k=2). Similarly FITMCLOGIT can be used to fit
    logistic regression models for binary data - but FITLOGITLINEAR is
    much faster. FITMCCLOGLOG is useful in survival analysis, where this
    model comes out when survival times in a proportional hazards model
    are grouped.

    After any of the FITMC... commands, the command SAVEFITTED can be
    used, but only for storage of fitted values, i.e. the quantities
    referred to as "linear expressions" above. Residuals and normed
    residuals are not defined (or, rather, if they were they should be
    variates of length n*k, not n). The commands LISTPARAMETERS, ESTIMATE,
    SAVEPARAMETERS etc. will work in the obvious way. Similarly,
    TESTMODELCHANGE can be used after fit of a model and a reduced model.

    CONSTRAINT. Each response must occur at least once. If this is not the
    case, the response scale must be reduced by collapse of neighbour
    levels.

--------------------------------------------------------------------------------
    FITCLOGIT

    Conditional logistic regression.
................................................................................

    Syntax:

    For binary data:

    FITCLOGIT response/groups=modelformula

    For binomial data:

    FITCLOGIT response/groups=modelformula/totals

    Consider a logit linear model of the following form. A binary response y
    (values 0 and 1) is assumed to have independent elements with

             logit( P(y=1) ) = a(g) + general linear expression.

    In principle, this is an ordinary logistic regression model, but the
    factor level g (for group) has a particular role in the following. It is
    assumed to represent a classification of units in groups, for which the
    parameters a(g) are regarded as nuisance parameters. In analogy with
    well-known variance analysis concepts, we may think of the groups as
    blocks, and the conditional analysis performed is similar to the
    intra-block analysis.

    Let S(g) denote the sum of the responses y(i) over all units in group g.
    It is easy to prove that the model obtained by conditioning on these
    group sums has a likelihood which does not depend on the nuisance
    parameters a(g). This is an exclusive property of the logit-linear
    model, which is not shared e.g. by probit-linear models or other
    generalised linear models for binary data.
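
    The cancellation of a(g) can be checked numerically for a single
    group. In the Python sketch below (hypothetical function names, made-up
    numbers, not ISUW code), the conditional probability of the observed
    responses given their sum is computed in two ways: directly from the
    linear expressions without a(g), and from the full logit-linear model
    for an arbitrary value of a(g).

```python
import math
from itertools import combinations

def conditional_probability(y, lin):
    """P(y | sum of y) in one group; lin are the linear expressions without a(g)."""
    s = sum(y)
    numerator = math.exp(sum(l for yi, l in zip(y, lin) if yi))
    # Sum over all response patterns in the group with the same sum s.
    denominator = sum(math.exp(sum(lin[i] for i in subset))
                      for subset in combinations(range(len(y)), s))
    return numerator / denominator

def conditional_probability_full_model(y, lin, a):
    """The same probability derived from the unconditional model with a(g) = a."""
    def joint(pattern):
        prob = 1.0
        for yi, l in zip(pattern, lin):
            p = 1.0 / (1.0 + math.exp(-(a + l)))   # inverse logit
            prob *= p if yi else 1.0 - p
        return prob
    n, s = len(y), sum(y)
    patterns = [[1 if i in subset else 0 for i in range(n)]
                for subset in combinations(range(n), s)]
    return joint(y) / sum(joint(pattern) for pattern in patterns)

# Made-up group of four units, two of them responding 1.
y = [1, 0, 0, 1]
lin = [0.2, -0.5, 1.0, 0.3]
direct = conditional_probability(y, lin)
for a in (-2.0, 0.0, 3.7):
    print(a, conditional_probability_full_model(y, lin, a), direct)
```

    The two computations agree for every value of a, up to rounding.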

    An important special case occurs in connection with case-control
    studies, where each case (response y=1) is matched with a given number
    of controls (y=0). The controls are selected at random from a large
    population in such a way that they match the case with respect to
    characteristics that are not to be analyzed in the context (age, sex,
    ...). Considering the grouping into case-control groups, and
    conditioning on the corresponding sums (which are all 1), we obtain a
    model where the group parameters (and, in turn, all parameters
    representing effects of matched factors) disappear.

    "response" must be the name of a variate, containing the binary
    responses (0/1). In case of binomial data (i.e. when the binary
    responses are aggregated in covariate groups within the "conditioning
    groups"), response must contain the relative frequencies, and the name
    of the variate holding the binomial totals must be given after a slash
    after the model formula.

    "groups" must be the name of a factor or variate of the same length as
    response. The levels/values of this must be increasing, and they
    determine the groups. Typically, this vector could hold the group
    numbers. In a case-control data set, where the case occurs first in
    each group, these values can be obtained by cumulation of the
    responses. But the values are actually irrelevant, only their order
    matters. A positive increase means that a new group begins, a proper
    decrease must never occur.
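
    The cumulation mentioned above can be illustrated as follows, here in
    Python with NumPy rather than in ISUW. The data are made up: three
    case-control groups where the case comes first in each group.

```python
import numpy as np

# 1 for cases, 0 for controls; each group starts with its case.
case_ind = np.array([1, 0, 0, 1, 0, 1, 0, 0, 0])

# Cumulated responses: the value increases by one whenever a new case,
# and thereby a new group, begins.
ccgroup = np.cumsum(case_ind)
print(ccgroup.tolist())   # [1, 1, 1, 2, 2, 3, 3, 3, 3]
```

    The resulting values never decrease, as required for the grouping
    vector.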

    Variates and factors in the model formula must be of the same length
    as response and groups. The string "modelformula" has exactly the same
    form and interpretation as in a call to FITLOGITLINEAR, except that
    the conditioning factor must be given after a slash before the
    equality sign, and an offset is not allowed. A typical call in
    connection with case-control studies might look like

        FITCLOGIT CASE_IND/CCGROUP=SOCGR+EXPOSURE

    Here, CASE_IND is a variate with the value 1 for cases, 0 for controls.
    CCGROUP is a variate or factor with increasing values/levels, holding
    the number of the case control group. SOCGR could be a factor,
    containing some information about social covariates, and EXPOSURE could
    be a covariate holding the exposures for some suspected toxic matter or
    whatever. Usually relevant effects like AGE, SEX, COHORT etc. can be
    left out, if taken into account by the matching. If they are included in
    the model formula, the corresponding parameters will be set to zero, and
    so will the intercept parameter corresponding to a constant term 1, if
    included.

    Groups, in which all units respond 1, or all units respond 0, will
    obviously not contribute to the likelihood.

    Restrictions are taken into account in the obvious way. Units excluded
    are simply excluded from their group, and if this results in a "trivial
    group" in the above sense, the entire group is ignored. However, the
    grouping factor (or variate) GROUP must still be sorted, also as regards
    the missing values.

    After a call to FITCLOGIT, SAVEFITTED can be used for storage of fitted
    values (but not residuals and normed residuals). However, these fitted
    values are not very useful in themselves, because they do not contain
    the contributions of the conditioning factor or effects confounded with
    it. Exact fitted values can not be easily obtained (cf. the problems
    with computation of a mean in a non-central hypergeometric
    distribution), but a good approximation can be obtained as follows. Use

        SAVEFITTED FIT

    and after this, fit a logit linear model (unconditional) with the
    conditioning group factor as the only explanatory variable and with the
    variate FIT as offset, like

        FITLOGIT Y-FIT=GROUP/M
        SAVEFITTED NEWFIT
        NEWFIT=M*NEWFIT

    After this, NEWFIT will contain a good approximation to the expectations
    of the binomial observations under the estimated model.

    CONSTRAINT. The number of positive responses in a group must not exceed
    255.

--------------------------------------------------------------------------------
    FITCRASCH

    Conditional estimation in the Rasch model.
................................................................................

    Syntax:  FITCRASCH response/sgsizes=modelformula


    In its simplest form, the model considered here is the logit additive
    model for a two-way table of binary responses (referred to as the full
    Rasch model below), stating that the probability of a positive response
    is additive on the logit scale. That is, if y(row,col) denotes the
    binary response in the (row,col)'th cell,

        logit( P( y(row,col) = 1 ) )  =  alpha(row) + beta(col).

    The usual maximum likelihood estimates for column parameters are known
    to have bad asymptotic properties when the number of rows (and thereby
    the total number of parameters) tends to infinity. This problem
    disappears when the conditional estimates, given the row sums, are used
    instead.

    More generally, we are considering a special case of the conditional
    logistic regression model (see above) defined by the following two
    properties concerning the groups defined by the conditioning factor
    (here the rows of our table):

        (1) The groups are equally sized

        (2) All covariates are functions of the internal unit number in
            the group.

    Or, in other words: Our vector of binary responses can be set up in a
    two-way table such that the sums conditioned on are the row sums, and
    such that the covariates in the logistic model (disregarding those that
    are "conditioned away") are columnwise constant.

    In particular, the largest model satisfying this is the "full Rasch
    model", the model with a free parameter for each column.

    In this context, rows are often regarded as subjects and columns as
    items. The idea is that each subject responds 0 or 1 to each item, and
    the conditioning taking place here is on subject sums or subject
    "scores".

    The command FITCLOGIT is unnecessarily slow for estimation of these
    models, because many quantities that need only be computed once in each
    iteration will be computed once for each subject in each iteration.

    Under the above assumptions, the conditional likelihood turns out to
    depend only on the item totals (i.e. sums of responses for each item)
    and the sizes of the score groups, i.e. the number of subjects with
    1,2,...,k "correct" answers. In the command (see syntax description
    above)

    "response" is a variate of length k, holding the item totals.

    "sgsizes" is a variate of the same length k, holding the sizes of
    score groups. Its first value must be the number of subjects with
    score 1, its second value the number of subjects with score 2, etc.
    The number of individuals with score 0, i.e. no correct answers, is
    left out because it is irrelevant. In fact, the last value (the number
    of subjects with all answers correct) is also irrelevant, but it is
    included for cosmetic reasons (to keep all variates of the same
    length) and to allow for the check of consistency mentioned below.

    The model formula is, in the simplest case (the full Rasch model), a
    factor of the same length k with distinct levels 1,...,k. But it may
    also take the more general form of a model formula in variates/factors
    of length k.

    The command takes restrictions into account in the following sense: If a
    unit is missing, it means that the corresponding item is left out. Think
    of a table where a column is deleted. However, this will change the row
    sums, and accordingly one will have to change the score group sizes.
    Entries in the vector of score group sizes are never, in any sense,
    regarded as missing. This means that you can not remove an item from the
    analysis merely by excluding the unit number; you must also modify the
    variate of score group sizes accordingly.

    In practice, a more relevant kind of restriction concerns the fit
    of the Rasch model to a subset of the set of subjects. Notice that
    this can be done here, but it has nothing to do with restrictions on
    the formal set of units, which is the set of items. The values of
    "response" and "sgsizes" are sums over the set of subjects, and any
    change of the set of subjects can be performed by accordingly changing
    the values of these variates.

    EXAMPLE. Consider the table

                  item    1  2  3   sum

           subject   1    1  1  0     2
                     2    0  1  0     1
                     3    0  1  0     1
                     4    1  1  0     2
                     5    1  1  0     2
                     6    0  0  1     1
                     7    0  0  1     1
                     8    1  1  1     3
                     9    0  0  0     0
                    10    1  1  0     2

                 sum      5  7  3


    Let ITEM be a factor of length 3 with three levels, SUM and SG_SIZE
    variates of length 3 with values given by

        ITEM     SUM   SG_SIZE
           1       5         4
           2       7         4
           3       3         1

    Then

        FITCRASCH SUM / SG_SIZE = ITEM

    will fit a full Rasch model.

    There is an obvious check of consistency, which in the above example
    takes the form

      Total sum of responses =  5 + 7 + 3  =  1*0 + 4*1 + 4*2 + 1*3

    This check is performed by FITCRASCH, and the command is interrupted
    with an error message if the check fails.
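
    The two variates and the consistency check can be reproduced outside
    ISUW. The following is a sketch in Python (not ISUW code), assuming
    that the raw 0/1 table from the example is available as a list of
    rows; the names "table", "item_totals" and "sg_sizes" are chosen
    here for illustration only.

```python
# Responses from the example table: one row per subject, one column per item.
table = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
k = len(table[0])  # number of items

# Item totals (column sums): the values of the "response" variate.
item_totals = [sum(row[j] for row in table) for j in range(k)]

# Sizes of the score groups for scores 1..k: the values of "sgsizes".
scores = [sum(row) for row in table]
sg_sizes = [scores.count(s) for s in range(1, k + 1)]

# Consistency check: the grand total counted by items and by scores.
total_by_items = sum(item_totals)
total_by_scores = sum(s * n for s, n in zip(range(1, k + 1), sg_sizes))

print(item_totals, sg_sizes, total_by_items == total_by_scores)
```

    The score 0 group does not enter the check, since it contributes
    nothing to the total.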

    CONSTRAINT. The number of items must not exceed 255.

--------------------------------------------------------------------------------
    FITNEGBIN

    Estimation in log linear models for negatively binomially
    distributed counts.
................................................................................

    Syntax: FITNEGBIN response=modelformula[/weight] [initalpha]

    The negative binomial distribution is the distribution on {0,1,2,...}
    with point probabilities
                                           a   y
          P(Y=y)  =  (a+y-1 over y)   (1-p)   p         ( a > 0 ).

    (with an obvious notation '( ... over ... )' for binomial coefficients).
    For integer values of the parameter a (called ALPHA in procedure output)
    this is the distribution of the waiting time to (or rather, the number
    of non-successful outcomes before) the a'th success in a sequence of
    independent identical binary experiments with probability 1-p of
    "success". For arbitrary a>0, the negative binomial distribution can be
    characterised as a mixture of Poisson distributions with respect to a
    Gamma distribution in the following sense. If Y is Poisson distributed
    with a random parameter lambda, which is drawn from a Gamma distribution
    with form parameter a and scale parameter b, then the resulting
    distribution of Y is negative binomial with the same a (the form
    parameter of the Gamma distribution takes the role of the a in the
    negative binomial) and p=b/(1+b).

    The last interpretation justifies the use of negative binomial models in
    situations where the usual log linear Poisson models fail due to
    overdispersion. The mean in the negative binomial distribution is
    m=a*p/(1-p), the variance m*(1+m/a) > m.
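
    The formulas for the point probabilities, the mean and the variance
    can be checked numerically. The following is a sketch in Python (not
    ISUW code); the values of a and p are arbitrary illustrations, and
    the function name "negbin_pmf" is introduced here only.

```python
import math

def negbin_pmf(y, a, p):
    """P(Y=y) = (a+y-1 over y) * (1-p)**a * p**y, for any real a > 0."""
    log_coef = math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
    return math.exp(log_coef + a * math.log(1 - p) + y * math.log(p))

a, p = 2.5, 0.4      # illustrative values only
ys = range(400)      # the tail beyond 400 is negligible for these values
probs = [negbin_pmf(y, a, p) for y in ys]

total = sum(probs)
mean = sum(y * q for y, q in zip(ys, probs))
var = sum((y - mean) ** 2 * q for y, q in zip(ys, probs))

m = a * p / (1 - p)
print(total, mean, m, var, m * (1 + m / a))
```

    The binomial coefficient is written in terms of the log Gamma
    function, which is what makes non-integer values of a possible.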

    The simplest kind of models (without a "weight") that can be estimated
    by the command FITNEGBIN can be characterised as follows. Each
    (nonnegative integer) observation y[i] has a negative binomial
    distribution. The parameter a is common to all observations, the
    parameter p = p[i] depends logit-linearly on background
    variates/factors, specified as usual by a model formula. Thus, the mean
    of the i'th observation becomes

             E(Y[i]) = a * exp( par1*X(i,1) + par2*X(i,2) + ... ) ,

    which is of the usual log-linear form. Models of this kind are fitted by
    commands like (in case of a model with a constant term and a single
    covariate)

        FITNEGBIN Y=1+X
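
    The step from a logit-linear p to a log-linear mean is that
    p/(1-p) equals the exponential of the linear expression. A short
    numerical illustration in Python (not ISUW code, with arbitrary
    values of a and of the linear expression "eta"):

```python
import math

a = 2.0      # common parameter ALPHA (illustrative value)
eta = 0.3    # value of par1*X(i,1) + par2*X(i,2) + ... for one unit

p = 1.0 / (1.0 + math.exp(-eta))     # p is logit-linear: logit(p) = eta
mean_from_p = a * p / (1.0 - p)      # mean of the negative binomial
mean_loglinear = a * math.exp(eta)   # the log-linear form given above

print(mean_from_p, mean_loglinear)
```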


    "WEIGHTED" MODELS.

    A useful generalisation is the following. Rather than assuming the
    parameter a to be the same for all units, we assume that there are
    parameters a[i] = a*n[i], proportional to a given variate n. This is
    typically the case if each y[i] has come out by summation of i.i.d.
    counts for n[i] individuals. If these counts (on a nonobservable
    "micro-level") follow a model of the simpler kind described above, a
    model with such proportional parameters a[i]=a*n[i] comes out of it. The
    interpretation of the unknown parameter ALPHA (=a) is exactly as before
    (but on the micro-level). Since the variate n here has a role very
    similar to a weight variate in a generalised linear model, the syntax
    for this has been chosen such that the "weight" variate should follow
    the model formula, separated by a slash, like

        FITNEGBIN Y=1+X/N

    Notice, however, that the variate N here is not merely a weight occurring
    directly in the summation of the log likelihood function. Moreover, this
    kind of "aggregation" over "micro-units" is not quite as simple as the
    corresponding concept in the log-linear Poisson models. In the Poisson
    case, this is simply a sufficient reduction. For the present kind of
    models, the aggregation is not a sufficient reduction, but the marginal
    model for the aggregated data set is a similar model, due to the
    convolution property of the negative binomial distribution.
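
    The convolution property referred to here says that a sum of n
    independent negative binomial counts with parameters (a,p) is
    negative binomial with parameters (n*a,p). The following Python
    sketch (not ISUW code, with arbitrary illustrative values) checks
    this by direct convolution of the point probabilities.

```python
import math

def negbin_pmf(y, a, p):
    """Point probability of the negative binomial with parameters a and p."""
    log_coef = math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
    return math.exp(log_coef + a * math.log(1 - p) + y * math.log(p))

a, p, n = 0.7, 0.3, 3   # micro-level parameters and group size (illustrative)
top = 30                # compare the probabilities of 0, 1, ..., top

single = [negbin_pmf(y, a, p) for y in range(top + 1)]

# Distribution of the sum of n independent counts, by repeated convolution.
# Probabilities of values <= top only involve terms <= top, so the
# truncation does not affect them.
dist = single[:]
for _ in range(n - 1):
    dist = [sum(dist[j] * single[y - j] for j in range(y + 1))
            for y in range(top + 1)]

direct = [negbin_pmf(y, n * a, p) for y in range(top + 1)]
max_diff = max(abs(u - v) for u, v in zip(dist, direct))
print(max_diff)
```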

    Sometimes the iterative estimation procedure will be interrupted by an
    error message stating that the information matrix is not positive
    semi-definite. Typically, this happens if ALPHA has become too large.
    The default action of FITNEGBIN is to take ALPHA=1 as the starting
    value. If this problem appears, give an initial value of ALPHA as the
    last parameter, like

        FITNEGBIN Y=1+X/N 0.01

    An initial value of ALPHA can also help if ALPHA becomes negative
    during the iterative maximisation (which results in an error message).
    If there is no overdispersion at all, convergence may fail. The result
    should not be trusted in this case; use a Poisson model (FITLOGLINEAR)
    instead. Or, if there is some overdispersion, use FITNONLINEAR (see
    one screenful below).

    Restrictions are taken into account in the obvious way. Missing values
    of variates among the units present must not occur.


    WARNINGS, LIMITATIONS.

    Do not rely too much on the standard deviation estimated for the
    parameter ALPHA (called a above). The log likelihood is not well
    approximated by a quadratic function in this parameter. Moreover, the
    test for ALPHA=0 is irrelevant. The log linear Poisson model corresponds
    to the value plus infinity of ALPHA. To test for ALPHA = plus infinity
    (no overdispersion), fit the corresponding log linear Poisson model
    (with log of the "weights", if present, as an offset) and look at the
    test against the full model. Do not use the negative binomial model when
    overdispersion is not present - what happens is simply that the estimate
    of ALPHA becomes very large, so that the model is almost a Poisson model.

    The computation of the log likelihood and its derivatives involves a
    summation from 1 to y for each unit. Thus, for large observations (e.g.
    y > 1000) the procedure is slow. An alternative (if you insist on a
    variance function of the same shape as for the negative binomial) is to
    use

        FITNONLINEAR modelformula exp(.) exp(.) .*(1+./a)

    trying with different values of a until the residual plot is OK. You can
    also handle the weighted case by FITNONLINEAR (use variance function
    .*(1+./(a*n)), where n is the variate of "weights").

--------------------------------------------------------------------------------
    LISTPARAMETERS

    Lists parameter estimates and their estimated standard deviations.
................................................................................

    No parameters.

    After any model fit command except FITANOVA, this command lists the
    parameter estimates, their estimated standard deviations and the T- or
    approximate U-tests for hypotheses of the form "parameter=0".

    There is one parameter for each column of the model matrix, but some
    of these are usually set to zero due to overparameterisations (see the
    description of FITLINEARNORMAL). For this reason, these tests, given
    by the two last columns produced by LISTPARAMETERS, should be
    interpreted with some care.

    EXAMPLE. In a simple one-way situation, after

        FITLIN Y=F
        LISTP

    the T-tests reported are for hypotheses of the form "mean in group f
    equals zero", which is rarely relevant. After fit of the same model
    (over-) parameterised by

        FITLIN Y=1+F
        LISTP

    the T-tests reported correspond to pairwise comparisons of each
    F-level with the last F-level (which is sometimes relevant) - except
    for the first line labelled "CONSTANT", which returns the test for
    "mean in last group equals zero" (!).

    Notice that the last form "Y=1+F" is the relevant one, if the test for
    "no effect of F" is to be derivable from the analysis of variance table
    (for FITLINEARNORMAL and FITNONLINEAR only).

--------------------------------------------------------------------------------
    TESTMODELCHANGE

    Likelihood ratio test for reduction/extension given by the two last
    models fitted.
................................................................................

    Syntax: TESTMODELCHANGE [ test-statistic [p-value]]

    For log-linear, logit-linear models and many other models, for which an
    approximate chi-square test on the likelihood-ratio test statistic is
    appropriate, TESTMODELCHANGE performs a simple subtraction of the
    log-likelihoods from the two latest model fits and a similar computation
    of the change in number of parameters, and computes the likelihood-ratio
    statistic and the relevant tail probability in the approximating
    Chi-square distribution.
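
    The arithmetic behind this test is simple enough to be written out.
    The sketch below is in Python (not ISUW code). It assumes the usual
    definition of the likelihood-ratio statistic as twice the difference
    of the log-likelihoods; the log-likelihoods and parameter counts are
    hypothetical, and the function "chi2_upper_tail" is introduced here
    for illustration.

```python
import math

def chi2_upper_tail(x, df):
    """Upper tail probability of the chi-square distribution with df
    degrees of freedom, via the series for the incomplete Gamma function."""
    if x <= 0:
        return 1.0
    s, h = df / 2.0, x / 2.0
    term = math.exp(s * math.log(h) - h - math.lgamma(s + 1.0))
    lower = term
    k = 1
    while term > 1e-16 * lower:
        term *= h / (s + k)
        lower += term
        k += 1
    return max(0.0, 1.0 - lower)

# Hypothetical results of the two latest fits (not ISUW output).
loglik_large, npar_large = -123.4, 5
loglik_small, npar_small = -127.9, 3

lr_stat = 2.0 * (loglik_large - loglik_small)
df = npar_large - npar_small
p_value = chi2_upper_tail(lr_stat, df)
print(lr_stat, df, p_value)
```

    The series is adequate for moderate values of the test statistic;
    for very large values a library routine is preferable.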

    For linear normal models fitted by FITLINEARNORMAL, the relevant F-test
    is performed.

    For nonlinear models fitted by FITNONLINEAR, the approximate F-test,
    based on the weighted residual sums of squares from the last two
    models, is performed. Notice that this is not always reliable, since
    the weights may have changed if there is a non-constant variance
    function involved. The tests given in the approximate ANOVA table are
    probably more reliable.

    If the first parameter "test-statistic" is specified it must be the
    name of a vector of length 1 or a valid vector name which is not in
    use. In the latter case it becomes a vector of length 1, and the value
    of the (chi-square or F) test statistic is stored in it. Similarly, if
    a second parameter is specified, the P-value (tail probability) of the
    test is stored in this. To keep only the P-value but not the test
    statistic, write the first parameter as an asterisk.

    The command does not work after FITANOVA.

    WARNING. It is a requirement - and mainly your own responsibility - that
    the test makes sense. In particular, the two last models fitted must be
    of the same type (log-linear, logistic or whatever), the number of
    observations must be the same (no restrictions changed in between,
    weights and offsets unchanged). For non-linear models, the three
    functions specifying such a model must be the same for the last two
    models, for Cox models the stratifying factor must not have changed,
    etc. etc. In brief, one of the two last models fitted must be a submodel
    of the other.

--------------------------------------------------------------------------------
    SAVEFITTED

    Computes fitted values, residuals and normed residuals after a model fit
    command.
................................................................................

    Syntax: SAVEFITTED fitted [residuals [normedresiduals]]

    After FITLINEARNORMAL, FITLOGLINEAR, FITLOGITLINEAR and FITNEGBIN, this
    command saves

       -  Fitted values ( = estimated means of observations)

       -  Residuals ( = differences between observations and fitted values)

       -  Normed residuals ( = residuals divided by estimated standard
                               deviations)

    The three parameters must be unused vector names or names of variates of
    length equal to the number of units in the last model fit directive. An
    asterisk * or a pseudoblank | (or simply omission, for the parameters
    coming last) means that the variate is not to be computed.

    EXAMPLE.

        SAVEF * RES

    saves residuals in RES (and creates RES, if required), but fitted values
    and normed residuals are not computed.

    Normed residuals are computed without correction for the fact that the
    corresponding observation contributes to the estimation of parameters.
    Thus, normed residuals are typically less dispersed than i.i.d.
    observations from a normalised normal. For normal linear models, this is
    emphasised by the fact that the square sum of the normed residuals will
    always equal the number of observations minus the number of parameters
    estimated. However, this means that an extremely large normed residual
    can be taken as a (conservative but) safe indication of an outlying
    observation. For more exact outlier detection in the normal linear
    case, use SAVENORMEDRESIDUALS.
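
    The statement about the square sum of the normed residuals can be
    illustrated by a small computation. The following is a sketch in
    Python (not ISUW code) for a straight-line regression on made-up
    data, following the definitions of fitted values, residuals and
    normed residuals given above.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up data
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
n, npar = len(x), 2                   # parameters: intercept and slope

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Estimated standard deviation from the residual sum of squares.
sd = math.sqrt(sum(r * r for r in residuals) / (n - npar))
normed = [r / sd for r in residuals]

square_sum = sum(r * r for r in normed)
print(square_sum, n - npar)
```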

    Restrictions are not obeyed by SAVEFITTED, except in the obvious sense
    that if the last model fit was made under restrictions, the parameter
    estimates used by SAVEFITTED will be influenced by this. But the
    fitted values (and also residuals and normed residuals) are computed
    for non-present observations in exactly the same way as they are for
    the present observations. This actually means that the fitted values
    corresponding to observations that were not present when the model was
    fitted can be regarded as predictions of these "new" observations.
    This makes the command SAVEFITTED useful for many different purposes,
    such as cross-validation, replacement of missing observations and
    prediction in time series.

    SAVEFITTED can also be used after FITCOXMODEL, FITMC... , FITCLOGIT and
    FITCRASCH, but only for storage of fitted values (residuals and normed
    residuals are not defined). Here, "fitted value" means "estimated linear
    expression", not "estimated mean of observation". After FITNONLINEAR the
    command will work, but the resulting variates are not fitted values and
    residuals in the usual sense. See the description of FITNONLINEAR.

--------------------------------------------------------------------------------
    SAVEPARAMETERS

    Save estimated parameters from last model fit command.
................................................................................

    Syntax: SAVEPARAMETERS estimates [estsd [variance] ]

    After any model fit command but FITANOVA, this command saves the
    estimated parameters in a variate. There is one parameter for each
    column of the model matrix, as generated by the model formula. The
    order and interpretation of the parameters (which can be displayed by a
    LISTPARAMETERS command) follows from the rules described under
    FITLINEARNORMAL. Notice that parameters which are set to zero due to
    overparameterisation are also saved.

    If the second command parameter is specified, a variate holding the
    estimated standard deviations of estimates is also created.

    The parameters must be valid names of non-existing variates, or names
    of variates of the correct length (which is the number of columns of
    the model matrix, or the number of lines in the table produced by
    LISTPARAMETERS).

    The last parameter "variance" makes sense only after FITLINEARNORMAL
    and FITNONLINEAR. It must be the name of a variate of length 1, or a
    valid vector name which is not in use. The effect of this is that the
    estimate of the error variance (or the corresponding squared scale
    parameter, in case of a weighted model or a nonlinear model with
    variance function different from 1) is stored in this variate.

    To skip a parameter, write it as an asterisk. For example, to save only
    linear estimates and the estimated model variance, write e.g.

        SAVEPAR pars * var

    EXAMPLE. Suppose a log-linear model has been fitted by a command like

        FITLOGLIN COUNT-LOGSIZE=1+TREAT+SEX+...

    where TREAT is a factor on four levels. To compute and list standard
    approximate 95% confidence limits for the relative multiplicative
    effects of TREAT with level 4 as baseline, do something like this:

        SAVEP PARS1 SD1
        VAR PARS SD 4
        TRANSFER PARS1 2 5 PARS 1 4
        TRANSFER SD1   2 5 SD   1 4
        DEL PARS1 SD1
        ESTIMATE=EXP(PARS)
        LOWER=EXP(PARS-1.96*SD)
        UPPER=EXP(PARS+1.96*SD)
        DEL PARS SD
        LIST ESTIMATE LOWER UPPER

    Notice: This relies on the convention that the last parameter of TREAT
    is set to zero, because this is where the linear dependence of columns
    in the model matrix is met for the first time. The last line of the
    listing will have ESTIMATE=1 (=EXP(0)) and LOWER=UPPER=ESTIMATE (since
    SD=0). To select level 1 as the baseline level, begin (before the model
    fit command) by a "baselining" of that level, e.g. by

        TREAT=TREAT-1

    (setting the desired baseline level to 0, which implies that no "dummy"
    column is generated for that level), or make a permutation of the levels
    such that the desired baseline level becomes the last. See also the
    command ESTIMATE below.
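
    For readers who want to check the arithmetic of the confidence
    limits outside ISUW, here is a sketch in Python (not ISUW code). The
    estimates and standard deviations are hypothetical numbers, with
    level 4 as the baseline.

```python
import math

# Hypothetical estimates and standard deviations for TREAT levels 1..4 on
# the log scale. Level 4 is the baseline: its parameter and SD are zero.
pars = [0.42, -0.10, 0.25, 0.0]
sd = [0.15, 0.12, 0.20, 0.0]

rows = []
for b, s in zip(pars, sd):
    estimate = math.exp(b)            # relative multiplicative effect
    lower = math.exp(b - 1.96 * s)    # approximate 95% limits
    upper = math.exp(b + 1.96 * s)
    rows.append((estimate, lower, upper))

for level, row in enumerate(rows, start=1):
    print(level, [round(v, 3) for v in row])
```

    The limits are computed on the log scale and then transformed, so
    they are not symmetric around the estimate.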

--------------------------------------------------------------------------------
    ESTIMATE

    Outputs specified contrasts and their standard deviations. The command
    can be used after any model fit command.

................................................................................

    Syntax: ESTIMATE term1 [term2 [...]]

    or      ESTIMATE variate1 [variate2 [...]]

    In the first case, the parameters must be terms of the model formula,
    separated by blanks or plusses. For terms involving factors, all
    possible differences between parameter estimates are listed with their
    estimated standard deviations. For terms with variates only, the
    estimate of the regression coefficient and its standard deviation are
    given. If the term involves only a single factor, a simple plot
    showing how the estimates are positioned on the line is added. If more
    than 20 parameters are involved this is the only output you will get,
    since the list of pairwise comparisons is too long to be of any use.

    The second form can be used for estimation of quite general linear
    combinations of the parameters. variate1 , variate2 etc. must be of
    length equal to the number of linear parameters in the model,
    including those that are set to zero due to linear dependence (i.e.
    the number of lines written by LISTPARAMETERS), and the corresponding
    linear combinations (with the variate's values as coefficients) are
    estimated.

    EXAMPLE. If TREAT is a factor on 3 levels, the two ESTIMATE commands in
    the following program will give roughly the same output (except that
    the last one will not produce a plot)

        FITLIN Y=1+TREAT
        INCLUDEALL { required if restrictions on units 1..4 }
        COEFF1=[0 -1  1  0]
        COEFF2=[0 -1  0  1]
        COEFF3=[0  0 -1  1]
        ESTIMATE TREAT
        ESTIMATE COEFF1 COEFF2 COEFF3

    The two kinds of parameters can be mixed as desired. For example,

        ESTIMATE TREAT COEFF1 COEFF2 COEFF3

    would be OK in the above example.
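
    What the second form computes can be sketched as follows, in Python
    (not ISUW code). The sketch assumes that the standard deviation of a
    linear combination is obtained from the covariance matrix of the
    parameter estimates in the standard way; the estimates and the
    covariance matrix are hypothetical, and "contrast" is a name
    introduced here.

```python
import math

# Hypothetical estimates for CONSTANT, TREAT 1, TREAT 2, TREAT 3, where the
# last TREAT parameter is set to zero due to overparameterisation.
beta = [10.0, 1.5, -0.5, 0.0]

# Hypothetical covariance matrix of the estimates. The row and column of
# the parameter that is set to zero are zero.
cov = [
    [0.5, -0.5, -0.5, 0.0],
    [-0.5, 1.0, 0.5, 0.0],
    [-0.5, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

def contrast(coeff):
    """Estimate and standard deviation of sum(coeff[j]*beta[j])."""
    est = sum(c * b for c, b in zip(coeff, beta))
    var = sum(coeff[i] * coeff[j] * cov[i][j]
              for i in range(len(coeff)) for j in range(len(coeff)))
    return est, math.sqrt(var)

for coeff in ([0, -1, 1, 0], [0, -1, 0, 1], [0, 0, -1, 1]):
    print(coeff, contrast(coeff))
```

    The coefficient vectors have one entry for each line written by
    LISTPARAMETERS, including the parameter that is set to zero.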

    WARNING. The quantities estimated are defined in a simple way in
    relation to the model matrix, namely as either

        coefficients to the corresponding covariates (in case of a variate
        argument),

        differences between such coefficients (in case of an argument
        involving at least one factor), or

        linear combinations of such coefficients (in case of a variate
        argument with length equal to the number of parameters in the
        model).

    However, in relation to the model they are not always meaningful. For
    example, in a R(ow) x C(olumn) two-way setup,

        ESTIMATE R*C

    makes perfect sense after

        FITLINEAR Y=R*C   or
        FITLINEAR Y=1+R*C

    (where it performs all the pairwise comparisons of (r,c)-means), but
    not after

        FITLINEAR Y=1+R+C+R*C

    because the presence of main effect terms will destroy the simple
    interpretation of the differences between interaction parameters.
    Similarly,

        ESTIMATE R

    makes no sense at all after

        FITLINEAR Y=1+R+C+R*C

    - or perhaps we should say that the sense it makes is somewhat
    complicated. The quantities estimated would be differences between
    cell means in the last column of the two-way table.

    After FITANOVA, the command ESTIMATE can also be used, but here it
    activates a quite different procedure adapted to the case of a mixed
    model, where contrast variances can be sums of contributions from
    different error strata. The only arguments allowed are fixed terms
    from the model formula of the last FITANOVA command, and the estimates
    coming out of this are always the means of observations in the groups
    defined by this factor or product factor. The command outputs a table
    of means, together with information that enables you to compute
    standard deviations of simple contrasts (differences between means).
    In the simplest case (effects of equally replicated factors with no
    partially confounded random factors) the standard deviation is given
    explicitly. In more complicated situations you must compute it from
    the contributions to the variance from the different strata. Since the
    treatment structure is always specified by the maximal model formula,
    estimability of linear parameters is not taken into account by this
    form of the ESTIMATE command.

--------------------------------------------------------------------------------
    SAVENORMEDRESIDUALS

    Computes "studentized" (T-distributed) normed residuals and the
    corresponding tail probabilities after FITLINEARNORMAL and FITNONLINEAR.

................................................................................

    Syntax: SAVENORMEDRESIDUALS nres [pvalues]

    When SAVEFITTED is used after FITLINEARNORMAL, it computes normed
    residuals simply as residuals divided by the estimated standard
    deviation. When hunting outliers, a more relevant definition of normed
    residuals is the one that makes them T-distributed with ResDF-1
    degrees of freedom. A "studentized" residual can be computed by
    removal of the observation from the data set and fit of the model to
    the remaining observations. Or by extension of the model with a dummy
    that allows the observation to have its own, freely varying mean, and
    performing the T-test for the hypothesis that this term can be removed
    from the model. But a faster way of computing all these quantities
    goes as follows. Let ModelVariance*h(i) be the estimated variance of
    the i'th fitted value, computable as the double sum over (j1,j2) of
    the quantities

             Xmatrix(i,j1)*Xmatrix(i,j2)*ParameterCov(j1,j2).

    h(i) can also be interpreted as the i'th diagonal element of the
    orthogonal projection matrix for the linear subspace of means associated
    with the model. The (estimated) variance on the i'th residual r(i) is
    then

                      V(i) = ModelVariance*(1-h(i)).

    The i'th studentized residual can now be computed as

                                   r(i)/sqrt(1-h(i))
     NRES(i)  =    -------------------------------------------------- .
                   sqrt( ( SS(res) - sqr(r(i))/(1-h(i)) )/(ResDF-1) )
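
    The agreement between this shortcut and the direct approach (removal
    of the observation and fit of the model to the remaining
    observations) can be verified numerically. The following is a sketch
    in Python (not ISUW code) for a straight-line regression on made-up
    data; the function "fit_line" is introduced here for illustration.

```python
import math

def fit_line(x, y):
    """Least squares line: intercept, slope, mean of x, Sxx, residual SS."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, xbar, sxx, ss

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # made-up data
y = [1.2, 1.9, 3.1, 4.2, 4.8, 9.0, 7.1]
n = len(x)
res_df = n - 2

b0, b1, xbar, sxx, ss_res = fit_line(x, y)
r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]   # projection diagonal

# The shortcut formula from the text.
nres = [
    (r[i] / math.sqrt(1 - h[i]))
    / math.sqrt((ss_res - r[i] ** 2 / (1 - h[i])) / (res_df - 1))
    for i in range(n)
]

# Direct computation: drop unit i, refit, standardise the prediction error.
direct = []
for i in range(n):
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    c0, c1, xb, sx, ss = fit_line(xs, ys)
    s2 = ss / (res_df - 1)
    pred_var = s2 * (1 + 1.0 / (n - 1) + (x[i] - xb) ** 2 / sx)
    direct.append((y[i] - c0 - c1 * x[i]) / math.sqrt(pred_var))

max_diff = max(abs(u - v) for u, v in zip(nres, direct))
print(max_diff)
```

    The shortcut needs a single model fit, whereas the direct approach
    needs one fit for each observation.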

    A similar formula exists for the weighted case. The command
    SAVENORMEDRESIDUALS performs this computation and saves the result in
    the variate "nres" given as the first parameter. If this is an
    existing vector, it must be a variate of the correct length, otherwise
    it is declared as such.

    The second variate "pvalues", if specified, is also declared
    automatically, and the command will store in this the two-sided tail
    probabilities in the relevant T-distribution. A reasonable
    (conservative) criterion for an observation being "outlying" is that
    this tail probability is less than, say, 0.05 divided by the number of
    observations (because this ensures that an outlier is found in a
    correct model with probability at most 0.05).
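
    Applied to a variate of tail probabilities, the criterion amounts to
    the following. This is a sketch in Python (not ISUW code) with
    hypothetical tail probabilities.

```python
# Hypothetical two-sided tail probabilities, as stored in "pvalues".
pvalues = [0.41, 0.87, 0.0031, 0.22, 0.65, 0.09, 0.53, 0.78, 0.012, 0.34]

n = len(pvalues)
threshold = 0.05 / n   # 0.05 divided by the number of observations

# Unit numbers (counted from 1) with tail probability below the threshold.
outliers = [i + 1 for i, pv in enumerate(pvalues) if pv < threshold]
print(threshold, outliers)
```

    Notice that unit 9, with tail probability 0.012, would be pointed
    out by an unadjusted 0.05 limit but not by this criterion.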

    The command can also be used after FITNONLINEAR, but here the
    T-distribution must obviously be taken as an approximation. The
    "studentized residuals" produced in this case are those associated
    with the weighted regression performed in the last iteration.

    Restrictions are obeyed in the following sense: For non-present units
    normed residuals are set to zero and tail probabilities to one. In
    weighted models, weight=0 produces missing values for the
    corresponding studentized residuals (and P-values).

--------------------------------------------------------------------------------
    FITANOVA

    Analysis of variance in orthogonal designs.
................................................................................

    Syntax: FITANOVA response=fixedterms+[randomterms]

    where the brackets here are not "syntax brackets", they should
    actually be there if the model contains random effects.

    This command performs analysis of variance for models in orthogonal
    designs, including certain variance component models, closely following
    the exposition

          Tjur, T. (1984):
          Analysis of Variance Models in Orthogonal Designs
          International Statistical Review 52, pp. 33-81

    The theory given in that paper will not be repeated in full detail here.

    The syntax for FITANOVA is similar to that of FITLINEARNORMAL, with the
    following modifications:

    (1) A weight can not be specified.

    (2) Only factors, not variates, are allowed in the model specification on
        the right hand side of the equality sign.

    (3) Random effects in addition to the "unit-to-unit variation" (which is
        always assumed to be present) can be included in a final bracket.
        For example

          FITANOVA Y = 1 + ROW + COL + [ROW*COL]

        will estimate a model with fixed effects of ROW and COL (and a
        constant term), random ROW*COL effect (interaction) and a
        (mandatory) random UNIT effect (e.g. a measurement error, i.e.
        i.i.d. error terms as in a linear normal model).

    (4) The  model  specified  must  satisfy the following conditions, which
        -  apart from non-essential modifications - are those given in Tjur
        (1984):

        A) The  entire set of factors, occurring in the model specification,
           must be closed under the formation of minima. For example, for a
           balanced two-way table,

             FITANOVA Y = ROW + COL + [ROW*COL]

           will not work, because the minimum of ROW and COL (the trivial
           factor, represented by the constant term 1) is not present.

        B) The set of random factors (those occurring in the bracket) must
           be closed under the formation of minima, and they must all be
           balanced (i.e. they must group data into equally sized classes).

        C) Any two factors or product factors in the model must be
           orthogonal.
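
    To make the concept of a minimum concrete: we take the minimum of
    two factors to be the finest factor that is coarser than both, so
    that two units are on the same level of the minimum when they can be
    connected by a chain of units sharing a level of one of the two
    factors. The following Python sketch (not ISUW code, and not
    necessarily the method used by ISUW) computes it in this way; the
    function name "factor_minimum" is introduced here.

```python
def factor_minimum(f, g):
    """Levels of the minimum of the factors f and g (lists of equal length)."""
    n = len(f)
    parent = list(range(n))

    def find(i):
        # Root of the class containing unit i (with path compression).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Join every unit with the first unit seen on the same level of f,
    # and likewise for g.
    for factor in (f, g):
        first = {}
        for i, level in enumerate(factor):
            if level in first:
                union(first[level], i)
            else:
                first[level] = i

    # Number the resulting classes 1, 2, ... in order of appearance.
    labels, out = {}, []
    for i in range(n):
        root = find(i)
        labels.setdefault(root, len(labels) + 1)
        out.append(labels[root])
    return out

row = [1, 1, 2, 2, 3, 3]
col = [1, 2, 1, 2, 1, 2]
min_row_col = factor_minimum(row, col)        # complete two-way layout

plot = [1, 1, 2, 2, 3, 3, 4, 4]
block = [1, 1, 1, 1, 2, 2, 2, 2]
min_plot_block = factor_minimum(plot, block)  # plots nested in blocks

print(min_row_col, min_plot_block)
```

    In the first case the minimum is the trivial factor (the constant
    term 1), in the second it coincides with the coarser factor.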

    It follows from these assumptions that the set of factors occurring in
    the model specification constitutes an orthogonal design (Tjur 1984, p.
    41-42), and the set of fixed (non-bracketed) factors constitutes the
    maximal model formula specifying the treatment structure in this design.
    The analysis of variance table, output by FITANOVA, contains a line for
    each factor or product factor occurring in the model specification,
    including the random UNIT effect, with omission of square sums that
    are "structurally zero" (i.e. with zero degrees of freedom). The lines
    of the table are ordered by strata, each stratum containing the sums
    of squares for fixed effects in that stratum, with the sum of squares
    for the associated random factor as the last line, labeled "Residual".
    The lines for fixed effects give the F-tests for removal of the
    corresponding factors from the model. The "Residual" lines (for random
    effects) give the F-tests for removal of the corresponding random
    terms from the model, whenever this is a legal hypothesis (see Tjur
    1984, p. 58).

    The estimated eigenvalues of the covariance matrix are found in the
    analysis of variance table as the "Residual" MS's. In the list of
    estimated variance components, the column SD gives the estimated
    standard deviations of these estimates, based on their interpretation as
    linear combinations of the Chi square distributed, independent
    eigenvalue estimates. Normal confidence limits, based on these standard
    deviations, are only reliable for large residual degrees of freedom in
    the corresponding stratum and all higher strata.

    The estimate "Total variance" (the sum of the variance components, i.e.
    the variance on a single observation) is similarly equipped with a
    standard deviation, computed in the same way.

    Missing minima of (product) factors, which can not be generated in a
    simple way, can be constructed by the command CONSTRUCTMINIMUM, see
    below.

    Estimation of fixed effects can be performed by the command ESTIMATE,
    see approximately 8 screenfuls above. LISTPARAMETERS, SAVEFITTED,
    SAVEPARAMETERS and SAVENORMEDRESIDUALS can not be used.


    PSEUDO STRATA.

    The formal requirement that the set of random factors should be closed
    under the formation of minima implies, e.g., that a model for a two-way
    table with random row and column effects can not be estimated by

        FITANOVA Y=1+[ROW+COL+ROW*COL]

    since the minimum 1 of ROW and COL is not among the random effects. You
    will have to use

        FITANOVA Y=1+[1+ROW+COL+ROW*COL]

    instead. However, this implies that the eigenvalue parameter for the
    "pseudo" stratum CONSTANT STRATUM is set to zero, because it cannot be
    estimated. Accordingly, the variance of the grand mean is formally
    estimated as zero. The procedure ESTIMATE, however, assumes that the
    variance components for such strata should be set to zero, so the
    variance of the grand mean in this case (in output from an ESTIMATE 1
    command) will be computed from the contributions to the variance from
    the three "non-pseudo" strata. This may result in a negative value for
    the estimated variance of the grand mean. More generally, variance
    components corresponding to strata with zero degrees of freedom for the
    residual will be skipped when contrast variance estimates are computed,
    and this may result in negative values of these estimates.

    The response variate must not contain missing values, and the factors
    involved must not have units on level zero. Restrictions are taken into
    account in the obvious way. However, the exclusion of units with missing
    or outlying responses will usually destroy orthogonality and
    balancedness. To obtain an approximate solution, replace (a few) missing
    values or outliers with suitable "typical values", e.g. group averages
    or predicted values from a linear model.

    CONSTRAINTS. The maximal number of levels for a product factor is 2049.
    The maximal number of fixed terms is 40, the maximal number of random
    terms is 10, and no term may be a product of more than 8 factors.

--------------------------------------------------------------------------------
    CONSTRUCTMINIMUM

    Construction of the minimum of two (products of) factors.
................................................................................

    Syntax: CONSTRUCTMINIMUM prodfac1 prodfac2 minimum

    CONSTRUCTION OF PSEUDO FACTORS.

    Sometimes the closedness-under-minima constraint in FITANOVA enforces
    the inclusion of fixed-effect factors which are not statistically
    meaningful. Such factors are called pseudo-factors. Very often such
    factors will not be given in the data set, so you will have to construct
    them. The minimum of two given (products of) factors can be constructed
    by the command CONSTRUCTMINIMUM. The three parameters are strings. The
    first two must be names of existing factors or products of such, the
    last must be a valid identifier of a non-existing vector. The minimum of
    the factors given as parameter 1 and 2 is stored in a factor of name
    parameter 3.

    CONSTRAINTS. Since the result is stored as a single factor, the number
    of levels for the minimum must not exceed 255. Restrictions are NOT
    taken into account.
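
    ILLUSTRATION (Python, not ISUW code). As a rough sketch of what
    "minimum" means here, assuming the usual definition as the finest
    factor that is coarser than both given factors: two units share a
    level of the minimum when they can be linked through a chain of units
    that agree on parameter 1 or on parameter 2. The function name
    factor_minimum is a hypothetical helper made up for this illustration;
    it is not the algorithm used by CONSTRUCTMINIMUM. For a product of
    factors, pass tuples of levels as labels.

```python
def factor_minimum(f1, f2):
    """Return level numbers (1, 2, ...) of the minimum of two factors.

    f1 and f2 are equally long sequences of level labels. Units linked
    through shared levels of f1 or f2 end up on the same level.
    """
    if len(f1) != len(f2):
        raise ValueError("factors must have the same length")
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each unit links its level of f1 with its level of f2.
    for a, b in zip(f1, f2):
        root_a, root_b = find(("f1", a)), find(("f2", b))
        parent[root_a] = root_b

    # Number the connected components in order of first appearance.
    levels = {}
    result = []
    for a in f1:
        root = find(("f1", a))
        result.append(levels.setdefault(root, len(levels) + 1))
    return result


row = [1, 1, 2, 2]
col = [1, 2, 1, 2]
print(factor_minimum(row, col))
```

    For a complete two-way table, as in the call above, all units are
    linked and the result is the constant factor, in agreement with the
    remark under PSEUDO STRATA that the minimum of ROW and COL is 1.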

--------------------------------------------------------------------------------
    BARTLETT

    Bartlett's test for variance homogeneity.
................................................................................

    Syntax: BARTLETT variate factor

    The variate and the factor must be of the same length. Bartlett's test
    for equality of the variances in the "variate"-samples in the one-way
    setup defined by "factor" is performed. Restrictions are taken into
    account, and empty groups or groups with a single observation are
    ignored. The variances in groups are listed, and Bartlett's test with
    bias correction is performed. If only two samples are present, the
    relevant F-test on the ratio between the two variances is also
    performed. Here, the right tail probability in the F-distribution is
    given for the ratio between the largest and the smallest variance.

    To check for constant variance across the groups determined by a
    factor in a linear model which is not just the one-way model
    determined by the factor, use BARTLETT with the variate of residuals
    as the first argument. This is a reasonable approximation when the
    number of observations is large compared to the number of parameters
    in the model.
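
    ILLUSTRATION (Python, not ISUW code). The sketch below computes the
    textbook form of Bartlett's statistic with the usual correction
    factor, skipping groups with fewer than two observations as described
    above. It is assumed, not confirmed, that this is the "bias
    correction" used by BARTLETT; the function name is hypothetical, and
    the P-value (a chi-square tail with k-1 degrees of freedom) is left
    out. A group with zero variance makes the logarithm fail.

```python
import math
from collections import defaultdict


def bartlett_statistic(values, groups):
    """Return (statistic, degrees of freedom) for Bartlett's test."""
    samples = defaultdict(list)
    for value, group in zip(values, groups):
        samples[group].append(value)

    dfs, variances = [], []
    for sample in samples.values():
        n = len(sample)
        if n < 2:
            continue  # empty or single-observation groups are ignored
        mean = sum(sample) / n
        variances.append(sum((x - mean) ** 2 for x in sample) / (n - 1))
        dfs.append(n - 1)

    k = len(dfs)
    if k < 2:
        raise ValueError("at least two groups with two observations needed")

    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df
    raw = total_df * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    # Correction factor that brings the statistic closer to chi-square.
    correction = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return raw / correction, k - 1


stat, df = bartlett_statistic([1, 2, 3, 0, 4, 8], ["a", "a", "a", "b", "b", "b"])
print(round(stat, 4), df)
```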

--------------------------------------------------------------------------------
    CORMAT

    Writes a table of correlation coefficients for a set of variates.
................................................................................

    Syntax: CORMAT variate1 variate2 [variate3 [...]] [stars]

    The variates must be of the same length. The procedure writes a table of
    correlations between the variates.

    The last (optional) parameter "stars" controls the printing of
    significance indicating stars. Without this parameter, no such
    printing takes place. If stars = * , a single star indicates
    significance on (two-sided) level 5%. If stars = ** , a double star in
    addition indicates significance on (two-sided) level 1%. For stars =
    *** , a triple star means significance on (two-sided) level 0.1%. To
    avoid line overflow, the number of decimals for the correlation
    coefficient becomes smaller when the number of stars is increased. The
    number of decimals after the point is 5 minus the number of stars. If
    you specify four or more stars, the number of decimals becomes 5 and
    no stars are printed, but a table of P-values is displayed instead.
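
    ILLUSTRATION (Python, not ISUW code). The rules for the "stars"
    parameter can be summarised as a small function. This is a sketch of
    the description above only; the function names are hypothetical, and
    two points are assumptions: that an omitted parameter gives 5
    decimals, and that "significance on level 5%" means P < 0.05 rather
    than P <= 0.05.

```python
def cormat_layout(stars):
    """Return (decimals, significance levels, show P-value table)."""
    if any(ch != "*" for ch in stars):
        raise ValueError("the stars parameter must consist of asterisks")
    count = len(stars)
    if count >= 4:
        # Four or more stars: full precision, no stars, P-values instead.
        return 5, [], True
    levels = [0.05, 0.01, 0.001][:count]
    return 5 - count, levels, False


def star_marks(p_value, levels):
    """Return the string of stars printed next to one correlation."""
    return "*" * sum(1 for level in levels if p_value < level)


decimals, levels, p_table = cormat_layout("**")
print(decimals, star_marks(0.003, levels))
```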

    In this context "significance", of course, refers to the distribution
    of the empirical correlation coefficient in the normal case when the
    theoretical correlation is zero.

    Restrictions are obeyed and missing values are treated as non-present.
    But for each pair of variates, only the missing values of the two
    variates involved are excluded, not the units corresponding to missing
    values for other variates in the list. Thus, if you want a proper
    (positive definite) empirical correlation matrix, EXCLUDEMISSING for
    all variates involved must be used before CORMAT.
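
    ILLUSTRATION (Python, not ISUW code). The difference between the
    pairwise exclusion performed by CORMAT and the exclusion of all
    incomplete units can be sketched as follows. Missing values are
    represented by None, and both function names are hypothetical.

```python
import math


def pairwise_correlation(x, y):
    """Correlation of x and y using only units where both are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


def complete_units(*variates):
    """Indices of units with no missing value in any of the variates."""
    return [
        i for i, unit in enumerate(zip(*variates))
        if all(value is not None for value in unit)
    ]


x = [1, 2, 3, None]
y = [2, 4, 6, 8]
z = [5, None, 7, 9]
print(pairwise_correlation(x, y))   # uses units 0, 1, 2
print(complete_units(x, y, z))      # only units 0 and 2 are complete
```

    Since each pair of variates may be based on a different set of units,
    the pairwise table need not behave like a correlation matrix computed
    from one common set of units.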

--------------------------------------------------------------------------------
    WILCOXON

    Wilcoxon (or Mann-Whitney) two sample test
................................................................................

    Syntax: WILCOXON variate factor level1 level2

    Performs Wilcoxon's nonparametric rank sum test for comparison of two
    empirical distributions, with the normal approximation to the
    distribution of the test statistic. The two samples of observations
    are defined as the values of the variate corresponding to units given
    by the two factor levels of the factor. Only present values are taken
    into account, and missing values are treated as non-present. Ties are
    corrected for by simple averaging, and the two extreme values of the
    test statistic, consistent with the ties, are also computed.
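
    ILLUSTRATION (Python, not ISUW code). The sketch below computes the
    rank sum for the first sample with averaged ranks in tie groups,
    together with the smallest and largest rank sums obtainable by
    ranking arbitrarily within tie groups. The normal approximation uses
    the standard mean and variance without tie or continuity correction;
    whether WILCOXON applies such corrections is not stated here, so this
    is an assumption. Function names are hypothetical.

```python
import math


def rank_sums(sample1, sample2):
    """Return (averaged, minimal, maximal) rank sum for sample1."""
    pooled = sorted([(x, 1) for x in sample1] + [(x, 2) for x in sample2])
    w_avg = w_min = w_max = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        size = j - i                       # number of tied values
        mine = sum(1 for k in range(i, j) if pooled[k][1] == 1)
        first = i + 1                      # lowest rank in the tie group
        w_min += sum(range(first, first + mine))
        w_max += sum(range(first + size - mine, first + size))
        w_avg += mine * (first + (size - 1) / 2)
        i = j
    return w_avg, w_min, w_max


def normal_approximation(w, n1, n2):
    """Return (z, two-sided P) for rank sum w of the sample of size n1."""
    n = n1 + n2
    mean = n1 * (n + 1) / 2
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))


print(rank_sums([1, 2, 2], [2, 3]))
```

    The averaged rank sum is always the mean of the two extremes, which is
    why the extremes give an impression of how much the ties matter.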

--------------------------------------------------------------------------------
    SPEARMAN

    Spearman's rank correlation test for independence
................................................................................

    Syntax: SPEARMAN variate1 variate2

    The two parameters must be names of variates of the same length, with
    at least two values present and no missing values present. Spearman's
    rank correlation (for the values present) is computed, and the
    approximate test for no dependence (assuming sqrt(n-1)*RankCor is
    standard normal) is performed. If ties are present, the result comes
    out as the average of the two extreme values obtainable by arbitrary
    ranking within tie groups (and these "extreme rank correlations" are
    reported also). As for WILCOXON (see above), this averaged test is
    conservative. Rejection of the hypothesis "no dependence" is reliable,
    but acceptance is, in principle, only possible when the two "extreme"
    tests agree on this.
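
    ILLUSTRATION (Python, not ISUW code). For data without ties, the
    computation described above can be sketched as follows. The function
    name is hypothetical, and the treatment of ties (averaging over the
    two extreme rankings) is deliberately not reproduced; the sketch
    simply refuses tied data.

```python
import math


def spearman_no_ties(x, y):
    """Return (rank correlation, z, two-sided P) for untied data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("two variates of equal length >= 2 are required")
    if len(set(x)) < len(x) or len(set(y)) < len(y):
        raise ValueError("ties are not handled in this sketch")
    n = len(x)
    rank_x = {value: rank for rank, value in enumerate(sorted(x), start=1)}
    rank_y = {value: rank for rank, value in enumerate(sorted(y), start=1)}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    r = 1 - 6 * d2 / (n * (n * n - 1))
    z = math.sqrt(n - 1) * r          # treated as standard normal
    p = math.erfc(abs(z) / math.sqrt(2))
    return r, z, p


r, z, p = spearman_no_ties([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 3), round(z, 3))
```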

--------------------------------------------------------------------------------
    SHOW

    Shows the contents of a file.
................................................................................

    Syntax: SHOW [filename, directory or path]

    When the command is used without parameters, the session's output file
    is shown. Here, some color effects are used to make the file more
    readable. For this reason, the output file should in general be kept
    short; otherwise it will take a long time to form the image. Do not
    save LISTings of very long data sets; they are useless anyway. If you
    really want to save such listings (for example as input to other
    programs), redirect the output to another file by an OUTFILE command,
    see 4 screenfuls below. Also, if you write programs with loops, use
    ECHO 0 and OUTFILE 0 before the loops, otherwise the output file will
    be filled up with command echoes and output from each loop. If, by
    mistake, you have created a very long output file, you can reset it by
    an OUTFILE * command (unless the file contains things that are
    important; in that case you will have to QUIT the session and start a
    new one, or copy and paste to an "emergency backup file" which you
    create by an EDIT command).

    If the parameter is a valid directory name or a path, a file selection
    menu appears. For example,

        SHOW *.*     { or just SHOW . }

    allows you to search among all files in the present directory (i.e. the
    working directory), whereas

        SHOW ..\proj2\*.JPG

    can be used if you want to have a look at the JPG pictures in the
    sibling directory proj2.

    If the parameter is the name of an existing file, this file is chosen.

    The action of the SHOW command depends on the file extension:

        .OUT : The file is shown as an ISUW output file (to view output
        files from earlier ISUW sessions).

        .JPG, .BMP : The picture is shown in ISUW's own picture viewer. Use
        a SHELL command if you want Windows' default device for these
        extensions.

        .SUD : Writes the list of vectors stored in the StatUnit data set,
        their types, lengths, number of levels for factors and the space
        they occupy in bytes. Vector names that coincide with names of
        existing vectors are marked with an asterisk.

        .ISU, .TXT, .DAT, .BAT, .CMD, .PS, .EPS, no extension : Shows the
        (plain text) file in ISUW's "read only editor".

    When a text file is viewed, you can use standard Windows features to
    select a portion of the text and copy (Ctrl-C) it to the clipboard,
    from which you can paste it into e.g. the editor by Ctrl-V. Use Ctrl-F
    to search for a given text string. The search is forwards from the
    cursor's position, ignores case and multiple blanks and interprets
    newline symbols as blanks. Repeat the search by Ctrl-L. Use Ctrl-P to
    print the whole file or the selected portion.

    SHOW commands are ignored in programs, except that the last SHOW
    command in a program is executed after the program has ended. More
    about this in the description of RUN.

    To open a file by its default Windows application determined by the
    file extension, use the SHELL command.

--------------------------------------------------------------------------------
    REMARK

    Writes a message to the output file.
................................................................................

    Syntax: REMARK text

    Writes "text" to the output file.

    If ECHO is off (see below) or an alternative output file has been
    selected by OUTFILE (see 2 screenfuls below), the text is simply written
    (leading blanks are ignored, multiple blanks are replaced with single
    blanks). Avoid text exceeding 80 characters, since the lines will not
    be broken.

--------------------------------------------------------------------------------
    QUIT

    Terminates the ISUW session.
................................................................................

    No parameters.

    In interactive mode, the same happens if you press Escape from an empty
    command line, and then press Return.

    Before the session is terminated you are given the option to save the
    output file (temporarily created under the name ISUWOUT.TMP on the ISUW
    root directory) as a file with extension .OUT. These files are plain
    text files and can as such be imported to text handling programs, or
    viewed later from other ISUW sessions by the SHOW command.

--------------------------------------------------------------------------------
    ECHO

    Turns ISUW command and error message echo on/off
................................................................................

    Syntax: ECHO [realexpression]

    The default action of ISUW is to echo all commands and error messages to
    the output file, which means that the output file will contain a
    complete log of the session. If you want to create a condensed output
    file, consisting only of selected output, use ECHO as the first command.

    When ECHO is used without parameters, it simply toggles ECHO on or
    off. With a real number or expression as the parameter, a positive
    value means that ECHO is set on, whereas zero or a negative value sets
    it off. Typically, you will use the parameters 1 for on, 0 for off.

    Another way of creating a condensed output file without command echoes,
    error messages etc., is to select an alternative output file by OUTFILE,
    see below.

    ECHO 0 should be used in most cases before loops in a program. Without
    this, the session's output file will very quickly be filled up with
    echoes of the commands in the loop.

--------------------------------------------------------------------------------
    OUTFILE

    For redirection of StatUnit output to a file other than the session's
    output file.
................................................................................

    Syntax: OUTFILE [filename]

    EXAMPLE. To write the output from a LIST command on a file LIST.TXT
    (which is the simplest way of exporting data to another program or
    statistics package), write

        OUTFILE LIST.TXT
        LIST ...
        OUTFILE
        % and just to check that everything is OK
        SHOW LIST.TXT

    OUTFILE without parameters (the third command in the program)
    redirects output to the primary output file. If this is already done,
    nothing happens. Do not write the name of the session's temporary
    output file explicitly; this will overwrite the file instead of
    appending to it.

    Quite generally, if the parameter is the name of an existing file,
    that file will be overwritten without warning.

    When output is redirected in interactive mode, the result of an output
    generating command is previewed as usual, and the way you leave the
    read-only-editor (by Return or Escape) determines whether output is
    saved or suppressed. The only difference is that it is written to
    another file. Command echoes, error messages etc. are still written to
    the session's original output file.

    You can use the file name 0 or NUL to suppress output entirely. In
    interactive mode, output will still be shown in the preview window,
    but regardless of whether you leave this window by Return or Escape
    the output will be lost. The command OUTFILE 0 should be used before
    loops in a program, if the loop contains output generating commands
    - unless you really want to see the output from all the loops.

    A special emergency version of the OUTFILE command has the form

        OUTFILE *

    The effect of this is that the session's output file is "reset". All
    lines (except the first four) are deleted. You can use this to get rid
    of all output produced until now, for example if you have created a
    very long and unwieldy output file by LISTing a long data set or
    forgetting to turn command echo off before the execution of a program
    with loops. But notice that all output produced earlier in the session
    is lost. If there is something you want to keep, you will have to save
    it on another file by copy and paste before you use the OUTFILE *
    command. Notice also that this command acts on the session's "official"
    output file, not on a file to which you have redirected output by an
    earlier OUTFILE command. Such files can be reset simply by reopening
    them.

--------------------------------------------------------------------------------
    EDIT

    Edits ISUW programs and other plain text files.
................................................................................

    Syntax: EDIT [filename]

    If the command is given without parameters or with a path or file mask
    as its parameter, a file selection menu appears.

    For ISUW programs (see the RUN command below), the file extension .ISU
    can be omitted. To edit a file without extension, add a period as the
    last character of the file name.

    When you edit a file you can

      - press Ctrl-Return to truncate the line from the cursor's position.

      - press F1 to get help.

      - press F2 to save changes.

      - press F10 to display the session's output file.

      - search for a word or phrase by Ctrl-F.

      - repeat last search by Ctrl-L.

      - press Escape to leave the editor.

    When you edit an ISUW program you can, in addition,

      - press Ctrl-F1 to get help on the command in the present line.

      - press F9 to execute the program without leaving the editor. If an
        error occurs, the corresponding line of the program will be
        marked. Use a SHOW command as the last command of the program if
        you want to have a look at the result right after execution.
        If a portion of the text is marked as a block, only the block is
        executed.

      - use shortkeys defined by the KEYS command (with the obvious
        exceptions Ctrl-F, Ctrl-L, F2, F9 and F10). They will act roughly
        as they do from the command line, except that a final exclamation
        sign is replaced with a line shift, and the text of the present
        line will only be substituted for question marks in the KEYS
        string if it is marked (highlighted). However, the marked portion
        of the text must be contained in a single line, otherwise nothing
        happens.

    In addition to this, the editor has the following standard features of
    Windows editors. Mark (highlight) a block by the cursor arrows with
    the Shift key down, or use the mouse. To mark the entire text as a
    block, press Ctrl-A. When a block has been marked you can do the
    following:

      - Delete it by the Delete key (or overwrite it by any character key).

      - Delete it and copy its contents to a hidden "clipboard" by Ctrl-X.

    (these two operations can be undone by Ctrl-Z)

      - Copy its contents (without deleting it) to the clipboard by Ctrl-C.

    Later, you can (optionally while editing another file, perhaps even in
    another editor)

      - Paste the contents of the clipboard into the text at the cursor's
        position (or to replace marked text) by Ctrl-V.

    EDIT commands in programs are ignored.

--------------------------------------------------------------------------------
    RUN

    Executes a program.
................................................................................

    Syntax: RUN [filename [par1 [par2 [...]]]]

    In many situations it is convenient to write a sequence of commands line
    by line on a text (ASCII) file, for later error correction, modification
    and reuse. Such a program file - call it PROGRAM.ISU - can be created
    and executed by

       EDIT PROGRAM
       RUN PROGRAM

    It is also possible to execute a program directly from the editor, see
    the description of EDIT right above.

    Notice that the extension .ISU - which is required - is not written as
    part of the file name in the EDIT and RUN commands.

    If the file name is omitted or replaced with an asterisk, a file
    selection menu (path *.ISU) appears.

    The optional additional parameters par1, par2, ... are for parameter
    substitution in the program, see the description of SUBSTITUTE
    approximately 4 screenfuls below.

    The file must contain one command per line, except that a backslash \ at
    the end of a line means "append next line". The backslash can break the
    command at any point, also in the middle of a connected word. This means
    that a blank is not inserted automatically - don't forget to insert a
    blank before the backslash or in the beginning of next line, if there
    should be one.
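
    ILLUSTRATION (Python, not ISUW code). The continuation rule can be
    sketched as below: a line ending in a backslash is glued to the next
    line exactly as it stands, with no blank inserted. The function name
    is hypothetical, and it is an assumption that the backslash must be
    the very last character of the line.

```python
def join_continuations(lines):
    """Merge lines ending in a backslash with the line that follows."""
    commands = []
    pending = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]      # drop the backslash, add nothing
        else:
            commands.append(pending + line)
            pending = ""
    if pending:
        commands.append(pending)      # file ended on a continuation line
    return commands


program = [
    "FITNONLINEAR freq=1+group \\",
    "exp(.)/(1+exp(.))",
    "INCLUDE\\",
    "ALL",
]
for command in join_continuations(program):
    print(command)
```

    In the first command the blank before the backslash keeps the two
    parameters apart; in the second, the word INCLUDEALL is reassembled
    from its two halves.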

    The program executes roughly as if you were typing the file line by
    line from the keyboard, unless GOTO commands are used to break the
    order. However, the commands are not stored in the upper "reuse"
    window of ISUW's front window, and all output is written directly to
    the session's output file without any previewing. Thus, if you want
    output suppressed for certain commands, you must surround these
    commands by OUTFILE 0 ... OUTFILE commands.

    Commands can be truncated as long as they remain unique. However,
    command field shortkeys (like A for INCLUDEALL) can not be used.
    COMPUTE commands can be written directly without the command name and
    without a leading blank; they are recognized by the appearance of an
    equality sign right after the first connected word. However, COMPUTE
    commands without a right hand side must start with = or + (or COMPUTE).

    Graphics output produced by PLOT and HISTOGRAM commands is previewed as
    in interactive mode. However, an exception from this occurs when a
    PostScript file is open. In this case, the images are removed from the
    screen immediately, implying that you do not have to press Escape each
    time a picture is formed and sent to the PostScript file.

    The 3-d versions of PLOT and HISTOGRAM can not be used in programs.

    If an error occurs during the execution of a program, it will abort
    with an error message. The message on the screen contains information
    about which line the error occurred in and which loop (if not the
    first). The number of loops, in this context, is the number of times
    that the program file was opened and read from the beginning, see the
    GOTO command 7 screenfuls below. Edit the program to correct the
    error, remove vectors that were imported to or created by the program
    before the error interrupt, and RUN it again.

    The Escape key can be used to interrupt a program if it takes more
    time than expected or is stuck in an infinite loop. Escape will force
    the program to exit after execution of the command presently handled,
    just as if an error had occurred in that command.

    Notice that vectors that are (explicitly or implicitly) declared in
    the program or imported to the program by GETDATA will be present when
    the program terminates (except for "$-vectors", see below). This means
    that you will typically run into name coincidence conflicts if you
    forget to delete these vectors before you run it again. For larger
    programs, it is usually a good idea to write the program such that it
    imports and creates its own data, and let the first line of the
    program be a DELETE command without parameters.

    A useful convention for "local variables" is this: If the name of a
    variate or factor begins with a dollar sign it is deleted at exit from
    a program, also if the exit is due to an error or a keyboard
    interrupt. Notice that ALL vectors beginning with a $ are deleted,
    also those that were present when the program started. Thus, it is
    advisable not to use $-names for anything else.

    Lines can be indented as desired, and empty lines can be inserted, to
    make the program more readable.

    Comments can be inserted in two ways:

        (1) Lines starting with a percentage sign '%' (optionally after
            some blanks) are ignored.

        (2) Text in curly brackets {} within a line is ignored.

    Such comments are not echoed to the output file (use REMARK for this).
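
    ILLUSTRATION (Python, not ISUW code). The two comment rules can be
    sketched as a small filter. The function name is hypothetical, and
    the handling of nested or unbalanced brackets is an assumption, since
    it is not specified here.

```python
import re


def strip_comments(line):
    """Return the command part of a program line, or None for a % line."""
    if line.lstrip().startswith("%"):
        return None                        # rule (1): whole line ignored
    # Rule (2): remove every {...} group; nesting is not considered.
    return re.sub(r"\{[^{}]*\}", "", line).rstrip()


print(strip_comments("    % Loop starts here"))
print(strip_comments("ECHO 0      { to avoid echoes of all the loops }"))
```

    Lines of the first kind are not executed, but they can still serve as
    labels for GOTO.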

    Nested programs are not allowed, i.e. RUN commands must not occur in
    programs. Also, OPENCOMMANDFILE and the 3-D versions of PLOT and
    HISTOGRAM are forbidden. EDIT and SHELL commands in programs are
    ignored.

    SHOW commands have a special role in programs. Or rather, the last
    SHOW command in a program has a special role, since this is the only
    one that is executed, and this is done AFTER the program has executed.
    For this reason, it is usually most convenient to place the SHOW
    command as the last command of the program. A construction which is
    particularly useful when a large program is tested from the editor is
    the following. Let the program start with a command like

        OUTFILE tmp

    and end with

        SHOW tmp

    This implies that all output is written to a temporary file (here
    called tmp), which is overwritten each time the program runs. As long
    as there are errors in the program you will stay in the editor, but as
    soon as the program executes without errors, its output will be shown
    immediately.

--------------------------------------------------------------------------------
    SUBSTITUTE

    Parameter substitution in programs.
................................................................................

    Syntax: SUBSTITUTE string1 [string2 [...]]

    where string1 , ... are connected strings (i.e. without blanks).

    This command, which makes sense only in programs, represents the
    closest ISUW comes to a macro or procedure facility. The action of a
    SUBSTITUTE command is best explained by a small example:

    Suppose that a program LOGIT1.ISU, to be executed by a RUN command,
    consists of the two lines

        SUBSTITUTE model
        FITNONLINEAR model exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)

    Then the effect of writing e.g.

        RUN LOGIT1 freq=1+group+sex+age/count

    is that the (long and tedious) command

        FITNONLINEAR freq=1+group+sex+age/count \
        exp(.)/(1+exp(.))  exp(.)/sqr(1+exp(.))  .*(1-.)

    is executed - which actually means that a logistic regression model
    with overdispersion is analysed. The pseudo-parameter "model" is
    simply replaced with the first connected string after the file name in
    the RUN command.

    The general idea is that the SUBSTITUTE command specifies a list of
    strings separated by blanks. Whenever one of these strings occurs later
    in the same program, it is replaced with the corresponding element of
    the list of parameters in the RUN command that called the program. For
    this reason, the one and only SUBSTITUTE command in a program should
    quite generally be placed in the beginning, and certainly not in a
    loop.

    It is also possible to give the replacement strings directly in
    the SUBSTITUTE command. This is particularly useful when you are
    writing and testing a program in the editor. The syntax for this is,
    for the program LOGIT1.ISU above, to write

        SUBSTITUTE model=freq=1+group+sex+age/count
        FITNONLINEAR model exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)

    This program will RUN without additional parameters. The word "model"
    will be replaced with "freq=1+group+sex+age/count" whenever it occurs
    in the lines following the SUBSTITUTE command. Even if the RUN command
    specifies an additional parameter, this will be overwritten by the
    specification following the first equality sign in the SUBSTITUTE
    command. The general rule is that if a parameter in the SUBSTITUTE
    command contains at least one equality sign, everything before the first
    equality sign becomes the string to be substituted, everything after
    becomes the string that will replace it.

    An obvious consequence of this is that the words to be substituted in
    a program must not contain equality signs.

    The substitution works by simple case-sensitive comparison of
    substrings. There are some obvious problems with this. Things can
    break down if one string is a substring of another, or a substring of
    a string somewhere in the program, or of a string which is already
    substituted for another string. An easy way of avoiding such problems
    is to let all strings in the SUBSTITUTE command begin and end with
    special characters. A simple and useful convention is to let
    substitute strings be names in brackets. Since brackets are not used
    for much else in ISUW, this is rather safe.
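
    ILLUSTRATION (Python, not ISUW code). The substitution rules can be
    sketched as follows. Function names are hypothetical. It is assumed
    that parameters are matched by position, that a parameter given with
    an equality sign still occupies its position, and that a missing RUN
    parameter is an error; ISUW's actual behaviour in these cases is not
    documented here.

```python
def build_substitutions(subst_params, run_params):
    """Pair each SUBSTITUTE string with its replacement string."""
    table = []
    for position, param in enumerate(subst_params):
        if "=" in param:
            # Text before the first '=' is replaced by the text after it.
            name, replacement = param.split("=", 1)
        elif position < len(run_params):
            name, replacement = param, run_params[position]
        else:
            raise ValueError("no RUN parameter for " + param)
        table.append((name, replacement))
    return table


def apply_substitutions(line, table):
    """Replace every occurrence of each name (case-sensitive)."""
    for name, replacement in table:
        line = line.replace(name, replacement)
    return line


# Plain names are risky: the command name VAR is hit as well.
print(apply_substitutions("VAR $BIN N", [("VAR", "Y"), ("N", "30")]))
# Names in brackets leave the rest of the line alone.
table = build_substitutions(["[VAR]", "[P]", "[N]"], ["Y", "0.4", "30"])
print(apply_substitutions("VAR $BIN [N]", table))
```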

    EXAMPLES. The following program BINSIM.ISU can be used for simulation of
    binomial observations. When called in the form, say,

        RUN BINSIM Y 0.4 30

    it will fill the existing variate Y with simulated observations from a
    binomial distribution with probability parameter 0.4 and binomial total
    (index) 30.

    BINSIM.ISU:

        ECHO 0                          { to avoid echoes of all the loops }
        SUBST [VAR] [P] [N]
        VAR $BIN [N]
        $I=0        {since $I is undefined it becomes a variate of length 1}
        %LABEL
        $I=$I+1
        $BIN=(RANDOM<[P])        { $BIN is filled with Bernoulli variables }
        [VAR]($I)=SUM($BIN)      { ...and the $I'th entry of [VAR] becomes }
                                 { the sum of these                        }
        GOTO %LABEL $I<##([VAR])                 { GOTO is described below }
        ECHO 1

    Notice that the two auxiliary variates $BIN and $I are deleted
    automatically at exit, because they have a dollar sign as the first
    character of their names.

    The following program OUTLIERS.ISU, when called on the form, say,

        RUN OUTLIERS Y 1+SEX+SEX*AGE 0.05

    performs some outlier detection in connection with the linear
    regression model specified by the first two parameters (here
    'Y=1+SEX+SEX*AGE'). A plot of fitted values against studentized
    residuals (see the description of SAVENORMEDRESIDUALS), with a color
    marking of outliers, is produced, and the outliers, if any, are
    listed. Here, an outlier is defined conservatively in such a way that
    the probability of finding a positive number of outliers in a correct
    model will not exceed the number specified as the third parameter,
    here 0.05. The program is complicated because the listing of outliers
    is performed under restrictions, and the original restrictions, if
    any, must be reestablished. And also because we want a special action
    to be taken in the case where no outliers are detected.

    The contents of OUTLIERS.ISU:

        SUBST [y] [model] [alpha]
        FITLIN [y]=[model]
        SAVEFIT $fitted
        SAVENORMED $nres $p
        FAC $sign $pres ##([y]) 1
        VAR $unit ##([y])
        $pres=1          { $pres becomes 1 for units present, 0 otherwise }
        $unit=#
        $sign=($p<[alpha]/#([y]))     { $sign becomes 'outlier-indicator' }
        XTEXT Fitted values
        YTEXT Studentized residuals
        PLOT $fitted $nres $sign=7,12 =*
        % Reestablishing default labels for plots:
        XTEXT
        YTEXT
        % Restrict such that only the outliers are present:
        FOCUS $sign 1
        GOTO %no outliers #($sign)=0
        REMARK Strictly significant outliers (alpha=[alpha]).
        LIST $unit:5:0 [y] $fitted $nres $p
        GOTO %continue 1
        %no outliers
        REMARK No outliers detected (alpha=[alpha]).
        %continue
        % Reestablishing initial restrictions:
        INCLUDEALL
        FOCUS $pres 1
        SHOW

    When a program with a SUBSTITUTE command is executed, the echo of
    command lines after the SUBSTITUTE command will appear as they are
    after the substitution.

--------------------------------------------------------------------------------
    GOTO

    Controls conditional jumps to labels in a program.
................................................................................

    Syntax: GOTO label realexpression

    The first parameter "label" must be a text string holding the full
    contents of a line somewhere else in the program. It is preferable to
    use comment lines as labels (either "echoed" comments REMARK ... or
    "non-echoed" comments % ...). Comments in curled parentheses cannot
    be used because they would be removed from the GOTO command before
    its execution.

    EXAMPLE. A program of the form

        ...
        % Loop starts here
        ...
        GOTO % Loop starts here 1
        ...

    will create an infinite loop which - unless the program creates some
    overflow error - can only be broken by the Escape key. Notice the last
    parameter 1, which (as would any other positive constant) implies that
    the GOTO statement is actually executed. In more relevant constructions,
    the last parameter is a real expression, and the GOTO statement is only
    executed if this expression returns a positive value. With the usual
    translation of reals to booleans, you may think of the statement as
    having an invisible IF before the last connected string.

    When a GOTO command with a positive value of its last parameter is
    executed, the following lines of the program are read and skipped
    until a line consisting of the text "label" is found. If the program
    file is read through, it is reopened and read once more from the
    beginning. If a line with the correct text is found, execution of the
    program is taken up again right after (notice, AFTER) that line,
    otherwise the program terminates normally (i.e. without any warning or
    error message). This implies that a construction like

        ...
        GOTO %EndOfProgram n=100
        ...
        %EndOfProgram

    will work as intended, even if the last line is forgotten or misspelled
    (provided that no other line with exactly this content is found).

    The line to be searched for must match "label" literally, also as
    regards upper/lower case of letters. But leading, trailing and multiple
    blanks are ignored in the comparison.
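
    The search and comparison rules described above can be summarized in
    a small model, written here in Python. This is only a sketch of the
    documented behaviour, not ISUW's actual implementation; the names
    normalize and goto_target are invented for the illustration, and the
    program is represented simply as a list of text lines.

```python
def normalize(line):
    """Drop leading and trailing blanks and collapse multiple blanks."""
    return " ".join(line.split())


def goto_target(lines, goto_index, label):
    """Return the index of the line where execution resumes, or None.

    lines      -- the program as a list of strings
    goto_index -- 0-based index of the GOTO line being executed
    label      -- the label text given as the first GOTO parameter
    """
    wanted = normalize(label)
    n = len(lines)
    # Read the lines after the GOTO first, then start over from the top.
    for step in range(1, n + 1):
        i = (goto_index + step) % n
        if normalize(lines[i]) == wanted:  # comparison is case sensitive
            return i + 1  # execution resumes AFTER the label line
    return None  # label not found: the program just ends


program = ["I=0", "%LOOPSTART", "I=I+1", "GOTO %LOOPSTART (I<10)", "SHOW"]
print(goto_target(program, 3, "%LOOPSTART"))  # 2
print(goto_target(program, 3, "%loopstart"))  # None
```

    In the first call the label is not found below the GOTO line, so the
    search starts over from the top and execution resumes at "I=I+1". In
    the second call the label differs in case, so nothing is found.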

    The second parameter "realexpression" - in this case defined as the
    last connected word of the command - must be a valid right hand side of
    a COMPUTE statement with a variate of length 1 on its left hand side.

    WARNING. Programs with loops that are executed many times should
    in general have the command

        ECHO 0

    somewhere before (or in) the loop. If the loop contains output
    producing commands, also the command

        OUTFILE 0

    should be found somewhere before (or in) the loop. Otherwise, all
    commands (and their output) will be appended to the output file each
    time, which takes unnecessarily long and fills up the output file
    with useless garbage.

    EXAMPLES.

    The following program gives a variate I of length 1 the values 1, 2,
    ..., 10.

        I=0
        ECHO 0                           { commands in the loop not echoed }
        %LOOPSTART
        I=I+1
        COMPUTE I                   { write I to the session's output file }
        GOTO %LOOPSTART (I<10)
        ECHO 1                                  { reestablish command echo }
        SHOW            { to see what came out of this interesting program }

    A more interesting example is this. Suppose that X and Y are variates
    of length 365. Think of them as time series sampled over a period of
    365 days. The following program fits the linear regression model
    "Y=1+X" 346 times on data from the 20-day period up to and including
    day I, where I takes all the possible values 20, 21, ..., 365. The
    slope estimates are stored in the variate BETA, and the final plot
    shows the development of this locally estimated regression coefficient
    over time. Similar programs could be used for kernel smoothing by
    local polynomial regression, optionally with other kernels than the
    rectangular (using weights rather than restrictions).

        VAR DAY BETA 365
        DAY=#
        I=19
        OUTFILE 0                           { To avoid 346 ANOVA tables }

          %loopstart
          I=I+1
            EXCLUDE DAY>I DAY<=I-20
            FITLIN Y=1+X
            INCLUDEALL
          SAVEPAR PARS
          BETA(I)=PARS(2)
          ECHO 0           { To avoid 345 additional echoes of the loop }
          GOTO %loopstart I<365
          GOTO %loopstart I<365

        OUTFILE
        ECHO 1
        PLOT DAY BETA =7 =L
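
    For readers who want to check the result of such a program against
    an independent computation, the rolling-window slopes can be sketched
    in plain Python as follows. The function name rolling_slopes is
    invented for the illustration, and the data in the usage lines are
    artificial; the sketch assumes that X is not constant within any
    window.

```python
def rolling_slopes(x, y, window=20):
    """Least-squares slope of y on x for each window ending at day 'end'.

    Returns a dict mapping the 1-based end day to the slope estimated
    from the 'window' days up to and including that day.
    """
    slopes = {}
    for end in range(window, len(x) + 1):  # end = window, ..., len(x)
        xs = x[end - window:end]
        ys = y[end - window:end]
        mx = sum(xs) / window
        my = sum(ys) / window
        sxx = sum((a - mx) ** 2 for a in xs)  # assumed to be non-zero
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slopes[end] = sxy / sxx
    return slopes


# Artificial data: an exact line with slope 2 over 365 days.
x = [float(d) for d in range(1, 366)]
y = [2.0 * a + 1.0 for a in x]
beta = rolling_slopes(x, y)
print(min(beta), max(beta))  # first and last day with an estimate: 20 365
```

    With 365 days and a window of 20 days this gives one slope for each
    of the days 20 to 365, corresponding to the entries of BETA that are
    filled in by the ISUW program.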

    Another useful example follows here. It is well known that even
    experienced statisticians tend to be overcritical when normality
    assumptions are checked graphically. The only really valid way of
    doing it is by comparison of the histogram (or the probit diagram)
    with a set of similar plots for data sets of the same length, where
    the normality assumption holds. The following program NORMHIST, when
    called in the form, say,

        RUN NORMHIST 20 100 -4.0,8,4.0

    will display 20 histograms of 100 pseudo-normal variables with the
    vertical axis from -4.0 to 4.0 divided into 8 intervals.

    NORMHIST.ISU:

        ECHO 0
        SUBST [Hist] [Obs] [Int]
        VAR $U [Obs]
        VAR $I 1
        $I=0
        FRAMETEXT Histogram for [Obs] normal observations
        XTEXT |

        %Label
        $I=$I+1
        $U=normal
        $U=$U-MEAN($U)
        $U=$U/SQRT(VARIANCE($U))
        HIST $U=[Int]
        GOTO %Label $I<[Hist]

        FRAMETEXT
        XTEXT
        ECHO 1
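
    The two COMPUTE lines in the loop of NORMHIST centre and scale each
    simulated sample, so that every histogram shows data with mean 0 and
    variance 1. A sketch of the same step in Python is given below. The
    function name is invented, and it is an assumption here that ISUW's
    VARIANCE uses the divisor n-1, as Python's statistics.stdev does.

```python
import random
import statistics


def standardized_normal_sample(n, rng=random):
    """Draw n pseudo-normal values, then centre and scale them."""
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = statistics.mean(u)
    u = [v - m for v in u]          # like $U=$U-MEAN($U)
    s = statistics.stdev(u)         # assumption: divisor n-1
    return [v / s for v in u]       # like $U=$U/SQRT(VARIANCE($U))


sample = standardized_normal_sample(100, random.Random(1))
print(len(sample))  # 100
```

    Each call corresponds to one pass through the loop; calling it 20
    times gives the 20 reference samples to be compared with the data.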

    A final example. In the description of SAVEDATA it was explained how
    this command can be used to create sub-datasets where excluded units
    are "physically deleted". This is the easiest way of doing it, but just
    for illustration, we show here how it can be done in a more direct way.
    The following program EXTRACT.ISU, when called in the form

         RUN EXTRACT COUNTER X

    constructs a variate COUNTER, which contains the indices for the units
    present in vectors of the same length as X. Thus, the length of
    COUNTER becomes the number of units present in X. After this, you can
    construct a "short" version of X (or any vector of the same length)
    by

        INCLUDEALL
        COMPUTE SHORT_X=X(COUNTER)

    EXTRACT.ISU:

        subst [c] [x]
        var [c] #([x])
        var $present $unit ##([x])
        $present=1                 { $present becomes "restrict indicator" }
        includeall
        $unit=$present
        exclude 1
        $unit=$unit+$unit(#-1){ $unit becomes cumulated restrict indicator }
        includeall

        $i=0   { running unit number }
          %loopstart
          $i=$i+1
          goto %skip $present($i)=0
          [c]($unit($i))=$i
          %skip
          echo 0                           { loop echoed only first time }
          goto %loopstart $i<##([x])

        echo 1
        exclude $present=0                    { reestablish restrictions }
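
    The idea behind EXTRACT.ISU - cumulate the restrict indicator and use
    the cumulated value as the position in the short vector - may be
    easier to follow in a conventional language. The Python sketch below
    mirrors the loop; the function name present_indices and the example
    data are invented for the illustration.

```python
def present_indices(present):
    """Return the 1-based indices of the units whose indicator is 1."""
    counter = [0] * sum(present)      # length = number of units present
    cumulated = 0
    for i, flag in enumerate(present, start=1):
        cumulated += flag             # cumulated restrict indicator
        if flag:                      # units not present are skipped
            counter[cumulated - 1] = i
    return counter


present = [1, 0, 1, 1, 0]             # hypothetical restrict indicator
x = [10.0, 20.0, 30.0, 40.0, 50.0]    # hypothetical data
counter = present_indices(present)
short_x = [x[i - 1] for i in counter]
print(counter)   # [1, 3, 4]
print(short_x)   # [10.0, 30.0, 40.0]
```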

--------------------------------------------------------------------------------
    OPENCOMMANDFILE

    Enables you to import commands from a text file.
................................................................................

    Syntax: OPENCOMMANDFILE [filename]

    An ISUW command file is roughly the same as an ISUW program. The file
    must have extension .ISU, and in the command syntax it must be written
    without extension.

    If the command is used when a command file is already open, the
    present command file is closed and the new one is opened. In particular,
    if the command is used without parameters, the present command file
    - if any - is closed. A blue command field indicates that a command
    file is open. Escape from a blue command field generates an
    OPENCOMMANDFILE command.

    The effect of this command is that the lines of the file specified can
    be imported one by one to the command field. The next line of the file
    is imported whenever the CursorDown arrow is pressed from an empty
    command field. Thus, it is essentially nothing more than an editor
    device associated with the command field. As long as you don't press
    the CursorDown arrow, everything works as usual in interactive mode.

    Conversely, if you import commands one by one and press Return as soon
    as a command has been imported, this is just a slow way of executing
    an ISUW program. You could obtain almost the same by a RUN command.
    However, there are some important differences. Output is previewed as
    usual, and you can append it to the output file by Return or suppress
    it by Escape as usual in interactive mode. The commands that are
    forbidden in programs executed by RUN (RUN, EDIT, SHELL and the 3d
    versions of PLOT and HISTOGRAM) can be used when a program is executed
    in this way. Conversely, GOTO and SUBSTITUTE can not be used.

    The advantage of executing a sequence of commands in this way is that
    you can skip commands as desired, insert other commands and make
    changes to the commands imported from the file before you execute
    them. If an error occurs you can usually correct it and continue - the
    command file is still open for import to the command field.

    The rules for breaking of long lines and truncation of commands are as
    described for RUN. Comments in the form of lines starting with a
    percentage sign are skipped, but comments in curled parentheses within
    a line can not be used (unless they are removed manually before Return
    is pressed).

    The demonstration programs on the directory DEMOS under the ISUW root
    directory are executed in this way, with the additional feature that
    the %-comments before each command are displayed in a separate window.
    You can actually write your own demonstration programs for special
    purposes and place them on the directory DEMOS.

--------------------------------------------------------------------------------
    SHELL

    Starts another Windows program, or opens a file by its default application.
................................................................................

    Syntax: SHELL [command and/or filename]

    EXAMPLES.

    To edit a program PRG.ISU with NotePad, if you prefer this to ISUW's
    own EDIT command, write

        SHELL NOTEPAD PRG.ISU

    To view a PostScript file page1.ps (created e.g. by OPENPS etc.) and
    optionally print it, simply use

        SHELL page1.ps

    provided that GhostScript or a similar device is set up as the default
    handler of .PS files on your computer.

    To invoke Windows Explorer, use the SHELL command without parameters or
    with the name of a directory as the parameter. Be careful, there are
    many things you can do to generate an immediate uncontrolled crash,
    like deleting the file ISUWOUT.TMP on the ISUW root directory, or
    starting a second ISUW session under the same ISUW root. If this
    happens, it may be necessary to perform an emergency closedown (press
    Ctrl-Alt-Del and use the Windows Task Manager).

    SHELL commands in programs are ignored.

--------------------------------------------------------------------------------
    KEYS

    Programming the keyboard.
................................................................................

    Syntax: KEYS [keyname [string]]

    EXAMPLE. After the command

        KEY AR RAMSTATUS!

    the key combination Alt-R gives the same result as you would obtain by
    writing RAMSTATUS in the command field and pressing Return.

    Programmable keys are

                       F2..F12             (F2..F12)
                   Alt-F2..F12             (AF2..AF12)
                  Ctrl-F2..F12             (CF2..CF12)
                 Shift-F2..F12             (SF2..SF12)
                   Alt-A..Z                (AA..AZ)
                   Alt-0..9                (A0..A9)
                  Ctrl-A..Z                (CA..CZ)    except... (see below)
                  Ctrl-0..9                (C0..C9)

    where the exceptions of the type Ctrl-letter correspond to the letters
    C, X, V, A and Z, which are reserved for standard Windows editing
    purposes (copy, cut, paste, select all, undo).

    The lists in parentheses show the way keys are referred to in the first
    parameter keyname. For example,

        KEY AF4 DELETE!

    will imply that Alt-F4 performs a DELETE command without parameters
    (thus overwriting the default Windows action of Alt-F4, which is to
    terminate the ISUW session).
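
    The table of programmable keys can be condensed into a simple check
    of a keyname. The Python sketch below is only a restatement of the
    table, not part of ISUW; the name is_programmable is invented, and
    it is assumed that keynames are not case sensitive (the manual uses
    both AR and a6).

```python
import re

# [A]lt, [C]trl or [S]hift (or nothing) followed by F2..F12
_FUNCTION_KEY = re.compile(r"[ACS]?F([2-9]|1[0-2])")
# [A]lt or [C]trl followed by one letter or digit
_LETTER_OR_DIGIT = re.compile(r"[AC][A-Z0-9]")
# Ctrl-letter combinations reserved for Windows editing
_RESERVED = {"CC", "CX", "CV", "CA", "CZ"}


def is_programmable(keyname):
    """True if keyname denotes a key listed as programmable."""
    name = keyname.upper()  # assumption: case is ignored
    if _FUNCTION_KEY.fullmatch(name):
        return True
    return bool(_LETTER_OR_DIGIT.fullmatch(name)) and name not in _RESERVED


for key in ["AF4", "a6", "F1", "CX", "SA"]:
    print(key, is_programmable(key))
```

    Here AF4 and a6 are accepted, whereas F1 (the help key), CX
    (reserved for cut) and SA (Shift-letter) are not in the table.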

    Without the final '!', a programmed key simply inserts "string" in the
    command line at the cursor's position, and no execution takes place
    before you press Return. If a portion of the text in the command
    window is selected (marked), this text is replaced with "string".

    EXAMPLE. Suppose you have just executed a READ command like

        READ UNIT SEX=Unknown,Male,Female HEIGHT GROUP

    Since ISUW has no other facility for saving names of factor levels, you
    might want to save the string 'SEX=Unknown,Male,Female' for later use.
    To get this string written whenever Alt-6 is pressed, write (reusing
    some of the previous command line)

        KEY a6 SEX=Unknown,Male,Female

    The effect of an exclamation sign - which must be the last character
    of the string parameter - is that the command is executed immediately.
    In this case, the string must be a command, possibly with some
    parameter substituted with a question mark, see below. Think of the
    '!' as a code for the Return key. If the string is a COMPUTE command,
    the command name can not be omitted. In this case the text in the
    command window is overwritten, unless the string defined by KEYS has a
    question mark somewhere.

    If the KEYS string contains a question mark "?" somewhere, this is
    substituted by the presently selected text in the command window.
    However, if the string is a command (that is, ends with an exclamation
    sign) the rule is different. In this case the entire contents of the
    command window, with the exception of the leading blank (which has to
    be there), will be inserted at the question mark's place.
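
    The substitution rules can be modelled as follows. The Python sketch
    is a reading of the rules above, not ISUW code. The name expand_key
    is invented, and two details are assumptions: that only the first
    question mark is substituted, and that a command window without a
    leading blank is used as it stands.

```python
def expand_key(string, command_window="", selection=""):
    """Model a programmed key; return (text, executed).

    string         -- the string parameter given to KEYS
    command_window -- present contents of the command window
    selection      -- presently selected text (may be empty)
    """
    if string.endswith("!"):
        command = string[:-1]          # '!' acts as the Return key
        if "?" in command:
            # The whole command window, minus its leading blank,
            # replaces the question mark.
            argument = command_window
            if argument.startswith(" "):
                argument = argument[1:]
            command = command.replace("?", argument, 1)
        return command, True
    # No '!': the text is only inserted, with '?' replaced by the selection.
    return string.replace("?", selection, 1), False


print(expand_key("(?)", selection="a+b"))
# ('(a+b)', False)
print(expand_key("SHELL NOTEPAD ?.ISU!", command_window=" PRG"))
# ('SHELL NOTEPAD PRG.ISU', True)
print(expand_key("RAMSTATUS!", command_window="anything"))
# ('RAMSTATUS', True)
```

    In the last call the string has no question mark, so the contents of
    the command window are simply overwritten by the command.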

    EXAMPLES. After

        KEY AP (?)

    you can put a pair of parentheses around the cursor or the selected
    text by Alt-P.

    After

        KEY AE SHELL NOTEPAD ?.ISU!

    you will be able to edit a program with NotePad simply by writing its
    name without extension (after a blank) in the command line and then
    pressing Alt-E. After

        KEY AV COMPUTE Variance(?)!

    you will be able to display the variance of a variate by writing its
    name (after a blank) and pressing Alt-V. Whereas the command

      KEYS AL FITNONLINEAR ? exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)!

    will enable you to fit a logistic regression with overdispersion just
    by writing the model formula (after a blank) in the command field and
    then pressing Alt-L.

    To get a list of presently active shortkeys write

        KEYS

    (without parameters).

    Your present shortkey definitions are automatically saved on exit and
    recovered next time you start an ISUW session under the same root
    directory. You can, however,

    - add the startup shortkey configuration (if you have destroyed it) by

        KEY +

    - delete a single shortkey function by programming it as an empty string,

        KEY keyname

    - delete all shortkey functions by

        KEY -

    Notice that if you use the last command and then terminate the session,
    the startup configuration is lost. If you have some shortkey definitions
    that you want to keep safely, write a program KEYS.ISU like the
    following.

        KEY -

        KEY F10 SHOW!      { the output file is displayed by F10. A     }
                           { natural choice, since this is what happens }
                           { when you press F10 from the editor.        }

        KEY AA EDIT AUTOEXEC.ISU!  { edit autoexec program by Alt-A     }

        KEY AR RAMSTATUS!  { a RAMSTATUS is shown when Alt-R is pressed }

        KEY AD COMP sqrt(Variance(?))!   { display s.d. of a variate by }
                           { writing its name and pressing Alt-D        }

        KEY AX QUIT!       { exit with Alt-X                            }

        KEY AP (?)         { surround selected text with parentheses by }
                           { Alt-P                                      }

        KEY F2 SHELL ?.PS! { open postscript file by Windows default    }
                           { device by writing its name without         }
                           { extension and pressing F2                  }

        KEY AN SHELL ezlearn.cbs.dk/stat/hamat-2/tt/!  { open ISUW site }
                           { by Alt-N. If you download ISUWINST.EXE,    }
                           { don't execute it while the session is open.}

        KEY AT SHELL C:\EXE\WINT.EXE!  { open WinT - Desk Calculator    }
                           { with Statistical Tables - by Alt-T         }

        KEY AK KEYS!       { display present shortkeys by Alt-K         }

        KEY CK EDIT C:\ISUW\KEYS! { edit (and execute by F9, if desired)}
                           { this file by Ctrl-K                        }

        ...

    Save this program on a suitable working directory or (as an exclusive
    exception) on the ISUW root directory.

    Local shortkeys - i.e. shortkeys you want to be active only on a
    particular working directory - can conveniently be placed as KEY
    commands in an AUTOEXEC.ISU program on that directory.

********************************************************************************

    ISUW - the Windows version of Interactive StatUnit ISU - is a Delphi 5
    application, based on a collection of Turbo Pascal (later Borland
    Pascal) units for statistical analysis, developed from around 1990.

    ISUW is public domain software. Accordingly, I take no responsibility
    for errors in the program; but I would certainly like to hear about
    them, and correct them if I can.

    The latest version of ISUW can be downloaded from my download page

        ezlearn.cbs.dk/stat/hamat-2/tt/

    where other useful stuff can also be found.

    Tue Tjur                                        e-mail tuetjur@cbs.dk
    Copenhagen Business School
    The Statistics Group
    Solbjerg Plads 3
    DK-2000 Frederiksberg
    Denmark

                       @@@  Copyright Tue Tjur 2006  @@@

======================================================== End of ISUWHELP.TXT ===