ISUW, version JUN2009
========================================================================
This is ISUWHELP.TXT, the complete ISUW "on-line" manual.
========================================================================
--------------------------------------------------------------------------------
HELP ON HELP
F1 (from the command field or the program editor) displays the help
pages.
F1 once more (from the help pages) displays a list of ISU commands.
Select help on a specific command by the cursor up/down arrows and
Return.
Ctrl-F1 from the command field displays help on the present command.
Shift-F1 disables/enables "hint mode" where hints on active keys are
displayed in a small pop-up window at the mouse cursor.
In the help pages, search for a word or phrase by Ctrl-F. Repeat the
search by Return or Ctrl-L. The search is forwards from the cursor's
position, ignores case, interprets a new line as a blank and ignores
multiple blanks.
If you want to have a hardcopy of the "on-line-manual", print all
pages by Ctrl-P (88 pages of 72 lines). But wait, it is much easier to
search for commands and phrases "on-line"; and later, perhaps, print
out selected sections (Select by Shift and the cursor arrows, then
press Ctrl-P). We suggest that you start softly with a printout of the
article "Introduction to ISUW".
--------------------------------------------------------------------------------
HOW TO RUN ISUW INTERACTIVELY - a brief summary.
Notes on installation and directory structure are given in a later
section approximately 18 screenfuls below (search for "install").
ISUW is essentially mouse free. You can use the mouse to resize or
move the ISUW window, and it works as usual when you are editing or
selecting from a menu. But a general principle behind the design of
ISUW is that there should not be more on the screen than necessary at
any time. The buttons that control ISUW are not placed on the screen,
they can be found on the keyboard where they are a lot easier to hit.
No rule without exceptions, and a useful exception is this. When you
place the mouse cursor in a window on the screen, a small message with
summary hints (in particular concerning the keys you can use) will pop
up. At some point, this "hint mode" becomes more irritating than
useful. Shift-F1 can be used to switch it off and on.
In the entry dialog for selection of working directory, move around in
the directory tree by the cursor arrows. The Up/Down arrows work in
the obvious way, the Left/Right arrows shift between "show siblings"
and "show children" (and the space bar can be used for both). Press
Return when the desired working directory is highlighted. If you want
to create a new working directory, press Escape instead of Return,
then edit the selected directory name and press Return. Notice that
the working directory must be a subdirectory of the ISUW root
directory (probably C:\ISUW).
An easy way of getting started is to select DEMOS as your working
directory and run some of the demonstration programs placed there.
To select the same working directory as last time - which is what you
do most of the time - this dialog can be skipped by Return.
In general, the Escape key is used when you leave a window or dialog
box. Sometimes, in particular in dialogs, you can also leave by Return.
In this case Return means "perform action", if there is anything to
perform, Escape means "leave without performing".
In interactive mode, commands are written line by line in the command
field at the bottom of ISUW's front page and executed when Return is
pressed. In addition to standard editing keys, the following keys are
active.
Escape clears the command line.
Escape from an empty command line allows you to QUIT.
Ctrl-Return truncates the command line from the cursor's position.
Cursor Up/Down allows you to reuse (and reedit) earlier commands.
Cursor Right/Left with Ctrl completes/replaces command names
lexicographically, when the cursor is in the first connected word
starting in position 1.
Cursor Right/Left with Ctrl completes/replaces vector names
lexicographically, when the cursor is in a connected word starting
after position 1.
In addition you can define your own shortkeys, see the description of
the KEYS command.
When a command is written from position 1 of the line, the command
name is completed automatically as soon as it is unique. Also, some
standard beginnings of commands are completed (I -> INCLUDE, LI ->
LIST, OP -> OPEN, SA -> SAVE, SK -> SKIP). This means that the keys
you must use to write INCLUDEALL are actually IA (here you can even
use A), whereas to write SKIPLINE you must use SKL. You will soon
learn this; it is impossible to write anything other than a command
from position 1.
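The "completed as soon as it is unique" rule can be pictured with the
following Python sketch. It is not part of ISUW: the function name
complete and the short command list are hypothetical, and the special
beginnings mentioned above (I -> INCLUDE etc.) are not modelled.

```python
def complete(typed, commands):
    """Return the unique command starting with `typed`, or None if the
    prefix is still ambiguous (or matches nothing)."""
    typed = typed.upper()
    matches = [name for name in commands if name.startswith(typed)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical subset of command names, for illustration only.
COMMANDS = ["VARIATE", "FACTOR", "READ", "RENAME", "RUN", "REMARK"]

print(complete("v", COMMANDS))    # VARIATE
print(complete("re", COMMANDS))   # None (READ, RENAME and REMARK all match)
print(complete("ren", COMMANDS))  # RENAME
```

With the full command list, more letters may of course be needed
before a name becomes unique.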
You can also import a command to the command window from the command
list associated with the help pages. Press F1 F1, select command with
Cursor Up/Down or the mouse, and press BackSpace or LeftArrow.
With a blank in position 1, the command line is interpreted as a
COMPUTE command.
The general syntax is
commandname [parameter1 [parameter2 [...]]]
Thus, parameters are in general separated by blanks. In special cases
(PLOT, TABULATE, ... ) a "pseudo blank" |, which can be written by the ½
key in the upper left corner of the keyboard, is used for subdivision of
parameters.
When a command produces output, this is shown in a (light green)
preview window. Leave this window by Return if you want output
appended to the session's output file, Escape if you don't. Press
Ctrl-P to print it out. Use the command SHOW (without parameters) to
look at the session's output file. From the corresponding window you
can also use Ctrl-P to print out the whole output file or a selected
(marked) portion of it.
It is possible, and often more convenient, to write an ISUW program
(i.e. a sequence of ISUW commands, each occupying a line) and execute
it by a RUN command, or directly from the program editor by pressing
F9. See the command descriptions for EDIT and RUN. In this case all
output is written to the output file without any previewing.
********************************************************************************
INTRODUCTION.
The following approximately 17 screenfuls give an overview of ISUW. It
is roughly identical to the material you can find in the article
"Introduction to ISUW", see my homepage ezlearn.cbs.dk/stat/hamat-2/tt/.
To learn about the possibilities, read this and try simultaneously to
perform some simple operations. You can leave the help file by Escape
and return to the same position by F1.
After this come the detailed descriptions of all commands. To find the
description of a command, write it in the command line and press
Ctrl-F1. Or just press Ctrl-F1 from an empty command line or press F1
twice to get a menu for selection (by Return) of command.
--------------------------------------------------------------------------------
A SIMPLE EXAMPLE.
Suppose we have an ASCII (plain text) file EX1.TXT of the form
Dose Response
0.968 0
0.909 1
...
1.689 1
0.524 0
consisting of a heading and 415 lines, each containing a value of a
covariate x and a binary response y. This could be data from an
experiment where 415 animals have been given a dose x of some drug, y
being the binary response, e.g. 1 for reaction, 0 for no reaction. The
following commands read this data set and fit a standard logit linear
model with the base 10 logarithm of x as the independent variable,
then fit the model with slope zero and perform the likelihood-ratio
test for this hypothesis ("no drug effect").
VAR X Y 415 { declares the variates to hold data }
OPEN EX1.TXT { opens the data file for input }
SKIPLINE { skips the heading line }
READ X Y { reads the two variates in parallel }
COMPUTE LOG10X=LN(X)/LN(10) { computes the log10 transform of X }
FITLOGIT Y=1+LOG10X { fits the logistic regression model }
LISTPARAMETERS { lists parameter estimates }
FITLOGIT Y=1 { fits the reduced model }
TEST { tests last against previous model }
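For readers who want to see the arithmetic behind a likelihood-ratio
test of this kind, here is a small Python sketch. It is not ISUW code
and does not claim to reproduce what TEST prints; the function name and
the log-likelihood values are made up. It assumes that exactly one
parameter (the slope) is dropped, so that the statistic is compared
with a chi-square distribution on 1 degree of freedom.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Likelihood-ratio test for dropping ONE parameter.

    Returns the statistic 2*(l_full - l_reduced) and its p-value from
    the chi-square distribution with 1 degree of freedom, using the
    identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


# Made-up log-likelihood values, for illustration only.
stat, p = likelihood_ratio_test(-250.0, -252.0)
print(stat, round(p, 4))
```

A small p-value speaks against the reduced model, i.e. against the
hypothesis of no drug effect.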
--------------------------------------------------------------------------------
VARIATES AND FACTORS.
The basic structures in StatUnit are called vectors. A vector can be a
factor (an array of bytes, for storage of qualitative variables) or a
variate (an array of single precision real numbers, 7-8 significant
digits). Variates and factors are created by the commands VARIATE and
FACTOR. In addition to this, variates are often declared implicitly,
for example by COMPUTE commands, like LOG10X in the example above.
EXAMPLE. To declare two variates X and Y of length 100, simply write
VAR X Y 100
Notice that we have written VAR, not VARIATE. Actually V would have
been enough here. In ISUW, any command can be truncated as long as it
is unambiguous. When written from first position of the command window
in interactive mode, the command is simply completed as soon as it is
recognised.
A value of a variate can be missing (which internally means that it has
the value -1.0E-37). Missing values are recognised by most commands and
treated appropriately as such.
A factor has, apart from its length, a property called its number of
levels, an integer from 1 to 255 which specifies the maximal level
allowed. This is given as an additional parameter in the declaration.
EXAMPLE. To declare a factor SEX of length 100 on 2 levels, write
FACTOR SEX 100 2
Names of vectors can be of length up to 8. The first character must be
a letter A..Z or an underbar _ , the remaining characters can also be
digits 0..9. Actually, the special characters #, &, %, $ and @ can
also be used (even as first characters), but we recommend that you do
not do so because these characters are sometimes used for special
purposes by ISUW. For example, vectors starting with a dollar sign
have the special property that they are always deleted at exit from a
program (!)
Vector names are case insensitive; in output from ISUW they are
usually written in capitals.
At most 255 vectors can be in use simultaneously.
Vectors can be removed from memory (to save capacity or to release their
names) by the command DELETE, and their names can be changed by the
command RENAME. The command RAMSTATUS displays a list of vectors present
and the space they occupy.
--------------------------------------------------------------------------------
INPUT FROM TEXT FILES.
The OPENINFILE, READ, SKIPITEM and SKIPLINE commands are designed for
input from ASCII (plain text) files in free format.
EXAMPLE. An ISUW program dealing with a data set of 178 units to be read
from a file A:\HEIGHTS.DAT might begin something like this.
FACTOR SEX 178 2
VARIATE AGE HEIGHT 178
FACTOR GROUP 178 6
OPEN A:\HEIGHTS.DAT
READ * SEX AGE HEIGHT GROUP
This READ command assumes that the file has the data in standard format,
like
001 1 23.1 178.2 1
002 2 43.6 173.1 4
...
where the first unit (or here, person) is a male (SEX=1), of age 23.1,
etc. etc. The only separators allowed (when nothing else is specified)
are blanks, newline symbols (and, in fact, any control characters in
the range 0-31), commas and semicolons. The asterisk in the READ
command implies that the first item for each unit (here the unit or
line number) is skipped.
The READ command above assumes that factor levels are represented by
their numerical levels. If this is not the case, an equality sign
followed by a comma separated list of level names can be appended to
the factor name. For example,
READ * SEX=*,Male,Female AGE HEIGHT GROUP
would work if the file looked like this:
001 Male 23.1 178.2 1
002 Female 43.6 173.1 4
...
114 * 32.9 167.0 2
...
with level 1 of SEX coded as 'Male', level 2 as 'Female' and level 0
("the missing level") as *.
Values of variates must be in standard format (like 1.2, -0.22, +2.0E7).
The symbol * is recognised as a missing value (for variates only).
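The free-format rules above (separators, skipped first item, level
names, * as missing) can be summarised in a Python sketch. This is an
illustration under assumptions, not ISUW's reader: the function name
read_units is hypothetical, the item order is hard-wired to the
HEIGHTS example, and missing variate values are represented by None.

```python
import re

# Separators named in the text: blanks, control characters 0-31
# (which include newline), commas and semicolons.
SEPARATORS = re.compile(r"[\x00-\x20,;]+")


def read_units(text, level_names):
    """Parse items in the order: id (skipped), SEX, AGE, HEIGHT, GROUP.

    `level_names` lists the names for factor levels 0, 1, 2, ...;
    level 0 is the missing level. A '*' in a variate becomes None.
    """
    items = [item for item in SEPARATORS.split(text) if item]
    units = []
    for i in range(0, len(items) - 4, 5):
        _, sex, age, height, group = items[i:i + 5]
        units.append({
            "SEX": level_names.index(sex),
            "AGE": None if age == "*" else float(age),
            "HEIGHT": None if height == "*" else float(height),
            "GROUP": int(group),
        })
    return units


sample = "001 Male 23.1 178.2 1\n002 Female 43.6 173.1 4\n114 * 32.9 167.0 2\n"
for unit in read_units(sample, ["*", "Male", "Female"]):
    print(unit)
```

Because line breaks are just separators, the same items spread over
more or fewer lines would give the same result in this sketch.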
--------------------------------------------------------------------------------
LISTING IN ASCII FORMAT.
This is done by the commands LIST and LIST1. LIST is for parallel
listing of vectors (usually of equal lengths). LIST1 is for listing of
single vectors across the page. For both commands, formats can be used
to determine width and number of digits after the decimal point. With
LIST it is also possible to write factor levels as names.
--------------------------------------------------------------------------------
COMMENTS ON THE OUTPUT FILE.
To write a comment to the output file, use the command REMARK.
--------------------------------------------------------------------------------
DATA STORAGE
in an internal binary file format is handled by the commands SAVEDATA and
GETDATA. In their simplest form, these commands are used to dump and
restore all vectors present in an ISUW session. The command SHOW can
display the contents of a data set without importing it, with indication
of potential name conflicts. ISUW data sets are files with extension .SUD
- do not try to edit them or handle them with other tools than ISUW (or
the DOS version ISU).
--------------------------------------------------------------------------------
GRAPHICS.
The commands PLOT and HISTOGRAM are used for graphics. PLOT produces
scatter plots (one variate against another). Colors and plot symbols can
be chosen according to the levels of factors. Points can be connected by
lines as desired, and overlaid plots can be produced. HISTOGRAM produces
histograms for variates (or factors), optionally parallel histograms
grouped by the levels of a factor. Headings and axis titles are controlled
by the commands FRAMETEXT, XTEXT and YTEXT. Without these specifications
reasonable default texts (variate names etc.) are used.
Graphics can be saved in JPEG format (*.JPG) or as bitmap (*.BMP) files,
and thereafter imported to text handling programs (like Microsoft Word or
WordPerfect) or image processing programs. Hardcopies can be printed as
PostScript files (see the descriptions of commands OPENPSFILE, PSFRAME and
CLOSEPSFILE). Interactively rotatable 3-d graphics can be produced by PLOT
and HISTOGRAM, by specification of an extra variate or factor.
--------------------------------------------------------------------------------
PARALLEL SORTING OF VECTORS
is performed by the command SORT.
--------------------------------------------------------------------------------
RESTRICTIONS.
In most applications, data are given as a rectangular data set, i.e. a
number of variates and factors of the same length, which is the number
of "records" or "experimental units" or "patients" or "persons" or
"runs" or "plots" or whatever, depending on the applied context. We
shall use the word units. To restrict attention to a subset of the
units, use the commands EXCLUDE, INCLUDE, INCLUDEALL, FOCUSONLEVEL,
EXCLUDELEVEL and EXCLUDEMISSING. These commands control a hidden array
of booleans (all TRUE from the beginning), telling which units are
"present". All ISUW commands for which this is relevant obey
restrictions, in the sense that only units present are taken into
account. For example, model fit commands and COMPUTE commands act only
on the subset of data specified as "present".
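The "hidden array of booleans" can be pictured as follows. This Python
sketch only illustrates the idea; the names present and mean_present
are hypothetical, the data are made up, and the restriction shown is
written directly in Python rather than with an ISUW command.

```python
def mean_present(values, present):
    """Mean of `values` over the units whose `present` flag is True."""
    kept = [v for v, keep in zip(values, present) if keep]
    return sum(kept) / len(kept)


height = [178.2, 173.1, 167.0, 181.5]
sex = [1, 2, 2, 1]

present = [True] * len(height)      # initial state: all units present
print(mean_present(height, present))

# Hypothetical restriction "keep only the units with SEX equal to 2".
present = [keep and s == 2 for keep, s in zip(present, sex)]
print(mean_present(height, present))

present = [True] * len(height)      # the state that INCLUDEALL restores
```

Note that one array of flags serves all vectors, which is why
restrictions make sense only for vectors of the same length.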
WARNING. Restrictions act in parallel on all vectors, independently of
their lengths. Parallel restrictions on vectors of different lengths are
usually meaningless. Be careful - use INCLUDEALL as soon as restrictions
are no longer required. Special care should be taken in connection
with SORT, SAVEDATA, TABULATE and the short form of TRANSFER - see the
command descriptions.
For convenience, the important command INCLUDEALL can be executed by
pressing A from an empty command window.
--------------------------------------------------------------------------------
COMPUTATIONS.
Unit-by-unit computations are performed by COMPUTE. For example, if P is
a variate of length 100 with values between 0 and 1, and you want to
create another variate LOGIT_P of length 100 holding its logit
transformed values, write
COMPUTE LOGIT_P=LN(P)-LN(1-P)
If LOGIT_P is not previously declared, it will be declared as a variate
of length 100. If it is declared before, it must be a variate of length
100. Values of P that are not in the range ]0,1[ will result in a
missing value of LOGIT_P, and a warning about this is written to the
output file. Factors can also be handled in this way, and many
transformations that are not just "unit by unit" are also possible. See
the description of the COMPUTE command.
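The unit-by-unit logit computation, including the rule that values
outside ]0,1[ give a missing result, corresponds to this Python sketch.
It is an illustration only: None stands in for ISUW's missing value,
and no warning is written.

```python
import math


def logit(p):
    """Return ln(p) - ln(1 - p), or None (missing) outside ]0,1[."""
    if p is None or not 0.0 < p < 1.0:
        return None
    return math.log(p) - math.log(1.0 - p)


P = [0.5, 0.9, 0.0, 1.0]
LOGIT_P = [logit(p) for p in P]
print(LOGIT_P)
```

The first value is 0 (since ln(0.5) - ln(0.5) = 0), and the last two
come out as missing because 0 and 1 are not in the open interval.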
A COMPUTE command without a left hand side simply displays the result.
For example, to display the mean and standard deviation of (the values
in) a variate X, just write
COMPUTE mean(X)
COMPUTE sqrt(variance(X))
For convenience, the COMPUTE command can be generated from an empty
command window by pressing the key + (without an equality sign) or =
(including the equality sign). Moreover, if the line starts with a blank
it is interpreted as a COMPUTE command.
--------------------------------------------------------------------------------
OTHER WAYS OF ASSIGNING VALUES/LEVELS TO VARIATES/FACTORS.
GENERATE assigns systematically varying levels to a factor. For example,
if G is a factor of length 20 on 4 levels, the command
GENERATE G 3
will assign levels
1 1 1 2 2 2 3 3 3 4 4 4 1 1 1 2 2 2 3 3
to the factor. The level changes cyclically, the last parameter (here 3)
determining the lag between change points.
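The cyclic pattern can be reproduced with a one-line formula. The
Python sketch below is an illustration only (the function name
generate_levels is hypothetical); it mimics the example GENERATE G 3
for a factor of length 20 on 4 levels.

```python
def generate_levels(length, n_levels, lag):
    """Levels 1..n_levels, each repeated `lag` times, cycling as needed."""
    return [(i // lag) % n_levels + 1 for i in range(length)]


print(*generate_levels(20, 4, 3))
# 1 1 1 2 2 2 3 3 3 4 4 4 1 1 1 2 2 2 3 3
```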
GROUP is used for construction of a factor by interval grouping of a
variate.
TRANSFER can be used to copy subvector into subvector. For example, if
X is a vector of length 100, to split it into two vectors of length 50,
write
VARIATE X1 X2 50
TRANSFER X 1 50 X1 1 50
TRANSFER X 51 100 X2 1 50
TRANSFER can also be used to copy the values/levels present in a
vector into a new vector of the appropriate length.
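The sub-vector form of TRANSFER shown above uses positions that count
from 1 and include both end points. A Python sketch of the same copy
(illustration only; the function transfer and the check on equal
lengths are assumptions, not a description of ISUW's error handling):

```python
def transfer(source, s_from, s_to, target, t_from, t_to):
    """Copy source[s_from..s_to] into target[t_from..t_to].

    Positions are 1-based and inclusive, as in the TRANSFER lines above.
    """
    if s_to - s_from != t_to - t_from:
        raise ValueError("sub-vectors must have the same length")
    target[t_from - 1:t_to] = source[s_from - 1:s_to]


X = list(range(1, 101))          # a vector of length 100
X1 = [None] * 50
X2 = [None] * 50
transfer(X, 1, 50, X1, 1, 50)
transfer(X, 51, 100, X2, 1, 50)
print(X1[0], X1[-1], X2[0], X2[-1])   # 1 50 51 100
```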
--------------------------------------------------------------------------------
SUMMARIES, TABLES, TABULAR SUMMATION.
SUMMARY displays summary descriptions of variates or factors.
ONEWAYTABLE produces one-way tables of counts for factors (number of
units for each level) or variates (counts of values in specified
intervals). Two-way or three-way tables of counts of units or sums of a
given variate over level combinations for two or three factors are
produced by the commands TWOWAYTABLE and THREEWAYTABLE.
The command TABULATE performs counting (of units) or summation (of
variate values) over the cells of a cross classification determined by
an arbitrary number of factors.
For convenience, these commands can be generated by a single key from an
empty command window as follows.
Press key to write command
0 SUMMARY
1 ONEWAYTABLE
2 TWOWAYTABLE
3 THREEWAYTABLE
4 TABULATE
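The counting and summation over the cells of a cross classification,
as described for TABULATE above, amounts to the following. This Python
sketch is an illustration with made-up data; the function name and the
dictionary-like result are assumptions and say nothing about how ISUW
stores or prints its tables.

```python
from collections import Counter


def tabulate(factors, weights=None):
    """Count units, or sum `weights`, per combination of factor levels."""
    table = Counter()
    for i, cell in enumerate(zip(*factors)):
        table[cell] += 1 if weights is None else weights[i]
    return table


sex = [1, 2, 2, 1, 2]
group = [1, 1, 2, 1, 2]

counts = tabulate([sex, group])
print(sorted(counts.items()))
# [((1, 1), 2), ((2, 1), 1), ((2, 2), 2)]
```

Passing a list of values as the second argument gives sums of that
variate per cell instead of counts, and any number of factors can be
supplied in the first argument.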
--------------------------------------------------------------------------------
STATISTICAL MODELS.
FITLINEARNORMAL is for analysis of variance and regression.
FITLOGLINEAR is for multiplicative or log-linear models for Poisson or
multinomial data.
FITLOGITLINEAR is for logistic regression models for binary or binomial
data.
FITNONLINEAR is for a class of nonlinear regression models, including the
generalised linear models with overdispersion, user-specified mean
(=inverse link) and variance functions.
FITCOXMODEL is for proportional hazards models by Cox's likelihood,
optionally with right censoring, left truncation and stratification.
FITMCLOGIT, FITMCPROBIT and FITMCCLOGLOG are for ordered categorical
response models as described by P. McCullagh (JRSS B 42, 109-142), where
the responses are assumed to be the result of a grouping with unknown
cutpoints of continuous data from a linear position parameter model with
error distribution logistic, normal or Gompertz.
FITCLOGIT is for conditional logistic regression (like in matched
case-control studies). FITCRASCH is for a special case of this, the
conditional Rasch model (two way logit-additive model for binary data by
conditioning on the row sums, arbitrary linear structure for column
parameters).
FITNEGBIN is for log-linear models for negative binomial data, usually
coming up as "Poisson data with over-dispersion".
FITANOVA is for analysis of variance, including random effects models,
in balanced orthogonal designs.
After any model fit command except FITANOVA, the command LISTPARAMETERS
produces a listing of the estimated parameters in the last model fitted,
and the command SAVEFITTED can be used for extraction of fitted values,
residuals and normed residuals from the last model fitted (whenever this
makes sense, see the command descriptions). See also the command
SAVENORMEDRESIDUALS, which can be used after FITLINEARNORMAL to compute
exact T-distributed ("studentized") normed residuals. SAVEPARAMETERS saves
parameter estimates and their estimated standard deviations as variates
(except after FITANOVA). ESTIMATE outputs the estimates of specified
linear combinations of the parameters and their estimated standard
deviations. TESTMODELCHANGE can be used for computation of the likelihood
ratio test (Chi-square or F) for model reduction after fit of two nested
models by any FIT... command except FITANOVA.
All model fit commands involve the concept of a model formula, i.e. a
code for a linear expression involving linear effects of covariates,
effects of factors, interactions between factors etc. This concept,
which is more or less common to all statistics packages, is explained
carefully in the description of FITLINEARNORMAL.
WARNING. The design matrix determined by a model formula is not
physically stored. What is kept is a code telling how to compute its
elements from values or levels of existing variates and factors. Hence,
commands referring to the last model fit use the actual values/levels of
vectors occurring in the last model. If these have been changed,
incorrect results will come out. If some of them have been deleted, the
information that can be extracted is reduced accordingly. For example,
if any of the independent variables (or an offset variable, if such was
present) has been deleted, SAVEFITTED will not work. If the response
variate has been deleted, SAVEFITTED will be able to produce fitted
values, but not residuals and normed residuals. Similarly, if a weight
variate has been deleted, only fitted values and residuals, but not
normed residuals, can be produced.
--------------------------------------------------------------------------------
TESTS, NON-PARAMETRICS, DESCRIPTIVE STATISTICS.
The command WILCOXON performs a two-sample Wilcoxon or Mann-Whitney test.
The command SPEARMAN computes Spearman's rank correlation and performs
the test for "no ordinal correlation".
The command BARTLETT performs Bartlett's test for variance homogeneity
in a one-way setting.
The command CORMAT writes the matrix of correlations for a set of
variates, with optional indication of significances.
--------------------------------------------------------------------------------
CALLING OTHER PROGRAMS.
Other programs (or documents to be opened by applications determined by
their file extensions) can be called directly from ISUW by a SHELL
command. For example, to edit a file PRG.ISU with NotePad (if you prefer
this to ISUW's own EDIT command), use the command
SHELL NOTEPAD PRG.ISU
--------------------------------------------------------------------------------
PROGRAMMING THE KEYBOARD.
The function keys F2..F12, alone or in combination with Alt, Ctrl or
Shift, and the keys A..Z and 0..9 in combination with Alt or Ctrl can
(with certain exceptions that are used for other things) be
programmed. For example, the command
KEY ao show!
will imply that the session's output file is displayed whenever Alt-O
is pressed (the exclamation sign means "Return").
Your programmed keys are automatically saved on exit and recovered at
next startup under the same ISUW root directory.
--------------------------------------------------------------------------------
ISUW PROGRAMS.
ISUW commands can be written line by line on text files by commands of
the form
EDIT programname
and executed by
RUN programname
Programs are saved on files with extension .ISU . Make sure that you
include this extension in the program name if you use another editor
than ISUW's own EDIT command.
To avoid long lines in programs, you can split them up in pieces by the
"continue line symbol" \. For example, rather than
FITLINEAR Y = 1 + ROWS + COLUMNS + LAYERS + ROWS*COLUMNS + COLUMNS*LAYERS + ROWS*LAYERS
it is usually preferable to write
FITLINEAR Y = 1 \
+ ROWS + COLUMNS + LAYERS \
+ ROWS*COLUMNS + COLUMNS*LAYERS + ROWS*LAYERS
Empty lines can be inserted and lines can be indented as desired to make
the program more readable. Comments can be included in two ways:
(1) Lines starting with a percentage sign % are ignored.
(2) Text in curly braces {} within a line is ignored.
As opposed to REMARKs, such comments are not echoed to the output file.
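The layout rules above (continuation by \, the two kinds of comments,
empty lines and indentation) can be summarised in a Python sketch of a
preprocessor. This is an assumption-laden illustration, not ISUW's
parser: the function name is hypothetical, and details such as whether
a % line may be indented are guesses.

```python
import re


def logical_lines(program_text):
    """Yield the command lines of a program after comment removal."""
    pending = ""
    for raw in program_text.splitlines():
        if raw.lstrip().startswith("%"):
            continue                          # (1) whole-line comment
        line = re.sub(r"\{[^}]*\}", "", raw).strip()  # (2) {...} comment
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "       # continued below
            continue
        line = (pending + line).strip()
        pending = ""
        if line:                              # skip empty lines
            yield line


PROGRAM = "\n".join([
    "% fit the full model",
    "FITLINEAR Y = 1 \\",
    "   + ROWS + COLUMNS { main effects }",
    "",
    "LISTPARAMETERS",
])

for command in logical_lines(PROGRAM):
    print(command)
```

For this input the sketch prints two commands: the FITLINEAR line
joined into one piece, and LISTPARAMETERS.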
A primitive device for parameter substitution is also available, see the
description of the command SUBSTITUTE. The command GOTO can be used for
conditional branching and loops.
In programs, command names (but not vector names) can be truncated. See
the list below of shortest unique truncations. The command name COMPUTE
can be omitted for COMPUTE commands with a left hand side. COMPUTE
commands without a left hand side can be written with an equality sign
'=' or a plus '+' as the first character of the expression to display.
A more primitive device for program execution is implemented in the
command OPENCOMMANDFILE. The effect of this command is that the
commands on a file can be imported one by one to the command field by
the CursorDown arrow. This can be used if you want to execute a
sequence of commands, with the optional possibility of modifying the
commands and inserting other commands. The demonstration programs on
the directory DEMOS under the ISUW root directory are executed in this
way (but with the additional feature that explanatory text is shown in
a separate "supervisor" window).
--------------------------------------------------------------------------------
INSTALLATION, DIRECTORY STRUCTURE AND CALL OF ISUW.
To install ISUW, download (as you have probably already done) the file
ISUWINST.EXE to an empty directory on your hard disk. We recommend
C:\ISUW to keep file names short, but in principle a directory like
C:\Program Files\Danish Mouse Free Software\Interactive StatUnit
could also be used. Unpack this self-unpacking file by executing it,
for example by double clicking on it from Windows Explorer, or calling
it by its name from the Run window or a command (MS-DOS or equivalent)
window from the same directory. This operation results in the creation
of four files,
ISUW.EXE (the executable program)
ISUWHELP.TXT (the file you are reading here)
BORLNDMM.DLL (a Delphi system file required for memory management)
DEMOS.EXE (a self-unpacking file containing the DEMOS files)
After this ISUWINST.EXE can be deleted. The entire ISUW package takes up
less than 2 MB of space on the hard disk.
To install a new version of ISUW, overwriting the old version, simply
repeat this. Here you can avoid four confirm-overwrite-dialogs by use
of the option -o, i.e. by writing
ISUWINST -o
rather than just ISUWINST.
In the following we refer to two directories:
1. The ISUW root directory. This is where the files SHORTKEY.TXT and
SHORTKEY.BIN, holding your shortkey definitions, are kept, and the
file LASTDIR holding the name of the last session's working
directory, your latest selection of size and position on the screen
of the ISUW window and your choice of having "hint mode" on or
off. You can have several ISUW root directories for different
applications, if you wish. If you ever get the (probably rather
useless) idea of having two or more ISUW sessions running at the
same time, make sure you do it from different ISUW root
directories, otherwise there will be file sharing conflicts.
However, ISUW makes some initial file checks that will usually
prevent this error.
By default, the ISUW root directory becomes the directory where you
unpacked the four system files. But the ISUW root directory does not
have to be the directory where these files are located. You can (and
should, if these files are placed e.g. on a read-only network drive)
redefine this to any other directory by giving a valid directory name
(including drive letter and colon) as the first parameter in the call
of ISUW.EXE.
2. The working directory. This directory, which (with an exception to be
mentioned below) must be a subdirectory (or sub-sub etc.) of the
ISUW root directory, is selected (and sometimes created) from a
dialog box when the session begins. Typically, you will use the
same directory again and again; in this case you can skip the
dialog by Return. The working directory is the place where all
sorts of output is placed by default, and also the place where you
will typically place data files before or during the session. The
working directory becomes the current Windows directory throughout
the session, which means that file names without a path
specification refer to files on that directory.
The working directory can be selected directly in the call of ISUW.EXE
by specification of the full name (including drive letter and colon)
of an existing directory as the second parameter in the call of
ISUW.EXE. In this case the working directory does not have to be a
subdirectory of the ISUW root directory. The entry dialog is skipped,
and the file LASTDIR is left unchanged.
Here, you can also extend the name of the desired working directory to
the full name of an existing file with extension either ISU or SUD on
that directory. In the first case, the ISU program is executed, in the
second case the ISU data set is imported. This is useful if you want
to set up your Windows computer in such a way that double clicking
from Windows Explorer on an ISU program or an ISU data set results in a
startup of ISUW with the appropriate action. Write a command file -
say ISUWBAT.BAT - containing a single line of the form
C:\isuw\ISUW.EXE c:\isuw %1
and make this your Windows default application for opening .ISU and
.SUD files (Click Tools -> Folder Options -> File Types from Windows
Explorer).
If both the ISUW root and the working directory are specified as the
first two parameters to ISUW.EXE, additional parameters can be added.
These parameters should constitute a valid ISUW command, which will be
copied to the command field. If, in addition, the last of these
parameters ends with an exclamation sign, this command will be
executed immediately after entry to the program. In this way you can
build some standard initialization into the call of ISUW, like import
of a data set, execution of an ISUW program defining some local
shortkeys etc.
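The roles of the call parameters described above can be summarised in
a Python sketch. It is an illustration under assumptions and not the
way ISUW itself parses its command line: the function name and the
dictionary keys are hypothetical, the program name SETUP is made up,
and it is assumed that the final exclamation sign is not part of the
command text.

```python
import ntpath   # Windows path rules, whatever system runs the sketch


def interpret_call(args):
    """Split the parameters of an ISUW.EXE call into their roles."""
    call = {"root": None, "workdir": None, "file": None,
            "command": None, "execute": False}
    if len(args) >= 1:
        call["root"] = args[0]
    if len(args) >= 2:
        target = args[1]
        extension = ntpath.splitext(target)[1].upper()
        if extension in (".ISU", ".SUD"):
            # Working directory extended to a program or data set name.
            call["workdir"], call["file"] = ntpath.split(target)
        else:
            call["workdir"] = target
    if len(args) >= 3:
        command = " ".join(args[2:])
        if command.endswith("!"):
            call["execute"] = True            # run at once after entry
            command = command[:-1]
        call["command"] = command
    return call


print(interpret_call([r"C:\ISUW", r"C:\ISUW\MPAS\06\DATA.SUD"]))
print(interpret_call([r"C:\ISUW", r"C:\ISUW\MPAS\06", "RUN", "SETUP!"]))
```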
EXAMPLE. On the computer I use at work, I have placed the ISUW system
files on a directory named C:\DELPHI\ISUW1 because I am developing
ISUW under Borland Delphi. However, my (only) ISUW root directory is
C:\ISUW. Thus, the shortcut starting ISUW from my desktop has as its
"target property" the command
C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW
However, I have a main application of ISUW related to a course called
MPAS. For that reason, I have another shortcut on my desktop with the
command
C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW C:\ISUW\MPAS\06
in the target field, which means that I can go directly to this
application without any entry dialog. Next year I am probably going to
change 06 to 07 (after having created C:\ISUW\MPAS\07). In between, I
use the form
C:\DELPHI\ISUW1\ISUW.EXE C:\ISUW C:\ISUW\MPAS\06\DATA.SUD
to load a certain data set automatically at startup.
AUTOEXEC.ISU. Another way of specifying automatic initialization goes
as follows. If an ISUW program named AUTOEXEC.ISU exists on a working
directory, this program will be executed automatically at startup from
that working directory. This works quite generally - i.e. also when
the working directory is selected from the entry dialog.
Uninstallation. In the very unlikely case that you want to uninstall
ISUW, simply remove the root directory (or directories, if you have
more than one) and whatever you want to get rid of among files you
have created elsewhere by ISUW on working directories that are not
subdirectories of the root directory. This can be tedious, of course,
but not more tedious than the occasional garbage collection which is
required anyway. ISUW does not make any hidden changes to your
computer's setup. The shortcuts that you may have created are easy to
delete.
--------------------------------------------------------------------------------
OUTPUT.
ISUW writes output to a temporary file on the ISUW root directory
named ISUWOUT.TMP. This is the file you look at in a somewhat modified
form by the command SHOW. When an ISUW session ends you will be given
the option of saving this file on the working directory under another
name. If you answer "No" here, the output file is lost. In spite of
some special effects created by the SHOW command (ISUW prompts being
replaced with special colors of command lines, error messages and
notes being printed in special colors, REMARKS in italics etc.), ISUW
output files are ordinary text files which can be edited and printed
e.g. by Notepad or imported to standard text handling programs.
Whenever a command sent from the command window produces written
output, this is shown in a light green preview window, which you leave
by Return if you want output appended to ISUWOUT.TMP, Escape if you do
not. For commands in a program executed by a RUN command the rule is
different. Here, all output is written directly to ISUWOUT.TMP, unless
you redirect it explicitly to another file (or the "paper basket"
NUL) by an OUTFILE command.
Commands and error messages are echoed to the output file by default.
This means that the ISUW output file will contain a complete log of
what has happened during the session. In some cases (for example to
avoid echoes of the same GOTO loop again and again) you may prefer to
switch this default off by the ECHO command.
********************************************************************************
Alphabetic list of ISUW commands
The last column indicates (when it makes sense) whether a command does
(+) or does not (-) take restrictions into account. For details, see the
command description.
Command              Shortest     Equivalent   Shortkey         Restrictions
                     truncation   brief form   from empty
                                               command window
BARTLETT B +
CLOSEPSFILE CL
COMPUTE COM + = + = +
CONSTRUCTMINIMUM CON -
CORMAT COR +
DELETE D
ECHO EC
EDIT ED
ESTIMATE ES
EXCLUDE EXCLUDE
EXCLUDELEVEL EXCLUDEL
EXCLUDEMISSING EXCLUDEM
FACTOR FA
FITANOVA FITA +
FITCLOGIT FITCL +
FITCOXMODEL FITCO +
FITCRASCH FITCR +
FITLINEARNORMAL FITLI +
FITLOGLINEAR FITLOGL +
FITLOGITLINEAR FITLOGI +
FITMCCLOGLOG FITMCC +
FITMCLOGIT FITMCL +
FITMCPROBIT FITMCP +
FITNEGBIN FITNE +
FITNONLINEAR FITNO +
FOCUSONLEVEL FO
FRAMETEXT FR
GENERATELEVELS GEN -
GETDATA GET -
GOTO GO -
GROUP GR +
HISTOGRAM H +
INCLUDE INCLUDE I
INCLUDEALL INCLUDEA A
KEYS K
LIST LIST L +
LIST1 LIST1 L1 +
LISTPARAMETERS LISTP
ONEWAYTABLE ON 1 +
OPENCOMMANDFILE OPENC
OPENINFILE OPENI OPEN
OPENPSFILE OPENP
OUTFILE OU
PLOT PL +
PSFRAME PS
QUIT Q
RAMSTATUS RA -
READ REA +
REMARK REM
RENAME REN
RUN RU
SAVEDATA SAVED SAVE +
SAVEFITTED SAVEF
SAVENORMEDRESIDUALS SAVEN +
SAVEPARAMETERS SAVEP
SHELL SHE
SHOW SHO
SKIPITEM SKIPI
SKIPLINE SKIPL
SORT SO -
SPEARMAN SP +
SUBSTITUTE SUB
SUMMARY SUM 0 +
TABULATE TA 4 +
TESTMODELCHANGE TE
THREEWAYTABLE TH 3 +
TRANSFER TR -/+
TWOWAYTABLE TW 2 +
VARIATE V
WILCOXON W +
XTEXT X
YTEXT Y
********************************************************************************
======================== COMMAND DESCRIPTIONS ========================
--------------------------------------------------------------------------------
VARIATE
Declaration of variates.
................................................................................
Syntax: VARIATE name1 [name2 [...]] length
Creates new variates named name1 ... of the same length. The length
must be a positive integer or integer expression.
EXAMPLE.
VAR X Y 10*48
creates two variates X and Y of length 480.
Variates are arrays or vectors of single precision real numbers.
Formally, there is no upper limit to the length of variates and
factors. Or, rather, the realistic limit is set by the computer's RAM.
But things will work rather slowly if the RAM is filled up. Depending
on your Windows version, the computer may start using disk cache (which
will slow it down to a speed where it is almost useless), or the
session will crash.
Another problem is that for single precision numbers greater than
approx. 15 million, rounding to integer values will not be correct.
This means that for a variate X of length greater than 15 million you
cannot use expressions like X(UNIT) for all possible values of UNIT.
For most data sets this is no problem at all - but now you are warned.
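The integer limit can be checked outside ISUW. The following Python
sketch assumes that ISUW's single precision numbers follow the IEEE 754
single format (24 significant bits). Under that assumption every
integer up to 2^24 = 16,777,216 is stored exactly and gaps appear above
it, which is of the same order as the figure quoted above. The helper
name to_single is illustrative only and is not part of ISUW.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Assumption: variates are stored as IEEE 754 singles (24-bit significand).
EXACT_LIMIT = 2 ** 24  # 16,777,216

for n in (15_000_001, EXACT_LIMIT, EXACT_LIMIT + 1, EXACT_LIMIT + 2):
    stored = to_single(float(n))
    status = "exact" if stored == n else "rounded to %d" % int(stored)
    print(n, status)
```

Above the limit only every second integer can be represented, so a
unit number held in a single precision value may silently point at a
neighbouring unit.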
There is, however, an upper limit of 255 to the number of variates and
factors that can be present simultaneously in an ISUW session.
At declaration, all values are set to zero.
WARNING. If an error occurs during execution, like in the command
VAR X Y 1A Z 100
where 1A is an illegal variate name, the command is interrupted by an
error message. However, the command is executed up to the place where
the error is met. In the above example X and Y are declared, but not Z.
--------------------------------------------------------------------------------
FACTOR
Declaration of factors.
................................................................................
Syntax: FACTOR name1 [name2 [...]] length levels
Creates new factors named name1 ... of given length and number of
levels. The length and the number of levels are integer expressions,
both positive.
EXAMPLE.
FAC SEX TREAT 132 2
FAC GROUP 132 6
creates three factors of length 132, SEX and TREAT on 2 levels and GROUP
on 6 levels.
Factors are stored as arrays of bytes. For this reason, the maximal
number of levels is 255.
At declaration, all levels (present or not) are set to zero.
WARNING. See the warning to VARIATE (just above).
--------------------------------------------------------------------------------
RAMSTATUS
Writes information about existing vectors in memory.
................................................................................
No parameters.
Writes information about the vectors present and the dynamically
allocated memory they occupy. The information includes
Names of variates and their lengths.
Names of factors and their lengths and numbers of levels.
The number of bytes occupied by each vector and totally.
In addition, RAMSTATUS tells whether restrictions are present or not.
--------------------------------------------------------------------------------
DELETE
or
DEL
Deletes existing vectors.
................................................................................
Syntax: DELETE [name1 [name2 [...]]]
Deletes existing vectors, releasing the space they occupy and their
names.
WARNING. If an error occurs during execution, like in the command
DEL X Y 1ST Z
where 1ST is an illegal variate name, the command is interrupted by an
error message. However, the command is executed up to the place where
the error is met. In the above example X and Y are deleted (if they
both exist), but not Z.
If the command is used without parameters, all vectors present are
deleted. In addition, all restrictions are removed. If an input file,
an alternative output file or a PostScript output file is open it is
closed, information from last model fit is lost, and the parameters
determining text for PLOT and HISTOGRAM are set to their defaults. In
addition, if an OUTFILE command is in force, output is redirected back
to the session's output file, and command echo is set "on". Thus, a
DELETE command without parameters is a sort of "reset" command. The
only difference from closing the session and starting a new one is that
the output file is still there, and if DELETE is used in this way in a
program after a SUBSTITUTE command the effects of that command are
still in force.
DELETE without parameters is very often useful as the first command in
a program. So are the commands ECHO 0 and OUTFILE 0, if you want to
avoid command echoes and output from loops, or OUTFILE if you want to
direct output to a file other than the session's output file (useful
when a program is tested). It follows from what was said above that
the DELETE command must come before the two other commands (but a
SUBSTITUTE command may be placed before it).
--------------------------------------------------------------------------------
RENAME
Gives an existing vector a new name.
................................................................................
Syntax: RENAME oldname newname
oldname should, of course, be the name of an existing vector, and
newname must be a valid vector name which is not in use. The command
can be used to solve name coincidence conflicts before import of a
data set. Use a SHOW command to see if such conflicts are present.
--------------------------------------------------------------------------------
EXCLUDE
Excludes specified units, marking them as "non-present".
................................................................................
Syntax: EXCLUDE range1 [range2 [...]]
A range can be an integer expression or an expression of the form
integer1:integer2
where integer1 and integer2 are integer expressions.
In the latter case,
0 < integer1 <= integer2 <= length of longest vector
is required, and the units from integer1 to integer2 are excluded.
EXAMPLE. To remove all units <= 10 and >= 91 write
EXCLUDE 1:10 91:100
provided that the relevant vector length is 100.
Ranges are handled one by one, and units for each range in the natural
order. Thus, if (in the above situation, with 100 as the length of all
existing vectors) you write
EXCLUDE 1:10 91:101
an error would occur for unit 101. Nevertheless, the desired
restrictions would actually be imposed. Whereas
EXCLUDE 91:101 1:10
would remove only units 91 to 100, but not units 1 to 10, since these
come after the error interrupt.
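The order dependence just described can be mimicked in a few lines of
Python. This is only a sketch of the rule "ranges one by one, units in
natural order, stop at the first error", not ISUW's actual code. The
function name exclude_ranges and the wording of the error message are
hypothetical.

```python
def exclude_ranges(ranges, length, excluded=None):
    """Apply (first, last) unit ranges one by one.

    Processing stops at the first invalid unit; units handled before the
    error stay excluded. Returns (set of excluded units, error or None).
    """
    excluded = set() if excluded is None else set(excluded)
    for first, last in ranges:
        for unit in range(first, last + 1):
            if unit < 1 or unit > length:
                # Hypothetical message text; ISUW's own wording may differ.
                return excluded, "unit %d is out of range" % unit
            excluded.add(unit)
    return excluded, None

# EXCLUDE 1:10 91:101 with vector length 100
done, err = exclude_ranges([(1, 10), (91, 101)], length=100)
print(len(done), err)

# EXCLUDE 91:101 1:10 with vector length 100
done, err = exclude_ranges([(91, 101), (1, 10)], length=100)
print(len(done), err)
```

In the first call all 20 intended units are excluded before the error
at unit 101 is met. In the second call only units 91 to 100 are
excluded, because the range 1:10 is never reached.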
A range can also be specified as the name of a variate, or an expression
that would be valid as a right hand side of a COMPUTE command. In this
case, the units excluded are those for which the variate value is
defined, non-missing and positive. For example, to exclude all units for
which the variate X takes a value which is not in the interval [0,100],
write
EXCLUDE (X<0)+(X>100)
or just (specifying two ranges)
EXCLUDE X<0 X>100
Notice that missing values of X in this command, or more generally units
for which the variate or expression is missing or results in a missing
value, are not excluded. For example,
EXCLUDE X>ln(0)
has no effect, and no warning is given.
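The rule for expression-valued ranges can be illustrated by the Python
sketch below. It assumes that a comparison yields 1 for true and 0 for
false and that a missing operand gives a missing result; missing values
are represented by None. The names compare and units_to_exclude are
illustrative and do not exist in ISUW.

```python
def compare(x, test):
    """Elementwise comparison giving 1 or 0, with missing (None) propagated."""
    return [None if v is None else int(test(v)) for v in x]

def units_to_exclude(values):
    """Return the 1-based units whose value is non-missing and positive."""
    return [unit for unit, v in enumerate(values, start=1)
            if v is not None and v > 0]

X = [-5.0, 50.0, 150.0, None]          # hypothetical variate, unit 4 missing
below = compare(X, lambda v: v < 0)    # X<0
above = compare(X, lambda v: v > 100)  # X>100

# One range: (X<0)+(X>100)
combined = [None if a is None or b is None else a + b
            for a, b in zip(below, above)]
print(units_to_exclude(combined))

# Two ranges: X<0 X>100
print(sorted(set(units_to_exclude(below)) | set(units_to_exclude(above))))
```

Both forms pick out units 1 and 3. Unit 4, where X is missing, is left
alone in either case.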
WARNING. The command
EXCLUDE 100
excludes unit 100, whereas
EXCLUDE 100.2
excludes unit 1 (!). Since 100.2 is not interpretable as a range, it is
assumed to be the right hand side of a COMPUTE command, resulting in a
variate of length 1 with the value 100.2. Similarly,
EXCLUDE -3
results in an error message, whereas
EXCLUDE -3.2
has no effect.
Units that are already excluded are not touched by EXCLUDE. Thus, the
two commands
EXCLUDE SEX=0
EXCLUDE CODE=999
will do exactly the same as the single statement
EXCLUDE SEX=0 CODE=999
--------------------------------------------------------------------------------
EXCLUDELEVEL
Excludes all units on specified levels of a factor.
................................................................................
Syntax: EXCLUDELEVEL factor level1 [level2 [...]]
EXAMPLE. To remove all units on level 0 or 2 of the factor SEX, write
EXCLUDELEVEL SEX 0 2
As for EXCLUDE (see above), levels are handled one by one, and the
command is interrupted with an error message if an error occurs. Thus,
if SEX has 2 levels (which is the usual state of affairs), the command
EXCLUDELEVEL SEX 0 5 2
would exclude level 0, but not level 2.
--------------------------------------------------------------------------------
FOCUSONLEVEL
Excludes all units that are not on a specified level of a factor.
................................................................................
Syntax: FOCUSONLEVEL factor level
Equivalent to
EXCLUDELEVEL ...
(see above) where ... stands for the list of levels different from
the level specified in the FOCUSONLEVEL command.
EXAMPLE. To focus on the females in group 1, write something like
FOCUSONLEVEL SEX 2
FOCUSONLEVEL GROUP 1
Equivalently, you could use
EXCLUDE SEX<>2 GROUP<>1
--------------------------------------------------------------------------------
EXCLUDEMISSING
Excludes units for which values of given variates are missing.
................................................................................
Syntax: EXCLUDEMISSING name1 [name2 [...]]
This command is typically used before a model fit command.
EXAMPLE.
EXCLUDEMISSING HEIGHT WEIGHT
EXCLUDELEVEL SEX 0
FITLINEARNORMAL WEIGHT=1+SEX+HEIGHT+SEX*HEIGHT
The rules for error interrupts are similar to what has been said about
EXCLUDE and EXCLUDELEVEL.
--------------------------------------------------------------------------------
INCLUDE
Includes specified units, marking them as "present".
................................................................................
Syntax: INCLUDE range1 [range2 [...]]
Opposite to EXCLUDE. Units specified are included, units not specified
are untouched. Ranges (and the rules for error interrupts) are explained
a few screenfuls above under EXCLUDE.
When a range is expressed as a valid right hand side of a COMPUTE
command, the computation does, of course, take place also for
non-present units - otherwise nothing would happen.
--------------------------------------------------------------------------------
INCLUDEALL
Includes all units, i.e. removes all restrictions.
................................................................................
No parameters.
Reestablishes the initial state of affairs, where no restrictions are
present. Use this command whenever restrictions are not required any
more. From an empty command window, just press A.
--------------------------------------------------------------------------------
OPENINFILE
or just
OPEN
Opens a text file for input.
................................................................................
Syntax: OPEN [filename]
ASCII files for input are handled by the commands READ, SKIPITEM and
SKIPLINE, see below. If another file is already open for input it is
closed. To close an input file without opening a new one, use the command
without parameters. This is often necessary if you want to EDIT an
input file to correct errors detected by a READ command, because you
will not be allowed to make changes to an input file while it is open.
--------------------------------------------------------------------------------
READ
Input of data from a text file.
................................................................................
Syntax: READ [[separators]] name1 [name2 [...]]
Data are read in parallel from the file opened by OPEN (see above).
Items on the file must be separated by blanks, newline symbols (or
other characters in the range 0-32), commas or semicolons. Other
separator characters can be specified, see approximately 4 screenfuls
below. name1, name2 etc. must be names of existing vectors of equal
lengths. The symbol '*' can be used for "skip next item", and '/'
means "skip to start of next line".
EXAMPLE. Suppose that A:\PROJECT.DAT contains 100 lines, beginning with
001 12.32 1.19 Male 009 1
002 11.15 1.23 Female 009 1
003 11.91 1.18 Female 004 1
...
To read columns 2 and 3 as variates named AGE and INC and column 4 as a
factor SEX on 2 levels with Male and Female represented by levels 1 and
2, write
OPEN A:\PROJECT.DAT
VAR AGE INC 100
FAC SEX 100 2
READ * AGE INC SEX=,Male,Female /
Notice the use of a list of level names. This is required when at least
one factor level is coded as something else than its integer numerical
level. Level names are case sensitive, i.e. 'male' instead of 'Male' would
not work. In the example, the comma right after the equality sign means
that level 0 is not given any name because it does not occur. If unknown
sex occurred and was coded as *, we could write
READ * AGE INC SEX=*,Male,Female /
instead, to assign level 0 of SEX to the unknowns.
The symbol '/' meaning "skip remainder of line", is only required if
there is actually something to skip. A single slash '/' right at the
point where a line is read has no effect; whereas two slashes with a
blank between will imply that every second line is skipped.
For variates, standard format of numbers is assumed (exponential
notation is allowed, like -1.2e3 instead of -1200). Decimal points
must be periods, not commas. An asterisk '*' or a period '.' will be
interpreted as a missing value.
Restrictions are obeyed by the READ command, in the sense that only the
units present are read. This is useful if you have to piece together
variates or factors of segments from different files. But in the
standard situation, it means that you must remember to remove all
restrictions by INCLUDEALL before you READ.
An error (an invalid real number, a named level not in the list of level
names or a numerical level out of range) results in an error message
written to the output file, and the corresponding value/level is set to
missing value/level zero. But the reading is not interrupted.
If you want to correct errors that have to do with the data file, you
must edit that file. Otherwise, if the error has to do with the READ
command, you can correct it immediately and repeat it. But before this,
the file must be OPENed again, otherwise reading continues from the
position where it stopped. The same happens if reading finishes before
the end of a file. In this case, a new READ command will continue from
where the last ended.
EXAMPLE. Suppose that a data file EXAMPLE.DAT contains the values of
two variates of lengths 10 and 4 in the following obscure layout:
The first six values of X are
1.2, 1.3, 2.l, 4.3,
5.0, 1.2;
Here are the four values of Y:
34.1 32.2 45.0 23.6;
Finally, the last four values of X are
5.4 1.7 1.9 3.3;
And here comes some junk: 1234567890
You could read X and Y as follows:
VAR X 10
VAR Y 4
OPEN EXAMPLE.DAT
EXCLUDE 7:10
SKIPITEM 7 { or SKIPLINE }
READ X
INCLUDEALL
SKIPITEM 7 { or SKIPLINE }
READ Y
EXCLUDE 1:6
SKIPITEM 8 { or SKIPLINE }
READ X
INCLUDEALL
Here, you would receive a warning concerning the third value of X,
READ ERROR: Invalid real number 2.l for variate X at unit 3
where a lower case L has been typed instead of 1, and the
corresponding entry of X will contain a missing value. However,
SHOW EXAMPLE.DAT
would tell you what went wrong, and you could then correct the error by
COMPUTE X(3)=2.1
A more permanent solution would be to edit EXAMPLE.DAT to correct the
error, and then do the READing once more. However, since EXAMPLE.DAT
is still open as an input file, you would not be allowed to save the
changes. To do this you would have to close it first, which can be
done by an OPEN command without parameters. For example,
OPEN
EDIT EXAMPLE.DAT {make the correction and save}
OPEN EXAMPLE.DAT
...
would work.
As this example also illustrates, a data file need not necessarily end
exactly where the reading terminates. In the tail of the file you can
keep e.g. a description of data.
An attempt to read through the end of an input file is, of course, an
error. The reading is interrupted, an error message is given, and all
values/levels not read yet are left unchanged (except for the
value/level that was read when the EOF mark was met; this may or may not
be set to missing value/level zero, depending on some circumstances
around the termination of the file).
If a data file uses other separators than blank, newline, comma and
semicolon, you can specify this by adding a bracket containing these
additional separator characters as the first argument of the READ
command. For example, to read from a file where the characters / , [
and ] should be interpreted as blanks, use
READ [/[]] ...
Notice that such additional separating characters are only in force
within the READ command where they are specified, e.g. not in a
following (or preceding) SKIPITEM command.
Notice also that separating characters must not occur in level names
for factors.
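The separator rules can be tried out with the following Python sketch.
It is an illustration of the rules as stated in this section, not
ISUW's own reader. The names split_items and parse_variate_item are
hypothetical, and Python's float() accepts a few spellings (such as
"inf" or "1_000") that ISUW may well reject.

```python
# Characters 0-32 plus comma and semicolon, as described for READ.
DEFAULT_SEPARATORS = {chr(c) for c in range(33)} | {",", ";"}

def split_items(text, extra=""):
    """Split text into items at the default separators plus any extra ones."""
    separators = DEFAULT_SEPARATORS | set(extra)
    items, current = [], []
    for ch in text:
        if ch in separators:
            if current:
                items.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        items.append("".join(current))
    return items

def parse_variate_item(item):
    """Return (value, error). '*' and '.' mean missing; bad numbers are reported."""
    if item in ("*", "."):
        return None, None
    try:
        return float(item), None
    except ValueError:
        # As in the manual: report the error and store a missing value.
        return None, "Invalid real number " + item

print(len(split_items("The first six values of X are")))
print(split_items("1.2, 1.3, 2.l, 4.3,\n5.0, 1.2;"))
print(parse_variate_item("2.l"))
print(split_items("[1.5/2.5]", extra="/[]"))
```

The first line counts the seven items of the heading in EXAMPLE.DAT,
which is why SKIPITEM 7 is used there. The last line shows the effect
of the additional separators given by READ [/[]].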
Fixed format files (with data in fixed positions, no delimiters) must
be edited or handled by other tools.
--------------------------------------------------------------------------------
SKIPITEM
Skips next item on a data file.
................................................................................
Syntax: SKIPITEM [integer]
The next "integer" items on the file opened by OPEN (see above) are
skipped. If the integer parameter is missing, 1 is assumed. This
command is useful if a data file contains headings or other comments.
EXAMPLE. Suppose that A:\PROJECT.DAT contains 100 lines plus a "header
line", beginning with
NO AGE INC SEX
001 12.32 1.19 Male
002 11.15 1.23 Female
003 11.91 1.18 Female
...
To read columns 2 and 3 as variates named AGE and INC, and column 4 as a
factor SEX on 2 levels with Male and Female coded as 1 and 2, write
VAR AGE INC 100
FAC SEX 100 2
OPEN A:\PROJECT.DAT
SKIPITEM 4 { or SKIPLINE }
READ * AGE INC SEX=,Male,Female
--------------------------------------------------------------------------------
SKIPLINE
Skips to beginning of next line on a data file.
................................................................................
Syntax: SKIPLINE [integer]
Without the parameter, SKIPLINE reads through the present line, to the
beginning of the next. If a line has just been read through the next
will be skipped, otherwise the remainder of the present line is
skipped. If an integer parameter is given, this operation is simply
performed "integer" times, i.e. the remainder of the present line and
the next "integer"-1 whole lines are skipped.
Whereas SKIPITEM interprets newline symbols as delimiters and thus
skips as many empty lines as necessary to reach the items to be
skipped, SKIPLINE counts also empty lines. For example, if a data file
begins with 4 empty lines followed by a line consisting of two
variable names, these 5 lines can be skipped either by
SKIPITEM 2
or
SKIPLINE 5
--------------------------------------------------------------------------------
LIST
or
L
Parallel print of data.
................................................................................
Syntax: LIST vector1 [vector2 [...]]
EXAMPLE. If AGE is a variate and SEX a factor on 2 levels, the statement
LIST AGE SEX
will produce output like
AGE SEX
24.7100 2
42.1200 1
32.4300 1
...
22.1300 2
The default format for variates is :10:4, which means width 10 with 4
decimals after the decimal point. For factors it is :4, i.e. width 4 or
the length of the factor's name if this is more than 4. You can change
this by addition of a format to the vector name. For example,
LIST AGE:4:1 SEX:3
would result in something like
AGE SEX
24.7 2
42.1 1
32.4 1
...
22.1 2
In addition to this, you can add a list of level names to a factor, like
LIST AGE:4:1 SEX=,M,F:3
which would result in a listing like
AGE SEX
24.7 F
42.1 M
32.4 M
...
22.1 F
Notice that the list of levels comes before the format, if both are
present. Notice also that the equality sign, which indicates that a list
of level names will follow, is followed immediately by a comma. This is
because the name for level 0 is here set to an empty string. If "missing
sex", or rather "sex unknown", does actually occur, one would perhaps
prefer something like
LIST AGE:4:1 SEX=Unknown,Male,Female:7
which might produce a listing like
AGE SEX
24.7 Female
42.1 Male
32.4 Male
...
20.2 Unknown
...
22.1 Female
Notice the format :7, which is necessary here because the longest level
name is of length 7 > 4. Otherwise, level names would be truncated.
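For readers who know printf-style formatting, the formats :10:4 and
:4:1 correspond to a field width and a number of decimals. The Python
sketch below reproduces the figures from the examples above. It assumes
right-justified fields and simple cutting of level names that are too
long. The function names are illustrative, and the exact alignment used
by ISUW is not specified here.

```python
def format_variate(value, width=10, decimals=4):
    """Format a variate value like :width:decimals (right-justified)."""
    return "%*.*f" % (width, decimals, value)

def format_level(name, width=4):
    """Format a level name in a fixed width; longer names are cut off."""
    return name[:width].rjust(width)

print(repr(format_variate(24.71)))        # default :10:4
print(repr(format_variate(24.71, 4, 1)))  # :4:1
print(repr(format_level("Unknown")))      # default :4 cuts the name
print(repr(format_level("Unknown", 7)))   # :7 keeps it whole
```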
If LIST is used without parameters, all vectors present are listed with
default formats.
Restrictions are obeyed, in the sense that the lines corresponding to
hidden units are not printed.
--------------------------------------------------------------------------------
LIST1
or
L1
Condensed print of data (across the page)
................................................................................
Syntax: LIST1 vector1 [vector2 [...]]
EXAMPLE. If AGE is a variate of length 4, SEX a factor on 2 levels also
of length 4, the statement
LIST1 AGE SEX
will produce output like
AGE
24.7100 42.1200 32.4300 22.1300
SEX
2 1 1 2
Formats can be used, just as for LIST (see above), and the standard
formats are the same. Level names for factors cannot be used.
Restrictions are obeyed, in the sense that the values/levels
corresponding to hidden units are not printed.
--------------------------------------------------------------------------------
SAVEDATA
or
SAVE
Creates a StatUnit data set.
................................................................................
Syntax: SAVEDATA dataset [vector1 [vector2 [...]]]
StatUnit data sets are files written in an internal binary format for
fast storage and recovery of data.
If only the data set name is specified, all vectors present are stored
in the data set. Physically, the data set becomes a file with the name
specified followed by the extension .SUD (for "StatUnit Data"). For
example,
SAVE C:\PROJECTS\A_SCHEME\DATA1
will create a file DATA1.SUD on the directory C:\PROJECTS\A_SCHEME. It
is an error if this directory does not exist. If such a file exists
already, you will be asked to confirm that you want to overwrite it.
However, this is only in interactive mode; if the SAVEDATA command
occurs in a program, the file is overwritten without warning.
If a list of vector names is added only these vectors are stored. The
names must be names of existing vectors in the present session,
otherwise the file is not created.
Restrictions are taken into account in the sense that only units present
are stored. This means that you can use SAVEDATA to create "physically
restricted" sub data sets, i.e. data sets where the excluded units are
not only marked as non-present, but are actually not there at all.
EXAMPLE. Suppose we have variates AGE and HEIGHT and a factor SEX on
two levels, all of the same length. To create a data set MALES that
contains only the part of data with SEX=1, and import this to our
session, we could do something like the following (assuming no
restrictions present from the beginning).
FOCUSONLEVEL SEX 1
SAVE MALES AGE HEIGHT
DELETE
GET MALES
Notice that the DELETE command is without parameters here. This form
of the DELETE command removes all restrictions also, and this is
important because the restrictions imposed on the "long data set" will
almost certainly be meaningless for the "short data set". If DELETE
can not be used in this way because other vectors are to be kept,
INCLUDEALL must be used.
The DOS version's GETDATA command can import data sets created by ISUW,
provided that ISU's length constraint (MaxLength = 16379) is satisfied.
--------------------------------------------------------------------------------
GETDATA
Imports data from StatUnit data set.
................................................................................
Syntax: GETDATA [dataset [vector1 [vector2 [...]]]]
If only the data set name is specified, all vectors in the data set are
read. The name, say A:\PROJECT\DATA1, is the name of the corresponding
file A:\PROJECT\DATA1.SUD, created by SAVEDATA (written without the
extension .SUD).
If the file name "dataset" is omitted or replaced with an asterisk or a
directory name, a file selection menu appears.
If a list of vector names is added, only these vectors are read. These
names should, of course, be names of vectors in the data set. However,
if other names of non-existing vectors are included by mistake, the
remaining vectors will still be imported.
Names of vectors imported must not coincide with names of existing
vectors. SHOW dataset.SUD will tell you if this is the case. Use
RENAME as necessary. The reading is stopped if an error of this type
occurs. This means that part of the command may be executed. However,
the interrupt point depends not on the order of vectors in the list,
if present, but rather on the order in which vectors were stored
originally by SAVEDATA. Try a RAMSTATUS if something goes wrong.
Restrictions are not taken into account and not changed by this command.
Data sets created by the DOS version ISU can be imported by GETDATA,
provided that the special characters of the Danish-Norwegian alphabet do
not occur in vector names.
--------------------------------------------------------------------------------
SUMMARY
Summary statistics for variates and factors.
................................................................................
Syntax: SUMMARY [vector1 [vector2 [...]]]
The parameters must be names of factors or variates. Information about
the vectors is written. For a factor, the information includes length,
number of levels, number of units present and the number of units on level 0.
For a variate, the information includes length, units present, the
number of missing values, MAX, MIN, MEAN and standard deviation (for
present and non-missing values).
If the command is used without parameters, summaries of all vectors
present are given.
--------------------------------------------------------------------------------
ONEWAYTABLE
One-way tables of counts.
................................................................................
Syntax: ONEWAYTABLE vector1 [vector2 [...]]
The parameters may be names of variates or factors. If a parameter is
the name of a factor, a one-way table of counts is produced, which for
each level gives the number of units present. You can extend the name of
the factor by a list of names separated by commas, as for the READ
command. For example,
ONEWAYTABLE SEX=Unknown,Male,Female
could produce something like
Factor SEX, 100 units present.
Unknown 1
Male 43
Female 56
If the parameter is the name of a variate, a table of counts is produced
with cutpoints chosen by ISUW. But you can extend the name to specify
lower limit, number of intervals and upper limit. For example
ONEWAYTABLE X=2,10,7
would produce a table of counts of X-values in the ten intervals between
cutpoints 2.0, 2.5, 3.0, ... , 6.5, 7.0.
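The cutpoints follow from dividing the interval between the lower and
upper limits into equal parts. A minimal Python sketch of that
arithmetic is given below. The function name cutpoints is illustrative,
and the sketch says nothing about which interval a value lying exactly
on a cutpoint is counted in.

```python
def cutpoints(lower, intervals, upper):
    """Return the intervals+1 equally spaced cutpoints from lower to upper."""
    step = (upper - lower) / intervals
    return [lower + i * step for i in range(intervals + 1)]

# ONEWAYTABLE X=2,10,7 : lower limit 2, ten intervals, upper limit 7
print(cutpoints(2, 10, 7))
```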
ONEWAYTABLE obeys restrictions in the sense that non-present units are
ignored. If the parameter is a variate, missing values are treated as
non-present.
The commands TWOWAYTABLE and THREEWAYTABLE (see below) have additional
options which enable the formation of tables of variate sums for a
given variate instead of tables of counts. This is not implemented for
ONEWAYTABLE, because it is usually just as easy to use TABULATE and
LIST (see 3 screenfuls below). For example, to produce a one-way table
of sums of a given variate Y over the groups determined by a factor F
of the same length, use
TABULATE Y1=Y F F1
LIST F1 Y1
--------------------------------------------------------------------------------
TWOWAYTABLE
Two-way tables of counts or variate sums.
................................................................................
Syntax: TWOWAYTABLE [variate] factor1 factor2
The two last parameters must be names of factors of the same length,
optionally extended by lists of level names (as for ONEWAYTABLE above).
If the first parameter "variate" is not given (or written as the pseudo
variate name 1) the command writes a table of counts of units present in
the two-way classification (factor 1 as rows, factor 2 as columns).
If the parameter "variate" is present, it must be the name of a
variate of the same length, and a table of sums of its values over
level combinations of the two factors is produced. The variate name
can be extended by a format, like in a LIST command. However, this is
mainly to let you decide the accuracy displayed when sums of variate
values are tabulated. For tables of counts, the figures are displayed
as integers, and if the width given by the format is too small, it
will be increased as necessary.
The default action of TWOWAYTABLE is to produce tables with row sums
and column sums. To avoid one of these (or both), add a minus sign as
the first character to the name of the factor(s) for which the
additional "total" level should not be displayed.
EXAMPLE. Suppose we have factors AGEGR and ATTITUDE, classifying some
survey sample data according to age and answer to an attitude related
question. The table of counts in this cross classification is produced
by
TWOWAYTABLE agegr attitude
A table showing the distribution of ATTITUDEs within AGEGRoups in
percentages with one digit after the decimal point can be produced by
TABULATE rowsum agegr {see two screenfuls below}
COMPUTE pct=100/rowsum(agegr)
TWOWAYTABLE pct::1 -agegr attitude
Notice the minus sign before AGEGR in the last command. The row sums
in this table are 100, and they are displayed to make the
interpretation of the percentages clear. But the column sums (and the
total sum) are irrelevant and therefore suppressed.
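Why the three commands give row percentages can be seen from a small
computation. The Python sketch below is not ISUW code, and the data
are hypothetical. Each unit receives the value 100 divided by the
number of units in its age group, so that summing these values within
a cell gives the percentage of that cell within its row.

```python
from collections import Counter

agegr = [1, 1, 1, 2, 2, 2, 2, 1]      # hypothetical row factor levels
attitude = [1, 2, 2, 1, 1, 2, 3, 3]   # hypothetical column factor levels

rowsum = Counter(agegr)                   # role of: TABULATE rowsum agegr
pct = [100 / rowsum[a] for a in agegr]    # role of: pct=100/rowsum(agegr)

table = Counter()                         # role of: TWOWAYTABLE pct agegr attitude
for a, t, p in zip(agegr, attitude, pct):
    table[(a, t)] += p

row_totals = Counter()
for (a, t), value in table.items():
    row_totals[a] += value

print(dict(table))
print(dict(row_totals))
```

Every row total comes out as 100, which is why the row sums are worth
displaying while the column sums are not.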
Restrictions are taken into account. Units with one or both factor
levels equal to zero, or with a missing value of the variate (if
specified), are handled as non-present. Thus, if you want tables where
factor level zero is taken into account, you must recode the factors
first.
WARNING. Most output producing commands in ISUW break up lines in such
a way that the width of the output file does not exceed 80 characters.
TWOWAYTABLE (and THREEWAYTABLE below) are exceptions to this. If the
second factor has many levels, a very wide table is produced, and this
may result in lines of length > 80 (up to 1024, in fact). This gives
you the option of producing such a table and printing it out later in
a readable format after some editing, such as changing to a smaller
font.
--------------------------------------------------------------------------------
THREEWAYTABLE
Three-way tables of counts or variate sums.
................................................................................
Syntax: THREEWAYTABLE [variate] factor1 factor2 factor3
Exactly as TWOWAYTABLE, except that three factors are specified and a
three-way table is written. For each level of factor1, a factor2 by
factor3 table is produced, and unless factor1 is preceded by a minus
sign, an additional factor2 by factor3 table of totals (summed over
factor1) is written.
Notice that level 0 of factor1 is excluded also in the "total" two-way
table by factor2 and factor3, which comes last in the listing. Hence,
this final table will not always coincide with the table one would get
by
TWOWAYTABLE [variate] factor2 factor3
--------------------------------------------------------------------------------
TABULATE
Computation and storage of counts or variate sums in a k-way table.
................................................................................
Syntax: TABULATE newvar[=oldvar] oldfactors [newfactors]
EXAMPLE. If ROW and COL are existing factors of the same (arbitrary)
length on 3 and 4 levels respectively, then
TABULATE COUNT ROW*COL ROW1*COL1
will do as follows. A variate COUNT and two factors ROW1 and COL1 of
length 12 (=3*4) will be created, ROW1 on 3 and COL1 on 4 levels. If
some of these exist already, they must be of correct types and
dimensions. Factor levels will be generated such that each level
combination occurs exactly once, COL1 varying fastest, and the
corresponding counts of units in the original setting, corrected for
restrictions (units with a level 0 counting as excluded) will be stored
in COUNT.
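The layout of the vectors created in this example can be imitated in
a few lines of Python. This is not ISUW code, only a sketch of the
rule stated above: one element per level combination, the last factor
varying fastest, and units with a level 0 left out of the counts. The
function name and the data are hypothetical.

```python
from itertools import product


def tabulate_counts(row, col, n_row, n_col):
    """Count units per (row, col) level pair; units with a level 0 are skipped."""
    # product() lets the last factor vary fastest, as COL1 does above
    cells = list(product(range(1, n_row + 1), range(1, n_col + 1)))
    counts = dict.fromkeys(cells, 0)
    for r, c in zip(row, col):
        if r != 0 and c != 0:
            counts[(r, c)] += 1
    row1 = [r for r, _ in cells]
    col1 = [c for _, c in cells]
    count = [counts[cell] for cell in cells]
    return count, row1, col1


row = [1, 1, 2, 3, 3, 0, 2]   # hypothetical factor on 3 levels
col = [1, 4, 2, 4, 4, 1, 2]   # hypothetical factor on 4 levels
count, row1, col1 = tabulate_counts(row, col, 3, 4)
print(row1)
print(col1)
print(count)
```

The three resulting lists have length 12 (=3*4), and the unit with
row level 0 does not contribute to any count.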
Generally, the two string parameters oldfactors and newfactors must
contain the same number (not necessarily two) of factor names,
separated by asterisks * or "pseudoblanks" | . The factors in
oldfactors must be previously declared and of equal lengths, the names
in newfactors and the new variate name(s) occurring in the first
parameter must not be names of existing vectors, unless they just
happen to be of correct types, lengths and numbers of levels (e.g.
created by a similar TABULATE command). The common length of the
vectors created becomes the product of the numbers of levels for the
"old factors".
TABULATE can also be used to form sums of values of a given variate.
If COUNT above is replaced with Y_SUM=Y, for Y a variate of length
equal to the lengths of the factors ROW and COL, then the values of
Y_SUM will become the sums over the corresponding (product) factor
levels of the values of Y. If Y was a variate filled with 1s, the
result would be the same as above.
The first parameter may contain several specifications separated by
pseudoblanks. For example,
TABULATE COUNT|Y_SUM=Y ROW*COL ROW1*COL1
would perform both tasks mentioned above, and is thus equivalent to the
two commands
TABULATE COUNT ROW*COL ROW1*COL1
TABULATE Y_SUM=Y ROW*COL ROW1*COL1
In the last command here, we could actually have written
TABULATE Y_SUM=Y ROW*COL
omitting the last argument ROW1*COL1. This is legal, and in general it
implies that only the variate is formed. In the present case it makes
no difference since the factors ROW1 and COL1 are generated by the
first command.
Restrictions are taken into account in the sense that non-present
units are not counted, or the corresponding variate values are treated
as zeroes in the summation. Missing values of a summand are treated as
zeroes.
WARNING. Notice that if restrictions are present, it will usually be
unavoidable to continue with INCLUDEALL. The restrictions on the
"long" vectors are not likely to be relevant for the resulting "short"
vectors.
EXAMPLE. To produce a list of average INCOMEs of persons in the 30
groups of a 2 x 5 x 3 classification by factors SEX (2 levels), SITE (5
levels) and SOC (3 levels), do something like this:
EXCLUDEMISSING INCOME
TABULATE COUNT|INCOME0=INCOME SEX*SITE*SOC SEX0*SITE0*SOC0
INCLUDEALL
INCOME0=INCOME0/COUNT
LIST SEX0 SITE0 SOC0 INCOME0::0
Notice the EXCLUDEMISSING and INCLUDEALL commands, which are required if
INCOME has missing values. Without this, the COUNTs would include units
for which INCOME was missing, and this would result in incorrect
averages (since missing INCOMEs are treated as zeroes).
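The effect of leaving out EXCLUDEMISSING can be illustrated with a
small Python sketch (not ISUW code, hypothetical data). Missing values
are marked by None; they are summed as zeroes, and they are counted
unless the unit is excluded first.

```python
income = [100.0, None, 300.0, 200.0, None]   # None marks a missing value
group = [1, 1, 1, 2, 2]


def group_means(income, group, exclude_missing):
    """Group averages computed as sum/count, as in the example above."""
    sums, counts = {}, {}
    for y, g in zip(income, group):
        if y is None and exclude_missing:
            continue                          # unit treated as non-present
        sums[g] = sums.get(g, 0.0) + (0.0 if y is None else y)
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


print(group_means(income, group, exclude_missing=True))
print(group_means(income, group, exclude_missing=False))
```

With the missing units excluded both group averages are 200. Without
the exclusion, the counts are too large and both averages are too
small.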
EXAMPLE. If Y is a variate, R and C two (row and column) factors that
arrange the values of Y in a balanced two-way table, fitted values in
the additive two-way model (which would usually be computed by
SAVEFITTED FIT after FITLIN Y=1+R+C) can be computed by
TABULATE rowsums=y|rowcoun R
TABULATE colsums=y|colcoun C
rowmeans=rowsums/rowcoun
colmeans=colsums/colcoun
fit=rowmeans(r)+colmeans(c)-mean(y)
(to understand the last line, see the section VECTORS AS FUNCTIONS OF
UNIT INDEX in the description of the COMPUTE command).
--------------------------------------------------------------------------------
PLOT
Scatter plots on screen and paper.
................................................................................
Syntax for 2-dimensional version:
PLOT xvariates yvariates [colors [symbols]]
and for the 3-d version:
PLOT xvariates yvariates zvariates [colors]
We begin with a description of the 2-dimensional version. A description
of the modifications required for the 3-d version follows approximately
11 screenfuls below.
In the simplest case, xvariates and yvariates are just names of single
variates and the two other parameters are not written, like
PLOT X Y
which will produce a scatter plot of the points ( X(i) , Y(i) ).
Restrictions are obeyed in the sense that non-present points are not
plotted, and also points with one or both coordinates missing are
skipped. Endpoints of axis intervals are chosen in such a way that the
coordinate frame becomes the smallest rectangle containing all the
points to be plotted.
The variate names can be extended by axis specifications of the form
=LowerLimit,NumberOfLabels,UpperLimit. For example,
PLOT X=-1.0,10,9 Y
implies that the horizontal axis will go from -1 to 9, labels will be
displayed at integer multiples of 1=(9-(-1))/10, and one digit after the
decimal point will be written (since the lower bound for X is given with
one decimal). A minus sign before the number of labels (e.g. -10 instead
of 10 in the example above) will produce lattice lines at the label
points. Any field can be left empty, and the first field may contain a
"pseudo number" determining only the number of digits. For example
PLOT X=-1.0,10,9.0 Y=d.dd,10,
will imply that the vertical axis is equipped with 10(+1) labels with 2
digits after the decimal point, but the defaults MIN(Y) and MAX(Y) will
be used as the limits, since these are not given.
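The rule for label positions and number of decimals can be sketched in
Python as follows. This is not ISUW code, and the function names are
hypothetical. The sketch assumes that the number of decimals is read
off the text of the first field, and it handles only the case where
both limits are given.

```python
def decimals_of(field):
    """Digits after the decimal point in a field such as "-1.0" or "d.dd"."""
    return len(field.split(".")[1]) if "." in field else 0


def axis_labels(lower_field, n_labels, upper):
    """Label texts for an axis specification =lower,n,upper."""
    lower = float(lower_field)
    n = abs(n_labels)                 # a minus sign only requests lattice lines
    step = (upper - lower) / n
    d = decimals_of(lower_field)
    return [f"{lower + k * step:.{d}f}" for k in range(n + 1)]


print(axis_labels("-1.0", 10, 9))
print(decimals_of("d.dd"))
```

For X=-1.0,10,9 this gives the 10(+1) labels -1.0, 0.0, ..., 9.0, and
the pseudo number d.dd gives two decimals.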
Points that are not within the specified limits (in one or both
directions) are not plotted.
A heading can be given in a separate FRAMETEXT command before the plot
command. Similarly, XTEXT and YTEXT can be used to specify texts to be
written at the two axes, otherwise the variate names are used.
EXAMPLE.
VAR X Y 100
X=RANDOM
Y=RANDOM
FRAMETEXT 100 random points
PLOT X=0.00,10,1.00 Y=0.00,10,1.00 =red =*
These five commands would produce something like this (except that the
points will be red):
100 random points
1.00 |-----------*-------------------------*-------------------------------|
Y | * * * |
0.90 | * * ** * * |
* * * |
0.80 | * ** * * * |
| * * * * * * |
0.70 |* * * * * |
| * * * |
0.60 | * * * * ** * * |
| * * * |
0.50 | * * * * * * |
|* ** * * * |
0.40 | * * * * |
| * * * * * * * |
0.30 | ** * * * * |
| * * * * |
0.20 | * * * * |
| * * * * * |
0.10 | * * * * * |
| * * * * * |
0.00 |*-------*------------------------------------------------------------|
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.0
X
In the above example, the last two parameters imply that the points will
be plotted as red stars. More generally, the third and fourth parameters
specify colors and symbols according to the following rules.
The third parameter colors, if non-empty, can be specified as an
equality sign followed by a color number or a color name. Color
numbers and their names can be found in the table below.
This is only needed if you want to plot all points in a color different
from the default 0 (=black). A more relevant application of the color
specification is to choose color according to the level of a factor.
The color parameter can be specified as the name of a factor of length
equal to the common length of the two variates, followed by a comma
separated list of color codes. For example, if WEIGHT and HEIGHT are
variates, SEX a factor on two levels, all of the same length, then
PLOT HEIGHT WEIGHT SEX=,9,12
will produce a scatter plot with light blue points for SEX=1 and light
red points for SEX=2. You can also let ISUW assign the colors in
standard order, by
PLOT HEIGHT WEIGHT SEX
which is equivalent to
PLOT HEIGHT WEIGHT SEX=15,1,2
In general, if only the factor is specified, but not the colors, the list
'15,1,2,3,4,5,6,7,8,9,10,11,12,13,14,0,0,0,0...' is implied.
By changing 15 to something else you could select a color different from
15 (white, no symbol plotted) for the factor level 0.
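The implied color list can be written as a small lookup. The Python
sketch below is not ISUW code; the function name default_color is
hypothetical, and the rule is taken directly from the list quoted
above.

```python
def default_color(level):
    """Color code implied for a factor level when no color list is given."""
    implied = [15] + list(range(1, 15))   # 15,1,2,...,14 for levels 0,...,14
    return implied[level] if level < len(implied) else 0   # then 0,0,0,...


print([default_color(level) for level in range(18)])
```

Level 0 gets color 15 (nothing plotted), levels 1 to 14 get the color
with the same number, and all higher levels are plotted in black.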
TABLE OF COLORS.
No. Name
0 BLACK
1 BLUE
2 GREEN
3 CYAN
4 RED
5 MAGENTA
6 BROWN
7 GRAY or LIGHTGRAY
8 DARKGRAY
9 LIGHTBLUE
10 LIGHTGREEN
11 LIGHTCYAN
12 LIGHTRED
13 LIGHTMAGENTA
14 YELLOW
15 NONE or WHITE
Names of colors can be used instead of the numbers. Only the first eight
characters of a color name need to be specified.
To see the 16 colors of ISUW, copy and paste the following program into
the editor and RUN it:
fac $colfac 16 15
$colfac=#-1
var $x $y 16
$x=$colfac
$y=1
xtext Colors
ytext |
frametext Colors in ISUW
plot $x $y=0 $colfac=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 #
The fourth and last parameter symbols has a similar role as the color
parameter, but it determines the plot symbol instead of the color. For
example (referring to the WEIGHT HEIGHT SEX example above),
PLOT HEIGHT WEIGHT | SEX=,+,o
would plot points corresponding to SEX=1 as plusses, and points with
SEX=2 as small circles. If the list of symbols is omitted,
'=0,1,2,3,4,5,6,7,8,2,2,2...' is implied.
Notice the "pseudo blank" | occurring here as parameter 3. It simply
means that we want the default color (black).
Symbols can be identified by numbers or names, according to the
following table:
Table of plot symbols:
No. Name
0 NONE
1 X or CROSS
2 + or PLUS
3 O or CIRCLE
4 * or STAR
5 DELTA or TRIANGLE
6 NAPLA {upside down triangle}
7 SQUARE
8 DIAMOND {45 degrees rotated square}
To see the 8 symbols and 16 colors, copy and paste the following program
into the editor and RUN it:
fac $colfac 16*9 16
gen $colfac 1
fac $symfac 16*9 9
gen $symfac 16
var $col $sym 16*9
$col=$colfac-1
$sym=$symfac-1
xtext Colors
ytext Symbols
frametext Colors and plot symbols in ISUW
plot $col=-1,17,16 \
$sym=-1,10,9 \
$colfac=,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
$symfac=,1,2,3,4,5,6,7,8
A special use of the symbol parameter takes the form
=L
which has the effect that points on the same level of the factor are
connected by lines, provided that they come right after each other in
the ordering by unit number. Thus, the effect of this depends strongly
on the order of units. A sorting by the factor with the X-variate as the
secondary criterion is the most common application, drawing factor
groups as broken lines (Y-variate as a function of X-variate).
EXAMPLE. Suppose that TIME and TEMP are variates, LOCALITY a factor on 4
levels, all of the same length. The commands
SORT LOCALITY TIME TEMP
PLOT TIME TEMP LOCALITY=,12,13,14,0 LOCALITY=L
will produce a plot where, for each LOCALITY, TEMP is drawn as a
(linearly interpolated) function of TIME, and the color (light red,
light magenta, yellow, black) follows the LOCALITY.
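The dependence of the =L option on the order of units can be made
concrete by a Python sketch (not ISUW code, hypothetical function name
and data). It lists the line segments that join units following right
after each other on the same factor level.

```python
def line_segments(x, y, level):
    """Segments joining consecutive units that share the same factor level."""
    segments = []
    for i in range(len(x) - 1):
        if level[i] == level[i + 1]:
            segments.append(((x[i], y[i]), (x[i + 1], y[i + 1])))
    return segments


time = [1, 2, 1, 2]
temp = [10, 11, 20, 21]

# Units sorted by locality, then time: one broken line per locality
print(line_segments(time, temp, [1, 1, 2, 2]))

# Same values with the localities alternating: nothing is connected
print(line_segments(time, temp, [1, 2, 1, 2]))
```

This is why the SORT command comes before the PLOT command in the
example above.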
As for the color parameter, the factor identifier in "symbols" can be
omitted, meaning that a factor with constant level zero is assumed. For
example, to connect all points by a broken line use
PLOT X Y | =L
Another special use of the symbol parameter, which involves no factor
name and no equality sign, takes the form
+v1-v2 (or, equivalently, -v2+v1 )
where v1 and v2 are names of variates of the same length as xvariates
and yvariates. This is used when vertical lines through the points
should be drawn to indicate e.g. confidence bounds. The typical
application (for symmetric confidence intervals) is
PLOT X Y ... +SD-SD
where SD is a variate holding standard deviations or double standard
deviations.
In the typical application, both variates will have non-negative values.
Hence, it is a natural requirement that the two signs must be different
when two variates are specified. If only one is specified, a half line
from each point is drawn, the sign determining which way the line goes.
Notice that
PLOT X Y=0 =14 -Y
will produce a plot with (yellow) points sitting on top of "sticks"
(only relevant if Y is nonnegative).
A third option for the last parameter "symbols" is to let it consist
of the single character #. In this case each point is represented by a
box standing on the x-axis, of suitable width with the invisible point
right in the middle of its top. The width of these boxes becomes 0.8
times the range for the x-variate, divided by the number of units
present. This is usually only relevant if the values of the x-variate
are equidistant. Notice that the y-axis bounds are determined as usual
if nothing else is specified. Usually, a specification of the form
y=0,... is required if the lower endpoint of the y-axis should be 0.
This can be used to draw histograms when counts (or percentages) are
given as the values of a variate (e.g. produced by a TABULATE command).
Notice that automatic definition of x-axis bounds as Max and Min in this
case would imply that the first and last box hang halfway outside the
frame. For this reason, a small correction is made in this case. But for
overlaid plots, this will only work if the histogram is the first plot
in the parallel list (see below).
If colors are specified, they will be used as fill colors of the boxes. If
you produce 'stacked' histograms by overlaying such plots (see below) take
care that the histograms with the lower boxes come after those with the
higher boxes. Similarly, if you plot a histogram together with a curve -
e.g. to show the fit of a distribution - plot the curve after the
histogram, unless you want to hide it partially behind the boxes.
OVERLAID PLOTS.
Overlaid plots are produced by 'merging' of PLOT commands as follows.
Let the four parameters xvariates, yvariates, colors and symbols each
contain two, three or more parallel specifications, separated by
pseudo blanks. For example,
PLOT X=0.0,5,10|X Y|FITTED GROUP=,14,13,12|=WHITE =|GROUP=L
will (roughly) overlay the results of the two commands
PLOT X=0.0,5,10 Y GROUP=,14,13,12 =
PLOT X FITTED =WHITE GROUP=L
Notice the use of an equality sign alone as indicating an empty
element of a parallel list. The limits and labelling of axes are taken
from the specifications in xvariates and yvariates for the first
element of the parallel lists, later specifications are ignored.
Notice that such specifications are usually necessary, unless the
ranges of variation for the first elements happen to cover the ranges
for later elements. This is why Y comes before FITTED in the above
example (but things may go wrong, also in this case).
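How a merged command decomposes into its parallel elements can be
shown by a Python sketch. This is not ISUW code and not the way ISUW
parses commands; the function name is hypothetical, and the sketch
assumes that all four parameters contain the same number of elements.

```python
def split_overlay(xvariates, yvariates, colors, symbols):
    """Split pseudo-blank separated parallel lists into one tuple per plot."""
    columns = [p.split("|") for p in (xvariates, yvariates, colors, symbols)]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("parallel lists must have the same number of elements")
    return list(zip(*columns))


plots = split_overlay("X=0.0,5,10|X", "Y|FITTED",
                      "GROUP=,14,13,12|=WHITE", "=|GROUP=L")
for plot in plots:
    print("PLOT", *plot)
```

The two lines printed are the two separate PLOT commands given above,
with the single equality sign standing for the empty element.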
HARDCOPIES.
The immediate effect of a PLOT or (see below) HISTOGRAM command is
that a picture is displayed on the screen. The picture is removed by
Escape. There are two ways of getting the plots out on paper. The
simplest is to press Return instead of Escape to remove the picture
from the screen. This brings up a "Save picture as ..." dialog box, in
which you can select a file name and save the picture as a *.JPG file.
A *.BMP (bitmap) file can also be selected, but these files are much
bigger, and since most image processing programs can import both
types, there is generally no reason to do so. If, for some reason, you
need to produce a *.BMP file, you should be aware that this will only
work if you write the file name without the extension or explicitly
with extension .BMP. Both file types can be imported to image
processing programs and most text handling programs running under
Windows, where they can be modified, merged with text and other
pictures and printed out.
To produce graphics of a somewhat higher resolution (but without
colors), optionally with several plots per page, you can use the
commands OPENPS, PSFRAME and CLOSEPS for handling of PostScript files.
For example, to produce a single plot in landscape format on a page,
use
OPENPS filename { opens a file for PostScript output }
PSFRAME 1 1 { selects "first picture out of one" }
PLOT ...
CLOSEPS { closes the file }
After this, the file filename.PS will contain PostScript code that can
be sent directly to a PostScript printer. OPENPS can also produce
encapsulated PostScript files, which can be imported e.g. by Microsoft
Word. See the command descriptions.
When PostScript code is generated, color codes are handled as follows.
When points are plotted, all colors are translated to black, except no.
15 which is translated to white (no point plotted). For boxes (symbol
code #), where the color usually controls the fill color, color codes
are translated to "whiteness" proportionally to their numerical values,
with 0 meaning black and 15 meaning white.
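The translation of color codes can be summarized by a Python sketch
(not ISUW code). The function name ps_gray is hypothetical, and the
sketch assumes that "proportionally" means the linear rule code/15.

```python
def ps_gray(color_code, is_box):
    """Gray level (0.0 = black, 1.0 = white) assumed for PostScript output."""
    if is_box:
        return color_code / 15                # "whiteness" proportional to the code
    return 1.0 if color_code == 15 else 0.0   # points: black unless code 15


print(ps_gray(12, is_box=False))   # a light red point is printed black
print(ps_gray(12, is_box=True))    # a light red box becomes a light gray fill
```

Under this reading, a box with color code 12 is filled with a gray
that is 80 percent white.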
THE 3-DIMENSIONAL VERSION OF THE PLOT COMMAND.
This is activated when the third parameter is (or begins with) a variate
name. In the simplest case
PLOT X Y Z
a 3-dimensional scatter plot is produced. As for the 2-d version, the
variates can be extended by information about their domain and the
desired numbers of cutpoints, and the four parameters may be parallel
lists of any (common) length, to produce overlaid 3-d-plots. The fourth
parameter "colors" determines the colors of points. The syntax here is
exactly the same as for the corresponding parameter in the 2-d version.
The differences from the 2-d version are explained here:
The plot symbol is fixed (a three-dimensional cross), and points can not
be connected by lines.
Negative numbers of cutpoints are interpreted as 0, except in the third
parameter zvariates, where the sign has an effect quite different
from the one it has for the 2-d version, see below. The effect of
specifying numbers of cutpoints is that whenever two of the three
variates have a positive number of cutpoints, the corresponding lattice
will be drawn on the corresponding (back/bottom/left) side of the frame
box. For example,
PLOT X=0,10,1 Y=0,10,1 Z
or equivalently
PLOT X=0,10,1 Y=0,10,1 Z=,0
will produce a plot with a 10 by 10 lattice drawn at the bottom of the
frame box. If instead
PLOT X=0,10,1 Y=0,10,1 Z=,1
is used, vertical lines will also be drawn on the back and left sides of
the frame box.
A minus sign before the number of cutpoints for the Z-variate has the
effect that vertical "sticks" are drawn from points to the bottom of the
frame box. Similarly, a plus sign here has the effect that the points
will "hang in strings from the roof". This option may be specified
without an actual number of cutpoints, like
PLOT X Y Z=,+
For hardcopies, this option is recommended, because it is the only way
to give the copies a taste of the 3-d perspective (since it does not
help much to rotate the paper).
Once the picture is on the screen, the following keys are active (this
applies also to the 3-d version of HISTOGRAM):
- The four cursor movement keys rotate the frame box (or rather:
move the flying observer around the coordinate box).
- The keys + and - on the numerical keyboard zoom and unzoom.
- Escape or Return terminates. Use Return if you want to save
the final picture as a .JPG file.
Labels and axis titles can not be created by the 3-d version of the PLOT
command. The initial picture on the screen tells which axis is which,
but after this you are on your own. Here some lattice lines may help
you to avoid losing orientation.
Hardcopies of the final picture via .JPG files or .BMP files can be
produced exactly as described above, but PostScript code can not. A
PostScript output file may be open, but 3-d plots will not write any
code to it.
3-d plots can only be produced in interactive mode, not in programs.
--------------------------------------------------------------------------------
HISTOGRAM
Histograms on screen and paper.
................................................................................
Syntax for 2-dimensional version:
HISTOGRAM [weight*]variate [factor]
and for the 3-dimensional version (3-d histogram for a 2-dimensional
empirical distribution)
HISTOGRAM variate1 variate2
Without the second parameter "factor" and the specification of
"weight", the command produces an ordinary histogram, showing the
empirical distribution of the (present and non-missing) values of the
variate.
Notice that HISTOGRAM computes the counts. If the counts are known in
advance and stored in a variate (for example by a TABULATE command), use
PLOT instead (with fourth parameter '#', see 6 screenfuls above). Or
use the "weighted" version of HISTOGRAM explained 1-2 screenfuls below.
If the second parameter "factor" is included and is the name of a
factor of the same length, one histogram for each factor level except
level 0 is produced, for comparison of the distributions of variate
values on the different factor levels. The factor name can be extended
by a list of level names as usual. However, level 0 is always ignored.
In some cases (long level names, many intervals) the level names are
written in such a way that the first box is more or less destroyed by
the text string.
The variate name can be extended by a specification of the form
=lower,intervals,upper
to determine the number of intervals and the endpoints. For example,
HIST AGE=20,16,100 SEX
will produce a histogram for each SEX, with the AGE interval [20,100]
divided into 5-year groups. AGEs outside the interval [20,100] are
ignored. If no extension of this form is given, min(AGE) and max(AGE)
are taken as the endpoints, and the number of intervals is suitably
chosen. This specification may be more or less incomplete. For example,
HIST AGE=d.dd,16 SEX
would use the default endpoints min(AGE) and max(AGE), but the number
of intervals would become 16. The layout of the "pseudo left endpoint"
d.dd implies that x-axis labels are written with two digits after the
decimal point.
The first parameter may also be the name of a factor. In this case,
the factor is interpreted as a variate with integer values
(0,)1,2,..., and the histogram is drawn in the obvious way, with one
box for each level of the factor. However, if the factor name is
extended by a list of level names, these names will be written under
the frame instead of the integer levels. In this case, a non-empty
level name for level 0 will imply that a box for level 0 is also
drawn. For a factor with many levels, the level names must be short.
If the number of levels exceeds, say, 15, it is usually preferable to
let the procedure write the integer levels.
Weights. In the versions of the HISTOGRAM command explained above, the
heights of the boxes drawn are counts of units, summing up to the
number of units present and non-missing in the variate or factor given
as the first argument. Sometimes, in particular when dealing with
aggregated data obtained by grouping of a variate or factor, it is
desirable to make a histogram where each unit i counts some value w[i]
instead of 1. The syntax for this is to "multiply" the first argument
by this "weight" variate from the left.
EXAMPLE. To display a distribution of individuals according to the
levels of a factor AGEGR for each SEX level we can use
HIST AGEGR SEX
Suppose, however, that we have at our disposal only the counts in
the AGEGR*SEX groups - for example as they would be after
TABULATE COUNT AGEGR*SEX AGEGR0*SEX0
To produce the same two parallel histograms as before, we could then
use
HIST COUNT*AGEGR0 SEX0
To produce a single histogram, showing the distribution of individuals
in age groups with the vertical axis scaled such that the box heights
are percentages (summing up to 100), we could use
PERCENT=100*COUNT/sum(COUNT)
HIST PERCENT*AGEGR0
THE 3-DIMENSIONAL VERSION OF THE HISTOGRAM COMMAND.
If the second parameter is included and is the name of a variate, a
different StatUnit procedure, making a 3-dimensional picture of the
2-dimensional histogram, is activated. The two variate names can be
extended by limits etc. as described above. The picture can be rotated,
zoomed/unzoomed etc. as described for the 3-d version of the PLOT
command, see approximately 4 screenfuls above. Weights can not be
used, and parallel histograms can not be produced.
HARDCOPIES are produced exactly as for PLOTs. But the 3-d version of
the HISTOGRAM command can not be used in programs, and cannot produce
PostScript code.
--------------------------------------------------------------------------------
FRAMETEXT
Text above frame for 2-d versions of PLOT and HISTOGRAM.
................................................................................
Syntax: FRAMETEXT [text]
The default action of PLOT is to draw scatter plots without any heading.
For HISTOGRAM, the default is "Histogram for variate", or (in
case of two parameters) "Histogram for variate by factor", with the
actual names inserted.
These defaults are suppressed as long as a FRAMETEXT command is in
force, which remains until the command
FRAMETEXT
(without parameters) or a DELETE command without parameters is given.
Multiple blanks in a string are usually removed, but you can use "pseudo
blanks" to enforce multiple blanks. In particular,
FRAMETEXT |
will suppress headings entirely in the following PLOT and HISTOGRAM
commands.
--------------------------------------------------------------------------------
XTEXT
Text below frame for PLOT and HISTOGRAM.
................................................................................
Syntax: XTEXT [text]
The default action of PLOT is to write the variate name(s) at the
x-axis. For HISTOGRAM, the default is to write nothing there in case of
a single parameter, and the name of the variate if two parameters are
specified. These defaults are overwritten by an XTEXT command. Other
rules (for pseudo blanks, cancellation of action etc.) are exactly as
for FRAMETEXT above.
--------------------------------------------------------------------------------
YTEXT
Text at vertical axis for PLOT and HISTOGRAM.
................................................................................
Syntax: YTEXT [text]
The default action of PLOT is to write the variate name(s) at the
y-axis. For HISTOGRAM, the default is to write "Count" there in case of
a single parameter, the factor name if two parameters are given. These
defaults are overwritten by a YTEXT command. Other rules (for pseudo
blanks, cancellation of action etc.) are exactly as for FRAMETEXT above.
The command
YTEXT |
is often useful for enlargement of the labels on the y-axis.
--------------------------------------------------------------------------------
OPENPSFILE
Open file for output of PostScript code
................................................................................
Syntax: OPENPSFILE [filename]
If the file name is empty the default name ISUW.PS is used.
Use this together with PSFRAME and CLOSEPSFILE to send graphics to a
printer in PostScript format.
EXAMPLE (two plots on a page).
OPENPS
PSFRAME 1 2 {can be omitted, since this is the default}
PLOT ... or HIST ...
PSFRAME 2 2 { "second of two", i.e. lower half of the paper }
PLOT ... or HIST ...
CLOSEPS
... to be followed e.g. by COPY ISUW.PS LPT2 from a command window, or
a similar operation via Windows Explorer, GhostScript or whatever.
If "filename" is specified explicitly with extension .EPS,
encapsulated PostScript code is produced. This is only for a single
plot per page (the PSFRAME command can not be used), but these files
can be imported to some text handling programs.
When a PostScript file is open in interactive mode, the images
produced are shown on the screen as usual, and you still have the
option of saving them (in addition) as .JPG or .BMP files. In programs
this option is not available. The images are formed on the screen as
usual, but they are removed immediately. This enables you to write
programs that can run unattended while they produce one or more
PostScript pages.
--------------------------------------------------------------------------------
PSFRAME
Selecting the position on the page for PostScript graphics output.
................................................................................
Syntax: PSFRAME select total
The two parameters must be integers or integer expressions, satisfying
select = 1,2,...,total,
total = 1, 2, 4, 8, 9, 16, 18, 25, 32, 36, 49, 50, 64 or 72.
The idea is that the command selects frame no. "select" out of "total"
equally sized frames on the paper. The values total = 2, 8, 18, 32,
50, 72 (double squares) produce pages in "portrait" format. The
remaining values 1, 4, 9, 16, 25, 36, 49 and 64 (squares) produce
pages in "landscape" format.
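The list of legal values for "total" follows a simple pattern, which
the Python sketch below reproduces. This is not ISUW code, and the
function names are hypothetical. The sketch assumes that a square
n*n gives n rows of n frames, that a double square 2*m*m gives 2m rows
of m frames, and that frames are numbered row by row from the upper
left corner.

```python
import math


def frame_layout(total):
    """Return (rows, columns, orientation) for a legal PSFRAME total."""
    if 1 <= total <= 72:
        n = math.isqrt(total)
        if n * n == total:
            return n, n, "landscape"          # square number of frames
        if total % 2 == 0:
            m = math.isqrt(total // 2)
            if 2 * m * m == total:
                return 2 * m, m, "portrait"   # double square
    raise ValueError("total must be a square or a double square up to 72")


def frame_cell(select, total):
    """Row and column (both counted from 1) of frame number select."""
    rows, cols, _ = frame_layout(total)
    if not 1 <= select <= total:
        raise ValueError("select must lie between 1 and total")
    return (select - 1) // cols + 1, (select - 1) % cols + 1


print(frame_layout(8))    # 4 rows of 2 frames, portrait
print(frame_cell(3, 8))   # second row, first column
```

Under these assumptions, PSFRAME 3 8 selects the left frame in the
second row of a portrait page.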
EXAMPLE. To produce 3 plots in a 4 x 2 arrangement, leaving the lower
half of the paper and the lower right corner of the upper half blank,
     -------------
     |  1  |  2  |
     -------------
     |  3  |     |
     -------------
     |     |     |
     -------------
     |     |     |
     -------------
do something like this:
OPENPS
PSFRAME 1 8
PLOT ... { 1 }
PSFRAME 2 8
PLOT ... { 2 }
PSFRAME 3 8
PLOT ... { 3 }
CLOSEPS
A special version of the command takes the form
PSFRAME # total
Here, the number of frames on the page is given by the last parameter,
but the specification of the frame by the symbol # implies that the
first parameter "select" will take the values 1, 2, ..., shifting by 1
each time a PLOT or HISTOGRAM command is met.
EXAMPLES. In the example above, we could obtain exactly the same by
OPENPS
PSFRAME # 8
PLOT ... { 1 }
PLOT ... { 2 }
PLOT ... { 3 }
CLOSEPS
The following program generates a PostScript page with 32 probit
diagrams for 100 simulated normal observations.
DEL
VAR x probit 100
probit=phiinv(#/101)
OPENPS
$i=0
%%%
$i=$i+1
PSFRAME $i 32
x=normal
SORT x
PLOT x probit =15 =L
GOTO %%% $i<32
CLOSEPS
Here, we could also have put the PSFRAME command before the loop if we
had given it the form PSFRAME # 32.
If no PSFRAME command is used, ISUW uses the default which corresponds
to PSFRAME 1 2. The command cannot be used when OPENPS has specified
an encapsulated PostScript file (extension .eps).
--------------------------------------------------------------------------------
CLOSEPSFILE
Close file for PostScript graphics output.
................................................................................
No parameters.
Closes the file for output of PostScript code. Use with OPENPSFILE,
PSFRAME, PLOT and HISTOGRAM.
--------------------------------------------------------------------------------
SORT
Parallel sorting of vectors.
................................................................................
Syntax: SORT vector1 [vector2 [...]] [+]
The vectors occurring as parameters must be of the same length. The
effect of this command is that the vectors are sorted in parallel,
such that the resulting vectors are ordered increasingly, primarily by
values/levels of vector1, then (within constant value/level of
vector1) by vector2, etc. etc. A final plus sign has the effect that
all other vectors of the same length are sorted in parallel with those
in the list; but the ordering within tie groups (if any) determined by
the vectors in the list, becomes arbitrary.
EXAMPLE. Let X be a variate and SEX a factor on two levels, both of
length 10, with values/levels as LISTed here:
SEX X
1 1.5933
1 1.8114
1 1.5953
2 2.0208
2 1.6493
1 1.9338
1 1.9628
2 1.2509
1 1.2309
2 1.5081
Then, after the command
SORT SEX X
a similar LISTing would look like this:
SEX X
1 1.2309
1 1.5933
1 1.5953
1 1.8114
1 1.9338
1 1.9628
2 1.2509
2 1.5081
2 1.6493
2 2.0208
In this case, you might want to perform the sorting by SEX only,
preserving the order of X-values unchanged within SEX-groups. However,
you cannot do this by just writing SORT SEX, because only the vectors
in the list are sorted; nor can you do it by SORT SEX +, because the
QuickSort algorithm will create arbitrary permutations of units within
SEX groups. A solution to this problem goes as follows: Create (if it
doesn't exist already) a vector which is ordered by unit number, for
example by
VAR UNIT 10
COMPUTE UNIT=#
Include this as the second vector in the list,
SORT SEX UNIT X
or just (if other vectors of length 10 are present and should be sorted
in parallel)
SORT SEX UNIT +
With the original unit number as the secondary criterion, the original
order within SEX-groups is preserved.
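The effect of the secondary key can be checked with a small Python
sketch (not ISUW code) on the data listed above. The helper
sort_parallel is hypothetical. Python's own sort happens to be stable,
but with UNIT as a key no ties remain, so the result does not depend on
that.

```python
sex = [1, 1, 1, 2, 2, 1, 1, 2, 1, 2]
x = [1.5933, 1.8114, 1.5953, 2.0208, 1.6493,
     1.9338, 1.9628, 1.2509, 1.2309, 1.5081]

def sort_parallel(keys, *others):
    """Sort all vectors in parallel, ordered by the key vectors in turn."""
    n = len(keys[0])
    order = sorted(range(n), key=lambda i: tuple(k[i] for k in keys))
    return [[v[i] for i in order] for v in (*keys, *others)]

# SORT SEX X: X is itself a sorting key, so it is reordered within groups.
sex_a, x_a = sort_parallel([sex, x])

# COMPUTE UNIT=#  followed by  SORT SEX UNIT X: original order is kept.
unit = list(range(1, len(sex) + 1))
sex_b, unit_b, x_b = sort_parallel([sex, unit], x)

print(x_a)
print(x_b)
```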
Notice that a SORT command without a final plus sign should usually have
all vectors of the relevant length in the list of parameters. Otherwise,
the unit-to-unit correspondence between vectors is lost, and this is
rarely useful. If you make a sorting without some vectors of the
relevant length, a warning is given.
SORT ignores restrictions, and if restrictions are present the
restriction indicator is NOT sorted in parallel. Thus, the excluded
units, if any, are not the same as before. For this reason, a warning
is given if restrictions are present under a SORT command. If you
forgot to remove the restrictions beforehand, you may as well perform an
INCLUDEALL command right after the SORT command.
If you want to preserve the restrictions, you will have to construct your
own "restriction indicator". For example by
FAC PRESENT N 1 { where N stands for the common length of }
{ the vectors to be sorted. }
COMPUTE PRESENT=1 { PRESENT becomes the "presence indicator", }
{ since the default value 0 remains unchanged }
{ for the non-present units. }
INCLUDEALL { this command can also be placed after the }
{ SORT command, it doesn't matter. }
SORT ... { where the argument list should include }
{ PRESENT or end with a + }
EXCLUDE PRESENT=0
DELETE PRESENT
Missing values of variates are treated according to their physical
representation, which is the numerical value -1E-37. Thus, after a
SORTing by values of a variate, the missing values will occur after the
negative values and before the zeroes.
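The position of missing values after sorting follows directly from the
representation. A short Python illustration (not ISUW code), using the
value -1E-37 quoted above:

```python
MISSING = -1e-37   # physical representation of a missing value, per the text

values = [0.0, -2.5, MISSING, 3.1, -0.001, 0.0]
ordered = sorted(values)

# The missing value ends up after all negative values and before the zeroes.
print(ordered)
```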
--------------------------------------------------------------------------------
COMPUTE
Computation of variates and factors from other variates and factors.
................................................................................
Syntax: COMPUTE vectorname=expression
or COMPUTE vectorname(index) = expression
or COMPUTE expression[:[width]:[decimals]]
In interactive mode, the command name COMPUTE can be replaced with a
blank in position 1 of the command window. In programs you may even omit
the blank provided that the COMPUTE command has a left hand side. In
this case the command is identified by the equality sign after the first
connected string.
In interactive mode you can generate a COMPUTE command from position 1
by = (with equality sign) or + (without equality sign). In programs,
COMPUTE commands without a left hand side can be preceded by = or +
instead of the command name.
In the simplest case, the action taken by COMPUTE is a unit-by-unit
computation of values of the vector on the left hand side. For example,
if Y is a variate of length 100, the statement
LOGY=ln(Y)/ln(10)
will give LOGY values ln(Y(1))/ln(10), ... ,ln(Y(100))/ln(10). If LOGY
is declared in advance it must be a variate of length 100, otherwise it
will be declared as such. The expression on the right hand side may
involve the algebraic operators +, -, *, / and ^ (meaning "raised to the
power", e.g. (-2)^3=-8), six relational operators (see 2 screenfuls below),
explicit constants, standard functions (see 5 screenfuls below) and
parentheses as necessary. All the vectors occurring on the right hand
side must be existing vectors of the same length, with some exceptions
to be explained later. We call them "parallel" vectors, to emphasise the
one-to-one correspondence between their entries and the entries of the
resulting vector on the left hand side.
A COMPUTE command without a left hand side computes its result and
displays it, rather than storing it in a vector. For
example, the result of the command
COMPUTE X/Y:6:2
for existing variates X and Y of the same length is roughly the same as
you would obtain by
COMPUTE RATIO=X/Y
LIST1 RATIO:6:2
DELETE RATIO
For this reason, most of what follows is only explained for the case
where a left hand side is present.
MISSING VALUES.
Missing values are taken into account by COMPUTE in the sense that the
result will always become missing for entries where one of the parallel
vectors on the right hand side has a missing value. Even if you write
COMPUTE Y=X+0*Z
a missing value of Z will result in a missing value of Y.
Algebraically or numerically undefined quantities, such as ln(0) or
ln(-4), sqrt(12-20), (-7)^1.3, exp(100) etc. etc., are set to the
missing value, and a warning about this is given.
RESTRICTIONS.
Restrictions are obeyed in the sense that for non-present units the
vector on the left hand side is left unchanged. An exception to this
occurs when the result is a variate of length 1, this will always be
computed.
EXAMPLE. The statements
FOCUSONLEVEL F 3
COMPUTE Y=1/0
INCLUDEALL
can be used to give Y missing values for all units on level 3 of the
factor F. However, a shorter way of doing this is by the single command
COMPUTE Y=Y/(F<>3)
RELATIONAL OPERATORS.
In the command line above, the denominator (F<>3) becomes 1 for units
on an F-level different from 3, 0 for units on level 3. The following
six relational operators
= equal to
<> different from
< less than
<= less than or equal to
> greater than
>= greater than or equal to
can be used on the right hand side of a COMPUTE command, and the
resulting boolean expressions are given values 1 (for TRUE) or 0 (for
FALSE). For example,
COMPUTE INDIC=exp(X)>Y+3
will return a variate INDIC of zeroes and ones, 1 when exp(X)>Y+3. As
a more complicated example,
COMPUTE MINXY=(X<=Y)*X+(Y<X)*Y
will return the smaller of X and Y for each unit. If the result is to
become a factor, it must be declared in advance, see 6-7 screenfuls
below.
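Since relational expressions evaluate to 1 or 0, they can serve as
arithmetic masks. The following Python sketch (not ISUW code, function
name hypothetical) mimics the unit-by-unit evaluation of the masked sum
(X<=Y)*X+(Y<X)*Y, which picks the smaller of the two values.

```python
def compute_min(x, y):
    """Unit-by-unit (X<=Y)*X + (Y<X)*Y, with comparisons taken as 1 or 0."""
    return [int(a <= b) * a + int(b < a) * b for a, b in zip(x, y)]

x = [1.0, 5.0, 3.0]
y = [2.0, 4.0, 3.0]
minxy = compute_min(x, y)

# An indicator variate of zeroes and ones, 1 where X > Y.
indic = [int(a > b) for a, b in zip(x, y)]

print(minxy, indic)
```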
FACTORS ON THE RIGHT HAND SIDE.
Factor dummies can be written like (F=2), meaning the variate which is 1
when the factor F takes the level 2, 0 otherwise. More generally,
factors of the correct length may occur on the right hand side, where
they are interpreted as parallel variates with their numerical levels as
values.
EXAMPLE. If F is a factor on three levels,
COMPUTE X= 1.1*(F=1) + 2.3*(F=2) + 4.8*(F=3)
will result in a variate X with values 1.1, 2.3 and 4.8, determined by
the levels of F.
UNIT-BY-UNIT FUNCTIONS.
Vector valued functions, operating unit by unit, are
EXP()      the exponential function
LN()       log (base e, use LN()/LN(10) if you want base 10)
SQR()      x -> x*x
SQRT()     x -> square root of x
ABS()      x -> |x|
SIN()
COS()      well-known trigonometric functions
ARCTAN()
INT()      x -> [x]
           (integer part, upwards rounding for negative argument)
PHI()      The c.d.f. of the standard normal distribution.
PHIINV()   The inverse of PHI().
NORMAL     simulated standardised normal values.
POISSON()  random Poisson distributed values with the argument as
           parameter. If the parameter is 0 or negative, 0 is
           returned.
RANDOM     random uniform on [0,1]
EXAMPLES.
To generate discretely uniform random values in the range 1, 2, ... , 6,
say, use constructions like
FAC DICE 10000 6
COMPUTE DICE=1+INT(6*RANDOM)
To fill an existing variate Y with random zeroes and ones, 1 occurring
with probability 1/3, write
COMPUTE Y=RANDOM<1/3
If FITTED is a variate holding supposedly correct means in a
multiplicative Poisson model (produced e.g. by FITLOGLINEAR ... and
SAVEFITTED FITTED), then
COMPUTE SIMDATA=POISSON(FITTED)
will produce a simulated response variate under the estimated model.
To make a probit diagram for the observations in a variate X write
something like
SORT X +
COMPUTE N=##(X) { ##() is explained below }
VAR PROBIT N
COMPUTE PROBIT=PHIINV(#/(N+1)) { # is explained below }
PLOT X PROBIT
SCALARS ON THE RIGHT HAND SIDE.
Until now, all vectors occurring on the right hand side have been
assumed to be parallel vectors, i.e. vectors whose lengths must coincide
with (and sometimes determine) the length of the resulting vector on the
left hand side. Here comes the first exception: Variates of length 1 may
occur on the right hand side, where they are treated exactly as
explicitly written constants. The length of the resulting vector will
then be determined by other parallel vectors, or by its own length if it
exists. For example,
VAR X 100
COMPUTE A=2
COMPUTE X=#^A { # is explained below, but have a guess ... }
will create a vector of length 100 with values 1, 4, 9, ... , 10000,
provided that A has not been created earlier as something other than a
variate of length 1.
SCALAR VALUED VECTOR FUNCTIONS.
The following scalar valued functions are available:
SUM() returns the sum of the (present and non-missing) values of a
variate.
MEAN() returns the average of the (present and non-missing) values of a
variate. If no values are present and non-missing, 0 is returned.
MIN() and MAX() return minimum and maximum of the (present and
non-missing) values of a variate. If no values are present and
non-missing, 0 is returned.
VARIANCE() returns the (denominator n-1) sample variance of the
(present and non-missing) values of a variate. If no values or only a
single value are present and non-missing, 0 is returned.
##() returns the full length of a vector, without correction for
restrictions or missing values. If the argument is the name of a vector
that does not exist, 0 is returned. This convention is useful if you
want to check for existence of a vector in a program.
#() returns the number of units present of a variate or factor. For
variates, missing values are counted as non-present. But units with
level 0 for factors are regarded as present. Thus, if F is a factor
and no restrictions are present, we have ##(F) = #(F).
The argument of a scalar-valued function must be the name of a vector,
not an expression, and this vector is NOT a parallel vector, i.e. it
may be of any length, and this length has no influence on automatic
declaration of the vector on the left hand side.
EXAMPLE. If X is a variate of length 100, and SD is not declared (or of
length 1), you can write
COMPUTE SD=sqrt(variance(X))
to obtain the standard deviation (as a variate of length 1), and then
COMPUTE X0=(X-mean(X))/SD
to compute the vector X0 of standardised values. This can also be done
in a single step by
COMPUTE X0=(X-mean(X))/sqrt(variance(X))
THE UNIT INDEX # AND THE NUMBER OF UNITS ##.
The identifier # (not to be confused with the scalar valued vector
function #() ) has a special meaning as a (non-existing) variate with
the values 1, 2, 3,... . Writing e.g., for an existing variate UNIT of
length 100,
COMPUTE UNIT=#
the variate UNIT will get values 1, 2, ... , 100.
The identifier ## (not to be confused with the scalar valued vector
function ##() ) has another special meaning, as the length of the
resulting left hand side. Thus, if you write, for an existing vector X
of length N,
COMPUTE X=#/##
you will give X the values 1/N, 2/N, ... , 1.
EXAMPLE. Suppose we have a vector X of length 100, and want to split it
up in two vectors, one containing the odd-numbered entries and the other
containing the even. This can be done by
VARIATE X1 X2 50
COMPUTE X1=X(2*#-1)
COMPUTE X2=X(2*#)
(cfr. VECTORS AS FUNCTIONS OF UNIT INDEX a few screenfuls below)
VECTOR CONSTANTS.
A list of real numbers, separated by blanks and enclosed in brackets
[], can be used in COMPUTE commands to represent an unnamed vector
with given values. For example, to declare and simultaneously give
values to (short) variates, simply use commands like
COMPUTE X=[1.1 -1.2 0.2 3.4]
EXAMPLE. In an earlier example, the statement
COMPUTE X= 1.1*(F=1) + 2.3*(F=2) + 4.8*(F=3)
was suggested as a way of giving values 1.1, 2.3 and 4.8 to a variate,
depending on the level of a factor. An easier solution is
COMPUTE X=[1.1 2.3 4.8](F)
(cfr. VECTORS AS FUNCTIONS OF UNIT INDEX a few lines below)
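That the indexed vector constant and the sum of factor dummies give the
same result can be checked with a Python sketch (not ISUW code). The
factor levels below are made up for illustration, and the helper
lookup is hypothetical. An index outside 1..3 is mapped to None here,
standing in for a missing value.

```python
f = [1, 3, 2, 2, 1, 3]          # hypothetical levels of the factor F
table = [1.1, 2.3, 4.8]         # the vector constant [1.1 2.3 4.8]

def lookup(values, index):
    """values(index) with a 1-based index; None when out of range."""
    return values[index - 1] if 1 <= index <= len(values) else None

# COMPUTE X=[1.1 2.3 4.8](F)
x_lookup = [lookup(table, level) for level in f]

# COMPUTE X= 1.1*(F=1) + 2.3*(F=2) + 4.8*(F=3)
x_dummies = [1.1 * (level == 1) + 2.3 * (level == 2) + 4.8 * (level == 3)
             for level in f]

print(x_lookup)
print(x_dummies)
```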
It is also possible to include vector names in the list, representing
the list of the vector's values (or levels, in case of a factor). For
example
COMPUTE A=[1 2 3]
COMPUTE B=[a 0 a]
will result in a vector B of length 7 with values 1 2 3 0 1 2 3.
However, these "bracketed lists" must not (and need not) be nested.
For example,
COMPUTE B=[[1 2 3] 0 [1 2 3]]
would NOT work.
FACTORS ON THE LEFT HAND SIDE.
The vector on the left hand side may be a factor. If the result is
non-integer or out of range, the level 0 will be assigned, and a warning
will be given.
EXAMPLE. If F is a factor of length 100 on 5 levels, and you want to
collapse it to a factor G on three levels, representing the groups {1},
{2,3} and {4,5} of F-levels, you can write
FACTOR G 100 3
COMPUTE G=[1 2 2 3 3](F)
- where the explanation of the last line follows now.
VECTORS AS FUNCTIONS OF UNIT INDEX.
Variates and factors may occur as "functions" on the right hand side. In
this case the argument must be integer and is interpreted as a unit
index. The vector is NOT a parallel vector in this case (but its
argument may very well be so). The following example illustrates this
point.
EXAMPLE. The "display" command (i.e. a COMPUTE command without a left
hand side, here extended by a format)
COMPUTE [10 20 30 40 50]([1 5 2]):3:0
will produce the output
10 50 20
A more useful example follows here.
EXAMPLE. If X and X1 are variates of the same length,
COMPUTE X1=X(#-1)
will give X1 the "lagged" values of X. The first value X1(1) will become
missing, because X(0) is undefined (and a warning about this will be
given). Notice that X1 must be declared first, since X on the right hand
side is not a parallel vector. If X1 was undeclared, the statement would
result in a variate of length one with a single missing value.
WARNING. The statement
COMPUTE X=X(#-1)
will not work as - perhaps - expected, since the computations are
performed unit by unit in the natural order. This statement would
actually result in a vector of missing values, since we would get
first for unit 1 X(1) = X(0) = *
then for unit 2 X(2) = X(1) = *
then for unit 3 X(3) = X(2) = *
etc. etc.
A similar warning comes here: Suppose you want to transform a variate X
by subtraction of its first value from all entries. Then
COMPUTE X=X-X(1)
will not work, because X(1) is set to zero before the later entries are
computed. Instead, you would have to do something like
COMPUTE X1=X(1) { X1 undeclared, thus becoming of length 1 }
COMPUTE X=X-X1
DEL X1
In general one has to be very careful when the variate on the left hand
side occurs as a function on the right hand side. The computations are
performed unit by unit, and if entries of the variate have been changed
by earlier steps, this may give unexpected results.
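The two pitfalls above can be reproduced with a Python sketch (not ISUW
code) that updates a list unit by unit in the natural order, reading
from the partly updated list. None stands in for the missing value, and
the helper names are hypothetical.

```python
MISSING = None   # stand-in for the ISUW missing value

def compute_in_place(x, rhs):
    """Assign x(i) = rhs(at, i) for i = 1, 2, ... in order.

    'at' reads the partly updated vector with a 1-based index and
    returns MISSING for an index outside the vector.
    """
    def at(i):
        return x[i - 1] if 1 <= i <= len(x) else MISSING
    for unit in range(1, len(x) + 1):
        x[unit - 1] = rhs(at, unit)
    return x

def minus(a, b):
    """Subtraction that propagates missing values."""
    return MISSING if a is MISSING or b is MISSING else a - b

# COMPUTE X=X(#-1): the missing value at unit 1 spreads to every unit.
lagged = compute_in_place([10, 20, 30], lambda at, i: at(i - 1))

# COMPUTE X=X-X(1): X(1) is already zero when units 2, 3, ... are computed.
shifted = compute_in_place([10, 20, 30], lambda at, i: minus(at(i), at(1)))

# Saving X(1) in a scalar first gives the intended result.
x = [10, 20, 30]
first = x[0]
correct = [minus(v, first) for v in x]

print(lagged, shifted, correct)
```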
However, as long as you know the rules, the dynamic execution can be
useful. For example, to produce a vector S holding the cumulated values
X(1), X(1)+X(2), X(1)+X(2)+X(3), ... of an existing variate X, write
(provided that no restrictions are present)
COMPUTE S=X
EXCLUDE 1
COMPUTE S=S+S(#-1)
INCLUDE 1
The exclusion of unit 1 is necessary here, because otherwise the
reference to unit 0 would produce a missing value, and this would
persist all the way through, producing a vector of missing values.
Notice that it is OK to refer to S(#-1) also for #=2, the exclusion of
unit 1 does not prevent this. The restrictions refer to the unit index
for the vector on the left hand side.
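The cumulation trick can be traced in a Python sketch (not ISUW code;
the function name is hypothetical). The loop starts at unit 2, which
corresponds to excluding unit 1, and each step reads the value already
updated in the previous step.

```python
from itertools import accumulate

def cumulate(x):
    s = list(x)                           # COMPUTE S=X
    for unit in range(2, len(s) + 1):     # EXCLUDE 1: unit 1 is skipped
        # COMPUTE S=S+S(#-1), where S(#-1) has already been updated
        s[unit - 1] = s[unit - 1] + s[unit - 2]
    return s

x = [1, 2, 3, 4]
s = cumulate(x)
print(s)
print(list(accumulate(x)))   # the same cumulated sums, for comparison
```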
Another example, clearly demonstrating how and why vectors are not
parallel when they occur as functions, follows here.
EXAMPLE. Suppose we have some monthly data over 20 years. Let MONTH be a
factor of length 240 on 12 levels, holding ... guess what. To create a
variate DAYS of length 240, holding the number of days in each month,
simply write
DAYS=[31 28 31 30 31 30 31 31 30 31 30 31](MONTH)
Here, MONTH is a parallel vector, the vector [31 28 ... 31] is not.
SINGLE VECTOR ENTRIES ON THE LEFT HAND SIDE.
To get the leap years correct in the example above, you could add a few
statements of the form
DAYS((1984-1981)*12+2)=29
DAYS((1988-1981)*12+2)=29
etc.
(pretending here that the first month is January 1981).
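For checking the unit arithmetic, here is a Python sketch of the same
construction (not ISUW code). It assumes, as above, that the first
month is January 1981, so the 20 years run from 1981 to 2000, and it
writes out the leap years that the "etc." stands for under that
assumption.

```python
# Factor MONTH of length 240 on 12 levels: 1..12 repeated for 20 years.
month = [m for _ in range(20) for m in range(1, 13)]

# DAYS=[31 28 31 30 31 30 31 31 30 31 30 31](MONTH)
table = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
days = [table[m - 1] for m in month]

# DAYS((year-1981)*12+2)=29 for each leap year in 1981..2000
for year in (1984, 1988, 1992, 1996, 2000):
    unit = (year - 1981) * 12 + 2     # 1-based unit index of February
    days[unit - 1] = 29

print(len(days), sum(days))
```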
Quite generally, the left hand side in a COMPUTE command may be of the
form
vectorname(integer expression).
In this case the right hand side must be interpretable as a vector of
length 1, and the result is stored in the corresponding entry of the
vector on the left hand side. For example (provided that A and B are
undeclared, or variates of length 1)
VAR X 10
COMPUTE A=3
COMPUTE B=23.5
COMPUTE X(Sqr(A))=B
DEL A B
is an extremely complicated way of setting the 9th value of X to 23.5;
which could also be done by the single command
COMPUTE X(9)=23.5
In this case, too, the vector on the left can be a factor, if the
expression on the right hand side is integer and in the range of valid
levels.
RULES FOR NAME CONFLICTS.
Names of vectors may be ISUW function names, but in this case the
corresponding functions are no longer available. For example, if you
declare a variate named EXP you can no longer use the exponential
function, because e.g. exp(1.3) will be interpreted as the (missing)
1.3'rd value of the variate EXP. Similarly, if you declare vectors named
RANDOM, MEAN, SUM, ... these functions can no longer be used.
THE CONSTANT PI.
The constant PI=3.14159.. can be constructed (if 3.14159 is not good
enough) by
COMPUTE PI=4*ARCTAN(1)
REGISTER OVERFLOW AND STACK OVERFLOW.
Large formulas may result in an error message reporting "register
overflow" or "stack overflow". The reason for this is that COMPUTE is
based on a parser procedure (formula interpreter) that calls itself, and
also that intermediate results are stored in a limited number of
registers. If this appears to be a problem, you will have to perform the
computations in two or more steps. For example, computing a sum of more
than (approximately, depending on other circumstances) 100 terms may
result in such an error. Splitting it up as a sum of two, like
COMPUTE X=A1+...+A50
COMPUTE X=X+A51+...+A100
will solve this problem (which you will hardly ever meet).
--------------------------------------------------------------------------------
GENERATELEVELS
Assigning levels to a factor in a systematic (cyclic) way.
................................................................................
Syntax: GENERATELEVELS factorname lag
Assigns cyclically varying levels to a factor. For example, if F is a
factor on 3 levels,
GEN F 2
will assign levels 1 1 2 2 3 3 1 1 2 2 ... to the factor. Hence, the
second parameter (2, in this case) determines the lag between change
points.
Restrictions are NOT taken into account.
EXAMPLE. A file contains the 3 by 4 table
1.2 1.4 1.1 1.9
1.2 1.1 1.3 1.6
1.1 1.4 1.2 1.3
We can read these values into a variate of length 12 by
VAR Y 12
OPEN filename
READ Y
The two factors reflecting the two-way structure can be constructed by
FAC ROW 12 3
GEN ROW 4
FAC COL 12 4
GEN COL 1
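The level patterns can be reproduced with a Python sketch (not ISUW
code). The formula is inferred from the examples above and is an
assumption about the behaviour, not a description of the
implementation. The function name is hypothetical.

```python
def generate_levels(length, levels, lag):
    """Cyclic levels 1..levels, changing after every 'lag' units."""
    return [(unit // lag) % levels + 1 for unit in range(length)]

print(generate_levels(10, 3, 2))   # GEN F 2 for a factor on 3 levels
print(generate_levels(12, 3, 4))   # FAC ROW 12 3, GEN ROW 4
print(generate_levels(12, 4, 1))   # FAC COL 12 4, GEN COL 1
```

With the table read row by row, ROW is constant over each group of 4
values and COL cycles through 1, 2, 3, 4.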
--------------------------------------------------------------------------------
GROUP
Construction of a factor by interval grouping of a variate.
................................................................................
Syntax: GROUP variatename factorname [levels [cutpoints]]
Constructs a factor by interval grouping of an existing variate. The
number of levels is set by the integer parameter "levels". If the
factor is existing in advance, it must be of the same length as the
variate, and "levels" must be the number of levels for that factor, if
specified (or it can be specified as zero, or marked with an asterisk).
If the factor does not exist, it is automatically declared. If
"levels" is a positive integer, it is taken as the number of levels
for the new factor. If "levels" is specified as 0 or not specified at
all, the new factor is automatically declared with its number of
levels set to the rounded value of the square root of the length of
the variate, but at most 255.
The final parameter(s) cutpoints, if specified, must contain levels-1
real numbers separated by blanks. These must be in increasing order,
and they determine the cutpoints between intervals. This parameter can
also be the name of a variate, holding the desired cutpoints. The
length of this variate must be levels-1, and its values must be
nonmissing and increasing. If this parameter is not specified, the
cutpoints are placed equidistantly between MIN and MAX of the variate.
EXAMPLE.
GROUP AGE AGEGRP 4 20 40 60
is equivalent to
FACTOR AGEGRP ##(AGE) 4
COMPUTE AGEGRP=1+(AGE>20)+(AGE>40)+(AGE>60)
Ties are handled according to the convention "] , ]", meaning that a
value falling exactly at a cutpoint is put in the lower of the two
possible categories.
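The tie convention and the default number of levels can be illustrated
by a Python sketch (not ISUW code, hypothetical function names). It
uses bisect_left, which places a value equal to a cutpoint in the lower
interval, in agreement with the "] , ]" convention.

```python
from bisect import bisect_left
import math

def group(values, cutpoints):
    """Level 1 + number of cutpoints strictly below the value."""
    return [1 + bisect_left(cutpoints, v) for v in values]

def default_levels(length):
    """Rounded square root of the vector length, but at most 255."""
    return min(255, round(math.sqrt(length)))

age = [19, 20, 21, 40, 41, 60, 61]

# GROUP AGE AGEGRP 4 20 40 60
agegrp = group(age, [20, 40, 60])

# COMPUTE AGEGRP=1+(AGE>20)+(AGE>40)+(AGE>60)
by_compute = [1 + (a > 20) + (a > 40) + (a > 60) for a in age]

print(agegrp, by_compute, default_levels(100))
```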
Restrictions are obeyed in the following sense. If the factor exists in
advance, it is unchanged for the hidden units. If the factor is
automatically declared, it becomes 0 for those units. Notice that if
cutpoints are selected automatically, they will be based on MIN and MAX
of the present observations only.
--------------------------------------------------------------------------------
TRANSFER
Transfers subvector to subvector, or compresses vector by removal of
hidden entries.
................................................................................
Syntax: TRANSFER name1 s1 e1 name2 s2 e2
or TRANSFER name1 name2
For the first form (with six parameters), the two names must be names
of either factors or variates, and the four integer parameters s1, e1,
s2 and e2, specifying the start and end of the vector segments, must
satisfy obvious consistency requirements involving the lengths of the
two vectors and the two subvectors.
EXAMPLE. If X and Y are vectors of lengths (at least) 10 and 100, the
command
TRANSFER X 1 10 Y 91 100
will set Y-values equal to X-values according to the scheme
Y(91) = X(1)
Y(92) = X(2)
...
Y(100) = X(10)
Transfer of a subvector with values/levels in reversed order is
performed when either s1 is greater than e1 or s2 is greater than e2.
Special care should be taken in the case where name1 = name2. This is
allowed, but the transfer takes place in the order s1 first ... e1
last, and this may give unexpected results.
EXAMPLE. If X is a vector of length 100, the command
TRANSFER X 1 99 X 2 100
will produce a vector with the same value for all units, because the
same value (originally X(1)) is transferred again and again. Whereas
TRANSFER X 99 1 X 100 2
COMPUTE X(1)=1/0
will produce a proper "lagged" vector (with the first value set to a
missing value, by the last statement).
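The difference between the two orders of transfer can be traced with a
Python sketch (not ISUW code) on a vector of length 5 instead of 100.
The function is a hypothetical model of the six-parameter form, copying
entry by entry from s1 towards e1.

```python
def transfer(src, s1, e1, dst, s2, e2):
    """Copy src(s1..e1) to dst(s2..e2), entry by entry, s1 first."""
    count = abs(e1 - s1) + 1
    if abs(e2 - s2) + 1 != count:
        raise ValueError("the two subvectors must have the same length")
    step1 = 1 if e1 >= s1 else -1
    step2 = 1 if e2 >= s2 else -1
    for k in range(count):
        dst[s2 + k * step2 - 1] = src[s1 + k * step1 - 1]

# TRANSFER X 1 4 X 2 5: the original X(1) is copied again and again.
x = [1, 2, 3, 4, 5]
transfer(x, 1, 4, x, 2, 5)
forwards = list(x)

# TRANSFER X 4 1 X 5 2: starting from the end gives a proper lag.
x = [1, 2, 3, 4, 5]
transfer(x, 4, 1, x, 5, 2)
x[0] = None                    # COMPUTE X(1)=1/0, i.e. a missing value
backwards = list(x)

print(forwards, backwards)
```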
TRANSFER with six parameters does NOT take restrictions into account.
For the short form (two parameters), name1 must be the name of an
existing vector. name2 should usually be specified as a valid vector
name which is not in use, but if earlier defined it must be of length
equal to the number of entries present in name1, and its type
(including number of levels, if it is a factor) must coincide with the
type of name1. The action taken is to create name2 if necessary and
transfer the values/levels present in name1 to name2 in the obvious
order.
EXAMPLE. Suppose we have variates AGE and HEIGHT and a factor SEX on
two levels, all of the same length. To create a data set MALES that
contains only the part of data with SEX=1, and import this to our
session, we can (as explained earlier in the description of SAVEDATA)
do as follows (assuming no restrictions present from the beginning).
FOCUSONLEVEL SEX 1
SAVE MALES AGE HEIGHT
DELETE
GET MALES
However, "short" vectors AGE_MEN and HEIG_MEN can also be created
directly by
FOCUSONLEVEL SEX 1
TRANSFER AGE AGE_MEN
TRANSFER HEIGHT HEIG_MEN
If this is to be followed by some relevant operations on the new
vectors, we should obviously proceed with
INCLUDEALL
since the restrictions imposed on the long vectors are not likely to
be relevant for the short vectors.
--------------------------------------------------------------------------------
FITLINEARNORMAL
Regression and analysis of variance
................................................................................
Syntax: FITLINEARNORMAL variate=modelformula[/weight]
Fits a standard linear model for normal observations. "variate" is the
dependent variable, and "modelformula" is a code for the linear
expression for the mean in the model. The model formula consists of
terms separated by plusses or blanks. Each term may be a factor, a
variate, or a formal product of factors and variates. The special term
'1' represents a constant term in the model (like a factor on one
level or a variate filled with 1's). As opposed to most other
statistics packages, ISUW requires explicit specification of the
constant term when it should be included (and it does not have to be
the first term).
All vectors occurring on the right hand side must be of the same length
as the response variate on the left.
If "weight" is specified, it must be a variate of the same length. Its
values must be positive, and they are interpreted as weights of
observations - implying that the assumption of constant variance is
replaced with the assumption that the variances are proportional to
the inverse weights. In this case, the unknown proportionality factor
takes over the role of the variance in the homoscedastic case.
Restrictions are obeyed in the sense that the analysis is performed
only for the data present. Missing values of variates must not occur.
Factors in the model formula may take the level 0, but this complicates
the interpretation and is not recommended in general. See, however,
2-3 screenfuls below.
EXAMPLES.
X, Y and COUNT are variates, F and G are factors. When more than one
model formula is given, they correspond to different parameterisations
of the same model, provided that none of the factors take the level
zero.
FITLIN Y=F
FITLIN Y=1+F            one-way analysis of variance
FITLIN Y=1+X            ordinary linear regression
FITLIN Y=F+G            two-way analysis of variance,
FITLIN Y=1+F+G          additive model
FITLIN Y=F*G            two-way analysis of variance,
FITLIN Y=1+F+F*G        with interaction
FITLIN Y=1+F+G+F*G
FITLIN Y=F+F*X          a regression line for each F-level
FITLIN Y=1+F+F*X
    or 1+F*(1+X)        (this is legal - the "distributive
                        law" is built into the syntax)
FITLIN Y=F+X
FITLIN Y=1+F+X          parallel regression lines
FITLIN Y=1+F*X          regression lines with common intercept
FITLIN Y=F*X            regression lines through (0,0)
FITLIN Y=1+X+X*X        second degree polynomial regression
FITLIN Y=1+X/COUNT      linear regression with weights COUNT
                        (referring to a standard context where
                        Y holds the averages in groups of
                        observations with common X-value, COUNT
                        holding the group sizes).
Quite generally, the rules for translation of the model formula to a
model can be described as follows.
The model states that the mean vector for the response variate is a
linear combination of columns of a MODEL MATRIX (or DESIGN MATRIX), the
(number of observations) times (number of parameters)
matrix which is usually called X when the model is treated
mathematically. The model matrix can be thought of as generated from
the model formula by the following simple algorithm. Each term of the
model formula generates a number of columns of the matrix. A term
which is a product of factors generates a number of columns which is
the product of the numbers of levels for the factors. These columns
become "dummies", i.e. 0/1-indicators for the levels of the
corresponding product factor or cross classification. Multiplication
of a term by a variate does not change the number of columns
generated, but each column is multiplied (entry by entry) with the
values of the variate. In particular, a term consisting of a single
variate simply creates a column holding the values of the variate. The
term 1 generates a single column filled with 1's, and is thus
equivalent to a variate filled with 1's.
Hence, to count the number of columns (which is also the number of
linear parameters, including those that are set to zero due to
overparameterisation), simply add the orders of terms in the model
formula, where the order of a term is computed as the product of the
numbers of levels for factors in the term (1 if no factors are involved).
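The counting rule can be written out as a Python sketch (not ISUW
code). The representation of a model formula as a list of terms, each a
list of names, and the numbers of levels used below are assumptions
made for illustration only.

```python
from math import prod

def count_columns(terms, levels):
    """Sum over terms of the product of the factors' numbers of levels.

    'levels' maps factor names to numbers of levels. Names not found
    there (variates and the constant '1') contribute a factor of 1.
    """
    return sum(prod(levels.get(name, 1) for name in term) for term in terms)

levels = {"F": 3, "G": 4}      # hypothetical: F on 3 levels, G on 4

# Y=1+F+G+F*G  ->  1 + 3 + 4 + 12 columns
print(count_columns([["1"], ["F"], ["G"], ["F", "G"]], levels))

# Y=F+F*X  ->  3 + 3 columns (multiplying by the variate X adds none)
print(count_columns([["F"], ["F", "X"]], levels))
```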
If the model matrix has linearly dependent columns (which is almost
always the case), parameters are set to zero according to the
following rule: Whenever a column is a linear combination of the
preceding columns, the corresponding parameter (i.e. the coefficient
to that column in the expression for the mean vector) is set to zero.
In this way, a unique parameterisation is obtained. For models
involving factors, specified with a constant term first, then main
effects, then first order interactions etc., this results in the usual
"corner point parameterisation" where the parameters corresponding to
last levels or (for interactions) level combinations involving a last
level are set to zero.
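The rule for setting parameters to zero can be tried out on small
designs with a Python sketch (not ISUW code, and not a description of
the numerical method ISUW uses). It builds dummy columns for made-up
factors and reports which columns are linear combinations of preceding
ones, using exact fractions to avoid rounding problems. All names are
hypothetical.

```python
from fractions import Fraction

def dummies(factor, levels):
    """One 0/1 column per level 1..levels (level 0 gets no column)."""
    return [[int(f == level) for f in factor] for level in range(1, levels + 1)]

def aliased_columns(columns):
    """Indices of columns that depend linearly on the preceding columns."""
    basis = []       # pairs (pivot position, reduced column)
    aliased = []
    for j, col in enumerate(columns):
        v = [Fraction(c) for c in col]
        for pivot, b in basis:             # eliminate earlier directions
            if v[pivot] != 0:
                factor = v[pivot] / b[pivot]
                v = [vi - factor * bi for vi, bi in zip(v, b)]
        pivot = next((i for i, vi in enumerate(v) if vi != 0), None)
        if pivot is None:
            aliased.append(j)              # nothing left: parameter set to zero
        else:
            basis.append((pivot, v))
    return aliased

f = [1, 1, 2, 2, 3, 3]     # hypothetical factor on 3 levels
g = [1, 2, 1, 2, 1, 2]     # hypothetical factor on 2 levels
ones = [1] * len(f)

print(aliased_columns([ones] + dummies(f, 3)))                  # Y=1+F
print(aliased_columns(dummies(f, 3) + [ones]))                  # Y=F+1
print(aliased_columns([ones] + dummies(f, 3) + dummies(g, 2)))  # Y=1+F+G
```

For Y=1+F and Y=1+F+G the columns reported are those of the last
levels, in line with the corner point parameterisation. For Y=F+1 it is
the constant term that is set to zero.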
As an implicit consequence of these rules, the level 0 for a factor
(or a level combination for a cross classification involving a level
0) does not create a dummy column of the model matrix. Our general
advice is not to use factors taking the level 0. If the level 0 is
used at all it should be as a "missing level", and accordingly units
with a level 0 should be excluded before FITLINEARNORMAL.
However, level 0 as a "relevant level" can be used to avoid
overparameterisation, if desired. For example, in a design with many
factors on two levels, it is sometimes preferable to declare these
factors with one level and use levels 0 and 1 as the two levels. This
actually means that you take over the complete control of the dummies
that form the model matrix. Multiplication of two factors on one level
in a model formula literally means multiplication of the two dummies,
etc. In this case you will have to be completely aware of what you are
doing. In particular, you should be aware that main effects without a
constant term, interactions without main effects etc., which can
usually be specified without much danger of confusion if you avoid
using level 0, may result in meaningless models when the level 0 occurs.
EXAMPLE. To fit a one-way ANOVA model in such a way that the parameter
estimates are directly interpretable as expectations in the groups, a
command of the form
FITLIN Y=F
can be used. However, this will only work if the factor F does not
take the level 0, because the specification assumes E(Y)=0 for units
with F=0. If the level 0 actually occurs, you can use
FITLIN Y=1+F
to avoid this. But notice that the interpretation of the estimated
parameters will be very different from what it is when the level 0
does not occur. Level 0 takes over the role as "baseline level", which
is usually taken by the last level.
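The rule for linearly dependent columns and the role of level 0 can be
illustrated outside ISUW. The following Python sketch is not ISUW code;
the function names dummies and aliased_columns are invented for this
illustration. It builds the dummy columns for a constant term followed
by one factor and reports which columns are linear combinations of the
preceding ones, i.e. which parameters would be set to zero.

```python
from fractions import Fraction

def dummies(levels, nlevels):
    """One 0/1 column per level 1..nlevels; level 0 creates no column."""
    return [[1 if lev == k else 0 for lev in levels]
            for k in range(1, nlevels + 1)]

def aliased_columns(columns):
    """Indices of columns that are linear combinations of the preceding
    columns, found by exact Gaussian elimination."""
    basis = []      # pairs (pivot index, reduced independent column)
    aliased = []
    for j, col in enumerate(columns):
        v = [Fraction(x) for x in col]
        for pivot, b in basis:
            if v[pivot] != 0:
                factor = v[pivot] / b[pivot]
                v = [x - factor * y for x, y in zip(v, b)]
        pivot = next((i for i, x in enumerate(v) if x != 0), None)
        if pivot is None:
            aliased.append(j)       # parameter set to zero
        else:
            basis.append((pivot, v))
    return aliased

# Y=1+F, F on 3 levels, no level 0: the last level is set to zero.
F = [1, 1, 2, 2, 3, 3]
print(aliased_columns([[1] * 6] + dummies(F, 3)))   # [3]

# Y=1+G, G declared on 2 levels but taking the levels 0, 1 and 2:
# nothing is removed, and level 0 becomes the baseline.
G = [0, 0, 1, 1, 2, 2]
print(aliased_columns([[1] * 6] + dummies(G, 2)))   # []
```

Column 0 is the constant term, columns 1, 2, ... are the dummies for
the factor levels in increasing order.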
Output from FITLINEARNORMAL consists of an analysis of variance table,
giving the degrees of freedom, square sums, mean square sums,
F-statistics and P-values for successive reduction of the model by
removal of the terms from the model formula, beginning with the last.
This is what SAS users call the "type I" ANOVA table. However, there
are some differences. Firstly, the constant term will (if it is
specified) occupy a line, just as any other model term. Accordingly,
the last "Total" line will have the total number n of observations
(rather than n-1) as "degrees of freedom" and the total square sum of
the observations (rather than the square sum of deviations from the
mean) as its "sum of squares". Secondly, the tests performed are in
accordance with the usual rules for successive model reduction, in the
sense that the denominators of the F-statistics are "pooled variance
estimates", not just the variance estimate from the original model.
As a consequence of this, the order of terms determines the model
reductions to be tested, and also which parameters are set to zero
to avoid overparameterisation. For this reason, the supposedly least
important terms (e.g. higher order interactions) should be put last in
the model formula. For example, if in a simple regression analysis a
line through the origin can be expected, it is more natural to write
"Y=X+1" than "Y=1+X". If the hypothesis "intercept=0" is accepted, the
proportionality model is fitted by "Y=X".
The estimated variance and standard deviation of the model are also
printed out. In addition, if the model formula has a constant term as
its first term, the usual R-square statistic (the proportion of the
total square sum of deviations explained by the model) is displayed.
To list the parameter estimates and their standard deviations for the
last model fitted by FITLINEARNORMAL, use the command LISTPARAMETERS.
To save these estimates (and optionally their estimated standard
deviations) in a variate use SAVEPARAMETERS. To extract fitted values,
residuals and normed residuals, use SAVEFITTED. See also the
descriptions of ESTIMATE (for estimation of specific contrasts) and
SAVENORMEDRESIDUALS (for extraction of studentized residuals).
TESTMODELCHANGE, which refers to the last two models fitted, can be
used for test of model reductions if more than one term is removed in
a single step.
--------------------------------------------------------------------------------
FITLOGLINEAR
Multiplicative Poisson models
................................................................................
Syntax: FITLOGLINEAR variate[-offset]=modelformula
Fits a log-linear or multiplicative Poisson model. It is well known how
conditioning on the total sum or a set of marginal sums in such models
results in models for multinomial data, which can be handled by the same
computational methods. But in the following, we refer to the
"independent Poisson" interpretation.
"variate" is the dependent (non-negative integer) variable, and the
model formula is a code for the linear expression for the logarithmised
mean in the model, cf. the description of FITLINEARNORMAL above.
EXAMPLE. It should be emphasised that the following example has no
practical relevance; it is just a fast and easy way of explaining the
relation between model formulas and some well-known concepts related to
contingency tables.
Let BIRTHS be a variate of length 24 holding the monthly counts of
births of boys and girls in a certain area during a year. Let MONTH be
the factor of length 24 on 12 levels holding the month number 1..12, and
let SEX be the factor of length 24 on 2 levels holding the sex
(boys/girls). The model fitted by
FITLOGLIN BIRTHS=MONTH*SEX
is the full or saturated model, where each observation has its own
freely varying expectation. The model fitted by
FITLOGLIN BIRTHS=MONTH+SEX
corresponds to independence in the 12*2 table, i.e. the sex proportion
is constant from month to month. The further reduction to
FITLOGLIN BIRTHS=MONTH
corresponds to the assumption that the sex proportion is exactly 1:1,
whereas
FITLOGLIN BIRTHS=SEX
assumes (more or less, see below) constant birth intensity over the year.
Finally,
FITLOGLIN BIRTHS=1
assumes both constant sex proportion and equal distribution over months.
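For the independence model BIRTHS=MONTH+SEX the fitted expectations
have the well-known closed form (row sum)*(column sum)/(grand total).
The following Python sketch (not ISUW code; the function name and the
small table of counts are invented for illustration) computes them for
a two-way table given as a list of rows.

```python
def independence_fit(table):
    """Fitted expectations under the multiplicative model rows+columns:
    (row sum) * (column sum) / (grand total)."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return [[ri * cj / total for cj in col] for ri in row]

# Invented 2*2 table of counts
print(independence_fit([[10, 20], [30, 40]]))
# [[12.0, 18.0], [28.0, 42.0]]
```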
An OFFSET is a given variate of values that should be added to the
linear expression determining the mean. In the context of multiplicative
models, the usual expectation
E(Y) = exp(linear expression)
becomes, with an offset variate OFFSET,
E(Y) = exp(OFFSET)*exp(linear expression)
EXAMPLE. The above mentioned model "BIRTHS=SEX" does not take into
account the fact that months are unequally long. A more interesting
hypothesis would be that the numbers of births per day are i.i.d., or
that the expected number per month is proportional to the number of days
in the month. This model could be fitted (assuming that the year is not
a leap year) by
OFFS=LN([31 28 31 30 31 30 31 31 30 31 30 31](MONTH))
FITLOGLIN BIRTHS-OFFS=SEX
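To see what the offset does in this example, note that a model with
one parameter per level of SEX and offset ln(days) has a closed-form
maximum likelihood solution: the estimated number of births per day
for each sex is (total births)/(total days). The following Python
sketch (not ISUW code; the function name is invented, and it assumes
that SEX takes the levels 1 and 2 with no constant term in the model)
computes the parameters on the log scale.

```python
import math

DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # not a leap year

def fit_sex_with_offset(births, month, sex):
    """Poisson ML fit of E(Y) = exp(OFFS) * exp(beta[sex]) with
    OFFS = ln(days in month). Returns beta for each level of sex."""
    beta = {}
    for s in sorted(set(sex)):
        total_births = sum(y for y, g in zip(births, sex) if g == s)
        total_days = sum(DAYS[m - 1] for m, g in zip(month, sex) if g == s)
        beta[s] = math.log(total_births / total_days)
    return beta

def fitted_mean(beta, month, sex):
    """Fitted expectation exp(OFFS + beta) for each unit."""
    return [DAYS[m - 1] * math.exp(beta[g]) for m, g in zip(month, sex)]
```

With the offset, a fitted monthly expectation is proportional to the
number of days in the month, so February gets 28/31 of the January
value for the same sex.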
The output from FITLOGLINEAR includes the likelihood ratio test and
Pearson's approximation to it for the model against the full model
(i.e. the model where each observation has its own free parameter).
Be careful with these tests, they are not reliable for small
(expected) counts.
--------------------------------------------------------------------------------
FITLOGITLINEAR
Logit linear models for binary or binomial data.
................................................................................
Syntax:
FITLOGITLINEAR variate[-offset]=modelformula[/total]
Fits a logit-linear or logistic regression model. "variate" is the
dependent variable. The command has two different forms, depending on
whether "total" is specified or not:
1. BINARY RESPONSE. Here, the response variate is assumed to be binary,
i.e. only the values 0 and 1 are allowed. The model states that these
0-1 variables are independent, with (let us call them Y(i))
P(Y(i)=1) = p(i) = exp(linear expression)/(1+exp(linear expression))
where the linear expression is determined by the model formula,
exactly as for FITLINEARNORMAL. "total" should not be specified.
2. BINOMIAL (FREQUENCY) RESPONSE. Here, the responses are assumed to be
relative frequencies (notice: FREQUENCIES, not COUNTS) of the form
Y(i)/M(i), where the Y(i) are independent, binomially distributed
with binomial totals or indices M(i) and probability parameters
p(i) parameterised as above. The variate "total" after the slash
must contain the binomial totals M(i).
Notice that model 1 is a special case of model 2, namely if "total" is a
variate filled with 1's. Conversely, model 2 can always be regarded as a
model derived from model 1 by sufficient reduction, namely summation of
binary responses over covariate classes (i.e. groups in which all
explanatory variables and factors are constant).
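The reduction from model 1 to model 2 is a plain summation over
covariate classes. The following Python sketch (not ISUW code; the
function name is invented for illustration) turns a binary response
and its explanatory vectors into counts Y, totals M and relative
frequencies Y/M, one row per covariate class.

```python
from collections import defaultdict

def aggregate_binary(y, *covariates):
    """Sum 0/1 responses over covariate classes. Returns rows of the
    form (covariate values, count Y, total M, frequency Y/M)."""
    count = defaultdict(int)
    total = defaultdict(int)
    for i, yi in enumerate(y):
        key = tuple(c[i] for c in covariates)
        count[key] += yi
        total[key] += 1
    return [(key, count[key], total[key], count[key] / total[key])
            for key in sorted(total)]

# Five binary responses, one factor SEX with levels 1 and 2
for row in aggregate_binary([1, 0, 1, 1, 0], [1, 1, 2, 2, 2]):
    print(row)
```

The frequency column corresponds to the response variate and the
total column to the variate after the slash in the binomial form of
the command.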
The rules for translation of the model formula to a linear expression,
the handling of overparameterisations etc. are exactly as for
FITLINEARNORMAL, see 6-8 screenfuls above.
EXAMPLE. See the "SIMPLE EXAMPLE" starting approximately 4 screenfuls
below top of this file.
The interpretation of the optional "offset variate" "offset" is exactly
as for FITLOGLINEAR. The offset is a vector of values that are added in
advance to the linear expression. You may think of it as a covariate in
the model with its coefficient "frozen" at the value 1.
In the binomial case, the output from FITLOGITLINEAR includes the
likelihood ratio test for the model against the full model, which
merely assumes that the responses are binomial frequencies with freely
varying probability parameters. Pearson's approximation is not given,
but it can be computed as the square sum of the normed residuals, and
it also comes out if you fit the corresponding overdispersion model by
FITNONLINEAR, see below. Be careful with these tests, they are not
reliable when the binomial totals are small or when relative
frequencies are close to 0 or 1.
--------------------------------------------------------------------------------
FITNONLINEAR
Nonlinear regression and generalised linear models with overdispersion.
................................................................................
Syntax: FITNONLINEAR modelspec m Dm v [init [worky [workw]]]
where "modelspec" has the form
variate[-offset]=modelformula[/weight]
The syntax for the model specification is exactly as for
FITLINEARNORMAL, except that an offset variate is allowed (cf.
FITLOGLINEAR above). However, for obvious reasons there must not be
any blanks in the model specification (use +, not blank, to separate
terms), nor in the following function specifications.
The parameters m, Dm and v are expressions for the three functions
that define the model (closely following the notation in Tjur (1998):
Nonlinear regression, quasi likelihood, and overdispersion in
generalized linear models, The American Statistician 52 pp. 222-227).
m is the mean (or inverse link) function, Dm is the first derivative
of m (required for the numerical procedure) and v is the variance (up
to a common scale factor and the weights) as a function of the mean.
In these expressions, the argument is written as a period '.'. Apart
from this, the syntax is exactly as for the right hand side of a
COMPUTE statement, with '.' entering as a (parallel) variate of the
relevant length.
EXAMPLES. To fit a log-linear model with variance proportional to the
mean (multiplicative Poisson type structure) write
FITNONLINEAR ... exp(.) exp(.) .
To fit a linear model with a variance proportional to the squared mean
(constant coefficient of variation) write
FITNONLINEAR ... . 1 sqr(.)
To fit a logit-linear model for binomial frequencies FREQ=COUNT/M
(including an overdispersion parameter) write
FITNONL FREQ=.../M exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)
To fit a probit-linear model instead, write
FITNONLINEAR FREQ=.../M phi(.) exp(-sqr(.)/2)/sqrt(2*3.14159) .*(1-.)
The nonlinear regression models that can be handled by FITNONLINEAR are
characterised as follows:
The observations are (in principle) independent, normally distributed.
The expectation for each observation is given as a known function (m) of
the "linear parameter" associated with each unit. These linear
parameters are, in turn, linear combinations of covariates specified by
a model formula, just like the mean in a linear model. The variance is
specified as a known function (v) of the mean, multiplied by an unknown
"overdispersion" or squared scale parameter, common to all observations,
optionally divided by known weights.
The procedure estimates a nonlinear regression model of this kind by the
method known as Iteratively Reweighted Least Squares (IRLS) or Quasi
Likelihood. In case of a constant variance function, this reduces to
ordinary (optionally weighted) least squares or maximum likelihood.
In full detail, the function parameters m, Dm and v and the model
formula determine the model as follows. The i'th observation Y(i) is
normally distributed with mean
E[Y(i)] = mu(i) = m( [ offset(i)+ ] eta(i) )
= m( [ offset(i)+ ] beta(1)*x(i,1) + ... + beta(p)*x(i,p) )
where x(i,j) are the elements of the model matrix determined by the
model formula, and variance
var[Y(i)] = lambda*v(mu(i))/w(i)
where w(i) is the i'th value of the weight variate (or 1, if a weight is
not specified) and lambda is the overdispersion (or squared scale)
parameter. The parameters to be estimated are beta(1), ... , beta(p) and
lambda.
The last three command parameters INIT, WORKY and WORKW are optional.
They can be omitted, or replaced with an asterisk to be skipped.
INIT, if specified, should be a variate of the same length as all other
vectors involved. The values of INIT are taken as the initial linear
parameters. This may speed things up, and in some cases (in particular
when the mean function has a singularity at zero, like m='1/.') such
initial values are necessary for the iterative method to get started.
Indeed, if INIT is not specified a variate of zeroes is used.
EXAMPLE. For a model with m(eta)=exp(eta) and variance proportional to
the mean you could do something like this (if not for any other reason,
then to save computing time):
LNY=LN(Y+0.5) {'+0.5' can be omitted if Y has no zeroes}
FITLINEARNORMAL LNY=1+SEX+AGE
SAVEFITTED ETA0
FITNONLINEAR Y=1+AGE+SEX exp(.) exp(.) . ETA0
SAVEFITTED ETA0
FITNONLINEAR Y=1+AGE exp(.) exp(.) . ETA0
etc.
In general, suitable initial values of the linear parameters can be
produced as the fitted values in a (possibly weighted) linear regression
where the response variate has values computed as the observations
transformed by the inverse mean function (also called the link
function). In later FITNONLINEAR commands with a slightly modified model
formula (same y-variate, same offset, same weights) the estimated linear
parameters from a fit can usually be taken as the initial values for the
next.
GENERALISED LINEAR MODELS WITH OVERDISPERSION.
It is a property of the IRLS method that if Y has nonnegative integer
values, the estimates in the above example will actually coincide with
the maximum likelihood estimates in a multiplicative Poisson model given
by the same model formula. Similarly, with observations FREQ=Y/M defined
as relative frequencies, after a command of the form
FITNONL FREQ=.../M exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)
the IRLS estimates will coincide with the maximum likelihood estimates
in the corresponding logistic regression model. Quite generally, this
equivalence holds for any generalised linear model in the sense of
Nelder and Wedderburn (1972, JRSS A p. 370-84). The results produced by
FITNONLINEAR are thus, to a large extent, valid for generalised linear
models with overdispersion as described e.g. in the book on Generalised
Linear Models by McCullagh and Nelder (Chapman and Hall 1989). The
standard deviations produced by LISTPARAMETERS are corrected for
overdispersion, and the tests for beta(j)=0 are based on the relevant
T-distribution. The F-tests in the analysis of variance table can be
regarded as second order approximations to the usual likelihood ratio
tests based on the Chi-square approximation, with correction for
overdispersion and (using F rather than Chi square) for random variation
of the estimate of the overdispersion parameter.
According to the quasi likelihood interpretation (Wedderburn, Biometrika
1974, p. 439-447), the IRLS method is in fact valid quite generally for
"distribution free" models specified by their first and second moments
only. However, standard central-limit-theorem-type assumptions are, of
course, required for F-tests, Chi-square tests etc. to be asymptotically
valid.
The computations are performed iteratively. Each iteration calls (with
output suppressed) the StatUnit procedure FitLinearNormal with a
dependent variable WORKY and a weight variate WORKW computed from the
results of the previous iteration as
WORKY = (Y-m(FITTED))/Dm(FITTED) + FITTED [-OFFSET]
WORKW = W*sqr(Dm(FITTED))/v(m(FITTED))
(Y is the original dependent variate, W the original weight vector,
FITTED the variate of fitted values from the previous fit, including the
offset, if specified).
The iterations are stopped when the weighted square sum of changes,
defined as the sum of the quantities
W*sqr(m(NEWFITTED)-m(LASTFITTED))/v(m(LASTFITTED))
is less than (1.0E-12)*ModelVariance*ResDF
where ModelVariance is the present estimate of the overdispersion
parameter, and ResDF its degrees of freedom.
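The iteration scheme can be sketched in a few lines of Python. This is
not the ISUW implementation; it is a minimal illustration restricted
to a model formula of the form 1+X (constant term and one covariate),
without offset and with all weights W equal to 1. The function names
are invented, and ModelVariance is here taken to be the residual
variance of the weighted linear fit, which is an assumption about what
FitLinearNormal reports.

```python
import math

def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    a = (sy - b * sx) / sw
    return a, b

def irls(x, y, m, dm, v, init=None, max_iter=100):
    """IRLS for E[Y] = m(a + b*x), var[Y] = lambda * v(mean)."""
    n, res_df = len(y), len(y) - 2
    fitted = list(init) if init is not None else [0.0] * n
    for _ in range(max_iter):
        # Working response and working weights (no offset, W = 1)
        worky = [(yi - m(f)) / dm(f) + f for yi, f in zip(y, fitted)]
        workw = [dm(f) ** 2 / v(m(f)) for f in fitted]
        a, b = wls_line(x, worky, workw)
        new = [a + b * xi for xi in x]
        # Residual variance of the weighted linear fit
        model_variance = sum(w * (z - f) ** 2
                             for w, z, f in zip(workw, worky, new)) / res_df
        # Weighted square sum of changes in the fitted means
        change = sum((m(fn) - m(fo)) ** 2 / v(m(fo))
                     for fn, fo in zip(new, fitted))
        fitted = new
        if change <= 1.0e-12 * model_variance * res_df:
            break
    return a, b, model_variance
```

A call like irls(x, y, math.exp, math.exp, lambda mu: mu, init)
corresponds to the function specification "exp(.) exp(.) ." with INIT
given as a list of initial linear parameters, one per unit.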
A final iteration is performed resulting in an analysis of variance
table with approximate F-tests for removal of terms (beginning with
the last, as usual). Notice that a new fit after removal of
insignificant terms from the bottom of the table will result in square
sums for the remaining terms that are slightly changed, as opposed to
what happens in the case of a linear model with constant variance
function.
After the ANOVA table FITLINEARNORMAL prints the estimate of the
overdispersion parameter and its square root, and finally FITNONLINEAR
adds the Chi-square test for "no overdispersion" (i.e. overdispersion
parameter = 1), which coincides with Pearson's goodness-of-fit test in
the multiplicative Poisson and logistic regression type situations
mentioned above. Notice that the test for "no overdispersion" is
irrelevant in proper nonlinear regression situations, where the value
1 of the scale parameter plays no particular role. It is also
irrelevant in the case of a logistic regression model for binary
responses.
If nothing else is specified, the variates referred to as WORKY and
WORKW above are saved under the names #NL_WY and #NL_WW. Existing
variates of these names will be deleted. These variates are saved
because they are used by SAVEFITTED. However, you can specify other
names for them as the last two parameters to FITNONLINEAR.
After a call to FITNONLINEAR, ISUW commands referring to the last
model fit refer to the call of FITLINEARNORMAL in the final iteration.
This implies that
LISTPARAMETERS
will return the IRLS estimates, with standard deviations and T-tests
based on the approximating linear model.
SAVEFITTED FIT0 RES0 NRES0
will result in the following:
FIT0 will contain the estimated linear parameters. To produce estimated
means, transform to FIT = m( [ OFFSET + ] FIT0 ).
RES0 will contain the quantities (Y-FIT)/Dm( [ OFFSET + ] FIT0 ). To
obtain residuals in the usual sense (observations minus estimated means)
multiply by Dm( [ OFFSET + ] FIT0 ), or compute more directly (with FIT
computed as above) RES=Y-FIT.
NRES0 will actually contain the correct normed residuals (residuals
divided by their estimated standard deviations).
Notice that you have also, at your disposal after a model fit, the
variate of "linearised observations" (default name #NL_WY) and the
workweights for the last fit (default name #NL_WW).
The class of models that can be handled by FITNONLINEAR is actually
broader than indicated above, because variates may occur in the
formulas for m, Dm and v. This means that some kinds of unit-specific
mean and variance functions are allowed. As a trivial example (which
does obviously not extend the class), notice that the command
FITNONLINEAR Y=1+AGE+SEX OFFS+. 1 1/W
will perform exactly the same as
FITNONLINEAR Y-OFFS=1+AGE+SEX/W . 1 1
which, in turn, could be obtained simply by
Y0=Y-OFFS
FITLINEARNORMAL Y0=1+AGE+SEX/W
In this sense, FITNONLINEAR's conventions for offsets and weights are
unnecessary, because they can be built into the functions.
Restrictions are obeyed in the obvious way.
--------------------------------------------------------------------------------
FITCOXMODEL
Proportional hazards models for survival data by Cox's partial
likelihood.
................................................................................
Syntax:
FITCOXMODEL exittime[*deathind][-entrytime][/stratum]=modelformula
exittime must be a variate holding the times of death/censoring, and
this must come first on the left hand side. The order of the three
optional specifications deathind (indicator of the event death, as
opposed to censoring), entrytime (times of left truncation) and
stratum (a factor dividing individuals into groups with common
underlying intensity) is irrelevant, they are identified by the
preceding characters ( * , - or / ).
"deathind" must be specified in case of right censoring. It must be a
factor on a single level and of length equal to the length of exittime.
It is interpreted as an indicator of the event death. Thus, censored
individuals should have the level of this factor equal to zero.
entrytime should hold the times of entrance when survival times are left
truncated. For each individual, the time under observation must be
positive, i.e. entrytime < exittime (strict inequality).
"stratum", if specified, must be a factor of the same length as exittime.
This specification means that each stratum has its own underlying
(unknown) intensity. Only the parameters of interest (given by the right
hand side of the model specification) are common to the strata. A
typical example is stratification by SEX, which is often required. If
the same factor occurs also on the right hand side in interaction with
everything else, the model is equivalent to a separate Cox model for
each stratum.
"modelformula" must be a model formula, involving variates and factors of
the same length as exittime. Notice that a constant term 1 should not be
specified, because a constant factor on the intensity is absorbed by the
unknown underlying intensity. In the expression for Cox's likelihood, a
constant factor simply cancels out.
The model considered is given by the following expression for the death
intensity of an individual:
DeathIntensity(time)=lambda0(time)*exp(linear expression in covariates)
Here, lambda0 is the "underlying intensity", or the intensity for an
individual with all covariates = 0. The linear expression involves
individual specific information like (in a medical context) sex, age,
weight, smoking habits, treatment or whatever, with unknown parameters
as coefficients, just as in a multiple regression model or any
other generalised linear model.
Cox's partial likelihood for the parameters of interest (based on the
order in which events took place, disregarding the actual times where
they took place) can be written as the product over all dead individuals
of the fractions
exp(linear expression for dead individual)
---------------------------------------------------------
Sum over all individuals at risk at that time of exp(...)
where ... stands for the linear expression for the single individual in
the set of individuals under risk at the time of the death. "Under risk"
means present at that time (entrytime < t <= exittime) and in the same
stratum as the one who died.
Ties (coincident times of death/censoring) are handled according to
Breslow's method, which is to include all individuals under risk just
before a death in the risk set for that event. For coincident deaths
this means that the individuals occur in each other's risk sets. This
may seem artificial, but at least it gives a non-arbitrary correction
for ties, which is acceptable in case of only few ties.
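The partial likelihood with Breslow's correction for ties can be
written down directly from the description above. The following Python
sketch (not the ISUW code; the function name and argument names are
invented for illustration) evaluates its logarithm for given values of
the linear expressions, with optional left truncation and strata.

```python
import math

def cox_log_partial_likelihood(exittime, dead, linexpr,
                               entrytime=None, stratum=None):
    """Log of Cox's partial likelihood, ties handled by Breslow's
    method. linexpr[i] is the linear expression for individual i."""
    n = len(exittime)
    entrytime = entrytime or [-math.inf] * n
    stratum = stratum or [1] * n
    loglik = 0.0
    for i in range(n):
        if not dead[i]:
            continue                    # censored: no factor of its own
        t = exittime[i]
        # Risk set: same stratum and entrytime < t <= exittime.
        # Individuals dying at the same time are in each other's sets.
        risk = sum(math.exp(linexpr[j]) for j in range(n)
                   if stratum[j] == stratum[i]
                   and entrytime[j] < t <= exittime[j])
        loglik += linexpr[i] - math.log(risk)
    return loglik
```

With all linear expressions equal to zero and three deaths at distinct
times, the risk sets have sizes 3, 2 and 1, so the value is -ln(6). If
the first two deaths coincide, both get a risk set of size 3 and the
value is -2*ln(3).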
Time dependent covariates can not be handled, with the following
exception: If you have one or a few time dependent but PIECEWISE
CONSTANT covariates, these can be handled as follows. Whenever a
covariate shifts its value, let the corresponding individual "change its
identity", i.e. remove it by right censoring and introduce a new one (with
the new values of the covariate) by entrance (left truncation) at the
same time point.
This may seem restrictive, but in principle any time dependent covariate
can be handled in this way, because what matters is only the values of
covariates at the finitely many timepoints where a death takes place.
The problem is the construction of the data set, and the fact that this
data set may become very long.
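The construction of such a data set can be sketched as follows. This
Python function is not part of ISUW; its name and argument layout are
invented for illustration. It splits one individual with a piecewise
constant covariate into several records, each with its own entry time,
exit time, death indicator and covariate value.

```python
def split_record(entry, exit_, dead, initial, changes):
    """Split one individual at the times where the covariate changes.
    initial: covariate value at entry.
    changes: list of (time, new value), sorted by time, with
             entry < time < exit_.
    Returns rows (entrytime, exittime, deathind, covariate value)."""
    rows = []
    start, value = entry, initial
    for time, new_value in changes:
        rows.append((start, time, 0, value))  # removed by right censoring
        start, value = time, new_value        # re-enters by left truncation
    rows.append((start, exit_, dead, value))  # last piece keeps the outcome
    return rows

# An individual observed from time 0 to its death at time 10, whose
# covariate changes from 0 to 1 at time 4:
print(split_record(0, 10, 1, 0, [(4, 1)]))
# [(0, 4, 0, 0), (4, 10, 1, 1)]
```

Since "under risk" means entrytime < t <= exittime, the two records
never belong to the same risk set.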
Restrictions are taken into account in the obvious way. Missing values
of variates for units present must not occur. Levels 0 for factors are
treated as usual (no dummy generated, take care that main effects are
present when interactions are specified etc.). The level 0 of the factor
stratum is treated as any other level, if it occurs.
In rarely occurring situations, involving few individuals and/or many
covariates, the output will contain the following warning:
Iterations stopped, fitted values are out
of range - probably because deterministic
model fits.
This happens when a covariate or a linear combination of such is a
monotone function of the time of death (or in situations where this is
close to the truth). In this case, the maximum-likelihood estimate of
the parameters does not exist, unless the values plus/minus infinity are
allowed. Iterations are stopped due to numerical problems. The results
are not reliable (and hardly of any interest either) in this case.
After FITCOXMODEL, the command SAVEFITTED can be used, but only for
storage of fitted values, i.e. the quantities referred to as "linear
expressions" above. Residuals and normed residuals are not defined. The
commands LISTPARAMETERS, ESTIMATE, SAVEPARAMETERS etc. will work in the
obvious way. Similarly, TESTMODELCHANGE can be used after fit of a model
and a reduced model. Notice that a model reduction here means removal of
one or more terms from the model formula. Removal of a stratifying
factor, for example, can not be tested in this way.
ESTIMATION OF THE INTEGRATED UNDERLYING INTENSITY.
This can be done simultaneously with the fit. The syntax for this is to
add a final variate name to the model formula, preceded by a slash / ,
like
FITCOXMODEL EXITTIME*DEAD-ENTRYTIM/SEX=AGE+TREATMT/INTINT
which will imply that the usual estimate of the integrated underlying
intensity is saved in a variate named INTINT. The name specified after
the slash must be a valid variate name, and if this variate is already
declared it must be of the same length as all other vectors in play.
The resulting variate will have missing values in all entries, except
those corresponding to individuals that are "present" and dead. In a plot
of INTINT against EXITTIME, these are the upper left breakpoints of the
broken line, when the function is drawn as a step function in the usual
way. To produce the usual plot, connect consecutive points by a
horizontal line (from left to right) and a vertical line (upwards).
In case of a stratified model, the integrated intensities are estimated
correctly, but notice that a point plot of this variate against exittime
will be rather confusing, unless different colors or symbols are used
for the strata, or all but one stratum are excluded.
Notice that a fit of a Cox model without covariates, like
FITCOXMODEL EXITTIME*DEAD-ENTRYTIM=/INTINT
(empty model formula) will result in the usual nonparametric estimate of
the integrated intensity for a homogeneous sample of individuals. A good
approximation to the non-parametric Kaplan-Meier estimate of the
survival function (i.e. one minus the c.d.f. of the survival time
distribution) can then be computed as
KM=exp(-INTINT)
From a plot of KM against EXITTIME, the usual plot of the Kaplan-Meier
estimate is obtained when consecutive points (beginning with (0,1)) are
connected by a horizontal line (from left to right) and a vertical line
(downwards).
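For a homogeneous sample the computation is simple enough to be
sketched in Python. The sketch below is not the ISUW code; it assumes
that "the usual nonparametric estimate" is the sum over death times of
(number of deaths)/(number at risk), and the function names are
invented for illustration. Missing values are represented by None.

```python
import math

def integrated_intensity(exittime, dead, entrytime=None):
    """Estimate of the integrated intensity at each individual's exit
    time; None for individuals that are censored."""
    n = len(exittime)
    entrytime = entrytime or [-math.inf] * n
    death_times = sorted({t for t, d in zip(exittime, dead) if d})
    cumulative, value_at = 0.0, {}
    for t in death_times:
        deaths = sum(1 for u, d in zip(exittime, dead) if d and u == t)
        at_risk = sum(1 for e, u in zip(entrytime, exittime) if e < t <= u)
        cumulative += deaths / at_risk
        value_at[t] = cumulative
    return [value_at[t] if d else None for t, d in zip(exittime, dead)]

def survival_from_intensity(intint):
    """KM=exp(-INTINT), keeping missing values missing."""
    return [None if h is None else math.exp(-h) for h in intint]
```

For three individuals with exit times 1, 2, 3, of which the second is
censored, the integrated intensity is 1/3 at time 1 and 1/3 + 1 = 4/3
at time 3.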
The estimated standard deviation of the estimated integrated underlying
intensity can also be computed. The syntax for this is to add an extra
variate name after a plus sign, like
FITCOXMODEL EXITTIME*DEAD-ENTRYTIM/SEX=AGE+TREATMT/INTINT+IISD
After this, pointwise confidence limits for the integrated underlying
intensity can be computed by
UPPER=INTINT+1.96*IISD
LOWER=INTINT-1.96*IISD
or better (by log-transformation)
UPPER=EXP( LN(INTINT)+1.96*IISD/INTINT )
LOWER=EXP( LN(INTINT)-1.96*IISD/INTINT )
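The log-transformed limits rest on the approximation that the standard
deviation of LN(INTINT) is IISD/INTINT. A small Python sketch (not
ISUW code; the function name and the numbers are invented) shows why
they are preferable: the lower limit can never become negative.

```python
import math

def log_scale_limits(intint, iisd, z=1.96):
    """Confidence limits computed on the log scale and transformed
    back; returns (lower, upper)."""
    half_width = z * iisd / intint   # approximate s.d. of LN(INTINT), times z
    return intint * math.exp(-half_width), intint * math.exp(half_width)

# With INTINT=0.5 and IISD=0.4 the untransformed lower limit
# 0.5-1.96*0.4 is negative, whereas the log-scale limit is positive.
lower, upper = log_scale_limits(0.5, 0.4)
print(lower > 0, upper)
```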
CONSTRAINT. Since the Pascal code for this command is taken over
directly from the DOS version ISU, the length of the vectors entering in
the model specification must not exceed 16379. This will hopefully be
changed in the future, but it is not a matter of highest priority.
--------------------------------------------------------------------------------
FITMCLOGIT
FITMCPROBIT
FITMCCLOGLOG
P. McCullagh's model for ordered qualitative responses.
................................................................................
Syntax: FITMC... response=modelformula
where the left hand side "response" is either
(1) A factor on k levels, holding the responses,
or
(2) A list C1 C2 ... Ck of k variates, holding the
multinomial counts of individuals in the k response groups.
In the first case, the "unit by unit" (no aggregation) case, "response"
must be the name of a factor on k levels, where k is the number of
possible (ordered) responses. The levels of this factor must be in the
range 1..k (no zeroes). The length n of this factor (the number of
units) must coincide with the lengths of all vectors occurring in the
model formula.
In the second case, C1, C2, ... Ck must be names of variates, containing
the multinomial counts of the k responses in n "covariate groups". Only
in this case will the command compute goodness of fit statistics and
fitted values. This representation is only relevant when the model
considered is specified by factors or covariates with several units for
each combination of factor levels / variate values. The common length of
C1..Ck must coincide with the lengths of all vectors occurring in the
model formula.
The model formula after the equality sign MUST contain a constant term 1
as its first term, since only k-2 cutpoint parameters are used. Thus,
for k=2, no cutpoint parameters are defined, and the model is equivalent
to a logit-linear, probit-linear or cloglog-linear model for binary
data. For k>2, the first parameter CONSTANT represents the cutpoint
between level k-1 and k, and the cutpoints THR[1]..THR[k-2] are
negative, representing the differences between cutpoints 1, 2 ... k-2
and cutpoint k-1.
Mathematically, the model can be stated as follows. The probability that
an individual responds r (=1..k) is
F(CUTPT[r] + linear expression ) - F(CUTPT[r-1] + linear expression )
(taking CUTPT[0]=minus infinity, CUTPT[k]=plus infinity). Hence, the
response can be regarded as a discretised version of a non-observable
continuous variable X with c.d.f. F(linear expression + x). In other
words, the model is really a linear position parameter model with error
distribution F, but the observations are only available in "rounded"
form, where the "rounding" is a grouping in k intervals with unknown
cutpoints. In the parameterisation chosen here, these cutpoints are
CONSTANT+THR[1] < CONSTANT+THR[2] < ... < CONSTANT+THR[k-2] < CONSTANT
The "linear expression" is determined by the model formula as usual.
The three models available in ISUW are defined by their c.d.f.'s as
follows:
COMMAND       F(x)                      distribution  inverse F
FITMCLOGIT    F(x) = exp(x)/(1+exp(x))  Logistic      logit
FITMCPROBIT   F(x) = PHI(x)             Normal        probit
FITMCCLOGLOG  F(x) = 1-exp(-exp(x))     Gompertz      cloglog
(cloglog is short for "complementary log log").
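The response probabilities implied by the parameterisation can be
computed as in the following Python sketch. It is not ISUW code; the
function names are invented, and it assumes that the argument linexpr
is the linear expression without the constant term, which enters
through the cutpoints instead.

```python
import math

def logistic(x):
    return math.exp(x) / (1 + math.exp(x))

def response_probabilities(constant, thr, linexpr, F=logistic):
    """Probabilities of the responses 1..k, where k = len(thr) + 2.
    The cutpoints are CONSTANT+THR[1] < ... < CONSTANT+THR[k-2] <
    CONSTANT, so the values in thr must be negative and increasing."""
    cutpoints = [constant + t for t in thr] + [constant]
    # F at CUTPT[0] = minus infinity is 0, at CUTPT[k] = plus infinity 1
    cdf = [0.0] + [F(c + linexpr) for c in cutpoints] + [1.0]
    return [cdf[r] - cdf[r - 1] for r in range(1, len(cdf))]

# k=3 responses, CONSTANT=0, THR[1]=-1, linear expression 0
print(response_probabilities(0.0, [-1.0], 0.0))
```

For k=2 the list thr is empty and the function returns F(CONSTANT +
linear expression) and its complement, i.e. the binary model. Another
c.d.f. from the table can be passed as the argument F.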
For F(x) = exp(x)/(1+exp(x)) (the logistic c.d.f.) and k=2, the usual
logit-linear model for binary or binomial data comes out of it. For
k>2, this model becomes a "union" of several such logistic models, in
the sense that the marginal models obtained by dichotomisations of the
ordered scale are ordinary logistic regression models for binary data.
But the point is that the interesting parameters (the coefficients to
covariates in the linear expression) are common to these marginal
models; only the constant term (the cutpoint) does, of course, depend
on the choice of dichotomisation.
EXAMPLE. Consider a classical dose-response situation, where doses of
some drug are given to animals. A standard model for analysis of this
situation states that the probability of death for an animal depends
logit-linearly on log(dose). A similar model could be used to describe
dose-dependence of an event like "death or serious damage". This
corresponds to the two possible dichotomisations of the ordered
three-point scale
1: No effect
2: Seriously damaged, but not dead
3: Dead
The Logistic McCullagh model for the full three-level response, which
can be fitted by a command of the form
FITMCLOGIT RESP=1+LOGDOSE
(RESP a factor on three levels) can be regarded as a way of
incorporating both binary models in one, assuming that the
interesting parameter (here the slope, i.e. the coefficient to
log(dose)) is the same in the two models.
Notice that FITMCPROBIT can be used to fit classical probit linear
models for binary data (k=2). Similarly FITMCLOGIT can be used to fit
logistic regression models for binary data - but FITLOGITLINEAR is
much faster. FITMCCLOGLOG is useful in survival analysis, where this
model comes out when survival times in a proportional hazards model
are grouped.
After any of the FITMC... commands, the command SAVEFITTED can be
used, but only for storage of fitted values, i.e. the quantities
referred to as "linear expressions" above. Residuals and normed
residuals are not defined (or, rather, if they were they should be
variates of length n*k, not n). The commands LISTPARAMETERS, ESTIMATE,
SAVEPARAMETERS etc. will work in the obvious way. Similarly,
TESTMODELCHANGE can be used after fit of a model and a reduced model.
CONSTRAINT. Each response must occur at least once. If this is not the
case, the response scale must be reduced by collapsing neighbouring
levels.
--------------------------------------------------------------------------------
FITCLOGIT
Conditional logistic regression.
................................................................................
Syntax:
For binary data:
FITCLOGIT response/groups=modelformula
For binomial data:
FITCLOGIT response/groups=modelformula/totals
Consider a logit linear model of the following form. A binary response y
(values 0 and 1) is assumed to have independent elements with
logit( P(y=1) ) = a(g) + general linear expression.
In principle, this is an ordinary logistic regression model, but the
factor level g (for group) has a particular role in the following. It is
assumed to represent a classification of units in groups, for which the
parameters a(g) are regarded as nuisance parameters. In analogy with
well-known variance analysis concepts, we may think of the groups as
blocks, and the conditional analysis performed is similar to the
intra-block analysis.
Let S(g) denote the sum of the responses y(i) over all units in group g.
It is easy to prove that the model obtained by conditioning on these
group sums has a likelihood which does not depend on the nuisance
parameters a(g). This is an exclusive property of the logit-linear
model, which is not shared e.g. by probit-linear models or other
generalised linear models for binary data.
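The cancellation of a(g) can be checked numerically. The following
Python sketch (an illustration, not ISUW code; the function name and
the example values are assumptions made here) computes the conditional
probability of an observed 0/1 pattern in one group, given the group
sum, by brute-force enumeration of all patterns with the same sum.

```python
import math
from itertools import combinations

def conditional_prob(y, eta, a):
    """P(observed y | sum of y) in one group, where unit i has
    logit P(y[i]=1) = a + eta[i].  Illustration only, not ISUW code."""
    s = sum(y)

    def weight(ones):
        # Probability of the pattern with 1's at positions `ones`,
        # up to the factor prod(1-p[i]) common to all patterns.
        return math.exp(sum(a + eta[i] for i in ones))

    observed = tuple(i for i, v in enumerate(y) if v == 1)
    total = sum(weight(c) for c in combinations(range(len(y)), s))
    return weight(observed) / total
```

Every pattern with the same sum s carries the same factor exp(a*s),
which cancels in the ratio. Calling the function with different values
of a (and the same y and eta) should therefore give the same result.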
An important special case occurs in connection with case-control
studies, where each case (response y=1) is matched with a given number
of controls (y=0). The controls are selected at random from a large
population in such a way that they match the case with respect to
characteristics that are not to be analyzed in the context (age, sex,
...). Considering the grouping into case-control groups, and
conditioning on the corresponding sums (which are all 1), we obtain a
model where the group parameters (and, in turn, all parameters
representing effects of matched factors) disappear.
"response" must be the name of a variate, containing the binary
responses (0/1). In case of binomial data (i.e. when the binary
responses are aggregated in covariate groups within the "conditioning
groups"), response must contain the relative frequencies, and the name
of the variate holding the binomial totals must be given after a slash
after the model formula.
"groups" must be the name of a factor or variate of the same length as
response. The levels/values of this must be increasing, and they
determine the groups. Typically, this vector could hold the group
numbers. In a case-control data set, where the case occurs first in
each group, these values can be obtained by cumulation of the
responses. But the values are actually irrelevant, only their order
matters. A positive increase means that a new group begins, a proper
decrease must never occur.
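The cumulation trick and the ordering rule can be sketched in Python as
follows (an illustration only, not ISUW code; the function names are
made up here). It assumes, as stated above, that the case comes first
in each case-control group.

```python
from itertools import accumulate

def groups_from_cases(case_ind):
    # Cumulative sum of the 0/1 case indicator. This gives group numbers
    # only if the case is the first unit of each group.
    return list(accumulate(case_ind))

def is_valid_grouping(groups):
    # A new group starts at each increase; a decrease is never allowed.
    return all(b >= a for a, b in zip(groups, groups[1:]))
```

For the indicator 1 0 0 1 0 1 0 0 the cumulated values are
1 1 1 2 2 3 3 3, i.e. three groups of sizes 3, 2 and 3.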
Variates and factors in the model formula must be of the same length
as response and groups. The string "modelformula" has exactly the same
form and interpretation as in a call to FITLOGITLINEAR, except that
the conditioning factor must be given after a slash before the
equality sign, and an offset is not allowed. A typical call in
connection with case-control studies might look like
FITCLOGIT CASE_IND/CCGROUP=SOCGR+EXPOSURE
Here, CASE_IND is a variate with the value 1 for cases, 0 for controls.
CCGROUP is a variate or factor with increasing values/levels, holding
the number of the case control group. SOCGR could be a factor,
containing some information about social covariates, and EXPOSURE could
be a covariate holding the exposures for some suspected toxic matter or
whatever. Usually relevant effects like AGE, SEX, COHORT etc. can be
left out, if taken into account by the matching. If they are included in
the model formula, the corresponding parameters will be set to zero, and
so will the intercept parameter corresponding to a constant term 1, if
included.
Groups, in which all units respond 1, or all units respond 0, will
obviously not contribute to the likelihood.
Restrictions are taken into account in the obvious way. Units excluded
are simply excluded from their group, and if this results in a "trivial
group" in the above sense, the entire group is ignored. However, the
grouping factor (or variate) GROUP must still be sorted, also as regards
the missing values.
After a call to FITCLOGIT, SAVEFITTED can be used for storage of fitted
values (but not residuals and normed residuals). However, these fitted
values are not very useful in themselves, because they do not contain
the contributions of the conditioning factor or effects confounded with
it. Exact fitted values cannot easily be obtained (cf. the problems
with computation of a mean in a non-central hypergeometric
distribution), but a good approximation can be obtained as follows. Use
SAVEFITTED FIT
and after this, fit a logit linear model (unconditional) with the
conditioning group factor as the only explanatory variable and with the
variate FIT as offset, like
FITLOGIT Y-FIT=GROUP/M
SAVEFITTED NEWFIT
NEWFIT=M*NEWFIT
After this, NEWFIT will contain a good approximation to the expectations
of the binomial observations under the estimated model.
CONSTRAINT. The number of positive responses in a group must not exceed
255.
--------------------------------------------------------------------------------
FITCRASCH
Conditional estimation in the Rasch model.
................................................................................
Syntax: FITCRASCH response/sgsizes=modelformula
In its simplest form, the model considered here is the logit additive
model for a two-way table of binary responses (referred to as the full
Rasch model below), stating that the probability of a positive response
is additive on the logit scale. That is, if y(row,col) denotes the
binary response in the (row,col)'th cell,
logit( P( y(row,col) = 1 ) ) = alpha(row) + beta(col).
The usual maximum likelihood estimates for column parameters are known
to have bad asymptotic properties when the number of rows (and thereby
the total number of parameters) tends to infinity. This problem
disappears when the conditional estimates, given the row sums, are used
instead.
More generally, we are considering a special case of the conditional
logistic regression model (see above) defined by the following two
properties concerning the groups defined by the conditioning factor
(here the rows of our table):
(1) The groups are equally sized
(2) All covariates are functions of the internal unit number in
the group.
Or, in other words: Our vector of binary responses can be set up in a
two-way table such that the sums conditioned on are the row sums, and
such that the covariates in the logistic model (disregarding those that
are "conditioned away") are columnwise constant.
In particular, the largest model satisfying this is the "full Rasch
model", the model with a free parameter for each column.
In this context, rows are often regarded as subjects and columns as
items. The idea is that each subject responds 0 or 1 to each item, and
the conditioning taking place here is on subject sums or subject
"scores".
The command FITCLOGIT is unnecessarily slow for estimation of these
models, because many quantities that need only be computed once in each
iteration will be computed once for each subject in each iteration.
Under the above assumptions, the conditional likelihood turns out to
depend only on the item totals (i.e. sums of responses for each item)
and the sizes of the score groups, i.e. the number of subjects with
1,2,...,k "correct" answers. In the command (see syntax description
above)
"response" is a variate of length k, holding the item totals.
"sgsizes" is a variate of the same length k, holding the sizes of
score groups. Its first value must be the number of subjects with
score 1, its second value the number of subjects with score 2, etc.
The number of individuals with score 0, i.e. no correct answers, is
left out because it is irrelevant. In fact, the last value (the number
of subjects with all answers correct) is also irrelevant, but it is
included for cosmetic reasons (to keep all variates of the same
length) and to allow for the check of consistency mentioned below.
The model formula is, in the simplest case (the full Rasch model) a
factor of the same length k with distinct levels 1,...,k. But it may
also take the more general form of a model formula in variates/factors
of length k.
The command takes restrictions into account in the following sense: If a
unit is missing, it means that the corresponding item is left out. Think
of a table where a column is deleted. However, this will change the row
sums, and accordingly one will have to change the score group sizes.
Entries in the vector of score group sizes are never, in any sense,
regarded as missing. This means that you can not remove an item from the
analysis merely by excluding the unit number, you must modify the
variate of score group sizes accordingly.
In practice, a more relevant kind of restrictions has to do with fit
of the Rasch model to a subset of the set of subjects. Notice that
this can be done here, but it has nothing to do with restrictions on
the formal set of units, which is the set of items. The values of
"response" and "sgsizes" are sums over the set of subjects, and any
change of the set of subjects can be performed by accordingly changing
the values of these variates.
EXAMPLE. Consider the table
              item    1   2   3    sum
   subject     1      1   1   0     2
               2      0   1   0     1
               3      0   1   0     1
               4      1   1   0     2
               5      1   1   0     2
               6      0   0   1     1
               7      0   0   1     1
               8      1   1   1     3
               9      0   0   0     0
              10      1   1   0     2
       sum            5   7   3
Let ITEM be a factor of length 3 with three levels, SUM and SG_SIZE
variates of length 3 with values given by
   ITEM   SUM   SG_SIZE
     1     5       4
     2     7       4
     3     3       1
Then
FITCRASCH SUM / SG_SIZE = ITEM
will fit a full Rasch model.
There is an obvious check of consistency, which in the above example
takes the form
Total sum of responses = 5 + 7 + 3 = 1*0 + 4*1 + 4*2 + 1*3
This check is performed by FITCRASCH, and the command is interrupted
with an error message if the check fails.
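The two input variates and the consistency check can be reproduced from
the raw table. The Python sketch below is an illustration only, not
ISUW code; the names TABLE, rasch_summaries and is_consistent are made
up here.

```python
# The 10 x 3 table of binary responses from the example (subjects x items).
TABLE = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]

def rasch_summaries(table):
    k = len(table[0])
    # Item totals: column sums.
    item_totals = [sum(row[j] for row in table) for j in range(k)]
    # Score group sizes: number of subjects with score 1, 2, ..., k.
    scores = [sum(row) for row in table]
    sg_sizes = [scores.count(s) for s in range(1, k + 1)]
    return item_totals, sg_sizes

def is_consistent(item_totals, sg_sizes):
    # Total sum of responses counted by items and by score groups.
    by_scores = sum(s * n for s, n in enumerate(sg_sizes, start=1))
    return sum(item_totals) == by_scores
```

For the table above this gives the item totals 5 7 3 and the score
group sizes 4 4 1, and both sides of the check equal 15.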
CONSTRAINT. The number of items must not exceed 255.
--------------------------------------------------------------------------------
FITNEGBIN
Estimation in log linear models for negatively binomially
distributed counts.
................................................................................
Syntax: FITNEGBIN response=modelformula[/weight] [initalpha]
The negative binomial distribution is the distribution on {0,1,2,...}
with point probabilities
   P(Y=y) = (a+y-1 over y) * (1-p)^a * p^y      ( a > 0 ).
(with an obvious notation '( ... over ... )' for binomial coefficients).
For integer values of the parameter a (called ALPHA in procedure output)
this is the distribution of the waiting time to (or rather, the number
of non-successful outcomes before) the a'th success in a sequence of
independent identical binary experiments with probability 1-p of
"success". For arbitrary a>0, the negative binomial distribution can be
characterised as a mixture of Poisson distributions with respect to a
Gamma distribution in the following sense. If Y is Poisson distributed
with a random parameter lambda, which is drawn from a Gamma distribution
with form parameter a and scale parameter b, then the resulting
distribution of Y is a negative binomial with "a=a" (the form parameter
of the Gamma distribution takes the role of the a in the negative
binomial) and p=b/(1+b).
The last interpretation justifies the use of negative binomial models in
situations where the usual log linear Poisson models fail due to
overdispersion. The mean in the negative binomial distribution is
m=a*p/(1-p), the variance m*(1+m/a) > m.
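The formulas for the mean and the variance can be checked numerically,
also for non-integer a. The following Python sketch is an illustration
only (not ISUW code; the function names and the truncation point are
choices made here). The generalised binomial coefficient is computed
via the log-gamma function.

```python
import math

def negbin_pmf(y, a, p):
    # (a+y-1 over y) * (1-p)^a * p^y, with the binomial coefficient
    # written as Gamma(a+y) / (Gamma(a) * y!) so that any a > 0 works.
    log_coef = math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
    return math.exp(log_coef + a * math.log(1.0 - p) + y * math.log(p))

def negbin_moments(a, p, upper=2000):
    # Sum the point probabilities for y = 0, ..., upper-1. The truncation
    # is an assumption of this sketch; it is harmless when p is not
    # close to 1.
    probs = [negbin_pmf(y, a, p) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return total, mean, var
```

With a=2.5 and p=0.4, the returned mean and variance should agree with
m=a*p/(1-p) and m*(1+m/a) up to rounding error, and the total
probability should be 1.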
The simplest kind of models (without a "weight") that can be estimated
by the command FITNEGBIN can be characterised as follows. Each
(nonnegative integer) observation y[i] has a negative binomial
distribution. The parameter a is common to all observations, the
parameter p = p[i] depends logit-linearly on background
variates/factors, specified as usual by a model formula. Thus, the mean
of the i'th observation becomes
E(Y[i]) = a * exp( par1*X(i,1) + par2*X(i,2) + ... ) ,
which is of the usual log-linear form. Models of this kind are fitted by
commands like (in case of a model with a constant term and a single
covariate)
FITNEGBIN Y=1+X
"WEIGHTED" MODELS.
A useful generalisation is the following. Rather than assuming the
parameter a to be the same for all units, we assume that there are
parameters a[i] = a*n[i], proportional to a given variate n. This is
typically the case if each y[i] has come out by summation of i.i.d.
counts for n[i] individuals. If these counts (on a nonobservable
"micro-level") follow a model of the simpler kind described above, a
model with such proportional parameters a[i]=a*n[i] comes out of it. The
interpretation of the unknown parameter ALPHA (=a) is exactly as before
(but on the micro-level). Since the variate n here has a role very
similar to a weight variate in a generalised linear model, the syntax
for this has been chosen such that the "weight" variate should follow
the model formula, separated by a slash, like
FITNEGBIN Y=1+X/N
Notice, however, that the variate N here is not merely a weight occurring
directly in the summation of the log likelihood function. Moreover, this
kind of "aggregation" over "micro-units" is not quite as simple as the
corresponding concept in the log-linear Poisson models. In the Poisson
case, this is simply a sufficient reduction. For the present kind of
models, the aggregation is not a sufficient reduction, but the marginal
model for the aggregated data set is a similar model, due to the
convolution property of the negative binomial distribution.
Sometimes the iterative estimation procedure will be interrupted by an
error message stating that the information matrix is not positive
semi-definite. Typically, this happens if ALPHA has become too large.
The default action of FITNEGBIN is to take ALPHA=1 as the starting
value. If this problem appears, give an initial value of ALPHA as the
last parameter, like
FITNEGBIN Y=1+X/N 0.01
An initial value of ALPHA can also help if ALPHA becomes negative
during the iterative maximisation (which results in an error message).
If there is no overdispersion at all, convergence may fail. The result
should not be trusted in this case, use a Poisson model (FITLOGLINEAR)
instead. Or, if there is some overdispersion, use FITNONLINEAR (see
one screenful below).
Restrictions are taken into account in the obvious way. Missing values
of variates among the units present must not occur.
WARNINGS, LIMITATIONS.
Do not rely too much on the standard deviation estimated for the
parameter ALPHA (called a above). The log likelihood is not well
approximated by a quadratic function in this parameter. Moreover, the
test for ALPHA=0 is irrelevant. The log linear Poisson model corresponds
to the value plus infinity of ALPHA. To test for ALPHA = plus infinity
(no overdispersion), fit the corresponding log linear Poisson model
(with log of the "weights", if present, as an offset) and look at the
test against the full model. Do not use the negative binomial model when
overdispersion is not present - what happens is simply that the estimate
of ALPHA becomes very large, so that the model is almost a Poisson model.
The computation of the log likelihood and its derivatives involves a
summation from 1 to y for each unit. Thus, for large observations (e.g.
y > 1000) the procedure is slow. An alternative (if you insist on a
variance function of the same shape as for the negative binomial) is to
use
FITNONLINEAR modelformula exp(.) exp(.) .*(1+./a)
trying with different values of a until the residual plot is OK. You can
also handle the weighted case by FITNONLINEAR (use variance function
.*(1+./(a*n)), where n is the variate of "weights").
--------------------------------------------------------------------------------
LISTPARAMETERS
Lists parameter estimates and their estimated standard deviations.
................................................................................
No parameters.
After any model fit command except FITANOVA, this command lists the
parameter estimates, their estimated standard deviations and the T- or
approximate U-tests for hypotheses of the form "parameter=0".
There is one parameter for each column of the model matrix, but some
of these are usually set to zero due to overparameterisations (see the
description of FITLINEARNORMAL). For this reason, these tests, given
by the two last columns produced by LISTPARAMETERS, should be
interpreted with some care.
EXAMPLE. In a simple one-way situation, after
FITLIN Y=F
LISTP
the T-tests reported are for hypotheses of the form "mean in group f
equals zero", which is rarely relevant. After fit of the same model
(over-) parameterised by
FITLIN Y=1+F
LISTP
the T-tests reported correspond to pairwise comparisons of each
F-level with the last F-level (which is sometimes relevant) - except
for the first line labelled "CONSTANT", which returns the test for
"mean in last group equals zero" (!).
Notice that the last form "Y=1+F" is the relevant one, if the test for
"no effect of F" is to be derivable from the analysis of variance table
(for FITLINEARNORMAL and FITNONLINEAR only).
--------------------------------------------------------------------------------
TESTMODELCHANGE
Likelihood ratio test for reduction/extension given by the two last
models fitted.
................................................................................
Syntax: TESTMODELCHANGE [ test-statistic [p-value]]
For log-linear, logit-linear models and many other models, for which an
approximate chi-square test on the likelihood-ratio test statistic is
appropriate, TESTMODELCHANGE performs a simple subtraction of the
log-likelihoods from the two latest model fits and a similar computation
of the change in number of parameters, and computes the likelihood-ratio
statistic and the relevant tail probability in the approximating
Chi-square distribution.
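The computation described above can be sketched in Python as follows.
This is an illustration of the arithmetic only, not of how ISUW
implements it; the function names and argument order are assumptions
made here. The chi-square tail probability is computed from the
recurrence Q(a+1,z) = Q(a,z) + z^a*exp(-z)/Gamma(a+1) for the upper
incomplete gamma function, which is valid for integer degrees of
freedom.

```python
import math

def chi2_tail(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with
    integer df >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    while 2 * a < df:
        # Q(a+1, z) = Q(a, z) + z^a * exp(-z) / Gamma(a+1)
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lr_test(loglik_small, npar_small, loglik_large, npar_large):
    # Likelihood-ratio statistic, change in number of parameters, P-value.
    stat = 2.0 * (loglik_large - loglik_small)
    df = npar_large - npar_small
    return stat, df, chi2_tail(stat, df)
```

For example, log-likelihoods of -120.0 (3 parameters) and -116.5
(5 parameters) give the test statistic 7.0 on 2 degrees of freedom.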
For linear normal models fitted by FITLINEARNORMAL, the relevant F-test
is performed.
For nonlinear models fitted by FITNONLINEAR, the approximate F-test,
based on the weighted residual sums of squares from the last two
models, is performed. Notice that this is not always reliable, since
the weights may have changed if there is a non-constant variance
function involved. The tests given in the approximate ANOVA table are
probably more reliable.
If the first parameter "test-statistic" is specified it must be the
name of a vector of length 1 or a valid vector name which is not in
use. In the latter case it becomes a vector of length 1, and the value
of the (chi-square or F) test statistic is stored in it. Similarly, if
a second parameter is specified, the P-value (tail probability) of the
test is stored in this. To keep only the P-value but not the test
statistic, write the first parameter as an asterisk.
The command does not work after FITANOVA.
WARNING. It is a requirement - and mainly your own responsibility - that
the test makes sense. In particular, the two last models fitted must be
of the same type (log-linear, logistic or whatever), the number of
observations must be the same (no restrictions changed in between,
weights and offsets unchanged). For non-linear models, the three
functions specifying such a model must be the same for the last two
models, for Cox models the stratifying factor must not have changed,
etc. etc. In brief, one of the two last models fitted must be a submodel
of the other.
--------------------------------------------------------------------------------
SAVEFITTED
Computes fitted values, residuals and normed residuals after a model fit
command.
................................................................................
Syntax: SAVEFITTED fitted [residuals [normedresiduals]]
After FITLINEARNORMAL, FITLOGLINEAR, FITLOGITLINEAR and FITNEGBIN, this
command saves
- Fitted values ( = estimated means of observations)
- Residuals ( = differences between observations and fitted values)
- Normed residuals ( = residuals divided by estimated standard
deviations)
The three parameters must be unused vector names or names of variates of
length equal to the number of units in the last model fit directive. An
asterisk * or a pseudoblank | (or simply omission, for the parameters
coming last) means that the variate is not to be computed.
EXAMPLE.
SAVEF * RES
saves residuals in RES (and creates RES, if required), but fitted values
and normed residuals are not computed.
Normed residuals are computed without correction for the fact that the
corresponding observation contributes to the estimation of parameters.
Thus, normed residuals are typically less dispersed than i.i.d.
observations from a normalised normal. For normal linear models, this is
emphasised by the fact that the square sum of the normed residuals will
always equal the number of observations minus the number of parameters
estimated. However, this means that an extremely large normed residual
can be taken as a (conservative but) safe indication of an outlying
observation. For more exact outlier detection in the normal linear
case, use SAVENORMEDRESIDUALS.
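The statement about the square sum of the normed residuals is easy to
verify for a simple linear regression. The Python sketch below is an
illustration only (not ISUW code; the function names are made up here
and the model is restricted to one covariate plus a constant term).

```python
import math

def fit_line(x, y):
    # Least squares fit of y = intercept + slope*x.
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def normed_residuals(x, y):
    intercept, slope = fit_line(x, y)
    res = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Estimated variance: residual sum of squares divided by n - 2,
    # since two parameters are estimated.
    s2 = sum(r * r for r in res) / (len(x) - 2)
    return [r / math.sqrt(s2) for r in res]
```

With 6 observations and 2 parameters, the squares of the returned
values sum to 4 (up to rounding error), whatever the data are.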
Restrictions are not obeyed by SAVEFITTED, except in the obvious sense
that if the last model fit was made under restrictions, the parameter
estimates used by SAVEFITTED will be influenced by this. But the
fitted values (and also residuals and normed residuals) are computed
for non-present observations in exactly the same way as they are for
the present observations. This actually means that the fitted values
corresponding to observations that were not present when the model was
fitted can be regarded as predictions of these "new" observations.
This makes the command SAVEFITTED useful for many different purposes,
such as cross-validation, replacement of missing observations and
prediction in time series.
SAVEFITTED can also be used after FITCOXMODEL, FITMC... , FITCLOGIT and
FITCRASCH, but only for storage of fitted values (residuals and normed
residuals are not defined). Here, "fitted value" means "estimated linear
expression", not "estimated mean of observation". After FITNONLINEAR the
command will work, but the resulting variates are not fitted values and
residuals in the usual sense. See the description of FITNONLINEAR.
--------------------------------------------------------------------------------
SAVEPARAMETERS
Save estimated parameters from last model fit command.
................................................................................
Syntax: SAVEPARAMETERS estimates [estsd [variance] ]
After any model fit command but FITANOVA, this command saves the
estimated parameters in a variate. There is one parameter for each
column of the model matrix, as generated by the model formula. The
order and interpretation of the parameters (which can be displayed by a
LISTPARAMETERS command) follows from the rules described under
FITLINEARNORMAL. Notice that parameters which are set to zero due to
overparameterisation are also saved.
If the second command parameter is specified, a variate holding the
estimated standard deviations of estimates is also created.
The parameters must be valid names of non-existing variates, or names
of variates of the correct length (which is the number of columns of
the model matrix, or the number of lines in the table produced by
LISTPARAMETERS).
The last parameter "variance" makes sense only after FITLINEARNORMAL
and FITNONLINEAR. It must be the name of a variate of length 1, or a
valid vector name which is not in use. The effect of this is that the
estimate of the error variance (or the corresponding squared scale
parameter, in case of a weighted model or a nonlinear model with
variance function different from 1) is stored in this variate.
To skip a parameter, write it as an asterisk. For example, to save only
linear estimates and the estimated model variance, write e.g.
SAVEPAR pars * var
EXAMPLE. Suppose a log-linear model has been fitted by a command like
FITLOGLIN COUNT-LOGSIZE=1+TREAT+SEX+...
where TREAT is a factor on four levels. To compute and list standard
approximate 95% confidence limits for the relative multiplicative
effects of TREAT with level 4 as baseline, do something like this:
SAVEP PARS1 SD1
VAR PARS SD 4
TRANSFER PARS1 2 5 PARS 1 4
TRANSFER SD1 2 5 SD 1 4
DEL PARS1 SD1
ESTIMATE=EXP(PARS)
LOWER=EXP(PARS-1.96*SD)
UPPER=EXP(PARS+1.96*SD)
DEL PARS SD
LIST ESTIMATE LOWER UPPER
Notice: This relies on the convention that the last parameter of TREAT
is set to zero, because this is where the linear dependence of columns
in the model matrix is met for the first time. The last line of the
listing will have ESTIMATE=1 (=EXP(0)) and LOWER=UPPER=ESTIMATE (since
SD=0). To select level 1 as the baseline level, begin (before the model
fit command) by a "baselining" of that level, e.g. by
TREAT=TREAT-1
(setting the desired baseline level to 0, which implies that no "dummy"
column is generated for that level), or make a permutation of the levels
such that the desired baseline level becomes the last. See also the
command ESTIMATE below.
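The arithmetic behind the confidence limits in the example above is
sketched here in Python (an illustration only, not ISUW code; the
function name and the numbers in the usage note are invented).

```python
import math

def multiplicative_effects(pars, sds, z=1.96):
    # For each log-scale estimate b with standard deviation s, return
    # (estimate, lower, upper) on the multiplicative scale.
    return [
        (math.exp(b), math.exp(b - z * s), math.exp(b + z * s))
        for b, s in zip(pars, sds)
    ]
```

With the hypothetical estimates 0.5 -0.2 0.1 0.0 and standard
deviations 0.1 0.2 0.15 0.0, the last row (the baseline level, whose
parameter is set to zero) comes out as estimate = lower = upper = 1.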
--------------------------------------------------------------------------------
ESTIMATE
Outputs specified contrasts and their standard deviation. The command
can be used after any model fit command.
................................................................................
Syntax: ESTIMATE term1 [term2 [...]]
or ESTIMATE variate1 [variate2 [...]]
In the first case, the parameters must be terms of the model formula,
separated by blanks or plusses. For terms involving factors, all
possible differences between parameter estimates are listed with their
estimated standard deviations. For terms with variates only, the
estimate of the regression coefficient and its standard deviation is
given. If the term involves only a single factor, a simple plot
showing how the estimates are positioned on the line is added. If more
than 20 parameters are involved this is the only output you will get,
since the list of pairwise comparisons is too long to be of any use.
The second form can be used for estimation of quite general linear
combinations of the parameters. variate1 , variate2 etc. must be of
length equal to the number of linear parameters in the model,
including those that are set to zero due to linear dependence (i.e.
the number of lines written by LISTPARAMETERS), and the corresponding
linear combinations (with the variate's values as coefficients) are
estimated.
EXAMPLE. If TREAT is a factor on 3 levels, the two ESTIMATE commands in
the following program will give roughly the same output (except that
the last one will not produce a plot)
FITLIN Y=1+TREAT
INCLUDEALL { required if restrictions on units 1..4 }
COEFF1=[0 -1 1 0]
COEFF2=[0 -1 0 1]
COEFF3=[0 0 -1 1]
ESTIMATE TREAT
ESTIMATE COEFF1 COEFF2 COEFF3
The two kinds of parameters can be mixed as desired. For example,
ESTIMATE TREAT COEFF1 COEFF2 COEFF3
would be OK in the above example.
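What the second form computes for a coefficient variate c is the linear
combination c'b of the parameter estimates b, with standard deviation
sqrt(c'Vc), where V is the covariance matrix of the estimates. A
minimal Python sketch follows (an illustration only, not ISUW code; the
function name, the estimates and the covariance matrix in the usage
note are invented).

```python
import math

def contrast(coeff, beta, cov):
    # Estimate c'b and its standard deviation sqrt(c'Vc).
    k = len(coeff)
    est = sum(c * b for c, b in zip(coeff, beta))
    var = sum(coeff[i] * cov[i][j] * coeff[j]
              for i in range(k) for j in range(k))
    return est, math.sqrt(var)
```

Take, hypothetically, the model Y=1+TREAT with 4 observations per
level, error variance 1 and estimates b = [10, -2, 1, 0] (the last
TREAT parameter is set to zero). Then V has 0.25 for the constant, 0.5
for each of the two free TREAT parameters, -0.25 between the constant
and each of them, 0.25 between the two, and zeros in the last row and
column. The coefficients [0 -1 1 0] give the estimate 3 with standard
deviation sqrt(0.5).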
WARNING. The quantities estimated are defined in a simple way in
relation to the model matrix, namely as either
coefficients to the corresponding covariates (in case of a variate
argument),
differences between such coefficients (in case of an argument
involving at least one factor), or
linear combinations of such coefficients (in case of a variate
argument with length equal to the number of parameters in the
model).
However, in relation to the model they are not always meaningful. For
example, in a R(ow) x C(olumn) two-way setup,
ESTIMATE R*C
makes perfect sense after
FITLINEAR Y=R*C or
FITLINEAR Y=1+R*C
(where it performs all the pairwise comparisons of (r,c)-means), but
not after
FITLINEAR Y=1+R+C+R*C
because the presence of main effect terms will destroy the simple
interpretation of the differences between interaction parameters.
Similarly,
ESTIMATE R
makes no sense at all after
FITLINEAR Y=1+R+C+R*C
- or perhaps we should say that the sense it makes is somewhat
complicated. The quantities estimated would be differences between
cell means in the last column of the two-way table.
After FITANOVA, the command ESTIMATE can also be used, but here it
activates a quite different procedure adapted to the case of a mixed
model, where contrast variances can be sums of contributions from
different error strata. The only arguments allowed are fixed terms
from the model formula of the last FITANOVA command, and the estimates
coming out of this are always the means of observations in the groups
defined by this factor or product factor. The command outputs a table
of means, together with information that enables you to compute
standard deviations of simple contrasts (differences between means).
In the simplest case (effects of equally replicated factors with no
partially confounded random factors) the standard deviation is given
explicitly. In more complicated situations you must compute it from
the contributions to the variance from the different strata. Since the
treatment structure is always specified by the maximal model formula,
estimability of linear parameters is not taken into account by this
form of the ESTIMATE command.
--------------------------------------------------------------------------------
SAVENORMEDRESIDUALS
Computes "studentized" (T-distributed) normed residuals and the
corresponding tail probabilities after FITLINEARNORMAL and FITNONLINEAR.
................................................................................
Syntax: SAVENORMEDRESIDUALS nres [pvalues]
When SAVEFITTED is used after FITLINEARNORMAL, it computes normed
residuals simply as residuals divided by the estimated standard
deviation. When hunting outliers, a more relevant definition of normed
residuals is the one that makes them T-distributed with ResDF-1
degrees of freedom. A "studentized" residual can be computed by
removal of the observation from the data set and fit of the model to
the remaining observations. Or by extension of the model with a dummy
that allows the observation to have its own, freely varying mean, and
performing the T-test for the hypothesis that this term can be removed
from the model. But a faster way of computing all these quantities
goes as follows. Let ModelVariance*h(i) be the estimated variance of
the i'th fitted value, computable as the double sum over (j1,j2) of
the quantities
Xmatrix(i,j1)*Xmatrix(i,j2)*ParameterCov(j1,j2).
h(i) can also be interpreted as the i'th diagonal element of the
orthogonal projection matrix for the linear subspace of means associated
with the model. The (estimated) variance on the i'th residual r(i) is
then
V(i) = ModelVariance*(1-h(i)).
The i'th studentized residual can now be computed as
                            r(i)/sqrt(1-h(i))
   NRES(i) = -------------------------------------------------- .
             sqrt( ( SS(res) - sqr(r(i))/(1-h(i)) )/(ResDF-1) )
A similar formula exists for the weighted case. The command
SAVENORMEDRESIDUALS performs this computation and saves the result in
the variate "nres" given as the first parameter. If this is an
existing vector, it must be a variate of the correct length, otherwise
it is declared as such.
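As an illustration of the shortcut formula (not of ISUW's internal code),
the following Python sketch computes the studentized residuals for a
simple regression y = a + b*x, where h(i) has the closed form
1/n + sqr(x(i)-mean(x))/Sxx, and compares them with the slow "remove the
observation and refit" definition. The function names and the small data
set are hypothetical and only serve the example.

```python
import math

def studentized_residuals(x, y):
    """Studentized residuals for y = a + b*x via the shortcut formula."""
    n = len(x)
    res_df = n - 2                       # ResDF: n minus two parameters
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(ri ** 2 for ri in r)
    # h(i): diagonal element of the projection matrix for this model
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    nres = []
    for ri, hi in zip(r, h):
        # variance estimate with observation i removed
        s2_without_i = (ss_res - ri ** 2 / (1.0 - hi)) / (res_df - 1)
        nres.append(ri / math.sqrt(1.0 - hi) / math.sqrt(s2_without_i))
    return nres

def leave_one_out_t(x, y, i):
    """The same quantity the slow way: refit without observation i."""
    xs = x[:i] + x[i + 1:]
    ys = y[:i] + y[i + 1:]
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxx = sum((u - xbar) ** 2 for u in xs)
    b = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    s2 = sum((v - (a + b * u)) ** 2 for u, v in zip(xs, ys)) / (m - 2)
    # variance of the prediction error for the removed observation
    pred_var = s2 * (1.0 + 1.0 / m + (x[i] - xbar) ** 2 / sxx)
    return (y[i] - (a + b * x[i])) / math.sqrt(pred_var)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0, 7.1, 7.8]
nres = studentized_residuals(x, y)
```

Both functions should agree up to rounding, which is the point of the
shortcut: one fit of the model is enough to obtain all the residuals.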
The second variate "pvalues", if specified, is also declared
automatically, and the command will store in this the two-sided tail
probabilities in the relevant T-distribution. A reasonable
(conservative) criterion for an observation being "outlying" is that
this tail probability is less than, say, 0.05 divided by the number of
observations (because this ensures that an outlier is found in a
correct model with probability at most 0.05).
The command can also be used after FITNONLINEAR, but here the
T-distribution must obviously be taken as an approximation. The
"studentized residuals" produced in this case are those associated
with the weighted regression performed in the last iteration.
Restrictions are obeyed in the following sense: For non-present units
normed residuals are set to zero and tail probabilities to one. In
weighted models, weight=0 produces missing values for the
corresponding studentized residuals (and P-values).
--------------------------------------------------------------------------------
FITANOVA
Analysis of variance in orthogonal designs.
................................................................................
Syntax: FITANOVA response=fixedterms+[randomterms]
where the brackets here are not "syntax brackets", they should
actually be there if the model contains random effects.
This command performs analysis of variance for models in orthogonal
designs, including certain variance component models, closely following
the exposition
Tjur, T. (1984):
Analysis of Variance Models in Orthogonal Designs
International Statistical Review 52, pp. 33-81
The theory given in that paper will not be repeated in full detail here.
The syntax for FITANOVA is similar to that of FITLINEARNORMAL, with the
following modifications:
(1) A weight can not be specified.
(2) Only factors, not variates, are allowed in the model specification on
the right hand side of the equality sign.
(3) Random effects in addition to the "unit-to-unit variation" (which is
always assumed to be present) can be included in a final bracket.
For example
FITANOVA Y = 1 + ROW + COL + [ROW*COL]
will estimate a model with fixed effects of ROW and COL (and a
constant term), random ROW*COL effect (interaction) and a
(mandatory) random UNIT effect (e.g. a measurement error, i.e.
independent i.i.d. error terms as in a linear normal model).
(4) The model specified must satisfy the following conditions, which
- apart from non-essential modifications - are those given in Tjur
(1984):
A) The entire set of factors, occurring in the model specification,
must be closed under the formation of minima. For example, for a
balanced two-way table,
FITANOVA Y = ROW + COL + [ROW*COL]
will not work, because the minimum of ROW and COL (the trivial
factor, represented by the constant term 1) is not present.
B) The set of random factors (those occurring in the bracket) must
be closed under the formation of minima, and they must all be
balanced (i.e. they must group data into equally sized classes).
C) Any two factors or product factors in the model must be
orthogonal.
It follows from these assumptions that the set of factors occurring in
the model specification constitutes an orthogonal design (Tjur 1984, p.
41-42), and the set of fixed (non-bracketed) factors constitutes the
maximal model formula specifying the treatment structure in this design.
The analysis of variance table, output by FITANOVA, contains a line for
each factor or product factor occurring in the model specification,
including the random UNIT effect, with omission of square sums that
are "structurally zero" (i.e. with zero degrees of freedom). The lines
of the table are ordered by strata, each stratum containing the sums
of squares for fixed effects in that stratum, with the sum of squares
for the associated random factor as the last line, labeled "Residual".
The lines for fixed effects give the F-tests for removal of the
corresponding factors from the model. The "Residual" lines (for random
effects) give the F-tests for removal of the corresponding random
terms from the model, whenever this is a legal hypothesis (see Tjur
1984, p. 58).
The estimated eigenvalues of the covariance matrix are found in the
analysis of variance table as the "Residual" MS's. In the list of
estimated variance components, the column SD gives the estimated
standard deviations of these estimates, based on their interpretation as
linear combinations of the Chi square distributed, independent
eigenvalue estimates. Normal confidence limits, based on these standard
deviations, are only reliable for large residual degrees of freedom in
the corresponding stratum and all higher strata.
The estimate "Total variance" (the sum of the variance components, i.e.
the variance on a single observation) is similarly equipped with a
standard deviation, computed in the same way.
Missing minima of (product) factors, which can not be generated in a
simple way, can be constructed by the command CONSTRUCTMINIMUM, see
below.
Estimation of fixed effects can be performed by the command ESTIMATE,
see approximately 8 screenfuls above. LISTPARAMETERS, SAVEFITTED,
SAVEPARAMETERS and SAVENORMEDRESIDUALS can not be used.
PSEUDO STRATA.
The formal requirement that the set of random factors should be closed
under the formation of minima implies, e.g., that a model for a two-way
table with random row and column effects can not be estimated by
FITANOVA Y=1+[ROW+COL+ROW*COL]
since the minimum 1 of ROW and COL is not among the random effects. You
will have to use
FITANOVA Y=1+[1+ROW+COL+ROW*COL]
instead. However, this implies that the eigenvalue parameter for the
"pseudo" stratum CONSTANT STRATUM is set to zero, because it can not be
estimated. Accordingly, the variance on the grand mean is formally
estimated as zero. However, the procedure ESTIMATE assumes that
variance components for such strata should be set to zero, so the
variance of the grand mean in this case (in output from an ESTIMATE 1
command) will be computed from the contributions to the variance from
the three "non-pseudo" strata. This may result in a negative value for
the estimated variance on the grand mean. More generally, variance
components corresponding to strata with zero degrees of freedom for the
residual will be skipped when contrast variance estimates are computed,
and this may result in negative values of these estimates.
The response variate must not contain missing values, and the factors
involved must not have units on level zero. Restrictions are taken into
account in the obvious way. However, the exclusion of units with missing
or outlying responses will usually destroy orthogonality and
balancedness. To obtain an approximate solution, replace (a few) missing
values or outliers with suitable "typical values", e.g. group averages
or predicted values from a linear model.
CONSTRAINTS. The maximal number of levels for a product factor is 2049.
The maximal number of fixed terms is 40, the maximal number of random
terms is 10, and no term must be a product of more than 8 factors.
--------------------------------------------------------------------------------
CONSTRUCTMINIMUM
Construction of the minimum of two (products of) factors.
................................................................................
Syntax: CONSTRUCTMINIMUM prodfac1 prodfac2 minimum
CONSTRUCTION OF PSEUDO FACTORS.
Sometimes the closedness-under-minima constraint in FITANOVA enforces
the inclusion of fixed-effect factors which are not statistically
meaningful. Such factors are called pseudo-factors. Very often such
factors will not be given in the data set, so you will have to construct
them. The minimum of two given (products of) factors can be constructed
by the command CONSTRUCTMINIMUM. The three parameters are strings. The
first two must be names of existing factors or products of such, the
last must be a valid identifier of a non-existing vector. The minimum of
the factors given as parameter 1 and 2 is stored in a factor of name
parameter 3.
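To make the notion of a minimum concrete, here is a Python sketch that
assumes the usual definition from Tjur (1984): the minimum of two factors
is the finest factor that is coarser than both. Under that assumption its
classes are the connected components obtained by linking a level of the
first factor to a level of the second whenever some unit has both. The
function name and the union-find approach are choices made for this
example; how CONSTRUCTMINIMUM works internally is not documented here.

```python
def construct_minimum(f1, f2):
    """Return the minimum of two factors as class labels 1, 2, ...

    f1 and f2 give, for each unit, its level in the two factors.
    """
    if len(f1) != len(f2):
        raise ValueError("factors must have the same length")
    parent = {}

    def find(node):
        # follow parent links to the representative of the class
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for a, b in zip(f1, f2):
        na, nb = ("f1", a), ("f2", b)
        parent.setdefault(na, na)
        parent.setdefault(nb, nb)
        ra, rb = find(na), find(nb)
        if ra != rb:
            parent[ra] = rb      # levels sharing a unit join one class

    labels = {}
    result = []
    for a in f1:
        root = find(("f1", a))
        # number the classes in order of first appearance
        result.append(labels.setdefault(root, len(labels) + 1))
    return result
```

For a complete two-way table every row level is linked to every column
level, so the minimum has a single class (the trivial factor 1). If one
factor is nested in the other, the minimum is the coarser of the two.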
CONSTRAINTS. Since the result is stored as a single factor, the number
of levels for the minimum must not exceed 255. Restrictions are NOT
taken into account.
--------------------------------------------------------------------------------
BARTLETT
Bartlett's test for variance homogeneity.
................................................................................
Syntax: BARTLETT variate factor
The variate and the factor must be of the same length. Bartlett's test
for equality of the variances in the "variate"-samples in the one-way
setup defined by "factor" is performed. Restrictions are taken into
account, and empty groups or groups with a single observation are
ignored. The variances in groups are listed, and Bartlett's test with
bias correction is performed. If only two samples are present, also
the relevant F-test on the proportion between the two variances is
performed. Here, the right tail probability in the F-distribution is
given for the ratio between largest and smallest variance.
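The following Python sketch shows the textbook form of Bartlett's
statistic with the usual bias correction, which we assume is what the
command computes. Groups with fewer than two observations are dropped, as
described above. The statistic is to be compared with a Chi square
distribution with (number of groups - 1) degrees of freedom; the tail
probability is left out because the Python standard library has no Chi
square distribution function. The function name is hypothetical.

```python
import math

def bartlett_statistic(values, groups):
    """Return (statistic, degrees of freedom, list of group variances)."""
    samples = {}
    for v, g in zip(values, groups):
        samples.setdefault(g, []).append(v)
    # groups with fewer than two observations carry no variance information
    used = [s for s in samples.values() if len(s) >= 2]
    k = len(used)
    if k < 2:
        raise ValueError("need at least two groups with two or more values")
    dfs = [len(s) - 1 for s in used]
    variances = []
    for s in used:
        mean = sum(s) / len(s)
        variances.append(sum((v - mean) ** 2 for v in s) / (len(s) - 1))
    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df
    # uncorrected statistic; fails if a group variance is exactly zero
    raw = total_df * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances))
    # Bartlett's bias correction factor
    correction = 1.0 + (sum(1.0 / d for d in dfs) - 1.0 / total_df) / (
        3.0 * (k - 1))
    return raw / correction, k - 1, variances
```

With equal group variances the statistic is zero, and it grows as the
variances move apart.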
To check for constant variance across the groups determined by a
factor in a linear model which is not just the one-way model
determined by the factor, use BARTLETT with the variate of residuals
as the first argument. This is a reasonable approximation when the
number of observations is large compared to the number of parameters
in the model.
--------------------------------------------------------------------------------
CORMAT
Writes a table of correlation coefficients for a set of variates.
................................................................................
Syntax: CORMAT variate1 variate2 [variate3 [...]] [stars]
The variates must be of the same length. The procedure writes a table of
correlations between the variates.
The last (optional) parameter "stars" controls the printing of
significance indicating stars. Without this parameter, no such
printing takes place. If stars = * , a single star indicates
significance on (two-sided) level 5%. If stars = ** , a double star in
addition indicates significance on (two-sided) level 1%. For stars =
*** , a triple star means significance on (two-sided) level 0.1%. To
avoid line overflow, the number of decimals for the correlation
coefficient becomes smaller when the number of stars is increased. The
number of decimals after the point is 5 minus the number of stars. If
you specify four or more stars, the number of decimals becomes 5 and
no stars are printed, but a table of P-values is displayed instead.
In this context "significance", of course, refers to the distribution
of the empirical correlation coefficient in the normal case when the
theoretical correlation is zero.
Restrictions are obeyed and missing values are treated as non-present.
But for each pair of variates, only the missing values of the two
variates involved are excluded, not the units corresponding to missing
values for other variates in the list. Thus, if you want a proper
(positive definite) empirical correlation matrix, EXCLUDEMISSING for
all variates involved must be used before CORMAT.
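The pairwise treatment of missing values can be illustrated by a small
Python sketch. Missing values are represented by None here, which is an
assumption of the example and not ISUW's storage format, and the function
names are hypothetical. Each correlation is computed from the units where
both variates are present, so different entries of the table may be based
on different sets of units.

```python
import math

def pairwise_correlation(u, v):
    """Pearson correlation over the units where both values are present."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    n = len(pairs)
    mu = sum(a for a, _ in pairs) / n
    mv = sum(b for _, b in pairs) / n
    suu = sum((a - mu) ** 2 for a, _ in pairs)
    svv = sum((b - mv) ** 2 for _, b in pairs)
    suv = sum((a - mu) * (b - mv) for a, b in pairs)
    return suv / math.sqrt(suu * svv), n

def correlation_table(variates):
    """Map each pair of names to (correlation, number of units used)."""
    names = list(variates)
    return {(p, q): pairwise_correlation(variates[p], variates[q])
            for p in names for q in names}

data = {
    "X": [1.0, 2.0, 3.0, 4.0, None],
    "Y": [2.0, 4.0, 6.0, None, 10.0],
    "Z": [5.0, None, 3.0, 2.0, 1.0],
}
table = correlation_table(data)
```

In this example each pair of distinct variates uses three units, but not
the same three, and no unit is complete for all three variates.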
--------------------------------------------------------------------------------
WILCOXON
Wilcoxon (or Mann-Whitney) two sample test
................................................................................
Syntax: WILCOXON variate factor level1 level2
Performs Wilcoxon's nonparametric rank sum test for comparison of two
empirical distributions, with the normal approximation to the
distribution of the test statistic. The two samples of observations
are defined as the values of the variate corresponding to units given
by the two factor levels of the factor. Only present values are taken
into account, and missing values are treated as non-present. Ties are
corrected for by simple averaging, and the two extreme values of the
test statistic, consistent with the ties, are also computed.
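A minimal Python sketch of the rank sum test with the normal
approximation is given below. Tied values get the average of the ranks
they occupy. Two things are assumptions of the sketch rather than
documented behaviour: the variance is the standard one without a tie
correction, and the tail probability is two-sided. The two "extreme"
versions of the statistic mentioned above are not computed. The function
names are hypothetical.

```python
import math

def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2.0 + 1.0   # mean of ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

def wilcoxon_rank_sum(sample1, sample2):
    """Return (rank sum of sample1, normal score, two-sided probability)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2.0
    variance = n1 * n2 * (n1 + n2 + 1) / 12.0   # no tie correction
    z = (w - mean) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p_two_sided
```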
--------------------------------------------------------------------------------
SPEARMAN
Spearman's rank correlation test for independence
................................................................................
Syntax: SPEARMAN variate1 variate2
The two parameters must be names of variates of the same length, with
at least two values present and no missing values present. Spearman's
rank correlation (for the values present) is computed, and the
approximate test for no dependence (assuming sqrt(n-1)*RankCor
standard normal) is performed. If ties are present, the result comes
out as the average of the two extreme values obtainable by arbitrary
ranking within tie groups (and these "extreme rank correlations" are
reported also). As for WILCOXON (see above), this averaged test is
conservative. Rejection of the hypothesis "no dependence" is reliable,
but acceptance is, in principle, only possible when the two "extreme"
tests agree on this.
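For data without ties, the computation can be sketched in Python as
follows. The rank correlation is obtained from the squared rank
differences, and sqrt(n-1)*RankCor is referred to the standard normal
distribution. The two-sided tail probability is an assumption of the
sketch, and the averaging over tie groups described above is deliberately
left out. The function names are hypothetical.

```python
import math

def ranks_without_ties(values):
    """Ranks 1..n; ties are rejected because this sketch does not cover them."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle ties")
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def spearman_test(u, v):
    """Return (rank correlation, normal score, two-sided probability)."""
    n = len(u)
    ru, rv = ranks_without_ties(u), ranks_without_ties(v)
    d2 = sum((a - b) ** 2 for a, b in zip(ru, rv))
    rank_cor = 1.0 - 6.0 * d2 / (n * (n * n - 1))
    z = math.sqrt(n - 1) * rank_cor          # approximately standard normal
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return rank_cor, z, p_two_sided
```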
--------------------------------------------------------------------------------
SHOW
Shows the contents of a file.
................................................................................
Syntax: SHOW [filename, directory or path]
When the command is used without parameters, the session's output file
is shown. Here, some color effects are used to make the file more
readable. For this reason, the output file should in general be kept
short, otherwise it will take a long time to form the image. Do not
save LISTings of very long data sets, they are useless anyway. If you
really want to save such listings (for example as input to other
programs), redirect the output to another file by an OUTFILE command,
see 4 screenfuls below. Also, if you write programs with loops, use
ECHO 0 and OUTFILE 0 before the loops, otherwise the output file will
be filled up with command echoes and output from each loop. If, by
mistake, you have created a very long output file, you can reset it by
an OUTFILE * command (unless the file contains things that are
important, in this case you will have to QUIT the session and start a
new; or copy and paste to an "emergency backup file" which you create
by an EDIT command).
If the parameter is a valid directory name or a path, a file selection
menu appears. For example,
SHOW *.* { or just SHOW . }
allows you to search among all files in the present directory (i.e. the
working directory), whereas
SHOW ..\proj2\*.JPG
can be used if you want to have a look at the JPG pictures in the sibling
directory proj2.
If the parameter is the name of an existing file, this file is chosen.
The action of the SHOW command depends on the file extension:
.OUT : The file is shown as an ISUW output file (to view output
files from earlier ISUW sessions).
.JPG, .BMP : The picture is shown in ISUW's own picture viewer. Use
a SHELL command if you want Windows' default device for these
extensions.
.SUD : Writes the list of vectors stored in the StatUnit data set,
their types, lengths, number of levels for factors and the space
they occupy in bytes. Vector names that coincide with names of
existing vectors are marked with an asterisk.
.ISU, .TXT, .DAT, .BAT, .CMD, .PS, .EPS, no extension : Shows the
(plain text) file in ISUW's "read only editor".
When a text file is viewed, you can use standard Windows features to
select a portion of the text and copy (Ctrl-C) it to the clipboard,
from which you can paste it into e.g. the editor by Ctrl-V. Use Ctrl-F
to search for a given text string. The search is forwards from the
cursor's position, ignores case and multiple blanks and interprets
newline symbols as blanks. Repeat the search by Ctrl-L. Use Ctrl-P to
print the whole file or the selected portion.
SHOW commands are ignored in programs, except that the last SHOW
command in a program is executed after the program has ended. More
about this in the description of RUN.
To open a file by its default Windows application determined by the
file extension, use the SHELL command.
--------------------------------------------------------------------------------
REMARK
Writes a message to the output file.
................................................................................
Syntax: REMARK text
Writes "text" to the output file.
If ECHO is off (see below) or an alternative output file has been
selected by OUTFILE (see 2 screenfuls below), the text is simply written
(leading blanks are ignored, multiple blanks are replaced with single
blanks). Avoid text exceeding 80 characters; the lines will not be
broken.
--------------------------------------------------------------------------------
QUIT
Terminates the ISUW session.
................................................................................
No parameters.
In interactive mode, the same happens if you press Escape from an empty
command line, and then press Return.
Before the session is terminated you are given the option to save the
output file (temporarily created under the name ISUWOUT.TMP on the ISUW
root directory) as a file with extension .OUT. These files are plain
text files and can as such be imported to text handling programs, or
viewed later from other ISUW sessions by the SHOW command.
--------------------------------------------------------------------------------
ECHO
Turns ISUW command and error message echo on/off
................................................................................
Syntax: ECHO [realexpression]
The default action of ISUW is to echo all commands and error messages to
the output file, which means that the output file will contain a
complete log of the session. If you want to create a condensed output
file, consisting only of selected output, use ECHO as the first command.
When ECHO is used without parameters, it simply toggles ECHO on or
off. With a real number or expression as the parameter, a positive
value means that ECHO is set on, whereas zero or a negative value sets
it off. Typically, you will use the parameters 1 for on, 0 for off.
Another way of creating a condensed output file without command echoes,
error messages etc., is to select an alternative output file by OUTFILE,
see below.
ECHO 0 should be used in most cases before loops in a program. Without
this, the session's output file will very quickly be filled up with
echoes of the commands in the loop.
--------------------------------------------------------------------------------
OUTFILE
For redirection of StatUnit output to a file other than the session's
output file.
................................................................................
Syntax: OUTFILE [filename]
EXAMPLE. To write the output from a LIST command on a file LIST.TXT
(which is the simplest way of exporting data to another program or
statistics package), write
OUTFILE LIST.TXT
LIST ...
OUTFILE
% and just to check that everything is OK
SHOW LIST.TXT
OUTFILE without parameters (the third command in the program)
redirects output to the primary output file. If this is already done,
nothing happens. Do not write the name of the session's temporary
output file explicitly; this will overwrite the file instead of
appending to it.
Quite generally, if the parameter is the name of an existing file,
that file will be overwritten without warning.
When output is redirected in interactive mode, the result of an output
generating command is previewed as usual, and the way you leave the
read-only-editor (by Return or Escape) determines whether output is
saved or suppressed. The only difference is that it is written to
another file. Command echoes, error messages etc. are still written to
the session's original output file.
You can use the file name 0 or NUL to suppress output entirely. In
interactive mode, output will still be shown in the preview window,
but regardless of whether you leave this window by Return or Escape
the output will be lost. The command OUTFILE 0 should be used before
loops in a program, if the loop contains output generating commands
- unless you really want to see the output from all the loops.
A special emergency version of the OUTFILE command has the form
OUTFILE *
The effect of this is that the session's output file is "reset". All
lines (except the first four) are deleted. You can use this to get rid
of all output produced until now, for example if you have created a
very long and unhandy output file by LISTing a long data set or
forgetting to turn command echo off before the execution of a program
with loops. But notice that all output produced earlier in the session
is lost. If there is something you want to keep, you will have to save
it on another file by copy and paste before you use the OUTFILE *
command. Notice also that this command acts on the session's "official"
output file, not on a file to which you have redirected output by an
earlier OUTFILE command. Such files can be reset simply by reopening
them.
--------------------------------------------------------------------------------
EDIT
Edits ISUW programs and other plain text files.
................................................................................
Syntax: EDIT [filename]
If the command is given without parameters or with a path or file mask
as its parameter, a file selection menu appears.
For ISUW programs (see the RUN command below), the file extension .ISU
can be omitted. To edit a file without extension, add a period as the
last character of the file name.
When you edit a file you can
- press Ctrl-Return to truncate the line from the cursor's position.
- press F1 to get help.
- press F2 to save changes.
- press F10 to display the session's output file.
- search for a word or phrase by Ctrl-F.
- repeat last search by Ctrl-L.
- press Escape to leave the editor.
When you edit an ISUW program you can, in addition,
- press Ctrl-F1 to get help on the command in the present line.
- press F9 to execute the program without leaving the editor. If an
error occurs, the corresponding line of the program will be
marked. Use a SHOW command as the last command of the program if
you want to have a look at the result right after execution.
If a portion of the text is marked as a block, only the block is
executed.
- use shortkeys defined by the KEYS command (with the obvious
exceptions Ctrl-F, Ctrl-L, F2, F9 and F10). They will act roughly
as they do from the command line, except that a final exclamation
sign is replaced with a line shift, and the text of the present
line will only be substituted for question marks in the KEYS
string if it is marked (highlighted). However, the marked portion
of the text must be contained in a single line, otherwise nothing
happens.
In addition to this, the editor has the following standard features of
Windows editors. Mark (highlight) a block by the cursor arrows with
the Shift key down, or use the mouse. To mark the entire text as a
block, press Ctrl-A. When a block has been marked you can do the
following:
- Delete it by the Delete key (or overwrite it by any character key).
- Delete it and copy its contents to a hidden "clipboard" by Ctrl-X.
(these two operations can be undone by Ctrl-Z)
- Copy its contents (without deleting it) to the clipboard by Ctrl-C.
Later, you can (optionally while editing another file, perhaps even in
another editor)
- Paste the contents of the clipboard into the text at the cursor's
position (or to replace marked text) by Ctrl-V.
EDIT commands in programs are ignored.
--------------------------------------------------------------------------------
RUN
Executes a program.
................................................................................
Syntax: RUN [filename [par1 [par2 [...]]]]
In many situations it is convenient to write a sequence of commands line
by line on a text (ASCII) file, for later error correction, modification
and reuse. Such a program file - call it PROGRAM.ISU - can be created
and executed by
EDIT PROGRAM
RUN PROGRAM
It is also possible to execute a program directly from the editor, see
the description of EDIT right above.
Notice that the extension .ISU - which is required - is not written as
part of the file name in the EDIT and RUN commands.
If the file name is omitted or replaced with an asterisk, a file
selection menu (path *.ISU) appears.
The optional additional parameters par1, par2, ... are for parameter
substitution in the program, see the description of SUBSTITUTE
approximately 4 screenfuls below.
The file must contain one command per line, except that a backslash \ at
the end of a line means "append next line". The backslash can break the
command at any point, also in the middle of a connected word. This means
that a blank is not inserted automatically - don't forget to insert a
blank before the backslash or in the beginning of next line, if there
should be one.
The program executes roughly as if you were typing the file line by
line from the keyboard, unless GOTO commands are used to break the
order. However, the commands are not stored in the upper "reuse"
window of ISUW's front window, and all output is written directly to
the session's output file without any previewing. Thus, if you want
output suppressed for certain commands, you must surround these
commands by OUTFILE 0 ... OUTFILE commands.
Commands can be truncated as long as they remain unique. However, command
field shortkeys (like A for INCLUDEALL) can not be used. COMPUTE
commands can be written directly without the command name and without
a leading blank, they are recognized by the appearance of an equality
sign right after the first connected word. However, COMPUTE commands
without a right hand side must start with = or + (or COMPUTE).
Graphics output produced by PLOT and HISTOGRAM commands is previewed as
in interactive mode. However, an exception from this occurs when a
PostScript file is open. In this case, the images are removed from the
screen immediately, implying that you do not have to press Escape each
time a picture is formed and sent to the PostScript file.
The 3-d versions of PLOT and HISTOGRAM can not be used in programs.
If an error occurs during the execution of a program, it will abort
with an error message. The message on the screen contains information
about which line the error occurred in and which loop (if not the
first). The number of loops, in this context, is the number of times
that the program file was opened and read from the beginning, see the
GOTO command 7 screenfuls below. Edit the program to correct the
error, remove vectors that were imported to or created by the program
before the error interrupt, and RUN it again.
The Escape key can be used to interrupt a program if it takes more
time than expected or is stuck in an infinite loop. Escape will force
the program to exit after execution of the command presently handled,
just as if an error had occurred in that command.
Notice that vectors that are (explicitly or implicitly) declared in
the program or imported to the program by GETDATA will be present when
the program terminates (except for "$-vectors", see below). This means
that you will typically run into name coincidence conflicts if you
forget to delete these vectors before you run it again. For larger
programs, it is usually a good idea to write the program such that it
imports and creates its own data, and let the first line of the
program be a DELETE command without parameters.
A useful convention for "local variables" is this: If the name of a
variate or factor begins with a dollar sign it is deleted at exit from
a program, also if the exit is due to an error or a keyboard
interrupt. Notice that ALL vectors beginning with a $ are deleted,
also those that were present when the program started. Thus, it is
good advice not to use $-names for anything else.
Lines can be indented as desired, and empty lines can be inserted, to
make the program more readable.
Comments can be inserted in two ways:
(1) Lines starting with a percentage sign '%' (optionally after
some blanks) are ignored.
(2) Text in curled parentheses {} within a line is ignored.
Such comments are not echoed to the output file (use REMARK for this).
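The line handling rules above (backslash continuation, % comment lines,
comments in curled parentheses, empty lines) can be summarised by the
following Python sketch. It is an illustration only. The order of the
steps (joining continued lines first, removing comments afterwards) is an
assumption, GOTO labels and command truncation are not treated, and the
function name is hypothetical.

```python
import re

def logical_lines(text):
    """Turn raw program text into the list of commands to be executed."""
    joined = []
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]      # drop the backslash, insert no blank
            continue
        joined.append(pending + raw)
        pending = ""
    if pending:
        joined.append(pending)

    commands = []
    for line in joined:
        if line.lstrip().startswith("%"):
            continue                 # whole-line comment
        # remove {...} comments, then indentation and trailing blanks
        line = re.sub(r"\{[^{}]*\}", "", line).strip()
        if line:
            commands.append(line)    # empty lines are skipped
    return commands

example = "\n".join([
    "FITNONLINEAR freq=1+group \\",
    "exp(.)/(1+exp(.))",
    "% a comment line",
    "  LIST Y {show the data}",
    "",
    "COMP\\",
    "UTE A=1",
])
commands = logical_lines(example)
```

Note how the blank before the first backslash is kept, whereas COMP and
UTE are glued together into one word because no blank was supplied.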
Nested programs are not allowed, i.e. RUN commands must not occur in
programs. Also, OPENCOMMANDFILE and the 3-D versions of PLOT and
HISTOGRAM are forbidden. EDIT and SHELL commands in programs are
ignored.
SHOW commands have a special role in programs. Or rather, the last
SHOW command in a program has a special role, since this is the only
one that is executed, and this is done AFTER the program has executed.
For this reason, it is usually most convenient to place the SHOW
command as the last command of the program. A construction which is
particularly useful when a large program is tested from the editor is
the following. Let the program start with a command like
OUTFILE tmp
and end with
SHOW tmp
This implies that all output is written to a temporary file (here
called tmp), which is overwritten each time the program runs. As long
as there are errors in the program you will stay in the editor, but as
soon as the program executes without errors, its output will be shown
immediately.
--------------------------------------------------------------------------------
SUBSTITUTE
Parameter substitution in programs.
................................................................................
Syntax: SUBSTITUTE string1 [string2 [...]]
where string1 , ... are connected strings (i.e. without blanks).
This command, which makes sense only in programs, represents the
closest ISUW comes to a macro or procedure facility. The action of a
SUBSTITUTE command is best explained by a small example:
Suppose that a program LOGIT1.ISU, to be executed by a RUN command,
consists of the two lines
SUBSTITUTE model
FITNONLINEAR model exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)
Then the effect of writing e.g.
RUN LOGIT1 freq=1+group+sex+age/count
is that the (long and tedious) command
FITNONLINEAR freq=1+group+sex+age/count \
exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)
is executed - which actually means that a logistic regression model
with overdispersion is analysed. The pseudo-parameter "model" is
simply replaced with the first connected string after the file name in
the RUN command.
The general idea is that the SUBSTITUTE command specifies a list of
strings separated by blanks. Whenever one of these strings occurs later
in the same program, it is replaced with the corresponding element of
the list of parameters in the RUN command that called the program. For
this reason, the one and only SUBSTITUTE command in a program should
quite generally be placed in the beginning, and certainly not in a
loop.
It is also possible to give the replacement strings directly in
the SUBSTITUTE command. This is particularly useful when you are
writing and testing a program in the editor. The syntax for this is,
for the program LOGIT1.ISU above, to write
SUBSTITUTE model=freq=1+group+sex+age/count
FITNONLINEAR model exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)
This program will RUN without additional parameters. The word "model"
will be replaced with "freq=1+group+sex+age/count" whenever it occurs
in the lines following the SUBSTITUTE command. Even if the RUN command
specifies an additional parameter, this will be overwritten by the
specification following the first equality sign in the SUBSTITUTE
command. The general rule is that if a parameter in the SUBSTITUTE
command contains at least one equality sign, everything before the first
equality sign becomes the string to be substituted, everything after
becomes the string that will replace it.
An obvious consequence of this is that the words to be substituted in
a program must not contain equality signs.
The substitution works by simple case-sensitive comparison of
substrings. There are some obvious problems with this. Things can
break down if one string is a substring of another, or a substring of
a string somewhere in the program, or of a string which is already
substituted for another string. An easy way of avoiding such problems
is to let all strings in the SUBSTITUTE command begin and end with
special characters. A simple and useful convention is to let
substitute strings be names in brackets. Since brackets are not used
for much else in ISUW, this is rather safe.
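The substitution rule, and the way it can go wrong, can be mimicked in
a few lines of Python. This is only a sketch of the behaviour described
above, not ISUW's actual code. The function names are invented for the
illustration, and two points are assumptions: that parameters are
matched by position, and that the strings are replaced one after the
other in the order of the SUBSTITUTE list.

```python
def build_substitutions(subst_params, run_params):
    """Pair each SUBSTITUTE parameter with its replacement string."""
    pairs = []
    for position, param in enumerate(subst_params):
        if "=" in param:
            # Everything before the first '=' is the string to replace,
            # everything after it is the replacement. This wins over
            # any parameter given in the RUN command.
            name, value = param.split("=", 1)
        elif position < len(run_params):
            # Assumption: parameters are matched by position.
            name, value = param, run_params[position]
        else:
            # Assumption: with no replacement available, the string
            # is simply left as it is.
            continue
        pairs.append((name, value))
    return pairs


def substitute(line, pairs):
    """Plain case-sensitive substring replacement, one pair at a time."""
    for name, value in pairs:
        line = line.replace(name, value)
    return line


# Bracketed names are safe ...
safe = build_substitutions(["[VAR]", "[P]", "[N]"], ["Y", "0.4", "30"])
print(substitute("VAR $BIN [N]", safe))

# ... whereas bare names also hit the command name and $BIN.
unsafe = build_substitutions(["VAR", "P", "N"], ["Y", "0.4", "30"])
print(substitute("VAR $BIN N", unsafe))
```

In this model the bracketed version gives VAR $BIN 30 as intended,
while the version without brackets turns the same line into
Y $BI30 30, because VAR and N are also substrings of other words in
the line.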
EXAMPLES. The following program BINSIM.ISU can be used for simulation of
binomial observations. When called in the form, say,
RUN BINSIM Y 0.4 30
it will fill the existing variate Y with simulated observations from a
binomial distribution with probability parameter 0.4 and binomial total
(index) 30.
BINSIM.ISU:
ECHO 0 { to avoid echoes of all the loops }
SUBST [VAR] [P] [N]
VAR $BIN [N]
$I=0 {since $I is undefined it becomes a variate of length 1}
%LABEL
$I=$I+1
$BIN=(RANDOM<[P]) { $BIN is filled with Bernoulli variables }
[VAR]($I)=SUM($BIN) { ...and the $I'th entry of [VAR] becomes }
{ the sum of these }
GOTO %LABEL $I<##([VAR]) { GOTO is described below }
ECHO 1
Notice that the two auxiliary variates $BIN and $I are deleted
automatically at exit, because they have a dollar sign as the first
character of their names.
The following program OUTLIERS.ISU, when called in the form, say,
RUN OUTLIERS Y 1+SEX+SEX*AGE 0.05
performs some outlier detection in connection with the linear
regression model specified by the first two parameters (here
'Y=1+SEX+SEX*AGE'). A plot of fitted values against studentized
residuals (see the description of SAVENORMEDRESIDUALS), with a color
marking of outliers, is produced, and the outliers, if any, are
listed. Here, an outlier is defined conservatively in such a way that
the probability of finding a positive number of outliers in a correct
model will not exceed the number specified as the third parameter,
here 0.05. The program is complicated because the listing of outliers
is performed under restrictions, and the original restrictions, if
any, must be reestablished. And also because we want a special action
to be taken in the case where no outliers are detected.
The contents of OUTLIERS.ISU:
SUBST [y] [model] [alpha]
FITLIN [y]=[model]
SAVEFIT $fitted
SAVENORMED $nres $p
FAC $sign $pres ##([y]) 1
VAR $unit ##([y])
$pres=1 { $pres becomes 1 for units present, 0 otherwise }
$unit=#
$sign=($p<[alpha]/#([y])) { $sign becomes 'outlier-indicator' }
XTEXT Fitted values
YTEXT Studentized residuals
PLOT $fitted $nres $sign=7,12 =*
% Reestablishing default labels for plots:
XTEXT
YTEXT
% Restrict such that only the outliers are present:
FOCUS $sign 1
GOTO %no outliers #($sign)=0
REMARK Strictly significant outliers (alpha=[alpha]).
LIST $unit:5:0 [y] $fitted $nres $p
GOTO %continue 1
%no outliers
REMARK No outliers detected (alpha=[alpha]).
%continue
% Reestablishing initial restrictions:
INCLUDEALL
FOCUS $pres 1
SHOW
When a program with a SUBSTITUTE command is executed, the echo of
command lines after the SUBSTITUTE command will appear as they are
after the substitution.
--------------------------------------------------------------------------------
GOTO
Controls conditional jumps to labels in a program.
................................................................................
Syntax: GOTO label realexpression
The first parameter "label" must be a text string holding the full
contents of a line somewhere else in the program. It is preferable to
use comment lines as labels (either "echoed" comments REMARK ... or
"non-echoed" comments % ...). Comments in curled parentheses cannot
be used because they would be removed from the GOTO command before
its execution.
EXAMPLE. A program of the form
...
% Loop starts here
...
GOTO % Loop starts here 1
...
will create an infinite loop which - unless the program creates some
overflow error - can only be broken by the Escape key. Notice the last
parameter 1, which (as would any other positive constant) implies that
the GOTO statement is actually executed. In more relevant constructions,
the last parameter is a real expression, and the GOTO statement is only
executed if this expression returns a positive value. With the usual
translation of reals to booleans, you may think of the statement as
having an invisible IF before the last connected string.
When a GOTO command with a positive value of its last parameter is
executed, the following lines of the program are read and skipped
until a line consisting of the text "label" is found. If the program
file is read through, it is reopened and read once more from the
beginning. If a line with the correct text is found, execution of the
program is taken up again right after (notice, AFTER) that line,
otherwise the program is left regularly (i.e. without any warning or
error message). This implies that a construction like
...
GOTO %EndOfProgram n=100
...
%EndOfProgram
will work as intended, even if the last line is forgotten or misspelled
(provided that no other line with exactly this content is found).
The line to be searched for must match "label" literally, also as
regards upper/lower case of letters. But leading, trailing and multiple
blanks are ignored in the comparison.
The second parameter "realexpression" - in this case defined as the
last connected word of the command - must be a valid right hand side of
a COMPUTE statement with a variate of length 1 on its left hand side.
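The search for the label can be summarized by the following Python
sketch. It is a model of the rules stated above, not ISUW's own code,
and the function names are invented for the illustration. The program
is represented as a list of lines, and the position of a line is its
index in that list.

```python
def normalize(line):
    """Ignore leading, trailing and multiple blanks, but keep case."""
    return " ".join(line.split())


def split_goto(command):
    """Split a GOTO command into (label, realexpression)."""
    _, rest = command.strip().split(None, 1)   # drop the command name
    # The expression is the last connected word, the label is the rest.
    label, expression = rest.rsplit(None, 1)
    return normalize(label), expression


def find_resume_index(lines, goto_index, label):
    """Return the index of the next line to execute, or None."""
    n = len(lines)
    target = normalize(label)
    # Read on from the line after the GOTO, then start over from the
    # beginning of the file.
    for step in range(1, n + 1):
        i = (goto_index + step) % n
        if normalize(lines[i]) == target:
            return i + 1   # execution continues AFTER the label line
    return None            # label not found: the program is left


program = ["I=0", "ECHO 0", "%LOOPSTART", "I=I+1", "COMPUTE I",
           "GOTO %LOOPSTART (I<10)", "ECHO 1", "SHOW"]
label, expression = split_goto(program[5])
print(label, expression, find_resume_index(program, 5, label))
```

Here the label is %LOOPSTART, the expression is (I<10), and execution
is taken up again at the line I=I+1.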
WARNING. Programs with loops that are executed many times should
in general have the command
ECHO 0
somewhere before (or in) the loop. If the loop contains output
producing commands, also the command
OUTFILE 0
should be found somewhere before (or in) the loop. Otherwise, all
commands (and their output) will be appended to the output file each
time, which takes unnecessarily long and fills up the output
file with useless garbage.
EXAMPLES.
The following program gives a variate I of length 1 the values 1, 2,
..., 10.
I=0
ECHO 0 { commands in the loop not echoed }
%LOOPSTART
I=I+1
COMPUTE I { write I to the session's output file }
GOTO %LOOPSTART (I<10)
ECHO 1 { reestablish command echo }
SHOW { to see what came out of this interesting program }
A more interesting example is this. Suppose that X and Y are variates
of length 365. Think of them as time series sampled over a period of
365 days. The following program fits the linear regression model
"Y=1+X" 346 times on data from the 20-day period up to and including
day I, where I takes all the possible values 20, 21, ..., 365. The
slope estimates are stored in the variate BETA, and the final plot
shows the development of this locally estimated regression coefficient
over time. Similar programs could be used for kernel smoothing by
local polynomial regression, optionally with other kernels than the
rectangular (using weights rather than restrictions).
VAR DAY BETA 365
DAY=#
I=19
OUTFILE 0 { To avoid 346 ANOVA tables }
%loopstart
I=I+1
EXCLUDE DAY>I DAY<=I-20
FITLIN Y=1+X
INCLUDEALL
SAVEPAR PARS
BETA(I)=PARS(2)
ECHO 0 { To avoid 345 additional echoes of the loop }
GOTO %loopstart I<365
OUTFILE
ECHO 1
PLOT DAY BETA =7 =L
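To check the results of such a program outside ISUW, the same moving
window computation can be written in Python as follows. This is a
sketch using the ordinary least squares formula for the slope. The
function names are invented for the illustration, and entries for
the days before the first full window are set to None, which need not
be what ISUW stores in BETA for those days.

```python
def slope(x, y):
    """Least squares slope of y on x (model y = a + b*x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def rolling_slopes(x, y, window=20):
    """Slope for the 'window' days up to and including each day."""
    beta = [None] * len(x)
    for day in range(window, len(x) + 1):        # day = 20, ..., 365
        lo, hi = day - window, day               # days day-19 .. day
        beta[day - 1] = slope(x[lo:hi], y[lo:hi])
    return beta


x = [float(day) for day in range(1, 366)]
y = [2.0 * v + 3.0 for v in x]                   # exact line, slope 2
beta = rolling_slopes(x, y)
print(sum(b is not None for b in beta))          # number of fits
```

With 365 days and a window of 20 days, the count printed is the number
of values that I takes in the ISUW program above.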
Another useful example follows here. It is well known that even
experienced statisticians tend to be overcritical when normality
assumptions are checked graphically. The only really valid way of
doing it is by comparison of the histogram (or the probit diagram)
with a set of similar plots for data sets of the same length, where
the normality assumption holds. The following program NORMHIST, when
called in the form, say,
RUN NORMHIST 20 100 -4.0,8,4.0
will display 20 histograms of 100 pseudo-normal variables with the
vertical axis from -4.0 to 4.0 divided into 8 intervals.
NORMHIST.ISU:
ECHO 0
SUBST [Hist] [Obs] [Int]
VAR $U [Obs]
VAR $I 1
$I=0
FRAMETEXT Histogram for [Obs] normal observations
XTEXT |
%Label
$I=$I+1
$U=normal
$U=$U-MEAN($U)
$U=$U/SQRT(VARIANCE($U))
HIST $U=[Int]
GOTO %Label $I<[Hist]
FRAMETEXT
XTEXT
ECHO 1
A final example. In the description of SAVEDATA it was explained how
this command can be used to create sub-datasets where excluded units
are "physically deleted". This is the easiest way of doing it, but just
for illustration, we show here how it can be done in a more direct way.
The following program EXTRACT.ISU, when called in the form
RUN EXTRACT COUNTER X
constructs a variate COUNTER, which contains the indices for the units
present in vectors of the same length as X. Thus, the length of
COUNTER becomes the number of units present in X. After this, you can
construct a "short" version of X (or any vector of the same length)
by
INCLUDEALL
COMPUTE SHORT_X=X(COUNTER)
EXTRACT.ISU:
subst [c] [x]
var [c] #([x])
var $present $unit ##([x])
$present=1 { $present becomes "restrict indicator" }
includeall
$unit=$present
exclude 1
$unit=$unit+$unit(#-1) { $unit becomes cumulated restrict indicator }
includeall
$i=0 { running unit number }
%loopstart
$i=$i+1
goto %skip $present($i)=0
[c]($unit($i))=$i
%skip
echo 0 { loop echoed only first time }
goto %loopstart $i<##([x])
echo 1
exclude $present=0 { reestablish restrictions }
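The idea behind EXTRACT.ISU is that the cumulated restrict indicator
tells where in COUNTER the index of each present unit must be stored.
The following Python sketch shows the same idea. It is an illustration
only, with an invented function name, and it takes the restrict
indicator as a list of zeros and ones.

```python
from itertools import accumulate


def present_indices(present):
    """Return the 1-based indices of the units with indicator 1."""
    position = list(accumulate(present))   # cumulated indicator
    counter = [0] * (position[-1] if position else 0)
    for i, flag in enumerate(present, start=1):
        if flag:
            # The i'th unit is number position[i-1] among those present.
            counter[position[i - 1] - 1] = i
    return counter


x = [10.5, 11.0, 9.8, 12.1, 10.0]
counter = present_indices([1, 0, 1, 1, 0])   # units 2 and 5 excluded
short_x = [x[i - 1] for i in counter]        # as SHORT_X=X(COUNTER)
print(counter, short_x)
```

Here counter becomes the indices 1, 3 and 4, and short_x holds the
three corresponding values of x.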
--------------------------------------------------------------------------------
OPENCOMMANDFILE
Enables you to import commands from a text file.
................................................................................
Syntax: OPENCOMMANDFILE [filename]
An ISUW command file is roughly the same as an ISUW program. The file
must have extension .ISU, and in the command syntax it must be written
without extension.
If the command is used when a command file is already open, the
present command file is closed and the new is opened. In particular,
if the command is used without parameters, the present command file
- if any - is closed. A blue command field indicates that a command
file is open. Escape from a blue command field generates an
OPENCOMMANDFILE command.
The effect of this command is that the lines of the file specified can
be imported one by one to the command field. The next line of the file
is imported whenever the CursorDown arrow is pressed from an empty
command field. Thus, it is essentially nothing more than an editor
device associated with the command field. As long as you don't press
the CursorDown arrow, everything works as usual in interactive mode.
Conversely, if you import commands one by one and press Return as soon
as a command has been imported, this is just a slow way of executing
an ISUW program. You could obtain almost the same by a RUN command.
However, there are some important differences. Output is previewed as
usual, and you can append it to the output file by Return or suppress
it by Escape as usual in interactive mode. The commands that are
forbidden in programs executed by RUN (RUN, EDIT, SHELL and the 3d
versions of PLOT and HISTOGRAM) can be used when a program is executed
in this way. Conversely, GOTO and SUBSTITUTE can not be used.
The advantage of executing a sequence of commands in this way is that
you can skip commands as desired, insert other commands and make
changes to the commands imported from the file before you execute
them. If an error occurs you can usually correct it and continue - the
command file is still open for import to the command field.
The rules for breaking of long lines and truncation of commands are as
described for RUN. Comments in the form of lines starting with a
percent sign are skipped, but comments in curled parentheses within
a line cannot be used (unless they are removed manually before Return
is pressed).
The demonstration programs on the directory DEMOS under the ISUW root
directory are executed in this way, with the additional feature that
the %-comments before each command are displayed in a separate window.
You can actually write your own demonstration programs for special
purposes and place them on the directory DEMOS.
--------------------------------------------------------------------------------
SHELL
Starts another Windows program, or opens a file by its default application.
................................................................................
Syntax: SHELL [command and/or filename]
EXAMPLES.
To edit a program PRG.ISU with NotePad, if you prefer this to ISUW's
own EDIT command, write
SHELL NOTEPAD PRG.ISU
To view a PostScript file page1.ps (created e.g. by OPENPS etc.) and
optionally print it, simply use
SHELL page1.ps
provided that GhostScript or a similar device is set up as the default
handler of .PS files on your computer.
To invoke Windows Explorer, use the SHELL command without parameters or
with the name of a directory as the parameter. Be careful, there are
many things you can do to generate an immediate uncontrolled crash,
like deleting the file ISUWOUT.TMP on the ISUW root directory, or
starting a second ISUW session under the same ISUW root. If this
happens, it may be necessary to perform an emergency closedown (press
Ctrl-Alt-Del and use the Windows Task Manager).
SHELL commands in programs are ignored.
--------------------------------------------------------------------------------
KEYS
Programming the keyboard.
................................................................................
Syntax: KEYS [keyname [string]]
EXAMPLE. After the command
KEY AR RAMSTATUS!
the key combination Alt-R gives the same result as you would obtain by
writing RAMSTATUS in the command field and pressing Return.
Programmable keys are
F2..F12 (F2..F12)
Alt-F2..F12 (AF2..AF12)
Ctrl-F2..F12 (CF2..CF12)
Shift-F2..F12 (SF2..SF12)
Alt-A..Z (AA..AZ)
Alt-0..9 (A0..A9)
Ctrl-A..Z (CA..CZ) except... (see below)
Ctrl-0..9 (C0..C9)
where the exceptions of the type Ctrl-letter correspond to the letters
C, X, V, A and Z, which are reserved for standard Windows editing
purposes (copy, cut, paste, select all, undo).
The lists in parentheses show the way keys are referred to in the first
parameter keyname. For example,
KEY AF4 DELETE!
will imply that Alt-F4 performs a DELETE command without parameters
(thus overwriting the default Windows action of Alt-F4, which is to
terminate the ISUW session).
Without the final '!', a programmed key simply inserts "string" in the
command line at the cursor's position, and no execution takes place
before you press Return. If a portion of the text in the command
window is selected (marked), this text is replaced with "string".
EXAMPLE. Suppose you have just executed a READ command like
READ UNIT SEX=Unknown,Male,Female HEIGHT GROUP
Since ISUW has no other facility for saving names of factor levels, you
might want to save the string 'SEX=Unknown,Male,Female' for later use.
To get this string written whenever Alt-6 is pressed, write (reusing
some of the previous command line)
KEY a6 SEX=Unknown,Male,Female
The effect of an exclamation sign - which must be the last character
of the string parameter - is that the command is executed immediately.
In this case, the string must be a command, possibly with some
parameter substituted with a question mark, see below. Think of the
'!' as a code for the Return key. If the string is a COMPUTE command,
the command name can not be omitted. In this case the text in the
command window is overwritten, unless the string defined by KEYS has a
question mark somewhere.
If the KEYS string contains a question mark "?" somewhere, this is
substituted by the presently selected text in the command window.
However, if the string is a command (that is, ends with an exclamation
sign), the rule is different. In this case the entire contents of the
command window, with the exception of the leading blank (which has to
be there), will be inserted in place of the question mark.
EXAMPLES. After
KEY AP (?)
you can put a pair of parentheses around the cursor or the selected
text by Alt-P.
After
KEY AE SHELL NOTEPAD ?.ISU!
you will be able to edit a program with NotePad simply by writing its
name without extension (after a blank) in the command line and then
pressing Alt-E. After
KEY AV COMPUTE Variance(?)!
you will be able to display the variance of a variate by writing its
name (after a blank) and pressing Alt-V. Finally, the command
KEYS AL FITNONLINEAR ? exp(.)/(1+exp(.)) exp(.)/sqr(1+exp(.)) .*(1-.)!
will enable you to fit a logistic regression with overdispersion just
by writing the model formula (after a blank) in the command field and
then pressing Alt-L.
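The rules for the exclamation sign and the question mark can be
summarized by the following Python sketch. It is a model of the rules
as stated above, not ISUW's own code. The function name is invented
for the illustration, and two points are assumptions: that only the
first question mark is replaced, and that the contents of the command
window are used unchanged if the leading blank is missing.

```python
def expand_key(definition, window_text="", selection=""):
    """Return (text, execute) for a programmed key.

    definition  - the string given in the KEYS command
    window_text - present contents of the command window
    selection   - presently selected (marked) text, possibly empty
    """
    if definition.endswith("!"):
        # A command: '!' works like the Return key.
        command = definition[:-1]
        if "?" in command:
            # The whole command window, except the leading blank,
            # takes the place of the question mark.
            argument = window_text
            if argument.startswith(" "):
                argument = argument[1:]
            command = command.replace("?", argument, 1)
        return command, True
    # Not a command: the string is inserted at the cursor, with the
    # selected text in place of the question mark, if any.
    return definition.replace("?", selection, 1), False


print(expand_key("SHELL NOTEPAD ?.ISU!", window_text=" PRG"))
print(expand_key("(?)", selection="a+b"))
```

The first call models Alt-E after writing PRG (after a blank) in the
command field, and the second models Alt-P with the text a+b selected.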
To get a list of presently active shortkeys write
KEYS
(without parameters).
Your present shortkey definitions are automatically saved on exit and
recovered next time you start an ISUW session under the same root
directory. You can, however,
- add the startup shortkey configuration (if you have destroyed it) by
KEY +
- delete a single shortkey function by programming it as an empty string,
KEY keyname
- delete all shortkey functions by
KEY -
Notice that if you use the last command and then terminate the session,
the startup configuration is lost. If you have some shortkey definitions
that you want to keep safely, write a program KEYS.ISU like the
following.
KEY -
KEY F10 SHOW! { the output file is displayed by F10. A }
{ natural choice, since this is what happens }
{ when you press F10 from the editor. }
KEY AA EDIT AUTOEXEC.ISU! { edit autoexec program by Alt-A }
KEY AR RAMSTATUS! { a RAMSTATUS is shown when Alt-R is pressed }
KEY AD COMP sqrt(Variance(?))! { display s.d. of a variate by }
{ writing its name and pressing Alt-D }
KEY AX QUIT! { exit with Alt-X }
KEY AP (?) { surround selected text with parentheses by }
{ Alt-P }
KEY F2 SHELL ?.PS! { open postscript file by Windows default }
{ device by writing its name without }
{ extension and pressing F2 }
KEY AN SHELL ezlearn.cbs.dk/stat/hamat-2/tt/! { open ISUW site }
{ by Alt-N. If you download ISUWINST.EXE, }
{ don't execute it while the session is open.}
KEY AT SHELL C:\EXE\WINT.EXE! { open WinT - Desk Calculator }
{ with Statistical Tables - by Alt-T }
KEY AK KEYS! { display present shortkeys by Alt-K }
KEY CK EDIT C:\ISUW\KEYS! { edit (and execute by F9, if desired)}
{ this file by Ctrl-K }
...
Save this program on a suitable working directory or (as an exclusive
exception) on the ISUW root directory.
Local shortkeys - i.e. shortkeys you want to be active only on a
particular working directory - can conveniently be placed as KEY
commands in an AUTOEXEC.ISU program on that directory.
********************************************************************************
ISUW - the Windows version of Interactive StatUnit ISU - is a Delphi 5
application, based on a collection of Turbo Pascal (later Borland
Pascal) units for statistical analysis, developed from around 1990.
ISUW is public domain software. Accordingly, I take no responsibility
for errors in the program; but I would certainly like to hear about
them, and correct them if I can.
The latest version of ISUW can be downloaded from my download page
ezlearn.cbs.dk/stat/hamat-2/tt/
where other useful stuff can also be found.
Tue Tjur e-mail tuetjur@cbs.dk
Copenhagen Business School
The Statistics Group
Solbjerg Plads 3
DK-2000 Frederiksberg
Denmark
@@@ Copyright Tue Tjur 2006 @@@
======================================================== End of ISUWHELP.TXT ===