pspar program

PSPAR: Sparse Matrix Version of PSTAR

Andrew Seary
Simon Fraser University
15 - May - 1999

This is sparse documentation for PSPAR, the sparse matrix version of p*.

Download psparw32.zip: the 32-bit Win95/NT version, with sample data and output files.

If you download these programs, please send a brief message to either one of us to let us know about your experience with them. Send mail to: Andrew Seary seary@sfu.ca or Bill Richards richards@sfu.ca

The programs on this page were updated on June 11, 1999. Previous versions read only the ID numbers of the sender and receiver of each link. They did not read a third number that describes the presence/absence (or strength) of the link; they assumed that the data file contained only a list of the links that are present in the data.
I also updated adj2neg (available in the Utility Programs section) today. Previous versions could handle matrices with up to 400 rows/columns. They also created a line in the data file for each entry in the matrix, regardless of whether it was a 0, a 1, or something else. Data files created with the earlier version of adj2neg would thus include lines for pairs of nodes that had a 0 in the adjacency matrix. Although adj2neg put the number that was in row i, column j of the adjacency matrix after the ID numbers for node i and node j, previous versions of pspar ran into a problem when analyzing this data.

The problem is that previous versions of pspar did not check to see if the number after the ID numbers was non-zero. This means that the "0" elements in the adjacency matrix were treated the same as the "1" elements.

The new version of adj2neg (the one available now) differs from the previous version in two ways:

1) it can handle networks with up to 2,000 rows/columns (previous versions only up to 400);

2) it asks you if you want it to include lines in the data file for "0" entries in the adjacency matrix. If you answer "n," it will only include lines for non-zero entries in the adjacency matrix. This is the option you should choose unless you want access to the "0" entries in your analysis.
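
The conversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ADJ2NEG source: the function name `adjacency_to_links` and the `include_zeros` flag are made up for this example, and node IDs are assumed to be the 1-based row/column positions.

```python
def adjacency_to_links(matrix, include_zeros=False):
    """Return (sender, receiver, value) triples from a square adjacency matrix.

    Assumption: node IDs are the 1-based row and column positions.
    """
    links = []
    for i, row in enumerate(matrix, start=1):
        for j, value in enumerate(row, start=1):
            # Keep zero entries only when the caller asks for them.
            if value != 0 or include_zeros:
                links.append((i, j, value))
    return links


adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]

for sender, receiver, value in adjacency_to_links(adjacency):
    print(sender, receiver, value)
```

With `include_zeros=False` (the "n" answer), the 3 x 3 matrix above yields three lines instead of nine.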

The new version of pspar (the one available on this page) differs from the previous one in one important way: it reads three numbers from each line of data. The first two are the ID numbers of the row ("from") and column ("to") nodes. The third number tells whether the link was present ("1" in the adjacency matrix) or absent ("0" in the adjacency matrix). If the third number is "0", that line of data is ignored.
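
As a rough illustration of this reading rule, here is a minimal Python sketch. It is not the PSPAR code (which is Fortran); the function name `read_links` is hypothetical, and skipping blank or short lines is an assumption, since the handling of malformed lines is not documented here.

```python
def read_links(lines):
    """Parse 'sender receiver value' lines, ignoring lines whose value is 0."""
    links = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: blank or incomplete lines are skipped
        sender, receiver, value = (int(f) for f in fields[:3])
        if value != 0:
            links.append((sender, receiver))
    return links


sample = ["1 2 1", "1 3 0", "2 1 1", ""]
print(read_links(sample))  # [(1, 2), (2, 1)]
```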

wdr
Vancouver

PSPAR.EXE is a preliminary offering:

- it comes in two flavours:
  - the 32-bit Win95/NT version can handle up to 2,000 nodes and 500,000 links;
  - the 32-bit OS/2 version can handle up to 2,000 nodes and 500,000 links;
- both do stand-alone logistic regression;
- they fit to many of the statistics produced by PREPSTAR and more (not the p1 alphas and betas, which don't make sense for large networks). However, they DO fit to all 15 (non-trivial) triads, and to comparative networks;
- they produce output similar to that shown in the Connections article by Crouch and Wasserman;
- they do not require PREPSTAR or any other pre-processing programs or files;
- they do not require SPSS, SAS, BMDP, or any other analytic packages.

PSPAR was designed both

- to handle large networks, AND
- to make p* fitting easy to use.

Both the Win95/NT and the OS/2 versions are 32-bit, compiled with GNU F77, and can deal with networks that have up to 2,000 nodes and 500,000 links.

PSPAR is NOT a pre-processor

PSPAR is not a program you run as a first step to prepare your data for other analysis. With PSPAR, you do NOT need SAS, SPSS, or any other package to do the regression analysis.

PSPAR does everything in one step. It does the complete logistic regression.

PSPAR does not require or produce huge files. In particular, it does NOT use adjacency matrices. If your data is in adjacency matrix format, you can convert it to the required format by using ADJ2NEG.EXE, available at Bill Richards' web site in the "Utility Programs" section.

Network input files are NEGOPY-style link lists. There is one line of data in the file for each link. Each line of data contains the ID number of the "sender", the ID number of the "receiver", and a "1" to indicate the presence of a link. Examples are included in pspar.zip, with the extension .NEG.

To see a sample .NEG file, click here.

For large sparse networks, this is a much more efficient representation (and much easier to create, check, and edit) than adjacency matrices. NEGOPY-style files are simplified versions of the .LNK files used by FATCAT and MultiNet.
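
To put a number on the efficiency claim, the short Python calculation below compares the count of adjacency matrix cells with the count of link-list lines. The function name `storage_cells` is made up for this example; the 2,000 and 500,000 figures are the size limits stated on this page, and the 10,000-link case is just an illustrative sparse network.

```python
def storage_cells(nodes, links):
    """Return (adjacency matrix cells, link-list lines) for a network."""
    return nodes * nodes, links


# At the documented limits: 2,000 nodes and 500,000 links.
matrix_cells, link_lines = storage_cells(2000, 500_000)
print(matrix_cells, link_lines, matrix_cells / link_lines)  # 4000000 500000 8.0

# A sparser (hypothetical) network: 2,000 nodes and 10,000 links.
matrix_cells, link_lines = storage_cells(2000, 10_000)
print(matrix_cells / link_lines)  # 400.0
```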

Blocking is accomplished with attribute files which list node ID numbers and attributes.

There are example attribute files in pspar.zip, with extension .ATR. (These are simplified versions of the .IND files used by FATCAT and MultiNet.)
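
The Python sketch below shows one way blocking by attribute could work. Several things here are assumptions rather than documented behaviour: that each attribute line holds a node ID followed by integer attribute values, that categories are coded 1, 2, ..., and that the rows of a block structure index the sender's category and the columns the receiver's. The function names are hypothetical; the 2 x 2 block structure is the one entered in the second sample run on this page.

```python
def read_attributes(lines):
    """Map node ID -> list of integer attributes (the ID is the first field)."""
    attributes = {}
    for line in lines:
        fields = [int(f) for f in line.split()]
        if fields:
            attributes[fields[0]] = fields[1:]
    return attributes


def block_type(block_structure, attributes, sender, receiver, attr_index=0):
    """Look up the block type for a sender/receiver pair.

    Assumption: rows index the sender's category, columns the receiver's.
    """
    row = attributes[sender][attr_index] - 1
    col = attributes[receiver][attr_index] - 1
    return block_structure[row][col]


atr_lines = ["1 1", "2 1", "3 2"]  # made-up attribute data
attrs = read_attributes(atr_lines)
structure = [[1, 3],
             [4, 2]]
print(block_type(structure, attrs, 1, 2))  # 1
print(block_type(structure, attrs, 1, 3))  # 3
print(block_type(structure, attrs, 3, 1))  # 4
```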

A comparative network can be any .NEG file.

Sample output files are included in pspar.zip, with extension .OUT

CURRENT RESTRICTIONS:

1. Interaction with the program and error-handling are currently rudimentary.

2. For PSPAR, the .NEG files are assumed to be sorted by ID number: first by sender ID and, within each sender, by receiver ID.

like this        not like this
            
   1 2               5 5
   1 3               5 2
   1 7               2 4
   1 9               1 3
   2 1               2 3
   2 4               2 1
   2 6               1 7
   2 8               1 2
   3 4               3 5
   3 5               1 9
    :                 :
    :                 :

This is the format automatically produced by ADJ2NEG
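
If a link list was produced some other way, it can be checked and sorted before it is given to PSPAR. The Python sketch below assumes, based on the "like this" column above, that the required order is ascending by sender ID and then by receiver ID. The function names are made up for this example.

```python
def is_sorted(links):
    """True if links are in ascending (sender, receiver) order."""
    return all(a <= b for a, b in zip(links, links[1:]))


def sort_links(links):
    """Return the links sorted by sender ID, then by receiver ID."""
    return sorted(links)  # tuples compare element by element


unsorted_links = [(5, 5), (5, 2), (2, 4), (1, 3), (2, 3)]
print(is_sorted(unsorted_links))  # False

ordered = sort_links(unsorted_links)
print(ordered)             # [(1, 3), (2, 3), (2, 4), (5, 2), (5, 5)]
print(is_sorted(ordered))  # True
```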

3. Only integer data can be read from attribute and comparative files.

4. The following are current size limits:

In Win95, NT, OS/2:
- 2,000 nodes maximum.
- 500,000 links maximum.
- A maximum of 64 parameters may be fit.
- 8 blocking attributes maximum. Each attribute may have up to 16 categories, so 16 x 16 blocks maximum.
- 16 block types maximum.
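
Because error-handling in the program is rudimentary, it may help to check a data set against these limits beforehand. The Python sketch below is a hypothetical pre-check, not part of PSPAR. It assumes the node count is the number of distinct IDs appearing in the link list, which may differ from how the program itself counts nodes.

```python
MAX_NODES = 2000
MAX_LINKS = 500_000
MAX_PARAMETERS = 64
MAX_ATTRIBUTES = 8


def check_limits(links, n_parameters=0, n_attributes=0):
    """Return one message per documented limit that the input exceeds."""
    problems = []
    # Assumption: the node count is the number of distinct IDs seen in links.
    nodes = {node for sender, receiver in links for node in (sender, receiver)}
    if len(nodes) > MAX_NODES:
        problems.append(f"{len(nodes)} nodes exceeds {MAX_NODES}")
    if len(links) > MAX_LINKS:
        problems.append(f"{len(links)} links exceeds {MAX_LINKS}")
    if n_parameters > MAX_PARAMETERS:
        problems.append(f"{n_parameters} parameters exceeds {MAX_PARAMETERS}")
    if n_attributes > MAX_ATTRIBUTES:
        problems.append(f"{n_attributes} attributes exceeds {MAX_ATTRIBUTES}")
    return problems


print(check_limits([(1, 2), (2, 3)], n_parameters=5, n_attributes=1))  # []
print(check_limits([(1, 2)], n_parameters=65))  # ['65 parameters exceeds 64']
```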

Larger versions which have most of these restrictions removed are in the works. They will do more with p*, too. But first...

We WELCOME any questions or comments.

PLEASE send email to:

Andrew Seary seary@sfu.ca or Bill Richards richards@sfu.ca

This sparse documentation will be expanded along with the program.

Here is a sample run of the program, using the class4 data from the p* home page. To see a complete output file, click here.

----------------------------------------------------------
D:>pspar
Sparse Matrix p*
by Andrew Seary (March, 1999)

Enter name of network file: class4.neg
Include diagonal (y or n)? y

Fit to block parameters (y or n)? y
Enter name of attribute file: class4.atr
How many attributes (not including id)? 1

Enter name of output file: class4.out

Reading class4.neg               .... 
Enter attribute number for blocking (1- 1): 1

 1  0
 0  1
Accept this block structure? (y or n): y

Select from
Edges:     1) i->j,         REdges:          2) i<>j
2Stars:    3) k<-i->j,      4) k->i<-j,      5) k->i->j
Triads:    6) i->j->k<-i,   7) i->j->k->i,
R2Stars:   8) k<>i->j,      9) k<>i<-j,     10) k<>i<>j
RTriads:  11) i<>j->k<-i,  12) i<>j<-k->i,  13) i<>j<-k<-i
          14) i<>j<>k<-i,  15) i<>j<>k<>i
Comparative network:       16)

Add 100 for correponding block parameter

How many parameters? 5
Enter parameter numbers: 1 101 2 102 6
Pass  1.. 2.. 3.. 4.. 5.. 6.. Final

-2 Log Likelihood =              435.405
Goodness of Fit   =              483.413
Model Chisquare =                363.101     df =   5

              Fit       % Correct      Residuals
Data      375      40       90.36      Absolute            138.0584
           60     101       62.73      Squared              70.0653
              Overall       82.64

Parameter   Block           b        S.E.        Wald      exp(b)
       1              -3.5750       .3021    140.0862       .0280
       1        1       .4370       .3642      1.4394      1.5481
       2               1.2794       .5225      5.9960      3.5944
       2        1       .3397       .5983       .3223      1.4045
       6                .2769       .0375     54.5833      1.3190

Continue? (y or n): y
Same files? (y or n): y
Same blocking? y

Select from
Edges:     1) i->j,         REdges:          2) i<>j
2Stars:    3) k<-i->j,      4) k->i<-j,      5) k->i->j
Triads:    6) i->j->k<-i,   7) i->j->k->i,
R2Stars:   8) k<>i->j,      9) k<>i<-j,     10) k<>i<>j
RTriads:  11) i<>j->k<-i,  12) i<>j<-k->i,  13) i<>j<-k<-i
          14) i<>j<>k<-i,  15) i<>j<>k<>i
Comparative network:       16)

Add 100 for correponding block parameter

How many parameters?

         :
         :
-----------------------------------------
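
The parameter numbers entered in the run above (1 101 2 102 6) follow the "add 100" rule from the menu. The Python sketch below decodes such numbers. The table is copied from the menu; the function name `decode` is made up for this example.

```python
STATISTICS = {
    1: "Edges: i->j",
    2: "REdges: i<>j",
    3: "2Stars: k<-i->j",
    4: "2Stars: k->i<-j",
    5: "2Stars: k->i->j",
    6: "Triads: i->j->k<-i",
    7: "Triads: i->j->k->i",
    8: "R2Stars: k<>i->j",
    9: "R2Stars: k<>i<-j",
    10: "R2Stars: k<>i<>j",
    11: "RTriads: i<>j->k<-i",
    12: "RTriads: i<>j<-k->i",
    13: "RTriads: i<>j<-k<-i",
    14: "RTriads: i<>j<>k<-i",
    15: "RTriads: i<>j<>k<>i",
    16: "Comparative network",
}


def decode(number):
    """Split a menu number into (statistic name, is_block_parameter)."""
    is_block = number > 100
    base = number - 100 if is_block else number
    if base not in STATISTICS:
        raise ValueError(f"unknown parameter number: {number}")
    return STATISTICS[base], is_block


for n in [1, 101, 2, 102, 6]:
    name, is_block = decode(n)
    print(n, name, "(block)" if is_block else "(global)")
```

So 1 and 101 are the global and block versions of the edge parameter, which matches the two "Parameter 1" rows in the output table, one with and one without a block number.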

Here is another sample run, using the Vickers and Chan data from the Wasserman & Pattison paper.

-----------------------------------------
Enter name of network file: vcga.neg
Include diagonal (y or n)? n
Fit to block parameters (y or n)? y
Enter name of attribute file: vcga.atr

How many attributes (not including id)? 1
Enter name of output file: vcga.out
Reading vcga.neg                 ....
Enter attribute number for blocking (1- 1): 1

 1  0
 0  1
Accept this block structure? (y or n): n

Enter  2 rows, and  2 columns of block types between 0 and 16
Row  1: 1 3
Row  2: 4 2

Select from
Edges:     1) i->j,         REdges:          2) i<>j
2Stars:    3) k<-i->j,      4) k->i<-j,      5) k->i->j
Triads:    6) i->j->k<-i,   7) i->j->k->i,
R2Stars:   8) k<>i->j,      9) k<>i<-j,     10) k<>i<>j
RTriads:  11) i<>j->k<-i,  12) i<>j<-k->i,  13) i<>j<-k<-i
          14) i<>j<>k<-i,  15) i<>j<>k<>i
Comparative network:       16)

How many global parameters? 2
Enter parameter numbers: 2 6

Block structure:
 1  3
 4  2
Select parameter and number of blocks. 0 0 to quit.
Parameter, number of Blocks: 1 4
Parameter, number of Blocks: 0 0
Pass  1.. 2.. 3.. 4.. 5.. Final

-2 Log Likelihood =              752.992
Goodness of Fit   =              776.118
Model Chisquare =                372.679     df =   6

               Fit        % Correct    Residuals
Data      359      94       79.25      Absolute            246.9875
           98     261       72.70      Squared             124.0224
              Overall       76.35

Parameter   Block           b        S.E.        Wald      exp(b)
       2               1.3265       .1960     45.7888      3.7678
       6                .1319       .0125    111.7546      1.1410
       1        1     -2.2206       .2737     65.8364       .1085
       1        2     -3.1949       .3139    103.6289       .0410
       1        3     -2.9501       .2834    108.3339       .0523
       1        4     -4.3489       .3314    172.2432       .0129

Continue? (y or n): y
Same files? (y or n): y
Same blocking? y

Select from
Edges:     1) i->j,         REdges:          2) i<>j
2Stars:    3) k<-i->j,      4) k->i<-j,      5) k->i->j
Triads:    6) i->j->k<-i,   7) i->j->k->i,
R2Stars:   8) k<>i->j,      9) k<>i<-j,     10) k<>i<>j
RTriads:  11) i<>j->k<-i,  12) i<>j<-k->i,  13) i<>j<-k<-i
          14) i<>j<>k<-i,  15) i<>j<>k<>i
Comparative network:       16)

How many global parameters? 3
Enter parameter numbers: 2 6 16

Block structure:
 1  3
 4  2
Select parameter and number of blocks. 0 0 to quit.
Parameter, number of Blocks: 1 4
Parameter, number of Blocks: 0 0

Enter name of comparative file: vcww.neg
Pass  1.. 2.. 3.. 4.. 5.. Final

-2 Log Likelihood =              684.549
Goodness of Fit   =              796.228
Model Chisquare =                441.122     df =   7

               Fit        % Correct    Residuals
Data      391      62       86.31      Absolute            220.0587
          106     253       70.47      Squared             110.2376
              Overall       79.31

Parameter   Block           b        S.E.        Wald      exp(b)
       2               1.2144       .2056     34.8769      3.3684
       6                .1056       .0133     62.7295      1.1114
      16               2.1955       .3020     52.8469      8.9848
       1        1     -2.5192       .3036     68.8520       .0805
       1        2     -3.0657       .3276     87.5713       .0466
       1        3     -2.8777       .2971     93.8152       .0563
       1        4     -3.7990       .3309    131.8378       .0224
Continue? (y or n):
--------------------------------------------

Compare these results with W&P Table 6 (model 30) and Table 9 (model 35).
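
When reading the parameter tables, note that the printed columns appear to follow the usual logistic regression conventions: Wald is (b / S.E.) squared and exp(b) is the exponential of the coefficient. The Python sketch below recomputes both for the first three rows of the class4 run. Because b and S.E. are printed to four decimals, the recomputed Wald values agree with the printed ones only to within rounding.

```python
import math


def wald(b, se):
    """Wald statistic for a single coefficient: (b / S.E.) squared."""
    return (b / se) ** 2


# (b, S.E., printed Wald, printed exp(b)) from the class4 sample run.
rows = [
    (-3.5750, 0.3021, 140.0862, 0.0280),
    (0.4370, 0.3642, 1.4394, 1.5481),
    (1.2794, 0.5225, 5.9960, 3.5944),
]

for b, se, printed_wald, printed_exp in rows:
    print(f"Wald {wald(b, se):.3f} (printed {printed_wald}), "
          f"exp(b) {math.exp(b):.4f} (printed {printed_exp})")
```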

NOTE: The output files contain more information, including the covariance matrix.

For these two examples, look at class4.out and vcga.out, both included in pspar.zip.
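
The summary lines in the sample runs can also be checked by hand. The Python sketch below rests on two readings of the output that fit the printed numbers but are not stated in this documentation: that the rows of the "Fit" table are observed values and the columns fitted values, and that the model chi-square is the drop in -2 log likelihood from a model with all coefficients zero (every pair given probability 0.5). The function names are made up for this example.

```python
import math


def percent_correct(table):
    """Row-wise and overall percent correct for a 2 x 2 data-by-fit table."""
    (n00, n01), (n10, n11) = table
    total = n00 + n01 + n10 + n11
    return (100 * n00 / (n00 + n01),
            100 * n11 / (n10 + n11),
            100 * (n00 + n11) / total)


def model_chisquare(n_pairs, minus2ll):
    """Drop in -2 log likelihood from a model with all coefficients zero."""
    return 2 * n_pairs * math.log(2) - minus2ll


class4 = [[375, 40], [60, 101]]
n_pairs = sum(sum(row) for row in class4)  # 576 ordered pairs in the table

print([round(p, 2) for p in percent_correct(class4)])  # [90.36, 62.73, 82.64]
print(f"{model_chisquare(n_pairs, 435.405):.2f}")      # close to the printed 363.101
```

The same calculation with the 812 pairs of the first vcga run (359 + 94 + 98 + 261) and its -2 log likelihood of 752.992 comes out close to the printed 372.679.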