BOXCHART Statement

Example 32.1: Using Box Charts to Compare Subgroups

See SHWBOX4 in the SAS/QC Sample Library

In this example, a box chart is used to compare the delay times for airline flights during the Christmas holidays with the delay times prior to the holiday period. The following statements create a data set named TIMES with the delay times in minutes for 25 flights each day. When a flight is cancelled, the delay is recorded as a missing value.

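   data times;
      informat day date7. ;
      format   day date7. ;
      input day @ ;            /* read the date and hold the line    */
      do flight=1 to 25;
         input delay @ ;       /* read one delay (. = cancelled)     */
         output;               /* write one observation per flight   */
      end;
   datalines;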
   16DEC88   4  12   2   2  18   5   6  21   0   0   0  14   3
             .   2   3   5   0   6  19   7   4   9   5  10
   17DEC88   1  10   3   3   0   1   5   0   .   .   1   5   7
             1   7   2   2  16   2   1   3   1  31   5   0
   18DEC88   7   8   4   2   3   2   7   6  11   3   2   7   0
             1  10   2   3  12   8   6   2   7   2   4   5
   19DEC88  15   6   9   0  15   7   1   1   0   2   5   6   5
            14   7  20   8   1  14   3  10   0   1  11   7
   20DEC88   2   1   0   4   4   6   2   2   1   4   1  11   .
             1   0   6   5   5   4   2   2   6   6   4   0
   21DEC88   2   6   6   2   7   7   5   2   5   0   9   2   4
             2   5   1   4   7   5   6   5   0   4  36  28
   22DEC88   3   7  22   1  11  11  39  46   7  33  19  21   1
             3  43  23   9   0  17  35  50   0   2   1   0
   23DEC88   6  11   8  35  36  19  21   .   .   4   6  63  35
             3  12  34   9   0  46   0   0  36   3   0  14
   24DEC88  13   2  10   4   5  22  21  44  66  13   8   3   4
            27   2  12  17  22  19  36   9  72   2   4   4
   25DEC88   4  33  35   0  11  11  10  28  34   3  24   6  17
             0   8   5   7  19   9   7  21  17  17   2   6
   26DEC88   3   8   8   2   7   7   8   2   5   9   2   8   2
            10  16   9   5  14  15   1  12   2   2  14  18
   ;

First, the MEANS procedure is used to count the number of cancelled flights for each day. This information is then added to the data set TIMES.

   proc means data=times noprint;
      var delay;
      by day ;
      output out=cancel nmiss=ncancel;   /* NCANCEL = missing delays per day */
   run;

   data times;
      merge times cancel;
      by day;
   run;
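
As a quick check on this step, the CANCEL data set can be listed before moving on. The following is a minimal sketch rather than part of the original sample program; it assumes the CANCEL data set created above is still available. Because the data lines contain missing values only on December 16 (one), December 17 (two), December 20 (one), and December 23 (two), NCANCEL should take only the values 0, 1, and 2.

   proc print data=cancel noobs;
      var day ncancel;
   run;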

The following statements create a data set named WEATHER that contains information about possible causes for delays. This data set is merged with the data set TIMES.

   data weather;
      informat day date7. ;
      format   day date7. ;
      length reason $ 16 ;
      input day flight reason & ;   /* & allows single embedded blanks in REASON */
   datalines;
   16DEC88  8   Fog
   17DEC88  18  Snow Storm
   17DEC88  23  Sleet
   21DEC88  24  Rain
   21DEC88  25  Rain
   22DEC88  7   Mechanical
   22DEC88  15  Late Arrival
   24DEC88  9   Late Arrival
   24DEC88  22  Late Arrival
   ;

   data times;
      merge times weather;
      by day flight;
   run;
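
To confirm that the merge attached each reason to the intended flight, the matched observations can be listed. This is an illustrative sketch, not part of the original sample program; it uses only the variables defined above. Nine observations, one for each line of WEATHER, should have a nonmissing value of REASON.

   proc print data=times noobs;
      where reason ne ' ';
      var day flight delay reason;
   run;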

Next, control limits are established using the delays prior to the holiday period (December 16 through December 21).

   proc shewhart data=times;
      where day <= '21DEC88'D;
      boxchart delay * day /
         nochart
         stddeviations
         outlimits=timelim;
   run;

The OUTLIMITS= option names a data set (TIMELIM) that saves the control limits. The STDDEVIATIONS option specifies that the estimate of σ is to be calculated from subgroup standard deviations. This, in turn, affects the calculation of the control limits. The NOCHART option suppresses the display of the chart.
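
For reference, the usual form of these limits can be written out. The following is a sketch of the standard limits for subgroup means, assuming the default multiple of three standard errors and the default unweighted estimator based on subgroup standard deviations. The symbols N, n_i, s_i, and c_4 are introduced here for illustration and do not appear in the original text: N is the number of subgroups (days) used, n_i is the number of nonmissing delays on day i, and s_i is the sample standard deviation for day i.

   \[
     \hat{\sigma} = \frac{1}{N} \sum_{i=1}^{N} \frac{s_i}{c_4(n_i)},
     \qquad
     c_4(n) = \sqrt{\frac{2}{n-1}} \, \frac{\Gamma(n/2)}{\Gamma((n-1)/2)}
   \]
   \[
     \mathrm{LCL}_i = \overline{\overline{X}} - 3 \, \frac{\hat{\sigma}}{\sqrt{n_i}},
     \qquad
     \mathrm{UCL}_i = \overline{\overline{X}} + 3 \, \frac{\hat{\sigma}}{\sqrt{n_i}}
   \]

Here \overline{\overline{X}} denotes the overall average of the delays in the subgroups used. Because cancelled flights reduce n_i on some days, limits of this form can differ slightly from day to day.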

The following statements create a box chart for the complete set of data using the control limits in TIMELIM:

   symbol1 v=plus     c=salmon;
   symbol2 v=square   c=vigb;
   symbol3 v=triangle c=vig;
   title 'Box Chart for Airline Delays';
   proc shewhart data=times limits=timelim ;
      boxchart delay * day = ncancel /
         stddeviations
         nohlabel
         nolegend
         symbollegend = legend1
         cboxes       = dagr
         cboxfill     = ywh
         cframe       = vligb
         cinfill      = ligr;
      legend1 label=('Cancellations:')
              cborder=black cframe=ligr;
      label delay = 'Delay in Minutes';
   run;

The box chart is shown in Output 32.1.1. The level of the symbol-variable NCANCEL determines the symbol marker for each subgroup mean, and the SYMBOLLEGEND= option controls the appearance of the legend for the symbols. The NOHLABEL option suppresses the label for the horizontal axis, and the NOLEGEND option suppresses the default legend for subgroup sample sizes.

Output 32.1.1: Box Chart for Airline Data
[Figure: boxex1.gif]

The delay distributions from December 22 through December 25 are drastically different from the delay distributions during the pre-holiday period. Both the mean delay and the variability of the delays are much greater during the holiday period.
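
The comparison can also be checked numerically. As a rough sketch (not part of the original sample program), the following step tabulates the number of nonmissing delays, the mean, and the standard deviation for each day, so that the days from December 22 through December 25 can be compared with the earlier days. The title text is an arbitrary choice.

   title 'Daily Summary of Airline Delays';
   proc means data=times n mean std maxdec=1;
      class day;
      var delay;
   run;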


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.