INFILE |
Valid: | in a DATA Step |
Category: | File-handling |
Type: | Executable |
Syntax |
INFILE file-specification <options> <host-options>; |
INFILE DBMS-specifications; |
Arguments |
Requirement: | You must have previously associated the fileref with an external file in a FILENAME statement, FILENAME function, or an appropriate operating environment command. |
See: | FILENAME |
Operating Environment Information: Different operating environments call an aggregate grouping of files by different names, such as a directory, a MACLIB, or a partitioned data set. For details on how to specify external files, see the SAS documentation for your operating environment.
Requirement: | You must have previously associated the fileref with an external file in a FILENAME statement, a FILENAME function, or an appropriate operating environment command. |
See: | FILENAME |
Alias: | DATALINES | DATALINES4 |
Alias: | CARDS | CARDS4 |
Featured in: | Changing How Delimiters are Treated |
Default: | dependent on the operating environment |
Operating Environment Information: For details, see the SAS documentation for your operating environment.
Alias: | COL= |
See Also: | LINE= |
Featured in: | Listing the Pointer Location |
Requirement: | Enclose the list of characters in quotation marks. |
Featured in: | Changing How Delimiters are Treated |
Alias: | DLM= |
Default: | blank space |
Tip: | DELIMITER= allows you to use list input even when the data are separated by characters other than spaces. |
See: | Reading Delimited Data |
See Also: | DSD option |
Featured in: | Changing How Delimiters are Treated |
Interaction: | Use the DELIMITER= option to change the delimiter. |
Tip: | Use the DSD option and list input to read a character value that contains a delimiter within a quoted string. The INPUT statement treats the delimiter as a valid character and removes the quotation marks from the character string before the value is stored. Use the tilde (~) format modifier to retain the quotation marks. |
See: | Reading Delimited Data |
See Also: | DELIMITER= |
Featured in: | Handling Missing Values and Short Records with List Input and Changing How Delimiters are Treated |
Restriction: | You cannot use the
END= option with
|
Tip: | Use the EOF= option when END= is invalid. |
Featured in: | Reading from Multiple Input Files |
Interaction: | Use EOF= instead
of the END= option with
|
Tip: | The EOF= option is useful when you read from multiple input files sequentially. |
See Also: | END=, EOV=, and UNBUFFERED |
Tip: | Reset the EOV= variable back to 0 after SAS encounters each boundary. |
See Also: | END= and EOF= |
Default: | NOEXPANDTABS |
Tip: | EXPANDTABS is useful when you read data that contain the tab character that is native to your operating environment. |
Tip: | Use a LENGTH statement to make the variable length long enough to contain the value of the filename. |
See Also: | FILEVAR= |
Featured in: | Reading from Multiple Input Files |
Restriction: | The FILEVAR= variable must contain a character string that is a physical filename. |
Interaction: | When you use the FILEVAR= option, the file-specification is just a placeholder, not an actual filename or a fileref that has been previously assigned to a file. SAS uses this placeholder for reporting processing information to the SAS log. It must conform to the same rules as a fileref. |
Tip: | Use FILEVAR= to dynamically change the currently opened input file to a new physical file. |
See Also: | Updating External Files in Place |
Featured in: | Reading from Multiple Input Files |
Default: | 1 |
Tip: | Use FIRSTOBS= with OBS= to read a range of records from the middle of a file. |
Example: | This statement processes record 50 through record 100:
infile file-specification firstobs=50 obs=100; |
See: | Reading Past the End of a Line |
See Also: | MISSOVER, STOPOVER, and TRUNCOVER |
Tip: | This option in conjunction with the $VARYING informat is useful when the field width varies. |
Featured in: | Reading Files That Contain Variable-Length Records and Truncating Copied Records |
Range: | 1 to the value of the N= option |
Interaction: | The value of the LINE= variable is the current relative line number within the group of lines that is specified by the N= option or by the #n line pointer control in the INPUT statement. |
See Also: | COLUMN= and N= |
Featured in: | Listing the Pointer Location |
Operating Environment Information: Values for line-size are dependent on the operating environment record size. For details, see the SAS documentation for your operating environment.
Alias: | LS= |
Range: | up to 32767 |
Interaction: | If an INPUT statement attempts to read past the column that is specified by the LINESIZE= option, the action that is taken depends on whether the FLOWOVER, MISSOVER, SCANOVER, STOPOVER, or TRUNCOVER option is in effect. FLOWOVER is the default. |
Tip: | Use LINESIZE= to limit the record length when you do not want to read the entire record. |
Example: | If your data lines contain a sequence number in columns 73 through 80, use this INFILE statement to restrict the INPUT statement to the first 72 columns:
infile file-specification linesize=72; |
Operating Environment Information: Values for logical-record-length are dependent on the operating environment. For details, see the SAS documentation for your operating environment.
Default: | dependent on the operating environment's file characteristics. |
Tip: | LRECL= specifies the physical line length of the file. LINESIZE= tells the INPUT statement how much of the line to read. |
Tip: | Use MISSOVER if the last field(s) may be missing and you want SAS to assign missing values to the corresponding variable. |
See: | Reading Past the End of a Line |
See Also: | FLOWOVER, SCANOVER, STOPOVER, and TRUNCOVER |
Featured in: | Handling Missing Values and Short Records with List Input |
Default: | the highest value following a # pointer control in any INPUT statement in the DATA step. If you omit a # pointer control, the default value is 1. |
Interaction: | This option affects only the number of lines that the pointer can access at a time; it has no effect on the number of lines an INPUT statement reads. |
Tip: | When you use # pointer controls in an INPUT statement that are less than the value of N=, you might get unexpected results. To prevent this, include a # pointer control that equals the value of the N= option. For example,
infile 'external file' n=5;
input #2 name : $25. #3 job : $25. #5;
The INPUT statement includes a #5 pointer control, even though no data are read from that record. |
Featured in: | Listing the Pointer Location |
Default: | the LRECL value of the file |
Interaction: | If the number of bytes to read is set to -1, the FTP and SOCKET access methods return the number of bytes that are currently available in the input buffer. |
See: | the FILENAME, SOCKET RECFM= option and the FILENAME, FTP RECFM= option |
Tip: | Use OBS= with FIRSTOBS= to read a range of records from the middle of a file. |
Example: | This statement processes only the first 100 records in the file:
infile file-specification obs=100; |
Default: | NOPAD |
See Also: | LRECL= |
Tip: | To read a print file in a DATA step without having to remove the carriage control characters, specify PRINT. To read the carriage control characters as data values, specify NOPRINT. |
Operating Environment Information: Values for record-format are dependent on the operating environment. For details, see the SAS documentation for your operating environment.
Interaction: | The MISSOVER, TRUNCOVER, and STOPOVER options change how the INPUT statement behaves when it scans for the @'character-string' expression and reaches the end of record. By default (FLOWOVER option), the INPUT statement scans the next record while these other options cause scanning to stop. |
Tip: | It is redundant to specify both SCANOVER and FLOWOVER. |
See: | Reading Past the End of a Line |
See Also: | FLOWOVER, MISSOVER, STOPOVER, and TRUNCOVER |
Alias: | SHAREBUFS |
Tip: | Use SHAREBUFFERS with the INFILE, FILE, and PUT statements to update an external file in place. This saves CPU time because the PUT statement output is written straight from the input buffer instead of the output buffer. |
Tip: | Use SHAREBUFFERS to update specific fields in an external file instead of an entire record. |
Featured in: | Updating an External File |
See Also: | _INFILE_ option in the PUT statement |
Tip: | Use FLOWOVER to reset the default behavior. |
See: | Reading Past the End of a Line |
See Also: | FLOWOVER, MISSOVER, SCANOVER, and TRUNCOVER |
Featured in: | Handling Missing Values and Short Records with List Input |
Tip: | Use TRUNCOVER to assign the contents of the input buffer to a variable when the field is shorter than expected. |
See: | Reading Past the End of a Line |
See Also: | FLOWOVER, MISSOVER, SCANOVER, and STOPOVER |
Alias: | UNBUF |
Interaction: | When you use UNBUFFERED, SAS never sets the END= variable to 1. |
Tip: | When you read in-stream data with a DATALINES statement, UNBUFFERED is in effect. |
Restriction: | variable cannot be a previously defined variable. Make sure that the _INFILE_= specification is the first occurrence of this variable in the DATA step. Do not set or change the length of the _INFILE_= variable with the LENGTH or ATTRIB statements. However, you can attach a format to this variable with the ATTRIB or FORMAT statement. |
Interaction: | The maximum length of this character variable is the logical record length (LRECL=) for the specified INFILE statement. However, SAS does not open the file to know the LRECL= until prior to the execution phase. Therefore, the designated size for this variable during the compilation phase is 32,767. |
Tip: | Modification of this variable directly modifies the INFILE statement's current input buffer. Any PUT _INFILE_ (when this INFILE is current) that follows the buffer modification reflects the modified buffer contents. The _INFILE_= variable accesses only the current input buffer of the specified INFILE statement even if you use the N= option to specify multiple buffers. |
Tip: | To access the contents of the input buffer in another statement without using the _INFILE_= option, use the automatic variable _INFILE_. |
Main Discussion: | Accessing the Contents of the Input Buffer |
Operating Environment Information: For descriptions of operating environment-specific options in the INFILE statement, see the SAS documentation for your operating environment.
Details |
Operating Environment Information: The INFILE statement contains operating environment-specific material. See the SAS documentation for your operating environment before using this statement.
Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records. You can use the INFILE statement in conditional processing, such as an IF-THEN statement, because it is executable. This allows you to control the source of the input data records.
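Because INFILE is executable, the source of each record can be chosen at run time. The following is a minimal sketch and is not one of the original examples: the filerefs NORTH and SOUTH, the sample records, and the use of the TEMP device type in the FILENAME statement are all assumptions made so that the program is self-contained.

```sas
/* Hypothetical filerefs; the TEMP device type is assumed to be available. */
filename north temp;
filename south temp;

/* Create two small sample files (illustrative data only). */
data _null_;
   file north;
   put 'N1 10' / 'N2 20';
   file south;
   put 'S1 30' / 'S2 40';
run;

/* Odd iterations read from NORTH, even iterations read from SOUTH. */
/* Each file keeps its own position between iterations.             */
data sales;
   if mod(_n_, 2) = 1 then infile north;
   else infile south;
   input id $ amount;
run;
```

In this sketch the DATA step ends when an INPUT statement attempts to read past the end of either file.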
Usually, you use an INFILE statement to read data from an external file. When data are read from the job stream, you must use a DATALINES statement. However, to take advantage of certain data-reading options that are available only in the INFILE statement, you can use an INFILE statement with the file-specification DATALINES and a DATALINES statement in the same DATA step.
When you use more than one INFILE statement for the
same file-specification and you use options in each INFILE statement, the
effect is additive. To avoid confusion, use all the options in the first
INFILE statement for a given external file.
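You can read from multiple input files in a single iteration of the DATA step in one of two ways: use multiple INFILE statements to keep several files open and switch the file that is read, or use the FILEVAR= option to dynamically change the file that is currently being read.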
To update individual fields within a record instead of the entire record, see SHAREBUFFERS.
In addition to the _INFILE_= variable, you can use the automatic _INFILE_ variable to reference the contents of the current input buffer for the most recent execution of the INFILE statement. This character variable is automatically retained and initialized to blanks. Like other automatic variables, _INFILE_ is not written to the data set.
When you specify the _INFILE_= option in an INFILE statement, this variable is also indirectly referenced by the automatic _INFILE_ variable. If the automatic _INFILE_ variable is present and you omit _INFILE_= in a particular INFILE statement, then SAS creates an internal _INFILE_= variable for that INFILE statement. Otherwise, SAS does not create the _INFILE_= variable for a particular file.
During execution and at the point of reference, the maximum length of this character variable is the maximum length of the current _INFILE_= variable. However, because _INFILE_ merely references other variables whose lengths are not known until prior to the execution phase, the designated length is 32,767 during the compilation phase. For example, if you assign _INFILE_ to a new variable whose length is undefined, the default length of the new variable is 32,767. You cannot use the LENGTH statement or the ATTRIB statement to set or override the length of _INFILE_. You can use the FORMAT statement and the ATTRIB statement to assign a format to _INFILE_.
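Because the compile-time length of _INFILE_ is 32,767, a new variable that receives its value takes that length unless the variable is declared first. The following is a minimal sketch; the variable name LINE, the data set name, and the 80-byte width are assumptions chosen to suit the sample records.

```sas
data records;
   infile datalines;
   length line $ 80;   /* declare LINE first; otherwise its length defaults to 32,767 */
   input;              /* load the next record into the input buffer */
   line = _infile_;    /* copy the buffer contents into LINE */
   datalines;
first record
second record
;
```

Declaring the length of the receiving variable, rather than the length of _INFILE_ itself, stays within the restriction on the LENGTH and ATTRIB statements.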
As with other SAS variables, you can update the _INFILE_ variable in an assignment statement. You can also use a format with _INFILE_ in a PUT statement. For example,
put _infile_ $hex100.;
outputs the contents of the input buffer using a hexadecimal format.
Any modification of _INFILE_ directly modifies the current input buffer for the current INFILE statement. The execution of any PUT _INFILE_ statement that follows this buffer modification reflects the contents of the modified buffer.
_INFILE_ accesses only the contents of the current input buffer for an INFILE statement, even when you use the N= option to specify multiple buffers. You can access all the N= buffers, but you must use an INPUT statement with the # line pointer control to make the desired buffer the current input buffer.
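Modifying the buffer is often used to clean a record before it is parsed. The sketch below is illustrative only: the data set name, the variable names, and the slash-separated sample records are assumptions.

```sas
data cleaned;
   infile datalines;
   input @;                                   /* load the record and hold it */
   _infile_ = translate(_infile_, ' ', '/');  /* replace each slash with a blank */
   input a b c;                               /* reread the modified buffer with list input */
   datalines;
1/2/3
4/5/6
;
```

The first INPUT statement uses a trailing @ so that the record stays in the buffer, and the second INPUT statement reads the values from the modified buffer rather than from the original record.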
To read a value as missing between two consecutive delimiters, use the DSD option. By default, the INPUT statement treats consecutive delimiters as a unit. When you use DSD, the INPUT statement treats consecutive delimiters separately. Therefore, a value that is missing between consecutive delimiters is read as a missing value. To change the delimiter from a comma to another value, use the DELIMITER= option.
For example, this DATA step program uses list input to read data that are separated with commas. The second data line contains a missing value. Because SAS allows consecutive delimiters with list input, the INPUT statement cannot detect the missing value.
data scores;
   infile datalines delimiter=',';
   input test1 test2 test3;
   datalines;
91,87,95
97,,92
,1,1
;
With the FLOWOVER option in effect, the data set SCORES contains two, not three, observations. The second observation is built incorrectly:
OBS | TEST1 | TEST2 | TEST3 |
---|---|---|---|
1 | 91 | 87 | 95 |
2 | 97 | 92 | 1 |
To correct the problem, use the DSD option in the INFILE statement:
infile datalines dsd;
Now the INPUT statement detects the two consecutive delimiters and therefore assigns a missing value to variable TEST2 in the second observation.
OBS | TEST1 | TEST2 | TEST3 |
---|---|---|---|
1 | 91 | 87 | 95 |
2 | 97 | . | 92 |
3 | . | 1 | 1 |
NOTE: SAS went to a new line when INPUT @'CHARACTER_STRING' scanned past the end of a line.
The STOPOVER option treats this condition as an error and stops building the data set. The MISSOVER option sets the remaining INPUT statement variables to missing values. The SCANOVER option scans the input record until it finds the specified character-string. The FLOWOVER option restores the default behavior.
The TRUNCOVER option, like the MISSOVER option, overrides the default behavior of the INPUT statement. The MISSOVER option, however, causes the INPUT statement to set a value to missing if the statement is unable to read an entire field because the record ends before the field width that is specified in the INPUT statement is filled. The TRUNCOVER option instead assigns whatever characters were read to the appropriate variable so that you know what the input data record contained.
For example, an external file with variable-length records contains these records:
----+----1----+----2
1
22
333
4444
55555
The following DATA step reads these data to create a SAS data set. Only one of the input records is as long as the informatted length of the variable TESTNUM.
data numbers;
   infile 'external-file';
   input testnum 5.;
run;
This DATA step creates three observations from the five input records because by default the FLOWOVER option is used to read the input records.
If you use the MISSOVER option in the INFILE statement, the DATA step creates five observations. However, all the values that were read from records that were too short are set to missing. Use the TRUNCOVER option in the INFILE statement to correct this problem:
infile 'external-file' truncover;
The DATA step now reads the same input records and creates five observations. See The Value of TESTNUM Using Different INFILE Statement Options to compare the SAS data sets.
OBS | FLOWOVER | MISSOVER | TRUNCOVER |
---|---|---|---|
1 | 22 | . | 1 |
2 | 4444 | . | 22 |
3 | 55555 | . | 333 |
4 | | . | 4444 |
5 | | 55555 | 55555 |
Comparisons |
Examples |
By default, the INPUT statement uses a blank as the delimiter. This DATA step uses a comma as the delimiter:
data num;
   infile datalines dsd;
   input x y z;
   datalines;
,2,3
4,5,6
7,8,9
;
The argument DATALINES in the INFILE statement allows you to use an INFILE statement option to read in-stream data lines. The DSD option sets the comma as the default delimiter. Because a comma precedes the first value in the first data line, a missing value is assigned to variable X in the first observation, and the value 2 is assigned to variable Y.
If the data uses multiple delimiters or a single delimiter other than a comma, simply specify the delimiter values with the DELIMITER= option. In this example, the characters a and b function as delimiters:
data nums;
   infile datalines dsd delimiter='ab';
   input X Y Z;
   datalines;
1aa2ab3
4b5bab6
7a8b9
;
The output that PROC PRINT generates shows the resulting NUMS data set. Values are missing for variables in the first and second observation because DSD causes list input to detect two consecutive delimiters. If you omit DSD, the characters a, b, aa, ab, ba, or bb function as the delimiter and no variables are assigned missing values.
The NUMS Data Set
The SAS System                1

OBS    X    Y    Z
  1    1    .    2
  2    4    5    .
  3    7    8    9
This DATA step uses modified list input and the DSD option to read data that are separated by commas and that may contain commas as part of a character value:
data scores;
   infile datalines dsd;
   input Name : $9. Score Team : $25. Div $;
   datalines;
Joseph,76,"Red Racers, Washington",AAA
Mitchel,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,74,"Green Gazelles, Atlanta",AA
;
The output that PROC PRINT generates shows the resulting SCORES data set. The delimiter (comma) is stored as part of the value of TEAM while the quotation marks are not. To retain the quotation marks in character data, use the tilde (~) format modifier in the INPUT statement.
Data Set SCORES
The SAS System                                              1

OBS    NAME         SCORE    TEAM                       DIV
  1    Joseph          76    Red Racers, Washington     AAA
  2    Mitchel         82    Blue Bunnies, Richmond     AAA
  3    Sue Ellen       74    Green Gazelles, Atlanta    AA
data weather;
   infile datalines missover;
   input temp1-temp5;
   datalines;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;
SAS reads the three values on the first data line as the values of TEMP1, TEMP2, and TEMP3. The MISSOVER option causes SAS to set the values of TEMP4 and TEMP5 to missing for the first observation because no values for those variables are in the current input data record.
When you omit the MISSOVER option or use FLOWOVER, SAS moves the input pointer to line 2 and reads values for TEMP4 and TEMP5. The next time the DATA step executes, SAS reads a new line which, in this case, is line 3. This message appears in the SAS log:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
You can also use the STOPOVER option in the INFILE statement. This causes the DATA step to halt execution when an INPUT statement does not find enough values in a record of raw data:
infile datalines stopover;
Because SAS does not find a TEMP4 value in the first data record, it sets _ERROR_ to 1, stops building the data set, and prints data line 1.
data a;
   infile file-specification length=linelen;
   input firstvar 1-10 @;   /* assign LINELEN */
   varlen=linelen-10;       /* Calculate VARLEN */
   input @11 secondvar $varying500. varlen;
run;
The following occurs in this DATA step:
See the informat $VARYINGw. for more information.
data qtrtot(drop=jansale febsale marsale
                 aprsale maysale junsale);
   /* identify location of 1st file */
   infile file-specification-1;
   /* read values from 1st file */
   input name $ jansale febsale marsale;
   qtr1tot=sum(jansale,febsale,marsale);

   /* identify location of 2nd file */
   infile file-specification-2;
   /* read values from 2nd file */
   input @7 aprsale maysale junsale;
   qtr2tot=sum(aprsale,maysale,junsale);
run;
The DATA step terminates when SAS reaches an end-of-file on the shortest input file.
This DATA step uses FILEVAR= to read from a different file during each iteration of the DATA step:
data allsales;
   length fileloc myinfile $ 300;
   input fileloc $ ;   /* read instream data */

   /* The INFILE statement closes the current file */
   /* and opens a new one if FILELOC changes value */
   /* when INFILE executes */
   infile file-specification filevar=fileloc
          filename=myinfile end=done;

   /* DONE set to 1 when last input record read */
   do while(not done);
      /* Read all input records from the currently */
      /* opened input file, write to ALLSALES */
      input name $ jansale febsale marsale;
      output;
   end;
   put 'Finished reading ' myinfile=;
   datalines;
external-file-1
external-file-2
external-file-3
;
The FILENAME= option assigns the name of the current input file to the variable MYINFILE. The LENGTH statement ensures that the FILENAME= variable and FILEVAR= variable have a length long enough to contain the value of the filename. The PUT statement prints the physical name of the currently open input file to the SAS log.
data _null_;
   /* The INFILE and FILE statements */
   /* must specify the same file. */
   infile file-specification-1 sharebuffers;
   file file-specification-1;
   input state $ 1-2 phone $ 5-16;
   /* Replace area code for NC exchanges */
   if state= 'NC' and substr(phone,5,3)='333' then
      phone='910-'||substr(phone,5,8);
   put phone 5-16;
run;
data _null_;
   infile file-specification-1 length=a;
   input;
   a=a-20;
   file file-specification-2;
   put _infile_;
run;
The START= option is also useful when you want to truncate what the PUT _INFILE_ statement copies. For example, if you do not want to copy the first 10 columns of each record, these statements copy from column 11 to the end of each record in the input buffer:
data _null_;
   infile file-specification start=s;
   input;
   s=11;
   file file-specification-2;
   put _infile_;
run;
data _null_;
   infile datalines n=2 line=Linept col=Columnpt;
   input name $ 1-15 #2 @3 id;
   put linept= columnpt=;
   datalines;
J. Brooks
  40974
T. R. Ansen
  4032
;
These statements produce the following line for each execution of the DATA step because the input pointer is on the second line in the input buffer when the PUT statement executes:
Linept=2 Columnpt=9
Linept=2 Columnpt=8
See Also |
Statements:
|
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.