PyPI page: https://pypi.python.org/pypi/sas7bdat
Source: https://bitbucket.org/jaredhobbs/sas7bdat
Issue tracker: https://bitbucket.org/jaredhobbs/sas7bdat/issues
This module reads sas7bdat files using pure Python (2.6+, 3+). No SAS software required! (This is also my first contribution to PyPI, yay!) The project was originally based on the work done by Matt Shotwell and Clint Cummins in their R project found at https://github.com/BioStatMatt/sas7bdat but has since been completely rewritten.
Usage
To create a sas7bdat object, simply pass the constructor a file path. The SAS7BDAT instance is iterable so you can read the file contents like this:
from sas7bdat import SAS7BDAT

with SAS7BDAT('foo.sas7bdat') as f:
    for row in f:
        pass  # do something with row...
If you'd like to get a pandas DataFrame, use the to_data_frame method:
df = f.to_data_frame()
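When iterating, it is often convenient to pair each value with its column name. The following is a minimal sketch, assuming the first row yielded is the list of column names (check this against your installed version). rows_as_dicts is a hypothetical helper and not part of the package. It accepts any iterable of rows, so a plain list with made-up values stands in for an open file here:

```python
def rows_as_dicts(rows):
    """Yield one dict per data row, keyed by the names in the first row.

    Assumption: the first item of `rows` is the header (column names).
    """
    iterator = iter(rows)
    try:
        names = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for row in iterator:
        yield dict(zip(names, row))


# Stand-in for an open SAS7BDAT object; the values are made up.
sample_rows = [
    ['TRT', 'REP'],
    ['A', 1.0],
    ['B', 2.0],
]
records = list(rows_as_dicts(sample_rows))
```

With a real file, the open object would be passed instead, for example rows_as_dicts(f) inside the with block.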
The project also contains a standalone command line script, sas7bdat_to_csv, which converts sas7bdat files to csv files.
I've tested the script on almost 300 sample files I found on the internet and the conversion has been flawless. It works on both big- and little-endian files, as well as CHAR and BINARY compressed files. The script will batch convert files too; just give it a wildcard expression such as:
$ sas7bdat_to_csv *.sas7bdat
In my sample directory, the output looks like:
[Final_Candy.csv] wrote 75 of 75 lines
[Insurer_co.csv] wrote 3 of 3 lines
[OpenMed-04-e40-s003.csv] wrote 6000 of 6000 lines
[a14.csv] wrote 657 of 657 lines
[acadindx.csv] wrote 200 of 200 lines
[adults.csv] wrote 8360 of 8360 lines
[agents.csv] wrote 10 of 10 lines
[andy.csv] wrote 75 of 75 lines
[applican.csv] wrote 48 of 48 lines
[bangla.csv] wrote 34 of 34 lines
[bank8dte.csv] wrote 1000 of 1000 lines
[bank8dtr.csv] wrote 2000 of 2000 lines
[banks.csv] wrote 3 of 3 lines
[bbwt.csv] wrote 62 of 62 lines
[beef.csv] wrote 30 of 30 lines
...
[fileserrors.csv] wrote 160 of 160 lines
...
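The shell expands the wildcard before the script runs. A rough Python equivalent of planning such a batch is sketched below. plan_batch is a hypothetical helper, not part of the package, and the rule for naming outputs (replace the .sas7bdat extension with .csv) is inferred from the sample output above rather than taken from the script's source:

```python
import glob
import os


def plan_batch(pattern):
    """Return sorted (source, destination) pairs for files matching pattern.

    Assumption: each output keeps the input's base name and gets a .csv
    extension, as the sample output suggests.
    """
    pairs = []
    for src in sorted(glob.glob(pattern)):
        base, _ext = os.path.splitext(src)
        pairs.append((src, base + '.csv'))
    return pairs
```

Each pair could then be converted in turn, for example by opening the source with SAS7BDAT and writing its rows out with the standard csv module.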
The script also has an option to display the header and metadata from a sas7bdat file. Since the header information isn't compressed, this works even on compressed files:
$ sas7bdat_to_csv --header beef.sas7bdat fileserrors.sas7bdat
[beef.sas7bdat] Header:
column_count: 9
compression: None
date_created: 2006-06-06 13:52:23.224000
date_modified: 2006-06-06 13:52:23.224000
endianess: little
file_type: DATA
filename: beef.sas7bdat
header_length: 1024
mix_page_row_count: 86
name: BEEF
os_name:
os_type:
page_count: 1
page_length: 8192
platform: windows
row_count: 30
row_length: 72
sas_release: 9.0101M3
server_type: XP_PRO
u64: False
Contents of dataset "BEEF":
Num Name    Type   Length Format Label
--- ------- ------ ------ ------ -----
  1 TRT     string      3
  2 REP     number      8
  3 STORAGE number      8
  4 BEEFY   number      8
  5 BLOODY  number      8
  6 METAL   number      8
  7 GRASSY  number      8
  8 SOUR    number      8
  9 SPOILED number      8
[fileserrors.sas7bdat] Header:
column_count: 3
compression: SASYZCRL
date_created: 2010-10-09 12:14:49.414000
date_modified: 2010-10-09 12:14:49.414000
endianess: little
file_type: DATA
filename: fileserrors.sas7bdat
header_length: 1024
mix_page_row_count: 52
name: FILESERRORS
os_name:
os_type:
page_count: 4
page_length: 8192
platform: windows
row_count: 160
row_length: 130
sas_release: 9.0201M0
server_type: W32_VSPRO
u64: False
Contents of dataset "FILESERRORS":
Num Name  Type   Length Format Label
--- ----- ------ ------ ------ -----
  1 Name  string     32
  2 Path  string     90
  3 index string      8
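If you want to post-process the --header output in a script, the key: value lines are easy to pick apart. This is only a sketch: parse_header is a hypothetical helper, not part of the package, it assumes the layout matches the sample shown above, and it reads only the first file's header from the text it is given. All values are kept as strings:

```python
def parse_header(text):
    """Parse `key: value` lines from --header output into a dict of strings.

    Assumption: the layout matches the sample output shown above.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('Contents of dataset'):
            break  # the column table follows; stop here
        if not line or line.startswith('['):
            continue  # skip blank lines and the "[file] Header:" banner
        key, sep, value = line.partition(':')
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# A few lines copied from the beef.sas7bdat output above.
sample = """[beef.sas7bdat] Header:
column_count: 9
compression: None
date_created: 2006-06-06 13:52:23.224000
os_name:
row_count: 30
Contents of dataset "BEEF":
Num Name Type Length Format Label
"""
header = parse_header(sample)
```

Splitting on the first colon only keeps timestamps such as date_created intact, and empty fields such as os_name come back as empty strings.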