sas7bdat

PyPI page: https://pypi.python.org/pypi/sas7bdat
Source: https://bitbucket.org/jaredhobbs/sas7bdat
Issue tracker: https://bitbucket.org/jaredhobbs/sas7bdat/issues

This module reads sas7bdat files using pure Python (2.6+, 3+). No SAS software required! (This is also my first contribution to PyPI, yay!) The project was originally based on the work done by Matt Shotwell and Clint Cummins in their R project, found at https://github.com/BioStatMatt/sas7bdat, but has since been completely rewritten.


To create a SAS7BDAT object, pass the constructor a file path. The SAS7BDAT instance is iterable, so you can read the file contents like this:

from sas7bdat import SAS7BDAT
with SAS7BDAT('foo.sas7bdat') as f:
    for row in f:
        print(row)  # replace with your own processing
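
Because the instance is just an iterable of rows, it can be handed to any code that consumes rows. The following is a minimal Python 3 sketch, not part of the library: write_rows_to_csv is a hypothetical helper name, and the sample rows are stand-in data so the example runs without a sas7bdat file. It assumes each row is a list or tuple of values.

```python
import csv
import io


def write_rows_to_csv(rows, out):
    """Write every row from an iterable of sequences to `out` as CSV.

    `rows` can be any iterable that yields lists or tuples, for example an
    open SAS7BDAT instance. Returns the number of rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Stand-in data (hypothetical) so the sketch runs without a .sas7bdat file.
# With the real library, pass the open SAS7BDAT instance as `rows` instead.
sample_rows = [["TRT", "REP"], ["A", 1.0], ["B", 2.0]]
buffer = io.StringIO()
written = write_rows_to_csv(sample_rows, buffer)
```

With a real file, the call would look like write_rows_to_csv(f, out) inside the with block shown above, where out is a file opened for writing.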

If you’d like to get a pandas DataFrame, use the to_data_frame method:

with SAS7BDAT('foo.sas7bdat') as f:
    df = f.to_data_frame()

The project also contains a standalone command-line script, sas7bdat_to_csv, which converts sas7bdat files to CSV files.

I’ve tested the script on almost 300 sample files I found on the internet, and the conversion has been flawless. It works on both big- and little-endian files, as well as on CHAR and BINARY compressed files. The script will batch convert files too; just give it a wildcard expression such as:

$ sas7bdat_to_csv *.sas7bdat

In my sample directory, the output looks like:

[Final_Candy.csv] wrote 75 of 75 lines
[Insurer_co.csv] wrote 3 of 3 lines
[OpenMed-04-e40-s003.csv] wrote 6000 of 6000 lines
[a14.csv] wrote 657 of 657 lines
[acadindx.csv] wrote 200 of 200 lines
[adults.csv] wrote 8360 of 8360 lines
[agents.csv] wrote 10 of 10 lines
[andy.csv] wrote 75 of 75 lines
[applican.csv] wrote 48 of 48 lines
[bangla.csv] wrote 34 of 34 lines
[bank8dte.csv] wrote 1000 of 1000 lines
[bank8dtr.csv] wrote 2000 of 2000 lines
[banks.csv] wrote 3 of 3 lines
[bbwt.csv] wrote 62 of 62 lines
[beef.csv] wrote 30 of 30 lines
[fileserrors.csv] wrote 160 of 160 lines
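
A similar batch conversion can be sketched from Python 3 using the iterable interface. This is an illustration, not the script's actual implementation: csv_targets and convert_all are hypothetical names, and the rule that each output file reuses the input name with a .csv extension is inferred from the listing above. The demonstration at the end only exercises the name mapping, on empty placeholder files.

```python
import csv
import glob
import os
import tempfile


def csv_targets(pattern):
    """Pair each file matching `pattern` with a .csv path of the same stem."""
    pairs = []
    for src in sorted(glob.glob(pattern)):
        stem, _ext = os.path.splitext(src)
        pairs.append((src, stem + ".csv"))
    return pairs


def convert_all(pattern):
    """Convert every matching file by iterating it row by row (a sketch)."""
    # Imported here so the name-mapping helper works without the package.
    from sas7bdat import SAS7BDAT

    for src, dst in csv_targets(pattern):
        with SAS7BDAT(src) as f, open(dst, "w", newline="") as out:
            csv.writer(out).writerows(f)


# Demonstrate the name mapping only, using empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("beef.sas7bdat", "andy.sas7bdat"):
        open(os.path.join(tmp, name), "wb").close()
    demo = [
        (os.path.basename(src), os.path.basename(dst))
        for src, dst in csv_targets(os.path.join(tmp, "*.sas7bdat"))
    ]
```

Calling convert_all('*.sas7bdat') in a directory of real files would then write one CSV file next to each input.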

The script also has an option to display the header and metadata from a sas7bdat file. Since the header information isn’t compressed, this works even on compressed files:

$ sas7bdat_to_csv --header beef.sas7bdat fileserrors.sas7bdat
[beef.sas7bdat] Header:
    column_count: 9
    compression: None
    date_created: 2006-06-06 13:52:23.224000
    date_modified: 2006-06-06 13:52:23.224000
    endianess: little
    file_type: DATA
    filename: beef.sas7bdat
    header_length: 1024
    mix_page_row_count: 86
    name: BEEF
    page_count: 1
    page_length: 8192
    platform: windows
    row_count: 30
    row_length: 72
    sas_release: 9.0101M3
    server_type: XP_PRO
    u64: False

Contents of dataset "BEEF":
Num Name    Type   Length Format Label
--- ------- ------ ------ ------ -----
  1 TRT     string      3
  2 REP     number      8
  3 STORAGE number      8
  4 BEEFY   number      8
  5 BLOODY  number      8
  6 METAL   number      8
  7 GRASSY  number      8
  8 SOUR    number      8
  9 SPOILED number      8

[fileserrors.sas7bdat] Header:
    column_count: 3
    compression: SASYZCRL
    date_created: 2010-10-09 12:14:49.414000
    date_modified: 2010-10-09 12:14:49.414000
    endianess: little
    file_type: DATA
    filename: fileserrors.sas7bdat
    header_length: 1024
    mix_page_row_count: 52
    page_count: 4
    page_length: 8192
    platform: windows
    row_count: 160
    row_length: 130
    sas_release: 9.0201M0
    server_type: W32_VSPRO
    u64: False

Contents of dataset "FILESERRORS":
Num Name  Type   Length Format Label
--- ----- ------ ------ ------ -----
  1 Name  string     32
  2 Path  string     90
  3 index string      8
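
If you capture the --header output as text, the indented "key: value" lines are easy to turn into a dictionary. The sketch below is a Python 3 illustration only: parse_header_block is a hypothetical helper, not something the package provides, and it assumes the four-space indentation and "key: value" layout shown above. All values are kept as strings.

```python
def parse_header_block(text):
    """Collect the indented `key: value` lines of --header output in a dict."""
    fields = {}
    for line in text.splitlines():
        # Header fields are indented; section titles and tables are not.
        if not line.startswith("    "):
            continue
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    return fields


# A few lines copied from the beef.sas7bdat listing above.
sample = "\n".join([
    "[beef.sas7bdat] Header:",
    "    column_count: 9",
    "    compression: None",
    "    date_created: 2006-06-06 13:52:23.224000",
    "    row_count: 30",
])
header = parse_header_block(sample)
```

Splitting on the first ": " keeps timestamps such as date_created intact, since the colons inside the time are not followed by a space.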