Title: | NHANES Data Retrieval |
---|---|
Description: | Utility to retrieve data from the National Health and Nutrition Examination Survey (NHANES) website <https://www.cdc.gov/nchs/nhanes/index.htm>. |
Authors: | Christopher Endres [aut, cre], Laha Ale [aut], Robert Gentleman [aut], Deepayan Sarkar [aut] |
Maintainer: | Christopher Endres <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.6 |
Built: | 2024-11-09 23:16:08 UTC |
Source: | https://github.com/cjendres1/nhanes |
The browser may be directed to a specific year, survey, or table.
browseNHANES( year = NULL, data_group = NULL, nh_table = NULL, local = TRUE, browse = TRUE )
year | The year in yyyy format where 1999 <= yyyy. |
data_group | The type of survey (DEMOGRAPHICS, DIETARY, EXAMINATION, LABORATORY, QUESTIONNAIRE). Abbreviated terms may also be used: (DEMO, DIET, EXAM, LAB, Q). |
nh_table | The name of an NHANES table. |
local | Logical flag. If |
browse | Logical flag, indicating whether the specific NHANES site should be opened using a browser (which is the default behaviour). |
By default, browseNHANES will open a web browser to the specified NHANES site.
A character string giving the URL, invisibly if the URL is also opened using browseURL.
browseNHANES(browse = FALSE)      # Defaults to the main data sets page
browseNHANES(2005)                # The main page for the specified survey year
browseNHANES(2009, 'EXAM')        # Page for the specified year and survey group
browseNHANES(nh_table = 'VIX_D')  # Page for a specific table
browseNHANES(nh_table = 'DXA')    # DXA main page
Downloads NHANES data tables that are in SAS format.
nhanes( nh_table, includelabels = FALSE, translated = TRUE, cleanse_numeric = FALSE, nchar = 128, adjust_timeout = TRUE )
nh_table | The name of the specific table to retrieve. |
includelabels | If TRUE, then include SAS labels as a variable attribute (default = FALSE). |
translated | Logical flag indicating whether the variables are translated (default = TRUE). |
cleanse_numeric | Logical flag. If |
nchar | Maximum length of translated string (default = 128). Ignored if translated=FALSE. |
adjust_timeout | Typically a logical flag indicating whether the default |
Downloads a table from the NHANES website as is, i.e. in its entirety with no modification or cleansing. If the environment variable NHANES_TABLE_BASE was set during startup, the value of this variable is used as the base URL instead of https://wwwn.cdc.gov (this allows the use of a local or alternative mirror of the CDC data). NHANES tables are stored in SAS '.XPT' format but are imported as a data frame. The nhanes function cannot be used to import limited access data.
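As a rough illustration of the base-URL fallback described above, the following base R sketch resolves a base URL from NHANES_TABLE_BASE and falls back to https://wwwn.cdc.gov when the variable is unset. The helper `resolve_table_base()` and the mirror address are hypothetical and are not part of the package; the package itself is documented to read the variable at startup, so changing it mid-session, as done here for demonstration, may have no effect on an already loaded package.

```r
# Hypothetical helper (not part of the package): pick the base URL from the
# NHANES_TABLE_BASE environment variable, or fall back to the CDC server.
resolve_table_base <- function(default = "https://wwwn.cdc.gov") {
  base <- Sys.getenv("NHANES_TABLE_BASE", unset = "")
  if (nzchar(base)) base else default
}

# Assumed mirror address, used for illustration only.
Sys.setenv(NHANES_TABLE_BASE = "http://localhost:8080")
mirror_base <- resolve_table_base()

# With the variable removed, the default CDC base URL is used.
Sys.unsetenv("NHANES_TABLE_BASE")
default_base <- resolve_table_base()

print(mirror_base)
print(default_base)
```

In practice the variable would more plausibly be set in a startup file such as .Renviron so that it is already defined when R starts.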
The table is returned as a data frame.
bpx_e = nhanes('BPX_E')
dim(bpx_e)
folate_f = nhanes('FOLATE_F', includelabels = TRUE)
dim(folate_f)
Returns attributes such as number of rows, columns, and memory size, but does not return the table itself.
nhanesAttr(nh_table)
nh_table | The name of the specific table to retrieve. |
nhanesAttr allows one to check the size and other characteristics of a data table before importing into R. To retrieve these characteristics, the specified table is downloaded, characteristics are determined, then the table is deleted. The table is downloaded from the NHANES website as is, i.e. in its entirety with no modification or cleansing. If the environment variable NHANES_TABLE_BASE was set during startup, the value of this variable is used as the base URL instead of https://wwwn.cdc.gov (this allows the use of a local or alternative mirror of the CDC data).
The following attributes are returned as a list:
nrow = number of rows
ncol = number of columns
names = name of each column
unique = true if all SEQN values are unique
na = number of 'NA' cells in the table
size = total size of table in bytes
types = data types of each column
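To make the attribute list concrete, here is a minimal base R sketch that computes the same kinds of values for a small in-memory data frame. The helper `table_attributes()` and the toy data are hypothetical; this is not the package's implementation, only one plausible way such a summary could be built.

```r
# Hypothetical helper: summarise a data frame with the attribute names
# listed above. Assumes the data frame has a SEQN column.
table_attributes <- function(df) {
  list(
    nrow   = nrow(df),
    ncol   = ncol(df),
    names  = names(df),
    unique = anyDuplicated(df$SEQN) == 0L,          # TRUE if no SEQN repeats
    na     = sum(is.na(df)),                        # count of NA cells
    size   = as.numeric(object.size(df)),           # in-memory size in bytes
    types  = vapply(df, function(col) class(col)[1], character(1))
  )
}

# Toy table, for illustration only.
toy <- data.frame(
  SEQN  = c(1, 2, 3),
  value = c(10.5, NA, 7.2),
  group = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

attrs <- table_attributes(toy)
str(attrs)
```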
bpx_e = nhanesAttr('BPX_E')
length(bpx_e)
folate_f = nhanesAttr('FOLATE_F')
length(folate_f)
Returns full NHANES codebook including Variable Name, SAS Label, English Text, Target, and Value distribution.
nhanesCodebook(nh_table, colname = NULL, dxa = FALSE)
nh_table | The name of the NHANES table that contains the desired variable. |
colname | The name of the table column (variable). |
dxa | If TRUE then the 2005-2006 DXA codebook will be used (default=FALSE). |
Each NHANES variable has a codebook that provides a basic description as well as the distribution or range of values. This function returns the full codebook information for the selected variable. If the environment variable NHANES_TABLE_BASE was set during startup, the value of this variable is used as the base URL instead of https://wwwn.cdc.gov (this allows the use of a local or alternative mirror of the CDC documentation).
The codebook is returned as a list object. Returns NULL upon error.
nhanesCodebook('AUX_D', 'AUQ020D')
nhanesCodebook('BPX_J', 'BPACSZ')
bpx_code = nhanesCodebook('BPX_J')
length(bpx_code)
Download and parse an NHANES doc file from a URL
nhanesCodebookFromURL(url)
url | URL to be downloaded. |
Downloads and parses an NHANES doc file from a URL and returns it as a list.
A list with one element for each variable.
DXA data were acquired from 1999-2006.
nhanesDXA(year, suppl = FALSE, destfile = NULL, adjust_timeout = TRUE)
year | The year of the data to import, where 1999<=year<=2006. |
suppl | If TRUE then retrieve the supplemental data (default=FALSE). |
destfile | The name of a destination file. If NULL then the data are imported into the R environment but no file is created. |
adjust_timeout | Typically a logical flag indicating whether the default |
Provide destfile in order to write the data to file. If destfile is not provided then the data will be imported into the R environment.
By default the table is returned as a data frame. When downloading to file, the return argument is the integer code from download.file where 0 means success and non-zero indicates failure to download.
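Because the return value can be either a data frame or an integer status code, calling code may want to check which one it received. The sketch below shows one way to do that in base R; `dxa_succeeded()` is a hypothetical helper, not a package function, and it only encodes the convention stated above (a data frame, or 0 for a successful file download).

```r
# Hypothetical helper: interpret the two documented kinds of return value.
# - a data frame means the table was imported into R
# - a single numeric code of 0 means download.file reported success
dxa_succeeded <- function(result) {
  if (is.data.frame(result)) {
    return(TRUE)
  }
  is.numeric(result) && length(result) == 1L && result == 0
}

# Stand-in values, used instead of real downloads for illustration.
imported   <- data.frame(SEQN = 1:3)
ok_code    <- 0L
error_code <- 1L

dxa_succeeded(imported)
dxa_succeeded(ok_code)
dxa_succeeded(error_code)
```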
dxa_b <- nhanesDXA(2001)
dxa_c_s <- nhanesDXA(2003, suppl=TRUE)
## Not run:
dxa = nhanesDXA(1999, destfile="dxx.xpt")
Download an NHANES table from a URL
nhanesFromURL( url, translated = TRUE, cleanse_numeric = TRUE, nchar = 128, adjust_timeout = TRUE )
url | URL of XPT file to be downloaded. |
translated | Logical, whether variable codes should be translated. |
cleanse_numeric | Logical flag. If |
nchar | Integer, labels are truncated after this |
adjust_timeout | Typically a logical flag indicating whether the default |
Downloads an NHANES table from a URL and returns it as a data frame.
A data frame.
Downloads and parses NHANES manifests for public data (available at https://wwwn.cdc.gov/Nchs/Nhanes/search/DataPage.aspx), limited access data (https://wwwn.cdc.gov/Nchs/Nhanes/search/DataPage.aspx?Component=LimitedAccess), and variables (https://wwwn.cdc.gov/nchs/nhanes/search/variablelist.aspx?Component=Demographics, etc.), and returns them as data frames.
nhanesManifest( which = c("public", "limitedaccess", "variables"), sizes = FALSE, dxa = FALSE, component = NULL, verbose = getOption("verbose"), use_cache = TRUE, max_age = 24 * 60 * 60 )
which | Either "public" or "limitedaccess" to get a manifest of available tables, or "variables" to get a manifest of available variables. |
sizes | Logical, whether to compute data file sizes (as reported by the server) and include them in the result. |
dxa | Logical, whether to include information on DXA tables. These tables contain imputed Dual Energy X-ray Absorptiometry measurements, and are listed separately, not in the main listing. |
component | An optional character string specifying the component for which the public data manifest is to be downloaded. Valid values are |
verbose | Logical flag indicating whether information on progress should be reported. |
use_cache | Logical flag indicating whether a cached version (from a previous download in the same session) should be used. |
max_age | Maximum allowed age of the cache in seconds (defaults to 24 hours). Cached versions that are older are ignored, even if available. |
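The interaction between use_cache and max_age can be pictured as a simple age check. The following base R sketch is an assumption about how such a check might look, not the package's code; `cache_is_fresh()` and the fixed timestamps are hypothetical.

```r
# Hypothetical helper: a cached item is usable only if it is no older than
# max_age seconds (default 24 hours, matching the documented default).
cache_is_fresh <- function(cached_at, max_age = 24 * 60 * 60, now = Sys.time()) {
  age <- as.numeric(difftime(now, cached_at, units = "secs"))
  age <= max_age
}

# Fixed reference time so the example is reproducible.
now <- as.POSIXct("2024-01-02 00:00:00", tz = "UTC")

fresh <- cache_is_fresh(now - 3600, now = now)        # cached one hour ago
stale <- cache_is_fresh(now - 2 * 86400, now = now)   # cached two days ago

print(fresh)
print(stale)
```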
A data frame, with columns that depend on which. For a manifest of tables, columns are "Table", "DocURL", "DataURL", "Years", "Date.Published". If component is specified, an additional column "Description" giving a description of the table will be included. If sizes = TRUE, an additional column "DataSize" giving the data file sizes in bytes (as reported by the server) is included. For limited access tables, the "DataURL" and "DataSize" columns are omitted. For a manifest of variables, columns are "VarName", "VarDesc", "Table", "TableDesc", "BeginYear", "EndYear", "Component", and "UseConstraints".
Duplicate rows are removed from the result. Most of these duplicates arise from duplications in the source tables for multi-cycle tables (which are repeated once for each cycle). One special case is the WHQ table which has two variables, WHD120 and WHQ030, duplicated with differing variable descriptions. These are removed explicitly, keeping only the first occurrence.
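The two kinds of de-duplication described above (exact duplicate rows, and rows that repeat a variable with a different description) can be sketched with base R's duplicated(). The data frame below is a toy stand-in with placeholder descriptions, and the choice of "VarName" plus "Table" as the key for the second step is an assumption made for illustration.

```r
# Toy variable manifest; descriptions are placeholders, not real NHANES text.
vars <- data.frame(
  VarName = c("SEQN", "SEQN", "WHD120", "WHD120"),
  Table   = c("DEMO", "DEMO", "WHQ",    "WHQ"),
  VarDesc = c("id",   "id",   "description A", "description B"),
  stringsAsFactors = FALSE
)

# Step 1: drop rows that are identical in every column.
exact <- vars[!duplicated(vars), ]

# Step 2: for rows that share VarName and Table but differ in description,
# keep only the first occurrence.
by_key <- exact[!duplicated(exact[c("VarName", "Table")]), ]

print(by_key)
```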
manifest <- nhanesManifest(sizes = FALSE)
dim(manifest)
Set and retrieve global options controlling the behaviour of certain functions in the package.
nhanesOptions(...)
... | Either one or more named arguments giving options to be set (in the form |
The 'nhanesOptions()' function can be used in two forms, to set or get options. Options can be set using 'nhanesOptions(key1 = value1, key2 = value2)'. Options can be retrieved (one at a time) using 'nhanesOptions("key")'. When called with no arguments, all currently set options are returned as a list.
Options currently used in the package are 'use.db' (logical flag controlling whether a database should be used if available) and 'log.access' (logical flag that logs any attempted URL access by printing the URL).
When retrieving an option, the value of the option, or NULL if the option has not been set. When setting one or more options, a list (invisibly) containing the previous values (possibly NULL) of the options being set.
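The set/get behaviour described above follows a common R idiom: a closure that holds the option store. The sketch below imitates that behaviour in base R under stated assumptions; `make_options()` is a hypothetical constructor and says nothing about how the package actually stores its options.

```r
# Hypothetical constructor: returns a function that sets or gets options
# held in a private list, returning previous values invisibly when setting.
make_options <- function() {
  store <- list()
  function(...) {
    args <- list(...)
    if (length(args) == 0L) {
      return(store)                       # no arguments: all current options
    }
    if (is.null(names(args))) {
      return(store[[args[[1]]]])          # opts("key"): value or NULL
    }
    keys <- names(args)
    previous <- lapply(keys, function(k) store[[k]])
    names(previous) <- keys
    for (k in keys) {
      store[[k]] <<- args[[k]]            # assigning NULL removes the option
    }
    invisible(previous)
  }
}

opts <- make_options()
opts(foo = "bar")          # set; previous value (NULL) is returned invisibly
opts("foo")                # retrieve a single option
print(opts(foo = NULL))    # unset; prints the previous value
```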
Deepayan Sarkar <[email protected]>
nhanesOptions(foo = "bar")
nhanesOptions()
print(nhanesOptions(foo = NULL))
The descriptions in the master variable list will be filtered by the provided search terms to retrieve a list of relevant variables. The search can be restricted to specific survey years by specifying ystart and/or ystop.
nhanesSearch( search_terms = NULL, exclude_terms = NULL, data_group = NULL, ignore.case = FALSE, ystart = NULL, ystop = NULL, includerdc = FALSE, nchar = 128, namesonly = FALSE )
search_terms | List of terms or keywords. |
exclude_terms | List of terms or keywords to exclude. |
data_group | Which data groups (e.g. DIET, EXAM, LAB) to search. Default is to search all groups. |
ignore.case | Ignore case if TRUE (default=FALSE). |
ystart | Four digit year of first survey included in search, where ystart >= 1999. |
ystop | Four digit year of final survey included in search, where ystop >= ystart. |
includerdc | If TRUE then RDC only tables are included in list (default=FALSE). |
nchar | Truncates the variable description to a max length of nchar. |
namesonly | If TRUE then only the table names are returned (default=FALSE). |
nhanesSearch is useful to obtain a comprehensive list of relevant tables. Search terms will be matched against the variable descriptions in the NHANES Comprehensive Variable Lists. Matching variables must have at least one of the search_terms and not have any exclude_terms. The search may be restricted to specific surveys using ystart and ystop. If no arguments are given, then nhanesSearch returns the complete variable list.
Returns a data frame that describes variables that matched the search terms. If namesonly=TRUE, then a character vector of table names that contain matched variables is returned.
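The matching rule (at least one search term, no exclude term) can be illustrated on a toy vector of descriptions. This base R sketch treats the terms as regular expressions joined with "|", which is an assumption; `filter_descriptions()` and the sample descriptions are hypothetical and do not come from the NHANES variable list.

```r
# Hypothetical helper: keep descriptions that match any search term and
# none of the exclude terms.
filter_descriptions <- function(desc, search_terms = NULL, exclude_terms = NULL,
                                ignore.case = FALSE) {
  keep <- rep(TRUE, length(desc))
  if (!is.null(search_terms)) {
    pattern <- paste(search_terms, collapse = "|")
    keep <- grepl(pattern, desc, ignore.case = ignore.case)
  }
  if (!is.null(exclude_terms)) {
    pattern <- paste(exclude_terms, collapse = "|")
    keep <- keep & !grepl(pattern, desc, ignore.case = ignore.case)
  }
  desc[keep]
}

# Toy descriptions, for illustration only.
desc <- c("Urine albumin", "During the past week", "urinary leakage", "Blood pressure")

# "urin" also occurs inside "During", so excluding "During" avoids a false hit.
case_sensitive   <- filter_descriptions(desc, "urin", exclude_terms = "During")
case_insensitive <- filter_descriptions(desc, "urin", exclude_terms = "During",
                                        ignore.case = TRUE)
print(case_sensitive)
print(case_insensitive)
```

This also shows why ignore.case matters: with the default case-sensitive match, a description starting with "Urine" is not found by the term "urin".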
bladder = nhanesSearch("bladder", ystart=2001, ystop=2008, nchar=50)
dim(bladder)
urin = nhanesSearch("urin", exclude_terms="During", ystart=2009)
dim(urin)
urine = nhanesSearch(c("urine", "urinary"), ignore.case=TRUE, ystop=2006, namesonly=TRUE)
length(urine)
Returns a list of table names that match a specified pattern.
nhanesSearchTableNames( pattern = NULL, ystart = NULL, ystop = NULL, includerdc = FALSE, includewithdrawn = FALSE, nchar = 128, details = FALSE )
pattern | Pattern of table names to match. |
ystart | Four digit year of first survey included in search, where ystart >= 1999. |
ystop | Four digit year of final survey included in search, where ystop >= ystart. |
includerdc | If TRUE then RDC only tables are included (default=FALSE). |
includewithdrawn | If TRUE then withdrawn tables are included (default=FALSE). |
nchar | Truncates the variable description to a max length of nchar. |
details | If TRUE then complete table information from the comprehensive data list is returned (default=FALSE). |
Searches the Doc File field in the NHANES Comprehensive Data List (see https://wwwn.cdc.gov/nchs/nhanes/search/DataPage.aspx) for tables that match a given name pattern. Only a single pattern may be entered.
Returns a character vector of table names that match the given pattern. If details=TRUE, then a data frame of table attributes is returned. NULL is returned when an HTML read error is encountered.
bmx = nhanesSearchTableNames('BMX')
length(bmx)
hepbd = nhanesSearchTableNames('HEPBD')
dim(hepbd)
hpvs = nhanesSearchTableNames('HPVS', includerdc=TRUE, details=TRUE)
dim(hpvs)
Returns a list of table names that contain the specified variable.
nhanesSearchVarName( varname = NULL, ystart = NULL, ystop = NULL, includerdc = FALSE, nchar = 128, namesonly = TRUE )
varname | Name of variable to match. |
ystart | Four digit year of first survey included in search, where ystart >= 1999. |
ystop | Four digit year of final survey included in search, where ystop >= ystart. |
includerdc | If TRUE then RDC only tables are included in list (default=FALSE). |
nchar | Truncates the variable description to a max length of nchar. |
namesonly | If TRUE then only the table names are returned (default=TRUE). |
The NHANES Comprehensive Variable List is scanned to find all data tables that contain the given variable name. Only a single variable name may be entered, and only exact matches will be found.
By default, a character vector of table names that include the specified variable is returned. If namesonly=FALSE, then a data frame of table attributes is returned.
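The "exact matches only" rule differs from the pattern matching used elsewhere in the package, and a small base R sketch may make the difference visible. The variable list below is a toy stand-in (the "BMXLEGX" entry and the "TOY" table are invented for contrast) and `tables_with_var()` is a hypothetical helper.

```r
# Toy variable list; "BMXLEGX" and "TOY" are invented for contrast.
var_list <- data.frame(
  VarName = c("BMXLEG", "BMXLEG", "BMXHEAD", "BMXLEGX"),
  Table   = c("BMX",    "BMX_B",  "BMX",     "TOY"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: exact comparison with ==, so partial names find nothing.
tables_with_var <- function(var_list, varname) {
  unique(var_list$Table[var_list$VarName == varname])
}

exact_hit   <- tables_with_var(var_list, "BMXLEG")  # full name: matches
partial_hit <- tables_with_var(var_list, "BMX")     # prefix only: no match

print(exact_hit)
print(partial_hit)
```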
nhanesSearchVarName('BMXLEG')
nhanesSearchVarName('BMXHEAD', ystart=2003)
Enables quick display of all available tables in the survey group.
nhanesTables( data_group, year, nchar = 128, details = FALSE, namesonly = FALSE, includerdc = FALSE )
data_group | The type of survey (DEMOGRAPHICS, DIETARY, EXAMINATION, LABORATORY, QUESTIONNAIRE). Abbreviated terms may also be used: (DEMO, DIET, EXAM, LAB, Q). |
year | The year in yyyy format where 1999 <= yyyy. |
nchar | Truncates the table description to a max length of nchar. |
details | If TRUE then a more detailed description of the tables is returned (default=FALSE). |
namesonly | If TRUE then only the table names are returned (default=FALSE). |
includerdc | If TRUE then RDC only tables are included in list (default=FALSE). |
Function nhanesTables retrieves a list of tables and a description of their contents from the NHANES website. This provides a convenient way to browse the available tables. NULL is returned when an HTML read error is encountered.
Returns a data frame that contains table attributes. If namesonly=TRUE, then a character vector of table names is returned.
exam = nhanesTables('EXAM', 2007)
dim(exam)
lab = nhanesTables('LAB', 2009, details=TRUE, includerdc=TRUE)
dim(lab)
q = nhanesTables('Q', 2005, namesonly=TRUE)
length(q)
diet = nhanesTables('DIET', 'P')
dim(diet)
exam = nhanesTables('EXAM', 'Y')
dim(exam)
Summarize an NHANES table
nhanesTableSummary(nh_table, use = c("data", "codebook", "both"), ...)
nh_table | The name of a valid NHANES table. |
use | Character string, whether to create a summary from the data itself or the codebook, which respectively use either the NHANES SAS data files or the HTML documentation files. If |
... | Additional arguments, usually passed on to either |
Returns a per-variable summary of an NHANES table, using either the actual data or its corresponding codebook.
A data frame with one row per variable, with columns depending on the value of the use argument.
nhanesTableSummary('DEMO_D', use = "data")
nhanesTableSummary('DEMO_D', use = "codebook")
Enables quick display of table variables and their definitions.
nhanesTableVars( data_group, nh_table, details = FALSE, nchar = 128, namesonly = FALSE )
data_group | The type of survey (DEMOGRAPHICS, DIETARY, EXAMINATION, LABORATORY, QUESTIONNAIRE). Abbreviated terms may also be used: (DEMO, DIET, EXAM, LAB, Q). |
nh_table | The name of the specific table to retrieve. |
details | If TRUE then all columns in the variable description are returned (default=FALSE). |
nchar | The number of characters in the Variable Description to print. Default length is 128, which is set to enhance readability because variable descriptions can be very long. |
namesonly | If TRUE then only the variable names are returned (default=FALSE). |
NHANES tables may contain more than 100 variables. Function nhanesTableVars provides a concise display of variables for a specified table, which helps to ascertain quickly if the table is of interest. NULL is returned when an HTML read error is encountered.
Returns a data frame that describes variable attributes for the specified table. If namesonly=TRUE, then a character vector of the variable names is returned.
lab_cbc = nhanesTableVars('LAB', 'CBC_E')
dim(lab_cbc)
exam_ohx = nhanesTableVars('EXAM', 'OHX_E', details=TRUE, nchar=50)
dim(exam_ohx)
demo = nhanesTableVars('DEMO', 'DEMO_F', namesonly = TRUE)
length(demo)
Returns code translations for categorical variables, which appear in most NHANES tables.
nhanesTranslate( nh_table, colnames = NULL, data = NULL, nchar = 128, mincategories = 2, details = FALSE, dxa = FALSE, cleanse_numeric = FALSE )
nh_table | The name of the NHANES table to retrieve. |
colnames | The names of the columns to translate. It will translate all the columns by default. |
data | If a data frame is passed, then code translation will be applied directly to the data frame. |
nchar | Applies only when data is defined. Code translations can be very long. |
mincategories | The minimum number of categories needed for code translations to be applied to the data (default=2). |
details | If TRUE then all available table translation information is displayed (default=FALSE). |
dxa | If TRUE then the 2005-2006 DXA translation table will be used (default=FALSE). |
cleanse_numeric | Logical flag. If |
Most NHANES data tables have encoded values, e.g. 1 = 'Male', 2 = 'Female'. Thus it is often helpful to view the code translations and perhaps insert the translated values in a data frame. Only a single table may be specified, but multiple variables within that table can be selected. Code translations are retrieved for each variable. If the environment variable NHANES_TABLE_BASE was set during startup, the value of this variable is used as the base URL instead of https://wwwn.cdc.gov (this allows the use of a local or alternative mirror of the CDC documentation).
The code translation table (or translated data frame when data is defined). Returns NULL upon error.
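The idea of replacing codes with their labels can be sketched in base R with factor(). The helper `translate_codes()`, the column name `sex_code`, and the toy data are hypothetical; the nchar and mincategories arguments only imitate the documented parameters and are not claimed to behave identically to the package.

```r
# Hypothetical helper: map numeric codes to labels, truncating labels to
# nchar characters and skipping variables with too few categories.
translate_codes <- function(x, codes, labels, nchar = 128, mincategories = 2) {
  if (length(codes) < mincategories) {
    return(x)                             # too few categories: leave as is
  }
  labels <- substr(labels, 1, nchar)
  factor(x, levels = codes, labels = labels)
}

# Toy data frame using the 1 = 'Male', 2 = 'Female' coding mentioned above.
demo <- data.frame(SEQN = 1:4, sex_code = c(1, 2, 2, 1))
demo$sex_code <- translate_codes(demo$sex_code,
                                 codes  = c(1, 2),
                                 labels = c("Male", "Female"))
print(demo)
```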
nhanesTranslate('DEMO_B', c('DMDBORN','DMDCITZN'))
nhanesTranslate('BPX_F', 'BPACSZ', details=TRUE)
nhanesTranslate('BPX_F', 'BPACSZ', data=nhanes('BPX_F'))
trans_demo = nhanesTranslate('DEMO_B')
length(trans_demo)