| Title: | Energy Burden Analysis Using Net Energy Return Methodology |
|---|---|
| Description: | Calculate and analyze household energy burden using the Net Energy Return aggregation methodology. Functions support weighted statistical calculations across geographic and demographic cohorts, with utilities for formatting results into publication-ready tables. Methods are based on Scheier & Kittner (2022) <doi:10.1038/s41467-021-27673-y>. |
| Authors: | Eric Scheier [aut, cre, cph] |
| Maintainer: | Eric Scheier <[email protected]> |
| License: | AGPL (>= 3) |
| Version: | 0.6.2 |
| Built: | 2026-05-29 10:00:28 UTC |
| Source: | https://github.com/ericscheier/emburden |
Calculates weighted statistical metrics (mean, median, quantiles) for a specified energy metric, with optional grouping by geographic or demographic categories. This is the primary function for aggregating household-level energy burden data using proper weighting by household counts.
calculate_weighted_metrics(
  graph_data,
  group_columns,
  metric_name,
  metric_cutoff_level,
  upper_quantile_view = 1,
  lower_quantile_view = 0
)
graph_data: A data frame containing household energy burden data with columns for the metric of interest, household counts, and optional grouping variables
group_columns: Character vector of column names to group by, or NULL for no grouping (calculates overall statistics)
metric_name: Character string specifying the column name of the metric to analyze (e.g., "ner" for Net Energy Return)
metric_cutoff_level: Numeric value defining the poverty threshold for the metric (e.g., 15.67 for Nh corresponding to 6% energy burden)
upper_quantile_view: Numeric between 0 and 1 specifying the upper quantile to calculate (default: 1.0 for maximum)
lower_quantile_view: Numeric between 0 and 1 specifying the lower quantile to calculate (default: 0.0 for minimum)
This function requires the spatstat package for weighted quantile
calculations. It automatically handles missing values and ensures that
statistics are only calculated when sufficient data points exist (n >= 3).
The function adds an "All" category row that aggregates across all groups, in addition to the individual group statistics.
A data frame with one row per group (or one row if ungrouped) containing:
household_count: Total number of households in the group
households_below_cutoff: Number of households below poverty threshold
pct_in_group_below_cutoff: Proportion of group below threshold
metric_mean: Weighted mean of the metric
metric_median: Weighted median of the metric
metric_upper: Upper quantile value
metric_lower: Lower quantile value
metric_max: Maximum value in group
metric_min: Minimum value in group
# Calculate metrics for NC cooperatives using Nh
library(dplyr)

# Sample data
data <- data.frame(
  cooperative = rep(c("Coop A", "Coop B"), each = 3),
  ner = c(20, 15, 25, 18, 22, 12),
  households = c(1000, 500, 750, 900, 600, 400)
)

# Calculate weighted metrics by cooperative
results <- calculate_weighted_metrics(
  graph_data = data,
  group_columns = "cooperative",
  metric_name = "ner",
  metric_cutoff_level = 15.67,
  upper_quantile_view = 0.95,
  lower_quantile_view = 0.05
)
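For intuition, the household-weighted mean and the below-cutoff share for the sample data above can be reproduced in base R. This is only a sketch of the idea, not the package's implementation: the helper name `summarise_group` is hypothetical, the strict `<` comparison against the cutoff is an assumption, and the weighted quantiles (which the package computes via spatstat) are left out.

```r
# Same toy data as the example above
data <- data.frame(
  cooperative = rep(c("Coop A", "Coop B"), each = 3),
  ner = c(20, 15, 25, 18, 22, 12),
  households = c(1000, 500, 750, 900, 600, 400)
)
cutoff <- 15.67

# Hypothetical helper: household-weighted summary for one group
summarise_group <- function(df, cutoff) {
  below <- df$ner < cutoff  # assumed: strictly below the threshold
  data.frame(
    household_count = sum(df$households),
    households_below_cutoff = sum(df$households[below]),
    pct_in_group_below_cutoff = sum(df$households[below]) / sum(df$households),
    metric_mean = weighted.mean(df$ner, w = df$households)
  )
}

# Apply the helper to each cooperative and stack the results
by_coop <- do.call(
  rbind,
  lapply(split(data, data$cooperative), summarise_group, cutoff = cutoff)
)
by_coop
```

Each row of `ner` counts as many times as it has households, which is why the 500 households at Nh = 15 in "Coop A" account for the whole below-cutoff share of that group.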
Check which data sources are available locally (database, CSV files, or will require download from OpenEI).
check_data_sources(verbose = TRUE)
verbose: Logical, print detailed status (default TRUE)
A list with status of each data source
# Check what data is available
check_data_sources()
Nuclear option: clears ALL cached data and database. Use with caution - will require re-downloading all data.
clear_all_cache(confirm = FALSE, verbose = TRUE)
confirm: Logical, must be TRUE to proceed (safety check)
verbose: Logical, print progress messages
Invisibly returns list with: cache_cleared (logical), db_cleared (logical)
# Clear everything (requires confirm = TRUE)
clear_all_cache(confirm = TRUE)
Removes cached CSV files and database entries for a specific dataset/vintage. Useful when you know a specific dataset is corrupted.
clear_dataset_cache(
  dataset = c("ami", "fpl"),
  vintage = c("2018", "2022"),
  verbose = TRUE
)
dataset: Character, "ami" or "fpl"
vintage: Character, "2018" or "2022"
verbose: Logical, print progress messages
Invisibly returns number of items cleared
# Clear corrupted AMI 2018 cache
clear_dataset_cache("ami", "2018")

# Clear FPL 2022 cache
clear_dataset_cache("fpl", "2022", verbose = TRUE)
Wraps text in color formatting appropriate for the output format (LaTeX or HTML). This function is intended for use within R Markdown/knitr documents.
colorize(x, color)
x: Character string to colorize
color: Character string specifying the color name (e.g., "red", "blue")
This function detects the knitr output format and applies the appropriate color formatting. For LaTeX output, it uses \textcolor{}. For HTML output, it uses <span style='color: ...'>.
Character string wrapped in LaTeX or HTML color commands, or unchanged if the output format is neither
# In an R Markdown document:
colorize("Important text", "red")
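The format-dependent wrapping described above can be illustrated without knitr. This is a sketch under stated assumptions: `colorize_sketch` is a hypothetical stand-in, not the package function, and it takes the output format as an explicit argument rather than detecting it. Inside a knitr document, `knitr::is_latex_output()` and `knitr::is_html_output()` could supply that choice.

```r
# Hypothetical stand-in for colorize(): the output format is passed in
# explicitly instead of being detected from knitr.
colorize_sketch <- function(x, color, format = c("other", "latex", "html")) {
  format <- match.arg(format)
  switch(
    format,
    latex = sprintf("\\textcolor{%s}{%s}", color, x),
    html  = sprintf("<span style='color: %s;'>%s</span>", color, x),
    other = x  # neither LaTeX nor HTML: return the text unchanged
  )
}

colorize_sketch("Important text", "red", "latex")
colorize_sketch("Important text", "red", "html")
colorize_sketch("Important text", "red")
```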
Compare household energy burden metrics across different data vintages, using proper Net Energy Return (Nh) aggregation methodology.
compare_energy_burden(
  dataset = c("ami", "fpl"),
  states = NULL,
  group_by = "income_bracket",
  counties = NULL,
  vintage_1 = "2022",
  vintage_2 = "2018",
  format = TRUE,
  strict_matching = TRUE
)
dataset: Character, either "ami" or "fpl" for cohort data type
states: Character vector of state abbreviations to filter by (optional)
group_by: Character or character vector. Use keywords "income_bracket" (default), "state", or "none" for standard groupings. Or provide custom column name(s) for dynamic grouping (e.g., "geoid" for tract-level, c("state_abbr", "income_bracket") for multi-level grouping). Custom columns must exist in the loaded data.
counties: Character vector of county names or FIPS codes to filter by (optional). Requires
vintage_1: Character, first vintage year: "2018" or "2022" (default "2022")
vintage_2: Character, second vintage year: "2018" or "2022" (default "2018")
format: Logical, if TRUE returns formatted percentages (default TRUE)
strict_matching: Logical, if TRUE (default) only compares income brackets that exist in both vintages and warns about mismatched brackets. If FALSE, compares all brackets (may result in NA values for brackets unique to one vintage).
A data.frame with energy burden comparison showing:
neb_YYYY: Net Energy Burden for each vintage (where YYYY is the year)
change_pp: Absolute change in percentage points
change_pct: Relative percent change
# Single state comparison (fast, good for learning)
nc_comparison <- compare_energy_burden("ami", "NC", "income_bracket")

# Overall comparison (no grouping)
compare_energy_burden("ami", "NC", "none")

if (interactive()) {
  # Multi-state regional comparison (requires census data download)
  southeast <- compare_energy_burden(
    dataset = "fpl",
    states = c("NC", "SC", "GA", "FL"),
    group_by = "state"
  )

  # Nationwide comparison by income bracket (all 50 states + DC)
  us_comparison <- compare_energy_burden(
    dataset = "ami",
    group_by = "income_bracket"
  )

  # Compare specific counties within a state (requires census data)
  compare_energy_burden("fpl", "NC", counties = c("Orange", "Durham", "Wake"))

  # Custom grouping by tract-level geoid (requires census data)
  compare_energy_burden("ami", "NC", group_by = "geoid")
}
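The difference between the change_pp and change_pct columns is easy to mix up, so a small worked example may help. The burden values below are made up for illustration, and the direction of the difference (vintage_1 minus vintage_2) is an assumption rather than something the documentation states.

```r
# Made-up aggregate burdens for two vintages (proportions, not percentages)
neb_2018 <- c(very_low = 0.150, low_mod = 0.060)
neb_2022 <- c(very_low = 0.165, low_mod = 0.057)

# Assumed direction: vintage_1 (2022) minus vintage_2 (2018)
change_pp  <- (neb_2022 - neb_2018) * 100             # percentage points
change_pct <- (neb_2022 - neb_2018) / neb_2018 * 100  # relative change, in %

data.frame(neb_2018, neb_2022, change_pp, change_pct)
```

A burden rising from 15% to 16.5% is a change of 1.5 percentage points but a 10% relative increase. The two columns answer different questions.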
Calculates DEAR as the ratio of net income after energy spending to gross income. DEAR = (G - S) / G.
dear_func(g, s, se = NULL)
g: Numeric vector of gross income values
s: Numeric vector of energy spending values
se: Optional numeric vector of effective energy spending (defaults to s)
Numeric vector of DEAR values (ratio of disposable income to gross income)
# Calculate DEAR
dear_func(50000, 3000)
Calculates the energy burden as the ratio of energy spending to gross income. Energy burden is defined as E_b = S/G, where S is energy spending and G is gross income.
energy_burden_func(g, s, se = NULL)
g: Numeric vector of gross income values
s: Numeric vector of energy spending values
se: Optional numeric vector of effective energy spending (defaults to s)
Numeric vector of energy burden values (ratio of spending to income)
# Calculate energy burden for households
gross_income <- c(50000, 75000, 100000)
energy_spending <- c(3000, 3500, 4000)
energy_burden_func(gross_income, energy_spending)
Calculates the Energy Return on Investment as the ratio of gross income to effective energy spending. EROI = G/Se.
eroi_func(g, s, se = NULL)
g: Numeric vector of gross income values
s: Numeric vector of energy spending values
se: Optional numeric vector of effective energy spending (defaults to s)
Numeric vector of EROI values
# Calculate EROI for households
eroi_func(50000, 3000)
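The energy burden, DEAR, and EROI formulas above, together with the Net Energy Return (Nh) documented under ner_func(), are closely related when effective spending equals actual spending. The sketch below re-implements the documented formulas as hypothetical one-liners (these are not the package functions) to make those identities visible.

```r
# Hypothetical re-implementations of the documented formulas (se defaults to s)
eb   <- function(g, s) s / g                    # E_b  = S / G
dear <- function(g, s) (g - s) / g              # DEAR = (G - S) / G
eroi <- function(g, s, se = s) g / se           # EROI = G / Se
nh   <- function(g, s, se = s) (g - s) / se     # Nh   = (G - S) / Se

g <- 50000
s <- 3000

eb(g, s)    # 0.06
dear(g, s)  # 0.94
eroi(g, s)  # 16.66667
nh(g, s)    # 15.66667

# Identities that hold when se equals s
all.equal(dear(g, s), 1 - eb(g, s))
all.equal(eroi(g, s), 1 / eb(g, s))
all.equal(nh(g, s), eroi(g, s) - 1)
```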
Returns metadata about available LEAD datasets.
get_dataset_info()
Data frame with dataset information
get_dataset_info()
Returns the expected income brackets for a given dataset and vintage year. Useful for understanding what brackets are available before running analyses.
get_income_brackets(dataset, vintage)
dataset: Character, either "ami" or "fpl"
vintage: Integer, the year of the data vintage (e.g., 2018, 2022)
Character vector of income bracket names
# Get AMI brackets for 2022
get_income_brackets("ami", 2022)

# Get FPL brackets for 2018
get_income_brackets("fpl", 2018)
Returns column names and descriptions for LEAD cohort datasets.
list_cohort_columns(dataset = NULL, vintage = NULL)
dataset: Character, either "ami" or "fpl" (optional, affects available columns)
vintage: Character, "2018" or "2022" (optional, affects available columns)
Data frame with columns: column_name, description, data_type
list_cohort_columns()
list_cohort_columns("ami", "2022")
Returns the income brackets available for a given dataset and vintage.
list_income_brackets(dataset = c("ami", "fpl"), vintage = "2022")
dataset: Character, either "ami" or "fpl"
vintage: Character, "2018" or "2022"
Character vector of income bracket labels
list_income_brackets("ami", "2022")
list_income_brackets("fpl", "2018")
Returns all state abbreviations available in the LEAD dataset.
list_states()
Character vector of 51 state abbreviations (50 states + DC)
list_states()
Load census tract demographics and utility service territory information with automatic fallback to CSV or OpenEI download.
load_census_tract_data(states = NULL, verbose = TRUE)
states: Character vector of state abbreviations to filter by (optional)
verbose: Logical, print status messages (default TRUE)
A tibble with columns:
geoid: Census tract identifier
state_abbr: State abbreviation
county_name: County name
tract_name: Tract name
utility_name: Electric utility serving this tract
Additional demographic columns
if (interactive()) {
  # Single state (requires census data download)
  nc_tracts <- load_census_tract_data(states = "NC")

  # Multiple states (regional)
  southeast <- load_census_tract_data(states = c("NC", "SC", "GA", "FL"))

  # Nationwide (all ~73,000 census tracts)
  us_tracts <- load_census_tract_data()  # No filter = all states
}
Load household energy burden cohort data with automatic fallback:
Try local database
Fall back to local CSV files
Auto-download from OpenEI if neither exists
Auto-import downloaded data to database for future use
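The fallback order listed above can be pictured as trying a sequence of loaders until one succeeds. Below is a minimal sketch of that pattern, assuming nothing about the package's internals: `first_available()` and the three stand-in loaders are hypothetical, the returned row is placeholder data, and the final auto-import step is not modeled.

```r
# Hypothetical helper: try each loader in order, return the first usable result
first_available <- function(loaders) {
  for (name in names(loaders)) {
    # A loader that errors or returns NULL counts as "not available"
    result <- tryCatch(loaders[[name]](), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(source = name, data = result))
    }
  }
  stop("No data source available")
}

# Stand-in loaders (placeholders, not real package functions)
loaders <- list(
  database = function() stop("no local database"),
  csv      = function() NULL,
  download = function() data.frame(geoid = "00000000000", households = 0)
)

loaded <- first_available(loaders)
loaded$source
```

Ordering the list from cheapest to most expensive source means a download only happens when both local options have failed.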
load_cohort_data(
  dataset = c("ami", "fpl"),
  states = NULL,
  counties = NULL,
  vintage = "2022",
  income_brackets = NULL,
  verbose = TRUE,
  ...
)
dataset: Character, either "ami" (Area Median Income) or "fpl" (Federal Poverty Line)
states: Character vector of state abbreviations to filter by (optional)
counties: Character vector of county names or FIPS codes to filter by (optional). County names are matched case-insensitively. Requires
vintage: Character, data vintage: "2018" or "2022" (default "2022")
income_brackets: Character vector of income brackets to filter by (optional)
verbose: Logical, print status messages (default TRUE)
...: Additional filter expressions passed to dplyr::filter() for dynamic filtering. Allows filtering by any column in the dataset using tidyverse syntax. Example:
A tibble with columns:
geoid: Census tract identifier
income_bracket: Income bracket label
households: Number of households
total_income: Total household income ($)
total_electricity_spend: Total electricity spending ($)
total_gas_spend: Total gas spending ($)
total_other_spend: Total other fuel spending ($)
TEN: Housing tenure category (1=Owned free/clear, 2=Owned with mortgage, 3=Rented, 4=Occupied without rent). Enables analysis of energy burden differences between renters and owners.
TEN-YBL6: Housing tenure crossed with year structure built (6 categories). Allows analysis of how building age and ownership status interact to affect energy burden (e.g., older rental units vs newer owner-occupied homes).
TEN-BLD: Housing tenure crossed with building type (e.g., single-family, multi-unit). Enables analysis of energy burden across different housing structures and ownership patterns.
TEN-HFL: Housing tenure crossed with primary heating fuel type (e.g., gas, electric, oil). Critical for analyzing how heating fuel choice and tenure status jointly influence energy costs and burden.
# Single state (fast, good for learning)
nc_ami <- load_cohort_data(dataset = "ami", states = "NC")

# Load specific vintage
nc_2018 <- load_cohort_data(dataset = "ami", states = "NC", vintage = "2018")

if (interactive()) {
  # Multiple states (regional analysis - requires data download)
  southeast <- load_cohort_data(dataset = "fpl", states = c("NC", "SC", "GA", "FL"))

  # Nationwide (all 50 states + DC - no filter)
  us_data <- load_cohort_data(dataset = "ami", vintage = "2022")

  # Filter to specific income brackets
  low_income <- load_cohort_data(
    dataset = "ami",
    states = "NC",
    income_brackets = c("0-30% AMI", "30-50% AMI")
  )

  # Filter to specific counties within a state
  triangle <- load_cohort_data(
    dataset = "fpl",
    states = "NC",
    counties = c("Orange", "Durham", "Wake")
  )

  # Or use county FIPS codes
  orange <- load_cohort_data(
    dataset = "fpl",
    states = "NC",
    counties = "37135"
  )

  # Use dynamic filtering for custom criteria
  high_burden <- load_cohort_data(
    dataset = "ami",
    states = "NC",
    households > 100,
    total_electricity_spend / total_income > 0.06
  )

  # Analyze energy burden by housing characteristics
  # Compare renters vs owners by heating fuel type
  nc_housing <- load_cohort_data(dataset = "ami", states = "NC")

  library(dplyr)

  # Group by tenure and heating fuel to analyze energy burden patterns
  housing_analysis <- nc_housing %>%
    filter(!is.na(TEN), !is.na(`TEN-HFL`)) %>%
    group_by(TEN, `TEN-HFL`) %>%
    summarise(
      total_households = sum(households),
      avg_energy_burden = weighted.mean(
        (total_electricity_spend + total_gas_spend + total_other_spend) / total_income,
        w = households,
        na.rm = TRUE
      ),
      .groups = "drop"
    )
}
A comprehensive dataset containing energy burden data for all counties in North Carolina. This dataset includes both Federal Poverty Line (FPL) and Area Median Income (AMI) cohort data for 2018 and 2022 vintages, aggregated to the census tract × income bracket level.
nc_sample
A named list with 4 data frames:
fpl_2018: Federal Poverty Line cohort data for 2018 (~10,805 rows)
fpl_2022: Federal Poverty Line cohort data for 2022 (~13,185 rows)
ami_2018: Area Median Income cohort data for 2018 (~6,484 rows)
ami_2022: Area Median Income cohort data for 2022 (~5,091 rows)
Each data frame contains:
geoid: 11-digit census tract identifier (character)
income_bracket: Income bracket category (character)
households: Number of households in this cohort (numeric)
total_income: Total household income in dollars (numeric)
total_electricity_spend: Total electricity spending in dollars (numeric)
total_gas_spend: Total gas spending in dollars (numeric)
total_other_spend: Total other fuel spending in dollars (numeric)
This sample data provides full state coverage for more comprehensive analysis, testing,
and demonstrations. For lightweight quick demos, see orange_county_sample.
North Carolina (all 100 counties):
2018: 2,163 census tracts
2022: 2,642 census tracts (tract boundaries changed)
Income Brackets:
FPL: 0-100%, 100-150%, 150-200%, 200-400%, 400%+
AMI: Varies by vintage (4-6 categories)
Size: 1.3 MB compressed (.rda)
U.S. Department of Energy Low-Income Energy Affordability Data (LEAD) Tool
2018 vintage: https://data.openei.org/submissions/573
2022 vintage: https://data.openei.org/submissions/6219
orange_county_sample - Lightweight sample (94 KB) for quick demos
load_cohort_data - Load data for any state with county filtering
compare_energy_burden - Compare energy burden across vintages
calculate_weighted_metrics - Calculate weighted metrics with grouping
# Load sample data
data(nc_sample)

# View structure
names(nc_sample)

# Analyze energy burden by county
library(dplyr)

# Extract county FIPS (first 5 digits of geoid)
nc_sample$fpl_2022 %>%
  mutate(county_fips = substr(geoid, 1, 5)) %>%
  group_by(county_fips, income_bracket) %>%
  summarise(
    households = sum(households),
    avg_energy_burden = sum(total_electricity_spend + total_gas_spend + total_other_spend) /
      sum(total_income),
    .groups = "drop"
  ) %>%
  filter(county_fips == "37183")  # Wake County

# Compare urban vs rural counties
urban_counties <- c("37119", "37063", "37183")  # Mecklenburg, Durham, Wake
rural_counties <- c("37069", "37095", "37131")  # Franklin, Hyde, Northampton

nc_sample$fpl_2022 %>%
  mutate(
    county_fips = substr(geoid, 1, 5),
    region = case_when(
      county_fips %in% urban_counties ~ "Urban",
      county_fips %in% rural_counties ~ "Rural",
      TRUE ~ "Other"
    )
  ) %>%
  filter(region != "Other") %>%
  group_by(region, income_bracket) %>%
  summarise(
    households = sum(households),
    energy_burden = sum(total_electricity_spend + total_gas_spend + total_other_spend) /
      sum(total_income),
    .groups = "drop"
  )
Calculates Net Energy Burden with proper aggregation methodology via the Net Energy Return (Nh) framework. For individual households, NEB = EB = S/G. When aggregating across households (with weights), automatically uses the Nh method to avoid 1-5% aggregation errors.
neb_func(g, s, se = NULL, weights = NULL, aggregate = FALSE)
g: Numeric vector of gross income values
s: Numeric vector of energy spending values
se: Optional numeric vector of effective energy spending (defaults to s)
weights: Optional numeric vector of weights for aggregation (e.g., household counts). When provided, uses Nh method:
aggregate: Logical, if TRUE forces aggregation even without weights (uses unweighted mean). Default FALSE for backwards compatibility.
Individual Level: NEB = EB = S/G (mathematically identical)
Aggregation Modes:
No aggregation (default): Returns vector of individual NEB values
neb_func(income, spending) # Returns vector
Weighted aggregation: Automatically uses Nh method when weights provided
neb_func(income, spending, weights = households) # Returns single value
Unweighted aggregation: Use aggregate = TRUE for simple mean
neb_func(income, spending, aggregate = TRUE) # Returns single value
Why Nh Method? Avoids 1-5% error from naive averaging:
CORRECT: neb_func(g, s, weights = w) → Uses Nh internally
WRONG: weighted.mean(s/g, w) → Introduces bias
The Nh method: 1 / (1 + weighted.mean(nh, weights)) where nh = (g-s)/se
uses arithmetic mean instead of harmonic mean, providing computational
simplicity and numerical stability.
If weights = NULL and aggregate = FALSE: Numeric vector of individual NEB values (S/G)
If weights provided or aggregate = TRUE: Single aggregated NEB value via Nh method
ner_func() for the Net Energy Return (Nh) calculation
energy_burden_func() for simple EB without aggregation support
# Individual household - returns vector
neb_func(50000, 3000)                       # 0.06
neb_func(c(30000, 50000), c(3000, 3500))    # c(0.10, 0.07)

# Aggregation with weights - returns single value (CORRECT method)
incomes <- c(30000, 50000, 75000)
spending <- c(3000, 3500, 4000)
households <- c(100, 150, 200)
neb_func(incomes, spending, weights = households)

# Unweighted aggregation
neb_func(incomes, spending, aggregate = TRUE)

# Comparison: naive mean (WRONG) vs Nh method (CORRECT)
neb_naive <- weighted.mean(spending / incomes, households)           # Biased
neb_correct <- neb_func(incomes, spending, weights = households)     # Correct
abs(neb_naive - neb_correct) / neb_correct  # relative error of the naive mean
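To see what the weighted path computes without calling the package, the formula given above, 1 / (1 + weighted.mean(nh, weights)) with nh = (g - s) / se, can be applied by hand to the same inputs. This is a sketch that assumes se = s and follows the documented formula. It is not the body of neb_func().

```r
incomes    <- c(30000, 50000, 75000)
spending   <- c(3000, 3500, 4000)
households <- c(100, 150, 200)

# Nh for each row, assuming effective spending equals actual spending
nh <- (incomes - spending) / spending

# Household-weighted aggregation via Nh, following the documented formula
neb_via_nh <- 1 / (1 + weighted.mean(nh, households))

# Naive household-weighted mean of the individual burdens
neb_naive <- weighted.mean(spending / incomes, households)

c(via_nh = neb_via_nh, naive = neb_naive)

# Relative gap between the two aggregates
(neb_naive - neb_via_nh) / neb_via_nh
```

Under these assumptions the two aggregates come out at roughly 0.065 and 0.069, a relative gap of about 6% for these particular inputs. The naive mean is the larger of the two.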
Calculates the Net Energy Return using the formula Nh = (G - S) / Se, where G is gross income, S is energy spending, and Se is effective energy spending. This metric is the preferred aggregation variable as it properly accounts for harmonic mean behavior when aggregating across households.
ner_func(g, s, se = NULL)
g: Numeric vector of gross income values
s: Numeric vector of energy spending values
se: Optional numeric vector of effective energy spending (defaults to s)
The Net Energy Return is mathematically related to energy burden by: E_b = 1 / (Nh + 1), or equivalently: Nh = (1/E_b) - 1
Why use Nh for aggregation?
For individual household data, the Nh method enables simple arithmetic weighted mean aggregation:
Via Nh: neb = 1 / (1 + weighted.mean(nh, weights)) (arithmetic mean)
Direct EB: neb = 1 / weighted.mean(1/eb, weights) (harmonic mean)
Computational advantages of the arithmetic mean approach:
Simpler to compute - Uses standard weighted.mean() function
More numerically stable - Avoids division by very small EB values (e.g., 0.01)
More interpretable - "Average net return per dollar spent on energy"
Prevents errors - Makes it obvious you can't use arithmetic mean on EB directly
For cohort data (pre-aggregated totals), direct calculation sum(S)/sum(G)
is mathematically equivalent to the Nh method but simpler.
The 6% energy burden poverty threshold corresponds to Nh ≈ 15.67 (that is, 1/0.06 - 1).
Numeric vector of Net Energy Return (Nh) values
# Calculate Net Energy Return
gross_income <- 50000
energy_spending <- 3000
nh <- ner_func(gross_income, energy_spending)

# Convert back to energy burden
energy_burden <- 1 / (nh + 1)
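The equivalences stated above can be checked numerically in base R. The values below are toy inputs chosen for illustration, and se is assumed equal to s. The second check also assumes that each cohort is weighted by its energy spending, which is the weighting under which the Nh route collapses to sum(S)/sum(G).

```r
# Toy values (assumed for illustration), with se = s
g <- c(30000, 50000, 75000)
s <- c(3000, 3500, 4000)
w <- c(100, 150, 200)

eb <- s / g
nh <- (g - s) / s

# Arithmetic mean of Nh and harmonic mean of EB give the same aggregate
via_nh       <- 1 / (1 + weighted.mean(nh, w))
via_harmonic <- 1 / weighted.mean(1 / eb, w)
all.equal(via_nh, via_harmonic)  # TRUE

# Treating the same numbers as cohort totals and weighting by energy spending
# (an assumption of this sketch), the Nh route reduces to sum(S) / sum(G)
all.equal(1 / (1 + weighted.mean(nh, s)), sum(s) / sum(g))  # TRUE

# Nh value matching a 6% energy burden
1 / 0.06 - 1  # 15.66667
```

The first identity follows from 1/E_b = Nh + 1. The second follows because spending-weighted Nh sums to (sum(G) - sum(S)) / sum(S).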
A sample dataset containing energy burden data for Orange County, North Carolina (FIPS code 37135). This dataset includes both Federal Poverty Line (FPL) and Area Median Income (AMI) cohort data for 2018 and 2022 vintages.
orange_county_sample
A named list with 4 data frames:
fpl_2018: Federal Poverty Line cohort data for 2018 (135 rows)
fpl_2022: Federal Poverty Line cohort data for 2022 (206 rows)
ami_2018: Area Median Income cohort data for 2018 (259 rows)
ami_2022: Area Median Income cohort data for 2022 (149 rows)
Each data frame contains:
geoid: 11-digit census tract identifier (character)
income_bracket: Income bracket category (character)
households: Number of households in this cohort (numeric)
total_income: Total household income in dollars (numeric)
total_electricity_spend: Total electricity spending in dollars (numeric)
total_gas_spend: Total gas spending in dollars (numeric)
total_other_spend: Total other fuel spending in dollars (numeric)
This sample data is provided for quick demos, testing, and vignettes without
requiring a large download. For full state or national analysis, use
load_cohort_data() to download complete datasets from OpenEI.
Orange County NC (Chapel Hill, Carrboro, Hillsborough):
2018: 27 census tracts
2022: 42 census tracts (tract boundaries changed)
Income Brackets:
FPL: 0-100%, 100-150%, 150-200%, 200-400%, 400%+
AMI: very_low, low_mod, mid_high (aggregated from 6 AMI categories)
U.S. Department of Energy Low-Income Energy Affordability Data (LEAD) Tool
2018 vintage: https://data.openei.org/submissions/573
2022 vintage: https://data.openei.org/submissions/6219
load_cohort_data - Load full datasets for any state
compare_energy_burden - Compare energy burden across vintages
calculate_weighted_metrics - Calculate weighted metrics with grouping
# Load sample data
data(orange_county_sample)

# View structure
names(orange_county_sample)

# Quick analysis of 2022 FPL data
library(dplyr)
orange_county_sample$fpl_2022 %>%
  group_by(income_bracket) %>%
  summarise(
    households = sum(households),
    avg_energy_burden = sum(total_electricity_spend + total_gas_spend + total_other_spend) /
      sum(total_income)
  )
Pretty-print a comparison table from compare_energy_burden()
## S3 method for class 'energy_burden_comparison'
print(x, ...)
x: Comparison result from compare_energy_burden()
...: Additional arguments (not used)
Returns x invisibly for use in pipe chains.
Converts numeric values to formatted strings with thousand separators (commas).
to_big(x)
x: Numeric vector to format
Character vector of formatted numbers
# Format large numbers
to_big(c(1000, 25000, 1000000))
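For readers who want the same kind of output without the package, base R's formatC() can insert thousand separators. This is offered as a comparable base-R idiom. It is not a description of how to_big() is implemented.

```r
x <- c(1000, 25000, 1000000)

# format = "d" prints whole numbers; big.mark inserts the separator
formatted <- formatC(x, format = "d", big.mark = ",")
formatted
# "1,000" "25,000" "1,000,000"
```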
Converts large dollar values to billions format with dollar sign prefix. Values less than 1 billion are shown in millions.
to_billion_dollar(x, suffix = " billion", override_to_k = TRUE)
x: Numeric vector to format
suffix: Character string to append after "billion" (default: " billion")
override_to_k: Logical (currently unused, kept for compatibility)
Character vector of formatted dollar amounts with "billion" or "m" suffix
# Format in billions
to_billion_dollar(c(5000000, 1000000000, 2500000000))
Converts numeric values to formatted dollar strings with appropriate decimal places and thousand separators.
to_dollar(x, latex = FALSE)
x: Numeric vector to format
latex: Logical indicating whether to escape dollar sign for LaTeX (default: FALSE)
Character vector of formatted dollar amounts
# Format dollar amounts
to_dollar(c(1000, 2500.50, 10000))

# LaTeX-escaped format
to_dollar(c(1000, 2500.50), latex = TRUE)
Converts large numeric values to millions format with appropriate suffix. Values less than 1 million are shown in thousands.
to_million(x, suffix = " million", override_to_k = TRUE)
x: Numeric vector to format
suffix: Character string to append after "million" (default: " million")
override_to_k: Logical indicating whether to show values < 1M as thousands (default: TRUE)
Character vector of formatted numbers with "million" or "k" suffix
# Format in millions
to_million(c(5000, 1000000, 2500000))
Converts numeric values to formatted percentage strings with no decimal places by default.
to_percent(x, latex = FALSE)
x: Numeric vector to format (as proportions, not percentages)
latex: Logical indicating whether to escape percent sign for LaTeX (default: FALSE)
Character vector of formatted percentages
# Format percentages
to_percent(c(0.25, 0.50, 0.123))

# LaTeX-escaped format
to_percent(c(0.25, 0.50), latex = TRUE)
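The proportion-to-percentage conversion and the LaTeX escaping can be sketched in a few lines of base R. `percent_sketch` below is a hypothetical stand-in written for illustration. The rounding rule and exact output of to_percent() itself may differ.

```r
# Hypothetical stand-in: proportions in, whole-number percentages out
percent_sketch <- function(x, latex = FALSE) {
  sign <- if (latex) "\\%" else "%"  # LaTeX needs the percent sign escaped
  paste0(round(100 * x), sign)
}

percent_sketch(c(0.25, 0.50, 0.123))
percent_sketch(c(0.25, 0.50), latex = TRUE)
```

Escaping matters because an unescaped `%` starts a comment in LaTeX and would silently swallow the rest of the line.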