added first pass

2025-08-06 22:24:34 -07:00 · 2021-12-09 13:52:50 -05:00 · 2021-12-09 13:52:50 -05:00 · e82b02bd2f
commit e82b02bd2f
parent aa68a43c84
1 changed files with 328 additions and 0 deletions
--- a/data/data-pipeline/data_pipeline/ipython/hud_eda_se_12_09_2021.ipynb
+++ b/data/data-pipeline/data_pipeline/ipython/hud_eda_se_12_09_2021.ipynb
@@ -0,0 +1,328 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Methodology per Blair Russell\n",
+    "\n",
+    "We may want to rethink the denominator of our equation for housing cost burden.\n",
+    "\n",
+    "\"Right now it’s all housing units with a cost burden computed. \n",
+    "\n",
+    "Alternatively, you could use low-income households (with cost burden computed) as the denominator, which would be a measure of relative cost burden just for low-income households. \n",
+    "\n",
+    "Both approaches are appropriate, but they tell a different story. You can imagine an area with few low-income households but a vast majority of them being cost burdened. In your calculation, you’d get a small percentage. \n",
+    "\n",
+    "In the alternative approach, it’s a large percentage. Just something to think about. It depends on the story you want to tell.\"\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Indicator reviewed: \n",
+    "\n",
+    "Socioeconomic Factors Indicator reviewed\n",
+    "*  [Extreme Housing Burden](#housingburden)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Packages"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import math\n",
+    "import numpy as np\n",
+    "import os\n",
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Extreme Housing Burden <a id='housingburden'></a>\n",
+    "\n",
+    "The Extreme Housing Burden indicator represents the proportion of households that are low-income and have to spend more than half their income on housing costs. These households experience higher levels of stress, report poorer health, and may delay medical treatment because of its high cost.\n",
+    "\n",
+    "The Extreme Housing Burden indicator measures the percent of households in a census tract that are:\n",
+    "\n",
+    "1. Making less than 80% of the Area Median Family Income as determined by the Department of Housing and Urban Development (HUD), and\n",
+    "2. Paying greater than 50% of their income to housing costs. \n",
+    "\n",
+    "This data is sourced from the 2014-2018 Comprehensive Housing Affordability Strategy dataset from the Department of Housing and Urban Development (HUD) using the census tract geographic summary level, and contains cost burdens for households by percent HUD-adjusted median family income (HAMFI) category. This data can be found [here](https://www.huduser.gov/portal/datasets/cp.html). \n",
+    "\n",
+    "Because CHAS data is based on American Community Survey (ACS) estimates, which come from a sample of the population, the estimates may be unreliable if based on a small sample or population size.\n",
+    "\n",
+    "The standard error and relative standard error were used to evaluate the reliability of each estimate using CalEnviroScreen’s methodology. \n",
+    "\n",
+    "Census tract estimates that met either of the following criteria were considered reliable and included in the analysis (CalEnviroScreen, 2017):\n",
+    "\n",
+    "- Relative standard error less than 50 (meaning the standard error was less than half of the estimate), OR \n",
+    "- Standard error less than the mean standard error of all census tract estimates \n",
+    "\n",
+    "Formulas for calculating the standard error of sums, proportions, and ratios come from the [American Community Survey Office](https://www2.census.gov/programs-surveys/acs/tech_docs/accuracy/MultiyearACSAccuracyofData2013.pdf).\n",
+    "\n",
+    "Note that this code creates a score and rank by state, for every state."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The relevant variables in table 8 of the CHAS dataset are the following (CHAS data dictionary available [here](https://www.huduser.gov/portal/datasets/cp/CHAS-data-dictionary-14-18.xlsx)):\n",
+    "\n",
+    "| Name | Label |\n",
+    "|------|-------|\n",
+    "| T8_est1 | Total occupied housing units |\n",
+    "| T8_est10 | Owner occupied, less than or equal to 30% of HAMFI, cost burden greater than 50% |\n",
+    "| T8_est23 | Owner occupied, greater than 30% but less than or equal to 50% of HAMFI, cost burden greater than 50% |\n",
+    "| T8_est36 | Owner occupied, greater than 50% but less than or equal to 80% of HAMFI, cost burden greater than 50% |\n",
+    "| T8_est76 | Renter occupied, less than or equal to 30% of HAMFI, cost burden greater than 50% |\n",
+    "| T8_est89 | Renter occupied, greater than 30% but less than or equal to 50% of HAMFI, cost burden greater than 50% |\n",
+    "| T8_est102 | Renter occupied, greater than 50% but less than or equal to 80% of HAMFI, cost burden greater than 50% |\n",
+    " "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Below I also propose an alternate means for ranking census tracts\n",
+    "### These steps are outlined and commented below"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python3.9/site-packages/pandas/core/arraylike.py:364: RuntimeWarning: invalid value encountered in sqrt\n",
+      "  result = getattr(ufunc, method)(*inputs, **kwargs)\n",
+      "/usr/local/lib/python3.9/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: \n",
+      "A value is trying to be set on a copy of a slice from a DataFrame\n",
+      "\n",
+      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
+      "  self._setitem_single_block(indexer, value, name)\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Read in the data from https://www.huduser.gov/portal/datasets/cp.html\n",
+    "housing = pd.read_csv(\"Table8.csv\", \n",
+    "                      encoding = \"ISO-8859-1\",  \n",
+    "                      dtype = {'Tract_ID': object, 'st': object, 'geoid': object})\n",
+    "\n",
+    "# Save only the necessary variables\n",
+    "housing = housing[['geoid', 'name', 'st',\n",
+    "             'T8_est10', 'T8_moe10',\n",
+    "             'T8_est23', 'T8_moe23', \n",
+    "             'T8_est36','T8_moe36', \n",
+    "             'T8_est76', 'T8_moe76',\n",
+    "             'T8_est89', 'T8_moe89',\n",
+    "             'T8_est102', 'T8_moe102', \n",
+    "             'T8_est1', 'T8_moe1']]\n",
+    "\n",
+    "# Remove Puerto Rico (state FIPS code 72), which is excluded from this analysis\n",
+    "housing = housing[housing['st'] != '72'].copy()\n",
+    "\n",
+    "# Combine owner and renter occupied low-income households that make less than 80% of HAMFI into one variable\n",
+    "housing['summed'] = (housing['T8_est10'] + \n",
+    "                     housing['T8_est23'] + \n",
+    "                     housing['T8_est36'] + \n",
+    "                     housing['T8_est76'] + \n",
+    "                     housing['T8_est89'] + \n",
+    "                     housing['T8_est102'])\n",
+    "\n",
+    "# Create a variable for the standard error of the summed variables\n",
+    "housing['summed_se'] = np.sqrt((housing['T8_moe10'] / 1.645)**2 + \n",
+    "                                (housing['T8_moe23'] / 1.645)**2 + \n",
+    "                                (housing['T8_moe36'] / 1.645)**2 + \n",
+    "                                (housing['T8_moe76'] / 1.645)**2 + \n",
+    "                                (housing['T8_moe89'] / 1.645)**2 + \n",
+    "                                (housing['T8_moe102'] / 1.645)**2)\n",
+    "\n",
+    "# Keep the last 11 characters of the geoid (the census tract FIPS code), dropping the 7-character prefix\n",
+    "housing['geoid'] = housing['geoid'].str[-11:]\n",
+    "\n",
+    "# Estimate the proportion of households that are low-income and severely housing cost burdened\n",
+    "housing['hbrd_score'] = housing['summed'] / housing['T8_est1']\n",
+    "\n",
+    "# Change infinite rates (zero occupied housing units in the denominator) to nan\n",
+    "housing['hbrd_score'] = housing['hbrd_score'].replace(np.inf, np.nan)\n",
+    "\n",
+    "# Create function for calculating the standard error, using the proportions standard error formula;\n",
+    "# if the value under the radical is negative, use the ratio standard error formula instead\n",
+    "def se_prop(x, y, se_x, moe_y): \n",
+    "    se_y = moe_y / 1.645\n",
+    "    term = ((x**2)/(y**2)) * (se_y**2)\n",
+    "    test = se_x**2 - term\n",
+    "    # Choose the radicand before taking the square root so sqrt never sees a negative value\n",
+    "    radicand = np.where(test < 0, se_x**2 + term, test)\n",
+    "    return (1/y) * np.sqrt(radicand)\n",
+    "\n",
+    "housing['se'] = se_prop(housing['summed'], housing['T8_est1'], housing['summed_se'], housing['T8_moe1'])\n",
+    "\n",
+    "# Calculate the relative standard error\n",
+    "housing['rse'] = housing['se'] / housing['hbrd_score']*100\n",
+    "\n",
+    "# Change infinite rse's where the housing burden is 0 to np.nan\n",
+    "housing['rse'] = housing['rse'].replace(np.inf, np.nan)\n",
+    "\n",
+    "# Calculate the mean standard error for each state and assign it to every tract in that state\n",
+    "housing['mean_state_se'] = housing.groupby('st')['se'].transform('mean')\n",
+    "    \n",
+    "# Find census tract estimates that meet both of the following criteria and are thus considered unreliable estimates: \n",
+    "# RSE greater than or equal to 50 AND\n",
+    "# SE greater than or equal to the mean state SE of housing burdened low-income households\n",
+    "# Convert these scores to nan\n",
+    "housing.loc[(housing['rse'] >= 50) & (housing['se'] >= housing['mean_state_se']), 'hbrd_score'] = np.nan\n",
+    "\n",
+    "# Rename columns\n",
+    "housing = housing.rename(columns = {'geoid' :'FIPS_tract_id',\n",
+    "                                    'st' : 'state'\n",
+    "                                   })\n",
+    "\n",
+    "# Calculate percentile rank within each state for census tracts with a nonzero score; tracts with a score of 0 get a percentile of 0\n",
+    "housing['hbrd_rank'] = (housing[housing['hbrd_score'] != 0]\n",
+    "                        .groupby('state')['hbrd_score']\n",
+    "                        .rank(na_option = 'keep', pct = True) * 100)\n",
+    "\n",
+    "housing.loc[housing['hbrd_score'] == 0, 'hbrd_rank'] = 0\n",
+    "\n",
+    "# Create final housing burden df\n",
+    "housingburden = housing[['state', 'FIPS_tract_id','hbrd_score','hbrd_rank']]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>state</th>\n",
+       "      <th>FIPS_tract_id</th>\n",
+       "      <th>hbrd_score</th>\n",
+       "      <th>hbrd_rank</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>01</td>\n",
+       "      <td>01001020100</td>\n",
+       "      <td>0.104575</td>\n",
+       "      <td>46.298077</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>01</td>\n",
+       "      <td>01001020200</td>\n",
+       "      <td>0.191667</td>\n",
+       "      <td>83.269231</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>01</td>\n",
+       "      <td>01001020300</td>\n",
+       "      <td>0.131274</td>\n",
+       "      <td>63.653846</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>01</td>\n",
+       "      <td>01001020400</td>\n",
+       "      <td>0.088415</td>\n",
+       "      <td>34.615385</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>01</td>\n",
+       "      <td>01001020500</td>\n",
+       "      <td>0.142515</td>\n",
+       "      <td>68.221154</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "  state FIPS_tract_id  hbrd_score  hbrd_rank\n",
+       "0    01   01001020100    0.104575  46.298077\n",
+       "1    01   01001020200    0.191667  83.269231\n",
+       "2    01   01001020300    0.131274  63.653846\n",
+       "3    01   01001020400    0.088415  34.615385\n",
+       "4    01   01001020500    0.142515  68.221154"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "housingburden.head()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}