{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "dc8a46ce-3dbf-49ee-a0ab-1449fd6d176d",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import sys\n",
"from pathlib import Path\n",
"import matplotlib.pyplot as plt\n",
"\n",
"sys.path.append(\"../../data_pipeline/\")"
]
},
{
"cell_type": "markdown",
"id": "ebdcdf20-08b6-48e6-b28b-4bebbb3655c2",
"metadata": {},
"source": [
"# Examining agricultural loss indicator"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "78cf4d99-6096-43a8-95e2-4e8328a78b18",
"metadata": {},
"outputs": [],
"source": [
"DATA_PATH = Path.cwd().parent / \"data\"\n",
"\n",
"urban_rural_from_geocorr = pd.read_csv(\n",
" DATA_PATH / \"dataset/geocorr/usa.csv\",\n",
" dtype={\"GEOID10_TRACT\": str},\n",
")\n",
"\n",
"score_m = pd.read_csv(\n",
" DATA_PATH / \"score/csv/full/usa.csv\",\n",
" dtype={\"GEOID10_TRACT\": str},\n",
" usecols=[\n",
" \"Expected agricultural loss rate (Natural Hazards Risk Index)\",\n",
" \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n",
" \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\",\n",
" \"Urban Heuristic Flag\",\n",
" \"Is low income and has a low percent of higher ed students?\",\n",
" \"GEOID10_TRACT\",\n",
" \"Total threshold criteria exceeded\",\n",
" ],\n",
")\n",
"\n",
"# Note that I downloaded this fresh because I am paranoid; this is not going to load on other computers and I am sorry!\n",
"nri_full = pd.read_csv(\n",
" \"/Users/emmausds/Desktop/current-work/NRI_Table_CensusTracts.csv\",\n",
" dtype={\"TRACTFIPS\": str},\n",
" usecols=[\n",
" \"TRACTFIPS\",\n",
" \"AGRIVALUE\",\n",
" \"CWAV_EALA\",\n",
" \"DRGT_EALA\",\n",
" \"HAIL_EALA\",\n",
" \"HWAV_EALA\",\n",
" \"HRCN_EALA\",\n",
" \"RFLD_EALA\",\n",
" \"SWND_EALA\",\n",
" \"TRND_EALA\",\n",
" \"WFIR_EALA\",\n",
" \"WNTW_EALA\",\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "04ed5585-e7e5-4209-8e55-75a3e1fce633",
"metadata": {},
"source": [
"## Understanding our current implementation\n",
"\n",
"In our current implementation, on average, urban areas have a higher NRI (scaled) and a higher percentile of the loss rate. The share of Rural and Urban tracts identified by this threshold is roughly equal. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "48250479-4074-4b04-9f8b-56a6105e8225",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Expected agricultural loss rate (Natural Hazards Risk Index)
\n",
"
Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)
\n",
"
Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?
\n",
"
\n",
"
\n",
"
Urban Heuristic Flag
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0.0
\n",
"
0.011502
\n",
"
0.486045
\n",
"
0.021132
\n",
"
\n",
"
\n",
"
1.0
\n",
"
0.016255
\n",
"
0.505159
\n",
"
0.018296
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Expected agricultural loss rate (Natural Hazards Risk Index) \\\n",
"Urban Heuristic Flag \n",
"0.0 0.011502 \n",
"1.0 0.016255 \n",
"\n",
" Expected agricultural loss rate (Natural Hazards Risk Index) (percentile) \\\n",
"Urban Heuristic Flag \n",
"0.0 0.486045 \n",
"1.0 0.505159 \n",
"\n",
" Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students? \n",
"Urban Heuristic Flag \n",
"0.0 0.021132 \n",
"1.0 0.018296 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"score_m.groupby(\"Urban Heuristic Flag\")[\n",
" [\n",
" \"Expected agricultural loss rate (Natural Hazards Risk Index)\",\n",
" \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n",
" \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\",\n",
" ]\n",
"].mean()"
]
},
{
"cell_type": "markdown",
"id": "da540db6-d07b-4dac-9959-23e29df0881b",
"metadata": {},
"source": [
"We can also look at the distribution of percentiles among the urban and rural tracts. This is very much not what I might expect -- I'd hope the rural areas were \"flatter\" in distribution than the urban areas. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7adf50e6-9293-40df-afd2-b1fe152d88ad",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"[Figure: distribution of the expected agricultural loss rate percentile, one panel per Urban Heuristic Flag value]"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# histplot replaces the deprecated distplot; stat=\"density\" keeps distplot's normalized y-axis\n",
"tmp = sns.FacetGrid(\n",
"    data=score_m, col=\"Urban Heuristic Flag\", col_wrap=2, height=7\n",
")\n",
"tmp.map(\n",
"    sns.histplot,\n",
"    \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n",
"    bins=20,\n",
"    kde=True,\n",
"    stat=\"density\",\n",
"    color=\"#62acef\",\n",
")\n",
"tmp.set(xlim=(0, 1.0))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "1bff0654-6a88-4511-b17f-492d41023d9f",
"metadata": {},
"source": [
"But, if we look at just the raw loss rates, we see a very different distribution. This suggests to me that we are perhaps elevating the urban areas in our wealth-neutral metric. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b10adc64-3dae-429b-a0d3-b54f701587df",
"metadata": {},
"outputs": [],
"source": [
"nri_with_flag = (\n",
" nri_full.set_index(\"TRACTFIPS\")\n",
" .merge(\n",
" urban_rural_from_geocorr.set_index(\"GEOID10_TRACT\"),\n",
" left_index=True,\n",
" right_index=True,\n",
" how=\"left\",\n",
" )\n",
" .reset_index()\n",
")\n",
"\n",
"nri_with_flag[\"total_ag_loss\"] = nri_with_flag.filter(like=\"EALA\").sum(axis=1)\n",
"nri_with_flag[\"total_ag_loss_pctile\"] = nri_with_flag[\"total_ag_loss\"].rank(\n",
" pct=True\n",
")\n",
"\n",
"nri_with_flag.groupby(\"Urban Heuristic Flag\")[\"total_ag_loss_pctile\"].mean()"
]
},
{
"cell_type": "markdown",
"id": "ae3bd6f6-431a-4b73-a81d-a3e4eb14d708",
"metadata": {},
"source": [
"When we look at the distribution of agricultural value in census tracts that are urban and rural, we see that the agricultural value for urban tracts is quite skewed."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "38a8e0a0-0cc4-43c9-b0bb-2062da59a639",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
0.00
\n",
"
0.10
\n",
"
0.20
\n",
"
0.30
\n",
"
0.40
\n",
"
0.50
\n",
"
0.60
\n",
"
0.70
\n",
"
0.80
\n",
"
0.90
\n",
"
\n",
"
\n",
"
Urban Heuristic Flag
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0.00
\n",
"
0.00
\n",
"
407,226.75
\n",
"
1,305,567.65
\n",
"
2,586,000.00
\n",
"
4,411,909.10
\n",
"
7,385,163.62
\n",
"
11,582,526.63
\n",
"
18,205,814.38
\n",
"
30,706,464.84
\n",
"
56,067,604.96
\n",
"
\n",
"
\n",
"
1.00
\n",
"
0.00
\n",
"
0.00
\n",
"
0.00
\n",
"
0.00
\n",
"
141.35
\n",
"
1,812.91
\n",
"
9,755.31
\n",
"
44,319.94
\n",
"
202,972.50
\n",
"
1,060,793.40
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0.00 0.10 0.20 0.30 0.40 \\\n",
"Urban Heuristic Flag \n",
"0.00 0.00 407,226.75 1,305,567.65 2,586,000.00 4,411,909.10 \n",
"1.00 0.00 0.00 0.00 0.00 141.35 \n",
"\n",
" 0.50 0.60 0.70 0.80 \\\n",
"Urban Heuristic Flag \n",
"0.00 7,385,163.62 11,582,526.63 18,205,814.38 30,706,464.84 \n",
"1.00 1,812.91 9,755.31 44,319.94 202,972.50 \n",
"\n",
" 0.90 \n",
"Urban Heuristic Flag \n",
"0.00 56,067,604.96 \n",
"1.00 1,060,793.40 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.options.display.float_format = \"{:,.2f}\".format\n",
"nri_with_flag.groupby(\"Urban Heuristic Flag\")[\"AGRIVALUE\"].quantile(\n",
" q=np.arange(0, 1, step=0.1)\n",
").unstack()"
]
},
{
"cell_type": "markdown",
"id": "8aed6062-9503-4120-bab8-f840cb524365",
"metadata": {},
"source": [
"## Updating the metric\n",
"\n",
"So we clip the values such that agrivalue is defined as the maximum of the tract's agricultural value AND the 10th percentile of rural tracts' agrivalues. When we do that, we see that a lot more (proportionally) rural tracts exceed the 90th percentile. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2b3a3b3e-76c2-4b6b-b96b-d439580de592",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"