j40-cejst-2/data/data-pipeline/data_pipeline/ipython/agricultural_loss_indicator.ipynb
Travis Newby a27ca46b1d
Update dependencies to fix safety check failures (#2142)
* Update dependencies

Update dependencies causing safety check to fail

* Remove nb_black from jupyter notebooks

Because of the build issue on M1 macs, nb_black was removed as a dev dependency. This change removes the lines referencing nb_black (%load_ext lab_black) from all jupyter notebooks.
2023-02-02 16:43:59 -06:00

{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "dc8a46ce-3dbf-49ee-a0ab-1449fd6d176d",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import sys\n",
"from pathlib import Path\n",
"import matplotlib.pyplot as plt\n",
"\n",
"sys.path.append(\"../../data_pipeline/\")"
]
},
{
"cell_type": "markdown",
"id": "ebdcdf20-08b6-48e6-b28b-4bebbb3655c2",
"metadata": {},
"source": [
"# Examining agricultural loss indicator"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "78cf4d99-6096-43a8-95e2-4e8328a78b18",
"metadata": {},
"outputs": [],
"source": [
"DATA_PATH = Path.cwd().parent / \"data\"\n",
"\n",
"urban_rural_from_geocorr = pd.read_csv(\n",
"    DATA_PATH / \"dataset/geocorr/usa.csv\",\n",
"    dtype={\"GEOID10_TRACT\": str},\n",
")\n",
"\n",
"score_m = pd.read_csv(\n",
"    DATA_PATH / \"score/csv/full/usa.csv\",\n",
"    dtype={\"GEOID10_TRACT\": str},\n",
"    usecols=[\n",
"        \"Expected agricultural loss rate (Natural Hazards Risk Index)\",\n",
"        \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n",
"        \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\",\n",
"        \"Urban Heuristic Flag\",\n",
"        \"Is low income and has a low percent of higher ed students?\",\n",
"        \"GEOID10_TRACT\",\n",
"        \"Total threshold criteria exceeded\",\n",
"    ],\n",
")\n",
"\n",
"# Note: I downloaded this file fresh as a sanity check. The path below is local to my\n",
"# machine, so update it before running this notebook elsewhere.\n",
"nri_full = pd.read_csv(\n",
"    \"/Users/emmausds/Desktop/current-work/NRI_Table_CensusTracts.csv\",\n",
"    dtype={\"TRACTFIPS\": str},\n",
"    usecols=[\n",
"        \"TRACTFIPS\",\n",
"        \"AGRIVALUE\",\n",
"        \"CWAV_EALA\",\n",
"        \"DRGT_EALA\",\n",
"        \"HAIL_EALA\",\n",
"        \"HWAV_EALA\",\n",
"        \"HRCN_EALA\",\n",
"        \"RFLD_EALA\",\n",
"        \"SWND_EALA\",\n",
"        \"TRND_EALA\",\n",
"        \"WFIR_EALA\",\n",
"        \"WNTW_EALA\",\n",
"    ],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "04ed5585-e7e5-4209-8e55-75a3e1fce633",
"metadata": {},
"source": [
"## Understanding our current implementation\n",
"\n",
"In our current implementation, urban tracts have, on average, a higher expected agricultural loss rate than rural tracts (0.016 vs. 0.012) and a slightly higher loss-rate percentile (0.51 vs. 0.49). The shares of rural and urban tracts identified by this threshold are roughly equal (2.1% and 1.8%, respectively)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "48250479-4074-4b04-9f8b-56a6105e8225",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Expected agricultural loss rate (Natural Hazards Risk Index)</th>\n",
" <th>Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)</th>\n",
" <th>Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Urban Heuristic Flag</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0.0</th>\n",
" <td>0.011502</td>\n",
" <td>0.486045</td>\n",
" <td>0.021132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1.0</th>\n",
" <td>0.016255</td>\n",
" <td>0.505159</td>\n",
" <td>0.018296</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Expected agricultural loss rate (Natural Hazards Risk Index) \\\n",
"Urban Heuristic Flag \n",
"0.0 0.011502 \n",
"1.0 0.016255 \n",
"\n",
" Expected agricultural loss rate (Natural Hazards Risk Index) (percentile) \\\n",
"Urban Heuristic Flag \n",
"0.0 0.486045 \n",
"1.0 0.505159 \n",
"\n",
" Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students? \n",
"Urban Heuristic Flag \n",
"0.0 0.021132 \n",
"1.0 0.018296 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"score_m.groupby(\"Urban Heuristic Flag\")[\n",
"    [\n",
"        \"Expected agricultural loss rate (Natural Hazards Risk Index)\",\n",
"        \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n",
"        \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\",\n",
"    ]\n",
"].mean()"
]
},
{
"cell_type": "markdown",
"id": "da540db6-d07b-4dac-9959-23e29df0881b",
"metadata": {},
"source": [
"We can also look at the distribution of percentiles among the urban and rural tracts. This is very much not what I might expect -- I'd hope the rural areas were \"flatter\" in distribution than the urban areas. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7adf50e6-9293-40df-afd2-b1fe152d88ad",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1008x504 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"tmp = sns.FacetGrid(\n",
"    data=score_m, col=\"Urban Heuristic Flag\", col_wrap=2, height=7\n",
")\n",
"# `sns.distplot` is deprecated; `sns.histplot` with a KDE overlay replaces it.\n",
"tmp.map(\n",
"    sns.histplot,\n",
"    \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n",
"    bins=20,\n",
"    kde=True,\n",
"    stat=\"density\",\n",
"    color=\"#62acef\",\n",
")\n",
"tmp.set(xlim=(0, 1.0))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "1bff0654-6a88-4511-b17f-492d41023d9f",
"metadata": {},
"source": [
"But if we look at just the raw agricultural losses (the summed `*_EALA` columns, before dividing by agricultural value), we see a very different distribution. This suggests to me that we are perhaps elevating the urban areas in our wealth-neutral metric."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b10adc64-3dae-429b-a0d3-b54f701587df",
"metadata": {},
"outputs": [],
"source": [
"# Attach the urban/rural flag to each NRI tract; tracts without a match keep a NaN flag.\n",
"nri_with_flag = (\n",
"    nri_full.set_index(\"TRACTFIPS\")\n",
"    .merge(\n",
"        urban_rural_from_geocorr.set_index(\"GEOID10_TRACT\"),\n",
"        left_index=True,\n",
"        right_index=True,\n",
"        how=\"left\",\n",
"    )\n",
"    .reset_index()\n",
")\n",
"\n",
"# Sum the per-hazard `*_EALA` columns into a single agricultural loss total.\n",
"nri_with_flag[\"total_ag_loss\"] = nri_with_flag.filter(like=\"EALA\").sum(axis=1)\n",
"nri_with_flag[\"total_ag_loss_pctile\"] = nri_with_flag[\"total_ag_loss\"].rank(\n",
"    pct=True\n",
")\n",
"\n",
"nri_with_flag.groupby(\"Urban Heuristic Flag\")[\"total_ag_loss_pctile\"].mean()"
]
},
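{
"cell_type": "markdown",
"id": "sketch-rank-pct-note",
"metadata": {},
"source": [
"A quick aside on `rank(pct=True)`, since the percentile columns above depend on it. This is a minimal sketch on made-up numbers (the `toy_loss` series is hypothetical, not NRI data): tied values share the average of their ranks, so a block of tracts with identical totals, such as many zeros, all land on the same percentile."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-rank-pct-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical loss totals: three tracts tie at zero.\n",
"toy_loss = pd.Series([0.0, 0.0, 0.0, 50.0, 200.0])\n",
"\n",
"# Default method=\"average\": the zeros hold ranks 1-3 (average rank 2), so each gets 2 / 5 = 0.4.\n",
"toy_pctile = toy_loss.rank(pct=True)\n",
"print(toy_pctile.tolist())"
]
},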
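{
"cell_type": "markdown",
"id": "ae3bd6f6-431a-4b73-a81d-a3e4eb14d708",
"metadata": {},
"source": [
"When we look at the distribution of agricultural value in urban and rural census tracts, we see that the agricultural value for urban tracts is heavily skewed toward zero: at least 30% of urban tracts have no agricultural value at all, and the urban median is about 1,800 dollars, compared with about 7.4 million dollars for rural tracts."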
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "38a8e0a0-0cc4-43c9-b0bb-2062da59a639",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0.00</th>\n",
" <th>0.10</th>\n",
" <th>0.20</th>\n",
" <th>0.30</th>\n",
" <th>0.40</th>\n",
" <th>0.50</th>\n",
" <th>0.60</th>\n",
" <th>0.70</th>\n",
" <th>0.80</th>\n",
" <th>0.90</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Urban Heuristic Flag</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0.00</th>\n",
" <td>0.00</td>\n",
" <td>407,226.75</td>\n",
" <td>1,305,567.65</td>\n",
" <td>2,586,000.00</td>\n",
" <td>4,411,909.10</td>\n",
" <td>7,385,163.62</td>\n",
" <td>11,582,526.63</td>\n",
" <td>18,205,814.38</td>\n",
" <td>30,706,464.84</td>\n",
" <td>56,067,604.96</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1.00</th>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>141.35</td>\n",
" <td>1,812.91</td>\n",
" <td>9,755.31</td>\n",
" <td>44,319.94</td>\n",
" <td>202,972.50</td>\n",
" <td>1,060,793.40</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0.00 0.10 0.20 0.30 0.40 \\\n",
"Urban Heuristic Flag \n",
"0.00 0.00 407,226.75 1,305,567.65 2,586,000.00 4,411,909.10 \n",
"1.00 0.00 0.00 0.00 0.00 141.35 \n",
"\n",
" 0.50 0.60 0.70 0.80 \\\n",
"Urban Heuristic Flag \n",
"0.00 7,385,163.62 11,582,526.63 18,205,814.38 30,706,464.84 \n",
"1.00 1,812.91 9,755.31 44,319.94 202,972.50 \n",
"\n",
" 0.90 \n",
"Urban Heuristic Flag \n",
"0.00 56,067,604.96 \n",
"1.00 1,060,793.40 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.options.display.float_format = \"{:,.2f}\".format\n",
"nri_with_flag.groupby(\"Urban Heuristic Flag\")[\"AGRIVALUE\"].quantile(\n",
" q=np.arange(0, 1, step=0.1)\n",
").unstack()"
]
},
{
"cell_type": "markdown",
"id": "8aed6062-9503-4120-bab8-f840cb524365",
"metadata": {},
"source": [
"## Updating the metric\n",
"\n",
"So we clip agricultural value from below: the denominator of the loss rate becomes the larger of the tract's own `AGRIVALUE` and the 10th percentile of rural tracts' `AGRIVALUE` (about 407,000 dollars). When we do that, proportionally far more rural tracts reach the 90th percentile: 24% of rural tracts versus 7% of urban tracts."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2b3a3b3e-76c2-4b6b-b96b-d439580de592",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Urban Heuristic Flag</th>\n",
" <th>0.00</th>\n",
" <th>1.00</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Expected agricultural loss rate (Natural Hazards Risk Index) exceeds 90th percentile, adjusted</th>\n",
" <td>0.24</td>\n",
" <td>0.07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Expected agricultural loss rate (Natural Hazards Risk Index) (percentile, adjusted)</th>\n",
" <td>0.77</td>\n",
" <td>0.43</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Urban Heuristic Flag 0.00 1.00\n",
"Expected agricultural loss rate (Natural Hazard... 0.24 0.07\n",
"Expected agricultural loss rate (Natural Hazard... 0.77 0.43"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Floor for the denominator: the 10th percentile of rural tracts' agricultural value.\n",
"final_left_clip = nri_with_flag[nri_with_flag[\"Urban Heuristic Flag\"] == 0][\n",
"    \"AGRIVALUE\"\n",
"].quantile(0.1)\n",
"\n",
"nri_with_flag[\n",
"    \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile, adjusted)\"\n",
"] = (\n",
"    nri_with_flag[\"total_ag_loss\"]\n",
"    / (nri_with_flag[\"AGRIVALUE\"].clip(lower=final_left_clip))\n",
").rank(\n",
"    pct=True\n",
")\n",
"\n",
"nri_with_flag[\n",
"    \"Expected agricultural loss rate (Natural Hazards Risk Index) exceeds 90th percentile, adjusted\"\n",
"] = (\n",
"    nri_with_flag[\n",
"        \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile, adjusted)\"\n",
"    ]\n",
"    >= 0.9\n",
")\n",
"\n",
"nri_with_flag.groupby(\"Urban Heuristic Flag\")[\n",
"    [\n",
"        \"Expected agricultural loss rate (Natural Hazards Risk Index) exceeds 90th percentile, adjusted\",\n",
"        \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile, adjusted)\",\n",
"    ]\n",
"].mean().T"
]
},
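{
"cell_type": "markdown",
"id": "sketch-clip-denominator-note",
"metadata": {},
"source": [
"To make the effect of the clip concrete, here is a minimal sketch on made-up numbers (the `toy` frame, its column names, and its values are hypothetical, not NRI data). An urban tract with a tiny agricultural value gets a very large loss rate from a small loss; flooring the denominator at a rural quantile pulls that rate back down, while tracts already above the floor are unchanged."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-clip-denominator-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical tracts: flag 0 = rural, 1 = urban.\n",
"toy = pd.DataFrame(\n",
"    {\n",
"        \"flag\": [0, 0, 0, 1, 1],\n",
"        \"agrivalue\": [1_000_000.0, 5_000_000.0, 20_000_000.0, 100.0, 2_000_000.0],\n",
"        \"ag_loss\": [10_000.0, 40_000.0, 100_000.0, 50.0, 10_000.0],\n",
"    }\n",
")\n",
"\n",
"# Floor for the denominator, taken from rural tracts only (0.1 quantile, as in the cell above).\n",
"floor = toy.loc[toy[\"flag\"] == 0, \"agrivalue\"].quantile(0.1)\n",
"\n",
"toy[\"raw_rate\"] = toy[\"ag_loss\"] / toy[\"agrivalue\"]\n",
"toy[\"clipped_rate\"] = toy[\"ag_loss\"] / toy[\"agrivalue\"].clip(lower=floor)\n",
"print(toy[[\"flag\", \"raw_rate\", \"clipped_rate\"]])"
]
},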
{
"cell_type": "markdown",
"id": "54585b02-d26e-4986-8a8a-9828582e7ac1",
"metadata": {},
"source": [
"The percentile distributions look a little better to me (but not fantastic)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "4fd90ace-65f7-4fa1-973f-922b56939ecd",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 288x576 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"tmp = sns.FacetGrid(\n",
"    data=nri_with_flag, col=\"Urban Heuristic Flag\", col_wrap=1, height=4\n",
")\n",
"# `sns.distplot` is deprecated; `sns.histplot` with a KDE overlay replaces it.\n",
"tmp.map(\n",
"    sns.histplot,\n",
"    \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile, adjusted)\",\n",
"    bins=20,\n",
"    kde=True,\n",
"    stat=\"density\",\n",
"    color=\"#62acef\",\n",
")\n",
"tmp.set(xlim=(0, 1.0))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "a273e5b1-c4a0-45b3-8f57-3be116685c10",
"metadata": {},
"source": [
"We can also look at how many tracts would actually exceed the threshold, which means also adding in the socioeconomic information from Score M. Note that *some tracts* are not shared between Score M and the NRI data; tracts missing from Score M are treated as not meeting the socioeconomic criterion."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "36670c86-230a-491b-bf44-767c1e1cd779",
"metadata": {},
"outputs": [],
"source": [
"nri_with_flag[\"low_inc_low_highed\"] = (\n",
"    nri_with_flag[\"TRACTFIPS\"]\n",
"    .map(\n",
"        score_m.set_index(\"GEOID10_TRACT\")[\n",
"            \"Is low income and has a low percent of higher ed students?\"\n",
"        ].to_dict()\n",
"    )\n",
"    # Tracts missing from Score M are treated as not meeting the criterion.\n",
"    .fillna(False)\n",
"    # Cast back to bool so the `&` below is a plain boolean AND.\n",
"    .astype(bool)\n",
")\n",
"\n",
"nri_with_flag[\"would_exceed_threshold\"] = (\n",
"    nri_with_flag[\"low_inc_low_highed\"]\n",
"    & nri_with_flag[\n",
"        \"Expected agricultural loss rate (Natural Hazards Risk Index) exceeds 90th percentile, adjusted\"\n",
"    ]\n",
")"
]
},
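{
"cell_type": "markdown",
"id": "sketch-merge-indicator-note",
"metadata": {},
"source": [
"One way to see how many tracts the two sources do not share is an outer merge with `indicator=True`. This is a minimal sketch on made-up tract IDs (`toy_nri` and `toy_score` are hypothetical stand-ins for `nri_with_flag` and `score_m`), not a count from the real data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-merge-indicator-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical tract IDs; two of the three overlap.\n",
"toy_nri = pd.DataFrame({\"TRACTFIPS\": [\"01001020100\", \"01001020200\", \"01001020300\"]})\n",
"toy_score = pd.DataFrame({\"GEOID10_TRACT\": [\"01001020200\", \"01001020300\", \"01001020400\"]})\n",
"\n",
"# indicator=True adds a `_merge` column with the values left_only, right_only, or both.\n",
"overlap = toy_nri.merge(\n",
"    toy_score,\n",
"    left_on=\"TRACTFIPS\",\n",
"    right_on=\"GEOID10_TRACT\",\n",
"    how=\"outer\",\n",
"    indicator=True,\n",
")\n",
"print(overlap[\"_merge\"].value_counts())"
]
},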
{
"cell_type": "code",
"execution_count": 24,
"id": "bdb9a4b3-954a-4389-86d0-84d896c1679c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sum</th>\n",
" <th>mean</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Urban Heuristic Flag</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0.00000</th>\n",
" <td>1190</td>\n",
" <td>0.08439</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1.00000</th>\n",
" <td>1158</td>\n",
" <td>0.01982</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sum mean\n",
"Urban Heuristic Flag \n",
"0.00000 1190 0.08439\n",
"1.00000 1158 0.01982"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.options.display.float_format = \"{:.5f}\".format\n",
"nri_with_flag.groupby(\"Urban Heuristic Flag\")[\"would_exceed_threshold\"].agg(\n",
" [\"sum\", \"mean\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "015de431-0c39-49b7-9144-1bfa776e4f75",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sum</th>\n",
" <th>mean</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Urban Heuristic Flag</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0.00000</th>\n",
" <td>298</td>\n",
" <td>0.02113</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1.00000</th>\n",
" <td>1069</td>\n",
" <td>0.01830</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sum mean\n",
"Urban Heuristic Flag \n",
"0.00000 298 0.02113\n",
"1.00000 1069 0.01830"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"score_m.groupby(\"Urban Heuristic Flag\")[\n",
" \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\"\n",
"].agg([\"sum\", \"mean\"])"
]
},
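{
"cell_type": "markdown",
"id": "fd175407-07ee-4712-ac7e-ddd06ca8c622",
"metadata": {},
"source": [
"This substantially increases the number of tracts identified by this indicator, from 1,367 to 2,348 (just under 2% to just over 3% of tracts). For the overall Score M set, it adds 259 tracts that previously exceeded no threshold and removes 116 tracts that qualified only through the current agricultural loss indicator."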
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "9c164dfc-9053-462b-8a55-1b802694ec48",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2348"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"1367"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"259"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(\n",
" nri_with_flag[\"would_exceed_threshold\"].sum(),\n",
" score_m[\n",
" \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\"\n",
" ].sum(),\n",
")\n",
"\n",
"all_ag_loss_tracts = nri_with_flag[nri_with_flag[\"would_exceed_threshold\"]][\n",
" \"TRACTFIPS\"\n",
"].unique()\n",
"\n",
"all_scorem_tracts = score_m[score_m[\"Total threshold criteria exceeded\"] > 0][\n",
" \"GEOID10_TRACT\"\n",
"].unique()\n",
"\n",
"display(len(set(all_ag_loss_tracts).difference(all_scorem_tracts)))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "9a225512-758c-4807-a598-bc8c97e20a47",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"116"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"score_m[\"adjusted\"] = score_m[\"Total threshold criteria exceeded\"] - score_m[\n",
" \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\"\n",
"].astype(int)\n",
"\n",
"score_m_adjusted_tracts = set(\n",
" score_m[score_m[\"adjusted\"] > 0][\"GEOID10_TRACT\"]\n",
").union(all_ag_loss_tracts)\n",
"display(len(set(all_scorem_tracts).difference(score_m_adjusted_tracts)))"
]
},
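{
"cell_type": "markdown",
"id": "sketch-added-removed-note",
"metadata": {},
"source": [
"The added and removed counts above are plain set differences. A minimal sketch with made-up tract IDs (all three sets are hypothetical) shows the accounting:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-added-removed-code",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical tracts identified today by any threshold.\n",
"current = {\"A\", \"B\", \"C\"}\n",
"# Of those, tracts that still exceed some other threshold once the current\n",
"# agricultural loss flag is taken out.\n",
"still_in_without_ag = {\"A\", \"B\"}\n",
"# Hypothetical tracts that exceed the adjusted agricultural loss threshold.\n",
"new_ag = {\"B\", \"D\"}\n",
"\n",
"adjusted = still_in_without_ag | new_ag\n",
"added = new_ag - current  # newly identified tracts\n",
"removed = current - adjusted  # tracts that drop out\n",
"print(sorted(added), sorted(removed))"
]
},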
{
"cell_type": "markdown",
"id": "73b11c1d-5661-48e3-88f3-255b0ef21768",
"metadata": {},
"source": [
"## Scratch \n",
"Choosing a left clip value"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "01e50420-a82c-42ee-bf2b-a90fdb61bcef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"At threshold 0.00, minimum value is $0\n",
"At threshold 0.05, minimum value is $82,341\n",
"At threshold 0.10, minimum value is $407,227\n",
"At threshold 0.15, minimum value is $822,946\n",
"At threshold 0.20, minimum value is $1,305,568\n",
"At threshold 0.25, minimum value is $1,881,536\n",
"At threshold 0.30, minimum value is $2,586,000\n",
"At threshold 0.35, minimum value is $3,375,394\n",
"At threshold 0.40, minimum value is $4,411,909\n",
"At threshold 0.45, minimum value is $5,719,970\n",
"At threshold 0.50, minimum value is $7,385,164\n",
"At threshold 0.55, minimum value is $9,228,310\n",
"At threshold 0.60, minimum value is $11,582,527\n",
"At threshold 0.65, minimum value is $14,421,393\n",
"At threshold 0.70, minimum value is $18,205,814\n",
"At threshold 0.75, minimum value is $23,497,859\n",
"At threshold 0.80, minimum value is $30,706,465\n",
"At threshold 0.85, minimum value is $40,827,282\n",
"At threshold 0.90, minimum value is $56,067,605\n",
"At threshold 0.95, minimum value is $86,129,868\n"
]
}
],
"source": [
"for threshold in np.arange(0, 1, 0.05):\n",
"    left_clip = nri_with_flag[nri_with_flag[\"Urban Heuristic Flag\"] == 0][\n",
"        \"AGRIVALUE\"\n",
"    ].quantile(threshold)\n",
"    print(\n",
"        \"At threshold {:.2f}, minimum value is ${:,.0f}\".format(\n",
"            threshold, left_clip\n",
"        )\n",
"    )\n",
"    tmp_value = nri_with_flag[\"AGRIVALUE\"].clip(lower=left_clip)\n",
"    nri_with_flag[\"total_ag_loss_pctile_{:.2f}\".format(threshold)] = (\n",
"        nri_with_flag[\"total_ag_loss\"] / tmp_value\n",
"    ).rank(pct=True)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "0bcddf71-4224-4e7d-b6f7-1edaeba5893a",
"metadata": {},
"outputs": [],
"source": [
"# Average percentile by urban/rural flag: the raw-loss percentile plus one column\n",
"# per clip threshold computed in the loop above.\n",
"pctile_cols = [\"total_ag_loss_pctile\"] + [\n",
"    \"total_ag_loss_pctile_{:.2f}\".format(threshold)\n",
"    for threshold in np.arange(0, 1, 0.05)\n",
"]\n",
"tmp = nri_with_flag.groupby(\"Urban Heuristic Flag\")[pctile_cols].mean().T\n",
"\n",
"# The fifth underscore-separated token of each column name is the clip quantile.\n",
"left_pctile = tmp.reset_index()[\"index\"].str.split(\"_\", expand=True)[4]\n",
"graph_data = (\n",
"    pd.concat([tmp.reset_index(), left_pctile], axis=1)\n",
"    .rename(\n",
"        columns={\n",
"            0: \"Rural\",\n",
"            1: \"Urban\",\n",
"            4: \"Left clip value\",\n",
"        }\n",
"    )\n",
"    .set_index(\"Left clip value\")[[\"Rural\", \"Urban\"]]\n",
"    .stack()\n",
"    .reset_index()\n",
"    .rename(\n",
"        columns={\"level_1\": \"Tract classification\", 0: \"Average percentile\"}\n",
"    )\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "12fb9323-e932-4392-b722-8fca3c127b0e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Left clip value', ylabel='Average percentile'>"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 792x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(11, 7))\n",
"sns.lineplot(\n",
" x=\"Left clip value\",\n",
" y=\"Average percentile\",\n",
" hue=\"Tract classification\",\n",
" data=graph_data,\n",
" palette=\"colorblind\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c898c542-1dad-43b7-b20b-ee065f96c227",
"metadata": {},
"source": [
"Note: 209 tracts have a missing urban/rural flag. `groupby` drops rows with a missing key, so they are left out of the means."
]
},
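{
"cell_type": "markdown",
"id": "sketch-groupby-dropna-note",
"metadata": {},
"source": [
"A minimal sketch of that behavior on made-up data (the `toy` frame is hypothetical): `groupby` excludes rows whose key is missing by default, and `dropna=False` (available since pandas 1.1) keeps them as their own group."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-groupby-dropna-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data: the last row has no urban/rural flag.\n",
"toy = pd.DataFrame({\"flag\": [0.0, 1.0, 1.0, np.nan], \"pctile\": [0.2, 0.6, 0.8, 0.9]})\n",
"\n",
"# Default: the row with a missing flag is excluded from every group.\n",
"print(toy.groupby(\"flag\")[\"pctile\"].mean())\n",
"\n",
"# dropna=False keeps the missing-flag rows as a separate NaN group.\n",
"print(toy.groupby(\"flag\", dropna=False)[\"pctile\"].mean())"
]
},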
{
"cell_type": "code",
"execution_count": 6,
"id": "adfcdbe0-fdcc-4ec7-8788-5628641bed8b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0 58429\n",
"0.0 14101\n",
"NaN 209\n",
"Name: Urban Heuristic Flag, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nri_with_flag[\"Urban Heuristic Flag\"].value_counts(dropna=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}