Download column order completed (#1077)

* Download column order completed

* Kameron changes

* Lucas and Beth column order changes

* cdc_places update

* passing score

* pandas error

* checkpoint

* score passing

* rounding complete - percentages still showing one decimal

* fixing tests

* fixing percentages

* updating comment

* int percentages! 🎉🎉

* forgot to pass back to df

* passing tests

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Jorge Escobar 2022-01-13 15:04:16 -05:00 committed by GitHub
commit d686bb856e
GPG key ID: 4AEE18F83AFDEB23
13 changed files with 232 additions and 133 deletions

@@ -2,6 +2,7 @@ import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger, download_file_from_url
from data_pipeline.score import field_names
logger = get_module_logger(__name__)
@@ -49,6 +50,20 @@ class CDCPlacesETL(ExtractTransformLoad):
            values=self.CDC_VALUE_FIELD_NAME,
        )
        # rename columns to be used in score
        rename_fields = {
            "Current asthma among adults aged >=18 years": field_names.ASTHMA_FIELD,
            "Coronary heart disease among adults aged >=18 years": field_names.HEART_DISEASE_FIELD,
            "Cancer (excluding skin cancer) among adults aged >=18 years": field_names.CANCER_FIELD,
            "Diagnosed diabetes among adults aged >=18 years": field_names.DIABETES_FIELD,
            "Physical health not good for >=14 days among adults aged >=18 years": field_names.PHYS_HEALTH_NOT_GOOD_FIELD,
        }
        self.df.rename(
            columns=rename_fields,
            inplace=True,
            errors="raise",
        )
        # Make the index (the census tract ID) a column, not the index.
        self.df.reset_index(inplace=True)
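
The hunk above renames the pivoted CDC measure columns with `errors="raise"`, so a measure missing from the source data fails loudly instead of silently leaving a column unrenamed. A minimal standalone sketch of that pattern follows. The column names `tract`, `measure`, `value`, the target names `asthma` and `diabetes`, and the helper `pivot_and_rename` are hypothetical stand-ins, not the pipeline's actual constants or methods.

```python
import pandas as pd

# Hypothetical target names standing in for the field_names constants.
RENAME_FIELDS = {
    "Current asthma among adults aged >=18 years": "asthma",
    "Diagnosed diabetes among adults aged >=18 years": "diabetes",
}

# Toy long-format input: one row per (tract, measure) pair.
long_df = pd.DataFrame(
    {
        "tract": ["01001020100", "01001020100", "01001020200", "01001020200"],
        "measure": list(RENAME_FIELDS) * 2,
        "value": [9.1, 11.2, 10.3, 12.4],
    }
)


def pivot_and_rename(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot measures into columns, rename them, and restore the tract column."""
    wide = df.pivot(index="tract", columns="measure", values="value")
    # errors="raise" makes pandas raise KeyError if any key in the mapping
    # is not an existing column, e.g. when a measure is absent upstream.
    wide = wide.rename(columns=RENAME_FIELDS, errors="raise")
    # Make the index (the tract ID) a regular column again.
    wide = wide.reset_index()
    wide.columns.name = None
    return wide


wide_df = pivot_and_rename(long_df)
print(wide_df)
```

If the input lacks one of the mapped measures, the `rename` call raises `KeyError` rather than returning a frame with a partially renamed set of columns, which is presumably why the commit opts for `errors="raise"`.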