This ended up being a pretty large task. Here's what this PR does:
1. Pulls in Vincent's data from island areas into the score ETL. This is from the 2010 decennial census, the last census of any kind in the island areas.
2. Grabs a few new fields from 2010 island areas decennial census.
3. Calculates area median income for island areas.
4. Stops using EJSCREEN as the source of our high school education data and directly pulls that from census (this was related to this project so I went ahead and fixed it).
5. Grabs a bunch of data from the 2010 ACS in the states/Puerto Rico/DC, so that we can create percentiles comparing apples-to-apples (ish) from 2010 island areas decennial census data to 2010 ACS data. This required creating a new class because all the ACS fields are different between 2010 and 2019, so it wasn't as simple as looping over a year parameter.
6. Creates a combined population field of island areas and mainland so we can use those stats in our comparison tool, and updates the comparison tool accordingly.
* Adds dev dependencies to requirements.txt and re-runs black on codebase
* Adds test and code for national risk index etl, still in progress
* Removes test_data from .gitignore
* Adds test data to nation_risk_index tests
* Creates tests and ETL class for NRI data
* Adds tests for load() and transform() methods of NationalRiskIndexETL
* Updates README.md with info about the NRI dataset
* Adds to dos
* Moves tests and test data into a tests/ dir in national_risk_index
* Moves tmp_dir for tests into data/tmp/tests/
* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests
can only use fixtures specified in conftests within the same package
* Fixes issue with df.equals() in test_transform()
* Files reformatted by black
* Commit changes to other files after re-running black
* Fixes unused import that caused lint checks to fail
* Moves tests/ directory to app root for data_pipeline
* Adds new methods to ExtractTransformLoad base class:
- __init__() Initializes class attributes
- _get_census_fips_codes() Loads a dataframe with the fips codes for
census block group and tract
- validate_init() Checks that the class was initialized correctly
- validate_output() Checks that the output was loaded correctly
* Adds test for ExtractTransformLoad.__init__() and base.py
* Fixes failing flake8 test
* Changes geo_col to geoid_col and changes is_dataset to is_census in yaml
* Adds test for validate_output()
* Adds remaining tests
* Removes is_dataset from init method
* Makes CENSUS_CSV a class attribute instead of a class global:
This ensures that CENSUS_CSV is only set when the ETL class is for a
non-census dataset and removes the need to overwrite the value in
mock_etl fixture
* Re-formats files with black and fixes broken tox tests
* Fixes#341 -
As a J40 developer, I want to write Unit Tests for the ETL files,
so that tests are run on each commit
* Location bug
* Adding Load tests
* Fixing XLSX filename
* Adding downloadable zip test
* updating pickle
* Fixing pylint warnings
* Updte readme to correct some typos and reorganize test content structure
* Removing unused schemas file, adding details to readme around pickles, per PR feedback
* Update test to pass with Score D added to score file; update path in readme
* fix requirements.txt after merge
* fix poetry.lock after merge
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* Adds dev dependencies to requirements.txt and re-runs black on codebase
* Adds test and code for national risk index etl, still in progress
* Removes test_data from .gitignore
* Adds test data to nation_risk_index tests
* Creates tests and ETL class for NRI data
* Adds tests for load() and transform() methods of NationalRiskIndexETL
* Updates README.md with info about the NRI dataset
* Adds to dos
* Moves tests and test data into a tests/ dir in national_risk_index
* Moves tmp_dir for tests into data/tmp/tests/
* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests
can only use fixtures specified in conftests within the same package
* Fixes issue with df.equals() in test_transform()
* Files reformatted by black
* Commit changes to other files after re-running black
* Fixes unused import that caused lint checks to fail
* Moves tests/ directory to app root for data_pipeline
* Fixes#303 : adding downloadable zip archive logic
* linter recommendations
* Pushes data directory to AWS. We'll want to move to use AWS for this ASAP, but this works for now
* updating pattern
* Fixes#456 - Our data directory should adopt standard python package structure
* a few missed references
* updating readme
* updating requirements
* Running Black
* Fixes for flake8
* updating pylint
* Adds flake8, pylint, liccheck, flake8 to dependencies for data-pipeline
* Sets up and runs black autoformatting
* Adds flake8 to tox linting
* Fixes flake8 error F541 f string missing placeholders
* Fixes flake8 E501 line too long
* Fixes flake8 F401 imported but not used
* Adds pylint to tox and disables the following pylint errors:
- C0114: module docstrings
- R0201: method could have been a function
- R0903: too few public methods
- C0103: name case styling
- W0511: fix me
- W1203: f-string interpolation in logging
* Adds utils.py to tox.ini linting, runs black on utils.py
* Fixes import related pylint errors: C0411 and C0412
* Fixes or ignores remaining pylint errors (for discussion later)
* Adds safety and liccheck to tox.ini
* Adds tox as a dev dependency to data/data-pipeline/pyproject.toml: Also updates poetry.lock and requirements.txt
* Adds tox.ini to test build of data/data-pipeline
* Sets up GitHub actions workflow for data/ directory
* Tries to get Data Checks GitHub action to run
* Fixes error with GitHub action
* Migrates data/data-roadmap from setuptools to poetry
* Sets up tox file for data/data-roadmap
* Adds github action for data/data-roadmap
* Fixes syntax error in data-checks.yml
* Second attempt at fixing data-checks.yml
* Export poetry requirements to requirements.txt
* Revert "Migrates data/data-roadmap from setuptools to poetry"
This reverts commit e8367652d43c1c9beee500f792c8f41e1c1fc462.
* Removes pyproject.toml and reverts requirements.txt as well
* initial checkin
* gitignore and docker-compose update
* readme update and error on hud
* encoding issue
* one more small README change
* data roadmap re-strcuture
* pyproject sort
* small update to score output folders
* checkpoint
* couple of last fixes