j40-cejst-2/data-roadmap/data_roadmap/data_set_description_field_descriptions.yaml

# There is no method for adding field descriptions to `yamale` schemas.
# Therefore, we've created a dictionary here of fields and their descriptions.
name: A short name of the data set.
source: The URL pointing to the data set itself or to more information about the
  data set.
relevance_to_environmental_justice: It's useful to spell out why this data is
  relevant to EJ issues and/or can be used to identify EJ communities.
spatial_resolution: The dev team needs to know whether the resolution is granular enough to be useful.
public_status: Whether a dataset has already gone through a public release process
  (like Census data) or may need a lengthy review process (like Medicaid data).
sponsor: Whether there's a federal agency or non-governmental agency that is working
  to provide and maintain this data.
subjective_rating_of_data_quality: Sometimes we don't have statistics on data
  quality, but we know whether it is likely to be accurate. How much has it been
  vetted by an agency? Is this the de facto data set for the topic?
estimated_margin_of_error: Estimated margin of error on measurement, if known.
  Narrower geographic measures often have a higher margin of error because each
  measurement draws on a smaller sample.
known_data_quality_issues: It can be helpful to write out known problems.
geographic_coverage_percent: We want to think about data that is comprehensive across
  America.
geographic_coverage_description: A verbal description of geographic coverage.
data_formats: Developers need to know what formats the data is available in.
last_updated_date: When was the data last updated / refreshed? (In format YYYY-MM-DD.
  If exact date is not known, use YYYY-01-01.)
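# Illustrative only (hypothetical dates, not taken from any real data set):
# a data set known to have been refreshed on 15 June 2021 would be recorded as
# 2021-06-15; if only the year 2021 is known, it would be recorded as 2021-01-01.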
frequency_of_updates: How often is this data updated? Is it updated on a reliable
  cadence?
documentation: Link to docs. Also, is the documentation good enough? Can we get the
  info we need?
data_can_go_in_cloud: Some datasets cannot legally go in the cloud.
discussion: Review of other topics, such as
  peer review (overview of or links out to peer review done on this dataset),
  where and how data is available (e.g., Geoplatform.gov? Is it available from multiple
  sources?),
  risk assessment of the data (e.g., a vendor-processed version of the dataset might not
  be open or good enough),
  legal considerations (legal disclaimers, assumption of risk, proprietary?),
  and accreditation (is this source accredited?).