4 new bulk lists from Gary Price @ InfoDocket (#37)

* Update README.md

4 new bulk lists from Gary Price @ Infodocket.

* Add files via upload

4 new bulk lists from Gary Price.

* Update README.md

bulk list from LiL of data.gov catalog records

* Add files via upload

bulk list from Harvard LiL
James R. Jacobs 2025-02-03 09:52:45 -08:00 committed by GitHub
parent 4ba7aa4008
commit 5f0ab60be1
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 18027 additions and 0 deletions

Binary file not shown.

Binary file not shown.

Binary file not shown.

@@ -28,6 +28,10 @@ Seeds supplied by Dorothy Bower of the U.S. Government Publishing Office:
* PURL_server_domains_20240214.csv - report of all target domains from the PURL server; some determined to be out of scope were not included in the Nomination Tool.
* PURL_server_domains_20240214_non_gov_mil.csv - non .gov/.mil seeds from the PURL_server_domains_20240214.csv list that were determined to be in scope by Mark Phillips of UNT.
### Harvard Law School's Library Innovation Lab
* data_20250130_catalog_urls_empty-harvard-LiL.txt. List of URLs of data.gov metadata records that do NOT include links to data files but ONLY have links to federal agency landing pages. LiL collected all of the records that included data files.
### infoDOCKET seeds
Seed lists produced by Gary Price, editor of infoDOCKET:
@@ -51,6 +55,10 @@ Seed lists produced by Gary Price, editor of infoDOCKET:
* Diversity-DEI-20250119.xlsx. 2199 PDFs (with a few exceptions) from several agencies. The focus of these docs, DEI topics and issues.
* pclob-20250122.xlsx. 600 URLs (PDF and HTML) from the U.S. Privacy and Civil Liberties Oversight Board.
* MSPB-20250128.xlsx. 844 URLs (HTML and PDF) from the Merit Systems Protection Board.
* USDA_ClimateChange-20250201.xlsx. 1633 seeds from USDA. Topic: Climate Change. Includes approx 1000 urls (HTML and PDF) from Climatehubs.usda.gov.
* EPA 2024-PDF-20250201.xlsx 2300+ EPA seeds. Most PDFs from 2024-present.
* GENDER_ID-20250201.xlsx. 3553 lines with PDFs (from various agencies and some .mil domains) that contain the phrase "gender identity".
* DEI_FAA-20240201.xlsx 758 lines of PDFs from FCC.gov. Terms: DEI and related terms.
### Internet Archive seeds
Seeds supplied by Antoine McGrath of Internet Archive:

Binary file not shown.

File diff suppressed because it is too large.