J Am Med Inform Assoc. 2012 May-Jun;19(3):473-8.
doi: 10.1136/amiajnl-2011-000325. Epub 2011 Sep 13.

Predicting biomedical document access as a function of past use

J Caleb Goodwin et al. J Am Med Inform Assoc. 2012 May-Jun.

Abstract

Objective: To determine whether past access to biomedical documents can predict future document access.

Materials and methods: The authors used 394 days of query logs (August 1, 2009 to August 29, 2010) from PubMed users in the Texas Medical Center, the largest medical center in the world. They evaluated two document access models based on the work of Anderson and Schooler. The first is based only on how frequently a document was accessed. The second is based on both the frequency and the recency of access.

Results: The model based only on frequency of past access was highly correlated with the empirical data (R²=0.932), whereas the model based on frequency and recency had a much lower correlation (R²=0.668).

Discussion: The frequency-only model accurately predicted whether a document would be accessed based on past use. Modeling accesses as a function of frequency requires storing only the number of accesses and the creation date for each document. The model therefore has low storage overhead and is computationally efficient, making it scalable to large corpora such as MEDLINE.
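
As a rough illustration of why a frequency-only model is cheap to store and evaluate, the sketch below keeps just an access count and a creation date per document and maps the resulting access rate to a probability through a power law fitted in log-log space. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `DocumentStats`, `fit_power_law`, and `access_probability`, the choice of accesses per day as the frequency measure, and the power-law form `odds = a * rate**b` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class DocumentStats:
    """The only per-document state this sketch stores (hypothetical type)."""
    access_count: int
    created: date


def fit_power_law(frequencies, odds):
    """Least-squares fit of log(odds) = log(a) + b * log(frequency).

    Returns (a, b). All inputs must be strictly positive.
    """
    xs = [math.log(f) for f in frequencies]
    ys = [math.log(o) for o in odds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def access_probability(doc, today, a, b):
    """Estimate the probability of a future access from past frequency alone.

    Assumption: frequency is measured as accesses per day since creation.
    """
    if doc.access_count == 0:
        return 0.0
    age_days = max((today - doc.created).days, 1)
    rate = doc.access_count / age_days
    predicted_odds = a * rate ** b
    return predicted_odds / (1.0 + predicted_odds)


# Synthetic calibration data for illustration only (not from the study).
example_rates = [1, 2, 4, 8]
example_odds = [0.01 * r ** 0.5 for r in example_rates]
a, b = fit_power_law(example_rates, example_odds)

doc = DocumentStats(access_count=4, created=date(2010, 1, 1))
p = access_probability(doc, date(2010, 1, 5), a, b)
```

Because each document contributes only two stored values and one constant-time evaluation, a scheme of this shape scales linearly with corpus size, which is the property the discussion highlights.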

Conclusion: It is feasible to accurately model the probability of a document being accessed in the future based on past accesses.

Conflict of interest statement

Competing interests: None.

Figures

Figure 1. Log–log abstract views versus number of documents (all views).
Figure 2. Log–log number of document downloads versus number of documents.
Figure 3. Log–log plot of desirability as a function of Frequency of Access.
Figure 4. Log–log plot of desirability as a function of Recency of Access.
Figure 5. Predicted access using Frequency of Access and Recency of Access.
Figure 6. Predicted access using Frequency of Access.
