Inf Retr Boston. 2008 Sep 12:12:487-503.
doi: 10.1007/s10791-008-9067-7.

Modeling Actions of PubMed Users with N-Gram Language Models

Jimmy Lin et al. Inf Retr Boston.

Abstract

Transaction logs from online search engines are valuable for two reasons: First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then be applied to improve retrieval systems. This article presents a study of logs from PubMed®, the public gateway to the MEDLINE® database of bibliographic records from the medical and biomedical primary literature. Unlike most previous studies, which focus on general Web search, our work examines user activities with a highly specialized search engine. We encode user actions as string sequences and model these sequences using n-gram language models. The models are evaluated in terms of perplexity and in a sequence prediction task. They help us better understand how PubMed users search for information and provide a basis for improving users' search experience.
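To make the modeling approach in the abstract concrete, the following is a minimal sketch of an n-gram model over action strings, with perplexity and next-action prediction. It is not the authors' implementation. The action codes ('Q', 'R', 'L'), the toy sessions, the padding symbols '^' and '$', and the add-one smoothing are all assumptions chosen for illustration; the abstract does not specify the encoding or the smoothing method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy sessions: each character stands for one user action.
# The codes and the data are illustrative assumptions, not from the paper.
sessions = ["QRQR", "QLLLR", "QQR", "QLLR"]

def pad(seq, n):
    """Add n-1 start symbols and one end symbol (assumed conventions)."""
    return "^" * (n - 1) + seq + "$"

def train_ngram(sequences, n):
    """Count, for every (n-1)-symbol context, how often each symbol follows."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in sequences:
        padded = pad(seq, n)
        vocab.update(padded[n - 1:])  # every symbol that can be predicted
        for i in range(n - 1, len(padded)):
            counts[padded[i - n + 1:i]][padded[i]] += 1
    return counts, sorted(vocab)

def prob(counts, vocab, context, symbol):
    """Add-one smoothed estimate of P(symbol | context)."""
    c = counts.get(context, Counter())
    return (c[symbol] + 1) / (sum(c.values()) + len(vocab))

def perplexity(counts, vocab, sequences, n):
    """2 ** (average negative log2 probability per predicted symbol)."""
    log_sum, total = 0.0, 0
    for seq in sequences:
        padded = pad(seq, n)
        for i in range(n - 1, len(padded)):
            log_sum += math.log2(prob(counts, vocab, padded[i - n + 1:i], padded[i]))
            total += 1
    return 2 ** (-log_sum / total)

def predict_next(counts, vocab, history, n):
    """Return the most probable next symbol (may be '$', the end marker)."""
    context = ("^" * (n - 1) + history)[-(n - 1):] if n > 1 else ""
    return max(vocab, key=lambda s: prob(counts, vocab, context, s))

counts, vocab = train_ngram(sessions, n=2)
print("perplexity:", perplexity(counts, vocab, sessions, n=2))
print("next after 'QL':", predict_next(counts, vocab, "QL", n=2))
```

In a real evaluation the perplexity would be computed on held-out sequences rather than on the training data, and prediction accuracy would be compared against a most-frequent-action baseline.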

Figures

Figure 1
Characteristics of episodes generated by applying different thresholds to segment sessions. Values above each filled square indicate the percentage of singleton episodes that consist of a single retrieve action.
Figure 2
Distribution of episode length, in terms of number of transactions (top) and duration (bottom). Duration is binned in 5-minute intervals (e.g., ‘5’ represents intervals between 20–25 minutes).
Figure 3
Perplexity of session and episode test data on different n-gram language models.
Figure 4
Accuracy of predicting next user action using different n-gram language models: session data on top, episode data on bottom. Solid line in each graph indicates baseline (most frequent class).
Figure 5
Relative likelihood of observing a particular action after a consecutive sequence of the same action. For example, the probability of ‘L’ followed by another ‘L’ is 12 times higher than expected by chance.
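The quantity in the Figure 5 caption can be sketched as a ratio of two estimated probabilities. The snippet below is only an illustration under stated assumptions: "expected by chance" is taken to mean the overall frequency of the action across all logged actions, the function name `relative_likelihood` is hypothetical, and the toy sessions and action codes are invented. The paper may define the baseline differently.

```python
def relative_likelihood(sequences, action, run_length=1):
    """Ratio of P(action | previous run_length actions were all `action`)
    to the overall frequency of `action` (the assumed chance baseline)."""
    total = sum(len(s) for s in sequences)
    base = sum(s.count(action) for s in sequences) / total
    run = action * run_length
    followed = same = 0
    for s in sequences:
        for i in range(run_length, len(s)):
            if s[i - run_length:i] == run:  # the preceding run matches
                followed += 1
                same += s[i] == action      # the run continues
    if followed == 0 or base == 0:
        return None  # undefined when the run or the action never occurs
    return (same / followed) / base

# Hypothetical toy sessions; one character per user action.
toy_sessions = ["QLLLR", "QRQR", "QLLR", "QQR"]
print(relative_likelihood(toy_sessions, "L", run_length=1))
```

A value above 1 means the action is more likely to repeat than its overall frequency would suggest; increasing `run_length` conditions on longer runs of the same action.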
