Modeling Actions of PubMed Users with N-Gram Language Models

Jimmy Lin¹, W John Wilbur

¹ The iSchool, College of Information Studies, University of Maryland, College Park, Maryland, USA, jimmylin@umd.edu.

Inf Retr Boston. 2008 Sep 12;12:487-503. doi: 10.1007/s10791-008-9067-7.

PMID: 19684883
PMCID: PMC2727615

Abstract

Transaction logs from online search engines are valuable for two reasons. First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then be applied to improve retrieval systems. This article presents a study of logs from PubMed®, the public gateway to the MEDLINE® database of bibliographic records from the medical and biomedical primary literature. Unlike most previous studies of general Web search, our work examines user activities with a highly specialized search engine. We encode user actions as string sequences and model these sequences using n-gram language models. The models are evaluated in terms of perplexity and in a sequence prediction task. They help us better understand how PubMed users search for information and provide a foundation for improving users' search experience.
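
To make the encoding idea concrete, here is a minimal Python sketch. The one-letter action codes are hypothetical (the abstract does not list the study's actual alphabet), and the probabilities are plain maximum-likelihood bigram estimates with '^' and '$' marking the start and end of each sequence; this is an illustration, not the authors' implementation.

```python
from collections import Counter

# Hypothetical action-to-letter mapping, for illustration only.
ACTION_CODES = {"query": "Q", "retrieve": "R", "next_page": "N", "link": "L"}


def encode(actions):
    """Turn a list of action names into a string such as 'QRRN'."""
    return "".join(ACTION_CODES[a] for a in actions)


def bigram_mle(sequences):
    """Maximum-likelihood estimates of P(next | prev) over action strings.

    '^' marks the start and '$' the end of each sequence.
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        padded = "^" + seq + "$"
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {pair: c / prev_counts[pair[0]] for pair, c in pair_counts.items()}


sessions = [encode(["query", "retrieve", "retrieve"]), encode(["query", "next_page"])]
probs = bigram_mle(sessions)
print(sessions)           # ['QRR', 'QN']
print(probs[("Q", "R")])  # 0.5
```

Longer contexts (trigrams and beyond) follow the same pattern, with the previous n−1 symbols as the conditioning context.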

Figures

**Figure 1**
Characteristics of episodes generated by applying different thresholds to segment sessions. Values above each filled square indicate the percentage of singleton episodes that consist of a single retrieve action.
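
The caption does not define the segmentation threshold. One plausible reading, assumed here rather than taken from the paper, is an inactivity threshold: a new episode starts whenever the gap between consecutive transactions exceeds it. The sketch below also reads the caption's percentage as the share of singleton episodes whose only action is a retrieve; the code 'R' for retrieve and the use of seconds are hypothetical choices.

```python
def segment(timestamps, actions, threshold):
    """Split one session into episodes at gaps longer than `threshold`.

    timestamps: non-decreasing times in seconds, one per action (assumed units).
    actions: single-letter action codes aligned with timestamps.
    Returns a list of episodes, each a string of action codes.
    """
    if not actions:
        return []
    episodes = [[actions[0]]]
    for prev_t, t, a in zip(timestamps, timestamps[1:], actions[1:]):
        if t - prev_t > threshold:
            episodes.append([a])      # gap too long: start a new episode
        else:
            episodes[-1].append(a)    # continue the current episode
    return ["".join(e) for e in episodes]


def singleton_retrieve_share(episodes, retrieve_code="R"):
    """Percentage of single-action episodes whose only action is a retrieve."""
    singletons = [e for e in episodes if len(e) == 1]
    if not singletons:
        return 0.0
    return 100.0 * sum(e == retrieve_code for e in singletons) / len(singletons)


times = [0, 30, 60, 1000, 1020, 5000]
codes = ["Q", "R", "R", "Q", "R", "R"]
eps = segment(times, codes, threshold=600)
print(eps)                            # ['QRR', 'QR', 'R']
print(singleton_retrieve_share(eps))  # 100.0
```

Sweeping `threshold` over several values and recording episode counts and singleton shares would give the kind of comparison the figure summarizes.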

**Figure 2**
Distribution of episode length, in terms of number of transactions (top) and duration (bottom). Duration is binned in 5-minute intervals (e.g., bin ‘5’ covers durations of 20–25 minutes).
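
The caption's example implies that bin k covers durations from 5(k−1) to 5k minutes. A small sketch of that labeling, assuming half-open intervals (the caption does not say which bin an exact multiple of five falls into); the function names are hypothetical.

```python
from collections import Counter


def duration_bin(minutes, width=5):
    """Label of the bin containing `minutes`.

    Bin k covers the half-open interval [width*(k-1), width*k),
    so with width=5, bin 5 covers 20-25 minutes.
    """
    if minutes < 0:
        raise ValueError("duration must be non-negative")
    return int(minutes // width) + 1


def histogram(durations, width=5):
    """Count episodes per duration bin."""
    return Counter(duration_bin(d, width) for d in durations)


print(duration_bin(22.5))                          # 5
print(sorted(histogram([3, 7, 22.5, 24]).items()))  # [(1, 1), (2, 1), (5, 2)]
```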

**Figure 3**
Perplexity of session and episode test data on different n-gram language models.
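
Perplexity is the exponentiated average negative log-probability that a model assigns to each predicted symbol in held-out data; lower values mean the model is less "surprised" by the test sequences. The sketch below computes it for a bigram model with add-one (Laplace) smoothing over a hypothetical four-letter action alphabet. The smoothing method is an assumption for the example; the abstract does not state which one the authors used.

```python
import math
from collections import Counter


def train_bigram(sequences, alphabet):
    """Add-one smoothed bigram model; returns a function p(prev, nxt)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        padded = "^" + seq + "$"
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    vocab_size = len(alphabet) + 1  # possible next symbols: actions plus end marker '$'

    def p(prev, nxt):
        return (pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + vocab_size)

    return p


def perplexity(p, sequences):
    """exp of the average negative log-probability per predicted symbol."""
    total_log_prob = 0.0
    n_predictions = 0
    for seq in sequences:
        padded = "^" + seq + "$"
        for prev, nxt in zip(padded, padded[1:]):
            total_log_prob += math.log(p(prev, nxt))
            n_predictions += 1
    return math.exp(-total_log_prob / n_predictions)


alphabet = "QRNL"  # hypothetical action codes
model = train_bigram(["QRR", "QRN", "QL"], alphabet)
print(perplexity(model, ["QRR"]))
```

As a sanity check, a model that assigns probability 1/5 to every symbol has perplexity exactly 5 on any sequence over this alphabet.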

**Figure 4**
Accuracy of predicting the next user action using different n-gram language models: session data on top, episode data on bottom. The solid line in each graph indicates the baseline (always predicting the most frequent class).
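
A minimal sketch of the prediction task, assuming the predictor outputs the action most often seen after the preceding n−1 actions in training and falls back to the globally most frequent action for unseen contexts. The backoff rule and all names here are choices made for the example, not details taken from the paper.

```python
from collections import Counter, defaultdict


def train_predictor(sequences, n):
    """Count which action follows each context of (up to) n-1 previous actions."""
    successors = defaultdict(Counter)
    overall = Counter()
    for seq in sequences:
        for i in range(len(seq)):
            context = seq[max(0, i - (n - 1)):i]  # shorter at the start of a sequence
            successors[context][seq[i]] += 1
            overall[seq[i]] += 1
    fallback = overall.most_common(1)[0][0]
    return successors, fallback


def accuracy(successors, fallback, sequences, n):
    """Fraction of positions where the predicted next action is correct."""
    correct = total = 0
    for seq in sequences:
        for i in range(len(seq)):
            context = seq[max(0, i - (n - 1)):i]
            counts = successors.get(context)
            guess = counts.most_common(1)[0][0] if counts else fallback
            correct += guess == seq[i]
            total += 1
    return correct / total


train = ["QRRR", "QRRN", "QRL"]
test = ["QRRR"]
for n in (1, 2):
    model, fallback = train_predictor(train, n)
    print(n, accuracy(model, fallback, test, n))
```

With n = 1 the context is always empty, so the predictor reduces to the most-frequent-class baseline; larger n lets the model exploit the preceding actions.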

**Figure 5**
Relative likelihood of observing a particular action after a consecutive sequence of the same action. For example, the probability of ‘L’ followed by another ‘L’ is 12 times higher than expected by chance.
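
One way to compute such a ratio, assumed here for illustration: divide the probability that an action follows a run of at least r copies of itself by that action's overall relative frequency, treating the latter as the "chance" rate. The paper may define the run condition (for example, exactly r) or the chance rate differently; `repeat_ratio` is a hypothetical name.

```python
from collections import Counter


def repeat_ratio(sequences, action, run_length):
    """P(next == action | previous `run_length` actions all == action) / P(action).

    The denominator is the overall relative frequency of `action`
    (an assumed definition of 'expected by chance').
    Returns NaN when the ratio is undefined.
    """
    counts = Counter()
    total = 0
    followed = 0   # positions preceded by the run
    repeated = 0   # of those, positions where the run continues
    for seq in sequences:
        counts.update(seq)
        total += len(seq)
        for i in range(run_length, len(seq)):
            if all(c == action for c in seq[i - run_length:i]):
                followed += 1
                repeated += seq[i] == action
    if followed == 0 or counts[action] == 0:
        return float("nan")
    return (repeated / followed) / (counts[action] / total)


print(repeat_ratio(["LLLQ", "QRRL"], "L", 1))
```

Here 'L' follows 'L' in 2 of 3 cases while 'L' makes up half of all actions, giving a ratio of 4/3.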

Cited by

Understanding PubMed user search behavior through log analysis.
Islamaj Dogan R, Murray GC, Névéol A, Lu Z. Database (Oxford). 2009;2009:bap018. doi: 10.1093/database/bap018. Epub 2009 Nov 27. PMID: 20157491. Free PMC article.
How user intelligence is improving PubMed.
Fiorini N, Leaman R, Lipman DJ, Lu Z. Nat Biotechnol. 2018 Oct 1. doi: 10.1038/nbt.4267. Online ahead of print. PMID: 30272675.
Harnessing PubMed User Query Logs for Post Hoc Explanations of Recommended Similar Articles.
Shin A, Anibal J, Jin Q, Lu Z. ArXiv [Preprint]. 2024 Feb 5:arXiv:2402.03484v1. PMID: 38903741. Free PMC article. Preprint.
Effects of individual health topic familiarity on activity patterns during health information searches.
Puspitasari I, Moriyama K, Fukui K, Numao M. JMIR Med Inform. 2015 Mar 17;3(1):e16. doi: 10.2196/medinform.3803. PMID: 25783222. Free PMC article.
Studying PubMed usages in the field for complex problem solving: Implications for tool design.
Mirel B, Song J, Tonks JS, Meng F, Xuan W, Ameziane R. J Am Soc Inf Sci Technol. 2013 May 1;64(5):874-92. doi: 10.1002/asi.22796. PMID: 24376375. Free PMC article.

Grants and funding

NIH0012203604/ImNIH/Intramural NIH HHS/United States
