COMPUTATIONAL BIOLOGY BRANCH
Biomedical information retrieval and text analysis
Understanding PubMed® user search behavior through log analysis
Abstract: An investigation of user search behaviors was conducted through the analysis of one month of PubMed logs. Each step of users' interactions with PubMed during a biomedical search process is characterized in detail with evidence from PubMed logs. Despite sharing many features in common with general Web searches, biomedical information searches have unique characteristics that were evidenced in this study. An analysis of these characteristics plays a critical role in identifying users' information needs and their search habits and, in turn, provides useful insight to improve biomedical information retrieval through PubMed.
Click here for full text.
Authors: Rezarta Islamaj Dogan, G. Craig Murray, Aurélie Névéol
Please use this as a reference when citing this work: Database (2009) Vol. 2009, bap018; doi:10.1093/database/bap018
Contacts: luzh@ncbi.nlm.nih.gov; islamaj@ncbi.nlm.nih.gov
Instructions for data download
This file contains the anonymized PubMed log data for March 01, 2008.
Click here to see a small sample (10,000 lines) of the anonymized and transformed log data.
Each line contains tab-delimited fields as follows:
1. Scrambled user session ID
2. Timestamp, given as the number of seconds since midnight
3. User action ("query", "abstract" or "fulltext")
4. Number of tokens in the query if user action is "query", "-" otherwise
5. Number of returned citations if user action is "query", "-" otherwise
6. Ordinal position of the clicked citation if user action is "abstract"
The complete data set for March 2008 can be downloaded here (900M).
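As a rough illustration of the six-field layout described above, the following Python sketch parses one tab-delimited line into a typed record. The class name `LogRecord`, the function names, and the sample line are hypothetical and not part of the distributed data. The sketch also assumes that a missing or "-" value in fields 4 to 6 means "not applicable", which the description states only for fields 4 and 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    """One line of the anonymized log (field names are illustrative)."""
    session_id: str                 # field 1: scrambled user session ID
    seconds_from_midnight: int      # field 2: timestamp
    action: str                     # field 3: "query", "abstract" or "fulltext"
    query_tokens: Optional[int]     # field 4: only for "query"
    result_count: Optional[int]     # field 5: only for "query"
    click_position: Optional[int]   # field 6: only for "abstract"


def _to_int(field: str) -> Optional[int]:
    """Convert a numeric field, treating "-" or an empty string as absent."""
    field = field.strip()
    return int(field) if field and field != "-" else None


def parse_line(line: str) -> LogRecord:
    """Parse one tab-delimited log line into a LogRecord."""
    parts = line.rstrip("\n").split("\t")
    # Assumption: trailing fields may be omitted; pad them with "-".
    parts += ["-"] * (6 - len(parts))
    return LogRecord(
        session_id=parts[0],
        seconds_from_midnight=int(parts[1]),
        action=parts[2],
        query_tokens=_to_int(parts[3]),
        result_count=_to_int(parts[4]),
        click_position=_to_int(parts[5]),
    )


if __name__ == "__main__":
    # Made-up example line, not taken from the real log.
    print(parse_line("a1b2c3\t3600\tquery\t3\t1500\t-"))
```

To process the full file, one could open it in text mode and call `parse_line` on each line in turn rather than loading everything into memory.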
Results on comparing log data of March 08 vs. February 09
To investigate temporal factors and other ephemeral trends,
we analyzed the same kind of log data for February 2009 and compared
the results with those for March 2008.
Comparing user actions (note that March 2008 has 31 days, whereas February 2009 has only 28):
| | March 08 | February 09 |
|---|---|---|
| Queries | 58,026,098 | 58,666,967 |
| Abstract View | 67,093,786 | 65,049,452 |
| Fulltext View | 27,581,850 | 23,507,979 |
| Avg Queries /Day | 1,871,815 | 2,095,249 |
| Avg Abstract View /Day | 2,164,319 | 2,323,195 |
| Avg Fulltext View /Day | 889,740 | 839,571 |
| Total Number of User Actions | 152,701,734 | 147,224,398 |
| Total Number of User Sessions | 23,017,461 | 28,011,966 |
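Because the two months differ in length, raw monthly totals are not directly comparable. The sketch below, using hypothetical helper names, shows the normalization: dividing a monthly total by the number of days, then comparing per-day rates. For February 2009, straight division reproduces the table's average; for March 2008, it gives values a few counts lower than the published averages, so the table remains the reference.

```python
def per_day(total: int, days: int) -> float:
    """Average number of events per day over a month."""
    return total / days


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # February 2009 queries, from the table: 58,666,967 over 28 days.
    print(round(per_day(58_666_967, 28)))  # 2095249

    # Monthly query totals barely change between the two months...
    print(f"{relative_change(58_026_098, 58_666_967):.1f}%")  # 1.1%

    # ...but the per-day averages listed in the table differ much more.
    print(f"{relative_change(1_871_815, 2_095_249):.1f}%")  # 11.9%
```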
Comparing query statistics:
Here we show the proportion of queries according to
the number of tokens (white-space separated) for both sets of data.
As can be seen from the figure below there are no major differences,
and both results suggest that PubMed queries are short.
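A distribution of this kind could be recomputed from the downloadable log roughly as follows. This is a minimal sketch, not the authors' analysis code: the function name `token_proportions` is hypothetical, and it assumes the field order given in the download instructions (action in field 3, token count in field 4).

```python
from collections import Counter
from typing import Dict, Iterable


def token_proportions(lines: Iterable[str]) -> Dict[int, float]:
    """Proportion of queries for each query length (in tokens)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Only "query" actions carry a token count; others hold "-".
        if len(fields) >= 4 and fields[2] == "query" and fields[3].isdigit():
            counts[int(fields[3])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: c / total for n, c in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up lines for illustration only.
    sample = [
        "s1\t10\tquery\t2\t100\t-",
        "s1\t15\tabstract\t-\t-\t1",
        "s2\t20\tquery\t3\t40\t-",
        "s3\t30\tquery\t2\t7\t-",
    ]
    for n_tokens, share in token_proportions(sample).items():
        print(n_tokens, f"{share:.2f}")
```

The same counting pattern applies to click positions by filtering on "abstract" actions and reading field 6 instead.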
Comparing click positions:
Here we show the proportion of user clicks according to
the position of returned results for both sets of data.
Comparing search result size:
Here we show the proportion of queries according to
the size of returned results (in log scale) for both sets of data.
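One way to group result sizes on a log scale is to bin each query by the order of magnitude of its returned citation count. The sketch below assumes base-10 bins and a separate bin for zero-result queries; neither choice is stated in the text, and the function name `result_size_bins` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def result_size_bins(lines: Iterable[str]) -> Dict[str, float]:
    """Proportion of queries per order of magnitude of result count."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and fields[2] == "query" and fields[4].isdigit():
            n = int(fields[4])
            # Digit count minus one equals floor(log10(n)) for n >= 1.
            key = "0" if n == 0 else f"10^{len(str(n)) - 1}"
            counts[key] += 1
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


if __name__ == "__main__":
    # Made-up lines for illustration only.
    sample = [
        "s1\t10\tquery\t2\t0\t-",
        "s2\t20\tquery\t3\t45\t-",
        "s3\t30\tquery\t1\t99\t-",
        "s4\t40\tquery\t2\t1500\t-",
    ]
    print(result_size_bins(sample))
```

Using the digit count avoids floating-point rounding at exact powers of ten, which a direct `math.log10` call followed by `floor` can occasionally get wrong.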