pmparser makes it easy to create and maintain a relational database of data from PubMed/MEDLINE. It can download the publicly available XML files, parse them, incorporate PubMed’s regular updates, and combine the data with the NIH Open Citation Collection. PMDB, our implementation of the database, is available to download on Zenodo. For a detailed description of pmparser and PMDB, see the preprint.

Installation

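If you use RStudio, go to Tools -> Global Options… -> Packages -> Add… (under Secondary repositories), then enter the repository URL (the same one used in the alternative command below): https://hugheylab.github.io/drat/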

You only have to do this once. Then you can install or update the package by entering:

# Install the current GitHub versions of withr and xml2.
if (!requireNamespace('devtools', quietly = TRUE))
  install.packages('devtools')
devtools::install_github(c('r-lib/withr', 'r-lib/xml2'))

# Install pmparser from the repositories configured above.
if (!requireNamespace('BiocManager', quietly = TRUE))
  install.packages('BiocManager')
BiocManager::install('pmparser')
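
If you do not use RStudio, the secondary repository can instead be registered from the R console. The following is a minimal sketch, assuming the repository URL is the one passed as site_repository in the alternative command; the label 'hugheylab' is an arbitrary name chosen for illustration, not something pmparser requires.

```r
# Append the drat repository to the repositories R already knows about.
# 'hugheylab' is only a label for this sketch; any name would work.
repos = getOption('repos')
repos['hugheylab'] = 'https://hugheylab.github.io/drat/'
options(repos = repos)

# Show the repositories now in effect for this session.
print(getOption('repos'))
```

The setting lasts only for the current session. Placing the same lines in your ~/.Rprofile would make it persistent, which mirrors the one-time RStudio setup.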

Alternatively, you can install or update the package by entering:

# Install the current GitHub versions of withr and xml2.
if (!requireNamespace('devtools', quietly = TRUE))
  install.packages('devtools')
devtools::install_github(c('r-lib/withr', 'r-lib/xml2'))

# Install pmparser, pointing BiocManager directly at the drat repository.
if (!requireNamespace('BiocManager', quietly = TRUE))
  install.packages('BiocManager')
BiocManager::install('pmparser', site_repository = 'https://hugheylab.github.io/drat/')

There’s also a Docker image with all dependencies installed:

docker pull hugheylab/hugheyverse

Usage

See the examples and detailed guidance in the reference documentation.
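
To find that documentation from within R, one option is to list what the installed package exports and open its help index. This is a minimal sketch using only base R; listExports is a hypothetical helper written for this example and is not part of pmparser.

```r
# Hypothetical helper (not part of pmparser): return the sorted names
# exported by a package, or an empty vector if it is not installed.
listExports = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message(sprintf("Package '%s' is not installed; see Installation.", pkg))
    return(character(0))
  }
  sort(getNamespaceExports(pkg))
}

# Each exported name has a help page reachable with ?name.
print(listExports('pmparser'))

# If pmparser is installed, show its version and open its help index.
if (requireNamespace('pmparser', quietly = TRUE)) {
  print(packageVersion('pmparser'))
  help(package = 'pmparser')
}
```

Because every call is guarded by requireNamespace, the sketch runs without error even before pmparser is installed; in that case it prints a message and an empty result.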