Linking literature to data in a meaningful way is a key requirement in life science and biomedical research. The deluge of data and literature makes it difficult for scientists to find the articles and data relevant to their research. Automatic services can help scientists in this phase of their work by providing tools that identify the biological entities referenced in an article and the organism that is the main focus of the research it describes.
LitLink is an information system, currently at the prototype stage, that aggregates and supports the comparison of three types of links among Open Access publications available from Europe PMC, entries from the European Nucleotide Archive (ENA), and the NCBI Taxonomy: organism names mined from the full-text articles, citations from articles to ENA entries (i.e., ENA entries cited by articles), and submission links (i.e., links from ENA entries to articles).
The approach implemented by LitLink makes it possible to:
- automatically guess why an article cites a database entry
- automatically detect the focus organisms of an article
- investigate whether there are citation patterns that can be used to cluster articles and assign them to categories such as "speciation articles", "biodiversity articles", "articles on homologues", and "experimental articles"
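
To make the comparison of the three link types more concrete, the following is a minimal sketch, not the LitLink implementation. It assumes a hypothetical record (`ArticleLinks`) that holds, for one article, the taxon identifiers mined from the text, the ENA accessions it cites, and the ENA accessions submitted with a reference to it. The names `focus_organisms`, `compare_links`, and the `min_share` threshold are illustrative assumptions, and the identifiers in the usage example are made-up placeholders.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ArticleLinks:
    """Hypothetical container for the three link types of one article."""
    article_id: str
    # One taxon identifier per organism mention mined from the full text.
    mined_taxa: list[str] = field(default_factory=list)
    # ENA accessions cited in the article (citation links).
    cited_entries: set[str] = field(default_factory=set)
    # ENA accessions whose record references the article (submission links).
    submitted_entries: set[str] = field(default_factory=set)


def focus_organisms(links: ArticleLinks, min_share: float = 0.5) -> list[str]:
    """Return taxa whose share of all mentions is at least min_share.

    Assumption: mention frequency is used here as a simple proxy for
    the focus organism; the real system may use a different criterion.
    """
    counts = Counter(links.mined_taxa)
    total = sum(counts.values())
    if total == 0:
        return []
    return [taxon for taxon, n in counts.most_common() if n / total >= min_share]


def compare_links(links: ArticleLinks) -> dict[str, set[str]]:
    """Split ENA accessions by the kind of link that connects them to the article."""
    return {
        "cited_and_submitted": links.cited_entries & links.submitted_entries,
        "cited_only": links.cited_entries - links.submitted_entries,
        "submitted_only": links.submitted_entries - links.cited_entries,
    }


# Usage with placeholder identifiers (not real data).
example = ArticleLinks(
    article_id="ARTICLE-1",
    mined_taxa=["9606", "9606", "9606", "10090"],
    cited_entries={"ACC0001", "ACC0002"},
    submitted_entries={"ACC0002", "ACC0003"},
)
print(focus_organisms(example))
print(compare_links(example))
```

In this sketch, an entry that appears only under `cited_only` was referenced by the article without having been submitted alongside it. That is the kind of signal that could feed a guess about why the entry was cited, under the assumptions stated above.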
The project has been realised in collaboration with the Literature Group of the EMBL-European Bioinformatics Institute (Wellcome Trust Genome Campus, Hinxton, Cambridge, UK).