## -*- mode: html; coding: utf-8; -*- ## This file is part of Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
This method is an extension of the classical citation count ranking method. The difference is that it takes into consideration the "age" (publication date) of each citation by introducing a time decay factor. The time decay parameter can have values between 0 and 1, where a value of 0 signifies that we never "forget" any citation, and a value of 1 signifies that we "forget" all the citations that were acquired in the past years, and we only take into account the ones from the present year. Algorithm: 1. Read the citation data from the database or from a file that has the following format: x[tab]y where the paper with recid x cites the paper with recid y. The citation data is stored in a map key:value, where the 'key' is a record id and the 'value' is the list of publications that cite publication 'key'. 2. Read the publication dates for each paper from the database or from a file that has the following format: x[tab]y, where x is a recid and y is the publication year. There are several possibilities for retrieving the dates from the database: i) using the 260__c MARC tag that contains only the publication year (this is the option that we are using); ii) using the 269__c MARC tag that contains the complete publication date. For the papers that do not have a publication date, we consider the date of insertion in the database (961__x tag). If neither the publication date nor the insertion date are available, we consider an average date (computed with the existing dates). 3. Read the time_decay factor from the configuration file. 4. Calculate the weight of each citation based on its age and time_decay. 5. Calculate the final weight of each publication by summing up the weights of its citations. In case of equality, the order of the publications is given by the publication date (newest papers first). 6. Write the ranks to the database and to a file. The name of the file and the number of ranks that should be outputted can be set into the configuration file. If the name of the file is not set, then the ranks are only written in the database.