diff --git a/TODO b/TODO index 8c1983cdb..b00b1185a 100644 --- a/TODO +++ b/TODO @@ -1,489 +1,554 @@ ;;; -*- mode: outline; coding: utf-8; outline-regexp: "[*\f]+"; -*- ;;; ;;; CDSware TODO and WISH list ;;; ========================== ;;; $Id$ ;;; ;;; ;;; This TODO and WISH list of the CDSware project is formatted in a ;;; way suitable for editing with Emacs outline mode; e.g. use C-c C-t ;;; to see only headings and hide the text, C-c C-a to make everything ;;; visible again, C-c C-d to hide one item, C-c C-s to show one item, ;;; etc. See Emacs help for more. * BibConvert -** BibConvert: MEDLINE, BibTeX examples +** BibConvert-001: MEDLINE, BibTeX examples Received: Wed Dec 17 16:20:35 2003 Progress: BibTeX example done, MEDLINE pending * BibEdit -** BibEdit: text or GUI tool +** BibEdit-001: text or GUI tool Wed Nov 3 11:40:16 2004 * BibFormat -** BibFormat(?): foresee electronic issue publication +** BibFormat-001: foresee electronic issue publication Wed Nov 3 11:42:46 2004 Introduce a possibility to easily publish issues of electronic journals, like Weekly Bulletin. Access control needed. -** BibFormat: hb_fly for CDSware distro +** BibFormat-002: hb_fly for CDSware distro Wed Nov 3 11:39:01 2004 -** BibFormat: make hd and other detailed formats on the fly for CDSware too +** BibFormat-003: make hd and other detailed formats on the fly for CDSware too Wed Nov 3 11:39:32 2004 Just as we do it for CERN. -** BibFormat: forall() +** BibFormat-004: forall() On Mon, 18 Oct 2004, pelzer@hbz-nrw.de wrote: > why is it impossible to nest forall constructions in your format > definitions like: > > forall ($6531.a) > { > forall($710.a) > { > .... > } > .... > } Because of BibFormat's internal implementation of the forall() function. The BibFormat Admin guide mentions this limitation. In the future the BibFormat module will probably get rewritten in Python, and we'll try to lift this limitation. (It's nothing imminent though.)
-** BibFormat: language-dependent behaviour +** BibFormat-005: language-dependent behaviour > - the first one concerns the link() function in the format > definitions (BibFormat Admin). The language parameter "ln" is not > passed in the URL. This means that when you click, for > example, on an author to run an author search, the URL does not > contain the "ln" parameter. Yes, BibFormat has a multilingualism problem, because it was developed before... we will have to add the language as a parameter in several places. -** BibFormat: forsee script for downloading cfg, editing in Emacs, uploading cfg +** BibFormat-006: forsee script for downloading cfg, editing in Emacs, uploading cfg Wed Nov 3 11:58:30 2004 To ease the editing of BibFormat formats and friends, a little script is needed that would download cfg into a file that would be Emacs-editable and that one could upload back into DB. On Wed, 24 Mar 2004, Tibor Simko wrote: > When you play with BibFormat formats outside of BibFormat Web Admin, > there is an annoyance with the ``serialized'' column of the > ``flxFORMATS'' table. For example, to fill the demo site > automatically upon installation I have to resort to fancy SQL > statements like the one appended below[1]. > > Would it be easily possible if: > > 1. BibFormat core ``FormatRetriever.inc.php'' would not require > ``serialized'' but would check for its existence, and if this > column is NULL, then it would recreate it upon first load. IOW, > ``getSerializedFormat()'' should not fail but should call > something like ``setSerializedFormat($plaintextformat)''. (You > probably have such a function somewhere in the Web Admin; I have > not looked.) > > 2. We could even think of writing a more complex external CLI tool > that would provide bibformatcfgdump/bibformatcfgimport > functionality. People could then edit formats or link rules or > extraction rules or whatever outside of the BibFormat Web Admin, > in a text editor.
It would enable for example an easy way to do > global changes like replacing ``100.a'' with ``110.a'' etc. > > What do you think? Would you have time for option 1, either to advise > me or to do it yourself? +** BibFormat-007: bibreformat format type lookup is wrong +Tue Dec 14 12:38:19 2004 + +When I launch bibreformat -oHD, and in the bibfmt table there was +format already for HB, then the record is not processed. BibReformat +should look for HD, not HB, when it is deciding whether to process +records or not. + * BibIndex -** BibIndex: introduce optional stemming -Wed Nov 3 11:40:39 2004 +** BibIndex-001: index conference title with individual contribution records +Fri Nov 12 14:30:31 2004 -** BibIndex: introduce optional stopwords -Wed Nov 3 11:40:53 2004 + When LKR is defined for an article, lookup the conference title and + index it together with the article metadata, for the title and the + global indexes. (Beware of phrase indexes, modification times of + the conference record WRT contribution records, etc.) -** BibIndex: make our own ACC indexes +** BibIndex-002: make our own ACC indexes Wed Dec 17 16:20:35 2003 Study phrase index generation from XML MARC. Related to bibXXx table abolition. Progress: table structure prepared for v0.3.0 -** BibIndex: --reindex +** BibIndex-003: --reindex > I suspect my fix of course. Is there a safe way to start the indexes > afresh ? dropping the content of all the idx tables ? Yes, exactly. Your fix wasn't sufficient, it would have been necessary to do similar things in more places... at the expense of speed of the searching and indexing engine. Which is why it's better to reindex all records from scratch. You can use the following technique: $ echo "TRUNCATE idxWORD01F;" | /path/to/cdsware/bin/dbexec $ echo "TRUNCATE idxWORD01R;" | /path/to/cdsware/bin/dbexec $ echo "TRUNCATE idxWORD02F;" | /path/to/cdsware/bin/dbexec $ echo "TRUNCATE idxWORD02R;" | /path/to/cdsware/bin/dbexec [...] 
$ echo "TRUNCATE idxWORD10F;" | /path/to/cdsware/bin/dbexec $ echo "TRUNCATE idxWORD10R;" | /path/to/cdsware/bin/dbexec $ echo "UPDATE idxINDEX SET last_updated='0000-00-00 00:00:00';" | /path/to/cdsware/bin/dbexec $ /path/to/cdsware/bin/bibindex We should invent a nice option to ``bibindex'' that would take care of all these steps. -** BibIndex: index into tmp table +** BibIndex-004: index into tmp table When reindexing everything from scratch, don't wipe out the existing index but rather build the new index in a temporary table and then copy it over the current index at the end of the process. Useful for end users, who keep seeing the existing index while the new one is being built from scratch. * BibRank -** BibRank: rebalancing should read old records +** BibRank-001: rebalancing should read old records Wed Nov 3 11:48:32 2004 When the bibrank -R option is used, no attention should be paid to the last modification dates of either the records or rnkMETHODDATA and friends; rather, the ranking indexation should go into empty tables. * BibSched -** BibSched enable multiple hosts +** BibSched-001: enable multiple hosts Wed Dec 17 16:20:35 2003 -** BibSched task sleeping sometimes cannot be done directly +** BibSched-002: task sleeping sometimes cannot be done directly Sometimes when you try to put a BibSched task to sleep, you cannot do it because MySQL is active for the task elsewhere: _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync;You can't run this command now") Tue Mar 2 10:22:33 2004 -** BibSched task numbering +** BibSched-003: task numbering > 2) can you prefix the task number with 0, e.g. _task_0088.err We cannot foretell how many zeros to put there... it depends on the total number of jobs. (Maybe we can put some sufficiently large number of leading zeros.)
-** BibSched start/stop +** BibSched-004: start/stop > Another patch is for modules/bibsched/bin/bibsched.wml: "ps -C %s o > '%%p%%a'" does not exist on FreeBSD, and I have replaced it by "ps > -o pid,command | grep %s" We'll rewrite the offending part. We've actually been thinking about replacing the bibsched daemon behaviour with a more traditional ``apachectl start/stop'' kind of approach. -** BibSched ERROR task queue policy +** BibSched-005: ERROR task queue policy > Note that the BibSched daemon automatic mode stops as soon as one > of the tasks ends with an error. It is therefore a good idea to > inspect BibSched queue from time to time. This can be done by > running the BibSched command-line admin interface > > Wouldn't it be better that it continued in auto mode and issued a > warning (by e-mail) to the admin? that way it would not be necessary > to check on it from time to time. Indeed. The original workflow assumed a chain of actions that we wanted to stop and manually fix as soon as a problem appeared. I think we can safely change this policy now. +** BibSched-007: memory efficiency of the daemon for big schTASK tables +Thu Dec 16 15:24:34 2004 + +When there are a lot of DONE tasks in the schTASK table, ``bibsched +-d'' eats up a lot of memory during execution. It shouldn't. + * BibUpload -** BibUpload: table bibxEP was wanted at some point +** BibUpload-001: table bibxEP was wanted at some point Wed Nov 3 11:55:27 2004 During a recent demo, at some point when a new submission type was created, the system wanted to look for the bibxEP table. Check tag creation rules.
-** BibUpload: must use run_sql() +** BibUpload-002: must use run_sql() to avoid connection dropping problems Fri Jan 16 10:17:11 2004 * Miscellaneous -** Miscellaneous: investigate usage of SQLRelay - +** Miscellaneous-001: introduce several record modification times (bib/ref/pdf) +Mon Nov 29 10:05:33 2004 + +Need to distinguish between several modification times: metadata +modif, reference modif, fulltext modif. Touches +BibUpload/BibReference/BibIndex and friends. + +** Miscellaneous-002: investigate usage of SQLRelay + -** Miscellaneous: INSTALL file +** Miscellaneous-003: INSTALL file Add comments on PHP and Python linking to the same MySQL library. (e.g. people should not use PHP internal MySQL library). -** Miscellaneous: INSTALL file upgrade instructions - Wed Dec 17 16:20:35 2003 - Updated Wed May 12 12:00:38 2004 - update sql targets, plus release announcements +** Miscellaneous-004: INSTALL file upgrade instructions +Wed Dec 17 16:20:35 2003 +Updated Wed May 12 12:00:38 2004 - update sql targets, plus release announcements -** Miscellaneous: test suite +** Miscellaneous-005: test suite Make version testing there too. -** Miscellaneous: backup script +** Miscellaneous-006: backup script shutdown DB, put warning message, hotbackup tables, start DB, remove warning message -** Miscellaneous: MySQLdb 1.0.0 -API has changed for BLOBs/arrays. Should adapt our interface. +** Miscellaneous-007: MySQLdb 1.0.0 +API has changed for BLOBs/arrays. Should adapt our interfaces. -** Miscellaneous: Personalization part is not I18N-ized yet - Received: Mon Mar 15 11:23:34 2004 +** Miscellaneous-008: Personalization part is not I18N-ized yet +Mon Mar 15 11:23:34 2004 The Personalization part is not I18N-ized yet and there are not very many personalization options. We plan to expand it at some point in the future, like a possibility to select default language, default number of hits per page, default sorting, etc. 
+** Miscellaneous-009: version numbering CDSware/0.3.3.20040929 bibindex/1.12 +Mon Dec 13 17:53:24 2004 + +Each module should have its own version numbering, to easily track +what changed. For example: + + $ /soft/cdsware-PCDH23/bin/bibindex -V + CDSware/0.3.3 bibindex/1.2 + + $ /soft/cdsware-PCDH23/bin/bibformat -V + CDSware/0.3.3 bibformat/2.3 + +After a new release: + + $ /soft/cdsware-PCDH23/bin/bibindex -V + CDSware/0.3.4 bibindex/1.8 + + $ /soft/cdsware-PCDH23/bin/bibformat -V + CDSware/0.3.4 bibformat/2.3 + +indicating that bibformat didn't change while bibindex did a lot in +between the two releases. + * OAI -** OAI: RTdata dir should be created during `make install` ? +** OAI-001: RTdata dir should be created during `make install` ? Mon Jan 19 10:41:18 2004 -** OAI: periodical harvesting +** OAI-002: periodical harvesting On Mon, 4 Oct 2004, pelzer@hbz-nrw.de wrote: > reply of martin. my new question: are you ready with OAI data > harvestor? do you have any experiences with periodical harvesting? > what do you do with doublets? For the time being we only provide command-line `bibharvest' tool without any periodical harvesting admin facility. We haven't had time to develop BibHarvest Admin yet. -** OAI: provenance information +** OAI-003: provenance information On Tue, 28 Sep 2004, pelzer@hbz-nrw.de wrote: > have a question about your oai_repository.py: > > do you plan for OAI the output of provenance information in the > "about" part of a record? e.g. > http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm * WebAccess -** WebAccess: restriction by IP +** WebAccess-001: restriction by IP Wed Nov 3 11:39:58 2004 * WebAlert -** WebAlert: manage alerts for a mailing list +** WebAlert-001: nice and simple interface to set up an alert +Wed Nov 24 13:18:02 2004 + +The form we used to have to easily create alerts has gone. The only +way now is via search history. Put it back. See also +. 
+ +** WebAlert-002: manage alerts for a mailing list Wed Nov 3 11:46:24 2004 Instead of having to create an account with the email address of mailing list (in order to be able to send alerts to mailing list), introduce a possibility to define alert mailing list management by an individual user. * WebBasket -** WebBasket: output formats in XML etc +** WebBasket-001: output formats in XML etc Wed Nov 3 11:45:37 2004 Properly support many output formats for a basket, quite like the search engine does. * WebSearch -** WebSearch: introduce possibility to search for basketid:333 +** WebSearch-001: treat stemming properly +Mon Dec 13 16:46:45 2004 + +** WebSearch-002: treat stopword search properly +Mon Dec 13 16:46:43 2004 + +** WebSearch-003: introduce possibility to search for basketid:333 Wed Nov 3 11:45:05 2004 -** WebSearch: introduce recommended terms lookup +** WebSearch-004: introduce recommended terms lookup Wed Nov 3 11:41:12 2004 -** WebSearch: introduce `advertized records' like Google +** WebSearch-005: introduce `advertized records' like Google Wed Nov 3 11:41:35 2004 -** WebSearch: introduce NEAR operator +** WebSearch-006: introduce NEAR operator Wed Nov 3 11:41:54 2004 For example, we could approximate NEAR by doing a regexp search for two words less than 20 characters apart. -** WebSearch: put search cache back +** WebSearch-007: put search cache back Wed Dec 17 16:20:35 2003 -** WebSearch: add RSS output for recent additions to the collections. +** WebSearch-008: add RSS output for recent additions to the collections. 
Wed Dec 17 16:20:35 2003 -** WebSearch: when searching for ``title: goo'' the space should be ignored +** WebSearch-009: when searching for ``title: goo'' the space should be ignored Wed Jan 21 11:09:00 2004 -** WebSearch: cross-searching of various CDSware installations +** WebSearch-010: cross-searching of various CDSware installations Wed Dec 17 16:20:35 2003 -** WebSearch/BibIndex: indicate when cfg_max_recID is going to be exhausted +** WebSearch-011: indicate when cfg_max_recID is going to be exhausted Wed Nov 3 11:38:40 2004 -** WebSearch: collection cache +** WebSearch-012: collection cache > URI: > http://cdsweb.cern.ch/search.py?sc=1&ln=en&p=slow+ejection&f=title&action=Search+&cc=Articles+%26+Preprints&c=Published+Articles&c=Preprints&c=Theses&c=Reports&c=CERN+Internal+Notes > Time: 24/Jun/2004:18:16:53 +0200 Browser: Mozilla/4.0 (compatible; > MSIE 6.0; Windows NT 5.0) Client: 137.138.169.154 The problem is connected to temporary cache. If it happens again, please just try to reload the page after a while. We'll fix the problem to prevent it from happening. -** WebSearch: of=nn for the search engine +** WebSearch-013: of=nn for the search engine On Tue, 22 Jun 2004, pelzer@hbz-nrw.de wrote: >> If this is not feasible (e.g. huuuge result sets), then we may >> invent another output behaviour, e.g. ``of=nn'' that would return >> you only the number of hits, that is ``12'' for the example above. > > another output behaviour would be the best solution. don't you think it's difficult for a user to search with xml output. the user don't see in xml - "mode", how many records are found. i think, it's helpful to write the hit number at the beginning of the first xml data record. what do you think about this? -** WebSearch: multiple search logs +** WebSearch-014: multiple search logs On Fri, 15 Oct 2004, Frederic Gobry wrote: > For the file log, wouldn't it be useful to keep a track of queries > with no matches? 
(to discover systematic errors or problems) The > current implementation seems to discard these queries. Yes, it would. A log of slow queries as well. I'll add them as new log files before the next release. -** WebSearch: safer wildcard treatment +** WebSearch-015: safer wildcard treatment > okay, just began to wonder when CERN* never returned an answer :) Yup. I wanted to plug a generic timeouter into the whole search engine to make sure that queries finish within 10 seconds or so. But this is not done yet. At the moment, the wildcards are simply refused for words with fewer than three letters, and accepted for longer words. But this does not work well for words like `CERN'. While waiting for that generic timeouter, I should rather check how many indexed terms are returned by a wildcard word, and refuse to take the wildcard into account in case of e.g. more than 20 terms or so... > Looks like you are sending me 'terror*', and not each word that > includes in 'terror*' Currently `cern*' could lead to hundreds of thousands of words, so it's hard to . I'll rewrite the wildcard handling part in order to retain cases with <200 words, say, and then I'll pass you the full list. +** WebSearch-016: webcoll can relax if nothing was updated + +WebColl can relax if no record was modified/added since the last run +and if no collection definition/options/portalboxes/etc were updated +since the last run. + * WebSession -** WebSession: detect cookies and inform user if not available upon login +** WebSession-001: detect cookies and inform user if not available upon login We should detect whether cookies are disabled, and print a message on the login page. Otherwise the user types valid access credentials but stays a guest, not knowing what went wrong. See e.g.: On Tue, 2 Nov 2004, RAMSTEIN Beatrice wrote: > I was working with Konqueror (KDE navigator). I tried now with > netscape, which is in fact my usual navigator and it works.
It still > doesn't work with Konqueror, but it doesn't matter, since I usually > use Netscape anyway. Good. Note that Konqueror works perfectly fine for me. I guess that you have probably configured it not to accept cookies, which is why our session ID is rejected on your end so that user authentication cannot work. Please try to enable cookies for our domain and Konqueror should start to work just fine. * WebSubmit -** WebSubmit: MBI login link uses bad `referer' argument +** WebSubmit-001: MBI login link uses bad `referer' argument Received: Wed Nov 3 12:02:44 2004 -** WebSubmit: call elements `title' not `TI' +** WebSubmit-002: call elements `title' not `TI' Wed Nov 3 11:57:31 2004 -** WebSubmit: traceback when `brique' element was edited +** WebSubmit-003: traceback when `brique' element was edited Wed Nov 3 11:56:33 2004 When the `brique' element was edited during a recent demo, a Python traceback was obtained. -** WebSubmit: admin interface should use [?] links +** WebSubmit-004: admin interface should use [?] links Wed Nov 3 11:50:01 2004 WebSubmit Admin should use [?] links to point to help pages, e.g. to the list of available functions etc. -** WebSubmit: Slovak language abbreviation is not `slo', that is Slovene +** WebSubmit-005: Slovak language abbreviation is not `slo', that is Slovene Wed Nov 3 11:50:47 2004 -** WebSubmit: MBI for hep-th/00000 looked for hep-th_00000 +** WebSubmit-006: MBI for hep-th/00000 looked for hep-th_00000 Wed Nov 3 11:54:10 2004 -** WebSubmit: publiline report number link gave traceback +** WebSubmit-007: publiline report number link gave traceback Wed Nov 3 11:54:10 2004 The publiline report number link gave a Python traceback during a recent demo. -** WebSubmit: `your approvals' link not needed everywhere +** WebSubmit-008: `your approvals' link not needed everywhere Wed Nov 3 11:47:40 2004 On the personal account page, the Your Approvals link should not be displayed if I'm not a referee for any document. Verify!
-** WebSubmit: integrate submit new record / submit new file +** WebSubmit-009: integrate submit new record / submit new file Wed Nov 3 11:43:36 2004 Do not separate out submitting new record bibliographic information and new fulltext file, but rather merge the latter into the former as page N. See also ``End submission'' and ``Finish submission'' problems. MBI and SRV button names should be user-friendly. -** WebSubmit: after adding PCV action, php mysql bad result +** WebSubmit-010: after adding PCV action, php mysql bad result Wed Nov 3 12:05:51 2004 After the PCV action was added, a PHP MySQL bad result was obtained for some page. -** WebSubmit: create icons for submitted videos +** WebSubmit-011: create icons for submitted videos Wed Nov 3 11:44:39 2004 -** WebSubmit: delete unwanted fields +** WebSubmit-012: delete unwanted fields > A quick question about the submission interface: when you > clear the content of a field (a text input, for example) that was > previously filled in during the submission of a document, the > data of that field are not deleted. They are still > visible on the server. Is this a bug? Thanks for your answer. Yes, it appears to be a problem. One cannot delete a field by doing a ``correct'' with an empty field (say, one without subfields), because BibUpload will ignore empty fields. We'll look into this and devise a suitable protocol for it. In the meantime, what you can do is a complete ``replace'' of the whole record, as follows: # download the record recID=123: $ wget -O z_z.xml 'http://pcdh23.cern.ch/search.py?recid=123&of=xm' # edit the record and remove the unwanted field: $ vi z_z.xml # submit the record in replace mode: $ bibupload -r z_z.xml which will do the trick. * End of file ;;; End of file.