WebJournal: improved article header recognition

Authored by Jerome Caffaro <jerome.caffaro@cern.ch> on Sep 21 2009, 18:14.

Description

  • When the header is not clearly recognized and the first sentence must be extracted to serve as the header, left-strip any whitespace that the washing function might have left behind.
  • Also, be more resistant to invalid HTML markup.
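
The fallback described above can be illustrated with a minimal sketch. This is not the actual WebJournal code: the function name `extract_header`, the regular expressions, and the rule that a "clearly recognized" header is an `<h1>`..`<h6>` element are all assumptions made for illustration. The input is assumed to be HTML that has already passed through the washing function.

```python
import re

# Hypothetical patterns, chosen for this sketch only.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>",
                        re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s")


def extract_header(washed_html):
    """Return (header, used_fallback) for an already-washed HTML fragment.

    Hypothetical helper: a properly closed <h1>..<h6> element counts as a
    clearly recognized header; anything else triggers the fallback.
    """
    match = HEADING_RE.search(washed_html)
    if match:
        header = TAG_RE.sub("", match.group(2)).strip()
        if header:
            return header, False

    # Fallback: no usable heading (missing, empty, or never closed).
    # Replace tags with spaces instead of parsing them, so that invalid
    # markup cannot raise an error.
    text = TAG_RE.sub(" ", washed_html)
    # Left-strip the whitespace that washing and tag removal leave behind,
    # so the extracted header does not start with blanks.
    text = text.lstrip()
    first_sentence = SENTENCE_END_RE.split(text, maxsplit=1)[0]
    # Collapse internal runs of whitespace into single spaces.
    return " ".join(first_sentence.split()), True
```

For example, `extract_header("\n   <p>  First sentence. Second one.</p>")` falls back to the first sentence and returns it without leading whitespace, and an unclosed `<h1>` is handled by the same fallback path rather than by an exception.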

Details

Committed
Tibor Simko <tibor.simko@cern.ch> on Sep 21 2009, 18:15
Parents
R3600:f602b3dfc166: htmlutils: washing of <script> and <style> tags
Branches
Unknown
Tags
Unknown

Event Timeline

Tibor Simko <tibor.simko@cern.ch> committed R3600:550e72ca7e56: WebJournal: improved article header recognition (authored by Jerome Caffaro <jerome.caffaro@cern.ch>) on Sep 21 2009, 18:15.