Correctly handle entity and character references when washing HTML.

Authored by Jerome Caffaro <jerome.caffaro@cern.ch> on Oct 14 2008, 15:53.

Description

Correctly handle entity and character references when washing HTML.

Entity ("&name;") and character ("&#ref;") references were lost when washing HTML, because the RecordHTMLParser class did not override the handle_charref(..) and
handle_entityref(..) methods. These references are now converted to their text equivalents.
Also correctly close the parser in get_as_text(..) once the input text has been fed.
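
The behaviour described above can be illustrated with a minimal sketch. It is not the committed code: it targets Python 3's html.parser (the 2008 change would have used the Python 2 HTMLParser module), and the class name TextExtractor, the parts attribute and the body of get_as_text are assumptions made for illustration. Only the handle_charref, handle_entityref and get_as_text names come from the commit message.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical stand-in for RecordHTMLParser; collects plain text."""

    def __init__(self):
        # With convert_charrefs=False the parser reports references through
        # handle_entityref / handle_charref instead of decoding them itself.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        # Named reference such as "&eacute;"; name arrives as "eacute".
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown entity: keep the original markup rather than drop it.
            self.parts.append("&%s;" % name)
        else:
            self.parts.append(chr(codepoint))

    def handle_charref(self, name):
        # Numeric reference such as "&#169;" or "&#x41;"; name arrives
        # as "169" or "x41".
        try:
            if name.lower().startswith("x"):
                codepoint = int(name[1:], 16)
            else:
                codepoint = int(name)
            self.parts.append(chr(codepoint))
        except (ValueError, OverflowError):
            # Invalid code point: keep the original markup.
            self.parts.append("&#%s;" % name)


def get_as_text(html_text):
    """Return the text content of html_text with references decoded."""
    parser = TextExtractor()
    parser.feed(html_text)
    # Closing the parser makes it process any input it is still buffering.
    parser.close()
    return "".join(parser.parts)


print(get_as_text("<p>caf&eacute; &amp; &#169; &#x41;</p>"))
```

In this sketch, a parser without the two overridden methods would silently skip every reference, which matches the loss described in the commit message. Unknown or invalid references are kept verbatim here; that is a design choice of the sketch, not something the commit message states.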

Event Timeline

Jerome Caffaro <jerome.caffaro@cern.ch> committed R3600:63454f8452bb: Correctly handle entity and character references when washing HTML. (authored by Jerome Caffaro <jerome.caffaro@cern.ch>). Oct 14 2008, 15:53