History Graph
Commit | Author | Details | Committed
---|---|---|---
c6b295092771 | Jan Linder | Fix the bug with extracting the participants | Nov 17 2020
f4dcd0f64366 | Jan Linder | Reverse the change with pagenumbers because it inserted a lot more problems | Nov 16 2020
5bee42e32c08 | Jan Linder | Fix two bugs with line_comes_first: i) clean category dict after each part of a… | Nov 16 2020
1db59f7ef7b2 | Jan Linder | Keep page numbers during extractions (bug fix) and other minor fixes | Nov 16 2020
6a0cb9b440d1 | Jan Linder | Adjust generate_complete_dataset to use the metadata file | Nov 16 2020
b456f2b06428 | Jan Linder | Fix the Sri Lanka bug (was taken as a male participant (Sr) | Nov 16 2020
9e93900c46b3 | Jan Linder | Adapt do_all.py s.t. it uses the metadata file | Nov 16 2020
86343b0a38b3 | Jan Linder | Rename the Analyzer files and create factory | Nov 16 2020
850a9306adc8 | Jan Linder | Implement the MeetingAnalyzer class (abstract) | Nov 16 2020
b14274a4a3cc | Jan Linder | Add metadata and sb48b | Nov 16 2020
62f3c636c617 | Jan Linder | Let OCR lists also start by default with parties as category | Nov 16 2020
d59172c8e7a5 | Jan Linder | Add script to find biggest delegations | Nov 16 2020
d5a39367f24c | Jan Linder | Minor changes | Nov 15 2020
aa46a8340129 | Jan Linder | Fix bug in DigitalPdfExtr that made pdfs with several list parts to fail | Nov 15 2020
4ec3f290fe66 | Jan Linder | Minor corrections for the parties over time | Nov 15 2020
427418ee24c1 | Jan Linder | Add the country library to unify parties | Nov 15 2020
165b9b13f17f | Jan Linder | UNTESTED Refactoring and renaming of the text extractor part | Nov 13 2020
bcde773d7e59 | Jan Linder | Bugfix of the issue on pages with new categories (order was wrong) | Nov 13 2020
3ee4c9e68479 | Jan Linder | Add minor change for complete dataset | Nov 10 2020
9dd37a46c53e | Jan Linder | Minor changes and todo | Nov 10 2020
fa5ec0d659bc | Jan Linder | Add experience graph s.t. it works | Nov 10 2020
4d6352e6b52f | Jan Linder | Added experience graph but failed | Nov 9 2020
90dfb209b07e | Jan Linder | Add file to find average number of participants | Nov 9 2020
d3111e1d6608 | Jan Linder | Add delegation size plot | Nov 9 2020
9484071d42bd | Jan Linder | add gender analysis | Nov 9 2020
5414f53caa32 | Jan Linder | Add hasTitle to complete dataset | Nov 3 2020
50cb623d6541 | Jan Linder | Add gender in the complete dataset | Nov 3 2020
fd2a12e427de | Jan Linder | Add todo week8 | Nov 3 2020
ed5b12fba5f8 | Jan Linder | Make other Analyzers unstatic. | Nov 3 2020
85156b2d7027 | Jan Linder | Made Analyzer 1-5 unstatic and solved issue with encoding for cop5 | Nov 3 2020
71ee7f02893d | Jan Linder | Extraction of SB now possible. Now you need to give meeting label to… | Nov 2 2020
e13b8d6a2ade | Jan Linder | Minor changes to todo.md | Nov 2 2020
2eecd75d75b8 | Jan Linder | Restructuring of repo | Nov 2 2020
27baf866d95e | Jan Linder | Add todo week7 | Oct 27 2020
a46d37c100b6 | Jan Linder | Add the description lines extraction | Oct 26 2020
73d06bab31b6 | Jan Linder | Add translation from french to english for cop2 | Oct 26 2020
b5c1185e3a2e | Jan Linder | Generate the complete dataset now possible | Oct 26 2020
7f289412e9cb | Jan Linder | minor change to todo | Oct 26 2020
5d558671cc2f | Jan Linder | Update todo week6 | Oct 23 2020
db6864b1a785 | Jan Linder | Add the possibility to have more than one affiliation category per page | Oct 22 2020
120cd4c49b66 | Jan Linder | Insert new plot and boxing for tesseract to presentation | Oct 22 2020
0ad97826fa8a | Jan Linder | Added new issues in source to readme | Oct 22 2020
2e7f25ecd1bc | Jan Linder | Add presentation for Paula and Marlene, week6 | Oct 22 2020
5531d854d21a | Jan Linder | add generate plots | Oct 21 2020
62f11a926efb | Jan Linder | Minor bugfixes and improvements | Oct 20 2020
0b5560f27517 | Jan Linder | Explain Analyzer in readme | Oct 20 2020
3fd706cf5bd0 | Jan Linder | Rename cop1to5, included cop5 there | Oct 20 2020
97b00603c9f4 | Jan Linder | Improve Analyzers, especially cop7to8 | Oct 20 2020
67a5ee11403c | Jan Linder | Implement cop7to8_analyzer and affilition list extractor. | Oct 19 2020
883bce5e3184 | Jan Linder | Added party recognition for all cops | Oct 19 2020
8b9912f6d1fa | Jan Linder | First edition of affiliation category extraction. Only works for cop5+ yet | Oct 17 2020
fa3369344206 | Jan Linder | minor adaption op salutory addresses | Oct 14 2020
2186bf7609de | Jan Linder | Make pdfToText unstatic. | Oct 14 2020
7514566bb3e2 | Jan Linder | Add todo week5 | Oct 13 2020
dc91f390862e | Jan Linder | Improve pdftotext and its analysis. | Oct 12 2020
e22f3e528e6a | Jan Linder | Implement pdfToTxt manually quite correctly | Oct 12 2020
a0f6ce99d856 | Jan Linder | implement PDFPageDetailedAggregator to get the positions of LTContainers | Oct 10 2020
c66cb28f7eac | Jan Linder | bring copnewer_analyzer to work | Oct 6 2020
9dd74b6c1ab6 | Jan Linder | add todos week4 | Oct 6 2020
0f3b244e9c77 | Jan Linder | Remove old code files | Oct 6 2020
29e001a8ff57 | Jan Linder | IMPLEMENT PDF TO TXT CORRECTLY Now use pdfminer with the laparam argument to… | Oct 6 2020
66d3e48b2e2a | Jan Linder | the ocr now works correctly | Oct 5 2020
154caa57dcc3 | Jan Linder | blacklist | Oct 5 2020
2f69d7e02c12 | Jan Linder | whitelist ocr: not perfect | Oct 5 2020
9cc36860b206 | Jan Linder | Improve analysis for cop3 and cop4 | Oct 5 2020
0d36b15c21a8 | Jan Linder | Small corrections for OCR | Oct 5 2020
67d367f6243d | Jan Linder | finish modularization | Oct 5 2020
e2ddd912c5b8 | Jan Linder | File structure updated. (untested) | Oct 4 2020
26f03fd369c3 | Jan Linder | Begin with making a proper modularization | Oct 4 2020
2d93960b8990 | Jan Linder | inserted boxes for OCR and right parameters | Sep 30 2020
75edac7ca39e | Jan Linder | added todo week3 | Sep 29 2020
d37074c1d20f | Jan Linder | minor changes of process, added raw of cop3 | Sep 28 2020
5b6aa3b3ac8b | Jan Linder | progress on cop2-4 | Sep 28 2020
57cdc59e9210 | Jan Linder | Use of process_copX.py precised in README | Sep 27 2020
355f0c163915 | Jan Linder | Implemented processing of cop2-4. Works good for countries, but has major… | Sep 27 2020
a557a870c334 | Jan Linder | try with pypdf2 | Sep 27 2020
13691532eec5 | Jan Linder | implemented process cop for 5 - 25 with textract but there are major errors in… | Sep 26 2020
7f3f311f331c | Jan Linder | progress on the class and process script | Sep 23 2020
c747832a97fe | Jan Linder | Began the copx file | Sep 22 2020
0055a1d8e4a0 | Jan Linder | added todo | Sep 22 2020
aa92caa65e57 | Jan Linder | added raw txt for cop25 | Sep 22 2020
37b4e9f47446 | Jan Linder | added my testing files for OCR with cop1 | Sep 22 2020
9514a37465f4 | Jan Linder | data complete | Sep 17 2020
77b17d5e7bd6 | Jan Linder | data complete | Sep 17 2020
45310b6c42f1 | Jan Linder | first part of the lists | Sep 17 2020