<h2 id="Definition,-context-and-best-practices">Definition, context and best practices<a class="anchor-link" href="#Definition,-context-and-best-practices">¶</a></h2>
<h2 id="Inclusion-of-the-participants'-questions">Inclusion of the participants' questions<a class="anchor-link" href="#Inclusion-of-the-participants'-questions">¶</a></h2>
<li>The definition of research data is not fixed or rigid: several definitions are possible based on specific fields, institutions, and organizations.</li>
<li>For the Organisation for Economic Co-operation and Development (<a href="http://www.oecd.org/fr/sti/sci-tech/38500823.pdf">OECD</a>), research data are defined as factual records (numbers, texts, images and sounds) that are used as primary sources for scientific research and that the scientific community commonly accepts as necessary to validate research results.</li>
<li>Key elements to consider in research data management are the legal, ethical and political aspects, which depend on the sensitivity of the data.</li>
<h3 id="Useful-resources">Useful resources<a class="anchor-link" href="#Useful-resources">¶</a></h3><p>The Digital Curation Centre (DCC) provides many resources to help institutions develop their own policies and guidelines for research data management:</p>
<ul>
<li><a href="http://www.dcc.ac.uk/resources/policy-and-legal/policy-tools-and-guidance/policy-tools-and-guidance">DCC Policy tools and guidance</a></li>
<li><a href="http://www.dcc.ac.uk/sites/default/files/documents/publications/DCC-FiveStepsToDevelopingAnRDMpolicy.pdf">Five Steps to Developing a Research Data Policy</a></li>
<h3 id="Examples-of-institutional-policies:">Examples of institutional policies:<a class="anchor-link" href="#Examples-of-institutional-policies:">¶</a></h3><ul>
<li><a href="http://www.data.cam.ac.uk/research-data-policies">University of Cambridge</a></li>
<li><a href="http://www.admin.ox.ac.uk/media/global/wwwadminoxacuk/localsites/researchdatamanagement/documents/Policy_on_the_Management_of_Research_Data_and_Records.pdf">University of Oxford</a></li>
<li><a href="http://www.ed.ac.uk/information-services/about/policies-and-regulations/research-data-policy">University of Edinburgh</a></li>
<li><a href="https://www.cms.hu-berlin.de/de/ueberblick/projekte/dataman/policy/policy-en/rdm-eng-policy">Humboldt-Universität zu Berlin</a></li>
<h2 id="Requirements-regarding-research-data-management">Requirements regarding research data management<a class="anchor-link" href="#Requirements-regarding-research-data-management">¶</a></h2>
<li><a href="http://research-office.epfl.ch/financements/international/horizon-2020">Horizon 2020</a> is the European Union's biggest research and innovation funding programme,
with nearly €80 billion of funding available over 7 years, from 2014 to 2020. Its
main objective is to promote and support excellence in the scientific field.</li>
<li>For some research projects, Horizon 2020 requires a <a href="http://ec.europa.eu/programmes/horizon2020/en/what-horizon-2020">data management plan</a> as a condition of funding. </li>
<li><a href="https://ec.europa.eu/digital-single-market/en/news/communication-european-cloud-initiative-building-competitive-data-and-knowledge-economy-europe">As of 2017</a>, the Commission will make <strong>open research data the default option</strong>, while ensuring opt-outs, for all new projects of the Horizon 2020 program.</li>
<li><p>SNSF Policy and coordination with research communities and other actors <a href="http://forscenter.ch/wp-content/uploads/2014/11/DART_Slides_iki.pdf">to be established</a>:</p>
</li>
<li><p>Ongoing development of a <strong>research data management policy</strong> together with an infrastructure policy</p>
</li>
<li><p>Submission of <strong>data management plans</strong> with the grant application</p>
<h2 id="Best-practices-examples:-EPFL-(Switzerland)">Best practices examples: EPFL (Switzerland)<a class="anchor-link" href="#Best-practices-examples:-EPFL-(Switzerland)">¶</a></h2><p>To provide guidance in preparing a DMP, the <strong><a href="http://library.epfl.ch/files/content/sites/library3/files/research-data/dmp/Data_management_plan_checklist_EPFL_2016.pdf">EPFL-ETHZ checklist</a></strong> includes
four categories to cover questions related to:</p>
<ul>
<li>Research Data Acquisition: type, quantity, license, etc.</li>
<li>Research Data Format: format, metadata, identification, etc.</li>
<li>Research Data Sharing: embargo, intellectual property, etc.</li>
<li>Data Preservation: storage, sensitivity of the data, archiving, etc.<center><img src="./Images/EPFL-checklist.png" width="600" height="450" /></center></li>
<h2 id="Part-2.1---Issues-related-to-data:">Part 2.1 - Issues related to data:<a class="anchor-link" href="#Part-2.1---Issues-related-to-data:">¶</a></h2><h3 id="Reproducibility-issues">Reproducibility issues<a class="anchor-link" href="#Reproducibility-issues">¶</a></h3><p>According to a comment published in Nature in 2012, the findings of <strong>47 out of 53</strong> landmark preclinical cancer research papers could not be reproduced (1).</p>
<p><font size="1">(1) Begley, C. G.; Ellis, L. M. (2012). "Drug development: Raise standards for preclinical cancer research". Nature 483 (7391): 531–533.<br /> (2) Ioannidis JPA, Allison DB, Ball CA, et al. Repeatability of published microarray gene expression analyses. Nat Genet 2009;41(2):149–55.<br /> (3) Vandewalle, Patrick, Jelena Kovacevic, and Martin Vetterli. "Reproducible research in signal processing." Signal Processing Magazine, IEEE 26.3 (2009): 37-47 </font><br /></p>
<p><font size="1">[Slide inspired by https://github.com/saloot/IPythonClass, Amir Hessam Salavati & Robin Schiebler, 2015]</font></p>
<h3 id="Data-access-sustainability">Data access sustainability<a class="anchor-link" href="#Data-access-sustainability">¶</a></h3><p>A 2014 PLOS ONE study showed that <strong>more than 60% of links to datasets are broken after 10 years</strong> (1).</p>
<font size="3">For more tools, see <a href="https://infoscience.epfl.ch/record/211157">A Selection of Research Data Management Tools Throughout the Data Lifecycle / Jan Krause</a></font></p>
<h2 id="2.2.1---A-trusted-data-repository">2.2.1 - A trusted data repository<a class="anchor-link" href="#2.2.1---A-trusted-data-repository">¶</a></h2><p>Criteria:</p>
<ul>
<li><strong>Broken links</strong>: use persistent identifiers such as <strong>DOIs</strong>,</li>
<li><strong>Reliability</strong>: data preservation (e.g. OAIS standard),</li>
<li><strong>Visibility</strong>: schema.org markup for search engines, the OAI-PMH 2.0 protocol and/or a <strong>well-known community repository</strong>,</li>
<li><strong>Searchability</strong>: at least a basic metadata standard (e.g. <strong>DublinCore</strong>).</li>
<li>Numerous specialized metadata formats are available for most disciplines; the Research Data Alliance <a href="http://rd-alliance.github.io/metadata-directory/">Metadata Directory</a> is a good starting point.</li>
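<p>To make the searchability and persistent-identifier criteria more concrete, here is a minimal sketch of a Dublin Core record built with Python's standard library. The helper name <code>dublin_core_record</code>, the <code>metadata</code> wrapper element and every field value (including the DOI) are placeholders chosen for illustration; a real repository will prescribe its own schema and identifiers.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 "simple" Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a minimal Dublin Core XML record from (element, value) pairs."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description: all values below are placeholders.
record = dublin_core_record([
    ("title", "Example temperature measurements"),
    ("creator", "Doe, Jane"),
    ("date", "2016-05-01"),
    ("format", "text/csv"),
    ("rights", "CC-BY 4.0"),
    # A DOI expressed as a resolvable URL acts as the persistent identifier.
    ("identifier", "https://doi.org/10.5072/example-dataset"),
])
print(record)
```

<p>Even such a small record lets a harvester or search engine index the dataset by title, creator, date and licence, while the DOI keeps the reference stable if the hosting URL changes.</p>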
<h3 id="Some-open-formats-to-take-into-account">Some open formats to take into account<a class="anchor-link" href="#Some-open-formats-to-take-into-account">¶</a></h3><ul>
<li>Portable Document Format <strong>PDF/A, ISO standard</strong>, text [PDF for archiving: no encryption, embedded fonts...]</li>
<li>Interactive <strong>Jupyter Notebook</strong> documents: rich text, formulas (LaTeX), charts and code, all dynamic. They can also be used for presentations.</li>
<h2 id="Data-formats-list">Data formats list<a class="anchor-link" href="#Data-formats-list">¶</a></h2><p>Sustainability of digital formats by the US Library of Congress. <a href="http://www.digitalpreservation.gov/formats/">This list</a> is categorized by data type (text, audio, image, video, geospatial, dataset, etc.)</p>
<h2 id="Part-2.2.4---Adequate-licences">Part 2.2.4 - Adequate licences<a class="anchor-link" href="#Part-2.2.4---Adequate-licences">¶</a></h2><p>A licence defines how your data may be reused. For instance:</p>
<p>Creative Commons (<strong>CC0</strong> and <strong>CC-BY</strong>) <a href="http://creativecommons.org/">http://creativecommons.org/</a> Since version 4.0, the Creative Commons licences take into account the sui generis right protecting database content (in addition to the form, which is protected by copyright) <a href="https://wiki.creativecommons.org/wiki/Data">https://wiki.creativecommons.org/wiki/Data</a></p>
<p>EPFL offers many storage options, as described on the VPSI page <a href="https://it.epfl.ch/business_service.do?sysparm_document_key=cmdb_ci_service,90cbd58e0ff121009f8579f692050eb7&sysparm_service=Bases_de_donnees_et_Stockage_Serveurs">Databases, Storage and Virtualization</a>.</p>
<h3 id="Code-sharing,-branching-and-versioning">Code sharing, branching and versioning<a class="anchor-link" href="#Code-sharing,-branching-and-versioning">¶</a></h3><p><img src="Images/git.png" alt="."></p>
<p><a href="https://c4science.ch/">c4science</a> is the Swiss collaborative development platform. It is accessible to all academic members via Switch AAI and will allow external colleagues to be invited (probably starting in June 2016). c4science offers:</p>
<h3 id="Scientific-workflow-management">Scientific workflow management<a class="anchor-link" href="#Scientific-workflow-management">¶</a></h3><p>Scientific results are often the outcome of complex workflows. The computation steps form a graph of dependencies, which may be difficult to reproduce.</p>
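<p>As a rough illustration of why making this graph explicit helps, the sketch below declares a small workflow as a dependency graph and derives a valid execution order from it. It uses Python's standard <code>graphlib</code> module (Python 3.9 or later); the step names and the helper <code>execution_order</code> are hypothetical, and a real workflow engine does much more (data passing, provenance, re-execution).</p>

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
# Step names are hypothetical and only illustrate the idea.
workflow = {
    "load_raw_data": set(),
    "clean_data": {"load_raw_data"},
    "fit_model": {"clean_data"},
    "make_figures": {"clean_data", "fit_model"},
    "write_report": {"make_figures"},
}


def execution_order(dependencies):
    """Return an order in which every step runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(workflow)
print(" -> ".join(order))
```

<p>Once the dependencies are written down, anyone can rerun the steps in a valid order, and a circular dependency is reported as an error (<code>graphlib.CycleError</code>) instead of going unnoticed.</p>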
<p>Taverna is an excellent workflow engine. It includes the desktop-oriented <a href="https://taverna.incubator.apache.org/download/">Taverna Workbench</a> (multi-platform and open source), command-line and server applications:</p>
<li><p><strong><a href="https://www.authorea.com/">Authorea</a></strong>: collaborative writing, easy to use, LaTeX supported but not required (EPFL licence provided by the Library) <img src="Images/Authorea.png" alt="."></p>
</li>
<li><p><strong><a href="https://de.sharelatex.com/">ShareLaTeX</a></strong>: collaborative writing based on LaTeX. Suited for LaTeX power users. <img src="Images/ShareLaTeX.png" alt="."></p>
<li><strong><a href="https://www.zotero.org/">Zotero</a></strong>: bibliographic management, citation, sharing and discovery tool (SFP course from 2016 by the Library and <a href="http://library.epfl.ch/doctor-zotero/en">Dr Zotero</a>) <img src="Images/Zotero.png" alt="."></li>
<h2 id="Part-2.2.6---Why-Open-Data-and-Reproducibility?">Part 2.2.6 - Why Open Data and Reproducibility?<a class="anchor-link" href="#Part-2.2.6---Why-Open-Data-and-Reproducibility?">¶</a></h2><ul>
<li>For better reproducibility & for the sake of science</li>
(1) Piwowar, H. A. et al. "Sharing detailed research data is associated with increased citation rate." PLoS ONE 2.3 (2007): e308. <br />
(2) Antelman, Kristin. "Do open-access articles have a greater research impact?." College & research libraries 65.5 (2004):372-382. <br />
(3) Vandewalle, Patrick, Jelena Kovacevic, and Martin Vetterli. "Reproducible research in signal processing." Signal Processing Magazine, IEEE 26.3 (2009): 37-47. <br />
<h3 id="2.3.1---Interactive-data-visualization-examples">2.3.1 - Interactive data visualization examples<a class="anchor-link" href="#2.3.1---Interactive-data-visualization-examples">¶</a></h3><ul>
<li><a href="https://www.washingtonpost.com/graphics/national/power-plants/">US Electricity Generation by Power Source</a> (Washington Post, 2015)</li>
<li><a href="http://www.bloomberg.com/graphics/2016-oil-rigs/">Oil Drilling Collapse in the USA</a> (Bloomberg, 2016)</li>
<li><a href="https://map.geo.admin.ch/?lang=fr&topic=energie&bgLayer=ch.swisstopo.pixelkarte-grau&layers_visibility=false,false,false,false&layers_timestamp=18641231,,,&catalogNodes=2419,2420,2427,2480,2429,2431,2434,2436,2441">Swiss Federal Office of Topography</a> (SwissTopo, 2016)</li>
<h3 id="2.3.3---Visualization-software">2.3.3 - Visualization software<a class="anchor-link" href="#2.3.3---Visualization-software">¶</a></h3><p>Visualization tools may be categorized by their flexibility and simplicity of use. Here is a short selection:</p>
<h4 id="Gephi">Gephi<a class="anchor-link" href="#Gephi">¶</a></h4><p><a href="https://gephi.org/">Gephi</a>: free, multi-platform software for network analysis and visualization. <a href="https://gephi.org/features/">More information and examples</a>. <a href="https://player.vimeo.com/video/9726202">See the video presentation</a>.</p>
<p>To explore in more depth, see <a href="https://www.youtube.com/watch?v=yZ0G9jljCto">video tutorial</a>.</p>
<p><a href="http://circos.ca/">Circos</a> is an open source desktop application for visualizing data in circular layouts. It is ideal for exploring relationships between objects or positions. <a href="http://circos.ca/images/published/">Examples</a>.</p>
<h4 id="Tableau">Tableau<a class="anchor-link" href="#Tableau">¶</a></h4><p><a href="https://www.tableau.com">Tableau</a>: commercial software available in several editions (desktop, server, cloud, reader, online, or public). See <a href="http://www.tableau.com/products/desktop">Tableau Desktop</a> for an example.</p>
<li><a href="http://pandas.pydata.org/">Pandas</a> is a powerful library providing high-performance, easy-to-use data structures and data analysis tools. <a href="http://pandas.pydata.org/pandas-docs/stable/visualization.html">Examples</a>.</li>
<li><a href="https://stanford.edu/~mwaskom/software/seaborn/">Seaborn</a> is a statistical visualization library built on Matplotlib that works closely with Pandas data structures. <a href="https://stanford.edu/~mwaskom/software/seaborn/examples/">Examples</a>.</li>
<li><a href="https://networkx.github.io/">NetworkX</a> is suited for the analysis and representation of complex networks. <a href="http://networkx.github.io/documentation/latest/gallery.html">Examples</a>.</li>
<li><a href="http://matplotlib.org/">Matplotlib</a> is a plotting library with great flexibility. Its features are comparable to Matlab plotting. <a href="http://matplotlib.org/gallery.html">Examples</a>.</li>
<h4 id="Web-oriented">Web oriented<a class="anchor-link" href="#Web-oriented">¶</a></h4><h5 id="D3.js">D3.js<a class="anchor-link" href="#D3.js">¶</a></h5><p><a href="https://d3js.org/">D3.js</a> is an open source JavaScript library for creating <strong>interactive documents based on data</strong>. D3 helps bring data to life using HTML, SVG, and CSS. As mentioned above, it can be used in conjunction with matplotlib via <a href="http://mpld3.github.io/">mpld3</a>. <a href="https://github.com/mbostock/d3/wiki/Gallery">D3.js examples</a>. In addition, the <a href="http://code.shutterstock.com/rickshaw/">rickshaw</a> library extends D3 for time-series representation. The <a href="http://nvd3.org/examples/index.html">NVD3 project</a> offers a collection of reusable charts based on D3.js.</p>
<li><a href="http://www.humblesoftware.com/envision">Envision</a> is a library for representing time series. </li>
</ul>
</li>
<li><p><strong>Maps:</strong></p>
<ul>
<li><a href="http://kartograph.org/">Kartograph</a> is a lightweight framework for building interactive map applications. The tool has two components: a Python library for creating maps, and a JavaScript library for creating interactive maps on the Web. </li>
<li><a href="http://polymaps.org/">Polymaps</a> is a library for making dynamic and interactive maps. <a href="http://polymaps.org/ex/">Examples</a>. </li>
<li><a href="http://leafletjs.com/">Leaflet</a> is a mobile-friendly library for interactive maps.</li>
</ul>
</li>
<li><p><strong>Data applications:</strong></p>
<ul>
<li><a href="http://okfnlabs.org/recline/">Recline</a> is a library for building data applications in pure JavaScript and HTML.</li>
<h2 id="2.4---Resources-for-more-information">2.4 - Resources for more information<a class="anchor-link" href="#2.4---Resources-for-more-information">¶</a></h2><ul>
<li><p><a href="http://www.univ-paris-diderot.fr/DocumentsFCK/recherche/Realiser_un_DMP_V1.pdf">How to prepare a DMP Paris</a></p>
</li>
<li><p><a href="https://docs.google.com/document/d/1WNYDmqEfv8OdiHQvdC63_yocUx7rgqMTOYUoqOt4R8U/edit?pli=1">Art and humanities</a></p>