diff --git a/lectures/week1.html b/lectures/week1.html index 5e8ceea..ea31658 100644 --- a/lectures/week1.html +++ b/lectures/week1.html @@ -1,1226 +1,1226 @@ talk slides

Scientific Programming
for Engineers (SP4E)

Welcome

  • About the class
    • The goal is to provide a toolbox of useful knowledge for PhD students who want to carry out numerical work
    • Uncover the details of computers and numerical simulations: this knowledge brings power (cf. the blue/red pill in the movie The Matrix)
    • The class is somewhat unusual (it is about uncovering knowledge), so learning is personal and requires active participation.
    • Students have to build their own understanding and experience of the various tasks required.
    • Evaluation is based on the homework assignments.

Difficulties in making scientific software

  • Real numbers (and all rounding errors)
  • Units
    • Conversion of units to be able to plug modules together
    • Non-SI unit systems need explicit conversion between quantities
    • Mars Climate Orbiter disaster (a unit mismatch between two pieces of software)
  • Rapid changes

    • Need for quick and easy implementation/insertion of new features
    • Need for non-regression tests
  • Large problems

    • Need to manipulate a large number of degrees of freedom (DOFs) to increase accuracy
    • Need for collaborative tools to make cutting edge applications
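
The first two difficulties listed above (rounding errors and units) can be illustrated with a short Python sketch. It uses only the standard library; the helper name `lbf_s_to_N_s` is hypothetical and chosen for illustration.

```python
import math

# Rounding: 0.1 and 0.2 have no exact binary representation
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead

# Units: convert explicitly at the boundary between two modules
LBF_TO_N = 4.4482216152605  # newtons per pound-force

def lbf_s_to_N_s(impulse_lbf_s):
    """Convert an impulse from pound-force seconds to newton seconds."""
    return impulse_lbf_s * LBF_TO_N

print(lbf_s_to_N_s(1.0))
```

Comparing floats with a tolerance and converting units at one well-identified place are two habits that avoid many silent errors.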

"Computer scientists are/should be LAZY (but smartly)"


Cycles for engineering production of scientific software

  1. Scientific question
  2. Analysis and formulation of the problem
  3. Conception: algorithmic decision
  4. Implementation
  5. Deployment of the code
  6. Analysis/post-treatment of the results

Scientific question

Analysis and formulation of the problem

  • Math formulation: PDE, optimization, statistical, ...
  • Define the input and output of the model
  • Separate the post-treatment
  • Be aware of any existing analytical solutions

Conception: algorithmic decision

  • Draw a flowchart of the algorithm: separate the subparts
  • Choose/identify how to solve the subparts
  • Choose the appropriate data structures
  • For new algorithms, evaluate the complexity, make an effort of generalization
  • If the global complexity is unaffordable at the desired resolution (number of DOFs), consider changing the model (back to 2: formulation)

Implementation

  • Identify existing software for the entire project and for the subparts
  • Decide on a programming language
  • Decide on a coding convention (question of style)
  • Decide where the code is hosted (for backups and revisions)
  • Decide on a source documentation format
  • Program the thing
  • Set up tests for each necessary sub-part (during programming)
  • Debug and check for memory leaks
  • Feasibility issues found while programming may force a change of algorithm or model (back to 2: formulation or 3: conception)
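
As a minimal sketch of a test written alongside a sub-part, the following assumes a hypothetical function `kinetic_energy` and checks it with plain `assert` statements. A real project would more likely rely on a test framework.

```python
import math

def kinetic_energy(mass, velocity):
    """Hypothetical sub-part under test: E = 1/2 m v^2."""
    return 0.5 * mass * velocity ** 2

def test_kinetic_energy():
    # known value: m = 2, v = 3 gives E = 9
    assert math.isclose(kinetic_energy(2.0, 3.0), 9.0)
    # a body at rest has no kinetic energy
    assert kinetic_energy(5.0, 0.0) == 0.0

test_kinetic_energy()
print("all tests passed")
```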

Deployment of the code

  • Identify the target machine
  • Fix compilation issues (multi-system, portability, architecture CPU vs. GPU)
  • Parametric study management (scripting the launch)
  • Shared computing resources/clusters: queue system management
  • Compilation or version issues could lead to implementation changes (back to 4: implementation)
  • Specific hardware could invalidate the algorithmic decision (back to 3:conception)
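
Scripting the launch of a parametric study could look like the sketch below. The executable name `./my_simulation`, its options `--dt` and `--mesh-size`, and the parameter values are all hypothetical.

```python
import itertools

# Hypothetical parameters of the study
time_steps = [1e-3, 1e-4]
mesh_sizes = [100, 200, 400]

def build_commands(executable, time_steps, mesh_sizes):
    """Return one command line (a list of strings) per parameter combination."""
    commands = []
    for dt, n in itertools.product(time_steps, mesh_sizes):
        commands.append([executable, "--dt", str(dt), "--mesh-size", str(n)])
    return commands

commands = build_commands("./my_simulation", time_steps, mesh_sizes)
for cmd in commands:
    # on a cluster, each command would typically be submitted to the queue system;
    # locally it could be run with subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```

Generating the command lines separately from running them makes it easy to inspect the study before spending any computing time.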

Analysis/post-treatment of the results

  • Sorting/treating the output
  • Compute condensed measures from the raw output (this should be faster than the computation itself, otherwise back to 2: formulation)
  • Produce graphs (for papers and presentations)
  • 3D visualization
  • Validate numerical results, stability, coherence, comparison to experiment
  • If the validation fails, going back to 5, 4, 3 or 2 is possible

The method for this class

  • Computer place: Your laptop
    • Linux as main OS, dual boot, or in VirtualBox (might be slow)
    • Python framework
    • C++ framework
  • Open source philosophy $\Rightarrow$ access to source
    • Possibility to check the content of the code
    • Possibility to modify the code for your own purpose
    • Power of the community
    • Business model based on knowledge rather than license

The method for this class

  • Why C++ and Python
    • Object oriented is a way to organize the programs for re-usability
    • C++ allows objects and efficiency
    • Python allows interactivity and quick testing
  • On the repositories
    • Program = bug
    • Since programs can be complex we need a tool to tag source revisions
    • Collaborative tool for software development
    • cvs, svn, git. There is a git at EPFL that we will use.

The method for this class

  • On post-treatment programs
    • Python is perfect for manipulating files, for parsing outputs, for making graphs
    • Many programs exist for visualization. An open-source one: ParaView
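
As a minimal sketch of such a post-treatment, the following parses a text output and computes a mean. The output format (one "step energy" pair per line) is a hypothetical example, and `io.StringIO` stands in for a real output file.

```python
import io

# Hypothetical simulation output: one "step energy" pair per line
raw_output = io.StringIO(
    "# step energy\n"
    "0 1.50\n"
    "1 1.25\n"
    "2 1.00\n"
)

energies = []
for line in raw_output:
    if line.startswith("#"):   # skip the header/comment lines
        continue
    step, energy = line.split()
    energies.append(float(energy))

mean_energy = sum(energies) / len(energies)
print(f"{len(energies)} steps, mean energy = {mean_energy}")
```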

Previous student evaluations

(-)

  • The pace is too fast. [...] However, I think content covered is important and should not be reduced
  • [...] considering the programming level of the students attending the course, I think that they get overloaded with too much information

Previous student evaluations

(-)

  • Specific distributions, environments and package versions to work with are rarely specified. Hence, students must spend a significant amount of time figuring out version issues, package incompatibilities...

We have prepared the class material for LINUX

(+)

  • This was honestly one of the best courses I have ever had. The Professor and the assistants are all very knowledgeable about the course material and make a lot of effort to answer all our questions both in class and on the forum and at any time

Class plan description

  • Thursday 19-th Sep: Introduction, GIT, python hello world
  • Thursday 26-th Sep: C++ hello world, floating point numbers, pi computation
  • Thursday 03-th Oct: C++ STL, Coding convention (C++11)
  • Thursday 10-th Oct: Paraview, Numpy&Scipy, Matplotlib, Exercise: conjugate gradient (Homework)
  • Thursday 17-th Oct: Object oriented with python
  • Thursday 24-th Oct: Object oriented with C++ (Homework)
  • Thursday 31-th Oct: Interactive session to design Particle's code, Design patterns, Implementation of simple Kepler orbit
  • Thursday 07-th Nov: Continuing dimensionless Kepler orbit

Class plan description

  • Thursday 14-th Nov: Google test. Exercise: produce a test suite for particle's code
  • Thursday 21-th Nov: Code optimization and Templates + pingpong ball
  • Thursday 28-th Nov: Using external libraries. Exercise with FFTW (porting to C++) (Homework)
-  • Thursday 19-th Dec: Template library: Eigen. Exercise: mass-spring equilibrium
-  • Thursday 26-th Dec: PyBind+cppimport. Exercise porting the particle's code to python (Homework)
+  • Thursday 05-th Dec: Template library: Eigen. Exercise: mass-spring equilibrium
+  • Thursday 12-th Dec: PyBind+cppimport. Exercise porting the particle's code to python (Homework)

The GIT

What would you require of a tool that holds your program sources?

  • Manage history (evolution in time)
  • Rewind time
  • Transport/Backup through network
  • Team/Concurrent working

These are standard features of most version control systems, such as GIT

Revision manager:
The GIT & c4science

  • Git is a free distributed version control system (DVCS), used for source code management (SCM)
  • Git operates on a decentralized architecture, so every git working directory has the complete history
  • Git was initially designed and created by Linus Torvalds for Linux kernel development
  • EPFL has a GIT repository service (http://c4science.ch/)

wikipedia: revision control system
wikipedia: git

GIT - Clone repositories

git clone ssh://git@c4science.ch/source/sp4e.git mydir
  • The working copy is the modifiable state of the files of a selected branch (branches are defined later)
  • To know the status of the working copy:
    git status
  • See the log
    git log

GIT - Commit your modifications

git commit -m "interesting modification" file.cc

GIT - Branches

  • All modern VCSs have a mechanism for branches
  • Branching means you diverge from the main line of development and continue without perturbing the code
  • Branches can evolve independently
  • The main branch in GIT is usually called master
  • GIT doc on branches
  • See/create branches:
    git branch
  • Change the working copy to another branch
    git checkout stable-branch

GIT - Push your modifications

git push origin master
  • This operation sends the commits of the local branch master to the remote origin and updates the remote branch with them

GIT - Pull modifications

git pull origin master
  • This operation fetches the remote branch and merges it into the current branch

GIT - remotes

  • You can pull/push to more than a single remote
  • list the declared remotes:
    git remote -v
  • add/remove remotes
    git remote add/remove

GIT - commands

git log
 
git status
 
git checkout
 
git add file.cc
 
git rm file.cc
 
git mv file.cc new_name.cc
 
git commit -m "nice message" file.cc
 
git push remote_name branch_name
 git push origin master
 
git diff
 git diff revision_hash
 
git help whatever_command
 

GIT - creation/import

  • Create a repository: git init
  • Adding new files: git add file1 file2 file3
  • Commit the status: git commit
  • Add the remote: git remote add origin URI

.gitignore

  • File to place file patterns to be ignored
  • example: *~ *.o *.pyc
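
To get a feeling for what these patterns match, here is a rough approximation in Python using the standard `fnmatch` module. This is only a sketch: the real .gitignore rules are richer (directories, negation), and the helper `is_ignored` and the file names are hypothetical.

```python
import fnmatch

# Patterns from the example above
patterns = ["*~", "*.o", "*.pyc"]

def is_ignored(filename, patterns):
    """Return True if filename matches at least one pattern."""
    return any(fnmatch.fnmatchcase(filename, p) for p in patterns)

for name in ["main.cc", "main.o", "notes.txt~", "module.pyc"]:
    print(name, is_ignored(name, patterns))
```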

Implicit rule: do not commit code
that does not compile
to a shared/public repository

c4science.ch

What is c4science ?

C4Science is a platform for co-creation, curation and sharing of code. This platform includes:

  • Version management system
  • Common authentication for all Swiss universities, open to local and external collaborators
  • Social dimension (wikis, bug tracking, ...)
  • Code test system (continuous integration)
  • Swiss alternative to github

c4science.ch

Connect to c4science

The recommended way to connect to the c4science server (and actually to any remote Linux machine) is through the SSH protocol:

  • You need a pair of keys: one public and one private
  • They are stored in the directory .ssh in your home directory
  • The public key can be distributed; the private key must stay secret
  • A good habit is to generate one key-pair per client and never transport the private key

Interactive session on manipulating git

  • installing git
  • creating a repository on c4science
  • cloning it to private laptop
  • fix permissions
  • manage conflicts

Python language
concepts and syntax

Please install Python/Anaconda on your laptop
  • comments
#my super comment
 
  • command print
print(10)
 
  • type
a = 10
b = 10.
type(a)
type(b)
 
  • Converting string to integer or float
a = "1.2"
print(a)
a_float = float(a)
print(a_float)
# int("1.2") raises a ValueError: convert to float first (int truncates to 1)
a_int = int(a_float)
print(a_int)
 
  • list
    a = list()
    a.append(2)
    print(a)
    a = [2,4,5]
    a.append([2,4,5])
    print(a)
    a += [2,4,5]
    b = [1,2]*4
    print(a[1])
    print(a[2])
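
The difference between `append`, `+=` and `*` in the snippet above is easy to miss. The following sketch separates the three cases; the variable names are chosen for illustration.

```python
a = [2, 4, 5]
a.append([2, 4, 5])   # adds ONE element, which is itself a list
print(a)              # [2, 4, 5, [2, 4, 5]]
print(len(a))         # 4

b = [2, 4, 5]
b += [2, 4, 5]        # concatenates: adds three elements
print(b)              # [2, 4, 5, 2, 4, 5]

c = [1, 2] * 4        # repetition
print(c)              # [1, 2, 1, 2, 1, 2, 1, 2]
```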
     
  • maps/dictionaries
    planet = dict()
    planet["name"] = "mars"
    planet["radius"] = 1.2
    planet["mass"] = 0.4
    print(planet)
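
Once filled, a dictionary is typically read by key or traversed. A minimal sketch, reusing the values above (the key "moons" is a hypothetical missing entry):

```python
planet = {"name": "mars", "radius": 1.2, "mass": 0.4}

print(planet["name"])                    # access by key
moons = planet.get("moons", "unknown")   # default value when the key is absent
print(moons)

# loop over the (key, value) pairs
for key, value in planet.items():
    print(key, "->", value)
```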
     
  • blocks in python identified by ':' and with the indentation
    block_start:
      #start of the block
      instruction1
      instruction2
      instruction3
      #end of the block
     
  • if conditions
    a = 1
    if not a == 0:
        print("a != 0")
    else:
        print("a == 0")
     
  • for loops (beware the indentation)
for i in range(0,10):
    print(i)

mylist = [2,4,5]
for i in mylist:
    print(i)
 
  • defining function
    def foo(arg1, arg2, arg3):
      ... 
      some_code
      ...
     
  • opening files for writing
f = open("my_super_filename.csv", 'w')
f.write("#X Y Z VX VY VZ FX FY FZ mass radius name\n")
f.close()
 
  • opening a file and read line by line
f = open("my_super_filename.csv", 'r')
for line in f:
    entries = line.split()
    print(entries)
f.close()
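
Writing and reading can be combined in a small round trip. The sketch below uses the `with` statement, which closes the file automatically; the file name, the reduced set of columns and the particle values are hypothetical.

```python
import os
import tempfile

# write a small file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "particles.csv")
with open(path, "w") as f:          # the file is closed when the block ends
    f.write("#X Y Z mass name\n")
    f.write("0. 0. 0. 1.0 sun\n")
    f.write("1. 0. 0. 3e-6 earth\n")

particles = []
with open(path, "r") as f:
    for line in f:
        if line.startswith("#"):    # skip the header
            continue
        entries = line.split()
        x, y, z, mass = (float(e) for e in entries[:4])
        particles.append({"name": entries[4], "position": [x, y, z], "mass": mass})

print(particles)
```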
 
  • using other modules
# request to use another module
import sys
 
  • splitting your program in several files
#my_module.py

def foo(arg):
    return arg+1
 

I can call this function from another file with

import my_module

res = my_module.foo(10)
print(res)
 
  • The file must be accessible
  • To control what is accessible, the environment variable PYTHONPATH is checked
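
Inside a running program, the directories searched by `import` are listed in `sys.path`, which is initialized in part from PYTHONPATH. The sketch below makes a module accessible by extending that list; the module name `my_tmp_module` and the temporary directory are hypothetical.

```python
import os
import sys
import tempfile

# create a module in a directory that Python does not know about yet
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "my_tmp_module.py"), "w") as f:
    f.write("def foo(arg):\n    return arg + 1\n")

# sys.path lists the directories searched by import;
# entries of PYTHONPATH are added to it when the interpreter starts
sys.path.append(module_dir)

import my_tmp_module

res = my_tmp_module.foo(10)
print(res)  # 11
```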
  • launching a python script from the terminal
python3 exe.py arg1 arg2 arg3
  • how to get the arguments passed to the program ?
import sys

# the arguments are stored in the list sys.argv
# for instance you can print it with

print(sys.argv)
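
The arguments always arrive as strings, with the program name first. The sketch below converts them to numbers; the helper `parse_args` is hypothetical, and an explicit list stands in for `sys.argv` so that the example runs anywhere.

```python
def parse_args(argv):
    """Return (program name, list of float arguments) from an argv-like list."""
    program = argv[0]                           # argv[0] is the script name
    values = [float(arg) for arg in argv[1:]]   # arguments arrive as strings
    return program, values

# in a script, pass sys.argv instead of this explicit list
program, values = parse_args(["exe.py", "1", "2.5", "3"])
print(program, values)
```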
 

Interactive session on the Hello world

  • Use GIT to access the first exercise
  • How to split a source file into modules
  • Manipulating program arguments (argv)
  • Series calculation
  • Debugger PDB