Commit | Author | Details | Committed
---|---|---|---
2d6abddd88a7 | anciaux | fix a little error preventing completion | Nov 25 2019
cfd783950a35 | anciaux | update the examples and coater | Nov 25 2019
<center> <span style="font-size: 4em"> BlackDynamite </span> <img width="50%" src="doc/Black-Dynamite.png"/> </center>
Installation
Installation of client side
The easiest way is through pip, which must first be installed:
```bash
sudo apt-get install python-pip
```
Then, for a system-wide installation (recommended):
```bash
sudo pip install git+https://anciaux@c4science.ch/diffusion/3127/blackdynamite.git
```
Or, for a user-scope installation:
```bash
pip install --user git+https://anciaux@c4science.ch/diffusion/3127/blackdynamite.git
```
Installing completion
To benefit from autocompletion in BlackDynamite, the following steps are needed. First install the argcomplete module, either by typing (depending on your Ubuntu/Debian version):
```bash
sudo apt-get install python-argcomplete
```
or:
```bash
sudo apt-get install python-pip
sudo pip install argcomplete
```
Then insert the following in your .bashrc:
```bash
eval "$(register-python-argcomplete getRunInfo.py)"
eval "$(register-python-argcomplete launchRuns.py)"
eval "$(register-python-argcomplete canYouDigIt.py)"
eval "$(register-python-argcomplete cleanRuns.py)"
eval "$(register-python-argcomplete updateRuns.py)"
eval "$(register-python-argcomplete enterRun.py)"
eval "$(register-python-argcomplete saveBDStudy.py)"
export PATH=$PATH:~/.../blackdynamite/bin/
export PYTHONPATH=$PYTHONPATH:~/.../blackdynamite/python
```
Register hosts to BlackDynamite
In the .blackdynamite folder (in your home directory) you should register the servers that host your databases, with the options and information of your choice.
For each database, add a .bd file named after the server (or after an alias, in which case the host is specified inside: host = yourHost.domain.countryID).
It is also recommended to store the database password in that file, to avoid typing it each time auto-completion queries the database.
Here is an example of a valid BlackDynamite config file:
```bash
cat ~/.blackdynamite/lsmssrv1.epfl.ch.bd
host = lsmssrv1.epfl.ch
password = XXXXXXXXX
```
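The `key = value` layout of such a file is simple enough to read by hand. As a minimal sketch (this is not BlackDynamite's own parser; the function name `parse_bd_config` is hypothetical, and ignoring blank lines and `#` comments is an assumption), the following Python shows how the example above maps to a dictionary:

```python
def parse_bd_config(text):
    """Parse 'key = value' lines into a dict (illustrative only)."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw_line!r}")
        options[key.strip()] = value.strip()
    return options


example = """
host = lsmssrv1.epfl.ch
password = XXXXXXXXX
"""
config = parse_bd_config(example)
print(config["host"])
```

Since the file may hold a password in clear text, it is reasonable to restrict its permissions to your own user.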
Installation of the server side: setting up the PostgreSQL database (for admins)
If you want to set up a PostgreSQL server to store BlackDynamite data, follow this procedure.
Install the PostgreSQL server:
```bash
sudo apt-get install postgresql-9.4
```
You now need privileges to create databases and users. These can be obtained as follows:
```bash
bash:> sudo su postgres
bash:> psql
postgres=#
```
Then create a user:
```psql
postgres=# create user mylogin;
```
Configure the plpgsql language for the database:
```psql
postgres=# CREATE PROCEDURAL LANGUAGE plpgsql;
```
And create a database associated with that user:
```psql
postgres=# create database mylogin;
postgres=# alter database mylogin owner to mylogin;
postgres=# grant create on database mylogin TO mylogin;
postgres=# alter role mylogin with password 'my_pass';
```
Introduction and philosophy
BlackDynamite is merely a tool to help achieve a few things:
- Launching a program repeatedly with varying parameters, to explore the chosen parametric space.
- Collecting and sorting results of small size, benefiting from the power of modern databases.
- Analyzing the results by making requests to the associated databases.
Launching is kept simple: any executable can be launched. The set of run directories is generated and managed by BlackDynamite to prevent errors. Requests of any kind can then be made to the underlying database through BlackDynamite's friendly commands.
Collecting the results is possible thanks to the BlackDynamite C/C++ and Python APIs, which let you send results directly to the database so that they are sorted automatically. However, heavy data such as Paraview files should not be pushed to the database, for performance reasons.
Analysis of the results is made easy because BlackDynamite can retrieve data as NumPy arrays, to be used, analyzed or plotted with Python libraries such as Matplotlib and SciPy.
Setting up a parametric study
Choose the parameters of the study
The first thing to do is to set up the table in the database associated with the study we want to perform. To do so, first list all the parameters that define a specific case/computation. These parameters can be of simple types such as strings, integers and floats. At present, no vector quantity can be used as an input parameter. Once this list is complete, create a script, usually named 'createDB.py', that will set up the table. Let us examine such an example script.
Setting up blackdynamite python modules
First we need to set the Python header and import the BlackDynamite module:
```python
#!/usr/bin/env python
import BlackDynamite as BD
```
Then create a generic BlackDynamite parser and parse the command-line arguments (including the connection parameters and credentials):
```python
parser = BD.BDParser()
params = parser.parseBDParameters()
```
This mechanism makes it easy to inherit from the BlackDynamite parser, including completion (if activated: see the installation instructions). Then you can connect to the BlackDynamite database:
```python
base = BD.base.Base(**params)
```
Setting up of the parametric space: the jobs pattern
Then you have to define the parametric space (at present, the parametric space cannot be changed once the study has started: be careful with your choices). Any particular job is defined as a point in the parametric space. For instance, to create a job description with int, float and str parameters, you can use the following Python sequence:
```python
myjob_desc = BD.job.Job(base)
myjob_desc.types["param1"] = int
myjob_desc.types["param2"] = float
myjob_desc.types["param3"] = str
```
Important remark: do not use PostgreSQL keywords as parameter names.
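A quick check before creating the database can catch such clashes early. The sketch below is only an illustration: the helper `find_reserved_names` is hypothetical (it is not part of BlackDynamite), and `RESERVED_SUBSET` is a small hand-picked subset of PostgreSQL reserved words, not the complete list from the PostgreSQL documentation.

```python
# Illustrative subset of PostgreSQL reserved words (assumption: far from
# complete; see the "SQL Key Words" appendix of the PostgreSQL manual).
RESERVED_SUBSET = {
    "user", "select", "table", "order", "group",
    "end", "limit", "default", "check",
}


def find_reserved_names(param_names):
    """Return the parameter names that clash with the subset above."""
    # SQL keywords are case-insensitive, so compare in lower case.
    return sorted(n for n in param_names if n.lower() in RESERVED_SUBSET)


clashes = find_reserved_names(["param1", "order", "User", "sigma_c"])
print(clashes)
```

Here `order` and `User` would be reported, while `param1` and `sigma_c` pass the check.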
Setting up of the run space
Alongside the jobs, a run represents a particular realisation (computation) of a job. More precisely, the run holds information such as the machine it was run on, the executable version, or the number of processors employed. For instance, the run pattern can be created with:
```python
myruns_desc = BD.run.Run(base)
myruns_desc.types["compiler"] = str
```
There are default entries to the description of runs. These are:
- machine\_name: the name of the machine where the run must be executed
- job\_id (integer): the ID of the running job
- has\_started (bool): flag to know whether the job has already started
- has\_finished (bool): flag to know whether the job has already finished
- run\_name (string): the name of the run
- wait\_id (int): the ID of a run to wait for before starting
- start\_time (TIMESTAMP): The start time for the run
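To make the shape of these default entries concrete, here is a minimal Python sketch of a run record. It is not how BlackDynamite stores runs: the class `RunRecord`, the helper `can_start`, the default values, and the reading of `wait_id` as "start only once that run has finished" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical in-memory mirror of the default run entries."""
    machine_name: str
    job_id: int
    run_name: str
    has_started: bool = False              # assumed default
    has_finished: bool = False             # assumed default
    wait_id: Optional[int] = None          # assumed: None means "no dependency"
    start_time: Optional[datetime] = None  # assumed: unset until the run starts


def can_start(record, finished_run_ids):
    """Assumed semantics: a run may start once the awaited run has finished."""
    if record.has_started:
        return False
    return record.wait_id is None or record.wait_id in finished_run_ids


first = RunRecord(machine_name="lsmspc41", job_id=1, run_name="toto")
second = RunRecord(machine_name="lsmspc41", job_id=2, run_name="toto", wait_id=7)
print(can_start(first, set()), can_start(second, set()), can_start(second, {7}))
```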
Commit the changes to the database
Then request the creation of the database:
```python
base.createBase(myjob_desc, myruns_desc, **params)
```
Now launch the script. As mentioned, all BlackDynamite scripts inherit from the parsing system, so whenever you need to launch one of them you can ask for the list of valid keywords:
```bash
./createDB.py --help
usage: createDB.py [-h] [--job_constraints JOB_CONSTRAINTS] [--study STUDY]
                   [--port PORT] [--host HOST] [--user USER] [--truerun]
                   [--run_constraints RUN_CONSTRAINTS] [--yes] [--password]
                   [--list_parameters] [--BDconf BDCONF]
                   [--binary_operator BINARY_OPERATOR]

BlackDynamite option parser

optional arguments:
  -h, --help            show this help message and exit

General:
  --yes                 Answer all questions to yes. (default: False)
  --binary_operator BINARY_OPERATOR
                        Set the default binary operator to make requests to
                        database (default: and)

BDParser:
  --job_constraints JOB_CONSTRAINTS
                        This allows to constraint run selections by job
                        properties (default: None)
  --study STUDY         Specify the study from the BlackDynamite database.
                        This refers to the schemas in PostgreSQL language
                        (default: None)
  --port PORT           Specify data base server port (default: None)
  --host HOST           Specify data base server address (default: None)
  --user USER           Specify user name to connect to data base server
                        (default: None)
  --truerun             Set this flag if you want to truly perform the action
                        on base. If not set all action are mainly dryrun
                        (default: False)
  --run_constraints RUN_CONSTRAINTS
                        This allows to constraint run selections by run
                        properties (default: None)
  --password            Flag to request prompt for typing password (default:
                        False)
  --list_parameters     Request to list the possible job/run parameters
                        (default: False)
  --BDconf BDCONF       Path to a BlackDynamite file (*.bd) configuring
                        current optons (default: None)
```
An important point is that most actions are only applied when the 'truerun' flag is set; otherwise the script performs a dry run. Also, you always have to specify the host and the study you are working on (all scripts can apply to several studies). To launch the script and create the database, run:
```bash
./createDB.py --host lsmssrv1.epfl.ch --study MysuperCoolStudy --truerun
```
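The dry-run-by-default behaviour can be pictured with plain `argparse`. This is only a sketch of the pattern, not BlackDynamite's parser: `build_parser` and `apply_action` are hypothetical names, and making `--host` and `--study` mandatory here is a simplification (the real help output above lists them with a default of None).

```python
import argparse


def build_parser():
    """Tiny stand-in parser exposing only three of the options shown above."""
    parser = argparse.ArgumentParser(description="dry run unless --truerun")
    parser.add_argument("--host", required=True)
    parser.add_argument("--study", required=True)
    parser.add_argument("--truerun", action="store_true")
    return parser


def apply_action(args, action):
    """Call `action` only when --truerun was given; otherwise just report."""
    if args.truerun:
        return action()
    print(f"dry run on {args.host}/{args.study}: nothing was changed")
    return None


common = ["--host", "lsmssrv1.epfl.ch", "--study", "MysuperCoolStudy"]
dry = apply_action(build_parser().parse_args(common), lambda: "created")
real = apply_action(build_parser().parse_args(common + ["--truerun"]), lambda: "created")
print(dry, real)
```

Keeping the destructive path behind an explicit flag means a forgotten option results in a harmless report rather than a modified database.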
Creating the jobs
The goal of the parametric study is to explore a subpart of the parametric space. We need to create jobs that are the points to explore. The script that does this is usually named 'createJobs.py'.
We need to write a Python script to generate this set of jobs. We start by setting up the modules and the parser as in the 'createDB.py' script. Then we create a job object:
```python
job = BD.job.Job(base)
```
It is up to us to decide which values to explore. For convenience, it is possible to insert ranges of values:
```python
job["param1"] = 10
job["param2"] = [3.14, 1., 2.]
job["param3"] = 'toto'
```
This will create 3 jobs, since we provided a range of three values for the second parameter. The actual creation is made by calling:
```python
base.createParameterSpace(job)
```
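To see why the example yields three jobs, the expansion of ranges into individual points can be sketched with `itertools.product`. This is an illustration of the idea, not the code behind `createParameterSpace`; the helper `expand_parameter_space` is hypothetical, and combining several ranges as a Cartesian product is an assumption.

```python
from itertools import product


def expand_parameter_space(job):
    """Expand a dict of scalars and lists into one dict per point."""
    names = list(job)
    # Wrap scalars in a one-element list so every parameter is a range.
    ranges = [v if isinstance(v, list) else [v] for v in job.values()]
    return [dict(zip(names, combo)) for combo in product(*ranges)]


job = {"param1": 10, "param2": [3.14, 1.0, 2.0], "param3": "toto"}
points = expand_parameter_space(job)
for point in points:
    print(point)
```

Under the Cartesian-product assumption, giving ranges to two parameters (say 2 and 3 values) would produce 6 points, so the number of jobs grows quickly with the number of ranges.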
The script is launched with:
```bash
./createJobs.py --host lsmssrv1.epfl.ch --study test --truerun
```
Creating the runs and launching them
At this point the jobs are in the database. You now need to create runs that specify the conditions under which the jobs are realized: for example the machine on which the job will run, path-dependent information, executable information and so on. We have to write the last script, usually named 'createRuns.py', to specify how runs are created.
Again we start with the modules. This time, however, we can use another parser class that is better suited to the manipulation of runs:
```python
parser = BD.RunParser()
params = parser.parseBDParameters()
base = BD.Base(**params)
```
The default parameters for runs are then automatically included in the parsed parameters. Next, create a run object:
```python
myrun = BD.run.Run(base)
```
Some of the standard parameters might have been parsed directly by the RunParser, so we have to forward them to the Run object:
```python
myrun.setEntries(params)
```
A run now specifies what action to perform to realize the job. Usually, an end user has one or more scripts and wishes to attach them to the run. To attach files you can for instance do:
```python
myrun.addConfigFiles(['file1', 'file2', 'launch.sh'])
```
Then, one has to specify which of these files is the entry point:
```python
myrun.setExecFile("launch.sh")
```
Finally, we have to create Run objects and attach them to jobs. The very first task is to retrieve the jobs from the database.
To that end, the JobSelector object shall be your friend:
```python
jobSelector = BD.JobSelector(base)
job_list = jobSelector.selectJobs()
```
This returns a job list that you can loop through to attach the runs:
```python
for j in job_list:
    myrun['compiler'] = 'gcc'
    myrun.attachToJob(j)
```
Everything should then be committed to the database:
```python
if params["truerun"] is True:
    base.commit()
```
To create the runs, launch the script by typing (here 'int' stands for an integer value):
```bash
./createRuns.py --host lsmssrv1.epfl.ch --study test --machine_name lsmspc41 --run_name toto --nproc int --truerun
```
The runs are finally launched using the tool 'launchRuns.py' (options in parentheses are optional):
```bash
./launchRuns.py --host lsmssrv1.epfl.ch --study test --outpath /home/user/ --truerun (--nruns int)
```
Accessing and manipulating the database
The runs can be inspected in the database with the tool 'getRunInfo.py', and one can go to the run folder with 'enterRun.py':
```bash
./getRunInfo.py --host lsmssrv1.epfl.ch --study test
./enterRun.py --host lsmssrv1.epfl.ch --study test --run_id ID
```
The status of a run can be manually reset using the command 'cleanRuns.py'; by default the status is set to CREATED (the run can instead be deleted with '--delete'):
```bash
./cleanRuns.py --host lsmssrv1.epfl.ch --study test (--runid ID) --truerun (--delete)
```
The status and the other run parameters (e.g. the compiler in the example file) can also be modified with 'updateRuns.py'. This can be done from within the executed script to set the selected parameter automatically:
```bash
updateRuns.py --host lsmssrv1.epfl.ch --study test --updates 'state = toto' --truerun
```
The script 'canYouDigIt.py' is an example of how to collect data from the runs to draw graphs. For example, to plot the crack length as a function of time for different values of sigma\_c (the study parameter):
```bash
canYouDigIt.py --host lsmssrv1.epfl.ch --study test --quantity time, crack_length --using %0.x:%1.y --xlabel 'time' --ylabel %'crack_length' --legend 'sigma_c = %j.sigma_c'
```
Finally, the database can be saved in .zip format with 'saveBDStudy.py', to be exported and used offline.
Instrumenting a *C++* simulation code
Within your program you need a correctly initialized pusher in order to push data to the database. The file 'test\_blackdynamite.cc' is an example of such a pusher.
First, the *blackdynamite* include is required:
```cpp
#include "blackdynamite.hh"
```
Then you need to create a pusher object (a RunManager) and initialize it:
```cpp
BlackDynamite::RunManager bd;
bd.startRun();
```
By default, the constructor reads environment variables to get the database connection and schema information:
- RUN\_ID: the identifier of the run
- SCHEMA: the schema where the parametric study is to be stored
- HOST: the database hostname
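When a run fails to connect, a missing variable is a likely culprit, so it can help to check the environment before launching the executable. The following Python sketch is not part of BlackDynamite: the helper `read_run_environment` is hypothetical, and treating RUN\_ID as an integer is an assumption.

```python
import os

REQUIRED_VARIABLES = ("RUN_ID", "SCHEMA", "HOST")


def read_run_environment(environ=None):
    """Collect the three variables, failing early if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if name not in environ]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "run_id": int(environ["RUN_ID"]),  # assumption: RUN_ID is an integer
        "schema": environ["SCHEMA"],
        "host": environ["HOST"],
    }


settings = read_run_environment(
    {"RUN_ID": "12", "SCHEMA": "test", "HOST": "lsmssrv1.epfl.ch"}
)
print(settings)
```

Passing a dictionary instead of relying on `os.environ` keeps the check easy to test outside of a real run.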
Then, wherever values are produced, push them to the database:
```cpp
bd.push(val1, "quantity1", step);
bd.push(val2, "quantity2", step);
```
The step is a stage identifier. It can be the step index within an explicit loop, the iteration count within a convergence loop, or whatever you wish. It will later serve to compare quantity entries.
Finally, when the job has ended, the following call informs the database that the run is finished:
```cpp
bd.endRun();
```
Instrumenting a *Python* simulation code
Within your program you need a run object to push data to the database. This is done by selecting the run from its `run_id` (usually passed as a parameter).
```python
params = parser.parseBDParameters()
# params['run_id'] should exist
mybase = BD.Base(**params)
runSelector = BD.RunSelector(mybase)
myrun = runSelector(params)
```
In order to record time entries for the run, the `start()` and `finish()` methods of the run need to be called.
```python
myrun.start()
# ...
# Important stuff
# ...
myrun.finish()
```
Pushing data can be done with `pushVectorQuantity()` and `pushScalarQuantity()`.
```python
myrun.pushVectorQuantity(vector_quantity, step, "quantity_id", is_integer)
myrun.pushScalarQuantity(scalar_quantity, step, "quantity_id", is_integer)
```
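To try instrumented code without a database, a small in-memory stand-in can be handy. The sketch below only mirrors the argument order of the two calls above; the class `InMemoryRun`, its `series` helper, the default value of `is_integer` and the storage layout are all assumptions and say nothing about how BlackDynamite stores quantities.

```python
from collections import defaultdict


class InMemoryRun:
    """Hypothetical stand-in that records pushed quantities by name and step."""

    def __init__(self):
        self.quantities = defaultdict(dict)  # name -> {step: value}

    def pushScalarQuantity(self, value, step, name, is_integer=False):
        self.quantities[name][step] = int(value) if is_integer else float(value)

    def pushVectorQuantity(self, values, step, name, is_integer=False):
        cast = int if is_integer else float
        self.quantities[name][step] = [cast(v) for v in values]

    def series(self, name):
        """Return the steps and the matching values, ordered by step."""
        steps = sorted(self.quantities[name])
        return steps, [self.quantities[name][s] for s in steps]


run = InMemoryRun()
for step in range(3):
    run.pushScalarQuantity(0.5 * step, step, "crack_length")
run.pushVectorQuantity([1, 2, 3], 0, "node_ids", True)
steps, values = run.series("crack_length")
print(steps, values)
```

Indexing every entry by step is what later allows two quantities (for example time and crack length) to be paired point by point.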
Fetching the results
Under construction...
Useful PostgreSQL commands
How to list the available schemas?
```psql
> \dn
```
How to get into the schema or the study?
```psql
> set search_path to schema_name;
> set search_path to study_name;
```
How to list all the tables?
```psql
> \d
```
How to list entries from a table (like the jobs)?
```psql
> SELECT * from table_name ;
```