# BlackDynamite
# Installation

## Dependencies

```bash
sudo apt-get install python-psycopg2
sudo apt-get install python-numpy
sudo apt-get install python-argcomplete
```

## Installation of the client side

The easiest way is through pip, which first needs to be installed:

```bash
sudo apt-get install python-pip
```

Then, for a system-wide installation (recommended):

```bash
sudo pip install https://gitlab.com/ganciaux/blackdynamite.git
```

Or, for a user-scope installation:

```bash
pip install --user https://gitlab.com/ganciaux/blackdynamite.git
```

## Getting the sources

You can clone the Git repository:

```bash
git clone https://gitlab.com/ganciaux/blackdynamite.git
```

## Installing completion

To benefit from autocompletion for **BlackDynamite**, the following steps are needed. First, install the argcomplete module, either by typing (depending on your Ubuntu/Debian version):

```bash
sudo apt-get install python-argcomplete
```

or:

```bash
sudo apt-get install python-pip
sudo pip install argcomplete
```

Then insert the following in your .bashrc:

```bash
eval "$(register-python-argcomplete getRunInfo.py)"
eval "$(register-python-argcomplete launchRuns.py)"
eval "$(register-python-argcomplete canYouDigIt.py)"
eval "$(register-python-argcomplete cleanRuns.py)"
eval "$(register-python-argcomplete updateRuns.py)"
eval "$(register-python-argcomplete enterRun.py)"
eval "$(register-python-argcomplete saveBDStudy.py)"
```

## Registering hosts with BlackDynamite

In the .blackdynamite folder (in your home directory) you should register the servers where your databases live, with the options and information of your choice. For each database you can add a file named after the server (or an alias), with the .bd extension, and specify the host inside:

```bash
host = yourHost.domain.countryID
```

It is also recommended to specify the password of the database, to avoid typing it when using auto-completion. Here is an example of a valid BlackDynamite config file:

```bash
cat ~/.blackdynamite/lsmssrv1.epfl.ch.bd
```

```bash
host = lsmssrv1.epfl.ch
password = XXXXXXXXX
```

# Introduction and philosophy

**BlackDynamite** is merely a tool to help achieve a few things:

1) Launching a program repeatedly with varying parameters, to explore the chosen parametric space.
2) Collecting and sorting results of **small size**, benefiting from the power of modern databases.
3) Analyzing the results by making requests to the associated databases.

**Launching** is made simple: any executable can be launched. The set of directories is generated and managed by BlackDynamite to prevent errors. Requests of any kind can then be made to the underlying database through friendly BlackDynamite commands.

**Collecting** the results is possible thanks to the BlackDynamite C/C++ and Python APIs, which let you send results directly to the database, where they are automatically sorted. This is extremely useful. However, heavy data such as Paraview files, or any other large output, should not be pushed to the database, for obvious performance reasons.

**Analysis** of the results is made easy by BlackDynamite, which can retrieve data in the form of NumPy arrays to be used, analyzed, or plotted with the powerful and vast Python libraries such as Matplotlib and SciPy.
The construction of a **BlackDynamite** parametric study follows these steps:

- Describing the parametric space
- Creating jobs (specific points in the parametric space)
- Creating runs (instances of the jobs)
- Launching runs
- Instrumenting the simulation to send results
- Analyzing the results

# Setting up a parametric study

## Choose the parameters of the study

The first thing to do is to set up the table in the database associated with the study we want to perform. For this, you need first of all to list all the parameters that define a specific case/computation. These parameters can be of simple types such as string, integer, or float. At the current time, no vectorial quantity can be used as an input parameter. Once this list is done, you need to create a script, usually named 'createDB.py', that will do this task. Let us examine such an example script.

### Setting up the BlackDynamite python modules

First we need to set the Python header and import the **BlackDynamite** modules:

```python
#!/usr/bin/env python
import BlackDynamite as BD
```

Then you have to create a generic BlackDynamite parser and parse the system parameters (including the connection parameters and credentials):

```python
parser = BD.BDParser()
params = parser.parseBDParameters()
```

This mechanism allows any script to easily inherit from the parser mechanism of BlackDynamite, including the completion (if activated: see the installation instructions). Then you can connect to the BlackDynamite database:

```python
base = BD.base.Base(**params)
```

## Setting up the parametric space: the job pattern

Then you have to define the parametric space (at the present time, the parametric space cannot be changed once the study has started: be careful with your choices). Any particular job is defined as a point in the parametric space. For instance, to create a job description and add parameters of int, float, or string type, you can use the following Python sequence:

```python
myjob_desc = BD.job.Job(base)
myjob_desc.types["param1"] = int
myjob_desc.types["param2"] = float
myjob_desc.types["param3"] = str
```

**Important remark: do not name your parameters after PostgreSQL keywords.**

## Setting up the run space

Aside from the jobs, a run represents a particular realization (computation) of a job. To be more concrete, a run contains information about the machine it was run on, the executable version, or the number of processors employed. For instance, creating the run pattern can be done with:

```python
myruns_desc = BD.run.Run(base)
myruns_desc.types["compiler"] = str
```

There are default entries in the description of runs. These are:

- machine_name (string): the name of the machine where the run must be executed
- job_id (int): the ID of the running job
- has_started (bool): flag telling whether the run has already started
- has_finished (bool): flag telling whether the run has already finished
- run_name (string): the name of the run
- wait_id (int): the ID of a run to wait for before starting
- start_time (TIMESTAMP): the start time of the run

## Commit the changes to the database

Then you have to request the creation of the database:

```python
base.createBase(myjob_desc, myruns_desc, **params)
```

Finally, you have to launch the script. As mentioned, all BlackDynamite scripts inherit from the parsing system.
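For reference, here is how the pieces above fit together in a single 'createDB.py'. This is a minimal sketch assembled from the snippets of this section; the parameter names are the placeholders used throughout:

```python
#!/usr/bin/env python
import BlackDynamite as BD

# Parse the standard BlackDynamite options (--study, --host, --truerun, ...)
parser = BD.BDParser()
params = parser.parseBDParameters()

# Connect to the database
base = BD.base.Base(**params)

# Describe the parametric space: one point = one job
myjob_desc = BD.job.Job(base)
myjob_desc.types["param1"] = int
myjob_desc.types["param2"] = float
myjob_desc.types["param3"] = str

# Describe the run space: metadata attached to each realization of a job
myruns_desc = BD.run.Run(base)
myruns_desc.types["compiler"] = str

# Request the creation of the study (only effective with --truerun)
base.createBase(myjob_desc, myruns_desc, **params)
```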
So whenever you need to launch one of these scripts, you can always ask for the valid keywords:

```bash
./createDB.py --help
usage: createDB.py [-h] [--job_constraints JOB_CONSTRAINTS] [--study STUDY]
                   [--port PORT] [--host HOST] [--user USER] [--truerun]
                   [--run_constraints RUN_CONSTRAINTS] [--yes] [--password]
                   [--list_parameters] [--BDconf BDCONF]
                   [--binary_operator BINARY_OPERATOR]

BlackDynamite option parser

optional arguments:
  -h, --help            show this help message and exit

General:
  --yes                 Answer all questions to yes. (default: False)
  --binary_operator BINARY_OPERATOR
                        Set the default binary operator to make requests to
                        database (default: and)

BDParser:
  --job_constraints JOB_CONSTRAINTS
                        This allows to constraint run selections by job
                        properties (default: None)
  --study STUDY         Specify the study from the BlackDynamite database.
                        This refers to the schemas in PostgreSQL language
                        (default: None)
  --port PORT           Specify data base server port (default: None)
  --host HOST           Specify data base server address (default: None)
  --user USER           Specify user name to connect to data base server
                        (default: None)
  --truerun             Set this flag if you want to truly perform the action
                        on base. If not set all action are mainly dryrun
                        (default: False)
  --run_constraints RUN_CONSTRAINTS
                        This allows to constraint run selections by run
                        properties (default: None)
  --password            Flag to request prompt for typing password (default:
                        False)
  --list_parameters     Request to list the possible job/run parameters
                        (default: False)
  --BDconf BDCONF       Path to a BlackDynamite file (*.bd) configuring
                        current optons (default: None)
```

An important point is that most actions are only applied when the 'truerun' flag is set. Also, you always have to mention the host and the study you are working on (all scripts can apply to several studies). To launch the script and create the database you should type:

```bash
./createDB.py --host lsmssrv1.epfl.ch --study MysuperCoolStudy --truerun
```

## Creating the jobs

The goal of the parametric study is to explore a subpart of the parametric space, so we need to create the jobs, which are the points to explore. This is done in a second script, usually named 'createJobs.py'. We start by setting up the modules and the parser, as in the 'createDB.py' script. Then we need to create a job object:

```python
job = BD.job.Job(base)
```

It is up to us to decide which values to explore. For convenience, it is possible to insert ranges of values:

```python
job["param1"] = 10
job["param2"] = [3.14, 1., 2.]
job["param3"] = 'toto'
```

This will create 3 jobs, since we provided a range of values for the second parameter. The actual creation is made by calling:

```python
base.createParameterSpace(job)
```

Launching the script is done with:

```bash
./createJobs.py --host lsmssrv1.epfl.ch --study test --truerun
```
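Assembled into a complete script, a minimal 'createJobs.py' could look like the following sketch, reusing the parameters declared in 'createDB.py':

```python
#!/usr/bin/env python
import BlackDynamite as BD

parser = BD.BDParser()
params = parser.parseBDParameters()
base = BD.base.Base(**params)

# A list-valued entry spans several points of the parametric space
job = BD.job.Job(base)
job["param1"] = 10
job["param2"] = [3.14, 1., 2.]   # three values -> three jobs
job["param3"] = 'toto'

# Create one job per combination of the provided values (with --truerun)
base.createParameterSpace(job)
```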
## Creating the runs and launching them

At this point the jobs are in the database. You now need to create runs, which specify the conditions of the realization of the jobs: for example, the machine the job will run on, path-dependent information, executable information, and so on. We have to write a last script, usually named 'createRuns.py', to specify the run creation. Again we start with the modules. However, this time we can use another parser class, better adapted to the manipulation of runs:

```python
parser = BD.RunParser()
params = parser.parseBDParameters()
base = BD.Base(**params)
```

The default run parameters will then be automatically included in the parsed parameters. We can now create the run object:

```python
myrun = BD.run.Run(base)
```

Some of the standard parameters might have been parsed directly by the RunParser, so we have to forward them to the Run object:

```python
myrun.setEntries(params)
```

A run now specifies what actions to perform to realize the job. Usually, an end user has one or more scripts and wishes to attach them to the run. To attach files you can, for instance, do:

```python
myrun.addConfigFiles(['file1', 'file2', 'launch.sh'])
```

Then one has to specify which of these files is the entry point:

```python
myrun.setExecFile("launch.sh")
```

Finally, we have to create the Run objects and attach them to jobs. The very first task is to claim the jobs from the database. To that end, the JobSelector object shall be your friend:

```python
jobSelector = BD.JobSelector(base)
job_list = jobSelector.selectJobs()
```

This returns a job list that you can loop through to attach the runs:

```python
for j in job_list:
    myrun['compiler'] = 'gcc'
    myrun.attachToJob(j)
```

Everything should then be committed to the database:

```python
if params["truerun"] is True:
    base.commit()
```

To create the runs, one should eventually launch the script by typing:

```bash
./createRuns.py --host lsmssrv1.epfl.ch --study test --machine_name lsmspc41 --run_name toto --nproc int --truerun
```
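Collected into a single 'createRuns.py', the steps above might read as follows. This is again a sketch; the file names and the 'compiler' entry are the examples used in this section:

```python
#!/usr/bin/env python
import BlackDynamite as BD

# RunParser also parses the run-specific options (machine_name, run_name, ...)
parser = BD.RunParser()
params = parser.parseBDParameters()
base = BD.Base(**params)

# Create the run and forward the parameters already parsed by RunParser
myrun = BD.run.Run(base)
myrun.setEntries(params)

# Attach the files needed to realize a job; 'launch.sh' is the entry point
myrun.addConfigFiles(['file1', 'file2', 'launch.sh'])
myrun.setExecFile("launch.sh")

# Claim the jobs from the database and attach one run to each of them
jobSelector = BD.JobSelector(base)
for j in jobSelector.selectJobs():
    myrun['compiler'] = 'gcc'
    myrun.attachToJob(j)

# Commit only when --truerun was passed
if params["truerun"] is True:
    base.commit()
```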
The runs are eventually launched with the tool 'launchRuns.py':

```bash
./launchRuns.py --host lsmssrv1.epfl.ch --study test --outpath /home/user/ --truerun (--nruns int)
```

## Accessing and manipulating the database

The runs can be inspected in the database with the tool 'getRunInfo.py', and one can go to the folder of a given run with 'enterRun.py':

```bash
./getRunInfo.py --host lsmssrv1.epfl.ch --study test
./enterRun.py --host lsmssrv1.epfl.ch --study test --run_id ID
```

The status of a run can be manually modified using the command 'cleanRuns.py'. The default status is CREATED (runs can also be deleted):

```bash
./cleanRuns.py --host lsmssrv1.epfl.ch --study test (--runid ID) --truerun (--delete)
```

The status and the other run parameters (e.g. the compiler in the example above) can also be modified with 'updateRuns.py'. This can be done from within the executed script, to set the selected parameter automatically:

```bash
updateRuns.py --host lsmssrv1.epfl.ch --study test --updates 'state = toto' --truerun
```

The tool 'canYouDigIt.py' is an example of how to collect data from the runs to draw graphs. For example, to plot the crack length as a function of time for different values of sigma_c (the study parameter):

```bash
canYouDigIt.py --host lsmssrv1.epfl.ch --study test --quantity time,crack_length --using %0.x:%1.y --xlabel 'time' --ylabel 'crack_length' --legend 'sigma_c = %j.sigma_c'
```

Eventually, the database can be saved in .zip format with 'saveBDStudy.py', to be exported and used offline.

# Instrumenting a *C++* simulation code

Within your program you need a correctly initialized pusher in order to push data to the database. The file 'test_blackdynamite.cc' is an example of such a pusher. First, the *blackdynamite* includes are required:

```cpp
#include "blackdynamite.hh"
```

Then you need to create a Pusher object and initialize it:

```cpp
BlackDynamite::RunManager bd;
bd.startRun();
```

By default, the constructor reads environment variables to get the database connection and schema information:

- RUN_ID: the identifier of the run
- SCHEMA: the schema where the parametric study is stored
- HOST: the database hostname

Then, at the places where values are created, you push them to the database:

```cpp
bd.push(val1, "quantity1", step);
bd.push(val2, "quantity2", step);
```

Step is a stage identifier. It can be the step index within an explicit loop, an iteration within a convergence descent, or whatever you wish. It will serve later to compare quantity entries. Finally, when the job has ended, the following call informs the database that the run is finished:

```cpp
bd.endRun();
```

# Instrumenting a *Python* simulation code

Within your program you need a run object to push data to the database. This is done by selecting the run from the 'run_id' (usually passed as a parameter):

```python
parser = BD.RunParser()
params = parser.parseBDParameters()
# params['run_id'] should exist
mybase = BD.Base(**params)
runSelector = BD.RunSelector(mybase)
myrun = runSelector(params)
```

In order to have time entries for the run, 'start()' and 'finish()' need to be called on the run:

```python
myrun.start()
# ...
# Important stuff
# ...
myrun.finish()
```

Pushing data can be done with 'pushVectorQuantity()' and 'pushScalarQuantity()':

```python
myrun.pushVectorQuantity(vector_quantity, step, "quantity_id", is_integer=False)
myrun.pushScalarQuantity(scalar_quantity, step, "quantity_id", is_integer=False)
```

# Fetching the results

Under construction...

## Installation of the server side: setting up the PostgreSQL database (for admins)

If you want to set up a PostgreSQL server to store BlackDynamite data, follow this procedure. Install the PostgreSQL server:

```bash
sudo apt-get install postgresql-9.4
```

You now need privileges to create databases and users. First, add a database named blackdynamite (only the first time):

```bash
psql --command "CREATE USER blackdynamite WITH PASSWORD '';"
createdb -O blackdynamite blackdynamite
```

## Adding a user

You should create a user:

```bash
psql --command "CREATE USER mylogin WITH PASSWORD 'XXXXX';"
```

and grant the user permission to create tables:

```bash
psql --command "grant create on database blackdynamite to mylogin"
```

This can also be done with the commodity tool:

```bash
createUser.py --user admin_user --host hostname
```

# Useful PostgreSQL commands

How to list the available schemas?

```psql
> \dn
```

How to get into a schema (i.e. a study)?

```psql
> set search_path to study_name;
```

How to list all the tables?

```psql
> \d
```

or

```psql
> SELECT * FROM pg_catalog.pg_tables;
```

How to list the entries of a table (e.g. the jobs)?

```psql
> SELECT * FROM table_name;
```

How to list all the databases?

```psql
> \l
```

or

```psql
> select datname from pg_catalog.pg_database;
```

How to know the current database?

```psql
> select current_database();
```