diff --git a/doc/manual.tex b/doc/manual.tex
index e6c5d82..e3f3003 100644
--- a/doc/manual.tex
+++ b/doc/manual.tex
@@ -1,469 +1,477 @@
\documentclass[openright,a4paper,9pt,fleqn]{manual}
\usepackage{manual}
\usepackage{manual-macros}
\setlength{\oddsidemargin}{-1cm} % Left margin on odd pages
\setlength{\evensidemargin}{-1cm} % Left margin on even pages
\setlength{\marginparwidth}{0cm} % Width of margin notes
\setlength{\textwidth}{18cm} % Width of the text area (17cm)
\setlength{\marginparsep}{0pt} % Margin separation
\setlength\parindent{0pt}
\author{}
\date{}
\newcommand{\version}{0.1}
\newcommand{\blackdynamite}{\textbf{BlackDynamite}\xspace}
\title{\textbf{\Huge \blackdynamite}\\
\vspace{0.5cm}
\textbf{\huge User's Guide}\\
\vspace{1cm}
{\small \today{} --- Version \version}
}
\begin{document}
\setcounter{page}{1}
\renewcommand{\thepage}{\roman{page}}
\pdfbookmark[0]{Titlepage}{maintitlepage}
\label{maintitlepage}
\maketitle
\tableofcontents
\ifodd\value{page} \insertblankpage \else \insertblankpage\insertblankpage \fi
\setcounter{page}{1}
\renewcommand\thepage{\arabic{page}}

\chapter{Installing and compiling}

\section{Prerequisites}

Under Debian/Ubuntu, you should install several packages in order to use BlackDynamite fully:
\begin{command}
sudo apt-get install python-argcomplete python-psycopg2
\end{command}

\section{Setting up the PostgreSQL database}

If you want to set up a PostgreSQL server to store BlackDynamite data, you have to follow this procedure. Install the PSQL server:
\begin{command}
sudo apt-get install postgresql-9.4
\end{command}
You now need privileges to create databases and users.
This can be done using the following:
\begin{command}
bash:> sudo su postgres
bash:> psql
postgres=#
\end{command}
Then you should create a user:
\begin{command}
postgres=# create user mylogin;
\end{command}
Configure the plpgsql language for the database:
\begin{command}
postgres=# CREATE PROCEDURAL LANGUAGE plpgsql;
\end{command}
And a database associated with that user:
\begin{command}
postgres=# create database mylogin;
postgres=# alter database mylogin owner to mylogin;
postgres=# grant create on database mylogin TO mylogin;
-postgres=# alter role anciaux with password to pass;
+postgres=# alter role mylogin with password 'pass';
\end{command}

\section{Installing completion}

To benefit from autocompletion for BlackDynamite, the following steps are needed. You first need to install the argcomplete module, either by typing (depending on your Ubuntu/Debian version):
\begin{command}
sudo apt-get install python-argcomplete
\end{command}
or:
\begin{command}
sudo apt-get install python-pip
sudo pip install argcomplete
\end{command}
Then % activate-global-python-argcomplete
you must insert the following in your .bashrc:
\begin{command}
eval "$(register-python-argcomplete getRunInfo.py)"
eval "$(register-python-argcomplete launchRuns.py)"
eval "$(register-python-argcomplete canYouDigIt.py)"
eval "$(register-python-argcomplete cleanRuns.py)"
eval "$(register-python-argcomplete updateRuns.py)"
+export PATH=$PATH:~/.../blackdynamite/bin/
+export PYTHONPATH=$PYTHONPATH:~/.../blackdynamite/python
\end{command}
+
+\section{Register hosts to BlackDynamite}
+
+In the .blackdynamite folder (in your home directory) you should register the servers hosting your databases, with the options and information of your choice. For each database you can add a .db file named after the server (or after an alias, in which case you specify the host inside: host = yourHost.domain.countryID). It is also recommended to specify the database password there, to avoid typing it when using autocompletion.
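As a minimal sketch, such a file (for instance ~/.blackdynamite/myserver.db, with "myserver" standing for the alias of your choice) could contain key/value lines like the following. Only the host key is documented above; the password key name is an assumption and should be adapted to what your BlackDynamite version actually expects:
\begin{command}
host = yourHost.domain.countryID
password = mySecretPassword
\end{command}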
+
+
\chapter{Introduction and philosophy}

BlackDynamite is a tool that helps achieve a few things:
\begin{enumerate}
\item Launch a program repeatedly with varying parameters, to explore the chosen parametric space.
\item Collect and sort results of \textbf{\color{red}small size}, benefiting from the power of modern databases.
\item Analyse the results by making requests to the associated databases.
\end{enumerate}

\paragraph{Launching} is made simple by allowing any executable to be launched. The set of directories will be generated and managed by BlackDynamite to prevent errors. Requests of any kind will then be made to the underlying database through friendly BlackDynamite commands.

\paragraph{Collecting} the results is possible thanks to the BlackDynamite C/C++ and Python APIs, which let you send results directly to the database, where they are automatically sorted. This is extremely useful. However, heavy data such as Paraview files or any other large output should not be pushed to the database, for obvious performance reasons.

\paragraph{Analysis} of the results is made easy by BlackDynamite, which can retrieve data in the form of NumPy arrays to be used, analysed or plotted with the powerful and vast Python libraries such as Matplotlib and SciPy.

\chapter{Setting up a parametric study}

\section{Setting up the BlackDynamite Python modules}

The first thing to do is to set up the tables in the database associated with the study we want to perform. To do this you need, first of all, to list all the parameters that define a specific case/computation. These parameters can be of simple types like strings, integers, floats, etc. At present, no vectorial quantity can be used as an input parameter. Once this list is done you need to create a script, usually named \code{createDB.py}, that will do this task. Let us examine such an example script.
\\
First we need to set the Python headers and import the \blackdynamite modules:
\begin{command}
#!/usr/bin/env python
import BlackDynamite as BD
\end{command}
Then you have to create a generic BlackDynamite parser and parse the command line (including the connection parameters and credentials):
\begin{command}
parser = BD.bdparser.BDParser()
params = parser.parseBDParameters()
\end{command}
This mechanism allows you to easily inherit from the parser mechanism of BlackDynamite, including completion (if activated: see the installation instructions). Then you can connect to the BlackDynamite database:
\begin{command}
-base = base.Base(**params)
+base = BD.base.Base(**params)
\end{command}

\section{Setting up of the parametric space: the jobs pattern}

Then you have to define the parametric space (at present, the parametric space cannot be changed once the study has started: be careful with your choices).
-Any particular job defines a point in the parametric space.
+Any particular job is defined as a point in the parametric space.
For instance, to create a job description and declare parameters of int, float or string type, you can use the following Python sequence:
\begin{command}
myjob_desc = BD.job.Job(base)
myjob_desc.types["param1"] = int
myjob_desc.types["param2"] = float
myjob_desc.types["param3"] = str
\end{command}

\section{Setting up of the run space}

Aside from the jobs, a run represents a particular realisation (computation) of a job. To be clearer, the run will contain information on the machine it was run on, the executable version, or the number of processors employed. For instance, creating the run pattern can be done with:
\begin{command}
myruns_desc = BD.run.Run(base)
myruns_desc.types["compiler"] = str
\end{command}
There are default entries in the description of runs.
These are:
\begin{itemize}
\item machine\_name (string): the name of the machine where the run must be executed
\item job\_id (integer): the ID of the job being run
\item has\_started (bool): flag to know whether the run has already started
\item has\_finished (bool): flag to know whether the run has already finished
\item run\_name (string): the name of the run
\item wait\_id (int): the ID of a run to wait for before starting
\item start\_time (TIMESTAMP): the start time of the run
\end{itemize}

\section{Commit the changes to the database}

Then you have to request the creation of the database:
\begin{command}
base.createBase(myjob_desc,myruns_desc,**params)
\end{command}
You then have to launch the script. As mentioned, all BlackDynamite scripts inherit from the parsing system, so whenever you need to launch one of these scripts you can always ask for the valid keywords:
\begin{command}
./createDB.py --help
usage: createDB.py [-h] [--job_constraints JOB_CONSTRAINTS] [--study STUDY]
                   [--port PORT] [--host HOST] [--user USER] [--truerun]
                   [--run_constraints RUN_CONSTRAINTS] [--yes] [--password]
                   [--list_parameters] [--BDconf BDCONF]
                   [--binary_operator BINARY_OPERATOR]

BlackDynamite option parser

optional arguments:
  -h, --help            show this help message and exit

General:
  --yes                 Answer all questions to yes. (default: False)
  --binary_operator BINARY_OPERATOR
                        Set the default binary operator to make requests to
                        database (default: and)

BDParser:
  --job_constraints JOB_CONSTRAINTS
                        This allows to constraint run selections by job
                        properties (default: None)
  --study STUDY         Specify the study from the BlackDynamite database.
                        This refers to the schemas in PostgreSQL language
                        (default: None)
  --port PORT           Specify data base server port (default: None)
  --host HOST           Specify data base server address (default: None)
  --user USER           Specify user name to connect to data base server
                        (default: None)
  --truerun             Set this flag if you want to truly perform the action
                        on base.
                        If not set all action are mainly dryrun
                        (default: False)
  --run_constraints RUN_CONSTRAINTS
                        This allows to constraint run selections by run
                        properties (default: None)
  --password            Flag to request prompt for typing password
                        (default: False)
  --list_parameters     Request to list the possible job/run parameters
                        (default: False)
  --BDconf BDCONF       Path to a BlackDynamite file (*.bd) configuring
                        current optons (default: None)
\end{command}
An important point is that most of the actions are only applied when the 'truerun' flag is set. Also, you always have to mention the host and the study you are working on (all scripts can apply to several studies). To launch the script and create the database you should type:
\begin{command}
./createDB.py --host lsmssrv1.epfl.ch --study MysuperCoolStudy --truerun
\end{command}

\section{Creating the jobs}

The goal of the parametric study is to explore a subpart of the parametric space. We need to create the jobs that are the points to explore. The corresponding script is usually named 'createJobs.py'. We start by setting up the modules and the parser as for the 'createDB.py' script. Then we need to create a job object:
\begin{command}
job = BD.job.Job(base)
\end{command}
It is up to us to decide the values to explore. For convenience, it is possible to insert ranges of values:
\begin{command}
job["param1"] = 10
job["param2"] = [3.14,1.,2.]
job["param3"] = 'toto'
\end{command}
This will create 3 jobs, since we provided a range of values for the second parameter. The actual creation is made by calling:
\begin{command}
base.createParameterSpace(job)
\end{command}
Launching the script is done with:
\begin{command}
./createJobs.py --host lsmssrv1.epfl.ch --study test --truerun
\end{command}

\section{Creating the runs and launching them}

At this point the jobs are in the database. You need to create runs that will specify the conditions under which the jobs are realised.
For example, the machine on which the job will run, path-dependent information, executable information, and so on. We have to write a last script, usually named 'createRuns.py', to specify run creation. Again we start with the modules. However, this time we can use another parser class, better suited to the manipulation of runs:
\begin{command}
parser = BD.RunParser()
params = parser.parseBDParameters()
base = BD.Base(**params)
\end{command}
The default parameters for runs will then be automatically included in the parameters.
\begin{command}
myrun = BD.Run(base)
\end{command}
Some of the standard parameters might have been parsed directly by the RunParser, so we have to forward them to the Run object:
\begin{command}
myrun.setEntries(params)
\end{command}
A run now specifies what action to perform to realise the job. Usually, an end-user has one or more scripts and wishes to attach them to the run. To attach files you can for instance do:
\begin{command}
myrun.addConfigFiles(['file1','file2','launch.sh'])
\end{command}
Then, one has to specify which of these files is the entry point:
\begin{command}
myrun.setExecFile("launch.sh")
\end{command}
Finally, we have to create Run objects and attach them to jobs. The very first task is to claim the jobs from the database.
To that end, the JobSelector object shall be your friend:
\begin{command}
jobSelector = BD.JobSelector(base)
job_list = jobSelector.selectJobs()
\end{command}
This will return a job list that you can loop through to attach the runs:
\begin{command}
for j in job_list:
    myrun['compiler'] = 'gcc'
    myrun.attachToJob(j)
\end{command}
Everything should then be committed to the database:
\begin{command}
if ("truerun" in params):
    base.commit()
\end{command}
To create the runs, one should launch the script by typing:
\begin{command}
./createRuns.py --host lsmssrv1.epfl.ch --study test --machine_name lsmspc41 --run_name toto --nproc 1 --truerun
\end{command}
You can inspect the runs currently in the database with the tool 'getRunInfo.py'. You can launch the runs with the tool 'launchRuns.py'.

\chapter{Instrumenting a simulation code}

Within your program you need a correctly initialized pusher in order to push data to the database.\\
First, the \blackdynamite includes are required:
\begin{cpp}
#include "pusher.hh"
\end{cpp}
Then you need to create a Pusher object and initialize it:
\begin{cpp}
BlackDynamite::Pusher bd;
bd.init();
\end{cpp}
By default, the initialization reads environment variables to get the database connection and schema information:
\begin{itemize}
\item RUN\_ID: the identifier of the run
\item SCHEMA: the schema where the parametric study is stored
\item HOST: the database hostname
\end{itemize}
Then, in the places where values are created, you push the values to the database:
\begin{cpp}
bd.push(val1,"quantity1",step);
bd.push(val2,"quantity2",step);
\end{cpp}
The step is a stage identifier. It can be the step index within an explicit loop, within a convergence descent, or whatever you wish. It will serve later to compare quantity entries.
Finally, when the job has ended, the following call informs the database that the run is finished:
\begin{cpp}
bd.endRun();
\end{cpp}

\chapter{Instrumenting a Python script}

\chapter{Useful PostgreSQL commands}

How to list the available schemas?
\begin{command}
> \dn
\end{command}
How to switch to a given schema?
\begin{command}
> set search_path to schema_name;
\end{command}
How to list all the tables?
\begin{command}
> \d
\end{command}
How to list the entries of a table?
\begin{command}
> SELECT * FROM table_name;
\end{command}

\end{document}