Server Guide

  1. Introduction
  2. Installation
    1. Requirements
    2. Downloading
    3. Starting the server
  3. Configuration
    1. Hostname
    2. Temp file location
    3. Port number
    4. Maximum Simultaneous Jobs
    5. Using privilege escalation
    6. Server Library
    7. Grid engine native specifications
    8. Grid total slots & Grid total slots command
    9. Days to persist status & Clear old temp files enabled
    10. Log file location
    11. Persistence URL
    12. HTTP server port
    13. Failover
      1. Failover Enabled
      2. Failover Check Interval
      3. Failover Retries
      4. Failover Alias Interface
      5. Failover Alias Sub Interface Num
    14. Directory-based executable access control
      1. Directory Access Control Mode
      2. Directory Access Control Users
      3. Directory Access Control Paths
      4. Examples
    15. Grid Plugin JAR Files
    16. Grid Plugin Class
    17. Grid Complex Resource Attributes
    18. Grid Maximum Submit Threads
    19. Grid Job Accounting URL
    20. Grid Job Accounting Username
    21. Grid Job Accounting Password
  4. Authentication
    1. Authentication Quickstart
  5. Adding module definitions

1. Introduction

Setting up your own Pipeline server is a great way to remotely take advantage of the power of a cluster, or just of a dedicated computer with many helpful programs installed on it. More importantly, you can let many people use all this power through the easy-to-use interface of the Pipeline client.

2. Installation

2.1 Requirements

The Pipeline server can run on any system supported by JRE 1.5 or higher, so the first thing to do is head over to Sun and download the latest JRE/JDK. If you run the server on Windows, you will not be able to use privilege escalation (you might not even need or want it). Also, the Failover feature is supported only on Unix/Linux systems. All other features are available on all platforms.

The amount of memory required depends on the load you expect on the server. As a reference point, the Pipeline server running on cranium.loni.ucla.edu is set to accept a maximum load of 620 jobs, and its memory footprint hovers between 50 and 300 MB depending on the load and the garbage collection scheme.

2.2 Downloading

Head over to the Pipeline download page and download the latest version of the program for Linux/Unix. The server and the client are in the same jar file, so you only need to use a different Main entry point when starting the server. Extract the contents of the download to the location where you want to install the server.

2.3 Starting the server

Now let’s start the server for the first time. Open a prompt, change to the directory where you copied Pipeline.jar and the lib directory, and type:

$ java -classpath Pipeline.jar server.Main

Assuming java is on your path, you should see the following output in your terminal window:

Loading server library..........................DONE [1100ms]
Creating 50 database connections................DONE [1400ms]
Starting server on port 8001....................DONE [1500ms]

That’s not enough for a fully functional server yet, but we’re a step closer. Stop the process by pressing Ctrl-C, and let’s begin the configuration process.

3. Configuration

Next we need to set up the preferences file. When you run the server for the first time, it should create a directory where all your preferences and logs are stored. Depending on your operating system, you can find this directory in one of the following locations:

  • Linux/Unix – $HOME/.pipeline/8001/
  • OS X – $HOME/Library/Preferences/Pipeline/8001/
  • Windows – %HOME%\Application Data\LONI\pipeline\8001\
  • Windows Vista/Seven – %HOME%\AppData\LONI\Pipeline\8001\

Open up your favorite text editor, and paste in the following sample preferences file:

<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cranium.loni.ucla.edu</Hostname>
<TempFileLocation>/ifs/tmp/</TempFileLocation>
</preferences>

Save the file as “preferences.xml” in that directory.
When you launch the server, it will use the host name “cranium.loni.ucla.edu” and store all temporary files in the /ifs/tmp directory.
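Because the server reads this file at startup, a small mistake such as an unbalanced tag can be easy to miss. As an optional aid, the following sketch parses a preferences file with the JDK's built-in DOM parser and prints every setting it finds, or fails with the parser's error message. The class PrefsCheck is hypothetical and not part of Pipeline, and it assumes Java 11 or later (for Files.readString and Path.of).

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical standalone checker, not part of Pipeline.
final class PrefsCheck {
    // Parses the XML and returns each child of <preferences> as name -> trimmed text.
    static Map<String, String> read(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
        if (!root.getTagName().equals("preferences")) {
            throw new IllegalArgumentException("root element must be <preferences>");
        }
        Map<String, String> prefs = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                prefs.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return prefs;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PrefsCheck $HOME/.pipeline/8001/preferences.xml
        read(Files.readString(Path.of(args[0])))
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```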

Now let’s look at all the options supported by Pipeline.

3.1 Hostname

The <Hostname> element specifies the host name of the computer that the server runs on. Ironically, this element requires the fully qualified domain name of the computer, not just its host name. For example, “mycomputername” would be a host name, whereas “mycomputer.labname.university.edu” would be its fully qualified domain name.

3.2 Temp file location

The <TempFileLocation> element specifies where the intermediate files of all executed programs are stored. This directory should be accessible from both the Pipeline server and the compute nodes: the Pipeline server creates a directory structure under it, and the compute nodes read from and write to it. For example, if you specify

<TempFileLocation>/ifs/tmp</TempFileLocation>

Pipeline will create a directory /ifs/tmp/username/timestamp and put all the working files there.

Here, username is the user running the server, and timestamp is the time at which each workflow is translated before execution. Each timestamp folder holds all the intermediate files produced by the executables of a submitted workflow. Depending on the number of users on your server and the kind of work they do, this directory can grow very quickly.
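To keep an eye on that growth, a sketch like the following totals the bytes of all files under one user's directory, for example /ifs/tmp/jdoe. TempUsage is a hypothetical helper, not a Pipeline tool, and it assumes Java 11 or later; the shell command du -sh gives a similar figure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical admin helper: total size of regular files under a directory,
// e.g. <TempFileLocation>/<username> for one user's workflow sessions.
final class TempUsage {
    static long bytesUnder(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                return 0L;   // file removed while we were walking
                            }
                        })
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java TempUsage /ifs/tmp/jdoe
        System.out.printf("%s: %.1f MB%n", args[0], bytesUnder(Path.of(args[0])) / 1e6);
    }
}
```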

3.3 Port number

If no port number is specified in the preferences, the server attempts to listen on port 8001. To change the port number, use the <ServerPort> element in your preferences.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cranium.loni.ucla.edu</Hostname>
<ServerPort>8020</ServerPort>
<TempFileLocation>/ifs/tmp/</TempFileLocation>
</preferences>

3.4 Maximum Simultaneous Jobs

As your server gets busier, users will at times submit more jobs at once than your server has the capacity to handle. To keep your system or cluster from grinding to a halt, you can set the maximum number of simultaneous jobs in the preferences with the <MaximumThreadPoolSize> element. By default, the Pipeline server sets this value to the number of cores/CPUs available on your computer; for example, a computer with two quad-core processors gets a maximum of 8 simultaneous jobs. If you want to change this (for example, because you have a grid available), you can set this preference to any value you want.

<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cranium.loni.ucla.edu</Hostname>
<ServerPort>8020</ServerPort>
<TempFileLocation>/ifs/tmp/</TempFileLocation>
<MaximumThreadPoolSize>620</MaximumThreadPoolSize>
</preferences>

Note that this limit does not reject jobs that users submit after it has been reached; it queues them until a slot becomes available for execution. For grid setups, you should probably set the limit a little higher than the number of compute nodes available to you, because submitting to the grid takes a non-negligible amount of time, and it’s best to keep your compute nodes crunching at all times.
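The queue-rather-than-reject behavior can be illustrated with the JDK's ThreadPoolExecutor: a fixed number of worker threads over an unbounded queue. This is a sketch of the semantics only. JobLimitDemo is hypothetical, Pipeline's own scheduler is not shown in this guide, and availableProcessors() simply mirrors the documented default of one job per core.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a job limit; not Pipeline's scheduler.
final class JobLimitDemo {
    // Mirrors the documented default for <MaximumThreadPoolSize>: one slot per core.
    static int defaultMaxJobs() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Fixed-size pool over an unbounded queue: extra jobs wait, none are rejected.
    static ThreadPoolExecutor newJobPool(int maxJobs) {
        return new ThreadPoolExecutor(maxJobs, maxJobs, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    // Submits `jobs` blocking tasks and reports how many had to wait in the queue.
    static int queuedAfterSubmitting(int maxJobs, int jobs) {
        ThreadPoolExecutor pool = newJobPool(maxJobs);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < jobs; i++) {
            pool.execute(() -> {
                try {
                    release.await();   // simulate a long-running job
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int queued = pool.getQueue().size();
        release.countDown();           // let every job finish
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return queued;
    }
}
```

With a limit of 2 and 5 submitted jobs, 2 run immediately and the other 3 wait in the queue until a worker is free.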

3.5 Using privilege escalation

When different users connect to your Pipeline server, you might want to enforce different access restrictions on each of them. If you’re running your Pipeline server on a Linux/Unix-based system (including OS X), you can enable privilege escalation, which makes the Pipeline server issue commands as the user who submitted the workflow. For example, if user ‘jdoe’ connects to a Pipeline server with privilege escalation enabled, every command issued on behalf of that user is prefixed with ‘sudo -u jdoe’. This way, all files that the user reads and writes on the Pipeline server are accessed as ‘jdoe’.
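As a sketch of the prefixing described above, a command can be wrapped as an argument list so that no shell re-parses it. SudoPrefix is a hypothetical helper; the guide does not show how Pipeline builds its commands. The username check is an added assumption: it rejects names that sudo could mistake for options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the 'sudo -u <user>' prefix; not Pipeline's code.
final class SudoPrefix {
    static List<String> asUser(String user, List<String> command) {
        // Conservative pattern (an assumption): no leading '-', so sudo can't read it as an option.
        if (user == null || !user.matches("[A-Za-z0-9._][A-Za-z0-9._-]*")) {
            throw new IllegalArgumentException("suspicious username: " + user);
        }
        List<String> full = new ArrayList<>(List.of("sudo", "-u", user));
        full.addAll(command);
        return full;
    }

    public static void main(String[] args) throws Exception {
        // Runs 'id -un' as jdoe; requires a sudoers rule for the server's own account.
        Process p = new ProcessBuilder(asUser("jdoe", List.of("id", "-un"))).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```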

Remember, there is no harm in leaving privilege escalation disabled on your Pipeline server. All files will simply be created and read as the Pipeline server user, giving all users uniform access to your system. It also makes it easy to lock down the access of all Pipeline users, because you only have to lock down one actual user on your system: the Pipeline user.

To enable this feature in the Pipeline, you need to do two things: 1) add the <UsePrivilegeEscalation> preference to your preferences file with a value of “true”, and 2) modify your system’s sudoers file to allow the user that runs the Pipeline server to sudo as any user who will be allowed to connect. How to modify the sudoers file is outside the scope of this guide, but if you want or need this feature you probably already know how to do it. Your preferences should now look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cranium.loni.ucla.edu</Hostname>
<ServerPort>8020</ServerPort>
<TempFileLocation>/ifs/tmp/</TempFileLocation>
<MaximumThreadPoolSize>620</MaximumThreadPoolSize>
<UsePrivilegeEscalation>true</UsePrivilegeEscalation>
</preferences>

3.6 Server Library

When Pipeline client users connect to a server, the client syncs up the library of module definitions available on that server. The location of that library on the server is specified by the <ServerLibraryLocation> element in the preferences. By default, the location is set to one of the following locations (based on OS), so you don’t need to specify this preference if you’re happy with it:

  • Linux/Unix – $HOME/documents/Pipeline/ServerLib/
  • OS X – $HOME/Documents/Pipeline/ServerLib/
  • Windows – %HOME%\Application Data\LONI\pipeline\ServerLib\
  • Windows Vista/Seven – %HOME%\Documents\Pipeline\ServerLib\

When the server starts up, it reads all the .pipe files in the ServerLibraryLocation directory (and all its subdirectories) and monitors the directory for changes and additions while it runs. Simply put all the module definitions you want to make available to users into this directory, and when clients connect they will obtain a copy of the library on their local systems. If you add, delete, or change any of the definitions in this directory, the server automatically sees the change (no restart required) and synchronizes clients again when they reconnect; clients that are connected during the change also receive the new version of ServerLib without reconnecting. Remember, however, that the change must be reflected on the root directory: if the root directory’s modification time does not change, the server will not notice the change and the Server Library will not be updated. For example, suppose you have a pipe file at ServerLib/LONI/Modules/example.pipe and you change only example.pipe. Although the “Modules” directory and “example.pipe” get a new modification time, the ServerLib directory (the root in this case) does not, so you have to update the modification time of ServerLib manually.
After updating the server library files, check the server’s output stream. You should see a line like this:

Loading server library..........................DONE [1100ms]

If this line appears, the server has picked up the change in the library; otherwise, the library has not been updated.
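On the command line, running touch on the ServerLib directory updates its modification time. The following hypothetical helper (ServerLibTouch, not part of Pipeline) does the same, but only when some .pipe file below the root is newer than the root itself. It assumes, as described above, that the server watches the root directory’s modification time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.stream.Stream;

// Hypothetical helper: bump the ServerLib root's modification time when any
// .pipe file below it is newer, so the server notices the change.
final class ServerLibTouch {
    static boolean rootIsStale(Path serverLib) throws IOException {
        FileTime rootTime = Files.getLastModifiedTime(serverLib);
        try (Stream<Path> files = Files.walk(serverLib)) {
            return files.filter(p -> p.toString().endsWith(".pipe"))
                        .anyMatch(p -> {
                            try {
                                return Files.getLastModifiedTime(p).compareTo(rootTime) > 0;
                            } catch (IOException e) {
                                return false;   // file vanished while walking
                            }
                        });
        }
    }

    // Same effect as 'touch ServerLib'.
    static void touchRoot(Path serverLib) throws IOException {
        Files.setLastModifiedTime(serverLib, FileTime.from(Instant.now()));
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args[0]);   // e.g. $HOME/documents/Pipeline/ServerLib
        if (rootIsStale(root)) {
            touchRoot(root);
            System.out.println("touched " + root);
        }
    }
}
```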

3.7 Grid engine native specifications

If you have a grid at your disposal, you’ll probably want to use it for your processing. The LONI Pipeline server can do this if a grid plugin is attached to it. Once you’ve set up your grid engine, you might need to specify a native specification string that accompanies each job submission (if none of this makes sense, skip this preference; you don’t need it on your server). To set the native specification string, make sure your server is set to use plugins and place the string inside <GridEngineNativeSpecification>. On the LONI Pipeline server we use the following native specification preference:

<GridEngineNativeSpecification>-shell y -S /bin/csh -q pipeline.q -l pipeline -N _pjob </GridEngineNativeSpecification>

Grid plugins are disabled by default; you must set Grid Plugin JAR Files and Grid Plugin Class if you want the Pipeline server to use your grid engine. The native specification for your installation will vary, but if you’re using Sun Grid Engine and want to reuse this string, change -q pipeline.q to the submission queue (if any) that you will be using.

Optionally, you can add _pmem and _pstack to the <GridEngineNativeSpecification> tag. _pmem lets users define the maximum memory per module, and _pstack lets them define the stack size. Users can set both in the latest Pipeline client; if a user does not specify them, the grid engine defaults apply.

Starting with version 4.4, if you use Grid Complex Resource Attributes, you can also add _pcomplex, which is replaced by the attributes defined in the <GridComplexResourceAttributes> tag.

3.8 Grid total slots and Grid total slots command

The <GridTotalSlots> element specifies the total number of grid slots in the cluster. This lets connected users see how busy the server is, in terms of the number of jobs running on the grid versus the total number of slots available.

Alternatively, you can use the <GridTotalSlotsCmd> tag, which contains a command-line query that returns the total number of available slots for the queue. Refer to your cluster management documentation for the appropriate query. With this tag, the server periodically queries the grid engine for the latest number of available slots, updates the number automatically, and broadcasts it to clients.
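A periodic query of the kind <GridTotalSlotsCmd> enables might look roughly like this sketch. SlotsQuery is hypothetical; it assumes the command runs under sh, prints the slot count as the first integer on standard output, and exits with status 0. The 60-second refresh is an arbitrary choice, not Pipeline’s interval.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of polling a slot-count command; not Pipeline's code.
final class SlotsQuery {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Assumption: the first integer in the output is the slot count.
    static OptionalInt parse(String output) {
        Matcher m = NUMBER.matcher(output);
        if (!m.find()) return OptionalInt.empty();
        try {
            return OptionalInt.of(Integer.parseInt(m.group()));
        } catch (NumberFormatException e) {   // value too large for an int
            return OptionalInt.empty();
        }
    }

    // Runs the command through sh; stderr goes to our stderr so warnings aren't parsed.
    static OptionalInt run(String command) {
        try {
            Process p = new ProcessBuilder("sh", "-c", command)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            return p.waitFor() == 0 ? parse(out) : OptionalInt.empty();
        } catch (IOException e) {
            return OptionalInt.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Usage: java SlotsQuery "<your cluster's slot query>"
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(
                () -> run(args[0]).ifPresent(n -> System.out.println("total slots: " + n)),
                0, 60, TimeUnit.SECONDS);
    }
}
```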

3.9 Days to persist status & Clear old temp files enabled

The <DaysToPersistStatus> element specifies the number of days a workflow can run before its session is cleaned up. Every 24 hours, the Pipeline server checks for and cleans up workflow sessions older than this number of days. The default value is 30 days. When a session is cleared, all its temporary files under the temporary directory are removed.

If <ClearOldTempFilesEnabled> is set to true, any temporary session directory older than twice the <DaysToPersistStatus> value is removed. This does not happen under normal circumstances, because the persistence database keeps track of all sessions, so no temporary directory older than <DaysToPersistStatus> should exist. It applies only when the Pipeline server restarts after its persistence database has been manually deleted. The default is false.
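The two age limits reduce to a simple comparison. In this sketch (SessionAge is hypothetical), age is measured from a timestamp you supply, such as a directory’s modification time; which timestamp Pipeline itself uses is not stated in this guide. With the default of 30 days, sessions expire after 30 days and orphaned temporary directories after 60.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of the two age limits; not Pipeline's code.
final class SessionAge {
    static boolean olderThanDays(Instant time, Instant now, long days) {
        return Duration.between(time, now).compareTo(Duration.ofDays(days)) > 0;
    }

    // Daily session cleanup: older than <DaysToPersistStatus> (default 30).
    static boolean sessionExpired(Instant created, Instant now, int daysToPersistStatus) {
        return olderThanDays(created, now, daysToPersistStatus);
    }

    // ClearOldTempFilesEnabled=true: orphaned temp directories older than twice that.
    static boolean orphanTempDirExpired(Instant modified, Instant now, int daysToPersistStatus) {
        return olderThanDays(modified, now, 2L * daysToPersistStatus);
    }
}
```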

3.10 Log file location

To set explicitly the directory your log files are written to, specify the path with the <LogFileLocation> preference. To set the prefix used to name the log files, append it to the end of the directory path. A unique number identifying each log file is appended to the file name.

<LogFileLocation>/nethome/users/pipelnv4/server/events.log</LogFileLocation>

In the above example, log files will be created in the /nethome/users/pipelnv4/server/ directory, and will be named events.log.0, events.log.1, and so forth.

3.11 Persistence URL

The Pipeline server uses HSQLDB to store information, including workflow status and module status. By default, the database is kept in the Pipeline server’s memory and is removed when the Pipeline server stops. Alternatively, you can start a separate HSQLDB server that saves to an external file; the hsqldb.jar file is available from the HSQLDB website. To start an HSQLDB process, run something like this:

java -cp ./lib/hsqldb.jar org.hsqldb.Server -database.0 file:/user/foo/mydb -dbname.0 xdb

After HSQLDB has started successfully, add a <PersistenceURL> element to the Pipeline server’s preferences file, like the following:

<PersistenceURL>jdbc:hsqldb:hsql://localhost/xdb</PersistenceURL>
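Before pointing Pipeline at the database, you can check that the HSQLDB server is reachable with a few lines of JDBC. PersistenceCheck is a hypothetical helper. It assumes hsqldb.jar is on the classpath (a JDBC 4 driver registers itself automatically; older HSQLDB releases may need Class.forName("org.hsqldb.jdbcDriver") first) and that the database still uses HSQLDB’s default account SA with an empty password.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical pre-flight check for <PersistenceURL>; not part of Pipeline.
final class PersistenceCheck {
    static boolean canConnect(String url, String user, String password) {
        try (Connection c = DriverManager.getConnection(url, user, password)) {
            return c.isValid(5);   // wait up to 5 seconds for the server to answer
        } catch (SQLException e) {
            System.err.println("cannot connect to " + url + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "jdbc:hsqldb:hsql://localhost/xdb";
        System.exit(canConnect(url, "SA", "") ? 0 : 1);   // SA/"" is an assumption
    }
}
```

For example, after compiling it, run java -cp .:./lib/hsqldb.jar PersistenceCheck jdbc:hsqldb:hsql://localhost/xdb; an exit status of 0 means the connection worked.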

3.12 HTTP server port

The <HTTPServerPort> element specifies the port on which the Pipeline server provides an API for querying workflow data, including the session list, session status, and output files. It is helpful when you (or your program) want to query workflows on the Pipeline server without the Pipeline client. Note that, once enabled, the API requires no login, so anyone who can reach this port can see every workflow on the server. By default, this feature is disabled on the Pipeline server.

For example, consider this preferences file:

<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cerebro-rsn2.loni.ucla.edu</Hostname>
<ServerPort>8020</ServerPort>
<HTTPServerPort>8021</HTTPServerPort>
</preferences>

While the server is running, you can go to http://cerebro-rsn2.loni.ucla.edu:8021/ to see an XML file listing all the API functions. There are currently five:

  • getSessionsList
  • getSessionWorkflow
  • getSessionStatus
  • getInstanceCommand
  • getOutputFiles

getSessionsList returns all the active sessions on the Pipeline server. It takes no arguments, and the query URL looks like this:

http://cerebro-rsn2.loni.ucla.edu:8021/getSessionsList

The Pipeline server returns an XML file listing all the active sessions, with their session IDs.

<sessions count="1">
<session>
cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492
</session>
</sessions>

getSessionWorkflow returns the workflow file (.pipe file). It takes the session ID as an argument. The query URL looks like this:

http://cerebro-rsn2.loni.ucla.edu:8021/getSessionWorkflow?sessionID=cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492

getSessionStatus returns the status of the workflow execution: when it started, whether and when it finished, the nodes and instances in the workflow, and whether each node finished successfully. It takes the session ID as an argument. The query URL looks like this:

http://cerebro-rsn2.loni.ucla.edu:8021/getSessionStatus?sessionID=cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492

getInstanceCommand returns the command that was executed. It takes the session ID, the node name (which can be found by calling getSessionStatus), and the instance number (which can also be found by calling getSessionStatus). The query URL looks like this:

http://cerebro-rsn2.loni.ucla.edu:8021/getInstanceCommand?sessionID=cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492&nodeName=BET_0&instanceNumber=0

getOutputFiles returns the paths of the output files generated by a node. It takes the session ID, node name, instance number, and parameter ID. The query URL looks like this:

http://cerebro-rsn2.loni.ucla.edu:8021/getOutputFiles?sessionID=cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492&nodeName=BET_0&instanceNumber=0&parameterID=BET.OutputFile_0
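If you call the API from a program rather than a browser, a small client along these lines may help. PipelineApi is hypothetical; it builds URLs in the format shown above and reads each response as UTF-8 text, which is an assumption about the server’s encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical client for the HTTP API. Values are appended verbatim, as in the
// example URLs; values containing '&', '=', '#', or spaces would need percent-encoding.
final class PipelineApi {
    private final String base;   // e.g. "http://cerebro-rsn2.loni.ucla.edu:8021"

    PipelineApi(String base) {
        this.base = base;
    }

    // Pass a LinkedHashMap to keep the parameters in a fixed order.
    String url(String function, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base).append('/').append(function);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    // Fetches the XML document returned by one API function.
    String get(String function, Map<String, String> params) throws IOException {
        try (InputStream in = URI.create(url(function, params)).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        PipelineApi api = new PipelineApi("http://cerebro-rsn2.loni.ucla.edu:8021");
        System.out.println(api.get("getSessionsList", Map.of()));
    }
}
```

For example, the getInstanceCommand URL above corresponds to url("getInstanceCommand", params) with sessionID, nodeName=BET_0, and instanceNumber=0 inserted into a LinkedHashMap in that order.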

3.13 Failover

Starting with version 4.2, Pipeline has a failover feature, supported only on UNIX/Linux machines. Failover improves robustness and minimizes service disruption when a single Pipeline server fails. It works by using two actual servers, a primary and a secondary, a virtual Pipeline server name, and a Persistence Database decoupled onto a separate system. Each server monitors the state of its counterpart. If the primary server, which holds the virtual Pipeline server name, has a catastrophic failure, the secondary server assumes the virtual name, establishes a connection to the Persistence Database, and dynamically takes ownership of all current Pipeline jobs.

Requirements
  • A minimum of 3 separate hosts.
  • A virtual IP address for the server.
  • The user who runs Pipeline must have full access to execute the ifconfig command.

Instructions for configuring failover

Copy the Pipeline server installation to two different hosts, say Host A and Host B. The database also needs to run on a third host (Host C).
In this example, the server address is server1.loni.ucla.edu, and the database is at database.loni.ucla.edu:9002 with the name xdb.

Steps to configure failover

1. Open the preferences.xml file on both hosts (Host A and Host B).
2. Add the Hostname preference, set to the virtual server name (server1.loni.ucla.edu in our example).
3. Add the Failover Enabled preference (<FailoverEnabled>) and set it to true.
4. Optional: Add the Failover Check Interval preference (<FailoverCheckInterval>).
5. Optional: Add the Failover Retries preference (<FailoverRetries>).
6. Optional: Add the Failover Alias Interface preference (<FailoverAliasInterface>).
7. Optional: Add the Failover Alias Sub Interface Num preference (<FailoverAliasSubInterfaceNum>).
8. Add the Persistence URL preference and point it to the database server on Host C (in our example, jdbc:hsqldb:hsql://database.loni.ucla.edu:9002/xdb).
9. Save and close the files.
10. Before starting the servers, go to Host C and start the database.
11. Start the server on Host A. On startup, it checks whether a Pipeline server is already running at the specified hostname address. In our case it finds that it is the first server started, so it switches to Master mode and becomes fully functional.
12. Check the output stream of Host A’s server and make sure the server started successfully.
13. Go to Host B and start the server. This server also checks the specified hostname address; since the address is already in use (as it should be in our case), it switches to Slave mode and waits. If Host A goes down, this server wakes up and continues Host A’s work.

How it works

The server on Host B pings the server on Host A every <FailoverCheckInterval> milliseconds. When there is no response, it retries the ping <FailoverRetries> times, and if all retries fail, Host B creates an IP alias on the network interface specified by <FailoverAliasInterface> and <FailoverAliasSubInterfaceNum> and switches to Master mode.
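The check-and-retry loop can be sketched as follows. FailoverMonitor is hypothetical, and the probe is an assumption: the guide does not say what Pipeline’s ping actually is, so this sketch treats a successful TCP connection to the primary’s server port as an answer. Creating the IP alias and switching to Master mode are left as a message.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the secondary server's watch loop; not Pipeline's code.
final class FailoverMonitor {
    // Returns true once the primary has failed one check plus `retries` consecutive
    // retries; returns false if the waiting thread is interrupted.
    static boolean waitForPrimaryFailure(BooleanSupplier ping, long intervalMs, int retries) {
        int failures = 0;
        try {
            while (true) {
                if (ping.getAsBoolean()) {
                    failures = 0;                 // primary answered: start over
                } else if (++failures > retries) {
                    return true;                  // initial failure + all retries failed
                }
                Thread.sleep(intervalMs);         // <FailoverCheckInterval>
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Assumed probe: a TCP connect to the primary's Pipeline port counts as a reply.
    static BooleanSupplier tcpPing(String host, int port, int timeoutMs) {
        return () -> {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), timeoutMs);
                return true;
            } catch (IOException e) {
                return false;
            }
        };
    }

    public static void main(String[] args) {
        // Defaults from this guide: 5000 ms interval, 3 retries. "hostA" is a placeholder.
        if (waitForPrimaryFailure(tcpPing("hostA", 8001, 2000), 5000, 3)) {
            System.out.println("primary unreachable: would add the IP alias and switch to Master mode");
        }
    }
}
```

With the default of 3 retries, the primary must miss 4 consecutive checks before the secondary takes over, and any successful reply resets the count.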

3.13.1 Failover Enabled

The <FailoverEnabled> preference enables the failover feature. It accepts the boolean values true or false. If this preference does not exist, Pipeline sets it to false.

3.13.2 Failover Check Interval

The <FailoverCheckInterval> preference specifies the interval, in milliseconds, at which the secondary server pings the master server. If it is not specified, Pipeline uses the default value of 5000.

3.13.3 Failover Retries

The <FailoverRetries> preference specifies how many times the secondary server retries a failed ping before starting as master. If it is not specified, Pipeline uses the default value of 3.

3.13.4 Failover Alias Interface

The <FailoverAliasInterface> preference specifies the name of the interface on which Pipeline creates a sub-interface for IP aliasing. If it is not specified, Pipeline automatically finds the primary network interface and the first available sub-interface number and adds the IP alias on it. For example, if your primary interface is eth0 and eth0:0 and eth0:1 are already in use by other IP addresses, Pipeline uses eth0:2.
WARNING: If one of the sub-interfaces already holds the IP address of the specified Hostname, Pipeline reports an error and exits.

3.13.5 Failover Alias Sub Interface Num

The <FailoverAliasSubInterfaceNum> preference specifies the sub-interface number on which Pipeline creates the alias IP address. If it is not specified, Pipeline automatically finds the first available sub-interface number and adds the IP alias on it. For example, if your primary interface is eth0 and eth0:0 and eth0:1 are already in use by other IP addresses, Pipeline uses eth0:2.
WARNING: If one of the sub-interfaces already holds the IP address of the specified Hostname, Pipeline reports an error and exits.
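When neither preference pins the alias down, the selection rule and the safety check from the warning can be sketched as below. AliasPicker is hypothetical. On Linux, java.net.NetworkInterface.getSubInterfaces() lists existing aliases such as eth0:0, which is one way to build the input map.

```java
import java.util.Map;

// Hypothetical sketch of the documented selection rule; not Pipeline's code.
// `aliases` maps sub-interface numbers already configured on `iface` to their IP addresses.
final class AliasPicker {
    static String pick(String iface, Map<Integer, String> aliases, String hostnameIp) {
        if (aliases.containsValue(hostnameIp)) {
            // Mirrors the WARNING: the virtual address is already assigned.
            throw new IllegalStateException(iface + " already has a sub-interface with " + hostnameIp);
        }
        int n = 0;
        while (aliases.containsKey(n)) {
            n++;                                  // first unused sub-interface number
        }
        return iface + ":" + n;
    }
}
```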

3.14 Directory-based executable access control

To improve security, Pipeline implements directory-based Boolean access control for permitted executables. This is an extra layer on top of the operating system’s authentication and access control. Restricted users are not allowed to run executables outside the specified directories and/or to browse the file system with the remote file browser.

3.14.1 Directory Access Control Mode

The <DirAccessControlMode> preference is an integer that sets the access control configuration for running executables and for the remote file browser. The table below lists each mode and its meaning.

Mode   Remote File Browser Access Control   Executables Access Control
0      Never                                Never
1      Never                                No with exceptions
2      Never                                Yes with exceptions
3      No with exceptions                   No with exceptions
4      Yes with exceptions                  Yes with exceptions
5      Same as Shell permissions            No with exceptions
6      Same as Shell permissions            Yes with exceptions
7*     Same as Shell permissions            Same as Shell permissions

* Available starting from Pipeline version 4.2.2

Never means the Pipeline server applies no access control restrictions to any user. Note that this does not affect the operating system’s authentication and access control; in other words, the credentials required to connect to the Pipeline server and the rights required to execute programs are not affected by these settings. No with exceptions means access control is disabled for all users except those listed in Directory Access Control Users, who are restricted. Yes with exceptions means all users are restricted except those listed in Directory Access Control Users, who are allowed. Same as Shell permissions means the remote file browser behaves as if the user had logged in to the server through a shell.

3.14.2 Directory Access Control Users

The <DirAccessControlUsers> preference is a comma-separated list of users (e.g., john,bob,mike) who are the exceptions. Depending on the Directory Access Control Mode, these users are either restricted or allowed.

3.14.3 Directory Access Control Paths

The <DirAccessControlPaths> preference is a comma-separated list of directories (e.g., /usr/local,/usr/bin), which are the only directories allowed for restricted users.

3.14.4 Examples

For example, to restrict users john, bob, and mike to executing programs only in /usr/local and /usr/bin, while letting every user browse with the remote file browser as they would in a shell, use these settings:
<DirAccessControlMode>5</DirAccessControlMode>
<DirAccessControlUsers>john,bob,mike</DirAccessControlUsers>
<DirAccessControlPaths>/usr/local,/usr/bin</DirAccessControlPaths>

As another example, to restrict all users to executing programs only in /usr/local and /usr/bin, except john, bob, and mike, who may run programs without restriction, while letting every user browse with the remote file browser as they would in a shell, use these settings:
<DirAccessControlMode>6</DirAccessControlMode>
<DirAccessControlUsers>john,bob,mike</DirAccessControlUsers>
<DirAccessControlPaths>/usr/local,/usr/bin</DirAccessControlPaths>
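To make the mode table concrete, the sketch below encodes it as two lookup arrays and applies the executable check. DirAccessControl is hypothetical: how Pipeline resolves symbolic links or relative paths is not documented here, so this version only normalizes the path and compares whole path components, which means /usr/localx is not treated as inside /usr/local.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical encoding of the <DirAccessControlMode> table; not Pipeline's code.
final class DirAccessControl {
    enum Policy { NEVER, NO_WITH_EXCEPTIONS, YES_WITH_EXCEPTIONS, SHELL }

    // Array index = mode (0-7); values copied from the table.
    static final Policy[] BROWSER = {
        Policy.NEVER, Policy.NEVER, Policy.NEVER, Policy.NO_WITH_EXCEPTIONS,
        Policy.YES_WITH_EXCEPTIONS, Policy.SHELL, Policy.SHELL, Policy.SHELL };
    static final Policy[] EXECUTABLES = {
        Policy.NEVER, Policy.NO_WITH_EXCEPTIONS, Policy.YES_WITH_EXCEPTIONS,
        Policy.NO_WITH_EXCEPTIONS, Policy.YES_WITH_EXCEPTIONS, Policy.NO_WITH_EXCEPTIONS,
        Policy.YES_WITH_EXCEPTIONS, Policy.SHELL };

    static Set<String> csv(String list) {
        return Arrays.stream(list.split(",")).map(String::trim)
                .filter(s -> !s.isEmpty()).collect(Collectors.toSet());
    }

    // Does the path restriction apply to this user under the given policy?
    static boolean restricted(Policy policy, String user, String users) {
        switch (policy) {
            case NO_WITH_EXCEPTIONS:  return csv(users).contains(user);   // only listed users
            case YES_WITH_EXCEPTIONS: return !csv(users).contains(user);  // all but listed users
            default:                  return false;  // NEVER, or SHELL (left to OS permissions)
        }
    }

    static boolean browserRestricted(int mode, String user, String users) {
        return restricted(BROWSER[mode], user, users);
    }

    static boolean mayExecute(int mode, String user, String users, String paths, String executable) {
        if (!restricted(EXECUTABLES[mode], user, users)) return true;
        Path exe = Path.of(executable).normalize();
        for (String dir : csv(paths)) {
            if (exe.startsWith(Path.of(dir).normalize())) return true;  // component-wise match
        }
        return false;
    }
}
```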

3.15 Grid Plugin JAR Files

Starting with version 4.4, developers can create their own plugins for Pipeline to communicate with various grid managers (see also the Pipeline Grid Plugin API Developers Guide). The Pipeline package contains two built-in plugins for Sun Grid Engine: JGDIPlugin and DRMAAPlugin. In the installed Pipeline package, you can find both in the plugins directory under the lib directory.

IMPORTANT: Starting with version 4.4, you must set the grid plugin options for the Pipeline server to work with grid resource managers. Otherwise, the Pipeline server runs all jobs on the same host as the server.

To configure Pipeline to use one of the built-in plugins, you need to add special tags to the preferences.xml file. The first tag, <GridPluginJARFiles>, should contain the paths to the plugin JAR file and the libraries it uses, separated by commas. For example, to use the built-in DRMAA or JGDI plugin, your preferences file would look like the following:

JGDI Plugin
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cranium.loni.ucla.edu</Hostname>
<ServerPort>8020</ServerPort>
<TempFileLocation>/ifs/tmp/</TempFileLocation>
<MaximumThreadPoolSize>620</MaximumThreadPoolSize>
<GridPluginJARFiles>/usr/pipeline/dist/lib/plugins/JGDIPlugin.jar,/usr/pipeline/dist/lib/plugins/jgdi.jar</GridPluginJARFiles>
</preferences>
DRMAA Plugin
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cranium.loni.ucla.edu</Hostname>
<ServerPort>8020</ServerPort>
<TempFileLocation>/ifs/tmp/</TempFileLocation>
<MaximumThreadPoolSize>620</MaximumThreadPoolSize>
<GridPluginJARFiles>/usr/pipeline/dist/lib/plugins/DRMAAPlugin.jar,/usr/pipeline/dist/lib/plugins/drmaa.jar</GridPluginJARFiles>
</preferences>

IMPORTANT: Some plugins must also be on the classpath. For example, the DRMAA plugin requires drmaa.jar on the classpath when you start the server, so to start the server with the DRMAA plugin you need to run:

$ java -classpath Pipeline.jar:/usr/pipeline/dist/lib/plugins/drmaa.jar server.Main

This tag alone is not enough to enable the plugin; you also need to set the Grid Plugin Class tag.

3.16 Grid Plugin Class

The <GridPluginClass> tag should contain the class name of the plugin used by Pipeline. The following examples show the class names of the built-in plugins.

JGDI Plugin
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cranium.loni.ucla.edu</Hostname>
<ServerPort>8020</ServerPort>
<TempFileLocation>/ifs/tmp/</TempFileLocation>
<MaximumThreadPoolSize>620</MaximumThreadPoolSize>
<GridPluginJARFiles>/usr/pipeline/dist/lib/plugins/JGDIPlugin.jar,/usr/pipeline/dist/lib/plugins/jgdi.jar</GridPluginJARFiles>
<GridPluginClass>jgdiplugin.JGDIPlugin</GridPluginClass>
</preferences>
DRMAA Plugin
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
<Hostname>cranium.loni.ucla.edu</Hostname>
<ServerPort>8020</ServerPort>
<TempFileLocation>/ifs/tmp/</TempFileLocation>
<MaximumThreadPoolSize>620</MaximumThreadPoolSize>
<GridPluginJARFiles>/usr/pipeline/dist/lib/plugins/DRMAAPlugin.jar,/usr/pipeline/dist/lib/plugins/drmaa.jar</GridPluginJARFiles>
<GridPluginClass>drmaaplugin.DRMAAPlugin</GridPluginClass>
</preferences>
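Conceptually, the two plugin tags supply a class path and an entry-point class. A sketch of loading such a plugin with the JDK’s URLClassLoader is below; PluginLoader is hypothetical and is not documented as Pipeline’s own mechanism. Trimming each entry means paths split across lines after a comma still work in this sketch, although that does not guarantee Pipeline tolerates the whitespace.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: <GridPluginJARFiles> as a class path, <GridPluginClass> as the entry class.
final class PluginLoader {
    static URL[] jarUrls(String jarFilesCsv) {
        List<URL> urls = new ArrayList<>();
        for (String entry : jarFilesCsv.split(",")) {
            String path = entry.trim();   // tolerate spaces/newlines after commas
            if (path.isEmpty()) continue;
            try {
                urls.add(Path.of(path).toUri().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException(path, e);
            }
        }
        return urls.toArray(new URL[0]);
    }

    static Object instantiate(String jarFilesCsv, String className)
            throws ReflectiveOperationException {
        // Deliberately left open: the plugin's classes must stay loadable while it is in use.
        URLClassLoader loader =
                new URLClassLoader(jarUrls(jarFilesCsv), PluginLoader.class.getClassLoader());
        return Class.forName(className, true, loader).getDeclaredConstructor().newInstance();
    }
}
```

For the DRMAA example above, this would be called as instantiate("/usr/pipeline/dist/lib/plugins/DRMAAPlugin.jar,/usr/pipeline/dist/lib/plugins/drmaa.jar", "drmaaplugin.DRMAAPlugin").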

3.17 Grid Complex Resource Attributes

Pipeline 4.4 adds a feature that checks for jobs that were submitted by Pipeline but are no longer monitored by it. This happens when the server shuts down while a job is being submitted: the submission completes, but because Pipeline is down, the job ID is never written to the Pipeline database. The job occupies a slot, but Pipeline does not “remember” its ID. When the server restarts, it gets the list of jobs running on the cluster and compares it with its database. To determine which jobs were submitted by the current server, Pipeline uses Grid Complex Resource Attributes. When Pipeline finds jobs that it submitted but no longer controls, it deletes them to free up their slots.

The <GridComplexResourceAttributes> tag assigns custom complex attributes to all jobs submitted by the server, which makes those jobs identifiable. Separate multiple values with commas. For example:

<GridComplexResourceAttributes>pipeline,serverId=server1</GridComplexResourceAttributes>

This defines two attributes: 1) pipeline, which is equal to TRUE, and 2) serverId, which is equal to server1.

This tag only defines the complex attributes. To use them, you must include _pcomplex in the Grid engine native specifications. In our case, _pcomplex is replaced with -l pipeline -l serverId=server1 when the job is submitted to the grid.

Note that the Grid manager has to be configured properly to accept jobs with given resource attributes.
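The substitution described above can be sketched as follows. ComplexAttrs is hypothetical, and it assumes each comma-separated entry in <GridComplexResourceAttributes> simply becomes one -l option in place of _pcomplex. That matches the example result, but it may not cover every case Pipeline handles.

```java
import java.util.StringJoiner;

// Hypothetical sketch of the _pcomplex substitution; not Pipeline's code.
final class ComplexAttrs {
    // "pipeline,serverId=server1" -> "-l pipeline -l serverId=server1"
    static String toOptions(String attributes) {
        StringJoiner options = new StringJoiner(" ");
        for (String attr : attributes.split(",")) {
            String a = attr.trim();   // tolerate spaces or line breaks after commas
            if (!a.isEmpty()) {
                options.add("-l " + a);
            }
        }
        return options.toString();
    }

    // Replaces every _pcomplex token in the native specification string.
    static String expand(String nativeSpec, String attributes) {
        return nativeSpec.replace("_pcomplex", toOptions(attributes));
    }
}
```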

3.18 Grid Maximum Submit Threads

Starting with version 4.4, you can configure the number of parallel job submissions. Set this parameter to 1 to submit jobs one at a time, or to any other number to allow that many parallel submissions.

<GridMaxSubmitThreads>10</GridMaxSubmitThreads>

This example allows at most 10 parallel submissions at a time.
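A limit like <GridMaxSubmitThreads> behaves like a counting semaphore around the submission call. In this sketch (SubmitLimiter is hypothetical, and the Runnable stands in for a real JGDI or DRMAA submit call), at most maxSubmitThreads submissions run at once, no matter how many threads ask.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of capping concurrent grid submissions; not Pipeline's code.
final class SubmitLimiter {
    private final Semaphore permits;

    SubmitLimiter(int maxSubmitThreads) {
        permits = new Semaphore(maxSubmitThreads);
    }

    void submit(Runnable submission) throws InterruptedException {
        permits.acquire();               // blocks while the limit is reached
        try {
            submission.run();            // stand-in for the real submit call
        } finally {
            permits.release();
        }
    }

    // Demo: `jobs` threads submit slow jobs at once; returns the peak concurrency seen.
    static int peakConcurrency(int maxSubmitThreads, int jobs) {
        SubmitLimiter limiter = new SubmitLimiter(maxSubmitThreads);
        AtomicInteger inFlight = new AtomicInteger(), peak = new AtomicInteger();
        ExecutorService callers = Executors.newFixedThreadPool(jobs);
        for (int i = 0; i < jobs; i++) {
            callers.execute(() -> {
                try {
                    limiter.submit(() -> {
                        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                        try {
                            Thread.sleep(20);    // pretend submission latency
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        inFlight.decrementAndGet();
                    });
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        callers.shutdown();
        try {
            callers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```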

3.19 Grid Job Accounting URL

When the Pipeline server restarts, some jobs may already have finished or changed their status. These events were not caught because the Pipeline server was not running at the time. To get the status of these “missed” events, Pipeline reads information from a configured Sun Accounting and Reporting Console (ARCo) database. Note that this feature has been tested only with Sun Grid Engine using the JGDI and DRMAA plugins; if you use another grid manager and it does not work, please report it on our Pipeline forum.

This assumes that the ARCo database is already configured and running (refer to Sun’s website and your system administrator for help). To configure ARCo access in the Pipeline, add the database URL, username, and password to the preferences.xml file.

<GridJobAccountingURL>jdbc:mysql://hostname/db_name</GridJobAccountingURL>

hostname is the address of the host where the ARCo database is running (e.g., arco.loni.ucla.edu)
db_name is the name of the database (e.g., cranium_db)

3.20 Grid Job Accounting Username

This tag should contain the username used to connect to the ARCo database. The preferences.xml file should contain the following line:

<GridJobAccountingUsername>username</GridJobAccountingUsername>

3.21 Grid Job Accounting Password

This tag should contain the password for the username specified in <GridJobAccountingUsername>.

<GridJobAccountingPassword>password</GridJobAccountingPassword>

Note that this password is stored in clear text in preferences.xml, which is not secure. It is recommended that you restrict other users’ access to the preferences file.
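To show how the three settings fit together, here is a minimal sketch that reads them from a preferences.xml fragment with the JDK's DOM parser and passes them to the standard `java.sql.DriverManager.getConnection(url, user, password)` call. It assumes the tags appear as shown above; the `<preferences>` root element in the demo and the `ArcoSettings` class are hypothetical, and opening a real connection additionally requires a MySQL JDBC driver on the classpath and a reachable ARCo database. The server's own loading code may work differently.

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ArcoSettings {
    final String url;
    final String username;
    final String password;

    ArcoSettings(String url, String username, String password) {
        this.url = url;
        this.username = username;
        this.password = password;
    }

    // Returns the trimmed text of the first element with the given tag name, or null if absent.
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Extracts the three GridJobAccounting* settings from preferences XML.
    static ArcoSettings parse(String preferencesXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(preferencesXml)));
            return new ArcoSettings(
                    firstText(doc, "GridJobAccountingURL"),
                    firstText(doc, "GridJobAccountingUsername"),
                    firstText(doc, "GridJobAccountingPassword"));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse preferences", e);
        }
    }

    // Opens a JDBC connection to the ARCo database (needs a driver and a running database).
    Connection connect() throws SQLException {
        return DriverManager.getConnection(url, username, password);
    }

    public static void main(String[] args) {
        String xml = "<preferences>"
                + "<GridJobAccountingURL>jdbc:mysql://arco.loni.ucla.edu/cranium_db</GridJobAccountingURL>"
                + "<GridJobAccountingUsername>username</GridJobAccountingUsername>"
                + "<GridJobAccountingPassword>password</GridJobAccountingPassword>"
                + "</preferences>";
        ArcoSettings settings = parse(xml);
        System.out.println(settings.url + " as " + settings.username);
        // settings.connect() would only succeed against a reachable ARCo database.
    }
}
```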

4. Authentication

Now that you’ve configured everything, it’s time to set up authentication so you can actually let users into your server. The Pipeline authenticates users using the Java Authentication and Authorization Service (JAAS), which allows the server operator to authenticate usernames and passwords against any type of system that they want. When a user connects to the server, the Pipeline tries to create a new LoginContext and, if the creation is successful, attempts to call its login() method. If the login succeeds, we allow the user to continue; otherwise the user is disconnected from the server with an “Authentication Failed” message.

In order for the Pipeline server to successfully create a LoginContext, we need to write a little code in Java that handles the authentication scheme. This essentially boils down to 1) implementing the LoginModule interface, 2) packaging the class into a jar file, and 3) making sure its contents are available in the classpath of the server when you launch it. For steps 1 and 2, I will redirect you to the excellent documentation provided by Sun on how to complete those tasks.
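For orientation, a minimal LoginModule for step 1 might look like the sketch below. It checks the submitted credentials against a hard-coded, in-memory table. The `SimpleLoginModule` class and its credential table are hypothetical; a real module would consult LDAP, a database, or another backing store. The sketch also assumes the server supplies credentials through the standard NameCallback and PasswordCallback objects. The class is public with an implicit public no-argument constructor, because JAAS instantiates it by reflection (so it belongs in SimpleLoginModule.java), and its name, or its fully qualified name if you put it in a package, is what goes into the configuration file described next.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.auth.login.FailedLoginException;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

// Public, with a public no-arg constructor, so that JAAS can instantiate it by reflection.
public class SimpleLoginModule implements LoginModule {
    // Hypothetical credential table; a real module would query LDAP, a database, etc.
    private static final Map<String, String> USERS = Collections.singletonMap("alice", "secret");

    private CallbackHandler callbackHandler;
    private boolean succeeded;

    // Checks a username/password pair against the table above.
    static boolean isValid(String username, char[] password) {
        String expected = USERS.get(username);
        return expected != null && password != null
                && Arrays.equals(expected.toCharArray(), password);
    }

    @Override
    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
        this.callbackHandler = callbackHandler;
    }

    @Override
    public boolean login() throws LoginException {
        if (callbackHandler == null) {
            throw new LoginException("no CallbackHandler available");
        }
        NameCallback name = new NameCallback("username: ");
        PasswordCallback password = new PasswordCallback("password: ", false);
        try {
            callbackHandler.handle(new Callback[] { name, password });
        } catch (IOException | UnsupportedCallbackException e) {
            throw new LoginException("cannot read credentials: " + e.getMessage());
        }
        char[] given = password.getPassword(); // returns a copy of the entered password
        password.clearPassword();
        succeeded = isValid(name.getName(), given);
        if (given != null) {
            Arrays.fill(given, '\0'); // wipe the copy
        }
        if (!succeeded) {
            throw new FailedLoginException("Authentication Failed");
        }
        return true;
    }

    @Override
    public boolean commit() {
        return succeeded; // a real module would usually add a Principal to the Subject here
    }

    @Override
    public boolean abort() {
        succeeded = false;
        return true;
    }

    @Override
    public boolean logout() {
        succeeded = false;
        return true;
    }
}
```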

Once you’ve got your jar file, you need to create a configuration file to reference the LoginModule inside your jar file. So fire up your favorite text editor and type the following:

/** Login Configuration for the Pipeline **/
PipelineLogin {
edu.ucla.loni.pipeline.security.LONILoginModule required debug=true;
};

In your configuration file, you should replace “edu.ucla.loni.pipeline.security.LONILoginModule” with the fully qualified class name of the LoginModule you implemented. Now save the file as pipeline_security.config in the same directory where you placed the Pipeline.jar file and start up the server.

$ java -Djava.security.auth.login.config=pipeline_security.config -classpath Pipeline.jar server.Main

As you can see, we’re setting the system property java.security.auth.login.config to pipeline_security.config, so when the Pipeline tries to create a LoginContext, JAAS reads the file named by this property, finds the PipelineLogin entry, and reads the class name of the LoginModule listed there. Using reflection, it then loads that class and uses it to authenticate the user.
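The server-side half of this exchange can be sketched with the standard `javax.security.auth.login.LoginContext` API. This sketch assumes the server looks up the PipelineLogin entry by name (the entry name used in the configuration file above) and supplies the username and password through NameCallback and PasswordCallback objects. `AuthCheck` and its methods are hypothetical and are not the Pipeline's actual code, but the class is handy for testing a LoginModule outside the server.

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

class AuthCheck {
    // Answers the name/password prompts of a LoginModule with fixed values.
    static CallbackHandler credentials(String user, char[] password) {
        return callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName(user);
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(password);
                } else {
                    throw new UnsupportedCallbackException(cb);
                }
            }
        };
    }

    // Creates a LoginContext for the PipelineLogin entry and attempts to log in.
    static boolean authenticate(String user, char[] password) {
        try {
            LoginContext context = new LoginContext("PipelineLogin", credentials(user, password));
            context.login(); // throws LoginException if a required module rejects the user
            return true;
        } catch (LoginException e) {
            // Also thrown when the configuration has no PipelineLogin (or "other") entry.
            return false;
        } catch (SecurityException e) {
            // e.g. a malformed login configuration file
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java AuthCheck <username> <password>");
            return;
        }
        System.out.println(authenticate(args[0], args[1].toCharArray())
                ? "login succeeded" : "Authentication Failed");
    }
}
```

For example, running java -Djava.security.auth.login.config=pipeline_security.config -classpath .:your-loginmodule.jar AuthCheck alice secret (where your-loginmodule.jar is a placeholder for your own jar) exercises the same configuration file and LoginModule that the server would use.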

4.1 Authentication Quickstart

If you don’t really care about protecting access to the Pipeline server and you just want to get a server running for testing or whatever other reason, we have a LoginModule that you can use, but with a big warning.

WARNING: Using this LoginModule will grant access to anybody who tries to connect to your Pipeline server. It won’t even check their username and password. It will just let them in, no questions asked. This is in no way secure and is a bad thing to do. If you use this module, it is at your own risk.

Now that you’ve been sternly lectured ;-) go ahead and download the jar file that contains the LoginModule class into the lib directory that you extracted from the Pipeline download. Next, download the configuration file and place it in the same directory as the Pipeline.jar file. If you care, you can download the source code of the PassiveLoginModule class too, but it’s not necessary.

Now let’s start the server using the new configuration file and LoginModule:

$ java -Djava.security.auth.login.config=pipeline_jaas.config -classpath Pipeline.jar server.Main

Try connecting to your server now (you can enter any strings as the username and password, because they won’t be checked). If it says ‘Authentication failed’, kill the server by pressing Ctrl-C and try starting it by explicitly placing the PassiveLoginModule.jar into the classpath with this command:

$ java -Djava.security.auth.login.config=pipeline_jaas.config -classpath lib/PassiveLoginModule.jar:Pipeline.jar server.Main

5. Adding module definitions

Once you’ve finished setting up your server, you’ll want to add some module definitions to its library, so users can take advantage of the tools available on your server. To do that, just define the modules as described in the user guide and place each .pipe file into the server library directory. Now when users connect to the server, they will automatically get a copy of the server library. Any additions, updates, or removals in the server library will automatically be reflected in a user’s server library cache when they reconnect to your server.

To make sure icons show up on modules distributed from the server, follow these steps:

  1. Place the icons somewhere on the machine that runs the Pipeline server
  2. Open up the .pipe file in a text editor (not in the Pipeline client software) and look for the icon attribute on the outermost <moduleGroup>, and set it to pipeline://serverhostname//path/to/iconimage.png. For example, pipeline://cranium.loni.ucla.edu//home/pipelineUser/documents/Pipeline/icons/air.png
  3. Save the .pipe file into your server library
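Step 2 can also be scripted. The sketch below builds the pipeline:// URL from a host name and an absolute path and uses the JDK's DOM API to set the icon attribute on the outermost <moduleGroup>. It assumes only what the steps above state (the .pipe file is XML and the outermost <moduleGroup> carries the icon attribute); the `PipeIconSetter` class, its methods, and the tiny sample document are hypothetical, and real .pipe files contain much more.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PipeIconSetter {
    // Builds the pipeline:// URL for an icon stored at an absolute path on the server host.
    static String iconUrl(String host, String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + absolutePath);
        }
        return "pipeline://" + host + "/" + absolutePath; // yields host//path, as in the example
    }

    // Sets the icon attribute on the outermost moduleGroup and returns the updated XML.
    static String setIcon(String pipeXml, String iconUrl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pipeXml)));
            // Matches come back in document order, so the first one cannot be nested in another moduleGroup.
            NodeList groups = doc.getElementsByTagName("moduleGroup");
            if (groups.getLength() == 0) {
                throw new IllegalArgumentException("no <moduleGroup> element found");
            }
            ((Element) groups.item(0)).setAttribute("icon", iconUrl);
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("cannot update .pipe XML", e);
        }
    }

    public static void main(String[] args) {
        String url = iconUrl("cranium.loni.ucla.edu", "/home/pipelineUser/documents/Pipeline/icons/air.png");
        System.out.println(url);
        // prints: pipeline://cranium.loni.ucla.edu//home/pipelineUser/documents/Pipeline/icons/air.png
        String sample = "<pipeline><moduleGroup name=\"demo\"><moduleGroup name=\"inner\"/></moduleGroup></pipeline>";
        System.out.println(setIcon(sample, url));
    }
}
```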