************************
Condor Workflow Job Type
************************
**Last Updated:** January 2022

.. important::

    This feature requires the ``condorpy`` library to be installed. Starting with Tethys 5.0 or if you are using ``micro-tethys-platform``, you will need to install ``condorpy`` using conda or pip as follows:

    .. code-block:: bash

        # conda: conda-forge channel strongly recommended
        conda install -c conda-forge condorpy

        # pip
        pip install condorpy

A Condor Workflow runs a group of jobs, which may have hierarchical relationships, as a single Tethys job. The hierarchy is defined through parent-child relationships. For example, suppose a workflow consists of three jobs, ``JobA``, ``JobB``, and ``JobC``, that must run in that order. The relationships would be: ``JobA`` is the parent of ``JobB``, and ``JobB`` is the parent of ``JobC``.

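
To make the parent-child idea concrete, the following minimal sketch derives a run order from the relationships in the example above. It uses only Python's standard ``graphlib`` module (Python 3.9+) and is an illustration, not part of Tethys or CondorPy; the ``parents`` mapping and the ``run_order`` function are hypothetical names chosen for this example. The actual scheduling of a workflow is handled by HTCondor, not by code like this.

.. code-block:: python

    from graphlib import TopologicalSorter

    # Map each job name to the set of its parents (jobs that must finish first).
    parents = {
        'JobA': set(),
        'JobB': {'JobA'},
        'JobC': {'JobB'},
    }


    def run_order(parents):
        """Return one valid order in which every parent precedes its children."""
        return list(TopologicalSorter(parents).static_order())


    print(run_order(parents))  # ['JobA', 'JobB', 'JobC']

Because each job in this example has at most one parent, only one order is possible. With branching relationships, several orders can be valid and independent jobs may run concurrently.
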
.. seealso::

    The Condor Workflow job type uses the CondorPy library to submit jobs to HTCondor compute pools. For more information on CondorPy and HTCondor, see the CondorPy documentation and specifically the Overview of HTCondor.

Creating a Condor Workflow
==========================
Creating a Condor Workflow job involves three steps:

1. Create an empty workflow job from the job manager.
2. Create the jobs that will make up the workflow with ``CondorWorkflowJobNode``.
3. Define the relationships among the nodes.

::

    import os

    from tethysapp.my_first_app.app import MyFirstApp as app
    from tethys_sdk.jobs import CondorWorkflowJobNode
    from tethys_sdk.workspaces import app_workspace


    @app_workspace
    def some_controller(request, app_workspace):
        # Get the app's job manager, which creates and tracks Tethys jobs
        job_manager = app.get_job_manager()

        # Step 1: create the empty workflow and save it before creating nodes
        workflow = job_manager.create_job(
            name='MyWorkflowABC',
            user=request.user,
            job_type='CONDORWORKFLOW',
            scheduler=app.get_scheduler('condor_primary'),
        )
        workflow.save()

        # Step 2: create and save each node of the workflow
        job_a = CondorWorkflowJobNode(
            name='JobA',
            workflow=workflow,
            condorpy_template_name='vanilla_transfer_files',
            remote_input_files=(
                os.path.join(app_workspace.path, 'my_script.py'),
                os.path.join(app_workspace.path, 'input_1'),
                os.path.join(app_workspace.path, 'input_2'),
            ),
            attributes=dict(
                executable='my_script.py',
                transfer_input_files=('../input_1', '../input_2'),
                transfer_output_files=('example_output1', 'example_output2'),
            ),
        )
        job_a.save()

        job_b = CondorWorkflowJobNode(
            name='JobB',
            workflow=workflow,
            condorpy_template_name='vanilla_transfer_files',
            remote_input_files=(
                os.path.join(app_workspace.path, 'my_script.py'),
                os.path.join(app_workspace.path, 'input_1'),
                os.path.join(app_workspace.path, 'input_2'),
            ),
            attributes=dict(
                executable='my_script.py',
                transfer_input_files=('../input_1', '../input_2'),
                transfer_output_files=('example_output1', 'example_output2'),
            ),
        )
        job_b.save()

        job_c = CondorWorkflowJobNode(
            name='JobC',
            workflow=workflow,
            condorpy_template_name='vanilla_transfer_files',
            remote_input_files=(
                os.path.join(app_workspace.path, 'my_script.py'),
                os.path.join(app_workspace.path, 'input_1'),
                os.path.join(app_workspace.path, 'input_2'),
            ),
            attributes=dict(
                executable='my_script.py',
                transfer_input_files=('../input_1', '../input_2'),
                transfer_output_files=('example_output1', 'example_output2'),
            ),
        )
        job_c.save()

        # Step 3: define the relationships among the saved nodes
        job_b.add_parent(job_a)
        job_c.add_parent(job_b)

        # Save the workflow to keep the changes for later execution
        workflow.save()

        # or

        # Execute the workflow now (executing also saves it)
        workflow.execute()

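
The two ``add_parent`` calls in the example above generalize to a sequence of any length. The following is a minimal sketch of that idea. ``chain_nodes`` is a hypothetical helper, not part of Tethys or CondorPy, and it assumes only that each node has the ``add_parent`` method used above and has already been saved.

.. code-block:: python

    def chain_nodes(nodes):
        """Make each node the parent of the node that follows it.

        ``nodes`` is an ordered list of already-saved job nodes. Only their
        ``add_parent`` method is used. (Hypothetical helper, not part of Tethys.)
        """
        # Pair every node with its successor: (n0, n1), (n1, n2), ...
        for parent, child in zip(nodes, nodes[1:]):
            child.add_parent(parent)

With the nodes from the example, ``chain_nodes([job_a, job_b, job_c])`` would make the same two calls: ``job_b.add_parent(job_a)`` and ``job_c.add_parent(job_b)``.
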
.. note::

    The ``CondorWorkflow`` object must be saved before the ``CondorWorkflowJobNode`` objects can be instantiated, and the ``CondorWorkflowJobNode`` objects must be saved before you can define the relationships.

The job must be saved before the controller returns a response; otherwise, the changes made to the job will be lost (executing the job saves it automatically). If submitting the job takes a long time (e.g. if a large amount of data has to be uploaded to a remote scheduler), it may be best to use AJAX to execute the job.

API Documentation
=================
.. autoclass:: tethys_compute.models.CondorWorkflow
.. autoclass:: tethys_compute.models.CondorWorkflowNode
.. autoclass:: tethys_compute.models.CondorWorkflowJobNode