Condor Workflow Job Type

Last Updated: January 2022

Important

This feature requires the condorpy library to be installed. Starting with Tethys 5.0, or if you are using micro-tethys-platform, you will need to install condorpy using conda or pip as follows:

bash
# conda: conda-forge channel strongly recommended
conda install -c conda-forge condorpy

# pip
pip install condorpy

A Condor Workflow provides a way to run a group of jobs (which can have hierarchical relationships) as a single (Tethys) job. The hierarchical relationships are defined as parent-child relationships. For example, suppose a workflow is defined with three jobs: JobA, JobB, and JobC, which must be run in that order. These jobs would be defined with the following relationships: JobA is the parent of JobB, and JobB is the parent of JobC.
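
To make the parent-child ordering concrete, the sketch below models the JobA, JobB, JobC example as a plain mapping from each job to its parents and derives a valid run order from it. It uses only the Python standard library (graphlib, Python 3.9+) and is an illustration of the concept, not how Tethys or HTCondor schedules jobs internally; the `parents` mapping is a hypothetical name introduced for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each job name to the set of its parents (jobs that must finish first).
parents = {
    "JobA": set(),
    "JobB": {"JobA"},
    "JobC": {"JobB"},
}

# static_order() yields each job only after all of its parents.
run_order = list(TopologicalSorter(parents).static_order())
print(run_order)  # ['JobA', 'JobB', 'JobC']
```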

See also

The Condor Workflow job type uses the CondorPy library to submit jobs to HTCondor compute pools. For more information on CondorPy and HTCondor, see the CondorPy documentation, specifically the Overview of HTCondor.

Creating a Condor Workflow

Creating a Condor Workflow job involves 3 steps:

  1. Create an empty Workflow job from the job manager.

  2. Create the jobs that will make up the workflow with CondorWorkflowJobNode.

  3. Define the relationships among the nodes.

python
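import os

from tethysapp.my_first_app.app import MyFirstApp as app
from tethys_sdk.jobs import CondorWorkflowJobNode
from tethys_sdk.workspaces import app_workspace


@app_workspace
def some_controller(request, app_workspace):
    # Get the job manager for this app; it creates and tracks the app's jobs.
    job_manager = app.get_job_manager()

    # Step 1: create an empty workflow job and save it before adding nodes.
    workflow = job_manager.create_job(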
        name='MyWorkflowABC',
        user=request.user,
        job_type='CONDORWORKFLOW',
        scheduler=app.get_scheduler('condor_primary'),
    )
    workflow.save()

    job_a = CondorWorkflowJobNode(
        name='JobA',
        workflow=workflow,
        condorpy_template_name='vanilla_transfer_files',
        remote_input_files=(
            os.path.join(app_workspace.path, 'my_script.py'),
            os.path.join(app_workspace.path, 'input_1'),
            os.path.join(app_workspace.path, 'input_2')
        ),
        attributes=dict(
            executable='my_script.py',
            transfer_input_files=('../input_1', '../input_2'),
            transfer_output_files=('example_output1', 'example_output2'),
        )
    )
    job_a.save()

    job_b = CondorWorkflowJobNode(
        name='JobB',
        workflow=workflow,
        condorpy_template_name='vanilla_transfer_files',
        remote_input_files=(
            os.path.join(app_workspace.path, 'my_script.py'),
            os.path.join(app_workspace.path, 'input_1'),
            os.path.join(app_workspace.path, 'input_2')
        ),
        attributes=dict(
            executable='my_script.py',
            transfer_input_files=('../input_1', '../input_2'),
            transfer_output_files=('example_output1', 'example_output2'),
        ),
    )
    job_b.save()

    job_c = CondorWorkflowJobNode(
        name='JobC',
        workflow=workflow,
        condorpy_template_name='vanilla_transfer_files',
        remote_input_files=(
            os.path.join(app_workspace.path, 'my_script.py'),
            os.path.join(app_workspace.path, 'input_1'),
            os.path.join(app_workspace.path, 'input_2')
        ),
        attributes=dict(
            executable='my_script.py',
            transfer_input_files=('../input_1', '../input_2'),
            transfer_output_files=('example_output1', 'example_output2'),
        ),
    )
    job_c.save()

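    # Define the relationships: JobA runs first, then JobB, then JobC.
    job_b.add_parent(job_a)
    job_c.add_parent(job_b)

    # Only one of the following calls is needed.
    # save() stores the workflow without submitting it:
    workflow.save()
    # or execute() submits the workflow and saves it automatically:
    workflow.execute()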
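
The three nodes in the example above differ only in their name, so the shared settings can be collected in one place. The following is a minimal sketch of that idea; `build_node_kwargs` is a hypothetical helper introduced here (it is not part of the Tethys SDK), and it assumes `workspace_path` is the workspace directory given as a string.

```python
import os


def build_node_kwargs(name, workflow, workspace_path):
    """Return keyword arguments for one CondorWorkflowJobNode (hypothetical helper)."""
    # Files uploaded for every node, relative to the workspace directory.
    input_names = ('my_script.py', 'input_1', 'input_2')
    return dict(
        name=name,
        workflow=workflow,
        condorpy_template_name='vanilla_transfer_files',
        remote_input_files=tuple(
            os.path.join(workspace_path, file_name) for file_name in input_names
        ),
        attributes=dict(
            executable='my_script.py',
            transfer_input_files=('../input_1', '../input_2'),
            transfer_output_files=('example_output1', 'example_output2'),
        ),
    )
```

Inside the controller, each node could then be created with, for example, `CondorWorkflowJobNode(**build_node_kwargs('JobA', workflow, workspace_path))`, followed by `save()` as before.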

Note

The CondorWorkflow object must be saved before the CondorWorkflowJobNode objects can be instantiated, and the CondorWorkflowJobNode objects must be saved before you can define the relationships.

The job must be saved before the controller returns a response; otherwise, the changes made to the job will be lost (executing the job saves it automatically). If submitting the job takes a long time (e.g., if a large amount of data has to be uploaded to a remote scheduler), it may be best to execute the job with AJAX.
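
One way to support the AJAX approach is to have a dedicated controller execute the job and return a small JSON payload that the page can inspect. The sketch below shows only the payload-building part; `execute_and_summarize` is a hypothetical helper, not a Tethys API, and it assumes only that the job object has an `execute()` method and a `name` attribute, as in the example above.

```python
def execute_and_summarize(job):
    """Execute a job and return a JSON-serializable summary (hypothetical helper)."""
    try:
        # Executing the job also saves it, so no separate save() call is needed.
        job.execute()
    except Exception as error:
        # Report the failure to the caller instead of raising.
        return {'success': False, 'error': str(error)}
    return {'success': True, 'job_name': job.name}
```

In an AJAX controller, the returned dictionary could be wrapped in a response such as Django's `JsonResponse(execute_and_summarize(workflow))`, so that the client-side script can show a success or error message without blocking the original page load.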

API Documentation

class tethys_compute.models.CondorWorkflow(*args, **kwargs)

CondorPy Workflow job type

class tethys_compute.models.CondorWorkflowNode(*args, **kwargs)

Base class for CondorWorkflow Nodes

class tethys_compute.models.CondorWorkflowJobNode(*args, **kwargs)

CondorWorkflow JOB type node