High Level ODI Pipeline Workflow Architecture

Francisco Valdes, Nov. 11, 2011

1. The Pipeline Developer's View

The architecture of the ODI Automatic Calibration Pipeline (AuCaP) can be viewed at many levels of granularity. The pipeline developer's view of the AuCaP "pipeline application" is of a number of small functional "pipelines" in a map/reduce organization. This is also the view for a native NHPPS pipeline application running on a dedicated cluster.

Figure 1 shows this pipeline developer's view where each element of the diagram is one of the functional pipelines. The function of each of these pipelines and details of the organization may be found in Valdes [PL013]. As an aside, the pieces in red indicate a prototype workflow. The lines and terminators indicate the transformations of mapping (breaking up a dataset into parallel pieces) and reducing (aggregating the parallel pieces).
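
The map and reduce transformations can be illustrated with a minimal Python sketch. It is not AuCaP or NHPPS code: the function name map_reduce, its arguments, and the exposure names are all hypothetical, chosen only to show a dataset being broken into parallel pieces and the processed pieces being aggregated.

```python
from typing import Callable, List, Sequence, TypeVar

Piece = TypeVar("Piece")
Result = TypeVar("Result")


def map_reduce(
    dataset: Sequence[str],
    split: Callable[[Sequence[str]], List[Piece]],
    process: Callable[[Piece], Result],
    combine: Callable[[List[Result]], Result],
) -> Result:
    """Break a dataset into pieces, process each piece, aggregate the results."""
    pieces = split(dataset)                   # map: break up the dataset
    results = [process(p) for p in pieces]    # each piece is processed independently
    return combine(results)                   # reduce: aggregate the pieces


# Hypothetical example: one piece per exposure (names are illustrative only).
exposures = ["exp1", "exp2", "exp3"]
calibrated = map_reduce(
    exposures,
    split=lambda ds: [[name] for name in ds],
    process=lambda piece: [name + ".cal" for name in piece],
    combine=lambda results: [name for r in results for name in r],
)
```

The list comprehension runs the pieces one after another; in a real pipeline the pieces would be distributed across nodes, which is possible because no piece depends on another.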

The connections also illustrate the order in which the workflow traverses the pipelines. The traversal is depth-first, moving from left to right and top to bottom in the figure. For example, from the top level pipeline, called "top", the workflow proceeds first to the caB pipeline and then to the seq pipeline. When all nodes in the caB pipeline have been traversed, the caF pipeline and its children are next, and similarly for the ftr and ftG pipelines.
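
The traversal order can be sketched as a depth-first (pre-order) walk in Python. The tree below is a partial, hypothetical reconstruction: it contains only the pipelines named in the text, assumes that seq is a child of caB, and omits the children of caF, ftr, and ftG. The names PIPELINE_TREE and traverse are illustrative.

```python
from typing import Dict, List

# Partial, assumed tree: only pipelines named in the text are listed.
# Children of caF, ftr and ftG are not listed here and are omitted.
PIPELINE_TREE: Dict[str, List[str]] = {
    "top": ["caB", "caF", "ftr", "ftG"],
    "caB": ["seq"],
}


def traverse(node: str, tree: Dict[str, List[str]]) -> List[str]:
    """Return the pipelines in depth-first order, visiting children left to right."""
    order = [node]
    for child in tree.get(node, []):
        # Finish the whole subtree of this child before moving to its sibling.
        order.extend(traverse(child, tree))
    return order


order = traverse("top", PIPELINE_TREE)
# order is ["top", "caB", "seq", "caF", "ftr", "ftG"]
```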

Figure 1: Pipeline Developer's View (red = prototype)

2. The PPA Developer's View

At a higher architectural level, which is the primary subject of this document, the view is as shown in figure 2. This is the view of the PPA (Pipeline, Portal, and Archive) system developers. In this view all the structure in the pipeline tree beyond the first set of nodes is collapsed into "workflows".

Figure 2: PPA Developer's View

The figure further shows the two key PPA components which encapsulate the functionality. The Data Engine is the component that handles data staging, archiving, and calling the OGCE (Open Grid Computing Environment) workflow engine to execute the four workflows indicated.

Note that the "crowfoot" (map) and "vee" (reduce) terminators indicate the map/reduce transformations. In figure 2 there is only one execution of the caB workflow, which processes all the bias data; hence it has no crowfoot terminator. The other workflows (the dome flat processing, the static exposure processing, and the guided exposure processing) can be parallelized by filter. This parallelization is indicated by the crowfoot terminator.
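
The difference between a single execution and parallelization by filter can be sketched as follows. This is an illustrative Python sketch only: the function plan_executions, the exposure records, and the filter names "g" and "r" are hypothetical, and using "caF" as the label for the dome flat workflow is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Execution = Tuple[str, Optional[str], List[str]]  # (workflow, filter, exposure names)


def plan_executions(
    workflow: str, exposures: List[Dict[str, Optional[str]]], by_filter: bool
) -> List[Execution]:
    """Return one execution for all data, or one execution per filter."""
    if not by_filter:
        # No crowfoot: a single execution processes every exposure.
        return [(workflow, None, [e["name"] for e in exposures])]
    groups: Dict[str, List[str]] = defaultdict(list)
    for e in exposures:
        groups[e["filter"]].append(e["name"])
    # Crowfoot: one independent execution per filter.
    return [(workflow, f, names) for f, names in sorted(groups.items())]


# Hypothetical inputs.
biases = [{"name": "b1", "filter": None}, {"name": "b2", "filter": None}]
flats = [
    {"name": "f1", "filter": "g"},
    {"name": "f2", "filter": "r"},
    {"name": "f3", "filter": "g"},
]

bias_plan = plan_executions("caB", biases, by_filter=False)
flat_plan = plan_executions("caF", flats, by_filter=True)
```

Here bias_plan holds one execution covering both bias exposures, while flat_plan holds two executions, one for each filter.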

In figure 3 the architecture of figure 2 is depicted as a sequence diagram. The top pipeline is effectively an orchestrator that executes the sequence of steps shown for the Data Engine. The top label in the figure is included simply to indicate that the Data Engine and the top pipeline are functionally the same.
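
The orchestrator role can be sketched in Python as below. The step order is an assumption based on the responsibilities listed for the Data Engine (staging, calling the workflow engine, archiving) and on the traversal order of the first-level pipelines; it is not a transcription of figure 3. The names run_data_engine and run_workflow are hypothetical, and run_workflow stands in for the call to the OGCE workflow engine.

```python
from typing import Callable, List

# Assumed order of the four workflows, following the traversal order.
WORKFLOWS = ("caB", "caF", "ftr", "ftG")


def run_data_engine(run_workflow: Callable[[str], None]) -> List[str]:
    """Run the assumed top-level sequence and return a log of the steps taken."""
    log = ["stage data"]                 # stage input data before any workflow runs
    for name in WORKFLOWS:
        run_workflow(name)               # placeholder for the workflow engine call
        log.append("ran " + name)        # each workflow completes before the next starts
    log.append("archive results")        # archive once all workflows have finished
    return log


# Usage with a stand-in engine that only records which workflows were requested.
requested: List[str] = []
steps = run_data_engine(requested.append)
```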

Figure 3: PPA Sequence View

3. The Expanded Data Engine Sequence Logic and Components

The full PPA system dealing with the AuCaP processing involves several additional components and orchestration logic centered around the Data Engine as the top level of the workflow. Figure 4 shows the sequence logic and components (summarized below).