Beamline 7.3.3 Flows
This page documents the workflows supported by Splash Flows Globus at ALS Beamline 7.3.3 (SAXS/WAXS/GISAXS).
Beamline 7.3.3 supports hard x-ray scattering techniques, including small- and wide-angle x-ray scattering (SAXS/WAXS) and grazing-incidence SAXS/WAXS (GISAXS/GIWAXS).
Data at 7.3.3
The data collected from 7.3.3 are typically 2D scattering images, where each pixel records scattering intensity as a function of scattering angle.
File Watcher
A file watcher on the data733 system listens for new scans that have finished writing to disk. When one appears, it triggers the dispatcher Prefect Flow, which kicks off the downstream steps via the new_733_file_task task.
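A minimal sketch of the watcher's detection logic, assuming a simple polling approach; the function name find_finished_scans and the "settle time" heuristic are illustrative, and the real watcher may use filesystem events instead:

```python
import os
import time
from pathlib import Path


def find_finished_scans(watch_dir, seen, settle_seconds=2.0, now=None):
    """Return new files whose mtime has settled, i.e. finished writing.

    `seen` is a set of already-dispatched paths, mutated in place so each
    scan is only handed to the dispatcher once.
    """
    now = time.time() if now is None else now
    finished = []
    for path in Path(watch_dir).iterdir():
        if not path.is_file() or path in seen:
            continue
        # Treat a file as "finished" once it has not been modified recently.
        if now - path.stat().st_mtime >= settle_seconds:
            seen.add(path)
            finished.append(path)
    return finished
```

Each path returned here would then be passed to the dispatcher Flow.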
Prefect Configuration
Registered Flows
dispatcher.py
The Dispatcher Prefect Flow manages the logic for handling the order and execution of data tasks. As soon as the File Watcher detects that a new file is written, it calls the dispatcher() Flow. In this case, the dispatcher handles the synchronous call to move.py, with the potential to add additional steps (e.g. scheduling remote HPC analysis code).
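The ordering logic can be sketched as a plain function (the real dispatcher() is a Prefect Flow; the step callables here are stand-ins for the move.py Flow and any future analysis steps):

```python
def dispatcher(file_path, steps):
    """Run the downstream steps for a new file in order, synchronously.

    Each step is a callable taking the file path. Today the only step is
    the move flow; remote HPC analysis could be appended later.
    """
    results = []
    for step in steps:
        # Synchronous: each step finishes before the next one starts.
        results.append(step(file_path))
    return results
```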
move.py
This Flow processes a new file at BL 7.3.3:
1. Copy the data from data733 to Lamarr (our common staging area).
2. Copy the file from data733 to NERSC CFS.
3. Ingest the data from Lamarr into SciCat.
4. Schedule pruning from data733 for 6 months from now.
5. Archive the file from NERSC CFS to NERSC HPSS at some point in the future.
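The five steps above can be sketched as follows; the helper callables (copy_to_lamarr, etc.) are hypothetical stand-ins for the real Globus, SciCat, and Prefect scheduling code, and the 182-day prune window approximates "6 months from now":

```python
from datetime import datetime, timedelta


def process_new_733_file(file_path, copy_to_lamarr, copy_to_cfs,
                         ingest_scicat, schedule_prune, now=None):
    """Sketch of the move flow's steps; callables stand in for real helpers."""
    now = now or datetime.now()
    copy_to_lamarr(file_path)              # 1. data733 -> Lamarr staging area
    copy_to_cfs(file_path)                 # 2. data733 -> NERSC CFS
    ingest_scicat(file_path)               # 3. ingest from Lamarr into SciCat
    prune_at = now + timedelta(days=182)   # 4. prune data733 ~6 months out
    schedule_prune(file_path, prune_at)
    # 5. CFS -> HPSS archiving is handled by a separately scheduled flow
    return prune_at
```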
Prefect Server + Deployments
This beamline is starting fresh with Prefect==3.4.2 (an upgrade from 2.19.5). With recent Prefect versions, we can define deployments in a YAML file rather than with build/apply steps in a shell script. create_deployments_733.sh is the legacy way we supported registering flows; now, flows are defined in orchestration/flows/bl733/prefect.yaml. Keeping the Prefect config for the beamline within the flows folder makes it easier to keep track of the different Prefect deployments for different beamlines.
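An illustrative sketch of what such a prefect.yaml might contain; the top-level keys follow Prefect's deployment schema, but the entrypoint paths and the deployment names other than run_733_dispatcher are assumptions:

```yaml
# orchestration/flows/bl733/prefect.yaml (illustrative sketch)
name: bl733
prefect-version: 3.4.2

deployments:
  - name: run_733_dispatcher
    entrypoint: orchestration/flows/bl733/dispatcher.py:dispatcher
    work_pool:
      name: dispatcher_733_pool
  - name: new_file_733
    entrypoint: orchestration/flows/bl733/move.py:process_new_733_file
    work_pool:
      name: new_file_733_pool
```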
Note that we still must create work pools manually before we can register flows to them.
For example, here is how we can now create our deployments:
# cd to the directory
cd orchestration/flows/bl733/
# add the Prefect API URL to the environment (if not already present)
export PREFECT_API_URL=http://<your-prefect-server-for-bl733>:4200/api
# create the work-pools
prefect work-pool create new_file_733_pool
prefect work-pool create dispatcher_733_pool
prefect work-pool create prune_733_pool
prefect deploy
We can also preview a deployment with prefect deploy --output yaml, or deploy only one flow with prefect deploy --name run_733_dispatcher.
The following script applies this logic to deploy the flows in a streamlined fashion on the latest version of Prefect:
splash_flows/init_work_pools.py
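A sketch of what init_work_pools.py might do, assuming it shells out to the Prefect CLI to create the three work pools and then register the deployments (the actual script may use the Prefect client API instead):

```python
import subprocess

POOLS = ["new_file_733_pool", "dispatcher_733_pool", "prune_733_pool"]


def work_pool_commands(pools=POOLS):
    """Build the `prefect work-pool create` commands, one per pool."""
    return [["prefect", "work-pool", "create", pool] for pool in pools]


def init_work_pools(pools=POOLS, run=subprocess.run):
    """Create each work pool, then register all deployments in prefect.yaml."""
    for cmd in work_pool_commands(pools):
        run(cmd, check=False)  # pool may already exist; don't abort on failure
    run(["prefect", "deploy"], check=True)
```

Injecting `run` keeps the command-building logic testable without a Prefect server.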
Diagrams
Sequence Diagram
sequenceDiagram
participant T as Trigger<br/>Components
participant F as Prefect<br/>Flows
participant S as Storage &<br/>Processing
%% Initial Trigger
T->>T: Detector → File Watcher
T->>F: File Watcher<br/>triggers Dispatcher
F->>F: Dispatcher coordinates downstream Flows
%% Flow 1: new_file_733
rect rgb(220, 230, 255)
note over F,S: FLOW 1: new_file_733
F->>S: Access data733
S->>S: Globus Transfer to NERSC CFS
S->>S: Ingest metadata to SciCat
end
%% Flow 2: HPSS Transfer
rect rgb(220, 255, 230)
note over F,S: FLOW 2: Scheduled HPSS Transfer
F->>S: Access NERSC CFS
S->>S: SFAPI Transfer to HPSS Tape
S->>S: Ingest metadata to SciCat
end
%% Flow 3: HPC Analysis
rect rgb(255, 230, 230)
note over F,S: FLOW 3: HPC Downstream Analysis
F->>S: Access data733
S->>S: Globus Transfer to HPC
S->>S: Run HPC Compute Processing
S->>S: Return scratch data to data733
end
%% Flow 4: Scheduled Pruning
rect rgb(255, 255, 220)
note over F,S: FLOW 4: Scheduled Pruning
F->>S: Scheduled pruning jobs
S->>S: Prune old files from CFS
S->>S: Prune old files from data733
end
Data Infrastructure Workflows
---
config:
theme: neo
layout: elk
look: neo
---
flowchart
subgraph s1["new_file_733 Flow"]
n20["data733"]
n21["NERSC CFS"]
n22["SciCat<br>[Metadata Database]"]
end
subgraph s2["HPSS Transfer Flow"]
n38["NERSC CFS"]
n39["HPSS Tape Archive"]
n40["SciCat <br>[Metadata Database]"]
end
subgraph s3["HPC Analysis Flow"]
n41["data733"]
n42["HPC<br>Filesystem"]
n43["HPC<br>Compute"]
end
n23["data733"] -- File Watcher --> n24["Dispatcher<br>[Prefect Worker]"]
n25["Detector"] -- Raw Data --> n23
n24 --> s1 & s2 & s3
n20 -- Raw Data [Globus Transfer] --> n21
n21 -- "Metadata [SciCat Ingestion]" --> n22
n32["Scheduled Pruning <br>[Prefect Workers]"] --> n35["NERSC CFS"] & n34["data733"]
n38 -- Raw Data [SFAPI Slurm htar Transfer] --> n39
n39 -- "Metadata [SciCat Ingestion]" --> n40
s2 --> n32
s3 --> n32
s1 --> n32
n41 -- Raw Data [Globus Transfer] --> n42
n42 -- Raw Data --> n43
n43 -- Scratch Data --> n42
n42 -- Scratch Data [Globus Transfer] --> n41
n20@{ shape: internal-storage}
n21@{ shape: disk}
n22@{ shape: db}
n38@{ shape: disk}
n39@{ shape: paper-tape}
n40@{ shape: db}
n41@{ shape: internal-storage}
n42@{ shape: disk}
n23@{ shape: internal-storage}
n24@{ shape: rect}
n25@{ shape: rounded}
n35@{ shape: disk}
n34@{ shape: internal-storage}
n20:::storage
n20:::Peach
n21:::Sky
n22:::Sky
n38:::Sky
n39:::storage
n40:::Sky
n41:::Peach
n42:::Sky
n43:::compute
n23:::collection
n23:::storage
n23:::Peach
n24:::collection
n24:::Rose
n25:::Ash
n32:::Rose
n35:::Sky
n34:::Peach
classDef collection fill:#D3A6A1, stroke:#D3A6A1, stroke-width:2px, color:#000000
classDef Rose stroke-width:1px, stroke-dasharray:none, stroke:#FF5978, fill:#FFDFE5, color:#8E2236
classDef storage fill:#A3C1DA, stroke:#A3C1DA, stroke-width:2px, color:#000000
classDef Ash stroke-width:1px, stroke-dasharray:none, stroke:#999999, fill:#EEEEEE, color:#000000
classDef visualization fill:#E8D5A6, stroke:#E8D5A6, stroke-width:2px, color:#000000
classDef Peach stroke-width:1px, stroke-dasharray:none, stroke:#FBB35A, fill:#FFEFDB, color:#8F632D
classDef Sky stroke-width:1px, stroke-dasharray:none, stroke:#374D7C, fill:#E2EBFF, color:#374D7C
classDef compute fill:#A9C0C9, stroke:#A9C0C9, stroke-width:2px, color:#000000
style s1 stroke:#757575
style s2 stroke:#757575
style s3 stroke:#757575
VM Details
The computing backend runs on a VM in the B15 server room, managed by ALS IT staff.
Name: flow-733
OS: Ubuntu 24.04 LTS
We are using Ansible to streamline the development and support of this virtual machine. See https://github.com/als-computing/als_ansible/pull/4 for details.
Data Access for Users
Users can download their data from SciCat, our metadata database, which tracks file location history and additional experiment metadata.