Beamline 7.3.3 Flows
This page documents the workflows supported by Splash Flows Globus at ALS Beamline 7.3.3 (SAXS/WAXS/GISAXS).
Beamline 7.3.3 supports hard x-ray scattering techniques, including small- and wide-angle x-ray scattering (SAXS/WAXS) and grazing-incidence SAXS/WAXS (GISAXS/GIWAXS).
Data at 7.3.3
The data collected from 7.3.3 are typically 2D scattering images, where each pixel records scattering intensity as a function of scattering angle.
File Watcher
A file watcher on the data733 system listens for new scans that have finished writing to disk. When one appears, it triggers the dispatcher Prefect Flow, which kicks off the downstream steps via the new_733_file_task task.
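A minimal sketch of the watcher's detection logic, assuming a simple polling approach; the function name find_finished_scans and the "settle time" heuristic are illustrative, and the real watcher may use filesystem events instead:

```python
import os
import time
from pathlib import Path


def find_finished_scans(watch_dir, seen, settle_seconds=2.0, now=None):
    """Return new files whose mtime has settled, i.e. finished writing.

    `seen` is a set of already-dispatched paths, mutated in place so each
    scan is only handed to the dispatcher once.
    """
    now = time.time() if now is None else now
    finished = []
    for path in Path(watch_dir).iterdir():
        if not path.is_file() or path in seen:
            continue
        # Treat a file as "finished" once it has not been modified recently.
        if now - path.stat().st_mtime >= settle_seconds:
            seen.add(path)
            finished.append(path)
    return finished
```

Each path returned here would then be passed to the dispatcher Flow.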
Prefect Configuration
Registered Flows
dispatcher.py
The Dispatcher Prefect Flow manages the logic for handling the order and execution of data tasks. As soon as the File Watcher detects that a new file is written, it calls the dispatcher() Flow. In this case, the dispatcher handles the synchronous call to move.py, with the potential to add additional steps (e.g. scheduling remote HPC analysis code).
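The ordering logic can be sketched as a plain function (the real dispatcher() is a Prefect Flow; the step callables here are stand-ins for the move.py Flow and any future analysis steps):

```python
def dispatcher(file_path, steps):
    """Run the downstream steps for a new file in order, synchronously.

    Each step is a callable taking the file path. Today the only step is
    the move flow; remote HPC analysis could be appended later.
    """
    results = []
    for step in steps:
        # Synchronous: each step finishes before the next one starts.
        results.append(step(file_path))
    return results
```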
move.py
This Flow processes a new file at BL 7.3.3:
1. Copy the data from data733 to Lamarr (our common staging area).
2. Copy the file from data733 to NERSC CFS.
3. Ingest the data from Lamarr into SciCat.
4. Schedule pruning from data733 for 6 months from now.
5. Archive the file from NERSC CFS to NERSC HPSS at some point in the future.
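The five steps above can be sketched as follows; the helper callables (copy_to_lamarr, etc.) are hypothetical stand-ins for the real Globus, SciCat, and Prefect scheduling code, and the 182-day prune window approximates "6 months from now":

```python
from datetime import datetime, timedelta


def process_new_733_file(file_path, copy_to_lamarr, copy_to_cfs,
                         ingest_scicat, schedule_prune, now=None):
    """Sketch of the move flow's steps; callables stand in for real helpers."""
    now = now or datetime.now()
    copy_to_lamarr(file_path)              # 1. data733 -> Lamarr staging area
    copy_to_cfs(file_path)                 # 2. data733 -> NERSC CFS
    ingest_scicat(file_path)               # 3. ingest from Lamarr into SciCat
    prune_at = now + timedelta(days=182)   # 4. prune data733 ~6 months out
    schedule_prune(file_path, prune_at)
    # 5. CFS -> HPSS archiving is handled by a separately scheduled flow
    return prune_at
```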
Prefect Server + Deployments
This beamline is starting fresh with Prefect==3.4.2 (an upgrade from 2.19.5). With recent Prefect versions, we can define deployments in a YAML file rather than with build/apply steps in a shell script. create_deployments_733.sh is the legacy way we supported registering flows; now, flows are defined in orchestration/flows/bl733/prefect.yaml. Keeping the Prefect config for the beamline within the flows folder makes it easier to keep track of the different Prefect deployments for different beamlines.
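An illustrative sketch of what such a prefect.yaml might contain; the top-level keys follow Prefect's deployment schema, but the entrypoint paths and the deployment names other than run_733_dispatcher are assumptions:

```yaml
# orchestration/flows/bl733/prefect.yaml (illustrative sketch)
name: bl733
prefect-version: 3.4.2

deployments:
  - name: run_733_dispatcher
    entrypoint: orchestration/flows/bl733/dispatcher.py:dispatcher
    work_pool:
      name: dispatcher_733_pool
  - name: new_file_733
    entrypoint: orchestration/flows/bl733/move.py:process_new_733_file
    work_pool:
      name: new_file_733_pool
```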
Note that we still must create work pools manually before we can register flows to them.
For example, here is how we can now create our deployments:
# cd to the directory
cd orchestration/flows/bl733/
# add the Prefect API URL to the environment (if not already present)
export PREFECT_API_URL=http://<your-prefect-server-for-bl733>:4200/api
# create the work-pools
prefect work-pool create new_file_733_pool
prefect work-pool create dispatcher_733_pool
prefect work-pool create prune_733_pool
prefect deploy
We can also preview a deployment with prefect deploy --output yaml, or deploy only one flow with prefect deploy --name run_733_dispatcher.
The following script applies this logic to deploy the flows in a streamlined fashion on the latest version of Prefect:
splash_flows/init_work_pools.py
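A sketch of what init_work_pools.py might do, assuming it shells out to the Prefect CLI to create the three work pools and then register the deployments (the actual script may use the Prefect client API instead):

```python
import subprocess

POOLS = ["new_file_733_pool", "dispatcher_733_pool", "prune_733_pool"]


def work_pool_commands(pools=POOLS):
    """Build the `prefect work-pool create` commands, one per pool."""
    return [["prefect", "work-pool", "create", pool] for pool in pools]


def init_work_pools(pools=POOLS, run=subprocess.run):
    """Create each work pool, then register all deployments in prefect.yaml."""
    for cmd in work_pool_commands(pools):
        run(cmd, check=False)  # pool may already exist; don't abort on failure
    run(["prefect", "deploy"], check=True)
```

Injecting `run` keeps the command-building logic testable without a Prefect server.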
Diagrams
Sequence Diagram
sequenceDiagram
participant T as Trigger<br/>Components
participant F as Prefect<br/>Flows
participant S as Storage &<br/>Processing
%% Initial Trigger
T->>T: Detector → File Watcher
T->>F: File Watcher<br/>triggers Dispatcher
F->>F: Dispatcher coordinates downstream Flows
%% Flow 1: new_file_733
rect rgb(220, 230, 255)
note over F,S: FLOW 1: new_file_733
F->>S: Access data733
S->>S: Globus Transfer to NERSC CFS
S->>S: Ingest metadata to SciCat
end
%% Flow 2: HPSS Transfer
rect rgb(220, 255, 230)
note over F,S: FLOW 2: Scheduled HPSS Transfer
F->>S: Access NERSC CFS
S->>S: SFAPI Transfer to HPSS Tape
S->>S: Ingest metadata to SciCat
end
%% Flow 3: HPC Analysis
rect rgb(255, 230, 230)
note over F,S: FLOW 3: HPC Downstream Analysis
F->>S: Access data733
S->>S: Globus Transfer to HPC
S->>S: Run HPC Compute Processing
S->>S: Return scratch data to data733
end
%% Flow 4: Scheduled Pruning
rect rgb(255, 255, 220)
note over F,S: FLOW 4: Scheduled Pruning
F->>S: Scheduled pruning jobs
S->>S: Prune old files from CFS
S->>S: Prune old files from data733
end
Data Infrastructure Workflows
---
config:
theme: neo
layout: elk
look: neo
---
flowchart
subgraph s1["new_file_733 Flow"]
n20["data733"]
n21["NERSC CFS"]
n22["SciCat<br>[Metadata Database]"]
end
subgraph s2["HPSS Transfer Flow"]
n38["NERSC CFS"]
n39["HPSS Tape Archive"]
n40["SciCat <br>[Metadata Database]"]
end
subgraph s3["HPC Analysis Flow"]
n41["data733"]
n42["HPC<br>Filesystem"]
n43["HPC<br>Compute"]
end
n23["data733"] -- File Watcher --> n24["Dispatcher<br>[Prefect Worker]"]
n25["Detector"] -- Raw Data --> n23
n24 --> s1 & s2 & s3
n20 -- Raw Data [Globus Transfer] --> n21
n21 -- "Metadata [SciCat Ingestion]" --> n22
n32["Scheduled Pruning <br>[Prefect Workers]"] --> n35["NERSC CFS"] & n34["data733"]
n38 -- Raw Data [SFAPI Slurm htar Transfer] --> n39
n39 -- "Metadata [SciCat Ingestion]" --> n40
s2 --> n32
s3 --> n32
s1 --> n32
n41 -- Raw Data [Globus Transfer] --> n42
n42 -- Raw Data --> n43
n43 -- Scratch Data --> n42
n42 -- Scratch Data [Globus Transfer] --> n41
n20@{ shape: internal-storage}
n21@{ shape: disk}
n22@{ shape: db}
n38@{ shape: disk}
n39@{ shape: paper-tape}
n40@{ shape: db}
n41@{ shape: internal-storage}
n42@{ shape: disk}
n23@{ shape: internal-storage}
n24@{ shape: rect}
n25@{ shape: rounded}
n35@{ shape: disk}
n34@{ shape: internal-storage}
n20:::storage
n20:::Peach
n21:::Sky
n22:::Sky
n38:::Sky
n39:::storage
n40:::Sky
n41:::Peach
n42:::Sky
n43:::compute
n23:::collection
n23:::storage
n23:::Peach
n24:::collection
n24:::Rose
n25:::Ash
n32:::Rose
n35:::Sky
n34:::Peach
classDef collection fill:#D3A6A1, stroke:#D3A6A1, stroke-width:2px, color:#000000
classDef Rose stroke-width:1px, stroke-dasharray:none, stroke:#FF5978, fill:#FFDFE5, color:#8E2236
classDef storage fill:#A3C1DA, stroke:#A3C1DA, stroke-width:2px, color:#000000
classDef Ash stroke-width:1px, stroke-dasharray:none, stroke:#999999, fill:#EEEEEE, color:#000000
classDef visualization fill:#E8D5A6, stroke:#E8D5A6, stroke-width:2px, color:#000000
classDef Peach stroke-width:1px, stroke-dasharray:none, stroke:#FBB35A, fill:#FFEFDB, color:#8F632D
classDef Sky stroke-width:1px, stroke-dasharray:none, stroke:#374D7C, fill:#E2EBFF, color:#374D7C
classDef compute fill:#A9C0C9, stroke:#A9C0C9, stroke-width:2px, color:#000000
style s1 stroke:#757575
style s2 stroke:#757575
style s3 stroke:#757575
VM Details
The computing backend runs on a VM in the B15 server room, managed by ALS IT staff.
Name: flow-733
OS: Ubuntu 24.04 LTS
We are using Ansible to streamline the development and support of this virtual machine. See https://github.com/als-computing/als_ansible/pull/4 for details.
Data Access for Users
Users can download their data from SciCat, our metadata database, which tracks file location history and additional experiment metadata.