attune/work-summary/phases/StackStorm-Lessons-Learned.md
2026-02-04 17:46:30 -06:00

StackStorm Lessons Learned

This project is inspired by StackStorm, a similar piece of software that suffers from some critical issues. Like Attune, it positions itself as an "if-this-then-that" job management system with actions, rules, and triggers at its center. Here are a few of its pitfalls that Attune should avoid:

  • StackStorm (st2) encourages tight coupling between custom actions and st2 itself, and provides minimal documentation for the action and sensor services.
  • StackStorm is written in Python with minimal type hints, and many properties are injected at runtime, so determining the types of things like the action service and sensor service by reading documentation is a massive pain. Attune should avoid this by implementing its core system in a strictly typed language like Rust to avoid ambiguity.
  • Only Python-based packs are natively supported. Packs including scripts or code in other languages must be installed with custom logic. Accommodations should be made for packs to be able to declare language ecosystem dependencies to allow for dynamic installation of dependencies. This can be achieved through a workflow that is built for each dependency ecosystem.
  • The Python version used to run the st2 services is nearing or past EOL, and upgrading brings all the issues of dependency hell. Building in Rust won't directly solve this issue, but custom jobs should not be coupled to the Attune system at all, so that upgrading the Attune system's dependencies does not impact the viability of independently built actions and workflows.
  • Inputs to all action executions are passed as environment variables or CLI arguments. This means that any user with login access to the server that runs StackStorm can read the secrets passed directly to actions. To avoid this security gap, there should be options to pass parameters as a standard-formatted text stream (like JSON or YAML) via stdin or a file.
  • Data streamed to stderr during the execution of actions is persisted to the database, which can cause jobs to fail unexpectedly just by logging too much. To address this, the stderr output from jobs should be written to a managed logfile system, with a rotating, size-bound logfile per execution.
  • When a policy limits the execution of an action and several executions of that action are delayed at the same time, StackStorm does not guarantee that the delayed executions are scheduled in the order they were requested. Attune should include some kind of queueing system for delayed actions to ensure that this isn't an issue.
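
As an illustration of the point about passing parameters over stdin rather than through environment variables or CLI arguments, here is a minimal Rust sketch using only the standard library. The function name `run_with_stdin_params` is hypothetical and not part of any existing Attune API; it assumes the action is an external program that reads a JSON document from stdin.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Hypothetical helper: runs `program` with `args`, writes `params_json`
/// to the child's stdin, and returns whatever the child printed to stdout.
/// The parameters never appear in the argument list or the environment.
pub fn run_with_stdin_params(
    program: &str,
    args: &[&str],
    params_json: &str,
) -> std::io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write the parameters, then drop the handle so the child sees EOF.
    // Note: a very large payload should be written from a separate thread
    // to avoid blocking while the child's stdout pipe fills up.
    {
        let mut stdin = child.stdin.take().expect("stdin was requested as piped");
        stdin.write_all(params_json.as_bytes())?;
    }

    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

On a Unix-like system, calling `run_with_stdin_params("cat", &[], r#"{"api_token":"s3cret"}"#)` should hand the JSON to `cat` over a pipe and return it unchanged.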
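
The rotating, size-bound logfile suggested for stderr output could look roughly like the sketch below. `RotatingLog` and its methods are hypothetical names, and the policy shown (one active file plus a single rotated `.1` file) is just one assumption about how rotation might work.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::PathBuf;

/// Hypothetical size-bound log for one execution's stderr.
pub struct RotatingLog {
    path: PathBuf,
    max_bytes: u64,
    written: u64,
    file: File,
}

impl RotatingLog {
    /// Creates (or truncates) the active logfile at `path`.
    pub fn create(path: PathBuf, max_bytes: u64) -> io::Result<Self> {
        let file = OpenOptions::new()
            .create(true)
            .write(true)
            .truncate(true)
            .open(&path)?;
        Ok(Self { path, max_bytes, written: 0, file })
    }

    /// Appends a chunk. If the active file would grow past `max_bytes`,
    /// it is first rotated to `<path>.1`, replacing any older rotated file.
    /// A single chunk larger than `max_bytes` is still written whole.
    pub fn append(&mut self, chunk: &[u8]) -> io::Result<()> {
        if self.written > 0 && self.written + chunk.len() as u64 > self.max_bytes {
            self.rotate()?;
        }
        self.file.write_all(chunk)?;
        self.written += chunk.len() as u64;
        Ok(())
    }

    fn rotate(&mut self) -> io::Result<()> {
        self.file.flush()?;
        let mut rotated = self.path.clone().into_os_string();
        rotated.push(".1");
        fs::rename(&self.path, PathBuf::from(rotated))?;
        // Start a fresh active file at the original path.
        self.file = File::create(&self.path)?;
        self.written = 0;
        Ok(())
    }
}
```

Because only the active file and one rotated file are kept, a noisy action can use at most about twice `max_bytes` on disk (plus one oversized chunk) instead of growing a database row without bound.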
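
The ordering requirement for policy-delayed executions can be sketched with a per-action FIFO queue. `ActionQueue` is a hypothetical type, and a simple concurrency limit stands in for whatever policies Attune ends up supporting.

```rust
use std::collections::VecDeque;

/// Hypothetical per-action concurrency policy with a FIFO backlog.
#[derive(Debug)]
pub struct ActionQueue {
    max_concurrent: usize,
    running: usize,
    // Executions delayed by the policy, oldest request at the front.
    delayed: VecDeque<u64>,
}

impl ActionQueue {
    pub fn new(max_concurrent: usize) -> Self {
        Self { max_concurrent, running: 0, delayed: VecDeque::new() }
    }

    /// Requests an execution. Returns `true` if it may start immediately,
    /// or `false` if it was placed at the back of the delayed queue.
    pub fn request(&mut self, execution_id: u64) -> bool {
        // New requests never jump ahead of executions that are already waiting.
        if self.running < self.max_concurrent && self.delayed.is_empty() {
            self.running += 1;
            true
        } else {
            self.delayed.push_back(execution_id);
            false
        }
    }

    /// Marks one running execution as finished and returns the next delayed
    /// execution to start, if any, in the order it was requested.
    pub fn complete(&mut self) -> Option<u64> {
        self.running = self.running.saturating_sub(1);
        if self.running < self.max_concurrent {
            if let Some(next) = self.delayed.pop_front() {
                self.running += 1;
                return Some(next);
            }
        }
        None
    }
}
```

With a limit of one, requesting executions 1, 2, and 3 starts 1 immediately and delays the others; each call to `complete` then releases 2 and 3 in the order they were requested.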