Parallel Execution

The executor is the third stage of datamitsu's execution pipeline. It takes the ordered task groups from the planner and runs them — sequentially between groups, in parallel within groups — while enforcing fail-fast behavior so errors surface immediately.

Pre-Install Phase

Before any task runs, the runner installs every tool the plan needs up front. Immediately after planning and before the executor starts, it calls EnsureTools with the plan's deduped tool set, installing each distinct tool exactly once.

This is a hard invariant: all plan tools are installed before parallel execution begins. Installing ahead of time means no task ever triggers a lazy, on-demand install while sharing a tool with another task running concurrently. Same-tool concurrent installs cannot occur, so there is no race where one task observes a half-written or briefly-absent binary. Distinct tools may still install in parallel (that is safe — they touch different paths). Dry-run planning skips the installs entirely. A failure here aborts before any task runs, surfacing a clear install error instead of a confusing mid-execution failure.

Skipped Tools

A tool can end up skipped — neither run nor failed — for one of two reasons, both decided during planning so they appear in --explain and never reach the pre-install phase:

Disabled in config — the tool sets skip: true. Use this instead of conditionally omitting the tool: an omitted tool is invisible, a skipped one is reported.
No binary for this platform — the tool's binary has no build for the current OS/architecture/libc. This is a soft skip: the run still succeeds. (Previously this aborted the whole run with an install error.)

Skipped tools render as faint ⊘ <tool> skipped (<reason>) lines, contribute a · N skipped count to the operation footer, and surface as a skipped array in --explain=json. Other non-runs — a tool that matched no files, doesn't apply to the detected project types, or is disabled by .datamitsuignore — stay silent, as before.

The --fail-on-skip flag (on check/lint/fix) makes the run exit non-zero only for platform skips — a tool you expected to run had no binary for the runner. Intentional skip: true skips never fail the run.

Two-Layer Execution Model

Execution follows a two-layer structure:

Outer layer (sequential): Task groups execute one after another, ordered by priority. Group 1 must fully complete before Group 2 starts.
Inner layer (parallel): Within each group, non-overlapping tasks run concurrently across available CPU cores.

Why This Design

The two-layer model balances two competing goals:

Correctness: Formatters must finish before linters run, because formatters change files that linters check. Sequential groups enforce this ordering.
Performance: Independent tools at the same priority level (like eslint and tsc checking different file types) can safely run at the same time. Parallel execution within groups exploits this.

A fully sequential approach would be too slow in large monorepos. A fully parallel approach would produce incorrect results when tools depend on each other's output.

Fail-Fast Semantics

When any task fails, datamitsu cancels all remaining work immediately rather than continuing to run tools that are likely to fail or produce misleading results.

The mechanism works in three steps:

Failure detected: A task in the current group exits with a non-zero status.
Context cancelled: The executor calls the context cancellation function, signaling all in-flight and pending tasks.
Cleanup: Running tasks receive the cancellation signal. Tasks that haven't started yet skip execution immediately.

FailureReason Classification

Not all failures are equal. The executor classifies each failed task:

FailureReason	Meaning	Shown to User?
Independent	The tool failed on its own merit (real failure)	Yes
Cancelled	The task was terminated by another task's failure	No

This distinction is critical for usable error output. If eslint fails and causes tsc to be cancelled, the user sees only the eslint error — not a confusing cascade of cancellation messages. The runner filters out cancelled results when displaying output, so you only see the root cause.

Example scenario:

Group 2 contains: eslint (checking .ts files), tsc (checking .ts files)
Both start in parallel.

1. eslint finds errors → exits with code 1 → marked as Independent failure
2. Context cancelled
3. tsc receives cancellation → marked as Cancelled
4. User sees: only eslint errors (tsc result hidden)

Tool Ordering Best Practices

Priority values directly control the execution order. Here's a recommended pattern for a typical JavaScript/TypeScript project:

export function getConfig(input) {
  return {
    ...input,
    tools: {
      // Priority 10: Formatters run first
      // They modify files, so everything else must wait
      prettier: {
        name: "prettier",
        operations: {
          fix: {
            app: "prettier",
            args: ["--write", "{files}"],
            priority: 10,
            globs: ["**/*.{js,ts,tsx,css,json,md}"],
          },
        },
      },

      // Priority 20: Type checking and linting run after formatting
      // These read files but don't modify them, so they can run in parallel
      tsc: {
        name: "tsc",
        projectTypes: ["typescript"],
        operations: {
          lint: {
            app: "tsc",
            args: ["--noEmit"],
            priority: 20,
            globs: ["**/*.{ts,tsx}"],
            scope: "per-project",
          },
        },
      },
      eslint: {
        name: "eslint",
        operations: {
          lint: {
            app: "eslint",
            args: ["{files}"],
            priority: 20,
            globs: ["**/*.{js,ts,tsx}"],
          },
        },
      },

      // Priority 30: Custom checks that depend on lint passing
      "custom-check": {
        name: "custom-check",
        operations: {
          lint: {
            app: "custom-check",
            args: ["{files}"],
            priority: 30,
            globs: ["**/*.ts"],
          },
        },
      },
    },
  };
}

Why this order matters with fail-fast:

If prettier fails at priority 10, eslint and tsc never run — there's no point linting unformatted code.
If eslint fails at priority 20, custom-check at priority 30 never starts — it would likely fail too.
tsc and eslint share .ts/.tsx extensions, so at the same priority they run in separate sequential sub-groups within that group. Overlapping tools always run sequentially — the executor enforces this automatically.

Common mistakes:

Putting formatters and linters at the same priority: formatters modify files, so linters might see stale content if they run simultaneously.
Putting all tools at different priorities: this forces fully sequential execution, losing parallelism between independent tools.

Progress Tracking

The executor reports progress differently depending on the environment:

Interactive terminal (non-CI):

A live progress bar shows task completion across the current group.
Per-file progress updates the bar as each file is processed.
The bar disappears when the group completes.

CI environment:

No progress bar (CI terminals don't support carriage return rewriting).
A start message is printed when each tool begins: Starting prettier...
Percentage milestones are printed at 25% intervals to indicate liveness.
Final results appear after all groups complete.

The environment is detected automatically. CI mode activates when the CI environment variable is set (a convention followed by GitHub Actions, GitLab CI, Jenkins, and most CI systems). Datamitsu never sets CI=true itself — it respects whatever the system provides.

Concurrency Control

The number of parallel tasks is controlled by DATAMITSU_MAX_PARALLEL_WORKERS. The default is dynamic based on CPU count:

max(4, floor(NumCPU * 0.75)), capped at 16

On an 8-core machine, this means 6 parallel workers. On a 2-core machine, the minimum of 4 still applies. You can override this for CI environments where you want more or fewer workers:

# Use more workers on a beefy CI runner
DATAMITSU_MAX_PARALLEL_WORKERS=12 datamitsu check

# Use fewer workers to reduce memory pressure
DATAMITSU_MAX_PARALLEL_WORKERS=2 datamitsu check

Batch Chunking

When a tool operates on many files, the executor may need to split them into multiple batches to stay within command-line length limits (OS-imposed limits on argument length). Each batch runs as a separate process, and batches within the same task run in parallel.

This is transparent to tool configuration — you don't need to handle batching yourself. The executor automatically chunks the file list and merges results.

Pre-Install Phase​

Skipped Tools​

Two-Layer Execution Model​

Why This Design​

Fail-Fast Semantics​

FailureReason Classification​

Tool Ordering Best Practices​

Progress Tracking​

Concurrency Control​

Batch Chunking​