Internal Architecture

This section explains how datamitsu works under the hood. Understanding the internal execution model helps wrapper maintainers optimize tool configurations and advanced users debug unexpected behavior.

How It All Fits Together

When you run datamitsu check, the system moves through four stages:

  1. File Discovery walks the repository tree, respecting .gitignore rules, and collects all files that match tool glob patterns.
  2. Task Planning groups matched files into tasks based on tool priorities, scopes, and project boundaries. Overlapping globs are detected and resolved.
  3. Parallel Execution runs task groups one at a time in priority order; within each group, tasks run in parallel across the available CPU cores.
  4. Cache Update records results per file so unchanged files are skipped on the next run.
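The execution model in stage 3 can be illustrated with a small sketch. This is not datamitsu's actual scheduler: the function name `run_by_priority`, the task tuple shape, the use of a thread pool, and the assumption that lower priority numbers run first are all hypothetical choices made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def run_by_priority(tasks, max_workers=None):
    """Run (priority, name, callable) tasks group by group.

    Hypothetical sketch: groups run one after another in ascending
    priority order (an assumption), while tasks that share a priority
    are submitted together and run concurrently.
    """
    results = []
    # sorted() is stable, so tasks keep their input order within a priority.
    ordered = sorted(tasks, key=lambda t: t[0])
    with ThreadPoolExecutor(max_workers=max_workers or os.cpu_count()) as pool:
        for priority, group in groupby(ordered, key=lambda t: t[0]):
            # Submit the whole group at once so its tasks can overlap.
            futures = [(name, pool.submit(fn)) for _, name, fn in group]
            # Block until every task in this group finishes before moving
            # on. If a task raised, result() re-raises here and no later
            # group is submitted.
            results.append((priority, [(name, f.result()) for name, f in futures]))
    return results
```

In this sketch, two tools that share a priority run side by side, while giving each tool its own priority forces them to run one after another. That is the serialization effect that misconfigured priorities can cause.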

Why This Matters

For wrapper maintainers: Understanding how priorities and overlap detection work lets you write tool configurations that maximize parallelism. Misconfigured priorities can serialize tools that could run in parallel, slowing down CI pipelines.

For advanced users: Knowing how file discovery and caching interact explains why certain files are or aren't processed, and how to force cache invalidation when needed.
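To make the caching interaction concrete, here is a minimal sketch of a per-file cache keyed on file content. It makes several assumptions: the helper names (`content_key`, `files_to_process`, `record_results`) and the `force` parameter are hypothetical, the cache is a plain in-memory dict, and BLAKE2b truncated to 128 bits stands in for the XXH3-128 hash named in the component table, because XXH3 is not in the Python standard library.

```python
import hashlib


def content_key(data: bytes) -> str:
    # Stand-in 128-bit digest; datamitsu's docs name XXH3-128 instead.
    return hashlib.blake2b(data, digest_size=16).hexdigest()


def files_to_process(files, cache, operation, force=False):
    """Return the paths whose content changed since the last recorded run.

    files:     mapping of path -> file bytes (hypothetical input shape)
    cache:     mapping of (operation, path) -> content key
    operation: e.g. "lint" or "fix", tracked separately
    force:     hypothetical switch that ignores the cache entirely
    """
    pending = []
    for path, data in sorted(files.items()):
        if force or cache.get((operation, path)) != content_key(data):
            pending.append(path)
    return pending


def record_results(files, cache, operation, processed):
    # Store the key of the content that was actually processed.
    for path in processed:
        cache[(operation, path)] = content_key(files[path])
```

Under these assumptions, a file is skipped only when its current content matches the key recorded for the same operation. This is why an edited file is picked up again, and why a lint result does not mark a file as already handled for a fix run.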

Components

Each stage has its own detailed documentation:

| Component | What It Does | Key Concepts |
| --- | --- | --- |
| Task Planning | Groups files into prioritized task batches | Priority chunking, overlap detection, CWD-subtree restriction |
| Parallel Execution | Runs tasks with fail-fast semantics | Two-layer model, context cancellation, progress tracking |
| File Discovery | Walks the repo respecting ignore rules | .gitignore-aware traversal, project auto-detection |
| Caching Strategy | Tracks per-file results for incremental runs | XXH3-128 invalidation keys, separate lint/fix tracking |

Reading Order

If you're new to datamitsu's internals, read in this order:

  1. File Discovery -- how files enter the system
  2. Task Planning -- how files become tasks
  3. Parallel Execution -- how tasks run
  4. Caching Strategy -- how results persist between runs

If you're debugging a specific issue, jump directly to the relevant component page.