Shared IR
One place to optimize & audit.
Because making a model work isn’t enough: it needs to behave the same everywhere, and do so explainably. You’re not just chasing accuracy; you’re chasing determinism, traceability, and portability.
The DSL parses into a graph-based IR. Execution adapters lower the IR to the chosen backend with minimal impedance mismatch.
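As a mental model, here is a minimal sketch of what a graph-based IR node and a NumPy adapter could look like; the class names, fields, and op set are illustrative assumptions, not Fuse's internals.

import numpy as np
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                       # e.g. "input", "einsum", "softmax"
    inputs: list = field(default_factory=list)    # upstream Node references
    attrs: dict = field(default_factory=dict)     # e.g. {"spec": "ij,jk->ik"} or {"name": "A"}

def lower_to_numpy(node, env):
    # Recursively evaluate an IR node against NumPy arrays supplied in `env`.
    if node.op == "input":
        return env[node.attrs["name"]]
    args = [lower_to_numpy(i, env) for i in node.inputs]
    if node.op == "einsum":
        return np.einsum(node.attrs["spec"], *args)
    if node.op == "softmax":
        e = np.exp(args[0] - args[0].max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    raise NotImplementedError(node.op)

a = Node("input", attrs={"name": "A"})
b = Node("input", attrs={"name": "B"})
prod = Node("einsum", inputs=[a, b], attrs={"spec": "ij,jk->ik"})
lower_to_numpy(prod, {"A": np.eye(3), "B": np.ones((3, 2))})

A Torch or JAX adapter would dispatch the same ops to torch.einsum or jax.numpy.einsum, which is what keeps semantics identical across backends.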
Fuse is speaking
Write the base relations as sparse tensor facts.
Compose the Aunt rule via einsum + step with shape validation.
Project embeddings and learn the relation tensor W.
Blend rule outputs with relation scores.
Minimize loss and prepare artifacts.
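Here is a toy PyTorch rendering of those five steps, assuming a hypothetical family graph, toy dimensions, and a fixed blend weight; it illustrates the shape of the pipeline rather than Fuse's generated code.

import torch
import torch.nn.functional as F

N, D = 6, 16                                 # toy entity count and embedding size (assumptions)
sister = torch.zeros(N, N)
parent = torch.zeros(N, N)
sister[0, 1] = 1.0                           # fact: sister(0, 1)
parent[1, 2] = 1.0                           # fact: parent(1, 2), so aunt(0, 2) should hold

# Rule: aunt(x, z) <- sister(x, y), parent(y, z), composed with einsum + a step
rule_aunt = (torch.einsum('xy,yz->xz', sister, parent) > 0).float()

# Learned side: entity embeddings E and a relation tensor W scored bilinearly
E = torch.randn(N, D, requires_grad=True)
W = torch.randn(D, D, requires_grad=True)
opt = torch.optim.Adam([E, W], lr=0.01)
alpha = 0.5                                  # blend weight (assumption)

for _ in range(200):
    scores = torch.sigmoid(torch.einsum('xd,de,ze->xz', E, W, E))
    blended = alpha * rule_aunt + (1 - alpha) * scores   # blend rule outputs with relation scores
    loss = F.binary_cross_entropy(blended, rule_aunt)    # minimize loss against known facts
    opt.zero_grad()
    loss.backward()
    opt.step()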
Define compact tensor equations, parse to an IR, optimize, and execute across backends with consistent semantics.
One place to optimize & audit.
Identical semantics across NumPy, Torch, JAX.
Shape checks, fusion, and plan‑level transforms.
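As an example of the kind of plan-level check the IR can run, this standalone sketch validates that same-named einsum axes agree before any kernel executes; the function name and error messages are assumptions.

def check_einsum_shapes(spec, *shapes):
    # Validate that axes sharing a label have the same size, and return the output shape.
    ins, _, out = spec.partition('->')
    sizes = {}
    for labels, shape in zip(ins.split(','), shapes):
        if len(labels) != len(shape):
            raise ValueError(f"rank mismatch for '{labels}': got shape {shape}")
        for axis, n in zip(labels, shape):
            if sizes.setdefault(axis, n) != n:
                raise ValueError(f"axis '{axis}' bound to both {sizes[axis]} and {n}")
    return tuple(sizes[a] for a in out)

check_einsum_shapes('bik,bkj->bij', (2, 4, 8), (2, 8, 5))   # -> (2, 4, 5)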
pip install fuse
uv add fuse
Fuse lowers to native kernels; the IR schedules the matmul/softmax paths that vendors already optimize. You keep equations high‑level while execution traces confirm no extra copies are made.
Comp = softmax(Q Kᵀ / sqrt(Dk))
# emits fused-kernel path on backend
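For intuition, here is a hand-written PyTorch version of what that lowering could produce, with toy shapes assumed; it puts the explicit matmul/softmax path next to the vendor-fused kernel a backend can route to when V is consumed as well.

import math
import torch
import torch.nn.functional as F

B, H, P, Dk = 2, 4, 128, 64                  # toy batch/head/position/key dims (assumptions)
Q = torch.randn(B, H, P, Dk)
K = torch.randn(B, H, P, Dk)
V = torch.randn(B, H, P, Dk)

# Literal lowering of the equation: explicit matmul + softmax
comp = torch.softmax(torch.einsum('bhpd,bhqd->bhpq', Q, K) / math.sqrt(Dk), dim=-1)
naive_out = comp @ V

# Vendor-fused path for the same computation
fused_out = F.scaled_dot_product_attention(Q, K, V)
assert torch.allclose(naive_out, fused_out, atol=1e-4)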
The IR is differentiable; backends reuse their autograd/JVP machinery. We expose checks to assert shapes and stop gradients where intended.
dLoss/dW = grad(Loss, W)
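On the Torch backend, that line corresponds to ordinary autograd; the sketch below uses assumed toy shapes and a placeholder loss to show the mapping, and notes where a stop-gradient would land.

import torch

W = torch.randn(8, 8, requires_grad=True)
x = torch.randn(4, 8)
target = torch.randn(4, 8)

loss = ((x @ W - target) ** 2).mean()        # any scalar Loss built from IR ops
dLoss_dW, = torch.autograd.grad(loss, W)     # what grad(Loss, W) maps to on the Torch backend

frozen = x.detach()                          # a stop-gradient node maps to detach / jax.lax.stop_gradient

On JAX the same node would reuse jax.grad or jax.jvp, which is the backend machinery the IR leans on rather than reimplementing.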
Emit Torch modules with the same shapes and parameters. Keep training loops and optimizers unchanged.
fuse emit --target=torch --module AttnBlock
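Using the emitted module might look like this; the import path, constructor call, and tensor shapes are assumptions, the point being that the optimizer and loop stay ordinary PyTorch.

import torch
from attn_block import AttnBlock             # hypothetical: wherever `fuse emit` wrote the module

block = AttnBlock()                          # assumed to carry the same shapes and parameters as the DSL definition
opt = torch.optim.AdamW(block.parameters(), lr=3e-4)

x = torch.randn(2, 128, 512)                 # toy (batch, positions, d_model) input, dims assumed
out = block(x)
loss = out.pow(2).mean()                     # placeholder loss; your existing loop is untouched
opt.zero_grad()
loss.backward()
opt.step()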
Log attention maps and entropy with one line at the IR level. No manual hooks or tensor cloning required.
log('H', entropy(Comp[b,h,p,p′]))
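In plain PyTorch terms, the quantity being logged is the per-query entropy over the attention axis; the snippet below assumes the (B, H, P, P′) shape from the equation above and a stand-in attention map.

import torch

comp = torch.softmax(torch.randn(2, 4, 128, 128), dim=-1)    # stand-in attention map (B, H, P, P')

# Per-row entropy over the p' axis, i.e. how spread out each query's attention is
H = -(comp * comp.clamp_min(1e-12).log()).sum(dim=-1)        # shape (B, H, P)
print(H.mean().item())                                       # a scalar you might log each step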
Start with one block; keep the rest in PyTorch/JAX. Interfaces are tensors in/tensors out, so you can land and expand safely.
attn = fuse_block(stream) # drop-in within your model
loss = criterion(attn, target)
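A sketch of what that landing could look like, treating the Fuse block as an opaque tensors-in/tensors-out callable; the module layout, dimensions, and the Identity stand-in are assumptions.

import torch
import torch.nn as nn

class Hybrid(nn.Module):
    # Keep the surrounding model in plain PyTorch; swap in the Fuse block for one stage.
    def __init__(self, fuse_block, d_model=512, n_classes=10):
        super().__init__()
        self.fuse_block = fuse_block                  # tensors in / tensors out
        self.head = nn.Linear(d_model, n_classes)     # untouched PyTorch layers around it

    def forward(self, stream):
        attn = self.fuse_block(stream)                # drop-in, same contract as before
        return self.head(attn.mean(dim=1))

hybrid = Hybrid(fuse_block=nn.Identity())             # Identity stands in for the real Fuse block
logits = hybrid(torch.randn(2, 128, 512))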
Be first to try Fuse. Only major updates and launches, no noise.