ARM TOSA Research

Quantized Softmax.
Reimagined.

Advanced integer-only approximations for ARM TOSA architecture. Pushing the boundaries of transformer efficiency on edge hardware.

Research Journey

The Story Behind the Research

From problem identification to novel solutions: explore the key chapters of this thesis.

1. The Challenge

Why deploying transformers on edge devices requires integer-only softmax approximations.

2. TOSA Framework

ARM's hardware specification: integer operations, TABLE lookup, and NPU constraints.
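
To make the table-lookup idea concrete, here is a minimal sketch of a softmax whose inference path uses only integer operations. It is an illustration under stated assumptions, not the thesis's method and not the exact semantics of the TOSA TABLE operator: the function names, the 256-entry table layout, the 15-bit table precision, and the input scale of 0.1 are all hypothetical choices.

```python
import math

def build_exp_table(scale, table_bits=15):
    """Offline step: precompute exp(-d * scale) for d = 0..255 as integers.

    `scale` is the (assumed) real value of one int8 quantization step.
    """
    one = (1 << table_bits) - 1
    return [round(one * math.exp(-d * scale)) for d in range(256)]

def int_softmax(q, table, out_bits=8):
    """Softmax over int8 logits `q` using only integer ops and one table lookup each."""
    q_max = max(q)
    # d = q_max - q_i lies in 0..255 for int8 inputs, so it indexes the table directly.
    exps = [table[q_max - v] for v in q]
    total = sum(exps)  # always >= table[0] > 0, so the division is safe
    one = (1 << out_bits) - 1
    # Integer division, rounded to nearest by adding half the divisor first.
    return [(e * one + total // 2) // total for e in exps]

scale = 0.1                      # hypothetical quantization scale
table = build_exp_table(scale)   # built once, ahead of inference
q = [-20, 0, 35, 40]             # example int8 logits
probs = int_softmax(q, table)    # unsigned 8-bit outputs; 255 represents 1.0
```

Subtracting the maximum before the lookup keeps every table index non-negative and every exponential at most 1, which is why a single 256-entry table covers the full int8 input range in this sketch.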

3. Background

Integer arithmetic fundamentals, homogeneous operations, and non-linear challenges.

4. Full Integer Translation

Complete workflow for converting LLMs to integer arithmetic.

5. DIGmax Solution

Adaptive multi-table approach achieving 80,000× better precision than the baseline.

6. Real-World Validation

Testing error propagation on Qwen2 language models through transformer layers.

7. Key Findings

Which methods work best for different hardware constraints and accuracy requirements.

8. Full Thesis

Complete methodology, analysis, experiments, and conclusions.

Choose Your Path

Two Ways to Explore

Test real-world transformer inference or dive into theoretical mathematical analysis.

Error Propagation

Real-world transformer inference with quantized attention. Observe how approximation errors propagate through actual language model layers.

  • Qwen2 language models (0.5B & 1.5B parameters)
  • 8+ softmax implementations compared
  • Layer-by-layer error tracking
  • Real-time streaming results
  • Batch testing capabilities

Theoretical Analysis

Mathematical comparison of softmax approximation methods. Analyze error characteristics across different input ranges and visualize precision trade-offs.

  • Pure exp(x) function analysis
  • Custom input range selection
  • Interactive error visualization
  • Absolute & relative error metrics
  • Export plots as high-res PNG
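
As a rough idea of what the absolute and relative error metrics measure, the sketch below compares one possible approximation of exp(x) against the exact function over a chosen input range. The approximation (a uniform table with linear interpolation), the helper names, the 64-entry table size, and the range [-8, 0] are assumptions made for illustration; they are not taken from the experiment itself.

```python
import math

def make_lut_exp(lo, hi, entries=64):
    """Build an exp(x) approximation on [lo, hi]: uniform table + linear interpolation."""
    step = (hi - lo) / (entries - 1)
    table = [math.exp(lo + i * step) for i in range(entries)]

    def approx(x):
        # Clamp to the table range, then locate the enclosing pair of entries.
        pos = (min(max(x, lo), hi) - lo) / step
        i = min(int(pos), entries - 2)
        frac = pos - i
        return table[i] + frac * (table[i + 1] - table[i])

    return approx

def error_metrics(approx, lo, hi, samples=1000):
    """Return (max absolute error, max relative error) of `approx` vs math.exp."""
    max_abs = max_rel = 0.0
    for k in range(samples + 1):
        x = lo + (hi - lo) * k / samples
        exact = math.exp(x)
        err = abs(approx(x) - exact)
        max_abs = max(max_abs, err)
        max_rel = max(max_rel, err / exact)
    return max_abs, max_rel

lo, hi = -8.0, 0.0  # hypothetical input range
max_abs, max_rel = error_metrics(make_lut_exp(lo, hi), lo, hi)
```

The two metrics can tell different stories: for this kind of approximation the absolute error is largest near the top of the range, where exp(x) is largest, while the relative error stays roughly uniform across the range, which is one reason to report both.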