Building a Compiler from Scratch
TypeScript · Node.js · Recursive Descent Parser · AST · Bytecode Generation · 2025
Overview
A production-grade compiler, written in TypeScript, for a domain-specific scripting language used in an embedded hardware automation environment. It is built as a five-stage pipeline (scanner, parser, analyzer, compiler, generator) that targets a constrained bytecode runtime, with full language coverage, actionable error messages, and integration with IDE tooling.
The Challenge
The target language runs in a constrained embedded environment with limited memory, strict frame timing, and unusual control-flow constructs. The goal was a modern compiler pipeline that slots into the existing ecosystem while providing a better developer experience.
Architecture
Source Code (.gpc)
|
[Scanner] -- Tokenization (keywords, literals, operators)
|
[Parser] -- Recursive descent, AST construction
|
[Analyzer] -- Semantic analysis, scope resolution, type checking
|
[Compiler] -- AST to intermediate representation
|
[Generator] -- IR to bytecode assembly + raw binary
|
Output (.bin)
Key Technical Decisions
Recursive Descent over Parser Generators — Chose hand-written recursive descent for full control over error messages and recovery. The language has unusual constructs (combo blocks, hardware-specific keywords) that would fight a generated parser.
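To illustrate the kind of control a hand-written parser gives over messages and recovery, here is a minimal sketch. The token shape, the `combo Name { call(n); }` grammar, and the diagnostic wording are all assumptions made for this example; the real grammar and AST are not shown in this write-up.

```typescript
// Hypothetical token and AST shapes, invented for this sketch.
type Token = { kind: "ident" | "number" | "punct" | "eof"; text: string; line: number };
type Stmt = { type: "call"; name: string; arg: number };
type Combo = { type: "combo"; name: string; body: Stmt[] };
type Diagnostic = { line: number; message: string };

class Parser {
  private tokens: Token[];
  private pos = 0;
  diagnostics: Diagnostic[] = [];

  // The token list is assumed to end with a single "eof" token.
  constructor(tokens: Token[]) {
    this.tokens = tokens;
  }

  private peek(): Token {
    return this.tokens[this.pos];
  }

  private advance(): Token {
    const t = this.tokens[this.pos];
    if (t.kind !== "eof") this.pos++; // never step past eof
    return t;
  }

  private at(kind: Token["kind"], text?: string): boolean {
    const t = this.peek();
    return t.kind === kind && (text === undefined || t.text === text);
  }

  // Consume the expected token, or record a diagnostic and return null.
  private expect(kind: Token["kind"], text?: string): Token | null {
    if (this.at(kind, text)) return this.advance();
    const t = this.peek();
    this.diagnostics.push({
      line: t.line,
      message: `expected '${text ?? kind}' but found '${t.text || t.kind}'`,
    });
    return null;
  }

  // combo := "combo" ident "{" stmt* "}"
  parseCombo(): Combo | null {
    if (!this.expect("ident", "combo")) return null;
    const name = this.expect("ident");
    if (!name || !this.expect("punct", "{")) return null;
    const body: Stmt[] = [];
    while (!this.at("punct", "}") && !this.at("eof")) {
      const stmt = this.parseStmt();
      if (stmt) body.push(stmt);
      else this.synchronize(); // recover and keep parsing the rest of the block
    }
    this.expect("punct", "}");
    return { type: "combo", name: name.text, body };
  }

  // stmt := ident "(" number ")" ";"
  private parseStmt(): Stmt | null {
    const name = this.expect("ident");
    if (!name || !this.expect("punct", "(")) return null;
    const arg = this.expect("number");
    if (!arg || !this.expect("punct", ")") || !this.expect("punct", ";")) return null;
    return { type: "call", name: name.text, arg: Number(arg.text) };
  }

  // Panic-mode recovery: skip past the next ";" or stop in front of "}".
  private synchronize(): void {
    while (!this.at("eof") && !this.at("punct", "}")) {
      if (this.advance().text === ";") return;
    }
  }
}
```

Because recovery resumes at the next statement boundary, a single malformed statement produces one diagnostic and the remaining statements in the block are still parsed.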
Two-Pass Semantic Analysis — First pass collects all declarations (functions, defines, data sections). Second pass resolves references, validates types, and checks constraints. This allows forward references without requiring declaration order.
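The two-pass idea can be sketched as follows. The `Decl` and `Call` shapes are simplified stand-ins invented for this example (only functions and call arity are modeled), not the compiler's actual AST.

```typescript
// Hypothetical, simplified declarations: only functions and their calls.
type Call = { callee: string; argCount: number; line: number };
type Decl = { kind: "function"; name: string; arity: number; line: number; calls: Call[] };
type SemanticError = { line: number; message: string };

function analyze(program: Decl[]): SemanticError[] {
  const errors: SemanticError[] = [];
  const symbols = new Map<string, Decl>();

  // Pass 1: collect every declaration before looking at any body.
  for (const decl of program) {
    if (symbols.has(decl.name)) {
      errors.push({ line: decl.line, message: `duplicate declaration '${decl.name}'` });
    } else {
      symbols.set(decl.name, decl);
    }
  }

  // Pass 2: resolve references against the complete symbol table,
  // so a call may target a function declared later in the file.
  for (const decl of program) {
    for (const call of decl.calls) {
      const target = symbols.get(call.callee);
      if (!target) {
        errors.push({ line: call.line, message: `undefined function '${call.callee}'` });
      } else if (target.arity !== call.argCount) {
        errors.push({
          line: call.line,
          message: `'${call.callee}' expects ${target.arity} argument(s), got ${call.argCount}`,
        });
      }
    }
  }
  return errors;
}
```

A single-pass design would have to reject the forward call or require prototypes; splitting collection from resolution keeps each pass simple at the cost of walking the program twice.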
Constrained Bytecode Generation — The target runtime imposes strict memory and timing envelopes. The code generator budgets instruction counts, aligns data sections, and emits output that slots cleanly into the existing ecosystem toolchain.
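As a rough sketch of what budgeting and alignment can look like in a generator, the example below lays out data sections on aligned offsets and rejects programs over an instruction limit. The limit of 4096 instructions and the 4-byte alignment are placeholder values chosen for illustration; the real runtime's constraints are not stated in this write-up.

```typescript
// Assumed values for illustration only; the real runtime limits are not given here.
const MAX_INSTRUCTIONS = 4096;
const DATA_ALIGNMENT = 4;

type Section = { name: string; bytes: Uint8Array };
type Layout = { offsets: Map<string, number>; image: Uint8Array };

// Round offset up to the next multiple of alignment.
function alignUp(offset: number, alignment: number): number {
  return Math.ceil(offset / alignment) * alignment;
}

// Place each section at an aligned offset; gaps are left zero-filled.
function layoutDataSections(sections: Section[], alignment: number = DATA_ALIGNMENT): Layout {
  const offsets = new Map<string, number>();
  let cursor = 0;
  for (const section of sections) {
    cursor = alignUp(cursor, alignment);
    offsets.set(section.name, cursor);
    cursor += section.bytes.length;
  }
  const image = new Uint8Array(cursor); // Uint8Array starts zero-initialized
  for (const section of sections) {
    image.set(section.bytes, offsets.get(section.name)!);
  }
  return { offsets, image };
}

// Fail the build early instead of emitting a binary the runtime cannot hold.
function checkInstructionBudget(count: number, limit: number = MAX_INSTRUCTIONS): void {
  if (count > limit) {
    throw new Error(`instruction budget exceeded: ${count} > ${limit}`);
  }
}
```

Checking the budget at generation time turns a runtime failure on the device into a compile-time error that can carry a source location.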
Compiler Statistics
| Metric | Value |
|---|---|
| Opcodes implemented | 61 (full language surface) |
| Language features | All (functions, combos, data sections, defines, remaps) |
| Scanner tokens | 45+ token types |
| AST node types | 30+ |
| Error codes | 41 with human-readable messages |
| Test coverage | Golden-file regression suite + runtime acceptance tests |
Verification Strategy
The test suite runs real-world scripts through the full pipeline and asserts that the generated output interoperates correctly with the runtime under the expected memory and timing envelope.
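The golden-file side of the suite can be sketched as a byte-level comparison between freshly generated output and a stored reference binary. This is an assumption about how such a check might work, not a description of the project's actual harness; `compareToGolden` is a hypothetical helper, and reading the files from disk is left to the caller.

```typescript
// Result of comparing generated bytecode with a stored reference binary.
type GoldenResult =
  | { ok: true }
  | { ok: false; offset: number; expected: number | undefined; actual: number | undefined };

// Report the first differing byte so a failure points at a specific offset.
function compareToGolden(actual: Uint8Array, golden: Uint8Array): GoldenResult {
  const shared = Math.min(actual.length, golden.length);
  for (let i = 0; i < shared; i++) {
    if (actual[i] !== golden[i]) {
      return { ok: false, offset: i, expected: golden[i], actual: actual[i] };
    }
  }
  if (actual.length !== golden.length) {
    // One output is a strict prefix of the other; the missing side reads as undefined.
    return { ok: false, offset: shared, expected: golden[shared], actual: actual[shared] };
  }
  return { ok: true };
}
```

Reporting the first differing offset, rather than a bare pass/fail, makes it easier to map a regression back to the instruction or data section that changed.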
Key Learnings
- Compiler pipelines reward discipline — every pass should have one job and one invariant, and stages should be testable in isolation
- Error messages are a product feature — users see compiler errors more than they see working code
- The scanner is the simplest stage but has the most edge cases (string escaping, numeric formats, comment nesting)
- Forward references make users happy but make the compiler author's life harder — worth the trade
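The comment-nesting edge case from the scanner learning above can be sketched with a depth counter. The `/* ... */` delimiters and the decision to allow nesting are assumptions for this example; the write-up does not say which comment syntax the language uses.

```typescript
// Skip a block comment that starts at `start`, allowing nested comments.
// Returns the index just past the closing delimiter, `start` if no comment
// begins there, or -1 if the comment is unterminated.
function skipBlockComment(src: string, start: number): number {
  if (!src.startsWith("/*", start)) return start;
  let depth = 0;
  let i = start;
  while (i < src.length) {
    if (src.startsWith("/*", i)) {
      depth++;
      i += 2;
    } else if (src.startsWith("*/", i)) {
      depth--;
      i += 2;
      if (depth === 0) return i; // outermost comment closed
    } else {
      i++;
    }
  }
  return -1; // ran out of input while still inside a comment
}
```

Returning -1 for an unterminated comment lets the scanner report the error at the comment's opening position instead of at end of file.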