On the Wire
The MIDI & MPE page describes what Timbre does with MIDI data — voice allocation, CC mappings, MPE topology morphing. This page describes what Timbre looks like from the other side of the cable. When a performer plugs Timbre into a laptop and opens Ableton, Logic, or Bitwig, what appears in the MIDI preferences panel? When a MIDI monitor captures traffic, what SysEx headers identify the device? When a tech runs lsusb, what descriptor fields come back? These are not implementation details to figure out later — they are specification commitments that affect firmware, PCB layout, and third-party compatibility. Some must be finalized before hardware ships.
USB Device Identity
The ESP32-S3’s native USB peripheral, running the TinyUSB stack, presents Timbre as a class-compliant USB Audio device with a MIDI Streaming subclass. No driver installation is required on any major operating system — macOS, Windows 10+, and Linux all enumerate it automatically.
| Descriptor Field | Value | Notes |
|---|---|---|
| idVendor (VID) | 0x303A | Espressif Systems — development use. Production PID via Espressif’s allocation program (TBD Phase 3) |
| idProduct (PID) | TBD | Unique product identifier within the VID namespace |
| iManufacturer | “Tonifex” | tonifex.warehack.ing. Decision point: finalize Phase 1 |
| iProduct | “Timbre” | Device name as it appears in DAW MIDI preferences |
| bcdDevice | Firmware version | Date-based: YYMM encoding (e.g., 0x2601 for 2026-01) |
| bInterfaceClass | Audio (0x01) | USB Audio Class, declared at the interface level; bDeviceClass stays 0x00 |
| bInterfaceSubClass | MIDI Streaming (0x03) | Standard USB-MIDI, no proprietary driver |
Espressif maintains a free PID allocation program for products built on their silicon. Any manufacturer using an ESP32 variant can apply for a dedicated PID under Espressif’s VID (0x303A), avoiding the USB-IF’s $6,000 vendor ID fee and the complexity of obtaining a private VID. During development, a default PID from the TinyUSB examples works fine, because DAWs do not validate VID/PID pairs. For a shipping product, a registered PID ensures that no two devices on a customer’s USB bus claim the same identity.
The iProduct string “Timbre” is what a performer sees in their DAW’s MIDI device list. It should be short, distinctive, and stable — changing it between firmware versions would break saved MIDI routing configurations in every DAW project that references it.
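The date-based `bcdDevice` value in the descriptor table can be sketched in a few lines. This is a minimal illustration, assuming the two-digit year occupies the high byte and the month the low byte, each as binary-coded decimal, which is what the 0x2601 example implies. The function names `to_bcd` and `bcd_device` are hypothetical and not part of TinyUSB.

```python
def to_bcd(n: int) -> int:
    """Pack a two-digit decimal number (0-99) into one BCD byte."""
    if not 0 <= n <= 99:
        raise ValueError("a BCD byte holds 0-99")
    return ((n // 10) << 4) | (n % 10)


def bcd_device(year: int, month: int) -> int:
    """Encode a release date as a 16-bit YYMM bcdDevice value (assumed layout)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be 1-12")
    return (to_bcd(year % 100) << 8) | to_bcd(month)


print(hex(bcd_device(2026, 1)))  # 0x2601
```

Because each nibble holds a decimal digit, the value reads as the date when printed in hex, which is convenient when inspecting descriptors with a tool such as lsusb.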
DIN MIDI Physical Layer
USB is the primary interface for studio use, but DIN MIDI remains essential for live performance, vintage controller compatibility, and standalone operation without a computer. The MIDI 1.0 electrical specification, unchanged since Dave Smith and Ikutaro Kakehashi defined it in 1983, calls for a current-loop interface at 31,250 baud — an oddball rate chosen because it divides evenly from a 1 MHz clock, a practical concern for the 6850 UARTs common in early 1980s synthesizers.
The ESP32-S3’s UART1 peripheral handles the serial protocol. The physical interface requires external components for electrical isolation and line driving.
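The serial timing follows directly from the baud rate. The sketch below works through the arithmetic, assuming the standard MIDI 1.0 framing of one start bit, eight data bits, and one stop bit per byte; the function names are illustrative only.

```python
MIDI_BAUD = 31_250       # bits per second, from the MIDI 1.0 specification
BITS_PER_BYTE = 10       # 1 start bit + 8 data bits + 1 stop bit


def clock_divider(clock_hz: int, baud: int = MIDI_BAUD) -> float:
    """Divider needed to derive the baud rate from a source clock."""
    return clock_hz / baud


def byte_time_us(baud: int = MIDI_BAUD) -> float:
    """Time on the wire for one framed byte, in microseconds."""
    return BITS_PER_BYTE * 1_000_000 / baud


def message_time_us(num_bytes: int) -> float:
    """Time on the wire for a message of num_bytes bytes."""
    return num_bytes * byte_time_us()


print(clock_divider(1_000_000))  # 32.0 -> a clean power-of-two divider
print(byte_time_us())            # 320.0
print(message_time_us(3))        # 960.0 -> a three-byte Note On
```

The 960 μs figure for a three-byte Note On is the transport time that any DIN receiver has to wait before it can act on the message.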
graph TD
DIN_IN["DIN IN\n(5-pin)"] --> OPTO["H11L1\nOptoisolator"]
OPTO --> RX["ESP32\nUART1 RX"]
ESP32_TX["ESP32\nUART1 TX"] --> DRIVER["Line Driver\n(74HC04 / transistor)"]
DRIVER --> DIN_OUT["DIN OUT\n(5-pin)"]
DIN_IN --> |"buffered\npass-through"| THRU["DIN THRU\n(5-pin)"]
style DIN_IN fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
style OPTO fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style RX fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style ESP32_TX fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style DRIVER fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style DIN_OUT fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
style THRU fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
The optoisolator on the MIDI IN path is not optional; it is a specification requirement. The current-loop design deliberately isolates the sender galvanically from the receiver, breaking ground loops that would otherwise introduce hum in analog audio chains. The H11L1 (Schmitt-trigger output) is a widely used choice; the 6N138 works but requires an external pull-up and has slower rise times. Either costs under $0.50.
DIN THRU is a buffered copy of DIN IN, passed through before the ESP32 processes anything. In a live performance chain — controller → Timbre → another synth — the THRU connector lets downstream devices receive the same MIDI stream without latency added by Timbre’s processing. The buffer is a simple logic gate or transistor stage, not a microcontroller pass-through.
TRS MIDI (3.5mm Type A, per the MMA standard) is increasingly common on modern instruments as a space-saving alternative to the full-size 5-pin DIN connector. The Arturia KeyStep, Novation Circuit, and Korg NTS-1 all use TRS MIDI. Timbre’s PCB should include a TRS header alongside the DIN connectors — it costs nothing but a few pads, and a performer with a TRS-equipped controller appreciates not needing an adapter.
One Port or Many
USB MIDI allows a single physical USB connection to present multiple virtual MIDI ports, each carrying 16 channels independently. The question for Timbre is whether to expose one port or two.
A single-port configuration is the simplest and most compatible. All 16 MIDI channels arrive on one port. Every DAW, every controller, every MIDI routing utility handles single-port devices without configuration. DIN MIDI is inherently single-port (one serial stream, 16 channels), so a single USB port keeps the two interfaces behaviorally identical.
A dual-port configuration separates performance data (notes, CCs, pitch bend, MPE) from configuration data (SysEx, parameter queries, diagnostic commands). DAWs can route the performance port to a track and the configuration port to an editor/librarian panel, without either interfering with the other. Sequential’s Prophet Rev2 and several Native Instruments controllers use this approach — performance on port 1, configuration on port 2.
The recommendation is single port for Phase 1 and Phase 2, dual port from Phase 3 onward. The single-port approach reduces firmware complexity during the proof-of-concept stages and avoids compatibility edge cases with older DAWs and MIDI utilities. The transition to dual port is a firmware change (TinyUSB descriptor update), not a hardware change — the USB physical layer is identical either way.
SysEx Specification
System Exclusive messages are the MIDI mechanism for manufacturer-specific communication: patch dumps, parameter edits, firmware updates, diagnostic queries. Every SysEx message begins with F0 (Start of Exclusive) followed by a manufacturer ID, and ends with F7 (End of Exclusive). Everything between the manufacturer ID and F7 is defined by the manufacturer.
The Manufacturer ID is the namespace. The MIDI Manufacturers Association (MMA) assigns registered IDs — a one-byte or three-byte prefix that guarantees global uniqueness. Registration costs $260–$400 depending on membership tier. For development, the specification reserves ID 0x7D as a non-commercial/educational identifier that any project may use without registration. Timbre uses 0x7D through Phase 2; a registered MMA ID is a Phase 3 milestone.
Message Format
Every Timbre SysEx message follows this structure:
| Byte | Value | Description |
|---|---|---|
| 1 | F0 | Start of Exclusive |
| 2 | 7D (dev) or registered ID | Manufacturer ID |
| 3 | DD | Device ID (00–7E for a specific unit, or 7F for broadcast) |
| 4 | CC | Command byte |
| 5–N | data | Command-specific payload (7-bit encoded) |
| N+1 | F7 | End of Exclusive |
The device ID allows multiple Timbre units on the same MIDI bus to be addressed individually. In most setups there is only one, and 7F (broadcast) works fine. Multi-unit rigs — two Timbres in a keyboard split, for instance — assign each unit a unique device ID via a front-panel setting.
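The frame layout above can be sketched as a small builder and parser. This is a host-side illustration, not Timbre's firmware: the function names are hypothetical, the development manufacturer ID 0x7D is assumed, and the payload is expected to be 7-bit clean already.

```python
from typing import Optional, Tuple

SYSEX_START = 0xF0
SYSEX_END = 0xF7
DEV_MANUFACTURER_ID = 0x7D   # non-commercial ID used during development
BROADCAST_DEVICE_ID = 0x7F


def build_frame(device_id: int, command: int, payload: bytes = b"") -> bytes:
    """Assemble F0, manufacturer ID, device ID, command, payload, F7."""
    for name, value in (("device_id", device_id), ("command", command)):
        if not 0 <= value <= 0x7F:
            raise ValueError(f"{name} must be a 7-bit value")
    if any(b > 0x7F for b in payload):
        raise ValueError("payload bytes must be 7-bit clean")
    header = bytes([SYSEX_START, DEV_MANUFACTURER_ID, device_id, command])
    return header + bytes(payload) + bytes([SYSEX_END])


def parse_frame(frame: bytes, own_device_id: int) -> Optional[Tuple[int, bytes]]:
    """Return (command, payload) if the frame addresses this unit, else None."""
    if len(frame) < 5 or frame[0] != SYSEX_START or frame[-1] != SYSEX_END:
        return None
    if frame[1] != DEV_MANUFACTURER_ID:
        return None
    if frame[2] not in (own_device_id, BROADCAST_DEVICE_ID):
        return None
    return frame[3], bytes(frame[4:-1])


# A broadcast frame carrying command byte 01 and no payload.
query = build_frame(BROADCAST_DEVICE_ID, 0x01)
print(query.hex(" "))  # f0 7d 7f 01 f7
```

A unit with device ID 02 accepts the broadcast frame above but ignores a frame addressed to device ID 03, which is the behavior a multi-unit rig relies on.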
Command Table
| Command | Byte | Direction | Payload | Description |
|---|---|---|---|---|
| Version Query | 01 | Host → Timbre | none | Request firmware version |
| Version Reply | 02 | Timbre → Host | ASCII string (7-bit) | Firmware version, date-based (e.g., “2026.03.08”) |
| Patch Dump Request | 10 | Host → Timbre | patch number (1 byte) | Request a stored patch |
| Patch Dump | 11 | Either | patch data (7-bit encoded) | Complete patch state |
| Patch Load | 12 | Host → Timbre | patch number + data | Write a patch to storage |
| Parameter Get | 20 | Host → Timbre | voice ID + param ID | Query a single parameter |
| Parameter Set | 21 | Host → Timbre | voice ID + param ID + value | Set a single parameter |
| Parameter Reply | 22 | Timbre → Host | voice ID + param ID + value | Response to Parameter Get |
| Voice Architecture Query | 30 | Host → Timbre | voice ID | Query current topology of a voice |
| Voice Architecture Reply | 31 | Timbre → Host | topology descriptor | SC block configuration map |
| Diagnostic Register Read | 40 | Host → Timbre | voice ID + register addr | Read a PSoC register (debug) |
| Diagnostic Register Reply | 41 | Timbre → Host | register addr + value | Register contents |
| Diagnostic Register Write | 42 | Host → Timbre | voice ID + register addr + value | Write a PSoC register (debug) |
All payload data is 7-bit encoded — MIDI reserves the high bit of every data byte for status detection. An 8-bit value like 0xFF must be split into two 7-bit bytes (0x01, 0x7F). This is a universal MIDI constraint, not a Timbre-specific choice. Version strings use printable ASCII, which is already 7-bit clean — no encoding overhead for human-readable text.
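The split described above can be sketched as follows. The two-bytes-per-value layout (high bit first, then the low seven bits) is taken from the 0xFF example in the text; whether Timbre uses this layout or a denser packing for bulk data is not specified here, so treat the scheme and the function names as assumptions.

```python
def encode_7bit(data: bytes) -> bytes:
    """Split each 8-bit value into a high-bit byte and a low-seven-bits byte."""
    out = bytearray()
    for value in data:
        out.append(value >> 7)      # high bit, always 0 or 1
        out.append(value & 0x7F)    # low seven bits
    return bytes(out)


def decode_7bit(encoded: bytes) -> bytes:
    """Reassemble 8-bit values from (high, low) byte pairs."""
    if len(encoded) % 2:
        raise ValueError("encoded data must contain an even number of bytes")
    out = bytearray()
    for high, low in zip(encoded[0::2], encoded[1::2]):
        if high > 1 or low > 0x7F:
            raise ValueError("invalid 7-bit pair")
        out.append((high << 7) | low)
    return bytes(out)


print(encode_7bit(b"\xff").hex(" "))  # 01 7f
```

This layout doubles the payload size. That is acceptable for single parameters and register values; a patch dump might justify a packing that carries seven data bytes per eight transmitted bytes instead.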
The diagnostic commands (40–42) provide direct register access to individual PSoC voice chips via the ESP32’s I2C bus. These are development and diagnostic tools, not performance controls — they bypass the voice allocation layer entirely and address hardware registers by chip ID and address. A SysEx-aware MIDI monitor becomes a live hardware debugger.
MPE Configuration Messages
MPE Configuration Messages (MCM) use RPN 6 (Registered Parameter Number 6) on channels 1 and 16 to negotiate MPE zones between a controller and an instrument. Timbre should announce its zone configuration on USB enumeration — when a performer plugs in, controllers that support MCM auto-detect the zone layout without manual setup.
The default configuration is a lower zone on channel 1 with 15 member channels (channels 2–16). This gives every voice its own MIDI channel for full per-note expression, which is the natural mapping for Timbre’s one-chip-per-voice architecture.
| Configuration | Manager Channel | Member Channels | Use Case |
|---|---|---|---|
| Full lower zone | 1 | 2–16 (15 voices) | Single controller, full MPE expression |
| Split — lower + upper | 1, 16 | 2–8, 9–15 (7+7) | Two controllers or keyboard split |
| Lower zone — reduced | 1 | 2–5 (4 voices) | Phase 2 prototype (4 voices) |
| Standard MIDI (no MPE) | — | 1–16 independent | Legacy controllers, multi-timbral use |
Zone configuration is stored in the ESP32’s patch memory and transmitted via MCM on connection. Controllers like the Linnstrument, Roli Seaboard, and Haken Continuum respond to these messages automatically. Older controllers that do not understand MCM simply ignore them — the messages are RPN sequences on channels that the controller is not using, so they cause no harm.
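On the wire, an MCM is three Control Change messages on the zone's manager channel: CC 101 and CC 100 select RPN 6, and CC 6 carries the number of member channels. The sketch below builds the byte sequences for the configurations in the table. The `mcm` helper and the `ZONES` dictionary are illustrative names, and sending a member count of zero on both manager channels is assumed here as the way to fall back to standard MIDI.

```python
CC_RPN_MSB = 101
CC_RPN_LSB = 100
CC_DATA_ENTRY_MSB = 6
RPN_MPE_CONFIGURATION = 6


def mcm(manager_channel: int, member_channels: int) -> bytes:
    """Build an MPE Configuration Message for one zone."""
    if manager_channel not in (1, 16):
        raise ValueError("manager channel must be 1 (lower) or 16 (upper)")
    if not 0 <= member_channels <= 15:
        raise ValueError("a zone has 0-15 member channels")
    status = 0xB0 | (manager_channel - 1)   # Control Change on manager channel
    return bytes([
        status, CC_RPN_MSB, 0x00,
        status, CC_RPN_LSB, RPN_MPE_CONFIGURATION,
        status, CC_DATA_ENTRY_MSB, member_channels,
    ])


ZONES = {
    "full_lower": mcm(1, 15),
    "split": mcm(1, 7) + mcm(16, 7),
    "reduced_lower": mcm(1, 4),
    "standard_midi": mcm(1, 0) + mcm(16, 0),   # zero members disables a zone
}

print(ZONES["full_lower"].hex(" "))  # b0 65 00 b0 64 06 b0 06 0f
```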
MIDI Implementation Chart
The standard MMA-format implementation chart belongs in every synthesizer’s documentation. It codifies exactly what Timbre transmits and recognizes across the full MIDI specification — the definitive compatibility reference.
| Function | Transmitted | Recognized | Remarks |
|---|---|---|---|
| Basic Channel | 1–16 | 1–16 | Default: 1 (standard), 1+2–16 (MPE) |
| Mode | Mode 3 (Poly) | Mode 3, Mode 4 (MPE) | Mode 4 = MPE mono-per-channel |
| Note Number | — | 0–127 | Musically useful: 24–108 (7 octaves) |
| Velocity | — | ✓ (Note On/Off) | Maps to VCA gain + optional filter offset |
| Aftertouch — Key | — | ✓ | Per-note: filter, VCA, or osc mode (MPE) |
| Aftertouch — Channel | — | ✓ | Global: configurable target |
| Pitch Bend | — | ✓ | Per-channel in MPE; SC clock ratio offset |
| CC 1 | — | ✓ | Modulation wheel — LFO depth |
| CC 7 | — | ✓ | Channel volume |
| CC 10 | — | ✓ | Pan position |
| CC 71 | — | ✓ | Resonance / Q |
| CC 74 | — | ✓ | Filter cutoff (standard) / Slide (MPE) |
| CC 91 | — | ✓ | Reverb send (if external) |
| CC 120 | — | ✓ | All Sound Off |
| CC 123 | — | ✓ | All Notes Off |
| RPN 6 (MCM) | ✓ | ✓ | MPE Configuration Message (via CC 101/100/6) |
| Program Change | — | ✓ | Patch selection from stored presets |
| System Exclusive | ✓ | ✓ | See SysEx Specification above |
| System Common | — | — | Not used |
| System Real Time — Clock | — | ✓ | Optional: LFO sync to external clock |
| System Real Time — Start/Stop | — | ✓ | Optional: sequencer transport |
| Active Sensing | — | ✓ | If received, monitors for timeout |
| Universal SysEx — Identity Reply | ✓ | — | Phase 1: responds to Identity Request |
The “Transmitted” column is intentionally sparse — Timbre is primarily a sound source, not a controller. It transmits SysEx responses (version, patch dumps, parameter replies), MPE Configuration Messages, and Universal SysEx Identity Reply. Everything else flows inbound.
Latency Budget
End-to-end latency — from a key press on a controller to audible sound from the output jack — is the metric that determines whether an instrument feels responsive or sluggish. The MMA considers latency below 3 ms imperceptible to performers. Timbre’s direct hardware path, with no DAW software in the loop, targets well below that threshold.
graph TD
USB["USB Poll\n1 ms max"] --> PARSE["ESP32 Parse\n~50 μs"]
DIN["DIN Byte\n320 μs/byte\n(~1 ms for Note On)"] --> PARSE
PARSE --> ALLOC["Voice Alloc\n~10 μs"]
ALLOC --> I2C["I2C Write\n~200 μs"]
I2C --> REG["PSoC Register\nApply < 12 μs"]
REG --> SETTLE["SC Settle\n~100 μs"]
SETTLE --> SOUND["Audible\nOutput"]
style USB fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
style DIN fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
style PARSE fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style ALLOC fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style I2C fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style REG fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style SETTLE fill:#134e4a,stroke:#0d9488,color:#ccfbf1
style SOUND fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
Pipeline Stages
| Stage | USB Path | DIN Path | Notes |
|---|---|---|---|
| Transport | ≤1 ms (USB poll interval) | ~1 ms (3 bytes × 320 μs) | USB: host polls at 1 ms intervals. DIN: serial byte time at 31250 baud |
| ESP32 MIDI parse | ~50 μs | ~50 μs | Running buffer, status byte detection, channel routing |
| Voice allocation | ~10 μs | ~10 μs | Lookup in voice table, round-robin or lowest-available |
| I2C write | ~200 μs | ~200 μs | 8 bytes at 400 kHz Fast Mode to target voice chip |
| PSoC register apply | <12 μs | <12 μs | Register write propagates to analog block configuration |
| SC settling | ~100 μs | ~100 μs | Switched-capacitor filter reaches steady state |
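| Total (single voice) | ≤1.4 ms | ~1.3 ms | Well under the 3 ms perceptibility threshold |
For a polyphonic chord, the I2C stage dominates. Each additional voice adds ~200 μs of I2C transaction time, because the bus is shared and transactions are sequential. A six-note chord takes approximately 1.2 ms of I2C time, five transactions more than a single voice, bringing the total to roughly 2.4 ms when all six notes arrive in the same USB poll. Still under 3 ms. Over DIN the six Note On messages arrive serially, about 1 ms apart, so transport time rather than I2C sets the spread of the chord. A full 16-voice cluster, with every voice triggered simultaneously, needs ~3.2 ms of I2C time and lands at roughly 4.4 ms in total. That exceeds the 3 ms target, but it is perceptible only in direct A/B comparison and inaudible in any musical context.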
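The budget can be checked by summing the stage figures from the table. The sketch below is a worst-case model under stated assumptions: the USB poll is taken at its 1 ms maximum, "<12 μs" is rounded up to 12 μs, and all notes of a chord are assumed to arrive together, which is an idealization for DIN. The names `STAGES_US`, `TRANSPORT_US`, and `latency_us` are illustrative.

```python
STAGES_US = {
    "parse": 50,
    "voice_alloc": 10,
    "i2c_write": 200,       # per voice; transactions are sequential
    "register_apply": 12,   # upper bound of "<12 us"
    "sc_settle": 100,
}
TRANSPORT_US = {
    "usb": 1000,            # worst-case poll interval
    "din": 3 * 320,         # three-byte Note On at 320 us per byte
}


def latency_us(transport: str, voices: int = 1) -> int:
    """Worst-case time until the last of `voices` simultaneous notes sounds."""
    fixed = sum(t for stage, t in STAGES_US.items() if stage != "i2c_write")
    return TRANSPORT_US[transport] + fixed + voices * STAGES_US["i2c_write"]


for n in (1, 6, 16):
    print(n, latency_us("usb", n), latency_us("din", n))
# 1 1372 1332
# 6 2372 2332
# 16 4372 4332
```

Only the I2C term scales with the voice count in this model, so anything that shortens the per-voice transaction moves the polyphonic totals far more than shaving the parse or allocation stages would.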
The ESP32-S3’s dual-core architecture helps isolate jitter. Core 0 handles MIDI reception (USB polling and DIN UART interrupts), parsing, and voice allocation. Core 1 handles the I2C bus — writing parameter updates to voice chips, reading back status, and managing the envelope update loop. This separation means a burst of incoming MIDI data does not stall I2C transactions, and a slow I2C write to a distant voice chip does not cause MIDI input buffer overruns.
USB jitter is inherent to the polling model — the host decides when to poll, and the interval varies up to 1 ms. This is the largest source of timing uncertainty in the USB path. DIN jitter is negligible — the UART is interrupt-driven and the byte timing is deterministic. For performers who need absolute timing precision (finger-drumming, tight rhythmic playing), DIN MIDI is technically superior to USB, though the difference is below the threshold of human perception for sustained notes.
For comparison: a typical DAW round-trip (MIDI in → software instrument → audio buffer → output) adds 5–10 ms of latency from audio buffer size alone, with jitter proportional to buffer size. Timbre’s direct hardware path — no software instrument, no audio buffer, no DAC — eliminates that entire pipeline. The signal is analog from the PSoC output jack onward.
MIDI 2.0 and MIDI-CI
The MIDI 2.0 specification, ratified by the MMA in 2020, introduces three layers of capability above MIDI 1.0: MIDI-CI (Capability Inquiry) for negotiation, Property Exchange for JSON-structured parameter access, and the Universal MIDI Packet (UMP) transport format with higher resolution (16-bit velocity, 32-bit controllers, and 32-bit per-note pitch bend).
Timbre’s relationship to MIDI 2.0 is pragmatic. The architecture does not depend on any MIDI 2.0 feature — everything described on this page and the MIDI & MPE page works over MIDI 1.0. The question is which MIDI 2.0 features provide enough benefit to justify the implementation cost at each phase.
Universal SysEx Identity Reply should ship from Phase 1. When a host sends a Universal SysEx Identity Request (F0 7E 7F 06 01 F7), Timbre responds with its manufacturer ID, device family, model number, and firmware version. This is simple to implement (a fixed-format response to a fixed-format query), widely supported by MIDI utilities, and gives every MIDI monitor and DAW a machine-readable way to identify the connected device. There is no reason to defer it.
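The request and reply formats are fixed by the MIDI 1.0 specification, so the handler is short. The sketch below assumes the one-byte development manufacturer ID 0x7D; the family code, model number, and the mapping of a date-based firmware version onto the four version bytes are placeholders chosen for illustration, not values defined for Timbre.

```python
BROADCAST = 0x7F


def is_identity_request(msg: bytes, own_device_id: int) -> bool:
    """Match F0 7E <device> 06 01 F7 addressed to this unit or to all units."""
    return (
        len(msg) == 6
        and msg[0] == 0xF0 and msg[1] == 0x7E
        and msg[2] in (own_device_id, BROADCAST)
        and msg[3] == 0x06 and msg[4] == 0x01
        and msg[5] == 0xF7
    )


def split14(value: int) -> tuple:
    """Split a 14-bit value into (LSB, MSB) seven-bit bytes."""
    if not 0 <= value <= 0x3FFF:
        raise ValueError("value must fit in 14 bits")
    return value & 0x7F, value >> 7


def identity_reply(device_id: int, manufacturer_id: int,
                   family: int, model: int, version: bytes) -> bytes:
    """Build F0 7E <device> 06 02 <mfr> <family> <model> <version x4> F7."""
    if len(version) != 4 or any(b > 0x7F for b in version):
        raise ValueError("version must be four 7-bit bytes")
    return bytes([0xF0, 0x7E, device_id, 0x06, 0x02, manufacturer_id,
                  *split14(family), *split14(model), *version, 0xF7])


request = bytes([0xF0, 0x7E, 0x7F, 0x06, 0x01, 0xF7])
if is_identity_request(request, own_device_id=0x00):
    # Placeholder family/model 1/1; version bytes 26, 3, 8, 0 for "2026.03.08".
    reply = identity_reply(0x00, 0x7D, 1, 1, bytes([26, 3, 8, 0]))
    print(reply.hex(" "))  # f0 7e 00 06 02 7d 01 00 01 00 1a 03 08 00 f7
```

A registered manufacturer ID may be three bytes long, in which case the manufacturer field grows accordingly; the single-byte case shown here covers the development ID only.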
MIDI-CI Profile Negotiation and Property Exchange are Phase 4 features. Profile Negotiation lets a controller discover that Timbre supports specific profiles (e.g., “drawbar organ,” “analog synth”) and configure itself accordingly — useful but not critical when the instrument is manually configured. Property Exchange exposes parameters as JSON-structured key-value pairs over SysEx, enabling rich editor/librarian interfaces without custom SysEx parsing — genuinely valuable, but the implementation cost (JSON serialization on the ESP32, schema definition, bidirectional state synchronization) is substantial.
UMP transport is contingent on the TinyUSB ecosystem. As of early 2026, TinyUSB’s MIDI 2.0 UMP support is experimental. The ESP32-S3’s USB peripheral can handle the packet format, but the software stack is not production-ready. UMP offers real musical benefits for Timbre: 16-bit velocity removes the 7-bit quantization that limits fine dynamic control, and per-note pitch bend at 32-bit resolution would make MPE slide-to-topology-morph dramatically smoother. When TinyUSB’s UMP class driver stabilizes, it deserves adoption. Until then, the 7-bit MIDI 1.0 values provide 128 steps per parameter. That is coarse by audiophile standards, but the SC block registers they ultimately control are themselves 7-bit, so higher-resolution MIDI data would not improve the analog output until the PSoC firmware interpolates between register steps.
Design Decisions
Several specification choices on this page remain open. They are documented here as decisions with current recommendations and the phase at which they should be finalized.
| Decision | Current Recommendation | Finalize By |
|---|---|---|
| USB manufacturer string | “Tonifex” | Phase 1 |
| VID/PID | Espressif dev VID (0x303A) now; dedicated PID via their program later | Phase 3 |
| SysEx Manufacturer ID | Development ID (0x7D) now; registered MMA ID later ($260–$400) | Phase 3 |
| USB MIDI port count | 1 port (Phase 1–2), 2 ports (Phase 3+) | Phase 3 |
| DIN MIDI THRU | Include — needed for live performance chains | Schematic review |
| TRS MIDI (3.5mm Type A) | Include as optional header on PCB | Schematic review |
| MIDI 2.0 adoption | Identity Reply only (Phase 1); full MIDI-CI Phase 4 | Phase 4 |
References
Cross-References
Voice allocation strategies and MPE per-note expression are detailed in MIDI & MPE. The I2C bus architecture, voice chip addressing, and bus timing are covered in System Architecture. Topology switching mechanics — the reconfiguration that MPE slide actually controls — are in Reconfiguration. Phase definitions and milestones are in the Development Roadmap. For a byte-level MIDI protocol reference, see MIDI Protocol. For Timbre’s consolidated spec sheet, see MIDI Implementation. For the printable MMA-standard chart, see the MIDI Implementation Chart.
External
| Resource | Relevance |
|---|---|
| MIDI 1.0 Detailed Specification (MMA) | Message format, SysEx structure, electrical specification, implementation chart format |
| MIDI Polyphonic Expression (MPE) Specification (MMA) | Zone configuration, MCM messages, per-note CC semantics |
| MIDI 2.0 Core Specification (MMA/AMEI) | MIDI-CI, Property Exchange, UMP transport |
| TinyUSB MIDI Class Documentation | ESP32-S3 USB-MIDI descriptor configuration, multi-port setup |
| Espressif PID Allocation Program | Free PID registration for ESP32-based products |