On the Wire

The MIDI & MPE page describes what Timbre does with MIDI data — voice allocation, CC mappings, MPE topology morphing. This page describes what Timbre looks like from the other side of the cable. When a performer plugs Timbre into a laptop and opens Ableton, Logic, or Bitwig, what appears in the MIDI preferences panel? When a MIDI monitor captures traffic, what SysEx headers identify the device? When a tech runs lsusb, what descriptor fields come back? These are not implementation details to figure out later — they are specification commitments that affect firmware, PCB layout, and third-party compatibility. Some must be finalized before hardware ships.

The ESP32-S3’s native USB peripheral, running the TinyUSB stack, presents Timbre as a class-compliant USB Audio device with a MIDI Streaming subclass. No driver installation is required on any major operating system — macOS, Windows 10+, and Linux all enumerate it automatically.

| Descriptor Field | Value | Notes |
| --- | --- | --- |
| idVendor (VID) | 0x303A | Espressif Systems (development use). Production PID via Espressif’s allocation program (TBD Phase 3) |
| idProduct (PID) | TBD | Unique product identifier within the VID namespace |
| iManufacturer | “Tonifex” | tonifex.warehack.ing. Decision point: finalize Phase 1 |
| iProduct | “Timbre” | Device name as it appears in DAW MIDI preferences |
| bcdDevice | Firmware version | Date-based: YYMM encoding (e.g., 0x2601 for 2026-01) |
| bInterfaceClass | Audio (0x01) | USB Audio Class, declared on the interface (bDeviceClass in the device descriptor stays 0x00) |
| bInterfaceSubClass | MIDI Streaming (0x03) | Standard USB-MIDI, no proprietary driver |
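
The YYMM encoding for bcdDevice can be sketched as follows. This is a minimal illustration, assuming each decimal digit of the two-digit year and the month occupies one nibble, as the 0x2601 example suggests; the function name `bcd_device` is hypothetical.

```python
def bcd_device(year: int, month: int) -> int:
    """Pack a release date into a 16-bit bcdDevice value as YYMM in BCD."""
    if not 2000 <= year <= 2099:
        raise ValueError("year must be in 2000-2099")
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1-12")
    yy = year % 100
    # One decimal digit per nibble, so the hex dump reads like the date.
    return ((yy // 10) << 12) | ((yy % 10) << 8) | ((month // 10) << 4) | (month % 10)


print(hex(bcd_device(2026, 1)))   # 0x2601
print(hex(bcd_device(2026, 12)))  # 0x2612
```

Because the digits are binary-coded decimal, `lsusb` output shows the release month directly without any conversion.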

Espressif maintains a free PID allocation program for products built on their silicon. Any manufacturer using an ESP32 variant can apply for a dedicated PID under Espressif’s VID (0x303A), avoiding the $6,000 fee the USB-IF charges for a private VID. During development, a default PID from the TinyUSB examples works fine, since DAWs do not validate VID/PID pairs. For a shipping product, a registered PID ensures that no two devices on a customer’s USB bus claim the same identity.

The iProduct string “Timbre” is what a performer sees in their DAW’s MIDI device list. It should be short, distinctive, and stable — changing it between firmware versions would break saved MIDI routing configurations in every DAW project that references it.

USB is the primary interface for studio use, but DIN MIDI remains essential for live performance, vintage controller compatibility, and standalone operation without a computer. The MIDI 1.0 electrical specification, unchanged since Dave Smith and Ikutaro Kakehashi defined it in 1983, calls for a current-loop interface at 31,250 baud — an oddball rate chosen because it divides evenly from a 1 MHz clock, a practical concern for the 6850 UARTs common in early 1980s synthesizers.
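
The arithmetic behind the baud rate is short enough to check directly. The sketch below assumes the usual MIDI serial frame of one start bit, eight data bits, and one stop bit; the constant names are illustrative.

```python
CLOCK_HZ = 1_000_000   # 1 MHz reference clock
DIVIDER = 32           # a power of two, so a plain binary divider chain suffices
BITS_PER_BYTE = 10     # 1 start bit + 8 data bits + 1 stop bit

baud = CLOCK_HZ // DIVIDER
byte_time_us = BITS_PER_BYTE * 1_000_000 / baud

print(baud)            # 31250
print(byte_time_us)    # 320.0
```

The 320 μs byte time is the figure that reappears in the DIN column of the latency budget.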

The ESP32-S3’s UART1 peripheral handles the serial protocol. The physical interface requires external components for electrical isolation and line driving.

graph TD
    DIN_IN["DIN IN\n(5-pin)"] --> OPTO["H11L1\nOptoisolator"]
    OPTO --> RX["ESP32\nUART1 RX"]
    ESP32_TX["ESP32\nUART1 TX"] --> DRIVER["Line Driver\n(74HC04 / transistor)"]
    DRIVER --> DIN_OUT["DIN OUT\n(5-pin)"]
    DIN_IN --> |"buffered\npass-through"| THRU["DIN THRU\n(5-pin)"]

    style DIN_IN fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style OPTO fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style RX fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style ESP32_TX fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style DRIVER fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style DIN_OUT fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style THRU fill:#1a3a37,stroke:#0d9488,color:#ccfbf1

The optoisolator on the MIDI IN path is not optional; it is a specification requirement. The current-loop design deliberately isolates the sender galvanically from the receiver, breaking ground loops that would otherwise introduce hum in analog audio chains. The H11L1 (Schmitt-trigger output) is a widely used choice; the 6N138 also works but requires an external pull-up and has slower rise times. Either costs under $0.50.

DIN THRU is a buffered copy of DIN IN, passed through before the ESP32 processes anything. In a live performance chain — controller → Timbre → another synth — the THRU connector lets downstream devices receive the same MIDI stream without latency added by Timbre’s processing. The buffer is a simple logic gate or transistor stage, not a microcontroller pass-through.

TRS MIDI (3.5mm Type A, per the MMA standard) is increasingly common on modern instruments as a space-saving alternative to the full-size 5-pin DIN connector. The Arturia KeyStep, Novation Circuit, and Korg NTS-1 all use TRS MIDI. Timbre’s PCB should include a TRS header alongside the DIN connectors — it costs nothing but a few pads, and a performer with a TRS-equipped controller appreciates not needing an adapter.

USB MIDI allows a single physical USB connection to present multiple virtual MIDI ports, each carrying 16 channels independently. The question for Timbre is whether to expose one port or two.

A single-port configuration is the simplest and most compatible. All 16 MIDI channels arrive on one port. Every DAW, every controller, every MIDI routing utility handles single-port devices without configuration. DIN MIDI is inherently single-port (one serial stream, 16 channels), so a single USB port keeps the two interfaces behaviorally identical.

A dual-port configuration separates performance data (notes, CCs, pitch bend, MPE) from configuration data (SysEx, parameter queries, diagnostic commands). DAWs can route the performance port to a track and the configuration port to an editor/librarian panel, without either interfering with the other. Sequential’s Prophet Rev2 and several Native Instruments controllers use this approach — performance on port 1, configuration on port 2.

The recommendation is single port for Phase 1 and Phase 2, dual port from Phase 3 onward. The single-port approach reduces firmware complexity during the proof-of-concept stages and avoids compatibility edge cases with older DAWs and MIDI utilities. The transition to dual port is a firmware change (TinyUSB descriptor update), not a hardware change — the USB physical layer is identical either way.

System Exclusive messages are the MIDI mechanism for manufacturer-specific communication — patch dumps, parameter edits, firmware updates, diagnostic queries. Every SysEx message begins with F0 (Start of Exclusive), a manufacturer ID, and ends with F7 (End of Exclusive). Everything between those bookends is defined by the manufacturer.

The Manufacturer ID is the namespace. The MIDI Manufacturers Association (MMA) assigns registered IDs — a one-byte or three-byte prefix that guarantees global uniqueness. Registration costs $260–$400 depending on membership tier. For development, the specification reserves ID 0x7D as a non-commercial/educational identifier that any project may use without registration. Timbre uses 0x7D through Phase 2; a registered MMA ID is a Phase 3 milestone.

Every Timbre SysEx message follows this structure:

| Byte | Value | Description |
| --- | --- | --- |
| 1 | F0 | Start of Exclusive |
| 2 | 7D (dev) or registered ID | Manufacturer ID |
| 3 | DD | Device ID (00–7F; 7F is broadcast) |
| 4 | CC | Command byte |
| 5–N | data | Command-specific payload (7-bit encoded) |
| N+1 | F7 | End of Exclusive |

The device ID allows multiple Timbre units on the same MIDI bus to be addressed individually. In most setups there is only one, and 7F (broadcast) works fine. Multi-unit rigs — two Timbres in a keyboard split, for instance — assign each unit a unique device ID via a front-panel setting.
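
The frame layout above can be sketched as a small host-side builder. This is an illustration rather than Timbre’s firmware; `build_sysex` and its parameter names are hypothetical, and the defaults assume the development manufacturer ID and the broadcast device ID.

```python
SOX = 0xF0                   # Start of Exclusive
EOX = 0xF7                   # End of Exclusive
DEV_MANUFACTURER_ID = 0x7D   # non-commercial / development ID
BROADCAST_DEVICE_ID = 0x7F


def build_sysex(command: int, payload: bytes = b"",
                device_id: int = BROADCAST_DEVICE_ID,
                manufacturer_id: int = DEV_MANUFACTURER_ID) -> bytes:
    """Assemble F0, manufacturer ID, device ID, command, payload, F7."""
    body = bytes([manufacturer_id, device_id, command]) + bytes(payload)
    # Every byte between F0 and F7 is a data byte and must be 7-bit clean.
    if any(b > 0x7F for b in body):
        raise ValueError("SysEx data bytes must have the high bit clear")
    return bytes([SOX]) + body + bytes([EOX])


# Version Query (command 01) sent to all units:
print(build_sysex(0x01).hex(" "))   # f0 7d 7f 01 f7
```

Addressing a specific unit in a multi-unit rig only changes byte 3, for example `build_sysex(0x01, device_id=0x02)`.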

| Command | Byte | Direction | Payload | Description |
| --- | --- | --- | --- | --- |
| Version Query | 01 | Host → Timbre | none | Request firmware version |
| Version Reply | 02 | Timbre → Host | ASCII string (7-bit) | Firmware version, date-based (e.g., “2026.03.08”) |
| Patch Dump Request | 10 | Host → Timbre | patch number (1 byte) | Request a stored patch |
| Patch Dump | 11 | Either | patch data (7-bit encoded) | Complete patch state |
| Patch Load | 12 | Host → Timbre | patch number + data | Write a patch to storage |
| Parameter Get | 20 | Host → Timbre | voice ID + param ID | Query a single parameter |
| Parameter Set | 21 | Host → Timbre | voice ID + param ID + value | Set a single parameter |
| Parameter Reply | 22 | Timbre → Host | voice ID + param ID + value | Response to Parameter Get |
| Voice Architecture Query | 30 | Host → Timbre | voice ID | Query current topology of a voice |
| Voice Architecture Reply | 31 | Timbre → Host | topology descriptor | SC block configuration map |
| Diagnostic Register Read | 40 | Host → Timbre | voice ID + register addr | Read a PSoC register (debug) |
| Diagnostic Register Reply | 41 | Timbre → Host | register addr + value | Register contents |
| Diagnostic Register Write | 42 | Host → Timbre | voice ID + register addr + value | Write a PSoC register (debug) |
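
One way a receiver might apply this table is to validate the frame, check the addressing, and map byte 4 to a named command. The sketch below assumes the command bytes in the table are hexadecimal, like the other SysEx values on this page; `Command` and `parse_sysex` are hypothetical names, and payload layouts are deliberately left uninterpreted because the table does not fix field widths.

```python
from enum import IntEnum


class Command(IntEnum):
    VERSION_QUERY = 0x01
    VERSION_REPLY = 0x02
    PATCH_DUMP_REQUEST = 0x10
    PATCH_DUMP = 0x11
    PATCH_LOAD = 0x12
    PARAMETER_GET = 0x20
    PARAMETER_SET = 0x21
    PARAMETER_REPLY = 0x22
    VOICE_ARCH_QUERY = 0x30
    VOICE_ARCH_REPLY = 0x31
    DIAG_REGISTER_READ = 0x40
    DIAG_REGISTER_REPLY = 0x41
    DIAG_REGISTER_WRITE = 0x42


def parse_sysex(message: bytes, own_device_id: int, manufacturer_id: int = 0x7D):
    """Return (Command, payload) if the frame targets this unit, else None."""
    if len(message) < 5 or message[0] != 0xF0 or message[-1] != 0xF7:
        raise ValueError("not a complete SysEx frame")
    if message[1] != manufacturer_id:
        return None                      # another manufacturer's message
    if message[2] not in (own_device_id, 0x7F):
        return None                      # addressed to a different unit
    # Command() raises ValueError for a byte that is not in the table.
    return Command(message[3]), bytes(message[4:-1])


print(parse_sysex(bytes.fromhex("f0 7d 7f 01 f7"), own_device_id=0x00))
```

A frame with device ID 7F is accepted by every unit, which matches the broadcast behavior described above.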

All payload data is 7-bit encoded — MIDI reserves the high bit of every data byte for status detection. An 8-bit value like 0xFF must be split into two 7-bit bytes (0x01, 0x7F). This is a universal MIDI constraint, not a Timbre-specific choice. Version strings use printable ASCII, which is already 7-bit clean — no encoding overhead for human-readable text.
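
The split described above can be sketched as a pair of helpers. The byte order (high bit first, then the low seven bits) is inferred from the (0x01, 0x7F) example; the function names are hypothetical, and a real patch format might pack bits more densely.

```python
def encode_u8(value: int) -> bytes:
    """Split one 8-bit value into two 7-bit-clean MIDI data bytes."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    return bytes([value >> 7, value & 0x7F])   # [top bit, low seven bits]


def decode_u8(pair: bytes) -> int:
    """Rebuild the 8-bit value from the two data bytes."""
    hi, lo = pair
    if hi > 0x01 or lo > 0x7F:
        raise ValueError("not a valid 7-bit encoded byte pair")
    return (hi << 7) | lo


print(encode_u8(0xFF).hex(" "))        # 01 7f
print(hex(decode_u8(b"\x01\x7f")))     # 0xff
```

This simple form doubles the payload size; it trades bandwidth for an encoding that is trivial to read in a MIDI monitor.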

The diagnostic commands (40–42) provide direct register access to individual PSoC voice chips via the ESP32’s I2C bus. These are development and diagnostic tools, not performance controls: they bypass the voice allocation layer entirely and address hardware registers by chip ID and address. A SysEx-aware MIDI monitor becomes a live hardware debugger.

MPE Configuration Messages (MCM) use RPN 6 (Registered Parameter Number 6) on channels 1 and 16 to negotiate MPE zones between a controller and an instrument. Timbre should announce its zone configuration on USB enumeration — when a performer plugs in, controllers that support MCM auto-detect the zone layout without manual setup.

The default configuration is a lower zone on channel 1 with 15 member channels (channels 2–16). This gives every voice its own MIDI channel for full per-note expression, which is the natural mapping for Timbre’s one-chip-per-voice architecture.

| Configuration | Manager Channel | Member Channels | Use Case |
| --- | --- | --- | --- |
| Full lower zone | 1 | 2–16 (15 voices) | Single controller, full MPE expression |
| Split: lower + upper | 1, 16 | 2–8, 9–15 (7+7) | Two controllers or keyboard split |
| Lower zone, reduced | 1 | 2–5 (4 voices) | Phase 2 prototype (4 voices) |
| Standard MIDI (no MPE) | | 1–16 independent | Legacy controllers, multi-timbral use |

Zone configuration is stored in the ESP32’s patch memory and transmitted via MCM on connection. Controllers like the LinnStrument, ROLI Seaboard, and Haken Continuum respond to these messages automatically. Older controllers that do not understand MCM simply ignore them: the messages are RPN sequences on channels that the controller is not using, so they cause no harm.
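
On the wire, an MCM is three Control Change messages on the zone’s manager channel: CC 101 and CC 100 select RPN 6, and CC 6 carries the number of member channels. The following sketch builds those bytes; the helper name `mcm` is hypothetical, and running status is not used so that each message is shown in full.

```python
def mcm(zone: str, member_channels: int) -> bytes:
    """Build an MPE Configuration Message for the lower or upper zone."""
    if zone not in ("lower", "upper"):
        raise ValueError("zone must be 'lower' or 'upper'")
    if not 0 <= member_channels <= 15:
        raise ValueError("member channel count must be 0-15")
    manager = 0 if zone == "lower" else 15   # zero-based: channel 1 or 16
    status = 0xB0 | manager                  # Control Change on the manager channel
    return bytes([
        status, 101, 0,                      # RPN MSB = 0
        status, 100, 6,                      # RPN LSB = 6
        status, 6, member_channels,          # Data Entry MSB = member count
    ])


print(mcm("lower", 15).hex(" "))   # b0 65 00 b0 64 06 b0 06 0f
print(mcm("upper", 7).hex(" "))    # bf 65 00 bf 64 06 bf 06 07
```

The split configuration in the table would send both `mcm("lower", 7)` and `mcm("upper", 7)`; a member count of 0 turns a zone off.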

The standard MMA-format implementation chart belongs in every synthesizer’s documentation. It codifies exactly what Timbre transmits and recognizes across the full MIDI specification — the definitive compatibility reference.

| Function | Transmitted | Recognized | Remarks |
| --- | --- | --- | --- |
| Basic Channel | 1–16 | 1–16 | Default: 1 (standard), 1+2–16 (MPE) |
| Mode | Mode 3 (Poly) | Mode 3, Mode 4 (MPE) | Mode 4 = MPE mono-per-channel |
| Note Number | | 0–127 | Musically useful: 24–108 (7 octaves) |
| Velocity | | ✓ (Note On/Off) | Maps to VCA gain + optional filter offset |
| Aftertouch: Key | | | Per-note: filter, VCA, or osc mode (MPE) |
| Aftertouch: Channel | | | Global: configurable target |
| Pitch Bend | | | Per-channel in MPE; SC clock ratio offset |
| CC 1 | | | Modulation wheel: LFO depth |
| CC 7 | | | Channel volume |
| CC 10 | | | Pan position |
| CC 71 | | | Resonance / Q |
| CC 74 | | | Filter cutoff (standard) / Slide (MPE) |
| CC 91 | | | Reverb send (if external) |
| CC 120 | | | All Sound Off |
| CC 123 | | | All Notes Off |
| RPN 6 (MCM) | | | MPE Configuration Message (via CC 101/100/6) |
| Program Change | | | Patch selection from stored presets |
| System Exclusive | | | See SysEx Specification above |
| System Common | | | Not used |
| System Real Time: Clock | | | Optional: LFO sync to external clock |
| System Real Time: Start/Stop | | | Optional: sequencer transport |
| Active Sensing | | | If received, monitors for timeout |
| Universal SysEx: Identity Reply | | | Phase 1: responds to Identity Request |

The “Transmitted” column is intentionally sparse — Timbre is primarily a sound source, not a controller. It transmits SysEx responses (version, patch dumps, parameter replies), MPE Configuration Messages, and Universal SysEx Identity Reply. Everything else flows inbound.

End-to-end latency — from a key press on a controller to audible sound from the output jack — is the metric that determines whether an instrument feels responsive or sluggish. The MMA considers latency below 3 ms imperceptible to performers. Timbre’s direct hardware path, with no DAW software in the loop, targets well below that threshold.

graph TD
    USB["USB Poll\n1 ms max"] --> PARSE["ESP32 Parse\n~50 μs"]
    DIN["DIN Byte\n320 μs/byte\n(~1 ms for Note On)"] --> PARSE
    PARSE --> ALLOC["Voice Alloc\n~10 μs"]
    ALLOC --> I2C["I2C Write\n~200 μs"]
    I2C --> REG["PSoC Register\nApply < 12 μs"]
    REG --> SETTLE["SC Settle\n~100 μs"]
    SETTLE --> SOUND["Audible\nOutput"]

    style USB fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style DIN fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style PARSE fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style ALLOC fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style I2C fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style REG fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style SETTLE fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style SOUND fill:#1a3a37,stroke:#0d9488,color:#ccfbf1

| Stage | USB Path | DIN Path | Notes |
| --- | --- | --- | --- |
| Transport | ≤1 ms (USB poll interval) | ~1 ms (3 bytes × 320 μs) | USB: host polls at 1 ms intervals. DIN: serial byte time at 31250 baud |
| ESP32 MIDI parse | ~50 μs | ~50 μs | Running buffer, status byte detection, channel routing |
| Voice allocation | ~10 μs | ~10 μs | Lookup in voice table, round-robin or lowest-available |
| I2C write | ~200 μs | ~200 μs | 8 bytes at 400 kHz Fast Mode to target voice chip |
| PSoC register apply | <12 μs | <12 μs | Register write propagates to analog block configuration |
| SC settling | ~100 μs | ~100 μs | Switched-capacitor filter reaches steady state |
| Total (single voice) | ~1.3 ms | ~1.4 ms | Well under the 3 ms perceptibility threshold |

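For a polyphonic chord, the I2C stage dominates. Each additional voice adds ~200 μs of I2C transaction time (the bus is shared and transactions are sequential). A six-note chord takes approximately 1.2 ms of I2C time, bringing the total to ~2.3 ms USB or ~2.4 ms DIN. Still under 3 ms. A full 16-voice cluster (every voice triggered simultaneously) reaches ~3.2 ms of I2C time, or roughly 4.3 ms end to end, which is perceptible only in direct A/B comparison and inaudible in any musical context. These totals count the transport time of a single message; over DIN the notes of a chord also arrive one after another, about 1 ms per three-byte Note On, so the cable itself spreads a large chord before Timbre sees it.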
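
The budget can be reproduced with a few lines of arithmetic. This sketch sums the per-stage figures from the latency table as strict upper bounds and treats every stage as sequential, so its results sit at or slightly above the rounded totals quoted in the text; the names are illustrative, and the model ignores any overlap the dual-core split may provide.

```python
FIXED_STAGES_US = {
    "parse": 50,         # ESP32 MIDI parse
    "alloc": 10,         # voice allocation
    "reg_apply": 12,     # PSoC register apply (upper bound)
    "sc_settle": 100,    # switched-capacitor settling
}
I2C_WRITE_US = 200       # per voice, sequential on the shared bus
USB_POLL_MAX_US = 1000   # worst-case wait for the next host poll
DIN_NOTE_ON_US = 3 * 320 # three bytes at 320 us each


def worst_case_us(transport_us: int, voices: int = 1) -> int:
    """Sum transport, fixed stages, and one I2C write per voice."""
    return transport_us + sum(FIXED_STAGES_US.values()) + voices * I2C_WRITE_US


print(worst_case_us(USB_POLL_MAX_US))       # 1372
print(worst_case_us(DIN_NOTE_ON_US))        # 1332
print(worst_case_us(USB_POLL_MAX_US, 6))    # 2372
print(worst_case_us(USB_POLL_MAX_US, 16))   # 4372
```

Only the `voices * I2C_WRITE_US` term grows with chord size, which is why the I2C bus is the stage to optimize if polyphonic latency ever becomes a concern.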

The ESP32-S3’s dual-core architecture helps isolate jitter. Core 0 handles MIDI reception (USB polling and DIN UART interrupts), parsing, and voice allocation. Core 1 handles the I2C bus — writing parameter updates to voice chips, reading back status, and managing the envelope update loop. This separation means a burst of incoming MIDI data does not stall I2C transactions, and a slow I2C write to a distant voice chip does not cause MIDI input buffer overruns.

USB jitter is inherent to the polling model — the host decides when to poll, and the interval varies up to 1 ms. This is the largest source of timing uncertainty in the USB path. DIN jitter is negligible — the UART is interrupt-driven and the byte timing is deterministic. For performers who need absolute timing precision (finger-drumming, tight rhythmic playing), DIN MIDI is technically superior to USB, though the difference is below the threshold of human perception for sustained notes.

For comparison: a typical DAW round-trip (MIDI in → software instrument → audio buffer → output) adds 5–10 ms of latency from audio buffer size alone, with jitter proportional to buffer size. Timbre’s direct hardware path — no software instrument, no audio buffer, no DAC — eliminates that entire pipeline. The signal is analog from the PSoC output jack onward.

The MIDI 2.0 specification, adopted by the MMA and AMEI in 2020, introduces three layers of capability above MIDI 1.0: MIDI-CI (Capability Inquiry) for negotiation, Property Exchange for JSON-structured parameter access, and the Universal MIDI Packet (UMP) transport format with higher resolution (16-bit velocity, 32-bit controllers and pitch bend, per-note pitch bend, per-note CCs).

Timbre’s relationship to MIDI 2.0 is pragmatic. The architecture does not depend on any MIDI 2.0 feature — everything described on this page and the MIDI & MPE page works over MIDI 1.0. The question is which MIDI 2.0 features provide enough benefit to justify the implementation cost at each phase.

Universal SysEx Identity Reply should ship from Phase 1. When a host sends a Universal SysEx Identity Request (F0 7E 7F 06 01 F7), Timbre responds with its manufacturer ID, device family, model number, and firmware version. This is simple to implement (a fixed-format response to a fixed-format query), widely supported by MIDI utilities, and gives every MIDI monitor and DAW a machine-readable way to identify the connected device. There is no reason to defer it.
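
The exchange can be sketched as follows. The reply layout (sub-IDs 06 02, then manufacturer ID, 14-bit family and model codes sent LSB first, and four version bytes) follows the Universal SysEx format for a one-byte manufacturer ID; the family, model, and version values in the example are placeholders, not assigned Timbre values, and the function names are hypothetical.

```python
def is_identity_request(msg: bytes) -> bool:
    """Match F0 7E <device ID> 06 01 F7, with any device ID."""
    return (len(msg) == 6 and msg[0] == 0xF0 and msg[1] == 0x7E
            and msg[3] == 0x06 and msg[4] == 0x01 and msg[5] == 0xF7)


def identity_reply(device_id: int, family: int, model: int,
                   version: tuple, manufacturer_id: int = 0x7D) -> bytes:
    """Build F0 7E <id> 06 02 <mfr> <family> <model> <version x4> F7."""
    def lsb_msb(value: int) -> list:
        if not 0 <= value <= 0x3FFF:
            raise ValueError("family and model codes are 14-bit")
        return [value & 0x7F, value >> 7]

    if len(version) != 4 or any(not 0 <= b <= 0x7F for b in version):
        raise ValueError("version must be four 7-bit bytes")
    return bytes([0xF0, 0x7E, device_id, 0x06, 0x02, manufacturer_id,
                  *lsb_msb(family), *lsb_msb(model), *version, 0xF7])


request = bytes.fromhex("f0 7e 7f 06 01 f7")
if is_identity_request(request):
    # Placeholder identity: family 1, model 1, version bytes 26, 3, 8, 0.
    print(identity_reply(0x00, family=1, model=1, version=(26, 3, 8, 0)).hex(" "))
    # f0 7e 00 06 02 7d 01 00 01 00 1a 03 08 00 f7
```

A registered three-byte manufacturer ID would lengthen the reply by two bytes, so the builder would need a small change at Phase 3.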

MIDI-CI Profile Negotiation and Property Exchange are Phase 4 features. Profile Negotiation lets a controller discover that Timbre supports specific profiles (e.g., “drawbar organ,” “analog synth”) and configure itself accordingly — useful but not critical when the instrument is manually configured. Property Exchange exposes parameters as JSON-structured key-value pairs over SysEx, enabling rich editor/librarian interfaces without custom SysEx parsing — genuinely valuable, but the implementation cost (JSON serialization on the ESP32, schema definition, bidirectional state synchronization) is substantial.

UMP transport is contingent on the TinyUSB ecosystem. As of early 2026, TinyUSB’s MIDI 2.0 UMP support is experimental. The ESP32-S3’s USB peripheral can handle the packet format, but the software stack is not production-ready. UMP offers real musical benefits for Timbre: 16-bit velocity eliminates the 7-bit quantization that limits fine dynamic control, and per-note pitch bend at 32-bit resolution would make MPE slide-to-topology-morph dramatically smoother. When TinyUSB’s UMP class driver stabilizes, it deserves adoption. Until then, the 7-bit MIDI 1.0 values provide 128 steps per parameter. That is coarse by audiophile standards, but the SC block registers they ultimately control are themselves 7-bit, so higher-resolution MIDI data would not improve the analog output until the PSoC firmware interpolates between register steps.

Several specification choices on this page remain open. They are documented here as decisions with current recommendations and the phase at which they should be finalized.

| Decision | Current Recommendation | Finalize By |
| --- | --- | --- |
| USB manufacturer string | “Tonifex” | Phase 1 |
| VID/PID | Espressif dev VID (0x303A) now; dedicated PID via their program later | Phase 3 |
| SysEx Manufacturer ID | Development ID (0x7D) now; registered MMA ID later ($260–$400) | Phase 3 |
| USB MIDI port count | 1 port (Phase 1–2), 2 ports (Phase 3+) | Phase 3 |
| DIN MIDI THRU | Include; needed for live performance chains | Schematic review |
| TRS MIDI (3.5mm Type A) | Include as optional header on PCB | Schematic review |
| MIDI 2.0 adoption | Identity Reply only (Phase 1); full MIDI-CI Phase 4 | Phase 4 |

Voice allocation strategies and MPE per-note expression are detailed in MIDI & MPE. The I2C bus architecture, voice chip addressing, and bus timing are covered in System Architecture. Topology switching mechanics — the reconfiguration that MPE slide actually controls — are in Reconfiguration. Phase definitions and milestones are in the Development Roadmap. For a byte-level MIDI protocol reference, see MIDI Protocol. For Timbre’s consolidated spec sheet, see MIDI Implementation. For the printable MMA-standard chart, see the MIDI Implementation Chart.

| Resource | Relevance |
| --- | --- |
| MIDI 1.0 Detailed Specification (MMA) | Message format, SysEx structure, electrical specification, implementation chart format |
| MIDI Polyphonic Expression (MPE) Specification (MMA) | Zone configuration, MCM messages, per-note CC semantics |
| MIDI 2.0 Core Specification (MMA/AMEI) | MIDI-CI, Property Exchange, UMP transport |
| TinyUSB MIDI Class Documentation | ESP32-S3 USB-MIDI descriptor configuration, multi-port setup |
| Espressif PID Allocation Program | Free PID registration for ESP32-based products |