
Multi-Chip Vocoder

The multi-chip architecture enables roles beyond polyphony. Instead of every CY8C29466 running the same voice firmware on different notes, chips can be assigned fundamentally different jobs — and the most compelling demonstration of this is a channel vocoder. One chip listens: spectral analysis of microphone input, decomposing a voice into frequency bands. Other chips speak: a carrier signal shaped by that analysis, resynthesized through the same switched-capacitor fabric that produces normal synth voices. The result is the classic vocoder effect — a musical instrument that talks — built entirely from reconfigurable analog blocks and an I2C bus.

The block budget arithmetic eliminates single-chip vocoder designs immediately. A minimal channel vocoder needs analysis filters (bandpass bank on the modulator), envelope followers for each band, a carrier source, synthesis filters (matching bandpass bank on the carrier), and gain control elements to apply the extracted envelopes. For an 8-band vocoder, the component count works out to roughly 56 analog blocks, more than four times the 12 blocks a single CY8C29466 provides.

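| Component | Blocks per band | 4-band total | 8-band total |
|---|---|---|---|
| Analysis bandpass (biquad) | 2 SC | 8 | 16 |
| Envelope follower | 1 SC + 1 CT | 8 | 16 |
| Carrier source | | 3–4 | 3–4 |
| Synthesis bandpass (biquad) | 2 SC | 8 | 16 |
| VCA / gain element | ~0.5 | 2 | 4 |
| Output mixer | | 1 | 1 |
| Total | | ~30 | ~56 |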

Even a 3-band vocoder needs roughly 19 blocks for its per-band stages alone, before the carrier and output mixer are counted. A single-chip design would also fall short on intelligibility: three bands capture broad tonal contour but lose the consonant detail that makes words recognizable.
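The budget arithmetic can be checked with a few lines of Python. This is a minimal sketch that simply re-adds the per-band and shared block counts from the table; the function and constant names are illustrative, and the 12-block chip capacity is the figure used elsewhere in this page.

```python
# Per-band block costs taken from the budget table.
PER_BAND_BLOCKS = {
    "analysis_bandpass": 2.0,   # 2 SC
    "envelope_follower": 2.0,   # 1 SC + 1 CT
    "synthesis_bandpass": 2.0,  # 2 SC
    "vca": 0.5,                 # ~0.5 per band
}
CARRIER_BLOCKS = (3, 4)   # shared, low and high estimate
MIXER_BLOCKS = 1          # shared
BLOCKS_PER_CHIP = 12      # analog blocks on one CY8C29466


def vocoder_blocks(bands):
    """Return the (low, high) analog block estimate for a full vocoder."""
    per_band_total = sum(PER_BAND_BLOCKS.values()) * bands
    low = per_band_total + CARRIER_BLOCKS[0] + MIXER_BLOCKS
    high = per_band_total + CARRIER_BLOCKS[1] + MIXER_BLOCKS
    return low, high


for bands in (3, 4, 8):
    low, high = vocoder_blocks(bands)
    print(f"{bands} bands: {low:g}-{high:g} blocks "
          f"({low / BLOCKS_PER_CHIP:.1f}x one chip)")
```

The per-band stages cost 6.5 blocks each, so three bands already need 19.5 blocks before the shared carrier and mixer are added.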

The answer is the same architecture Timbre already uses for polyphony: distribute the work across multiple chips connected by I2C, with the ESP32 as the bridge. This is Model D (Hybrid) applied to a new purpose — different firmware images loaded via ISSP, each chip assigned a specialized role.

graph LR
    subgraph analysis["Analysis Chip (CY8C29466)"]
        MIC["Electret Mic"] --> PRE["Preamp<br/>(Type C)"]
        PRE --> AA["Anti-alias<br/>LPF"]
        AA --> ADC["ADC<br/>(SC blocks)"]
        ADC --> GOER["M8C Goertzel<br/>8–16 bins"]
    end

    GOER --> ESP["ESP32<br/>Bridge"]

    subgraph synth["Synthesis Chip (CY8C29466)"]
        CAR["Carrier<br/>(self-osc)"] --> BP1["Biquad<br/>Band 1"]
        CAR --> BP2["Biquad<br/>Band 2"]
        CAR --> BP3["Biquad<br/>Band 3"]
        CAR --> BP4["Biquad<br/>Band 4"]
        BP1 --> MIX["Output<br/>Mixer"]
        BP2 --> MIX
        BP3 --> MIX
        BP4 --> MIX
        MIX --> AOUT["Audio Out"]
    end

    ESP --> |"envelope data<br/>(I2C)"| synth

    style MIC fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style PRE fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style AA fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style ADC fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style GOER fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style ESP fill:#134e4a,stroke:#0d9488,color:#ccfbf1
    style CAR fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style BP1 fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style BP2 fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style BP3 fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style BP4 fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style MIX fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style AOUT fill:#1a3a37,stroke:#0d9488,color:#ccfbf1
    style analysis fill:#0f2d2a,stroke:#0d9488,color:#ccfbf1
    style synth fill:#0f2d2a,stroke:#0d9488,color:#ccfbf1

The analysis chip decomposes the microphone signal into spectral envelope data. The ESP32 reads that data over I2C, optionally smooths or normalizes it, and writes it to one or more synthesis chips as per-band gain values. The synthesis chips shape a carrier signal (self-oscillation from the voice engine) through matched bandpass filters whose gains track the spectral envelope. The output is the carrier’s pitch with the microphone’s spectral shape imposed on it — a talking synthesizer.

Three approaches exist for extracting spectral envelopes from the microphone signal, each trading off analog purity against band count and complexity. The recommended approach uses the M8C’s CPU for frequency analysis, reserving the analog blocks for signal conditioning.

The most aesthetically appealing approach: build the entire analysis bank from switched-capacitor biquad filters, each tuned to a different center frequency, with envelope followers extracting the amplitude of each band. A biquad bandpass requires 2 SC blocks. An envelope follower (half-wave rectifier into lowpass) takes 1 SC + 1 Type C block. That is 4 blocks per band — 3 bands maximum from 12 blocks.

Three bands is marginal for speech intelligibility, but time-multiplexing helps. SC blocks above 800 Hz settle fast enough to be reconfigured between analysis passes. A hybrid scheme — 2 static low-frequency bands (below 800 Hz, where settling time matters) plus swept high-frequency bands — could push the effective band count to 6–8 by rapidly reconfiguring 4 blocks through multiple center frequencies per analysis frame. This is elegant but requires careful firmware choreography to hit the timing windows.

The pragmatic approach: dedicate 3–4 analog blocks to signal conditioning (preamp, anti-alias filter, ADC), and run the spectral analysis entirely in firmware using the Goertzel algorithm. Goertzel computes the energy at specific frequencies — equivalent to individual DFT bins — with minimal memory and no FFT butterfly overhead. Each bin requires roughly 3 multiplies and 5 additions per sample.
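The Goertzel recurrence described above can be sketched as follows. This is a floating-point Python model for clarity, not M8C firmware: a real implementation on the 8-bit M8C would use fixed-point arithmetic and would look quite different. The 200-sample frame length and the test tone are assumptions chosen so the tone lands exactly on a bin.

```python
import math


def goertzel_power(samples, sample_rate_hz, target_hz):
    """Return the squared magnitude of the DFT bin nearest target_hz."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)  # nearest integer bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = 0.0   # s[n-1]
    s_prev2 = 0.0  # s[n-2]
    for x in samples:
        # Second-order recurrence: one state update per input sample.
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    # Squared magnitude of the bin, computed once per frame.
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


# Usage: a unit-amplitude 1 kHz tone sampled at 8 kHz (assumed frame: 200 samples).
SAMPLE_RATE_HZ = 8000
FRAME_LEN = 200
tone = [math.sin(2.0 * math.pi * 1000.0 * i / SAMPLE_RATE_HZ)
        for i in range(FRAME_LEN)]
on_band = goertzel_power(tone, SAMPLE_RATE_HZ, 1000.0)
off_band = goertzel_power(tone, SAMPLE_RATE_HZ, 2000.0)
print(f"1 kHz bin: {on_band:.1f}, 2 kHz bin: {off_band:.6f}")
```

Only two state variables per bin persist between samples, which is why the memory footprint stays small even with 8 to 16 bins.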

At an 8 kHz sample rate (adequate for speech, well within the M8C’s ADC capability), the CPU budget per sample is:

| Bins | Cycles per sample | % of M8C budget (24 MHz) |
|---|---|---|
| 8 | ~1,600 | ~53% |
| 12 | ~2,400 | ~80% |
| 16 | ~3,200 | tight (~107%) |

Eight bins provide recognizable speech. Twelve is comfortable for most vocoder applications. Sixteen exceeds the nominal budget at this per-bin cost, so it is only feasible with careful cycle counting and reduced overhead; 12 is the sweet spot. Whatever CPU time remains handles I2C slave communication (responding to envelope data reads from the ESP32) and housekeeping.
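The percentages in the table follow from dividing the clock rate by the sample rate. A short sketch of that arithmetic, assuming the roughly 200 cycles per bin per sample that the table implies (1,600 cycles for 8 bins); the names are illustrative:

```python
CPU_HZ = 24_000_000       # M8C clock from the table header
SAMPLE_RATE_HZ = 8_000    # speech sample rate from the text
CYCLES_PER_BIN = 200      # implied by the table: 1,600 cycles / 8 bins


def goertzel_load(bins):
    """Return (cycles used per sample, fraction of the per-sample budget)."""
    budget = CPU_HZ // SAMPLE_RATE_HZ   # 3,000 cycles between samples
    used = bins * CYCLES_PER_BIN
    return used, used / budget


for bins in (8, 12, 16):
    used, fraction = goertzel_load(bins)
    print(f"{bins} bins: {used} cycles, {fraction:.0%} of budget")
```

At 16 bins the fraction is above 1.0, which is why that row only works if the per-bin cost comes down.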

The advantage over the pure analog approach is decisive: 8–12 frequency bins from 3–4 analog blocks, versus 3 bins from all 12 blocks. The tradeoff is that the analysis path passes through an ADC — it is no longer purely analog. For a vocoder, where the analysis signal is a microphone and the output quality depends on band count rather than analog purity, this is an easy trade.

A middle path: configure a single SC biquad (4 blocks including envelope follower) and reconfigure its center frequency for each band in sequence. The M8C sweeps the bandpass across the spectrum, sampling the envelope at each center frequency before moving to the next. One complete sweep produces amplitude data for all bands.

SC blocks above 800 Hz settle in under 100 microseconds, allowing a full 8-band sweep in roughly 5–8 ms. Below 800 Hz, settling time increases — the lowest bands may need dedicated static filters or longer dwell times. This approach uses fewer blocks than the pure analog bank (4 vs. 12 for 3 bands) and preserves analog purity throughout, but adds latency proportional to the number of bands.

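| Approach | Max bands | Analog blocks | Analog purity | CPU load | Latency |
|---|---|---|---|---|---|
| Pure analog (SC bank) | 3 (static), 6–8 (time-muxed) | 12 | Full | Low | Low |
| ADC + Goertzel | 8–12 | 3–4 | ADC in path | High | Low |
| Swept bandpass | 8+ | 4 | Full | Medium | 5–8 ms per sweep |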

Each synthesis chip implements a fixed-band vocoder output stage: a carrier oscillator feeding parallel bandpass filters whose gains are modulated by the envelope data from the ESP32.

| Function | Blocks | Type |
|---|---|---|
| Carrier (self-oscillating biquad) | 3–4 | 2 SC + 1–2 CT |
| Bandpass filter 1 (biquad) | 2 | SC |
| Bandpass filter 2 (biquad) | 2 | SC |
| Bandpass filter 3 (biquad) | 2 | SC |
| Bandpass filter 4 (biquad) | 2 | SC |
| Output mixer | 1 | Type C |
| Total | 12–13 | Full budget |

Four bands per synthesis chip — the block budget is fully committed. An 8-band vocoder needs two synthesis chips, each handling four bands of the spectrum. The carrier oscillator uses the same self-oscillation modes described in The Voice Engine — bandpass self-oscillation for a clean carrier, relaxation oscillation for a harmonically richer one. The choice of carrier shape is a timbral decision: a sawtooth carrier produces a brighter, buzzier vocoder voice; a sine carrier produces a smoother, more organ-like one.

Gain control for each band uses the SC block’s capacitor ratio registers. The ratio between input and feedback capacitors sets the gain of each biquad stage. These registers are 7-bit (128 steps), providing roughly 42 dB of dynamic range in discrete steps. At the ESP32’s I2C update rate of 1–2 ms per cycle, the gain steps are fast enough to track speech envelope dynamics — syllable boundaries change on a 30–50 ms timescale, and even plosive consonants have attack times of 5–10 ms.
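The 42 dB figure is the ratio between full scale and one step of a 7-bit register. The sketch below reproduces that number and shows one possible mapping from a normalized envelope to a register code. The linear mapping and the function names are assumptions for illustration; the actual relationship between a capacitor-ratio code and stage gain depends on the SC block configuration.

```python
import math

GAIN_BITS = 7
GAIN_STEPS = 2 ** GAIN_BITS   # 128 steps, per the text


def dynamic_range_db(steps):
    """Ratio between full scale and one step, in decibels."""
    return 20.0 * math.log10(steps)


def envelope_to_gain_code(envelope):
    """Map a normalized envelope (0.0 to 1.0) to a 7-bit code (assumed linear)."""
    clamped = min(max(envelope, 0.0), 1.0)
    return round(clamped * (GAIN_STEPS - 1))


print(f"{dynamic_range_db(GAIN_STEPS):.1f} dB")   # about 42 dB
print(envelope_to_gain_code(0.25))
```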

The ESP32 sits between analysis and synthesis, reading envelope data from the analysis chip and writing gain values to the synthesis chips. The I2C bandwidth is comfortable for this role.

| Operation | Time | Detail |
|---|---|---|
| Read 8 envelope bytes from analysis chip | ~200 μs | 8 bytes at 400 kHz Fast Mode |
| Write 4 gain bytes to synthesis chip 1 | ~100 μs | 4 bands per chip |
| Write 4 gain bytes to synthesis chip 2 | ~100 μs | Second chip for 8-band |
| Full round trip (8-band) | ~400 μs | Well under 1 ms |
| Full round trip (4-band) | ~300 μs | Single synthesis chip |
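The transfer times above can be approximated from the bus clock alone. This sketch assumes each byte costs 9 clock periods on the wire (8 data bits plus ACK) and that each transaction carries one address byte of overhead; start/stop conditions, any register-pointer byte, and clock stretching are ignored, so real transfers will be somewhat longer.

```python
I2C_CLOCK_HZ = 400_000       # Fast Mode, per the table
CLOCKS_PER_BYTE = 9          # 8 data bits + 1 ACK bit


def transfer_us(payload_bytes, overhead_bytes=1):
    """Approximate bus time in microseconds for one I2C transaction."""
    clocks = (payload_bytes + overhead_bytes) * CLOCKS_PER_BYTE
    return clocks * 1e6 / I2C_CLOCK_HZ


read_envelopes = transfer_us(8)   # 8 envelope bytes from the analysis chip
write_gains = transfer_us(4)      # 4 gain bytes to one synthesis chip

round_trip_8_band = read_envelopes + 2 * write_gains
round_trip_4_band = read_envelopes + write_gains
print(f"read {read_envelopes:.1f} us, write {write_gains:.1f} us")
print(f"8-band {round_trip_8_band:.1f} us, 4-band {round_trip_4_band:.1f} us")
```

The estimates land slightly above the rounded figures in the table but remain well under 1 ms.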

Speech envelopes change on a 5–10 ms timescale. The I2C round trip of 400 μs provides over 10× headroom — the ESP32 can update synthesis gains at a rate far higher than the signal demands. This slack is useful: the ESP32 can apply smoothing, normalization, or nonlinear mapping to the envelope data in transit. Logarithmic compression of the envelope values improves intelligibility by preventing loud bands from dominating. Per-band attack and release times, computed in the ESP32, can be tuned to favor consonant transients (fast attack, moderate release) without modifying the PSoC firmware.
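The in-transit processing could look something like the sketch below: logarithmic compression followed by a one-pole smoother with separate attack and release coefficients. It is written in Python as a model of the idea rather than as ESP32 code, and every name and coefficient here is hypothetical. The -42 dB floor is an assumption chosen to match the 7-bit gain range.

```python
import math


def log_compress(level, floor_db=-42.0):
    """Map a linear level (0.0 to 1.0) onto a 0.0 to 1.0 decibel scale."""
    if level <= 0.0:
        return 0.0
    level_db = 20.0 * math.log10(level)
    return min(max(1.0 - level_db / floor_db, 0.0), 1.0)


class BandSmoother:
    """One-pole follower with separate attack and release coefficients."""

    def __init__(self, attack=0.6, release=0.1):
        self.attack = attack      # fraction of the gap closed per rising update
        self.release = release    # fraction of the gap closed per falling update
        self.value = 0.0

    def update(self, target):
        coeff = self.attack if target > self.value else self.release
        self.value += coeff * (target - self.value)
        return self.value


# Usage: a burst followed by silence, as one band might see on a consonant.
smoother = BandSmoother(attack=0.5, release=0.25)
trace = [smoother.update(log_compress(x)) for x in (1.0, 1.0, 0.0, 0.0)]
print(trace)
```

A fast attack with a slower release lets transients through quickly while avoiding abrupt gain drops between syllables.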

The envelope data format is simple: one byte per band, transmitted using the same I2C parameter protocol that normal voice operation uses. From the synthesis chip’s perspective, an envelope gain update is indistinguishable from a filter cutoff or VCA parameter change — the I2C command format is identical, only the target register differs.

The analysis chip needs a microphone input stage. An electret condenser microphone with a simple bias circuit provides a low-level signal that the on-chip preamp then raises to a level suitable for the PSoC’s analog blocks:

| Component | Value | Purpose |
|---|---|---|
| Electret capsule | Standard 2-terminal | Transducer |
| Bias resistor | 2.2 kΩ to VCC | Bias supply for the electret’s internal FET |
| DC-blocking capacitor | 100 nF – 1 μF | Remove DC bias from signal |
| Input resistor | 10 kΩ | Limit current into PSoC pin |

The electret output (typically 5–50 mV peak) feeds a Type C block configured as a preamp with gain of 10–40×, bringing the signal to a level suitable for the ADC or SC filter input. If using the Goertzel approach, an anti-alias lowpass filter (SC biquad, 2 blocks, cutoff at or below half the sample rate) precedes the ADC to prevent high-frequency content from folding into the analysis bins.
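As a quick sanity check on those ranges, the sketch below multiplies out the corners of the mic level and preamp gain ranges and computes the upper bound on the anti-alias cutoff for the 8 kHz sample rate. The names are illustrative, and the supply rail (and therefore the available headroom) is not specified here, so no clipping threshold is assumed.

```python
MIC_PEAK_MV = (5.0, 50.0)     # electret output range from the text
PREAMP_GAIN = (10.0, 40.0)    # preamp gain range from the text
SAMPLE_RATE_HZ = 8000         # Goertzel sample rate from the text


def preamp_peak_mv(mic_peak_mv, gain):
    """Peak preamp output in millivolts for a given mic level and gain."""
    return mic_peak_mv * gain


def max_antialias_cutoff_hz(sample_rate_hz):
    """Highest usable anti-alias cutoff: the Nyquist frequency."""
    return sample_rate_hz / 2.0


quietest = preamp_peak_mv(MIC_PEAK_MV[0], PREAMP_GAIN[0])
loudest = preamp_peak_mv(MIC_PEAK_MV[1], PREAMP_GAIN[1])
print(f"preamp output: {quietest:.0f} mV to {loudest:.0f} mV peak")
print(f"anti-alias cutoff at most {max_antialias_cutoff_hz(SAMPLE_RATE_HZ):.0f} Hz")
```

The spread between the quietest and loudest corners suggests the preamp gain would need to be matched to the capsule rather than fixed at the top of its range.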

The number of frequency bands determines what the vocoder can reproduce. More bands means finer spectral detail and more recognizable speech — but also more hardware (analysis complexity, synthesis chips, I2C traffic).

| Band count | Center frequencies (Hz) | Character |
|---|---|---|
| 4 | 250, 700, 2000, 5500 | Tonal modulation: vowels recognizable, consonants lost. Talkbox-adjacent. |
| 8 | 200, 400, 700, 1200, 2000, 3200, 5000, 8000 | Classic robot voice: words distinguishable, speech recognizable. |
| 12 | 150, 250, 400, 630, 1000, 1600, 2500, 3500, 4500, 5500, 7000, 9000 | Clear vocoder speech, approaching traditional hardware vocoder quality. |
| 16 | Bark-scale spacing, 100–10000 Hz | High-fidelity vocoder: subtle formant detail preserved. |

The center frequencies follow approximately logarithmic (Bark-scale) spacing, matching how the human auditory system divides the frequency spectrum. Lower bands are narrower (capturing individual formants), while upper bands are wider (capturing sibilance and fricative energy as a group). Bands centered above 4 kHz lie beyond the Nyquist limit of an 8 kHz sample rate, so analyzing them with the Goertzel path would require a higher sample rate and a correspondingly smaller per-sample cycle budget. The 8-band configuration is the practical sweet spot for Timbre: two synthesis chips, one analysis chip, clearly intelligible speech, and enough spectral detail to distinguish vowels and most consonants.
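To see how close the 8-band centers are to a strictly logarithmic layout, the sketch below generates geometrically spaced centers over the same range and lists the step ratios of the table values. Pure geometric spacing is a simplification used here for comparison; it is not the Bark scale itself, and the function name is illustrative.

```python
def geometric_centers(f_low_hz, f_high_hz, bands):
    """Return band centers with a constant ratio between neighbors."""
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (bands - 1))
    return [f_low_hz * ratio ** i for i in range(bands)]


# 8-band centers from the table above.
TABLE_8_BAND_HZ = [200, 400, 700, 1200, 2000, 3200, 5000, 8000]

ideal = geometric_centers(200.0, 8000.0, 8)
step_ratios = [hi / lo for lo, hi in zip(TABLE_8_BAND_HZ, TABLE_8_BAND_HZ[1:])]

for table_hz, ideal_hz in zip(TABLE_8_BAND_HZ, ideal):
    print(f"table {table_hz:>5} Hz   geometric {ideal_hz:8.0f} Hz")
print("step ratios:", [f"{r:.2f}" for r in step_ratios])
```

The table's step ratios stay between 1.5 and 2.0, close to the constant ratio of about 1.69 that exact geometric spacing would give over this range.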

An honest assessment: lo-fi, granular, and unmistakably analog. This is not a Waves plugin vocoder with 512 FFT bins and 24-bit precision. It is 8 bands of switched-capacitor filters with 7-bit gain resolution, updated over I2C at 1–2 ms intervals, driven by a carrier that is itself a self-oscillating SC biquad.

The SC clock noise — the constant companion of everything the PSoC produces — adds a textural grit that sits underneath the vocoder effect. The 7-bit gain quantization (128 levels per band) is coarser than a VCA-based hardware vocoder, but at a 1–2 ms update rate, the steps blend into a warm, slightly grainy envelope following rather than harsh staircase tracking. The I2C bridge latency (~400 μs) adds a barely perceptible softening to transients that a direct analog connection would not have.

The closest reference points are the Korg VC-10 and early Moog Vocoder — both analog channel vocoders with limited band counts and discrete filter banks. The Timbre vocoder would share their thick, characterful quality while adding the PSoC’s unique SC texture. A 4-band configuration sounds talkbox-adjacent — broad vowel shapes imposed on the carrier, useful for musical textures but not intelligible speech. An 8-band configuration produces the classic vocoder “robot voice” with recognizable words and the synthetic clarity that made vocoders iconic. Finding its voice, quite literally.

The vocoder is not a separate operating mode — it is a firmware image assignment. The analysis chip receives a vocoder_analysis firmware image via ISSP. The synthesis chips receive vocoder_synth firmware images. The ESP32 recognizes these roles during its I2C device enumeration at startup and routes data accordingly.

Normal polyphonic voices coexist on the same bus. In a 16-chip system, dedicating 3 chips to the vocoder (1 analysis + 2 synthesis for 8 bands) leaves 13 chips available as normal polyphonic voices. The vocoder output mixes with the polyphonic output at the analog summing stage — a performer can sing into the vocoder while playing a 13-voice polysynth, all from the same instrument.
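The chip allocation and the bridge's routing step can be sketched as below. This is a hypothetical model in Python, not the ESP32 firmware: the function names, the I2C addresses, and the idea of passing each synthesis chip a contiguous four-byte slice are assumptions for illustration, built on the one-byte-per-band format and four bands per synthesis chip described earlier.

```python
BANDS_PER_SYNTH_CHIP = 4   # per the synthesis chip block budget


def synth_chips_needed(bands):
    """Ceiling division: synthesis chips required for a given band count."""
    return -(-bands // BANDS_PER_SYNTH_CHIP)


def free_voice_chips(total_chips, bands):
    """Chips left for polyphony after one analysis chip plus synthesis chips."""
    return total_chips - (1 + synth_chips_needed(bands))


def route_envelope(envelope, synth_addresses):
    """Split one byte-per-band frame into per-chip gain payloads."""
    needed = synth_chips_needed(len(envelope))
    if len(synth_addresses) < needed:
        raise ValueError(f"need {needed} synthesis chips, "
                         f"found {len(synth_addresses)}")
    writes = {}
    for index in range(needed):
        start = index * BANDS_PER_SYNTH_CHIP
        chunk = envelope[start:start + BANDS_PER_SYNTH_CHIP]
        writes[synth_addresses[index]] = bytes(chunk)
    return writes


# Usage: 8-band frame, two synthesis chips at hypothetical addresses.
frame = [10, 20, 30, 40, 50, 60, 70, 80]
writes = route_envelope(frame, [0x21, 0x22])
print(free_voice_chips(16, 8), writes)
```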

| Firmware image | Purpose | Analog config | M8C role |
|---|---|---|---|
| vocoder_analysis | Spectral decomposition of mic input | Preamp + anti-alias + ADC (3–4 blocks) | Goertzel computation, I2C slave (envelope reads) |
| vocoder_synth | Carrier shaping per 4 bands | Carrier osc + 4 biquads + mixer (12 blocks) | I2C slave (gain writes), register management |
| voice_standard | Normal polyphonic voice | Per voice-engine config (12 blocks) | Parameter interpretation, topology switching |

Switching a chip between vocoder and normal voice duty is an ISSP reprogram — the same mechanism used to change voice architectures between songs. The physical hardware is identical; only the firmware role differs.

The multi-chip bus topology and I2C mux architecture are described in System Architecture. The self-oscillation modes used for the vocoder carrier are covered in The Voice Engine. The ISSP programming mechanism and Model D hybrid operation that enables per-chip firmware roles are detailed in Reconfiguration. Analog block specifications and register-level details are in the CY8C29466 Reference.

| Document | Relevance |
|---|---|
| AN2094: PSoC 1 Switched Capacitor Analog Blocks | SC biquad filter design for both analysis and synthesis banks |
| AN2219: PSoC 1 Analog Block Chaining | Cascading SC blocks for higher-order filters and multi-stage signal paths |

The Goertzel algorithm is a single-bin DFT evaluation — computationally equivalent to one bin of an N-point FFT, but requiring only 3 multiplies and 5 additions per sample per bin, with no bit-reversal or butterfly stages. For speech analysis, where only 8–16 specific frequency bins are needed, Goertzel is more efficient than a full FFT and fits naturally on the M8C’s 8-bit architecture.