
Stephanie Haro gave a STEMJazz talklet on Wednesday. It was eye-opening, not only in the importance of what she’s doing, but in the difficulty of the problem.
Most of us are familiar with prosthetic and orthotic devices that replace lost limbs or support impaired ones. We're also mostly familiar with the promise of "bionic" versions of such devices whose movements are directed by residual muscle and, in what we hope is the not-too-distant future, by signals culled from and inserted into residual peripheral efferent and afferent nerve fibers, making such replacements as good as or better than what was lost.
Those problems, however, pale in comparison to restoring speech to those who have lost control of their vocal apparatus through diseases like ALS and other neurodegenerative maladies. That is, for limb prostheses the "intent" signals and the sensory signals reside in often patent and easily identifiable peripheral nerve fibers proximal (or almost proximal) to the injury. For neurodegenerative diseases, those nerve "stumps" do not exist, because it is the nerve itself that has ceased to carry action potentials to and from the central nervous system.
For such cases there is no choice but to insert electrodes into the brain, sometimes relatively superficially and often into deeper structures, to try to tap either the motor signal sources (essentially tracing the peripheral signals back to their origins in the motor cortex) or, more eerily (to me), the source of "intent," where the thoughts of the words are formed.
This is where Stephanie Haro lives, intellectually and experimentally. She seeks to give the gift of speech back to the most profoundly impaired individuals by studying real signals emanating from real brain electrodes implanted in real people with real lives all in real time.
The basic setup is to digitize signals from implanted multi-electrode arrays (8×8, sampled at 30 kHz per channel on each "rail"). These electrodes are fine enough that individual action potentials can be resolved (as opposed to what one might assume are mean field potentials, as Sonya Mayoral pointed out). And for signal processing and machine learning folks, that's where we'd like to start the analysis.
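To make that concrete (and this is purely my own back-of-the-envelope sketch, not anything from the talk), resolving individual action potentials at that scale usually amounts to something like threshold-crossing detection on each 30 kHz channel. All of the parameters and names below are invented for illustration:

```python
import numpy as np

# Hypothetical parameters: an 8x8 array sampled at 30 kHz per channel.
FS = 30_000          # samples per second per channel
N_CHANNELS = 64      # 8 x 8 electrode grid

def detect_threshold_crossings(x, fs=FS, k=4.5):
    """Flag putative spikes as samples crossing a noise-scaled threshold.

    Uses the common median-absolute-deviation noise estimate; the actual
    pipeline described in the talk may differ substantially.
    """
    sigma = np.median(np.abs(x)) / 0.6745        # robust noise estimate
    below = x < -k * sigma                       # negative-going spikes
    onsets = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    return onsets / fs                           # spike times in seconds

# Toy example: one second of noise on one channel with three injected "spikes".
rng = np.random.default_rng(0)
x = rng.normal(0, 1, FS)
x[[5_000, 12_000, 21_000]] -= 12
print(detect_threshold_crossings(x))             # ~[0.167, 0.4, 0.7] seconds
```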
The data processing theorem says you can only lose information through processing. Theorists are scandalized that the high-speed raw data from each channel is distilled into 50 Hz hunks before being fed into a large language model, which spits out phoneme (speech-sound) estimates with an eye toward the structure of the language (English only at this point). But the high data rates of the raw signals and limited storage made engineering tradeoffs necessary.
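My reading of those 50 Hz hunks is that the raw stream gets collapsed into one feature vector every 20 ms, something like per-channel spike counts per bin, though the actual features used may well be different. A minimal sketch of that distillation step:

```python
import numpy as np

FS = 30_000                      # raw sampling rate (Hz)
BIN_HZ = 50                      # feature rate handed to the decoder (Hz)
SAMPLES_PER_BIN = FS // BIN_HZ   # 600 raw samples per 20 ms bin

def bin_spike_counts(spike_flags):
    """Collapse a (channels, samples) boolean spike array at 30 kHz into
    (channels, bins) counts at 50 Hz - one illustrative way to "distill"
    the raw stream before decoding."""
    n_ch, n_samp = spike_flags.shape
    n_bins = n_samp // SAMPLES_PER_BIN
    trimmed = spike_flags[:, : n_bins * SAMPLES_PER_BIN]
    return trimmed.reshape(n_ch, n_bins, SAMPLES_PER_BIN).sum(axis=2)

# Toy example: 64 channels, one second of sparse random "spikes".
rng = np.random.default_rng(1)
spikes = rng.random((64, FS)) < 0.001            # ~30 spikes/s per channel
features = bin_spike_counts(spikes)
print(features.shape)                            # (64, 50): fifty 20 ms bins
```

Whatever the exact features are, the point stands: each 20 ms summary throws away most of the raw waveform, which is exactly what makes theorists squirm.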
Stephanie lives in that somewhat post-processed world, trying to make sense of what the processed data has to say about phonemes and how such estimates subsequently interact with the LLM that actually produces speech. She’s looking for ways to provide better separation of phonemes (she showed us vowel and consonant data) by trying to organize and characterize a very large signal space.
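I have no idea what her actual analysis looks like, but one crude way to put a number on how separable phoneme classes are in a big feature space like this is to fit a simple linear discriminant on labeled feature vectors and compare held-out accuracy to chance. Everything below (the labels, the dimensions, the data) is invented for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Stand-in data: 64-channel binned features labeled with a handful of
# phonemes. Real labels would come from aligned speech attempts.
rng = np.random.default_rng(2)
phonemes = ["AA", "IY", "UW", "S", "T"]
X, y = [], []
for ph in phonemes:
    center = rng.normal(0, 1, 64)                 # each phoneme gets its own mean pattern
    X.append(center + rng.normal(0, 2, (200, 64)))
    y += [ph] * 200
X = np.vstack(X)

# Cross-validated accuracy vs. 1-in-5 chance gives a crude separability score.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(f"mean accuracy {scores.mean():.2f} vs chance {1 / len(phonemes):.2f}")
```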
While the technology is not perfect, even in cases that would previously have been hopeless the methods perform well above chance, and in some cases rather well, heartbreakingly so. In one case the technology allowed a dad who had never spoken to his daughter to say (and I'm paraphrasing) "I see you, Tiger" when she walked into the room, and for her to hear her dad speak (synthesized) for the first time.
And to Stephanie, who finishes this program’s postdoc on November 1 and moves on to another postdoc afterward, EXCELSIOR! What you’re doing is inspiring!