
Benjamin Paine PRO

benjamin-paine

AI & ML interests

A software engineer with an AI habit

Recent Activity

liked a Space about 17 hours ago
benjamin-paine/goku-moviegen-bench-viewer
liked a model about 19 hours ago
Zyphra/Zonos-v0.1-hybrid
published a Space about 20 hours ago
benjamin-paine/goku-moviegen-bench-viewer

Organizations

Taproot AI

benjamin-paine's activity

replied to Xenova's post 3 days ago

Yup! That stays one chunk.

chunker.push("Last week she said, “Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said, “Hi there. How are you?”"

The only exception is with newlines - I wanted it to emit when a newline was encountered.

chunker.push("Last week she said,\n“Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said,"
Emitting "“Hi there. How are you?”"

If you want to disable this behavior, pass in {emitParagraphs: false} to the constructor, i.e.:

const chunker = new SentenceChunker({emitParagraphs: false});

There's also chunkLength, which sets the maximum character length of a chunk (128 by default), and emitTrimmed, which controls whether each emitted chunk has its leading and trailing whitespace trimmed (default true). One last thing: if your input is always growing, for example if you're streaming one response and concatenating it into one big string, you can use GrowingSentenceChunker instead (it's in the same file). Example:

const chunker = new GrowingSentenceChunker();
chunker.onChunk((chunk) => { console.log(`Emitting "${chunk}"`); });
chunker.push("Last week");
chunker.push("Last week she said");
chunker.push("Last week she said, “Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said, “Hi there. How are you?”"

And just in case it's not obvious, the .flush() call will just emit anything left in the buffer, even if it's shorter than the maximum length. If you don't call .flush(), it will wait for another input that pushes it over the chunk limit before emitting again.
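
For the always-growing input case described above, the core idea can be sketched as "only forward the part of the string that is new". This is a minimal illustration under my own assumptions, not the code in sentence.js; the class name GrowingTextDelta and its next() method are hypothetical:

```javascript
// Hypothetical helper, NOT taken from sentence.js: it only illustrates
// how an always-growing string can be reduced to its newly added suffix.
class GrowingTextDelta {
  constructor() {
    this.seen = ""; // the full text received on the previous call
  }

  // Returns the part of fullText that was not present on the previous call.
  next(fullText) {
    if (!fullText.startsWith(this.seen)) {
      // The input was replaced rather than extended, so start over.
      this.seen = "";
    }
    const delta = fullText.slice(this.seen.length);
    this.seen = fullText;
    return delta;
  }
}

const tracker = new GrowingTextDelta();
console.log(JSON.stringify(tracker.next("Last week")));          // "Last week"
console.log(JSON.stringify(tracker.next("Last week she said"))); // " she said"
```

Each returned delta could then be pushed into a regular chunker, so the buffer never sees the same text twice.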

replied to Xenova's post 4 days ago

I spent a bit of time working on a JavaScript sentence splitter - it might work right out of the box for this purpose! It tries to split on punctuation when possible for smooth flow, but has a max length option to ensure run-on sentences still get split, too. It also maintains a buffer so you can just keep pushing streaming text into it and it will emit when it has a full chunk.

https://raw.githubusercontent.com/painebenjamin/anachrovox/refs/heads/main/www/sentence.js

Example:

const chunker = new SentenceChunker();
chunker.onChunk((sentenceChunk) => { console.log(`Emitting "${sentenceChunk}"`); });
chunker.push("The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.");
chunker.flush()

Output:

Emitting "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."
Emitting "The best performing models also connect the encoder and decoder through an attention mechanism."
Emitting "We propose a new simple network architecture, the Transformer, based solely on attention mechanisms,"
Emitting "dispensing with recurrence and convolutions entirely."
Emitting "Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train."
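
To make the buffering idea concrete, here is a heavily simplified sketch of a streaming chunker. It is not the implementation in sentence.js and will not reproduce the output above: it ignores quotes, commas and newlines, and the name SimpleChunker and the maxLength option are assumptions made for illustration. It emits complete sentences as they arrive, force-splits unterminated run-on text at the last space within maxLength, and holds the rest until flush():

```javascript
// Simplified, hypothetical sketch; NOT the implementation in sentence.js.
class SimpleChunker {
  constructor({ maxLength = 128 } = {}) {
    this.maxLength = maxLength; // limit applied to unterminated text only
    this.buffer = "";
    this.callbacks = [];
  }

  onChunk(callback) {
    this.callbacks.push(callback);
  }

  // Trim the chunk and hand it to every listener, skipping empty chunks.
  emit(text) {
    const trimmed = text.trim();
    if (trimmed.length > 0) {
      this.callbacks.forEach((callback) => callback(trimmed));
    }
  }

  push(text) {
    this.buffer += text;
    // Emit every complete sentence: text ending in . ! or ? followed by whitespace.
    let match;
    while ((match = /[.!?]\s/.exec(this.buffer)) !== null) {
      this.emit(this.buffer.slice(0, match.index + 1));
      this.buffer = this.buffer.slice(match.index + 1);
    }
    // Run-on text with no sentence end: cut at the last space within maxLength.
    while (this.buffer.length > this.maxLength) {
      const space = this.buffer.lastIndexOf(" ", this.maxLength);
      const cut = space > 0 ? space : this.maxLength;
      this.emit(this.buffer.slice(0, cut));
      this.buffer = this.buffer.slice(cut);
    }
  }

  // Emit whatever is left, even if it is shorter than maxLength.
  flush() {
    this.emit(this.buffer);
    this.buffer = "";
  }
}

const demo = new SimpleChunker({ maxLength: 20 });
demo.onChunk((chunk) => { console.log(`Emitting "${chunk}"`); });
demo.push("Hi there. How are"); // Emitting "Hi there."
demo.push(" you?");             // nothing yet: the sentence may still continue
demo.flush();                   // Emitting "How are you?"
```

Waiting for whitespace after the punctuation is what lets the sketch accept text in arbitrary pieces: a trailing "you?" is held back because more text could still follow it.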
reacted to Xenova's post with 🔥 4 days ago
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?
upvoted an article 11 days ago

The AI tools for Art Newsletter - Issue 1

New activity in benjamin-paine/Lumina-Image-2.0 11 days ago