Intro

Tokenization

Bubbleprompt parses user input by tokenizing it, which makes it easier to operate on specific parts of the input. A common tokenization method is to split the input on whitespace. For example, if the input is this is some text, the tokenized input would be ["this", " ", "is", " ", "some", " ", "text"]. Each word is a word token and each space is a delimiter token. Working with tokens simplifies common operations such as getting the word currently under the cursor.
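As a rough illustration of the idea above, the following sketch splits input into word and delimiter tokens and looks up the token under a cursor position. The Token type and the splitTokens and tokenAt functions are hypothetical names made up for this example; they are not part of Bubbleprompt's API.

```go
package main

import (
	"fmt"
	"regexp"
)

// Token is a hypothetical type for this sketch, not a Bubbleprompt type.
type Token struct {
	Value       string
	Start       int  // byte offset of the token within the input
	IsDelimiter bool // true when the token is a run of whitespace
}

// Group 1 matches a word, group 2 matches a run of whitespace.
var wordOrSpace = regexp.MustCompile(`(\S+)|(\s+)`)

// splitTokens breaks input into alternating word and delimiter tokens,
// keeping the delimiters so that offsets map back to the original input.
func splitTokens(input string) []Token {
	var tokens []Token
	for _, loc := range wordOrSpace.FindAllStringSubmatchIndex(input, -1) {
		tokens = append(tokens, Token{
			Value:       input[loc[0]:loc[1]],
			Start:       loc[0],
			IsDelimiter: loc[4] >= 0, // group 2 took part in the match
		})
	}
	return tokens
}

// tokenAt returns the token covering the given byte offset, if any.
func tokenAt(tokens []Token, cursor int) (Token, bool) {
	for _, t := range tokens {
		if cursor >= t.Start && cursor < t.Start+len(t.Value) {
			return t, true
		}
	}
	return Token{}, false
}

func main() {
	tokens := splitTokens("this is some text")
	if tok, ok := tokenAt(tokens, 9); ok {
		fmt.Println(tok.Value) // prints "some"
	}
}
```

Keeping the delimiter tokens in the list means every byte of the input belongs to exactly one token, so a cursor offset can always be resolved without extra bookkeeping.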

The word tokens in the above example could be produced by a simple strings.Split(str, " ") (which discards the delimiters), but what if we wanted to parse this is "some text" as ["this", " ", "is", " ", "some text"]? This is difficult to do with simple string manipulation. Instead, we can use regular expressions to define our tokens. A regex for a token that either contains no whitespace, or contains whitespace only when enclosed in quotes, could look like this: ("[^"]*"?)|[^\s]+. A regex for whitespace delimiters would be \s+.
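The two patterns above can be tried out with Go's standard regexp package. This is a minimal sketch that joins them into a single alternation; the tokenize function is a hypothetical helper and does not reflect how Bubbleprompt or Participle apply these rules internally. Unlike the example in the text, the quoted token here keeps its surrounding quotes.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenPattern joins the two rules from the text: a word token (quoted,
// or free of whitespace) followed by the whitespace delimiter rule.
var tokenPattern = regexp.MustCompile(`("[^"]*"?)|[^\s]+|\s+`)

// tokenize returns every word and delimiter token in input, in order.
func tokenize(input string) []string {
	return tokenPattern.FindAllString(input, -1)
}

func main() {
	for _, tok := range tokenize(`this is "some text"`) {
		fmt.Printf("%q\n", tok)
	}
}
```

Because the closing quote in the pattern is optional, an unterminated quote such as say "hel is still matched as a single token, which is convenient while the user is still typing.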

Under the hood, Bubbleprompt uses Participle to handle input tokenization. Defining these rules yourself can be complicated, which is why Bubbleprompt offers the Simple input for common use cases such as the whitespace-delimited tokens described above.

If you need more control over your tokenization, you can use the Lexer input to define custom tokenization strategies.

Grammars and Parsing

Sometimes a flat list of tokens isn't enough because your app requires more structured input. Take a simple CLI command, for example. It contains a command, zero or more subcommands, zero or more arguments, and any number of short flags and long flags. Each of these components has its own meaning, which would be difficult to recover from a list of tokens alone. For these use cases, you can define a custom grammar that represents your input structure.

For handling POSIX-style CLI input as described above, we provide the Command input. To implement custom grammars, you can use the Parser input.
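To make the idea of structured input more concrete, here is a hand-written sketch that sorts the parts of a CLI command into a struct using only the standard library. The ParsedCommand type and parseCommand function are hypothetical and are not the Command or Parser inputs; the sketch also makes simplifying assumptions (no quoting, no flag values, and no distinction between subcommands and arguments).

```go
package main

import (
	"fmt"
	"strings"
)

// ParsedCommand is a hypothetical structure for this sketch.
type ParsedCommand struct {
	Name       string
	Positional []string // subcommands and arguments, in input order
	ShortFlags []string // "-f" is stored as "f"
	LongFlags  []string // "--verbose" is stored as "verbose"
}

// parseCommand classifies each whitespace-separated field of input.
// Telling subcommands apart from arguments would require knowing the
// command tree, so both are kept together in Positional here.
func parseCommand(input string) ParsedCommand {
	var cmd ParsedCommand
	for i, field := range strings.Fields(input) {
		switch {
		case i == 0:
			cmd.Name = field
		case strings.HasPrefix(field, "--") && len(field) > 2:
			cmd.LongFlags = append(cmd.LongFlags, field[2:])
		case strings.HasPrefix(field, "-") && !strings.HasPrefix(field, "--") && len(field) > 1:
			cmd.ShortFlags = append(cmd.ShortFlags, field[1:])
		default:
			cmd.Positional = append(cmd.Positional, field)
		}
	}
	return cmd
}

func main() {
	cmd := parseCommand("app deploy web -f --verbose")
	fmt.Printf("%+v\n", cmd)
}
```

Even this small example has to encode rules about position and prefixes, which is the kind of logic a grammar lets you declare rather than write by hand.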