tokenizer


This class provides functions to break a string into tokens. A token is a portion of a string identified by its relationship to separators. Once constructed, a tokenizer is a factory for tokens, taking an input string and returning a vector of tokens.

The os_tokenizer class is conceptually similar to the C strtok() function.
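
For comparison, here is a minimal sketch of the strtok() approach that the class is likened to. It uses only the standard library; split_with_strtok is a hypothetical helper name and is not part of this toolkit.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on any character in `separators`
// using std::strtok. std::strtok writes null bytes into the buffer it
// scans, so the function works on a private, null-terminated copy.
std::vector<std::string> split_with_strtok(const std::string& input,
                                           const char* separators)
{
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* token = std::strtok(&buffer[0], separators);
         token != nullptr;
         token = std::strtok(nullptr, separators))
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

std::strtok skips runs of separators, so it never produces empty tokens, and it keeps its scan position in hidden static state. A tokenizer object instead carries its settings with it and returns all tokens from a single call.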

Library

Helper<ToolKit>

Declaration

#include <ospace/helper/tokenize.h>

class os_tokenizer

Interface

Constructor
os_tokenizer()
Constructs a tokenizer with defaults.
Constructor
os_tokenizer( const string& separators , bool allow_empty_tokens , const string& ignore , const string& terminators , const string& terminals , bool include_separators )
Constructs a tokenizer that splits on the characters in separators (default " \t"), ignores all characters in ignore (default ""), stops at a null or at any character in terminators (default "\n"), and treats each character in terminals (default "") as an individual token. If allow_empty_tokens (default false) is true, two consecutive separators delimit an empty token. If include_separators (default false) is true, separators are included in the tokens.
Constructor
os_tokenizer( const os_tokenizer& tokenizer )
Constructs a copy of tokenizer.
=
os_tokenizer& operator=( const os_tokenizer& tokenizer )
Assigns the contents of tokenizer to this tokenizer.
allow_empty_tokens
void allow_empty_tokens( bool flag )
If flag (default true) is true, empty tokens are valid.
allow_empty_tokens
bool allow_empty_tokens() const
Returns true if the tokenizer allows empty tokens.
ignore
const string& ignore() const
Returns the ignore characters.
ignore
void ignore( const string& str )
Sets the ignore characters to str.
include_separators
bool include_separators() const
Returns true if the tokenizer includes separators.
include_separators
void include_separators( bool flag )
If flag (default true) is true, includes separators in the tokens.
separators
const string& separators() const
Returns the separators.
separators
void separators( const string& str )
Sets the separator characters to str.
terminals
const string& terminals() const
Returns the terminal characters.
terminals
void terminals( const string& str )
Sets the terminal characters to str.
terminators
const string& terminators() const
Returns the terminator characters.
terminators
void terminators( const string& str )
Sets the terminator characters to str.
tokenize
vector< string > tokenize( const string& str )
Returns a vector of the tokens found by parsing str with the current tokenizer settings.
tokenize
void tokenize( const string& str , vector< string >& tokens )
Parses str with the current tokenizer settings and inserts the resulting tokens into the vector tokens.
tokenize
void tokenize( const string& str , deque< string >& tokens )
Parses str with the current tokenizer settings and inserts the resulting tokens into the deque tokens.
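
The way the separators, ignore, terminators, terminals, and allow_empty_tokens settings interact may be easier to follow in code. The following is a minimal standalone sketch of one plausible reading of the descriptions in this section, not the os_tokenizer implementation. The names sketch_options and sketch_tokenize are hypothetical, and the handling of edge cases (leading separators, ignored characters between separators) is an assumption. include_separators is left out because this page does not say how separators are attached to tokens.

```cpp
#include <string>
#include <vector>

// Hypothetical settings bundle mirroring the documented defaults.
struct sketch_options
{
    std::string separators = " \t";
    std::string ignore = "";
    std::string terminators = "\n";
    std::string terminals = "";
    bool allow_empty_tokens = false;
};

// True if character c occurs in the character set `set`.
static bool in_set(const std::string& set, char c)
{
    return set.find(c) != std::string::npos;
}

// Hypothetical stand-in for tokenize(): one pass over the input,
// classifying each character against the configured sets.
std::vector<std::string> sketch_tokenize(const std::string& input,
                                         const sketch_options& options)
{
    std::vector<std::string> tokens;
    std::string current;
    bool previous_was_separator = false;

    for (char c : input)
    {
        // Stop at a null or at any terminator character.
        if (c == '\0' || in_set(options.terminators, c))
            break;

        // Ignored characters are dropped without affecting token state.
        if (in_set(options.ignore, c))
            continue;

        if (in_set(options.separators, c))
        {
            if (!current.empty())
            {
                tokens.push_back(current);
                current.clear();
            }
            else if (previous_was_separator && options.allow_empty_tokens)
            {
                // Two consecutive separators delimit an empty token.
                tokens.push_back(std::string());
            }
            previous_was_separator = true;
            continue;
        }

        if (in_set(options.terminals, c))
        {
            // A terminal ends any pending token and is a token itself.
            if (!current.empty())
            {
                tokens.push_back(current);
                current.clear();
            }
            tokens.push_back(std::string(1, c));
            previous_was_separator = false;
            continue;
        }

        current += c;
        previous_was_separator = false;
    }

    // Flush the token that was pending when the input or line ended.
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

Under these assumptions, parsing "a,,b;c\nrest" with separators ",", terminals ";", and allow_empty_tokens set to true yields five tokens: "a", an empty token, "b", ";", and "c". Parsing stops at the newline, so "rest" is never examined.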

Non-Member Functions

<<
ostream& operator<<( ostream& stream , const os_tokenizer& tokenizer )
Prints tokenizer to stream.

Universal Streaming Service

#include <ospace/uss/helper.h>

<<
os_bstream& operator<<( os_bstream& stream , const os_tokenizer& tokenizer )
Writes tokenizer to stream.
>>
os_bstream& operator>>( os_bstream& stream , os_tokenizer& tokenizer )
Reads tokenizer from stream.

Copyright©1994-2026 Recursion Software LLC
All Rights Reserved - For use by licensed users only.