tokenizer
This class provides functions to break a string into tokens. A token is a portion of a string identified by its relationship to separators. Once constructed, a tokenizer is a factory for tokens, taking an input string and returning a vector of tokens. The os_tokenizer class is conceptually similar to the C strtok() function.
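For readers unfamiliar with strtok(), the following sketch shows the standard C function that the comparison refers to. It uses only the standard library, not os_tokenizer; the helper name split_with_strtok is hypothetical and chosen for illustration. Note that strtok() modifies its buffer and keeps hidden state between calls, which is why the sketch works on a private copy of the input.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): splits `input` on any
// character in `separators` using the standard std::strtok function.
std::vector<std::string> split_with_strtok(const std::string& input,
                                           const char* separators) {
    // std::strtok writes null characters into its buffer, so copy the input.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    // The first call passes the buffer; later calls pass nullptr to continue.
    for (char* tok = std::strtok(buffer.data(), separators);
         tok != nullptr;
         tok = std::strtok(nullptr, separators)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling split_with_strtok("alpha  beta\tgamma", " \t") yields the three tokens alpha, beta, and gamma. Runs of consecutive separators are skipped, so strtok() never produces an empty token.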
Declaration
#include <ospace/helper/tokenize.h>
class os_tokenizer
Interface
Constructor
os_tokenizer()
Constructs a tokenizer with defaults.
Constructor
os_tokenizer(
const string& separators ,
bool allow_empty_tokens ,
const string& ignore ,
const string& terminators ,
const string& terminals ,
bool include_separators )
Constructs a tokenizer that splits on any character in separators (default " \t"), ignores all characters in ignore (default ""), stops at a null character or any character in terminators (default "\n"), and treats each character in terminals (default "") as an individual token. If allow_empty_tokens (default false) is true, two consecutive separators are treated as enclosing an empty token. If include_separators (default false) is true, separators are included in the tokens.
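The interaction of these arguments may be easier to follow in code. The sketch below is a standalone approximation written against the description above, not the library's implementation: the names sketch_options and sketch_tokenize are hypothetical, the handling of edge cases (such as leading or trailing separators) is an assumption, and include_separators is left out because its exact effect on the output is not specified here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the constructor arguments; not a library type.
// Defaults mirror the documented defaults.
struct sketch_options {
    std::string separators = " \t";
    bool allow_empty_tokens = false;
    std::string ignore = "";
    std::string terminators = "\n";
    std::string terminals = "";
};

// Approximates the documented behavior using only the standard library.
std::vector<std::string> sketch_tokenize(
        const std::string& input,
        const sketch_options& opt = sketch_options()) {
    auto contains = [](const std::string& set, char c) {
        return set.find(c) != std::string::npos;
    };

    std::vector<std::string> tokens;
    std::string current;
    bool prev_was_separator = false;

    for (char c : input) {
        // Stop at a null character or any terminator.
        if (c == '\0' || contains(opt.terminators, c)) break;
        // Ignored characters are dropped without affecting token state.
        if (contains(opt.ignore, c)) continue;

        if (contains(opt.separators, c)) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            } else if (prev_was_separator && opt.allow_empty_tokens) {
                // Assumption: an empty token is emitted only between two
                // consecutive separators.
                tokens.push_back(std::string());
            }
            prev_was_separator = true;
            continue;
        }
        prev_was_separator = false;

        if (contains(opt.terminals, c)) {
            // A terminal ends any pending token and stands as its own token.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            tokens.push_back(std::string(1, c));
            continue;
        }
        current += c;
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

Under these assumptions, sketch_tokenize("one two\tthree\nfour") returns one, two, and three, because parsing stops at the newline. With separators set to "," and allow_empty_tokens set to true, the input "a,,b" produces a, an empty token, and b.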
Constructor
os_tokenizer(
const os_tokenizer& tokenizer )
Constructs a copy of tokenizer.
=
os_tokenizer&
operator=( const os_tokenizer& tokenizer )
Assigns the state of tokenizer to this tokenizer.
allow_empty_tokens
void
allow_empty_tokens( bool flag )
If flag is true (default), empty tokens are valid.
allow_empty_tokens
bool
allow_empty_tokens() const
Returns true if the tokenizer allows empty tokens.
ignore
const string& ignore() const
Returns the ignore characters.
ignore
void
ignore( const string& str )
Sets the ignore characters to str.
include_separators
bool
include_separators() const
Returns true if the tokenizer includes separators.
include_separators
void
include_separators( bool flag )
If flag (default true) is true, includes separators in the tokens.
separators
const string& separators() const
Returns the separator characters.
separators
void
separators( const string& str )
Sets the separator characters to str.
terminals
const string& terminals() const
Returns the terminal characters.
terminals
void
terminals( const string& str )
Sets the terminal characters to str.
terminators
const string& terminators() const
Returns the terminator characters.
terminators
void
terminators( const string& str )
Sets the terminator characters to str.
tokenize
vector< string > tokenize( const string& str )
Returns a vector of the tokens found by parsing str with the current tokenizer settings.
tokenize
void tokenize(
const string& str ,
vector< string >& tokens )
Inserts the tokens found by parsing str with the current tokenizer settings into the vector argument.
tokenize
void tokenize(
const string& str ,
deque< string >& tokens )
Inserts the tokens found by parsing str with the current tokenizer settings into the deque argument.
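The two container-filling overloads follow the same pattern for different container types. The sketch below illustrates that pattern with a single template; the name append_tokens is hypothetical, and the whitespace splitting via std::istringstream is a simple placeholder for the tokenizer's own parsing. It also assumes that "inserts" means tokens are appended and existing elements are preserved, which this reference does not state explicitly.

```cpp
#include <deque>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): appends the
// whitespace-separated words of `input` to any container that
// supports push_back, such as std::vector or std::deque.
template <typename Container>
void append_tokens(const std::string& input, Container& tokens) {
    std::istringstream stream(input);
    std::string word;
    // operator>> skips whitespace and extracts one word per iteration.
    while (stream >> word) {
        tokens.push_back(word);
    }
}
```

Passing the container by reference lets a caller accumulate tokens from several input strings into one vector or deque without copying a returned vector each time.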
Non-Member Functions
<<
ostream&
operator<<( ostream& stream ,
const os_tokenizer& tokenizer )
Prints tokenizer to stream.
Universal Streaming Service
#include <ospace/uss/helper.h>
<<
os_bstream&
operator<<( os_bstream& stream ,
const os_tokenizer& tokenizer )
Writes tokenizer to stream.
>>
os_bstream&
operator>>( os_bstream& stream ,
os_tokenizer& tokenizer )
Reads tokenizer from stream.
Copyright © 1994-2026 Recursion Software LLC
All Rights Reserved - For use by licensed users only.