rexdb.h File Reference

Public interface for creating regular expressions.

Classes

struct  rexdb_s

Typedefs

typedef struct rexdb_s rexdb_t

Enumerations

enum  rexdb_type_t { REXDB_TYPE_NFA = 0, REXDB_TYPE_DFA = 1 }

Functions

rexdb_t * rex_db_create (rexdb_type_t type)
rexdb_t * rex_db_createdfa (rexdb_t *nfa, unsigned long start)
void rex_db_destroy (rexdb_t *rexdb)
long rex_db_addexpression (rexdb_t *nfa, unsigned long prev, const char *str, unsigned int size, rexuserdata_t userdata)
long rex_db_addexpression_s (rexdb_t *nfa, unsigned long prev, const char *str, rexuserdata_t userdata)
rexdfa_t * rex_db_todfa (rexdb_t *db, int withsubstates)
 Convert rexdb_t of type REXDB_TYPE_DFA to rexdfa_t object.
long rex_db_setblanks (rexdb_t *nfa, const char *str, unsigned int size)
 Set which chars should be treated as blank chars.
long rex_db_setblanks_s (rexdb_t *nfa, const char *str)
 Set which chars should be treated as blank chars.

Detailed Description

Public interface for creating regular expressions.

Synopsis

The following APIs are used to create an automaton object based on regular expressions.


Typedef Documentation

typedef struct rexdb_s rexdb_t

Defines the rexdb_t type. This structure is used to create and manage the states of a finite automaton (an NFA or a DFA, depending on the type). If the automaton is a DFA, the sub-states member contains information about the NFA states that produced this DFA.


Enumeration Type Documentation

enum rexdb_type_t

Defines the type of a rexdb_t object. It is either REXDB_TYPE_NFA or REXDB_TYPE_DFA.

Enumerator:
REXDB_TYPE_NFA 

The automaton is an NFA; empty transitions are allowed.

REXDB_TYPE_DFA

The automaton is a DFA; there are no empty transitions.
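
To make the role of empty transitions concrete, the following is a minimal sketch of how the set of states reachable through empty transitions (the epsilon closure) can be computed. The type toy_eps_nfa_t and the function toy_eps_closure are hypothetical names used only for illustration; they are not part of rexdb.h and do not reflect the internal layout of rexdb_t.

```c
#include <stdint.h>

#define TOY_MAX_STATES 32   /* toy limit: one bit per state in a uint32_t */

/* Hypothetical toy NFA, NOT the rexdb_t layout.
 * eps[s] is a bitmask of the states reachable from state s
 * by a single empty transition. */
typedef struct {
    uint32_t eps[TOY_MAX_STATES];
    int nstates;
} toy_eps_nfa_t;

/* Return every state reachable from the states in `set`
 * by following zero or more empty transitions. */
uint32_t toy_eps_closure(const toy_eps_nfa_t *nfa, uint32_t set)
{
    uint32_t closure = set;   /* states known to be in the closure */
    uint32_t pending = set;   /* states whose empty transitions are unexplored */

    while (pending) {
        int s = 0;
        while (!(pending & (UINT32_C(1) << s)))
            s++;                               /* pick the lowest pending state */
        pending &= ~(UINT32_C(1) << s);

        uint32_t fresh = nfa->eps[s] & ~closure;   /* newly discovered states */
        closure |= fresh;
        pending |= fresh;
    }
    return closure;
}
```

In a DFA every state has at most one successor per input symbol and no empty transitions, so no such closure step is needed while matching.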


Function Documentation

long rex_db_addexpression ( rexdb_t *  nfa,
unsigned long  prev,
const char *  str,
unsigned int  size,
rexuserdata_t  userdata 
)

This function is used to add a new regular expression to the NFA. All expressions added with this function form a union.

Parameters:
nfa  NFA object.
prev  The previous start state of the automaton, returned from a previous call to this function. If this is the first call to this function, prev is ignored.
str  UTF-8 encoded regular expression string.
size  The size of the regular expression string.
userdata  The value of this parameter is stored in the accepting state of the NFA (which also becomes a sub-state of an accepting DFA state). You can use this value to identify which of the regular expressions compiled into the automaton actually matched. A DFA state can have multiple sub-states, so it can have multiple accepting sub-states (multiple regular expressions matched). Examine the userdata values of these sub-states to find out which regular expressions they correspond to.
Returns:
New starting state for the automaton.
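
The userdata mechanism can be illustrated with a small sketch. Suppose two expressions were added, a keyword tagged 10 and an identifier pattern tagged 20, and both accept the same input; the accepting DFA state then carries two accepting sub-states, one per expression. The types toy_userdata_t and toy_substate_t and the function toy_matched_tags are hypothetical stand-ins introduced for this example; the real sub-state layout and the definition of rexuserdata_t are not shown in this file reference.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT the rexdb.h definitions. */
typedef unsigned long toy_userdata_t;

typedef struct {
    int accepting;             /* non-zero if this NFA sub-state is accepting */
    toy_userdata_t userdata;   /* tag supplied when the expression was added */
} toy_substate_t;

/* Copy the userdata of every accepting sub-state into out[] (at most cap
 * entries are written) and return the total number of accepting sub-states. */
size_t toy_matched_tags(const toy_substate_t *subs, size_t nsubs,
                        toy_userdata_t *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < nsubs; i++) {
        if (!subs[i].accepting)
            continue;
        if (n < cap)
            out[n] = subs[i].userdata;
        n++;
    }
    return n;
}
```

A caller that wants a single winner (for example, keywords before identifiers) can then choose among the returned tags by its own priority rule.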
long rex_db_addexpression_s ( rexdb_t *  nfa,
unsigned long  prev,
const char *  str,
rexuserdata_t  userdata 
)

This function is the same as rex_db_addexpression, but it assumes that the str parameter is a 0-terminated string.

Examples:
js-tokenizer.c.
rexdb_t* rex_db_create ( rexdb_type_t  type)

Create a new empty object of type rexdb_t.

Parameters:
type  This is REXDB_TYPE_NFA or REXDB_TYPE_DFA.
Returns:
Empty automaton object. You should never need to create an object of type REXDB_TYPE_DFA directly with this function; use rex_db_createdfa instead.
Examples:
js-tokenizer.c.
rexdb_t* rex_db_createdfa ( rexdb_t *  nfa,
unsigned long  start 
)

Create a new DFA object of type rexdb_t, constructed from the states of the NFA passed as a parameter.

Parameters:
nfa  Automaton object of type REXDB_TYPE_NFA used to construct the DFA.
start  Start state of the NFA.
Returns:
DFA object.
Examples:
js-tokenizer.c.
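
For readers unfamiliar with NFA-to-DFA conversion, here is a minimal sketch of the textbook subset construction over a two-symbol alphabet. This reference does not state which algorithm rex_db_createdfa uses internally, so the code is only an illustration of the general technique: each DFA state records the set of NFA states (its sub-states) that produced it. All names prefixed with toy_ or TOY_ are hypothetical and are not part of rexdb.h.

```c
#include <stdint.h>

#define TOY_MAX_NFA 16   /* toy limits, not library limits */
#define TOY_MAX_DFA 64
#define TOY_NSYM    2    /* toy alphabet: symbols 0 and 1 */

typedef struct {
    uint32_t eps[TOY_MAX_NFA];               /* empty transitions (bitmasks) */
    uint32_t delta[TOY_MAX_NFA][TOY_NSYM];   /* symbol transitions (bitmasks) */
    uint32_t accept;                         /* bitmask of accepting NFA states */
} toy_nfa_t;

typedef struct {
    uint32_t subset[TOY_MAX_DFA];      /* NFA sub-states of each DFA state */
    int next[TOY_MAX_DFA][TOY_NSYM];   /* target DFA state, -1 if none */
    int nstates;
} toy_dfa_t;

/* Extend `set` with everything reachable through empty transitions. */
static uint32_t toy_closure(const toy_nfa_t *nfa, uint32_t set)
{
    uint32_t prev;
    do {
        prev = set;
        for (int s = 0; s < TOY_MAX_NFA; s++)
            if (set & (UINT32_C(1) << s))
                set |= nfa->eps[s];
    } while (set != prev);
    return set;
}

/* Return the index of the DFA state for `subset`, creating it if needed.
 * Returns -1 when the toy DFA is full. */
static int toy_find_or_add(toy_dfa_t *dfa, uint32_t subset)
{
    for (int i = 0; i < dfa->nstates; i++)
        if (dfa->subset[i] == subset)
            return i;
    if (dfa->nstates == TOY_MAX_DFA)
        return -1;
    dfa->subset[dfa->nstates] = subset;
    for (int c = 0; c < TOY_NSYM; c++)
        dfa->next[dfa->nstates][c] = -1;
    return dfa->nstates++;
}

/* Build `dfa` from `nfa`, starting at NFA state `start`
 * (which must be below TOY_MAX_NFA). Returns 0 on success, -1 on overflow. */
int toy_subset_construct(const toy_nfa_t *nfa, int start, toy_dfa_t *dfa)
{
    dfa->nstates = 0;
    toy_find_or_add(dfa, toy_closure(nfa, UINT32_C(1) << start));

    /* DFA states are processed in creation order; new ones are appended. */
    for (int i = 0; i < dfa->nstates; i++) {
        for (int c = 0; c < TOY_NSYM; c++) {
            uint32_t move = 0;
            for (int s = 0; s < TOY_MAX_NFA; s++)
                if (dfa->subset[i] & (UINT32_C(1) << s))
                    move |= nfa->delta[s][c];
            if (!move)
                continue;   /* no transition on this symbol */
            int t = toy_find_or_add(dfa, toy_closure(nfa, move));
            if (t < 0)
                return -1;
            dfa->next[i][c] = t;
        }
    }
    return 0;
}

/* A DFA state accepts if any of its NFA sub-states accepts. */
int toy_is_accepting(const toy_nfa_t *nfa, const toy_dfa_t *dfa, int i)
{
    return (dfa->subset[i] & nfa->accept) != 0;
}
```

The subset array in this sketch plays the role that the sub-states play in a rexdb_t of type REXDB_TYPE_DFA: it ties each DFA state back to the NFA states it was built from.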
void rex_db_destroy ( rexdb_t *  rexdb)

This function is used to destroy rexdb_t objects, created with rex_db_create or rex_db_createdfa.

Examples:
js-tokenizer.c.
long rex_db_setblanks ( rexdb_t *  nfa,
const char *  str,
unsigned int  size 
)

Set which chars should be treated as blank chars.

This function can be used only if the rexdb_t is of type REXDB_TYPE_NFA.

Parameters:
nfa  NFA object.
str  ASCII string of chars to be treated as blanks.
size  The number of chars in the string.
Returns:
Returns 0 on success and a negative value on error.
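
The relationship between the sized variant and the 0-terminated variant, together with the return convention (0 on success, negative on error), can be sketched as follows. How the library stores and applies the blank set is not described in this reference; the lookup table, the type toy_blanks_t, and both toy_ functions below are hypothetical and only mirror the shape of the two setblanks signatures.

```c
#include <string.h>

/* Hypothetical helper type, NOT part of rexdb.h. */
typedef struct {
    unsigned char is_blank[128];   /* one flag per ASCII code */
} toy_blanks_t;

/* Sized variant: mark the first `size` chars of str as blanks.
 * Returns 0 on success and -1 if str contains a non-ASCII byte,
 * in which case the previous blank set is left unchanged. */
long toy_setblanks(toy_blanks_t *b, const char *str, unsigned int size)
{
    unsigned char table[128];

    memset(table, 0, sizeof table);
    for (unsigned int i = 0; i < size; i++) {
        unsigned char c = (unsigned char)str[i];
        if (c > 127)
            return -1;
        table[c] = 1;
    }
    memcpy(b->is_blank, table, sizeof table);   /* commit only on success */
    return 0;
}

/* 0-terminated variant: measure the string, then defer to the sized one. */
long toy_setblanks_s(toy_blanks_t *b, const char *str)
{
    return toy_setblanks(b, str, (unsigned int)strlen(str));
}
```

Building the table in a temporary buffer and copying it only on success is a design choice of this sketch, so that a failed call leaves the earlier setting intact.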
long rex_db_setblanks_s ( rexdb_t *  nfa,
const char *  str 
)

Set which chars should be treated as blank chars.

This function can be used only if the rexdb_t is of type REXDB_TYPE_NFA.

Parameters:
nfa  NFA object.
str  0-terminated ASCII string of chars to be treated as blanks.
Returns:
Returns 0 on success and a negative value on error.
rexdfa_t* rex_db_todfa ( rexdb_t *  db,
int  withsubstates 
)

Convert rexdb_t of type REXDB_TYPE_DFA to rexdfa_t object.

This function generates a rexdfa_t object from a rexdb_t. The rexdfa_t has a smaller memory footprint and is easier to use. You also have the option to omit the sub-states, which tend to take a lot of memory and are rarely needed once the DFA is constructed.

Parameters:
db  Pointer to the rexdb_t DFA to be converted.
withsubstates  Supported values are:
  • 0 No sub-state information will be written.
  • 1 Sub-state information from the underlying NFA will be written.
Returns:
Pointer to a rexdfa_t object, or NULL on error. If the function succeeds, the returned object must be destroyed with rex_dfa_destroy when it is no longer needed.
Examples:
js-tokenizer.c.