
Modifiers (methods)

MatchRequirement

  • MustBeFound - the regular expression must produce a match
  • MustNotBeFound - the regular expression must not produce a match

| MatchRequirement | Match found | Has subrules | Result |
|------------------|-------------|--------------|--------|
| MustBeFound | Yes | Yes | Continue processing subrules |
| MustBeFound | Yes | No | Finish processing |
| MustBeFound | No | Yes | Error (match must have been found) |
| MustBeFound | No | No | Error (match must have been found) |
| MustNotBeFound | Yes | Yes | Continue processing subrules |
| MustNotBeFound | Yes | No | Error (match should not have been found) |
| MustNotBeFound | No | Yes | Finish processing |
| MustNotBeFound | No | No | Finish processing |

For example, suppose we have the regular expression r"\d+" and we require a match from it. We create a rule with the MustBeFound modifier. If a match is found, processing continues with the rule's subrules (or finishes if there are none). If no match is found, we get an error.

#=======================================
text = "txt txt txt 910 301 44 text"
#=======================================

CustomError
|
root rule : r"\d+" with modifier MustBeFound
   |     
   |___ if true,true -> new captures from root: 910, 301, 44,
         |__ subrule from root rule : "\d{2}" with modifier MustBeFound
               |__ if true,true -> new captures from subrule: 44,
                  |__  ... 
                     |__   ...
                        |__   ...
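
The decision table above can be sketched as a small function. This is only an illustration of the table, not the library's implementation: the `Requirement` enum and `next_step` function are hypothetical names, and Python's `re` module stands in for the library's matching.

```python
import re
from enum import Enum


class Requirement(Enum):
    """Hypothetical stand-in for the two MatchRequirement values."""
    MUST_BE_FOUND = "MustBeFound"
    MUST_NOT_BE_FOUND = "MustNotBeFound"


def next_step(requirement, match_found, has_subrules):
    """Return the action from the decision table for a single rule."""
    if requirement is Requirement.MUST_BE_FOUND:
        if not match_found:
            return "error: match must have been found"
        return "continue processing subrules" if has_subrules else "finish processing"
    # MustNotBeFound: finding nothing is the good case.
    if not match_found:
        return "finish processing"
    if has_subrules:
        return "continue processing subrules"
    return "error: match should not have been found"


text = "txt txt txt 910 301 44 text"
found = re.search(r"\d+", text) is not None
print(next_step(Requirement.MUST_BE_FOUND, found, has_subrules=True))
```

With the sample text, the root pattern finds digits and the rule has a subrule, so the sketch reports that subrule processing continues.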

Different situations

As you may have noticed, there is a difference between these two options:

| MatchRequirement | Match found | Has subrules | Result |
|------------------|-------------|--------------|--------|
| MustNotBeFound | Yes | Yes | Continue processing subrules |
| MustNotBeFound | Yes | No | Error (match should not have been found) |

This is intentional: if something should not be found but is found anyway, you can attach subrules with their own modifiers for additional checks. If nothing is found, the subrules are simply skipped.

Extend Rule

This modifier extends a rule with subrules. It is very important because it lets you build a tree of rules, and each subrule can carry its own subrules in turn, to any depth.

Rust:
Rule::new(r"1 - Root rule", MatchRequirement::MustBeFound)
    .extend(vec![
        Rule::new(r"1 - Subrule", MatchRequirement::MustBeFound),
        Rule::new(r"2 - Subrule", MatchRequirement::MustBeFound).extend(vec![
            Rule::new(r"1 - Subrule of subrule", MatchRequirement::MustBeFound),
            Rule::new(r"2 - Subrule of subrule", MatchRequirement::MustBeFound),
        ]),
    ]);

JavaScript/TypeScript:
new Rule(String.raw`1 - Root rule`, MatchRequirement.MustBeFound)
    .extend([
        new Rule(String.raw`1 - Subrule`, MatchRequirement.MustBeFound),
        new Rule(String.raw`2 - Subrule`, MatchRequirement.MustBeFound)
            .extend([
                new Rule(String.raw`1 - Subrule of subrule`, MatchRequirement.MustBeFound),
                new Rule(String.raw`2 - Subrule of subrule`, MatchRequirement.MustBeFound),
            ]),
    ]);

Python:
Rule(r"1 - Root rule", MatchRequirement.MustBeFound).extend([
    Rule(r"1 - Subrule", MatchRequirement.MustBeFound),
    Rule(r"2 - Subrule", MatchRequirement.MustBeFound).extend([
        Rule(r"1 - Subrule of subrule", MatchRequirement.MustBeFound),
        Rule(r"2 - Subrule of subrule", MatchRequirement.MustBeFound),
    ]),
])
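
To make the resulting tree shape concrete, here is a minimal standalone sketch. `SketchRule` and `outline` are hypothetical helpers written for this illustration; they only mimic the chaining style of `extend` and are not the library's Rule class.

```python
from dataclasses import dataclass, field


@dataclass
class SketchRule:
    """Hypothetical stand-in for a rule: a pattern plus nested subrules."""
    pattern: str
    subrules: list = field(default_factory=list)

    def extend(self, subrules):
        # Attach the subrules and return self so calls can be chained.
        self.subrules.extend(subrules)
        return self


def outline(rule, depth=0):
    """Flatten the tree into indented lines, depth first."""
    lines = ["  " * depth + rule.pattern]
    for sub in rule.subrules:
        lines.extend(outline(sub, depth + 1))
    return lines


tree = SketchRule("1 - Root rule").extend([
    SketchRule("1 - Subrule"),
    SketchRule("2 - Subrule").extend([
        SketchRule("1 - Subrule of subrule"),
        SketchRule("2 - Subrule of subrule"),
    ]),
])
print("\n".join(outline(tree)))
```

Printing the outline shows the root at depth 0, its two subrules at depth 1, and the two nested subrules at depth 2.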

Matching mode

So far we have looked at modifiers that affect a single Rule. Now we turn to modifiers that affect all subrules within one root rule:

  • all_rules_for_all_matches (default mode)
  • all_rules_for_at_least_one_match (all_r_for_any_m)
  • at_least_one_rule_for_all_matches (any_r_for_all_m)
  • at_least_one_rule_for_at_least_one_match (any_r_for_any_m)

all_rules_for_all_matches

In this mode, every rule must pass the test for every match.

Operation scheme of the mode

#=======================================
text = "txt [123] txt [456] txt [789]"
#=======================================
CustomError
|
|__ Rule "\[[^\[\]]+\]" (MustBeFound)
     |   [123], [456], [789]
     |___ Subrule ".+" (MustBeFound) ---> [123] -> [456] -> [789] -- TRUE
     |                                      |       |        |
     |___ Subrule "\[\d+\]" (MustBeFound) __|_______|________|

all_rules_for_at_least_one_match (all_r_for_any_m)

In this mode, all rules must pass the test for at least one match.

#=======================================
text = "txt [123] txt [456] txt [789]"
#=======================================

CustomError
|
|__ Rule "\[[^\[\]]+\]" (MustBeFound)
    |   [123], [456], [789]
    |___ Subrule ".+" (MustBeFound) ---> [123] -- TRUE 
    |                                      |
    |___ Subrule "\[\d+\]" (MustBeFound) __|
    |___ Subrule "[a-z]+" (MustBeFound) ---> No Match -- ERROR

at_least_one_rule_for_all_matches (any_r_for_all_m)

In this mode, at least one rule must pass the test for all matches.

#=======================================
text = "txt [123] txt [456] txt [789]"
#=======================================

CustomError
|
|__ Rule "\[[^\[\]]+\]" (MustBeFound)
    |   [123], [456], [789]
    |___ Subrule ".+" (MustBeFound) ---> [123] -- TRUE -- [456] -- TRUE -- [789] -- TRUE
    |                                      |               |                 |
    |___ Subrule "\[\d+\]" (MustBeFound) __|_______________|_________________|
    |___ Subrule "[a-z]+" (MustBeFound) ---> No Match -- TRUE (since other rules matched)

at_least_one_rule_for_at_least_one_match (any_r_for_any_m)

In this mode, at least one rule must pass the test for at least one match.

#=======================================
text = "txt [123] txt [456] txt [789]"
#=======================================

CustomError
|
|__ Rule "\[[^\[\]]+\]" (MustBeFound)
    |   [123], [456], [789]
    |___ Subrule ".+" (MustBeFound) ---> [123] -- TRUE 
    |                                      |
    |___ Subrule "\[\d+\]" (MustBeFound) __|
    |___ Subrule "[a-z]+" (MustBeFound) ---> No Match -- TRUE (since other rules matched for at least one match)
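
The four modes differ only in how "all" and "at least one" are combined over rules and matches. The sketch below expresses that with Python's `all` and `any`. It is a simplified model, not the library's code: the functions are plain helpers named after the modes, a subrule is assumed to "pass" when `re.search` finds its pattern inside a match, and `at_least_one_rule_for_all_matches` is assumed to mean that a single rule passes for every match.

```python
import re


def passes(pattern, match):
    """Assumption: a subrule passes when its pattern is found in the match."""
    return re.search(pattern, match) is not None


def all_rules_for_all_matches(patterns, matches):
    return all(passes(p, m) for p in patterns for m in matches)


def all_rules_for_at_least_one_match(patterns, matches):
    # Some single match must satisfy every rule.
    return any(all(passes(p, m) for p in patterns) for m in matches)


def at_least_one_rule_for_all_matches(patterns, matches):
    # Some single rule must satisfy every match (assumed interpretation).
    return any(all(passes(p, m) for m in matches) for p in patterns)


def at_least_one_rule_for_at_least_one_match(patterns, matches):
    return any(passes(p, m) for p in patterns for m in matches)


text = "txt [123] txt [456] txt [789]"
matches = re.findall(r"\[[^\[\]]+\]", text)  # the root rule's captures
two_rules = [r".+", r"\[\d+\]"]
three_rules = two_rules + [r"[a-z]+"]  # the last pattern never matches here

print(all_rules_for_all_matches(two_rules, matches))
print(all_rules_for_at_least_one_match(three_rules, matches))
print(at_least_one_rule_for_all_matches(three_rules, matches))
print(at_least_one_rule_for_at_least_one_match(three_rules, matches))
```

Under these assumptions the results agree with the diagrams: the two-rule set passes in the default mode, adding "[a-z]+" makes all_rules_for_at_least_one_match fail, and both at_least_one modes still succeed.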

Matching counter

Before moving on to the next modifiers, let's look at how match capture works when a rule is triggered. Take the pattern \d+ and the text 123 123 123 54 6 7 8. The captured matches are only 123, 54, 6, 7, 8.

What happened? The library stores matches in an IndexSet, so the rules are not re-checked for each of the three identical 123 matches, but the number of identical matches is still recorded. In other words, only unique values are kept, together with a count of how many times each one occurs.
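
The idea of unique matches plus a repeat count can be illustrated with Python's `collections.Counter`, which keeps keys in first-seen order. This is only an analogy for the behaviour described above, not the library's actual data structure.

```python
import re
from collections import Counter

text = "123 123 123 54 6 7 8"

# Count every match; identical matches collapse into one key.
counts = Counter(re.findall(r"\d+", text))

# Keys keep first-seen order, so this is the list of unique matches.
unique_in_order = list(counts)

print(unique_in_order)  # ['123', '54', '6', '7', '8']
print(counts["123"])    # 3
```

Rules would then be checked once per unique value, while the count remains available for the counter modifiers.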

counter_is_equal X

Adds a match counter with the condition: there must be exactly X matches.

Rust:
Rule::new(r"\[\d+\]", MatchRequirement::MustBeFound).counter_is_equal(30)

JavaScript/TypeScript:
new Rule(String.raw`\[\d+\]`, MatchRequirement.MustBeFound).counter_is_equal(30)

Python:
Rule(r"\[\d+\]", MatchRequirement.MustBeFound).counter_is_equal(30)

counter_more_than X

Adds a match counter with the condition: the number of matches must be greater than or equal to X.

Rust:
Rule::new(r"\[\d+\]", MatchRequirement::MustBeFound).counter_more_than(30)

JavaScript/TypeScript:
new Rule(String.raw`\[\d+\]`, MatchRequirement.MustBeFound).counter_more_than(30)

Python:
Rule(r"\[\d+\]", MatchRequirement.MustBeFound).counter_more_than(30)

counter_less_than X

Adds a match counter with the condition: the number of matches must be less than or equal to X.

Rust:
Rule::new(r"\[\d+\]", MatchRequirement::MustBeFound).counter_less_than(30)

JavaScript/TypeScript:
new Rule(String.raw`\[\d+\]`, MatchRequirement.MustBeFound).counter_less_than(30)

Python:
Rule(r"\[\d+\]", MatchRequirement.MustBeFound).counter_less_than(30)
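
The three counter conditions can be summarised in a short standalone sketch. The functions below are plain helpers named after the modifiers, not the library's methods, and they assume the counter refers to the total number of matches of the pattern. Following the descriptions above, "more than" and "less than" are treated as inclusive.

```python
import re


def count_matches(pattern, text):
    """Assumption: the counter is the total number of matches."""
    return len(re.findall(pattern, text))


def counter_is_equal(count, x):
    return count == x


def counter_more_than(count, x):
    return count >= x  # inclusive, as described above


def counter_less_than(count, x):
    return count <= x  # inclusive, as described above


count = count_matches(r"\[\d+\]", "txt [123] txt [456] txt [789]")
print(count)                        # 3
print(counter_is_equal(count, 3))   # True
print(counter_more_than(count, 3))  # True
print(counter_less_than(count, 2))  # False
```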