This post will clear up
your doubts about what greedy and non-greedy mean in the context of regex.
Input String: AV assd 201708 DC18 ROUTE PO205960-205961-206200-206129ASSS 90a852585 108524724A
Required Output: 201708, i.e. the first six consecutive digits in the input
string.
I am not going to give
the answer at once; rather, we will go step by step from here.
Keep an online regex tester
open so you can try each pattern while going through the steps below.
You may use any tool of your choice.
Step 1: What if we want to match every
single digit in the given
string?
Possible solutions: (\d) or ^.*(\d). Both look good at first glance;
let's see what each one actually
matches.
(\d):
- Produces one full match per digit in the
string (i.e. 2, 0, 1, 7, ..., 4)
- Captures that same digit as group 1 of each
match (i.e. 2, 0, 1, 7, ..., 4)
^.*(\d):
- This also ends on a digit, but with an important
difference:
- It produces only one full match, which starts at the
beginning of the string and runs up to the last digit in the
string
(i.e. AV
assd 201708 DC18 ROUTE PO205960-205961-206200-206129ASSS 90a852585 108524724)
- It captures only one group: the last digit in the string
(i.e. 4)
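If you prefer to check this in code instead of an online tool, here is a minimal sketch using Python's re module. Python is just my choice for illustration, and the variable names (text, digits, m) are made up for this example.

```python
import re

# The input string from the top of the post, written on one line.
text = ("AV assd 201708 DC18 ROUTE PO205960-205961-206200-"
        "206129ASSS 90a852585 108524724A")

# (\d): one match per digit, scanning from left to right.
digits = re.findall(r"(\d)", text)
print(digits[:4], "...", digits[-1])  # ['2', '0', '1', '7'] ... 4

# ^.*(\d): a single match anchored at the start of the string.
m = re.search(r"^.*(\d)", text)
print(m.group(0))  # everything up to and including the last digit
print(m.group(1))  # 4
```

Note that re.findall returns the captured group for every match, while re.search returns only one match object.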
This difference is because of “^.*”. Let’s see how it affects the match.
- ^ matches the start of the line, i.e. in our case it
points just before the first character "A"
- .* matches any character (except a line break), zero or more times.
In general, if we write the wildcard *txt, this means
everything that ends with txt.
Similarly, when we write
^.*(\d),
anything can be present before (\d), but the match should end with a digit (\d).
So you may ask: shouldn't it
end on the first digit, i.e. 2 in our case (AV assd 2)?
But no, it will go up to
the last digit in the string, i.e. 4.
This is because .* is
greedy: it first consumes as much of the string as it can, and then gives
back characters one at a time only until the rest of the pattern, (\d), can match.
So it does not stop at the first digit it could use; it ends on the last one.
And when you use it
in your code, it will give the last digit as the captured group, i.e. 4 in our
case.
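To watch the greedy behaviour on something small, here is a rough sketch in Python. The string "ab1cd2ef" is a made-up sample, not the input from this post, and the names sample, whole and m are only for illustration.

```python
import re

sample = "ab1cd2ef"  # hypothetical short input, not the post's string

# On its own, the greedy .* swallows the whole line.
whole = re.match(r"^.*", sample)
print(whole.group(0))  # ab1cd2ef

# With (\d) after it, .* gives characters back from the right
# until a digit can be matched, so the match ends on the last digit.
m = re.match(r"^.*(\d)", sample)
print(m.group(0))  # ab1cd2
print(m.group(1))  # 2
print(m.end())     # 6 (index just past the last digit)
```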
Step 2: Similarly, what if we
want to match 6 consecutive digits?
(\d{6}) or ^.*(\d{6})
look like possible solutions.
(\d{6}):
- Produces one full match for each non-overlapping group of six consecutive digits
found from left to right in the string (i.e. 201708, 205960, ..., 108524)
- Captures those same six digits as group 1 of each
match (i.e. 201708, 205960, ..., 108524)
^.*(\d{6}):
Again, ^.*(\d{6}) means: go to the start of the line, consume as much as
possible, and then capture 6 consecutive digits.
It will start from
"A" and could use the first 6 consecutive digits, i.e. "201708", but
because .* is greedy it will not stop there; it ends on the last 6 consecutive digits, i.e.
"524724", and gives that as the captured group.
- It produces only one full match, which starts at the
beginning of the string and runs up to the last six
consecutive digits in the string
(i.e. AV
assd 201708 DC18 ROUTE PO205960-205961-206200-206129ASSS 90a852585 108524724)
- It captures only one group: the last six consecutive digits
in the string (i.e. 524724)
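The same check for Step 2, again as a small Python sketch with the re module (the names text, six and m are assumptions made for this example):

```python
import re

text = ("AV assd 201708 DC18 ROUTE PO205960-205961-206200-"
        "206129ASSS 90a852585 108524724A")

# (\d{6}): every non-overlapping group of six digits, left to right.
six = re.findall(r"(\d{6})", text)
print(six)
# ['201708', '205960', '205961', '206200', '206129', '852585', '108524']

# ^.*(\d{6}): greedy .* pushes the captured group as far right as it can.
m = re.search(r"^.*(\d{6})", text)
print(m.group(1))  # 524724
```

Notice that 524724 does not appear in the first list at all: it overlaps with 108524, and only the greedy pattern lands on it.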
Step 3: Here comes our
requirement, i.e. match only the first six digits.
As I already mentioned, .* is greedy, so how do we
stop its greediness and make it lazy?
We saw in the previous steps
that our expression does not stop at the first possible match; rather,
.* runs as far as it can, and the last possible match is the output.
This is the meaning of greedy in regex.
To make it lazy, or say
non-greedy, ? comes
into play.
Placed right after .*, the ? makes the quantifier consume as few characters as possible, so the search stops at the first place where the rest of the pattern matches, even if more matches are available further on.
^.*?(\d{6}):
- It produces one full match, which starts at the
beginning of the string and runs up to the first six
consecutive digits in the string (i.e. AV assd 201708)
- It captures one group: the first six consecutive digits
in the string (i.e. 201708)
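And here is the final pattern tried out in Python, as a sketch under the same assumptions as before (re module, illustrative names lazy and first):

```python
import re

text = ("AV assd 201708 DC18 ROUTE PO205960-205961-206200-"
        "206129ASSS 90a852585 108524724A")

# ^.*?(\d{6}): lazy .*? grows one character at a time and stops
# as soon as six digits can be matched right after it.
lazy = re.search(r"^.*?(\d{6})", text)
print(lazy.group(0))  # AV assd 201708
print(lazy.group(1))  # 201708

# For comparison: re.search returns the first match it finds,
# so the bare pattern gives the same six digits here.
first = re.search(r"\d{6}", text)
print(first.group(0))  # 201708
```

The lazy form is mainly useful when the pattern has to stay anchored with ^, or when the tool you use reports only the captured group.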
Feel free to mention your doubts in the comment section.
Hope this post will help you.