But it does match fag
in fag-mobile
?
At least in PHP, JS and Python, tested here: http://regex101.com/
Yours matches 'fag-'
, 'Fag '
and ' fag'
in:
fag-mobile Fag had it coming. What a fag
Read file >cat ...file
send output to the next command > |
execute the following Perl language code against each line of text > perl -ne
If the following search script(regex) is true print the line of text >print if /(.)\1(.)\2(.)\3/
Explanation of the search script from http://regex101.com
1st Capturing Group (.)
. matches any character (except for line terminators)
\1 matches the same text as most recently matched by the 1st capturing group
2nd Capturing Group (.)
. matches any character (except for line terminators)
\2 matches the same text as most recently matched by the 2nd capturing group
3rd Capturing Group (.)
. matches any character (except for line terminators)
\3 matches the same text as most recently matched by the 3rd capturing group
so...
Group1([AnySingleCharacter])[SameAsGroup1]
Group2([AnySingleCharacter])[SameAsGroup2]
Group3([AnySingleCharacter])[SameAsGroup3]
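As a rough cross-check in Python (the thread itself uses Perl), the same pattern can be run over a list of lines. This is only a sketch: the helper name `lines_with_three_doubles` and the sample words are made up for illustration.

```python
import re

# Three consecutive "doubled" characters: each capturing group is followed
# immediately by a backreference to itself.
TRIPLE_DOUBLE = re.compile(r"(.)\1(.)\2(.)\3")


def lines_with_three_doubles(lines):
    """Return the lines that contain three doubled characters in a row."""
    return [line for line in lines if TRIPLE_DOUBLE.search(line)]


sample = ["bookkeeper", "balloon", "committee"]  # hypothetical input
print(lines_with_three_doubles(sample))
```

Only words with three back-to-back pairs (such as the "ookkee" run in "bookkeeper") should come through; two pairs in a row are not enough.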
http://www.regular-expressions.info/tutorial.html
There are also a ton of sites out there that will do some WYSIWYG regex testing, most with different regular expression engines. Regexr is the one I always remember off the top of my head, but Regex101 is also good.
I use http://regex101.com all the time so I can play with it interactively until the matches are the way I want them to be. It also gives a textual explanation of what is going on and has a list of all available tokens with their descriptions.
Not strictly an array, but a list of characters that should be replaced. The square brackets indicate that whatever characters are between them should be replaced.
Look at the bottom explanation here: http://regex101.com/#aqr
<strong>What have you tried?</strong>
Giving the solution (even asking for the solution) is explicitly forbidden here.
Use regex101 to develop your regular expressions. This site is easy to use, explains the parameters in detail, and analyzes the final regex.
Hint: You need something like number with length 2, delimiter (as literal), number with length 2, delimiter (as literal), number with length 2.
I'm still learning regex, but the one thing I think I see is that there's a problem with the braces. It looks like you're asking for: OpenBrace, 0 or 1 commas, OpenBrace, DoubleQuote, and then "Name".
I could be misreading it though.
edit: I don't think you need the 8th capture group (,??) as long as you're using the global modifier. I think that and the 3rd capture group are always going to capture nothing.
edit 2: This is what I came up with to match name. But it renders 6th and 7th capture groups empty on the example you used.
({(.*?(,??)"(Name)":"(.*?)"|"(.*?)":"(.*?)")})
Regex101 link
In your second example
strsplit(x,",? (- )? ?!")
This is splitting on (maybe) a comma, a space, (maybe) a dash followed by a space, (maybe) a space, an exclamation point.
So, the smallest thing it could split on is (removing the maybes) a space followed by an exclamation point (" !"
).
That doesn't exist in your string, so no splitting.
Easiest is to split on anything which isn't a word character (a-z, A-Z, 0-9, _) by doing,
strsplit(x, "\W+")
A good site to learn and practice regex is http://regex101.com
Just be aware in your patterns in R, you need to "escape the escape" character, so "\\W+"
instead of "\W+"
.
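For comparison, here is a minimal Python sketch of the same "split on non-word characters" idea (the question is about R, so this is only an analogue). The input string is invented, and a raw string is used so the backslash does not need doubling.

```python
import re

x = "one, two - three !four"  # hypothetical input, not the poster's string

# Split on runs of non-word characters; drop the empty strings that appear
# when the text starts or ends with a separator.
words = [w for w in re.split(r"\W+", x) if w]
print(words)
```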
You haven't specified a language (nobody seems to bother with this extremely important detail) so I'll give you a solution in Perl.
First, as always, let's define a set of example data we want to test our regular expression against:
"log": true blah "write": false blah blah "log": false blah "log: true blah "write": true blah blah "log": true blah blah "write": false
Now we want any "log":true ... "write": false
section to turn into a "log":true ... "write": true
section.
To do this we will try the following regular expression and test it at http://regex101.com/ by setting the /g
(global search), /x
(extended so we can split the regex over several lines), /m
(so ^
and $
mean start and end of lines), and /s
(so .
also includes newlines) flags.
s/
    (                       # start capturing prefix
        ^\s*"log":\s*true   # "log": true line
        .*?                 # non-greedy anything until...
        ^\s*"write":\s*     # "write":
    )                       # stop capturing prefix
    (.*?)$                  # whatever comes after "write": on that line
/\1true/smgx
http://RegEx101.com is a great site to use to explain what's going on with a RegEx. https://regex101.com/r/SJZ81Y/1
^ asserts position at start of the string
Non-capturing group (?:[^\/]*\/){7}
{7} Quantifier - Matches exactly 7 times
Match a single character not present in the list below [^\/]*
* Quantifier - Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\/ matches the character / literally (case sensitive)
\/ matches the character / literally (case sensitive)
1st Capturing Group ([^\/]+)
Match a single character not present in the list below [^\/]+
+ Quantifier - Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\/ matches the character / literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier - Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
Tip: fiddle around a bit on websites like Regex101. They have a sidebar explaining what your input expression exactly does and a small cheat sheet at the bottom to refer to when in doubt. :)
Other sites with the same functionality from the related discussions:
Learn by doing. Here is a great website that I use:
it will allow you to edit your regex and see the results against your test data in real time. (very helpful!)
Check out RegEx101.com, it will explain each part of a regex.
In this case the explanation is:
(?:\s|^) Non-capturing group
1st Alternative: \s
\s match any white space character [\r\n\t\f ]
2nd Alternative: ^
^ assert position at start of the string
The dot matches any one character. The asterisk means the dot matches zero or more characters.
Check out http://regex101.com/ While it doesn't have a specific option for perl, you can still plug most regexs into it and it will give you exacting details on what each part of the regex is doing. For example, on your last expression, it tells you the following:
/-d(.*)/
-d matches the characters -d literally (case sensitive)
1st Capturing group (.*)
.* matches any character (except newline)
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
You know, just a couple of months ago, I had no idea how Regular Expressions worked.
I found this lovely tool that helps you build and test Regex live and now I would say that I can write most regex I need by hand without using any tools or help!
:)
You would need to look into regular expressions and use preg_match_all to extract the markup http://uk1.php.net/preg_match.
Pretty good tool for checking your regular expression pattern http://regex101.com/#PCRE
Try this:
$pattern = "/{{2}([a-zA-Z]*)}{2}/";
$subject = "Dear {{name}}
In response to your query, your {{order}} will cost {{cost}}";
preg_match_all($pattern, $subject, $matches);
echo "<pre>"; var_dump($matches); echo "</pre>";
This gives:
array (size=2)
  0 =>
    array (size=3)
      0 => string '{{name}}' (length=8)
      1 => string '{{order}}' (length=9)
      2 => string '{{cost}}' (length=8)
  1 =>
    array (size=3)
      0 => string 'name' (length=4)
      1 => string 'order' (length=5)
      2 => string 'cost' (length=4)
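The same extraction can be sketched in Python for readers who are not on PHP. This is an illustrative analogue, not part of the original answer; the braces are escaped explicitly and `PLACEHOLDER` is a made-up name.

```python
import re

subject = (
    "Dear {{name}}\n"
    "In response to your query, your {{order}} will cost {{cost}}"
)

# Two literal opening braces, a captured run of letters, two closing braces.
PLACEHOLDER = re.compile(r"\{\{([a-zA-Z]*)\}\}")

# findall returns only the captured group; finditer gives the full matches.
names = PLACEHOLDER.findall(subject)
full_matches = [m.group(0) for m in PLACEHOLDER.finditer(subject)]
print(full_matches, names)
```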
It's saying 'find an exact copy of group 1'. Group 1 is the first found group (aka the thing found that matches the inside of the first ()).
So if I have 'abcdef' and I look for (ab)(cd)(ef)
group 1 is 'ab' . So that means \1 is also 'ab'. If I did (a\w)(cd)(ef)
group 1 is still 'ab' because that is the string that matches the inside of the first parentheses.
I'd recommend going to regex101.com for an interactive explanation.
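A small Python sketch of the point about group 1 and `\1`; the second pattern, `(a\w)\1`, is my own illustration of a backreference in use and does not come from the thread.

```python
import re

# Group 1 holds whatever text matched inside the first parentheses.
m = re.match(r"(a\w)(cd)(ef)", "abcdef")
group1 = m.group(1)

# \1 then demands an exact copy of that text, not just "a plus any word char".
repeats = re.fullmatch(r"(a\w)\1", "abab") is not None
differs = re.fullmatch(r"(a\w)\1", "abac") is not None
print(group1, repeats, differs)
```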
Don't know how good it is but the current Humble Books Bundle has the Regular Expressions Cookbook in the $1 tier. O'Reilly books are generally good quality, so I assume that this is not different.
I frequently use
to build my regex because both offer step by step explanations and full descriptions of regex syntax, the latter even visually.
I'm not into instagram, but since nobody answered yet I give my 2 cents:
I think the caption part should be possible with regex.
Small example as I only tested it with one url. Most definitely needs some additional modifications.
https://www.icloud.com/shortcuts/038242453d5443259e7bc9b256b480f4
Could you maybe provide an actual sample string and highlight what you're trying to extract? I could probably whip up something. Also, not sure if you know about it, but there's a handy site where you can test your regex
Neat writeup. Thoughts as I'm reading it:
- string.Contains() and -contains are for different tasks, but .Replace() and -replace do the same task.
- [regex] methods are case sensitive.
- Spelling/typos:
  - powerfull -> powerful
  - If we chance our coma to a period -> change, comma
  - it is much easier for out users -> our
  - warry of using them if performance is a consern -> wary, concern

For simply messing around with regex or testing that a regex actually does what you expect, I highly recommend Regex101. It also has a handy quick reference for nearly every feature regex has to offer, plus the ability to easily switch between a few common regex engines that all work slightly differently.
Note: I typed the link from memory, if it doesn't work a simple Google search should suffice.
For example like this: \w*@\w*\.\w*
(don't forget to duplicate the \
for Java) - Bear in mind that this is only a very rudimentary example that does not take periods or dashes in the name part or in the domain part into account.
Play with the regexes on regex101 until they work as they are supposed to. Then you can easily use them in Java.
When testing if an IP address is in a subnet, I tend to convert the address into hex (you can also use unsigned decimal). I also convert the subnet into a network address and broadcast address as hex (or decimal); then you can use comparison operators with impunity :)
Here is the code I wrote (a while ago, and it has been improved somewhat) : https://www.reddit.com/r/PowerShell/comments/3jd4p4/decodecidrblock/?ref=search_posts
To convert an IP address to hex, simply pass it in as a /32 CIDR
As far as regex goes - the CIDR validation is a complex regex that ensures the value passed in is OK
Also, http://regex101.com is awesome when testing regex - free, and online :)
Here are my recommendations:
Book: Mastering Regular Expressions, 3rd Edition - Jeffrey E.F. Friedl - This book is very comprehensive and not only teaches you the "hows", but also the "whys". If you REALLY want to learn REGEX, get this book.
Website: phpro.org - I've read through many online tutorials for learning REGEX. This one made a lot of sense for me. It was well written and easy to follow.
Testing website: regex101.com - Use this site to play around with different expressions and see what parts are matched. It's very easy to use and is entirely helpful in making sure that your expression will match what you are hoping it does.
Thanks. I like how we each have our languages of choice, but most of us agree on regex, and even use the same pattern. While you're learning, I second the recommendations for http://regex101.com. It helps tremendously to play around with examples/counterexamples.
Without regex, I'd probably write it like this (using Data.List.Split
)
parse :: String -> [(String, Int)]
parse s = map (pair . splitOn ":") . splitOn "," . tail . dropWhile (/=':') . filter (/=' ') $ s
  where pair [k, v] = (k, read v)
        pair _ = error $ "could not parse: " ++ s
Your parser's not too shabby either. The complication comes from the extra colons and commas, so you can filter
them out first. Then it becomes
parse :: String -> (Int, [(String, Int)])
parse s = (read n, [(k1,read v1), (k2,read v2), (k3,read v3)])
  where (_:n: k1:v1: k2:v2: k3:v3:[]) = words $ filter (`notElem` ":,") s
which I think is quite readable.
http://regex101.com is fantastic.
http://i.imgur.com/FUbqvCk.png for instance, solves your problem.
You can cut it down a bit if you're only searching one line of input from ([a-j](?:10|\d)[\n$])
to ([a-j](?:10|\d)$)
.
If your placeholder is your example test, your first part of your regex should be [0-9]{4} instead of [0-9]{3}.
jqBootstrapValidator uses javascript for its regex, nothing special.
I recommend http://regex101.com/ for testing/building regex for javascript/php/python.
I use a sandbox type site like http://regex101.com to test out my regex before putting into production code. Perhaps there is one specifically for python? (Not sure if patterns or syntax change significantly with the language)
Have you selected "regular expression" mode in the find window? Have you selected "wrap around"? What exactly are you typing into the search box?
Here's an example of it working in an online tester.
Edit: wait, are you attempting to search for a file that contains those strings, or search a document for those strings? I am not certain if regex supports the former.
I believe this is what you're asking but the solution by /u/BashAtTheBeach96 should work unless you have other lines matching "rpg" outside the "<et>" tag
It's regular expressions - basically telling AutoMod to search for a pattern rather than an exact sequence of characters.
http://regex101.com/r/uX7aU4/2
(there are more \
's in my comment above, which are necessary due to the way AutoMod interprets the wiki page before interpreting the regex)
That won't catch comments as url
and domain
apply only to submissions.
---
url+body: "ebay(\\.[a-z]{2,3}){1,2}/(itm/|link/\\?nav=item)"
modifiers: regex
action: spam
message: |
    Your [{{kind}}]({{permalink}}) has been automatically removed because it violates Rule II of /r/{{subreddit}}. This rule discourages the use of /r/{{subreddit}} for 'personal promotion'.

    If you have a question about a particular item, please submit a screenshot. If you are trying to sell something, go to /r/redditbay, /r/flipping or /r/ebaytreasures.
http://regex101.com/r/hN9iJ1/1 (the reason why you see \\
in the code above, but not in the regex101 link, is that AutoMod interprets everything through YAML first, and then through Python, so double-escapes are required.)
Sorry, that was the solution to a previous problem I encountered, but I just realized you are working with non-delimited strings.
The catch-22 here is that regex can't tell the difference between 30 03 and 3 00 3, because they can't calculate a given position as odd or even (zero length assertions cannot be repeated or contain back-references). The engine would need to consider the distance from the beginning recursively, which for this reason would not work (especially for variable-length strings).
If I were you, I would run two regexes. The first would delimit the string into each binary value to prevent this. This adds the information of where one value ends and the other begins, which we know is true because hex strings are even-numbered.
First replace:
(\d\d)
with:
\1
notice the space on the end of the \1.
Then run
(?!00)(\d\d)
The first bracket group is non-capturing, and it checks that the match is not 00. The second bracket group is capturing, and note my use of the g flag, so that all are captured.
Edit: I just noticed your string is null-delimited. If that's always so, use the solution in /u/Caek_ 's link.
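The two-step idea above can be sketched in Python as follows. The sample string and the helper name `non_zero_pairs` are assumptions for illustration; the point is only to show why delimiting first matters.

```python
import re


def non_zero_pairs(digits):
    """Delimit the string into two-digit values, then keep the non-00 ones."""
    # Step 1: put a space after every pair so pair boundaries are explicit.
    delimited = re.sub(r"(\d\d)", r"\1 ", digits)
    # Step 2: capture every pair that is not 00.
    return re.findall(r"(?!00)(\d\d)", delimited)


raw = "30030012"  # hypothetical input: the pairs 30 03 00 12

# Without delimiting, the engine skips "00" by sliding one digit forward and
# then picks up a pair ("01") that straddles two real values.
naive = re.findall(r"(?!00)(\d\d)", raw)
print(non_zero_pairs(raw), naive)
```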
Are you looking for these exact patterns? Or do some of the letters/numbers change?
I'm going to make a few assumptions.
1. JLX\ doesn't change
2. The numbers can change, but are always 5 numbers for the first match and 6 numbers for the second.
3. The letters B, V, A, and E can change. If they don't, the following expression could be tightened up a bit.
If that's the case, then this expression should work for you:
/JLX\\(?:(?:\w(\w{2}\d{5}))|(\w\d{6}))/i
Here are examples on regex101.com
No, using \S
doesn't give the "more efficient" way as you claim... In your first try you were on the right path, but you seem to have limited regex knowledge (we can tell from your pattern :o)
You can try <code>this</code> instead... A performance difference will be noticed as the searched text gets bigger...
To learn more about REGEX, I'd recommend <code>this wonderful tutorial</code>
Well, why complicate things... You want to remove the dollar sign, so just replace it with an empty string...
Something like this(it removes also multiple spaces): http://regex101.com/r/nR7yR2/5
This should work without problems as long as your input is what you said
Good one... That's what I wanted to do :P
Since the 2nd part of the regex is almost identical to the 1st part, you could use recursion for a shorter pattern (must check if it's supported where it's going to be used)
This does not validate the time, so something like 5:90 pm would break it.
/(?P<time>1?\d(?::\d\d)?(?: [ap].m.)?(?:-1?\d(?::\d\d)?(?: [ap].m.))?)/gi
Hello,
would something like this do the trick ?
You can use \1
in a pattern to reference the content of the first capturing parenthesis : <([a-z]+)></\1>
/^(?!.*(anime|manga|hentai|sex|xxx)).*$/i
My regex will just pass/fail for single sentences that you run the test on. In this example I set up captures so you can see what passed and what failed.
I'm not familiar with what you are doing so I can adjust my regex if you answer my questions.
Are you trying to capture anything? or are you just trying to see if a string (sentence) passes/fails? It sounds like you want to test against sentences but your existing regex appears that it is trying to parse URLs. Can you provide some data to test against? That would really help.
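A minimal Python sketch of this kind of pass/fail test, assuming the intent is "reject any single-line sentence that contains a listed word". The shortened word list and the `passes` helper are placeholders, not the poster's setup.

```python
import re

# The negative lookahead at the start fails the whole match if any listed
# word appears anywhere later on the line.
BLOCKED = re.compile(r"^(?!.*(anime|manga)).*$", re.IGNORECASE)


def passes(sentence):
    """True when the sentence contains none of the listed words."""
    return BLOCKED.match(sentence) is not None


print(passes("I like cooking shows"))
print(passes("Best Anime of the year"))
```

Because `.` does not match newlines by default, this only makes sense for one sentence per string, which matches the single-sentence use described above.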
Try this out,
http://regex101.com/r/mT5uW0/1
/(\[Widget\((\w*)\)\])/g
If you have any questions then feel free to ask. :)
Reason that your example won't work.
The problem with using .*
is that it matches every character between two widgets for example,
[Widget(Test1)] [Widget(Test2)]
The .* would match 'Test1)] [Widget(Test2' which is less than ideal, Using \w would mean that only numbers and characters are matched. If you need special characters then you could use .* in combination with [^\)]
which should stop .* on closed brackets.
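To make the greedy-match problem concrete, here is a short Python sketch using the two-widget example from above. The third pattern, with `[^)]*`, is one way to read the "combine with [^\)]" suggestion and is an assumption on my part.

```python
import re

text = "[Widget(Test1)] [Widget(Test2)]"

# Greedy .* runs to the last ")]" on the line, swallowing both widgets.
greedy = re.findall(r"\[Widget\((.*)\)\]", text)

# \w* stops at the first non-word character, so each widget is separate.
word_only = re.findall(r"\[Widget\((\w*)\)\]", text)

# A negated class allows special characters but still stops at ")".
negated = re.findall(r"\[Widget\(([^)]*)\)\]", text)

print(greedy, word_only, negated)
```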
One thing to point out is that this match is greedy. If you have more than one "<!-- End section -->", it will match up to that last one and everything in between, including other End Sections. See what it would match here.
That site (Regex101.com) will also explain what each part of the regex is doing.
I would use Perl for this.
perl -n -e 'print "$1","\n" if /(^(?:\s?[\w\d]+){8})/gm' your_file.txt
Check it here: http://regex101.com/r/eL1sG7
The "." after the "z" is a literal "." (a dot is always literal inside a character class; the first-or-last rule applies to "-").
{5,} Means five or more of the preceding.
Have you tried Debuggex? It seems to support it (I had to remove the newlines manually, but did no further changes)
edit: Regex101 doesn't seem to choke on it either, and regex.larsolavtorvik.com copes as well (no permalink though).
Do you need to use regex? If they're actual links, you could getElementsByTagName
and loop over the list processing links with a hostname
attribute of 'i.imgur.com'.
But if you prefer a regex, try this: http://regex101.com/r/rF8yA9
This regex should work...
^([a-z0-9]{6})\.[a-z]{3}$
> {6,}
Quantifier: Between 6 and unlimited times, as many times as possible, giving back as needed [greedy]
Try this website: http://regex101.com/
It will explain what every part of your regex does and show matches.
EDIT: after reading your post again, I'm not sure how much help this will be for you. I have no idea how that Objective-C stuff works.
Then you can simply use /^The (\S+) ball is \1\.$/
\S+ will match all non-space characters (including words with - and ')...
\1 refers to the 1st captured group
First of all, since it's a regex question, you should really specify what language you're using, since there are different engines with different capabilities.
Let's suppose we're using PCRE and let's try to answer your first question:
Suppose we define a word as
\w{3,}\b
for simplicity
What you basically want, is go through every "word" and check if that word exists previously in the string/line. You will need a backreference like \1
There are several problems though :
(?<!word.*?)word
in some flavors like .NET that would be possible.
Let's say we could and you were using .NET, we said we need to use backreferences
(?<!\1.*?)(word)
But how in the world will the regex engine know what "\1" is if it didn't even reach (word) ?
Conclusion : IMO, I think it's close to impossible* to do this in one go.
Workaround : there is a not well-known escape sequence in PCRE, \K, which forgets what was matched before. What you could do is the following
(\w{3,}\b).*?\K\1
The problem with this is that it will consume the characters, so you will need to apply the pattern repeatedly until no more replacements are made. See this demo http://regex101.com/r/mE4hN7
As for your second question. It doesn't make sense.
Q: "If I do not want to remove Duplicate words that occur together" -> A: well you do nothing ?
Even the output doesn't make sense, I mean CCCCCC and DDDDDD are alone, so why aren't they removed ?
* I don't want to say that it's impossible, I mean who knows some guru shows up with a solution ?
PS: I signed up just because of this question o_o
I want to add to the u/OwlbearWrangler answer capturing groups. Doing
(\d+)d(\d+)
You can get the two numbers like match_object[1] and match_object[2] (match_object[0] is the full match).
You can use http://regex101.com to test your regex
Edit: for this task there is no need to use regex. A simple
times, faces = [int(i) for i in " 4d20".split("d")]
Is enough.
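Put together, the capturing-group version looks roughly like this in Python. The name `DICE` is made up, and the input reuses the " 4d20" string from the split example.

```python
import re

DICE = re.compile(r"(\d+)d(\d+)")

match_object = DICE.search(" 4d20")

# Index 0 is the full match; 1 and 2 are the two capturing groups.
full = match_object[0]
times = int(match_object[1])
faces = int(match_object[2])
print(full, times, faces)
```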
Regex, at least for basic stuff is not super hard to learn.
If you have any issue try to use http://regex101.com
Then, if you need some extra help, click the save button on the regex101 page, copy the link and post it on /r/regex explaining what you're trying to do.
I'd suggest changing your pattern so it catches your whole number, so you can use it right away. Also avoid variable names such as "string"; it can mess up your code if, say, you later have to import the module "string".
import re
text = 'the number is 94.9%'
# escape the dot so it only matches a literal decimal point
finder = re.findall(r'([0-9]+(?:\.[0-9]+)?)%', text)
print(finder)
print(float(finder[0]))
# all numbers
numbers = list(map(float, finder))
print(numbers)
http://regex101.com/ is a good site to test your patterns. Has explanation of each matching group and quick reference to look for.
So basically you would need to define a string with capturing and non-capturing groups. Sounds scary but it's actually pretty simple. re will search for a string that matches every group but give back only the capturing groups. So if I have
myStr = 'INFO ABC 1234'
pattern = re.compile(r'(INFO)(?: ABC )(\d+)')
result = pattern.match(myStr).groups()
Then result will be a tuple of 'INFO' and '1234' but NOT ' ABC ' because () indicates a capturing group and (?:) indicates a non-capturing group. If you need more help with regex check out Regex101.
You can seriously pick up pretty advanced regex over a few weekends. I learned it really early on (as a kid for MUD client matching) but I still had pretty much an average programmer's understanding of regex before taking the http://regex101.com/ quiz which is fucking impossible but also does a good job of introducing more and more advanced syntax.
regex101 taught me how to make this, and for that it should be put on trial
You can try something like this
~?\d{1,3}-?(h|ish|mo|min)?
That will catch stuff like 12h, 3-ish, 6mo, ~5, 127min
I suggest trying your expression on http://regex101.com to see what works/doesn't.
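A quick way to check the listed examples locally, sketched in Python. `fullmatch` is used so each sample has to match in its entirety; the name `DURATION` is just a label for this sketch.

```python
import re

DURATION = re.compile(r"~?\d{1,3}-?(h|ish|mo|min)?")

samples = ["12h", "3-ish", "6mo", "~5", "127min"]

# Map each sample to whether the whole string matches the pattern.
results = {s: DURATION.fullmatch(s) is not None for s in samples}
print(results)
```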
Frankly, this is one of the best things for regex in general. Regexes are pretty simple, alas totally unreadable.
A good idea with regex is to think in terms of "full terms", or rubber-duck debug it so to speak. For example, if you have "hello_world_12333abc" and you want to extract just the numbers, try to explain it to yourself in the simplest terms.
I want to match 1 or more digits in a row between 1 and 3
Answer is:
[1-3] -- a character between 1 and 3
+ -- 1 or more
? -- match the non-greedy (lazy) way, i.e. as few characters as possible; leave it off to match up to the last matching character in the row.
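A short Python check of how a trailing ? changes what [1-3]+ picks out of the example string (a sketch, using the same "hello_world_12333abc" input):

```python
import re

text = "hello_world_12333abc"

# Greedy: take as many digits in the 1-3 range as possible.
greedy = re.search(r"[1-3]+", text).group()

# Lazy: the trailing ? makes + stop as soon as the pattern is satisfied.
lazy = re.search(r"[1-3]+?", text).group()

print(greedy, lazy)
```

With nothing after the quantifier to force it onward, the lazy form stops after a single digit, so the greedy form is usually what is wanted for pulling out the whole run.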
More complex stuff is pretty similar, as long as you are not trying to check the primality of numbers, of course.
Most of the time you don't need look-ahead, look-behind or the like.
If you are a Python user, additional tricks are named groups and non-capturing groups: the first lets you split a regex into named chunks, the second lets you group unnamed chunks or groups of named chunks.
I always test my regular expressions using http://regex101.com, which breaks down each expression into steps and shows exactly how it's interpreted. It has an option for python, which is what AM uses.
There's so much text, not shared as a code block (every line needs to start with 4 spaces, in addition to any spaces already in the original) that I'm finding it difficult to see the specifics. Usually, I'd make a local copy and have a play.
What is the sequence of characters that precedes each occurrence of the data you seek?
You might find regex101.com very useful as you can develop and test various regex expressions against your data until you find what you want, and they have it generate code in any of several programming languages including Python 3. It provides lots of guidance as well.
Whatever code editor you use should have a wildcard search. Just learning that (if you don't already know it) can really increase productivity, even if you never convert it into an automated filter. (Most of my text filters right now are essentially strings of regex filters, put together and automated.)
(I don't know if your friend's using regex to do what you're saying, but that would do some of it.) :)
> Just knowing any text editor inside and out would help do wonders though I guess.
Yep. I actually only moved to BBEdit recently—I've been using TextEdit, which was essentially BBEdit Lite, but they've changed things so now BBEdit has a free version that's the Lite. Even the free version has a few more features than TextEdit did, like an autocomplete function and more built-in filters.
If I upgrade to the Pro version, I'll have even more useful features.
Whatever you're using, I suggest you read its documentation. You'll probably at least find something that helps your workflow.
Everybody has been telling you what is wrong: it is your regular expression that only works for a single character. The rest is up to you to find out.
Hint: test your regular expressions in regex101 - this site also explains their meaning perfectly well.
> I hate regular expressions.
I used to have the same feeling until AdventOfCode 2015 and 2016 where I really learned to appreciate them. They have quite a learning curve, but are totally worth it.
For me, regex101 does the trick. I create my pattern (with the help on the page) and copy some sample data in, then tweak the pattern to my liking.
I consider myself still a mere beginner with regex because I have yet to understand the finer details about look ahead, look behind, pattern repetition, etc. but so far, once you get the hang, it is really fun to work with regex.
Went even as far as to use regex (with the MS script host library) in a VBA project in Access - and it works great. Saved me loads of searching and indexing.
In addition to /u/IGotaBlueShirt's detailed and excellent explanation:
OP, copy and paste your regex into http://regex101.com this will explain your regex in detail and you can test whether it works or not.
:D
I, my liege, am now your debtor. I always wondered how it feels to be gilded.
It feels good.
As a small thank you, I edited my comment to add explanations about the regexp, if ever it's of any use to anyone. And so that you have a better idea of what I did (but my biggest gift is http://regex101.com for those who didn't know it.)
Thanks for the detailed reply - it's kind of hard to grasp what's going on fully.
Anyways this may be of help to you:
>>> re.search(r'(<([^>\s]+)[^>]*>.*?<A NAME="tx15167_1">.*?</\2>)', html).group()
'<P STYLE="margin-top:0px;margin-bottom:0px" ALIGN="center"><FONT STYLE="font-family:ARIAL" SIZE="2"><B><A NAME="tx15167_1"></A>PART I. FINANCIAL INFORMATION </B></FONT></P>'
Hmm http://regex101.com didn't let me. Perhaps it works in actual engines. I'll test it.
EDIT: Nope. At least not in JavaScript. I would encourage you check to make sure it's working as you expect. The \1 will evaluate to a single character code.
> /(.)\1.*([^\1])\2/.test('aabaa')
true
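The same effect can be reproduced in Python, where a numeric escape inside a character class is also read as a character code rather than a backreference. A negative lookahead is one common way to say "a character different from group 1"; that replacement pattern is my assumption, not something from the thread.

```python
import re

# Inside [...], \1 is the character with code 1, so [^\1] accepts almost
# anything, including the same letter that group 1 captured.
char_class_hit = re.search(r"(.)\1.*([^\1])\2", "aabaa") is not None

# (?!\1)(.) captures one character only if it differs from group 1.
DIFFERENT_PAIR = re.compile(r"(.)\1.*(?!\1)(.)\2")

print(char_class_hit)
print(DIFFERENT_PAIR.search("aabaa"))
print(DIFFERENT_PAIR.search("aabb"))
```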
Same. I can never find regexes that exactly match the situation I'm looking for (they're usually pretty specific, hence the use of regexes in the first place) so http://regex101.com is my best friend.
http://regex101.com was my choice when I was learning. Two main reasons: it lets you debug and shows you step by step what it's doing. And you can choose between a few different flavors. All the JavaScript-based ones are more limiting, with no support for lookbehinds.
Although I see myself as rather proficient when it comes to regular expressions, I still use regex101.com a lot to quickly verify my expression against a few lines of data my expression should (not) match against.
Besides that, regex101.com dissects and explains regular expressions really well which is nice when it comes to understanding regular expressions as a beginner or when having to debug an existing complex regular expression.
First of all you don't need to calculate the result here. Also, /[0-9]*/
will match any string.
Something like the following will work perfectly fine:
int stackSize = 0;
for (String token : tokens) {
    if (isInteger(token)) {
        stackSize++;
    } else if (isOperator(token)) {
        if (stackSize >= 2) {
            // we have two operands: pop both, push the result
            stackSize--;
        } else {
            // not enough operands
            return false;
        }
    } else {
        // not a valid token
        return false;
    }
}
return stackSize == 1;
Here's a simple way to go: http://regex101.com/r/rD4mW7/1
But if the order isn't fixed, then that is a problem... A workaround would be to use the OR operator "|" and add all the possible combinations, which will give a huuuuuuuuuuge regex xD
Check THIS and try to find out what's wrong :)
From your answer on Q.1, there's an easy fix :)
This may not be right, but I'm going to assume that the file is some sort of MBox email file. If that's the case, then it will sort of go like this (though I could be wrong):
Each message starts with "From ", then a bunch of headers, including the Subject line. Then there will be two new lines, then the body of the message, then two more new lines followed by either the end of file or another "From " to start a new message. There is supposed to be some sort of escape for occurrences of "From " within the message.
Take a look at this regex (regex101.com). I'm ignoring the part where you mention "TECH" since it's grabbing the entire body based upon the subject. Hopefully you can use this as a starting point.
Assuming that all the src tags start with "/storage/" and have the question mark, you could just ignore everything else in the HTML tag and grab the filename.
Take a look at this and see how it works for you.
How about using look arounds for this!
Make sure we have a letter preceding the comma and NO space after the comma.
s/(?<=\pL),(?!\s)/, /g
[Regex101](regex101.com) is a nice site for testing regular expressions.
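A rough Python equivalent of that substitution, for readers without Perl. Python's built-in `re` has no `\pL`, so `[^\W\d_]` is used here as a stand-in for "a letter"; that swap and the helper name are assumptions of this sketch.

```python
import re


def space_after_commas(text):
    """Insert a space after a comma that follows a letter and has no space."""
    # (?<=[^\W\d_]) : preceded by a letter (word char that is not digit or _)
    # (?!\s)        : not already followed by whitespace
    return re.sub(r"(?<=[^\W\d_]),(?!\s)", ", ", text)


print(space_after_commas("a,b, c,d"))
print(space_after_commas("1,000"))
```

Commas inside numbers are left alone because the lookbehind requires a letter before the comma.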
You're a bit confusing with what you want, so please explain further if I'm mistaken.
Do you want 1) to split on each cap and first number? Like Word - Word - 999. or 2) just at numbers so it will be WordWord - 999?
I'm assuming the first, which you can then just use something like:
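preg_match_all('#([A-Z][a-z]+|[0-9]+)#', $str, $matches);
This catches each part into the array $matches: $matches[0] is the list of every matched part (and $matches[1] is the same list, because the parentheses wrap the whole pattern). Plain preg_match would stop after the first part.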
I can explain the regex if you are new to it, regex101 explains it quite well though.
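If it helps to see the idea outside PHP, here is a small Python sketch. The input string and the name split_parts are my own examples, not from your data.

```python
import re

# One alternative per kind of part: a capitalised word, or a run of digits.
PART = re.compile(r"[A-Z][a-z]+|[0-9]+")

def split_parts(s):
    """Return every capitalised word or number found in s, in order."""
    return PART.findall(s)

parts = split_parts("WordWord999")
print(" - ".join(parts))  # Word - Word - 999
```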
>the VIN numbers will never have a Zero/Zed, but will occasionally have O's.
This is different than what the wiki link mk5p describes, which says there will never be an I, O, or Q. It also seems to be different than what I see on that site, for example this link has several zeroes in it.
Anyway, here's (Regex101.com link) what I came up with, following the rules on the wiki. I couldn't find any examples of cars that didn't have VIN numbers to test exclusion, though.
(I just noticed that you had two different websites there; I didn't test other links for that second site.)
%[A-Za-z0-9]+%[!.,]?
Let's look at this. I think the % signs are obvious (match verbatim percent signs). You didn't specify what characters a word can contain, so I presumed alphanumeric. The [A-Za-z0-9] means that it matches any character within the brackets (although instead of individual characters, I specified ranges of characters). The + sign after that means one or more (since words have to have at least one letter).
The [!\.,] is another set of characters that can be matched. So the exclamation mark, period, and comma are the punctuation symbols. The backslash is there because the period has special meaning in regex; inside a character class the escape is optional, so [!.,] works just as well. The ? after that means "zero or one". In other words, the punctuation is optional.
Example: http://regex101.com/r/tD6xA1/2
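A quick way to try it locally, sketched in Python. The sample sentence and the name find_placeholders are invented for the example.

```python
import re

# %word% with optional trailing punctuation (!, . or ,)
PLACEHOLDER = re.compile(r"%[A-Za-z0-9]+%[!.,]?")

def find_placeholders(text):
    """Return every %word% token, including one trailing !, . or , if present."""
    return PLACEHOLDER.findall(text)

print(find_placeholders("Hi %name%, your %item% shipped. 100% sure!"))
# ['%name%,', '%item%']
```

The lone % in "100% sure" is skipped because no alphanumeric character follows it.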
Right, sorry. I assumed the title would always be at the end of the string. Try .*\/Title\(([^\)]*)\).*
- Example
String line = "randomtestdata/Title(Here is my title) 4012 obj<</CreationDate(D:5423415))>>";
// backslashes are doubled because this is a Java string literal
Pattern p = Pattern.compile(".*\\/Title\\(([^\\)]*)\\).*");
Matcher m = p.matcher(line);
if (m.matches()) {
    System.out.println(m.group(1)); // prints "Here is my title"
}
You could also use \/Title\(([^\)]+)\)
in combination with the Matcher.find() method. Example
String line = "randomtestdata/Title(Here is my title) 4012 obj<</CreationDate(D:5423415))>>";
// backslashes are doubled because this is a Java string literal
Pattern p2 = Pattern.compile("\\/Title\\(([^\\)]+)\\)");
Matcher m2 = p2.matcher(line);
if (m2.find()) {
    System.out.println(m2.group(1)); // prints "Here is my title"
}
That regex doesn't make sense. It looks like you have a character class and are trying to perform an OR inside. This one has what you need.
/^(?:\$([1-9][,\d]*(?:\.\d\d)?))?$/gm
I'd like to ask a few follow up questions if I may.
<time>
string necessary?
(?:: ?(? appear to be the most important components to doing what I wanted to do. What is that called?
a.m. or p.m. at the end of the time string, so I don't think the (? characters are necessary. I would remove them myself but I don't want to screw anything up. Would you please help?
1?\d[1-5](?::\d[0-5]\d[1-9]... characters to tell the regex that we don't want "20:00" hours. Is that the general idea?
Finally, what does the /gi mean at the end of the line?
PS: I use Python, btw. I don't know if that would change anything that you wrote. It worked for me here (and I adjusted it to only get times with a hyphen present).
pps: I am enormously grateful for your help. Sometimes working in regex (having just started learning any programming at all) can be enormously frustrating. Thank you!!!
Put the cases with similarities into groups, then build your regex...
You have two groups here, so you can try: thank (yo)?u|th(anks|x)
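Here is a small Python sketch to check the grouped pattern against the variants. I'm assuming case-insensitive matching on the whole string, and is_thanks is just an illustrative name.

```python
import re

# Group 1 covers "thank you" / "thank u"; group 2 covers "thanks" / "thx".
THANKS = re.compile(r"thank (yo)?u|th(anks|x)", re.IGNORECASE)

def is_thanks(s):
    """True if the whole string is one of the accepted thank-you spellings."""
    return THANKS.fullmatch(s) is not None

for phrase in ["thank you", "thank u", "thanks", "thx", "thank"]:
    print(phrase, is_thanks(phrase))
```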
BTW, this is the best tutorial I used myself to learn regex => http://www.regular-expressions.info/tutorial.html
After that, you'll just need to practice, a looooot and like forever xD
The problem was the character classes you were making.
/^(?:Copy of )?(?P<showName>.*?)(?: - )?(?P<episode>S\d\dE\d\d)$/gm
EDIT: This one is better: /^(?:Copy of )?(?P<showName>.+?)(?: -)? (?P<episode>S\d\dE\d\d)$/gm
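To sanity-check the second pattern, here is a Python sketch (Python accepts the same (?P<name>...) syntax). The file names and the helper parse_title are made up; I don't know what your real list looks like.

```python
import re

# Optional "Copy of " prefix, lazy show name, optional " -", then SxxExx at the end.
EPISODE_NAME = re.compile(
    r"^(?:Copy of )?(?P<showName>.+?)(?: -)? (?P<episode>S\d\dE\d\d)$",
    re.MULTILINE,
)

def parse_title(line):
    """Return (showName, episode) or None if the line doesn't fit the pattern."""
    m = EPISODE_NAME.search(line)
    return (m.group("showName"), m.group("episode")) if m else None

print(parse_title("Copy of Breaking Bad - S01E02"))  # ('Breaking Bad', 'S01E02')
print(parse_title("Lost S02E03"))                    # ('Lost', 'S02E03')
```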
/(?:^|[_\s])(SK|PL|TESTCAT|FOO)(?:$|[\s_.])/g
This is pretty much the same as the last one. I just turned my OR statements into character classes. This one also accounts for file extensions.
(?:^|[_\s])
(SK|PL|TESTCAT|FOO)
(?:$|[\s_.])
EDIT: Try it out http://regex101.com/r/tB5gA5/1
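If you want to test it outside the browser, something like this Python sketch should do. The file names and find_tag are hypothetical examples.

```python
import re

# The tag must be preceded by start-of-string, underscore or whitespace,
# and followed by end-of-string, whitespace, underscore or a dot.
TAG = re.compile(r"(?:^|[_\s])(SK|PL|TESTCAT|FOO)(?:$|[\s_.])")

def find_tag(name):
    """Return the first standalone tag in name, or None."""
    m = TAG.search(name)
    return m.group(1) if m else None

print(find_tag("report_SK.txt"))   # SK
print(find_tag("SKY_report.txt"))  # None
```

The second call returns None because SK is followed by Y, which is not one of the allowed delimiters.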
I didn't read the other answer (too long and my head hurts xD), but try this: /^(?=[A-Z_]+$).*?_QPSK.*?_(SK)/g
I hope it will put you on the right path :)
Plenty of proper explanations above, all I want to add is this website. It takes your expression and tests it on your input as well as analyses it bit by bit. So you can see what an expression does. :) It also has a broad reference of characters and whatnot.
I generated a folder list using command prompt of some arbitrary folders:
cd path/to/root/folder
dir /s /b /o:n /ad > folderlist.txt
I whipped up the following regex statement to capture the directory paths of folders n levels deep. Play with the number in the {curly} brackets.
I can explain the regex in layman's terms if you'd like.
Now that you're trying to grab multiple things and not simply shrink a string to a single part, it might be smarter to use RegExMatch, which automatically stores captured subpatterns.
RegExMatch(clipboard, "PATTERN", SubPat)
You can then use SubPat1, SubPat2 etc. But before you even use RegExMatch, declare the variable SubPat (or whatever you use) as global, so that you can use the outputted variables outside the function.
Here is something to get the ideas flowing. Instead of using the substitution panel, consider the "match information" panel on the right side as a list of variables.
Edit: Also, note my use of the U ungreedy option.
You're just trying to extract the username?
The idea behind this is you need to find a way to 'mark' the location of the info to capture. In this example, I use the "Username:[tab]" and the fact that there is a newline right after the username.
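A minimal Python sketch of that "marker" idea, in case it helps to see it spelled out. The sample text and the name extract_username are invented; the same pattern should carry over to AHK's RegExMatch.

```python
import re

# "Username:" plus a tab marks where the value starts;
# the capture then runs up to (not including) the end of the line.
USERNAME = re.compile(r"Username:\t([^\r\n]+)")

def extract_username(text):
    """Return the text between 'Username:<tab>' and the next newline, or None."""
    m = USERNAME.search(text)
    return m.group(1) if m else None

sample = "Name:\tJane Doe\nUsername:\tjdoe42\nIP:\t10.0.0.7\n"
print(extract_username(sample))  # jdoe42
```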
Hi again friend,
I must say I'm loving this REGEX stuff, it seems extremely flexible. I'm having a bit of trouble cutting off the regex on my other searches. I suppose the IP was easy because it was all numbers and ended in a number.
Once this is figured out I can use it to scrape the Name, E-mail, Customer number, so many many things.
You are 100% right: that is the beauty of programming. The possibilities are truly limitless and every new problem is a lesson waiting to be learned. I'm glad I could help!
For your problem of matching your IP, use a negative lookahead. What this does is it says "make sure the following doesn't equal this".
(?!tomato).+
The above will match one or more characters, as long as the text at the starting position is not tomato.
You'll want to put it right before the capture brackets of the numbers.
(?!63\.3\.245\.123)
Example seen here. Try it out by changing your IP by one digit in the string, and watch regex match your IP instead.
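Roughly the same thing in Python, as a sketch. The IP pattern \d{1,3}(?:\.\d{1,3}){3} and the \b word boundaries are my own assumptions, since the capture part isn't shown above, and find_other_ips is a made-up name.

```python
import re

# The negative lookahead sits right before the capture group:
# "an IP starts here, but it must not be 63.3.245.123".
OTHER_IP = re.compile(r"\b(?!63\.3\.245\.123\b)(\d{1,3}(?:\.\d{1,3}){3})\b")

def find_other_ips(text):
    """Return every IP-looking string except 63.3.245.123."""
    return OTHER_IP.findall(text)

print(find_other_ips("Login from 63.3.245.123 then 10.0.0.7"))  # ['10.0.0.7']
print(find_other_ips("Login from 63.3.245.124"))                # ['63.3.245.124']
```

Changing a single digit of the excluded address, as in the second call, is enough for it to be matched again.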
I'm a PHP guy. Couldn't figure out why the regex was invalid in the Ruby tool, but it works with PHP.
/(?:[a-z]+_and_)?(?P<superhero>[a-z]+man)(?:_and_[a-z]+)?(?:\s|$)/gi
Regex is your friend here:
([^]+)$
Using the global and multi-line modifiers, that would capture the text you're looking for. See a demo & explanation here.
Things inside brackets () are captured as a subpattern. When you state the needle to look for, you can tell it to "capture" a part of the match so you can use it in the replacement. The haystack 123foo321 and needle 123(.*)321 will capture everything (dot means any character, asterisk means any number of them) in between 123 and 321. You can then use $1 to refer to it in the replacement. If the replacement is just $1, 123foo321 will become just foo. And 123bar321 will become just bar. etc.
This can be used to solve your problem by making the needle match everything, but only capturing the IP: stuff. Then replacing it with $1.
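The same capture-and-replace step, sketched in Python for anyone following along without AHK. Python writes the backreference as \1 instead of $1, and keep_capture is just an illustrative name.

```python
import re

def keep_capture(haystack):
    """Replace the whole 123...321 match with only the captured middle part."""
    # \1 in the replacement is Python's spelling of $1
    return re.sub(r"123(.*)321", r"\1", haystack)

print(keep_capture("123foo321"))  # foo
print(keep_capture("123bar321"))  # bar
```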
Edit: This webapp will help explain what I mean by that last part:
If you edit that first box, it'll load the explanations (mine bugs out and you need to add and delete a space). Basically, the idea is you make AHK match everything, so you can easily just say "replace the match with this captured part of the match". The ^.* and .*$ on either side of the brackets tells AHK to match everything on either side of the brackets.
One more thing. Notice the s option to the right of the regex. It tells the parser that a dot (which means any character) should also include newlines. Different regex parsers have different ways of declaring options, but in AHK you need to preface your needle with s). i.e.
Clipboard := RegExReplace(Clipboard, "s)^.*(IP: \d+\.\d+\.\d+\.\d+).*$", "$1")
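For comparison, a Python sketch of the same replacement, where the s option is spelled re.DOTALL. The sample text and the name extract_ip_line are made up, and I've escaped the dots so they only match literal periods.

```python
import re

def extract_ip_line(text):
    """Shrink the whole text down to just the captured 'IP: a.b.c.d' part."""
    return re.sub(
        r"^.*(IP: \d+\.\d+\.\d+\.\d+).*$",
        r"\1",
        text,
        flags=re.DOTALL,  # let '.' match newlines too, like AHK's s) option
    )

sample = "Name: Bob\nIP: 63.3.245.123\nMail: bob@example.com\n"
print(extract_ip_line(sample))  # IP: 63.3.245.123
```

Without the flag, the dot stops at the first newline, so on this multi-line sample the pattern finds nothing and the text comes back unchanged.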
Sure, it's just a regex that replaces ( or ) with a blank string. The forward slashes denote a regex literal. The pipe | is an 'or', and the back slashes escape the parentheses, since parentheses are special characters in regex. The g is a flag that tells it to replace all matches, rather than just the first one.
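In Python the equivalent would look roughly like this. The sample input and strip_parens are my own; re.sub already replaces every match, so no g flag is needed.

```python
import re

def strip_parens(s):
    """Remove every ( and ) from s."""
    # \( and \) are escaped because bare parentheses create groups; | means "or"
    return re.sub(r"\(|\)", "", s)

print(strip_parens("(555) 123-4567"))  # 555 123-4567
```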
A good resource for playing with regular expressions is regex101.
What do you mean by: > what I would really like is the text up to the second to the last colon
Do you mean that you want all the text from the 2nd : to the last : ?
If so, try THIS... If not, please explain clearly what you want: what is the expected result that neither .+: nor my example gives?
Weird question :>
What about /(?=(\d{1}))(?=(\d{2})?)(?=(\d{3})?)(?=(\d{4})?)(?=(\d{5})?)/g
It uses lookaheads; depending on the language you're using (you didn't say), it may or may not be supported!
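Python's re module does support lookaheads, so here is a sketch of what the pattern produces there. I'm guessing the goal is every run of 1 to 5 digits starting at each position; digit_runs is a hypothetical helper.

```python
import re

# Every lookahead is zero-width, so the match itself is empty;
# the useful output is whatever the groups captured at each position.
OVERLAPPING = re.compile(
    r"(?=(\d{1}))(?=(\d{2})?)(?=(\d{3})?)(?=(\d{4})?)(?=(\d{5})?)"
)

def digit_runs(s):
    """Collect the 1- to 5-digit runs that start at each digit of s."""
    runs = []
    for m in OVERLAPPING.finditer(s):
        # groups that did not participate in the match come back as None
        runs.extend(g for g in m.groups() if g is not None)
    return runs

print(digit_runs("123"))  # ['1', '12', '123', '2', '23', '3']
```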
This case is easy... Try this
> ^([^-]+-)([^(]+)( [^.]+)(.{4})$
and replace with: $1$3$2$4
It will work for the examples you gave above, so be sure there are no special cases where it won't work before renaming !!
For example, if the 1st group between parentheses contains a dot, the pattern won't work, and it should be modified to something like the following:
> ^([^-]+-)([^(]+)( .+)(.{4})$
I hope it will work directly in TC since I'm too lazy to test it xD
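If you'd rather dry-run it before renaming anything, here is a Python sketch of the first pattern. The file name is invented (I don't have your real examples), reorder is a made-up helper, and Python writes $1$3$2$4 as \1\3\2\4.

```python
import re

# Group 1: everything up to and including the first "-"
# Group 2: the text before the parenthesised part
# Group 3: a space plus the parenthesised part (no dots allowed in it)
# Group 4: the last four characters, e.g. ".mp3"
RENAME = re.compile(r"^([^-]+-)([^(]+)( [^.]+)(.{4})$")

def reorder(filename):
    """Swap groups 2 and 3; names that don't match are returned unchanged."""
    return RENAME.sub(r"\1\3\2\4", filename)

print(reorder("Artist - Title (2010).mp3"))  # Artist - (2010) Title.mp3
```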
Regex makes this simple, and really it's what you'd use to match in a performance sensitive environment anyways.
Something like:
/^\W*gg\W*\n?$/i
should match gg, GG, or any combination of that with spaces afterward and before but nothing else.
For an example, test it out here and try to beat it.
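Here is a quick Python sketch for trying inputs locally; is_gg is just an illustrative name. One thing to keep in mind: \W matches any non-word character, so punctuation around the gg is accepted too, not only spaces.

```python
import re

# Only non-word characters may surround "gg"; the i flag becomes re.IGNORECASE.
GG_ONLY = re.compile(r"^\W*gg\W*\n?$", re.IGNORECASE)

def is_gg(message):
    """True if the message is nothing but gg/GG plus surrounding non-word characters."""
    return GG_ONLY.search(message) is not None

for msg in ["gg", "  GG  ", "gg!", "gg wp", "eggs"]:
    print(repr(msg), is_gg(msg))
```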