EDIT4: tldr: two simple regexp-based solutions, one that finds the words where vowels are correctly ordered:
/^[^eiou]*[^aiou]*[^aeou]*[^aeiu]*[^aeio]*$/
and an equivalent one that finds the words where vowels are not correctly ordered:
/e.*a|i.*[ae]|o.*[aei]|u.*[aeio]/
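A quick way to convince yourself that the two tldr regexes agree is to run both over a few words and check that they split the words the same way. This is a minimal Ruby sketch. It assumes lowercase, single-word strings (so `^`/`$` behave like whole-string anchors), and the word list is made up for illustration:

```ruby
# Words whose vowels appear in a, e, i, o, u order (repeats allowed).
ORDERED = /^[^eiou]*[^aiou]*[^aeou]*[^aeiu]*[^aeio]*$/
# A vowel followed, somewhere later, by a vowel that should have come earlier.
BADLY_ORDERED = /e.*a|i.*[ae]|o.*[aei]|u.*[aeio]/

words = %w[facetious hello banana broke note]  # hypothetical sample words

good = words.grep(ORDERED)
bad  = words.grep(BADLY_ORDERED)

puts "ordered:       #{good.inspect}"
puts "badly ordered: #{bad.inspect}"

# Every word should land in exactly one of the two lists.
partitioned = (good + bad).sort == words.sort
puts "partitioned: #{partitioned}"
```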
ORIGINAL POST:
/e.*a|i.*[ae]|o.*[aei]|u.*[aeio]/
Basically, to check if all the vowels are correctly ordered, we just need to check that there are no badly ordered vowels. You can see in http://rubular.com/r/8TFTbzKfAd that it correctly matches the last three words.
/e.*?a|i.*?[ae]|o.*?[aei]|u.*?[aeio]/
may or may not be quicker, since it uses the non-greedy version of *.
EDIT: to correct your regexp, you need to modify some things:
.* also matches vowels, so .* needs to become [^aeiou]*
a*[^aeiou]* becomes (a*[^aeiou]*)*
^ and $ need to be added, so that the whole word has to match
So your regexp would become for example (http://rubular.com/r/ZG5yxwuSvw):
/^[^aeiou]*(a[^aeiou]*)*(e[^aeiou]*)*(i[^aeiou]*)*(o[^aeiou]*)*(u[^aeiou]*)*$/
You can also see that we actually find the same result as /u/amdpox: after removing /[^aeiou]/ from the string, we only need to check /^a*e*i*o*u*$/
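Here is the consonant-stripping idea next to the corrected version of your regexp, as a small Ruby check that both give the same verdicts. It is a sketch that assumes lowercase input. `vowels_in_order?` is a hypothetical helper name, and the sample words are made up:

```ruby
# Corrected regexp: non-vowels anywhere, vowels only in a-e-i-o-u order.
FIXED = /^[^aeiou]*(a[^aeiou]*)*(e[^aeiou]*)*(i[^aeiou]*)*(o[^aeiou]*)*(u[^aeiou]*)*$/

# Strip every non-vowel, then what is left must look like a*e*i*o*u*.
def vowels_in_order?(word)
  word.gsub(/[^aeiou]/, "").match?(/\Aa*e*i*o*u*\z/)
end

%w[abstemious aardvark hello broke note].each do |w|
  puts "#{w}: stripped=#{vowels_in_order?(w)} fixed=#{w.match?(FIXED)}"
end
```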
EDIT2: In fact, the reason your regexp matches everything is that a word where the vowels are badly ordered is simply the concatenation of multiple subwords where the vowels are well ordered ("Broke" = "Brok" + "e"), and your regexp finds all these subwords. You can verify it by running "Broke".scan(/.a*[^aeiou]*e*[^aeiou]*i*[^aeiou]*o*[^aeiou]*u*[^aeiou]*/)
EDIT3: the first regexp was originally /e(!?.*a)|i(!?.*[ae])|o(!?.*[aei])|u(!?.*[aeio])/
I use Chris Pine's book to teach newbies in my job, all of whom have zero programming experience. It's a great book, and it teaches good habits (not to mention the Ruby way) right from the get-go.
It's a great place to start. Once you've completed everything up to blocks and procs, I'd give this site a shot: https://rubymonk.com/
It's quite thorough, and if you've got a grasp of the basics, you'll get it quickly. Don't be afraid to Google and bookmark all of the Ruby docs! Here are two reference sites that are invaluable:
And if you have any questions, post them over at /r/askruby!
Good luck!
It is passing because the a and the b match. I'm not sure what your exact specs are but this one will satisfy "match on strings which contain only characters from the character class"
\A[abc]{,4}\z
This checks that the whole string, from its start (\A) to its end (\z), consists of at most four characters, each of them a, b or c, with nothing else. You might even be able to simplify it to
\A[abc]+\z
rubular.com is good for playing around with regexes. Hope it helped!
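Here is a small Ruby check of the difference between the bounded `{,4}` version and the `+` version. The sample strings are hypothetical:

```ruby
UP_TO_FOUR = /\A[abc]{,4}\z/  # only a/b/c, at most four of them
ANY_LENGTH = /\A[abc]+\z/     # only a/b/c, one or more of them

%w[ab abca abcab abx].each do |s|
  puts "#{s}: {,4}=#{UP_TO_FOUR.match?(s)} +=#{ANY_LENGTH.match?(s)}"
end
```

Both reject `abx`, because `\A` and `\z` leave no room for the extra `x`. Only the `+` version accepts the five-character `abcab`.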
So you want to match the blocks in the encapsulated area as well?
I've edited the regex and added the underscore as separator: https://regex101.com/r/2hLzaQ/2 You might need to clean the ">" in your matches. It also works in ruby: http://rubular.com/r/is6igkXXsl
I'm not sure how it will turn out with nested replies. Keep me posted :_)
I use Rubular for testing regular expressions; it makes it a lot easier! Anyway, this should work: (\[?<\w+>\]?)
To explain it, everything within parentheses () is a group, and in this case we want to return the full match. Next we optionally match [, and because this character has a special meaning in regular expressions we need to escape it (\[). The character is optional because it is followed by ?. Next we match the < character and one or more alphanumeric characters (\w+), and the rest of the expression is pretty much the same but with the characters reversed. In summary:
() is a group
? optionally matches the preceding character
\w matches alphanumeric characters (plus underscore), and \w+ matches one or more of them
Edit: I just saw /u/Updatebjarni's answer and he's right, you should use .* instead of \w+, as my original regexp would break if the string contained punctuation.
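Here is a quick Ruby check of both versions of the pattern. The sample strings are hypothetical. The last line shows the catch with a greedy `.*`: it can run across several tags. A non-greedy `.*?` or `[^>]*` would be the usual fix, but that is my suggestion, not part of the answer above.

```ruby
TAG       = /(\[?<\w+>\]?)/  # original: optional [ ... ], then <word>
LOOSE_TAG = /(\[?<.*>\]?)/   # edit: .* instead of \w+, so punctuation is allowed

p "[<foo>] and <bar>".scan(TAG).flatten  # ["[<foo>]", "<bar>"]
p "<foo-bar>".scan(TAG).flatten          # [] because "-" is not a word character
p "<foo-bar>".scan(LOOSE_TAG).flatten    # ["<foo-bar>"]
p "<a> and <b>".scan(LOOSE_TAG).flatten  # ["<a> and <b>"]: greedy .* spans both tags
```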
Here's an example in Ruby that will capture the things that you want from the match so that you can do something like this (mind the quotes):
str = '[Unwanted things]<img src="[Unwanted things]http://r.jpg" />[Unwanted things]'
regexp = /.*(<img src=").*(http:\/\/[^"]+").*(\/>).*/
str.sub(regexp, '\1\2\3')  # returns <img src="http://r.jpg"/>
Beware, though, that those .*'s might match more than you expect. Use with discretion depending on how much you trust your input :)
edit: added disclaimer
This didn't seem to work for me on Rubular, and it doesn't really make sense to me that it would... The first capture group can match anything, and so it matches 2 or more of anything...? But without a backreference, how can you know it's the same thing being matched? I do think the following would work: /(.{2,}?)\1/. Note that I'm assuming the substring must be at least 2 characters long, but any number n could be substituted for 2 (literally do s/2/n on the regex).
edit: fixed it I think
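Here is the backreference version tried on a few made-up strings in Ruby. The lazy `{2,}?` makes it report the shortest repeated chunk at the first position where one exists:

```ruby
REPEAT = /(.{2,}?)\1/  # a chunk of 2+ characters immediately followed by itself

%w[xyzabab aaaa abcd].each do |s|
  m = REPEAT.match(s)
  puts m ? "#{s}: repeated #{m[1].inspect}" : "#{s}: no repeat"
end
```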
This will match the whole thing, tested it with http://rubular.com/:
[-a-zA-Z0-9@:%.+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%+.,!~#?&\/=()*]*)
I only had to remove some spaces in the string; if they're really in there, you'll have to adapt the regex for that.
Correct answer to your question is to use regular expressions. Here's an online example so you can play with it.
"(\w+)" seems to parse your input format perfectly.
Here's a C++ example:
https://stackoverflow.com/questions/21667295/how-to-match-multiple-results-using-stdregex
Yes definitely!
Use the match action with the following regex to find the whole string.
Useful_Number: [0-9]{5}
Here's what it matches: http://rubular.com/r/2HcqTjI1vd
Then, once you have pulled this out of the text, you can use a Replace action to keep just the number.
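In plain Ruby the same two steps (match the labelled string, then strip the label) might look like the sketch below. The input text is made up. A capture group such as `text[/Useful_Number: ([0-9]{5})/, 1]` would do it in one step.

```ruby
text = "Order received. Useful_Number: 48213 Thanks!"  # hypothetical input

labelled = text[/Useful_Number: [0-9]{5}/]     # step 1: "Useful_Number: 48213"
number   = labelled.sub("Useful_Number: ", "")  # step 2: "48213"
puts number
```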
That is a regular expression. They are used in many programming languages for pattern matching. Anything between the two / characters is a pattern that will, generally, be searched for and returned.
So if your text is:
The quick brown broth bragged brah.
And your regular expression was /bro[a-z][a-z]/ it would match brown and broth.
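You can check that claim in Ruby with `scan`, which returns every match in the string:

```ruby
text = "The quick brown broth bragged brah."
p text.scan(/bro[a-z][a-z]/)  # ["brown", "broth"]
```

"bragged" and "brah" are skipped because they start with "bra", not "bro".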
I am not sure what \w and \2* match. They seem to match everything to me. You can play around with Ruby regex here: Rubular
The . is used to denote any single character. Check out http://rubular.com/ to get an idea of how all of this works. It's also a good way to come up with new regular expressions: just save the RMT spam that gets through, and occasionally go in there and see what you can come up with to block the new spam.
Regular expressions are sort of like a filter -- basically, it looks for a pattern and lets you know if anything matches. This is the site I use most frequently when I'm trying to figure out regular expressions.
As for solving your problem specifically, here's a quick most-of-the-way solution for you:
/\A\d+\s-\s(.+)\sBWV/
(You might not need the / at the start and end)
In this particular case, we're looking at the beginning of the string ( \A ), finding one or more digits ( \d+ ) and a dash with whitespace on either side ( \s-\s ), then taking all the characters between that and the whitespace before the closing BWV ( (.+) ). The parentheses create what's called a matching (capturing) group, so whenever the RegEx finds a pattern that matches the whole RegEx, you can pull out just the stuff inside the group.
Here's the Rubular set of test cases, which you can play around with.
I don't know much about Name Mangler or how it implements regular expressions, but something like this should be pretty close to functional for you. You might need to ditch the \A at the start, and you might need to lose the opening and closing /
Hope this helps!
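Here is the same pattern in Ruby, showing the whole match versus the captured group. The file name is a made-up example in the "NN - Title BWV ..." shape:

```ruby
name = "01 - Cello Suite No. 1 in G major BWV 1007"  # hypothetical file name

m = name.match(/\A\d+\s-\s(.+)\sBWV/)
puts m[0]  # whole match: 01 - Cello Suite No. 1 in G major BWV
puts m[1]  # just the group: Cello Suite No. 1 in G major

# Shorthand: String#[] with a regex and a group number returns only that group.
title = name[/\A\d+\s-\s(.+)\sBWV/, 1]
```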
It works with the first example, but here is another one that should work in theory: http://rubular.com/r/DTa59pQru4
The second block doesn't match for some reason.
I'm getting these examples from my batch of emails, so I have to replace the actual text with placeholders for obvious reasons.
Awesome! Regexes can be pretty tricky, especially since you need to worry about string escape sequences as well. It can be extremely useful to use a regex tester to help you build your expression.
http://www.regexplanet.com/advanced/java/index.html Is a Java specific option. It's a great tool that gives you info on matching as well as how replacement will work, especially useful is that it gives you the exact Java string literal you need to paste into your code to make it work.
I also like http://rubular.com. It's based on Ruby regular expressions but regex are mostly universal (though the features do differ slightly). I like this one because of the reference guide right on the page below the input box and the live updating of your matches.
There are tons of these sites though so take a look around.
Regex is very different from a normal language like Ruby: instead of using readable keywords, it relies on cryptic one-letter commands. But it's really just a more advanced String#include?.
If it helps, there's a human-readable implementation of regular expressions: https://github.com/andrewberls/regularity
And also a very useful service to test your regexes: http://rubular.com/
I'm pretty sure I have all the regex rules memorized, but every time I need to do a nontrivial regex, I head over to rubular.com to test it out. RegExr is also pretty good.
This should do it: https://workflow.is/workflows/6cce61345515490e8143766e33b1e4e7
Note that . doesn't match newlines. You need to remove the \n characters first.
I personally use http://rubular.com to test.
Hope that helps.
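Here is a small Ruby illustration of the newline point, with a made-up two-line string:

```ruby
text = "first line\nsecond line"

p text.match?(/line.second/)                  # false: . does not match "\n"
p text.match?(/line.second/m)                 # true: in Ruby, /m lets . match "\n"
p text.gsub("\n", " ").match?(/line.second/)  # true once the newline is replaced
```

Other engines usually call the dot-matches-newline option `s` (for example the inline `(?s)`) rather than Ruby's `m`. Whether the workflow's regex action accepts that flag is an assumption worth testing.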
It's like the escaping here on Reddit.
The question is: What is eventually reaching the regular expression parser?
With Ruby (e.g., http://rubular.com), you don't type your RegEx in a standard string, so any escapes that appear inside the RegEx are directly received by the RegEx parser.
But in some environments, you construct a string to send to the parser, so that language's normal string escaping happens first and then the resulting string is sent to the RegEx parser.
I don't know about Atom, because I've not got around to hacking it myself, but it sounds like this is what might be happening here.
Looking at one of the grammar files on my machine, that appears to be exactly what's happening:
'patterns': [
  {
    'captures':
      '1':
        'name': 'punctuation.definition.keyword.tex'
    'match': '(\\\\)(backmatter|...|mainmatter|if(case|...|vmode|void|x)?)\\b'
    'name': 'keyword.control.tex'
  }
See, first this file (CSON, Atom's JSON-like format) is read as strings, so regular string escaping is handled, turning \\b into \b (and \\\\ into \\). Then it is interpreted as a RegEx, so the \b is read as "word boundary" and \\ matches a single literal \.
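The same two-stage escaping shows up in Ruby as soon as you build a regex from a string instead of writing a `/.../` literal. This is a small sketch with made-up strings:

```ruby
# Regex literal: the regex parser sees \b directly.
literal = /\bcat\b/

# Regex built from a string: string escaping runs first,
# so "\\b" in the source is already \b when the regex parser sees it.
from_string = Regexp.new("\\bcat\\b")

# In a double-quoted string "\b" is a backspace character, not a word boundary.
broken = Regexp.new("\bcat\b")

p literal.source == from_string.source  # true
p "a cat here".match?(from_string)      # true
p "a cat here".match?(broken)           # false

# One literal backslash: the regex needs \\, so the string needs "\\\\".
p Regexp.new("\\\\").match?("C:\\temp")  # true
```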
By the way, the "filter results..." input box accepts regular expressions. If you don't speak regex, here are the basics:
separate terms with "|" and group them with "()". ".*" is a wild-card.
Want to see anything with "nexus 5", "galaxy s4" or "galaxy s5"?
nexus 5|galaxy s(4|5)
Anything containing the terms "samsung" followed eventually by "galaxy"
samsung.*galaxy
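If you want to try these two filters outside the app, Ruby's `grep` applies a regex to a list in the same way. The titles below are hypothetical, and the sketch assumes lowercase text:

```ruby
titles = ["nexus 5 charger", "galaxy s4 battery", "galaxy s3 case", "samsung galaxy tab"]

p titles.grep(/nexus 5|galaxy s(4|5)/)  # ["nexus 5 charger", "galaxy s4 battery"]
p titles.grep(/samsung.*galaxy/)        # ["samsung galaxy tab"]
```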
Assuming that when you say "both even" you mean something like 88, and by "both odd" you mean something like 77, my assumption is you could use character classes:
[13579] will catch any odd digit anywhere. [13579]{2} will catch two of 'em in a row (this will also pass as a match on something like 188, so make sure to account for whatever's at the beginning and end of your two-digit numbers).
(You can do a similar thing for even digits)
Your solution so far of \d*[13579] isn't too far off, but it'll match 0 or more digit characters followed by an odd digit, so it'll also flag a match on something like 27.
If you want to grab any two-digit number where both digits are even or both digits are odd, you can use the | (or) operator:
[13579]{2}|[24680]{2}
If you want to play around with this, here's a Rubular link. (There are some slight differences between how regexes get used in Ruby v. how they get used in Java, but none of those differences are likely to matter here.)
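Here is a Ruby sketch comparing the original attempt, the alternation, and an anchored version. The test numbers are made up, and the anchors assume each string should be exactly two digits:

```ruby
numbers = %w[88 77 27 46 188]

attempt  = /\d*[13579]/                       # original try: any string containing an odd digit
loose    = /[13579]{2}|[24680]{2}/            # two odd or two even digits, anywhere
anchored = /\A(?:[13579]{2}|[24680]{2})\z/    # the whole string must be exactly that

p numbers.grep(attempt)   # ["77", "27", "188"]
p numbers.grep(loose)     # ["88", "77", "46", "188"]  (188 contains "88")
p numbers.grep(anchored)  # ["88", "77", "46"]
```

In Java, `String.matches()` already requires the whole string to match, so the explicit anchors mostly matter for find-style searches.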
Not really: ^ and $ are for the start and end of any line in the string (http://rubular.com/r/cIk0slvsx1), while \A and \z are for the start and end of the whole string (http://rubular.com/r/lT4Nz0iyWq).
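Here is the same distinction in a few lines of Ruby, using a made-up two-line string. `=~` returns the match index or `nil`:

```ruby
text = "foo\nbar"

p text =~ /^bar$/    # 4   (^ and $ match at line boundaries)
p text =~ /\Abar\z/  # nil (\A and \z only match at the ends of the whole string)
p text =~ /foo$/     # 0
p text =~ /foo\z/    # nil
```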
To clarify why your tries aren’t working:
\w+\z
translates to “one or more word characters (e.g. letters, but not whitespace) in a row, followed by the end of the string.” So that’d match a string like “foo bar”, but not “foo bar ”. I suppose you could fix that by doing (\w+)\s+\z. That would translate to “one or more word characters, followed by one or more spaces, followed by the end of the string, and only capture the word characters.”
(\w+{2})\w+
I’m not 100% sure that’s valid regex. You’ve got a pattern (\w). You modify it by adding + after it, which means “one or more of those.” You then modify that by adding {2} afterwards, which means “exactly two of those.” So x+ matches x or xxxx, and x{2} matches xx. I’m not sure exactly what x+{2} does, but I suspect it’s not what you want.
\w+(?!\w)
translates to “one or more word characters, followed by something that’s not a word character.” So the first thing it finds of that description is “word1” (the space after it satisfies the lookahead but isn’t part of the match), so it stops there.
By the way, if you want to play around with a regex, http://rubular.com/ is a great tool for that.
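Here is a Ruby check of the patterns discussed above, using made-up strings. The fourth line shows why the group should wrap `\w+` rather than `\w`: a repeated group only keeps its last repetition.

```ruby
p "foo bar"[/\w+\z/]           # "bar"
p "foo bar "[/\w+\z/]          # nil: the trailing space sits before the end
p "foo bar "[/(\w+)\s+\z/, 1]  # "bar"
p "foo bar "[/(\w)+\s+\z/, 1]  # "r": only the last repetition of (\w) is kept
p "word1 word2"[/\w+(?!\w)/]   # "word1"
```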
already a few great responses so the only thing i'd add is that they look scary at first glance, but once you get the hang of it, they can be very helpful (and begin to look less scary..)
i found it really helpful to play around with a regex tool for w/e language you write in (e.g., rubular for ruby) -- it's helpful/fun to put in a few test cases and try to match certain parts of them
If you're using Ruby, you can check out Rubular, which lets you hack together a regex and see captures on a test string in real time.
Whenever I have to write one, I go to Rubular first and do some testing.
http://rubular.com/
http://txt2re.com/
http://gskinner.com/RegExr/
In order of how much I use each. Rubular is good for fine tuning regex, but the AJAX can be a little frustrating sometimes. txt2re is good for quickly generating regex, but can be a little greedy. RegExr is another tool like Rubular, I don't use it that much.
I really like using Rubular to check if my regexes are right. Why is it necessary to do this all in one regex? Also, to check, rather simply, if it's alphanumeric and at least 8 chars, this works: \A[0-9a-zA-Z]{8,}\z (the anchors make the whole string count, not just part of it). To check if it contains at least 2 numbers:
[A-Za-z]*[0-9]+[A-Za-z]*[0-9]+[a-zA-Z]*
The problem you're going to have is that regular expressions are regular grammars - they are essentially finite state machines. I don't think it's possible to do this in one regular expression... Also, strlen.
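If the check doesn't have to be a single regex, it splits up easily. This is a Ruby sketch that assumes the rule is "only ASCII letters and digits, at least 8 long, at least 2 digits". That is my reading of the question, and `valid_password?` is a hypothetical name.

```ruby
def valid_password?(s)
  # Only letters and digits, 8 or more of them, across the whole string.
  return false unless s.match?(/\A[0-9a-zA-Z]{8,}\z/)
  # At least two digits anywhere.
  s.scan(/[0-9]/).size >= 2
end

%w[abc12345 abcdefg1 ab12 abc_1234].each do |s|
  puts "#{s}: #{valid_password?(s)}"
end
```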
I played around with the regex; probably better not to try it. Given what he's told us, the only thing that really differentiates the string we want is that it contains letters and numbers, but no ":". Assuming that it has at least one letter, one number, and no colons:
(\d+[A-Za-z]\w+(?!=:)|[A-Za-z]+\d\w+(?!=:))+
Is ugly, but it matches!
Keep in mind that re has shorthand character classes. Your [A-Za-z0-9] is almost exactly the same as \w (the shorthand also matches '_', and for str patterns in Python 3 it matches Unicode letters and digits too). Regex is ugly enough without skipping the shortcuts. As an example, check out what I came up with.
docs: https://docs.python.org/3/library/re.html#regular-expression-syntax
Your problem was in not specifying what you really wanted and what steps you tried to accomplish it. You would have got a better answer if you went ahead and implemented your lastIndexOf approach (this could work btw, splitting on \n and then using last instance of it to start a substring was a valid approach). Regex is a more powerful version of it (since it can match whole patterns, not just individual characters). Like so:
http://rubular.com/r/g3LcyZbv7E
It wouldn't be necessary in your case, but it could be useful if you came across a pattern more complex than just "\n". The best answer would probably be the String split method (which accepts a regex as its argument), since it's a one-liner and does exactly what you asked for:
https://stackoverflow.com/questions/3481828/how-to-split-a-string-in-java
No, yours is more specific than mine. It all depends on the input, how messy it is, and what you want to check. If you want to check your regex:
Great and simple site.
It's for another language (Ruby) but http://rubular.com works well for testing regular expressions.
For me, I like to use the terminal and the re library to test things.
Something like this to find if any of the strings match the pattern:
import re

with open("tests.txt") as f:
    sentences = f.readlines()

pattern = r'[a-zA-Z]'
for sentence in sentences:
    # Print each sentence that contains a match somewhere.
    if re.search(pattern, sentence):
        print(sentence)
Just a sidenote, you can use curly brackets to customize this behaviour as well.
a{1,} = one or more of a
a{2,} = two or more of a
a{,3} = up to 3 of a
a{3} = exactly 3 of a
a{3,5} = 3 to 5 of a
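Here is a quick Ruby check of those counts. Each pattern is anchored with `\A` and `\z` so the whole string has to fit the count:

```ruby
p "aaa".match?(/\Aa{1,}\z/)     # true  (same as a+)
p "a".match?(/\Aa{2,}\z/)       # false (needs two or more)
p "aa".match?(/\Aa{,3}\z/)      # true
p "aaaa".match?(/\Aa{,3}\z/)    # false (more than three)
p "aaa".match?(/\Aa{3}\z/)      # true
p "aaaaaa".match?(/\Aa{3,5}\z/) # false (six is more than five)
```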
Rubular is pretty good for trying out regexes in real time. It's for Ruby, but the regex should be basically the same, I think.
Edit: Found a Python one, I haven't tried it though
Not PowerShell specific, but I'm a big fan of Rubular (http://rubular.com/). It lets you interactively build and test your regex. Once you have it working there, you can essentially do:
$emailRegex = "......."
$testStr -match $emailRegex
$matches
This is probably all you need: https://workflow.is/workflows/2f89abc0a57640d6af5b609d541baabd
I use this site to try them out before pasting into workflow as it allows me to see the results in realtime. http://rubular.com
Regular expressions are magical to many simply because their underlying mechanisms of operation are seldom explained. If you want a really deep dive into how regexen are implemented, check out automata theory. It's scary-sounding and abstract, but classical regular expressions are equivalent to finite state automata, and regex libraries build on that idea (plus extras such as backreferences).
If all you need is a thorough treatment of regular expression features, there are plenty of books on the subject. Mastering Regular Expressions (O'Reilly) by Jeffrey Friedl is perhaps the best book I've ever read on regex. Beyond the comprehensive and competent treatment of almost every regular expression flavor out there, the book takes some time to explain the basis of regular languages and finite state machines in general.
As others have suggested, Rubular is a great playground for Ruby regexen, and I keep a bookmark of that site handy.
"\w" is a matcher for word characters. In most flavors of regex, that means [a-zA-Z0-9_], which is most of the characters that make up words. Hence, it's a shorthand for those characters. "\p{Alnum}" is a different matcher. The "\p" means "match characters with the following Unicode character property:", followed by that property in curly braces "{property}". "Alnum" means alphanumerics, or [a-zA-Z0-9]. So your regex:
/[^\p{Alnum}\p{Space}]/
is matching anything that IS NOT (denoted by the caret, ^) an alphanumeric character or whitespace of any kind. "\p{Space}" matches any whitespace character: space, tab, newline, etc.
I didn't really test it, but when I saw this:
/[!@#$%^&*()_+-={}|:"<>?',.]/
I thought, "you're trying to get rid of anything that IS NOT a word character." That's quite a list, and it's easy to miss some. Easier to find what you want to keep, then gsub the inverse of what you found, using the caret(^), which is what your new regex is doing. For your purposes, "\p{Alnum}\p{Space}" might be better/more accurate than just "\w". If so, killer.
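Here is a small Ruby comparison of the "keep what you want, drop the rest" approach with a plain `\W`, using a made-up string:

```ruby
s = "Hello, world! 123"

p s.gsub(/[^\p{Alnum}\p{Space}]/, "")  # "Hello world 123"
p s.gsub(/\W/, "")                     # "Helloworld123": \W strips the spaces too
```

The second line is why `\p{Space}` is in the class: `\w` has no notion of whitespace, so negating it throws the spaces away as well.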
EDIT:
Two other things. One, make this website your new best friend: http://rubular.com
Two, I found this regex reference at one point and I use it CONSTANTLY: http://turdb.in/dump/perl_regex_card.pdf
Ruby more or less uses Perl regex syntax and this is the best quick-reference I've seen.
EDIT TWO: The forward slashes in a regex are the boundaries. /regex/ will match the string "regex" wherever it appears in the input. Backslashes are escape characters. Example: normally the period, or dot ".", will match any character. If you want to match a literal period, you have to escape it with a backslash, e.g. "\.". Other special characters and classes are denoted with backslashes as well. "\w" is one of them, the word character. "\p" is another. "\d" matches any digit. See the reference I linked for more.
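Here are the slash and dot points as a few lines of Ruby, with made-up strings:

```ruby
p "my regex notes" =~ /regex/  # 3: matches anywhere, not only the whole string
p "1x2".match?(/1.2/)          # true: . is any character
p "1x2".match?(/1\.2/)         # false: \. is only a literal dot
p "1.2".match?(/1\.2/)         # true
```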
The where().ToList() bit is LINQ (Language integrated query) which basically allows you to do a bunch of querying against objects.
The Where function takes a function which given one item from the collection you are querying against, returns either true or false to decide whether the item meets your criteria.
for example
int[] numbers = {1,2,3,4,5};
var greaterThanThree = numbers.Where(num => num > 3).ToList();
// greaterThanThree will now contain 4,5
In this case I gave the function using lambda syntax (x => ...), a shorthand for anonymous functions. My anonymous function is run against each item in the array (a string, which is placed in the variable x) and tests it against a regex.
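The regex says: a >, optionally followed by whitespace, and then the end of the string.
So Regex.IsMatch("> ", @">(\s+)?$") == true and Regex.IsMatch("> hello", @">(\s+)?$") == false, because hello is not the expected end of the string. (The @ makes these verbatim strings, so C# doesn't reject \s as an unrecognized escape sequence.)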
I ask to only get the lines where the string does not match the regex.
To mess around with regex have a look at the following, it is a super handy tool http://rubular.com/r/mYVYVHAdS1
Is that a good solution? If it works I would say it is a good solution, there are lots of ways to do all these things, my way of doing it would probably use the regex in the link I put above and match groups to get the data I require.
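For comparison, here is the same "keep only the lines that don't match" filter written in Ruby. This is an analog of the LINQ version, not the C# code itself, and the input lines are hypothetical:

```ruby
lines = ["> ", "> hello", ">", "plain text"]  # hypothetical input

# reject plays the role of Where(x => !Regex.IsMatch(x, ...))
kept = lines.reject { |line| line.match?(/>(\s+)?$/) }
p kept  # ["> hello", "plain text"]
```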
Ruby - http://rubular.com/ JS - http://scriptular.com/ Python - https://pythex.org/ (cert is out of date)
To start to get the hang of RegEx, just do some simple filtering on these sites. For me, it started to click after a bit of fooling around. Previously RegEx would send me running for the hills!
I hope these links were helpful!
((\$|USD).*(Gil|gil))|((Gil|gil).*(\$|USD))
I mainly rely on just this line, which blocks any mention of $ or USD followed by, or following, any mention of gil. A few will still get past because some use weird formatting, but for those I usually just add the name of their group/site on a different line.
If you want to experiment with it, save any gil-seller tells you get in a Notepad file, and after you get a few, paste them all into http://rubular.com/ and see what you can come up with.
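If you'd rather test saved tells in a script than on Rubular, a Ruby sketch might look like this. The tells are invented examples, and single quotes keep Ruby from treating `$5` specially:

```ruby
GIL_SPAM = /((\$|USD).*(Gil|gil))|((Gil|gil).*(\$|USD))/

tells = [  # hypothetical saved tells
  'Cheap gil 10M for $5',
  '5 USD = 1M Gil, fast delivery',
  'anyone selling a gil bag?',
  '$5 says nobody can solo it',
]

p tells.grep(GIL_SPAM)  # the first two: each pairs money with gil
```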
I've used rubular a lot because it has a cheat sheet at the bottom, and you can do live tests with it. I can imagine that using both in tandem would be quite nice. You can build an expression and then get it more deeply explained.
It passed for me with this exact code:
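def luck_check(str)
  # Only digits allowed; anything else (including whitespace) raises.
  raise unless str.match(/\A\d+\z/)
  half = str.size / 2
  digits = str.chars.map(&:to_i)
  # Compare the sum of the first half with the sum of the second half.
  digits.first(half).reduce(:+) == digits.last(half).reduce(:+)
end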
Any whitespace in the string that's passed in will raise an exception, because it won't match the regex.
Try it out on Rubular and see for yourself.