I checked how the word "Europe" translates into all 24 official EU languages:
# | Language | "Europe" |
---|---|---|
1 | Bulgarian | Европа |
2 | Croatian | Europa |
3 | Czech | Evropa |
4 | Danish | Europa |
5 | Dutch | Europa |
6 | English | Europe |
7 | Estonian | Euroopa |
8 | Finnish | Eurooppa |
9 | French | Europe |
10 | German | Europa |
11 | Greek | Ευρώπη |
12 | Hungarian | Európa |
13 | Irish | An Eoraip |
14 | Italian | Europa |
15 | Latvian | Eiropa |
16 | Lithuanian | Europa |
17 | Maltese | Ewropa |
18 | Polish | Europa |
19 | Portuguese | Europa |
20 | Romanian | Europa |
21 | Slovak | Európa |
22 | Slovenian | Evropa |
23 | Spanish | Europa |
24 | Swedish | Europa |
If we group these by name:
# | "Europe" | Language(s) |
---|---|---|
1 | An Eoraip | Irish |
2 | Eiropa | Latvian |
3 | Euroopa | Estonian |
4 | Eurooppa | Finnish |
5 | Europa | Croatian, Danish, Dutch, German, Italian, Lithuanian, Polish, Portuguese, Romanian, Spanish, Swedish |
6 | Europe | English, French |
7 | Európa | Hungarian, Slovak |
8 | Evropa | Czech, Slovenian |
9 | Ewropa | Maltese |
10 | Ευρώπη | Greek |
11 | Европа | Bulgarian |
I can write a regular expression matching all of these words:
(E[iuvw]r[oó]+p+[ae]|An Eoraip|Ευρώπη|Европа)
Would that make a better flair?
I decided to include An Eoraip, Ευρώπη, and Европа as separate alternatives, since forcing them into the first sub-expression would be pointless and would dilute it.
You can see it work on this site by pasting the table of names into the bottom box and the expression I wrote into the top one. The names of Europe will be highlighted.
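For anyone who would rather check it offline, here is a minimal Python sketch of the same test. The pattern is the one above; the helper name `is_europe` and the choice of `fullmatch` (so that only whole words count) are my own assumptions.

```python
import re

# The pattern from the post, unchanged.
EUROPE = re.compile(r"(E[iuvw]r[oó]+p+[ae]|An Eoraip|Ευρώπη|Европа)")

# The eleven distinct spellings from the grouped table.
NAMES = [
    "An Eoraip", "Eiropa", "Euroopa", "Eurooppa", "Europa", "Europe",
    "Európa", "Evropa", "Ewropa", "Ευρώπη", "Европа",
]

def is_europe(word: str) -> bool:
    """Return True only if the whole word is one of the accepted spellings."""
    return EUROPE.fullmatch(word) is not None

matched = [name for name in NAMES if is_europe(name)]
print(len(matched), "of", len(NAMES), "names match")
print(is_europe("Eurasia"))   # a near miss that should be rejected
```

Note that the `+` quantifiers also let through made-up spellings such as "Eirooppe", which is the price of keeping the first sub-expression compact.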
As a few people have already pointed out, this is standard regular expression (or regex) syntax, which is supported by the champ select search. To experiment more and discover all the wonderful possibilities of regex (it's really cool) check out these two sites.
http://regexlib.com/CheatSheet.aspx?AspxAutoDetectCookieSupport=1 is a regex cheat sheet, with all the various special characters and the basics of how to use them.
http://regexpal.com/ is a wonderful tool that allows you to type in a regular expression (in your example this would be b.........) and test data (for our purposes that would be all the names of the various champions, each on their own line). Then it shows you all the matches, allowing you to experiment without having to actually be in champ select.
Enjoy!
Edit: For anyone wanting to try this out, I will add the list of champion names for you to copy paste in a reply to this.
(E[iuvw]r[oó]+p+[ae]|An Eoraip|Ευρώπη|Европа)
I used http://regexpal.com/ to test it. Can you make it shorter? (Hopefully without diluting it, it should also not match words that don't mean Europe.)
These are the names of Europe in the 24 official languages of the EU:
# | "Europe" | Language(s) |
---|---|---|
1 | An Eoraip | Irish |
2 | Eiropa | Latvian |
3 | Euroopa | Estonian |
4 | Eurooppa | Finnish |
5 | Europa | Croatian, Danish, Dutch, German, Italian, Lithuanian, Polish, Portuguese, Romanian, Spanish, Swedish |
6 | Europe | English, French |
7 | Európa | Hungarian, Slovak |
8 | Evropa | Czech, Slovenian |
9 | Ewropa | Maltese |
10 | Ευρώπη | Greek |
11 | Европа | Bulgarian |
They're basically strings of characters that identify patterns in other strings. Using them you can easily parse large blocks of text data or determine if some piece of data is in the proper format for whatever application you're using.
As an example the regex (which I just looked up on stackoverflow here) :
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$
Will match most email addresses and will reject any string that is not in a valid format for an email address. Using this you can check whether or not the data that a user gave you is correct or not (and if not, give them an appropriate error).
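A rough Python sketch of that kind of check. It assumes the pattern is meant to have its dots escaped and its optional groups starred (the usual form of this regex); the function name `looks_like_email` and the sample addresses are made up for illustration.

```python
import re

# Assumed intended form: literal dots escaped, optional groups starred.
# The pattern is lowercase-only, so "Jane@Example.com" would be rejected.
EMAIL = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
)

def looks_like_email(text: str) -> bool:
    """Return True if text fits the pattern above."""
    return EMAIL.match(text) is not None

for candidate in ["jane.doe@example.com", "jane@mail.example.org", "not-an-email", "a@b"]:
    print(candidate, looks_like_email(candidate))
```

A pattern like this only checks the shape of the string. It says nothing about whether the mailbox exists, and it will reject some valid addresses (for example, top-level domains longer than four letters).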
PS go here to practice forming regexes and this will test any regex against any string, highlighting all matches.
You should be using .match(), not .search(), because you want to test if the entire string matches a regex, not test if a regex is in a string.
Use this tool: http://regexpal.com/
Use the ^ and the $ characters first, to match the beginning and end of the string. Then, use the +, *, and {} operators to match things a number of times.
So, let's start with:
^.*$
This matches the beginning of the string, then any character (.) zero or more times (*), then the end of the string.
Let's add the @ character:
^.+@.+$
Now, the regular expression matches the beginning of the string, then any character one or more times (+), then the @ character, then any character one or more times. So, there has to be something before and after the @ symbol.
Ok, now we get into the square brackets.
^[a-zA-Z0-9]+@.+$
Beginning, then match any lowercase, any uppercase, or any digit character one or more times, then the @ symbol, then anything one or more times, then the end.
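If you want to watch the steps tighten one by one, something like this Python sketch works. The three patterns are the ones described in the prose, written out with the @ in place; the sample strings and the helper `passing_steps` are just for illustration.

```python
import re

# Each step tightens the previous pattern.
STEPS = [
    r"^.*$",               # anything, including the empty string
    r"^.+@.+$",            # something, an @, then something
    r"^[a-zA-Z0-9]+@.+$",  # letters or digits, an @, then something
]

def passing_steps(text: str) -> list:
    """Report, for each step, whether the text matches that pattern."""
    return [re.search(pattern, text) is not None for pattern in STEPS]

for sample in ["@", "a.b@c", "ab1@c"]:
    print(repr(sample), passing_steps(sample))
```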
What's wrong with [com|edu|org]? This would match a single character in the sets com, or edu, or org. Basically, it's the same as saying [comeduorg] (plus the | character itself). Remember that the square brackets only match one character unless you add the +, *, or {} operators.
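The difference between the square brackets and a parenthesized group is easy to see in a quick Python sketch. The anchors and the helper name `matches` are my own additions for the demo.

```python
import re

CLASS = re.compile(r"^[com|edu|org]$")  # exactly one character out of c, o, m, |, e, d, u, r, g
GROUP = re.compile(r"^(com|edu|org)$")  # exactly one of the three whole words

def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the anchored pattern matches the text."""
    return pattern.search(text) is not None

for sample in ["c", "|", "com", "edu"]:
    print(repr(sample), "class:", matches(CLASS, sample), "group:", matches(GROUP, sample))
```

So parentheses with | give alternation between whole strings, while brackets only ever pick one character.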
Keep at it! You're almost there. :D
a) literally google "tutorial on regular expressions". (http://en.wikipedia.org/wiki/Regular_expression (they are actually called 'regular expressions'... Just like R, but much easier to search for))
b) in R, you can get help on it, but it is more refresher than tutorial. ?gsub, ?regex
c) practice on a site like: http://regexpal.com/ which lets you see what you are matching live.
Regular expressions will make working with text much much easier.
In this case, it would be newstring=gsub(" ", "", oldstring) to replace all spaces in oldstring with nothing. Replacing with %20 (the URL way of encoding a space) might be more helpful to read.csv, but as cruyf8 notes, RCurl is the package for reading things off the internet.
> [pdf|zip]
That is a character class. You'll only match one of those, not the string. Change to parentheses and you should be okay. For future reference, regexpal will be handy. It helped me spot the error rather quickly.
> (//)+
Do you really want to match any number of these? Is that a legal URL format?
> [\w\d:#@%/;$()~_?+-=\.&]
Might be easier to use exclusions in this case. Actually, .*? for non-greedy matching may be sufficient, if your engine supports it.
Regex is OK once you get the basics - which is all you need for Django.
I struggled for a while, but found this YouTube vid really helpful: https://www.youtube.com/watch?v=kWyoYtvJpe4
and http://regexpal.com/ the Quick Reference is very handy.
[\w-]+ .... [\w-] means any word character (a-z, A-Z, 0-9, or underscore) or -. The + then means one or more of them. So [\w-]+ actually means one or more letters, digits, underscores, or hyphens, with no other punctuation.
In your example this allows the slug format of /category/this-is-a-slug/
\d+ means 1 or more digits (0-9).
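To try these outside Django, here is a small Python sketch. The URL shapes (`/<category>/<slug>/` and `/post/<id>/`) and the group names are hypothetical, picked only to exercise [\w-]+ and \d+.

```python
import re

# Hypothetical URL shapes, for illustration only.
SLUG_URL = re.compile(r"^/(?P<category>[\w-]+)/(?P<slug>[\w-]+)/$")
ID_URL = re.compile(r"^/post/(?P<id>\d+)/$")

slug_match = SLUG_URL.match("/category/this-is-a-slug/")
print(slug_match.groupdict())

# A space is neither a word character nor a hyphen, so this one fails.
print(SLUG_URL.match("/category/not a slug/"))

id_match = ID_URL.match("/post/42/")
print(id_match.group("id"))
```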
'\w+' is a standard RE pattern which means "match any word but not the punctuation", so the command will return all the words:
>>> import re
>>> re.findall(r'\w+', "all, the, words")
['all', 'the', 'words']
The best way to learn about regular expressions is (1) google what you want to find, like "regex all numbers", and (2) try the regular expression in an interactive console, like Regexpal.
Here ya go!
[^/]*(?=\.[^.]*$)
I'll break it down for you too, so you understand what's going on.
The first chunk, [^/]*, matches all characters in the string that aren't a forward slash.
(?=) is what's called a positive lookahead non-capturing group. The phrase inside, \.[^\.]*$, looks for a period, then an indefinite number of non-periods, and finally $, which designates the end of the string.
What that expression matches is any number of characters between the last period and the end of the string. However, that is a non-capturing group... which means the characters matched by that expression are not included in the results returned.
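Here is roughly how that plays out in Python, using the pattern as broken down above. The example URLs and the helper name `file_stem` are made up for the demo.

```python
import re
from typing import Optional

# Non-slash characters, followed (but not consumed) by the last ".ext".
STEM = re.compile(r"[^/]*(?=\.[^.]*$)")

def file_stem(url: str) -> Optional[str]:
    """Return the text just before the final dot, or None if nothing matches."""
    found = STEM.search(url)
    return found.group(0) if found else None

print(file_stem("http://example.com/images/photo.jpg"))
print(file_stem("http://example.com/archive.tar.gz"))
print(file_stem("no-extension"))
```

One caveat: the [^.]* inside the lookahead is allowed to cross slashes, so for a URL whose last dot sits in the host name, such as "http://example.com/file", the match is "example" and not anything from the path.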
PS: Pro tip - regexpal is the best site ever for testing regex shit. You can paste your URL and my regex into that site and easily see it at work.
try this on for size /http.+?(jpg|gif)/
you can test it out here. http://regexpal.com/
This is a pretty good website for regular expressions
Javascript
function typoglycemia(s){
  var words = s.split(/[^a-zA-Z]+/);
  var punctuation = s.split(/[a-zA-Z]+/);
  punctuation.shift();
  words.pop();
  var punctuationIndex = 0;
  var string = "";
  for(var i = 0; i < words.length; ++i){
    if(words[i].length <= 2){
      string += words[i] + punctuation[punctuationIndex];
      ++punctuationIndex;
      continue;
    }
    var chars = words[i].split("");
    var newWord = [];
    newWord[0] = chars[0];
    newWord[chars.length-1] = chars[chars.length-1];
    var possibleChars = [];
    for(var p = 1; p < chars.length-1; ++p){
      possibleChars.push(chars[p]);
      newWord[p] = -1;
    }
    for(var p = 1; p < newWord.length-1; ++p){
      newWord[p] = possibleChars.splice(Math.floor(Math.random()*possibleChars.length), 1)[0];
    }
    string += newWord.join("") + punctuation[punctuationIndex];
    ++punctuationIndex;
  }
  return string;
}

var text = "According to a research team at Cambridge University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Such a condition is appropriately called Typoglycemia.";
console.log(typoglycemia(text));
Correctly handles punctuation. This ended up being a lot messier than anticipated. Also, apparently regexpal and node have subtly different interpretations of matching at the start and end of the text.
No problem, I always enjoy a regex challenge :). While there are differing versions of regex, they do 'tend' to use very similar syntax. I use RegExPal a lot at work when testing new expressions
The part in between the slashes is a regular expression. So it's testing that the value of the "name" input is not empty and that it looks sort of like an e-mail address (start of string, then one or more characters, then an "@" sign, then some more characters, then one more character (I think that the "." right before the \w should be "\." to make it an actual dot), then 2 to 4 alphanumeric characters, then the end of the string). I like to use http://regexpal.com/ to explore these.
notepad++ with some masterful regex should get the job done. Here's a good place to test your code http://regexpal.com/
Something like:
"(.*?)"
This regex is really simple though, and might break on some characters. Back up your files beforehand and test. If that doesn't work fully, try some others like the ones found here or here.
edit: This RegEx found on SO seems to be really thorough and works on my end.
(["'])((?:(?!\1)[^\]|(?:\\)|\.)?)\1
Find with that, replace with "Todd."
Hmm... I love cauliflower, too. :p
Try http://www.regular-expressions.info/
I've never used their tutorials specifically, but in general their info is always spot-on and helpful.
If you can afford it, I'd also strongly recommend getting RegexBuddy. If you don't want to drop the money on a license for that, there's also the free http://regexpal.com/, although that's JavaScript-based so it's got some limitations (notably JS lacks lookbehinds), but it should still be plenty good for learning the basic regex stuff that will serve you 99% of the time.
The reason RegexBuddy and RegexPal are helpful for learning is you can see in real-time what your regex is matching and what effect changing it has on how it matches.
Ok. So I'm worried I'm missing something critical. My (seriously lacking) regex experience tells me that we can make only 8 character length strings. And these can be any combination of a,h,d,|, and n through t (inclusive).
I just blindly copy/pasted to regexpal to verify.
So, the reason I feel that I must be missing something helpful, is it would seem one would have to hash <= ~~10^8 or 100 million~~ 214,358,881 combinations?
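The corrected count is easy to double-check with a couple of lines of Python, assuming the reading above is right (eight positions, each one of the eleven symbols a, h, d, | and n through t):

```python
# 4 listed characters plus n through t, as read from the pattern above.
ALPHABET = "ahd|" + "nopqrst"
LENGTH = 8

total = len(ALPHABET) ** LENGTH
print(len(ALPHABET), "symbols,", total, "candidate strings")
```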
I guess I want to know: is this intended to be a programming puzzle or something someone clever can pull off with just a notepad and terminal?
Pretty cool. Seems to check out at http://regexpal.com/, as far as the match is concerned. http://txt2re.com/ is another good resource.
I was going to suggest a \d at the end, but I see now it doesn't matter since that info isn't being used...
What kind of example are you looking for? If it's the code for finding the hashtags, I'm afraid you'll have to figure that one out. ;)
Head over to regexpal and test out the regex and this example:
#we all #live #in #a #yellow #submersible
One of the best tools for debugging is more print statements. Try printing out the results of every call you make so you can see what the program is doing at each step.
Cheers!
This isn't exactly clear. A specific four letter word or a generic four letter word? Or four characters separated by spaces? Or separated by a comma?
In the simplistic case, you could do " [a-zA-Z]{4} " -- assuming the word is separated by spaces. That would match any combination of four upper and lower case letters surrounded by spaces.
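A quick Python sketch of that simplistic pattern, next to a word-boundary variant (\b) that I am adding as an alternative, not something from the question. The sample sentence is made up.

```python
import re

text = " this is some text here "

# The space-delimited pattern: each match consumes its surrounding spaces,
# so a four-letter word right after another match can be skipped.
spaced = re.findall(r" [a-zA-Z]{4} ", text)

# Word boundaries are zero-width, so neighbouring words do not compete for spaces.
bounded = re.findall(r"\b[a-zA-Z]{4}\b", text)

print(spaced)
print(bounded)
```

The spaced version misses "text" here because the space before it was already used up by the match for "some".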
The dots you have match any character, not just letters. It will also match numbers, punctuation, spaces -- anything but EOL & EOF basically.
This site:
Will help you figure out your regex.
Good luck on your homework.
It is only possible to capture users going backwards in the funnel when using the Goal Flow report. In the Funnel Visualization you would see the backward move as an exit.
You have probably used the right regex, but it is always good to test it against some URLs you are not sure will trigger it. I usually test every regex here: http://regexpal.com/
Can I ask what language you're using and how you're setting it up?
The python documentation on the re module is quite good. You might want to also check additional info or so called online regex debuggers like this one
Depending on how sophisticated your parsing is supposed to be you might also want to look into parser generators.
oh christ... My life has been nothing but regular expressions for the past 5 months. I learned most everything off regex pal http://regexpal.com/ just wait till you start using tools that use different regex syntax!! LOVELY then you'll want to hang yourself like me!
in all seriousness regex isn't that bad, it's actually kinda fun once you get the hang of it.... and I just figured out GROK filtering with logstash which is really cool actually.. I still hate all logs though. but they are better than bad, they're good!
There are programmers who will never use regexes because they never encounter them (rare, but possible). Yet every programmer who wants to succeed in what he/she is doing should really understand regexes. This tool, http://regexpal.com/, is really good, but if you would like to understand it at a university level, I suggest going to Coursera and doing the automata course :)
For your reference, this is the regular expression I placed in the content filter. I suck at regex, so it took me a while to come up with a rule that didn't throw false positives.
Use a regex tester to verify the rule acts as you expect, such as http://regexpal.com/
Rule: .rocks>?$
honestly - i code a bit and do my fair share of scraping.
and it is TOUGH for me.
i use: http://regexpal.com/ to test out ANYTHING first, and additionally to help make sense of things for myself.
i love that site.
edit to add: paste your data in the top box. then MANUALLY type out your regex. see what gets highlighted. you should have some "Huh! Look at that" moments along the way.
Essentially, you'll want to capture the stuff between <form> and </form>.
Ok, but the <form> tag can have attributes. So let's start writing some regex:
<form>(.*)<\/form>
This will capture the text between two form tags, but only when the form tag has no attributes. Normally form tags do have attributes, so I suggest you start from there and keep going. This site can be good to test on.
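For reference, a Python sketch of the starting pattern and one possible way to extend it for attributes. The extended pattern, the DOTALL flag (so that . also crosses line breaks), and the sample HTML are my assumptions, not the only way to do it.

```python
import re

plain_html = '<p>intro</p><form>\n<input name="q">\n</form><p>outro</p>'
attr_html = '<form action="/search" method="get"><input name="q"></form>'

# The starting pattern: only matches a bare <form> tag.
BASIC = re.compile(r"<form>(.*)</form>", re.DOTALL)

# One possible extension: allow anything except ">" after "form",
# and match lazily so the capture stops at the first closing tag.
WITH_ATTRS = re.compile(r"<form\b[^>]*>(.*?)</form>", re.DOTALL)

print(repr(BASIC.search(plain_html).group(1)))
print(BASIC.search(attr_html))
print(repr(WITH_ATTRS.search(attr_html).group(1)))
```

For anything beyond simple, well-behaved markup, an HTML parser is usually the safer tool.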
Yes, that's it.
I always wonder if I'm being an asshole by not giving the answer right away, but it looks like you got it. Only one more thing: For future reference, try stuff out here. Even if you know regex reasonably well, you're going to want to try stuff out to make sure it works the way you think it does.
I programmed for years and never once understood regex... until one day I got involved in MUDs. Streams-of-text-based games that lend themselves to situations where regex is needed to interact more fluidly with the game world. I can't recommend them enough for organic regex problem solving, not to mention they are a lot of fun.
Nobody in the history of ever has read up on regex, used it, then known it. You need to repeatedly attack problems with it over a long period of time before anything is going to sink in.
As for resources, I use:
regex info for both an in depth tutorial and comprehensive reference, and
regexpal for testing
Another possibility is non greedy matching.
For example, to match everything between quotes, you'd use: ".*?".
As you probably know, ".*" matches anything between these special characters. However, this matching is greedy. If your text is "hello" "world", then you'd match the entire string instead of two separate strings. The solution is *?, which is a non-greedy match. Non-greedy means "match as little as possible".
Check it out: ".*" vs ".*?"
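The same comparison as a small Python sketch, using the "hello" "world" text from the explanation:

```python
import re

text = '"hello" "world"'

greedy = re.findall(r'".*"', text)   # .* grabs as much as it can
lazy = re.findall(r'".*?"', text)    # .*? stops at the first closing quote

print(greedy)
print(lazy)
```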
This is a "Javascript" tester but a bit more simple for your basic "see if this finds that". Good for practice if you just want to paste in a paragraph and find something out of it. /r/javascript was doing Regex Tuesday a while back (dunno if he kept it up) but you can find the challenges here or find some regex golf on Google.
Also: http://regexpal.com/ to test your regex search terms.
And I think you need \[.*\]
. stands for any character, * stands for any count of the previously keyed search term. so .* stands for any count of any character.
Here are some tools where you input your own source text and can then try to type a regex to match a part of that text:
I just checked and \bhamburger\b matches perfectly in JavaScript. http://regexpal.com/?flags=&regex=%5Cbhamburger%5Cb&input=if\(hamburger%3E30\)%20%7B%0D%0A%20%20hamburgerhelper%3Dpizza\(\)%3B%0D%0A%7D
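The same check in Python, as a sketch. The input is the snippet from the link, URL-decoded by hand and with the stray backslashes dropped, so treat it as an approximation of what was tested.

```python
import re

# Input decoded from the link above (assumed form).
source = "if(hamburger>30) {\n  hamburgerhelper=pizza();\n}"

# \b only matches between a word character and a non-word character,
# so "hamburgerhelper" is skipped.
hits = [m.span() for m in re.finditer(r"\bhamburger\b", source)]
print(hits)
```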