Why not run tests on the server locally? Or are you more worried about the network, not the server?
Edit 1: http://seleniumhq.org/ would be a good fit for local testing, and you could test the whole web app, not just the 'load-test' page you're using.
Look into Selenium: it has libraries for many different languages and can automate browser events. It's an automated "browser driver".
From the site:

> Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well.
Yes, this is entirely possible.
You will probably need an HTML parser library like JSoup and/or a browser automation tool, like Selenium WebDriver.
Both libraries are very well documented.
You're going to want to use something like Ghost.py or Selenium so that you can execute the JavaScript as a browser would. You will also want to make sure that any bot you code up uses a popular browser's user agent and accesses the page as though it were human, i.e. at a human-like pace with some random variability between each of your requests.
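Something like this Selenium sketch, for example (the user-agent string, URL list, and delay range are all placeholders):

import random
import time
from selenium import webdriver

# Make Firefox report a popular browser's user agent (string is illustrative).
profile = webdriver.FirefoxProfile()
profile.set_preference('general.useragent.override',
                       'Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0')
browser = webdriver.Firefox(profile)

urls = ['http://example.com/page1', 'http://example.com/page2']  # whatever you're hitting
for url in urls:
    browser.get(url)
    time.sleep(random.uniform(3, 10))  # human-like pace, with random variability
browser.quit()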
And remember, "don't be evil." The last thing you want to do is wake up one day to realize you're as morally vacuous as Zuckerberg himself.
I generally script something up if there are libraries available to interface with the service. Barring that, you can make use of Wireshark/scapy/whatever to craft or replay the traffic you need. If you're testing web stuff, I'm a fan of Selenium.
Depending on what you're trying to do, you may want to look at selenium. I've only used it for testing, but there's nothing that says you can't use it to automate behavior using a browser of your choice.
It has a number of APIs for various languages, including Java.
This is more of a question about importing packages than selenium.
Once you import a package it’s in the namespace and you can reuse it as much as you want. So no you don’t have to re-import anything.
I don’t use selenium, but the next question you seem to be asking is if you need to re-initialize the object.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://seleniumhq.org/')
If you need multiple pages you can create multiple browser objects. But you should only need to do that if you need both pages at once.
You should be able to reuse the same object.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://seleniumhq.org/')
# do something.
browser.get('http://second page')
I once tried Selenium. But it took too much time to implement and I didn't get paid for it, so I stopped working on it.
What I learned from my implementation: it would be great if someone wrote a standard library for testing Magento, where stuff like logging into the frontend/backend, browsing to a product, browsing to the cart, and getting the product price and cart price is already implemented.
It might be easiest for you to just use Selenium. It's supremely easy to get a little script together, and if you need more advanced functionality it has pretty convenient hooks into several languages, including python.
> probably needs to be rewritten from scratch anyway
Nah. A rewrite has to be a mistake - http://www.joelonsoftware.com/articles/fog0000000069.html.
Get a debugger, put some breakpoints in to see what is going on.
You'll soon see patterns of how the original developer was naming things. NetBeans has some good global search tools for finding and matching up the URLs, the POSTs and the variables that are being used.
Get a test server up. Get http://seleniumhq.org working. Now make your changes.
Try Selenium (http://seleniumhq.org/). You can install the Firefox addon and make a script that scrolls to the bottom of the page, fires the AJAX events to load all the content, and takes a screenshot (all done as a programmable "macro").
If you want it fully automated, you can look into setting up a selenium server and using the same script run via cron-job :)
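A rough sketch of that script in Python with WebDriver (the URL, timings, and output file name are placeholders):

import time
from selenium import webdriver

browser = webdriver.Firefox()
browser.get('http://example.com/some-feed')  # hypothetical page

# Keep scrolling until the page height stops growing, i.e. the AJAX
# events have loaded all the content.
last_height = browser.execute_script('return document.body.scrollHeight')
while True:
    browser.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(2)  # give the AJAX requests time to finish
    new_height = browser.execute_script('return document.body.scrollHeight')
    if new_height == last_height:
        break
    last_height = new_height

browser.save_screenshot('page.png')
browser.quit()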
This doesn't help very much, if at all. Selenium's main documentation is better: Documentation
Also the Selenium IDE plugin for Firefox helps with the first few tests: Selenium IDE
I used http://www.screenr.com/ before to record demos. It's best if you narrate the video and make it as human as possible. An automated process is technically possible (thinking of http://seleniumhq.org/), but I'm not sure there would be a market for that.
Thanks for this, I never knew this existed. I have to take screenshots of webpages in multiple browsers and I use Selenium to do this. If you are aware of Selenium then you may also know that there is an issue with Selenium RC and it doesn't work with Opera. This will be very useful for me.
I used beautiful soup (written in python) to scrape millions of pages. Really easy to set up and run. Fast, multi-thread, multi-machine. No js support though.
I also use Firefox/Selenium to scrape JS-heavy sites (this will give you XPath (XPath Checker) and the rest) - dump this info out into 'reports' (some sort of HTML dump), then parse these with PHP. I grab DOM nodes, simulate clicks and mouse-overs, then dump DOM nodes. I separate the mini dumps in the file with \n\n++++SOMESTRINGHERE++++\n\n. Pretty easy to parse with a simple finite state machine (I love FSMs :) )
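In Python the splitting side would look something like this (the delimiter is the one above; the file name is made up):

DELIM = '\n\n++++SOMESTRINGHERE++++\n\n'

with open('report.html') as f:
    chunks = f.read().split(DELIM)

for chunk in chunks:
    # each chunk is one mini DOM dump; hand it to your parser/FSM here
    print(len(chunk))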
I run selenium with php scripts, using the remote control thingy:
http://seleniumhq.org/projects/remote-control/
My project has some PHP, the Selenium PHP library, and the Selenium server (a Java program that controls Firefox for you, passing messages to and from it).
Start the Selenium server, then run the .php file (php -f scrape.php). Firefox pops up, and the script controls it (choose a DOM node, perform an action, e.g. a click), waits for the page response (e.g. waits for AJAX to respond), then does DOM->node->toText() and saves that into a file or DB or whatever.
The fun bit is watching the php browse the web with firefox. You think this comment was written by a human? Think again, I am a simple, 23 line, .php script (and two of those lines are comments).
Hope this helps.
We have "sites" on our intranet. The problem is that the browser.get doesn't even seem to run (even with a working URL). It opens the browser, but doesn't even try to process the URL. In my mind, that script should open the browser, then load http://seleniumhq.org/ (which would just give an error because there is no internet connectivity). The problem is it's not even trying to go to the URL. Do you follow?
Getting data from MS-Office documents is completely different to web scraping (e.g. getting a reddit post, or the contents of a web page).
You could use:
This might be overkill for what you want, but check out Selenium. I've only watched the videos and played around with the Firefox extension, but it looks really cool.
It's about 450 lines of Python, using Selenium RC to control a browser. I'm planning to open up the source on GitHub once I've made sure I didn't overlook some password in the source or something. If you'd like to see it in the meantime, I can send it to you.
OP, some advice from one Python programmer to another. People are fucking killing each other right now for QA testers who use Selenium, which automates web browser behavior. I had an internship this summer writing selenium test scripts and I got offered a job as a Senior QA Engineer. Get into it - it's really fucking easy. Go grab yourself a copy of the browser plugin and the Python version of the Webdriver API, then write some scripts that log someone into FB/twitter/etc, sign them up for services, whatever and push them to a git repo. You'll be able to drop out of college and get a job in California like I did. http://seleniumhq.org/
If you want a completely hands-free approach, then Selenium is the right tool. Admittedly, it's got a bit of a learning curve and more features than you'd need, but it is specifically designed for browser automation.
I would have it handle everything from launching the browser, navigating to a page, extracting/submitting data, etc. Use the client data to drive the selenium steps in a loop. For a specific task like this it could be done in <30 lines of ruby.
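Something in this shape, though sketched here in Python rather than Ruby (the CSV columns, form URL, and field names are pure assumptions):

import csv
from selenium import webdriver

browser = webdriver.Firefox()
with open('clients.csv') as f:
    # use each client record to drive one pass through the form
    for row in csv.DictReader(f):
        browser.get('http://example.com/form')  # hypothetical target page
        browser.find_element_by_name('name').send_keys(row['name'])
        browser.find_element_by_name('email').send_keys(row['email'])
        browser.find_element_by_name('submit').click()
browser.quit()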
Yes, it is possible to solve this interesting problem. I would recommend using Selenium. This is a tool that lets you "remote control" web browsers.
There are Selenium libraries in many languages. So I would recommend using a language that is easy to learn, like Python.
Selenium is likely the kind of tool to use for this in an actual stand-alone environment. I've worked with but never deployed or programmed it, so YMMV - it's not super simple but it's very effective.
Now it appears they have a Firefox add-on called Selenium IDE which lets you do simple automation.
I think web scraping is a great way to learn python. It's challenging enough to be interesting, useful enough to be motivating and simple enough to let you focus on learning python.
FYI, if functional/declarative scraping is more your style, lxml works with XPath. BeautifulSoup is a hugely popular web scraping library, but, as good/popular as it is, you might like lxml's approach better.
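For example (the file name and XPath expression are illustrative):

from lxml import html

tree = html.parse('page.html')   # or html.fromstring(some_markup)
links = tree.xpath('//a/@href')  # grab every link on the page
print(links)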
Use mechanize to fetch your pages. When you need an actual browser, use WebDriver.
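A minimal mechanize fetch looks something like this (the URL is just a placeholder):

import mechanize

br = mechanize.Browser()
response = br.open('http://seleniumhq.org/')
print(response.read()[:200])  # first 200 bytes of the page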
it's linked from the downloads page: http://seleniumhq.org/download/ (API Docs)
So, here's a better link that will always be updated with every selenium release: http://selenium.googlecode.com/svn/trunk/docs/api/rb/index.html
check out selenium -- there's a version called IDE that's a firefox plugin. You can use it to very easily create repeatable tests. Then you could export your test suite and use their WebDriver to run tests for each site you work on daily and report any errors.
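An exported check might boil down to something like this (the site list and the title check are made up):

from selenium import webdriver

sites = ['http://site-one.example', 'http://site-two.example']
browser = webdriver.Firefox()
for site in sites:
    browser.get(site)
    if 'Error' in browser.title:  # whatever failure signal your sites show
        print('FAILED: %s' % site)
browser.quit()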
You might have better luck and an easier time using Selenium to drive a real browser:
require 'rubygems'
require 'selenium-webdriver'

driver = Selenium::WebDriver.for :firefox
driver.get "http://google.com"

element = driver.find_element :name => "q"
element.send_keys "Cheese!"
element.submit

puts "Page title is #{driver.title}"

wait = Selenium::WebDriver::Wait.new(:timeout => 10)
wait.until { driver.title.downcase.start_with? "cheese!" }

puts "Page title is #{driver.title}"
driver.quit
Try http://seleniumhq.org/ : it runs in the browser, so there won't be any JavaScript-related problems. You can use the Firefox plugin to do some tests, and then export those tests as Python scripts.