I think R might be a better fit for what you want to do. As a language it's weird (based partially on Scheme, but with objects and a lot of annoying special cases thrown in over the years), but it has MUCH better support for exploring data, visualizing it, and doing statistics than other languages. You might also find the RStudio IDE helpful.
Ask and ye shall receive. (WARNING: PDF)
Very basic. I'd be more interested in seeing stats for the common gossip sites, etc. But my Google fu has proven too weak.
EDIT: Google triumph! It occurred to me to look up one of those egotistical gossip blogger types specifically, rather than general stats. Success: Perez Hilton Reader Demographics.
Hi, I'm a PhD student in CMU Statistics. I can answer general questions about the program and curriculum, or at least point you at people who can.
Linear algebra is part of the required curriculum, as well as multivariate calculus. You'd only need linear algebra your second or third year when you take regression (36-401) and advanced data analysis (36-402), so learning it now may not be best, like /u/trijazzguy says.
Programming is definitely a good idea, though. Python is good if you take the time to learn the data packages (Numpy, Pandas, Matplotlib, etc.), but most of your courses will use R. But honestly it doesn't matter which language you learn, as long as you learn something you find interesting so you get practice thinking like a programmer. Find a little project you're interested in and write some code for it.
Also, take a look at our new majors. You can just do statistics, or you can combine it with economics, machine learning, or math. (I strongly recommend doing mathematical statistics if you're ever interested in going to graduate school or doing stats research -- the math preparation is essential.)
I'm not sure what else you can do to prepare. The CMU program is very good. Many undergraduates decide this means they need to take as many classes as possible every semester, so they spend all their waking hours doing homework and begging for extensions. Don't do that. Try to relax a bit and pick your courses strategically.
There doesn't need to be a 'winner' of the database race and in fact I think you shouldn't even want there to be one - although I appreciate you wanting it to be me :) The competition is a good thing!
The folks at Zam certainly have a lot of advantages over a site like mine, and the traffic Knotor receives is still tiny compared to Torhead, which receives between 1.5M and 2M pageviews a day according to Quantcast. But Knotor is growing as more people hear about the site. And if it weren't for sites like mine, what incentive would they have to improve? When was the last time Wowhead added any innovative new feature to their site? If I can force them to spend more of their resources on improving Torhead, then all the better for everyone playing the game, in my opinion.
I had no strong opinions about this article (LIKE MOST OF /R/GAMING SEEMS TO HOLY SHIT WHAT THE FUCK). I agree that it's not fucking Shakespeare or Ebert or what-have-you, but your criticisms are needlessly harsh and come off as a petty ad hominem attack on this reviewer. You know another thing they teach in middle/high school English? Knowing who your audience is.
Quantcast's most recent estimates for ign.com demographics in the US
The majority of IGN's site traffic comes from people with no college education.
I'm not saying the review was written "well" (whatever something that subjective can mean). I'm not saying the review lacked substance (it did). What I am saying is that /r/gaming's collective opinion - that this review reads like a young person babbling - is correct because that was the entire point. If the writing staff at IGN can relate and appeal to their target audience, they are likely to keep that audience. If that means writing in a way that sounds shallow to us but like a "bro" to those who do not have, nor aspire to have, a college education, IGN will supply reviews written that way.
...the end.
EDIT: grammar
If you're serious about learning and using statistics, I'd suggest using R. It's free and used widely across college campuses and in the workplace. I did a stats major at a top-10 stats program and we used R; in addition, my SO does analytics/quant work at a top-4 bank, and R is used in some groups there as well.
I haven't used Mathematica, but if you want to be strong at stats (I'd guess this applies to Computational Math as well), you'll need to be able to program. R is most powerful if you learn the R programming language.
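For a taste, a basic regression is only a couple of lines; here's a minimal sketch using R's built-in mtcars data:

    fit <- lm(mpg ~ wt + hp, data = mtcars)  # predict fuel economy from weight and horsepower
    summary(fit)                             # coefficients, p-values, R-squared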
With that said, for just some basic regressions, Excel is much easier to learn and use.
R.
Better than a lot of commercial software by many criteria, though it does involve some investment to learn.
By default, it is command-line driven, but I think it's worth learning to use it that way.
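For a flavor of what that looks like, a session is just typed expressions and printed results:

    > x <- c(1, 5, 9)
    > mean(x)
    [1] 5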
There are many, many resources available.
From the article:
> Perhaps when this generation of aging white males dies off, aging women, aging Latino and black males, and young people will become the readers of journals such as The New York Review of Books, and they will endow symphony orchestras.
> I suspect not. And if not, the Left may come to regret its contempt for this particular group. Without aging white males, I doubt the New York Times would survive. How many young people, females, Hispanics, and blacks subscribe to the New York Times?
Ask and ye shall receive (sort of): Here is the quantcast data for the readership of the New York Times online.
Interestingly, 47% of the Times's readership online is female. And while 75% of its readership online is Caucasian, Caucasians underindex for Times readership vs. general internet usage, while African-Americans and Asians overindex.
Takeaway: The reason aging white Republican males like Dennis Prager are dinosaurs is that, in addition to looking down their noses at women and minorities, they can't be bothered to conduct a simple Google search... which is all I did to find this info.
For one, you're patronizing the same company right now by using Reddit. About 3.7 million people visit nola.com in an average month, so even if everyone on this subreddit never used nola.com, we wouldn't even put a dent in it.
Extremely customizable, lots of packages for different types of graphs. There's definitely a learning curve, but it's fantastic for graphics (and statistics). You might want to check out the ggplot2 package.
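To show the flavor, here's a minimal ggplot2 sketch on R's built-in mtcars data (assumes the package is installed):

    library(ggplot2)
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +                 # scatterplot
      geom_smooth(method = "lm")     # overlay a fitted regression line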
Edit: forgot to mention that R is free and open source since price is a factor for OP
I use VMware extensively and can highly recommend it.
However, are you tied to OriginLab for any reason? If you are, ignore this :P but if not...
I know nothing about OriginLab beyond their landing page but from my brief glance it looks like R may be able to do what you want. I'm an Econ/Finance undergrad and I use R like crazy for analysis. It's an awesome open-source, cross-platform data crunching beast.
Use R. It is a free and open source software for statistical modelling and analysis.
http://nlp.stanford.edu/manning/courses/ling289/logistic.pdf will tell you how to do a logistic regression appropriately in R.
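The core of it is a one-liner with glm(); a minimal sketch on the built-in mtcars data (my example, not from the PDF):

    # logistic regression: model transmission type (0/1) from weight and horsepower
    fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
    summary(fit)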
Let me be the 94th person to recommend R with optional RStudio/tidyverse.
You might also be interested in JASP and JAMOVI, they are free / open source and really good!
The two projects forked in different directions a while back, IIRC the main difference (apart from light cosmetics) is JASP offers a hook into Bayesian/Network analyses, while JAMOVI has stronger links into underlying R code.
I don't do stats, I work in computational science. Most of the specialist software I use was designed primarily on Linux, so compatibility isn't an issue.
But for data analysis, I use Python (and all the lovely packages that come with it) and occasionally Mathematica or MATLAB, both of which run natively. I know a lot of people who do more in-depth stats stuff use R and I have played with it some, but I really don't use it much. However, it is pretty powerful.
For large datasets (which I deal with a lot actually), I usually use HDF5 and the related tools. You can probably use a SQL database for some of the other things you mention, I don't really know as that's not my thing.
In any case, if I do run into something that is Windows only, I just run it in a virtual machine. My workstation is pretty beefy (48 cores worth of Xeon and 512GB of memory), so performance is a non-issue.
R is an excellent and easy platform to start with.
Hopefully you have OS X, because the OS X client is years ahead of the Windows one. If not, it's still a good UI.
Start off with the quantmod library. It will get you free data and provides all types of statistical tools to start with.
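A first session might look something like this (assumes quantmod is installed and the data source is reachable; the ticker is just an example):

    library(quantmod)
    getSymbols("SPY")                           # downloads price history into the workspace
    chartSeries(SPY, subset = "last 6 months")  # quick chart of recent prices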
Everyone has their own style and sees data differently; work with it long enough and you will start to develop your own strategies and algos.
Best of luck!
What is it with mormons and their love affair with inflated membership statistics?
http://siteanalytics.compete.com/exmormon.org/ : January, 2011 monthly stats: 33,422 unique visits / 207,349 total
http://www.quantcast.com/exmormon.org : avg. monthly: 15K people / 471K visits
R with RKWard is quite awesome.
I have used it daily since 2006. Here's a video on installing RKWard on Windows.
This really depends on a number of factors, as there are different software packages available depending on your price range.
If you're solely looking at statistics, I would suggest R as introductory software. It's free and not too hard to pick up, since it's simple programming for statistical analysis.
Finally, you should also check out r/gis, the GIS subreddit.
If you're into more hands-on programming, the preferred alternative is R.
They have extensive documentation, and it's open source, extremely collaborative, and extensible through packages.
Check it out here: http://www.r-project.org/
RStudio is a common IDE for the language.
Or....https://www.r-project.org/ or https://jasp-stats.org/
But yeah, it's quite a racket. We had a journal rep come to our dept. literally an hour ago, and she was saying 'when we charge £5000 to publish your papers open access, we're making a loss'. Taking the absolute piss: there are millions to be made in academia, usually by skimming off researchers.
United States traffic to Al-Jazeera's website has almost tripled in under a month.
It went from roughly 200,000 to over 600,000. It's also worth noting Quantcast normally low-balls estimates.
>is there a way I can highlight a section of it to modify/delete?
No.
But if you are on Windows, I believe there is a built-in text editor of sorts. Regardless, get RStudio: just install and start it and you have a full-blown editor that communicates automatically with R. One caveat: the grid-like view of your data does not support editing. (If you're a little more courageous, there is the more advanced RKWard.)
>For my main question: I'm working with a time-series dataset
I don't know much about time series, but as far as I know R has special data types for them. Do a
apropos('ts')
and see if something familiar comes up.
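For instance, base R's ts class works like this (a made-up minimal example):

    # two years of fake monthly data as a time series
    x <- ts(rnorm(24), start = c(2000, 1), frequency = 12)
    plot(x)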
I have little experience myself, but R is a statistics language written by and for statisticians which lets you graph and calculate with data easily. It has a slightly wonky syntax by traditional programming standards, but it's still a popular pick. It also lets you pull statistics off the web quite easily.
(1) for numerical computing: Octave or, better yet, R. R is a statistical package which has lots of numerical functions plus statistical stuff. R is very good for handling observational data of all kinds. http://www.r-project.org
(2) for symbolic computing: Maxima. Idiosyncratic but covers a lot of ground. Incidentally Maxima supplies the symbolic computation capability for Sage. http://maxima.sourceforge.net
The software packages mentioned above are all free software. Octave is comparable to Matlab, R is comparable to S-PLUS, and Maxima is comparable to Mathematica.
I guess it depends what you mean by easy, because I've found R to be very powerful, and not too hard to use. It's free and runs on pretty much any OS. It has a command line interface, but I assume since you're reading /r/compsci that won't scare you.
R is an open source statistic analysis package: http://www.r-project.org/
Not sure what EXEL is, but if you mean Excel, here's a page on importing data from it: http://cran.r-project.org/doc/manuals/R-data.html
The data was cleaned up and formatted in python 3.6. I use Prism, since I like the way the figures look. FYI it's a good Excel alternative if you plan on doing research in medical school.
I googled site reviews for this. Here's what I got.
http://www.scamadviser.com/check-website/dledou.com
http://website.informer.com/dledou.com
Then if you do a reddit search, the only "people" who post about the site are obviously fake accounts. All three accounts have four lowercase letters, two numbers, and one uppercase letter (e.g. abcd12E).
I wouldn't trust this at all.
R is a programming language developed by and for people doing statistics (i.e. not-really-programmers), and it has some very useful packages for calculating statistics, rolling dice, simulating decks of cards, etc. It's not trivial to learn, though.
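For example, a dice simulation is only a couple of lines (my own sketch, base R only):

    # roll two six-sided dice a million times and tabulate the sums
    rolls <- replicate(1e6, sum(sample(1:6, 2, replace = TRUE)))
    table(rolls) / length(rolls)  # empirical distribution of the sum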
I usually end up using Python, a scripting/programming language that is one of the easiest to learn and very powerful for making quick scripts that simulate some part of a game a few million times, so you get a feel for whether some mechanic works or not. There are some tutorials out there for non-programmers to learn Python, but I don't know which one is best. Anyway, writing a few lines of Python is, in my experience, always superior to my old method, which was typing into lots of cells in Excel.
I'd recommend using something like ggplot which follows the convention of The Grammar of Graphics. Or, make a skeleton plot in Matlab, export to PDF, and use something like illustrator to make things look aesthetically nice (as suggested by /u/MikeVladimirov). I also really like Graphpad Prism for plotting categorical or experimental data (i.e. things less mathematical)
If you'd like the data:
http://www.quantcast.com/garyjohnson2012.com
He's at 100,000 hits/mo in September.
Jill Stein uses Quantcast, so her numbers are here for comparison:
Some basic demographics for Reddit.
In short, 67% of traffic is 18 to 49 years old, and 59% makes over $60k per year. In addition, 41% have attended college and an additional 15% have attended grad school.
But ad hominem attacks are easier.
For full disclosure -- I work at Quantcast.
Our service is most useful when the site in question is being directly measured by our pixel trackers; as other posters have noted, the estimates range from reasonably accurate to wildly flawed when the actual data isn't being tracked. Statistical inference is a powerful tool when used carefully, but there is always a confidence interval to take into account. It's entirely possible to get wildly inaccurate metrics on some sites, as observed by Thirsteh (though I'd be interested in the details; feel free to PM me, sir).
To respond to jonosprings below, we do actually look at cookie deletion rates and our models account for this phenomenon to some extent (noted here, and in more detail here.)
I'm biased because I do a lot of stats stuff, but R is a very different language than any of those. Thinking in terms of vectorized functions rather than loops - which you need to do to be a good R programmer - makes for a significant, challenging change of pace. It also has packages that let you integrate it with C++, Python, and Java, among other languages.
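As a tiny illustration of that change of pace (my own example), the same sum of squares loop-style versus vectorized:

    x <- runif(1e6)
    # loop style (unidiomatic in R)
    total <- 0
    for (v in x) total <- total + v^2
    # vectorized style (idiomatic R, and much faster)
    sum(x^2)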
May be overkill, but take a look at RStudio for the R statistical programming language. Fully functional, professional, open-source, statistics IDE. Cannot recommend enough.
then do something like:
x <- rnorm(10, mean = 2, sd = 0.5)
x
This will print something like:
1.75884164412553, 1.96923295305206, 2.02906575504054, 2.84513526976282, 2.2049150744444, 1.73318266409414, 1.62611322640113, 1.2014866750171, 2.0473842615968, 1.92262243708622
I use R (free) and SPSS (paid) and data both proprietary and public (e.g. census data).
My last major project was analyzing cities for growth factors that are likely to increase housing costs, for real estate investment opportunities. Data included the obvious geographics (e.g. mountains, lakes, reservations, etc., which prevent urban sprawl), demographics (e.g. age, race, religion, gender), urban developments (schools, hospitals, courts, police stations, highways, parks, etc.), industrial and commercial growth, and less obvious things like traffic jams and gas stations per capita.
Cheers.
You probably skipped the Materials and Methods section, where it says:
>"Statistics were done using GraphPad Prism."
My bet: GraphPad Prism produced the picture.
In addition to R or SPSS, I highly recommend JASP (https://jasp-stats.org) - it's free-to-use statistical software with an intuitive graphical user interface, developed by many Psychology professors across the world. My Psychology professors/supervisors also recommended I use it when analysing data for my Masters thesis :)
It's simple, quick to install and gets the job done!! Best of all, it's very very intuitive to use!
Look at JASP: https://jasp-stats.org/
It's new and open source, but it has an interface like SPSS and can probably take care of all the basics you need.
Nothing is going to do all your work for you, though: you need to understand what you want to do, what tests you want to run, and how they work in order to actually present something meaningful.
From the FAQ:
> Q. What programming language is JASP written in?
>
> A. The JASP application is written in C++, using the Qt toolkit. The analyses themselves are written in either R or C++ (python support will be added soon!). The display layer (where the tables are rendered) is written in javascript, and is built on top of jQuery UI and webkit.
Ok, I posted this info before in another posting. If someone wants to look into it, the website.informer link at the bottom has info about the owner of the sites; I just don't have the time due to family stuff going on. If someone wants to email him, call him (the phone number is in the WHOIS listing), or send him a letter by snail mail, all the info is in the link at the bottom of my posting. Once we know what is needed, we can set up a gofundme page to pay for it.
My OLD POSTING about info below..
daqtools.info server is down. I checked the Whois. REG is still good till DEC-07-17. Do we need to see if a gofundme is needed to pay for the server monthly cost? A WHOIS lookup gives all owner info if so. I will not post here per reddit rules.
Domain Name: DAQTOOLS.INFO
Registry Domain ID: D27110397-LRMS
Registrar Registration Expiration Date: 2017-12-07T00:35:56Z
I am thinking that the hosting is (possibly) being done by 1&1. TRACERT shows ending at 1&1 Hosting.
Admin Phone: +1.9134926024
Admin Phone Ext:
Admin Fax: +1.9138886928
Admin Fax Ext:
Admin Email:
Registry Tech ID:
Tech Name: ERIC N MATT
Tech Organization: SYSTEN, LLC
Tech Street: 9875 WIDMER ROAD
Tech City: LENEXA
Tech State/Province: KS
Tech Postal Code: 66215
Tech Country: US
Tech Phone: +1.9134926024
While this is true this particular email is used in several hundred domains with no activity that date back to 2003. There is a good chance that they are in the business of domain resale or at the very least hope to sell those sometime in the future.
> literally millions of people go on 4chan, it isn't that small of a percentage.
And there are literally billions of people in the world.
That's a fraction of one percent of the population.
According to Quantcast, 4chan sees about 3.7 million people a month. Even if we're generous and say it's 4 million a month, and even if we're generous and say it's different people every month, that's, at most, 48 million people in the past year.
There are a little over 7 billion people in the world. I'll wait for you to do the math.
And that's just the people who visit the website. Cut out the people who visit once or twice and never come back, and the number shrinks. Cut out the people on the boards that are not major users of the *fag slurs (because remember, different boards have different cultures), and the number shrinks more.
The other day I was amazed to see that Twitter is the 4th most visited website in the US. A dust cover isn't hard to change in the future, but for the time being it isn't going anywhere. Smart marketing by Twitter as well.
Absolutely. Check the traffic analysis using tools like Compete and Quantcast (it's way up):
http://siteanalytics.compete.com/yelp.com/
http://www.quantcast.com/yelp.com
The buzz for any service dies down as tech/startup/marketing blogs/news outlets find it less innovative, but mainstream adoption continues well past the "buzz" fading.
Just double checked that at quantcast and it seems you are right.
That feels good, but I still have trouble believing I'm mature enough to be hanging out with adults.
I'm not OP but I am pretty sure he is referring to R, the statistical computing software. You might be familiar with similar programs like SAS, SPSS, or perhaps Minitab since I think they are used often in education.
I'm a bit surprised that he chose R to mention specifically for finance... I worked in M&A for a time, and never had to move beyond Excel and Access. Both programs were VITAL in almost every job I've ever had, from real estate to finance.
You can do relatively simple statistical analysis in SQL. It's probably more common to store the data in a SQL database, and use SQL to select data for processing by statistical software like R.
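A sketch of that workflow, assuming the DBI and RSQLite packages and a hypothetical table called measurements:

    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), "mydata.sqlite")  # hypothetical database file
    df <- dbGetQuery(con, "SELECT x, y FROM measurements WHERE y IS NOT NULL")
    summary(lm(y ~ x, data = df))  # hand the selected rows to R's stats functions
    dbDisconnect(con)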
You can dive into R, which is an extremely powerful statistical programming tool; it's available for free and, thanks to all the available libraries, very flexible. The professional statistics packages, like SPSS or S-PLUS, have a rather steep upfront cost, but are more convenient to use...
Linear algebra plays a huge part in graphics programming, and also now in web development in the form of CSS transformations (a lot of this should look familiar starting about halfway down the page: http://franklinta.com/2014/09/08/computing-css-matrix3d-transforms/). Developing a physics engine for a video game and/or CGI sequence is also an exercise in linear algebraic heavy lifting, where you need to be able to simulate realistic movement by generating systems of equations to solve for transfer of energy in collisions, friction, drag, fluid dynamics, etc.
Also, when you get into handling large amounts of data, programming languages like R lean pretty heavily on linear algebra to analyze and present that data in a meaningful way.
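As a toy example of the kind of matrix work involved (base R, nothing package-specific):

    # rotate 2-D points 45 degrees with a rotation matrix
    theta <- pi / 4
    R <- matrix(c(cos(theta), sin(theta),
                  -sin(theta), cos(theta)), nrow = 2)
    pts <- rbind(c(1, 0), c(0, 1))  # one point per row
    pts %*% t(R)                    # the rotated points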
The company I work for uses COBOL for its billing system. It was antiquated 15 years ago, with no sign of replacement on the horizon. There are many industries that can't seem to shake COBOL. Becoming proficient in it may help you.
If you're looking to be a Business Analyst, maybe the language you need to know is VBA. The BAs at my company work in Excel, Access, PowerPoint, and Visio to create project documents which support the development team. Our other analysts also benefit from VBA training, since it helps them automate their reporting. VBA is easy to learn, with a veritable cornucopia of resources online to guide you.
Analysts can also find use from R: a language specifically designed for statistical reporting: http://www.r-project.org/
I find most non-programmers (as in, people whose primary role is not development) learn a language out of necessity. You may focus your attention on Java, only to begin working for a company that wants you to code in C#. Experience in one language will help you in another, and always looks good on a resume.
In the end, I wouldn't recommend learning COBOL. There are many more modern languages you can learn which will help you achieve your goals.
It might not be 100% corresponding to your situation, but this document on R and FDA compliance is where you want to look, and most likely reference in your business proposal. It's the official position of the R Core team on how to handle the open-source aspect of R in a regulated environment.
I haven't seen too much in-depth information about NBA analytics, but I've seen some that use R.
You can learn about it here: http://www.r-project.org/ And here: http://cran.r-project.org/doc/manuals/R-intro.html
Stats 141: Statistical Computing taught by Duncan Temple Lang
of R fame: http://www.r-project.org/contributors.html
There is a lot going on in the Davis stats program, and the computational stats track definitely offers the skills to get a job right out of school. You should talk to counselors.
Thanks. I'm sure you've seen the Wiki page already?
http://en.wikipedia.org/wiki/Principal_component_analysis
Good links there. Not sure you need to go as deep as the Wiki page, but here goes. Remember, this is going to be rickety.
On the "math" side:
First, know single-variable calc like the back of your hand. Learn multivariable calc and study the important theorems there; depending on what you're doing, you may not need too much theory (Real Analysis, etc.). Then you need linear algebra (but not too much, per se: just matrix stuff like rank-nullity and the Jordan decomposition, so you have the tools to manipulate matrices enough to make them do what you want). Then you need matrix calculus. Because matrices are in general non-commutative, the formalism will be tedious at first, but you'll get the hang of it. Remember: do lots of exercises. You need to basically be good at taking first and second derivatives with matrix variables. Also, use computers to your advantage for actual calculations. Use an open source environment such as R. http://www.r-project.org/
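(For what it's worth, once you have the linear algebra down, the actual PCA computation in R is one call; a minimal sketch on the built-in iris measurements:)

    pca <- prcomp(iris[, 1:4], scale. = TRUE)  # center, scale, then rotate
    summary(pca)  # proportion of variance per principal component
    head(pca$x)   # the data in principal-component coordinates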
On the "statistics" side:
Probability theory (continuous variables and distributions, univariate and multivariate), expectation/variance/covariance/Bayesian analysis/maximum likelihood estimators, etc. You'll need the multivariable calc. here, so learn that first.
Learn basic Econometric concepts such as regressions/regression analysis, which will use all of the above concepts.
Overall, just do some initial fact finding with the help of Wiki and Amazon to compile a list of recommended resources/reading. You'll have to come up with the level of formalism you're going for based on what your goals are for PCA. E.g., re-learning calc with Spivak may be the "best" way to go, however, you'd be drinking from a fire-hose.
Also, if you're taking a linear algebra class that requires MATLAB, doesn't the institution through which you're taking the class provide free access, via an institutional license, to MATLAB and other fee-based programs such as Stata and Mathematica?
Lots of folks seem to like R. However, I've never used it--I tend to stick with Matlab for most things...
EDIT: Link is here: http://www.r-project.org/ and R replaces only a subset of Matlab functionality (a much smaller subset than Octave), as I understand it.
> the curred function F x ... how much code that would have taken you in a non-FP language.
How much code would that take you in Haskell without currying? Eight more characters, for the anonymous function? This is about the weight of the burden in any language with closures but not automatic currying.
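To make that concrete, here's what the manual version looks like in R, a language with closures but no automatic currying (my sketch, not anyone's library code):

    # manual currying via a closure: add() returns a one-argument function
    add <- function(x) function(y) x + y
    add5 <- add(5)
    add5(3)  # 8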
> [your] strict CS arguments
You seem a bit narrowly educated. Here: you said that currying was crucial; I've pointed out that it isn't. You can go off and be enlightened, while still preferring the whole package of a particular FP language that leans on currying. Although, here's something to test that preference.
R, and ggplot2. Excellent quality graphics, with all the statistical backing an engineer will ever need.
R is a statistical-analysis language, an implementation of the 'S' language from Bell Labs.
http://www.r-project.org/
ggplot2 is a graphics package for R. http://had.co.nz/ggplot2/
Consider R. Once you get rolling you'll be ten times more productive than in anything else. Most statistical research and much machine learning research is published in R. You get vector and matrix math and publication quality graphics right out of the box. It's not hard to run in parallel and it's free.
The inbuilt plotting stuff in R looks like it'd be just what you want. Personally I use matplotlib via SAGE though. God knows what good it would be to get the slope of a line you draw yourself.
Hmm, that's totally outside my comfort zone, but I think maybe Jamovi could be a (partial) replacement? It's a GUI package built on top of R, but it's point-and-click like SPSS. There's also JASP, which IIUC is similar but the UI is a bit different, maybe more SPSS-like.
Not quite. First of all, a p-value tells you that IF there were no effect (just random noise), the probability of getting a sample difference as large as or larger than the one you observed (0.02) is 5%. It doesn't directly tell you anything about the probability that there is or is not an effect, because when we calculate the p-value we assume that there is NO effect. That's sorta like saying "assuming the grass is green, what is the probability the grass is green?"
Secondly, the 2-tailed p-value itself tells you only that they're different because a difference of 0.02 and -0.02 would give you the same p-value in this case. But you can use your estimates (.10 and .12) to infer the direction.
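If it helps to see this concretely, here's a small R sketch (made-up numbers echoing the .10/.12 setup above); the two-tailed test reports one p-value, and the direction comes from the group estimates:

    set.seed(42)
    a <- rnorm(200, mean = 0.10, sd = 0.05)
    b <- rnorm(200, mean = 0.12, sd = 0.05)
    t.test(a, b)  # two-tailed by default; the sign of the estimated difference gives direction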
If you're interested in this more, there's a Bayesian technique called the VS-MPR which can give you estimates of what you're looking for: evidence for whether a p-value of 0.05 means the null is more likely to be true. Here's a good source on that.
ETA: Also, we don't usually say "95% significance"; it would be significance at the 5% significance level. It's an easy slip to make, because with confidence intervals we talk about our intervals with the phrase "95% confidence".
I'd recommend JASP. It's free, easy to use, and updates instantly when you change variable combinations. It also generates paper-quality tables without any significant effort. It's pretty slimmed down: you won't have access to anything more advanced than linear regression or ANOVA, but it's really slick.
Everyone here is recommending R, RStudio, and R Commander, but honestly the learning curve on programming this stuff might be a bit cumbersome for a social sciences undergraduate who just needs to run a single regression or ANOVA.
>ALSO - I have not used R for biostats but was looking for an option if I do NOT have access to SPSS (only have at the office)
I thought the Data Science courses on Coursera were taught in R? That's how I learned R. I now have no desire to go back and use SPSS.
An open-source version of SPSS is PSPP; it is free to download and use. Especially if you're already proficient in writing SPSS code, it's very similar, although if I remember correctly some of the syntax is a little different. But the functionality is all there.
Also, can you come to my school and do a class talk to all the MPH students who complain about having to learn biostats and try and squeak by on minimal effort without learning anything? I kid, I kid, but it's frustrating at times because of how important and applicable it is to us after graduating.
I've been using PSPP on Yosemite. It is an open source copy of SPSS. It does a lot of things that SPSS can do, though you'll have to check if it can do what you need. http://www.gnu.org/software/pspp/
My mac-preferring statistician friend recently directed me to PSPP for mac. It's an open source version of SPSS. Unfortunately, I'm not a stats geek at all, so I have absolutely no clue how to make it sing and dance. But you can't beat the price...
Oh, that song? Yeah, it's here.
In the shipping info:
"Welcome everyone to our online store for shopping. You can enjoy best service and reasonable price here"
Hmmnn......
Also....Why the "dental" domain (probably an expired domain that had some history)...
http://website.informer.com/keltydental.co.uk (site)
http://website.informer.com/Mark+Langston.html (registrant)
Had to sign up to Reddit just for this. A WHOIS lookup shows that there are currently two domain names registered with the webdrivertorso name, these being:
webdrivertorso.net & webdrivertorso.com
Interestingly, the .com uses fake information:

Registrant Name: LEAST IMPORTANT1
Registrant Organization:
Registrant Street: 11 EASTON STREET
Registrant City: ALLSTON
Registrant State/Province: MA
Registrant Postal Code: 02134
Registrant Country: US
Registrant Phone: +1.6178160917
Registrant Phone Ext:
Registrant Fax: +1.5555555555
Registrant Fax Ext:
Whereas the .net doesn't have domain privacy applied:

Domain Name: WEBDRIVERTORSO.NET
Registrar URL: http://www.godaddy.com
Registrant Name: John Zephjo
Registrant Organization:
Name Server: NS3.A2HOSTING.COM
Name Server: NS4.A2HOSTING.COM
DNSSEC: unsigned
I then found the following belonging to a John Zephjo: http://website.informer.com/John+Zephjo+GE6+Technologies.html
I concede it might not be relevant and this could be someone cashing in, though it's worth taking a look at.
And even if he doesn't have the requisite access to change the default charset in httpd.conf (or wherever), he should at least be able to fix this by converting his HTML files to UTF-8.
(I'm only thinking that it may be the case that he doesn't have that access because there are other sites hosted on the same box: http://website.informer.com/152.19.134.41 But seriously, what self-respecting hacker would not have that access?)
Actually, on Reddit you are far less likely to meet someone under 18 than other places on the web. Demographics/Index. Notice the "under 18" index is really low. There is a larger concentration of college-educated people here than other places on the web.
Oh, you're almost there! Maybe I can help your reasoning along with some refined logic.
What demographic is both aware of YouTube's existence and overwhelmingly uses the internet?
What demographic has the time and wherewithal to view videos on YouTube?
If you're not interested in exercising a scientific thought process, you can always just look up the statistics here.
And if you continue to rebut the facts, the 18-24 demographic is still composed of children. The prefrontal cortex does not complete maturation until beyond age 25. Here.
YouTube's top-tier shit-comedy is geared toward children. That is a fact.
>I'm willing to bet that there are at least five men for every one woman on OKC.
You're on! I'm willing to put every penny I own into this bet. Quantcast numbers Please let me know when I can collect, I accept cash only, unmarked bills preferred, marked bills acceptable.
Hold it! I can do the math for this!
Ok, so we start out with 100% of reddit, or 1.00
From Quantcast, we can determine that about 43% of reddit is female, or .43, leaving about .57 available males. According to Wikipedia, the gays make up about 4% (.04) of the population, or, put another way, good God-fearing heterosexuals account for 96% (.96).
.57 * .96 = .547
Now, as for those who are interested in Pokemon, let's assume it's 99% of the population because, let's face it, everyone loves Pokemon.
.547 * .99 = .541 or 54.1%
Last of all, those available redditors who are good at math... I really have no idea. I was going to do this last part wrong as a joke about me being bad at math, but I have lost all interest, as I am wont to do. I'm... I'm so sorry.
Inferred, in the great tradition of western reasoning. Visitors are 68/32 male/female according to Google Analytics; commentators are likely much more male, however (for reasons similar to wikipedia editors being overwhelmingly male, despite the fact that visitors are almost 50/50).
Statistics generally support your demographic theory http://www.quantcast.com/reddit.com
As for the active vs. passive... that kinda makes sense too. A woman can typically get dozens of messages a week, so "profile critique requests" (PCRs) wouldn't be needed. I do see many women in the comments of PCRs, so maybe this subreddit is more 55/45 than we think, but with the majority of the links being from men.
Gaia may have been bigger at a time, but it has shrunk dramatically while at the same time 4chan has continued to grow. Pulling my data from public directly measured quantcast numbers: http://www.quantcast.com/4chan.org vs http://www.quantcast.com/gaiaonline.com
http://www.quantcast.com/minecraftforum.net http://www.quantcast.com/minecraftwiki.net
I count them as one site, it's a stupid habit, but combined they do ~70m a month; for December it was just shy of 69m, 68.8 I think. Switch to "page views" and then "monthly". (We use Quantcast on the sites, so these aren't guesstimates; this is what we see via Google Analytics etc.)
also, vnstat: http://humdi.net/vnstat/
$ vnstat
Database updated: Mon Apr 16 23:30:11 2012

   eth0 since 02/01/12

          rx:  3.54 TiB      tx:  21.11 TiB      total:  24.65 TiB

   monthly
                     rx      |     tx      |    total    |   avg. rate
     ------------------------+-------------+-------------+---------------
       Mar '12      1.78 TiB |    9.99 TiB |   11.77 TiB |   37.74 Mbit/s
       Apr '12      1.14 TiB |    7.07 TiB |    8.21 TiB |   51.06 Mbit/s
     ------------------------+-------------+-------------+---------------
     estimated      2.14 TiB |   13.27 TiB |   15.41 TiB |

   daily
                     rx      |     tx      |    total    |   avg. rate
     ------------------------+-------------+-------------+---------------
     yesterday     32.98 GiB |  399.86 GiB |  432.84 GiB |   42.02 Mbit/s
         today     19.55 GiB |  274.59 GiB |  294.13 GiB |   29.16 Mbit/s
     ------------------------+-------------+-------------+---------------
     estimated     19.96 GiB |  280.43 GiB |  300.39 GiB |
edit: Things I like about vnstat,
I don't use Shaw, as they don't provide internet access in my area. But I did grow concerned about things like this a few months ago. Not necessarily no usage monitoring, but I had heard horror stories about certain large providers who didn't even have the capability to accurately monitor these things, yet still charged for nonexistent overages.
I already had a Linux box with multiple NICs acting as a router, so I threw vnstat on there. It seems to work fairly well, and I trust it far more than whatever numbers my ISP might throw at me.
Ok, here's the thing: it's really made as a two-tier solution, with one part being a point-and-click plugin for RKWard (native Linux, but it can be made to run on Windows) and the other part the R package which does the actual work. Only the R package is available right now, and it's not so well documented. Also, input sanitisation is done in the GUI, which makes the R functions somewhat unfriendly.
An early version of the package can be downloaded here.
Minimally it should be enough to just do:
cphsoc_convert_from_spss(<path to spss file>)
This should produce a csv-file, a do-file (for import into Stata), an .RData-file (the dataset in R format), and an html table of variable names and labels, in the same folder and with the same name as the original file.
The parameters
out_dirname out_filename
can be used to direct output to another folder/another name than the input file.
The encoding parameters
source_enc target_enc
may be relevant when in a non-ASCII locale, as this can trigger all kinds of oddities.
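So a full call, with made-up file and folder names (only the function and parameter names come from the package), might look like:

    cphsoc_convert_from_spss("survey.sav",              # hypothetical input file
                             out_dirname = "converted",
                             out_filename = "survey_clean")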
If you can model the numbers appropriately, then you could try http://anydice.com
If you're not too bad at programming, then you could use R to model interactions to test for various results.
Yes, I've created a Google Form based around the programme described in The Rock Climber's Training Manual and the training diary that comes with it.
It's simple and easy to use on a mobile phone or desktop/laptop, and all data gets entered into a Google Sheet. This offers a huge advantage over recording it in a calendar application, in that all of the data is in one place; it would be a pain to have to go through a calendar, extract all the entries, and record them in something I can then summarise. Which leads me to...
When I've time I'll be sitting down and writing R code to summarise and display the data and intend to eventually turn it into a web-page that I can navigate using Shiny.
I intend to develop and share my code on Github and will post details when I've found time to actually get things up and running (haven't had time in the past year due to work and wanting to actually climb but have accrued a fair amount of data to work with).
You can do this stuff for free in R, it's open source, but the interface is a lot more like Linux than windows.
Then there is SPSS, made by IBM. It's more like Windows: not as customizable, but it works more easily and has a nicer GUI. Costs as much as the down payment on a car, though.
I've added a comment on here with some details on how to follow and get notified once I have the schedule in place.
Data science and analytics is essentially the science of interpreting what data means so as to make intelligent decisions from it. If we have very large datasets and interpret them correctly, there is information to be gained. An example of where this is used is the stock market: traders oftentimes employ data scientists to examine data related to stocks, which may tie to things like customer sentiment, to see if that is a valid indicator of price direction. This could then be used to influence a buy/sell decision.
It largely employs programming that integrates very closely with statistics. The information is then often presented in a visual manner through various types of graphs, charts, and other visualizations that are intended to make it more intuitive what the data means.
The R language is very popular for this type of programming (http://www.r-project.org/) due to its very close relationship with statistics and the fact that it is highly optimized for operating on lists, vectors, and matrices, which all come heavily into play with this type of analysis. R also has great support for various visualizations. d3 is a JavaScript library that is extremely powerful and flexible for providing visualizations as well. A common use case would be to do the data analysis and processing with R (and possibly the visualizations there as well), but to integrate with d3 when you need real-time or extremely custom/complicated visualizations.
Disclaimer: This will probably not solve your problem.
Analysts in our company make heavy use of the R language and also of the Julia language.
Maybe you could also take a look into these technologies, as their capabilities are truly awesome.
Julia: http://julialang.org/
Has she heard of a statistics program called R? It's entirely designed for statistical calculation, and it's really fast with things like pulling rows out of a matrix (T <- tweets.matrix[1:20, ] grabs the first 20 rows). Download that and RStudio and give it a try; it might be a lot faster and more appropriate than Excel.
It's pure preference, but I figured I'd mention that the R language (get the RStudio IDE with it too) is another great tool related to (and sometimes used as an alternative to) MatLab/Octave for those new programmers who love math/statistics. It depends on what you are doing, but it is worth at least checking out!