I think R might be a better fit for what you want to do. As a language it's weird (based partially on Scheme, but with objects and a lot of annoying special cases thrown in over the years), but it has MUCH better support for exploring data, visualizing it, and doing statistics than other languages. You might also find the RStudio IDE helpful.
Hi, I'm a PhD student in CMU Statistics. I can answer general questions about the program and curriculum, or at least point you at people who can.
Linear algebra is part of the required curriculum, as well as multivariate calculus. You'd only need linear algebra your second or third year when you take regression (36-401) and advanced data analysis (36-402), so learning it now may not be best, like /u/trijazzguy says.
Programming is definitely a good idea, though. Python is good if you take the time to learn the data packages (Numpy, Pandas, Matplotlib, etc.), but most of your courses will use R. But honestly it doesn't matter which language you learn, as long as you learn something you find interesting so you get practice thinking like a programmer. Find a little project you're interested in and write some code for it.
Also, take a look at our new majors. You can just do statistics, or you can combine it with economics, machine learning, or math. (I strongly recommend doing mathematical statistics if you're ever interested in going to graduate school or doing stats research -- the math preparation is essential.)
I'm not sure what else you can do to prepare. The CMU program is very good. Many undergraduates decide this means they need to take as many classes as possible every semester, so they spend all their waking hours doing homework and begging for extensions. Don't do that. Try to relax a bit and pick your courses strategically.
If you're serious about learning and using statistics, I'd suggest using R. It's free and widely used across college campuses and in the workplace. I did a stats major at a top-10 stats program and we used R; in addition, my SO does analytics/quant work at a top-4 bank and R is used in some groups there as well.
I haven't used Mathematica, but if you want to be strong at stats (I'd guess this applies to Computational Math as well), you'll need to be able to program. R is most powerful if you learn the R programming language.
With that said, for just some basic regressions, Excel is much easier to learn and use.
R.
Better than a lot of commercial software by many criteria, though it does involve some investment to learn.
By default, it is command-line driven, but I think it's worth learning to use it that way.
There are many, many resources available.
Extremely customizable, lots of packages for different types of graphs. There's definitely a learning curve, but it's fantastic for graphics (and statistics). You might want to check out the ggplot2 package.
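To give a feel for it, here's a minimal ggplot2 sketch using the built-in mtcars data frame (the data set and aesthetics are just for illustration):

```r
# A small ggplot2 scatterplot with a fitted trend line.
library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +   # linear trend with a confidence band
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
```

Once you internalize the "map variables to aesthetics, add layers" model, more complex plots are mostly just more `+` lines.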
Edit: forgot to mention that R is free and open source since price is a factor for OP
I use VMware extensively and can highly recommend it.
However, are you tied to OriginLab for any reason? If you are, ignore this :P but if not...
I know nothing about OriginLab beyond their landing page but from my brief glance it looks like R may be able to do what you want. I'm an Econ/Finance undergrad and I use R like crazy for analysis. It's an awesome open-source, cross-platform data crunching beast.
Use R. It is a free and open source software for statistical modelling and analysis.
http://nlp.stanford.edu/manning/courses/ling289/logistic.pdf will tell you how to do a logistic regression appropriately in R.
I don't do stats, I work in computational science. Most of the specialist software I use was designed primarily on Linux, so compatibility isn't an issue.
But for data analysis, I use Python (and all the lovely packages that come with it) and occasionally Mathematica or MATLAB, both of which run natively. I know a lot of people who do more in-depth stats stuff use R and I have played with it some, but I really don't use it much. However, it is pretty powerful.
For large datasets (which I deal with a lot actually), I usually use HDF5 and the related tools. You can probably use a SQL database for some of the other things you mention, I don't really know as that's not my thing.
In any case, if I do run into something that is Windows only, I just run it in a virtual machine. My workstation is pretty beefy (48 cores worth of Xeon and 512GB of memory), so performance is a non-issue.
R is an excellent and easy platform to start with.
Hopefully you have OS X, because the OS X client is years ahead of the Windows one. If not, it's still a good UI.
Start off with the quantmod library. It will get you free data and provides all types of statistical tools to start with.
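A quantmod starter might look like the following (a sketch; it needs internet access, and the default data source and its availability can change over time):

```r
# Pull free daily price data and chart it with a moving average.
library(quantmod)

getSymbols("SPY")         # creates an xts object named SPY in the workspace
chartSeries(SPY)          # candlestick chart of the series
addSMA(n = 50)            # overlay a 50-day simple moving average
head(dailyReturn(SPY))    # daily returns, ready for further analysis
```

The "SPY" ticker here is just an example; any symbol the data source knows about works the same way.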
Everyone has their own style and sees data differently, work with it long enough and you will start to develop your own strategies and algos.
Best of luck!
Are you familiar with any programming languages? If not, then I suggest you start with R. Pretty much any tool you'll ever need for differential gene expression analyses will be implemented in R thanks to bioconductor. I'd also recommend you use Rstudio, which is a very convenient interface for R.
Most of the analyses you described can be done in base R, though some packages that you might find useful are biomaRt (for converting ensembl gene IDs to common gene names) and PIANO (for gene set enrichment, though you need to provide the gene sets, which you can get from MSigDB).
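For the ID conversion, a heavily hedged biomaRt sketch (it's a Bioconductor package, needs internet access, and Ensembl's servers and attribute names change over time; the IDs below are just examples):

```r
# Convert Ensembl gene IDs to HGNC symbols via biomaRt.
library(biomaRt)

mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
ids  <- c("ENSG00000141510", "ENSG00000012048")   # example Ensembl IDs

getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
      filters = "ensembl_gene_id", values = ids, mart = mart)
```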
This really depends on a number of factors, as there are different software packages available depending on your price range.
If you're solely looking at statistics, I would suggest R as introductory software. It is free and not so hard to pick up, as it is simple programming for statistical analysis.
Finally, you should also check out r/gis, the GIS subreddit.
If you're into more hands-on programming, the preferred alternative is R.
It has extensive documentation; it's free and open source, and it's extremely collaborative and extensible through packages.
Check it out here: http://www.r-project.org/
RStudio is a common IDE for the language.
I have little experience myself, but R is a statistics language written by and for statisticians which lets you graph and calculate with data easily. It has slightly wonky syntax by traditional programming standards, but it's still a popular pick. It also lets you pull statistics off the web quite easily.
(1) for numerical computing: Octave or, better yet, R. R is a statistical package which has lots of numerical functions plus statistical stuff. R is very good for handling observational data of all kinds. http://www.r-project.org
(2) for symbolic computing: Maxima. Idiosyncratic but covers a lot of ground. Incidentally Maxima supplies the symbolic computation capability for Sage. http://maxima.sourceforge.net
The software packages mentioned above are all free software. Octave is comparable to MATLAB, R is comparable to S-PLUS, and Maxima is comparable to Mathematica.
I guess it depends what you mean by easy, because I've found R to be very powerful, and not too hard to use. It's free and runs on pretty much any OS. It has a command line interface, but I assume since you're reading /r/compsci that won't scare you.
R is an open source statistic analysis package: http://www.r-project.org/
Not sure what EXEL is, but if you mean Excel, here's a page on importing data from it: http://cran.r-project.org/doc/manuals/R-data.html
R is a programming language developed by and for people doing statistics (i.e., not-really-programmers) and has some very useful packages for calculating statistics, rolling dice, simulating decks of cards, etc. It's not trivial to learn, though.
I usually end up using Python, a scripting/programming language that is one of the easiest to learn and very powerful for making quick scripts that simulate some part of a game a few million times, so you get a feel for whether some mechanic works or not. There are some tutorials for non-programmers to learn Python out there, but I don't know which one is the best. Anyway, writing a few lines of Python is in my experience always superior to the old method I used, which was typing into lots of cells in Excel.
It's great that you want to learn all this stuff, and I definitely don't want to discourage your ambitions, but it takes a lot of time. You might want to learn some software package that can do these statistics for you and just learn what it means without going to the absolute fundamentals. A quick google search found this tutorial on how to do PCA in the free R (http://www.r-project.org/) http://strata.uga.edu/software/pdf/pcaTutorial.pdf . You are basically picking up a new language, so just google everything all the time.
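As a taste of how little code that tutorial's approach takes, here's a minimal PCA sketch in base R, using the built-in USArrests data set (an assumption for illustration; any all-numeric data frame works the same way):

```r
# Principal component analysis in base R.
# scale. = TRUE standardizes each variable before the analysis.
pca <- prcomp(USArrests, scale. = TRUE)

summary(pca)   # proportion of variance explained by each component
head(pca$x)    # the data projected onto the principal components
biplot(pca)    # quick visual of scores and loadings together
```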
I have to recommend Paul's Online Math Notes for algebra, calc, difeq, and linear algebra. Great set of notes, great explanation. Full problems solved, just don't cheat and look at the answers without struggling for some time ;)
Snarkiness aside, taking what you're talking about to the furthest degree is exactly what a database (RDBMS) actually is. People had the same idea 50 years ago and have been building and refining such solutions since then.
The reality is that instead of taking a perfectly good database and sucking it into Excel for its analytic capabilities, you should be learning how to use any of the DB-specific analytic/reporting tools (SSRS, SSAS for SQL Server etc.) or a DB-agnostic tool like Pandas or R.
The reason you're getting a snarky comment is because to those who already know databases and their capabilities your question amounts to "I drive my Honda every day and know it really well so what should I add to it to make it a do a 5-second quarter mile?". Drag cars are custom made for a particular purpose and if you put an enormous amount of time and money into your Honda you will eventually replace every part until you have a dragster. Just start with a dragster instead, they're free and are exactly the tool you're after.
R
Pandas
MS SSAS
I'm biased because I do a lot of stats stuff, but R is a very different language than any of those. Thinking in terms of vectorized functions rather than loops - which you need to do to be a good R programmer - makes for a significant, challenging change of pace. It also has packages that let you integrate it with C++, Python, and Java, among other languages.
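A tiny example of that mental shift, comparing the loop style you'd use elsewhere with idiomatic vectorized R:

```r
x <- 1:5

# Loop style (how you'd write it in C or Java):
squares <- numeric(length(x))
for (i in seq_along(x)) squares[i] <- x[i]^2

# Vectorized style (idiomatic R): the operation applies element-wise.
squares_vec <- x^2

identical(squares, squares_vec)   # both give 1, 4, 9, 16, 25
```

On toy input the difference is cosmetic, but on large vectors the vectorized form is both clearer and dramatically faster.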
May be overkill, but take a look at RStudio for the R statistical programming language. Fully functional, professional, open-source, statistics IDE. Cannot recommend enough.
then do something like:
x = rnorm(10, mean=2, sd=0.5)
This will generate:
1.75884164412553, 1.96923295305206, 2.02906575504054, 2.84513526976282, 2.2049150744444, 1.73318266409414, 1.62611322640113, 1.2014866750171, 2.0473842615968, 1.92262243708622
I'm not OP but I am pretty sure he is referring to R, the statistical computing software. You might be familiar with similar programs like SAS, SPSS, or perhaps Minitab since I think they are used often in education.
I'm a bit surprised that he chose R to mention specifically for finance... I worked in M&A for a time, and never had to move beyond Excel and Access. Both programs were VITAL in almost every job I've ever had, from real estate to finance.
You can do relatively simple statistical analysis in SQL. It's probably more common to store the data in a SQL database, and use SQL to select data for processing by statistical software like R.
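A hedged sketch of that division of labor, using the DBI and RSQLite packages (assumed installed; the table and column names are made up for illustration):

```r
# Keep the data in SQL, pull a subset into R for the statistics.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "measurements",
             data.frame(grp = rep(c("a", "b"), 50), value = rnorm(100)))

# SQL does the selection; R does the analysis.
subset_a <- dbGetQuery(con, "SELECT value FROM measurements WHERE grp = 'a'")
t.test(subset_a$value)

dbDisconnect(con)
```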
You can dive into R, which is an extremely powerful statistical programming tool, which is available for free and due to all the available libraries very flexible. The professional statistic packages, like SPSS or S have a rather steep upfront cost, but are more convenient to use...
Linear algebra plays a huge part in graphics programming, and also now in web development in the form of CSS transformations (a lot of this should look familiar starting about halfway down the page: http://franklinta.com/2014/09/08/computing-css-matrix3d-transforms/). Developing a physics engine for a video game and/or CGI sequence is also an exercise in linear algebraic heavy lifting, where you need to be able to simulate realistic movement by generating systems of equations to solve for transfer of energy in collisions, friction, drag, fluid dynamics, etc.
Also, when you get into handling large amounts of data, programming languages like R lean pretty heavily on linear algebra to analyze and present that data in a meaningful way.
The company I work for uses COBOL for its billing system. It was antiquated 15 years ago, with no sign of replacement on the horizon. There are many industries that can't seem to shake COBOL. Becoming proficient in it may help you.
If you're looking to be a Business Analyst, maybe the language you need to know is VBA. The BAs at my company work in Excel, Access, PowerPoint, and Visio to create project documents which support the development team. Our other analysts also benefit from VBA training, since it helps them automate their reporting. VBA is easy to learn, with a veritable cornucopia of resources online to guide you.
Analysts can also find use from R: a language specifically designed for statistical reporting: http://www.r-project.org/
I find most non-programmers (as in, people whose primary role is not development) learn a language out of necessity. You may focus your attention on Java, only to begin working for a company who wants you to code in C#. Experience in one language will help you in another--and always looks good on a resume.
In the end, I wouldn't recommend learning COBOL. There are many more modern languages you can learn which will help you achieve your goals.
It might not be 100% corresponding to your situation, but this document on R and FDA compliance is where you want to look, and most likely reference in your business proposal. It's the official position of the R Core team on how to handle the open-source aspect of R in a regulated environment.
Bioinformatics? Computers? Biology? I study plants and ecology but this is the foundation of everything I needed:
Python. Learn it. Here is a link to a free public MIT course on edX. The summer class starts on June 11th. It's how I learned, when I realized there was cool shit I wanted to do with GIS but didn't know what I was doing.
R and S. R is free, open source, and does so much more than Excel. It also is a bridge between GIS and bioinformatics. Learn them. I am currently looking for a book my professor wrote on studying bioinformatics using R... it's very comprehensive and a great reference tool.
ArcGIS. Become familiar with this. It's mapping software for ecology, but I've been working on stuff that uses it to look at speciation and plant physiology very quickly. MIT just released drones that basically use GIS software modules to interpret HIGH detail/information lidar images in order to do gross spectrographic analysis of crops. Very very cool.
A student license of ArcGIS software can be purchased via Amazon for 50 bucks with a manual on how it works. There are many levels of manuals: ones for Python, ones for getting to know the desktop modules... You can grab any of the texts through your university license; the key is not paying $1600 for a professional license. Or you can put on your pirate hat. I don't recommend the pirate hat; the connection to ArcGIS Online is really nice.
Additionally, MIT also posts their bio courses:
Genetics is... interesting. You're going to have to study genes in-depth and understand a little chemistry to really get it. Hopefully someone with more background will post on where you can learn about that.
I haven't seen too much in-depth information about NBA analytics, but I've seen some that use R.
You can learn about it here: http://www.r-project.org/ And here: http://cran.r-project.org/doc/manuals/R-intro.html
Stats 141: Statistical Computing taught by Duncan Temple Lang
of R fame: http://www.r-project.org/contributors.html
There is a lot going on in the Davis stats program, and the computational stats program definitely offers the skills to get a job right out of school. You should talk to counselors.
As an aside, you can be setup in R in 10 minutes:
You now have the same stack as some of the most advanced quantitative researchers in the world.
Thanks. I'm sure you've seen the Wiki page already?
http://en.wikipedia.org/wiki/Principal_component_analysis
Good links there. Not sure you need to go as deep as the Wiki page, but here goes. Remember, this is going to be rickety.
On the "math" side:
First, know single-variable calc like the back of your hand. Learn multivariable calc and study the important theorems there. But depending on what you're doing, you may not need too much theory (Real Analysis, etc.). Then you need linear algebra (but not too much, per se, just matrix stuff: rank-nullity, Jordan decomposition, etc., so you have the tools to manipulate matrices enough to make them do what you want). Then you need matrix calculus. Because matrices are in general non-commutative, the formalism will be tedious at first but you'll get the hang of it. Remember: do lots of exercises. You basically need to be good at taking first and second derivatives with matrix variables. Also, use computers to your advantage for actual calculations. Use an open source environment such as R. http://www.r-project.org/
On the "statistics" side:
Probability theory (continuous variables and distributions, univariate and multivariate), expectation/variance/covariance/Bayesian analysis/maximum likelihood estimators, etc. You'll need the multivariable calc. here, so learn that first.
Learn basic Econometric concepts such as regressions/regression analysis, which will use all of the above concepts.
Overall, just do some initial fact finding with the help of Wiki and Amazon to compile a list of recommended resources/reading. You'll have to come up with the level of formalism you're going for based on what your goals are for PCA. E.g., re-learning calc with Spivak may be the "best" way to go, however, you'd be drinking from a fire-hose.
Also, if you're taking a linear algebra class that requires MATLAB, doesn't the institution where you're taking the class provide free access, via an institutional license, to MATLAB and other fee-based programs such as Stata and Mathematica?
Lots of folks seem to like R. However, I've never used it--I tend to stick with Matlab for most things...
EDIT: Link is here: http://www.r-project.org/ and R replaces only a subset of Matlab functionality (a much smaller subset than Octave), as I understand it.
> the curried function F x ... how much code that would have taken you in a non-FP language.
How much code would that take you in Haskell without currying? Eight more characters, for the anonymous function? This is about the weight of the burden in any language with closures but not automatic currying.
> [your] strict CS arguments
You seem a bit narrowly educated. Here: you said that currying was crucial; I've pointed out that it isn't. You can go off and be enlightened, while still preferring the whole package of a particular FP language that leans on currying. Although, here's something to test that preference.
R, and ggplot2. Excellent quality graphics, with all the statistical backing an engineer will ever need.
R is a statistical-analysis language, an implementation of the 'S' language from Bell Labs.
http://www.r-project.org/
ggplot2 is a graphics package for R. http://had.co.nz/ggplot2/
Consider R. Once you get rolling you'll be ten times more productive than in anything else. Most statistical research and much machine learning research is published in R. You get vector and matrix math and publication quality graphics right out of the box. It's not hard to run in parallel and it's free.
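The out-of-the-box matrix math looks like this (a small sketch, solving a made-up 2x2 linear system):

```r
# Solve the linear system A %*% x = b with base R alone.
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(5, 10)

x <- solve(A, b)   # x = (1, 3)
A %*% x            # recovers b
t(A)               # transpose; crossprod(A) gives t(A) %*% A
```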
The inbuilt plotting stuff in R looks like it'd be just what you want. Personally I use matplotlib via SAGE though. God knows what good it would be to get the slope of a line you draw yourself.
Several of us learned it by working through exercises from Paul Graham's On Lisp. That book wasn't available as a free download back then. Peter Seibel's Practical Common Lisp is also a good tutorial.
The real challenge in Lisp programming is finding an implementation that you can live with. Most Lisp die-hards refuse to recognize that Lisp machines are dead and Unix won. Anything involving system I/O remains a non-ANSI kludge that's not portable between implementations. This was nearly ten years ago, mind you, so things may have changed.
I ended up using R as my Lisp. It's not Lisp, but it does let you carry over the concepts. YMMV.
If you can model the numbers appropriately, then you could try http://anydice.com
If you're not too bad at programming, then you could use R to model interactions to test for various results.
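As a hedged sketch of what that modeling looks like, here's a simulation of rolling 3d6 (the dice pool is an assumption for illustration):

```r
# Simulate rolling 3d6 100,000 times to estimate the distribution of the total.
set.seed(1)
rolls <- replicate(1e5, sum(sample(1:6, 3, replace = TRUE)))

mean(rolls)                  # should be close to the theoretical 10.5
mean(rolls >= 15)            # estimated probability of rolling 15 or more
hist(rolls, breaks = 3:18)   # shape of the distribution
```

Swap in whatever mechanic you're testing (rerolls, exploding dice, etc.) inside the `replicate()` call.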
Yes I've created a Google Form based around the programme described in The Rock Climbers Training Manual and the training diary that comes with that.
It's simple and easy to use on a mobile phone or desktop/laptop, and all data gets entered into a Google Sheet. This offers a huge advantage over recording it in a calendar application, in that all of the data is in one place; it would be a pain for me to have to go through a calendar, extract all the entries, and record them in something I can then summarise. Which leads me to...
When I've time I'll be sitting down and writing R code to summarise and display the data and intend to eventually turn it into a web-page that I can navigate using Shiny.
I intend to develop and share my code on Github and will post details when I've found time to actually get things up and running (haven't had time in the past year due to work and wanting to actually climb but have accrued a fair amount of data to work with).
You can do this stuff for free in R; it's open source, but the interface is a lot more like Linux than Windows.
Then there's SPSS, made by IBM; it's more like Windows: not as customizable, but it works more easily and has a nicer GUI. Costs as much as the down payment on a car, though.
I've added a comment on here with some details on how to follow and get notified once I have the schedule in place.
Data science and analytics is essentially the science of interpreting what data means so as to make intelligent decisions from it. If we have very large datasets and interpret them correctly, then there is information to be gained. An example of where this is used is the stock market: traders oftentimes employ data scientists to examine data related to stocks, which may tie to things like customer sentiment, to see if that is a valid indicator of price direction. This could then be used to influence a buy/sell decision.
It largely employs programming that integrates very closely with statistics. The information is then often presented in a visual manner through various types of graphs, charts, and other visualizations that are intended to make it more intuitive what the data means.
The R language is very popular for this type of programming (http://www.r-project.org/) due to its very close relationship with statistics and the fact that it is highly optimized for operating on lists, vectors, and matrices, which all come heavily into play with this type of analysis. R also has great support for various visualizations. d3 is a JavaScript library that is extremely powerful and flexible for producing visualizations as well. A common use case would be to do the data analysis and processing in R, and possibly the visualizations there as well, integrating with d3 when you need real-time or highly customized/complicated visualizations.
Disclaimer: This will probably not solve your problem.
Analysts in our company make heavy use of the R language and also of Julia language.
Maybe you could also take a look into these technologies, as their capabilities are truly awesome.
Julia: http://julialang.org/
Has she heard of a statistics program called R? It's entirely designed for statistical calculation, and really fast with things like getting all instances from an array (T = tweets.matrix[1:20,]). Download that and RStudio and give it a try; it might be a lot faster and more appropriate than Excel.
I don't know anything about AnyDice, but this is very simple to do in R (statistical programming language, http://www.r-project.org/; there are web interfaces: http://pbil.univ-lyon1.fr/Rweb/)
For an R code example, see http://pastebin.com/ZRp4X7f4
It's pure preference, but I figured I'd mention that the R language (get the RStudio IDE with it too) is another great tool related to (and sometimes used as an alternative to) MatLab/Octave for those new programmers who love math/statistics. It depends on what you are doing, but it is worth at least checking out!
This is where you can download the program:
And here are some tips on getting started/tutorials/resources:
http://scs.math.yorku.ca/index.php/R:_Getting_started_with_R
R is a powerful statistics tool widely used in scientific research.
I'm surprised nobody has told you yet to use R as statistical software. It's been three hours since you wrote that.
Having said that, BH^2 does not require any calculations beyond a simple calculator.
Yes, it's very heavy irony. And I don't understand the 6 upvotes, but so be it.
Spreadsheets and VBA macros can make you the star of a small surveyor's office, at 40 euros a day; but you won't get any further than that.
Learning Excel takes maybe a few very part-time weeks; it's easy, and for that reason it certainly doesn't give you a big advantage in the job market. It's a useful tool, but, like Word and PowerPoint, don't expect it to make a big difference.
If you graduate in Economics, depending on what you go on to do you'll need other tools (all of them approachable): "customized" management systems for accounting, Stata or R for calculations and models, databases of various kinds from which you pull your reports either through graphical interfaces or with SQL, and so on.
Since you have Mathematica and gnuplot, I can only think of R as a possible addition, but I don't know how well the Pi handles it with the computing power it packs.
In the pics I see the bitscope micro I think, what have been your experience with it?
> I could swear I've seen STATA™ written somewhere
Many people regularly mis-type "STATA" in forums and mailing lists so I wouldn't be surprised if you've seen it elsewhere.
> For other programs, based on acronym defn: Matlab? r?
As far as I know, the name MATLAB is short for "matrix laboratory".
As for R, its name stems from the authors' initials (Ross Ihaka and Robert Gentleman) and a pun on Bell Labs' 'S', a language developed by John Chambers. See the R FAQ entry.
You could do a generalized linear model. It is similar to linear regression except the dependent variable is not normal (e.g., binomial, Poisson). In your case, you have a binomial dependent variable (1 = presence of MI, 0 = absence of MI). This is a simple calculation you can do in R (an open source statistical software). I can provide code if you are unfamiliar with R, or I can look up how you can do it by hand.
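A sketch of what that looks like, using simulated data in place of the real MI data (the predictor `x` is a hypothetical stand-in for whatever covariate you have):

```r
# Logistic regression: a binomial GLM on simulated presence/absence data.
set.seed(42)
x <- rnorm(200)                            # hypothetical continuous predictor
y <- rbinom(200, 1, plogis(-1 + 2 * x))    # simulate a true positive effect of x

fit <- glm(y ~ x, family = binomial)
summary(fit)     # coefficients are on the log-odds scale
exp(coef(fit))   # exponentiate to get odds ratios
```

With real data you'd replace the simulation with your own data frame and write `glm(mi ~ predictor, family = binomial, data = yourdata)`.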
While it is probably overkill for dealing with t-tests and the like, R is free and all-powerful. The online calculator provided by redvelvetpoptart is fine, but you may find yourself offline, in which case it's nice to have something handy. Additionally, t statistics can be calculated fairly easily in Excel using TDIST() and TINV().
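For reference, the base-R analogues of those Excel functions, plus a full t-test, look like this (the two samples are made up for illustration):

```r
# t-tests and t-distribution lookups in base R.
set.seed(7)
a <- rnorm(20, mean = 0)
b <- rnorm(20, mean = 1)

t.test(a, b)                  # two-sample Welch t-test, p-value and CI included
2 * pt(-abs(2.5), df = 30)    # two-tailed p-value for t = 2.5, like Excel's TDIST
qt(0.975, df = 30)            # critical value for alpha = 0.05, like Excel's TINV
```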
As a practical matter, a conservative approach is to take the next-lowest degrees of freedom in the table (e.g., if the jump is from 80 to 90, use 80 in this case). This will result in a very slightly biased estimate of the p-value that will bias the test toward a failure to reject the null, but usually the degree of bias is trivially small (and if it weren't, the df table would probably include a closer intermediate value).
Yeah, you could sorta do this in Excel. You kind of need the data table though, but it can be "hidden" so it's not visible to users. If you're interested I could describe one way that would sorta work.
But really- this is a tough thing for Excel, and a very easy thing for other software. Consider "R" R-Project for statistical computing. Also, Mathematica, or Matlab. There are also a number of much smaller and more basic plotting utilities out there for free.
Nice! That might be worth a new post, IMO (maybe right after Selection Sunday?).
I basically took KenPom's raw data sheet, deleted everything but the team name, AdjO and AdjD, and dumped it into R. Here's the script I ended up with, though someone who uses R more frequently than me could definitely improve on it:
mydata <- read.csv("kpteamstop100.csv")
plot(mydata$AdjOE, mydata$AdjDE, xlim = rev(range(mydata$AdjOE)),
     xlab = "Offensive", ylab = "Defensive", pch = ".")
text(mydata$AdjOE, mydata$AdjDE, labels = mydata$TeamName, cex = .75)
No, just R and a few related libraries. R runs on Windows, Mac, or Linux... if you don't have much coding experience, it might be difficult to use this code, but a lot of people have messaged me asking for a web app version... so, I'll see what can be done :)
Sounds like a sensible plan; I would definitely put that in your personal statement.
If you haven't yet, I'd start getting acquainted with R and Bioconductor now, as they can have pretty steep learning curves.
It looks like OP can read these free slides and take a look at the R scripts that come with the fda package.
I feel obligated to mention that R is a great free option for statistical analysis. It's very flexible, and you can usually do a quick google search and/or read documentation to figure out how to do something. It may take a little while to get familiar with it, but it's a great tool to have available.
A pivot should satisfy #1; maybe a few pivots, depending.
Just based on A17 though, if you know pivots and VLOOKUP, maybe you should look into R; it may be easier to answer these using that.
You'd also want clarification on "accuracy", for OfferID#3, RawData:Offcat2 is "Hair_Services/Haircut", but True:Offcat3 is "Hair_Services/Haircut", match or not?
I'm also astounded they gave you the file in advance...
My SO and I are similarly nerdy. He really likes graphs, which is part of the reason he convinced me to get a scale. We're tracking weight and measurements, but we only have about three data points for measurements, so we don't bother to chart it. This is our graph over the last three months: link
At some point, we realized the cat was getting a bit pudgy, so we're tracking her weight, too (multiplied by 10). The confidence interval on her weight is still huge, because we don't have many data points.
The graphs were made with R. We like watching the trendlines change ever-so-slightly in the morning.
Sounds interesting, and I'm sure some people would like to follow up further, but the linked blog doesn't seem to make clear basic things like:
This date is not significant, is it?
> Sunday, April 1, 2012
> More Examples of Vilno Table
Is it some kind of April fool's joke that I'm not getting ?
> ... the Vilno Table white paper ...
> ... Vilno Table Programming Language white paper ...
Where can that be found ?
> Tuesday, April 12, 2011
> Produce Statistical Tables Really Fast
> So what is this new product?
>
> It's an implementation of a new statistical programming language, now up and running in beta mode
Where can the implementation be found ? If it was beta last year, how is it now ?
> A Statistical Table Programming Language, much faster to use than SAS
The R programming language does that.
I'd be curious to know more. Apologies if I'm missing something obvious.
I'm not sure what field you're in, but I'm thinking that you're SOL if you want to analyze larger data sets. I've never used Datascape, but you're probably going to want a stats program. R is free, and you can pretty much find any paper online that tells you how to look at your data; R's website is here. I know it's really not what you're looking for, but, at least in my experience, if you want to visualize large datasets, you have to put some effort into it.
I really hope someone has a better solution for you because it would help me out as well at times. Good luck!
What? No...
I imagine you can use it just fine to do some aspects of biological research but "R is a free software environment for statistical computing and graphics." http://www.r-project.org/
Here's a book: http://www.amazon.com/Data-Analysis-Graphics-Using-Example-based/dp/0521861160
From a review of the previous edition: "The text includes a wealth of practical examples, drawn from a variety of practical applications, which should be easily understood by the reader. The methods demonstrated are suitable for use in areas such as biology, social science, medicine and engineering. The core of the book is taken up with detailed discussion of regression methods, which leads on to more advanced statistical concepts." (ISI Short Book Reviews)
I'm a bioinformatics student and took a statistics course as part of my degree. They taught the programming language R there (see http://www.r-project.org), which is designed especially for statistics and plotting. The whole simulation is only 5 lines of code in that language.
edit: posted the code, but the formatting was screwed up, making it pretty much useless, so I removed it again.
Since you're asking for programs, I think you probably want something along the lines of Matlab or Mathematica. If you want free, there are (at least) two good open source packages:
Either of these will do anything you could possibly want with 3D data. In Octave, for example, polyfit is a function that returns the least-squares polynomial fit to your data.
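R has an equivalent for this kind of least-squares polynomial fitting. A minimal sketch on made-up data (the data and the choice of a quadratic are purely for illustration):

```r
# Quadratic least-squares fit in R, analogous to Octave's polyfit.
# The data below are synthetic, just for demonstration.
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x - 0.1 * x^2 + rnorm(50, sd = 0.2)

fit <- lm(y ~ poly(x, 2, raw = TRUE))       # degree-2 polynomial fit
coef(fit)                                   # intercept and polynomial coefficients
predict(fit, newdata = data.frame(x = 5))   # evaluate the fitted curve at x = 5
```

`predict` then evaluates the fitted polynomial at any new points you like.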
Excel is actually pretty limited in the amount of statistical analysis and model building you can do with it. Most of the really interesting parts are basically hard coded, so you'd have to code your own package to do something customized. You might try the open source statistics language R for finding the statistical significance of the trend and/or population weighting.
Some people have built some add-ons for Excel that allow you to use R scripts in Excel, but I haven't used them.
As a rule, I wouldn't recommend R to a raw beginner: the language has a lot of quirks that steepen the learning curve, even for someone with some programming background. But. There is a package for R called quantmod that has a lot of nice functionality built in. For example, you can download and plot the price history of a stock symbol with just two commands.
Learning R from scratch with no programming background would be a challenge, but it's a good fit for this problem domain.
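For reference, the two quantmod commands look roughly like this (assumes the quantmod package is installed and you have an internet connection; the default data source may change over time):

```r
library(quantmod)   # install.packages("quantmod") first

getSymbols("AAPL")  # downloads AAPL's price history into the workspace
chartSeries(AAPL)   # plots price and volume as a financial chart
```

Swap in any ticker symbol you care about for "AAPL".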
There are also other statistical packages in use. SAS is high-powered and costs a lot. R -- yes, just the letter R (http://www.r-project.org/) -- is also very high-powered, and is open source, and is free. It's mostly command-line oriented. SOFA (http://sofastatistics.com) is much more basic. It is also open source and free to download, and it has good explanations of the statistical tests it uses.
I have a hunch that one can do well using the much more popular R (http://www.r-project.org) for the needs of this class. It also has a slick IDE, RStudio (http://rstudio.org), which can run on a webserver for multi-client access. R is stats focused instead of matrix focused, but for intro ML I am confident both programs are up to the task.
I guess the official homework is in Octave/Matlab, but since the points don't matter, don't let that hold you back.
Short answer - maybe try this http://tinyurl.com/3om7zos (website with sample code here: http://www.financialriskforecasting.com/book-code ).
Long answer - if you have really no experience with anything beyond Excel, maybe start with R (http://www.r-project.org/) and add the Rmetrics packages. It's not as polished as Matlab, but it's free. If you have success with R, maybe then think about making the investment in Matlab? Just a thought...
Edit: FYI I think you'll also need to buy the Optimization Toolbox - pretty sure the Financial Toolbox requires it.
If you are a researcher, then it probably will pay for you to spend some time getting used to software that does this sort of thing.
Excel can handle simple regressions and ANOVA, so it wouldn't hurt to learn how to use it.
The next step is to learn how to use R, a free, open source program that many professional statisticians use. There is a steeper learning curve, particularly for reading your data into the program, but after that it is very powerful.
We are strongly encouraged to use General & Generalised Linear Models and to learn R: http://www.socialresearchmethods.net/kb/genlin.php
It takes a bit to get used to, but gives you a lot of flexibility. However, I don't know if you'll need to do anything like this.
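As a concrete example, a generalised linear model in R is a single call. This sketch uses the built-in mtcars data; the choice of variables is arbitrary:

```r
# Logistic regression: transmission type (am) as a function of weight and horsepower.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(fit)  # coefficients, standard errors, and deviance
```

Changing the `family` argument (gaussian, poisson, binomial, ...) is what moves you between the different members of the GLM family.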
R is a software language for statistical analysis, see the R-project. Basically what I could do is write a few lines of code to plot relationships between the different variables, run a linear regression and report what (if any) relationships exist in the data. The reason I would use R and give you the code would be that you could reproduce what I did instead of saying "some dude in his underpants said there was no relationship" :)
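Those few lines would look something like the following; this sketch uses R's built-in trees data in place of the real variables:

```r
pairs(trees)                                      # scatterplots of every pair of variables
fit <- lm(Volume ~ Girth + Height, data = trees)  # linear regression
summary(fit)                                      # which relationships are significant
```

Anyone with R installed can paste that in and get exactly the same plots and coefficients, which is the reproducibility point being made above.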
I often wonder how http://www.r-project.org/ does.
I imagine he gets a lot of accidental traffic from the /r/ system.
(that's where I go if I just enter r into the url bar and forget to select the /r/ I usually go to from the quick list).
I am going to graduate school currently for statistics, and my general feeling is that the job market is better for statisticians than for most other (non-science) related majors. It's my recommendation that you give R or SAS a try though, if you haven't been exposed to those languages before. They both come in handy for statisticians, so it always makes you more marketable to know them. R is probably the best to start with because you can download it for free and the documentation is great. Link
Probably Coda or BBEdit. I write code for R and I have very limited web development experience, but both of those programs have some cool workflow features.
Textwrangler is nice because it is free, extensible, and full featured enough to do good work (syntax highlighting alone makes it worth it).
I got into CS through the university's computer-science program, which has since been replaced by a brand-new program which I'm in the process of starting up.
I basically did everything myself: I did all of the homework, and the final project was just a matter of applying the techniques I learnt from the homework to the problems we were set.
I use Python/Scala/C# for my work, but I'm not sure if there's anything else for me in the field. I'm actually not quite sure about any programming language, but I'd love to use Java for the final project.
I use R, and use a lot of R packages.
I use Microsoft Visual Studio, and use a lot of Visual Studio packages.
You're probably worrying about it too much, but I usually cite the version number just to make it a bit more clear, and then either the URL or a reference depending on the venue.
"All analyses were conducted in R (R version 4.0.1, http://www.r-project.org )"
"All analyses were conducted in R (R version 4.0.1 [1])"
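R can generate both pieces of information itself, which helps keep the citation accurate:

```r
citation()        # the canonical reference entry for the running version of R
R.version.string  # e.g. "R version 4.0.1 (2020-06-06)"
```

`citation("pkgname")` does the same for any installed package, if you need to cite those too.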
Hope this helps:
They used a well-validated physical activity questionnaire, the International Physical Activity Questionnaire (IPAQ), to assess subjects' fitness. If you want to see the actual survey questions, they're easily found via Google. Now, one point of interest for r/MMA is that the researchers excluded those who train in martial arts. So take that however you will.
They did control for bodyweight in their analyses.
Thanks! I appreciate it!
I used Pro Football Reference and MockDraftable for most of my stats and measurables. FFToday was the most useful resource for determining fantasy points/positional rank from previous seasons.
I used Excel to compile my data and I downloaded the Real Statistics add-in to manipulate it. I know that a lot of people like to use R, but I'm not familiar with it at all.
I've done similar predictive models for QB, RB, and TE, but never got around to posting articles for them. I've compiled everything into what I like to call PREDATOR Scores. Here's an intro that I wrote up that may be of some use to you. Apart from that, Kevin Cole from RotoViz has done some similar work that's worth taking a look at as well.
Best of luck!
I'd agree with that; I've not looked at many fitness apps. Personally I use Endomondo to log my commutes and hiking, and that's about it; it's not very useful for climbing. I use the Beastmaker app for my fingerboarding, but it's pretty useless if you use a custom workout.
It is the easiest way of collecting lots of data though.
Personally I've used Google Forms to emulate the Rock Climbers Training Manual to record my own training. It's not perfect, and I'm a long way off having finished writing any code to present the accumulated data in a sensible format. My intention is to write a Shiny interface in R and host the code on GitHub, so that anyone can take a copy of my Google Form and then use the code to summarise their own training. I just have too little time available, with work, a 2yr old, a wife, a few friends I like to keep in touch with, family, and actually trying to climb myself, to sit down and do a great deal. This wouldn't lend itself to mass accumulation of data across people, though, in the same way applications would.
The easiest way is to train a support vector machine to perform regression on the points you have.
The result is a function which maps the n-dimensional grid to the approximated function value. (This even works with thousands of dimensions)
With R this could be done with:
library(e1071)
model <- svm(functionValue ~ ., data = yourDataMatrix, type = "nu-regression", nu = 0.01)  # nu must be > 0, so a small value stands in for "almost no slack"
this call gives the model of the surface.
With the calculated model you can draw points from the surface function:
predict(model, <InterestingDataPoints>)
Each row of the <yourDataMatrix> variable needs to contain the coordinates of one point and its function value. The <InterestingDataPoints> variable is a matrix of all interesting coordinates, again row-wise.
If you have noisy data you might set nu to a higher value. It is an upper bound on the fraction of points that are allowed to be predicted wrong, basically a regularization term.
Have you considering using R? It's totally free and the academic standard. You can make gorgeous plots too. I'd be happy to help you with code--I write R code all day, every day :)
The website is confusing, but the SCCS database is freely available. You can find it on this page in SPSS .sav or .RData format. If you're not familiar with either of those formats, I'd download the latter and open it in R (a free and open source statistics package). From there you can easily export it to a more widely used format, like a CSV:
write.csv(sccsfac, file="sccsfac.csv")
That will open in Microsoft Excel, for example.
If you're using it for research you should look at the journal <em>World Cultures</em> which covers issues surrounding the SCCS and its parent database the Ethnographic Atlas, and archives various versions of both of them.
I use R, a statistical programming language: http://www.r-project.org/. And the package I'm using is ggplot2: http://docs.ggplot2.org/current/. The hard part was not making the graphs, but converting the PDF file to a workable table... took me a couple of hours :(.
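For anyone curious what the ggplot2 side looks like, here's a minimal sketch on R's built-in mtcars data (the columns are placeholders for whatever comes out of your own table):

```r
library(ggplot2)  # install.packages("ggplot2") first

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                  # one point per observation
  geom_smooth(method = "lm")      # linear trend with a confidence band
print(p)                          # render the plot
```

The real work, as noted above, is usually getting the data into a tidy data frame in the first place.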
The scientists will usually have written the program that runs the model themselves. Frequently the model involves "simply" solving a huge number of complex mathematical equations a huge number of times. Writing a program to do this doesn't require years and years of training as a software developer, anyone with a decent understanding of mathematics should be able to write that kind of program without too much trouble.
There are programming languages and systems designed specifically for this sort of task, a well-known and in-vogue example would be the R language.
Yay, the VBA code worked... which is good... as I know next to nothing about VBA at this point - I had to search the web on how to open the VBA editor :/ Guess I need to learn - which is why I just enrolled in the 'Intro to Excel VBA Programming' course that was promoted on this sub recently ;)
As far as the copy-n-paste goes... that's effectively what I'm already doing now, just using a programming language/environment outside of Excel (R, as mentioned in my original post) to generate the sequence and then copy-pasting into the cell.
I'm a little fuzzy on how to go about removing numbers from that list as the cards get collected... for example, if I have a sequence 1, 2, 3,...9, 10 showing in the cell, and I pick up cards 2-7 and want to remove them from the list in the cell... how?
Like I said above, I know nothing about VBA... but would it be possible to have the formula stored in a cell in a hidden column, and have it write the actual values to the 'Card Numbers Outstanding' column/cell, not the actual formula? But then there'd have to be a way to trigger it manually, so it doesn't just write over any updates I made to the numbers in that cell...
Sorry, I'm confused as heck when it comes to VBA and what is and what isn't possible.
Learn R. Where these spelling issues arise, both usually work, but typically it is the British spelling that is native (because of R's NZ authors) and the American one that is translated to it under the hood.
This seems to please people.
I'd go with R. It's an open-source command-line stats application: R. For a GUI, RStudio is great: RStudio. Here's a blog post I found on Google about doing PCA with R: PCA blog post.
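In base R the PCA itself is only a couple of lines; a sketch on the built-in iris measurements:

```r
# Scale the variables so each contributes equally, then run PCA.
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)  # proportion of variance explained by each component
biplot(pca)   # observations and variable loadings in one plot
```

`pca$x` holds the scores (the data projected onto the components) if you want to plot them yourself.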
You're welcome. For software, R is free and very popular. I've found it more flexible than Stata (if a little less user-friendly).
Interesting to hear about the salaries. I wonder whether the student thinks that's the typical salary for his field...or just thinks he's exceptional. Have a good weekend!
R is the name of both a software package for (not only) statistics and the programming language integrated into it. It's Open Source and free, you can check it out here: http://www.r-project.org/
What exactly you're doing with it and how much of a math background is required depends on the job and will vary heavily.
> I'm planning to dummy code the categorical variables, so is it even necessary to replace missing values? If someone is missing on gender, they will simply have a 0 for male and a 0 for female. Just not sure if that makes theoretical sense.
In my experience this sounds like a strange way of encoding gender, as you end up with two variables (the '0 for male and 0 for female' bit). It's more normal to have a single variable for gender and encode it as binary (I traditionally use 1 for male and 2 for female, to reflect the number of X chromosomes carried by each, with NA for missing, as is the common way of coding missing data in R).
Anyway, that's a digression. If you want to maximise your sample, then consider imputation, in particular multiple imputation; it works for categorical and continuous variables.
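As a sketch of what multiple imputation looks like in R, using the mice package and its bundled nhanes example data (the package must be installed first; your own model formula would go in place of the one below):

```r
library(mice)  # install.packages("mice") first

imp  <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)  # 5 imputed datasets
fits <- with(imp, lm(chl ~ bmi + age))                    # refit the model in each
pool(fits)                                                # pool estimates across imputations
```

mice picks a sensible imputation method per column (predictive mean matching for continuous variables, logistic/polytomous regression for factors), which is why it handles mixed categorical and continuous data.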