Python has a group of custom libraries called SciPy that does pretty much everything matlab can do, with the benefit of also being an actual scripting language that doesn't make CS majors want to vomit. It's got comparable performance to matlab as well, and most of the basic expressions are carbon copies. I'd say that matlab is better for things you can do with only a few lines, but Python is my go-to for anything larger than that now.
Python is probably well suited via libraries/library bundles such as SciPy and scikit-learn, as well as being pretty easy to become decently competent in. Although interfacing with certain languages like C++ (if the API doesn't expose a common C interface) can be problematic in any language other than C++ itself.
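To illustrate what the common-C-interface route looks like from Python's side, here's a minimal ctypes sketch (assuming a Unix libc; a C++ library would first need `extern "C"` wrappers before it could be called this way):

```python
import ctypes

# Load the C library already linked into the current process (Unix);
# a real project would pass the path to its own shared library instead.
libc = ctypes.CDLL(None)
libc.abs.restype = ctypes.c_int
libc.abs.argtypes = [ctypes.c_int]

result = libc.abs(-5)
```

Anything without a plain C symbol (mangled C++ names, templates, classes) can't be reached like this, which is why the common C interface matters.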
Another approach is to find a scriptable emulator that you like for games you'd like to experiment with and use its chosen scripting language. That's what SethBling did in the OP video.
Mainly Python for general purpose programming tasks (and more specific things like SciPy and NLTK), but also a little R for some statistics stuff (though you can do most of what you do in R in Python). I'm trying to learn a little Haskell, as well.
Your evidence that D has a mature statistical or machine learning library isn't to produce one... but rather to just state that "Dr. Alexandrescu" did his dissertation on machine learning, and hence there's gotta be something out there?
I mean fair enough... but usually when someone asks for a mature library to do feature X, you usually reply by providing a library that does feature X, as opposed to deferring indirectly to someone's PhD thesis.
For example, when someone asks for a mature scientific or numeric library on Python, one simply refers them to http://www.numpy.org/ or http://www.scipy.org/ and that's that. It would be unusual if someone replied by posting links to Guido's personal website mentioning whatever research he may or may not have done and which contains absolutely no code or libraries.
At any rate, the dub package repository reinforces my position that no such library exists. They are either not supported on the most recent versions of D, since as I said D does a lot of changes which break existing libraries, or the libraries listed are, by their very own description, in alpha or beta stage, which is not what most people consider to be mature.
You know that website you're using to ask this question? Yeah, this one. Also YouTube, Yelp, Pinterest, and a ton of other websites. Also some games. And because of some of the third-party tools, it's very popular in data science: you're very likely to be using either Python, R, or Matlab if you go into that field. Pretty much everything except low-level operating system work.
Easiest way is to download a Windows installer for one of the many scientific Python distributions. My personal choice is Continuum's Anaconda, but there are a few more outlined in this SciPy wiki page.
NumPy is a somewhat heavy dependency. But it's not true that it requires linking against Fortran libraries for efficient linear algebra. To quote the SciPy FAQ:
> One of the design goals of NumPy was to make it buildable without a Fortran compiler, and if you don’t have LAPACK available NumPy will use its own implementation.
Your dot products will be painfully slow if you don't have a linear algebra library like LAPACK installed, but everything does work.
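As a quick sketch, the exact same numpy code runs either way; only the speed differs:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((200, 200))
b = rng.random((200, 200))

# Delegates to LAPACK/BLAS when available; otherwise numpy falls back
# to its own slower built-in implementation, with identical results.
c = a @ b
```

So a missing Fortran toolchain costs you performance, not correctness.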
I'd say that python would be a pretty good choice.
It's a nice beginner language with lots of resources around teaching it to novice programmers, but it's also really powerful when it comes to mathematical and linguistic processing.
Take a look at http://www.scipy.org/ and http://www.nltk.org/.
Furthermore, feel free to PM me if you have any questions!
MATLAB is by far the most widely used "programming language" in neuroscience (though it's really a fairly comprehensive software environment for data analysis). I started out working with MATLAB but I've recently switched to using Python, which I use for analysing calcium imaging data. The main advantages to MATLAB are the large numbers of ready-made toolkits and the (generally) good documentation and support. Because it's so widely used in neuroscience there are also lots of people who post useful snippets of code, such as this guy. However, as a programming language I often find it a bit frustrating. Certain things seem needlessly awkward to do - for example you can't index into temporary variables.
Python is, in my opinion, way nicer to program in, and with the pylab package you can do most of the same things MATLAB can do straight out of the box. Depending on what you're trying to do you may find that Python runs a fair bit quicker than MATLAB, and did I mention that it's free? The disadvantages are that because it isn't maintained by a single company the documentation is patchy, and you'll also probably have to get your hands dirty coding if there's some specific bit of functionality you need.
Which one you go for will probably depend mostly on what other people in your lab use - I'd imagine you'll probably start using MATLAB since it's still so much more popular at the moment. It is relatively easy to make the changeover from MATLAB to Python, though, if you do change your mind :-)
You missed this part:
> python modules can be and are coded in C when high order computations need to be done.
This is precisely what numpy/scipy do - offload the computationally heavy stuff to C.
See here for more information about scientific-level Python than you'd ever care to know.
(I don't have first hand experience with the R language, or experience with data science)
This is what I've found before when looking at an article about a chemist using R. It's not chemistry-specific, but it's got good points about data manipulation and the basics of R.
To my knowledge, R is very good for statistics and manipulating data, et c. It certainly seems like there would be some good applications for R in chemistry, especially if you work with a lot of data/statistics.
Python seems to be a good language to learn. SciPy, NumPy, SymPY, et c.
You are using the wrong libraries for data analysis. You should not be using the csv module to load Excel data. It can be loaded in 1-2 lines with pandas,
and you should look at the SciPy stack for data analysis: http://www.scipy.org/stackspec.html
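For example, with pandas (using a small inline table here so the snippet is self-contained; for an actual spreadsheet you'd call `pd.read_excel("data.xlsx")` instead, filename hypothetical):

```python
import io
import pandas as pd

# Inline CSV standing in for exported spreadsheet data; a real .xlsx
# loads in one line with pd.read_excel("data.xlsx").
raw = io.StringIO("time,accel\n0.0,1.2\n0.1,1.5\n0.2,1.1\n")
df = pd.read_csv(raw)
```

Either way you get a DataFrame with named columns, instead of hand-parsing rows.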
I didn't mean to give the impression that I was affiliated with SciPy, but it is an open source project and quite receptive to positive contributions from the outside. Things like the kdtree were certainly outside contributions. Kriging is a common technique used in numerous fields, and your work seems general enough that it could easily be used by scientists in many fields. That seems to be their basic criteria.
They use an MIT/BSD style license: http://www.scipy.org/scipylib/license.html
If you see yourself going into a technical / analysis role later on, consider learning some of the free statistical packages. Both R and Python are widely used in industry, and will give you a leg up on competition in the job market as well as potentially help you in any programming / computational stats classes you take later on.
Instead of stealing a product that's been developed over 30 years by professors and their students and eventually became a company
https://en.wikipedia.org/wiki/MATLAB#History
why not use octave?
https://www.gnu.org/software/octave/
or scipy?
http://www.scipy.org/
Or buy a student version of matlab if you discover that matlab really does have things that are valuable to you?
You may think I'm being harsh here but I know Cleve Moler and he's no Steve Jobs or Bill Gates or Stephen Wolfram when it comes to aggressive pricing or stuff. He loves academics, how about showing him some love back?
One of the most useful features you'll utilize once you get a handle on Python is the incredibly extensive power offered by the various libraries available. SciPy is widely used in scientific/technical computing for the number-crunching tools/techniques commonly used by scientists and engineers. matplotlib likewise is great for graphics and visualization jobs. If you need it, chances are the very diverse Python community has a tool that will be helpful to fellow Python users.
+1 for Pandas.
OP, looks like some people have already mentioned Wes McKinney's book Python for Data Analysis. Definitely worth reading.
Also:
Since I write code for theoretical physics projects (quantum mechanics mainly), I guess I should give some advice. Most anything I've ever had to code boils down to one of three things, which aren't exactly mutually exclusive: matrix-matrix operations, array transformations, and solving sets of coupled differential equations. Find a language where you can do all these things comfortably and efficiently and you'll definitely be off to an excellent start.
I personally prefer C++ for its overall speed and the fact that so many libraries for these routines exist (see BLAS, LAPACK, Intel's MKL, FFTW...). If you're looking for something easier to get started with, I'd suggest Python. The Numpy and Scipy libraries contain many functions that are incredibly useful for the sciences.
    import numpy as np
    from scipy.fftpack import fft

    def calc_fft(time_data, trigger_data):
        """Given time and measured g's, returns vectors of frequency (x-axis)
        and amplitude data, after calculating FFT."""
        print("Analyzing data...")
        # http://glowingpython.blogspot.com/2011/08/how-to-plot-frequency-spectrum-with.html
        y = list(trigger_data)     # make a copy
        Fs = 1 / time_data[1]      # sampling rate (1,000,000 Hz)
        n = len(y)                 # length of the signal
        k = np.arange(n)
        T = n / Fs
        frq = k / T                # two-sided frequency range
        frq = frq[range(n // 2)]   # one-sided frequency range
        Y = fft(y) / n             # fft computing and normalization
        Y = Y[range(n // 2)]
        return (frq, abs(Y))
Blog link is here: http://glowingpython.blogspot.com/2011/08/how-to-plot-frequency-spectrum-with.html And scipy: http://www.scipy.org/
PS: I have no idea if this what you're asking, since your question isn't very clear, but hopefully this helps!
    >>> from numpy import matrix as m
    >>> A = m("1,3;4 0;2 1")  # string form (use "," or " " btw columns and ";" btw rows)
    >>> B = m([[1],[5]])      # nested list form
    >>> A * B
    matrix([[16],
            [ 4],
            [ 7]])
If you've already started learning Python it will get you a long way. Especially with the numpy and scipy libraries.
R is great for interactive data manipulation and statistical analysis, but it is not pleasant to program in. Worth taking a look at, but it won't help you much with your specific goal.
Learning Linux: get a linux distro and install it. Mint, Arch, Ubuntu, whatever floats your boat. Use it for your everyday computing needs. Don't be afraid to learn a little scripting through the command-line.
Not totally related to hardware, but running SciPy on Linux might help with your genetics work too, I guess. http://www.scipy.org/
Any special reason you're building it instead of downloading a binary?
I advise you not to build Numpy yourself unless you really need to (even on Linux). If you're using a 32-bit Python, you can download it from the official site. If it's a 64-bit installation, get it from Gohlke's unofficial repository.
Some people also recommend installing Anaconda instead of the default (python.org) installation but I have no experience with it.
The Python equivalent of Matlab's core functionality is a combination of the libraries numpy, scipy, and matplotlib. Numpy provides the equivalent of a Matlab array/matrix and some basic functions for it, scipy provides a whole lot more functions that deal with arrays (e.g. the equivalent of ode45, etc) and matplotlib allows you to make nice plots from them. On a gross level, Matplotlib can be pretty similar to Matlab (hence the name) and you can set up a Python script with a lot of the same/similar functions by doing:
from matplotlib.pylab import *
When I was trying to learn enough Matlab to read through someone else's code, this site was a huge help and I'd imagine it would be in reverse too. The best other resource I can think of is the SciPy cookbook.
I would start from wav to numpy arrays, then to tensors:

    >>> from scipy.io.wavfile import read
    >>> a = read("adios.wav")
    >>> numpy.array(a[1], dtype=float)
    array([ 128.,  128.,  128., ...,  128.,  128.,  128.])

Typically it would be bytes, which are then ints... here we just convert it to float type.
you can read about read here http://www.scipy.org/doc/api_docs/SciPy.io.wavfile.html
There are plenty of open-source projects that see wide use in astronomy (eg. astropy, SciPy), but these are large and fairly mature, so I don't know how receptive they'd be to an enthusiastic amateur (presumably?) wanting to get involved.
An interesting problem to have a go at programming yourself (if that would be of interest) is an n-body simulation, which is the only way to model any more than two planets/stars interacting gravitationally - there's no equation you can write down to describe this. It's relatively straightforward to get a basic program that does this going, but making it efficient and run fast is an interesting and challenging problem.
Definitely get started with coding, and data structures. There are probably a couple of paths you could choose since you have a physics background. Also note that depending on where you live, there could be start ups / smaller companies that would gladly hire a physics major with coding knowledge. Note that I'm from Irvine, CA so I'll just relate this post to what I've seen.
You could go a more engineering/hardware route and focus on testing of hardware. This would mean writing tests or scrips to test various features of hardware / firmware. I've seen a couple of jobs around where I live that specifically asked for physics majors with coding knowledge. One had to do with testing lasers, and the other had to do with writing test scripts for some kind of drone, and verifying the results of your tests (which is where the physics background came in).
The other route you could go for, and I would probably suggest this as its easier, would be to learn python and some associated math libraries (SciPy for example) and just do something physics related that uses python. I have a friend that uses python to analyze data about star clusters, and she is currently studying astrophysics.
Hope this was a little helpful, let me know if you have any questions!
Python. Experience with a dynamically-typed^1 language is essential for any programmer. Plus, getting to know NumPy, SciPy, and matplotlib is a good thing.
[1] Python uses a variant of this called duck-typing.
> And from my understanding python is fairly popular these days, is that correct?
Absolutely.
It has lot of applications in web development and scientific computing (the SciPy library in particular). A lot of universities are also now teaching it as the introductory language due to its ease of use.
Hi, you just have to read two things to achieve that: "Python for Programmers" and the SciPy documentation.
Python is fairly popular for all kinds of scientific computing, with packages like SciPy, matplotlib, and NLTK. Here's an (inefficient) Python solution:
    from fractions import gcd
    from math import pi

    def rel_prime_prob(n=1000):
        count = sum(1 for a in range(1, n+1)
                      for b in range(1, n+1)
                      if gcd(a, b) == 1)
        return count / float(n**2)

    print rel_prime_prob()  # 0.608383
    print
    print 6 / pi**2         # 0.607927101854
General Comments:
It seems great so far! The code seems pretty clean and fluid in terms of layout. The comments were well placed around the code as well. I tested the program on Python 2.7 <Mac Mavericks> and it worked! :)
Improvements:
Consider renaming some variables like response0 or loop0 and loop1 to more relevant titles (for example, loop0 could become logging_loop) so it's clearer what the code does.
Suggestions:
If you were thinking about expanding the program to crawl over data, maybe you could try adding matplotlib (part of the SciPy stack) to your program to allow users to graph their data after x amount of iterations. Some example crawling bots could include:
Well done though, this was a good idea and it was executed well!
And while you're at that take a look at the SciPy documentation. You can get full SciPy stack installs easily from different vendors like Anaconda or whatever. The full stack distributions are linked on the install page.
Edit: flippin AutoCorrect.
Scipy has several dependencies that pip doesn't manage. So either install it using apt-get (instructions) or make sure you fix the dependencies and try again. I usually try pip first in the hopes of getting the most recent version.
A question, out of curiosity: What do you aim to do with the data?
Anyways, if you have a lot of data points (and you apparently do), you might want to look into numpy, which can handle large, sparse, and complex data better than Python's internal data structures. There's a paper on it here, and there's a pretty good tutorial on it too.
The way certain operations work in numpy, it can use a view data structure on top of the original data instead of fully copying the underlying matrix. I don't know enough about Octave/Matlab to know if they have a similar programming model.
That said, I also don't know if the repmat (tile) and reshape equivalents in numpy would create views in this algorithm or not. I'm really just throwing this out there to point out that vectorized operations don't have to create deep copies in all scenarios.
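A small illustration of the view-vs-copy distinction (reshape can return a view; tile, numpy's repmat equivalent, always copies):

```python
import numpy as np

a = np.arange(12)
v = a.reshape(3, 4)   # a view: shares memory with `a` when possible
v[0, 0] = 99          # writing through the view changes `a` too

t = np.tile(a, 2)     # tile (repmat) returns a fresh copy
t[0] = -1             # so this leaves `a` untouched
```

You can check which one you got: `v.base is a` is True for the view, while `t.base` doesn't point back at `a`.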
+1 for weave. That shit is magical, and definitely my preferred way to mix python and fast C code.
For anyone who's interested: weave lets you write snippets in C and run them in a python function. It's very fast. See a nice example here.
This ^. Basic stats and comp sci appear in all kinds of other subjects you wouldn't expect. Teaching them the fundamentals will help the students no matter what course they eventually choose.
Only thing I would say is don't use excel. Excel is for accountants and doesn't have the scope when you want to extend into more complex mathematics. I find Python is a very good language for playing about with. It has a clear syntax, is incredibly easy to teach to beginners and there are all kinds of useful libraries available, such as SymPy, SciPy and NumPy.
If your university has licenses, you could also try Mathematica or Matlab.
"SciPy’s license is free for both commercial and non-commercial use, the terms of the BSD license." - link
You should be able to use MATLAB commercially if you purchase their license. I think their license is around $2,000, with additional toolkits. You will need some of the toolkits sooner or later, think of it as DLC, hah. Please double check this because I am not actually familiar with MATLAB's licensing policies!
This may be unwanted advice, but I really have to say that a GUI is probably a mistake if you have to ask these questions. Adding a GUI element will nearly triple the size of your code, and if someone on your team is not formally trained in software design, the GUI is going to cause a mess!
edit: Although, if you are a computer science student, and your goal is to learn software design, then by all means! Just be wary about going down this path in a traditional science. It takes a lot of time!
I'll be doing more on the golf club project in the future so stay tuned. In the meantime, of the two approaches I described, you can either work on programming something in SciPy or find a freeware version of a non-linear dynamic FEA tool. You probably want an explicit solver, if you can find one, for an impact problem.
Though I've never used it to confirm, I've heard a lot of good things about Calculix.
While the commercial packages have often had significant development that gives them advantages like better solution stability and quicker solution times, a tool is still a tool. Find one you like. Learn how to use it to its maximum. Learn its limitations. Then when it's time for a step up, you'll know what you're looking for.
There's instructions here. You will need SciPy installed, and you will need to know basic Python syntax, because that's what it's written in - it doesn't have a user interface as far as I can tell.
If you're looking to build in-browser learning games, you'll likely be writing them in JavaScript, regardless. The back-end language is mostly irrelevant, although you'll have to pick one. It will serve as a glorified data valet, shuttling data back and forth between your JavaScript games and some database saving people's progress. These days, the list of "usual suspects" for a web backend is Ruby, Python, JavaScript, or PHP. I'd pick whichever you feel you can get the most support for.
For whatever reason, the folks on this subreddit have a strong affinity for Python. It's definitely a popular choice, but there's nothing about it that makes it especially well-suited for what you're describing.
The one exception might be if the "learning modules" will require some intense, behind-the-scenes scientific computing. Of those languages listed, the Python ecosystem has the most robust scientific computing libraries. See SciPy, for example.
OK, I'm sold - I'll give Python another try. I do like the SciPy suite and iPython notebooks, and PyKep is icing on the cake.
It depends. On Windows it's a mess, because pip doesn't work, but SciPy and Numpy are available as prepared packages from here.
On most Linux systems either the package manager or pip works.
I've said this all over the place but I'm going to reiterate it.
The best thing about R is that it was invented by statisticians. The worst thing about R is that is was invented by statisticians.
R is awesome if you're new/trying to break into DS, but it's sort of inflexible and not very scalable. Start with R and move onto python.
I'd suggest checking out Swirl first. It gives you a really quick rundown of doing stats programmatically . (http://swirlstats.com/). The instructions are really well written and it shouldn't take you longer than a day or so to complete.
When you're done with that, try doing it all again in python. Start by downloading the Scipy stack. (http://www.scipy.org/install.html). There's lots of great books out there. I'd recommend Python for Data Analysis by Wes McKinney.
Good Luck.
I worked in machine learning for 2 years in Python. There are a couple of tools that will really help you.
Numpy is a collection of numerical tools like matrices and functions for working with them. It's very well designed and maintained, with many many users. "import numpy" will very likely be at the top of every program you write.
Scipy is built on top of numpy and has tools for numerical optimization, training classifiers such as SVMs, logistic regression, etc. and some higher level functions like decompositions on matrices. It also has a great package for working with sparse matrices.
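For instance, a minimal sparse-matrix sketch with scipy:

```python
import numpy as np
from scipy import sparse

# A mostly-zero matrix stored in CSR format: only the nonzeros are kept,
# which is what makes large feature matrices tractable in ML work.
dense = np.zeros((1000, 1000))
dense[0, 1] = 3.0
dense[500, 2] = 7.0
m = sparse.csr_matrix(dense)
```

In practice you'd build the sparse matrix directly (e.g. from coordinate lists) rather than densifying first; the dense array here is just to keep the example short.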
PyBrain is another library I can vouch for. As it sounds, it is a library for working with neural networks but also reinforcement learning, clustering, and genetic algorithms.
Long story short there's a big community of people doing ML in Python and there are some features of Python that make this simple to do without so much of the performance loss.
Links to the above-
Seems like everything is python these days.
Check out SciPy and AstroPy specifically.
From there search pypi (the python package index).
https://pypi.python.org/pypi?%3Aaction=search&term=astronomy&submit=search
> Install Python
If you're interested in scientific computing, install Anaconda, or one of the other Python distributions listed on the SciPy site. These come with Python and many of the important scientific packages preinstalled, which makes things a lot easier.
Python Scientific Lecture Notes is a good site for learning about some of the core scientific computing packages.
NumPy added multi-threading support relatively recently (<2 years ago):
many architectures now have a BLAS that will take advantage of a multicore machine
If you are really trying to speed up your numerical operations though, I've heard good things about numexpr, which builds "up" from numpy rather than down.
The idea of numexpr is to dispatch entire expressions to C rather than single operations. So, for example this expression: a = b**2 + c / 2
In numpy there would be three dispatches to C, and intermediate values would need to be returned to Python. With numexpr, there would be a single dispatch to C, and the intermediate values of b**2 and c/2 would not need to be stored, the whole expression will execute in place. Since this code is usually memory bandwidth limited, this is a huge win performance wise :-)
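A minimal sketch of the difference (assuming numexpr is installed):

```python
import numpy as np
import numexpr as ne

b = np.random.rand(1_000_000)
c = np.random.rand(1_000_000)

a_np = b**2 + c / 2                  # numpy: three dispatches, two temporaries
a_ne = ne.evaluate("b**2 + c / 2")   # numexpr: one compiled, blockwise dispatch
```

Both produce the same array; numexpr just avoids materializing the intermediates, which is where the memory-bandwidth win comes from.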
I find that Hinton plots can be really useful to visualise correlation or covariance matrices. Here's how to do them with Python and matplotlib: http://www.scipy.org/Cookbook/Matplotlib/HintonDiagrams
Out of self-preservation I never help other people with programming. There are far too few of me and far too many resources online for me to consider doing it.
The Python website does offer assistance for learning the language, including absolute beginners.
Also, if you're going into biochemistry and using Python, you might want to look into NumPy. This extension of python is excellent for those in the science field as it allows you to do matrix operations, linear regression, differential equations, signal processing, and a lot more. You can see some cool stuff here.
If OP chooses to go the Numpy/Scipy route, this is a good chart of equivalent commends that has proved very helpful in my work (started off using Matlab at school and then later moved to a python/numpy project).
If you have a lot of money to burn, Mathematica or Matlab. If you don't have money to burn, SciPy. Though with any graphing tool that is based on a computer language (e.g. Octave with GNUPlot) you could hack an animation together with a for loop outputting images with an incrementing variable. Then you find software that will put those images together into a movie (which any of the open-source movie editing software should be able to do).
I'm a civil engineering grad student. Friend of mine from Cali told me his undergrad courses used Scilab intensively. I think it's got a solid userbase there, and has the advantage of being more polished than Octave especially if you're a PC user. But it's not intended to be a "clone" and therefore syntax and commands vary.
I want to transition to a free alternative to matlab myself, but I will probably go the SciPy route, because it's a "real" language, can easily package and distribute software to non-coders, etc.
I'm no statistician, but I've found that most things in Python have already been done. In this case, you're probably looking for SciPy. For cross-validation specifically, see this StackOverflow post, and this link to the general cross-validation method in scikit-learn.
SciPy is a mixed blessing for me. I don't have enough training to recognize that the interesting problems have these ready-made solutions, but when I do stumble into one, it's almost always exactly what I was looking for. The drawback, of course, is that SciPy is heavy.
Please let us know if this helped!
Nice code. Try using numpy, which can make your life easier. This uses more memory than yours (I think) but for any reasonable number of experiments it won't matter.

    import numpy as np

    # your code until this line:
    least_required = 10**6

    scores = np.zeros(num_experiments, dtype='int')

    # this runs the experiments
    for i in xrange(num_experiments):
        found_before_complete = test()
        scores[i] = found_before_complete

    largest_required = np.max(scores)
    least_required = np.min(scores)
    avg_found_before_complete = np.mean(scores)

    print "in ", num_experiments, " trials..."
    print "avg needed: ", avg_found_before_complete
    print "most needed: ", largest_required
    print "least needed: ", least_required
    print np.histogram(scores)
You can use something like this matplotlib diagram to graph that histogram on the last line.
Ok here is my triple punch. Since you didn't specifically mention the nature of your script, I'm assuming you mean number crunching.
First level: Code like a Pythonista. This is a simple first step: learn the proper idiomatic syntax. Speed-wise that would be replacing O(n) stuff (lookups in lists, for example) with O(1) (lookups in dictionaries). Memory-wise it would be replacing iterations over temporary lists with iterations over iterators. Simple stuff to learn, and it gives a nice and tidy boost to performance.
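For example, swapping a list membership test for a set:

```python
# Membership tests scan a list in O(n), but hash in O(1) on average
# for a set or dict, so repeated lookups should go through the set.
words_list = ["alpha", "beta", "gamma"] * 1000
words_set = set(words_list)

found_slow = "gamma" in words_list   # walks the list
found_fast = "gamma" in words_set    # single hash lookup
```

Same answer either way; inside a hot loop the set version is the one that scales.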
Second level: Numpy, Numpy, and Numpy. It takes a while to get used to the syntax, but trust me, once you've gone Numpy you never really want to go back. Long story short, it's a simple way to get closer to the metal: data structures are laid out the way processors like them to be.
Third level: As always, all roads lead to C. Now this is the most labor-intensive option, but also the one that will give the biggest speed increase (implemented the correct way, that is). The gist of it is that Numpy arrays can be translated 1-to-1 to C arrays, so the interface between the two becomes trivial. There are a lot of examples on that website to get you going.
Basically, you do operations on arrays of values all at once, instead of one single value at a time. Modern CPUs are optimized for this, so you can significantly increase the amount of work done in the same amount of time, but you are limited to certain mathematical operations. Here is a list of the vector operations NumPy supports, and here is an example of how you'd use one of them.
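A small sketch of the scalar-loop vs. vectorized contrast:

```python
import numpy as np

def sum_squares_loop(values):
    # One value at a time, every iteration paying Python interpreter overhead
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_squares_vec(arr):
    # The whole array at once, inside numpy's compiled loops
    return float(np.sum(arr ** 2))

data = np.arange(1000, dtype=float)
```

Both compute the same number; on large arrays the vectorized version is typically orders of magnitude faster.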
That should be the case (except py26-scipy) but the MacPorts scipy port also seems particularly big; it looks like, unlike most MacPorts, it requires its own gcc toolchain to be built, and not use Xcode's gcc. Ugh! You might want to ask on one of the SciPy mailing lists.
I hesitate to mention this because it is not perfect and it violates a cardinal rule of Mac OS X: "Thou shalt not modify anything in /System/Library". But, because the alternatives are not good, you could try to modify the Apple-supplied Python 2.6 so that it will look for a newer version of ActiveState Tcl. If you are not comfortable with working in a shell, you probably shouldn't try this. The overall idea is described here for Python 2.7. So you'll need to make a few changes. It should look something like this [untested!]:
    $ cd /System/Library/Frameworks/Python.framework/Versions/2.6/
    $ cd ./lib/python2.6/lib-dynload
    $ sudo cp -pE _tkinter.so _tkinter.so.BACKUP   # just in case
    $ sudo install_name_tool \
        -change /System/Library/Frameworks/Tcl.framework/Versions/8.5/Tcl \
                /Library/Frameworks/Tcl.framework/Versions/8.5/Tcl \
        -change /System/Library/Frameworks/Tk.framework/Versions/8.5/Tk \
                /Library/Frameworks/Tk.framework/Versions/8.5/Tk \
        _tkinter.so
Then install ActiveState Tcl 8.5.9.2, making sure that your usage is compatible with the ActiveState license. If everything works, you should have at least a not totally broken IDLE for the Apple Python 2.6 but it will still have some bugs and problems, some of which were fixed in later versions of 2.6 and/or 2.7.
ps. link to Lotka Volterra example in scipy cookbook
Clearly this is not as slick as the video, but the worked example still might be helpful to someone who wants to try using scipy for learning diffeq the "old-fashioned way" :)
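A bare-bones version of that cookbook setup, for anyone who wants to try it (parameter values here are illustrative, not necessarily the cookbook's exact ones):

```python
import numpy as np
from scipy.integrate import odeint

def dX_dt(X, t, a=1.0, b=0.1, c=1.5, d=0.0075):
    # Lotka-Volterra predator-prey equations
    prey, pred = X
    return [a * prey - b * prey * pred,
            -c * pred + d * prey * pred]

t = np.linspace(0, 15, 1000)
X = odeint(dX_dt, [10.0, 5.0], t)   # columns: prey, predators
```

Plotting `X[:, 0]` and `X[:, 1]` against `t` with matplotlib shows the classic oscillating populations.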
Like these?
CDF and PDF and levy-stable?
Good point. I guess if you need to do stuff in a short amount of time, it's a good choice.
I would also like to mention N-d Image (part of SciPy) for python. It has a lot of image processing filters.
If you're using numpy it's a good idea to understand the differences between the array and matrix types. Arrays are the standard matrix/tensor/vector type.
Using arrays you would use c = dot(a,b) for matrix multiplication.
There's a list of the differences at http://www.scipy.org/NumPy_for_Matlab_Users.
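For instance, with plain arrays:

```python
import numpy as np

a = np.array([[1, 3], [4, 0], [2, 1]])
b = np.array([[1], [5]])

c = np.dot(a, b)   # matrix multiplication (a @ b is equivalent)
# note: with arrays, `*` means elementwise multiplication with
# broadcasting, not a matrix product
```

With the matrix type, `*` would instead mean matrix multiplication, which is the main gotcha that page covers.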
If you come from a Matlab background this link will get you the most bang for the buck for just jumping right in.
If you're in a Debian/Ubuntu environment:
$ sudo apt-get install ipython
$ sudo apt-get install python-matplotlib
$ ipython -pylab
And then enter Matlab-ish statements (modulo the differences in the site above) and see how it goes.
In Windows you can download Spyder (also works in Linux), which will get you the same kind of functionality in a more IDE-like environment in one package.
I think this answers it mostly: http://www.scipy.org/scipylib/building/windows.html
Basically, getting set up with BLAS on Windows is difficult, and there are a few different ways to do it, each of which has its own dependencies. Choosing one and making sure it works reliably is hard. Thankfully it looks like different parts of the ecosystem (like wheel packages, OpenBLAS, etc.) are now getting to the point where this might be possible.
This is a really bad website, but here it is: http://www.scipy.org/
Basically what I was doing was matrix work (numpy), uncertainty propagation (the uncertainties library), and symbolic math (sympy, which has a great website)... Also, Python is great for going through lots of data.
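A tiny sympy sketch of the symbolic side, just to show the flavor:

```python
import sympy as sp

x = sp.symbols('x')
expr = sp.sin(x) * sp.exp(x)

derivative = sp.diff(expr, x)       # exp(x)*sin(x) + exp(x)*cos(x)
integral = sp.integrate(expr, x)    # exp(x)*sin(x)/2 - exp(x)*cos(x)/2
```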
Edit : okay, now rereading your question, maybe you asked something else.
The first thing I noticed is that the community is more open, and code reuse is way better. I can also find better answers for questions on the tools using stackoverflow.com (and stackexchange) than on mathworks forums.
Keep in mind I really didn't like Simulink and always prefer coding, so YMMV.
Ok. I guess you are using vanilla NumPy. You should try NumPy built against Intel MKL or ATLAS; you can get up to a 30x speedup with the optimized linear algebra libraries. These libraries can also push CPU temperatures beyond anything an all-thread load created with multiprocessing will produce.
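To check which BLAS/LAPACK implementation your NumPy build is actually linked against, NumPy can report its own build configuration:

```python
import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was built against;
# look for 'mkl', 'openblas', or 'atlas' in the output.
np.show_config()
```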
If you don't overclock the CPU, the cooler you selected should be fine even under extreme compute loads. If you do overclock, I would test the temps with a full compute load on all cores. Games do not put much load on the CPU.
I would definitely consider the i7-6700K. As for the RAM, a cheap hack is to increase the size of your swap partition. You may also consider getting an ultra-fast SSD (e.g. a Samsung 950 Pro NVMe) and putting the OS and swap on it.
Honestly I don't know. But it definitely doesn't need Fortran now:
> One of the design goals of NumPy was to make it buildable without a Fortran compiler, and if you don’t have LAPACK available NumPy will use its own implementation. SciPy requires a Fortran compiler to be built, and heavily depends on wrapped Fortran code.
If you're interested purely in data science and statistical applications, R is definitely the right choice. R is the language to learn for data science, and is quickly overtaking competitors like SAS in the field. Some understanding of statistical concepts will aid you in the process of learning the language.
That said, if you want a more traditional programming language that will teach you strong programming and computer science related concepts, Python is the right choice. Python is a stronger choice for a "general" programming language as it can fit a lot of different applications. As you said, along your Python path you can dip into Django or Flask and learn a bit of web development, or even check out Pygame for creating games. With Python you are not limited to statistical applications. For data science with Python, check out SciPy.
I would say Python is more difficult to learn than R, since learning R leans heavily on built-in packages and libraries that do a lot of the work for you. However, R has some quirks that are not present in other, more traditional programming languages.
Full disclosure you're going to get a biased answer from most programmers you talk to because everyone wants to promote their language.
That said, the Python syntax is super easy to learn imo, and there are a number of great packages for Python to support statistical analysis, like NumPy and SciPy, and especially IPython Notebook. When I was doing the mathematically calculated /r/hockey power rankings last year, I did all of that work in Python.
Let's do a Google Hangout sometime and I can give you a 10-minute runthrough, especially if you have a set of data ready to go.
I'm also a big Python fan for web development. I don't do any data analysis stuff, myself, but I've heard that Python's extremely popular in research applications -- I believe that the SciPy libraries are the tool of choice for that -- and I'd guess there's probably a lot of crossover with what you're going to be doing.
If you're into Java right now, Python would probably be a big adjustment -- I think it's a worthwhile one, personally, but your mileage may vary. Everybody works differently, you know?
More importantly, though: you've got an awesome opportunity here. Your job is offering to pay you to do some professional development! Make the most of that: pick out a few interesting technologies, and spend a few days on each. Do tutorials, research, troll some message boards and IRC channels, and get a feel for what your options are... then pick something you feel really good about and dive deep on it.
I was in a similar position a few months ago, and I basically got to handpick a technology stack for a new project. Super rewarding. I'd guess there are a lot of great technology options for your project, so take the opportunity to figure out what your dream environment is and chase it!
Not a data scientist, but perhaps it might be relevant. ;-)
You might want to ask this on /r/datascience or /r/askstatistics as well.
As for the programming bits, perhaps learn a bit of R for data visualization and hypothesis testing, or Python/pandas or Python/scipy if that's your cup of tea.
Do not be scared of pip. There are many instances in Python development where you will use it to install python libraries.
NumPy is not one of them. You should use apt-get to install it. Take a look at the wiki for the SciPy stack.
What you should learn if you are going to continue working in Python is to use Virtualenv. This will allow you to create isolated python environments that you can pip install into to your hearts content without messing up your base python installation.
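A minimal sketch of that workflow (assuming virtualenv is already installed; on newer Python 3 setups, `python3 -m venv` does the same job):

```shell
# Create an isolated environment in ./env
virtualenv env

# Activate it: pip and python now point inside ./env
source env/bin/activate

# Install freely without touching the system Python
pip install numpy scipy

# Leave the environment when done
deactivate
```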
From SciPy's getting started page:
> SciPy and friends can be used for a variety of tasks:
> NumPy's array type augments the Python language with an efficient data structure useful for numerical work, e.g., manipulating matrices. NumPy also provides basic numerical routines, such as tools for finding eigenvectors.
> SciPy contains additional routines needed in scientific work: for example, routines for computing integrals numerically, solving differential equations, optimization, and sparse matrices.
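A small sketch of both points from the quote: NumPy for eigenvectors, SciPy for numerical integration:

```python
import numpy as np
from scipy import integrate

# NumPy: eigenvalues/eigenvectors of a small matrix
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvalues 2 and 3

# SciPy: numerically integrate x**2 from 0 to 1 (exact answer: 1/3)
result, error_estimate = integrate.quad(lambda x: x**2, 0, 1)
```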
I'm not entirely sure I understand what you are trying to plot, but if you are using Python then I suggest installing the SciPy stack (which includes Matplotlib and IPython). This intro to IPython notebooks using Matplotlib also might be of some help. Matplotlib is capable of animation as well.
If you go through with learning Python and want to do all kinds of data analysis and visualization, I recommend having a look at SciPy. It's an ecosystem of various Python libraries targeted at data analysis and everything that has to do with it. I can recommend the Anaconda distribution, which ships Python together with SciPy and an even larger number of very helpful and useful Python libraries, including Spyder, an integrated development environment in which you code, run your code, manage projects, et cetera. If you install Anaconda, you don't need to do anything else.
I work in a department that's full of old (and new) fortran code. When I need to get a routine into python, I use f2py and it works very well. I also prefer to do my plotting with gnuplot because the docs for matplotlib are terrible.
I'm actually seeing a trend of more scientists using Python because of how easy it is to learn to program in, though I'm only making that observation from the biological sciences. Some libraries of Python code dedicated to scientific calculation have been set up for this purpose (http://www.scipy.org/).
If you are just starting out in learning how to program or don't have a grasp of what programming is, I suggest "Head First Programming", which has helped a few of my friends grasp the concept of programming.
I would also talk to the professors and find out what they use, because knowing what the professors use in their labs would make you very marketable to them. We may suggest a language like Python, only for you to find out the professors use Fortran, which would waste your time (in the short run, if you're on a deadline; knowing other languages and how they work is an advantage in the long run and makes you more flexible in using the right tool for the job).
SciPy is pretty good. IMAO, PDL is better, but that's because I like Perl's expressiveness and compactness more than I like Python's regularity and verbosity. (I like C more than Pascal, too -- go figure). The important thing to realize is that you don't have to stick with commercial schlock like Matlab or, Gods help us, IDL. The free stuff has finally gotten to the point that it is better.
> performance both in space and time is atrocious. anything seriously performant related has to be done in C and FFI'd out to. what a joke. if your language is so bad you have to write your program in another language, it's just sad.
I'm not a fan of Python either, and I generally agree with most of your points, but this one isn't fair. It's basically true of every non-native language, and especially every scripting language. It's a known and accepted trade-off.
And then at least there's Weave to make it slightly less annoying.
It's generally nice if you post some code. That lets experienced people who don't know exactly what you're doing have a crack at solving it. If it's a simple error you're making, someone who has no idea what you're working on might still spot it. The way you phrased your question, without specific context, makes it less likely that someone will help you. Post some code and you'll get help. It's also nice if you link to the pages you used as reference; did you use this page? Or this one that gives almost no information?
Then, it depends whether NumPy stands for the numpy module or for Numeric Python. Travis reunified the community, and
> I sacrificed a year of my life in 1999 (delaying my PhD graduation by at least 6-12 months) bringing SciPy to life. I sacrificed my tenure-track position in academia bringing NumPy to life in 2005. Constraints of keeping my family fed, clothed, and housed seem to keep me on this 6-7 year sabbatical-like cycle for SciPy/NumPy but it looks like next year I will finally be in a position to spend substantial time and take the next steps with NumPy to help it progress to the next stage.
I'm not so certain. Certainly there is churn going on, but compare it to an analysis of simplified two-actor ecologies. For the individuals involved, each day or year is a crapshoot of starve or not, get eaten or not. For the ecology as a whole, each generation may be a time of plenty or a time of collapse. But the system endures because the collapses don't entirely destroy it.
Thus it is with western civilization. We have our Falls of Rome, our Dark Ages, our World Wars, our Great Depressions, but none of them, catastrophic as they are, are enough to destroy the whole system.
I missed any comparison to MAD (if you even mean the Mutual Assured Destruction I thought of). And I was overly brief to speak of a 5% / 95% split. In reality, much of the 95% benefit, or think they do, from the current system. Police, Armies, well-paid wage slaves (me) all play their parts in maintaining the status quo, and are paid, in power, or honor, or material comfort, for their services.
You for sure need NumPy. I mentioned SciPy because it's kind of like NumPy's big brother, at least that's how I understand it.
Something I'd warn you about is that in Windows land you will often find dependencies a big pain in your neck. At least that was the case years ago when I still had a windows box laying around.
Make sure you're using GSL linked with an optimized BLAS, like ATLAS. Then, if the linear algebra routines GSL provides aren't fast enough with that BLAS, you can easily call the Fortran LAPACK routines from C.
Or, do like I did, and use [NumPy](http://numpy.scipy.org) and [SciPy](http://www.scipy.org). You can write your major computations in Fortran or C, and use Python to do all the fiddly IO/other calculations that those languages aren't any good at.
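For example, NumPy already routes its linear algebra through the underlying BLAS/LAPACK, so the heavy lifting stays in compiled code while Python just orchestrates (a sketch):

```python
import numpy as np

# Solve A x = b; numpy.linalg.solve dispatches to LAPACK under the hood,
# so the numerical work happens in compiled code.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)   # expected solution: [2.0, 3.0]
```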
Python will expand as long as there are good examples in the cookbook. For people like me who are non-mathematicians or computer scientists, the cookbook is where we turn to for examples on how to do things. I hope to add on to it with some of my own examples, I know I have posted a couple of examples on here.
Though this cookbook example is a bit silly, it helped me tremendously to implement ODEs in Python, instead of matlab.
Here's one for Biology: http://biopython.org/wiki/Biopython A search at google for "chemistry python" shows mainly classes but I didn't dig that deep. http://www.scipy.org/ is the base library that's used by a lot of these projects.
Honestly: whichever you would find more familiar. There will be a lot of MatLab clones etc suggested in this thread, check them out (if they are free or have free trials.)
If you need something different, read up on SciPy (which you already mentioned). Python is quite an easy language to get started with; it's an interpreted language, which means you can run an interactive shell and run scripts without waiting for a program to compile.
You could probably get similar results to Python with Perl or other languages but my personal experience is mostly with Python
NumPy / SciPy focus on speed, and they're Python.
there's also some toolkits for ML in that lang: http://www.google.com.uy/search?q=machine+learning+python
and a subreddit! http://www.reddit.com/r/machinelearning :)
What platform are you developing on? There are packages of numpy 1.6.0 for Python 2.5, 2.6, 2.7, 3.1, and 3.2; and packages of scipy 0.9.0 for Python 2.5, 2.6, 2.7, and 3.1 as well (see scipy.org). There are unofficial packages of matplotlib 1.1.0 for Python 3.1 and 3.2 (see http://www.lfd.uci.edu/~gohlke/pythonlibs/#matplotlib). All of these are for developing on Windows, though.
There's more info on some matplotlib developments here.
The only thing that would worry me is getting matplotlib running on Solaris. I found at least one good indicator; best of luck. I have done some porting in my life, but matplotlib (or rather numpy) is one hell of a beast.
I should have mentioned this. This also caused me a bit of pain in the beginning.
Essentially, this means that when you do y = x, y refers to the same array as x, unless you did y = x.copy(). Slices are also views into the original array. (Array slicing is pretty awesome, as well.)
This and some common differences between MATLAB and NumPy are covered here. You will find that invaluable if you're used to MATLAB syntax and find yourself wondering what the NumPy syntax is for a given operation at first.
Edit: also, it indexes arrays starting at zero, as is proper :-)
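The assignment and slicing semantics described above, as a quick sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

y = x            # y is the SAME array; no data is copied
y[0] = 99
# x is now [99, 2, 3, 4] too

z = x.copy()     # z is an independent copy
z[1] = -1
# x is unchanged by writes to z

s = x[1:3]       # slices are views: they share memory with x
s[0] = 7
# x is now [99, 7, 3, 4]
```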