> 2. Use Sequential
layers when possible for cleaner code.
> 3. Don't make lists of layers, they don't get registered by the nn.Module
class correctly. Instead you should pass the list into a Sequential
layer as an unpacked parameter.
Don't use nn.Sequential to represent a list, use nn.ModuleList. https://pytorch.org/docs/stable/generated/torch.nn.ModuleList.html
Obviously it's fine, from a code correctness standpoint, to use nn.Sequential (they're very similar data structures), but from a code legibility standpoint, you should use ModuleList, unless you're literally just stacking layers.
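For illustration, here's a minimal sketch of the difference (toy layer sizes, nothing taken from the post above): a plain Python list hides the parameters from the nn.Module machinery, while nn.ModuleList registers them.

    import torch.nn as nn

    class Bad(nn.Module):
        def __init__(self):
            super().__init__()
            # Plain Python list: these layers are NOT registered, so
            # model.parameters() misses them and they won't be trained or saved.
            self.layers = [nn.Linear(10, 10) for _ in range(3)]

    class Good(nn.Module):
        def __init__(self):
            super().__init__()
            # ModuleList registers each layer as a proper submodule.
            self.layers = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])

        def forward(self, x):
            for layer in self.layers:   # you decide how the list is used
                x = layer(x)
            return x

    print(len(list(Bad().parameters())))   # 0 - parameters are invisible
    print(len(list(Good().parameters())))  # 6 - weight + bias per layer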
Here're a few common paths:

1. Many people are applying ML to projects by themselves at home, or in their companies. This helps both with your learning, as well as helps build up a portfolio of ML projects in your resume (if that is your goal). If you're not sure what projects to work on, Kaggle competitions can be a great way to start. Though if you have your own ideas I'd encourage you to pursue those as well. If you're looking for ideas, check out also the machine learning projects my Stanford class did last year: http://cs229.stanford.edu/projects2014.html I'm always blown away by the creativity and diversity of the students' ideas. I hope this also helps inspire ideas in others!

2. If you're interested in a career in data science, many people go on from the machine learning MOOC to take the Data Science specialization. Many students are successfully using this combination to start off data science careers. https://www.coursera.org/specialization/jhudatascience/1
Yes, I have. A few tips:
Okay, this is stupid, but bear with me: I wanted to figure out how the captionbot for /r/AdviceAnimals works, and worked it out with https://github.com/tmbdev/ocropy (NN-based OCR), because all other OCR packages I tried produced shitty results (after preprocessing) and/or had amazingly bad/outdated documentation (looking at you, tesseract).
I trained it on (I think) 1000 or so samples (transcribed manually, which took some time, but a relatively reasonable 1-2 hours). It works pretty well for losslessly compressed images (png).
Then I noticed that the bots just scrape the captions off the image hosting sites. Oh well.
For those actually interested in developing and testing gaming bots https://www.coursera.org/course/ggp starts at the end of the month.
PS: the intelligence of the bots is rarely evaluated based on the size of the logs...
I recommend you try Visual Studio Code with remote ssh. It's pretty much the advantages of local development, but on a remote machine.
https://code.visualstudio.com/docs/remote/ssh
(I realize this isn't the answer you're looking for, but I went through a similar thought process a few months ago. Going for a workstation/laptop combo meant a lighter laptop that I could turn off anytime and that wouldn't be constantly venting hot air.)
A bunch of modern examples:
http://tensorflow.org/tutorials
And a web-based visualizer:
http://tensorflow.org/how_tos/summaries_and_tensorboard/index.md
Now just show us that Google can continue to maintain an OSS project well over time, and I'll be quite impressed.
I have some comments:
I write GPU drivers, GPU compilers, and optimized GPU kernels for a living. I learned through a combination of good mentorship, studying GPU hardware architecture, and being thrown in the deep end (i.e. being asked to make XYZ where XYZ is somehow related to the GPU, be it an optimized GPU kernel or some low-level GPU driver functionality).
If you're just beginning and don't have the same opportunities I did, I'd suggest the following. Try taking a look at this Udacity course: https://www.udacity.com/course/intro-to-parallel-programming--cs344. It's an excellent introduction. Afterwards, try implementing some algorithm of your choice on the GPU. Pick something that's already implemented in a popular GPGPU framework and see if you can create an implementation that runs just as fast. Understanding how the underlying hardware works will be important for writing a well-performing GPU kernel. Using vendor-provided profiling tools will be equally important. Good luck :)
Really? Pretty sure this is his linkedin: https://www.linkedin.com/in/derekchen14
The only thing it says is that he took a coursera course. Far cry from calling himself a black belt. He even says that he's not an engineer.
The Book of Why by Judea Pearl
"Correlation is not causation." This mantra, chanted by scientists for more than a century, has led to a virtual prohibition on causal talk. Today, that taboo is dead. The causal revolution, instigated by Judea Pearl and his colleagues, has cut through a century of confusion and established causality--the study of cause and effect--on a firm scientific basis. His work explains how we can know easy things, like whether it was rain or a sprinkler that made a sidewalk wet; and how to answer hard questions, like whether a drug cured an illness. Pearl's work enables us to know not just whether one thing causes another: it lets us explore the world that is and the worlds that could have been. It shows us the essence of human thought and key to artificial intelligence. Anyone who wants to understand either needs The Book of Why.
This is especially interesting because the head of search ranking at Google, Amit Singhal, is well known to be reluctant to rely on machine learning: https://www.quora.com/Why-is-machine-learning-used-heavily-for-Googles-ad-ranking-and-less-for-their-search-ranking
Andrew Ng's course was incredibly well done. I took it before he started Coursera, so I'm not sure how it has changed since.
Neural Networks for Machine Learning would be my vote for the best. It assumed a working knowledge of calculus and linear algebra and went deeply into the math behind various types of Neural Networks as well as various topologies -- including the practical implications of the differences between them. It also was the first place I ran into recurrent networks, which I still haven't found a very good technical explanation for outside of the course.
The final lecture hit on some of the more interesting current research going on in the field. Absolutely not suitable for a first course -- but excellent to really dig into Neural Networks and Deep Learning.
Here you can find a short description: https://medium.com/swlh/create-your-custom-bounding-box-dataset-by-using-mobile-annotation-58232cfaa7ca
And here you can find the app: https://play.google.com/store/apps/details?id=www.app.manthano.ai
Thanks a lot for the feedback so far! Super valuable!
According to your responses, we are going to update the following things first:

1. App permissions (by default it requested all permissions because we didn't specify them explicitly)
2. Data disclaimer (your data is yours and stays yours, also without the disclaimer)
3. Feature implementation, such as "box by two points", "other annotation types", "zooming"
Off the top of my head:
my focus:
Hey guys! I'm one of the authors of this book, and I'm really excited to be able to share this with /r/machinelearning. There have been a lot of people asking for good introductory books to TensorFlow, and we hope that this is going to be one of the best offerings out there.
Our publisher's page has a link to get a free chapter from the book, but here's the direct link for the lazy.
My co-authors and I have been working our butts off to get this ready for you all, so let me know if you have any questions!
there's an entire field of IT risk that will prevent you (in large companies) from relying on small/non established vendors.
this is to prevent you from building mission critical processes on vendors who may not be around in several years.
if you open source the libraries, it means there is a larger degree of assurance that bugs can be fixed even if you don't work on it full time
plotly open sourced their core libraries 4 years ago (https://plot.ly/javascript/open-source-announcement/) and offer an enterprise grade value add package (centered around collaboration etc)
pm me if you wanna know more
Super excited for the event; I wonder if DeepMind can pull it off! On the left of Figure 4 here (page 11), Lee Sedol is supposed to be around the 9p level (with 3500-ish elo). If you look at the "distributed" part on the far right, it seems that even without changes, the program can be improved substantially just by adding more resources (which Google has a ton of). And the DeepMind guys will not be twiddling thumbs in the meantime for sure!
I don't know if the machine will win this match, but it will sure as hell win the next one.
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems https://www.amazon.com/dp/1492032646/
Hands down the best book in the field. Talks about both Keras and raw TensorFlow.
I strongly recommend going through either Dan Jurafsky and Christopher Manning's Coursera or Michael Collins'. They each treat the field with a different emphasis on theory, and the material in both ought to be mastered by anyone looking to do research in NLP. It'll give you the general idea and common frameworks used, and a surface-level understanding of the many different problems encountered.
Ongoing research is often more about applying the most recent algorithms developed in (statistical) machine learning, so you'll also need to dig into more general machine learning courses if you aren't familiar with them.
Forget about machine learning or even Python specifically. What you need to worry about is making maintainable, scalable code. There are people like Scott Meyers who have been writing books about this subject for years, and the lessons are as true about machine learning in Python as they are about database backends in C++.
Some good books any software engineer should have read: Mythical Man-Month, Design Patterns, Clean Code, Pragmatic Programmer, Code Complete 2.
Man I wish the authors had open sourced their code. I've just skimmed the paper, but implementing a Kronecker LSTM in tensorflow shouldn't be that difficult. I'll update my comment if I'm able to implement it and provide the github link.
The main problem I see is that there is no Kronecker product op in tensorflow. Does anyone know if there is a simple way to do Kronecker multiplication? Suppose you're doing A ⊗ B. The only way I can see this being done is to literally index every element of matrix A, multiply it by B, and finally concat it all together. Here they suggest tiling the tensor but report it being too slow.
The other approach is to reshape one tensor to [1 x N] and the other tensor to [1 x M] and broadcast multiply. This may be the best way to go about it. Would be really nice if tf had this as a fused op.
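For what it's worth, here's a rough NumPy sketch of that reshape-and-broadcast idea (just to check the math; the same broadcasting pattern should carry over to tf ops):

    import numpy as np

    def kron(A, B):
        # Kronecker product via broadcasting: expand A and B so every element
        # of A multiplies the whole of B, then collapse back to a 2-D matrix.
        m, n = A.shape
        p, q = B.shape
        out = A[:, None, :, None] * B[None, :, None, :]   # shape (m, p, n, q)
        return out.reshape(m * p, n * q)

    A = np.random.randn(2, 3)
    B = np.random.randn(4, 5)
    assert np.allclose(kron(A, B), np.kron(A, B))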
Seems like a great way to train models much faster!
EDIT: Beginning Implementation here for kronecker highway rnn.
If your goal is to get into machine learning, linear algebra is going to be your best friend. There are structures like vectors and matrices that are elemental to the understanding of large scale data and data manipulation. Check out the Khan Academy to get grounded in the basics.
In general the more math and stats literate you become, the better off you'll be. MIT, Yale, Stanford and several other schools offer their courses online for free. I think Khan is the place to start because that was designed to be used by people on the internet who are learning the basics (i.e. you).
After that you will need to bone up on your programming skills. I like Python and R; both are free and have plenty of resources to help those who are new to the field. The Stanford online course in machine learning uses Octave, which is a numerical computing language, but I don't know that much about it.
tl;dr: Attend to your math first, then learn to program. You'll need a semester's worth of linear algebra (minimum).
Thanks evilmaniacal! There is a lot of information packed into that 2-page paper! One interesting point that caught my attention is that the Google OCR system does not include any “preprocessing” steps. When I was using Tesseract and Ocropy, their documentation (Tesseract and Ocropy) put a lot of emphasis on preprocessing the image before feeding it to the model.
Does that mean preprocessing is no longer necessary for modern text detection models?
Project Gutenberg has lots of free audio books, along with the corresponding ASCII text.
https://www.gutenberg.org/browse/categories/1
They used to have the text and audio grouped together, but as far as I can see you now have to search for the text version separately - it's not on the same page as the audio.
To add to this, for Natural Language Processing, the next place is Stanford NLP course by Dan Jurafsky and Christopher Manning for a wide introduction to NLP, or Stanford Deep Learning for NLP by Richard Socher for a narrow introduction to NLP focused on deep learning.
I would like to recommend the Khan Academy sections on probability and linear algebra. (Not all of them, of course. Dip in and out when you need to.) It doesn't cost anything and the videos are probably of good-enough quality for your needs.
Good luck!
No, it is totally fine to calculate it that way, because now you can order spot instances from AWS that won't be terminated for up to 6 hours. See https://aws.amazon.com/de/blogs/aws/new-ec2-spot-blocks-for-defined-duration-workloads/
Most things leveraging CUDA/cuDNN/cuBLAS without explicit effort to keep them deterministic, e.g. convolution and pooling in PyTorch on the GPU, and the same in TensorFlow.
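For reference, the usual knobs in PyTorch look roughly like this (not exhaustive, and some ops can stay nondeterministic regardless, depending on version):

    import torch

    torch.manual_seed(0)                          # seed the RNGs
    torch.backends.cudnn.deterministic = True     # force deterministic cuDNN kernels where available
    torch.backends.cudnn.benchmark = False        # disable autotuning so the same kernels are picked every run
    # On recent versions you can also ask PyTorch to error out on nondeterministic ops:
    # torch.use_deterministic_algorithms(True)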
There is an active discussion about this on Hacker News: https://news.ycombinator.com/item?id=9504241
I want to clarify that tensors are not just a theoretical curiosity. They are computationally tractable; in fact, they are embarrassingly parallel, and the computational complexity can be made just linear in the input dimension with parallelization tricks. We can use optimized linear algebra libraries to do tensor operations efficiently. We have many papers on this and we also have source code available. Try it out.
AWS is also releasing an FPGA developer preview today: https://aws.amazon.com/blogs/aws/developer-preview-ec2-instances-f1-with-programmable-hardware/ (I know it's probably not gonna be competitive with GPUs for deep learning, but it might be useful for preprocessing/weak classifiers on large data sets, low latency, or just interesting for the learning experience)
Actually his use case is fine for researchers. He's generating image sets, which are not licensed. Refer to their FAQ on videos generated with UE4. The only gotcha about this is the specific licensing of the assets used, which are often under a different license, but glancing at the "Open World Demo Collection", their licensing is incredibly broad, to the point that it seems like images generated with it using UE4 are fine. (People use it to make videos already, and per the video section cited previously this seems completely fine.)
Also FYI, GPL is a horrible license for researchers since it places heavy restrictions on its usage for commercial projects. It forces clean room coding practices in the worst case which is why you'll see MIT used in a lot of projects.
I've had some success converting LaTeX-based PDF files using a tool called k2pdfopt.
If you can find an HTML version (e.g. from the .tex source) then you can put that directly on the e-reader. It renders it reasonably well. Or you could convert it using... tools.
Finally, if you can reformat anything, e.g. if it's a MS Word document, create a paper size that works well for your e-reader (use a ruler) and print using that paper size to a PDF file.
Neural Networks for Machine Learning by Geoffrey Hinton
I'm also collecting a list of advanced ML resources, so feel free to take a look at that.
I would highly suggest Andrew Ng's machine learning course as a solid foundation, but it's based on videos and not text.
Anyway, in this course some lecture notes may be available as text pages for certain topics.
That's not what word2vec does. That's what a language model does. Word2vec is just a co-occurrence model.
Edit: Omer Levy does a nice job explaining it here, but references his original papers which give thorough analysis and proofs: https://www.quora.com/How-does-word2vec-work
Recurrent and recursive neural networks are very promising models for processing text, there have also been recent papers popping up using convolutional networks for text too. To learn more about them, Hinton's coursera is an excellent resource and Yoshua Bengio's deep learning book is starting to fill out with great info as well!
On this specific dataset, unfortunately, it's small (25K labeled examples) for these much more complex models to really shine. They overfit a good deal and the results you get are just competitive/comparative with the code above.
When you initialize the supervised rnn with embeddings learned by a language model rnn leveraging the additional 50K unlabeled examples (similar to what was done with word2vec in this tutorial), it's getting about 90% accuracy which makes it better than any single linear model that I'm aware of in the literature, but still a bit worse than an ensemble of NB and SVM at 91%.
> I can't help but cringe every time he assumes that self-improvement is so easy for machines so that once it becomes possible at all, AI skyrockets into superintelligence in a matter of weeks.
He doesn't assume it, he concludes it after discussing the topic in depth.
Pages 75-94 of his book. Preview available via Amazon.
I actually do something where I use nginx together with a background python process :)
I use the proxy_pass directive to forward requests matching a certain url part to a python process listening with a very lightweight http server of its own.
Part of my nginx config looks like this (the python process listens on port 2345):
location /your-sub-url/ {
proxy_pass http://localhost:2345;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
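The Python side can be as simple as this sketch (standard library only, listening on 2345 to match the config above; the real process can of course be Flask/gunicorn or whatever you prefer):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # self.path is whatever nginx forwarded, e.g. /your-sub-url/foo
            body = ("you requested " + self.path).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("localhost", 2345), Handler).serve_forever()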
Also have a look at this, which helped me for some use cases: http://serverfault.com/questions/562756/how-to-remove-the-path-with-an-nginx-proxy-pass
Hope it helps!
Yeah, 500 times is bonkers; that shows Python being 5 times slower, and PyPy being about 20% slower than C on a reasonably tightly optimized function.
And then yeah, development is faster by leaps and bounds in Python (specifically) than in C/C++, and because of fast libraries like NumPy (which will be as (or more) performant than straight C code), SciPy (which builds on NumPy), and scikit-learn, code written and architected in Python can run circles around handwritten Java or C.
Or in short, developer time is more valuable than runtime.
I'm always curious about how much the cloud compute would cost for these crazy papers. 92 GPU YEARS. Some quick maths based on aws on demand hourly rate is 92*365*24*3.06 = $2,466,115 (but it appears that based on the "3-yr Reserved Instance Effective Hourly" you get down to $846,216 - I'm using the single gpu instances here: https://aws.amazon.com/ec2/instance-types/p3/)
that's bonkers.
I'm sure Nvidia has their own massive clusters, but still, it's just such a huge magnitude. Also wanna note I SUPER appreciate the final page of the paper discussing the routes and methods they pursued and breaking down training cost/time by area of the paper's development. Was super interesting to read.
> a torrent would likely be up quite soon after
Not sure if you're referring to ImageNet or Open Images there, but just in case people don't know, there is a 1.3 TB torrent of ImageNet. It had no seeders a few months ago, but it's back up to 2.
You might want to read this.
With the release of PyTorch 1.0, soon, the pipeline will be researching, prototyping and training on PyTorch and deployment on Caffe2.
As of now, TF is your best bet.
> Some of the differences in MXNet when compared to PyTorch are as follows:
>
> + You don’t need to put the input into Variable
Not anymore, you can just use plain Tensor for everything starting with 0.4.0:
https://pytorch.org/docs/stable/autograd.html#variable-deprecated
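A quick sanity check of the post-0.4 API (toy example):

    import torch

    x = torch.ones(3, requires_grad=True)   # plain Tensor, no Variable wrapper needed
    y = (x ** 2).sum()
    y.backward()
    print(x.grad)                            # tensor([2., 2., 2.])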
Use tesseract 4.0, which now uses an LSTM. You can train it on your own data too.
They also have some docs describing the system and the typical OCR pipeline
Bitfusion also has some really useful deep learning AMIs on the AWS Marketplace in case they are useful to you or others -- they support NVIDIA, CUDA, Tensorflow, Torch, Caffe, etc. https://aws.amazon.com/marketplace/seller-profile?id=3b372560-86bf-4e3d-9ec0-016892a64bed
This is not a contest. As I said in my post, I am simply pointing out that the USA is able to commit horrible crimes as well, and that this arrogant looking down and seeing no wrong in what the USA does and did to other states and its own citizens is sickening.
> According to the most recent census data, there are nearly 160 million more white people in America than there are black people. White people make up roughly 62 percent of the U.S. population but only about 49 percent of those who are killed by police officers. African Americans, however, account for 24 percent of those fatally shot and killed by the police despite being just 13 percent of the U.S. population. As The Post noted in a new analysis, that means black Americans are 2.5 times as likely as white Americans to be shot and killed by police officers.
>[...] neither are schools being a war zone...
Surprised overleaf isn't on here - probably the best collaborative tool for writing latex papers
Also:
- Evernote has decent pdf annotation software for reading papers and marking up pdfs. I use it on my iPad to read almost all papers (except when I'm reviewing; then I do it on paper). Anytime I see a research paper I want to read, I add it to a notebook. Then I read through them in my free time. Best part of Evernote? The tag system. So much better than Mendeley, which uses folders. I've found many research papers touch on multiple areas of AI. With a folder system like Mendeley, I had to make sure to drag the same paper into each folder on a topic, whereas with Evernote, I tag each document with all related areas. Then I just search by tag when I want to find all papers on a topic.
You could load your dataset into Weka: http://www.cs.waikato.ac.nz/ml/weka/
It's a suite of machine learning algorithms written in Java. You can import your data fairly easily, set the types of your variables and whether they're known or unknown, and then classify your unknowns using various algorithmic methods (e.g. decision trees / neural nets / naive Bayes). It won't give you a properly sophisticated model of your problem, but it's fun to play around with common techniques.
Judging from the submissions to this forum I'd say researchers are already trying to bring the next AI Winter closer.
This reminds me of the book "The Innovator's Dilemma" which describes how great firms may fail because they focus completely on sustaining technologies and disregard new disruptive technologies.
I think the ML community is too benchmark- and SOTA-oriented.
In my opinion, AI Winters are not a good thing. They are the result of excessive exploitation and not enough exploration. For instance, nowadays so many people do research in DL that new students have started to believe that one doesn't need to know much theory to be a researcher in ML because if we can get SOTA results with technology we don't understand just by poking around, why should we waste time with theory?
He's not misinforming; the project has no GPU support yet. Having been a consideration and having been implemented are two different things, which you know, since you obviously read the corresponding paragraph. It's beyond me why you only cited one sentence instead of the whole thing:
> GPU and multi-GPU computation has been a core consideration in CGT’s design from day one. Usage of GPUs is currently not documented and we need some work to straighten out the API, but the basic scaffolding is in place for transporting data to and from the GPU, calling libraries like cuBLAS and cuDNN, as well as compiling kernels on the fly. We plan to substantially improve GPU and multi-GPU support in the coming weeks and months.
So it's planned, they're working on it, there is some progress, but it's still not there yet. You are the one spreading misinformation. Except that you did it on purpose, having obviously read this paragraph and chosen to misleadingly quote only the part that suited you.
I can only warmly recommend Geoff Hinton's Neural Nets class. Andrew Ng's intro to Neural Nets was pretty brief, but okay for an intro course, and Geoff's course would be the perfect follow up. The videos are also on Coursera: https://www.coursera.org/course/neuralnets
I think it's fairly well known how many people these targets go through before being approved for assassination, so hopefully there are some skeptics in there who don't trust ML results.
Apologies, as I'm lazy and haven't read through your tidy write-up, but I would appreciate it if you compared it with Real-Time Recurrent Learning.
RTRL used to be popular in the early 90s but is these days only occasionally encountered (such as in https://openreview.net/forum?id=q3KSThy2GwB). IIRC, a similar forward mode sensitivity approach was already present in the Hochreiter & Schmidhuber LSTM paper.
The primary disadvantage is how compute intensive it is per time-step, practically infeasible for large networks. There have been quite a few approaches which aim to address this issue, and a quick search turns up this early one by Schmidhuber (1992, https://www.semanticscholar.org/paper/A-Fixed-Size-Storage-O(n3)-Time-Complexity-Learning-Schmidhuber/89b9a181801f32bf62c4237c4265ba036a79f9dc).
Anyways, it's extremely impressive that you arrived at this largely on your own.
I don't know how you'd fast track, but I'll offer up that CS is far, far more than knowing how to program.
If you can't commit to a degree then I'd recommend brushing up on data structures and algorithm design MOOCs on Coursera or edX. Also, math: multivariable calculus, linear algebra, and a course in discrete mathematics for CS. This course seems like it might be engaging for someone with your background.
I'd compare each build to what AWS offers and make sure given relative power and usage it makes sense to even build instead of "rent": https://aws.amazon.com/blogs/aws/new-p2-instance-type-for-amazon-ec2-up-to-16-gpus/
Although they note in the paper that the topic is "mostly known but often underappreciated", I think they somewhat undersell the extent to which this is true. The well-known quote by George E. P. Box is a good example: "Essentially, all models are wrong, but some are useful."
Furthermore, I would disagree with the statement "generative models need to be evaluated directly with respect to the application(s) they were intended for," as this is true of both generative and discriminative models. After all, things like square, log, and hinge losses are referred to as surrogate losses for a reason: they are a surrogate for the 0-1 loss (which in many cases isn't correct either, as misclassification costs are typically asymmetric in the real world).
Despite the criticism, I like the paper overall. It might not be as exciting as proposing some new model variant that achieves SOTA on dataset X, but it is important that people think about fundamental things like commonly employed evaluation techniques.
Hi, in our lab we have a reading group session on Fridays, where we discuss interesting research, mostly in Deep Learning. Attendance is fairly good, and the discussion that follows goes on for quite a while (about an hour). I have found this format extremely useful for getting to know interesting research from around the world. I heard from my supervisor that before I joined, they used to collaborate with Prof. Xavier from UPC (it's still ongoing, but I am not quite sure :-)), and they have shared the slides here: https://www.slideshare.net/xavigiro/presentations/3.
Cheers,
Q1: There's no differentiation under the integral sign. They exploit the fact that they want to maximize something of the form \max_D \int f(D(x)) dx, and that for the D^* they propose, f(D(x)) <= f(D^*(x)) for all x and all other candidate solutions D, which then implies that \int f(D(x)) dx <= \int f(D^*(x)) dx. In other words, they actually show something stronger: their proposed solution maximizes the integrand at every point (which then implies that it maximizes the integral).
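Spelling out the pointwise step (assuming this is the usual GAN optimal-discriminator argument; a and b below are just shorthand I'm introducing):

    % Objective, written as an integral over x:
    \max_D \int \left[ p_{\mathrm{data}}(x)\log D(x) + p_g(x)\log\bigl(1 - D(x)\bigr) \right] dx
    % For a fixed x, let a = p_data(x), b = p_g(x) and maximize
    % g(D) = a \log D + b \log(1 - D) over D in (0, 1):
    g'(D) = \frac{a}{D} - \frac{b}{1 - D} = 0
    \;\Longrightarrow\;
    D^*(x) = \frac{a}{a + b} = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}
    % Since D^* maximizes the integrand at every x, it maximizes the integral.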
Q2: Here's the step by step computation https://www.overleaf.com/10272055vvbtmwxzqpwv#/38065380/
Cheers!
Yep, we plan to report out the results, just as we do with our annual State of Data Science survey: https://www.anaconda.com/blog/2020-anaconda-state-of-data-science-report-moving-from-hype-toward-maturity
I agree with the sentiment.
Before Covid, I'd run into many people I knew (like 100+) at any of the big ML/AI/DS conferences, even without active collaborations. It was easy to catch up, have a chat, exchange ideas, etc.
Ever since virtual conferences started, I haven't been able to catch up with almost any of them. Especially as people get more senior, they don't need to bother with social events; they don't ever log in to gather.town or rocket.chat. It's not necessary with their established networks. But if you're outside established circles or the Ivy League equivalents, you're even more fucked than before (outside, arguably, organized mentoring rounds that are more accessible because there aren't any travel/visa/cost issues).
The same goes for poster presentations. Being present in empty Zoom calls just waiting around, possibly at an annoyingly early/late time, just in case someone bothers dropping by, isn't fun. In an in-person poster session, you always get some traffic from bystanders.
Hi, I'm the author of the above repo. Thank you for referring to it! I also have slides on Poincaré embeddings (sorry for my poor English :P), and I'd be happy if they help you.
https://www.slideshare.net/daynap1204/poincare-embeddings-for-learning-hierarchical-representations
I really love the idea of Poincaré embeddings, but I wonder if there are better dis-similarity functions than distances (distances are always symmetric and positive).
This KDD 2017 workshop paper may also help (it's quite similar to Poincaré embeddings, but an inner-product-like function is used as the similarity function).
"Neural Embeddings of Graphs in Hyperbolic Space"
https://arxiv.org/pdf/1705.10359.pdf
The inner product used in the paper,

<u,v> = d(o,u) d(o,v) cos θ

(where θ is the angle between u and v around the origin o, and d(o,u) is the distance, in hyperbolic space, between o and u), looks good, but I found it's equivalent to the Euclidean inner product between u * d(o,u)/||u|| and v * d(o,v)/||v|| (where ||u|| is the L2 norm of u). So I think the representational power of the provided hyperbolic embeddings (with this inner product) is equivalent to that of Euclidean embeddings.
If you have some good similarity / dis-similarity function, please let me know :)
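For comparison, the (symmetric, non-negative) Poincaré ball distance is easy to write down; this is just a sketch of the standard formula, not the paper's training code:

    import numpy as np

    def poincare_distance(u, v, eps=1e-9):
        # Distance in the Poincare ball model; u and v must lie inside the unit ball.
        # It is symmetric and positive, which is exactly the limitation discussed above.
        sq = np.sum((u - v) ** 2)
        denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
        return np.arccosh(1.0 + 2.0 * sq / (denom + eps))

    u = np.array([0.1, 0.2])
    v = np.array([0.5, -0.3])
    print(poincare_distance(u, v))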
I like Colah's and Karpathy's blogs, Bishop's PRML, and Elements of Statistical Learning. I also really like All of Statistics by Larry Wasserman, which is more from a frequentist statistics point of view.
IMO, ML papers, and scientific articles in general, while often badly written, have recently started getting better. I've read The Sense of Style, by Pinker, which basically teaches you how to write the same way his books are written. It's useful for communicating, but maybe gets a little monotone. Totally applying it to my thesis though.
I've taken many of Udacity's ML courses and on the whole they are pretty shallow. I'd recommend Coursera's new 5-course ML specialization offered by the University of Washington. The first 2 courses were very good and the 3rd is due to be released this month.
I'm taking the Machine Learning Specialization, from University of Washington. The first course is an overview of ML techniques and practices, at a high level, using high level tools and Python. I found it very approachable and definitely learned a lot, although I know some people want to go deeper and really understand the methods they are using. The next 4 courses of this specialization seem to be doing that; I am now in the Regression course and it is going much deeper into the math behind these algorithms. I haven't taken the other courses/specializations but this one seems like a very good balance of practical knowledge and understanding of the fundamentals.
Disclaimer: I work at Dato; the CEO of Dato is also the Amazon Professor of Machine Learning in the CS department of UW, and one of the co-creators of the specialization.
The free Coursera machine learning course is what I started out with. It was a little rough in the beginning learning to code but if you stick with it and do all of the work you will have a decent intro to machine learning.
Over on HN, jhartman provides context:
"
Reading through this paper, this is a logical extension of the recent work published by FAIR (Facebook AI Research) that proved that with careful implementation of FFT convolution that the speed up is very significant. The Facebook work though still had all the learning happening in the spatial not frequency domain. They are doing all the updating in the frequency domain, and have introduced a new type of pooling that uses stochastic resolution reduction in the frequency domain that seems very useful. Very interesting paper, I'm keen to try out the techniques myself.
"
Do Hinton's Coursera class and then go through Hugo LaRochelle's graduate course on Youtube. That will give you a very strong foundation.
Opening is an action. From pg 2
> T seconds of delivery, providing the rank of the mail. Informally, we predict p = Pr(a ∈ A, t ∈ (T_min, T_max) | f, s), where a is the action performed on the mail, A is the set of actions denoting importance (e.g., opens, replies, manual corrections), t is the delay between delivery and the action, f is the vector of features, and s indicates that the user has had an opportunity to see the mail.
Google "image to gcode", there are a variety of solutions. https://inkscape.org/ru/forums/questions/jpg-to-gcode/
It's either quite easy or a bit involved depending if the output of this software is vectorized or rasterized. /u/vijish_madhavan what exactly the output of your software? Only rasterized images, or perhaps there is a more abstract data type right before the rasterization process that could be vectorized.
My biggest requirement when reading papers is actually remembering later what the papers did, what the big ideas were, what some of the highlights from the experiments were, and sometimes some important details from the appendices (architectures used, notable things that came up, etc.).
Towards this end, I've started using Anki to literally make flashcards of important points in papers. For example, one flashcard might just look like: "In Paper X by authors Y et. al., what is the main idea of their algorithm?", and the back would detail the algorithm, and I quiz myself on these later.
Overall, I get paper suggestions from colleagues or reading groups, usually print the papers out, make notes on paper while I read, and then write down highlights with Anki.
There are also monthly HN threads for advertising freelancer gigs / availability, for instance for January 2014:
https://news.ycombinator.com/item?id=6995014
This is not specific to machine learning, but there is often data-related stuff there too.
I know there are traditional approaches to generating color palettes, but they are limited to a few certain types (color harmony rules), e.g. https://color.adobe.com/. The idea here is to use the data created by people and go beyond simple generated palettes. Some of the data includes palettes from famous Play Store apps. I am not sure if there exists some approach other than machine learning for such a task.
Hi everyone
I'm one of the creators of metacademy. I'll check this post occasionally and happily answer any questions and take note of suggestions/criticism.
Here's an inside scoop for the reddit ML community: We recently launched a content contribution GUI, but haven't added it to the main site yet. You can play with it here (use chrome, ideally, but FF should work fine): http://metacademy.org/graphs/edit/new You'll have to log in if you want to save data to the server (to prevent spam). I'll eventually make it so that you can export the graphs/learning paths to a static site and host them yourself, e.g. for a course, company training, code documentation, etc.
As for the affiliate links: I recently wrote a script that changed all of our amazon links to affiliate links. Roger (the other metacademy creator) and I have been debating whether to keep them. We don't mind taking money from amazon to pay for our server fees (MIT OCW does this as well), but we don't want to compromise the site's integrity. We always try to list both "free" and "paid" learning resources for the concepts, which hopefully ameliorates integrity concerns. We're also 100% open source and never present a user with a login screen in order to view our content. Still, we're undecided about the affiliate links and would love to hear your perspective.
-Colorado Reed (colorado %at% berkeley %dot% edu)
Convolutional neural networks were originally a model of the brain's visual system coming from computational neuroscience.
Fukushima, K., & Miyake, S. (1982). Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In: "Competition and Cooperation in Neural Nets" (pp. 267-285). Springer, Berlin, Heidelberg.
Answer to the title: run Solomonoff induction, obviously, and over the next few months proceed to make billions on the stock market, cure disease, prove the Riemann hypothesis, take over the world, etc etc. Answer to the description: probably have a crack at getting some deep RL working on StarCraft :D
Weights are just tensors. You can always transfer them from one framework to another. It can be a lot of effort depending on your experience and the frameworks, but usually there are libraries to help you convert a model from one format to another.
https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html
https://github.com/onnx/onnx-tensorflow
PyTorch 1.6 comes with Automatic Mixed Precision, but I'm not sure how it compares to Apex. Also, the zero_grad() hack looks like something PyTorch should fix on its own.
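For anyone curious, the native torch.cuda.amp API looks roughly like this (a toy sketch assuming a CUDA-capable GPU, not a drop-in replacement for an Apex setup):

    import torch
    from torch import nn
    from torch.cuda.amp import autocast, GradScaler

    model = nn.Linear(10, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.MSELoss()
    scaler = GradScaler()

    x = torch.randn(32, 10, device="cuda")
    y = torch.randn(32, 1, device="cuda")

    optimizer.zero_grad()
    with autocast():                      # runs ops in fp16 where it's considered safe
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()         # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)                # unscales the gradients, then calls optimizer.step()
    scaler.update()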
Based on what I've seen from the TensorFlow team, people I've talked to, and new tutorials, TF is moving towards an API that is similar to PyTorch. That means imperative and eager, where you can pass tensors through your network as you build it, which necessitates automatic differentiation.
This is an ease of use improvement. There is a massive cognitive jump between working in Numpy and defining static graphs in TensorFlow. The transition from Numpy to PyTorch is basically negligible because they are both strongly Pythonic and imperative APIs. In my experience, working with TensorFlow is almost like writing in a completely different language.
With a static graph, you define all the operations upfront and each of those operations has some gradient method. When you do the backwards pass to calculate gradients of your parameters, it just goes backward through the graph you defined using the gradient methods. This should also be considered automatic differentiation, just different than what PyTorch does.
PyTorch and TensorFlow's eager execution mode give you dynamic graphs. You don't define the graph ahead of time, so instead you have to keep track of the operations performed on the tensors. PyTorch's autograd and TF's GradientTape work by attaching a gradient function to each new tensor you make. For example, if you have some tensor x and do y = x**2, y will be created with a method like y.grad_fn = PowBackward(), which calculates the gradient for the power operation given the output y and the input x. If all of your tensors have these gradient functions, then you can start at some tensor (the loss, for example) and go backwards through all the operations leading to that tensor, calculating gradients along the way. This will eventually get you the gradients of your parameters for the SGD update step.
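To make that concrete, a tiny PyTorch sketch (a GradientTape version would look analogous):

    import torch

    x = torch.tensor(3.0, requires_grad=True)
    y = x ** 2
    print(y.grad_fn)    # <PowBackward0 ...>, attached when y was created
    y.backward()        # walk backwards through the recorded grad_fns
    print(x.grad)       # tensor(6.) = dy/dx = 2*x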
This has nothing to do with machine learning, so I would suggest asking in (one or more of) the following sub-reddits:
Also, there is nothing called Anaconda Jupyter. Jupyter comes from Project Jupyter and Anaconda is a Python distribution.
Currently, the fastest off-the-shelf approach is to factor your words into an n-gram (usually trigram) vector space and then query that space, ranking results with some form of distance measure, such as cosine similarity. Postgres 9.1's pg_trgm module [http://www.postgresql.org/docs/9.1/static/pgtrgm.html] provides an easy way of building an index using a GIN or GiST. I don't remember the algorithmic big-Os off the top of my head, but they should be in your desired range.
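If you want to prototype the idea outside the database first, a few lines of Python get you the same thing (a sketch using scikit-learn's character n-gram analyzer as a stand-in for pg_trgm):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    corpus = ["machine learning", "machine lerning", "deep learning"]
    vec = CountVectorizer(analyzer="char_wb", ngram_range=(3, 3))  # character trigrams
    X = vec.fit_transform(corpus)

    query = vec.transform(["machne learning"])     # misspelled query
    print(cosine_similarity(query, X))             # highest score for the closest string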
Tableau has one available: https://www.tableau.com/covid-19-coronavirus-data-resources
Haven't checked for if it's different than what /u/khamisen linked.
Also an individual-case dataset being built up from Liquidata: https://www.dolthub.com/repositories/Liquidata/corona-virus/query/master?q=select%20*%20from%20mortality_rate_by_age_sex
Basically getting these as they come through from Jeremy Howard's Twitter feed.
Oh jeez I might use that references section as a joke example to teach people how to not write a paper.
> Despite not having anything practical to show for it, it makes grand statements like that a superintelligent machine will be possible within the next year (a huge bet on future work).
Even bigger red flag.
But look, popscience blogs already wrote exciting articles so I guess it worked.
On the one hand, this is cool. On the other hand, the license unfortunately includes this PATENTS clause, which people would do well to read before using these libraries.
On a related note, there's a nice discussion on HN about this point going on now.
https://www.udacity.com/wiki/nd009?nocache#!%23nanodegree-projects
Each project has a "This project is supported by material found in X" section. Each project is based on the skills learned from the free course.
You can start here : https://www.scaleway.com/docs/create-gpu-instance/
I recommend you choose an "ML" OS image. Use 9.2 or 10.1 depending on the CUDA version you need (9.2 for TF, for example).
You need to add a public ssh key so that you can ssh into your instance once it's created.
The OS comes with conda pre-installed (if you've never used conda, I recommend it ;), basically, it lets you manage different python environments).
Once you've started the instance, you just need to do a 'conda init'
And then if you need a keras environment, for example : 'conda create --name keras-env keras-gpu'
And you're set.
You can start your jupyter notebook on your brand new remote server, and then you can use ssh tunneling to connect to it from your local machine (see https://www.blopig.com/blog/2018/03/running-jupyter-notebook-on-a-remote-server-via-ssh/ for an example).
-> I'll probably add this to the list of articles in the documentation
I am no machine learning guy, but here are a few things I personally use.
ArcPy, which comes with ArcGIS and has good online docs and examples.
As an alternative to ArcPy, you could use Qgis http://www.qgis.org/en/docs/pyqgis_developer_cookbook/index.html
As you also mention that you have a huge data set, I guess you must be using some DB to store it. In that case you might want to look at PostGIS (http://postgis.net/), which also provides excellent GIS functionality (I used it for a project last semester; apart from installation pains, it was a good experience).
Do I have to use the "dog face generator" settings or could I use the settings that build these lower level feature extractors too?
Echo state networks are an alternative. They are really cheap computationally (single pass), and don't have vanishing gradients. It seems to be the most biologically plausible method as well, if that is something you care about. http://www.scholarpedia.org/article/Echo_state_network
Maybe give us some hint about what you are trying to do/achieve.
Since you posted in the machine learning subreddit, I assume you have some interest in using them for classification. So echo state networks [1] / reservoir computing [2] might be of interest.
[1] http://www.scholarpedia.org/article/Echo_state_network [2] http://organic.elis.ugent.be/
Look into the search terms 'homeostatic synaptic plasticity' on pubmed.
This is a decent place to start, and includes references to studies that look at homeostatic plasticity at both the cell autonomous and circuit levels. http://www.scholarpedia.org/article/Homeostatic_Regulation_of_Neuronal_Excitability
MLlib in Spark is gaining quite some steam at the moment. It's available through Scala, Java and Python interfaces. I hear they are preparing an R interface too.
Here are the features: http://spark.apache.org/docs/latest/mllib-guide.html.
Spark being Spark, this is supposed to scale to a cluster effortlessly too.
Two that I know of, that are sort of close to what you're talking about - are there more?
https://www.metamind.io/language/explore
https://algorithmia.com/algorithms
A distributed market has always existed. Some NLP corpora have always been licensable for commercial purposes, usually at 4- to 5-figure rates, and you had to directly contact the owner to negotiate the sale. Likewise, some NLP resources whose licenses don't allow commercial use could still be licensed for a fee.
No centralized marketplace existed for those and I imagine that the sales volume was rather low, but there has always been some kind of market. The general idea is feasible, but I expect that there will be several types of marketplaces that specialize in targeting different kinds of customers. E.g. very general purpose models that sell for cheap to many people, versus the highly-specialized narrow domain models that sell to just a few rich customers.
No, if you'd followed the compiler link you would have seen the words "open source". Google open sourced their CUDA compiler a while after the initial TensorFlow release, and it's now a part of clang. My point was that it wouldn't make sense to go and write a CUDA compiler if they were just calling CuDNN (which is distributed as pre-compiled binaries) for everything.
I use PyTorch at work and Jax/Flax for personal projects and research, with few exceptions. I find Jax/Flax really mathematically intuitive. Still, PyTorch is generally more mature, OOP-intuitive, and easier to onboard thanks to the existing documentation, tutorials, and books (I work with interdisciplinary scientists new to DL frameworks). I'm also a huge fan of Torchscript, even though we don't use it for deployment at my company. And, it's always nice to have an "ecosystem" ready to borrow from that's built on an intuitive interface.
I haven't touched TF or Keras for a few years. I've always found it frustrating at a baseline.
I didn't see this change in the release notes, but found it on the __torch_function__ documentation page:
> One should be careful within __torch_function__ for subclasses to always call super().__torch_function__(func, ...) instead of func directly, as was the case before version 1.7.0. Failing to do this may cause func to recurse back into __torch_function__ and therefore cause infinite recursion.
I'm wondering when TensorFlow will get a proper, full C++ API like the one PyTorch has recently gained.
(Also, I can use CMake for PyTorch, but AFAIK am forced to use the monstrosity that is Bazel for TensorFlow...)
> prototyping and training on PyTorch and deployment on Caffe2
That's already possible via ONNX, see https://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html. AFAIK 1.0 is more about making PyTorch itself deployment-ready by eliminating the hard dependency on Python and making everything natively compilable with C++.