https://play.google.com/store/apps/details?id=com.boxcar2d.nippler
I created this app with a CNN model I trained, with varying layers of pooling and convolution. The confidence threshold can be adjusted. The colors represent the prediction confidence: 99% magenta, 95% red, 90% orange, 80% yellow, 70% blue, 60% gray, 50% cyan.
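The threshold scheme could be sketched like this (a guess at the logic from the color list above, not the app's actual code):

```python
def confidence_color(conf):
    # Thresholds checked from highest to lowest; this mapping mirrors
    # the color list described above.
    for threshold, color in [(0.99, "magenta"), (0.95, "red"),
                             (0.90, "orange"), (0.80, "yellow"),
                             (0.70, "blue"), (0.60, "gray"),
                             (0.50, "cyan")]:
        if conf >= threshold:
            return color
    return None  # below the minimum confidence: no prediction shown

print(confidence_color(0.93))  # orange
```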
Computer vision is not all AI. There are a lot of subjects to study. I started with this book: https://www.amazon.com.br/Digital-Image-Processing-Rafael-Gonzalez/dp/013168728X
It covers the basics and introduces all the themes you need to go on with your studies. Good luck and have fun.
Hi I'm one of the core developers for SimpleCV. It's a cross platform open source vision framework in python.
Basically, we are trying to make vision much easier to use in general. There are many tasks and things you could help incorporate that don't even require vision knowledge. For example, we are in the process of building a web front end, and if you are familiar with those types of technologies, or even Python in general, it would be a great asset.
Your thesis could be based on UI and computer vision, as they really are changing the landscape, and you'd help an open source project in the process. We also want to add image homography and feature tracking to the next release (1.3). We have quick release cycles as well (about every 3 months).
Anyway, give it a look: http://www.simplecv.org
You could get started with this PyTorch tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
It's a Mask R-CNN (mask region-based convolutional neural network) implementation, and you can use a custom dataset as well.
PyTorch is a bit more low-level than Keras, but it helps a lot with understanding the mechanisms behind neural nets.
good luck.
The big push this year was deep learning. It seemed to be everywhere.
These papers won awards: http://www.pamitc.org/cvpr15/awards.php
Here's the take of Quora's readers:
https://www.quora.com/What-are-the-most-interesting-CVPR-2015-papers
This is an excellent series of articles by the OpenCV community: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_table_of_contents_feature2d/py_table_of_contents_feature2d.html
If you want to learn more about the intuition behind feature tracking algorithms and optical flow, consider checking Udacity's Computer Vision course: https://www.udacity.com/course/introduction-to-computer-vision--ud810
This gent has a fantastic tutorial on making Haar classifiers using OpenCV's built-in machine learning tools. At the bottom of the tutorial are a bunch of other great resources!
The Best App to use is IP WEBCAM (https://play.google.com/store/apps/details?id=com.pas.webcam&hl=en).
Steps to follow -
Like this -
Python: cap = cv2.VideoCapture("rtsp://ip:port/h264_ulaw.sdp") (make sure you edit ip:port)
You should first build the foundations of digital image processing: filtering, convolution, cross-correlation, etc., so you can build the concepts of ML on top of them. Here is a completely free course offered by Udacity:
https://www.udacity.com/course/introduction-to-computer-vision--ud810
The course focuses on the intuitions and mathematics of digital image processing. For the most part, you do not apply high-level library functions like OpenCV's, but rather use low- to mid-level algorithms to analyze images and extract structural information. This course helped me a lot to break into the CV field. Of course, you should have a strong background in CS and mathematics. Good luck.
A few important links that I collected while studying SLAM: https://www.notion.so/sakshamjindal/SLAM-Links-f214435a23544bac8914c519064745c8
Someone did this, although the setup is a little clunky. Could be miniaturized and productized, but I don't know what the market would be like.
http://hackaday.com/2010/05/14/cat-door-unlocks-via-facial-recognition/
This is your solution, and a simple one at that.
http://hackaday.com/2012/08/13/building-a-better-kinect-with-a-pager-motor/
In effect: get a motor, a power supply at the same voltage, a MOSFET, a potentiometer, and a small piece of wood, and make one yourself. I've done it, and it's not hard.
The catch is that there is an ideal frequency you'll have to find for your Kinect. That's why you use a pot and a MOSFET, so you can tune the speed. Once you've tuned it, you can measure the pot's resistance and swap in a fixed resistor of that value.
One option is to run a Python back-end with your algorithm (highly likely a CNN using Tensorflow, Caffe etc.) and then use Flask for HTTP requests to communicate with front-end JavaScript.
Another option is to find a purely JavaScript facial recognition solution, but this may not exist.
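The first option could be sketched like this (the `/recognize` route and the `predict` stub are made up for illustration; `predict` stands in for whatever model you actually run):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(image_bytes):
    # Stand-in for your actual model call (e.g. a TensorFlow/Caffe CNN);
    # the returned fields here are made up for illustration.
    return {"label": "face", "confidence": 0.97}

@app.route("/recognize", methods=["POST"])
def recognize():
    image_bytes = request.get_data()  # raw image bytes POSTed by the front end
    return jsonify(predict(image_bytes))

# app.run(port=5000)  # the JS front end would POST images to /recognize
```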
http://cs.stanford.edu/people/karpathy/densecap.pdf From: cs.stanford.edu/people/karpathy/
His qualification is he got hired at Tesla.
For a good template on computer vision click here:
Then click "Open as template" then save as and export as PDF.
Do you know what format the huge image is in? Is it a standard image format? How big is the file?
You could probably use some open source tools like the "stream" command in ImageMagick: http://www.imagemagick.org/Usage/basics/#stream
If you have a huge jpeg, you might also try the "jpegtran" tool (https://linux.die.net/man/1/jpegtran). You probably want something like jpegtran -crop WxH+X+Y input.jpg > output.jpg
Here is some high res data.
Massachusetts USGS 30cm Color Ortho Imagery (2013) - JPEG2000 Format http://academictorrents.com/details/82c64b111b07ff855b8966701a13a25512687521
And with labels:
Mnih Massachusetts Building Dataset http://academictorrents.com/details/630d2c7e265af1d957cbee270f4328c54ccef333
I've installed OpenCV on my machines a few times and I remember it going relatively smoothly. That said I haven't switched to 3.0 yet and am still around 2.4.10 so maybe things have changed.
Checking it now, it seems that they do provide an executable link.
You should probably use the precompiled LIB files and then just link to them from your project file instead of re-building the whole thing. I remember having some issues with cameras initially, but after I dropped enough DLL files into the bin folder things started to work. Can't really remember which though :/
If you are looking for course recommendations - for conventional CV, I have found this 'Intro to CV' course from GeorgiaTech to be very useful.
https://www.udacity.com/course/introduction-to-computer-vision--ud810
This should prep you with all the fundamentals needed to understand vSLAM.
Georgia Tech (Aaron Bobick)'s computer vision course on Udacity.
Bobick covers a lot of material in reasonable depth, and has a great sense of humour!
There has been some existing work on using a third camera to evaluate the disparity map from a stereo pair: https://www.researchgate.net/profile/Reinhard_Klette/publication/220914567_A_Third_Eye_for_Performance_Evaluation_in_Stereo_Sequence_Analysis/links/54a797c40cf256bf8bb6bf84.pdf
This seems like more of a question for caffe users than reddit, but seeing the upvotes I'll answer anyway.
You can have more than one blob as input to (or output from) a given layer. For example, any elementwise operation (the 'Eltwise' layer) will take two or more blobs as input (which may have been generated from different layers) and perform a sum/multiplication/max between them. Likewise, data layers tend to have two top blobs, one for the data and one for the label.
I would recommend the Udacity and Georgia Tech's Introduction to Computer Vision Lectures by Prof Aaron Bobick. Although this is more theory focused it will help you understand basics.
Udacity's Free Course - https://www.udacity.com/course/introduction-to-computer-vision--ud810
Take a look at Udacity: https://www.udacity.com/course/introduction-to-computer-vision--ud810
Their introduction to computer vision is going to cover the basics of old school cv pre deep learning era. It has a couple of quizzes and exercises that really help you learn.
I used this back in 2016 while taking a grad-level CV class at UC San Diego. It really helped me learn and refresh relevant content.
How many documents need to be transcribed? How much money do you have available? Mechanical Turk may be an option for you.
I was looking for cursive handwriting recognition libraries myself a while back, and we found that most available options are either extremely expensive, have mediocre results, or both. We eventually decided to go with MTurk instead. We only had a dataset of ±2000 phrases though; it set us back about $100.
I have no idea what your background is or what your goals are, so it is impossible to make any recommendations. Maybe if you specified both I could help more.
EDIT: This class looks pretty elementary. Mastery over its topics should be a minimum for further study in computer vision.
I had the chance of working with an adviser who had access to the Kinect prototype (the PrimeSense sensor), and we used a very nice open library, OpenNI. It has versions for Windows, so definitely check it out. Also, for this particular case, you might actually want to stick to Windows for better GPU driver support. The Kinect SDK also has very nice sample code for C#, and most of the math is hidden by convenient functions in the SDK. If you speak German, you should check out this Coursera class: Introduction to Computer Vision
I'm still trying to track them all down (have been for about a week), but I've added some of the missing VOC dataset files to academictorrents.com: http://academictorrents.com/browse.php?search=voc
Note: x-post from /r/datasets
I modified scrcpy to be able to use OpenCV functions to process frames on Android smartphones. In the video, it is using OpenCV's HoughCircles function to extract the position of the ball in a soccer game. Then the trajectory of the ball is predicted, and a tap is sent where the ball will approximately be. I'm kind of happy with the result, since it plays better than I can :)
I look at online University course notes and lab material but this is hit and miss.
The core mathematics of CV is signal processing theory (for registration/feature extraction), geometry/linear algebra (for reconstruction), and machine learning/probability (for recognition). A solid understanding of these is all you need.
The android/openCV demo app renders "GoodFeaturesToTrack" in realtime. I'm not sure exactly where the demo APK is, but you should be able to track it down from here. http://opencv.org/platforms/android.html
I'd say take a dive into scikit-learn. If the documents have similar structures, maybe similar documents have similar, unique contents?
Let's say an info form has a standard code; you could use OCR on the form and check whether the unique word is in the list of recognised words.
https://ocr.space/ is a free OCR API, I haven't tried it though.
The only cost-efficient way to do what Google Vision does is to write it yourself. It will cost you in processing and in time though, depending on the amount of labeled data you want to use.
Is this a work project, a hobby POC or for school?
I thoroughly enjoyed this Udacity class so if it doesn't overlap too much with what you're taking now, I highly recommend. It doesn't really cover ML based techniques but I think having a foundation in "classical" CV is a good use of time.
You are right, real-life images after my processing are less ideal than the image I posted, but still decent quality.
I should do something similar to this: https://play.google.com/store/apps/details?id=com.powerfulmedical.cardiology
Thank you! Saved your comment. My bad that I didn't ask such a question at the start of the project, when we were deciding on an approach (working with images or a 1D signal). Now I understand that the 1D signal was the better option; in the future I won't make this mistake again.
>So you may as well focus on creating a tool for doctors to use
Oops, my bad, bad explanation; actually you have written the main idea yourself.
> it could be incorporated into an ECG machine. But an app isn’t the right place for this technology
It definitely could, but the medical specialists have decided to go with the phone-app-and-phone-images approach: a person brings their ECG to the doctor, who uses the app to reach, I think, better conclusions and make the right diagnosis. I suppose it's the only app on the Google Play Store that uses a true ML approach and not a rule-based one; the medical team found this app recently, so there is already one strong competitor:
https://play.google.com/store/apps/details?id=com.powerfulmedical.cardiology
If you want to learn more about image processing with FPGA's I recommend this book
https://www.amazon.com/Design-Embedded-Image-Processing-FPGAs-ebook/dp/B005FXVEDY
As for which board to buy, I'd buy something that is a Xilinx Zynq, so it's easier to get camera data into the FPGA fabric, process, and then look at the results.
Funny, I am actually in a similar situation. Though my computer vision masters begins in September. I had also begun with the textbook approach but soon moved to this online course taught by a Georgia tech prof ( https://www.udacity.com/course/introduction-to-computer-vision--ud810).
The material covered is actually very comprehensive, and I found that the "geometric computer vision" topics (i.e. camera calibration and projective transformations) were taught particularly well.
I have a similar experience to you (very new to computer vision), but here are a few ideas off the top of my head:
Some form of 3D mapping for autonomous vehicles like seen in this video. It uses a cool technique called SLAM. This is probably way beyond either of our scope but if you want a large project to work towards, this is an idea.
An objectively easier idea may be implementing already-established techniques such as object recognition and/or tracking (traffic signs or people in a picture/video); nothing is wrong with working on already-solved problems imo.
Also consider checking out the AWS DeepRacer scholarship challenge; maybe by participating you can figure something out. I plan to start working on it next month, after I complete Andrew Ng's Coursera DL course myself, but the event is currently ongoing.
Good luck getting in, man! It's kinda sad to see that getting into a tech team requires such a bar of entry, but kudos to you for giving it your all.
For core computer vision and its application in machine learning (not deep learning yet), this course by Aaron Bobick is my favorite:
https://www.udacity.com/course/introduction-to-computer-vision--ud810
It explains everything from the very beginning and ends up giving you a complete vision of the field. The assignments are varied and interesting. Furthermore, you can take it for free on Udacity!
After that, CS231n at Stanford is really nice for deep learning, I think; you will not learn the same things at all.
Even if deep learning is an amazing tool for CV, the more classic tools I learnt in Bobick's class actually helped me a lot in practical applications.
Did you try using a Kalman filter? You can initialize it using only a cropped image of what you want to track, and then you have an adaptive version that slightly changes the appearance of the tracking target over time, so it should be able to keep tracking the rover when it's pivoting.
Bonus: there's a good explanation of how they work in this course: https://www.udacity.com/course/introduction-to-computer-vision--ud810
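To show the predict/update cycle, here is a minimal 1D constant-velocity Kalman filter in NumPy (the noise values and measurements are made up; a real tracker would feed in detected positions of the rover):

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition: position, velocity
H = np.array([[1.0, 0.0]])             # we only measure position
Q = 0.01 * np.eye(2)                   # process noise
R = np.array([[0.5]])                  # measurement noise

x = np.array([[0.0], [0.0]])           # initial state estimate
P = np.eye(2)                          # initial covariance

measurements = [1.0, 2.1, 2.9, 4.2, 5.0]  # noisy positions of the target
for z in measurements:
    # predict step: propagate state and uncertainty forward one frame
    x = F @ x
    P = F @ P @ F.T + Q
    # update step: blend in the new measurement via the Kalman gain
    y = z - (H @ x)[0, 0]              # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K * y
    P = (np.eye(2) - K @ H) @ P

# x[0, 0] is the filtered position, x[1, 0] the estimated velocity (~1/frame)
```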
The Google nanodegrees on Udacity (https://www.udacity.com/google) cost 900 euros for a term. Compared to university, this is quite cheap.
But there are lots of online courses with certification available for less money at Coursera, Harvard, and others. Still, Google in the name looks impressive on a resume.
I like "Intro to Computer Vision" by Aaron Bobick a lot; it is clear and covers a lot of different subjects: https://www.udacity.com/course/introduction-to-computer-vision--ud810
Otherwise, for OpenCV, "pyimagesearch" is a blog that covers a lot of subjects using OpenCV functions.
Well, there are a lot of resources for C++. You may find the Udacity C++ course for programmers useful since that course is designed for people who are familiar with a programming language and wish to learn C++. C++ Primer Plus is also a good book which, you can use as a reference. It's an old book (C++ 11 standard) but does the job.
Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library covers a very good portion of the OpenCV library. You can also use it as a reference. You mentioned that you have been using Python for your vision works. My suggestion is to start by re-implementing some of your previous less complicated vision works in C++ (once you are familiar with C++). Good Luck.
"I thought image processing could be interesting"
With a physics background, the basic image processing operations/concepts probably will be fairly easy for you to grasp. That being said, you probably should work on some projects related to image processing to first understand and learn what is image processing, and if it is indeed interesting to you.
To know what the job market is like, you can search on indeed.com using image processing related keywords like: "Image processing", c++, cuda, "open cv", "machine learning" + your existing skill set.
See what kind of posting/industry interests you most, and what kind of skills (programming languages, libraries, particular knowledge) they are seeking. For companies that are looking for candidates with image processing skills, people with more experience in image processing will probably be taken more seriously than those with no experience. But then again, maybe the company has enough knowledge of image processing and is looking for people with other types of knowledge, such as parallel programming, to help them scale/speed up their existing processes.
Hey everyone! Thank you for sending over feedback and encouragement to my DMs, really appreciate it!
Also, we have decided to launch on ProductHunt https://www.producthunt.com/posts/datature-portal ! If this is something you'd like to support, feel free to head on down to our PH post today 🙌
Thanks for your answer. I am not able to connect the pieces to understand what we are trying to fit using the polynomial (1 + k1*r^(2) + k2*r^(4) + ...).
From the usual explanation, r is the radius in a polar coordinate system (r^(2) = x^(2) + y^(2)).
I am trying to link this with the idea of multiplying the aforementioned polynomial by the distorted x and y to get the undistorted coordinates, but I do not seem to understand what we are fitting. Also, I'm trying to understand how varying these parameters affects the graph visually using Desmos (with hopes that the graph can help me understand it better).
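To make the fitting target concrete, here is the forward radial model with made-up coefficients. Calibration fits k1, k2 (by least squares over many detected points) so that pushing ideal points through this polynomial reproduces where they actually land in the distorted image:

```python
import numpy as np

# Radial distortion model: the distorted radius is
#   r_d = r * (1 + k1*r**2 + k2*r**4)
# so each normalized point (x, y) is scaled by the polynomial of r**2.

k1, k2 = -0.2, 0.05   # made-up coefficients (k1 < 0 gives barrel distortion)

def distort(x, y):
    r2 = x**2 + y**2
    scale = 1 + k1 * r2 + k2 * r2**2
    return x * scale, y * scale

xd, yd = distort(0.5, 0.5)
# With k1 < 0 the point is pulled toward the image center: r_d < r.
```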
Thanks for this! This was an excellent starting point. I'm currently reading up on OpenCV and started toying with implementing my own code, inspired by what you described here.
Here's what I have so far:
https://codepen.io/OhadRon/pen/eKxpyv
(Code's a real mess, I know)
https://www.notion.so/Unfreeze-249-diagram-d8dcc29a2ee84d8bb9da97636dd1fd22
Here are the layers unfrozen. If you have the time can you take a look?
If you're doing it to learn computer vision techniques, the recommended tutorial from hwillis is great.
If you just need it done, rather than doing it as a learning exercise, I'd recommend ZBar (http://zbar.sourceforge.net). I used it previously and it does a pretty awesome job of qr detection, provided the code takes up enough of the image. It's rotation (and largely, scale) invariant, handles detection and decoding and is free and open source.
If the qr code is relatively small, it might be worth doing some image processing in advance to make its life easier. I used the gradients of the image to detect qr-type regions to subsequently test with ZBar: the nature of qr codes means they show pretty well as high frequency gradient regions.
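The gradient idea can be sketched with a synthetic image (all the numbers here are arbitrary; a high-frequency checkerboard patch stands in for a QR code):

```python
import numpy as np

# Synthetic 64x64 image: flat background plus a checkerboard patch
# standing in for a QR code.
img = np.full((64, 64), 128.0)
yy, xx = np.mgrid[16:48, 16:48]
img[16:48, 16:48] = 255.0 * ((yy + xx) % 2)

# Horizontal first differences as a cheap gradient measure.
grad = np.abs(np.diff(img, axis=1))

patch_score = grad[16:48, 16:47].mean()
background_score = grad[:16, :16].mean()
# The checkerboard patch scores far higher: a candidate region to hand to ZBar.
```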
Since you're tracking, you could maybe make informed choices of where to look with ZBar based on the motion of the camera/object, either with optical flow directly or using a kalman filtering type approach based on previous directions.
Speaking of tracking; my guess is you'd probably need to implement some logic on top to record where you've seen a given code based on your detections from ZBar. Might also need to do some interpolation as it's likely you'll get some misses due to motion blur, etc.
After using the model some more, it seems that it can indeed remove 'holes' from time to time, so it is probably just a question of getting more data, as you say. My reference is https://www.remove.bg/, which has amazing results; even their model sometimes fails to remove some of these 'holes', though most of the time their result is perfect.
Thanks again !
[probably far too late to be of any help], Hugin runs on Linux distros. On Ubuntu, you can install it with apt-get.
Hugin is a GUI that runs on a ton of command line utilities, so you can also do the stitching from the command line. But. When shooting interiors like this, parallax error is going to be your biggest enemy from achieving a clean stitch. It's why 360x180 pano shooters often use a two-arm panorama head, like a Nodal Ninja when shooting these types of shots.
See if Digikam's face recognition works for you - https://www.digikam.org/news/2019-12-22-7.0.0-beta1_release_announcement/ . Open source, self-hosted. All versions have face recognition, but while the release versions use older less accurate techniques, the latest beta version uses more accurate modern DL techniques.
Take a look at SimpleCV (http://www.simplecv.org).
It wraps OpenCV and is much easier. The same code as above can be written:

cam = Camera()
display = Display()
while display.isNotDone():
    img = cam.getImage()
    img.save(display)
Hi,
Using correlation is certainly a good first step - however correlation can often degrade if the images are too 'complex'. In these cases it's worthwhile looking at something like the mutual information or the Hilbert-Schmidt Independence criterion (HSIC). These are both pretty good for exactly this task.
http://www.scholarpedia.org/article/Mutual_information
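As a rough sketch of the mutual-information route (the bin count and test images here are arbitrary; a histogram estimate of MI between two equally sized images):

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram estimate of MI between two equally sized images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
img = rng.random((64, 64))
remapped = img * 0.5 + 0.25          # monotone remap: still highly dependent
noise = rng.random((64, 64))         # unrelated image

# MI is high for the dependent pair and near zero for the independent one.
```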
If you'd like more information then feel free to PM me and I can help you out.
The problem is a bit underspecified. Probabilistic indicators of digital alteration do exist; but the more clever 'shoppers, or those using more recent software, can avoid increasingly many of them.
If we freeze the scope at some given set of indicators, like these, it can become a computer vision question--is there an algorithmic way to tell if lines that should be straight are bent? Can a computer tell if shadows don't match the same set of light sources?
Check out this course from udacity: part 2: https://www.udacity.com/course/cs271 and if you're into robotics maybe check this one out too: https://www.udacity.com/course/cs373
There are probably other good ones on similar learning platforms..
Well, things would have to be documented. It's a pipeline of 15 programs, each consuming the output of one or more of the others. This takes you from raw JPEG images to an H.265-compressed, erasure-coded, neural-network-indexed file.
That file is then mmap'd and served by a program that reads pre-encoded UDP packets directly out of memory and onto the network. A lot of the complexity comes from desire to make it possible to serve a large number of users at minimal expense. A Melondream server can run comfortably on a $5/month linode (https://www.linode.com/pricing).
Then there's all the Android stuff...
Yeah, we could just dump the source code onto GitHub, but you might have trouble understanding how it all fits together, even though it's completely automated.
Maybe it will just be scavenged (people will just use pieces) so a raw dump would be ok.
The Torchvision library, used with PyTorch, contains pre-trained versions of many popular architectures: https://pytorch.org/vision/stable/models.html
They've all been pre-trained with Imagenet, so you'll have the standard 1000 classes: https://gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57
https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveMaxPool2d.html
If you want N stages in your pooling, then you flatten and concatenate N adaptive poolings of the input tensor, the last one being a 1x1xC output (a global pooling), the one before that: 2x2xC, the one before that 4x4xC, and so on... You'd get a vector of fixed size: (1+4+16+...)*C where C is the input tensor's number of channels. This is for max pooling but you can do mean pooling as well.
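The pyramid described above could be sketched in PyTorch like this (N = 3 stages with grids 4, 2, 1; the input sizes are arbitrary):

```python
import torch
import torch.nn as nn

def pyramid_pool(x, levels=(4, 2, 1)):
    """Concatenate adaptive max poolings at several grid sizes.

    x: tensor of shape (batch, C, H, W) with arbitrary H, W.
    Returns a fixed-size vector of length (16 + 4 + 1) * C for these levels.
    """
    parts = []
    for g in levels:
        pooled = nn.AdaptiveMaxPool2d((g, g))(x)   # (batch, C, g, g)
        parts.append(pooled.flatten(start_dim=1))  # (batch, C * g * g)
    return torch.cat(parts, dim=1)

x = torch.randn(2, 8, 37, 53)   # odd spatial size on purpose
out = pyramid_pool(x)
print(out.shape)                 # torch.Size([2, 168]), i.e. (16+4+1) * 8
```

Swapping `AdaptiveMaxPool2d` for `AdaptiveAvgPool2d` gives the mean-pooling variant.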
Thank you for the great explanation.
I think I understand the concept now.
>batch of scores-of-closeness-to-the-bins of dimension Nbins
for each class, right? From the figure, I see that they have computed the closeness to the bins for a particular class.
Also, about the notations: I don't get them clearly (I am a beginner at reading research papers). In which direction are we applying the convolution?
Say I have a (16, 100, K) tensor, where 16 is the batch size and K is the number of classes. I obtain this tensor using a pretrained backbone. Can I just apply a 1D convolution to this? (I am referring to this.)
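For the shape question, here is a generic sketch (not the paper's exact architecture; the output channel count and kernel size are arbitrary). `nn.Conv1d` expects (batch, channels, length), so a (16, 100, K) tensor has to be permuted first if you want to convolve along the length-100 axis with K input channels:

```python
import torch
import torch.nn as nn

K = 10
feats = torch.randn(16, 100, K)   # (batch, length, channels)

# nn.Conv1d wants (batch, channels, length), so move K to the channel axis.
x = feats.permute(0, 2, 1)        # (16, K, 100)

conv = nn.Conv1d(in_channels=K, out_channels=32, kernel_size=3, padding=1)
out = conv(x)
print(out.shape)                   # torch.Size([16, 32, 100])
```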
The other ones contain the object detection models, on top of which you would be running an opencv tracker (tracker takes the results of the detection model).
https://pytorch.org/docs/stable/torchvision/models.html
This would be a good starting point for object detection models; pick one based on your runtime and accuracy requirements. Installation should also be straightforward for both Windows and Linux with Python.
Detecto is actually built on top of PyTorch! Essentially though, PyTorch has a lower-level API, and there's a lot of code that you'd have to write just to finetune one of their pre-trained models. As an example, here's a tutorial listed on their website on how to do so.
With Detecto, the goal is to handle the difficult parts so end users don't have to; for example, Detecto comes with a Dataset class so you don't have to write it yourself, and you can train a model by calling the fit method that handles updating the parameters.
The end result is the following:
xml_to_csv('xml_labels/', 'labels.csv')
dataset = Dataset('labels.csv', 'images/')
loader = DataLoader(dataset)
model = Model(['dog', 'cat', 'rabbit'])
model.fit(loader)
detect_video(model, 'input_video.mp4', 'output_video.avi')
Possible? Yes.
Necessary? Generally no.
The standard way to increase throughput is just to increase the batch size, and to increase computational resources accordingly.
If that isn't an option then first make sure that the GPU computation is actually your bottleneck, and not loading your data.
If you have optimized the other parts as much as you can then you can maybe think about optimizing the Pytorch model. Really the only way to improve this is by implementing your model using custom C++ code.
See https://pytorch.org/tutorials/advanced/cpp_extension.html
But I stress that this should generally not be done, unless you are doing something very custom or trying to optimize the last 5-10%.
It actually converged in around 15 minutes and accuracy was well above 90%. My bboxes were big because I was not trying to detect scratches or dents (segmentation is not a great approach for car damage recognition although it looks good at first). Have a look at https://pytorch.org/blog/torchvision03/
There might be other options, but Multiple View Geometry in Computer Vision was a classic when I was a student.
There are some problem classes that are much more efficiently solved by classical (as in: without neural nets) CV. Even neural nets use linear algebra a lot, and although it's not exactly the same as for 2D/3D transformations you might want to get acquainted with the math behind all that.
Are you familiar with where finite differences in general come from?
https://en.m.wikipedia.org/wiki/Finite_difference
Studying more numerical methods might give you a better grounding for CV. I like this book mostly as a reference but can’t remember how well it treats this topic:
https://www.amazon.com/Numerical-Algorithms-Computer-Learning-Graphics/dp/1482251884
It will at least point you in the right direction of what you need to know, even if other books are more pedagogic.
For CPU-based trackers, what's in this Android demo app I wrote a few years ago is still not that far from the SOTA. I was a bit surprised by that when the topic came up a few months ago. I'm not sure how much better GPU-based trackers are. Because of how good CNNs have become, I think most development has switched over to tracking by detection. Those are not really single-object trackers, nor can they handle unknown object types.
I did a similar thing before using 6 RealSense cameras. Most of the problems came from how to keep all of the cameras stable. Here's some personal experience.
>I tried to connect 4 USB cameras through an externally powered USB hub and run the script on two different computers, Mac Book Pro and Ubuntu machine with core i7 processor and 32 GB RAM. In the case of Mac Book Pro, I could capture the images from at most 2 cameras, whereas in the case of Ubuntu Desktop, I could capture only images from a single camera. As this issue is still present, I am wondering if the bottleneck is in the USB communication or multiple threads for video capture might overload the processor (I doubt the latter is the case).
Maybe this is because of your program. Did you try to run your program separately? I think it will work fine.
Cool. There are standards for different environments, and I believe you need an IP67 rated camera.
Quick search didn't find much that fit your Arduino requirement (what's that gonna do for you?).
However searching for a camera enclosure pointed me at this...
Large Weatherproof Enclosure With Clear Top https://www.amazon.com/dp/B00U0S0VM4/ref=cm_sw_r_sms_apa_i_BotSDbJPAB38Z
Does that help?
I've been thinking about making a general TCG app.
The problem is solvable, since an app already exists (https://play.google.com/store/apps/details?id=delverslab.delverlens&hl=en), but you could try your own approach.
Furthermore, you can extend this to other card games (Two popular ones are probably Yu-Gi-Oh and Pokemon, but I don't know if databases are as good as for MTG).
Another thing you could try is developing a plugin for Overwolf, that parses some game data that is not otherwise accessible for easier management.
Anyhow, these are just some ideas where you can get databases relatively easily, so it makes the process more approachable :)
How well would a product like the ELP-960P2CAM-V90-VC work? Would it only work at very short distances or not work terribly well in any capacity?
I don't have any particular advice for you on what or how to learn, just one important remark:
You will probably come across OpenCV at some point, in the context of computer vision. Do not think for even a second that OpenCV represents good, idiomatic C++; it's not. In fact, it's some of the worst, horribly mis-architected, unidiomatic C++ you can come across. Try to avoid it, if possible, for any kind of learning purposes. Even *using* the API while learning will get you into the land of inadvertently adopting design mistakes.
For getting a quick and concise overview of C++, try this book: https://www.amazon.com/Tour-2nd-Depth-Bjarne-Stroustrup/dp/0134997832 (Stroustrup, A Tour of C++, 2nd edition(!)).
Nice Demo! Also based on photo recognition try this app: https://play.google.com/store/apps/details?id=com.dyve.countthings - works on iphone & android also on desktop platform and recognizes and counts items in many areas!
Oh hell, when did those get so cheap? I thought they were still like $150... but it looks like you need a USB adaptor too, so I think it's still out of OP's budget.
In addition to the depth camera (which is actually quite good!) you also get free pose estimation and human recognition in the SDK, IIRC. That can be quite useful. It needs an extension cord and doesn't work well outside though.
Well, the point of stereo cameras is to have overlapping fields of view, so you can track points from multiple viewpoints at the same time (which lets you estimate their depth).
But certainly there are inside-out tracking devices that use a pair of fisheye cameras (e.g. https://www.amazon.com/Daydream-Standalone-Worldsense-Tracking-Ultra-Crisp-Designed/dp/B0793R2Q23/ref=sr_1_1?ie=UTF8&qid=1525242245&sr=8-1&keywords=mirage+solo)
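To make the depth link concrete, here is the standard rectified-stereo relation with made-up camera numbers:

```python
# For a rectified stereo pair: depth = f * B / d, where f is the focal
# length in pixels, B the baseline between the cameras, and d the
# disparity (pixel shift of a point between the two images).
f = 700.0     # focal length in pixels (made up)
B = 0.12      # baseline in metres (made up)
d = 42.0      # measured disparity in pixels

depth = f * B / d
print(depth)  # 2.0 metres
```

Note how a smaller disparity means a farther point, which is why depth precision degrades with distance.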
Another thought I'm having: would it be possible to use a reference library for comparing and counting? For example, I have a library of a few hundred edge shots of glass, and the system compares likeness and creates a count for each likeness in an image or video. The camera I plan to use is this one, which can shoot 720p at 180 fps. This should reduce motion blur, and I would only use every 7th frame, for about 24 fps extraction. Accuracy is the most important thing, but the whole time frame for processing and counting shouldn't exceed 45 seconds; best would be under 15 seconds. Is this wishful thinking?
Also, you recommend buying the Kinect 2 for Windows, right?
Not the Kinect 2 for Xbox One...
Doing it optically is a fun project, but worth pointing out that the various rotations are so well known and studied that once you calibrate for position, a purely mechanical positioning system is more than adequate, and computer control has been a solved problem for decades. You might want to read Trueblood and Genet to get a feel for the control and automation side, and start with a mathematical/physical solution - you can add vision later.
Edit: specifically https://www.amazon.com/Microcomputer-Control-Telescopes-Mark-Trueblood/dp/0943396050 - the hardware is much easier these days, but the control theory and algorithms are the same.
The PS Camera is supposed to be the successor of the PS Eye for the PS4. Do you know if this new camera is also good for computer vision (in terms of community support/drivers)?
I am definitely not a pro when it comes to CV, but I feel that at some point it will be best if you are familiar with more advanced maths. I think familiarity with linear algebra will help the most, but you'll definitely see ideas or algorithms involving calculus.
When it comes to books, I've seen Concise Computer Vision recommended here 'n there. I'm about to purchase this book myself actually.
Here's a similar question that someone asked over on Stack Overflow: http://stackoverflow.com/questions/824679/how-can-i-learn-the-math-necessary-for-working-with-computer-vision
I'm building an app for real-time exercise analysis (Play Store). I'm using a detector + shape predictor to get the body joint positions; I have to train a model for each exercise. Let me know if you have questions!
For an explanation of the approach, see the project page here: http://www.kylebrocklehurst.com/bananagrams-verifier
And you can download it as an android app here: https://play.google.com/store/apps/details?id=com.kylebrocklehurst.kyle.bananagramverifier