Your phone camera images (and most camera images) will have a slight spherical distortion. That means the perspective lines of your image-based 3d world also have spherical distortion; while your Blender-based world has perfect and straight perspective lines. If you undistort your frames they will be closer to that ideal 3d space of Blender.
Many manufacturers publish their distortion parameters and Photoshop or Lightroom or other photo packages will recognize them and offer easy, automated ways to remove this camera lens distortion. If not, you can also do it yourself by learning how to use OpenCV to calibrate your camera.
Have a look at OpenCV, which can easily be installed on the Pi, and has a great many tutorials for you to play with.
The OpenCV library provides a bunch of prebuilt computer vision algorithms for these types of applications. It would not be hard to set up a simple proof of concept, but getting the accuracy up and the false positive rate down would take quite a bit of work.
I would love to!
I use the open source computer vision library "OpenCV". OpenCV has support for a lot of the major languages (C++, C, Python and Java interfaces), the documentation is okay for the beginner stuff, and it has the most market share, so there are a lot of resources dedicated to it.
The eye detection is just a standard Haar cascade (the detection algorithm) data set for OpenCV, mixed with face detection, which is also a standard set. The XML is basically a set of weights describing what most people's eyes look like (how big they are, shading, distance apart, shape, etc.).
Hope this helped.
Computer Vision is much, much more than simply "Multiple camera angles". Look into OpenCV for more information on the topic if you want, or watch any video on the tech behind the new assisted driving technologies hitting the market.
Most have a radar and/or some kind of lidar in addition to the camera system the crew uses. They're all coupled together to make it as easy as possible on the human that has to do the target selection. These days machine learning is not out of the realm of possibility for a completely graphical solution, but I bet this thing has a pretty sweet radar array that does the actual aiming of the turret. The human just has to point and click and maybe adjust a little for crosswind.
Instead of an Arduino you could try controlling it with a Raspberry Pi and OpenCV, and train it to recognise just the one cat and not the other animals.
Edit, as an aside, I've seriously thought about doing this to keep my cat off the kitchen table...
This gent has a fantastic tutorial on making Haar classifiers using OpenCV's built-in machine learning tools. At the bottom of the tutorial are a bunch of other great resources!
most likely you'd be using something like http://opencv.org/
in which case, you can use c/c++ or python. if ultra performance is something you're interested in, c/c++ might be the better option.
... but what most people fail to understand is that you need to know a lot about c/c++ before you can even think about expecting your c/c++ app to perform noticeably better than something written in c#, java or python.
the extra development time is generally not worth it.
I'll admit I don't really know the answer to this, but it's an interesting question, so I'm going to go ahead and speculate.
I think that existing "porn detectors" basically just go by looking at the percentage of pixels in the image which are flesh-tones, and flagging them for review. Less naive ones actually check that the flesh pixels are contiguous.
To do what you are talking about means getting into computer vision. You probably want to look into the OpenCV library, specifically the object classification. You'd need a large quantity of training images (that should be some fun research) - remember to use separate training sets for breasts, female genitalia, and male genitalia.
The actual pixellation can be done just by resizing the relevant section of the image by a factor of say, 20%, and then blowing it back up to full-size (500%).
Hey, author of the post here.
I used Scrapy And Python to scrape the Time magazine covers -- both the code and raw scraped images are available on that post.
From there, I took the data and processed it using Python + OpenCV. The actual analysis was very straightforward including:
From there, each averaged image was reshaped into an N x M x 3 image.
Feel free to ask me if you have any specific questions!
The answer depends very much on what kinds of images you have and how flexible you want your program to be.
If you have just one image format, for example PNG, you could just use a simple library to read and write the image and work on an array/vector (for example LodePNG).
The alternative would be to use a full-fledged image processing library (OpenCV), which can read almost any image format and supports many operations out of the box. But it also requires you to invest more time to use it.
I've taken a few upper-division AI classes at the end of my BS in Computer Science. The first was a general course covering the basics of different types of AI; the second was a grad-level Genetic Algorithms course. Both were great. AI is awesome, difficult and profoundly rewarding. The best feeling I've ever had writing software was when I implemented the A* algorithm on a Connect 4 game and it proceeded to beat me and all my friends. Writing a program that can outthink the programmer is really cool!
I have that Norvig book sitting on my shelf right in front of me. It's what we used for the AI class I took. It's a marvelous book.
There are options for you right now too though. So don't be discouraged by not being able to grasp every nut and bolt.
You can use tools like http://opencv.org/
http://azure.microsoft.com/en-us/services/machine-learning/api/
https://cloud.google.com/prediction/
To build meaningful things with AI without having to learn the nitty gritty stuff just yet.
I've installed OpenCV on my machines a few times and I remember it going relatively smoothly. That said I haven't switched to 3.0 yet and am still around 2.4.10 so maybe things have changed.
Checking it now it seems that they do provide an executable link
You should probably use the precompiled LIB files and then just link to them from your project file instead of re-building the whole thing. I remember having some issue with cameras initially but after I dropped enough DLL files into the bin folder things started to work. Can't really remember which though :/
I did my final year project on something similar. I was looking at tracking an object (pool ball) across a video using the RPi using OpenCV. I was using the original Model B and I had very little experience programming but I found that the Pi simply couldn't pull the video fast enough. I was also using a USB camera since the dedicated camera hadn't been released yet.
It is definitely possible but I doubt you can do it with the Raspberry Pi. My advice for the best place to start is the OpenCV documentation. I think the main thing you want to work out is exactly what you want to track and how to get the computer to recognise that object only. I assume you would want to look at the ball and somehow stop the computer from recognising people's heads as the ball.
Give it a go. If you already have a Raspberry Pi with camera module then hook it all up and see what happens. OpenCV is free so you have nothing to lose by giving it a go.
Nice stuff, in Python too!
I'm not used to working in Python, but this is definitely the sort of simple, fun project I'd enjoy contributing to.
If you want to attract contributors, I suggest you give a few pointers in your README to set up the development environment: what to install, configuration files, database setup, etc.
Have you wondered yet about ways to enter Pokémon in the LivingDex, other than manually toggling them one by one? Image recognition could be neat: the user sends a pic of their box, and the LivingDex is updated accordingly. Open source libraries could do the hard work, like:
Keep up the good work :)
Hi TheGuy191919. If you don't already know about it, doing some simple Python programming with OpenCV can allow you to do some great stuff with a camera-equipped robot. Good luck to your team!
OpenCV is a pretty decent image processing library. It has pretty easy-to-use interfaces for both C++ and Python. I think there is also a Java interface, but I've never used it.
If you want something quick, this is probably where you want to go. You can detect objects from an image with just a couple dozen lines of code.
I like this.
You want Cyberpunk? Buy yourself an Arduino or a Raspberry Pi and make a robot.
Find a repetitive task and figure out how to automate it. Build and program an automatic feeder for your dog / cat / fish. Automate your window blinds so that they close when the sun comes up. Teach your doormat to tweet whenever you step on it.
Unless you have other motives for learning C++. If that's the case find a project you really want to do that will require C++. Use the C++ based OpenCV library and teach your webcam how to recognize faces and put a sweet looking cyberpunk interface over that.
For that http://opencv.org/platforms/android/
For libraries and tools in general, experience helps. Don't try to mesh together two things that obviously don't go together; integration is a major pain. Decide what libraries to use and which ones not to: if it's invented by one guy in his basement, it might not be a good idea to depend on unless there are millions of downloads. Prefer things with lots of documentation and a very active community, and of course pay attention to licenses (MIT and BSD are the best for commercial work; you can't use some copyleft licenses unless your stuff is open source). For example, OpenCV appears to be a good candidate since someone has written a Java wrapper. Look for things with SDKs and wrappers, and don't write the integration layer yourself.
That's about it... you are running into the main curse (or blessing) of programming that is too many frameworks libraries and platforms to choose from or too much choice. What you can do is pick a "galaxy" like .NET or iOS or JavaScript or C++, then learn as much as possible to the max in that galaxy and either stay away from the rest or dabble in the rest. Choose carefully because time is your most precious asset.
For an alternative field...
Go check out OpenCV. It has more to do with image processing, but utilizes machine learning techniques. Tons of tutorials and it's a good foundation (imo) for anyone who's interested in getting into AI/Machine learning.
Personal experience with it : I used this software back in college to help build an image recognition application that could identify house addresses from bing street-car images. I personally like image recognition based AI as I feel there are ~~more~~ better data caches to train on and it's a bit more intuitive.
At my previous job I built a tool that used image recognition to find where certain elements were on webpages so I had a little experience from that. All of it is done using http://opencv.org/ so that's what I'd recommend if you're interested in playing around with it :)
I'm using the icons in the bottom right currently as they don't change with skins, although they are partially transparent which makes things a little harder!
If you're interested in image processing definitely check out http://opencv.org/
They have some cool examples that are pretty fun to play around with!
> using only induction
You probably could, but you'd have a lot of EMI headaches with the huge antennas you'd be building. It's a lot easier to use a sensor with much better documentation and established methods for gesture recognition.
You can give OpenCV a shot. It has a lot of unit tests and wants more, so that would be a good place to start. It also has bindings into other languages, so if you find your interests shifting, you can follow them there.
Assuming you are using only the "stardard" C++ opencv you can try OCL: http://opencv.org/platforms/opencl.html
or, if possible, implement some parallelization using: http://www.cplusplus.com/reference/thread/thread/
OpenCV is the best way to do this. I believe there are a few Haar cascades for facial recognition and quite a few tutorials online. It's in C++ and I don't know if there are any good wrappers for Swift, but you could make your own.
I'd say don't reinvent the wheel. Start here: http://opencv.org/ is pretty solid. Maybe not strong enough for commercial solutions, but it's used in industry for rapid prototyping of machine vision products and used extensively in higher education.
I look at online University course notes and lab material but this is hit and miss.
The core mathematics of CV is signal processing theory (for registration and feature extraction), geometry/linear algebra (for reconstruction) and machine learning/probability (for recognition). A solid understanding of these is all you need.
I'm not sure about Windows, but you can start by downloading and installing Python 2 (not 3) (https://www.python.org/downloads/) and OpenCV (http://opencv.org/).
Then you can edit the script with the correct path for the video file, save, and double click should run it.
You can use OpenCV. But this might be totally unnecessary. Does the program you're trying to scrape have keyboard shortcuts? You could try to use those to navigate. Or, if the buttons are in the same relative position in the window 100% of the time, you can calculate their location. Getting the location of the program window is pretty easy.
The next best thing would be monitoring the program's memory for the button draw call. Frankly, anything would be easier than using image recognition.
This is pretty far out from mechanical engineering. Computer vision by itself can be an extremely intensive field. However depending on the 'simplicity' of the project this can be very doable.
I would highly recommend using the Pi first of all. You wouldn't really use an MCU for this, and I don't think the Arduino is capable of doing these calculations (maybe the other family of Arduinos can, I'm not sure though).
Also, I recommend the Pi because there is already open-source code created by an awesome community project called OpenCV.
They do stuff in Python too, which is a huge advantage, as Python is a relatively easy language to get into (the programming choice for beginners at many colleges, like Princeton).
From here you can start off on a good footing.
edit:
Well, derper-man has pretty much stated what I just said, but in more detail. Definitely follow his advice.
After skimming the article:
Neat. I see you used the Perceptual Image Diff. OpenCV is another option for parsing data from video or single images.
Also, small spelling error on the "problems" page. "You’ll get a lot mail and it’s quiet possible that..." - should be "quite".
It's not strictly embedded, but it's still C: I recommend downloading and just studying/analyzing the OpenCV code base. It is good C code with lots of examples that you can immediately try. You can learn a lot by understanding the code.
If you want to impress your colleagues with your C knowledge however, then study this.
Seems to be a machine vision camera at the top and the capture camera to the left (just a cheap camcorder). The machine vision camera feeds into computer vision software (something likely built off of OpenCV) where the data is processed a thousand times a second, with instructions sent to the mirrors. Using two cameras is clever and means they don't have to rewrite for each different camera sensor broadcasters may use, and they can get high-speed data straight off the sensor.
Facial recognition (and image recognition in general) is an extremely difficult task. I would recommend looking for a library to do it. OpenCV has facial recognition capabilities but I don't know how good they are since I haven't tried it; you may want to look and see what else there is.
I've never worked with it myself, but I believe that openCV does object tracking. Check it out - http://opencv.org/
You can also find YouTube videos demoing some of its capabilities. Looks like it's pretty cool.
Sounds like Computer Vision would be a good area for you to look into. The crux of self-driving machines is being able to process and understand the environment. OpenCV is a great resource for that.
Check this out. I used this in an eye tracking program my senior year of college. We used it to analyze images to find roughly where the pupil of the subject's eye was looking. I wasn't the main coder of the project, but this could set you on the right path.
Take a look at OpenCV
Python tutorials: http://docs.opencv.org/trunk/doc/py_tutorials/py_tutorials.html
Specifically, to read a video from a file: http://docs.opencv.org/trunk/doc/py_tutorials/py_gui/py_video_display/py_video_display.html#playing-video-from-file
I'll let you explore the rest.
http://docs.opencv.org/trunk/modules/contrib/doc/facerec/
http://stackoverflow.com/questions/7368709/c-sharp-detect-face-and-crop-image
If your customers want a ~~fake~~ novelty ID, I think they'll be willing to take a decent photo. Ask them to stand in front of a solid white or grey background. Everybody has a white wall, a white sheet, or blu-tac/tape and white paper. Without that, I think you'd need to create some seriously good code to deal with it.
Are you asking about interpreting images? That's computer vision. If you are using a raspberry pi, or beaglebone black for your robot computer, then you can use Python with either opencv or simplecv to "simply" extract useful information from an image or video to then work with (e.g. dominant color, shapes, lines, etc.)
Have a look at OpenCV it makes motion tracking with images a breeze. Just place a high-res camera overhead and use some of the example code with the OpenCV project to build a head tracker (go for the nose ;)).
Thanks! I'm familiar with multivariate differentiation/integration, matrix algebra and stuff like that. I've worked with the OpenCV library in the past, so I tried to familiarize myself with all the math behind the functions they have there and all the papers they have listed. I think I only have a rudimentary understanding of everything involved, though, so I doubt I'm anywhere near a graduate level of understanding.
From some quick research, something like OpenSIFT and OpenCV would be adequate for our purposes, but with the serious problem of using lots of processing power. Using things like a database of the previous images we could cut down the work, because spam has the characteristic of being reposted many times. We could also use analysis of the comments and the HailCorporate votes to complement the data. Maybe it's possible?
Check out OpenCV. It's an image processing toolkit which will likely be the basis of anything you do in this field, so I suggest you master it before moving on to algorithms.
Also, this may be a good place to ask as well http://dsp.stackexchange.com/
This sounds like classification problem! But I imagine both the amount of variations and number of classifications would make it a difficult problem to solve. Counting may be an easy task though. You may want to check out OpenCV.
For hacking and hardware projects you can do Arduino or Raspberry Pi projects: controlling your home/garden lights over the internet, from a smartphone, etc. For image processing: SimpleCV (http://simplecv.org/) or OpenCV (http://opencv.org/). For OS work: try to remaster a Linux distribution or install Gentoo. And for anything else, there are plenty of frameworks.
Sure you can, Jaden Smith, but it's not a beginner's project.
Libraries like OpenCV can allow you to perform some degree of computer vision, including gesture recognition. You could then execute some other command based on these gestures.
You'll need to learn how to program, first, though.
Extra credit - use the wireless devices you have + the basic stamp to actually drive the vehicle wirelessly from the rasppi with a vision system.
use http://opencv.org/ - I don't think it would be that hard to create something useful.
Computer vision is a different, yet related problem to image processing -- I conjecture your project is more likely inside computer vision.
Have you considered OpenCV?
If you must use ImageJ -- would you be allowed to write a plugin instead? I find that easier than including the jar, personally.
If including the jar is needed because it's to run in command line in a nice long posix pipe, then I think I have a general idea of what you're getting at.
Good luck!
This is definitely possible!
Apps like Google Goggles are far more advanced in what they can detect.
I imagine a good place to start would be OpenCV for Android.
The android/openCV demo app renders "GoodFeaturesToTrack" in realtime. I'm not sure exactly where the demo APK is, but you should be able to track it down from here. http://opencv.org/platforms/android.html
The changes for version 3 are close to the same as this v2.4 cheat sheet. Hopefully these docs are somewhat useful.
OpenCV and Pillow are two good libraries for image manipulation; they will both let you do thumbnails, conversions, filtering etc. (plus a lot of advanced stuff that is not really applicable here). Basically, you should be able to automate the whole workflow pretty easily, except, as the other user stated, for the image tracing part, which is currently hard comp-sci and not even done very well by state-of-the-art technology.
In computing, there is a pretty hard line between an image represented as a bitmap/raster and one represented as a vector. You can not easily go from the former to the latter.
I realise that this is a bummer, as this is probably the most time-consuming part of the process. A trick to cheat your way around this might be to apply a series of filters that give similar or otherwise useful results, that's definitely doable with the above libraries.
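The automatable part (thumbnails plus format conversion) is only a few lines with Pillow, for example; this sketch infers the output format from the destination extension:

```python
from PIL import Image

def make_thumbnail(src, dst, size=(128, 128)):
    """Save a thumbnail of `src` (any Pillow-readable format) to `dst`."""
    with Image.open(src) as im:
        im.thumbnail(size)           # shrinks in place, keeps aspect ratio
        im.convert("RGB").save(dst)  # format inferred from dst's extension
```

Calling it over a directory with `os.listdir` would batch the whole conversion step.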
>I was thinking about making a program that sees an image, analyzes it, notices colors and rudimentary shape changes through time, and that's it.
>also this might be off topic for the sub, but does the lack of an institute-based degree (university+) completely negates you from a good coding job in a company?
No, but having one is a really big advantage for finding a job. A lot of companies won't consider you without one unless you have a very good portfolio.
The easiest way is to use something like OpenCV. It's a library for Python and C++ that provides a bunch of functions for manipulating images. You can load images as a bitmap (a chunk of binary data) and do whatever you want to it, or create a new image and load whatever data you want into it and write it back out to disk. It's quite easy to use.
>Could some sort of a priori image processing help? Is there another solution I haven't thought of? Maybe there's a way of telling it that I just want digits, even specify sort of what the font looks like?
These are all tricks in computer vision and machine learning. If I were you and wanted to get something working quickly, I would just use Google's: https://cloud.google.com/vision/
For a deep dive on your own implementation, get cracking: http://opencv.org/
http://opencv.org/ computer vision. Basically, if done right you wouldn't have to rely on EXIF being intact. In theory at least.
http://docs.opencv.org/master/d7/da8/tutorial_table_of_content_imgproc.html You'd be looking at this module. Not sure what your backend is yet, as I didn't poke around too hard, but I'd bet there are bindings for it.
A popular machine vision library (type of software) is called OpenCV. You can go there for technical explanations. There are many ways to identify objects. A basic example is to take everything that is a similar color and treat it as an object. A red ball can be found by looking for red against a non-red background. You can also compare different frames to see what is moving, and then identify the clump of pixels that is changing as an object.
RPi probably has some touch screen support on the website; there's an official touch screen they've released. RPi cam should also have good support, but if you're using a USB cam I'd recommend using OpenCV. As for audio... idk. OpenCV may have something. If not, it's probably a google search away.
I've used OpenCV before btw; it works with other libraries exceedingly well. At least in C++.
One of the options to learn C++ in practice having fun in robotics is to make some applications using OpenCV (http://opencv.org/, https://github.com/opencv/opencv), which is a C++ computer vision library with a comprehensive tutorials and books: http://opencv.org/books.html
Otherwise you can take some course on coursera or another MOOC and afterwards take part in some open source project.
The complicated part is definitely the image processing side of things. I would check out http://opencv.org/ if you're interested in image processing. It's the library I used to do most of the heavy lifting :)
Thanks. I had seen the mapping link already.
All these things are made with the OpenCV library which is available for OF, Cinder, even Unity. It's pretty deep but it's the standard tool for pattern recognition.
OK, if you are seriously learning to code then that's a good first step. The most common AR library is OpenCV and is open source. There are versions for most major platforms, developing for Android is more complex than with iOS. There is also Vuforia which is based on OpenCV in many ways. It's a paid system so it obviously has better support.
Otherwise it's much like developing any other app. You need to access cameras, video and probably OpenGL ES if you intend to do rendering in your AR. You might also need models and artwork depending on what the idea is. That's a very brief overview, just learning to code takes quite a while.
Generally speaking, image recognition is a difficult problem in computer science; it's the sort of thing that ends up in a lot of conference papers and university theses.
The most common recommendation I've seen for doing this relatively easily in Python is to use OpenCV. However, it's worth stepping back and determining what your requirements are, and then we can discuss what the best way to program a solution would be.
What is this image? Is it popping up in different places on the screen? Is it a part of some other program? What is the actual task that you're trying to automate?
Hi AMD, thanks in advance for this AMA. We appreciate it!
My question is GPGPU related. Current academic research is almost exclusively using CUDA for GPGPU purposes. CUDA has been implemented in, for instance, OpenCV and Caffe for computer vision, machine learning and training neural networks. The main advantage of CUDA is the ease of use and maturity of the platform, which stimulates development. I am secretly hoping that AMD will develop an open source alternative. Is there, or will there be, a CUDA-like GPGPU platform from AMD?
The perf/$ of the RX 480 is very good. These graphics cards could stimulate GPGPU development if a good alternative to CUDA is provided.
You can already do this with cameras and OpenCV. You can have all sorts of options such as gait detection, facial recognition, hand gestures, etc.
Then again you could also just wear an RFID and solve this fairly easily.
For image recognition I'd look into OpenCV, it has a lot of information and possibly there is already something done to recognize colors.
A "simpler" way would be to use continuity test, by crimping the first side in any order and checking for continuity in the other side.
The downside is that it wouldn't respect any of the standards.
If you are interested in robotics, C++ is really the way to go. It runs closer to the metal, but more importantly gives you access to cool libraries like OpenCV.
That being said, it's pretty hard to learn and has more pitfalls than Java, so good luck!
Hi, I just noticed that the latest opencv binaries released provides a python 2.7 extension built with Visual studio 2013.
I use my anaconda python 2.7 (built with VS2008) and the VS2013 built extension works fine.
Could we revisit this discussion? Is OpenCV making a mistake, or can we mix compiler versions for extensions?
To test: you can download the Windows binaries from http://opencv.org/downloads.html (the OpenCV extension is within opencv\build\python\2.7\x64\cv2.pyd). I wanted to look at the Visual Studio version, so I did: strings cv2.pyd | busybox grep -i msv
The following is the relevant output: MSVC: 1800 Linker flags (Release): /machine:x64 /NODEFAULTLIB:atlthunk.lib /NODEFAULTLIB:msvcrt.lib /NODEFAULTLIB:msvcrtd.lib /INCREMENTAL:NO /debug /NODEFAULTLIB:libcmtd.lib Linker flags (Debug): /machine:x64 /NODEFAULTLIB:atlthunk.lib /NODEFAULTLIB:msvcrt.lib /NODEFAULTLIB:msvcrtd.lib /debug /INCREMENTAL /NODEFAULTLIB:libcmt.lib
Yes, it suggests he must be using some sort of libraries, something like http://opencv.org/ or perhaps training his own neural network but on top of abstraction libraries like https://www.tensorflow.org/
I've worked on a research project that used OpenCV (computer vision) that tried to summarise sports videos to just the "most interesting" bits. Most interesting for many sports was when the score changed or the crowd noise grew. It was a fun project. I don't think it would be hard using the OCR part of OpenCV to designate the life area, the deck type/name and a pipeline to feed the various video content in it and you could effectively "watch" hundreds to thousands of games a day. Sample description of the solution for politics: https://waldo.jaquith.org/blog/2011/02/ocr-video/ OpenCV project: http://opencv.org/
Depends on what operating system. To run it, you'll need OpenCV (on Ubuntu: apt-get install libopencv-dev python-opencv; see http://opencv.org/) and Python 2 (installed by default on Ubuntu; https://www.python.org/).
Then you'll edit the script to put in the correct video file path, and output image path, and output image size that you want.
Running it is then a simple: python script.py
It's technically possible this year, there are Android libraries to do it:
Not my usual problem domain so I don't have input on this directly however it does sound like you are doing some image/video work. Have you checked out Open CV? it is specific to this domain. http://opencv.org/
Well, when I was working on that I was using C# and the .NET library to take a screenshot of my desktop, and then I would use http://opencv.org/ to check what cards I have, for example. But poker is a pretty huge project to make a bot for, really. Their security is really tough to get around for someone doing it casually; you need to invest in some expensive hardware (can't remember the names, but its purpose was to make your bot undetected/stealthy), and then write the AI for the bot as well. I just don't have the time/funding etc. to take on such a project.
I just want to make bots for games for fun as a hobby :)
I've not tried it myself, but OpenCV sounds like what you're looking for: http://opencv.org/
There's an iOS version in the Downloads page: http://opencv.org/downloads.html
cd ~/<my_working_directory> && git clone https://github.com/Itseez/opencv.git
OpenCV 3 is still in the RC phase. It was just pushed out of beta only a month or so ago. Until OpenCV 3.0 is officially announced and released, I recommend that people use OpenCV 2.4.X; hence the code in the post assumes OpenCV 2.4.X. If you compile and install OpenCV >= 2.4.9, the code above will run without a hitch.
See the notes on the RC release:
>The release main goal was to finalise the API, simplify migration from OpenCV 2.4 and stabilize the code. We are still in the bug fixing mode, so the upcoming 3.0 should be even more stable.
Yes, we do pretty well with two eyes, but it's difficult to replicate the human brain. A lot of progress has been made in Computer Vision: projects like OpenCV and Microsoft's research. But it's still difficult to achieve accuracy when it comes to identifying objects and their properties. It is, as you say, a novel goal.
How can you teach a baby to identify a sedan? They've got this general shape, four wheels, four doors, made of metal, windows, usually found on the road, etc. With CV, we're limited to the general shape. You can attempt to identify wheels and windows, but that takes extra processing time. We can use the taillights' color and license plate to our advantage, but we still need to figure out the car's velocity (speed and direction). Depth perception is something we don't even think about, we just have it. I don't know how to calculate depth from a stereo image pair, but even the Xbox Kinect uses a laser for depth sensing.
That's why LIDAR and ultrasound are the technologies being used because they give instant information about depth (and therefore velocity) without a calculation overhead. Even with deep learning and neural networks, computers can completely misidentify what they're seeing.
You really don't want to use the camera's built in motion detect for an alarm. It is way too sensitive. You can however do some interesting things with software (I like OpenCV) to trigger an alarm based on pattern recognition.
You might want to take a look at OpenCV (open computer vision). http://opencv.org
In my understanding it's a pretty good library for all kinds of computer vision problems, including face and object recognition.
Well it really depends what your math background is. Your best bet is to search on google scholar for keywords like 3d photogrammetry or 3d reconstruction plus the word review. Review papers often don't have much information, but cite pretty much any paper you'll need to understand the problem.
As for free and open source software, you might try VisualSFM. If you're a programmer definitely check out OpenCV. I also recommend Meshlab for editing and viewing 3d file formats.
Check out OpenCV and its C# wrapper EmguCV. I've used EmguCV extensively for image processing, specifically chroma keying, and it can be used with either video or static bitmaps. It's very, very powerful.
I've used ImageMagick quite a lot too but that's just for bitmaps. There's a .NET wrapper for that too:
I don't know, but it looks like it's in a similar situation to nltk, which I have used with Python 3. Python 3 support exists in the upcoming version, which it looks like is available here: http://opencv.org/opencv-3-0-alpha.html
If you're looking for a way to contribute to open source, this is a great way to do it too.
If you really want flexibility, then you could design your own program in Python using http://opencv.org/. It should only take ~20 lines if you're proficient in Python.
Edit: Proficient in Python and reading docs.
I've had this idea for a while too,
I don't think I'd start with HD; I'd start with cheap, lower-res CCD cameras for a couple of reasons. First, they're cheap, and second, smaller images take less processing, so the rest of your hardware can be cheaper as well. For me the challenge would be learning basic and advanced computer vision concepts and implementing them in OpenCV.
Some of my book marks for when I get to this someday maybe project:
Real-time video stitching doesn't seem simple at all to me. I think you'll need serious processing power to stitch that many streams in real time, even if they're low-res. If you don't need real time, then it's probably much cheaper. My first step would be setting up a jig that can hold x number of cameras in the proper orientation with the correct overlap. I'll also probably skip the hemispherical aspect and just go with all the cameras oriented to the same flat plane.
Bottom line is you're not going to get anywhere near sub $100 for what you're asking. You're looking at maybe as low as a couple hundred for a really low-res version (not including the cost of the PC and GPU).
Well just off the top of my head, here's a few projects and developers you should submit bug reports to for being pants-on-head wrong.
It might be easier to push the button using a servo with an arm on it as opposed to a solenoid, just because the solenoid will need to be mounted so that the solenoid rod isn't always pressing the button, whereas the servo can be mounted next to the button, similar to in this video. This has the advantage that, after making an LED blink, controlling servos with an Arduino is one of the most common tutorials out there.
For light sensors, if you want to just measure brightness, you can probably use an analog component, such as a photoresistor/LDR. If you need to detect colours, you'll probably use an IC on a breakout board (something like this) interfacing digitally with the microcontroller, which means learning about digital communication protocols such as UART, SPI or I2C.
Finally, if you have some programming experience, consider looking into computer vision - processing images or video algorithmically to try and determine information about them. If you have something powerful enough to read data from a webcam (something like a Raspberry Pi), check out OpenCV.
Hey - sounds like you're looking for something more along the lines of "computer vision." I've used OpenCV to develop some silly motion-tracking iOS apps; it also works for still images, and it has C++, C, Python and Java interfaces. Hope this helps.
For OpenCV? I know there's a contribution page here but I'm not sure if there's a concerted Python3 port section, I haven't gone looking, but I know there's no Py3 build yet.
A phone camera is plenty good enough - you can clearly distinguish the mouth and eyes from the rest of the face in a phone picture as long as there's enough light. Really, you'd probably want to downsample the phone images to something more manageable like 1080p (about 2 megapixels) instead of their native resolution, because you don't actually need that many pixels.
What you'd have to do is write the image processing code to distinguish facial features and then create a training set where you have a set of known images- this picture (with these corresponding facial features) corresponds to this emotion. Then you'd have a machine learning system which would assign weights to each facial feature (if an image has the corner of the mouth at such angle, they are more likely to be happy). You'd probably want to use an existing image processing library like OpenCV to analyze the image. It's certainly doable, but it's probably more of a group project and you'd want someone that's taken some courses in computer vision.
Well, if you're really interested in it, you might want to check out OpenCV. There are OpenCV and Arduino helper libraries for both Processing and OpenFrameworks. I'm not sure whether they expose functionality that makes it easy to track an object, but both will (most likely) make it easier for you to get framebuffer data from a camera.
I figured, lamest case scenario, you could track an object as long as the camera is still, and nothing else is moving in the picture, just by finding a big enough area of difference between frames. But there's definitely more sophisticated object tracking systems available.