If you use Java in the workplace I highly recommend Java Concurrency in Practice by Brian Goetz.
It's old but rigorous, and it really helped me understand the more general concepts of concurrency.
My first question is: do you have any state that needs to be shared by the masses and springs? And would that shared state need to be updated by multiple threads?
Multi-threading gets fairly complex, especially around the details of when data from one thread is published for another thread to read. You would also have to worry about potential race conditions around when values are read and written.
If you do have shared state you will probably want to dig into the nuances of concurrency in Java. I would recommend checking out this book: https://smile.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/
If you don't have to worry about sharing state it should be a bit easier.
I haven't done anything as complex with threading as you are trying to do here. I wonder if a Runnable for each object would be overkill, as opposed to just splitting the objects you want to update into multiple lists and having each thread run through one of those lists. It might be worth experimenting to see what gives you the best performance.
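To make the "split into lists" idea concrete, here is a rough sketch. The Body class is a made-up stand-in for your masses and springs, and it assumes each object only touches its own state during a step. If your springs read their neighbours' positions while those are being updated, this is not safe as written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PartitionedUpdate {

    /** Hypothetical stand-in for a mass or spring; a step only touches its own fields. */
    static final class Body {
        double position;
        final double velocity;

        Body(double position, double velocity) {
            this.position = position;
            this.velocity = velocity;
        }

        void step(double dt) {
            position += velocity * dt;
        }
    }

    /** Splits the list into at most `threads` contiguous chunks and steps each chunk on one thread. */
    static void stepAll(List<Body> bodies, int threads, double dt) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (bodies.size() + threads - 1) / threads; // ceiling division
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < bodies.size(); start += chunk) {
                List<Body> slice = bodies.subList(start, Math.min(start + chunk, bodies.size()));
                pending.add(pool.submit(() -> slice.forEach(b -> b.step(dt))));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the chunk; get() also makes the worker's writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real simulation you would keep the pool around between frames instead of creating one per call; it is created here only to keep the sketch self-contained.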
So long as you aren't depending on what happens when race conditions occur, you'll be perfectly fine.
Read Java Concurrency in Practice and you'll be safe regardless of the architecture you decide to run on.
The only people in trouble are those relying on undefined behavior.
Java Concurrency in Practice. Not only does it cover the specifics of the Java concurrency packages and how to use them, it also discusses lots of good programming practices that are essential to being a good programmer and not covered in many other books.
In short, unless you spend all of your time writing boring, vanilla business logic, you will spend a lot of time with your face in "Java Concurrency in Practice" by Brian Goetz. It's a good book, but at some point you may wonder just what the hell you are doing with your life and whether programming with threads has to be so complicated.
Pretty sure Rich already did this and we can learn from his journey.
> But it is still a lot more complicated than just letting the computer handle multithreading for you like you would in Java, C#, Python
Not at all. Java Concurrency In Practice is 384 pages long for a reason. The way Node does concurrency is way simpler than using threads with manual synchronization.
Highly recommend learning it right the first time by reading Java Concurrency in Practice.
You need 2 things to have a race condition: mutability and shared memory. The best way to avoid a race condition is to eliminate one or both of those. In Java, that means working with promises (futures) and merging the results rather than worrying about locks keeping state in sync.
Sure, for ultra-high performance threading, this advice has to be ignored. However, you shouldn't default to ignoring it. It's a whole lot easier to get concurrency right when you aren't weaving locks.
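As a rough illustration of "merge the results instead of sharing state", here is a sketch that sums a range of numbers. The class and method names are made up for the example. Each task only writes to its own local variable, and the calling thread does the merging once the futures complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class MergeResults {

    /** Sums 1..n by giving each task its own sub-range and adding up the partial sums at the end. */
    static long sumTo(long n, int tasks) {
        List<CompletableFuture<Long>> parts = new ArrayList<>();
        long chunk = (n + tasks - 1) / tasks; // ceiling division
        for (long lo = 1; lo <= n; lo += chunk) {
            final long from = lo;
            final long to = Math.min(lo + chunk - 1, n);
            parts.add(CompletableFuture.supplyAsync(() -> {
                long local = 0; // local to this task, never shared with another thread
                for (long i = from; i <= to; i++) {
                    local += i;
                }
                return local;
            }));
        }
        long total = 0;
        for (CompletableFuture<Long> part : parts) {
            total += part.join(); // merging happens on the calling thread only
        }
        return total;
    }
}
```

There are no locks anywhere in this, because nothing mutable is ever visible to two threads at once.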
That being said, if you have to mess with locks, get familiar with java.util.concurrent (https://docs.oracle.com/javase/8/docs/api/index.html?java/util/concurrent/package-summary.html). So many of the constructs you might create have already been made, both better AND thread safe.
In this specific example, LongAdder would do a much better job than anything you might cook up on your own.
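Here is a minimal sketch of what that could look like, assuming the thing being shared is a plain counter bumped from several threads. The class and method names are invented for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class HitCounter {

    /** Increments one LongAdder from several threads and returns the final total. */
    static long countConcurrently(int threads, int incrementsPerThread) {
        LongAdder hits = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    hits.increment(); // no lock and no synchronized block needed
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.sum(); // accurate once all the writers have finished
    }
}
```

Note that sum() is only a snapshot while writers are still running, which is fine for statistics but not for anything that needs an exact value mid-flight.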
A book on concurrency/multithreading in your backend language of choice.
For example, I mostly work in C++ and Java, so here are some books I have on my bookshelf:
https://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601
https://www.amazon.com/C-Concurrency-Action-Anthony-Williams-dp-1617294691/dp/1617294691/
If you've worked with Java before, this is a great book.
Java Concurrency in Practice. It's still relevant.
https://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601
Or if you don't want to buy a book: https://docs.oracle.com/javase/tutorial/essential/concurrency/index.html
https://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/
This book is a great resource if you want to gain a deep understanding of multi-threading in Java.
If you can design the algorithm such that you don't have to share mutable memory across multiple threads, you will save yourself a lot of work. Mutable means any object or value that can be modified after it is created. The way threads share memory is unintuitive and complex, and implementing things incorrectly can lead to deadlocks where the program will not be able to complete. These issues are notoriously difficult to debug.
I don't have a good understanding of how edge detection works, but my understanding is it involves looking at each pixel and comparing it to neighboring pixels.
I would suggest making the input image immutable before passing it to the threads. Each thread could be assigned a subset of the image array, and the main thread can assemble the results into a single object. This avoids any threads having to modify any shared data, which reduces the possibility of weird failure states by a lot.
A super easy way to run work on another thread is to use a CompletableFuture. You provide it with a function, which could take the image and some information on which subset it has been assigned. Once it is done processing, the thread that created it can access the results.
By default this will use the common fork/join thread pool, whose parallelism is the number of CPU cores the machine has minus 1. If your machine has 2 CPU cores, that leaves the common pool with a parallelism of 1, and in that case CompletableFuture stops using the pool and starts a new thread for each task instead. You can define a custom Executor with a pool size you control and pass it into the CompletableFuture. Baeldung has a really good article on how to use them: https://www.baeldung.com/java-completablefuture
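Putting those pieces together, a sketch could look like the following. The "edge" measure here is a toy (difference between a pixel and its right-hand neighbour), not a real edge detector, and the names are made up. Java arrays can't actually be made immutable, so the input is only treated as read-only by convention. Each task builds its own output rows and the calling thread stitches them together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BandedEdges {

    /** Toy measure for rows [fromRow, toRow): |pixel - right neighbour|; the last column stays 0. */
    static int[][] processBand(int[][] image, int fromRow, int toRow) {
        int[][] out = new int[toRow - fromRow][];
        for (int r = fromRow; r < toRow; r++) {
            int[] row = image[r]; // read only, never written
            int[] res = new int[row.length];
            for (int c = 0; c + 1 < row.length; c++) {
                res[c] = Math.abs(row[c] - row[c + 1]);
            }
            out[r - fromRow] = res;
        }
        return out;
    }

    /** Splits the image into bands of rows, processes them on a custom pool, then assembles the result. */
    static int[][] detect(int[][] image, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers); // custom Executor, not the common pool
        try {
            int rows = image.length;
            int band = Math.max(1, (rows + workers - 1) / workers);
            List<CompletableFuture<int[][]>> bands = new ArrayList<>();
            for (int start = 0; start < rows; start += band) {
                final int from = start;
                final int to = Math.min(start + band, rows);
                bands.add(CompletableFuture.supplyAsync(() -> processBand(image, from, to), pool));
            }
            int[][] result = new int[rows][];
            int next = 0;
            for (CompletableFuture<int[][]> f : bands) {
                for (int[] row : f.join()) { // bands were submitted in order, so rows come back in order
                    result[next++] = row;
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

A real edge detector also looks at the rows above and below, so each band would need to read one extra row on either side. That is still fine here because the input is only ever read.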
This book https://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601 is really helpful.
From a Java language perspective (the language that's my wheelhouse):
I only skimmed your code, because there's a lot of code there.
In general, there are two ways that threads communicate with each other:
In fact, most concurrency constructs involve both. For example, the <code>Semaphore</code> class depends on all threads being able to see the number of available permits and, if no permits are available at the time that <code>acquire</code> is called, depends on being signaled when a permit is made available.
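A small sketch of that behaviour, with invented names: several threads compete for a limited number of permits, and the method reports the largest number of threads that were ever inside the guarded section at once. Threads that find no permit block inside acquire until another thread calls release.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class PermitDemo {

    /** Runs `threads` workers through a gate with `permits` permits; returns the peak number inside. */
    static int maxConcurrent(int permits, int threads) {
        Semaphore gate = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    gate.acquire(); // blocks until a permit is available
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = inside.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // pretend to do some work while holding the permit
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inside.decrementAndGet();
                    gate.release(); // wakes up one waiting thread, if there is one
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return peak.get();
    }
}
```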
Concurrency is hard to get right, even for developers with a lot of experience. It's easy to write code that appears to work, but which has a hidden concurrency bug. I'd posit that the mental model that most developers have about memory visibility in Java isn't quite accurate.
For your case, probably the easiest approach is to use something like a <code>BlockingQueue</code> (perhaps in the form of a <code>LinkedBlockingQueue</code> or, if you know you can process messages faster than they arrive or are OK with dropping messages, an <code>ArrayBlockingQueue</code>). Have threads communicate by sending messages to each other through these queues, and otherwise share as little state as possible. Make sure that each thread does periodically check its queue for more messages. This is the model adopted by other programming languages that emphasize concurrency, like Erlang and Go.
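A minimal sketch of the queue approach, assuming one producer and one consumer and plain String messages. The STOP sentinel and the class name are made up for the example; your message type will look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class Mailbox {

    // Hypothetical sentinel telling the consumer to stop; real messages must never equal it.
    private static final String STOP = "__stop__";

    /** Sends each message to a consumer thread through a queue and returns what the consumer produced. */
    static List<String> relay(List<String> messages) {
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        List<String> received = new ArrayList<>(); // written only by the consumer, read only after join()

        Thread consumer = new Thread(() -> {
            try {
                for (String msg = inbox.take(); !msg.equals(STOP); msg = inbox.take()) {
                    received.add(msg.toUpperCase());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String m : messages) {
                inbox.put(m); // the queue is the only thing both threads touch
            }
            inbox.put(STOP);
            consumer.join(); // after join(), the consumer's writes are visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

With more than one consumer you would need one sentinel per consumer, or some other shutdown signal.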
For more information, the book Java Concurrency in Practice is excellent. I highly recommend it.
Yeah! Great book! Currently discounted on Amazon.
I would suggest picking up Java Performance: The Definitive Guide by Scott Oaks. It does an excellent job of talking about, in pretty good detail, how everything works under the hood. You pretty much need to know what goes on under the hood in order to write fast Java, so looking for books and guides that talk about performance will probably lead you to learning how the JVM works.
I would also suggest looking at Java Concurrency in Practice by Brian Goetz and pretty much the whole JVM team. It does a pretty good job of explaining the ins and outs of concurrency, important to know if you want to take full advantage of your hardware.
Do you have anything specific you want to improve on?
Here's a few that I liked that are specific to Java:
I recommend checking out Java Concurrency in Practice to learn best practices for concurrent programming in Java.
For distributed apps I mostly use the Amazon EC2 service. They also have SimpleDB (a NoSQL database), SQS (message queue), SNS (push message service), and a relational database service. For most distributed non-website apps those are all the building blocks you need.
For webapps or web services I really like the Play Framework, which is very Rails-like except for being Java/Scala. My modus operandi is to set up a web service interface for each element of my distributed app, because then you are down to simple HTTP requests for the bits of your program to communicate. To secure these services I usually limit requests to localhost and then set up SSH tunnels to communicate over.
EC2 has an easy-to-use Java API library for spawning and monitoring instances. Once an instance is spawned I usually set up an SSH tunnel to forward commands to launch whatever I want to do. I use ganymed as a pure Java SSH client.
So in terms of a simple distributed webscraper that doesn't need to track state (e.g. logging) this is an example architecture:
Also I recommend checking out Hadoop, which is the open source Map-Reduce implementation, if you are interested in more of a distributed batch type job. I also really like RabbitMQ as a message queue service (written in Erlang, not Java, but they have a Java client API library) if you want to use your own message service and not Amazon SQS.
Get the book Java Concurrency in Practice by Brian Goetz (one of the architects of the Java platform). I'm not sure how current it is, though, with all the new stuff that's come out in Java 7 and 8 (ForkJoinPool, etc.).