> "Thanks for the encouragement and helpful perspectives."
Truly, it's my pleasure!
> "I am a bit of an older horse using Stata and manual processes for scientific documents."
I'm a bit long in the tooth myself. Although I've always incorporated computing into my work, I never imagined that I'd make a sharp career trajectory change more than 15 years after earning my PhD and building an industry career. Nonetheless, I'm glad I did!
> "I have a programming background from awhile back so I have some concept of good versus bad modularization, design, maintenance, etc. processes."
Truly, that experience puts you ahead of a lot of people who are currently working professionally as data analysts with either R or Python.
> "...R is a new language for me..."
Sincerely, when I say, "I understand," I do. Although my formal training is in chemistry, I did a lot of software engineering for my PhD. Because I was designing my own chemical analysis instrumentation, I had to write all of my own hardware drivers, as well as all of my own data collection software and most of my own post-processing software.
**Learn R properly from the beginning.**

I suggest reading *R for Data Science* by Hadley Wickham. You can read the entire book online, but, if you're like me, I suggest purchasing a hard copy and keeping it beside your computer. It's a great book for someone like you. It doesn't start with "intro to programming" concepts, like variables and loops. It assumes you know those general concepts, which you do. However, this book does an excellent job of showing you how R handles those concepts.
This book also has a nice, light introduction to the concept of "tidy data," the importance of which cannot be overstated. Before working with the tidyverse, I had done an extensive amount of software development with large datasets, as well as relational databases, including data models for both transactional data and data warehouses. I felt as though I had a very strong handle on data structure. However, I still find the concepts of "tidy data" to be somewhat elusive. I liken the concept to the game of "Othello": it takes minutes to learn, but a lifetime to master.
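If it helps to make the idea concrete, here's a minimal sketch with made-up data: the "wide" table has one column per year, and tidyr reshapes it so that every row is a single observation.

```r
library(tidyr)

# Made-up example: one row per subject, one column per year ("wide", not tidy).
scores_wide <- data.frame(
  subject = c("A", "B"),
  yr_2022 = c(10, 12),
  yr_2023 = c(11, 15)
)

# Tidy form: one row per (subject, year) observation.
scores_tidy <- pivot_longer(
  scores_wide,
  cols         = c(yr_2022, yr_2023),
  names_to     = "year",
  names_prefix = "yr_",
  values_to    = "score"
)
```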
**Don't try to learn R based on experience with other programming languages.**

When I started working with R, I was already a very proficient software developer with extensive experience coding with Visual BASIC, Visual C++, and C++ Builder. The only reason I started using R was because one of my statisticians was using it. He was not only able to perform very sophisticated statistical analyses with R; he could also produce stunningly beautiful and useful graphics with it.
Prior to programming with R, my statistician had no experience with any other programming platform besides SAS. He knew that I had a lot of software development experience, so both of us thought that I'd be able to become proficient with R very quickly. We were both so wrong!
I started trying to learn R "in my spare time" (LOL!), and I made zero progress. Next, my statistician started teaching me R, and I absolutely hated it! Over time, what he and I figured out was that my prior software development experience was actually hindering me.
The reason was that, although R has a syntax that's based on ALGOL, just like C, C++, Java, JavaScript, etc., it has a lot of peculiarities compared to those languages. Examples include its concept of environments vs. scopes, non-standard evaluation, and pass-by-reference vs. pass-by-value semantics. To me, although R syntax looks very similar to C (or C++) syntax, R actually has much more in common with Python than with either of those languages.
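Two toy illustrations of what I mean, using nothing but base R behavior:

```r
# 1. Copy-on-modify semantics: a function receives a copy of its argument,
#    not a reference, so mutating 'df' inside the function never touches
#    the caller's object.
add_flag <- function(df) {
  df$flag <- TRUE
  df
}
d  <- data.frame(x = 1:3)
d2 <- add_flag(d)
names(d)   # still just "x"
names(d2)  # "x" "flag"

# 2. Non-standard evaluation: subset() captures the expression 'x > 1'
#    unevaluated and evaluates it inside the data frame, so 'x' does not
#    need to exist as a variable in the calling environment.
subset(d, x > 1)
```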
Thus, rather than taking the time to learn R from first principles, I was taking shortcuts by relying on my prior experience and simply looking up function calls whenever I needed them. For this reason, I was not really "thinking in R". Rather, I was "thinking in C" but writing in R, which was a recipe for frustration.
**Start using the tidyverse immediately.**

> "...I am barely walking and having to live with some bad processes (e.g., using absolute column number references) until I get more facility with the language and have time to go back and make things more robust."
Once you invest in understanding the R paradigm, you will start "thinking in R" and become proficient very quickly. However, on this point, one of the biggest mistakes I made in my early years with R, which was also the source of much of my frustration, was avoiding the tidyverse. My reasoning, like that of so many other people in the same boat, was that I "didn't want to become dependent on third-party packages."
If you stop and think about that statement for a moment, then start to unpeel it, you'll see the folly of this kind of thinking. Being "dependent upon 3rd party packages" is just one step removed from "being dependent upon R." People have a subconscious fear of "3rd party dependencies" for two primary reasons:

1. Potential for costs to run out of control.
2. Potential for the 3rd party to stop supporting the tool, project, feature, etc.
**Don't be afraid to adopt the tidyverse early.**

With R, cost is not an issue: R itself is licensed under the GNU GPL, and its packages are free, open-source software. The second issue is somewhat of a concern. However, the tidyverse is here to stay. In the early days, when it was a one-man show, with Hadley Wickham being that one man, the concern about long-term support was real. Nowadays, the amount of support for the tidyverse is simply astounding.
Although I started using R around 2006, by around 2014 I was fed up with it and had resigned myself to abandoning it in favor of Python with Pandas and NumPy. Before giving up my ~8-year "investment" in this platform, I decided to finally "give in" and try out packages like ggplot2, plyr, and reshape2. After just a month or so, I knew that I was not ready to give up on R after all. My biggest regret, then and now, was that I'd resisted them for so long.
Just this week, I had to write an R package for a client with no 3rd-party dependencies. Although I was very proficient with base R's facilities at one time, this recent experience was an intense reminder of what I hated about using base R. Although base R has some very powerful facilities for data wrangling, the code you write is almost inscrutable at times.
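To give you a feel for the difference, here's the same grouped summary written both ways, using the built-in mtcars data (a generic sketch, not the client code):

```r
# Base R split/apply/combine: it works, but the intent is buried in mechanics.
res_base <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(d) {
  data.frame(cyl = d$cyl[1], mean_mpg = mean(d$mpg))
}))

# dplyr: the intent sits right on the surface.
library(dplyr)
res_tidy <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))
```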
Over the years, many people, even many C programmers themselves, have reveled in just how cryptic and difficult C code can be to read. If you're not familiar with the International Obfuscated C Code Contest, look it up. You won't be disappointed. In my most candid opinion, base R code is far more cryptic than C code.
**Start with ggplot2, dplyr, and tidyr.**

The three tidyverse packages that will change your life almost overnight are the following:
| Package | Description |
|---|---|
| ggplot2 | Beautiful plots of all kinds, with incredible flexibility and performance. |
| dplyr | Think of it as SQL for R, with filtering, sorting, aggregating, etc. It's indispensable for "data wrangling." |
| tidyr | An indispensable tool for data reshaping, for example, going from "tall & skinny" to "short & wide." |
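Here's a small, self-contained taste of all three working together on the built-in mtcars data (again, just a generic sketch):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

mtcars |>
  pivot_longer(c(mpg, hp),                       # tidyr: reshape to long form
               names_to = "measure",
               values_to = "value") |>
  group_by(cyl, measure) |>                      # dplyr: aggregate by group
  summarise(mean_value = mean(value), .groups = "drop") |>
  ggplot(aes(x = factor(cyl), y = mean_value)) + # ggplot2: plot the summary
  geom_col() +
  facet_wrap(~ measure, scales = "free_y") +
  labs(x = "Cylinders", y = "Mean value")
```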
Also, for rendering tables in R Markdown, have a look at the kableExtra package. It's another one that I find to be indispensable on an almost daily basis.
> "Thanks for all of the help!"
You're welcome. You're entering an exciting, new world. Enjoy the journey!
P.S. Here's the direct link to the book on Amazon (no URL shortener needed): https://www.amazon.com/Data-Science-Transform-Visualize-Model/dp/1491910399