I am going to be stoned to death for this, but here it goes: R is the worst programming language I have ever used.
I used it for statistical analysis with time series and while the amount of libraries is amazing, its quality is not always the best. Working with data frames? The horror!
The mess with time series objects is just unbelievable. Many times I just ended up guessing which type a routine needed and converting between types constantly. The mess with their object-oriented implementation is beyond words.
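For instance, a typical bit of the type juggling I mean (a minimal base-R sketch; the data is made up):

```r
# a monthly ts object, the classic base time series class
x_ts <- ts(rnorm(12), start = c(2020, 1), frequency = 12)

# many routines want a plain data frame instead, so you convert by hand
x_df <- data.frame(time  = as.numeric(time(x_ts)),
                   value = as.numeric(x_ts))
```

And that's before zoo and xts objects enter the picture, each with their own conversion functions.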
Besides, that assignment operator (<-) makes my fingers hurt. I know that "=" is also valid, but it is frowned upon.
The icing on the cake: R is slow!
On the other hand, I must admit I am a fan of some R tools like shiny. I love that tool. I just wish it were based on another language.
I settled on Matlab, and while its programming language is not the best (and its editor and UI just suck), I always get the right support whenever I need it. Algorithms are fast and dependable.
Honestly, if you are looking into learning something new, run away! Learn Julia, Python, or Matlab.
My 2c.
I've built dashboards for largish clients, using pretty much the same stack as you. Generally feel that building your custom dashboard > buying an off the shelf product, because:
Would recommend a couple of things though:
>Shiny by RStudio
>A web application framework for R
>Turn your analyses into interactive web applications
>No HTML, CSS, or JavaScript knowledge required
That pretty much sums it up.
Definitely, R with Shiny is perfect for what you need. If you know some Java and/or Python, learning R isn't so bad; as usual though, any new language has a bit of a learning curve. Good luck! http://shiny.rstudio.com/
> Perl is great for wrangling data - surprised no one's mentioned it yet
There is one person in our lab who is proficient in Perl, and she is transitioning to Python because everyone else is working in Python already. It seems like there are no new tools being developed for NLP in Perl (though correct me if I'm wrong, it might just be a blind spot).
Perl has loads of valuable modules and libraries, but people seem to be moving on. A lot of intro to programming courses out there now feature Python, and the NLTK has been instrumental in its success in our field. (Along with NumPy & SciPy)
> better visualization than R.
well, i can't really show you mine, because it's confidential work stuff, but i am more than happy to walk you through the process.
the tricky part is knowing that a packaged* tableau workbook is just a zip file with .zip changed to .twbx. if you rename it to .zip and open it up, you'll see that it's a workbook and a folder containing a datasource, in my case, a csv i have r make.
if you know how to use shiny, then it's not that difficult to make a form that takes in parameters (i'm not that good with shiny, but everything you need to know is on the shiny tutorial page), and you use those parameters to run an sql query and put it into a dataframe.
from there it's just file manipulation. run the query once with whatever parameters you want, so you can design a presentation; then tableau knows the columns/names of the data, but the data will change depending on the parameters you passed it. once you have that packaged workbook, just put it in a location that r has access to, and then have r recreate the zip file (i.e. replace the existing data source in the zip with your newly created data source and zip it back up).

you can make another page for a downloader that will export that workbook and make it available as a download, or you can have r email it to a user-supplied address (another shiny option). or, if you have a presentation on a tableau server and you can set the location of the data source that tableau uses, just have r update that file with the new data file.
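a minimal sketch of the zip round-trip in r (the file names are made up, and i'm building a throwaway archive here so it's self-contained; `zip()` shells out to the system zip tool, which updates an existing entry in place):

```r
# build a stand-in "packaged workbook": just a zip containing a csv
tmp <- tempfile(); dir.create(tmp)
old <- setwd(tmp)
write.csv(data.frame(x = 1:3), "data.csv", row.names = FALSE)
zip("workbook.twbx", "data.csv", flags = "-q")

# later: r regenerates the data source and refreshes the archive,
# replacing the csv inside the "workbook"
write.csv(data.frame(x = 4:6), "data.csv", row.names = FALSE)
zip("workbook.twbx", "data.csv", flags = "-q")
setwd(old)
```

in the real workflow, "workbook.twbx" would be the renamed tableau workbook and "data.csv" the datasource inside it.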
does that help, or should i go into more depth?
edit: correction
There are 3 components to what you are looking to do:
1) Scrape data and place it into a database.
Potential Book: "Mining The Social Web"
If you aren't looking to go that formal, the best API to start and play with in this regard is Twitter. That is mainly because there are approximately 5,000 guides on how to do anything. For example, for information on how to take data and place it into a database, take a look [here](http://stats.seandolinar.com/collecting-twitter-data-introduction/).
2) Using R to visualize the data with the help of D3
Accessing the database itself is something that can definitely be done within R. It's strongly dependent on the database you choose, but the mechanism is generally: 1) get a library that connects to your particular database, 2) connect to said database, 3) run a query that returns some kind of data frame.
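Those three steps look roughly like this with DBI and RSQLite (an in-memory stand-in here, with invented data; you'd swap in the driver for your actual database, e.g. RPostgres or RMariaDB):

```r
library(DBI)

# 1) library that connects to the database (RSQLite as a stand-in)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# seed a toy table so the query below has something to return
dbWriteTable(con, "tweets", data.frame(user     = c("a", "b", "c"),
                                       retweets = c(10, 2, 25)))

# 3) run a query; the result comes back as a data frame
df <- dbGetQuery(con, "SELECT user, retweets FROM tweets WHERE retweets > 5")
dbDisconnect(con)
```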
Then, you can use shiny as a mechanism to take your data and calculations from R and make dynamic and interactive charts and graphics. Lots of documentation exists for connecting it to D3.
As an alternative, especially if all you will be doing is very basic table or other manipulation, you can just have R output a CSV and do all the work on the frontend.
3) How do I do this on mobile?
I'm afraid I can't help with that.
R & Shiny or Shiny Dashboard can do some pretty cool web apps / live visualizations with a little programming. I have done a few of these at my work. It requires almost no HTML/web programming knowledge, but you will need to learn a bit of R. The Shiny Server portion runs on Linux, e.g. Ubuntu Server, and can be set up fairly easily. The main con here is that you would have to reimplement your Excel work in R, which is likely not a minor task.
Very cool! If you're enjoying R, take a look at the knitr package: http://yihui.name/knitr/ With knitr you could build a standardized report in R and crank out a nice report like that every week, without any manual adjustments. Shiny is also very good; it lets you develop complete web apps directly in R: http://shiny.rstudio.com/
Never done it, but if you have money to burn, check out DeployR.
If you don't want to spend money and can figure out messaging protocols, there is an R package for ZeroMQ.
The Shiny package is awesome for linking a front end to reactive R code.
My immediate suggestion would be R and Shiny, which has the advantage of giving you a lot of data processing power under the visualization in one tool. I've used R fairly extensively for mapping, including animated output, but have only really played with Shiny.
There's a useful looking Shiny mapping tutorial here: http://shiny.rstudio.com/tutorial/lesson5/
A former classmate of mine used Shiny (for R) at his previous job. I had some experience running an interface, and it seemed pretty simple. No experience/comment on serving the pages, but I think there should be an export option. RStudio has a guide here (no comment on usefulness, but if I had to do it, I would look here first): http://shiny.rstudio.com/tutorial/lesson7/
If you wanted something more traditional, I would look into JavaScript frameworks. How would you manage this? Well, for 10-20 options, it might be feasible to precompute the outputs for all combinations of inputs. My mind is dead so I don't know if this is unreasonable, but this would be a static JSON file. You then write your website using a JS framework (Angular and React seem to be hip these days) and then you can host it using GitHub Pages or Heroku (little experience with both, but there are guides out there). If you can run Python code in the backend, I would write a REST interface using Bottle/Flask on the Python side so it can pass the arguments from the JS side to your predictor. I don't know a place that can host that for free.
Because shiny is built to create interactive graphical representations of data with minimal effort, and it's really great at that. Check out http://shiny.rstudio.com/gallery/ (not affiliated with any of the developers, btw)
Turing completeness is not terribly useful to judge a language’s usefulness: Brainfuck is Turing complete and utterly useless, whereas SQL isn’t Turing complete and tremendously useful.
R isn’t — and isn’t trying to be — a Swiss army knife. It’s a special tool for a special purpose. And, although it’s admittedly far from perfect, it excels at that special purpose better than any Swiss army knife could.
However, I can’t resist pointing out that there’s a Flask-like library for R called plumber which makes writing web services in R a breeze. And for more special applications there’s also Shiny.
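For flavor, a plumber endpoint is just an annotated R function (a minimal sketch; the route name, parameter, and port are all illustrative, and this would normally live in its own plumber.R file):

```r
# plumber.R -- a hypothetical one-route API

#* Echo back a message
#* @param msg the text to echo
#* @get /echo
echo <- function(msg = "") {
  list(message = paste("you said:", msg))
}

# to serve it (requires the plumber package):
# plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)
```

The `#*` comments are what plumber parses to turn the plain function into an HTTP endpoint.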
You will have to have R running on a server in the background. Check out Shiny. Shiny lets you write a web app natively in R; it "compiles" to a JavaScript front end that runs on an R server back end.
The alternative is to use something like htmlwidgets, which "compiles" R graphics into moderately interactive JavaScript widgets. These don't allow on-the-fly computation in R, though.
As mentioned there is Shiny for R, although you have to pay for a license if you want concurrent users.
I've built a little front end for scripts on the web called Wooey (for "Web UI"). It automatically generates a user interface from a Python command-line definition (argparse). It can support other command lines too, though you have to define the interface manually in that case.
As a currently employed data analyst, I use both R and python daily depending on which I decide is better suited to the task at hand. In general I favor R.
More recently, R has grown the capacity to produce stunning reports and interactive applications using Shiny and R Markdown. The ability to quickly create beautiful reports and share my data/analysis is key for me. You can create great reports in Python using the IPython notebook, but it is well behind in functionality at this point.
I tend to gravitate to Python when I need to do something file-I/O oriented, such as reading and saving files from a nested SFTP network drive. One of Python's major machine learning libraries, scikit-learn, is better (IMO) than R's current offerings as well. I also use Python to create and write to my own custom databases in SQLite.
Overall, you may as well pick the one you find more interesting and start there. My biggest fault in the past has been doing more research on what is "the best" than actually learning something.
Such a thing can be done via RStudio's Shiny; you can create sliders and show normal distributions and show random data... and a lot more besides
Here's an example of the sort of thing that Shiny does fairly readily:
https://gallery.shinyapps.io/sampling_and_stderr/
There are other things for R that might do instead but Shiny sounds closest to what you probably want.
See the gallery, and also some of the things at showmeshiny.com
d3.js visualizations take a lot of time to create and require a lot of CSS/JS expertise if you want a high quality and interactive product. If you have the skillset then it will work well, but you'll still invest a lot of time into the visualization.
Alternatively there's shiny, which leverages R plotting libraries and requires no web development skills. You get a nice balance of production speed and quality, but you trade away the low-level DOM control you get with d3.js, which for most dashboards is just a means to an end anyway.
Why not use shiny here?
Don't get me wrong, I love d3. However, the author starts in R, and then switches after taking as an axiom that there is no way to make an R visualization interactive.
The article needs stronger support for connecting the two and for requiring a framework built in a second language.
Thanks! They render each time. How long does one of your graphs take to render when you run it manually? Would it be difficult to build a list with a large part of the processing already done? I've helped a friend do choropleth maps of US county data in shiny, and it only took about 10 seconds to render a new map. You can see some examples on showmeshiny that are doing some pretty powerful stuff.
When displaying that many graphs I would worry more about usability. How many people are going to find 300 graphs useful? If you explore around there may be better ways to display the data.
If you want to go that route, RStudio has an article on rendering images in shiny. It's certainly doable.
> Can you recommend any resources to learn to use R for someone that knows next to nothing about coding?
I am afraid that I can't. First, I knew some basic coding before I heard about R for the first time, so it's quite difficult for me to put myself in the shoes of someone who knows next to nothing about coding. Second, for starters I could personally recommend the book "Przewodnik po pakiecie R" by Przemysław Biecek, but as far as I know, it was never translated into English. So unless you happen to speak Polish, it won't really help you.
You could maybe start with R Programming at Coursera. They don't require programming skills, but they mark them as "useful", so this might still be a little too challenging.
> Also, is this tool useful for sharing things with other people who are also not knowledgeable about coding and likely won't have R installed on their machine?
R can export data to CSV and to SQL databases. It can also write a variety of other formats, but virtually everyone can read CSV and do it properly, so it's your best bet.
If you are interested in a more consumer-oriented approach, you can easily write an HTML, LaTeX, or markdown file from inside R (and then convert the markdown to virtually anything using pandoc). Or you can go fancy and create a web application using Shiny.
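The markdown route can be as simple as this (a minimal sketch; the file name and numbers are made up, and the conversion step assumes pandoc is on your PATH):

```r
# write a tiny markdown report straight from R
lines <- c("# Weekly summary",
           "",
           sprintf("Mean of the measurements: %.2f", mean(c(4, 8, 15, 16))))
writeLines(lines, "report.md")

# then convert it with pandoc, e.g.:
# system("pandoc report.md -o report.html")
```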
But if you want your colleagues to run their own analysis, probably based on yours, this will not be possible without them installing R and having at least basic knowledge about it.
Yes. I asked in the RStudio support forum and they suggested looking into the Authentication feature that ShinyApps has. Very cool. I think I will propose this to my clients to see how it goes.
If you can write a function, you can write a shiny app. If you use RStudio, this is the standard example under "new file". Running it should open a browser window with the app:
# This is a Shiny web application. You can run the application by clicking
# the 'Run App' button above.
#
# Find out more about building applications with Shiny here:
#
#    http://shiny.rstudio.com/

library(shiny)

# Define UI for application that draws a histogram
ui <- fluidPage(

    # Application title
    titlePanel("Old Faithful Geyser Data"),

    # Sidebar with a slider input for number of bins
    sidebarLayout(
        sidebarPanel(
            sliderInput("bins",
                        "Number of bins:",
                        min = 1,
                        max = 50,
                        value = 30)
        ),

        # Show a plot of the generated distribution
        mainPanel(
            plotOutput("distPlot")
        )
    )
)

# Define server logic required to draw a histogram
server <- function(input, output) {

    output$distPlot <- renderPlot({
        # generate bins based on input$bins from ui.R
        x    <- faithful[, 2]
        bins <- seq(min(x), max(x), length.out = input$bins + 1)

        # draw the histogram with the specified number of bins
        hist(x, breaks = bins, col = 'darkgray', border = 'white')
    })
}

# Run the application
shinyApp(ui = ui, server = server)
By 2 files I presume you mean ui.R and server.R. The code of both these files can be combined into one file and the app would work just as well.
On a general note, and since you have up to a month until your presentation, why don't you go through the basic Shiny tutorial at http://shiny.rstudio.com? It won't take more than 2 days to cover the material in the introductory section.
I think you might want a more programming-oriented subreddit for this question, but I am going to give this a shot. You want to find some common data formats R can target, and then a commonly adopted JS library that can plot them with minimal modification?
It's hard to say whether you can do it for every type of chart, but I know that a bar or line chart is trivially exportable as plain old CSV, which should be easy to convert to something like JSON, a format often used by plotting frameworks. There are also some XLS and XLSX packages for R, so if your webdev guy already has a way to crunch those Excel files in a webportal-friendly way, or they can perhaps be embedded with SharePoint/Office 365 functionality, that is good to know.
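Going from an R data frame to JSON is one line with the jsonlite package (a small sketch; the chart data here is invented):

```r
library(jsonlite)

# toy chart data that R might produce
chart <- data.frame(month = month.abb[1:3], sales = c(120, 95, 140))

# row-wise JSON is what most JS plotting libraries expect
json <- toJSON(chart, dataframe = "rows", pretty = TRUE)
# writeLines(json, "chart.json")  # hand this file to the web dev
```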
It is kind of sad, because R is well liked for its really nice defaults and built-ins for its charts; if it is OK to generate image data server-side, you could just use R for that on the original data. But that might sound like a lot of work to your web dev. Even if you get the data into the other framework, you might end up with users unsatisfied with the way it chooses colors or other formatting details, which can make it harder for them to do their visual analysis.
I don't have any hands on experience with these but for the JS display side I'd say look into D3.js and Google Charts APIs. I have seen those around and it can help you determine what is a good format for your data.
Edit: or maybe not reinvent the wheel and look at http://shiny.rstudio.com/ I had a vague recollection of people working on this a good while ago but it kind of looks ready for use now.
So if you are willing to lock yourself into paying money, you might be interested in Enthought: https://www.enthought.com/ . My understanding is that they are the company that in some sense backs NumPy/SciPy while developing tooling that is similar to Matlab but for Python.
So kind of like a less evil matlab.
You might also be interested in shiny for R. I think this is more of a "rapid web app development tool for statistics" than Jupyter. Jupyter feels like it lives in the reproducible-research space rather than the rapid-application-development space. http://shiny.rstudio.com/
> I just want to imagine a future where creating super data rich web apps in python doesn't require a whole bunch of setup.
It's worth mentioning Zope here. I've never used it, and I know people who seem to hate it with a passion. I would also mention cookiecutter as a tool to make setup simpler.
The whole framework vs. library vs. language vs. tool question is thorny. There seem to be a disturbing number of companies who buy into commercial frameworks, hit the limits of the framework, pour effort into somehow adding features to the horrible framework, and then burn through junior developers because they have difficulty hiring people to work on their crazy old language.
Maybe I'm misunderstanding what you're trying to do. Can you elaborate a bit more on the purpose of those scripts?
A package is just a set of functions that makes common data analysis tasks in R easier. The functions need to be called from the command line. The only way to get a UI with R is Shiny.
I would suggest going through this webinar to get acquainted with the concept of Shiny: http://shiny.rstudio.com/tutorial/video/
That sounds interesting, I am going to look into a method of saving the graphs. I could actually just save the locations the user has selected, and use them as inputs to generate the graphs on the fly.
One issue with data storage on shinyapps.io (which is what I am using) is found here: http://shiny.rstudio.com/articles/share-data.html
I think you can get pretty far while only teaching people the basics of how a server works from the client side. For example, R shiny can create complex dashboards while largely sweeping the server internals under the rug.
This data is sourced directly from the FDA's OpenFDA API: https://open.fda.gov/drug/event/
It's interesting to compare aspects like the share of fatal events by drug. Unsurprisingly, 15% of event reports for Oxycodone are fatal. Acetaminophen is also higher than I thought it would be, with 7.9% of events being fatal.
The dashboard itself is a web app written in R using RStudio's Shiny framework (served by the Node.js-based Shiny Server): http://shiny.rstudio.com/
Plots are done in the ggplot2 plotting library by Hadley Wickham: http://ggplot2.org/
I haven't found latency to be a big problem with Shiny, and I bet refitting the model would take most of the time. Here's the closest example; it's not instant, but it's probably quick enough to make a useful learning tool. http://shiny.rstudio.com/gallery/plot-interaction-exclude.html
The requirements are relatively trivial if you use the right tools: D3.js, or the R language with Shiny http://shiny.rstudio.com/
I think it comes down to what the company does. For example, if the job posting is at a data mining company, I would totally expect this for a JDev, since it's not that technically complex. It would be assumed that you've been exposed to certain tools if you're applying for something specific.
However, if their core business has nothing to do with stats or data analysis then they're being completely unreasonable.
I have no experience with the R language, but I was able to find some leads:
http://stackoverflow.com/questions/1397097/r-web-application-introduction -- Loads of links to tutorials and other resources.
http://shiny.rstudio.com/ seems to be recommended. idk.
Tableau is good for this, especially if you want to spend your coding time on the data, not the presentation layer (HTML, JavaScript).
If you are an R user (or are willing to learn), this could be a good candidate: http://shiny.rstudio.com/