Parallel Processing with R
As someone who works with data day in and day out, wrangling a huge amount of data before anything else can happen is like brushing your teeth first thing in the morning: a necessary step, but one you wish you could get through quickly so you can start your day.
It would be awesome if those repetitive tasks could run in parallel and shorten the processing time!
In my example, I want to extract the daily mean, median, max and min weather readings for Singapore from an API that serves per-minute records from each part of the island.
Being a very efficient country, we collect a lot of data. The screenshot below illustrates how much data we have for ONE SINGLE DAY, for temperature alone.
If we needed to process two full years of weather information, I'm not sure how long that would take us. The superhero of the day is the built-in library(parallel)!
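Here is a minimal sketch of the approach, with simulated per-minute temperature readings standing in for the real API payload (the column names and date range are made up for illustration). The key idea: write one function that summarises a single day, then hand the list of days to parLapply on a cluster of workers.

```r
library(parallel)

# Simulated stand-in for the per-minute API records:
# one temperature reading per minute, for a week of days.
set.seed(42)
dates <- seq(as.Date("2019-01-01"), as.Date("2019-01-07"), by = "day")
readings <- data.frame(
  date        = rep(dates, each = 1440),               # 1440 minutes per day
  temperature = runif(length(dates) * 1440, 24, 34)    # degrees Celsius
)

# Summarise one day; the same function works with lapply and parLapply
daily_stats <- function(d, data) {
  x <- data$temperature[data$date == d]
  data.frame(date = d,
             mean = mean(x), median = median(x),
             max  = max(x),  min    = min(x))
}

# Spin up one worker per core (minus one for the OS), run the days
# in parallel, then shut the cluster down.
cl  <- makeCluster(max(1, detectCores() - 1))
res <- parLapply(cl, dates, daily_stats, data = readings)
stopCluster(cl)

summary_df <- do.call(rbind, res)
print(summary_df)
```

Because the data frame is passed to daily_stats as an argument, parLapply ships it to the workers automatically; for larger objects you would export them once with clusterExport instead of serialising them with every call.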
Performance comparison: 50 minutes with lapply vs. 6 minutes with parLapply on a random date range. Phew, it was pretty warm on New Year's Day in 2019.
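You can reproduce the shape of that comparison on your own machine with a toy workload (the 0.5-second sleep below is a hypothetical stand-in for one day's API pull; the 50-vs-6-minute figures above are from my actual run):

```r
library(parallel)

slow_task <- function(i) { Sys.sleep(0.5); i^2 }   # stand-in for one day's API pull
ids <- 1:8

# Serial baseline
t_serial <- system.time(res_serial <- lapply(ids, slow_task))["elapsed"]

# Parallel version: same call shape, plus a cluster
cl <- makeCluster(4)
t_parallel <- system.time(res_parallel <- parLapply(cl, ids, slow_task))["elapsed"]
stopCluster(cl)

stopifnot(identical(res_serial, res_parallel))     # same answers, less waiting
cat(sprintf("serial: %.1fs  parallel: %.1fs\n", t_serial, t_parallel))
```

Note that makeCluster has a startup cost of a second or two, so the parallel win only shows up when each task is genuinely slow or there are many of them, as with two years of per-minute weather data.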
And, you are welcome!
Enjoy the code and leave a comment :)