Using a clever trick that leverages the static dictionaries in PL/Python, we can easily scale ML models from popular libraries such as scikit-learn and XGBoost to data-parallel problems. You can read the full blog post I published in the Pivotal Engineering Journal at the link below.
Building machine learning models at scale for data parallel problems on Pivotal's MPP databases
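To make the idea of a "static dictionary" concrete, here is a minimal sketch of the caching pattern. It is an illustration only, not the code from the blog post. PL/Python gives every function a dictionary named `SD` that persists between repeated calls to that function within a database session. `GD` is a similar dictionary shared by all PL/Python functions in the session. An expensive step, such as deserializing a trained model, can therefore run once and be reused for every subsequent row.

Because this runs outside the database, the sketch passes a plain `dict` in place of `SD`. The names `LinearModel`, `score`, `SERIALIZED_MODEL`, and `LOAD_CALLS`, and the pickled-dict model format, are all hypothetical stand-ins.

```python
import pickle

# Hypothetical stand-in for a trained model (e.g. a scikit-learn estimator).
class LinearModel:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        # Linear score: bias + sum(w_i * x_i)
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

# Hypothetical: pretend these bytes were read from a model table column.
SERIALIZED_MODEL = pickle.dumps({"weights": [2.0, -1.0], "bias": 0.5})

# Records each time the expensive deserialization step actually runs.
LOAD_CALLS = []

def score(SD, features, model_blob=SERIALIZED_MODEL):
    # In real PL/Python, SD is injected by the runtime; here it is passed in.
    if "model" not in SD:
        LOAD_CALLS.append(1)
        params = pickle.loads(model_blob)
        SD["model"] = LinearModel(params["weights"], params["bias"])
    # Every later call reuses the cached model instead of unpickling again.
    return SD["model"].predict(features)

SD = {}
rows = [[1.0, 1.0], [3.0, 2.0], [0.0, 4.0]]
print([score(SD, r) for r in rows])  # [1.5, 4.5, -3.5]
print(len(LOAD_CALLS))               # 1
```

Scoring three rows deserializes the model only once. In an MPP database, each segment typically runs its own backend process, so each segment would keep its own cached copy and score its share of the data in parallel.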
- All about machine learning - a "Breaking 404" podcast with HackerEarth
- Einstein for Sales - Under the Hood (Dreamforce 2019 Breakout Session)
- Predicting Commodity Futures with NLP on Tweets (Text Analytics World San Francisco 2013)
- PyMADlib - A Python Wrapper for Apache MADlib (Data Day Texas 2013)
- Python Powered Data Science at Pivotal - PyData NYC 2013