All Data Scientists know that story... Install the well-known Jupyter Classic or JupyterLab Notebook on their local PC/laptop, pip install some python libraries like pandas..., download some datasets and finally start analysing with a notebook in isolation. There are a few pain points there:
- Setting up the tools is hard and time consuming. You have to install Python, Jupyter and add the libraries you need. Conda environments or Docker containers can help mitigate the pain at some point, but finally these are yet additional tools to setup and manage.
- At some point, they want to collaborate with teammates, or want to share some results. The Data Scientist is just on his island and has no easy way to break the silo. The recent Realtime collaboration features have been merged into JupyterLab but it is just the permises and miss fundamental building blocks like identity, authorisation...
- The analysis is not easily reproducible. The setup you have done on a particular Windows platform is completely different from the setup another collaborator may have done on MacOS.
There comes the need for an better solution. At Datalayer we think that a more Cloud-native Jupyter can help remove those pain points. In other words, we embrasse the infrastructure provided by cloud providers like GCloud, AWS, Azure... and build on top to provide more power to the Data Scientist.
Cloud native computing is an approach in software development that utilizes cloud computing to "build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds.