Intro to Data Analytics with Python, SQL, Spark and Seaborn: 2-day workshop
Learn to analyze real-world data using Python & Pandas, extend your analyses to databases using SQL and scale to Big Data with Apache Spark. Create compelling visualizations with Seaborn!
This workshop will introduce you to essential concepts and practices for building compelling analyses and dashboards on datasets of any size. You will learn how to:
Use Python and Pandas to select, group and summarize your data
Decide what data to keep and what to ignore
Create compelling visualizations using Seaborn and Matplotlib
Connect and retrieve data from a database using Python
Extend your analyses to relational databases using SQL
Perform aggregations and combinations using SQL
Include unstructured data sources in your analysis using Spark
Scale up your analyses to gigabytes of data using Spark on AWS
Combine Spark and SQL for maximum flexibility and power
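To give a flavor of the database topics listed above, here is a minimal sketch of connecting to a database from Python and running an aggregation over a join in SQL. It uses Python's built-in sqlite3 module with an in-memory database; the customers and orders tables, their columns, the sample rows and the revenue_by_city helper are all made up for illustration and are not taken from the workshop material.

```python
import sqlite3


def revenue_by_city(conn):
    """Join orders to customers and total the order amounts per city."""
    query = """
        SELECT c.city, COUNT(*) AS n_orders, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.city
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


# Hypothetical in-memory database with a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Oakland'), (2, 'San Jose');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 40.0);
""")

rows = revenue_by_city(conn)
for city, n_orders, revenue in rows:
    print(city, n_orders, revenue)
```

The same query text would work largely unchanged against a server database such as PostgreSQL or MySQL; only the connection step would differ.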
The workshop is designed to maximize the learning experience for everyone and includes 50% theory and 50% hands-on practice.
Is lunch provided?
Yes! Lunch is included.
Are there any prerequisites?
Previous programming experience, in Python or another language, is advised to make the best use of the workshop.
In the last 2 years Python has become a de facto standard in data science and is widely adopted by most major companies. Reasons for this success include:
large set of mature data visualization libraries => most needs covered
worldwide community of enthusiasts => get help when you need it
easy to learn, read and write => start contributing immediately
supports both functional and object-oriented coding => versatile and powerful
full-stack programming language => easier interaction between data scientists and software engineers
SQL is the most widely used language for managing data in a relational database. It is supported by both open source projects like MySQL and PostgreSQL and by enterprise databases like Oracle, Microsoft SQL Server and many others.
Apache Spark has revolutionized how we build and deploy data pipelines for ETL, Visualization and Machine Learning. Reasons for this success include:
Flexible enough to run SQL-style queries, machine learning algorithms, and everything in between
Fast and scalable: efficient memory use => runs up to 100x faster than Hadoop MapReduce on some in-memory workloads
Supports data exploration and production workflows => same code that works on a laptop can be deployed to cloud-based computing clusters
Free and open-source
The course is led by Francesco Mosconi. He holds a Ph.D. in Physics and is a Data Scientist at Catalit LLC. He was formerly co-founder and Chief Data Officer at Spire, a YC-backed company that invented the first consumer wearable device capable of continuously tracking respiration and physical activity. A Machine Learning and Python expert, he also served as lead Data Science instructor at General Assembly and The Data Incubator.
Read our reviews on Yelp
Terms & Conditions
In certain cases, we may need to cancel this workshop due to circumstances beyond our control or otherwise. If this happens, we will refund all registration fees for those who signed up. We are not responsible for any related expenses incurred by registered attendees (including but not limited to travel and hotel expenses).
More than 1 week before the course: full refund.
Less than 1 week before the course: no refund available.
All public workshops come with a no-questions-asked money-back guarantee. If you are unhappy for any reason after attending the class, you can ask for a full refund.