Visualising Data with Python
Mon 8 Apr 2024 10:00 - Fri 12 Apr 2024 16:00
University of Birmingham, B152TT
Description
In-person delivery, University of Birmingham (Venue: FRANKLAND BUILDING: Room 1125)
Aimed at: All Years
Duration: 5 days
CENTA Training Credits: 10
Training Area(s): Theme Science and Technical Skills
*This training is free. Please bring your own laptops to this training.
As always, please only book onto this training if you are committed to attending (no-shows are not acceptable), and notify me ASAP if you register but find you will not be able to attend, so that I can offer your space to a waitlist applicant.*
Think Python is just the bigger, scarier version of R? Think again!
Python is a powerful tool for data science, including data analysis, statistics and visualisation. It's also a great language to learn if you are interested in Machine Learning, AI and other related applications. It's a skill that is highly in demand with employers, including in the Environmental Sectors. Put simply - if you're interested in R, you might find Python an even better (and more broadly applicable) fit!
Trainer Martin Jones is back with his amazing resources and teaching style to help introduce you to Python and data visualisation. But don't just take my word for it - previous CENTA student feedback for Martin's Python teaching includes:
'Python is a lot less scary than you might think and is a great language to learn because of its broad applicability to data science and beyond.'
'To me, the instructor was the best I've ever had as someone who's trying to explain coding. Sometimes I almost felt I understood it - which hardly ever happened before on similar courses - and some bits and pieces I did indeed understand which is essential for post-course practicing and application of this knowledge in real life later on.'
'I never thought I'd enjoy such a course as much as I did. And this is in the largest part thanks to Martin, who is a great instructor with incredible patience and very good explanatory skills. On top of that, he is also very eager to help and is a nice person.'
***Update of course info from trainer:***
Python is a dynamic, readable language that is a popular platform for all types of data analysis work, from simple one-off scripts to large, complex software projects. One of the strengths of the Python language is the availability of mature, high-quality libraries for working with scientific data. Integration between the most popular libraries has led to the concept of a "scientific Python stack": a collection of packages which are designed to work well together.
This workshop is split into two sections. In the first two days we will introduce the basics of the Python language for those new to programming. For students with previous experience of Python or other languages, this will serve as a refresher and a chance to discuss best practice and focus on the parts of the language that we will need later.
In the second two days we will see how to leverage the libraries in the scientific Python stack to efficiently work with and visualise large volumes of data. Specifically, we will cover:
- pandas for reading, cleaning and manipulating tabular data
- numpy for efficiently working with arrays of data
- scipy for basic statistics
- seaborn and matplotlib for data visualization
This workshop is aimed at complete beginners and assumes no prior programming experience. Rather than attempting to give a comprehensive overview of Python, we will instead concentrate on how best to use existing libraries to accomplish a lot while writing a very small amount of code! There will be opportunities to use your own data throughout, and the final day is set aside as workshop time for you to work on your own datasets with help from the instructor.
Intended audience:
This workshop is aimed at anyone who needs to make sense of large datasets, working in any field. We will use examples drawn from many different subject areas, and the tools we will discuss are equally applicable to any kind of data. No previous programming experience is required. If in any doubt as to whether the workshop is suitable for you, take a look at the detailed session content below or drop Martin Jones (martin@pythonforbiologists.com) an email.
Curriculum:
1. Introduction
In this session we'll get familiar with the notebook environment in which we'll be working, and cover the very basics of writing and running Python code. We will briefly cover the different parts of code – variables, functions, methods and arguments – and discuss how these simple building blocks can be combined to make programs that do useful things. We will finish this session by looking at Python's tools for text manipulation, then using them to solve some exercise problems.
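As a flavour of the text manipulation covered here, the following is a minimal sketch rather than actual course material. The species name and the variable names are invented for illustration.

```python
# Hypothetical example data: a species name with inconsistent formatting
raw_name = "  quercus ROBUR "

# Methods are functions attached to a value; here we chain two string methods
clean_name = raw_name.strip().lower()

# split() breaks the text on whitespace into a list of words
genus, species = clean_name.split()

# Build a tidy label: capitalised genus, lower-case species
label = genus.capitalize() + " " + species
print(label)
```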
2. Lists and loops
In this session we will take our first steps into working with larger datasets by examining lists (which allow us to store multiple bits of information) and loops (which allow us to process them). This will require learning a little bit more about Python's syntax, which will set us up well for future sessions.
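A minimal sketch of the list and loop idea, assuming some made-up rainfall measurements (the values and names are illustrative only):

```python
# Hypothetical measurements (daily rainfall in mm), invented for illustration
rainfall_mm = [0.0, 2.5, 11.2, 0.4, 7.9]

# A loop visits each item in the list in turn
total = 0.0
for value in rainfall_mm:
    total = total + value

# len() tells us how many items the list holds
mean_rainfall = total / len(rainfall_mm)
print(f"Mean rainfall: {mean_rainfall:.1f} mm")
```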
3. Conditions
In this session we will take the next logical step and learn how to write programs that can make decisions and implement rules for working with data. Python has a variety of ways to express complicated rules, and we will learn the circumstances that best suit each. We'll also focus a bit more on the importance of readability when there are multiple different ways to achieve the same thing in a program.
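To illustrate the kind of rule this session deals with, here is a small sketch that labels each value in a list. The thresholds and category names are assumptions chosen purely for the example.

```python
# Hypothetical measurements (daily rainfall in mm), invented for illustration
rainfall_mm = [0.0, 2.5, 11.2, 0.4, 7.9]

labels = []
for value in rainfall_mm:
    # Conditions are checked from top to bottom; the first match wins
    if value == 0:
        labels.append("dry")
    elif value < 5:
        labels.append("light")
    else:
        labels.append("heavy")

print(labels)
```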
4. Organizing and structuring code
In this session we will discuss functions that we’d like to see in Python before considering how we can add to our computational toolbox by creating our own. We examine the nuts and bolts of writing functions before looking at best-practice ways of making them usable. We also look at a couple of features of Python – named arguments and defaults – that are very heavily used in the libraries that we will cover next.
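The sketch below shows what named arguments and defaults look like in a function of our own. The function `summarise` and its parameters are hypothetical and exist only for this example.

```python
def summarise(values, decimals=1, unit="mm"):
    """Return the mean of values as a formatted string.

    decimals and unit have default values, so callers may omit them.
    """
    mean = sum(values) / len(values)
    return f"{mean:.{decimals}f} {unit}"

# Relying on the defaults
short_form = summarise([1, 2, 4])

# Overriding the defaults by name, which keeps the call readable
long_form = summarise([1, 2, 4], decimals=2, unit="cm")

print(short_form)
print(long_form)
```

The same pattern of optional, named arguments appears throughout pandas and seaborn, which is why it is worth getting comfortable with it first.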
5. Reading and processing data with pandas
In this session we will introduce the first of our scientific Python packages: pandas. We will learn how to get our data out of files and into our Python programs. In the process we will have a chance to discuss file formats, missing data, and pandas' data model. We will learn how to efficiently carry out basic calculations and transformations on data and, crucially, how to select and filter data for analysis.
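A minimal sketch of reading, checking and filtering tabular data with pandas. The dataset is invented, and it is read from an in-memory string here only so that the example is self-contained; with a real file you would pass its path to `read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; note the missing height in the second row
csv_text = """site,species,height_cm
A,oak,310
A,birch,
B,oak,275
B,birch,190
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values in one column
n_missing = df["height_cm"].isna().sum()

# Select rows that satisfy two conditions at once
tall_oaks = df[(df["species"] == "oak") & (df["height_cm"] > 300)]

print(n_missing)
print(tall_oaks)
```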
6. Distributions and relationships with seaborn
In this session we will begin to look at visualisation. We will start with the workhorses of data visualization: the histogram and the scatter plot. Studying a few examples of each will allow us to get familiar with the seaborn interface and to cover a few points about visualizations that communicate effectively. We will combine this with a look at styles and colours, which contribute greatly to the readability of our charts. Here we will also discuss the use of statistical methods to discover patterns that are not obvious from simply looking at the data.
7. Relationships in different categories
In this session we will cover the most powerful aspect of seaborn: dividing up our data into different categories to look for patterns. This will build on the tools from the previous session, and there will be lots more to discuss about using them to rapidly explore new datasets. We will learn how to use pandas' grouping ability to produce summary tables, and how to use heatmaps - a fantastic and under-utilized tool for representing complex categorical data - to visualise them. We will also extend our understanding of pandas data types to see how to deal with categorical data.
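The grouping step can be sketched as follows, assuming a small invented dataset. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical tree measurements, invented for illustration
df = pd.DataFrame({
    "site": ["A", "A", "B", "B", "B"],
    "species": ["oak", "birch", "oak", "oak", "birch"],
    "height_cm": [310, 220, 275, 295, 190],
})

# Mean height for every combination of site and species
mean_height = df.groupby(["site", "species"])["height_cm"].mean()

# Reshape so that sites are rows and species are columns
summary = mean_height.unstack()

print(summary)
```

A table of this shape, with one category along the rows and another along the columns, is the kind of input that seaborn's `heatmap` function accepts.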
8. Distributions in categories
In this session we will round up our survey of seaborn's chart types by bringing in those that directly compare categories. Here we find the classic box plot, as well as the more exotic swarm, violin, and boxen plots which aim to deal with some of its shortcomings. Drawing from a range of example datasets will allow us to illustrate which type of data suits each one best. We will also return to pandas to learn how we can sometimes represent continuous data as categories, and what trade-offs are involved.
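One way to represent continuous data as categories is binning with pandas' `cut` function. This is a sketch under assumed bin edges and labels, both chosen arbitrarily for the example.

```python
import pandas as pd

# Hypothetical heights in cm, invented for illustration
heights = pd.Series([190, 220, 275, 295, 310])

# Bin edges are a choice we make; each bin includes its right-hand edge
size_class = pd.cut(
    heights,
    bins=[0, 200, 300, 400],
    labels=["small", "medium", "large"],
)

print(size_class.tolist())
```

The trade-off is that the categories are easier to compare and plot, but the differences between values inside the same bin are lost, and the result depends on where the edges are placed.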
9 and 10
The last day is set aside for workshop time. This is an ideal opportunity for students to apply the material to their own datasets (or to examples from their own field) with help from the instructor. Alternatively, we can use the time to discuss topics of particular interest that haven't been covered in the standard syllabus, or to continue to work on exercises from throughout the week.