Version numbers are hard to get right. Maintainers want to communicate to users what the impact of adopting a new version will be, but poor communication can lead to a lot of frustration. There are a few popular version schemes in use today, including Semantic Versioning (SemVer) and Calendar Versioning (CalVer). However, projects in the Python community often don’t strictly conform to these standards, which leads to confusion.| jacobtomlinson.dev
In October 2015 I gave a talk on Kubernetes at Tech Exeter (back when it was called the Exeter Web Meetup).| jacobtomlinson.dev
An introduction to GPU programming in Python| jacobtomlinson.dev
The PyData software ecosystem is made up of many open-source software libraries that are used heavily in Python Software Development, Data Science/Engineering, Traditional Sciences, Artificial Intelligence and beyond. They were used to fly a helicopter on Mars, drive new discoveries around climate change and generate the first image of a black hole.| jacobtomlinson.dev
Writing GPU code in Python is easier today than ever. You don’t need to learn C++ and there are many libraries available to get you started quickly. In this tutorial we will learn some GPU programming fundamentals and explore the ecosystem of GPU accelerated libraries that do the hard work for you.| Talks on Jacob Tomlinson
Debugging software itself is a hard task, but debugging GPU software environments can be even more challenging. Understanding the intricate interactions between hardware, drivers, CUDA, C++ dependencies, and Python libraries can be far more complex.| Talks on Jacob Tomlinson
Accelerating Python using the GPU is much easier than you might think. We will explore the powerful CUDA-enabled Python ecosystem in this tutorial through hands-on examples using some of the most popular accelerated scientific computing libraries.| Talks on Jacob Tomlinson
Dask is a popular Python framework for scaling your workloads, whether you want to leverage all of the cores on your laptop and stream large datasets through memory, or scale your workload out to thousands of cores on large compute clusters. Dask allows you to distribute code using familiar APIs such as pandas, NumPy and scikit-learn or write your own distributed code with powerful parallel task-based programming primitives.| Talks on Jacob Tomlinson
Since joining NVIDIA I’ve gotten to grips with the fundamentals of writing accelerated code in Python. I was amazed to discover that I didn’t need to learn C++ and I didn’t need new development tools. Writing GPU code in Python is easier today than ever, and in this tutorial, I will share what I’ve learned and how you can get started with accelerating your code.| Talks on Jacob Tomlinson
By leveraging cloud computing resources, you can pay for just the computing power you need, when you need it. Additionally, GPU acceleration can significantly decrease the amount of time you need computing resources, reducing your overall cost.| Talks on Jacob Tomlinson
Pandas is flexible, but often slow when processing gigabytes of data. Many frameworks promise higher performance, but they often support only a subset of the Pandas API, require significant code change, and struggle to interact with or accelerate third-party code that you can’t change. RAPIDS cuDF enables Pandas users to accelerate their existing workflows and third-party code with zero code change required. You can continue using Pandas on CPUs for small-scale local development and testing...| Talks on Jacob Tomlinson
Training Large Language Models (LLMs) requires a vast amount of input data, and the higher the quality of that data the better the model will be at producing useful natural language. NVIDIA NeMo Data Curator is a toolkit built with RAPIDS and Dask for extracting, cleaning, filtering and deduplicating training data for LLMs.| Talks on Jacob Tomlinson
This year I built a library that already exists. The existing solutions didn’t quite meet my needs; I wanted something that ticked all of my boxes. When thinking about building something new, people referred me to xkcd #927. But I did it anyway.| Talks on Jacob Tomlinson
Dask is a flexible library for parallel computing in Python. Dask provides high-level interfaces to extend the PyData ecosystem to larger-than-memory or distributed environments, as well as lower-level interfaces to customise workflows. No previous experience is required, though knowledge of Python, NumPy and pandas is preferred.| jacobtomlinson.dev
Kubeflow is a popular MLOps platform built on Kubernetes for designing and running Machine Learning pipelines for training models and providing inference services.| jacobtomlinson.dev
There are many powerful libraries in the Python ecosystem for accelerating the computation of large arrays with GPUs.| jacobtomlinson.dev
Writing GPU code in Python is easier today than ever! I joined NVIDIA in 2019 and I was brand new to GPU development.| jacobtomlinson.dev
Straight from our own Informatics Lab, Jacob will share how to perform large scale distributed data analysis on any cloud platform with Pangeo.| jacobtomlinson.dev
Cloud agnostic distributed data analysis with Pangeo| jacobtomlinson.dev
Enable your development teams to be able to deploy software quickly and efficiently, as well as at great scale, with containerisation.| jacobtomlinson.dev
If you read reports from organizations like the CNCF, they will tell you that 90% of people are developing in containers, 69% are running them in production and 77% of those are using Kubernetes to manage them.| jacobtomlinson.dev
Server administration is an activity that often happens in an isolated context in a terminal. ChatOps is a way of bringing that work into a shared environment and unlocking more collaboration.| jacobtomlinson.dev
In late 2017 the Met Office ran a joint Tianchi data science challenge with Alibaba. I was invited to present the winning team with their award at the Mobile World Congress in Barcelona.| jacobtomlinson.dev
My data lives in an object store, but my tools expect a POSIX file path, what do I do?| jacobtomlinson.dev
This session gives an in-depth look at the current state of big data at AWS. Learn about the latest big data trends and industry use cases.| jacobtomlinson.dev
In order to analyse the petabytes of data we have at the Met Office we need very large clusters of servers.| jacobtomlinson.dev
Autoscaling Distributed Compute With Dask Kubernetes and AWS| jacobtomlinson.dev
Interactive Big Data Analysis With Jupyter, Dask and more| jacobtomlinson.dev
Making Environmental Science and Data Beautiful and Useful| jacobtomlinson.dev
The Met Office is a world leading weather and climate centre with the largest operational supercomputer in Europe.| jacobtomlinson.dev
We should split the session into roughly three activities, with the final one being totally open ended.| jacobtomlinson.dev
The Met Office Informatics Lab includes scientists, developers and designers. We build prototypes exploring new technologies to make environmental data useful.| jacobtomlinson.dev
Learn How to Build Cool Things With Weather Data in Python| jacobtomlinson.dev
By using Dask to scale out RAPIDS workloads on Kubernetes you can accelerate your workloads across many GPUs on many machines.| jacobtomlinson.dev
I was invited to join the Kubernetes Batch Working Group (k8s-batch-sig) to give an overview of how Dask runs on Kubernetes, especially for batch style workloads with DaskJob.| jacobtomlinson.dev
The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs with minimal code changes and no new tools to learn.| jacobtomlinson.dev
Writing GPU code in Python is easier today than ever, and in this tutorial, I will share what I’ve learned and how you can get started with accelerating your code.| jacobtomlinson.dev
At the Dask Summit I chaired a workshop called Deploying Dask. This workshop comprised multiple talks delivered by me and others.| jacobtomlinson.dev
RAPIDS is an end-to-end data science stack built entirely for CUDA GPUs. Faster analytics, at scale, for lower total cost of ownership.| jacobtomlinson.dev
High-throughput (task-based) computing is a flexible approach to parallelization. It involves splitting a problem into loosely-coupled tasks.| jacobtomlinson.dev
Writing code for GPUs has come a long way over the last few years and it is now easier than ever to get started.| jacobtomlinson.dev
Native Cloud Deployment with Dask-Cloudprovider| jacobtomlinson.dev
The RAPIDS suite of open source software libraries (https://rapids.ai/) allows you to run data science and analytics pipelines entirely on GPUs while following familiar Python APIs including NumPy, pandas and scikit-learn.| jacobtomlinson.dev
The Informatics Lab is a technology, science and design research group at the Met Office. For the last four years we’ve been exploring, among other things, how we can do more science using the power of cloud computing.| jacobtomlinson.dev