Node.js ETL (Extract, Transform, Load) Pipeline: What Are We Building?
In this series of tutorials we are going to learn about using Extract, Transform, Load pipelines for handling large datasets with Node.js.
We will use two different approaches to creating an ETL pipeline:
- Basic ETL pipeline that retrieves all the relevant data at once from a remote data source.
- ETL pipeline that uses Node.js Streams to process data from a local CSV file of any size.
Along the way you will learn about the different stages of an ETL pipeline, how to use Promises and async/await in Node.js, how to interact with the file system using the fs module, and how to use Streams.
Goal
Get an overview of what these ETL tutorials are about.
Prerequisites
- None
Overview
In this series of tutorials we are going to learn about using Extract, Transform, Load pipelines for moving and processing data between systems.
We will use two different approaches to creating an ETL pipeline:
- Basic ETL pipeline that retrieves all the relevant data at once from a remote data source.
- ETL pipeline that uses Node.js Streams to process data from a local CSV file of any size.
The major difference between the two is that with Streams we can process files of nearly unlimited size, whereas the first approach is limited to data sets that fit into the memory available on the machine it's running on.
Along the way you will learn about the different stages of an ETL pipeline, how to use Promises and async/await, how to interact with the file system using the fs module, and how to use Streams.
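To make the difference between the two approaches concrete, here is a minimal sketch using Node's built-in fs module. The file path and the line/byte counting are illustrative only; the tutorials build out the real pipeline step by step.

```javascript
const fs = require('fs');

// In-memory approach: the whole file must fit in RAM before we can work with it.
async function readAllAtOnce(filePath) {
  const contents = await fs.promises.readFile(filePath, 'utf8');
  return contents.split('\n').length; // e.g. count the rows
}

// Streaming approach: data arrives in small chunks, so file size is no longer a limit.
function readAsStream(filePath) {
  return new Promise((resolve, reject) => {
    let bytes = 0;
    fs.createReadStream(filePath)
      .on('data', (chunk) => { bytes += chunk.length; })
      .on('end', () => resolve(bytes))
      .on('error', reject);
  });
}
```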
ETL starts with a data source that we are extracting data from. Our data source will be the NASA Exoplanet API, which is a publicly accessible collection of data that supports the research of exoplanets. An exoplanet is a planet which orbits a star outside our solar system, and the Exoplanet API is an interface to NASA's Exoplanet Archive which collects data from astronomers from around the world.
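As an early sketch of the extract stage, the snippet below fetches JSON over HTTP. It assumes Node 18+ (where fetch is available globally) and leaves the exact Exoplanet API URL and query parameters to the later tutorials; the extract() helper is illustrative, not the tutorial's final code.

```javascript
// Extract stage (sketch): pull JSON data from a remote API.
// Assumes Node 18+, where fetch() is built in. The URL is passed in by the caller
// because the exact Exoplanet API endpoint and query are covered later in the series.
async function extract(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Extract failed with HTTP status ${response.status}`);
  }
  return response.json(); // parsed exoplanet records
}
```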
We will be extracting data from the Exoplanet API, transforming some of the data into a new structure, and then loading the data into a JSON file. When you create your own ETL pipelines, the destination will likely be a database instead of a flat file, but the same concepts will apply.
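A rough sketch of the remaining two stages might look like the following. The source field names (pl_name, hostname, disc_year) are assumptions about the shape of the API's records, used purely for illustration, and the destination is a local JSON file as described above.

```javascript
const { writeFile } = require('fs').promises;

// Transform stage (sketch): reshape each raw record into the structure we want to keep.
// The source field names below are assumptions, used only to illustrate the idea.
function transform(rawRecords) {
  return rawRecords.map((record) => ({
    name: record.pl_name,
    hostStar: record.hostname,
    discoveryYear: record.disc_year,
  }));
}

// Load stage (sketch): write the transformed records out as a JSON file.
async function load(records, outputPath) {
  await writeFile(outputPath, JSON.stringify(records, null, 2));
}
```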
Recap
In this tutorial, we previewed the app we'll be building to demonstrate an ETL pipeline.
Further your understanding
- Be sure to review the concept of ETL pipelines before diving into the code. The next tutorial recaps ETL pipelines, drawing on our previous overview of data brokering.