Postgres sample data
Download sample data for learning, testing, and exploring Neon
This guide describes how to download and install sample data for use with Neon.
Prerequisites
- wget for downloading datasets, unless otherwise instructed. If your system does not support
wget
, you can paste the source file address in your browser's address bar. - A
psql
client for connecting to your Neon database and loading data. This client is included with a standalone PostgreSQL installation. See PostgreSQL Downloads. - A
pg_restore
client if you are loading the employees or postgres_air database. Thepg_restore
client is included with a standalone PostgreSQL installation. See PostgreSQL Downloads. - A Neon database connection string. After creating a database, you can obtain the connection string from the Connection Details widget on the Neon Dashboard. In the instructions that follow, replace
postgres://[user]:[password]@[neon_hostname]/[dbname]
with your connection string. - A Neon Pro account if you intend to install a dataset larger than 3 GB.
- Instructions for each dataset require that you create a database. You can do so from a client such as
psql
or from the Neon SQL Editor.
note
You can also load sample data using the Neon CLI. See Load sample data with the Neon CLI.
Sample data
Sample datasets are listed in order of the smallest to largest installed size. Please be aware that the Neon Free Tier has a storage limit of 3 GB per branch. Datasets larger than 3 GB cannot be loaded on the Free Tier.
Name | Tables | Records | Source file size | Installed size |
---|---|---|---|---|
Periodic table data | 1 | 118 | 17 KB | 7.2 MB |
World Happiness Index | 1 | 156 | 9.4 KB | 7.2 MB |
Titanic passenger data | 1 | 1309 | 220 KB | 7.5 MB |
Netflix data | 1 | 8807 | 3.2 MB | 11 MB |
Pagila database | 33 | 62322 | 3 MB | 15 MB |
Chinook database | 11 | 77929 | 1.8 MB | 17 MB |
Lego database | 8 | 633250 | 13 MB | 42 MB |
Employees database | 6 | 3919015 | 34 MB | 333 MB |
Wikipedia vector embeddings | 1 | 25000 | 1.7 GB | 850 MB |
Postgres air | 10 | 67228600 | 1.2 GB | 6.7 GB |
note
Installed size is measured using the query: SELECT pg_size_pretty(pg_database_size('your_database_name'))
. The reported size for small datasets may appear larger than expected due to inherent Postgres storage overhead.
Periodic table data
A table containing data about the periodic table of elements.
-
Create a
periodic_table
database: -
Download the source file:
-
Navigate to the directory where you downloaded the source file, and run the following command:
-
Connect to the
periodic_table
database: -
Look up the the element with the Atomic Number 10:
- Source: https://github.com/andrejewski/periodic-table
- License: ISC License
Copyright (c) 2017, Chris Andrejewski <christopher.andrejewski@gmail.com>
World Happiness Index
A dataset with multiple indicators for evaluating the happiness of countries of the world.
-
Create a
world_happiness
database: -
Download the source file:
-
Navigate to the directory where you downloaded the source file, and run the following command:
-
Connect to the
titanic
database: -
Find the countries where the happiness score is above average but the GDP per capita is below average:
- Source: https://www.kaggle.com/datasets/unsdsn/world-happiness
- License: CC0: Public Domain
Titanic passenger data
A dataset containing information on the passengers aboard the RMS Titanic, which sank on its maiden voyage in 1912.
-
Create a
titanic
database: -
Download the source file:
-
Navigate to the directory where you downloaded the source file, and run the following command:
-
Connect to the
titanic
database: -
Query passengers with the most expensive fares:
- Source: https://www.kaggle.com/datasets/ibrahimelsayed182/titanic-dataset
- License: Unknown
Netflix data
A dataset containing information about movies and tv shows on Netflix.
-
Create a
netflix
database: -
Download the source file:
-
Navigate to the directory where you downloaded the source file, and run the following command:
-
Connect to the
netflix
database: -
Find the directors with the most movies in the database:
- Source: https://www.kaggle.com/datasets/shivamb/netflix-shows
- License: CC0: Public Domain
Pagila database
Sample data for a fictional DVD rental store. Pagila includes tables for films, actors, film categories, stores, customers, payments, and more.
-
Create a
pagila
database: -
Download the source file:
-
Navigate to the directory where you downloaded the source file, and run the following command:
-
Connect to the
pagila
database: -
Find the top 10 most popular film categories based on rental frequency:
- Source: https://github.com/devrimgunduz/pagila
- License: LICENSE.txt
Copyright (c) Devrim Gündüz <devrim@gunduz.org>
Chinook database
A sample database for a digital media store, including tables for artists, albums, media tracks, invoices, customers, and more.
-
Create a
chinook
database: -
Download the source file:
-
Navigate to the directory where you downloaded the source file, and run the following command:
-
Connect to the
chinook
database: -
Find out the most sold item by track title:
- Source: https://github.com/lerocha/chinook-database
- License: LICENSE.md
Copyright (c) 2008-2017 Luis Rocha
Lego database
A dataset containing information about various LEGO sets, their themes, parts, colors, and other associated data.
-
Create a
lego
database: -
Download the source file:
-
Navigate to the directory where you downloaded the source file, and run the following command:
-
Connect to the
lego
database: -
Find the top 5 LEGO themes by the number of sets:
- Source: https://www.kaggle.com/datasets/rtatman/lego-database
- License: CC0: Public Domain
Employees database
A dataset containing details about employees, their departments, salaries, and more.
-
Create the database and schema:
-
Download the source file:
-
Navigate to the directory where you downloaded the source file, and run the following command:
Database objects are created in the
employees
schema rather than thepublic
schema. -
Connect to the
employees
database: -
Find the top 5 departments with the highest average salary:
- Source: The initial dataset was created by Fusheng Wang and Carlo Zaniolo from Siemens Corporate Research. Designing the relational schema was undertaken by Giuseppe Maxia while Patrick Crews was responsible for transforming the data into a format compatible with MySQL. Their work can be accessed here: https://github.com/datacharmer/test_db. Subsequently, this information was adapted to a format suitable for PostgreSQL: https://github.com/h8/employees-database. The data was generated, and there are inconsistencies.
- License: This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Wikipedia vector embeddings
An OpenAI example dataset containing pre-computed vector embeddings for 25000 Wikipedia articles. It is intended for use with the pgvector
Postgres extension, which you must install first to create a table with vector
type columns. For a Jupyter Notebook that uses this dataset with Neon, refer to the following GitHub repository: neon-vector-search-openai-notebooks
-
Download the zip file (~700MB):
-
Navigate to the directory where you downloaded the zip file, and run the following command to extract the source file:
-
Create a
wikipedia
database: -
Connect to the
wikipedia
database: -
Install the
pgvector
extension: -
Create the following table in your database:
-
Create vector search indexes:
-
Navigate to the directory where you extracted the source file, and run the following command:
note
If you encounter a memory error related to the maintenance_work_mem
setting, refer to Indexing vectors for how to increase this setting.
- Source: OpenAI
- License: MIT License
Postgres air database
An airport database containing information about airports, aircraft, bookings, passengers, and more.
-
Download the file (1.3 GB) from Google drive
-
Create a
postgres_air
database: -
Navigate to the directory where you downloaded the source file, and run the following command:
Database objects are created in a
postgres_air
schema rather than thepublic
schema. -
Connect to the
postgres_air
database: -
Find the aircraft type with the most flights:
- Source: https://github.com/hettie-d/postgres_air
- License: BSD 3-Clause License
Copyright (c) 2020, hettie-d All rights reserved.
Load sample data with the Neon CLI
You can load data with the Neon CLI by passing the --psql
option, which calls the psql
command line utility.
The Neon CLI and psql
must be installed on your system. For installation instructions, see:
If you have multiple Neon projects or branches, we recommend setting your Neon CLI project and branch context so that you don't have to specify them explicitly when running a Neon CLI command. See Neon CLI commands — set-context.
To load sample data:
-
Download one of the data files listed above. For example:
Alternatively, supply your own data file.
-
Load the data using one of the following Neon CLI commands (projects, branches, or connection-string):
-
Create a new Neon project, connect to it with
psql
, and run the.sql
file. -
Create a branch, connect to it with
psql
, and run the an.sql
file. -
Get a connection string, connect with
psql
, and run the.sql
file.
-
Need help?
Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more detail, see Getting Support.
Last updated on