Contoso Data Generator


The Contoso Data Generator is a tool that generates sample data for the Contoso data model, including randomly generated orders, to be used as demo data. The generated data is ready to be imported into Power BI, Fabric, and other platforms.

To consume the data, you can simply download one of the ready-to-use datasets generated by the tool.

If you want to create a customized set of data, you can get the Contoso Data Generator on GitHub. It is a C# program that generates the data, plus additional scripts that simplify running it, importing the data into SQL Server, and so on.

Supported output formats:

  • Parquet
  • Delta Table (files)
  • CSV
  • CSV multi file
  • CSV multi file, gz compressed
  • SQL Server, via a bulk-insert script for the generated CSV files
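
If you consume the "CSV multi file, gz compressed" output outside the supported platforms, the parts can be streamed back as rows without decompressing them to disk first. This is a minimal Python sketch using only the standard library. It assumes that each compressed part carries its own header row, that the files are UTF-8, and that they match a `*.csv.gz` pattern; these are assumptions, not documented behavior of the generator, and `read_gz_csv_parts` is a hypothetical helper.

```python
import csv
import glob
import gzip
import os
from typing import Iterator


def read_gz_csv_parts(folder: str, pattern: str = "*.csv.gz") -> Iterator[dict]:
    """Yield rows from every gzip-compressed CSV part in `folder`, in file-name order.

    Assumption: each part starts with its own header row and is UTF-8 encoded.
    """
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        # Text mode ("rt") lets the csv module parse the decompressed stream directly.
        with gzip.open(path, mode="rt", newline="", encoding="utf-8") as f:
            yield from csv.DictReader(f)
```

Because the function is a generator, only one part is open at a time, which keeps memory usage low even for large generated datasets.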


Delta Table output can be used directly in a Fabric Lakehouse without any conversion.



Usage

DataGenerator runs from the command line and requires four mandatory elements:

  • a configuration file (json)
  • a data file (excel)
  • an output folder
  • a cache folder
  • [optional parameters], written as param:NAME=value
databasegenerator.exe  configfile  datafile  outputfolder  cachefolder   [param:AAAAA=nnnn] [param:BBBBB=mmmm]

Example:

databasegenerator.exe  c:\temp\config.json  c:\temp\data.xlsx  c:\temp\OUT\  c:\temp\CACHE\
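
The argument order above (four positional elements, then any number of param: overrides) can be captured in a small wrapper when you drive the tool from your own scripts. This is a minimal Python sketch; `build_command` is a hypothetical helper, and AAAAA/BBBBB are the same placeholder parameter names used in the syntax line, not real parameters of the tool.

```python
from typing import List, Mapping, Optional


def build_command(
    config_file: str,
    data_file: str,
    output_folder: str,
    cache_folder: str,
    params: Optional[Mapping[str, object]] = None,
    exe: str = "databasegenerator.exe",
) -> List[str]:
    """Return the argument list for the generator.

    The four mandatory elements come first, in the documented order, followed
    by each optional parameter written as param:NAME=value.
    """
    args = [exe, config_file, data_file, output_folder, cache_folder]
    for name, value in (params or {}).items():
        args.append(f"param:{name}={value}")
    return args


if __name__ == "__main__":
    # Mirrors the example above, plus two placeholder overrides.
    cmd = build_command(
        "c:\\temp\\config.json",
        "c:\\temp\\data.xlsx",
        "c:\\temp\\OUT\\",
        "c:\\temp\\CACHE\\",
        {"AAAAA": 1000, "BBBBB": 2000},
    )
    print(" ".join(cmd))
```

Returning a list rather than a single string lets you pass it straight to `subprocess.run(cmd, check=True)` without worrying about quoting paths that contain spaces.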

To simplify running the tool, a set of scripts is available.

Note: The tool needs some files containing static data, such as fake customers, exchange rates, and postal codes. These files are downloaded over the Internet from a specific GitHub repository and then cached.
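
The caching behavior described in the note follows a download-once, reuse-afterwards pattern. This is a minimal Python sketch of that pattern, assuming the static files are stored by name inside the cache folder. `get_cached_file` and the `download` callable are hypothetical; the callable is injected so that no repository URL is hard-coded, and the tool's actual C# implementation may differ.

```python
import os
from typing import Callable


def get_cached_file(cache_folder: str, file_name: str,
                    download: Callable[[str], bytes]) -> str:
    """Return the local path of `file_name`, calling `download` only on a cache miss."""
    path = os.path.join(cache_folder, file_name)
    if not os.path.exists(path):
        os.makedirs(cache_folder, exist_ok=True)
        data = download(file_name)
        # Write to a temporary name, then rename, so an interrupted download
        # never leaves a truncated file that would later look like a cache hit.
        tmp_path = path + ".part"
        with open(tmp_path, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
    return path
```

On the second and later runs the files are found in the cache folder, so no network access is needed.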

Previous versions

The current version of this tool is an evolution of the first Contoso Data Generator, which is still available on GitHub.

Last update: Jul 22, 2024