How to Use DTM Data Generator for JSON — Step-by-Step Guide
This guide shows a concise, practical workflow to create realistic JSON test data using DTM Data Generator. Assumptions: you have DTM Data Generator installed (or access to the web/CLI tool) and a basic understanding of JSON structure. If you need installation steps, tell me and I’ll add them.
1. Define your JSON schema
- Identify fields, types, required vs optional, and example values.
- Map nested objects and arrays.
- Decide cardinality (number of records) and variability (uniqueness, ranges).
Example schema (conceptual):
- id: integer
- name: string
- email: string (unique)
- created_at: datetime (ISO 8601)
- address: object { street, city, postal_code }
- tags: array of strings
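A single record conforming to this conceptual schema might look like the following (all values are illustrative):

```json
{
  "id": 1,
  "name": "Alice Example",
  "email": "alice@example.com",
  "created_at": "2021-06-15T09:30:00Z",
  "address": {
    "street": "12 Sample Street",
    "city": "Springfield",
    "postal_code": "12345"
  },
  "tags": ["beta", "newsletter"]
}
```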
2. Create a DTM profile/template
- Open DTM’s UI or create a template file for the CLI.
- For each field, select a generator type:
  - integer: sequential or random range
  - string: names, Lorem, custom pattern
  - email: email generator with domain options
  - datetime: range and format (ISO 8601)
  - object: nested template referencing subfields
  - array: set length or variable length with item template
- Mark fields as required/nullable and set uniqueness constraints for keys like email or id.
Example (pseudoconfig):
- id: type=sequence start=1
- name: type=name
- email: type=email unique=true
- created_at: type=datetime start=2020-01-01 end=now format=iso
- address: type=object { street:type=street, city:type=city, postal_code:type=postcode }
- tags: type=array min=0 max=5 item=type=word
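DTM's actual template syntax differs between the GUI and CLI, so the pseudoconfig above is deliberately tool-neutral. As a reference for what each line should produce, here is a standard-library Python sketch (the name/city/word pools are made up) that emits equivalent records:

```python
import json
import random
from datetime import datetime, timedelta, timezone

NAMES = ["Alice", "Bob", "Carol", "Dan"]           # illustrative pools
CITIES = ["Springfield", "Riverton", "Lakeside"]
WORDS = ["alpha", "beta", "gamma", "delta", "epsilon"]

def make_record(seq_id: int) -> dict:
    """Build one record matching the conceptual schema above."""
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    span = datetime.now(timezone.utc) - start
    created = start + timedelta(seconds=random.uniform(0, span.total_seconds()))
    return {
        "id": seq_id,                                  # type=sequence start=1
        "name": random.choice(NAMES),                  # type=name
        "email": f"user{seq_id}@example.com",          # unique via the sequence
        "created_at": created.isoformat(),             # ISO 8601
        "address": {                                   # nested object template
            "street": f"{random.randint(1, 99)} Main St",
            "city": random.choice(CITIES),
            "postal_code": f"{random.randint(10000, 99999)}",
        },
        "tags": random.sample(WORDS, k=random.randint(0, 5)),  # min=0 max=5
    }

records = [make_record(i) for i in range(1, 11)]
print(json.dumps(records[0], indent=2))
```

Deriving the email from the sequence id is one simple way to satisfy the uniqueness constraint without tracking a pool of used values.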
3. Configure output format and options
- Choose JSON output.
- Select output style:
  - NDJSON (newline-delimited JSON) for streaming/line-based ingestion.
  - JSON array for single-file loads.
- Set pretty-print vs compact output.
- Configure file naming, compression (gzip), and destination folder.
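The difference between the two output styles, plus gzip compression, can be shown in a short Python sketch (the `records` list stands in for whatever DTM generates, and the output file name is an assumption):

```python
import gzip
import json

records = [{"id": i, "name": f"user{i}"} for i in range(1, 4)]  # stand-in data

# JSON array: one document, convenient for single-file batch loads.
array_text = json.dumps(records, indent=2)

# NDJSON: one compact object per line, convenient for streaming ingestion.
ndjson_text = "\n".join(json.dumps(r) for r in records) + "\n"

# Optional gzip compression of the NDJSON output (hypothetical file name).
with gzip.open("sample.ndjson.gz", "wt", encoding="utf-8") as fh:
    fh.write(ndjson_text)
```

Note that an NDJSON file is not itself valid JSON as a whole; each line must be parsed individually.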
4. Specify record count and performance settings
- Set total records (e.g., 10,000).
- Configure concurrency/threads if supported to speed generation.
- Adjust memory or batch sizes to balance speed and resource use.
5. Run a small test
- Generate a small sample (e.g., 10–100 records).
- Validate JSON correctness with a linter or by loading into your target system.
- Check uniqueness constraints, date ranges, and nested structures.
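These checks are easy to script against a small sample. A minimal Python sketch, assuming the schema from step 1 and naive (timezone-free) ISO timestamps, that validates parseability, email uniqueness, the date range, and the nested address:

```python
import json
from datetime import datetime

def validate_ndjson_lines(lines):
    """Parse each NDJSON line and check constraints from the example schema."""
    emails = set()
    count = 0
    for lineno, line in enumerate(lines, start=1):
        rec = json.loads(line)                         # raises on invalid JSON
        if rec["email"] in emails:
            raise ValueError(f"duplicate email on line {lineno}")
        emails.add(rec["email"])
        created = datetime.fromisoformat(rec["created_at"])
        if created < datetime(2020, 1, 1):
            raise ValueError(f"created_at out of range on line {lineno}")
        if not {"street", "city", "postal_code"} <= rec["address"].keys():
            raise ValueError(f"missing address field on line {lineno}")
        count += 1
    return count

sample = [
    '{"email": "a@example.com", "created_at": "2021-03-01T00:00:00", '
    '"address": {"street": "1 A St", "city": "X", "postal_code": "11111"}}',
]
checked = validate_ndjson_lines(sample)
```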
6. Iterate on data realism
- Tune distributions (e.g., age skew, probability of nulls).
- Add realistic constraints (country-specific postal codes, locale for names).
- Include edge cases: very long strings, special characters, missing fields.
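Skews and null probabilities reduce to simple sampling rules. A minimal Python sketch of the idea (the 10% null rate and the age distribution parameters are arbitrary examples):

```python
import random

random.seed(42)  # fixed seed for reproducible test data

def maybe_null(value, p_null: float = 0.1):
    """Return None with probability p_null, otherwise the value unchanged."""
    return None if random.random() < p_null else value

def skewed_age() -> int:
    """Ages clustered around 30, clamped to a plausible adult range."""
    return min(max(int(random.gauss(30, 12)), 18), 90)

ages = [skewed_age() for _ in range(1000)]
```

Fixing the seed is worth doing in any realism tuning loop: it makes two generation runs comparable, so you can see the effect of a distribution change rather than sampling noise.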
7. Generate full dataset
- Run the full generation job using finalized template and output settings.
- Monitor job progress and resource usage.
- Verify the final file's integrity (valid JSON, correct record count).
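For large gzipped NDJSON outputs, integrity and record count can be verified by streaming line by line, without loading the whole file into memory. A sketch (the file name is hypothetical):

```python
import gzip
import json

def count_valid_records(path: str) -> int:
    """Stream a gzipped NDJSON file, parsing every line; return the record count."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():                 # tolerate a trailing newline
                json.loads(line)             # raises if any line is invalid JSON
                count += 1
    return count

# Demo against a tiny file standing in for a real generation run.
with gzip.open("dataset.ndjson.gz", "wt", encoding="utf-8") as fh:
    fh.write('{"id": 1}\n{"id": 2}\n')
total = count_valid_records("dataset.ndjson.gz")
```

A JSON-array output cannot be checked line by line the same way; for very large array files, use a streaming parser or simply compare the parsed array's length to the expected record count.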
8. Integration and consumption
- Import NDJSON into databases like Elasticsearch, MongoDB, or data pipelines.
- Use JSON array files for batch loads into relational databases after transformation.
- Automate generation in CI pipelines for repeatable tests.
9. Maintain templates and versioning
- Store templates alongside tests in version control.
- Document template purpose, schema versions, and generation parameters.
- Reuse and parameterize templates for different environments (dev/staging).
Troubleshooting (brief)
- Invalid JSON: check nested object templates and commas; run a linter.
- Duplicate keys despite uniqueness setting: ensure seed or uniqueness pool is large enough.
- Performance issues: reduce batch size or increase threads; generate compressed output.
If you want, I can:
- produce a ready-to-run DTM template file for the example schema above,
- show CLI commands for NDJSON vs array output,
- or generate sample JSON output for verification. Which would you like?