


#Data #GAN #reading #synthetic

The naive solution of left-joining the customer table with the order table, consolidating all the information into one big table and then synthesising that table, will not work.
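To make the flattening concrete, here is a minimal sketch in plain Python. It assumes two hypothetical toy tables (`customers`, `orders`) and a hand-written `left_join` helper rather than any particular library. The joined table repeats each customer's attributes once per order line, and a row-level synthesiser would treat those repeated rows as unrelated records.

```python
# Hypothetical toy tables: one parent row per customer, one child row per order line.
customers = [
    {"customer_id": 1, "name": "Ada", "city": "London"},
    {"customer_id": 2, "name": "Ben", "city": "Leeds"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "product": "bread"},
    {"order_id": 10, "customer_id": 1, "product": "butter"},
    {"order_id": 11, "customer_id": 2, "product": "milk"},
]

def left_join(left, right, key):
    """Left-join two lists of dicts on `key`; unmatched left rows get None for right-only fields."""
    right_fields = {k for row in right for k in row if k != key}
    joined = []
    for l_row in left:
        matches = [r for r in right if r[key] == l_row[key]]
        if not matches:
            joined.append({**l_row, **{f: None for f in right_fields}})
        for r_row in matches:
            # Customer attributes are copied into every matching order line.
            joined.append({**l_row, **r_row})
    return joined

flat = left_join(customers, orders, "customer_id")
for row in flat:
    print(row)
```

Nothing in the flat table tells a row-level model that the two `Ada` rows describe the same person and the same order.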

The naive approach fails for two reasons:

1. Each synthesised row would be a new customer and a new order. In reality, we have to fix the customer first and only then generate details at the order level; in other words, fix the parent and generate the child. Synthesising the flat table would leave us unable to constrain the number of products in any order, which is a key feature of the data that needs to be preserved.

2. The assumption that rows are independent no longer holds. It is still valid for the customer table, where each customer is independent of the others. However, the order details (products bought) are not independent. Some of these fields may even depend on time and should be treated as a sequence: a given product X depends on which other products are in the basket.
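The dependency in point 2 can be measured directly once the child rows are grouped back into baskets. The sketch below uses hypothetical order lines and hypothetical helper names (`p_product`, `p_given`); it compares how often butter appears in any basket with how often it appears in baskets that already contain bread.

```python
from collections import defaultdict

# Hypothetical child rows from the order-details table: (order_id, product).
order_lines = [
    (1, "bread"), (1, "butter"),
    (2, "bread"), (2, "butter"), (2, "jam"),
    (3, "bread"),
    (4, "milk"),
]

# Group child rows back into baskets: one set of products per order.
baskets = defaultdict(set)
for order_id, product in order_lines:
    baskets[order_id].add(product)

def p_product(product):
    """Share of all baskets that contain `product`."""
    return sum(product in b for b in baskets.values()) / len(baskets)

def p_given(product, other):
    """Share of baskets containing `other` that also contain `product`."""
    with_other = [b for b in baskets.values() if other in b]
    return sum(product in b for b in with_other) / len(with_other)

print(p_product("butter"))                        # 0.5
print(p_given("butter", "bread"))                 # about 0.67
print(sorted(len(b) for b in baskets.values()))   # [1, 1, 2, 3]
```

In this toy data, butter is more likely in a basket that contains bread (2/3) than in a basket chosen at random (1/2), and basket sizes only exist at the order level. A synthesiser that samples order lines independently has no way to reproduce either property.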

To address this particular case, let's consider two levels:

1. Customer level (name, address, etc.), which we call the “parent level”

2. Order details level (products, suppliers, etc.), which we call the “child level”

It is important to preserve this structure in the synthetic version; otherwise there could be misalignments and information leaks, such as orders without customers or customers with unrealistic orders. This can be seen as a particular case of sequential data.
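The “fix the parent, then generate the child” ordering can be sketched as follows. This is not the GAN-based method the source describes: simple resampling of real rows stands in for the generative model at each level, and the data, the `synthesise` function and the choice of one or two orders per customer are all hypothetical. What the sketch does show is the control flow: each synthetic customer is created first, and its orders are generated afterwards using that customer's new key, so no order can exist without a customer.

```python
import random

# Hypothetical real data: a parent table and a pool of observed baskets (child rows grouped by order).
real_customers = [
    {"customer_id": 1, "city": "London"},
    {"customer_id": 2, "city": "Leeds"},
    {"customer_id": 3, "city": "London"},
]
real_baskets = [["bread", "butter"], ["milk"], ["bread", "butter", "jam"], ["tea"]]

def synthesise(n_customers, seed=0):
    """Generate each parent first, then its children keyed to the parent's new ID.

    Resampling real rows stands in for a generative model at both levels.
    """
    rng = random.Random(seed)
    customers, order_lines = [], []
    next_order_id = 1
    for new_id in range(1, n_customers + 1):
        # Parent level: sample customer attributes and assign a fresh synthetic key.
        customers.append(dict(rng.choice(real_customers), customer_id=new_id))
        # Child level: a hypothetical 1-2 orders per customer, each a whole sampled basket.
        for _ in range(rng.randint(1, 2)):
            for product in rng.choice(real_baskets):
                order_lines.append(
                    {"order_id": next_order_id, "customer_id": new_id, "product": product}
                )
            next_order_id += 1
    return customers, order_lines

customers, order_lines = synthesise(5)
parent_ids = {c["customer_id"] for c in customers}
# Referential integrity: every child row points at an existing parent.
assert all(line["customer_id"] in parent_ids for line in order_lines)
```

Sampling whole baskets, rather than individual order lines, is what keeps basket sizes and product combinations realistic in this sketch; a learned child-level model would condition on the parent's attributes instead of ignoring them.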

Some other examples where sequential data is common include:

- Electronic health records (EHR): diagnostics, exams
- Messages sent and received between two or more agents
- Measurements of physical systems taken over time
- Credit card transactions
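For event data such as card transactions, preparing the data for a sequence model usually starts by grouping rows per entity and ordering them in time. A minimal sketch, assuming hypothetical `(card_id, timestamp, amount)` tuples and a hypothetical `to_sequences` helper:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical credit card transactions: (card_id, ISO timestamp, amount).
transactions = [
    ("card_A", "2023-01-03T09:15", 12.50),
    ("card_B", "2023-01-01T18:40", 80.00),
    ("card_A", "2023-01-01T08:05", 3.20),
    ("card_A", "2023-01-02T12:30", 45.00),
]

def to_sequences(rows):
    """Group rows by card and order each group by time, giving one sequence per card."""
    groups = defaultdict(list)
    for card_id, ts, amount in rows:
        groups[card_id].append((datetime.fromisoformat(ts), amount))
    return {
        card: [amount for _, amount in sorted(events, key=lambda e: e[0])]
        for card, events in groups.items()
    }

print(to_sequences(transactions))
# {'card_A': [3.2, 45.0, 12.5], 'card_B': [80.0]}
```

Each card's history becomes one ordered sequence, so a model sees the transactions of a card together instead of as independent rows.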

Note that the finest grain may not always be a sequence, but the key insight is that the data has a structure that must be preserved: rows are not independent.

Source: Generating synthetic data with referential integrity using GANs (Hazy, PDF), p. 3

