Cryso Agori
Lol, twitter artists tryna say gotcha with this paper when
4.2.1 Extraction Methodology

Our extraction approach adapts the methodology from prior work [11] to images and consists of two steps:
1. Generate many examples using the diffusion model in the standard sampling manner and with the known prompts from the prior section.
2. Perform membership inference to separate the model's novel generations from those that reproduce memorized training examples.
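The two-step attack can be sketched as follows. This is an illustrative simplification, not the paper's actual pipeline: `generate` is a stand-in for black-box access to the model, images are flat lists of pixel values in [0, 1], and the clustering heuristic (tightly clustered generations suggest a memorized image) stands in for the full membership-inference procedure.

```python
from itertools import combinations

def l2(a, b):
    # Normalized l2 distance between two flat pixel vectors in [0, 1].
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def mean_pairwise_l2(images):
    # Mean distance over all pairs in a set ("clique") of generations.
    pairs = list(combinations(images, 2))
    return sum(l2(a, b) for a, b in pairs) / len(pairs)

def extraction_attack(prompts, generate, n_samples=500, threshold=0.15):
    # Step 1: query the model many times per prompt (black-box sampling).
    # Step 2: flag prompts whose generations cluster tightly, since
    # near-identical samples suggest a memorized training example.
    suspects = []
    for prompt in prompts:
        images = [generate(prompt) for _ in range(n_samples)]
        score = mean_pairwise_l2(images)
        if score < threshold:
            suspects.append((prompt, score))
    return suspects
```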
Generating many images. The first step is trivial but computationally expensive: we query the Gen function in a black-box manner using the selected prompts as input. To reduce the computational overhead of our experiments, we use the timestep-resampled generation implementation available in the Stable Diffusion codebase [58]. This process generates images more aggressively by removing larger amounts of noise at each time step, trading slightly lower visual fidelity for a significant (∼10×) speedup. We generate 500 candidate images for each text prompt to increase the likelihood of finding memorization.
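The core idea behind timestep resampling is to keep only an evenly spaced subset of the full denoising schedule, so each retained step removes a larger chunk of noise. A minimal sketch of that idea (a hypothetical helper, not the actual Stable Diffusion implementation):

```python
def resample_timesteps(schedule, n_steps):
    """Evenly subsample a full denoising schedule down to n_steps.

    A 1000-step schedule cut to 100 steps yields roughly a 10x speedup,
    at the cost of some visual fidelity, because each retained step must
    remove ~10 steps' worth of noise.
    """
    stride = len(schedule) / n_steps
    return [schedule[int(i * stride)] for i in range(n_steps)]
```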
In order to evaluate the effectiveness of our attack, we select the 350,000 most-duplicated examples from the training dataset and generate 500 candidate images for each of these prompts (totaling 175 million generated images). We first sort these generated images by the mean distance between images in the clique to identify generations that we predict are likely to be memorized training data. We then annotate each of these generated images as either "extracted" or "not extracted" by comparing it to the training images under Definition 1. We find 94 images are (ℓ2, 0.15)-extracted. To ensure that these images not only match some arbitrary definition, we also manually annotate the top-1000 generated images as either memorized or not memorized by visual analysis, and find that a further 13 (for a total of 109 images) are near-copies of training examples even if they do not fit our ℓ2 definition. Figure 3 shows a subset of the extracted images that are reproduced with near pixel-perfect accuracy; all have an ℓ2 difference under 0.05. (As a point of reference, re-encoding a PNG as a JPEG with quality level 50 results in an ℓ2 difference of 0.02 on average.)

Given our ordered set of annotated images, we can also compute a curve relating the number of extracted images to the attack's false positive rate. Our attack is exceptionally precise: out of 175 million generated images, we can identify 50 memorized images with 0 false positives, and all our memorized images can be extracted with a precision above 50%. Figure 4 contains the precision-recall curve for both memorization definitions.

Measuring (k, ℓ, δ)-eidetic memorization. In Definition 2 we introduced an adaptation of Eidetic memorization [11] tailored to the domain of generative image models. As mentioned earlier, we compute similarity between pairs of images with a direct ℓ2 pixel-space distance.
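A Definition-1-style extraction check can be sketched as a nearest-neighbor test: a candidate counts as extracted if it lies within normalized ℓ2 distance δ of at least one training image. The function name and flat-vector image representation below are illustrative assumptions, not the paper's code.

```python
def l2(a, b):
    # Normalized l2 distance between two flat pixel vectors in [0, 1].
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def is_extracted(candidate, training_images, delta=0.15):
    # (l2, delta)-extraction check: the candidate is "extracted" if it
    # matches some training image to within distance delta.
    return any(l2(candidate, t) <= delta for t in training_images)
```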
This analysis is computationally expensive, as it requires comparing each of our memorized images against each of the 160 million training examples. We set δ = 0.1: this threshold is tight enough to identify almost all small image corruptions (e.g., JPEG compression, small brightness/contrast adjustments) while producing very few false positives. Figure 5 shows the results of this analysis. While we identify little Eidetic memorization for k < 100, this is expected given that we chose prompts of highly-duplicated images. Note that even at this level of duplication, the duplicated examples still make up just one in a million training examples. These results show that duplication is a major factor behind training data extraction.
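Estimating k for this analysis amounts to counting how many training examples fall within δ of the extracted image, so that JPEG-level corruptions of the same picture are merged while distinct pictures are not. A minimal sketch under the same assumptions as before (flat pixel vectors, hypothetical helper names):

```python
def l2(a, b):
    # Normalized l2 distance between two flat pixel vectors in [0, 1].
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def duplication_count(image, training_images, delta=0.1):
    # k for the (k, l2, delta)-eidetic analysis: the number of training
    # examples within delta of the extracted image. Small corruptions
    # (~0.02 for JPEG re-encoding) pass; distinct images do not.
    return sum(1 for t in training_images if l2(image, t) <= delta)
```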