# # randomly sample items from a set, with or without replacement # set UNIVERSE ordered; # set of items to sample -- must be ordered param ssize integer > 0; # sample size desired param sample {1..ssize} symbolic; # will hold generated sample data; # data below is just for demonstration purposes set UNIVERSE := 'cat', 'dog', 'bird', 'fish', 'turtle', 'snake', 'frog', 'rock'; param ssize := 4; model; # back to the code let ssize := min(ssize, card(UNIVERSE)); # omit if sampling with replacement param item symbolic; # dummy parameter to hold each selection for {i in 1..ssize} { let item := member(ceil(Uniform(0, card(UNIVERSE))), UNIVERSE); let sample[i] := item; let UNIVERSE := UNIVERSE diff {item}; # omit if sampling with replacement } display sample;
I might mention a few things about the code:
- I used symbolic data in the example. If you are sampling from a set of numbers, you can omit the two instances of the keyword symbolic.
- If you want to sample with replacement, there are two lines (marked by comments) that you need to omit.
- The code puts the sample in a (vector) parameter. If you are sampling without replacement, an alternative is to put the code into a set. Define the set with a default value of {}, and use the union operator with second argument {item} to add the item to the sample set inside the loop.
Hey Rubin,
ReplyDeletedo you have an idea how to adjust your code, if the set members have different probabilities?
Thank you in advance,
Marc
I can outline one approach. (I'm not sure it's the most efficient, but if the sample is not very large, it should be good enough.)
Delete1. Create a vector of partial sums of the probabilities, in the same order that the entities appear in the UNIVERSE set. If the probabilities are $p_1, \dots, p_n$, the new vector will be $s = (p_1, p_1 + p_2, \dots, p_1 + \dots + p_{n-1}, 1)$.
2. For each observation in the sample:
2a. Generate a pseudorandom number $u\in (0,1)$.
2b. Find the smallest index $i$ such that $u \le s_i$.
2c. Select member($i$, UNIVERSE) as the sampled item.
2d. If sampling without replacement:
2d1: Remove item $i$ from UNIVERSE.
2d2: Recalculate the probabilities of the remaining items, changing $p_j$ to $p_j / (1-p_i)$ for $j \neq i$.
2d3: Regenerate the partial sum vector $s$ (now shorter by one entry).
I like your approach.
DeleteThank you very much,
Marc