Random Numbers
Monte Carlo methods in physics
Tadeusz Lesiak

Monte Carlo Integration: Random Number Generators

Random Numbers
"There is no such thing as a random number - there are only methods to produce random numbers."
"Anyone who considers arithmetic methods of producing random digits is, of course, in a state of sin."
John von Neumann

Random Numbers
A random number: a particular, single value of a random variable.
Two basic categories of random numbers:
• Truly random (TRN),
• Pseudo-random (PRN).
Desired features of random number generators:
• Randomness
• Reproducibility
• Long period (length of the sequence before repetition)
• Generation time
• Computer memory (now a rather obsolete concern)

Truly Random Numbers (TRN)
TRN:
- are governed by chance; their values cannot be predicted,
- any sequence of such numbers is irreproducible,
- can only be generated by random physical processes; physical generators are the sources of TRN.
Two kinds of physical generators:
• Mechanical (Example 1), e.g. roulette, dice, coin toss, lotto,
• Exploiting directly some physical process (Example 2), such as radioactive decay or the white noise of electronics.
Example 2: the number of disintegrations of a radioactive material over a given time interval is counted; an odd count gives a 0 bit, an even count gives a 1 bit (a 31-bit number is formed). Typical speed: 6000 random numbers/hour.
Drawbacks of physical generators:
- slow,
- problems with stability: small changes in the environment can cause serious variations in the probabilistic properties of the sequences, hence the need for additional tests and corrections.
Note: white & black noise by G. Marsaglia (1995), 650 MB of random numbers.

WWW.RANDOM.ORG

Pseudo Random Numbers (PRN)
PRN:
- are generated according to a given, strict mathematical formula,
- are reproducible (not truly random),
- have the appearance of randomness (their statistical behavior is very close to that of TRN).
Mathematical generators are the sources of PRN.
Advantages of mathematical generators:
• easy to use (fast, simple, convenient),
• the generated numbers possess good statistical properties.
Compared with physical generators, mathematical generators flourish and dominate.
"Random numbers" = pseudo random numbers.
"Random number generator (RNG)" = a mathematical algorithm for the generation of PRN.

Pseudorandom Variables
Why are pseudorandom variables used? An absolute requirement in debugging a computer program is the ability to repeat a particular run of the program. If real random numbers were used, an identical calculation could not be repeated and the recurrence of an error would be left to chance.

Pseudo Random Numbers (PRN)
General scheme of an RNG:
1. Take starting values (initial constants, seeds).
2. Use a strict formula that gives the n-th number on the basis of the previous (n-1) ones.
Output: integer numbers, i.e. sequences of random bits. They are then converted to floating point numbers uniformly distributed on [0,1].
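The conversion from random bits to a floating point number mentioned above can be sketched as follows. This is only a minimal illustration, assuming the bits are read as a binary fraction, $U = \sum_{j=1}^{L} b_j 2^{-j}$; the function name `bits_to_uniform` is hypothetical and not part of the lecture.

```python
def bits_to_uniform(bits):
    """Read the bits b_1..b_L as the binary fraction 0.b_1 b_2 ... b_L."""
    u = 0.0
    for j, b in enumerate(bits, start=1):
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        u += b * 2.0 ** (-j)  # bit j carries the weight 2^-j
    return u


# Example: the bits 1, 0, 1 give 1/2 + 1/8.
print(bits_to_uniform([1, 0, 1]))  # 0.625
```

With L bits the result lies between 0 and $1 - 2^{-L}$, so under this convention the value 1 itself is never produced.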
How to get random numbers $U_i$ uniform in the range [0,1] from the bit sequence $b_{i1}, b_{i2}, \ldots, b_{iL}$? A standard answer is to read the bits as a binary fraction: $U_i = \sum_{j=1}^{L} b_{ij}\, 2^{-j}$.

Pseudo Random Numbers (PRN)
A period of an RNG:
Let us define a sequence of random numbers $x_0, x_1, x_2, \ldots$
P (a certain positive integer) is a period of the generator if $x_{n+P} = x_n$.
In principle the period can be evaluated theoretically (often with much difficulty).
General requirement for the period of a good RNG: $P \gg N$, where N is the number of random numbers used in a given MC calculation.
Typical practical requirements: [formula not preserved in the transcript]
Example: RNG Mersenne Twister (Matsumoto & Nishimura, 1998), with period $2^{19937} - 1$.

Your First Own Generator
Assume any real number $x_0$ from (0,1).
Assume any big real number $R$.
Repeat: $x_{n+1} = \mathrm{frac}(R\, x_n)$,
where frac is the fractional part of a real number, $\mathrm{frac}(z) = z - [z]$ ($[z]$ is the integer part of the number; frac is also called the mantissa).

Exercise: Your own generator of uniformly distributed random numbers
Please implement two simple generators of the uniform distribution on the range [0,1] and plot their distributions:
A) R = 34123412341341.43142341341, x0 = 0.123125412351234; GenUniRNG00.C (½ × 2 p.)
B) GenUni*.C, * = 1,2,3,4,5,12 (½ × 2 p.)

Is the Generator Good or Bad?
"Random number generator should not be chosen at random" Donald Knuth
"A good generator" provides sequences of numbers that have the properties of truly random numbers.
But how to make sure that the numbers are "truly random"?
Practical (traditional) realization:
- Define the features of uniform random numbers and check whether the sequence (the output of the generator) possesses these properties.
- BUT: an infinite number of features can be invoked, hence an infinite number of tests; even if some finite number of tests has been passed, the next one could disqualify the generator.
We can show that a generator is bad, but we cannot prove that it is good.
Examples:
- the DIEHARD battery of tests (Marsaglia; http://stat.fsu.edu/~geo/diehard.html),
- D.E. Knuth, "The Art of Computer Programming", vol. 2.

Is the Generator Good or Bad?
[Figure: output of a simple generator compared with RANLUX]

Is the Generator Good or Bad?
http://demonstrations.wolfram.com/RandomNumberGeneration//
http://demonstrations.wolfram.com/MersenneTwisterAndFriends/
http://demonstrations.wolfram.com/PoorStatisticalQualitiesForTheRANDURandomNumberGenerator/

RNGs - Uniform Distribution
• Linear generators (congruential generators) (LG)
• Shift register generators (SRG)
• Fibonacci sequence based generators (FSBG)
• Subtract-with-borrow generators (SWBG)
• Multiply-with-carry generators (MWCG)
• Non-linear generators (NLG)
• Combined generators (CG)

Linear Generators (LG)
The general formula:
$x_n = (a_1 x_{n-1} + a_2 x_{n-2} + \ldots + a_k x_{n-k} + b) \bmod m$
- $a_1, \ldots, a_k, b, m$: parameters of the generator (fixed, non-negative integers),
- Q mod P: the integer modulo operation of Q over P,
- $b = 0$: multiplicative generator,
- $b \neq 0$: mixed generator.
Usually k = 1 suffices: $x_n = (a\, x_{n-1} + b) \bmod m$.
Choosing a prime number as m improves the statistical properties of the generator. In practice m is tied to IMAX, the largest integer representable on a computer (32-bit machine: IMAX = $2^{32} - 1$).
The method originally gives a random number x from the range [0, m-1]. Conversion to [0,1]: $y = x / (m-1)$.
The maximum period can be reached only for an appropriate choice of parameters.

Linear Generators (LG): Simple example
The largest number which can be generated: $2^4 - 1$.
The period: $2^4$.
It is instructive to plot the "random number cycle": contours give the random number value, azimuthal indices give the order of each random number within the sequence (here for seed $x_0 = 1$).
[Figure: random number cycle, panels (a) and (b), a = 5]

Linear Generators (LG): Simple example cont.
[Figure: random number cycles for a = 5, 9, 13 (period = 16) and a = 14 (period = 4)]
Doesn't it look ugly?
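The small example can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions: the modulus m = 16 and the seed 1 follow the slides, while the increment b = 1 is an assumption (it is not given in the transcript) and the function name `lcg_sequence` is hypothetical.

```python
def lcg_sequence(a, b, m, seed):
    """Values x_0, x_1, ... of x_{n+1} = (a*x_n + b) mod m, up to the first repeat."""
    seen = set()
    values = []
    x = seed % m
    while x not in seen:
        seen.add(x)
        values.append(x)
        x = (a * x + b) % m
    return values


M = 16    # m = 2**4, as in the simple example
B = 1     # assumed increment (not given in the transcript)
SEED = 1  # seed quoted on the slide

for a in (5, 9, 13, 14):
    seq = lcg_sequence(a, B, M, SEED)
    print(f"a={a:2d}: {len(seq):2d} distinct values: {seq}")
```

Under these assumptions a = 5, 9 and 13 visit all 16 values before repeating, whereas a = 14 produces only the four values 1, 15, 3, 11 and then stays at 11. This agrees with the numbers quoted above if "period = 4" is read as the number of distinct values generated.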
Linear Generators (LG)
Theorem on linear generators (Hull, Dobell, 1962):
A full period (P = m) for the generator $x_n = (a\, x_{n-1} + b) \bmod m$ is achieved if and only if the three parameters have the following properties:
1. b and m are relatively prime,
2. a - 1 is divisible by every prime factor of m,
3. a - 1 is divisible by 4 if m is divisible by 4.

Linear Generators (LG)
Author/Year            a                  b    m
RANDU                  2^16 + 3           0    2^31
Zieliński (1966)       4 * 237 + 1        0    2^35    (exponent placement in a unclear in the source)
Marsaglia (1972)       69069              1    2^32
Park, Miller (1980)    16807              0    2^31 - 1
L'Ecuyer (1988)        40692              0    2^31 - 249
Fishman (1990)         68909602460261     0    2^48
Note: [formula not preserved in the transcript] is not sufficient for present MC calculations.
For the multidimensional case (generation of a vector of random numbers):
$\mathbf{x}_{i+1} = A\, \mathbf{x}_i \bmod m$, where A is an n × n matrix.
Depending on the choice of A, some correlations between the components of the vector x are possible.

Linear Generators (LG)
Main drawback: the Marsaglia effect. The distribution of points is not fully uniform; the points lie on certain regular hyperplanes.
Let us define $y_i$ with uniform distribution on [0,1]. Points whose coordinates are N consecutive numbers $y_i$ are located on certain equidistant hyperplanes inside the hypercube $[0,1]^N$.

Shift Register Generators (SRG)
The general formula:
$b_n = (a_1 b_{n-1} + a_2 b_{n-2} + \ldots + a_k b_{n-k}) \bmod 2$
- $b_n$: bits,
- $a_i$: binary constants.
The implementation is straightforward since $(a + b) \bmod 2 = a \;\mathrm{xor}\; b$ (the symmetric difference of a and b):
a    b    a xor b
0    0    0
0    1    1
1    0    1
1    1    0
Bits are converted to digits.
The maximum period: $2^k - 1$.
Drawback: these generators do not pass modern statistical tests.

Fibonacci Sequence Based Generators (FSBG)
Fibonacci sequence: $f_n = f_{n-1} + f_{n-2}$ (Liber Abaci, Leonardo di Pisa, 1202).
The first FSBG: $x_n = (x_{n-1} + x_{n-2}) \bmod m$; it fails independence tests.
Generalization (lagged generators): $x_n = (x_{n-r} \diamond x_{n-s}) \bmod m$, with $r > s \geq 1$ and $\diamond$ one of +, -, *, xor; easy to implement.
The maximum period (for $m = 2^L$):
Operation    P_max                   Stat. properties
+, -         (2^r - 1) * 2^(L-1)     good
*            (2^r - 1) * 2^(L-3)     very good
xor          2^r - 1                 fair
Some values of r and s leading to the maximum period:
r    17    31    55    68    97    607    1279
s     5    13    24    33    33    273     418
The best statistical properties are obtained for multiplication.

Exercise: Fibonacci generator of uniformly distributed random numbers
Please implement a Fibonacci generator of the uniform distribution on the range [0,1] and plot its distribution: GenUniFib.C (2 p.)

Subtract-with-borrow Generators (SWBG)
Marsaglia & Zaman (1991)
c: the carry (borrow) bit, $c \in \{0, 1\}$.
The operation of subtraction-with-borrow:
$x_n = (x_{n-s} - x_{n-r} - c_{n-1}) \bmod m$, with $c_n = 1$ if $x_{n-s} - x_{n-r} - c_{n-1} < 0$ and $c_n = 0$ otherwise.
The scheme of generation:
Initialisation: a sequence of integers from the interval (0,m) and c = 0.
The algorithm is simple and fast but does not pass all the tests.
Example: RCARRY: r = 24, s = 10, m = $2^{24}$.

Multiply-with-carry Generators (MWCG)
The generation scheme, in its basic lag-r form:
$x_n = (a\, x_{n-r} + c_{n-1}) \bmod m$, $c_n = \lfloor (a\, x_{n-r} + c_{n-1}) / m \rfloor$
- a, r, m: fixed parameters (natural numbers),
- $c_n$: the carry value passed to the next step,
- $\lfloor \cdot \rfloor$: floor (integer part), the largest integer not greater than the result of the operation.
Advantages:
- good statistical properties,
- fast, simple, easy implementation,
- long periods (of the order of $10^{180}$).

Non-linear Generators (NLG)
The drawback of linear generators: clustering of points on some hypersurfaces.
The remedy: use a non-linear generator (since the 1980s); they have very good statistical properties.
A few examples:
- Eichenauer & Lehn: $x_{n+1} = (a\, x_n^{-1} + b) \bmod m$, where m is a prime number and $x^{-1}$ is the multiplicative inverse modulo m. The generator gives a sequence from the range [0, m-1], which is then converted to [0,1).
- Eichenauer-Herrmann: $x_n = (a\,(n + n_0) + b)^{-1} \bmod m$. The value of $x_n$ is obtained independently of the previous numbers.
- L. Blum, M. Blum & Shub: $x_{n+1} = x_n^2 \bmod m$.
Drawback: NLGs are substantially slower than LGs.

Combined Generators (CG)
The combination of generators usually gives better statistical behavior of the generated sequences.
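As a concrete illustration of combining generators, the sketch below implements the Wichmann-Hill scheme, which adds the outputs of three small multiplicative linear generators modulo 1. It is swapped in here for simplicity: it combines linear generators, not the non-linear ones discussed in the lecture, and the class name and default seeds are arbitrary choices.

```python
class WichmannHill:
    """Combined generator: three multiplicative LCGs summed modulo 1."""

    MODULI = (30269, 30307, 30323)   # three distinct primes
    MULTIPLIERS = (171, 172, 170)

    def __init__(self, s1=1, s2=2, s3=3):
        seeds = [s1, s2, s3]
        # A multiplicative generator started at 0 stays at 0 forever.
        for s, m in zip(seeds, self.MODULI):
            if s % m == 0:
                raise ValueError("each seed must be non-zero modulo its modulus")
        self.state = [s % m for s, m in zip(seeds, self.MODULI)]

    def random(self):
        """Advance all three components and return a float in [0, 1)."""
        total = 0.0
        for i, (a, m) in enumerate(zip(self.MULTIPLIERS, self.MODULI)):
            self.state[i] = (a * self.state[i]) % m
            total += self.state[i] / m
        return total % 1.0


gen = WichmannHill()
sample = [gen.random() for _ in range(10000)]
print(min(sample), max(sample), sum(sample) / len(sample))
```

Each component alone has a period of only about 3 × 10^4; the combined sequence has a period of roughly 7 × 10^12, which illustrates why combining even weak generators is attractive.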
Example: Let us take a combination of r > 4 non-linear generators with parameters $m_k$ (k = 1,2,…,r) being prime numbers:
$u_n = \left( \frac{x_n^{(1)}}{m_1} + \frac{x_n^{(2)}}{m_2} + \ldots + \frac{x_n^{(r)}}{m_r} \right) \bmod 1$
where $u_n$ is a random variable with uniform distribution on the range [0,1) and good statistical behavior.
The period: $P = m_1 \cdot m_2 \cdots m_r$.

Is the Generator Good or Bad?
A typical scheme of a test:
1. Generate a sequence of n random numbers.
2. Calculate the value of a chosen test statistic s.
3. Calculate f(s), the value of the cumulative probability distribution of the statistic at s.
4. Repeat steps 1-3 N times.
If the generator is "good", then the sequence $f(s_1), f(s_2), \ldots, f(s_N)$ is a random sequence with uniform distribution on [0,1). This hypothesis can be tested at a given confidence level.

Some Tests of RNGs
Tests of the uniform distribution:
1. $\chi^2$ test
2. Multidimensional tests
3. Kolmogorov test
4. OPSO test (overlapping pairs sparse occupancy)
Statistics distribution tests:
1. $d^2$ test
2. Series test
3. Poker test
4. Test of the smallest distance between pairs

The $\chi^2$ Test
1. Division of the interval [0,1) into k subintervals: $0 = a_0 < a_1 < \ldots < a_k = 1$.
2. Generation of N random numbers.
3. $n_i$: the number of random numbers falling into the i-th subinterval $[a_{i-1}, a_i)$.
4. For the uniform distribution the probability of the i-th subinterval is $p_i = a_i - a_{i-1}$.
5. Definition of the random variable: $\chi^2 = \sum_{i=1}^{k} \frac{(n_i - N p_i)^2}{N p_i}$.
6. Its distribution: chi-square with (k-1) degrees of freedom.
7. Special case of a division into equal subintervals: $p_i = 1/k$ and $\chi^2 = \frac{k}{N} \sum_{i=1}^{k} \left( n_i - \frac{N}{k} \right)^2$.

Birthday Spacing Test
1. We generate a sequence of random numbers $X_1, X_2, \ldots, X_n$.
2. We sort this sequence into non-decreasing order.
3. We form the sequence of spacings, i.e. the differences between consecutive sorted values.
4. We obtain a new random variable Y: the number of spacings that occur more than once in the above sequence.
If the random variables $X_1, X_2, \ldots, X_n$ are independent and uniformly distributed on the set {1, 2, …, m}, then Y approximately follows the Poisson distribution with parameter $\lambda = n^3 / (4m)$.
Fibonacci generators usually fail this test.

Combinatorial Tests (Poker Test as an example)
Their purpose: checking the independence (randomness) of the sample.
Example: the poker test
1. Division of the range of the random variables into k subintervals (for the normal distribution).
2. Generation of N random numbers $X_j$ from the uniform distribution.
3. For this sequence (with an equal division) each subinterval has probability 1/k.
4. We form a sequence of random variables $Y_j$: $Y_j$ is the index of the subinterval containing $X_j$ (for an equal division of [0,1), $Y_j = \lfloor k X_j \rfloor$).
5. The sequence so formed is divided into consecutive groups of five terms ("fives").

Poker Test
Types of fives (in poker language):
Name               Type
Five of a kind     AAAAA
Four of a kind     AAAAB
Full house         AAABB
Three of a kind    AAABC
Two pairs          AABBC
Pair               AABCD
High card          ABCDE
Distribution of a given type, P(name): [formula not preserved in the transcript]
The most common choices: k = 2, 8, 10.
The chi-square test is used to check the agreement of the distribution of the generated fives with the expectations given above.

Coupon Collector's Test
1. Form the sequence $Y_j$ in the same way as for the poker test.
2. Observe the generated sequence until each of the k numbers 0, 1, …, k-1 has occurred in it at least once.
3. R: the number of terms in the sequence so defined.
4. Theoretical distribution of R: [formula not preserved in the transcript]
5. The chi-square test is used to check the agreement between the prediction and the generation.

Collision Test
1. Generation of n × k random numbers.
2. Arranging them into n sequences of k random numbers each.
3. Division of the hypercube $[0,1)^k$ into $m = s^k$ identical k-dimensional cells (the volume of each is 1/m).
4. Computing the number of collisions Z: the number of cases in which a successive point falls into a cell already occupied by at least one other point.
5. Z is the test statistic here. Its theoretical distribution: [formula not preserved in the transcript]
6. The chi-square test is used to check the agreement between the prediction and the generation.

Serial Test via Plotting
Consider several multiplicative generators:
A. Let us change the seed.
B. Let us change the multiplier (a).
C. Effect of variation of the constant (b).
D. Sensitivity of RNG performance to small changes of the multiplier (a).
[Figures: scatter plots for cases A-D]
Consider 3-tuples: sets of three consecutive random numbers (x,y,z).
[Figures: 3-tuple plots for Case 11, Case 18 and Case 20]

Tests by Performing Control Jobs
Exemplary tasks:
1. Evaluation of $\pi$ by hit-or-miss, Buffon's needle, etc.
2. Calculation of integrals known analytically.
3. Calculation of the unit sphere volume in d-dimensional space: $V_d = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}$, where $\Gamma$ is Euler's Gamma function.
4. Estimation of parameters of some physical phenomena with exact solutions.

Integration over a Triangle
Consider the integration of the function g over the two-dimensional region specified as $0 < y < x < 1$.
At least four ways of estimating this integral by MC methods can be singled out:
1. "The obvious way"
a. Choose a random number $x_i$ from the uniform distribution (range [0,1)),
b. Choose another random number $y_i$ between zero and $x_i$,
c. Take the sum of $g(x_i, y_i)$, repeating steps (a) and (b).
The method yields the same number of points along each vertical line, hence an overpopulation (higher density) in the left-side corner; the estimate is biased.
2. The rejection method
a. Choose a random number $x_i$ from the uniform distribution (range [0,1)),
b. Choose another random number $y_i$ from the uniform distribution (range [0,1)),
c. If $y_i > x_i$, reject the event and return to (a),
d. Accumulate the sum of $g(x_i, y_i)$ for the remaining points.
Correct, but slow (it uses only half of the events).
3. The folding method
a. Choose two independent random numbers $r_1$ and $r_2$, each from the uniform distribution (range [0,1)),
b. Set $x_i = \max(r_1, r_2)$, $y_i = \min(r_1, r_2)$,
c. Accumulate the sum of $g(x_i, y_i)$ as before.
Correct and efficient; equivalent to choosing points over the whole square, then folding the square about the diagonal so that all points (x,y) fall in the lower triangle.
4. The weighting method
a. Choose a random number $x_i$ from the uniform distribution (range [0,1)),
b. Choose another random number $y_i$ between zero and $x_i$,
c. Take the sum of $2 x_i\, g(x_i, y_i)$, repeating steps (a) and (b).
Points are chosen "incorrectly" as in 1, BUT the bias is corrected by applying the appropriate weighting function, 2x.
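The four sampling strategies for the triangle can be compared numerically. The sketch below is an illustration under stated assumptions: the region is taken to be 0 < y < x < 1 (area 1/2), the integrand g(x, y) = x + y is an arbitrary test choice whose exact integral over this triangle is 1/2, each estimate is normalised as area times the mean of the accumulated terms, and all function names are hypothetical.

```python
import random

AREA = 0.5  # area of the triangle 0 < y < x < 1


def g(x, y):
    """Arbitrary test integrand; its exact integral over the triangle is 1/2."""
    return x + y


def obvious(n, rng):
    """Method 1: x uniform, then y uniform in [0, x). Biased towards small x."""
    total = 0.0
    for _ in range(n):
        x = rng.random()
        y = x * rng.random()
        total += g(x, y)
    return AREA * total / n


def rejection(n, rng):
    """Method 2: points uniform in the unit square, keeping only y <= x."""
    total, kept = 0.0, 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if y > x:
            continue  # rejected: the point lies in the upper triangle
        total += g(x, y)
        kept += 1
    return AREA * total / kept


def folding(n, rng):
    """Method 3: fold the unit square about the diagonal."""
    total = 0.0
    for _ in range(n):
        r1, r2 = rng.random(), rng.random()
        total += g(max(r1, r2), min(r1, r2))
    return AREA * total / n


def weighting(n, rng):
    """Method 4: sample as in method 1, correct the bias with the weight 2x."""
    total = 0.0
    for _ in range(n):
        x = rng.random()
        y = x * rng.random()
        total += 2.0 * x * g(x, y)
    return AREA * total / n


rng = random.Random(12345)
for method in (obvious, rejection, folding, weighting):
    print(f"{method.__name__:10s} {method(100_000, rng):.4f}")
```

For this integrand the rejection, folding and weighting estimates should all settle near the exact value 0.5, while the "obvious" estimate converges to 0.375, which makes the bias of method 1 directly visible.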