Random Numbers

Monte Carlo methods
in physics
Tadeusz Lesiak
T.Lesiak
Monte Carlo methods
1
Monte Carlo Integration:
Random Number Generators
Random Numbers
„There is no such thing as a random number
– there are only methods to produce random numbers.”
„Anyone who considers arithmetic methods
of producing random digits is, of course, in a state of sin.”
John von Neumann
Random Numbers
A random number – a particular, single value of a random variable.
Two basic categories of random numbers:
• Truly random (TRN),
• Pseudo-random (PRN).
Desired features of random number generators:
• Randomness
• Reproducibility
• Long period (length of the sequence before repetition)
• Short generation time
• Low computer memory use (now a largely obsolete concern)
Truly Random Numbers (TRN)
TRN are governed by chance; their values cannot be predicted:
- any sequence of such numbers is irreproducible,
- they can only be generated by random physical processes,
→ physical generators are the sources of TRN.
Example 1
Two kinds of physical generators:
• Mechanical, e.g. roulette, dice, coin toss, lotto,
• Exploiting directly some physical process, such as radioactive decay or the white noise of electronics.
Example 2
The number of disintegrations of radioactive material over a given time interval is counted:
odd → bit 0, even → bit 1 (a 31-bit number is formed); typical speed: 6000 random numbers/hour.
Drawbacks of physical generators:
- Slow,
- Problems with stability: small changes in the environment can cause serious variations in the probabilistic properties of the sequences, hence the need for additional tests and corrections.
Note: "white & black noise" by G. Marsaglia (1995): 650 MB of random numbers.
WWW.RANDOM.ORG
Pseudo Random Numbers (PRN)
PRN are generated according to a given, strict mathematical formula:
- they are reproducible (not truly random),
- they have the appearance of randomness
(their statistical behavior is very close to that of TRN).
Mathematical generators are the sources of PRN.
Advantages of mathematical generators:
• easy to use (fast, simple, convenient),
• the generated numbers possess good statistical properties.
Physical generators versus mathematical generators: the latter flourish and dominate.
„random numbers” = pseudo-random numbers.
„random number generator (RNG)” = a mathematical algorithm for the generation of PRN.
Pseudorandom Variables
Why are pseudorandom variables used?
An absolute requirement in debugging a computer program is the ability to
repeat a particular run of the program.
If truly random numbers were used, an identical calculation could not be
repeated and the recurrence of an error would be left to chance.
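A minimal sketch of this reproducibility, using Python's standard `random` module; the helper name `run_simulation` and the seed values are illustrative assumptions, not part of the lecture:

```python
import random

def run_simulation(seed, n=5):
    """Return n pseudo-random numbers from a generator started at `seed`."""
    rng = random.Random(seed)      # private generator, independent of global state
    return [rng.random() for _ in range(n)]

first = run_simulation(seed=12345)
second = run_simulation(seed=12345)   # same seed: the run is repeated exactly
other = run_simulation(seed=54321)    # different seed: a different sequence
print(first == second, first == other)
```

Recording the seed together with the results is what makes a faulty run repeatable.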
Pseudo Random Numbers (PRN)
General scheme of RNG:
1. Take starting values (initial constants, seeds):
2. Use a strict formula that gives the n-th number on the basis of the previous ones.
Output:
integer numbers, i.e. sequences of random bits.
They are then converted to floating-point numbers uniformly distributed on [0,1].
How to get random numbers U_i, uniform in the range [0,1], from the bit sequence b_i1, b_i2, …, b_iL?
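One common answer, given here as an assumption since the slide's own formula is not preserved, is to read the L bits as a binary fraction, U = b_1/2 + b_2/4 + … + b_L/2^L. A short sketch (the function name is hypothetical):

```python
def bits_to_uniform(bits):
    """Map bits b_1..b_L (most significant first) to U = sum_j b_j * 2**(-j)."""
    u = 0.0
    for j, b in enumerate(bits, start=1):
        u += b * 2.0 ** (-j)
    return u

print(bits_to_uniform([1, 0, 1]))   # 1/2 + 1/8 = 0.625
```

With this convention U never exceeds 1 - 2^(-L), so the result lies in [0,1).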
Pseudo Random Numbers (PRN)
A period of RNG:
Let us define a sequence of random numbers: x_0, x_1, x_2, …
P (a certain integer) is a period of the generator if x_(i+P) = x_i for every i.
In principle the period can be evaluated theoretically (often with much difficulty).
General requirement for the period of a good RNG: P ≫ N,
where N is the number of random numbers used in a given MC calculation.
Typical practical requirements:
Example: the Mersenne Twister RNG (Matsumoto & Nishimura, 1998), with period 2^19937 - 1.
Your First Own Generator
Take any real number from (0,1) as the starting value x0, e.g.:
Take any big real number R:
Repeat:
frac = fractional part of a real number: frac(z) = z - [z]
([z] is the integer part of z; frac is also called the mantissa)
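The recipe can be sketched as below, assuming the update rule (not preserved in the transcript) is x_(n+1) = frac(R · x_n). The multiplier R = 9973.37 is an illustrative choice, kept moderate so that double precision retains enough fractional digits; the function names are hypothetical:

```python
import math

def frac(z):
    """Fractional part of z: z minus its integer part [z]."""
    return z - math.floor(z)

def frac_generator(x0, R, n):
    """Return n numbers from the assumed recurrence x_{n+1} = frac(R * x_n)."""
    x = x0
    out = []
    for _ in range(n):
        x = frac(R * x)
        out.append(x)
    return out

sample = frac_generator(x0=0.123125412351234, R=9973.37, n=10000)
print(min(sample), max(sample), sum(sample) / len(sample))
```

A histogram of `sample` gives the distribution plot; a very large R leaves only a few significant digits in the fractional part, which is one reason such a generator is only a toy.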
Exercise : Your own generator of uniformly
distributed random numbers
Please implement two simple generators of the uniform distribution
on the range [0,1] and plot their distributions:
A)
R = 34123412341341.43142341341
x0 = 0.123125412351234
GenUniRNG00.C (½ × 2 p.)
B)
GenUni*.C, * = 1, 2, 3, 4, 5, 12 (½ × 2 p.)
Is the Generator Good or Bad?
„Random number generator should not be chosen at random”
Donald Knuth
„A good generator” provides sequences of numbers that have properties of truly random numbers
But how to make sure that the numbers are „truly random”?
Practical (traditional) approach:
- Define the features of uniform random numbers and check whether the sequence
(output from the generator) possesses these properties.
BUT:
- an infinite number of features can be invoked → an infinite number of tests,
- even if some finite number of tests has been passed, the next one could disqualify the generator.
We can show that a generator is bad, but we cannot prove that it is good.
Examples: the DIEHARD battery of tests (Marsaglia; http://stat.fsu.edu/~geo/diehard.html)
D.E. Knuth, „The Art of Computer Programming”, vol. 2
Is the Generator Good or Bad?
[Figure: output of a simple generator compared with RANLUX]
Is the Generator Good or Bad?
http://demonstrations.wolfram.com/RandomNumberGeneration//
http://demonstrations.wolfram.com/MersenneTwisterAndFriends/
http://demonstrations.wolfram.com/PoorStatisticalQualitiesForTheRANDURandomNumberGenerator/
RNGs – Uniform Distribution
Linear generators (congruential generators) (LG)
Shift-register generators (SRG)
Fibonacci sequence based generators (FSBG)
Subtract-with-borrow generators (SWBG)
Multiply-with-carry generators (MWCG)
Non-linear generators (NLG)
Combined Generators (CG)
Linear Generators (LG)
The general formula:
x_n = (a_1 x_(n-1) + a_2 x_(n-2) + … + a_k x_(n-k) + b) mod m
a_1, …, a_k, b, m: parameters of the generator (fixed, non-negative integers)
Q mod P: integer modulo operation (the remainder of dividing Q by P)
b = 0: multiplicative generator
b ≠ 0: mixed generator
Usually k = 1 suffices: x_(n+1) = (a x_n + b) mod m
Choosing a prime number as m improves the statistical properties of the generator.
In practice:
IMAX is the largest integer represented by a computer
(32-bit machine: IMAX = 2^32 - 1)
The method originally gives a random number x from the range [0, m-1]; conversion to [0,1]: y = x / (m-1)
The maximum period
(can be reached only with an appropriate choice of parameters):
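A minimal sketch of the k = 1 linear congruential generator, x_(n+1) = (a · x_n + b) mod m. The parameters are those listed later in the deck for Marsaglia (1972); the function names are hypothetical, and the output is divided by m (rather than m - 1) so that it stays in [0,1):

```python
def lcg(seed, a, b, m):
    """Yield x_{n+1} = (a * x_n + b) mod m, starting from x_0 = seed."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

def lcg_uniform(seed, a, b, m, n):
    """First n outputs scaled to [0, 1) by dividing by m."""
    gen = lcg(seed, a, b, m)
    return [next(gen) / m for _ in range(n)]

# Mixed generator with a = 69069, b = 1, m = 2**32
values = lcg_uniform(seed=1, a=69069, b=1, m=2**32, n=5)
print(values)
```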
Linear Generators (LG)
Simple example
The largest number which can be generated: 2^4 - 1
The period: 2^4
It is instructive to plot the „random number cycle”:
- contours give the random number value,
- azimuthal indices give the order of each random number within the sequence
(here for seed r0 = 1).
[Figure: random number cycle for a = 5, panels (a) and (b)]
Linear Generators (LG)
Simple example cont.
a = 5, 9, 13, period = 16
a = 14, period = 4
Doesn’t it look ugly?
Linear Generators (LG)
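Theorem on linear generators (Hull, Dobell, 1962):
A full period m for the generator x_(n+1) = (a x_n + b) mod m
can be achieved if and only if the three parameters have the following properties:
1. b and m are relatively prime,
2. a - 1 is divisible by every prime factor of m,
3. a - 1 is divisible by 4 if m is divisible by 4.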
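The sketch below checks the standard statement of the Hull-Dobell conditions (b coprime to m; a - 1 divisible by every prime factor of m; a - 1 divisible by 4 when m is) and compares them with a brute-force period count for m = 16. The increment b = 1 is an assumption, since the slides' value is not preserved, and all function names are hypothetical:

```python
from math import gcd

def prime_factors(n):
    """Set of prime factors of n (trial division; fine for small examples)."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def has_full_period(a, b, m):
    """Hull-Dobell conditions for x_{n+1} = (a*x_n + b) mod m."""
    if gcd(b, m) != 1:
        return False
    if any((a - 1) % p != 0 for p in prime_factors(m)):
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True

def period(a, b, m, seed=1):
    """Brute-force length of the cycle that the sequence eventually enters."""
    seen = {}
    x, n = seed, 0
    while x not in seen:
        seen[x] = n
        x = (a * x + b) % m
        n += 1
    return n - seen[x]

for a in (5, 9, 13, 14):
    print(a, has_full_period(a, 1, 16), period(a, 1, 16))
```

The brute-force count is feasible only for tiny m; for realistic moduli the theorem is the practical check.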
Linear Generators (LG)
Author/Year         | a              | b | m
RANDU               | 2^16 + 3       | 0 | 2^31
Zieliński (1966)    | 4 * 237 + 1    | 0 | 2^35
Marsaglia (1972)    | 69069          | 1 | 2^32
Park, Miller (1980) | 16807          | 0 | 2^31 - 1
L’Ecuyer (1988)     | 40692          | 0 | 2^31 - 249
Fishman (1990)      | 68909602460261 | 0 | 2^48
Note:
- not sufficient for present MC calculations
For multidimensional case
(generation of a vector of random numbers)
A – matrix n x n
Depending on the choice of A, some correlations between components of the vector x are possible
Linear Generators (LG)
Main drawback - Marsaglia effect:
The distribution of points is not fully uniform: the points lie on certain regular hyperplanes.
Let us define:
y_i: uniform distribution on [0,1]
Points
But also points
are located on certain equidistant hyperplanes inside the hypercube [0,1]^N
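RANDU (a = 2^16 + 3, b = 0, m = 2^31) is the classic illustration of this effect. Because (2^16 + 3)^2 = 6 · (2^16 + 3) - 9 modulo 2^31, every three consecutive outputs satisfy x_(k+2) - 6 x_(k+1) + 9 x_k = 0 (mod 2^31), so all 3-tuples fall on a small number of parallel planes. A sketch that verifies this numerically (the function name is hypothetical):

```python
def randu(seed, n):
    """First n outputs of RANDU: x_{k+1} = (2**16 + 3) * x_k mod 2**31."""
    x = seed
    out = []
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x)
    return out

xs = randu(seed=1, n=1000)

# The combination x_{k+2} - 6*x_{k+1} + 9*x_k is always a multiple of 2**31 ...
combos = [xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k] for k in range(len(xs) - 2)]
residuals = {c % 2**31 for c in combos}
# ... and the multiple itself labels the plane on which the 3-tuple lies.
planes = {c // 2**31 for c in combos}
print(residuals, sorted(planes))
```

Since each scaled value lies in (0,1), the plane index can only take integer values between -5 and 9, i.e. at most 15 planes in the unit cube.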
Shift-Register Generators (SRG)
The general formula:
b_n = (c_1 b_(n-1) + c_2 b_(n-2) + … + c_k b_(n-k)) mod 2
b_n: bits
c_1, …, c_k: binary constants
The implementation is straightforward since
(a + b) mod 2 = a xor b
(the symmetric difference of a and b):
a | b | a xor b
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
Bits are converted to digits
The period:
Drawback: do not pass modern statistical tests
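A minimal sketch of a shift-register bit generator, assuming the usual recurrence in which the new bit is the XOR of selected earlier bits. The 4-bit example b_n = b_(n-1) xor b_(n-4) and the function names are illustrative assumptions, not taken from the slides:

```python
def next_bit(state, taps):
    """XOR of the bits b_{n-j} for j in taps; state holds the last k bits."""
    bit = 0
    for j in taps:
        bit ^= state[-j]
    return bit

def lfsr_bits(seed_bits, taps, n):
    """Generate n new bits following the seed bits."""
    state = tuple(seed_bits)
    out = []
    for _ in range(n):
        bit = next_bit(state, taps)
        state = state[1:] + (bit,)
        out.append(bit)
    return out

def lfsr_period(seed_bits, taps):
    """Number of steps until the register returns to its initial state.

    Assumes the longest lag (len(seed_bits)) is among the taps, so that the
    state update is invertible and the initial state is revisited.
    """
    start = tuple(seed_bits)
    state, steps = start, 0
    while True:
        state = state[1:] + (next_bit(state, taps),)
        steps += 1
        if state == start:
            return steps

print(lfsr_bits([1, 0, 0, 0], taps=(1, 4), n=15))
print(lfsr_period([1, 0, 0, 0], taps=(1, 4)))   # 15 = 2**4 - 1
```

For a k-bit register the all-zero state is a fixed point, so no period can exceed 2^k - 1; this example reaches that bound.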
Fibonacci Sequence Based Generators (FSBG)
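Fibonacci sequence: f_n = f_(n-1) + f_(n-2)
(Liber Abaci, Leonardo di Pisa, 1202)
The first FSBG: x_n = (x_(n-1) + x_(n-2)) mod m
fails independence tests
Generalization (lagged generators): x_n = (x_(n-r) ⊙ x_(n-s)) mod m, r > s ≥ 1, where ⊙ is one of the operations +, -, *, xor
easy to implement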
The maximum period for m = 2^L:
operation | P_max               | Stat. properties
+, -      | (2^r - 1) 2^(L-1)   | good
*         | (2^r - 1) 2^(L-3)   | very good
xor       | 2^r - 1             | fair
The best statistical properties are obtained for multiplication.
Some values of r and s leading to the maximum period:
r | 17 | 31 | 55 | 68 | 97 | 607 | 1279
s |  5 | 13 | 24 | 33 | 33 | 273 |  418
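A minimal sketch of an additive lagged Fibonacci generator, x_n = (x_(n-r) + x_(n-s)) mod 2^L, with the lag pair r = 17, s = 5 from the list of lags. Filling the initial r values from Python's own `random` module, the default L = 32 and the function name are assumptions made for illustration:

```python
import random

def lagged_fibonacci(r, s, n, L=32, seed=2024):
    """Return n numbers in [0, 1) from x_n = (x_{n-r} + x_{n-s}) mod 2**L."""
    m = 2 ** L
    init = random.Random(seed)            # assumption: seed table from another RNG
    state = [init.randrange(m) for _ in range(r)]
    state[0] |= 1                         # at least one odd seed is needed for the maximum period
    out = []
    for _ in range(n):
        x = (state[-r] + state[-s]) % m   # state[-r] is x_{n-r}, state[-s] is x_{n-s}
        state.append(x)
        state.pop(0)                      # keep only the last r values
        out.append(x / m)
    return out

sample = lagged_fibonacci(r=17, s=5, n=20000)
print(sum(sample) / len(sample))
```

Replacing `+` by `*` (with all seeds odd) or by `^` gives the other variants compared above.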
Exercise : Fibonacci generator of uniformly
distributed random numbers
Please implement a Fibonacci generator of the uniform distribution
on the range [0,1] and plot its distribution:
GenUniFib.C (2 p.)
Subtract-with-borrow Generators (SWBG)
Marsaglia & Zaman (1991)
c: the carry (borrow) bit, c = 0 or 1
The operation of subtraction-with-borrow:
The scheme of generation:
Initialisation: a sequence of integers from the interval (0,m)
and c = 0.
The algorithm is simple and fast but does not pass all the tests
Example: RCARRY: r = 24, s = 10, m = 2^24
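A sketch of the scheme with the RCARRY parameters, assuming the standard Marsaglia-Zaman form x_n = (x_(n-s) - x_(n-r) - c) mod m, where c becomes 1 whenever the subtraction goes negative (the slide's own formula is not preserved). Seeding from Python's `random` module and the function name are illustrative assumptions:

```python
import random

def subtract_with_borrow(r, s, m, n, seed=1):
    """Return n numbers in [0, 1) from a subtract-with-borrow recurrence."""
    init = random.Random(seed)                      # assumption: seeds from another RNG
    state = [init.randrange(m) for _ in range(r)]   # x_{n-r} ... x_{n-1}
    c = 0                                           # initial borrow
    out = []
    for _ in range(n):
        t = state[-s] - state[-r] - c
        if t < 0:
            t += m       # reduce modulo m ...
            c = 1        # ... and borrow in the next step
        else:
            c = 0
        state.append(t)
        state.pop(0)
        out.append(t / m)
    return out

sample = subtract_with_borrow(r=24, s=10, m=2**24, n=10000)
print(sum(sample) / len(sample))
```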
Multiply-with-carry Generators (MWCG)
The generation scheme:
- fixed parameters (natural numbers)
- the carry value to the next step
Floor (integer part): the largest integer not greater than the result of this operation
Advantages:
- good statistical properties,
- fast, simple, easy implementation,
- long periods (of the order of 10^180).
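A sketch of the simplest (lag-1) multiply-with-carry step, assuming the usual form x_n = (a · x_(n-1) + c_(n-1)) mod m with carry c_n = floor((a · x_(n-1) + c_(n-1)) / m). The multiplier, modulus and seeds below are illustrative values, not taken from the slides; the very long periods quoted above require many lags rather than this one-lag toy:

```python
def multiply_with_carry(a, m, x0, c0, n):
    """Return n numbers in [0, 1) from a lag-1 multiply-with-carry generator."""
    x, c = x0, c0
    out = []
    for _ in range(n):
        t = a * x + c
        x = t % m        # new random integer
        c = t // m       # floor(t / m): the carry passed to the next step
        out.append(x / m)
    return out

sample = multiply_with_carry(a=698769069, m=2**32, x0=12345, c0=1, n=10000)
print(sample[:3])
```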
Non-linear Generators (NLG)
The drawback of linear generators: clustering of points on some hypersurfaces
The remedy: use a nonlinear generator (since the 1980s); such generators have very good statistical properties
A few examples:
• Eichenauer & Lehn
m: prime number;
The generator gives a sequence from the range [0,m-1], which is then converted to [0,1)
• Eichenauer & Hermann
The value of x_n is obtained independently of the previous numbers
• L. Blum, M. Blum & Shub
Drawback: NLGs are substantially slower than LGs
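As an illustration of the first family, the sketch below assumes the usual inversive congruential form x_(n+1) = (a · x_n^(-1) + b) mod m with m prime, where x^(-1) is the modular inverse and 0^(-1) is defined as 0. The parameters a = 7, b = 1, m = 2^31 - 1 are arbitrary illustrative values and the function names are hypothetical:

```python
def modular_inverse(x, m):
    """Inverse of x modulo a prime m via Fermat's little theorem; maps 0 to 0."""
    return pow(x, m - 2, m)

def inversive_generator(seed, a, b, m, n):
    """Return n numbers in [0, 1) from x_{n+1} = (a * inverse(x_n) + b) mod m."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * modular_inverse(x, m) + b) % m
        out.append(x / m)
    return out

sample = inversive_generator(seed=1, a=7, b=1, m=2**31 - 1, n=1000)
print(sample[:3])
```

Each step costs a modular exponentiation instead of a single multiplication, which is the source of the speed penalty relative to linear generators.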
Combined Generators (CG)
The combination of generators usually gives better statistical behavior of the generated
sequences.
Example:
Let us take a combination of r > 4 nonlinear generators with parameters m_k (k = 1, 2, …, r)
being prime numbers
where

u_n: a random variable with uniform distribution on the range [0,1) and good statistical
behavior
The period
Is the Generator Good or Bad?
A typical scheme of a test:
1. Generate a sequence of n random numbers
2. Calculate the value of a chosen test statistic s
3. Calculate f(s), the value of the probability distribution (cumulative distribution function) of s
4. Repeat steps 1-3 N times
If the generator is „good”, then the sequence f(s1), f(s2), …, f(sN) is a random sequence with uniform distribution on [0,1).
This hypothesis can be tested at a given confidence level.
Some Tests of RNGs
Tests of the uniform distribution
1. χ² test
2. Multidimensional tests
3. Kolmogorov test
4. OPSO test (overlapping pairs sparse occupancy)
Statistics distribution tests
1. d² test
2. Series test
3. Poker test
4. Test of the smallest distance between pairs
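The χ² Test
1. Divide the interval [0,1) into k subintervals [a_(i-1), a_i), i = 1, …, k, with a_0 = 0 and a_k = 1
2. Generate N random numbers
3. n_i: the number of random numbers falling into the i-th subinterval [a_(i-1), a_i)
4. For the uniform distribution: E(n_i) = N p_i, where p_i = a_i - a_(i-1)
5. Definition of the random variable: χ² = Σ_(i=1..k) (n_i - N p_i)² / (N p_i)
6. Its distribution: chi-square with (k-1) degrees of freedom
7. Special case of an equal division into subintervals (p_i = 1/k): χ² = (k/N) Σ_(i=1..k) (n_i - N/k)²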
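A short sketch of the chi-square statistic for the equal-division case (k subintervals of [0,1), expected count N/k in each), applied here to Python's built-in generator. The function name, seed and sample size are illustrative assumptions:

```python
import random

def chi_square_uniform(sample, k):
    """Chi-square statistic for k equal subintervals of [0, 1)."""
    N = len(sample)
    counts = [0] * k
    for u in sample:
        counts[min(int(u * k), k - 1)] += 1   # index of the subinterval containing u
    expected = N / k
    return sum((n_i - expected) ** 2 / expected for n_i in counts)

rng = random.Random(7)
sample = [rng.random() for _ in range(10000)]
chi2 = chi_square_uniform(sample, k=10)
print(chi2)   # to be compared with the chi-square distribution with k - 1 = 9 degrees of freedom
```

A value far out in the upper tail of that distribution signals non-uniformity; a value extremely close to zero is also suspicious, since it means the counts are "too regular".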
Birthday Spacing Test
1. Generate a sequence of random numbers:
2. Sort this sequence into non-decreasing order
3. Form the sequence of spacings:
4. → a new random variable Y is obtained: the number of spacings that
occur more than once in the above sequence
If the random variables X1, X2, …, Xn are independent and uniformly distributed
on the set {1, 2, …, n}, then Y follows a Poisson distribution with parameter:
Fibonacci generators usually fail this test
Combinatorial Tests (Poker Test as an example)
Their purpose: to check the independence (randomness) of the sample. Examples:
Poker test
1. Divide the range of the random variables into k subintervals (for the normal distribution):
2. Generate N random numbers from the uniform distribution:
3. For this sequence the following holds (for an equal division):
4. Form a sequence of random variables Yj according to the formula:
5. Divide the resulting sequence into groups of five terms (k5 consecutive five-tuples):
Poker Test
Types of five-tuples (in poker language):
Name            | Type
Five of a kind  | AAAAA
Four of a kind  | AAAAB
Full house      | AAABB
Three of a kind | AAABC
Two pairs       | AABBC
One pair        | AABCD
High card       | ABCDE
Probability of a given type (P(name)):
• Most common choices: k = 2, 8, 10
• The chi-square test is used to check the agreement between the distribution of the generated five-tuples and the expectations given above.
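The type of a five-tuple depends only on how many times each value repeats, so it can be read off from the sorted multiplicities. The sketch below classifies five-tuples and obtains the expected probabilities by brute-force enumeration of all k^5 equally likely five-tuples instead of the closed-form expressions; the function names are hypothetical:

```python
from collections import Counter
from itertools import product

def hand_type(hand):
    """Pattern of a five-tuple, e.g. (3, 3, 7, 3, 1) -> 'AAABC'."""
    sizes = sorted(Counter(hand).values(), reverse=True)
    letters = "ABCDE"
    return "".join(letters[i] * size for i, size in enumerate(sizes))

def exact_probabilities(k):
    """Exact type probabilities for five independent values uniform on 0..k-1."""
    counts = Counter(hand_type(h) for h in product(range(k), repeat=5))
    total = k ** 5
    return {t: c / total for t, c in counts.items()}

probs = exact_probabilities(10)
for name, p in sorted(probs.items()):
    print(name, p)
```

These expected probabilities are what the observed frequencies of generated five-tuples are compared against in the chi-square step.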
Coupon Collector’s Test
1. Form the sequence Yj in the same way as for the poker test
2. Observe the generated sequence until all k numbers 0, 1, …, k-1 have occurred in it at least once
3. R: the number of terms in the sequence defined in this way
4. The theoretical distribution of R:
5. The chi-square test is used to check the agreement between the predictions and the generated results
Collision Test
1. Generate n x k random numbers
2. Arrange them into n sequences of k random numbers each
3. Divide the hypercube [0,1)^k into m = s^k identical k-dimensional "cells"
(the volume of each is 1/m)
4. Compute the number of collisions Z: the number of cases in which the next point
falls into a cell already occupied by at least one other point.
5. Z is the test statistic here. Its theoretical distribution:
6. The chi-square test is used to check the agreement between the predictions and the generated results
Serial Test via Plotting
Consider several multiplicative generators:
A. Let us change the seed:
B. Let us change the multiplier (a):
Serial Test via Plotting
Consider several multiplicative generators:
C. Effect of variation of the constant (b):
D. Sensitivity of RNG performance to small changes of multiplier (a):
Serial Test via Plotting
Consider 3-tuples: sets of three consecutive random numbers (x,y,z)
[Figure panels: Case 11, Case 18]
Serial Test via Plotting
Consider 3-tuples: sets of three consecutive random numbers (x,y,z)
[Figure panel: Case 20]
Serial Test via Plotting
1. Generation
Tests by Performing Control Jobs
Exemplary tasks:
1. Evaluation of π by hit-or-miss, Buffon’s needle, etc.
2. Calculation of integrals known analytically
3. Calculation of the volume of the unit sphere in d-dimensional space: V_d = π^(d/2) / Γ(d/2 + 1)
Γ: Euler’s Gamma function
4. Estimation of parameters of some physical phenomena with exact solutions
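A sketch of such a control job: a hit-or-miss estimate of the volume of the unit ball in d dimensions, compared with the exact value π^(d/2) / Γ(d/2 + 1). For d = 2 the estimate is an estimate of π itself. Python's built-in generator stands in for the RNG under test, and the function names and sample sizes are illustrative assumptions:

```python
import math
import random

def ball_volume_exact(d):
    """Volume of the unit ball in d dimensions: pi**(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def ball_volume_mc(d, n, seed=1):
    """Hit-or-miss estimate: fraction of points of the cube [-1, 1]^d inside the ball."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(d)) <= 1.0:
            hits += 1
    return 2.0 ** d * hits / n      # cube volume times the hit fraction

for d in (2, 3):
    print(d, ball_volume_mc(d, 100000), ball_volume_exact(d))
```

A generator whose estimates deviate from the exact values by much more than the expected statistical error fails the control job.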
Integration over a Triangle
Consider the integration of the function g
over the two-dimensional region specified as 0 ≤ y ≤ x ≤ 1 (the triangle below the diagonal of the unit square).
At least four ways of estimating this integral
by MC methods can be singled out:
1. „The obvious way”
a. Choose a random number xi from the uniform distribution (range [0,1)),
b. Choose another random number yi between zero and xi
c. Take the sum of g(xi,yi), repeating steps (a) and (b)
- The method yields the same number of points along each vertical line
→ Overpopulation (higher density) in the left-side corner, so the estimate is biased
Integration over a Triangle
2. The rejection method
a. Choose a random number xi from the uniform distribution (range [0,1)),
b. Choose another random number yi from the uniform distribution (range [0,1)),
c. If yi > xi, reject the event and return to (a),
d. Accumulate the sum of g(xi,yi) for the remaining points.
Correct, but slow (takes into account only half the events).
3. The folding method
a. Choose two independent random numbers r1 and r2,
each from the uniform distribution (range [0,1)),
b. Set xi = max(r1, r2), yi = min(r1, r2)
c. Accumulate the sum of g(xi,yi) as before
Correct and efficient; equivalent to choosing points over the whole square, then folding the
square about the diagonal so that all points (x,y) fall in the lower triangle.
4. The weighting method
a. Choose a random number xi from the uniform distribution (range [0,1)),
b. Choose another random number yi between zero and xi
c. Take the sum of 2 xi g(xi,yi), repeating steps (a) and (b)
Points are chosen „incorrectly” as in 1, BUT the bias is corrected by applying the
appropriate weighting function 2x.
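The four recipes can be compared on a test integrand. The sketch assumes the region 0 ≤ y ≤ x ≤ 1 and uses the hypothetical integrand g(x,y) = x + y, whose exact integral over the triangle is 1/2. Each accumulated sum is turned into an integral estimate by averaging and multiplying by the triangle area 1/2; the function names are illustrative:

```python
import random

def g(x, y):
    return x + y      # hypothetical test integrand; exact integral over the triangle is 1/2

def obvious(n, rng):
    total = 0.0
    for _ in range(n):
        x = rng.random()
        y = x * rng.random()            # y uniform between 0 and x
        total += g(x, y)
    return 0.5 * total / n              # biased: points are too dense at small x

def rejection(n, rng):
    total, accepted = 0.0, 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if y > x:
            continue                    # reject points above the diagonal
        total += g(x, y)
        accepted += 1
    return 0.5 * total / accepted

def folding(n, rng):
    total = 0.0
    for _ in range(n):
        r1, r2 = rng.random(), rng.random()
        x, y = max(r1, r2), min(r1, r2)  # fold the square about the diagonal
        total += g(x, y)
    return 0.5 * total / n

def weighting(n, rng):
    total = 0.0
    for _ in range(n):
        x = rng.random()
        y = x * rng.random()
        total += 2.0 * x * g(x, y)       # weight 2x corrects the non-uniform density
    return 0.5 * total / n

rng = random.Random(2024)
results = {f.__name__: f(100000, rng) for f in (obvious, rejection, folding, weighting)}
print(results)
```

For this integrand the "obvious" estimate converges to 0.375 rather than 0.5, which makes the bias visible, while the other three agree with the exact value within statistical error.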