Validation

From CDS 130

Contents

  1. Objective
  2. Motivation
  3. Priming questions
  4. Notes
    1. Definitions
    2. Definitions cont.
    3. Correct Equations?
    4. Correct Equations? cont.
    5. Parts of a Model
    6. Validation Steps
    7. Validation Limitations
    8. Validation vs. Development
    9. Relation to Scientific Method
    10. Example
    11. Model 1, part 1
    12. Model 1, part 2
    13. Model 1, part 3
    14. Model 1, part 4
    15. Model 1, part 5
    16. Model 2
    17. Model 3, part 1
    18. Model 3, part 2
    19. Model 3, part 3
    20. Model 3, part 4
    21. Model 3, part 5
    22. Model 3, Conclusion
    23. A few more days pass
    24. Model 4
    25. Summary
    26. Summary cont.
    27. Summary question
    28. Model 5
    29. Model 5 cont.
    30. SIR References
  5. Activities
    1. Kinematics
      1. a
      2. b
      3. c
  6. Questions
    1. Neuron Model
    2. Dog Shaking Frequency Model
    3. Validation in the Wild
    4. Reading
  7. References

1. Objective

  • To introduce the concept of Validation of a computer simulation

2. Motivation

  1. Every science model is an approximation of reality.
  2. Every mathematical representation of a science model is an approximation of the science model.
  3. Every computer representation of a mathematical model is an approximation of the mathematical model.

It is important to understand how well the computer simulation predicts the actual behavior of the system that is being modeled.

3. Priming questions

  • In a set of bullet points, write down everything that comes to mind when you hear the phrase "scientific method". Write your name on the sheet of paper and turn it in after three minutes.
  • Newton's law of gravity accurately predicts the position of the planets. Suppose a new planet is discovered and its position is not consistent with Newton's law of gravity. (This actually happened with Uranus.)
    • Is the law invalid?
    • Are the data invalid?

4. Notes

4.1. Definitions

There are two basic stages in testing computer simulations:

  • Verification - Are you solving the equations correctly?
    • This does not address the question of whether the model is a reasonable reflection of reality.
  • Validation - Are you solving the correct equations?
    • A science model requires assumptions and assertions about how a system works. The validation process is used to determine if these assumptions and claims were valid.
    • The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model. (AIAA G-077-1998)

4.2. Definitions cont.

Compare with other definitions

  • Validation - In the context of science, are you solving the correct equations?
  • In the context of business, it is a statement about a product or service [1]:
Validation is a Quality assurance process of establishing evidence that provides a high degree of assurance that a product, service, or system accomplishes its intended requirements. This often involves acceptance of fitness for purpose with end users and other product stakeholders. It is sometimes said that validation can be expressed by the query "Are you building the right thing?" and verification by "Are you building it right?" "Building the right thing" refers back to the user's needs, while "building it right" checks that the specifications be correctly implemented by the system. In some contexts, it is required to have written requirements for both as well as formal procedures or protocols for determining compliance.

4.3. Correct Equations?

  • Data for population (S-curve)
  • Prediction from computer simulation (J-curve)

4.4. Correct Equations? cont.

  • How do you know that you are not solving the correct equations?
  • What about this argument:
    • Equations are correct, but the growth rate in the simulation was not correct

4.5. Parts of a Model

  • The equations
    • Are usually specified by the science model
  • The adjustable parameters
    • Are parameters like growth rate, interest rate, etc.
    • Parameters are usually restricted to fall in a range. What is an acceptable range for growth rate?

4.6. Validation Steps

Usually performed after Verification

  1. Run the simulation with reasonable estimates of the parameters.
  2. Do sanity checks - Does the simulation predict behavior that would never happen in reality? (Does population ever go negative? Does the mass stay positive?)
  3. If measurements of the modeled system are available, compare simulation with data

4.7. Validation Limitations

Name some!

4.8. Validation vs. Development

Model validation is intimately related to model development.

  1. A model is proposed and then validation reveals "warts".
  2. The model is revised until warts are gone.

4.9. Relation to Scientific Method

Validation is a key component of the scientific method as applied to computational science:

  1. Characterization of existing data
    • Identify features using a table and/or plot
    • Estimate uncertainty using information about how the data were collected
  2. Formulation of a hypothesis (the science model is the hypothesis)
  3. Formulation of a predictive test (both the mathematical and computational models produce quantitative predictions)
  4. Experimental testing (verify the computational model, then compare its predictions with existing data)
  5. Validate
    1. If "valid", report and peer review. If peer review reveals "warts", go back to hypothesis step
    2. If "invalid", go back to hypothesis step
  6. Wait until more data come in, then validate again.

4.10. Example

  • In the following, we give an example of how the scientific method could be applied to some data involving a flu outbreak.
  • Although the steps taken are made up, it is an illustrative example of how science sometimes proceeds.

4.11. Model 1, part 1

1. Characterization of existing data

  • Data are from article: http://dx.doi.org/10.1136/bmj.1.6112.586
  • Actual N values not given in paper. N (measured values) determined by zooming in on PDF and overlaying grid. Then numbers were plotted using a spreadsheet.
  • Uncertainty due to zoom method = +/- 3
  • Uncertainty in N measured by doctors in school = ???

4.12. Model 1, part 2

2. Formulation of a hypothesis

Model 1

The number infected equals 10 times the day number, with day number = 1 corresponding to the day the first student was infected.

4.13. Model 1, part 3

3. Formulation of a predictive test

Create mathematical and computational representation of the model and plot predictions versus data.

Mathematical Model

N(i) = 10*i

Image:model1.png

Plot: [2]
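
A minimal MATLAB sketch of the computational representation of Model 1 follows. The 14-day span is an assumption made for illustration; these notes do not state how many days the data cover.

```matlab
% Model 1: N(i) = 10*i
ndays = 14;            % Assumed number of days (not given in the notes)
N = zeros(1, ndays);   % Predicted number infected on each day
for i = 1:ndays
  N(i) = 10*i;         % Number infected equals 10 times the day number
end
disp(N);               % Predictions for day 1 through day ndays
```

The array N can then be plotted against the day number and compared with the measured values.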

4.14. Model 1, part 4

4. Experimental testing

Verify the computational model, then compare its predictions with existing data.

Suggest ways to

  • Verify
  • Compare

Image:model1.png

Plot: [3]

4.15. Model 1, part 5

5. Validate

Validation - Are you solving the correct equations?

  1. Run the simulation with reasonable estimates of the parameters.
    • We guess the parameters. The growth rate is positive, so it seems reasonable.
  2. Do sanity checks - Does the simulation predict behavior that would never happen in reality? Does population ever go negative? Does the mass stay positive?
    • No negative behavior, but ...
  3. If measurements of the modeled system are available, compare simulation with data
    • We did this, and the match is "not good".
    • We will cover what "not good" means later in the semester.

What if you don't want to give up on your equations?

  • Try same equations with different parameters

4.16. Model 2

Repeat Model 1 with a different parameter.

Model 2

The number infected equals 20 times the day number, with day number = 1 corresponding to the day the first student was infected.

Plot: [4]

Image:model2.png

Repeating this process with different parameter values leads to the realization that the model will never be able to reproduce the curve. (Basic math happens to tell us this too, but that is rarely the case in the real world.)

Are you solving the correct equations? No
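
The "basic math" argument can be checked numerically. Under a model of the form N(i) = c*i, the increase from one day to the next is always c, whatever value of c is chosen, so the prediction is a straight line and cannot follow a curve whose daily increase changes. A minimal sketch, in which the 14-day span and the trial values of c are assumptions for illustration (c = 10 and c = 20 correspond to Models 1 and 2; c = 40 is hypothetical):

```matlab
ndays = 14;                 % Assumed number of days
days  = 1:ndays;
for c = [10, 20, 40]        % Trial parameter values (40 is hypothetical)
  N = c*days;               % Model prediction for this parameter
  increase = diff(N);       % Change in N from one day to the next
  fprintf('c = %d: daily increase ranges from %d to %d\n', c, min(increase), max(increase));
end
```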

4.17. Model 3, part 1

1. Characterization of existing data

Same as Model 1, part 1

4.18. Model 3, part 2

2. Formulation of a hypothesis

Model 3

The number of new people infected on a given day is proportional to the number of people infected on the previous day.

or

The change in the number of people infected from one day to the next is proportional to the number of people already infected.

4.19. Model 3, part 3

3. Formulation of a predictive test

Create mathematical and computational representation of the model and plot predictions versus data.

Mathematical Model

If N(i) is the number of infected on a given day, then the number of new people infected is a*N(i):

N(i+1) = N(i) + a*N(i)

or

N(i+1) - N(i) = a*N(i)

Image:model3.png

4.20. Model 3, part 4

4. Experimental testing

Verify the computational model, then compare its predictions with existing data.

Image:model3.png

Enter equations and plot: [5]
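
One possible verification test for Model 3 is sketched below. Repeated application of N(i+1) = N(i) + a*N(i) gives N(i) = N(1)*(1+a)^(i-1), so the output of the loop can be compared with that closed form. The values a = 0.5 and N(1) = 1 and the 10-day span are assumptions for illustration, not values from the lecture.

```matlab
a     = 0.5;    % Hypothetical growth parameter
ndays = 10;     % Assumed number of days
N = zeros(1, ndays);
N(1) = 1;       % One student infected on day 1
for i = 1:ndays-1
  N(i+1) = N(i) + a*N(i);   % Model 3 update
end
% Closed-form solution of the same equation
Nexact = N(1)*(1+a).^((1:ndays)-1);
maxerr = max(abs(N - Nexact));
fprintf('Largest difference between loop and closed form: %g\n', maxerr);
```

A difference near zero indicates that the loop solves the equation correctly. It says nothing about whether the equation is the correct one, which is the validation question.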

4.21. Model 3, part 5

5. Validate

  1. Run the simulation with reasonable estimates of the parameters.
    • We guess the parameters. The growth rate is positive, so it seems reasonable.
  2. Do sanity checks - Does the simulation predict behavior that would never happen in reality? Does population ever go negative? Does the mass stay positive?
    • No negative behavior although ...
    • ... If we ask what will happen on day 32, we get a number that is larger than the number of available students. This is a problem. We need to revise our statement about the model to say that it seems to be valid in the first few days of an outbreak. This is a wart!
  3. If measurements of the modeled system are available, compare simulation with data
    • We did this, and the match is "good".
    • We will cover what "good" means later in the semester.
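
The day-32 sanity check can be sketched as follows. The total of 763 students is taken from the initial conditions given for Model 5 in these notes (762 susceptible plus 1 infected); the value a = 0.5 is an assumption for illustration.

```matlab
a         = 0.5;   % Hypothetical growth parameter
nstudents = 763;   % 762 susceptible + 1 infected (initial conditions of Model 5)
N = zeros(1, 32);
N(1) = 1;
for i = 1:31
  N(i+1) = N(i) + a*N(i);
end
% Sanity checks: predictions should be non-negative and should never
% exceed the number of students available to be infected
nonnegative = all(N >= 0);
firstbad    = find(N > nstudents, 1);   % First day with an impossible prediction
fprintf('All predictions non-negative: %d\n', nonnegative);
fprintf('Prediction first exceeds %d students on day %d\n', nstudents, firstbad);
fprintf('Predicted number infected on day 32: %.0f\n', N(32));
```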

4.22. Model 3, Conclusion

  • Is it a good representation of the available data?
    • Yes, with the caveat that very few data are available
  • Is it a good predictor of how an influenza outbreak will spread?
    • Only with the caveat that it has been validated only on the initial stages of an outbreak, and that it would fail a sanity check over a longer time.
  • Is it an exact representation of an influenza outbreak?
    • No. No model is ever an exact representation of a system. There are always approximations and uncertainty.

4.23. A few more days pass

And more data are discovered! Need to validate the model on the new data.


4.24. Model 4

Need something to "pull down" model curve. Guess:

N(i+1) = N(i) + a*N(i) - b*N(i)*N(i)

  • How would you go about answering the question: "Are you solving the correct equations?"

Image:model4.png

Result: [6]
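
A minimal MATLAB sketch of Model 4 follows. The extra term -b*N(i)*N(i) stops the growth when a*N(i) = b*N(i)*N(i), that is, when N = a/b. The values of a and b and the number of days are assumptions chosen for illustration; they are not fitted to the data.

```matlab
a     = 0.5;       % Hypothetical growth parameter
b     = a/300;     % Hypothetical; chosen so that growth levels off at a/b = 300
ndays = 60;        % Assumed number of days
N = zeros(1, ndays);
N(1) = 1;
for i = 1:ndays-1
  N(i+1) = N(i) + a*N(i) - b*N(i)*N(i);   % Model 4 update
end
fprintf('a/b = %g, N on last day = %g\n', a/b, N(end));
```

With these values the prediction rises and then levels off at a/b instead of growing without bound, which is the "pull down" effect the extra term was meant to supply.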

4.25. Summary

How to proceed?

"Inverse modeling" approach

  1. Continue to guess mathematical models
  2. If one passes all validation tests, then ask "Can we think of a science model that explains this?".

That is, start with the mathematical model and figure out the science later.

Sometimes failure in passing the validation tests results in discovery - the "guess" mathematical model actually has properties that explain other data!

4.26. Summary cont.

How to proceed?

"Forward modeling" approach

  1. Go back and think more about the science (how the system works)
  2. Try validation on new science models until one passes all validation tests.

That is, start with the science and work out the mathematical model later.

4.27. Summary question

Think of a science discovery that came about using

  • The inverse modeling approach
  • The forward modeling approach

4.28. Model 5

Instead of trying to remove warts ad-hoc, go back to school and gain a better understanding of how disease spreads. Write a paper when you get it right.

The S-I-R model

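  • S = Susceptibles (neither infected nor immune)
  • I = Infectives (infected and can transmit)
  • R = Recovered (have been infected and are now immune)

N = S + I + R

Here N is the total number of students, which stays constant (it is not the number infected, as it was in Models 1 through 4).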

We can use this fact for Validation! If the model does not predict this, we have a wart!

\frac{\Delta S}{\Delta t} = -aSI

\frac{\Delta I}{\Delta t} = aSI-bI

\frac{\Delta R}{\Delta t} = bI

a = 0.00218/day (infection rate per susceptible-infective pair)

b = 0.441/day (recovery rate; the mean infectious period is 1/b, about 2.3 days)

S(1) = 762

I(1) = 1

R(1) = 0
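
A minimal MATLAB sketch of Model 5 follows, using the parameter values and initial conditions listed above. The time step of 1 day and the 15-day span are assumptions; the notes do not specify either. The last lines apply the check that S + I + R should stay equal to N.

```matlab
a  = 0.00218;  % Infection parameter from the notes
b  = 0.441;    % Recovery parameter from the notes
dt = 1;        % Assumed time step, in days
ndays = 15;    % Assumed number of days to simulate
S = zeros(1, ndays); I = zeros(1, ndays); R = zeros(1, ndays);
S(1) = 762; I(1) = 1; R(1) = 0;
for i = 1:ndays-1
  S(i+1) = S(i) - dt*a*S(i)*I(i);
  I(i+1) = I(i) + dt*(a*S(i)*I(i) - b*I(i));
  R(i+1) = R(i) + dt*b*I(i);
end
Ntotal = S + I + R;                       % Should equal 763 on every day
drift  = max(abs(Ntotal - Ntotal(1)));    % Largest departure from N
fprintf('Largest change in S + I + R: %g\n', drift);
fprintf('All values non-negative: %d\n', all(S >= 0) && all(I >= 0) && all(R >= 0));
```

Because the terms added to I and R are exactly the terms removed from S and I, the sum should change only by rounding error. A larger drift would point to a mistake in the program.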

4.29. Model 5 cont.

Do these equations make sense? For the first equation, assume S=constant and I=0. What does the equation predict?

\frac{\Delta I}{\Delta t} = aSI-bI

\frac{\Delta R}{\Delta t} = bI

4.30. SIR References

  • Original paper: W.O. Kermack and A.G. McKendrick, A Contribution to the Mathematical Theory of Epidemics, Proc. Roy. Soc. London A 115, 700-721, 1927.
  • Textbook with extensive discussion of model: J.D. Murray, Mathematical Biology I, An Introduction, p. 325-326, Springer-Verlag, 2002.
  • Study of the SIR model using Mathematica (parameters used in lecture were taken from these notes) [7]


5. Activities

5.1. Kinematics

A model of the relationship between the velocity v of an object and time t is

v(t) = vo + at

A computational representation of this model is

v = vo + a*dt*(i-1)

where i is an integer greater than or equal to one and dt = 0.1 seconds is the time step, so that element i corresponds to the time t = dt*(i-1).

Given the relationship between the computational variable i and the physical variable t, we can determine the velocity at t = 0.1 seconds using

v = vo + a*dt*(2-1) = vo + 0.1*a

We can compute the velocity at many different times using iteration. In the following program, values of velocity are stored in an array named V. The first element of the array corresponds to the velocity at t = 0. The second element is the velocity at t = 0.1.

a  = 0.1;  % Acceleration (proportionality constant)
Vo = 8.0;  % Start (initial) velocity
dt = 0.1;  % Time step in seconds between element i and element i+1
for i = [1,2]
  V(i) = Vo + a*dt*(i-1);  % Velocity at t = dt*(i-1)
end

5.1.1. a

What will the velocity be at t = 10 seconds if

  1. The start velocity was 5 and a=0.1, and
  2. The start velocity was 12 and a=0.1?

5.1.2. b

Suppose measurements of an object were taken at the following times

  1. t=1.0, v=20
  2. t=1.2, v=22
  3. t=1.4, v=24

What calculations would you make to validate the model v(t) = vo + at? Make these calculations and then state why the equations are or are not a correct representation of the measurements.

Create a line plot that shows the values in the array V versus t (not i) as dots and shows the three measurements as circles. Label the x-axis "t [s]" and create a legend with values "v measured" and "v modeled".

5.1.3. c

Suppose more measurements of the object became available:

  1. t=1.6, v=29
  2. t=1.8, v=33
  3. t=2.0, v=36

What calculations would you make to validate the model v(t) = vo + at? Make these calculations and then state why the equations are or are not a correct representation of the measurements.

6. Questions

6.1. Neuron Model

In the above, we gave a procedure for validating a model. In reality, these steps are rarely explicitly presented, and sometimes steps are left out.

In the paper "Simple Model of Spiking Neurons" by Izhikevich (pdf), simulation results from a model are presented and the author argues that the model reproduces the spiking and bursting observed in cortical neurons. You are not expected to understand many of the science or computational details given in this work. However, you should be able to answer the following basic questions that should be specified in any description of the results from a computational simulation:

  1. What was the mathematical model?
  2. Was the computational model given?
  3. What are the model's adjustable parameters?
  4. Write out any sentences in the paper that you feel are related to validation.

6.2. Dog Shaking Frequency Model

Read the article "Physicists Discover Universal 'Wet-Dog Shake' Rule - How fast should a wet dog rotate its body to dry its fur?" [8]

  • What was the science/conceptual model for how fast a dog shakes?
  • What was the mathematical model for how fast a dog shakes?
  • How was the mathematical model validated?
  • How could a computational model be used to figure out the reason the mathematical model did not match the data?

6.3. Validation in the Wild

In one of your science courses, you have worked with a science model. In the description of the model, was there any discussion of how it was validated? If you can't find a discussion there, do some searching on the web.

Write two or three sentences that describe the model and how the model was validated. If you could not find any discussion of validation, describe the research that you did in an attempt to find a model with a discussion of validation.

6.4. Reading

  • Read the two papers (pdf | pdf) (only read the first two pages of each and the conclusions of the neuron paper, you can skim the rest)
  • You are not expected to understand many of the science or computational details given in this work. However, you should be able to answer the following basic questions that should be specified in any description of the results from a computational simulation and be prepared to discuss:
    • What is the conceptual/science model?
    • What is the mathematical model?
    • What is the computational model?
    • How many adjustable parameters does each model have?
    • How was the model verified?
    • How was the model validated?
    • Suggest your own verification test
    • Suggest your own validation test

7. References

  • A discussion of Verification and Validation in the context of Computational Fluid Dynamics (the study of the behavior of liquids and gases with a computer): [9].
  • Validation and Verification in the context of models of social behavior: [10]
  • Book: "Verification and validation of complex systems: human factors issues" [11]