Upload misc documentation files
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
ProjectResilience[[:space:]]Overview[[:space:]]LF\[27\].pdf filter=lfs diff=lfs merge=lfs -text
|
ProjectResilience Overview LF[27].pdf
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d900140a203f124b7dd9c20494741d50b6ad7507cede6eddf6f58aafff2cbe3e
|
3 |
+
size 8608823
|
data_requirements.md
ADDED
@@ -0,0 +1,91 @@
|
1 |
+
# Project Resilience Data Requirements and Tips
|
2 |
+
|
3 |
+
## Format and Features
|
4 |
+
|
5 |
+
Data features in each row of the data should include columns that can be cast
|
6 |
+
as **Context**, **Actions** and **Outcomes** of a decision pertaining to the
|
7 |
+
unit of decision-making.
|
8 |
+
|
9 |
+
For example, if the problem is carbon emissions decisions per power plant:
|
10 |
+
- the unit of decision-making is a power plant, so each row should represent
|
11 |
+
a decision made for a power plant
|
12 |
+
- Context features are features about the plant that can't be changed
|
13 |
+
(e.g., location, weather, reactor type)
|
14 |
+
- Actions are policies for the plant that can be changed within reasonable
|
15 |
+
time so that the effect can be observed and associated to the action
|
16 |
+
(e.g., generator setup config, carbon capture level, change in generation
|
17 |
+
hours)
|
18 |
+
- Outcomes are quantifiable values that can be attributed to a single plant
|
19 |
+
within a reasonable lag (e.g., carbon emissions, cost of actions, energy
|
20 |
+
output)
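
As a minimal sketch of the row format described above, the snippet below splits one flat row into its Context, Actions, and Outcomes parts. The column names and values are hypothetical, chosen only to mirror the power plant example; a real dataset will have its own.

```python
from dataclasses import dataclass

# Hypothetical column names for the power plant example.
CONTEXT_COLS = ["location", "weather", "reactor_type"]
ACTION_COLS = ["generator_config", "carbon_capture_level", "generation_hours_change"]
OUTCOME_COLS = ["carbon_emissions", "cost_of_actions", "energy_output"]


@dataclass
class Decision:
    """One row: a single decision made for one power plant."""
    context: dict
    actions: dict
    outcomes: dict


def cast_row(row: dict) -> Decision:
    """Split a flat row into its Context, Actions, and Outcomes parts."""
    return Decision(
        context={c: row[c] for c in CONTEXT_COLS},
        actions={c: row[c] for c in ACTION_COLS},
        outcomes={c: row[c] for c in OUTCOME_COLS},
    )


# Made-up values, for illustration only.
row = {
    "location": "coastal", "weather": "mild", "reactor_type": "type_x",
    "generator_config": "B", "carbon_capture_level": 0.3,
    "generation_hours_change": -2,
    "carbon_emissions": 120.0, "cost_of_actions": 15.5, "energy_output": 480.0,
}
decision = cast_row(row)
```

If a column cannot be placed in exactly one of the three lists, that is usually a sign that the unit of decision-making needs to be revisited.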
|
21 |
+
|
22 |
+
## Predictability
|
23 |
+
|
24 |
+
We need some a priori theory of why/how Actions could affect Outcomes,
|
25 |
+
and why we should expect prediction of Outcomes to be easier from
|
26 |
+
Context/Actions rather than from Context alone. A human being should,
|
27 |
+
just by looking at the context / action data, be able to predict more or less
|
28 |
+
what the outcome should be, or at least be able to reason about it.
|
29 |
+
Alternatively, a basic predictor model mapping Context/Actions to Outcomes
|
30 |
+
should be able to show that it uses the Actions to make predictions better
|
31 |
+
than with Context alone. This simple predictor model does not need to use
|
32 |
+
the full data or input/output spaces; it just needs to make it clear that
|
33 |
+
the Actions carry predictive signal.
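
One way to run the basic-predictor check is sketched below. It is an illustration under stated assumptions, not a prescribed method: the data is synthetic, and the "predictor" is just a table of group means, evaluated on held-out rows. The point is only to compare the error with Context alone against the error with Context and Actions together.

```python
import random
from collections import defaultdict
from statistics import mean

rng = random.Random(0)


def make_row():
    # Synthetic toy data (an assumption, not real plant data): the outcome
    # depends on both a context value and an action value, plus noise.
    context = rng.choice([0, 1, 2])
    action = rng.choice([0, 1])
    outcome = 10 * context + 5 * action + rng.gauss(0, 1)
    return context, action, outcome


rows = [make_row() for _ in range(300)]
train, test = rows[:200], rows[200:]


def fit_group_means(data, key):
    """Basic predictor: the mean training outcome for each key value."""
    groups = defaultdict(list)
    for row in data:
        groups[key(row)].append(row[2])
    fallback = mean(r[2] for r in data)  # used for keys unseen in training
    means = {k: mean(v) for k, v in groups.items()}
    return lambda row: means.get(key(row), fallback)


def mse(predict, data):
    """Mean squared error of a predictor on held-out rows."""
    return mean((predict(r) - r[2]) ** 2 for r in data)


context_only = fit_group_means(train, key=lambda r: r[0])
context_and_action = fit_group_means(train, key=lambda r: (r[0], r[1]))

mse_context_only = mse(context_only, test)
mse_with_actions = mse(context_and_action, test)
print(f"context only: {mse_context_only:.2f}, with actions: {mse_with_actions:.2f}")
```

If the held-out error does not drop when Actions are added, that suggests the Actions, as recorded, tell us little about the Outcomes.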
|
34 |
+
|
35 |
+
## Rules of Thumb
|
36 |
+
|
37 |
+
### Time-series
|
38 |
+
|
39 |
+
Either (1) we have an outcome value at each time step, in which case the row
|
40 |
+
should indicate the time step; or (2) we have an outcome value only at
|
41 |
+
particular time steps (e.g., if we have daily power plant CO2 output,
|
42 |
+
but only monthly cost reports). In any case, if there are time steps which
|
43 |
+
are missing some values (context, action, or outcome), it's ok if they are
|
44 |
+
NA in the row for that time step: we can still construct time series to train
|
45 |
+
on from this dataset.
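
A small sketch of how such rows can be turned into per-plant time series while keeping NA values in place. The column names (`plant`, `day`, `capture_level`, `co2`, `cost`) and the values are hypothetical, and `None` stands in for NA.

```python
from collections import defaultdict

# Hypothetical rows: daily CO2 readings, with cost reported only on the
# last day shown. Rows may arrive in any order.
rows = [
    {"plant": "A", "day": 1, "capture_level": 0.2, "co2": 101.0, "cost": None},
    {"plant": "A", "day": 3, "capture_level": 0.3, "co2": 95.0, "cost": 40.0},
    {"plant": "A", "day": 2, "capture_level": 0.2, "co2": None, "cost": None},
    {"plant": "B", "day": 1, "capture_level": 0.5, "co2": 80.0, "cost": None},
]


def build_series(rows, unit="plant", time="day"):
    """Group rows by decision unit and order each group by time step."""
    series = defaultdict(list)
    for row in rows:
        series[row[unit]].append(row)
    return {u: sorted(rs, key=lambda r: r[time]) for u, rs in series.items()}


series = build_series(rows)
co2_a = [r["co2"] for r in series["A"]]  # NA entries are kept, not dropped
```

Because every row carries its time step, the ordering can be recovered even when some values are missing.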
|
46 |
+
|
47 |
+
### Missing Data
|
48 |
+
|
49 |
+
To give the project the best chance of success, the amount of missing data
|
50 |
+
should be minimal and/or structured, e.g., we only get cost reports monthly.
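
A quick way to look at how much data is missing, and whether the gaps are structured, is sketched below. The rows are made up: CO2 is present every day and cost only every 30th day, which stands in for a monthly report.

```python
def missing_fraction(rows, columns):
    """Fraction of rows in which each column is NA (None here)."""
    n = len(rows)
    return {c: sum(r[c] is None for r in rows) / n for c in columns}


# Hypothetical 60 daily rows: CO2 always present, cost only every 30th day.
rows = [
    {"day": d, "co2": 100.0 - d, "cost": 40.0 if d % 30 == 0 else None}
    for d in range(1, 61)
]
fractions = missing_fraction(rows, ["co2", "cost"])
reported_days = [r["day"] for r in rows if r["cost"] is not None]
```

Here the cost column is mostly NA, but `reported_days` shows a regular pattern, which is the structured kind of missingness described above.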
|
51 |
+
|
52 |
+
### Data Sufficiency
|
53 |
+
|
54 |
+
Data rows should cover variations of decisions sufficiently, and so, in the
|
55 |
+
case of time-series data, we need historical decision instances that include
|
56 |
+
different actions taken for similar context.
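
A rough coverage check, assuming contexts have already been bucketed so that similar contexts share a key, is to count the distinct actions observed per context. The pairs below are hypothetical.

```python
from collections import defaultdict


def actions_per_context(rows):
    """Map each context value to the set of distinct actions observed for it."""
    seen = defaultdict(set)
    for context, action in rows:
        seen[context].add(action)
    return dict(seen)


# Hypothetical (context, action) pairs.
history = [("gas", "low"), ("gas", "high"), ("gas", "low"), ("coal", "low")]
coverage = actions_per_context(history)

# Contexts where only one action was ever tried give the predictor nothing
# to compare against.
poorly_covered = sorted(c for c, acts in coverage.items() if len(acts) < 2)
```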
|
57 |
+
|
58 |
+
A single row should represent one observation, which includes context,
|
59 |
+
actions, and outcomes for that observation.
|
60 |
+
|
61 |
+
We need enough cases for our predictor to learn something about how Actions
|
62 |
+
affect Outcomes. If we have thousands of samples to begin with, that certainly
|
63 |
+
gives us a better shot. A quick-and-dirty check for correlations between
|
64 |
+
actions and outcomes could be used as a gating function, i.e., the
|
65 |
+
correlation matrix should not look like noise. If it looks like noise,
|
66 |
+
the project may be possible but hard.
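
The quick-and-dirty correlation check could look like the sketch below. The threshold of 0.1 is an arbitrary assumption, the numbers are made up, and Pearson correlation only picks up linear relationships, so a failed gate is a warning rather than a verdict.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    return cov / (sx * sy)


def passes_gate(actions, outcomes, threshold=0.1):
    """Return (passed, matrix); passed is True if any |correlation| >= threshold."""
    matrix = {
        (a, o): pearson(xs, ys)
        for a, xs in actions.items()
        for o, ys in outcomes.items()
    }
    return any(abs(r) >= threshold for r in matrix.values()), matrix


# Made-up numbers: emissions fall as the capture level rises.
actions = {"capture_level": [0.1, 0.2, 0.3, 0.4, 0.5]}
outcomes = {"co2": [100.0, 91.0, 79.0, 72.0, 60.0]}
ok, matrix = passes_gate(actions, outcomes)
```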
|
67 |
+
|
68 |
+
The data requirement grows exponentially with the number of outcome objectives.
|
69 |
+
|
70 |
+
### Consistency
|
71 |
+
|
72 |
+
The same context and actions should result in similar outcomes. Contradictory
|
73 |
+
samples should be minimal. In other words, there should not be too many rows with the same
|
74 |
+
Context and Actions resulting in different Outcomes.
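
Contradicting samples can be surfaced with a simple grouping pass, sketched below. The rows are made up, and the tolerance is an assumption that depends on the outcome's units and noise level.

```python
from collections import defaultdict


def contradictions(rows, tolerance):
    """Return (context, actions) keys whose outcomes differ by more than tolerance."""
    groups = defaultdict(list)
    for context, actions, outcome in rows:
        groups[(context, actions)].append(outcome)
    return {
        key: values
        for key, values in groups.items()
        if max(values) - min(values) > tolerance
    }


# Made-up rows: (context, actions, outcome).
rows = [
    ("gas", "low", 100.0),
    ("gas", "low", 102.0),    # close to the row above: consistent
    ("coal", "high", 80.0),
    ("coal", "high", 140.0),  # same context and actions, very different outcome
]
flagged = contradictions(rows, tolerance=10.0)
```

A few flagged groups are expected in noisy data; a large share of them suggests that an important Context feature is missing.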
|
75 |
+
|
76 |
+
It should be possible to observe the outcome of an action in a reasonable
|
77 |
+
amount of time (e.g., less than 3 months).
|
78 |
+
|
79 |
+
### Availability and Updates to the Data
|
80 |
+
|
81 |
+
As a rule of thumb, the number of new samples should be at least on the order
|
82 |
+
of the problem dimension, that is, (dim(A) + dim(C)) x dim(O), where A, C, and O are the Action, Context, and Outcome spaces. More important
|
83 |
+
than the number of new samples is which data is sampled: one sample in a
|
84 |
+
previously-unknown region of interest may be more useful than thousands
|
85 |
+
in a region we already know well or don't care about. So, if we control
|
86 |
+
which data is sampled, we don't need as much of it.
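
As a worked instance of the rule of thumb, the sketch below counts the example features from the power plant case: 3 context features, 3 actions, and 3 outcomes. The function name is hypothetical, and treating each feature as one dimension is an assumption; encoded categorical features may count for more.

```python
def min_new_samples(dim_actions: int, dim_context: int, dim_outcomes: int) -> int:
    """Rule-of-thumb order of magnitude: (dim(A) + dim(C)) x dim(O)."""
    return (dim_actions + dim_context) * dim_outcomes


# (3 + 3) x 3 for the power plant example features.
n = min_new_samples(dim_actions=3, dim_context=3, dim_outcomes=3)
print(n)  # 18
```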
|
87 |
+
|
88 |
+
### Transparency/Accountability
|
89 |
+
|
90 |
+
Data should come from reliable, trusted, scientific, ethical sources.
|
91 |
+
(e.g., not black boxes or your mom's Facebook surveys).
|
project_resilience_conceptual_architecture.pdf
ADDED
Binary file (179 kB)