3,554,162 events, 1,584,593 push events, 2,807,787 commit messages, 195,033,451 characters
Buttons sense clicks
Talkin' away... I don't know what I'm to say, I'll say it anyway... today is not my day to find you. Shyin' away... I'll be comin' for your love okay... TAAAAKE OOON MEEEEEEEEE (take on me) TAAAAAKE MEEEEE OOOOONN (take one me) I'LLL BEEE GOOOONE IN A DAY OR TWOOOOOOO
Add files via upload
Description:
Paul is an excellent coder and sits high on the CW leaderboard. He solves kata like a banshee but would also like to lead a normal life, with other activities. But he just can't stop solving all the kata!!
Given an array (x) you need to calculate the Paul Misery Score. The values are worth the following points:
- kata = 5
- Petes kata = 10
- life = 0
- eating = 1
The Misery Score is the total points gained from the array. Once you have the total, return as follows:
- total < 40: 'Super happy!'
- 40 <= total < 70: 'Happy!'
- 70 <= total < 100: 'Sad!'
- total >= 100: 'Miserable!'
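A minimal Python sketch of that scoring (my own; the kata's reference solution may differ):

def paul_misery_score(x):
    points = {'kata': 5, 'Petes kata': 10, 'life': 0, 'eating': 1}
    total = sum(points.get(item, 0) for item in x)
    if total < 40:
        return 'Super happy!'
    if total < 70:
        return 'Happy!'
    if total < 100:
        return 'Sad!'
    return 'Miserable!'

print(paul_misery_score(['kata'] * 10 + ['eating']))  # 51 points -> 'Happy!'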
Hello, world!
Printing something now. Wheeeee.
This has been clarifying with respect to two things about Hare's relationship to C. Firstly, the array type mumbo jumbo explained in the comment in type.lisp. Secondly, I have a new understanding of C global variables and constness.
To sum up, this is legal:
#include <stdio.h>

const char *hello = "Hello, world!";

int f() {
    hello = "Fuck you!";
    return puts(hello);
}
the "const" refers to "char", not the pointer itself, or in other words, "const char*" means "pointer to const char". The string itself can't be altered (e.g. hello[0] = 'c' is illegal) but hello itself is just a variable. In a normal C implementation this means that there is a sort of implicit value of type const char**, call it mhello; source references to "hello" are really "*mhello".
This is not how Hare is going to work, because variables are not modifiable. If a variable is initialized with "Hello", it has type pointer to array, and you can't assign a new array. (If a variable is initialized with an integer for example, it IS modifiable, by writing into the int*.)
[CHG] core, web: deprecate t-raw
Add a big fat warning when the qweb compiler finds a t-raw. t-esc should now be used everywhere; the use-case for t-raw should be handled by converting the corresponding values to Markup objects. Even though it's convenient, this constructor should never be made available in the qweb rendering context (maybe that should be checked for explicitly?).
Replace werkzeug.escape by markupsafe.escape in odoo.tools.html_escape; this means the output of html_escape is markup-safe.
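In effect (a sketch of the idea, not the literal Odoo diff):

import markupsafe

# odoo.tools.html_escape now delegates to markupsafe, so callers get back
# a Markup instance rather than a plain escaped str.
html_escape = markupsafe.escape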
Updated qweb to work correctly with escaping and Markup; amongst other things QWeb bodies should be markup-safe internally (so that a t-set value can be fed into a t-esc). See at the bottom for the attributes handling as it's a bit complicated.
to_text needed updating: markupsafe.Markup is a subclass of str, but str is not a passthrough for strings. So Markup instances going through would be converted to normal str, losing their safety flag. Since qweb internally uses to_text on pretty much everything (in order to handle None / False), this would then cause almost every Markup to get mistakenly double-escaped.
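To illustrate the failure mode with plain markupsafe (standalone example, not the qweb code):

from markupsafe import Markup, escape

safe = Markup("<b>bold</b>")   # already markup-safe content
print(escape(safe))            # left alone: escape() respects the Markup flag
print(escape(str(safe)))       # str() drops the flag, so it gets escaped a second time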
Also mark a bunch of APIs as markup-safe by default:
- html_sanitize output.
- HTML fields content: sanitization is applied on intake (so stripped by the trip through the database), and if the field is unsanitised the injection is very much intentional, probably. Note: this includes automatically decoding bytes, as a number of default values & computes yield bytes, which Markup will happily accept... by repr-ing them, which is useless. This is hard to notice without -b.
- Script-safe json, it's rather the point (though it uses a non-standard escaping scheme).
- nl2br, kinda: it should work correctly whether or not the input is markup-safe; this means we should not need to escape values fed to nl2br, but it doesn't hurt either.
Update some qweb field serialisations to mark their output as markup-safe when necessary (e.g. monetary, barcode, contact). Otherwise either using proper escaping internally or doing nothing should do the trick.
Also update qweb to return markup-safe bytes: we want qweb to return markup-safe contents, as a common use-case is to render something with one template and inject its content in another one (with Python code in between, as t-call works a bit differently and does not go through the external rendering interface).
However qweb returns bytes while Markup extends str. After a quick experiment with changing qweb rendering to return str (rather unmitigated failure I fear), it looks like the safest tack is to add a somewhat similar bytes-based type, which decodes to a Markup but keeps to bytes semantics.
For debugging and convenience reasons, MarkupSafeBytes does not stringify and raises an error instead (__repr__ works fine). This is to avoid implicit stringifications which do the wrong thing (namely create a string "b'foo'").
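A minimal sketch of such a type (only the name MarkupSafeBytes comes from this commit; the real implementation may differ):

from markupsafe import Markup

class MarkupSafeBytes(bytes):
    # Bytes that decode to Markup but otherwise keep bytes semantics.

    def decode(self, encoding="utf-8", errors="strict"):
        # Decoding yields markup-safe text rather than a plain str.
        return Markup(super().decode(encoding, errors))

    def __str__(self):
        # Refuse implicit stringification, which would otherwise produce "b'foo'".
        raise TypeError("MarkupSafeBytes does not stringify; decode() it instead")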
Also add some configuration around BytesWarning (which still has to be enabled at the interpreter level via -b, there's no way to enable it programmatically smh), and monkeypatch showwarning to show warning tracebacks, as it's common for warnings to be triggered in the bowels of the application, and hard to relate to business logic without the complete traceback.
There are a few issues with respect to attributes. The first issue is that markup-safe content is not necessarily attributes-safe, e.g. markup-safe content can contain unescaped < or double-quotes while attributes can not. So we must forcefully escape the input, even if it's supposedly markup-safe already.
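For instance (plain markupsafe, not the qweb serializer):

from markupsafe import Markup, escape

value = Markup('a < b, class="x"')   # markup-safe, but contains a raw < and quotes
attr = escape(str(value))            # force escaping even though value is Markup
print('<div data-x="%s"/>' % attr)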
This causes a problem for script-safe JSON: it's markup-safe but really does its own thing. So instead of escaping it up-front and wrapping it in Markup, make script-safe JSON its own type which applies JSON-escaping during the __html__ call.
This way if a script-safe JSON object goes through markupsafe.escape we'll apply script-safe escaping, otherwise it'll be treated as a regular string and eventually escaped the normal way.
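A rough sketch of the idea (names are mine, not Odoo's exact implementation):

import json
from markupsafe import Markup

class ScriptSafeJSON(str):
    # JSON text that applies script-safe escaping only when used as markup.

    def __new__(cls, value):
        return super().__new__(cls, json.dumps(value))

    def __html__(self):
        # markupsafe.escape() calls __html__ when present, so the
        # script-safe escaping happens here rather than up-front.
        out = self.replace("&", "\\u0026").replace("<", "\\u003c").replace(">", "\\u003e")
        return Markup(out)

Going through markupsafe.escape thus yields the script-safe form, while any other path just sees a regular JSON string.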
A second issue was the processing of format-valued attributes (t-attf): literal segments should always be markup-safe, while non-literal ones may or may not be. This turns out to be an issue if the non-literal segment is markup-safe: in that case, when the literal and non-literal segments get concatenated, the literal segments will get escaped, then attribute serialization will escape them again, leading to doubly-escaped content in attributes.
The most visible instance of this was the snippet_options template, specifically:
<t t-set="so_content_addition_selector" t-translation="off">blockquote, ...</t>
<div id="so_content_addition"
t-att-data-selector="so_content_addition_selector"
t-attf-data-drop-near="p, h1, h2, h3, .row > div > img, #{so_content_addition_selector}"
data-drop-in=".content, nav"/>
Here so_content_addition_selector is a qweb body, therefore markup-safe. When concatenated with the literal part of t-attf-data-drop-near, it would cause the HTML-escaping of that literal part, yielding a new Markup object. Normal attributes processing would then strip the markup flag (using str()) and escape it again, leading to doubly-escaped literals.
The original hack around was to unescape() Markup content before stringifying it and escaping it again, in the attribute serialization method (_append_attributes).
That's pretty disgusting; after some more consideration & testing it looks like a much better and safer fix is to ensure the expression (non-literal) segments of format strings always result in str, never Markup, which is easy enough: just call str() on the output of strexpr. We could also have concatenated all the bits using ''.join instead of repeated concatenation (+).
Also add a check on the type of the format string for safety. I think it should always be a proper str, and the bytes thing only happens when running in py2 (where lxml uses bytestrings as a space optimization for ascii-only values), but it should not hurt too much to perform a single typecheck assertion on the value... instead of performing one per literal segment.
Note: we may need to implement unescape anyway, because it's still possible to get double-escaping with the current scheme: given an explicitly escape-ed foo and t-att-foo="foo", foo will be re-escaped.
Merge #62728 #63073 #63214
62728: roachtest: don't call t.Fatal in FailOnReplicaDivergence r=erikgrinaker a=tbg
In #61990 we had this method catch a stats divergence on teardown in an otherwise successful test. The call to t.Fatal in that method unfortunately prevented the logs from being collected, which is not helpful.
Release note: None
63073: roachtest: retire bank tests r=nvanbenschoten,andreimatei a=tbg
The bank roachtests are a basic form of Jepsen test. We were hoping that they could provide additional benefit due to being less complex than a full Jepsen test. In my opinion, this calculus has not worked out. We've spent many hours staring at this test deadlocked when it wasn't CockroachDB's fault at all. Besides, it's not adding anything meaningful on top of our Jepsen tests plus KVNemesis plus the TPCC invariant checks, so we are removing the bank family of tests here.
Should we want to reintroduce these tests, we should do it at the level of TestCluster, or at least augment the bank workload to incorporate these invariant checks instead.
Closes #62754. Closes #62749. Closes #53871.
Release note: None
63214: kvserver: reduce ReplicaGCQueueInactivityThreshold to 12 hours r=ajwerner a=erikgrinaker
ReplicaGCQueueInactivityThreshold specifies the interval at which the
GC queue checks whether a replica has been removed from the canonical
range descriptor, and was set to 10 days. This is a fallback for when we
fail to detect the removal and GC the replica immediately. However, this
could occasionally cause stale replicas to linger for 10 days, which
surprises users (e.g. by causing alerts if the stale replica thinks the
range has become underreplicated).
This patch reduces the threshold to 12 hours, which is a more reasonable timeframe for users to expect things to "sort themselves out". The operation to read the range descriptor is fairly cheap, so this is not likely to cause any problems, and the interval is therefore not jittered either.
Resolves #63212, touches #60259.
Release note (ops change): Replica garbage collection now checks replicas against the range descriptor every 12 hours (down from 10 days) to see if they should be removed. Replicas that fail to notice they have been removed from a range will therefore linger for at most 12 hours rather than 10 days.
Co-authored-by: Tobias Grieger [email protected] Co-authored-by: Tobias Schottdorf [email protected] Co-authored-by: Erik Grinaker [email protected]
"10:10am. I got little sleep last night. It was one of those days where I lie in bed and instead of falling asleep start thinking even harder. But I did resolve some things. My regrets - though my rational mind says different, they all stem because I believe in outdated values of self improvement. To be more precise, though I figured out the self improvement loop, my rational mind is in conflict with what I wanted as a kid.
Eventually, I am going to overcome it and completely internalize this new way of thinking.
Rationally, there is no way a path to power would ever care about past failures or victories. It is all about the future. That I do care is proof that I am half-normie and that my drive is half posturing. As a kid, I believed that a real man should always keep winning, but as I lived, I was constantly forced to make compromises in order to cultivate the actual path.
Me becoming a programmer was more of a desperation move than a premeditated choice.
I do regret that being a 'real' man is not only emotional manipulation from the outside, but it is outright incompatible with a lot of requirements of the actual path. Trying to live up to wrong expectations just results in meaningless suffering.
I am good in this respect to some amount compared to other people, but I have weaknesses I have to resolve.
Ultimately, even I myself do not have a perfect image of how the post human should behave. I do not buy the rational übermensch image of characters like Fang Yuan completely. It is just too easy to have characters act a certain way and win in fiction.
10:25am. Let me chill a bit here and I will start.
Yesterday I tried out Rance Quest Magnum. At first I had issues with the game screen being cut off near the bottom, which I fixed by modifying the text scaling in the Windows settings, but after that I was surprised by how much fun I was having. Rance's chad antics put a constant grin on my face and made me laugh. Considering it is a plain jRPG at its core, I thought it would be a lot more boring than it actually is. It is not like I am unilaterally a Rance fan; I actually dropped Rance 4 due to it being boring. And Rance 2 wasn't that interesting even though I finished it. Kichikou and Sengoku are exceptions, and they were both grand strategy RPGs.
10:45am. Maybe I did get some rest last night, even though throughout it felt like I was awake the whole time.
I am not as tired as I expected. Let me start here.
10:50am. I've slacked for long enough. I'll finish today at 6pm so I can get some extra gaming time.
Until then let me just focus on the work in front of me.
10:55am. Let me get some thinking done. I need to decide what concrete steps I should take.
During the night, I've figured out how to parallelize the sample collection. Yesterday I said that CPSing is an absolute requirement, but I do not have enough. I am going to have to turn the game into a union type. Instead of having the game operate on the players, I will make it so that it returns the action nodes with a continuation. In essence I am going to compile the game to a union type.
This will make it easy to have an arbitrary number of instances run in parallel. Right now I can't do that. Even the CPS'd version that I have now does not return a continuation; instead, for the human player, it just calls the dispatcher.
I might have to redesign the whole thing to get this capability, but that is fine. If I did it like this, it would have the benefit of completely generalizing the other two ways.
11am. The reason why I'd want to do this is because GPUs suck for online learning. Running a single sample through the system would be 128 times slower than running 128 of them in parallel. And NN evaluation on the GPU is where the majority of the runtime overhead will be.
When I was running those PG in 2018 on a single sample, yes, I was doing something pretty stupid.
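A toy Python illustration (mine, not the Spiral design) of that shape: each game instance suspends at an action node, and a driver batches all pending requests into a single policy evaluation.

def toy_game(episode_id):
    # Stands in for a CPS'd game that yields action requests instead of
    # calling into the players directly.
    total = 0
    for _step in range(3):
        action = yield ('action_request', episode_id, total)
        total += action
    return total

def run_batch(n_games):
    games = {i: toy_game(i) for i in range(n_games)}
    pending = {i: g.send(None) for i, g in games.items()}   # prime the generators
    results = {}
    while pending:
        # One batched "NN evaluation" answering every outstanding request at once.
        actions = {i: 1 for i in pending}                    # placeholder policy
        next_pending = {}
        for i, act in actions.items():
            try:
                next_pending[i] = games[i].send(act)
            except StopIteration as stop:
                results[i] = stop.value
        pending = next_pending
    return results

print(run_batch(4))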
11:10am. Close down /g/, focus on the task at hand. It is time to get serious here. Though I feel really fatigued, I should give it my best shot for the day as usual. I can't make excuses and start slacking off because of things like my fatigue state.
...
Let me think. Yesterday I did the tabular CFR agent. Today I need to run it.
11:15am. Focus me. I already have the general outline.
What are the specific steps I have to take?
11:20am. The graph is troublesome as it requires a separate non-updating pass with just the average policy being used on both ends.
Nonetheless, let me go for it. Let me do a step of optimization, and a step of showing the value to me.
I should also try implementing linear CFR, since it has been mentioned to give two order of magnitude of speedup.
11:25am. Graph UI, graph UI... That is what I should focus on. Once I decided that I absolutely want to do something, things become easier.
Then instead of thinking about what I want to do in addition to what I have to do, I can just focus on the latter. This gives me the ability to think more deeply about the subject. When one is on the fence, there is always the hesitation of expending too much mental energy on something that is fruitless.
I should do a number of training steps followed by a sampling step.
11:30am. Now I am thinking about GANs. Forget that, focus on the task at hand.
11:45am. Oh yeah, I was wrong about predictive coding being useless. I had an idea where those local targets could be used to make the replay buffer for each individual layer. That would also give me reward scaling invariance without any of the top level layer hacks.
But it is more complicated and not as advantageous as my own scheme on the GPU.
12:25pm. I am thinking about that UI right now. I am narrowing down what my requirements are. I am going to get rid of the kind of random replay buffer that I had in ui_replay.
Instead for trained tabular agents, I should just display the dictionary.
12:30pm. I am not going to bother with that step when it comes to flop poker.
https://gregoryszorc.com/blog/2021/04/06/surprisingly-slow/ https://www.thenewatlantis.com/publications/welcoming-our-new-robot-overlords
Let me stop here. The first article is interesting.
12:35pm. I want to think more. I am not eager to start just yet. After running the task through my mind enough times, I'll build up the motivation where I can't stop myself from starting.
One of the first things I will have to do is master some new widgets. For the next part, buttons and labels won't cut it. I am going to have to do some experimentation on how to do tabs, and also those tree-view-like controls. I need to master showing and hiding widgets.
I've figured out how the recycle view works. Along with the charts, it is time to bring those lessons to bear."
weblux seriously sucks for security unless you're doing cookie-cutter httpbasic and oauth2 with little customization. if you don't want to be forced into their obsession with oauth2 - the project reactor folks say FUCK YOU
"2:10pm. Done with breakfast and chores. Let me finish the second article and I will do some work.
It is my life I am spending here, so I want to go fast, a lot faster than this. But there is no choice but to be patient and thorough. Trying to sprint through this would just get me lost. I am already going as fast as I can.
2:40pm. Let me resume instead of reading articles on the Rance wiki.
Ok, first of all, how do I do tabs in Kivy? Then how do I do articles that can be opened and closed? As the UIs get more complex, beyond what can fit on a single page, that becomes necessary.
https://kivy.org/doc/stable/api-kivy.uix.tabbedpanel.html
Oh, interesting. Here is the tab.
https://kivy.org/doc/stable/api-kivy.uix.treeview.html
Oh, I said I wanted something like a tree view, but looking at this, a tree view is exactly what I need.
https://kivy.org/doc/stable/api-kivy.uix.actionbar.html
Let me do a little browse of the uix widgets.
https://kivy.org/doc/stable/api-kivy.uix.accordion.html
Ohhhh! This is even better than a tree view for my purpose! Yeah, accordion is what I should try out.
https://kivy.org/doc/stable/api-kivy.uix.bubble.html
Maybe I could use this for displaying validation errors. There was something on validation in the TextInput doc page. I should look into that.
https://kivy.org/doc/stable/api-kivy.uix.filechooser.html
I should keep these in mind. All of these are useful.
https://kivy.org/doc/stable/api-kivy.uix.progressbar.html
This one is something I should have for the sake of keeping track of the iteration count.
https://kivy.org/doc/stable/api-kivy.uix.splitter.html
Oh, it has splitters.
3pm. https://kivy.org/doc/stable/api-kivy.uix.textinput.html
Now let me do this final thing. There was something on validation if I recall. What was it?
3:15pm. Had to take a break. Let me start.
I will need the tab, the accordion and the progress bar.
It is time to start playing with each in turn. I'll also want a text input box that allows only numbers.
Hmmmm...speaking of only numbers, Spiral has fixed size integers and Python does not. I am going to have to do a little check to make sure it is not more than the u64 max.
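A minimal Kivy sketch (my own; class names are hypothetical) of a digits-only input that flags values above the u64 maximum:

from kivy.app import App
from kivy.uix.textinput import TextInput

U64_MAX = 2 ** 64 - 1

class U64Input(TextInput):
    def __init__(self, **kwargs):
        # input_filter='int' restricts typing to integer characters.
        super().__init__(multiline=False, input_filter='int', **kwargs)
        self.bind(text=self._check_range)

    def _check_range(self, _instance, value):
        # Tint the background when the value is not a valid u64.
        try:
            ok = value == '' or 0 <= int(value) <= U64_MAX
        except ValueError:
            ok = False
        self.background_color = (1, 1, 1, 1) if ok else (1, 0.6, 0.6, 1)

class DemoApp(App):
    def build(self):
        return U64Input(hint_text='u64 value')

if __name__ == '__main__':
    DemoApp().run()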
3:25pm. So harsh, so hard. Life is that way. But I must keep going forward. I have to dedicate myself to doing this UI.
https://kivy.org/doc/stable/api-kivy.uix.dropdown.html https://kivy.org/doc/stable/api-kivy.uix.spinner.html
I'll use this for picking between player one and player two for humans.
Agents should train against themselves, but I'll also want to pit agents of different stripes against each other.
I will want to pit CFR+ agents against the sampling based ones.
3:30pm. When it comes to doing MC CFR let me just take a look at something.
let f alpha x = x + alpha * (2.0 - x)
let g = f 0.1
g (g (g 1.0))
0.9 ** 3.0
Ohhh, this is wonderful. This is exactly how I should rescale the alpha.
That is...
let f alpha x = x + alpha * (2.0 - x)
let g = f 0.1
let g' n = f (1.0 - (1.0 - 0.1) ** n)
let (=.) a b = abs (a - b) <= 0.000001
g (g (g 1.0)) =. g' 3.0 1.0
This is what I meant.
Unlike with regrets, I cannot just rescale them willy-nilly by dividing by the current player probability. That would unhinge the targets. But what I can do is simulate the effect of doing the application numerous times.
...No, actually this is not good.
3:45pm. Yeah, the moving average does exactly what it is supposed to, but I want the weighted updates to be decayed differently than normal.
Yeah, a weighted moving average is what I need to update the Q values in the sampling scheme.
Unlike in the regular one, I can't just decay the tail value. I am going to have to keep not only the sum of the values, but also the sum of the weights.
The tail will then be sum values / sum weights. The value update will be ((1-a) * sum values * sum weights + a * new_value * new_weight) / (1-a)
...
Damn it, this is complicated. Let me just say it is...(1-a) * x + a * y.
inl (+) a b = {value=a.value*a.weight + b.value*b.weight; weight=a.weight + b.weight}
It is somewhat similar to adding fractions.
inl (*) a v = {a with value#=(*) v; weight#=(*) v}
This is how I will define the addition of two weighted floats, and a float and a regular one.
I could also make the update be...
x + a * (y - x)
Yes, this is what I should do. Instead of the usual Q values being floats, they should be these weighted floats. That will make things smooth.
4:05pm. Doing the exponential rescaling of the moving average would make the Q values sensitive to reward scaling. The above weight float scheme is what would get me the behavior if I had used a replay buffer. It is a more rational way of keeping track of data.
Ok, good.
I don't want the learning rates to vary extremely for the deep nodes compared to the outer ones. If they end up being too high for the deep nodes, that will defeat the point of using moving averages in the first place, every single data point will decay the old one completely. That is not what I want.
What I am talking about here is something I will need for tabular MC CFR. The deep one will just use a weighted replay buffer with the norm magnitude trick.
...Ah, no wait this is wrong.
inl weight value weight = {weight value=value*weight}
inl (+) a b = {value=a.value + b.value; weight=a.weight + b.weight}
inl unweight {value weight} = if weight = 0 then 0 else value / weight
This is how I should define the operations.
What about the numerical stability of this...I wonder...
Actually, it should do fine. Let me not worry about this too much...
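A small Python rendering of that bookkeeping (my own sketch; the Spiral version will differ):

from dataclasses import dataclass

@dataclass
class Weighted:
    value: float   # already multiplied by its weight
    weight: float

    def unweight(self):
        return 0.0 if self.weight == 0 else self.value / self.weight

def weigh(value, weight):
    return Weighted(value * weight, weight)

def update(x, y, a):
    # x + a * (y - x), applied componentwise to the weighted representation.
    return Weighted(x.value + a * (y.value - x.value),
                    x.weight + a * (y.weight - x.weight))

q = weigh(1.0, 1.0)
q = update(q, weigh(2.0, 0.5), 0.1)
print(q.unweight())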
4:15pm. Ok, now I should have enough insight to implement the sampling Q update. This is not what is in the paper, but it is what I should be doing.
Now what is next?
Do I finally want to start work on the UI, or do I want to think about something else?
4:30pm. I am thinking about the value update.
Let me code it up.
let f alpha x = x + (2.0 - x) / alpha
List.map (fun x -> f x 1.0 * x) [0.0..0.02..1.0]
val it : float list =
[nan; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
I've been trying to run this through my mind, and looking at this ahead of time, the result is not at all what I expected.
I was sure that at some point the value would go beyond 2, but this is a more straightforward moving average than I expected.
If I was not aiming for division how would I implement this?
let f alpha x = x + (2.0 - x) / alpha
let g alpha x = x + alpha * (2.0 - x)
let a = List.map (fun alpha -> f alpha 1.0 * alpha) [0.0..0.02..1.0]
let b = List.map (fun alpha -> g alpha 1.0) [0.0..0.02..1.0]
val f : alpha:float -> x:float -> float
val g : alpha:float -> x:float -> float
val a : float list =
[nan; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
val b : float list =
[1.0; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
Wow, I am really dumb not to realize these are the same. My off-the-cuff calculations were completely wrong.
(x + (2.0 - x) / alpha) * alpha
alpha * x + 2.0 - x
(1.0 - alpha) * x + 2.0
Just how is this possible? They can't be the same mathematically.
let y = 2.0
let x = 1.0
let f alpha = x + (y - x) / alpha
let g alpha = x + alpha * (y - x)
let g' alpha = (1.0 - alpha) * x + alpha * y
let a = List.map (fun alpha -> f alpha * alpha) [0.0..0.02..1.0]
let b = List.map (fun alpha -> g alpha) [0.0..0.02..1.0]
let b' = List.map (fun alpha -> g' alpha) [0.0..0.02..1.0]
val y : float = 2.0
val x : float = 1.0
val f : alpha:float -> float
val g : alpha:float -> float
val g' : alpha:float -> float
val a : float list =
[nan; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
val b : float list =
[1.0; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
val b' : float list =
[1.0; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
The code can't be wrong. My mental calculations have to be.
(x + (2.0 - x) / alpha) * alpha
alpha * x + 2.0 - x
(1.0 - alpha) * x + 2.0
Did I miss a reduction step in there somewhere?
(x + (2.0 - x) / alpha) * alpha
alpha * x + 2.0 - x
(alpha - 1.0) * x + 2.0
Yes, I missed one. Then...
let r = List.map (fun alpha -> (alpha - 1.0) * x + 2.0) [0.0..0.02..1.0]
val r : float list =
[1.0; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
So this is right. Then how do I...
let y = 3.0
let x = 1.0
let l = [0.0..0.02..1.0]
let a = List.map (fun alpha -> (x + (y - x) / alpha) * alpha) l
let b = List.map (fun alpha -> x + alpha * (y - x)) l
let b' = List.map (fun alpha -> (1.0 - alpha) * x + alpha * y) l
let r = List.map (fun alpha -> (alpha - 1.0) * x + y) l
val a : float list =
[nan; 2.02; 2.04; 2.06; 2.08; 2.1; 2.12; 2.14; 2.16; 2.18; 2.2; 2.22; 2.24;
2.26; 2.28; 2.3; 2.32; 2.34; 2.36; 2.38; 2.4; 2.42; 2.44; 2.46; 2.48; 2.5;
2.52; 2.54; 2.56; 2.58; 2.6; 2.62; 2.64; 2.66; 2.68; 2.7; 2.72; 2.74; 2.76;
2.78; 2.8; 2.82; 2.84; 2.86; 2.88; 2.9; 2.92; 2.94; 2.96; 2.98; 3.0]
val b : float list =
[1.0; 1.04; 1.08; 1.12; 1.16; 1.2; 1.24; 1.28; 1.32; 1.36; 1.4; 1.44; 1.48;
1.52; 1.56; 1.6; 1.64; 1.68; 1.72; 1.76; 1.8; 1.84; 1.88; 1.92; 1.96; 2.0;
2.04; 2.08; 2.12; 2.16; 2.2; 2.24; 2.28; 2.32; 2.36; 2.4; 2.44; 2.48; 2.52;
2.56; 2.6; 2.64; 2.68; 2.72; 2.76; 2.8; 2.84; 2.88; 2.92; 2.96; 3.0]
val b' : float list =
[1.0; 1.04; 1.08; 1.12; 1.16; 1.2; 1.24; 1.28; 1.32; 1.36; 1.4; 1.44; 1.48;
1.52; 1.56; 1.6; 1.64; 1.68; 1.72; 1.76; 1.8; 1.84; 1.88; 1.92; 1.96; 2.0;
2.04; 2.08; 2.12; 2.16; 2.2; 2.24; 2.28; 2.32; 2.36; 2.4; 2.44; 2.48; 2.52;
2.56; 2.6; 2.64; 2.68; 2.72; 2.76; 2.8; 2.84; 2.88; 2.92; 2.96; 3.0]
val r : float list =
[2.0; 2.02; 2.04; 2.06; 2.08; 2.1; 2.12; 2.14; 2.16; 2.18; 2.2; 2.22; 2.24;
2.26; 2.28; 2.3; 2.32; 2.34; 2.36; 2.38; 2.4; 2.42; 2.44; 2.46; 2.48; 2.5;
2.52; 2.54; 2.56; 2.58; 2.6; 2.62; 2.64; 2.66; 2.68; 2.7; 2.72; 2.74; 2.76;
2.78; 2.8; 2.82; 2.84; 2.86; 2.88; 2.9; 2.92; 2.94; 2.96; 2.98; 3.0]
No, there are differences between this and the original.
let y = 6.0
let x = 1.5
let l = [0.0..0.02..1.0]
let a = List.map (fun alpha -> (1.0 - alpha) * x + alpha * y) l
let b = List.map (fun alpha -> (alpha - 1.0) * x + y) l
val a : float list =
[1.5; 1.59; 1.68; 1.77; 1.86; 1.95; 2.04; 2.13; 2.22; 2.31; 2.4; 2.49; 2.58;
2.67; 2.76; 2.85; 2.94; 3.03; 3.12; 3.21; 3.3; 3.39; 3.48; 3.57; 3.66; 3.75;
3.84; 3.93; 4.02; 4.11; 4.2; 4.29; 4.38; 4.47; 4.56; 4.65; 4.74; 4.83; 4.92;
5.01; 5.1; 5.19; 5.28; 5.37; 5.46; 5.55; 5.64; 5.73; 5.82; 5.91; 6.0]
val b : float list =
[4.5; 4.53; 4.56; 4.59; 4.62; 4.65; 4.68; 4.71; 4.74; 4.77; 4.8; 4.83; 4.86;
4.89; 4.92; 4.95; 4.98; 5.01; 5.04; 5.07; 5.1; 5.13; 5.16; 5.19; 5.22; 5.25;
5.28; 5.31; 5.34; 5.37; 5.4; 5.43; 5.46; 5.49; 5.52; 5.55; 5.58; 5.61; 5.64;
5.67; 5.7; 5.73; 5.76; 5.79; 5.82; 5.85; 5.88; 5.91; 5.94; 5.97; 6.0]
5:05pm. Had to do some chores.
But basically, with this now I know what the update is doing.
let y = 6.0
let x = 6.0
let l = [0.0..0.02..1.0]
let a = List.map (fun alpha -> (1.0 - alpha) * x + alpha * y) l
let b = List.map (fun alpha -> (alpha - 1.0) * x + y) l
val a : float list =
[6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
6.0; 6.0; 6.0; 6.0; 6.0; 6.0]
val b : float list =
[0.0; 0.12; 0.24; 0.36; 0.48; 0.6; 0.72; 0.84; 0.96; 1.08; 1.2; 1.32; 1.44;
1.56; 1.68; 1.8; 1.92; 2.04; 2.16; 2.28; 2.4; 2.52; 2.64; 2.76; 2.88; 3.0;
3.12; 3.24; 3.36; 3.48; 3.6; 3.72; 3.84; 3.96; 4.08; 4.2; 4.32; 4.44; 4.56;
4.68; 4.8; 4.92; 5.04; 5.16; 5.28; 5.4; 5.52; 5.64; 5.76; 5.88; 6.0]
This value for b makes not even a lick of sense to me. Is this how it really was in the paper?
Yes, it was. Keep in mind these are supposed to be Q surrogates.
let y = 6.0
let x = 0.0
let l = [0.0..0.02..1.0]
let a = List.map (fun alpha -> (1.0 - alpha) * x + alpha * y) l
let b = List.map (fun alpha -> (alpha - 1.0) * x + y) l
val a : float list =
[0.0; 0.12; 0.24; 0.36; 0.48; 0.6; 0.72; 0.84; 0.96; 1.08; 1.2; 1.32; 1.44;
1.56; 1.68; 1.8; 1.92; 2.04; 2.16; 2.28; 2.4; 2.52; 2.64; 2.76; 2.88; 3.0;
3.12; 3.24; 3.36; 3.48; 3.6; 3.72; 3.84; 3.96; 4.08; 4.2; 4.32; 4.44; 4.56;
4.68; 4.8; 4.92; 5.04; 5.16; 5.28; 5.4; 5.52; 5.64; 5.76; 5.88; 6.0]
val b : float list =
[6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
6.0; 6.0; 6.0; 6.0; 6.0; 6.0]
Wow, this is crazy. The paper has to be wrong. VR MC CFR's bootstrapping rule is complete nonsense.
5:15pm. No, I absolutely can't use it. But I do believe in the general idea of how bootstrapping is done here.
What I am going to replace it with is simply letting the value from above fall through.
let c = List.map (fun alpha -> alpha * y) l
There is no need to think about what the original estimate is given that I have a much more accurate one based on the trace.
Let me read the original VR MCCFR paper.
5:50pm. No, the ablation study in the original paper does not depend on the rule I find troubling. I can replace it. State-action dependence just makes sure that values are multiplied by the policy probs before propagating. And bootstrapping is just bootstrapping from nearby nodes.
Damn it. This is why doing ML is so stressful. You can't just trust what you read in the papers outright. Even good results might have nonsensical algorithmic decisions that happen to work due to the vagaries of optimization.
I am definitely going to make all of this work.
5:55pm. I should have been working on UIs rather than obsessing about the sampling algorithm, but it is worth it.
It does not feel like 2018 anymore. It feels like all the ideas are coming together for me, and my skill level is steadily rising. My actual understanding itself is going up.
My focus is different from 2018. Back then I was laser focused on credit assignment and backpropagation itself. Right now, I am just thinking of how to deal with the replay buffer in the most efficient manner. With a different approach, my results are different.
6:05pm. I need to have courage and do what I think is right. If it fails to work for me then I will go with the speculative approach in the paper itself.
I can't apply random rules I find in the wild if I want to be successful long term. I need to optimize for beauty, elegance and understanding. If ML were a mature field, it would make sense to give authors the benefit of the doubt, but not when I have a sense for how things could be better; when my own translation of a parent enumerative algorithm to the sampling regime is pushing me in a different direction, that direction is the one I should take.
I must follow my own way even if it leads to my destruction.
In the worst case, I will waste a few extra months before giving up on my own stuff and trying the things in the paper.
Forget about this.
Tonight, I am going to get some proper sleep and ready myself for the work on the UIs. Today I just could not get going.
Tomorrow, I am absolutely going to sketch out that UI at least, and then get the tabular enumerative CFR to work. Then I will do the sampling CFR."
fixed non-constant var
also i changed the name again because fuck you
6Blocka > All niggas
You fucking nigger report me ban me fuck you