< 2021-04-08 >

3,554,162 events, 1,584,593 push events, 2,807,787 commit messages, 195,033,451 characters

Thursday 2021-04-08 01:35:36 by Andrew Wright

Buttons sense clicks

Talkin' away... I don't know what I'm to say, I'll say it anyway... today is not my day to find you. Shyin' away... I'll be comin' for your love okay... TAAAAKE OOON MEEEEEEEEE (take on me) TAAAAAKE MEEEEE OOOOONN (take on me) I'LLL BEEE GOOOONE IN A DAY OR TWOOOOOOO


Thursday 2021-04-08 04:09:39 by Kirill Shvedov

Add files via upload

Description:

Paul is an excellent coder and sits high on the CW leaderboard. He solves kata like a banshee but would also like to lead a normal life, with other activities. But he just can't stop solving all the kata!!

Given an array (x) you need to calculate the Paul Misery Score. The values are worth the following points:

  • kata = 5
  • Petes kata = 10
  • life = 0
  • eating = 1

The Misery Score is the total points gained from the array. Once you have the total, return as follows:

  • < 40 = 'Super happy!'
  • >= 40 and < 70 = 'Happy!'
  • >= 70 and < 100 = 'Sad!'
  • >= 100 = 'Miserable!'
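A minimal Python sketch of the scoring rule described above (the function name is mine, not part of the kata):

def paul_misery(x):
    # point values as listed above
    points = {'kata': 5, 'Petes kata': 10, 'life': 0, 'eating': 1}
    total = sum(points[item] for item in x)
    if total < 40:
        return 'Super happy!'
    if total < 70:
        return 'Happy!'
    if total < 100:
        return 'Sad!'
    return 'Miserable!'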


Thursday 2021-04-08 04:15:00 by Bike

Hello, world!

Printing something now. Wheeeee.

This has been clarifying with respect to two things about Hare's relationship to C. Firstly, the array type mumbo jumbo explained in the comment in type.lisp. Secondly, I have a new understanding of C global variables and constness.

To sum up, this is legal:

#include <stdio.h>

const char* hello = "Hello, world!";

int f() {
    hello = "Fuck you!";
    return puts(hello);
}

the "const" refers to "char", not the pointer itself, or in other words, "const char*" means "pointer to const char". The string itself can't be altered (e.g. hello[0] = 'c' is illegal) but hello itself is just a variable. In a normal C implementation this means that there is a sort of implicit value of type const char**, call it mhello; source references to "hello" are really "*mhello".

This is not how Hare is going to work, because variables are not modifiable. If a variable is initialized with "Hello", it has type pointer to array, and you can't assign a new array. (If a variable is initialized with an integer for example, it IS modifiable, by writing into the int*.)


Thursday 2021-04-08 06:46:47 by Xavier Morel

[CHG] core, web: deprecate t-raw

Add a big fat warning when the qweb compiler finds a t-raw.

t-esc should now be used everywhere; the use-case for t-raw should be handled by converting the corresponding values to Markup objects. Even though it's convenient, this constructor should never be made available in the qweb rendering context (maybe that should be checked for explicitly?).

Replace werkzeug.escape with markupsafe.escape in odoo.tools.html_escape; this means the output of html_escape is markup-safe.

Updated qweb to work correctly with escaping and Markup; amongst other things, QWeb bodies should be markup-safe internally (so that a t-set value can be fed into a t-esc). See the bottom for attributes handling, as it's a bit complicated.

to_text needed updating: markupsafe.Markup is a subclass of str, but str() is not a passthrough for str subclasses. So Markup instances going through would be converted to plain str, losing their safety flag. Since qweb internally uses to_text on pretty much everything (in order to handle None / False), this would then cause almost every Markup to get mistakenly double-escaped.
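As a minimal sketch of the failure mode, assuming a simplified to_text (the real helper lives in qweb; this is just an illustration):

from markupsafe import Markup

def to_text_old(value):
    # old behaviour: None/False become '', everything else goes through str(),
    # which returns a plain str even for str subclasses like Markup
    if value is None or value is False:
        return ''
    return str(value)

def to_text_new(value):
    # passthrough for str subclasses keeps Markup's safety flag intact
    if value is None or value is False:
        return ''
    return value if isinstance(value, str) else str(value)

m = Markup('<b>safe</b>')
print(type(to_text_old(m)))  # <class 'str'>: flag lost, gets double-escaped later
print(type(to_text_new(m)))  # <class 'markupsafe.Markup'>: flag preserved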

Also mark a bunch of APIs as markup-safe by default

  • html_sanitize output.
  • HTML fields content: sanitization is applied on intake (so unsafe content is stripped by the trip through the database), and if the field is unsanitised the injection is very much intentional, probably. Note: this includes automatically decoding bytes, as a number of default values & computes yield bytes which Markup will happily accept... by repr-ing them, which is useless. This is hard to notice without -b.
  • Script-safe json, it's rather the point (though it uses a non-standard escaping scheme).
  • nl2br, kinda: it should work correctly whether or not the input is markup-safe. This means we should not need to escape values fed to nl2br, but it doesn't hurt either.

Update some qweb field serialisations to mark their output as markup-safe when necessary (e.g. monetary, barcode, contact). Otherwise either using proper escaping internally or doing nothing should do the trick.

Also update qweb to return markup-safe bytes: we want qweb to return markup-safe contents, as a common use-case is to render something with one template and inject its content into another one (with Python code in between, as t-call works a bit differently and does not go through the external rendering interface).

However qweb returns bytes while Markup extends str. After a quick experiment with changing qweb rendering to return str (a rather unmitigated failure, I fear), it looks like the safest tack is to add a somewhat similar bytes-based type, which decodes to a Markup but keeps to bytes semantics.

For debugging and convenience reasons, MarkupSafeBytes does not stringify and raises an error instead (__repr__ works fine). This is to avoid implicit stringifications which do the wrong thing (namely create a string "b'foo'").
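A hedged sketch of what such a type could look like, with the details inferred from this description rather than taken from the actual implementation:

from markupsafe import Markup

class MarkupSafeBytes(bytes):
    # bytes whose content is markup-safe: decodes to Markup, refuses str()
    def decode(self, encoding='utf-8', errors='strict'):
        return Markup(super().decode(encoding, errors))

    def __str__(self):
        # block implicit stringification, which would produce "b'foo'"
        raise TypeError("decode() MarkupSafeBytes explicitly instead of str()-ing it")

out = MarkupSafeBytes(b'<p>rendered</p>')
print(repr(out))           # __repr__ works fine
print(type(out.decode()))  # <class 'markupsafe.Markup'>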

Also add some configuration around BytesWarning (which still has to be enabled at the interpreter level via -b, there's no way to enable it programmatically smh), and monkeypatch showwarning to show warning tracebacks, as it's common for warnings to be triggered in the bowels of the application, and hard to relate to business logic without the complete traceback.

Attributes handling

There are a few issues with respect to attributes. The first issue is that markup-safe content is not necessarily attribute-safe, e.g. markup-safe content can contain unescaped < or double quotes while attributes cannot. So we must forcefully escape the input, even if it's supposedly markup-safe already.

This causes a problem for script-safe JSON: it's markup-safe but really does its own thing. So instead of escaping it up-front and wrapping it in Markup, make script-safe JSON its own type which applies JSON-escaping during the __html__ call.

This way if a script-safe JSON object goes through markupsafe.escape we'll apply script-safe escaping, otherwise it'll be treated as a regular string and eventually escaped the normal way.
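A sketch of the mechanism (the escaping scheme below is illustrative, not Odoo's exact one): markupsafe.escape defers to __html__ when the object defines it, so the type can apply its own escaping at that point.

import json
from markupsafe import Markup, escape

class ScriptSafeJSON:
    def __init__(self, value):
        self.value = value

    def __html__(self):
        # apply script-safe escaping only when the HTML form is requested
        out = json.dumps(self.value)
        return Markup(out.replace('<', '\\u003c').replace('>', '\\u003e'))

payload = ScriptSafeJSON({'a': '</script>'})
print(escape(payload))  # {"a": "\u003c/script\u003e"}, inert inside a <script> block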

A second issue was the processing of format-valued attributes (t-attf): literal segments should always be markup-safe, while non-literal may or may not be. This turns out to be an issue if the non-literal segment is markup-safe: in that case when the literal and non-literal segments get concatenated the literal segments will get escaped, then attributes serialization will escape them again leading to doubly-escaped content in attributes.

The most visible instance of this was the snippet_options template, specifically:

<t t-set="so_content_addition_selector" t-translation="off">blockquote, ...</t>
<div id="so_content_addition"
    t-att-data-selector="so_content_addition_selector"
    t-attf-data-drop-near="p, h1, h2, h3, .row > div > img, #{so_content_addition_selector}"
    data-drop-in=".content, nav"/>

Here so_content_addition_selector is a qweb body, therefore markup-safe. When concatenated with the literal part of t-attf-data-drop-near, it would cause the HTML-escaping of that literal part, yielding a new Markup object. Normal attributes processing would then strip the markup flag (using str()) and escape it again, leading to doubly-escaped literals.

The original hack-around was to unescape() Markup content before stringifying it and escaping it again, in the attribute serialization method (_append_attributes).

That's pretty disgusting. After some more consideration & testing, it looks like a much better and safer fix is to ensure the expression (non-literal) segments of format strings always result in str, never Markup, which is easy enough: just call str() on the output of strexpr. We could also have concatenated all the bits using ''.join instead of repeated concatenation (+).
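A small sketch of both the bug and the fix, using markupsafe directly (the qweb plumbing is elided):

from markupsafe import Markup, escape

literal = 'a > b, '            # literal segment of a t-attf format string
expr = Markup('blockquote')    # markup-safe expression segment

naive = literal + expr         # Markup.__radd__ escapes the literal once...
print(escape(str(naive)))      # ...and attribute serialization escapes it again:
                               # a &amp;gt; b, blockquote

fixed = literal + str(expr)    # force the expression segment back to plain str
print(escape(fixed))           # a &gt; b, blockquote -- escaped exactly once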

Also add a check on the type of the format string for safety. I think it should always be a proper str, and the bytes thing only applies when running in py2 (where lxml uses bytestrings as a space optimization for ascii-only values), but it should not hurt too much to perform a single typecheck assertion on the value... instead of performing one per literal segment.

Note: we may need to implement unescape anyway, because it's still possible to get double-escaping with the current scheme: given an explicitly escape-ed foo and t-att-foo="foo", foo will be re-escaped.


Thursday 2021-04-08 09:59:21 by craig[bot]

Merge #62728 #63073 #63214

62728: roachtest: don't call t.Fatal in FailOnReplicaDivergence r=erikgrinaker a=tbg

In #61990 we had this method catch a stats divergence on teardown in an otherwise successful test. The call to t.Fatal in that method unfortunately prevented the logs from being collected, which is not helpful.

Release note: None

63073: roachtest: retire bank tests r=nvanbenschoten,andreimatei a=tbg

The bank roachtests are a basic form of Jepsen test. We were hoping that they could provide additional benefit due to being less complex than a full Jepsen test. In my opinion, this calculus has not worked out. We've spent many hours staring at this test deadlocked when it wasn't CockroachDB's fault at all. Besides, it's not adding anything meaningful on top of our Jepsen tests plus KVNemesis plus the TPCC invariant checks, so we are removing the bank family of tests here.

Should we want to reintroduce these tests, we should do it at the level of TestCluster, or at least augment the bank workload to incorporate these invariant checks instead.

Closes #62754. Closes #62749. Closes #53871.

Release note: None

63214: kvserver: reduce ReplicaGCQueueInactivityThreshold to 12 hours r=ajwerner a=erikgrinaker

ReplicaGCQueueInactivityThreshold specifies the interval at which the GC queue checks whether a replica has been removed from the canonical range descriptor, and was set to 10 days. This is a fallback for when we fail to detect the removal and GC the replica immediately. However, this could occasionally cause stale replicas to linger for 10 days, which surprises users (e.g. by causing alerts if the stale replica thinks the range has become underreplicated).

This patch reduces the threshold to 12 hours, which is a more reasonable timeframe for users to expect things to "sort themselves out". The operation to read the range descriptor is fairly cheap, so this is not likely to cause any problems, and the interval is therefore not jittered either.

Resolves #63212, touches #60259.

Release note (ops change): Replica garbage collection now checks replicas against the range descriptor every 12 hours (down from 10 days) to see if they should be removed. Replicas that fail to notice they have been removed from a range will therefore linger for at most 12 hours rather than 10 days.

Co-authored-by: Tobias Grieger [email protected]
Co-authored-by: Tobias Schottdorf [email protected]
Co-authored-by: Erik Grinaker [email protected]


Thursday 2021-04-08 10:44:10 by Marko Grdinić

"10:10am. I got little sleep last night. It was one of those days where I lie in bed and instead of falling asleep start thinking even harder. But I did resolve some things. My regrets - though my rational mind says different, they all stem because I believe in outdated values of self improvement. To be more precise, though I figured out the self improvement loop, my rational mind is in conflict with what I wanted as a kid.

Eventually, I am going to overcome it and completely internalize this new way of thinking.

Rationally, there is no way a path to power would ever care about past failures or victories. It is all about the future. That I do care is proof that I am half-normie and that my drive is half posturing. As a kid, I believed that a real man should always keep winning, but as I lived, I was constantly forced to make compromises in order to cultivate the actual path.

Me becoming a programmer was more of a desperation move than a premeditated choice.

I do regret that being a 'real' man is not only emotional manipulation from the outside, but it is outright incompatible with a lot of requirements of the actual path. Trying to live up to wrong expectations just results in meaningless suffering.

I am good in this respect to some extent compared to other people, but I have weaknesses I have to resolve.

Ultimately, even I myself do not have a perfect image of how the posthuman should behave. I do not completely buy the rational ubermensch image of characters like Fang Yuan. It is just too easy to have characters act a certain way and win in fiction.

10:25am. Let me chill a bit here and I will start.

Yesterday I tried out Rance Quest Magnum. At first I had issues with the game screen being cut off near the bottom, which I fixed by modifying the text scaling in the Windows settings, but after that I was surprised by how much fun I was having. Rance's chad antics put a constant grin on my face and made me laugh. Considering it is a plain jRPG at its core, I thought it would be a lot more boring than it actually is. It is not like I am unilaterally a Rance fan; I actually dropped Rance 4 due to it being boring. And Rance 2 wasn't that interesting even though I finished it. Kichikou and Sengoku are exceptions, and they were both grand strategy RPGs.

10:45am. Maybe I did get some rest last night, even though it felt like I was awake the whole time.

I am not as tired as I expected. Let me start here.

10:50am. I've slacked for long enough. I'll finish today at 6pm so I can get some extra gaming time.

Until then let me just focus on the work in front of me.

10:55am. Let me get some thinking done. I need to decide what concrete steps I should take.

During the night, I've figured out how to parallelize the sample collection. Yesterday I said that CPSing is an absolute requirement, but that alone is not enough. I am going to have to turn the game into a union type. Instead of having the game operate on the players, I will make it so that it returns the action nodes with a continuation. In essence, I am going to compile the game to a union type.

This will make it easy to have an arbitrary number of instances running in parallel. Right now I can't do that. Even the CPS'd version that I have now does not return a continuation; instead, for the human player it just calls the dispatcher.

I might have to redesign the whole thing to get this capability, but that is fine. If I did it like this, it would have the benefit of completely generalizing the other two ways.

11am. The reason why I'd want to do this is because GPUs suck for online learning. Running a single sample through the system would be 128 times slower than running 128 of them in parallel. And NN evaluation on the GPU is where the majority of the runtime overhead will be.

When I was running those PG in 2018 on a single sample, yes, I was doing something pretty stupid.
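Not something that exists yet, but a minimal Python sketch of the union-type/continuation idea, with policy_batch standing in for the batched NN evaluation (both names are mine):

def game(deal):
    # a game is a generator that suspends whenever a player must act;
    # the yielded value is the action node, the suspension is the continuation
    action = yield ('player_one', deal)
    return 1.0 if action == 'call' else -1.0

def run_batch(games, policy_batch):
    # step all instances in lockstep so the NN sees one large batch per call
    pending = [(g, g.send(None)) for g in games]
    while pending:
        actions = policy_batch([node for _, node in pending])
        nxt = []
        for (g, _), action in zip(pending, actions):
            try:
                nxt.append((g, g.send(action)))
            except StopIteration as done:
                pass  # game finished; done.value holds the terminal reward
        pending = nxt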

11:10am. Close down /g/ and focus on the task at hand. It is time to get serious here. Though I feel really fatigued, I should give it my best shot for the day as usual. I can't make excuses and start slacking off because of things like my fatigue state.

...

Let me think. Yesterday I did the tabular CFR agent. Today I need to run it.

11:15am. Focus me. I already have the general outline.

What are the specific steps I have to take?

11:20am. The graph is troublesome as it requires a separate non-updating pass with just the average policy being used on both ends.

Nonetheless, let me go for it. Let me do a step of optimization, and a step of showing the value to me.

I should also try implementing linear CFR, since it has been mentioned to give two orders of magnitude of speedup.

11:25am. Graph UI, graph UI... That is what I should focus on. Once I decided that I absolutely want to do something, things become easier.

Then instead of thinking about what I want to do in addition to what I have to do, I can just focus on the latter. This gives me the ability to think more deeply about the subject. When one is on the fence, there is always the hesitation of expending too much mental energy on something that is fruitless.

I should do a number of training steps followed by a sampling step.

11:30am. Now I am thinking about GANs. Forget that, focus on the task at hand.

11:45am. Oh yeah, I was wrong about predictive coding being useless. I had an idea where those local targets could be used to make the replay buffer for each individual layer. That would also give me reward scaling invariance without any of the top level layer hacks.

But it is more complicated and not as advantageous as my own scheme on the GPU.

12:25pm. I am thinking about that UI right now. I am narrowing down what my requirements are. I am going to get rid of the kind of random replay buffer that I had in ui_replay.

Instead for trained tabular agents, I should just display the dictionary.

12:30pm. I am not going to bother with that step when it comes to flop poker.

https://gregoryszorc.com/blog/2021/04/06/surprisingly-slow/ https://www.thenewatlantis.com/publications/welcoming-our-new-robot-overlords

Let me stop here. The first article is interesting.

12:35pm. I want to think more. I am not eager to start just yet. After running the task through my mind enough times, I'll build up the motivation where I can't stop myself from starting.

One of the first things I will have to do is master some new widgets. For the next part, buttons and labels won't cut it. I am going to have to do some experimentation with how to do tabs, and also those tree-view-like controls. I need to master showing and hiding widgets.

I've figured out how the recycle view works. Along with the charts, it is time to put those lessons to use."


Thursday 2021-04-08 12:43:06 by Mitch Warrenburg

webflux seriously sucks for security unless you're doing cookie-cutter httpbasic and oauth2 with little customization. if you don't want to be forced into their obsession with oauth2 - the project reactor folks say FUCK YOU


Thursday 2021-04-08 16:13:26 by Marko Grdinić

"2:10pm. Done with breakfast and chores. Let me finish the second article and I will do some work.

It is my life I am spending here, so I want to go fast, a lot faster than this. But there is no choice but to be patient and thorough. Trying to sprint through this would just get me lost. I am already going as fast as I can.

2:40pm. Let me resume instead of reading articles on the Rance wiki.

Ok, first of all, how do I do tabs in Kivy? Then, how do I do articles that can be opened and closed? As the UIs get more complex, beyond what can fit on a single page, that becomes necessary.

https://kivy.org/doc/stable/api-kivy.uix.tabbedpanel.html

Oh, interesting. Here is the tab.

https://kivy.org/doc/stable/api-kivy.uix.treeview.html

Oh, I said I wanted something like a tree view, but looking at this, a tree view is exactly what I need.

https://kivy.org/doc/stable/api-kivy.uix.actionbar.html

Let me do a little browse of the uix widgets.

https://kivy.org/doc/stable/api-kivy.uix.accordion.html

Ohhhh! This is even better than a tree view for my purpose! Yeah, accordion is what I should try out.

https://kivy.org/doc/stable/api-kivy.uix.bubble.html

Maybe I could use this for displaying validation errors. There was something on validation in the TextInput doc page. I should look into that.

https://kivy.org/doc/stable/api-kivy.uix.filechooser.html

I should keep these in mind. All of these are useful.

https://kivy.org/doc/stable/api-kivy.uix.progressbar.html

This one is something I should have for the sake of keeping track of the iteration count.

https://kivy.org/doc/stable/api-kivy.uix.splitter.html

Oh, it has splitters.

3pm. https://kivy.org/doc/stable/api-kivy.uix.textinput.html

Now let me do this final thing. There was something on validation if I recall. What was it?

3:15pm. Had to take a break. Let me start.

I will need the tab, the accordion and the progress bar.

It is time to start playing with each in turn. I'll also want a text input box that allows only numbers.

Hmmmm... speaking of only numbers, Spiral has fixed-size integers and Python does not. I am going to have to do a little check to make sure it is not more than the u64 max.

3:25pm. So harsh, so hard. Life is that way. But I must keep going forward. I have to dedicate myself to doing this UI.

https://kivy.org/doc/stable/api-kivy.uix.dropdown.html https://kivy.org/doc/stable/api-kivy.uix.spinner.html

I'll use this for picking between player one and player two for humans.

Agents should train against themselves, but I'll also want to pit agents of different stripes against each other.

I will want to pit CFR+ agents against the sampling based ones.

3:30pm. When it comes to doing MC CFR let me just take a look at something.

let f alpha x = x + alpha * (2.0 - x)
let g = f 0.1
g (g (g 1.0))

0.9 ** 3.0

Ohhh, this is wonderful. This is exactly how I should rescale the alpha.

That is...

let f alpha x = x + alpha * (2.0 - x)
let g = f 0.1
let g' n = f (1.0 - (1.0 - 0.1) ** n)
let (=.) a b = abs (a - b) <= 0.000001
g (g (g 1.0)) =. g' 3.0 1.0

This is what I meant.

Unlike with regrets, I cannot just rescale them willy-nilly by dividing by the current player probability. That would unhinge the targets. But what I can do is simulate the effect of doing the application numerous times.

...No, actually this is not good.

3:45pm. Yeah, the moving average does exactly what is supposed to, but I want the weigthed updates to be decayed differently than normal.

https://www.fidelity.com/learning-center/trading-investing/technical-analysis/technical-indicator-guide/wma

Yeah, a weighted moving average is what I need to update the Q values in the sampling scheme.

Unlike in the regular one, I can't just decay the tail value. I am going to have to keep not only the sum of the values, but also the sum of the weights.

The tail will then be sum values / sum weights. The value update will be ((1-a) * sum values * sum weights + a * new_value * new_weight) / (1-a)...

Damn it, this is complicated. Let me just say it is...(1-a) * x + a * y.

inl (+) a b = {value=a.value*a.weight + b.value*b.weight; weight=a.weight + b.weight}

It is somewhat similar to adding fractions.

inl (*) a v = {a with value#=(*) v; weight#=(*) v}

This is how I will define the addition of two weighted floats, and the scaling of a weighted float by a regular one.

I could also make the update be...

x + a * (y - x)

Yes, this is what I should do. Instead of the usual Q values being floats, they should be these weighted floats. That will make things smooth.

4:05pm. Doing the exponential rescaling of the moving average would make the Q values sensitive to reward scaling. The above weighted float scheme is what would get me the behavior I would have had if I had used a replay buffer. It is a more rational way of keeping track of data.

Ok, good.

I don't want the learning rates to vary extremely for the deep nodes compared to the outer ones. If they end up being too high for the deep nodes, that will defeat the point of using moving averages in the first place; every single data point will decay the old one completely. That is not what I want.

What I am talking about here is something I will need for tabular MC CFR. The deep one will just use a weighted replay buffer with the norm magnitude trick.

...Ah, no wait this is wrong.

inl weight value weight = {weight value=value*weight}
inl (+) a b = {value=a.value + b.value; weight=a.weight + b.weight}
inl unweight {value weight} = if weight = 0 then 0 else value / weight

This is how I should define the operations.

What about the numerical stability of this...I wonder...

Actually, it should do fine. Let me not worry about this too much...
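Translated into a quick Python sketch to sanity-check the behavior (my reading of the Spiral pseudocode above):

def weight(value, w):
    return {'value': value * w, 'weight': w}

def add(a, b):
    return {'value': a['value'] + b['value'], 'weight': a['weight'] + b['weight']}

def unweight(a):
    return 0.0 if a['weight'] == 0 else a['value'] / a['weight']

# accumulating two samples gives their weighted mean, much like adding fractions
q = add(weight(2.0, 0.5), weight(6.0, 1.5))
print(unweight(q))  # 5.0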

4:15pm. Ok, now I should have enough insight to implement the sampling Q update. This is not what is in the paper, but it is what I should be doing.

Now what is next?

Do I finally want to start work on the UI, or do I want to think about something else?

4:30pm. I am thinking about the value update.

Let me code it up.

let f alpha x = x + (2.0 - x) / alpha
List.map (fun x -> f x 1.0 * x) [0.0..0.02..1.0]
val it : float list =
  [nan; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
   1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
   1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
   1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]

I've been trying to run this through my mind, and looking at this ahead of time, the result is not at all what I expected.

I was sure that at some point the value would go beyond 2, but this is a more straightforward moving average than I expected.

If I were not aiming for division, how would I implement this?

let f alpha x = x + (2.0 - x) / alpha
let g alpha x = x + alpha * (2.0 - x)
let a = List.map (fun alpha -> f alpha 1.0 * alpha) [0.0..0.02..1.0]
let b = List.map (fun alpha -> g alpha 1.0) [0.0..0.02..1.0]
val f : alpha:float -> x:float -> float
val g : alpha:float -> x:float -> float
val a : float list =
  [nan; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
   1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
   1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
   1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
val b : float list =
  [1.0; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
   1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
   1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
   1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]

Wow, I am really dumb not to realize these are the same. My off-the-cuff calculations were completely wrong.

(x + (2.0 - x) / alpha) * alpha
alpha * x + 2.0 - x
(1.0 - alpha) * x + 2.0

Just how is this possible? They can't be the same mathematically.

let y = 2.0
let x = 1.0
let f alpha = x + (y - x) / alpha
let g alpha = x + alpha * (y - x)
let g' alpha = (1.0 - alpha) * x + alpha * y
let a = List.map (fun alpha -> f alpha * alpha) [0.0..0.02..1.0]
let b = List.map (fun alpha -> g alpha) [0.0..0.02..1.0]
let b' = List.map (fun alpha -> g' alpha) [0.0..0.02..1.0]
val y : float = 2.0
val x : float = 1.0
val f : alpha:float -> float
val g : alpha:float -> float
val g' : alpha:float -> float
val a : float list =
  [nan; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
   1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
   1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
   1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
val b : float list =
  [1.0; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
   1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
   1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
   1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]
val b' : float list =
  [1.0; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
   1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
   1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
   1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]

The code can't be wrong. My mental calculations have to be.

(x + (2.0 - x) / alpha) * alpha
alpha * x + 2.0 - x
(1.0 - alpha) * x + 2.0

Did I miss a reduction step in there somewhere?

(x + (2.0 - x) / alpha) * alpha
alpha * x + 2.0 - x
(alpha - 1.0) * x + 2.0

Yes, I missed one. Then...

let r = List.map (fun alpha -> (alpha - 1.0) * x + 2.0) [0.0..0.02..1.0]
val r : float list =
  [1.0; 1.02; 1.04; 1.06; 1.08; 1.1; 1.12; 1.14; 1.16; 1.18; 1.2; 1.22; 1.24;
   1.26; 1.28; 1.3; 1.32; 1.34; 1.36; 1.38; 1.4; 1.42; 1.44; 1.46; 1.48; 1.5;
   1.52; 1.54; 1.56; 1.58; 1.6; 1.62; 1.64; 1.66; 1.68; 1.7; 1.72; 1.74; 1.76;
   1.78; 1.8; 1.82; 1.84; 1.86; 1.88; 1.9; 1.92; 1.94; 1.96; 1.98; 2.0]

So this is right. Then how do I...

let y = 3.0
let x = 1.0
let l = [0.0..0.02..1.0]
let a = List.map (fun alpha -> (x + (y - x) / alpha) * alpha) l
let b = List.map (fun alpha -> x + alpha * (y - x)) l
let b' = List.map (fun alpha -> (1.0 - alpha) * x + alpha * y) l
let r = List.map (fun alpha -> (alpha - 1.0) * x + y) l
val a : float list =
  [nan; 2.02; 2.04; 2.06; 2.08; 2.1; 2.12; 2.14; 2.16; 2.18; 2.2; 2.22; 2.24;
   2.26; 2.28; 2.3; 2.32; 2.34; 2.36; 2.38; 2.4; 2.42; 2.44; 2.46; 2.48; 2.5;
   2.52; 2.54; 2.56; 2.58; 2.6; 2.62; 2.64; 2.66; 2.68; 2.7; 2.72; 2.74; 2.76;
   2.78; 2.8; 2.82; 2.84; 2.86; 2.88; 2.9; 2.92; 2.94; 2.96; 2.98; 3.0]
val b : float list =
  [1.0; 1.04; 1.08; 1.12; 1.16; 1.2; 1.24; 1.28; 1.32; 1.36; 1.4; 1.44; 1.48;
   1.52; 1.56; 1.6; 1.64; 1.68; 1.72; 1.76; 1.8; 1.84; 1.88; 1.92; 1.96; 2.0;
   2.04; 2.08; 2.12; 2.16; 2.2; 2.24; 2.28; 2.32; 2.36; 2.4; 2.44; 2.48; 2.52;
   2.56; 2.6; 2.64; 2.68; 2.72; 2.76; 2.8; 2.84; 2.88; 2.92; 2.96; 3.0]
val b' : float list =
  [1.0; 1.04; 1.08; 1.12; 1.16; 1.2; 1.24; 1.28; 1.32; 1.36; 1.4; 1.44; 1.48;
   1.52; 1.56; 1.6; 1.64; 1.68; 1.72; 1.76; 1.8; 1.84; 1.88; 1.92; 1.96; 2.0;
   2.04; 2.08; 2.12; 2.16; 2.2; 2.24; 2.28; 2.32; 2.36; 2.4; 2.44; 2.48; 2.52;
   2.56; 2.6; 2.64; 2.68; 2.72; 2.76; 2.8; 2.84; 2.88; 2.92; 2.96; 3.0]
val r : float list =
  [2.0; 2.02; 2.04; 2.06; 2.08; 2.1; 2.12; 2.14; 2.16; 2.18; 2.2; 2.22; 2.24;
   2.26; 2.28; 2.3; 2.32; 2.34; 2.36; 2.38; 2.4; 2.42; 2.44; 2.46; 2.48; 2.5;
   2.52; 2.54; 2.56; 2.58; 2.6; 2.62; 2.64; 2.66; 2.68; 2.7; 2.72; 2.74; 2.76;
   2.78; 2.8; 2.82; 2.84; 2.86; 2.88; 2.9; 2.92; 2.94; 2.96; 2.98; 3.0]

No, there are differences between this and the original.

let y = 6.0
let x = 1.5
let l = [0.0..0.02..1.0]
let a = List.map (fun alpha -> (1.0 - alpha) * x + alpha * y) l
let b = List.map (fun alpha -> (alpha - 1.0) * x + y) l
val a : float list =
  [1.5; 1.59; 1.68; 1.77; 1.86; 1.95; 2.04; 2.13; 2.22; 2.31; 2.4; 2.49; 2.58;
   2.67; 2.76; 2.85; 2.94; 3.03; 3.12; 3.21; 3.3; 3.39; 3.48; 3.57; 3.66; 3.75;
   3.84; 3.93; 4.02; 4.11; 4.2; 4.29; 4.38; 4.47; 4.56; 4.65; 4.74; 4.83; 4.92;
   5.01; 5.1; 5.19; 5.28; 5.37; 5.46; 5.55; 5.64; 5.73; 5.82; 5.91; 6.0]
val b : float list =
  [4.5; 4.53; 4.56; 4.59; 4.62; 4.65; 4.68; 4.71; 4.74; 4.77; 4.8; 4.83; 4.86;
   4.89; 4.92; 4.95; 4.98; 5.01; 5.04; 5.07; 5.1; 5.13; 5.16; 5.19; 5.22; 5.25;
   5.28; 5.31; 5.34; 5.37; 5.4; 5.43; 5.46; 5.49; 5.52; 5.55; 5.58; 5.61; 5.64;
   5.67; 5.7; 5.73; 5.76; 5.79; 5.82; 5.85; 5.88; 5.91; 5.94; 5.97; 6.0]

5:05pm. Had to do some chores.

But basically, with this now I know what the update is doing.

let y = 6.0
let x = 6.0
let l = [0.0..0.02..1.0]
let a = List.map (fun alpha -> (1.0 - alpha) * x + alpha * y) l
let b = List.map (fun alpha -> (alpha - 1.0) * x + y) l
val a : float list =
  [6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
   6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
   6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
   6.0; 6.0; 6.0; 6.0; 6.0; 6.0]
val b : float list =
  [0.0; 0.12; 0.24; 0.36; 0.48; 0.6; 0.72; 0.84; 0.96; 1.08; 1.2; 1.32; 1.44;
   1.56; 1.68; 1.8; 1.92; 2.04; 2.16; 2.28; 2.4; 2.52; 2.64; 2.76; 2.88; 3.0;
   3.12; 3.24; 3.36; 3.48; 3.6; 3.72; 3.84; 3.96; 4.08; 4.2; 4.32; 4.44; 4.56;
   4.68; 4.8; 4.92; 5.04; 5.16; 5.28; 5.4; 5.52; 5.64; 5.76; 5.88; 6.0]

This value for b makes not even a lick of sense to me. Is this how it really was in the paper?

Yes it was. Keep in mind these are suppose to be Q surrogates.

let y = 6.0
let x = 0.0
let l = [0.0..0.02..1.0]
let a = List.map (fun alpha -> (1.0 - alpha) * x + alpha * y) l
let b = List.map (fun alpha -> (alpha - 1.0) * x + y) l
val a : float list =
  [0.0; 0.12; 0.24; 0.36; 0.48; 0.6; 0.72; 0.84; 0.96; 1.08; 1.2; 1.32; 1.44;
   1.56; 1.68; 1.8; 1.92; 2.04; 2.16; 2.28; 2.4; 2.52; 2.64; 2.76; 2.88; 3.0;
   3.12; 3.24; 3.36; 3.48; 3.6; 3.72; 3.84; 3.96; 4.08; 4.2; 4.32; 4.44; 4.56;
   4.68; 4.8; 4.92; 5.04; 5.16; 5.28; 5.4; 5.52; 5.64; 5.76; 5.88; 6.0]
val b : float list =
  [6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
   6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
   6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0; 6.0;
   6.0; 6.0; 6.0; 6.0; 6.0; 6.0]

Wow, this is crazy. The paper has to be wrong. VR MCCFR's bootstrapping rule is complete nonsense.

5:15pm. No, I absolutely can't use it. But I do believe in the general idea of how bootstrapping is done here.

What I am going to replace it with is simply letting the value from above fall through.

let c = List.map (fun alpha -> alpha * y) l

There is no need to think about what the original estimate is given that I have a much more accurate one based on the trace.

Let me read the original VR MCCFR paper.

5:50pm. No, the ablation study in the original paper does not depend on the rule I find troubling. I can replace it. State-action dependence just makes sure that values are multiplied by the policy probs before propagating. And bootstrapping is just bootstrapping from nearby nodes.

Damn it. This is why doing ML is so stressful. You can't just trust what you read in the papers outright. Even good results might have nonsensical algorithmic decisions that happen to work due to the vagaries of optimization.

I am definitely going to make all of this work.

5:55pm. I should have been working on UIs rather than obsessing about the sampling algorithm, but it is worth it.

It does not feel like 2018 anymore. It feels like all the ideas are coming together for me, and my skill level is steadily rising. My actual understanding itself is going up.

My focus is different from 2018. Back then I was laser focused on credit assignment and backpropagation itself. Right now, I am just thinking of how to deal with the replay buffer in the most efficient manner. With a different approach, my results are different.

6:05pm. I need to have courage and do what I think is right. If it fails to work for me then I will go with the speculative approach in the paper itself.

I can't apply random rules I find in the wild if I want to be successful long term. I need to optimize for beauty, elegance and understanding. If ML were a mature field, it would make sense to give authors the benefit of the doubt, but not here: when I have a sense for how things could be better, and when my own translation of a parent enumerative algorithm to the sampling regime is pushing me in a different direction, then that direction is the one I should take.

I must follow my own way even if it leads to my destruction.

In the worst case, I will waste a few extra months before giving up on my own stuff and trying the things in the paper.

Forget about this.

Tonight, I am going to get some proper sleep and ready myself for the work on the UIs. Today I just could not get going.

Tomorrow, I am absolutely going to sketch out that UI at least, and then get the tabular enumerative CFR to work. Then I will do the sampling CFR."


Thursday 2021-04-08 16:49:33 by Liberation546

fixed non-constant var

also i changed the name again because fuck you


Thursday 2021-04-08 17:59:02 by 666DoxGangLucifer

6Blocka > All niggas

You fucking nigger report me ban me fuck you


< 2021-04-08 >