<?xml version="1.0" encoding="utf-8"?>
<search>
  
  
  
  <entry>
    <title>Running clangd 16 on CentOS 7</title>
    <link href="/2023/07/23/cpp/centos7_clangd/"/>
    <url>/2023/07/23/cpp/centos7_clangd/</url>
    
    <content type="html"><![CDATA[<p>Microsoft's official C++ extension for VSCode indexes code in single-threaded mode, which makes scanning large projects extremely slow. That is why I have always used the vscode-clangd extension instead.</p><p>However, clangd requires at least glibc 2.18, while older systems such as CentOS 7 ship only glibc 2.17. As a result, recent clangd releases fail to start:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">./bin/clangd: /lib64/libc.so.6: version `GLIBC_2.18&#x27; not found (required by ./bin/clangd)<br></code></pre></td></tr></table></figure><p>The official CentOS repositories only provide clangd 7, which is too old to work with the vscode-clangd extension.</p><p>After several days of tinkering, I found a relatively simple approach that gets clangd and the vscode-clangd extension running without touching the system glibc and without compiling LLVM or GCC.</p><p>This article uses CentOS 7 (running in VMware Player 17) and clangd <a href="https://github.com/clangd/clangd/releases/tag/16.0.2">16.0.2</a>.</p><h2 id="原理解释"><a href="#原理解释" class="headerlink" title="原理解释"></a>How It Works</h2><blockquote><p>If you are not interested in the details, you can skip this part.</p></blockquote><p>The approaches I found online fall roughly into three categories:</p><ol><li>Upgrade the system glibc. This is dangerous and can easily leave the system unbootable.</li><li>Build clangd yourself. Setting aside whether a self-built clangd works properly with VSCode, building the LLVM toolchain requires CMake 3.x, whereas CentOS 7 only ships CMake 2.x, so you first have to upgrade CMake and the rest of the toolchain. Moreover, a self-built clangd still depends on glibc, so the underlying problem is not really solved.</li><li>Read the code inside a CentOS 8 container or virtual machine. The environment where you write code then differs from the one where you run it, which is inconvenient.</li></ol><p>According to <a href="https://github.com/clangd/vscode-clangd/issues/16#issuecomment-624764721">the maintainers</a>, clangd will not add support for glibc 2.17, so <strong>if you want to use the official binaries, there is no way around glibc 2.18.</strong></p><p>Apart from glibc 2.18, however, clangd has no other runtime dependencies. The <code>ldd</code> command lists the shared libraries clangd needs. The list looks long, but apart from the kernel-provided <code>linux-vdso.so.1</code>, every entry is part of glibc:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">[yaland@localhost clangd_16.0.2]$ ldd -r bin/clangd<br>bin/clangd: /lib64/libc.so.6: version `GLIBC_2.18&#x27; not found (required by bin/clangd)<br>linux-vdso.so.1 =&gt;  (0x00007ffde5be4000)<br>libpthread.so.0 =&gt; /lib64/libpthread.so.0 (0x00007f6cac8ab000)<br>librt.so.1 =&gt; /lib64/librt.so.1 (0x00007f6cac6a3000)<br>libdl.so.2 =&gt; /lib64/libdl.so.2 (0x00007f6cac49f000)<br>libm.so.6 =&gt; /lib64/libm.so.6 (0x00007f6cac19d000)<br>libc.so.6 =&gt; /lib64/libc.so.6 (0x00007f6cabdcf000)<br>/lib64/ld-linux-x86-64.so.2 (0x00007f6cacac7000)<br>symbol __cxa_thread_atexit_impl, version GLIBC_2.18 not defined in file libc.so.6 with link time reference(bin/clangd)<br></code></pre></td></tr></table></figure><p>So in theory, nothing except glibc 2.18 needs to be rebuilt.</p><p>Replacing <code>libc.so</code> alone does not work. I tried forcing only the newer <code>libc.so</code> to load via <code>LD_LIBRARY_PATH</code> and got a series of baffling errors. The libraries in glibc, including the program loader <code>ld.so</code>, form a single unit, and components from different versions are not compatible with one another.</p><p>In addition, the directories that <code>ld.so</code> searches for glibc libraries are hard-coded when glibc is built. You therefore have to choose a directory and build and install glibc into it before it can be used properly.</p><p>So my approach is: <strong>build a glibc 2.18 used only to run clangd, install it into a private directory, and have clangd dynamically link against it at run time.</strong></p><h2 id="执行过程"><a href="#执行过程" class="headerlink" title="执行过程"></a>Steps</h2><p>You need build tools such as GCC and GNU Make. They usually come with the system; if not, install them with <code>yum</code>.</p><p>CentOS 7 ships GNU Make 3.82 and GCC 4.8.5.</p><h3 id="编译GLIBC-2-18"><a href="#编译GLIBC-2-18" class="headerlink" title="编译GLIBC 2.18"></a>Building glibc 2.18</h3><p>Downloading glibc from the official GNU site is slow, so I used the Tsinghua University mirror.</p><p>Note: this process does <strong>not</strong> require root privileges.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># Download</span><br>wget https://mirrors.tuna.tsinghua.edu.cn/gnu/glibc/glibc-2.18.tar.gz<br><br><span class="hljs-comment"># Extract</span><br>tar -zxf glibc-2.18.tar.gz<br><span class="hljs-built_in">cd</span> glibc-2.18<br><br><span class="hljs-comment"># glibc refuses to be configured inside its source tree, so use a separate build directory</span><br><span class="hljs-built_in">mkdir</span> build<br><span class="hljs-built_in">cd</span> build<br><br><span class="hljs-comment"># Choose a directory of your own as the prefix;</span><br><span class="hljs-comment"># glibc will be installed into this directory</span><br>../configure --prefix=/home/yaland/mylibc<br><br><span class="hljs-comment"># Build and install</span><br>make -j4 &amp;&amp; make install<br></code></pre></td></tr></table></figure><p>You can check whether glibc was installed successfully:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># Go to the glibc 2.18 install directory you chose above</span><br><span class="hljs-built_in">cd</span> /home/yaland/mylibc/<br><br><span class="hljs-comment"># Check the glibc version</span><br>./bin/ldd --version<br></code></pre></td></tr></table></figure><p>It should print the following:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">ldd (GNU libc) 2.18<br>Copyright (C) 2013 Free Software Foundation, Inc.<br>This is free software; see the source for copying conditions.  There is NO<br>warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br>Written by Roland McGrath and Ulrich Drepper.<br></code></pre></td></tr></table></figure><h3 id="用自己的ld-so运行clangd"><a href="#用自己的ld-so运行clangd" class="headerlink" title="用自己的ld.so运行clangd"></a>Running clangd with Your Own ld.so</h3><p>The <code>lib</code> directory of the newly built glibc contains <code>ld-2.18.so</code>. Use it to launch clangd:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">/home/yaland/mylibc/lib/ld-2.18.so /path/to/clangd<br></code></pre></td></tr></table></figure><p>clangd now starts normally:</p><img src="/2023/07/23/cpp/centos7_clangd/image-20230722114601766.png" class="" alt="image-20230722114601766"><p>You can wrap this command in an executable shell script that forwards its arguments with <code>"$@"</code>, then point the <code>clangd.path</code> setting of the vscode-clangd extension at that script.</p><h2 id="参考文章"><a href="#参考文章" class="headerlink" title="参考文章"></a>References</h2><p><a href="https://github.com/clangd/vscode-clangd/issues/16">GLIBCs not found on host · Issue #16 · clangd&#x2F;vscode-clangd</a></p><p><a href="https://blog.51cto.com/u_14013325/4895865">Installing clangd: fixing 'GLIBC_2.18' not found - 李响Superb's tech blog - 51CTO Blog</a></p><p><a href="https://gist.github.com/carlesloriente/ab3387e7d035ed400dc2816873e9089e">Compile and install GLIBC 2.18 in CentOS 7</a></p><p><a href="https://bbs.kanxue.com/thread-254868.htm">Some issues with switching between glibc versions - Pwn - Kanxue Security Community | kanxue.com</a></p><p><a href="https://stackoverflow.com/questions/55186770/can-ld-preload-be-used-to-load-different-versions-of-glibc">c - Can LD_PRELOAD be used to load different versions of glibc? - Stack Overflow</a></p>]]></content>
    
    
    <categories>
      
      <category>C/C++</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Precedence When Declaring Pointers and Arrays in C/C++</title>
    <link href="/2022/12/29/cpp/pointer_and_array/"/>
    <url>/2022/12/29/cpp/pointer_and_array/</url>
    
    <content type="html"><![CDATA[<h2 id="指针与数组"><a href="#指针与数组" class="headerlink" title="指针与数组"></a>Pointers and Arrays</h2><p>First rule: <strong>brackets and parentheses (the postfix <code>[]</code> and <code>()</code>) bind more tightly than the asterisk</strong>. Therefore:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs cpp"><span class="hljs-type">int</span>* p[<span class="hljs-number">10</span>]; <span class="hljs-comment">// [] binds to p first, so p is an array of 10 elements, each an int pointer</span><br>*p[<span class="hljs-number">2</span>] = <span class="hljs-number">10</span>;  <span class="hljs-comment">// first take the pointer p[2] at index 2, then dereference it, writing 10 to the memory p[2] points to</span><br><br><span class="hljs-built_in">int</span> (*p) [<span class="hljs-number">10</span>];  <span class="hljs-comment">// inside the parentheses * binds to p, so p is a pointer to an array of 10 ints</span><br>*p[<span class="hljs-number">2</span>] = <span class="hljs-number">10</span>;  <span class="hljs-comment">// same as p[2][0] = 10: the first int of the third int[10] block</span><br></code></pre></td></tr></table></figure><p>Another typical example is the parameter declaration commonly seen in <code>main</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs cpp"><span class="hljs-type">char</span>* argv[]; <span class="hljs-comment">// [] binds to argv first, so argv is an array of char*; as a function parameter it is adjusted to char** argv</span><br></code></pre></td></tr></table></figure><p>The same rule applies to function pointers:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs cpp"><span class="hljs-function"><span class="hljs-type">int</span>* <span class="hljs-title">f</span><span class="hljs-params">()</span></span>;  <span class="hljs-comment">// () binds to f first, so f is a function returning an int pointer</span><br><span class="hljs-built_in">int</span> (*f) ();  <span class="hljs-comment">// inside the parentheses * binds to f, so f is a pointer to a function returning int</span><br></code></pre></td></tr></table></figure><p>Second rule: <strong>brackets and parentheses have the same precedence and are applied from left to right</strong>.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs c"><span class="hljs-type">int</span> (*f[<span class="hljs-number">10</span>])(); <span class="hljs-comment">// f is an array whose elements are pointers to functions returning int</span><br></code></pre></td></tr></table></figure><h2 id="函数指针"><a href="#函数指针" class="headerlink" title="函数指针"></a>Function Pointers</h2><p>Be careful to distinguish <strong>functions</strong> from <strong>function pointers</strong>.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><code class="hljs c"><span class="hljs-keyword">typedef</span> <span class="hljs-type">int</span> <span class="hljs-title function_">funcType</span><span class="hljs-params">(<span class="hljs-type">int</span> x)</span>; <span class="hljs-comment">// funcType is a function type</span><br>funcType f1; <span class="hljs-comment">// declaration of f1: f1 is a function of type funcType</span><br><br><span class="hljs-type">int</span> <span class="hljs-title function_">f1</span><span class="hljs-params">(<span class="hljs-type">int</span> x)</span> &#123; <span class="hljs-comment">// definition of f1</span><br>    <span class="hljs-comment">//......</span><br>&#125;<br>funcType f2 = f1; <span class="hljs-comment">// error: a function cannot be assigned to another function</span><br><br><span class="hljs-keyword">typedef</span> <span class="hljs-title function_">int</span> <span class="hljs-params">(*funcPtr)</span><span class="hljs-params">(<span class="hljs-type">int</span> x)</span>; <span class="hljs-comment">// funcPtr is a function pointer type</span><br>funcPtr p1 = &amp;f1;<br>funcPtr p2 = f1; <span class="hljs-comment">// same effect as the previous line, but not recommended because it is easy to confuse</span><br>(*p1)(<span class="hljs-number">3</span>); <span class="hljs-comment">// calls f1</span><br></code></pre></td></tr></table></figure><p><strong>Do not confuse <code>func</code> with <code>&amp;func</code></strong>, even though the two are interchangeable in many contexts.</p><p>Now let's go wild and combine functions, pointers, and arrays. Note, however, that <strong>you cannot create an array of functions, only an array of function pointers</strong>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-type">int</span> f[<span class="hljs-number">10</span>](); <span class="hljs-comment">// error: cannot declare an array of functions</span><br><span class="hljs-built_in">int</span> (*f)[<span class="hljs-number">10</span>](); <span class="hljs-comment">// error, for the same reason (pointer to an array of functions)</span><br><span class="hljs-function"><span class="hljs-type">int</span> <span class="hljs-title">f</span><span class="hljs-params">()</span>[10]</span>; <span class="hljs-comment">// error: a function cannot return an array</span><br></code></pre></td></tr></table></figure><p>As a challenge, here are two examples from Mr. Wu:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs c"><span class="hljs-comment">// Work out which names are the declared identifiers and which are merely parameter names</span><br><span class="hljs-type">char</span> *(* x[<span class="hljs-number">10</span>])(<span class="hljs-type">int</span> **y);<br><span class="hljs-type">int</span> (*(*(*z)(<span class="hljs-type">int</span> *))[<span class="hljs-number">5</span>])(<span class="hljs-type">int</span> *);<br></code></pre></td></tr></table></figure><h2 id="数组指针的运算"><a href="#数组指针的运算" class="headerlink" title="数组指针的运算"></a>Arithmetic on Pointers to Arrays</h2><p>Remember the rule of pointer arithmetic: <strong>adding one to a pointer advances the address by the size of one pointed-to element, and the resulting pointer has the same type.</strong></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-built_in">int</span>(*p)[<span class="hljs-number">10</span>]; <span class="hljs-comment">// p is a pointer to int[10]</span><br><br><span class="hljs-comment">// p+1 skips the size of one int[10]; p+1 has the same type as p, namely int(*)[10]</span><br>p + <span class="hljs-number">1</span>;<br><br><span class="hljs-comment">// p[1] is equivalent to *(p+1); the dereference yields type int[10], a one-dimensional array</span><br>p[<span class="hljs-number">1</span>];<br></code></pre></td></tr></table></figure><p>For a pointer to a fixed-length array, memory is allocated as follows:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-built_in">int</span>(*p)[<span class="hljs-number">10</span>] = <span class="hljs-keyword">new</span> <span class="hljs-type">int</span>[XXX][<span class="hljs-number">10</span>]; <span class="hljs-comment">// XXX rows of int[10]; release with delete[] p</span><br></code></pre></td></tr></table></figure><h2 id="与const的结合"><a href="#与const的结合" class="headerlink" title="与const的结合"></a>Combining with const</h2><p>I could not find this rule stated in the standard. The common explanation online is: <strong>const applies to whatever is on its left; if nothing is on its left, it applies to whatever is on its right</strong>.</p><p>Start with single-level pointers. The first two declarations below are equivalent:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><code class="hljs cpp"><span class="hljs-type">int</span> <span class="hljs-type">const</span>* p;<br><span class="hljs-type">const</span> <span class="hljs-type">int</span>* p;<br>p = another_ptr; <span class="hljs-comment">// OK: p itself can change</span><br>*p = <span class="hljs-number">66</span>; <span class="hljs-comment">// error! the value p points to cannot change</span><br><br><span class="hljs-type">int</span> *<span class="hljs-type">const</span> p = ptr; <span class="hljs-comment">// const binds to the * on its left, so p is a constant pointer</span><br>p = <span class="hljs-literal">NULL</span>; <span class="hljs-comment">// error! p itself cannot change</span><br>*p = <span class="hljs-number">66</span>;  <span class="hljs-comment">// OK</span><br></code></pre></td></tr></table></figure><p>Next, double pointers.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs c"><span class="hljs-type">const</span> <span class="hljs-type">int</span> **p; <span class="hljs-comment">// p[i][j] cannot be modified</span><br><span class="hljs-type">int</span> <span class="hljs-type">const</span> **p; <span class="hljs-comment">// equivalent to the previous line</span><br><br><span class="hljs-type">int</span> *<span class="hljs-type">const</span> *p; <span class="hljs-comment">// p[i] cannot be modified</span><br><span class="hljs-type">int</span> **<span class="hljs-type">const</span> p; <span class="hljs-comment">// p itself cannot be modified</span><br></code></pre></td></tr></table></figure><p>Now add arrays.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs c"><span class="hljs-type">char</span> <span class="hljs-type">const</span> *argv[]; <span class="hljs-comment">// array of pointer to const char</span><br><span class="hljs-type">const</span> <span class="hljs-type">char</span> *argv[]; <span class="hljs-comment">// same as above</span><br><span class="hljs-type">char</span> *<span class="hljs-type">const</span> argv[]; <span class="hljs-comment">// array of const pointer to char</span><br></code></pre></td></tr></table></figure><h2 id="cdecl"><a href="#cdecl" class="headerlink" title="cdecl"></a>cdecl</h2><p><a href="https://cdecl.org/">cdecl: C gibberish ↔ English</a></p><p>This site translates complicated declarations such as <code>int *p(int (*)[10])</code> into English.</p><p>It already handles arrays of pointers, function pointers, and const. However, it does not support typedef or declarations with multiple identifiers, which limits its usefulness.</p><h2 id="题外话：数组到指针的转换"><a href="#题外话：数组到指针的转换" class="headerlink" title="题外话：数组到指针的转换"></a>Aside: Array-to-Pointer Conversion</h2><p>As I learned in class, a one-dimensional array can be converted to a single-level pointer when passed to a function.</p><p>Put differently, an <code>int[N]</code> converts implicitly to <code>int*</code> but not to <code>int(*)[N]</code>: the former points to a single element, while the latter points to a whole block of N elements. To get an <code>int(*)[N]</code>, take the address of the array with <code>&amp;</code>.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><code class="hljs cpp"><span class="hljs-type">int</span> a[<span class="hljs-number">10</span>];<br><br><span class="hljs-type">int</span>* p1 = a; <span class="hljs-comment">// OK</span><br><span class="hljs-built_in">int</span>(*p2)[<span class="hljs-number">10</span>] = a; <span class="hljs-comment">// cannot convert ‘int*’ to ‘int (*)[10]’</span><br><span class="hljs-built_in">int</span>(*p3)[<span class="hljs-number">10</span>] = &amp;a; <span class="hljs-comment">// OK</span><br><span class="hljs-built_in">int</span>(*p4)[<span class="hljs-number">20</span>] = &amp;a; <span class="hljs-comment">// cannot convert ‘int (*)[10]’ to ‘int (*)[20]’</span><br></code></pre></td></tr></table></figure><p>In fact, a function parameter declared as an array is adjusted to a pointer, so the size written in the declaration is ignored:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs cpp"><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">func</span><span class="hljs-params">(<span class="hljs-type">int</span> arr[<span class="hljs-number">10</span>])</span> </span>&#123;<br>    cout &lt;&lt; <span class="hljs-built_in">sizeof</span>(arr) &lt;&lt; endl;<br>    <span class="hljs-comment">// prints the size of a pointer, not of the array, and the compiler issues a warning</span><br>&#125;<br></code></pre></td></tr></table></figure><h2 id="参考文章"><a href="#参考文章" class="headerlink" title="参考文章"></a>References</h2><p><a href="https://en.cppreference.com/w/cpp/language/new">CppReference - new expression</a></p><p><a href="https://stackoverflow.com/questions/3674200/what-does-a-typedef-with-parenthesis-like-typedef-int-fvoid-mean-is-it-a">c - What does a typedef with parenthesis like “typedef int (f)(void)” mean? Is it a function prototype? - Stack Overflow</a></p><p><a href="https://stackoverflow.com/questions/55247995/typedef-function-and-is-it-useful">c - Typedef function and is it useful? - Stack Overflow</a></p><p><a href="https://stackoverflow.com/questions/15280749/correct-way-to-assign-function-pointer">c - correct way to assign function pointer - Stack Overflow</a></p><p><a href="https://stackoverflow.com/questions/31643245/declaring-an-array-of-functions-of-type-void-c">Declaring an array of functions of type void C++ - Stack Overflow</a></p><p><a href="https://stackoverflow.com/questions/1143262/what-is-the-difference-between-const-int-const-int-const-and-int-const">c++ - What is the difference between const int*, const int* const, and int const* ? - Stack Overflow</a></p><p><a href="https://stackoverflow.com/questions/5503352/const-before-or-const-after">c++ - Const before or const after? - Stack Overflow</a></p>]]></content>
    
    
    <categories>
      
      <category>C/C++</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Increment/Decrement Operators in C/C++</title>
    <link href="/2022/11/27/cpp/pre-post-increment/"/>
    <url>/2022/11/27/cpp/pre-post-increment/</url>
    
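    <content type="html"><![CDATA[<p>I don't like using the <code>++</code> and <code>--</code> operators, mainly because their evaluation order confuses me, and early C&#x2F;C++ standards specified their behavior loosely, which makes it easy to trigger <strong>U</strong>ndefined <strong>B</strong>ehavior.</p><p>Recently, however, I started using C++'s <code>atomic</code> class template, which inevitably involves the increment&#x2F;decrement operators, so I am taking this opportunity to fill the gap in my knowledge.</p><h2 id="编译器的警告"><a href="#编译器的警告" class="headerlink" title="编译器的警告"></a>Compiler Warnings</h2><p>First, modern compilers can detect many cases of UB caused by increment&#x2F;decrement. All three lines below are UB in C (since C++17, <code>i = i++</code> is well-defined in C++, but the other two lines are still UB):</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs c"><span class="hljs-type">int</span> n = (i++) + (++i);<br><span class="hljs-type">int</span> b = i + i++;<br>i = i++;<br></code></pre></td></tr></table></figure><p>GCC emits a series of warnings such as:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">hello.c:6:15: warning: operation on ‘i’ may be undefined [-Wsequence-point]<br>    6 |     int n = (i++) + (++i);<br>      |             ~~^~~<br></code></pre></td></tr></table></figure><p>So any question about what the code above produces is meaningless.</p><h2 id="为什么是UB？"><a href="#为什么是UB？" class="headerlink" title="为什么是UB？"></a>Why Is It UB?</h2><p>I already knew that <strong>a variable must not be modified twice within one statement (strictly speaking, between two sequence points)</strong>, which is why <code>i = i++</code> above is wrong.</p><p>However, the second line above is also UB:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs c"><span class="hljs-type">int</span> b = i + i++;<br></code></pre></td></tr></table></figure><p>At first glance, only <code>i++</code> modifies <code>i</code>, and only once. You might expect the program to compute <code>b = i + i</code> first and increment <code>i</code> afterwards. But the standard says that <strong>if an object is both read and modified between two sequence points, its prior value may be read only to determine the value to be stored</strong>. In the statement above, the new value of <code>i</code> is determined by <code>i++</code> alone; the read of the first <code>i</code> plays no part in computing it, so the behavior is undefined.</p><h2 id="自增-自减的重载"><a href="#自增-自减的重载" class="headerlink" title="自增&#x2F;自减的重载"></a>Overloading Increment&#x2F;Decrement</h2><p>Increment&#x2F;decrement operators come up frequently when working with <code>atomic</code> variables or iterators.</p><p><code>++</code> is overloaded with definitions like the following:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><code class="hljs cpp"><span class="hljs-keyword">class</span> <span class="hljs-title class_">Foo</span> &#123;<br><span class="hljs-keyword">public</span>:<br>    <span class="hljs-comment">/* prefix increment (++obj): increment, then return a reference to *this */</span><br>    Foo&amp; <span class="hljs-keyword">operator</span>++ () &#123;<br>        ++value;<br>        <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br>    &#125;<br><br>    <span class="hljs-comment">/* postfix increment (obj++): the int parameter only tells it apart from the prefix form */</span><br>    Foo <span class="hljs-keyword">operator</span>++ (<span class="hljs-type">int</span>) &#123;<br>        Foo old = *<span class="hljs-keyword">this</span>; <span class="hljs-comment">// save the old value</span><br>        ++*<span class="hljs-keyword">this</span>;         <span class="hljs-comment">// reuse the prefix version</span><br>        <span class="hljs-keyword">return</span> old;      <span class="hljs-comment">// return the value from before the increment</span><br>    &#125;<br><br><span class="hljs-keyword">private</span>:<br>    <span class="hljs-type">int</span> value = <span class="hljs-number">0</span>;<br>&#125;;<br></code></pre></td></tr></table></figure><p>Increment is a unary operator, yet the postfix version takes an extra <code>int</code> parameter. This is a compromise: without that <code>int</code>, the prefix and postfix versions would have identical signatures and could not be told apart. The compiler automatically passes 0 for this dummy parameter, which is why you never see code like <code>obj++1</code>.</p><h2 id="造火箭"><a href="#造火箭" class="headerlink" title="造火箭"></a>Rocket Science</h2><p>I have seen many uses of the increment and decrement operators that drive me crazy yet are perfectly correct. Here are a few examples I have come across.</p><h3 id="p"><a href="#p" class="headerlink" title="*p++"></a><code>*p++</code></h3><p>I often see code like <code>*p++</code> in the Linux kernel. How should it be read?</p><p>Rule: <strong>postfix increment&#x2F;decrement binds more tightly than dereference</strong> (prefix <code>++</code>&#x2F;<code>--</code> share the precedence of unary <code>*</code> and group right to left).</p><p>So <code>p</code> binds to <code>++</code> first, i.e. <code>*p++</code> means <code>*(p++)</code>. Postfix increment yields the old value, so the dereference goes through the old value of <code>p</code>, while <code>p</code> itself is advanced by one element.</p><h3 id="map-erase-it"><a href="#map-erase-it" class="headerlink" title="map.erase(it++)"></a><code>map.erase(it++)</code></h3><p><a href="https://stackoverflow.com/questions/263945/what-happens-if-you-call-erase-on-a-map-element-while-iterating-from-begin-to">c++ - What happens if you call erase() on a map element while iterating from begin to end? - Stack Overflow</a></p><p><a href="https://stackoverflow.com/questions/23353812/does-stdmaperaseit-maintain-a-valid-iterator-pointing-to-the-next-elemen">c++ - Does std::map::erase(it++) maintain a valid iterator pointing to the next element in the map? - Stack Overflow</a></p><h2 id="参考文章"><a href="#参考文章" class="headerlink" title="参考文章"></a>References</h2><p><a href="https://www.zhihu.com/question/23180989/answer/23874381">Why does i&#x3D;1;i&#x3D;(++i)+(++i)+(++i)+(++i); give i = 15 rather than 14 in C? - Zhihu</a></p><p><a href="https://stackoverflow.com/questions/4176328/undefined-behavior-and-sequence-points">c++ - Undefined behavior and sequence points - Stack Overflow</a></p><p><a href="https://blog.csdn.net/shadow_xwl/article/details/125237563#t5">Increment (++) and decrement (--) operators - C++ operator overloading - shadow_xwl's CSDN blog</a></p><p><a href="https://stackoverflow.com/questions/12740378/why-use-int-as-an-argument-for-post-increment-operator-overload">c++ - Why use int as an argument for post-increment operator overload? - Stack Overflow</a></p><p><a href="https://en.cppreference.com/w/cpp/language/operator_precedence">C++ Operator Precedence - cppreference.com</a></p>]]></content>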
    
    
    <categories>
      
      <category>C/C++</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Configuring a Proxy for Git/SSH</title>
    <link href="/2022/11/09/shell/git-ssh-proxy/"/>
    <url>/2022/11/09/shell/git-ssh-proxy/</url>
    
    <content type="html"><![CDATA[<p>In early August 2022, private repositories of some Gitee users broke, and to this day some of them still have not been recovered. After that, I decided to go back to GitHub.</p><img src="/2022/11/09/shell/git-ssh-proxy/giteeIsDown.png" class="" alt="giteeIsDown"><p>However, GitHub is practically unusable from mainland China, so a proxy has to be configured.</p><h2 id="HTTP代理"><a href="#HTTP代理" class="headerlink" title="HTTP代理"></a>HTTP Proxy</h2><p>When Git fetches from URLs starting with http&#x2F;https, it goes through an HTTP proxy. Put the setting in <code>~/.gitconfig</code>, and every subsequent git command that accesses an HTTP(S) repository will use this proxy.</p><p>My local Clash proxy listens on <code>127.0.0.1:7890</code>, so I write:</p><figure class="highlight toml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs toml"><span class="hljs-section">[http]</span><br><span class="hljs-attr">proxy</span> = http://<span class="hljs-number">127.0</span>.<span class="hljs-number">0.1</span>:<span class="hljs-number">7890</span><br></code></pre></td></tr></table></figure><p>Git's <code>http.proxy</code> setting covers both http:// and https:// remotes; Git has no separate <code>https.proxy</code> key. The proxy URL uses the <code>http://</code> scheme because Clash serves a plain (non-TLS) proxy on this port.</p><h2 id="SSH代理"><a href="#SSH代理" class="headerlink" title="SSH代理"></a>SSH Proxy</h2><p>For SSH-style URLs (<code>ssh://</code> or <code>git@github.com:</code>), Git uses the SSH protocol, and the HTTP proxy setting above has no effect. The proxy must be configured in <code>~/.ssh/config</code> instead:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">Host github.com<br>    Hostname github.com<br>    Port 22<br>    Proxycommand ncat --proxy-type socks5 --proxy 127.0.0.1:7890 %h %p<br></code></pre></td></tr></table></figure><p><code>ncat</code> (part of Nmap) relays the connection through the SOCKS5 proxy; SSH replaces <code>%h</code> and <code>%p</code> with the target host and port.</p><p>Once configured, try <code>ssh -T git@github.com</code>. If you see the following message, SSH can reach GitHub:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">Hi &lt;your-username&gt;! You&#x27;ve successfully authenticated, but GitHub does not provide shell access.<br></code></pre></td></tr></table></figure><h2 id="后记"><a href="#后记" class="headerlink" title="后记"></a>Postscript</h2><p><a href="https://www.zhihu.com/question/547688379/answer/2673454657">What do you think of the large-scale Gitee outage? - Zhihu</a></p><p>On January 2, 2023, when I opened Gitee again, those repositories had become empty, without a single commit.</p><p>This taught me something. <strong>Git was designed from the start to be distributed.</strong> Every machine holds its own copy of the repository, and a hosting platform is just one public node in that network. Clearly, a hosting platform should not be treated as the only place your code is stored.</p>]]></content>
    
    
    <categories>
      
      <category>shell</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Enhancing zsh Completion with fzf</title>
    <link href="/2022/11/03/shell/zsh-fzf/"/>
    <url>/2022/11/03/shell/zsh-fzf/</url>
    
    <content type="html"><![CDATA[<blockquote><p>This post is based on: <a href="https://pragmaticpineapple.com/four-useful-fzf-tricks-for-your-terminal/">4 Useful fzf Tricks for Your Terminal | Pragmatic Pineapple 🍍</a></p></blockquote><p>What is fzf? fzf is a tool that filters text line by line, essentially similar to grep. Its advantages over grep are:</p><ol><li>It is interactive: you see the results filtered by your input in real time.</li><li>It uses fuzzy matching by default: without typing any regular expression, you get completion similar to VSCode's.</li></ol><img src="/2022/11/03/shell/zsh-fzf/basic-fzf-command.gif" class="" alt="Basic usage of the fzf command"><h2 id="安装"><a href="#安装" class="headerlink" title="安装"></a>Installation</h2><p>On recent Ubuntu releases, fzf can be installed directly with apt. On older systems, you can download the prebuilt binary from the official site and put it on your PATH.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">sudo apt install fzf<br></code></pre></td></tr></table></figure><p>oh my zsh ships with an fzf plugin, so you only need to enable it by adding fzf to the plugin list in <code>.zshrc</code>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">plugins=( ...other plugins... fzf)<br></code></pre></td></tr></table></figure><h2 id="Ctrl-R：fzf-历史搜索命令"><a href="#Ctrl-R：fzf-历史搜索命令" class="headerlink" title="Ctrl+R：fzf + 历史搜索命令"></a>Ctrl+R: fzf + History Search</h2><p>Pressing <code>Ctrl+R</code> in bash&#x2F;zsh starts a search of the command history. The default search is rudimentary, though, and only matches contiguous strings. With the fzf plugin enabled, zsh calls fzf to search the history instead, which is especially useful when a command is long and I only remember a few of its words.</p><img src="/2022/11/03/shell/zsh-fzf/fzf-search-history-ctrl-r.gif" class="" alt="Search command history with fzf ctrl-r"><h2 id="Ctrl-T：fzf-补全路径名"><a href="#Ctrl-T：fzf-补全路径名" class="headerlink" title="Ctrl+T：fzf + 补全路径名"></a>Ctrl+T: fzf + Path Completion</h2><p>Press <code>Ctrl+T</code> and fzf searches the files under the current directory. Type to filter, use the arrow keys to select, and press Enter to insert the path.</p><img src="/2022/11/03/shell/zsh-fzf/zsh-fzf-ctrlT-find-file.gif" class="" alt="Peek 2022-11-09 19-00"><h2 id="Alt-C：fzf-改变工作路径"><a href="#Alt-C：fzf-改变工作路径" class="headerlink" title="Alt+C：fzf + 改变工作路径"></a>Alt+C: fzf + Change Directory</h2><p>Press <code>Alt+C</code> and fzf searches all subdirectories of the current directory. Pick one and zsh changes into it (<code>cd</code>); anything you have already typed on the command line stays unchanged.</p><img src="/2022/11/03/shell/zsh-fzf/fzf-change-directory-alt-c.gif" class="" alt="Using fzf alt-c feature">]]></content>
    
    
    <categories>
      
      <category>shell</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Disk Usage Analysis Tools</title>
    <link href="/2022/09/05/tools/disk-usage-tools/"/>
    <url>/2022/09/05/tools/disk-usage-tools/</url>
    
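    <content type="html"><![CDATA[<p>Every now and then my disk runs out of space, and I need to find out what is using it up.</p><p>There are plenty of tools for this. The ones I use are all free and open source, so their interfaces look rather plain. But they cost nothing, I rarely need them, and they cover the basics, which is good enough.</p><h2 id="WinDirStat"><a href="#WinDirStat" class="headerlink" title="WinDirStat"></a>WinDirStat</h2><p>On Windows this tool is called <a href="https://windirstat.net/">WinDirStat</a>; its Linux counterpart is <a href="https://github.com/shundhammer/qdirstat">QDirStat</a>.</p><p>The output is quite detailed: besides the size and share of each folder, it also shows the size and share of each file type. At the bottom there is a treemap in which the area of each block reflects the file's size. The colors, however, seem to be assigned at random and look rather garish.</p><img src="/2022/09/05/tools/disk-usage-tools/winDirStat.png" class="" alt="winDirStat"><blockquote><p>On Windows there is also the closed-source <a href="https://diskanalyzer.com/">WizTree</a>, which reportedly scans faster.</p></blockquote><h2 id="Baobab"><a href="#Baobab" class="headerlink" title="Baobab"></a>Baobab</h2><p>Also known as GNOME Disk Usage Analyzer, Baobab is part of the GNOME application family. Its interface and colors look much better than QDirStat's, and its ring chart is far more attractive than the garish treemap above.</p><p>Its drawback is that it only shows directories, not individual files. For example, if you put three 4 GB files in a folder named foo, you only see that foo totals 12 GB, not what is inside it.</p><img src="/2022/09/05/tools/disk-usage-tools/image-20220905135935662.png" class="" alt="image-20220905135935662"><h2 id="ncdu"><a href="#ncdu" class="headerlink" title="ncdu"></a>ncdu</h2><p>The tools above all have graphical interfaces and suit personal computers. <code>ncdu</code>, by contrast, is a command-line tool and better suited to servers. You can move around and enter directories with the arrow keys, which makes it much more convenient than the traditional <code>du</code>.</p><p>However, <code>ncdu</code> uses Vim-like key bindings, so deleting a file is just a <code>d</code> keypress away. I therefore usually start it in read-only mode with <code>-rr</code>: one <code>-r</code> disables deletion, and the second also disables spawning a shell from within <code>ncdu</code>.</p><p>Also, if other file systems are mounted under the directory being scanned, <code>ncdu</code> counts them too, which is not what I want. So I usually add <code>-x</code> so that <code>ncdu</code> does not cross mount points.</p><img src="/2022/09/05/tools/disk-usage-tools/image-20221101160741188.png" class="" alt="image-20221101160741188"><p>Symbols in the leftmost column mark special situations <code>ncdu</code> ran into while scanning. For example, <code>!</code> and <code>.</code> mean that some folders could not be read, e.g. because of missing permissions. Press <code>?</code> for the full explanation.</p>]]></content>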
    
    
    <categories>
      
      <category>实用工具</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Quick-Reference Tools for Linux Commands</title>
    <link href="/2022/09/03/shell/linux-cmd-search/"/>
    <url>/2022/09/03/shell/linux-cmd-search/</url>
    
    <content type="html"><![CDATA[<p>When using the Linux command line, I often forget command options. Linux's built-in manual, <code>man</code>, is long-winded and does not meet the need for quick lookups, so I have tried quite a few lookup tools.</p><p>Besides search engines (Google, Baidu) and forums and blogs (CSDN, StackOverflow), I have also used several command-line lookup tools.</p><h2 id="AI工具（ChatGPT）"><a href="#AI工具（ChatGPT）" class="headerlink" title="AI工具（ChatGPT）"></a>AI tools (ChatGPT)</h2><p>Since large language models, led by ChatGPT, arrived in late 2022, AI has become a very handy search engine; for the lazy, it is practically a must-have.</p><p>However, it cannot guarantee that its answers are correct, so the questions you ask should not be too complex.</p><img src="/2022/09/03/shell/linux-cmd-search/image-20230515093419556.png" class="" alt="image-20230515093419556"><h2 id="Cheat"><a href="#Cheat" class="headerlink" title="Cheat"></a>Cheat</h2><p>The <a href="https://github.com/cheat/cheat">cheat</a> command targets commands you use every now and then but not every day. For example, I can never remember the argument order of <code>ln</code>, so I type <code>cheat ln</code>. It prints:</p><figure class="highlight txt"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs txt"># To create a symlink:<br>ln -s &lt;source-location&gt; &lt;symlink-location&gt;<br><br># To symlink, while overwriting existing destination files<br>ln -sf &lt;source-location&gt; &lt;symlink-location&gt;<br></code></pre></td></tr></table></figure><p>cheat currently covers only 200-odd commands. Slightly less common ones, such as <code>seq</code> and <code>env</code>, are not included. For my daily use, though, cheat is enough.</p><h2 id="TLDR"><a href="#TLDR" class="headerlink" title="TLDR"></a>TLDR</h2><p><a href="https://github.com/tldr-pages/tldr">tldr</a> covers many more commands than cheat and prints more information. It has many kinds of clients: online, PDF, and local command-line tools. I use its <a href="https://github.com/tldr-pages/tldr-python-client">Python client</a>.</p><img src="/2022/09/03/shell/linux-cmd-search/image-20220903165123240.png" class="" alt="image-20220903165123240"><p>tldr also supports Chinese, but only some pages have been translated, and some of those are machine translations.</p><p>Installation:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">pip3 install --user tldr -i https://pypi.tuna.tsinghua.edu.cn/simple<br></code></pre></td></tr></table></figure><p>Notes:</p><ol><li>To install into the system directories rather than your user directory, remove <code>--user</code>.</li><li>The long string after <code>-i</code> points pip at the <a href="https://mirrors.tuna.tsinghua.edu.cn/help/pypi/">Tsinghua University PyPI mirror</a>, which is faster than the official index.</li></ol><h2 id="explainshell"><a href="#explainshell" class="headerlink" title="explainshell"></a>explainshell</h2><p><a href="https://explainshell.com/">explainshell.com</a> is a website that helps you look up what the options of a Linux command mean.</p><p>Linux command options are usually short and often contain special characters, e.g. <code>netstat -antp</code>. If I want to know what <code>-a</code> means, I find both search engines and man pages tedious, because it is hard to get an exact match.</p><p>explainshell is a handy site: based on the command you type, it automatically extracts the relevant parts of the man pages.</p><img src="/2022/09/03/shell/linux-cmd-search/image-20230321105315339.png" class="" alt="image-20230321105315339"><p>However, the site has not been updated for a long time, so some options or commands cannot be found, such as <code>-t</code> in the screenshot above.</p>]]></content>
    
    
    <categories>
      
      <category>shell</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Notes on Building a Site with Hexo Fluid</title>
    <link href="/2022/09/02/markdown/blog-building/"/>
    <url>/2022/09/02/markdown/blog-building/</url>
    
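    <content type="html"><![CDATA[<p>I used to build my blog with <a href="https://gohugo.io/">Hugo</a>, because Hugo is very portable and doesn't require setting up a whole Node.js environment. But its ecosystem, themes, and documentation are far behind those of <a href="https://hexo.io/zh-cn/">Hexo</a>, so I reluctantly switched back to Hexo.</p><p>How to build a site is already explained clearly in the Fluid theme's <a href="https://fluid-dev.github.io/hexo-fluid-docs/start/">user guide</a> and Hexo's <a href="https://hexo.io/zh-cn/docs/">official documentation</a>. Here I only record the problems I ran into.</p><h2 id="EACCES错误"><a href="#EACCES错误" class="headerlink" title="EACCES错误"></a>EACCES error</h2><p>Node.js installed via apt lives in system directories, which makes <code>npm install -g hexo-cli</code> fail with an EACCES permission error.</p><p>So, following the official recommendation, I installed <a href="https://github.com/nvm-sh/nvm">nvm</a> first and then used it to install Node.js.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># Install the Node Version Manager (nvm)</span><br>wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash<br><br><span class="hljs-comment"># Reload the shell config so that nvm is available (or open a new terminal; use ~/.zshrc for zsh)</span><br><span class="hljs-built_in">source</span> ~/.bashrc<br><br><span class="hljs-comment"># Install Node.js (latest LTS) with nvm</span><br>nvm install --lts<br><br><span class="hljs-comment"># Install Hexo</span><br>npm install -g hexo-cli<br></code></pre></td></tr></table></figure><p>This way, both Node.js and Hexo are installed under the user's home directory, and the EACCES error goes away.</p><h2 id="图片路径问题"><a href="#图片路径问题" class="headerlink" title="图片路径问题"></a>Image paths</h2><p>Like Hugo, Hexo does not resolve image paths the way editors such as Typora do, so images fail to load on the generated pages.</p><p>What I want is:</p><ol><li>Images display correctly while writing in Typora or Obsidian</li><li>Images load correctly on the generated web pages</li><li>Images are stored together with the posts rather than on an image hosting service</li></ol><p>By default, Hexo puts all images under the site root, so the generated URLs look like <code>localhost:4000/img.png</code>. When writing Markdown, however, I usually keep images in a folder next to the post. This mismatch makes the images fail to load.</p><p>After a lot of searching, I found that the <a href="https://github.com/ZaiZheTingDun/hexo-simple-image">hexo-simple-image</a> plugin meets all my requirements. For usage, see its README and <a href="https://www.cnblogs.com/cocowool/p/hexo-image-link.html">this blog post</a>.</p><p>Its only minor drawback is that, once the plugin is installed, all images must be placed in a folder with the same name as the post. If your Markdown references an invalid image path, the site fails to generate, and there is no error message at all.</p><img src="/2022/09/02/markdown/blog-building/image-20220902201552586.png" class="" alt="image-20220902201552586">]]></content>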
    
    
    <categories>
      
      <category>markdown</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Data Deduplication</title>
    <link href="/2021/10/28/misc/hpc_dedup/"/>
    <url>/2021/10/28/misc/hpc_dedup/</url>
    
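    <content type="html"><![CDATA[<blockquote><p>A disclaimer first: this is not my research area; I am simply interested in the topic, so this is just a popular-science write-up. Of course, I will do my best to keep it accurate. If you are an expert in this area, corrections are very welcome!</p></blockquote><hr><h2 id="概念"><a href="#概念" class="headerlink" title="概念"></a>Concept</h2><p>Data deduplication, as the name suggests, means "deleting redundant duplicate data".</p><p>Hereafter abbreviated as dedup.</p><h2 id="百度网盘"><a href="#百度网盘" class="headerlink" title="百度网盘"></a>Baidu Netdisk</h2><p>Baidu Netdisk has two features that once seemed magical to me.</p><p>The first is instant upload: a file several GB in size may show "upload complete" after just a few seconds.</p><img src="/2021/10/28/misc/hpc_dedup/rapid1.png" class="" alt="rapid1"><p>The second is prohibited-file detection: if you upload a file containing illegal content, Baidu will no longer let anyone download it, including you.</p><img src="/2021/10/28/misc/hpc_dedup/censored.webp" class="" alt="censored"><p>Both features rely on dedup. Specifically:</p><ul><li>When you upload a file that already exists in someone else's Baidu Netdisk, there is no need to upload it; the server simply records that "you have this file too".</li><li>Similarly, Baidu keeps a record of every banned file. If you upload any of them, Baidu recognizes it immediately.</li></ul><p>So capitalists are actually very clever (read: shameless). Baidu appears to give you 2 TB of free space, but what it actually stores for you may be only a few KB.</p><p>Still, this shows an obvious benefit of dedup: saving space (read: "cleansing" the internet).</p><img src="/2021/10/28/misc/hpc_dedup/dedup_block.png" class="" alt="dedup_block"><p>In cloud storage, dedup can often save enormous amounts of space, especially when the data is highly redundant. File-system links on Windows&#x2F;Linux are in fact a simple form of dedup.</p><p>The difference is that this kind of dedup is controlled by the user; on a cloud drive, it corresponds only to saving a file that someone else has shared with you.</p><h2 id="重复文件识别"><a href="#重复文件识别" class="headerlink" title="重复文件识别"></a>Identifying duplicate files</h2><h3 id="“摘要”"><a href="#“摘要”" class="headerlink" title="“摘要”"></a>“Digests”</h3><p>How would a person identify duplicate files? The simplest way is to compare them sequentially, as the Linux <code>diff</code> command does. But that requires sending the entire file to the server for comparison, which obviously cannot deliver "instant upload". In the lines below, the quoted parts mark where they differ:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-string">&quot;Tsinghua&quot;</span> University is the best university in the country<br><span class="hljs-string">&quot;Peking&quot;</span> University is the best university in the country<br>Tsinghua University is the best university in <span class="hljs-string">&quot;Beijing&quot;</span><br></code></pre></td></tr></table></figure><p>To upload "instantly", we can only send a small part, which means picking certain data from the file. But how should we pick it? Besides, not every file is readable text. What about binary files?</p><p>This is where common sense runs out and we need the ingenuity of computer scientists.</p><h3 id="计算指纹"><a href="#计算指纹" class="headerlink" 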
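title="计算指纹"></a>Computing fingerprints</h3><p>For an arbitrary binary file, a special identifier can be computed using a "fingerprint" technique.</p><p>Just like a human fingerprint, a computer can generate a "fingerprint" of a file, turning the file into a fairly short string of digits. Algorithms of this kind are called <strong>cryptographic hash functions</strong>. Common ones include MD5 and SHA1, which readers familiar with blockchains will already know. On Linux, you can try them on the command line:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs bash">$ <span class="hljs-built_in">sha1sum</span> foo<br>0c73382a147681d394bd55ca30a22192334bc161  foo<br><br>$ <span class="hljs-built_in">md5sum</span> foo<br>0115ada7f10934a8163a7e2db641610f  foo<br></code></pre></td></tr></table></figure><p>Windows has similar tools, which I won't demonstrate here. The output looks like randomly generated gibberish, but these functions have a remarkable property: <strong>if the input file changes even slightly, the output fingerprint becomes completely different</strong>. I don't know the mathematics behind these functions; interested readers can look it up.</p><p>What if two fingerprints are identical? Compressing a large file into a short fingerprint inevitably loses information, so two different files can indeed produce the same fingerprint. This is called a collision.</p><img src="/2021/10/28/misc/hpc_dedup/sha1.png" class="" alt="sha1"><p>One crude but effective remedy is to take more than one fingerprint. If you don't trust MD5 alone, use MD5 and SHA512 together, or invent a new algorithm of your own, to increase the amount of information in the "fingerprint".</p><h3 id="冲突的概率"><a href="#冲突的概率" class="headerlink" title="冲突的概率"></a>Probability of a collision</h3><p>Take SHA512, which has the longest fingerprint of these, as an example. A SHA512 fingerprint is only 64 bytes long, yet by the birthday bound you need about $N(p=0.5)\approx 1.4\times 10^{77}$ files before the probability that any two of them collide reaches 50% (for a collision probability of $10^{-9}$, $N\approx 5.2\times 10^{72}$). In other words, even if users worldwide uploaded a billion files every second, it would take about $1.4\times 10^{77}/10^{9}=1.4\times 10^{68}$ seconds, or roughly $4\times 10^{60}$ years, to reach even odds of a collision. The world would have ended long before then.</p><p>So, as people online say, MD5 or SHA1 is usually good enough for detecting accidental duplicates. (Note, though, that both have known practical collision attacks, so they do not hold up against someone who deliberately crafts two different files with the same hash.) Sticklers might ask: what if the improbable really happens and a user loses a file? My guess is that Baidu would not take responsibility, and at most would give you a separate channel for manual appeals.</p><h2 id="分析百度网盘的上传过程"><a href="#分析百度网盘的上传过程" class="headerlink" title="分析百度网盘的上传过程"></a>Analyzing Baidu Netdisk's upload process</h2><p>Following a hint on Zhihu, I tried it myself, using Chrome DevTools to inspect the HTTP requests that the Baidu Netdisk web client sends during an upload.</p><p>The superfile URL is the endpoint for regular uploads. The data is uploaded chunk by chunk, which indirectly explains why Baidu Cloud supports resumable uploads. When instant upload was triggered, the client did send a string of data to the rapidupload URL. After the server replied, the page showed "instant upload".</p><img src="/2021/10/28/misc/hpc_dedup/rapid2.png" class="" alt="rapid2"><p>Below are some observations from my tests, with my own interpretations:</p><ol><li><p>When I uploaded a 1 GB file, it only switched to instant upload after about 25% had been uploaded.</p><ul><li>Computing the fingerprint of a large file takes time, especially when the user's disk is slow. On my own computer, computing the MD5 of a 1 GB file with a cold page cache took nearly a minute. So uploading and hashing run in parallel.</li></ul></li><li><p>With the same 1 GB file, when the upload speed was high (15 MB&#x2F;s), there was no instant upload. Either the upload was fast enough that instant upload was unnecessary, or the MD5 computation could not keep up. The results below point to the latter:</p> <img src="/2021/10/28/misc/hpc_dedup/chrome_throttle.png" class="" alt="chrome_throttle"><ul><li>I throttled the upload speed to 100 KB&#x2F;s with Chrome DevTools. Task Manager meanwhile showed very high CPU usage. After about a minute, the instant upload kicked in, presumably once the program had finished computing the MD5.</li><li>Baidu's MD5 computation is very slow, presumably because JavaScript is slower than C. With the file already in the page cache, the local <code>md5sum</code> takes only 2 seconds, while Baidu's JS takes a minute and uses more CPU. I therefore suspect that the Baidu Netdisk desktop client is faster than the web version.</li></ul></li><li><p>The content-md5 field differs from the MD5 I computed locally.</p><ul><li>Baidu Cloud may compute it differently. For the same file, however, its content-md5 is always the same (which goes without saying).</li><li>All I know so far is that the page loads a file named <code>spark-md5.js</code>, i.e. the SparkMD5 library. SparkMD5 is a JavaScript implementation of standard MD5, so the mismatch more likely comes from what data is hashed than from the hash algorithm itself.</li></ul></li></ol><h2 id="后记"><a href="#后记" class="headerlink" title="后记"></a>Afterword</h2><p>Dedup does not have to work at the file level. Some cloud storage services, such as Nutstore (坚果云), deduplicate at the block level to enable incremental sync.</p><img src="/2021/10/28/misc/hpc_dedup/delta_sync.png" class="" alt="delta_sync"><p>Also, not every field is after dedup. Data reliability is ensured through redundancy, so there is usually a trade-off.</p><ul><li>If a file exists in tens of thousands of users' Baidu Netdisk accounts, Baidu generally does not need that many physical copies.</li></ul><p>Finally, most explanations online stop after introducing hash digests such as MD5&#x2F;SHA, without discussing in depth the problems a real implementation might run into. I think the likely reasons are that the technical details are too complex to explain and involve other companies' trade secrets.</p><hr><blockquote><p>Closing disclaimer: neither Baidu nor Nutstore paid me. I used Baidu Netdisk as the example only because it has so many users; this is not an advertisement for Baidu. Everything discussed above is based on information found online and my own thinking. Any resemblance is purely coincidental.</p></blockquote><h2 id="参考文献"><a href="#参考文献" class="headerlink" title="参考文献"></a>References</h2><p><a href="https://blog.51cto.com/u_13559412/2057144">A brief introduction to storage deduplication and compression (Part 1) - 煮酒论IT - 51CTO Blog</a></p><p><a href="https://stackoverflow.com/questions/35954964/is-sha-512-collision-resistant">cryptography - is SHA-512 collision resistant? 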
- Stack Overflow</a></p><p><a href="https://en.wikipedia.org/wiki/Birthday_problem#Probability_table">Birthday problem - Wikipedia</a></p><p><a href="https://www.zhihu.com/question/21275365/answer/647397762">What technology does Baidu Cloud's "instant upload" use? - stormfeng's answer - Zhihu</a></p><p><a href="https://www.youtube.com/watch?v=86x2UPqaV7U">NetApp Deduplication and Compression Tutorial - YouTube</a></p><p><a href="https://www.youtube.com/watch?v=dlV2r7f966o">replication, duplication, dedup - YouTube</a></p>]]></content>
    
    
    <categories>
      
      <category>技术杂谈</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>x86 History: Real Mode and Protected Mode</title>
    <link href="/2021/10/26/arch/x86_real_protected/"/>
    <url>/2021/10/26/arch/x86_real_protected/</url>
    
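    <content type="html"><![CDATA[<blockquote><p>Parts of this post are reproduced from: <a href="https://www.cnblogs.com/neo-01/p/13858397.html">https://www.cnblogs.com/neo-01/p/13858397.html</a></p><p>Original title: 《实模式、保护模式和虚拟模式是X86中的概念》 ("Real mode, protected mode, and virtual mode are x86 concepts")</p><p>Original author: 海之石</p></blockquote><p>While working on Lab 1 of MIT 6.828, I kept wondering: <strong>why do "real mode" and "protected mode" exist at all?</strong></p><p>In fact, they are mechanisms Intel introduced for backward compatibility. To explain them properly, we have to start from the history of x86.</p><h2 id="长话短说"><a href="#长话短说" class="headerlink" title="长话短说"></a>TL;DR</h2><ol><li>In real mode, the CPU shifts the segment register value left by 4 bits and adds it to the offset, turning a logical address directly into a physical address.</li><li>In 80286 protected mode, the segmentation unit looks up the segment descriptor table to translate a logical address into a linear address. This linear address is the physical address.</li><li>In 80386 protected mode, the segmentation unit looks up the segment descriptor table to translate a logical address into a linear address, and the paging unit then looks up the page tables to translate the linear address into a physical address.</li></ol><p>A "virtual address" usually means the address seen by software, i.e. the "logical address" above.</p><hr><br><p>The main text follows.</p><h2 id="在x86之前"><a href="#在x86之前" class="headerlink" title="在x86之前"></a>Before x86</h2><p><a href="https://en.wikipedia.org/wiki/X86#Chronology">Release timeline of the x86 processor family - Wikipedia</a></p><p>In the history of microprocessors, the first microprocessor chip, the 4-bit 4004, was introduced by Intel. Intel later released the 8-bit 8080, which had one main accumulator (register A) and six secondary registers (B, C, D, E, H, and L). The secondary registers could be paired (as BC, DE, or HL) to form 16-bit memory addresses, so the 8080 could address 64 KB of memory.</p><blockquote><p>On modern CPUs, the number of physical address bits is usually much smaller than the register width: a 9th-generation Core i5, for example, is a 64-bit CPU but has only 39 physical address bits. For Intel's early processors in the last century, it was the other way around.</p></blockquote><p>Moreover, memory was accessed directly by physical address in those days. Addresses in programs therefore had to be hard-coded (given as concrete addresses), and programs were hard to relocate. No wonder most software of that era consisted of small, simply structured industrial control programs that handled little data and were hard to adapt.</p><h2 id="分段"><a href="#分段" class="headerlink" title="分段"></a>Segmentation</h2><p><a href="https://en.wikipedia.org/wiki/X86_memory_segmentation">x86 memory segmentation - Wikipedia</a></p><p>In 1978 Intel released the 16-bit 8086, marking the beginning of the x86 dynasty. It was also the first leap in memory addressing, because the 8086 introduced an important mechanism: <strong>segmentation</strong>.</p><p>The 8086 was designed to address 1 MB of memory, so its address bus was widened to 20 bits. But this posed a problem for Intel's designers: although the address bus was 20 bits wide, the CPU's arithmetic logic unit (ALU), and thus its data path, was only 16 bits wide. In other words, pointers the CPU could operate on directly were only 16 bits long.</p><p>How could this gap be bridged? There were several possible solutions. For example, as in some 8-bit CPUs, Intel could have added dedicated 20-bit instructions for address arithmetic, but that would have made the CPU's internal structure irregular.</p><p>The PDP-11 minicomputer of the time was also 16-bit, but its memory management unit (MMU) could map 16-bit addresses into a larger physical address space (18 or 22 bits, depending on the model). Inspired by this, Intel came up with what was, for its time, a clever approach: segmentation.</p><p>To support segmentation, Intel added four <strong>segment registers</strong> to the 8086: CS, DS, SS, and ES, for the code segment, data segment, stack segment, and an extra segment respectively. Each segment register is 16 bits wide and corresponds to the upper 16 bits of the 20-bit address bus. The address inside every memory-access instruction is also 16 bits, but before it goes onto the address bus, the CPU automatically adds it to the contents of one of the segment registers. Because a segment register corresponds to the upper 16 bits of the 20-bit address (i.e. it is shifted left by 4 bits), the addition effectively adds the upper 12 bits of the 16-bit offset to the 16 bits of the segment register, while the lowest 4 bits of the offset pass through unchanged. This yields a 20-bit physical address, i.e. a translation, or "mapping", from a 16-bit virtual address to a 20-bit physical address: physical address = segment × 16 + offset. For example, <code>0x1234:0x0010</code> maps to <code>0x12340 + 0x0010 = 0x12350</code>.</p><img src="/2021/10/26/arch/x86_real_protected/real_mode_segmentation.png" class="" alt="real_mode_segmentation"><p>Segmented memory management brought obvious advantages: program addresses no longer had to be hard-coded, bugs became easier to locate, and, best of all, more memory could be addressed. Programmers began to gain some freedom.</p><h2 id="保护模式"><a href="#保护模式" class="headerlink" title="保护模式"></a>Protected mode</h2><p><a href="https://www.google.com/search?q=why+80286+segment+is+64k&oq=why+80286+segment+is+64k&aqs=chrome..69i57j33i160.8528j0j7&sourceid=chrome&ie=UTF-8">why 80286 segment is 64k - Google Search</a></p><p>Technology did not stop there. As memory capacities grew, 1 MB gradually became insufficient.</p><p>Intel's 80286 arrived in 1982. Its address bus grew to 24 bits, so it could access 16 MB of memory. More importantly, it introduced a completely new concept: <strong>protected mode</strong>. In this mode, access to memory segments is restricted: the segment's base address can no longer be taken directly from a segment register, but goes through additional translation and checks.</p><p>For backward compatibility, the 80286 could address memory in two ways: the advanced protected mode, and the old 8086-style mode, called <strong>real mode</strong>. The processor starts in real mode at boot and can then access only 1 MB. To access the full 16 MB, it has to switch to protected mode.</p><p>However, the only way to get from protected mode back to real mode was to reset the machine. Another fatal flaw was that, although the 80286 enlarged the address space, it was still a 16-bit processor, which limited each segment to 64 KB.</p><blockquote><p>With 64 KB per segment, the four segment registers together cover only 256 KB. To access more memory, you have to change the segment register values, and that switch is relatively slow.</p></blockquote><p>So this chip, handicapped from birth, was destined for a short life and was soon replaced by its far more gifted sibling, the 80386.</p><h2 id="进入32位的时代"><a href="#进入32位的时代" class="headerlink" title="进入32位的时代"></a>Entering the 32-bit era</h2><p><a href="https://en.wikipedia.org/wiki/Protected_mode#386_additions_to_protected_mode">386 additions to protected mode - Protected mode - 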
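Wikipedia</a></p><p>Fast forward to 1985. The 80386 (also called the <strong>i386</strong>) is a 32-bit CPU: its ALU and data path are 32 bits wide. Its address bus matches the data bus at 32 bits, so it can address 4 GB. The 80386 also added many features to protected mode, such as <strong>paging</strong> on top of segmentation.</p><p>In theory, when the data bus and the address bus have the same width, the CPU architecture should be clean and simple. The 80386 could not achieve this, however. As a member of the x86 family, it had to keep the segment registers and still support real mode alongside protected mode.</p><p>With 32-bit buses, segmentation was enhanced as well: the maximum segment size grew to 4 GB (a 32-bit limit).</p><p>This truly freed software engineers: they no longer had to struggle to shrink their programs, and software capabilities grew rapidly.</p><p>Going from the 16-bit 8086 to the 32-bit 80386 looks like a change in word size, but it is really a change in processor architecture. After the 80386, Intel CPUs went through the 80486, Pentium, Pentium II, Pentium III, and other models. Although they became orders of magnitude faster and gained many improvements, they are all refinements of the same basic architecture with no fundamental change. That is why we refer to the 80386 and later processors collectively as <strong>IA-32 (Intel Architecture, 32-bit)</strong>.</p><h2 id="再后来呢？"><a href="#再后来呢？" class="headerlink" title="再后来呢？"></a>What came next?</h2><p>Even today, x86-64 machines, which run in 64-bit long mode, still start in real mode when powered on. x86's backward compatibility really is that strong.</p><p>However, with paging in place and the rise of 64-bit processors, modern operating systems have gradually abandoned segmentation. The segment registers are either left unused or repurposed; for example, FS and GS are commonly used to point to thread-local storage.</p><h2 id="混乱的称呼"><a href="#混乱的称呼" class="headerlink" title="混乱的称呼"></a>Confusing names</h2><p>As mentioned above, the 80386 is also known as the i386. Here "i386" refers to a processor model.</p><p>But IA-32 is sometimes also called i386, in which case "i386" refers to an instruction set. For example, on the Ubuntu 16.04 download page you will find an image named <code>ubuntu-16.04.6-desktop-i386.iso</code>.</p><h2 id="补充：实模式与保护模式分段机制的区别"><a href="#补充：实模式与保护模式分段机制的区别" class="headerlink" title="补充：实模式与保护模式分段机制的区别"></a>Appendix: segmentation in real mode vs. protected mode</h2><p>Segmentation in real mode is very primitive: it is pure arithmetic, where the physical address is the virtual address plus an offset derived from the segment register. The 80286 introduced protected mode, which changed how segmentation works.</p><p>As taught in operating systems courses, paging uses page tables. Similarly, segmentation uses a "segment table" to store basic information about each segment, such as its base address, limit (size), and attributes. Each such entry is called a <strong>segment descriptor</strong>. The descriptors together form a segment table kept in memory (on x86, the GDT or an LDT).</p><p>In real mode, a segment register holds the segment base (divided by 16), whereas in protected mode it holds a selector that points to an entry in the segment table.</p><img src="/2021/10/26/arch/x86_real_protected/Protected_mode_segments.svg" class="" alt="Protected_mode_segments"><p>In addition, in protected mode the segment descriptor contains the segment's attributes, such as whether it is writable or executable. Real mode has no such permission checks.</p>]]></content>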
    
    
    <categories>
      
      <category>体系结构</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Building the ZSim/NVMain Simulator (HSCC/SHMA)</title>
    <link href="/2021/10/18/arch/mem_zsim_hscc/"/>
    <url>/2021/10/18/arch/mem_zsim_hscc/</url>
    
    <content type="html"><![CDATA[<p>The build tutorials I found online for the ZSim&#x2F;NVMain hybrid memory simulator fall roughly into two groups. One uses the original ZSim, packaged as <a href="https://github.com/AXLEproject/axle-zsim-nvmain">axle-zsim-nvmain</a>. The other uses a ZSim extended by the School of Computer Science and Technology at Huazhong University of Science and Technology, known as <a href="https://github.com/CGCL-codes/HSCC">HSCC</a> or <a href="https://github.com/cyjseagull/SHMA">SHMA</a>.</p><p>I have tried both myself, ran into quite a few problems, and did some digging, which I record here. For the problems I could not solve, I hope readers can provide answers.</p><p><strong>This post builds the extended ZSim, i.e. SHMA</strong>. For building axle-zsim-nvmain, see my other post. The two repositories are not identical, but much of the content overlaps, and some build and runtime errors are the same, so I won't repeat them here.</p><h2 id="环境"><a href="#环境" class="headerlink" title="环境"></a>Environment</h2><p>OS: <strong>Ubuntu 14.04 amd64</strong> (VMWare)</p><p>Kernel version: 4.4.0-142-generic</p><p>Compiler version: <strong>GCC 4.8.4</strong></p><p>Simulator GitHub repository: <a href="https://github.com/CGCL-codes/HSCC">https://github.com/CGCL-codes/HSCC</a></p><p>Simulator Gitee repository: <a href="https://gitee.com/NVM_Systems/HSCC">https://gitee.com/NVM_Systems/HSCC</a></p><h2 id="编译过程"><a href="#编译过程" class="headerlink" title="编译过程"></a>Build process</h2><p>Compared with the original ZSim, HSCC simplifies some of the configuration, but it also introduces some new problems. Here is my build process:</p><ol><li><p>Download the repository. HSCC bundles ZSim, NVMain, and PinTool together, so just clone the whole repository. Then enter the <code>zsim-nvmain</code> folder.</p></li><li><p>Install the dependencies. Compared with axle-zsim-nvmain, HSCC needs one extra library, <code>glog</code>.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># Install tools: scons, g++</span><br>sudo apt install scons g++<br><span class="hljs-comment"># Install libraries: hdf5, libconfig, libelf</span><br>sudo apt install libhdf5-serial-dev libconfig++-dev libelfg0-dev<br><span class="hljs-comment"># Install libraries: glog, boost-regex (1.54)</span><br>sudo apt install libgoogle-glog-dev libboost-regex-dev<br></code></pre></td></tr></table></figure></li><li><p>Set the environment variables in <code>zsim-nvmain/env.sh</code> as described in HSCC's README. Most of them are the same as for axle-zsim-nvmain, with a few differences:</p><ol><li>HSCC ships with Pin 2.13, so there is no need to download it; just point to its location in the repository</li><li>I use the system Boost library, so any nonexistent path will do</li><li>Set <code>CPLUS_INCLUDE_PATH</code> to a single dot; see "Problems and thoughts" at the end of this post for why</li><li>The other variables are not needed for now, so I deleted them</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs bash">PINPATH=<span class="hljs-variable">$PWD</span>/pin_kit <span class="hljs-comment"># path to pin_kit</span><br>NVMAINPATH=<span class="hljs-variable">$PWD</span>/nvmain <span class="hljs-comment"># path to NVMain</span><br>ZSIMPATH=<span class="hljs-variable">$PWD</span> <span class="hljs-comment"># directory containing SConstruct</span><br><br>BOOST=fakepath<br><br>CPLUS_INCLUDE_PATH=.<br><br><span class="hljs-built_in">export</span> ZSIMPATH PINPATH NVMAINPATH BOOST CPLUS_INCLUDE_PATH<br></code></pre></td></tr></table></figure></li><li><p>Around line 28 of <code>SConstruct</code>, remove the <code>-e</code> from <code>echo -e</code>. See "Problems and thoughts" below for why.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs python"><span class="hljs-keyword">else</span>:<br>    env.Command(versionFile, allSrcs + [<span class="hljs-string">&quot;SConstruct&quot;</span>],<br>    <span class="hljs-comment"># Remove the -e here</span><br>    <span class="hljs-string">&#x27;echo -e &quot;#define ZSIM_BUILDDATE \\&quot;&quot;`date`\\&quot;&quot;\\\\n#define ZSIM_BUILDVERSION \\&quot;&quot;no git repo\\&quot;&quot;&quot; &gt;&gt;&#x27;</span> + versionFile)<br></code></pre></td></tr></table></figure></li><li><p>Around line 53 of <code>src/pin_cmd.cpp</code>, add one line next to the existing <code>args.push_back</code> calls.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs cpp">args.<span class="hljs-built_in">push_back</span>(<span class="hljs-string">&quot;-ifeellucky&quot;</span>);<br></code></pre></td></tr></table></figure><p>Pin 2.13 does not support Linux 4.x kernels well; without this line, it fails with:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">E:4.4 is not a supported linux release<br></code></pre></td></tr></table></figure></li><li><p>Build the simulator by running scons in the <code>zsim-nvmain</code> directory. The <code>-j</code> option sets the number of parallel jobs to speed up the build. When the build finishes, <code>libzsim.so</code> and <code>zsim</code> are generated in the <code>bin</code> folder.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">scons -j2<br></code></pre></td></tr></table></figure></li><li><p>Check the generated binary with <code>ldd</code>: no libraries are missing.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">$ ldd -r bin/libzsim.so <br>        linux-vdso.so.1 =&gt;  (0x00007fff4a7f5000)<br>        libconfig++.so.9 =&gt; /usr/lib/x86_64-linux-gnu/libconfig++.so.9 (0x00007fc7961a2000)<br>        libglog.so.0 =&gt; /usr/lib/x86_64-linux-gnu/libglog.so.0 (0x00007fc795f6a000)<br>        libboost_regex.so.1.54.0 =&gt; /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.54.0 (0x00007fc795c63000)<br>        ....<br></code></pre></td></tr></table></figure></li></ol><h2 id="运行模拟器"><a href="#运行模拟器" class="headerlink" 
title="运行模拟器"></a>Running the simulator</h2><h3 id="配置文件"><a href="#配置文件" class="headerlink" title="配置文件"></a>Configuration file</h3><p>Here I use the <code>config/dram.cfg</code> configuration. It cannot be used as-is; a few changes are needed.</p><p>First, <code>gmMBytes</code> must not be too large: at the very least, it has to fit in your machine's memory.</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs json">gmMBytes = <span class="hljs-number">8192</span>;    <span class="hljs-comment">// Simulator heap size in MB</span><br></code></pre></td></tr></table></figure><p>Next, in the workload section, change <code>process0</code> to the program you want to run.</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs json">process0 = <span class="hljs-punctuation">&#123;</span><br>   command = <span class="hljs-string">&quot;ls -ahl&quot;</span>;<br><span class="hljs-punctuation">&#125;</span>;<br></code></pre></td></tr></table></figure><h3 id="系统设置"><a href="#系统设置" class="headerlink" title="系统设置"></a>System settings</h3><p>Make the <code>pinbin</code> binary executable.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-built_in">chmod</span> +x pin_kit/intel64/bin/pinbin<br></code></pre></td></tr></table></figure><p>As with axle-zsim-nvmain, allow Pin to inject code into the workload process:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">sudo sysctl -w kernel.yama.ptrace_scope=0<br></code></pre></td></tr></table></figure><h3 id="运行结果"><a href="#运行结果" class="headerlink" title="运行结果"></a>Results</h3><p>Run the simulator from the config folder, with <code>dram.cfg</code> as the configuration file.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span 
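class="hljs-built_in">cd</span> config<br>../bin/zsim dram.cfg<br></code></pre></td></tr></table></figure><p>The output is shown below. The simulator prints a lot of information, including the initialization parameters of modules such as the DRAM, the memory controller, and the page table, followed by the output of the ls command. Because the terminal output is long, the screenshot shows only its last part.</p><img src="/2021/10/18/arch/mem_zsim_hscc/hscc_result.png" class="" alt="hscc_result"><h2 id="问题与思考"><a href="#问题与思考" class="headerlink" title="问题与思考"></a>Problems and thoughts</h2><p>While building HSCC, I ran into quite a few problems and looked into what many of the commands do, in order to understand where the problems came from. I record them here in the hope of clearing up some readers' confusion.</p><h3 id="确保换行符是LF"><a href="#确保换行符是LF" class="headerlink" title="确保换行符是LF"></a>Make sure line endings are LF</h3><p><a href="https://github.com/SEAL-UCSB/NVmain/issues/5">An error happen when change a config file · Issue #5 · SEAL-UCSB&#x2F;NVmain</a></p><p>NVMain cannot handle CRLF line endings when reading configuration files: it treats the <code>\r</code> as part of the setting, which leads to SIGSEGV and other problems.</p><h3 id="失败经历：编译依赖库"><a href="#失败经历：编译依赖库" class="headerlink" title="失败经历：编译依赖库"></a>A failed attempt: building the dependencies myself</h3><p><a href="https://github.com/s5z/zsim/issues/174">undefined symbol: gzwrite · Issue #174 · s5z&#x2F;zsim</a></p><p>The reason I use Ubuntu 14.04 is that dependencies such as Boost can all be installed via apt, so there is no need to build them myself.</p><p>At first I built ZSim's dependencies, such as Boost, myself, but eventually hit a problem I could not solve: ZSim would not start, always reporting undefined symbol errors because it could not find functions such as gzopen&#x2F;gzwrite.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">undefined symbol: gzwrite<br>undefined symbol: gzopen<br></code></pre></td></tr></table></figure><p>I suspect the hdf5 library. <code>apt show libhdf5-dev</code> lists zlib1g-dev among its dependencies, yet I had built hdf5 successfully without installing zlib. That is suspicious. Even after installing zlib, I still got the same error, on both Ubuntu 14 and Ubuntu 16.</p><p>I still have not figured out why my self-built libhdf5 fails to link against the system zlib.</p><blockquote><p>My guess is that HSCC was built on Ubuntu 12. On Ubuntu 14 you run into some extra problems, and on Ubuntu 16 there are new ones again.</p></blockquote><h3 id="CPLUS-INCLUDE-PATH"><a href="#CPLUS-INCLUDE-PATH" class="headerlink" title="CPLUS_INCLUDE_PATH"></a>CPLUS_INCLUDE_PATH</h3><p><a href="https://blog.csdn.net/weixin_44327262/article/details/105860213">The Linux environment variables C_INCLUDE_PATH, CPLUS_INCLUDE_PATH, and CPATH explained, with common mistakes</a></p><p><a 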
href="https://stackoverflow.com/questions/11084123/cplus-include-path-doesnt-works">StackOverflow: CPLUS_INCLUDE_PATH doesn’t works</a></p><p><a href="https://gcc.gnu.org/onlinedocs/gcc/Environment-Variables.html">GCC Environment Variables</a></p><p>At first I used the axle-zsim-nvmain configuration as-is, without the <code>CPLUS_INCLUDE_PATH</code> environment variable, and the build failed because a header file could not be found:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">In file included from build/opt/page-table/comm_page_table_op.h:7:0,<br>                 from build/opt/tlb/page_table_walker.h:16,<br>                 from build/opt/init.cpp:78:<br>build/opt/tlb/common_func.h:5:34: fatal error: src/memory_hierarchy.h: No such file or directory<br> #include &quot;src/memory_hierarchy.h&quot;<br>                                  ^<br>compilation terminated.<br>scons: *** [build/opt/init.os] Error 1<br>scons: building terminated because of errors.<br></code></pre></td></tr></table></figure><p>The problem lies in HSCC's <code>tlb/common_func.h</code>. Compared with the other ZSim files, this is the only one whose include carries a <code>src/</code> prefix.</p><p>What happens is that GCC first searches the current directory (the tlb folder) for the header and does not find it. It then searches the root of the ZSim source tree (zsim-nvmain&#x2F;src), does not find it there either, and reports the error.</p><p>Now let's look back at what <code>CPLUS_INCLUDE_PATH</code> does. The line I had deleted from env.sh is this one:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">CPLUS_INCLUDE_PATH=<span class="hljs-variable">$CPLUS_INCLUDE_PATH</span>:<span 
class="hljs-variable">$HDF5</span>/include<br></code></pre></td></tr></table></figure><p>env脚本涉及到不少环境变量，<strong>这些变量必须要export才生效</strong></p><table><thead><tr><th>变量</th><th>作用</th></tr></thead><tbody><tr><td>C_INCLUDE_PATH</td><td>等效于gcc的<code>-I</code>参数，添加头文件的包含路径</td></tr><tr><td>CPLUS_INCLUDE_PATH</td><td>等效于g++的<code>-I</code>参数，添加头文件的包含路径</td></tr><tr><td>LD_LIBRARY_PATH</td><td>增加动态链接库的路径</td></tr><tr><td>LIBRARY_PATH</td><td>增加静态链接库的路径</td></tr></tbody></table><p>使用<code>env</code>命令查看所有环境变量，没有发现上面这几个变量，所以这些变量全部为空。一般系统默认也不会设置这些东西。也就是说，env.sh脚本里的那种用冒号拼接的写法，同时包含了两个路径：当前路径（也就是zsim-nvmain文件夹）和HDF5的路径。</p><p>那么，在zsim-nvmain里当然可以找到<code>src</code>文件夹了。</p><p>解决的方法有很多，只要让GCC能够找到指定的头文件即可。我上面选择了最省事的方法，直接添加一个头文件的搜索路径。<strong>但我不认为这是一种好的做法</strong>，原因在于NVMain文件夹里也有一个<code>src</code>。<strong>随便乱包含头文件的路径的话，很容易导致重名的头文件发生冲突。而GCC对此是不会报错的，它只会使用它最先搜索到的那个文件</strong>。观察ZSim其它文件，加了<code>src</code>前缀的include都是NVMain的头文件。</p><img src="/2021/10/18/arch/mem_zsim_hscc/tlb_src_position.png" class="" alt="tlb_src_position"><h3 id="echo错误"><a href="#echo错误" class="headerlink" title="echo错误"></a>echo错误</h3><p><strong>先说结论：<code>echo</code>是一把双刃剑。如果只是单纯输出字符串和变量，那么<code>echo</code>很方便。但凡是要输出更复杂的东西，比如反斜杠转义，建议用<code>printf</code>命令。因为不同版本的<code>echo</code>命令行为不同。</strong></p><p>之前在编译HSCC的时候，出现过一个错误：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">In file included from build/opt/zsim_harness.cpp:47:0:<br>build/opt/version.h:1:4: error: stray ‘#’ in program<br> -e #define ZSIM_BUILDDATE &quot;Sun Jan 10 15:23:15 CST 2021&quot;<br>    ^<br>build/opt/version.h:1:1: error: expected unqualified-id before ‘-’ token<br> -e #define ZSIM_BUILDDATE &quot;Sun Jan 10 15:23:15 CST 2021&quot;<br> 
^<br></code></pre></td></tr></table></figure><p>Looking at the generated <code>build/opt/version.h</code>, there is indeed an extra <code>-e</code> at the beginning:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs c">-e <span class="hljs-meta">#<span class="hljs-keyword">define</span> ZSIM_BUILDDATE <span class="hljs-string">&quot;Sun Jan 10 15:23:15 CST 2021&quot;</span></span><br><span class="hljs-meta">#<span class="hljs-keyword">define</span> ZSIM_BUILDVERSION <span class="hljs-string">&quot;no git repo&quot;</span></span><br></code></pre></td></tr></table></figure><p>I believe this happens because SCons runs its build commands through <code>/bin/sh</code> (much like C's <code>system()</code> function), and on Ubuntu <code>/bin/sh</code> is <code>dash</code> rather than the <code>bash</code> we normally use. dash's built-in <code>echo</code> does not recognize the <code>-e</code> option, so <code>echo -e hello</code> prints <code>-e hello</code> under dash but just <code>hello</code> under bash.</p><blockquote><p>The echo in axle-zsim-nvmain has no <code>-e</code>, and the latest ZSim has switched to <code>printf</code>, so I don't know why HSCC has this <code>-e</code>.</p></blockquote>]]></content>
    
    
    <categories>
      
      <category>体系结构</category>
      
    </categories>
    
    
  </entry>
  
  
  
  <entry>
    <title>Building the ZSim/NVMain Simulator (AXLE-ZSIM-NVMAIN)</title>
    <link href="/2021/10/15/arch/mem_zsim_axle/"/>
    <url>/2021/10/15/arch/mem_zsim_axle/</url>
    
    <content type="html"><![CDATA[<p>The tutorials I found online for building the ZSim&#x2F;NVMain hybrid memory simulator fall roughly into two groups. One uses the original ZSim, known as <a href="https://github.com/AXLEproject/axle-zsim-nvmain">axle-zsim-nvmain</a>. The other uses a version of ZSim extended by Huazhong University of Science and Technology, known as <a href="https://github.com/CGCL-codes/HSCC">HSCC</a> or <a href="https://github.com/cyjseagull/SHMA">SHMA</a>.</p><p>I have tried both myself, ran into quite a few problems, and investigated some of them; I record them here. For the problems I could not solve, I hope readers can provide answers.</p><p><strong>This post builds the original ZSim&#x2F;NVMain, i.e. axle-zsim-nvmain</strong>. For the HSCC build process, see my other blog post.</p><h2 id="环境"><a href="#环境" class="headerlink" title="环境"></a>Environment</h2><p>Operating system: <strong>Ubuntu 12.04 amd64 (VMWare)</strong></p><p>Kernel version: 3.13.0-32-generic</p><p>Compiler version: <strong>GCC 4.6.3</strong></p><p>Simulator GitHub repository: <a href="https://github.com/AXLEproject/axle-zsim-nvmain">https://github.com/AXLEproject/axle-zsim-nvmain</a></p><h2 id="编译模拟器"><a href="#编译模拟器" class="headerlink" title="编译模拟器"></a>Building the simulator</h2><h3 id="下载代码和依赖库"><a href="#下载代码和依赖库" class="headerlink" title="下载代码和依赖库"></a>Downloading the code and dependencies</h3><p>First, clone the code from the GitHub repository. The repository's <a href="https://github.com/AXLEproject/axle-zsim-nvmain/blob/master/README.md">README</a> and <a href="https://github.com/AXLEproject/axle-zsim-nvmain/blob/master/install.sh">install.sh</a> describe the steps in detail. Based on them, my own build went as follows:</p><ol><li><p>Download Intel PinTool and extract it. axle-zsim-nvmain uses Pin 2.13, but the official site no longer offers it for download, so I downloaded Pin 2.14.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">wget http://software.intel.com/sites/landingpage/pintool/downloads/pin-2.14-71313-gcc.4.4.7-linux.tar.gz<br></code></pre></td></tr></table></figure></li><li><p>Download the NVMain code. install.sh uses a Bitbucket link, but that repository no longer exists, so I used the NVMain repository on GitHub: <a href="https://github.com/SEAL-UCSB/NVmain">https://github.com/SEAL-UCSB/NVmain</a></p></li><li><p>Install the dependency packages.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">sudo apt-get install libelfg0-dev libhdf5-serial-dev 
scons libconfig++-dev libboost-regex-dev g++<br></code></pre></td></tr></table></figure></li><li><p>Set the environment variables. Three of them are needed:</p><ol><li><code>PINPATH</code>: the folder where PinTool was extracted</li><li><code>NVMAINPATH</code>: the folder containing NVMain</li><li><code>ZSIMPATH</code>: the folder containing SConstruct, i.e. the axle-zsim-nvmain directory</li></ol><p>Since I extracted both Pin and NVMain inside the axle folder, I simply created an env.sh script in the axle folder:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs bash">BASEDIR=$(<span class="hljs-built_in">pwd</span>)<br><br>PINPATH=<span class="hljs-variable">$BASEDIR</span>/pintool<br>NVMAINPATH=<span class="hljs-variable">$BASEDIR</span>/nvmain<br>ZSIMPATH=<span class="hljs-variable">$BASEDIR</span><br><br><span class="hljs-built_in">export</span> ZSIMPATH PINPATH NVMAINPATH<br></code></pre></td></tr></table></figure><p>Then run <code>source env.sh</code> on the command line. Because the script uses <code>$(pwd)</code>, source it from the axle-zsim-nvmain directory.</p></li><li><p>At line 39 of SConstruct, ZSim tries to read the C&#x2F;C++ compiler names from your environment variables.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs python">env[<span class="hljs-string">&#x27;CXX&#x27;</span>] = os.environ[<span class="hljs-string">&quot;CXX&quot;</span>]<br>env[<span class="hljs-string">&#x27;CC&#x27;</span>] = os.environ[<span class="hljs-string">&quot;CC&quot;</span>]<br></code></pre></td></tr></table></figure><p>Ubuntu does not define these two variables by default, so the build fails at this point (<code>os.environ</code> raises a <code>KeyError</code>). There are two fixes: set the two environment variables, or modify SConstruct to force GCC. I chose the first and added this to env.sh:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-built_in">export</span> CXX=g++ 
CC=gcc<br></code></pre></td></tr></table></figure><p>Then run <code>source env.sh</code> again.</p></li><li><p>At line 168 of SConstruct, ZSim reads the location of the Boost library from an environment variable. But my libboost-regex was installed into the system library directories through apt, so this step is unnecessary. There are two fixes: set the BOOST environment variable to a path that does not exist, or modify SConstruct. I chose the second and commented out lines 168-170.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs python"><span class="hljs-comment"># Boost regex</span><br><span class="hljs-comment"># BOOST = os.environ[&quot;BOOST&quot;]</span><br><span class="hljs-comment"># env[&quot;CPPPATH&quot;] += [BOOST]</span><br><span class="hljs-comment"># env[&quot;LIBPATH&quot;] += [joinpath(os.environ[&#x27;BOOST&#x27;], &quot;stage/lib&quot;)]</span><br>env[<span class="hljs-string">&quot;LIBS&quot;</span>] += [<span class="hljs-string">&quot;boost_regex&quot;</span>]<br></code></pre></td></tr></table></figure></li><li><p>Line 36 of nvmain&#x2F;SConscript has a gem5-related import. It is meant for the gem5 simulator, but we are using ZSim.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs python"><span class="hljs-keyword">from</span> os.path <span class="hljs-keyword">import</span> basename<br><span class="hljs-comment"># from gem5_scons import Transform</span><br></code></pre></td></tr></table></figure><p>Following the common practice online, I commented it out.</p></li></ol><h3 id="兼容Pin-2-14"><a href="#兼容Pin-2-14" class="headerlink" title="兼容Pin 2.14"></a>Pin 2.14 compatibility</h3><p>axle-zsim-nvmain is not based on the latest ZSim, so building it with Pin 2.14 runs into some problems. The latest ZSim changed SConstruct to support Pin 2.14, but unfortunately axle-zsim-nvmain never picked up those changes. So I used the meld tool to copy some of them over. For the exact changes, see <a href="https://gitee.com/YalandHong/axle-zsim-nvmain/commit/790af103a7b80b8f1e8860e127306bd79d5e04d1">my commit on Gitee</a>.</p><p>Specifically, there are two problems:</p><ol><li><p>Pin 
2.13's <code>extras/xed2-intel64</code> was renamed to <code>extras/xed-intel64</code> in Pin 2.14, so every place in SConstruct that uses this path must be changed. Otherwise the build cannot find the corresponding headers:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">In file included from pintool/source/include/pin/pin.H:43:0,<br>                 from build/opt/decoder.h:31,<br>                 from build/opt/core.h:30,<br>                 from build/opt/ooo_core.h:32,<br>                 from build/opt/contention_sim.cpp:35:<br>pintool/source/include/pin/level_base.PLH:83:29: fatal error: xed-iclass-enum.h: No such file or directory<br>compilation terminated.<br></code></pre></td></tr></table></figure></li><li><p>Pin 2.13's <code>intel64/lib-ext</code> contains <code>libdwarf.a</code> and <code>libdwarf.so</code>, but Pin 2.14 has only <code>libpindwarf.a</code>. As a result, the build cannot find the library:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">scons: *** [build/opt/libzsim.so] Implicit dependency `pintool/intel64/lib-ext/libdwarf.a&#x27; not found, needed by target `build/opt/libzsim.so&#x27;.<br></code></pre></td></tr></table></figure></li></ol><h3 id="编译代码"><a href="#编译代码" class="headerlink" title="编译代码"></a>Compiling the code</h3><p>Run scons in the axle-zsim-nvmain directory to build. The <code>-j</code> option sets the number of parallel build jobs to speed up compilation. When the build finishes, it produces two files, <code>zsim</code> and <code>libzsim.so</code>, in the build&#x2F;opt folder.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">scons 
-j2<br></code></pre></td></tr></table></figure><p>A build made this way is compiled with -O3 optimization. For a debug build, add the <code>--d</code> option; the resulting binaries go into build&#x2F;debug and do not interfere with build&#x2F;opt.</p><p>Checking the shared-library dependencies of <code>libzsim.so</code> with <code>ldd -r</code> shows no undefined symbols, and every dependency resolves to a system library.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">$ ldd -r build/opt/libzsim.so<br>linux-vdso.so.1 =&gt;  (0x00007fff53ffe000)<br>libconfig++.so.8 =&gt; /usr/lib/libconfig++.so.8 (0x00007f753e289000)<br>libboost_regex.so.1.46.1 =&gt; /usr/lib/libboost_regex.so.1.46.1 (0x00007f753df87000)<br>libelf.so.0 =&gt; /usr/lib/libelf.so.0 (0x00007f753dd6e000)<br>....<br>libhdf5.so.6 =&gt; /usr/lib/libhdf5.so.6 (0x00007f753d3c6000)<br>libhdf5_hl.so.6 =&gt; /usr/lib/libhdf5_hl.so.6 (0x00007f753d194000)<br>....<br></code></pre></td></tr></table></figure><h2 id="运行模拟器"><a href="#运行模拟器" class="headerlink" title="运行模拟器"></a>Running the simulator</h2><h3 id="修改系统配置"><a href="#修改系统配置" class="headerlink" title="修改系统配置"></a>Changing system settings</h3><p>As the ZSim README explains, a few kernel parameters need to be changed. Run the following commands:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs bash">sudo sysctl -w kernel.shmmax=1073741824<br>sudo sysctl -w kernel.yama.ptrace_scope=0<br></code></pre></td></tr></table></figure><h3 id="运行ZSim"><a href="#运行ZSim" class="headerlink" title="运行ZSim"></a>Running ZSim</h3><p>The usage is <code>zsim &lt;config&gt;</code>. For example, to run the NVM configuration file that ships with axle:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">./build/opt/zsim 
./tests/AXLE-sandy-nvm.cfg<br></code></pre></td></tr></table></figure><p>This configuration file has two workloads, the <code>ls</code> and <code>cat</code> commands:</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs json"><span class="hljs-comment">// Populate process entries with scripts</span><br><span class="hljs-comment">// Simple example with 2 processes given</span><br>process0 = <span class="hljs-punctuation">&#123;</span><br>    command = <span class="hljs-string">&quot;ls -alh --color tests/&quot;</span>;<br><span class="hljs-punctuation">&#125;</span>;<br><br>process1 = <span class="hljs-punctuation">&#123;</span><br>    command = <span class="hljs-string">&quot;cat tests/simple.cfg&quot;</span>;<br><span class="hljs-punctuation">&#125;</span><br></code></pre></td></tr></table></figure><p>The command strings are written much like shell commands. The commands above use relative paths, so if zsim is run from a different directory, it may not find the files. Commands support environment variables, so I wrote mine like this:</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs json">process0 = <span class="hljs-punctuation">&#123;</span><br>    command = <span class="hljs-string">&quot;ls -alh --color $ZSIMPATH/tests/&quot;</span>;<br><span class="hljs-punctuation">&#125;</span>;<br></code></pre></td></tr></table></figure><p>The figure below shows part of the output. The cat output is too long, so the screenshot shows only its last part, plus the complete ls output. cat prints the contents of simple.cfg, and ls lists everything in the tests folder. Then both children report done and the simulator exits.</p><img src="/2021/10/15/arch/mem_zsim_axle/zsim_axle_result.png" class="" alt="zsim_axle_result"><h3 id="统计输出"><a href="#统计输出" class="headerlink" 
title="统计输出"></a>Statistics output</h3><p>After the simulator finishes, a set of statistics files appears in the current directory.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">$ ls<br>heartbeat         out.cfg      zsim-ev.h5  zsim.log.0  zsim.out<br>mem-0-nvmain.out  zsim-cmp.h5  zsim.h5     zsim.log.1<br></code></pre></td></tr></table></figure><p>Take zsim.out as an example. It contains statistics for each CPU core, including the number of cycles and instructions executed. It also includes hit statistics for each cache level, the number of instructions executed by each process, and statistics for the NVMain memory controller.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">......<br>sandy: # Core stats<br>  sandy-0: # Core stats<br>   cycles: 1492524 # Simulated unhalted cycles<br>   cCycles: 1122301 # Cycles due to contention stalls<br>   instrs: 212977 # Simulated instructions<br>   uops: 227667 # Retired micro-ops<br>   bbls: 46942 # Basic blocks<br><br>......<br><br>mem: # Memory controller stats<br>  mem-0: # Memory controller stats<br>   issued: 12804 # Issued requests<br>   rd: 12804 # Read requests<br>   wr: 0 # Write requests<br>   PUTS: 0 # Clean Evictions (from lower level)<br>   PUTX: 0 # Dirty Evictions (from lower 
level)<br>......<br></code></pre></td></tr></table></figure><p>As for the other files, the ones I know about so far are:</p><ol><li>out.cfg: the complete ZSim configuration used for this run</li><li>zsim.log.0 and zsim.log.1: the output logs of the individual processes</li><li>mem-0-nvmain.out: NVMain's detailed statistics</li><li>the .h5 files: statistics stored in binary HDF5 format, which can be read and analyzed with tools such as h5py or pandas.</li></ol><h2 id="问题与思考"><a href="#问题与思考" class="headerlink" title="问题与思考"></a>Problems and reflections</h2><p>While building ZSim&#x2F;NVMain, I ran into many problems and looked into the meaning of quite a few commands, because I wanted to understand why this simulator is so complicated to build. I record them here in the hope of clearing up some readers' confusion.</p><h3 id="新版本ZSim的一些特点"><a href="#新版本ZSim的一些特点" class="headerlink" title="新版本ZSim的一些特点"></a>How newer ZSim differs</h3><p>axle-zsim-nvmain is not based on the latest ZSim, which differs in several ways. So far I have found:</p><ol><li>No dependency on the Boost library.</li><li>Full support for Pin 2.14.</li><li>NULL replaced with nullptr.</li><li>Several new files (access_tracing, parse_vdso, etc.), and substantial changes to zsim.cpp&#x2F;init.cpp&#x2F;zsim.h and other code.</li></ol><p>I don't know whether NVMain is still compatible with the latest ZSim, because the <a href="https://github.com/SEAL-UCSB/NVmain">official NVMain repository</a> only provides a link to axle-zsim-nvmain and does not explain how to patch ZSim.</p><h3 id="PinTool对于系统和编译器版本的限制"><a href="#PinTool对于系统和编译器版本的限制" class="headerlink" title="PinTool对于系统和编译器版本的限制"></a>PinTool's constraints on system and compiler versions</h3><p><a href="https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html">GCC documentation: GXX ABI VERSION</a></p><p><a href="https://gcc.gnu.org/projects/cxx-status.html#cxx11">GCC documentation: C++11 Support in GCC</a></p><p><a href="https://github.com/wangziqi2013/zsim-base/issues/4">What is the last known working environment for the zsim in this repository?</a></p><p>PinTool is not fully open source: it ships many precompiled files, which ZSim can only use through static or dynamic linking. This means you must use a matching GCC version; otherwise the symbol tables and similar details of the compiled code will not match, and linking will fail.</p><p><code>pin_kit/source/include/pin/gen/cc_used_ia32_l.CVH</code> contains the following definitions, which show that Pin 2.13 was built with GCC 4.4:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs c"><span class="hljs-meta">#<span class="hljs-keyword">define</span> CC_USED__ 4</span><br><span class="hljs-meta">#<span class="hljs-keyword">define</span> CC_USED_MINOR__ 4</span><br><span 
class="hljs-meta">#<span class="hljs-keyword">define</span> CC_USED_PATCHLEVEL__ 7</span><br><span class="hljs-meta">#<span class="hljs-keyword">define</span> CC_USED_ABI_VERSION 1002</span><br></code></pre></td></tr></table></figure><p>In <code>pin_kit/source/include/pin/compiler_version_check2.H</code>, <code>CC_USED_ABI_VERSION</code> must match GCC's built-in <code>__GXX_ABI_VERSION</code>; if it does not, the check fails with an <code>#error</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-meta">#<span class="hljs-keyword">elif</span> CC_USED_ABI_VERSION == 1002</span><br><br><span class="hljs-meta">#<span class="hljs-keyword">if</span> CC_USED_ABI_VERSION != __GXX_ABI_VERSION</span><br><span class="hljs-meta">#<span class="hljs-keyword">error</span> This kit requires gcc 3.4 or later</span><br><span class="hljs-meta">#<span class="hljs-keyword">endif</span></span><br><br><span class="hljs-meta">#<span class="hljs-keyword">else</span></span><br></code></pre></td></tr></table></figure><p>Only GCC 3.4 and GCC 4.x use ABI version 1002 by default.</p><p>GCC 3.4 will not work, because ZSim uses the C++0x standard. So we need at least GCC 4.6, which is also the version the ZSim authors use.</p><p>In short, since I am an Ubuntu user and do not want to specify the ABI version myself, my only options are <strong>Ubuntu 12.04</strong> (GCC 4.6) or <strong>Ubuntu 14.04</strong> (GCC 4.8).</p><p>Ubuntu 16.04 ships GCC 5, so it cannot be used directly. You can install GCC 4.8 manually, but the various third-party libraries on the system may still have been built with GCC 5. In that case I cannot guarantee that ZSim will work correctly.</p><h3 id="ptrace与args-push-back-“child”"><a href="#ptrace与args-push-back-“child”" class="headerlink" title="ptrace与args.push_back(“child”)"></a>ptrace and args.push_back("child")</h3><p><a href="https://github.com/s5z/zsim#readme">ZSim README</a></p><p><a href="https://github.com/s5z/zsim/issues/109#issuecomment-471288950">Pin 3.0 Compilation · Issue #109 · s5z&#x2F;zsim</a></p><p>ZSim is a user 
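level simulator. It uses PinTool to inject the simulator's code into the workload processes, which relies on <code>ptrace</code>.</p><p>For security, unless you are root, Ubuntu by default (through the Yama module, <code>kernel.yama.ptrace_scope=1</code>) only allows a process to trace its own descendants. So you need <code>sysctl</code> to lift this restriction and let any process inject code, as the ZSim README already explains.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">sudo sysctl -w kernel.yama.ptrace_scope=0<br></code></pre></td></tr></table></figure><p>However, most blogs online take a different approach: the <code>-injection child</code> option, which makes Pin create the workload as a child process. This does let the simulator run, but since it is not the official ZSim approach, it breaks some of ZSim's error reporting. Sometimes I saw the console report that the run had finished:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">[H] Child 4204 done<br></code></pre></td></tr></table></figure><p>Yet when I checked the workload's results, it had actually failed.</p><p>In other words, in this mode ZSim is no longer the parent of the workload, so it cannot obtain the workload's exit status directly. You should check the workload's own output to decide whether it ran correctly.</p><h3 id="shmmax"><a href="#shmmax" class="headerlink" title="shmmax"></a>shmmax</h3><p>The data ZSim injects into the workload processes lives in a shared-memory segment, so when ZSim runs several workloads concurrently, it does not need to inject it repeatedly.</p><p>On Ubuntu 12.04, the default shmmax is only 32 MB (33554432 bytes).</p> <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs bash">$ <span class="hljs-built_in">cat</span> /proc/sys/kernel/shmmax<br>33554432<br></code></pre></td></tr></table></figure><p>If this value is smaller than the segment size set by <code>gmMBytes</code> in the ZSim configuration file, you get the following error:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs plaintext">[H] Creating global segment, 1024 MBs<br>gm_create failed shmget: Invalid argument<br></code></pre></td></tr></table></figure><p><code>sysctl</code> can raise shmmax; the value is in bytes. I set it to 1 GiB (1073741824 bytes), enough for the 1024 MB segment above.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs bash">sudo sysctl -w 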
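kernel.shmmax=1073741824<br></code></pre></td></tr></table></figure><p>On newer systems, shmmax is effectively unlimited by default (the kernel default was raised in Linux 3.16), so there is no need to adjust it yourself.</p>]]></content>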
    
    
    <categories>
      
      <category>体系结构</category>
      
    </categories>
    
    
  </entry>
  
  
  
  
</search>