Skip to content

Commit

Permalink
Restrict squid.conf preprocessor space characters to SP and HT (#1858)
Browse files Browse the repository at this point in the history
Here, "preprocessor space" refers to a portion of squid.conf grammar
that covers preprocessor directives (e.g., `if` and `include`
statements) and separation of a directive name from directive
parameters. This change replaces all known xisspace() uses in
preprocessor code with code that only recognizes SP and HT as space
characters.

Prior to this change, Squid preprocessor also classified as space ASCII
VT and FF characters as well as any other locale-specific characters
classified as space by isspace(3).

Squid preprocessor still delimits input lines using LF and terminates
each read line at the first CR or LF character (if any), whichever comes
earlier. This behavior precludes CR and LF characters from reaching
individual directive line parsers. However, directive parameter values
loaded by ConfigParser when honoring "quoted filename" or
parameters(filename) syntax are (still) tokenized using w_space which
includes an ASCII CR character. Thus, a "foo_CR_bar" 9-character
sequence is still interpreted as a single "foo_" token by the
preprocessor and as two tokens ("foo_" and "_bar") by
ConfigParser::strtokFile().

After preprocessing, directive parameters are (still) parsed by parsers
specific to that directive type. Many of those parsers are based on
ConfigParser::TokenParse() that effectively uses the same "SP and HT
only" logic for inlined directive parameters. However, some of those
parsers define "space" differently (and may use non-space-like
characters for token delimiters). For example, response_delay_pool still
indirectly uses isspace(3) to parse integer parameters, and the
following "until eol" directives still use xisspace() to skip space
before their parameter value: debug_options, err_html_text,
mail_program, sslcrtd_program, and sslcrtvalidator_program.

More than 100 uses of xisspace() and indirect uses of isspace(3) remain
in Squid code. Most of those use cases are in configuration parsing
code. Restricting all relevant use cases one-by-one may not be
practical. On the other hand, restricting all configuration input lines
to prohibit VT, FF, CR, and locale-specific characters classified as
space by isspace(3) will break rare configs that use those characters in
places like URL paths and regexes.

Due to inconsistencies highlighted above, there is no consensus
regarding this change among Squid developers. This experiment may need
to be reverted or further grammar changes may need to be implemented
based on testing and deployment experience.

Co-authored-by: Amos Jeffries <[email protected]>
  • Loading branch information
2 people authored and squid-anubis committed Jul 3, 2024
1 parent d7952b9 commit 314e430
Show file tree
Hide file tree
Showing 4 changed files with 53 additions and 6 deletions.
23 changes: 18 additions & 5 deletions src/cache_cf.cc
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
#include "auth/Config.h"
#include "auth/Scheme.h"
#include "AuthReg.h"
#include "base/CharacterSet.h"
#include "base/PackableStream.h"
#include "base/RunnersRegistry.h"
#include "cache_cf.h"
Expand Down Expand Up @@ -289,10 +290,22 @@ SetConfigFilename(char const *file_name, bool is_pipe)
cfg_filename = file_name;
}

/// Determines whether the given squid.conf character is a token-delimiting
/// space character according to squid.conf preprocessor grammar. That grammar
/// only recognizes two space characters: ASCII SP and HT. Unlike isspace(3),
/// this function is not sensitive to locale(1) and does not classify LF, VT,
/// FF, and CR characters as token-delimiting space. However, some squid.conf
/// directive-specific parsers still define space based on isspace(3).
static bool
IsSpace(const char ch)
{
return CharacterSet::WSP[ch];
}

static const char*
skip_ws(const char* s)
{
while (xisspace(*s))
while (IsSpace(*s))
++s;

return s;
Expand Down Expand Up @@ -370,7 +383,7 @@ trim_trailing_ws(char* str)
{
assert(str != nullptr);
unsigned i = strlen(str);
while ((i > 0) && xisspace(str[i - 1]))
while ((i > 0) && IsSpace(str[i - 1]))
--i;
str[i] = '\0';
}
Expand All @@ -387,7 +400,7 @@ FindStatement(const char* line, const char* statement)
str += len;
if (*str == '\0')
return str;
else if (xisspace(*str))
else if (IsSpace(*str))
return skip_ws(str);
}

Expand Down Expand Up @@ -498,7 +511,7 @@ parseOneConfigFile(const char *file_name, unsigned int depth)
if (file == token)
continue; /* Not a valid #line directive, may be a comment */

while (*file && xisspace((unsigned char) *file))
while (*file && IsSpace(*file))
++file;

if (*file) {
Expand Down Expand Up @@ -556,7 +569,7 @@ parseOneConfigFile(const char *file_name, unsigned int depth)
fatalf("'else' without 'if'\n");
} else if (if_states.empty() || if_states.back()) { // test last if-statement meaning if present
/* Handle includes here */
if (tmp_line_len >= 9 && strncmp(tmp_line, "include", 7) == 0 && xisspace(tmp_line[7])) {
if (tmp_line_len >= 9 && strncmp(tmp_line, "include", 7) == 0 && IsSpace(tmp_line[7])) {
err_count += parseManyConfigFiles(tmp_line + 8, depth + 1);
} else {
try {
Expand Down
2 changes: 1 addition & 1 deletion src/cf_gen.cc
Original file line number Diff line number Diff line change
Expand Up @@ -649,7 +649,7 @@ gen_parse(const EntryList &head, std::ostream &fout)
"parse_line(char *buff)\n"
"{\n"
"\tchar\t*token;\n"
"\tif ((token = strtok(buff, w_space)) == NULL) \n"
"\tif ((token = strtok(buff, \" \\t\")) == NULL) \n"
"\t\treturn 1;\t/* ignore empty lines */\n"
"\tConfigParser::SetCfgLine(strtok(nullptr, \"\"));\n";

Expand Down
28 changes: 28 additions & 0 deletions test-suite/squidconf/bad-pp-space.conf
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
## Copyright (C) 1996-2024 The Squid Software Foundation and contributors
##
## Squid software is distributed under GPLv2+ license and includes
## contributions from numerous individuals and organizations.
## Please see the COPYING and CONTRIBUTORS files for details.
##

# The following line has leading and trailing SP and HT characters around
# directive name. The preprocessor is supposed to silently strip them.
memory_pools off

# The following line has an ASCII VT character (0x0B) after a recognized
# directive name. VT character is followed by a regular SP character to
# make sure preprocessor can isolate (some) directive name.
http_access deny all

# The following line has UTF-8 No-Break Space character (0xC2A0) after a
# recognized directive name. No SP character follows NBSP to test that
# preprocessor does not isolate recognized directive names based on their
# spelling (it should isolate the name token based on spaces instead).
debug_options ALL,1 2,3

# The following line has UTF-8 Wavy Low Line character (0xEFB98F) instead of
# the first ASCII underscore in what would otherwise be a recognized directive
# name. This test validates that preprocessor does not isolate unrecognized
# directive names based on their spelling (it should isolate the name token
# based on spaces instead).
forward﹏max_tries 13
6 changes: 6 additions & 0 deletions test-suite/squidconf/bad-pp-space.conf.instructions
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
expect-failure FATAL:.*Found.3.unrecognized.directive.*
expect-messages <<END
ERROR: unrecognized directive .*http_access
ERROR: unrecognized directive .*debug_options
ERROR: unrecognized directive .*forward.*max_tries
END

0 comments on commit 314e430

Please sign in to comment.