#12 Quote escaping doesn't work in variable value

Closed
opened 1 year ago by karen-arutyunov · 2 comments

While trying to link our build2 project with libpkgconf 1.5.4, we noticed that the variable value parsing logic has changed since libpkgconf 1.4.2.

Previously, the value representation was placed into pkgconf_tuple_t::value as is, with only the leading and trailing spaces stripped.

Now the logic is as follows: if the value starts with a single or double quote character, the parser also removes all occurrences of that character, unless an occurrence is escaped with a backslash.

So, in particular, the variable values

var1="A" 'B'
var2="A\" \'B'

should end up with the following internal representations:

A 'B'
A" \'B'

Is my understanding correct?

Also, while examining the dequote() function in tuple.c to understand the latest parsing logic, I noticed that it is broken: the if-clause

{
  i++;
  *bptr++ = *i;
}

can never be executed, as the corresponding if-condition is always false. So the var2 variable from the above example ends up with the value A\ \'B'.

I think the compound if-construct should be something like:

if (!quote && (*i == '\'' || *i == '"'))
  quote = *i;
else if (*i == '\\' && quote && *(i + 1) == quote)
{
  i++;
  *bptr++ = *i;
}
else if (*i != quote)
  *bptr++ = *i;
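
To check the suggestion against the examples above, here is a minimal, self-contained sketch that wraps the proposed if-construct in a loop. The function name dequote_sketch and the buffer handling are assumptions made for illustration; they are not taken from tuple.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the library function; only the if-construct
   inside the loop comes from the suggestion above. */
static char *
dequote_sketch(const char *value)
{
  assert(value != NULL);

  /* The result is never longer than the input. */
  char *buf = calloc(strlen(value) + 1, 1);
  char *bptr = buf;
  const char *i;
  char quote = 0;

  if (buf == NULL)
    return NULL;

  for (i = value; *i != '\0'; i++)
  {
    if (!quote && (*i == '\'' || *i == '"'))
      quote = *i;                 /* remember the first quote character seen */
    else if (*i == '\\' && quote && *(i + 1) == quote)
    {
      i++;                        /* drop the backslash, keep the quote */
      *bptr++ = *i;
    }
    else if (*i != quote)
      *bptr++ = *i;               /* copy everything except unescaped quotes */
  }

  return buf;                     /* caller frees */
}
```

With this sketch, the value `"A" 'B'` yields `A 'B'` and the value `"A\" \'B'` yields `A" \'B'`, matching the representations listed above.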
karen-arutyunov commented 1 year ago
Poster

Just realized that the dequoting logic differs from what I described in the issue description. The quote character that gets removed from the value does not have to be the first character, so

var=X "A" 'B'

will result in the following value:

X A 'B'

This is quite unexpected, I must say.

As I understand it, the only purpose of quoting a variable value is to preserve leading and trailing spaces. If so, shouldn't the dequote() function look like this:

static char *
dequote(const char *value)
{
  char *buf = calloc((strlen(value) + 1) * 2, 1);
  char *bptr = buf;
  const char *i;
  char quote = 0;

  if (*value == '\'' || *value == '"')
    quote = *value;

  for (i = value; *i != '\0'; i++)
  {
    if (*i == '\\' && quote && *(i + 1) == quote)
    {
      i++;
      *bptr++ = *i;
    }
    else if (*i != quote)
      *bptr++ = *i;
  }

  return buf;
}

Btw, is there any doc that describes the encoding rules for different contexts?

karen-arutyunov commented 1 year ago
Poster

Or maybe the proper logic for the dequote() function should be just unwrapping the value, removing the leading and trailing quotes and keeping the rest intact? So, for example, the variable

var=" A "B" 'C' "

would end up with the value

 A "B" 'C' 
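
A minimal sketch of this unwrapping variant, for comparison. The name unwrap_quotes is hypothetical, and the choice to strip only when the first and last characters are the same quote character is an assumption; nothing here is taken from libpkgconf.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: strip one matching pair of leading/trailing quotes
   and keep everything in between untouched. */
static char *
unwrap_quotes(const char *value)
{
  assert(value != NULL);

  size_t len = strlen(value);
  const char *start = value;

  /* Assumption: unwrap only if the value both starts and ends with the
     same quote character. */
  if (len >= 2 &&
      (value[0] == '\'' || value[0] == '"') &&
      value[len - 1] == value[0])
  {
    start = value + 1;
    len -= 2;
  }

  char *buf = malloc(len + 1);
  if (buf == NULL)
    return NULL;

  memcpy(buf, start, len);
  buf[len] = '\0';

  return buf;                     /* caller frees */
}
```

Under these assumptions the example above keeps its inner quotes and its leading and trailing spaces, while a value that does not start with a quote is returned unchanged.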