VCC/CL: supported standard? (example with universal character names)

Jul 21, 2009 at 5:02 PM
Edited Jul 21, 2009 at 5:10 PM

Hey there,

When I was playing around with VCC/CL and universal character names, I started to come across a couple of questions:
(when I talk about CL, I mean "Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for 80x86", as included in MS Visual Studio 2008 Pro (I think))

1)Which C standard is supported by VCC/CL? Judging from the standards which I could get my hands on, VCC/CL15 support more than C90, but not C99 entirely. Is there some comprehensive document available like [1] for C99?
[1] http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1336.pdf

2) What is the role of CL during the verification using VCC?
Please have a look at the code at the end. I tried to annotate as much as possible, please let me known if you cannot follow my train of thought.
If I understood correctly, the C semantics are (somehow) encoded in VCC. And I understand that VCC might not support everything that CL supports. So, what is CL needed for - or is there some other compiler hidden inside VCC that I can switch of using /nocompiler? In the example, I get VCC errors (with and without /nocompiler) but no CL warnings at some code (marked YY).
Right now, my impression is, that CL supports universal character names according to the standard, but VCC supports it partially.

Regards,
Markus

 

#include <vcc.h>

/* one sentence from the ISO/IEC 9899:201x, C99 draft:
6.4.3 Universal Character Names
...A universal character name shall not specify a character whose short identifier
is less than 00A0 other than UCNs not basic character set 0024 ($), 0040 (@),
or 0060 (‘), nor one in the range D800 through DFFF inclusive. ...
(note: universal character names were introduced with C99)*/

int main(void)
//ensures(result == 0)
{
    //what is marked XX runs through with the vcc-option /nocompiler
    char c;
    wchar_t wc;
    
    /* error C3850: '\u0099': a universal-character-name specifies an invalid
    character*/
    //c = '\u009F'; //not allowed by standard, XX

    c = '\u00A0'; //ok
    c = '\u00A1'; //ok
    c = '\u00FF'; //ok
   
    /*warning C4566: character represented by universal-character-name
    '\uD7FF' cannot be represented in the current code page (1252)
    (because ANSI only to FF)*/
    c = '\u0100'; //XX
    c = '\uD7FF'; //XX


    /* "YY"
    the following code,
    - with VCC: error VC0027: Cannot implicitly convert type 'wchar' to 'unsigned __int16'
      (regardless of the /nocompiler option)
    - with CL: no warning/error (because it is according to the C99 standard */
    wc = L'\u0100';
    //wc = L'\uD7FF';
    //wc = L'\uE000';

    //wc = L'\uD800'; //not allowed by standard, XX, error C3850 again (that's right)

return 1;
}

 

 

Coordinator
Jul 21, 2009 at 6:54 PM

Hi Markus,

(1) there is an interesting entry in the Visual C++ Team Blog which specifies, that not all features of the C99 standard is actually supported by Visual Studio 2008 C/C++ compiler.
VCC also does not support a specific C standard. We are currently implementing all those features that are required by our current case study: the Hyper-V Windows Hypervisor. Only those features that are actually tested in our test suite are well tested and are supported by VCC. There is no concrete list with supported and unsupported language features which would show which parts of which C stardard are actually implemented,.

(2) I not to deep into the internals of the compiler, and maybe Stephan or Michal will correct me, but I think that the CL is mainly used as a pre-processor before VCC starts its work. Internally VCC relies on "CCI - Metadata" and "CCI - Code and AST Components" (also CodePlex projects, which are referenced from the VCC home page). So in general the compilers are independant on each other, since VCC is not concerned with code generation, but only with generating input for Boogie.

Cheers,
Markus 

Coordinator
Jul 22, 2009 at 5:02 AM

Hi,

cl.exe essentially plays two roles for us:

  • We use is as a safeguard against invalid C: if the code does not even compiler with cl, then we do not run it through VCC.
  • We use its preprocessor.

The first aspect has become less and less important as we had to tighten the semantic checks in our compiler to deal with C errors in code that is hidden from cl, e.g., in annotations.

As for the supported C standard, it is exactly as Markus describes it: we implement C language features as they are required for our verification efforts (which is currently mostly the Hyper-V verification). If you need a specific feature, feel free to turn it into a request in the issue tracker.

Best regards,

Stephan

 

 

Coordinator
Jul 22, 2009 at 7:09 AM

Another quick remark: the purpose of the /nocompiler (or /nc for short) switch is to skip the first of the two aspects above; for files that have already been run to the preprocessor (.i files), it is also possible to suppress the preprocessor run using the /n switch.

Stephan

Jul 23, 2009 at 3:40 PM

Thanks for clarifying the role of CL. I was kind of expecting this, because it makes perfectly sense using it as some safeguard and preprocessor.

Regarding my example: when I change wc = L'\u0100'; to wc = (unsigned __int16)L'\u0100'; everything works as intended. I guess I cannot call the requirement of an explicit cast an "issue" :-)

Regards,
Markus

 

Coordinator
Jul 23, 2009 at 3:43 PM
Actually you can, but expect low priority :-)
Developer
Aug 16, 2009 at 1:26 AM

The VCC compiler is deliberately more strict with conversions than the standard requires. The compiler tries complain about every case where there can be an implicit loss of precision (for example assigning a 32-bit integer to a 16-bit integer)  or where bits are implicitly re-intepreted (for example when a signed 16-bit integer is re-intepreted as an unsigned 16-bit integer). Because the types of literal values are ambiguous, the compiler has to jump through hoops to correctly infer the types of expressions that involve literal values. There may still be cases where an expression clearly (to a human reader) results in a value that is within range of the type of the target variable but is inferred by the compiler to result in a range of values that is not a subset of the target variable. If possible, such cases should be fixed.

However, simply adopting the anything goes rules of the standard seems unnecessary for a compiler that emphasizes compile-time correctness checks. Also, arguably, an explicit cast makes it clear to the human reader that the programmer is being deliberate rather than careless.