In case you did not know, C++ s....

by Manu (modified: 2008 Jun 22)

Last week, I was wrapping some C++ code for a customer and was surprised that my code did not work properly. I'm not a C++ programmer, so this error might be a typical gotcha, but it was still very surprising behavior.

The code in question was:

void my_routine (CppClass my_obj, int buffer_count, char * buffer)
{
    const char * l_cstr = my_obj->getName().c_str();
    memcpy(buffer, l_cstr, buffer_count);
}

The line that did not work was the `memcpy` call. I have only seen it fail with Visual Studio 2005 on 64-bit.

I have a workaround, and a friend of mine gave me an explanation of the problem. But do you know what it is? And if you do, do you think the rule is safe?

Comments
  • Peter Gummer (23/6/2008)

    You've got me, Manu. I can't see anything wrong, other than the obvious, typical memcpy() problem: we have no way of knowing whether buffer really has room for buffer_count bytes, so we cannot rule out a buffer overflow.

    Maybe some byte-alignment problem on certain platforms?

    I look forward to seeing the answer to this mind-twister! Who needs Sudoku when we have C++?

  • Steven Wurster (23/6/2008)

    Well, aside from Peter's point about the actual size of the buffer, there is also the issue of not terminating the buffer. The platform may assign more space to the buffer than was allocated (for alignment purposes), and the contents of that extra space are undefined. If you do not terminate the buffer at position buffer_count, the contents of that space could be interpreted (most likely incorrectly) by the caller or whoever else reads it.

    Also, don't forget that buffer_count within the routine should be one larger than my_obj->getName().size(), and the buffer must be at least that large: you need room for the terminator. I'm also assuming that you are working with char-based strings here, not wide characters. For all I know, you could be trying to copy a wide string into a char*, in which case the buffer would need to be even larger than I noted above.

    On a side note, if all you want to do is copy the contents of a std::string, please put it into another std::string. Don't use char* if you don't have to. If you're working with an API that requires the use of const char*, then just pass the result of the c_str() function to that API. Don't bother with the annoyances of C-based memory management and C-based arrays if you can avoid it.

  • Manu (24/6/2008)

    Not the answer yet, but here is the code that works:

    void my_routine (CppClass my_obj, int buffer_count, char * buffer)
    {
        std::string l_str = my_obj->getName();
        const char * l_cstr = l_str.c_str();
        memcpy(buffer, l_cstr, buffer_count);
    }

    Go figure!

    • Simon Hudon (24/6/2008)

      It seems that getName creates a new string object and returns it by value. That object is a temporary, created in my_routine for the sole purpose of evaluating the expression my_obj->getName().c_str().

      At the end of that full expression (the statement that initializes l_cstr), the temporary is destroyed. As I recall, c_str returns a pointer to the internal representation of the string object (probably for efficiency reasons, as with most design rationales in C/C++).

      When the string object is destroyed, its internal representation is freed. The pointer passed to memcpy is therefore nothing but a dangling pointer, which explains the erratic behavior you describe.

      Storing the string in a local variable keeps it, and therefore its internal representation, alive until the end of my_routine, so memcpy can do its work properly.

      Am I close?

      Simon

      • Manu (24/6/2008)

        Yes, that's exactly what it is. I wonder how many C++ programmers know about such subtle semantics.

        Surprisingly, today someone else posted another C++ oddity here.

  • Peter Gummer (25/6/2008)

    Ah ha, 99, very clever!

    It's the oldest trick in the book, 99: deterministic finalization.

    Deterministic finalization, Max?

    Yes, 99, they used to use it in the old days. People even used to like it back then; some people even complained when they took deterministic finalization away and gave everyone garbage collection instead. Can you believe it?