tag:blogger.com,1999:blog-203063759820106893.post5436797640449651593..comments2023-03-25T08:30:26.602-04:00Comments on A Moment of Zen: Weird bugs due to gcc 4.4 and strict aliasingJeffrey Stedfasthttp://www.blogger.com/profile/12271561115384429651noreply@blogger.comBlogger33125tag:blogger.com,1999:blog-203063759820106893.post-78633149014584029302010-02-04T02:54:18.630-05:002010-02-04T02:54:18.630-05:00prqjkuemdz2dbxtf:
This is the last time, hopefull...prqjkuemdz2dbxtf:<br /><br />This is the last time, hopefully I can persuade you, or at least help others not make the same mistakes.<br /><br />"Indeed - it's an expression (6.5.3)"<br /><br />So 6.5p7 doesn't apply to it, but before you insisted that it did.<br /><br />(Actually, its value might be stored in a temporary object. But that doesn't change anything, because that object is only accessed via its effective type; the key point is yet to come).<br /><br />"Yes it is."<br /><br />No, it's not.<br /><br />"No, there's one. We're referring to it by two (incompatible) types, but there's only ever one pointer."<br /><br />This is really a quibble about the definition of "pointer".<br /><br />Here's the key point:<br /><br />Nowhere in the whole standard does it say that the conversion from one pointer type to another yields the <i>same</i> pointer. Not even direct assignment yields the <i>same</i> pointer - it yields two pointers with the same <i>value</i>.<br /><br />Section 6.7.5.1 (if you read it carefully) defines variables which are declared to be of pointer type to be pointers; i.e. a variable declared as a pointer is a pointer. If I point two of them at the same object via assignment from one to the other then I still have two pointer variables, and therefore two pointers.<br /><br />Assigning via a cast makes no difference; I still have two pointers.<br /><br />"The standard doesn't specify how a pointer type is represented; making any assumption whatsoever about its "bitwise" representation is therefore outside the standard."<br /><br />Yes I know, but that's irrelevant anyway. I had assumed you thought the pointer was the "same" after the cast because it was the same "value", and was trying to explain that the same bitwise representation doesn't mean it's the same pointer. It's irrelevant to the argument though.<br /><br />"No. An "object", as defined in section 3.14"<br /><br />See 6.7.5p2, "Each declarator declares one identifier, and asserts that when an operand of the same form as the declarator appears in an expression, it designates a <i>function or object</i> with the scope, storage duration, and type indicated by the declaration specifiers."<br /><br />The emphasis is mine, but it clearly says that a variable declaration designates a object. Variable assignment, then, stores a value in the object; I'll grant that this is not specifically the same as creating the object - it's the declaration which did that. The point is that the "tail" variable designates its own distinct object. "&list" is NOT the same object even if its value is copied via assignment.<br /><br />"To create a new object we therefore need a new region of data storage; no assignment to any pointer type does this"<br /><br />An assignment can be part of a declaration, and a declaration does allocate a region of data storage. As it happens, the code we are talking about <i>could</i> have been written that way (i.e. declaration and assignment at the same time). My mistake for temporarily forgetting that the declaration and assignment were separate; but it doesn't change the validity of my argument.<br /><br />"That I don't "get" something that is not true doesn't unduly concern me."<br /><br />My point was that you're not getting something that <i>is</i> true.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-71748222961006895162010-02-03T16:38:06.923-05:002010-02-03T16:38:06.923-05:00"if you're going to bandy the C99 standar..."if you're going to bandy the C99 standard about in support of your argument, you'd better read it first."<br /><br />Indeed - it would be rather silly not to. That you feel qualified to suggest that I haven't is disappointing.<br /><br /><br />"If you read it you'll find that "&list" is not an object"<br /><br />Indeed - it's an expression (6.5.3).<br /><br /><br />"But, I repeat, "&list" is never dereferenced"<br /><br />Yes it is.<br /><br />tail = (Node *)&list<br />...<br />tail -> ...<br /><br />The "->" is a dereference of &list. That we have a type cast en route doesn't alter that fact.<br /><br /><br />"They are two separate pointers."<br /><br />No, there's one. We're referring to it by two (incompatible) types, but there's only ever one pointer.<br /><br /><br />"The result of the cast is a value which happens to be the same (if you look at it bitwise) as "&list""<br /><br />The standard doesn't specify how a pointer type is represented; making any assumption whatsoever about its "bitwise" representation is therefore outside the standard. Your statement /might/ be correct for some given C implementation, but there's nothing in the standard that guarantees that. As you suggest, reading the standard is illuminating. (Or not, because it doesn't illuminate what it doesn't specify.)<br /><br /><br />"Assigning the result of the cast to "tail" is what "creates a new object"."<br /><br />No. An "object", as defined in section 3.14 (as you graciously refer me to) is a "region of data storage in the execution environment". To create a new object we therefore need a new region of data storage; no assignment to any pointer type does this. You don't create objects via assignment.<br /><br /><br />"This is what you're not getting."<br /><br />That I don't "get" something that is not true doesn't unduly concern me.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-6906147589687475312010-02-03T02:28:48.573-05:002010-02-03T02:28:48.573-05:00prqjkuemdz2dbxtf: if you're going to bandy the...prqjkuemdz2dbxtf: if you're going to bandy the C99 standard about in support of your argument, you'd better read it first.<br /><br />To quote your earlier post,<br />"Regarding compliance, the relevant section of C99 is indeed 6.5 para 7, but you need to read the whole thing."<br /><br />Too right you do. 6.5p7 concerns access to <i>objects</i>: "An <i>object</i> shall have its stored value accessed only by an lvalue expression that has one of<br />the following types:"<br /><br />As it happens the term <i>object</i> is clearly defined by the standard (in the definitions part, funnily enough). If you read it you'll find that "&list" is not an object (it yields a value which might then be stored in an object, sure, but that's another matter).<br /><br />You say: "The relevant access is the dereference of &list (i.e. a Node **) via a Node *"<br /><br />But, I repeat, "&list" is never dereferenced and in fact "&list" is not an object (check the definition), which means 6.5p7 doesn't apply in the sense you seem to think it does. You can't dereference one pointer "via" another one. They are two separate pointers. We could argue about whether the result of casting a "Node **" to a "Node *" can possibly be a valid pointer but 6.5p7 doesn't say anything about it. The result of the cast is a value which happens to be the same (if you look at it bitwise) as "&list", and that value is then stored in the "tail" variable.<br /><br />Even the GCC devs, who decided to ignore 6.5p7, at least understand what it means - the decision to ignore it was/is a conscious one.<br /><br />In your recent post: "Casting doesn't "create a new object" as you suggest"<br /><br />... alright, fine, because there is no object anyway. Assigning the result of the cast to "tail" is what "creates a new object". This is what you're not getting.<br /><br />Eamon: the code might not be the best practice, but the C99 standard is pretty clear on this issue.<br /><br />mmeeks: Following 6.5p7 doesn't imply you need to disable all type-based alias analysis, not by a long shot. Also, you seem to imply that it's better to generate optimized code than correct code. Your complaint is a (valid) criticism of the C language, but your suggestion that GCC is doing the right thing is plain wrong.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-1878457567029036212010-02-02T15:25:59.865-05:002010-02-02T15:25:59.865-05:00davmac: "The assignment: tail = &list; .....davmac: "The assignment: tail = &list; ... takes the "Node **", casts it to "Node *" (effectively creating a new object) and then copies that new object into "tail".<br />"<br /><br />Casting doesn't "create a new object" as you suggest - it just overrides type-checking (in this case). The fact remains that you're taking a pointer to X (aka Node **), casting it to a pointer to Y (aka Node * - they're type-incompatible; that they're one level of indirection apart is irrelevant), and then dereferencing it via the -> operator. That's undefined.<br /><br />At no point do you actually have a pointer-to-a-Node. You have what you're *claiming* (via the cast) is a pointer-to-a-Node, but it isn't.<br /><br />Eamon Nerbonne: "[the compiler] should be able to detect the fact that you're making an invalid cast". Indeed: the compiler can detect (via type checking) that what you're attempting is illegal. The cast is telling the compiler to shut up, because I know best. Without the cast it won't compile, and neither should it.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-58868887902976190742010-02-01T07:56:17.743-05:002010-02-01T07:56:17.743-05:00Urgh; I'm with gcc on this - the optimisations...Urgh; I'm with gcc on this - the optimisations you can do with good aliasing detection are -so- important - that to turn them off to allow this sort of gross-ness to work would be really silly IMHO etc. :-)<br />Writing code that can be easily read, debugged and understood - turns out to have a bonus: the compiler can understand it too :-)Michael Meekshttps://www.blogger.com/profile/09023233635358497287noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-1437043953716464532010-02-01T04:46:43.329-05:002010-02-01T04:46:43.329-05:00The very fact that this discussion is possible in ...The very fact that this discussion is possible in the first place suggests to me that this code is poor practice; whatever the spec says, code should avoid being confusing for no purpose (and there are alternatives which work just fine).<br /><br />Nevertheless, it's also worrying that gcc isn't warning you about a bug it should trivially be able to detect. Read/write dependencies are presumably standard fare for compilers, and it should be able to detect the fact that you're making an invalid cast and then writing to the result (which, due to strict aliasing, is compile time known to be invalid). A user friendly error message might even take into consideration the shape of the struct (to differentiate between nonsense code and invalidly aliasing code).<br /><br />If a compiler won't even warn you about broken code, it's a broken compiler, IMHO.Eamon Nerbonnehttps://www.blogger.com/profile/00388124191987595398noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-40866918677412503932010-02-01T03:22:13.186-05:002010-02-01T03:22:13.186-05:00prqjkuemdz2dbxtf - you are wrong.
You are correct...prqjkuemdz2dbxtf - you are wrong.<br /><br />You are correct that the declared type of "&list" is "Node **", but you are wrong that the object consisting of that "Node **" is ever dereferenced.<br /><br />The assignment:<br /><br />tail = &list;<br /><br />... takes the "Node **", casts it to "Node *" (effectively creating a new object) and then copies that new object into "tail".Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-75425334064756044062010-01-31T12:25:48.117-05:002010-01-31T12:25:48.117-05:00ben: but i think the cast from Node** to Node* is ...ben: but i think the cast from Node** to Node* is invalid (or the dereferencation of the resulting pointer). so i think gcc is still (and rightfully) thinking that Node** cannot alias Node*flohttps://www.blogger.com/profile/15807164647542203204noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-48238752448378262462010-01-31T08:27:49.236-05:002010-01-31T08:27:49.236-05:00The code does not comply with C99, and therefore t...The code does not comply with C99, and therefore the compiler can do whatever it wants, as per section 3.4.3 and section 4 para 2.<br /><br />Regarding compliance, the relevant section of C99 is indeed 6.5 para 7, but you need to read the whole thing. The relevant access is the dereference of &list (i.e. a Node **) via a Node *. The assignments of next pointers, or anything else involving aggregate types, are irrelevant. It'd be similarly non-conformant if list and tail were int * types, for example.<br /><br />The problem is that &list has type Node **, not Node *, as Josh pointed out. At this point if you dereference the resulting pointer then all bets are off; that it works at -O0 is lucky (or unlucky, depending on your point of view).<br /><br />The important bullet in 6.5p7 is the first one - "a type compatible with the effective type of the object", which we contravene. The remaining clauses (we must only satisfy one of them) do not apply: we aren't using qualifiers (#2), we are using integer types (#3, #4), we aren't using an aggregate type - Node * and Node ** are pointer types, not structs or unions; that they <i>point</i> to aggregate types is irrelevant (#5), and we aren't using character types (#6).<br /><br />Saying "tail = (Node *)&list; ...; tail->next = node" therefore does not comply with 6.5p7, and is therefore not valid C. By 3.4.3 and 4p2, gcc could reformat your hard drive, or anything else that it feels like doing.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-52971010186105771152010-01-31T06:30:19.128-05:002010-01-31T06:30:19.128-05:00> Relevant section of the C99 spec is 6.5 parag...> Relevant section of the C99 spec is 6.5 paragraph 7; values may be accessed via "an aggregate or union type that includes" a type compatible to the declared type<br /><br />I do not understand how that is even relevant, tail->next is an lvalue expression of type Node*, list is a stored object of declared and thus effective type Node*.<br /><br />Aliasing restrictions should not even play into this. And it is guaranteed that the struct's first element is right at the beginning of the struct, so I really see no wiggle room to declare this undefined behaviour.Avery G.https://www.blogger.com/profile/14540976102355409529noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-50832542150820057142010-01-30T22:13:42.657-05:002010-01-30T22:13:42.657-05:00davmac: thanks for the info.
I've submitted a...davmac: thanks for the info.<br /><br />I've submitted a bug report to the gcc devs but they've marked it INVALID :-(<br /><br /><a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42907" rel="nofollow">http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42907</a>Jeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-7906029654505741892010-01-30T21:36:40.511-05:002010-01-30T21:36:40.511-05:00Jeffrey, I'm the author of the article you ref...Jeffrey, I'm the author of the article you reference talking about the GCC interpretation of the aliasing rules. First thing I should point out is that Mysql had a problem because they actually broke the rules, not because of GCC.<br /><br />That being said, according to the standard, your code example here is legal. Those people calling it "broken" simply don't know what they're talking about.<br /><br />Relevant section of the C99 spec is 6.5 paragraph 7; values may be accessed via "an aggregate or union type that includes" a type compatible to the declared type. In this case you're accessing a "Node *" via a pointer to "Node", which is fine because Node is an aggregate type that includes a "Node *" member.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-82070522292498720592010-01-30T21:28:56.749-05:002010-01-30T21:28:56.749-05:00Josh:
Good point, gdb might not be correctly givi...Josh:<br /><br />Good point, gdb might not be correctly giving me what it's really doing there due to optimizing out temporary variables.<br /><br />I'll have to check the asm later.Jeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-31445529465650856602010-01-30T19:58:46.794-05:002010-01-30T19:58:46.794-05:00RE Update 3: You should know that with optimized c...RE Update 3: You should know that with optimized code, gdb is going to be pretty misleading. Think about how the unroll will look with 2 iterations:<br /><br />list = NULL;<br />tail = (Node *) &list;<br />// loop 0<br />node = malloc (sizeof (Node));<br />node->next = NULL;<br />node->value = 0;<br />tail->next = node;<br />tail2 = node;<br />// loop 1<br />node2 = malloc (sizeof (Node));<br />node2->next = NULL;<br />node2->value = 1;<br />tail2->next = node2;<br />tail3 = node;<br />// ending<br />if (list == NULL)<br />printf ("oops, list is null???\n");<br /><br />I've given a new variable name for each new assignment. Because of the strict aliasing, the NULL list is propagated to the "if", which is now constant so it needs no code generated. Also, tail and tail2 are only used in one place, so we can propagate them. And tail3 can be dropped.<br /><br />// loop 0<br />node = malloc (sizeof (Node));<br />node->next = NULL;<br />node->value = 0;<br />((Node *) &list)->next = node;<br />// loop 1<br />node2 = malloc (sizeof (Node));<br />node2->next = NULL;<br />node2->value = 1;<br />node->next = node2;<br />// ending<br />printf ("oops, list is null???\n");<br /><br />With that, you have roughly the sequence gdb showed you. Check out the disassembled code to see what's really going on.Anonymoushttps://www.blogger.com/profile/01617770411705564231noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-63300323371191155162010-01-30T19:11:49.240-05:002010-01-30T19:11:49.240-05:00I agree that Josh's way is better and, I would...I agree that Josh's way is better and, I would argue, canonical for this sort of trick. I'm sure I've seen it before, and it's exactly what I came up with when I decided to work out a fix prior to reading comments here.Jeff Waldenhttps://www.blogger.com/profile/11969143527962555101noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-73391200821244321172010-01-30T18:41:41.760-05:002010-01-30T18:41:41.760-05:00taschenorakel:
It's not my fault you don'...taschenorakel:<br /><br />It's not my fault you don't understand pointers ;-)Jeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-14256025578591332482010-01-30T18:27:54.171-05:002010-01-30T18:27:54.171-05:00Jeffrey, you have a very strange definition of &qu...Jeffrey, you have a very strange definition of "correct code". I'd prefer to never work in the same team with you, cause I'd hate to debug your weird and broken hacks.Mathias Hasselmannhttps://www.blogger.com/profile/02409412298907904842noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-86545965353314736822010-01-30T17:58:10.303-05:002010-01-30T17:58:10.303-05:00Damien:
> And I belive myself that a non-trick...Damien:<br /><br />> And I belive myself that a non-tricky <br />> code is better for maintainability <br />> anyway.<br /><br />That's a fair argument, but it's not the same as saying code is broken ;-)Jeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-66859560637542248852010-01-30T17:52:00.116-05:002010-01-30T17:52:00.116-05:00I understand that you expect it to work just fine....I understand that you expect it to work just fine.<br />But on the other hand modern compilers do apply a lot of rules to the given source code before creating a binary.<br />And I belive myself that a non-tricky code is better for maintainability anyway.<br /><br />In the end, I think that everyone has a point, it's the c99 spec that will decide which behaviour to enforce.Damien Thébaulthttps://www.blogger.com/profile/16745613407696897710noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-27477668404468911212010-01-30T17:34:52.714-05:002010-01-30T17:34:52.714-05:00Damien:
So... the code is bad because it relies o...Damien:<br /><br />So... the code is bad because it relies on assumptions that we can guarantee to be true?<br /><br />Yes, the loop relies on the fact that the first member of Node is the next pointer. However, the next pointer <b><i>is</i></b> the first member. We have control of that.Jeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-43774627895876458552010-01-30T17:06:25.746-05:002010-01-30T17:06:25.746-05:00I disagree, the code IS bad, it uses a trick to wo...I disagree, the code IS bad, it uses a trick to work.<br /><br />Node *list, *node, *tail;<br />list = NULL;<br />tail = (Node *)&list;<br /><br />Here, "tail", which is a pointer, has a value of "&list", which is the address of the pointer "list".<br /><br />So "tail" points to an address where there's only space for a pointer, in itself it's not very safe.<br /><br />Ok, the trick then, in the for loop:<br /><br />tail->next = node;<br /><br />Here, "tail" still points to the "list" pointer, tail->next is the same as "*tail" because "next" is the first field of the structure and it's a similar pointer too.<br /><br />"tail = &list" from the previous pre-for-loop line, so:<br />"tail->next = node" is the same as:<br />"*tail = node", which is the same as:<br />"*(&list) = node", which translates in the end in:<br />"list = node".<br /><br />For me, this is a trick. Any code that need a cast to compile cleanly with "-Wall -Wextra" has a chance to be buggy.<br /><br />This code uses a *trick*, so if it breaks, it's not that odd.<br />I could add that the code is not very explicit so it's easy to make mistakes or forget that you shall always have "next" as the first attribute of the structure.<br /><br />There's two ways to write the same thing without tricks, the first is to modify "list" in the for loop when needed, the second is to use a variable to act like a "first" element.<br /><br />Both ways works with gcc 4.4, don't use any cast and compile with no real warning with "-Wall -Wextra".<br />Agreed, the first adds two tests to the loop, and the second takes a little more memory.<br />I would go for the second one if performance in this code was the bottleneck of my program.<br /><br />==== 1 ====<br /><br />int main (int argc, char **argv)<br />{<br /> Node *list = NULL, *tail = NULL, *node = NULL;<br /> int i;<br /> <br /> for (i = 0; i < 10; i++) {<br /> node = malloc (sizeof (Node));<br /> node->next = NULL;<br /> node->value = i;<br /> <br /> if(tail != NULL)<br /> tail->next = node;<br /> tail = node;<br /> if(list == NULL)<br /> list = tail;<br /> }<br /> <br /> if (list == NULL)<br /> printf ("oops, list is null???\n");<br /> <br /> return 0;<br />}<br /><br />==== 2 ====<br /><br />int main (int argc, char **argv)<br />{<br /> Node first;<br /> Node *list, *node, *tail;<br /> int i;<br /> <br /> first.next = NULL;<br /> tail = &first;<br /> <br /> for (i = 0; i < 10; i++) {<br /> node = malloc (sizeof (Node));<br /> node->next = NULL;<br /> node->value = i;<br /> <br /> tail->next = node;<br /> tail = node;<br /> }<br /> <br /> list = first.next;<br /> <br /> if (list == NULL)<br /> printf ("oops, list is null???\n");<br /> <br /> return 0;<br />}Damien Thébaulthttps://www.blogger.com/profile/16745613407696897710noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-11848371992376623942010-01-30T14:22:26.709-05:002010-01-30T14:22:26.709-05:00Ben:
Odd. If I use -O2, list is NULL after the lo...Ben:<br /><br />Odd. If I use -O2, list is NULL after the loop with gcc 4.4.1 [gcc-4_4-branch revision 150839]<br /><br />In any case, Josh's explanation of the problem is dead on afaict.Jeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-26834484219683224922010-01-30T14:13:03.644-05:002010-01-30T14:13:03.644-05:00Josh: yea, that's probably how I will end up f...Josh: yea, that's probably how I will end up fixing it in the long run. In the short term I've simply added -fno-strict-aliasing to the build.Jeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-86256464265800583912010-01-30T14:11:37.256-05:002010-01-30T14:11:37.256-05:00Richard: see my response to Marcus. My C code is f...Richard: see my response to Marcus. My C code is fine.<br /><br />In fact, if you add the following code before returning 0 in the example program I pasted, you'll see for yourself:<br /><br />printf ("list = ");<br />for (node = list; node; node = node->next)<br /> printf ("%d->", node->value);<br />printf ("NULL\n");<br /><br />prints:<br />list = 0->1->2->3->4->5->6->7->8->9->NULLJeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.comtag:blogger.com,1999:blog-203063759820106893.post-39835897984090338692010-01-30T14:09:05.747-05:002010-01-30T14:09:05.747-05:00Marcus:
You're wrong. The code is not incorre...Marcus:<br /><br />You're wrong. The code is not incorrect. On the first iteration, tail->next points to list.<br /><br />tail->next = node will set the list pointer to node.Jeffrey Stedfasthttps://www.blogger.com/profile/12271561115384429651noreply@blogger.com