Pointers in C and C++
Struggling with pointers in C or C++? This tutorial takes a unique approach to explaining pointers.
<hr>
<H2>Pointers</H2>
<i>The most up-to-date copy of this article can always be found at <a href="http://carbon.cudenver.edu/~tgibson/tutorial/">http://carbon.cudenver.edu/~tgibson/tutorial</a></i>
<BR>
<H3>Using Variables</H3>
<P> Essentially, the computer's memory is made up of bytes. Each byte has a number, <A HREF="http://carbon.cudenver.edu/~tgibson/tutorial/addressDef.html">an address</A>, associated with it. The picture below represents several bytes of a computer's memory. In the picture, addresses 924 through 940 are shown. </P>
<P> <IMG width="560" height="100" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonAlone.jpg" ALT="Basic ribbon of memory"> </P>
Try:
<TABLE WIDTH="1">
<TR ALIGN="CENTER"> <TD> <B>C++</B> </TD> <TD> <B>C</B> </TD> </TR>
<TR> <TD> <PRE>
1:#include <iostream>
2:int main()
3:{
4:  float fl=3.14;
5:  std::cout << fl << std::endl;
6:}
</PRE> </TD><TD> <PRE>
1:#include <stdio.h>
2:int main(void)
3:{
4:  float fl=3.14;
5:  printf("%.2f\n", fl);
6:}
</PRE> </TD></TR></TABLE>
At line (4) in the program above, the computer reserves memory for <CODE>fl</CODE>. In our examples, we'll assume that a <CODE>float</CODE> requires 4 bytes. Depending on the computer's architecture, a <CODE>float</CODE> may require 2, 4, 8 or some other number of bytes.
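<P> As a minimal sketch of how to check this size on your own machine, the C code below uses the <CODE>sizeof</CODE> operator and the <CODE>CHAR_BIT</CODE> constant from <CODE>limits.h</CODE>. The function names <CODE>float_size_in_bytes</CODE>, <CODE>float_size_in_bits</CODE> and <CODE>print_float_size</CODE> are made up for this illustration and are not part of the tutorial's examples. The sketch assumes a C99 or later compiler, because of the <CODE>%zu</CODE> format. </P>

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: number of bytes this compiler reserves for one float. */
size_t float_size_in_bytes(void)
{
    return sizeof(float);
}

/* Hypothetical helper: number of ones and zeros (bits) in one float,
   computed as bytes times bits per byte. */
size_t float_size_in_bits(void)
{
    return sizeof(float) * CHAR_BIT;
}

/* Hypothetical helper: print both sizes. Call it from main() to try it. */
void print_float_size(void)
{
    /* A byte holds at least 8 bits on every conforming C compiler. */
    assert(CHAR_BIT >= 8);
    printf("a float uses %zu bytes (%zu bits)\n",
           float_size_in_bytes(), float_size_in_bits());
}
```

<P> Calling <CODE>print_float_size()</CODE> from <CODE>main</CODE> prints the two sizes for your compiler. On a machine that matches the assumption used in this tutorial, the byte count would be 4. </P>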
<P> <IMG width="560" height="129" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonFl.jpg" ALT="Variable fl allocated"> </P> When <CODE>fl</CODE> is used in line (5), two distinct steps occur: <OL> <LI>The program finds and <A HREF="http://carbon.cudenver.edu/~tgibson/tutorial/step1Info.html">grabs the address</A> reserved for <CODE>fl</CODE>--in this example 924.</li> <LI>The contents stored at that address are <A HREF="http://carbon.cudenver.edu/~tgibson/tutorial/step2Info.html">retrieved</A></li> </OL> <B>To generalize, whenever <I>any</I> variable is accessed, the above two distinct steps occur to retrieve the contents of the variable.</B> <TABLE WIDTH="560"><TR BGCOLOR="#D3D3D3"><TD> The illustration that shows 3.14 in the computer's memory can be misleading. Looking at the diagram, it appears that "3" is stored in memory location <CODE>924</CODE>, "." is stored in memory location <CODE>925</CODE>, "1" in <CODE>926</CODE>, and "4" in <CODE>927</CODE>. Keep in mind that the computer actually uses an algorithm to convert the floating point number 3.14 into a set of ones and zeros. Each byte holds 8 ones or zeros. So, our 4 byte <CODE>float</CODE> is stored as 32 ones and zeros (8 per byte times 4 bytes). Regardless of whether the number is 3.14, or -273.15, the number is always stored in 4 bytes as a series of 32 ones and zeros. </TD></TR></TABLE> <P> <H3>Separating the Steps</H3> Two operators are provided that, when used, cause these two steps to occur separately. 
</P>
<TABLE BORDER=1>
<TR> <TH>operator</TH><TH>meaning</TH><TH>example</TH> </TR>
<TR> <TD ALIGN="CENTER"><CODE>&</CODE></TD><TD>do only step 1 on a variable</TD> <TD><CODE>&fl</CODE></TD> </TR>
<TR> <TD ALIGN="CENTER"><CODE>*</CODE></TD><TD>do step 2 on a number (address)</TD> <TD><CODE>*some_num</CODE></TD> </TR>
</TABLE>
<BR>
Try this code to see what prints out:
<TABLE WIDTH="1">
<TR ALIGN="CENTER"> <TD> <B>C++</B> </TD> <TD> <B>C</B> </TD> </TR>
<TR> <TD> <PRE>
1:#include <iostream>
2:int main()
3:{
4:  float fl=3.14;
5:  std::cout << "fl's address=" << <FONT COLOR="GRAY">(unsigned int)</FONT> &fl << std::endl;
6:}
</PRE> </TD><TD> <PRE>
1:#include <stdio.h>
2:int main(void)
3:{
4:  float fl=3.14;
5:  printf("fl's address=%u\n", <FONT COLOR="GRAY">(unsigned int)</FONT> &fl);
6:}
</PRE> </TD></TR></TABLE>
On line (5) of the example, the <CODE>&</CODE> operator is being used on <CODE>fl</CODE>. On line (5), only step 1 is being performed on a variable:
<OL> <LI>The program finds and grabs the address reserved for fl...</li> </OL>
It is <CODE>fl</CODE>'s address that is printed to the screen. If the <CODE>&</CODE> operator had not been placed in front of <CODE>fl</CODE>, then step 2 would have occurred as well, and 3.14 would have been printed to the screen.
<TABLE WIDTH="560"><TR BGCOLOR="#D3D3D3"><TD> The <FONT COLOR="GRAY"><CODE>(unsigned int)</CODE></FONT> phrase will be discussed later. It is there so that <CODE>&fl</CODE> will print out as a non-negative number. It has been shown in gray to indicate that you must include it for the program to compile properly but that it is not relevant to this current discussion. </TD></TR></TABLE>
<P> <HR> Keep in mind that an address is really just a simple number. In fact, we can store an address in an integer variable.
Try this: </P>
<TABLE WIDTH="1">
<TR ALIGN="CENTER"> <TD> <B>C++</B> </TD> <TD> <B>C</B> </TD> </TR>
<TR> <TD> <PRE>
1:#include <iostream>
2:int main()
3:{
4:  float fl=3.14;
5:  unsigned int addr=<FONT COLOR="GRAY">(unsigned int)</FONT> &fl;
6:  std::cout << "fl's address=" << addr << std::endl;
7:}
</PRE> </TD><TD> <PRE>
1:#include <stdio.h>
2:int main(void)
3:{
4:  float fl=3.14;
5:  unsigned int addr=<FONT COLOR="GRAY">(unsigned int)</FONT> &fl;
6:  printf("fl's address=%u\n", addr);
7:}
</PRE> </TD></TR></TABLE>
<P> <IMG width="560" height="210" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonFlAddr1.jpg" ALT="The address of fl is stored in addr"> </P>
The above code shows that there is nothing magical about addresses. They are just simple numbers that can be stored in integer variables.
<TABLE WIDTH="560"><TR BGCOLOR="#D3D3D3"><TD> The <CODE>unsigned</CODE> keyword at the start of line (5) simply means that the integer will not hold negative numbers. As before, the <FONT COLOR="GRAY"><CODE>(unsigned int)</CODE></FONT> phrase has been shown in gray. It must be included for the code to compile, but is not relevant to this discussion. It will be discussed later.
</TD></TR></TABLE>
<P> <HR> Now let's test the other operator, the <CODE>*</CODE> operator that retrieves the contents stored at an address: </P>
<TABLE WIDTH="1">
<TR ALIGN="CENTER"> <TD> <B>C++</B> </TD> <TD> <B>C</B> </TD> </TR>
<TR> <TD> <PRE>
1:#include <iostream>
2:int main()
3:{
4:  float fl=3.14;
5:  unsigned int addr=<FONT COLOR="GRAY">(unsigned int)</FONT> &fl;
6:  std::cout << "fl's address=" << addr << std::endl;
7:  std::cout << "addr's contents=" << * <FONT COLOR="GRAY">(float*)</FONT> addr << std::endl;
8:}
</PRE> </TD><TD> <PRE>
1:#include <stdio.h>
2:int main(void)
3:{
4:  float fl=3.14;
5:  unsigned int addr=<FONT COLOR="GRAY">(unsigned int)</FONT> &fl;
6:  printf("fl's address=%u\n", addr);
7:  printf("addr's contents=%.2f\n", * <FONT COLOR="GRAY">(float*)</FONT> addr);
8:}
</PRE> </TD></TR></TABLE>
In line (7), step 2 has been performed on a number:
<OL START="2"> <LI>The contents stored at that address [<CODE>addr</CODE>] are retrieved</li> </OL>
<TABLE WIDTH="560"><TR BGCOLOR="#D3D3D3"><TD> In order to make line (7) work, a little "syntax sugar" had to be added for the program to compile. As before, <FONT COLOR="GRAY"><CODE>(float*)</CODE></FONT> is shown in gray because it is not relevant to the current discussion. For the sake of this discussion, just read "<CODE>*<FONT COLOR="GRAY">(float*)</FONT>addr</CODE>" as "<CODE>*addr</CODE>" (that is, ignore the stuff in gray). The code shown in gray will be discussed later. </TD></TR></TABLE>
<H3>OK, but why do we need & and *?</H3>
<P> We have shown that 2 distinct steps occur when accessing a variable, and that we can make those steps occur separately. But why is this useful? </P>
<P> To see why, let's first look at how functions work in C/C++.
Try this code: </P>
<TABLE WIDTH="1">
<TR ALIGN="CENTER"> <TD> <B>C++</B> </TD> <TD> <B>C</B> </TD> </TR>
<TR> <TD> <PRE>
 1:#include <iostream>
 2:void somefunc(float fvar)
 3:{
 4:  fvar=99.9;
 5:}
 6:int main()
 7:{
 8:  float fl=3.14;
 9:  somefunc(fl);
10:  std::cout << fl << std::endl;
11:}
</PRE> </TD><TD> <PRE>
 1:#include <stdio.h>
 2:void somefunc(float fvar)
 3:{
 4:  fvar=99.9;
 5:}
 6:int main(void)
 7:{
 8:  float fl=3.14;
 9:  somefunc(fl);
10:  printf("%.2f\n", fl);
11:}
</PRE> </TD></TR></TABLE>
What prints out? 3.14? 99.9? It turns out that 3.14 prints out. The general term used to describe this behavior is <i>pass by value</i>. When <CODE>somefunc(fl)</CODE> is called at line (9):
<OL>
<LI>Execution jumps to line (2) to run the function</li>
<LI><CODE>fvar</CODE> is created as its own variable and <CODE>fl</CODE>'s value is copied into <CODE>fvar</CODE><br> <IMG width="560" height="210" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonFlFvar1.jpg" ALT="fvar stores value passed into function"></li>
<LI>On line (4), 99.9 is assigned to fvar<br> <IMG width="560" height="240" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonFlFvar2.jpg" ALT="99.9 is assigned to fvar"></li>
<LI>Now that the function is finished, execution resumes in <CODE>main</CODE> where it left off (line 10). The <CODE>fl</CODE> variable is unchanged, so 3.14 prints out.</li>
</OL>
<HR> We can circumvent this <i>pass by value</i> behavior and change values passed into functions by using the <CODE>&</CODE> and <CODE>*</CODE> operators.
<TABLE WIDTH="1">
<TR ALIGN="CENTER"> <TD> <B>C++</B> </TD> <TD> <B>C</B> </TD> </TR>
<TR> <TD> <PRE>
 1:#include <iostream>
 2:void somefunc(unsigned int fptr)
 3:{
 4:  *(float*)fptr=99.9;
 5:}
 6:
 7:int main()
 8:{
 9:  float fl=3.14;
10:  unsigned int addr=(unsigned int) &fl;
11:  somefunc(addr);
12:  std::cout << fl << std::endl;
13:}
</PRE> </TD><TD> <PRE>
 1:#include <stdio.h>
 2:void somefunc(unsigned int fptr)
 3:{
 4:  *(float*)fptr=99.9;
 5:}
 6:
 7:int main(void)
 8:{
 9:  float fl=3.14;
10:  unsigned int addr=(unsigned int) &fl;
11:  somefunc(addr);
12:  printf("%.2f\n", fl);
13:}
</PRE> </TD></TR></TABLE>
Quite simply, the two steps that normally occur when accessing a variable are being separated to allow us to change the variable's value in a different function.
<OL>
<LI>The floating point variable fl is created at line (9) and given the value 3.14<br> <IMG SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonFl.jpg" ALT="Variable fl allocated"></li>
<LI>The <CODE>&</CODE> operator is used on fl at line (10) (do only step 1, get the address). The address is stored in the integer variable <CODE>addr</CODE>.<br> <IMG width="560" height="210" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonFlAddr1.jpg" ALT="The address of fl is stored in addr"></li>
<LI>The function <CODE>somefunc</CODE> is called at line (11) and <CODE>fl</CODE>'s address is passed as an argument.</li>
<LI>The function <CODE>somefunc</CODE> begins at line (2), <CODE>fptr</CODE> is created and <CODE>fl</CODE>'s address is copied into fptr.<br> <IMG width="560" height="210" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonFlAddrFptr1.jpg" ALT="The argument addr is copied to fptr"></li>
<LI>The <CODE>*</CODE> operator is used on <CODE>fptr</CODE> at line (4) -- do step 2, the contents stored in an address are retrieved.
In this example, the contents at address 924 are retrieved.</li>
<LI>The contents at address 924 are assigned the value <CODE>99.9</CODE>.<br> <IMG width="560" height="230" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ribbonFlAddrFptr2.jpg" ALT="99.9 is assigned to fl"></li>
<LI>The function finishes. Control returns to line (12).</li>
<LI>The contents of <CODE>fl</CODE> are printed to the screen.</li>
</OL>
<h3>Pointer Variables</h3>
Even though we have shown that an address is nothing more than a simple integer, the creators of the language were afraid we might confuse variables in our programs. We might confuse integers we intend to use for program values (e.g. variables storing ages, measurements, counters, etc.) with integers we intend to use for holding the addresses of our variables.
<P> The language creators decided the best way to <A HREF="http://carbon.cudenver.edu/~tgibson/tutorial/ptrWhy.html">eliminate confusion</A> was to create a different <i>type</i> of variable for holding addresses. A first attempt at this might have looked something like this: </P>
<PRE>
1:...
2:  float fl=3.14;
3:  float PTR addr = &fl;
4:...
</PRE>
On line (3), here is how to describe the addr variable:<br> <IMG width="559" height="125" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ptrDesc1.jpg" ALT="addr is a pointer to a float"> <br> <B>(A)</B> <CODE>addr</CODE> is an integer. <B>(B)</B> However, it is a special integer designed to hold the address of a <B>(C)</B> <CODE>float</CODE>
<P> In the code above, line (3) is close to what the creators of the language wanted except for one thing: using <CODE>PTR</CODE> would require introducing another keyword into the language. If there is one thing that all C instructors like to brag about, it is how there are only a very small number of keywords in the language. Well, using line (3) as shown above would mean adding <CODE>PTR</CODE> as another keyword to the language.
</P>
<P> To avoid this threat to the very fabric of the universe, the creators cast about for something already being used in the language that could do double duty as <CODE>PTR</CODE> shown above. What they came up with was the following: </P>
<PRE>
1:...
2:  float fl=3.14;
3:  float * addr = &fl;
4:...
</PRE>
Even with the <CODE>*</CODE> instead of <CODE>PTR</CODE>, <CODE>addr</CODE> is described the same way:<br> <IMG width="559" height="125" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ptrDesc2.jpg" ALT="addr is a pointer to a float"> <br> <B>(A)</B> <CODE>addr</CODE> is an integer. <B>(B)</B> However, it is a special integer designed to hold the address of a <B>(C)</B> <CODE>float</CODE>
<P> These variables are described this way, regardless of the type: </P>
<IMG width="560" height="125" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ptrDesc3.jpg" ALT="addr is a pointer to a char"> <br> <B>(A)</B> <CODE>addr</CODE> is an integer. <B>(B)</B> However, it is a special integer designed to hold the address of a <B>(C)</B> <CODE>char</CODE> <br>
<IMG width="560" height="126" SRC="http://carbon.cudenver.edu/~tgibson/tutorial/ptrDesc4.jpg" ALT="addr is a pointer to an int"> <br> <B>(A)</B> <CODE>addr</CODE> is an integer. <B>(B)</B> However, it is a special integer designed to hold the address of an <B>(C)</B> <CODE>int</CODE>
<P> This "...special integer..." way of describing these variables is a mouthful, so we shorten it and just say "addr is a float pointer" or "addr is a pointer to a float" (or char, or int, etc.). </P>
<P> Unfortunately, the language creators chose the <CODE>*</CODE> character to replace PTR. The <CODE>*</CODE> character is confusing because the <CODE>*</CODE> character is also used to get the contents at an address ("do step 2 on a number").
<B>These two uses of the <CODE>*</CODE> character have <A HREF="http://carbon.cudenver.edu/~tgibson/tutorial/similarity.html">nothing</A> to do with each other.</B> </P>
<H3>What is all that "syntax sugar" anyway? (Casting)</H3>
Let's take one last look at our original code that illustrates the utility of separating out steps 1 & 2.
<TABLE WIDTH="1">
<TR ALIGN="CENTER"> <TD> <B>C++</B> </TD> <TD> <B>C</B> </TD> </TR>
<TR> <TD> <PRE>
 1:#include <iostream>
 2:void somefunc(unsigned int fptr)
 3:{
 4:  *(float*)fptr=99.9;
 5:}
 6:
 7:int main()
 8:{
 9:  float fl=3.14;
10:  unsigned int addr=(unsigned int) &fl;
11:  somefunc(addr);
12:  std::cout << fl << std::endl;
13:}
</PRE> </TD><TD> <PRE>
 1:#include <stdio.h>
 2:void somefunc(unsigned int fptr)
 3:{
 4:  *(float*)fptr=99.9;
 5:}
 6:
 7:int main(void)
 8:{
 9:  float fl=3.14;
10:  unsigned int addr=(unsigned int) &fl;
11:  somefunc(addr);
12:  printf("%.2f\n", fl);
13:}
</PRE> </TD></TR></TABLE>
In nearly all of the code samples, you have been asked to ignore certain bits of the code. These bits of code have always appeared around those areas where we are either taking the address of a variable or getting the contents at an address (doing step 1 or step 2 on a variable).
<P> Those bits of "syntax sugar" are there to keep the compiler from complaining. The first example of this in the above program is on line (10). </P>
<P> On line (10) we are taking the address of the floating point number <CODE>fl</CODE> ("do only step 1 on a variable"). After we get that address, we store it in <CODE>addr</CODE>. </P>
<P> Why would the compiler complain? Because when we assign the address of <CODE>fl</CODE> to <CODE>addr</CODE>, the compiler does not expect <CODE>addr</CODE> to be an <CODE>unsigned int</CODE>. The compiler expects <CODE>addr</CODE> to be a <CODE>float *</CODE>. That is, <i>a special integer designed to hold the address of a float</i>.
To keep the compiler from complaining, we tell the compiler to treat <CODE>&fl</CODE> as an <CODE>unsigned int</CODE> rather than a <CODE>float *</CODE>. </P>
<P> This "syntax sugar" that causes the compiler to treat variables and expressions differently is called <I>casting</I>. The way a programmer describes line (10) is: "The address of <CODE>fl</CODE> is being <A HREF="http://carbon.cudenver.edu/~tgibson/tutorial/casting.html">cast</A> into an <CODE>unsigned int</CODE> and assigned to <CODE>addr</CODE>." </P>
<P> The other place casting occurs is on line (4). On line (4), we are getting the contents at an address ("do step 2 on a number/address"). Why would the compiler complain? Because the compiler should get the contents of the address of a float. The address of our float is stored in <CODE>fptr</CODE>, which is an <CODE>unsigned int</CODE>, not a <CODE>float *</CODE>. We tell the compiler to treat <CODE>fptr</CODE> as the address of a floating point number by casting it into a <CODE>float *</CODE>. Once we tell the compiler this, we can get the contents at the address without complaint. </P>
<h3>Putting it all together</h3>
<P> From the previous section, you might be left with the impression that whenever you deal with addresses and pointers, there is a lot of casting. Not so. The only reason our examples up till now have required casting is because we were storing our addresses in <CODE>unsigned int</CODE> variables. The language designers want us to store addresses in the "special integer" variables, that is, the pointer variables they designed for just such a purpose.
</P>
<P> Once we replace our <CODE>unsigned int</CODE> variables with these pointer variables, none of the casting is required: </P>
<TABLE WIDTH="1">
<TR ALIGN="CENTER"> <TD> <B>C++</B> </TD> <TD> <B>C</B> </TD> </TR>
<TR> <TD> <PRE>
 1:#include <iostream>
 2:void somefunc(float* fptr)
 3:{
 4:  *fptr=99.9;
 5:}
 6:
 7:int main()
 8:{
 9:  float fl=3.14;
10:  float* addr = &fl;
11:  somefunc(addr);
12:  std::cout << fl << std::endl;
13:}
</PRE> </TD><TD> <PRE>
 1:#include <stdio.h>
 2:void somefunc(float* fptr)
 3:{
 4:  *fptr=99.9;
 5:}
 6:
 7:int main(void)
 8:{
 9:  float fl=3.14;
10:  float* addr = &fl;
11:  somefunc(addr);
12:  printf("%.2f\n", fl);
13:}
</PRE> </TD></TR></TABLE>
<UL>
<LI>On line (10), when we take the address of <CODE>fl</CODE> the address is assigned to a variable designed to hold it. No casting is required.</LI>
<LI>When <CODE>addr</CODE> is passed to the function in line (11), <CODE>addr</CODE> is copied to <CODE>fptr</CODE> on line (2).</LI>
<LI>Line (2) shows that <CODE>fptr</CODE> is created as a float pointer, that is, a variable designed to hold the address of a floating point number. As a result, no casting is needed on line (4) where the contents at the address are retrieved.</li>
</UL>
<HR>
<H3>Revision History</H3>
<TABLE>
<TR> <TD> 1999 March 19 </TD> <TD> Added C version of code. Minor corrections to text. </TD> </TR>
<TR> <TD> 2001 April 30 </TD> <TD> Some minor corrections. </TD> </TR>
</TABLE>
<HR>
<H3>Miscellaneous</H3>
<P> The graphics in this tutorial were created using the freely distributed image manipulation program The GIMP. Information on The GIMP can be found at <a href="http://www.gimp.org">http://www.gimp.org/</a> </P>
<P> Please contact me with any errata, comments, suggested changes, or improvements: <a href="mailto:[email protected]"><CODE>[email protected]</CODE></a> </P>
<P> The code in this tutorial that stores addresses in <CODE>unsigned int</CODE> variables may fail on compilers where an address does not fit in an <CODE>unsigned int</CODE>; this includes most 64-bit compilers.
If this is the case with your compiler, try using <CODE>unsigned long</CODE> instead of <CODE>unsigned int</CODE>. </P> <P><CODE>Copyright 2001 Todd A. Gibson. All Rights Reserved.</CODE></P> While this document is copyright by me with all rights reserved, permission is granted to freely distribute verbatim copies of this document provided that no modifications outside of formatting be made, and that this notice remain intact.
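<P> For compilers where neither <CODE>unsigned int</CODE> nor <CODE>unsigned long</CODE> is wide enough to hold an address, one possible workaround is the <CODE>uintptr_t</CODE> type from <CODE>stdint.h</CODE>, an unsigned integer type meant to hold a converted pointer. The C sketch below is not one of the tutorial's original examples: the function names <CODE>address_as_integer</CODE>, <CODE>store_at_address</CODE> and <CODE>demo</CODE> are made up for illustration, and it assumes a C99 or later compiler that provides <CODE>uintptr_t</CODE> (the C standard makes that type optional). </P>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, step 1 only: take a float's address and keep it
   in an integer that is wide enough to hold it. */
uintptr_t address_as_integer(float *p)
{
    return (uintptr_t)(void *)p;
}

/* Hypothetical helper, step 2 only: treat the integer as the address of
   a float and store a value at that address. */
void store_at_address(uintptr_t addr, float value)
{
    *(float *)(void *)addr = value;
}

/* Hypothetical demo: changes fl through its address and returns the
   value fl holds afterwards. */
float demo(void)
{
    float fl = 3.14f;
    uintptr_t addr = address_as_integer(&fl);

    /* Converting the integer back gives the original address. */
    assert((float *)(void *)addr == &fl);

    store_at_address(addr, 99.9f);
    printf("fl=%.2f\n", fl);
    return fl;
}
```

<P> The casts are still needed here, for the same reason as in the <CODE>unsigned int</CODE> examples: the address is being stored in an integer variable rather than in a pointer variable. </P>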