Vector size versus vector capacity

We've already introduced the concept of the size of a vector. This is the number of pieces of data that are currently stored in your vector. There is a related concept called vector capacity. Here's how it works:

Inside your computer memory, there is some space reserved to store the elements of your vector. The amount of reserved space is called the capacity of the vector. It is always at least equal to the size of the vector (there is enough room to store each vector element), but sometimes the capacity is larger (there is enough room for a few more elements).

Example: Size vs. Capacity

#include<iostream>
#include<vector>
using namespace std;

int main() {

    vector<int> vec(5);
    for(int i=0; i<5; i++)
        vec.at(i) = i;

    // Here we expect to see size and capacity are both 5
    cout << "My vector has size " << vec.size() << " and capacity "
         << vec.capacity() << "\n";

    vec.pop_back();
    vec.pop_back();

    // Here we removed 2 elements from our vector, so the size is 3.
    // However, your vector still has reserved 5 spaces in the computer memory
    //   so we expect the capacity to be 5.
    cout << "After some edits, my vector has size " << vec.size()
         << " and capacity " << vec.capacity() << "\n";
    
    return 0;
}
Vector Size Vector Capacity
• size = amount of data my vector is currently storing. • capacity = amount of space reserved in memory (may be larger than vector size)
• We have full control over vector size. Can use push_back(), pop_back(), resize(), etc. to change the size at runtime. • We don't control vector capacity. We never see the details of how it manages memory

Example#2: It's hard to predict capacity

In this example, we create an empty vector, then use push_back to fill it with elements. At the end we'll see that the vector size is exactly what we expected, but the capacity won't be easily predictable.

#include<iostream>
#include<vector>
using namespace std;

int main() {

    // Try this code with different values of VEC_SIZE
    const int FINAL_VEC_SIZE = 30;
    vector<int> vec;
        
    for(int i=0; i < FINAL_VEC_SIZE; i++)
        vec.push_back(i);

    cout << "My vector has size " << vec.size() << " and capacity "
         << vec.capacity() << "\n";
    
    return 0;
}

A very short explanation of why it works this way: reserving memory can be an expensive operation, so vectors often reserve memory in bigger chunks, rather than one slot at a time. This means that there will often be more memory reserved than is actually being used.

Arrays

cplusplus.com has a really good tutorial on arrays here. Everything before the "Multidimensional Arrays" section is useful material for us (the rest is interesting, but I think we don't use it in PIC10A).

An array is similar to a vector (we'll give much more detail about similarities and differences later). It is an object whose purpose is to store multiple objects (of the same type).

Basic Array Syntax

To declare an array

The syntax for declaring an array is

type name_of_array[MAX_ENTRIES];

Here type can be any type (int, double, string, even a class that you wrote yourself). name_of_array is just the name of the variable. max_entries is the maximum amount of data points that the array can store.

Similar to vectors, arrays have both a size and a capacity. Here the capacity is equal to MAX_ENTRIES, and the size may change depending on how many slots of memory you are currently using.

Accessing elements

We access elements of an array via the [] operator. See the examples below.

Examples:

// Declaring arrays

int a[5];  // This array stores up to five integers. They are called a[0], a[1], a[2], a[3], a[4].
           // This array has capacity 5. Right now the size is 0, because the entries are uninitialized.

string name[4];   // This array stores up to four strings. They are called name[0], name[1], name[2], name[3];
name[0] = "Jonathan";   // This initializes name[0]. Now the size of the array is 1


// You can also declare and initialize arrays at the same time
int b[3] = {2,3,5};         // Sets b[0]=2, b[1]=3, b[2] = 5.
                            // The capacity and size are both 3 in this example.
 
string name[3] = {"Geoffrey", "Sankar", "Iyer"};   // Sets name[0]="Geoffrey", name[1]="Sankar", name[2]="Iyer".
        // Similar to above, capacity and size are both 3

double x[20] = {3.14, 2.71}   // Sets x[0] = 3.14, x[1] = 2.71. x[2] through x[19] are left uninitialized.
                              // Here capacity is 20, size is 2

Arrays vs. Vectors

All of this will be explained in more detail below. You can treat this table as a summary of what we'll learn today.

Arrays Vectors
• Access elements using [] • Prefer to access elements using .at(), but [] works
• You decide the array capacity yourself. • Vector capacity is handled by the vector class.
• Capacity must be constant. We need to know the capacity at compile-time, and it can't be changed. • Capacity can vary. Fuctions like resize() push_back() change vector capacity if necessary.
• Will not check boundaries when accessing elements. You can access memory outside array capacity if you're not careful • Checks vector size each time you access an element. Will give an error if you try to access memory out of bounds.
• If I want to track the array data, size, and capacity, then I'll need to keep track of three different variables. I.e. if my array is called arr, then I would also need variables called arr_size and arr_capacity. • All these values are stored inside your vector object. If your vector is called vec, then vec.at( index ) accesses the data, vec.size() gives you the size, and vec.capacity() gives you the capacity.
• Arrays are generally more efficient to work with. Memory management is done at compile-time. • Vectors are less efficient, but easier to deal with. Memory management is done at run-time.
• Passing an array into a function is weird. We won't talk about this today (or maybe at all? It's related to pointers, but we don't cover pointers). • Passing a vector into a function works normally.

Array capacity is constant:

The maximum number of elements in an array must be a constant. In other words, your program must know at compile-time what this value is.

// Incorrect code #1:
int x;
cin >> x;
double array[x];
// Will say something like "Expression must have constant value".

// Incorrect code #2:
int x = 4;
double a[x] = {1.1, 2.2, 3.3, 4.4}; // Similar error

// Will compile and run, but uses magic numbers
double a[4] = {1.1, 2.2, 3.3, 4.4};

// A better way to do it:
const int MAX_ENTRIES = 4;
double a[MAX_ENTRIES] = {1.1, 2.2, 3.3, 4.4};

Moral: If you know at compile-time how much data you want to store, than using arrays is a reasonable choice. Otherwise you should use a vector.

You don't need to know the exact amount of data to use an array. You just need to make the capacity large enough to hold all your data.

Accessing memory out of array/vector bounds

When using vectors, c++ will do a good job of tracking the size of the vector, and it will probably stop you from doing stupid things. For example:

vector<int> a(2);
cout << a.at(-2) << endl;

Will probably bring up some kind of error (it depends on your programming environment. In the Visual Studio we use this will say "vector subscript out of range").

Arrays don't have this feature. Unless you keep track of it yourself, c++ will have no idea what the max size of your array is. Here's some example code:

#include <iostream>
using namespace std;

int main()
{
    const int MAX_ENTRIES = 3;
    int a[MAX_ENTRIES] = { 1, 2, 3 }; // a[0] = 1, a[1] = 2, a[2] = 3;
    
    cout << a[3] << endl; // a[3] is not defined, but this code will still compile and run.
    cout << a[-1] << endl;  // same thing. Even though a[-1] is garbage, the code will compile and run.
    return 0;
}

What's actually happening here: a[3] accesses the spot in memory that comes immediately after a[2]. There's a pretty good chance that this hasn't been initialized and is just some garbage value. It could be pretty much anything. We really don't know. Similarly, a[-1] accessses the spot in memory that comes immediately before a[0].

Just for fun: Screwing around with memory management

Note: The below stuff depends a lot on what compiler you're using. So if it doesn't work on your computer it's not your fault. (I set this up using visual studio 2013 I think...)

Try out this code:

#include <iostream>
using namespace std;

int main() {
    int a[4] = {0,1,2,3};
    int b[4] = {4,5,6,7};

    for(int i=0; i<12; i++) {   // Note: We are going outside of array capacity
        cout << b[i] << endl;
    }

    return 0;
}

Next try out this code:

#include <iostream>
using namespace std;

int main() {
    int a[6] = {0,1,2,3,4,5};
    int b[6] = {6,7,8,9,10,11};

    for(int i=0; i<16; i++) {   // Note: We are going outside of array capacity
        cout << b[i] << endl;
    }

    return 0;
}

Crazy, huh? I don't really know the details about how the compiler manages memory. All I can really say is that it tries to keep data close together.

Practice problems if we have time

Let's practice with some standard vector operations. I want to write two functions, remove and insert. These functions come standard in the <vector> library, but you need to know about iterators to use them. We're not going to cover iterators, so we'll write our own versions.

insert should accept a vector of integers, an element to be inserted, and a position. It should edit the vector to contain the new element in the specified position (increasing the size of the vector).

Example Input/Output
If I pass in the vector
    0, 1, 2, 3, 4, 5
as well as the element 19, and the position 3, then the edited vector should be
    0, 1, 2, 19, 3, 4, 5

remove should accept a vector of integers, and a position. It should delete the element at the specified position (decreasing the size of the vector).

Example Input/Output
If I pass in the vector
    0, 1, 2, 3, 4, 5
and the position 3, then the edited vector should be
    0, 1, 2, 4, 5