It’s an example of technology getting ahead of the vocabulary to clearly describe it.
HotButton did a great job explaining it, but here’s another attempt from a different angle.
An RGB image, like your camera shoots, is composed of three separate monochromatic channels: red, green and blue. Each of these channels is composed of pixels, each of which contains 8 bits of information. This is why it’s referred to as an 8-bit image. Overlap these channels on a monitor, and they merge to form a full-color image.
A grayscale image (what some people call black and white) is also an 8-bit image, but a grayscale image only has one channel.
A CMYK image is also an 8-bit image, but it’s composed of four separate channels: cyan, magenta, yellow and black. When those channels are printed on top of each other on a page, they create a full-color image.
So none of that should be especially confusing, but here’s where the confusion comes into play. Even though an RGB image is composed of three separate 8-bit channels, the RGB file itself can be referred to as 24-bit since, well, add up the 8-bit channels and it totals up to 24 bits.
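To make the arithmetic concrete, here’s a rough sketch in Python. The names (`BITS_PER_CHANNEL`, `channels`, `bits_per_pixel`) are just made up for illustration, and it assumes every channel is 8 bits, as in the examples above.

```python
# Bits per pixel = number of channels x bits per channel.
# Assumes 8 bits per channel for every mode, as described above.
BITS_PER_CHANNEL = 8

# Channel counts for the three image modes discussed so far.
channels = {"grayscale": 1, "RGB": 3, "CMYK": 4}

bits_per_pixel = {mode: n * BITS_PER_CHANNEL for mode, n in channels.items()}

for mode, bits in bits_per_pixel.items():
    print(f"{mode}: {channels[mode]} x {BITS_PER_CHANNEL} = {bits} bits per pixel")
```

So the same "8-bit" RGB image is also a 24-bit image, depending on whether you count per channel or per pixel. By the same counting, an 8-bit CMYK image works out to 32 bits per pixel.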
This whole terminology problem could be fixed if people would just refer to these images as, maybe, 8x3-bit or 8x1-bit or whatever, but that’s not the convention.
It gets more complicated, though. There really are images whose pixels carry more than 8 bits per channel. These images, like those with 16 bits per channel, contain much more color information. If your camera shoots raw files, it’s shooting them at a higher bit depth to capture more information, which gives you the ability to pull more tonal values out of the image before saving or exporting it to a regular 8-bit-per-channel, 24-bit RGB file.
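If it helps to see why the extra bits matter, here’s a small Python sketch. The only fact it relies on is that n bits can represent 2^n distinct values. The function name `levels` is just something I picked for illustration.

```python
def levels(bits_per_channel):
    """Number of distinct tonal values one channel can hold."""
    return 2 ** bits_per_channel

levels_8 = levels(8)    # tonal steps per channel in an 8-bit-per-channel image
levels_16 = levels(16)  # tonal steps per channel in a 16-bit-per-channel image

# Possible colors in a three-channel RGB image with 8 bits per channel.
rgb_colors_8 = levels_8 ** 3

print(levels_8, levels_16, rgb_colors_8)
```

That’s 256 steps per channel at 8 bits versus 65,536 at 16 bits. Those extra in-between steps are the headroom you’re using when you pull detail out of a raw file.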