So, the easiest way to start thinking about this is by looking at straight lines. The gradient of a straight line is just its slope, which is the difference in the y values between any two points, divided by the difference in x values between the same two points. We call this slope Δy/Δx, using the Greek letter delta (Δ) to mean "change in". Notice a similarity to the derivative notation dy/dx? Now, for functions that aren't straight lines, it's quickly obvious that this doesn't immediately work, since the slope changes as the curve wiggles about. However, we can still look at Δy/Δx between two points; there's nothing stopping us. This slope is no longer the gradient of the curve itself, but it is the slope of the chord, i.e. the straight line joining the two points on the curve.
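To make the difference concrete, here is a small Python sketch. The function name chord_slope and the two example functions (a line y = 3x + 1 and a curve y = x^2) are just illustrative choices, not anything special. It computes Δy/Δx between a couple of pairs of points on each.

```python
def chord_slope(f, x1, x2):
    """Slope Δy/Δx of the straight line joining (x1, f(x1)) and (x2, f(x2))."""
    return (f(x2) - f(x1)) / (x2 - x1)


def line(x):
    # Example straight line (an arbitrary choice for illustration).
    return 3 * x + 1


def curve(x):
    # Example curve (an arbitrary choice for illustration).
    return x ** 2


# Same two pairs of x values for both functions.
line_slopes = [chord_slope(line, 0, 1), chord_slope(line, 2, 5)]
curve_slopes = [chord_slope(curve, 0, 1), chord_slope(curve, 2, 5)]

print("line: ", line_slopes)
print("curve:", curve_slopes)
```

For the line, both pairs of points give the same slope; for the curve, the answer depends on which two points you pick, which is exactly the problem described above.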
Let's take some point (x, f(x)) on the curve (this can be any point) and add a small amount to x. Since it's a small change in x, we'll call this amount 𝛿x (some people use h, but it's all the same in the end). The point on the curve at this new value is then (x+𝛿x, f(x+𝛿x)). If y = f(x), we can look at our quantity Δy/Δx between these two points, except, since our change is just a small one now, let's use lower case deltas: 𝛿y/𝛿x. Just by calculating the change in y over the change in x, we have 𝛿y/𝛿x = [f(x+𝛿x) - f(x)]/[(x+𝛿x) - x] = [f(x+𝛿x) - f(x)]/𝛿x, and this is the gradient of a chord between two points close together on the curve. Since the points are close together, this chord has a gradient quite close to that of the curve itself. So as 𝛿x gets smaller and smaller, the points get closer together, and 𝛿y/𝛿x gets closer and closer to the gradient of the curve at x. As it turns out, we actually define the derivative as dy/dx = lim(𝛿x-->0){[f(x+𝛿x) - f(x)]/𝛿x}, and so we see that the derivative is just the limiting value of the gradient of a shorter and shorter chord on the curve; i.e., the derivative is the gradient.
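You can watch this happen numerically. The sketch below assumes the example f(x) = x^2 at the point x = 3 (both picked purely for illustration; difference_quotient is just a name I've made up) and evaluates [f(x+𝛿x) - f(x)]/𝛿x for smaller and smaller 𝛿x.

```python
def difference_quotient(f, x, dx):
    """Gradient of the chord from (x, f(x)) to (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


def f(x):
    # Example function, chosen only for illustration.
    return x ** 2


x = 3.0                          # the point where we want the gradient
steps = [1.0, 0.1, 0.01, 0.001]  # shrinking values of 𝛿x

quotients = [difference_quotient(f, x, dx) for dx in steps]

for dx, q in zip(steps, quotients):
    print(f"dx = {dx:<6} chord gradient = {q:.6f}")
```

For this particular f you can check the algebra by hand: [(x+𝛿x)^2 - x^2]/𝛿x = 2x + 𝛿x, so at x = 3 the chord gradient is 6 + 𝛿x, which heads towards 6 as 𝛿x shrinks. That matches the derivative of x^2, which is 2x. (If you make 𝛿x extremely tiny on a computer, floating-point rounding eventually spoils the answer, but that's a numerical issue, not a problem with the definition.)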