The L2 really belongs to the core; a comparison without it does not make much sense.
The GPU cores (in the classic sense, i.e. not what NVIDIA calls "cores") also include cache memories, as well as local memories that are directly addressable.
The only confusion comes from the fact that first NVIDIA, and then ATI/AMD too, started to use an obfuscated terminology that replaces a large number of terms used for decades in the computing literature.
For maximum confusion, many terms that previously had clear meanings, like "thread" or "core", have been reused with new meanings, and ATI/AMD has invented its own set of terms that correspond to NVIDIA's but use completely different words.
I hate that the employees of NVIDIA and ATI/AMD thought it was a good idea to replace all the traditional terms without any reason.
The traditional meaning of a thread is that for each thread there exists a distinct program counter a.k.a. instruction pointer, which is used to fetch and execute instructions from a program stored in the memory.
The traditional meaning of a core is a block that is equivalent to a traditional independent processor, i.e. to a complete computer minus the main memory and the peripherals.
A core may have only one program counter, when it can execute a single thread at a time, or it may have multiple program counters (with associated register sets) when it can execute multiple threads, using either FGMT (fine-grained multithreading) or SMT (simultaneous multithreading).
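To make the "one program counter per thread" definition concrete, here is a toy Python sketch of a core doing fine-grained multithreading. It is only an illustration under simplifying assumptions: the function name `run_fgmt` and the representation of a program as a list of instruction names are made up for this example, and the model issues exactly one instruction per cycle in strict round-robin order, whereas real FGMT hardware would typically skip threads that are stalled.

```python
def run_fgmt(programs):
    """Toy model of one core running several threads with FGMT.

    `programs` is a list of instruction lists, one per hardware thread
    (hypothetical representation, for illustration only).
    Returns the issue order as (thread_id, instruction) pairs.
    """
    # One program counter per hardware thread: this is what makes
    # each of them a "thread" in the traditional sense.
    pcs = [0] * len(programs)
    trace = []
    # Keep cycling while at least one thread still has instructions left.
    while any(pc < len(prog) for pc, prog in zip(pcs, programs)):
        for t, prog in enumerate(programs):
            if pcs[t] < len(prog):
                trace.append((t, prog[pcs[t]]))  # fetch at this thread's PC
                pcs[t] += 1                      # advance only that PC
    return trace
```

Calling `run_fgmt([["a0", "a1"], ["b0"]])` interleaves the two instruction streams; a core with a single program counter corresponds to calling it with a single program.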
The traditional terms were very clear and have direct counterparts in GPUs, but NVIDIA and AMD use other words for those concepts and reuse the words "thread" and "core" for very different things, for maximum obfuscation. For instance, NVIDIA says "warp" and AMD says "wavefront" for what is traditionally a thread. NVIDIA uses "thread" to designate what was traditionally called the body of a "parallel for" a.k.a. "parallel do" program structure (which, when executed on a GPU or multi-core CPU, is unrolled and distributed over cores, threads and SIMD lanes).
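The distribution of a "parallel for" over cores, threads and SIMD lanes can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any real GPU scheduler assigns work: the function name `place_iteration` and the core and thread counts are hypothetical, and only the lane count of 32 matches NVIDIA's warp size. In vendor terms, each iteration index is what NVIDIA calls a "thread", each traditional thread is a "warp", and each traditional core corresponds roughly to a "streaming multiprocessor".

```python
LANES_PER_THREAD = 32  # SIMD lanes per traditional thread (NVIDIA warp size)
THREADS_PER_CORE = 4   # hypothetical number of program counters per core
CORES = 2              # hypothetical number of traditional cores

def place_iteration(i):
    """Map iteration i of a parallel-for to (core, thread, lane).

    Assumed static placement for illustration: consecutive iterations
    fill the lanes of one thread, then the threads of one core, then
    the next core, wrapping around when the machine is full.
    """
    lane = i % LANES_PER_THREAD
    thread = (i // LANES_PER_THREAD) % THREADS_PER_CORE
    core = (i // (LANES_PER_THREAD * THREADS_PER_CORE)) % CORES
    return core, thread, lane
```

With these numbers the machine executes 2 * 4 * 32 = 256 iterations per round, driven by only 8 program counters, which is why counting the iterations as "threads" overstates the parallelism in the traditional sense.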
Perhaps it would be more interesting to compare without the L2 cache.