Align dest to the nearest 8-byte boundary. At this point we know there are at least 7 bytes to copy, which is enough to crawl byte by byte up to the 8-byte boundary. The actual number of bytes to crawl depends on the alignment of dest. Copies of 7 bytes or fewer are handled at .memcpy_short.
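
The crawl step described above can be sketched in C. This is only an illustration of the idea, not the original assembly: the helper names bytes_to_align8 and crawl_to_align8 are hypothetical, and the n >= 7 precondition is taken from the description rather than from the actual routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to advance p to the next 8-byte boundary (0..7).
 * Hypothetical helper, not part of the original routine. */
static size_t bytes_to_align8(const void *p)
{
    return (size_t)(-(uintptr_t)p & 7u);
}

/* Copy bytes one at a time until *dest is 8-byte aligned.
 * Assumes the caller has already sent short copies elsewhere,
 * so at least 7 bytes remain (the most the crawl can need).
 * Advances both pointers and returns the number of bytes crawled. */
static size_t crawl_to_align8(unsigned char **dest,
                              const unsigned char **src, size_t n)
{
    size_t head = bytes_to_align8(*dest);

    assert(n >= 7); /* precondition stated in the description */

    for (size_t i = 0; i < head; i++)
        (*dest)[i] = (*src)[i];

    *dest += head;
    *src += head;
    return head;
}
```

After the call, dest sits on an 8-byte boundary, so the remaining n - head bytes can be stored in 8-byte units. Note that src is not necessarily aligned afterwards; only dest is.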