Tuesday, July 27, 2010

Adding Numbers Example

This tutorial answers a question about how to get the value a function returns, and where that return value is stored.

EAX The "Official" place for the return value of a function


As you can read in the FreeBSD Developers' Handbook (http://www.freebsd.org/doc/en/books/developers-handbook/x86-return-values.html), a function normally leaves its return value in EAX.

This function should return the number 5

int getNumber5(void){
__asm
{
mov eax,5
}
}

This is simple: we only copy 5 into the EAX register. Because EAX is the "official" return register, it's the same as:



int getNumber5(void){
return (5);
}

Remember that in C/C++, and in every language that I know, a function returns only one value, which may be a number or an address (pointer). On a 32-bit architecture that value is usually 32 bits (4 bytes), because that is the size of EAX.
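When a single 32-bit value is not enough, a common workaround is to bundle several values into a struct (or to write results through pointer parameters). The following is only a minimal sketch in portable C++; DivResult and divide are hypothetical names that do not appear in the original code, and how the compiler physically hands the struct back (registers or a hidden pointer) depends on the ABI.

```cpp
#include <cassert>

// Hypothetical example type: two results bundled into one return value.
struct DivResult {
    int quotient;
    int remainder;
};

// Returns both results at once; the caller must pass a non-zero divisor.
DivResult divide(int dividend, int divisor) {
    assert(divisor != 0);
    DivResult r;
    r.quotient = dividend / divisor;
    r.remainder = dividend % divisor;
    return r;
}
```

The caller still receives "one value", but that value carries both fields, for example divide(7, 2).remainder.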


EAX is Not a Stack!


Please remember that EAX is one of the general-purpose registers.
Refer to http://rodrigosavage.blogspot.com/2010/07/hello-world-with-inline-asm.html#purposeReg
or to Intel documentation 253665.pdf, which explains each register and how you should use it.

What Intel says about each register:

  • EAX — Accumulator for operands and results data
  • EBX — Pointer to data in the DS segment
  • ECX — Counter for string and loop operations
  • EDX — I/O pointer
  • ESI — Pointer to data in the segment pointed to by the DS register; source pointer for string operations
  • EDI — Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations
  • ESP — Stack pointer (in the SS segment)
  • EBP — Pointer to data on the stack (in the SS segment)
This means that EAX is an accumulator (http://en.wikipedia.org/wiki/Accumulator_(computing)).
ESP points to the top of the stack, which is managed by the push and pop instructions.

The next image was taken from the Intel documentation.
It shows the registers and their names.














ADD — Add Instruction


Intel 253666.pdf documentation states that:
Adds the destination operand (first operand) and the source operand (second
operand) and then stores the result in the destination operand. The destination
operand can be a register or a memory location; the source operand can be an
immediate, a register, or a memory location. (However, two memory operands cannot be
used in one instruction.) When an immediate value is used as an operand, it is
sign-extended to the length of the destination operand format.


This means that the code
add EAX, number

is the same as this C/C++-like statement:
EAX = EAX + number;

Just keep in mind that this is an integer addition.
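To make "integer addition" concrete, here is a small sketch that models add on a 32-bit register in portable C++. The helper add32 is a hypothetical name used only for illustration; it assumes the register behaves like an unsigned 32-bit integer, so a sum that does not fit wraps around.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of `add dest, src` on a 32-bit register:
// the sum is truncated to 32 bits, like the hardware register.
std::uint32_t add32(std::uint32_t dest, std::uint32_t src) {
    return static_cast<std::uint32_t>(dest + src);
}

void demoAdd32() {
    assert(add32(10u, 11u) == 21u);        // ordinary case
    assert(add32(0xFFFFFFFFu, 1u) == 0u);  // the carry is lost, result wraps to 0
    // An immediate of -1 is sign-extended to 0xFFFFFFFF before the add.
    assert(add32(5u, static_cast<std::uint32_t>(-1)) == 4u);
}
```

The last case matches the sign-extension remark in the Intel text: adding a sign-extended -1 gives the same bits as subtracting 1.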

Intel inline assembly function for adding two numbers


int addNumbers(int n1,int n2){
__asm
{
mov eax,n1
add eax,n2
}
}

This function receives two parameters, n1 and n2. We first copy n1 into the EAX register,
then add n2 to EAX; the result stays in EAX (the return accumulator).

This would be the same as the following C/C++-like pseudocode:
int addNumbers(int n1,int n2){
EAX=n1;
EAX = EAX+n2;
}
(Keep in mind that you cannot access EAX or any other CPU register directly like this; inline assembly is necessary to do so.)
Or at a higher level:
int addNumbers(int n1,int n2){
return(n1+n2);
}


Final Code


#include <stdio.h>
#include <iostream>
using namespace std;

int addNumbers(int n1,int n2){
__asm
{
mov eax,n1
add eax,n2
}// In C/C++, what the function returns is usually stored in the eax register

//return n1+n2;
}

void execQuestion2()
{

char format[] = "%d \n"; // %d because addNumbers returns a signed int
int n1 = 10;
int n2 = 11;
int result = addNumbers(n1, n2);

cout << n1 << " + " << n2 << " = ";

__asm {
// push n1 and n2 onto the stack
mov eax, n1
push eax
mov eax, n2
push eax
// call "addNumbers" method
call addNumbers
push eax // we put the result of addNumbers into the stack
// Question 2.4 - put formatting onto the stack
lea eax, format
push eax
// Question 2.5 - call "printf" method
call DWORD ptr printf
// Question 2.6 - empty the stack
pop ebx
pop ebx
pop ebx
pop ebx

}
}

int main()
{
execQuestion2() ;
getchar();
}


If you have any problem or question,
please don't hesitate to comment :)

I will gladly respond. No matter the question ^^

Tuesday, July 20, 2010

CVector Class with Inline ASM and SSE

I am uploading this now for the impatient readers who want to know how to write the vector class; later (in about a week) I will explain each and every line of the assembly and the code.


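Note: movaps moves 128 bits (16 bytes), i.e. 4 packed floats (or any other 16 bytes of data, such as 4 ints), between XMM registers and memory. The memory operand must be aligned on a 16-byte boundary.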


CVector CLass definition using inline ASM


#include"CVector.h"
#include <math.h>
#include <stdio.h>

// Constructor
CVector::CVector(CVector& v)
{
__asm
{
mov EDI,v // We point to v with EDI
movaps xmm0,[EDI] // The value pointed is copied to xmm0
mov ESI,this // ESI points to the class base pointer
movaps [ESI],xmm0 // the value of v(xmm0) is copied to this([esi])
}
}
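For readers without a 32-bit Visual C++ compiler, the same copy can be sketched with SSE intrinsics instead of inline assembly. This is only an illustration under stated assumptions: Vec4 and copyVec4 are hypothetical names, the struct is explicitly aligned to 16 bytes (which the aligned load and store require), and a plain memcpy fallback is used when SSE is not available.

```cpp
#include <cassert>
#include <cstring>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define VEC4_HAS_SSE 1
#else
#define VEC4_HAS_SSE 0
#endif

// Hypothetical stand-in for CVector: four floats, forced to 16-byte alignment.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Copies src into dst with one 128-bit load and one 128-bit store.
void copyVec4(Vec4& dst, const Vec4& src) {
#if VEC4_HAS_SSE
    __m128 r = _mm_load_ps(&src.x);  // aligned load, like movaps xmm0,[EDI]
    _mm_store_ps(&dst.x, r);         // aligned store, like movaps [ESI],xmm0
#else
    std::memcpy(&dst, &src, sizeof(Vec4));  // portable fallback
#endif
}

void demoCopyVec4() {
    Vec4 a = {1.0f, 2.0f, 3.0f, 4.0f};
    Vec4 b = {0.0f, 0.0f, 0.0f, 0.0f};
    copyVec4(b, a);
    assert(b.x == 1.0f && b.y == 2.0f && b.z == 3.0f && b.w == 4.0f);
}
```

The alignas(16) is the important design choice here: without it, an aligned 128-bit access on an object that happens to sit at an unaligned address would fault.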

Download Visual C++ 2008 Project

Friday, July 16, 2010

Simple Vector Algebra and C++ Vector Class
        The Elegant(inefficient) Solution

This tutorial is intended to explain simple Vector Algebra.
Then it will show how to create a simple vector library that overloads the operators to perform dot product, scalar multiplication, cross product, addition, subtraction, magnitude, scalar addition, and scalar subtraction.

This tutorial will be introductory, so that in later tutorials you will modify the class making it more efficient using inline assembly and Streaming SIMD Extensions (SSE) of Intel.


Simple Vector Algebra


This tutorial assumes that you have already taken a vector algebra course and only need to brush up on it (in Mexico they teach you vector algebra practically in kindergarten). So I am going straight to the definitions (if you would like a tutorial on linear algebra, just mail me or leave a comment).

Quaternions


What are quaternions, and why use them?

A quaternion is an ordered list (tuple) of 4 elements.
We will use 4-element vectors because of the advantages they offer compared to 3-dimensional vectors. Quaternions used for rotations do not lose a degree of freedom (http://en.wikipedia.org/wiki/Gimbal_lock). A 4-element vector (x, y, z, w) is also more efficient when operating with 4x4 matrices, because one matrix multiplication may scale, rotate and translate the vector (with 3D vectors and 3x3 matrices, a translation cannot be written as a matrix multiplication and needs a separate addition). Strictly speaking, the (x, y, z, w) vector with w = 1 used in this class is a point in homogeneous coordinates rather than a quaternion.
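To illustrate how a single 4x4 matrix multiplication can scale and translate a 4-element vector at once, here is a minimal sketch. Vec4, Mat4, transform and scaleAndTranslate are hypothetical names chosen for this example; the matrix is stored row-major and the vector is treated as a column with w = 1.

```cpp
#include <cassert>

struct Vec4 { float v[4]; };     // (x, y, z, w)
struct Mat4 { float m[4][4]; };  // row-major 4x4 matrix

// Multiplies a 4x4 matrix by a column vector.
Vec4 transform(const Mat4& a, const Vec4& p) {
    Vec4 r = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r.v[row] += a.m[row][col] * p.v[col];
    return r;
}

// Uniform scale by s combined with a translation by (tx, ty, tz).
Mat4 scaleAndTranslate(float s, float tx, float ty, float tz) {
    Mat4 a = {{
        {s,    0.0f, 0.0f, tx},
        {0.0f, s,    0.0f, ty},
        {0.0f, 0.0f, s,    tz},
        {0.0f, 0.0f, 0.0f, 1.0f}
    }};
    return a;
}

void demoTransform() {
    Vec4 p = {{1.0f, 1.0f, 1.0f, 1.0f}};
    Vec4 q = transform(scaleAndTranslate(2.0f, 1.0f, 2.0f, 3.0f), p);
    // Each coordinate is scaled by 2 and then shifted by the translation.
    assert(q.v[0] == 3.0f && q.v[1] == 4.0f && q.v[2] == 5.0f && q.v[3] == 1.0f);
}
```

The translation only takes effect because w is 1: the last column of the matrix is multiplied by w and added to each coordinate.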


Header File of the CVector Class


#ifndef _CVECTOR_ // Check whether CVector is already defined
#define _CVECTOR_ // so that the next inclusion skips the definition
#define ZEROVECTOR CVector()
class CVector
{ // Private
float x,y,z,w; // Order list of 4 elements |x|y|z|w|
public:
static char sBuffer[38]; // holds the string of given vector
char* toString(); // Return sBuffer with (x,y,z,w) values
CVector (void); // zero Vector Constructor
CVector (float,float,float); // Constructor
CVector (CVector&); // Copy Vector Constructor
~CVector(); // Destructor
CVector operator+(CVector&); // Addition
CVector operator-(CVector&); // Subtraction
float dotProduct(CVector&);
void operator=(CVector&); // Copy Vector
int operator==(CVector&); // Comparison Vector
CVector operator*(CVector&); // Cross Product
CVector operator*(float); // Scalar Multiplication
float length(); // Magnitude
void normalize();
};
#endif // _CVECTOR_

If you understand this completely, jump to the CVector Memory Organization section.


Preprocessor Directive


C++ Tutorial says:
Preprocessor directives are lines included in the code of our programs that are not program statements but directives for the preprocessor. These lines are always preceded by a hash sign (#). The preprocessor is executed before the actual compilation of code begins, therefore the preprocessor digests all these directives before any code is generated by the statements.

These preprocessor directives extend only across a single line of code. As soon as a newline character is found, the preprocessor directive is considered to end. No semicolon (;) is expected at the end of a preprocessor directive. The only way a preprocessor directive can extend through more than one line is by preceding the newline character at the end of the line by a backslash (\).


Conditional inclusions #ifndef


C++ Tutorial says:
These directives allow to include or discard part of the code of a program if a certain condition is met
#ifndef something
// some mystifying code...
// that will be included if something is not defined

#endif

Preprocessor Macro #define


#define identifier replacement

C++ Tutorial says:
When the preprocessor encounters this directive, it replaces any occurrence of identifier in the rest of the code by replacement. This replacement can be an expression, a statement, a block or simply anything. The preprocessor does not understand C++, it simply replaces any occurrence of identifier by replacement.

Error C2011: 'CVector' :'class' type redefinition

This is a common error among programmers. It happens when the class gets defined twice because its header ends up included more than once in the same source file.

The solution is to always check whether the class has already been defined: if it is not defined (#ifndef), define a guard macro (#define) followed by the whole class, and then close the block (#endif).

This is the reason for this code:
#ifndef _CVECTOR_ // Check whether CVector is already defined
#define _CVECTOR_ // so that the next inclusion skips the definition
// All the header declaration!
// And Class here

#endif // _CVECTOR_
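The effect of the guard can be tried in a single file by pasting the guarded text twice, which is what the preprocessor sees when a header is included twice. This is a sketch with hypothetical names (DEMO_POINT_H, DemoPoint, sumCoordinates). It uses a guard name without a leading underscore because C++ reserves identifiers that start with an underscore followed by an uppercase letter for the implementation.

```cpp
#include <cassert>

// ---- first "inclusion" of the header text ----
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct DemoPoint { float x, y; };
#endif // DEMO_POINT_H

// ---- second "inclusion": skipped, because DEMO_POINT_H is now defined ----
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct DemoPoint { float x, y; };
#endif // DEMO_POINT_H

float sumCoordinates(const DemoPoint& p) { return p.x + p.y; }

void demoGuard() {
    DemoPoint p = {1.5f, 2.5f};
    assert(sumCoordinates(p) == 4.0f);
}
```

Removing the #ifndef/#endif pairs would make the second struct definition visible to the compiler and reproduce the type redefinition error.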

Organization of the program in memory


I found this interesting article from the University of Hawaii that explains how memory is organized:

When a program is loaded into memory, it is organized into three areas of memory, called segments: the text segment, stack segment, and heap segment. The text segment (sometimes also called the code segment) is where the compiled code of the program itself resides. This is the machine language representation of the program steps to be carried out, including all functions making up the program, both user defined and system.

The remaining two areas of system memory is where storage may be allocated by the compiler for data storage. The stack is where memory is allocated for automatic variables within functions. A stack is a Last In First Out (LIFO) storage device where new storage is allocated and deallocated at only one ``end'', called the Top of the stack.

The heap segment provides more stable storage of data for a program; memory allocated in the heap remains in existence for the duration of a program. Therefore, global variables (storage class external), and static variables are allocated on the heap. The memory allocated in the heap area, if initialized to zero at program start, remains zero until the program makes use of it. Thus, the heap area need not contain garbage.

It's important to notice that functions are defined in the code segment and not on the stack (the University of Hawaii diagram might be a little confusing at first).


Memory organization of a C++ class

All variables (attributes) of a class are stored in contiguous memory, ordered by how you declared them (this is the object essence; the compiler may insert padding between members for alignment). Static variables are stored elsewhere (in what the article above calls the heap segment; many toolchains call it the data or BSS segment).

The function (method) definitions of the class are in the code segment.

The location of the object essence depends on where you declare the object: inside a function it lives on the stack; a global or static object is placed in the heap segment described above.


Memory Organization of CVector Class


Remember that on a 32-bit architecture (x86) the size of a float is 32 bits (4 bytes of 8 bits each, 4*8=32).

This means that the total size of our object in memory is 128 bits (4 floats of 4 bytes each = 16 bytes).


sizeof(float); // should return 4
sizeof(CVector); // should return 4*4=16

All this information will be important when we write our vector class functions (methods) in inline ASM, because we will use the XMM registers (128-bit registers that hold 4 floats).
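These size assumptions can be checked by the compiler instead of being trusted. The sketch below uses a hypothetical struct Vec4Layout with the same four float members as CVector (without the static buffer and the methods, which do not add to the object size) and verifies the size and the offset of each component.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in with the same data members as CVector.
struct Vec4Layout {
    float x, y, z, w;
};

// Compile-time checks: the build fails if the layout assumption is wrong.
static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");
static_assert(sizeof(Vec4Layout) == 16, "four floats should fill exactly 128 bits");

void demoLayout() {
    // The components are laid out in declaration order, 4 bytes apart.
    assert(offsetof(Vec4Layout, x) == 0);
    assert(offsetof(Vec4Layout, y) == 4);
    assert(offsetof(Vec4Layout, z) == 8);
    assert(offsetof(Vec4Layout, w) == 12);
}
```

Note that this checks only the size and member offsets; it says nothing about whether a given object starts at a 16-byte aligned address.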

class CVector
{ // Private
float x,y,z,w; // Order list of 4 elements |x|y|z|w|
public:
static char sBuffer[38]; // holds the string of given vector
...

Remember that static variables are stored elsewhere, not inside the object. sBuffer and the toString method are only implemented for debugging and will not be in the final code.


CVector Constructors


CVector::CVector (float xi,float yi,float zi){ // Constructor
x=xi;
y=yi;
z=zi;
w=1.0f; // homogenize the vector (w = 1)
}

This first constructor initializes the inner variables (attributes) of the object. It receives 3 floats (4 bytes [32 bits] each on a 32-bit architecture), 3*4 bytes = 12 bytes (96 bits) in total, and homogenizes the vector (w=1.0f).
CVector::CVector (CVector& a){ // Copy Vector Constructor
x=a.x;
y=a.y;
z=a.z;
w=a.w;
}
As you can see, here we receive the CVector by reference (the & denotes that). This means that we pass only a reference to the object and not the object itself (a 4-byte pointer on a 32-bit architecture instead of the 128-bit object). It's recommended that you pass objects by reference (pointer, address).

CVector Operator definition


As you may recall from your kindergarten courses, the sum of two vectors in 3-dimensional space is the sum of their components taken individually (e.g. (1, 2, 3) + (−2, 0, 4) = (1 − 2, 2 + 0, 3 + 4) = (−1, 2, 7)).

CVector CVector::operator+(CVector& a){ // Addition
return CVector(x +a.x,y +a.y,z +a.z);
}
We pass CVector& a by reference, and we use the constructor we defined earlier to build a new temporary object. Because a 16-byte CVector does not fit in EAX, the object is returned through memory and EAX (the return accumulator) holds only its address. This is inefficient because we fill up the stack with 3 floats when we call the CVector constructor, pop the 3 floats, create a temporary vector and return it.
In the next tutorial we will solve this problem, although the code will lose some of its elegance.


Copy function


void CVector::operator=(CVector& a){ // Copy Vector
x=a.x;
y=a.y;
z=a.z;
w=a.w;
}

As you can see, here we are also passing CVector& a by reference, but we do not return a new copy of CVector, because we want to change the inner values (attributes) of the object itself.
In C/C++ code like:
v=v+w;

our program would first call the addition function, which returns a new CVector whose address will be stored in the EAX accumulator register, and then call the copy function to copy each component to the desired object. This is very inefficient; as we will see later, all this can be done without creating a temporary object and without filling up the stack.

v+=w;
This statement would do the same without calling the copy function and without creating a new CVector (it requires an operator+= overload, which the header above does not declare yet).
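The header shown earlier does not declare operator+=, so here is one possible sketch of it. Vec4Acc is a hypothetical stand-in for CVector with public members; it adds the x, y and z components in place and leaves w untouched, mirroring how the other operators in this post ignore w.

```cpp
#include <cassert>

// Hypothetical stand-in for CVector with public members.
struct Vec4Acc {
    float x, y, z, w;

    // Adds a to this vector in place: no temporary object, no copy afterwards.
    Vec4Acc& operator+=(const Vec4Acc& a) {
        x += a.x;
        y += a.y;
        z += a.z;
        return *this;  // returning a reference allows chaining, e.g. (v += w) += u
    }
};

void demoPlusEquals() {
    Vec4Acc v = {1.0f, 2.0f, 3.0f, 1.0f};
    Vec4Acc w = {-2.0f, 0.0f, 4.0f, 1.0f};
    v += w;
    assert(v.x == -1.0f && v.y == 2.0f && v.z == 7.0f && v.w == 1.0f);
}
```

Taking the argument by const reference also lets the operator accept temporaries, which a plain CVector& parameter does not allow in standard C++.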
CVector CVector::operator-(CVector& a){ // Subtraction
return CVector(x -a.x,y -a.y,z -a.z);
}

Subtraction is defined just like addition (we only subtract).


float CVector::dotProduct(CVector& a){
return x*a.x+y*a.y+z*a.z;
}

The dot product of two vectors a = [a1, a2, ... , an] and b = [b1, b2, ... , bn] is defined as:

a · b = a1*b1 + a2*b2 + ... + an*bn
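A dot product multiplies matching components and sums the products. As a sketch for vectors of any length n, the hypothetical helper dot below uses std::inner_product from the standard library; it is an illustration, not the method used by the CVector class.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sums a[i] * b[i] over all components; both vectors must have the same length.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0f);
}
```

For example, dot of (1, 2, 3) and (4, 5, 6) is 1*4 + 2*5 + 3*6 = 32, and the dot product of two perpendicular vectors such as (1, 0, 0) and (0, 1, 0) is 0.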



CVector Class definition C++ File


This code is deprecated. I will post the new one soon.


#include "CVector.h"
#include <math.h>
#include <stdio.h>

CVector::CVector (float xi,float yi,float zi){ // Constructor
x=xi;
y=yi;
z=zi;
w=1.0f; // homogenize the vector (w = 1)
}
CVector::CVector (CVector& a){ // Copy Vector Constructor
x=a.x;
y=a.y;
z=a.z;
w=a.w;
}
CVector::~CVector(){} // Destructor
CVector CVector::operator+(CVector& a){ // Addition
return CVector(x +a.x,y +a.y,z +a.z);
}
CVector CVector::operator-(CVector& a){ // Subtraction
return CVector(x -a.x,y -a.y,z -a.z);
}
float CVector::dotProduct(CVector& a){
return x*a.x+y*a.y+z*a.z;
}
char CVector::sBuffer[38];// definition outside class declaration
char* CVector::toString(){// 8 chars per float*4+5+1=38 bytes; assumes each value prints in at most 8 characters
sprintf_s(sBuffer,38,"(%f,%f,%f,%f)",x,y,z,w);
return sBuffer;
}
int CVector::operator==(CVector& a){ // Comparison Vector

return (x==a.x && y==a.y && z==a.z);
}

CVector::CVector (void){
x=0.0f;
y=0.0f;
z=0.0f;
w=1.0f;
}
void CVector::operator=(CVector& a){ // Copy Vector
x=a.x;
y=a.y;
z=a.z;
w=a.w;
}
CVector CVector::operator*(CVector& a){ // Cross Product
return CVector(y*a.z-z*a.y,z*a.x-x*a.z,x*a.y-a.x*y);
}
CVector CVector::operator*(float a){ // Scalar Multiplication
return CVector(x*a,y*a,z*a);
}
float CVector::length(){ // Magnitude
return sqrt(x*x+y*y+z*z);
}
void CVector::normalize()
{
float len = length();
x = x/len;
y = y/len;
z = z/len;
w=1.0f;
}

Main Test File


#include "CVector.h"
#include "CVector.h"
#include <stdio.h>
#include <string>

// Each error code is a distinct positive bit so that codes can be combined with |
#define TEST_NORMALIZE_ERROR 0x01
#define TEST_CROSSPRODUCT_ERROR 0x02
#define TEST_SCALAR_ERROR 0x04
#define TEST_SUBTRACTION_ERROR 0x08
#define TEST_ADDITION_ERROR 0x10
#define TEST_COPY_ERROR 0x20

int main()
{
CVector i,j,k;
CVector a,b,c;
i = CVector(5.0f,0.0f,0.0f);
j = CVector(0.0f,6.0f,0.0f);
printf("CVector Example\n");
// Unit test.
i.normalize();
j.normalize();
k = i*j;
if(i.length() != 1.0f){
printf("Failed to Normalize i");
getchar();
return TEST_NORMALIZE_ERROR;
}
if(j.length() != 1.0f){
printf("Failed to Normalize j");
getchar();
return TEST_NORMALIZE_ERROR;
}
if(k.length() != 1.0f){
printf("Failed k Should be normalized");
getchar();
return TEST_CROSSPRODUCT_ERROR;
}
if(!(k == CVector(0.0f,0.0f,1.0f))){
printf("Failed Cross Product, k=i*j");
getchar();
return TEST_CROSSPRODUCT_ERROR;
}
if(!(((i*k)*5.0f)==j*(-5.0f))){
printf("Failed Scalar Multiplication or Cross Product");
getchar();
return TEST_CROSSPRODUCT_ERROR|TEST_SCALAR_ERROR;
}
if(!(c-c==ZEROVECTOR)){
printf("Failed to Subtract");
getchar();
return TEST_SUBTRACTION_ERROR;
}
c = CVector(7.0f,2.0f,-4.0f);
b = c;
if(!(b==c)){
printf("Failed to Copy Vector");
getchar();
return TEST_COPY_ERROR;
}
c = CVector(7.0f,2.0f,-4.0f);
if(a=c+c,b=c*2.0f,!(a==b)){
//if(!(CVector(4.0f,2.0f,-4.0f)+CVector(4.0f,2.0f,-4.0f)==(CVector(4.0f,2.0f,-4.0f)*2.0f))){
printf("Failed Addition or Scalar Multiplication");
getchar();
return TEST_ADDITION_ERROR|TEST_SCALAR_ERROR;
}
printf("All tests passed. \n");
getchar();
}
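The tests above compare floats with == and !=, which works for these hand-picked values but can fail for vectors whose normalized length is off by a rounding error. A tolerance-based comparison is a common alternative. The sketch below is an assumption, not part of the original test file: nearlyEqual, normalizedLength and the tolerance 1e-5 are hypothetical choices.

```cpp
#include <cassert>
#include <cmath>

// True when a and b differ by at most eps (absolute tolerance).
bool nearlyEqual(float a, float b, float eps = 1e-5f) {
    return std::fabs(a - b) <= eps;
}

// Normalizes (x, y, z) and returns the length of the result,
// which should be 1 up to rounding error. The input must not be the zero vector.
float normalizedLength(float x, float y, float z) {
    float len = std::sqrt(x * x + y * y + z * z);
    assert(len > 0.0f);
    x /= len;
    y /= len;
    z /= len;
    return std::sqrt(x * x + y * y + z * z);
}

void demoNearlyEqual() {
    assert(nearlyEqual(normalizedLength(3.0f, 4.0f, 0.0f), 1.0f));
    assert(!nearlyEqual(1.0f, 1.1f));
}
```

An absolute tolerance is enough here because the compared values are close to 1; for values of very different magnitudes a relative tolerance would be the safer choice.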


Download project File for Visual Studio 2008

Tuesday, July 13, 2010

Hello world in Inline ASM using Visual Studio C++

Summary:


This post is based on a text I found on a Microsoft page ( http://msdn.microsoft.com/en-us/library/y8b57x4b(v=VS.71).aspx ), although the original code does not work.
The original Microsoft code:

// InlineAssembler_Calling_C_Functions_in_Inline_Assembly.cpp

#include <stdio.h>
// definition of constant strings
char format[] = "%s %s\n";
char hello[] = "Hello";
char world[] = "earth";
int main( void )
{

__asm
// start ASM code here
{
mov eax, offset world // move the address of world to eax
push eax // push the address of world onto the stack
mov eax, offset hello // eax = &(hello[0])
push eax // push the address of hello onto the stack
mov eax, offset format // eax = address of format
push eax // push the address of format onto the stack
call printf // here is the problem: printf is not defined
// at the time of compilation.
// It will be defined at run time.
// clean up the stack so that main can exit cleanly
// use the unused register ebx to do the cleanup
pop ebx
pop ebx
pop ebx
}
// end of ASM code
}

As you can see, this does not work because it makes a call to a dynamically linked function (printf) whose address is not known until run time. There are multiple solutions to this problem, which I will describe briefly.

I hope you are already familiar with C and C++; if not, you can go to (http://www.cplusplus.com/doc/tutorial/) and learn all you need to know.

__asm keyword



__asm { /*asm code here*/ }



The __asm keyword is a Microsoft-specific compiler keyword (not a preprocessor directive) that tells the Visual C++ compiler that the code inside the braces {} is assembly code.

What Microsoft documentation says about the __asm keyword(http://msdn.microsoft.com/en-us/library/45yd4tzz(v=VS.80).aspx):

The __asm keyword invokes the inline assembler and can appear wherever a C or C++ statement is legal. It cannot appear by itself. It must be followed by an assembly instruction, a group of instructions enclosed in braces, or, at the very least, an empty pair of braces. The term "__asm block" here refers to any instruction or group of instructions, whether or not in braces.

The syntax is that of MASM (Microsoft Assembler):

instruction dest, src

It supports the Intel instruction set.
You may download the instruction set reference from the official Intel web site.

And the software development manual:
AX, is the 16 bit lower part of EAX.
AH, is the high 8 bit part of AX.
AL, is the low 8 bit of AX.

These registers may be used to store anything: pointers, data, the results of arithmetic operations, etc.

If you wish to have more information about the general-purpose registers, please check the software development manual (253665) under 3.4.1 General-Purpose Registers (Pp. 107).






Instruction MOV—Move:


mov destination, source

As we can read in Intel documentation:
"Copies the second operand (source operand) to the first operand (destination
operand). The source operand can be an immediate value, general-purpose register,
segment register, or memory location; the destination register can be a general-
purpose register, segment register, or memory location. Both operands must be the
same size, which can be a byte, a word, a doubleword, or a quadword".

meaning in c syntax that:
destination = source;

An important restriction is that destination and source must be the same size (e.g. both must be 32 bits if the EAX register is used). Note also that mov cannot copy directly from one memory location to another.

Operator offset:


offset expression

As we can read on Microsoft documentation:
Returns the offset of expression.

... This is not very helpful, but what it means is that it returns the address of the given expression (i.e. a pointer to it).

The equivalent in C, would be
&expression;

Instruction LEA—Load Effective Address:


lea destination, source

As we can read in Intel documentation:

"Computes the effective address of the second operand (the source operand) and
stores it in the first operand (destination operand). The source operand is a memory
address (offset part) specified with one of the processors addressing modes; the
destination operand is a general-purpose register. The address-size and operand-size
attributes affect the action performed by this instruction".

the C equivalent code would be:
dest = &src;

Where dest must be a general-purpose register(ex: EAX)
and src is a memory address(ex: a variable).

Instruction PUSH—Push Word, Doubleword or Quadword Onto the Stack


push source

As we can read in Intel documentation:

"Decrements the stack pointer and then stores the source operand on the top of the
stack. The address-size attribute of the stack segment determines the stack pointer
size (16, 32 or 64 bits). The operand-size attribute of the current code segment
determines the amount the stack pointer is decremented (2, 4 or 8 bytes).
In non-64-bit modes: if the address-size and operand-size attributes are 32, the
32-bit ESP register (stack pointer) is decremented by 4. If both attributes are 16, the
16-bit SP register (stack pointer) is decremented by 2".

As you can see, each time we push a 32-bit value onto the stack, ESP (the stack pointer) is decremented by 4.
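The way push and pop move the stack pointer can be imitated with a small simulated stack. This is only a teaching sketch in portable C++: SimStack is a hypothetical type, its esp member is an index into a byte array rather than a real address, and it models only 32-bit pushes and pops.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a downward-growing stack of 32-bit values.
struct SimStack {
    unsigned char memory[64];
    std::uint32_t esp;  // index of the current top; starts past the end (empty stack)

    SimStack() : esp(static_cast<std::uint32_t>(sizeof(memory))) {
        std::memset(memory, 0, sizeof(memory));
    }

    // push: first decrement esp by 4, then store the value at the new top.
    void push(std::uint32_t value) {
        assert(esp >= 4);
        esp -= 4;
        std::memcpy(&memory[esp], &value, 4);
    }

    // pop: read the value at the top, then increment esp by 4.
    std::uint32_t pop() {
        assert(esp + 4 <= sizeof(memory));
        std::uint32_t value;
        std::memcpy(&value, &memory[esp], 4);
        esp += 4;
        return value;
    }
};
```

Three pushes lower esp by 12, and three pops bring it back to where it started, which is why the examples in this post pop once for every argument they pushed.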

Instruction CALL—Call Procedure

call targetOperand

Intel documentation states that:
Saves procedure linking information on the stack and branches to the called
procedure specified using the target operand. The target operand specifies the address of
the first instruction in the called procedure. The operand can be an immediate value,
a general-purpose register, or a memory location.
This instruction can be used to execute four types of calls:
• Near Call — A call to a procedure in the current code segment (the segment
currently pointed to by the CS register), sometimes referred to as an
intra-segment call.
• Far Call — A call to a procedure located in a different segment than the current
code segment, sometimes referred to as an inter-segment call.
• Inter-privilege-level far call — A far call to a procedure in a segment at a
different privilege level than that of the currently executing program or
procedure.
• Task switch — A call to a procedure located in a different task.


This means that call will branch(jump) to the address given by targetOperand.

Understanding the Code:


With all this information it's easier for us to understand the next statement:
mov eax, offset world
What this does is copy the address (a 32-bit address) of world to EAX (a 32-bit register).

At first this might be a little confusing, because someone used to C/C++ pointer arithmetic knows that the name of a char array evaluates to the address where the array starts (e.g. printf("%p", (void*)world); would output the address of the beginning of the array).

But MASM is a little different. The expression world refers to the letter 'e' (the contents of the first element) instead of the address, as it would in C/C++.
Remember that:
char world[] = "earth";

so the code:
mov world,'M'

Will change the first letter of the array to M, resulting in Marth.
the code:
mov world+2,'U'

would change the third letter of the array to U, resulting in MaUth.
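For comparison, the same two writes can be expressed in plain C++, where the array name stands for the address and the element has to be selected explicitly. The function patchWorld below is a hypothetical helper written only for this illustration.

```cpp
#include <cassert>
#include <string>

// Applies the two writes from the MASM examples to a local copy of "earth".
std::string patchWorld() {
    char world[] = "earth";
    world[0] = 'M';      // same effect as: mov world,'M'
    *(world + 2) = 'U';  // same effect as: mov world+2,'U'
    return std::string(world);
}

void demoPatchWorld() {
    assert(patchWorld() == "MaUth");
}
```

In C++ the expression world + 2 is an address that must be dereferenced, while in MASM the same text already names the byte stored at that address.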

Since this might be a little confusing, I prefer not to use the MASM operators,
and instead do everything with Intel ASM instructions.

With this information we may change the code to make it more understandable.
The equivalent code of
mov eax, offset world

Would be:
lea eax,world

Memory State


The following diagrams are intended to explain the state of the memory, SP, EAX and EBX after each instruction of the inline asm example.

Some simplifications were made: in the diagrams the SP is always decremented by 1 (it should be decremented by 4), and the addresses should be 32 bits wide, but to save space the memory only shows the 8 least significant bits of the original address.

The image below shows the hypothetical state of the registers and memory before the __asm statement.

As you can see, the stack is empty, and the stack pointer holds the hex address 00437ff5.

In memory we have the 3 char arrays that we created.
Below each letter is its address (it should be a 32-bit address).




Don't forget:
char format[] = "%s %s\n";
char hello[] = "Hello";
char world[] = "earth";

Our abstract stack representation would be



Empty Stack






After
mov eax, offset world
or equivalent
lea eax,world

the state would change to:

We only change EAX so that it holds the address where world starts.
The XXXXXX represents any hex number.








After
push eax

First Push

Now the top of the stack holds the address where earth starts.

The SP is decremented.










After
mov eax, offset hello
or equivalent
lea eax,hello
push eax



We load EAX with the address and then push it onto the stack,
decrementing the stack pointer by one (it should be 4 on a 32-bit architecture).







After
mov eax, offset format
or equivalent
lea eax,format
push eax


We save the address of format in EAX, and then we push EAX onto the stack.











Image below shows the relation between the stack and the memory location:












And finally our abstract stack:

where below each stack is the instruction whose execution changed it.







Microsoft example problem

So what's the problem with:
call printf

It should branch to the address where printf is defined.
But we must consider that not all libraries are statically linked (the library code is copied into the user's exe, so the location of each definition is known when the program is built); they can also be dynamically linked (the definitions live elsewhere in memory and the functions are shared across the running processes).
The next table illustrates the difference between static and dynamic linking.

Static Link | Dynamic Link
More memory usage at run time, and a bigger exe. | Less memory usage, and a smaller exe.
The functions are defined in the same code segment. | The functions are defined elsewhere.
Each function is defined multiple times, once for each process using it (private usage). | The functions are defined only once; processes share the definition.
Function addresses are fixed when the program is built (it is well known where they are). | Function addresses are resolved at run time (the OS links the function definitions).


With this in mind, printf usually lives in a dynamic link library, because many processes make use of it. It would be wasteful to define it again each time a new process wishes to use it (static link).

Because the address of printf is unknown at compilation, call printf causes a memory access violation when it tries to call __imp__printf directly.

To make this work we must use __imp__printf as a pointer that gives us the address where printf is defined.
For this we can use the PTR operator:

type PTR expression

Where type is the size of the data that we are pointing to.
The size can be word, dword, qword, etc.
And expression is the address from which we will get the data.

C/C++ equivalent:
*(type *)expression;
where type is a cast (it changes the type the expression points to, e.g. char, int, short).
The following code will make a successful call to printf.
call dword ptr printf


Another solution to this problem is to call printf indirectly, by copying the address stored in __imp__printf into a general-purpose register and then making the call through the register.

mov eax,printf
call eax
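In C++ terms, both solutions amount to calling through a function pointer: a variable in memory holds the address of printf, and the call reads that address first. The sketch below is an analogy, not the actual import mechanism; PrintfPtr, importSlot and callThroughSlot are hypothetical names, and it assumes the compiler lets us take the address of std::printf, which common compilers do.

```cpp
#include <cassert>
#include <cstdio>

// Type of a pointer to a printf-like function.
typedef int (*PrintfPtr)(const char*, ...);

// Plays the role of the __imp__printf slot: a variable that holds printf's address.
PrintfPtr importSlot = &std::printf;

// Reads the address from the slot and calls it, like `call dword ptr printf`
// or `mov eax, printf` followed by `call eax`.
int callThroughSlot() {
    assert(importSlot != nullptr);
    return importSlot("%s %s\n", "Hello", "earth");
}
```

The return value is whatever printf returns, that is, the number of characters written.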


Calling Printf


Under the C calling convention used here, a function obtains its parameters (i.e. what the function receives) from the stack.
Because we filled the stack with our data, the result of calling printf is:



Printing a perfect "hello earth" on the screen.





The only thing left to do is clean up the stack by popping the three arguments we pushed.
pop eax
pop eax
pop eax





#include <stdio.h>

int main()
{

char hi[6] = "Hello";
char earth[] = "World";
char text[] = "%s %s";
__asm

{
// Remember Inst dest,src
lea eax,earth // eax = address of earth
push eax // put eax at the top of the stack
lea eax,hi // eax = address of hi
push eax
lea eax,text
push eax
call DWORD ptr printf
// or the indirect call
// mov eax, printf
// call eax
pop ebx // clean up the stack
pop ebx
pop ebx
}

getchar();// wait for any key
}


Download Visual Studio 2008 Project Files

Efficient Vector Library with ASM and C++

This text will show how to create a very efficient math vector library for 3D graphics using the SIMD math instructions of the Intel CPU and Visual Studio C++.

First I will show an introductory example of how to use inline Intel ASM in Visual C++; later on I will explain some simple vector math.

Then we will code a simple vector class in C++, and later on I will show you how to transform each method of the class into its counterpart in ASM using the Streaming SIMD Extensions (SSE) of Intel. I will briefly explain what an XMM register is and each ASM instruction that we will use.