Over the last two weeks I worked on decompilation and optimization.
For the decompilation, I made some fixes to the closure decompiler.
The fixes mainly concern scope. I now have a clear view of how variables are accessed.
This is not something you can easily work out by reading the implementation, and I think we should concentrate on moving in this direction.
So there are 3 different ways to access a variable (I am talking only about temps and inst vars).
- "Simple access": This is when we access a variable that is not captured.
The generated bytecodes are #pushTemp: offset, #storeTemp: offset, #pushInstVar: offset or #storeInstVar: offset.
| a |
a := 3.
[ | b | b := 5 factorial ]
- "Captured access": This is when we access a temp that is defined in the same scope and is captured by a block.
| a b |
a := 2. "A captured access"
b := 1. "Another captured access"
[a + [b] value ] value.
Here the compiler needs to access the ClosureEnvironment stored in the MethodContext:
- "Far captured access": This is when we want to access a variable that is in a parent scope.
| a b |
a := 2.
b := 1.
[| c |
c := 3.
a + [b + c] value ] value. "Two different kinds of far captured access"
The right closure needs to be pushed before we can access the temp.
For the inst var a:
pushInstVar: 1 "since the receiver of the block is the closure environment"
send: privGetInstVar: "Closure environments are chained by storing them in the first slot of the parent closure environment"
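The chain walk described above can be sketched with a small Python model. This is only an illustration, not the NewCompiler implementation: the class ClosureEnvironment below is a hypothetical stand-in, the helper far_captured_read and its parameters are invented for the example, and it assumes that slot 1 of an environment links to the next environment in the chain while the following slots hold the captured temps.

```python
class ClosureEnvironment:
    """Hypothetical model of a closure environment (not the NewCompiler class)."""

    def __init__(self, link, *captured):
        # Assumption: slot 1 links to the next environment in the chain,
        # the remaining slots hold the captured temps.
        self.slots = [link, *captured]

    def priv_get_inst_var(self, index):
        # 1-based, mirroring the image-side message #privGetInstVar:
        return self.slots[index - 1]


def far_captured_read(env, hops, index):
    """Follow slot 1 `hops` times, then read slot `index` (1-based)."""
    for _ in range(hops):
        env = env.priv_get_inst_var(1)
    return env.priv_get_inst_var(index)


# One environment holds a := 2 and b := 1, a second one holds c := 3.
outer_env = ClosureEnvironment(None, 2, 1)
inner_env = ClosureEnvironment(outer_env, 3)

c = far_captured_read(inner_env, hops=0, index=2)  # captured access, no hop
a = far_captured_read(inner_env, hops=1, index=2)  # far captured access, one hop
b = far_captured_read(inner_env, hops=1, index=3)  # far captured access, one hop
print(a + b + c)  # prints 6
```

With hops=0 the read is a plain "captured access"; every extra hop corresponds to one more pushInstVar: 1 before the final privGetInstVar: send.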
Now, to optimize the NewCompiler, I have implemented 5 bytecodes.
- The first one pushes or stores the closure environment:
will be replaced by:
This bytecode is needed when you create the block. (Remember that the receiver of the block is the closure environment.)
I have also implemented the bytecodes for the "captured access" and the "far captured access":
is replaced by:
pushNestedClosure: 0 offset: 0
pushNestedClosure: 1 offset: 1
If you get confused by the values of the offsets, that is normal.
In the image, and more particularly for the message #privGetInstVar:, instance variables are counted starting from 1.
But in the VM, offsets are counted starting from 0.
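The two numbering conventions differ by exactly one, which the following Python sketch makes explicit. The function names and the sample slot list are hypothetical and exist only for this illustration; they are not part of the image or the VM.

```python
def image_index_to_vm_offset(image_index):
    """Convert a 1-based image-side index to a 0-based VM offset."""
    if image_index < 1:
        raise ValueError("image-side indices start at 1")
    return image_index - 1


def vm_offset_to_image_index(vm_offset):
    """Convert a 0-based VM offset to a 1-based image-side index."""
    if vm_offset < 0:
        raise ValueError("VM offsets start at 0")
    return vm_offset + 1


# Hypothetical slot layout, used only to show both numberings side by side.
slots = ["link", "a", "b"]
for image_index in (1, 2, 3):
    vm_offset = image_index_to_vm_offset(image_index)
    print(image_index, vm_offset, slots[vm_offset])
```

So an index of 1 passed to #privGetInstVar: and an offset of 0 in a bytecode refer to the same slot.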
The last bytecode concerns closure creation:
new closure: 1
I made a small benchmark to measure the performance:
[| a | a := 0. #(1 2 3 4 5 6 7 8 9 10) do: [:each | a := each + a].a halt] bench.
The result is:
For the new compiler:
'405 558 per second.'.
For the old closure and old compiler:
'567 088 per second.'.
Full closure performance is now fairly close to the non-full closure performance: in this benchmark the new compiler reaches about 72% of the old throughput (405 558 vs. 567 088 runs per second).
More benchmarks are needed to confirm this result.
There are also some other optimizations inside the bytecodes and primitives that would make closures faster.
In the end, the new compiler could have performance similar to the old one.