Flash attention?

#3
by edmond - opened

It says flash attention 2 is not available for this model.
But I was thinking maybe I could use FA2 for only the Gemma part. Do you think this is possible?

Google org

@edmond You can check out TGI's flash PaliGemma implementation here. It is implemented for the vision head, but it doesn't have as much effect there as flash attention does in the decoder.
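For the original question, a minimal sketch of restricting FA2 to the decoder is below. It assumes a recent transformers release where composite models accept a per-sub-config dict for attn_implementation; the checkpoint name and the dict keys are illustrative, and older versions only accept a single string, so check your installed version if the dict form is rejected.

```python
# Sketch: apply FlashAttention-2 only to the Gemma decoder of PaliGemma,
# keeping SDPA for the vision tower. Assumes a transformers version that
# accepts a dict attn_implementation for composite models (older versions
# only take a single string and will raise on the dict form).
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # illustrative checkpoint

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # FA2 requires fp16/bf16
    attn_implementation={
        "text_config": "flash_attention_2",  # Gemma decoder: FA2
        "vision_config": "sdpa",             # vision head: keep SDPA
    },
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```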

edmond changed discussion status to closed
