Address Scalings in Dolomite Conversion #8

Draft
wants to merge 1 commit into
base: main


fabianlim
Collaborator

Dolomite has the scalings m_emb, m_residual, and m_width, which are not part of the standard HF architecture.

So when performing export_to_huggingface_llama and import_from_huggingface_llama, we need to account for this scaling:

  • export_to_huggingface_llama: done for m_emb and m_residual, but not for m_width
  • import_from_huggingface_llama: not done

The key idea is to absorb each constant into specific parts of the weights. The difficulty with m_width is that the lm_head is tied to the embedding weights, so scaling one also scales the other.
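
A minimal sketch of the absorption idea on a toy model, not the actual Dolomite or HF code. It assumes m_emb multiplies the embedding output, m_residual multiplies a block's output before the residual add, and the logits are divided by m_width. That placement, the toy block, the numeric values, and every name other than the three multipliers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 11, 8
# Illustrative values only; not taken from any real config.
m_emb, m_residual, m_width = 12.0, 0.22, 4.0

W_emb = rng.normal(size=(vocab, d))  # embedding table (also the tied lm_head)
W_out = rng.normal(size=(d, d))      # stand-in for a block's output projection
tokens = np.array([1, 4, 7])


def scaled_forward(W_emb, W_out, W_head):
    """Toy forward pass with explicit multipliers (assumed placement)."""
    h = W_emb[tokens] * m_emb
    h = h + m_residual * (np.tanh(h) @ W_out)
    return (h @ W_head.T) / m_width


def plain_forward(W_emb, W_out, W_head):
    """Same toy model without multipliers, as a standard arch would run it."""
    h = W_emb[tokens]
    h = h + np.tanh(h) @ W_out
    return h @ W_head.T


# Absorb each constant into the weight it acts on.
W_emb_hf = m_emb * W_emb
W_out_hf = m_residual * W_out
W_head_hf = W_emb / m_width  # requires a separate (untied) lm_head

reference = scaled_forward(W_emb, W_out, W_emb)  # tied head, explicit scalings

# Untied head: all three constants are absorbed and the logits match.
untied_matches = np.allclose(reference, plain_forward(W_emb_hf, W_out_hf, W_head_hf))

# Tied head: lm_head reuses the scaled embedding, so the logits are off
# by a factor of m_emb * m_width.
tied_matches = np.allclose(reference, plain_forward(W_emb_hf, W_out_hf, W_emb_hf))

print(untied_matches, tied_matches)
```

In this toy setup a single tied matrix would have to equal both m_emb * W and W / m_width, which only works when m_emb * m_width == 1. Otherwise the head has to be untied or the 1/m_width factor has to go somewhere else.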

Here is a demo of how these match:

image

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@fabianlim fabianlim marked this pull request as draft July 29, 2024 15:31
@fabianlim fabianlim assigned aldopareja and unassigned aldopareja Jul 29, 2024