
evo and stripedhyena crash the server when doing a simple inference #21

Open
sun-qibo opened this issue Aug 28, 2024 · 1 comment

sun-qibo commented Aug 28, 2024

I observed similar behavior in evo and stripedhyena: the model loads successfully, but the server crashes as soon as I try a simple inference.

Compute resource: Databricks Azure cluster with an NVIDIA A100

Packages: flash-fft-conv and flash-attention are installed correctly, so the model loads without problems.

Code for StripedHyena:

import sys

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "togethercomputer/StripedHyena-Hessian-7B"
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    model_max_length=sys.maxsize,
    trust_remote_code=True,
)

tokenizer.pad_token = tokenizer.eos_token

config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
config.use_cache = True

device = torch.device("cuda")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
).to(device)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

model.generate(input_ids)

Code for Evo:

import torch
from transformers import AutoConfig, AutoModelForCausalLM
from stripedhyena.tokenizer import CharLevelTokenizer

device = torch.device("cuda")
tokenizer = CharLevelTokenizer(512)

hf_model_name = 'togethercomputer/evo-1-131k-base'


model_config = AutoConfig.from_pretrained(
    hf_model_name,
    trust_remote_code=True,
    revision='1.1_fix',
)

model_config.use_cache = True

model = AutoModelForCausalLM.from_pretrained(
        hf_model_name,
        config=model_config,
        trust_remote_code=True,
        revision='1.1_fix',
    )

sequence = 'ACGT'

input_ids = torch.tensor(
    tokenizer.tokenize(sequence),
    dtype=torch.int,
).to(device).unsqueeze(0)

with torch.no_grad():
    logits, _ = model(input_ids) 

Both scripts crash the Python kernel at the last line.

I was not sure whether the issue was caused by the configuration of my Databricks resources, so I also tried other models of the same size (e.g. "HuggingFaceH4/zephyr-7b-beta"), and inference worked without problems. I do not know whether there is some other incompatibility between stripedhyena and Databricks, though.
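One thing worth ruling out (this is my assumption, not something confirmed in the report): a hard kernel crash with no Python traceback is consistent with a GPU out-of-memory abort. A quick back-of-the-envelope check shows a 7B-parameter model loaded at the default float32 already strains a 40 GB A100 before activations and the generation cache are counted:

```python
# Rough weight-memory estimate (assumption: the crash could be GPU OOM).
# A dense model's weights take num_params * bytes_per_param bytes.

def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB for a dense model."""
    return num_params * bytes_per_param / 2**30

seven_b = 7e9
print(f"float32: {weight_memory_gib(seven_b, 4):.1f} GiB")  # ~26.1 GiB
print(f"float16: {weight_memory_gib(seven_b, 2):.1f} GiB")  # ~13.0 GiB
```

If passing `torch_dtype=torch.float16` to `from_pretrained` makes the crash go away, that would point to memory pressure rather than a stripedhyena/Databricks incompatibility.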

Has anyone else encountered this problem?


FumaNet commented Jan 5, 2025

Sorry for the late reply. I get a KeyError: 'stripedhyena' even before inference can start. Could you share which versions of the libraries you are using?
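To make comparing environments easier, here is a small sketch for collecting the versions in question. The package list is an assumption on my part (torch, transformers, and flash_attn seem to be the relevant ones here); imports that fail are reported rather than raised, since either optional package may be absent:

```python
# Report installed versions of the libraries relevant to this issue.
# Packages that cannot be imported are noted instead of raising.
import importlib


def report_versions(packages=("torch", "transformers", "flash_attn")):
    """Return a dict mapping each package name to its version string,
    "unknown" if it exposes no __version__, or the import error text."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError as exc:
            versions[name] = f"not importable: {exc}"
    return versions


print(report_versions())
```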
