I have the following code:

    nlp = spacy.load('en_core_web_sm')

    class CleanTextTransformer(TransformerMixin):
        def transform(self, X, **transform_params):

    lemmas.append(tok.lemma_.lower().strip() if tok.lemma_ != "-PRON-" else tok.lower_)
    lemma_ for tok in tokens if tok not in SYMBOLS]

    self.encoder = MultiLabelBinarizer(*args, **kwargs)

    def represent(rd, ed, number, category, text):
        doc_train = ]
        doc_test = ]
        for row in range(len(doc_train)):
        doc_train = transformed_r
        doc_test = transformed_e
        print("categorical columns encoded using MultiLabelBinarizer()")
        print("numbers scaled using StandardScaler()")
        doc_train = ansform(doc_train)
        doc_test = ansform(doc_test)
        vec = TfidfVectorizer(tokenizer=tokenizeText, ngram_range=(1, 1))
        doc_train = vec.transform(doc_train).todense()
        doc_test = vec.transform(doc_test).todense()
        print("preprocessing completed successfully")

    def train_classifier(train_docs, classAxis):
        clf = OneVsRestClassifier(LogisticRegression(solver='saga'))
        X = ]) for i in range(1, len(train_docs))]

    df = pd.DataFrame(pd.read_csv("testdata.csv", header=0))
    test_data = pd.DataFrame(pd.read_csv("test.csv", header=0))
    train, test = represent(df, test_data,, , )

As you can see, there are text values, number values and categorical values. My code first splits up the categorical values (which are comma-delimited) before running them through MultiLabelBinarizer(). I then process the text using the spaCy settings found in this tutorial. I make sure to apply the same transformations to the test data, so there can be no inconsistency there. Finally, I convert everything to lists in the train_classifier function, which supposedly should help.

In the line classifier = clf.fit(list(X), y), I get the following error:

    Traceback (most recent call last):
      File "C:\Users\User\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\191.7141.48\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile

I have already seen this, this and this question, but none of the suggestions seemed to fix my problem (so I have reverted them).

From the code you showed us, the only thing we can tell is that you are trying to create an array from a list that isn't shaped like a multi-dimensional array. A list whose rows have different lengths will yield this error message, because the shape of the input list isn't a (generalised) "box" that can be turned into a multidimensional array. So probably UnFilteredDuringExSummaryOfMeansArray contains sequences of different lengths.

Edit: Another possible cause for this error message is trying to use a string as an element in an array of type float. That is what you are trying according to your edit. If you really want to have a NumPy array containing both strings and floats, you could use the dtype object, which enables the array to hold arbitrary Python objects. Without knowing what your code is meant to accomplish, I can't judge if this is what you want.

The error

    ValueError: setting an array element with a sequence.

means exactly what it says: you're trying to cram a sequence of numbers into a single number slot. It can be thrown under various circumstances.

1. When you pass a Python tuple or list to be interpreted as a NumPy array element:

    numpy.array() #Fail, can't convert a tuple into a numpy
    an() #Fail, can't convert a tuple into a numpy
    numpy.array() #Fail, can't convert a list into a numpy

2. By trying to cram a NumPy array of length > 1 into a NumPy array element:

    X = np.array() #Fail, can't convert the numpy array to fit

A NumPy array is being created, and NumPy doesn't know how to cram multivalued tuples or arrays into single element slots. It expects whatever you give it to evaluate to a single number. If it doesn't, NumPy responds that it doesn't know how to set an array element with a sequence.
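To make the "box" point concrete, here is a minimal sketch. The helper try_array and the sample values are my own illustration, not code from the question. I pass dtype=float explicitly because, as far as I know, some NumPy releases only warn about ragged input when no dtype is given, while newer ones raise.

```python
import numpy as np


def try_array(data, dtype):
    """Return the array, or the ValueError text if NumPy rejects the input."""
    try:
        return np.array(data, dtype=dtype)
    except ValueError as exc:
        return f"ValueError: {exc}"


# Rows of different lengths: not a "box", so NumPy refuses to build a float array.
ragged = try_array([[1, 2], [2, 3, 4]], float)

# Same data padded to equal row lengths: a proper 2 x 3 array.
boxed = try_array([[1, 2, 0], [2, 3, 4]], float)

# A string in a float array also raises a ValueError
# (the exact wording of the message may differ between NumPy versions).
mixed_float = try_array([1.2, "abc"], float)

# dtype=object lets the array hold arbitrary Python objects.
mixed_object = try_array([1.2, "abc"], object)

for name, result in [("ragged", ragged), ("boxed", boxed),
                     ("mixed_float", mixed_float), ("mixed_object", mixed_object)]:
    print(name, "->", result)
```

Whether dtype=object is the right fix depends on what consumes the array afterwards. Numeric estimators generally expect a plain float matrix, so padding or re-encoding the data is often the better route.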
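The two numbered cases can be reproduced with a short script. This is only a sketch with made-up values. raises_sequence_error is a hypothetical helper that reports whether the call fails with the "sequence" ValueError.

```python
import numpy as np


def raises_sequence_error(action):
    """Run action(); return True if it raises the 'sequence' ValueError."""
    try:
        action()
    except ValueError as exc:
        return "sequence" in str(exc)
    return False


# Case 1: a tuple sits where a single number is expected.
tuple_as_element = raises_sequence_error(
    lambda: np.array([1, (2, 3)], dtype=float)
)


# Case 2: an array of length 2 is assigned to a single element slot.
def assign_long_array():
    x = np.array([1, 2, 3])
    x[0] = np.array([4, 5])


array_into_slot = raises_sequence_error(assign_long_array)


# Control: assigning a plain number to the slot is fine.
def assign_scalar():
    x = np.array([1, 2, 3])
    x[0] = 4


scalar_is_fine = not raises_sequence_error(assign_scalar)

print(tuple_as_element, array_into_slot, scalar_is_fine)
```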
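For the question's pipeline, one way to track this down is to check whether every row handed to clf.fit has the same length, and to build the feature matrix by stacking the per-type blocks column-wise instead of nesting them in lists. The sketch below assumes three already-encoded blocks with invented values. row_lengths is a hypothetical helper, and the block names are stand-ins, not the asker's variables.

```python
import numpy as np


def row_lengths(rows):
    """Return the sorted distinct row lengths; more than one value means ragged rows."""
    return sorted({len(row) for row in rows})


# Hypothetical stand-ins for three samples after preprocessing.
cat_block = np.array([[1, 0], [0, 1], [1, 1]])      # e.g. MultiLabelBinarizer output
num_block = np.array([[0.5], [-1.2], [0.7]])        # e.g. StandardScaler output
txt_block = np.array([[0.0, 0.3, 0.1],
                      [0.2, 0.0, 0.0],
                      [0.0, 0.0, 0.9]])             # e.g. dense TF-IDF output

# Rows built by hand with unequal lengths are what triggers the error.
ragged_rows = [[1, 0, 0.5], [0, 1]]
print(row_lengths(ragged_rows))

# Stacking the blocks side by side yields one 2-D matrix with equal-length rows.
X = np.hstack([cat_block, num_block, txt_block])
print(X.shape, row_lengths(X))
```

If row_lengths reports more than one value for the real data, the rows are ragged and the step that builds X is the place to look.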