How do you remove blank lines from a string in Python? What is the difference between Python and VBA for data processing?
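What is the difference between Python and VBA for data processing?
There is a CSV file with two columns, CNUM and COMPANY. The data contains blank lines as well as rows with duplicate content.
Requirements:
1) Remove the blank lines;
2) Where rows are duplicated, keep only one row of valid data;
3) Rename the COMPANY column to Company_New;
4) Add six columns at the end, named C_col, D_col, E_col, F_col, G_col and H_col.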
1. Processing with Python pandas:
import pandas as pd


def deal_with_data(filepath, newpath):
    with open(filepath) as file_obj:
        df = pd.read_csv(file_obj)  # read the CSV file and create a DataFrame
    # reindex to the specified columns; the six new columns are filled with NaN
    df = df.reindex(columns=['CNUM', 'COMPANY', 'C_col', 'D_col', 'E_col', 'F_col', 'G_col', 'H_col'])
    df.rename(columns={'COMPANY': 'Company_New'}, inplace=True)  # rename the column
    df = df.dropna(axis=0, how='all')  # drop all-NaN rows, i.e. the blank lines in the file
    df['CNUM'] = df['CNUM'].astype('int32')  # set the dtype of the CNUM column to int32
    df = df.drop_duplicates(subset=['CNUM', 'Company_New'], keep='first')  # drop duplicate rows
    df.to_csv(newpath, index=False, encoding='GBK')


if __name__ == '__main__':
    file_path = r'C:\Users\12078\Desktop\python\CNUM_COMPANY.csv'
    file_save_path = r'C:\Users\12078\Desktop\python\CNUM_COMPANY_OUTPUT.csv'
    deal_with_data(file_path, file_save_path)
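The function deal_with_data reads and writes files on disk. To see each step in isolation, here is a minimal sketch that runs the same pandas calls on a small in-memory CSV. The sample rows are made up for illustration, `io.StringIO` stands in for the real file, and the function name `clean` is hypothetical. Note that `pd.read_csv` already skips completely empty lines by default (`skip_blank_lines=True`). A line containing only commas is read as an all-NaN row, and that is the case `dropna(how='all')` removes.

```python
import io

import pandas as pd

# Hypothetical sample data: one empty line, one comma-only line, one duplicate row.
SAMPLE_CSV = """CNUM,COMPANY
1001,Alpha

1002,Beta
,
1001,Alpha
1003,Gamma
"""

NEW_COLUMNS = ["C_col", "D_col", "E_col", "F_col", "G_col", "H_col"]


def clean(text: str) -> pd.DataFrame:
    # Completely empty lines are skipped by read_csv itself (skip_blank_lines=True).
    df = pd.read_csv(io.StringIO(text))
    # Append the six new columns; reindex fills them with NaN.
    df = df.reindex(columns=["CNUM", "COMPANY"] + NEW_COLUMNS)
    df = df.rename(columns={"COMPANY": "Company_New"})
    # The comma-only line became an all-NaN row; how="all" drops it.
    df = df.dropna(axis=0, how="all")
    # CNUM was float because of the NaN; it can be cast once that row is gone.
    df["CNUM"] = df["CNUM"].astype("int32")
    # Keep the first occurrence of each (CNUM, Company_New) pair.
    return df.drop_duplicates(subset=["CNUM", "Company_New"], keep="first")


result = clean(SAMPLE_CSV)
print(result.to_csv(index=False))
```

The order matters. `astype('int32')` only succeeds after `dropna`, because an integer column cannot hold NaN.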
2. Processing with VBA:
Option Base 1
Option Explicit

Sub main()
    On Error GoTo error_handling
    Dim wb As Workbook
    Dim wb_out As Workbook
    Dim sht As Worksheet
    Dim sht_out As Worksheet
    Dim rng As Range
    Dim usedrows As Long  'Long rather than Byte: a Byte cannot hold row numbers above 255
    Dim dict_cnum_company As Object
    Dim str_file_path As String
    Dim str_new_file_path As String

    'assign values to variables:
    str_file_path = "C:\Users\12078\Desktop\Python\CNUM_COMPANY.csv"
    str_new_file_path = "C:\Users\12078\Desktop\Python\CNUM_COMPANY_OUTPUT.csv"

    Set wb = checkAndAttachWorkbook(str_file_path)
    Set sht = wb.Worksheets("CNUM_COMPANY")
    Set wb_out = Workbooks.Add
    wb_out.SaveAs str_new_file_path, xlCSV  'create a csv file
    Set sht_out = wb_out.Worksheets("CNUM_COMPANY_OUTPUT")

    Set dict_cnum_company = CreateObject("Scripting.Dictionary")
    usedrows = Application.WorksheetFunction.Max(getLastValidRow(sht, "A"), getLastValidRow(sht, "B"))

    'rename the header COMPANY to Company_New, remove duplicate lines/rows.
    Dim cnum_company As String
    cnum_company = ""
    For Each rng In sht.Range("A1", "A" & usedrows)
        If VBA.Trim(rng.Offset(0, 1).Value) = "COMPANY" Then
            rng.Offset(0, 1).Value = "Company_New"
        End If
        cnum_company = rng.Value & "-" & rng.Offset(0, 1).Value
        'skip blank rows (key is just "-") and keys that are already in the dictionary
        If VBA.Trim(cnum_company) <> "-" And Not dict_cnum_company.Exists(cnum_company) Then
            dict_cnum_company.Add cnum_company, ""
        End If
    Next rng

    'loop over the keys of the dict, split each key by "-" into the cnum array and the company array.
    'note: this assumes neither CNUM nor COMPANY values contain "-"
    Dim index_dict As Long
    Dim arr_keys As Variant
    Dim arr_cnum()
    Dim arr_Company()
    arr_keys = dict_cnum_company.Keys  'Dictionary.Keys is 0-based regardless of Option Base
    ReDim arr_cnum(1 To dict_cnum_company.Count)
    ReDim arr_Company(1 To dict_cnum_company.Count)
    For index_dict = 0 To UBound(arr_keys)
        arr_cnum(index_dict + 1) = Split(arr_keys(index_dict), "-")(0)
        arr_Company(index_dict + 1) = Split(arr_keys(index_dict), "-")(1)
    Next

    'assign the values of the arrays to the cells.
    sht_out.Range("A1", "A" & UBound(arr_cnum)) = Application.WorksheetFunction.Transpose(arr_cnum)
    sht_out.Range("B1", "B" & UBound(arr_Company)) = Application.WorksheetFunction.Transpose(arr_Company)

    'add 6 columns to the output csv file:
    Dim arr_columns() As Variant
    arr_columns = Array("C_col", "D_col", "E_col", "F_col", "G_col", "H_col")
    sht_out.Range("C1:H1") = arr_columns
    Call checkAndCloseWorkbook(str_file_path, False)
    Call checkAndCloseWorkbook(str_new_file_path, True)
    Exit Sub
error_handling:
    Call checkAndCloseWorkbook(str_file_path, False)
    Call checkAndCloseWorkbook(str_new_file_path, False)
End Sub
Helper functions:
'get the last row of column N in a worksheet
Function getLastValidRow(in_ws As Worksheet, in_col As String) As Long
    getLastValidRow = in_ws.Cells(in_ws.Rows.Count, in_col).End(xlUp).Row
End Function

'return the workbook at in_wb_path, opening it only if it is not already open
Function checkAndAttachWorkbook(in_wb_path As String) As Workbook
    Dim wb As Workbook
    Dim mywb As String
    mywb = in_wb_path
    For Each wb In Workbooks
        If LCase(wb.FullName) = LCase(mywb) Then
            Set checkAndAttachWorkbook = wb
            Exit Function
        End If
    Next
    Set wb = Workbooks.Open(in_wb_path, UpdateLinks:=0)
    Set checkAndAttachWorkbook = wb
End Function

'close the workbook at in_wb_path if it is open, saving changes when in_saved is True
Function checkAndCloseWorkbook(in_wb_path As String, in_saved As Boolean)
    Dim wb As Workbook
    Dim mywb As String
    mywb = in_wb_path
    For Each wb In Workbooks
        If LCase(wb.FullName) = LCase(mywb) Then
            wb.Close savechanges:=in_saved
            Exit Function
        End If
    Next
End Function
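To compare the VBA logic with Python step for step, here is a minimal sketch of the same dictionary-based approach using only the standard `csv` module, without pandas. It is an illustration under stated assumptions, not a translation of the author's code. The function name `dedupe_rows` is hypothetical. Rows are keyed on a `(CNUM, COMPANY)` tuple rather than a `"-"`-joined string, so a company name that contains `-` is not split apart. A row counts as blank when both cells are empty after stripping whitespace. The header is assumed to be the first non-blank row.

```python
import csv
import io

NEW_COLUMNS = ["C_col", "D_col", "E_col", "F_col", "G_col", "H_col"]


def dedupe_rows(lines):
    """Skip blank rows, rename the COMPANY header, keep the first of each duplicate.

    `lines` is any iterable of CSV text lines; the cleaned CSV is returned as a string.
    """
    seen = {}  # stands in for Scripting.Dictionary; Python dicts keep insertion order
    for row in csv.reader(lines):
        cnum = row[0].strip() if len(row) > 0 else ""
        company = row[1].strip() if len(row) > 1 else ""
        if not cnum and not company:
            continue  # blank row, like the VBA check for a key of just "-"
        if company == "COMPANY":
            company = "Company_New"  # rename the header cell
        seen.setdefault((cnum, company), None)  # only the first occurrence is stored
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, (cnum, company) in enumerate(seen):
        # As in the VBA code, only the header row gets the six new column names.
        extra = NEW_COLUMNS if i == 0 else [""] * len(NEW_COLUMNS)
        writer.writerow([cnum, company] + extra)
    return out.getvalue()


if __name__ == "__main__":
    sample = ["CNUM,COMPANY", "1001,Alpha", "", "1002,Beta", ",", "1001,Alpha"]
    print(dedupe_rows(sample), end="")
```

A plain `dict` plays the role of `Scripting.Dictionary` here. Because it preserves insertion order, the output keeps the rows in the order they first appeared, which matches what the VBA loop over `Keys` produces.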
3. Output:
Both methods produce the same output.
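That the two outputs match can also be checked programmatically rather than by eye. The sketch below uses hypothetical helper names `csv_rows`, `same_csv_text` and `same_csv_files`. It parses both files with the standard `csv` module and compares them row by row, so a difference only in line endings (LF versus CRLF) is not reported as a mismatch. The GBK default assumes both files were written in that encoding, as the pandas script does. The encoding Excel uses for CSV may depend on system settings.

```python
import csv
import io


def csv_rows(text):
    """Parse CSV text into a list of rows; the csv reader accepts LF and CRLF line endings."""
    return list(csv.reader(io.StringIO(text, newline="")))


def same_csv_text(text_a, text_b):
    """True if the two CSV texts hold exactly the same rows and cells."""
    return csv_rows(text_a) == csv_rows(text_b)


def same_csv_files(path_a, path_b, encoding="gbk"):
    """File-based version; the encoding is an assumption and must match both files."""
    with open(path_a, encoding=encoding, newline="") as fa:
        text_a = fa.read()
    with open(path_b, encoding=encoding, newline="") as fb:
        text_b = fb.read()
    return same_csv_text(text_a, text_b)


print(same_csv_text("CNUM,Company_New\r\n1001,Alpha\r\n", "CNUM,Company_New\n1001,Alpha\n"))  # True
```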
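4. Comparison and summary:
Python's pandas has a large number of built-in data-processing methods, so there is no need to reinvent the wheel: it is convenient to use and the code is much more concise.
Handling this requirement in Excel VBA relies on data structures such as arrays and dictionaries (in real-world scenarios the data volume is usually fairly large, so in several places I avoided looping over cells directly). It also needs many separate methods for handling strings, arrays and dictionaries, file operations are awkward, and once an error occurs it is harder to debug than Python. Even after optimization, the code is certainly much longer than the Python version.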
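How do you insert a blank line between two lines in Python?
Add the newline character "\n": one "\n" starts a new line, and two in a row give a line break followed by one blank line.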
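Here is a minimal sketch covering both directions: inserting a blank line with "\n\n", and removing blank lines from a string again, which is the question in the page title. `remove_blank_lines` is a hypothetical helper name. In this sketch, a line that contains only whitespace also counts as blank.

```python
def remove_blank_lines(s):
    """Drop every line that is empty or contains only whitespace."""
    return "\n".join(line for line in s.splitlines() if line.strip())


one_break = "first line" + "\n" + "second line"    # one "\n": just a line break
with_blank = "first line" + "\n\n" + "second line"  # two "\n": line break plus a blank line

print(with_blank)
print(remove_blank_lines(with_blank) == one_break)  # True
```

`str.splitlines()` also recognizes "\r\n", so the helper works on Windows-style text too. The joined result always uses "\n".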