Code study notes | RDKit | Psi4 | GNN | DataFrame | quick command reference | continuously updated

Date: 2023-12-21 23:37:01

1. RDKit

1.1 Converting between SMILES strings and Mol objects

from rdkit import Chem

smile = "O=C1c2ccccc2C(=O)c3c1ccc4c3[nH]c5c6C(=O)c7ccccc7C(=O)c6c8[nH]c9cC(=O)ccccccC(=O)cccc9c8c45"
mol = Chem.MolFromSmiles(smile)  # returns None if the SMILES cannot be parsed
smile = Chem.MolToSmiles(mol)    # canonical SMILES
mol, smile
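As a small sanity check on the round trip above, the sketch below canonicalizes two spellings of ethanol and shows what happens with a string RDKit cannot parse. The helper name `to_canonical_smiles` and the example SMILES are illustrative assumptions, not part of the original notes.

```python
from rdkit import Chem


def to_canonical_smiles(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)  # None on a parse or sanitization failure
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)


# Two spellings of ethanol map to the same canonical string.
same = to_canonical_smiles("OCC") == to_canonical_smiles("CCO")
# An unclosed ring bond cannot be parsed, so the helper returns None.
bad = to_canonical_smiles("C1CC")
```

Checking for None before calling `Chem.MolToSmiles` avoids an exception when the input string is malformed.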

1.2 Loading an SDF file into Mol objects

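suppl = Chem.SDMolSupplier("AllMol.sdf", removeHs=False)  # removeHs=False keeps explicit hydrogens
mol_list = [x for x in suppl if x is not None]  # records that fail to parse come back as None
mol_num = len(mol_list)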
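To see what `removeHs=False` changes, here is a minimal, self-contained sketch that writes two small molecules to a temporary SDF file and reads them back both ways. The file name `demo.sdf` and the two molecules are assumptions made for illustration only.

```python
import os
import tempfile

from rdkit import Chem

# Build two small molecules with explicit hydrogens (illustrative inputs).
mols = [Chem.AddHs(Chem.MolFromSmiles(s)) for s in ("CCO", "C")]

# Write them to a temporary SDF file.
path = os.path.join(tempfile.mkdtemp(), "demo.sdf")
writer = Chem.SDWriter(path)
for m in mols:
    writer.write(m)
writer.close()

# removeHs=False keeps the explicit hydrogens stored in the file.
with_h = [m for m in Chem.SDMolSupplier(path, removeHs=False) if m is not None]
# The default (removeHs=True) strips them while reading.
without_h = [m for m in Chem.SDMolSupplier(path) if m is not None]

atoms_with_h = [m.GetNumAtoms() for m in with_h]
atoms_without_h = [m.GetNumAtoms() for m in without_h]
```

Keeping the hydrogens matters when per-atom features (for example, values computed from a wavefunction) are indexed by the full atom list, hydrogens included.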

2. Psi4

3. torch_geometric

3.1 torch_geometric.data.Data

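Purpose: a container for a single graph whose connectivity is stored in COO (coordinate) format. Once a graph is wrapped in this structure, the rest of torch_geometric (data loaders, convolution layers, and so on) can process it.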

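Parameters:
x: the features of all nodes in the graph, with shape [num_nodes, node_feature_dimension]; usually a floating-point tensor (e.g. dtype=torch.float).
edge_index: the edge indices in COO format, with shape [2, num_edges] and dtype torch.long. Note that this describes connectivity only and carries no edge weights; edge weights or features are passed through a separate parameter, edge_attr. Each edge of an undirected graph must be stored as two directed edges, one in each direction.
edge_attr: the edge weights or features, with shape [num_edges, edge_feature_dimension].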

Usage:

import numpy as np
import torch
from rdkit import Chem
from torch_geometric.data import Data as TorchGeometricData

# get_atom_features, get_WF_results and get_edge_features are helper functions
# defined elsewhere; they are not shown in these notes.

def mol2geodataWF(mol, y, z):
    smile = Chem.MolToSmiles(mol)  # SMILES string of the molecule
    # Feature vector of every atom in the molecule
    atom_features = [get_atom_features(atom) for atom in mol.GetAtoms()]
    WF_results = get_WF_results(z)  # wavefunction values for this molecule
    # Append the wavefunction values column-wise (axis=1), one row per atom
    atom_features = np.append(atom_features, WF_results, axis=1)
    num_atom_features = len(atom_features[0])
    # view() reshapes the tensor; the -1 dimension is inferred automatically
    atom_features = torch.FloatTensor(atom_features).view(-1, num_atom_features)

    # Features of all bonds in the molecule; returns edge_list and num_bond_features (4 here)
    edge_list, num_bond_features = get_edge_features(mol)
    # Each entry of edge_list holds two things: the (start, end) atom indices and the bond type
    edge_list = sorted(edge_list)

    edge_indices = [e for e, v in edge_list]  # (start, end) of every edge
    edge_indices = torch.tensor(edge_indices)
    # .t() transposes [num_edges, 2] into [2, num_edges]
    edge_indices = edge_indices.t().to(torch.long).view(2, -1)

    edge_attributes = [v for e, v in edge_list]  # bond type of every edge
    edge_attributes = torch.FloatTensor(edge_attributes)

    return TorchGeometricData(x=atom_features, edge_index=edge_indices, edge_attr=edge_attributes, num_atom_features=num_atom_features, num_bond_features=num_bond_features, smiles=smile, y=y)

Sample output:
Data(x=[47, 88], edge_index=[2, 100], edge_attr=[100, 4], y=0.0, num_atom_features=88, num_bond_features=4, smiles='[H]c1c([H])c(N([H])S(=O)(=O)C([H])([H])[H])c([H])c([H])c1N([H])c1c2c([H])c([H])c(N=[N+]=[N-])c([H])c2nc2c([H])c([H])c(N=[N+]=[N-])c([H])c12')
Data(x=[27, 88], edge_index=[2, 52], edge_attr=[52, 4], y=0.0, num_atom_features=88, num_bond_features=4, smiles='[H]C([H])=C(C(=O)OC([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H])C([H])([H])[H]')
Data(x=[18, 88], edge_index=[2, 36], edge_attr=[36, 4], y=0.0, num_atom_features=88, num_bond_features=4, smiles='[H]OC(=O)c1c([H])c([H])c([H])c([H])c1C([H])([H])[H]')
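The notes do not show `get_edge_features`, so the following is only a guess at its general shape, written to be consistent with the sample output (four bond features, and twice as many edges as bonds). The name `sketch_edge_features`, the one-hot encoding, and the choice of four bond types are all assumptions; the original helper may well differ.

```python
from rdkit import Chem

# Hypothetical bond-type vocabulary; the original helper may encode bonds differently.
BOND_TYPES = [
    Chem.rdchem.BondType.SINGLE,
    Chem.rdchem.BondType.DOUBLE,
    Chem.rdchem.BondType.TRIPLE,
    Chem.rdchem.BondType.AROMATIC,
]


def sketch_edge_features(mol):
    """Return ([((start, end), one_hot), ...], num_bond_features) for a molecule."""
    edge_list = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        one_hot = [float(bond.GetBondType() == t) for t in BOND_TYPES]
        # Store both directions so each undirected bond becomes two directed edges.
        edge_list.append(((i, j), one_hot))
        edge_list.append(((j, i), one_hot))
    return edge_list, len(BOND_TYPES)


# o-Toluic acid with explicit hydrogens, the molecule in the third sample line.
mol = Chem.AddHs(Chem.MolFromSmiles("Cc1ccccc1C(=O)O"))
edge_list, num_bond_features = sketch_edge_features(mol)
```

With explicit hydrogens this molecule has 18 atoms and 18 bonds, so the sketch yields 36 directed edges, matching the `edge_index=[2, 36]` shape in the sample output.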

3.2 NNConv

class NNConv(in_channels, out_channels, nn, aggr='add', root_weight=True, bias=True, **kwargs)
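The parameter notes in this section refer to the layer's update formula, which is not reproduced in these notes. For reference, the PyTorch Geometric documentation defines the NNConv update as follows, where $h_{\mathbf{\Theta}}$ is the edge network passed as `nn`, $\mathbf{e}_{i,j}$ is the feature vector of the edge between nodes $i$ and $j$, and $\mathcal{N}(i)$ is the neighbourhood of node $i$. The sum becomes a mean or a max when `aggr` is set accordingly.

```latex
\mathbf{x}^{\prime}_i = \mathbf{\Theta}\,\mathbf{x}_i
  + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \cdot h_{\mathbf{\Theta}}(\mathbf{e}_{i,j})
```

The first term is the transformed root node feature controlled by `root_weight`; the second is the aggregated message from the neighbours.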

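Example:

from torch.nn import Linear, ReLU, Sequential
from torch_geometric.nn import NNConv
# nn maps edge features of shape [-1, num_edge_features] to [-1, in_channels * out_channels]
nn = Sequential(Linear(num_bond_features, 128), ReLU(), Linear(128, dim * dim))
# Arguments: input dimension, output dimension, the edge network nn, and the aggregation scheme aggr
self.conv = NNConv(dim, dim, nn, aggr='mean')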

in_channels (int): size of each input sample, i.e. the input feature dimension; usually the dimension of the node hidden state.
out_channels (int): size of each output sample, i.e. the output feature dimension; usually also the dimension of the node hidden state.
nn (torch.nn.Module): a neural network hΘ that maps the edge features edge_attr of shape [-1, num_edge_features] to shape [-1, in_channels * out_channels], e.g. defined with torch.nn.Sequential.
aggr (string, optional): the aggregation scheme to use ("add", "mean" or "max"; default: "add"). This is the summation over neighbours in the layer's update formula; with "mean", the messages from neighbouring nodes are averaged instead.
root_weight (bool, optional): if set to False, the layer does not add the transformed root node features to the output, i.e. the first term of the update formula is dropped (default: True). The default is usually fine.
bias (bool, optional): if set to False, the layer does not learn an additive bias, i.e. the constant term added at the end of the update formula (default: True).

4. pandas DataFrame

4.1 Selecting a row or column

Select rows by a column value:

rescale_wf.loc[rescale_wf['ID'] == 2]

Select a column:
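The notes leave the column-selection entry empty. A minimal sketch of the usual options follows; the stand-in DataFrame and the column name `wf` are assumptions, since the real `rescale_wf` is not shown.

```python
import pandas as pd

# Illustrative stand-in for rescale_wf; columns other than "ID" are assumptions.
rescale_wf = pd.DataFrame({"ID": [1, 2, 2], "wf": [0.1, 0.2, 0.3]})

rows = rescale_wf.loc[rescale_wf["ID"] == 2]  # boolean mask: rows whose ID equals 2
one_col = rescale_wf["wf"]                    # single column name -> Series
two_cols = rescale_wf[["ID", "wf"]]           # list of names -> DataFrame
by_position = rescale_wf.iloc[:, 0]           # first column by position -> Series
```

Indexing with a single name returns a Series, while indexing with a list of names returns a DataFrame even when the list has one element.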

4.2 Dropping specific rows or columns

Drop a single row:
new_df = df.drop(index='row_label')
new_df = df.drop('row_label', axis='index')
new_df = df.drop('row_label', axis=0)
Drop several rows:
new_df = df.drop(index=['row_label_1', 'row_label_2'])
new_df = df.drop(['row_label_1', 'row_label_2'], axis='index')
new_df = df.drop(['row_label_1', 'row_label_2'], axis=0)
Drop a single column:
new_df = df.drop(columns='column_name')
new_df = df.drop('column_name', axis='columns')
new_df = df.drop('column_name', axis=1)
Drop several columns:
new_df = df.drop(columns=['column_name_1', 'column_name_2'])
new_df = df.drop(['column_name_1', 'column_name_2'], axis='columns')
new_df = df.drop(['column_name_1', 'column_name_2'], axis=1)
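A short runnable sketch of the calls above, using a made-up three-by-three DataFrame. It also shows that `drop` returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)

no_r2 = df.drop(index="r2")          # drop one row by index label
no_bc = df.drop(columns=["b", "c"])  # drop several columns by name
# errors="ignore" skips labels that do not exist instead of raising KeyError.
safe = df.drop(columns=["missing"], errors="ignore")
```

Because the result is a copy, assign it back (`df = df.drop(...)`) when the change should persist.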

4.3 DataFrame to tensor (when all columns share the same data type)

torch.from_numpy(temp_ID_wf2.values)
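The "same data type" condition matters because a DataFrame with mixed column types yields an object array, which `torch.from_numpy` does not accept. One way around it, sketched here with a made-up DataFrame, is to keep only the numeric columns and cast them to a single dtype first.

```python
import numpy as np
import pandas as pd
import torch

# Illustrative frame with a non-numeric column mixed in.
df = pd.DataFrame({"name": ["m1", "m2"], "e": [0.5, 1.5], "n": [3, 4]})

# df.values would be an object array here, which torch.from_numpy rejects,
# so keep only the numeric columns and cast them to one dtype first.
numeric = df.select_dtypes(include="number").to_numpy(dtype=np.float32)
tensor = torch.from_numpy(numeric)  # shares memory with the NumPy array
```

Casting to `float32` also matches the default dtype of most PyTorch layers, which avoids dtype mismatch errors later.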

4.4 DataFrame to list

np.array(df_name).tolist()
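For comparison, a few equivalent or closely related conversions are sketched below on a small made-up DataFrame.

```python
import numpy as np
import pandas as pd

df_name = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

rows = np.array(df_name).tolist()     # one inner list per row
same = df_name.to_numpy().tolist()    # equivalent, without the explicit np.array call
col = df_name["a"].tolist()           # a single column as a flat list
records = df_name.to_dict("records")  # one dict per row, keyed by column name
```

The nested list drops the column names, so `to_dict("records")` can be the better choice when the names need to travel with the values.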
