Model-free drug-likeness from fragments.
We developed a drug-likeness filter (DLF) based on molecular fragments and molecular weight (MW), a key property in drug design. The molecular fragments were selected from extended-connectivity atom environments according to their occurrence ratio in our collections of drugs and "nondrugs". Using molecular fragments only, the DLF recalls 87.05% of compounds from DRUGS (N = 3823) and 40.25% of compounds from the Available Chemicals Directory (ACD, N = 178 011). When MW under 600 is added as a further filter, 78.81% of DRUGS and 40.17% of ACD are recalled. The DLF procedure was externally validated on the MDL Drug Data Report (MDDR) data set (N = 169 277): 78.45% of compounds were recalled using the molecular fragments only, and 65.64% passed the DLF-MW filter. Over 50% of a pesticide collection (N = 1482) passed the DLF, because these chemicals share molecular fragments with known drugs. Because it was developed as a model-free filter, the DLF is perhaps less useful for discriminating drugs from nondrugs, but it is more likely to rapidly eliminate chemicals rich in nondrug-like fragments. Since almost 40% of ACD, the standard reference set for nondrugs, consists of drug-like molecules, a rule-based system such as the DLF is less likely to mislabel nondrugs through overfitting. Reliable benchmarks for nondrugs are unlikely to exist, since medicinal chemistry catalogs tend to be biased toward existing drugs.
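The fragment-selection and filtering steps can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract. Fragments are represented as plain string identifiers, standing in for extended-connectivity atom environments that a cheminformatics toolkit would compute. The "occurrence ratio" is taken as the fraction of drugs containing a fragment divided by the fraction of nondrugs containing it. A compound is rejected if it carries any fragment whose ratio falls below a hypothetical cutoff of 1.0. Only the MW < 600 rule comes from the text, and the function names `occurrence_ratios`, `passes_dlf` and `recall_percent` are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical cutoff: the abstract does not give the fragment-ratio rule.
RATIO_CUTOFF = 1.0   # assumed: ratio below this marks a fragment as "nondrug-like"
MW_CUTOFF = 600.0    # from the abstract: MW under 600


def occurrence_ratios(drugs: List[Set[str]], nondrugs: List[Set[str]]) -> Dict[str, float]:
    """Fraction of drugs containing each fragment divided by the fraction of nondrugs containing it."""
    # Each molecule is a set, so these counts are numbers of molecules containing the fragment.
    drug_counts = Counter(f for frags in drugs for f in frags)
    nondrug_counts = Counter(f for frags in nondrugs for f in frags)
    ratios: Dict[str, float] = {}
    for frag in drug_counts.keys() | nondrug_counts.keys():
        p_drug = drug_counts[frag] / len(drugs)
        p_nondrug = nondrug_counts[frag] / len(nondrugs)
        # A fragment never seen in nondrugs gets an infinite ratio (maximally drug-like).
        ratios[frag] = p_drug / p_nondrug if p_nondrug > 0 else float("inf")
    return ratios


def passes_dlf(frags: Iterable[str], ratios: Dict[str, float], mw: Optional[float] = None,
               ratio_cutoff: float = RATIO_CUTOFF, mw_cutoff: float = MW_CUTOFF) -> bool:
    """Reject a compound with any nondrug-like fragment; if mw is given, also require mw < mw_cutoff."""
    # Fragments absent from the ratio table are treated as neutral (assumption).
    if any(ratios.get(f, ratio_cutoff) < ratio_cutoff for f in frags):
        return False
    return mw is None or mw < mw_cutoff


def recall_percent(compounds: List[Tuple[Set[str], float]], ratios: Dict[str, float],
                   use_mw: bool = False) -> float:
    """Percentage of (fragments, MW) pairs passing the filter, with or without the MW rule."""
    passed = sum(passes_dlf(frags, ratios, mw if use_mw else None) for frags, mw in compounds)
    return 100.0 * passed / len(compounds)


# Toy usage with made-up fragment labels (not data from the study).
drugs = [{"a", "b"}, {"a", "c"}]
nondrugs = [{"d", "b"}, {"d"}]
ratios = occurrence_ratios(drugs, nondrugs)
library = [({"a"}, 300.0), ({"d"}, 200.0), ({"c"}, 700.0)]
print(recall_percent(library, ratios))               # fragments only
print(recall_percent(library, ratios, use_mw=True))  # fragments plus MW < 600
```

In this toy run the MW rule can only lower recall. That mirrors the drop from fragments-only to DLF-MW recall reported above, although the sketch makes no claim to reproduce the published numbers.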