We apply one-hot encoding with get_dummies to the categorical variables in the application data. For the NaN values, we use the Ycimpute library to impute the missing values in the numerical variables. For outliers, we apply Local Outlier Factor (LOF) to the application data. LOF detects the outlying rows so that they can be suppressed.
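The three preprocessing steps can be sketched as follows. This is a minimal sketch, not the original code. It assumes scikit-learn's `KNNImputer` as a stand-in for the Ycimpute imputers, whose API is not shown here. It also assumes `SK_ID_CURR` is the ID column to keep out of the features, and that features are standardized before LOF because LOF is distance-based. The toy data is made up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def preprocess_application(df, id_col="SK_ID_CURR", n_neighbors=20):
    """One-hot encode categoricals, impute numeric NaNs, drop LOF outliers."""
    ids = df[id_col].reset_index(drop=True)
    # One-hot encode the object/category columns (get_dummies leaves numeric ones as-is).
    feats = pd.get_dummies(df.drop(columns=[id_col]), dtype=float)
    # Impute NaNs; KNNImputer is a stand-in for the Ycimpute imputers (assumption).
    feats = pd.DataFrame(KNNImputer(n_neighbors=5).fit_transform(feats),
                         columns=feats.columns)
    # LOF is distance-based, so score standardized copies of the features.
    scaled = StandardScaler().fit_transform(feats)
    labels = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(scaled)
    # fit_predict returns -1 for outliers and 1 for inliers; keep the inliers.
    keep = labels == 1
    clean = feats[keep].copy()
    clean.insert(0, id_col, ids[keep].to_numpy())
    return clean.reset_index(drop=True)


# Toy data: one extreme income (last row, id 100030) and one missing income.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "SK_ID_CURR": np.arange(100001, 100031),
    "AMT_INCOME_TOTAL": np.append(rng.normal(150_000, 20_000, 29), 5_000_000.0),
    "DAYS_BIRTH": rng.integers(-25_000, -7_000, 30),
    "NAME_CONTRACT_TYPE": ["Cash loans", "Revolving loans"] * 15,
})
demo.loc[3, "AMT_INCOME_TOTAL"] = np.nan
clean = preprocess_application(demo, n_neighbors=10)
```

Dropping the flagged rows is one way to "suppress" outliers. Capping them or adding an outlier flag column are alternatives if rows must be kept.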
Each current loan in the application data can have multiple previous loans. Each previous application has one row, identified by the feature SK_ID_PREV.
We have both float and categorical variables. We apply get_dummies to the categorical variables and aggregate the float variables with mean, min, max, count, and sum.
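The one-hot-then-aggregate step can be sketched like this. This is a minimal sketch that collapses the many previous-application rows into one row per `SK_ID_CURR`. The text does not say how the dummy columns are aggregated, so averaging them (the share of each category) is an assumption. The `PREV_` prefix is a hypothetical naming choice.

```python
import numpy as np
import pandas as pd


def aggregate_previous(prev, key="SK_ID_CURR", drop=("SK_ID_PREV",), prefix="PREV_"):
    """Collapse many previous-application rows into one row per client."""
    prev = prev.drop(columns=list(drop))
    # Numeric ("float") columns get the five summary statistics.
    num_cols = [c for c in prev.select_dtypes(include="number").columns if c != key]
    # get_dummies turns each categorical value into its own 0/1 column.
    encoded = pd.get_dummies(prev, dtype=float)
    dummy_cols = [c for c in encoded.columns if c not in num_cols and c != key]
    spec = {c: ["mean", "min", "max", "count", "sum"] for c in num_cols}
    # Assumption: dummy columns are averaged, i.e. the share of each category.
    spec.update({c: ["mean"] for c in dummy_cols})
    out = encoded.groupby(key).agg(spec)
    # Flatten the (column, statistic) MultiIndex into single names.
    out.columns = [f"{prefix}{col}_{stat.upper()}" for col, stat in out.columns]
    return out.reset_index()


prev = pd.DataFrame({
    "SK_ID_PREV": [1, 2, 3],
    "SK_ID_CURR": [100001, 100001, 100002],
    "AMT_CREDIT": [1000.0, 3000.0, np.nan],
    "NAME_CONTRACT_STATUS": ["Approved", "Refused", "Approved"],
})
prev_agg = aggregate_previous(prev)
```

`count` ignores NaN, so it records how many non-missing values each client has, which is why it is kept alongside `sum`.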
This table holds the payment history for previous loans at Home Credit. There is one row for every payment made and one row for every missed payment.
According to the missing value analysis, the share of missing values is very small, so we do not need to take any action for them. As before, we have both float and categorical variables: we apply get_dummies to the categorical variables and aggregate the float variables with mean, min, max, count, and sum.
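A missing value analysis of this kind can be as simple as the fraction of NaN per column. This is a minimal sketch; the 5% threshold is a hypothetical cut-off, not a value from the analysis. The toy installments frame is also made up.

```python
import numpy as np
import pandas as pd


def missing_report(df, threshold=0.05):
    """Share of missing values per column, plus the columns above `threshold`."""
    share = df.isna().mean().sort_values(ascending=False)
    # Assumption: 5% is an illustrative cut-off, not a value from the analysis.
    to_handle = share[share > threshold].index.tolist()
    return share, to_handle


inst = pd.DataFrame({
    "SK_ID_CURR": np.repeat([100001, 100002], 50),
    "AMT_INSTALMENT": np.full(100, 500.0),
    "AMT_PAYMENT": np.full(100, 500.0),
})
inst.loc[7, "AMT_PAYMENT"] = np.nan  # one missing payment amount out of 100 rows
share, to_handle = missing_report(inst)
```

Here `AMT_PAYMENT` is 1% missing and nothing crosses the threshold, which matches the "no action needed" conclusion.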
The bureau balance data contains monthly data about the previous credits in the Bureau data. Each row is one month of a previous credit, and a single previous credit can have multiple rows, one for each month of the credit length.
We first group the data by SK_ID_BUREAU and then count MONTHS_BALANCE, which gives a column with the number of months for each loan. After applying get_dummies to the STATUS column, we aggregate with mean and sum.
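The groupby, count, and dummy aggregation step might look like this. This is a minimal sketch; the `MONTHS_COUNT` column name is a hypothetical choice, and the three-row toy table is made up.

```python
import pandas as pd


def aggregate_bureau_balance(bb):
    """Summarize monthly bureau balance rows into one row per SK_ID_BUREAU."""
    # Count the monthly rows of each credit: its observed length in months.
    months = bb.groupby("SK_ID_BUREAU")["MONTHS_BALANCE"].count().rename("MONTHS_COUNT")
    # One 0/1 column per STATUS value, then mean and sum of each per credit.
    status = pd.get_dummies(bb[["SK_ID_BUREAU", "STATUS"]], columns=["STATUS"], dtype=float)
    status_agg = status.groupby("SK_ID_BUREAU").agg(["mean", "sum"])
    status_agg.columns = [f"{col}_{stat.upper()}" for col, stat in status_agg.columns]
    return pd.concat([months, status_agg], axis=1).rename_axis("SK_ID_BUREAU").reset_index()


bb = pd.DataFrame({
    "SK_ID_BUREAU": [5001, 5001, 5001, 5002],
    "MONTHS_BALANCE": [0, -1, -2, 0],
    "STATUS": ["C", "0", "0", "X"],
})
bb_agg = aggregate_bureau_balance(bb)
```

The mean of a status dummy is the share of months spent in that status. The sum is the number of such months.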
This dataset contains data about the client's previous credits from other financial institutions. Each previous credit has its own row in bureau, but one loan in the application data can have multiple previous credits.
Bureau balance data is closely related to the bureau data. In addition, since the bureau balance data has only the SK_ID_BUREAU key and no SK_ID_CURR, it cannot be joined to the application data directly. It is therefore better to merge the bureau and bureau balance data and continue processing on the merged data.
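The merge-then-aggregate idea can be sketched as follows. This is a minimal sketch that summarizes bureau balance per `SK_ID_BUREAU`, left-joins it onto bureau, and then aggregates per `SK_ID_CURR`. The `BURO_` prefix, the choice of mean and sum, and the single month-count feature are assumptions for illustration.

```python
import pandas as pd


def merge_bureau(bureau, bureau_balance):
    """Attach bureau balance summaries to bureau, then aggregate per client."""
    # bureau_balance has no SK_ID_CURR, so summarize it per SK_ID_BUREAU first.
    bb_agg = (bureau_balance.groupby("SK_ID_BUREAU")["MONTHS_BALANCE"]
              .count().rename("MONTHS_COUNT").reset_index())
    # Left join keeps bureau credits that have no monthly history (NaN count).
    merged = bureau.merge(bb_agg, on="SK_ID_BUREAU", how="left")
    merged = pd.get_dummies(merged, dtype=float)  # e.g. CREDIT_ACTIVE values
    out = merged.drop(columns=["SK_ID_BUREAU"]).groupby("SK_ID_CURR").agg(["mean", "sum"])
    out.columns = [f"BURO_{col}_{stat.upper()}" for col, stat in out.columns]
    return out.reset_index()


bureau = pd.DataFrame({
    "SK_ID_CURR": [100001, 100001, 100002],
    "SK_ID_BUREAU": [5001, 5002, 5003],
    "CREDIT_ACTIVE": ["Active", "Closed", "Active"],
    "AMT_CREDIT_SUM": [1000.0, 2000.0, 500.0],
})
bureau_balance = pd.DataFrame({
    "SK_ID_BUREAU": [5001, 5001, 5002],
    "MONTHS_BALANCE": [0, -1, 0],
    "STATUS": ["0", "0", "C"],
})
buro = merge_bureau(bureau, bureau_balance)
```

Credit 5003 has no monthly rows, so client 100002 gets a NaN month-count mean rather than a misleading zero.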
The POS Cash Balance data contains monthly balance snapshots of previous POS (point of sales) and cash loans that the applicant had with Home Credit. This table has one row for each month of history of every previous credit in Home Credit (consumer credit and cash loans) related to loans in our sample, i.e. the table has (#loans in sample × #relative previous credits × #months in which we have some history observable for the previous credits) rows.
The data has a very small number of missing values, so there is no need to take any action for them. What this table needs instead is feature engineering.
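The text does not list which POS Cash features were engineered, so the following is only an illustrative sketch. It reduces the monthly snapshots to one row per client. The features (number of previous credits, number of months, maximum `SK_DPD`, and a hypothetical `IS_LATE` share of months with days past due) are assumptions. In the toy table, client 100001 has two previous credits with 3 and 2 months and client 100002 has one credit with 1 month, so the table has 3 + 2 + 1 = 6 rows.

```python
import pandas as pd


def pos_cash_features(pos):
    """Client-level features from monthly POS/cash snapshots (illustrative choice)."""
    # Hypothetical flag: 1.0 for a month with days past due > 0.
    pos = pos.assign(IS_LATE=(pos["SK_DPD"] > 0).astype(float))
    feats = pos.groupby("SK_ID_CURR").agg(
        POS_N_PREV=("SK_ID_PREV", "nunique"),      # previous credits per client
        POS_N_MONTHS=("MONTHS_BALANCE", "count"),  # monthly snapshots per client
        POS_DPD_MAX=("SK_DPD", "max"),             # worst days past due
        POS_LATE_SHARE=("IS_LATE", "mean"),        # share of late months
    )
    return feats.reset_index()


pos = pd.DataFrame({
    "SK_ID_CURR": [100001] * 5 + [100002],
    "SK_ID_PREV": [1, 1, 1, 2, 2, 3],
    "MONTHS_BALANCE": [-3, -2, -1, -2, -1, -1],
    "SK_DPD": [0, 0, 5, 0, 0, 0],
})
pos_feats = pos_cash_features(pos)
```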
In contrast to the POS Cash Balance data, the credit card balance data includes more information about debt, such as the actual debt amount, debt limit, minimum payments, and actual payments. Each applicant in this data has only one credit card, most of which are active, and the credit cards have no maturity. Therefore, this data holds valuable information about applicants' past payment behavior.
Also, using the credit card balance data, new features, namely the ratio of total debt to total income and the ratio of minimum payments to total income, are integrated into the merged data set.
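The two ratio features can be sketched like this. This is a minimal sketch under stated assumptions: `AMT_BALANCE` stands for the debt amount, `AMT_INST_MIN_REGULARITY` for the minimum payment, and both are averaged per client before dividing by `AMT_INCOME_TOTAL` from the application data. The text does not specify the exact columns or the aggregation, so the `CC_` feature names are hypothetical.

```python
import numpy as np
import pandas as pd


def credit_card_ratios(cc, app):
    """Debt-to-income and minimum-payment-to-income ratios per client."""
    # Assumption: AMT_BALANCE is the debt amount and AMT_INST_MIN_REGULARITY the
    # minimum payment; both are averaged over the monthly snapshots.
    per_client = cc.groupby("SK_ID_CURR").agg(
        CC_BALANCE_MEAN=("AMT_BALANCE", "mean"),
        CC_MIN_PAY_MEAN=("AMT_INST_MIN_REGULARITY", "mean"),
    ).reset_index()
    merged = per_client.merge(app[["SK_ID_CURR", "AMT_INCOME_TOTAL"]],
                              on="SK_ID_CURR", how="left")
    income = merged["AMT_INCOME_TOTAL"].replace(0, np.nan)  # no division by zero
    merged["CC_DEBT_TO_INCOME"] = merged["CC_BALANCE_MEAN"] / income
    merged["CC_MIN_PAY_TO_INCOME"] = merged["CC_MIN_PAY_MEAN"] / income
    return merged


cc = pd.DataFrame({
    "SK_ID_CURR": [100001, 100001],
    "AMT_BALANCE": [20_000.0, 40_000.0],
    "AMT_INST_MIN_REGULARITY": [1_000.0, 2_000.0],
})
app = pd.DataFrame({"SK_ID_CURR": [100001], "AMT_INCOME_TOTAL": [120_000.0]})
cc_feats = credit_card_ratios(cc, app)
```

With a mean balance of 30,000 and an income of 120,000, the debt-to-income ratio is 0.25.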
This data does not have many missing values either, so again there is no need to take any action. After feature engineering, we have a dataframe with 103558 rows × 30 columns.