Because the full data set is large, we started with only a subset of the data. We built this subset from columns that had little missing data. We also preferred columns whose values vary across rows over columns that had nearly the same response in every row, so that patterns and trends would be easier to see. The resulting subset is 34.6 MB, which is more efficient to work with, and we can add more data later in the process if needed.
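The column selection described above can be sketched as a small filter. This is only an illustration under stated assumptions: the thresholds (`max_missing`, `max_dominant`), the function name `select_columns`, and the sample column names are hypothetical and are not the values or names used in our actual data.

```python
from collections import Counter


def select_columns(rows, max_missing=0.2, max_dominant=0.95):
    """Return names of columns that are mostly complete and not near-constant.

    rows: list of dicts, one per record (hypothetical in-memory layout).
    max_missing: assumed cutoff for the fraction of missing values.
    max_dominant: assumed cutoff for the share of the most common value.
    """
    if not rows:
        return []
    total = len(rows)
    keep = []
    for col in rows[0].keys():
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        if not present:
            continue
        missing_frac = 1 - len(present) / total
        if missing_frac > max_missing:
            continue
        # Skip columns where one response dominates almost every row.
        top_count = Counter(present).most_common(1)[0][1]
        if top_count / len(present) > max_dominant:
            continue
        keep.append(col)
    return keep


# Hypothetical sample records, for illustration only.
sample = [
    {"outcome": "Citation", "agency": "A", "notes": ""},
    {"outcome": "Warning", "agency": "A", "notes": None},
    {"outcome": "Citation", "agency": "A", "notes": "x"},
    {"outcome": "Warning", "agency": "A", "notes": ""},
]
kept = select_columns(sample)
print(kept)
```

In this sample, `agency` is dropped because every row has the same value, and `notes` is dropped because most of its values are missing. The cutoffs are a judgment call and should be documented alongside the analysis.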
The principles for advancing equitable data practice that are most relevant to our data are beneficence and justice. Beneficence is relevant because our data was collected with identifying information. Adhering to this principle involves removing anything in our data that could be used to identify people. To do this, we remove any violation numbers or record numbers that could be traced back to a specific person. Justice is relevant because it involves considering the community in our design. Adhering to this principle means considering how the specific community that the data comes from will affect our analysis, and keeping that in mind instead of generalizing from our data.
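A minimal sketch of the identifier removal step follows. The column names `violation_number` and `record_number` are assumed stand-ins for whatever the identifying fields are actually called in the data, and `drop_identifiers` is a hypothetical helper.

```python
def drop_identifiers(rows, id_columns):
    """Return copies of the records without the listed identifying columns."""
    blocked = set(id_columns)
    return [
        {key: value for key, value in row.items() if key not in blocked}
        for row in rows
    ]


# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"violation_number": "V001", "record_number": 17, "outcome": "Warning"},
    {"violation_number": "V002", "record_number": 18, "outcome": "Citation"},
]
cleaned = drop_identifiers(records, ["violation_number", "record_number"])
print(cleaned)
```

Dropping these columns removes the direct identifiers only. Combinations of the remaining fields could still narrow down to an individual, so the remaining columns are worth reviewing as well.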
Another important principle that was discussed is transparency. To implement it, we need to be transparent both about the limits of the data we have chosen and about which data informs our analysis. This involves clearly documenting each step of our process as we carry out the analysis. One limitation is that we can only draw conclusions about the specific community that our data comes from, rather than generalizing. We also have to be aware that there may have been biases in how the data was collected, so it may not be completely accurate or representative. For example, data recorded at night may have been less accurate than observations collected during the day. Additionally, observations may have been more likely to be recorded for certain types of violations, such as more serious ones, or for certain race categories.
Transparency also involves being clear about the plans for the data after the project concludes. The data could be abused or misused if we are not cautious about who has access to it, both during the project and after it is completed.