I recently received a very nice answer from Andrew Gustar in the following post regarding the construction of a sparse dgCMatrix of 1's and 0's. Each column is generated from a series of independent Bernoulli trials with a column-specific probability. The code uses the Matrix package in R. However, I'm still needing performance above and beyond what is suggested by the post and comments since I'm dealing with very large matrices.
I'm trying to implement the following code using Rcpp/RcppArmadillo, but I'm running into the issue of how to code the "growing" list given by the line
i <- vector(mode = "list", length = ncols)
Each element of the list can be a vector with length of at most nrows. I understand that appending elements to vectors is inefficient in C++ due to repeated memory allocations and copies.
So far, I have the following construction:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]]
using namespace std;
using namespace Rcpp;
using namespace arma;
// Credit: Romain Francois
// [[Rcpp::export]]
NumericVector cpp_rbinom(int n, double size, NumericVector prob) {
NumericVector v = no_init(n);
std::transform( prob.begin(), prob.end(), v.begin(), [=](double p){ return R::rbinom(size, p); });
return(v);
}
// [[Rcpp::export]]
NumericVector cpp_cumsum(NumericVector x){
// initialize the result vector
NumericVector out(x.size());
std::partial_sum(x.begin(), x.end(), res.begin());
return out;
}
// [[Rcpp::export]]
arma::sp_mat SpMatrix(int ncols, int nrows, NumericVector col_prob){
List i(ncols); //each element of i indexes a row that contains '1'
IntegerVector p = no_init(ncols); //p will be cumsum no of 1s by column
IntegerVector row = no_init(ncols);
for(int r=0; r<nrows; ++r){
row = cpp_rbinom(ncols,1,rho); //random row vector
p = p + row; //add to column identifier
if(sum(row)>0) //If any element of row==1
for(int j=0; j<ncols; ++j){//STUCK HERE...}
}
// Required format for sp_mat
IntegerVector tmp = cpp_cumsum(p);
p = tmp.push_front(0);
//IntegerVector i_vec = unlist(i); - this I can implement if i is a list
IntegerVector x = Rcpp::rep(1,i_vec.size());
//sparse matrix constructor
arma::sp_mat res(i_vec,p,x,nrows,ncols);
return res;
}
Aucun commentaire:
Enregistrer un commentaire